Libro Di Testo
Simone Cerreia-Vioglio
Department of Decision Sciences and IGIER, Università Bocconi

Massimo Marinacci
AXA-Bocconi Chair, Department of Decision Sciences and IGIER, Università Bocconi

Elena Vigna
Dipartimento ESOMAS, Università di Torino and Collegio Carlo Alberto
August 2022
To our loved ones
Contents

Preface

I Structures

21 Correspondences
21.1 A set-theoretic notion
21.2 Back to Euclidean spaces
21.3 Hemicontinuity
21.4 Addition and scalar multiplication of sets
21.5 Combining correspondences

V Optima

26 Derivatives
26.1 Marginal analysis
26.2 Derivatives
26.3 Geometric interpretation
26.4 Derivative function
26.5 One-sided derivatives
26.6 Derivability and continuity
26.7 Derivatives of elementary functions
26.8 Algebra of derivatives
26.9 The chain rule
26.10 Derivative of inverse functions
26.11 Formulary
26.12 Differentiability and linearity
26.12.1 Differential
26.12.2 Differentiability and derivability
26.12.3 Differentiability and continuity

29 Approximation
29.1 Taylor's polynomial approximation
29.1.1 Polynomial expansions
29.1.2 Taylor and Peano
29.1.3 Taylor and Lagrange
29.2 Omnibus
29.2.1 Omnibus proposition for local extremal points
29.2.2 Omnibus procedure of search of local extremal points
29.3 Multivariable Taylor expansion
29.3.1 Taylor expansion
29.3.2 Second-order conditions
29.3.3 Multivariable unconstrained optima

IX Appendices
Reason emerged around the sixth century BC in the Greek world, first in Ionian and Italian colonies (from Miletus to Samos, Croton and Elea) and then in mainland Greece, chiefly in Athens. Mathematics and the rational investigation of the empirical world were two marvelous gifts of this momentous emergence, a turning point in mankind's intellectual history that marked the beginning of Western thought.
Centuries later, the works of Galileo and Newton, inspired also by a rediscovered Archimedes, set the foundations of modern science by combining these two gifts into the mathematization of physics. Mathematics is the best support for any rational empirical investigation, be it dealing with natural or social phenomena. It empowers theoretical reasoning by pursuing the logical implications of scholars' original insights, implications that often go well beyond what was first envisioned (even by the subtlest minds). Indeed, the logical transparency of mathematics favors scholarly communication and the incremental accumulation of knowledge within and across generations of scholars. As a result, disciplines that have embraced mathematics have built theoretical structures that are far more refined (and elegant) than would have been possible otherwise.
Mathematics also permits the development of the quantitative features of a theoretical
model that make it empirically relevant. A purely literary argument, however subtle it may
be, can at best have qualitative empirical implications.
As empirical disciplines benefit from mathematics, so mathematics draws inspiration
and intellectual discipline from applications. A virtuous circle results. Since the time of
Galileo and Newton, the importance of mathematics in empirical disciplines has been steadily
growing and now goes well beyond physics. In particular, economics has been a major source
of motivation for the development of new mathematics, from game theory to optimization and
probability theory. In turn, mathematics gave economics a rigorous and powerful language
for articulating the theoretical and empirical implications of its models.
This book provides an introduction to the mathematics of economics and is primarily addressed to undergraduate students in economics. We confine ourselves to R^n and leave the more abstract structures that characterize higher mathematics, such as vector and metric spaces, to more advanced books (for instance, the excellent Ok, 2007). Within these boundaries, however, we take a rigorous approach by proving and motivating results, not shying away from the more subtle issues (often covered in "coda" sections that can be skipped when reading the book for the first time). Our assumption is that students are intellectually curious and should be given a chance to fully understand a topic, even the toughest one. This approach also has an educational value by helping students to master analytical reasoning as well as to articulate and support arguments by relentlessly exploring their (pleasant or unpleasant) implications.
In the book there are no formal exercises. Yet, we have left the proofs of some results to the reader and, in addition, some proofs have gaps, highlighted by a "why?". These can be seen as useful exercises to test the reader's understanding of the material presented.
During the journey of learning on which we embarked to write this book, we collected several debts of gratitude to colleagues and students. In particular, we thank Gabriella Chiomio and Claudio Mattalia, who thoroughly translated a first version of the manuscript, as well as Alexandra Fotiou, Giacomo Lanzani, Paolo Leonetti and Kelly Gail Strada for excellent research assistance. We are grateful to Margherita Cigola, Satoshi Fukuda, Fabrizio Iozzi, Guido Osimo, Lorenzo Peccati and Alberto Zaffaroni for their very useful comments that helped us to improve the manuscript. We are much indebted to Massimiliano Amarante, Pierpaolo Battigalli, Maristella Botticini, Erio Castagnoli (with whom this project started), Pierre-André Chiappori, Larry Epstein, Paolo Ghirardato, Itzhak Gilboa, Lars Peter Hansen, Peter Klibanoff, Fabio Maccheroni, Aldo Montesano, Luigi Montrucchio, Sujoy Mukerji, Aldo Rustichini, Tom Sargent and David Schmeidler for the discussions that over the years shaped our views on economics and mathematics. Needless to say, any error is ours but, hopefully, se non è vero, è ben trovato.
Part I
Structures
Chapter 1

Sets and numbers: an intuitive introduction
1.1 Sets
A set is a collection of distinguishable objects. There are two ways to describe a set: by directly listing its elements or by specifying a property that its elements have in common. The second way is more common: for instance,

A = {11, 13, 17, 19, 23, 29}     (1.1)

can be described as the set of the prime numbers between 10 and 30.
kitchen form a set of objects, the chairs, that have in common the property of being part
of your kitchen. The chairs of your bedroom form another set, as the letters of the Latin
alphabet form a set, distinct from the set of the letters of the Greek alphabet (and from the
set of chairs or from the set of numbers considered above).
Sets are usually denoted by capital letters: A, B, C and so on; their elements are denoted by small letters: a, b, c and so on. To denote that an element a belongs to the set A we write

a ∈ A

where ∈ is the symbol of belonging. Instead, to denote that an element a does not belong to the set A we write a ∉ A.
Off-the-record remark (O.R.) The concept of set, apparently introduced in 1847 by Bernhard Bolzano, is for us a primitive concept, not defined through other notions, as in Euclidean geometry, where points and lines are primitive concepts (with an intuitive geometric meaning that readers may give them).
1.1.1 Subsets
The chairs of your bedroom are a subset of the chairs of your home: a chair that belongs to your bedroom also belongs to your home. In general, a set A is a subset of a set B when all the elements of A are also elements of B. In this case we write A ⊆ B. Formally:

Definition 1 A set A is a subset of a set B, written A ⊆ B, if each element of A is also an element of B.

For example, let A be the set (1.1) of the prime numbers between 10 and 30 and let

B = {11, 13, 15, 17, 19, 21, 23, 25, 27, 29}     (1.2)

be the set of the odd numbers between 10 and 30. We have A ⊆ B.
We can illustrate the inclusion A ⊆ B graphically:

[Figure: Venn diagram of the inclusion A ⊆ B]

Here we used the so-called Venn diagrams to represent graphically the sets A and B (a simple, yet effective, way to visualize sets).
Nota Bene (N.B.) Though the two symbols ∈ and ⊆ are conceptually well distinct and must not be confused, there exists an interesting relation between them. Indeed, the set formed by a unique element a, i.e., the singleton {a},1 permits us to establish the relation between ∈ and ⊆: we have a ∈ A if and only if {a} ⊆ A.
1 Note that a and {a} are not the same thing: a is an element and {a} is a set, even if formed by only one element. For instance, the set A of the nations of the Earth with a flag of only one color had (until 2011) only one element, Libya, but it is not "the Libya": Tripoli is not the capital of A.
1.1.2 Operations
There are three basic operations among sets: union, intersection and difference. As we will see, they take any two sets and, starting from them, form a new set.

The first operation that we consider is the intersection of two sets A and B. As the term "intersection" suggests, with this operation we select all the elements that belong simultaneously to both sets.
Definition 2 Given two sets A and B, their intersection A ∩ B is the set of all the elements that belong to both A and B; that is, x ∈ A ∩ B if x ∈ A and x ∈ B.
For example, let A be the set of left-handed and B the set of right-handed citizens of a country. The intersection A ∩ B is the set of ambidextrous citizens. If, instead, A is the set of gasoline cars and B the set of methane cars, the intersection A ∩ B is the set of bi-fuel cars that run on both gasoline and methane.
It can happen that two sets have no elements in common. For example, let

C = {10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30}     (1.3)

be the set of the even numbers between 10 and 30. It has no elements in common with the set B in (1.2). In this case we talk of disjoint sets, with no elements in common. Such a notion gives us the opportunity to introduce a fundamental set: the empty set, denoted by ∅, which is the set without elements.

As a first use of this notion, note that two sets A and B are disjoint when they have empty intersection, that is, A ∩ B = ∅. For example, for the sets B and C in (1.2) and (1.3), we have B ∩ C = ∅.
We write A ≠ ∅ when the set A is not empty, that is, when it contains at least one element. Conventionally, we regard the empty set as a subset of any set, that is, ∅ ⊆ A for every set A.

Intersection characterizes inclusion, as the next result shows.

Proposition 4 Given two sets A and B, we have A ∩ B = A if and only if A ⊆ B.
Proof "If". Let A ⊆ B. We want to show that A ∩ B = A. To prove that two sets are equal, we always need to prove separately the two opposite inclusions: here A ∩ B ⊆ A and A ⊆ A ∩ B.

The inclusion A ∩ B ⊆ A is easily proven to be true. Indeed, let x ∈ A ∩ B.2 Then, by definition, x belongs to both A and B. In particular, x ∈ A and this is enough to conclude that A ∩ B ⊆ A.

Let us prove the inclusion A ⊆ A ∩ B. Let x ∈ A. Since by hypothesis A ⊆ B, each element of A also belongs to B, so x ∈ B. Hence, x belongs to both A and B, i.e., x ∈ A ∩ B. This proves that A ⊆ A ∩ B.

We have shown that both the inclusions A ∩ B ⊆ A and A ⊆ A ∩ B hold; we can therefore conclude that A ∩ B = A, which completes the proof of the "if" part.
The next operation we consider is the union. Here again the term "union" already suggests how in this operation all the elements of both sets are collected together.

Definition 5 Given two sets A and B, their union A ∪ B is the set of all the elements that belong to A or to B; that is, x ∈ A ∪ B if x ∈ A or x ∈ B.3
Note that an element can belong to both sets (unless they are disjoint). For example, if A is again the set of the left-handed and B is the set of the right-handed citizens, the union set contains all citizens with at least one hand. There are individuals (the ambidextrous) who belong to both sets.4
It is immediate to show that A ⊆ A ∪ B and that B ⊆ A ∪ B. It then follows that

A ∩ B ⊆ A ∪ B
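In passing, these operations are mirrored by the set type of a programming language. The following sketch, which is our illustration and not part of the original text, checks the relations above on the sets (1.1) and (1.2) in Python:

```python
A = {11, 13, 17, 19, 23, 29}                  # the primes in (1.1)
B = {11, 13, 15, 17, 19, 21, 23, 25, 27, 29}  # the odd numbers in (1.2)

print(A & B == A)          # True: A ∩ B = A, since A ⊆ B (Proposition 4)
print(A | B == B)          # True: A ∪ B = B, again because A ⊆ B
print((A & B) <= (A | B))  # True: the inclusion A ∩ B ⊆ A ∪ B
```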
2 In proving an inclusion between sets, say C ⊆ D, throughout the book we will tacitly assume that C ≠ ∅ because the inclusion is trivially true when C = ∅. For this reason our inclusion proof will show that x ∈ C (i.e., C ≠ ∅) implies x ∈ D.

3 The conjunction "or" has the inclusive sense of the Latin "vel" (x belongs to A or to B or to both) and not the exclusive sense of "aut" (x belongs either to A or to B, but not to both). Indeed, Giuseppe Peano gave the symbol ∪ the meaning "vel" when he first introduced it, along with the intersection symbol ∩ and the membership symbol ε, which he interpreted as the Latin "et" and "est", respectively (see the "signorum tabula" in his 1889 Arithmetices principia, a seminal work on the foundations of mathematics).
4 The clause "with at least one hand", though needed, may seem pedantic, even tactless. The distinction between being precise and pedantic is subtle and, ultimately, subjective. Experience may help to balance rigor and readability. In any case, in mathematics loose ends have to be handled with care and, definitely, are not for beginners.
[Figure: Venn diagram of the union A ∪ B]
Definition 6 Given two sets A and B, their difference A − B is the set of all the elements that belong to A, but not to B; that is, x ∈ A − B if both x ∈ A and x ∉ B.

The set A − B is, therefore, obtained by eliminating from A all the elements that belong (also) to B.5 Graphically:
[Figure: Venn diagram of the difference A − B]
For example, let us go back to the sets A and B specified in (1.1) and (1.2). Their difference A − B is empty, since every prime number between 10 and 30 is odd, while B − A = {15, 21, 25, 27} is the set of the odd numbers between 10 and 30 that are not prime.

Often all the sets considered in a given context are subsets of a common set of reference. Demographers, for instance, study the population of a country, of which they can consider various subsets according to the demographic properties that are of interest (for instance, age is a standard demographic variable through which the population can be subdivided in subsets).
The general set of reference is called universal set or, more commonly, space. There is no standard notation for this set (which is often clear from the context). We denote it temporarily by S. Given any of its subsets A, the difference S − A is denoted by A^c and is called the complement set, or simply the complement, of A. The difference operation is called complementation when it involves the universal set.
Example 7 If S is the set of all citizens of a country and A is the set of all citizens that are at least 65 years old, the complement A^c is formed by all citizens that are (strictly) less than 65 years old.

Note that

A ∪ A^c = S and A ∩ A^c = ∅
We also have:

Proposition 8 (A^c)^c = A.

Proof As for Proposition 4, we have to verify an equality between sets. We thus have to prove separately the two inclusions (A^c)^c ⊆ A and A ⊆ (A^c)^c. If a ∈ (A^c)^c, then a ∉ A^c and therefore a ∈ A. It follows that (A^c)^c ⊆ A. Vice versa, if a ∈ A, then a ∉ A^c and therefore a ∈ (A^c)^c. We conclude that A ⊆ (A^c)^c.
Note also that the difference can be expressed through intersection and complementation:

A − B = A ∩ B^c

More generally, union and intersection satisfy the following properties: given any sets A, B and C, we have

(i) commutativity: A ∪ B = B ∪ A and A ∩ B = B ∩ A;

(ii) associativity: (A ∪ B) ∪ C = A ∪ (B ∪ C) and (A ∩ B) ∩ C = A ∩ (B ∩ C).

We leave to the reader the simple proof. Property (ii) permits us to write A ∪ B ∪ C and A ∩ B ∩ C and, therefore, to extend without ambiguity the operations of union and intersection to an arbitrary (finite) number of sets:
⋃_{i=1}^{n} A_i  and  ⋂_{i=1}^{n} A_i
These operations extend to infinite collections {A_n} of sets. The union

⋃_{n=1}^{∞} A_n = {a : a ∈ A_n for at least one index n}

is the set of the elements that belong to at least one of the A_n. The intersection

⋂_{n=1}^{∞} A_n = {a : a ∈ A_n for every index n}

is, instead, the set of the elements that belong to all the A_n.
Example 11 Let A_n be the set of the (positive) even numbers ≤ n. For example, A_3 = {0, 2} and A_6 = {0, 2, 4, 6}. We have ⋂_{n=1}^{∞} A_n = {0} because 0 is the only even number such that 0 ∈ A_n for each n ≥ 1. Moreover, ⋃_{n=1}^{∞} A_n is the set of all even numbers.
We turn to the relations between the operations of intersection and union. Note the symmetry between properties (1.4) and (1.5), in which ∩ and ∪ are exchanged.

Proposition 12 The operations of union and intersection are distributive: given any three sets A, B and C, we have

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)     (1.4)

and

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)     (1.5)
Proof We prove only (1.4) since (1.5) is similarly proved. We have to consider separately the two inclusions A ∩ (B ∪ C) ⊆ (A ∩ B) ∪ (A ∩ C) and (A ∩ B) ∪ (A ∩ C) ⊆ A ∩ (B ∪ C).

If x ∈ A ∩ (B ∪ C), then x ∈ A and x ∈ B ∪ C, i.e., x ∈ B or x ∈ C. It follows that x ∈ A ∩ B or x ∈ A ∩ C, i.e., x ∈ (A ∩ B) ∪ (A ∩ C). Therefore, A ∩ (B ∪ C) ⊆ (A ∩ B) ∪ (A ∩ C).

Vice versa, if x ∈ (A ∩ B) ∪ (A ∩ C), then x ∈ A ∩ B or x ∈ A ∩ C, that is, x belongs to A and to at least one of B and C. Therefore, x ∈ A ∩ (B ∪ C). It follows that (A ∩ B) ∪ (A ∩ C) ⊆ A ∩ (B ∪ C).
A collection of non-empty, pairwise disjoint subsets of a set whose union is the whole set is called a partition of the set.

Example 14 Let A be the set of all citizens of a country. Its subsets A_1, A_2 and A_3 formed, respectively, by the citizens of school or pre-school age (from 0 to 17 years old), by the citizens of working age (from 18 to 65 years old) and by the elders (from 66 years old on) form a partition of the set A. Relatedly, age cohorts, which consist of citizens who have the same age, form a partition of A.
We conclude with the so-called De Morgan's laws for complementation: they illustrate the relationship between the operations of intersection, union and complementation.

Proposition 15 (De Morgan) Given any two sets A and B, we have

(A ∪ B)^c = A^c ∩ B^c and (A ∩ B)^c = A^c ∪ B^c

Proof We prove only the first law, leaving the second one to the reader. As usual, to prove an equality between sets we have to consider separately the two inclusions that compose it.

(i) (A ∪ B)^c ⊆ A^c ∩ B^c. If x ∈ (A ∪ B)^c, then x ∉ A ∪ B, that is, x belongs neither to A nor to B. It follows that x belongs simultaneously to A^c and to B^c and, therefore, to their intersection.

(ii) A^c ∩ B^c ⊆ (A ∪ B)^c. If x ∈ A^c ∩ B^c, then x ∉ A and x ∉ B; therefore, x does not belong to their union.
De Morgan's laws show that, when considering complements, the operations ∪ and ∩ are, essentially, interchangeable. Often these laws are written in the equivalent form

A ∪ B = (A^c ∩ B^c)^c and A ∩ B = (A^c ∪ B^c)^c

More importantly, they hold for any collection of sets, be it finite or not. For instance, for a finite collection they take the form

(⋃_{i=1}^{n} A_i)^c = ⋂_{i=1}^{n} A_i^c and (⋂_{i=1}^{n} A_i)^c = ⋃_{i=1}^{n} A_i^c
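As a quick numerical sanity check (ours, not the book's), De Morgan's laws can be verified on small finite sets, computing complements with respect to a space S:

```python
S = set(range(10, 31))        # the space (universal set)
A = {11, 13, 17, 19, 23, 29}  # primes between 10 and 30
B = {10, 12, 14, 16, 18, 20}  # some even numbers

def c(X):
    """Complement X^c = S - X with respect to the space S."""
    return S - X

print(c(A | B) == c(A) & c(B))  # first De Morgan law: True
print(c(A & B) == c(A) | c(B))  # second De Morgan law: True
```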
One may wonder whether there exists a "set of all sets" U. A famous argument, due to Bertrand Russell, undermines the supposed universality of U. Among the bizarre features of a universal set there is the fact that it belongs to itself, i.e., U ∈ U, a completely unintuitive property (as observed by Russell, "the human race, for instance, is not a human").
As suggested by Russell, let us consider the set A formed by all sets that are not members of themselves (e.g., the set of red oranges belongs to A because its elements are red oranges and, obviously, none of them is the entire collection of all of them). If A ∉ A, namely if A does not belong to itself, then A ∈ A because it is a set that satisfies the property of not belonging to itself. On the other hand, if A ∈ A, namely if A contains itself, then A ∉ A because, by definition, the elements of A do not contain themselves. In conclusion, we reach the absurdity that A ∉ A if and only if A ∈ A. This is Russell's famous paradox.
These logical paradoxes, often called antinomies, can be addressed within a non-naive set theory, in particular that of Zermelo-Fraenkel. In the practice of mathematics, all the more in an introductory book, these foundational aspects can be safely ignored (their study would require an ad hoc, highly non-trivial, course). But it is important to be aware of these paradoxes because the methods that have been developed to address them have actually affected the practice of mathematics, as well as that of the empirical sciences.
1.2 Numbers
To quantify the variables of interest in economic applications (for example, the prices and quantities of goods traded in some market) we need an adequate set of numbers. This is the topic of the present section.
The natural numbers

0, 1, 2, 3, ...

do not need any introduction; their set will be denoted by the symbol N.
The set N of natural numbers is closed with respect to the fundamental operations of
addition and multiplication:
(i) m + n ∈ N when m, n ∈ N;

(ii) m · n ∈ N when m, n ∈ N.

On the contrary, N is not closed with respect to the fundamental operations of subtraction and division: for example, neither 5 − 6 nor 5/6 is a natural number. It is, therefore, clear that N is inadequate as a set of numbers for economic applications: the budget of a company is an obvious example in which closure with respect to subtraction is crucial; otherwise, how can we quantify losses?6
The integer numbers
..., −3, −2, −1, 0, 1, 2, 3, ...

form a first extension, denoted by the symbol Z, of the set N. It leads to a set that is closed with respect to addition and multiplication, as well as to subtraction. Indeed, by setting m − n = m + (−n),7 we have
6 Historically, negative numbers have often been viewed with suspicion. It is in economics, indeed, where they have a most natural interpretation in terms of losses.

7 The difference m − n is simply the sum of m with the negative −n of n (recall the notion of algebraic sum).
(i) m − n ∈ Z when m, n ∈ Z;

(ii) m · n ∈ Z when m, n ∈ Z.

In particular,

Z = {m − n : m, n ∈ N}

Proposition 16 N ⊆ Z.
We are left with a fundamental operation with respect to which Z is not closed: division.
For example, the unit fraction 1/3 is not an integer:8 if we want to divide 1 cake among 3 guests, how can we quantify their portions if only Z is available? To remedy this important shortcoming of the integers, we need a further enlargement to the set of the rational numbers, denoted by the symbol Q and given by

Q = {m/n : m, n ∈ Z with n ≠ 0}

In words, the set of the rational numbers consists of all the fractions with integers in both the numerator and the denominator (the latter not equal to zero); whence the name rational, after "ratio".
Proposition 17 Z ⊆ Q.
The set of rational numbers is closed with respect to all the four fundamental operations:9

(i) m + n ∈ Q and m − n ∈ Q when m, n ∈ Q;

(ii) m · n ∈ Q when m, n ∈ Q;

(iii) m/n ∈ Q when m, n ∈ Q with n ≠ 0.
O.R. Rational numbers which are not periodic, that is, which have a finite number of decimals, have two decimal representations. For example, 1 = 0.999... because

0.999... = 3 · 0.333... = 3 · (1/3) = 1

Similarly, 2.5 = 2.4999..., 51.2 = 51.1999... and so on. On the contrary, periodic rational numbers and irrational numbers have a unique decimal representation (which is infinite).
8 Unit fractions have 1 as their numerator, so have the form 1/n. They are the simplest kind of fraction and historically played an important role because of their natural interpretation as "inverses of integers" (see, e.g., Ritter, 2000, p. 128, on their use in ancient Egyptian mathematics).

9 The names of the four fundamental operations are addition, subtraction, multiplication and division, while the names of their results are sum, difference, product and quotient, respectively (the addition of 3 and 4 has 7 as sum, and so on).
This is not a simple curiosity: if 0.999... were not equal to 1, we could state that 0.999... is the number that immediately precedes 1 (without any other number in between), which would violate a notable property of real numbers that we will see shortly in Proposition 19.
The set of rational numbers seems, therefore, to have all that we need. Some simple observations on multiplication, however, will bring us some surprising findings. If q is a rational number, the notation q^n, with n ≥ 1, means

q · q · ... · q   (n times)

with q^0 = 1 for every q ≠ 0. The notation q^n, called power of base q and exponent n, per se is just shorthand notation for the repeated multiplication of the same factor. Nevertheless, given a rational q > 0, it is natural to consider the inverse path, that is, to determine the positive "number", denoted by q^{1/n} (or, equivalently, by ⁿ√q) and called root of order n of q, such that

(q^{1/n})^n = q

For example,10 √25 = 5 because 5² = 25. To understand the importance of roots, we can consider the following simple geometric figure:
[Figure: a right triangle with both catheti of length 1]

By Pythagoras' Theorem, the length of the hypotenuse is √2. To quantify elementary geometric entities, we thus need square roots. Here we have a, tragic to some, surprise.11

Theorem 18 √2 ∉ Q.
Proof Suppose, by contradiction, that √2 ∈ Q. Then there exist m, n ∈ Z such that m/n = √2, and therefore

(m/n)² = 2     (1.6)

We can assume that m/n is already reduced to its lowest terms, i.e., that m and n have no factors in common.12 This means that m and n cannot both be even numbers (otherwise, 2 would be a common factor). From (1.6) we have m² = 2n², so m² is even and hence m itself is even,13 say m = 2k with k ∈ Z. Then 4k² = 2n², that is, n² = 2k², so n², and hence n, is even as well. Thus m and n are both even, the desired contradiction.
10 The square root ²√q is simply denoted by √q, omitting the index 2.

11 For the Pythagorean philosophy, in which the proportions (that is, the rational numbers) were central, the discovery of the non-rationality of square roots was a traumatic event. We refer the curious reader to Fritz (1945).

12 For example, 14/10 is not reduced to its lowest terms because the numerator and the denominator have in common the factor 2. On the contrary, 7/5 is reduced to its lowest terms.
This magnificent result is one of the great theorems of Greek mathematics. Proved by the Pythagorean school between the VI and the V century BC, it is the unexpected outcome of the (prima facie innocuous) distinction between even and odd numbers that the Pythagoreans were the first to make. It represented a turning point in the early history of mathematics that showed the fundamental role of abstract reasoning, which exposed the logical inconsistency of a physical, granular, notion of line formed by material points. Indeed, however small, these points would have a dimension that makes it possible to count them and so express the ratio between the hypotenuse and the catheti as a rational number m/n. Pythagoras' Theorem thus questioned the relations between geometry and the physical world that originally motivated its study (at least under any kind of Atomism, back then advocated by the Ionian school). The resulting intellectual turmoil paved the way to the notion of point with no dimension, a purely theoretical construct central to Euclid's Elements that, indeed, famously start by stating that "a point is that which has no part".
Leaving aside these philosophical aspects, further discussed at the end of the chapter, here Pythagoras' Theorem shows the need for a further enlargement of the set of numbers, closed under the square root operation, that permits us to quantify basic geometric entities (as well as basic economic variables, as will be clear in the sequel). To introduce, at an intuitive level, this final enlargement,14 consider the real line:

[Figure: the real line]

It is easy to see how on this line we can represent the rational numbers:

[Figure: rational numbers marked on the real line]

The rational numbers do not exhaust, however, the real line. For example, also roots like √2, or other non-rational numbers, such as π, must find their representation on the real line:

[Figure: √2 and π marked on the real line]
13 If the integer m is odd, we have m = 2n + 1 for some n ∈ Z. So, the integer m² = (2n + 1)² = 4n² + 4n + 1 is odd since the integer 4n² + 4n is even, being divisible by 2.

14 For a rigorous treatment we refer, for example, to the first chapter of Rudin (1976).
We denote by R the set of all the numbers that can be represented on the real line; they are called real numbers.15

The set R has the following properties in terms of the fundamental operations (here a, b and c are generic real numbers):

(i) a + b ∈ R and a · b ∈ R;

(ii) a + b = b + a and a · b = b · a;

(iii) (a + b) + c = a + (b + c) and (a · b) · c = a · (b · c);

(iv) a + 0 = a and b · 1 = b;

(v) a + (−a) = 0 and b · b^{−1} = 1 provided b ≠ 0;

(vi) a · (b + c) = a · b + a · c.
Clearly, Q ⊆ R. But Q ≠ R: there are many real numbers, called irrationals, that are not rational. Many roots and the numbers π and e are examples of irrational numbers. It is actually possible to prove that most real numbers are irrational. Although a rigorous treatment of this topic would take us too far, the next simple result is already a clear indication of how rich the set of the irrational numbers is.

Proposition 19 Given any two real numbers a < b, there exists an irrational number c ∈ R such that a < c < b.
In conclusion, R is the set of numbers that we will consider in the rest of the book. It
turns out to be adequate for most economic applications.17
15 When real numbers are rigorously introduced, our intuitive approach relies upon a postulate (of continuity of the real line) that can be traced back to Descartes. It asserts the existence of a one-to-one correspondence (a bijection) between real numbers and the points of the real line that preserves order and distance.

16 Such n exists because of the Archimedean property of the real numbers, which we will soon see in Proposition 41.

17 An important further enlargement, which we do not consider, is the set C of complex numbers.
1.3 Structure of the integers

Given two integers m and n, we say that n is divisible by m, written m | n, when there exists an integer q such that n = qm.

Example 20 The integer 6 is divisible by the integer 2, that is, 2 | 6, because the integer 3 is such that 6 = 2 · 3. Furthermore, −6 is divisible by 3, that is, 3 | −6, because the integer −2 is such that −6 = (−2) · 3.
In elementary school one learns how to divide two integers by using remainders and quotients. For example, if n = 7 and m = 2, we have n = 3 · 2 + 1, with 3 as the quotient and 1 as the remainder. The next simple result formalizes this procedure and shows that it holds for any pair of integers (something that young learners take for granted).
Proposition 21 Given any two integers m and n, with m strictly positive,18 there is one and only one pair of integers q and r such that

n = qm + r

with 0 ≤ r < m.
Proof Two distinct properties are stated in the proposition: the existence of the pair (q, r) and its uniqueness. Let us start by proving its existence. We first consider the case n ≥ 0. Consider the set A = {p ∈ N : p ≤ n/m}. Since n ≥ 0, it is non-empty because it contains at least the integer 0. Let q be the largest element of A. By definition, qm ≤ n < (q + 1)m. Setting r = n − qm, we have

0 ≤ n − qm = r < (q + 1)m − qm = m

We have thus shown the existence of the desired pair (q, r) when n ≥ 0. If n < 0, then −n > 0 and so, by what has just been proved, there exist q, r ∈ Z such that −n = qm + r and 0 ≤ r < m. Since r < m, if r > 0 then m > m − r > 0. By setting q′ = −q − 1 ∈ Z and r′ = m − r ∈ Z, we have

n = (−q − 1)m − r + m = q′m + r′

with 0 ≤ r′ < m (if r = 0, it suffices to take q′ = −q and r′ = 0). This proves existence also when n < 0.

As for uniqueness, suppose that

n = q′m + r′ = q″m + r″     (1.9)

with 0 ≤ r′, r″ < m. We need to prove that the two pairs coincide, that is, q′ = q″ and r′ = r″. It is enough to show that r′ = r″. In fact, by (1.9) this implies (q″ − q′)m = r′ − r″ = 0, yielding that q′ = q″ because m > 0. By contradiction, assume that r′ ≠ r″. Without loss of generality, assume that r′ > r″. By (1.9), we have (q″ − q′)m = r′ − r″ > 0. Since m > 0 and q″ − q′ is an integer, this implies that q″ − q′ > 0 and r′ − r″ = (q″ − q′)m ≥ m. At the same time, since 0 ≤ r″ < r′ < m, we reach the contradiction m ≤ (q″ − q′)m = r′ − r″ < m.
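Proposition 21 is exactly what Python's built-in divmod computes; a hand-rolled version that follows the statement (our sketch, with a hypothetical function name) is:

```python
def quotient_remainder(n: int, m: int) -> tuple[int, int]:
    """Return the unique (q, r) with n = q*m + r and 0 <= r < m, for m > 0."""
    assert m > 0
    q = n // m       # floor division picks the largest q with q*m <= n
    r = n - q * m    # hence 0 <= r < m, also for negative n
    return q, r

print(quotient_remainder(7, 2))    # (3, 1):   7 =  3*2 + 1
print(quotient_remainder(-7, 2))   # (-4, 1): -7 = -4*2 + 1
print(divmod(-7, 2) == quotient_remainder(-7, 2))  # True
```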
Given two strictly positive integers m and n, their greatest common divisor, denoted by gcd(m, n), is the largest divisor that both numbers share. The next result, which was proven by Euclid in his Elements, shows exactly what was taken for granted in elementary school, namely, that any pair of integers has a unique greatest common divisor.

Theorem 22 (Euclid) Any pair of strictly positive integers has one and only one greatest common divisor.
Proof Like Proposition 21, this is also an existence and uniqueness result. Uniqueness is obvious; let us prove existence. Let m and n be any two strictly positive integers. By Proposition 21, there is a unique pair (q_1, r_1) such that

n = q_1 m + r_1     (1.10)

with 0 ≤ r_1 < m. If r_1 = 0, then gcd(m, n) = m, and the proof is concluded. If r_1 > 0, we iterate the procedure by applying Proposition 21 to m. We thus have a unique pair (q_2, r_2) such that

m = q_2 r_1 + r_2     (1.11)

where 0 ≤ r_2 < r_1. If r_2 = 0, then gcd(m, n) = r_1. Indeed, (1.11) implies r_1 | m. Furthermore, by (1.10) and (1.11) we have

n/r_1 = (q_1 m + r_1)/r_1 = (q_1 q_2 r_1 + r_1)/r_1 = q_1 q_2 + 1

and so r_1 | n. Thus, r_1 is a divisor of both n and m. We now need to show that it is the greatest of those divisors. Suppose p is a strictly positive integer such that p | m and p | n. By definition, there are two strictly positive integers a and b such that n = ap and m = bp. We have

0 < r_1/p = (n − q_1 m)/p = a − q_1 b

Hence r_1/p is a strictly positive integer, which implies that r_1 ≥ p. To sum up, gcd(m, n) = r_1 if r_2 = 0. If this is the case, the proof is concluded.

If r_2 > 0, we iterate the procedure once more by applying Proposition 21 to r_2. We thus have a unique pair (q_3, r_3) such that

r_1 = q_3 r_2 + r_3

with 0 ≤ r_3 < r_2. Proceeding in this way, we obtain a strictly decreasing sequence m > r_1 > r_2 > r_3 > ... ≥ 0 of remainders, so after finitely many steps some remainder must be zero; arguing as above, the last non-zero remainder is then gcd(m, n).
Example 23 Let us consider the strictly positive integers 3801 and 1708. Their greatest common divisor is not apparent at first sight. Fortunately, we can calculate it by means of Euclid's Algorithm. We proceed as follows:

Step 1: 3801 = 2 · 1708 + 385
Step 2: 1708 = 4 · 385 + 168
Step 3: 385 = 2 · 168 + 49
Step 4: 168 = 3 · 49 + 21
Step 5: 49 = 2 · 21 + 7
Step 6: 21 = 3 · 7 + 0

In six steps we have found that gcd(3801, 1708) = 7.
The quality of an algorithm depends on the number of steps, or iterations, that are
required to reach the solution. The fewer the iterations, the more powerful the algorithm is.
The following remarkable property, proven by Gabriel Lamé in 1844, holds for Euclid's Algorithm.19

19 See Sierpiński (1988) p. 17 for a proof.
Theorem 24 (Lamé) Given two strictly positive integers m and n, the number of iterations needed for Euclid's Algorithm is no greater than five times the number of digits of min{m, n}.

For example, if we go back to the numbers 3801 and 1708, the number of relevant digits is 4. Lamé's Theorem guarantees in advance that Euclid's Algorithm would have required at most 20 iterations. It took us only 6 steps, but thanks to Lamé's Theorem we already knew, before starting, that it would not have taken too much effort (and thus it was worth giving it a shot without running the risk of getting stuck in a grueling number of iterations).
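Euclid's Algorithm and Lamé's bound are easy to check numerically; the sketch below (ours, with hypothetical names) counts the iterations and compares them with five times the number of digits of min{m, n}:

```python
def euclid_gcd(m: int, n: int) -> tuple[int, int]:
    """Euclid's Algorithm: return (gcd, number of iterations)."""
    a, b = max(m, n), min(m, n)
    steps = 0
    while b != 0:
        a, b = b, a % b   # one division with remainder per step
        steps += 1
    return a, steps

g, steps = euclid_gcd(3801, 1708)
lame_bound = 5 * len(str(min(3801, 1708)))
print(g, steps, lame_bound)   # 7 6 20, as in Example 23 and Theorem 24
```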
A natural number p > 1 is called prime when its only divisors are 1 and p itself. A natural number which is not prime is called composite. Let us denote the set of prime numbers by P. Obviously, P ⊆ N and N − P is the set of composite numbers. The reader can easily verify that the following naturals can be written as products of primes:

12 = 2 · 2 · 3 = 2² · 3
60 = 2 · 2 · 3 · 5 = 2² · 3 · 5

In general, the prime factorization (or decomposition) of a composite number n > 1 can be written as

n = p_1^{n_1} p_2^{n_2} ... p_k^{n_k}     (1.12)

where p_i ∈ P and 0 ≠ n_i ∈ N for each i = 1, ..., k, with p_1 < p_2 < ... < p_k. For example,

522 = 2 · 3² · 29
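A naive factorization by trial division reproduces decompositions like (1.12) for small numbers; the routine below is our sketch and, as discussed later in this section, is hopeless for large inputs:

```python
def prime_factorization(n: int) -> dict[int, int]:
    """Return {p: exponent} so that n is the product of the p**exponent."""
    factors: dict[int, int] = {}
    d = 2
    while d * d <= n:
        while n % d == 0:       # strip the prime factor d completely
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:                   # the leftover factor is itself prime
        factors[n] = factors.get(n, 0) + 1
    return factors

print(prime_factorization(522))  # {2: 1, 3: 2, 29: 1}, i.e. 2 * 3^2 * 29
print(prime_factorization(60))   # {2: 2, 3: 1, 5: 1},  i.e. 2^2 * 3 * 5
```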
What we have just seen raises two questions: whether every natural number > 1 admits a prime factorization (we have seen only a few examples up to now) and whether such factorization is unique. The next result, the Fundamental Theorem of Arithmetic, addresses both questions by showing that every natural number > 1 admits one and only one prime factorization. In other words, every such number can be expressed uniquely as a product of prime numbers.
Prime numbers are thus the "atoms" of N: they are "indivisible" (as they are divisible only by 1 and themselves) and through them any other natural number can be expressed uniquely. The importance of this result, which shows the centrality of prime numbers, can be seen in its name. Its first proof can be found in the famous Disquisitiones Arithmeticae, published in 1801 by Carl Friedrich Gauss, although Euclid was already aware of the result in its essence.

Theorem (Fundamental Theorem of Arithmetic) Every natural number n > 1 admits one and only one prime factorization (1.12).
Proof Let us start by showing the existence of this factorization. We will proceed by contradiction. Suppose there are natural numbers > 1 that do not have a prime factorization as in (1.12). Let n > 1 be the smallest among them. Obviously, n is a composite number. There are then two natural numbers p and q such that n = pq with 1 < p, q < n. Since n is the smallest number that does not admit a prime factorization, the numbers p and q do admit such factorization. In particular, we can write

p = p_1^{n_1} p_2^{n_2} ... p_k^{n_k} and q = q_1^{n′_1} q_2^{n′_2} ... q_s^{n′_s}

Thus, we have

n = pq = p_1^{n_1} p_2^{n_2} ... p_k^{n_k} · q_1^{n′_1} q_2^{n′_2} ... q_s^{n′_s}

By collecting the terms p_i and q_j appropriately, n can be rewritten as in (1.12). Hence, n admits a prime factorization, which contradicts our assumptions on n, thus concluding the proof of the existence.
Let us proceed by contradiction to prove uniqueness as well. Suppose that there are natural numbers that admit more than one factorization. Let n > 1 be the smallest among them: then n admits at least two different factorizations, so that we can write

n = p_1^{n_1} p_2^{n_2} ... p_k^{n_k} = q_1^{n′_1} q_2^{n′_2} ... q_s^{n′_s}

Since q_1 is a divisor of n, it must be a divisor of at least one of the factors p_1 < ... < p_k.20 For example, let p_1 be one such factor. Since both q_1 and p_1 are primes, we have that q_1 = p_1. Hence

p_1^{n_1 − 1} p_2^{n_2} ... p_k^{n_k} = q_1^{n′_1 − 1} q_2^{n′_2} ... q_s^{n′_s} < n

which contradicts the minimality of n, as the number p_1^{n_1 − 1} p_2^{n_2} ... p_k^{n_k} also admits multiple factorizations. The contradiction proves the uniqueness of the prime factorization.
From a methodological viewpoint it must be noted that this proof of existence is carried out by contradiction and, as such, cannot be constructive. Indeed, these proofs are based

20 This mathematical fact, although intuitive, requires a mathematical proof. This is indeed the content of Euclid's Lemma, which we do not prove. This lemma permits us to conclude that, if a prime p divides a product of strictly positive integers, then it must divide at least one of them.
on the law of excluded middle (a property is either true or false; cf. Appendix D) and the truth of a statement is established by showing its non-falseness. This often allows such proofs to be short and elegant but, although logically air-tight,21 they are almost metaphysical as they do not provide a procedure for constructing the mathematical entities whose existence they establish (let alone an algorithm to compute them, when relevant).22

To sum up, we invite the reader to compare this proof of existence with the constructive one provided for Theorem 22. This comparison should clarify the differences between the two fundamental types of proofs of existence, constructive/direct and non-constructive/indirect.
It is not a coincidence that the proof of existence in the Fundamental Theorem of Arithmetic is not constructive. Indeed, designing algorithms which allow us to factorize a natural number n into prime numbers (the so-called factorization tests) is exceedingly complex. After all, constructing algorithms which can assess whether n is prime or composite (the so-called primality tests) is already extremely cumbersome and is to this day an active research field (so much so that an important result in this field dates to 2002).23
To grasp the complexity of the problem it suffices to observe that, if n is composite, there are two natural numbers a, b > 1 such that n = ab. Hence, a ≤ √n or b ≤ √n (otherwise, ab > n), so there is a divisor of n among the natural numbers between 1 and √n. To verify whether n is prime or composite, we can merely divide n by all natural numbers between 1 and √n: if none of them is a divisor of n, we can safely conclude that n is a prime number or, if this is not the case, that n is composite. This procedure requires at most √n steps.
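The procedure just described is only a few lines of code (our sketch):

```python
import math

def is_prime(n: int) -> bool:
    """Trial division: test candidate divisors up to the square root of n."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:      # a divisor no larger than sqrt(n): composite
            return False
    return True

print(is_prime(29))    # True
print(is_prime(522))   # False: 522 = 2 * 3^2 * 29
```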
With this in mind, suppose we want to test whether the number 10^100 + 1 is prime or composite (it is a number with 101 digits, so it is big but not huge). The procedure requires at most √(10^100 + 1) operations, that is, at most 10^50 operations (approximately). Suppose we have an extremely powerful computer which is able to carry out 10^10 (ten billion) operations per second. Since there are 31,536,000 seconds in a year, that is, approximately 3 · 10^7 seconds, our computer would be able to carry out approximately 3 · 10^7 · 10^10 = 3 · 10^17 operations in one year. To carry out the operations that our procedure might require, our computer would need

10^50 / (3 · 10^17) = (1/3) · 10^33

years. We had better get started...
It should be noted that, if the prime factorizations of two natural numbers n and m are known, we can easily determine their greatest common divisor. For example, from

3801 = 3 · 7 · 181 and 1708 = 2² · 7 · 61

it easily follows that gcd(3801, 1708) = 7, which confirms the result of Euclid's Algorithm.
Given how difficult it is to factorize natural numbers, the observation is hardly useful from a computational standpoint. Thus, it is a good idea to hold on to Euclid's Algorithm, which, thanks to Lamé's Theorem, is able to produce greatest common divisors with reasonable efficiency, without having to conduct any factorization.

21 Unless one rejects the law of excluded middle, as some eminent mathematicians have done (although this constitutes a minority view and a very subtle methodological issue, the analysis of which is surely premature).

22 Enriques (1919) pp. 11-13 is an authoritative discussion of this issue.

23 One of the reasons why the study of factorization tests is an active research field is that the difficulty of factorizing natural numbers is exploited by modern cryptography to build unbreakable codes (see Section 6.4).
We close with another classic result of Euclid.

Theorem (Euclid) The prime numbers are infinitely many.

Proof The proof is carried out by contradiction. Suppose that there are only finitely many prime numbers and denote them by p_1 < p_2 < ... < p_n. Define

q = p_1 p_2 ... p_n

and set m = q + 1. The natural number m is larger than any prime number, hence it is a composite number. By the Fundamental Theorem of Arithmetic, it is divisible by at least one of the prime numbers p_1, p_2, ..., p_n. Let us denote this divisor by p. Both natural numbers m and q are thus divisible by p. It follows that also their difference, that is, the natural number 1 = m − q, is divisible by p, which is impossible since p > 1. Hence, the assumption that there are finitely many prime numbers is false.
In conclusion, we have looked at some basic notions in number theory, the branch of mathematics which deals with the properties of integers. It is one of the most fascinating and complex fields of mathematics, and it bears incredibly deep results, often easy to state but hard to prove. A classic example is the famous Fermat's Last Theorem, whose statement is quite simple: if n ≥ 3, there cannot exist three strictly positive integers x, y and z such that x^n + y^n = z^n. Thanks to Pythagoras' Theorem we know that for n = 2 such triplets of integers do exist (for example, 3² + 4² = 5²); Fermat's Last Theorem states that n = 2 is indeed the only case in which this remarkable property holds. Stated by Fermat, the theorem was first proven in 1994 by Andrew Wiles after more than three centuries of unfruitful attempts.
1.4 Order structure of R

The inequality ≤ on R satisfies the following properties (here a, b and c are generic real numbers):

(i) reflexivity: a ≤ a;

(ii) transitivity: if a ≤ b and b ≤ c, then a ≤ c;

(iii) antisymmetry: if a ≤ b and b ≤ a, then a = b;

(iv) completeness (or totality): for every pair a, b ∈ R, we have a ≤ b or b ≤ a (or both);

(v) additive independence: if a ≤ b, then a + c ≤ b + c for every c ∈ R;

(vi) multiplicative independence: if a ≤ b, then

ac ≤ bc if c > 0
ac = bc = 0 if c = 0
ac ≥ bc if c < 0

(vii) separation: given two sets of real numbers A and B, if a ≤ b for every a ∈ A and b ∈ B, then there exists c ∈ R such that a ≤ c ≤ b for every a ∈ A and b ∈ B.
The first three properties have an obvious interpretation. Completeness guarantees that any two real numbers can always be ordered. Additive independence ensures that the initial ordering between two real numbers a and b is not altered by adding to both the same real number c. Multiplicative independence considers, instead, the stability of such ordering with respect to multiplication.

Finally, separation permits us to separate two sets ordered by ≤ (that is, such that each element of one of the two sets is greater than or equal to each element of the other one) through a real number c, called separating element.24 Separation is a fundamental property of "continuity" of the real numbers; it is what mainly distinguishes them from the rational numbers (for which such property does not hold, as remarked in the footnote) and makes them the natural environment for mathematical analysis.
The strict form a > b of the "weak" inequality indicates that a is strictly greater than b, i.e., a ≥ b and a ≠ b. We have a > b if and only if a ≤ b does not hold, that is, the strict inequality can be defined as the negation of the weak inequality (of opposite direction). The reader can verify that transitivity and independence (both additive and multiplicative) hold also for the strict inequality >, while the other properties of the inequality do not hold for >.
The order structure permits us to introduce several types of intervals of the real line:

(i) the open bounded intervals (a, b) = {x ∈ R : a < x < b};

(ii) the closed bounded intervals [a, b] = {x ∈ R : a ≤ x ≤ b};

(iii) the half-closed (or half-open) bounded intervals (a, b] = {x ∈ R : a < x ≤ b} and [a, b) = {x ∈ R : a ≤ x < b}.

In these bounded intervals, the points a and b are called endpoints. Other important intervals are:
24 The property of separation holds also for N and Z, but not for Q. For example, the sets of rationals A = {q ∈ Q : q < √2} and B = {q ∈ Q : q > √2} do not have a rational separating element (as the reader can verify in light of Theorem 18 and of what we will see in Section 1.4.3).
(iv) the unbounded intervals [a, ∞) = {x ∈ R : x ≥ a} and (a, ∞) = {x ∈ R : x > a}, and their analogous (−∞, a] and (−∞, a).25 In particular, the positive half-line [0, ∞) is often denoted by R_+, while R_{++} denotes (0, ∞), that is, the positive half-line without the origin.
The use of the adjectives open, closed and unbounded will become clear in Chapter 5. To ease notation, in the rest of the chapter (a, b) will denote both an open bounded interval and the unbounded ones (a, ∞), (−∞, b) and (−∞, ∞) = R. Analogously, (a, b] and [a, b) will denote both the half-closed bounded intervals and the unbounded ones (−∞, b] and [a, ∞).
Formally, a set I of real numbers is an interval if, given any x, y ∈ I with x ≤ y, every z ∈ R with x ≤ z ≤ y belongs to I. In words, a subset of the real line is an interval when, taken any two of its points, all points between them also belong to the set. It is easy to see that, indeed, all the previous examples of intervals have this "betweenness" property. They represent all the forms that subsets of the real line having this property can take.

Though defined through an order property, intervals admit a simple algebraic characterization, as readers can check.
Given a set A of real numbers, a number h ∈ R is called upper bound of A if it is greater than or equal to each element of A, that is, if

h ≥ x   ∀x ∈ A

while it is called lower bound of A if it is smaller than or equal to each element of A, that is, if

h ≤ x   ∀x ∈ A
For example, if A = [0, 1], the number 3 is an upper bound and the number −1 is a lower bound since −1 ≤ x ≤ 3 for every x ∈ [0, 1]. In particular, the set of upper bounds of A is the interval [1, ∞) and the set of the lower bounds is the interval (−∞, 0].

We denote by A* the set of upper bounds of A and by A_* the set of lower bounds. In the example just seen, A* = [1, ∞) and A_* = (−∞, 0].
(i) Upper and lower bounds do not necessarily belong to the set A: the upper bound 3 and the lower bound −1 for the set [0, 1] are an example of this.

(ii) Upper and lower bounds might not exist. For example, for the set of positive even numbers

{0, 2, 4, 6, ...}     (1.13)

there is no real number which is greater than all its elements: hence, this set does not have upper bounds. Analogously, the set of negative even numbers

{0, −2, −4, −6, ...}     (1.14)

has no lower bounds, while the set of integers Z is a simple example of a set without upper and lower bounds.
Through upper and lower bounds we can give a first classification of sets of the real line: a set A of real numbers is said to be (i) bounded above if it admits an upper bound, (ii) bounded below if it admits a lower bound, and (iii) bounded if it is bounded both above and below. For example, the closed interval [0, 1] is bounded, the set (1.13) of positive even numbers is bounded below but not above (indeed, it has no upper bounds),27 while the set (1.14) of the negative ones is bounded above but not below.
Note that this classification of sets is not exhaustive: there exist sets that do not fall in any of the types (i)-(iii) just introduced. For example, Z has neither an upper bound nor a lower bound in R, and therefore it is not of any of the types (i)-(iii). Such sets are called unbounded.
The greatest element of a set A, when it exists, is called maximum of A and the least element is called minimum of A. Formally, an element x̂ ∈ A is the maximum of A if

x̂ ≥ x   ∀x ∈ A

and is the minimum of A if

x̂ ≤ x   ∀x ∈ A

27 By using Proposition 41 below, the reader can formally prove that, indeed, the set of positive even numbers is unbounded above.
Unfortunately, maxima and minima are fragile notions: sets often do not admit them.
Example 35 The half-closed interval [0, 1) has minimum 0 but has no maximum. Indeed, suppose by contradiction that there exists a maximum x̂ ∈ [0, 1), so that x̂ ≥ x for every x ∈ [0, 1). Set

x̃ = (1/2)x̂ + (1/2) · 1

Since x̂ < 1, we have x̂ < x̃. But it is obvious that x̃ ∈ [0, 1), which contradicts the fact that x̂ is the maximum of [0, 1).
In a similar way, the reader can check that:

(i) the half-closed interval (0, 1] has maximum 1, but it has no minimum;

(ii) the open interval (0, 1) has neither minimum nor maximum.

The maximum of a set A is denoted by max A, and the minimum by min A. For example, for A = [0, 1] we have max A = 1 and min A = 0.
Maxima and minima, when they exist, are unique. Moreover, when the maximum of a set A exists, it is the least of the upper bounds of A, that is,

max A = min A*     (1.15)

Example 37 The set of upper bounds of [0, 1] is the interval [1, ∞). In this example, the equality (1.15) takes the form max[0, 1] = min[1, ∞).
Thus, when it exists, the maximum is the smallest upper bound. But the smallest upper bound, that is, min A*, might exist also when the maximum does not. For example, consider A = [0, 1): the maximum does not exist, but the smallest upper bound exists and is 1, i.e., min A* = 1.

All of this suggests that the smallest upper bound is the surrogate for the maximum which we are looking for. Indeed, in the example just seen, the point 1 is, in the absence of a maximum, its closest approximation.

Reasoning in a similar way, the greatest lower bound, i.e., max A_*, is the natural candidate to be the surrogate for the minimum when the latter does not exist. Motivated by what we have just seen, we give the following definition.
Definition 38 Given a non-empty set A ⊆ R, the supremum of A is the least upper bound of A, that is, min A*, while the infimum is the greatest lower bound of A, that is, max A_*.
Thanks to Proposition 36, both the supremum and the infimum of A are unique, when they exist. We denote them by sup A and inf A. For example, for A = (0, 1) we have inf A = 0 and sup A = 1.

As already remarked, when inf A ∈ A, it is the minimum of A, and when sup A ∈ A, it is the maximum of A.
Although suprema and infima may exist when maxima and minima do not, they do not always exist.

Example 39 Consider the set A of the even numbers in (1.13). In this case A* = ∅ and so A has no supremum. More generally, if A is not bounded above, we have A* = ∅ and the supremum does not exist. In a similar way, the sets that are not bounded below have no infima.
To be a useful surrogate, suprema and infima must exist for a large class of sets; otherwise, if their existence too were problematic, they would be of little help as surrogates.28 Fortunately, the next important result shows that suprema and infima do indeed exist for a large class of sets (with sets of the kind seen in the last example being the only troublesome ones).

Theorem 40 (Least Upper Bound Principle) Each non-empty set A ⊆ R has a supremum if it is bounded above and an infimum if it is bounded below.

An immediate consequence of this result is that bounded sets have both a supremum and an infimum.
Proof We limit ourselves to proving the supremum part, the other part being similarly proved. To say that A is bounded above means that it admits an upper bound, i.e., that A* ≠ ∅. Since a ≤ h for every a ∈ A and every h ∈ A*, by the separation property there exists a c ∈ R such that a ≤ c ≤ h for every a ∈ A and every h ∈ A*. The first inequality says that c is an upper bound of A, that is, c ∈ A*; the second one says that c is the least element of A*. Hence c = min A* = sup A.

28 The utility of a surrogate depends on how well it approximates the original, as well as on its availability.
Except for the sets that are unbounded above, all the other sets in R admit a supremum. Analogously, except for the sets that are unbounded below, all the other sets in R have an infimum. Suprema and infima are thus excellent surrogates that exist, and so help us, for a large class of subsets of R.

We close with some notation. We write sup A = +∞ (inf A = −∞) if a set A has no supremum (infimum). Moreover, by convention, we set sup ∅ = −∞ and inf ∅ = +∞. This is motivated by the fact that each real number can be viewed as both an upper and a lower bound of ∅, so sup ∅ = inf ∅* = inf R = −∞ and inf ∅ = sup ∅_* = sup R = +∞.
1.4.3 Density

The order structure is also useful to clarify the relations among the sets N, Z, Q and R. First of all, we make rigorous a natural intuition: however great a real number is, there always exists a greater natural number. This is the so-called Archimedean property of real numbers.

Proposition 41 For each real number a ∈ R, there exists a natural number n ∈ N such that n ≥ a.
Proof By contradiction, assume that there exists a ∈ R such that a ≥ n for all n ∈ N. By the Least Upper Bound Principle, sup N exists and belongs to R. Recall that, by the definition of sup,

sup N ≥ n   ∀n ∈ N     (1.16)

At the same time, again by the definition of sup, we have sup N − 1 < n for some n ∈ N (otherwise, sup N − 1 would be an upper bound of N, thus violating the fact that sup N is the least of these upper bounds). We conclude that sup N < n + 1 ∈ N, which contradicts (1.16).
There is a fundamental difference between the structures of N and Z, on the one side, and of Q and R, on the other side. If we take an integer, we can talk in a natural way of predecessor and successor: if m ∈ Z, its predecessor is the integer m − 1, while its successor is the integer m + 1 (for example, the predecessor of 317 is 316 and its successor is 318). In other words, Z has a discrete "rhythm".
In contrast, we cannot talk of predecessors and successors in Q or in R. Consider first Q. Given a rational number q = m/n, let q′ = m′/n′ be any rational such that q′ > q. Set

q″ = (1/2)q + (1/2)q′

The number q″ is rational since

q″ = (1/2)(m′/n′) + (1/2)(m/n) = (1/2) · (m′n + mn′)/(nn′)

and it satisfies q < q″ < q′. Thus, between any two distinct rational numbers there always lies a third rational number, so no rational number has a predecessor or a successor in Q. The next result shows that much more is true.
Proposition 42 Given any two real numbers a < b, there exists a rational number q ∈ Q such that a < q < b.
This property can be stated by saying that Q is dense in R.29 In the proof of this result we use the notion of integer part [a] of a real number a ∈ R, which is the greatest integer n ∈ Z such that n ≤ a. For example, [π] = 3, [5/2] = 2, [√2] = 1, [−π] = −4 and so on. The reader can verify that

[a + 1] = [a] + 1     (1.18)

since, for each n ∈ Z, we have n ≤ a if and only if n + 1 ≤ a + 1. Moreover, [a] < a when a ∉ Z.
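The integer part corresponds to Python's math.floor, which allows a quick check of the examples and of (1.18) (our illustration):

```python
import math

print(math.floor(math.pi))        # 3
print(math.floor(5 / 2))          # 2
print(math.floor(math.sqrt(2)))   # 1
print(math.floor(-math.pi))       # -4: the greatest integer <= -pi
print(math.floor(2.7 + 1) == math.floor(2.7) + 1)   # (1.18): True
```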
Proof Case 1: Let b − a = 1, i.e., b = a + 1. If a ∈ Z, the rational q = a + 1/2 satisfies a < q < b. If a ∉ Z, the integer q = [a] + 1 satisfies a < q < b, because [a] < a < [a] + 1 and, by (1.18), [a] + 1 = [a + 1] < a + 1 = b.

Case 2: Let b − a > 1, i.e., a < a + 1 < b. From Case 1 it follows that there exists q ∈ Q such that a < q < a + 1 < b.

Case 3: Let b − a < 1. By the Archimedean property of real numbers, there exists 0 ≠ n ∈ N such that

n ≥ 1/(b − a)
29 In his famous argument against plurality, Zeno of Elea remarks that a "plurality" is infinite because "... there will always be other things between the things that are, and yet others between those others." (trans. Raven; cf. Vlastos, 1996, pp. 241-248). Zeno thus identifies density as the characterizing property of an infinite collection. With a (twenty-five centuries) hindsight, we can say that he is neglecting the integers. Yet, it is stunning how he was able to identify a key property of infinite sets.
So, nb − na = n(b − a) ≥ 1. Then, by what we have just seen in Cases 1 and 2, there exists q ∈ Q such that na < q < nb. Therefore a < q/n < b, which completes the proof because q/n ∈ Q.
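Since the proof is constructive, it yields a procedure that exhibits a rational number between any two reals. The sketch below (ours, with hypothetical names) chooses n large enough via the Archimedean property and then takes q = [na] + 1; with floating-point inputs the strict inequalities hold up to rounding:

```python
import math
from fractions import Fraction

def rational_between(a: float, b: float) -> Fraction:
    """Return q/n with a < q/n < b, following the proof of Proposition 42."""
    assert a < b
    n = math.floor(1 / (b - a)) + 1   # Archimedean property: n(b - a) > 1
    q = math.floor(n * a) + 1         # n*a < q <= n*a + 1 < n*b
    return Fraction(q, n)

print(rational_between(1.41, 1.73))            # 3/2
print(rational_between(math.sqrt(2), math.pi)) # 2
```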
1.5 Powers and logarithms

1.5.1 Powers

For a real number a > 0 and a rational exponent r = m/n, the power a^r is defined as a^{m/n} = ⁿ√(a^m). Two remarks on this definition are in order.

(i) We have defined a^r only for a > 0 to avoid dangerous and embarrassing misunderstandings. Think, for example, of (−5)^{3/2}. It could be rewritten as √((−5)³) = √(−125) or as (√(−5))³, which do not exist among the real numbers. But it could also be written as (−5)^{6/4}, which, in turn, can be expressed as either ⁴√((−5)⁶) = ⁴√15,625, or (⁴√(−5))⁶. The former exists and is approximately equal to 11.180339, but the latter does not exist.
(ii) Let us consider the square root √a = a^{1/2}. From high school we know that each positive number has two algebraic roots, for example √9 = ±3. The unique positive value of the root is called, instead, arithmetical root. For example, 3 and −3 are the two algebraic roots of 9, while 3 is its unique arithmetical root. In what follows the (even order) roots will always be in the arithmetical sense (therefore, with a unique value). It is, by the way, the standard convention: for example, in the classic solution formula

x = (−b ± √(b² − 4ac)) / (2a)

of the quadratic equation ax² + bx + c = 0, the root is in the arithmetical sense (this is why we need to write ±).
We now extend the notion of power to the case a^x, with 0 < a ∈ R and x ∈ R. Unfortunately, the details of this extension are tedious, so we limit ourselves to saying that, if a > 1, the power a^x is the supremum of the set of all the values a^q when the exponent q varies among the rational numbers such that q ≤ x. Formally,

a^x = sup {a^q : q ∈ Q and q ≤ x}    (1.20)

In a similar way we define a^x for 0 < a < 1. We have the following properties that, by (1.20), follow from the analogous properties that hold when the exponent is rational.
Let x > y. Then:

a^x > a^y if a > 1
a^x < a^y if 0 < a < 1
a^x = a^y = 1 if a = 1
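To make the supremum in (1.20) concrete, one can approximate a^x from below along rational exponents q ≤ x. A minimal Python sketch (ours; truncating x to k decimal digits is just one way to produce such rationals):

    from fractions import Fraction

    a = 2.0
    x = 2.0 ** 0.5                                # approximate 2^sqrt(2)
    for k in range(1, 6):
        q = Fraction(int(x * 10 ** k), 10 ** k)   # a rational q <= x
        print(q, a ** float(q))                   # a^q increases towards a^x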
The most important base a is Napier's constant e, which will be introduced in Chapter 8. As we will see, the power e^x has truly remarkable - almost magical - properties.
Note that point (ii) of the lemma implies, inter alia, that

a^x = b^y ⟹ a = b^{y/x}    (1.21)

for all a, b > 0 and x, y ∈ R with x ≠ 0. Indeed, (b^{y/x})^x = b^{(y/x)x} = b^y. For instance, a² = b³ implies a = b^{3/2}, while a⁻³ = b⁵ implies a = b^{−5/3}.
A final remark. Though sometimes we write ⁿ√(a^m) instead of a^{m/n}, one should not forget that it is this latter notation which is best suited to carry out operations on powers, as Lemma 43 just showed. So much so that Newton, in a letter sent to Leibniz in June 1676,³⁰ wrote that "instead of √a, ³√a, ³√(a⁵), etc. I write a^{1/2}, a^{1/3}, a^{5/3}, and instead of 1/a, 1/a², 1/a³, I write a⁻¹, a⁻², a⁻³."
1.5.2 Logarithms
The operations of addition and multiplication are commutative: a + b = b + a and ab = ba.
Therefore, they have only one inverse operation, respectively subtraction and division:
The power operation a^b, with a > 0, is not commutative: a^b might well be different from b^a. Therefore, it has two distinct inverse operations.

Let a^b = c. The first inverse operation - given c and b, find a - is called the root with index b of c:

a = ᵇ√c = c^{1/b}

The second one - given c and a, find b - is called the logarithm with base a of c:

b = log_a c
³⁰ See Struik (2014) p. 286.
Note that, together with a > 0 and c > 0, one must also have a ≠ 1 because 1^b = c is impossible except when c = 1. By definition,

a^{log_a c} = c
The properties of the logarithms derive easily from the properties of the powers established
in Lemma 43.
The key property of the logarithm is to transform the product of two numbers into a sum of two other numbers, that is, property (i) above. Sums are much easier to handle than products; hence the importance of logarithms, also computationally (till the age of computers, tables of logarithms were a most important aid in performing computations). To emphasize this key property of logarithms, denote a strictly positive real number by an upper-case letter and its logarithm by the corresponding lower-case letter; e.g., C = log_a c. Then, we can summarize property (i) as:

cd ⟶ C + D

The importance of this transformation can hardly be overestimated.³²
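For instance (the numeric instance is ours), with base 10 we have log₁₀(100 · 1000) = log₁₀ 100 + log₁₀ 1000 = 2 + 3 = 5, consistently with 100 · 1000 = 10⁵.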
Thanks to this change of base formula, it is possible to always take the same number, say 10, as the base of the logarithms, because

log_a c = log₁₀ c / log₁₀ a
As for the powers a^x, also for the logarithms the most common base is Napier's constant e. In this case we simply write

log x

instead of log_e x. Because of its importance, log x is called the natural logarithm of x, which leads to the notation ln x sometimes used in place of log x.
The next result shows the close connection between logarithms and powers, which can actually be viewed as inverse notions:

log_a a^x = x    ∀x ∈ R

and

a^{log_a x} = x    ∀x > 0
We leave the simple proof to the readers. To check their understanding of the material of this section, they may also want to verify that

b^{log_a c} = c^{log_a b}
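A quick way to see this identity (the one-line argument is ours): taking log_a of both sides yields log_a(b^{log_a c}) = (log_a c)(log_a b) = (log_a b)(log_a c) = log_a(c^{log_a b}), and equality of the logarithms of two positive numbers implies equality of the numbers.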
To use positional notation, it is fundamental to adopt the 0 to signal an empty slot: for example, when writing 4057 the zero signals the absence of the hundreds, that is,

4057 = 4 · 10³ + 0 · 10² + 5 · 10 + 7

Decimals are represented in a completely analogous fashion through the powers of 1/10 = 10⁻¹: for example, 0.501625 is the abbreviation of

5 · 10⁻¹ + 0 · 10⁻² + 1 · 10⁻³ + 6 · 10⁻⁴ + 2 · 10⁻⁵ + 5 · 10⁻⁶
The choice of decimal notation is due to the mere fact that we have ten fingers, but obviously it is not the only possible one. Some Native American tribes used to count on their hands using the eight spaces between their fingers rather than the ten fingers themselves. They would have chosen only 8 digits, say

0, 1, 2, 3, 4, 5, 6, 7

and they would have articulated the integers along the powers of 8, that is 8, 64, 512, 4096, ... They would have written our decimal number 4357 as 10405, since 4357 = 1 · 8⁴ + 0 · 8³ + 4 · 8² + 0 · 8 + 5. Analogously, the decimal number 0.515625 would have been written as 0.41 since

0.515625 = 4 · 0.125 + 1 · 0.015625 = 4 · 8⁻¹ + 1 · 8⁻²
In general, given a base b and a set of digits

C_b = {c₀, c₁, ..., c_{b−1}}

used to represent the integers between 0 and b − 1, every natural number n is written in the base b as

dₖ dₖ₋₁ ⋯ d₁ d₀

where k is an appropriate natural number and

n = dₖ bᵏ + dₖ₋₁ bᵏ⁻¹ + ⋯ + d₁ b + d₀
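The digits dₖ ⋯ d₁d₀ can be computed by repeatedly dividing n by b and collecting remainders. A minimal Python sketch (ours; the function name is illustrative):

    def digits_in_base(n, b):
        # digits of n in base b, most significant first
        digits = []
        while n > 0:
            n, d = divmod(n, b)          # peel off the last base-b digit
            digits.append(d)
        return digits[::-1] or [0]

    print(digits_in_base(4357, 8))   # [1, 0, 4, 0, 5]: 4357 is 10405 in base 8
    print(digits_in_base(11, 2))     # [1, 0, 1, 1]: 11 is 1011 in binary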
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, |, •

We have used the symbols | and • for the two additional digits that we need compared to the decimal notation. The duodecimal digits correspond to the decimal values as follows:

Duod.  0  1  2  3  4  5  6  7  8  9  |   •
Dec.   0  1  2  3  4  5  6  7  8  9  10  11
One can note that the duodecimal notation 9|0•2 requires fewer digits than the decimal 188630, that is, five instead of six. On the other hand, the duodecimal notation requires 12 symbols to be used as digits, instead of 10. It is a typical trade-off one faces in choosing the base in which to represent numbers: larger bases make it possible to represent numbers with fewer digits, but require a larger set of digits. The solution to the trade-off, and the resulting choice of base, depends on the characteristics of the application of interest.
For example, in electronic engineering it is important to have a set of digits which is as simple as possible, with only two elements, as computers and electrical appliances are able to handle only two digits (open or closed circuit, positive or negative polarity). For this reason, the base 2 is incredibly common: it is the most efficient base in terms of the complexity of the digit set C₂, which only consists of the digits 0 and 1 - called bits, from binary digits.
In binary notation, the integers can be written as

Dec.  0  1  2   3   4    5    6    7    8     9     10    11    16
Bin.  0  1  10  11  100  101  110  111  1000  1001  1010  1011  10000

For example, 1011 = 1 · 2³ + 0 · 2² + 1 · 2¹ + 1 · 2⁰ = 11.
From a purely mathematical perspective, the choice of base is merely conventional, and going from one base to another is easy (although tedious).³³ Bases 2 and 10 are nowadays the most important ones, but others have been used in the past, such as 20 (the number of fingers and toes, a trace of which is still found in the French language, where "quatre-vingts" - i.e., "four-twenties" - stands for eighty and "quatre-vingt-dix" - "four-twenty-ten" - stands for ninety), as well
³³ Operations on numbers written in a non-decimal notation are not particularly difficult either. For example, 11 + 9 = 20 can be calculated in a binary way as

  1011 +
  1001 =
 10100

It is sufficient to remember that the "carrying" must be done at 2 and not at 10.
as 16 (the number of spaces between fingers and toes) and 60 (which is convenient because it is divisible by 2, 3, 4, 5, 6, 10, 12, 15, 20 and 30; a significant trace of this system remains in how we divide hours and minutes and in how we measure angles).
Positional notation has been used to perform manual calculations since the dawn of time (just think about computations carried out with the abacus), yet it is a relatively recent conquest in terms of writing, made possible by the fundamental innovation of the zero. It has been exceptionally important in the development of mathematics and its countless applications - commercial, scientific and technological. Though already used in Babylonian mathematics, the most fruitful formulation of positional notation emerged in India, around the fifth century AD. It was further developed during the early Middle Ages in the Arab world (especially thanks to the works of Al-Khwarizmi), from which the name "Arabic numerals" for the decimal digits (1.22) derives.³⁴ It arrived in the Western world thanks to Italian merchants between the eleventh and twelfth centuries. In particular, the son of one of those merchants, Leonardo da Pisa (also known as Fibonacci),³⁵ was the most important medieval mathematician: for the first time in Western Europe after so many dark centuries, he conducted original research in mathematics with the overt ambition of going beyond what the great mathematicians of the classical world had established.³⁶ Inter alia, Leonardo authored a famous treatise in 1202, the Liber abaci, which was the most important among the first works that brought positional notation to Europe.³⁷ Until then, non-positional Roman numerals were used, which made even trivial operations overly complex (try to sum up CXL and MCL, and then 140 and 1150).
Let us conclude with the incipit of the first chapter of the Liber abaci, with the extraordinary innovation that the book brought to the Western world:

9, 8, 7, 6, 5, 4, 3, 2, 1

Cum his itaque novem figuris, et cum hoc signo, quod arabice zephirum appellatur, scribitur quilibet numerus, ut inferius demonstratur. [...] ut in sequenti cum figuris numeris super notatis ostenditur.

1001  2023     3022     3020   5600     3000
MI    MMXXIII  MMMXXII  MMMXX  MMMMMDC  MMM
a + ∞ = +∞,    a − ∞ = −∞    ∀a ∈ R    (1.23)

with, in particular,

+∞ + ∞ = +∞    and    −∞ − ∞ = −∞

(v) division:

a/(+∞) = a/(−∞) = 0    ∀a ∈ R

(vi) power of a real number:

a^{+∞} = +∞  if a > 1
a^{+∞} = 0   if 0 < a < 1
a^{−∞} = 0   if a > 1
a^{−∞} = +∞  if 0 < a < 1
zephyr, any number can be written as shown below. [...] the above numbers are shown below in symbols ... And in this way you continue for the following numbers." Interestingly, Roman numerals continued to be used in bookkeeping for a long time because they are more difficult to manipulate (just add a 0 to an Arabic numeral in a balance sheet...).
While the addition of infinities with the same sign is a well-defined operation (for example, the sum of two positive infinities is again a positive infinity), the addition of infinities of different sign is not defined. For example, the result of +∞ − ∞ is not defined. This is a first example of an indeterminate operation in R̄. In general, the following operations are indeterminate:

±∞ · 0 and 0 · (±∞)    (1.25)

(iii) divisions with denominator equal to zero or with numerator and denominator that are both infinities:

a/0 and ±∞/±∞    (1.26)

with a ∈ R;

The indeterminate operations (i)-(iv) are called forms of indetermination and will play an important role in the theory of limits. Note that, by setting a = 0, formulas (1.26) include the form of indetermination

0/0
O.R. As we have observed, the most natural geometric image of R is the (real) line: to each point there corresponds a number and, vice versa, to each number there corresponds a point. Yet, we can "transport" all the numbers from the real line to the open interval (0, 1), as the following figure shows:³⁹

³⁹ We refer to the proof of Proposition 276 for the analytic expression of the bijection shown here.
[Figure: a bijection mapping the real line onto the open interval (0, 1).]
All the real numbers that found a place on the real line also find a place on the interval (0, 1) - maybe packed, but they all fit. Two points are left, the endpoints of the interval, to which it is natural to associate, respectively, +∞ and −∞. The closed interval [0, 1] is, therefore, a geometric image of the extended real line R̄. H
Little is known about the birth of the deductive method: the surviving documentation is scarce. Reason emerged in the Ionian Greek colonies - first in Miletus with Thales and Anaximander - to guide the first scientific investigations of physical phenomena. It was, however, in Magna Graecia that reason first tackled abstract matters. This miracle, of which we are all intellectual children, happened within the Eleatic philosophy that flourished at Elea in the V century B.C. and had in Parmenides and Zeno its best known exponents.⁴¹ In Parmenides' famous doctrine of the Being, a turning point in intellectual history that the reader might have encountered in some high school philosophy course, it is logic that permits the study of the Being, that is, of the world of truth (ἀλήθεια). This study is impossible for the senses, which can only guide us among the appearances that characterize the world of opinion (δόξα). In particular, only reason can master arguments by contradiction, which have no empirical substratum but are the pure result of reason. Such arguments, developed by the Eleatic school and at the center of its dialectics (culminating in the famous paradoxes of Zeno),⁴² for example enabled the Eleatic philosopher Melissus of Samos to state that the Being "always was what it was and always will be. For if it had come into being, necessarily before it came into being there was nothing. But, if there was nothing, in no way could something come into being from nothing".⁴³
True knowledge is thus theoretic. Only the eye of the mind can see the truth through a sustained, uncompromising, logical argumentation based on the law of excluded middle, the fundamental principle of the Eleatic theory of knowledge (to paraphrase Parmenides, what-is has to be, what-is-not cannot be). In contrast, an empirical analysis based on the senses necessarily stops at appearances. The anti-empirical character of the Eleatic school could have been decisive in the birth of the deductive method, at least in creating a favorable intellectual environment.⁴⁴ Naturally, it is not possible to exclude an opposite causality: the deductive method could have been developed inside mathematics and could have then influenced philosophy, in particular the Eleatics (allegedly Parmenides had Pythagorean mentors).⁴⁵ Indeed, the irrationality of √2 established by the Pythagorean school - the other great Presocratic school of Magna Graecia - is a first decisive triumph of such a method in mathematics: only the eye of the mind could see such a property, which is devoid of any "empirical" intuition. It is the eye of the mind that explains the inescapable error that every empirical measurement of the hypotenuse of a right triangle with catheti of unitary length incurs: however accurate this measurement is, it will always be a rational
which can only be seen with the eye of the mind" (trans. Jowett).
⁴¹ Elea was a town of Magna Graecia, around 140 kilometers south of Naples and 300 kilometers north of Crotone, the center of the Pythagorean school.
⁴² Cf. Vlastos (1996) p. 240.
⁴³ Barnes (1982) calls this beautiful fragment the theorem of ungenerability (trans. Allhoff et al. in "Ancient philosophy", Blackwell, 2008). In a less transparent way (but it was part of the first logical argument ever reported) Parmenides in his poem On Nature had written "And how might what is be then? And how might it have come into being? For if it came into being, it is not, nor if it is about to be at some time" (trans. Barnes). We refer to Calogero (1977) for a classic work on Eleatic philosophy, as well as to Barnes (1982) and to the more recent Warren (2014) for general introductions to the Presocratics.
⁴⁴ As advocated by Szabo (1978).
⁴⁵ For instance, arguments by contradiction could have been developed within the Pythagorean school through the odd-even dichotomy for natural numbers, which is central in the proof of the irrationality of √2. This is what Timpanaro Cardini (1964) argues, contra Szabo, in her comprehensive book. See also pp. 258-259 in Vlastos (1996). Interestingly, the archaic Greek enigmas were formulated in contradictory terms (their role in the birth of dialectics is emphasized by Colli, 1975).
approximation of the true irrational distance, √2, with a consequent approximation error (that, by the way, will probably vary from measurement to measurement).
In any case, between the VI and the V century B.C. two Presocratic schools of Magna Graecia were the cradle of an intellectual revolution. In the III century B.C. Euclid of Alexandria and another famous Magna Graecia scholar, Archimedes of Syracuse, brought this revolution to its maximum splendor in the classical world (and beyond).⁴⁶ We close with Plato's famous (probably fictional) description of two protagonists of this revolution, Parmenides and Zeno:⁴⁷

They came to Athens ... the former was, at the time of his visit, about 65 years old, very white with age, but well favoured. Zeno was nearly 40 years of age, tall and fair to look upon: in the days of his youth he was reported to have been beloved by Parmenides.
⁴⁶ Hellenistic science, with its intellectual center in Alexandria, reached an impressive level that remained unmatched till the time of Galileo and Newton (cf. Russo, 1996).
⁴⁷ In Plato's dialogue Parmenides (trans. Jowett reported in Barnes ibid.). A caveat: over the centuries - actually, over the millennia - the strict Eleatic anti-empirical stance (understandable, back then, in the excitement of a new approach) has inspired a great deal of metaphysical thinking. Yet, reason without empirical motivation and discipline becomes, at best, sterile. Already Aristotle lamented, in his treatise De generatione et corruptione, the "devotion to abstract discussions ... unobservant of the facts" and famously noted that "opinions appear to follow logically in a dialectic discussion, yet to believe them seems next door to madness when one considers the facts" (trans. Joachim).
Chapter 2

Cartesian structure (Sdoganato)
On another label one reads: 1 year of aging and 10 degrees. In this case we can write

(1, 10)

The pairs (2, 12) and (1, 10) are called ordered pairs. In them we distinguish the first element, the aging, from the second one, the alcoholic content. In an ordered pair the position is, therefore, crucial: a (2, 12) wine is very different from a (12, 2) wine (try the latter...).

Let A₁ be the set of the possible years of aging and A₂ the set of the possible alcoholic contents. We can then write

(2, 12), (1, 10) ∈ A₁ × A₂
Definition 47 Given two sets A₁ and A₂, the Cartesian product A₁ × A₂ is the set of all the ordered pairs (a₁, a₂) with a₁ ∈ A₁ and a₂ ∈ A₂.

In the example, we have A₁ ⊆ N and A₂ ⊆ N, i.e., the elements of A₁ and A₂ are natural numbers. More generally, we can assume that A₁ = A₂ = R, so that the elements of A₁ and A₂ are real numbers, although with a possibly different interpretation according to their position. In this case

A₁ × A₂ = R × R = R²
(i) {(a₁, a₂) ∈ R² : a₁ = 0}, that is, the set of the ordered pairs of the form (0, a₂); it is the vertical axis (or axis of the ordinates).

(ii) {(a₁, a₂) ∈ R² : a₂ = 0}, that is, the set of the ordered pairs of the form (a₁, 0); it is the horizontal axis (or axis of the abscissae).

(iii) {(a₁, a₂) ∈ R² : a₁ ≥ 0 and a₂ ≥ 0}, that is, the set of the ordered pairs (a₁, a₂) with both components positive; it is the first quadrant of the plane (also called positive orthant). In a similar way we can define the other quadrants:
[Figure: the four quadrants I-IV of the plane.]
(iv) {(a₁, a₂) ∈ R² : a₁² + a₂² ≤ 1} and {(a₁, a₂) ∈ R² : a₁² + a₂² < 1}, that is, the closed unit ball and the open unit ball, respectively (both centered at the origin and with radius one).¹

¹ The meaning of the adjectives "closed" and "open" will become clear in Chapter 5.

(v) {(a₁, a₂) ∈ R² : a₁² + a₂² = 1}, that is, the unit circle; it is the skin of the closed unit ball:

[Figure: the unit circle in the plane.]
Before, we classified wines according to two characteristics, aging and alcoholic content. We now consider the slightly more complicated example of a portfolio of assets. Suppose that there exist four different assets that can be purchased in a financial market. A portfolio is then described by an ordered quadruple

(a₁, a₂, a₃, a₄)

where a₁ is the amount of money invested in the first asset, a₂ is the amount of money invested in the second asset, and so on. For example,

(1000, 1500, 1200, 600)

denotes a portfolio in which 1000 euros have been invested in the first asset, 1500 in the second one, and so on. The position is crucial: the portfolio

(600, 1200, 1500, 1000)

is very different from the previous one, although the amounts of money involved are the same.

Since amounts of money are numbers that are not necessarily integers, and possibly negative (in case of sales), it is natural to assume A₁ = A₂ = A₃ = A₄ = R, where Aᵢ is the set of the possible amounts of money that can be invested in asset i = 1, 2, 3, 4. We have

(a₁, a₂, a₃, a₄) ∈ A₁ × A₂ × A₃ × A₄ = R⁴

In particular,

(1000, 1500, 1200, 600) ∈ R⁴

In general, if we consider n sets A₁, A₂, ..., Aₙ we can give the following definition.
A₁ × A₂ × ⋯ × Aₙ

denoted by ∏ᵢ₌₁ⁿ Aᵢ (or by ×ᵢ₌₁ⁿ Aᵢ), is the set of all the ordered n-tuples (a₁, a₂, ..., aₙ) with a₁ ∈ A₁, a₂ ∈ A₂, ..., aₙ ∈ Aₙ.

When A₁ = A₂ = ⋯ = Aₙ = A, we write

A × A × ⋯ × A (n times) = Aⁿ

In particular,

R × R × ⋯ × R (n times) = Rⁿ
An element

x = (x₁, x₂, ..., xₙ) ∈ Rⁿ

is called a vector.² The Cartesian product Rⁿ is called the (n-dimensional) Euclidean space. For n = 1, R is represented by the real line and, for n = 2, R² is represented by the plane. As one learns in high school, it was Descartes who in 1637 understood this - all points of the plane can be identified with pairs (a₁, a₂), as seen in a previous figure - a marvelous insight that permitted the study of geometry through algebra (this is why Cartesian products are named after him). Also the vectors (a₁, a₂, a₃) in R³ admit a graphic representation:
[Figure: graphic representation of a vector (a₁, a₂, a₃) in R³.]
² For real numbers we use the letter x instead of a.
However, this is no longer possible in Rⁿ when n ≥ 4. The graphic representation may help the intuition, but from a theoretical and computational viewpoint it has no importance: the vectors of Rⁿ, with n ≥ 4, are completely well-defined entities. They actually turn out to be fundamental in economics, as we will see in Section 2.4 and as the portfolio example already showed. Indeed, "the economic world is a world of n dimensions", as Irving Fisher remarked.³
Notation We will denote the components of a vector by the same letter used for the vector itself, along with ad hoc indices: for example, a₃ is the third component of the vector a, y₇ the seventh component of the vector y, and so on.
2.2 Operations in Rⁿ

Let us consider two vectors x = (x₁, x₂, ..., xₙ) and y = (y₁, y₂, ..., yₙ) in Rⁿ. Their sum x + y is the vector

x + y = (x₁ + y₁, x₂ + y₂, ..., xₙ + yₙ)

For example, for the two vectors x = (7, 8, 9) and y = (2, 4, 7) in R³, we have x + y = (9, 12, 16). Given a scalar α ∈ R and a vector x ∈ Rⁿ, their product αx is the vector

αx = (αx₁, αx₂, ..., αxₙ)

Also in this case we have αx ∈ Rⁿ. In other words, also through the operation of scalar multiplication we have constructed a new element of Rⁿ.⁴
We have introduced in Rⁿ two operations, addition and scalar multiplication, that extend to vectors the corresponding operations for real numbers. Let us see their properties, starting with addition.
³ See Fisher (1930) p. 237.
⁴ A real number is often called a scalar. Throughout the book we will use the terms "scalar" and "real number" interchangeably.
(i) x + y = y + x (commutativity),

(ii) (x + y) + z = x + (y + z) (associativity),

Proof We prove (i), leaving the other properties to the reader. We have

x + y = (x₁ + y₁, ..., xₙ + yₙ) = (y₁ + x₁, ..., yₙ + xₙ) = y + x

as desired. □
(iv) α(βx) = (αβ)x (associativity).

Proof We only prove (ii), the other properties are left to the reader. We have:

(α + β)x = ((α + β)x₁, (α + β)x₂, ..., (α + β)xₙ)
         = (αx₁ + βx₁, αx₂ + βx₂, ..., αxₙ + βxₙ)
         = (αx₁, αx₂, ..., αxₙ) + (βx₁, βx₂, ..., βxₙ) = αx + βx

as claimed. □
The last operation in Rⁿ that we consider is the inner product. Given two vectors x and y in Rⁿ, their inner product, denoted by x · y, is the scalar defined by

x · y = x₁y₁ + x₂y₂ + ⋯ + xₙyₙ

For instance, for x = (1, −1, 1/2, −3) and y = (−2, 3, 5, −1) in R⁴ we have

x · y = 1 · (−2) + (−1) · 3 + (1/2) · 5 + (−3) · (−1) = 1/2
The inner product is an operation that differs from addition and scalar multiplication in a structural aspect: while the latter operations determine a new vector of Rⁿ, the result of the inner product is a scalar. The next result gathers the main properties of the inner product (we leave to the reader the simple proof).

(i) x · y = y · x (commutativity),

(ii) (x + y) · z = (x · z) + (y · z) (distributivity),

(iii) (αx) · z = α(x · z) (distributivity).

Note that the two distributive properties can be summarized in the single property (αx + βy) · z = α(x · z) + β(y · z).
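Since all three operations are componentwise, they are immediate to implement. A minimal Python sketch (ours; plain lists and illustrative function names):

    def add(x, y):
        return [xi + yi for xi, yi in zip(x, y)]

    def scale(alpha, x):
        return [alpha * xi for xi in x]

    def inner(x, y):
        return sum(xi * yi for xi, yi in zip(x, y))

    x, y = [7, 8, 9], [2, 4, 7]
    print(add(x, y))      # [9, 12, 16]
    print(scale(2, x))    # [14, 16, 18]
    print(inner(x, y))    # 7*2 + 8*4 + 9*7 = 109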
The study of the basic properties of the inequality ≥ on Rⁿ reveals a first important novelty: when n ≥ 2, the order ≥ does not satisfy completeness. Indeed, consider for example x = (0, 1) and y = (1, 0) in R²: neither x ≥ y nor y ≥ x. We say, therefore, that ≥ on Rⁿ is a partial order (which becomes a complete order when n = 1).
It is easy to find vectors in Rⁿ that are not comparable. In the following figure the darker area represents the points of R² that are smaller than x = (1, 2), the lighter area those that are greater than x, and the two white areas represent the points that are not comparable with x.
[Figure: points of R² comparable and not comparable with x = (1, 2).]
Apart from completeness, it is easy to verify that ≥ on Rⁿ continues to enjoy the properties seen for n = 1:

(i) reflexivity: x ≥ x,

(iv) multiplicative independence:⁶

x ≥ y ⟹ αx ≥ αy if α > 0 and x · z ≥ y · z if z > 0
x ≥ y ⟹ αx ≤ αy if α < 0 and x · z ≤ y · z if z < 0

(v) separation: given two sets A and B in Rⁿ, if a ≥ b for every a ∈ A and b ∈ B, then there exists c ∈ Rⁿ such that a ≥ c ≥ b for every a ∈ A and b ∈ B.
Another notion that becomes surprisingly delicate when n ≥ 2 is that of strict inequality. Indeed, given two vectors x = (x₁, x₂, ..., xₙ) and y = (y₁, y₂, ..., yₙ) of Rⁿ, two cases can happen.

1. All the components of x are ≥ the corresponding components of y, with some of them strictly greater; i.e., xᵢ ≥ yᵢ for each index i = 1, 2, ..., n, with xᵢ > yᵢ for at least one index i.

2. All the components of x are > the corresponding components of y; i.e., xᵢ > yᵢ for each i = 1, 2, ..., n.
⁶ In Rⁿ multiplicative independence holds with respect to both scalar and inner products (the asymmetric position of α and z is standard).
In the first case we have a strict inequality, in symbols x > y; in the second case a strong inequality, in symbols x ≫ y. Clearly,

x ≫ y ⟹ x > y ⟹ x ≥ y

The three notions of inequality among vectors in Rⁿ are, therefore, more and more stringent. Indeed, we have:

(i) a weak notion, ≥, that permits the equality between the two vectors;

(ii) an intermediate notion, >, that requires at least one strict inequality among the components;

(iii) a strong notion, ≫, that requires strict inequality among all the components of the two vectors.

When n = 1, both > and ≫ reduce to the standard > on R. The "reversed" symbols ≤, <, and ≪ are used for the converse inequalities.
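The three inequalities are easy to check componentwise. A minimal Python sketch (ours; illustrative function names):

    def weak(x, y):       # x >= y
        return all(xi >= yi for xi, yi in zip(x, y))

    def strict(x, y):     # x > y
        return weak(x, y) and any(xi > yi for xi, yi in zip(x, y))

    def strong(x, y):     # x >> y
        return all(xi > yi for xi, yi in zip(x, y))

    print(weak((0, 1), (1, 0)), weak((1, 0), (0, 1)))      # False False: not comparable
    print(strict((1, 2), (1, 1)), strong((1, 2), (1, 1)))  # True False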
An important comparison is that between a vector x and the zero vector 0. We say that the vector x is:

(i) positive if x ≥ 0, i.e., if all the components of x are positive;

(ii) strictly positive if x > 0, i.e., if all the components of x are positive and at least one of them is strictly positive;

(iii) strongly positive if x ≫ 0, i.e., if all the components of x are strictly positive.
N.B. The notation and terminology that we introduced are not the only possible ones. For example, some authors use ≧, ≥, and > in place of ≥, >, and ≫; other authors call "non-negative" the vectors that we call positive, and so on. O
Together with the lack of completeness of ≥, the presence of the two different notions of strict inequality is the main novelty, relative to what happens on the real line, that we have in Rⁿ when n ≥ 2.
[a, b] = {x ∈ Rⁿ : a ≤ x ≤ b} = {x ∈ Rⁿ : aᵢ ≤ xᵢ ≤ bᵢ for each i}

(i) The intervals {x ∈ Rⁿ : x ≥ 0} and {x ∈ Rⁿ : x ≫ 0} are often denoted by Rⁿ₊ and Rⁿ₊₊, respectively. The intervals Rⁿ₋ = {x ∈ Rⁿ : x ≤ 0} and Rⁿ₋₋ = {x ∈ Rⁿ : x ≪ 0} are similarly defined. (ii) The intervals in Rⁿ can be expressed as Cartesian products of intervals in R; for example, [a, b] = ∏ᵢ₌₁ⁿ [aᵢ, bᵢ]. (iii) In the intervals just introduced we used the inequalities ≤ and ≥. By replacing them with the inequality <, we obtain other possible intervals that, however, are not that relevant for our purposes. O
2.4 Applications

Static choices Consider a consumer who has to choose how many kilograms of apples and of potatoes to buy at the market. For convenience, we assume that these goods are infinitely divisible, so that the consumer can buy any positive real quantity - for example, 3 kg of apples and π kg of potatoes. In this case, R₊ is the set of the possible quantities of apples or potatoes that can be bought. Therefore, the collection of all bundles of apples and potatoes that the consumer can buy is

R²₊ = R₊ × R₊ = {(x₁, x₂) : x₁, x₂ ≥ 0}
Graphically, it is the first quadrant of the plane. In general, if a consumer chooses n goods, the set of the bundles is represented by the Cartesian product

Rⁿ₊ = R₊ × R₊ × ⋯ × R₊

In production theory, a vector in Rⁿ₊ may represent a possible configuration of n inputs for a producer. In this case the vector x = (x₁, x₂, ..., xₙ) indicates that the producer has at his disposal x₁ units of the first input, x₂ units of the second input, ..., and xₙ units of the last input.
Intertemporal choices A vector x = (x₁, x₂, ..., xₙ) ∈ Rⁿ₊ may also represent the quantities of a good, say apples, over n periods of time: x₁ is the quantity of apples in period 1, x₂ is the quantity of apples in period 2, and so on, until xₙ, which is the quantity of apples in the n-th period. In this case, Rⁿ₊ denotes the space of all streams of quantities of a given good, say apples, over n periods. The more evocative notation R^T is often used, where T is the number of periods and xₜ is the quantity of apples in period t, with t = 1, 2, ..., T.⁷ In a similar spirit, the components of a stream x ∈ R^T may represent amounts of money in different periods: in this case the stream x is called a cash flow. For example, the checking account of a family records on each day the balance between revenues (wages, incomes, etc.) and expenditures (purchases, rents, etc.). Setting T = 365, the resulting cash flow is

x = (x₁, x₂, ..., x₃₆₅)

So, x₁ is the balance of the checking account on January 1st, x₂ is the balance on January 2nd, and so on until x₃₆₅, which is the balance at the end of the year.
Instead of a stream of quantities of a single good, we can consider a stream of bundles of several goods. Similarly, in an intertemporal problem of production, we will have streams of input vectors. Such situations are modeled by means of matrices, a simple notion that will be studied in Chapter 15. Many economic applications focus, however, on the single-good case, so R^T is an important space in economics.
Indeed, since ≥ is complete on the real line, requiring that all the points of A be ≤ x̂ amounts to requiring that none of them be > x̂. A similar reformulation can be given for minima.
Interestingly, this equivalent characterization of the concept of maximum in R becomes more general in Rⁿ. Indeed, ≥ is no longer complete when n ≥ 2 and so the "if" is easily seen to fail in Lemma 54. This motivates the next definition, of great importance in economic applications.
In a similar way we can define minimals. Both maximals and minimals are called Pareto optima (like angels, optima have no gender).

To understand the nature of maximals,⁸ say that a point x ∈ A is dominated by another point y ∈ A if x < y, that is, if xᵢ ≤ yᵢ for each index i, with xᵢ < yᵢ for at least one index i. A dominated point is thus outperformed by another point available in the set. For instance, if they represent bundles of goods, a dominated bundle x is obviously no better an alternative than the dominant one y.⁹ In terms of dominance, we can say that a point a of A is maximal when it is not dominated by any other point in A, so when it is not outperformed by any other alternative available in A. Maximality is thus the natural extension of the notion of maximum when dealing - as is often the case in applications - with alternatives that are multi-dimensional (and so represented by vectors of Rⁿ).
The set in the next figure has a maximum, i.e., point a. Thanks to this lemma, a is therefore also the unique maximal.
⁸ Here "maximal" is an adjective used as a noun, as was the case for "maximum" in Definitions 33 and 53. If used as adjectives, we would have a "maximal element" and a "maximum element". Be that as it may, in the rest of the chapter we focus on maxima and maximals, the most relevant in economic applications, leaving to the reader the dual properties that hold for minima and minimals.
⁹ Here we are tacitly assuming the monotonicity of preferences over bundles that will be discussed in Section 6.8.
Thus:

maximum ⟹ maximal

But the converse is false: there exist maximals that are not maxima, that is,

maximal ⇏ maximum
Example 57 In the set A = {(1, 2), (2, 1), (0, 0)} of the plane, the vectors (2, 1) and (1, 2) are maximals that are not maxima. N

A vector can be both a maximal and a minimal, as the next example shows. So, the two properties are not mutually exclusive.¹⁰

Example 58 In the binary set A = {(1, 2), (2, 1)} of the plane, the vectors (2, 1) and (1, 2) are both maximals and minimals. N
The set A of the next example illustrates another fundamental difference between maxima and maximals in Rⁿ with n > 1: the maximum of a set is, if it exists, unique, while a maximal might well not be unique.

Example 59 The next figure shows a set A of R² that has no maxima, but infinitely many maximals.

¹⁰ A vector x can be both a maximum and a minimum of a set A if and only if A = {x}. So, a vector can be simultaneously a maximum and a minimum only in the very special case of singleton sets.
[Figure: a set A of R² whose maximals form the dark edge of its boundary.]
It is easy to see that any point a ∈ A on the dark edge is maximal: there is no x ∈ A such that x > a. On the other hand, a is not a maximum: we have a ≥ x only for the points x ∈ A that are comparable with a, which are represented in the shaded part of A. Nothing can be said, instead, for the points that are not comparable with a (the non-shaded part of A). The lack of maxima for this set is thus due to the fact that the order ≥ is incomplete in Rⁿ when n > 1. N
Summing up, because of the incompleteness of the order ≥ on Rⁿ, maxima are much less important than maximals in Rⁿ. That said, maximals might also fail to exist: the 45° straight line is a subset of R² without maximals (and minimals).¹¹

Definition 60 The set of the maximals of a set A ⊆ Rⁿ is called the Pareto (or efficient) frontier of A.
In the last example, the dark edge is the Pareto frontier of the set A:

[Figure: the Pareto frontier (dark edge) of the set A.]
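For a finite set, the Pareto frontier can be computed directly from the definition of dominance. A minimal Python sketch (ours; illustrative function names):

    def maximals(A):
        # Pareto frontier: the points of A not dominated by any other point
        def dominates(y, x):
            return all(yi >= xi for yi, xi in zip(y, x)) and y != x
        return [x for x in A if not any(dominates(y, x) for y in A)]

    A = [(1, 2), (2, 1), (0, 0)]
    print(maximals(A))    # [(1, 2), (2, 1)]: the maximals of Example 57

On the set of Example 57 it returns the two maximals and discards the dominated point (0, 0).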
(see Example 187). The indifference curves can be "packed" in the following way:

[Figure: the Edgeworth box.]

This is the classic Edgeworth box. By condition (2.1), we can think of a point (x₁, x₂) ∈ [0, 1] × [0, 1] as the allocation of Albert. We can actually identify each possible division between the two agents with the allocation (x₁, x₂) of Albert. Indeed, the allocations of Barbara (1 − x₁, 1 − x₂) are uniquely determined once those of Albert are known.

Each allocation (x₁, x₂) has utility u_a(x₁, x₂) for Albert and u_b(1 − x₁, 1 − x₂) for Barbara. Let

A = {(u_a(x₁, x₂), u_b(1 − x₁, 1 − x₂)) : (x₁, x₂) ∈ [0, 1] × [0, 1]}

be the set of all the utility profiles of the two agents determined by the divisions of the two goods. We are interested in the allocations whose utility profiles belong to the Pareto frontier of A, that is, which are Pareto optima of the set A. Indeed, these are the allocations that cannot be improved upon with unanimous consensus.

By looking at the Edgeworth box, it is easy to see that the Pareto frontier P of A is given by the values of the allocations on the diagonal of the box - that is, by the locus of the tangency points of the indifference curves (the so-called contract curve). To prove this rigorously, we need the next simple result.
Since the last inequality is always true, we conclude that (2.2) holds. Moreover, these equivalences imply that

1 − √(x₁x₂) = √((1 − x₁)(1 − x₂)) ⟺ (x₁ − x₂)² = 0

Having established this lemma, we can now prove rigorously what the last picture suggested.
Proof Let D = {(d, d) ∈ R²₊ : d ∈ [0, 1]} be the diagonal of the box. We start by showing that, for any division of goods (x₁, x₂) ∉ D - i.e., with x₁ ≠ x₂ - there exists (d, d) ∈ D such that

(u_a(d, d), u_b(1 − d, 1 − d)) > (u_a(x₁, x₂), u_b(1 − x₁, 1 − x₂))    (2.3)

For Albert, we have

u_a(√(x₁x₂), √(x₁x₂)) = √(x₁x₂) = u_a(x₁, x₂)

Therefore, (√(x₁x₂), √(x₁x₂)) is for him indifferent to (x₁, x₂). By Lemma 61, for Barbara we have

u_b(1 − √(x₁x₂), 1 − √(x₁x₂)) = 1 − √(x₁x₂) > √((1 − x₁)(1 − x₂)) = u_b(1 − x₁, 1 − x₂)

where the inequality is strict since x₁ ≠ x₂. Therefore, setting d = √(x₁x₂), (2.3) holds.

It follows that the off-diagonal divisions (x₁, x₂) have utility profiles that are not Pareto optima. It remains to show that the divisions on the diagonal are so. Let (d, d) ∈ D and suppose, by contradiction, that there exists (x₁, x₂) ∈ [0, 1] × [0, 1] such that

(u_a(x₁, x₂), u_b(1 − x₁, 1 − x₂)) > (u_a(d, d), u_b(1 − d, 1 − d))    (2.4)

Suppose that¹⁴

u_a(x₁, x₂) > u_a(d, d) and u_b(1 − x₁, 1 − x₂) ≥ u_b(1 − d, 1 − d)

that is,

√(x₁x₂) > √(dd) = d and √((1 − x₁)(1 − x₂)) ≥ √((1 − d)(1 − d)) = 1 − d

Therefore,

1 − √(x₁x₂) < 1 − d ≤ √((1 − x₁)(1 − x₂))

which contradicts (2.2). It follows that there is no (x₁, x₂) ∈ [0, 1] × [0, 1] for which (2.4) holds. This completes the proof. □
In sum, if agents maximize their Cobb-Douglas utilities, the bargaining will result in a division of the goods on the diagonal of the Edgeworth box, i.e., such that each agent has an equal quantity of both goods. Proposition 62 does not say anything about which of the points of the diagonal is, then, actually determined by the bargaining, that is, about how the ensuing conflict of interests among agents is solved. Nevertheless, through the notion of Pareto optimum we have been able to say something highly non-trivial about the problem of division.
Chapter 3

Linear structure (Sdoganato)
In this chapter we study more in depth the linear structure of Rⁿ that was introduced in Section 2.2. The study of this fundamental structure of Rⁿ, which we will continue with the analysis of linear functions in Chapter 15, is part of linear algebra, an all-important topic that is also at the heart of innumerable applications (to illustrate, a classic application of linear algebra to finance will be seen in Section 24.6).
(i) x + y ∈ V if x, y ∈ V;

(ii) αx ∈ V if x ∈ V and α ∈ R.

It is easy to check that the two operations satisfy in V properties (v1)-(v8). Note that, by (ii), the origin belongs to each vector subspace V - i.e., 0 ∈ V. Indeed, 0x = 0 for every vector x ∈ V.

The following characterization is useful when one needs to check whether a subset of Rⁿ is a vector subspace.
Proposition 64 A non-empty subset V of Rⁿ is a vector subspace if and only if, for every x, y ∈ V and every α, β ∈ R,

αx + βy ∈ V    (3.1)

Proof "Only if". Let V be a vector subspace and let x, y ∈ V. As V is closed with respect to scalar multiplication, we have αx ∈ V and βy ∈ V. It follows that αx + βy ∈ V since V is closed with respect to addition.

"If". Putting α = β = 1 in (3.1), we get x + y ∈ V, while putting β = 0 we get αx ∈ V. Therefore, V is closed with respect to the operations of addition and scalar multiplication inherited from Rⁿ. □

Putting α = β = 0, (3.1) implies that 0 ∈ V. This confirms that each vector subspace contains the origin 0.
Example 65 There are two legitimate, yet trivial, subspaces of Rⁿ: the singleton {0} and the space Rⁿ itself. In particular, the reader can check that a singleton {x} is a vector subspace of Rⁿ if and only if x = 0. N
Example 66 The set M = {x ∈ Rⁿ : x₁ = ⋯ = xₘ = 0}, with 1 ≤ m ≤ n, is a vector subspace of Rⁿ. Indeed, given x, y ∈ M and α, β ∈ R,

αx + βy = (αx₁ + βy₁, ..., αxₙ + βyₙ) = (0, ..., 0, αxₘ₊₁ + βyₘ₊₁, ..., αxₙ + βyₙ) ∈ M

N
Example 67 Consider the set M of the vectors x ∈ R⁴ that solve the linear system

2x₁ − x₂ + 2x₃ + 2x₄ = 0
x₁ − x₂ − 2x₃ − 4x₄ = 0
x₁ − 2x₂ − 2x₃ − 10x₄ = 0

In other words, M is the set of the solutions of this system of equations. It is a vector subspace: the reader can check that, given x, y ∈ M and α, β ∈ R, we have αx + βy ∈ M. Performing the computations,³ we find that the vectors

(−(10/3)t, −6t, −(2/3)t, t)    (3.2)

solve the system for each t ∈ R, so that

M = {(−(10/3)t, −6t, −(2/3)t, t) : t ∈ R}

N
Differently from the intersection, the union of vector subspaces is not in general a vector subspace, as the next simple example shows.⁴ Consider the union of the two axes of the plane,

V₁ ∪ V₂ = {x ∈ R² : x₁ = 0 or x₂ = 0}

It is not a vector subspace: the versors e¹ = (1, 0) and e² = (0, 1) belong to it, yet their sum e¹ + e² = (1, 1) does not.
A final remark. In the last proposition we intuitively used the intersection of an arbitrary family of sets. Indeed, unions and intersections are easily defined for any family whatsoever, finite or not, of sets: if {Aᵢ}ᵢ∈I is any such family, with a generic (finite or infinite) index set I, their union

⋃ᵢ∈I Aᵢ

is the set of the elements that belong to at least one of the Aᵢ, while their intersection

⋂ᵢ∈I Aᵢ

is the set of the elements that belong to all of them.
in which superscripts identify different vectors and subscripts their components. We immediately use this notation in the next important definition, which says that a set {x¹, ..., xᵐ} of vectors of Rⁿ is linearly independent if

α₁x¹ + α₂x² + ⋯ + αₘxᵐ = 0 ⟹ α₁ = α₂ = ⋯ = αₘ = 0

A set {x¹, ..., xᵐ} is, instead, said to be linearly dependent if it is not linearly independent, i.e.,⁵ if there exists a set {α₁, ..., αₘ} of scalars, not all equal to zero, such that

α₁x¹ + α₂x² + ⋯ + αₘxᵐ = 0
e¹ = (1, 0, 0, ..., 0), e² = (0, 1, 0, ..., 0), ..., eⁿ = (0, 0, ..., 0, 1)

called standard unit vectors or versors of Rⁿ. The set {e¹, ..., eⁿ} is linearly independent. Indeed,

α₁e¹ + ⋯ + αₙeⁿ = (α₁, ..., αₙ)

and so α₁e¹ + ⋯ + αₙeⁿ = 0 implies α₁ = ⋯ = αₙ = 0. N

⁵ See Section D.7.3 of the Appendix for a careful logical analysis of this important negation.
Example 72 All sets {x¹, ..., xᵐ} of vectors of Rⁿ that include the zero vector 0 are linearly dependent. Indeed, without loss of generality, set x¹ = 0. Given a set {α₁, ..., αₘ} of scalars with α₁ ≠ 0 and αᵢ = 0 for i = 2, ..., m, we have

α₁x¹ + α₂x² + ⋯ + αₘxᵐ = 0

which proves the linear dependence of the set {xⁱ}ᵢ₌₁ᵐ. N
Therefore, α₁x¹ + α₂x² + α₃x³ = 0 means

α₁ + 3α₂ + 9α₃ = 0
α₁ + α₂ + α₃ = 0
α₁ + 5α₂ + 25α₃ = 0

which is a system of equations whose unique solution is (α₁, α₂, α₃) = (0, 0, 0). More generally, to check whether k vectors

x¹ = (x¹₁, ..., x¹ₙ), x² = (x²₁, ..., x²ₙ), ..., xᵏ = (xᵏ₁, ..., xᵏₙ)

are linearly independent, one solves the system α₁x¹ + α₂x² + ⋯ + αₖxᵏ = 0 in the unknowns α₁, ..., αₖ. If (α₁, ..., αₖ) = (0, ..., 0) is the unique solution, then the vectors are linearly independent in Rⁿ. For example, consider in R³ the two vectors x¹ = (1, 3, 4) and x² = (2, 5, 1). The system to solve is

α₁ + 2α₂ = 0
3α₁ + 5α₂ = 0
4α₁ + α₂ = 0

It has the unique solution (α₁, α₂) = (0, 0), so the two vectors x¹ and x² are linearly independent. N
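Numerically, k vectors of Rⁿ are linearly independent exactly when the matrix having them as columns has rank k. A minimal Python sketch (ours; it assumes numpy is available):

    import numpy as np

    def independent(vectors):
        # vectors are linearly independent iff the matrix having them
        # as columns has full column rank
        M = np.column_stack(vectors)
        return np.linalg.matrix_rank(M) == len(vectors)

    print(independent([(1, 3, 4), (2, 5, 1)]))             # True
    print(independent([(1, 3, 4), (2, 5, 1), (0, 1, 7)]))  # False: x3 = 2x1 - x2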
(−(10/3)t, −6t, −(2/3)t, t)    (3.3)

for each t ∈ R. Therefore, (0, 0, 0, 0) is not the unique solution of the system, and so the vectors x¹, x², x³, and x⁴ are linearly dependent. Indeed, by setting for example t = 1 in (3.3), the set of four numbers

(α₁, α₂, α₃, α₄) = (−10/3, −6, −2/3, 1)

is a solution with not all coefficients equal to zero. N
Example 75 Two vectors x¹ and x² that are linearly dependent are called collinear. This happens when either one of the two vectors is 0 or there exists α ≠ 0 such that x¹ = αx². In other words, when there exist two scalars α₁ and α₂, not both zero, such that α₁x¹ = α₂x². Geometrically, two vectors in the plane are collinear when they belong to the same straight line passing through the origin. N
The simple proof is left to the reader, who can also check that if we add vectors to a linearly dependent set, the set remains linearly dependent.

The scalars αᵢ are called the coefficients of the linear combination.
Theorem 79 A finite set of vectors S of Rⁿ, with at least two elements, is linearly dependent if and only if at least one element of S is a linear combination of other elements of S.⁶
Proof "Only if". Let S = {xⁱ}ᵢ₌₁ᵐ be a linearly dependent set of Rⁿ, with m ≥ 2. Then, there exists a set {αᵢ}ᵢ₌₁ᵐ of scalars, not all zero, such that

α₁x¹ + α₂x² + ⋯ + αₘxᵐ = 0

Without loss of generality, suppose α₁ ≠ 0. Then

x¹ = −(α₂/α₁)x² − ⋯ − (αₘ/α₁)xᵐ

and so x¹ is a linear combination of the vectors x², ..., xᵐ. That is, the vector x¹ of S is a linear combination of other elements of S.

"If". Suppose that the vector xᵏ of a finite set S = {xⁱ}ᵢ₌₁ᵐ is a linear combination of other elements of S. Without loss of generality, assume k = 1. Then, there exists a set {αᵢ}ᵢ₌₂ᵐ of scalars such that x¹ = α₂x² + ⋯ + αₘxᵐ. Define the scalars {βᵢ}ᵢ₌₁ᵐ as follows:

βᵢ = −1 if i = 1;    βᵢ = αᵢ if i ≥ 2

By construction, {βᵢ}ᵢ₌₁ᵐ is a set of scalars, not all zero, such that Σᵢ₌₁ᵐ βᵢxⁱ = 0. Indeed,

Σᵢ₌₁ᵐ βᵢxⁱ = −x¹ + α₂x² + α₃x³ + ⋯ + αₘxᵐ = −x¹ + x¹ = 0

It follows that {xⁱ}ᵢ₌₁ᵐ is a linearly dependent set. □
Example 80 Consider the vectors x¹ = (1, 3, 4), x² = (2, 5, 1), and x³ = (0, 1, 7) in R³. Since x³ = 2x¹ − x², the third vector is a linear combination of the other two. By Theorem 79, the set {x¹, x², x³} is linearly dependent. It is immediate to check that each of the vectors in the set {x¹, x², x³} is also a linear combination of the other two, something that, as the next example shows, does not hold in general for sets of linearly dependent vectors. N
Corollary 82 A finite set S ≠ {0} of Rⁿ is linearly independent if and only if none of the vectors in S is a linear combination of other vectors in S.
The next result shows that span S has a "concrete" representation in terms of linear combinations of S.

Proof We need to prove that x ∈ Rⁿ belongs to span S if and only if there exist a finite set {xⁱ}ᵢ∈I of vectors in S and a finite set {αᵢ}ᵢ∈I of scalars such that x = Σᵢ∈I αᵢxⁱ.

"If". Let x ∈ Rⁿ be a linear combination of a finite set {xⁱ}ᵢ∈I of vectors of S. For simplicity, set {xⁱ}ᵢ∈I = {x¹, ..., xᵏ}. There exists, therefore, a set {αᵢ}ᵢ₌₁ᵏ of real numbers such that x = Σᵢ₌₁ᵏ αᵢxⁱ. By the definition of vector subspace, we have α₁x¹ + α₂x² ∈ span S since x¹, x² ∈ span S. In turn, α₁x¹ + α₂x² ∈ span S implies α₁x¹ + α₂x² + α₃x³ ∈ span S, and by proceeding in this way we get that x = Σᵢ₌₁ᵏ αᵢxⁱ ∈ span S, as claimed.

"Only if". Let V be the set of all vectors x ∈ Rⁿ that can be expressed as linear combinations of vectors of S, that is, x ∈ V if there exist finite sets {xⁱ}ᵢ∈I ⊆ S and {αᵢ}ᵢ∈I ⊆ R such that x = Σᵢ∈I αᵢxⁱ. It is easy to see that V is a vector subspace of Rⁿ containing S. It follows that span S ⊆ V and so each x ∈ span S is a linear combination of vectors of S. □
Before illustrating the theorem with some examples, we state a simple consequence.

In words, the vector subspace generated by a set does not change if we add to the set a vector that is already a linear combination of its elements. The "generative" capability of a set is not improved by adding to it vectors that are linear combinations of its elements.
[Figure: the straight line through the origin generated by the vectors of the example.] N
3.5 Bases

By Theorem 83, the subspace generated by a subset S of Rⁿ is formed by all the linear combinations of the vectors in S. Suppose that S is a linearly dependent set. By Theorem 79, some vectors in S are then linear combinations of other elements of S. By Corollary 84, such vectors are, therefore, redundant for the generation of span S. Indeed, if a vector x ∈ span S is a linear combination of vectors of S, then by Corollary 84 we have span (S ∪ {x}) = span S.

(ii) all the vectors of S are essential for this representation, none of them is redundant.
x = Σᵢ₌₁ᵐ αᵢxⁱ = Σᵢ₌₁ᵐ βᵢxⁱ

Hence,

Σᵢ₌₁ᵐ (αᵢ − βᵢ)xⁱ = 0

and, since the vectors in S are linearly independent, it follows that αᵢ − βᵢ = 0 for every i = 1, ..., m. That is, αᵢ = βᵢ for every i = 1, ..., m.
\If". Let S = x1 ; :::; xm and suppose that each x 2 Rn can be written in a unique way
as a linear combination of vectors in S. Clearly, by Theorem 83 we have Rn = span S. It
3.5. BASES 71
remains to prove that S is a linearly independent set. Suppose that the scalars f i gm
i=1 are
such that
Xm
i
ix = 0
i=1
Since we also have
m
X
0xi = 0
i=1
we conclude that i = 0 for every i = 1; :::; m because, by hypothesis, the vector 0 can be
written in only one way as a linear combination of vectors in S.
Example 90 The standard basis of Rⁿ is given by the versors {e¹, ..., eⁿ}. Each vector x ∈ Rⁿ can be written, in a unique way, as a linear combination of these vectors. In particular,

x = x₁e¹ + ⋯ + xₙeⁿ = Σᵢ₌₁ⁿ xᵢeⁱ    (3.4)

That is, the coefficients of the linear combination are the components of the vector x. N
Example 91 The standard basis of R² is {(1, 0), (0, 1)}. But there exist infinitely many other bases of R²: for example, S = {(1, 2), (0, 7)} is another basis of R². It is easy to prove the linear independence of S. To show that span S = R², consider any vector x = (x₁, x₂) ∈ R². We need to show that there exist α₁, α₂ ∈ R such that α₁(1, 2) + α₂(0, 7) = (x₁, x₂), that is,

α₁ = x₁
2α₁ + 7α₂ = x₂

Since

α₁ = x₁,    α₂ = (x₂ − 2x₁)/7

solve the system, we conclude that S is indeed a basis of R². N
Theorem 92 For each linearly independent set {x¹, ..., xᵏ} of Rⁿ with k ≤ n, there exist n − k vectors xᵏ⁺¹, ..., xⁿ such that the overall set {xⁱ}ᵢ₌₁ⁿ is a basis of Rⁿ.

Because of its importance, we give two different proofs of the result. They both require the following lemma.
Lemma 93 Let {b¹, ..., bⁿ} be a basis of Rⁿ and let

x = c₁b¹ + ⋯ + cₙbⁿ

If c₁ ≠ 0, then {x, b², ..., bⁿ} is also a basis of Rⁿ.

Proof Since c₁ ≠ 0, we can write b¹ = (1/c₁)x − (c₂/c₁)b² − ⋯ − (cₙ/c₁)bⁿ, so that b¹ is a linear combination of the vectors x, b², ..., bⁿ. It follows that

span {x, b², ..., bⁿ} = span {b¹, b², ..., bⁿ} = Rⁿ

It remains to show that the set {x, b², ..., bⁿ} is linearly independent, so that we can conclude that it is a basis of Rⁿ. Let {αᵢ}ᵢ₌₁ⁿ ⊆ R be coefficients for which

α₁x + Σᵢ₌₂ⁿ αᵢbⁱ = 0    (3.5)

If α₁ ≠ 0, we have

x = −Σᵢ₌₂ⁿ (αᵢ/α₁)bⁱ = 0b¹ − Σᵢ₌₂ⁿ (αᵢ/α₁)bⁱ

Since x can be written in a unique way as a linear combination of the vectors of the basis {bⁱ}ᵢ₌₁ⁿ, one gets that c₁ = 0, which contradicts the hypothesis c₁ ≠ 0. This means that α₁ = 0 and (3.5) simplifies to

0b¹ + Σᵢ₌₂ⁿ αᵢbⁱ = 0

The linear independence of {b¹, ..., bⁿ} then yields α₂ = ⋯ = αₙ = 0, as desired. □
Proof 1 of Theorem 92 We proceed by (finite) induction.⁷ Initial step: the theorem holds for k = 1. Indeed, consider a singleton {x},⁸ with x ≠ 0, and the standard basis {e¹, ..., eⁿ} of Rⁿ. As x = Σᵢ₌₁ⁿ xᵢeⁱ, there exists at least one index i such that xᵢ ≠ 0. By Lemma 93, {e¹, ..., eⁱ⁻¹, x, eⁱ⁺¹, ..., eⁿ} is a basis of Rⁿ.

Induction step: suppose now that the theorem is true for each set of k − 1 vectors; we want to show that it is true for each set of k vectors. Let therefore {x¹, ..., xᵏ} be a set of k linearly independent vectors. The subset {x¹, ..., xᵏ⁻¹} is linearly independent and has k − 1 elements. By the induction hypothesis, there exist n − (k − 1) vectors ỹᵏ, ..., ỹⁿ such that

⁷ See Appendix E for the induction principle.
⁸ A singleton {x} is linearly independent when αx = 0 implies α = 0, which amounts to requiring that x ≠ 0.
{x¹, ..., xᵏ⁻¹, ỹᵏ, ..., ỹⁿ} is a basis of Rⁿ. Therefore, there exist coefficients {αᵢ}ᵢ₌₁ⁿ ⊆ R such that

xᵏ = Σᵢ₌₁ᵏ⁻¹ αᵢxⁱ + Σᵢ₌ₖⁿ αᵢỹⁱ    (3.6)

Since {x¹, ..., xᵏ} is linearly independent, xᵏ is not a linear combination of x¹, ..., xᵏ⁻¹ alone, so in (3.6) we have αᵢ ≠ 0 for at least one index i ≥ k; relabeling if needed, say αₖ ≠ 0. By Lemma 93, replacing ỹᵏ with xᵏ yields that {x¹, ..., xᵏ⁻¹, xᵏ, ỹᵏ⁺¹, ..., ỹⁿ} is a basis of Rⁿ, which completes the induction step. □
Proof 2 of Theorem 92 The theorem holds for k = 1 (see the previous proof). So, let 1 < k ≤ n be the smallest integer for which the property is false. There exists a linearly independent set {x¹, ..., xᵏ} such that there are no n − k vectors of Rⁿ that, added to {x¹, ..., xᵏ}, yield a basis of Rⁿ. Given that {x¹, ..., xᵏ⁻¹} is, in turn, linearly independent, the minimality of k implies that there are x̃ᵏ, ..., x̃ⁿ such that {x¹, ..., xᵏ⁻¹, x̃ᵏ, ..., x̃ⁿ} is a basis of Rⁿ. But then

xᵏ = c₁x¹ + ⋯ + cₖ₋₁xᵏ⁻¹ + cₖx̃ᵏ + ⋯ + cₙx̃ⁿ

with cᵢ ≠ 0 for at least one index i ≥ k (otherwise {x¹, ..., xᵏ} would be linearly dependent). By Lemma 93, exchanging such an x̃ⁱ with xᵏ yields that the resulting set, which contains {x¹, ..., xᵏ}, is a basis of Rⁿ, a contradiction. □
Corollary 94 (i) Every set of n linearly independent vectors of Rⁿ is a basis of Rⁿ. (ii) Every linearly independent set of Rⁿ has at most n elements.

Proof (i) It is enough to set k = n in Theorem 92. (ii) Let S = {x¹, ..., xᵏ} be a linearly independent set in Rⁿ. We want to show that k ≤ n. By contradiction, suppose k > n. Then, {x¹, ..., xⁿ} is in turn a linearly independent set and by point (i) it is a basis of Rⁿ. Hence, the vectors xⁿ⁺¹, ..., xᵏ are linear combinations of the vectors x¹, ..., xⁿ, which, by Corollary 82, contradicts the linear independence of the vectors x¹, ..., xᵏ. Therefore, k ≤ n, which completes the proof. □
Example 95 By point (i), any two linearly independent vectors form a basis of R². Going back to Example 91, it is therefore sufficient to verify that the vectors (1, 2) and (0, 7) are linearly independent to conclude that S = {(1, 2), (0, 7)} is a basis of R². N
Theorem 96 All bases of Rⁿ have the same number n of elements.

Proof Recall that Rⁿ has a basis of n elements (e.g., the standard basis seen in Example 90). Let S = {x¹, ..., xᵏ} be any other basis of Rⁿ. By definition, S is linearly independent. Then, by Corollary 94-(ii), it has at most n elements, that is, k ≤ n. To prove the result, let us show that k = n. By contradiction, suppose that k < n. By Theorem 92, there exist n − k vectors xᵏ⁺¹, ..., xⁿ such that the set {x¹, ..., xᵏ, xᵏ⁺¹, ..., xⁿ} is a basis of Rⁿ. Yet, since S = {x¹, ..., xᵏ} is a basis of Rⁿ, any vector in {xᵏ⁺¹, ..., xⁿ} is a linear combination of vectors in S. This contradicts the linear independence of {x¹, ..., xᵏ, xᵏ⁺¹, ..., xⁿ}. We conclude that k = n. □
3.6 Dimension

The notions introduced in the previous section for Rⁿ extend in a natural way to its vector subspaces.

In other words, a linearly independent set S is a basis of the vector subspace that it generates. Through it we can represent - without redundancies - each vector of the subspace as a linear combination.

The results of the previous section continue to hold and are similarly proved.⁹ We start with the counterpart of Theorem 89.
Theorem 101 All bases of a vector subspace of Rⁿ have the same number of elements.

Although in view of Theorem 96 the result is not surprising, it remains of great elegance because it shows how, despite their diversity, all bases share a fundamental characteristic: their cardinality. This motivates the next important definition, which was implicit in the discussion that followed Theorem 96.
By Theorem 101, this number is unique. We denote it by dim V. It is the notion of dimension that, indeed, makes this (otherwise routine) section interesting, as the next examples show.

Example 103 In the special case V = Rⁿ we have dim Rⁿ = n, which makes rigorous the discussion that followed Theorem 96. N
Example 104 (i) The horizontal axis is a vector subspace of dimension one of R². (ii) The plane M = {x ∈ R³ : x₃ = 0} is a vector subspace of dimension two of R³, that is, dim M = 2. N

Example 105 If V = {0}, that is, if V is the trivial vector subspace formed only by the origin 0, we set dim V = 0. Indeed, V does not contain linearly independent vectors (why?) and, therefore, it has the empty set ∅ as a basis. N
We can solve the system of Example 67 through a simple high school argument. Consider x₄ as a known term and solve the system in x₁, x₂, and x₃; clearly, we will get solutions that depend on the value of the parameter x₄. Moving x₃ and x₄ to the right-hand side in the first two equations,

2x₁ − x₂ + 2x₃ + 2x₄ = 0
x₁ − x₂ − 2x₃ − 4x₄ = 0           ⟹
x₁ − 2x₂ − 2x₃ − 10x₄ = 0

2x₁ − x₂ = −2x₃ − 2x₄
x₁ − x₂ = 2x₃ + 4x₄
x₁ − 2x₂ − 2x₃ − 10x₄ = 0

From the second equation, x₁ = x₂ + 2x₃ + 4x₄; substituting into the first,

2(x₂ + 2x₃ + 4x₄) − x₂ = −2x₃ − 2x₄    ⟹    x₂ = −6x₃ − 10x₄,    x₁ = −4x₃ − 6x₄

Substituting into the third equation,

(−4x₃ − 6x₄) − 2(−6x₃ − 10x₄) − 2x₃ − 10x₄ = 0    ⟹    x₃ = −(2/3)x₄

and therefore

x₁ = −4(−(2/3)x₄) − 6x₄ = −(10/3)x₄,    x₂ = −6(−(2/3)x₄) − 10x₄ = −6x₄,    x₃ = −(2/3)x₄

In conclusion, the vectors of R⁴ of the form (3.2) are the solutions of the system for every t ∈ R.
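The parametric solution can also be checked numerically. A minimal sketch (ours; numpy assumed):

    import numpy as np

    A = np.array([[2, -1,  2,   2],
                  [1, -1, -2,  -4],
                  [1, -2, -2, -10]])
    t = 1.0
    x = np.array([-10 / 3 * t, -6 * t, -2 / 3 * t, t])
    print(A @ x)          # [0. 0. 0.] up to rounding: x solves the system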
Chapter 4

Euclidean structure (Sdoganato)
Given a vector x ∈ Rⁿ, note that

x · x = x₁² + x₂² + ⋯ + xₙ²

The sum of the squares of the components of a vector is thus the inner product of the vector with itself. This simple observation will be central in this chapter because it will allow us to define the fundamental notion of norm using the inner product. In this regard, note that x · x = 0 if and only if x = 0: a sum of squares is zero if and only if all addends are zero.

Before studying the norm we introduce the absolute value, its scalar version, which is probably already familiar to the reader.
The absolute value |x| of a scalar x ∈ R is defined by |x| = x if x ≥ 0 and |x| = −x if x < 0. For example, |5| = |−5| = 5. These equalities exemplify the, readily checked, symmetry of the absolute value:

|x| = |−x|    ∀x ∈ R    (4.1)
Geometrically, the absolute value represents the distance of a scalar from the origin. It satisfies the following elementary properties that the reader can verify:

(i) |x| ≥ 0;

(ii) |x| = 0 if and only if x = 0;

(iii) |xy| = |x| |y|;

(iv) |x + y| ≤ |x| + |y| (triangle inequality).

Property (iii) includes as a special case the symmetry property (4.1) since

|−x| = |(−1)x| = |−1| |x| = |x|
We also have

|x| = √(x²)    ∀x ∈ R    (4.3)

as is easy to check. This formula is a useful algebraic characterization, via the square root, of the absolute value. For instance, it delivers an alternative, algebraic, proof of the triangle inequality. Indeed, for every x, y ∈ R we have

|x + y| ≤ |x| + |y| ⟺ √((x + y)²) ≤ √(x²) + √(y²) ⟺ (x + y)² ≤ (√(x²) + √(y²))²
⟺ x² + y² + 2xy ≤ x² + y² + 2√(x²)√(y²) ⟺ xy ≤ √(x²y²) ⟺ xy ≤ |xy|

Since xy ≤ |xy| trivially holds, we conclude that |x + y| ≤ |x| + |y|, thus proving the triangle inequality.
The absolute value can also be characterized via the order structure of R:

|x| = max {x, −x}    (4.4)

For instance, max {5, −5} = |−5| = |5| = 5. A nice dividend of this characterization is the following important equivalence:

|x| < c ⟺ −c < x < c

Indeed, max {x, −x} < c if and only if x < c and −x < c, i.e., if and only if x < c and x > −c. This equivalence also shows how absolute values may permit to write inequalities more compactly, something that will often come in handy.
We close with a further nice and important inequality involving absolute values:

||x| − |y|| ≤ |x − y|    ∀x, y ∈ R    (4.6)

To prove it, note that

|x| = |x − y + y| ≤ |x − y| + |y|

i.e., |x| − |y| ≤ |x − y|. By interchanging the roles of x and y, it also holds that |y| − |x| ≤ |y − x| = |x − y|. We conclude that

max {|x| − |y|, |y| − |x|} ≤ |x − y|

In view of the order characterization (4.4), the inequality (4.6) then holds.

Inequality (4.6) is easily seen to be equivalent to ||x| − |y|| ≤ |x + y| for all x, y ∈ R. So, we can combine it with the triangle inequality in a single expression by writing

||x| − |y|| ≤ |x ± y| ≤ |x| + |y|

for all x, y ∈ R.
4.1.3 Norm
The notion of norm generalizes that of absolute value to $\mathbb{R}^n$. Specifically, the (Euclidean) norm of a vector $x \in \mathbb{R}^n$, denoted by $\|x\|$, is given by
\[
\|x\| = (x \cdot x)^{\frac{1}{2}} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}
\]
When $n = 1$, the norm reduces to the absolute value; indeed, by (4.3) we have
\[
\|x\| = \sqrt{x^2} = |x| \qquad \forall x \in \mathbb{R}
\]
For example, if $x = -4$ we have $\|x\| = \sqrt{(-4)^2} = \sqrt{16} = 4 = |-4| = |x|$.
The norm thus extends the absolute value to $\mathbb{R}^n$ by leveraging on its square root, algebraic, characterization (4.3).¹ Geometrically, the norm of a vector $x = (x_1, x_2)$ of the plane is the length of the segment that joins it with the origin, that is, it is the distance of the vector from the origin. Indeed, this length is, by Pythagoras' Theorem, exactly $\|x\| = \sqrt{x_1^2 + x_2^2}$.

¹ Later in the book (Section 20.1) we will study the modulus, an extension of the absolute value to $\mathbb{R}^n$ based on its order characterization (4.4).
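As a quick illustration, the definition translates directly into code (a minimal sketch in Python with NumPy; the helper name `norm` is ours, not the book's):

```python
import numpy as np

def norm(x):
    # Euclidean norm via the inner product: ||x|| = (x . x)^(1/2)
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.dot(x, x))

print(norm([-4.0]))      # 4.0: for n = 1 the norm is the absolute value
print(norm([3.0, 4.0]))  # 5.0: the length given by Pythagoras' Theorem
```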
The norm satisfies some elementary properties that extend to $\mathbb{R}^n$ those of the absolute value. The next result gathers the simplest ones.

Proposition 108 For every $x \in \mathbb{R}^n$ and every scalar $\alpha \in \mathbb{R}$:

(i) $\|x\| \geq 0$;

(ii) $\|x\| = 0$ if and only if $x = 0$;

(iii) $\|\alpha x\| = |\alpha| \|x\|$.

Proof We prove point (ii), leaving the other points to the reader. If $x = 0 = (0, 0, \dots, 0)$, then $\|x\| = \sqrt{0 + 0 + \cdots + 0} = 0$. Vice versa, if $\|x\| = 0$ then
\[
x_1^2 + x_2^2 + \cdots + x_n^2 = 0 \tag{4.7}
\]
Since $x_i^2 \geq 0$ for each $i = 1, 2, \dots, n$, from (4.7) it follows that $x_i^2 = 0$ for each $i$ since a sum of squares is zero if and only if they are all zero.
Property (iii) extends the property $|xy| = |x| |y|$ of the absolute value. The famous Cauchy-Schwarz inequality is a different, more subtle, extension of this property: for all $x, y \in \mathbb{R}^n$,
\[
|x \cdot y| \leq \|x\| \|y\| \tag{4.8}
\]
with equality if and only if $x$ and $y$ are collinear.

Proof For every $t \in \mathbb{R}$ we have
\[
0 \leq (x + ty) \cdot (x + ty) = at^2 + bt + c
\]
where $a = y \cdot y$, $b = 2(x \cdot y)$ and $c = x \cdot x$. From high school algebra we know that $at^2 + bt + c \geq 0$ for every $t$ only if the discriminant $\Delta = b^2 - 4ac$ is smaller than or equal to $0$. Therefore,
\[
0 \geq \Delta = b^2 - 4ac = 4(x \cdot y)^2 - 4(x \cdot x)(y \cdot y) = 4\big((x \cdot y)^2 - \|x\|^2 \|y\|^2\big) \tag{4.9}
\]
Hence
\[
(x \cdot y)^2 \leq \|x\|^2 \|y\|^2
\]
and, by taking square roots of both sides, we obtain the Cauchy-Schwarz inequality (4.8).

It remains to prove that equality holds if and only if the vectors $x$ and $y$ are collinear. "Only if". Let us assume that (4.8) holds as an equality. Then, by (4.9), it follows that $\Delta = 0$. Thus, there exists a point $\hat{t}$ where the parabola $at^2 + bt + c$ assumes value $0$, i.e.,
\[
0 = (x + \hat{t} y) \cdot (x + \hat{t} y) = \|x + \hat{t} y\|^2
\]
By Proposition 108, this implies that $x + \hat{t} y = 0$, i.e., $x = -\hat{t} y$. "If". If $x$ and $y$ are collinear, then $x = -\hat{t} y$ for some scalar $\hat{t}$. Then, $0 = 0 \cdot 0 = (x + \hat{t} y) \cdot (x + \hat{t} y)$. This implies that the parabola $at^2 + bt + c$, besides being always positive, assumes value $0$ at the point $\hat{t}$, and thus the discriminant must be zero. By (4.9), we deduce that (4.8) holds as an equality.
The Cauchy-Schwarz inequality allows us to prove the triangle inequality for the norm,
\[
\|x + y\| \leq \|x\| + \|y\| \qquad \forall x, y \in \mathbb{R}^n
\]
thereby completing the extension to the norm of the properties (i)-(iv) of the absolute value.

Proof We proceed as in the (algebraic) proof of the triangle inequality for the scalar case. Compared to it, we only need to resort to the Cauchy-Schwarz inequality for the very last step that, instead, in the scalar case was trivially true. Note that
\[
\|x + y\| \leq \|x\| + \|y\| \iff \|x + y\|^2 \leq (\|x\| + \|y\|)^2 \iff x \cdot y \leq \|x\| \|y\|
\]
Since the last inequality follows from the Cauchy-Schwarz inequality, the statement follows too.
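Both inequalities are easy to probe numerically (an illustrative sketch, not part of the text's argument): random vectors never violate them, up to floating-point tolerance.

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    # Cauchy-Schwarz: |x . y| <= ||x|| ||y||
    assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12
    # triangle inequality: ||x + y|| <= ||x|| + ||y||
    assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y) + 1e-12
print("no violations found")
```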
A norm version of inequality (4.6) holds as well:
\[
\big| \|x\| - \|y\| \big| \leq \|x - y\| \qquad \forall x, y \in \mathbb{R}^n \tag{4.11}
\]
Proof The proof is similar, mutatis mutandis, to that of (4.6). For each $x, y \in \mathbb{R}^n$ it holds
\[
\|x\| = \|x - y + y\| \leq \|x - y\| + \|y\|
\]
i.e., $\|x\| - \|y\| \leq \|x - y\|$. By interchanging the roles of $x$ and $y$, it also holds $\|y\| - \|x\| \leq \|y - x\| = \|x - y\|$. We conclude that (4.11) holds.
4.2 Orthogonality
4.2.1 Normalized vectors
Vectors with norm $1$, called unit vectors, play a special role in linear algebra. In the next figure the vectors $(\sqrt{2}/2, \sqrt{2}/2)$ and $(-\sqrt{3}/2, 1/2)$ are two unit vectors in $\mathbb{R}^2$:

[Figure: two unit vectors on the unit circle of the plane]
More generally, given any vector $x \neq 0$ of $\mathbb{R}^n$, the vector $x / \|x\|$ is a unit vector: to "normalize" a vector it is enough to divide it by its own norm. Indeed, we have
\[
\left\| \frac{x}{\|x\|} \right\| = \frac{1}{\|x\|} \|x\| = 1 \tag{4.12}
\]
where, being $\|x\|$ a scalar, the first equality follows from Proposition 108-(iii).
Among the unit vectors, the vectors
\[
e^1 = (1, 0, 0, \dots, 0), \quad e^2 = (0, 1, 0, \dots, 0), \quad \dots, \quad e^n = (0, 0, \dots, 0, 1)
\]
are the versors of $\mathbb{R}^n$ introduced in Chapter 3. To see their special status, note that in $\mathbb{R}^2$ they are
\[
e^1 = (1, 0) \quad \text{and} \quad e^2 = (0, 1)
\]
and lie on the horizontal and on the vertical axes, respectively. In particular, $e^1$ and $e^2$ belong to the Cartesian axes of $\mathbb{R}^2$:

[Figure: the vectors $\pm e^1$ and $\pm e^2$ on the Cartesian axes of the plane]
4.2.2 Orthogonality
Through a simple trigonometric analysis, Appendix C.3 shows that two vectors $x$ and $y$ of the plane can be regarded as perpendicular when their inner product is zero, i.e., $x \cdot y = 0$. This suggests the following definition.

Definition 112 Two vectors $x, y \in \mathbb{R}^n$ are said to be orthogonal (or perpendicular), written $x \perp y$, if $x \cdot y = 0$.

From the commutativity of the inner product it follows that $x \perp y$ is equivalent to $y \perp x$.
A general version of Pythagoras' Theorem then holds in $\mathbb{R}^n$: if $x \perp y$, then
\[
\|x + y\|^2 = \|x\|^2 + \|y\|^2
\]
Proof We have
\[
\|x + y\|^2 = (x + y) \cdot (x + y) = x \cdot x + 2 x \cdot y + y \cdot y = \|x\|^2 + \|y\|^2
\]
as desired, where $x \cdot y = 0$ by orthogonality.

The basic Pythagoras' Theorem is the case $n = 2$. Thanks to the notion of orthogonality, we established a general version for $\mathbb{R}^n$ of this celebrated result of Greek mathematics.
Definition 115 A set of vectors of $\mathbb{R}^n$ is said to be orthogonal if its elements are pairwise orthogonal vectors.

The set $\{e^1, \dots, e^n\}$ of the versors is the most classic example of an orthogonal set. Indeed, $e^i \cdot e^j = 0$ for every $1 \leq i \neq j \leq n$.

Proposition 116 Any orthogonal set that does not contain the zero vector is linearly independent.³

³ In reading this result, recall that a set of vectors containing the zero vector is necessarily linearly dependent (see Example 72).
Proof Let $\{x^1, \dots, x^k\}$ be an orthogonal set of $\mathbb{R}^n$ and $\{\alpha_1, \dots, \alpha_k\}$ a set of scalars such that $\sum_{i=1}^k \alpha_i x^i = 0$. We have to show that $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$. We have:
\[
0 = \left(\sum_{j=1}^k \alpha_j x^j\right) \cdot 0 = \left(\sum_{j=1}^k \alpha_j x^j\right) \cdot \left(\sum_{i=1}^k \alpha_i x^i\right) = \sum_{j=1}^k \alpha_j^2 \, x^j \cdot x^j = \sum_{j=1}^k \alpha_j^2 \|x^j\|^2
\]
where the penultimate equality uses the hypothesis that the vectors are pairwise orthogonal, i.e., $x^i \cdot x^j = 0$ for every $i \neq j$. Hence, $0 = \sum_{j=1}^k \alpha_j^2 \|x^j\|^2$ and so $\alpha_j^2 \|x^j\|^2 = 0$ for every $j = 1, \dots, k$. Since none of the vectors $x^j$ is zero, we thus have $\alpha_j = 0$ for every $j = 1, 2, \dots, k$. This yields that $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$, as desired.
An orthogonal set of unit vectors is called orthonormal. The set $\{e^1, \dots, e^n\}$ is, for example, orthonormal. In general, given an orthogonal set $\{x^1, \dots, x^k\}$ of non-zero vectors of $\mathbb{R}^n$, the set
\[
\left\{ \frac{x^1}{\|x^1\|}, \dots, \frac{x^k}{\|x^k\|} \right\}
\]
obtained by dividing each element by its norm is orthonormal. Indeed, by (4.12) each vector $x^i / \|x^i\|$ has norm $1$, so it is a unit vector, and for every $i \neq j$ we have
\[
\frac{x^i}{\|x^i\|} \cdot \frac{x^j}{\|x^j\|} = \frac{1}{\|x^i\| \|x^j\|} \, x^i \cdot x^j = 0
\]
For example, consider the orthogonal vectors of $\mathbb{R}^3$
\[
x^1 = (1, 1, 1), \quad x^2 = (-2, 1, 1), \quad x^3 = (0, -1, 1)
\]
Then
\[
\|x^1\| = \sqrt{3}, \quad \|x^2\| = \sqrt{6}, \quad \|x^3\| = \sqrt{2}
\]
By dividing each vector by its norm, we get the orthonormal vectors
\[
\frac{x^1}{\|x^1\|} = \left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right), \quad
\frac{x^2}{\|x^2\|} = \left(-\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}}\right), \quad
\frac{x^3}{\|x^3\|} = \left(0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)
\]
In particular, these three vectors form an orthonormal basis of $\mathbb{R}^3$. N
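The normalization is easy to verify in code (an illustrative sketch; the names `X`, `U` and `gram` are ours):

```python
import numpy as np

# The orthogonal set of the example, normalized by dividing by the norms.
X = [np.array([1.0, 1.0, 1.0]),
     np.array([-2.0, 1.0, 1.0]),
     np.array([0.0, -1.0, 1.0])]
U = [x / np.linalg.norm(x) for x in X]

# For an orthonormal set, u_i . u_j is 1 when i = j and 0 otherwise,
# so the matrix of pairwise inner products is the identity.
gram = np.array([[u @ v for v in U] for u in U])
print(np.round(gram, 10))
```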
The orthonormal bases of $\mathbb{R}^n$, in primis the standard basis $\{e^1, \dots, e^n\}$, are the most important bases of $\mathbb{R}^n$ because for them it is easy to determine the coefficients of the linear combinations that represent the vectors of $\mathbb{R}^n$, as the next result shows: if $\{x^1, \dots, x^n\}$ is an orthonormal basis of $\mathbb{R}^n$, then each $y \in \mathbb{R}^n$ can be written as
\[
y = \sum_{i=1}^n (y \cdot x^i) \, x^i \tag{4.13}
\]
The coefficients $y \cdot x^i$ are called Fourier coefficients of $y$ (with respect to the given orthonormal basis).

Proof Since $\{x^1, \dots, x^n\}$ is a basis, there exist $n$ scalars $\alpha_1, \alpha_2, \dots, \alpha_n$ such that
\[
y = \sum_{i=1}^n \alpha_i x^i
\]
Taking the inner product of both sides with $x^j$, orthonormality yields $y \cdot x^j = \alpha_j$ for each $j = 1, \dots, n$, which proves (4.13).
With respect to the standard basis $\{e^1, \dots, e^n\}$, each vector $y = (y_1, \dots, y_n) \in \mathbb{R}^n$ has the Fourier coefficients $y \cdot e^i = y_i$. In this case, (4.13) thus reduces to (3.4), i.e., to
\[
y = \sum_{i=1}^n y_i e^i
\]
This way of writing vectors, which plays a key role in many results, is a special case of the general expression (4.13). In other words, the components of a vector $y$ are its Fourier coefficients with respect to the standard basis.
For a change, the next example considers an orthonormal basis different from the standard basis. Let $y = (2, 3, 4)$ and consider the orthonormal basis of $\mathbb{R}^3$ constructed above. Then
\[
y = (y \cdot x^1) x^1 + (y \cdot x^2) x^2 + (y \cdot x^3) x^3 = \frac{9}{\sqrt{3}} \left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right) + \frac{3}{\sqrt{6}} \left(-\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}}\right) + \frac{1}{\sqrt{2}} \left(0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)
\]
Thus, $9/\sqrt{3}$, $3/\sqrt{6}$, $1/\sqrt{2}$ are the Fourier coefficients of $y = (2, 3, 4)$ with respect to this orthonormal basis of $\mathbb{R}^3$. N
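The coefficients, and the reconstruction (4.13), can be checked numerically (a sketch; variable names are ours):

```python
import numpy as np

# The orthonormal basis of the example.
U = [np.array([1.0, 1.0, 1.0]) / np.sqrt(3),
     np.array([-2.0, 1.0, 1.0]) / np.sqrt(6),
     np.array([0.0, -1.0, 1.0]) / np.sqrt(2)]
y = np.array([2.0, 3.0, 4.0])

coeffs = [y @ u for u in U]                   # Fourier coefficients y . x^i
print(coeffs)                                 # [9/sqrt(3), 3/sqrt(6), 1/sqrt(2)]
print(sum(c * u for c, u in zip(coeffs, U)))  # reconstructs (2, 3, 4), as in (4.13)
```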
Pythagoras' Theorem extends to any orthogonal set of vectors $\{x^1, \dots, x^k\}$ of $\mathbb{R}^n$:
\[
\left\| \sum_{i=1}^k x^i \right\|^2 = \sum_{i=1}^k \|x^i\|^2
\]
Proof We proceed by induction. Initial step: by Pythagoras' Theorem, the result holds for $k = 2$. Induction step: assume that it holds for $k - 1$ (induction hypothesis), i.e.,
\[
\left\| \sum_{i=1}^{k-1} x^i \right\|^2 = \sum_{i=1}^{k-1} \|x^i\|^2 \tag{4.14}
\]
We want to show that this implies that it holds for $k$. Observe that, setting $y = \sum_{i=1}^{k-1} x^i$, we have $y \perp x^k$. Indeed,
\[
y \cdot x^k = \left( \sum_{i=1}^{k-1} x^i \right) \cdot x^k = \sum_{i=1}^{k-1} x^i \cdot x^k = 0
\]
Hence, by Pythagoras' Theorem and the induction hypothesis (4.14),
\[
\left\| \sum_{i=1}^{k} x^i \right\|^2 = \|y + x^k\|^2 = \|y\|^2 + \|x^k\|^2 = \left\| \sum_{i=1}^{k-1} x^i \right\|^2 + \|x^k\|^2 = \sum_{i=1}^{k-1} \|x^i\|^2 + \|x^k\|^2 = \sum_{i=1}^{k} \|x^i\|^2
\]
as desired.
Chapter 5

Topological structure (sdoganato)
In this chapter we introduce the fundamental notion of distance between points of $\mathbb{R}^n$ that, by formalizing the notion of "proximity", endows $\mathbb{R}^n$ with a topological structure.
5.1 Distances
The norm, studied in Section 4.1, allows us to define a distance in $\mathbb{R}^n$. We start with $n = 1$, where the norm is simply the absolute value $|x|$. Consider two points $x$ and $y$ on the real line, with $x > y$. The distance between the two points is $x - y$, which is the length of the segment that joins them. On the other hand, if we take any two points $x$ and $y$ on the real line, without knowing their order (i.e., whether $x \geq y$ or $x \leq y$), the distance becomes the absolute value
\[
|x - y|
\]
and so the absolute value of the difference represents the distance between the two points, independently of their order. In symbols, we write
\[
d(x, y) = |x - y| \qquad \forall x, y \in \mathbb{R}
\]
In particular, $d(0, x) = |x|$ and therefore the absolute value of a point $x \in \mathbb{R}$ can be regarded as its distance from the origin.
Let us now consider $n = 2$. Take two vectors $x = (x_1, x_2)$ and $y = (y_1, y_2)$ in the plane. The distance between $x$ and $y$ is given by the length of the segment that joins them. By Pythagoras' Theorem, this distance is
\[
d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2} \tag{5.1}
\]
since it is the hypotenuse of the right triangle whose catheti are the segments that join $x_i$ and $y_i$ for $i = 1, 2$.

[Figure: the distance $d(x, y)$ as the hypotenuse of a right triangle in the plane]
The distance (5.1) is nothing but the norm of the vector $x - y$ (and also of $y - x$), i.e.,
\[
d(x, y) = \|x - y\|
\]
The distance between two vectors in $\mathbb{R}^2$ is, therefore, given by the norm of their difference. It is easy to see that, by applying again Pythagoras' Theorem, the distance between two vectors $x$ and $y$ in $\mathbb{R}^3$ is given by
\[
d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2}
\]
Definition 121 The (Euclidean) distance $d(x, y)$ between two vectors $x$ and $y$ in $\mathbb{R}^n$ is the norm of their difference, i.e., $d(x, y) = \|x - y\|$.
In particular, $d(x, 0) = \|x\|$: the norm of a vector $x \in \mathbb{R}^n$ can be regarded as its distance from the vector $0$ (i.e., the length of the segment that joins $0$ and $x$).

The distance satisfies the following properties: for all $x, y, z \in \mathbb{R}^n$,

(i) $d(x, y) \geq 0$;

(ii) $d(x, y) = 0$ if and only if $x = y$;

(iii) $d(x, y) = d(y, x)$;

(iv) $d(x, y) \leq d(x, z) + d(z, y)$.

Properties (i)-(iv) are natural for a notion of distance. Property (i) says that a distance is always a positive quantity, which by (ii) is zero only between vectors that are equal (so, the distance between distinct vectors is always strictly positive). Property (iii) says that distance is a symmetric notion: in measuring the distance between two vectors, it does not matter from which vector we take the measurement. Finally, property (iv) is the so-called triangle inequality: for example, the distance between cities $x$ and $y$ cannot exceed the sum of the distances between $x$ and any other city $z$ and between $z$ and $y$: detours cannot reduce the distance one needs to cover.
Example 123 (i) If $x = 1/3$ and $y = -1/3$, then
\[
d(x, y) = \left| \frac{1}{3} - \left(-\frac{1}{3}\right) \right| = \left| \frac{2}{3} \right| = \frac{2}{3}
\]
(ii) if $x = a$ and $y = a^2$ with $a \in \mathbb{R}$, then $d(x, y) = d(a, a^2) = |a - a^2| = |a| \, |1 - a|$;

(iii) if $x = (1, -3)$ and $y = (3, -1)$, then $d(x, y) = \sqrt{(1 - 3)^2 + (-3 - (-1))^2} = 2\sqrt{2}$;

(iv) if $x = (a, b)$ and $y = (-a, b)$ with $a, b \in \mathbb{R}$, then
\[
d(x, y) = \sqrt{(a - (-a))^2 + (b - b)^2} = \sqrt{(2a)^2 + 0} = \sqrt{4a^2} = 2|a|
\]
N
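These computations are easy to reproduce in code (a minimal sketch; the helper `d` is ours, mirroring Definition 121):

```python
import numpy as np

def d(x, y):
    # Euclidean distance as the norm of the difference (Definition 121)
    return np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))

print(d([1/3], [-1/3]))      # 0.666... = 2/3, as in (i)
print(d([1, -3], [3, -1]))   # 2.828... = 2*sqrt(2), as in (iii)
a, b = 1.7, -0.4
print(d([a, b], [-a, b]), 2 * abs(a))  # both 3.4, as in (iv)
```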
5.2 Neighborhoods
Definition 124 We call neighborhood of center $x_0 \in \mathbb{R}^n$ and radius $\varepsilon > 0$, denoted by $B_\varepsilon(x_0)$, the set
\[
B_\varepsilon(x_0) = \{x \in \mathbb{R}^n : d(x, x_0) < \varepsilon\}
\]
The neighborhood $B_\varepsilon(x_0)$ is, therefore, the locus of the points of $\mathbb{R}^n$ that lie at distance strictly smaller than $\varepsilon$ from $x_0$.¹
In $\mathbb{R}$ the neighborhoods are the open intervals $(x_0 - \varepsilon, x_0 + \varepsilon)$, i.e.,
\[
B_\varepsilon(x_0) = (x_0 - \varepsilon, x_0 + \varepsilon)
\]
Indeed,
\[
d(x, x_0) < \varepsilon \iff |x - x_0| < \varepsilon \iff -\varepsilon < x - x_0 < \varepsilon \iff x_0 - \varepsilon < x < x_0 + \varepsilon
\]
where we have used (4.5), i.e., $|x| < a \iff -a < x < a$.

Hence, in $\mathbb{R}$ the neighborhoods are open intervals. It is easily seen that in $\mathbb{R}^2$ they are open discs (so, without circumference), in $\mathbb{R}^3$ open balls (so, without surface), and so on. Indeed, the points that lie at a distance strictly less than $\varepsilon$ from $x_0$ form an open, so "skinless", ball of center $x_0$. Graphically, in the plane we have:

[Figure: a neighborhood $B_\varepsilon(x_0)$ in the plane, an open disc of center $x_0$ and radius $\varepsilon$]
Next we give some examples of neighborhoods. To ease notation we write $B_\varepsilon(x_1, \dots, x_n)$ instead of $B_\varepsilon((x_1, \dots, x_n))$.

¹ In the mathematical jargon, they are "$\varepsilon$-close" to $x_0$.
(iv) We have
\[
B_1(1, 1, 1) = \{x \in \mathbb{R}^3 : d(x, (1, 1, 1)) < 1\} = \left\{x \in \mathbb{R}^3 : \sqrt{(x_1 - 1)^2 + (x_2 - 1)^2 + (x_3 - 1)^2} < 1\right\}
\]
\[
= \{x \in \mathbb{R}^3 : (x_1 - 1)^2 + (x_2 - 1)^2 + (x_3 - 1)^2 < 1\}
\]
For example, $(1/2, 1/2, 1/2) \in B_1(1, 1, 1)$. Indeed,
\[
\left(\frac{1}{2} - 1\right)^2 + \left(\frac{1}{2} - 1\right)^2 + \left(\frac{1}{2} - 1\right)^2 = \frac{3}{4} < 1
\]
Check that, instead, $0 = (0, 0, 0) \notin B_1(1, 1, 1)$. N
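Membership in a neighborhood is a one-line computation (an illustrative sketch; the helper `in_ball` is ours):

```python
import numpy as np

def in_ball(x, center, eps):
    # x belongs to B_eps(center) iff d(x, center) < eps
    return np.linalg.norm(np.asarray(x, float) - np.asarray(center, float)) < eps

print(in_ball([0.5, 0.5, 0.5], [1, 1, 1], 1))  # True:  distance sqrt(3)/2 < 1
print(in_ball([0, 0, 0], [1, 1, 1], 1))        # False: distance sqrt(3) > 1
```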
N.B. Each point $x_0$ of $\mathbb{R}^n$ has infinitely many neighborhoods $B_\varepsilon(x_0)$, one for each value of the radius $\varepsilon > 0$. O

On the real line, sometimes we will use "half neighborhoods" of a point $x_0$. Specifically:

Definition 126 Given $\varepsilon > 0$, the interval $[x_0, x_0 + \varepsilon)$ is called the right neighborhood of $x_0 \in \mathbb{R}$ of radius $\varepsilon$, while the interval $(x_0 - \varepsilon, x_0]$ is called the left neighborhood of $x_0$ of radius $\varepsilon$.
Through them we can give a useful characterization of suprema and infima of subsets of the real line (Section 1.4.2).

Proposition 127 Let $A \subseteq \mathbb{R}$ be bounded above. We have $a = \sup A$ if and only if

(i) $a \geq x$ for every $x \in A$;

(ii) for every $\varepsilon > 0$, there exists $x \in A$ such that $x > a - \varepsilon$.

Thus, a point $a \in \mathbb{R}$ is the supremum of $A \subseteq \mathbb{R}$ if and only if (i) it is an upper bound of $A$ and (ii) each left neighborhood of $a$ contains elements of $A$. A similar characterization holds for infima of sets bounded below, by replacing left neighborhoods with right ones.

Proof "Only if". If $a = \sup A$, (i) is obviously satisfied. Let $\varepsilon > 0$. Since $\sup A > a - \varepsilon$, the point $a - \varepsilon$ is not an upper bound of $A$. Therefore, there exists $x \in A$ such that $x > a - \varepsilon$.

"If". Suppose that $a \in \mathbb{R}$ satisfies (i) and (ii). By (i), $a$ is an upper bound of $A$. By (ii), it is also the least upper bound. Indeed, each $b < a$ can be written as $b = a - \varepsilon$ by setting $\varepsilon = a - b > 0$. Given $b < a$, by (ii) there exists $x \in A$ such that $x > a - \varepsilon = b$. Therefore, $b$ is not an upper bound of $A$, which implies that there is no upper bound smaller than $a$.
A point $x_0 \in A$ is called an interior point of $A$ if there exists a neighborhood $B_\varepsilon(x_0)$ entirely contained in $A$, i.e., $B_\varepsilon(x_0) \subseteq A$; it is called an exterior point of $A$ if it is interior to the complement $A^c$. The set of the interior points of $A$ is called the interior of $A$ and is denoted by $\operatorname{int} A$. By definition, $\operatorname{int} A \subseteq A$. The set of the exterior points of $A$ is then $\operatorname{int} A^c$.

Example 129 Let $A = (0, 1)$. Each point of $A$ is interior, that is, $\operatorname{int} A = A$. Indeed, let $x \in (0, 1)$. Consider the smallest distance of $x$ from the two endpoints $0$ and $1$ of the interval, i.e., $\min\{d(0, x), d(1, x)\}$. Let $\varepsilon > 0$ be such that $\varepsilon < \min\{d(0, x), d(1, x)\}$. Then $B_\varepsilon(x) \subseteq (0, 1)$. Therefore, $x$ is an interior point of $A$. Since $x$ was arbitrarily chosen, it follows that $\operatorname{int} A = A$. It is easy to check that the set of exterior points is $\operatorname{int} A^c = (-\infty, 0) \cup (1, \infty)$. N
Example 130 Let $A = [0, 1]$. We have $\operatorname{int} A = (0, 1)$. Indeed, by proceeding as above we see that the points in $(0, 1)$ are all interior, that is, $(0, 1) \subseteq \operatorname{int} A$. It remains to check the endpoints $0$ and $1$. Consider $0$. Its neighborhoods have the form $(-\varepsilon, \varepsilon)$, so they contain also points of $A^c$. It follows that $0 \notin \operatorname{int} A$. Similarly, $1 \notin \operatorname{int} A$. We conclude that $\operatorname{int} A = (0, 1)$. The set of the exterior points is $A^c$, i.e., $\operatorname{int} A^c = A^c$ (as the reader can easily verify). N
A point $x_0$ is, therefore, a boundary point of $A$ if it is neither interior nor exterior: all its neighborhoods contain both points of $A$ (because it is not exterior) and points of $A^c$ (because it is not interior). The set of the boundary points of a set $A$ is called the boundary or frontier of $A$ and is denoted by $\partial A$. Intuitively, the frontier is the "border" of a set.
Example 132 (i) Let $A = (0, 1)$. Given the residual nature of the definition of boundary points, to determine $\partial A$ we need to find the interior and exterior points. From Example 129, we know that $\operatorname{int} A = (0, 1)$ and $\operatorname{int} A^c = (-\infty, 0) \cup (1, \infty)$. It follows that
\[
\partial A = \{0, 1\}
\]
i.e., the boundary of $(0, 1)$ is formed by the two endpoints $0$ and $1$. Note that $A \cap \partial A = \emptyset$: in this example the boundary points do not belong to the set $A$.

(ii) Let $A = [0, 1]$. In Example 130 we have seen that $\operatorname{int} A = (0, 1)$ and $\operatorname{int} A^c = A^c$. Therefore, $\partial A = \{0, 1\}$. Here $\partial A \subseteq A$: the set $A$ contains its own boundary points.

(iii) Let $A = (0, 1]$. The reader can verify that $\operatorname{int} A = (0, 1)$ and $\operatorname{int} A^c = (-\infty, 0) \cup (1, \infty)$. Hence, $\partial A = \{0, 1\}$. In this example, the frontier is partly outside and partly inside the set: the boundary point $1$ belongs to $A$, while the boundary point $0$ does not. N
In view of this example, the boundary points of a bounded interval are easily seen to be its endpoints (which may or may not belong to the interval).

Consider now the closed unit ball $A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 \leq 1\}$. All the points such that $x_1^2 + x_2^2 < 1$ are interior, that is,
\[
\operatorname{int} A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 < 1\}
\]
while all the points such that $x_1^2 + x_2^2 > 1$ are exterior, that is,
\[
\operatorname{int} A^c = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 > 1\}
\]
Hence, $\partial A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1\}$. The closed unit ball thus contains all its own boundary points. N
Example 134 Let $A = \mathbb{Q}$ be the set of rational numbers, so that $A^c$ is the set of the irrational numbers. By Propositions 19 and 42, between any two rational numbers $q < q'$ there exists an irrational number $a$ such that $q < a < q'$, and between any two irrational numbers $a < b$ there exists a rational number $q \in \mathbb{Q}$ such that $a < q < b$. The reader can check that this implies $\operatorname{int} A = \operatorname{int} A^c = \emptyset$, and so $\partial A = \mathbb{R}$. This example shows that the interpretation of the boundary as a "border" can be misleading in some cases. Indeed, mathematical notions have their own life and we must be ready to follow them also when our intuition may fall short. N
Lemma 135 Let $A \subseteq \mathbb{R}$ be a bounded set. Then $\sup A \in \partial A$ and $\inf A \in \partial A$.

Proof We prove that $\alpha = \sup A \in \partial A$ (the proof for the infimum is similar). Consider any neighborhood $(\alpha - \varepsilon, \alpha + \varepsilon)$ of $\alpha$. We have $(\alpha, \alpha + \varepsilon) \subseteq A^c$, so $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A^c \neq \emptyset$. Moreover, by Proposition 127 for every $\varepsilon > 0$ there exists $x_0 \in A$ such that $x_0 > \alpha - \varepsilon$, so that $(\alpha - \varepsilon, \alpha] \cap A \neq \emptyset$. Thus, $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A \neq \emptyset$. We conclude that, for every $\varepsilon > 0$, we have both $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A \neq \emptyset$ and $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A^c \neq \emptyset$, that is, $\alpha \in \partial A$.
A point $x_0 \in A$ is called an isolated point of $A$ if there exists a neighborhood $B_\varepsilon(x_0)$ that contains no point of $A$ other than $x_0$, i.e., $A \cap B_\varepsilon(x_0) = \{x_0\}$. As the terminology suggests, isolated points are "separated" from the rest of the set.

Example 137 Let $A = [0, 1] \cup \{2\}$. It consists of the closed unit interval and, in addition, of the point $2$. This point is isolated. Indeed, if $B_\varepsilon(2)$ is a neighborhood of $2$ with $\varepsilon < 1$, then $A \cap B_\varepsilon(2) = \{2\}$. N
As anticipated, we have: every isolated point of a set $A$ in $\mathbb{R}^n$ is a boundary point of $A$.

Proof We begin with a simple observation. Let $\mathbf{1} = (1, \dots, 1)$ be the vector with components equal to $1$. Consider $x_0 \in \mathbb{R}^n$. Note that $x_0 + t\mathbf{1} \neq x_0$ as long as $t \neq 0$. Moreover,
\[
d(x_0 + t\mathbf{1}, x_0) = |t| \, \|\mathbf{1}\| = |t| \sqrt{n}
\]
for all $t \in \mathbb{R}$. This implies that, if $\varepsilon > 0$ and $t \in (0, \varepsilon/\sqrt{n})$, then $x_0 + t\mathbf{1} \in B_\varepsilon(x_0)$.

That said, let now $x_0$ be an isolated point of $A$. We want to show that, for each $\varepsilon > 0$, $B_\varepsilon(x_0) \cap A \neq \emptyset$ and $B_\varepsilon(x_0) \cap A^c \neq \emptyset$. Since $x_0$ is isolated, there exists $\bar{\varepsilon} > 0$ such that $A \cap B_{\bar{\varepsilon}}(x_0) = \{x_0\}$. Thus, $x_0 \in A$. Fix any $\varepsilon > 0$. Trivially $x_0 \in B_\varepsilon(x_0)$, so $B_\varepsilon(x_0) \cap A \neq \emptyset$. At the same time, if $t = \min\{\varepsilon, \bar{\varepsilon}\}/(2\sqrt{n})$, then $x_0 + t\mathbf{1} \in B_\varepsilon(x_0)$ as well as $x_0 + t\mathbf{1} \in B_{\bar{\varepsilon}}(x_0)$, and $x_0 + t\mathbf{1} \notin A$ (otherwise, $x_0 \neq x_0 + t\mathbf{1} \in A \cap B_{\bar{\varepsilon}}(x_0)$, a contradiction with $x_0$ being isolated). This proves that $B_\varepsilon(x_0) \cap A^c \neq \emptyset$, concluding the proof since $\varepsilon > 0$ was arbitrarily chosen.
A point $x_0 \in \mathbb{R}^n$ is called a limit (or accumulation) point of $A$ if every neighborhood of $x_0$ contains at least one point of $A$ distinct from $x_0$. Hence, $x_0$ is a limit point of $A$ if, for every $\varepsilon > 0$, there exists some $x \in A$ such that $0 < \|x_0 - x\| < \varepsilon$.² The set of limit points of $A$ is denoted by $A'$ and is called the derived set of $A$. Note that limit points are not required to belong to the set.
Lemma 140 (i) Every interior point of $A$ is a limit point of $A$; (ii) a boundary point of $A$ is a limit point of $A$ if and only if it is not an isolated point of $A$.

Proof (i) If $x_0 \in \operatorname{int} A$, there exists a neighborhood $B_{\varepsilon_0}(x_0)$ of $x_0$ such that $B_{\varepsilon_0}(x_0) \subseteq A$. Let $B_\varepsilon(x_0)$ be any neighborhood of $x_0$. The intersection
\[
B_{\varepsilon_0}(x_0) \cap B_\varepsilon(x_0) = B_{\min\{\varepsilon_0, \varepsilon\}}(x_0)
\]
is, in turn, a neighborhood of $x_0$ of radius $\min\{\varepsilon_0, \varepsilon\} > 0$. Hence $B_{\min\{\varepsilon_0, \varepsilon\}}(x_0) \subseteq A$ and, to complete the proof, it is sufficient to consider any $x \in B_{\min\{\varepsilon_0, \varepsilon\}}(x_0)$ such that $x \neq x_0$. Indeed, $x$ belongs also to the neighborhood $B_\varepsilon(x_0)$ and it is distinct from $x_0$.

(ii) "If". Consider a boundary point $x_0$ which is not an isolated point. By the definition of boundary points, for every $\varepsilon > 0$ we have $B_\varepsilon(x_0) \cap A \neq \emptyset$. Since $x_0$ is not isolated, for every $\varepsilon > 0$ we have $B_\varepsilon(x_0) \cap A \neq \{x_0\}$. This implies that for every $\varepsilon > 0$ we have $(B_\varepsilon(x_0) \setminus \{x_0\}) \cap A \neq \emptyset$, i.e., that $x_0$ is a limit point of $A$.

"Only if". Take a point $x_0$ that is both a boundary point and a limit point, i.e., $x_0 \in \partial A \cap A'$. Each neighborhood $B_\varepsilon(x_0)$ contains at least a point $x \in A$ distinct from $x_0$, that is, $B_\varepsilon(x_0) \cap A \neq \{x_0\}$. It follows that $x_0$ is not isolated.
In view of this result, we can say that the set $A'$ of the limit points consists of the interior points of $A$ as well as of the boundary points of $A$ that are not isolated. Therefore, a point of a set $A$ is either a limit point or an isolated point, tertium non datur.
Example 141 (i) The points of the interval $A = [0, 1)$ are all limit points since $A' = [0, 1] \supseteq A$. Note that the limit point $1$ does not belong to $A$. (ii) The points of the closed unit ball $A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 \leq 1\}$ are all limit points since $A' = A$. (iii) For the set $A = \mathbb{Q}$ it holds $A' = \mathbb{R}$. In words, the real numbers are the limit points of the set of rational numbers. Indeed, let $x \in \mathbb{R}$. For each $\varepsilon > 0$, there exists $q_\varepsilon \in \mathbb{Q}$ such that $q_\varepsilon \in B_\varepsilon(x) = (x - \varepsilon, x + \varepsilon)$ because between any two real numbers, here $x - \varepsilon$ and $x + \varepsilon$, there exists a rational number, here $q_\varepsilon$ (cf. Example 134). N

² The inequality $0 < \|x_0 - x\|$ is equivalent to the condition $x \neq x_0$, so it is a way to require that $x$ is a point of $A$ distinct from $x_0$.
The definition of limit point requires that its neighborhoods contain at least one point of $A$ other than itself. As we next show, they actually contain infinitely many of them.

Proposition 143 Each neighborhood of a limit point of $A$ contains infinitely many points of $A$.

Proof Let $x$ be a limit point of $A$. Suppose, by contradiction, that there exists a neighborhood $B_\varepsilon(x)$ of $x$ containing a finite number of points $\{x^1, \dots, x^n\}$ of $A$ distinct from $x$. Since the set $\{x^1, \dots, x^n\}$ is finite, the minimum distance $\min_{i=1,\dots,n} d(x, x^i)$ exists and is strictly positive, i.e., $\min_{i=1,\dots,n} d(x, x^i) > 0$. Let $\delta > 0$ be such that $\delta < \min_{i=1,\dots,n} d(x, x^i)$. Clearly, $0 < \delta < \varepsilon$ since $\min_{i=1,\dots,n} d(x, x^i) < \varepsilon$. Hence, $B_\delta(x) \subseteq B_\varepsilon(x)$. It is also clear, by construction, that $x^i \notin B_\delta(x)$ for each $i = 1, 2, \dots, n$. So, if $x \in A$ we have $B_\delta(x) \cap A = \{x\}$. Instead, if $x \notin A$ we have $B_\delta(x) \cap A = \emptyset$. Regardless of whether $x$ belongs to $A$ or not, we thus have $B_\delta(x) \cap A \subseteq \{x\}$. Therefore, the unique point of $A$ that $B_\delta(x)$ may contain is $x$ itself. But this contradicts the hypothesis that $x$ is a limit point of $A$.
O.R. The concept of interior point of a set $A$ requires the existence of a neighborhood of the point that is entirely formed by points of $A$. This means that it is possible to move away, at least a bit, from the point while remaining inside $A$: it is possible to go for a "little walk" in any direction without showing the passport. Retracing one's steps, it is then possible to approach the point from any direction by remaining inside $A$.

The concept of limit point of a set $A$ does not require the point to belong to $A$ but requires, instead, that we can get as close as we want to the point by "jumping" on points of the set (by jumping on river stones, we can get as close as we want to our target through stones that all belong to the set). This idea of approaching a point by remaining within a given set will be crucial to define limits of functions. H
Definition 144 A set $A$ in $\mathbb{R}^n$ is called open if all its points are interior, that is, if $\operatorname{int} A = A$.

Thus, a set is open if it does not contain its borders (so it is skinless).
Example 145 The open interval $(a, b)$ is open (hence the name). Indeed, let $x \in (a, b)$. Let $\varepsilon > 0$ be such that
\[
\varepsilon < \min\{d(x, a), d(x, b)\}
\]
We have $B_\varepsilon(x) \subseteq (a, b)$, so $x$ is an interior point of $(a, b)$. Since $x$ was arbitrarily chosen, it follows that $(a, b)$ is open. N
Example 146 The set $\{x \in \mathbb{R}^2 : 0 < x_1^2 + x_2^2 < 1\}$ is open. Graphically, it is the ball deprived of both the skin and the origin:

[Figure: the open unit disc without the origin]

N
The neighborhoods in $\mathbb{R}$ are all of the type $(a, b)$ and so they are all open. The next result shows that, more generally, all neighborhoods are open in $\mathbb{R}^n$.

Proof Let $B_\varepsilon(x_0)$ be a neighborhood of a point $x_0 \in \mathbb{R}^n$. To show that $B_\varepsilon(x_0)$ is open, we have to show that all its points are interior. Let $x \in B_\varepsilon(x_0)$. To prove that $x$ is interior to $B_\varepsilon(x_0)$, let
\[
0 < \varepsilon' < \varepsilon - d(x, x_0) \tag{5.2}
\]
Then $B_{\varepsilon'}(x) \subseteq B_\varepsilon(x_0)$. Indeed, let $y \in B_{\varepsilon'}(x)$. Then
\[
d(y, x_0) \leq d(y, x) + d(x, x_0) < \varepsilon' + d(x, x_0) < \varepsilon
\]
where the last inequality follows from (5.2). Therefore $B_{\varepsilon'}(x) \subseteq B_\varepsilon(x_0)$, which completes the proof.
The closure of a set $A$ in $\mathbb{R}^n$, denoted by $\bar{A}$, is the set $\bar{A} = A \cup \partial A$ obtained by adding to $A$ its boundary points. Clearly, $A \subseteq \bar{A}$. The closure of $A$ is, thus, an "enlargement" of $A$ that includes all its boundary points, that is, the borders. Naturally, the notion of closure becomes relevant when the borders are not already part of $A$.

Example 149 (i) If $A = [0, 1) \subseteq \mathbb{R}$, then $\bar{A} = [0, 1]$. (ii) If $A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 \leq 1\}$ is the closed unit ball, then $\bar{A} = A$. N
Definition 151 A set $A$ in $\mathbb{R}^n$ is called closed if it contains all its boundary points, that is, if $\bar{A} = A$.

Hence, a set is closed when it includes its border (so it has a skin).

Example 152 (i) The set $A = [0, 1)$ is not closed since $\bar{A} \neq A$, while the closed unit ball $A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 \leq 1\}$ is closed since $\bar{A} = A$. (ii) The closed interval $[a, b]$ is closed (hence the name). The unbounded intervals $(a, \infty)$ and $(-\infty, a)$ are open. The unbounded intervals $[a, \infty)$ and $(-\infty, a]$ are closed. (iii) The circumference $A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1\}$ is closed because $\partial A = A$, so $\bar{A} = A$. N
Open and closed sets are dual notions, as the next result shows.³

Theorem 153 A set $A$ in $\mathbb{R}^n$ is open if and only if its complement $A^c$ is closed.

Proof "Only if". Let $A$ be open. We show that $A^c$ is closed. Let $x$ be a boundary point of $A^c$, that is, $x \in \partial A^c$. By definition, $x$ is not an interior point of either $A$ or $A^c$. Hence, $x \notin \operatorname{int} A$. But $A = \operatorname{int} A$ because $A$ is open. Therefore $x \notin A$, that is, $x \in A^c$. It follows that $\partial A^c \subseteq A^c$. Therefore, $\overline{A^c} = A^c$, which proves that $A^c$ is closed.
Example 154 The finite sets of $\mathbb{R}^n$ (so, in particular, the singletons) are closed. To verify it, let $A = \{x^1, x^2, \dots, x^n\}$ be a generic finite set. Its complement $A^c$ is open. Indeed, let $x \in A^c$. If $\varepsilon > 0$ is such that
\[
\varepsilon < d(x, x^i) \qquad \forall i = 1, \dots, n
\]
then $B_\varepsilon(x) \subseteq A^c$. So, $x$ is an interior point of $A^c$. Since $x$ was arbitrarily chosen, it follows that $A^c$ is open. As the reader can check, we also have $\operatorname{int} A = \emptyset$ and $\partial A = A$. N
Another example is the set
\[
\{(2, 1)\} \cup \{(x_1, x_2) \in \mathbb{R}^2 : x_2 = x_1^2\} \cup \{(x_1, x_2) \in \mathbb{R}^2 : (x_1 + 1)^2 + (x_2 + 1)^2 \leq 1/4\}
\]
of $\mathbb{R}^2$, pictured below, which the reader can check to be closed.

[Figure: a point, a parabola and a closed disc in the plane]

N
³ Often, a set is defined to be closed when its complement is open. It is then proved as a theorem that a closed set contains its boundary. In other words, the definition and the theorem are switched relative to the approach that we have chosen.
Open and closed sets are, therefore, two sides of the same coin: a set is closed (open) if
and only if its complement is open (closed). Naturally, there are many sets that are neither
open nor closed. Next we can give a simple example of such a set.
Example 156 The set $A = [0, 1)$ is neither open nor closed. Indeed, $\operatorname{int} A = (0, 1) \neq A$ and $\bar{A} = [0, 1] \neq A$. N
There is a case in which the duality of open and closed sets takes a curious form.
Example 157 The empty set $\emptyset$ and the whole $\mathbb{R}^n$ are simultaneously open and closed. By Theorem 153, it is sufficient to show that $\mathbb{R}^n$ is both open and closed. But this is obvious. Indeed, $\mathbb{R}^n$ is open because, trivially, all its points are interior (all neighborhoods are included in $\mathbb{R}^n$) as well as closed because it trivially coincides with its own closure. It is possible to show that $\emptyset$ and $\mathbb{R}^n$ are the unique sets with such a double personality. N
Let us go back to the notion of closure $\bar{A}$. The next result shows that it can be equivalently seen as the addition to the set $A$ of its limit points $A'$: that is, $\bar{A} = A \cup A'$. In other terms, adding the borders turns out to be equivalent to adding the limit points.

Proof We need to prove that $A \cup A' = A \cup \partial A$. We first prove that $A \cup A' \subseteq A \cup \partial A$. Since $A \subseteq A \cup \partial A$, we have to prove that $A' \subseteq A \cup \partial A$. Let $x \in A'$. In view of what we observed after the proof of Lemma 140, $x$ is either an interior or a boundary point, so $x \in A \cup \partial A$. We conclude that $A \cup A' \subseteq A \cup \partial A$.

It remains to show that $A \cup \partial A \subseteq A \cup A'$. Since $A \subseteq A \cup A'$, we have to prove that $\partial A \subseteq A \cup A'$. Let $x \in \partial A$. If $x$ is an isolated point, then by definition $x \in A$. Otherwise, by Lemma 140 $x$ is a limit point of $A$, that is, $x \in A'$. Hence, $x \in A \cup A'$. This proves $A \cup \partial A \subseteq A \cup A'$, and so the result.
A corollary of this result is that a set is closed when it contains all its limit points. This sheds further light on the nature of closed sets.

Corollary 159 A set $A$ in $\mathbb{R}^n$ is closed if and only if it contains all its limit points.

Example 160 The inclusion $A' \subseteq A$ in this corollary can be strict, in which case the set $A \setminus A'$ consists of the isolated points of $A$. For example, let $A = [0, 1] \cup \{-1, 4\}$. Then $A$ is closed and $A' = [0, 1]$. Hence, $A'$ is strictly included in $A$ and the set $A \setminus A' = \{-1, 4\}$ consists of the isolated points of $A$. N
The next result characterizes the interior and the closure of a set as, respectively, its best open approximation from inside and its best closed approximation from outside.

Proposition 161 For every set $A$ in $\mathbb{R}^n$: (i) $\operatorname{int} A$ is the largest open set contained in $A$; (ii) $\bar{A}$ is the smallest closed set that contains $A$.

The proof relies on a duality, via complements, between the closure and the interior of a set, established in the next lemma.

Lemma 162 For every set $A$ in $\mathbb{R}^n$, $(\bar{A})^c = \operatorname{int} A^c$.

Proof Given a set $A$, recall that a point $x$ can be, with respect to this set, either interior or boundary or exterior (three mutually exclusive and exhaustive options). Moreover, a point is exterior if and only if it belongs to $\operatorname{int} A^c$. Since $\operatorname{int} A^c \subseteq A^c$, if $x \in \operatorname{int} A^c$ then both $x \notin A$ and $x \notin \partial A$, that is, $x \notin \bar{A}$, proving that $\operatorname{int} A^c \subseteq (\bar{A})^c$. Similarly, since $\operatorname{int} A \subseteq A$, if $x \notin \bar{A}$ then $x \notin \operatorname{int} A$ and $x \notin \partial A$, yielding that $x \in \operatorname{int} A^c$ and proving that $(\bar{A})^c \subseteq \operatorname{int} A^c$. We conclude that $(\bar{A})^c = \operatorname{int} A^c$.
Proof of Proposition 161 (i) We first show that the set $\operatorname{int} A$ is open. If $\operatorname{int} A$ is empty, we are done. Otherwise, we need to show that every point of $\operatorname{int} A$ is an interior point of $\operatorname{int} A$, i.e., for each $x \in \operatorname{int} A$ there exists $\varepsilon > 0$ such that $B_\varepsilon(x) \subseteq \operatorname{int} A$. Since $x$ belongs to $\operatorname{int} A$, by definition it is an interior point of $A$, i.e., there exists $\varepsilon > 0$ such that $B_\varepsilon(x) \subseteq A$. Let $y \in B_\varepsilon(x)$. We show that $y$ is also an interior point of $A$, that is, $y \in \operatorname{int} A$. In turn, this proves that $B_\varepsilon(x) \subseteq \operatorname{int} A$ and so that $\operatorname{int} A$ is open. Set $\delta = \varepsilon - d(x, y) > 0$. If $z \in B_\delta(y)$, then $d(x, z) \leq d(x, y) + d(y, z) < d(x, y) + \delta = \varepsilon$, proving that $z \in B_\varepsilon(x)$. Thus, $B_\delta(y) \subseteq B_\varepsilon(x) \subseteq A$ and so $y \in \operatorname{int} A$.

We next show that, if $G$ is an open subset of $A$, then $G \subseteq \operatorname{int} A$, proving that $\operatorname{int} A$ is the largest open set contained in $A$. Let $x \in G$. Since $G$ is open, there exists an $\varepsilon > 0$ such that $B_\varepsilon(x) \subseteq G$. Since $G \subseteq A$, we conclude that $B_\varepsilon(x) \subseteq A$, that is, $x \in \operatorname{int} A$ and $G \subseteq \operatorname{int} A$.

(ii) By what has just been proved, $\operatorname{int} A^c$ is open. In view of Lemma 162, $\bar{A}$ is then closed because it is the complement of an open set (Theorem 153). To complete the proof, let $F$ be a closed superset of $A$. We want to show that $F \supseteq \bar{A}$, proving that $\bar{A}$ is the smallest closed set that contains $A$. If $F \supseteq A$, then $F^c \subseteq A^c$. By Theorem 153 and by point (i), $F^c$ is open and so $F^c \subseteq \operatorname{int} A^c = (\bar{A})^c$, that is, $F = (F^c)^c \supseteq \bar{A}$.
The set of interior points $\operatorname{int} A$ is, therefore, the largest open set that approximates $A$ "from inside", while the closure $\bar{A}$ is the smallest closed set that approximates $A$ "from outside". The relation
\[
\operatorname{int} A \subseteq A \subseteq \bar{A} \tag{5.4}
\]
is, therefore, the best topological sandwich (with lower open slice and upper closed slice) that we can have for the set $A$.
It is now easy to prove an interesting and intuitive property of the boundary of a set.

Corollary 163 The boundary $\partial A$ of any set $A$ in $\mathbb{R}^n$ is a closed set.

Proof Let $A$ be any set in $\mathbb{R}^n$. Since the points exterior to $A$ are interior to its complement, we have $(\partial A)^c = \operatorname{int} A \cup \operatorname{int} A^c$. So, $\partial A$ is closed because $\operatorname{int} A$ and $\operatorname{int} A^c$ are open and, as we will momentarily see in Theorem 165, a union of open sets is open.
The next result, whose proof is left to the reader, shows that the difference between the closure and the interior of a set is given by its boundary points: $\bar{A} \setminus \operatorname{int} A = \partial A$.

This result makes rigorous the intuition that open sets are sets without borders (or skinless). Indeed, it implies that $A$ is open if and only if $\partial A \cap A = \emptyset$. On the other hand, by definition, a set is closed if and only if $\partial A \subseteq A$, that is, when it includes the borders (it has a skin).
The intersection of finitely many neighborhoods of a point $x_0$ is still a neighborhood of $x_0$, with radius the smallest of the given radii, and so it is an open set. It is, however, no longer true for intersections of infinitely many neighborhoods. For example, in $\mathbb{R}$ we have
\[
\bigcap_{n=1}^{\infty} B_{\frac{1}{n}}(x_0) = \bigcap_{n=1}^{\infty} \left( x_0 - \frac{1}{n}, x_0 + \frac{1}{n} \right) = \{x_0\} \tag{5.5}
\]
i.e., this intersection reduces to the singleton $\{x_0\}$, which is closed (Example 154). Therefore, the intersection of infinitely many neighborhoods might well not be open.

To check (5.5), note that a point belongs to the intersection $\bigcap_{n=1}^{\infty} B_{1/n}(x_0)$ if and only if it belongs to each neighborhood $B_{1/n}(x_0)$. This is true for $x_0$, so $x_0 \in \bigcap_{n=1}^{\infty} B_{1/n}(x_0)$. This is, however, the unique point that satisfies this property. Indeed, suppose by contradiction that $y \neq x_0$ is such that $y \in \bigcap_{n=1}^{\infty} B_{1/n}(x_0)$. Since $y \neq x_0$, we have $d(x_0, y) > 0$. If we take $n$ sufficiently large, in particular if
\[
n > \frac{1}{d(x_0, y)}
\]
then its reciprocal $1/n$ will be sufficiently small so to have
\[
0 < \frac{1}{n} < d(x_0, y)
\]
Therefore, $y \notin B_{1/n}(x_0)$, which contradicts the assumption that $y \in \bigcap_{n=1}^{\infty} B_{1/n}(x_0)$. It follows that $x_0$ is the only point in the intersection $\bigcap_{n=1}^{\infty} B_{1/n}(x_0)$, i.e., (5.5) holds.
More generally, in the case of infinitely many neighborhoods $B_{\varepsilon_i}(x_0)$, if $\sup_i \varepsilon_i < +\infty$ we set $\varepsilon = \sup_i \varepsilon_i$, so that
\[
\bigcup_{i=1}^{\infty} B_{\varepsilon_i}(x_0) = B_\varepsilon(x_0)
\]
For example, in $\mathbb{R}$ we have
\[
\bigcup_{n=1}^{\infty} B_{\frac{1}{n}}(x_0) = \bigcup_{n=1}^{\infty} \left( x_0 - \frac{1}{n}, x_0 + \frac{1}{n} \right) = B_1(x_0)
\]
Summing up, finite intersections of neighborhoods are open sets and so are their arbitrary unions. The next result shows that these properties of stability continue to hold for all open sets.
Theorem 165 The intersection of a finite family of open sets is open, while the union of any family (finite or not) of open sets is open.

Proof Let $A = \bigcap_{i=1}^n A_i$ with each $A_i$ open. Each point $x \in A$ belongs to all sets $A_i$ and is interior to all of them (because they are open), i.e., there exist neighborhoods $B_{\varepsilon_i}(x)$ of $x$ such that $B_{\varepsilon_i}(x) \subseteq A_i$. Put $B = \bigcap_{i=1}^n B_{\varepsilon_i}(x)$. The set $B$ is still a neighborhood of $x$ (with radius $\varepsilon = \min\{\varepsilon_1, \dots, \varepsilon_n\}$) and $B \subseteq A_i$ for each $i$. So, $B$ is a neighborhood of $x$ contained in $A$. Therefore, $A$ is open.

Let $A = \bigcup_{i \in I} A_i$, where $i$ runs over a finite or infinite index set $I$. Each $x \in A$ belongs to at least one of the sets $A_i$, say to $A_{\bar{\imath}}$. Since all sets $A_i$ are open, there exists a neighborhood of $x$ contained in $A_{\bar{\imath}}$, and so in $A$. Therefore, $x$ is interior to $A$ and, given the arbitrariness of $x$, $A$ is open.
By Theorem 153 and by the De Morgan laws, it is easy to prove that dual properties hold for closed sets.

Corollary 166 The union of a finite family of closed sets is closed, while the intersection of any family (finite or not) of closed sets is closed.

In general, infinite unions of closed sets are not closed: for example, for the closed sets $A_n = [-1 + 1/n, 1 - 1/n]$ we have $\bigcup_{n=1}^{\infty} A_n = (-1, 1)$.
n=1
jxj < K 8x 2 A
The next de nition is the natural extension of this idea to Rn , where the absolute value is
replaced by the more general notion of norm.
kxk < K 8x 2 A
By recalling that kxk is the distance of x from the origin d(x; 0), it is easily seen that a
set A is bounded if, for every x 2 A, we have d(x; 0) < K, i.e., all its points have distance
from the origin smaller than K.5 So, a set A is bounded if is contained in a neighborhood
BK (0) of the origin, geometrically if it can be inscribed in a large enough open ball.
The neighborhoods of the origin can thus be seen as the prototypical bounded sets, used
as benchmarks to test, via set inclusion, the boundedness of any set. This brings to mind a
de nition at the beginning of Spinoza's Ethica: \A thing is said to be nite in its kind if it
can be limited by another thing of the same nature."6
Example 168 (i) Neighborhoods and their closures are bounded sets: it is sufficient to take $K > \varepsilon$. In contrast, $(a, \infty)$ is a simple example of an unbounded set (for this reason, it is called unbounded open interval). (ii) Subsets of bounded sets are, in turn, easily seen to be bounded. N
Proposition 169 A set $A$ is bounded if and only if there exists $K > 0$ such that, for every $x = (x_1, \dots, x_n) \in A$, we have
\[
|x_i| < K \qquad \forall i = 1, \dots, n
\]
Proof We prove the "if" and leave the converse to the reader. Let $x \in A$. If $|x_i| < K$ for all $i = 1, \dots, n$, then $x_i^2 < K^2$ for all $i = 1, \dots, n$. So, $\sum_{i=1}^n x_i^2 < nK^2$. In turn, this implies $\|x\| = \sqrt{\sum_{i=1}^n x_i^2} < \sqrt{n}\,K$. Since $x$ was arbitrarily chosen in $A$, by setting $K' = \sqrt{n}\,K$ it follows that $\|x\| < K'$ for each $x \in A$, so $A$ is bounded.
A set in $\mathbb{R}^n$ that is both closed and bounded is called compact.⁷ For example, all intervals $[a, b]$, which are closed and bounded in $\mathbb{R}$, are compact. More generally, the closure $\bar{B}_\varepsilon(x_0)$ of a neighborhood in $\mathbb{R}^n$ is compact. For example,
\[
\{x \in \mathbb{R}^n : \|x\| \leq 1\}
\]
is a classic compact set in $\mathbb{R}^n$, called closed unit ball. It generalizes to $\mathbb{R}^n$ the notion of closed unit ball that in Section 2.1 we presented in $\mathbb{R}^2$ (if the inequality is strict we have the open unit ball, which instead is an open set).

Like closedness, compactness is stable under finite unions and arbitrary intersections, as the reader can check.⁸

Example 171 Finite sets (so, the singletons) are compact. Indeed, in Example 154 we showed that they are closed sets. Since they are obviously bounded, they are then compact. N
Example 172 Provided there are no free goods, budget sets are a fundamental example of compact sets in consumer theory, as Proposition 992 will show. N

For instance, the boundary $\partial A$ of a compact set $A$ is a closed subset of $A$ (cf. Corollary 163) and so is a compact set. The boundary of the closed unit ball,
\[
\{x \in \mathbb{R}^n : \|x\| = 1\}
\]
is another classic compact set of $\mathbb{R}^n$, called unit sphere. It generalizes to $\mathbb{R}^n$ the unit circle in $\mathbb{R}^2$.
⁷ The empty set $\emptyset$ is considered a compact set.

⁸ Since the empty set is compact, the intersection of two disjoint compact sets is the empty (so, compact) set.
Theorem 174 A set $C$ in $\mathbb{R}^n$ is closed if and only if it contains the limit of every convergent sequence of its points. That is, $C$ is closed if and only if
\[
\{x_n\} \subseteq C, \; x_n \to x \implies x \in C \tag{5.6}
\]
Proof "Only if". Let $C$ be closed. Let $\{x_n\} \subseteq C$ be a sequence such that $x_n \to x$. We want to show that $x \in C$. Suppose, by contradiction, that $x \notin C$. Since $x_n \to x$, for every $\varepsilon > 0$ there exists $n_\varepsilon \geq 1$ such that $x_n \in B_\varepsilon(x)$ for every $n \geq n_\varepsilon$. Therefore, $x$ is a limit point of $C$, which contradicts $x \notin C$ because $C$ is closed and so contains all its limit points.

"If". Let $C$ be a set for which property (5.6) holds. By contradiction, suppose $C$ is not closed. Then, there exists at least a boundary point $x$ of $C$ that does not belong to $C$. Since it cannot be an isolated point (otherwise it would belong to $C$), by Lemma 140 $x$ is a limit point of $C$. Each neighborhood $B_{1/n}(x)$ does contain a point of $C$, call it $x_n$. The sequence of such points $x_n$ converges to $x \notin C$, contradicting (5.6). Hence, $C$ is closed.
This characterization is important: a set is closed if and only if "it is closed with respect to the limit operation", that is, if we never leave the set by taking limits of sequences. This is a main reason why in applications sets are often assumed to be closed: otherwise, one could get arbitrarily close to a point $x$ without being able to reach it, a "discontinuity" that applications typically do not feature (it would be like licking the windows of a pastry shop without being able to reach the, close yet unreachable, pastries).
Example 175 Let us show that the interval $C = [a, b]$ is closed using Theorem 174. Let $\{x_n\} \subseteq C$ be such that $x_n \to x \in \mathbb{R}$. By Theorem 174, to show that $C$ is closed it is sufficient to show that $x \in C$. Since $a \leq x_n \leq b$, a simple application of the comparison criterion shows that $a \leq x \leq b$, that is, $x \in C$. N
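Conversely, criterion (5.6) makes it easy to see why $[0, 1)$ is not closed: there are sequences in the set whose limit escapes it. A small numerical illustration (a sketch, not part of the text's argument):

```python
# The set A = [0, 1) is not closed: the sequence x_n = 1 - 1/n lies in A,
# but its limit 1 does not belong to A, so property (5.6) fails.
xs = [1 - 1 / n for n in range(1, 10_001)]
print(all(0 <= x < 1 for x in xs))  # True: every term of the sequence is in A
print(xs[-1])                       # 0.9999: the terms approach 1, which is not in A
```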
⁹ This section can be skipped at a first reading, and be read only after having studied sequences in Chapter 8.
Chapter 6
Functions (sdoganato)
Consider a shopkeeper who buys walnuts from a dealer at unit prices that depend on the quantity purchased, say according to the following table:

| Quantity (kg) | 10 | 20 | 30 | 40 |
|---|---|---|---|---|
| Unit price (euros per kg) | 4 | 3.9 | 3.8 | 3.7 |

In other words, if the shopkeeper buys 10 kg of walnuts he will pay them 4 euros per kg, if he buys 20 kg he will pay them 3.9 euros per kg, and so on. As is often the case, the dealer offers quantity discounts: the higher the quantity purchased, the lower the unit price.

The table is an example of a supply function that associates to each quantity the corresponding selling price, where $A = \{10, 20, 30, 40\}$ is the set of the quantities and $B = \{4, 3.9, 3.8, 3.7\}$ is the set of their unit prices. The supply function is a rule that, to each element of the set $A$, associates an element of the set $B$.
In general, we have:
Definition 177 Given any two sets $A$ and $B$, a function defined on $A$ and with values in $B$, denoted by $f : A \to B$, is a rule that associates to each element of the set $A$ one, and only one, element of the set $B$.

We write
\[
b = f(a)
\]
to indicate that, to the element $a \in A$, the function $f$ associates the element $b \in B$. Graphically:

[Figure: a function as an arrow diagram from the set A to the set B]
The rule can be completely arbitrary; what matters is that it associates to each element $a$ of $A$ only one element $b$ of $B$.¹ The arbitrariness of the rule is the key feature of the notion of function. It is one of the fundamental ideas of mathematics, key for applications, which has been fully understood not so long ago: the notion of function that we just presented was introduced in 1829 by Dirichlet after about 150 years of discussions (the first ideas on the subject go back at least to Leibniz at the end of the seventeenth century).

Note that it is perfectly legitimate that the same element of $B$ is associated to two (or more) different elements of $A$, that is:

[Figure: a legitimate assignment, with two elements of A mapped to the same element of B]

What is not legitimate, instead, is that two (or more) different elements of $B$ are associated to the same element of $A$, that is:

[Figure: an illegitimate assignment, with one element of A mapped to two elements of B]

In terms of the supply function in the initial example, different quantities of walnuts might well have the same unit price (e.g., if there are no quantity discounts), but the same quantity cannot have different unit prices!

¹ We have emphasized in italics the most important words: the rule must hold for each element of $A$ and, to each of them, it must associate only one element of $B$.
Before considering some examples, we introduce a bit of terminology. The two variables a
and b are called the independent variable and the dependent variable, respectively. Moreover,
the set A is called the domain of the function, while the set B is its codomain.
The codomain is the set in which the function takes on its values, but it does not necessarily contain only such values: it might well be larger. In this respect, the next notion is important: given $a \in A$, the element $f(a) \in B$ is called the image of $a$. Given any subset $C$ of the domain $A$, the set
\[
f(C) = \{f(a) : a \in C\} \subseteq B \tag{6.1}
\]
of the images of the points in $C$ is called the image of $C$. In particular, the set $f(A)$ of all the images of points of the domain is called image (or range) of the function $f$, denoted $\operatorname{Im} f$. Therefore, $\operatorname{Im} f$ is the subset of the codomain formed by the elements that are actually the image of some element of the domain:
\[
\operatorname{Im} f = f(A) = \{f(a) : a \in A\} \subseteq B
\]
Note that any set that contains $\operatorname{Im} f$ is a possible codomain for the function: if $\operatorname{Im} f \subseteq B$ and $\operatorname{Im} f \subseteq C$, then writing both $f : A \to B$ and $f : A \to C$ is fine. The choice of codomain is, ultimately, a matter of convenience. For example, throughout this book we will often consider functions that take on real values, that is, $f(a) \in \mathbb{R}$ for each $a$ in the domain of $f$. In this case, the most convenient choice for the codomain is the entire real line, so we will usually write $f : A \to \mathbb{R}$.
Example 178 (i) Let $A$ be the set of all countries in the world and $B$ a set containing some colors. If the function $f : A \to B$ associates on a geographic map to each country one of these colors, then $\operatorname{Im} f$ is the set of the colors used at least once on the map.

(ii) The rule that associates to each living human being his date of birth is a function $f : A \to B$, where $A$ is the set of the human beings and, for example, $B$ is the set of the dates of the last 150 years (a codomain sufficiently large to contain all the possible birthdates). N
Example 179 Consider the rule that associates to each positive scalar $x$ both the positive and the negative square roots, that is, $\{\sqrt{x}, -\sqrt{x}\}$. For example, it associates to $4$ the elements $\{-2, 2\}$. This rule does not describe a function $f : [0, \infty) \to \mathbb{R}$ because, to each element of the domain different from $0$, two different elements of the codomain are associated. N
Example 180 The cubic function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^3$ is a rule that associates to each scalar its cube. Since each scalar has a unique cube, this rule defines a function. Graphically:

[Figure: graph of the cubic function $f(x) = x^3$]

N
Example 181 The quadratic function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2$ associates to each scalar its square. Since each scalar has a unique square, this rule defines a function. Graphically:

[Figure: graph of the quadratic function $f(x) = x^2$]

In this case, to two different elements of the domain may correspond the same element of the codomain: for example, $f(1) = f(-1) = 1$. N
The clause "is a rule that" is usually omitted, and so we will do from now on.

Example 182 The square root function $f : [0, \infty) \to \mathbb{R}$ defined by $f(x) = \sqrt{x}$ associates to each positive scalar its (arithmetic) square root. The domain is the positive half-line and $\operatorname{Im} f = [0, \infty)$. Graphically:

[Figure: graph of the square root function $f(x) = \sqrt{x}$]

N
Example 183 The logarithmic function $f : (0, \infty) \to \mathbb{R}$ defined by $f(x) = \log_a x$, with $a > 0$ and $a \neq 1$, associates to each strictly positive scalar its logarithm. Its domain is $(0, \infty)$, while $\operatorname{Im} f = \mathbb{R}$. Graphically, for $a > 1$:

[Figure: graph of the logarithmic function for $a > 1$]

N
Example 184 The absolute value function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = |x|$ associates to each scalar its absolute value. This function has domain $\mathbb{R}$, with $\operatorname{Im} f = [0, \infty)$. Graphically:

[Figure: graph of the absolute value function $f(x) = |x|$]

N
Example 185 Let $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ be defined by $f(x) = 1/|x|$ for every scalar $x \neq 0$. Graphically:

[Figure: graph of $f(x) = 1/|x|$]

Here the domain is $A = \mathbb{R} \setminus \{0\}$, the real line without the origin. Moreover, $\operatorname{Im} f = (0, \infty)$. N
Example 186 (i) The function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by
\[
f(x_1, x_2) = x_1 + x_2 \tag{6.2}
\]
associates to each vector $x = (x_1, x_2) \in \mathbb{R}^2$ the sum of its components.⁴ For every $x \in \mathbb{R}^2$, such sum is unique, so the rule defines a function with $\operatorname{Im} f = f(\mathbb{R}^2) = \mathbb{R}$.

(ii) The function $f : \mathbb{R}^n \to \mathbb{R}$ defined by
\[
f(x_1, x_2, \dots, x_n) = \sum_{i=1}^n x_i
\]
generalizes (6.2) to $\mathbb{R}^n$. N

Example 187 (i) The function $f : \mathbb{R}^2_+ \to \mathbb{R}$ defined by
\[
f(x_1, x_2) = \sqrt{x_1 x_2} \tag{6.3}
\]
associates to each vector $x = (x_1, x_2) \in \mathbb{R}^2_+$ the square root of the product of the components. For each $x \in \mathbb{R}^2_+$, this root is unique, so the rule defines a function with $\operatorname{Im} f = \mathbb{R}_+$.

(ii) The function $f : \mathbb{R}^n_+ \to \mathbb{R}$ defined by
\[
f(x_1, x_2, \dots, x_n) = \prod_{i=1}^n x_i^{\alpha_i}
\]
with the exponents $\alpha_i > 0$ such that $\sum_{i=1}^n \alpha_i = 1$, generalizes to $\mathbb{R}^n$ the function of two variables (6.3), which is the special case with $n = 2$ and $\alpha_1 = \alpha_2 = 1/2$. It is widely used in economics with the name of Cobb-Douglas function. N

⁴ To be consistent with the notation adopted for vectors, we should write $f((x_1, x_2))$. But, to ease notation, we write $f(x_1, x_2)$.
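The Cobb-Douglas function is simple to implement (a sketch; the helper `cobb_douglas` and its input checks are ours):

```python
import numpy as np

def cobb_douglas(x, alpha):
    # f(x) = prod_i x_i^(alpha_i), with alpha_i > 0 summing to 1
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    assert np.all(alpha > 0) and np.isclose(alpha.sum(), 1.0)
    return float(np.prod(x ** alpha))

# With n = 2 and alpha = (1/2, 1/2) we recover f(x1, x2) = sqrt(x1 * x2):
print(cobb_douglas([4.0, 9.0], [0.5, 0.5]))  # 6.0 = sqrt(36)
```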
Functions can also take values in a Euclidean space. For example, define $f : \mathbb{R}^2 \to \mathbb{R}^2$ by
\[
f(x_1, x_2) = (x_1, x_1 x_2)
\]
For example, if $(x_1, x_2) = (2, 5)$, then $f(x_1, x_2) = (2, 2 \cdot 5) = (2, 10) \in \mathbb{R}^2$.
O.R. In the notation $b = f(a)$, the names of the variables are altogether irrelevant: we can indifferently write $a = f(b)$, or $y = f(x)$, or $s = f(t)$, or $\beta = f(\alpha)$, etc.: these names are just placeholders, what matters is only the sequence of operations (almost always numerical) that lead from $a$ to $b = f(a)$. Writing $b = a^2 + 2a + 1$ is exactly the same as writing $y = x^2 + 2x + 1$, or $s = t^2 + 2t + 1$, or even $\beta = \alpha^2 + 2\alpha + 1$. This function is identified by the operations "square + double + 1" that allow us to move from the independent variable to the dependent one. H
We close this introductory section by making rigorous the notion of graph of a function, until now used intuitively. For the quadratic function $f(x) = x^2$ the graph is the parabola

[Figure: the parabola $y = x^2$]

that is, the locus of the points $(x, x^2)$ of the plane, as $x$ varies in the real line, which is the domain of the function. For example, the points $(-1, 1)$, $(0, 0)$ and $(1, 1)$ belong to the parabola.
Formally, the graph of a function $f : A \to B$ is the set
\[
\operatorname{Gr} f = \{(x, f(x)) : x \in A\} \subseteq A \times B
\]
(i) When $A \subseteq \mathbb{R}$ and $B \subseteq \mathbb{R}$, the graph is a subset of the plane $\mathbb{R}^2$, i.e., a curve. Graphically:

[Figure: the graph of a function of one variable as a curve in the plane]

(ii) When $A \subseteq \mathbb{R}^2$ and $B \subseteq \mathbb{R}$, the graph is a subset of the three-dimensional space $\mathbb{R}^3$, i.e., a surface (without thickness). Graphically:

[Figure: the graph of a function of two variables as a surface in space]
6.2 Applications

6.2.1 Static choices

Let us interpret the vectors in $\mathbb{R}^n_+$ as bundles of goods (Section 2.4). It is natural to assume that the consumer will prefer some bundles to others. For example, it is reasonable to assume that, if $x \geq y$ (bundle $x$ is "richer" than $y$), then $x$ is preferred to $y$. In symbols, we then write $x \succsim y$, where the symbol $\succsim$ represents the preference (binary) relation of the consumer over the bundles.

In general, we assume that the preference $\succsim$ over the available bundles of goods can be represented by a function $u : \mathbb{R}^n_+ \to \mathbb{R}$, called utility function, such that
\[
x \succsim y \iff u(x) \geq u(y)
\]
That is, bundle $x$ is preferred to $y$ if and only if it gets a higher "utility". The image, $\operatorname{Im} u$, represents all the levels of utility that can be attained by the consumer.
Originally, around 1870, the first marginalists, in particular Jevons and Walras, interpreted $u(x)$ as the level of physical satisfaction caused by the bundle $x$. They gave, therefore, a physiological interpretation of utility functions, which quantified the emotions that consumers felt in owning different bundles. It is the so-called cardinalist interpretation of the utility functions that goes back to Jeremy Bentham and to his "pain and pleasure calculus".⁵ The utility functions, besides representing the preference $\succsim$, are inherently interesting because they quantify an emotional state of the consumer, i.e., the degree of pleasure determined by the bundles. In addition to the comparison $u(x) \geq u(y)$, it is also meaningful to compare the differences
\[
u(x) - u(y) \geq u(z) - u(w) \tag{6.5}
\]
which indicate that bundle $x$ is more intensively preferred to bundle $y$ than bundle $z$ is relative to bundle $w$. Moreover, since $u(x)$ measures the degree of pleasure that the consumer gets from the bundle $x$, in the cardinalist interpretation it is also legitimate to compare these measures among different consumers, i.e., to make interpersonal comparisons of utility. Such interpersonal comparisons can then be used, for example, to assess the impact of different economic policies on the welfare of the economic agents. For instance, we can ask whether a given policy, though making some agents worse off, still increases the overall utility across agents.
The cardinalist interpretation came into question at the end of the nineteenth century due to the impossibility of measuring experimentally the physiological aspects that were assumed to underlie utility functions.⁶ For this reason, with the works of Vilfredo Pareto at the beginning of the twentieth century, developed first by Eugen Slutsky in 1915 and then by John Hicks in the 1930s,⁷ the ordinalist interpretation of the utility functions prevailed: more modestly, it is assumed that they are only a mere numerical representation of the preference $\succsim$ of the consumer. According to such a less demanding interpretation, what matters is only that the ordering $u(x) \geq u(y)$ represents the preference for bundle $x$ over bundle $y$, that is, $x \succsim y$. Instead, it is no longer of interest to know if it also represents the, more or less intense, consumers' emotions over the bundles. In other terms, in the ordinalist approach the fundamental notion is the preference $\succsim$, while the utility function becomes just a numerical representation of it. The comparisons of intensity (6.5), as well as the interpersonal comparisons of utility, no longer have meaning.
⁵ See his Introduction to the Principles of Morals and Legislation, published in 1789.

⁶ Around 1901, the famous mathematician Henri Poincaré wrote to Léon Walras: "I can say that one satisfaction is greater than another, since I prefer one to the other, but I cannot say that the first satisfaction is two or three times greater than the other." Poincaré, with great sensibility, understood a key issue.

⁷ We refer interested readers to Stigler (1950).
At the empirical level, the consumers' preferences $\succsim$ are revealed through their choices among bundles, which are much simpler to observe than emotions or other mental states.

The ordinalist interpretation became the mainstream one because, besides the superior empirical content just mentioned, the works of Pareto showed that it is sufficient for developing a powerful consumer theory (cf. Section 22.1.4). So, Occam's razor was a further reason to abandon the earlier cardinalist interpretation. Nevertheless, economists often use, at an intuitive level, cardinalist categories because of their introspective plausibility.

Be that as it may, through utility functions we can address the problem of a consumer who has to choose a bundle within a given set $A$ of $\mathbb{R}^n_+$. The consumer will be guided in such a choice by his utility function $u : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$; namely, $u(x) \geq u(y)$ indicates that the consumer prefers the bundle $x$ of goods to the bundle $y$ or that he is indifferent between the two.
For example,
\[
u(x) = \sum_{i=1}^n x_i
\]
is the utility function of a consumer that orders the bundles simply according to the sum of the quantities of the different goods that they contain. The classic Cobb-Douglas utility function is
\[
u(x) = \prod_{i=1}^n x_i^{\alpha_i}
\]
with the exponents $\alpha_i > 0$ such that $\sum_{i=1}^n \alpha_i = 1$ (see Example 187). When $\alpha_i = 1/n$ for each $i$, we have
\[
u(x) = \prod_{i=1}^n x_i^{\frac{1}{n}} = \left( \prod_{i=1}^n x_i \right)^{\frac{1}{n}}
\]
with bundles being ordered according to the $n$-th root of the product of the quantities of the different goods that they contain.⁸
We close by considering a producer that has to decide how much output to produce (Section 2.4). In such a decision the production function $f : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$ plays a crucial role in that it describes how much output $f(x)$ is obtained by starting from a vector $x \in \mathbb{R}^n_+$ of inputs. For example,
\[
f(x) = \left( \prod_{i=1}^n x_i \right)^{\frac{1}{n}}
\]
is the Cobb-Douglas production function in which the output is equal to the $n$-th root of the product of the input components.
⁸ Because of its multiplicative form, the bundles with at least one zero component $x_i$ have zero utility according to the Cobb-Douglas utility function. Since it is not that plausible that the presence of a zero component has such drastic consequences, this utility function is often defined only on $\mathbb{R}^n_{++}$ (as we will also often do).
Consider now a consumer who consumes quantities $x_1, x_2, \dots, x_T$ of a good in the periods $t = 1, 2, \dots, T$. A standard way to evaluate the resulting consumption stream $x = (x_1, \dots, x_T)$ is through an intertemporal utility function of the form
\[
U(x) = u(x_1) + \beta u(x_2) + \beta^2 u(x_3) + \cdots + \beta^{T-1} u(x_T) = \sum_{t=1}^T \beta^{t-1} u(x_t)
\]
where $\beta \in (0, 1)$ is a subjective discount factor that depends on how "patient" the consumer is. The more patient the consumer is, i.e., the more he is willing to postpone his consumption of a given quantity of the good, the higher the value of $\beta$ is. In particular, the closer $\beta$ gets to $1$, the closer we approach the form
\[
U(x) = u(x_1) + u(x_2) + \cdots + u(x_T) = \sum_{t=1}^T u(x_t)
\]
in which consumption in each period is evaluated in the same way. In contrast, the closer $\beta$ gets to $0$, the closer $U(x)$ gets to $u(x_1)$, that is, the consumer becomes extremely impatient and does not give any importance to future consumptions.
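The role of the discount factor is easy to see numerically (an illustrative sketch; the helper `discounted_utility` is ours, and the logarithmic period utility is only an example, not a choice made in the text):

```python
import math

def discounted_utility(x, beta, u=math.log):
    # U(x) = sum over t of beta^(t-1) * u(x_t), with beta in (0, 1)
    return sum(beta ** t * u(x_t) for t, x_t in enumerate(x))

stream = [1.0, 2.0, 3.0]
print(discounted_utility(stream, beta=0.5))   # impatient: later periods count little
print(discounted_utility(stream, beta=0.99))  # patient: close to the undiscounted sum
```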
6.3 General properties

Given $y \in B$, the preimage of $y$, denoted by $f^{-1}(y)$, is the set
\[
f^{-1}(y) = \{x \in A : f(x) = y\}
\]
of the elements of the domain whose image is $y$. More generally, given any subset $D$ of the codomain $B$, its preimage $f^{-1}(D)$ is the set⁹
\[
f^{-1}(D) = \{x \in A : f(x) \in D\}
\]
Example 190 Consider the function $f : A \to B$ that to each (living) person associates the year of birth. If $y \in B$ is a possible such year, $f^{-1}(y)$ is the set of the persons that have $y$ as year of birth; in other words, all the persons in $f^{-1}(y)$ have the same age (they form a cohort, in the demography terminology). N

⁹ For the sake of brevity, we will consider as sets $D$ only intervals and singletons, but similar considerations hold for other types of sets.
For instance, let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = x^2$. Note that $f^{-1}(a, b) = f^{-1}([0, b))$ when $a < 0$.¹⁰ Indeed, the elements between $a$ and $0$ have no preimage. For example, if $D = (-1, 2)$, then $f^{-1}(D) = (-\sqrt{2}, \sqrt{2})$. Since
\[
f^{-1}(D) = f^{-1}([0, 2)) = f^{-1}(-1, 2)
\]
the negative elements of $D$ are irrelevant (as they do not belong to the image of the function). N
Particularly important is the preimage of a singleton: given $k \in \mathbb{R}$, the set
\[
f^{-1}(k) = \{x \in A : f(x) = k\}
\]
is called the level curve of $f$ at level $k$.

Example 193 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by $f(x_1, x_2) = x_1^2 + x_2^2$. For every $k \geq 0$, the level curve $f^{-1}(k)$ is the locus in $\mathbb{R}^2$ of equation
\[
x_1^2 + x_2^2 = k
\]
¹⁰ To ease notation, we denote the preimage of an open interval $(a, b)$ by $f^{-1}(a, b)$ instead of $f^{-1}((a, b))$.
That is, it is the circumference with center at the origin and radius $\sqrt{k}$. Graphically, the level curves can be represented as:

[Figure: level curves of $f(x_1, x_2) = x_1^2 + x_2^2$, concentric circles centered at the origin]

N
Two different level curves of the same function cannot have any point in common, that is,
\[
k_1 \neq k_2 \implies f^{-1}(k_1) \cap f^{-1}(k_2) = \emptyset \tag{6.7}
\]
Indeed, if there were a point $x \in A$ belonging to both the curves of levels $k_1$ and $k_2$, we would have $f(x) = k_1$ and $f(x) = k_2$ with $k_1 \neq k_2$, but this is impossible because, by definition, a function may assume only one value at each point.
Example 194 Let f : A ⊆ R² → R be given by f(x₁, x₂) = √(7x₁² − x₂), where A consists of the points x = (x₁, x₂) in the plane such that 7x₁² − x₂ ≥ 0. For every k ≥ 0, the level curve f⁻¹(k) is the locus in R² of equation √(7x₁² − x₂) = k, that is, x₂ = −k² + 7x₁². It is a parabola that intersects the vertical axis at −k². Graphically:
[Figure: the level curves k = 0, k = 1, k = 2 of f, parabolas in the (x₁, x₂) plane.]
Example 195 The function

f(x₁, x₂) = √((x₁² + x₂²)/x₁)

is defined only for x₁ > 0. Its level curves f⁻¹(k) are the loci of equation

√((x₁² + x₂²)/x₁) = k

that is, x₁² + x₂² − k²x₁ = 0. Therefore, they are circumferences passing through the origin
and with centers (k²/2, 0), all on the horizontal axis (why?).
Although all such circumferences have the origin as a common point, the "true" level curves are these circumferences without the origin, because at (0, 0) the function is not defined. So, they do not actually have any point in common. N
O.R. The equation f(x₁, x₂) = k of a generic level curve of a function f of two variables can be rewritten, in an apparently more complicated form, as the system

{ y = f(x₁, x₂)
{ y = k

Note that:

(i) the equation y = f(x₁, x₂) represents the surface in R³ that is the graph of f;

(ii) the equation y = k represents a horizontal plane (it contains the points (x₁, x₂, k) ∈ R³, i.e., all the points of "height" k);

(iii) the brace "{" geometrically means intersection between the sets defined by the two previous equations.

The curve of level k is, therefore, viewed as the intersection between the surface that represents f and a horizontal plane.
[Figure: the surface representing f cut by a horizontal plane; axes x₁, x₂, x₃.]
Hence, the different level curves are obtained by cutting the surface with horizontal planes (at different levels). They represent the edges of the "slices" obtained in this way on the plane (x₁, x₂). H
Example 196 Consider the Cobb-Douglas utility function u : R²₊ → R given by u(x) = √(x₁x₂). We have

u⁻¹(0) = {x ∈ R²₊ : x₁ = 0 or x₂ = 0}

that is, this indifference curve is the union of the axes of the positive orthant. On the other hand, for every k > 0 we have

u⁻¹(k) = {x ∈ R²₊ : √(x₁x₂) = k} = {x ∈ R²₊ : x₁x₂ = k²} = {x ∈ R²₊ : x₂ = k²/x₁}

Therefore, the indifference curve of level k > 0 is the hyperbola of equation

x₂ = k²/x₁

By varying k ≥ 0, we get the indifference map {u⁻¹(k) : k ≥ 0}. Graphically:
[Figure: indifference curves of levels k = 1, 2, 3, hyperbolas in the positive orthant.]
Introductory economics courses emphasize that indifference curves "do not cross", i.e., are disjoint: k₁ ≠ k₂ implies u⁻¹(k₁) ∩ u⁻¹(k₂) = ∅. Clearly, this is just a special case of the more general property (6.7) that holds for any family of level curves.
The level curves of a production function f : A ⊆ Rⁿ₊ → R are called isoquants. An isoquant is, thus, the set of all the input vectors x ∈ A that produce the same output. The set {f⁻¹(k) : k ∈ R} of all the isoquants is sometimes called the isoquant map.
Finally, the level curves

c⁻¹(k) = {x ∈ A : c(x) = k}

of a cost function c : A ⊆ Rⁿ₊ → R are called isocosts. So, an isocost is the set of all the levels of output x ∈ A that have the same cost. The set {c⁻¹(k) : k ∈ R} of all the isocosts is sometimes called the isocost map.
In sum, indifference curves, isoquants and isocosts are all examples of level curves, whose general properties they inherit. For example, the fact that two level curves have no points in common – property (6.7) – implies the analogous classic property of the indifference curves, as already noted, as well as the property that isoquants and isocosts never intersect.
Definition 197 Given any two functions f and g in R^A, the sum function f + g is the element of R^A such that

(f + g)(x) = f(x) + g(x)   ∀x ∈ A

The sum function f + g : A → R is thus constructed by adding, for each element x of the domain A, the images f(x) and g(x) of x under the two functions.
Example 198 Let R^R be the set of all the functions f : R → R. Consider f(x) = x and g(x) = x². The sum function f + g is defined by (f + g)(x) = x + x². N
(iii) the quotient function (f/g)(x) = f(x)/g(x) for every x ∈ A, provided g(x) ≠ 0.
We have thus introduced four operations in the set R^A, based on the four basic operations on the real numbers. It is easy to see that these operations inherit the properties of the basic operations. For example, addition is commutative, f + g = g + f, and associative, (f + g) + h = f + (g + h).
In particular, the negative function −f of f, defined by

(−f)(x) = −f(x)   ∀x ∈ A

can be seen as the difference between the function constantly equal to 0 and f.
N.B. (i) These operations require the functions to have the same domain A. For example, if f(x) = x² and g(x) = √x, the sum f + g is meaningful only when A = [0, ∞) for both functions, that is, when f is restricted to the positive half-line. Indeed, for x < 0 the function g is not defined. (ii) The domain A can be any set: numbers, chairs, or other. Instead, it is key that the codomain is R because it is among real numbers that we are able to perform the four basic operations. O
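The pointwise operations just introduced are naturally expressed as higher-order functions. A minimal Python sketch (ours), mirroring Definition 197 and Example 198:

```python
def fsum(f, g):
    # the sum function: (f + g)(x) = f(x) + g(x), defined pointwise
    return lambda x: f(x) + g(x)

def fquot(f, g):
    # the quotient function: (f/g)(x) = f(x)/g(x), meaningful only where g(x) != 0
    return lambda x: f(x) / g(x)

f = lambda x: x
g = lambda x: x ** 2
h = fsum(f, g)
print(h(3))          # 3 + 9 = 12, i.e., (f + g)(x) = x + x^2 at x = 3
```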
6.3.3 Composition
[Figure: x ∈ A is mapped by f to f(x) ∈ Im f ⊆ C, and then by g to g(f(x)) ∈ D.]
We have, therefore, associated to each element x of the set A the element g(f(x)) of the set D. This rule, called composition, starts with the functions f and g and defines a new function from A to D, denoted by g ∘ f. Formally:

Note that the inclusion condition, Im f ⊆ C, is key in making the composition possible. Let us give some examples.
Example 203 Let A be the set of all citizens of a country, f : A → R the function that associates to each of them his income for this year, and g : R → R the function that associates to each possible income the tax that must be paid. The composite function g ∘ f : A → R establishes the correspondence between each citizen and the tax that he has to pay. For the revenue service (and also for the citizens) such a composite function is of great interest. N
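A sketch (ours) of composition in the spirit of Example 203; the flat income and tax schedule are purely hypothetical stand-ins:

```python
def compose(g, f):
    # (g o f)(x) = g(f(x)); requires Im f to be contained in the domain of g
    return lambda x: g(f(x))

income = lambda citizen: 30_000.0    # hypothetical income function f
tax = lambda y: 0.25 * y             # hypothetical tax schedule g
tax_due = compose(tax, income)       # citizen -> tax to be paid
print(tax_due("Rossi"))              # 7500.0
```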
To different elements of the domain, an injective f thus associates different elements of the codomain. Graphically:
[Figure: an injective function sends the distinct elements a₁, a₂ of A to the distinct elements b₁, b₂ of B.]
Example 205 A simple example of an injective function is the cubic f(x) = x³. Indeed, two distinct scalars always have distinct cubes: x ≠ y implies x³ ≠ y³ for all x, y ∈ R. A classic example of a non-injective function is the quadratic f(x) = x²: for instance, to the two distinct points 2 and −2 of R there corresponds the same square, that is, f(2) = f(−2) = 4. N
which requires that two elements of the domain that have the same image be equal.12
Given any two sets A and B, a function f : A → B is called surjective (or onto) if

Im f = B

that is, if for each element y of B there exists at least one element x of A such that f(x) = y. In other words, a function is surjective if each element of the codomain is the image of at least one point in the domain.
Example 206 The cubic function f : R → R given by f(x) = x³ is surjective because each y ∈ R is the image of y^{1/3} ∈ R, that is, f(y^{1/3}) = y. On the other hand, the quadratic function f : R → R given by f(x) = x² is not surjective, because no y < 0 is the image of a point of the domain. N
Finally, given any two sets A and B, a function f : A → B is called bijective if it is both injective and surjective. In this case, we can go "back and forth" between the sets A and B by using f: from any x ∈ A we arrive at a unique y = f(x) ∈ B, while from any y ∈ B we go back to a unique x ∈ A such that y = f(x). Graphically:
12 Given two properties p and q, we have p ⟹ q if and only if ¬q ⟹ ¬p (¬ stands for "not"). The implication ¬q ⟹ ¬p is the contrapositive of the original implication p ⟹ q. See Appendix D.
[Figure: a bijective function pairs the elements a₁, a₂, a₃ of A with the elements b₁, b₂, b₃ of B.]
Through bijective functions we can establish a simple, but interesting, result about finite sets. Here |A| denotes the cardinality of a finite set A, that is, the number of its elements.

Proposition 207 Let A and B be any two non-empty finite sets. There exists a bijection f : A → B if and only if |A| = |B|.
Proof Since A and B are finite, let A = {a₁, a₂, ..., aₙ} and B = {b₁, b₂, ..., bₘ} with m, n ≥ 1. "If". Let n = |A| = |B| = m. Then define the bijection f : A → B by f(aᵢ) = bᵢ for i = 1, 2, ..., n. "Only if". Let f : A → B be a bijection. Since f is injective, the elements f(aᵢ) in B are distinct. Since f is surjective, we have B = Im f = {f(a₁), ..., f(aₙ)}, and so the two sets have the same number of elements: |B| = |{f(a₁), ..., f(aₙ)}| = n = |A|.
We have both

f⁻¹(f(x)) = x   ∀x ∈ A   (6.9)

and

f(f⁻¹(y)) = y   ∀y ∈ Im f   (6.10)
Inverse functions go in the opposite direction to the original ones: they retrace their steps back to the domain. From x ∈ A we arrive at f(x) ∈ B, and we go back with f⁻¹(f(x)) = x. Graphically:
[Figure: f maps x ∈ A to y ∈ B, and f⁻¹ retraces the step back from y to x.]
It makes sense to talk about the inverse function only for injective functions, which are then called invertible. Indeed, if f were not injective, there would be at least two elements x₁ ≠ x₂ of the domain with the same image y = f(x₁) = f(x₂). So, the set of the preimages of y would not be a singleton (because it would contain at least the two elements x₁ and x₂) and the relation f⁻¹ would not be a function.
We actually have f⁻¹ : B → A when the function f is also surjective, and so bijective. In such a case the domain of the inverse is the entire codomain of f. In particular, when A = B the relations (6.9) and (6.10) can be summarized as

f⁻¹ ∘ f = f ∘ f⁻¹

In this important case – think of A = B = Rⁿ – the function and its inverse properly commute.
Example 209 (i) Let f : R → R be the bijective function f(x) = x³. From y = x³ it follows that x = y^{1/3}. The inverse f⁻¹ : R → R is given by f⁻¹(y) = y^{1/3}, that is, because of the irrelevance of the label of the independent variable, f⁻¹(x) = x^{1/3}.

(ii) Let f : R → R be the injective function f(x) = 3^x. From y = 3^x it follows that x = log₃ y. The inverse f⁻¹ : (0, ∞) → R is given by f⁻¹(y) = log₃ y, that is, f⁻¹(x) = log₃ x. N
Example 210 Let f : R → R be defined by

f(x) = { x/2  if x < 0
       { 3x   if x ≥ 0

From y = x/2 it follows x = 2y, while from y = 3x it follows x = y/3. Therefore,

f⁻¹(y) = { 2y   if y < 0
         { y/3  if y ≥ 0
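The branch-by-branch inversion of Example 210 can be verified directly; a minimal sketch (ours):

```python
def f(x):
    # the piecewise function of Example 210
    return x / 2 if x < 0 else 3 * x

def f_inv(y):
    # its inverse, obtained by inverting each branch
    return 2 * y if y < 0 else y / 3

for x in (-4.0, 0.0, 2.5):
    assert f_inv(f(x)) == x          # f^{-1}(f(x)) = x, relation (6.9)
```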
Example 211 Let f : R \ {0} → R be defined by f(x) = 1/x. From y = 1/x it follows that x = 1/y, so f⁻¹ : R \ {0} → R is given by f⁻¹(y) = 1/y. In this case f = f⁻¹. Note that R \ {0} is both the domain of f⁻¹ and the image of f. N
Example 213 On the open unit ball B₁(0) = {x ∈ Rⁿ : ‖x‖ < 1}, define the map f : B₁(0) → Rⁿ by

f(x) = x/(1 − ‖x‖)

For instance, when n = 2 we have

f(x₁, x₂) = (x₁/(1 − √(x₁² + x₂²)), x₂/(1 − √(x₁² + x₂²)))

This map is injective. For, suppose per contra that there exist x, z ∈ B₁(0), with x ≠ z, such that f(x) = f(z). Without loss of generality, let x ≠ 0. Then,

x/(1 − ‖x‖) = z/(1 − ‖z‖) ⟹ z = λx

where λ = (1 − ‖z‖)/(1 − ‖x‖). Thus, x and z are collinear (Example 75). As x, z ∈ B₁(0), we actually have λ > 0. Thus, f(x) = f(λx) and so

x/(1 − ‖x‖) = λx/(1 − ‖λx‖)

that is, (1 − ‖λx‖)x = λ(1 − ‖x‖)x. By taking the norm on both sides of this equality, we get

(1 − ‖λx‖)‖x‖ = λ(1 − ‖x‖)‖x‖

As x ≠ 0, so ‖x‖ > 0, this implies that λ = 1. Being z = λx, we thus reach the contradiction x = z. We conclude that f is injective. This map is also surjective. For, let y ∈ Rⁿ. Set x = y/(1 + ‖y‖). We have

‖x‖ = ‖y/(1 + ‖y‖)‖ = ‖y‖/(1 + ‖y‖) < 1

as well as

f(x) = (y/(1 + ‖y‖)) / (1 − ‖y‖/(1 + ‖y‖)) = (y/(1 + ‖y‖)) / (1/(1 + ‖y‖)) = y

as desired. N
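Both halves of Example 213 can be checked numerically: f sends the open unit ball onto Rⁿ, and y ↦ y/(1 + ‖y‖) inverts it. A sketch (ours) for n = 3:

```python
import math, random

def f(x):
    # f(x) = x / (1 - ||x||) on the open unit ball
    norm = math.sqrt(sum(t * t for t in x))
    return [t / (1 - norm) for t in x]

def f_inv(y):
    # the point used in the surjectivity argument: y / (1 + ||y||)
    norm = math.sqrt(sum(t * t for t in y))
    return [t / (1 + norm) for t in y]

random.seed(0)
y = [random.uniform(-5, 5) for _ in range(3)]
x = f_inv(y)
assert math.sqrt(sum(t * t for t in x)) < 1             # x lands in B_1(0)
assert all(abs(a - b) < 1e-9 for a, b in zip(f(x), y))  # f(f_inv(y)) = y
```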
g₂⁻¹(x) = (1/2) x^{1/3}   and   g_{1/4}⁻¹(x) = 4x^{1/3}

Indeed,

g₂(g₂⁻¹(x)) = 8((1/2) x^{1/3})³ = x   and   g_{1/4}(g_{1/4}⁻¹(x)) = (4x^{1/3})³/64 = x

Formula (6.11) continues to hold when g is strictly monotone, be it increasing or decreasing, on the real line, as the reader can check. N
The last example, which involves the composition g ∘ f with f(x) = mx, suggests a rule that connects composition and inversion. It is easy to see that, when it exists, the inverse (g ∘ f)⁻¹ of the composite function g ∘ f is

f⁻¹ ∘ g⁻¹   (6.12)

That is, it is the composition of the inverse functions, but with their places exchanged. Indeed, from y = g(f(x)) we get g⁻¹(y) = f(x) and finally f⁻¹(g⁻¹(y)) = x. On the other hand, in dressing, we first put on the underpants, f, and then the pants, g; in undressing, we first take off the pants, g⁻¹, and then the underpants, f⁻¹.13
13 A caveat. Formula (6.12) presupposes that both f⁻¹ and g⁻¹ exist. This is the case when Im f is equal to the domain C of g, as the reader can check.
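Formula (6.12) is easy to test numerically; a sketch (ours) with f(x) = 2x and g(x) = x³, for which (g ∘ f)⁻¹ should coincide with f⁻¹ ∘ g⁻¹:

```python
import math

f = lambda x: 2 * x                  # f(x) = mx with m = 2
g = lambda x: x ** 3
gf = lambda x: g(f(x))               # (g o f)(x) = 8x^3

f_inv = lambda y: y / 2
g_inv = lambda y: math.copysign(abs(y) ** (1 / 3), y)   # real cube root
gf_inv = lambda y: f_inv(g_inv(y))   # f^{-1} o g^{-1}, as in (6.12)

for x in (-1.5, 0.25, 2.0):
    assert abs(gf_inv(gf(x)) - x) < 1e-9
```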
Inverses and cryptography The computation of the cube x³ of any scalar x is much easier than the computation of the cube root ∛x: it is much easier to compute 80³ = 512,000 (three multiplications suffice) than ∛512,000 = 80. In other words, the computation of the cubic function f(x) = x³ is much easier than the computation of its inverse f⁻¹(x) = ∛x. This computational difference increases significantly as we take higher and higher odd powers (for example f(x) = x⁵, f(x) = x⁷ and so on).
Similarly, while the computation of e^x is fairly easy, that of log x is much harder (before electronic calculators became available, logarithmic tables were used to aid such computations). From a computational viewpoint (in the theoretical world everything works smoothly), the inverse function f⁻¹ may be very difficult to deal with. Injective functions for which the computation of f is easy, while that of the inverse f⁻¹ is complex, are called one-way.14
For example, let A = {(p, q) ∈ P × P : p < q} and consider the function f : A ⊆ P × P → N defined by f(p, q) = pq that associates to each pair of prime numbers p, q ∈ P, with p < q, their product pq. For example, f(2, 3) = 6 and f(11, 13) = 143. By the Fundamental Theorem of Arithmetic, it is an injective function.15 Given two prime numbers p and q, the computation of their product is a trivial multiplication. Instead, given any natural number n it is quite complex, and may require a long time even for a powerful computer, to determine if it is the product of two prime numbers. In this regard, the reader may recall the discussion regarding factorization and primality tests from Section 1.3.2 (to experience the difficulty first-hand, the reader may try to check whether the number 4343 is the product of two prime numbers). This makes the computation of the inverse function f⁻¹ very complex, as opposed to the very simple computation of f. For this reason, f intuitively qualifies as a one-way function.

14 The notions of "simple" and "complex", here used qualitatively, can be made more rigorous (as the curious reader may discover in cryptography texts).
15 But not surjective: for example, 4 ∉ Im f because there are no two different prime numbers whose product is 4.
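Trial division makes the asymmetry tangible: multiplying two primes is a single operation, while recovering them from their product requires a search. A naive sketch (ours; real cryptographic factoring problems use vastly larger numbers):

```python
def two_prime_factorization(n):
    # returns (p, q) if n = pq with p <= q both prime, else None
    d = 2
    while d * d <= n:
        if n % d == 0:
            q = n // d
            # d is the smallest divisor > 1, hence prime; check whether q is prime
            if all(q % e for e in range(2, int(q ** 0.5) + 1)):
                return d, q
            return None
        d += 1
    return None   # n itself is prime (or 1)

print(two_prime_factorization(143))    # (11, 13), matching f(11, 13) = 143
print(two_prime_factorization(4343))   # (43, 101): 4343 is such a product
```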
(i) bounded (from) above if its image Im f is a set bounded above in R, i.e., if there exists M ∈ R such that f(x) ≤ M for all x ∈ A;

(ii) bounded (from) below if its image Im f is a set bounded below in R, i.e., if there exists m ∈ R such that f(x) ≥ m for all x ∈ A;
Lemma 215 A function f : A → R is bounded if and only if there exists k > 0 such that

|f(x)| ≤ k   ∀x ∈ A   (6.13)

Proof If f is bounded, there exist m, M ∈ R such that m ≤ f(x) ≤ M. Let k > 0 be such that −k ≤ m ≤ M ≤ k. Then (6.13) holds. Vice versa, suppose that (6.13) holds. By (4.5), which holds also for ≤, we have −k ≤ f(x) ≤ k, so f is bounded both above and below.
Thus, we have a first taxonomy of the real-valued functions f : A → R, that is, of the elements of the space R^A.16 This taxonomy is not exhaustive: there exist functions that do not satisfy any of the conditions (i)-(iii). This is the case, for example, of the identity function f(x) = x. Such "unclassified" functions are called unbounded (their image being an unbounded set).
By the definition of the supremum, for a scalar M we have f(x) ≤ M for all x ∈ A if and only if sup_{x∈A} f(x) ≤ M.

Similarly, we denote by inf_{x∈A} f(x) the infimum of the image of a function f : A → R bounded below, that is,

inf_{x∈A} f(x) = inf (Im f)

By the definition of the infimum, for a scalar m we have f(x) ≥ m for all x ∈ A if and only if inf_{x∈A} f(x) ≥ m.

Clearly, a bounded function f : A → R has both extrema, with

−∞ < inf_{x∈A} f(x) ≤ sup_{x∈A} f(x) < +∞

In particular, for two scalars m and M we have m ≤ f(x) ≤ M for all x ∈ A if and only if m ≤ inf_{x∈A} f(x) ≤ sup_{x∈A} f(x) ≤ M.
Example 216 For the function (6.14) we have sup_{x∈R} f(x) = 1 and inf_{x∈R} f(x) = −2. For the function f : R \ {0} → R given by f(x) = 1/|x|, which is bounded below but not above, one has inf_{x∈R\{0}} f(x) = 0. N
(i) increasing if

x > y ⟹ f(x) ≥ f(y)   ∀x, y ∈ A   (6.15)

and strictly increasing if

x > y ⟹ f(x) > f(y)   ∀x, y ∈ A   (6.16)

(ii) decreasing if

x > y ⟹ f(x) ≤ f(y)   ∀x, y ∈ A   (6.17)

and strictly decreasing if

x > y ⟹ f(x) < f(y)   ∀x, y ∈ A

(iii) constant if there exists k ∈ R such that

f(x) = k   ∀x ∈ A
Note that a function is constant if and only if it is both increasing and decreasing. In other words, constancy is equivalent to having both monotonicity properties. This is why we have introduced constancy among the forms of monotonicity. Soon, we will see that in the multivariable case the relation between constancy and monotonicity is more subtle.

Increasing or decreasing functions are called, generically, monotone (or monotonic). They are called strictly monotone when they are either strictly increasing or strictly decreasing (two mutually exclusive properties: no function can be both strictly increasing and strictly decreasing). The next result shows that strict monotonicity excludes the possibility that the function is constant on some region of its domain. Formally:
A similar result holds for strictly decreasing functions since f is, clearly, strictly decreasing if and only if −f is strictly increasing. Strictly monotone functions are therefore injective, and so invertible.17
Proof "Only if". Let f be strictly increasing and let f(x) = f(y). Suppose, by contradiction, that x ≠ y, say x > y. By (6.16), we have f(x) ≠ f(y), which contradicts f(x) = f(y). It follows that x = y, as desired.

"If". Suppose that (6.18) holds. Let f be increasing. We prove that it is also strictly increasing. Let x > y. By increasing monotonicity, we have f(x) ≥ f(y), but we cannot have f(x) = f(y) because (6.18) would imply x = y. Thus f(x) > f(y), as claimed.
Example 219 The functions f : R → R given by f(x) = x and f(x) = x³ are strictly increasing, while the function

f(x) = { x  if x ≥ 0
       { 0  if x < 0

is increasing, but not strictly increasing, because it is constant for every x < 0. The same is true for the function

f(x) = { x − 1  if x ≥ 1
       { 0      if −1 < x < 1      (6.19)
       { x + 1  if x ≤ −1

because it is constant on [−1, 1]. N
Note that in (6.15) we can replace x > y by x ≥ y without any consequence because we have f(x) = f(y) if x = y. Hence, increasing monotonicity is equivalently stated as

x ≥ y ⟹ f(x) ≥ f(y)   ∀x, y ∈ A

In words, a function is strictly increasing if and only if larger values of the image correspond to larger values of the argument. Next we report a simple variation on this theme – with ⟺ in place of ⟹ – that plays an important role in the ordinalist approach to utility theory, as we will see later in this section.

To see why we can replace ⟹ with ⟺ it is enough to observe that, for a strictly increasing (so, in particular, increasing) function f, we have
A dual version holds, of course, for strictly decreasing functions. That said, we close with a noteworthy mirror effect on monotonicity (see the second figure of Section 6.4.2).
Proposition 222 A scalar function is strictly increasing (decreasing) if and only if its inverse is strictly increasing (decreasing).

Proof We prove the "only if" as the converse is similarly established. Let f : A ⊆ R → R be a strictly increasing scalar function, with inverse f⁻¹ : Im f → R. Let z₁, z₂ ∈ Im f with z₂ > z₁. We want to show that f⁻¹(z₂) > f⁻¹(z₁). By definition, there exist x₁, x₂ ∈ A such that x₁ = f⁻¹(z₁) and x₂ = f⁻¹(z₂). Suppose, by contradiction, that x₁ ≥ x₂. Then,
The monotonicity notions seen in the case n = 1 generalize in a natural way to the case of arbitrary n, though some subtle issues arise because of the two peculiarities of the case n ≥ 2, that is, the incompleteness of ≥ and the presence of the two inequality notions > and ≫. Basic monotonicity is easily generalized: a function f : A ⊆ Rⁿ → R is said to be:

(i) increasing if

x ≥ y ⟹ f(x) ≥ f(y)   ∀x, y ∈ A   (6.24)

(ii) decreasing if

x ≥ y ⟹ f(x) ≤ f(y)   ∀x, y ∈ A

(iii) constant if there exists k ∈ R such that

f(x) = k   ∀x ∈ A
These notions of increasing and decreasing function have bite only on vectors x and y that can be compared; vectors x and y that cannot be compared, such as for example (1, 2) and (2, 1) in R², are ignored. As a result, while constant functions are both increasing and decreasing, the converse is no longer true when n ≥ 2, as the next example shows.
Example 223 Let A = {a, a′, b, b′} be a subset of the plane with four elements. Assume that a ≤ a′ and b ≤ b′ are the only comparisons that can be made in A. For instance, a = (−1, 0), a′ = (0, 1), b = (1, −1/2) and b′ = (2, −1/2). The function f : A ⊆ R² → R defined by f(a) = f(a′) = 0 and f(b) = f(b′) = 1 is both increasing and decreasing, but it is not constant. N
Fortunately, for Cartesian domains the converse holds; next we consider the basic rectangular case, leaving to the reader more general Cartesian domains.

Proof We prove the "if" as the converse is simple. Define x̄ and ȳ to be the vectors such that x̄ᵢ = bᵢ and ȳᵢ = aᵢ for all i = 1, ..., n. Set k = f(x̄). Clearly, we have x̄ ≥ z ≥ ȳ for all z ∈ A. Since f is increasing, this implies that k = f(x̄) ≥ f(z) ≥ f(ȳ) for all z ∈ A. Since f is decreasing, we can also conclude that k = f(x̄) ≤ f(z) ≤ f(ȳ) for all z ∈ A. Combining these two facts, we obtain that f(z) = k for all z ∈ A.
More delicate is the generalization to Rⁿ of strict monotonicity, because of the two distinct inequalities > and ≫. We say that a function f : A ⊆ Rⁿ → R is:
Proof A strongly increasing function is, by definition, increasing. It remains to prove that strictly increasing implies strongly increasing. Thus, let f be strictly increasing. We need to prove that f is increasing and satisfies (6.25). If x ≥ y, we have x = y or x > y. In the first case f(x) = f(y). In the second case f(x) > f(y), so f(x) ≥ f(y). Thus, f is increasing. Moreover, if x ≫ y then a fortiori we have x > y, and therefore f(x) > f(y). We conclude that f is strongly increasing.
The converses of the previous implications do not hold. An increasing function that, like (6.19), has constant parts is an example of an increasing, but not strongly increasing, function – so, not strictly increasing either.18 Therefore,

increasing ⇏ strongly increasing

Moreover, the next example shows that there exist functions that are strongly but not strictly increasing, so

strongly increasing ⇏ strictly increasing

Example 226 The Leontief function f : R² → R given by f(x₁, x₂) = min{x₁, x₂} is strongly increasing, but not strictly increasing. For example, x = (1, 2) > y = (1, 1) yet f(x) = f(y) = 1. N
Proposition 227 Let A be a subset of Rⁿ containing two vectors x̄ and x̲ such that x̄ ≫ x̲. A continuous function f : A → R is strongly increasing if and only if (6.25) holds, i.e.,

x ≫ y ⟹ f(x) > f(y)   ∀x, y ∈ A
Proof We prove the "if" part as the converse is trivially true. So, assume that (6.25) holds. To show that f is strongly increasing it then suffices to show that f is increasing. Let x ≥ y. We want to show that f(x) ≥ f(y). If x = y, we clearly have f(x) = f(y). So, let x > y. By hypothesis, there exist x̄ and x̲ in A such that x̄ ≫ x̲. For each n ≥ 1, we then have

(1/n) x̄ + (1 − 1/n) x ≫ (1/n) x̲ + (1 − 1/n) y

By (6.25),

f((1/n) x̄ + (1 − 1/n) x) > f((1/n) x̲ + (1 − 1/n) y)

Hence, by the continuity of f we have

f(x) = lim_{n→∞} f((1/n) x̄ + (1 − 1/n) x) ≥ lim_{n→∞} f((1/n) x̲ + (1 − 1/n) y) = f(y)

We conclude that f(x) ≥ f(y), as desired.
Dual decreasing notions can be introduced in the obvious way. In particular, increasing or
decreasing functions are called, generically, monotone, while they are called strictly (strongly)
monotone when they are either strictly (strongly) increasing or strictly (strongly) decreasing.
consists of points that are not comparable. If we add the origin to this segment, we get the set

A = {(x, 1 − x) : x ∈ (0, 1)} ∪ {(0, 0)}   (6.28)

Define f : A → R by

f(x₁, x₂) = { 0   if x₁ = x₂ = 0
            { x₁  otherwise

This function is easily seen to be strictly increasing and injective. We have

f(3/4, 1/4) = 3/4 > f(1/4, 3/4) = 1/4

but not (3/4, 1/4) ≥ (1/4, 3/4), because these two vectors are not comparable. Thus, f does not satisfy condition (6.27). Note that the inverse f⁻¹ : [0, 1) → A, defined by f⁻¹(0) = (0, 0) and f⁻¹(x) = (x, 1 − x) for all x ∈ (0, 1), is not monotone (why?). So, the mirror property established in Proposition 222 for scalar functions fails in the multivariable case. N
For an operator

f : A ⊆ Rⁿ → Rᵐ

the notions of monotonicity can be defined in the, by now, obvious way. For instance, this operator is strictly increasing if

x > y ⟹ f(x) > f(y)   ∀x, y ∈ A

Yet, when m > 1 the notions of monotonicity studied for the case m = 1 have less bite, as they confront non-comparabilities also on the codomain: the images f(x) and f(y) may happen not to be comparable, that is, neither f(x) ≥ f(y) nor f(y) ≥ f(x) may hold. For example, if f : R² → R² is such that f(0, 1) = (1, 2) and f(3, 4) = (2, 1), the images (1, 2) and (2, 1) are not comparable. Yet, an interesting result can be proved about the monotonicity of operators.
Proof "If". Assume that condition (6.29) holds. It is easy to check that f is injective (see the proof of Proposition 228). It remains to prove that f⁻¹ is strictly increasing. Let x, y ∈ Im f with x > y. We want to show that f⁻¹(x) > f⁻¹(y). Let x̃, ỹ ∈ A be such that f(x̃) = x and f(ỹ) = y. As f is injective, x̃ ≠ ỹ. By (6.29),

x ≥ y ⟺ f(x̃) ≥ f(ỹ) ⟹ f⁻¹(x) = x̃ ≥ ỹ = f⁻¹(y)

As x̃ ≠ ỹ, we conclude that f⁻¹(x) > f⁻¹(y).
"Only if". Assume that f is injective and has a strictly increasing inverse f⁻¹. Let x, y ∈ A be such that f(x) ≥ f(y). If f(x) = f(y), then x = y by the injectivity of f. So, let f(x) > f(y). As f⁻¹ is strictly increasing, x = f⁻¹(f(x)) > f⁻¹(f(y)) = y. Thus, (6.29) holds.
This result is the correct multivariable version of Proposition 220, of which the scalar case n = m = 1 is a special instance. Indeed, in the scalar case f is strictly increasing if and only if its inverse f⁻¹ is strictly increasing (Proposition 222). This mirror property fails, in general, for injective operators, as Example 229 showed. In particular, that example shows that the verbatim multivariable version of Proposition 220, involving the strict monotonicity of f, fails. The real protagonist is the strict monotonicity of the inverse: a silent companion in the scalar case that takes center stage in the general case.
A dual version of Collatz's Theorem says that an operator f : A ⊆ Rⁿ → Rᵐ is injective and strictly increasing if and only if its inverse f⁻¹ : Im f → Rⁿ satisfies condition (6.29), i.e.,

f⁻¹(x) ≥ f⁻¹(y) ⟹ x ≥ y   ∀x, y ∈ Im f   (6.30)

Summing up, for an operator f : A ⊆ Rⁿ → Rᵐ consider the following properties:

By Collatz's Theorem,
Example 231 Let A be the set (6.28) and let D = {(d, d) : d ∈ [0, 1)}. Define f : A → R² by

f(x₁, x₂) = (x₁, x₁)

For instance, f(1/4, 3/4) = (1/4, 1/4). This function is easily seen to be strictly increasing and injective. Its image is the "diagonal" set D and its inverse f⁻¹ : D → A is given by f⁻¹(0, 0) = (0, 0) and f⁻¹(d, d) = (d, 1 − d) for all d ∈ (0, 1). For instance, f⁻¹(3/4, 3/4) = (3/4, 1/4). Let x = (1/4, 1/4) and y = (3/4, 3/4) be two points of D. We have y > x, but their preimages

f⁻¹(x) = (1/4, 3/4)   and   f⁻¹(y) = (3/4, 1/4)

are not comparable. N
The monotonicity notions so far introduced play a key role in utility theory. Specifically, let u : A → R be a utility function defined on a set A ⊆ Rⁿ₊ of bundles of goods. A transformation f ∘ u : A → R of u, where f : Im u ⊆ R → R, defines a mathematically different but conceptually equivalent utility function provided

u(x) ≥ u(y) ⟺ f(u(x)) ≥ f(u(y))   ∀x, y ∈ A   (6.31)

Indeed, under this condition the function f ∘ u orders the bundles in the same way as the original utility function u, that is,

(f ∘ u)(x) ≥ (f ∘ u)(y) ⟺ u(x) ≥ u(y)   ∀x, y ∈ A

The utility functions u and f ∘ u are thus equivalent because they represent the same underlying preference ≿.
By Proposition 221, the function f satisfies (6.31) if and only if it is strictly increasing. Therefore, f ∘ u is an equivalent utility function if and only if f is strictly increasing. To describe such a fundamental invariance property of utility functions, we say that they are ordinal, that is, unique up to monotone (strictly increasing) transformations. This property lies at the heart of the ordinalist approach, in which utility functions are regarded as mere numerical representations of the underlying preference ≿, which is the fundamental notion (recall the discussion in Section 6.2.1).
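Ordinality is easy to illustrate numerically: a strictly increasing transformation such as log leaves the ranking of bundles unchanged. A sketch (ours), using the Cobb-Douglas utility of Example 196:

```python
import math

def u(x):
    # Cobb-Douglas utility on R^2_{++}: u(x) = sqrt(x1 * x2)
    return math.sqrt(x[0] * x[1])

def v(x):
    # f o u with f = log, a strictly increasing transformation
    return math.log(u(x))

bundles = [(1.0, 4.0), (2.0, 2.0), (3.0, 1.0), (0.5, 9.0)]
assert sorted(bundles, key=u) == sorted(bundles, key=v)
# same ordering of bundles: u and f o u represent the same preference
```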
in which the goods are perfect complements, is strongly increasing. As we saw in Example 226, it is not strictly increasing.

(iii) The reader can check which properties of monotonicity hold if we consider the two previous utility functions on the entire positive orthant Rⁿ₊ rather than just on Rⁿ₊₊. N
Consumers with strictly or strongly monotone utility functions are "insatiable" because, by suitably increasing their bundles, their utility also increases. This property of utility functions is sometimes called insatiability, and it is thus shared by both strict and strong monotonicity. The only form of monotonicity compatible with satiety is increasing monotonicity (6.33): as observed for the drunk consumer, this weaker form of monotonicity allows for the possibility that a given good, when it exceeds a certain level, does not result in a further increase of utility. However, it cannot happen that utility decreases: if (6.33) holds, utility either increases or remains constant, but it never (strictly or strongly) decreases. Therefore, if an extra glass of wine results in a decrease of the drunk's utility, this cannot be modelled by any form of increasing monotonicity, no matter how weak.
The class of concave and convex functions is of great importance in economics. The concept, which will be fully developed in Chapter 17, is anticipated here in the scalar case.

f(λx + (1 − λ)y) ≥ λf(x) + (1 − λ)f(y)

Geometrically, a function is concave if the segment (called chord) that joins any two points (x, f(x)) and (y, f(y)) of its graph lies below the graph of the function, while it is convex if the opposite happens, that is, if such a chord lies above the graph of the function. The figure below illustrates.

Note that the domain of concave and convex functions is an interval, so the points λx + (1 − λ)y belong to it and the expression f(λx + (1 − λ)y) is meaningful.
Example 235 The functions f, g : R → R defined by f(x) = x² and g(x) = e^x are convex, while the function f : (0, ∞) → R defined by f(x) = log x is concave. The function f : R → R given by f(x) = x³ is neither concave nor convex. All this can be checked analytically; graphically:
[Figures: the chord joining (x, f(x)) and (y, f(y)) lies below the graph of a concave function and above that of a convex one; graphs illustrating the functions of Example 235.]
The importance of this class of functions is due to their great tractability. The simplest example is f(x) = ∑_{i=1}^n xᵢ, for which the functions gᵢ are the identity, i.e., gᵢ(x) = x for each i. Let us give some more examples.
In other words, they assumed that the utility of a bundle x is decomposable into the utility of the quantities xᵢ of the various goods that compose it. It is a restrictive assumption that ignores any possible interdependence, for example of complementarity or substitutability, among the different goods of a bundle. Due to its remarkable tractability, however, (6.34) remained for a long time the standard form of utility functions until, at the end of the nineteenth century, the works of Edgeworth and Pareto showed how to develop consumer theory for utility functions that are not necessarily separable. N
Example 241 If in (6.34) we set uᵢ(xᵢ) = xᵢ for all i, we obtain the important special case

u(x) = ∑_{i=1}^n xᵢ
where the goods are perfect substitutes. The utility of a bundle x depends only on the sum of the amounts of the different goods, regardless of the specific amounts of the individual goods. For example, think of x as a bundle of different types of oranges, which differ in origin and taste, but are identical in terms of nutritional values. In this case, if the consumer only cares about such values, then these different types of oranges are perfect substitutes. This case is opposite to that of the perfect complementarity that characterizes the Leontief utility function.
More generally, if in (6.34) we set uᵢ(xᵢ) = αᵢxᵢ for all i, with αᵢ > 0, we have

u(x) = ∑_{i=1}^n αᵢxᵢ

In this case, the goods in the bundle are no longer perfect substitutes; rather, their relevance depends on their weights αᵢ. Therefore, to keep utility constant each good can be replaced with another according to a linear trade-off: intuitively, one unit of good i is equivalent to αᵢ/αⱼ units of good j. The notion of marginal rate of substitution formalizes this idea (Section 34.3.2). N
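The linear trade-off can be checked directly: removing one unit of good i and adding αᵢ/αⱼ units of good j leaves utility unchanged. A minimal sketch (ours):

```python
def u(x, alpha):
    # u(x) = sum_i alpha_i * x_i with weights alpha_i > 0
    return sum(a * xi for a, xi in zip(alpha, x))

alpha = [2.0, 0.5]
x = [3.0, 4.0]
# one unit less of good 1, alpha_1/alpha_2 = 4 units more of good 2:
y = [x[0] - 1.0, x[1] + alpha[0] / alpha[1]]
assert u(x, alpha) == u(y, alpha)    # utility is constant along the trade-off
```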
f(x) = a₀ + a₁x + ⋯ + aₙxⁿ

with aᵢ ∈ R for every 0 ≤ i ≤ n and aₙ ≠ 0. The two coefficients a₀ and aₙ are called the constant and leading coefficients, respectively. The leading coefficient determines the degree of the polynomial, while the constant coefficient is equal to f(0), i.e., to the value of the function at the origin.

Let Pₙ be the set of all polynomials of degree lower than or equal to n. Clearly,

P₀ ⊆ P₁ ⊆ P₂ ⊆ ⋯ ⊆ Pₙ ⊆ ⋯
f(x) = a^x

is called the exponential function of base a. By Lemma 43-(iv), the exponential function is:

(ii) constant if a = 1;
log_a a^x = x   ∀x ∈ R

and

a^{log_a x} = x   ∀x ∈ (0, ∞)

are therefore nothing but the relations (6.9) and (6.10) for inverse functions – i.e., the relations f⁻¹(f(x)) = x and f(f⁻¹(y)) = y – in the special case of the exponential and logarithmic functions.
The next result summarizes the monotonicity properties of these elementary functions.

Lemma 244 Both the exponential function a^x and the logarithmic function log_a x are increasing if a > 1 and decreasing if 0 < a < 1.

Proof For the exponential function, observe that, when a > 1, also a^h > 1 for every h > 0. Therefore a^{x+h} = a^x a^h > a^x for every h > 0. For the logarithmic function, after observing that log_a k > 0 if a > 1 and k > 1, we have

log_a(x + h) = log_a (x(1 + h/x)) = log_a x + log_a(1 + h/x) > log_a x
That said, in the sequel we will mostly use Napier's constant e as base, and so we will refer to f(x) = e^x as the exponential function, without further specification (sometimes it is denoted by f(x) = exp x). Thanks to the remarkable properties of the power e^x (Section 1.5), the exponential function plays a fundamental role in mathematics and in its applications. Its image is (0, ∞) and its graph is:
[Figure: graph of the exponential function e^x.]
The negative exponential function f(x) = −e^x is also important. Its graph is:
[Figure: graph of the negative exponential function −e^x.]
In a similar vein, in view of the special importance of the natural logarithm (Section 1.5), we refer to f(x) = log x as the logarithmic function, without further specification. Like the exponential function f(x) = e^x, which is its inverse, the logarithmic function f(x) = log x is widely used in applications. Its image is R and its graph is:
[Figure: graph of the logarithmic function log x.]
The functions e^x and log x, being one the inverse of the other, have graphs that are mirror images of each other with respect to the 45-degree line.

To test their understanding of the material of this section, readers may want to check – analytically and graphically – that the inverse of the negative exponential is the logarithmic function f : (−∞, 0) → R defined by f(x) = log(−x).
Trigonometric functions

The sine function f : R → R defined by f(x) = sin x is the first example of a trigonometric function. For each x ∈ R we have

sin(x + 2kπ) = sin x   ∀k ∈ Z
[Figure: graph of sin x.]
The function f : R → R defined by f(x) = cos x is the cosine function. For each x ∈ R we have

cos(x + 2kπ) = cos x   ∀k ∈ Z
[Figure: graph of cos x.]
tan(x + kπ) = tan x   ∀k ∈ Z
[Figure: graph of tan x.]
It is immediate to see that, for x ∈ (0, π/2), we have the sandwich 0 < sin x < x < tan x.
The functions sin x, cos x and tan x are monotone (so invertible) on, respectively, the intervals [−π/2, π/2], [0, π], and (−π/2, π/2). Their inverse functions are denoted by arcsin x (or sin⁻¹ x), arccos x (or cos⁻¹ x), and arctan x (or tan⁻¹ x), respectively.
Specifically, by restricting ourselves to the interval [−π/2, π/2] of strict monotonicity of the function sin x, we have

sin x : [−π/2, π/2] → [−1, 1]

Therefore, the inverse function of sin x is

arcsin x : [−1, 1] → [−π/2, π/2]

with graph:
[Figure: graph of arcsin x.]
Similarly, we have

cos x : [0, π] → [−1, 1]

Therefore, the inverse function of cos x is

arccos x : [−1, 1] → [0, π]

with graph:
[Figure: graph of arccos x.]
Finally, we have

tan x : (−π/2, π/2) → R

Therefore, the inverse function of tan x is
arctan x : R → (−π/2, π/2)

with graph:
[Figure: graph of arctan x.]
Note that (2/π) arctan x is a bijective function between the real line and the open interval (−1, 1). As we will learn in the next chapter, this means that the open interval (−1, 1) has the same cardinality as the real line.23
Periodic functions

The smallest (if it exists) among such p > 0 is called the period of f. In particular, the periodic functions sin x and cos x have period 2π, while the periodic function tan x has period π. Their graphs well illustrate the property that characterizes periodic functions, that is, that of repeating themselves identically on each interval of width p.
Example 246 The functions sin² x and log tan x are periodic of period π. N
Example 247 The function f : R → R given by f(x) = x − [x] is called the mantissa.24 The mantissa of x > 0 is its decimal part; for example, f(2.37) = 0.37. The mantissa function is periodic with period 1. Indeed, by (1.18) we have [x + 1] = [x] + 1 for every x ∈ R. So,

f(x + 1) = x + 1 − [x + 1] = x + 1 − [x] − 1 = x − [x] = f(x)

Its graph is:
[Figure: graph of the mantissa function.]
Finally, readers can verify that periodicity is preserved by the fundamental operations among functions: if f and g are two periodic functions of the same period p, the functions f(x) + g(x), f(x)g(x) and f(x)/g(x) are also periodic (of period at most p).
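Both the periodicity of the mantissa and that of sin² x are easy to verify numerically. A small sketch (ours), using floor as the integer part, as in footnote 24:

```python
import math

def mantissa(x):
    # f(x) = x - [x], where [x] is the greatest integer <= x
    return x - math.floor(x)

assert abs(mantissa(2.37) - 0.37) < 1e-12
for x in (-1.3, 0.0, 2.37, 5.9):
    assert abs(mantissa(x + 1) - mantissa(x)) < 1e-12                  # period 1
    assert abs(math.sin(x + math.pi) ** 2 - math.sin(x) ** 2) < 1e-12  # period pi
```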
Proposition 248 Let f(x) = p(x)/q(x) be a proper rational function such that q has k distinct real roots r₁, r₂, ..., r_k, so q(x) = ∏_{i=1}^k (x − rᵢ). Then

f(x) = c₁/(x − r₁) + c₂/(x − r₂) + ⋯ + c_k/(x − r_k)   (6.36)

where

cᵢ = p(rᵢ)/q′(rᵢ)   for every i = 1, ..., k   (6.37)
24 Recall from Proposition 42 that the integer part [x] of a scalar x ∈ R is the greatest integer ≤ x.
Proof We first establish that there exist k coefficients c₁, c₂, ..., c_k such that (6.36) holds. For simplicity, we only consider the case

f(x) = (b₀ + b₁x)/(a₀ + a₁x + a₂x²)

leaving to readers the general case. Since the denominator is (x − r₁)(x − r₂), we look for coefficients c₁ and c₂ such that

(b₀ + b₁x)/q(x) = c₁/(x − r₁) + c₂/(x − r₂)

Since

c₁/(x − r₁) + c₂/(x − r₂) = (c₁(x − r₂) + c₂(x − r₁))/q(x) = ((c₁ + c₂)x − (c₁r₂ + c₂r₁))/q(x)

we have

(b₀ + b₁x)/q(x) = ((c₁ + c₂)x − (c₁r₂ + c₂r₁))/q(x)

So, by equating coefficients we have the simple linear system

c₁ + c₂ = b₁
c₁r₂ + c₂r₁ = −b₀

Since r₁ ≠ r₂, the system is easily seen to have a unique solution (c₁, c₂) that provides the sought-after coefficients.
It remains to show that the coefficients of (6.36) satisfy (6.37). We have

lim_{x→rᵢ} (x − rᵢ) f(x) = lim_{x→rᵢ} (x − rᵢ) (c₁/(x − r₁) + c₂/(x − r₂) + ⋯ + c_k/(x − r_k))
= lim_{x→rᵢ} (c₁(x − rᵢ)/(x − r₁) + ⋯ + cᵢ + ⋯ + c_k(x − rᵢ)/(x − r_k)) = cᵢ

as well as, by de l'Hospital's rule,

lim_{x→rᵢ} (x − rᵢ) f(x) = lim_{x→rᵢ} (x − rᵢ) p(x)/q(x) = p(rᵢ) lim_{x→rᵢ} (x − rᵢ)/q(x) = p(rᵢ) (1/q′(rᵢ))

Putting the two limits together, we conclude that cᵢ = p(rᵢ)/q′(rᵢ) for all i = 1, ..., k, as desired.
Consider f(x) = (x − 1)/(x² + 3x + 2). The roots of the polynomial at the denominator are −1 and −2, so by (6.37) we have c₁ = p(−1)/q′(−1) = −2 and c₂ = p(−2)/q′(−2) = 3. So, the partial fraction expansion of f is

f(x) = −2/(x + 1) + 3/(x + 2)

This can be also checked directly. Indeed, since the denominator is (x + 1)(x + 2), let us look for c₁ and c₂ such that

c₁/(x + 1) + c₂/(x + 2) = (x − 1)/(x² + 3x + 2)   (6.38)

The first term in (6.38) is equal to

(c₁(x + 2) + c₂(x + 1))/((x + 1)(x + 2)) = (x(c₁ + c₂) + (2c₁ + c₂))/((x + 1)(x + 2))   (6.39)

Expressions (6.38) and (6.39) are equal if and only if c₁ and c₂ satisfy the system

c₁ + c₂ = 1
2c₁ + c₂ = −1

Therefore, c₁ = −2 and c₂ = 3. This confirms what was established via formula (6.37). N
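Both the expansion just obtained and formula (6.37) can be checked numerically; a sketch (ours):

```python
def f(x):
    return (x - 1) / (x ** 2 + 3 * x + 2)

def expansion(x):
    # the partial fraction expansion -2/(x+1) + 3/(x+2) found above
    return -2 / (x + 1) + 3 / (x + 2)

p = lambda x: x - 1          # numerator p(x)
dq = lambda x: 2 * x + 3     # q'(x)
print(p(-1) / dq(-1), p(-2) / dq(-2))    # -2.0 and 3.0, via formula (6.37)

for x in (-0.5, 0.0, 1.0, 5.0):          # points away from the roots -1, -2
    assert abs(f(x) - expansion(x)) < 1e-12
```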
f(x̂) ≥ f(x)   ∀x ∈ A

The value f(x̂) of the function at x̂ is called the (global) maximum value of f on A.

Maximizers thus attain the highest values of the function f on its domain: they outperform all other elements of the domain. Note that the maximum value of f on A is nothing but the maximum of the set Im f, which is a subset of R. That is,

f(x̂) = max f(A) = max Im f

By Proposition 36, the maximum value is unique. We denote such unique value by

max_{x∈A} f(x)
[Figure: graph of a function attaining its maximum value 1 at the maximizer x = 0.]
The maximizer of f is 0 and the maximum value is 1. Indeed, 1 = f(0) ≥ f(x) for every x ∈ R. On the other hand, being Im f = (−∞, 1], we have 1 = max(−∞, 1]. N

Similar definitions hold for the minimum value of f on A and for the minimizers of f on A.
Example 252 Consider the quadratic function f(x) = x², whose graph is the parabola:
[Figure: the parabola x².]
The minimizer of f is 0 and the minimum value is 0. Indeed, 0 = f(0) ≤ f(x) for every x ∈ R. On the other hand, being Im f = [0, ∞), we have 0 = min[0, ∞). N
While the maximum (minimum) value is unique, maximizers and minimizers might well
not be unique, as the next example shows.
Example 253 Let f : R → R be the sine function f(x) = sin x. Since Im f = [−1, 1], the unique maximum of f on R is 1 and the unique minimum of f on R is −1. Nevertheless, there are both infinitely many maximizers – i.e., all the points x = π/2 + 2kπ with k ∈ Z – and infinitely many minimizers – i.e., all the points x = −π/2 + 2kπ with k ∈ Z. The next graph should clarify.
[Figure: graph of sin x, with maximizers at π/2 + 2kπ and minimizers at −π/2 + 2kπ.]
The restriction f|_C can, therefore, be seen as f restricted to the subset C of A. Thanks to the smaller domain, the function f|_C can satisfy properties different from those of the original function f.
Example 255 (i) Let g : [0, 1] → R be defined by g(x) = x². The function g can be seen as the restriction to the interval [0, 1] of the quadratic function f : R → R given by f(x) = x²; that is, g = f|_{[0,1]}. Thanks to its restricted domain, the function g has better properties than the function f. For example: g is strictly increasing, while f is not; g is injective (so, invertible), while f is not; g is bounded, while f is only bounded below; g has both a maximizer and a minimizer, while f does not have a maximizer.

(ii) Let g : (−∞, 0] → R be defined by g(x) = −x. The function g can be seen as the restriction to (−∞, 0] of both f : R → R given by f(x) = |x| and h : R → R given by h(x) = −x. Indeed, a function may be the restriction of several functions (rather, of infinitely many functions) and it is the specific application at hand that may suggest which is the most relevant. In any case, let us analyze the differences between g and f and those between g and h. The function g is injective, while f is not; g is monotone decreasing, while f is not. The function g is bounded below, while h is not; g has a global minimizer, while h does not. N
Example 256 The function f(x₁, x₂) = √(x₁x₂) has as natural domain R²₊ ∪ R²₋, i.e., the first and third quadrants of the plane. Nevertheless, when we regard it as a utility function of Cobb-Douglas type, its domain is restricted to the first quadrant, R²₊, because bundles of goods always have positive components. Moreover, since f(x₁, x₂) = 0 even when just one component is zero, something not that plausible from an economic viewpoint, this utility function is often considered only on R²₊₊. Therefore, purely economic considerations determine the domain on which to study f when interpreted as a utility function. N
Example 257 (i) Let g : [0, ∞) → R be defined by g(x) = x³. The function g can be seen as the restriction to the interval [0, ∞) of the cubic function f : R → R given by f(x) = x³, that is, g = f|_{[0,∞)}. We observe that g is convex, while f is not; g is bounded below, while f is not; g has a minimizer, while f does not.

(ii) Let g : (−∞, 0] → R be defined by g(x) = x³. The function g can be seen as the restriction to the interval (−∞, 0] of the function f : R → R given by f(x) = x³, that is, g = f|_{(−∞,0]}. We observe that g is concave, while f is not; g is bounded above, while f is not; g has a maximizer, while f does not.

(iii) Sometimes smaller domains may actually deprive functions of some of their properties. For instance, the restriction of the sine function to the interval [0, π/2] is no longer periodic, while the restriction of the quadratic function to the open unbounded interval (0, ∞) has no minimizer. N
is called an extension of f to C.

Restriction and extension are thus two sides of the same coin: g is an extension of f if and only if f is a restriction of g. In particular, a function defined on its natural domain extends all its restrictions. Moreover, if a function has an extension, it has infinitely many.
is an extension of the function f(x) = 1/x, which has as natural domain R \ {0}.

(ii) The function g : R → R defined by

g(x) = { x      for x ≤ 0
       { log x  for x > 0

is an extension of the function f(x) = log x, which has natural domain (0, ∞). N
(i) we write x ≻ y if the bundle x is strictly preferred to y, that is, if x ≿ y but not y ≿ x;

(ii) we write x ∼ y if the bundle x is indifferent to the bundle y, that is, if both x ≿ y and y ≿ x.

The relations ≻ and ∼ are, obviously, mutually exclusive: between two indifferent bundles there cannot exist strict preference, and vice versa. The next simple result further clarifies the different nature of the two relations.
Lemma 260 The strict preference relation ≻ is asymmetric (i.e., x ≻ y implies not y ≻ x), while the indifference relation ∼ is symmetric (i.e., x ∼ y implies y ∼ x).
25 The preference relation ≿ is an important example of a binary relation (see Appendix A).
26 In the weak sense of "prefers or is indifferent".
This first axiom reflects the "weakness" of ≿: each bundle is preferred to itself. The next axiom is more interesting.

It is a rationality axiom that requires that the preferences of the decision maker have no cycles:

x ≿ y ≿ z ≻ x
Strict preference and indifference inherit these first two properties (with the obvious exception of reflexivity for the strict preference).

(i) ∼ is reflexive and transitive;

(ii) ≻ is transitive.
Proof (i) Consider x and set y = x. By definition of ∼ and since ≿ is reflexive, we have that x ≿ y and y ≿ x, yielding that x ∼ y, that is, x ∼ x. Hence, the relation ∼ is reflexive. To prove transitivity, suppose that x ∼ y and y ∼ z. We show that this implies x ∼ z. By definition, x ∼ y means that x ≿ y and y ≿ x, while y ∼ z means that y ≿ z and z ≿ y. Thanks to the transitivity of ≿, from x ≿ y and y ≿ z it follows x ≿ z, while from y ≿ x and z ≿ y it follows z ≿ x. We therefore have both x ≿ z and z ≿ x, i.e., x ∼ z.

(ii) Suppose that x ≻ y and y ≻ z. We show that this implies x ≻ z. By definition of ≻ and transitivity of ≿, this implies that x ≿ y and y ≿ z, which in turn yields that x ≿ z. By contradiction, suppose that x ≻ z does not hold. Since x ≿ z, this implies that z ≿ x. By transitivity of ≿ and since, as we have seen, x ≿ y, this yields that z ≿ y, a contradiction with y ≻ z.
The last two lemmas together show that, if ≿ is reflexive and transitive, the indifference relation ∼ is reflexive, symmetric, and transitive (so, it is an equivalence relation; cf. Appendix A). For each bundle x ∈ A, denote by

[x] = {y ∈ A : x ∼ y}

the collection of the bundles indifferent to it. This set is the indifference class of ≿ determined by the bundle x.
x ∼ y ⟺ [x] = [y]   (6.40)

and

x ≁ y ⟺ [x] ∩ [y] = ∅   (6.41)

Relations (6.40) and (6.41) express two fundamental properties of the indifference classes. By (6.40), the indifference class [x] does not depend on the choice of the bundle x: each indifferent bundle determines the same indifference class. By (6.41), different indifference classes have no elements in common: they do not intersect.
Proof By the previous lemmas, ∼ is reflexive, symmetric, and transitive. We first prove (6.40). Suppose that x ∼ y. We show that this implies [x] ⊆ [y]. Let z ∈ [x], that is, x ∼ z. Since ∼ is transitive and symmetric, x ∼ y and x ∼ z imply that y ∼ z, that is, z ∈ [y], which shows that [x] ⊆ [y]. By symmetry, x ∼ y implies y ∼ x. Then, the previous argument shows that [y] ⊆ [x]. So, we conclude that x ∼ y implies [x] = [y]. Since ∼ is reflexive, the converse is then obvious and (6.40) is proved.

We move now to (6.41) and suppose that x ≁ y. We show that this implies [x] ∩ [y] = ∅. Let us suppose, by contradiction, that this is not the case and that there exists z ∈ [x] ∩ [y]. By definition, we have both x ∼ z and y ∼ z. By the transitivity and symmetry of ∼, we then have x ∼ y, which contradicts x ≁ y. The contradiction shows that x ≁ y implies [x] ∩ [y] = ∅. In light of (6.40) and since x ∈ [x], the converse is obvious and the proof is complete.
The collection {[x] : x ∈ A} of all the indifference classes is denoted by A/∼ and is sometimes called the indifference map. Thanks to the last lemma, A/∼ forms a partition of A.
Let us continue the study of ≿. The next axiom does not concern the rationality, but rather the information, of the consumer.
Completeness requires the consumer to be able to compare any two bundles of goods, even very different ones. Naturally, to do so the consumer must, at least, have sufficient information about the two alternatives: it is easy to think of examples where this assumption is unrealistic. So, completeness is a non-trivial assumption on preferences.

In any case, note that completeness requires, inter alia, that each bundle be comparable to itself, that is, x ≿ x. Thus, it implies reflexivity.

Given the completeness assumption, the relations ≻ and ∼ are both exclusive (as seen above) and exhaustive.
Lemma 263 Let ≿ be complete. Given any two bundles x and y, we always have either x ≻ y or y ≻ x or x ∼ y.27
27 These "or" are intended as the Latin "aut".
Since we are considering bundles of economic goods (and not of "bads"), it is natural to assume monotonicity, i.e., that "more is better". The triad ≥, >, and ≫ leads to three possible incarnations of this simple principle of rationality:

The relationships among the three notions are similar to those seen for the analogous notions of monotonicity studied (also for utility functions) in Section 6.4.4. For example, strict monotonicity means that, given a bundle, an increase in the quantity of any good of the bundle determines a strictly preferred bundle.

Similar considerations hold for the other notions. In particular, (6.26) takes the form:

when ≿ is reflexive.
Proof We have

which proves (6.43). Now consider (6.44). When y ≿ x does not hold, we simply write y ≿̸ x. By definition of ≻, we have that
The equivalence (6.43) allows us to represent the indifference classes as indifference curves of the utility function:

[x] = {y ∈ A : u(x) = u(y)}

Thus, when a preference admits a utility representation, (6.41) reduces to the standard property that indifference curves are disjoint (Section 6.3.1).
As already observed, in the ordinalist approach the utility function is a mere representation of the preference relation, without any special psychological meaning. Indeed, we already noted that each strictly increasing function f : Im u → R defines an equivalent utility function f ∘ u, for which it still holds that

x ≿ y ⟺ (f ∘ u)(x) ≥ (f ∘ u)(y)
More is actually true: two functions represent the same underlying preference ≿, so they are utility functions for it, if and only if they are strictly increasing transformations of one another. This easily follows from the following result.
Proof We first prove the "if" part. By Proposition 221 and since f is strictly increasing, we have

g(x) ≥ g(y) ⟺ f(g(x)) ≥ f(g(y)) ⟺ h(x) ≥ h(y)

proving (6.45). We next prove the "only if" part. Define f : Im g → R to be such that, for each t ∈ Im g,

f(t) = h(x)

where x ∈ g⁻¹(t). Note that f is well defined. Indeed, since t ∈ Im g there always exists an x ∈ A such that g(x) = t, i.e., g⁻¹(t) ≠ ∅. Moreover, by (6.45), if x and y are such that g(x) = g(y) = t, then h(x) = h(y). So, f is well defined since the rule defining f assigns to each t in Im g a unique value in R (that is, h(x)) which does not depend on the specific x chosen in g⁻¹(t), but only on t. Note also that if x ∈ A, by defining t = g(x) we have that f(g(x)) = f(t) = h(x). Since x was arbitrarily chosen, we conclude that h = f ∘ g. We are left to show that f is strictly increasing. Consider t, s ∈ Im g such that t > s. Then there exist x, y ∈ A such that g(x) = t > s = g(y). By (6.45), this implies that h(x) > h(y). By the definition of f, we conclude that f(t) = f(g(x)) = h(x) > h(y) = f(g(y)) = f(s), proving that f is strictly increasing.
Theorem 266 Let ≿ be a preference defined on a finite set A. The following conditions are equivalent:
Thus, if there is a finite number of alternatives, transitivity and completeness are necessary and sufficient conditions for the existence of a utility function. Matters become more complicated when A is infinite: later we will present the famous lexicographic preference on R²₊, which does not admit any numerical representation. The next theorem solves the existence problem on the key infinite set Rⁿ₊. To this end we need a final axiom, which is reminiscent of the Archimedean property of the real numbers seen in Section 1.4.3.29
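For intuition on the finite case, a utility function can be built by counting, for each alternative x, how many alternatives x is weakly preferred to; this is one standard construction behind results like Theorem 266, not necessarily the book's own proof. A sketch (ours) with a sample complete and transitive preference:

```python
def utility_from_preference(alternatives, weakly_prefers):
    # u(x) = number of alternatives y with x weakly preferred to y; for a
    # complete and transitive preference on a finite set this represents it
    return {x: sum(1 for y in alternatives if weakly_prefers(x, y))
            for x in alternatives}

A = [1, 2, 3, 4]
prefers = lambda x, y: x >= y        # a sample complete, transitive preference
u = utility_from_preference(A, prefers)
assert all((u[x] >= u[y]) == prefers(x, y) for x in A for y in A)
```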
Archimedean: given any three bundles x, y, z ∈ Rⁿ₊ with x ≻ y ≻ z, there exist weights α, β ∈ (0, 1) such that

αx + (1 − α)z ≻ y ≻ βx + (1 − β)z
The axiom implies that there exist no infinitely preferred and no infinitely "unpreferred" bundles. Given the preferences x ≻ y and y ≻ z, for the consumer the bundle x cannot be infinitely better than y, nor can the bundle z be infinitely worse than y. Indeed, by suitably combining the bundles x and z we get both a bundle better than y, that is, αx + (1 − α)z, and a bundle worse than y, that is, βx + (1 − β)z. This would be impossible if x were infinitely better than y, or if z were infinitely worse than y.
In this respect, recall the analogous property of real numbers: if x, y, z ∈ R are three scalars with x > y > z, there exist α, β ∈ (0, 1) such that

αx + (1 − α)z > y > βx + (1 − β)z   (6.46)

The property does not hold if we consider ∞ and −∞, that is, the extended real line R̄ = [−∞, ∞]. In this case, if y ∈ R but x = +∞ and/or z = −∞, the scalar x is infinitely greater than y, and z is infinitely smaller than y, and there are no α, β ∈ (0, 1) that satisfy the inequality (6.46). Indeed, α∞ = +∞ and β(−∞) = −∞ for every α, β ∈ (0, 1), as seen in Section 1.7.
29 For simplicity, we will assume that the consumption set A is the entire Rⁿ₊. The axiom can be stated more generally for convex sets, an important notion that we will study in Chapter 17.
Theorem 267 Let ≿ be a preference defined on A = Rⁿ₊. The following conditions are equivalent:

This is a remarkable result: most economic applications use utility functions, and the theorem shows which conditions on preferences justify such use.31
To appreciate the importance of Theorem 267, we close the chapter with a famous example of a preference that does not admit a utility function. Let A = R²₊ and, given two bundles x and y, write x ≿ y if either x₁ > y₁, or x₁ = y₁ and x₂ ≥ y₂. The consumer starts by considering the first coordinate: if x₁ > y₁, then x ≿ y. If, on the other hand, x₁ = y₁, then he turns his attention to the second coordinate: if x₂ ≥ y₂, then x ≿ y.

The preference is inspired by how dictionaries order words; for this reason, it is called the lexicographic preference. In particular, we have x ≻ y if x₁ > y₁, or x₁ = y₁ and x₂ > y₂, while we have x ∼ y if and only if x = y. The indifference classes are therefore singletons, a first remarkable feature of this preference.
The lexicographic preference is complete, transitive and strictly monotone, as the reader can easily verify. It is not Archimedean, however. Indeed, consider for example x = (1, 0), y = (0, 1), and z = (0, 0). We have x ≻ y ≻ z and

βx + (1 − β)z = (β, 0) ≻ y ≻ z   ∀β ∈ (0, 1)

so that no mixture βx + (1 − β)z can be worse than y, as the Archimedean axiom would require.
Proposition 268 The lexicographic preference does not admit any utility function.
Proof Suppose, by contradiction, that there exists u : R²₊ → R that represents the lexicographic
preference. Let a < b be any two positive scalars. For each x ≥ 0 we have
(x, b) ≻ (x, a) and, therefore, u(x, a) < u(x, b). By Proposition 42, there exists a rational
number q(x) such that u(x, a) < q(x) < u(x, b). The rule x ↦ q(x) defines, therefore, a
function q : R₊ → Q. It is injective. If x ≠ y, say y < x, then:

u(y, a) < q(y) < u(y, b) < u(x, a) < q(x) < u(x, b)

and so q(x) ≠ q(y). But, since R₊ has the same cardinality as R, the injectivity of the
function q : R₊ → Q implies |Q| ≥ |R|, contradicting Cantor's Theorem 277. This proves
that the lexicographic preference does not admit any utility function.
30 Continuity is an important property, to which Chapter 13 is devoted.
31 There exist other results on the existence of utility functions, mostly proved in the 1940s and in the 1950s.
Chapter 7
Cardinality (sdoganato)
Example 269 The set A = {11, 13, 15, 17, 19} of the odd integers between 10 and 20 is
finite, with |A| = 5. N
Thanks to Proposition 207, two finite sets have the same cardinality if and only if their
elements can be put in a bijective correspondence. For example, if we have seven seats and
seven students, we can assign one (and only one) seat to each student, say by putting a name
tag on it. All this motivates the following definition, which elaborates on what we mentioned
at the end of Section 6.4.1.
Definition 270 A set A is finite if it can be put in a bijective correspondence with a subset
of the form {1, 2, ..., n} of N. In this case, we write |A| = n.

In other words, A is finite if there exist a set {1, 2, ..., n} of natural numbers and a
bijective function f : {1, 2, ..., n} → A. The set {1, 2, ..., n} can be seen as the "prototypical"
set of cardinality n, a benchmark that permits us to "calibrate" all the other finite sets of the
same cardinality via bijective functions.
This definition provides a functional angle on the cardinality of finite sets, based on
bijective functions and on the identification of a prototypical set. For finite sets, however, this
angle is not much more than a curiosity. Yet, it becomes fundamental when we want to extend
the notion of cardinality to infinite sets. This was the key insight of Georg Cantor who, by
finding the right angle, brought about the birth of the theory of infinite sets. Indeed, the possibility
of establishing a bijective correspondence among infinite sets allows for a classification of
these sets by "size" and leads to the discovery of deep and surprising properties.
Relative to finite sets, countable sets immediately exhibit a remarkable, possibly puzzling,
property: it is always possible to put a countable set into a bijective correspondence with
an infinite proper subset of it. In other words, losing elements might not affect cardinality
when dealing with countable sets.
Proof Let X be a countable set and let A ⊂ X be an infinite proper subset of X, i.e.,
A ≠ X. Since X is countable, its elements can be listed as a sequence of distinct elements
X = {x₀, x₁, ..., xₙ, ...} = {xᵢ}_{i∈N}. Let us denote by n₀ the smallest integer larger than
or equal to 0 such that x_{n₀} ∈ A (if, for example, x₀ ∈ A, we have n₀ = 0; if x₀ ∉ A and
x₁ ∈ A, we have n₀ = 1; and so on). Analogously, let us denote by n₁ the smallest integer
(strictly) larger than n₀ such that x_{n₁} ∈ A. Given n₀, n₁, ..., nⱼ, with j ≥ 1, let us define
n_{j+1} as the smallest integer larger than nⱼ such that x_{n_{j+1}} ∈ A. Consider now the function
f : N → A defined by f(i) = x_{nᵢ}, with i = 0, 1, ..., n, .... It is easy to check that f is a
bijective function between N and A, so A is countable.
The following example should clarify the scope of the previous theorem. The set E of
even numbers is, clearly, a proper subset of N that, one may think, contains only "half" of
its elements. Nevertheless, it is possible to establish a bijective correspondence with N by
putting in correspondence each even number 2n with its half n, that is,

2n ∈ E ⟷ n ∈ N
Therefore, |E| = |N|. Already Galileo realized this remarkable peculiarity of infinite sets,
which clearly distinguishes them from finite sets, whose proper subsets always have smaller
cardinality.4 In a famous passage of the Discorsi e dimostrazioni matematiche intorno a due
nuove scienze,5 published in 1638, he observed that the natural numbers can be put in a
bijective correspondence with their squares by setting n² ⟷ n. The squares, which prima
facie seem to form a rather small subset of N, are thus in equal number with the natural
numbers: "in an infinite number, if one could conceive of such a thing, he would be forced
4 The mathematical fact considered here is at the basis of several little stories. For example, the Paradise
Hotel has countably infinite rooms, progressively numbered 1, 2, 3, .... At a certain moment, they are all
occupied when a new guest checks in. At this point, the hotel manager faces a conundrum: how to find a
room for the new guest? Well, after some thought, he realizes that it is easier than he imagined! It is enough
to ask every guest to move to the room coming after the one they are currently occupying (1 → 2, 2 → 3,
3 → 4, etc.). In this way, room number 1 will become free. He also realizes that it is possible to improve
upon this new arrangement! It is enough to ask everyone to move to the room whose number is twice
that of the room currently occupied (1 → 2, 2 → 4, 3 → 6, etc.). In this way, infinitely many rooms will become
available: all the odd ones.
5 The passage is in a dialogue between Sagredo, Salviati, and Simplicio, during the first day.
to admit that there are as many squares as there are numbers all taken together". The
clarity with which Galileo exposes the problem is worthy of his genius. Unfortunately, the
mathematical notions available to him were completely insufficient for further developing
his intuitions. For example, the notion of function, fundamental for the ideas of Cantor,
emerged (in a primitive form) only at the end of the seventeenth century in the works of
Leibniz.
Clearly, the union of a finite number of countable sets is also countable. Much more is
actually true.

Theorem 273 The union of a countable collection of countable sets is also countable.

Consider countable sets A₁, A₂, ..., Aₙ, ..., whose elements we can list as

A₁ = {a₁₁, a₁₂, ..., a₁ₙ, ...} ;  A₂ = {a₂₁, a₂₂, ..., a₂ₙ, ...} ;  ...  Aₙ = {aₙ₁, aₙ₂, ..., aₙₙ, ...} ; ...
We can then construct an infinite matrix A in which the elements of the set Aₙ form the
n-th row:

A =
[ a₁₁ a₁₂ a₁₃ a₁₄ a₁₅ ... ]
[ a₂₁ a₂₂ a₂₃ a₂₄ a₂₅ ... ]
[ a₃₁ a₃₂ a₃₃ a₃₄ a₃₅ ... ]    (7.1)
[ a₄₁ a₄₂ a₄₃ a₄₄ a₄₅ ... ]
[ a₅₁ a₅₂ a₅₃ a₅₄ a₅₅ ... ]
[ ...                     ]
The matrix A contains at least as many elements as the union ⋃_{n=1}^∞ Aₙ. Indeed, it may contain
more elements because some elements can be repeated more than once in the matrix, while
they would only appear once in the union (net of such repetitions, the two sets have the
same number of elements).
We now introduce another infinite matrix, denoted by N, which contains all the natural
numbers except 0:

N =
[  1  3  6 10 15 ... ]
[  2  5  9 14 ...    ]
[  4  8 13 ...       ]    (7.2)
[  7 12 ...          ]
[ 11 ...             ]
[ ...                ]
Observe that:
1. The first diagonal of A (moving from SW to NE) consists of one element: a₁₁. We map
this element into the natural number 1, which is the corresponding element in the first
diagonal of N. Note that the sum of the indexes of a₁₁ is 1 + 1 = 2.
2. The second diagonal of A consists of two elements: a₂₁ and a₁₂. We map these elements,
respectively, into the natural numbers 2 and 3, which are the corresponding elements
in the second diagonal of N. Note that the sum of the indexes of a₂₁ and a₁₂ is 3.

3. The third diagonal of A consists of three elements: a₃₁, a₂₂, and a₁₃. We map these
elements, respectively, into the natural numbers 4, 5, and 6, which are the corresponding
elements in the third diagonal of N. Note that the sum of the indexes of a₃₁, a₂₂,
and a₁₃ is 4.

4. The fourth diagonal of A consists of four elements: a₄₁, a₃₂, a₂₃, and a₁₄. We map
these elements, respectively, into the natural numbers 7, 8, 9, and 10, which are the
corresponding elements in the fourth diagonal of N. Note that the sum of the indexes
of a₄₁, a₃₂, a₂₃, and a₁₄ is 5.
[figure: the entries a₁₁, a₁₂, a₁₃, a₁₄, ...; a₂₁, a₂₂, a₂₃, a₂₄, ...; a₃₁, a₃₂, a₃₃, a₃₄, ...; a₄₁, a₄₂, a₄₃, a₄₄, ... of the matrix A, with arrows sweeping its diagonals from SW to NE]
At each step we have an arrow, indexed by the sum of the indexes of the entries that it hits,
minus 1. So, arrow 1 hits entry a₁₁, arrow 2 hits entries a₂₁ and a₁₂, arrow 3 hits entries
a₃₁, a₂₂, and a₁₃, and arrow 4 hits entries a₄₁, a₃₂, a₂₃, and a₁₄. Each arrow hits one more
entry than the previous one.
Intuitively, by proceeding in this way we cover the entire matrix A with countably many
arrows, each hitting a finite number of entries. So, matrix A has countably many entries.
The union ⋃_{n=1}^∞ Aₙ is then a countable set.
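The diagonal walk just described is easy to mechanize. The following sketch (ours; the generator name diagonals is hypothetical) lists the index pairs of the matrix A diagonal by diagonal, so that every entry is reached after finitely many steps:

    import itertools

    # Enumerate the index pairs (row, col) of an infinite matrix, diagonal by diagonal:
    # the d-th diagonal consists of the pairs with row + col = d + 1, swept from SW to NE.
    def diagonals():
        d = 1
        while True:
            for row in range(d, 0, -1):
                yield (row, d - row + 1)
            d += 1

    # The first 1 + 2 + 3 + 4 = 10 pairs are exactly those hit by arrows 1 to 4 above.
    print(list(itertools.islice(diagonals(), 10)))
    # [(1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3), (4, 1), (3, 2), (2, 3), (1, 4)]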
That said, next we give a rigorous proof.
Claim 1 N × N is countable.

Proof of Claim 1 Consider the function f₁ : N × N → N given by f₁(m, n) = 2^(n+1) 3^(m+1). Note
that f₁(m, n) = f₁(m′, n′) means that 2^(n+1) 3^(m+1) = 2^(n′+1) 3^(m′+1). By the Fundamental Theorem
of Arithmetic, this implies that n + 1 = n′ + 1 and m + 1 = m′ + 1, proving that (m, n) = (m′, n′).
With a similar argument it is possible to prove that the Cartesian product of a finite
number of countable sets is also countable. Moreover, the previous result yields that the set of
rational numbers is countable.
Consider the function f : N → Z defined by f(n) = n/2 if n is even and f(n) = −(n + 1)/2
if n is odd, which lists the integers as 0, −1, 1, −2, 2, .... The reader can verify that f is
bijective, thus proving that Z is countable. On the other hand, the set

Q = { m/n : m ∈ Z and 0 ≠ n ∈ N }

of rational numbers can be written as the union of infinitely many countable sets:

Q = ⋃_{n=1}^{+∞} Aₙ
where

Aₙ = { 0/n, 1/n, −1/n, 2/n, −2/n, ..., m/n, −m/n, ... }

Since each Aₙ is countable, Theorem 273 yields:

Corollary 274 The set Q of rational numbers is countable.
This corollary is quite surprising: though the rational numbers seem much more numerous
than the natural numbers, there exists a way to put these two classes of numbers into
a bijective correspondence. The cardinality of N, and so of any countable set, is usually
denoted by ℵ₀,6 that is, |N| = ℵ₀. We can then write

|Q| = ℵ₀
Definition 275 A set A has the cardinality of the continuum if it can be put in a bijective
correspondence with the set R of the real numbers. In this case, we write |A| = |R|.
The cardinality of the continuum is often denoted by c, that is, |R| = c. Also in this case
there exist subsets that are, prima facie, much smaller than R but turn out to have the same
cardinality. Let us see an example, which will be useful in proving that R is uncountable.
Proposition 276 The interval (0, 1) has the cardinality of the continuum.7

Proof We want to show that |(0, 1)| = |R|. To do this we have to show that the numbers of
(0, 1) can be put in a bijective correspondence with those of R. The bijection f : R → (0, 1)
defined by

f(x) = 1 − (1/2)e^x   if x < 0
f(x) = (1/2)e^(−x)    if x ≥ 0
6 ℵ (aleph) is the first letter of the Hebrew alphabet. In the next section we will formalize also for infinite
sets the notion of same or greater cardinality. For the time being, we treat these notions intuitively.
7 At the end of Section 6.5.3 we noted that the trigonometric function f : R → (−1, 1) defined by
(2/π) arctan x is a bijection. In view of what we learned so far, this shows that (−1, 1) has the cardinality
of the continuum.
with graph

[figure: graph of f, decreasing from 1 to 0 over the real line, with f(0) = 1/2]
shows that, indeed, this is the case (as the reader can also formally verify).
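The formal verification is left to the reader, but a quick numerical sanity check (a sketch of ours) confirms that f takes values in (0, 1), is strictly decreasing, and is inverted by solving each branch for x:

    import math

    # The bijection f : R -> (0,1) defined in the proof above.
    def f(x):
        return 1 - 0.5 * math.exp(x) if x < 0 else 0.5 * math.exp(-x)

    # Its inverse, obtained by solving the two branches for x.
    def f_inv(y):
        return math.log(2 * (1 - y)) if y > 0.5 else -math.log(2 * y)

    xs = [-5, -1, -0.1, 0, 0.1, 1, 5]
    assert all(0 < f(x) < 1 for x in xs)                 # values lie in (0, 1)
    assert all(f(a) > f(b) for a, b in zip(xs, xs[1:]))  # strictly decreasing
    assert all(abs(f_inv(f(x)) - x) < 1e-9 for x in xs)  # f_inv undoes f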
Theorem 277 The set R of real numbers is uncountable.

Proof Assume, by contradiction, that R is countable. Hence, there exists a bijective function
g : N → R. By Proposition 276, it follows that there exists a bijective function f : R → (0, 1).
The reader can easily prove that f ∘ g is a bijective function from N to (0, 1), yielding that (0, 1)
is countable. We will next reach a contradiction, showing that (0, 1) cannot be countable. To
this end, we write all the numbers in (0, 1) using their decimal representation: each x ∈ (0, 1)
will be written as

x = 0.c₀c₁ ⋯ cₙ ⋯

with cᵢ ∈ {0, 1, ..., 9}, always using infinitely many digits (for example, 0.354 will be written
as 0.354000000...). Since, by our assumption, (0, 1) is countable, there exists a way to
list its elements as a sequence:
x₀ = 0.c₀₀c₀₁c₀₂ ⋯
x₁ = 0.c₁₀c₁₁c₁₂ ⋯
x₂ = 0.c₂₀c₂₁c₂₂ ⋯

and so on. Let us then take a number x̄ = 0.d₀d₁d₂d₃ ⋯ dₙ ⋯ such that its generic
decimal digit dₙ is different from cₙₙ (but without choosing 9 infinitely many times, so as to
avoid a periodic 9 which, as we know, does not exist on its own). The number x̄ belongs
to (0, 1), but does not belong to the list written above since dₙ ≠ cₙₙ (and therefore
it is different from x₀ since d₀ ≠ c₀₀, from x₁ since d₁ ≠ c₁₁, etc.). We conclude that the
8 In Section ?? below, we formally define the inequality |R| > ℵ₀, whose intuitive meaning should be
nonetheless clear.
list written above cannot be complete and hence the numbers of (0, 1) cannot be put in a
bijective correspondence with N. So, the interval (0, 1) is not countable, a contradiction.
The set R of real numbers is, therefore, much richer than N and Q. The rational numbers,
which have, as we remarked, a "quick rhythm", are comparatively very few with respect to
the real numbers: they form a kind of fine dust that overlaps with the real numbers without
covering them all. At the same time, it is a dust so fine that between any two real numbers,
no matter how close they are, there are particles of it.
Summing up, the real line is a new prototype of infinite set.

It is possible to prove that both the union and the Cartesian product of a finite or
countable collection of sets that have the cardinality of the continuum have, in turn, the
cardinality of the continuum. This has the next consequence.

Corollary 278 For every n ≥ 1, the space Rⁿ has the cardinality of the continuum.
This is another remarkable finding, which is surprising already in the special case of the
plane R², which, intuitively, may appear to contain many more points than the real line. It is
in front of results of this type, so surprising for our "finitary" intuition, that Cantor wrote
in a letter to Dedekind "I see it, but I do not believe it". His key intuition on the use of
bijective functions to study the cardinality of infinite sets opened a new and fundamental
area of mathematics, which is also rich in terms of philosophical implications (mentioned at
the beginning of the chapter).
The cardinal numbers encountered so far thus form the list

{ 0, 1, 2, ..., n, ..., ℵ₀, c }    (7.4)

Looking at (7.4), it is natural to wonder whether ℵ₀ and c are the only infinite
cardinal numbers. As we will see shortly, this is far from being true. Indeed, we are about
to uncover a genuine Pandora's box (from which, however, no evil will emerge but only
wonders). To do this, we first need to generalize to any pair of sets the comparative notion
of size we considered in Definitions 271 and 275.
Definition 279 Two sets A and B have the same cardinality if there exists a bijective
function f : A → B. In this case, we write |A| = |B|.
In particular, when A is finite we have |A| = |{1, ..., n}| = n, when A is countable we
have |A| = |N| = ℵ₀, and when A has the cardinality of the continuum we have |A| = |R| = c.

We denote by 2^A the power set of the set A, that is, the collection

2^A = {B : B ⊆ A}

of all its subsets. The notation 2^A is justified by the cardinality of the power set in the finite
case, as we next show.
Indeed, let A be a finite set with |A| = n. Counting its subsets according to their number
k of elements, we get

|2^A| = 1 + C(n, 1) + C(n, 2) + ⋯ + C(n, n−1) + C(n, n)
      = Σ_{k=0}^{n} C(n, k) 1^k 1^(n−k) = (1 + 1)ⁿ = 2ⁿ

where C(n, k) denotes the binomial coefficient and the penultimate equality follows from
Newton's binomial formula (B.7).
Example 281 More compactly, we can write |2^A| = 2^|A|. For instance, when A has three
elements, say A = {a₁, a₂, a₃}, its power set

{ ∅, {a₁}, {a₂}, {a₃}, {a₁, a₂}, {a₁, a₃}, {a₂, a₃}, A }

has cardinality 2³ = 8. N
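A short computational illustration (ours): enumerating the subsets of a three-element set by their cardinality k = 0, 1, 2, 3, in the spirit of the binomial count above:

    from itertools import combinations

    # All subsets of A, listed by cardinality k = 0, 1, ..., n.
    def power_set(A):
        A = list(A)
        return [set(s) for k in range(len(A) + 1) for s in combinations(A, k)]

    A = {"a1", "a2", "a3"}
    assert len(power_set(A)) == 2 ** len(A)  # 8 = 2^3 subsets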
Sets can have the same size, but also different sizes. This motivates the following definition.
Given two sets A and B, we say that:

(i) A has cardinality less than or equal to B, written |A| ≤ |B|, if there exists an injective
function f : A → B;

(ii) A has cardinality strictly less than B, written |A| < |B|, if |A| ≤ |B| and |A| ≠ |B|.

The order ≤ so defined has the following properties:

(i) |A| ≤ |A|;

(ii) |A| ≤ |B| and |B| ≤ |C| imply that |A| ≤ |C|;

(iii) |A| ≤ |B| and |B| ≤ |A| if and only if |A| = |B|;

(iv) A ⊆ B implies that |A| ≤ |B|.
Example 284 We have |N| < |R|. Indeed, by Theorem 277 |N| ≠ |R| and, by point (iv),
N ⊆ R implies |N| ≤ |R|. N
Properties (i) and (ii) say that the order ≤ is reflexive and transitive (cf. Appendix
A). As for property (iii), it tells us that ≤ and = are related in a natural way. Finally,
(iv) confirms the intuitive idea that smaller sets have a smaller cardinality. Remarkably,
this intuition does not carry over to <, that is, A ⊂ B does not imply |A| < |B|, because,
as already noted, a proper subset of an infinite set may have the same cardinality as the
original set (as Galileo had envisioned).
When a set A is finite and non-empty, we clearly have |A| < |2^A|. Remarkably, the
inequality continues to hold for infinite sets.

Theorem 285 (Cantor) For each set A, finite or infinite, we have |A| < |2^A|.
Proof Consider a set A and the collection of all singletons C = {{a}}_{a∈A}. It is immediate to
see that there is a bijective mapping between A and C, that is, |A| = |C|, and C ⊆ 2^A. Since
|C| ≤ |2^A|, we conclude that |A| ≤ |2^A|. Next, by contradiction, assume that |A| = |2^A|.
Then there exists a bijection between A and 2^A which associates to each element a ∈ A an
element b = b(a) ∈ 2^A and vice versa: a ⟷ b. Observe that each b(a), being an element of
2^A, is a subset of A. Consider now all the elements a ∈ A such that the corresponding subset
b(a) does not contain a. Call S the subset of these elements, that is, S = {a ∈ A : a ∉ b(a)}.
Since S is a subset of A, S ∈ 2^A. Since we have a bijection between A and 2^A, there must
exist an element c ∈ A such that b(c) = S. We have two cases:

(i) if c ∈ S, then by the definition of S, c does not belong to b(c) = S, so c ∉ S;

(ii) if c ∉ S, then by the definition of S, b(c) contains c, so c ∈ b(c) = S.

In both cases, we have reached a contradiction, thus proving |A| < |2^A|.
Cantor's Theorem offers a simple way to make a "cardinality jump" starting from a given
set A: it is sufficient to consider the power set 2^A. For example,

|2^R| > |R| ,  |2^(2^R)| > |2^R|

and so on. We can, therefore, construct an infinite sequence of sets of higher and higher
cardinality. In this way, we enrich (7.4), which now becomes

{ 0, 1, 2, ..., n, ..., ℵ₀, c, |2^R|, |2^(2^R)|, ... }    (7.5)
Here is the Pandora's box mentioned above, which Theorem 285 has allowed us to uncover.
The breathtaking sequence (7.5) is only the incipit of the theory of infinite sets, whose
study (even the introductory part) would take us too far away.
Before moving on with the book, however, we consider a final famous aspect of the
theory, the so-called continuum hypothesis (which the reader might have already heard of).
By Theorem 285, we know that |2^N| > |N|. On the other hand, by Theorem 277 we also
have |R| > |N|. The next result (we omit its proof) shows that these two inequalities are
actually not distinct.

Theorem 286 We have |2^N| = |R|.
Therefore, the power set of N has the cardinality of the continuum. The continuum
hypothesis states that there is no set A such that

ℵ₀ < |A| < c

That is, there is no infinite set of cardinality intermediate between ℵ₀ and c. In other words,
a set that has cardinality larger than ℵ₀ must have at least the cardinality of the continuum.
The validity of the continuum hypothesis is the first among the celebrated Hilbert problems,
posed by David Hilbert in 1900, and represents one of the deepest questions in mathematics.
By adopting this hypothesis, it is possible to set

ℵ₁ = c

and to consider the cardinality of the continuum as the second infinite cardinal number ℵ₁
after the first one ℵ₀ = |N|.
The continuum hypothesis can be reformulated in a suggestive way by writing

ℵ₁ = 2^ℵ₀

That is, the smallest cardinal number greater than ℵ₀ is equal to the cardinality of the power
set of N or, equivalently, of any set of cardinality ℵ₀ (like, for example, the rational numbers).
The generalized continuum hypothesis implies that, for each n, we have

ℵₙ₊₁ = 2^ℵₙ
All the jumps of cardinality in (7.5), not only the first one from ℵ₀ to ℵ₁, are thus obtained
by considering power sets. Therefore,

ℵ₂ = 2^ℵ₁ = |2^R| ,  ℵ₃ = 2^ℵ₂ = |2^(2^R)|

and so on. The elements of this sequence are the cardinal numbers that represent all the different
cardinalities (finite or infinite) that sets might have, however large they might be. According
to the generalized continuum hypothesis, the power sets in (7.5) are the prototype sets of
the infinite cardinal numbers (the first two being the two infinite cardinal numbers ℵ₀ = |N|
and ℵ₁ = c with which we started this section).
Summing up, the depth of the problems that the use of bijective functions opened up is
incredible. As we have seen, this study started by Cantor is, at the same time, rigorous
and intrepid, as is typical of the best mathematics, at the basis of its beauty. It relies on
the use of bijective functions to capture the fundamental principle of similarity (in terms of
numerosity) among sets.10
0 = ∅
1 = {∅}
2 = {∅, {∅}}
3 = {∅, {∅}, {∅, {∅}}}

because this clarifies a key feature of this approach: natural numbers are defined as sets that
consist of their own predecessors.
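The construction can even be carried out literally on a machine. A small sketch (ours), using frozensets: each number is the set of its predecessors, and the successor of n is n ∪ {n}:

    # von Neumann naturals: 0 is the empty set, and n + 1 = n U {n}.
    def successor(n):
        return n | frozenset({n})

    zero = frozenset()
    one = successor(zero)   # {0}
    two = successor(one)    # {0, 1}
    three = successor(two)  # {0, 1, 2}

    # Each number has as many elements as its value and consists exactly of its predecessors.
    assert len(three) == 3 and three == frozenset({zero, one, two})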
This beautiful iterative construction of the natural numbers was introduced by John von
Neumann in a 1923 article (so, when he was twenty years old). Ex nihilo nihil, but maybe
the von Neumann construction is an exception: from the empty set, the natural numbers!
Part II
Discrete analysis
Chapter 8
Sequences (sdogonato)
where each number occupies a place of order, a position, so that each number (except the
first one) follows a number and precedes another one. The next definition formalizes this idea.
We denote by N₊ the set of the natural numbers without 0.
n ↦ 2n    (8.2)

and so we have the sequence of (strictly positive) even integers. The image f(n)
is usually denoted by xₙ. With such notation, the sequence of even integers is xₙ = 2n for
each n ≥ 1. The images xₙ are called terms (or elements) of the sequence. We will denote
sequences by {xₙ}_{n=1}^∞ or, briefly, by {xₙ}.1
There are different ways to define a specific sequence {xₙ}, that is, to describe the
underlying function f : N₊ → R. A first possibility is to describe it in closed form through
a formula: for instance, this is what we did with the sequence of the even numbers using
(8.2). Other defining rules are, for example,

n ↦ 2n − 1    (8.3)
n ↦ n²    (8.4)
n ↦ 1/√(2^(n−1))    (8.5)
1 The choice of starting the sequence from n = 1 instead of n = 0 (or of any other natural number k) is a
mere convention. When needed, it is perfectly legitimate to consider sequences {xₙ}_{n=0}^∞ or, more generally,
{xₙ}_{n=k}^∞.
The rule (8.3), for example, generates the sequence of the odd numbers

1, 3, 5, 7, ...    (8.6)

while (8.5) generates

1, 1/√2, 1/√4, 1/√8, ...    (8.7)
To define a sequence in closed form thus amounts to specifying explicitly the underlying function
f : N₊ → R. The next example presents a couple of classic sequences defined in closed form.
Example 288 The sequence of unit fractions with xₙ = 1/n, that is,

1, 1/2, 1/3, 1/4, 1/5, ...

is called harmonic.2 The sequence

a, aq, aq², aq³, aq⁴, ...

is called geometric (or geometric progression) with first term a and common ratio q. For
example, if a = 1 and q = 1/2, we have {1, 1/2, 1/4, 1/8, 1/16, ...}. N
Example 289 Consider the famous Fibonacci sequence

0, 1, 1, 2, 3, 5, 8, 13, 21, ...

in which each term is the sum of the two terms that precede it, with fixed initial values 0
and 1. For example, in the fourth position we find the number 2, i.e., the sum 1 + 1 of the
two terms that precede it; in the fifth position we find the number 3, i.e., the sum 1 + 2 of
the two terms that precede it; and so on. The underlying function f : N₊ → R is, hence,

f(1) = 0 ,  f(2) = 1
f(n) = f(n − 1) + f(n − 2)   for n ≥ 3    (8.8)
We have two initial values, f(1) = 0 and f(2) = 1, and a recursive rule that allows us to
compute the term in position n once the two preceding terms are known. Differently from
the sequences defined through a closed formula, such as (8.3)-(8.5), to obtain the term xₙ
we now have to first construct, using the recursive rule, all the terms that precede it. For
example, to compute the term x₁₀₀ in the sequence of the odd numbers (8.6), it is sufficient
to substitute n = 100 in formula (8.3), finding x₁₀₀ = 199. In contrast, to compute the term
2 Indeed, 1/2, 1/3, 1/4, ... are the positions in which we have to put a finger on a vibrating string to obtain
the different notes.
x₁₀₀ in the Fibonacci sequence we first have to construct by recurrence the first 99 terms of
the sequence. It is true that to determine x₁₀₀ it is sufficient to know the values of
x₉₉ and x₉₈ and then to use the rule x₁₀₀ = x₉₉ + x₉₈, but to determine x₉₉ and x₉₈ we must
first know x₉₇ and x₉₆, and so on.
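The contrast just described can be made concrete with a short sketch (ours): the odd numbers are evaluated in one step from the closed form (8.3), while the Fibonacci term x₁₀₀ must be built by recurrence from all the terms that precede it:

    # Closed form (8.3): a single substitution suffices.
    def odd(n):
        return 2 * n - 1

    # Recursive definition (8.8): all earlier terms must be produced first.
    def fibonacci(n):
        x = [0, 1]                   # f(1) = 0, f(2) = 1
        for _ in range(n - 2):
            x.append(x[-1] + x[-2])  # f(k) = f(k-1) + f(k-2)
        return x[n - 1]

    print(odd(100))        # 199, directly
    print(fibonacci(100))  # 218922995834555169026, after building the first 99 terms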
Therefore, the recursive definition of a sequence consists of one or more initial values and
of a recurrence rule that, starting from them, allows us to compute the subsequent terms of
the sequence. The initial values are arbitrary. For example, if in (8.8) we choose f(1) = 2
and f(2) = 1, we have the following Fibonacci sequence:

2, 1, 3, 4, 7, 11, 18, 29, ...
Starting from the initial value f(1) = a, it is possible to construct the entire sequence
through the recursive formula f(n) = f(n − 1) + b. This is the so-called arithmetic sequence
(or arithmetic progression) with first term a and common difference b. For example, if a = 2
and b = 4, we have {2, 6, 10, 14, 18, 22, ...}. N
To ease notation, the underlying function f is often omitted in recursive formulas. For
instance, the arithmetic sequence is written as

x₁ = a
xₙ = xₙ₋₁ + b   for n ≥ 2    (8.9)
In words, at each position we can go either up or down by one unit: we go down when we
reach a position that is a multiple of 3, and we go up otherwise.3 This sequence is an example
of a random walk: it may describe the walk of a drunk person who, at each block, goes
either North, +1, or South, −1, and who, for some (random) reason, always goes South
3 In this chapter we illustrate the idea of recurrence through examples. A formal analysis will be presented in
Section 14.2, which is best read after an intuitive understanding of this fundamental idea has been developed.
after having gone twice North. For instance, if the initial condition is a = 0 we have:

0, 1, 0, 1, 2, 1, 2, 3, 2, 3, 4, 3, ...
More generally, given any subset P (finite or not) of N₊, the recurrence (8.10) is called a
random walk. N
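A sketch (ours) of the recursion (8.10) for the walk just described, with P the set of multiples of 3 and initial condition a = 0:

    # Walk recursion: from position n-1 to n, go down by 1 if n is in P, up by 1 otherwise.
    def walk(a, in_P, n_terms):
        x = [a]
        for n in range(2, n_terms + 1):
            x.append(x[-1] + (-1 if in_P(n) else 1))
        return x

    # P = multiples of 3: two steps North, then one South, over and over.
    print(walk(0, lambda n: n % 3 == 0, 12))
    # [0, 1, 0, 1, 2, 1, 2, 3, 2, 3, 4, 3]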
Example 291 A Star Wars jedi begins his career as a padawan apprentice under a jedi
master, then becomes a knight and, once ready to train, becomes a master and takes a
padawan apprentice.
Let

pₜ = number of jedi padawans at time t
kₜ = number of jedi knights at time t
mₜ = number of jedi masters at time t

Assume that, as one (galactic) year passes, padawans become knights, knights become masters,
and masters take a padawan apprentice. Formally:

kₜ₊₁ = pₜ
mₜ₊₁ = mₜ + kₜ
pₜ₊₁ = mₜ₊₁
The total number of jedis at time t + 2, denoted by xₜ₊₂, is then:

xₜ₊₂ = kₜ₊₂ + mₜ₊₂ + pₜ₊₂ = pₜ₊₁ + mₜ₊₁ + kₜ₊₁ + mₜ₊₁ + kₜ₊₁
     = xₜ₊₁ + mₜ₊₁ + kₜ₊₁ = xₜ₊₁ + mₜ + kₜ + pₜ = xₜ₊₁ + xₜ

So, we have a Fibonacci recursion

xₜ₊₂ = xₜ₊₁ + xₜ
which says something simple but not so obvious a priori: the number of jedis at time t + 2 can
be regarded as the sum of the numbers of jedis at time t + 1 and at time t. Indeed, a jedi
is a master at t + 2 if and only if he was a jedi (of any kind) at t. So, xₜ gives the number
of all masters at t + 2, who in turn increase at t + 2 the population of jedis by taking new
apprentices.

The recursion is initiated at t = 1 by a "self-taught" original padawan, who becomes a
knight at t = 2 and a master with a new padawan at t = 3. So:

x₁ = 1 ,  x₂ = 1
xₜ = xₜ₋₁ + xₜ₋₂   for t ≥ 3
p                1 = 1
k                1 = 1
mp               1 + 1 = 2
mpk              1 + 2 = 3
mpkmp            2 + 3 = 5
mpkmpmpk         3 + 5 = 8
mpkmpmpkmpkmp    5 + 8 = 13

Note how every string is the concatenation of the previous two ones. N
Consider now the recursion defined by x₁ = 2 and xₙ = 2xₙ₋₁ for n ≥ 2. We have

x₂ = 4 ,  x₃ = 8 ,  x₄ = 16

and so on. This suggests that the closed form is the geometric sequence

xₙ = 2ⁿ   ∀n ≥ 1    (8.12)

with both first term and common ratio equal to 2. Let us verify that this guess is correct. We proceed
by induction. Initial step: for n = 1 we have x₁ = 2, as desired. Induction step: assume
that (8.12) holds for some n ≥ 1; then

xₙ₊₁ = 2xₙ = 2 · 2ⁿ = 2ⁿ⁺¹

and so (8.12) holds for n + 1. By induction, it then holds for all n ≥ 1. In general, the geometric
sequence with first term a and common ratio q obeys the recursion x₁ = a and xₙ = qxₙ₋₁ for
n ≥ 2, with closed form xₙ = aq^(n−1), as the reader can prove. This recursion also motivates
the "first term" and "common ratio" terminology. N
Consider now the arithmetic sequence (8.9). We have

x₂ = a + b ,  x₃ = a + 2b ,  x₄ = a + 3b

and so on. This suggests that the closed form is

xₙ = a + (n − 1)b   ∀n ≥ 1    (8.13)

Let us verify that this guess is correct. We proceed by induction. Initial step: for n = 1 we
have x₁ = a, as desired. Induction step: assume that (8.13) holds for some n ≥ 1; then

xₙ₊₁ = xₙ + b = a + (n − 1)b + b = a + nb

and so (8.13) holds for n + 1. By induction, it then holds for all n ≥ 1. N
Example 295 An investor can at each period of time invest an amount of money x, a
monetary capital, and receive at the end of the next period the original amount invested x along
with an additional amount rx computed according to the interest rate r ≥ 0. Such additional
amount is the fruit of his investment. For instance, if x = 100 and r = 0.1, then rx = 10 is
such an amount.

Assume that the investor has an initial monetary capital c that he keeps investing in all
periods. The resulting cash flow is described by the following recursion:

x₁ = c
xₜ = (1 + r)xₜ₋₁   for t ≥ 2

We have

x₂ = c(1 + r) ,  x₃ = x₂(1 + r) = c(1 + r)² ,  x₄ = x₃(1 + r) = c(1 + r)³
and so on. This suggests the closed form

xₜ = c(1 + r)^(t−1)   ∀t ≥ 1    (8.14)

To verify this guess, we can proceed by induction. Initial step: for t = 1 we have x₁ = c, as
desired. Induction step: assume that (8.14) holds for some t ≥ 1; then

xₜ₊₁ = (1 + r)xₜ = (1 + r)c(1 + r)^(t−1) = c(1 + r)ᵗ

and so (8.14) holds for t + 1. By induction, it then holds for all t ≥ 1. Formula (8.14) is the
classic compound interest formula of financial mathematics. N
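A quick check (ours; the parameter values are arbitrary) that the recursion and the closed form (8.14) agree:

    # Compound interest: recursion x_t = (1 + r) x_{t-1} versus closed form c (1 + r)^(t-1).
    def balance(c, r, t):
        x = c
        for _ in range(t - 1):
            x *= 1 + r  # one period of interest accrual
        return x

    c, r = 100.0, 0.1
    for t in [1, 2, 5, 10]:
        assert abs(balance(c, r, t) - c * (1 + r) ** (t - 1)) < 1e-9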
Not all sequences can be described in closed or recursive form. In this regard, the most
famous example is the sequence {pₙ} of prime numbers: it is infinite by Euclid's Theorem,
but it does not have a (known) explicit description. In particular:

(i) Given n, we do not know any formula that tells us what pₙ is; in other words, the
sequence {pₙ} cannot be defined in closed form.

(ii) Given pₙ (or any smaller prime), we do not know any formula that tells us what pₙ₊₁
is; in other words, the sequence {pₙ} cannot be defined by recurrence.

(iii) Given any prime number p, we do not know any (operational) formula that gives us
a prime number q greater than p; in other words, the knowledge of a prime number
does not give any information on the subsequent prime numbers.
Hence, we do not have a clue about how prime numbers follow one another, that is, about the
form of the function f : N₊ → R that defines their sequence. We have to consider all the
natural numbers and check, one by one, whether or not they are prime through
primality tests (Section 1.3.2). Having eternity at our disposal, we could then construct
term by term the sequence {pₙ}. More modestly, in the short time that has passed between
Euclid and us, tables of prime numbers have been compiled; they establish the terms of the
sequence {pₙ} up to numbers that may seem huge to us, but that are nothing relative to the
infinity of all the prime numbers.
O.R. As to (iii), for centuries mathematicians have looked for a (workable) rule that, given
a prime number p, would make it possible to find a greater prime q > p, that is, a function
q = f(p). A famous example of a possible such rule is given by the so-called Mersenne
primes, which are the prime numbers that can be written in the form 2^p − 1 with p prime.
It is possible to prove that if 2^p − 1 is prime, then so is p. For centuries, it was believed (or
hoped) that the much more interesting converse was true, namely: if p is prime, so is 2^p − 1.
This conjecture was definitively disproved in 1536, when Hudalricus Regius showed that

2¹¹ − 1 = 2047 = 23 · 89

thus finding the first counterexample to the conjecture. Indeed, p = 11 does not satisfy it.
In any case, Mersenne primes are among the most important prime numbers. In particular,
as of 2016,5 the greatest prime number known is

2^74207281 − 1
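A direct check (ours) of Regius' counterexample, together with the valid direction of the implication for small exponents:

    # Trial-division primality test, adequate for small numbers.
    def is_prime(n):
        return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

    # Regius' counterexample: p = 11 is prime, yet 2^11 - 1 = 2047 = 23 * 89 is not.
    assert is_prime(11) and 2 ** 11 - 1 == 2047 == 23 * 89 and not is_prime(2047)

    # In the other direction: whenever 2^p - 1 is prime, the exponent p is prime too.
    for p in range(2, 30):
        if is_prime(2 ** p - 1):
            assert is_prime(p)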
We close the section by observing that, given any function f : R₊ → R, its restriction f|_{N₊}
to N₊ is a sequence. So, functions defined on (at least) the positive half-line automatically
define a sequence as well.
x = {xₙ} = {x₁, x₂, ..., xₙ, ...}
The operations on functions studied in Section 6.3.2 have, as a special case, the operations
on sequences, that is, on elements of the space R^∞. In particular, given any two sequences
x = {xₙ} and y = {yₙ} in R^∞, we have:

To ease notation, we will denote the sum directly by {xₙ + yₙ} instead of {(x + y)ₙ}.
We will do the same for the other operations.6
(ii) x > y if x ≥ y and x ≠ y, i.e., if x ≥ y and there is at least one position n such that
xₙ > yₙ;
5 See the Great Internet Mersenne Prime Search.
6 If f, g : N₊ → R are the functions underlying the sequences {xₙ} and {yₙ}, their sum is equivalently
written (x + y)ₙ = (f + g)(n) = f(n) + g(n) for every n ≥ 1. A similar remark holds for the other operations.
So, the operations on functions imply those on sequences, as claimed.
x ≫ y ⟹ x > y ⟹ x ≥ y   ∀x, y ∈ R^∞
That said, as in Rⁿ, in R^∞ the order ≥ is not complete and sequences might well
not be comparable. For instance, the alternating sequence xₙ = (−1)ⁿ and the constant
sequence yₙ = 0 cannot be compared. Indeed, they are {−1, 1, −1, 1, ...} and {0, 0, 0, 0, ...},
respectively.
Given a set A ⊆ R^∞, a function g : A → R is:

(i) increasing if

x ≥ y ⟹ g(x) ≥ g(y)   ∀x, y ∈ A    (8.15)

(ii) strictly increasing if x > y implies g(x) > g(y);

(iii) strongly increasing if x ≫ y implies g(x) > g(y);

(iv) constant if there exists k ∈ R such that

g(x) = k   ∀x ∈ A
The decreasing counterparts of these notions are similarly defined. For brevity, we do
not dwell upon them. We just note that, as in Rⁿ, strict monotonicity implies
the other two kinds of monotonicity, and that constancy implies both increasing and decreasing
monotonicity, but not vice versa (cf. Example 223).
function U : R^∞₊ → R. For example, if we assume that the consumer evaluates the consumption
xₜ of each period through an instantaneous (bounded) utility function u : R₊ → R, then
a standard form of the intertemporal utility function is

U(x) = u(x₁) + βu(x₂) + ⋯ + β^(t−1)u(xₜ) + ⋯

where β ∈ (0, 1) can be interpreted as a subjective discount factor that, as we have seen,
depends on the degree of patience of the consumer (Section 6.2.2).
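As a numerical sketch (ours: the horizon T, the bounded instantaneous utility u(x) = 1 − e^(−x), and all parameter values are our own choices), one can truncate the series and see how patience affects intertemporal utility:

    import math

    # Truncated intertemporal utility: sum of beta^(t-1) u(x_t) for t = 1, ..., T,
    # with the bounded instantaneous utility u(x) = 1 - exp(-x).
    def U(consumption, beta):
        u = lambda c: 1 - math.exp(-c)
        return sum(beta ** (t - 1) * u(c) for t, c in enumerate(consumption, start=1))

    stream = [1.0] * 50  # a constant consumption stream over 50 periods
    print(U(stream, 0.90))  # ~ 6.3: an impatient consumer discounts the future heavily
    print(U(stream, 0.99))  # ~ 25.0: a patient consumer values the same stream far more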
The monotonicity properties of intertemporal utility functions U : R^∞₊ → R are, clearly,
those seen in points (i)-(iv) of the previous section for a generic function g defined on subsets
of R^∞.
Economic agents' decisions are often based on variables the value of which they will only
learn in the future. At the moment of the decision, agents can only rely on their subjective
expectations about such values. For this reason, expectations come to play a key role in
economics and the relevance of this subjective component is a key feature of economics
as a social science that distinguishes it from, for instance, the natural sciences. Through
sequences we can give a first illustration of their importance.
Let us consider the market, denoted by M, of some agricultural good, say potatoes. It is
formed by a demand function D : [a, b] → R and by a supply function S : [a, b] → R, with
0 ≤ a < b. The image D(p) is the overall amount of potatoes demanded at price p by
consumers, while the image S(p) is the overall amount of potatoes supplied at price p by
producers. We assume that both such quantities respond instantaneously to changes in the
market price p: in particular, producers are able to adjust their production levels in real time
according to the market price p.
Definition 297 A pair (p̄, q̄) ∈ [a, b] × R₊ of prices and quantities is called an equilibrium
of market M if

q̄ = D(p̄) = S(p̄)

The pair (p̄, q̄) is the equilibrium of our market of potatoes. Graphically, it corresponds
to the classic intersection of supply and demand:
8.4. APPLICATION: PRICES AND EXPECTATIONS 201
6
y
D
5
S
3
0
O b x
-1
-0.5 0 0.5 1 1.5 2
To fix ideas, suppose that demand and supply are linear, say D(p) = α − βp and S(p) = γp
with α, β, γ > 0, and consider the price interval [0, α/β], where both quantities are positive.
So, we consider demand and supply functions defined only on such an interval even though,
mathematically, they are straight lines defined on the entire real line.
For our linear economy, the equilibrium condition becomes

α − βp = γp

So, the equilibrium price and quantity are

p̄ = α/(β + γ)    (8.17)

and

q̄ = D(p̄) = α − βp̄ = α − αβ/(β + γ) = αγ/(β + γ)

Note that, equivalently, we can retrieve the equilibrium quantity via the supply function:

q̄ = S(p̄) = γp̄ = αγ/(β + γ)

Thus, the pair

( α/(β + γ) , αγ/(β + γ) )

is the equilibrium of our market of potatoes.
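A numerical illustration (ours; the parameter values are arbitrary) of the equilibrium formulas just derived:

    # Linear market: D(p) = alpha - beta * p and S(p) = gamma * p.
    alpha, beta, gamma = 10.0, 2.0, 3.0

    D = lambda p: alpha - beta * p
    S = lambda p: gamma * p

    p_bar = alpha / (beta + gamma)           # equilibrium price
    q_bar = alpha * gamma / (beta + gamma)   # equilibrium quantity

    assert abs(D(p_bar) - S(p_bar)) < 1e-12 and abs(D(p_bar) - q_bar) < 1e-12
    print(p_bar, q_bar)  # 2.0 6.0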
Suppose now that the market repeats itself, identical, over time, so that at each date t ≥ 1
we have a market Mₜ of the form

D(pₜ) = α − βpₜ    (Mₜ)
S(pₜ) = γpₜ

with equilibrium condition

qₜ = D(pₜ) = S(pₜ)   ∀t ≥ 1

It is easy to check that the resulting sequence of equilibrium prices {pₜ} is constant:

pₜ = α/(β + γ)   ∀t ≥ 1    (8.18)
We thus get back to the equilibrium price (8.17) of market M. This is not surprising: because
of our assumptions, the markets Mₜ are independent and, at each t, we have a market identical
to M.
The hypothesis of instantaneous production upon which our analysis relies is, however,
implausible. Let us make the more plausible hypothesis that producers can adjust their
production only after one period: their production technology requires that the quantity
that they supply at t has to be decided at t − 1 (to harvest potatoes at t, we need to sow at
t − 1).

At the decision time t − 1, producers do not know the value of the future equilibrium
price pₜ; they can only have a subjective expectation about it. Denote by Eₜ₋₁(pₜ) such an
expected value. In this case the market at t, denoted by MRₜ, has the form
expected value. In this case the market at t, denoted by MRt , has the form
D (pt ) = pt (MRt )
S (Et 1 (pt )) = Et 1 (pt )
where the expectation Et 1 (pt ) replaces the price pt as an argument of the supply function.
Indeed, producers' decisions now rely upon such expectation.
7 Here [a, b]^∞ denotes the collection of sequences whose terms all belong to the interval [a, b].
Definition 299 A triple of sequences of prices {pₜ} ∈ [a, b]^∞, quantities {qₜ} ∈ R^∞₊, and
expectations {Eₜ₋₁(pₜ)} ∈ [a, b]^∞ is called a uniperiodal market equilibrium of the markets MRₜ
if

qₜ = D(pₜ) = S(Eₜ₋₁(pₜ))   ∀t ≥ 1    (8.19)

Since prices are positive and belong to the interval [0, α/β], we must have

0 ≤ Eₜ₋₁(pₜ) ≤ α/β   ∀t ≥ 1    (8.20)

The simplest hypothesis on how expectations are formed is that producers expect the last
price they observed:

Eₜ₋₁(pₜ) = pₜ₋₁   ∀t ≥ 2    (8.21)
with an arbitrary initial expectation E₀(p₁).8 With this process of expectation formation
(the so-called classic expectations9) the market MRₜ becomes

D(pₜ) = α − βpₜ
S(pₜ₋₁) = γpₜ₋₁

In view of (8.19), at a uniperiodal market equilibrium, prices then evolve according to the
linear recursion

pₜ = α/β − (γ/β)pₜ₋₁   ∀t ≥ 2    (8.22)

with initial price

p₁ = α/β − (γ/β)E₀(p₁)    (8.23)
So, starting from an initial expectation, prices are determined by recurrence. Expectations
no longer play an explicit role in the evolution of prices, thus dramatically simplifying
the analysis. Yet, one should not forget that, though they do not appear in the recursion,
expectations are key in the underlying economic process. Specifically, once a value of
E₀(p₁) is fixed, from (8.23) we have the initial equilibrium price, which in turn determines both the
expectation E₁(p₂) via (8.21) and the next equilibrium price p₂ via the recursion (8.22), and
so on and so forth. Starting from an initial expectation, this process thus generates equilibrium
sequences {pₜ} and {Eₜ₋₁(pₜ)} of prices and expectations.
Assume, instead, that producers expect the future price to be an average of the last
two observed prices:

Eₜ₋₁(pₜ) = (1/2)pₜ₋₁ + (1/2)pₜ₋₂   ∀t ≥ 3    (8.24)

with arbitrary initial expectations E₀(p₁) and E₁(p₂). In view of (8.19), at a uniperiodal
market equilibrium, prices then evolve according to the following linear recursion of order 2:

p₁ = α/β − (γ/β)E₀(p₁) ,  p₂ = α/β − (γ/β)E₁(p₂)
pₜ = α/β − (γ/2β)pₜ₋₁ − (γ/2β)pₜ₋₂   for t ≥ 3
with arbitrary initial expectations E₀(p₁) and E₁(p₂). Expectations based on (possibly
weighted) averages of past prices (the so-called extrapolative expectations) make it possible
to describe equilibrium prices via a linear recurrence, a very tractable form.
This is, however, a quite naive mechanism of price formation; agents might well feature more
sophisticated ways to form expectations. This is the case, for instance, for adaptive expectations,
the most important mechanism of expectation formation: a sequence of expectations
{Eₜ₋₁(pₜ)} is said to be adaptive if there exists a coefficient λ ∈ (0, 1] such that

Eₜ₋₁(pₜ) − Eₜ₋₂(pₜ₋₁) = λ (pₜ₋₁ − Eₜ₋₂(pₜ₋₁))    (8.25)

Over time, expectations are updated according to the previous forecast errors. If λ = 1 we
get back to the classic case Eₜ₋₁(pₜ) = pₜ₋₁. In general, for λ ∈ (0, 1), by iterating we get:

Eₜ₋₁(pₜ) = λ Σ_{i=1}^{t−1} (1 − λ)^(i−1) pₜ₋ᵢ + (1 − λ)^(t−1) E₀(p₁)    (8.26)
Adaptive expectations are an average of the initial expectation E₀(p₁) and past prices. It
is an average in which the more recent prices have a higher weight than the older ones.
Moreover, since λ ∈ (0, 1), we have lim_{t→∞} (1 − λ)^(t−1) = 0, so the term (1 − λ)^(t−1)
gets smaller as t gets larger. The initial expectation E₀(p₁) thus becomes, as t increases, less
and less important in the formation of the expectation Eₜ₋₁(pₜ): eventually, only past prices
matter.
Except in the classic case λ = 1, adaptive and extrapolative expectations are distinct
mechanisms. Indeed, in (8.26) all past prices matter, though the more recent ones have a
higher weight. In contrast, extrapolative expectations only rely on the last few prices; e.g., in
(8.24), only on those of the past two periods.
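A sketch (ours; prices and parameters are arbitrary) checking that iterating the adaptive rule reproduces the weighted average (8.26):

    # Adaptive updating: E <- E + lam * (observed price - E), with lam in (0, 1].
    def adaptive(prices, E0, lam):
        E = E0
        for p in prices:
            E += lam * (p - E)
        return E

    # The weighted-average form (8.26) of the same expectation.
    def closed_form(prices, E0, lam):
        t = len(prices) + 1
        s = sum(lam * (1 - lam) ** (i - 1) * prices[t - 1 - i] for i in range(1, t))
        return s + (1 - lam) ** (t - 1) * E0

    prices, E0, lam = [3.0, 2.5, 2.8, 3.2, 3.0], 2.0, 0.4
    assert abs(adaptive(prices, E0, lam) - closed_form(prices, E0, lam)) < 1e-12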
Assume that producers' expectations are adaptive. In view of the equilibrium relation
(8.19), we have

(α − βpₜ)/γ = λpₜ₋₁ + (1 − λ)(α − βpₜ₋₁)/γ

that is,

pₜ = λα/β + (1 − λ(1 + γ/β)) pₜ₋₁
with arbitrary initial expectation E₀(p₁). Remarkably, adaptive expectations also result in
a simple linear recurrence for equilibrium prices. If λ = 1, we get back to the "classic"
recurrence (8.22)-(8.23).
Consider the alternating sequence xₙ = (−1)ⁿ, that is,

{−1, 1, −1, 1, ...}    (8.28)

and the constant sequence xₙ = 2, that is,

{2, 2, 2, 2, ...}    (8.29)

Useful information about a sequence f : N₊ → R is carried by its image

Im f = {f(n) : n ≥ 1}

which consists exactly of the values that the sequence takes on, disregarding
repetitions. For example, the image of the alternating sequence (8.28) is {−1, 1}, while for
the constant sequence (8.29) it is the singleton {2}. The image thus gives an important
piece of information in that it indicates which values the sequence actually takes on, net of
repetitions: as we have seen, such values may be very few and just repeat themselves over and
over again along the sequence. On the other hand, the sequence of the odd numbers (8.6) does
not contain any repetition; its image consists of all its terms, that is, Im f = {2n − 1 : n ≥ 1}.
Through the image, in Section 6.4.3 we studied some notions of boundedness for functions.
In the special case of sequences, i.e., of functions f : N₊ → R, these notions take the
following form. A sequence {xₙ} is:

(i) bounded (from) above if there exists k ∈ R such that xₙ ≤ k for every n ≥ 1;

(ii) bounded (from) below if there exists k ∈ R such that xₙ ≥ k for every n ≥ 1;

(iii) bounded if it is bounded both above and below, i.e., if there exists k > 0 such that
|xₙ| ≤ k for every n ≥ 1.
For example, the alternating sequence xₙ = (−1)ⁿ is bounded, while that of the odd
numbers (8.6) is only bounded below. Note that, as usual, this classification is not exhaustive
because there exist sequences that are unbounded both above and below: for example, the
(strongly) alternating sequence xₙ = (−1)ⁿ n.11 Such sequences are called unbounded.

11 By "unbounded above (below)" we mean "not bounded from above (below)".
Monotone sequences are another important class of sequences. By applying to the underlying
function f : N₊ → R the notions of monotonicity introduced for functions (Section
6.4.4), we say that a sequence {xₙ} is:

(i) increasing if

xₙ₊₁ ≥ xₙ   ∀n ≥ 1

and strictly increasing if

xₙ₊₁ > xₙ   ∀n ≥ 1

(ii) decreasing if

xₙ₊₁ ≤ xₙ   ∀n ≥ 1

and strictly decreasing if

xₙ₊₁ < xₙ   ∀n ≥ 1

(iii) constant if it is both increasing and decreasing, i.e., if there exists k ∈ R such that

xₙ = k   ∀n ≥ 1
Definition 300 We say that a sequence satisfies a property P eventually if, starting from
a certain position n = n_P, all the terms of the sequence satisfy P.
Example 301 (i) The sequence {2, 4, 6, 32, 57, 1, 3, 5, 7, 9, 11, ...} is eventually increasing:
indeed, starting from the 6th term, it is increasing.

(ii) The sequence {n} is eventually ≥ 1,000: indeed, all the terms of the sequence, starting
from the one in position 1,000, are ≥ 1,000.

(iii) The same sequence is also eventually ≥ 1,000,000,000 as well as ≥ 10¹²³.

(iv) The sequence {1/n} is eventually smaller than 1/1,000,000.

(v) The sequence

{27, 65, 13, 32, ..., 125, 32, 3, 3, 3, 3, 3, 3, 3, 3, ...}

is eventually constant. N
O.R. To eventually satisfy a property, the sequence in its "youth" can do whatever it wants;
what matters is that, when old enough (i.e., from a certain n onward), it settles down. Youthful
blunders are forgiven as long as, sooner or later, all the terms of the sequence satisfy
the property. H
8.8.1 Convergence
We start with convergence, that is, with case (i) above.
Definition 302 A sequence {xₙ} converges to a point L ∈ R, written xₙ → L or
lim_{n→∞} xₙ = L, if for every ε > 0 there exists n_ε ≥ 1 such that

n ≥ n_ε ⟹ |xₙ − L| < ε

Therefore, a sequence {xₙ} converges to L when, for each quantity ε > 0, arbitrarily small
but positive, there exists a position n_ε (that depends on ε!) starting from which the
distance between the terms xₙ of the sequence and the limit L is always smaller than ε.
Intuitively, the sequence's terms xₙ eventually approximate the value L within any standard
of approximation ε > 0 that one may posit, however small (so, however demanding)
this posited standard ε may be. A sequence {xₙ} that converges to a point L ∈ R is called
convergent.
O.R. To converge to L, the sequence has to pass a highly demanding test: given any threshold
ε > 0 selected by a relentless examiner, there has to be a position n_ε far enough along so that all
terms of the sequence that come after such a position are ε-close to L. A convergent sequence
is able to pass any such test, however tough the examiner can be, i.e., however small the
posited ε > 0 is. H
We emphasized through an exclamation point that the position n_ε depends on ε, a key
feature of the previous definition. Moreover, such an n_ε is not unique: if there exists a position
n_ε such that |xₙ − L| < ε for every n ≥ n_ε, the same is true for any subsequent position,
which then also qualifies as n_ε. The choice of which among these positions to call n_ε is
irrelevant for the definition, which only requires the existence of at least one of them.

That said, there is always a smallest n_ε, which is a genuine threshold. As such, its
dependence on ε takes a natural monotone form: such an n_ε becomes larger and larger as ε
becomes smaller and smaller. The smallest n_ε thus best captures, because of its threshold
nature, the spirit of the definition: for each arbitrarily small ε > 0, there exists a threshold
n_ε (the larger, the smaller, so the more demanding, ε is) beyond which the terms xₙ are
ε-close to the limit L. The two examples that we will present shortly should clarify this
discussion.
So, in view of (8.31), we can rewrite the definition of convergence in the language of neighborhoods.
Conceptually, it is an important rewriting that deserves a separate mention.
Definition 303 A sequence {xₙ} converges to a point L ∈ R if, for every neighborhood
B_ε(L) of L, there exists n_ε ≥ 1 such that

n ≥ n_ε ⟹ xₙ ∈ B_ε(L)
Example 304 Consider the sequence xₙ = 1/n. The natural candidate for its limit is 0.
Let us verify that this is the case. Let ε > 0. We have

|1/n − 0| < ε ⟺ 1/n < ε ⟺ n > 1/ε

Therefore, if we take as n_ε any integer greater than 1/ε, for example the smallest one n_ε =
[1/ε] + 1,12 we then have

n ≥ n_ε ⟹ 0 < 1/n < ε

Therefore, 0 is indeed the limit of the sequence. For example, if ε = 10⁻¹⁰⁰, we have
n_ε = 10¹⁰⁰ + 1. Note that we could have chosen n_ε to be any integer greater than 10¹⁰⁰ + 1,
which is indeed the smallest n_ε. N
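A tiny sketch (ours) of the threshold computation in this example:

    import math

    # For x_n = 1/n and a given eps > 0, the smallest valid threshold is [1/eps] + 1.
    def n_eps(eps):
        return math.floor(1 / eps) + 1

    eps = 1e-6
    n0 = n_eps(eps)  # 1000001
    assert 1 / n0 < eps <= 1 / (n0 - 1)  # n0 works, n0 - 1 does not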
Example 305 Consider the sequence (8.7), that is, xₙ = 1/√(2^(n−1)). Also here the natural
candidate for its limit is 0. Let us verify this. Let ε > 0. We have

|1/√(2^(n−1)) − 0| < ε ⟺ 2^((n−1)/2) > 1/ε ⟺ n > 1 + 2 log₂ (1/ε)

Therefore, by taking n_ε to be any integer greater than 1 + 2 log₂ ε⁻¹, we have

n ≥ n_ε ⟹ 0 < 1/√(2^(n−1)) < ε

Therefore, 0 is the limit of the sequence. When ε < 1 the smallest n_ε is 1 + [1 + 2 log₂ ε⁻¹];
for example, when ε = 10⁻¹⁰⁰ it is 1 + [1 + 2 log₂ 10¹⁰⁰] = 1 + [1 + 200 log₂ 10]. N
We saw two examples of sequences that converge to 0. Such sequences are called infinitesimal
(or null). Thanks to the next result, the computation of their limits is of particular
importance.

Proposition 306 A sequence {xₙ} converges to L ∈ R if and only if lim_{n→∞} d(xₙ, L) = 0.

Proof "If". Suppose that lim_{n→∞} d(xₙ, L) = 0. Let ε > 0. There exists n_ε ≥ 1 such that
d(xₙ, L) < ε for every n ≥ n_ε. Therefore, xₙ ∈ B_ε(L) for every n ≥ n_ε, as desired.
\Only if". Let limn!1 xn = L. Consider the sequence of distances, whose term is
yn = d(xn ; L). We have to prove that limn!1 yn = 0, i.e., that for every " > 0 there exists
12 Recall that [·] denotes the integer part (Section 1.4.3).
n" 1 such that n n" implies jyn j < ". Since yn 0, this is actually equivalent to showing
that
n n" =) yn < " (8.32)
Since xn ! L, given " > 0 there exists n" 1 such that yn = d(xn ; L) < " for every n n" .
Therefore, (8.32) holds.
We can thus reduce the study of the convergence of any sequence to the convergence to
0 of the sequence of distances {d(xₙ, L)}. In other words, to check whether xₙ → L, it is sufficient
to check whether d(xₙ, L) → 0, that is, whether the sequence of distances is infinitesimal.

Since d(xₙ, 0) = |xₙ|, a simple noteworthy consequence of the last proposition is that

xₙ → 0 ⟺ |xₙ| → 0    (8.33)

A sequence is, thus, infinitesimal if and only if it is "absolutely" infinitesimal, in that the
distances of its terms from the origin become smaller and smaller.
The notions of limits from above and from below can also be stated in terms of right and
left neighborhoods of L, as readers can check.
8.8.3 Divergence

We now consider divergence. We begin with positive divergence. The spirit of the definition
is similar, mutatis mutandis, to that of convergence (as will soon be clear).

Definition 310 A sequence {xₙ} diverges positively, written xₙ → +∞ or lim_{n→∞} xₙ = +∞,
if for every K ∈ R there exists n_K ≥ 1 such that

n ≥ n_K ⟹ xₙ > K
In other words, a sequence diverges positively when it eventually becomes greater than
every scalar K. Since the constant K can be taken arbitrarily large, this can happen only
if the sequence is not bounded above (it is easy to be > K when K is small, and increasingly
difficult the larger K is).
Example 311 The sequence of even numbers xₙ = 2n diverges positively. Indeed, let
K ∈ R. We have:

2n > K ⟺ n > K/2

and so we can choose as n_K any integer greater than K/2. For example, if K = 10¹⁰⁰, we
can put n_K = 10¹⁰⁰/2 + 1. Therefore, xₙ = 2n diverges positively. N
O.R. For divergence there is a demanding "above the bar" test to pass: a relentless examiner
now sets an arbitrary bar K; for a sequence to diverge, there has to be a position n_K far enough
along so that all terms of the sequence that come after such a position are above the arbitrarily
posited bar. A divergent sequence is able to pass any such test, however tough the examiner
can be, i.e., however high K is. H
Definition 312 A sequence {xₙ} diverges negatively, written xₙ → −∞ or lim_{n→∞} xₙ = −∞,
if for every K ∈ R there exists n_K ≥ 1 such that

n ≥ n_K ⟹ xₙ < K
In such a case, the terms of the sequence are eventually smaller than every scalar K:
although the constant can take arbitrarily large negative values (in absolute value), there
exists a position beyond which all the terms of the sequence are smaller than the constant.
This characterizes the convergence to −∞ of the sequence.
Proposition 313 A sequence {xₙ}, with eventually xₙ > 0, diverges positively if and only
if the sequence {1/xₙ} converges to zero.

Proof "If". Let 1/xₙ → 0. Let K > 0. Setting ε = 1/K > 0, by Definition 302 there exists
n_{1/K} ≥ 1 such that 0 < 1/xₙ < 1/K for every n ≥ n_{1/K}. Therefore, xₙ > K for every
n ≥ n_{1/K}, and by Definition 310 we have xₙ → +∞.

"Only if". Let xₙ → +∞ and let ε > 0. Setting K = 1/ε > 0, by Definition 310 there
exists n_{1/ε} such that xₙ > 1/ε for every n ≥ n_{1/ε}. Therefore, 0 < 1/xₙ < ε for every n ≥ n_{1/ε},
and so 1/xₙ → 0.
Adding, subtracting or changing in any other way a finite number of terms of a sequence
does not alter its asymptotic behavior: if it is regular, i.e., convergent or (properly) divergent,
it remains so, and with the same limit; if it is irregular (oscillating), it remains so. Clearly,
this depends on the fact that the notion of limit requires that a property (either "hitting"
an arbitrarily small neighborhood, in the case of convergence, or being greater (smaller) than
an arbitrarily large (small) number, in the case of positive (negative) divergence) holds only
eventually.
O.R. The smaller ε > 0 is, the smaller the neighborhood B_ε(x) of a point is. In contrast, the
greater K > 0 is, the smaller the neighborhood (K, +∞] of +∞ is. For this reason, for a
neighborhood of +∞ the value of K becomes significant when positive and arbitrarily large
(while for a neighborhood of −∞ the value of K becomes significant when negative and
arbitrarily large, in absolute value). H
15 The hypothesis "eventually xₙ > 0" is redundant in the "only if" part since a sequence that diverges positively
always satisfies this condition.
The neighborhoods (K, +∞] and [−∞, K) are open intervals in R̄ for every K ∈ R.16
That said, we can state a lemma that will be useful in defining limits of sequences.
Lemma 315 Let A ⊆ R. Then: (i) +∞ is a limit point of A if and only if A is unbounded
above; (ii) −∞ is a limit point of A if and only if A is unbounded below.

Proof We only prove (i) since the proof of (ii) is similar. "If". Let A be unbounded above,
i.e., A has no upper bounds. Let (K, +∞] be a neighborhood of +∞. Since A has no upper
bounds, K is not an upper bound of A. Therefore, there exists x ∈ A such that x > K,
i.e., x ∈ (K, +∞] ∩ A and x ≠ +∞. It follows that +∞ is a limit point of A. Indeed, each
neighborhood of +∞ contains points of A different from +∞.

"Only if". Let +∞ be a limit point of A. We show that A does not have any upper
bound. Suppose, by contradiction, that K ∈ R is an upper bound of A. Since +∞ is a limit
point of A, the neighborhood (K, +∞] of +∞ contains a point x ∈ A such that x ≠ +∞.
Therefore K < x, contradicting the fact that K is an upper bound of A.
Example 316 The sets A such that (a, +∞) ⊆ A for some a ∈ R are an important class of
sets unbounded above. By Lemma 315, +∞ is a limit point of such sets A. Similarly, −∞
is a limit point of the sets A such that (−∞, a) ⊆ A for some a ∈ R. N
Using the topology of R̄ we can give a general definition of convergence that generalizes
Definition 303 so as to include Definitions 310 and 312 of divergence as special
cases. In the next definition, which unifies all the previous definitions of limit of a sequence, we
set:

U(L) = B_ε(L)    if L ∈ R
U(L) = (K, +∞]   if L = +∞
U(L) = [−∞, K)   if L = −∞
Definition 317 A sequence {xₙ} in R converges to a point L ∈ R̄ if, for every neighborhood
U(L) of L, there exists n_U ≥ 1 such that

n ≥ n_U ⟹ xₙ ∈ U(L)
O.R. If L ∈ R, the position n_U depends on an arbitrary radius ε > 0 (in particular, as small
as we want), so we can write n_U = n_ε. If, instead, L = +∞, then n_U depends on an arbitrary

16 Each point x ∈ (K, +∞] is interior because, by taking K′ with K < K′ < x, we have x ∈ (K′, +∞] ⊆
(K, +∞]. A similar argument shows that each point x ∈ [−∞, K) is interior.
scalar K (in particular, positive and arbitrarily large), so we can write n_U = n_K. Finally,
if L = −∞, then n_U depends on an arbitrary negative real number K (in particular, negative and
arbitrarily large, in absolute value) and, without losing generality, we can set n_U = n_K.
Thus, when L is finite it is crucial that the property holds also for arbitrarily small values
of ε. When L = ±∞, it is instead key that the property holds also for K arbitrarily large
in absolute value. H
Theorem 318 (Uniqueness of the limit) A sequence {xₙ} converges to at most one limit
L ∈ R̄.
Proof Suppose, by contradiction, that there exist two distinct limits L′ and L″ in R̄.
Without loss of generality, we assume that L″ > L′. We consider different
cases and show that in each of them we reach a contradiction, so that we can conclude
that the limit is unique.

We begin with the case when both L′ and L″ are finite, i.e., L′, L″ ∈ R. Take ε > 0 so
that

ε < (L″ − L′)/2

Then

B_ε(L′) ∩ B_ε(L″) = ∅

as the reader can verify and the next figure illustrates:
[figure: the disjoint neighborhoods (L′ − ε, L′ + ε) and (L″ − ε, L″ + ε) on the real line]
By Definition 303, there exists n′_ε ≥ 1 such that xₙ ∈ B_ε(L′) for every n ≥ n′_ε, and there
exists n″_ε ≥ 1 such that xₙ ∈ B_ε(L″) for every n ≥ n″_ε. Setting n_ε = max{n′_ε, n″_ε}, we
therefore have both xₙ ∈ B_ε(L′) and xₙ ∈ B_ε(L″) for every n ≥ n_ε, i.e., xₙ ∈ B_ε(L′) ∩ B_ε(L″)
for every n ≥ n_ε. But this contradicts B_ε(L′) ∩ B_ε(L″) = ∅. We conclude that L′ = L″, so
the limit is unique.
Turn now to the case in which L′ is finite and L″ = +∞. For every ε > 0 and every
K > 0, there exist n_ε and n_K such that

n ≥ n_ε ⟹ |xₙ − L′| < ε  and  n ≥ n_K ⟹ xₙ > K

It is now sufficient to take K = L′ + ε to realize that, for n ≥ max{n_ε, n_K}, the two
inequalities cannot coexist. Also in this case we have reached a contradiction.

The remaining cases can be treated in a similar way and are thus left to the reader.
The next classic result shows that the terms of a convergent sequence eventually have the
same sign of the limit point. In other words, the sign of the limit point eventually determines
the sign of the terms of the sequence.
Theorem 319 (Permanence of sign) Let fxn g be a sequence that converges to a limit
L 6= 0. Then, eventually xn has the same sign as L, that is, xn L > 0.
Proof Suppose L > 0 (a similar argument holds if L < 0). Let " 2 (0; L). By De nition
302, there exists n 1 such that jxn Lj < ", i.e., L " < xn < L + " for every n n.
Since " 2 (0; L), we have L " > 0. Therefore,
This last theorem has established a property of limits with respect to the order structure of the real line. Next we give another simple result of the same kind, leaving the proof to the reader. A piece of notation: $x_n \to L \in \overline{\mathbb{R}}$ indicates that the sequence $\{x_n\}$ either converges to $L \in \mathbb{R}$ or diverges (positively or negatively).

Proposition 320 Let $\{x_n\}$ and $\{y_n\}$ be two sequences such that $x_n \to L \in \overline{\mathbb{R}}$ and $y_n \to H \in \overline{\mathbb{R}}$. If eventually $x_n \geq y_n$, then $L \geq H$.

The scope of this proposition is noteworthy. It allows us, for example, to check the positive or negative divergence of a sequence through a simple comparison with other divergent sequences. Indeed, if $x_n \geq y_n$ and $x_n$ diverges negatively, so does $y_n$; if $x_n \geq y_n$ and $y_n$ diverges positively, so does $x_n$.

The converse of the proposition does not hold: for example, let $L = H = 0$, $\{x_n\} = \{-1/n\}$ and $\{y_n\} = \{1/n\}$. We have $L \geq H$, but $x_n < y_n$ for every $n$. However, if we assume $L > H$, the converse then holds "strictly":
Proposition 321 Let $\{x_n\}$ and $\{y_n\}$ be two sequences such that $x_n \to L \in \overline{\mathbb{R}}$ and $y_n \to H \in \overline{\mathbb{R}}$. If $L > H$, then eventually $x_n > y_n$.

Proof We prove the statement for $L, H \in \mathbb{R}$, leaving the other cases to the reader. Let $0 < \varepsilon < (L - H)/2$. Since $H + \varepsilon < L - \varepsilon$, we have $(H - \varepsilon, H + \varepsilon) \cap (L - \varepsilon, L + \varepsilon) = \emptyset$. Moreover, there exist $n'_\varepsilon, n''_\varepsilon \geq 1$ such that $y_n \in (H - \varepsilon, H + \varepsilon)$ for every $n \geq n'_\varepsilon$ and $x_n \in (L - \varepsilon, L + \varepsilon)$ for every $n \geq n''_\varepsilon$. For every $n \geq \max\{n'_\varepsilon, n''_\varepsilon\}$, we then have $y_n \in (H - \varepsilon, H + \varepsilon)$ and $x_n \in (L - \varepsilon, L + \varepsilon)$, so $x_n > L - \varepsilon > H + \varepsilon > y_n$. We conclude that eventually $x_n > y_n$.
Proof Suppose $x_n \to L$. Setting $\varepsilon = 1$, there exists $n_1 \geq 1$ such that $x_n \in B_1(L)$ for every $n \geq n_1$. Let $M > 0$ be a constant such that
$$M > \max\left\{1, |x_1 - L|, \dots, |x_{n_1 - 1} - L|\right\}$$
We have $d(x_n, L) < M$ for every $n \geq 1$, i.e., $|x_n - L| < M$ for every $n \geq 1$. This implies that, for all $n \geq 1$,
$$L - M < x_n < L + M$$
Therefore, the sequence is bounded.

Thanks to this proposition, the convergent sequences form a subset of the bounded ones. Therefore, if a sequence is unbounded, it cannot be convergent.
In general, the converse of Proposition 322 is false. For example, the alternating sequence $x_n = (-1)^n$ is bounded but does not converge. A partial converse will soon be established by the Bolzano-Weierstrass Theorem. A full-fledged converse, however, holds for the important class of monotone sequences: for such sequences, boundedness is both a necessary and a sufficient condition for convergence. This result is actually a corollary of the following general theorem on the asymptotic behavior of monotone sequences.
Proof Let $\{x_n\}$ be an increasing sequence (the proof for decreasing sequences is similar). It can be either bounded or unbounded above (for sure, it is bounded below because $x_1 \leq x_n$ for every $n \geq 1$). Suppose that $\{x_n\}$ is bounded. We want to prove that it is convergent. Let $E$ be the image of the sequence. By hypothesis, it is a bounded subset of $\mathbb{R}$. By the Least Upper Bound Principle, $\sup E$ exists. Set $L = \sup E$. Let us prove that $x_n \to L$. Let $\varepsilon > 0$. Since $L$ is the supremum of $E$, by Proposition 127 we have: (i) $L \geq x_n$ for every $n \geq 1$; (ii) there exists an element $x_{n_\varepsilon}$ of $E$ such that $x_{n_\varepsilon} > L - \varepsilon$. Since $\{x_n\}$ is an increasing sequence, it then follows that
$$L \geq x_n \geq x_{n_\varepsilon} > L - \varepsilon \quad \forall n \geq n_\varepsilon$$
that is, $x_n \to L$.

Thus, monotone sequences cannot be irregular. We are now able to state and prove the result anticipated above on the equivalence of boundedness and convergence for monotone sequences.

Needless to say, the results just discussed hold, more generally, for sequences that are eventually monotone.
the sequence
$$\{x_{n_k}\}_{k=1}^{\infty} = \{x_{n_1}, x_{n_2}, x_{n_3}, \dots, x_{n_k}, \dots\}$$
is called a subsequence of $\{x_n\}$. In words, the subsequence $\{x_{n_k}\}$ is a new sequence constructed from the original sequence $\{x_n\}$ by taking only the terms of position $n_k$. A few examples should clarify.

For example, a subsequence of the harmonic sequence
$$1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \dots, \frac{1}{n}, \dots \tag{8.34}$$
is
$$1, \frac{1}{3}, \frac{1}{5}, \frac{1}{7}, \dots, \frac{1}{2k+1}, \dots$$
where $\{n_k\}_{k \geq 1}$ is the sequence of the odd numbers $\{1, 3, 5, \dots\}$. Thus, this subsequence has been constructed by selecting the elements of odd position in the original sequence. Another subsequence of (8.34) is given by
$$\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \dots, \frac{1}{2^n}, \dots$$
where now $\{n_k\}_{k \geq 1}$ is formed by the powers of $2$, that is, $\{2, 2^2, 2^3, \dots\}$. This subsequence is constructed by selecting the elements of the original sequence whose position is a power of $2$. N

Consider now the alternating sequence $x_n = (-1)^n$ and its constant subsequences
$$\{1, 1, 1, \dots, 1, \dots\} \tag{8.35}$$
and
$$\{-1, -1, -1, \dots, -1, \dots\} \tag{8.36}$$
By taking $\{n_k\}_{k \geq 1} = \{1000k\}$, i.e., by selecting only the elements of positions $1{,}000$, $2{,}000$, $3{,}000$, ..., we still get the subsequence (8.35). On the other hand, (8.35) is not a subsequence of (8.34) because the term $1$ appears only at the initial position of (8.34). N
Proposition 327 A sequence is regular, with limit $L \in \overline{\mathbb{R}}$, if and only if all its subsequences are regular with the same limit $L$.

Proof We prove the result for $L \in \mathbb{R}$, leaving the case $L = \pm\infty$ to the reader. "Only if". Suppose that $\{x_n\}$ converges to $L$. Let $\varepsilon > 0$. There exists $n_\varepsilon \geq 1$ such that $|x_n - L| < \varepsilon$ for every $n \geq n_\varepsilon$. Let $\{x_{n_k}\}_{k=1}^{\infty}$ be a subsequence of $\{x_n\}$. Since $n_k \geq k$ for every $k \geq 1$, a fortiori we have $|x_{n_k} - L| < \varepsilon$ for every $k \geq n_\varepsilon$, so that $\{x_{n_k}\}$ converges to $L$.

"If". Suppose that each subsequence of $\{x_n\}$ converges to $L$. Suppose, by contradiction, that $\{x_n\}$ does not converge to $L$. Then, there exists $\varepsilon_0 > 0$ such that, for every integer $n \geq 1$, there exists a position $m_n \geq n$ for which $x_{m_n} \notin B_{\varepsilon_0}(L)$, i.e., $|x_{m_n} - L| \geq \varepsilon_0$. This implies that the set $M = \{m \in \mathbb{N} : |x_m - L| \geq \varepsilon_0\}$ contains infinitely many elements. Let $n_1 = \min\{m \in M : m \geq 1\}$ and, for each $k \geq 1$, define recursively
$$n_{k+1} = \min\{m \in M : m > n_k\}$$
where the set on the right-hand side is nonempty because $M$ is infinite. By construction, $n_{k+1} > n_k$ and $|x_{n_k} - L| \geq \varepsilon_0$, yielding that $\{x_{n_k}\}$ is a subsequence of $\{x_n\}$ that does not converge to $L$. This contradiction allows us to conclude that $\{x_n\}$ converges to $L$.
It may well happen that, by suitably selecting the elements, we can extract a convergent "trend" out of an irregular sequence. There might be order even in chaos (and method in madness). The Bolzano-Weierstrass Theorem shows that this is always possible, as long as the sequence is bounded.

In other words, from any bounded sequence $\{x_n\}$, even if highly irregular, it is always possible to extract a convergent subsequence $\{x_{n_k}\}$, i.e., one such that there exists $L \in \mathbb{R}$ for which $\lim_{k\to\infty} x_{n_k} = L$. So, we can always extract convergent behavior from any bounded sequence, a truly remarkable property.

Example 329 The alternating sequence $x_n = (-1)^n$ is bounded because its image is the bounded set $\{-1, 1\}$. By the Bolzano-Weierstrass Theorem, it has at least one convergent subsequence. Indeed, such are the constant subsequences (8.35) and (8.36). N
for all $k \geq 1$, where the set on the right-hand side is well defined because $M$ is infinite. By construction, $n_{k+1} > n_k$ and $\{n_k\} \subseteq M$. By the definition of $M$, this implies that $x_{n_{k+1}} \geq x_{n_k}$, proving that $\{x_{n_k}\}$ is increasing.

Proposition 331 Each unbounded sequence has a divergent subsequence (to $+\infty$ if unbounded above, to $-\infty$ if unbounded below).^{18}

Proof Suppose that the sequence is unbounded above (the other case is similar). Then, for every $K > 0$ there exists at least one element of the sequence greater than $K$. We denote by $x_{n_K}$ the first term of the sequence $\{x_n\}$ that turns out to be $> K$. By taking $K = 1, 2, \dots$, the resulting sequence $\{x_{n_K}\}$ is clearly a subsequence of $\{x_n\}$ (indeed, all its terms have been taken among those of $\{x_n\}$) that diverges to $+\infty$.
Summing up: remarkably, from any sequence, however wild it may be, we can always extract a regular asymptotic behavior.

O.R. The Bolzano-Weierstrass Theorem says that it is not possible to take infinitely many scalars (the elements of the sequence) in a bounded interval in a way that makes them (or a part of them) "well separated" from one another: necessarily, they crowd in the proximity of (at least) one point. More generally, the last proposition says that there is no way of taking infinitely many scalars without at least a part of them crowding somewhere (in proximity of either a finite number or of $+\infty$ or of $-\infty$, i.e., of some point of $\overline{\mathbb{R}}$). H

18. If it is both unbounded above and below, it has both a subsequence diverging to $+\infty$ and a subsequence diverging to $-\infty$.
19. Recall that $x_n \to L \in \overline{\mathbb{R}}$ indicates that the sequence $\{x_n\}$ either converges to $L \in \mathbb{R}$ or diverges positively or negatively.
Proposition 333 Let $x_n \to L \in \overline{\mathbb{R}}$ and $y_n \to H \in \overline{\mathbb{R}}$. Then:

(i) $x_n + y_n \to L + H$, provided that $L + H$ is not an indeterminate form of the type
$$+\infty - \infty \quad \text{or} \quad -\infty + \infty$$

(ii) $x_n y_n \to LH$, provided that $LH$ is not an indeterminate form (1.25), of the type
$$\pm\infty \cdot 0 \quad \text{or} \quad 0 \cdot (\pm\infty)$$

Proof (i) Let $x_n \to +\infty$ and $y_n \to L \in \mathbb{R}$. Then, for every $K > 0$ and every $\varepsilon > 0$ there exist $n_1$ and $n_2$ such that $x_n > K$ for every $n \geq n_1$ and $y_n > L - \varepsilon$ for every $n \geq n_2$. By adding the inequalities member by member, for every $n \geq n_3 = \max\{n_1, n_2\}$ we have
$$x_n + y_n > K + L - \varepsilon$$
Since $K$ (and so $K + L - \varepsilon$) is arbitrarily large, it follows that $x_n + y_n \to +\infty$. The other cases with infinite limit are treated similarly.

(ii) Let $x_n \to L$ and $y_n \to H$, with $L, H \in \mathbb{R}$. This means that, for every $\varepsilon > 0$, there exist $n_1$ and $n_2$ such that
$$|x_n - L| < \varepsilon \quad \forall n \geq n_1 \qquad \text{and} \qquad |y_n - H| < \varepsilon \quad \forall n \geq n_2$$
Moreover, being convergent, $\{y_n\}$ is bounded (recall Proposition 322): there exists $b > 0$ such that $|y_n| \leq b$ for every $n$. Now, for every $n \geq n_3 = \max\{n_1, n_2\}$,
$$|x_n y_n - LH| = |y_n(x_n - L) + L(y_n - H)| \leq |y_n||x_n - L| + |L||y_n - H| < \varepsilon(b + |L|)$$
Since $\varepsilon > 0$ is arbitrary, $x_n y_n \to LH$.

If, instead, $L > 0$ and $H = +\infty$, then eventually $x_n > L - \varepsilon > 0$ with $\varepsilon \in (0, L)$, and we also have, for every $K > 0$, $y_n > K$ for every $n \geq n_2$. It follows that, for every $n \geq n_3 = \max\{n_1, n_2\}$,
$$x_n y_n > (L - \varepsilon)K$$

20. Note that $a/0$ is equivalent to $H = 0$.
By the arbitrariness of $(L - \varepsilon)K > 0$, we conclude that $x_n y_n \to +\infty$. If $L < 0$ and $H = +\infty$, we have $x_n y_n < (L + \varepsilon)K$, with $\varepsilon \in (0, -L)$. By the arbitrariness of $(L + \varepsilon)K < 0$, we conclude that $x_n y_n \to -\infty$. The other cases of infinite limits are treated in an analogous way.

The following result shows that the case $a/0$ of point (iii) with $a \neq 0$ is actually not indeterminate for the algebra of limits, although it is so for the extended real line (as seen in Section 1.7).

This proposition does not, unfortunately, say anything about the case $a = 0$, that is, about the indeterminate form $0/0$.
Proof Let us prove the "only if" part (we leave the rest of the proof to the reader). Let $L > 0$ (the case $L < 0$ is similar). Suppose that the sequence $\{y_n\}$ does not have eventually constant sign. Then there exist two subsequences $\{y_{n_k}\}$ and $\{y_{n'_k}\}$ such that $y_{n_k} \to 0^+$ and $y_{n'_k} \to 0^-$. Therefore, $x_{n_k}/y_{n_k} \to +\infty$ while $x_{n'_k}/y_{n'_k} \to -\infty$. Since two subsequences of $\{x_n/y_n\}$ have distinct limits, Proposition 327 shows that the sequence $\{x_n/y_n\}$ has no limit.
Summing up, in view of the last two propositions we have the following indeterminate forms for the limits:
$$+\infty - \infty \quad \text{or} \quad -\infty + \infty \tag{8.37}$$
$$\pm\infty \cdot 0 \quad \text{or} \quad 0 \cdot (\pm\infty) \tag{8.38}$$
$$\frac{\pm\infty}{\pm\infty} \quad \text{or} \quad \frac{0}{0} \tag{8.39}$$
which are often denoted by just writing $\infty/\infty$ and $0/0$. Section 8.10.3 will be devoted to these indeterminate forms.
Besides the basic operations, the next result shows that limits also interchange nicely with the power (and the root, which is a special case), the exponential, and the logarithm. Indeed, (13.10) of Chapter 13 will show that such nice interchange holds, more generally, for all functions that, like the power, exponential, and logarithm functions, are continuous. We thus omit the proof of the next result.

Outside of the forms
$$1^{\infty}; \qquad 0^{0}; \qquad (+\infty)^{0}$$
we have:^{22}

We have, therefore, also the following indeterminate forms for the limits: $1^{\infty}$, $0^{0}$, and $(+\infty)^{0}$.

In particular,
$$\lim n = +\infty$$
As anticipated, from these two elementary limits we can infer, via the algebra of limits, many other ones. Specifically:

(iii) we have:
$$\lim \alpha^n = \begin{cases} +\infty & \text{if } \alpha > 1 \\ 1 & \text{if } \alpha = 1 \\ 0^{+} & \text{if } 0 < \alpha < 1 \end{cases}$$
and
$$\lim \log_{\alpha} n = \begin{cases} +\infty & \text{if } \alpha > 1 \\ -\infty & \text{if } 0 < \alpha < 1 \end{cases}$$
Many other limits hold; for example,
$$\lim\left(5n^7 + n^2 + 1\right) = +\infty + \infty + 1 = +\infty$$
as well as
$$\lim\left(n^2 - 3n + 1\right) = \lim n^2\left(1 - \frac{3}{n} + \frac{1}{n^2}\right) = +\infty \cdot (1 - 0 + 0) = +\infty$$

$$\lim \frac{n^2 - 5n - 7}{2n^2 + 4n + 6} = \lim \frac{n^2\left(1 - \frac{5}{n} - \frac{7}{n^2}\right)}{n^2\left(2 + \frac{4}{n} + \frac{6}{n^2}\right)} = \frac{1 - 0 - 0}{2 + 0 + 0} = \frac{1}{2}$$

$$\lim \frac{5 - \frac{1}{n}}{2n^2} = [0 \cdot (5 - 0)] = 0$$

and
$$\lim \frac{n(n+1)(n+2)}{(2n-1)(3n-2)(5n-4)} = \lim \frac{n \cdot n\left(1 + \frac{1}{n}\right) \cdot n\left(1 + \frac{2}{n}\right)}{2n\left(1 - \frac{1}{2n}\right) \cdot 3n\left(1 - \frac{2}{3n}\right) \cdot 5n\left(1 - \frac{4}{5n}\right)} = \lim \frac{\left(1 + \frac{1}{n}\right)\left(1 + \frac{2}{n}\right)}{30\left(1 - \frac{1}{2n}\right)\left(1 - \frac{2}{3n}\right)\left(1 - \frac{4}{5n}\right)} = \frac{1 \cdot 1}{30 \cdot 1 \cdot 1 \cdot 1} = \frac{1}{30}$$
Indeterminate form $\infty - \infty$

Consider the indeterminate form $\infty - \infty$. For example, the limit of the sum $x_n + y_n$ of the sequences $x_n = n$ and $y_n = -n^2$ falls under this form of indetermination, so one cannot resort to the algebra of limits. We have, however,
$$x_n + y_n = n - n^2 = n(1 - n)$$
where $n \to +\infty$ and $1 - n \to -\infty$, so that, being in the case $+\infty \cdot (-\infty)$, it follows that $x_n + y_n \to -\infty$. Through a very simple algebraic manipulation, we have been able to find our way out of the indeterminacy.

Now take $x_n = n^2$ and $y_n = -n$. Also in this case, the limit of the sum $x_n + y_n$ falls under the indeterminacy $\infty - \infty$. By proceeding as we just did, this time we get
$$\lim(x_n + y_n) = \lim n(n - 1) = \lim n \cdot \lim(n - 1) = +\infty$$

Next, take $x_n = n$ and $y_n = \frac{1}{n} - n$, still of type $\infty - \infty$. Here again, a simple calculation allows us to find a way out:
$$\lim(x_n + y_n) = \lim\left(n + \frac{1}{n} - n\right) = \lim \frac{1}{n} = 0$$

Finally, take $x_n = n^2 + (-1)^n n$ and $y_n = -n^2$, which is again of type $\infty - \infty$ since $x_n \to +\infty$ because $x_n \geq n^2 - n = n(n - 1)$. Now, the limit
$$\lim(x_n + y_n) = \lim(-1)^n n$$
does not exist: the subsequence of even-indexed terms diverges to $+\infty$, while that of odd-indexed terms diverges to $-\infty$.^{23}
Indeterminate form $0 \cdot \infty$

Let, for example, $x_n = 1/n$ and $y_n = n^3$. The limit of their product has the indeterminate form $0 \cdot \infty$, so we cannot use the algebra of limits. We have, however,
$$\lim x_n y_n = \lim \frac{1}{n}\, n^3 = \lim n^2 = +\infty$$
If $x_n = 1/n^3$ and $y_n = n$, then
$$\lim x_n y_n = \lim \frac{1}{n^3}\, n = \lim \frac{1}{n^2} = 0$$
If $x_n = n^3$ and $y_n = 7/n^3$, then
$$\lim x_n y_n = \lim n^3\, \frac{7}{n^3} = 7$$
If $x_n = 1/n$ and $y_n = n(\cos n + 2)$,^{24} then
$$\lim x_n y_n = \lim(\cos n + 2)$$
which does not exist.

On the other hand, by exchanging $x_n$ with $y_n$, the indeterminate form $\infty/\infty$ remains but
$$\lim \frac{y_n}{x_n} = \lim \frac{n^2}{n} = \lim n = +\infty$$

23. In contrast, if the case were, say, of type $+\infty + a$ with $a \neq -\infty$, then, even without knowing the specific form of the two sequences, the algebra of limits (specifically, Proposition 333-(i)) would allow us to conclude that the limit of their sum is $+\infty$.
24. Using the comparison criterion, which we will study soon (Theorem 338), it is easy to prove that $y_n \to +\infty$ since $y_n \geq n$.
$$\lim \frac{x_n}{y_n} = \lim \frac{n^2}{1 + 2n^2} = \lim \frac{1}{\frac{1}{n^2} + 2} = \frac{1}{2}$$

$$\lim \frac{x_n}{y_n} = \lim(\sin n + 7)$$
which does not exist.

The indeterminate forms $\infty/\infty$ and $0/0$ are closely connected: if the limit of the ratio of the sequences $\{x_n\}$ and $\{y_n\}$ falls under the indeterminate form $\infty/\infty$, then the limit of the ratio of the sequences $\{1/x_n\}$ and $\{1/y_n\}$ falls under the indeterminate form $0/0$. The vice versa requires convergence to $0$ from either above or below (cf. Proposition 313). The reader should think of $x_n = (-1)^n/n$, which converges to $0$, while the reciprocal $1/x_n$ oscillates wildly and so does not converge.
For the sum, the cells report the value of $\lim(x_n + y_n)$, with the limit of $\{x_n\}$ along the columns and that of $\{y_n\}$ along the rows:
$$\begin{array}{c|ccc}
\text{sum} & +\infty & L & -\infty \\ \hline
+\infty & +\infty & +\infty & ?? \\
H & +\infty & L + H & -\infty \\
-\infty & ?? & -\infty & -\infty
\end{array}$$

Turn to the product: the cells now report the value of $\lim x_n y_n$.
$$\begin{array}{c|ccccc}
\text{product} & +\infty & L > 0 & 0 & L < 0 & -\infty \\ \hline
+\infty & +\infty & +\infty & ?? & -\infty & -\infty \\
H > 0 & +\infty & LH & 0 & LH & -\infty \\
0 & ?? & 0 & 0 & 0 & ?? \\
H < 0 & -\infty & LH & 0 & LH & +\infty \\
-\infty & -\infty & -\infty & ?? & +\infty & +\infty
\end{array}$$
Here there are four indeterminate cases out of twenty-five.

Finally, for the ratio we have the following table, where the cells report the value of $\lim(x_n/y_n)$.
$$\begin{array}{c|ccccc}
\text{ratio} & +\infty & L > 0 & 0 & L < 0 & -\infty \\ \hline
+\infty & ?? & 0 & 0 & 0 & ?? \\
H > 0 & +\infty & \frac{L}{H} & 0 & \frac{L}{H} & -\infty \\
0 & \pm\infty & \pm\infty & ?? & \pm\infty & \pm\infty \\
H < 0 & -\infty & \frac{L}{H} & 0 & \frac{L}{H} & +\infty \\
-\infty & ?? & 0 & 0 & 0 & ??
\end{array}$$
In view of Proposition 335, in the third row we assumed that $y_n$ tends to $0$ from above, $y_n \to 0^+$, or from below, $y_n \to 0^-$. In turn, this determines the sign of the infinity; for example,
$$\lim \frac{1}{\frac{1}{n}} = \lim n = +\infty \qquad \text{and} \qquad \lim \frac{1}{-\frac{1}{n}} = \lim(-n) = -\infty$$
For the ratio, we thus have five indeterminate cases out of twenty-five.

The tables make it clear that in the majority of cases we can rely upon the algebra of limits (in particular, Propositions 333 and 337). Only relatively few cases are actually indeterminate.
O.R. The case $0^{\infty}$ is not indeterminate. Clearly, it is shorthand notation for $\lim x_n^{y_n}$, where the base is a positive sequence approaching $0$ (more precisely, $0^+$) and the exponent is a divergent sequence. We can set $0^{+\infty} = 0$: if we multiply $0$ by itself "infinitely many times" we still get zero (a "zerissimo", if you wish). The form $0^{-\infty}$ is the reciprocal, so $0^{-\infty} = +\infty$. H

(i) If $x_n, y_n \to \infty$, their ratio $x_n/y_n$ appears in the form $\infty/\infty$, but it is sufficient to write the ratio as
$$x_n \cdot \frac{1}{y_n}$$
to get the form $\infty \cdot 0$.

(ii) If $x_n, y_n \to 0$, their ratio $x_n/y_n$ appears in the form $0/0$, but it is sufficient to write the ratio as
$$x_n \cdot \frac{1}{y_n}$$
to get the form $0 \cdot \infty$.

(iv) For the last three cases it is sufficient to consider the logarithm to end up, again, in the case $0 \cdot \infty$. Indeed:
$$\log x_n^{y_n} = y_n \log x_n$$

The reader can try to reduce all the forms of indeterminacy to either $0/0$ or $\infty/\infty$.
Theorem 338 (Comparison criterion) Let $\{x_n\}$, $\{y_n\}$, and $\{z_n\}$ be three sequences. If, eventually,
$$y_n \leq x_n \leq z_n \tag{8.40}$$
and
$$\lim y_n = \lim z_n = L \in \overline{\mathbb{R}} \tag{8.41}$$
then
$$\lim x_n = L$$

26. In this book the term "criterion" (or "test") will always be understood as "sufficient condition".

We can think of $\{x_n\}$ as a convict who is escorted by the two policemen $\{y_n\}$ and $\{z_n\}$ (one on each "side"), so he is forced to go wherever they go.

Proof Suppose $L \in \mathbb{R}$ (we leave to the reader the case $L = \pm\infty$). Let $\varepsilon > 0$. From (8.41) it follows, by Definition 303, that there exists $n_1$ such that $y_n \in B_\varepsilon(L)$ for every $n \geq n_1$, and there exists $n_2$ such that $z_n \in B_\varepsilon(L)$ for every $n \geq n_2$. Finally, let $n_3$ be the position starting from which one has $y_n \leq x_n \leq z_n$. Setting $\bar{n} = \max\{n_1, n_2, n_3\}$, we then have $y_n \in B_\varepsilon(L)$, $z_n \in B_\varepsilon(L)$, and $y_n \leq x_n \leq z_n$ for every $n \geq \bar{n}$. So,
$$L - \varepsilon < y_n \leq x_n \leq z_n < L + \varepsilon \quad \forall n \geq \bar{n}$$
i.e., $x_n \in B_\varepsilon(L)$ for every $n \geq \bar{n}$, that is, $\lim x_n = L$.

The typical use of this result is in proving the convergence of a given sequence by showing that it can be "trapped" between two suitable convergent sequences.
Example 339 (i) Consider the sequence $x_n = \frac{\sin^2 n}{n^2}$. Since $-1 \leq \sin n \leq 1$ for every $n \geq 1$, we have $0 \leq \sin^2 n \leq 1$ for every $n \geq 1$. So,
$$0 \leq \frac{\sin^2 n}{n^2} \leq \frac{1}{n^2} \quad \forall n \geq 1$$
(ii) Consider the sequences $y_n = 0$ and $z_n = 1/n^2$. Conditions (8.40) and (8.41) hold with $L = 0$. By the comparison criterion, we conclude that $\lim x_n = 0$. N

Similarly, $\frac{\sin n}{n} \to 0$: indeed,
$$-\frac{1}{n} \leq \frac{\sin n}{n} \leq \frac{1}{n} \quad \forall n \geq 1$$
and both sequences $\{1/n\}$ and $\{-1/n\}$ converge to $0$. N
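To see the squeeze at work numerically, here is a minimal Python sketch (ours, purely illustrative, not part of the formal development): it evaluates both sequences together with their vanishing bounds.

```python
import math

# Comparison criterion numerically: 0 <= sin^2(n)/n^2 <= 1/n^2 and
# -1/n <= sin(n)/n <= 1/n force both sequences to 0.
for n in [10, 100, 1000, 10000]:
    x1 = math.sin(n) ** 2 / n ** 2
    x2 = math.sin(n) / n
    print(f"n={n:6d}  sin^2(n)/n^2={x1:.2e} (bound {1/n**2:.2e})  "
          f"sin(n)/n={x2:+.2e} (bound {1/n:.2e})")
```

The printed values never exceed the bounding sequences, exactly as (8.40) requires.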
The previous example suggests that, if $\{x_n\}$ is a bounded sequence, say $-k \leq x_n \leq k$ for all $n \geq 1$, and $y_n \to +\infty$ or $y_n \to -\infty$, then
$$\frac{x_n}{y_n} \to 0$$
Indeed, we have
$$-\frac{k}{|y_n|} \leq \frac{x_n}{y_n} \leq \frac{k}{|y_n|}$$
and $k/|y_n| \to 0$.
Theorem 341 (Ratio criterion) If there exists a scalar $q < 1$ such that, eventually,
$$\left|\frac{x_{n+1}}{x_n}\right| \leq q \tag{8.42}$$
then $\lim x_n = 0$.

Condition (8.42) requires the sequence of absolute values $|x_n|$ to be eventually strictly decreasing, i.e., eventually $|x_{n+1}| < |x_n| \neq 0$. By Corollary 324, we then have $|x_n| \downarrow L$ for some $L \geq 0$. The theorem claims that, indeed, $L = 0$.

Proof Suppose that the inequality holds starting from $n = 1$ (if it held from a certain $n$ onwards, just recall that eliminating a finite number of terms does not alter the limit). By Proposition 306, it is enough to prove that $|x_n| \to 0$. From (8.42), it follows that $|x_{n+1}| \leq q|x_n|$. In particular, by iterating this inequality from $n = 1$ we have:
$$|x_n| \leq q|x_{n-1}| \leq q^2 |x_{n-2}| \leq \cdots \leq q^{n-1}|x_1|$$
So,
$$0 \leq |x_n| \leq q^{n-1}|x_1| \quad \forall n \geq 2$$
Since $0 < q < 1$, we have $q^{n-1} \to 0$. So, by the comparison criterion we get $|x_n| \to 0$.
Note that the theorem does not simply require the ratio $|x_{n+1}/x_n|$ to be $< 1$, that is,
$$\left|\frac{x_{n+1}}{x_n}\right| < 1$$
but that it be "far" from $1$, i.e., smaller than a number $q$ which, in turn, is itself smaller than $1$. The next example clarifies this observation.

Example 342 The sequence $x_n = (-1)^n(1 + 1/n)$ does not converge: indeed, the subsequence of its terms of even position tends to $+1$, whereas that of its terms of odd position tends to $-1$. Yet:
$$\left|\frac{x_{n+1}}{x_n}\right| = \frac{1 + \frac{1}{n+1}}{1 + \frac{1}{n}} = \frac{n^2 + 2n}{n^2 + 2n + 1} < 1$$
for every $n \geq 1$. N

More generally, the ratio criterion can be applied to the errors $x_n - L$ in order to study a convergence $x_n \to L$:
$$\left|\frac{x_{n+1} - L}{x_n - L}\right| \leq q \implies x_n \to L$$
The ratio criterion (and also the root criterion that we will see soon) thus applies, mutatis mutandis, to the study of any convergence $x_n \to L$.
An important case in which condition (8.42) holds is when the ratio $|x_{n+1}/x_n|$ has a limit, and such limit is $< 1$, that is,
$$\lim \left|\frac{x_{n+1}}{x_n}\right| < 1 \tag{8.43}$$
Indeed, denote by $L$ this limit and let $\varepsilon > 0$ be such that $L + \varepsilon < 1$. By the definition of limit, eventually we have
$$\left|\,\left|\frac{x_{n+1}}{x_n}\right| - L\,\right| < \varepsilon$$
that is, $L - \varepsilon < |x_{n+1}/x_n| < L + \varepsilon$. Therefore, by setting $q = L + \varepsilon$ it follows that eventually $|x_{n+1}/x_n| < q$, which is property (8.42).

The limit form (8.43) is actually the most common form in which the ratio criterion is applied. The next common limits illustrate its use:

(i) For any $\alpha > 1$ and $k \in \mathbb{R}$, we have
$$\lim \frac{n^k}{\alpha^n} = 0 \tag{8.44}$$
Indeed, set
$$x_n = \frac{n^k}{\alpha^n}$$
By taking the ratio of two consecutive terms (the absolute value is here irrelevant since all terms are positive), we have
$$\frac{x_{n+1}}{x_n} = \frac{(n+1)^k}{\alpha^{n+1}} \cdot \frac{\alpha^n}{n^k} = \frac{1}{\alpha}\left(\frac{n+1}{n}\right)^k = \frac{1}{\alpha}\left(1 + \frac{1}{n}\right)^k \to \frac{1}{\alpha} < 1$$
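A quick numerical illustration of (8.44), under the purely illustrative choices $\alpha = 1.1$ and $k = 10$ (a minimal sketch, not part of the text): the power dominates at first, but the exponential eventually wins.

```python
# Numerical check of n^k / alpha^n -> 0 for alpha > 1 and any k.
# With alpha = 1.1 and k = 10 the ratio first grows, then collapses to 0.
alpha, k = 1.1, 10
for n in [10, 100, 500, 1000, 2000]:
    print(f"n={n:5d}  n^k/alpha^n = {n ** k / alpha ** n:.3e}")
```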
Theorem 343 (Root criterion) If there exists a scalar $q < 1$ such that, eventually,
$$\sqrt[n]{|x_n|} \leq q \tag{8.46}$$
then $\lim x_n = 0$.

Proof As in the previous proof, suppose that (8.46) holds starting with $n = 1$. From
$$\sqrt[n]{|x_n|} \leq q$$
we immediately get $|x_n| \leq q^n$, i.e., $-q^n \leq x_n \leq q^n$. Since $0 < q < 1$, then $q^n \to 0$, so the result follows from the comparison criterion.
For the root criterion we can make observations similar to those that we made for the ratio criterion. In particular, property (8.46) holds if the sequence $\left\{\sqrt[n]{|x_n|}\right\}$ has a limit, and such limit is $< 1$, that is,
$$\lim \sqrt[n]{|x_n|} < 1 \tag{8.47}$$
This limit form is the most common with which the criterion is applied.

The next simple example shows that both the ratio and the root criteria are sufficient, but not necessary, conditions for convergence. However useful, they might turn out to be useless to establish the convergence of some sequences.

Finally, note that both sequences $x_n = 1/n$ and $x_n = (-1)^n n$ satisfy the condition
$$\left|\frac{x_{n+1}}{x_n}\right| \to 1$$
although the first sequence converges to $0$ and the second one does not converge at all. Therefore, this condition does not allow us to draw any conclusion about the asymptotic behavior of a sequence. The same is true for the condition
$$\sqrt[n]{|x_n|} \to 1$$
Indeed, it is enough to look again at the sequences $x_n = 1/n$ and $x_n = (-1)^n n$. All this confirms the key importance of the "strict" clause $< 1$ in (8.43) and (8.47). The next classic limit further illustrates this remark.
Proposition 346 For every $k > 0$, we have $\lim \sqrt[n]{k} = 1$.

This can be proved by setting $y_n = \log \sqrt[n]{k} = (\log k)/n \to 0$. By Proposition 337, we then have $\sqrt[n]{k} = e^{y_n} \to 1$. Below, we offer a proof that proceeds through more elementary methods.
To do this, consider the following simple intuition: if a sequence converges, then its elements become closer and closer to the limit; but, if they become closer and closer to the limit, then as a by-product they also become closer and closer to one another. The next result formalizes this intuition.

Theorem 347 (Cauchy) A sequence $\{x_n\}$ is convergent if and only if it satisfies the Cauchy condition, that is, for each $\varepsilon > 0$ there exists an integer $n_\varepsilon \geq 1$ such that
$$|x_n - x_m| < \varepsilon \quad \forall n, m \geq n_\varepsilon$$

Sequences that satisfy the Cauchy condition are called Cauchy sequences. The Cauchy condition is an intrinsic condition that only involves the terms of the sequence. According to the theorem, a sequence converges if and only if it is Cauchy. Thus, to determine whether a sequence converges it is enough to check whether it is Cauchy, something that does not require considering any extraneous object and relies just on the sequence itself.

But, as usual, there are no free meals: checking that a sequence is Cauchy informs us about its convergence, but it does not say anything about the actual limit point. To find it, we need to go back to the usual procedure that requires a candidate to be posited.
Proof "Only if". If $x_n \to L$ then, by definition, for each $\varepsilon > 0$ there exists $n_\varepsilon \geq 1$ such that $|x_n - L| < \varepsilon$ for every $n \geq n_\varepsilon$. This implies that, for every $n, m \geq n_\varepsilon$,
$$|x_n - x_m| \leq |x_n - L| + |L - x_m| < 2\varepsilon$$
so the Cauchy condition holds.

(i) $A$ and $B$ are not empty. Indeed, we have $x_{n_\varepsilon} - \varepsilon \in A$ and $x_{n_\varepsilon} + \varepsilon \in B$.

(iii) We have $\sup A = \inf B$. Indeed, by the Least Upper Bound Principle and by the previous two points, $\sup A$ and $\inf B$ are well defined and are such that $\sup A \leq \inf B$. Since, by point (i), $x_{n_\varepsilon} - \varepsilon \in A$ and $x_{n_\varepsilon} + \varepsilon \in B$, we have $x_{n_\varepsilon} - \varepsilon \leq \sup A \leq \inf B \leq x_{n_\varepsilon} + \varepsilon$; in particular, $|\inf B - \sup A| \leq 2\varepsilon$. Since $\varepsilon$ can be chosen to be arbitrarily small, we then have $|\inf B - \sup A| = 0$, that is, $\inf B = \sup A$.

Call $z$ the common value of $\sup A$ and $\inf B$. We claim that $x_n \to z$. Indeed, by fixing arbitrarily a number $\delta > 0$, there exist $a \in A$ and $b \in B$ such that $0 \leq b - a < \delta$ and, therefore,
$$z - \delta < a < b < z + \delta$$
because $a \leq z \leq b$, and so $z - \delta < a$ and $b < z + \delta$. But, by the definition of $A$ and $B$, the sequence is eventually strictly larger than $a$ and strictly smaller than $b$. So, eventually,
$$z - \delta < x_n < z + \delta$$
that is, $x_n \to z$.
Example 348 (i) The harmonic sequence $x_n = 1/n$ is Cauchy. Indeed, let $\varepsilon > 0$. We have to show that there exists $n_\varepsilon \geq 1$ such that for every $n, m \geq n_\varepsilon$ one has $|x_n - x_m| < \varepsilon$. Without loss of generality, assume that $n > m$ (the case $n = m$ being trivial). Note that for $n > m$ we have
$$0 < |x_n - x_m| = \frac{1}{m} - \frac{1}{n} < \frac{1}{m}$$
Since $\varepsilon > 1/m$ amounts to $m > 1/\varepsilon$, by choosing $n_\varepsilon = [1/\varepsilon] + 1$ we have $|x_n - x_m| < \varepsilon$ for every $n \geq m \geq n_\varepsilon$, thus proving that $x_n = 1/n$ is a Cauchy sequence.

(ii) The sequence $x_n = \log n$ is not Cauchy. Suppose, by contradiction, that for a fixed $\varepsilon > 0$ there exists $n_\varepsilon \geq 1$ such that for every $n, m \geq n_\varepsilon$ we have $|x_n - x_m| < \varepsilon$. First, note that if $n = m + k$ with $k \in \mathbb{N}$, we have
$$|x_n - x_m| = \log \frac{m + k}{m} < \varepsilon \iff k < m\left(e^{\varepsilon} - 1\right)$$
Thus, by choosing $k = [m(e^{\varepsilon} - 1)] + 1$ and $m \geq n_\varepsilon$, we obtain $|x_n - x_m| = \log \frac{m+k}{m} \geq \varepsilon$. This contradicts $|x_n - x_m| < \varepsilon$ since $n, m \geq n_\varepsilon$. We conclude that $x_n = \log n$ is not a Cauchy sequence. N
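The contrast between the two sequences can be seen numerically. The following minimal Python sketch (ours, illustrative, with the hypothetical choice $m = n_0$ and $n = 2n_0$) shows the tail differences shrinking for $1/n$ but staying at $\log 2$ for $\log n$:

```python
import math

# Tail differences |x_n - x_m| with m = n0, n = 2*n0: they vanish for the
# Cauchy sequence 1/n, but stay at log(2) for log(n), which is not Cauchy.
for n0 in [10, 100, 1000, 10000]:
    harmonic_gap = abs(1 / n0 - 1 / (2 * n0))
    log_gap = abs(math.log(2 * n0) - math.log(n0))
    print(f"n0={n0:6d}  |1/n0 - 1/(2n0)|={harmonic_gap:.2e}  "
          f"|log(2n0) - log(n0)|={log_gap:.4f}")
```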
The previous theorem states a fundamental property of convergent sequences, yet its relevance is also due to the structural property of the real line that it isolates, the so-called completeness of the real line. For example, let us assume, as was the case for Pythagoras, that we only knew the rational numbers: the space on which we operate is then $\mathbb{Q}$. Consider the sequence whose elements (all rationals) are the decimal approximations of $\pi$:
$$3, \ 3.1, \ 3.14, \ 3.141, \ 3.1415, \ \dots$$
Being a sequence of decimal approximations, it satisfies the Cauchy condition because the inequality
$$|x_n - x_m| < 10^{-\min\{m-1, \, n-1\}}$$
shows that $|x_n - x_m|$ can be made arbitrarily small. The sequence, however, does not converge to any point of $\mathbb{Q}$: if we knew $\mathbb{R}$, we could say that it converges to $\pi$. Therefore, in $\mathbb{Q}$ the Cauchy condition is necessary, but not sufficient, for convergence. The reason is that $\mathbb{Q}$ does not have "enough points" to handle convergence well, unlike $\mathbb{R}$. For instance, the previous sequence converges in $\mathbb{R}$ because of the point $\pi$, which is missing in $\mathbb{Q}$. We thus say that $\mathbb{R}$ is complete (with respect to convergence), while $\mathbb{Q}$ is incomplete. Indeed, $\mathbb{R}$ can be seen as a way to complete $\mathbb{Q}$ by adding all the missing limit points, like $\pi$, as readers will learn in more advanced courses.

We close with a remark on the proof of Cauchy's Theorem. The "if" is its more difficult part and we proved it via the Least Upper Bound Principle. Next we report a different, arguably more illuminating, proof.
Alternative proof Assume that the sequence $\{x_n\}$ is Cauchy, i.e., for each $\varepsilon > 0$ there exists $n_\varepsilon$ such that $|x_n - x_m| < \varepsilon$ for every $n, m \geq n_\varepsilon$. We want to prove that $\{x_n\}$ converges.

We start by proving that it is bounded. Setting $\varepsilon = 1$, there exists $n_1 \geq 1$ such that $|x_n - x_m| < 1$ for all $n, m \geq n_1$. Hence, for each $n \geq n_1$ we have:
$$|x_n| = |x_n - x_{n_1} + x_{n_1}| \leq |x_n - x_{n_1}| + |x_{n_1}| < 1 + |x_{n_1}|$$
which implies that the sequence $\{x_n\}$ is bounded. This allows us to define two scalar sequences $\{z_n\}$ and $\{y_n\}$ by setting
$$z_n = \inf_{k \geq n} x_k \qquad \text{and} \qquad y_n = \sup_{k \geq n} x_k \tag{8.49}$$
Since $\{z_n\}$ and $\{y_n\}$ are bounded and monotone, both limits $y = \lim y_n$ and $z = \lim z_n$ exist.

Let $\varepsilon > 0$. Since $\{z_n\}$ and $\{y_n\}$ are convergent, there exist $n_y, n_z \geq 1$ such that $|y - y_n| < \varepsilon/5$ for all $n \geq n_y$ and $|z_n - z| < \varepsilon/5$ for all $n \geq n_z$. On the other hand, since $\{x_n\}$ is Cauchy there exists $n_x \geq 1$ such that $|x_n - x_m| < \varepsilon/5$ for all $n, m \geq n_x$. Let $\bar{n} = \max\{n_x, n_y, n_z\}$. In view of (8.49), there exist $n_{xy}, n_{xz} \geq \bar{n}$ such that $y_{\bar{n}} - x_{n_{xy}} < \varepsilon/5$ and $x_{n_{xz}} - z_{\bar{n}} < \varepsilon/5$. Since $n_{xy}, n_{xz} \geq \bar{n} \geq n_x$, we have $\left|x_{n_{xy}} - x_{n_{xz}}\right| < \varepsilon/5$. Being $\bar{n} \geq n_y, n_z$, we conclude that
$$|y - z| = \left|y - y_{\bar{n}} + y_{\bar{n}} - z_{\bar{n}} + z_{\bar{n}} - z\right| \leq |y - y_{\bar{n}}| + |y_{\bar{n}} - z_{\bar{n}}| + |z_{\bar{n}} - z|$$
$$\leq |y - y_{\bar{n}}| + \left|y_{\bar{n}} - x_{n_{xy}}\right| + \left|x_{n_{xy}} - x_{n_{xz}}\right| + \left|x_{n_{xz}} - z_{\bar{n}}\right| + |z_{\bar{n}} - z| < \frac{\varepsilon}{5} + \frac{\varepsilon}{5} + \frac{\varepsilon}{5} + \frac{\varepsilon}{5} + \frac{\varepsilon}{5} = \varepsilon$$
Since $\varepsilon > 0$ was arbitrarily chosen, this yields $y = z$. That is,
$$\lim z_n = z = y = \lim y_n$$
Being $z_n \leq x_n \leq y_n$ for all $n \geq 1$, this implies, by the comparison criterion, that $\lim x_n$ exists (and is equal to $z$).

The sequences $\{z_n\}$ and $\{y_n\}$ used in the proof are instances of the two fundamental notions of inferior and superior limits that will be studied in Section 10.1. Specifically, as will be seen, we have $\limsup x_n = \lim y_n$ and $\liminf x_n = \lim z_n$.
Theorem 349 The sequence (8.50) is convergent. Its limit is denoted by $e$, i.e.,
$$e = \lim \left(1 + \frac{1}{n}\right)^n \tag{8.51}$$

Since the sequence involves powers, the root criterion is a first possibility to consider to prove the result. Unfortunately,
$$\sqrt[n]{\left(1 + \frac{1}{n}\right)^n} = 1 + \frac{1}{n} \to 1$$
and, therefore, this criterion cannot be applied. The proof is based, instead, on the following classic inequality.
Proof The proof is done by induction. Inequality (8.52) holds for $n = 2$. Indeed, for each $a \neq 0$ we have:
$$(1 + a)^2 = 1 + 2a + a^2 > 1 + 2a$$
Suppose now that (8.52) holds for some $n \geq 2$ (induction hypothesis), i.e.,
$$(1 + a)^n > 1 + na$$
Then
$$(1 + a)^{n+1} = (1 + a)^n(1 + a) > (1 + na)(1 + a) = 1 + (n+1)a + na^2 > 1 + (n+1)a$$
where the first inequality, due to the induction hypothesis, holds because $a > -1$. This completes the induction step.

29. For $n = 1$, equality holds trivially.
Moreover, for the sequences $a_n = \left(1 + \frac{1}{n}\right)^n$ and $b_n = \left(1 + \frac{1}{n}\right)^{n+1}$,
$$0 < b_n - a_n = \frac{b_n}{n + 1} < \frac{b_1}{n + 1} \to 0$$
By step 1, the sequence $\{b_n\}$ is decreasing and bounded below (being positive). So, $\lim b_n = \inf b_n$. By step 2, the sequence $\{a_n\}$ is increasing and, being $a_n < b_n$ for each $n$ (step 3), is bounded above. Hence, $\lim a_n = \sup a_n$. Since $b_n - a_n \to 0$ (step 3), from $b_n \geq \inf b_n \geq \sup a_n \geq a_n$ it follows that $\sup a_n = \inf b_n$, so $\lim a_n = \lim b_n$.

One obtains:
$$a_1 = 2^1 = 2 \qquad\qquad b_1 = 2^2 = 4$$
$$a_2 = \left(\frac{3}{2}\right)^2 = 2.25 \qquad\qquad b_2 = \left(\frac{3}{2}\right)^3 = 3.375$$
$$a_{10} = \left(\frac{11}{10}\right)^{10} \simeq 2.59 \qquad\qquad b_{10} = \left(\frac{11}{10}\right)^{11} \simeq 2.85$$
Therefore, Napier's constant lies between $2.59$ and $2.85$. Indeed, it is equal to $2.71828\dots$

30. Note that $-1 < 1/(n^2 - 1) \neq 0$ for $n \geq 2$.
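As a minimal numerical sketch (ours, not part of the text), one can compute the two bounds for larger $n$ and watch them squeeze Napier's constant:

```python
# a_n = (1 + 1/n)^n increases and b_n = (1 + 1/n)^(n+1) decreases;
# they squeeze e = 2.71828... between them, with gap b_n - a_n -> 0.
for n in [1, 2, 10, 100, 10000]:
    a = (1 + 1 / n) ** n
    b = (1 + 1 / n) ** (n + 1)
    print(f"n={n:5d}  a_n={a:.6f}  b_n={b:.6f}  gap={b - a:.2e}")
```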
Later we will prove that it is an irrational number (Theorem 400). It can be proved that it is actually a transcendental number.^{31}

Napier's constant is, inter alia, the most convenient base for exponential and logarithmic functions (Section 6.5.2). Later in the book we will see that it can be studied from different angles: like many important mathematical entities, Napier's constant is a multi-faceted diamond. Besides the "sequential" angle just seen in Theorem 349, a summation angle will be studied in Section 9.3.4, a functional angle, with a compelling economic interpretation in terms of compounding, will be presented in Section 17.5, and a differential angle in Section 26.7.

From the fundamental limit (8.51), we can deduce many other important limits.
For example, if $z_n \to 0$ with $z_n \neq 0$, then for every $\alpha \in \mathbb{R}$
$$\lim \frac{(1 + z_n)^{\alpha} - 1}{z_n} = \alpha$$
The result is obvious for $\alpha = 1$. Let $\alpha \neq 1$, and set $a_n = (1 + z_n)^{\alpha} - 1$. Then $\log(1 + a_n) = \alpha \log(1 + z_n)$, so that also $a_n \to 0$. We have, therefore,
$$\frac{\log(1 + a_n)}{a_n} = \frac{\alpha \log(1 + z_n)}{(1 + z_n)^{\alpha} - 1} = \alpha\, \frac{\log(1 + z_n)}{z_n} \cdot \frac{z_n}{(1 + z_n)^{\alpha} - 1}$$
Since
$$\lim \frac{\log(1 + a_n)}{a_n} = \lim \frac{\log(1 + z_n)}{z_n} = 1$$
the result then follows.
The next definition formalizes this intuition, which is important both conceptually and computationally.

Definition 351 Let $\{x_n\}$ and $\{y_n\}$ be two sequences, with the terms of the former eventually different from zero.

(i) If
$$\frac{y_n}{x_n} \to 0$$
we say that $\{y_n\}$ is negligible with respect to $\{x_n\}$, and write
$$y_n = o(x_n)$$

(ii) If
$$\frac{y_n}{x_n} \to k \neq 0 \tag{8.54}$$
we say that $\{y_n\}$ is of the same order as (or comparable with) $\{x_n\}$, and write
$$y_n \asymp x_n$$

This classification is comparative. For example, if $\{y_n\}$ is negligible with respect to $\{x_n\}$, it does not mean that $\{y_n\}$ is negligible per se, but that it becomes so when compared to $\{x_n\}$. The sequence $y_n = n^2$ is negligible with respect to $x_n = n^5$, but it is not negligible at all per se (it tends to infinity!).

Observe that, thanks to Proposition 313, we have
$$\frac{y_n}{x_n} \to \pm\infty \iff \frac{x_n}{y_n} \to 0 \iff x_n = o(y_n)$$
Therefore, we can use the previous classification also when the ratio $y_n/x_n$ diverges; no separate analysis is needed.
Lemma 352 Let $\{x_n\}$ and $\{y_n\}$ be two sequences with terms eventually different from zero.

(ii) The relation of negligibility is transitive, i.e., $z_n = o(y_n)$ and $y_n = o(x_n)$ imply $z_n = o(x_n)$.

We now consider the more interesting cases in which both sequences are either infinitesimal or divergent. We start with two infinitesimal sequences $\{x_n\}$ and $\{y_n\}$, that is, $\lim x_n = \lim y_n = 0$. In this case, the negligible sequence tends to zero faster. Consider, for example, $x_n = 1/n$ and $y_n = 1/n^2$. Intuitively, $y_n$ goes to zero faster than $x_n$. Indeed,
$$\frac{\frac{1}{n^2}}{\frac{1}{n}} = \frac{1}{n} \to 0$$

Suppose now that the sequences $\{x_n\}$ and $\{y_n\}$ are both divergent, positively or negatively, that is, $\lim_{n\to\infty} x_n = \pm\infty$ and $\lim_{n\to\infty} y_n = \pm\infty$. In this case, negligible sequences tend to infinity more slowly (independently of the sign), that is, they take on values greater and greater, in absolute value, less rapidly. For example, let $x_n = n^2$ and $y_n = n$. Intuitively, $y_n$ goes to infinity more slowly than $x_n$. Indeed,
$$\frac{y_n}{x_n} = \frac{n}{n^2} = \frac{1}{n} \to 0$$
that is, $y_n = o(x_n)$. On the other hand, the same is true if $x_n = -n^2$ and $y_n = n$, because it is not the sign of the infinity that matters, but the rate of divergence.
N.B. Setting $x_n = n$ and $y_n = n + k$, with $k > 0$, the sequences $\{x_n\}$ and $\{y_n\}$ are asymptotic. Indeed, no matter how large $k$ is, the divergence to $+\infty$ of the two sequences makes the role of $k$ negligible from the asymptotic point of view. Such a fundamental viewpoint, central to the theory of sequences, should not make us forget that two asymptotic sequences are, in general, very different (to fix ideas, set for example $k = 10^{10}$, i.e., $10$ billion, and consider the asymptotic, yet very different, sequences $x_n = n$ and $y_n = n + 10^{10}$). O
Proposition 353 For every pair of sequences $\{x_n\}$ and $\{y_n\}$ and for every scalar $c \neq 0$, it holds that:

(i) $o(x_n) + o(x_n) = o(x_n)$;
(ii) $o(x_n)\, o(y_n) = o(x_n y_n)$;
(iii) $c\, o(x_n) = o(x_n)$;
(iv) $o(y_n) + o(x_n) = o(x_n)$ whenever $y_n = o(x_n)$.

The relation $o(x_n) + o(x_n) = o(x_n)$ in (i), bizarre at first sight, simply means that the sum of two little-o's of a sequence is still a little-o of that sequence, that is, it continues to be negligible with respect to that sequence. Similar re-readings hold for the other properties in the proposition. Note that (ii) has the remarkable special case
$$o(x_n)\, o(x_n) = o\left(x_n^2\right)$$

Proof If $\{y_n\}$ is a little-o of $\{x_n\}$, it can be written as $x_n \varepsilon_n$, where $\{\varepsilon_n\}$ is an infinitesimal sequence: indeed, just set $\varepsilon_n = y_n/x_n$. The proof will be based on this useful remark.

(i) Let us call $x_n \varepsilon_n$ the first of the two little-o's on the left-hand side of the equality and $x_n \eta_n$ the second one, with $\{\varepsilon_n\}$ and $\{\eta_n\}$ two infinitesimal sequences. Then
$$\lim \frac{x_n \varepsilon_n + x_n \eta_n}{x_n} = \lim(\varepsilon_n + \eta_n) = 0$$

(ii) Writing the two little-o's as $x_n \varepsilon_n$ and $y_n \eta_n$, we have
$$\lim \frac{x_n \varepsilon_n \cdot y_n \eta_n}{x_n y_n} = \lim(\varepsilon_n \eta_n) = 0$$

(iii) Let us call $x_n \varepsilon_n$ the little-o of $x_n$, with $\{\varepsilon_n\}$ an infinitesimal sequence. Then
$$\lim \frac{c\, x_n \varepsilon_n}{x_n} = c \lim \varepsilon_n = 0$$
which shows that $c\, o(x_n)$ is $o(x_n)$.

(iv) Let us write $y_n = x_n \varepsilon_n$, with $\{\varepsilon_n\}$ an infinitesimal sequence. Then, the little-o of $y_n$ can be written as $y_n \eta_n$, that is, $x_n \varepsilon_n \eta_n$, with $\{\eta_n\}$ an infinitesimal sequence. Moreover, we call $x_n \theta_n$ the little-o of $x_n$, with $\{\theta_n\}$ an infinitesimal sequence. Then
$$\lim \frac{x_n \varepsilon_n \eta_n + x_n \theta_n}{x_n} = \lim(\varepsilon_n \eta_n + \theta_n) = 0$$
so that $o(y_n) + o(x_n) = o(x_n)$.
Example 354 Let $x_n = n^2$, $y_n = n$, and $z_n = 2\log n - 2n$, so that $y_n = o(x_n)$ and $z_n = o(x_n)$.

(i) Adding up the two sequences $y_n$ and $z_n$, we obtain $y_n + z_n = 2\log n - n$, which is still $o(n^2)$, in accordance with (i) proved above.

(ii) Multiplying the two sequences, we obtain $y_n z_n = 2n\log n - 2n^2$, which is $o(n^2 \cdot n^2)$, i.e., $o(n^4)$, in accordance with (ii) proved above (in the special case $o(x_n)o(x_n)$). Note that $y_n z_n$ is not $o(n^2)$.

(iii) Take $c = 3$ and consider $c\, y_n = 3n$. It is immediate that $3n$ is still $o(n^2)$, in accordance with (iii) proved above.

(iv) Consider the sequence $w_n = \sqrt{n} - 1$. It is immediate that $w_n = o(y_n) = o(n)$. Consider now the sum $w_n + z_n$ (with $z_n$ defined above), which is the sum of an $o(y_n)$ and an $o(x_n)$, with $y_n = o(x_n)$. We have $w_n + z_n = \sqrt{n} - 1 + 2\log n - 2n$, which is $o(x_n) = o(n^2)$, in accordance with (iv) proved above. Note that $w_n + z_n$ is not $o(y_n)$, even if $w_n$ is $o(y_n)$. N
N.B. (i) To say that a sequence is $o(1)$ simply means that it tends to $0$. Indeed, $x_n = o(1)$ means that $x_n/1 = x_n \to 0$. Note that, by the definition of little-o, we have
$$o(x_n) = x_n\, o(1)$$
a simple property that becomes handy in some cases. (ii) The fourth property in the last proposition is especially important because it highlights that, if $y_n$ is negligible with respect to $x_n$, in the sum $o(y_n) + o(x_n)$ the little-o $o(y_n)$ is subsumed in $o(x_n)$. O

Two asymptotic sequences $\{x_n\}$ and $\{y_n\}$ have the same limit behavior:
$$y_n \to L \iff x_n \to L \tag{8.57}$$
In detail:
All this suggests that it is possible to replace $x_n$ with $y_n$ (or vice versa) in the calculation of limits. Intuitively, this possibility is attractive because it might allow us to replace a complicated sequence with a simpler one that is asymptotic to it.

To make this intuition precise, we start by observing that asymptotic equivalence is preserved under the fundamental operations.

Lemma 355 Let $y_n \sim x_n$ and $z_n \sim w_n$. Then:

(i) $y_n + z_n \sim x_n + w_n$, provided there exists $k > 0$ such that, eventually,
$$\left|\frac{x_n}{x_n + w_n}\right| \leq k$$ ^{33}

(ii) $y_n z_n \sim x_n w_n$;

(iii) $\dfrac{y_n}{z_n} \sim \dfrac{x_n}{w_n}$.

Note that for sums, differently from the case of products and ratios, the result does not hold in general, but only under a non-trivial ad hoc hypothesis. For this reason, points (ii) and (iii) are the most interesting ones. In the sequel we will thus focus on the asymptotic equivalence of products and ratios, leaving to the reader the study of sums.

Proof (i) We have
$$\frac{y_n + z_n}{x_n + w_n} = \frac{y_n}{x_n + w_n} + \frac{z_n}{x_n + w_n} = \frac{y_n}{x_n}\,\frac{x_n}{x_n + w_n} + \frac{z_n}{w_n}\,\frac{w_n}{x_n + w_n}$$
$$= \frac{y_n}{x_n}\,\frac{x_n}{x_n + w_n} + \frac{z_n}{w_n}\left(1 - \frac{x_n}{x_n + w_n}\right) = \frac{z_n}{w_n} + \left(\frac{y_n}{x_n} - \frac{z_n}{w_n}\right)\frac{x_n}{x_n + w_n}$$
Since $y_n/x_n - z_n/w_n \to 1 - 1 = 0$ and the ratio $x_n/(x_n + w_n)$ is eventually bounded,
$$\left(\frac{y_n}{x_n} - \frac{z_n}{w_n}\right)\frac{x_n}{x_n + w_n} \to 0$$
so that $(y_n + z_n)/(x_n + w_n) \to 1$, as desired.

33. For example, the condition holds if $\{x_n\}$ and $\{w_n\}$ are both eventually positive (in this case, any $k \geq 1$ works).
The next simple lemma is very useful: in the calculation of a limit, one should neglect what is negligible. It states that
$$x_n + o(x_n) \to L \iff x_n \to L$$
What is negligible with respect to the sequence $\{x_n\}$, i.e., what is $o(x_n)$, is asymptotically irrelevant and one can safely ignore it. Together with Lemma 355, this implies, for products and ratios, that
$$(x_n + o(x_n))(y_n + o(y_n)) \sim x_n y_n \tag{8.58}$$
and
$$\frac{x_n + o(x_n)}{y_n + o(y_n)} \sim \frac{x_n}{y_n} \tag{8.59}$$
We illustrate these very useful asymptotic equivalences with some examples, which should be read with particular attention.

(i) Consider the limit
$$\lim \frac{n^4 - 3n^3 + 5n^2 - 7}{2n^5 + 12n^4 - 6n^3 + 4n + 1}$$
By (8.59), we have
$$\frac{n^4 - 3n^3 + 5n^2 - 7}{2n^5 + 12n^4 - 6n^3 + 4n + 1} = \frac{n^4 + o(n^4)}{2n^5 + o(n^5)} \sim \frac{n^4}{2n^5} = \frac{1}{2n} \to 0$$

(ii) We have
$$\left(n^2 - 7n + 3\right)\left(2 + \frac{1}{n} - \frac{3}{n^2}\right) = \left(n^2 + o(n^2)\right)(2 + o(1)) \sim 2n^2 \to +\infty$$

(iii) We have
$$\frac{n(n+1)(n+2)(n+3)}{(n-1)(n-2)(n-3)(n-4)} = \frac{n^4 + o(n^4)}{n^4 + o(n^4)} \sim \frac{n^4}{n^4} = 1 \to 1$$

(iv) Consider the limit
$$\lim e^{-n}\left(7 + \frac{1}{n}\right)$$
By (8.58), we have
$$e^{-n}\left(7 + \frac{1}{n}\right) = e^{-n}(7 + o(1)) \sim 7e^{-n} \to 0$$
N
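A minimal numerical check of example (i) (ours, purely illustrative): since the ratio behaves like $1/(2n)$, multiplying it by $2n$ should approach $1$, in agreement with the asymptotic simplification (8.59).

```python
# f(n) ~ 1/(2n), so 2*n*f(n) -> 1: the lower-order terms are negligible.
def f(n):
    return (n**4 - 3 * n**3 + 5 * n**2 - 7) / (
        2 * n**5 + 12 * n**4 - 6 * n**3 + 4 * n + 1)

for n in [10, 100, 1000, 100000]:
    print(f"n={n:7d}  f(n)={f(n):.3e}  2n*f(n)={2 * n * f(n):.6f}")
```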
By (8.55), we have
$$\frac{y_n}{z_n} \sim \frac{x_n}{w_n} \iff \frac{z_n}{y_n} \sim \frac{w_n}{x_n} \tag{8.60}$$
provided that the ratios are (eventually) well defined and not zero. Therefore, once we have established the asymptotic equivalence of the ratios $y_n/z_n$ and $x_n/w_n$, we "automatically" have also the asymptotic equivalence of their reciprocals $z_n/y_n$ and $w_n/x_n$.

Consider, for example, the limit
$$\lim \frac{e^{5n} - n^7 - 4n^2 + 3n}{6^n + n^8 - n^4 + 5n^3}$$
By (8.59),
$$\frac{e^{5n} - n^7 - 4n^2 + 3n}{6^n + n^8 - n^4 + 5n^3} = \frac{e^{5n} + o\left(e^{5n}\right)}{6^n + o(6^n)} \sim \frac{e^{5n}}{6^n} = \left(\frac{e^5}{6}\right)^n \to +\infty$$
If, instead, we consider the reciprocal limit
$$\lim \frac{6^n + n^8 - n^4 + 5n^3}{e^{5n} - n^7 - 4n^2 + 3n}$$
then, by (8.60),
$$\frac{6^n + n^8 - n^4 + 5n^3}{e^{5n} - n^7 - 4n^2 + 3n} \sim \left(\frac{6}{e^5}\right)^n \to 0$$
N

34. For $0 \neq k \in \mathbb{R}$, we have $k + o(1) \sim k$. Indeed,
$$\frac{k + o(1)}{k} = 1 + \frac{o(1)}{k} \to 1$$
In conclusion, a clever use of (8.58)-(8.59) often allows one to simplify substantially the calculation of limits. But, beyond calculations, they are conceptually illuminating relations.

When $k < 0$, condition (8.61) characterizes the sequences that converge to zero at an exponential rate. In that case, we speak of exponential decay. When $k > 0$, there is instead an explosive exponential behavior.
8.14.5 Terminology

Due to its importance, for the comparison both of infinitesimal sequences and of divergent sequences there is a specific terminology. In particular,

(i) if two infinitesimal sequences $\{x_n\}$ and $\{y_n\}$ are such that $y_n = o(x_n)$, we say that the sequence $\{y_n\}$ is infinitesimal of higher order with respect to $\{x_n\}$;

(ii) if two divergent sequences $\{x_n\}$ and $\{y_n\}$ are such that $y_n = o(x_n)$, we say that the sequence $\{y_n\}$ is of lower order of infinity with respect to $\{x_n\}$.

In other words, a sequence is infinitesimal of higher order if it tends to zero faster, while it is of lower order of infinity if it tends to infinity more slowly. Beyond the terminology (which is not universal), it is important to keep in mind the idea of negligibility that lies at the basis of the relation $y_n = o(x_n)$.

(ii) $n^k = o(\alpha^n)$ for every $\alpha > 1$, as already proved with the ratio criterion. We have $\alpha^n = o(n^k)$ if, instead, $0 < \alpha < 1$ and $k < 0$.

(iii) $\log^{k_2} n = o\left(\log^{k_1} n\right)$ if $k_1 > k_2 > 0$: indeed,
$$\frac{\log^{k_2} n}{\log^{k_1} n} = \frac{1}{\log^{k_1 - k_2} n} \to 0$$

The next lemma reports two important comparisons of infinities, showing that exponentials are of lower order of infinity than factorials (we omit the proof).

Note that this implies, by Lemma 352, that $\alpha^n = o(n^n)$. Exponentials are, therefore, of lower order of infinity also compared with sequences of the type $n^n$.
The different orders of infinity and of infinitesimals are sometimes organized through scales. If we limit ourselves to infinities (similar considerations hold for infinitesimals), the most classic scale of infinities is the logarithmic-exponential one. Taking $x_n = n$ as the basis, we have the ascending scale
$$n, \ n^2, \dots, n^k, \dots, e^n, \ e^{2n}, \dots, e^{kn}, \dots, e^{n^2}, \dots, e^{n^k}, \dots, e^{e^n}, \dots$$
They provide "benchmarks" to calibrate the asymptotic behavior of a sequence $\{x_n\}$ that tends to infinity. For example, if $x_n \sim \log n$, the sequence $\{x_n\}$ is asymptotically logarithmic; if $x_n \sim n^2$, the sequence $\{x_n\}$ is asymptotically quadratic, and so on.

In applications one rarely considers orders of infinity higher than $e^{e^n}$ and lower than $\log\log n$. Indeed, $\log\log n$ has an almost imperceptible increase (it is almost constant), while $e^{e^n}$ explodes very quickly:
$$\begin{array}{c|cccc}
n & 3 & 4 & 5 & 6 \\ \hline
e^{e^n} & 5.2849 \cdot 10^{8} & 5.1484 \cdot 10^{23} & 2.8511 \cdot 10^{64} & 1.6103 \cdot 10^{175}
\end{array}$$
The asymptotic behavior of divergent sequences that are relevant in applications usually ranges between the slowness of $\log\log n$ and the explosiveness of $e^{e^n}$. But, from a theoretical point of view, we can go well beyond them.^{35} The study of the scales of infinity, pioneered by Paul du Bois-Reymond in the 1870s, is of great elegance (see Hardy, 1910).
Proof We will only show the first equality. Set $x_n = n!/n^n$. We next show that
$$\lim \frac{x_{n+1}}{x_n} = \frac{1}{e}$$
Indeed, by Theorem 349,
$$\frac{x_{n+1}}{x_n} = \frac{(n+1)!}{(n+1)^{n+1}} \cdot \frac{n^n}{n!} = \frac{(n+1)\, n^n}{(n+1)^{n+1}} = \left(\frac{n}{n+1}\right)^n = \frac{1}{\left(1 + \frac{1}{n}\right)^n} \to \frac{1}{e}$$
as desired.

We conclude that $n! = n^n e^{-n} \sqrt{2\pi n}\, e^{o(1)}$, and so
$$\frac{n!}{n^n e^{-n} \sqrt{2\pi n}} = e^{o(1)} \to 1$$
We thus obtain the following remarkable formula
$$n! \sim n^n e^{-n} \sqrt{2\pi n}$$
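As a sketch (ours, not part of the text), the following Python lines check this formula, commonly known as Stirling's formula, numerically; logarithms (via `math.lgamma`) are used to avoid overflow for large $n$:

```python
import math

# Ratio n! / (n^n e^{-n} sqrt(2*pi*n)) -> 1; lgamma(n+1) = log(n!).
for n in [5, 10, 50, 100, 500]:
    log_ratio = math.lgamma(n + 1) - (
        n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n))
    print(f"n={n:4d}  ratio = {math.exp(log_ratio):.8f}")
```

The printed ratio approaches $1$ from above, roughly like $1 + 1/(12n)$.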
For example, the first values of the sequence $\pi(n)$, which counts the prime numbers smaller than or equal to $n$, are:
$$\begin{array}{c|ccccccccccccccc}
n & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 \\ \hline
\pi(n) & 0 & 1 & 2 & 2 & 3 & 3 & 4 & 4 & 4 & 4 & 5 & 5 & 6 & 6 & 6
\end{array}$$
It is, of course, not possible to fully describe the sequence $\pi$, as this would be equivalent to describing the sequence of prime numbers, which we have argued to be hopeless (at least, operationally). Nevertheless, we can still ask ourselves whether there is a sequence $\{x_n\}$ that is described in closed form and is asymptotically equal to $\pi$. In other words, the question is whether we can find a reasonably simple sequence that approximates $\pi$ asymptotically well enough.
Around the year 1800, Gauss and Legendre independently noted that the sequence $\{n/\log n\}$ approximates $\pi$ well, as we can check by inspecting the following table:
$$\begin{array}{c|ccc}
n & \pi(n) & n/\log n & \pi(n) \big/ \frac{n}{\log n} \\ \hline
10 & 4 & 4.3 & 0.921 \\
10^2 & 25 & 21.7 & 1.151 \\
10^3 & 168 & 145 & 1.161 \\
10^4 & 1{,}229 & 1{,}086 & 1.132 \\
10^5 & 9{,}592 & 8{,}686 & 1.104 \\
10^{10} & 455{,}052{,}511 & 434{,}294{,}482 & 1.048 \\
10^{15} & 29{,}844{,}570{,}422{,}669 & 28{,}952{,}965{,}460{,}217 & 1.031 \\
10^{20} & 2{,}220{,}819{,}602{,}560{,}918{,}840 & 2{,}171{,}472{,}409{,}516{,}250{,}000 & 1.023
\end{array}$$
This suggests that the ratio
$$\frac{\pi(n)}{\frac{n}{\log n}}$$
becomes closer and closer to $1$ as $n$ increases. Gauss and Legendre conjectured that this is so because $\pi$ is asymptotically equal to $\{n/\log n\}$. Their conjecture remained open for about a century, until it was proven to be true, independently, in 1896 by two great mathematicians, Jacques Hadamard and Charles de la Vallée Poussin. The importance of such a result is testified by its name, which is as simple as it is demanding.^{37}
Theorem 363 (Prime Number Theorem) It holds that
$$\pi(n) \sim \frac{n}{\log n}$$

Although we are not able to describe the sequence $\pi$, thanks to the Prime Number Theorem we can say that its asymptotic behavior is similar to that of the simple sequence $\{n/\log n\}$; in particular, the number $\pi(n) - \pi(m)$ of primes in any given interval of natural numbers $[m, n]$ is approximately
$$\frac{n}{\log n} - \frac{m}{\log m}$$
with increasing accuracy. In particular, by Lemma 355-(iii) we have
$$\frac{\pi(n)}{n} \sim \frac{1}{\log n} \to 0$$

37. The proof of this theorem requires complex analysis methods, which we do not cover in this book. The use of complex analysis in the study of prime numbers is due to a deep insight of Bernhard Riemann. Only in 1949 were two outstanding mathematicians, Paul Erdős and Atle Selberg, able to prove this result using real analysis methods. We refer readers to Ivic (1985) for a comprehensive study of this topic, with all relevant references.
Most natural numbers are thus not prime and, as $n$ gets larger, prime numbers become less frequent.

The Prime Number Theorem, which undoubtedly has a statistical "flavor", is a wonderful and incredibly elegant result. Even more so if we consider its following remarkable consequence, which relies on the simple observation that $\pi(p_n) = n$.

The sequence of prime numbers $\{p_n\}$ is thus asymptotically equivalent to $\{n \log n\}$: the value of the $n$-th prime number is, approximately, $n \log n$. For example, by inspecting a table of primes one can see that for $n = 100$ one has $p_n = 541$, while its "estimate" is $n \log n = 460$ (rounding down). Similarly for larger $n$:

[Table: $n$, $p_n$, $n \log n$, and the ratio $p_n/(n \log n)$]

One can see that the ratio between $p_n$ and its estimate $n \log n$ stays steadily around $1$.
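A minimal numerical sketch of this estimate (ours, not part of the text; it assumes the SymPy library is available for its `prime` function):

```python
import math
from sympy import prime  # assumes SymPy is installed; prime(n) is the n-th prime

# Compare the n-th prime p_n with its estimate n*log(n).
for n in [100, 1000, 10000]:
    p = prime(n)
    est = n * math.log(n)
    print(f"n={n:6d}  p_n={p:8d}  n log n={est:10.0f}  ratio={p / est:.3f}")
```

The ratio stays somewhat above $1$ and decreases very slowly, consistently with the asymptotic equivalence $p_n \sim n \log n$.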
Proof Let $\varepsilon > 0$. By the Prime Number Theorem, there exists $n_\varepsilon$ such that
$$\left|\pi(n)\frac{\log n}{n} - 1\right| < \varepsilon \quad \forall n \geq n_\varepsilon \tag{8.64}$$
Since $p_n \to \infty$, there is an $m_\varepsilon$ such that $p_n \geq n_\varepsilon$ for every $n \geq m_\varepsilon$. Hence, (8.64) implies that
$$\left|\pi(p_n)\frac{\log p_n}{p_n} - 1\right| < \varepsilon \quad \forall n \geq m_\varepsilon$$
that is, since $\pi(p_n) = n$,
$$\left|n\frac{\log p_n}{p_n} - 1\right| < \varepsilon \quad \forall n \geq m_\varepsilon$$
that is,
$$n\frac{\log p_n}{p_n} \to 1 \tag{8.65}$$
from which it follows that
$$\log\left(n\frac{\log p_n}{p_n}\right) \to \log 1 = 0$$
or, $\log n + \log\log p_n - \log p_n \to 0$. Since $\log p_n \to +\infty$,
$$\frac{\log n}{\log p_n} + \frac{\log\log p_n}{\log p_n} - 1 = \frac{\log n + \log\log p_n - \log p_n}{\log p_n} \to 0$$
Yet, $\log\log p_n / \log p_n \to 0$ (can you explain why?), and so
$$\frac{\log n}{\log p_n} \to 1$$
Multiplying by (8.65), we get
$$\frac{n \log n}{p_n} = n\frac{\log p_n}{p_n} \cdot \frac{\log n}{\log p_n} \to 1$$
thus showing that (8.63) holds.
O.R. Counting objects is one of the most basic activities common across cultures, arguably the most universal one: counting emerges as soon as similar, yet distinct, entities come up (cf. Section 7.4). If so, the identification of prime numbers, the atoms of numbers, can be viewed as an important step in the evolution of a civilization. Indeed, their study emerged in the Greek world, which also marked the emergence of reason (Section 1.8). The depth with which a civilization studies prime numbers is, then, a possible universal benchmark to assess its degree of evolution. On this scale, the Prime Number Theorem is one of the best pieces of evidence of evolution that mankind can offer when going where no one has gone before (unless sure of their intentions, it is better not to meet civilizations that have found a closed form for the sequence of primes). H
8.15.1 Big-O

Definition 365 Given two sequences $\{x_n\}$ and $\{y_n\}$, if there exists $c > 0$ such that, eventually,
$$|y_n| \leq c|x_n|$$
we say that $\{y_n\}$ is asymptotically dominated by $\{x_n\}$. In symbols, $y_n = O(x_n)$.

Note that the value of the constant $c$ is left unspecified: it only matters that such a constant exists.

The relation $O$ is easily seen to be transitive, i.e., $z_n = O(y_n)$ and $y_n = O(x_n)$ imply $z_n = O(x_n)$. It can be manipulated with an algebra similar to the little-o one.

Proposition 367 For every pair of sequences $\{x_n\}$ and $\{y_n\}$ and for every scalar $c \neq 0$, it holds that:

(i) $O(x_n) + O(x_n) = O(x_n)$;
(ii) $O(x_n)\, O(y_n) = O(x_n y_n)$;
(iii) $c\, O(x_n) = O(x_n)$;
(iv) $O(y_n) + O(x_n) = O(x_n)$ whenever $y_n = O(x_n)$.

Proof (i) Let $y_n = O(x_n)$ and $y'_n = O(x_n)$. By definition, there exist $c, c' > 0$ such that eventually $|y_n| \leq c|x_n|$ and $|y'_n| \leq c'|x_n|$. Then
$$|y_n + y'_n| \leq |y_n| + |y'_n| \leq c|x_n| + c'|x_n| = (c + c')|x_n|$$
So, $y_n + y'_n = O(x_n)$, as desired.

(ii) Let $z_n = O(x_n)$ and $z'_n = O(y_n)$. By definition, there exist $c, c' > 0$ such that eventually $|z_n| \leq c|x_n|$ and $|z'_n| \leq c'|y_n|$. Then
$$|z_n z'_n| \leq c\, c' |x_n||y_n| = c\, c' |x_n y_n|$$
So, $z_n z'_n = O(x_n y_n)$, as desired.

(iii) is obvious. (iv) Let $z_n = O(x_n)$ and $z'_n = O(y_n)$, with $y_n = O(x_n)$. By definition, there exist $c, c' > 0$ such that eventually $|z_n| \leq c|x_n|$ and $|z'_n| \leq c'|y_n|$. Being $y_n = O(x_n)$, there exists $c'' > 0$ such that eventually $|y_n| \leq c''|x_n|$. So,
$$|z_n + z'_n| \leq |z_n| + |z'_n| \leq c|x_n| + c'|y_n| \leq c|x_n| + c'c''|x_n| = (c + c'c'')|x_n|$$
as desired.

Conditions $y_n \sim x_n$ and $y_n = o(x_n)$ are thus sufficient for $y_n = O(x_n)$, but they are not necessary. A trivial example is $y_n = (-1)^n$ and $x_n = 1$: the limit of $y_n/x_n$ does not exist, so neither $y_n \sim x_n$ nor $y_n = o(x_n)$ holds, but we still have $y_n = O(x_n)$.

Proof (i) If $y_n \sim x_n$ then $|y_n|/|x_n| = |y_n/x_n| \to 1$, so if we take any $c > 1$ we have eventually $|y_n| \leq c|x_n|$. (ii) If $y_n = o(x_n)$ then $|y_n|/|x_n| = |y_n/x_n| \to 0$, so if we take any $c > 0$ we have eventually $|y_n| \leq c|x_n|$.
Proposition 369 We have both $y_n = O(x_n)$ and $x_n = O(y_n)$ if and only if there exists $c > 0$ such that, eventually,
$$\frac{1}{c}|x_n| \leq |y_n| \leq c|x_n| \tag{8.67}$$

Proof If $y_n = O(x_n)$ there exists $c' > 0$ such that, eventually, $|y_n| \leq c'|x_n|$, while if $x_n = O(y_n)$ there exists $c'' > 0$ such that, eventually, $|x_n| \leq c''|y_n|$. By setting $c = \max\{c', c''\}$, we get (8.67).

Jointly, the relations $y_n = O(x_n)$ and $x_n = O(y_n)$ thus generalize the relation $\asymp$ of comparability between sequences, which requires the convergence of the sequence of ratios $y_n/x_n$. When such convergence occurs, by the last result $y_n \asymp x_n$ is equivalent to having both $y_n = O(x_n)$ and $x_n = O(y_n)$. But, lacking such convergence, the relations $y_n = O(x_n)$ and $x_n = O(y_n)$ still amount to (8.67), which is a form of comparability between two sequences, though rougher than $y_n \asymp x_n$.

Definition 370 Given two sequences $\{x_n\}$ and $\{y_n\}$, if $y_n = O(x_n)$ and $x_n = O(y_n)$, we say that $\{y_n\}$ is of the same order as (or comparable with) $\{x_n\}$. In symbols, $y_n \asymp x_n$.

This definition of comparability is fully consistent with the previous Definition 351, whose scope it enlarges because the convergence of the ratios $y_n/x_n$ is no longer required. This is why we kept the notation $\asymp$.

Example 371 (i) The sequences with terms $x_n = 1$ and $y_n = (-1)^n$ are comparable: take $c = 1$ in (8.67). (ii) The sequences with terms $x_n = n(2 + \sin n)$ and $y_n = n$ are comparable: take $c = 3$ in (8.67). N
Consider the errors
$$e_n = x - x_n$$
By Proposition 306, the sequence $\{x_n\}$ converges to $x$ if and only if the errors vanish, i.e., $e_n \to 0$. To ask about the rate of convergence of $x_n$ to $x$ thus amounts to asking about the rate of convergence to $0$ of the errors. This observation motivates the following definition: $\{x_n\}$ converges to $x$ with rate of order $k > 0$ if
$$e_n = O\left(\frac{1}{n^k}\right)$$
For instance, the sequences
$$x_n = \pi + \frac{5}{n} + (-1)^n \frac{7}{n^2} \qquad \text{and} \qquad x'_n = \pi + \frac{7}{n^2} + \frac{20}{n^{10}}$$
converge to $\pi$ with rates of order $1$ and $2$, respectively. Indeed, $e_n = O(1/n)$ and $e'_n = O(1/n^2)$. N
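A small numerical sketch of these rates (ours, purely illustrative): multiplying the errors by $n$ and $n^2$, respectively, should produce bounded sequences.

```python
import math

# Errors for the two sequences above: n*e_n and n^2*e'_n stay bounded,
# exhibiting rates of order 1 and 2, respectively.
for n in [10, 100, 1000, 10000]:
    e1 = math.pi - (math.pi + 5 / n + (-1) ** n * 7 / n**2)
    e2 = math.pi - (math.pi + 7 / n**2 + 20 / n**10)
    print(f"n={n:6d}  n*e_n={n * e1:+.4f}  n^2*e'_n={n**2 * e2:+.4f}")
```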
The higher $k$ is, the faster the error vanishes, so the better the convergence. For instance, an $O(1/n^2)$ error is substantially better than an $O(1/n)$ error. So, a solution procedure that generates a sequence converging to the solution with rate of order $2$ is substantially better than an alternative solution procedure that converges to the solution with rate of order $1$.

Section 37.7 will best illustrate the contents of this section. We close by noting that, though the powers $1/n^k$ form a natural scale of infinitesimals (dual to the scales of infinities studied before), one could also use other scales, such as logarithmic and exponential ones.
8.16 Sequences in $\mathbb{R}^n$

We close the chapter by considering sequences $\{x^k\}$ of vectors in $\mathbb{R}^n$. For them we give a definition of limit that closely follows the one already given for sequences in $\mathbb{R}$. The fundamental difference is that each element of the sequence is now a vector $x^k = (x_1^k, x_2^k, \dots, x_n^k) \in \mathbb{R}^n$ and not a scalar.

In other words, $x^k = (x_1^k, x_2^k, \dots, x_n^k) \to L = (L_1, L_2, \dots, L_n)$ if the scalar sequence of distances $\left\|x^k - L\right\|$ converges to zero (cf. Proposition 306). Since
$$\left\|x^k - L\right\| = \sqrt{\sum_{i=1}^{n}\left(x_i^k - L_i\right)^2}$$
we have
$$\left\|x^k - L\right\| \to 0 \iff x_i^k \to L_i \quad \forall i = 1, 2, \dots, n \tag{8.68}$$
That is, $x^k \to L$ if and only if the scalar sequences $\{x_i^k\}$ of the $i$-th components converge to the components $L_i$ of the vector $L$. The convergence of a sequence of vectors, therefore, reduces to the convergence of the sequences of its single components. So, it is a componentwise notion of convergence that, as such, does not present any significant novelty relative to the scalar case.

Consider, for example, the sequence
$$\left(1 + \frac{1}{k}, \ \frac{1}{k^2}, \ \frac{2k + 3}{5k - 7}\right)$$
in $\mathbb{R}^3$. Since
$$1 + \frac{1}{k} \to 1, \qquad \frac{1}{k^2} \to 0 \qquad \text{and} \qquad \frac{2k + 3}{5k - 7} \to \frac{2}{5}$$
the sequence converges to the vector $(1, 0, 2/5)$. N
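A minimal numerical sketch of componentwise convergence (ours, illustrative): the Euclidean distance to the limit vector vanishes exactly when each component converges.

```python
import math

# Distance from x^k = (1 + 1/k, 1/k^2, (2k+3)/(5k-7)) to L = (1, 0, 2/5).
L = (1.0, 0.0, 2 / 5)
for k in [10, 100, 1000, 100000]:
    x = (1 + 1 / k, 1 / k**2, (2 * k + 3) / (5 * k - 7))
    dist = math.sqrt(sum((xi - Li) ** 2 for xi, Li in zip(x, L)))
    print(f"k={k:6d}  ||x^k - L|| = {dist:.2e}")
```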
Series (sdoganato)

To make rigorous this new operation of "addition of infinitely many summands", which is different from the ordinary addition (as we will realize), we will sum a finite number of terms, say $n$, then let $n$ tend to infinity and take the resulting limit, if it exists, as the value to assign to the series. We are, therefore, constructing a new sequence $\{s_n\}$ defined by
$$s_1 = x_1 \tag{9.2}$$
$$s_2 = x_1 + x_2$$
$$s_3 = x_1 + x_2 + x_3$$
$$\vdots$$
$$s_n = x_1 + \cdots + x_n$$
and taking the limit of $\{s_n\}$ as the sum of the series. Formally:
Definition 376 The series with terms given by a sequence $\{x_n\}$ of scalars, in symbols $\sum_{n=1}^{\infty} x_n$, is the sequence $\{s_n\}$ defined in (9.2). The terms $s_n$ of this sequence are called the partial sums of the series.

The series $\sum_{n=1}^{\infty} x_n$ is therefore defined as the sequence $\{s_n\}$ of the partial sums (9.2). Its limit behavior determines its value;^{1} in particular, a series $\sum_{n=1}^{\infty} x_n$ is:

(i) convergent, with sum $S$, in symbols $\sum_{n=1}^{\infty} x_n = S$, if $\lim s_n = S \in \mathbb{R}$;

(ii) positively divergent, in symbols $\sum_{n=1}^{\infty} x_n = +\infty$, if $\lim s_n = +\infty$;

(iii) negatively divergent, in symbols $\sum_{n=1}^{\infty} x_n = -\infty$, if $\lim s_n = -\infty$.

We thus attribute to the series the same character (convergence, divergence, or irregularity) as that of its sequence of partial sums.^{2}
This formulation can be operationally useful to construct partial sums through a guess-and-verify procedure: we first posit a candidate expression for the partial sum, which we then verify by induction. At the end of Example 379 we will illustrate this procedure. However, as little birds suggesting guesses are often not around, the main interest of this recursive formulation is, ultimately, theoretical, in that it further clarifies that a series is nothing but a new sequence constructed from an existing one. Indeed, given a sequence $\{x_n\}$, the recursion (9.3) defines the sequence of partial sums $\{s_n\}$. It is this recursion that, thus, underlies the notion of series.^{3}

O.R. Sometimes it is useful to start the series with the index $n = 0$ rather than with $n = 1$. When the option exists (we will see that this is not the case for some types of series, like the harmonic series, which cannot be defined for $n = 0$), the choice to start a series from either $n = 0$ or $n = 1$ (or from another value of $n$) is a pure matter of convenience (as it was for sequences). Actually, one can start the series from any $k$ in $\mathbb{N}$. The context itself typically suggests the best choice. In any case, this choice does not alter the character of the series and, therefore, it does not affect the problem of determining whether the series converges or not. H
1. We thus resorted to a limit, that is, to a notion of potential infinity. On the other hand, we cannot really sum infinitely many summands: all the paper in the world would not suffice, nor would our entire life (and, by the way, we would not know where to put the line that one traditionally writes under the summands before adding them).
2. Using the terminology already employed for sequences, a series is sometimes called regular when it is not irregular, that is, when one of the cases (i)-(iii) holds. The systematic study of series began with the works of Abel, Bolzano, and Cauchy in the first part of the nineteenth century.
3. As we will see later in the book, we can nicely express this recursion in the difference form (10.4).
O.R. The variables $x_n$ and $s_n$ in the recursion (9.3) can be interpreted as "flow" and "stock" variables, respectively. If $x_n$ is the $n$-th amount of water that we decide to pour into our bathtub and $s_{n-1}$ is the amount of water that we already poured there in the first $n - 1$ pourings, then $s_n$ is the resulting amount of water in the bathtub after $n$ pourings. If $x_n$ is the number of kilometers that we decide to travel on the $n$-th day of our trip and $s_{n-1}$ is how many kilometers we travelled in the first $n - 1$ days of the trip, then $s_n$ is the resulting number of kilometers travelled after $n$ days. The values of the stock variable thus keep track of the accumulation determined by the values of the flow variable.

Flows are typically controlled by some decision maker: an individual, a government, a company, and the like. For this reason, the flow variable $x_n$ is often called a control variable. The stock variable $s_n$ is, instead, called a state variable: it describes the "state" of the flows' accumulation. So, (9.3) can be viewed as a "controlled recursion", in which a sequence of controls $\{x_n\}$ determines, via the recursion (9.3), a sequence of states $\{s_n\}$. H
Since
1 1 1
=
n (n + 1) n n+1
one has that
1 1 1
sn = + + +
1 2 2 3 n (n + 1)
1 1 1 1 1 1 1 1
=1 + + + + =1 !1
2 2 3 3 4 n n+1 n+1
Therefore,
1
X 1
=1
n (n + 1)
n=1
Example 378 (Harmonic series) The harmonic series adds up unit fractions:
1
X
1 1 1 1
1+ + + + + =
2 3 n n
n=1
Example 379 (Geometric series) The geometric series with ratio q is de ned by:
1
X
2 3 n
1+q+q +q + +q + = qn
n=0
sn = |1 + 1 +
{z + 1} = n + 1 ! +1
n+1 times
sn qsn = 1 + q + q 2 + q 3 + + qn q 1 + q + q2 + q3 + + qn
= 1 + q + q2 + q3 + + qn q + q2 + q3 + + q n+1 = 1 q n+1
we have
(1 q) sn = 1 q n+1
and therefore, since q 6= 1,
1 q n+1
sn =
1 q
4
In Appendix E.2, we present another proof of the divergence of the harmonic series, due to Pietro Mengoli.
9.1. THE CONCEPT 265
It follows that
1
X 1 q n+1
q n = lim
n!1 1 q
n=0
The study of this limit is divided into several cases:
(iii) if q = 1, the partial sums of odd order are equal to zero, while those of even order
are equal to 1. The sequence formed by them is hence irregular;
(iv) if q < 1, the sequence q n+1 is irregular and, therefore, so is fsn g as well.
To conclude the study of the geometric series, observe that we can use the recursive
de nition of partial sums (9.3) to guess and verify (by induction) what are the partial sums
of the geometric series. The, highly inspired, guess is that
1 q n+1
sn =
1 q
We verify the guess by induction. For n = 0 it is trivially true. Assume it is true for n
(induction hypothesis). Then
as desired.5 N
Epicurus in a letter to Herodotus wrote \Once one says that there are in nite parts in
a body or parts of any degree of smallness, it is not possible to conceive how this should
be, and indeed how could the body any longer be limited in size?" The previous examples
show that, indeed, if these \parts", these particles, have a strictly positive, but di erent size
{ for example either 1=n (n + 1) or q n , with q 2 (0; 1) { then the series might converge, so
the size of the \body" can be de ned. Nevertheless, Epicurus was right in the sense that, if
we assume { as it seems he does too { that all the particles have same size, no matter how
small, then the series
"+"+"+ +"+
P1
positively diverges. That is, n=1 " = +1 for every " > 0. Indeed, for the partial sums
we have sn = n" ! +1. This simple series thus helps to clarify the import of Epicurus'
argument (properties of series have been often used, even within philosophy, to try to clarify
the nature of the potential in nite).
5 P
1 P
1
Sometimes it is convenient to start a geometric series at n = 1. In this case qn = qn 1 =
n=1 n=0
q= (1 q).
266 CHAPTER 9. SERIES (SDOGANATO)
and
1
X 1
X 1
X
(xn + yn ) = xn + yn
n=1 n=1 n=1
when we do not fall in a indeterminate form 0 1 or 1 1, respectively.
The next result is simple, yet important. If a series converges, then its terms necessarily
tend to 0: summands must eventually vanish to avoid having an exploding sum (memento
Epicurus).
9.3. SERIES WITH POSITIVE TERMS 267
P1
Theorem 380 If the series n=1 xn converges, then xn ! 0.
Proof Clearly, we have xn = sn sn 1 and, given that the series converges, sn ! S as well
as sn 1 ! S. Therefore, xn = sn sn 1 ! S S = 0.
Convergence to zero of the sequence fxn g is, therefore, a necessary condition for conver-
gence P
of its series. This condition is only necessary: even though 1=n ! 0, the harmonic
series 1n=1 1=n diverges.
Proposition 382 Each series with positive terms is either convergent or positively diver-
gent. In particular, it is convergent if and only if it is bounded above.7
Series with positive terms thus inherit the remarkable regularity properties of monotone
sequences. This gives them an important status among series. In particular, for them we
now recast the convergence criteria presented in Section 8.11 for sequences.
P P1
Proposition 383 (Comparison criterion) Let 1 n=1 xn and n=1 yn be two series with
positive terms, with xn yn eventually.
P P1
(i) If 1 n=1 xn diverges positively, then so does n=1 yn .
P1 P1
(ii) If n=1 yn converges, then so does n=1 xn .
P 0
Proof Let n0 1 be such that xn yn for all n n0 , and set = nn=1 (yn xn ). By
calling sn (resp., n ) the partial sums of the sequence fxn g (resp., fyn g), for n > n0 we have
Xn
n sn = + (yk xk )
k=n0 +1
That is, n sn + . Therefore, the result follows from Proposition 320 (which is the
sequential counterpart of this statement).
6
Nothing changes if the terms are positive only eventually. Indeed, we can always discard a nite number
of terms without altering the asymptotic behavior of the series. Hence, all the results on the asymptotic
behavior of series with positive terms hold, more generally, for series with terms that are eventually positive.
7
By de nition, a series is bounded above when the sequence of the partial sums is so, i.e., there exists
k > 0 such that sn k for every n 1.
268 CHAPTER 9. SERIES (SDOGANATO)
Note that (i) is the contrapositive of (ii), and vice versa: indeed, thanks to Proposition
382, for a series with positive terms the negation of convergence is positive divergence.8
Because of their usefulness, we stated both; but, it is the same property seen in two equivalent
ways.
the convergence of the geometric series with ratio 2=5 guarantees, via the comparison crite-
rion, the convergence of the series. N
1 1
> (i.e., n < n)
n n
Therefore, by the comparison criterion,
1
X 1
= +1
n
n=1
1 1
(n + 1)2 n (n + 1)
which is the genericPterm of the convergent Mengoli series.10 By the P comparison criterion,
the convergence of 1 n=1 1=n 2 is a consequence of the convergence of 1 2
n=1 1= (n + 1) .
If > 2, then
1 1
< 2
n n
for every n > 1 and therefore we still have convergence.
Finally, it is possible to see, but it is more delicate, that the generalized harmonic series
converges also if 2 (1; 2).
Summing up, the generalized harmonic series
1
X 1
n
n=1
For the generalized harmonic series, the case = 1 is thus the \last" case of divergence:
it is su cient to very slightly increase the exponent, from 1 to 1+" with " > 0, and the series
will converge. This suggests that the divergence is extremely slow, as the reader can check
by calculating some of the partial sums.11 This intuition is made precise by the following
beautiful result.
P
10
Indeed, 1 2
n=1 1=n =
2
=6 but here we do not have the tools to prove this remarkable result.
11
A \cadaverous in nity", in the words of a professor.
270 CHAPTER 9. SERIES (SDOGANATO)
Proof The proof of this result may be skipped on a rst reading since it relies on integration
notions that will be presented in Chapter 44. De ne : [0; 1) ! R by
1
(x) = 8x 2 [i 1; i)
i
with i 1. That is, (x) = 1 if x 2 [0; 1), (x) = 1=2 if x 2 [1; 2), and so on. It is easy to
see that
1 1
(x) 8x > 0 (9.10)
x+1 x
Therefore, the restriction of on every closed interval is a step function. By Proposition
1856, we then have
n
X n Z
X i Z n
1
= (x) dx = (x) dx 8k = 1; :::; n
i i 1 k 1
i=k i=k
Example 388 The last example can be generalized by showing that the series14
1
X 1
n=2
n log n
converges for > 1 and any > 0, as well as for = 1 and > 1. It diverges for <1
and any 2 R, as well as for = 1 and any 1. N
The comparison criterion has a nice and useful asymptotic version, based on the asymp-
totic comparison of the terms of the sequences.
P P1
Proposition 389 (Asymptotic comparison criterion) Let 1 n=1 xn and n=1 yn be two
series with strictly positive terms.15 If xn yn , then the two series have the same character.
Therefore, the character of a series is invariant with respect to the asymptotic equivalence
relation.
Proof Since xn yn , for every " > 0 there exists n" 1 such that
xn
1 " 1+" 8n n"
yn
For every n > n" , we have
n
X n"
X n
X n"
X n
X n
X
xk
xk = xk + yk xk + (1 + ") yk c + (1 + ") yk (9.11)
yk
k=1 k=1 k=n" +1 k=1 k=n" +1 k=n" +1
and
n
X n"
X Xn Xn
xk
xk = xk + yk c + (1 ") yk (9.12)
yk
k=1 k=1 k=n" +1 k=n" +1
Pn" P1 P
where c = n=1 xk . The character of the series n=1 yn is the same as that of 1 k=n" +1 yk
because the value assumedPby a nite number of initial terms is irrelevant P1for the character
of a series. PTherefore, if 1 n=1 yn converges, by (9.11) it follows thatP n=1 xn converges,
whereas if 1 n=1 yn diverges to +1, from (9.12) it follows that also
1
n=1 xn diverges to
+1. Since the roles of the two series can be interchanged, the proof is complete.
14
The series starts with n = 2 because for n = 1 the term is not de ned.
15
The hypothesis that the terms are strictly positive, so non-zero, is necessary to make the ratio xn =yn well
de ned. This hypothesis will be used several times throughout the chapter.
272 CHAPTER 9. SERIES (SDOGANATO)
We can use the asymptotic comparison criterion to establish a celebrated result, proved
in 1737 by Leonhard Euler, that says that the sum of the reciprocals of the prime numbers
is in nite.
Euler's Theorem, along with the comparison criterion, con rms the divergence to +1 of
the harmonic series. Indeed
1 1
pn n
for every n 1.16 Euler's Theorem is, however, a truly remarkable result with respect to
the divergence of the harmonic series in that it involves only the reciprocals of the prime
numbers, whereas the harmonic series considers the reciprocals of all natural numbers (be
they prime or not).
Euler's Theorem con rms also that there are in nitely many prime numbers P
and that they
tend to +1 more slowly than the powers n , with > 1, for which we have 1 n=1 1=n <
+1.
We conclude our analysis of the comparison criterion with an important economic appli-
cation.
16
We have 1=p1 = 1=2 1, 1=p2 = 1=3 1=2, 1=p3 = 1=5 1=3, 1=p4 = 1=7 1=4, and so on.
9.3. SERIES WITH POSITIVE TERMS 273
0 ut (x) M 8x 2 R+ (9.14)
converges. In view of Example 379, we conclude that the series (9.13) converges if < 1.17
We can extend the analysis to streams and utility functions that are not necessarily
positive. This becomes relevant when streams represent, for example, cash ows that at
some point of time might well feature losses, so negative amounts of money. For utility
functions ut : A R ! R, where A is any set in the real line (possibly the real line itself),
the uniform boundedness condition (9.14) takes the form jut (x)j M for all x 2 A, i.e.,
M ut (x) M 8x 2 A (9.15)
P1 t 1
The comparison criterion continues to ensure the convergence of the geometric series t=1
if < 1. N
(i) If
xn+1
lim <1
xn
the series converges.
(ii) If
xn+1
lim >1
xn
the series diverges positively.
17
The asymptotic behavior as ! 1, that is, as patience becomes in nite, will be addressed by the
Hardy-Littlewood's Theorem in Section 11.3.2.
274 CHAPTER 9. SERIES (SDOGANATO)
The criterion is thus based on the study of the limit of the ratio
xn+1
xn
of the terms of the series. The condition that the limit lim xn+1 =xn exists is rather demand-
ing, as we will see in the next section. But, when it is satis ed, the elementary limit form of
the ratio criterion is the easiest to apply.
(ii) If, instead, the ratio is eventually 1, then the series diverges positively.
The theorem requires the ratios xn+1 =xn to be (uniformly) smaller than a number q < 1
(so, the terms form a strictly decreasing sequence), not just that they are all < 1. Indeed,
the ratios of the harmonic series
1
n+1 n
1 = n+1
n
are all < 1, but the series diverges (since the ratios tend to 1, there is no room to insert a
number q < 1 greater than all them).
Since the convergence of a series implies that the sequence of its terms is in nitesimal
(Theorem 380), the ratio criterion for series can be seen as an extension of the homonymous
criterion for sequences. The same is true for the root criterion that we will see in the next
chapter.
Proof (i) Without loss of generality, assume that xn > 0 and (9.16) holds for every n. From
xn+1 qxn we deduce, as in the analogous criterion for sequences, that 0 < xn q n 1 x1 ,
and the rst statement follows from the comparison criterion (Proposition 383) and from
the convergence of the geometric series. (ii) If we have eventually xn+1 =xn 1 and xn > 0,
then eventually xn+1 xn > 0. In other words, the sequence fxn g is eventually increasing
and therefore it cannot tend to 0, yielding that the series must diverge positively.
It is possible to prove (see Section 10.4) that, if the lim xn+1 =xn exists, then the ratio
criterion assumes exactly the tripartite form that follows from Proposition 393. That is:
(iii) lim xn+1 =xn = 1, then the criterion fails and gives no indication about the character
of the series.
276 CHAPTER 9. SERIES (SDOGANATO)
Operationally, this tripartite form is the standard form in which the ratio criterion is
applied. At a mechanical level, it might be su cient to recall this tripartition and the
illustrative examples given in the prelude. But, not to do plumbing rather than mathematics,
it is important to keep in mind the theoretical foundations provided by Proposition 395 (the
last simple example, in which the tripartite form is useless, shows that it can be even useful).
Let us see other tripartite examples.
P1 n =n
Example 397 (i) By the ratio criterion, the series n=1 q converges for every 2R
and every 0 < q < 1. Indeed,
n q n+1 n
= q!q<1
(n + 1) q n n+1
xn+1 n! x
n
= !0 8x > 0
(n + 1)! x n+1
xn+1 n n
n
= x!x
n+1 x n+1
which obviously is < 1 when 0 < x < 1. If x > 1, the ratio criterion implies that the
series diverges positively. Finally, if x = 1 we are back to the harmonic series, which
diverges positively. N
We stop here our study of convergence criteria. Much more can be said: in Section 10.4
we will continue to investigate this topic in some more depth.
Proof In Example 385 we showed that the series converges. Let us compute its sum. By
Newton's binomial formula (B.7) for each n 1, we have
n n
X n
X
1 n 1 1 n! 1
1+ = =
n k nk k! (n k)! nk
k=0 k=0
Therefore,
n! 1
1
(n k)! nk
which implies
n n
X n
X
1 1 n! 1 1
1+ =
n k! (n k)! nk k!
k=0 k=0
By passing to the limit, it follows that
1
X 1
e (9.18)
n!
n=0
The equality (9.20) holds for every number x and reduces to (9.17) in the special case
x = 1. Note the remarkable series expansion of the exponential function
X1
xn x2 x3 xn
ex = =1+x+ + + + + (9.21)
n! 2 3! n!
n=0
Proof We are going to generalize some of the arguments in the proof of Theorem 398. As
in that proof, we start by applying Newton's binomial formula:
n n
x n X n xk X xk n! 1
1+ = k
=
n k n k! (n k)! nk
k=0 k=0
As before, note that
n! k
= n (n 1) (n k + 1) n
| {z n} = n
(n k)! | {z }
k times k times
and
n! 1
1
(n k)! nk
Fix m 1. For every n > m, we have
m
X n
X
x n xk n! 1 xk n! 1
1+ =
n k! (n k)! nk k! (n k)! nk
k=0 k=m+1
Xn n
X
jxjk n! 1 jxjk
k! (n k)! nk k!
k=m+1 k=m+1
1
X 1
X 1
X 1
X
xk ; xk ; xk ; :::; xk ; :::
k=0 k=1 k=2 k=m
converges to 0 as m ! +1. Intuitively, if from a nite sum we rst remove the rst
summand, then the rst two summands, then the rst three summands, and so on and so
forth, then what is left is going to vanish. The reader may want to make this argument
rigorous. O
Later in the book we will see that (9.20) is a power series (Chapter 11). For this reason,
the equality (9.21) is called the power series expansion of the exponential function. It is a
result, as elegant as important, that allows to \decompose" the exponential function in a
sum of (in nitely many) simple functions such as the powers xn .
We will study in greater generality series expansions with the tools of di erential calculus,
of which series expansions are one of the most remarkable applications.
Proof We have:
n
X 1
X
1 1
0 < e =
k! k!
k=0 k=n+1
1 1 1 1
= + + + +
n! n + 1 (n + 1) (n + 2) (n + 1) (n + 2) (n + k)
! 1
1 1 1 1 1 X 1 1 1
< + + + + = =
n! n + 1 (n + 1)2 (n + 1)k n! (n + 1)k n! n
k=1
where the last equality holds because the geometric series that starts at k = 1 with ratio
1= (n + 1) has sum 1=n. By Theorem 398, we then have the following interesting bounds:
n
X 1 1 1
0<e <
k! n! n
k=0
280 CHAPTER 9. SERIES (SDOGANATO)
Suppose, by contradiction, that e is rational, i.e., e = p=q for some natural numbers p and
q. By multiplying both sides of the last inequality by n!, we then have
n
X
p 1 1
0 < n! n! < (9.22)
q k! n
k=0
jxn+1 j xn+1 n! x
= n
= !0 8x 2 R
jxn j (n + 1)! x n+1
it follows that it converges absolutely. So, the series in Theorem 399 is, indeed, con-
vergent.
9.4. SERIES WITH TERMS OF ANY SIGN 281
P1 n
(iii) The series n=1 x =n converges for every 1 < x < 1. Indeed,
jxn+1 j xn+1 n n
= = jxj ! jxj
jxn j n + 1 xn n+1
which obviously is < 1 when 1 < x < 1. Thus, also this series converges absolutely.N
x2n+2 (2n)! x2
= !0 8x 2 R
(2n + 2)! x2n (2n + 2) (2n + 1)
The next example, due to Dirichlet (1829) p. 158, con rms the elusive nature of series
with terms of any sign. Indeed, it shows that the asymptotic comparison criterion fails when
we consider series with terms of any sign: two such series may have terms of the same order
{ so, arbitrarily close as n gets larger { and, yet, have a di erent character.
P1 P1
Example 405 Consider the series n=1 xn and n=1 yn with terms
N
282 CHAPTER 9. SERIES (SDOGANATO)
Theorem 402 is a consequence of the following simple lemma, which should also further
clarify its nature.
P P1
Lemma 406 Given a series 1 n=1 xn , suppose there is a convergent series n=1 yn with
positive terms such that, for every n 1,
(i) xn + yn 0,
the three series involved. Both lim szn and lim syn exist. Clearly, sxn = szn syn for every n 1.
By Proposition 333-(i), we then have lim sxn = lim szn lim syn , as desired.
P P1
The series 1 n=1 yn thus \lifts", via addition, the series of interest n=1 xnPand takes it
back to the familiar terrain of series with positive terms. The convergence of 1 n=1 xn can
then be established by studying two auxiliary series with positive terms, for which we have
at our disposal all the tools learned in the previous sections.
Theorem 402 follows from the lemma by considering yn = jxn j because jxn j + xn 0 and
xn jxn j for every nP 1. ThisPclari es the \lifting" P1nature of absolute convergence. In
1 1
particular, it implies n=1 xn = n=1 (xn + jxn j)
P n=1 jxn j, so that the sum of the series
1
n=1 x n can be expressed in terms of the sums of two series with positive terms.
Absolute convergence is only a su cient condition for convergence. Indeed, the alternat-
ing harmonic (or Mercator ) series
1
X ( 1)n+1 1 1 1 1 1
=1 + + + (9.23)
n 2 3 4 5 6
n=1
converges to log 2, as the next elegant result will show. However, it does not converge
absolutely:
X1 X1
( 1)n+1 1
= = +1
n n
n=1 n=1
are decreasing and increasing, respectively. So, they converge to two scalars Lodd and Leven ,
respectively. Since s2n+1 s2n = x2n+1 ! 0, we then have Lodd = Leven . If we call L this
common limit, we conclude that sn ! L, so the alternating harmonic series converges.
It remains to show that L = log 2 . It is enough to consider the even partial sums s2n
and show that lim s2n = log 2. We have
2n
X n 1 n n n
X1 1 n
( 1)k+1 X 1 X 1 X 1 X 1
s2n = = = + 2
k 2k + 1 2k 2k 2k + 1 2k
k=1 k=0 k=1 k=1 k=0 k=1
X2n n
X
1 1
=
k k
k=1 k=1
By (9.9),
n
X 2n
X
1 1
= + log n + o (1) and = + log 2n + o (1)
k k
k=1 k=1
It is easy to check that the argument just used to show the convergence of the alternating
P
1
series (9.23) proves, more generally, that any alternating series ( 1)n+1 xn , with xn 0
n=1
for every n 1, converges provided the sequence fxn g is decreasing and in nitesimal, i.e.,
xn # 0.
19
We refer interested readers to Chapter 3 of Rudin (1976) for a more detailed analysis, which includes the
proofs of the results of this section.
20
Recall that a permutation is a bijective function (see Appendix B).
284 CHAPTER 9. SERIES (SDOGANATO)
for any permutation : N ! N? In other words, are series stable under permutations of
their elements?
This stability seems inherent to any proper notion of \addition", which should not be
a ected by mere rearrangements of the summands. Indeed, the answer is obviously positive
for nite sums because of the classic associative and commutative properties of addition.
The next result shows that the answer continues to be positive for series that are absolutely
convergent.
P P1
Proposition 408 Let 1 n=1 xn be a series that converges absolutely. Then, n=1 xn and
all its rearrangements have the same sum.
Absolutely convergent series thus exhibit the same nice behavior that characterizes nite
sums. Unfortunately, this is no longer the case if we drop absolute convergence. For instance,
consider the alternating harmonic series
1 1 1 ( 1)n+1
1 + + + +
2 3 4 n
We learned that it converges, with sum log 2, but that it is not absolutely convergent.
Through a suitable permutation, we con construct the rearrangement
1 1 1 1 1
1+ + + +
3 2 5 7 4
p
which is still convergent, but with sum log 2 2. So, rearrangements have, in general, di erent
sums. The next classic result of Riemann shows that everything goes, so the answer to the
previous question turns out to be dramatically negative.
P
Theorem 409 (Riemann) Let 1
P n=1 xn be a series that converges
P1but not absolutely (i.e.,
1
n=1 jxn j = +1). Given any L 2 R, there is a rearrangement of n=1 xn that has sum L.
Summing up, series that are absolutely convergent behave as the standard addition. But,
as soon as we drop absolute convergence, everything goes.
Chapter 10
Discrete calculus deals with problems analogous to those of di erential calculus, with the
di erence that sequences, that is, functions f : N f0g ! R with discrete domain, are
considered instead of functions on the real line. Despite a more rough domain, some highly
non-trivial results hold that make discrete calculus useful in applications.1 In particular, in
this chapter we will show its use in the study of series and sequences, allowing for a deeper
analysis of some issues which we have already discussed.
so fyn g is decreasing and fzn g is increasing. Being monotone, both fyn g and fzn g converge
(Theorem 323). If we denote their limits as y and z, that is, yn ! y and zn ! z, we can
write
lim sup xk = y and lim inf xk = z
n!1 k n n!1 k n
1
Some parts of this chapter require a basic knowledge of di erential calculus (so it can be read seamlessly
after reading Chapter 26).
285
286 CHAPTER 10. DISCRETE CALCULUS (SDOGANATO)
The limits y and z are, respectively, called limit superior and limit inferior of fxn g, and are
denoted by lim sup xn and lim inf xn .
Example 411 In view of the last example, for the alternating sequence xn = ( 1)n we have
This example shows two key properties of the limits inferior and superior: they always
exist, even if the original sequence has no limit and their equality is a necessary and su cient
condition for the convergence of the sequence fxn g.2 Formally:
Proof Thanks to (10.1), Proposition 320 implies (10.2) and Theorem 338 yields the \if". As
for the \only if", we leave the easy proof to the reader (just use the de nition of convergence).
lim inf xn = lim sup xn and lim sup xn = lim inf xn (10.3)
They are duality properties that relate the limit superior and limit inferior of a sequence fxn g
with those of the opposite sequence f xn g. For instance, this simple duality allows to easily
translate some properties of the limit superior into properties of the limit inferior, and vice
versa (this is exactly what will happen in the next proof). Another interesting consequence
of the duality is the possibility to rewrite the inequality (10.2) as lim inf xn lim inf xn .
The next result lists some basic properties of the limits superior and inferior. Thanks to
the previous result, they imply the analogous properties that we established for convergent
sequences.3
Lemma 413 Let fxn g and fyn g be two bounded sequences. We have:
(iii) lim inf xn lim inf yn and lim sup xn lim sup yn if eventually xn yn .
2
Since it is bounded, fxn g converges or oscillates, but does not diverge.
3
Speci cally, (i) and (ii) extend Proposition 333-(i), while (iii) extends Proposition 320.
10.1. PREAMBLE: LIMIT POINTS 287
Proof We start by observing that fxn + yn g is bounded. (i) For every n we have inf k n (xk + yk )
inf k n xk + inf k n yk . Since the sequences finf k n (xk + yk )g, finf k n xk g and finf k n yk g
converge, (i) follows from Propositions 333-(i) and 320. (ii) follows from (i) and the duality
formulas contained in (10.3):
Point (iii) readily follows from the de nitions of lim inf and lim sup, and from Proposition
320.
If the sequence converges, there exists a unique limit point: the limit of the sequence.
If the sequence does not converge, the limit points are the scalars that are approached by
in nitely many elements of the sequence. Indeed, it can be easily shown that L is a limit
point for a sequence if and only if there exists a subsequence that converges to L.
Example 415 (i) The interval [ 1; 1] is the set of limit points of the sequence xn = sin n,
whereas f 1; 1g are the limit points of the alternating sequence xn = ( 1)n . (ii) The
singleton f0g is the unique limit point of the convergent sequence xn = 1=n. N
The next result shows that the limit points belong to the interval determined by the
limits superior and inferior.
Proposition 416 Let fxn g be a bounded sequence. If x 2 R is a limit point for the sequence,
then x 2 [lim inf xn ; lim sup xn ].
Proof Consider a limit point x. By contradiction, assume that lim inf xn > x. De ne
" = lim inf xn x > 0 and zn = inf k n xk for every n. On the one hand, in light of the
previous part of the chapter, we know that zn+1 zn for every n and zn ! lim inf xn . This
implies that there exists n" 2 N such that
" "
lim inf xn < zn < lim inf xn +
2 2
for every n n" . On the other hand, since x is a limit point, there exists xn such that
"
x 2 < xn < x + 2" where n can be chosen to be strictly greater than n" (recall that
each neighborhood of x must contain an in nite number of elements of the sequence). By
construction, we have that zn = inf k n xk xn . This yields that
" "
lim inf xn < zn xn < x +
2 2
288 CHAPTER 10. DISCRETE CALCULUS (SDOGANATO)
thus lim inf xn < x+". We reached a contradiction since by de nition " = lim inf xn x which
we just proved being strictly smaller than ". An analogous argument yields that lim sup xn
x (why?).
Intuitively, the larger the set of limit points, the more the sequence is divergent; in par-
ticular, this set reduces to a singleton when the sequence converges. In light of the last result,
the di erence between superior and inferior limits, that is, the length of [lim inf xn ; lim sup xn ],
is a (not that precise) indicator of the divergence of a sequence.
Thanks to the inequality lim inf xn lim inf xn , the interval [lim inf xn ; lim sup xn ]
can be rewritten as [lim inf xn ; lim inf xn ]. For instance, if xn = sin n or xn = cos n, we
have that [lim inf xn ; lim inf xn ] = [ 1; 1].
N.B. Up to this point, we have considered only bounded sequences. Versions of the previ-
ous results, however, can be provided for generic sequences. Clearly, we need to allow the
limits superior and inferior to assume in nity as a value. For instance, if we consider the
sequence xn = n, which diverges to +1, we have lim inf xn = lim sup xn = +1; for the
sequence xn = en , which diverges to 1, we have lim sup xn = lim inf xn = 1, whereas
for the sequence xn = ( 1)n n we have lim inf xn = 1 and lim sup xn = +1, so that
[lim inf xn ; lim sup xn ] = R. We leave to the reader the extension of the previous results to
generic sequences. O
As a rst use of this notion, note that it permits to cast the recursive de nition (9.3) of
a series in the succinct di erence form
(
s1 = x1
(10.4)
sn = xn for n 2
The next result lists the algebraic properties of the di erences, that is, their behavior
with respect to the fundamental operations.5
4
See Section 26.14.
5
It is the discrete counterpart of the results in Section 26.8.
10.2. DISCRETE CALCULUS 289
Proposition 418 Let fxn g and fyn g be any two sequences. For every n, we have:
On the one hand, (i) shows that the di erence preserves addition and subtraction,
on the other hand, (ii) and (iii) show that more complex rules hold for multiplication and
division. Properties (ii) and (iii) are called product rule and quotient rule, respectively.
Therefore, the monotonicity of the original sequence is revealed by the sign of the di er-
ences.
Example 420 (i) If xn = c for all n 1, then xn = 0 for all n 1. In words, constant
sequences (which are both increasing and decreasing) have zero di erences. (ii) If xn = an ,
with a > 0, we have
xn = an+1 an = (a 1) an = (a 1) xn
The sequence xn = 2n thus equals the sequence of its own nite di erences, so it is the
discrete counterpart of the exponential function in di erential calculus.
Proof \If". From the last example, if a = 2 then for the increasing sequence f2n g we have
xn = xn for every n and x1 = 2. \Only if". Suppose that xn = xn for all n 1, that is,
xn+1 xn = xn . A simple induction argument shows that xn = 2n 1 x1 . Since x1 = 2, we
obtain xn = 2n for every n.
k
X k
k
xn = k 1
xn = k 1
xn+1 k 1
xn = ( 1)k i
xn+i (10.5)
i
i=0
This formula can be proved by induction on k (a common technique for this chapter). Here,
we only outline the induction step. Assume that (10.5) holds for k. We show it holds for
k + 1. Fix n. First, observe that (why?)
k+1 k k
= + 8i = 1; :::; k (10.6)
i i 1 i
k
X k
X
k k
k+1
xn = k
xn+1 k
xn = ( 1)k i
xn+1+i ( 1)k i
xn+i
i i
i=0 i=0
k 1
X k
X
k k
= ( 1)k i
xn+1+i + xn+k+1 k
( 1) xn ( 1)k i
xn+i
i i
i=0 i=1
k
X k
X
k+1 k k+1 i k
= ( 1) xn + ( 1) xn+i + ( 1)k+1 i
xn+i + xn+k+1
i 1 i
i=1 i=1
X k+1 k
k+1 k+1
= ( 1)k+1 xn + ( 1)k+1 i
xn+i + xn+k+1
0 i k+1
i=1
k+1
X k+1
= ( 1)k+1 i
xn+i
i
i=0
n = (n + 1) n=1
10.2. DISCRETE CALCULUS 291
n2 = (n + 1)2 n2 = 2n + 1
2 2
n = 2 (n + 1) + 1 (2n + 1) = 2
Formula (10.5) permits the following beautiful generalization of the series expansion
(9.20) of the exponential function. From now on, we set 0 xn = xn for every n. Note that
if we set 00 = 1 too, then (10.5) holds for k = 0 as well.
Theorem 423 Let fyn g be any bounded sequence. Then, for each n 1,
1
X 1
X
xk k x xj
yn = e yn+j 8x 2 R (10.7)
k! j!
k=0 j=0
The series expansion (9.20) extends the one of (10.7). Indeed, let n = 1 so that (10.7)
becomes
X1 X1
xk k xj
y1 = e x y1+j (10.8)
k! j!
k=0 j=0
Proof Since fyn g is bounded, the two series in the formula converge. By (10.5), we have to
show that, for each n,
1
X k 1
xk X k X xj
( 1)k i yn+i = e x
yn+j 8x 2 R (10.9)
k! i j!
k=0 i=0 j=0
In reality, we are going to prove a much stronger fact. Fix an integer j 0. We show that
the coe cients of yn+j on the two sides of (10.9) are equal. Clearly, on the right-hand side
this coe cient is e x xj =j!. As to the left-hand side, note that yn+j appears as soon as k j
and this coe cient is
1
X xk k
( 1)k j
k! j
k=j
Set i = k j. Then,
1
X 1
X 1
X
xk k xi+j i+j xi+j (i + j)!
( 1)k j
= ( 1)i = ( 1)i
k! j (i + j)! j (i + j)! i!j!
k=j i=0 i=0
1 1
xj X ( 1)i xj X ( x)i xj
= xi = = e x
j! i! j! i! j!
i=0 i=0
where the last equality follows from Theorem 399, thus proving (10.10) and the statement.
Lemma 425 Let fxn g be a sequence. For every k and for every n, we have k+1 x =
n
kx = k x .
n n
We leave the proof of this lemma to the reader and move to the proof of the last propo-
sition.
We proceed by induction. For k = 1, note that s can only be either 0 or 1 and the result
holds in view of the last example. Assume now that k+1 ns = 0 for all s 2 f0; 1; :::; kg
(induction hypothesis on k), we need to show that k+2 ns = 0 for all s 2 f0; 1; :::; k + 1g.
Let s belong to f1; :::; k + 1g: either s < k + 1 or s = k + 1. In the rst case, by the induction
hypothesis, we have that k+2 ns = k+1 ns = 0. In the second case, by using Newton's
binomial, we have
k+1 k k+1 k
nk+1 = (n + 1)k+1 nk+1 = nk+1 + n + n 1
+ +1 nk+1
1 2
k+1 k
= (k + 1) nk + n 1
+ +1
2
10.2. DISCRETE CALCULUS 293
Proof Before starting, note that for every sequence fxn g and for n 1 and m 1 equality
(10.14) can be rewritten as
m
X m
X
m! j m j
xn+m = xn = xn
j! (m j)! j
j=0 j=0
Assume now the statement is true for m. We need to show it holds for m + 1. Note that
m
X m
X
m j m j
xn+m+1 = xn+m + xn+m = ( xn ) + xn
j j
j=0 j=0
m
X Xm
m j+1 m j
= xn + xn
j j
j=0 j=0
m
X1 m
X
m+1 m j+1 m j 0
= xn + xn + xn + xn
j j
j=0 j=1
Xm Xm
m+1 m j m j 0
= xn + xn + xn + xn
j 1 j
j=1 j=1
m
X m+1
X
m+1 m+1 j 0 m+1 j
= xn + xn + xn = xn
j j
j=1 j=0
where the second to last equality follows from (10.6), proving the statement.
Therefore, even if the ratio xn =yn does converge, the behavior of the ratio xn = yn
of the di erences may not. On the other hand, the next result shows that the asymptotic
behavior of the ratio xn = yn determines the one of xn =yn .
Theorem 430 (Cesaro-Stolz) Let fyn g be a strictly increasing sequence that diverges to
in nity, that is, yn " +1, and let fxn g be any sequence. Then,
xn xn xn xn
lim inf lim inf lim sup lim sup (10.15)
yn yn yn yn
In particular, this inequality implies that, if the ( nite or in nite) limit of the ratio
xn = yn exists,7 we have
xn xn xn xn
lim inf = lim inf = lim sup = lim sup (10.16)
yn yn yn yn
that is, xn =yn converges to the same limit. Therefore, as stated above, the \regularity" of
the asymptotic behavior of the ratio xn = yn implies the \regularity" of the original ratio
xn =yn . At the same time, if the ratio xn =yn presents an \irregular" asymptotic behavior, so
will the di erence ratio.
At a conceptual level, in the next section we will see how Cesaro-Stolz's Theorem allows
for a better understanding of convergence criteria for series (see Section 10.4). To this end,
the following remarkable consequence of Cesaro-Stolz's Theorem will be crucial.
Corollary 432 Let fxn g be a sequence such that, eventually, xn > 0. Then,
xn+1 p p xn+1
lim inf lim inf n
xn lim sup n
xn lim sup (10.18)
xn xn
Proof Without loss of generality, let fxn g be a strictly positive sequence. We have
xn+1 p 1
log = log xn+1 log xn and log n
xn = log xn
xn n
Consider log xn and yn = n , (10.15) takes the form
log xxn+1
n
p p log xxn+1
n
lim inf lim inf log n
xn lim sup log n
xn lim sup
1 1
from which (10.18) follows since, for every strictly positive sequence fzn g, we have
elim inf zn = lim inf ezn and elim sup zn = lim sup ezn
Proof De ne the partial sums fsxn g and fsyn g by sxn = x1 + + xn and syn = y1 + + yn .
Then,
x1 + + xn sx xn sxn
= yn and = 8n 2
y1 + + yn sn yn syn
By the Cesaro-Stolz Theorem, inequality (10.19) then holds.
P1
P1This corollary con rms the asymptotic comparison criterion for two series n=1 xn and
n=1 yn with positive terms that diverge. Indeed, by the inequality (10.19) we have
x1 + x2 + + xn
!L
n
x1 + x2 + + xn
lim zn = lim =L
n
as desired.
The sequence Pn
i=1 xi
n
of arithmetic means converges always to the same limit of the sequence fxn g, whereas the
converse does not hold: the sequence of means may converge while the original one does not.
Example 435 The alternating sequence xn = ( 1)n does not converge, whereas
Pn
k=1 xk
!0
n
Indeed (
x1 + x2 + + xn 0 if n is even
=
n 1
if n is odd
n
N
Therefore, the sequence of means is more \stable" than the original one: by averaging, we
smooth out the behavior of a sequence. This motivates the following, more general, de nition
of limit of a sequence, named after Ernesto Cesaro. It is fundamental in probability theory
(and in its applications).
10.3. CONVERGENCE IN MEAN 299
De nition 436 We say that a sequence fxn g converges in the sense of Cesaro (or in mean)
C
to L, written xn ! L, when
x1 + x2 + + xn
!L
n
From the last result, it follows that ordinary convergence to a limit implies Cesaro con-
vergence to the same limit. The converse does not hold: we may have Cesaro convergence
without ordinary convergence.
Example 437 The alternating sequence xn = ( 1)n from the last example does not con-
C
verge but it converges in the sense of Cesaro, i.e., ( 1)n ! 0. N
It is useful to nd conditions under which the converse holds, so the convergence of the
sequence of means implies that of the original sequence. These results are called Tauberian
theorems. We state one of them as an example. To this end, we say that a sequence of
scalars is one-sided bounded when bounded below or above.
Theorem 438 (Landau) Let fxn g be a sequence with one-sided bounded auxiliary sequence
fn xn g. Given L 2 R, we have
C
xn ! L () xn ! L
C
Proof By Theorem 434, we need only to prove the \(" part. So, suppose xn ! L. We
want to show that xn ! L. We prove this implication under the stronger condition that
the sequence n xn converges (so, it is bounded). We begin with a claim.
Claim It holds n xn ! 0.
where the penultimate equality follows from the Cesaro-Stolz Theorem. Thus, L0 + L = L
and so L0 = 0.
x1 + x2 2x2 x1 x2 x2 x1 x1
x2 = = =
2 2 2 2
8
We follow Cesaro (1894) p. 108.
300 CHAPTER 10. DISCRETE CALCULUS (SDOGANATO)
Induction step: suppose (10.21) holds for n. We want to show that it holds for n + 1.
C
as desired. We conclude that formula (10.21) holds. As xn ! L, by this formula we have
Pn
k=1 k xk
! 0 () xn ! L (10.22)
n+1
as desired.
s1 = 1 ; s2 = 0 ; s3 = 1 ; s4 = 0 ; s5 = 1 ;
Even if this is not his main scienti c contribution, the name of Guido Grandi is remem-
bered for his treatment of this series. It is curious to note that, until the mid-nineteenth
century, also the greatest mathematicians believed { like Grandi { that this series summed
to 1=2. Until then, mathematics had been developing untidily: highly complex theorems
were known, but attention to well posed de nitions and rigor, which we are now used to,
was lacking.
The monk Guido Grandi proposed the following explanation, which contains two mis-
takes. First of all, he identi ed
1 1+1 1+1 1+
as a geometric series with common ratio q = 1 (correct) and therefore having sum
1 1 1
= =
1 q 1 ( 1) 2
(wrong: the geometric series converges only when jqj < 1). In an unfortunate crescendo, by
pairing the addends (wrong: the associative property does not generally hold for series; cf.
Section 9.4.2), Grandi then derived the equality
(1 1) + (1 1) + =0+0+
302 CHAPTER 10. DISCRETE CALCULUS (SDOGANATO)
that is, Cesaro sums reduce to ordinary ones.10 In particular, this condition holds when
xn = O (1=n).
Lemma 440 Let fxn g be a sequence and k 2 R. There exists q < k such that, eventually,
xn q if and only if
lim sup xn < k (10.23)
Proof \Only if". Suppose that there exists q < k such that eventually xn q holds. There
exists n such that xn q for every n n. Therefore, for any such n we have supm n xm q,
which implies
lim sup xn = lim sup xm q < k
n!1 m n
\If". Suppose that (10.23) holds. Set L = limn!1 supm n xm . Since L < k, for every
" > 0 there exists n such that
sup xm L <" 8n n
m n
that is
L " < sup xm < L + " 8n n
m n
If we choose " su ciently small so that L + " < k, by setting q = L + " we obtain the desired
condition.
This lemma has the following consequence for sequences of ratios, our main object of
interest here.
10
This is the version of Landau's Theorem originally considered in 1910 by Edmund Landau.
11
For the sake of brevity, we will only consider series. Nonetheless, similar considerations hold for sequences
(Section 8.11). Example 431 is explanatory.
10.4. CONVERGENCE CRITERIA FOR SERIES 303
Lemma 441 Let fxn g be a sequence with, eventually, xn > 0. There exists q < 1 such that,
eventually, xn+1 =xn q if and only if
xn+1
lim sup <1 (10.24)
xn
Proof In view of the last lemma, it is enough to consider the sequence with term yn =
xn+1 =xn and to put k = 1.
The previous analysis leads to the following corollary, which is useful for computations,
in which the ratio criterion is expressed in terms of limits.
P1
Corollary 442 Let n=1 xn be a series with, eventually, xn > 0.
(i) If
xn+1
lim sup <1
xn
then the series converges.
(ii) If
xn+1
lim inf >1
xn
then the series diverges positively.
Note that, thanks to Lemma 441, point (i) is equivalent to point (i) of Proposition 395.
In contrast, point (ii) is weaker than point (ii) of Proposition 395 since condition (10.25) is
only su cient, but not necessary, to have that xn+1 =xn 1 eventually.
As shown by the following examples, this speci cation of the ratio criterion is particularly
useful when the limit
xn+1
lim
xn
exists, that is, whenever
xn+1 xn+1 xn+1
lim = lim sup = lim inf
xn xn xn
In this particular case, the ratio criterion takes the useful tripartite form of Proposition 393:
304 CHAPTER 10. DISCRETE CALCULUS (SDOGANATO)
(i) if
xn+1
lim <1
xn
the series converges;
(ii) if
xn+1
lim >1
xn
the limit of the series is 1;
(iii) if
xn+1
lim =1
xn
the criterion fails and it does not determine the behavior of the series.
As we have seen in Section 8.11, this form of the ratio criterion is the one which is usually
used in applications. Examples P394 and 397 have
P1shown 2cases (i) and (ii). The unfortunate
case (iii) is well exempli ed by 1n=1 1=n and n=1 1=n .
The next convergence criterion is, from a theoretical point of view, the most powerful
one.
P1
Proposition 443 (Root criterion) Let n=1 xn be a series with positive terms.
Let us see the limit form of this result. In view of Lemma 440, point (i) can be equivalently
stated as
p
lim sup n xn < 1
p
As to point (ii), it requires that n xn 1 for in nitely many values of n, that is, that there
p
is a subsequence fnk g such that nk xnk 1 for every k. Such a condition holds if
p
lim sup n
xn > 1 (10.27)
and only if
p
lim sup n
xn 1 (10.28)
The constant sequence xn = 1 exempli es how condition (10.28) can hold even if (10.27)
does not. The sequence xn = (1 1=n)n on the other hand, shows how even condition (ii)
10.4. CONVERGENCE CRITERIA FOR SERIES 305
from Proposition 443 may not hold although (10.28) holds. It is, therefore, clear that (10.27)
implies point (ii) of Proposition 443, which in turn implies (10.28), but that the opposite
implications do not hold.
All this brings us to the following limit form, in which point (i) is equivalent to that of
Proposition 443, while point (ii) is weaker than its counterpart since, as we have seen above,
p
condition (10.27) only is a su cient condition for n xn 1 to hold for in nitely many values
of n.
P
Corollary 444 (Root criterion in limit form) Let 1 n=1 xn be a series with positive terms.
p
(i) If lim sup n xn < 1, the series converges.
p
(ii) If lim sup n xn > 1, the series diverges positively.
p p
Proof If lim sup n xn < 1, we have that n xn q for some q < 1, eventually. The desider-
p p
atum follows from Proposition 443. If lim sup n xn > 1, then n xn 1 for in nitely many
values of n, and the result follows from Proposition 443.
As for the limit form of the ratio criterion, also that of the root criterion is particularly
p
useful when lim n xn exists. Under such circumstances the criterion takes the following
tripartite form:
(i) if
p
lim n
xn < 1
the series converges;
(ii) if
p
lim n
xn > 1
the series diverges positively;
(iii) if
p
lim n
xn = 1
the criterion fails and it does not determine the behavior of the series.
The tripartite form of the root criterion is, like that of the ratio criterion, most useful
computationally. Nonetheless, we hope that the reader will always keep in mind the theoret-
ical underpinning the criterion: \ye were not made to live like unto brutes, but for pursuit
of virtue and of knowledge", as Dante's Ulysses famously remarked.12
P p
(ii) Let 0 q < 1. The series 1 k n
n=1 n q converges for every k: indeed
n
nk q n = qnk=n ! q
because n k=n ! 1 (since log n k=n = (k=n) log n ! 0). N
p
The ratio and root criteria are based on the behavior of sequences fxn+1 =xn g and n xn ,
which are related via the important inequalities (10.18). In particular, if lim xn+1 =xn exists,
we have
xn+1 p
lim = lim n xn (10.29)
xn
and so the two criteria are equivalent in their limit form. However, if lim xn+1 =xn does not
exist, we still obtain from (10.18) that
xn+1 p
lim sup < 1 =) lim sup n xn < 1
xn
and
xn+1 p
lim inf > 1 =) lim sup n xn > 1
xn
This suggests that the root criterion is more powerful than the ratio criterion in determining
convergence: whenever the ratio criterion rules in favor of convergence or of divergence, we
would have reached the same conclusion by using the root criterion. The opposite does not
hold, as the next example shows: the ratio criterion fails while the root criterion determines
that the series in question converges.
that is:
1 1 1 1 1 1 1
+1+ + + + + + +
2 8 4 32 16 128 64
We have 8 1
>
> 2(n+1) 2
=2 if n odd
xn+1 < 1
2n
=
xn >
>
1
1
: 2n+1
1 = 8 if n even
2n 2
and ( 1
p 2 if n odd
n
xn = p
n
4
2 if n even
so that
xn+1 xn+1 1
lim sup =2 , lim inf =
xn xn 8
and
1 p
lim sup n
xn =
2
The ratio criterion thus fails, while the root criterion tells us that the series converges. N
13
See Rudin (1976) p. 67.
10.5. MULTIPLICATION OF SERIES 307
Even though the root criterion is more powerful, the ratio criterion can still be useful as
it is generally easier to compute the limit of ratios than that of roots. The root criterion
may be more powerful from a theoretical standpoint, but harder to use from a computational
perspective.
In light of this, when using the criteria for solving problems, one should rst check
whether lim xn+1 =xn exists and, if it does, compute it. In such a case, thanks to (10.29) we
p
can also know the value of lim n xn and thus we can use the more powerful root criterion.
In the unfortunate case in which lim xn+1 =xn does not exist, and we can at best compute
lim sup xn+1 =xn and lim inf xn+1 =xn , we can either use the less powerful ratio criterion (which
p
may fail, as we have seen in the previous example), or we may try to compute lim sup n xn
directly, hoping it exists (as in the previous example) so that the root criterion can be used
in its handier limit form.
Finally, note that, however powerful it may be, the root criterion { a fortiori, the weaker
ratio criterion { only gives a su cient condition for convergence, as the following example
shows.
This propertyP does not hold for the analog operation of multiplication: the series' term-
by-term product 1 n=1 xn yn does not converge, in general, to the limits' product S
x Sy,
as the next example shows.
p P P1
Example 448 TakeP xn = yn = ( P 1)n = n. The series 1 n=1 xn = n=1 yn converges (see
1 1
Example 405). Yet, n=1 xn yn = n=1 1=n = +1. N
In this section we present a notion of product of series that preserves limits. To this end,
in the next subsection we rst introduce an important operation on sequences.
14 P1
Throughout this section, Lx denotes the limit of a sequence x = fxn g and S x the sum of a series n=0 xn .
308 CHAPTER 10. DISCRETE CALCULUS (SDOGANATO)
Thus, the operation of convolution associates to two sequences x = fxn g and y = fyn g
the sequence
x y = fx1 y1 ; x1 y2 + x2 y1 ; x1 y3 + x2 y2 + x3 y1 ; :::g
Through sums we can write the n-th term of a convolution in two equivalent ways:
n
X X
(x y)n = xk yn+1 k = xk ym (10.30)
k=1 k+m=n+1
The convolution of two convergent sequences thus converges in the sense of Cesaro to the
product of their limits. That is,
x1 yn + x2 yn 1 + + xn y1
! Lx Ly
n
Theorem 434 is the special case when yn = 1 for all n.
Proof We have
x1 yn + x2 yn 1 + + xn y1
(x y)n = (10.31)
n Pn
(x1 Lx ) yn + (x2 Lx ) yn 1 + + (xn Lx ) y1 k=1 yi
= + Lx
n n
The convergent sequence fyn g is bounded (Proposition 322), so there exists a positive scalar
M such that jyn j M for all n. Thus,
(x1 Lx ) yn + (x2 Lx ) yn 1 + + (xn Lx ) y1
n
jx1 Lx j + jx2 x
L j+ + jxn Lx j
M
Pn n
k=1 jxi Lx j
= M !0
n
P
where the convergence to zero holds because,
P by Theorem 434, n 1 nk=1 xi ! Lx .
Again by Theorem 434, we have n 1 nk=1 yi ! Ly . From (10.31) it then follows that
(x y)n ! Lx Ly , as desired.
(x y)n ! Lx Ly
P1 P1
provided at least one of the series n=1 jyn Ly j and n=1 jxn Lx j converges.
P1
Proof Assume that n=1 jyn Ly j < 1. We begin with a claim.
P1
Claim If n=1 jyn j < 1 and xn ! 0, then (y x)n ! 0.
Proof of the Claim Let " > 0. Since xn ! 0, there exists nx" 1 such that jxn j
P1 x y Pnx"
"= n=1 jyn j for all n n" . Since yn ! 0, there exists n" 1 such that jyn j "= n=1 jxn j
for all n ny" . Hence, given any n > nx" + ny" , we have
"
jyn+1 kj Pnx" 8k = 1; :::; nx"
n=1 jxn j
By the Claim,
This apparently ad hoc notion of product, introduced by Cauchy in 1821, turns out to
be the sought-after one that preserves limits, as the following classic result, proved by Franz
Mertens in 1875, shows.
P P1
Theorem 453 (Mertens) Let 1 n=1 xn and n=1 yn be two convergent series. If at least
one of them converges absolutely, then their Cauchy product converges and
1
X
(x y)n = S x Sy
n=1
Under a simple hypothesis of absolute convergence, the product of the sums of two
convergent series is equal to the sum of their Cauchy product. We can thus regard the map
that associates to two convergent series their Cauchy product as a multiplication operation
of convergent series.
The proof is a simple consequence of Proposition 451, which indeed can be seen as a
version of Mertens' Theorem for convolutions, along with the following interesting lemma.
Lemma 454 It holds
sxn y
= sx1 yn + sx2 yn 1 + + sxn y1 (10.33)
and
sx1 y
+ + sxn y
= sx1 syn + sx2 syn 1 + + sxn sy1 (10.34)
Proof It holds
Xn
sxn y = (x y)k = x1 y1 + (x1 y2 + x2 y1 ) + (x1 y3 + x2 y2 + x3 y1 ) + + (x1 yn + x2 yn 1 +
k=1
= sx1 yn + sx2 yn 1 + + sxn y1
Thus,
sx1 y
+ sx2 y
+ sx3 y
+ + sxn y
Example 455 Consider the series expansion of the exponential function (9.20), i.e.,
1
X
x xn
e =
n!
n=0
1
X
The series xn =n! converges absolutely for each x 2 R (why?). Along with the Newton
n=0
binomial formula, Merten's Theorem implies:15
1 n
! 1 n
!
X X 1 1 X X 1 n k n
x y
e e = xk y n k
= x y k
k! (n k)! n! k
n=0 k=0 n=0 k=0
X1
1
= (x + y)n = ex+y
n!
n=0
Thus, a Cauchy product underlies the standard multiplication rule for exponential functions.
N
The next example shows the importance of absolute convergence in Merten's Theorem.
p
Example 456 As in Example 448, take xn = yn = ( 1)n = n. It holds
n
X n
X n
X
n+1 1 1 1
j(x y)n j = ( 1) p p = p p
k=1
k n+1 k k=1
k (n + 1 k) k=1
kn
n
X n
X
1 1
p = =1
n 2 n
k=1 k=1
P1 P1
P1 Cauchy product n=1 j(x y)n j thus diverges positively even though both series n=1 xn =
The
n=1 yn converge. N
In sum, the ordinary convergence of the Cauchy product does not hold under plain
convergence. Yet, a remarkable convergence result, due to Cesaro (1890), still holds.16
P P1
Theorem 457 (Cesaro) Let 1 n=1 xn and n=1 yn be two convergent series. Their Cauchy
product converges in the sense of Cesaro and
1
X C
(x y)n = S x Sy
n=1
15
Pn For series that start at n = 0, the convolution formula (10.30) is easily seen to become (x y)n =
k=0 xk yn k :
16
This result marked the beginning of the systematic study of divergent series, as summarized about sixty
years later in Hardy (1949).
312 CHAPTER 10. DISCRETE CALCULUS (SDOGANATO)
Thus, it is the Cesaro sum of the Cauchy product that equals the product of the sums of
two convergent series. A suitable Tauberian theorem may then ensure that this Cesaro sum
is actually an ordinary one.
Proof By (10.34),
sx1 y
+ + sxn y
= sx1 syn + sx2 syn 1 + + sxn sy1
C
sx1 syn + sx2 syn 1 + + sxn sy1 ! S x Sy
C
In turn, this implies that sx1 y
+ + sxn y
! Sx S y , as desired.
Cesaro's Theorem has, as an immediate corollary, an earlier result that Niels Abel proved
in 1826.
P1 P1
Corollary 458 (Abel) Let n=1 xn and n=1 yn be two convergent series. If their Cauchy
product converges, then
1
X
(x y)n = S x Sy
n=1
Proposition 459 Let fxn g be a bounded sequence and k 2 R. Consider the following prop-
erties:
(i) xn k eventually,
So, a lower bound for the limit inferior is, up to an arbitrarily small " > 0, also a \tail"
lower bound for the sequence, which holds eventually. To see that (ii) does not imply (i) it
is enough to consider the sequence xn = 1=n: we have lim xn = 0 though xn < 0 for all
n 1.
Proof (i) easily implies (ii). We prove that (ii) implies (iii). Let lim inf xn k. Fix " > 0.
By the de nition of limit inferior, there exists n" 1 such that for all n n" we have
zn = inf k n xk > k ". Since xn zn for all n 1, we conclude that xn > k " eventually.
(iii) implies (ii). Assume that, for each " > 0, there exists n" 1 such that xn k "
for all n n" . In turn, this implies zn = inf k n xk k " for all n 1, so that lim inf xn =
lim zn k ". Since this inequality holds for each " > 0, we conclude that lim inf xn k.
If we reverse the inequality in Proposition 459, from to <, we get the following inter-
esting result.
Proposition 460 Let fxn g be a bounded sequence and k 2 R. The following conditions are
equivalent:
Proof (i) implies (ii). Let lim inf xn = x < k. Let " > 0 such that x + " < k. By de nition
of limit inferior, there exists n" 1 such that zn = inf k n xk < x + " < k for all n n" .
Thus, for all n n" there exists some m n such that xm < k. We conclude that, for each
n 1, there exists some n n such that xn < k.
(ii) implies (i). Assume that, for each n 1, there exists n n such that xn < k. Then,
zn = inf k n xk < k and so, being fzn g a decreasing sequence, lim inf xn = lim zn < k.
In point (ii) of this proposition emerges a kind of tail behavior for which the adverb
\eventually" is too stringent. For this reason, next we introduce a more relaxed adverb,
\in nitely often".
De nition 461 We say that a sequence satis es a property P in nitely often if, starting
from each position n = nP , there exists at least a position n nP whose term of the sequence
satis es P.
Corollary 462 Let fxn g be a bounded sequence and k 2 R. The following conditions are
equivalent:
Proof (i) implies (ii). Let lim inf xn k. Let " > 0. We have lim inf xn < k + ", so
by Proposition 459 we have xn < k + " in nitely often, as desired (since " was arbitrarily
chosen). (ii) implies (i). Assume that xn k + " in nitely often, for each " > 0. That is,
for each n 1, there exists some n n such that xn k + ". Then, zn = inf k n xk k + "
for all n 1 and so lim inf xn k + ". Since this holds for each " > 0, we conclude that
lim inf xn k.
The next corollary nicely summarizes what we did so far in this section by characterizing
limit inferiors.
Corollary 463 Let fxn g be a bounded sequence and k 2 R. The following conditions are
equivalent:
(ii) xn k " eventually and xn k + " in nitely often, for each " > 0.
The results seen so far for limit inferiors and lower bounds have, of course, dual versions
for limit superiors and upper bounds that are based on the duality properties (10.3). Next
we state the dual version of Proposition 459 and leave to the reader the dual versions of the
other results of the section.
(i) xn k eventually,
A nice consequence of the equivalence of (ii) and (iii) in this result is that a bounded
sequence fxn g converges to some x 2 R if and only if
as the reader can check. In contrast, the condition lim inf jxn xj = 0 is irrelevant for the
convergence of fxn g: the sequence xn = ( 1)n does not converge even though lim inf jxn 1j =
0.
10.6. INFINITELY OFTEN: A SECOND KEY ADVERB 315
The two adverbs are easily applied to the comparison of sequences. For instance, we
write xn yn i.o. when xn yn 0 i.o., while we write xn yn eventually when xn yn 0
eventually. So, the last two tables is easily adapted to the comparison of sequences, with an
interesting twist in view of Lemma 413.
lim inf (xn yn ) < 0 lim sup (xn yn ) > 0
m m
xn < yn i.o. xn > yn i.o.
+ +
xn yn i.o. xn yn i.o.
+ +
lim inf (xn yn ) 0 lim sup (xn yn ) 0
m m
8" > 0; xn yn + " i.o. 8" > 0; xn + " yn i.o.
+ +
lim inf xn lim sup yn lim inf xn lim sup yn
We close with a nal remark. Say that a set A of natural numbers is dense if, for each
natural number n, there is some element a of A such that a n. The set of even numbers
316 CHAPTER 10. DISCRETE CALCULUS (SDOGANATO)
and the set of powers 2n are instances of dense sets. A moment's re ection shows that a
sequence satis es a property i.o. when it does for all terms xn whose indexes n belong to
a dense set of natural numbers. For example, the sequence xn = ( 1)n is > 0 i.o. (resp.,
< 0 i.o.) because all the terms with even (resp., odd) indexes are strictly positive (resp.,
negative).
In contrast, a sequence satis es a property eventually when it does for all terms xn whose
indexes n belong to a co nite set of natural numbers, that is, a set whose complement is
nite (a co nite set contains all natural numbers except at most nitely many of them).
Clearly, co nite sets are dense, but the converse is obviously false. This provides another
angle on the two adverbs and should further clarify why one is much stronger than the other.
of a sequence often behave better than the sequence x = fxn g itself. So, two sequences may
have asymptotic partial sums even though they are not asymptotic, that is, we may have
xn yn but sxn syn . Our new adverb makes it possible to characterize this case.
Proposition 465 Let x = fxn g and y = fyn g be two sequences such that
In words, if two sequences have (unbounded) asymptotic partial sums, their terms are,
in nitely often, arbitrarily close. Note that the hypothesis sxn ! +1 is needed: if x =
f2; 0; 0; 0; : : :g and y = f1; 1=2; 1=4; 1=8; :::g, then sxn syn but xn < yn eventually.
Proof Fix " > 0 and suppose, by contradiction, that eventually xn (1 + ") yn , i.e., there
exists n" 1 such that xn (1 + ") yn for all n n" . It follows that, for all integers n n" ,
n"
X n
X n"
X n
X n"
X
sxn = xk + xk xk + (1 + ") yk = (xk yk ) + (1 + ")syn
k=1 k=n" +1 k=1 k=n" +1 k=1
The next corollary is a consequence of this proposition that will be soon useful.
10.6. INFINITELY OFTEN: A SECOND KEY ADVERB 317
Proof We have
log n! = log 1 + log 2 + log 3 + + log n n log n (10.41)
Indeed, we have, for each n 1,
Z n n
X
log x dx log i n log n:
1 i=1
Since Z n
log x dx = x(log x 1)jn1 = n(log n 1) + 1 n log n;
1
n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
pn 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47
pn 1 2 2 4 2 4 2 4 6 2 6 4 2 4
Except the rst prime gap, all other prime gaps pn with n > 1 are even: all prime
numbers > 2 are odd and so their di erences are even numbers.17 So, pn 2 for all n > 1.
The smallest prime gap is 1 and is given only by the initial term
p1 = p 2 p1 = 3 2=1
Indeed, if we take any other pair of consecutive natural numbers, one of them has to be even
and so it cannot be prime.
17
Two odd numbers have the form 2n1 + 1 and 2n2 + 1, with n1 > n2 . So, their di erence is the even
number 2 (n1 n2 ).
318 CHAPTER 10. DISCRETE CALCULUS (SDOGANATO)
In contrast, there are many pairs of prime with a prima gap 2, the so-called twin primes.
For instance, among the rst fteen prime numbers we have six pairs of twin primes
f3; 5g ; f5; 7g ; f11; 13g ; f17; 19g ; f29; 31g ; f41; 43g
This simple example seems to indicate that the are plenty of twin primes, but also that
they seem to become less frequent as we go on along the sequence of primes. Indeed, it is
not know even if there exist in nitely many of them, that is, if
lim inf pn = 2
It is one of the (many) prime mysteries, the so-called twin prime conjecture.
Be that as it may, this is only the beginning: if the prime gap 2 is already elusive, let alone
larger prime gaps. Something non-trivial about prime gaps can be said, however, thanks to
the Prime Number Theorem.
Proof Suppose, by contradiction, that lim sup pn < +1. So, there exists K > 0 such that
pn K 8n 1 (10.43)
n! + 2; n! + 3; :::; n! + n
because
log (n + 1) log (n + 1) log n + log n log (n+1)
n
= = +1!1
log n log n log n
So,
pn pn+1 pn pn+1
= = 1!0
pn pn pn
We conclude that pn = o (pn ).
Again by Theorem 364 and Lemma 355-(iii), we have:
This suggests logarithmic lower and upper bounds for prime gaps. Indeed, in view of Corol-
lary 466, from (10.42) and log pn log n we infer, for every " > 0, the upper bound
Along the sequence of primes one can thus nd, in nitely often, two consecutive primes
whose gap is such that
pn < c log pn
i.e., for each n 1 there exists n n such that pn < c log pn . The sequence c log pn
provides, in nitely often, an upper bound for prime gaps. The constant c is < 1 and so it
improves the constant (1 + ") in the upper bound (10.44).
However interesting, Erdos' logarithmic upper bound (10.46) diverges because log n !
+1 as n ! +1. So, it does not say much about the persistence of small gaps as n gets
large, for instance whether one can hope to nd in nitely many twin primes { as the twin
prime conjecture claims.
A remarkable result in this direction has been recently proved.
Theorem 469 (Zhang) There is a constant b > 0 such that lim inf pn < b.
So, along the sequence of primes one can nd, in nitely often, two consecutive primes
whose gap is < b, i.e., for each n 1 there exists n n such that pn < b. This result
substantially improves Erdos' (and subsequent) results. Indeed, it implies inter alia that in
Erdos' result the constant c is actually equal to 0, that is, lim inf pn = log n = 0.
Though it is not know whether there exist in nitely many twin primes, this result permits
to say that there are in nitely many gaps bounded above by a constant b, so in nitely many
small gaps.19 To prove the twin prime conjecture, one would need to prove that b = 3, i.e.,
that
lim inf pn = 2
So far, the best bound is b = 247, i.e., lim inf pn 246. Still far from 2, but a truly
remarkable bound nevertheless.
All that said, let us turn to the lower bound (10.45) for large prime gaps. Here we would
like to understand how large can become prime gaps pn , as n gets larger. In the 1930s
few remarkable results appeared that shed light on the issue, summarized in Erdos (1940).
Among them, maybe the easiest to report is the next one, due to Ricci (1934) p. 194.
Theorem 470 (Ricci) There is a constant c > 0 such that
pn
lim sup >c
log pn log log log pn
In words, along the sequence of primes one can nd, in nitely often, two consecutive
primes whose gap is such that
pn > c log pn log log log pn
i.e., for each n 1 there exists n n such that pn > c log pn log log log pn . The sequence
c log pn log log log pn provides, in nitely often, a lower bound for prime gaps. It improves the
lower bound (1 ") log n in (10.45) because, for any c > 0, eventually c log log log pn > 1 ".
The logarithms involved in this lower bound grow extremely slowly { recall table 8.62 on
log log n. Recently, these lower bounds from the 1930s have been signi cantly improved. We
refer interested readers to Maynard (2018) for a recent perspective on this fascinating topic.
19
When he published this major result Yitang Zhang was 59 years old and held a teaching position (for-
tunately, mathematics is much more than a young man's game, unlike what Hardly famously claimed in his
Apology).
Chapter 11
Power series
In the nal chapter of this part we study in some detail power series, a fundamental class of
series that plays a key role in many applications (for instance, in the economic analysis of
temporal choices).
with an 2 R for every n 0. The scalars an are called coe cients of the series.
The generic term of a power series is xn = an xn . The scalar x parameterizes the series:
to di erent values of x correspond di erent series, possibly with a di erent character. In
P
1 X1
particular, a power series an xn converges (diverges) at x0 2 R if the series an xn0
n=0 n=0
converges (diverges).
We set 00X= 1. In this way, a power series always converges at 0: indeed, from 00 = 1 it
1
follows that an 0n = a0 .
n=0
X1
Proposition 471 If a power series an xn converges at x0 6= 0, then it converges
n=0
(absolutely) at each x 2 R such that jxj < jx0 j. If it diverges at x0 6= 0, then it diverges at
each x 2 R such that jxj > jx0 j.
X1
Proof We rst prove convergence. Since an xn0 converges, we have an xn0 ! 0 (The-
n=0
orem 380). By (8.33), we then have jan xn0 j ! 0, so the sequence fjan xn0 jg is bounded
(Proposition 322). That is, there is M > 0 such that jan xn0 j M for all n 1. Let
jxj < jx0 j. Set q = jxj = jx0 j. Then, for all n 1 we have:
n
n jan xn j n jxn j jxj
jan x j = n jx0 j = jan j jxn0 j n = jan xn0 j M qn
jx0 j jx0 j jx0 j
321
322 CHAPTER 11. POWER SERIES
X1 X1 X1
Being 0 q < 1, the geometric series q n converges. From jan xn j M qn
X1 n=0 X
n=0
1
n=0
it then follows that the series jan xn j converges, so the series an xn is absolutely
n=0 n=0
convergent (Theorem X1 402).
Suppose that an xn0 diverges. Let jxj > jx0 j. Suppose, by contradiction, that
X1 n=0 X1
an xn converges. By the previous part of the proof, an xn0 converges absolutely.
n=0 X1 n=0
On the other hand, since an xn0 diverges, by Theorem 402 we have
n=0
X1 X1
jan j jxn0 j = jan xn0 j = +1
n=0 n=0
Inspired by thisX result, we say that a positive scalar r 2 [0; 1] is the radius of convergence
1
of a power series an xn if the series converges at each jxj < r and diverges at each
n=0
jxj > r. If it exists, the radius of convergence is a watershed that separates convergent and
divergent behavior of the power series (at jxj = r the character of the series is ambiguous,
it may be regular or not). In particular, if r = +1 the power series converges at all x 2 R,
while if r = 0 it converges only at the origin.
The next important result, a simple yet remarkable consequence of the root criterion,
proves the existence of such radius and gives a formula to compute it.
X1
Theorem 472 (Cauchy-Hadamard) The radius of convergence of a power series an xn
n=0
is 8
>
> 0 if = +1
<
1
r= if 0 < < +1
>
>
:
+1 if = 0
where p
n
= lim sup jan j 2 [0; 1]
Proof Assume 2 (0; 1). We already remarked that the power series converges at x = 0.
So, let x 6= 0. We have
p p jxj
lim sup n jan xn j = jxj lim sup n jan j = jxj =
r
So, by the root criterion the series converges if jxj =r < 1, namely if jxj < r, and it diverges
if jxj =r > 1, namely if jxj > r. We leave the case 2 f0; +1g to the reader.
X1
The interval A formed by the points at which the power series an xn converges is
n=0
called interval of convergence of the power series. By the Cauchy-Hadamard Theorem, we
have
( r; r) A [ r; r]
11.1. POWER SERIES 323
where r 2 [0; 1] is the radius of convergence of the power series. Depending on the character
of the series at x = r, the inclusions may become equalities. For instance, if the power
series converges at both points r, we have A = [ r; r], while if it does not converge at
either point we have A = ( r; r). The next examples illustrate.
Next we consider two important power series with \probabilistic" coe cients that are
positive and add up to 1.
n
with > 0, has positive coe cients e =n! that add up to 1:
1
X n 1
X n
e =e =e e =1
n! n!
n=0 n=0
Since r r
n
n n 1
= lim sup e =e lim sup =0
n! n!
its radius of convergence is r = +1.
(ii) The normalized geometric power series
1
X
n n
(1 ) x
n=0
In the following example we consider a power series that plays a key role in many economic
models.
where the utility functions ut : R ! [0; 1) are positive (cf. Example 392). By Cauchy-
Hadamard's Theorem, this power series has radius of convergence
1
r= p
t 1
lim sup ut (xt )
Assume that all ut are uniformly bounded by a constant M , i.e., the inequality (9.14) of
Example 392 holds. In this case,
1 1
r= p p =1 (11.5)
lim sup t 1
ut (xt ) lim sup t 1 M
where the last equality follows from Proposition 346. So, the series now converges for all
j j < 1, in particular for all 2 (0; 1) which are the values of economically meaningful
when it is interpreted as a subjective discount rate.
11.1. POWER SERIES 325
In the uniformly bounded case, the convergence of the power series can be also easily
checked directly, without invoking Cauchy-Hadamard's Theorem. Indeed, in view of the
properties of the geometric series, for all j j < 1 we have
T
X T
X T
X
t 1 t 1 t 1 T !1 M
0 jut (xt )j M =M !
1
t=1 t=1 t=1
the result readily follows from the Cauchy-Hadamard and Mertens Theorems.
P
Example 477 The power series 1 n n
n=0 ( 1) x has radius 1 and sum 1= (1 + x) (why?).
Since !
X1 X n X1
( 1)k ( 1)n k xn = ( 1)n (n + 1) xn
n=0 k=0 n=0
by Proposition 476 we have
1
X 1
( 1)n (n + 1) xn =
n=0
(1 + x)2
for all jxj < 1. N
We close by observing that, for simplicity, so far we (tacitly) considered power series
centered at x0 = 0. Of course, everything goes through if x0 is any scalar. In this case, the
power series has the form
X1
an (x x0 )n
n=0
and converges at each jx x0 j < r and diverges at each jx x0 j > r. So, its interval of
convergence A is such that (x0 r; x0 + r) A [x0 r; x0 + r].
1
As previously (foot)noted,
P for series that start at n = 0, the convolution formula (10.30) is easily seen to
become (x y)n = n k=0 xk y n k:
326 CHAPTER 11. POWER SERIES
of the sequence f1=n!g, so de ned via the power series (11.2), has the entire real line as
its domain. By Theorem 399, it is the exponential f (x) = ex . Relatedly, the generating
function
X1 X1
n
n ( x)n
f (x) = e x =e =e e x=e + x
n! n!
n=0 n=0
n
of the Poisson sequence e =n! has the entire real line as its domain.
(ii) The generating function
X1
xn
f (x) =
n
n=1
of the harmonic sequence f1=ng, so de ned via the power series (11.3), has domain [ 1; 1).
(iii) The \geometric" function
1
X
n n 1
f (x) = x =
1 x
n=0
is the generating function of the sequence fut (xt )g when all ut are positive and uniformly
bounded. N
11.2. GENERATING FUNCTIONS 327
It is always rewarding to nd the analytic expression of a generating function, like the ex-
ponential function for the generating function (11.7). The next example shows that, however
nice they might be, one should not be too enchanted by such expressions.
Note that x = 1 do not belong to the domain of f , which is the open interval ( 1; 1).
The function 1= (1 + x) is de ned for all x 6= 1, but outside the open interval ( 1; 1) is no
longer the generating function of the sequence f( 1)n g. In other words, outside such open
interval the generating function and its analytic expression part ways. N
Proposition 480 The generating function f for a sequence fan g is in nitely di erentiable
on ( r; r), with
f (n) (0)
an = 8n 0 (11.8)
n!
Proof Let f : ( r; r) ! R be the generating function for the sequence fan g restricted
P
1
to the open interval ( r; r). By de nition, f (x) = an xn for all x 2 ( r; r). By the
n=0
Cauchy-Hadamard Theorem, the derived series
1
X
nan xn 1
n=1
P1
has the same radius of convergence of the original series an xn . To see why, observe that
p n=0
lim n n = 1. Indeed,
p p
n 1
lim n n = lim elog n = lim e n log n = 1
because (log n) =n ! 0. Thus,
p p p p
lim sup n jnan j = lim sup n njan j = lim sup n n n jan j
p p p
= lim n n lim sup n jan j = lim sup n jan j
as desired. With this, a uniform continuity argument (see Rudin, 1976, p. 173) shows that
P
1
the function f 0 : ( r; r) ! R given by f 0 (x) = nan xn 1 is the derivative function of f .
n=0
Thus, f is di erentiable and
f 0 (0) = a1
By applying a similar argument to f 0 , which is the generating function for the sequence
fnan g, one proves that f is twice di erentiable with
1
X
00
f (x) = n (n 1) an xn 2
n=2
328 CHAPTER 11. POWER SERIES
Hence, f 00 (0) = a2 . By iterating, one proves that f is k times di erentiable, for each k 1,
with
1
X
(n)
f (x) = n (n 1) (n k + 1) an xn k
n=k
Later in the book, we will learn more about this remarkable di erential representation (Sec-
tion 30.3). Here we observe that it implies that generating functions are uniquely determined.
Corollary 481 There is a one-to-one relation between generating functions and scalar se-
quences.
Proof Let fa and fb be the generating functions of two sequences a = fan g and b = fbn g.
We want to show that
a = b () fa = fb
The implication =) readily follows from the de nition of generating function. As to the
(n)
converse implication, let fa = fb . By the last result, an = bn = fa (0) =n! for all n 0.
Thus, a = b.
Theorem 482 (Abel) Let f : A R ! R be the generating function for a sequence fan g,
with A bounded.
P
1
(i) If the series an rn converges (so, r 2 A), then
n=0
1
X
lim f (x) = f (r) = an rn
x!r
n=0
Proof To ease notation, we prove the result when the interval of convergencePis ( 1; 1) and
r = 1 (cf. Rudin, 1964). By hypothesis, ( 1; 1] A. Let s 1 = 0 and sn = nk=0 xk for all
n 0. By the combinatorial formula (11.19), proved later in the chapter, for each n 0 we
have
n
X1 n
X
(1 x) sk xk + xn sn = ak xk
k=0 k=0
1
X
By hypothesis, sn ! f (1) and the series ak xk converges for all x 2 ( 1; 1). So,
k=0
1
X 1
X
(1 x) sk xk = ak xk 8x 2 ( 1; 1)
k=0 k=0
Fix " > 0. Since sn ! f (1), there is n" 1 such that jsn f (1)j < "=2 for all n n" .
X1
Since (1 x) xk = 1 for all x 2 (0; 1), for all such x we then have
k=0
1
X 1
X 1
X
k k
jf (x) f (1)j = (1 x) x sk f (1) = (1 x) x sk (1 x) f (1) xk
k=0 k=0 k=0
X1 nX
" 1 1
X
"
= (1 x) xk (sk f (1)) (1 x) xk jsk f (1)j + (1 x) xk
2
k=0 k=0 k=n"
nX
" 1
"
(1 x) xk jsk f (1)j +
2
k=0
There exists " 2 (0; 1) small enough such that, for all 1 " < x < 1, we have
nX
" 1
"
(1 x) xk jsk f (1)j
2
k=0
Thus,
" "
1 " < x < 1 =) jf (x) f (1)j + ="
2 2
We conclude that limx!1 f (x) = f (1).
Example 483 As in (ii) of the last example, consider the generating function f : [ 1; 1) !
P
1 P
1
R given by f (x) = xn =n. Since ( 1)n =n converges, by Abel's Theorem it holds
n=1 n=1
limx! 1+ f (x) = f ( 1). N
P
1
provided the series an converges. Next we illustrate this observation.
n=0
330 CHAPTER 11. POWER SERIES
Example 485 Recall from formula (10.13) that we can express binomial coe cients for two
natural numbers 0 k n through falling factorials as
n n(k)
=
k k!
If rather than a natural number n we consider a scalar 2 R, we de ne a generalized falling
factorial by
(k)
= ( 1) ( k + 1)
The standard notion of falling factorial is the special case when is a natural number k.
Note that (k) = 0 for all k + 1 and (k) 0 for all 0 k . These generalized falling
factorials allow us to de ne the generalized binomial coe cients
(k) ( 1) ( k + 1)
=1 ; = =
0 k k! k!
When is a natural number k, we go back to the standard notion of binomial coe cient
for natural numbers. Note that, for all k + 1,
=0 (11.10)
k
Armed with these notions, given any 2 R we look for the generating function f for the
binomial sequence
1
(11.11)
k k=0
11.2. GENERATING FUNCTIONS 331
which, thanks to the inequality (10.18), implies = 1, namely r = 1. So, the domain of f
includes the open interval ( 1; 1). Later in the book (see Example 1409), we will show that
1
X
(1 + x) = xk 8x 2 ( 1; 1) (11.12)
k
k=0
where the power series on the right-hand side is called binomial series. This beautiful
formula allows us to conclude that f (x) = (1 + x) is the sought-after generating function
on ( 1; 1).
Note that, in view of (11.10), for = n formula (11.12) reduces to formula (B.8), i.e., to
n
X n k
n
(1 + x) = x
k
k=0
!
a fa
This observation is important because, remarkably, it turns out that a generating function
fa may be constructed by just using a de nition by recurrence of the sequence a = fan g.
This makes it possible to solve the recurrence if one is able to retrieve (in closed form) the
coe cients of the sequence a = fan g that generates fa . Indeed, such a sequence is unique
and so it has then to be the one de ned by the recurrence at hand.2 We can diagram this
solution scheme as follows:
(
a0 = 0 ; a1 = 1
(11.13)
an = an 1 + an 2 for n 2
that is, f0; 1; 1; 2; 3; 5; 8; 13; 21; 34; 55; :::g. We want to construct its generating function
p f :
n
A R ! R. Since the sequence is positive and increasing, clearly lim sup jan j > 0.
By the Cauchy-Hadamard's Theorem, the domain A contains an open interval ( "; ") with
0 < " < 1. For each scalar x, we have
N
X N
X N
X
an xn = a0 + a1 x + an xn = a0 + a1 x + (an 1 + an 2) x
n
n=1 n=2
x
f (x) = 8x 2 ( "; ")
1 x x2
p
1 5
x=
2
p1 p1
1 5 5
= p p (11.14)
1 x x2 x 1+ 5
+ 2 x + 2 5
1
! 0 p p 1
1 5 1+ 5
x 1 1 x @ 2 2 A
f (x) = p p p =p p p p p
5 1+ 5 1 5 5 1 5 x+ 1 5 1+ 5 1+ 5
x+ 2 x+ 2 2 2 2 x + 2
p p ! p p !
1 5 1+ 5 1+ 5 1 5
x x
= p p 2 p 2 =p 2 p 2 p
5 1 5 1+ 5 5 1 1+ 5 x 1 1 5
2 x 1 2 x 1 2 2 x
11.2. GENERATING FUNCTIONS 333
By the properties of the geometric series, for each x 2 ( "; ") we then have
" p 1 p !n p 1 p !n #
x 1+ 5X 1+ 5 n 1 5X 1 5
f (x) = p x xn
5 2 2 2 2
n=0 n=0
2 3
1 p !n+1 1 p !n+1
x X 1+ 5 X 1 5
= p 4 xn xn 5
5 n=0 2 2
n=0
2 ! 3
1 p n+1 1 p !n+1
x 4X 1 + 5 X 1 5
= p xn xn 5
5 n=0 2 2
n=0
2 ! 3
1 p n+1 1 p !n+1
1 4X 1 + 5 X 1 5
= p xn+1 xn+1 5
5 n=0 2 2
n=0
"1 p !n 1 p !n #
1 X 1+ 5 X 1 5
= p xn 1 xn + 1
5 n=0 2 2
n=0
"1 p !n 1 p !n #
1 X 1+ 5 n
X 1 5
= p x xn
5 n=0 2 2
n=0
1
" p !n p !n #
1 X 1+ 5 1 5
= p xn
5 2 2
n=0
By equating coe cients, we conclude that f is generated by the sequence with terms
" p !n p !n #
1 1+ 5 1 5
an = p 8n 0 (11.15)
5 2 2
We call Fibonacci numbers the terms of the sequence (11.15). There is an elegant char-
acterization of their asymptotic behavior.
Proof We have
h p n p ni p n p n
p1 1+ 5 1 5 1+ 5 1 5
an 5 2 2 2 2
p n = p n = p n
p1 1+ 5 p1 1+ 5 1+ 5
5 2 5 2 2
p n
1 5 p !n
2 1 5
= 1 p n =1 p !1
1+ 5 1+ 5
2
334 CHAPTER 11. POWER SERIES
p p
where the last step follows from (8.33) since 0 < 1 5 = 1+ 5 < 1.
In solving the Fibonacci recurrence (11.13) it was key that its generating function (11.14)
is a proper rational function, which can be then studied via its partial fraction expansion.
This suggests that, more generally, one can solve recurrences that have proper rational
generating functions. For simplicity, we focus on the case of distinct real roots.
Proposition 488 Let f (x) = p (x) =q (x) be a proper rational function such that q has k
k
Y
distinct real roots r1 , r2 , ..., rk , so q (x) = (x ri ). Then, f is a generating function for
i=1
the sequence with terms
1 1 1
an = b1 n + b2 n + + bk
r1 r2 rkn
where, for all i = 1; :::; k,
p (ri )
bi =
ri q 0 (ri )
We give two proofs of this result: the rst one is direct, while the second one relies on
formula (11.8).
c1 c2 ck c1 1 c2 1 ck 1
f (x) = + + + =
x r1 x r2 x rk r1 1 rx1 r2 1 rx2 rk 1 rxk
1
X 1
X 1
c1 x n
c2 x n
ck X x n
=
r1 r1 r2 r2 rk rk
n=0 n=0 n=0
1
X n n n
c1 x c2 x ck x
=
r1 r1 r2 r2 rk rk
n=0
X1 1
X
c1 xn c2 xn ck xn c1 1 c2 1 ck 1
= = + + + xn
r1 r1n r2 r2n rk rkn n
r1 r1 r2 r2n rk rkn
n=0 n=0
X1
1 1 1
= b1 n + b2 n + + bk xn
r1 r2 rkn
n=0
where bi = p (ri ) =ri q 0 (ri ) for all i = 1; :::; k. We conclude that f is a generating function
for the sequence with terms
1 1 1
an = b1 n + b2 n + + bk
r1 r2 rkn
as desired.
11.2. GENERATING FUNCTIONS 335
Proof 2 Consider the function g (x) = 1= (x r). It can be proved by induction that its
derivative of order n is
n!
g (n) (x) =
(r x)n+1
In view of (11.16), we then have
n! n! n!
f (n) (x) = c1 c2 ck
(r1 x)n+1 (r2 x)n+1 (rk x)n+1
f (n) (0) c1 1 c2 1 ck 1
= n
n! r1 r1 r2 r2n rk rkn
1 1 1
= b1 n + b2 n + + bk n
r1 r2 rk
As a dividend of this result, we can solve linear recurrences of order k given by (8.11),
that is,3
(
a0 = 0 ; a1 = 1 ; ::: ; ak 1 = k 1
(11.17)
an = p1 an 1 + p2 an 2 + + pk an k for n k
Some algebra, left to the reader, shows that the Fibonacci formula (11.14) here takes the
general form of a proper rational function given by
+( 2 k 1
0 1 0) x +( 2 1 0) x + +( k 1 k 2 0) x
f (x) =
1 p1 x p2 x2 pk xk
Assume that the polynomial at the denominator has k distinct real roots r1 , r2 , ..., rk . By
the last result, f is then the generating function of the sequence with terms
1 1 1
an = b1 n + b2 n + + bk
r1 r2 rkn
This sequence thus solves the linear recurrence (11.17). The key equation
1 p1 x p2 x2 pk xk = 0
Example 489 We can solve the Fibonacci recurrence (11.13) through this method.
p It is a
linear recurrence of order 2 where p1 = p2 = 1, a0 = 0, a1 = 1, r1 = 1 + 5 =2, and
p
r2 = 1 5 =2. So,
r1 1 1 1
b1 = = = p =p
r1 ( 1 2r1 ) 1 + 2r1 1+ 1+ 5 5
r2 1 1 1
b2 = = = p = p
r2 ( 1 2r2 ) 1 + 2r2 1+ 1 5 5
1 1 1 1
an = p p n p p n
5 1+ 5 5 1 5
2 2
p n p n
1+ 5 1 5
1 2 1 2
= p p n p n p p n p n
5 1+ 5 1+ 5 5 1 5 1 5
2 2 2 2
p n p n
1+ 5 1 5
1 2 1 2
= p p p n p p p n
5 1+ 5 1+ 5 5 1 5 1 5
2 2 2 2
p !n p !n
1 1+ 5 1 1 5
= p p
5 2 5 2
11 1 3
1 x + x2 x =0
6 6
has solutions r1 = 1, r2 = 2, and r3 = 3, we have
1 + ri ri2
bi =
ri 2 + 2ri 3ri2
So, b1 = 1=3, b2 = 1=20, and b3 = 5=69. By the last proposition, the sequence with terms
1 1 1 5 1
an =
3 20 2n 69 3n
solves this linear recurrence of order 3. N
11.3. DISCOUNTED CONVERGENCE 337
De nition 491 We say that a sequence fxn g converges in the sense of Abel to L 2 R, and
A
we write xn ! L, when4
1
X
n 1
lim (1 ) xn = L
"1
n=1
So, thanks to the normalizing factor the constant sequence indeed converges to k, as one
would expect from any bona de mode of convergence.
The proof relies on a couple of lemmas of independent interest. The rst one connects
power and ordinary partial sums.
Pn
Lemma 493 Let s0 = 0 and sn = k=1 xk for all n 1. Then, for each n 1 we have:
n
X n
X
k 1 k 1 n
xk = (1 ) sk + sn (11.19)
k=1 k=1
4
For the meaning of " 1 we refer the reader to Section 8.8.2. For a comprehensive study of Abel (and
Cesaro) convergence, we refer interested readers to Chapter III of Zygmund (2002).
338 CHAPTER 11. POWER SERIES
as desired.
Interestingly, the result holds also when 2 ( 1; 0). In this case we can write = ( 1)
with 2 (0; 1), from (11.20) it then follows that
1
X 1
1 2 +3 2
= ( 1)n 1 n 1
n= (11.21)
n=1
(1 + )2
Pm n 1
Proof Consider the sequence of partial sums sm = n=1 n. We next show that
m
X m
1 n 1 m
sm = 8m 1 (11.22)
1 1
n=1
Assume now the statement is true for m (induction hypothesis). We need to show it holds
for m + 1. By the induction hypothesis, we have that
m+1
X m
X m
X m
n 1 n 1 m 1 n 1 m m
sm+1 = n= n+ (m + 1) = + (m + 1)
1 1
n=1 n=1 n=1
m
X m m
1 n 1 m
= + (1 ) (m + 1)
1 1 1
n=1
Xm m m
1 n 1 m
= + (1 +m m)
1 1 1
n=1
m+1
X m m+1
X m+1
1 n 1 1 n 1 (m + 1)
= (m + 1) =
1 1 1 1
n=1 n=1
5 0
Recall the convention 0 = 1.
11.3. DISCOUNTED CONVERGENCE 339
By induction, we conclude that (11.22) holds. By passing to the limit, we have that
m
! m
1 X m m
1 X 1
n 1 n 1 m
lim sm = lim = lim lim m
m m 1 1 1 m 1 m
n=1 n=1
1 1 1
= 0=
1 1 (1 )2
Proof of Theorem 492 We start by proving some ancillary facts. Let 2 (0; 1). Set
Pk
mk = sk =k = i=1 xi =k for all k 1. By hypothesis, mk ! L. Since mk ! L, it follows
that there exists M > 0 suchP that kjm1k LjP1M for all k 1. In view of (11.20), in turn,
this implies that the series 1k=1 sk = k=1
k 1
kmk converges absolutely. For,
1
X 1
X 1
X 1
X
k 1 k 1 k 1 k 1
k jmk j = k jmk L + Lj k jmk Lj + k jLj
k=1 k=1 k=1 k=1
X1 1
X 1
X
k 1 k 1 k 1 M + jLj
kM + jLj k = (M + jLj) k=
k=1 k=1 k=1
(1 )2
P1 k 1 P1 k 1
This proves that k=1 kmk = k=1 sk converges absolutely. In particular, it
converges. Next, we show that if 2 (0; 1), then
1
X 1
X
(1 ) k 1
xk = (1 )2 k 1
sk (11.23)
k=1 k=1
n n n
j(1 ) sn j = (1 ) n jmn j = (1 ) n jmn L + Lj
n n
(1 ) n jmn Lj + (1 ) n jLj
n n
(1 ) nM + (1 ) n jLj
n n!1
= (1 ) n (M + jLj) ! 0
P
Paired with (11.24), this yields (11.23). De ne f : (0; 1) ! R by f ( ) = (1 ) 1
n=1
n 1
xn
for all 2 (0; 1). By (11.23), f is well-de ned. We are now ready to prove the main state-
C
ment. By hypothesis, xn ! L. So, for each " > 0 there exists n" 1 such that n n"
implies that
sn
L = jmn Lj < "
n
340 CHAPTER 11. POWER SERIES
Fix " > 0 and let n n"=2 . Next, by (11.23) and (11.20), observe that for each 2 (0; 1)
we have:
1
X 1
X
jf ( ) Lj = (1 )2 k 1
sk L = (1 )2 k 1
(sk Lk)
k=1 k=1
X1
sk
= (1 )2 k 1
k L
k
k=1
Xn 1
X
2 k 1 sk 2 k 1 sk
= (1 ) k L + (1 ) k L
k k
k=1 k=n+1
Xn X1
sk sk
(1 )2 k 1
k L + (1 2
) k 1
k L
k k
k=1 k=n+1
n
X 1
X
2 sk sk
(1 ) k 1
k L + (1 )2 k 1
k L
k k
k=1 k=n+1
Xn X1
sk "
(1 )2 k 1
k L + (1 )2 k 1
k
k 2
k=1 k=n+1
n
X
2 k 1 sk "
(1 ) k L +
k 2
k=1
Note that n does not depend on . Indeed, it was chosen before even discussing jf ( ) Lj.
Hence, if we choose close enough to 1, we have that
n
X sk "
(1 )2 k 1
k L <
k 2
k=1
So, for close enough to 1, we have jf ( ) Lj < ". This proves the statement.
Theorems 434 and 492 thus imply the following convergence hierarchy:
C A
xn ! L =) xn ! L =) xn ! L (11.25)
The converses are false. Example 437 showed that a sequence may converge in the sense of
Cesaro but not in the ordinary sense. The following modi cation of that example shows that
a sequence may converge in the sense of Abel but not in that of Cesaro.
Example 495 One can show that for the alternating sequence
xn = ( 1)n+1 n = f1; 2; 3; 4; 5; :::g
we have ( 1
x1 + x2 + + xn 2 if n is even
=
n 1
+ 1
if n is odd
2 2n
as well as
1
X
n 1 1
(1 ) xn =
n=1
(1 + )2
11.3. DISCOUNTED CONVERGENCE 341
So, this sequence converges to 0 in the sense of Abel, but it does not converge in the sense
of Cesaro. N
0; 0 ; 1; 1 ; 0; 0; 0; 0 ; 1; 1; 1; 1; 1; 1; 1; 1; :::
|{z} |{z} | {z } | {z }
2 elements 2 elements 4 elements 8 elements
where every block of 0s and 1s has length equal to the sum of the lengths of the previous
1
X
t 1
blocks. One can show that lim "1 (1 ) xt does not exist, so this sequence does not
t=1
converge in the sense of Abel. N
Tauberian theorems provide su cient conditions under which the converses in (11.25)
actually hold. Landau's Theorem gave a su cient condition under which ordinary and
Cesaro convergence are equivalent. In the same vein, the next classic Tauberian theorem
shows when Abel and Cesaro convergence are equivalent.
We give a remarkable proof, due to Karamata (1930), that relies on the following lemma.
Lemma 498 (Karamata) Let fxn g be a sequence, bounded below, that converges in the
sense of Abel to L. Then
1
X Z 1
n 1 n 1
lim (1 ) xn f =L f (t) dt (11.26)
"1 0
n=1
Proof First, observe that proving (11.26) for positive sequences is equivalent to prove it
for sequences that are bounded from below. Clearly, on the one hand, if (11.26) holds for
sequences that are bounded from below, then it holds for positive sequences (the lower bound
is indeed zero). On the other hand, if fxn g is bounded from below but not positive, there
exists m < 0 such that xn m for all n 1. If we set yn = xn m and zn = m for all
n 1, it is easy to check that fyn g and fzn g are positive as well as
342 CHAPTER 11. POWER SERIES
1
X 1
X 1
X
n 1 n 1 n 1
(1 ) yn = (1 ) (xn + zn ) = (1 ) xn m
n=1 n=1 n=1
A A
hence yn ! L m and zn ! m. If we assume that (11.26) holds for positive sequences,
then
1
X Z 1
n 1 n 1
lim (1 ) yn f = (L m) f (t) dt
"1 0
n=1
and
1
X Z 1
n 1 n 1
lim (1 ) zn f = m f (t) dt
"1 0
n=1
that is
1
X Z 1
n 1 n 1
lim (1 ) ( zn ) f =m f (t) dt
"1 0
n=1
It follows that
1
X 1
X
n 1 n 1 n 1 n 1
lim (1 ) xn f = lim (1 ) (yn zn ) f
"1 "1
n=1 n=1
Z 1
= L f (t) dt
0
In view of all this, in the rest of the proof we assume without loss of generality that the
sequence fxn g is positive. In this case, L 0. We rst prove the result when f is a
X1
n 1
polynomial. Since (1 ) xn converges for all 2 (0; 1) and fxn g is positive, for
n=1
each k 1 and 2 (0; 1), we have that the following series converges and
1
X 1
X
n 1 n 1 k (n 1)(k+1)
(1 ) xn = (1 ) xn
n=1 n=1
1
X
1 k+1 k+1
n 1
= k+1
1 xn
1 n=1
1 1
lim k+1
=
!1 1 k+1
as desired.
Now, let f : [0; 1] ! R be an integrable function. By Proposition 1870, for each " > 0
there exist two polynomials p" ; P" : [0; 1] ! R such that p" f P" and
Z 1 Z 1 Z 1 Z 1 Z 1
P" (x) dx " p" (x) dx f (x) dx P" (x) dx p" (x) dx + " (11.28)
0 0 0 0 0
1
X
n 1 n 1
Since " > 0 was arbitrarily chosen, we conclude that lim "1 (1 ) xn f ex-
n=1
R1
ists. Denote such limit by l. We have that L 0 P" (x) dx l L" for all " > 0. By (11.28),
we have for each " > 0 that
Z 1 Z 1
L P" (x) dx L f (x) dx L"
0 0
R1
yielding that l L 0 f (x) dx 2L". Again, since " > 0 was arbitrarily chosen we conclude
R1
that l = L 0 f (x) dx, as desired.
344 CHAPTER 11. POWER SERIES
Proof of Theorem 497 The \only if" follows from Theorem 492. As to the converse,
A
suppose that fxn g is a sequence, bounded below, such that xn ! L (a dual argument holds
C
if the sequence is bounded above). We want to show xn ! L.
To this end, de ne f : [0; 1] ! R by
(
0 if t 2 0; e 1
f (t) = 1
t if t 2 e 1 ; 1
1
If =e k , we have
1
X 1
X 1
X
1 n 1 1 n 1 n 1 n 1
n 1 n 1
xn f = e k xn f e k = e k xn f e k
because
8 n 1 8 n 1
< 0 if e 2 0; e 1 < 0 1
n 1
k
if e k <e
f e k = n 1 =
: 1
n 1 if e k 2 e 1; 1 : e nk 1 if e 1 e
n 1
k 1
e k
(
0 if n > k + 1
= n 1
e k if 1 n k+1
If we consider the sequence of partial sums, we can de ne the Abel sum of a series: if
A P A P1
sn ! S we write 1 n=1 xn = S and say that S is the Abel sum of the series n=1 xn . In
view of Frobenius' Theorem, Cesaro convergence implies Abel convergence. The converse is
false, as Example 500 will show. Thus, more divergent series become convergent according
to Abel than Cesaro.
This interesting formula helps to best connect Abel and ordinary sums. Indeed, it shows
that
X1 X1
A k 1
xn = lim xk
n=1 !1
k=1
1
X 1
X
k 1 k 1 k 1
If (1 ) sk converges, then so does sk and, in particular, sk ! 0 as
k=1 k=1
k k 1
k ! 1. It follows that sk = sk ! 0. By passing to the limit in n, (11.29) follows.
X1 X1
k 1 k 1
As for the case when xk converges, we only prove it in the case xk
k=1 k=1
converges absolutely. First, we assume that 2 [0; 1) and xk 0 for all k
1. This implies
X1
k 1 k 1
that sk 0 for all k 1 and xk 0 for all k 1. In this case, either sk
k=1
converges or diverges. In the rst case, from the previous part of the proof, it follows that
(11.29) holds. In the second case, the right hand side of (11.30) becomes arbitrarily large,
X1
k 1
a contradiction with xk converging. If fxk g is not positive and 62 [0; 1), then the
k=1
1
X
k 1
sequence jxk j 0 for all k 1 and j j 2 (0; 1). Since xk converges absolutely, we
k=1
P1 k 1
have that k=1 j j jxk j converges. From the previous part of the proof, we have that
1 k
!
X k 1
X
j j jxi j
k=1 i=1
346 CHAPTER 11. POWER SERIES
Pk k 1 Pk
converges. By Lemma 406 and since sk j jk 1 k 1
i=1 jxi j as well as j j i=1 jxi j +
k 1 P1 k 1
sk 0 for all k 1, it follows that k=1 sk converges. From the initial part of the
proof, we can conclude again that (11.29) holds.
Next we give a second proof of formula (11.29) based on the Cauchy product of power
series. For a change, we prove this formula starting from the 0 position, i.e.,
1
X 1
X
k k
(1 ) sk = xk
k=0 k=0
where k = s0 + + sk .
Proof 2PTo ease matters, let us assume that the terms xn are positive. Suppose rst that
(1 ) 1k=0 sk
k
converges. It holds
1
X 1
X
k k
(1 ) sk = (1 ) (x0 + + xk )
k=0 k=0
" 1 1
#
X X
k k
= (1 ) x0 + + xi +
k=0 k=i
" 1 i 1
! #
X 1 X
k k
= (1 ) x0 + + xi +
1
k=0 k=0
i
x0 1 1
= (1 ) + + xi +
1 1 1
i
x0 1 1
= (1 ) + + xi +
1 1 1
i
= x0 + + xi +
P1 k
as desired. Now, assume that the series k=0 xk converges. Set x = fxk g and 1 =
(1; :::; 1). Their convolution has term (1 x)k = x0 + + xk = sk . By Proposition 476, we
then have, for each j j < 1,
1
X 1
X 1
X 1
X 1
X
1 k k k k k
xk = xk = (1 x)k = sk
1
k=0 k=0 k=0 k=0 k=0
as desired.
does not converge in the sense of Cesaro (why?). Yet, it converges in sense of Abel. Indeed,
by (11.21) we have
1
X
n 1 2 1
xn = 1 2 +3 = 8 2 (0; 1)
n=1
(1 + )2
In view of (11.29), we conclude that this series has Abel sum 1=4. N
so that the limit case corresponds to the sum of the utilities of all periods, all with equal
unitary weight. When the horizon is in nite the problem becomes far more complex because,
1
X
for the series ut (xt ) to converge { so that, by Abel's Theorem, formula (11.32) continues
t=1
to hold for T = +1 { it must be that limt!1 ut (xt ) = 0 (Theorem 380), which is hardly
justi able from an economic standpoint.
Let us consider, instead, the Abel limit
1
X
t 1
lim (1 ) ut (xt )
"1
t=1
7
In an alternative interpretation, we can regard U has the objective function of a planner that has to
allocate consumption across di erent generations (each t then identi es a generation).
8
In view of Example 478-(v), we are studying the limit lim !1 f ( ) of the generating function of fut (xt )g.
348 CHAPTER 11. POWER SERIES
By Hardy-Littlewood's Theorem, the bounded sequence fut (xt )g converges in the sense of
Abel if and only if it converges in the sense of Cesaro,9 with
1
X T
t 1 1X
lim (1 ) ut (xt ) = lim ut (xt )
"1 T !1 T
t=1 t=1
V (x) = (1 ) U (x) 8x 2 R1
as long as the limits exist. The in nite patience case is thus captured by the limit of the
average utilities
T
1X
lim ut (xt ) (11.33)
T !1 T
t=1
that is, by the Cesaro limit of the sequence fut (xt )g. Such a criterion can be thus seen as a
limit case for " 1 of the intertemporal utility function V .
X T
The role that the sum ut (xt ) plays in case (11.32) with nite horizon is thus played in
t=1
the in nite horizon case by the limit of the average utilities (11.33). This important economic
application of Hardy-Littlewood's Theorem allows us to elegantly conclude this section.
where 2 (0; 1) and all instantaneous utility functions ut : R ! R are uniformly bounded.10
This is a criterion that a consumer may use at time 0 (say, today) to rank consumption
9
For this \if and only if" It is actually enough that the sequence fut (xt )g be positive (bounded or not).
10
Unlike the previous section, to ease notation here we start from t = 0 rather than t = 1 (this choice is
just a matter of convenience), and we omit the subscript and just write U (x).
11.4. RECURSIVE PATIENCE 349
streams to select the optimal ones. The consumer, however, may face a similar decision
problem at each point of time, thus at t = 1 (tomorrow), at t = 2 , and so on. At each point
of time t he then needs to rank consumption streams to select the ones that are optimal at
t, possibly revising his earlier choices { indeed, only current consumption xt is here assumed
to be irreversibly chosen.
Be that as it may, at each point of time t the consumer features an intertemporal utility
function Ut : R1 ! R over consumption streams. So, a collection fUt gt 0 of intertemporal
utility functions now governs his decisions at di erent points of time. In view of (11.34), a
natural form for these functions is the discounted one
1
X
t
Ut (x) = u (x ) (11.35)
=t
Note that this form presupposes the irrelevance of past consumption, only the current
and future consumption levels (xt ; xt+1 ; :::) matter. It is a non-trivial economic assumption:
habit formation, for instance, may be a channel through which past consumption may matter
(e.g., earlier high consumption levels may \spoil" the consumer, who then adjust less easily
to lower future levels).
The next important result, essentially due to Koopmans (1960, 1972), characterizes the
collections fUt gt 0 that have such a discounted representation. In the statement we consider
sets Zt , one per t 0, and their Cartesian product
1
Y
Z= Zt
t=0
and
ft (z) = 't (z ) + ft+1 (z) 8t 0
350 CHAPTER 11. POWER SERIES
Proof
P1 (i) implies (ii). Fix z 2 Z. By hypothesis, for each t 0 we have ft (z) =
t
=t ' (z ). By the uniform boundedness hypothesis, there is a constant M > 0
such that supzt 2Zt j't (zt )j M for all t 0. So, for each t 0 we have
1
X t+k
X t+k
X
t t t
jft (z)j = ' (z ) = lim ' (z ) lim j j j' (z )j
k!+1 k!+1
=t =t =t
t+k
X 1
X
t t M
M lim j j =M j j
k!+1
=t =t
1 j j
In turn, this implies that supt 0 jft (z)j < +1. At each t 0, we have
1
X 1
X 1
X
t t t 1
ft (z) = ' (z ) = 't (zt ) + ' (z ) = 't (zt ) + ' (z )
=t =t+1 =t+1
1
X
(t+1)
= 't (zt ) + ' (z ) = 't (zt ) + ft+1 (z)
=t+1
as desired.
(ii) implies (i). Fix z 2 Z. By hypothesis, we have ft (z) = 't (zt ) + ft+1 (z). At each
t 0, by iterating the recursion we have
ft (z) = 't (zt ) + ft+1 (z) = 't (zt ) + 't+1 (zt+1 ) + ft+2 (z)
t+k
X
2 t+k+1
= 't (zt ) + 't+1 (zt+1 ) + ft+2 (z) = = ' (z ) + ft+k+1 (z)
=t
that is
t+k
X
t+k+1
ft (z) = ' (z ) + ft+k+1 (z) (11.36)
=t
Set M = supt 0 jft (z)j < +1, so that jft (z)j M for all t 0. Since 2 ( 1; 1), we then
have
t+k+1
ft+k+1 (z) j jt+k+1 M ! 0
t+k+1
We conclude that limk!+1 ft+k+1 (z) = 0. By (11.36), we then have
t+k
!
X
t+k+1
ft (z) = lim zt + ft+k+1 (z)
k!+1
=t
t+k
X 1
X
t+k+1 t
= lim ' (z ) + lim ft+k+1 (z) = ' (z )
k!+1 k!+1
=t =t
P
because, thanks to the uniform boundedness hypothesis, the series 1=t t
' (z ) con-
verges absolutely (cf. the end of Example 475). This completes the proof of the theorem.
A forward recursion thus underlies the classic discounted form of intertemporal utility func-
tions. Note that for its normalized version
1
X
t
Vt (x) = (1 ) u (x )
=t
From a mathematical standpoint, Koopmans' Theorem further illustrates the deep con-
nections between series and recursions, as the backward recursion (9.3) rst showed. In this
regard the following corollary can be useful. Its nal claim follows from Abel's Theorem.
Here l1 denotes the subset of the space of scalar sequences R1 consisting of all bounded
sequences.
Continuity
353
Chapter 12
Limits of functions
sin x
f (x) =
x
and analyze its behavior for points closer and closer to x0 = 0, i.e., to the origin. In the next
table we nd the values that the function assumes at several such points:
By inserting other points, closer and closer to the origin, we can verify that the corresponding
values of f (x) get closer and closer to L = 1. In this case we say that \the limit of f (x), as
x tends to x0 = 0, is L = 1". In symbols,
lim f (x) = 1
x!0
Note that in this example the point x0 = 0 where we take the limit does not belong to the
domain of the function f .
355
356 CHAPTER 12. LIMITS OF FUNCTIONS
How does f behave when it approaches the point x0 = 1? By taking points closer and closer
to x0 = 1 we have:
Adding other points, closer and closer to x0 = 1, we can verify that, as x gets closer and
closer to x0 = 1, f (x) gets closer and closer to L = 1. In this case we say that \the limit of
f (x) as x tends to x0 = 1 is L = 1", and write
lim f (x) = 1
x!1
Observe that the value that the function assumes at the point x0 = 1 is f (1) = 1, so the
limit L = 1 is equal to the value f (1) of the function at x0 = 1.
8
>
> x if x < 1
<
f (x) = 2 if x = 1
>
>
:
1 if x > 1
Compared to the previous example we have introduced a \jump" at the point x = 1, so that
the function jumps to the value 2 { we have indeed f (1) = 2. The graph now is:
12.1. INTRODUCTORY EXAMPLES 357
If we study the behavior of f for values of x closer and closer to x0 = 1, we build the same
table as before (because the function, except at the point 1, is identical to the one in the
previous example). Therefore, also in this case we have
lim f (x) = 1
x!1
This time the value that the function assumes at the point 1 is f (1) = 2, di erent from the
value L = 1 of the limit.
Until now we have approached the point x0 from both the right and the left, that is,
bilaterally (in a two-sided manner). Sometimes this is not possible; rather, one can approach
x0 from either the right or the left, that is, unilaterally (in one-sided manner). Consider, for
example, the function f : R f2g ! R given by f (x) = 1= (x 2) and let x0 = 2. Its graph
is:
358 CHAPTER 12. LIMITS OF FUNCTIONS
\To approach the point x0 = 2 from the right" means to approach it by considering only
values x > 2:
x 2:0001 2:001 2:01 2:05 2:1 2:2 2:5
f (x) 10; 000 1; 000 100 20 10 5 2
For values closer and closer to 2 from the right, the function assumes values that are larger
and larger as well as unbounded above. In this case we say that \the function f tends to
+1 as x tends to 2 from the right" and write
lim f (x) = +1
x!2+
Let us see now what happens by approaching x0 = 2 from the left, that is, by considering
values x < 2:
For values closer and closer to 2 from the left, the function assumes larger and larger (in
absolute value) negative values. In this case we say that \the function f tends to 1 as x
tends to 2 from the left" and write
lim f (x) = 1
x!2
The \right-hand" and the \left-hand" limits both exist but are (dramatically) di erent.
As we will see in Proposition 521, the fact that the one-sided limits are distinct re ects
the fact that the two-sided limit of f (x), as x tends to 2, does not exist. Indeed, the equality
of the one-sided limits is equivalent to the existence of the two-sided limit. For example, if
we modify the previous function by considering f (x) = 1= jx 2j, we have
Now the two one-sided limits are equal and coincide with the two-sided one, which in this
case exists (even if in nite).
Considering again the function f (x) = 1= (x 2), what does it happen if, as x0 , we take
+1? In other terms, what does it happen if we consider increasingly larger values of x?
Look at the following table:
For increasingly larger values of x, the function assumes values closer and closer to 0. In this
12.1. INTRODUCTORY EXAMPLES 359
case we say that \the function tends to 0 as x tends to +1" and write
lim f (x) = 0
x!+1
Observe that the function assumes values close to 0, but always strictly positive: f ap-
proaches 0 \from above". If we want to emphasize this aspect we write
lim f (x) = 0+
x!+1
where 0+ suggests that, while converging to 0, the values of f (x) remain positive.
What does it happen if, instead, as x0 we take 1? We have the following table of
values:
For negative and increasingly larger (in absolute value) values of x, the function assumes
values closer and closer to 0. We say that \the function tends to 0 as x tends to 1" and
write
lim f (x) = 0
x! 1
If we want to emphasize that the function, in approaching 0, remains negative, we write
lim f (x) = 0
x! 1
Finally, after having seen various types of limits, let us consider a function that has no
limit, i.e., that it does not exhibit any \de nite trend". Let f : R f0g ! R be given by
1
f (x) = sin
x
At the origin, i.e., at x0 = 0, the function does not have a limit: for x closer and closer to
the origin, the function continues to oscillate with a tighter and tighter sinusoidal trend:
1 y
0.8
0.6
0.4
0.2
0
x
-0.2
-0.4
-0.6
-0.8
-1
The origin is, however, the only point where this function does not have a limit: at all other
points of the domain the limit exists. A much more dramatic behavior is displayed by the
Dirichlet function f : R ! R de ned by
(
1 if x 2 Q
f (x) = (12.3)
0 else
This remarkable function, proposed by Dirichlet in 1829, oscillates \obsessively" between the
values 0 and 1 because, by the density of the rational numbers in the real numbers, for any
pair x < y of real numbers there exists a rational number q such that x < q < y. As we will
see, the Dirichlet function does not have a limit at any point x0 2 R.
(i) limx!x0 f (x) = L 2 R, i.e., both the point x0 and the limit L are nite (scalars);
(ii) limx!x0 f (x) = 1, i.e., the point x0 is nite but the limit L is in nite;
(iii) limx!+1 f (x) = L 2 R or limx! 1f (x) = L 2 R, i.e., the point x0 is in nite but the
limit L is nite;
(iv) limx!+1 f (x) = 1 or limx! 1f (x) = 1, i.e., both the point x0 and the limit L
are in nite.
We formalize the notion of limit in these cases. We begin with case (i). First of all, let
us observe that we can meaningfully talk of the limit at x0 2 R of a function with domain
A only when x0 is a limit point of A. Indeed, in this case the sentence \as x 2 A tends to
x0 " is meaningful.
if, for every " > 0, there exists a " > 0 such that, for every x 2 A,
0 < jx x0 j < " =) jf (x) Lj < " (12.4)
The value L is called the limit of the function at x0 .
Example 504 Let us show that limx!2 (3x 5) = 1. We have to verify that, for every
" > 0, there exists " > 0 such that
We have j(3x 5) 1j < " if and only if jx 2j < "=3. Therefore, setting " = "=3 yields
(12.6). N
Intuitively, the smaller (so the more demanding) the value of " is, the smaller " is. To
make more precise this intuition, note that the relationship between " and " is similar,
mutatis mutandis, to that between " and n" in the de nition of converge of sequences dis-
cussed at length after De nition 302. Now a function f has limit L at x0 when it passes
the following, still highly demanding, test: given any threshold " > 0 selected by a relentless
examiner, there has to be a small enough " so that all points that are close to x0 have
images that are " close to L.
Note that " depends on " and is not unique: when we nd a value of " , all smaller
values also work ne. For instance, in the last example we can choose as " any (positive)
value lower than "=3. But, one typically focuses on the largest such " (if exists), which is a
genuine threshold value. It is in terms of such \threshold" " that we can, indeed, say: the
smaller (so the more demanding) the value of " is, the smaller " is.
N.B. The value of " , besides depending on ", clearly depends also on x0 . This dependence
is, however, so obvious that it can safely omitted in the notation. O
O.R. It is hard to overestimate the importance of the previous \test" in making rigorous
limit notions in mathematics. Its origin traces back to Eudoxus' method of exhaustion that
underlies integration theory (Chapter 44). Perhaps, the best classic description of a form of
such test is Proposition 1 in Euclid's Book X: \Two unequal magnitudes being set out, if
from the greater there is subtracted a magnitude greater than its half, and from that which
is left a magnitude greater than its half, and if this process is repeated continually, then there
will be left some magnitude less than the lesser magnitude set out" (trans. Heath { we put
in italics the words where the test emerges). Yet, it was only in XIX century that, through
the works of Cauchy and Weierstrass, the test took the form that we presented in De nitions
302 and 503. H
Example 505 For the Dirichlet function (12.3), limx!x0 f (x) does not exist for any x0 2 R.
Indeed, given x0 2 R, let us suppose, by contradiction, that limx!x0 f (x) exists and is equal
to L 2 R. Let 0 < " < 1=2. By de nition, there exists = " such that1
1
x0 6= x 2 (x0 ; x0 + ) =) jf (x) Lj < " <
2
1
The expression \x0 6= x 2 (x0 ; x0 + )" means \x 2 (x0 ; x0 + ) and x 6= x0 ". In words, x
belongs to the interval (x0 ; x0 + ) but is distinct from x0 . To ease notation, similar expressions are used
throughout the chapter.
362 CHAPTER 12. LIMITS OF FUNCTIONS
In each neighborhood (x0 ; x0 + ) there exist both rational points and irrational points
distinct from x0 (see Proposition 42), so points x0 ; x00 2 (x0 ; x0 + ) for which f (x0 ) = 1
00
and f (x ) = 0. We thus reach the contradiction
1 1
1 = j1 0j = f x0 f x00 f x0 L + L f x00 < + =1
2 2
Therefore, limx!x0 f (x) does not exist for any point x0 2 R. N
Though often we consider functions with interval domains, the next example shows that
the scope of the notion of limits goes well beyond this class, however important it might be.
Example 506 Consider the identity function f : Q ! R on the rationals given by f (x) = x
for all x 2 Q. The set of limit points of Q is R (Example 141). For each x0 2 R it holds
lim f (x) = x0
x!x0
Indeed, set " = " for each " > 0. We have, for each x 2 Q,
as (12.4) requires. N
De nition 503, in which the distances are made explicit, is of the \"- " type. In view
of (12.5), it is immediate to rewrite it in the language of neighborhoods. To make notation
more expressive, we denote by U (x0 ) a neighborhood of x0 of radius and by V" (L) a
neighborhood of L of radius ". Graphically, the former is a neighborhood in the horizontal
axis, while the latter is a neighborhood in the vertical axis.
lim f (x) = L 2 R
x!x0
if, for every neighborhood V" (L) of L, there exists a neighborhood U " (x0 ) of x0 such that
lim f (x) = L 2 R
x!x0
if, for every neighborhood V" (L) of L, there exists a neighborhood U " (x0 ) of x0 such that
The di erence between De nitions 507 and 508 is apparently minor: in the former de -
nition we have R, in the latter R. The simple modi cation allows, however, to consider also
the cases (ii), (iii) and (iv). In particular:
To exemplify we consider explicitly a few subcases, leaving to the reader the other ones.
We start with the subcase x0 2 R and L = +1 of (i). In this case De nition 508 reduces to
the following \"- " form (that is, with distances made explicit).
lim f (x) = +1
x!x0
if, for every M > 0, there exists M > 0 such that, for every x 2 A, we have
In other words, for each constant M , no matter how large, there exists M > 0 such that
all the points x0 6= x 2 A that are M close to x0 have images f (x) larger than M .
The point x0 = 2 is a limit point for R f2g, so we can consider limx!2 f (x). Let M > 0.
Setting M = 1=M , we have
1 1
0 < jx x0 j < M () 0 < jx 2j < =) >M
M jx 2j
and therefore
0 < jx 2j < M =) f (x) > M
That is, limx!2 f (x) = +1. N
364 CHAPTER 12. LIMITS OF FUNCTIONS
Let us now consider case (iii) with x0 = +1 and L 2 R. Here De nition 508 reduces to
the following \"- " one.
lim f (x) = L 2 R
x!+1
if, for every " > 0, there exists M" > 0 such that, for every x 2 A, we have
In this case, for each choice of " > 0 arbitrarily small, there exists a value M" such that
the images of points x greater that M" are " close to L.
Finally, we consider case (iv) with x0 = L = +1. Here De nition 508 reduces to the
following one:
lim f (x) = +1
x!+1
if, for every M > 0, there exists N such that, for every x 2 A, we have
Setting N = M 2 yields
x > N =) f (x) > M
That is, limx!+1 f (x) = +1. N
3
By Lemma 315, the fact that A is unbounded above guarantees that +1 is a limit point of A. For
example, this is the case when (a; +1) A.
12.2. FUNCTIONS OF A SINGLE VARIABLE 365
N.B. If A = N+ , that is, f : N+ ! R is a sequence, with the last two de nitions we recover
the notions of convergence and of (positive) divergence for sequences. The theory of limits of
functions extends, therefore, the theory of limits of sequences of Chapter 8. In this respect,
note that the set N+ has only one limit point: +1. This is why the only limit meaningful
for sequences is limn!1 . O
O.R. It may be useful to see the concept of limit \in three stages" (as a rocket):
(iii) all the values of f at x 2 U , x 6= x0 , belong to V , i.e., all the images { excluding at
most f (x0 ) { of f in U \ A belong to V : f (U \ A fx0 g) V .
10 y
V(l)
6
O U(x ) x
0
0
-2
-2 -1 0 1 2 3 4
We are often tempted to simplify to two stages: \the values of x close to x0 have images
f (x) close to L", that is,
Unfortunately, this an empty statement that is always (vacuously) true, as the gure shows:
366 CHAPTER 12. LIMITS OF FUNCTIONS
5
y
3 V(l)
0
O x
U(x )
-1 0
-2
-3
-4
-4 -2 0 2 4 6
In the gure, for every neighborhood U (x0 ), however small, of x0 there exists always a
neighborhood (possibly quite big) V (L) of L inside which fall all the values of f (x) with
x 2 U fx0 g. Such V can always be taken as an open interval that contains f (U fx0 g).H
It is easy to see that limx!1 f (x) does not exist. In these cases one can resort to the weaker
notion of one-sided (or unilateral) limit, which we already met in an intuitive way in the
introductory examples of this chapter. These examples, indeed, suggest two possible cases
when the right limit exists:
Similarly, we also have two \left" cases. Note that in both (i) and (ii) the point x0 is in
R, while the value of the limit is in R.
The next \right" de nition includes both cases.
lim f (x) = L 2 R
x!x+
0
if, for every neighborhood V" (L) of L, there exists a right neighborhood U +" (x0 ) = [x0 ; x0 + " )
of x0 such that
x0 6= x 2 U +" (x0 ) \ A =) f (x) 2 V" (L) (12.12)
The value L is called the right limit of the function at x0 .
In a similar way we can de ne the left limits, denoted by limx!x f (x), as readers can
0
check.
By excluding x0 , the neighborhood U +" (x0 ) reduces to (x0 ; x0 + " ), so (12.12) can be
more simply written as
In case (i), De nition 515 reduces to the following \"- " one.
lim f (x) = L 2 R
x!x+
0
if, for every " > 0, there exists = " > 0 such that, for every x 2 A,
It is easy to see that limx! + f (x) = + 1 and limx! f (x) = , so the one-sided limits
exist but are di erent at . In contrast, they are equal at all other points of the real line,
with (
x0 + 1 if x0 >
lim f (x) = lim f (x) =
x!x+0 x!x0 x0 if x0 <
N
Let us consider the subcase L = +1 of (ii), leaving to the reader the subcase L = 1.
For this case, De nition 515 reduces to the following \"- " one.
lim f (x) = +1
x!x+
0
if, for every M > 0, there exists M > 0 such that, for every x 2 A,
We close this section with an example, from the introduction, in which both one-sided
limits (right and left) exist, but are di erent.
1 1
x x0 < M () x 2< =) >M
M x 2
Therefore
0<x 2< M =) f (x) > M
that is, limx!2+ f (x) = +1. On the other hand, for every x < 2 we have
1 1
x0 x< M () 2 x< =) < M
M x 2
Therefore
0<2 x< M =) f (x) < M
That is, limx!2 f (x) = 1. We conclude that the two one-sided limits exist but are
dramatically di erent. This formally proves (12.1), which was intuitively discussed in the
introduction. N
12.2. FUNCTIONS OF A SINGLE VARIABLE 369
Note that B" (x0 ) fx0 g is a neighborhood of x0 deprived of x0 itself, so \with a hole"
in the middle. The result requires that there exists at least one such neighborhood in A.
Clearly, if x0 2 A this amounts to require that x0 be an interior point of A. But the
hole permits x0 to be outside A. For instance, this is the case if we consider (again) the
function f (x) = 1= jx 2j and the point x0 = 2, which is outside the domain of f . We have
limx!2 f (x) = +1 and hence, by Proposition 521,
lim f (x) = lim f (x) = lim f (x) = +1
x!2 x!2+ x!2
So, by Proposition 521 the two-sided limit limx!2 f (x) does not exist.
Proof We prove the proposition for L ∈ R, leaving to the reader the case L = ±∞. Moreover, for simplicity we suppose that x0 is an interior point of A.
“If”. We show that lim_{x→x0−} f(x) = lim_{x→x0+} f(x) = L implies lim_{x→x0} f(x) = L. Let ε > 0. Since lim_{x→x0+} f(x) = L, there exists δ′ε > 0 such that, for every x ∈ (x0, x0 + δ′ε) ∩ A, we have |f(x) − L| < ε. On the other hand, since lim_{x→x0−} f(x) = L, there exists δ″ε > 0 such that for every x ∈ (x0 − δ″ε, x0) ∩ A we have |f(x) − L| < ε. Let δε = min{δ′ε, δ″ε}. Then

x ∈ (x0, x0 + δε) ∩ A ⟹ |f(x) − L| < ε    (12.15)

and

x ∈ (x0 − δε, x0) ∩ A ⟹ |f(x) − L| < ε    (12.16)

that is,

x0 ≠ x ∈ (x0 − δε, x0 + δε) ∩ A ⟹ |f(x) − L| < ε

Therefore, lim_{x→x0} f(x) = L.
“Only if”. We show that lim_{x→x0} f(x) = L implies lim_{x→x0−} f(x) = lim_{x→x0+} f(x) = L. Let ε > 0. Since lim_{x→x0} f(x) = L, there exists δε > 0 such that

x0 ≠ x ∈ (x0 − δε, x0 + δε) ∩ A ⟹ |f(x) − L| < ε    (12.17)

Since x0 is not a boundary point, both intersections (x0 − δε, x0) ∩ A and (x0, x0 + δε) ∩ A are non-empty. Therefore, (12.17) implies both (12.15) and (12.16), so lim_{x→x0+} f(x) = lim_{x→x0−} f(x) = L.
Example 522 For the function on rationals of Example 518, by Proposition 521 we have that

lim_{x→x0} f(x) = { x0 + 1   if x0 > α
                  { x0       if x0 < α

and that the two-sided limit lim_{x→α} f(x) does not exist. N
As the reader may have noted, when A is an interval the hypothesis Bε(x0) − {x0} ⊆ A of Proposition 521 forbids x0 to be a boundary point. Indeed, to fix ideas, assume that A is an interval of the real line with endpoints a < b. When x0 = a = inf A, it does not make sense to talk of the one-sided limit lim_{x→a−} f(x), while when x0 = b = sup A it does not make sense to talk of the one-sided limit lim_{x→b+} f(x). So, at the endpoints one of the one-sided limits becomes meaningless.
Interestingly, at the endpoints we instead have

lim_{x→a} f(x) = lim_{x→a+} f(x)   and   lim_{x→b} f(x) = lim_{x→b−} f(x)    (12.18)

Indeed, the definition of two-sided limit is perfectly satisfied: for each neighborhood V of L there exists a neighborhood (necessarily one-sided, because x0 is an endpoint) such that the images of f, except perhaps f(x0), fall in V.
A similar observation can be made, more generally, at each boundary point x0 of A. For instance, if A is a half-line [x0, +∞), the left limit at x0 is meaningless: for f(x) = √x and x0 = 0, the left limit lim_{x→0−} √x is meaningless.
Example 523 Let f : [0, ∞) → R be given by f(x) = √x. We just remarked that lim_{x→0−} f(x) is meaningless, while in Example 517 we saw that lim_{x→0+} f(x) = 0. By what we just noted, we can also write lim_{x→0} f(x) = 0. It is instructive to compute this two-sided limit directly, through Definition 503. Let ε > 0. As we saw in Example 517, we have

|f(x) − L| = √x < ε ⟺ x < ε²

Setting δε = ε², for every x ∈ A, that is, for every x ≥ 0, we have

0 < x < δε ⟹ |f(x) − 0| < ε

so that lim_{x→0} f(x) = 0. N
We write

lim_{x→x0} f(x) = L ∈ R̄    (12.19)

if, for every neighborhood Vε(L) of L, there exists a neighborhood Uδε(x0) of x0 such that

f((Uδε(x0) ∩ A) − {x0}) ⊆ Vε(L)

It is this version of the two-sided limit that the reader will find generalized to topological spaces in more advanced courses. A similar general version holds for one-sided limits, as the reader can check.
Definition 508 is the special case with n = 1. In the “ε-δ” version we have (12.19) if, for every ε > 0, there exists δε > 0 such that, for every x ∈ A,

0 < ‖x − x0‖ < δε ⟹ |f(x) − L| < ε

5 For brevity, we consider only the two-sided case x0 ∈ Rn, leaving the other cases to readers.
Set δε = ε/n. Since |Σ_{i=1}^n xi| ≤ Σ_{i=1}^n |xi|, we have

d(x, x0) < δε ⟺ √(Σ_{i=1}^n xi²) < ε/n ⟹ Σ_{i=1}^n xi² < ε²/n² ⟹ xi² < ε²/n² for all i = 1, 2, ..., n
⟹ |xi| = √(xi²) < ε/n for all i = 1, 2, ..., n ⟹ Σ_{i=1}^n |xi| < ε
⟹ d(f(x), 1) = |Σ_{i=1}^n xi| ≤ Σ_{i=1}^n |xi| < ε
As the reader can check, we can easily extend to functions of several variables the limits from above and from below (indeed, the limit L remains a scalar, not a vector). Moreover, the notion of limit can be easily extended to operators. But we postpone this to Chapter 13 (Definition 587), where we will study the continuity of operators, a topic that will motivate this further extension.
12.3.2 Directions
So far, so good. Too good, in a sense because the multivariable extension of the notion of
limit seems just a matter of upgrading the distance, from the absolute value jx x0 j between
scalars to the more general case of the norm kx x0 k between vectors. Formally, this is true
but one should not forget that, when n > 1, the condition kx x0 k < " controls many more
ways to approach a point. Indeed, in the real line there are only two ways to approach a
point x0 , the left direction and the right one. They are identi ed with and + in the next
gure, respectively.
Instead, in the plane { a fortiori, in a general space Rn { there are in nitely many directions
along which to approach a point x0 , as the gure illustrates:
Intuitively, condition (12.21) requires that, as x approaches x0 along all such directions, the
function f tends to the same value L. In other words, the behavior of f is consistent across
all such directions. If, therefore, there are two such directions along which f does not tend
to the same limit value, the function does not have a limit as x ! x0 . The following example
should clarify the issue.
Let us verify that lim_{(x1,x2)→(0,0)} f(x) does not exist. Consider two possible directions along which we can approach the origin: along the parabola x2 = x1² and along the straight line x2 = x1. Along the parabola,

lim_{(x1,x2)→(0,0)} f(x1, x2) = lim_{x1→0} f(x1, x1²) = lim_{x1→0} log(1 + x1³)/x1² = 0

while along the straight line

lim_{(x1,x2)→(0,0)} f(x1, x2) = lim_{x1→0} f(x1, x1) = lim_{x1→0} log(1 + x1²)/x1² = 1

Since f tends to two different limit values along the two directions, we conclude that lim_{(x1,x2)→(0,0)} f(x) does not exist.
We can prove this failure rigorously using Definition 525. Suppose, by contradiction, that the limit exists, that is,

lim_{(x1,x2)→(0,0)} f(x1, x2) = L

Set ε = 1/4. By definition of limit, there exists δ1 > 0 such that, for (0, 0) ≠ (x1, x2) ∈ Bδ1(0, 0), we have

d(f(x1, x2), L) < 1/4    (12.22)
Consider now the limit along the parabola: by setting

g(x) = log(1 + x³)/x²

one gets lim_{x1→0} g(x1) = 0. Therefore, by setting again ε = 1/4, there exists δ2 > 0 such that, for 0 ≠ x1 ∈ Bδ2(0) ⊆ R, we have

g(x1) ∈ (−ε, ε) = (−1/4, 1/4)

Now consider the neighborhood Bδ2(0, 0) ⊆ R² of (0, 0). Take a point on the parabola x2 = x1² that belongs to this neighborhood, that is, a point (0, 0) ≠ (x̂1, x̂1²) ∈ Bδ2(0, 0). We have x̂1 ∈ Bδ2(0),⁶ so

f(x̂1, x̂1²) = g(x̂1) ∈ (−1/4, 1/4)    (12.23)
Similarly, from the limit along the straight line, by setting

h(x) = log(1 + x²)/x²

one gets lim_{x1→0} h(x1) = 1. Therefore, setting again ε = 1/4, there exists δ3 > 0 such that, for 0 ≠ x1 ∈ Bδ3(0) ⊆ R, we have

h(x1) ∈ (1 − ε, 1 + ε) = (3/4, 5/4)

Now consider the neighborhood Bδ3(0, 0) ⊆ R² of (0, 0) and take a point of the straight line x2 = x1 that belongs to it, that is, a point (0, 0) ≠ (x̃1, x̃1) ∈ Bδ3(0, 0). We have x̃1 ∈ Bδ3(0), so that

f(x̃1, x̃1) = h(x̃1) ∈ (3/4, 5/4)    (12.24)
By (12.23) and (12.24), since the intervals (−1/4, 1/4) and (3/4, 5/4) are at distance 1/2, we have

d(f(x̂1, x̂1²), f(x̃1, x̃1)) > 1/2

On the other hand, by (12.22) and the triangle inequality,

d(f(x̂1, x̂1²), f(x̃1, x̃1)) ≤ d(f(x̂1, x̂1²), L) + d(L, f(x̃1, x̃1)) < 1/4 + 1/4 = 1/2

This contradiction shows that the limit lim_{(x1,x2)→(0,0)} f(x1, x2) does not exist. N
6 Indeed, d((x̂1, x̂1²), (0, 0)) < δ2, that is, x̂1² + x̂1⁴ < δ2², implies x̂1² < δ2², whence d(x̂1, 0) < δ2.
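The direction dependence can be explored numerically as well; the sketch below assumes that the function of the example is f(x1, x2) = log(1 + x1x2)/x1², the form consistent with the computations above:

import math

def f(x1, x2):
    # assumed form of the example's function: log(1 + x1*x2) / x1^2
    return math.log(1.0 + x1 * x2) / x1 ** 2

for t in [1e-1, 1e-2, 1e-3]:
    print(f"t = {t:g}: parabola f(t, t^2) = {f(t, t * t):.5f},  line f(t, t) = {f(t, t):.5f}")
# Along the parabola x2 = x1^2 the values approach 0, along the line x2 = x1
# they approach 1: no single limit exists at the origin.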
Proof Let us suppose, by contradiction, that there exist two different limits L′ ≠ L″. Let {xn} be a sequence in A, with eventually xn ≠ x0, such that xn → x0. By Proposition 528, f(xn) → L′ and f(xn) → L″, which contradicts the uniqueness of the limit for sequences. It follows that L′ = L″.
Alternative proof By contradiction, let us suppose that there exist two different limits L1 and L2, that is, L1 ≠ L2. We assume therefore that

lim_{x→x0} f(x) = L1   and   lim_{x→x0} f(x) = L2

with L1 ≠ L2. Without loss of generality, suppose that L1 > L2. There exists a number K such that L1 > K > L2. Setting 0 < ε1 < L1 − K and 0 < ε2 < K − L2, the neighborhoods Bε1(L1) = (L1 − ε1, L1 + ε1) and Bε2(L2) = (L2 − ε2, L2 + ε2) are disjoint.

8 That is, for the case x0 ∈ R̄ which, indeed, includes x → ±∞ as the special cases x → x0 = ±∞.
[Figure: the two disjoint neighborhoods (L1 − ε1, L1 + ε1) and (L2 − ε2, L2 + ε2)]
Since by hypothesis lim_{x→x0} f(x) = L1, given ε1 > 0 one can find δ1 > 0 such that

x0 ≠ x ∈ (x0 − δ1, x0 + δ1) ∩ A ⟹ f(x) ∈ (L1 − ε1, L1 + ε1)    (12.25)

Analogously, since by hypothesis lim_{x→x0} f(x) = L2, given ε2 > 0 one can find δ2 > 0 such that

x0 ≠ x ∈ (x0 − δ2, x0 + δ2) ∩ A ⟹ f(x) ∈ (L2 − ε2, L2 + ε2)    (12.26)

Taking δ = min{δ1, δ2}, the neighborhood (x0 − δ, x0 + δ) of x0 with radius δ is contained in the two previous neighborhoods, i.e., in (x0 − δ, x0 + δ) both (12.25) and (12.26) hold:

x0 ≠ x ∈ (x0 − δ, x0 + δ) ⟹ f(x) ∈ (L1 − ε1, L1 + ε1) and f(x) ∈ (L2 − ε2, L2 + ε2)

Hence, f(x) ∈ Bε1(L1) ∩ Bε2(L2), which is impossible because the two neighborhoods are disjoint. This contradiction proves that L1 = L2.
We continue with a version for functions of the theorem on the permanence of sign (Theorem 319). We leave to the reader the easy “sequential” proof based on Theorem 319 and on Proposition 528. We give, instead, a proof that directly uses the definition of limit.
Alternative proof Let L ≠ 0, say L > 0. Since lim_{x→x0} f(x) = L, by taking ε = L/2 > 0 there exists a neighborhood Bδε(x0) of x0 such that

x0 ≠ x ∈ Bδε(x0) ∩ A ⟹ f(x) ∈ (L − L/2, L + L/2) = (L/2, 3L/2)

Since L/2 > 0, we are done. For L < 0, the proof is similar.
Again we leave to the reader the easy “sequential” proof based on Theorem 338 and on Proposition 528, and give a proof based on the definition of limit.

Alternative proof Let ε > 0. We have to show that there exists δ > 0 such that f(x) ∈ (L − ε, L + ε) for every x0 ≠ x ∈ (x0 − δ, x0 + δ) ∩ A. Since lim_{x→x0} g(x) = L, there exists δ1 > 0 such that

x0 ≠ x ∈ (x0 − δ1, x0 + δ1) ∩ A ⟹ L − ε < g(x) < L + ε    (12.29)

Since lim_{x→x0} h(x) = L, there exists δ2 > 0 such that

x0 ≠ x ∈ (x0 − δ2, x0 + δ2) ∩ A ⟹ L − ε < h(x) < L + ε    (12.30)

By taking δ = min{δ1, δ2}, both (12.29) and (12.30) then hold in (x0 − δ, x0 + δ) ∩ A. By (12.27), we then have

L − ε < g(x) ≤ f(x) ≤ h(x) < L + ε   for every x0 ≠ x ∈ (x0 − δ, x0 + δ) ∩ A

that is,

f(x) ∈ (L − ε, L + ε)   for every x0 ≠ x ∈ (x0 − δ, x0 + δ) ∩ A

Since ε was arbitrary, we conclude that lim_{x→x0} f(x) = L.
The comparison criterion for functions has the same interpretation as the original version for sequences (Theorem 338). The next simple application of this criterion is similar, mutatis mutandis, to that seen in Example 339.
Example 534 Let f : R → R be given by f(x) = e^{x cos²(1/x)} and let x0 = 0. Since

0 ≤ cos²(1/x) ≤ 1   for every x ≠ 0

by the monotonicity of the exponential function we have

1 = e^{0·x} ≤ e^{x cos²(1/x)} ≤ e^{1·x} = e^x   for every x ≥ 0

Setting g(x) = 1 and h(x) = e^x, conditions (12.27) and (12.28) are satisfied with L = 1. Therefore, lim_{x→0} f(x) = 1. The proof for x < 0 is analogous. N
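A numerical view of the squeeze in Example 534 (an illustration only):

import math

def f(x):
    # f(x) = e^(x * cos^2(1/x)), squeezed between 1 and e^x for x > 0
    return math.exp(x * math.cos(1.0 / x) ** 2)

for x in [1e-1, 1e-3, 1e-6]:
    print(f"x = {x:g}: 1 <= f(x) = {f(x):.8f} <= e^x = {math.exp(x):.8f}")
# Both bounds tend to 1, forcing f(x) -> 1 as x -> 0+.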
As was the case for sequences, more generally also for functions the last two results establish properties of limits with respect to the underlying order structure. The next proposition, which extends Propositions 320 and 321 to functions, is yet another simple result of this kind.
Proposition 535 Let f, g : A ⊆ Rn → R be two functions, x0 ∈ Rn a limit point of A, and lim_{x→x0} f(x) = L ∈ R and lim_{x→x0} g(x) = H ∈ R.

(i) If f(x) ≥ g(x) in a neighborhood of x0, then L ≥ H.

(ii) If L > H, then there exists a neighborhood of x0 in which f(x) > g(x).
Observe that in (i) we can only say L ≥ H even when we have the strict inequality f(x) > g(x). For example, for the functions f, g : R → R given by

f(x) = { 1    if x = 0
       { x²   if x ≠ 0

and g(x) = 0 we have, for x → 0, L = H = 0 although f(x) > g(x) for every x ∈ R. Similarly, if f(x) = 1/x and g(x) = 0, for x → +∞ we have L = H = 0 although f(x) > g(x) for every x > 0.
As we did so far in this section, we leave the sequential proof (based on Propositions 320 and 321) to readers and give, instead, a proof based on the definition of limit.

Alternative proof (i) By contradiction, assume that L < H. Set ε = H − L, so that ε > 0. The neighborhoods (L − ε/4, L + ε/4) and (H − ε/4, H + ε/4) are disjoint since L + ε/4 < H − ε/4. Since lim_{x→x0} f(x) = L, there exists δ1 > 0 such that

x0 ≠ x ∈ (x0 − δ1, x0 + δ1) ⟹ f(x) ∈ (L − ε/4, L + ε/4)

Analogously, since lim_{x→x0} g(x) = H, there exists δ2 > 0 such that

x0 ≠ x ∈ (x0 − δ2, x0 + δ2) ⟹ g(x) ∈ (H − ε/4, H + ε/4)

By setting δ = min{δ1, δ2}, we have

x0 ≠ x ∈ (x0 − δ, x0 + δ) ⟹ L − ε/4 < f(x) < L + ε/4 < H − ε/4 < g(x) < H + ε/4

That is, f(x) < g(x) for every x ∈ Bδ(x0). This contradicts the hypothesis that f(x) ≥ g(x) in a neighborhood of x0.
(ii) We prove the contrapositive. It is enough to note that, if f(x) ≤ g(x) in every neighborhood of x0, then (i) implies L ≤ H.
(i) lim_{x→x0} (f + g)(x) = L + M, provided that L + M is not an indeterminate form (1.24), of the type

+∞ − ∞   or   −∞ + ∞

(ii) lim_{x→x0} (fg)(x) = LM, provided that LM is not an indeterminate form (1.25), of the type

±∞ · 0   or   0 · (±∞)

(iii) lim_{x→x0} (f/g)(x) = L/M, provided that g(x) ≠ 0 in a neighborhood of x0, with x ≠ x0, and L/M is not an indeterminate form (1.26), of the type¹⁰

±∞/±∞   or   a/0
Proof We prove only (i), leaving to the reader the analogous proofs of (ii) and (iii). Let {xn} be a sequence in A, with xn ≠ x0 for every n ≥ 1, such that xn → x0. By Proposition 528, f(xn) → L and g(xn) → M. Suppose that L + M is not an indeterminate form. By Proposition 333, (f + g)(xn) → L + M, and therefore, by Proposition 528, it follows that lim_{x→x0} (f + g)(x) = L + M.
Example 537 Let f, g : R − {0} → R be given by f(x) = sin x/x and g(x) = 1/|x|. We have lim_{x→0} sin x/x = 1 and lim_{x→0} 1/|x| = +∞. Therefore,

lim_{x→0} (sin x/x + 1/|x|) = 1 + ∞ = +∞
As for sequences, when a ≠ 0 the case a/0 of point (iii) is actually not an indeterminate form for the algebra of limits, as the following version for functions of Proposition 335 shows.

Proposition 538 Let lim_{x→x0} f(x) = L ∈ R̄, with L ≠ 0, and lim_{x→x0} g(x) = 0. The limit lim_{x→x0} (f/g)(x) exists if and only if there is a neighborhood U(x0) of x0 ∈ Rn where the function g has constant sign, except at most at x0. In this case:¹¹

9 For brevity, we focus on Proposition 333 and leave to the reader the analogous extension of Proposition 337.
10 As for sequences, to exclude the indeterminacy a/0 amounts to require M ≠ 0.
11 Here g → 0+ and g → 0− indicate that lim_{x→x0} g(x) = 0 with, respectively, g(x) ≥ 0 and g(x) ≤ 0 for every x0 ≠ x ∈ U(x0).
lim_{x→x0} f(x)/g(x) = +∞   if either L > 0 and g → 0+, or L < 0 and g → 0−

lim_{x→x0} f(x)/g(x) = −∞   if either L > 0 and g → 0−, or L < 0 and g → 0+
As in the previous section, we considered only limits at points x0 ∈ Rn. The reader can verify that for scalar functions the results of this section extend to the case x → ±∞.
Example 540 Take f(x) = 1/x − 1 and g(x) = 1/x. As x → +∞ we have f → −1 and g → 0. Since g(x) > 0 for every x > 0, so also in any neighborhood of +∞, we have g → 0+. Thanks to the version for x → ±∞ of Proposition 538, we have lim_{x→+∞} (f/g)(x) = −∞. N
Indeterminate form ∞ − ∞
For example, the limit lim_{x→0} (f + g)(x) of the sum of the functions f, g : R − {0} → R given by f(x) = 1/x² and g(x) = −1/x⁴ falls under the indeterminate form ∞ − ∞. We have

(f + g)(x) = 1/x² − 1/x⁴ = (1/x²)(1 − 1/x²)

and, therefore,

lim_{x→0} (f + g)(x) = lim_{x→0} 1/x² · lim_{x→0} (1 − 1/x²) = −∞

since (+∞) · (−∞) is not an indeterminate form. Exchanging the signs between these two functions, that is, by setting f(x) = −1/x² and g(x) = 1/x⁴, we have again the indeterminate form ∞ − ∞ at x0 = 0, but this time lim_{x→0} (f + g)(x) = +∞. Thus, also for functions the indeterminate forms can give completely different results: anything goes. So, they must be solved case by case.
Finally, note that these functions f and g give rise to an indeterminacy at x0 = 0, but not at x0 ≠ 0. Therefore, for functions it is crucial to specify the point x0 that we are considering. This is, indeed, the only novelty that the study of indeterminate forms of functions features relative to that of sequences (for which we only have the case n → +∞).
Indeterminate form 0 · ∞
For example, consider the functions f, g given by f(x) = (x − 3)² and g(x) = 1/(x − 3)⁴. The limit lim_{x→3} (fg)(x) falls under the indeterminate form 0 · ∞. But we have

lim_{x→3} (fg)(x) = lim_{x→3} (x − 3)² · 1/(x − 3)⁴ = lim_{x→3} 1/(x − 3)² = +∞

On the other hand, by considering f(x) = 1/(x − 3)² and g(x) = (x − 3)⁴, we have

lim_{x→3} (fg)(x) = lim_{x→3} (x − 3)⁴ · 1/(x − 3)² = lim_{x→3} (x − 3)² = 0

Again, only the direct calculation of the limit can determine its value: in the two cases the limits are completely different.
Indeterminate form 0/0
For the functions f(x) = x² and g(x) = x, at the point x0 = 0 we have the indeterminate form 0/0, but

lim_{x→0} (f/g)(x) = lim_{x→0} x²/x = lim_{x→0} x = 0

while, setting g(x) = x⁴, we still have an indeterminate form of the type 0/0 and

lim_{x→0} (f/g)(x) = lim_{x→0} x²/x⁴ = lim_{x→0} 1/x² = +∞

On the other hand, by taking f : R+ → R given by f(x) = x + √x − 2 and g : R − {1} → R given by g(x) = x − 1, we have

lim_{x→1} (f/g)(x) = lim_{x→1} (x + √x − 2)/(x − 1) = lim_{x→1} ((x − 1) + (√x − 1))/(x − 1)
 = 1 + lim_{x→1} (√x − 1)/(x − 1) = 1 + lim_{x→1} (√x − 1)/((√x − 1)(√x + 1))
 = 1 + lim_{x→1} 1/(√x + 1) = 1 + 1/2 = 3/2
We close with two observations: (i) as for sequences (Section 8.10.5), for functions the
various indeterminate forms can be reduced to one another; (ii) also in the case of functions
we can summarize what we have seen till now in tables similar to those in Section 8.10.4, as
readers can check.
lim_{x→x0} xⁿ = x0ⁿ

(iv) Let f : (0, ∞) → R be given by f(x) = log_a x, with a > 0, a ≠ 1. For every x0 > 0, we have lim_{x→x0} log_a x = log_a x0. Moreover,

lim_{x→0+} log_a x = −∞ if a > 1 and +∞ if a < 1;   lim_{x→+∞} log_a x = +∞ if a > 1 and −∞ if a < 1

(v) Let f, g : R → R be given by f(x) = sin x and g(x) = cos x. For every x0 ∈ R, we have lim_{x→x0} sin x = sin x0 and lim_{x→x0} cos x = cos x0. The limits lim_{x→±∞} sin x and lim_{x→±∞} cos x do not exist. N
Next we prove some classic limits for trigonometric functions (we already met the first one in the introduction of this chapter).
Proposition 542 Let f, g : R − {0} → R be defined by f(x) = sin x/x and g(x) = (cos x − 1)/x. Then

lim_{x→0} sin x/x = 1    (12.31)

and

lim_{x→0} (1 − cos x)/x = 0,   lim_{x→0} (1 − cos x)/x² = 1/2    (12.32)
Proof It is easy to see graphically that 0 < sin x < x < tan x for x ∈ (0, π/2) and that tan x < x < sin x < 0 for x ∈ (−π/2, 0). Therefore, by dividing all the terms by sin x and by observing that sin x > 0 when x ∈ (0, π/2) and sin x < 0 when x ∈ (−π/2, 0), we have in all cases

1 < x/sin x < 1/cos x

The first limit then follows from the comparison criterion. For the third one, it is sufficient to observe that

(1 − cos x)/x² = (1 − cos x)/x² · (1 + cos x)/(1 + cos x) = (1 − cos² x)/(x²(1 + cos x)) = (sin² x)/x² · 1/(1 + cos x)

and that, as x → 0, the first factor tends to 1 while the second one tends to 1/2. Finally, the second limit follows immediately from the third one:

(1 − cos x)/x = x · (1 − cos x)/x² → 0 · 1/2 = 0
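These classic limits are easy to probe numerically; a minimal sketch (an illustration, not a proof):

import math

for x in [1e-1, 1e-2, 1e-3]:
    print(f"x = {x:g}: sin(x)/x = {math.sin(x)/x:.8f}, "
          f"(1 - cos x)/x = {(1 - math.cos(x))/x:.8f}, "
          f"(1 - cos x)/x^2 = {(1 - math.cos(x))/x**2:.8f}")
# The three columns approach 1, 0 and 1/2, matching (12.31) and (12.32).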
Finally, from the analogous propositions that we proved for sequences, we easily deduce (the proofs are essentially identical) the following limits. If f(x) → 0 as x → x0, we have

lim_{x→x0} (a^{f(x)} − 1)/f(x) = log a    (12.33)

In particular,

lim_{x→0} (a^x − 1)/x = log a    (12.34)

which, when a = e, becomes

lim_{x→0} (e^x − 1)/x = 1    (12.35)
Similarly, if f(x) → 0 as x → x0, we have

lim_{x→x0} log_a(1 + f(x))/f(x) = 1/log a

In particular,

lim_{x→0} log_a(1 + x)/x = 1/log a

which, when a = e, becomes

lim_{x→0} log(1 + x)/x = 1

(iv) If f(x) → 0 as x → x0, we have

lim_{x→x0} ((1 + f(x))^α − 1)/f(x) = α

In particular,

lim_{x→0} ((1 + x)^α − 1)/x = α
N.B. The function u_γ : (0, ∞) → R defined by

u_γ(x) = { (x^{1−γ} − 1)/(1 − γ)   if γ ≠ 1
         { log x                   if γ = 1

is the classic CRRA (constant relative risk aversion) utility function, where the scalar γ is interpreted as a coefficient of relative risk aversion (see Pratt, 1964, p. 134). In view of the limit (12.34), we have¹²

lim_{γ→1} u_γ(x) = lim_{γ→1} (x^{1−γ} − 1)/(1 − γ) = log x = u₁(x)

In a similar vein, define u_λ : R → R by

u_λ(x) = { (1 − e^{−λx})/λ   if λ > 0
         { x                 if λ = 0

the classic CARA (constant absolute risk aversion) utility function, where the scalar λ is interpreted as a coefficient of absolute risk aversion (see Pratt, 1964, p. 130). In view of the limit (12.35), we have

lim_{λ→0} u_λ(x) = lim_{λ→0} (1 − e^{−λx})/λ = x · lim_{λ→0} (e^{−λx} − 1)/(−λx) = x = u₀(x)

Note that the scalars γ and λ index the utility functions; through them we are actually studying limit properties of functions of these parameters. O

12 Here 1 − γ plays the role of x in (12.34).
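A quick numerical check of the CRRA limit may be instructive; a minimal sketch (x = 2 is an arbitrary consumption level):

import math

def u_crra(x, gamma):
    # CRRA utility for gamma != 1: (x^(1 - gamma) - 1) / (1 - gamma)
    return (x ** (1.0 - gamma) - 1.0) / (1.0 - gamma)

x = 2.0
for gamma in [0.9, 0.99, 0.999]:
    print(f"gamma = {gamma}: u_gamma({x:g}) = {u_crra(x, gamma):.6f}")
print(f"log({x:g}) = {math.log(x):.6f}")
# As gamma -> 1 the CRRA values approach log x, in line with the limit above.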
(i) If

lim_{x→x0} f(x)/g(x) = 0

we say that f is negligible with respect to g as x → x0; in symbols,

f = o(g) as x → x0

(ii) If

lim_{x→x0} f(x)/g(x) = k ≠ 0    (12.36)

we say that f is comparable with g as x → x0; in symbols,

f ≍ g as x → x0

(iii) In particular, if

lim_{x→x0} f(x)/g(x) = 1

we say that f and g are asymptotic (or asymptotically equivalent) to one another as x → x0 and we write

f(x) ∼ g(x) as x → x0
It is easy to see that also for functions the relations ≍ and ∼ continue to satisfy the properties seen in Section 8.14 for sequences, i.e.,

(i) the relations of comparability and of asymptotic equivalence are symmetric and transitive;

(iii) if lim_{x→x0} f(x) and lim_{x→x0} g(x) are both finite and non-zero, then f ≍ g as x → x0;
We now consider the cases, which also for functions continue to be the most interesting ones, in which both functions either converge to zero or diverge to ±∞. We start with convergence to zero: lim_{x→x0} f(x) = lim_{x→x0} g(x) = 0. In this case, intuitively, f is negligible with respect to g as x → x0 if it tends to zero faster. Let, for example, x0 = 1, f(x) = (x − 1)² and g(x) = x − 1. We have

lim_{x→1} (x − 1)²/(x − 1) = lim_{x→1} (x − 1) = 0

so that f = o(g) as x → 1.
In sum, also for functions the meaning of negligibility must be specified according to whether we consider convergence to zero or divergence to infinity. Moreover, the point x0 where we take the limit is key, as already remarked several times (repetita iuvant, hopefully).
Proposition 544 For every pair of functions f and g and for every scalar c ≠ 0, we have:

(i) o(f) + o(f) = o(f);

(ii) o(f) o(g) = o(fg);

(iii) c o(f) = o(f);

(iv) o(g) + o(f) = o(f) whenever g = o(f).

We omit the proof because it is similar, mutatis mutandis, to that of Proposition 353. Also the comments we made about that proposition still apply, in particular about the important special case o(f)o(f) = o(f²) of point (ii).
Example 545 Let f(x) = xⁿ, with n > 2. Consider the two functions g(x) = x^{n−1} and h(x) = e^{−x} − 3x^{n−1}. It is easy to check that g = o(f) = o(xⁿ) and h = o(f) = o(xⁿ) as x → +∞.

(i) Summing the two functions we obtain g + h = e^{−x} − 2x^{n−1}, which is still o(xⁿ) as x → +∞, in accordance with Proposition 544-(i).

(iii) Set c = 3 and consider c g = 3x^{n−1}. It is easy to check that 3x^{n−1} is still o(xⁿ) as x → +∞, in accordance with Proposition 544-(iii).

(iv) Consider the function l(x) = x + 1. It is easy to check that l = o(g) = o(x^{n−1}) as x → +∞. Consider now the sum l + h, which is the sum of a o(g) and of a o(f), with g = o(f). We have l + h = x + 1 + e^{−x} − 3x^{n−1}, which is o(xⁿ) as x → +∞, i.e., o(f), in accordance with Proposition 544-(iv). Note that l + h is not o(g), even if l = o(g). N
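A numerical look at point (i) of Example 545, with n = 3 (the choice of n is arbitrary):

import math

n = 3
f = lambda x: x ** n
g_plus_h = lambda x: math.exp(-x) - 2 * x ** (n - 1)   # g + h from point (i)

for x in [10.0, 100.0, 1000.0]:
    print(f"x = {x:g}: (g + h)(x)/f(x) = {g_plus_h(x) / f(x):.6f}")
# The ratio tends to 0, confirming that g + h = o(x^n) as x -> +infinity.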
The next proposition presents some classic instances of functions with different rates of divergence.

(i) x^k = o(α^x) as x → +∞ for every α > 1 and k > 0, that is,

lim_{x→+∞} x^k/α^x = 0

(ii) x^h = o(x^k) as x → +∞ if h < k;

(iii) log_a x = o(x^k) as x → +∞ for every k > 0, that is,

lim_{x→+∞} log_a x/x^k = 0
By the transitivity property of the negligibility relation, from (i) and (iii) it follows that

log_a x = o(α^x) as x → +∞

Proof For each of the three functions α^x, x^k and log_a x, which are increasing, one has f(n − 1) ≤ f(x) ≤ f(n), where n − 1 = [x] is the integer part of x. It is then sufficient to use the sequential characterization of the limit of a function and the comparison criterion.
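The different divergence rates are visible numerically; a small sketch that compares the functions on a logarithmic scale to avoid overflow (α = 1.1 and k = 10 are arbitrary choices):

import math

alpha, k = 1.1, 10.0
for x in [1e2, 1e3, 1e4]:
    log_ratio = k * math.log(x) - x * math.log(alpha)   # log of x^k / alpha^x
    print(f"x = {x:g}: log(x)/x^0.5 = {math.log(x)/math.sqrt(x):.3e}, "
          f"log(x^k/alpha^x) = {log_ratio:.1f}")
# The first ratio vanishes and the second logarithm diverges to -infinity:
# log x = o(x^(1/2)) and x^k = o(alpha^x) as x -> +infinity.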
That is, two functions asymptotic to one another as x → x0 have the same limit as x → x0. In particular, we have the following version for functions of Lemma 355.¹⁴ If f(x) ∼ g(x) and h(x) ∼ l(x) as x → x0, then:

(i) f(x)h(x) ∼ g(x)l(x) as x → x0;

(ii) f(x)/h(x) ∼ g(x)/l(x) as x → x0, provided that h(x) ≠ 0 and l(x) ≠ 0 at every point x ≠ x0 of a neighborhood Bε(x0).

Therefore,

lim_{x→x0} f(x) = L ⟺ lim_{x→x0} (f(x) + o(f(x))) = L

and

(f(x) + o(f(x)))/(g(x) + o(g(x))) ∼ f(x)/g(x)   as x → x0    (12.39)
To illustrate, let us set f(x) = x³ and g(x) = x. As x → +∞, we have

2x^{1/2} + 5x^{1/3} = o(f)   and   3 + 3x^{1/2} = o(g)

so that, by (12.39),

(x³ + 2x^{1/2} + 5x^{1/3})/(x + 3 + 3x^{1/2}) ∼ x³/x = x² → +∞ as x → +∞
(iii) Consider the limit

lim_{x→0} (1 − cos x)/(sin² x + x³)

Since 1 − cos x ∼ x²/2 and sin² x + x³ ∼ x² as x → 0, by (12.39) the limit equals 1/2.
12.7.3 Terminology
Here too, for the comparison of two functions that both either converge to 0 or diverge to ±∞, there is a specific terminology. In particular,

(i) a function f is said to be infinitesimal at x0 if lim_{x→x0} f(x) = 0;

(ii) a function f is said to be infinite at x0 if lim_{x→x0} f(x) = ±∞;

(iii) if two functions f and g are infinitesimal at x0 and such that f = o(g) as x → x0, then f is said to be infinitesimal of higher order at x0 with respect to g;
(iv) if two functions f and g are infinite at x0 and such that f = o(g) as x → x0, then f is said to be infinite of lower order with respect to g.

A function is, therefore, infinitesimal of higher order than another one if it tends to zero faster, while it is infinite of lower order if it tends to infinity more slowly.
(ii) x^k = o(α^x) for every α > 1 and k > 0, as already proved with the ratio criterion. If instead 0 < α < 1 and k > 0, then α^x = o(x^k).

(iii) If k1 > k2 > 0, then x^{k2} = o(x^{k1}); indeed, x^{k2}/x^{k1} = x^{k2−k1} → 0.

The previous results can be organized in scales of infinities and infinitesimals, in analogy with what we saw for sequences. For brevity we omit the details.
Chapter 13
Continuous functions
Ibis redibis, non morieris in bello (you will go, you will return, you will not die in war).
So the oracle muttered to the inquiring king, who had to decide whether to go to war. Or,
maybe, the oracle actually said: ibis redibis non, morieris in bello (you will go, you will not
return, you will die in war). A small change in a comma, a dramatic di erence in meaning.
When small changes have large e ects, instability may result: a small change may, sud-
denly, dramatically alter matters. In contrast, stability prevails when small changes can only
have small e ects, so nothing dramatic can happen because of small alterations. Continuity
is the mathematical translation of this general principle of stability for the relations between
dependent and independent variables that functions represent.
13.1 Generalities
Intuitively, a function is continuous when the relation between the independent variable x and the dependent variable y is “regular”, without breaks. The graph of a continuous function can be drawn without ever lifting the pencil.
This means that a function is continuous at a point x0 of the domain if the behavior
towards x0 of the function is consistent with the value f (x0 ) that it actually assumes at x0 ,
that is, if the limit limx!x0 f (x) is equal to the image f (x0 ).
f(x) = { √x   for x ≥ 0
       { −1   for x = −1

Here x0 = −1 is an isolated point of the domain. Hence, we can (conveniently) say that f is continuous at every point of its domain.
f(x) = { x   for x < 1
       { 2   for x = 1      (13.2)
       { 1   for x > 1
The function f is, thus, not continuous at the point x0 = 1 because there is no consistency between its behavior at the limit and its value at x0. On the other hand, f is continuous at all the other points of its domain: indeed, it is immediate to verify that lim_{x→x0} f(x) = f(x0) for every x0 ≠ 1, so f does not exhibit other jumps besides the one at x0 = 1.
The distinction between limit points and isolated points becomes superfluous for the important case of functions f : I → R defined on an interval I of the real line. Indeed, the points of any such interval (be it bounded or unbounded; closed, open or semi-closed) are always limit points, so that f is continuous at any x0 ∈ I if lim_{x→x0} f(x) = f(x0).
Let us give some examples. We start by observing that elementary functions are continuous.
2 The condition xn ≠ x0 of Proposition 528 does not appear here because x0 belongs to A.
Example 553 (i) Let f : (0, ∞) → R be given by f(x) = log x. Since lim_{x→x0} log x = log x0 for every x0 > 0, the function is continuous.
(ii) Let f : R → R be given by f(x) = a^x, with a > 0. Since lim_{x→x0} a^x = a^{x0} for every x0 ∈ R, the function is continuous.
(iii) Let f, g : R → R be given by f(x) = sin x and g(x) = cos x. Since lim_{x→x0} sin x = sin x0 and lim_{x→x0} cos x = cos x0, both functions are continuous.
(iv) The function

f(x) = 1/(x² − 2)

is continuous:
N
Example 556 (i) The Dirichlet function is not continuous at any point of its domain because lim_{x→x0} f(x) does not exist for any x0 ∈ R (Example 505).
(ii) The function f : R → R given by

f(x) = { sin(1/x)   if x ≠ 0
       { 0          if x = 0      (13.6)

is continuous everywhere except at the origin, where it does not have a limit:
N
Example 557 Define f : R → R by

f(x) = { 2x + b    if x ≤ 2
       { 4 − x²    if x > 2      (13.7)

For which values of b is f continuous at x0 = 2 (so, on its domain)? To answer this question, it is necessary to find the value of b such that

lim_{x→2−} f(x) = lim_{x→2+} f(x) = f(2)

Since lim_{x→2−} f(x) = 4 + b = f(2) and lim_{x→2+} f(x) = 4 − 4 = 0, continuity at x0 = 2 requires 4 + b = 0, that is, b = −4. N
Continuity at x0 means that lim_{x→x0} f(x) = f(x0), so that f and lim become exchangeable; in other words, limits can be taken inside the argument of the function. This exchangeability is the essence of the concept of continuity.
By proceeding as in Example 526, we can verify that lim_{x→x0} f(x) = f(x0) for every x0 ∈ Rn. The function is, therefore, continuous.
(ii) The function

f(x1, x2) = x1² + 1/x2

is continuous at each point of its natural domain

A = {x = (x1, x2) ∈ R² : x2 ≠ 0}
In reading Proposition 552 one should not forget that, as emphasized in Section 12.3.2, in the multivariable case there are infinitely many directions along which to approach a point x0. The next example is a stark illustration of this remark.
This function is separately continuous, that is, it is continuous in each variable once the other is fixed (so that the function becomes a scalar function of the “free” variable).⁴ Indeed, by setting x2 = k we have

lim_{x→x0} f(x, k) = lim_{x→x0} xk²/(x² + k⁴) = x0k²/(x0² + k⁴) = f(x0, k)   for every k ∈ R

and, by setting x1 = k,

lim_{x→x0} f(k, x) = lim_{x→x0} kx²/(k² + x⁴) = kx0²/(k² + x0⁴) = f(k, x0)   for every k ∈ R
3 With his sharp mind, Giuseppe Peano was a master of counterexamples (a few of them entered calculus folklore).
4 See Section 20.4.1 for more on this.
So, the function behaves according to continuity as we approach a point (x1, x2) of the plane along the vertical and horizontal directions:
Yet, the function is not continuous at the origin 0 = (0, 0). Indeed, let us approach the origin along the parabola (x1, x2) = (t², t), with t ∈ R. We have

lim_{t→0} f(t², t) = lim_{t→0} t⁴/(t⁴ + t⁴) = 1/2 ≠ 0 = f(0)

So, if we take the sequence xn = (1/n², 1/n), we have xn → 0 but not f(xn) → f(0). By Proposition 552, the function is not continuous at the origin. N
O.R. Naively, we could claim that a function such as f(x) = 1/x has a (huge) discontinuity at x = 0. After all, it makes a “big jump” by passing from −∞ to +∞.⁵

5 Later in the book, Example 1769 will present another example of this kind.
In contrast, the function g(x) = log x does not suffer from any such problem, so it seems “more continuous”:
If we pay close attention to these two functions, however, we realize that 1/x commits the little sin of not being defined for x = 0, an “original” sin, while log x commits the much more serious sin of being defined neither at x = 0 nor at any x < 0.
The truth is that, at the points at which a function is not defined, it is meaningless to wonder about its continuity:⁶ continuity is a property of a function that can be considered only at points where the function is defined. At such points, the functions 1/x and log x are both continuous. H

6 It would be like asking whether green pigs are able to fly: they do not exist, so the question is meaningless.
13.2 Discontinuity
As the previous examples indicate, for functions of a single variable f : I → R there are different types of discontinuity.⁷ Specifically, f may be not continuous at x0 because:

(i) the two-sided limit at x0 exists finite but is different from f(x0);

(ii) the one-sided limits at x0 exist finite but are different;

(iii) at least one of the one-sided limits at x0 either is ±∞ or does not exist.

For example, the discontinuity at the origin of the function f(x) = 1/x is of type (iii) because

lim_{x→0−} f(x) = −∞ ≠ lim_{x→0+} f(x) = +∞

and so is the discontinuity at the origin of f(x) = 1/x² (there the two-sided limit exists, but it is infinite). Finally, the discontinuity at each point x0 ∈ R of the Dirichlet function is also of type (iii) because it is easy to see that its one-sided limits do not exist; the same is true for the discontinuity at the origin of function (13.6). For convenience, next we line up these discontinuity examples:
[Figure: the graphs of the previous discontinuity examples, lined up]
7 Recall that f(x0) ∈ R: we cannot have f(x0) = ±∞. To ease matters, in this section we focus on functions f : I → R with interval domains.
(ii) Define f : R → R by

f(x) = { (x² − 1)/(x − 1)   if x ≠ 1
       { 0                 if x = 1
Such a fixing is, in general, no longer possible for non-removable discontinuities, in particular for jump discontinuities. To see this, let f : I → R be a function with a jump discontinuity at an interior point x0 of I. The jump (or saltus) of f at x0 is given by the difference

lim_{x→x0+} f(x) − lim_{x→x0−} f(x)

There is no easy fix for this non-removable discontinuity. To restore continuity one has to play with the domain by considering suitable restrictions that get rid of the problematic points, as we did in Example 13.5 by getting rid of the discontinuity point x0 = 1.
Thus, jumps at interior points have no easy fix. Matters are, instead, simpler at those endpoints of the interval I that belong to it. In this case, the jump would be

lim_{x→a+} f(x) − f(a)

at the lower endpoint x0 = a = min I, and

f(b) − lim_{x→b−} f(x)

at the upper endpoint x0 = b = max I. This kind of jump discontinuity is actually removable. Formally, this is the case because at endpoints one-sided and two-sided limits coincide.⁸ Intuitively, it is enough to look at the graph of the function, as the next example illustrates.
Example 562 When I = [a, b] = [0, 1], the possible jumps at the endpoints are illustrated by the following graph:
[Figure: graph of f on [0, 1] with jumps at both endpoints]
This function is continuous on (0, 1) but not at the endpoints, where it has the jumps

lim_{x→0+} f(x) − f(a) = 1/2   and   f(b) − lim_{x→1−} f(x) = 1/2

Clearly, the fixed version f̃ : [0, 1] → R of the function f is the identity function given by f̃(x) = x. N
Summing up, the distinction between removable and jump discontinuities is sharp for functions defined on open intervals, with jump discontinuity being a significantly more severe type of discontinuity. For functions defined on general intervals, the distinction is relevant only at interior points because a jump discontinuity at an endpoint is actually removable.
8 Recall (12.18) of Section 12.2.3.
for each pair of distinct points x0 < y0 of the domain of f. Therefore, these limits exist finite, which excludes discontinuities of type (iii).
Moreover, f cannot have removable discontinuities at interior points of I because they would violate monotonicity. Therefore, a monotone function can only have jump discontinuities. Indeed, the next result shows that a monotone function can have at most countably many jump discontinuities. The proof of this useful result is based on the following lemma, which is of independent interest.
Lemma 563 A collection of pairwise disjoint (non-degenerate) intervals of the real line is at most countable.

Proof Let {Ij}_{j∈J} be a collection of disjoint intervals of R. By the density of the rational numbers, each interval Ij contains a rational number qj. Since these intervals are disjoint, we have qj ≠ qj′ for j ≠ j′. Then, the set of rational numbers {qj}_{j∈J} is a subset of Q and is, therefore, at most countable. In turn, this implies that the index set J is at most countable.
The disjointness hypothesis cannot be removed: for instance, the set of overlapping intervals {(−r, r) : r > 0} is clearly uncountable.
Proposition 564 A monotone function f : I → R can have at most countably many jump discontinuities.

Proof Consider first the open interval I = (a, b) with a, b ∈ R̄ (i.e., bounded or not). A jump discontinuity of the function f at a point x0 determines a bounded interval in its codomain (i.e., in the real line) with endpoints given by lim_{x→x0−} f(x) and lim_{x→x0+} f(x). By the monotonicity of f, the intervals determined by different jumps are disjoint. By Lemma 563, these intervals, and therefore the jumps of f, are at most countably many. This proves the result when I = (a, b).
Now, let I be any interval. Its interior has the form (a, b), with a, b ∈ R̄. By what was just proved, f has at most countably many jump discontinuities on (a, b). This observation readily implies the result (why?).
In the proof the monotonicity hypothesis is key for having countably many discontinuities: it guarantees that the intervals defined by the jumps of the function do not overlap.
Proof We prove (i), leaving to the reader the other points. Since lim_{x→x0} f(x) = f(x0) ∈ R and lim_{x→x0} g(x) = g(x0) ∈ R, Proposition 536-(i) yields

lim_{x→x0} (f + g)(x) = f(x0) + g(x0) = (f + g)(x0)

Therefore, f + g is continuous at x0.
Proof Let {xn} ⊆ A be such that xn → x0. By Proposition 552, f(xn) → f(x0). Since g is continuous at f(x0), another application of Proposition 552 shows that g(f(xn)) → g(f(x0)). Therefore, g ∘ f is continuous at x0.
As the next example shows, the result can be useful also in the computation of limits since, when its hypotheses hold, we can write

lim_{x→x0} g(f(x)) = g(lim_{x→x0} f(x))    (13.10)

Consider, for instance, the limit

lim_{x→π} sin(x²/(x + π))

Indeed, once we observe that it can be written in terms of g ∘ f (with f(x) = x²/(x + π) and g = sin), then by (13.10) we have

lim_{x→π} sin(x²/(x + π)) = lim_{x→π} (g ∘ f)(x) = (g ∘ f)(π) = sin(π²/2π) = sin(π/2) = 1

Therefore, continuity permits us to calculate limits by substitution. N
13.4.1 Zeros
The first result, Bolzano's Theorem,⁹ is very intuitive. Yet its proof, although simple, is not trivial, showing how statements that are intuitive may happen to be difficult to prove. Intuition is a fundamental guide in the search for new results, but it may be misleading. Sometimes, properties that appeared to be intuitively true turned out to be false.¹⁰ For this reason, the proof is the unique way of establishing the validity of a result; intuition, even the most refined one, must at a certain point give way to the rigor of the mathematical argument.
Proof If f(a)f(b) = 0, we have f(a) = 0 or f(b) = 0. In the first case, the result holds by setting c = a; in the second case, by setting c = b. If instead f(a)f(b) < 0, we have either f(a) < 0 < f(b) or f(a) > 0 > f(b). We first study the case f(a) < 0 < f(b). Let

C = {x ∈ [a, b] : f(x) < 0}
9 The result is named after Bernard Bolzano, who gave a first proof in 1817.
10 Recall Guidi's crescendo in Section 10.3.2.
In words, C is the set of points where f is strictly negative. This set is not empty because a ∈ C. Let c = sup C. By Proposition 127,

c ≥ x   for every x ∈ C    (13.11)

and

for every ε > 0 there exists x′ ∈ C such that x′ > c − ε    (13.12)
We next prove that f(c) = 0. By contradiction, assume that f(c) ≠ 0, i.e., that either f(c) < 0 or f(c) > 0. Let us study separately the two possibilities and show that in both of them we reach a contradiction.
Case 1: f(c) < 0. Hence, c < b because f(b) > 0. Since f is continuous, by Theorem 532 (permanence of sign) there exists δ > 0 small enough so that¹¹

f(x) < 0   for every x ∈ (c − δ, c + δ) ∩ [a, b]

Since c < b, we can actually take δ small enough so that also c + δ ≤ b. Thus, (c, c + δ) ⊆ [a, b]. With this, take any x ∈ (c, c + δ). By the definition of C, we have x ∈ C. As x > c, this contradicts property (13.11), so the fact that c is a supremum. We thus reached a contradiction.
Case 2: f(c) > 0. Again by Theorem 532 there exists δ > 0 small enough so that

f(x) > 0   for every x ∈ (c − δ, c + δ) ∩ [a, b]

By (13.12), there exists x′ ∈ C such that x′ > c − δ; since x′ ≤ c, it belongs to (c − δ, c + δ) ∩ [a, b], so f(x′) > 0, which contradicts x′ ∈ C.
Summing up, in both cases we reached a contradiction. We conclude that f(c) = 0. This completes the proof when f(a) < 0 < f(b). Next, assume that f(a) > 0 > f(b). Consider the continuous function −f. Clearly, (−f)(a) < 0 < (−f)(b). So, by what has been just proved, there exists c ∈ [a, b] such that (−f)(c) = 0. Hence, f(c) = 0, as desired.
Finally, if f is strictly monotone, it is injective (Proposition 218) and therefore there exists a unique point c ∈ [a, b] such that f(c) = 0.
Bolzano's Theorem is a central result in the study of equations, as it will be seen momentarily in Chapter 14. To preview this role, here we show a simple application to the existence of real solutions of a polynomial equation. To this end, let f : R → R be a polynomial

f(x) = α0 + α1x + α2x² + ··· + αnxⁿ    (13.13)

and consider the polynomial equation

f(x) = 0    (13.14)
11 For a function f : [a, b] → R continuous at a point x0 ∈ [a, b], Theorem 532 ensures the existence of a neighborhood Bδ(x0) = (x0 − δ, x0 + δ) such that f(x)f(x0) > 0 for all x ∈ Bδ(x0) ∩ [a, b]. Indeed, by the continuity of f at x0 we can take L = f(x0) in the statement of Theorem 532 (the use of ε or δ is, instead, just an inconsequential choice of a convenient notation).
just an inconsequential choice of a convenient notation).
13.4. ZEROS AND EQUILIBRIA 409
of degree n. This equation does not always have real solutions: for example, when f (x) =
x2 + 1 the polynomial equation (13.14) becomes a second-degree equation
x2 + 1 = 0
that has no real solutions. By Bolzano's Theorem, we have the following result that guar-
antees that each polynomial equation of odd degree { e.g., a cubic equation { has always at
least a real solution.
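Bolzano's Theorem also underlies a concrete procedure for locating a zero: repeatedly halve an interval at whose endpoints f takes opposite signs. A minimal sketch (the cubic chosen here is only an illustration):

def bisect(f, a, b, tol=1e-10):
    # Bolzano: f continuous with f(a) * f(b) <= 0 guarantees a zero in [a, b].
    fa = f(a)
    assert fa * f(b) <= 0, "need opposite signs at the endpoints"
    while b - a > tol:
        c = (a + b) / 2
        if fa * f(c) <= 0:
            b = c            # a zero lies in [a, c]
        else:
            a, fa = c, f(c)  # a zero lies in [c, b]
    return (a + b) / 2

root = bisect(lambda x: x**3 - 2*x - 5, 2.0, 3.0)
print(root)  # about 2.0945514815, a real root of this odd-degree polynomial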
O.R. In presenting Bolzano's Theorem, we remarked on the limits of intuition. A nice example in this regard is the following. Imagine you put a rope around the Earth at the equator (about 40,000 km) so that it perfectly adheres to the equator at each point. Now, imagine that you add one meter to the rope and you lift it by keeping its distance from the ground uniform. What is the measure of this uniform distance? We are all tempted to say “very, very small: one meter out of forty thousand km is nothing!” Instead, no: the distance is 16 cm. Indeed, if c denotes the equatorial Earth circumference (in meters), the Earth radius is r = c/2π; if we add one meter, the new radius is r′ = (c + 1)/2π and the difference between the two is r′ − r = 1/2π ≈ 0.1592. This proves another remarkable result: the distance of about 16 centimeters is independent of c: no matter whether it is the Earth, or the Sun, or a tennis ball, the addition of one meter to the length of the rope always causes a lift of 16 cm! As the manifesto of the Vienna Circle remarked, “Intuition ... is especially emphasized by metaphysicians as a source of knowledge.... However, rational justification has to pursue all intuitive knowledge step by step. The seeker is allowed any method; but what has been found must stand up to testing.” H
13.4.2 Equilibria
The next result is a further consequence of Bolzano's Theorem, with a remarkable economic application: the existence and the uniqueness of the market equilibrium price.

Proposition 570 Let f, g : [a, b] → R be continuous. If f(a) ≥ g(a) and f(b) ≤ g(b), there exists c ∈ [a, b] such that

f(c) = g(c)

If f is strictly decreasing and g is strictly increasing, such c is unique.
Proof Define h : [a, b] → R by h(x) = f(x) − g(x). Then h(a) ≥ 0 ≥ h(b), so h(a)h(b) ≤ 0. Since h is continuous, by Bolzano's Theorem there exists c ∈ [a, b] such that h(c) = 0, that is, f(c) = g(c).
If f is strictly decreasing and g is strictly increasing, then h is strictly decreasing. Therefore, again by Bolzano's Theorem, c is unique.
We now apply the result to establish the existence and uniqueness of the market equilibrium price. Let D : [a, b] → R and S : [a, b] → R be the demand and supply functions of some good, where [a, b] ⊆ R+ is the set of the prices at which the good can be traded (see Section 8.4). A pair (p, q) ∈ [a, b] × R+ of prices and quantities is called a market equilibrium if

q = D(p) = S(p)

A fundamental problem is the existence, and the possible uniqueness, of such an equilibrium. By Proposition 570, so ultimately by Bolzano's Theorem, we can solve the problem in a very general way. Let us assume that the functions D and S are both continuous, so that neither side of the market abruptly responds to price changes, with

D(a) ≥ S(a)   and   D(b) ≤ S(b)

At the smallest possible price a, the demand for the good is greater than its supply, while the opposite is true at the highest possible price b. These natural hypotheses ensure, by Proposition 570, the existence of an equilibrium price p ∈ [a, b], i.e., such that D(p) = S(p). The equilibrium quantity is q = D(p) = S(p). Therefore, the pair of prices and quantities (p, q) is a market equilibrium.
Moreover, again by Proposition 570, the market has a unique market equilibrium (p, q) when the demand function D is strictly decreasing (i.e., at greater prices, smaller quantities are demanded) and the supply function S is strictly increasing (i.e., at greater prices, greater quantities are offered).
Because of its importance, we state formally this market equilibrium result.

Proposition 571 Let D : [a, b] → R and S : [a, b] → R be continuous. If D(a) ≥ S(a) and D(b) ≤ S(b), there exists a market equilibrium (p, q) ∈ [a, b] × R+. If, in addition, D is strictly decreasing and S is strictly increasing, such an equilibrium is unique.
The next figure illustrates graphically the result, which corresponds to the classic “intersection” of demand and supply:
[Figure: demand function D and supply function S crossing at the equilibrium price]
In equilibrium analysis, Bolzano's Theorem is often directly applied using the excess demand function E : [a, b] → R defined by

E(p) = D(p) − S(p)

We have E(p) ≥ 0 when at the price p the demand exceeds the supply; otherwise, we have E(p) ≤ 0. Therefore, p ∈ [a, b] is an equilibrium price if and only if it solves the market equation

E(p) = 0    (13.15)

i.e., if and only if p equalizes demand and supply. The equilibrium price p is a zero of the excess demand function; the conditions on the functions D and S assumed in Proposition 571 guarantee the existence and uniqueness of such a zero.
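To illustrate, here is a minimal sketch that finds the zero of an excess demand function by bisection; the linear demand and supply specifications are hypothetical, chosen only so that D(a) ≥ S(a) and D(b) ≤ S(b) hold:

# Hypothetical linear demand and supply on the price interval [a, b] = [0, 2].
D = lambda p: 10.0 - 4.0 * p        # strictly decreasing demand
S = lambda p: 2.0 + 4.0 * p         # strictly increasing supply
E = lambda p: D(p) - S(p)           # excess demand

a, b = 0.0, 2.0
assert E(a) >= 0 >= E(b)            # the hypotheses of Proposition 571
while b - a > 1e-10:
    c = (a + b) / 2
    a, b = (c, b) if E(c) > 0 else (a, c)
p_bar = (a + b) / 2
print(f"equilibrium price p = {p_bar:.6f}, quantity q = {D(p_bar):.6f}")
# Here E(p) = 8 - 8p, so p = 1 and q = 6: the unique market equilibrium.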
A final observation: the reader can easily verify that Proposition 570 holds as long as (i) the monotonicity properties of f and g are opposite (one is increasing and the other decreasing) and (ii) at least one of them is strict. In the statement we assumed f to be strictly decreasing and g to be strictly increasing both for simplicity and in view of the application to market equilibrium.
It is easy to see that f has no maximizers. Indeed, first observe that the endpoints are not maximizers as f(x) > 1/2 for each x ∈ (1/2, 1). Now, suppose that x ∈ (1/2, 1) is a candidate maximizer. As there exists y with x < y < 1, we have f(y) = y > x = f(x) and so the candidate fails. We conclude that there are no maximizers. A similar argument shows that there are no minimizers as well.
(ii) Let f : (0, 1) → R be the identity function f(x) = x defined over the open unit interval. Here f is continuous but the interval (0, 1) is not compact (it is open). In this case, too, the function has no maximizers and no minimizers.
(iii) Let f : [0, ∞) → R be again the identity function f(x) = x, now defined on the positive half-line. This function is continuous but the interval [0, ∞) is not compact (it is closed but not bounded). It has no maximizers, and the only minimizer is the origin.
with graph

[Figure: graph of f]
The function f is continuous (and bounded), but R is not compact (it is closed but not bounded). This function has no maximizers and no minimizers. N
Summing up, Weierstrass' Theorem shows that continuity and compactness are general conditions jointly ensuring the existence of maximizers and minimizers, a most important fact. Of course, they can exist even without these conditions: it is immediate to construct examples of discontinuous functions defined on non-compact sets that have maximizers and minimizers (for instance, take the Dirichlet function over the real line: the rationals are maximizers, the irrationals are minimizers). Yet, without continuity and compactness one needs to painfully proceed example by example to find maximizers and minimizers explicitly, as we no longer have a general result like Weierstrass' Theorem ensuring their existence.
f(a) ≤ z ≤ f(b)

then there exists c ∈ [a, b] such that f(c) = z. If f is strictly increasing, such c is unique.
Proof If f(a) = f(b), it is sufficient to set c = a or c = b. Let f(a) < f(b) and let h : [a, b] → R be defined by h(x) = f(x) − z. We have

h(a) = f(a) − z ≤ 0 ≤ f(b) − z = h(b)

Since f is continuous, so is h; by Bolzano's Theorem there exists c ∈ [a, b] such that h(c) = 0, that is, f(c) = z.
The function h is strictly monotone if and only if f is so. Therefore, by Bolzano's Theorem such c is unique whenever f is strictly monotone.
The function assumes, therefore, all the values between f(a) and f(b), without any “break”. The lemma formalizes the intuition, given at the beginning of the chapter, that the graph of a continuous function can be drawn without ever lifting the pencil.
Together with Weierstrass' Theorem, the last lemma implies the following classic result.
Proof Let z ∈ [m, M]. By Weierstrass' Theorem, m and M are well defined and so there exist x1, x2 ∈ [a, b] such that m = f(x1) and M = f(x2). Assume first that x1 ≤ x2 and consider the compact interval [x1, x2] ⊆ [a, b]. The function f is continuous on [x1, x2]. Since f(x1) ≤ z ≤ f(x2), by Lemma 574 there exists c ∈ [x1, x2] such that f(c) = z.
If x1 > x2, consider the continuous function −f on the compact interval [x2, x1] ⊆ [a, b]. Clearly, −M = min_{x∈[a,b]} (−f)(x) and −m = max_{x∈[a,b]} (−f)(x). As −M ≤ −z ≤ −m, by what has just been proved, there exists c ∈ [x2, x1] such that (−f)(c) = −z. Hence, f(c) = z.
Finally, a strictly monotone f is injective (Proposition 218) and, therefore, the point c ∈ [a, b] such that f(c) = z is unique.
Im f = [m; M ] (13.16)
[Figure: the Intermediate Value Theorem: every z ∈ [m, M] is attained at some c ∈ [a, b]]
By the Intermediate Value Theorem, a continuous f maps the compact interval [a, b] onto the compact interval [m, M]. We can express this property by saying that the continuous image of a compact interval is a compact interval. Momentarily, we will extend this property to any interval, compact or not (Proposition 580).
Bolzano's Theorem is, via Lemma 574, behind the Intermediate Value Theorem. Its statement is a special case. Indeed, observe that its hypothesis f(a)f(b) ≤ 0 can be equivalently stated as

min{f(a), f(b)} ≤ 0 ≤ max{f(a), f(b)}

Clearly,

m ≤ min{f(a), f(b)} ≤ 0 ≤ max{f(a), f(b)} ≤ M

and so the Intermediate Value Theorem ensures the existence of c ∈ [a, b] such that f(c) = 0.
The continuity of f on [a, b] is crucial for the Intermediate Value Theorem. To see this, consider, for example, the so-called signum function sgn : R → R defined by sgn x = x/|x| if x ≠ 0 and sgn 0 = 0, that is,

sgn x = { 1    if x > 0
        { 0    if x = 0      (13.17)
        { −1   if x < 0
The proof of this result relies on a lemma that presents a useful characterization of strict monotonicity on open intervals:

x < z < y ⟹ min{f(x), f(y)} < f(z) < max{f(x), f(y)}    (13.18)

Strict monotonicity thus holds if and only if x < z < y implies either f(x) < f(z) < f(y) or f(y) < f(z) < f(x).
Proof “If”. Let x, y, z ∈ (a, b) be such that x < z < y. If f is strictly increasing, then f(x) < f(z) < f(y), and so min{f(x), f(y)} = f(x) < f(z) < f(y) = max{f(x), f(y)}; the strictly decreasing case is analogous.
“Only if”. Suppose, per contra, that f is not strictly monotone. Then, there exist x, y, w, z ∈ (a, b) such that x < y and w < z with f(x) ≥ f(y) and f(w) ≤ f(z). As (a, b) is an open interval, there exist α, β ∈ (a, b) such that α < x < y < β and α < w < z < β. As α < x < y, by (13.18) we have

min{f(α), f(y)} < f(x) < max{f(α), f(y)}

As f(x) ≥ f(y), this implies that max{f(α), f(y)} = f(α) > f(x). As x < y < β, by (13.18) we have

min{f(x), f(β)} < f(y) < max{f(x), f(β)}

As f(x) ≥ f(y), this implies that min{f(x), f(β)} = f(β) < f(y). We conclude that

f(β) < f(y) ≤ f(x) < f(α)

As α < w < z, by (13.18) we have

min{f(α), f(z)} < f(w) < max{f(α), f(z)}

As f(w) ≤ f(z), this implies min{f(α), f(z)} = f(α) < f(w). As w < z < β, by (13.18) we have

min{f(w), f(β)} < f(z) < max{f(w), f(β)}

As f(w) ≤ f(z), this implies max{f(w), f(β)} = f(β) > f(z). We conclude that

f(α) < f(w) ≤ f(z) < f(β)

The two conclusions are incompatible: the first gives f(β) < f(α), the second f(α) < f(β). This contradiction shows that f is strictly monotone.
Proof of Proposition 576 The “if” follows from Proposition 218. As to the converse, assume that f is injective. We want to show that f is strictly monotone. It is enough to show that f is strictly monotone on int I, since the strict monotonicity on I then follows from the continuity of f (why?). So, let us assume that I is an open interval. Suppose, by contradiction, that f is not strictly monotone. By the last lemma, there exist x < z < y such that either f(z) ≥ max{f(x), f(y)} or f(z) ≤ min{f(x), f(y)}. As f is injective, these inequalities are actually strict. Suppose that f(z) > max{f(x), f(y)}, the other case being handled similarly. Let f(z) > k > max{f(x), f(y)}. That is, we have both f(z) > k > f(x) and f(z) > k > f(y). By the Intermediate Value Theorem, there exist t′ ∈ [x, z] and t″ ∈ [z, y] such that f(t′) = f(t″) = k, thus contradicting the injectivity of f. We conclude that f is strictly monotone.
Without continuity the “only if” fails: consider the discontinuous function f : R → R given by

f(x) = { x    if x ∈ Q
       { −x   otherwise

It is not strictly monotone: if x = 3, z = π and y = 4, we have x < z < y and f(z) < min{f(x), f(y)}. Yet, f is injective. Indeed, let x ≠ y. We have f(x) = x ≠ y = f(y) if x, y ∈ Q and f(x) = −x ≠ −y = f(y) if x, y ∉ Q. If x ∈ Q and y ∉ Q, then f(x) = x ∈ Q and f(y) = −y ∉ Q, and so again f(x) ≠ f(y). We conclude that f is injective.
So,

x0 − ε ≤ lim inf f⁻¹(yn) ≤ lim sup f⁻¹(yn) ≤ x0 + ε

Since these inequalities hold for each ε > 0 small enough, we conclude that

lim f⁻¹(yn) = x0 = f⁻¹(y0)
Example 579 In the real line, take A = [0, 1) ∪ {3}. Define the function f : A → R by

f(x) = { x   if x ∈ [0, 1)
       { 1   if x = 3

This function is injective. As 3 is an isolated point of A, it is also continuous. Its image is easily seen to be the closed unit interval [0, 1]. Its inverse f⁻¹ : [0, 1] → R is given by

f⁻¹(y) = { y   if y ∈ [0, 1)
         { 3   if y = 1

It is discontinuous at 1. N
Next we show that the continuous image of an interval is an interval, thus extending to
any interval what the Intermediate Value Theorem ensured for compact intervals.
Proof Let t1, t2 ∈ Im f, say t1 < t2, and λ ∈ [0, 1]. We want to show that λt1 + (1 − λ)t2 ∈ Im f. There exist x1, x2 ∈ I such that f(x1) = t1 and f(x2) = t2. Let m = min_{x∈[x1,x2]} f(x) and M = max_{x∈[x1,x2]} f(x). In view of (13.16), by the Intermediate Value Theorem we have f([x1, x2]) = [m, M]. Since t1, t2 ∈ f([x1, x2]), we thus have λt1 + (1 − λ)t2 ∈ f([x1, x2]), and so λt1 + (1 − λ)t2 ∈ Im f. This proves that Im f is an interval.
This “interval image” property actually characterizes continuity for monotone functions defined on intervals. Intuitively, monotone functions may have only jump discontinuities (Proposition 564), a possibility that an interval image precludes.
Without monotonicity this elegant result fails: the image of the discontinuous and non-monotone function f : R → R given by

f(x) = { x       if x < 0
       { 1       if x = 0
       { 0       if x ∈ (0, 1)
       { x − 1   if x ≥ 1

is the whole real line, hence an interval, even though f is discontinuous.
Proof The “only if” is a special case of the last result. As to the “if”, suppose per contra that f is not continuous at some x0 ∈ I. If x0 ∈ int I, there exists ε > 0 small enough so that x0 ± ε ∈ I. By Proposition 564, there is a jump at x0, so an open gap (a, b) ⊆ [lim_{x→x0−} f(x), lim_{x→x0+} f(x)] in the image of f, i.e., (a, b) ∩ Im f = ∅. On the other hand, it holds that f(x0 − ε) < a < b < f(x0 + ε) and so, Im f being an interval,

(a, b) ⊆ Im f

This contradiction proves that f is continuous at x0 ∈ int I. We leave to the reader the similar argument for the lower endpoint x0 = min I of the interval I (when relevant, i.e., when min I ∈ I) and for its upper endpoint x0 = max I (when relevant, i.e., when max I ∈ I).
Depending on the kind of monotonicity that f features, its inverse is thus either convex or concave. The continuity of f becomes superfluous when I is open because concave and convex functions are automatically continuous on open convex sets, as will be seen later in the book (Theorem 833).
Proof Let f be continuous, strictly increasing and concave. By Proposition 581, the domain Im f of f⁻¹ is an interval. By Proposition 222, f⁻¹ is strictly increasing. Suppose, by contradiction, that f⁻¹ is not convex. Then there exist two points x1 ≠ x2 in Im f and a scalar λ ∈ (0, 1) such that

f⁻¹(λx1 + (1 − λ)x2) > λf⁻¹(x1) + (1 − λ)f⁻¹(x2)

Setting t1 = f⁻¹(x1) and t2 = f⁻¹(x2), applying the strictly increasing f to both sides yields

λx1 + (1 − λ)x2 > f(λt1 + (1 − λ)t2)

On the other hand, the concavity of f implies

f(λt1 + (1 − λ)t2) ≥ λf(t1) + (1 − λ)f(t2) = λx1 + (1 − λ)x2

a contradiction. Therefore f⁻¹ is convex.
Proof Let t1, t2 ∈ Im f, say t1 < t2, and λ ∈ [0, 1]. We want to show that λt1 + (1 − λ)t2 ∈ Im f. There exist x1, x2 ∈ C such that f(x1) = t1 and f(x2) = t2. Define the auxiliary function φ : [0, 1] → R by φ(α) = f(αx1 + (1 − α)x2). This auxiliary function is easily seen to be continuous, with φ(0) = t2 and φ(1) = t1. So, by Proposition 580 its image Im φ is an interval. Thus, λt1 + (1 − λ)t2 ∈ Im φ, so there exists ᾱ ∈ [0, 1] such that f(ᾱx1 + (1 − ᾱ)x2) = φ(ᾱ) = λt1 + (1 − λ)t2. This implies λt1 + (1 − λ)t2 ∈ Im f, as desired.
Corollary 584 Let f : C → R be a continuous function defined on a compact and convex set C in Rn. Set

m = min_{x∈C} f(x)   and   M = max_{x∈C} f(x)

Then Im f = [m, M].
Proof By the Weierstrass Theorem, m = min_{x∈C} f(x) and M = max_{x∈C} f(x) are well defined. Clearly, Im f ⊆ [m, M]. By Proposition 583, Im f is a convex set. Since it contains both m and M, we conclude that Im f = [m, M]. So, if z ∈ [m, M], then there exists c ∈ C such that f(c) = z.
We close with an example that vividly shows the importance of convexity for this result. The restriction of the signum function (13.17) to R − {0} is continuous but its domain is not convex, so the last corollary does not apply. It actually fails completely because Im f = {−1, 1}. N
^{12} Convex sets will be studied in Chapter 16.
$$ f_i : A \subseteq \mathbb{R}^n \to \mathbb{R}, \qquad i = 1, 2, \ldots, m $$
defined by
$$ y_1 = f_1(x_1, \ldots, x_n), \quad y_2 = f_2(x_1, \ldots, x_n), \quad \ldots, \quad y_m = f_m(x_1, \ldots, x_n) $$
The functions $f_i$ are the component functions of the operator $f$. To illustrate, let us go back to the operators of Example 188.

(ii) If $f : \mathbb{R}^3 \to \mathbb{R}^2$ is defined componentwise by
$$ f_1(x_1, x_2, x_3) = 2x_1^2 + x_2 + x_3 \quad \text{and} \quad f_2(x_1, x_2, x_3) = x_1 x_2^4 $$
N
if, for every neighborhood $V_\varepsilon(L)$ of $L$, there exists a neighborhood $U_{\delta_\varepsilon}(x_0)$ of $x_0$ such that
Here, too, an operator that is continuous at all the points of a subset E of the domain
A is called continuous on E, while an operator that is continuous at all the points of its
domain is called continuous. It is easy to see that the two operators of the last example are
continuous.
The continuity of an operator thus reduces to the continuity of its component functions: continuity is a componentwise notion.
In Section 8.16 we saw that the convergence of vectors is equivalent to that of their components. This allows the reader to prove the next sequential characterization of continuity, which extends Proposition 552 to operators. The statement is formally identical to that of Proposition 552, but here $f(x_n) \to f(x_0)$ indicates convergence of vectors in $\mathbb{R}^m$.
Proposition 590 makes it possible to extend to operators the continuity results established for functions of several variables, except those that use in an essential way the order structure of their codomain $\mathbb{R}$, like the Bolzano and Weierstrass Theorems. We leave such extensions to the reader.
This characterization is identical to the definition of $\lim_{x \to x_0} f(x) = f(x_0)$ for a point $x_0$ that belongs to the domain of the function, except for the elimination of the condition $0 < \|x - x_0\|$ (i.e., of the requirement that $x \neq x_0$), so as to include points $x_0$ that are isolated in $A$.
so that $B_\delta(x_0) \cap A = \{x_0\}$. Thus, for each $x \in A$ we have $\|x - x_0\| < \delta$ if and only if $x = x_0$. It follows that, for each $\varepsilon > 0$, there exists $\delta > 0$ such that $\|x - x_0\| < \delta$ implies $x = x_0$ for all $x \in A$, so that $|f(x) - f(x_0)| = 0 < \varepsilon$.
In symbols, the preimage $f^{-1}(C)$ of each closed set $C$ of $\mathbb{R}^m$ is itself a closed set of $\mathbb{R}^n$. For instance, level sets $f^{-1}(y) = \{x \in \mathbb{R}^n : f(x) = y\}$ of continuous operators are closed sets since singletons $\{y\}$ are closed sets in $\mathbb{R}^m$.
The proof of this proposition relies on some basic set-theoretic properties of images and preimages, whose proofs are left to the reader.

Lemma 593 Let $f : X \to Y$ be a function between any two sets $X$ and $Y$. We have:
Proof of Proposition 592 "If". Suppose that $f$ is continuous. Let $C$ be a closed set of $\mathbb{R}^m$. Let $\{x_n\} \subseteq f^{-1}(C)$ be such that $x_n \to x_0 \in \mathbb{R}^n$. We want to show that $x_0 \in f^{-1}(C)$. Set $y_n = f(x_n)$. Since $f$ is continuous, we have $f(x_n) \to f(x_0)$. Then $f(x_0) \in C$ because $C$ is closed. In turn, this implies $x_0 \in f^{-1}(C)$, as desired.

"Only if". Suppose that, for each closed set $C$ of $\mathbb{R}^m$, the set $f^{-1}(C)$ is closed in $\mathbb{R}^n$. So, if $V$ is an open set in $\mathbb{R}^m$ containing $f(x_0)$, the set $f^{-1}(V)$ is open in $\mathbb{R}^n$ because $f^{-1}(V)^c = f^{-1}(V^c)$. Thus, since $x_0 \in f^{-1}(V)$, there exists a neighborhood $B_\delta(x_0)$ such that $B_\delta(x_0) \subseteq f^{-1}(V)$. So, $f(B_\delta(x_0)) \subseteq f(f^{-1}(V)) \subseteq V$. In view of (13.21), we conclude that $f$ is continuous at $x_0$. $\blacksquare$
In view of Lemma 593-(ii), there is a dual version of the last proposition for open sets. Because of its importance, we report it formally.^{15}

^{14} We use the term "topological" because it is a property stated purely in terms of open and closed sets.
^{15} This is, indeed, the characterization of continuity used to generalize this fundamental notion well beyond Euclidean spaces, as readers will learn in more advanced courses.
of a continuous function $f : \mathbb{R}^n \to \mathbb{R}$ are closed because the intervals $[t, \infty)$ and $(-\infty, t]$ are closed. By Corollary 594, their strict versions
$$ (f > t) = f^{-1}((t, \infty)) \quad \text{and} \quad (f < t) = f^{-1}((-\infty, t)) \qquad \forall t \in \mathbb{R} $$
are open because the intervals $(t, \infty)$ and $(-\infty, t)$ are open. N
There is no counterpart of the last two results for images: given a continuous function,
in general the image of an open set is not open and the image of a closed set is not closed. In
other words, the continuous image of an open (closed) set is not necessarily open (closed).
Example 596 (i) Let $f : \mathbb{R} \to \mathbb{R}$ be the quadratic function $f(x) = x^2$. For the open interval $I = (-1, 1)$ we have $f(I) = [0, 1)$, which is not open. (ii) Let $f : \mathbb{R} \to \mathbb{R}$ be the exponential function $f(x) = e^x$. The real line $\mathbb{R}$ is a closed set (also open, but here this is not of interest), but $f(\mathbb{R}) = (0, \infty)$ is not closed. N
The next result clarifies why in (ii) we have a closed but unbounded (so not compact) set like $\mathbb{R}$.
Proof With the notions of topology at our disposal we are able to prove the result only in the case $n = m = 1$ (the general case, however, does not present substantial differences). So, let $n = m = 1$. By Definition 32, to show that the set $\operatorname{Im} f$ is bounded in $\mathbb{R}$ it is necessary to show that it is bounded both above and below in $\mathbb{R}$. Suppose, by contradiction, that $\operatorname{Im} f$ is unbounded above. Then there exists a sequence $\{y_n\} \subseteq \operatorname{Im} f$ such that $\lim_{n \to \infty} y_n = +\infty$. Let $\{x_n\} \subseteq K$ be the corresponding sequence such that $f(x_n) = y_n$ for every $n$. The sequence $\{x_n\}$ is bounded since it is contained in the bounded set $K$. By the Bolzano-Weierstrass Theorem, there exist a subsequence $\{x_{n_k}\}$ and a point $\tilde{x} \in \mathbb{R}$ such that $\lim_{k \to \infty} x_{n_k} = \tilde{x}$. Since $K$ is closed, we have $\tilde{x} \in K$. Moreover, the continuity of $f$ implies $\lim_{k \to \infty} y_{n_k} = \lim_{k \to \infty} f(x_{n_k}) = f(\tilde{x}) \in \mathbb{R}$. This contradicts $\lim_{k \to \infty} y_{n_k} = \lim_{n \to \infty} y_n = +\infty$. It follows that the set $\operatorname{Im} f$ is bounded above in the real line. In a similar way, one shows that it is also bounded below. Thus, the set $\operatorname{Im} f$ is bounded in the real line.
To complete the proof that $\operatorname{Im} f$ is compact, it remains to show that it is closed. Consider a sequence $\{y_n\} \subseteq \operatorname{Im} f$ that converges to $y \in \mathbb{R}$. By Theorem 174, we must show that $y \in \operatorname{Im} f$. Let $\{x_n\} \subseteq K$ be such that $f(x_n) = y_n$ for each $n$. By the Bolzano-Weierstrass Theorem, there is a subsequence $x_{n_k} \to x$, and $x \in K$ since $K$ is closed. Continuity then gives $f(x) = \lim_k f(x_{n_k}) = \lim_k y_{n_k} = y$. Therefore, $y \in \operatorname{Im} f$, as desired. $\blacksquare$

^{16} They will be introduced in Section 17.2.1.
The fact that continuity preserves compactness is quite remarkable. It is another charac-
teristic that distinguishes compact sets among closed sets, for which in general this property
does not hold, as Example 596-(ii) shows.
A nice dividend of this proposition is an operator version of Proposition 578 about the
continuity of inverses.
Proof Also here we consider the scalar case $m = n = 1$. By Proposition 597, the set $\operatorname{Im} f$ is compact. Let $C$ be a closed subset of $\operatorname{Im} f$. We want to show that $f^{-1}(C)$ is closed. Since $\operatorname{Im} f$ is compact, $C$ is actually compact. Let $\{x_n\} \subseteq f^{-1}(C)$ be a sequence that converges to $x \in \mathbb{R}^n$. We want to show that $x \in f^{-1}(C)$. By definition of $f^{-1}(C)$, there exists a sequence $\{y_n\} \subseteq C$ such that $y_n = f(x_n)$ for each $n \geq 1$. Since $C$ is compact, by the Bolzano-Weierstrass Theorem there exists a subsequence $\{y_{n_k}\}$ that converges to some $y \in C$. By the continuity of $f$, we conclude that $x = f^{-1}(y) \in f^{-1}(C)$, thus proving that $f^{-1}(C)$ is closed. So, $f^{-1}$ maps closed sets into closed sets. In view of Proposition 600 (momentarily established), this proves that $f^{-1}$ is continuous. $\blacksquare$
We let readers contrast this corollary with Proposition 578. Meditations will be helped by the next example that, along with the earlier Example 579, shows the importance of a compact domain for the last result and of an interval domain for that proposition.

The function $f : (0, 1] \to \mathbb{R}$ given by $f(x) = x$ readily shows that this proposition is false without the topological hypothesis on the domain.
Proof "If". Suppose that $f$ is continuous. Let $F$ be a closed set of $\mathbb{R}^m$. Let $\{x_n\} \subseteq f^{-1}(F) \cap C$ be such that $x_n \to x_0 \in \mathbb{R}^n$. We want to show that $x_0 \in f^{-1}(F) \cap C$. Since $C$ is closed, we have $x_0 \in C$. Set $y_n = f(x_n)$. Since $f$ is continuous on $C$, we have $f(x_n) \to f(x_0)$. Then $f(x_0) \in F$ because $F$ is closed. In turn, this implies $x_0 \in f^{-1}(F)$. We conclude that $x_0 \in f^{-1}(F) \cap C$, as desired.

"Only if". Suppose that, for each closed set $F$ of $\mathbb{R}^m$, the set $f^{-1}(F) \cap C$ is closed in $\mathbb{R}^n$. So, if $V$ is an open set in $\mathbb{R}^m$ containing $f(x_0)$, the set $f^{-1}(V) \cup C^c$ is open in $\mathbb{R}^n$ because $\left(f^{-1}(V) \cup C^c\right)^c = f^{-1}(V^c) \cap C$. Thus, since $x_0 \in f^{-1}(V)$, there exists a neighborhood $B_\delta(x_0)$ such that $B_\delta(x_0) \subseteq f^{-1}(V) \cup C^c$, i.e., $B_\delta(x_0) \cap C \subseteq f^{-1}(V) \cap C$.^{17} So, $f(B_\delta(x_0) \cap C) \subseteq f(f^{-1}(V) \cap C) \subseteq f(f^{-1}(V)) \subseteq V$. In view of (13.21), we conclude that $f$ is continuous at $x_0$. $\blacksquare$
In a dual way, an operator defined on an open set is continuous if and only if its preimages are open. For later reference, we state this dual version formally.
Here the value of $\delta_\varepsilon$ thus depends only on $\varepsilon$, no longer also on a point $x_0$. Indeed, no specific points $x_0$ are mentioned in this definition, which considers only the domain per se.

Uniform continuity implies continuity, but the converse does not hold. For example, we will soon see that the quadratic function is continuous on $\mathbb{R}$, but not uniformly. Yet, the two notions of continuity become equivalent on compact sets (Section 5.6).
Proof The "if" is obvious because uniform continuity implies continuity. We prove the "only if". For simplicity, consider the scalar case $n = 1$ with $K = [a, b]$. So, let $f : [a, b] \to \mathbb{R}$ be continuous. We need to show that it is also uniformly continuous. Suppose per contra that there exist $\varepsilon > 0$ and two sequences $\{x_n\}$ and $\{y_n\}$ in $[a, b]$ with $|x_n - y_n| \to 0$ and
$$ |f(x_n) - f(y_n)| \geq \varepsilon \qquad \forall n \geq 1 \tag{13.22} $$
Since the sequences $\{x_n\}$ and $\{y_n\}$ are bounded, the Bolzano-Weierstrass Theorem yields two convergent subsequences $\{x_{n_k}\}$ and $\{y_{n_k}\}$, i.e., there exist $x, y \in [a, b]$ such that $x_{n_k} \to x$ and $y_{n_k} \to y$. Since $x_n - y_n \to 0$, we have $x_{n_k} - y_{n_k} \to 0$ and, therefore, $x - y = 0$ because of the uniqueness of the limit. Since $f$ is continuous, we have $f(x_{n_k}) \to f(x)$ and $f(y_{n_k}) \to f(y)$. Hence, $f(x_{n_k}) - f(y_{n_k}) \to f(x) - f(y) = 0$, which contradicts (13.22). We conclude that $f$ is uniformly continuous. $\blacksquare$
Theorem 603 does not hold without assuming the compactness of $K$, as the next counterexamples show. The first one considers a closed but unbounded set, the real line, while the other ones consider bounded sets which are not closed.
Example 604 The quadratic function $f : \mathbb{R} \to \mathbb{R}$ is continuous but not uniformly continuous. Suppose, by contradiction, that $f(x) = x^2$ is uniformly continuous on $\mathbb{R}$. By setting $\varepsilon = 1$, there exists $\delta_\varepsilon > 0$ such that
$$ |x - y| < \delta_\varepsilon \implies |x^2 - y^2| < 1 \qquad \forall x, y \in \mathbb{R} \tag{13.23} $$
If we take $x_n = n$ and $y_n = n + \delta_\varepsilon / 2$, we have $|x_n - y_n| < \delta_\varepsilon$ for every $n \geq 1$, but $\lim_n |x_n^2 - y_n^2| = +\infty$, which contradicts (13.23). Therefore, the quadratic function $x^2$ is not uniformly continuous on $\mathbb{R}$. Yet, its restriction to any compact interval $[a, b]$ is uniformly continuous thanks to the last theorem. We can check this directly, for instance, for the restriction $f : [0, 1] \to \mathbb{R}$ on the closed unit interval. We have
$$ |x^2 - y^2| = |x - y| \, |x + y| \leq 2 |x - y| \qquad \forall x, y \in [0, 1] $$
so the choice $\delta_\varepsilon = \varepsilon / 2$ works, as desired. N
Example 605 The hyperbola $f : (0, 1) \to \mathbb{R}$ is continuous but not uniformly continuous. Indeed, suppose per contra that $f(x) = 1/x$ is uniformly continuous on $(0, 1)$. By setting $\varepsilon = 1$, there exists $\delta_\varepsilon > 0$ such that
$$ |x - y| < \delta_\varepsilon \implies \left| \frac{1}{x} - \frac{1}{y} \right| < 1 \qquad \forall x, y \in (0, 1) \tag{13.24} $$
Let $y = \min\{\delta_\varepsilon / 2, 1/2\}$ and $x = y/2$. It is immediate that $0 < x < y < 1$ and $|x - y| < \delta_\varepsilon$. By (13.24), we thus have
$$ \left| \frac{1}{x} - \frac{1}{y} \right| = \frac{1}{x} - \frac{1}{y} < 1 \tag{13.25} $$
is continuous but not uniformly continuous (why?). The bounded set $\mathbb{Q} \cap [0, 2]$ is not closed, so not compact (its closure is the interval $[0, 2]$).

(ii) The function $f : [0, 1) \cup (1, 2] \to \mathbb{R}$ given by
$$ f(x) = \begin{cases} -1 & \text{if } 0 \leq x < 1 \\ 1 & \text{if } 1 < x \leq 2 \end{cases} $$
is continuous but not uniformly continuous (why?). The bounded set $[0, 1) \cup (1, 2]$ is not closed, so not compact (its closure is the interval $[0, 2]$). N
This important result was proven by Karl Weierstrass in 1885, with a significant extension due to Marshall Stone in 1937.^{18} A third protagonist of this result is Sergei Bernstein, who in 1913 gave a beautiful proof of this theorem in which the approximating polynomials, aptly called Bernstein polynomials, are explicitly constructed when $[a, b]$ is the unit interval $[0, 1]$. Here we will follow his lead by proving the Stone-Weierstrass Theorem via Bernstein's result.

Before doing that, however, we give a sandwich version of the Stone-Weierstrass Theorem that is sometimes useful.

^{18} Weierstrass proved this result when he was about 70 years old. We consider only his original result since Stone's version is beyond the scope of this book. Yet, we name this theorem after both Stone and Weierstrass also to distinguish it from Weierstrass' Theorem on extremals.
Corollary 608 Let $f : [a, b] \to \mathbb{R}$ be a continuous function. For each $\varepsilon > 0$ there exist two polynomials $p, P : [a, b] \to \mathbb{R}$ such that $p \leq f \leq P$ and
$$ P(x) - p(x) \leq \varepsilon \qquad \forall x \in [a, b] $$
So, to define $B_2$ we only need to know the values of $f$ at the three points $\{0, 1/2, 1\}$. In general, to define the Bernstein polynomial of degree $n$ we only need to know the values of $f$ at the $n + 1$ points
$$ 0, \frac{1}{n}, \frac{2}{n}, \ldots, \frac{n-1}{n}, 1 $$
of the unit interval.
Proof (i) Suppose that $f(x) \leq g(x)$ for all $x \in [0, 1]$. Then $f(k/n) \leq g(k/n)$ for all $0 \leq k \leq n$, so $B_n f(x) \leq B_n g(x)$ for all $x \in [0, 1]$. (ii) Let $c \in \mathbb{R}$. We have, for all $x \in [0, 1]$,
$$ B_n c(x) = \sum_{k=0}^{n} c \binom{n}{k} x^k (1 - x)^{n-k} = c \sum_{k=0}^{n} \binom{n}{k} x^k (1 - x)^{n-k} = c \left[x + (1 - x)\right]^n = c $$
where the penultimate equality follows from Newton's binomial formula (B.7).
(iii) We have, for all $x \in [0, 1]$,
$$ B_n(\alpha f + \beta g)(x) = \sum_{k=0}^{n} (\alpha f + \beta g)\left(\frac{k}{n}\right) \binom{n}{k} x^k (1 - x)^{n-k} = \alpha \sum_{k=0}^{n} f\left(\frac{k}{n}\right) \binom{n}{k} x^k (1 - x)^{n-k} + \beta \sum_{k=0}^{n} g\left(\frac{k}{n}\right) \binom{n}{k} x^k (1 - x)^{n-k} = \alpha B_n f(x) + \beta B_n g(x) $$
as desired. $\blacksquare$
$$ p_k = \binom{n}{k} x^k (1 - x)^{n-k} $$
In particular, by Newton's binomial formula we have $\sum_{k=0}^{n} p_k = 1$ (see point (ii) of the last proof). So, the function $p_{n,x} : \{0, 1, \ldots, n\} \to [0, 1]$ defined by $p_{n,x}(k) = p_k$ is a probability distribution, the so-called binomial distribution. The function $f$ induces a function $\psi : \{0, 1, \ldots, n\} \to \mathbb{R}$ defined by $\psi(k) = f(k/n)$. The Bernstein polynomial $B_n f(x)$ is thus the expectation of the induced function $\psi$ under this binomial distribution, that is,
$$ B_n f(x) = \sum_{k=0}^{n} \psi(k) \, p_{n,x}(k) $$
It is because of this probabilistic nature that Bernstein polynomials are naturally defined on the unit interval.
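Before stating the theorem, a small numerical aside may help. The following Python sketch (our own illustration, with an arbitrarily chosen $f$) computes $B_n f$ directly from the binomial weights and shows the maximal error over a grid shrinking as $n$ grows, in line with Bernstein's Theorem below.

```python
from math import comb

def bernstein(f, n, x):
    # B_n f(x) = sum_k f(k/n) * C(n,k) * x^k * (1-x)^(n-k)
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

# Illustrative continuous function on [0, 1] (an arbitrary choice)
f = lambda x: abs(x - 0.5)

grid = [i / 200 for i in range(201)]
for n in (5, 20, 80, 320):
    err = max(abs(bernstein(f, n, x) - f(x)) for x in grid)
    print(f"n = {n:4d}   max|B_n f - f| ~ {err:.4f}")
```

The slow decay visible in the printout is consistent with the $n \geq M/(\delta^2 \varepsilon)$ threshold appearing in the proof below: Bernstein approximation is uniform but not fast.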
Theorem 610 (Bernstein) Let $f : [0, 1] \to \mathbb{R}$ be a continuous function. For each $\varepsilon > 0$ there exists $n_\varepsilon \geq 1$ such that, for all $n \geq n_\varepsilon$, we have
$$ |B_n f(x) - f(x)| \leq \varepsilon \qquad \forall x \in [0, 1] $$
The theorem relies on the next lemma, in which the Bernstein polynomials of the first two powers are computed. This lemma is of independent interest in that it shows, inter alia, that the Bernstein polynomial $B_n f$ of a polynomial $f$ may be different from the polynomial itself, i.e., $B_n f \neq f$.
$$ B_n f(x) = x \qquad \forall n \geq 1 $$
So, by setting $y = 1 - x$,
$$ B_n x = \sum_{k=0}^{n} \frac{k}{n} \binom{n}{k} x^k (1 - x)^{n-k} = x \, (x + 1 - x)^{n-1} = x $$
as desired.
Proof of Bernstein's Theorem Let $\varepsilon > 0$. Since $[0, 1]$ is compact, the function $f$ is uniformly continuous (Theorem 603). So, there exists $\delta > 0$ such that
$$ |x - y| < \delta \implies |f(x) - f(y)| < \frac{\varepsilon}{2} \qquad \forall x, y \in [0, 1] $$
Fix any $x_0 \in [0, 1]$. We have
$$ |x - x_0| < \delta \implies |f(x) - f(x_0)| < \frac{\varepsilon}{2} \qquad \forall x \in [0, 1] \tag{13.26} $$
By Weierstrass' Theorem, the function $f$ has a maximizer $\hat{x} \in [0, 1]$. Set $M = f(\hat{x})$, so $M$ is the maximum value of $f$ on $[0, 1]$. We thus have $f(x) \leq M$ for all $x \in [0, 1]$. In particular, for each $x \in [0, 1]$ we have
$$ |x - x_0| \geq \delta \implies |f(x) - f(x_0)| \leq |f(x)| + |f(x_0)| \leq 2M \leq 2M \frac{(x - x_0)^2}{\delta^2} \tag{13.27} $$
Together, (13.26) and (13.27) yield
$$ |f(x) - f(x_0)| \leq 2M \frac{(x - x_0)^2}{\delta^2} + \frac{\varepsilon}{2} \qquad \forall x \in [0, 1] $$
Applying $B_n$ to this inequality (using its monotonicity and linearity, along with the formulas for $B_n x$ and $B_n x^2$ of the last lemma) gives
$$ |B_n f(x) - f(x_0)| \leq \frac{2M}{\delta^2} (x - x_0)^2 + \frac{2M}{\delta^2} \frac{1}{4n} + \frac{\varepsilon}{2} \qquad \forall x \in [0, 1] $$
where the last inequality holds because $\max_{x \in [0, 1]} (x - x^2) = 1/4$. In particular, by taking $x = x_0$ we have
$$ |B_n f(x_0) - f(x_0)| \leq \frac{2M}{\delta^2} \frac{1}{4n} + \frac{\varepsilon}{2} $$
We have
$$ \frac{2M}{\delta^2} \frac{1}{4n} \leq \frac{\varepsilon}{2} \iff n \geq \frac{M}{\delta^2 \varepsilon} $$
So,
$$ |B_n f(x_0) - f(x_0)| \leq \varepsilon \qquad \forall x_0 \in [0, 1] $$
for all $n \geq M / (\delta^2 \varepsilon)$, as desired. $\blacksquare$
We are now ready to prove the Stone-Weierstrass Theorem via Bernstein's Theorem.

Since $x$ was arbitrarily chosen in $[a, b]$, this inequality actually holds for all $x \in [a, b]$, thus proving the statement. $\blacksquare$
Concavity and increasing monotonicity are two important properties that, on the one hand, Bernstein polynomials $B_n f$ inherit from the function $f$ and, on the other hand, transmit back to it via Bernstein's Theorem, as we show next.
Proposition 612 Let $f : [0, 1] \to \mathbb{R}$ be a continuous function. The following conditions are equivalent:
Proof We first prove the monotonicity part and then move to concavity. We begin with a useful fact: for each $n \geq 1$ and each $x \in (0, 1)$,
$$ B_n' f(x) = n \sum_{k=0}^{n-1} \left[ f\left(\frac{k+1}{n}\right) - f\left(\frac{k}{n}\right) \right] \binom{n-1}{k} x^k (1 - x)^{n-k-1} \tag{13.30} $$
Since $B_1 f(x) = f(0)(1 - x) + f(1) x$ for all $x \in [0, 1]$, it is easy to check that (13.30) holds for $n = 1$. Since $B_n f$ is a polynomial for all $n \geq 2$, it follows that $B_n f$ is derivable at each

^{19} The composition of two polynomials is still a polynomial (as readers can check). Since both $p$ and $g$ are polynomials, so is their composition $\hat{p}$.
$x \in (0, 1)$. In particular, by (B.4) and (B.5), we have that for each $n \geq 2$ and each $x \in (0, 1)$,
$$
\begin{aligned}
B_n' f(x) = {} & -f\left(\tfrac{0}{n}\right) \binom{n}{0} n (1 - x)^{n-1} \\
& + \sum_{k=1}^{n-1} f\left(\tfrac{k}{n}\right) \binom{n}{k} \left[ k x^{k-1} (1 - x)^{n-k} - (n - k) x^{k} (1 - x)^{n-k-1} \right] + f\left(\tfrac{n}{n}\right) \binom{n}{n} n x^{n-1} \\
= {} & -f\left(\tfrac{0}{n}\right) n \binom{n-1}{0} (1 - x)^{n-1} + \sum_{k=1}^{n-1} f\left(\tfrac{k}{n}\right) n \binom{n-1}{k-1} x^{k-1} (1 - x)^{n-k} \\
& - \sum_{k=1}^{n-1} f\left(\tfrac{k}{n}\right) n \binom{n-1}{k} x^{k} (1 - x)^{n-k-1} + f\left(\tfrac{n}{n}\right) n \binom{n-1}{n-1} x^{n-1} \\
= {} & -f\left(\tfrac{0}{n}\right) n \binom{n-1}{0} (1 - x)^{n-1} + n \sum_{k=0}^{n-2} f\left(\tfrac{k+1}{n}\right) \binom{n-1}{k} x^{k} (1 - x)^{n-k-1} \\
& - n \sum_{k=1}^{n-1} f\left(\tfrac{k}{n}\right) \binom{n-1}{k} x^{k} (1 - x)^{n-k-1} + f\left(\tfrac{n}{n}\right) n \binom{n-1}{n-1} x^{n-1} \\
= {} & n \sum_{k=0}^{n-1} \left[ f\left(\tfrac{k+1}{n}\right) - f\left(\tfrac{k}{n}\right) \right] \binom{n-1}{k} x^{k} (1 - x)^{n-k-1}
\end{aligned}
$$
proving (13.30). We can now prove the equivalence of (i) and (ii) in the monotone case.
(i) implies (ii). If $f$ is increasing, then
$$ f\left(\frac{k+1}{n}\right) - f\left(\frac{k}{n}\right) \geq 0 \qquad \forall k \in \{0, \ldots, n-1\} $$
Since $n \binom{n-1}{k} x^k (1 - x)^{n-k-1} \geq 0$ for all $x \in [0, 1]$ and all $k \in \{0, \ldots, n-1\}$, this implies that $B_n' f(x) \geq 0$ for all $x \in (0, 1)$. By Proposition 1322, $B_n f$ is increasing on $(0, 1)$. Since $B_n f$ is continuous, we conclude that $B_n f$ is increasing on $[0, 1]$.
(ii) implies (i). By Bernstein's Theorem, we have that $B_n f(x) \to f(x)$ for all $x \in [0, 1]$. If $x \geq y$, then this implies that
$$ f(x) = \lim_n B_n f(x) \geq \lim_n B_n f(y) = f(y) $$
so $f$ is increasing.

Turning to concavity, a computation similar to (13.30) shows that, for each $n \geq 2$ and each $x \in (0, 1)$,
$$ B_n'' f(x) = n (n - 1) \sum_{k=0}^{n-2} \Delta^2 f\left(\frac{k}{n}\right) \binom{n-2}{k} x^k (1 - x)^{n-k-2} $$
where
$$ \Delta^2 f\left(\frac{k}{n}\right) = f\left(\frac{k+2}{n}\right) - 2 f\left(\frac{k+1}{n}\right) + f\left(\frac{k}{n}\right) \qquad \forall k \in \{0, \ldots, n-2\} $$
We can now prove the equivalence of (i) and (ii) in the concave case.

(i) implies (ii). Fix $n \geq 1$. If $n = 1$, then $B_n f(x) = f(0)(1 - x) + f(1) x = f(0) + (f(1) - f(0)) x$ for all $x \in [0, 1]$, so $B_n f$ is affine and, in particular, concave. If $n \geq 2$, since $f$ is concave, $\Delta^2 f(k/n) \leq 0$ for all $k \in \{0, \ldots, n-2\}$. Since $n(n-1) \binom{n-2}{k} x^k (1 - x)^{n-k-2} \geq 0$ for all $x \in [0, 1]$ and all $k \in \{0, \ldots, n-2\}$, this implies that $B_n'' f(x) \leq 0$ for all $x \in (0, 1)$. By Corollary 1438, $B_n f$ is concave on $[0, 1]$.

(ii) implies (i). By Bernstein's Theorem, we have that $B_n f(x) \to f(x)$ for all $x \in [0, 1]$. If $x, y \in [0, 1]$ and $\lambda \in [0, 1]$, then this implies that
$$ f(\lambda x + (1 - \lambda) y) = \lim_n B_n f(\lambda x + (1 - \lambda) y) \geq \lim_n \left[ \lambda B_n f(x) + (1 - \lambda) B_n f(y) \right] = \lambda f(x) + (1 - \lambda) f(y) $$
so $f$ is concave. $\blacksquare$
This result implies, inter alia, that increasing (resp., concave) continuous functions on compact intervals can be approximated by increasing (resp., concave) polynomials.

Corollary 613 Let $f : [a, b] \to \mathbb{R}$ be an increasing (resp., concave) and continuous function. For each $\varepsilon > 0$ there exists an increasing (resp., concave) polynomial $p : [a, b] \to \mathbb{R}$ such that
$$ |f(x) - p(x)| \leq \varepsilon \qquad \forall x \in [a, b] $$

We leave to the reader the proof of this result (by now, it should be easy). Finally, the reader may also want to establish a dual version of this corollary for decreasing (resp., convex) functions.
Chapter 14

Equations and Fixed Points
14.1 Equations
14.1.1 Poincaré-Miranda's Theorem

An operator $f = (f_1, \ldots, f_n) : A \subseteq \mathbb{R}^n \to \mathbb{R}^n$ defines an (operator) equation
$$ f(x) = 0 \tag{14.1} $$
that is,
$$ \begin{cases} f_1(x_1, \ldots, x_n) = 0 \\ f_2(x_1, \ldots, x_n) = 0 \\ \quad \vdots \\ f_n(x_1, \ldots, x_n) = 0 \end{cases} \tag{14.2} $$
The vector $x$ is the unknown of the equation. The solutions of equation (14.1) are all $x \in A$ such that $f(x) = 0$.^1

(i) A polynomial equation can be written as
$$ f(x) = 0 \tag{14.3} $$
where $f : \mathbb{R} \to \mathbb{R}$ is the polynomial $f(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \cdots + \alpha_n x^n$. Its solutions are all $x \in \mathbb{R}$ that satisfy (14.3).

(ii) The market equation (13.15), i.e., $E(p) = 0$, is defined via the excess demand function $E : [a, b] \to \mathbb{R}$ of a single good. Its solutions are the equilibrium prices of the one-good market.
^1 Often (14.2) is referred to as a "system of equations", each $f_i(x) = 0$ being an equation. We will also use this terminology when dealing with systems of linear equations (Section 15.7). In view of (14.1), however, one should use this terminology cum grano salis.
(iii) Later in the book (Section 15.7) we will study systems of linear equations that can be written as $f(x) = 0$ through the affine operator $f : \mathbb{R}^n \to \mathbb{R}^n$ defined by $f(x) = Ax - b$. N
A fundamental issue in dealing with equations is the existence of solutions, that is, whether there exist vectors $x \in A$ such that $f(x) = 0$. As is well known from (at least) high school, this might well not be the case: consider $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2 + 1$; there is no $x \in \mathbb{R}$ such that $x^2 + 1 = 0$.
Bolzano's Theorem is a powerful result to establish the existence of solutions in the scalar case. Indeed, if $f : A \subseteq \mathbb{R} \to \mathbb{R}$ is a continuous function and $A$ is an interval, then the equation
$$ f(x) = 0 \tag{14.4} $$
has a solution provided there exist $x', x'' \in A$ such that $f(x') < 0 < f(x'')$. For instance, in this way Corollary 569 was able to establish the existence of solutions of some polynomial equations.
Bolzano's Theorem admits a generalization to $\mathbb{R}^n$ that, surprisingly, turns out to be a quite difficult result, known as the Poincaré-Miranda Theorem.^2 A piece of notation: given a vector $x \in \mathbb{R}^n$, we write
$$ (x_i, x_{-i}) $$
to emphasize the component $i$ of vector $x$. For instance, if $x = (4, 7, 11)$ then $x_1 = 4$ and $x_{-1} = (7, 11)$, while $x_3 = 11$ and $x_{-3} = (4, 7)$.
Under this condition, the Poincaré-Miranda Theorem ensures that for a continuous operator $f = (f_1, f_2) : [a, b] \to \mathbb{R}^2$ there exists a point $x \in [a, b]$ such that
$$ f_1(x) = f_2(x) = 0 $$
In general, if there exist vectors $x', x'' \in A$ such that condition (14.5) holds on the interval $[x', x''] \subseteq A$, then the equation (14.1) induced by a continuous function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}^n$ has a solution. The next example illustrates.
^2 It was stated in 1883 by Henri Poincaré and proved by Carlo Miranda in 1940 (unaware of Poincaré's earlier work). For a proof, we refer interested readers to Kulpa (1997).
^3 For instance, if $a, b \in \mathbb{R}^3$, then $[a_{-1}, b_{-1}] = [a_2, b_2] \times [a_3, b_3]$, $[a_{-2}, b_{-2}] = [a_1, b_1] \times [a_3, b_3]$ and $[a_{-3}, b_{-3}] = [a_1, b_1] \times [a_2, b_2]$.
By the Poincaré-Miranda Theorem, the equation has a solution $x \in [x', x''] \subseteq \mathbb{R}^2$, with $f_1(x) = f_2(x) = 0$. N
Proposition 617 Let $f = (f_1, \ldots, f_n), g = (g_1, \ldots, g_n) : [a, b] \to \mathbb{R}^n$ be continuous operators defined on an interval of $\mathbb{R}^n$. If, for each $i = 1, \ldots, n$, we have

Proof Let $h : [a, b] \to \mathbb{R}^n$ be defined by $h(x) = f(x) - g(x)$. Then, for each $i = 1, \ldots, n$, we have
$$ h_i(b_i, x_{-i}) \leq 0 \leq h_i(a_i, x_{-i}) $$
for each $x \in [a, b]$. Since $h$ is continuous, by the Poincaré-Miranda Theorem there exists $c \in [a, b]$ such that $h(c) = 0$, that is, $f(c) = g(c)$. $\blacksquare$
Through this result we can generalize the equilibrium analysis that we carried out earlier in the book for the market of a single good (Proposition 571). Consider now a market where $n$ goods are traded. Let
$$ D : [a, b] \to \mathbb{R}^n_+ \quad \text{and} \quad S : [a, b] \to \mathbb{R}^n_+ $$
be, respectively, the aggregate demand and supply functions, that is, at price $p \in [a, b] \subseteq \mathbb{R}^n_+$ the market demands a quantity $D_i(p) \geq 0$ and offers a quantity $S_i(p) \geq 0$ of each good $i = 1, \ldots, n$.

A pair $(p, q) \in [a, b] \times \mathbb{R}^n_+$ of prices and quantities is a market equilibrium if
$$ D(p) = q = S(p) \tag{14.8} $$
The last result makes it possible to establish the existence of such an equilibrium, thus generalizing Proposition 571 to the general case of $n$ goods. Besides continuity, existence requires that, for each good $i$, we have
$$ D_i(a_i, p_{-i}) \geq S_i(a_i, p_{-i}) \quad \text{and} \quad D_i(b_i, p_{-i}) \leq S_i(b_i, p_{-i}) \qquad \forall p_{-i} \in [a_{-i}, b_{-i}] $$
At its smallest possible price, $a_i$, the demand of good $i$ is greater than its supply regardless of the prices of the other goods, while the opposite is true at its highest possible price $b_i$. To fix ideas, assume that $a = 0$. Then, the condition $D_i(0, p_{-i}) \geq S_i(0, p_{-i})$ just means that demand of a free good always exceeds its supply, regardless of the prices of the other goods (a reasonable assumption). In contrast, the opposite happens at the highest price $b_i$, at which the supply of good $i$ exceeds its demand regardless of the prices of the other goods (a reasonable assumption as long as $b_i$ is "high enough").
Via the excess demand function $E : [a, b] \to \mathbb{R}^n$ defined by
$$ E(p) = D(p) - S(p) $$
we can formulate the equilibrium condition (14.8) as a market equation
$$ E(p) = 0 \tag{14.9} $$
For $n = 1$, it reduces to the earlier one-good market equation (13.15). A pair $(p, q)$ of prices and quantities is a market equilibrium if and only if the price $p$ solves this equation and $q = D(p)$. There is excess demand at price $p$ for good $i$ if $E_i(p) \geq 0$ and excess supply if $E_i(p) \leq 0$. In equilibrium, there is neither excess demand nor excess supply. Next we state the general equilibrium existence result in excess demand terms.
Proposition 618 Let the excess demand function $E : [a, b] \to \mathbb{R}^n$ be continuous and such that, for each good $i = 1, \ldots, n$,
$$ E_i(b_i, p_{-i}) \leq 0 \leq E_i(a_i, p_{-i}) \qquad \forall p_{-i} \in [a_{-i}, b_{-i}] $$
Then, there exists a market equilibrium $(p, q) \in [a, b] \times \mathbb{R}^n_+$.
This result thus establishes the existence of equilibria under the reasonable assumptions previously discussed. It can be easily extended to the standard case when demand and supply functions are defined on $\mathbb{R}^n_+$, so have the form $D : \mathbb{R}^n_+ \to \mathbb{R}^n_+$ and $S : \mathbb{R}^n_+ \to \mathbb{R}^n_+$, by requiring the existence of prices $p' < p''$ that play the roles of the vectors $a$ and $b$, respectively. These prices are here, mutatis mutandis, the analogs of the vectors $x'$ and $x''$ considered for equation (14.7).
Example 619 (i) All operators $f : \mathbb{R}^n \to \mathbb{R}^n$ are, trivially, self-maps. (ii) The function $f : [0, 1] \to \mathbb{R}$ given by $f(x) = x^2$ is a self-map because $x^2 \in [0, 1]$ for all $x \in [0, 1]$. In contrast, the function $f : [0, 1] \to \mathbb{R}$ given by $f(x) = x + 1$ is not a self-map because, for instance, $f(1) = 2 \notin [0, 1]$. N
Self-maps are important here because they may admit fixed points.

For instance, for the quadratic self-map $f : [0, 1] \to [0, 1]$ given by $f(x) = x^2$, the endpoints 0 and 1 of the unit interval are fixed points. For the self-map $f : \mathbb{R}^2 \to \mathbb{R}^2$ given by $f(x_1, x_2) = (x_1, x_1 x_2)$, the origin is a fixed point in that $f(0) = 0$.
Turn now to the key question of the existence of fixed points. In the scalar case, it is an immediate consequence of Bolzano's Theorem.

Proof The result is obviously true if either $f(0) = 0$ or $f(1) = 1$. Suppose $f(0) > 0$ and $f(1) < 1$. Define the auxiliary function $g : [0, 1] \to \mathbb{R}$ by $g(x) = x - f(x)$. Then $g(0) < 0$ and $g(1) > 0$. Since $g$ is continuous, by Bolzano's Theorem there exists $x \in (0, 1)$ such that $g(x) = 0$. Hence, $f(x) = x$, and so $x$ is a fixed point. $\blacksquare$
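The argument is constructive once Bolzano's Theorem is implemented by bisection on the auxiliary function $g$. A minimal Python sketch (the example self-map and the tolerance are our own illustrative choices):

```python
import math

def fixed_point_bisection(f, tol=1e-10):
    """Find x in [0, 1] with f(x) = x for a continuous self-map f of [0, 1],
    by bisecting g(x) = x - f(x), as in the proof above."""
    g = lambda x: x - f(x)
    lo, hi = 0.0, 1.0
    if g(lo) == 0:
        return lo
    if g(hi) == 0:
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) <= 0:   # keep the sign change of g inside [lo, hi]
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x = fixed_point_bisection(math.cos)  # cos maps [0, 1] into [0, 1]
print(x, math.cos(x))                # ~0.7390851, where cos(x) = x
```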
In the general case, the existence of xed points is ensured by the famous Brouwer's
Fixed Point Theorem.4 In analogy with the scalar case, it can be viewed as a consequence
of the Poincare-Miranda's Theorem.
Proof We first consider the interval case $K = [0, 1]^n$. Let $I : [0, 1]^n \to [0, 1]^n$ be the identity function $I(x) = x$. We have $I_i(0_i, x_{-i}) \leq f_i(0_i, x_{-i})$ and $I_i(1_i, x_{-i}) \geq f_i(1_i, x_{-i})$ for all $x \in [0, 1]^n$, where $1 = (1, \ldots, 1)$. So, we can apply the Poincaré-Miranda Theorem to the function $I - f$, which ensures the existence of a vector $x \in [0, 1]^n$ such that $(I - f)(x) = 0$. Hence, $f(x) = x$. The interval case is thus an immediate consequence of the Poincaré-Miranda Theorem. To consider a general $K$ we need a claim involving its dimension $m = \dim K \leq n$ (see Section 16.6 below).

Claim There exists a continuous bijection $h$ from $K$ to the closed unit ball $B_1(0)$ of $\mathbb{R}^m$.

Proof of the Claim We refer to Stoer and Witzgall (1970), p. 124, for a proof. Here we just remark that when $m = n$, i.e., when $K$ has a nonempty interior (Proposition 812 below), and when $0 \in \operatorname{int} K$, the function $h : K \to B_1(0)$ defined by
$$ h(x) = \begin{cases} \inf\{\lambda \geq 0 : x \in \lambda K\} \, \dfrac{x}{\|x\|} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases} $$
^4 It is named after Luitzen Brouwer, who proved it in 1912.
the composition
$$ g^{-1} \circ f \circ g : [0, 1]^m \to [0, 1]^m $$
is continuous and so, by what was previously proved, it has a fixed point $x$. Since
$$ \left(g^{-1} \circ f \circ g\right)(x) = x \iff f(g(x)) = g(x) $$
the point $g(x)$ is a fixed point of $f$. $\blacksquare$
Brouwer's Theorem is a powerful result that only requires the self-map to be continuous. However, it is demanding on the domain, which has to be a compact and convex set, and it is a non-constructive existence result: it ensures the existence of a fixed point, but gives no information on how to find it.^5

We close by observing that, as proved by Carlo Miranda in his seminal 1940 piece, Brouwer's Theorem can in turn be used to prove the Poincaré-Miranda Theorem. The two results thus imply each other and, in this sense, are equivalent.^6
f (x) = 0
Proof A useful consequence of the positive homogeneity condition A.2 is that, without loss of generality, we can consider the operator $E$ restricted to the simplex
$$ \Delta_{n-1} = \left\{ p \in \mathbb{R}^n_+ : \sum_{i=1}^{n} p_i = 1 \right\} $$
We want to prove that $E^+(p) = 0$. Suppose, by contradiction, that there exists a good $k$ for which $E_k^+(p) = E_k(p) > 0$. By (14.12), it follows that $p_k > 0$. Hence, by A.3 there exists a good $j$ for which $S_j(p) > D_j(p)$. Hence, $E_j^+(p) = 0$. Moreover, A.4 implies that its price is strictly positive, i.e., $p_j > 0$. In view of (14.12) we can write
$$ 0 = E_j^+(p) = p_j \sum_{i=1}^{n} E_i^+(p) $$
This yields $\sum_{i=1}^{n} E_i^+(p) = 0$, which contradicts $E_k^+(p) > 0$. We conclude that $E^+(p) = 0$, so $p$ is a weak equilibrium price. $\blacksquare$
A.5 $D_i(p) < S_i(p)$ for some $i$ with $p_i > 0$ implies $S_j(p) < D_j(p)$ for some $j$: if some goods are in excess supply at a positive price, other ones must be in excess demand.

This result shares with our earlier equilibrium existence result, Proposition 618, conditions A.1 and A.4, the latter being, essentially, the condition $E_i(a_i, x_{-i}) \geq 0$. Conditions A.2, A.3 and A.5 are, instead, new and replace the highest price condition $E_i(b_i, x_{-i}) \leq 0$. In particular, condition A.2 will be given a compelling foundation in Section 22.9.
In Section 22.9, we will present a simple exchange economy that provides a foundation in terms of individual behavior for the aggregate market analysis of this section. In that section we will see that it is natural to expect the excess demand to satisfy the following property (W.1).

As will be seen in Section 22.9, condition W.1 only requires agents to buy affordable bundles, while Walras' law requires them to exhaust their budgets, a reasonable but non-trivial assumption.
Condition W.1 implies condition A.3. So, in the existence Theorem 623 we can replace A.3 with the weak Walras' law, which has a compelling economic foundation. The stronger condition W.2 implies both A.3 and A.5, so in the last result Walras' law can replace these two conditions. A bit more is actually true, as the next simplified version of classic results, due to Kenneth Arrow and Gerard Debreu, shows.^8

^8 The classic work on this topic is Debreu (1959).
Theorem 625 (Arrow-Debreu) Under conditions A.1, A.2 and W.1, a weak market equilibrium exists. If, in addition, A.4 and W.2 hold, then a market equilibrium exists.

and, in particular, $E_i^+(p) E_i(p) = 0$ for each $i$. By the definition of $E_i^+(p)$, we obtain that $E_i(p) \leq 0$ for each $i$. Therefore, $p$ is a weak equilibrium price.

It remains to show that, if also A.4 and W.2 hold, then $p$ is an equilibrium price. Since W.2 implies A.5, we can proceed as in the proof of Proposition 624.
If $\varphi$ is linear and $A$ is the real line, by Riesz's Theorem there exists a vector $a = (a_1, \ldots, a_n) \in \mathbb{R}^n$ such that $\varphi(x) = a \cdot x$, so we get back to the linear recurrence (8.11). Solutions of this important class of recurrences have been studied in Section 11.2.2.

^9 Most of the analysis of this section continues to hold if $A$ is a subset of $\mathbb{R}^n$, as readers can check.
Proof Let $f, \tilde{f} : \mathbb{N} \to \mathbb{R}$ be two solutions of this recurrence. We want to show that $f = \tilde{f}$. We proceed by induction. A preliminary observation: by construction and since the initial terms are given, we have $f(n) = \tilde{f}(n)$ for all $n < k$.

Initial step: just note that $f(0) = \tilde{f}(0)$. Induction step: assume that $f(n - 1) = \tilde{f}(n - 1)$. By the preliminary observation, if $n - 1 < k - 1$, then $n < k$ and $f(n) = \tilde{f}(n)$ for all $n < k$. By (14.13), if $n - 1 \geq k - 1$, then $n \geq k$ and
14.2.2 Asymptotics

From now on, we focus on the recurrence (14.14). We need some notation. Given any self-map $\varphi : A \to A$, its second iterate $\varphi \circ \varphi : A \to A$ is denoted by $\varphi^2$. More generally, $\varphi^n : A \to A$ denotes the $n$-th iterate $\varphi^n = \varphi^{n-1} \circ \varphi$, i.e., the composition of $\varphi$ with itself $n$ times. We adopt the convention that $\varphi^0$ is the identity map: $\varphi^0(x) = x$ for all $x \in A$.
Example 628 (i) Consider the self-map $\varphi : [0, \infty) \to [0, \infty)$ defined by $\varphi(x) = x/(1 + x)$. Then,
$$ \varphi^2(x) = \varphi(\varphi(x)) = \frac{\frac{x}{1+x}}{1 + \frac{x}{1+x}} = \frac{x}{1 + 2x} \qquad \text{and} \qquad \varphi^3(x) = \varphi(\varphi^2(x)) = \frac{\frac{x}{1+2x}}{1 + \frac{x}{1+2x}} = \frac{x}{1 + 3x} $$
This suggests the guess $\varphi^n(x) = x/(1 + nx)$ for all $n \geq 1$. Let us verify this guess by induction. Initial step: the guess clearly holds for $n = 1$. Induction step: assume it holds for $n$. Then,
$$ \varphi^{n+1}(x) = \varphi(\varphi^n(x)) = \frac{\frac{x}{1+nx}}{1 + \frac{x}{1+nx}} = \frac{x}{1 + (n+1)x} $$
as desired.
(ii) Consider the self-map $\varphi : [0, \infty) \to [0, \infty)$ defined by $\varphi(x) = a x^2$. Then,
$$ \varphi^2(x) = \varphi(\varphi(x)) = a \left(a x^2\right)^2 = a^3 x^4 \qquad \text{and} \qquad \varphi^3(x) = \varphi(\varphi^2(x)) = a \left(a^3 x^4\right)^2 = a^7 x^8 $$
This suggests the guess $\varphi^n(x) = a^{2^n - 1} x^{2^n}$ for all $n \geq 1$. Let us verify this guess by induction. Initial step: the guess clearly holds for $n = 1$. Induction step: assume it holds for $n$. Then,
$$ \varphi^{n+1}(x) = \varphi(\varphi^n(x)) = a \left(a^{2^n - 1} x^{2^n}\right)^2 = a \cdot a^{2^{n+1} - 2} x^{2^{n+1}} = a^{2^{n+1} - 1} x^{2^{n+1}} $$
as desired. N
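A quick numerical check of these closed forms (a Python sketch; the values of $x_0$, $a$ and $n$ are arbitrary):

```python
def iterate(phi, x, n):
    # n-th iterate phi^n(x), computed by repeated composition
    for _ in range(n):
        x = phi(x)
    return x

x0, a, n = 0.7, 0.9, 6
print(iterate(lambda x: x / (1 + x), x0, n), x0 / (1 + n * x0))        # example (i)
print(iterate(lambda x: a * x**2, x0, n), a**(2**n - 1) * x0**(2**n))  # example (ii)
```

In both cases the two printed values agree, as the inductions above guarantee.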
We can represent the sequence $\{x_n\}$ defined via the recurrence (14.14) using the iterates $\varphi^n$ of the self-map $\varphi : A \to A$. Indeed, we have
$$ x_n = \varphi^n(x_0) \qquad \forall n \geq 0 \tag{14.18} $$
A sequence of iterates $\{\varphi^n(x_0)\}$ of points in $A$ that starts from an initial point $x_0$ of $A$ is called the orbit of $x_0$ under $\varphi$. The collection
$$ \left\{ \{\varphi^n(x_0)\} : x_0 \in A \right\} $$
of all the orbits determined by possible initial conditions is called the phase portrait of $\varphi$. In view of (14.18), the orbits that form the phase portrait of $\varphi$ describe how the sequence defined by the recurrence (14.14) may evolve according to how it is initialized.
Example 629 (i) For the geometric recurrence, relation (14.18) takes the familiar form
$$ x_n = \varphi^n(x_0) = a^n x_0 \qquad \forall n \geq 0 $$
Orbits solve the recurrence (14.14) if they can be described in closed form, as is the case for the recurrences of the last two examples. Unfortunately, this is often not possible, and so the main interest of (14.18) is theoretical. Yet, operationally it may still suggest a qualitative analysis of the recurrence. A main issue in this regard is the asymptotic behavior of orbits: where do they end up eventually? For instance, do they converge?
The next simple, yet important, result shows that fixed points play a key role in studying the convergence of orbits.

Theorem 630 Let $\varphi : A \to A$ be a continuous self-map and $x_0$ a point of $A$. If the orbit $\{\varphi^n(x_0)\}$ converges to $x \in A$, then $x$ is a fixed point of $\varphi$.

Proof Assume that $x_n = \varphi^n(x_0) \to x \in A$. Since $\varphi$ is continuous, we have $\varphi(x) = \lim \varphi(\varphi^n(x_0))$. So,
$$ \varphi(x) = \lim \varphi(\varphi^n(x_0)) = \lim \varphi^{n+1}(x_0) = \lim x_{n+1} = \lim x_n = \lim \varphi^n(x_0) = x $$
where the equality $\lim x_{n+1} = \lim x_n$ holds because, as is easily checked, if $x_n \to x$ then $x_{n+k} \to x$ for every given $k \geq 1$. We conclude that $x$ is a fixed point, as desired. $\blacksquare$
So, a necessary condition for a point to be the limit of a sequence defined by a recurrence of order 1 is that it be a fixed point of the underlying self-map. If there are no fixed points, convergence is hopeless. If fixed points exist (e.g., by Brouwer's Theorem), we have some hope. Yet, this is only a necessary condition: as will become clear later in the section, there are fixed points of $\varphi$ that are not limit points of the recurrence (14.14).^{10}

Fixed points thus provide the candidate limit points. We have the following procedure to study limits of sequences defined by a recurrence (14.14):

1. Find the collection $\{x \in A : \varphi(x) = x\}$ of the fixed points of the self-map $\varphi$.

2. Check whether they are limits of the orbits $\{\varphi^n(x_0)\}$, that is, whether $\varphi^n(x_0) \to x$.
This procedure is especially effective when the fixed point is unique. Indeed, in this case there is a unique candidate limit point for all possible initial conditions $x_0 \in A$, so if orbits converge (e.g., because they form a monotone sequence, so that Theorem 323 applies) then they have to converge to the fixed point. Remarkably, in this case iterations swamp the initial condition, which asymptotically plays no role in the behavior of the recursion: regardless of how it starts, the recursion eventually behaves the same. A numerical sketch of this procedure follows.
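The two-step procedure is easy to mimic numerically. A minimal Python sketch (the tolerance, the iteration cap and the example map are our own illustrative choices):

```python
def orbit_limit(phi, x0, tol=1e-12, max_iter=10**6):
    """Follow the orbit phi^n(x0); return its limit if it settles numerically.
    By Theorem 630, any such limit must be a fixed point of phi."""
    x = x0
    for _ in range(max_iter):
        x_next = phi(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return None  # the orbit did not converge within max_iter steps

phi = lambda x: 0.5 * (x + 2 / x)   # an illustrative self-map of (0, infinity)
print(orbit_limit(phi, 1.0))        # ~1.4142135..., its unique positive fixed point
```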
In view of this discussion, the next result is especially interesting.^{11}

Proposition 631 If the self-map $\varphi : A \to A$ is a contraction, then it has at most one fixed point.

^{10} See the oscillating case in Section 14.2.3.
^{11} Contractions are introduced in Section 19.1. For this section, it is enough to recall the simple definition for the case at hand: $\varphi$ is a contraction if there exists a constant $k \in (0, 1)$ such that $|\varphi(x_1) - \varphi(x_2)| \leq k |x_1 - x_2|$ for all $x_1, x_2 \in A$.
Proof Suppose that $x_1, x_2 \in A$ are fixed points. By the definition of contraction, for some $k \in (0, 1)$ we have
$$ 0 \leq |x_1 - x_2| = |\varphi(x_1) - \varphi(x_2)| \leq k |x_1 - x_2| $$
and so $|x_1 - x_2| = 0$. This implies $x_1 = x_2$, as desired. $\blacksquare$
So, recursions defined by self-maps that are contractions have at most a single candidate limit point. It is then enough to check whether it is actually a limit point.

The next example shows, inter alia, that being a contraction is a sufficient but not necessary condition for the uniqueness of fixed points.
Example 633 Consider the self-map $\varphi : [0, \infty) \to [0, \infty)$ defined by $\varphi(x) = x/(1 + x)$. We have, for all $x, y \geq 0$,
$$ |\varphi(x) - \varphi(y)| = \frac{|x - y|}{(1 + x)(1 + y)} $$
Since the factor $1/[(1 + x)(1 + y)]$ gets arbitrarily close to 1 as $x, y \to 0$, no constant $k \in (0, 1)$ can satisfy the contraction inequality for all $x, y \geq 0$. So, $\varphi$ is not a contraction. Nevertheless, it is easy to check that it has a unique fixed point, given by the origin $x = 0$. By (14.16), we have
$$ \varphi^n(x_0) = \frac{x_0}{1 + n x_0} \to 0 \qquad \forall x_0 \geq 0 $$
So, the orbits converge to the fixed point for all initial conditions $x_0 \geq 0$. N
In the rest of the section we illustrate our asymptotic analysis through some important
applications.
Classic expectations Let us go back to the recurrence, with initial expectation $E_0(p_1)$,
$$ \begin{cases} p_1 = \dfrac{\alpha - \sigma E_0(p_1)}{\beta} & \\[1ex] p_t = \dfrac{\alpha - \sigma p_{t-1}}{\beta} & \text{for } t \geq 2 \end{cases} \tag{14.19} $$
of the equilibrium prices of markets with production delays and classic expectations, that is, extrapolative expectations of the simplest form $E_{t-1}(p_t) = p_{t-1}$ (cf. Section 8.4.3). Here $\alpha > 0$ denotes the demand intercept, while $\beta > 0$ and $\sigma > 0$ denote the slopes of the demand and supply functions.
We now study $\lim p_t$ to understand the asymptotic behavior of such equilibrium prices. To this end, consider the map $\varphi : [a, b] \to \mathbb{R}$ defined by
$$ \varphi(x) = \frac{\alpha - \sigma x}{\beta} $$

Proof Since $\sigma > 0$, the function $\varphi$ is strictly decreasing, and $\sigma/\beta \in (0, 1]$. Moreover, we have
$$ \varphi(0) = \frac{\alpha}{\beta} \geq 0 \quad \text{and} \quad \varphi\left(\frac{\alpha}{\beta}\right) = \frac{\alpha}{\beta}\left(1 - \frac{\sigma}{\beta}\right) \geq 0 $$
We can conclude that $0 \leq \varphi(\alpha/\beta) \leq \varphi(x) \leq \varphi(0) = \alpha/\beta$ for all $x \in [a, b]$, that is, $\varphi$ is a self-map. $\blacksquare$

We can thus write $\varphi : [a, b] \to [a, b]$. This self-map defines the price recurrence (14.19). Its unique fixed point is easily seen to be
$$ \bar{p} = \frac{\alpha}{\sigma + \beta} $$
Thus, the unique candidate limit price is the equilibrium price (8.17) of the market without delays in production.

Let us check whether or not $\bar{p}$ is indeed the limit point. The following formula is key:
$$ p_t - \bar{p} = \left(-\frac{\sigma}{\beta}\right)^{t-1} (p_1 - \bar{p}) \qquad \forall t \geq 1 \tag{14.20} $$
Proof For each $t \geq 2$ we have
$$ p_t - \bar{p} = \frac{\alpha - \sigma p_{t-1}}{\beta} - \frac{\alpha}{\sigma + \beta} = \frac{\alpha(\sigma + \beta) - \sigma(\sigma + \beta) p_{t-1} - \alpha \beta}{\beta(\sigma + \beta)} = \frac{\sigma \alpha - \sigma(\sigma + \beta) p_{t-1}}{\beta(\sigma + \beta)} = -\frac{\sigma}{\beta}\left(p_{t-1} - \frac{\alpha}{\sigma + \beta}\right) $$
that is,
$$ p_t - \bar{p} = -\frac{\sigma}{\beta} (p_{t-1} - \bar{p}) \qquad \forall t \geq 2 \tag{14.21} $$
Iterating (14.21) yields (14.20), as desired. $\blacksquare$
Since
$$ (-1)^{t-1} = \begin{cases} 1 & \text{if } t \text{ odd} \\ -1 & \text{if } t \text{ even} \end{cases} $$
from formula (14.20) it follows that
$$ |p_t - \bar{p}| = \left(\frac{\sigma}{\beta}\right)^{t-1} |p_1 - \bar{p}| \qquad \forall t \geq 1 \tag{14.22} $$
The value of $\lim p_t$ thus depends on the ratio $\sigma/\beta$ of the slopes of the supply and demand functions. We need to distinguish two cases according to whether this ratio is lower than or equal to 1, that is, according to whether $\sigma < \beta$ or $\sigma = \beta$.
Case 1: $\sigma < \beta$. The supply function has a lower slope than the demand function. We have
$$ \lim_t |p_t - \bar{p}| = |p_1 - \bar{p}| \lim_t \left(\frac{\sigma}{\beta}\right)^{t-1} = 0 $$
So,
$$ \lim p_t = \bar{p} \tag{14.23} $$
as well as
$$ \lim E_{t-1}(p_t) = \bar{p} \tag{14.24} $$
When $\sigma < \beta$, the fixed point $\bar{p}$ is indeed a limit point. Equilibrium prices of markets with delays and classic expectations thus converge to the equilibrium price of the market without delays in production. This holds for any possible initial expectation $E_0(p_1)$, which in the long run turns out to be immaterial.

Note that the (one-step-ahead) forecast error vanishes asymptotically:
$$ e_t = p_t - E_{t-1}(p_t) \to 0 $$
Classic expectations, though lazy, are nevertheless asymptotically correct provided $\sigma < \beta$.
Case 2: $\sigma = \beta$. The demand and supply functions have the same slope. Formula (14.20) implies
$$ p_t - \bar{p} = (-1)^{t-1} (p_1 - \bar{p}) \qquad \forall t \geq 1 $$
The initial price $p_1$ is equal to $\bar{p}$ if and only if the initial expectation is correct:
$$ E_0(p_1) = p_1 \iff E_0(p_1) = \varphi(E_0(p_1)) \iff E_0(p_1) = \bar{p} $$
So, if the initial expectation is correct, then $p_t = \bar{p}$ for all $t \geq 1$. Otherwise, the initial error $E_0(p_1) \neq p_1$ determines an oscillating sequence of equilibrium prices
$$ p_t = \bar{p} + (-1)^{t-1} (p_1 - \bar{p}) = \begin{cases} 2\bar{p} - p_1 & \text{if } t \text{ even} \\ p_1 & \text{if } t \text{ odd} \end{cases} $$
for all $t \geq 1$. Also the forecast error
$$ e_t = p_t - E_{t-1}(p_t) = p_t - p_{t-1} = \left[ (-1)^{t-1} - (-1)^{t-2} \right] (p_1 - \bar{p}) = 2 (-1)^{t-1} (p_1 - \bar{p}) $$
keeps oscillating. The simulation below illustrates both cases.
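A short simulation makes the two cases visible. This Python sketch uses the notation introduced above with made-up parameter values:

```python
def cobweb_prices(alpha, sigma, beta, p1, T):
    # p_t = (alpha - sigma * p_{t-1}) / beta, the classic-expectations recurrence
    prices = [p1]
    for _ in range(T - 1):
        prices.append((alpha - sigma * prices[-1]) / beta)
    return prices

p_bar = lambda alpha, sigma, beta: alpha / (sigma + beta)

# sigma < beta: damped oscillations converging to p_bar
print(cobweb_prices(1.0, 0.5, 1.0, 0.9, 8), "->", p_bar(1.0, 0.5, 1.0))
# sigma = beta: permanent oscillation between p1 and 2*p_bar - p1
print(cobweb_prices(1.0, 1.0, 1.0, 0.9, 8), "->", p_bar(1.0, 1.0, 1.0))
```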
In view of the equilibrium relation (8.19), in the market for potatoes with production delays the producers' forecast error $e_t$ at time $t$ is
$$ e_t = p_t - E_{t-1}(p_t) = \frac{\alpha}{\beta} - \left(\frac{\sigma}{\beta} + 1\right) E_{t-1}(p_t) $$
so
$$ e_t = 0 \iff \frac{\alpha}{\beta} - \left(\frac{\sigma}{\beta} + 1\right) E_{t-1}(p_t) = 0 \iff E_{t-1}(p_t) = \frac{\alpha}{\sigma + \beta} $$
So, expectations are rational if and only if
$$ E_{t-1}(p_t) = p_t = \bar{p} = \frac{\alpha}{\sigma + \beta} \qquad \forall t \geq 1 $$
We have thus proved the following result.
Proposition 636 A uniperiodal market equilibrium of markets $M^R_t$ features rational expectations if and only if the sequence of equilibrium prices is constant, with
$$ p_t = E_{t-1}(p_t) = \bar{p} $$
for all $t \geq 1$.
The constancy of equilibrium prices is thus equivalent to the correctness of expectations. A non-trivial price dynamics is, thus, the outcome of forecast errors. This result holds for any kind of expectations, extrapolative or not. Indeed, the rationality of expectations is a property of expectations, not a hypothesis on how they are formed: once a possible expectation formation mechanism is specified, a theoretical issue is to understand when the resulting expectations are correct. For instance, in the previous case $\sigma = \beta$, we saw that classic expectations are rational if and only if the initial expectation is correct, that is, $E_0(p_1) = p_1$.
The uniperiodal price equilibrium under rational expectations of markets $M^R_t$ with production delays is equal to the equilibrium price (8.17) of market $M$. Remarkably, rational expectations have neutralized, in equilibrium, any effect of differences in production technologies. In terms of potatoes' equilibrium prices, it is immaterial whether one has a traditional technology, with sowing in $t - 1$ and harvest in $t$, or a Star Trek one with instantaneous production.
Consider $t \geq 2$. We have
$$ p_t - \bar{p} = \frac{\delta \alpha}{\beta} + \left[(1 - \delta) - \frac{\delta \sigma}{\beta}\right] p_{t-1} - \frac{\alpha}{\sigma + \beta} = \left[(1 - \delta) - \frac{\delta \sigma}{\beta}\right] (p_{t-1} - \bar{p}) $$
that is,
$$ p_t - \bar{p} = \left[(1 - \delta) - \frac{\delta \sigma}{\beta}\right] (p_{t-1} - \bar{p}) \qquad \forall t \geq 2 $$
as desired. Iterating,
$$ |p_t - \bar{p}| = \left|(1 - \delta) - \frac{\delta \sigma}{\beta}\right|^{t-1} |p_1 - \bar{p}| \qquad \forall t \geq 1 $$
which reduces to (14.22) in the classic case $\delta = 1$. The limit behavior of the prices $p_t$ thus depends on the ratio
$$ (1 - \delta) - \frac{\delta \sigma}{\beta} = 1 - \delta \left(\frac{\sigma}{\beta} + 1\right) \tag{14.26} $$
In particular, if $\delta < 1$, then even when $\sigma = \beta$ we have
$$ -1 < 1 - \delta \left(\frac{\sigma}{\beta} + 1\right) < 1 $$
$$ x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right) \qquad \forall n \geq 1 \tag{14.27} $$
with initial term $x_1 = a$.

Theorem 638 (Heron) Let $0 < a \neq 1$. It holds that $x_n \to \sqrt{a}$.
Thus, Heron's sequence converges to the square root of $a$. Moreover, the rate of convergence is quite fast, as we will see in a few examples.

Proof By induction, it is immediate to show that $x_n > 0$ for all $n \geq 1$. Heron's sequence is convergent because it is (strictly) decreasing, at least from $n = 2$ on. To prove this, we first observe that
$$ x_n > \sqrt{a} \implies x_n > x_{n+1} > \sqrt{a} \tag{14.28} $$
Indeed, let $x_n > \sqrt{a}$. It follows that $x_n^2 > a$, i.e., $x_n > a/x_n$. So,
$$ x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right) < \frac{1}{2}(x_n + x_n) = x_n $$
Moreover,
$$ x_n^4 - 2 x_n^2 a + a^2 > 0 \implies \frac{x_n^4 + a^2}{x_n^2} > 2a \implies x_n^2 + \frac{a^2}{x_n^2} > 2a \implies x_n^2 + \frac{a^2}{x_n^2} + 2a > 4a \implies \left(x_n + \frac{a}{x_n}\right)^2 > 4a $$
that is,
$$ x_{n+1}^2 = \frac{1}{4}\left(x_n + \frac{a}{x_n}\right)^2 > a $$
proving that $x_{n+1} > \sqrt{a}$. This completes the proof of (14.28).
If $a > 1$, we have $x_1 = a > \sqrt{a}$. By (14.28), $x_2 > \sqrt{a}$. If, instead, $0 < a < 1$, then $x_2 = (a + 1)/2 > \sqrt{a}$. Indeed, by squaring we obtain $(a + 1)^2 / 4 > a$, i.e., $(a - 1)^2 > 0$, which holds since $a \neq 1$.

To illustrate the speed of convergence, take $a = 2$. We have
$$ x_1 = 2, \qquad x_2 = \frac{1}{2}\left(2 + \frac{2}{2}\right) = \frac{3}{2} = 1.5 $$
$$ x_3 = \frac{1}{2}\left(\frac{3}{2} + \frac{2}{3/2}\right) = \frac{17}{12} \simeq 1.4166667 $$
$$ x_4 = \frac{1}{2}\left(\frac{17}{12} + \frac{2}{17/12}\right) = \frac{577}{408} \simeq 1.4142156 $$
$$ x_5 = \frac{1}{2}\left(\frac{577}{408} + \frac{2}{577/408}\right) = \frac{665857}{470832} \simeq 1.4142135 $$
Recall that, when $x_n > \sqrt{a}$, we have $x_n > a/x_n$ and
$$ x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right) < x_n $$
By iterating the algorithm, $x_n$ and $a/x_n$ become closer and closer, until they reach their common value $\sqrt{a}$. The following figure illustrates:
[Figure: at each step the new iterate $x_{n+1}$ lies between $a/x_n$ and $x_n$, so the two values squeeze toward their common limit $\sqrt{a}$.]
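In code, Heron's algorithm is a few lines. The following Python sketch (with an illustrative stopping rule of our own choosing) reproduces the iterates computed above:

```python
def heron_sqrt(a, tol=1e-15):
    """Approximate sqrt(a) for a > 0 via x_{n+1} = (x_n + a/x_n) / 2."""
    x = a  # initial term x_1 = a, as in the proof above
    while abs(x * x - a) > tol * a:
        x = 0.5 * (x + a / x)
    return x

print(heron_sqrt(2))  # 1.41421356..., matching the iterates x_2, ..., x_5 above
```

The number of correct digits roughly doubles at each step, which is the fast convergence noted after the theorem.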
Part IV
Chapter 15

Linear Functions and Operators
Example 641 The scalar functions $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = mx$ for some $m \in \mathbb{R}$ are linear. Geometrically, they are straight lines passing through the origin with slope $m$. N
Example 642 Through inner products (Section 4.1.1), it is easy to define linear functions.^1 Indeed, given a vector $\alpha \in \mathbb{R}^n$, define $f : \mathbb{R}^n \to \mathbb{R}$ by
$$ f(x) = \alpha \cdot x \qquad \forall x \in \mathbb{R}^n \tag{15.2} $$
This function $f$ is linear:
$$ f(\lambda x + \mu y) = \alpha \cdot (\lambda x + \mu y) = \sum_{i=1}^{n} \alpha_i (\lambda x_i + \mu y_i) = \lambda \sum_{i=1}^{n} \alpha_i x_i + \mu \sum_{i=1}^{n} \alpha_i y_i = \lambda (\alpha \cdot x) + \mu (\alpha \cdot y) = \lambda f(x) + \mu f(y) $$
for every $x, y \in \mathbb{R}^n$ and every $\lambda, \mu \in \mathbb{R}$. When $n = 1$, we go back to the last example: $f$ is then a straight line passing through the origin with slope $\alpha \in \mathbb{R}$. N

^1 These functions are sometimes called linear functionals.
Example 643 Production functions may take the linear form (15.2), with $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n) \in \mathbb{R}^n$ interpreted as the vector of constant production coefficients. Indeed, we have
$$ f(e^1) = \alpha \cdot e^1 = \alpha_1, \quad f(e^2) = \alpha \cdot e^2 = \alpha_2, \quad \ldots, \quad f(e^n) = \alpha \cdot e^n = \alpha_n $$
which means that $\alpha_1$ is the quantity of output determined by one unit of the first input, $\alpha_2$ is the quantity of output determined by one unit of the second input, and so on. These coefficients are constant because they do not depend on the quantity of input. This implies that the returns to scale of these production functions are constant. N
Next, we give a simple but important characterization: a function is linear if and only if it preserves the operations of addition and scalar multiplication. Linear functions are, thus, the functions that preserve the linear structure of $\mathbb{R}^n$. This clarifies their nature.
$$ f(\lambda x + \mu y) = f(\lambda x) + f(\mu y) = \lambda f(x) + \mu f(y) $$
Next we show that, more generally, linear combinations are preserved by linear functions. When $k = 2$ we are back to the definition, but the result goes well beyond that, as it holds for every $k \geq 2$.
$$ f(0) = 0 $$
and
$$ f\left(\sum_{i=1}^{k} \lambda_i x^i\right) = \sum_{i=1}^{k} \lambda_i f(x^i) \tag{15.3} $$
for every set of vectors $\{x^i\}_{i=1}^{k}$ in $\mathbb{R}^n$ and every set of scalars $\{\lambda_i\}_{i=1}^{k}$.

Proof Let us show that $f(0) = 0$. Since $f$ is linear, we have $f(\lambda 0) = \lambda f(0)$ for all $\lambda \in \mathbb{R}$. So, $f(0) = \lambda f(0)$ for all $\lambda \in \mathbb{R}$, which can happen if and only if $f(0) = 0$. The proof of (15.3) is left to the reader. $\blacksquare$
A more general version of (15.3), called Jensen's inequality, will be proved in Chapter 17. Property (15.3) has an important consequence: once we know the values taken by a linear function on the elements of a basis, we can determine its value for any vector of $\mathbb{R}^n$ whatsoever. Indeed, let $S$ be a basis of $\mathbb{R}^n$. Each vector $x \in \mathbb{R}^n$ can be written as a linear combination of elements of $S$, so there exist a finite set of vectors $\{x^i\}_{i=1}^{n}$ in $S$ and a set of scalars $\{\lambda_i\}_{i=1}^{n}$ such that $x = \sum_{i=1}^{n} \lambda_i x^i$. By (15.3), we then have
$$ f(x) = \sum_{i=1}^{n} \lambda_i f(x^i) $$
Linearity is a purely algebraic property that requires functions to behave consistently with respect to the operations of addition and scalar multiplication. Thus, prima facie, linearity has no topological consequences. It is, therefore, remarkable that linear functions turn out to be continuous.

This elegant result is important because continuity is, as we learned in the last chapter, a highly desirable property. We omit, however, the proof because it is a special case of a result, Theorem 833, that will be proved later in the book.
15.1.2 Representation

The operations of addition and scalar multiplication for functions $f, g : \mathbb{R}^n \to \mathbb{R}$, additive or not, and a scalar $\lambda \in \mathbb{R}$ are defined in the usual pointwise way (cf. Section 6.3.2), that is,
$$ (f + g)(x) = f(x) + g(x) \qquad \forall x \in \mathbb{R}^n \tag{15.4} $$
and
$$ (\lambda f)(x) = \lambda f(x) \qquad \forall x \in \mathbb{R}^n \tag{15.5} $$
In particular, linearity is preserved by these operations.

Next we introduce the natural ambient space in which to carry out these operations on linear functions.
Definition 648 The set of all linear functions $f : \mathbb{R}^n \to \mathbb{R}$ is called the dual space of $\mathbb{R}^n$ and is denoted by $(\mathbb{R}^n)'$.

By Proposition 647, the dual space $(\mathbb{R}^n)'$ is closed under addition and scalar multiplication:
$$ f, g \in (\mathbb{R}^n)' \implies f + g \in (\mathbb{R}^n)' \quad \text{and} \quad \lambda f \in (\mathbb{R}^n)' \quad \forall \lambda \in \mathbb{R} $$
The two operations satisfy the properties (v1)-(v8) that, in Chapter 3, we discussed for $\mathbb{R}^n$. Hence, intuitively, $(\mathbb{R}^n)'$ is an example of a vector space. In particular, the neutral element for the addition is the zero function $f$ such that $f(x) = 0$ for every $x \in \mathbb{R}^n$, while the opposite element of $f \in (\mathbb{R}^n)'$ is the function $g = (-1) f = -f$ such that $g(x) = -f(x)$ for every $x \in \mathbb{R}^n$.
The next important result, an elementary version of the celebrated Riesz's Theorem, describes the dual space $(\mathbb{R}^n)'$. We saw that every vector $\alpha \in \mathbb{R}^n$ induces a linear function $f : \mathbb{R}^n \to \mathbb{R}$ defined by $f(x) = \alpha \cdot x$ (Example 642). The following result shows that the converse holds: all linear functions defined on $\mathbb{R}^n$ have this form, i.e., the dual space $(\mathbb{R}^n)'$ consists of the linear functions of the type $f(x) = \alpha \cdot x$ for some $\alpha \in \mathbb{R}^n$. In particular, the straight lines passing through the origin are the unique linear functions defined on the real line (Example 641).
Theorem 649 (Riesz) A function $f : \mathbb{R}^n \to \mathbb{R}$ is linear if and only if there exists a unique vector $\alpha \in \mathbb{R}^n$ such that
$$ f(x) = \alpha \cdot x \qquad \forall x \in \mathbb{R}^n $$

Proof We have already seen the "if" part in Example 642. It remains to prove the "only if" part. So, let $f : \mathbb{R}^n \to \mathbb{R}$ be a linear function and consider the standard basis $e^1, \ldots, e^n$ of $\mathbb{R}^n$. Set
$$ \alpha = \left(f(e^1), \ldots, f(e^n)\right) \in \mathbb{R}^n $$
We can write each vector $x \in \mathbb{R}^n$ as $x = \sum_{i=1}^{n} x_i e^i$. Thus, by the linearity of $f$ we have
$$ f(x) = f\left(\sum_{i=1}^{n} x_i e^i\right) = \sum_{i=1}^{n} x_i f(e^i) = \sum_{i=1}^{n} \alpha_i x_i = \alpha \cdot x \qquad \forall x \in \mathbb{R}^n $$
As to uniqueness, if $\alpha' \in \mathbb{R}^n$ also represents $f$, then $\alpha \cdot x = \alpha' \cdot x$ for all $x \in \mathbb{R}^n$; taking $x = e^i$ gives $\alpha_i = \alpha'_i$ for each $i$, and so $\alpha' = \alpha$. $\blacksquare$
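The proof is effectively an algorithm: evaluate $f$ on the standard basis to read off $\alpha$. A minimal Python sketch (the linear $f$ below is an arbitrary illustration):

```python
import numpy as np

f = lambda x: 2.0 * x[0] - x[1] + 3.0 * x[2]   # an illustrative linear function on R^3

n = 3
alpha = np.array([f(e) for e in np.eye(n)])    # alpha_i = f(e^i), as in the proof
x = np.array([0.5, -1.0, 2.0])
print(alpha)          # [ 2. -1.  3.]
print(f(x), alpha @ x)  # both equal 8.0: f(x) coincides with the inner product
```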
15.1.3 Monotonicity

Turn now to the order structure of $\mathbb{R}^n$. A function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be:

To prove that a linear function is (strictly, strongly) increasing, it is thus enough to show that it is (strictly, strongly) positive.
Proof "Only if". Let $f : \mathbb{R}^n \to \mathbb{R}$ be linear and increasing. Let $x \geq 0$. We want to show that $f(x) \geq 0$. As $f$ is increasing, we have
$$ f(x) \geq f(0) = 0 $$
where the equality follows from Proposition 645. We conclude that $f(x) \geq 0$, as desired.

"If". Let $f : \mathbb{R}^n \to \mathbb{R}$ be linear and positive. Let $x, y \in \mathbb{R}^n$ be such that $x \geq y$. We want to show that $f(x) \geq f(y)$. Set $z = x - y \in \mathbb{R}^n$. Since $x \geq y$, we have $z \geq 0$. Positivity and linearity then imply
$$ f(x) - f(y) = f(x - y) = f(z) \geq 0 $$
yielding $f(x) \geq f(y)$, as desired.

Finally, the proof of the strict and strong versions is similar. $\blacksquare$
Positivity also emerges in the monotone version of Riesz's Theorem. This result is of great importance in applications as, for example, we will see in Section 24.6.^3
$$ f(x) = \alpha \cdot x \qquad \forall x \in \mathbb{R}^n $$
In particular,

^2 Positivity with respect to the order structure is weaker than positivity of the image of a function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$, a stronger notion requiring $f(x) \geq 0$ for all $x \in A$. In what follows, it should be clear from the context which notion of positivity we are referring to.
^3 Co-named after Andrei Markov because his 1938 piece is an early systematic study of monotone linear functions in general spaces. A more general version of the Riesz-Markov Theorem will be given in Theorem 765.
Proof "Only if". Let $f : \mathbb{R}^n \to \mathbb{R}$ be linear and increasing. As $f$ is linear, by the Riesz Theorem there exists $\alpha \in \mathbb{R}^n$ such that $f(x) = \alpha \cdot x$ for all $x \in \mathbb{R}^n$. As $f$ is increasing, we have
$$ \alpha_i = \alpha \cdot e^i = f(e^i) \geq 0 $$
for each $i = 1, \ldots, n$. This proves that $\alpha \geq 0$, that is, $\alpha \in \mathbb{R}^n_+$.

"If". Let $f : \mathbb{R}^n \to \mathbb{R}$ be such that $f(x) = \alpha \cdot x$ for all $x \in \mathbb{R}^n$, with $\alpha \in \mathbb{R}^n_+$. Clearly, $f$ is linear. To show that $f$ is increasing, take $x, y \in \mathbb{R}^n$ with $x \geq y$. We want to show that $f(x) \geq f(y)$. Set $z = x - y$. Since $x \geq y$, we have $z \geq 0$. As $\alpha \geq 0$, we then have
$$ \alpha \cdot z \geq 0 $$
that is, $f(x) - f(y) = \alpha \cdot z \geq 0$, as desired.

Turn now to the strongly increasing case. "Only if": since $1 \gg 0$, we have
$$ \sum_{i=1}^{n} \alpha_i = \alpha \cdot 1 = f(1) > 0 $$
As $\alpha \geq 0$, from $\sum_{i=1}^{n} \alpha_i > 0$ it follows that $\alpha > 0$.

"If". Let $f : \mathbb{R}^n \to \mathbb{R}$ be such that $f(x) = \alpha \cdot x$ for all $x \in \mathbb{R}^n$, with $\alpha > 0$. Clearly, $f$ is linear. To show that $f$ is strongly increasing, take $x, y \in \mathbb{R}^n$ with $x \gg y$. We want to show that $f(x) > f(y)$. Set $z = x - y$. Since $x \gg y$, we have $z \gg 0$. As $\alpha > 0$, we then have
$$ \alpha \cdot z > 0 $$
that is, $f(x) > f(y)$.

Finally, in the strictly increasing case, the "only if" follows because $e^i > 0$, so
$$ \alpha_i = \alpha \cdot e^i = f(e^i) > 0 $$
for each $i$, i.e., $\alpha \gg 0$; as to the "if", with $\alpha \gg 0$, for $x > y$ we have $z = x - y > 0$, and so
$$ \alpha \cdot z > 0 $$
that is, $f(x) > f(y)$. $\blacksquare$
(Strictly, strongly) increasing linear functions are thus characterized by (strongly, strictly) positive representing vectors $\alpha$. Let us see an instance of this result.
As the reader can easily verify, dual results hold for decreasing and negative linear functions. Finally, let us denote by $(\mathbb{R}^n)'_+$ the subset of the dual space $(\mathbb{R}^n)'$ consisting of all positive linear functions. If we identify each $f \in (\mathbb{R}^n)'_+$ with its unique positive representing vector $\alpha$, we can identify $(\mathbb{R}^n)'_+$ with $\mathbb{R}^n_+$.
15.2 Matrices

15.2.1 Definition

Matrices play a key role in the study of linear operators. Specifically, an $m \times n$ matrix is simply a table, with $m$ rows and $n$ columns, of scalars
$$ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mj} & \cdots & a_{mn} \end{bmatrix} $$
For example,
$$ \begin{bmatrix} 1 & 5 & 7 & 9 \\ 3 & 2 & 1 & 4 \\ 12 & 15 & 11 & 9 \end{bmatrix} $$
is a $3 \times 4$ matrix, where
Notation The elements (or components, or entries) of a matrix are denoted by $a_{ij}$ and the matrix itself is also denoted by $(a_{ij})$. A matrix with $m$ rows and $n$ columns will often be denoted by $A_{m \times n}$.

A matrix is called square (of order $n$) when $m = n$ and rectangular when $m \neq n$. For instance, the $3 \times 4$ matrix above is rectangular, with three row vectors
$$ \begin{bmatrix} 1 & 5 & 7 & 9 \end{bmatrix}, \quad \begin{bmatrix} 3 & 2 & 1 & 4 \end{bmatrix}, \quad \begin{bmatrix} 12 & 15 & 11 & 9 \end{bmatrix} $$
The $3 \times 3$ matrix
$$ \begin{bmatrix} 1 & 5 & 1 \\ 3 & 4 & 2 \\ 1 & 7 & 9 \end{bmatrix} $$
is square, with three row vectors
$$ \begin{bmatrix} 1 & 5 & 1 \end{bmatrix}, \quad \begin{bmatrix} 3 & 4 & 2 \end{bmatrix}, \quad \begin{bmatrix} 1 & 7 & 9 \end{bmatrix} $$
Example 654 (i) The square matrix of order $n$ obtained by writing, one next to the other, the versors $e^i$ of $\mathbb{R}^n$ is called the identity (or unit) matrix and is denoted by $I_n$ or, when there is no danger of confusion, simply by $I$:
$$ I = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} $$
(ii) The $m \times n$ matrix with all zero elements is called null and is denoted by $O_{mn}$ or, when there is no danger of confusion, simply by $O$:
$$ O = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} $$
N
(i) given two matrices $(a_{ij})$ and $(b_{ij})$ in $M(m, n)$, the addition $(a_{ij}) + (b_{ij})$ is defined by
$$ \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} + \begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{m1} & \cdots & b_{mn} \end{bmatrix} = \begin{bmatrix} a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\ \vdots & & \vdots \\ a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn} \end{bmatrix} $$
that is, $(a_{ij}) + (b_{ij}) = (a_{ij} + b_{ij})$;

(ii) given $\lambda \in \mathbb{R}$ and $(a_{ij}) \in M(m, n)$, the scalar multiplication $\lambda (a_{ij})$ is defined by
$$ \lambda \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} \lambda a_{11} & \cdots & \lambda a_{1n} \\ \vdots & & \vdots \\ \lambda a_{m1} & \cdots & \lambda a_{mn} \end{bmatrix} $$
that is, $\lambda (a_{ij}) = (\lambda a_{ij})$.
Example 656 Given a square matrix $A = (a_{ij})$ of order $n$ and two scalars $\lambda$ and $\mu$, we have
$$ \lambda A + \mu I = \begin{bmatrix} \lambda a_{11} + \mu & \lambda a_{12} & \cdots & \lambda a_{1n} \\ \lambda a_{21} & \lambda a_{22} + \mu & \cdots & \lambda a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda a_{n1} & \lambda a_{n2} & \cdots & \lambda a_{nn} + \mu \end{bmatrix} $$
N
It is easy to verify that the operations of addition and scalar multiplication just introduced on $M(m, n)$ satisfy the properties (v1)-(v8) that in Chapter 3 we established for $\mathbb{R}^n$, that is:

(v1) $A + B = B + A$

(v2) $(A + B) + C = A + (B + C)$

(v3) $A + O = A$

(v4) $A + (-A) = O$

(v5) $\lambda (A + B) = \lambda A + \lambda B$

(v6) $(\lambda + \mu) A = \lambda A + \mu A$

(v7) $1 A = A$

(v8) $\lambda (\mu A) = (\lambda \mu) A$

Intuitively, we can say that $M(m, n)$ is another example of a vector space. Note that the neutral element for the addition is the null matrix.
(i) symmetric if $a_{ij} = a_{ji}$ for every $i, j = 1, 2, \ldots, n$, i.e., when the two triangles separated by the main diagonal are mirror images of each other;

(ii) lower triangular if all the elements above the main diagonal are zero, that is, $a_{ij} = 0$ for $i < j$;

(iii) upper triangular if all the elements below the main diagonal are zero, that is, $a_{ij} = 0$ for $i > j$;

(iv) diagonal if it is both lower and upper triangular, that is, if all the elements outside the main diagonal are zero: $a_{ij} = 0$ for $i \neq j$.
$$ b_{ij} = a_{ji} $$
For example,
$$ A = \begin{bmatrix} 1 & 0 & 7 \\ 3 & 5 & 1 \end{bmatrix} \quad \text{and} \quad A^T = \begin{bmatrix} 1 & 3 \\ 0 & 5 \\ 7 & 1 \end{bmatrix} $$
Note that
$$ \left(A^T\right)^T = A $$
so the "transpose of the transpose" of a matrix is the matrix itself. In particular, it is easy to see that a square matrix $A$ is symmetric if and only if $A^T = A$. In this case, transposition has no effect. Finally, in terms of operations we have
$$ (A + B)^T = A^T + B^T \quad \text{and} \quad (\lambda A)^T = \lambda A^T $$
has no e ect. Finally, in terms of operations we have
that is, xT 2 M (n; 1). This allows us to identify Rn also with M (n; 1).
In what follows we will often identify the vectors of Rn with matrices. Sometimes it
will convenient to regard them as row vectors, that is, as elements of M (1; n), sometimes
as column vectors, that is, as elements of M (n; 1). In any case, one should not forget that
vectors are elements of Rn , identi cations are holograms.
$$ a^1 \cdot x, \; a^2 \cdot x, \; \ldots, \; a^m \cdot x $$
are the inner products between the rows of $A$ and the vector $x$. In particular, $A x^T \in M(m, 1)$.

It is thus evident why the dimension of the vector $x$ must be equal to the number of columns of $A$: in multiplying $A$ with $x$, the components of $A x^T$ are the inner products between the rows of $A$ and the vector $x$, and inner products are possible only between vectors of the same dimension.

Notation To ease notation, in what follows we will just write $Ax$ instead of $A x^T$.

However, it is not possible to take the product $xA$: the number of rows of $A$ (i.e., 3) is not equal to the number of columns of $x$ (i.e., 1). N
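The row-by-row mechanics can be spelled out in a few lines of Python (a sketch with an arbitrary $3 \times 2$ matrix of our own choosing):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0],
              [4.0, 1.0]])   # a 3 x 2 matrix
x = np.array([5.0, -1.0])    # x must have as many entries as A has columns

by_rows = np.array([row @ x for row in A])  # inner product of each row with x
print(by_rows)   # [ 3. -3. 19.]
print(A @ x)     # identical: Ax stacks exactly these inner products
```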
The product matrix $AB$ is of type $m \times q$: it has the same number of rows as $A$ and the same number of columns as $B$. Note that it is possible to take the product $AB$ of the matrices $A_{m \times n}$ and $B_{n \times q}$ if and only if the product $B^T A^T$ of the transposed matrices $B^T_{q \times n}$ and $A^T_{n \times m}$ is well defined. Momentarily it will be seen that, indeed, $(AB)^T = B^T A^T$.

This definition of the product between matrices finds its justification in Proposition 677, which we discuss later in the chapter. For the moment, it is important to understand the "mechanics" of the definition. To this end, we proceed with some examples.
For instance, take
$$ A = \begin{bmatrix} 3 & -2 & 8 & -6 \\ 13 & 0 & -4 & 9 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 & 2 & 3 \\ 5 & -6 & 1 \\ 12 & 7 & 0 \\ -1 & 9 & 11 \end{bmatrix} $$
Then
$$ AB = \begin{bmatrix} 3 \cdot 0 + (-2) \cdot 5 + 8 \cdot 12 + (-6)(-1) & 3 \cdot 2 + (-2)(-6) + 8 \cdot 7 + (-6) \cdot 9 & 3 \cdot 3 + (-2) \cdot 1 + 8 \cdot 0 + (-6) \cdot 11 \\ 13 \cdot 0 + 0 \cdot 5 + (-4) \cdot 12 + 9 \cdot (-1) & 13 \cdot 2 + 0 \cdot (-6) + (-4) \cdot 7 + 9 \cdot 9 & 13 \cdot 3 + 0 \cdot 1 + (-4) \cdot 0 + 9 \cdot 11 \end{bmatrix} = \begin{bmatrix} 92 & 20 & -59 \\ -57 & 79 & 138 \end{bmatrix} $$
However, it is not possible to take the product $BA$: the number of rows of $A$ (i.e., 2) is not equal to the number of columns of $B$ (i.e., 3). As we just remarked, it is possible, though, to take the product $B^T A^T$; indeed, the number of columns of $B^T$ (i.e., 4) is equal to the number of rows of $A^T$. N
472 CHAPTER 15. LINEAR FUNCTIONS AND OPERATORS
The product matrix AB is 2 4. In this regard, note the useful mnemonic rule (2 4) =
(2 3)(3 4). We have:
2 3
1 2 1 0
1 3 1 4 2 5 2 2 5
AB =
0 1 4
0 1 3 2
1 1+3 2+1 0 1 2+3 5+1 1 1 1+3 2+1 3 1 0+3 2+1 2
=
0 1+1 2+4 0 0 2+1 5+4 1 0 1+1 2+4 3 0 0+1 2+4 2
7 18 10 8
=
2 9 14 10
N
The product of matrices has the following properties, as the reader can verify.
Proposition 662 Let A; B and C be any three matrices for which it is possible to take the
products indicated below. Then
(v) (AB)T = B T AT .
Among the properties of the product, commutativity is missing. Indeed, the product of
matrices does not satisfy this property: if both products AB and BA are well-de ned, in
general we have AB 6= BA. The next example will illustrate this notable failure.
When AB = BA, we say that the two matrices commute. Since (AB)T = B T AT , the
matrices A and B commute if and only if their transposes commute.
Since A and B are square matrices, both BA and AB are well-de ned 3 3 matrices. We
have:
2 32 3
2 1 4 1 0 3
4
BA = 0 3 1 5 4 2 1 0 5
4 2 4 1 4 6
2 3
2 1+1 2+4 1 2 0+1 1+4 4 2 3+1 0+4 6
=4 0 1+3 2+1 1 0 0+3 1+1 4 0 3+3 0+1 6 5
4 1+2 2+4 1 4 0+2 1+4 4 4 3+2 0+4 6
2 3
8 17 30
=4 7 7 6 5
12 18 36
while
2 32 3
1 0 3 2 1 4
AB = 4 2 1 0 54 0 3 1 5
1 4 6 4 2 4
2 3
1 2+0 0+3 4 1 1+0 3+3 2 1 4+0 1+3 4
=4 2 2+1 0+0 4 2 1+1 3+0 2 2 4+1 1+0 4 5
1 2+4 0+6 4 1 1+4 3+6 2 1 4+4 1+6 4
2 3
14 7 16
= 4 4 5 9 5
26 25 32
The notion of linear operator generalizes that of linear function (De nition 640), which
is the special case m = 1, that is, Rm = R.
Linear operators are the operators which preserve the operations of addition and scalar
multiplication, thus generalizing the analogous result that we established for linear functions
(Proposition 644). Though natural, it is a signi cant generalization: here T (x) is a vector
of Rm , not a scalar (unless m = 1).
We conclude this rst section with some basic properties of linear operators that gen-
eralize those stated in Proposition 645 for linear functions (the easy proof is left to the
reader).
As we have already seen for linear functions, property (15.9) has the important conse-
quence that, once we know the values taken by a linear operator T on the elements of a basis
of Rn , we can determine the values of T for each vector of Rn .
The operations of addition and scalar multiplication for operators are de ned, as usual
(cf. Section 6.3.2), pointwise: given two operators S; T : Rn ! Rm , linear or not, and a
scalar 2 R, de ne S + T : Rn ! Rm and T : Rn ! Rm by
and
( T ) (x) = T (x) 8x 2 Rn
Denote by
L (Rn ; Rm )
the space of all linear operators T : Rn ! Rm . In the case of linear functions, i.e., m = 1,
the space L (Rn ; R) reduces to the dual space (Rn )0 that we studied before. It is immediate
to check that addition and scalar multiplication preserve linearity:
The space L (Rn ; Rm ) is thus closed under these two operations, which are also easily seen
to satisfy the \usual" properties (v1)-(v8). Again, this means that L (Rn ; Rm ) is, intuitively,
another example of a vector space. To ease notation, in the special case n = m, i.e., for
linear operators T : Rn ! Rn having the same domain and codomain, we just write L (Rn )
in place of L (Rn ; Rn ).
Addition and scalar multiplication are, by now, routine operations. The next notion is,
instead, peculiar to operators.
for every x 2 Rn .
In other words, the product operator ST is the composite function S T . If the operators
S and T are linear, also the product ST is so. Indeed:
for every x; y 2 Rn and every ; 2 R. The product of two linear operators is, therefore,
still a linear operator.
476 CHAPTER 15. LINEAR FUNCTIONS AND OPERATORS
As Proposition 677 will make clear, in general the product is not commutative: when
both products ST and T S are de ned, in general we have ST 6= T S. Hence, when one writes
ST and T S, the order with which the two operators appear is important.
Last, but not least, we state the version for operators of the remarkable Theorem 646 on
continuity.
We omit the proof because, like Theorem 646, this result is a special case of Theorem
833.
15.3.2 Representation
T (x) = Ax (15.10)
for every x 2 Rn .
The matrix A is called matrix associated to the operator T (or also representative matrix
of the operator T ). In particular, if we identify each linear operator with its associated
matrix, we can identify the space L (Rn ; Rm ) with the space of matrices M (m; n).
Matrices allow us, therefore, to represent operators in the form (15.10), which is of great
importance both theoretically and operationally. This is why matrices are so important:
though the fundamental notion is that of operator, thanks to the representation (15.10)
matrices become a most useful auxiliary notion that will accompany us in the rest of the
book.
Proof \If". This part is contained, essentially, in Example 666. \Only if". Let T be a linear
operator. Set
" #
A = T e1 ; T e2 ; :::; T (en ) (15.11)
m n m 1
m 1 m 1
15.3. LINEAR OPERATORS 477
that is, A is the m n matrix whosePn columns are the column vectors T ei for i = 1; :::; n.
We can write every x 2 Rn as x = ni=1 xi ei . Therefore, for every x 2 Rn ,
n
! n
X X
i
T (x) = T xi e = xi T ei
i=1 i=1
2 3 2 3 2 3
a11 a12 a1n
6 a21 7 6 a22 7 6 a2n 7
6 7 6 7 6 7
= x1 6 .. 7 + x2 6 .. 7+ + xn 6 .. 7
4 . 5 4 . 5 4 . 5
am1 am2 amn
2 3 2 3
a11 x1 + a12 x2 + + a1n xn a1 x
6 a21 x1 + a22 x2 + 7 6
+ a2n xn 7 6 a2 x 7
6 7
=6 .. 7=6 .. 7 = Ax
4 . 5 4 . 5
am1 x1 + am2 x2 + + amn xn am x
(a1n ; a2n ; :::; amn ) = T (en ) = Ben = (b1n ; b2n ; :::; bmn )
Therefore, A = B.
Example 673 De ne T : R3 ! R3 by
T (x) = (0; x2 ; x3 ) 8x 2 R3
and therefore 2 3
" # 0 0 0
A = T e1 ; T e2 ; T e3 =4 0 1 0 5
3 3 3 1 3 1 3 1 0 0 1
Hence, T (x) = Ax for every x 2 R3 . N
Example 674 De ne T : R3 ! R2 by
T (x) = (x1 x3 ; x1 + x2 + x3 ) 8x 2 R3
478 CHAPTER 15. LINEAR FUNCTIONS AND OPERATORS
T e1 = (1; 1) ; T e2 = (0; 1) ; T e3 = ( 1; 1)
Proposition 675 Let S; T : Rn ! Rm be two linear operators and let 2 R. Let A and B
be the two m n matrices associated to S and T , respectively. Then,
is then the matrix associated to the operator S +T . Moreover, if we take for example = 10,
by Proposition 675, 2 3
0 0 0
4
A = 0 10 0 5
0 0 10
is the then matrix associated to the operator S. N
15.3. LINEAR OPERATORS 479
Then, the matrix associated to the product operator ST : Rn ! Rq is the product matrix
AB = (abij )
q n
The product matrix AB is, therefore, the matrix representation of the product operator
ST . This motivates the notion of product of matrices that, when it was introduced earlier
in the chapter, might have seemed quite arti cial.
n q m
Proof Let ei i=1
, e~i i=1
, and ei i=1
be respectively the standard bases of Rn , Rq , and
Rm . We have
Pm
Therefore, cij = k=1 aik bkj and we conclude that C = AB.
As we saw in Section 15.2.4, the product of matrices is in general not commutative: this,
indeed, re ects the lack of commutativity of the product of linear operators.
480 CHAPTER 15. LINEAR FUNCTIONS AND OPERATORS
15.4 Rank
15.4.1 Linear operators
The kernel, denoted ker T , of an operator T : Rn ! Rm is the set
That is, the kernel of T is the preimage of 0 under T , i.e., ker T = T 1 (0). The kernel is
thus the set of the points at which the operator takes on a null value (i.e., the zero vector 0
of Rm ). When T is linear, we always have 0 2 ker T because T (0) = 0 by Proposition 669.
Example 678 De ne T : R2 ! R2 by
This operator is easily seen to be linear. We have ker T = f0g, i.e., the zero vector is the
only vector where T takes on value 0. Indeed, a vector x = (x1 ; x2 ) of the plane belongs to
ker T when both the di erence and the sum of its components is null, a property that only
the vector 0 satis es. N
Another important set is the image (or range) of T , which is de ned in the usual way as
Im T = fT (x) : x 2 Rn g Rm (15.14)
The image is, therefore, the set of the vectors of Rm that are \reached" from Rn through
the operator T .
For linear operators the above sets turn out to be vector subspaces, the kernel of the
domain Rn and the image of the codomain Rm .
Proof We show the result for ker T , leaving Im T to the reader. Let x; x0 2 ker T , i.e.,
T (x) = 0 and T (x0 ) = 0. For every ; 2 R, we have
T x + x0 = T (x) + T x0 = 0 + 0 = 0
These vector subspaces are important when dealing with the properties of injectivity
and surjectivity of linear operators. In particular, by de nition the operator T is surjective
when Im T = Rm , that is, when the subspace Im T coincides with the entire space Rm . As
to injectivity, by exploiting the linearity of T we have the following simple characterization
through a null kernel.
Proof \If". Suppose that ker T = f0g. We want to show that T is injective. Let x; y 2 Rn
with x 6= y. Since x y 6= 0, the hypothesis ker T = f0g implies x y 2 = ker T , i.e.,
T (x y) 6= 0. By the linearity of T , we then have T (x) 6= T (y).
\Only if". Let T : Rn ! Rm be an injective linear operator. We want to show that
ker T = f0g. By Proposition 669, T (0) = 0 and so 0 2 ker T . Let 0 6= x 2 Rn . By
injectivity, T (x) 6= T (0) = 0 and so x 2
= ker T . We conclude that ker T = f0g.
We can now state the important Rank-Nullity Theorem, which says that the dimension
n of the Euclidean space Rn is the sum of the dimensions of the two subspaces ker T and
Im T determined by a linear operator T . To this end, we give a name to such dimensions.
Using this terminology, we can now state and prove the result.
(T ) + (T ) = n (15.15)
k
Proof Setting (T ) = k and (T ) = h, let y i i=1
be a basis of the vector subspace Im T
h k
of Rm and xi i=1
a basis of the vector subspace ker T of Rn . Since y i i=1 Im T , by
k n i
de nition there exist k vectors fxi gi=1 in R such that T (xi ) = y for every i = 1; :::; k. Set
n o
E = x1 ; :::; xk ; x1 ; :::; xh
To prove the theorem it is su cient to show that E is a basis of Rn . Indeed, in this case E
consists of n vectors and therefore k + h = n.
First of all, we show that the set E is linearly independent. Let f 1 ; :::; k ; 1 ; :::; h g be
scalars such that
k
X h
X
i
i xi + ix =0 (15.16)
i=1 i=1
h Ph
On the other hand, since xi i=1
is a basis of ker T , we have T i=1 ix
i = 0. Therefore,
k
! k k
X X X
i i i
T ix = iT x = iy =0 (15.17)
i=1 i=1 i=1
4
In this proof we use two di erent zero vectors 0: the zero vector 0Rm in Rm and the zero vector 0Rn in
n
R . For simplicity, we omit subscripts as no confusion should arise.
482 CHAPTER 15. LINEAR FUNCTIONS AND OPERATORS
k
Being a basis, y i i=1 is a linearly independent set, so (15.17) implies i = 0 for every
P
i = 1; :::; k. Therefore, (15.16) reduces to hi=1 i xi = 0, which implies i = 0 for every
h
i = 1; :::; h because xi i=1 , as a basis, is a linearly independent set. Thus, we conclude that
the set E is linearly independent.
It remains to show that span E = Rn . Let x 2 Rn and consider its image T (x). By
k
de nition, T (x) 2 Im T and therefore, since y i i=1 is a basis of Im T , there exists a set
P
f i gki=1 R such that T (x) = ki=1 i y i . Setting y i = T xi for every i = 1; :::; k, one
obtains !
X k Xk
i i
T (x) = iT x = T ix
i=1 i=1
Pk i
Pk i
Therefore, T x i=1 ix = 0, and so x i=1 ix 2 ker T . On the other hand,
h
xi is a basis of ker T , and therefore there exists a set f i ghi=1 of scalars such that
i=1
P k i
Ph i
Pk i
Ph i
x i=1 i x = i=1 i x . In conclusion, x = i=1 i x + i=1 i x , which shows that
x 2 span E, as desired.
To appreciate the importance of this result, next we present some interesting conse-
quences that it has.
Proof (i) Let T be injective. By Lemma 680, ker T = f0g. Since Im T is a vector subspace
of Rm , we have (T ) = dim (Im T ) dim Rm = m. Therefore, (15.15) reduces to
as claimed.
For a generic function, injectivity and surjectivity are altogether distinct and independent
properties: for instance, the function f : R ! R given by f (x) = arctan x is injective but not
surjective since Im f = ( =2; =2), as seen in Section 6.5.3, while the function f : R ! R
given by f (x) = x sin x is surjective but not injective (as its graph vividly shows). The next
important result, a remarkable consequence of the Rank-Nullity Theorem, shows that for
linear operators T : Rn ! Rn with the same domain and codomain the two properties turn
out to be, instead, equivalent.5
5
In Section 14.1.2 we called self-maps the operators T : Rn ! Rn .
15.4. RANK 483
n = (T ) + (T ) = n + (T )
Hence, (T ) = 0. In turn, this implies that ker T = f0g. By Lemma 680, T is injective.
\Only if". Let T be injective. By Lemma 680, ker T = f0g. By (15.15),
n = (T ) + 0 = (T )
Remarkably, for a linear operator T : Rn ! Rn the following properties are thus equiva-
lent:
(i) T is bijective;
An equivalent way to state this equivalence is to say that the following conditions are
equivalent:
(i) T is bijective;
(ii) (T ) = 0;
(iii) (T ) = n.
De nition 686 The rank of a matrix A, denoted by (A), is the maximum number of its
linearly independent columns.
Let A be the matrix associated to a linear operator T . Since the vector subspace Im T
is generated by the column vectors of A,6 we have (T ) (A) (why?). The next result
shows that, actually, equality holds: the notions of rank for operators and for matrices are
consistent. In other words, the dimension of the image of a linear operator is equal to the
maximum number of linearly independent columns of the matrix associated to it.
Thanks to the Rank-Nullity Theorem, this proposition has the following corollary that
shows that the linear independence of the columns is the matrix counterpart for injectivity.
So far we have considered the linear independence of the columns of A. The connection
with the linear independence of the rows of A is, however, very tight as the next important
result shows. In reading it, note that the rank of the transpose matrix AT is the maximum
number of linearly independent rows of A.
Theorem 690 For every matrix A, the maximum numbers of its linearly independent rows
and columns coincide, i.e., (A) = AT .
6 Pn Pn
Indeed, recall that the i-th column of A is T ei and therefore T (x) = T i=1 xi ei = i=1 xi T ei .
This shows that the image T (x) is a linear combination of the columns of A.
15.4. RANK 485
Proof Let A = (aij ) 2 M (m; n). In the proof we denote the i-th row by Ri and the j-th
column by Cj . We have to prove that the subspace of Rn generated by the rows of A, called
row space of A, has the same dimension of the subspace of Rm generated by the columns
of A, called column space of A. Let r be the dimension of the row space of A, that is,
r = AT , and let fx1 ; x2 ; :::; xr g Rn be a basis of this space, where
Each row Ri of A can be written in a unique way as a linear combination of fx1 ; x2 ; :::; xr g,
that is, there exists a vector of r coe cients (w1i ; w2i ; :::; wri ) such that
Let us concentrate now on the rst column of A, i.e., C1 = (a11 ; a21 ; :::am1 ). The rst
component a11 of C1 is equal to the rst component of R1 , the second component a21 of C1
is equal to the rst component of R2 , and so on until the m-th component am1 of C1 which
is equal to the rst component of Rm . Thanks to (15.18), we have
that is,
2 3 2 1 3 2 3 2 3
a11 w1 w21 wr1
6 a21 7 6 2 7 6 2 7 6 wr2 7
C1 = 6 7 = x11 6 w1 7 + x21 6 w2 7 + + xr1 6 7
4 5 4 5 4 5 4 5
am1 w1m m
w2 m
wr
The column C1 of A can, therefore, be written as linear combination of the vectors w1 ; w2 ; :::; wr ,
where 2 1 3 2 1 3 2 1 3
w1 w2 wr
6 2
w1 77 6 2 7 6 wr2 7
w1 = 6 2 6 w2 7 ; ; wr = 6 7
4 5; w = 4 5 4 5
w1m w2m wrm
In a similar way it is possible to verify that all the n columns of A can be written as linear
combinations of w1 ; w2 ; :::; wr . Therefore, the column space of A is generated by the r vectors
w1 ; w2 ; :::; wr of Rm , which implies that its dimension (A) is lower than or equal to r. That
is,
(A) r = (AT )
By interchanging the rows and the columns and by repeating the same reasoning, we get
r = (AT ) (A)
Since the rst row is obtained by multiplying the second one by 3, the set of all the three
rows is linearly dependent. Therefore, AT < 3. Instead, the two rows (3; 6; 18) and
(0; 1; 3) are linearly independent, like the rows (1; 2; 6) and (0; 1; 3). Therefore, AT = 2.
N
The maximum sets of linearly independent rows or columns can be di erent: in the
matrix of the last example we have two di erent sets, both for the rows and for the columns.
Yet, they have the same cardinality because (A) = AT . It is a remarkable result that, in
view of Corollary 685, shows that for a linear operator T : Rn ! Rn the following conditions
are equivalent:
(i) T is injective;
(ii) T is surjective;
The equivalence of these conditions is one of the deepest results of linear algebra.
O.R. Sometimes one calls rank by rows the maximum number of linearly independent rows,
and rank by columns what we have de ned as the rank, that is, the maximum number of
linearly independent columns. According to these de nitions, Theorem 690 says that the
rank by columns always coincides with the rank by rows. The rank is their common value.H
15.4.3 Properties
From Theorem 690 it follows that, if A 2 M (m; n), we have
If it happens that (A) = min fm; ng, the matrix A is said to be of full (or maximum) rank.
Indeed, the rank cannot assume a higher value.
Note that the rank of a matrix does not change if one permutes the places of two columns.
So, without loss of generality, we can assume that, for a matrix A of rank r, the rst r
columns are linearly independent. This useful convention will be used several times in the
proofs below.
Proof (i) Let r and r0 be the ranks of A and of B: there are r and r0 linearly independent
columns in A and in B, respectively. If r + r0 n the result is trivial because the number of
columns of A + B is n and there cannot be more than n linearly independent columns.
Let therefore r + r0 < n. We denote by as and bs , with s = 1; : : : ; n, the generic columns
of the two matrices, so that the sth column of A + B is as + bs . We can always assume that
the r linearly independent columns of A are the rst ones { i.e., a1 ; : : : ; ar { and that the r0
0
linearly independent columns of B are the last ones { i.e., bn r +1 ; : : : ; bn . In this way the
n (r + r0 ) central columns
n of A + B (i.e., the as +obs with s = r + 1; : : : ; n r0 ) are certainly
0
linear combinations of a1 ; ; ar ; bn r +1 ; : : : ; bn because the as can be written as linear
n 0
o
combinations of a1 ; ; ar and the bs of bn r +1 ; : : : ; bn . It follows that the number of
linearly independent columns of A + B cannot exceed r + r0 . We leave to the reader the
proof of the rest of the statement.
(ii) Let us prove (A) = (AD), leaving to the reader the proof of (A) = (CA) (the
equality (A) = (CAD) can be obtained immediately from the other two ones). If A = O,
the result is trivially true. Let therefore A 6= O and let r be the rank of A; there are therefore
r linearly independent columns: let us call them a1 ; a2 ; : : : ; ar since we can always suppose
that they are the rst r ones; the others, ar+1 ; ar+2 ; ; an are linear combinations of the
rst ones. Let us prove, now, that the columns of AD are linear combinations of the columns
of A. To this end, let A = (aij ) and D = (dij ). Moreover, let i for i = 1; 2; :::; m and aj for
j = 1; 2; :::; n be the rows and the columns of A, and dj for j = 1; 2; :::; n be the columns of
D. Then
2 1
3 2 1
3
d1 1 d2 1 dn
6 2 7 1 2 6 2 d1 2 d2 2 dn 7
AD = 6
4
7 d jd j
5 jdn = 6
4
7
5
m m d1 m d2 m d n
The rst column of AD is, therefore, a linear combination of the columns of A. Analogously,
it is possible to prove that the second column of AD is
Therefore, since each column of AD is a linear combination of the columns of A, the space
generated by the columns of AD is a subspace of Rm of dimension lower than or equal to
that of the space generated by the columns of A. In other words,
Let us suppose, by contradiction, that (AD) < (A) = r. Then, in the linear combinations
(15.20) one of the rst r columns of A always has coe cient zero { otherwise, the column
space of AD would have dimension at least r, being a1 ; a2 ; :::; ar linearly independent vectors
of Rm . Without loss of generality, let us suppose that column a1 is the one having coe cient
zero in all linear combinations (15.20). Then, we have
which is a contradiction since D has full rank and it cannot have a row of only zeros.
Therefore, the space generated by the columns of AD has dimension at least r, that is,
(AD) r. Together with (15.21), this proves the result.
(iii) If A, and therefore AT , has full rank, the result follows from (ii). Suppose that A
has not full rank and let (A) = r, with r < minfm; ng. As seen in (ii), the columns of AT A
are linear combinations of the columns of AT , and so
By assuming that the rst r columns of A are linearly independent, we can write A as
A = B C
m n m r m (n r)
BT BTB BTC
AT A = [B C] = :
CT C TB C TC
By property (ii), the submatrix B T B, which is square of order r, has full rank r. Therefore,
the r columns of B T B are linearly independent vectors of Rr . Consequently, the rst r
columns of AT A are linearly independent vectors of Rn (otherwise, the r columns of B T B
would not be linearly independent). The column space of AT A has dimension at least r,
that is, (AT A) r. Together with (15.22), this proves the result.
15.4. RANK 489
have rank 3: in the rst one the rst three columns are linearly independent (they are
the three versors of R3 ); in the second one the rst three rows are linearly independent.
The matrices (15.23) are a special case of echelon matrices, which are characterized by the
following properties:
(i) the rows with not all elements zero have 1 as rst non-zero component, called pivot
element, or simply pivot;
(ii) the other elements of the column of the pivot are zero;
(iii) pivots form a \little scale" from the left to the right: a pivot of a lower row is to the
right of the pivot of an upper row;
(iv) the rows with all elements zero (if they exist) lie under the other rows, so in the lower
part of the matrix.
in which the pivots are in boldface. Note that a square matrix is an echelon matrix when it
is diagonal, possibly followed by rows of only zeros; for example:
2 3
1 0 0
4 0 1 0 5
0 0 0
Clearly, the non-zero rows (that is, the rows with at least one non-zero element) are linearly
independent. The rank of an echelon matrix is, therefore, obvious.
Lemma 693 The rank of an echelon matrix is equal to the number of non-zero rows.
490 CHAPTER 15. LINEAR FUNCTIONS AND OPERATORS
There exist some simple operations that permit to transform any matrix A into an echelon
matrix. Such operations, called elementary operations (by row ),8 are:
The three operations amounts to multiplying, on the left, the matrix A 2 M (m; n) by
suitable m m square matrices, called elementary. Speci cally,
(i) multiplying the s-th row of A by a scalar amounts to multiplying, on the left, A by
the elementary matrix Ps ( ) that coincides with the identity matrix Im except that,
in the place (s; s), we have instead of 1;
(ii) adding to the r-th row of A a multiple of the s-th row amounts to multiplying, on
the left, A by the elementary matrix Srs ( ) that coincides with the identity matrix
Im except that, in the place (r; s), we have instead of 0;
(iii) interchanging the r-th row and the s-th row of A amounts to multiplying, on the left,
A by the elementary matrix Trs that coincides with the identity matrix Im except that
the r-th row and the s-th row have been interchanged.
(i) Multiplying A by 2 3
1 0 0
P2 ( ) = 4 0 0 5
0 0 1
on the left, we get
2 3 2 3 2 3
1 0 0 3 2 4 1 3 2 4 1
4
P2 ( ) A = 0 0 5 4 5
1 0 6 9 = 4 0 6 9 5
0 0 1 5 3 7 4 5 3 7 4
(ii) Multiplying A by 2 3
1 0
S12 ( ) = 4 0 1 0 5
0 0 1
8
Though we could de ne also analogous elementary operations by column, we prefer not to do it and to
refer always to the rows in order to avoid any confusion and errors in computations. Choosing the rows over
the columns does not change the results.
15.4. RANK 491
in which to the rst row one added the second one multiplied by .
(iii) Multiplying A by 2 3
0 1 0
T12 =4 1 0 0 5
0 0 1
on the left, we get
2 3 2 3 2 3
0 1 0 3 2 4 1 1 0 6 9
T12 A=4 1 0 0 5 4 1 0 6 9 5=4 3 2 4 1 5
0 0 1 5 3 7 4 5 3 7 4
The next result, the proof of which we omit, shows the uniqueness of the echelon matrix
to which we arrive via elementary operations:
Lemma 695 Each matrix A 2 M (m; n) is transformed, via elementary operations, into a
unique echelon matrix A 2 M (m; n).
Naturally, di erent matrices can be transformed into the same echelon matrix. The
sequence of elementary operations that transforms a matrix A into the echelon matrix A is
called the Gaussian elimination procedure
where: (1) multiplication of the rst row by 1=3; (2) addition of the rst row to the second
one; (3) addition of 5 times the rst row to the third one; (4) subtraction of the second
row from the rst one; (5) addition of the second row multiplied by 1=2 to the third one; (6)
addition of the third row multiplied by 3=2 to the rst one; (7) subtraction of the third row
multiplied by 22=12 from the second one; (8) multiplication of the second row by 3=2 and of
the third one by 1=4. Finally, we get
2 3
3
1 0 0 2
6 7
6 7
6
A=6 0 1 0 21 7
4 7
4 5
7
0 0 1 4
Example 697 If A is square of order n, the echelon matrix A that the Gaussian elimination
procedure yields is square of order n and upper triangular, with diagonal composed of only
1s and 0s. N
Going back to the calculation of the rank, which was the motivation of this section,
Proposition 692 shows that the elementary operations by row do not alter the rank of A
because the elementary matrices are square matrices of full rank. We have therefore:
To calculate the rank of a matrix one can, therefore, apply Gaussian elimination to obtain
an echelon matrix of equal rank, whose rank is evident. For instance, in the last example
(A) = 3 because all the three rows are non-zero. By Proposition 698, we have (A) = 3,
so matrix A has full rank.
This lemma, whose proof is left to the reader, shows that the inverse operator T 1 is a
linear operator too, that is, T 1 2 L (Rn ).9 Moreover, it is easy to verify that
1 1
T T = TT =I (15.24)
1 0
A=
1 2
The operator T is invertible, as the reader can verify, where T 1 (x) = Bx for every x 2 R2
and
1 0
B= 1 1
2 2
Finding the inverse operator is not an easy task, yet it is not just con ned to guessing. Later
in the chapter we will discuss a procedure allowing for the computation of B. N
In the last section we saw a rst characterization of the invertibility through the notions
of rank and nullity (Corollary 685). We give now another characterization of invertibility.
What is remarkable is that the existence of either a right or a left inverse grants full- edged
invertibility.
Proposition 701 Given an operator T 2 L (Rn ), the following conditions are equivalent:
(i) T is invertible;
T S = RT = I (15.25)
(ii) implies (iii). Assume T S = I. We next show that ker S = ker T = f0g. By
contradiction, assume that there exists x 6= 0 such that S (x) = 0. Since T is linear, we
reach the contradiction 0 = T (0) = T (S (x)) = I (x) = x. By Corollary 685, we conclude
that ker S = f0g and S is bijective. Since ker S = f0g, we are left to show that ker T = f0g.
By contradiction, assume that there exists x 6= 0 such that T (x) = 0. Since S is bijective,
9
Recall that L(Rn ) is the space of linear operators T : Rn ! Rn .
494 CHAPTER 15. LINEAR FUNCTIONS AND OPERATORS
there exists y 2 Rn such that S (y) = x. Since x 6= 0 and S is bijective, y must be di erent
from 0. Since T S = I, we have y = T (S (y)) = T (x) = 0, a contradiction. By Corollary
685, we conclude that ker T = f0g and that T is injective. So, T is invertible and it is enough
to set R = T 1 .
(iii) implies (iv). Assume that there exists R 2 L (Rn ) such that RT = I. We can use
the same technique of above with the role of T played by R and of S played by T . This
allows us to conclude that R is invertible. Since RT = I, this implies that R 1 (RT ) = R 1 ,
yielding that T = R 1 . Since R 1 is injective, T is invertible and S and R can be chosen to
be T 1 .
(iv) implies (i). Since (iv) is stronger than (ii), with the same technique used to prove
that (ii) implies (iii) we can show that T is invertible.
1 0 1 1 0
A= and A = 1 1
1 2 2 2
T (x) = (x1 x2 ; x1 + x2 )
Corollary 703 For a square matrix A of order n the following properties are equivalent:
(i) A is invertible;
(iv) (A) = n;
(v) there exist two square matrices B and C of order n such that AB = CA = I; such
matrices are unique, with B = C = A 1 .
Proposition 704 If the square matrices A and B of order n are invertible, then their prod-
uct is invertible and
(AB) 1 = B 1 A 1
Proof Let A and B be of order n and invertible. We have (A) = (B) = n, so that
(AB) = n by Proposition 692. By Corollary 703, the matrix AB is invertible. Recall from
(6.12) of Section 6.4 that, for the composition of invertible functions f and g, one has that
(g f ) 1 = f 1 g 1 . In particular this holds for linear operators, that is, (ST ) 1 = T 1 S 1 ,
so Proposition 677 implies (AB) 1 = B 1 A 1 .
15.6 Determinants
15.6.1 De nition
A matrix contained in a matrix A 2 M (m; n) is called a submatrix of A. It can be thought
of as obtained from A by deleting some rows and/or columns. In particular, we denote by
Aij the (m 1) (n 1) submatrix obtained from A by deleting row i and column j.
that is, its determinant is simply the product of the elements of the main diagonal. Indeed,
all the other products are zero because they necessarily contain a zero element of the rst
row.
Since det A = det AT (Proposition 717), a similar result holds for upper triangular ma-
trices, so also for the diagonal ones. N
Example 710 If the matrix A has all the elements of its rst row zero except for the rst
one, which is equal to 1, then
2 3
1 0 0 2 3
6 a21 a22 7 a22 a2n
6 a2n 7 6 .. .. .. 7
det 6 . .. .. .. 7 = det 4 . . . 5
4 .. . . . 5
an2 ann
an1 an2 ann
That is, the determinant coincides with the determinant of the submatrix A11 . Indeed, in
n
X
det A = ( 1)1+j a1j det A1j
j=1
all the summands except for the rst one are zero. More generally, for any scalar k we have
2 3
k 0 0 2 3
6 a21 7 a22 a2n
6 a22 a2n 7 6 . .. .. 7
det 6 . .. .. .. 7 = k det 4 .. . . 5
4 .. . . . 5
an2 ann
an1 an2 ann
The determinant of a square matrix can, therefore, be calculated through a well speci ed
procedure { an algorithm { based on its submatrices. There exist various techniques to
simplify the calculation of determinants (we will see some of them shortly) but, for our
purposes, it is important to know that they can be calculated through algorithms.
15.6.2 Geometry
Geometrically, the determinant of a square matrix measures (with a sign!) the \space taken
up" by its column vectors. Let us try to explain this, at least in the simplest case. So, let A
be the matrix 2 2
a11 a12
A=
a21 a22
498 CHAPTER 15. LINEAR FUNCTIONS AND OPERATORS
in which we assume that a11 > a12 > 0 and a22 > a21 > 0 (the other possibilities can be
similarly studied, as readers can check).
3 G
2 a F C E
22
1 a B
21
0
D
O a a
12 11
-1
-2
-3
-3 -2 -1 0 1 2 3 4 5
The determinant of A is the area of the parallelogram OBGC (see the gure), i.e., twice
the area of the triangle OBC that is obtained from the two column vectors of A. The area
of the triangle OBC can be easily calculated by subtracting from the area of the rectangle
ODEF the areas of the three triangles ODB, OCF , and BEC. Since
a11 a21 a22 a12
area ODEF = a11 a22 ; area ODB = ; area OCF =
2 2
(a11 a12 ) (a22 a21 ) a11 a22 a11 a21 a12 a22 + a12 a21
area BCE = =
2 2
one gets
a11 a21 + a22 a12 + a11 a22 a11 a21 a12 a22 + a12 a21
area OBC = a11 a22
2
a11 a22 a12 a21
=
2
Therefore,
det A = area OBGC = a11 a22 a12 a21
The reader will immediately realize that:
(i) if we exchange the two columns, the determinant changes only its sign (because the
parallelogram is covered in the opposite direction);
(ii) if the two vectors are proportional, that is, linearly dependent, the determinant is zero
(because the parallelogram collapses in a segment).
One has
6 2
area ODEF = 6 8 = 48; area ODB =
=6
2
8 4 (6 4) (8 2)
area OCF = = 16; area BCE = =6
2 2
and so
area OBC = 48 6 16 6 = 20
We conclude that
det A = area OBGC = 40
For 3 3 matrices, the determinant is the volume (with sign) of the hexahedron determined
by the three column vectors.
15.6.3 Combinatorics
A permutation of the set of numbers N = f1; 2; :::; ng is any bijection : N ! N (Appendix
B.2). There are n! possible permutations. For example, the permutation
interchanges the rst two elements of N and leave the others unchanged. So, it is represented
by the function : N ! N de ned by
8
>
> 2 if k = 1
<
(k) = 1 if k = 2
>
>
:
k else
is called parity.10 In particular, an even permutation has parity +1, while an odd permutation
has parity 1.
Example 711 (i) The permutation (15.26) is odd because there is only one inversion, with
k = 1 and k 0 = 2. So, its parity is 1. (ii) The identity permutation (k) = k has, clearly,
no inversions. So, it is an even permutation, with parity +1. N
10
Though the symbol sgn has been already used for the signum function, no confusion should arise.
500 CHAPTER 15. LINEAR FUNCTIONS AND OPERATORS
Let us go back to determinants. Consider a 2 2 matrix A and set N = f1; 2g. In this
case consists of only two permutations and 0 , de ned by
( (
1 if k = 1 0
2 if k = 1
(k) = and (k) =
2 if k = 2 1 if k = 2
Indeed:
0
(sgn ) a1 (1) a2 (2) + sgn a1 0 (1) a2 0 (2) = a11 a22 a12 a21
The next result shows that this remarkable fact is true in general, thus providing an important
combinatorial characterization of determinants (we omit the proof).
Note that each term in the sum (15.27) contains only one element of each row and only
one element of each column. This will be crucial in the proofs of the next section.
15.6.4 Properties
The next proposition collects some of the main properties of determinants, which are also
useful for their computation. In the statement \line" stands for either row or column: the
properties hold, indeed, symmetrically for both the rows and the columns of the matrix.
\Parallel lines" means two rows or two columns.
Proposition 713 Let A and B be two square matrices of the same order. Then:
(iii) If B is obtained from A by interchanging two parallel lines, then det B = det A.
(v) If a line of A is the sum of two vectors b and c, then det A is the sum of the determinants
of the two matrices that are obtained by taking that line equal rst to b and then to c.
(vi) If B is obtained from A by adding to a line a multiple of a parallel line, then det B =
det A.
15.6. DETERMINANTS 501
Proof The proof relies on the combinatorial characterization of the determinant established
in Proposition 712, in particular on the observation that each term that appears in the
determinant contains exactly one element of each row and one element of each column. In
the proof we only consider rows (similar arguments hold for the columns).
(i) In all the products that constitute the determinant, there appears one element of each
row: if a row is zero, all the products are then zero.
(ii) For the same reason, all the products turn out to be multiplied by k.
(iii) By exchanging two rows, all the even permutations become odd and vice versa.
Therefore, the determinant changes sign.
(iv) Let A be the matrix that has rows i and j equal and let Aij be the matrix A with
such rows interchanged. By (iii), we have det Aij = det A. Nevertheless, since the two
interchanged rows are equal, we have A = Aij . So, det Aij = det A. This is possible if and
only if det Aij = det A = 0.
(v) Suppose
2 1 3 2 3
a a1
6 a2 7 6 a2 7
6 7 6 7
6 .. 7 6 .. 7
6 . 7 6 . 7
A=6 7 6
6 ar 7 = 6 b + c 7
7
6 7 6 7
6 .. 7 6 .. 7
4 . 5 4 . 5
am am
and let 2 3 2 3
a1 a1
6 a2 7 6 a2 7
6 7 6 7
6 .. 7 6 .. 7
6 . 7 6 . 7
Ab = 6
6
7
7 and Ac = 6
6
7
7
6 b 7 6 c 7
6 .. 7 6 .. 7
4 . 5 4 . 5
am am
be the two matrices obtained by taking as r-th row b and c, respectively. Then
0 1
X n
Y X Y
det A = sgn ai (i) = sgn @ ai (i)
A (b + c)
r (r)
2 i=1 2 i6=r
0 1 0 1
X Y X Y
= sgn @ ai (i)
A br (r) + sgn @ ai (i)
A cr (r) = det Ab + det Ac
2 i6=r 2 i6=r
The matrix obtained from A by adding, for example, k times the rst row to the second one,
is 2 3
a1
6 a2 + ka1 7
6 7
B=6 .. 7
4 . 5
am
Moreover, let 2 3 2 3
a1 a1
6 ka1 7 6 a1 7
6 7 6 7
C=6 .. 7 and D = 6 .. 7
4 . 5 4 . 5
am am
By (v), det B = det A + det C. On the other hand, by (ii) we have det C = k det D. But,
since D has two equal rows, by (i) we have det D = 0. We conclude that det B = det A.
An important operational consequence of this proposition is that now we can say how
the elementary operations E1 -E3 , which characterize the Gaussian elimination procedure,
modify the determinant of A. Speci cally:
or, equivalently, det A = 0 if and only if det B = 0. This observation leads to the following
important characterization of square matrices of full rank.
Proposition 714 A square matrix A has full rank if and only if det A 6= 0.
Proof \Only if". If A has full rank, its rows are linearly independent (Corollary 703). By
Lemma 695 and Proposition 698, A can be then transformed via elementary operations into
a unique echelon square matrix of full rank, that is, the identity matrix In . By (15.28), we
conclude that det A 6= 0.
\If". Let det A 6= 0. Suppose, by contradiction, that A does not have full rank. Then,
its rows are not linearly independent (Corollary 703), so at least one of them is a linear
combination of the others. Such row can be reduced to become zero by repeatedly adding
to it carefully chosen multiples of the other rows. Denote by B such transformed matrix.
By Proposition 713-(i), det B = 0, so by (15.28) we have det A = 0, a contradiction. We
conclude that A has full rank.
Corollary 703 and the previous result jointly imply the following important result.
15.6. DETERMINANTS 503
Corollary 715 For a square matrix A the following conditions are equivalent:
(iii) det A 6= 0.
The determinants behave well with respect to the product, as the next result shows. It
is a key property of determinants.
Theorem 716 (Binet) If A and B are two square matrices of the same order n, then
det AB = det A det B.
So, determinants commute: det AB = det BA. This is a rst interesting consequence of
Binet's Theorem. Since I = A 1 A, another interesting consequence of this result is that
1 1
det A =
det A
when A is invertible. Indeed, 1 = det I = det A 1 A = det A 1 det A. In particular, this
formula implies that det A 6= 0 if and only if det A 1 6= 0.
Proof If (at least) one of the two matrices has linearly dependent rows or columns, then the
statement is trivially true since the columns of AB are linear combinations of the columns
of A and the rows of AB are linear combinations of the rows of B, hence in both cases AB
has also linearly dependent rows or columns, so det AB = 0 = det A det B.
Suppose, therefore, that both A and B have full rank. Suppose the matrix A is diagonal.
If so, det A = a11 a22 ann . Moreover, we have
0 10 1
a11 0 0 b11 b12 b1n
B 0 a22 0 C B b2n C
AB = B C B b21 b22 C
@ A@ A
0 0 ann bn1 bn2 bnn
0 1
a11 b11 a11 b12 a11 b1n
B a22 b21 a22 b22 a22 b2n C
=B@
C
A
ann bn1 ann bn2 ann bnn
By Proposition 713-(ii),
D = S ( )S ( ) S ( )T T T A
| {z }| {z }
h times k times
Since D is obtained from A through h elementary operations that do not modify the deter-
minant and k elementary operations that only change its sign, we have det D = ( 1)k det A.
Therefore,
det DB = ( 1)k det A det B (15.29)
DB = (S ( ) S ( )T T A) B = (S ( ) S ( )T T ) (AB)
Therefore, DB is obtained from AB via h elementary operations that do not modify the
determinant and k elementary operations that only change its sign. So, as before, we have
Putting together (15.29) and (15.30), we get det AB = det A det B, as desired.
when = 1.
Proof (i) Transposition does not alter any of the n! products in the sum (15.27), as well
as their parity. (ii) Let 2 R. First, observe that A = ( I) A. By Binet's Theorem,
det ( A) = det (( I) A) = det ( I) det A = n det A.
15.6. DETERMINANTS 505
The matrix of algebraic complements (or cofactor matrix ) of A, denoted by A , is the matrix
whose elements are the algebraic complements of the elements of A, that is,
A = aij
with i; j = 1; 2; :::; n. The transpose (A )T is sometimes called the (classical ) adjoint matrix.
Similarly,
We conclude that 2 3
16 26 27
A =4 12 4 15 5
6 2 16
N
506 CHAPTER 15. LINEAR FUNCTIONS AND OPERATORS
Using the notion of algebraic complement, the de nition of the determinant of a square
matrix (De nition 706) can be viewed as the sum of the products of the elements of the rst
row by their algebraic complements, that is,
n
X
det A = a1j a1j
j=1
The next result shows that, actually, there is nothing special about the rst row: the
determinant can be computed using any row or column of the matrix. The choice of which
one to use is then just a matter of analytical convenience.
Proposition 719 The determinant of a square matrix A is equal to the sum of the products
of the elements of any line (row or column) by their algebraic complements.
Proof For the rst row, the result is just a rephrasing of the de nition of determinant. Let
us verify it for the i-th row. By points (ii) and (v) of Proposition 713 we can rewrite det A
in the following way:
2 3
a11 a1j a1n
6 .. .. 7
6 . . 7
6 7
det A = det 6
6 ai1 aij ain 7 7 (15.31)
6 .. .. 7
4 . . 5
an1 anj ann
2 3 2 3
a11 a1j a1n a11 a1j a1n
6 .. .. 7 6 .. .. 7
6 . . 7 6 . . 7
6 7 6 7
= ai1 det 6
6 1 0 0 7 7+ + + aij det 6
6 0 1 0 7 7
6 .. .. 7 6 .. .. 7
4 . . 5 4 . . 5
an1 anj ann an1 anj ann
2 3
a11 a1j a1n
6 .. .. 7
6 . . 7
6 7
+ +ain det 6
6 0 0 1 7 7
6 .. .. 7
4 . . 5
an1 anj ann
15.6. DETERMINANTS 507
Let us calculate the determinant of the submatrix relative to the term (i; j):
2 3
a11 a1j a1n
6 .. .. 7
6 . . 7
6 7
6
det 6 0 1 0 7 (15.32)
7
6 .. .. 7
4 . . 5
an1 anj ann
Note that to be able to apply the de nition of the determinant and to use the notion of
algebraic complement, it is necessary to bring the i-th row to the top and the j-th column to
the left, i.e., to transform the matrix (15.32) into a matrix that has (1; 0; :::; 0) as rst row,
(1; a1j ; a2j ; :::; ai 1;j ; ai+1;j ; :::anj ) as rst column and Aij as the (n 1) (n 1) South-East
submatrix:
2 3
1 0 0 0 0
6 a1j a11 a1;j 1 a1;j+1 a1n 7
6 7
6 7
6 7
~ 6
A = 6 ai 1;j ai 1;1 ai 1;j 1 ai 1;j+1 ai 1;n 7
7
6 ai+1;j ai+1;1 ai+1;j 1 ai+1;j+1 ai 1;n 7
6 7
4 5
anj an1 an;j 1 an;j+1 an;n
The transformation requires i 1 exchanges of adjacent rows to bring the i-th row to the
top, and j 1 exchanges of adjacent columns to bring the j-th column to the left (leaving
the order of the other rows and columns unchanged). Clearly, we have
det A~ = 1 det Aij
and so
2 3
a11 a1j a1n
6 .. .. 7
6 . . 7
6 7
det 6
6 0 1 0 7 7 = ( 1)
i+j 2
det A~ = ( 1)i+j det Aij = aij (15.33)
6 .. .. 7
4 . . 5
an1 anj ann
By applying formula (15.31) and using (15.33) we complete the proof.
Example 720 Let 2 3
1 3 4
A= 4 2 0 2 5
1 3 1
By Proposition 719, we can compute the determinant using any line. It is, however, simpler
to compute it using the second row because it contains a zero, a feature that facilitates the
algebra. Indeed,
det A = a21 a21 + a22 a22 + a23 a23
3 4 1 3
= ( 2)( 1)2+1 det + 0 + (2)( 1)2+3 det
3 1 1 3
= ( 2)( 1)( 15) + 0 + (2)( 1)(0) = 30
508 CHAPTER 15. LINEAR FUNCTIONS AND OPERATORS
The next result completes Proposition 719 by showing what happens if we use the alge-
braic complements of a di erent row (or column).
Proposition 721 The sum of the products of the elements of any row (column) by the al-
gebraic complements of a di erent row (column) is zero.
Let us add the products of the elements of the second row by the algebraic complements of
the rst row:
2a11 + a12 + 3a13 = 26 4 + 30 = 0
Now, let us add the products of the elements of the second row by the algebraic complements
of the third row:
2a31 + a32 + 3a33 = 4 1 3 = 0
The reader can verify that, in accordance with the last result, we get 0 in all the cases
in which we add the products of the elements of a row by the algebraic complements of a
di erent row. N
The last two results are summarized in the famous, all-inclusive, Laplace's Theorem:
and
n
X
aij aiq = jq det A
i=1
Theorem 724 A square matrix A is invertible if and only if det A 6= 0. In this case,
1
A 1
= (A )T
det A
(i) A is invertible;
(v) (A) = n.
we have 2 3
1
6 2 7
A (A )T = 6
4
7 (
5 )1 j ( )2 j j( )n
n
where (a C )i is the i-th column of A and (aC )q is the q-th column of A. Therefore,
2 3
det A 0 0
6 0 det A 0 7
6 7
A (A )T = (A )T A = 6 . . .. .. 7 = det A In
4 .. .. . . 5
0 0 det A
That is,
1 1
A (A )T = (A )T A = In
det A det A
which allows us to conclude that
1
A 1
= (A )T
det A
as desired.
Example 725 We use formula (15.34) to calculate the inverse of the matrix
1 2
A=
3 5
We have
det A11 a22 5
a111 = ( 1)1+1 = = = 5
det A a11 a22 a12 a21 1
det A21 a12 2
a121 = ( 1)1+2 = = =2
det A a11 a22 a12 a21 1
det A12 a21 3
a211 = ( 1)2+1 = = =3
det A a11 a22 a12 a21 1
det A22 a11 1
a221 = ( 1)2+2 = = = 1
det A a11 a22 a12 a21 1
So,
a22 a12
1 det A det A 5 2
A = a21 a11 =
det A det A 3 1
N
Proposition 728 A matrix A has rank r if and only if it has a square submatrix of order r
with a non-zero determinant and all its square submatrix of order r + 1 (if there exist any)
have a zero determinant.
1 0
det = 1
0 1
This result sheds some further light on the nature of the rank of a matrix, but the next
version is more useful to compute it as it involves fewer minors.
11
The property is easy to verify and has already been used in the proof of Proposition 692.
12
This example is from Mirsky (1955) p. 136.
15.6. DETERMINANTS 513
Proposition 730 A matrix A has rank r if and only if it has a square submatrix B of order
r with non-zero determinant and such that each square submatrix C of order r + 1 (if there
exist any) that contains B has a zero determinant.
The Kronecker Algorithm uses this result to compute the rank of a matrix through
determinants. It proceeds as follows.
The Kronecker Algorithm is best described using minors. To this end, given a matrix A,
we call bordered minor of a square submatrix B of order r the determinant of a submatrix
C of A of order r + 1 that contains B.
15.6.8 Summing up
We conclude this section by noting how the rank of a matrix is simultaneously many things
{ each one of them being a possible de nition of it. Indeed, it is:
The rank is a multi-faceted notion that plays a key role in linear algebra and its many
applications. Operationally, the Gaussian elimination procedure and the Kronecker's Algo-
rithm permit to compute it.
514 CHAPTER 15. LINEAR FUNCTIONS AND OPERATORS
In matrix form:
A x = b (15.35)
n nn 1 n 1
where A is a square n n matrix, while x and b are vectors in Rn . We ask two questions
concerning the system (15.35):
Existence: which conditions ensure that the system has a solution for every vector
b 2 Rn , that is, when, for any given b 2 Rn there exists a vector x 2 Rn such that
Ax = b?
Uniqueness: which conditions ensure that such a solution is unique, that is, when, for
any given b 2 Rn there exists a unique x 2 Rn such that Ax = b?
To frame the problem in what we studied until now, consider the linear operator T :
Rn ! Rn associated to A, de ned by T (x) = Ax for every x 2 Rn . The system (15.35) can
be written in functional form as
T (x) = b
So, it is immediate that:
the system admits a unique solution for a given b 2 Rn if and only if the preimage
T 1 (b) is a singleton; in particular, the system admits a unique solution for every
b 2 Rn if and only if T is injective.13
Since injectivity and surjectivity are, by Corollary 685, equivalent properties for linear
operators from Rn into Rn , the two problems of existence and uniqueness are equivalent:
there exists a solution for the system (15.35) for every b 2 Rn if and only if such a solution
is unique.
In particular, a necessary and su cient condition for such a unique solution to exist for
every b 2 Rn is that the operator T is invertible, i.e., that one of the following equivalent
conditions holds:
The condition required is, therefore, the invertibility of the matrix A, or one of the
equivalent properties (ii) and (iii). This is the content of Cramer's Theorem, which thus
follows easily from what we learned so far.
Theorem 732 (Cramer) Let A be a square matrix of order n. The system (15.35) has
one, and only one, solution for every b 2 Rn if and only if the matrix A is invertible. In this
case, the solution is given by
x = A 1b
Thus, the system (15.35) admits a solution for every b if and only if the matrix A is
invertible and, even more important, the unique solution is expressed in terms of the inverse
matrix A 1 . Since we are able to calculate A 1 using determinants (Theorem 724), we
have obtained a procedure for solving linear systems of n equations in n unknowns: formula
x = A 1 b can indeed be written as
1
x= (A )T b (15.36)
det A
Using Laplace's Theorem, it is easy to show that formula (15.36), called Cramer's rule, can
be written in detail as: 2 det A1 3
det A
6 7
6 det A2 7
x=6 det A 7 (15.37)
4 5
det An
det A
where Ak denotes the matrix obtained by replacing the k-th column of the matrix A with
the column vector 2 3
b1
6 b2 7
b=6 4
7
5
bn
14
Alternatively, it is possible to prove the \if" in the following, rather mechanical, way. Set x = A 1 b; we
have Ax = A A 1 b = AA 1 b = Ib = b, so x = A 1 b solves the system. It is also the unique solution.
~ 2 Rn is another solution, we have x
Indeed, if x ~ = Ix~ = A 1A x ~ = A 1 (A~x) = A 1 b = x as claimed.
516 CHAPTER 15. LINEAR FUNCTIONS AND OPERATORS
Example 733 A special case of the system (15.35) is when b = 0. Then the system is called
homogeneous and, if A is invertible, by Proposition 732 the unique solution is x = 0. N
1 2
A=
3 5
From Example 725 we know that A is invertible. By Proposition 732, the unique solution of
the system is therefore
1 5 2 b1 5b1 + 2b2
x=A b= =
3 1 b2 3b1 b2
b1 2 1 b1
det A = 1 det A1 = det = 5b1 2b2 det A2 = det = b2 3b1
b2 5 3 b2
Therefore,
5b1
2b2 b2 3b1
x1 = = 5b1 + 2b2 ; x2 = = 3b1 b2
1 1
which coincides with the solution found above. N
x = (1 + 2 2; 1 2; 1 2 2) = (5; 3; 5)
Hence
b1 2b3 b2 + b3 b2 + 2b3
x1 = = b1 + 2b3 x2 = = b2 b3 x3 = = b2 2b3
1 1 1
which coincides with the solution found above. N
This is the so-called Vandermonde matrix, which permits to write the linear system (15.38)
in the standard matrix form
Ax = b (15.40)
For instance,
2 3
1 a1 a21
6 7
det 4 1 a2 a22 5 = (a2 a1 ) (a3 a2 ) (a3 a1 )
1 a3 a23
Proof We proceed by induction on the order of the matrix. Initial step: for n = 1 we trivially
have det A = 1. Induction step: assume that formula (15.41) holds for Vandermonde matrices
of order n 1. Consider the following operation: subtract from each column the previous
one multiplied by a1 . By Proposition 713,
2 3
1 a1 a21 a1n 1
6 7
6
det A = det 6 1 a2 a22 a2n 1 7
7
4 5
1 an a2n ann 1
2 3
1 a1 a1 a21 a21 an1 1
a1 a1n 2
6 7
6
= det 6 1 a2 a1 a22 a2 a1 an2 1
a1 a2n 2 7
7
4 5
1 an a1 a2n an a1 ann 1 a1 ann 2
2 3
1 0 0 0
6 7
6
= det 6 1 a2 a1 a2 (a2 a1 ) an2 2
(a2 a1 ) 7
7
4 5
1 an a1 an (an a1 ) ann 2 (a
n a1 )
Consider now the following operation: divide each row i, expect the rst one, by ai a1 .
Again by Proposition 713,
2 3
1 0 0 0
6 7
6
det 6 1 a2 a1 a2 (a2 a1 ) an2 2
(a2 a1 ) 7
7
4 5
1 an a1 an (an a1 ) ann 2 (an a1 )
2 3
1 0 0 0
6 1 n 2 7
6 1 a2 a2 7
= (a2 a1 ) det 6 a2 a1 7
4 5
1 an a1 an (an a1 ) ann 2 (an a1 )
2 3
1 0 0 0
6 7
6 1
1 a2 an2 2 7
= (a2 a1 ) (an a1 ) det 6 a2 a1 7
4 5
1 n 2
a a 1 an an
2 n 1 3
n 2
1 a2 a2
6 7
= (a2 a1 ) (an a1 ) det 4 5
1 an n
an 2
15.8. GENERAL LINEAR SYSTEMS 519
where the last equality follows from Laplace's Theorem. By the induction hypothesis,
2 3
1 a2 an2 2 Y
det 4 5= (ai aj )
1 an ann 2 2 j<i n
So,
n
Y Y Y
det A = (ai a1 ) (ai aj ) = (ai aj )
i=2 2 j<i n 1 j<i n
as desired.
Corollary 737 A Vandermonde matrix A of order n is invertible if and only if its n pa-
rameters a1 , a2 , ..., an are distinct.
Armed with this result, let us go back to the linear system (15.40). If the parameters a1 ,
a2 , ..., an are distinct, the Vandermonde matrix A is invertible. So, by Cramer's Theorem
this linear system admits a unique solution x, which in turn Pnprovides the coe cients of the
n j 1
interpolating polynomial px : R ! R de ned by px (a) = j=1 ai xj and such that
px (ai ) = bi 8i = 1; :::; n
This polynomial is thus able to match parameters and known terms, in accordance with
(15.39).15
where A 2 M (m; n), x 2 Rn , and b 2 Rm . The square system is the special case where
n = m.
(ii.a) determined (or uniquely solvable) when it admits only one solution, i.e., T 1 (b) is a
singleton;
(ii.b) undetermined when it admits in nitely many solutions, i.e., T 1 (b) has in nite cardi-
nality.16
These two cases exhaust all the possibilities: if a system admits two solutions, it certainly
has in nitely many ones. Indeed, if x and x0 are two di erent solutions { that is, Ax = Ax0 =
b { then all the linear combinations x+(1 ) x0 with 2 R are also solutions of the system
because
A x + (1 ) x0 = Ax + (1 ) Ax0 = b + (1 )b = b
Using this terminology, in the case n = m Cramer's Theorem says that a square linear
system is solvable for every vector b if and only if it is determined for every such vector. In
this section we modify the analysis of the last section in two di erent directions:
(ii) we study the existence and uniqueness of solutions for a given vector b (so, for a speci c
system at hand), rather than for every such vector.
To this end, let us consider the so-called augmented (or complete) matrix of the system
Ajb
m (n+1)
obtained by writing near A the vector b of the known terms. The next famous result gives
a necessary and su cient condition for a linear system to have a solution.
Proof Let T : Rn ! Rm be the linear operator associated to the system, which can therefore
be written as T (x) = b. The system is solvable if and only if b 2 Im T . Since Im T is the
vector subspace of Rm generated by the columns of A, the system is solvable if and only if b
is a linear combination of such columns. That is, if and only if the matrices A and Ajb have
the same number of linearly independent columns (so, the same rank).
16
Since the set T 1 (b) is convex, it is a singleton or it has in nite cardinality (in particular, it has the
power of the continuum), tertium non datur. We will introduce convexity in the next chapter.
15.8. GENERAL LINEAR SYSTEMS 521
the third row is the di erence between the second and rst rows. These three rows are thus
not linearly independent: (A) = (Ajb) = 2. So, the system is solvable. N
Example 740 A homogeneous system is always solvable because the zero vector is always
a solution of the system. This is con rmed by the Kronecker-Capelli's Theorem because the
ranks of A and of Aj0 are always equal. N
Note the Kronecker-Capelli's Theorem considers a given pair (A; b), while Cramer's Theo-
rem considers, as given, only a square matrix A. This re ects the new direction (ii) mentioned
above and, for this reason, the two theorems are only partly comparable in the case of square
matrices A. Indeed, Cramer's Theorem considers only the case (A) = n, in which condition
(15.42) is automatically satis ed for every b 2 Rn (why?). For this case, it is more powerful
than Kronecker-Capelli's Theorem: the existence holds for every vector b and, moreover,
we have also the uniqueness. But, di erently from Cramer's Theorem, Kronecker-Capelli's
Theorem is able to handle also the case (A) < n by giving, for a given vector b, a necessary
and su cient condition for the system to be solvable.
15.8.2 Uniqueness
We now turn our attention to the uniqueness of the solutions of a system Ax = b, whose
existence is guaranteed by Kronecker-Capelli's Theorem. The next result shows that for
uniqueness, too, it is necessary to consider the rank of the matrix A (recall that, thanks to
condition (15.19), we have (A) n).
Proposition 742 Let T : Rn ! Rm be a linear operator and suppose T (x) = b. The vectors
x 2 Rn for which T (x) = b are those of the form x + z with z 2 ker T , and only them. That
is,
T 1 (b) = fx + z : z 2 ker T g (15.43)
522 CHAPTER 15. LINEAR FUNCTIONS AND OPERATORS
The "only if" part of Lemma 680, i.e., that linear and injective operators have trivial kernels, is a special case of this result. Indeed, suppose that the linear operator $T$ is injective, so that $T^{-1}(0) = \{0\}$. If $b = 0$, we can set $\bar{x} = 0$ and (15.43) then implies $\{0\} = T^{-1}(0) = \{0 + z : z \in \ker T\} = \ker T$. So, $\ker T = \{0\}$.
Corollary 743 If $\bar{x}$ is a solution of the system $Ax = b$, then all solutions are of the form $\bar{x} + z$, where $z$ is a solution of the homogeneous system $Az = 0$.
Therefore, once we find a solution of the system $Ax = b$, all the other solutions can be found by adding to it the solutions of the homogeneous system $Ax = 0$. Besides its theoretical interest, this is relevant also operationally (especially when it is significantly simpler to solve the homogeneous system than the original one).17
That said, Corollary 743 allows us to prove Proposition 741: a solvable system $Ax = b$ has a unique solution if $\rho(A) = n$, and infinitely many solutions if $\rho(A) < n$.
Proof of Proposition 741 By hypothesis, the system has at least one solution $\bar{x}$. Moreover, since $\rho(A) = \rho(T)$, by the Rank-Nullity Theorem $\rho(A) + \nu(T) = n$. If $\rho(A) = n$, we have $\nu(T) = 0$, that is, $\ker T = \{0\}$. From Corollary 743 it follows that $\bar{x}$ is the unique solution. If, instead, $\rho(A) < n$ we have $\nu(T) > 0$ and therefore $\ker T$ is a non-trivial vector subspace of $\mathbb{R}^n$, with infinitely many elements. By Corollary 743, adding such elements to the solution $\bar{x}$ we find the infinitely many solutions of the system. $\blacksquare$
15.8.3 Summing up
Summing up, we are now able to state a general result on the resolution of linear systems that combines Kronecker-Capelli's Theorem and Proposition 741.
The comparison of the ranks $\rho(A)$ and $\rho(A \mid b)$ with the number $n$ of the unknowns therefore allows us to establish the existence and the possible uniqueness of the solutions of the system. If the system is square, we have $\rho(A) = n$ if and only if $\rho(A) = \rho(A \mid b) = n$ for every $b \in \mathbb{R}^m$.18 Cramer's Theorem, which was only partly comparable with Kronecker-Capelli's Theorem, becomes a special case of the more general Theorem 744.
Example 745 Consider a homogeneous linear system $Ax = 0$. Since, as already observed, the condition $\rho(A) = \rho(A \mid 0)$ is always satisfied, the system has a unique solution (the zero vector) if and only if $\rho(A) = n$, and it is undetermined if and only if $\rho(A) < n$. N
O.R. It is often said that a linear system $Ax = b$ with $A \in M(m, n)$

(i) has a unique solution if $m = n$, i.e., there are as many equations as unknowns;

(ii) is undetermined if $m < n$, i.e., there are fewer equations than unknowns;19

(iii) is unsolvable if $m > n$, i.e., there are more equations than unknowns.

The idea is wrong because it might well happen that some equations are redundant: some of them are a multiple of another or a linear combination of others (in such cases, they would be automatically satisfied once the others are satisfied). In view of Theorem 744, however, claims (i) and (ii) become true provided that by $m$ we mean the number of non-redundant equations, that is, the rank of $A$: indeed, the rank counts the equations that cannot be expressed as linear combinations of others. H
As usual, we can assume that the $k$ rows and the $k$ columns that determine the rank of $A$ are the first ones. Let $A'$ be a non-singular $k \times k$ submatrix of $A$,20 and write
$$
A_{m \times n} =
\begin{bmatrix}
A'_{k \times k} & B_{k \times (n-k)} \\
C_{(m-k) \times k} & D_{(m-k) \times (n-k)}
\end{bmatrix}
$$
We can then eliminate the last $m - k$ rows and give arbitrary values, say $z \in \mathbb{R}^{n-k}$, to the last $n - k$ unknowns, obtaining in this way the system
$$A' x' = b' - Bz \tag{15.44}$$
in which $x' \in \mathbb{R}^k$ is the vector that contains the only $k$ "true" unknowns and $b' \in \mathbb{R}^k$ is the vector of the first $k$ known terms.

The square system (15.44) satisfies the hypothesis of Cramer's Theorem for every $z \in \mathbb{R}^{n-k}$, so it can be solved with Cramer's rule. If we call $\hat{x}'(z)$ the unique solution for each given $z \in \mathbb{R}^{n-k}$, the solutions of the original system $Ax = b$ are
$$(\hat{x}'(z), z) \qquad \forall z \in \mathbb{R}^{n-k}$$
$$
A'_{2 \times 2} = \begin{bmatrix} 1 & 2 \\ 6 & 4 \end{bmatrix}, \quad
B_{2 \times 1} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}, \quad
C_{1 \times 2} = \begin{bmatrix} 5 & 2 \end{bmatrix}, \quad
D_{1 \times 1} = \begin{bmatrix} -1 \end{bmatrix}, \quad
b'_{2 \times 1} = \begin{bmatrix} 3 \\ 7 \end{bmatrix}
$$
so that, setting $b'_z = b' - Bz$, the square system (15.44) becomes $A' x = b'_z$, that is,
$$
\begin{cases}
x_1 + 2x_2 = 3 - 3z \\
6x_1 + 4x_2 = 7 - 2z
\end{cases}
$$
In other words, the procedure consisted in deleting the redundant equation and in assigning an arbitrary value $z$ to the unknown $x_3$.
Since $\det A' \ne 0$, by Cramer's rule the infinitely many solutions are described as
$$x_1 = \frac{-2 - 8z}{-8} = \frac{1}{4} + z, \qquad x_2 = \frac{-11 + 16z}{-8} = \frac{11}{8} - 2z, \qquad x_3 = z$$
20 Often there is more than one possible $A'$, so there is some freedom in choosing which equations to delete and which unknowns are "fictitious".
As a check, substituting these solutions into the original equations:
$$\text{First equation:} \quad 1 \cdot \left(\frac{1}{4} + z\right) + 2 \cdot \left(\frac{11}{8} - 2z\right) + 3 \cdot z = \frac{1 + 11}{4} + 0 \cdot z = 3$$
$$\text{Second equation:} \quad 6 \cdot \left(\frac{1}{4} + z\right) + 4 \cdot \left(\frac{11}{8} - 2z\right) + 2 \cdot z = \frac{6 + 22}{4} + 0 \cdot z = 7$$
Alternatively, we could have noted that the second equation is the sum of the first and third ones and then deleted the second equation rather than the third one. In this way the system would reduce to
$$
\begin{cases}
x_1 + 2x_2 + 3x_3 = 3 \\
5x_1 + 2x_2 - x_3 = 4
\end{cases}
$$
We can now assign an arbitrary value to the first unknown, say $x_1 = \tilde{z}$, rather than to the third one.21 This yields the system
$$
\begin{cases}
2x_2 + 3x_3 = 3 - \tilde{z} \\
2x_2 - x_3 = 4 - 5\tilde{z}
\end{cases}
$$
that is, $A'' x = b''_{\tilde{z}}$, with matrix
$$A'' = \begin{bmatrix} 2 & 3 \\ 2 & -1 \end{bmatrix}$$
and vectors $x = (x_2, x_3)^T$ and $b''_{\tilde{z}} = (3 - \tilde{z}, 4 - 5\tilde{z})^T$. Since $\det A'' \ne 0$, Cramer's rule expresses the infinitely many solutions as
$$x_1 = \tilde{z}, \qquad x_2 = \frac{15 - 16\tilde{z}}{8}, \qquad x_3 = -\frac{1}{4} + \tilde{z}, \qquad \tilde{z} \in \mathbb{R}$$
In the first way we get $x_1 = 1/4 + z$, while in the second one $x_1 = \tilde{z}$. Therefore $\tilde{z} = 1/4 + z$. With such a value the solutions just found,
$$x_1 = \tilde{z} = \frac{1}{4} + z$$
$$x_2 = \frac{15 - 16\tilde{z}}{8} = \frac{15 - 16\left(\frac{1}{4} + z\right)}{8} = \frac{15 - 4 - 16z}{8} = \frac{11}{8} - 2z$$
$$x_3 = -\frac{1}{4} + \tilde{z} = -\frac{1}{4} + \frac{1}{4} + z = z$$
become the old ones. The two sets of solutions are the same, just written using two different parameters. We invite the reader to delete the first equation and redo the calculations. N
15.10 Coda: media

Definition 748 The average or mean (in the sense of Chisini) of a vector $x = (x_1, x_2, \ldots, x_n) \in A^n$ with respect to a function $f : A^n \to \mathbb{R}$ is a scalar $\bar{x} \in A$ such that
$$f(\bar{x}, \bar{x}, \ldots, \bar{x}) = f(x_1, x_2, \ldots, x_n) \tag{15.45}$$
In words, the average $\bar{x}$ of a vector $x$ is a scalar that can replace each component of $x$ in the function of interest $f$ without altering its value. In terms of interpretation, the components of $x$ represent different amounts of some homogeneous entity, for instance, different amounts of money (like the profits discussed above), and the average $\bar{x}$ summarizes the different amounts in a single one that, if substituted for the different components of $x$, gives the same value of interest; i.e., $f(\bar{x}, \ldots, \bar{x}) = f(x)$. If all the $n$ different branches of the firm earned profit $\bar{x}$, the firm's profit would be the same. Profit $\bar{x}$ can thus be viewed as a typical profit of the branches with respect to the function of interest $f$ for the board.
Different functions $f$, motivated by different aims, may result in different averages $\bar{x}$ of the same vector $x$, unless they are a strictly monotone transformation one of the other, as we show next.
15.10.2 Examples

Next we present a few classic examples of averages.

In our motivating example, the sum is the function of interest for the board of directors of a firm whose profits are just the sum of the profits of its branches (so, the firm's centralized activities are not profitable per se).

Different branches, however, may end up making the same profit. If $\alpha_i$ is the number of branches making profit $x_i$, the sum of the branches' profits becomes $\sum_{i=1}^n \alpha_i x_i$. So, let us take
a function $f : A^n \to \mathbb{R}$ given by $f(x) = \sum_{i=1}^n \alpha_i x_i$, with each $\alpha_i > 0$. Now formula (15.45) becomes $\bar{x} \sum_{i=1}^n \alpha_i = \sum_{i=1}^n \alpha_i x_i$ and we get the weighted arithmetic average
$$\bar{x} = \sum_{i=1}^n \frac{\alpha_i}{\sum_{j=1}^n \alpha_j} x_i \tag{15.46}$$
The weights $\alpha_i / \sum_{j=1}^n \alpha_j$ are easily seen to add up to 1. In the example, each such weight gives the proportion of the branches making profit $x_i$, so the weighted arithmetic average $\bar{x}$ is able to account for the multiplicity or frequency of the profit levels made by the branches.24
Geometric average Another simple function $f : A^n \to \mathbb{R}$, with $A \subseteq [0, \infty)$, is the product
$$f(x) = \prod_{i=1}^n x_i$$
Formula (15.45) becomes $\bar{x}^n = \prod_{i=1}^n x_i$, so the average of a vector $x = (x_1, x_2, \ldots, x_n) \in A^n$ with respect to this product function is the geometric average
$$\bar{x} = \left(\prod_{i=1}^n x_i\right)^{\frac{1}{n}}$$
For instance, consider an investor who, in each period of time, can invest an initial monetary capital $c \ge 0$ and receive at the end of the next period an amount $(1 + r)c$, with $r \ge 0$ (see Example 295). Assume that the investor keeps investing for $n$ periods and that in period $i$ the interest rate is $r_i$, possibly different across periods. At the end, per euro invested the investor earns $\prod_{i=1}^n (1 + r_i)$ euros or, equivalently, $\prod_{i=1}^n R_i$ where $R_i = 1 + r_i$ is the gross return in period $i$.

The function of interest is thus the product $\prod_{i=1}^n R_i$ that says how much the investor earned per euro invested. The relevant average gross return $\bar{R}$ is then the geometric average
$$\bar{R} = \left(\prod_{i=1}^n R_i\right)^{\frac{1}{n}}$$
24 We say that $\alpha = (\alpha_1, \ldots, \alpha_n) \ne 0$ is a vector of frequencies if its elements are positive, i.e., if $\alpha > 0$. If they add up to 1, they are called weights. So, weights are normalized frequencies.
with $\alpha > 0$. Formula (15.45) becomes $\bar{x}^{\sum_{i=1}^n \alpha_i} = \prod_{i=1}^n x_i^{\alpha_i}$ and we get the weighted geometric average
$$\bar{x} = \left(\prod_{i=1}^n x_i^{\alpha_i}\right)^{\frac{1}{\sum_{i=1}^n \alpha_i}}$$
Note that we also get this average if we take as function of interest the logarithmic transformation $\sum_{i=1}^n \alpha_i \log x_i$ of $\prod_{i=1}^n x_i^{\alpha_i}$ (cf. Proposition 749), provided that $A \subseteq (0, \infty)$.
For instance, consider a racing car driver who in a lap reaches different speeds $x_1, x_2, \ldots, x_n$ (in kilometers per hour) in different parts of the circuit, which he maintains for $\alpha_1, \alpha_2, \ldots, \alpha_n$ kilometers, respectively. The lap time, which is the quantity of interest for the driver, is then
$$\sum_{i=1}^n \frac{\alpha_i}{x_i}$$
The average speed that keeps such a time unchanged is the harmonic average (15.47).
25 It is possible to let the vector of frequencies vary as well, but this would require notions that readers will learn in more advanced courses.
In the uniform case $\alpha = (1/n, \ldots, 1/n)$ we drop the adjective "weighted" and we have the classic arithmetic, geometric, and harmonic averages.
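As a computational aside (ours), the three classic averages follow directly from their definitions. A minimal NumPy sketch on sample data:

```python
import numpy as np

x = np.array([2.0, 4.0, 8.0])
n = len(x)

arithmetic = x.sum() / n              # 4.666...
geometric = x.prod() ** (1 / n)       # 4.0
harmonic = n / (1 / x).sum()          # 3.428...

print(arithmetic, geometric, harmonic)
# For positive data: harmonic <= geometric <= arithmetic.
```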
Is there a common structure across these important examples? To address this question, given a vector of frequencies $\alpha \in \mathbb{R}^n_{++}$ and a strictly monotone and continuous function $\phi : A \to \mathbb{R}$, define a function $f : A^n \to \mathbb{R}$ by
$$f(x) = \sum_{i=1}^n \alpha_i \phi(x_i)$$
Formula (15.45) becomes $\phi(\bar{x}) \sum_{i=1}^n \alpha_i = \sum_{i=1}^n \alpha_i \phi(x_i)$, so the average of a vector $x \in A^n$ with respect to the function $f$ is the quasi-arithmetic average
$$\bar{x} = \phi^{-1}\left(\sum_{i=1}^n \frac{\alpha_i}{\sum_{j=1}^n \alpha_j}\, \phi(x_i)\right) \tag{15.48}$$
For instance, while the board of directors cares about the profits that the branches generate, shareholders are more interested in the dividends that each of them is able to generate. Assume that these dividends depend on a branch's profits according to a strictly monotone and continuous function $\phi : A \to \mathbb{R}$, so the sum of dividends is $\sum_{i=1}^n \alpha_i \phi(x_i)$. This sum is the function of interest for shareholders and (15.48) is the average profit relevant for them, which might well differ from the average profit (15.46) relevant for the board of directors. So, before asking for an "average" it should be clarified what we are interested in: different notions of average serve different purposes.
Remarkably, the earlier examples of averages are all special cases of quasi-arithmetic averages: the weighted arithmetic average corresponds to $\phi(x) = x$, the weighted geometric average to $\phi(x) = \log x$, and the weighted harmonic average to $\phi(x) = 1/x$.

Quasi-arithmetic averages thus answer our previous question. They form a general class of averages that covers most cases of interest. For instance, a further example of a quasi-arithmetic average is the power case $\phi(x) = x^k$, with $k \ne 0$ and $A \subseteq [0, \infty)$, which leads to the power average
$$\bar{x} = \left(\sum_{i=1}^n \frac{\alpha_i}{\sum_{j=1}^n \alpha_j} x_i^k\right)^{\frac{1}{k}}$$
O.R. The invariance property established in Proposition 749 suggests that there might be an optimization angle on averages. Indeed, let $\phi$ be differentiable on an open interval $A$, with $\phi' > 0$. It is easily checked that the quasi-arithmetic average (15.48) of a vector $x \in A^n$ is a critical point of the function $\psi : A \to \mathbb{R}$ defined by $\psi(y) = \sum_{i=1}^n \alpha_i (\phi(x_i) - \phi(y))^2$.26 In the special case $\phi(x) = x$, the arithmetic average is the minimizer of $\sum_{i=1}^n (x_i - y)^2$. This optimization twist on averages can be relevant computationally. H
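As a computational aside (ours), this minimization property is easy to verify numerically. A minimal sketch, assuming NumPy, comparing a grid minimizer of $\sum_i (x_i - y)^2$ with the arithmetic mean:

```python
import numpy as np

x = np.array([2.0, 4.0, 8.0])

def psi(y):
    return np.sum((x - y) ** 2)     # sum of squared deviations

grid = np.linspace(0.0, 10.0, 100001)
y_star = grid[np.argmin([psi(y) for y in grid])]

print(y_star, x.mean())             # both approximately 4.666...
```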
Proposition 750 If $f$ is strongly monotone and continuous, then each vector $x = (x_1, x_2, \ldots, x_n) \in A^n$ has a unique average $\bar{x} \in A$ with respect to $f$, given by
$$\bar{x} = \varphi^{-1}(f(x_1, x_2, \ldots, x_n)) \tag{15.49}$$
where $\varphi : A \to \mathbb{R}$ is the diagonal function defined by $\varphi(c) = f(c, c, \ldots, c)$. Moreover, $\bar{x}$ is internal:
$$\min_{i=1,\ldots,n} x_i \le \bar{x} \le \max_{i=1,\ldots,n} x_i$$
Lemma 751 If $f$ is strongly monotone and continuous, then $\varphi^{-1} \circ f$ is well defined.

Proof Since $f$ is strongly monotone and continuous, $\varphi$ is continuous and either strictly increasing or strictly decreasing. Being $\varphi$ either strictly increasing or strictly decreasing, it is enough to show that $\operatorname{Im} \varphi = \operatorname{Im} f$. Clearly, by the definition of $\varphi$ we have $\operatorname{Im} \varphi \subseteq \operatorname{Im} f$. As for the opposite inclusion, let $x \in A^n$. Define $a = \min_{i=1,\ldots,n} x_i \in A$ as well as $b = \max_{i=1,\ldots,n} x_i \in A$. Note that $a\mathbf{1} \le x \le b\mathbf{1}$. Since $f$ is strongly monotone and $a\mathbf{1}, b\mathbf{1} \in A^n$, it follows that either $\varphi(a) = f(a\mathbf{1}) \le f(x) \le f(b\mathbf{1}) = \varphi(b)$ or $\varphi(b) = f(b\mathbf{1}) \le f(x) \le f(a\mathbf{1}) = \varphi(a)$. In both cases, since $\varphi$ is continuous on $[a, b] \subseteq A$, by the Intermediate Value Theorem there exists $c \in A$ such that $\varphi(c) = f(x)$. It follows that $f(x) \in \operatorname{Im} \varphi$. Since $x$ was arbitrarily chosen, we conclude that $\operatorname{Im} f \subseteq \operatorname{Im} \varphi$. $\blacksquare$
Lemma 752 If $f$ is strongly monotone and continuous, then $\varphi^{-1} \circ f$ is strongly increasing.

Proof If $f$ is strongly increasing, then $\varphi$ and its inverse are strictly increasing (Proposition 222). In this case, the result is easily established. If $f$ is strongly decreasing, then $\varphi$ and its inverse are strictly decreasing (Proposition 222). For all $x, y \in A^n$, if $x \ge y$ then $f(x) \le f(y)$ and so $\varphi^{-1}(f(x)) \ge \varphi^{-1}(f(y))$, while if $x > y$ then $f(x) < f(y)$ and so $\varphi^{-1}(f(x)) > \varphi^{-1}(f(y))$. We conclude that $\varphi^{-1} \circ f$ is strongly increasing. $\blacksquare$
Proof of Proposition 750 Set $\bar{x} = \varphi^{-1}(f(x))$, which is well defined by Lemma 751; then $\varphi(\bar{x}) = f(x)$, so $\bar{x}$ is an average of $x$. Vice versa, if $c$ is also an average of $x$, we have that $\varphi(\bar{x}) = f(x) = \varphi(c)$. Since $\varphi$ is either strictly increasing or strictly decreasing, this implies that $c = \bar{x}$, and $\bar{x}$ is therefore unique. As to internality, assume by contradiction that $\bar{x} < \min_{i=1,\ldots,n} x_i$, so that $x > (\bar{x}, \ldots, \bar{x})$. By Lemma 752, $\varphi^{-1} \circ f$ is strongly increasing. So, we reach the contradiction $\bar{x} = \varphi^{-1}(f(\bar{x}, \ldots, \bar{x})) < \varphi^{-1}(f(x)) = \bar{x}$. We conclude that $\min_{i=1,\ldots,n} x_i \le \bar{x}$. A similar argument shows that $\bar{x} \le \max_{i=1,\ldots,n} x_i$. $\blacksquare$
Formula (15.49) makes the average explicit, while in (15.45) it was defined only implicitly. So, the strong monotonicity and continuity of the function of interest make its average explicit, unique, and internal (the reader can revisit the previous examples and identify the relevant functions $\varphi$). This motivates the following definition.
Definition 753 Given a strongly monotone and continuous function $f : A^n \to \mathbb{R}$, its average function $m : A^n \to \mathbb{R}$ is defined by
$$m(x) = \varphi^{-1}(f(x_1, x_2, \ldots, x_n)) \qquad \forall x \in A^n$$
By the last lemma, the average function is well defined and internal:
$$\min_{i=1,\ldots,n} x_i \le m(x) \le \max_{i=1,\ldots,n} x_i \qquad \forall x \in A^n$$
In particular, the average function $m$ is a:
(i) weighted arithmetic average if and only if there exists $\alpha > 0$ such that
$$m(x) = \sum_{i=1}^n \frac{\alpha_i}{\sum_{j=1}^n \alpha_j} x_i$$

(ii) weighted geometric average if and only if there exists $\alpha > 0$ such that
$$m(x) = \left(\prod_{i=1}^n x_i^{\alpha_i}\right)^{\frac{1}{\sum_{i=1}^n \alpha_i}}$$

(iii) weighted harmonic average if and only if there exists $\alpha > 0$ such that
$$m(x) = \frac{\sum_{i=1}^n \alpha_i}{\sum_{i=1}^n \dfrac{\alpha_i}{x_i}}$$

More generally, the average function $m$ is quasi-arithmetic if and only if there exist a vector $\alpha \in \mathbb{R}^n_{++}$ and a strictly monotone and continuous function $\phi : A \to \mathbb{R}$ such that
$$m(x) = \phi^{-1}\left(\sum_{i=1}^n \frac{\alpha_i}{\sum_{j=1}^n \alpha_j}\, \phi(x_i)\right)$$
for all $x \in A^n$.
A natural property in this regard is translation invariance: if $f(x) = f(y)$, then
$$f(x + z) = f(y + z)$$
for all $z \in \mathbb{R}^n$. We could argue for and justify in a similar fashion a homogeneity property of $f$. The function of interest is then quasi-linear, a class of functions that we define next: $f : \mathbb{R}^n \to \mathbb{R}$ is quasi-linear if
$$f(x) = f(y) \implies f(\alpha x + \beta z) = f(\alpha y + \beta z) \qquad \forall z \in \mathbb{R}^n, \ \forall \alpha, \beta \in \mathbb{R}$$
for all $x, y \in \mathbb{R}^n$.
A couple of observations on this nice result. First, if we set $\theta_i = \alpha_i / \sum_{j=1}^n \alpha_j$ for some vector $\alpha > 0$, we can equivalently write $m(x) = \sum_{i=1}^n (\alpha_i / \sum_{j=1}^n \alpha_j) x_i$, as we did before. Second, quasi-linearity is easily seen to be an ordinal property,28 a feature that in the representation corresponds to the strict monotonicity of $\varphi$. In particular, $\varphi$ is the identity, so that $f(x) = \sum_{i=1}^n \theta_i x_i$, when $f$ is such that $f(c, \ldots, c) = c$ for all $c \in \mathbb{R}$, a normalization property.29
Proof "If". Suppose that there exist a strictly increasing (decreasing) function $\varphi : \mathbb{R} \to \mathbb{R}$ and a vector $\theta \in \Delta_{n-1}$ such that $f(x) = \varphi(\theta \cdot x)$ for all $x \in \mathbb{R}^n$. The function is easily seen to be strongly increasing (decreasing), continuous, and quasi-linear, as desired.

27 Simplexes will be studied in Chapter 17 (see Example 774).
28 Ordinal and cardinal properties are discussed in Section 17.3.3.
29 Later in the book we aptly call "normalized" the functions having this property (Section 19.3).
A further interesting property that $f$ may satisfy is symmetry. In our profit example, symmetry says that the board does not care about which branch realized which profit, but only about the size of the overall profit. For instance, if $n = 2$, $x = (1000, 4000)$ and $y = (4000, 1000)$, under symmetry $f(x) = f(y)$ because the only difference between the two vectors is which branch earned a given profit.
To state symmetry formally we need permutations, that is, bijections $\pi : N \to N$ where $N = \{1, 2, \ldots, n\}$.30 Given $x, y \in \mathbb{R}^n$, write $x \sim y$ if there exists a permutation $\pi$ such that $x_i = y_{\pi(i)}$ for all $i = 1, 2, \ldots, n$. In other words, $y$ can be obtained from $x$ by permuting indexes.
Example 756 We have $x = (1000, 4000) \sim y = (4000, 1000)$. Indeed, let $\pi : \{1, 2\} \to \{1, 2\}$ be the permutation given by $\pi(1) = 2$ and $\pi(2) = 1$, in which indexes are interchanged. Then $(y_{\pi(1)}, y_{\pi(2)}) = (y_2, y_1) = (1000, 4000) = x$. N
We say that $f$ is symmetric if
$$x \sim y \implies f(x) = f(y)$$
In other words, a symmetric $f$ assigns the same value to all vectors that can be obtained from one another via a permutation.
30 See Appendix B.2.
Remarkably, this result provides a foundation for the classic arithmetic average: it is the only average function on $\mathbb{R}^n$ that corresponds to a function $f$ which is strongly increasing, quasi-linear, continuous, and symmetric. As long as these properties are compelling in our application, we can summarize vectors via their arithmetic averages.
Proof In view of the last proposition, $f$ is strongly increasing (decreasing), continuous, and quasi-linear if and only if there exist a strictly increasing (decreasing) continuous function $\varphi : \mathbb{R} \to \mathbb{R}$ and a unique vector $\theta \in \Delta_{n-1}$ such that $m(x) = \theta \cdot x$ for all $x \in \mathbb{R}^n$ and $f = \varphi \circ m$. Clearly, $f$ is symmetric if and only if $m$ is symmetric. Thus, it remains to prove that $m$ is symmetric if and only if $\theta_i = 1/n$ for each $i = 1, 2, \ldots, n$. "If". Suppose that $\theta_i = 1/n$ for each $i = 1, 2, \ldots, n$. Let $x, y \in \mathbb{R}^n$ be such that $x \sim y$. By definition, there is a permutation $\pi$ such that $x_i = y_{\pi(i)}$ for all $i = 1, 2, \ldots, n$. Clearly, finite sums are commutative, so they are invariant under permutations, i.e., $\sum_{i=1}^n y_{\pi(i)} = \sum_{i=1}^n y_i$. Then,
$$m(x) = \frac{1}{n}\sum_{i=1}^n x_i = \frac{1}{n}\sum_{i=1}^n y_{\pi(i)} = \frac{1}{n}\sum_{i=1}^n y_i = m(y)$$
proving that $m$ is symmetric. "Only if". Suppose $m$ is symmetric. Note that $e^i \sim e^j$ for all indexes $i \ne j$. Indeed, it is enough to consider the permutation $\pi : N \to N$ defined by
$$
\pi(k) =
\begin{cases}
j & \text{if } k = i \\
i & \text{if } k = j \\
k & \text{else}
\end{cases}
$$
Summing up, Riesz's Theorem and its variations permit a principled approach to weighted averages, which are justified via the properties of the functions of interest: these functions are the fundamental objects, averages being defined through them.
15.11 Ultracoda: Hahn-Banach et similia

A function $f : V \to \mathbb{R}$ defined on a vector subspace $V$ of $\mathbb{R}^n$ is said to be linear if
$$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$$
for all $x, y \in V$ and all $\alpha, \beta \in \mathbb{R}$. Since $V$ is closed with respect to sums and multiplications by a scalar, we have that $\alpha x + \beta y \in V$, and therefore this definition is well posed and generalizes Definition 640.
For instance, consider the plane of $\mathbb{R}^3$
$$V = \{(x_1, x_2, 0) : x_1, x_2 \in \mathbb{R}\}$$
and a linear function defined on it: can it be extended to a linear function on the whole space? This is quite an important problem, as we will see shortly, also for applications. Fortunately, the following positive result holds.

Theorem (Hahn-Banach) Let $V$ be a vector subspace of $\mathbb{R}^n$. Every linear function $f : V \to \mathbb{R}$ admits a linear extension $\bar{f} : \mathbb{R}^n \to \mathbb{R}$, i.e., a linear function such that $\bar{f}(x) = f(x)$ for every $x \in V$.
Proof Let $\dim V = k \le n$ and let $x^1, \ldots, x^k$ be a basis for $V$. If $k = n$, there is nothing to prove since $V = \mathbb{R}^n$. Otherwise, by Theorem 92, there are $n - k$ vectors $x^{k+1}, \ldots, x^n$ such that the overall set $\{x^1, \ldots, x^n\}$ is a basis for $\mathbb{R}^n$. Let $\{r_{k+1}, \ldots, r_n\}$ be an arbitrary set of $n - k$ real numbers. By Theorem 89, note that for each vector $x$ in $\mathbb{R}^n$ there exists a unique collection of scalars $\{\alpha_i\}_{i=1}^n \subseteq \mathbb{R}$ such that $x = \sum_{i=1}^n \alpha_i x^i$. Define $\bar{f} : \mathbb{R}^n \to \mathbb{R}$ to be such that $\bar{f}(x) = \sum_{i=1}^k \alpha_i f(x^i) + \sum_{i=k+1}^n \alpha_i r_i$. Since for each vector $x$ the collection $\{\alpha_i\}_{i=1}^n$ is unique, we have that $\bar{f}$ is well defined and linear (why?). Note also that
$$
\bar{f}(x^i) =
\begin{cases}
f(x^i) & \text{for } i = 1, \ldots, k \\
r_i & \text{for } i = k+1, \ldots, n
\end{cases}
$$
Since $\{x^1, \ldots, x^k\}$ is a basis for $V$, for every $x \in V$ there are $k$ scalars $\{\alpha_i\}_{i=1}^k$ such that $x = \sum_{i=1}^k \alpha_i x^i$. Hence,
$$\bar{f}(x) = \bar{f}\left(\sum_{i=1}^k \alpha_i x^i\right) = \sum_{i=1}^k \alpha_i \bar{f}(x^i) = \sum_{i=1}^k \alpha_i f(x^i) = f\left(\sum_{i=1}^k \alpha_i x^i\right) = f(x)$$
$\blacksquare$
As one can clearly infer from the proof, such an extension is far from unique: to every set of scalars $\{r_i\}_{i=k+1}^n$ a different extension is associated.
Example 761 Consider the previous example, with the plane $V = \{(x_1, x_2, 0) : x_1, x_2 \in \mathbb{R}\}$ of $\mathbb{R}^3$ and the linear function $f : V \to \mathbb{R}$ defined by $f(x) = x_1 + x_2$. By the Hahn-Banach Theorem, there is a linear function $\bar{f} : \mathbb{R}^3 \to \mathbb{R}$ such that $\bar{f}(x) = f(x)$ for each $x \in V$. For example, $\bar{f}(x) = x_1 + x_2 + x_3$ for each $x \in \mathbb{R}^3$, but also $\bar{f}(x) = x_1 + x_2 + \alpha x_3$ is an extension, for each $\alpha \in \mathbb{R}$. This confirms the multiplicity of the extensions. N
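As a computational aside (ours), the multiplicity of extensions can be illustrated numerically: every member of the family $\bar{f}_\alpha(x) = x_1 + x_2 + \alpha x_3$ agrees with $f$ on $V$, whatever the value of $\alpha$. A minimal sketch, assuming NumPy:

```python
import numpy as np

def f(v):                      # the original linear function on V
    return v[0] + v[1]

def f_bar(v, alpha):           # a family of linear extensions to R^3
    return v[0] + v[1] + alpha * v[2]

rng = np.random.default_rng(0)
for alpha in (-1.0, 0.0, 2.5):
    v = np.array([rng.normal(), rng.normal(), 0.0])   # a point of V
    assert np.isclose(f_bar(v, alpha), f(v))          # extensions agree on V
print("all extensions agree on V")
```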
Although it may appear a fairly innocuous result, the Hahn-Banach Theorem is very powerful. Let us see one of its remarkable consequences by extending Riesz's Theorem to linear functions defined on subspaces.31
Specifically, a function $f : V \to \mathbb{R}$ defined on a vector subspace $V$ of $\mathbb{R}^n$ is linear if and only if there exists a vector $\theta \in \mathbb{R}^n$ such that
$$f(x) = \theta \cdot x \qquad \forall x \in V \tag{15.50}$$
Proof We prove the "only if" since the converse is obvious. Let $f : V \to \mathbb{R}$ be a linear function. By the Hahn-Banach Theorem, there is a linear function $\bar{f} : \mathbb{R}^n \to \mathbb{R}$ such that $\bar{f}(x) = f(x)$ for each $x \in V$. By Riesz's Theorem, there is $\theta \in \mathbb{R}^n$ such that $\bar{f}(x) = \theta \cdot x$ for each $x \in \mathbb{R}^n$. Therefore $f(x) = \bar{f}(x) = \theta \cdot x$ for every $x \in V$, as desired. $\blacksquare$
Conceptually, the main novelty relative to this version of Riesz's Theorem is the loss of the uniqueness of the vector $\theta$. Indeed, the proof shows that such a vector is determined by the extension $\bar{f}$ whose existence is guaranteed by the Hahn-Banach Theorem. Yet, such extensions are far from unique, thus implying the non-uniqueness of the vector $\theta$.
Example 763 Going back to the previous examples, we already noted that all linear functions $\bar{f} : \mathbb{R}^3 \to \mathbb{R}$ defined by $\bar{f}(x) = x_1 + x_2 + \alpha x_3$, with $\alpha \in \mathbb{R}$, extend $f$ to $\mathbb{R}^3$. By setting $\theta_\alpha = (1, 1, \alpha)$, we have that $\bar{f}(x) = \theta_\alpha \cdot x$ for every $\alpha \in \mathbb{R}$, so that
$$f(x) = \theta_\alpha \cdot x \qquad \forall x \in V$$
for every $\alpha \in \mathbb{R}$. Hence, in this example there are infinitely many vectors $\theta$ for which the representation (15.50) holds. N
Theorem 764 Let $V$ be a vector subspace of $\mathbb{R}^n$. Every (strictly) increasing linear function $f : V \to \mathbb{R}$ can be extended to $\mathbb{R}^n$ so as to be (strictly) increasing and linear.

Proof We prove the statement in the particular, yet important, case in which $V \cap \mathbb{R}^n_{++}$ is not empty and $f$ is increasing.32 We start by introducing a piece of notation which is going to be useful.
31 In Section 24.6 we will see an important financial application of this result.
32 In financial applications this assumption is often satisfied (see Section 24.6). The proof of the more general case, as well as the strictly increasing version of the result, relies on mathematical facts that the reader will encounter in more advanced courses.
Let $N$ be the set of the dimensions of the vector subspaces of $\mathbb{R}^n$ containing $V$ to which $f$ admits a monotone increasing linear extension. This set is not empty since it contains $\dim V$: indeed, $f$ is an extension of itself which is linear and monotone increasing by assumption. Consider now $\max N$. Being $N$ not empty, $\max N$ is well defined. If $\max N = n$, then the statement is proved. Indeed, in such a case we can conclude that there exists a linear monotone increasing extension of $f$ whose domain is a vector subspace of $\mathbb{R}^n$ with dimension $n$, that is, the domain is $\mathbb{R}^n$ itself. By contradiction, assume instead that $\bar{n} = \max N < n$. It means that, in looking for an extension of $f$ which preserves linearity and monotonicity, one can at most find a monotone increasing linear extension $\tilde{f} : W \to \mathbb{R}$ where $W$ is a vector subspace of dimension $\bar{n} < n$. Let $x^1, \ldots, x^{\bar{n}}$ be a basis of $W$. Since $\bar{n} < n$, we can find at least a vector $x^{\bar{n}+1} \in \mathbb{R}^n$ such that $\{x^1, \ldots, x^{\bar{n}}, x^{\bar{n}+1}\}$ is still linearly independent. Fix a vector $\hat{x} \in V \cap \mathbb{R}^n_{++}$. Clearly, we have that $\hat{x} \in V \subseteq W$ and for each $z \in \mathbb{R}^n$ there exists $m \in \mathbb{N}$ such that $-m\hat{x} \le z \le m\hat{x}$. Let $U = \{x \in W : x \ge x^{\bar{n}+1}\}$ and $L = \{y \in W : x^{\bar{n}+1} \ge y\}$. Since $\hat{x} \in W$, both sets are not empty. Consider now $\tilde{f}(U)$ and $\tilde{f}(L)$, which are both subsets of the real line. Since $\tilde{f}$ is monotone increasing, it is immediate to see that each element of $\tilde{f}(U)$ is greater than or equal to each element of $\tilde{f}(L)$. By the separation property of the real line, there exists $c \in \mathbb{R}$ such that $a \ge c \ge b$ for every $a \in \tilde{f}(U)$ and every $b \in \tilde{f}(L)$. Observe also that each vector $x \in \operatorname{span}\{x^1, \ldots, x^{\bar{n}}, x^{\bar{n}+1}\}$ can be written in a unique way as $x = y_x + \lambda_x x^{\bar{n}+1}$, where $y_x \in W$ and $\lambda_x \in \mathbb{R}$ (why?).

Define now $\hat{f} : \operatorname{span}\{x^1, \ldots, x^{\bar{n}}, x^{\bar{n}+1}\} \to \mathbb{R}$ to be such that $\hat{f}(x) = \tilde{f}(y_x) + \lambda_x c$ for every $x \in \operatorname{span}\{x^1, \ldots, x^{\bar{n}}, x^{\bar{n}+1}\}$. We leave to the reader to verify that $\hat{f}$ is indeed linear and that $\hat{f}$ extends $f$. Note instead that $\hat{f}$ is positive, that is, $\hat{f}(x) \ge 0$ for all $x \in \operatorname{span}\{x^1, \ldots, x^{\bar{n}}, x^{\bar{n}+1}\} \cap \mathbb{R}^n_+$. Otherwise, there would exist $x \in \operatorname{span}\{x^1, \ldots, x^{\bar{n}}, x^{\bar{n}+1}\}$ such that $x \ge 0$ and $\hat{f}(x) < 0$. If $\lambda_x = 0$, then $y_x = y_x + \lambda_x x^{\bar{n}+1} = x \ge 0$; since $\tilde{f}$ is monotone increasing, $0 > \hat{f}(x) = \tilde{f}(y_x) \ge 0$, a contradiction. If $\lambda_x \ne 0$, then $x^{\bar{n}+1} \ge -y_x/\lambda_x$ and $c < \tilde{f}(-y_x/\lambda_x)$. In other words, $-y_x/\lambda_x$ belongs to $L$, thus $\tilde{f}(-y_x/\lambda_x) \in \tilde{f}(L)$ and $c \ge \tilde{f}(-y_x/\lambda_x) > c$, a contradiction. Since we just showed that $\hat{f}$ must be positive, by Proposition 650 this implies that $\hat{f}$ is monotone increasing as well. To sum up, we just constructed a function (namely $\hat{f}$) which extends $f$ to a vector subspace of dimension $\bar{n} + 1$ (namely $\operatorname{span}\{x^1, \ldots, x^{\bar{n}}, x^{\bar{n}+1}\}$), thus $\max N \ge \bar{n} + 1$. At the same time, our working hypothesis was that $\bar{n} = \max N$, thus reaching a contradiction. $\blacksquare$
$$f(x) = \theta \cdot x \qquad \forall x \in V$$
As to (i), note that the function $f(x) = x_1 + x_2$ is strongly positive, and so is $\bar{f}(x) = x_1 + x_2 + \alpha x_3$ with $\alpha > 0$.
A nice dividend of the Hahn-Banach Theorem is the following extension result for affine functions, which will be introduced momentarily in the next chapter and play a key role in applications (cf. Chapter 42).
Proof of the Claim We start by proving that the statement is true when $\gamma = 0$. Let $x, y \in C$ and $\alpha, \beta \in \mathbb{R}$ be such that $\alpha + \beta = 1$ as well as $\alpha x + \beta y \in C$. We have two cases: either $\alpha, \beta \ge 0$ or at least one of the two is strictly negative. In the first case, since $\alpha + \beta = 1$, we have that $\alpha \le 1$. Since $f$ is affine and $\beta = 1 - \alpha$, this implies that (15.52) holds. In the second case, without loss of generality, we can assume that $\beta < 0$. Since $\alpha + \beta = 1$, we have that $\alpha = 1 - \beta > 1$. Define $w = \alpha x + (1 - \alpha) y = \alpha x + \beta y \in C$. Define $\lambda = 1/\alpha$ and note that $\lambda \in (0, 1)$. Observe that $x = \lambda w + (1 - \lambda) y$. Since $f$ is affine, we have that
$$f(x) = f(\lambda w + (1 - \lambda) y) = \lambda f(w) + (1 - \lambda) f(y) = \frac{1}{\alpha} f(\alpha x + (1 - \alpha) y) + \left(1 - \frac{1}{\alpha}\right) f(y)$$
and, by rearranging terms, we get that (15.52) holds. We next prove that (15.51) holds. Let us now consider the more general case, that is, $x, y, z \in C$ and $\alpha, \beta, \gamma \in \mathbb{R}$ such that $\alpha + \beta + \gamma = 1$ and $\alpha x + \beta y + \gamma z \in C$. We split the proof in three cases. Setting $\lambda = \alpha + \beta$ and $w = \frac{\alpha}{\alpha + \beta} x + \frac{\beta}{\alpha + \beta} y$ when $\alpha + \beta \ne 0$, one case gives
$$f(\alpha x + \beta y + \gamma z) = f(\lambda w + (1 - \lambda) z) = \lambda f(w) + (1 - \lambda) f(z) = (\alpha + \beta) f\left(\frac{\alpha}{\alpha + \beta} x + \frac{\beta}{\alpha + \beta} y\right) + (1 - \lambda) f(z) = \alpha f(x) + \beta f(y) + \gamma f(z)$$
while another gives
$$f(\alpha x + \beta y + \gamma z) = f(\gamma z + (1 - \gamma) w) = \gamma f(z) + (1 - \gamma) f(w) = \gamma f(z) + (\alpha + \beta) f\left(\frac{\alpha}{\alpha + \beta} x + \frac{\beta}{\alpha + \beta} y\right) = \alpha f(x) + \beta f(y) + \gamma f(z)$$
We can now start proving the main statement. We do so by further assuming that $0 \in C$ and $f(0) = 0$. We will show that $f$ admits a linear extension to $\mathbb{R}^n$. This will prove the statement in this particular case (why?). If $C = \{0\}$, then any linear function extends $f$ and so any linear function is an affine extension of $f$. Assume $C \ne \{0\}$. Since $\{0\} \ne C \subseteq \mathbb{R}^n$, there exists a linearly independent collection $\{x^1, \ldots, x^k\} \subseteq C$ with $1 \le k \le n$. Let $k$ be the maximum number of linearly independent vectors of $C$. Note that $\operatorname{span}\{x^1, \ldots, x^k\} \supseteq C$. Otherwise, there would exist a vector $x$ in $C$ that does not belong to $\operatorname{span}\{x^1, \ldots, x^k\}$. Now, observe that if we consider a collection $\{\lambda\} \cup \{\lambda_i\}_{i=1}^k \subseteq \mathbb{R}$ of $k + 1$ scalars such that $\lambda x + \sum_{i=1}^k \lambda_i x^i = 0$, then we have two cases: either $\lambda \ne 0$ or $\lambda = 0$. In the former case, we could conclude that $x = -\sum_{i=1}^k (\lambda_i / \lambda) x^i \in \operatorname{span}\{x^1, \ldots, x^k\}$, a contradiction with $x \notin \operatorname{span}\{x^1, \ldots, x^k\}$. In the latter case, we could conclude that $\sum_{i=1}^k \lambda_i x^i = 0$. Since the vectors $x^1, \ldots, x^k$ are linearly independent, it follows that $\lambda_i = 0$ for all $i \in \{1, \ldots, k\}$, proving that $x^1, \ldots, x^k, x$ are linearly independent, a contradiction with the fact that $\{x^1, \ldots, x^k\}$ contains the maximum number of linearly independent vectors of $C$. Define $\bar{f} : \operatorname{span}\{x^1, \ldots, x^k\} \to \mathbb{R}$ by $\bar{f}(x) = \sum_{i=1}^k \lambda_i f(x^i)$, where $\{\lambda_i\}_{i=1}^k$ is the unique collection of scalars such that $x = \sum_{i=1}^k \lambda_i x^i$. By construction, $\bar{f}$ is linear (why?). Next, we show it extends $f$. Let $x \in C$. There exists a unique collection of
scalars $\{\lambda_i\}_{i=1}^k$ such that $x = \sum_{i=1}^k \lambda_i x^i$. Divide these scalars into three sets: $P = \{i : \lambda_i > 0\}$, $N = \{i : \lambda_i < 0\}$ and $Z = \{i : \lambda_i = 0\}$. Define $\mu = \sum_{i \in P} \lambda_i$ and $\nu = \sum_{i \in N} \lambda_i$. We have four cases:
If all the $\lambda_i$ are zero, then
$$\bar{f}(x) = \sum_{i=1}^k \lambda_i f(x^i) = 0 = f(0) = f(x)$$
If $P$ is not empty and $N$ is empty, then
$$\bar{f}(x) = \sum_{i=1}^k \lambda_i f(x^i) = \mu \sum_{i \in P} \frac{\lambda_i}{\mu} f(x^i) = \mu f\left(\sum_{i \in P} \frac{\lambda_i}{\mu} x^i\right) = \mu f\left(\sum_{i \in P} \frac{\lambda_i}{\mu} x^i\right) + (1 - \mu) f(0) = f\left(\mu \sum_{i \in P} \frac{\lambda_i}{\mu} x^i + (1 - \mu) 0\right) = f\left(\sum_{i \in P} \lambda_i x^i\right) = f(x)$$
Similarly, if $N$ is not empty and $P$ is empty, then
$$\bar{f}(x) = \sum_{i=1}^k \lambda_i f(x^i) = \nu \sum_{i \in N} \frac{\lambda_i}{\nu} f(x^i) = \nu f\left(\sum_{i \in N} \frac{\lambda_i}{\nu} x^i\right) = \nu f\left(\sum_{i \in N} \frac{\lambda_i}{\nu} x^i\right) + (1 - \nu) f(0) = f\left(\nu \sum_{i \in N} \frac{\lambda_i}{\nu} x^i + (1 - \nu) 0\right) = f\left(\sum_{i \in N} \lambda_i x^i\right) = f(x)$$
Definition 767 A set $C$ in $\mathbb{R}^n$ is said to be convex if, for every pair of points $x, y \in C$,
$$\lambda x + (1 - \lambda) y \in C \qquad \forall \lambda \in [0, 1]$$
When $\lambda$ varies in $[0, 1]$, the combination
$$\lambda x + (1 - \lambda) y \tag{16.1}$$
represents geometrically the points of the segment that joins $x$ with $y$. A set $C$ is thus convex if it contains the segment (16.1) that joins any two of its points.
Example 768 (i) Intervals, bounded or not, are the convex sets of the real line (see Proposition 30). Convex sets can, therefore, be seen as the generalization to $\mathbb{R}^n$ of the notion of interval.

(ii) Neighborhoods are convex sets in $\mathbb{R}^n$. To see why, take a neighborhood $B_\varepsilon(x) = \{y \in \mathbb{R}^n : \|x - y\| < \varepsilon\}$. Let $y', y'' \in B_\varepsilon(x)$ and $\lambda \in [0, 1]$. By the properties of the norm (Proposition 108),
$$\|x - (\lambda y' + (1 - \lambda) y'')\| = \|\lambda x + (1 - \lambda) x - \lambda y' - (1 - \lambda) y''\| = \|\lambda(x - y') + (1 - \lambda)(x - y'')\| \le \lambda\|x - y'\| + (1 - \lambda)\|x - y''\| < \varepsilon$$
Therefore, $\lambda y' + (1 - \lambda) y'' \in B_\varepsilon(x)$. This proves that $B_\varepsilon(x)$ is a convex set. N
Let us see a first topological property of convex sets (for brevity, we omit its proof).

Proposition 769 The closure and the interior of a convex set are convex sets.

The converse does not hold: a non-convex set may happen to have a convex interior or closure. In the real line, the set $[2, 5] \cup \{7\}$ is not convex but its interior $(2, 5)$ is convex; the set $(0, 1) \cup (1, 5)$ is not convex but its closure $[0, 5]$ is convex. In the plane, take a square and remove from one of its sides a point that is not a vertex: the resulting set is not convex, yet both its closure and its interior are convex.
Proposition 770 The intersection of any collection of convex sets is a convex set.

In contrast, a union of convex sets is not necessarily convex: the union $(0, 1) \cup (2, 5)$ is not a convex set although both sets $(0, 1)$ and $(2, 5)$ are so.
The points of the segment (16.1) are convex combinations of the vectors $x$ and $y$. In general, given a collection $\{x^i\}_{i=1}^k$ of vectors, a linear combination
$$\sum_{i=1}^k \lambda_i x^i$$
is called a convex (linear) combination of the vectors $\{x^i\}_{i=1}^k$ if the coefficients $\{\lambda_i\}_{i=1}^k$ are weights, i.e., $\lambda_i \ge 0$ for each $i$ and $\sum_{i=1}^k \lambda_i = 1$. In words, weights are positive coefficients that add up to 1. In the case $k = 2$, $\lambda_1 + \lambda_2 = 1$ implies $\lambda_2 = 1 - \lambda_1$, hence convex combinations of two vectors have the form $\lambda x + (1 - \lambda) y$, with $\lambda \in [0, 1]$, used in defining convex sets.
Via convex combinations we can define a basic class of convex sets.

Definition 771 Given a finite collection $\{x^i\}_{i=1}^k$ of vectors of $\mathbb{R}^n$, the polytope that they generate is the set
$$P = \left\{\sum_{i=1}^k \lambda_i x^i : \sum_{i=1}^k \lambda_i = 1 \text{ and } \lambda_i \ge 0 \text{ for every } i\right\}$$
A polytope generated by a finite collection of vectors thus consists of all possible convex combinations that one can form with them. Clearly, polytopes are convex sets. In particular, the polytope generated by two vectors $x$ and $y$ is the segment that joins them.
In the plane, polytopes have a simple geometric interpretation that takes us back to high school. Given three vectors $x$, $y$ and $z$ of the plane (not aligned), the polytope
$$\{\lambda_1 x + \lambda_2 y + \lambda_3 z : \lambda_1, \lambda_2, \lambda_3 \ge 0 \text{ and } \lambda_1 + \lambda_2 + \lambda_3 = 1\}$$
is the triangle with vertices $x$, $y$ and $z$.1

[Figure: the triangle generated by the vectors x, y and z in the plane.]

In general, the polytope $P$ generated by vectors of the plane is a polygon whose vertices are among these vectors. The polygons that we studied in high school can be regarded as the locus of all convex combinations of their vertices.
Example 772 The rhombus

[Figure: the rhombus with vertices (0, 1), (1, 0), (-1, 0), (0, -1).]
1 A caveat: if, for instance, $x$ lies on the segment that joins $y$ and $z$ (i.e., the vectors are linearly dependent), the triangle generated by $x$, $y$ and $z$ reduces to that segment. In this case, the vertices are only $y$ and $z$. Similar remarks apply to general polygons, as Example 772 will momentarily show.
is the polytope generated by the four vectors $\{(0, 1), (1, 0), (-1, 0), (0, -1)\}$, which are its vertices. Note that also the five vectors $\{(0, 1), (1, 0), (-1, 0), (0, -1), (1/2, 1/2)\}$ generate the same rhombus
[Figure: the same rhombus, generated also by the redundant vector (1/2, 1/2).]
because the added vector $(1/2, 1/2)$ already belongs to the rhombus. All vertices of a polytope are among the vectors that generate it, but the converse in general fails: some of these vectors, like $(1/2, 1/2)$, may not be vertices. Later in the chapter the notion of extreme point will clarify this issue. N
Proposition 773 A set is convex if and only if it is closed under all convex combinations of its own elements.

In other words, a set is convex if and only if it contains all the polytopes (all the polygons, in the plane) generated by its elements. Though they are defined in terms of segments, convex sets actually contain all polytopes. In symbols, $C$ is convex if and only if
$$\sum_{i=1}^n \lambda_i x^i \in C$$
for every finite collection $\{x^i\}_{i=1}^n$ of vectors of $C$ and every collection $\{\lambda_i\}_{i=1}^n$ of weights.
Proof The "if" is obvious because by considering convex combinations with $n = 2$ we get Definition 767. We prove the "only if". Let $C$ be convex. We claim that $C$ then contains any convex combination of $n$ of its elements. We proceed by induction on $n$. Initial step: the claim is true for $n = 2$ since $C$ contains, by the definition of convexity, any convex combination of any two of its elements. Induction step: let us assume (induction hypothesis) that the claim is true for $n - 1$, i.e., that $C$ contains any convex combination of $n - 1$ of its elements. We want to show that this implies the claim for $n$, i.e., that $C$ contains any convex combination of $n$ of its elements. We have:
$$\sum_{i=1}^n \lambda_i x^i = \sum_{i=1}^{n-1} \lambda_i x^i + \lambda_n x^n = (1 - \lambda_n) \sum_{i=1}^{n-1} \frac{\lambda_i}{1 - \lambda_n} x^i + \lambda_n x^n$$
If $\lambda_n = 1$ there is nothing to prove, so let $\lambda_n < 1$. The coefficients $\lambda_i / (1 - \lambda_n)$ are weights, so by the induction hypothesis the vector $\sum_{i=1}^{n-1} (\lambda_i / (1 - \lambda_n)) x^i$ belongs to $C$; the left-hand side, being a convex combination of this vector and $x^n$, then belongs to $C$, as desired. This completes the inductive step, and so the induction argument. We conclude that the claim is correct, i.e., $C$ is closed under all convex combinations of its own elements. $\blacksquare$
The polytope generated by the $n$ versors $e^1, \ldots, e^n$ of $\mathbb{R}^n$, that is, the set of all their convex combinations, is called the standard simplex. For instance, the standard simplex of the plane is
$$\Delta_1 = \{x \in \mathbb{R}^2_+ : x_1 + x_2 = 1\} \subseteq \mathbb{R}^2$$
and, in general,
$$\Delta_{n-1} = \left\{x \in \mathbb{R}^n_+ : \sum_{i=1}^n x_i = 1\right\} \subseteq \mathbb{R}^n$$
Standard simplices are an important class of polytopes, as we will see later in the chapter.
Definition 775 We call convex envelope (or hull), written $\operatorname{co} A$, of a subset $A$ of $\mathbb{R}^n$ the smallest convex set that contains $A$.

Next we show that convex envelopes are the counterpart for convex combinations of what generated subspaces are for linear combinations (cf. Section 3.4).

Proposition 776 The convex envelope of a set is the intersection of all convex sets that contain it.
Proof Given a set $A$ of $\mathbb{R}^n$, let $\{C_i\}_{i \in I}$ be the collection of all convex subsets containing $A$, where $I$ is a (finite or infinite) index set. We want to show that $\operatorname{co} A = \bigcap_{i \in I} C_i$. By Proposition 770, $\bigcap_{i \in I} C_i$ is a convex set. Since $A \subseteq C_i$ for each $i$, we have $\operatorname{co} A \subseteq \bigcap_{i \in I} C_i$ since, by definition, $\operatorname{co} A$ is the smallest convex subset containing $A$. On the other hand, $\operatorname{co} A$ belongs to the collection $\{C_i\}_{i \in I}$, being a convex subset containing $A$. It follows that $\bigcap_{i \in I} C_i \subseteq \operatorname{co} A$ and we therefore conclude that $\bigcap_{i \in I} C_i = \operatorname{co} A$. $\blacksquare$
The next result shows that convex envelopes can be represented through convex combinations: the convex envelope of a set $A$ is the set of all convex combinations of vectors of $A$. In other words, $x \in \operatorname{co} A$ if and only if there exist a finite set $\{x^i\}_{i \in I}$ of vectors of $A$ and a finite set $\{\lambda_i\}_{i \in I}$ of weights such that $x = \sum_{i \in I} \lambda_i x^i$.
Proof "If". Let $x \in \mathbb{R}^n$ be a convex combination of a finite set $\{x^i\}_{i \in I}$ of vectors of $A$. The set $\operatorname{co} A$ is convex and, since $\{x^i\}_{i \in I} \subseteq \operatorname{co} A$, Proposition 773 implies $x \in \operatorname{co} A$, as desired.

"Only if". Let $C$ be the set of all the vectors that can be expressed as convex combinations of vectors of $A$, i.e., $x \in C$ if there exist finite sets $\{x^i\}_{i \in I} \subseteq A$ and weights $\{\lambda_i\}_{i \in I}$ such that $x = \sum_{i \in I} \lambda_i x^i$. It is easy to see that $C$ is a convex subset containing $A$. It follows that $\operatorname{co} A \subseteq C$ and hence each $x \in \operatorname{co} A$ is a convex combination of vectors of $A$. $\blacksquare$

2 The rest of the chapter is for coda readers.
Thus, convex envelopes preserve compactness (we omit the proof): when $K$ is a compact subset, $\operatorname{co} K$ is compact. For instance, polytopes are compact because they are the convex envelope of a finite (so, compact) collection of vectors of $\mathbb{R}^n$. Formally: polytopes are compact and convex subsets of $\mathbb{R}^n$.
Going back to the rhombus of Example 772, its vertices form the set $A = \{(0, 1), (1, 0), (-1, 0), (0, -1)\}$, and the rhombus is also the convex envelope of the larger set $A' = A \cup \{(1/2, 1/2)\}$. In the set $A'$, besides the vertices there is also the vector $(1/2, 1/2)$, which is useless for the representation of the polygon because it is itself a convex combination of the vertices.3 We therefore have a redundancy in the set $A'$, while this does not happen in the set $A$ of the vertices, whose elements are all essential for the representation of the rhombus.

Hence, for a polygon the set of the vertices is the natural candidate to be the minimal set that allows us to represent each point of the polygon as a convex combination of its elements. This motivates the notion of extreme point, which generalizes that of vertex of a polygon to any convex set.
Lemma 781 A point $x_0$ of a convex set $C$ is extreme if and only if the set $C \setminus \{x_0\}$ is convex.
The next result shows that extreme points must be boundary points: no interior point of a convex set can be an extreme point. The key observation is that an interior point $x$ can be written as the midpoint of two distinct points of $C$; for instance, when $x \ne 0$ and $\varepsilon > 0$ is small enough,
$$x = \frac{1}{2}\left(1 - \frac{\varepsilon}{n}\right)x + \frac{1}{2}\left(1 + \frac{\varepsilon}{n}\right)x$$
and so $x \notin \operatorname{ext} C$.
Open convex sets (like, for example, open unit balls) thus do not have extreme points. We now present other examples in which we find the extreme points of some convex sets.
Example 784 Consider the closed unit ball $B_1(0) = \{x \in \mathbb{R}^n : \|x\| \le 1\}$ of $\mathbb{R}^n$. We have:
$$\operatorname{ext} B_1(0) = \{x \in \mathbb{R}^n : \|x\| = 1\}$$
In words, the set of the extreme points of the closed unit ball is given by the unit sphere, i.e., by its "skin". Though a quite intuitive result (just draw a circle), it is a bit delicate to prove. Since $\partial B_1(0) = \{x \in \mathbb{R}^n : \|x\| = 1\}$, the last proposition implies the inclusion $\operatorname{ext} B_1(0) \subseteq \{x \in \mathbb{R}^n : \|x\| = 1\}$. As to the converse inclusion, let $x_0 \in \partial B_1(0)$ and write $x_0 = tx + (1 - t)y$ with $x, y \in B_1(0)$ and $t \in (0, 1)$. We want to show that $x = y$. We have
$$\|tx + (1 - t)y\|^2 = t^2\|x\|^2 + (1 - t)^2\|y\|^2 + 2t(1 - t)\|x\|\|y\|\cos(\alpha - \beta)$$
where the angle $\alpha - \beta$ is the difference of the angles $\alpha$ and $\beta$ determined by the two vectors (Section C.3). If $x \ne y$, we have $\cos(\alpha - \beta) < 1$, so $\|tx + (1 - t)y\|^2 < 1$. This contradicts $x_0 \in \partial B_1(0)$, therefore $x = y$. We conclude that $x_0 \in \operatorname{ext} B_1(0)$, as desired. N
We are now ready to address the opening question of this section. We first need a preliminary "minimality" lemma showing that $\operatorname{ext} C$ is included in all subsets of $C$ whose convex envelope is $C$ itself.

Lemma If $C = \operatorname{co} A$, then $\operatorname{ext} C \subseteq A$. In other words, the extreme points of the convex hull of a set $A$ belong to $A$.
Proof Let $x \in \operatorname{ext} C$. We want to show that $x \in A$. Since $x \in \operatorname{co} A$, there are a collection $\{x^i\}_{i=1}^n \subseteq A$ and a collection $\{t_i\}_{i=1}^n$ of weights such that $x = \sum_{i=1}^n t_i x^i$. Without loss of generality, assume $t_i > 0$ for every $i$. We have:
$$x = t_1 x^1 + (1 - t_1) \sum_{i=2}^n \frac{t_i}{1 - t_1} x^i$$
Since $C$ is convex, $\sum_{i=2}^n t_i x^i / (1 - t_1)$ belongs to $C$. Then, since $x$ is extreme,
$$x = x^1 = \sum_{i=2}^n \frac{t_i}{1 - t_1} x^i$$
so that $x = x^1 \in A$, as desired. $\blacksquare$
The next fundamental result shows that convex and compact sets can be reconstructed from their extreme points by taking all their convex combinations. We omit the proof.

Theorem (Minkowski) If $K$ is a convex and compact set of $\mathbb{R}^n$, then
$$K = \operatorname{co}(\operatorname{ext} K) \tag{16.3}$$

In view of the previous lemma, Minkowski's Theorem answers the opening question: $\operatorname{ext} K$ is the minimal set in $K$ for which (16.3) holds. Indeed, if $A \subseteq K$ is another set for which $K = \operatorname{co} A$, then $\operatorname{ext} K \subseteq A$ by the lemma. Summing up:

(i) all the points of a compact and convex set $K$ can be expressed as convex combinations of its extreme points;

(ii) the set of the extreme points of $K$ is the minimal set in $K$ for which this is true.

Minkowski's Theorem stands out as the deepest and most beautiful result of this chapter. It shows that, in a sense, convex and compact sets in $\mathbb{R}^n$ are generalized polytopes (cf. Example 783), with extreme points generalizing the role of vertices. In particular, polytopes are the convex and compact sets of $\mathbb{R}^n$ that have a finite number of extreme points (which are then their vertices).
16.3 Affine sets
Definition 787 A set $A$ in $\mathbb{R}^n$ is said to be affine if $\lambda x + (1 - \lambda) y \in A$ for all $x, y \in A$ and all $\lambda \in \mathbb{R}$.
Example 788 The solutions of a linear system form an affine set. Formally, given an $m \times n$ matrix $B$ and a vector $b \in \mathbb{R}^m$, the set $A = \{x \in \mathbb{R}^n : Bx = b\}$ is affine. Indeed, if $x, y \in A$ and $\lambda \in \mathbb{R}$, then
$$B(\lambda x + (1 - \lambda) y) = \lambda Bx + (1 - \lambda) By = \lambda b + (1 - \lambda) b = b$$
and so $\lambda x + (1 - \lambda) y \in A$, as desired. N
Vector subspaces are affine, and affine sets are convex. The converses are false: a bounded interval is a simple example of a convex set which is not affine; in the last example, the solution set $A$ is affine but not a vector subspace unless $b = 0$ (so, unless the system is homogeneous). Affinity is thus an intermediate notion between convexity and linearity. It shares with them a basic, readily proved, set-stability property.
The next result clarifies the nature of affine sets by showing that they are "parallel" to vector subspaces.

Proposition 790 A set $A$ in $\mathbb{R}^n$ is affine if and only if there exist a vector subspace $V$ of $\mathbb{R}^n$ and a vector $z \in A$ such that
$$A = V + z = \{x + z : x \in V\}$$
Proof "If" is easily checked. As to "only if", let $z \in A$ and set $V = A - z$. For scalar multiples: given $x = y - z \in V$ with $y \in A$ and $\lambda \in \mathbb{R}$, we have $\lambda x = (\lambda y + (1 - \lambda) z) - z \in A - z = V$. For sums: given $x^1 = y^1 - z$ and $x^2 = y^2 - z$ in $V$,
$$x^1 + x^2 = y^1 + y^2 - 2z = 2\left(\frac{y^1 + y^2}{2} - z\right) \in V$$
because $(y^1 + y^2)/2 \in A$ and $V$ is closed under scalar multiplication. We conclude that $V$ is a vector subspace. The final part of the statement is easily proved. $\blacksquare$
Corollary 791 A set $A$ in $\mathbb{R}^n$ is affine if and only if it is the solution set of a linear system, i.e., $A = \{x \in \mathbb{R}^n : Bx = b\}$ for some $m \times n$ matrix $B$ and vector $b \in \mathbb{R}^m$.

Proof The "if" is contained in Example 788. We omit the proof of the converse, which uses the last proposition.4

4 For a proof, we refer readers to Rockafellar (1970), p. 6.
Affine sets thus correspond to solution sets of linear systems. By Corollary 791, vector subspaces then have the form $\{x \in \mathbb{R}^n : Bx = 0\}$, so they correspond to solution sets of homogeneous linear systems.5

If we define the linear operator $T : \mathbb{R}^n \to \mathbb{R}^m$ by $T(x) = Bx$, we have
$$T^{-1}(b) = \{x \in \mathbb{R}^n : Bx = b\}$$
We can thus view the last result as saying that affine sets are the level sets of linear operators. This angle will be useful later in the chapter.
16.4 Affine independence
A linear combination $\sum_{i=1}^m \lambda_i x^i$ of vectors $\{x^i\}_{i=1}^m$ is affine if $\sum_{i=1}^m \lambda_i = 1$. An affine combination of two vectors $x$ and $y$ has the form $\lambda x + (1 - \lambda) y$, so a set is affine when it contains the affine combinations of any two of its elements. Next we show that affine combinations play for affine sets the role that linear and convex combinations played for vector spaces and convex sets, respectively.

Proposition 794 A set is affine if and only if it is closed with respect to all affine combinations of its own elements.
Proof "Only if". Let $A$ be an affine set. By Proposition 790, there are a vector subspace $V$ and a vector $z \in A$ such that $A = z + V$. Let $\{x^i\}_{i=1}^m \subseteq A$ and let $\{\lambda_i\}_{i=1}^m$ be scalars such that $\sum_{i=1}^m \lambda_i = 1$. Since $x^i - z \in V$ for each $i$, it follows that $\sum_{i=1}^m \lambda_i x^i = z + \sum_{i=1}^m \lambda_i (x^i - z) \in z + V$.

"If". Assume that the set $A$ is closed under affine combinations. Let $z \in A$. We must prove that $A - z$ is a vector space. Given $x^1, x^2 \in A$ and $\lambda_1, \lambda_2 \in \mathbb{R}$, we have
$$\lambda_1(x^1 - z) + \lambda_2(x^2 - z) + z = \lambda_1 x^1 + \lambda_2 x^2 + (1 - \lambda_1 - \lambda_2) z \in A$$
so that $\lambda_1(x^1 - z) + \lambda_2(x^2 - z) \in A - z$. $\blacksquare$
A set of vectors $\{x^i\}_{i=1}^m$ is affinely independent if
$$\sum_{i=1}^m \lambda_i x^i = 0 \ \text{ and } \ \sum_{i=1}^m \lambda_i = 0 \implies \lambda_1 = \cdots = \lambda_m = 0$$
Unlike linear independence, where the coefficients are unconstrained, here they are required to add up to 0. So, affine independence is a weaker notion of independence. Next we relate the two notions.
Proposition 796 A set of $m$ vectors $\{x^i\}_{i=1}^m$ is affinely independent if and only if the $m - 1$ vectors $\{x^i - x^m\}_{i=1}^{m-1}$ are linearly independent.6

5 Corollary 743 is an early version of Proposition 790 in the context of linear systems.
6 As the proof clarifies, the choice of $x^m$ is arbitrary: any other of the $m$ vectors could have been chosen.
This result implies, inter alia, that there are at most $n + 1$ affinely independent vectors in $\mathbb{R}^n$.
Proof "Only if". Let $\{x^i\}_{i=1}^m$ be affinely independent. Set $\sum_{i=1}^{m-1} \lambda_i (x^i - x^m) = 0$. Then,
$$\left(-\sum_{i=1}^{m-1} \lambda_i\right) x^m + \sum_{i=1}^{m-1} \lambda_i x^i = 0$$
Since the coefficients of this combination add up to 0, affine independence implies $\lambda_1 = \cdots = \lambda_{m-1} = 0$, so the vectors $\{x^i - x^m\}_{i=1}^{m-1}$ are linearly independent.

"If". Suppose that the vectors $\{x^i - x^m\}_{i=1}^{m-1}$ are linearly independent. Let $\sum_{i=1}^m \lambda_i x^i = 0$ with $\sum_{i=1}^m \lambda_i = 0$. Then,
$$0 = \lambda_m x^m + \sum_{i=1}^{m-1} \lambda_i x^i = -\left(\sum_{i=1}^{m-1} \lambda_i\right) x^m + \sum_{i=1}^{m-1} \lambda_i x^i = \sum_{i=1}^{m-1} \lambda_i (x^i - x^m)$$
Hence, $\lambda_1 = \lambda_2 = \cdots = \lambda_m = 0$, so the vectors $\{x^i\}_{i=1}^m$ are affinely independent. $\blacksquare$
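As a computational aside (ours), Proposition 796 yields a simple numerical test of affine independence via the rank of the matrix of differences. A minimal sketch, assuming NumPy; the helper affinely_independent is ours:

```python
import numpy as np

def affinely_independent(points):
    diffs = points[:-1] - points[-1]      # the differences x^i - x^m
    return np.linalg.matrix_rank(diffs) == len(points) - 1

triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
aligned = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])

print(affinely_independent(triangle))   # True: a genuine triangle
print(affinely_independent(aligned))    # False: collinear points
```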
In view of Lemma 789, the affine envelope of a set $B$ is easily seen to be the intersection of all affine subspaces that contain $B$. It is also readily checked to be the collection of all affine combinations of elements of $B$.

Example 798 (i) The affine envelope of the standard simplex $\Delta_1$ is the affine set $\{(\lambda, 1 - \lambda) : \lambda \in \mathbb{R}\}$. Geometrically, $\Delta_1$ is the segment that joins the versors $e^1$ and $e^2$, while $\operatorname{aff} \Delta_1$ is the line that passes through them.

(ii) The affine envelope of the standard simplex $\Delta_2$ is the affine set $\{(\lambda_1, \lambda_2, 1 - \lambda_1 - \lambda_2) : \lambda_1, \lambda_2 \in \mathbb{R}\}$. Geometrically, $\Delta_2$ is the equilateral triangle with vertices the versors $e^1$, $e^2$ and $e^3$, while $\operatorname{aff} \Delta_2$ is the plane that passes through them. N
Proof We prove the "only if" and leave the converse to the reader. Let $B = \{x^1, x^2, \ldots, x^m\}$ be a basis. Clearly, any $x \in \operatorname{aff} B$ is representable as an affine combination $\sum_{i=1}^m \lambda_i x^i$. Let us show that the coefficients $\lambda_i$ are uniquely determined. Suppose that
$$x = \sum_{i=1}^m \lambda_i x^i = \sum_{i=1}^m \lambda'_i x^i$$
with $\sum_{i=1}^m \lambda_i = 1 = \sum_{i=1}^m \lambda'_i$. This implies
$$(\lambda_1 - \lambda'_1) x^1 + (\lambda_2 - \lambda'_2) x^2 + \cdots + (\lambda_m - \lambda'_m) x^m = 0$$
and
$$(\lambda_1 - \lambda'_1) + (\lambda_2 - \lambda'_2) + \cdots + (\lambda_m - \lambda'_m) = 0$$
Since the vectors are affinely independent, we have $\lambda_i = \lambda'_i$ for $i = 1, \ldots, m$. So, the representation is unique. $\blacksquare$
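As a computational aside (ours), these unique coefficients (the barycentric coordinates met below) can be computed by solving the linear system formed by the generators together with the constraint that the weights sum to 1. A minimal sketch, assuming NumPy; the helper barycentric is ours:

```python
import numpy as np

def barycentric(points, x):
    m = len(points)
    # one row per coordinate of x, plus the row of ones (weights sum to 1)
    M = np.vstack([points.T, np.ones(m)])
    rhs = np.append(x, 1.0)
    lam, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return lam

triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(barycentric(triangle, np.array([0.25, 0.25])))   # [0.5, 0.25, 0.25]
```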
16.5 Simplices

An important class of polytopes is defined through affine independence.
Example 802 (i) Triangles are the simplices of the plane. For instance, the right triangle with vertices $0$, $e^1$ and $e^2$ is the simplex $\{x \in \mathbb{R}^2_+ : x_1 + x_2 \le 1\}$.

(ii) The simplex $P = \left\{x \in \mathbb{R}^n_+ : \sum_{i=1}^n x_i \le 1\right\}$, generated by the affinely independent vectors $\{e^1, \ldots, e^n, 0\}$, generalizes to $\mathbb{R}^n$ the previous right triangle. Even more generally, we can replace $0$ with any vector $y \in \mathbb{R}^n$ by considering the simplex $P = \operatorname{co} B$ generated by the affinely independent vectors $B = \{y + e^1, \ldots, y + e^n, y\}$ of $\mathbb{R}^n$. To check the affine independence of these vectors, let
$$\sum_{i=1}^{n+1} \lambda_i = 0 \quad \text{and} \quad \sum_{i=1}^n \lambda_i (y + e^i) + \lambda_{n+1} y = 0 \tag{16.4}$$
Minkowski's Theorem ensures that the elements of a convex and compact set can be represented as convex combinations of its extreme points but, in general, this representation is not unique: for example, the origin $0$ in the closed unit ball of $\mathbb{R}^n$ can be expressed in different ways as a convex combination of the ball's extreme points (cf. Example 784). Remarkably, simplices are an important class of convex compact sets for which the representation turns out to be unique, as the next important result shows.
Theorem 803 (Choquet-Meyer) For a compact convex set $K$ of $\mathbb{R}^n$, the following conditions are equivalent:

(i) $K$ is a simplex;

(iv) each element of $K$ has a unique representation as a convex combination of its extreme points.
A strong version of Minkowski's Theorem thus holds for simplices.7 The equivalence of (ii) and (iii) shows that when the set $\operatorname{ext} K$ is affinely independent, it is automatically maximally so.

Inspection of the proof shows that for a simplex $P = \operatorname{co}\{x^1, x^2, \ldots, x^m\}$ it holds that $\operatorname{ext} P = \{x^1, x^2, \ldots, x^m\}$. Thus, each element $x$ of a simplex $P$ can be uniquely written as a convex combination
$$x = \sum_{i=1}^m \lambda_i x^i$$

7 It is named after the two mathematicians who developed the far-reaching consequences of this result, as readers may learn in more advanced courses.
of the extreme points of $P$, i.e., of its vertices. The unique weights $\lambda_i$ are the positive barycentric coordinates of $x$ in $P$.

In sum, for a simplex the set of its extreme points forms a meaningful notion of convex basis, conceptually analogous to the notion of basis of vector subspaces and of affine sets. Since no similar analog of the notion of basis exists for general convex sets, this remarkable property is peculiar to simplices among convex sets.
16.6 Dimension

We begin with a non-trivial sharpening of the "if" part of Proposition 790, which shows that the choice of $z \in A$ is immaterial and that there is a unique vector subspace parallel to an affine set.

Proposition 805 Let $A$ be an affine set of $\mathbb{R}^n$. Then $V = A - A$ is the unique vector subspace of $\mathbb{R}^n$ such that, for every $z \in A$,
$$A = V + z$$
Proof We first prove uniqueness. Let $V_1$ and $V_2$ be two subspaces such that $V_1 + z = V_2 + z$ for some $z \in A$. Let $x^1 \in V_1$. Then, there exists $x^2 \in V_2$ such that $x^1 + z = x^2 + z$. In turn, this implies $x^1 = x^2$ and so $x^1 \in V_2$. Thus, $V_1 \subseteq V_2$. By interchanging the roles of $V_1$ and $V_2$, we also have $V_2 \subseteq V_1$. We conclude that $V_1 = V_2$.

Next we prove that $A - A$ is a vector subspace. Clearly, $0 \in A - A$. By Corollary 791, it is enough to prove that $A - A$ is an affine set. Let $x, y \in A - A$ and $\lambda \in \mathbb{R}$. There exist $x^1, x^2, y^1, y^2 \in A$ such that $x = x^1 - x^2$ and $y = y^1 - y^2$. Hence,
$$\lambda x + (1 - \lambda) y = \lambda(x^1 - x^2) + (1 - \lambda)(y^1 - y^2) = \underbrace{\lambda x^1 + (1 - \lambda) y^1}_{\in A} - \underbrace{(\lambda x^2 + (1 - \lambda) y^2)}_{\in A} \in A - A$$
Example 806 Given two distinct vectors $x$ and $y$, the line through them is an affine set with
$$A - A = \{\lambda(x - y) : \lambda \in \mathbb{R}\}$$
Similarly, given three affinely independent vectors $x$, $y$ and $z$, let
$$A = \{\lambda_1 x + \lambda_2 y + (1 - \lambda_1 - \lambda_2) z : \lambda_1, \lambda_2 \in \mathbb{R}\}$$
be the plane that passes through them. It is easy to check that the set $A$ is affine, with
$$A - A = \{\lambda_1(x - z) + \lambda_2(y - z) : \lambda_1, \lambda_2 \in \mathbb{R}\}$$
As for vector subspaces, also for affine sets the notions of dimension and independence are closely related.
Proof In view of Proposition 796, it is enough to prove that $\operatorname{aff}\{x^1, \ldots, x^m\} = A$ if and only if $\operatorname{span}\{x^i - x^m\}_{i=1}^{m-1} = A - A$. So, assume that $\operatorname{span}\{x^i - x^m\}_{i=1}^{m-1} = A - A$. Let $x \in A$. We want to show that $x \in \operatorname{aff}\{x^1, \ldots, x^m\}$. By Proposition 805, $A = (A - A) + x^m$ and so there exist scalars $\{\lambda_i\}_{i=1}^{m-1}$ such that
$$x = \sum_{i=1}^{m-1} \lambda_i (x^i - x^m) + x^m = \sum_{i=1}^{m-1} \lambda_i x^i + \left(1 - \sum_{i=1}^{m-1} \lambda_i\right) x^m$$
which is an affine combination of $x^1, \ldots, x^m$, so $x \in \operatorname{aff}\{x^1, \ldots, x^m\}$.
that $\dim A = m$. Suppose, by contradiction, that $\dim A = p > m$. Then, there exists a basis of $A - A$ with $p$ elements and so, by Lemma 808, an affine basis of $A$ with $p + 1 > m + 1$ elements, a contradiction. $\blacksquare$
Affine sets of dimension 1 are called lines and those of dimension 2 planes (cf. Example 806). Next we introduce the affine sets of dimension $n - 1$. To this end, define a hyperplane $H$ in $\mathbb{R}^n$ as the set of points $x \in \mathbb{R}^n$ that satisfy the condition $a \cdot x = b$ for some $0 \ne a \in \mathbb{R}^n$ and $b \in \mathbb{R}$. That is,
$$H = \{x \in \mathbb{R}^n : a \cdot x = b\}$$
In view of Riesz's Theorem, hyperplanes are the level curves of linear functions, that is, they have the form $f^{-1}(b)$ where $f : \mathbb{R}^n \to \mathbb{R}$ is a linear function.

Hyperplanes are easily seen to be affine sets. They are actually the affine sets of dimension $n - 1$.
Earlier in the chapter we learned that affine sets are level sets of linear operators $T : \mathbb{R}^n \to \mathbb{R}^m$ (Proposition 793). This result shows that, in particular, those of dimension $n - 1$ correspond to the level sets of linear functions $f : \mathbb{R}^n \to \mathbb{R}$.

The proof relies on the following lemma. Given a non-zero linear function $f$ and a vector $x^0$ with $f(x^0) \ne 0$, each $x \in \mathbb{R}^n$ can be decomposed as
$$x = x^1 + \frac{f(x)}{f(x^0)} x^0 \qquad \text{with} \qquad x^1 = x - \frac{f(x)}{f(x^0)} x^0 \in \ker f$$
Definition 811 The dimension of a convex set $C$ of $\mathbb{R}^n$, written $\dim C$, is the dimension of its affine envelope $\operatorname{aff} C$.

9 For a proof, we refer readers to Rockafellar (1970), p. 5.
In view of Example 806, $\dim \Delta_1 = 1$ and $\dim \Delta_2 = 2$. More generally, the dimension of a simplex $P = \operatorname{co}\{x^1, x^2, \ldots, x^m\}$ is $m - 1$ because, by the Choquet-Meyer Theorem, $\{x^1, x^2, \ldots, x^m\}$ is a maximal affinely independent set in $P$ and so, by Proposition 807, $\dim P = m - 1$.

In the important case $\dim C = n$ we say that $C$ has full dimension. For instance, the simplex $P = \{x \in \mathbb{R}^n_+ : \sum_{i=1}^n x_i \le 1\}$ has dimension $n$ and so is full-dimensional (cf. Example 802).
Earlier in the book (Section 5.3.2), we observed that the straight line
$$A = \{(x_1, x_2) \in \mathbb{R}^2 : x_1 + x_2 = 1\}$$
has an empty interior, as the following figure indicates:
[Figure: the line $x_1 + x_2 = 1$ in the plane $(x_1, x_2)$.]
The set $A$ is affine and has dimension 1, smaller than the dimension $n = 2$ of the space $\mathbb{R}^2$. This simple example suggests that non-empty convex sets with empty interior have dimension $< n$. Next we show that this is indeed the case, thus proving an important topological consequence of dimension.
Proposition 812 A convex subset in Rn has a non-empty interior if and only if it has full
dimension.
Proof "If". Let $C$ be a convex subset of $\mathbb{R}^n$. We prove the contrapositive: if $\operatorname{int} C = \emptyset$, then $\dim C < n$. To this end, it suffices to prove that there is an affine space $A \supseteq C$ with $\dim A < n$.

We first prove that there are no $n + 1$ affinely independent vectors $x^1, x^2, \ldots, x^{n+1}$ in $C$. Suppose, per contra, that there exist such vectors. Then, $\Delta = \operatorname{co}\{x^1, x^2, \ldots, x^{n+1}\} \subseteq C$. In view of Proposition 800, consider the barycentric coordinates $x \leftrightarrow (\lambda_1, \lambda_2, \ldots, \lambda_{n+1})$. To the uniform barycentric coordinates $(1/(n+1), \ldots, 1/(n+1))$ corresponds the vector
$$\hat{x} = \frac{1}{n+1}\left(x^1 + \cdots + x^{n+1}\right) \in C$$
Consequently, there is a neighborhood $U$ of $\hat{x}$ in $\mathbb{R}^n$ small enough so that all its elements have positive barycentric coordinates. Hence, $U \subseteq \Delta \subseteq C$ and so $C$ has non-empty interior, a contradiction.
So, there are at most $n$ affinely independent vectors in $C$. Let $m < n + 1$ be the maximum number of affinely independent vectors in $C$ and denote them by $x^1, x^2, \ldots, x^m$. If $x$ is any vector of $C$, the linear system
$$
\begin{cases}
\lambda_1 x^1 + \lambda_2 x^2 + \cdots + \lambda_m x^m + \lambda x = 0 \\
\lambda_1 + \lambda_2 + \cdots + \lambda_m + \lambda = 0
\end{cases}
$$
Concave functions
17.1 Generalities
A convex set can represent, for example, a collection of bundles on which a utility function is defined, or a collection of inputs on which a production function is defined. The convexity of the sets allows us to combine bundles or inputs. It then becomes important to study how the functions defined on such sets, be they utility or production functions, behave with respect to these combinations.

For this reason, concave and convex functions are extremely important in economics. We have already introduced them in Section 6.4.5 for scalar functions defined on intervals of $\mathbb{R}$. The following definition holds for any function defined on a convex set $C$ of $\mathbb{R}^n$: a function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is concave if
$$f(\lambda x + (1 - \lambda) y) \ge \lambda f(x) + (1 - \lambda) f(y) \tag{17.1}$$
for every $x, y \in C$ and every $\lambda \in [0, 1]$, and convex if
$$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y) \tag{17.2}$$
for every $x, y \in C$ and every $\lambda \in [0, 1]$.

The geometric interpretation is the same as the one seen in the scalar case: a function is concave if the chord that joins any two points $(x, f(x))$ and $(y, f(y))$ of its graph lies below the graph of the function, while it is convex if the opposite happens, that is, if this chord lies above the graph of the function.
[Figure: a concave function (left) and a convex function (right), with the chord joining two points of the graph.]

For a simple scalar example, the absolute value function is convex: by the triangle inequality,
$$|\lambda x + (1 - \lambda) y| \le \lambda|x| + (1 - \lambda)|y|$$
for every $x, y \in \mathbb{R}$ and every $\lambda \in [0, 1]$. More generally, the norm $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ is a convex function. Indeed,
$$\|\lambda x + (1 - \lambda) y\| \le \lambda\|x\| + (1 - \lambda)\|y\|$$
for every $x, y \in \mathbb{R}^n$ and every $\lambda \in [0, 1]$.
Note that a function $f$ is convex if and only if $-f$ is concave: through this simple duality, the properties of convex functions can be easily obtained from those of concave functions. Accordingly, we will consider only the properties of concave functions, leaving to the reader the simple deduction of the corresponding properties of convex functions.
N.B. The natural domains of concave (convex) functions are convex sets, so as to ensure that the images $f(\lambda x + (1 - \lambda) y)$ are well defined for all $\lambda \in [0, 1]$. For this reason, throughout the chapter we assume, often without mentioning it, that concave (and convex) functions are defined on a convex set $C$ of $\mathbb{R}^n$. O
A function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is strictly concave if
$$f(\lambda x + (1 - \lambda) y) > \lambda f(x) + (1 - \lambda) f(y)$$
for every $x, y \in C$, with $x \ne y$, and every $\lambda \in (0, 1)$. For this important class of concave functions, the inequality (17.1) is thus required to be strict. This implies that the graph of a strictly concave function has no linear parts. In a dual way, a function $f : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is strictly convex if
$$f(\lambda x + (1 - \lambda) y) < \lambda f(x) + (1 - \lambda) f(y)$$
for every $x, y \in C$, with $x \ne y$, and every $\lambda \in (0, 1)$. In particular, a function $f$ is strictly convex if and only if $-f$ is strictly concave.
We now give some examples of concave and convex functions. Verifying these properties using the definition is often not easy. For this reason we invite readers to rely on their geometric intuition for these examples and to wait for some sufficient conditions, based on differential calculus, that greatly simplify the verification (Chapter 31).
Example 815 (i) The square root function $f : [0, \infty) \to \mathbb{R}$ given by $f(x) = \sqrt{x}$ is strictly concave.

(ii) The logarithmic function $f : (0, \infty) \to \mathbb{R}$ given by $f(x) = \log x$ is strictly concave.

(iii) The quadratic function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2$ is strictly convex.

(iv) The cubic function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^3$ is neither concave nor convex. However, it is strictly concave on the interval $(-\infty, 0]$ and strictly convex on the interval $[0, \infty)$.

(v) The function $f : \mathbb{R} \to \mathbb{R}$ given by
$$
f(x) =
\begin{cases}
x & \text{if } x \le 1 \\
1 & \text{if } x > 1
\end{cases}
$$
is concave, but not strictly concave. N
Example 816 (i) The function $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x) = x_1^2 + x_2^2$ is strictly convex.

(ii) The Cobb-Douglas function (Example 187) is concave, as will be seen in Corollary 880. N
Example 817 The function $f : \mathbb{R}^n \to \mathbb{R}$ defined by
$$f(x) = \min_{i=1,\ldots,n} x_i$$
is concave. Note first that
$$\min_{i=1,\ldots,n} (x_i + y_i) \ge \min_{i=1,\ldots,n} x_i + \min_{i=1,\ldots,n} y_i$$
because in minimizing $x$ and $y$ separately we have more degrees of freedom than in minimizing them jointly, i.e., their sum. It then follows that, if $x, y \in \mathbb{R}^n$ and $\lambda \in [0, 1]$, we have
$$f(\lambda x + (1 - \lambda) y) = \min_{i=1,\ldots,n} (\lambda x_i + (1 - \lambda) y_i) \ge \lambda \min_{i=1,\ldots,n} x_i + (1 - \lambda) \min_{i=1,\ldots,n} y_i = \lambda f(x) + (1 - \lambda) f(y)$$
so $f$ is concave. It is not strictly concave: for instance,
$$f\left(\frac{1}{2} x + \frac{1}{2} 0\right) = f\left(\frac{1}{2} x\right) = \min_{i=1,\ldots,n} \frac{1}{2} x_i = \frac{1}{2} \min_{i=1,\ldots,n} x_i = \frac{1}{2} f(x) = \frac{1}{2} f(x) + \frac{1}{2} f(0)$$
In consumer theory, $u(x) = \min_{i=1,\ldots,n} x_i$ is the Leontief utility function (Example 233). N
Example 818 Given a convex function f : (0; 1) ! R, the function g : R2++ ! R de ned
by
x2
g (x1 ; x2 ) = x1 f
x1
is convex. Indeed, let x; y 2 R2++ and 2 [0; 1]. We have,
x2 + (1 ) y2
g ( x + (1 ) y) = ( x1 + (1 ) y1 ) f
x1 + (1 ) y1
!
x1 xx12 + (1 ) y1 yy21
= ( x1 + (1 ) y1 ) f
x1 + (1 ) y1
x1 x2 (1 ) y1 y2
= ( x1 + (1 ) y1 ) f +
x1 + (1 ) y1 x1 x1 + (1 ) y1 y1
x1 x2 (1 ) y1 y2
( x1 + (1 ) y1 ) f + f
x1 + (1 ) y1 x1 x1 + (1 ) y1 y1
x2 y2
= x1 f + (1 ) y1 f = g (x) + (1 ) g (y)
x1 y1
as desired. Note that the inequality step holds because f is convex and
x1 (1 ) y1
+ =1
x1 + (1 ) y1 x1 + (1 ) y1
If f is strictly convex, this inequality is actually strict and so the previous argument shows
that in this case g is strictly convex. For instance, if we consider the strictly convex function
f (x) = log x, we have
x2 x1
g (x1 ; x2 ) = x1 log = x1 log
x1 x2
So, the function g : R2++ ! R de ned by
x1
g (x1 ; x2 ) = x1 log
x2
is strictly convex, a noteworthy nding. N
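The convexity of this function can likewise be sampled numerically; a small Python sketch (ours, with arbitrary grids):

import numpy as np

# Sample the convexity inequality for g(x1, x2) = x1 * log(x1 / x2)
# on the open positive quadrant.
rng = np.random.default_rng(1)
g = lambda v: v[0] * np.log(v[0] / v[1])

for _ in range(1000):
    x, y = rng.uniform(0.1, 5, 2), rng.uniform(0.1, 5, 2)
    lam = rng.uniform()
    z = lam * x + (1 - lam) * y
    assert g(z) <= lam * g(x) + (1 - lam) * g(y) + 1e-12
print("convexity inequality held on all samples")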
Since inequalities (17.1) and (17.2) are weak, a function may be at the same time concave and convex. The next definition covers this important case: a function f : C ⊆ Rⁿ → R is affine if

f(λx + (1 − λ)y) = λf(x) + (1 − λ)f(y)

for every x, y ∈ C and every λ ∈ [0, 1]. The notion of affine function is closely related to that of linear function.

where α ∈ Rⁿ and q ∈ R. For example, when α = (3, 4) and q = 2 we have the affine function f : R² → R given by f(x) = 3x₁ + 4x₂ + 2. In the scalar case, we get

f(x) = mx + q   (17.6)

with m ∈ R.¹ Affine functions of a single variable have, therefore, a well-known form: they are the straight lines with slope m and intercept q. In particular, this confirms that the linear functions of a single variable are the straight lines passing through the origin. Indeed, for them it holds f(0) = q = 0.
Proof In view of Theorem 766, it is enough to prove the result for C = Rⁿ. "If". Let x, y ∈ Rⁿ and λ ∈ [0, 1]. We have

f(λx + (1 − λ)y) = l(λx + (1 − λ)y) + q = λl(x) + (1 − λ)l(y) + λq + (1 − λ)q
= λ(l(x) + q) + (1 − λ)(l(y) + q) = λf(x) + (1 − λ)f(y)

"Only if". Let f : Rⁿ → R be affine and set l(x) = f(x) − f(0) for every x ∈ Rⁿ. Setting q = f(0), we have to show that l is linear. We start by showing that

l(αx) = αl(x)  ∀x ∈ Rⁿ, ∀α ∈ R   (17.7)

Let now α > 1. Setting y = αx, by what has just been proved we have

l(x) = l(y/α) = (1/α) l(y)

so that l(αx) = αl(x) also for α > 1.

¹ We use in the scalar case the more common letter m in place of α.
Next, observe that

0 = l(0) = l(½x − ½x) = f(½x − ½x) − f(0) = ½f(x) + ½f(−x) − ½f(0) − ½f(0) = ½l(x) + ½l(−x)

so that l(−x) = −l(x). Hence, if α < 0 then

l(αx) = l((−α)(−x)) = (−α)l(−x) = αl(x)

All this proves that (17.7) holds. In view of Proposition 644, to complete the proof of the linearity of l we have to show that

l(x + y) = l(x) + l(y)  ∀x, y ∈ Rⁿ

We have

l(x + y) = 2l((x + y)/2) = 2l(½x + ½y) = 2(f(½x + ½y) − f(0)) = 2(½f(x) + ½f(y) − ½f(0) − ½f(0)) = l(x) + l(y)

as desired.
The definition of concavity requires the entire chord joining any two points of the graph to lie below the graph of the function. Remarkably, for continuous functions it is enough that a single point of this chord lie below the graph of the function. Under continuity, concavity thus takes a much less demanding appearance. This is proved in the next theorem (cf. Hardy et al., 1934, p. 73).

In other words, it is enough that some point of the chord joining the points (x, f(x)) and (y, f(y)) of the graph of f lie below the graph of f. The weight λ_{x,y} is allowed to depend on x and y, so it can vary as we consider different pairs of points x and y.²
Proof We prove the "if", the non-trivial direction. Suppose, by contradiction, that f is not concave. So, there exist x, y ∈ C and λ̄ ∈ (0, 1) such that

f(λ̄x + (1 − λ̄)y) < λ̄f(x) + (1 − λ̄)f(y)

Define φ : [0, 1] → R by φ(α) = f(αx + (1 − α)y). Given α₁, α₂ ∈ [0, 1], set z₁ = α₁x + (1 − α₁)y and z₂ = α₂x + (1 − α₂)y. By hypothesis there is λ = λ_{z₁,z₂} ∈ (0, 1) such that

φ(λα₁ + (1 − λ)α₂) = f((λα₁ + (1 − λ)α₂)x + (1 − (λα₁ + (1 − λ)α₂))y)
= f((λα₁ + (1 − λ)α₂)x + (λ + (1 − λ) − (λα₁ + (1 − λ)α₂))y)
= f((λα₁ + (1 − λ)α₂)x + (λ(1 − α₁) + (1 − λ)(1 − α₂))y)
= f(λz₁ + (1 − λ)z₂) ≥ λf(z₁) + (1 − λ)f(z₂) = λφ(α₁) + (1 − λ)φ(α₂)

Since α₁ and α₂ were arbitrarily chosen, we conclude that for each α₁, α₂ ∈ [0, 1] there is λ_{α₁,α₂} ∈ (0, 1) such that

φ(λ_{α₁,α₂}α₁ + (1 − λ_{α₁,α₂})α₂) ≥ λ_{α₁,α₂}φ(α₁) + (1 − λ_{α₁,α₂})φ(α₂)   (17.12)

Define ψ : [0, 1] → R by ψ(α) = φ(α) − α(φ(1) − φ(0)) − φ(0). Then ψ(0) = ψ(1) = 0 and

ψ(λ̄) = φ(λ̄) − λ̄(φ(1) − φ(0)) − φ(0) < λ̄φ(1) + (1 − λ̄)φ(0) − λ̄(φ(1) − φ(0)) − φ(0) = 0

Moreover, it is easy to see that (17.12) continues to hold, i.e., for each α₁, α₂ ∈ [0, 1] there exists λ_{α₁,α₂} ∈ (0, 1) such that

ψ(λ_{α₁,α₂}α₁ + (1 − λ_{α₁,α₂})α₂) ≥ λ_{α₁,α₂}ψ(α₁) + (1 − λ_{α₁,α₂})ψ(α₂)

Consider the sets A = {α ∈ [0, λ̄] : ψ(α) = 0} and B = {α ∈ [λ̄, 1] : ψ(α) = 0}. These sets are compact because ψ is continuous and are non-empty because 0 ∈ A and 1 ∈ B. Thus, there exist

a = max A and b = min B

with 0 ≤ a < λ̄ < b ≤ 1. Since ψ(a) = ψ(b) = 0, we have ψ(λ_{a,b}a + (1 − λ_{a,b})b) ≥ λ_{a,b}ψ(a) + (1 − λ_{a,b})ψ(b) = 0. But λ_{a,b}a + (1 − λ_{a,b})b ∈ (a, b), and by the definitions of a and b, together with the continuity of ψ and ψ(λ̄) < 0, we have ψ < 0 on (a, b): a contradiction.
A special case of this theorem, involving the chord midpoint ((x + y)/2, (f(x) + f(y))/2), is noteworthy. Here the weight λ_{x,y} is kept fixed and equal to 1/2.

f(½x + ½y) ≥ ½f(x) + ½f(y)

for all x, y ∈ C.

In words, a continuous function is affine if and only if each chord joining two points of its graph contains at least one point that lies exactly on the graph of the function, i.e., that touches it.
17.2 Properties
17.2.1 Concave functions and convex sets
There exists a simple characterization of concave functions f : C ⊆ Rⁿ → R that uses convex sets. Namely, consider the set

hypo f = {(x, y) ∈ C × R : y ≤ f(x)}

called the hypograph of f, formed by the points (x, y) ∈ Rⁿ⁺¹ that lie below the graph of the function.³ Graphically, the hypograph of a function is:
[Figure: the hypograph of a function, i.e., the region of the plane lying below its graph.]

³ Recall that the graph is given by Gr f = {(x, y) ∈ C × R : f(x) = y} ⊆ Rⁿ⁺¹.
The next result shows that the concavity of f is equivalent to the convexity of its hypograph.

Proof Let f be concave, and let (x, t), (y, z) ∈ hypo f. By definition, t ≤ f(x) and z ≤ f(y). It follows that, for every λ ∈ [0, 1],

λt + (1 − λ)z ≤ λf(x) + (1 − λ)f(y) ≤ f(λx + (1 − λ)y)

so that λ(x, t) + (1 − λ)(y, z) ∈ hypo f: the hypograph is convex. Conversely, let hypo f be convex and let x, y ∈ C and λ ∈ [0, 1]. Since (x, f(x)), (y, f(y)) ∈ hypo f, convexity yields (λx + (1 − λ)y, λf(x) + (1 − λ)f(y)) ∈ hypo f, that is,

λf(x) + (1 − λ)f(y) ≤ f(λx + (1 − λ)y)

as desired.

It is easy to check that the dual result for a convex function f features the convexity of its epigraph epi f = {(x, y) ∈ C × R : f(x) ≤ y}, i.e., the collection of points (x, y) ∈ Rⁿ⁺¹ that lie above its graph.
Earlier in the book (Section 6.3.1) we defined the level curves of a function f : C ⊆ Rⁿ → R as the preimages

f⁻¹(k) = {x ∈ C : f(x) = k}

The sets

{x ∈ C : f(x) ≥ k}

are called upper contour (or superlevel) sets, denoted by (f ≥ k), while the sets

{x ∈ C : f(x) ≤ k}

are called lower contour (or sublevel) sets, denoted by (f ≤ k). Clearly,

f⁻¹(k) = (f ≥ k) ∩ (f ≤ k)   (17.14)

[Figure: the graph of a scalar function together with the horizontal line y = k.]

For instance, for the function in the figure,

(f ≥ 2) = (−∞, −2] ∪ [2, ∞) , (f ≤ 1) = [−1, 1] , (f ≥ −1) = R , (f ≤ −1) = ∅
In economics upper contour sets appear already in the first lectures of a principles of microeconomics course. Indeed, for a utility function u : Rⁿ₊ → R, the upper contour set

(u ≥ k) = {x ∈ Rⁿ₊ : u(x) ≥ k}

collects the bundles that the consumer ranks at least as highly as the utility level k.⁴

Proposition 826 If f : C ⊆ Rⁿ → R is concave, then all its upper contour sets (f ≥ k) are convex.

⁴ This notion will be made rigorous later in the book (cf. Section 34.3).

Proof Let x, y ∈ (f ≥ k) and λ ∈ [0, 1]. By concavity,

f(λx + (1 − λ)y) ≥ λf(x) + (1 − λ)f(y) ≥ λk + (1 − λ)k = k

so λx + (1 − λ)y ∈ (f ≥ k), as desired.
We have thus shown that the usual form of the indifference curves is implied by the concavity of the utility functions. That is, more rigorously, we have shown that concave functions have convex upper contour sets. The converse is not true! Take for example any strictly increasing function f : R → R: we have

(f ≥ k) = [f⁻¹(k), +∞)

for all k ∈ R. All the upper contour sets are therefore intervals, so convex sets, although in general strictly increasing functions might well fail to be concave.⁵

Example 827 The cubic function f : R → R given by f(x) = x³ is strictly increasing but not concave. For each k ∈ R, we have (f ≥ k) = [k^{1/3}, +∞) and so all the upper contour sets of the cubic function are intervals. N
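The cubic example can be checked directly; in the following Python sketch (ours), a chord inequality fails while an upper contour set is indeed an interval:

import numpy as np

# The cubic is not concave (a chord inequality fails), yet every upper
# contour set (f >= k) = [k**(1/3), +inf) is an interval.
f = lambda t: t**3
x, y, lam = 0.0, 2.0, 0.5
print(f(lam * x + (1 - lam) * y) >= lam * f(x) + (1 - lam) * f(y))  # False: 1 < 4

k = 5.0
grid = np.linspace(-3, 3, 10001)
contour = grid[f(grid) >= k]
print(np.isclose(contour.min(), np.cbrt(k), atol=1e-3))  # lower endpoint is k^(1/3)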
In sum, concavity is a sufficient, but not necessary, condition for the convexity of the upper contour sets: there exist non-concave functions, like the cubic function, with convex upper contour sets. In particular, this means that the concavity of utility functions is a sufficient, but not necessary, condition for the "convexity" of the indifference curves. At this point it is natural to ask what is the class of utility functions, larger than that of the concave ones, characterized by having "convex" indifference curves. More formally, which class of functions is characterized by having convex upper contour sets? Section 17.3 will answer this question by introducing quasi-concavity.

The dual version of the last result holds for convex functions, in which the lower contour sets (f ≤ k) are convex. If f is affine, it then follows by (17.14) that the level sets (f = k) are convex, being the intersection of convex sets. But much more can be said for the level sets of affine functions besides their convexity.
f(αx + (1 − α)y) = αf(x) + (1 − α)f(y)

for every x, y ∈ Rⁿ and every α ∈ R.

Proof Consider the "only if", the converse being trivial. If f is affine, it can be written as f(x) = l(x) + q for every x ∈ Rⁿ (Proposition 820). This implies that, for all α ∈ R and all x, y ∈ Rⁿ,

f(αx + (1 − α)y) = l(αx + (1 − α)y) + q = αl(x) + (1 − α)l(y) + q
= αl(x) + (1 − α)l(y) + αq + (1 − α)q = αf(x) + (1 − α)f(y)

as desired.
Although concavity is defined via convex combinations involving only two elements, next we show that it actually holds for all convex combinations: a function f : C ⊆ Rⁿ → R is concave if and only if

f(Σ_{i=1}^n λᵢxᵢ) ≥ Σ_{i=1}^n λᵢ f(xᵢ)   (17.15)

for every finite collection {xᵢ}_{i=1}^n of vectors of C and every collection {λᵢ}_{i=1}^n of weights.

The inequality (17.15) is known as Jensen's inequality and is very important in applications.⁶ A dual version, with ≤, holds for convex functions, while for affine functions we have a Jensen equality

f(Σ_{i=1}^n λᵢxᵢ) = Σ_{i=1}^n λᵢ f(xᵢ)

So, affine functions preserve all affine combinations, be they with two or more elements.
Proof "If". It is obvious because for n = 2 the Jensen inequality reduces to the definition of concavity.

"Only if". Let f be concave. We want to show that the Jensen inequality holds. We proceed by induction on n. Initial step: the inequality (17.15) trivially holds for n = 2 because f is concave. Induction step: suppose that the Jensen inequality holds for n − 1 (induction hypothesis), i.e., f(Σ_{i=1}^{n−1} λᵢxᵢ) ≥ Σ_{i=1}^{n−1} λᵢf(xᵢ) for every convex combination of n − 1 elements of C. We want to show that it holds for convex combinations of n elements.

⁶ The inequality is named after Johan Jensen, who introduced concave functions in 1906.
as desired. Here the first inequality follows from the concavity of f and the second one from the induction hypothesis.
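Jensen's inequality is easy to probe numerically; a minimal Python sketch (ours, with random convex combinations) for the concave function log:

import numpy as np

# Jensen's inequality for the concave log: f(sum_i l_i x_i) >= sum_i l_i f(x_i).
rng = np.random.default_rng(2)
for _ in range(1000):
    x = rng.uniform(0.1, 10, 5)
    lam = rng.dirichlet(np.ones(5))      # random weights summing to 1
    assert np.log(lam @ x) >= lam @ np.log(x) - 1e-12
print("Jensen's inequality held on all samples")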
It is easy to see that for strictly concave and strictly convex functions we have strict versions of Jensen's inequality. This and much more is illustrated in the next example.
that is,

Σ_{i=1}^n xᵢ₁ log(xᵢ₁/xᵢ₂) > (Σ_{i=1}^n xᵢ₁) log(Σ_{i=1}^n xᵢ₁ / Σ_{i=1}^n xᵢ₂)

This remarkable inequality is called the log-sum inequality and plays a key role in some important applications.

(ii) Given a convex function f : (0, ∞) → R, the log-sum inequality takes the general form

Σ_{i=1}^n xᵢ₁ f(xᵢ₂/xᵢ₁) ≥ (Σ_{i=1}^n xᵢ₁) f(Σ_{i=1}^n xᵢ₂ / Σ_{i=1}^n xᵢ₁)
This general inequality follows from the convexity of the function

g(x₁, x₂) = x₁ f(x₂/x₁)   (17.17)

as the reader can check (recall Example 818). The inequality becomes strict when f is strictly convex. The function g is called the perspective function. Note that here we have the ratios xᵢ₂/xᵢ₁, while in the log case we have their reciprocals; indeed, (17.16) is the special case of (17.17) that corresponds to f(t) = −log t. N
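A numerical spot-check of the log-sum inequality (a Python sketch of ours, on random strictly positive vectors):

import numpy as np

# The log-sum inequality:
# sum_i a_i log(a_i / b_i) >= (sum_i a_i) log(sum_i a_i / sum_i b_i).
rng = np.random.default_rng(3)
for _ in range(1000):
    a, b = rng.uniform(0.1, 5, 6), rng.uniform(0.1, 5, 6)
    lhs = np.sum(a * np.log(a / b))
    rhs = a.sum() * np.log(a.sum() / b.sum())
    assert lhs >= rhs - 1e-12
print("log-sum inequality held on all samples")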
Concave functions are very well behaved; in particular, they have remarkable continuity properties.

Theorem 833 A concave function is continuous at every interior point of its domain.

Then f is concave on the entire domain [0, 1] and is discontinuous at 0 and 1, i.e., at the boundary points of the domain. In accordance with the last theorem, f is continuous on (0, 1), the interior of its domain [0, 1]. N

Proof of Theorem 833 We prove the result for scalar functions. Let f be a concave function defined on an interval C of the real line. We will show that f is continuous on every closed interval [a, b] included in the interior of C: this will imply the continuity of f on the interior of C.

So, let [a, b] ⊆ int C. Let m be the smaller of the two values f(a) and f(b); for every x = λa + (1 − λ)b, with 0 ≤ λ ≤ 1, that is, for every x ∈ [a, b], one has

f(x) = f(λa + (1 − λ)b) ≥ λf(a) + (1 − λ)f(b) ≥ m
For instance, concave functions f : Rⁿ → R defined on the entire space Rⁿ are continuous. This is the case, for instance, for the norm ‖·‖ : Rⁿ → R, whose continuity, proved in Proposition 560, now also follows from its convexity (Example 814).

If we strengthen the hypothesis on f we can weaken that on its domain, as the next interesting result shows.
When the inequality in (17.18) is strict, with λ ∈ (0, 1) and x ≠ y, the function f is said to be strictly quasi-concave. Similarly, when the inequality in (17.19) is strict, with λ ∈ (0, 1) and x ≠ y, the function f is said to be strictly quasi-convex.

Finally, a function f is said to be quasi-affine if it is both quasi-concave and quasi-convex, that is,

min{f(x), f(y)} ≤ f(λx + (1 − λ)y) ≤ max{f(x), f(y)}   (17.20)

for every x, y ∈ C and every λ ∈ [0, 1].

Concave functions are quasi-concave, while convex functions are quasi-convex. In particular, affine functions are quasi-affine. The converses of these implications are false, as the following example shows.
Example 838 Monotone scalar functions (e.g., the cubic) are quasi-affine. Indeed, let f : C ⊆ R → R be increasing on the interval C and let x, y ∈ C and λ ∈ [0, 1], with x ≤ y. Then, x ≤ λx + (1 − λ)y ≤ y, and increasing monotonicity implies f(x) ≤ f(λx + (1 − λ)y) ≤ f(y), that is, (17.20) holds. A similar argument applies when f is decreasing. The following figure illustrates:
[Figure: an increasing scalar function together with the horizontal line y = k.]
This example shows that, unlike concave functions, quasi-concave functions may be quite irregular. For instance, they might well be discontinuous at interior points of their domain (just take any discontinuous monotone scalar function). N

Strictly concave functions are strictly quasi-concave, while strictly convex functions are strictly quasi-convex. The converses of these implications are false. In particular, note that a quasi-concave function can be strictly convex, for example the exponential f(x) = eˣ. The terminology must, therefore, be taken cum grano salis.
A dual version of this result holds for quasi-convex functions with lower contour sets in
place of the upper contour ones.
Proof Let f be quasi-concave. Given a non-empty (otherwise the result is trivial) upper contour set (f ≥ k), let x, y ∈ (f ≥ k) and λ ∈ [0, 1]. We have f(λx + (1 − λ)y) ≥ min{f(x), f(y)} ≥ k, so λx + (1 − λ)y ∈ (f ≥ k), which is thus convex.
Quasi-concave functions are thus characterized by the convexity of their upper contour sets. So, quasi-concavity is the weakening of the notion of concavity that answers the opening question.

It is a weakening that, however, also brings some bad news. We already remarked in the last example that quasi-concave functions are, in general, much less regular than concave functions. In a similar vein, additivity preserves concavity (Proposition 832) but not quasi-concavity, as we show next.
h(½x + ½y) = h(½) = −1/8 < 0 = h(x) = h(y)

so h is not quasi-concave.
Interestingly, in the scalar continuous case we can fully characterize quasi-concave func-
tions.
Proof Let x̄ = min(arg max_{[a,b]} f), i.e., x̄ is the smallest maximizer of f. By the Weierstrass Theorem, arg max_{[a,b]} f is compact and non-empty, so x̄ is well defined. We divide the proof into three steps.

Step 1: x̄ = a. Let x, y ∈ [a, b] be such that x ≥ y. Since f is quasi-concave and f(x̄) ≥ f(x), it follows that (f ≥ f(x)) is a convex set and x̄ belongs to it. Since x̄ ≤ y ≤ x, this implies that y ∈ [a, x] ⊆ (f ≥ f(x)). Thus, f(y) ≥ f(x). Since x and y were arbitrarily chosen, it follows that f is decreasing.

Step 2: x̄ = b. Let x, y ∈ [a, b] be such that x ≥ y. Since f is quasi-concave and f(x̄) ≥ f(y), it follows that (f ≥ f(y)) is a convex set and x̄ belongs to it. Since y ≤ x ≤ x̄, it follows that x ∈ [y, b] ⊆ (f ≥ f(y)). Thus, f(x) ≥ f(y). Since x and y were arbitrarily chosen, it follows that f is increasing.

Step 3: x̄ ∈ (a, b). Define Î = [a, x̄] and Ĩ = [x̄, b]. Denote f restricted to Î by f̂ and f restricted to Ĩ by f̃. Both restrictions are continuous and quasi-concave. In particular, x̄ continues to be the smallest maximizer for both of them, i.e., min(arg max_Î f̂) = x̄ = min(arg max_Ĩ f̃). In view of Steps 1 and 2, we conclude that f̂ is increasing and f̃ is decreasing. This proves the statement.
A quasi-concave function f : C ⊆ Rⁿ → R satisfies

f(Σ_{i=1}^n λᵢxᵢ) ≥ min_{i=1,...,n} f(xᵢ)

for every finite collection {xᵢ}_{i=1}^n of vectors of C and every collection {λᵢ}_{i=1}^n of weights. The simple induction proof and the dual version for quasi-convex functions are left to the reader.
So, the level curves are, trivially, convex. Yet, f is neither quasi-concave nor quasi-convex, and a fortiori not quasi-affine.

In utility theory this observation shows that a sufficient, but not necessary, condition for a utility function u to have convex (in a proper sense!) indifference curves is to be quasi-affine. Recall that previously we talked about convexity of the indifference curves in an improper sense (hence within quotes), meaning by this the convexity of the upper contour sets (u ≥ k). Although improper, this is a common terminology. In a proper sense, the convexity of the indifference curves is the convexity of the level curves (u = k). Thanks to Proposition 839, the improper convexity of the indifference curves characterizes quasi-concave utility functions, while their proper convexity is satisfied by quasi-affine utility functions (without being, however, a characterizing property of quasi-affinity).
Proof (i) Let x, y ∈ C and λ ∈ [0, 1]. Thanks to the properties of the functions f and g, we have

as desired.

(ii) Again, let x, y ∈ C and λ ∈ [0, 1]. Now we have

as desired.
Example 845 (i) Given any strictly positive concave function g : C ⊆ Rⁿ → (0, ∞), by Proposition 844-(i) its logarithmic transformation log g is concave. (ii) Consider a version of the Cobb-Douglas function h : Rⁿ₊ → R given by

h(x) = Π_{i=1}^n xᵢ^{αᵢ}

with the exponents αᵢ > 0 (we do not require Σ_{i=1}^n αᵢ = 1). We have

h(x) = e^{Σ_{i=1}^n αᵢ log xᵢ}   ∀x ∈ Rⁿ₊₊

Since the function g(x) = Σ_{i=1}^n αᵢ log xᵢ is easily seen to be concave on Rⁿ₊₊ and f(x) = eˣ is increasing, by Proposition 844-(ii) we conclude that h = f ∘ g is quasi-concave on Rⁿ₊₊. In turn, this easily implies that h is quasi-concave on the entire orthant Rⁿ₊ (why?). N
Between (i) and (ii) there is an important difference: concavity is preserved by the monotone transformation f ∘ g if f is both increasing and concave, while increasing monotonicity alone is sufficient to preserve quasi-concavity. In other terms, quasi-concavity is preserved by monotone (increasing) transformations, while this is not true for concavity. For example, if f, g : [0, ∞) → R are g(x) = √x and f(x) = x⁴, the composite function f ∘ g : [0, ∞) → R is the quasi-concave and strictly convex function x².⁷ So, with f increasing but not concave, the concavity of g only implies the quasi-concavity of f ∘ g, nothing more.
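The example lends itself to a direct computation; a small Python sketch (ours):

import numpy as np

# g(x) = sqrt(x) is concave, f(x) = x**4 is increasing but not concave on
# [0, inf); the composite (f o g)(x) = x**2 loses concavity but keeps
# quasi-concavity (it is increasing on [0, inf), so upper contour sets
# are intervals).
h = lambda t: np.sqrt(t) ** 4            # equals t**2 on [0, inf)
x, y, lam = 0.0, 2.0, 0.5
print(h(lam * x + (1 - lam) * y) >= lam * h(x) + (1 - lam) * h(y))  # False
grid = np.linspace(0, 3, 301)
print(np.all(np.diff(h(grid)) >= 0))     # True: h is increasing on the grid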
This difference between (i) and (ii) is important in utility theory. A property of utility functions that is preserved under strictly increasing transformations is called ordinal, while a property that is preserved only under strictly increasing affine transformations, that is, for f(x) = αx + β with α > 0 and β ∈ R, is called cardinal. Naturally, an ordinal property is also cardinal, while the converse is false. Thanks to Proposition 844, we can thus say that quasi-concavity is an ordinal property, while concavity is only cardinal.

The distinction between cardinal and ordinal properties is, conceptually, very important. Indeed, given a utility function u : C ⊆ Rⁿ → R and a strictly increasing function f : Im u → R, we saw in Section 6.4.4 that the transformation f ∘ u : C ⊆ Rⁿ → R of the utility function u is itself a utility function equivalent to u. In other words, f ∘ u represents the same preference relation ≿, which is the fundamental economic notion (Section 6.8). Indeed,

x ≿ y ⟺ u(x) ≥ u(y) ⟺ (f ∘ u)(x) ≥ (f ∘ u)(y)

For this reason, ordinal properties, which are satisfied by u and all its equivalent transformations f ∘ u, are characteristic of utility functions as numerical representations of an underlying preference ≿. In contrast, this is not true for cardinal properties, which might well get lost through strictly increasing transformations that are not affine.
In light of this, ordinal quasi-concavity, rather than cardinal concavity, seems to be the relevant property for utility functions u : C ⊆ Rⁿ → R. Nevertheless, before we declare quasi-concavity to be the relevant property, in place of concavity, we have to make a last subtle observation. The monotone transformation f ∘ u is quasi-concave if u is concave; does the opposite also hold? That is, is every quasi-concave function a monotone transformation of a concave function?

If this were the case, concavity would be back in business also in an ordinalist approach:⁸ given a quasi-concave function, it would then be sufficient to consider its equivalent concave version, obtained through a suitable strictly increasing transformation.

The answer to the question is negative: there exist quasi-concave functions that are not monotone transformations of concave functions.

⁷ Note that x⁴ is here strictly increasing because we are considering its restriction to [0, ∞). For the same reason, x² is quasi-concave.

⁸ Recall the discussion of Section 6.2.1.
g = f ∘ h   (17.22)

with f strictly increasing and h concave. Applying f⁻¹ and using the concavity of h = f⁻¹ ∘ g, we would get

f⁻¹(g(½x + ½y)) ≥ ½f⁻¹(g(x)) + ½f⁻¹(g(y))

which, for a suitable choice of the points x and y, yields

f⁻¹(0) ≥ f⁻¹(½)

which contradicts the fact that f⁻¹ is strictly increasing. This proves the claim. N
This example shows that there exist genuinely quasi-concave functions that cannot be represented as monotone transformations of concave functions. It is the definitive proof that quasi-concavity, and not concavity, is the relevant property in an ordinalist approach. This important conclusion was reached in 1949 by Bruno de Finetti in the article in which he introduced quasi-concave functions, whose theory was then extended in 1954 by Werner Fenchel.

Proof We prove only (ii), as (i) can be similarly proved. Let x, y ∈ C and λ ∈ [0, 1]. We have

(f ∘ g)(λx + (1 − λ)y) = f(g(λx + (1 − λ)y))
= f(g₁(λx + (1 − λ)y), ..., g_m(λx + (1 − λ)y))
≥ f(λg₁(x) + (1 − λ)g₁(y), ..., λg_m(x) + (1 − λ)g_m(y))
= f(λ(g₁(x), ..., g_m(x)) + (1 − λ)(g₁(y), ..., g_m(y)))
= f(λg(x) + (1 − λ)g(y)) ≥ min{f(g(x)), f(g(y))}
= min{(f ∘ g)(x), (f ∘ g)(y)}
f(x) = Π_{i=1}^n xᵢ^{αᵢ}

with the exponents αᵢ > 0 is quasi-concave and increasing (Example 845). By Proposition 847-(ii), the function h = f ∘ g : C ⊆ Rⁿ → R given by

h(x) = Π_{i=1}^n gᵢ^{αᵢ}(x)

Corollary 849 The product of a concave function and of a positive function is a quasi-concave function.
than the original one, i.e., k. If x = (0, 1) is the bundle composed of 0 units of water and 1 of bread, while y = (1, 0) is composed of 1 unit of water and 0 of bread, their mixture

½(0, 1) + ½(1, 0) = (½, ½)

is a diversified bundle, with positive quantities of both water and bread. It is natural to think that this mixture gives a utility which is not smaller than the utility of the two original bundles.

for every λ ∈ [0, 1]. This is, precisely, the classic property of "convexity" of indifference curves. Mathematically, it is the convexity of the upper contour set (u ≥ k), which corresponds to the quasi-concavity of utility functions.

Everything fine? Almost: we can actually sharpen what was just said. Observe that the diversification principle implies that, for every x, y ∈ C,

u(x) = u(y) ⟹ u(λx + (1 − λ)y) ≥ u(x)  ∀λ ∈ [0, 1]   (PDP)

Indeed, by setting k = u(x) = u(y), we obviously have u(x) ≥ k and u(y) ≥ k, which implies u(λx + (1 − λ)y) ≥ k by the diversification principle. We call condition PDP the pure diversification principle. In preferential terms, the PDP takes the nice form

x ∼ y ⟹ λx + (1 − λ)y ≿ x  ∀λ ∈ [0, 1]
Definition 850 A set C in Rⁿ is said to be directed if, for every x, y ∈ C, there exists z ∈ C such that z ≤ x and z ≤ y.

In words, a set is directed when any pair of its elements has a common lower bound that belongs to the set. In consumer theory many sets of interest are directed. For example, all sets C ⊆ Rⁿ₊ that contain the origin are directed. Indeed, 0 ≤ x for every x ∈ Rⁿ₊ and, therefore, the origin itself is the lower bound common to all the pairs of elements of C.
Proof Since the "only if" part is obvious, we prove the "if" part: the PDP implies the quasi-concavity of u. Let x, y ∈ C and λ ∈ [0, 1], with u(x) ≥ u(y). Since C is directed, there exists z ∈ C such that z ≤ x and z ≤ y. By the increasing monotonicity of u, we have u(z) ≤ u(x) and u(z) ≤ u(y). Define the auxiliary function φ : [0, 1] → R by φ(t) = u(tx + (1 − t)z) for t ∈ [0, 1]. Since C is convex, the function φ is well defined. The continuity of u implies that of φ. Indeed:

tₙ → t ⟹ tₙx + (1 − tₙ)z → tx + (1 − t)z ⟹ u(tₙx + (1 − tₙ)z) → u(tx + (1 − t)z) ⟹ φ(tₙ) → φ(t)

Since φ(0) = u(z) ≤ u(y) ≤ u(x) = φ(1), by the Intermediate Value Theorem the continuity of φ implies the existence of t̄ ∈ [0, 1] such that φ(t̄) = u(y). By setting w = t̄x + (1 − t̄)z, we have therefore u(w) = u(y). Moreover, z ≤ x implies w ≤ x.

By the PDP condition, it follows that u(λw + (1 − λ)y) ≥ u(w) = u(y), while w ≤ x implies that λw + (1 − λ)y ≤ λx + (1 − λ)y. Since u is increasing, we conclude that

u(λx + (1 − λ)y) ≥ u(λw + (1 − λ)y) ≥ u(y) = min{u(x), u(y)}

so u is quasi-concave.
The result just proved guarantees that, under assumptions typically satisfied in consumer theory, the two possible interpretations, proper and improper, of the convexity of the indifference curves are equivalent. We can therefore consider the pure diversification principle, which is the clearest form of the diversification principle, as the motivation for the use of quasi-concave utility functions.

What about concave functions? They satisfy the diversification principle and therefore their use does not violate the principle. Example 846 has shown, however, that there exist quasi-concave functions that are not monotone transformations of concave functions, i.e., that do not have the form f ∘ g with f strictly increasing and g concave. In other words, quasi-concavity (so, the diversification principle) is a weaker property than concavity in ordinal utility theory.

In conclusion, the use of concave functions is consistent with the diversification principle, but it is not justified by it. Only quasi-concavity is justified by this principle, being its mathematical counterpart.⁹
We make a last observation on the pure diversification principle that does not add much conceptually, but is useful in applications. Consider a version of condition PDP with strict inequality: for every x ≠ y,

x ∼ y ⟹ λx + (1 − λ)y ≻ x  ∀λ ∈ (0, 1)   (SDP)

We thus obtain a strong form of the principle in which diversification is always strictly preferred by the consumer. Condition SDP is implied by the strict quasi-concavity of u, since in this case u(λx + (1 − λ)y) > min{u(x), u(y)} = u(x) whenever u(x) = u(y) and x ≠ y. Under the hypotheses of Proposition 851, strict quasi-concavity is indeed equivalent to SDP. We thus have the following version of that proposition (the proof is left to the reader).
SDP is thus the version of the diversification principle that characterizes strict quasi-concavity, a property often used in applications because it ensures the uniqueness of the solutions of optimization problems, as will be discussed in Section 22.6.
We close by observing that, although important, the diversification principle does not have universal validity: there are cases in which it makes little sense. For example, if the bundle (1, 0) consists of 1 unit of brewer's yeast and 0 of cake yeast, while the bundle (0, 1) consists of 1 unit of cake yeast and 0 of brewer's yeast, and we judge them indifferent, their combination (1/2, 1/2) might be useless for making either a pizza or a cake. In this case, the combination turns out to be rather bad.
Naturally, a function satisfies Cauchy's equation

f(x + y) = f(x) + f(y)  ∀x, y ∈ R

if and only if it is additive (cf. Definition 856). Much more is true:

N.B. The conclusion of Theorem 853 holds also when f is defined only on R₊: the proof is the same. O
Proof The "if" part is trivial; let us show the "only if" part in three steps.

(i) Taking x = y = 0, the equation gives f(0) = f(0) + f(0) = 2f(0), that is, f(0) = 0: the graphs of all functions that satisfy the equation pass through the origin.

(ii) We claim that f is continuous at every point. Let x₀ be the point at which, by hypothesis, f is continuous, so that f(x) → f(x₀) as x → x₀. Take another (generic) point z₀. By the Cauchy equation, f(x) = f((x − z₀) + z₀) = f(x − z₀) + f(z₀). Therefore, by the continuity of f at x₀,

f(x − z₀) = f(x) − f(z₀) → f(x₀) − f(z₀) = f(x₀ − z₀)  as x → x₀

which proves the continuity of f at x₀ − z₀ and, by the arbitrariness of z₀, f is everywhere continuous.

(iii) Using Cauchy's equation n times, we can write that, for every x ∈ R and for every n ∈ N,

f(nx) = f(x + x + ··· + x) = f(x) + f(x) + ··· + f(x) = nf(x)

with n summands. Similarly,

f(y/k) = (1/k) f(y)  ∀y ∈ R, ∀k ∈ Z with k ≠ 0   (17.24)

It follows that f(r) = rf(1) = ar, with a = f(1), for every rational r. Given any x ∈ R, take a sequence {rₖ} of rationals that tends to x. We know that f(rₖ) = arₖ for every k ≥ 1. Since arₖ → ax as k → ∞, the continuity of f at each x ∈ R then yields

f(x) = lim_k f(rₖ) = lim_k arₖ = ax

as desired.
(i) "+−×": consider the functional equation

f(x + y) = f(x) f(y)  ∀x, y ∈ R   (17.25)

It admits the trivial solution f(x) = 0 for every x ∈ R. Every other solution is strictly positive. Indeed, if f is such a solution, for every x ∈ R we have:

f(x) = f(x/2 + x/2) = f(x/2) f(x/2) = [f(x/2)]² ≥ 0

Moreover, if there exists y ≠ 0 with f(y) = 0, then f(x) = f((x − y) + y) = f(x − y) f(y) = 0 for every x ∈ R, which contradicts the non-triviality of f. Every non-trivial solution of (17.25) is therefore strictly positive. This allows us to take the logarithm of both sides of (17.25), so that

log f(x + y) = log f(x) + log f(y)  ∀x, y ∈ R

which is the Cauchy equation in the unknown function log f. The solution is therefore log f(x) = mx with m ∈ R, so the exponential function

f(x) = e^{mx}

is the non-trivial solution of the functional equation (17.25).

(ii) "×−+": consider the functional equation

f(xy) = f(x) + f(y)  ∀x, y > 0   (17.26)

It also admits the trivial solution f(x) = 0 for every x > 0. By using the identity xy = e^{log x + log y}, (17.26) becomes the Cauchy equation in the unknown function g(u) = f(eᵘ), whose solution is g(u) = mu, so that f(x) = g(log x) = m log x = log xᵐ.

(iii) "×−×": consider the functional equation

f(xy) = f(x) f(y)  ∀x, y > 0

It admits, too, the trivial solution f(x) = 0 for every x > 0. The reader can verify that also in this case we can take the logarithm of the two sides, so that the equation reduces to (ii) with log f in place of f, with solution log f(x) = m log x, that is, the power function

f(x) = e^{m log x} = xᵐ
The results just seen are remarkable because they establish a functional foundation for the elementary functions. For example, the exponential function can be characterized, as in Theorem 399, via the limit

eˣ = lim_{n→∞} (1 + x/n)ⁿ

but also, from a completely different angle, as the function that solves the functional equation (17.25). Both points of view are of great importance.

Because of the importance of this new perspective on elementary functions, we record as a theorem what we have established.

Theorem 854 (i) The exponential function f(x) = e^{mx}, with m ∈ R, is the unique non-trivial solution of the functional equation

f(x + y) = f(x) f(y)  ∀x, y ∈ R

(ii) The logarithmic function f(x) = log xᵐ, with m ∈ R, is the unique non-trivial solution of the functional equation

f(xy) = f(x) + f(y)  ∀x, y > 0

(iii) The power function f(x) = xᵐ, with m ∈ R, is the unique non-trivial solution of the functional equation

f(xy) = f(x) f(y)  ∀x, y ≥ 0
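The three identities can be spot-checked numerically; a minimal Python sketch (ours, with an arbitrary value of m):

import numpy as np

# Confirm the three Cauchy-type identities of Theorem 854, here with m = 2.
m = 2.0
rng = np.random.default_rng(4)
x, y = rng.uniform(0.1, 3, 100), rng.uniform(0.1, 3, 100)

print(np.allclose(np.exp(m * (x + y)), np.exp(m * x) * np.exp(m * y)))  # exp: + -> *
print(np.allclose(m * np.log(x * y), m * np.log(x) + m * np.log(y)))    # log: * -> +
print(np.allclose((x * y) ** m, x ** m * y ** m))                       # power: * -> *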
m = m(c, t) : R × R₊ → R

Here, c < 0 is interpreted as a debt. Consider the following properties of this function:

(i) m(c₁ + c₂, t) = m(c₁, t) + m(c₂, t) for all c₁, c₂ ∈ R and all t ≥ 0;

(ii) m(c, t) is increasing in t for each c ≥ 0;

(iii) m(c, 0) = c for all c ∈ R.

Condition (i) requires that the terminal value of a sum of capitals be the sum of their terminal values. Observe that it would be meaningless to suppose that m(c₁ + c₂, t) < m(c₁, t) + m(c₂, t) for some c₁, c₂ ≥ 0 because, in such a case, it would be more profitable to invest separately c₁ and c₂ than their sum c₁ + c₂. In contrast, it might be reasonable to have m(c₁ + c₂, t) > m(c₁, t) + m(c₂, t), but this would lead us a bit too far away.

Condition (ii) requires that the terminal value increase with the length of the investment. This presumes that such value is measured in nominal terms. Finally, condition (iii) is obvious.

Theorem 855 Let m be continuous for, at least, some value of c. It satisfies conditions (i)-(iii) if and only if

m(c, t) = cf(t)

Under conditions (i)-(iii), the terminal value is therefore proportional to the amount c of the capital. In particular, we have f(t) = m(1, t), so f(t) can be interpreted as the terminal value at t of a unit capital. The terminal value of any other amount of capital can be obtained simply by multiplying it by f(t). For this reason, f(t) is called the compound factor.
The most common compound factor has the form

f(t) = e^{δt}

with δ ≥ 0. To see how the exponential factor may come up, suppose that one has to invest a capital c from today, 0, until the date t₁ + t₂. We can think of two investment strategies: (a) to invest from the beginning to the end, thus obtaining the terminal value cf(t₁ + t₂); (b) to invest first from 0 to t₁, getting the terminal value cf(t₁), and then to reinvest this amount for the remaining t₂, thus obtaining the terminal value (cf(t₁)) f(t₂).

If the two terminal values differ, that is, f(t₁ + t₂) ≠ f(t₁)f(t₂), arbitrage opportunities may open up if in the financial market it is possible to lend and borrow without quantity constraints and transaction costs. Indeed, if f(t₁ + t₂) > f(t₁)f(t₂), it would be profitable to invest without interruptions from 0 to t₁ + t₂ and to borrow with interruption at t₁, earning in this way the difference f(t₁ + t₂) − f(t₁)f(t₂) > 0. Vice versa, if f(t₁ + t₂) < f(t₁)f(t₂), it would be profitable to borrow without interruptions, and then to invest with an interruption at t₁.
In sum, the equality f(t₁ + t₂) = f(t₁)f(t₂) must hold for every t₁, t₂ ≥ 0 in order not to open arbitrage opportunities. Remarkably, from the study of the variant (17.25) of Cauchy's equation, it follows that this equality amounts to

f(t) = e^{δt}

provided f is continuous at least at one point. The exponential compound factor is thus the outcome of a no-arbitrage argument, as is the case for many key results in finance (cf. Section 24.6).
N.B. In this section we assumed that time is continuous, so t can take any positive value and each c induces a function m_t (see the proof of the last theorem). In contrast, if time were discrete, with t ∈ N₊, we would have a sequence. In this case, the discrete compound factor f : N₊ → R that corresponds to the exponential continuous compound factor is given by f(t) = (1 + r)ᵗ, with m_t = (1 + r)ᵗ c (cf. Example 295). O
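The no-arbitrage identity is easy to illustrate numerically; a Python sketch (ours, with arbitrary values of the intensity δ and of the dates) contrasting the exponential factor with a simple-interest factor, which fails the identity:

import numpy as np

# The exponential compound factor satisfies f(t1 + t2) = f(t1) f(t2);
# an affine factor f(t) = 1 + delta*t (simple interest) does not, so
# splitting the horizon changes the outcome.
delta, t1, t2 = 0.05, 3.0, 4.0

f_exp = lambda t: np.exp(delta * t)
f_simple = lambda t: 1 + delta * t

print(np.isclose(f_exp(t1 + t2), f_exp(t1) * f_exp(t2)))   # True
print(f_simple(t1 + t2), f_simple(t1) * f_simple(t2))      # 1.35 vs 1.38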
(iii) f is linear;

(iv) there exists a (unique) vector α ∈ Rⁿ such that f(x) = α · x for all x ∈ Rⁿ.

Proof (iv) implies (iii) by Riesz's Theorem. (iii) implies (ii) by Theorem 646. (ii) trivially implies (i). Finally, to prove that (i) implies (iv) it is enough to show, along the lines of the proof of Cauchy's Theorem for scalar functions (which is easily adapted to Rⁿ, as readers can check), that (i) implies that f is homogeneous, so linear.

if and only if there exists a vector α ∈ Rⁿ such that f(x) = e^{α·x} for all x ∈ Rⁿ.
Chapter 18
Homogeneous functions
Geometrically, C is a cone if, whenever x belongs to C, the set C also includes the whole half-line starting at the origin and passing through x.
[Figure: two cones in the plane, each a union of half-lines issuing from the origin.]
Note that the origin 0 always belongs to a cone: given any x ∈ C, by taking α = 0 we have 0 = 0x ∈ C.

One can easily show that the closure of a cone is a cone and that the intersection of two cones is still a cone.

x, y ∈ C ⟹ αx + βy ∈ C  ∀α, β ≥ 0
While a generic convex set is closed with respect to convex combinations, convex cones are closed with respect to all linear combinations with positive coefficients (regardless of whether or not they add up to 1). This is what distinguishes them among all convex sets.
Proof "Only if". Let C be a convex cone. Take x, y ∈ C. We want to show that αx + βy ∈ C for all α, β ≥ 0. Fix α, β ≥ 0. If α = β = 0, then αx + βy = 0 ∈ C. Assume that α + β > 0. Since C is convex, we have

(α/(α + β)) x + (β/(α + β)) y ∈ C

Since C is a cone, we have

αx + βy = (α + β) [(α/(α + β)) x + (β/(α + β)) y] ∈ C

as desired.

"If". Suppose that x, y ∈ C implies αx + βy ∈ C for all α, β ≥ 0. We want to show that C is a convex cone. By taking α = β = 0, one can conclude that 0 ∈ C and, by taking y = 0, that αx ∈ C for all α ≥ 0. Hence, C is a cone.
Example 860 (i) A singleton {x} ⊆ Rⁿ is always convex; it is also a cone if x = 0. (ii) The only non-trivial cones in R are the two half-lines (−∞, 0] and [0, ∞).¹ (iii) The set Rⁿ₊ = {x ∈ Rⁿ : x ≥ 0} of the positive vectors is a convex cone. N

Cones can be closed, for example Rⁿ₊, or open, for example Rⁿ₊₊. Vector subspaces form an important class of closed convex cones (the non-trivial proof is omitted).

For example, this proposition implies that the graphs of straight lines passing through the origin are closed sets because they are vector subspaces of R².
Example 863 (i) Linear functions f : Rⁿ → R are positively homogeneous. (ii) The function f : R²₊ → R given by f(x) = √(x₁x₂) is positively homogeneous. Indeed,

f(αx) = √((αx₁)(αx₂)) = √(α²x₁x₂) = α√(x₁x₂) = αf(x)

for all α ≥ 0. N

The condition 0 ∈ C in the definition ensures that αx ∈ C for all α ∈ [0, 1], so that (18.1) is well defined. Whenever C is a cone, as in the previous examples, property (18.1) holds, more generally, for any positive scalar α.
Proof Since the "if" side is trivial, we focus on the "only if". Let f be positively homogeneous and let x ∈ C. We must show that f(αx) = αf(x) for every α > 1. Let α > 1 and set y = αx, so that x = y/α. From α > 1 it follows that 1/α < 1. Thanks to the positive homogeneity of f, we have that f(x) = f(y/α) = f(y)/α = f(αx)/α, that is, f(αx) = αf(x), as desired.
Linear production functions are positively homogeneous, thus having constant returns to
scale (Example 643). Let us now illustrate another famous example.
Apart from being constant, returns to scale may be increasing or decreasing. This motivates the following definition: a function f : C ⊆ Rⁿ → R is said to be (positively) superhomogeneous if

f(αx) ≤ αf(x)

for all x ∈ C and all α ∈ [0, 1], while it is said to be (positively) subhomogeneous if

f(αx) ≥ αf(x)

for all x ∈ C and all α ∈ [0, 1].
Example 867 Given any scalar k > 0, the function f : [0, ∞) → R defined by f(x) = (1 + x^k)^{1/k} is subhomogeneous. Indeed, for each α ∈ [0, 1] we have

f(αx) = (1 + (αx)^k)^{1/k} = (1 + α^k x^k)^{1/k} ≥ (α^k + α^k x^k)^{1/k} = α(1 + x^k)^{1/k} = αf(x)

as desired. N
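The two inequalities of the next proposition can be sampled numerically for this function; a Python sketch (ours, with an arbitrary exponent k):

import numpy as np

# f(x) = (1 + x**k)**(1/k): f(a*x) >= a*f(x) for a in [0, 1] and
# f(a*x) <= a*f(x) for a >= 1 (decreasing returns to scale).
k = 3.0
f = lambda x: (1 + x**k) ** (1 / k)
xs = np.linspace(0.1, 10, 50)

print(np.all(f(0.5 * xs) >= 0.5 * f(xs)))   # cuts: True
print(np.all(f(2.0 * xs) <= 2.0 * f(xs)))   # raises: True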
f(αx) ≥ αf(x)  ∀α ∈ [0, 1]

and

f(αx) ≤ αf(x)  ∀α ≥ 1

Proof We consider the "only if" side, the converse being trivial. Let f be subhomogeneous and x ∈ C. Our aim is to show that f(αx) ≤ αf(x) for all α > 1. Take α > 1 and set y = αx, so that x = y/α. Since α > 1, we have 1/α < 1. By the positive subhomogeneity of f, we have f(x) = f(y/α) ≥ f(y)/α = f(αx)/α, that is, f(αx) ≤ αf(x), as desired.
Thus, by doubling all inputs (α = 2) the output is less than doubled, by tripling all inputs (α = 3) the output is less than tripled, and so on for each α ≥ 1. A proportional increase of all inputs brings along a less than proportional increase in output, which models decreasing returns to scale. Dual considerations hold for increasing returns to scale, which entail more than proportional increases in output as all inputs increase proportionally. Note that when α ∈ [0, 1], so we cut inputs, opposite output patterns emerge.

For instance, for the Cobb-Douglas function f(x) = x₁ᵃx₂ᵇ we have

f(αx) = (αx₁)ᵃ(αx₂)ᵇ = α^{a+b} x₁ᵃx₂ᵇ = α^{a+b} f(x)
In conclusion, the notions of homogeneity are defined for α ∈ [0, 1], that is, for proportional cuts, on convex sets containing the origin. Nonetheless, their natural domains are cones, where they model the classic returns-to-scale hypotheses in which both cuts, α ∈ [0, 1], and raises, α ≥ 1, in inputs are considered.
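Returns to scale are easy to visualize numerically for the Cobb-Douglas family; a Python sketch (ours, with arbitrary input bundles and exponents):

import numpy as np

# For f(x) = x1**a * x2**b one has f(t*x) = t**(a+b) * f(x), so returns
# to scale are constant, decreasing, or increasing according to whether
# a + b = 1, < 1, or > 1.
def cobb_douglas(x, a, b):
    return x[0] ** a * x[1] ** b

x, t = np.array([2.0, 3.0]), 2.0
for a, b in [(0.5, 0.5), (0.3, 0.4), (0.8, 0.6)]:
    lhs = cobb_douglas(t * x, a, b)
    rhs = t ** (a + b) * cobb_douglas(x, a, b)
    print(a + b, np.isclose(lhs, rhs), lhs <= t * cobb_douglas(x, a, b))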
Given a scalar function f, its average function f_m : (0, ∞) → R is defined by

f_m(x) = f(x)/x

for each x > 0. It is important in applications: for example, if f is a production function, f_m is the average production function; if f is a cost function, f_m is the average cost function; and so on.

If f : Rⁿ₊ → R is a function of several variables, it is no longer possible to "divide" it by a vector x. We must, therefore, come up with an alternative concept of "average function".
The most natural surrogate for such a function is the following. Having chosen a generic vector 0 ≠ y ∈ Rⁿ₊, let us consider the function f_m^y : (0, ∞) → R given by

f_m^y(z) = f(zy)/z

It yields the average value of f with respect to the positive multiples zy of the (arbitrarily chosen) direction y. In the n = 1 case, by choosing y = 1 one ends up with the previous definition of average function.
Proof "Only if". If f is subhomogeneous one has that, for any 0 < α ≤ β,

f(αy) = f((α/β) βy) ≥ (α/β) f(βy)

that is, f(αy)/α ≥ f(βy)/β, or f_m^y(α) ≥ f_m^y(β). Therefore, the function f_m^y is decreasing.

"If". If f_m^y is decreasing, by setting β = 1 we have f_m^y(α) ≥ f_m^y(1) for 0 < α ≤ 1, and so f(αy)/α ≥ f(y), that is, f(αy) ≥ αf(y) for each 0 < α ≤ 1. Since f(0) = 0, the function f is subhomogeneous.
Proof "Only if". Let f be positively homogeneous and concave. Then, for all x, y ∈ C we have

f(x + y) = 2 f(½x + ½y) ≥ 2 (½f(x) + ½f(y)) = f(x) + f(y)

So, f(x + y) ≥ f(x) + f(y). "If". Let f be positively homogeneous and superadditive. Then, for all x, y ∈ C and λ ∈ [0, 1] we have

f(λx + (1 − λ)y) ≥ f(λx) + f((1 − λ)y) = λf(x) + (1 − λ)f(y)

So, f is concave.

By the last result, superlinear functions are the positively homogeneous functions that are concave. So, concavity and positive homogeneity join forces in this important class of functions.
Example 873 (i) The norm ‖·‖ : Rⁿ → R is a sublinear function (cf. Example 814). (ii) Define f : Rⁿ → R by

f(x) = inf_{i∈I} αⁱ · x  ∀x ∈ Rⁿ

where {αⁱ}_{i∈I} is a collection, finite or infinite, of vectors of Rⁿ. This function is easily seen to be superlinear. (iii) Given a convex function f : (0, ∞) → R, consider the perspective function g : R²₊₊ → R defined by

g(x₁, x₂) = x₁ f(x₂/x₁)

This function is convex (Example 831) and is easily seen to be positively homogeneous. So, it is sublinear. N
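The superlinearity of the lower envelope of linear functions in (ii) can be checked on random data; a Python sketch (ours, with randomly drawn vectors αⁱ):

import numpy as np

# f(x) = min_i a_i . x, a lower envelope of linear functions, is
# superadditive and positively homogeneous, hence superlinear.
rng = np.random.default_rng(5)
A = rng.normal(size=(4, 3))               # four vectors a_i in R^3
f = lambda x: np.min(A @ x)

for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    assert f(x + y) >= f(x) + f(y) - 1e-12          # superadditivity
    t = rng.uniform(0, 5)
    assert abs(f(t * x) - t * f(x)) < 1e-9          # positive homogeneity
print("superlinearity checks passed")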
Next we report some useful properties of superlinear functions, for simplicity defined directly on Rⁿ (readers can consider versions of the next results on affine sets).

f(−x) ≤ −f(x)  ∀x ∈ Rⁿ   (18.4)

A simple consequence of the last result is the following corollary, which motivates the "superlinear" terminology.

Proof Let f be both superlinear and sublinear. By (18.4), we have both f(−x) ≤ −f(x) and f(−x) ≥ −f(x) for all x ∈ Rⁿ, that is, f(−x) = −f(x) for all x ∈ Rⁿ. By Proposition 874, f is then linear. The converse is trivial.

Proof Let l ∈ (Rⁿ)′ and suppose that f(x) ≤ l(x) for all x ∈ Rⁿ. Let x ∈ Rⁿ. Then, we have both f(x) ≤ l(x) and f(−x) ≤ l(−x), which in turn implies f(x) ≤ l(x) = −l(−x) ≤ −f(−x). This proves (18.5).
Proof We only prove the "if", the converse being obvious. Let f ≥ 0 be quasi-concave and null-superadditive. In view of Proposition 871, it is enough to show that f is superadditive, i.e., f(x + y) ≥ f(x) + f(y) for all x, y ∈ C.

Let x, y ∈ C. First, assume that f(x) + f(y) = 0. Since f ≥ 0, we trivially have f(x + y) ≥ f(x) + f(y). Second, assume that f(x) + f(y) > 0. Since f ≥ 0, either f(y) is strictly greater than 0 or f(x) or both. If f(x) = 0 and f(y) > 0, then, by null-superadditivity, f(x + y) ≥ f(y) = f(x) + f(y). If f(y) = 0 and f(x) > 0, by the same argument we obtain again f(x + y) ≥ f(x) = f(x) + f(y). Finally, in the third case, f(x) > 0 and f(y) > 0. Then, there exist strictly positive scalars α and β such that f(x) ≥ α and f(y) ≥ β. Since f is positively homogeneous, f(x/α) ≥ 1 and f(y/β) ≥ 1. Hence x/α, y/β ∈ (f ≥ 1), where the latter set is convex, f being quasi-concave. Therefore,

(1/(α + β)) f(x + y) = f((x + y)/(α + β)) = f((α/(α + β))(x/α) + (β/(α + β))(y/β)) ≥ 1

that is, f(x + y) ≥ α + β. By taking α = f(x) and β = f(y), it follows f(x + y) ≥ f(x) + f(y).
Lemma 879 Let C ⊆ Rⁿ be a convex set with int C ≠ ∅. If x ∈ C and y ∈ int C, then (1 − λ)x + λy ∈ int C for all λ ∈ (0, 1). Moreover, we have cl(int C) = cl C.

Proof Consider λ ∈ (0, 1), x ∈ C, and y ∈ int C. Since y ∈ int C, there exists ε > 0 such that B_ε(y) ⊆ C. Define δ = λε > 0. We next show that B_δ((1 − λ)x + λy) ⊆ C, which will prove that (1 − λ)x + λy ∈ int C. To do so, consider z ∈ B_δ((1 − λ)x + λy) and define

w = (z − (1 − λ)x)/λ

Then

‖y − w‖ = ‖y − (z − (1 − λ)x)/λ‖ = ‖λy + (1 − λ)x − z‖/λ < δ/λ = ε

so that w ∈ B_ε(y) ⊆ C and, by the convexity of C, z = (1 − λ)x + λw ∈ C.

Proof of Proposition 878 We only prove the "if", the converse being obvious. Let f be quasi-concave, with f > 0 on int C. By proceeding as in the last proof when we studied the case f(x) > 0 and f(y) > 0, we can show that f(x + y) ≥ f(x) + f(y) for all x, y ∈ int C. Next, consider x, y ∈ int C and λ ∈ (0, 1). By Lemma 879 and since 0 ∈ C and C is convex, it follows that (1 − λ)y = λ·0 + (1 − λ)y ∈ int C. A similar argument yields that λx ∈ int C. By the previous part, it follows that f(λx + (1 − λ)y) ≥ f(λx) + f((1 − λ)y) = λf(x) + (1 − λ)f(y), that is, f is concave on int C. By Lemma 879 and since C is convex, we have C ⊆ cl(int C). The continuity of f then implies its concavity on C.
Let us illustrate a couple of noteworthy applications of the last two results. In both applications, we will use these results to establish the concavity of some classic functions by showing their positivity, quasi-concavity and positive homogeneity. This route is far simpler than verifying concavity directly.

Corollary 880 (i) The CES production function is concave if 0 < ρ ≤ 1. (ii) The Cobb-Douglas production function is concave as long as Σ_{i=1}^n aᵢ = 1.

Proof of Corollary 880 (i) For ρ = 1 the statement is obvious. If ρ < 1, note that on R₊ the power function x^ρ is concave if ρ ∈ (0, 1). Hence, also g(x) = αx₁^ρ + (1 − α)x₂^ρ is concave. Since h(x) = x^{1/ρ} is strictly increasing on R₊ for any ρ > 0, it follows that f = h ∘ g is quasi-concave and null-superadditive. Since f ≥ 0 and thanks to Theorem 877, we conclude that f is concave, as we have previously shown its homogeneity.

(ii) The Cobb-Douglas function f is quasi-concave (Example 848). Since f is continuous and on Rⁿ₊₊ is > 0, Proposition 878 implies that f is concave on Rⁿ₊, as we have already seen that f is positively homogeneous whenever Σ_{i=1}^n aᵢ = 1.⁴
⁴ Indeed, since f(0) = 0, for λ ∈ [0, 1] we have f(λx + (1 − λ)0) = f(λx) = λf(x) = λf(x) + (1 − λ)f(0).
18.3 Homotheticity
18.3.1 Semicones
For the sake of simplicity, until now we considered convex sets containing the origin 0, and cones in particular. To introduce the notions of this final section such an assumption becomes too cumbersome to maintain, so we will consider the following generalization of the notion of cone.

Unlike the definition of cone, here we require that αx belong to C only for α > 0 rather than for α ≥ 0. A cone is thus, a fortiori, a semicone. However, the converse does not hold: the set Rⁿ₊₊ is a notable example of a semicone that is not a cone.
Therefore, semicones do not necessarily contain the origin and, when they do, they
automatically become cones. In any case, the origin is always in the surroundings of a
semicone:
The easy proofs of the above lemmas are left to the reader. The last lemma, in particular, leads to the following result.

The distinction between cones and semicones thus disappears when considering closed sets. Finally, the following version of Proposition 859 holds for semicones, with coefficients that now are required to be strictly positive, as the reader can check:

x, y ∈ C ⟹ αx + βy ∈ C  ∀α, β > 0

Example 886 (i) The two half-lines (−∞, 0) and (0, ∞) are semicones in R (but they are not cones). (ii) The set Rⁿ₊₊ = {x ∈ Rⁿ : x ≫ 0} of the strongly positive vectors is a convex semicone (which is not a cone). N
The next result shows that this notion is consistent with what we did so far.
Proof If 0 ∈ C, then for every α > 0 we have f(0) = f(α0) = αf(0). Hence, f(0) = 0.

Thus, when the semicone is actually a cone, i.e., it contains the origin (Lemma 882), we get back to the notion of positive homogeneity on cones of the previous section. Everything fits together.

Example 889 Consider the function f : Rⁿ₊₊ → R given by f(x) = e^{Σ_{i=1}^n aᵢ log xᵢ}, with aᵢ > 0. If Σ_{i=1}^n aᵢ = 1, the function is positively homogeneous. Indeed, for any α > 0 we have

f(αx) = e^{Σ_{i=1}^n aᵢ log(αxᵢ)} = e^{Σ_{i=1}^n aᵢ(log α + log xᵢ)} = e^{log α} e^{Σ_{i=1}^n aᵢ log xᵢ} = αf(x)
In other words, a continuous and strongly increasing function is homothetic if and only if it is a strictly increasing transformation of a positively homogeneous function.⁶ In particular, homogeneous functions themselves are homothetic because f(x) = x is, trivially, strictly increasing.

In sum, homotheticity can be seen as the ordinal version of positive homogeneity. As such, it is the version relevant in ordinal utility theory.
Example 892 Let u : Rⁿ₊ → R be the Cobb-Douglas utility function u(x) = Π_{i=1}^n xᵢ^{aᵢ}, with aᵢ > 0 and Σ_{i=1}^n aᵢ = 1. It follows from Example 869 that such a function is positively homogeneous. If f is strictly increasing, the transformations f ∘ u of the Cobb-Douglas utility function are homothetic. For example, if we consider the restriction of u to the semicone Rⁿ₊₊ (where it is still positively homogeneous) and the logarithmic transformation f(x) = log x, we obtain the log-linear utility function v = log u given by v(x) = Σ_{i=1}^n aᵢ log xᵢ, which is thus homothetic. N
Proof The "if" part is simple. For, if h(x) = h(y) then g(x) = g(y) because f is strictly increasing. Thus, g(αx) = αg(x) = αg(y) = g(αy) for all α > 0, yielding that h(αx) = f(g(αx)) = f(g(αy)) = h(αy) for all α > 0, that is, h is homothetic. The "only if" part is more involved. Set 1 = (1, ..., 1) ∈ Rⁿ₊. We next show that for each x ∈ Rⁿ₊ there exists a unique λₓ ≥ 0 such that h(x) = h(λₓ1). Note that for each x ∈ Rⁿ₊ there exist α, β ≥ 0 such that β1 ≤ x ≤ α1. Since h is strongly increasing, we have that

h(β1) ≤ h(x) ≤ h(α1)

and αx ∈ Rⁿ₊. Since h is homothetic and h(x) = h(λₓ1), we have that h(αx) = h(αλₓ1). Since λ_{αx} ≥ 0 is the unique number such that h(αx) = h(λ_{αx}1), we conclude that g(αx) = λ_{αx} = αλₓ = αg(x), proving that g is positively homogeneous.
The assumptions above on h play an important role, as the next example shows.
h(x) = 1 if x₁ > 0 ; 0 if x₁ = 0

Note that R²₊ is a semicone and that h is homothetic. Indeed, given α > 0, if h(x) = h(y), then either h(x) = h(y) = 1 and x₁, y₁ > 0, or h(x) = h(y) = 0 and x₁ = y₁ = 0. In the first case, the first components of αx and αy are still strictly positive and h(αx) = 1 = h(αy). In the second case, the first components of αx and αy are still null and h(αx) = 0 = h(αy).

The homothetic function h, which is neither continuous nor strongly increasing, cannot be obtained as a composition f ∘ g where f : Im g → R is strictly increasing and g is positively homogeneous. Otherwise, consider 1 = (1, 1). Since h(1) = 1 > 0 = h(0) and f is strictly increasing, we would have g(1) > g(0). Since g is positively homogeneous, we could conclude that g(0) = 0. This would imply that g(1) > 0 and g(2, 2) = 2g(1) = g(1) + g(1) > g(1), which would yield 1 = h(2, 2) > h(1) = 1, a contradiction. N
Chapter 19
Lipschitz functions
A function is called Lipschitz, without further qualifications, when the inequality (19.1) holds on the entire domain of the function. When f is a function, this inequality takes the simpler form

|f(x₁) − f(x₂)| ≤ k ‖x₁ − x₂‖

where on the left-hand side we have the absolute value in place of the norm.

In a Lipschitz operator, the distance ‖f(x₁) − f(x₂)‖ between the images of two vectors x₁ and x₂ is controlled, through a positive coefficient k, by the distance ‖x₁ − x₂‖ between the vectors x₁ and x₂ themselves. This "variation control" that the independent variable exerts on the dependent variable is at the heart of Lipschitzianity. The rein is especially tight when k < 1, so that variations in the independent variable cause strictly smaller variations of the dependent variable. In this case, the Lipschitz operator is called a contraction.

The control nature of Lipschitzianity translates into a strong form of continuity. To see how, first note that Lipschitz operators are continuous. Indeed, let x₀ ∈ A. If xₙ → x₀, we have

‖f(xₙ) − f(x₀)‖ ≤ k ‖xₙ − x₀‖ → 0   (19.2)

and hence f(xₙ) → f(x₀). So, f is continuous at x₀. More is true:
The converse is false, as Example 897 will show momentarily. Because of its control nature, Lipschitzianity thus embodies a stronger form of continuity than the uniform one.

Proof For each ε > 0, take 0 < δ_ε < ε/k. Then, ‖f(x) − f(y)‖ ≤ k ‖x − y‖ < ε for each x, y ∈ Rⁿ such that ‖x − y‖ < δ_ε.
So, setting y = 0, there is no k > 0 such that |f(x) − f(y)| ≤ k|x − y| for each x, y ≥ 0.

That said, the previous example shows that f is Lipschitz on each interval [a, b] with a > 0. So f is not Lipschitz on its entire domain, but it is on suitable subsets of it. More interestingly, by Theorem 603 the function f is uniformly continuous on each interval [0, b], with b > 0, but it is not Lipschitz on [0, b]. This also shows that the converse of the last lemma does not hold. N
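The different behavior of the square root on [a, b] with a > 0 and on [0, b] can be seen by estimating the incremental ratios numerically; a Python sketch (ours, with arbitrary grids):

import numpy as np

# Estimate sup |f(x)-f(y)|/|x-y| for f = sqrt on two grids. On [a, b] with
# a > 0 the ratio stays bounded, while on [0, b] it blows up near 0 even
# though f is uniformly continuous there.
def lip_estimate(a, b, n=2000):
    x = np.linspace(a, b, n)
    return np.max(np.abs(np.diff(np.sqrt(x))) / np.diff(x))

print(lip_estimate(1.0, 4.0))    # about 0.5 = 1/(2*sqrt(1))
print(lip_estimate(0.0, 4.0))    # large: the ratio explodes near 0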
Lemma 899 Given a linear operator f : Rⁿ → Rᵐ, there exists a constant k > 0 such that ‖f(x)‖ ≤ k ‖x‖ for every x ∈ Rⁿ.

So, if x ≠ 0 we have

0 ≤ ‖f(x)‖ / ‖x‖ ≤ k

The ratio ‖f(x)‖/‖x‖ is thus bounded above by a constant k, so it cannot explode, for all non-zero vectors x. In other words, there is no sequence {xₙ} of vectors such that ‖f(xₙ)‖/‖xₙ‖ → +∞.
Proof Set k = Σ_{i=1}^n ‖f(eⁱ)‖. We have

‖f(x)‖ = ‖f(Σ_{i=1}^n xᵢeⁱ)‖ = ‖Σ_{i=1}^n xᵢ f(eⁱ)‖ ≤ Σ_{i=1}^n |xᵢ| ‖f(eⁱ)‖ ≤ ‖x‖ Σ_{i=1}^n ‖f(eⁱ)‖ = k ‖x‖

since |xᵢ| ≤ ‖x‖ for each i.
Proof of Theorem 898 Let x, y ∈ Rⁿ. Since f is linear, the last lemma implies

‖f(x) − f(y)‖ = ‖f(x − y)‖ ≤ k ‖x − y‖

So, f is Lipschitz.
Note the local nature of this definition: the constant k_{x₀} depends on the point x₀ at hand, and the inequality is required only between points of a neighborhood of x₀ (not between any two points of the domain of f).

When f is locally Lipschitz at each point of a set B we say that it is locally Lipschitz on B. If B is the entire domain, we say that the operator is locally Lipschitz, without further qualifications.

Now, the "variation control" that the independent variable exerts on the dependent variable is only local, in a neighborhood of a given point. This local control still translates into a strong form of continuity at a point (with k_{x₀} in place of k, (19.2) still holds as xₙ → x₀), but no longer across points, as was the case with global Lipschitzianity.
where 0 < ε′ < ε. Since the derivative f′ is continuous on [x₀ − ε′, x₀ + ε′], by the Weierstrass Theorem the constant k₀ is well defined. By proceeding as in Example 896, mutatis mutandis, the reader can then check that f is locally Lipschitz at x₀. N
There is, however, an important case where local and global Lipschitzianity are equivalent.
Proof Since the "only if" is obvious, we only prove the "if". Assume that f is locally Lipschitz on K. Suppose, by contradiction, that f is not Lipschitz on K. So, there exist two sequences {xₙ} and {yₙ} in K such that

‖f(xₙ) − f(yₙ)‖ / ‖xₙ − yₙ‖ → +∞   (19.4)

Since K is compact, by the Bolzano-Weierstrass Theorem there exist two subsequences {x_{n_k}} and {y_{n_k}} such that x_{n_k} → x ∈ K and y_{n_k} → y ∈ K. Since f is continuous, we have f(x_{n_k}) → f(x) and f(y_{n_k}) → f(y). We consider two cases.
(ii) Suppose x = y. By hypothesis, f is locally Lipschitz at x, so there is B_ε(x) such that

‖f(u) − f(v)‖ ≤ k_x ‖u − v‖  ∀u, v ∈ B_ε(x)

Since x_{n_k} → x and y_{n_k} → x, there is a large enough k_ε ≥ 1 so that x_{n_k}, y_{n_k} ∈ B_ε(x) for all k ≥ k_ε. Then,

‖f(x_{n_k}) − f(y_{n_k})‖ / ‖x_{n_k} − y_{n_k}‖ ≤ k_x  ∀k ≥ k_ε

which contradicts (19.4).
The next important result shows that concave functions are locally Lipschitz, thus clarifying the continuity properties of these fundamental functions.

In view of Proposition 903, f is then Lipschitz on each compact set K ⊆ C. The theorem is a consequence of the following lemma of independent interest.
Lemma 905 A function continuous at an interior point of its domain is locally bounded at
that point.
Example 906 (i) The continuous function f : (0, ∞) → R defined by 1/x is unbounded but locally bounded at each point of its domain. (ii) The function f : R → R defined by

f(x) = log|x| if x ≠ 0 ; 0 if x = 0

is locally bounded at every x ≠ 0 but is not locally bounded at 0.
Proof of Theorem 904 We want to show that f is locally Lipschitz at any x ∈ C. By the last lemma, f is locally bounded at x, i.e., there exist m_x ∈ R and a neighborhood B_{2ε}(x), without loss of generality of radius 2ε, such that |f(y)| ≤ m_x for all y ∈ B_{2ε}(x). Given y₁, y₂ ∈ B_ε(x), set

y₃ = y₂ + (ε/‖y₂ − y₁‖)(y₂ − y₁)

Then, y₃ ∈ B_{2ε}(x) since

‖y₃ − x‖ ≤ ‖y₂ − x‖ + ‖(ε/‖y₂ − y₁‖)(y₂ − y₁)‖ < ε + ε = 2ε

Since

y₂ = (ε/(‖y₂ − y₁‖ + ε)) y₁ + (‖y₂ − y₁‖/(‖y₂ − y₁‖ + ε)) y₃
concavity implies

f(y₂) ≥ (ε/(‖y₂ − y₁‖ + ε)) f(y₁) + (‖y₂ − y₁‖/(‖y₂ − y₁‖ + ε)) f(y₃)

so that

f(y₁) − f(y₂) ≤ (‖y₂ − y₁‖/(‖y₂ − y₁‖ + ε)) (f(y₁) − f(y₃)) ≤ (2m_x/ε) ‖y₂ − y₁‖   (19.5)

Interchanging the roles of y₁ and y₂, we get

f(y₂) − f(y₁) ≤ (‖y₁ − y₂‖/(‖y₁ − y₂‖ + ε)) (f(y₂) − f(y₃)) ≤ (2m_x/ε) ‖y₁ − y₂‖

so that

|f(y₁) − f(y₂)| ≤ (2m_x/ε) ‖y₁ − y₂‖

So, f is locally Lipschitz at x.
So, even if in the definition we only require invariance with respect to positive constants, it actually holds for any constant, positive or not.

Proof We only prove the "only if", the converse being trivial. Let f : Rⁿ → R be translation invariant. We only need to prove that (19.6) also holds when k < 0. If k < 0, then −k > 0. By translation invariance with x + k in place of x, we have

f(x) = f((x + k) + (−k)) = f(x + k) − k

that is, f(x + k) = f(x) + k, as desired.
Example 909 Define f : Rⁿ → R by

It is normalized if and only if c = 1. Later in the book, Theorem 1564 will characterize this class of translation invariant functions. N
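To fix ideas, here is a Python sketch (ours, not the book's Example 909) with a simple translation invariant function, the minimum of the coordinates, which is also Lipschitz of constant 1:

import numpy as np

# f(x) = min_i x_i is translation invariant, f(x + k*1) = f(x) + k, and
# nonexpansive: |f(x) - f(y)| <= ||x - y||.
rng = np.random.default_rng(6)
f = lambda v: np.min(v)

for _ in range(1000):
    x, y, k = rng.normal(size=4), rng.normal(size=4), rng.normal()
    assert np.isclose(f(x + k), f(x) + k)
    assert abs(f(x) - f(y)) <= np.linalg.norm(x - y) + 1e-12
print("translation invariance and Lipschitz checks passed")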
Though translation invariance is much weaker than linearity, under monotonicity we still have Lipschitzianity. Actually, for the result it is enough that the function be Blackwell.
Proof First, note that, since f is increasing, we have f(1) > 0. Let x ∈ Rⁿ. By (19.3), we have |xᵢ| ≤ ‖x‖ for each i = 1, ..., n.⁵ Therefore, max_{i=1,...,n} |xᵢ| ≤ ‖x‖, which in turn implies x − y ≤ max_{i=1,...,n} |xᵢ − yᵢ| ≤ ‖x − y‖ for all x, y ∈ Rⁿ. So x ≤ y + ‖x − y‖. Since f is increasing and Blackwell, we then have

f(x) ≤ f(y + ‖x − y‖) ≤ f(y) + ‖x − y‖

So, f(x) − f(y) ≤ ‖x − y‖ for all x, y ∈ Rⁿ. By exchanging the roles of x and y, we also have f(y) − f(x) ≤ ‖x − y‖ for all x, y ∈ Rⁿ. We conclude that

|f(x) − f(y)| ≤ ‖x − y‖

as desired.

N.B. The proof shows that an increasing Blackwell function f is a contraction if and only if f(1) < 1. In applications, this is the most relevant case. O
Remarkably, like positive homogeneity (Theorem 877), also under translation invariance
concavity and quasi-concavity are equivalent properties.
5 To ease matters, in this proof with an abuse of notation we write $x - k$ and $x + k$ in place of $x - k\mathbf{1}$ and $x + k\mathbf{1}$.
Proof We only prove the "if", the converse being obvious. Let $f$ be quasi-concave. We have, for all $x \in \mathbb{R}^n$,
$$f(x) \ge t \iff f(x) - t \ge 0 \iff f\left(x - \frac{\mathbf{t}}{f(\mathbf{1})}\right) = f(x) - \frac{t}{f(\mathbf{1})}f(\mathbf{1}) \ge 0 \qquad \forall t \in \mathbb{R}$$
where $\mathbf{t} = (t, \dots, t)$. So, for all $x \in \mathbb{R}^n$,
$$x \in (f \ge t) \iff x - \frac{\mathbf{t}}{f(\mathbf{1})} \in (f \ge 0) \qquad \forall t \in \mathbb{R}$$
which implies
$$(f \ge t) = (f \ge 0) + \frac{\mathbf{t}}{f(\mathbf{1})} \qquad \forall t \in \mathbb{R}$$
If $t$ and $s$ are any two scalars and $\alpha \in (0,1)$, then, since the upper level set $(f \ge 0)$ is convex by quasi-concavity (so that $\alpha(f \ge 0) + (1-\alpha)(f \ge 0) = (f \ge 0)$),
$$\alpha(f \ge t) + (1-\alpha)(f \ge s) = \alpha(f \ge 0) + (1-\alpha)(f \ge 0) + \frac{\alpha\mathbf{t} + (1-\alpha)\mathbf{s}}{f(\mathbf{1})} = (f \ge 0) + \frac{\alpha\mathbf{t} + (1-\alpha)\mathbf{s}}{f(\mathbf{1})} = (f \ge \alpha t + (1-\alpha)s)$$
That is,
$$\alpha(f \ge t) + (1-\alpha)(f \ge s) = (f \ge \alpha t + (1-\alpha)s) \tag{19.7}$$
Take any two points $x, y \in \mathbb{R}^n$ and set $f(x) = t$ and $f(y) = s$. Then $x \in (f \ge t)$ and $y \in (f \ge s)$, so $\alpha x + (1-\alpha)y \in \alpha(f \ge t) + (1-\alpha)(f \ge s)$. By (19.7), $\alpha x + (1-\alpha)y \in (f \ge \alpha t + (1-\alpha)s)$, that is,
$$f(\alpha x + (1-\alpha)y) \ge \alpha t + (1-\alpha)s = \alpha f(x) + (1-\alpha)f(y)$$
So, $f$ is concave.
For instance, given a positively homogeneous $f$, the function $g(x) = \log f(e^{x_1}, \dots, e^{x_n})$ is translation invariant:
$$g(x_1 + k, \dots, x_n + k) = \log f\left(e^{x_1+k}, \dots, e^{x_n+k}\right) = \log f\left(e^{x_1}e^k, \dots, e^{x_n}e^k\right) = \log\left[e^k f\left(e^{x_1}, \dots, e^{x_n}\right)\right] = k + \log f\left(e^{x_1}, \dots, e^{x_n}\right) = k + g(x)$$
Chapter 20

Supermodular functions

20.1 Joins and meets

Given two vectors $x, y \in \mathbb{R}^n$, their join $x \vee y$ and meet $x \wedge y$ are the vectors of $\mathbb{R}^n$ defined componentwise by $(x \vee y)_i = \max\{x_i, y_i\}$ and $(x \wedge y)_i = \min\{x_i, y_i\}$. The join $x \vee y$ is thus the smallest vector that is larger than both $x$ and $y$, while the meet $x \wedge y$ is the largest vector that is smaller than both of them. That is, for all $z \in \mathbb{R}^n$ we have
$$z \ge x \text{ and } z \ge y \implies z \ge x \vee y$$
and
$$z \le x \text{ and } z \le y \implies z \le x \wedge y$$
Example 913 Let $x = (0,1)$ and $y = (2,0)$ be two vectors in the plane. We have
$$(x \vee y)_1 = \max\{x_1, y_1\} = \max\{0, 2\} = 2, \quad (x \vee y)_2 = \max\{x_2, y_2\} = \max\{1, 0\} = 1$$
$$(x \wedge y)_1 = \min\{x_1, y_1\} = \min\{0, 2\} = 0, \quad (x \wedge y)_2 = \min\{x_2, y_2\} = \min\{1, 0\} = 0$$
so $x \vee y = (2,1)$ and $x \wedge y = (0,0)$. N
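For readers who like to experiment, here is a minimal numerical sketch (in Python with NumPy; not part of the original text) of these componentwise operations:

```python
import numpy as np

def join(x, y):
    # componentwise maximum: (x v y)_i = max{x_i, y_i}
    return np.maximum(x, y)

def meet(x, y):
    # componentwise minimum: (x ^ y)_i = min{x_i, y_i}
    return np.minimum(x, y)

x = np.array([0.0, 1.0])
y = np.array([2.0, 0.0])
print(join(x, y))  # [2. 1.]
print(meet(x, y))  # [0. 0.]
```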
The next simple, yet key, property relates meets, joins and sums: for all $x, y \in \mathbb{R}^n$,
$$x \vee y + x \wedge y = x + y \tag{20.1}$$
Proof The equality is trivially true if $x$ and $y$ are scalars: if, say, $x \le y$, then $x \vee y + x \wedge y = y + x$. If $x$ and $y$ are vectors of $\mathbb{R}^n$, we then have, for each component $i$,
$$(x \vee y)_i + (x \wedge y)_i = \max\{x_i, y_i\} + \min\{x_i, y_i\} = x_i + y_i$$
as desired.
(ii) (x + z) _ (y + z) = (x _ y) + z and (x + z) ^ (y + z) = (x ^ y) + z;
Proof We prove only the rst equality in (i) and leave the other properties to the reader.
For each component i we have
as desired.
Joins and meets permit us to associate three positive vectors with a vector $x$ in $\mathbb{R}^n$: its positive part $x^+$, its negative part $x^-$ and its modulus $|x|$, defined via the formulas
$$x^+ = x \vee 0, \quad x^- = -(x \wedge 0) \quad \text{and} \quad |x| = x \vee (-x)$$
The modulus extends the absolute value to $\mathbb{R}^n$ via its order characterization (4.4), while the norm extended it using the algebraic characterization. In terms of components, we have
$$x_i^+ = \max\{x_i, 0\}, \quad x_i^- = -\min\{x_i, 0\} \quad \text{and} \quad |x_i| = \max\{x_i, -x_i\}$$
In words, the components of $x^+$ coincide with the positive components of $x$ and are $0$ otherwise. Similarly, the components of $x^-$ are the absolute values of the negative components of $x$ and are $0$ otherwise. In contrast, the components of $|x|$ are the absolute values of the components of $x$ (for this reason, $|x|$ is often called the absolute value of $x$). For instance, for $x = (-1, 2, 4) \in \mathbb{R}^3$ we have
$$x^+ = (0, 2, 4), \quad x^- = (1, 0, 0) \quad \text{and} \quad |x| = (1, 2, 4)$$
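A small numerical sketch (NumPy, illustrative) of these definitions, which also anticipates two identities recorded just below:

```python
import numpy as np

x = np.array([-1.0, 2.0, 4.0])

x_pos = np.maximum(x, 0.0)         # positive part x^+
x_neg = -np.minimum(x, 0.0)        # negative part x^-
modulus = np.maximum(x, -x)        # modulus |x|

print(x_pos, x_neg, modulus)                 # [0. 2. 4.] [1. 0. 0.] [1. 2. 4.]
print(np.allclose(x, x_pos - x_neg))         # decomposition x = x^+ - x^-
print(np.allclose(modulus, x_pos + x_neg))   # |x| = x^+ + x^-
```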
Next we report some easily checked properties of these three vectors:

(i) $x = x^+ - x^-$ and $x^+ \wedge x^- = 0$;
(ii) $|x| = x^+ + x^-$.

The decomposition
$$x = x^+ - x^- \tag{20.2}$$
is thus the minimal decomposition of a vector $x$. It has the natural monotonicity property (iii). The next example illustrates.
Example 919 We can construct a portfolio $x$ also by buying and selling according to any pair of positive vectors $x'$ and $x''$ such that $x = x' - x''$. In the last example we noted that to form the portfolio $x = (1, 2, -3)$ one has to buy and sell the amounts prescribed by $x^+ = (1, 2, 0)$ and $x^- = (0, 0, 3)$, respectively. At the same time, this portfolio can also be formed by buying an extra unit of the third asset and by selling the same extra unit of that asset. In other words, we have $x = x' - x''$, where $x' = (1, 2, 1)$ and $x'' = (0, 0, 4)$. By Proposition 918-(i), we have
$$x^+ \le x' \quad \text{and} \quad x^- \le x''$$
So, the positive and negative parts represent the minimal holdings of the primary assets needed to construct portfolio $x$. By Proposition 918-(iii), larger portfolios necessarily involve larger short and long positions. N
The reader can prove these inequalities by first establishing them for absolute values, something that for properties (ii) and (iii) was actually done in Section 4.1.2. We close this section by showing that moduli and norms are consistent in their ranking of vectors: if $|x| \le |y|$, then $\|x\| \le \|y\|$, and
$$\||x|\| = \|x\| \qquad \forall x \in \mathbb{R}^n$$

Proof Just observe that $|x| \le |y|$ implies $|x_i| \le |y_i|$, and so $x_i^2 \le y_i^2$, for each $i = 1, \dots, n$; moreover, $|x_i|^2 = x_i^2$ for each $i$, so $\||x|\| = \|x\|$.
20.2 Lattices
Joins and meets permit us to introduce lattices, an important class of sets.

Definition 922 A set $L$ of $\mathbb{R}^n$ is a lattice if, for any two elements $x$ and $y$ of $L$, both $x \vee y$ and $x \wedge y$ belong to $L$.

Lattices are, thus, subsets $L$ of $\mathbb{R}^n$ that are closed under joins and meets, that is, both the join and the meet of any two of their elements belong to $L$.
Example 923 (i) Given any $x, y \in \mathbb{R}^n$, the quadruple $\{x, y, x \vee y, x \wedge y\}$ is the simplest example of a finite lattice. (ii) Given any $a, b \in \mathbb{R}^n$, with $a \le b$, the interval
$$[a, b] = \{x \in \mathbb{R}^n : a \le x \le b\}$$
is a lattice. N
Recall that a function $f$ is supermodular if $f(x \vee y) + f(x \wedge y) \ge f(x) + f(y)$ for all $x$ and $y$ in its domain, submodular if the reverse inequality holds, and modular if it is both.

Example 925 (i) Functions of a single variable are modular. Indeed, let $x, y \in \mathbb{R}$ with, say, $x \le y$. Then, $x \wedge y = x$ and $x \vee y = y$, so modularity trivially holds. (ii) Linear functions $f : \mathbb{R}^n \to \mathbb{R}$ are modular: by (20.1) we have
$$f(x \vee y) + f(x \wedge y) = f(x \vee y + x \wedge y) = f(x + y) = f(x) + f(y)$$
Interestingly, the modularity notions just introduced have no bite on functions of a single
variable, so they are of interest only in the multivariable case. That said, the next two results
show how to manufacture supermodular functions via convex transformations.
Example 927 Let f : Rn ! R be a positive linear function. Given any convex function
' : R ! R, the function ' f is supermodular. N
Proof Let $x, y \in I$ with, say, $f(x) \ge f(y)$. Since $f$ is increasing, we have $f(x \wedge y) \le f(y) \le f(x) \le f(x \vee y)$. Set $k = f(x \vee y) - f(x)$ and $h = f(y) - f(x \wedge y)$. Since $f$ is supermodular, we have $k \ge h \ge 0$. Since $\varphi$ has increasing increments, we then have
$$\varphi(f(y)) - \varphi(f(x \wedge y)) = \varphi(f(x \wedge y) + h) - \varphi(f(x \wedge y)) \le \varphi(f(x) + h) - \varphi(f(x)) \le \varphi(f(x) + k) - \varphi(f(x)) = \varphi(f(x \vee y)) - \varphi(f(x))$$
where the last inequality holds because $\varphi$ is increasing. So, $\varphi \circ f$ is supermodular.
Example 931 Consider the function $f : [1,\infty) \times [3,\infty) \times [2,\infty) \to \mathbb{R}$ defined by $f(x_1, x_2, x_3) = \sqrt{(x_1 - 1)(x_2 - 3)(x_3 - 2)}$. For a fixed $x_1 \ge 1$, the section $f^{x_1} : [3,\infty) \times [2,\infty) \to \mathbb{R}$ now has $x_2$ and $x_3$ as the independent variables (indeed, we have $x_{-1} = (x_2, x_3)$). For instance, if $x_1 = 5$ the section $f^5 : [3,\infty) \times [2,\infty) \to \mathbb{R}$ is defined by $f^5(x_2, x_3) = 2\sqrt{(x_2 - 3)(x_3 - 2)}$. In a similar way we can define the sections $f^{x_2} : [1,\infty) \times [2,\infty) \to \mathbb{R}$ and $f^{x_3} : [1,\infty) \times [3,\infty) \to \mathbb{R}$. On the other hand, if we fix $x_{-1} = (x_2, x_3) \in [3,\infty) \times [2,\infty)$, we have the section $f^{x_2,x_3} : [1,\infty) \to \mathbb{R}$ that has $x_1$ as the independent variable. For instance, if $x_2 = 6$ and $x_3 = 10$, the section $f^{6,10} : [1,\infty) \to \mathbb{R}$ is defined by $f^{6,10}(x_1) = \sqrt{24(x_1 - 1)} = 2\sqrt{6}\sqrt{x_1 - 1}$. In a similar way we can define the sections $f^{x_1,x_3} : [3,\infty) \to \mathbb{R}$ and $f^{x_1,x_2} : [2,\infty) \to \mathbb{R}$. N
2 Recall the notation $x_{-i}$ from Section 14.1. Here $A_{-i}$ is the Cartesian product of all sets $\{A_1, \dots, A_n\}$ except $A_i$, i.e., $A_{-i} = \prod_{j \neq i} A_j$.
The sections $f^{x_{-i}}$ can be used to formalize ceteris paribus arguments in which all variables are kept fixed, except $x_i$. Indeed, partial differentiation at a point $x \in \mathbb{R}^n$ can be expressed in terms of these sections:
$$\frac{\partial f(x)}{\partial x_i} = \lim_{h \to 0}\frac{f^{x_{-i}}(x_i + h) - f^{x_{-i}}(x_i)}{h}$$
In sum, we have sections $f^{x_i}$ in which the variable $x_i$ is kept fixed and the other variables vary, as well as a section $f^{x_{-i}}$ in which the opposite holds: the variable $x_i$ is the only independent variable, the other ones being kept fixed. In a similar spirit, we can have "intermediate" sections in which we block a subset of the variables.
Though this notation is more handy, superscripts best emphasize the parametric role of the
blocked variables.
Definition 933 A function $f : I \subseteq \mathbb{R}^n \to \mathbb{R}$ has increasing (cross) differences if, for each $x_i \in I_i$ and $h_i \ge 0$ with $x_i + h_i \in I_i$, the difference
$$f^{x_i + h_i}(x_{-i}) - f^{x_i}(x_{-i})$$
is increasing in $x_{-i}$.
Increasing and decreasing di erences are dual notions, so we will focus on the former.
For functions of two variables, we have a simple characterization of this property.
and
$$f^{x_2+h_2}(x_1) - f^{x_2}(x_1) \le f^{x_2+h_2}(x_1 + h_1) - f^{x_2}(x_1 + h_1) \tag{20.5}$$
which are both equivalent to (20.3).
So, symmetrically, an increase in the first input has a higher impact when the second input increases as well. In sum, the marginal contribution of an input is increasing in the other input: the two inputs are complementary.
Proposition 935 A function $f : I \subseteq \mathbb{R}^n \to \mathbb{R}$ has increasing differences if and only if, for each $1 \le i \neq j \le n$, the section $f^{x_{-ij}} : I_{-ij} \subseteq \mathbb{R}^2 \to \mathbb{R}$ satisfies (20.3), i.e.,
$$f^{x_{-ij}}(x_i, x_j + h_j) - f^{x_{-ij}}(x_i, x_j) \le f^{x_{-ij}}(x_i + h_i, x_j + h_j) - f^{x_{-ij}}(x_i + h_i, x_j) \tag{20.6}$$
In terms of the previous interpretation, we can say that a production function has increasing differences if and only if its inputs are pairwise complementary. Increasing differences thus model this form of complementarity. In a dual way, decreasing differences model an analogous form of substitutability.
Proof Assume that $f$ has increasing differences. To fix ideas, let $i = 1$ and $j = 2$. We want to show that
$$f^{x_{-12}}(x_1, x_2 + h_2) - f^{x_{-12}}(x_1, x_2) \le f^{x_{-12}}(x_1 + h_1, x_2 + h_2) - f^{x_{-12}}(x_1 + h_1, x_2)$$
We have
$$\begin{aligned}
f^{x_{-12}}(x_1, x_2 + h_2) - f^{x_{-12}}(x_1, x_2) &= f(x_1, x_2 + h_2, x_3, \dots, x_n) - f(x_1, x_2, x_3, \dots, x_n)\\
&= f^{x_2+h_2}(x_1, x_3, \dots, x_n) - f^{x_2}(x_1, x_3, \dots, x_n)\\
&\le f^{x_2+h_2}(x_1 + h_1, x_3, \dots, x_n) - f^{x_2}(x_1 + h_1, x_3, \dots, x_n)\\
&= f(x_1 + h_1, x_2 + h_2, x_3, \dots, x_n) - f(x_1 + h_1, x_2, x_3, \dots, x_n)\\
&= f^{x_{-12}}(x_1 + h_1, x_2 + h_2) - f^{x_{-12}}(x_1 + h_1, x_2)
\end{aligned}$$
as desired. The general case is analogous, just notationally cumbersome. So, (20.6) holds.
We omit the proof of the converse.
Example 937 (i) Let $f : \mathbb{R}^2_+ \to \mathbb{R}$ be a CES production function defined by $f(x) = (\alpha x_1^\rho + (1-\alpha)x_2^\rho)^{\frac{1}{\rho}}$ with $\alpha \in [0,1]$ and $\rho > 0$ (cf. Example 865). We have
$$\frac{\partial^2 f(x)}{\partial x_1 \partial x_2} = (1-\rho)\,\alpha(1-\alpha)(x_1x_2)^{\rho-1}\left(\alpha x_1^\rho + (1-\alpha)x_2^\rho\right)^{\frac{1}{\rho}-2}$$
By the previous result, $f$ has decreasing differences if $\rho > 1$ and increasing differences if $0 < \rho < 1$. So, the parameter $\rho$ determines whether the inputs in the CES production function are complements or substitutes. (ii) Let $f : \mathbb{R}^2_+ \to \mathbb{R}$ be a Cobb-Douglas production function $f(x) = x_1^{\alpha_1}x_2^{\alpha_2}$, with $\alpha_1, \alpha_2 > 0$ (cf. Example 869). Since $\partial^2 f(x)/\partial x_1\partial x_2 = \alpha_1\alpha_2 x_1^{\alpha_1 - 1}x_2^{\alpha_2 - 1} \ge 0$, by the previous result $f$ has increasing differences (so, its inputs are complements). N
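Inequality (20.3) is easy to spot-check numerically. The sketch below (illustrative only; the exponents and sampling ranges are arbitrary choices) samples random points and increments for the Cobb-Douglas case:

```python
import numpy as np

def cobb_douglas(x1, x2, a1=0.5, a2=0.5):
    return x1**a1 * x2**a2

# Increasing differences: f(x1+h1, x2+h2) - f(x1+h1, x2) must be
# at least f(x1, x2+h2) - f(x1, x2) for all h1, h2 >= 0.
rng = np.random.default_rng(0)
for _ in range(1000):
    x1, x2 = rng.uniform(0.1, 5.0, size=2)
    h1, h2 = rng.uniform(0.0, 2.0, size=2)
    lhs = cobb_douglas(x1 + h1, x2 + h2) - cobb_douglas(x1 + h1, x2)
    rhs = cobb_douglas(x1, x2 + h2) - cobb_douglas(x1, x2)
    assert lhs >= rhs - 1e-12
print("increasing differences verified on random samples")
```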
A function $f$ of several variables is easily seen to admit the following "telescopic" expansion: if $x \le y$, then
$$f(y) - f(x) = \left[f(y_1, x_2, \dots, x_n) - f(x_1, \dots, x_n)\right] + \left[f(y_1, y_2, x_3, \dots, x_n) - f(y_1, x_2, x_3, \dots, x_n)\right] + \cdots + \left[f(y_1, \dots, y_n) - f(y_1, \dots, y_{n-1}, x_n)\right] = \sum_{i=1}^n \left[f(y_1, \dots, y_i, x_{i+1}, \dots, x_n) - f(y_1, \dots, y_{i-1}, x_i, \dots, x_n)\right]$$
Proof "If". Suppose that $f$ has increasing differences. Let $x, y \in I$. By (20.1), we can set $h = x \vee y - x = y - x \wedge y \ge 0$. Then
$$\begin{aligned}
f(x \vee y) - f(x) &= f(x + h) - f(x)\\
&= \sum_{i=1}^n \left[f(x_1 + h_1, \dots, x_i + h_i, x_{i+1}, \dots, x_n) - f(x_1 + h_1, \dots, x_{i-1} + h_{i-1}, x_i, \dots, x_n)\right]\\
&= \sum_{i=1}^n \left[f^{x_i+h_i}(x_1 + h_1, \dots, x_{i-1} + h_{i-1}, x_{i+1}, \dots, x_n) - f^{x_i}(x_1 + h_1, \dots, x_{i-1} + h_{i-1}, x_{i+1}, \dots, x_n)\right]\\
&\ge \sum_{i=1}^n \left[f^{x_i+h_i}(y_1, \dots, y_{i-1}, y_{i+1} - h_{i+1}, \dots, y_n - h_n) - f^{x_i}(y_1, \dots, y_{i-1}, y_{i+1} - h_{i+1}, \dots, y_n - h_n)\right]\\
&= \sum_{i=1}^n \left[f(y_1, \dots, y_{i-1}, x_i + h_i, y_{i+1} - h_{i+1}, \dots, y_n - h_n) - f(y_1, \dots, y_{i-1}, x_i, y_{i+1} - h_{i+1}, \dots, y_n - h_n)\right]\\
&= \sum_{i=1}^n \left[f^{y_1,\dots,y_{i-1},y_{i+1}-h_{i+1},\dots,y_n-h_n}(x_i + h_i) - f^{y_1,\dots,y_{i-1},y_{i+1}-h_{i+1},\dots,y_n-h_n}(x_i)\right]\\
&\ge \sum_{i=1}^n \left[f^{y_1,\dots,y_{i-1},y_{i+1}-h_{i+1},\dots,y_n-h_n}(y_i) - f^{y_1,\dots,y_{i-1},y_{i+1}-h_{i+1},\dots,y_n-h_n}(y_i - h_i)\right]\\
&= \sum_{i=1}^n \left[f(y_1, \dots, y_i, y_{i+1} - h_{i+1}, \dots, y_n - h_n) - f(y_1, \dots, y_{i-1}, y_i - h_i, \dots, y_n - h_n)\right]\\
&= f(y) - f(y - h) = f(y) - f(x \wedge y)
\end{aligned}$$
where the first inequality follows from increasing differences, while the second one holds because a function of a single variable, like the section $f^{y_1,\dots,y_{i-1},y_{i+1}-h_{i+1},\dots,y_n-h_n}$, is trivially supermodular.
"Only if". Suppose that $f$ is supermodular. In view of Proposition 935, it is enough to show that (20.6) holds. Let $y = (x_i, x_j + h_j, x_{-ij}) \in \mathbb{R}^n$ and $z = (x_i + h_i, x_j, x_{-ij}) \in \mathbb{R}^n$, so that $z \vee y = (x_i + h_i, x_j + h_j, x_{-ij})$ and $z \wedge y = x$. By the supermodularity of $f$, we then have
$$f^{x_{-ij}}(x_i + h_i, x_j + h_j) + f^{x_{-ij}}(x_i, x_j) = f(x_i + h_i, x_j + h_j, x_{-ij}) + f(x) = f(z \vee y) + f(z \wedge y) \ge f(z) + f(y) = f(x_i + h_i, x_j, x_{-ij}) + f(x_i, x_j + h_j, x_{-ij}) = f^{x_{-ij}}(x_i + h_i, x_j) + f^{x_{-ij}}(x_i, x_j + h_j)$$
as desired.
For production functions, this means that, under constant returns to scale, complemen-
tarity implies concavity.
Proof We only prove the result when $f$ is twice differentiable on $\mathbb{R}^n_{++}$. Let $x \in \mathbb{R}^n_{++}$ and $y \in \mathbb{R}^n$. From
$$\left(\frac{y_i}{x_i} - \frac{y_j}{x_j}\right)^2 = \frac{y_i^2}{x_i^2} + \frac{y_j^2}{x_j^2} - 2\frac{y_i}{x_i}\frac{y_j}{x_j}$$
it follows that
$$y_iy_j = \frac{1}{2}\frac{x_j}{x_i}y_i^2 + \frac{1}{2}\frac{x_i}{x_j}y_j^2 - \frac{1}{2}x_ix_j\left(\frac{y_i}{x_i} - \frac{y_j}{x_j}\right)^2$$
So,
$$\sum_{1\le i,j\le n}\frac{\partial^2 f(x)}{\partial x_i\partial x_j}y_iy_j = \sum_{i=1}^n \frac{y_i^2}{x_i}\left(\sum_{j=1}^n \frac{\partial^2 f(x)}{\partial x_i\partial x_j}x_j\right) - \frac{1}{2}\sum_{1\le i,j\le n}\frac{\partial^2 f(x)}{\partial x_i\partial x_j}x_ix_j\left(\frac{y_i}{x_i} - \frac{y_j}{x_j}\right)^2$$
Since $f$ is positively homogeneous, its partial derivatives are homogeneous of degree $0$, so by Euler's formula
$$\sum_{j=1}^n \frac{\partial^2 f(x)}{\partial x_i\partial x_j}x_j = 0 \qquad \forall x \in \mathbb{R}^n_{++}$$
Hence,
$$\sum_{1\le i,j\le n}\frac{\partial^2 f(x)}{\partial x_i\partial x_j}y_iy_j = -\frac{1}{2}\sum_{1\le i,j\le n}\frac{\partial^2 f(x)}{\partial x_i\partial x_j}x_ix_j\left(\frac{y_i}{x_i} - \frac{y_j}{x_j}\right)^2 = -\frac{1}{2}\sum_{1\le i\ne j\le n}\frac{\partial^2 f(x)}{\partial x_i\partial x_j}x_ix_j\left(\frac{y_i}{x_i} - \frac{y_j}{x_j}\right)^2 \le 0$$
where the last inequality follows from (20.7) and Theorem 938. The Hessian matrix of $f$ is thus negative semi-definite for all $x \in \mathbb{R}^n_{++}$ and so $f$ is concave on $\mathbb{R}^n_{++}$. The reader can check that the converse holds when $n = 2$.
Example 940 Let $f : \mathbb{R}^2_+ \to \mathbb{R}$ be the positively homogeneous function defined by $f(x) = \left(x_1^{\alpha_1}x_2^{\alpha_2}\right)^{\frac{1}{\alpha_1+\alpha_2}}$, with $\alpha_1, \alpha_2 > 0$. It is supermodular (why?), so it is concave by Choquet's Theorem. N
A similar result holds for translation invariant functions (we omit the proof of this note-
worthy result).
A strictly positive function $f : C \to (0,\infty)$, defined on a convex set $C$ of $\mathbb{R}^n$, is said to be log-convex if
$$f(\alpha x + (1-\alpha)y) \le [f(x)]^\alpha\,[f(y)]^{1-\alpha}$$
for every $x, y \in C$ and $\alpha \in [0,1]$, and it is said to be log-concave if the inequality is reversed. In other words, $f$ is log-convex (log-concave) if and only if $\log f$ is convex (concave).

Proof We prove the convex version, the concave one being similar. "If". Let $\log f$ be convex. In view of Proposition 46, we have
$$f(\alpha x + (1-\alpha)y) = e^{\log f(\alpha x + (1-\alpha)y)} \le e^{\alpha\log f(x) + (1-\alpha)\log f(y)} = [f(x)]^\alpha\,[f(y)]^{1-\alpha}$$
as desired.
Example 944 (i) The function $f : \mathbb{R} \to (0,\infty)$ given by $f(x) = e^{x^2}$ is log-convex. (ii) The Gaussian function $f : \mathbb{R} \to (0,\infty)$ defined by $f(x) = e^{-x^2}$ is log-concave. (iii) The exponential function is both log-concave and log-convex. N
Log-convexity is much better behaved than log-concavity, as the next result and example
show. They are far from being dual notions.
Proposition 945 (i) Log-convex functions are convex. (ii) Concave functions are log-concave functions, which in turn are quasi-concave.

Proof (i) Let $f$ be log-convex. Since $\log f$ is convex, the result follows from the convex version of Proposition 844-(i) because we can write $f = e^{\log f}$. (ii) Obvious.
Example 946 The quadratic function $f : (0,\infty) \to (0,\infty)$ defined by $f(x) = x^2$ is, at the same time, strictly convex and log-concave. Indeed, in view of the last lemma, it is enough to note that $\log f(x) = 2\log x$ is concave. So, the converse of point (i) of the last proposition fails (there exist convex functions that are not log-convex), while point (ii) is all we can say about log-concave functions (they can even be strictly convex). N
It is easy to check that the product of log-convex functions is log-convex, as well as that
the product of log-concave functions is log-concave. Addition, instead, does not preserve
log-concavity.
For instance, the functions $e^{-x}$ and $e^{-2x}$ are log-concave (indeed, log-affine), yet their sum is not:
$$\frac{d^2}{dx^2}\log\left(e^{-x} + e^{-2x}\right) = \frac{e^{-x}}{(1 + e^{-x})^2} > 0$$
As further evidence of the much better behavior of log-convexity, we have the following remarkable result, which shows that addition preserves log-convexity (we omit the proof).
Example 949 Given $n$ strictly positive scalars $t_i > 0$ and a strictly positive function $\varphi : (0,\infty) \to (0,\infty)$, define $f : C \to (0,\infty)$ by
$$f(x) = \sum_{i=1}^n \varphi(t_i)\,t_i^x$$
where $C$ is any interval of the real line, bounded or not. By Artin's Theorem, $f$ is log-convex. Indeed, each function $\varphi(t_i)t_i^x$ is log-convex in $x$ because $\log\varphi(t_i)t_i^x = \log\varphi(t_i) + x\log t_i$ is affine in $x$.
An integral version of Artin's Theorem actually permits us to conclude that if $\varphi$ is continuous, then the function $f : C \to (0,\infty)$ defined by
$$f(x) = \int_0^\infty \varphi(t)\,t^{x-1}\,dt$$
is log-convex (provided the improper integrals are well-defined for all $x \in C$). In this regard, note that the function $\varphi(t)t^{x-1}$ is log-convex in $x$ since $\log\varphi(t)t^{x-1} = \log\varphi(t) + (x-1)\log t$ is affine in $x$. In the special case $\varphi(t) = e^{-t}$ and $C = (0,\infty)$, the function $f$ is the classic gamma function
$$\Gamma(x) = \int_0^\infty t^{x-1}e^{-t}\,dt$$
We will consider this log-convex function later in the book (Chapter 30). N
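A numerical spot-check of this log-convexity (an illustrative sketch; it uses SciPy's gammaln, the logarithm of the gamma function, and arbitrary sampling ranges):

```python
import numpy as np
from scipy.special import gammaln  # log Gamma(x), numerically stable

# Log-convexity of Gamma on (0, oo): for x, y > 0 and a in [0, 1],
# log Gamma(a x + (1-a) y) <= a log Gamma(x) + (1-a) log Gamma(y).
rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.uniform(0.05, 10.0, size=2)
    a = rng.uniform()
    lhs = gammaln(a * x + (1 - a) * y)
    rhs = a * gammaln(x) + (1 - a) * gammaln(y)
    assert lhs <= rhs + 1e-10
print("log-convexity of the gamma function verified on random samples")
```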
Chapter 21
Correspondences
Example 950 (i) The correspondence $\varphi : \mathbb{R} \rightrightarrows \mathbb{R}$ given by $\varphi(x) = [-|x|, |x|]$ associates with each scalar $x$ the interval $[-|x|, |x|]$. For instance, $\varphi(1) = \varphi(-1) = [-1,1]$ and $\varphi(0) = \{0\}$.
(ii) The budget correspondence $B : \mathbb{R}^n_+ \times \mathbb{R}_+ \rightrightarrows \mathbb{R}^n_+$ defined by $B(p,w) = \{x \in \mathbb{R}^n_+ : p \cdot x \le w\}$ associates with each pair $(p,w)$ of prices and income the corresponding budget set.
(iii) Given a concave function $f : \mathbb{R}^n \to \mathbb{R}$, the superdifferential correspondence $\partial f : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ has as image $\partial f(x)$ the superdifferential of $f$ at $x$ (cf. Proposition 1524 later in the book). The superdifferential correspondence generalizes, for concave functions, the derivative operator $\nabla f : \mathbb{R}^n \to \mathbb{R}^n$ defined in (27.6).
(iv) Let $f : X \to Y$ be a function between any two sets $X$ and $Y$. The inverse correspondence $f^{-1} : \operatorname{Im} f \rightrightarrows X$ is defined by $f^{-1}(y) = \{x \in X : f(x) = y\}$. If $f$ is injective, we get back the inverse function $f^{-1} : \operatorname{Im} f \to X$. For instance, if $f : \mathbb{R} \to \mathbb{R}$ is the quadratic function $f(x) = x^2$, then $\operatorname{Im} f = [0,\infty)$ and so the inverse correspondence $f^{-1} : [0,\infty) \rightrightarrows \mathbb{R}$ is defined by
$$f^{-1}(y) = \{\sqrt{y}, -\sqrt{y}\}$$
for all $y \ge 0$. Recall that in Example 179 we argued that this rule does not define a function since, to each strictly positive scalar, it associates two elements of the codomain, i.e., its positive and negative square roots.
(v) All correspondences considered in this example are viable. Yet, in the last point we can equivalently write $f^{-1} : Y \rightrightarrows X$ and then say that $\operatorname{dom} f^{-1} = \operatorname{Im} f$. For instance, the inverse of the quadratic function can be equivalently written as $f^{-1} : \mathbb{R} \rightrightarrows \mathbb{R}$ with
$$f^{-1}(y) = \begin{cases} \{\sqrt{y}, -\sqrt{y}\} & \text{if } y \ge 0 \\ \emptyset & \text{else} \end{cases}$$
and $\operatorname{dom} f^{-1} = [0,\infty)$. Whether to specify right away the domain, as we did when writing $f^{-1} : [0,\infty) \rightrightarrows \mathbb{R}$, or not, as we just did by writing $f^{-1} : \mathbb{R} \rightrightarrows \mathbb{R}$, is purely a matter of convenience and depends on the problem at hand. N
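In computational terms, a correspondence is just a map returning a set. A minimal sketch (illustrative, not part of the text) of the inverse correspondence of the quadratic function, written on the whole real line as in point (v):

```python
import math

def inverse_square(y):
    # inverse correspondence of f(x) = x^2 on all of R:
    # empty image for y < 0, {-sqrt(y), sqrt(y)} otherwise
    if y < 0:
        return set()
    r = math.sqrt(y)
    return {-r, r}

print(inverse_square(4.0))   # {-2.0, 2.0}
print(inverse_square(0.0))   # the singleton {0.0} (since -0.0 == 0.0)
print(inverse_square(-1.0))  # set(): y outside dom f^-1
```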
The graph $\operatorname{Gr}\varphi$ of a viable correspondence $\varphi : X \rightrightarrows Y$ is the set
$$\operatorname{Gr}\varphi = \{(x,y) \in X \times Y : y \in \varphi(x)\}$$
Like the graph of a function, the graph of a correspondence is a subset of $X \times Y$. If $\varphi$ is a function, we get back the notion of graph of a function: $\operatorname{Gr}\varphi = \{(x,y) \in X \times Y : y = \varphi(x)\}$. Indeed, the condition $y \in \varphi(x)$ reduces to $y = \varphi(x)$ when each image $\varphi(x)$ is a singleton.
Example 951 (i) The graph of the correspondence $\varphi : \mathbb{R} \rightrightarrows \mathbb{R}$ given by $\varphi(x) = [-|x|, |x|]$ is $\operatorname{Gr}\varphi = \{(x,y) \in \mathbb{R}^2 : -|x| \le y \le |x|\}$. Graphically, it is the region between the lines $y = x$ and $y = -x$.
From now on we consider viable correspondences $\varphi : A \rightrightarrows \mathbb{R}^m$ that have as domain a subset $A$ of $\mathbb{R}^n$ and as codomain $\mathbb{R}^m$. We say that such a $\varphi$ is compact-valued if each image $\varphi(x)$ is a compact set, and convex-valued if each image $\varphi(x)$ is a convex set.
Functions are, trivially, both compact-valued and convex-valued because singletons are compact convex sets. Let us see an important economic example.

Example 952 Suppose that the consumption set $A$ is both closed and convex, say it is $\mathbb{R}^n_+$. Then, the budget correspondence is convex-valued, as well as compact-valued if $p \gg 0$ and $w > 0$, that is, when restricted to $\mathbb{R}^n_{++} \times \mathbb{R}_{++}$ (cf. Propositions 991 and 992). N
The converse implications are false: closedness and convexity of the graph of $\varphi$ are significantly stronger assumptions than closedness and convexity of the images $\varphi(x)$. This is best seen by considering scalar functions, as we show next.
Example 953 (i) Consider the function $f : \mathbb{R} \to \mathbb{R}$ defined by
$$f(x) = \begin{cases} x & \text{if } x < 0 \\ 1 & \text{if } x \ge 0 \end{cases}$$
viewed as a singleton-valued correspondence. The lack of convexity of its graph is obvious. To see that $\operatorname{Gr} f$ is not closed, observe that the origin is a boundary point that does not belong to $\operatorname{Gr} f$.
(ii) A scalar function $f : \mathbb{R} \to \mathbb{R}$ has convex graph if and only if it is affine (i.e., its graph is a straight line). The "if" is obvious. As to the "only if," suppose that $\operatorname{Gr} f \subseteq \mathbb{R}^2$ is convex. Given any $x, y \in \mathbb{R}$ and any $\alpha \in [0,1]$, then $(\alpha x + (1-\alpha)y, \alpha f(x) + (1-\alpha)f(y)) \in \operatorname{Gr} f$, that is, $f(\alpha x + (1-\alpha)y) = \alpha f(x) + (1-\alpha)f(y)$, proving that $f$ is affine. By Proposition 820, this implies that there exist $m, q \in \mathbb{R}$ such that $f(x) = mx + q$. We conclude that all scalar functions that are not affine are convex-valued but do not have convex graphs. N
Example 954 The budget correspondence is bounded if the consumption set $A$ is bounded. Indeed, by definition $B(p,w) \subseteq A$ for all $(p,w) \in \mathbb{R}^n_+ \times \mathbb{R}_+$. N
21.3 Hemicontinuity
There are several notions of continuity for correspondences. For bounded correspondences,
the main class of correspondences for which continuity will be needed (cf. Section 41.4), the
following notions are adequate.
Formally, let $x_n \to 1$ and $y \in \varphi(1) = \{1/2\}$, that is, $y = 1/2$. If we take, for instance, $y_n = 1/2 \in \varphi(x_n)$ for all $n$, we have $y_n \to y$. In contrast, $\varphi$ is not upper hemicontinuous at $x = 1$ (where an "abrupt shrink" in the graph occurs). For example, consider the sequences $x_n = 1 - 1/n$ and $y_n = 1/4$. It holds that $x_n \to 1$ and $y_n \in \varphi(x_n)$, but $y_n$ trivially converges to $1/4 \notin \varphi(1) = \{1/2\}$. Finally, $\varphi$ is easily seen to be continuous on $[0,1)$. N
Formally, if $x_n \to 1$, $y_n \to y$ and $y_n \in \varphi(x_n) = [1,2]$, then $y \in [1,2] \subseteq \varphi(1)$. In contrast, $\varphi$ is not lower hemicontinuous at $x = 1$ (where an "abrupt dilation" in the graph occurs). For example, consider the sequence $x_n = 1 - 1/n$ and $y = 3$. It holds that $x_n \to 1$ and $y \in \varphi(1)$, but there is no sequence $\{y_n\}$ with $y_n \in \varphi(x_n)$ that converges to $y$. Finally, $\varphi$ is easily seen to be continuous on $[0,1)$. N
The next two results further clarify the nature of upper hemicontinuous correspondences.
Proof Suppose $\operatorname{Gr}\varphi$ is closed. Let $x_n \to x$, $y_n \to y$ and $y_n \in \varphi(x_n)$. Since $(x_n, y_n) \to (x,y)$ and $\operatorname{Gr}\varphi$ is a closed set, we have $(x,y) \in \operatorname{Gr}\varphi$, yielding that $y \in \varphi(x)$. We conclude that $\varphi$ is upper hemicontinuous.

As to the converse, assume that the domain $A$ is closed and $\varphi : A \rightrightarrows \mathbb{R}^m$ is upper hemicontinuous. Let $\{(x_n, y_n)\} \subseteq \operatorname{Gr}\varphi$ be such that $(x_n, y_n) \to (x,y) \in \mathbb{R}^n \times \mathbb{R}^m$. To show that $\operatorname{Gr}\varphi$ is closed, we need to show that $(x,y) \in \operatorname{Gr}\varphi$. Since $A$ is closed and $x_n \to x$, we have $x \in A$. By construction, we also have that $y_n \to y$ and $y_n \in \varphi(x_n)$ for every $n$. Since $\varphi$ is upper hemicontinuous, we have $y \in \varphi(x)$, proving that $(x,y) \in \operatorname{Gr}\varphi$ and that $\operatorname{Gr}\varphi$ is closed.
Proof Let $x \in A$. We need to show that $\varphi(x)$ is a closed set. Consider $\{y_n\} \subseteq \varphi(x)$ such that $y_n \to y \in \mathbb{R}^m$. Define $\{x_n\} \subseteq A$ by $x_n = x$ for every $n$. It follows that $x_n \to x$, $y_n \to y$ and $y_n \in \varphi(x_n)$ for every $n$. Since $\varphi$ is upper hemicontinuous, we can conclude that $y \in \varphi(x)$, yielding that $\varphi(x)$ is closed.
For bounded functions the two notions of hemicontinuity are equivalent to continuity.
(i) $f$ is continuous at $x$;

Proof First observe that, since $f$ is a function, $y = f(x)$ amounts to $y \in f(x)$ when we look at the function $f$ as a single-valued correspondence.

(i) implies (ii). Let $x_n \to x$ and $y = f(x)$. Since $f$ is a function, we can only choose $\{y_n\}$ with $y_n = f(x_n)$. By continuity, $y_n = f(x_n) \to f(x) = y$, so $f$ is lower hemicontinuous at $x$.

(ii) implies (iii). Let $x_n \to x$ and $\{y_n\}$ be such that $y_n \in f(x_n)$ and $y_n \to y$. Since $f$ is a function, we can only choose $\{y_n\}$ with $y_n = f(x_n)$. Since $f$ is lower hemicontinuous at $x$, we have $f(x_n) \to f(x)$, so $y = f(x)$, that is, $y \in f(x)$: $f$ is upper hemicontinuous at $x$.
Definition 962 Given any two sets $A$ and $B$ in $\mathbb{R}^n$, their sum $A + B$ is the set in $\mathbb{R}^n$ given by
$$A + B = \{x + y : x \in A \text{ and } y \in B\}$$
Example 963 (i) The sum of the unit square $A = [0,1] \times [0,1]$ and of the singleton $B = \{(3,3)\}$ is the square $A + B = [3,4] \times [3,4]$. (ii) The sum of the squares $A = [0,1] \times [0,1]$ and $B = [2,3] \times [2,3]$ is the square $A + B = [2,4] \times [2,4]$. Note that $B \subseteq A + B$ since $0 \in A$. (iii) The sum of the sides $A = \{(x_1,x_2) \in [0,1] \times [0,1] : x_1 = 0\}$ and $B = \{(x_1,x_2) \in [0,1] \times [0,1] : x_2 = 0\}$ of the unit square is the unit square itself, i.e., $A + B = [0,1] \times [0,1]$. (iv) The sum of the vertical axis $A = \{(x_1,x_2) \in \mathbb{R}^2 : x_1 = 0\}$ and the horizontal axis $B = \{(x_1,x_2) \in \mathbb{R}^2 : x_2 = 0\}$ is the entire plane, i.e., $A + B = \mathbb{R}^2$. N
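For finite sets, the sum can be computed by brute force. A small sketch (illustrative; for infinite sets such as the squares above one would reason on extreme points instead, so here we only sum the four corners of the unit square with a singleton):

```python
def minkowski_sum(A, B):
    # A + B = { a + b : a in A, b in B } for finite sets of 2D points
    return {(a1 + b1, a2 + b2) for (a1, a2) in A for (b1, b2) in B}

A = {(0, 0), (0, 1), (1, 0), (1, 1)}  # corners of [0,1] x [0,1]
B = {(3, 3)}
print(sorted(minkowski_sum(A, B)))    # corners of [3,4] x [3,4]
```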
Proof (i) Let $A$ and $B$ be convex. Let $v, w \in A + B$ and $\alpha \in [0,1]$. By definition, there exist $x', x'' \in A$ and $y', y'' \in B$ such that $v = x' + y'$ and $w = x'' + y''$. Since $A$ and $B$ are convex, we have that $\alpha x' + (1-\alpha)x'' \in A$ and $\alpha y' + (1-\alpha)y'' \in B$. This implies that
$$\alpha v + (1-\alpha)w = \left(\alpha x' + (1-\alpha)x''\right) + \left(\alpha y' + (1-\alpha)y''\right) \in A + B$$
so $A + B$ is convex.

The notion of sum readily extends to any finite collection of $n$ sets $A_i$ in $\mathbb{R}^n$. Properties (i) and (iii) just established for the sum of two sets continue to hold for sums of $n$ sets.
Definition 965 Given a scalar $\lambda \in \mathbb{R}$ and a set $A$ in $\mathbb{R}^n$, their product $\lambda A$ is the set in $\mathbb{R}^n$ given by $\lambda A = \{\lambda x : x \in A\}$.

Example 966 The product of the unit square $A = [0,1] \times [0,1]$ and of $\lambda = 2$ is the square $2A = [0,2] \times [0,2]$. N
Definition 967 Given any two correspondences $\varphi, \psi : A \rightrightarrows \mathbb{R}^m$, their sum $\varphi + \psi : A \rightrightarrows \mathbb{R}^m$ is the correspondence such that $(\varphi + \psi)(x) = \varphi(x) + \psi(x)$ for every $x \in A$.

(i) if $\varphi$ and $\psi$ are bounded and upper hemicontinuous at a point, their sum $\varphi + \psi$ is upper hemicontinuous at that point;

(ii) if $\varphi$ and $\psi$ are lower hemicontinuous at a point, their sum $\varphi + \psi$ is lower hemicontinuous at that point.
Proof It is enough to consider the case $\alpha = \beta = 1$, as the general case then easily follows.

(i) Suppose that at $x$ we have $x_n \to x$, $y_n \to y$ and $y_n \in (\varphi + \psi)(x_n)$. We want to show that $y \in (\varphi + \psi)(x)$. By definition, for each $n$ there exist $y_n' \in \varphi(x_n)$ and $y_n'' \in \psi(x_n)$ such that $y_n = y_n' + y_n''$. Since $\varphi$ and $\psi$ are bounded, there exist compact sets $K_\varphi$ and $K_\psi$ such that $\{y_n'\} \subseteq K_\varphi$ and $\{y_n''\} \subseteq K_\psi$. Hence, both sequences are bounded, so by the Bolzano-Weierstrass Theorem there exist subsequences $\{y_{n_k}'\}$ and $\{y_{n_k}''\}$ that converge to some points $y' \in \mathbb{R}^m$ and $y'' \in \mathbb{R}^m$, respectively. Since $y_{n_k}' \in \varphi(x_{n_k})$ and $y_{n_k}'' \in \psi(x_{n_k})$ for every $k$ and $x_{n_k} \to x$, we then have $y' \in \varphi(x)$ and $y'' \in \psi(x)$ because $\varphi$ and $\psi$ are upper hemicontinuous at $x$. We conclude that $y = \lim_{k\to\infty} y_{n_k} = \lim_{k\to\infty}\left(y_{n_k}' + y_{n_k}''\right) = y' + y'' \in (\varphi + \psi)(x)$, as desired.

(ii) Suppose that at $x$ we have $x_n \to x$ and $y \in (\varphi + \psi)(x)$. We want to show that there exist elements $y_n \in (\varphi + \psi)(x_n)$ such that $y_n \to y$. By definition, there exist $y' \in \varphi(x)$ and $y'' \in \psi(x)$ such that $y = y' + y''$. Since $\varphi$ and $\psi$ are lower hemicontinuous, there exist elements $y_n' \in \varphi(x_n)$ and $y_n'' \in \psi(x_n)$ such that $y_n' \to y'$ and $y_n'' \to y''$. Setting $y_n = y_n' + y_n''$, we then have $y_n \in (\varphi + \psi)(x_n)$ and $y_n = y_n' + y_n'' \to y' + y'' = y$, as desired.
Part V

Optima
Chapter 22
Optimization problems
22.1 Generalities
Consider the function f : R ! R given by f (x) = 1 x2 , with graph:
[Figure: graph of $f(x) = 1 - x^2$.]
It is immediate to see that f attains its maximum value, equal to 1, at the point x = 0, that
is, at the origin (Example 251). On the other hand, there is no point at which f attains a
minimum value.
Suppose that, for some reason, we are interested in the behavior of $f$ only on the interval $[1,2]$, not on the entire domain $\mathbb{R}$. Then $f$ has $0$ as maximum value, attained at the point $x = 1$, while it has $-3$ as minimum value, attained at the point $x = 2$. Graphically:
[Figure: graph of $f$ on $[1,2]$: maximum value $0$ at $x = 1$, minimum value $-3$ at $x = 2$.]
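As a quick computational aside (a sketch under the setup just described, using SciPy), the restricted problem can be solved numerically; note how maximization is performed by minimizing $-f$, anticipating the duality formalized later in this section:

```python
from scipy.optimize import minimize_scalar

f = lambda x: 1 - x**2

# maximize f on [1, 2] by minimizing -f on the same interval
res = minimize_scalar(lambda x: -f(x), bounds=(1, 2), method="bounded")
print(res.x, f(res.x))  # maximizer numerically close to 1, maximum value ~0
```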
(i) the distinction between maximum (minimum) value and maximizers (minimizers): a maximizer is an element of the domain at which the function reaches its maximum value, which is the unique element of the codomain that is the image of all maximizers;¹

(ii) the importance of the subset of the domain in which we are interested in establishing the existence of maximizers or minimizers.
These two observations lead to the next definition, in which we consider an objective function $f$ and a subset $C$ of its domain, called choice set.
Definition 969 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be a function and $C$ a subset of $A$. An element $\hat{x} \in C$ is a (global) maximizer of $f$ on $C$ if
$$f(\hat{x}) \ge f(x) \qquad \forall x \in C \tag{22.1}$$
The value $f(\hat{x})$ of the function at $\hat{x}$ is called (global) maximum value of $f$ on $C$.

In the special case $C = A$, when the choice set is the entire domain, the point $\hat{x}$ is called maximizer, without further specification (in this way, we recover the definition of Section 6.6).
In the initial example we considered two cases:

(i) in the first case, $C$ was the entire domain, that is, $C = \mathbb{R}$, and we had $\hat{x} = 0$ and $f(\hat{x}) = \max f(\mathbb{R}) = 1$;

(ii) in the second case, $C$ was the interval $[1,2]$ and we had $\hat{x} = 1$ and $f(\hat{x}) = \max f([1,2]) = 0$.
1
As already anticipated in Section 6.6.
The maximum value of the objective function $f$ on the choice set $C$ is, thus, nothing but the maximum of the set $f(C)$ in the real line, i.e.,²
$$f(\hat{x}) = \max f(C)$$
By Proposition 36, the maximum value is unique. We denote this unique value by
$$\max_{x \in C} f(x)$$
The maximizers may, instead, fail to be unique. Their set, called solution set, is denoted by $\arg\max_{x \in C} f(x)$, that is,³
$$\arg\max_{x \in C} f(x) = \{x \in C : f(x) \ge f(y) \text{ for all } y \in C\}$$
For a function with graph

[Figure: graph of a function whose maximum value $0$ is attained at every point of $[-1,1]$.]
we have $\max_{x \in \mathbb{R}} f(x) = 0$ and $\arg\max_{x \in \mathbb{R}} f(x) = [-1,1]$. So, the set of maximizers is the entire interval $[-1,1]$. On the other hand, if we restrict ourselves to $[1,\infty)$, we have $\max_{x \in [1,\infty)} f(x) = 0$ and $\arg\max_{x \in [1,\infty)} f(x) = \{1\}$, so $1$ is the unique maximizer of $f$ on $[1,\infty)$. Graphically:

2 Recall that $f(C) = \{f(x) : x \in C\} \subseteq \mathbb{R}$ is the set (6.1) of all images of the points that belong to $C$.
3 A convenient shorthand notation is $\max_C f$ and $\arg\max_C f$. Though we will not use it, readers may want to familiarize themselves with it by trying to use it by themselves.
[Figure: graph of $f$ restricted to $[1,\infty)$, where $1$ is the unique maximizer.]
If inequality (22.1) holds strictly for every $x \in C$ with $x \ne \hat{x}$, that is,
$$f(\hat{x}) > f(x) \qquad \forall x \in C, \; x \ne \hat{x}$$
the maximizer $\hat{x}$ is called strict.
Until now we have talked about maximizers, but analogous considerations hold for minimizers. For example, in Definition 969 an element $\hat{x} \in C$ is a (global) minimizer of $f$ on $C$ if $f(\hat{x}) \le f(x)$ for every $x \in C$, with (global) minimum value $f(\hat{x}) = \min f(C)$, denoted by $\min_{x \in C} f(x)$. Maximizing and minimizing are actually two sides of the same coin, as formalized by the next result. Its obvious proof is based on the observation that
$$f(x) \ge f(y) \iff -f(x) \le -f(y)$$
for all $x, y \in A$.

Proposition 972 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be a real-valued function and $C$ a subset of $A$. A point $\hat{x} \in C$ is a minimizer of $f$ on $C$ if and only if it is a maximizer of $-f$ on $C$, and it is a maximizer of $f$ on $C$ if and only if it is a minimizer of $-f$ on $C$. In particular,
$$\min_{x \in C} f(x) = -\max_{x \in C}(-f)(x) \quad \text{and} \quad \max_{x \in C} f(x) = -\min_{x \in C}(-f)(x)$$
For example, the minimizers of the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2 - 1$ are the maximizers of the function $(-f)(x) = 1 - x^2$ seen at the beginning of the section.
Thus, between maximizers and minimizers there is a natural duality that makes the re-
sults of one case a simple dual version of the other. From a mathematical viewpoint, the
choice of which of these two equivalent problems to study is only a question of convenience
bearing no conceptual relevance. Given their great importance in economic applications,
throughout the book we focus on the properties of maximizers, leaving the analogous prop-
erties for minimizers to the reader. In any case, the neutral term extremal refers to both
maximizers and minimizers, with optimum value referring to both maximum and minimum
values.
[Figure: graph of the objective function.]

Its maximum value is $1$:
$$f(x) \le 1 \qquad \forall x \in \mathbb{R}$$
Finally, $f$ is unbounded below and so has no minimizers; i.e., $\arg\min_{x \in \mathbb{R}} f(x) = \emptyset$. N
Since $f(x_1, x_2) = x_1^2 - 6x_1x_2 + 9x_2^2 + 3x_2^2 = (x_1 - 3x_2)^2 + 3x_2^2$, the function $f$ is the sum of two squares and so is positive. Since it assumes value $0$ only at the origin $\mathbf{0} = (0,0)$, we conclude that the origin is the unique minimizer of $f$ on $\mathbb{R}^2$. Its minimum value is the scalar $0$. In symbols:
$$\arg\min_{x \in \mathbb{R}^2} f(x) = \{\mathbf{0}\} \quad \text{and} \quad \min_{x \in \mathbb{R}^2} f(x) = 0$$
Finally, $f$ is unbounded above and so has no maximizers; i.e., $\arg\max_{x \in \mathbb{R}^2} f(x) = \emptyset$. N
Example 975 Let $f : \mathbb{R}^3 \to \mathbb{R}$ be given by $f(x) = e^{-x_1^2 - x_2^2 - x_3^2}$ for all $x = (x_1, x_2, x_3) \in \mathbb{R}^3$ and consider the optimization problem
$$\max_x f(x) \quad \text{sub } x \in \mathbb{R}^3$$
Since $0 < f(x_1, x_2, x_3) \le 1$ for all $(x_1, x_2, x_3) \in \mathbb{R}^3$ and $f(0,0,0) = 1$, the origin $\mathbf{0} = (0,0,0)$ is a maximizer of $f$ on $\mathbb{R}^3$. It is actually the unique maximizer (why?). The maximum value of $f$ on $\mathbb{R}^3$ is the scalar $1$. In symbols:
$$\arg\max_{x \in \mathbb{R}^3} f(x) = \{\mathbf{0}\} \quad \text{and} \quad \max_{x \in \mathbb{R}^3} f(x) = 1$$
However, $f$ does not have a minimizer (so $\arg\min_{x \in \mathbb{R}^3} f(x) = \emptyset$) because it never attains the infimum of its values, that is, $0$. N
Since $-1 \le \cos x \le 1$, all the points at which $f(x) = 1$ are maximizers and all the points at which $f(x) = -1$ are minimizers. The maximizers are, therefore, the points $2k\pi$ with $k \in \mathbb{Z}$ and the minimizers are the points $(2k+1)\pi$ with $k \in \mathbb{Z}$. The maximum and minimum values are the scalars $1$ and $-1$, respectively. In symbols:
$$\arg\max_{x \in \mathbb{R}} f(x) = \{2k\pi : k \in \mathbb{Z}\} \quad \text{and} \quad \arg\min_{x \in \mathbb{R}} f(x) = \{(2k+1)\pi : k \in \mathbb{Z}\}$$
as well as
$$\max_{x \in \mathbb{R}} f(x) = 1 \quad \text{and} \quad \min_{x \in \mathbb{R}} f(x) = -1$$
These maximizers and minimizers on $\mathbb{R}$ are not unique. However, if we consider a smaller choice set, such as $C = [0, 2\pi)$, we find that the unique maximizer is the origin $0$ and the unique minimizer is the point $\pi$. N
Example 977 For a constant function, all the points of the domain are simultaneously
maximizers and minimizers. Its constant value is simultaneously the maximum and minimum
value. N
O.R. (i) Definition 969 does not require the function to satisfy any special property; in particular, neither continuity nor differentiability is invoked. For example, the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = |x|$ attains its minimum value at the origin, where it is not differentiable. The function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} x + 1 & \text{if } x \le 1 \\ -x & \text{if } x > 1 \end{cases}$$
with graph

[Figure: graph of $f$, which jumps downward at $x = 1$.]

attains its maximum value at $x = 1$, a point at which it is not continuous. A function may even attain its maximum at an isolated point of its domain: for instance, a function with graph

[Figure: graph of a function whose domain has the isolated point $x = 2$.]

attains its maximum value at $\hat{x} = 2$, an isolated point of the domain $(-\infty, 1] \cup \{2\} \cup (4, +\infty)$ of $f$.
(ii) As we have already observed, the maximum value of $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ on $C \subseteq A$ is nothing but $\max f(C)$. It is a value actually attained by $f$, that is, there exists a point $\hat{x} \in C$ such that $f(\hat{x}) = \max f(C)$. We can, therefore, choose a point of $C$ at which to "attain" the maximum.

When the maximum value does not exist, the image set $f(C)$ might still have a finite supremum $\sup f(C)$. The unpleasant aspect is that there might well be no point in $C$ that attains such a value, that is, we might not be able to attain it. Pragmatically, this aspect is less negative than it might appear prima facie. Indeed, as Proposition 127 indicates, we can
choose a point at which $f$ is arbitrarily close to the sup. If $\sup f(C) = 48$, we will never be able to get exactly $48$, but we can get arbitrarily close to it: we can always choose a point at which the function has value $47.9$ and, if this is not enough, we can get a point at which $f$ takes value $47.999999999999$, and so on. Similar remarks hold for minimum values. H
and
$$\max_x (g \circ f)(x) \quad \text{sub } x \in C \tag{22.4}$$
Thus,
$$f(x) \ge f(y) \iff g(f(x)) \ge g(f(y)) \qquad \forall x, y \in A$$
Therefore,
$$f(\hat{x}) \ge f(x) \;\; \forall x \in C \iff (g \circ f)(\hat{x}) \ge (g \circ f)(x) \;\; \forall x \in C$$
Thus, two objective functions (here $f$ and $\tilde{f} = g \circ f$) are equivalent when one is a strictly increasing transformation of the other. Later in the chapter, we will comment more on this simple, yet conceptually important, result.
Let us now consider the case, important in economic applications (as we will soon see),
in which the objective function is strongly increasing.
Proof Let $\hat{x} \in \arg\max_{x \in C} f(x)$. We want to show that $\hat{x} \in \partial C$. Suppose, by contradiction, that $\hat{x} \notin \partial C$, i.e., $\hat{x}$ is an interior point of $C$. There exists, therefore, a neighborhood $B_\varepsilon(\hat{x})$ of $\hat{x}$ included in $C$. Set
$$\hat{x}_\varepsilon = \left(\hat{x}_1 + \frac{\varepsilon}{2}, \dots, \hat{x}_n + \frac{\varepsilon}{2}\right)$$
Clearly, $\hat{x}_\varepsilon \in B_\varepsilon(\hat{x})$ and $\hat{x} \le \hat{x}_\varepsilon$ with $\hat{x} \ne \hat{x}_\varepsilon$. Since $f$ is strongly increasing on $C$, we obtain $f(\hat{x}_\varepsilon) > f(\hat{x})$, which contradicts the optimality of $\hat{x}$. We conclude that $\hat{x} \in \partial C$.
The possible solutions of the optimization problem (22.2) are, thus, boundary points when the objective function is strongly increasing (a fortiori, when it is strictly increasing; cf. Proposition 225). With this kind of objective function, we can thus simplify problem (22.2) as follows:
$$\max_x f(x) \quad \text{sub } x \in \partial C$$
because $f(\hat{x}') \ge f(x)$ for all $x \in C'$, in particular for all $x \in C \subseteq C'$.
Larger choice sets always lead to higher maximum values of the objective function. In other terms, having more opportunities to choose from is never detrimental, whatever the form of the objective function. This simple monotonicity principle is often important. The basic economic principle that removing constraints on agents' choices can only benefit them is, indeed, formalized by this proposition.
Example 981 Recall the initial example in which we considered two different choice sets, $\mathbb{R}$ and $[1,2]$, for the function $f(x) = 1 - x^2$. We had $\max_{x \in [1,2]} f(x) = 0 < 1 = \max_{x \in \mathbb{R}} f(x)$, in accordance with the last proposition. N
In contrast, adding constraints is never beneficial and may even result in empty choice sets. For instance, suppose $C'$ is the set of all points $x = (x_1, x_2, x_3) \in \mathbb{R}^3$ such that
$$x_1 + x_2 + x_3 < 2 \tag{22.5}$$
$$x_1 - x_2 + x_3 \ge 2 \tag{22.6}$$
This choice set is not empty: for instance, the point $(0, -5, -3)$ belongs to it. Now, let $C$ be the choice set that satisfies these constraints as well as the additional one
$$x_2 \ge 0 \tag{22.7}$$
So, $C \subseteq C'$. Yet, too many constraints: the choice set $C$ is empty. Indeed, suppose by contradiction that $C \neq \emptyset$ and let $x \in C$. By (22.6) and (22.7), $x_1 + x_3 \ge 2 + x_2 \ge 2$. Along with (22.5), this implies $2 > x_1 + x_2 + x_3 \ge x_1 + x_3 \ge 2$. This contradiction shows that $C = \emptyset$.
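The same infeasibility can be checked mechanically. The sketch below (an illustration, not part of the text) feeds the three constraints to a linear-programming solver, approximating the strict inequality (22.5) with a small tolerance:

```python
import numpy as np
from scipy.optimize import linprog

# Feasibility check, constraints written as A_ub x <= b_ub:
#   x1 + x2 + x3 <= 2 - eps   (strict inequality (22.5), approximated)
#  -x1 + x2 - x3 <= -2        (i.e. x1 - x2 + x3 >= 2, constraint (22.6))
#            -x2 <= 0         (i.e. x2 >= 0, constraint (22.7))
eps = 1e-9
A_ub = [[1, 1, 1], [-1, 1, -1], [0, -1, 0]]
b_ub = [2 - eps, -2, 0]
res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * 3)
print(res.status, res.message)  # status 2: the problem is infeasible
```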
Optimization problems with concave objective functions and convex choice sets are all-
important in applications. We begin with a basic, yet remarkable, property of their solution
sets.7
Proposition 982 Let $C$ be a convex set in $\mathbb{R}^n$. If $f : C \to \mathbb{R}$ is concave, then $\arg\max_{x \in C} f(x)$ is convex.

Proof Let $\hat{x}_1, \hat{x}_2 \in \arg\max_{x \in C} f(x)$ and let $\alpha \in [0,1]$. We want to show that $\alpha\hat{x}_1 + (1-\alpha)\hat{x}_2 \in \arg\max_{x \in C} f(x)$. By concavity,
$$f(\alpha\hat{x}_1 + (1-\alpha)\hat{x}_2) \ge \alpha f(\hat{x}_1) + (1-\alpha)f(\hat{x}_2) = \alpha\max_{x \in C} f(x) + (1-\alpha)\max_{x \in C} f(x) = \max_{x \in C} f(x)$$
Therefore, $f(\alpha\hat{x}_1 + (1-\alpha)\hat{x}_2) = \max_{x \in C} f(x)$, i.e., $\alpha\hat{x}_1 + (1-\alpha)\hat{x}_2 \in \arg\max_{x \in C} f(x)$.
Since the solution set $\arg\max_{x \in C} f(x)$ is convex, there are three possibilities:

(i) $\arg\max_{x \in C} f(x)$ is empty: there exist no maximizers;

(ii) $\arg\max_{x \in C} f(x)$ is a singleton: there exists a unique maximizer;

(iii) $\arg\max_{x \in C} f(x)$ consists of infinitely many points: there exist infinitely many maximizers.⁸
7 To ease exposition, we consider a function $f$ defined directly on a convex choice set $C$. Of course, $f$ can be seen as the restriction of a function defined on a larger domain $A$ that includes $C$.
8 Indeed, if $\hat{x}_1$ and $\hat{x}_2$ are two distinct maximizers, all their convex combinations $\alpha\hat{x}_1 + (1-\alpha)\hat{x}_2$, as $\alpha$ varies in $[0,1]$, are still maximizers because of the convexity of $\arg\max_{x \in C} f(x)$.
Thus, under concavity we cannot have finitely many distinct solutions, like $3$ or $7$.

Example 983 (i) The logarithmic function $f : (0,\infty) \to \mathbb{R}$ defined by $f(x) = \log x$ is strictly concave. It is easy to see that it has no maximizers, that is, $\arg\max_{x>0} f(x) = \emptyset$.
(ii) The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = 1 - x^2$ is strictly concave and has a unique maximizer $\hat{x} = 0$, so that $\arg\max_{x \in \mathbb{R}} f(x) = \{0\}$.
(iii) Define $f : \mathbb{R} \to \mathbb{R}$ by
$$f(x) = \begin{cases} x & \text{if } x \le 1 \\ 1 & \text{if } x \in (1,2) \\ 3 - x & \text{if } x \ge 2 \end{cases}$$
with graph

[Figure: graph of $f$; every point of $[1,2]$ is a maximizer.]

Here $\arg\max_{x \in \mathbb{R}} f(x) = [1,2]$. N
The last function of this example, with infinitely many maximizers, is concave but not strictly concave. The next result shows that, indeed, strict concavity implies that maximizers, if they exist, are necessarily unique. In other words, for strictly concave functions the solution set is at most a singleton.
Proof Let $\hat{x}_1, \hat{x}_2 \in C$ be two maximizers. We want to show that $\hat{x}_1 = \hat{x}_2$. Suppose, by contradiction, that $\hat{x}_1 \neq \hat{x}_2$. Since $\hat{x}_1$ and $\hat{x}_2$ are maximizers, we have $f(\hat{x}_1) = f(\hat{x}_2) = \max_{x \in C} f(x)$. Set
$$z = \frac{1}{2}\hat{x}_1 + \frac{1}{2}\hat{x}_2$$
Since $C$ is convex, $z \in C$. Moreover, by strict concavity,
$$f(z) = f\left(\frac{1}{2}\hat{x}_1 + \frac{1}{2}\hat{x}_2\right) > \frac{1}{2}f(\hat{x}_1) + \frac{1}{2}f(\hat{x}_2) = \frac{1}{2}\max_{x \in C}f(x) + \frac{1}{2}\max_{x \in C}f(x) = \max_{x \in C}f(x) \tag{22.8}$$
which is a contradiction (no element of $C$ can have a strictly higher value than the maximum value). We conclude that $\hat{x}_1 = \hat{x}_2$, as desired.
In the last example, $f(x) = 1 - x^2$ is a strictly concave function with a unique maximizer $\hat{x} = 0$, while $f(x) = \log x$ is a strictly concave function that has no maximizers. The clause "at most" is, therefore, indispensable because, unfortunately, maximizers might not exist.

To have (at most) a unique maximizer is the key characteristic of strictly concave functions that motivates their widespread use in economic applications. Indeed, strict quasi-concavity is the simplest condition which guarantees the uniqueness of the maximizer, a key property for comparative statics exercises (as we remarked earlier in the chapter).
$$\max_x\; -x^2 \quad \text{sub } x \ge 0$$
which features a strictly concave objective function. Clearly, the unique solution is $\hat{x} = 0$. By plugging it into the original objective function, we get the maximum value $f(\hat{x}) = 1$.⁹ N
Example 986 Let $f : \mathbb{R}^2_{++} \to \mathbb{R}$ be defined by $f(x) = \log x_1 + \log x_2$. Consider the optimization problem
$$\max_x f(x) \quad \text{sub } x_1 + x_2 = 1$$
The objective function is strictly concave and the choice set $C = \{x \in \mathbb{R}^2_{++} : x_1 + x_2 = 1\}$ is convex. By Proposition 984, there is at most one solution $\hat{x}$. The problem is symmetric
9 By taking the strictly increasing transformation $\sqrt[3]{x} - 1$ rather than $\sqrt[3]{x}$ we get directly $-x^2$. A suitable choice of the transformation thus speeds up matters.
in $x_1$ and $x_2$, so it is natural to guess the symmetric solution $\hat{x} = (1/2, 1/2)$. To verify the guess, take any $y \in C$ with $y \ne \hat{x}$, so that $y_1 \ne y_2$. Since the logarithm is strictly concave,
$$f(y) - f(\hat{x}) = \log 2y_1 + \log 2y_2 = 2\left(\frac{1}{2}\log 2y_1 + \frac{1}{2}\log 2y_2\right) < 2\log(y_1 + y_2) = 2\log 1 = 0$$
So, $\hat{x}$ indeed uniquely solves the problem. Here the maximum value is $f(\hat{x}) = -\log 4$. N
The next three examples are a bit more complicated, but they are important in applications and show how some little thinking can save many calculations. For a given vector $\lambda = (\lambda_1, \dots, \lambda_n) \in \mathbb{R}^n_{++}$ and scalar $\beta > 0$, all these examples study, under different objective functions $f$, the optimization problem
$$\max_x f(x) \quad \text{sub } x \in C = \left\{x \in \mathbb{R}^n_+ : \sum_{i=1}^n \lambda_ix_i = \beta\right\} \tag{22.9}$$
Example 987 Consider problem (22.9) with a Cobb-Douglas objective function $f : \mathbb{R}^n_+ \to \mathbb{R}$ defined by
$$f(x) = \prod_{i=1}^n x_i^{a_i}$$
with $\sum_{i=1}^n a_i = 1$ and $a_i > 0$ for each $i$. It is easy to see that the maximizers belong to $\mathbb{R}^n_{++}$, that is, they have strictly positive components. Indeed, if $x$ lies on some axis of $\mathbb{R}^n$ (i.e., $x_i = 0$ for some $i$), then $f(x) = 0$. Since $f \ge 0$ on $C$, it is easy to see that such an $x$ cannot solve the problem. For this reason, in place of (22.9) we can consider the equivalent optimization problem
$$\max_x f(x) \quad \text{sub } x \in C \cap \mathbb{R}^n_{++} \tag{22.10}$$
We can do better: since $f > 0$ on $\mathbb{R}^n_{++}$, we can consider the logarithmic transformation $\tilde{f} = \log f$ of the objective function $f$, that is, the log-linear function $\tilde{f}(x) = \sum_{i=1}^n a_i\log x_i$. The problem
$$\max_x \tilde{f}(x) \quad \text{sub } x \in C \cap \mathbb{R}^n_{++} \tag{22.11}$$
is equivalent to the previous one by Proposition 978. It is, however, more tractable because of the log-linear form of the objective function.

Let us ponder over problem (22.11). It has a strictly concave objective function and a convex choice set $C \cap \mathbb{R}^n_{++}$. By Proposition 984, there is at most one solution $\hat{x}$. With this, suppose first that the coefficients $a_i$ and $\lambda_i$ are both equal among themselves, with $a_i = 1/n$ (because $\sum_{i=1}^n a_i = 1$) and $\lambda_i = 1$ for each $i$. The problem is then symmetric in each $x_i$, so it is natural to guess a symmetric solution $\hat{x}$, with $\hat{x}_1 = \cdots = \hat{x}_n$. Then, $\hat{x}_i = a_i\beta$ for each $i$ because of the constraint $\sum_{i=1}^n x_i = \beta$. If, instead, the coefficients differ, the asymmetry in the solutions should depend on the coefficients $\lambda_i$ and $a_i$ peculiar to each $x_i$. An (educated) guess is that
$$\hat{x} = \left(\frac{a_1\beta}{\lambda_1}, \dots, \frac{a_n\beta}{\lambda_n}\right) \tag{22.12}$$
Let us verify this guess. We have $\hat{x} \in C \cap \mathbb{R}^n_{++}$ because $\hat{x} \in \mathbb{R}^n_{++}$ and
$$\sum_{i=1}^n \lambda_i\hat{x}_i = \sum_{i=1}^n \lambda_i\frac{a_i\beta}{\lambda_i} = \beta\sum_{i=1}^n a_i = \beta$$
We now show that $\sum_{i=1}^n a_i\log y_i < \sum_{i=1}^n a_i\log\hat{x}_i$ for every $y \in C \cap \mathbb{R}^n_{++}$ with $y \neq \hat{x}$. Since $\log x$ is strictly concave, by Jensen's inequality (17.15) we have
$$\sum_{i=1}^n a_i\log y_i - \sum_{i=1}^n a_i\log\hat{x}_i = \sum_{i=1}^n a_i\log\frac{\lambda_iy_i}{a_i\beta} < \log\sum_{i=1}^n a_i\frac{\lambda_iy_i}{a_i\beta} = \log\left(\frac{1}{\beta}\sum_{i=1}^n \lambda_iy_i\right) = \log\frac{\beta}{\beta} = \log 1 = 0$$
So, $\hat{x}$ is indeed the unique solution of problem (22.11), and so of problem (22.9). N
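As a numerical sanity check of the guess (22.12) (an illustrative sketch; the weights $a$, coefficients $\lambda$ and scalar $\beta$ below are arbitrary choices), one can compare a general-purpose solver with the closed form:

```python
import numpy as np
from scipy.optimize import minimize

a = np.array([0.2, 0.3, 0.5])     # weights a_i, summing to 1
lam = np.array([1.0, 2.0, 4.0])   # constraint coefficients lambda_i
beta = 3.0

# maximize sum a_i log x_i  subject to  lam . x = beta, x > 0
neg_obj = lambda x: -np.sum(a * np.log(x))
cons = {"type": "eq", "fun": lambda x: lam @ x - beta}
res = minimize(neg_obj, x0=np.ones(3), constraints=[cons],
               bounds=[(1e-9, None)] * 3)

print(res.x)            # numerical maximizer
print(a * beta / lam)   # closed-form guess (22.12): a_i * beta / lambda_i
```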
Example 988 Consider problem (22.9) with the objective function $f(x) = \min_{i=1,\dots,n} x_i$. Recall that $f$ is concave but not strictly concave (cf. Example 817). Because of the symmetry of the objective function, we again guess a symmetric solution $\hat{x}$, which then has the form
$$\hat{x} = \left(\frac{\beta}{\sum_{i=1}^n \lambda_i}, \dots, \frac{\beta}{\sum_{i=1}^n \lambda_i}\right) \tag{22.13}$$
because of the constraint. To verify this guess, let $x^* \in C$ be a solution of the problem, so that $f(x^*) \ge f(y)$ for all $y \in C$. As we will see, by Weierstrass' Theorem such a solution exists. We want to show that $x^* = \hat{x}$. It is easy to check that, if $\mathbf{k} = (k, \dots, k) \in \mathbb{R}^n$ is a constant vector and $\alpha \ge 0$ is a positive scalar, we have
$$f(\alpha x + \mathbf{k}) = \alpha f(x) + k \qquad \forall x \in \mathbb{R}^n \tag{22.14}$$
Since $C$ is convex and $x^*$ is a solution, by (22.14) we have
$$f(x^*) \ge f\left(\frac{1}{2}x^* + \frac{1}{2}\hat{x}\right) = \frac{1}{2}f(x^*) + \frac{1}{2}\frac{\beta}{\sum_{i=1}^n \lambda_i}$$
So, $\min_{i=1,\dots,n}x_i^* = f(x^*) \ge \beta/\sum_{i=1}^n \lambda_i$, that is, $x_i^* \ge \beta/\sum_{i=1}^n \lambda_i$ for each $i$. Suppose $x^* \neq \hat{x}$, that is, $x^* \ge \hat{x}$ with $x^* \ne \hat{x}$. Since $x^* \in C$, we reach the contradiction
$$\beta = \sum_{i=1}^n \lambda_ix_i^* > \sum_{i=1}^n \lambda_i\frac{\beta}{\sum_{i=1}^n \lambda_i} = \beta$$
We conclude that $x^* = \hat{x}$. The constant vector (22.13) is thus the unique solution of the problem. Interestingly, we have a unique solution even without strict concavity. N
As the reader can easily check, once we solve problem (22.9) for the normalized choice set
$$C = \left\{x \in \mathbb{R}^n_+ : \sum_{i=1}^n \lambda_ix_i = 1\right\} \tag{22.15}$$
with each $\lambda_i > 0$, we can then easily retrieve the solutions when $\beta \neq 1$. The use of a normalized choice set permits us to ease notation, a significant simplification. With this motivation, here we study problem (22.9) with the normalized choice set (22.15).
We consider a convex objective function $f : \mathbb{R}^n \to \mathbb{R}$. We start by observing that the elements of the convex choice set (22.15) can be written as convex combinations of the vectors
$$\tilde{e}^i = \frac{1}{\lambda_i}e^i = \left(0, \dots, 0, \frac{1}{\lambda_i}, 0, \dots, 0\right) \qquad i = 1, \dots, n$$
Indeed, if $x \in C$ then
$$x = \sum_{i=1}^n x_ie^i = \sum_{i=1}^n \lambda_ix_i\frac{1}{\lambda_i}e^i = \sum_{i=1}^n \lambda_ix_i\tilde{e}^i$$
where $\lambda_ix_i \ge 0$ for each $i$ and $\sum_{i=1}^n \lambda_ix_i = 1$ (because $x \in C$). It is easy to check that each $\tilde{e}^i$ belongs to $C$. We are now in a position to say something about the optimization problem (22.9). Since $f$ is convex, we have
$$f(x) = f\left(\sum_{i=1}^n \lambda_ix_i\tilde{e}^i\right) \le \sum_{i=1}^n \lambda_ix_if\left(\tilde{e}^i\right) \le \max_{i=1,\dots,n}f\left(\tilde{e}^i\right)$$
Thus, to find a maximizer it is enough to check which $\tilde{e}^i$ receives the highest evaluation under $f$. Since the vectors $\tilde{e}^i$ lie on some axis of $\mathbb{R}^n$, in this way we find what in the economics jargon are called corner solutions.
That said, there might well be maximizers that this simple reasoning may neglect. In other words, we only showed that the solutions of the simple problem
$$\max_x f(x) \quad \text{sub } x \in \{\tilde{e}^1, \dots, \tilde{e}^n\} \tag{22.16}$$
are also solutions of problem (22.9). To say something more about all possible maximizers, i.e., about the set $\arg\max_{x \in C} f(x)$, we need to assume more about the objective function $f$. We consider two important cases:
(i) Assume that $f$ is strictly convex. Then, the only maximizers on $C$ are among the vectors $\tilde{e}^j$, that is,
$$\arg\max_{x \in \{\tilde{e}^1, \dots, \tilde{e}^n\}} f(x) = \arg\max_{x \in C} f(x)$$
Indeed, strict convexity yields a strict inequality as soon as $\lambda_ix_i > 0$ for at least two indexes $i$, that is,
$$f(x) = f\left(\sum_{i=1}^n \lambda_ix_i\tilde{e}^i\right) < \sum_{i=1}^n \lambda_ix_if\left(\tilde{e}^i\right)$$
For instance, consider the problem
$$\max_x\; x_1^2 + x_2^2 + x_3^2 \quad \text{sub } x \in \left\{\left(\frac{1}{\lambda_1}, 0, 0\right), \left(0, \frac{1}{\lambda_2}, 0\right), \left(0, 0, \frac{1}{\lambda_3}\right)\right\}$$
For example, if $\lambda_1 < \lambda_2 < \lambda_3$, then $\tilde{e}^1 = (1/\lambda_1, 0, 0)$ is the only solution, while if $\lambda_1 = \lambda_2 < \lambda_3$, then $\tilde{e}^1 = (1/\lambda_1, 0, 0)$ and $\tilde{e}^2 = (0, 1/\lambda_2, 0)$ are the only two solutions.
(ii) Assume that $f$ is affine, i.e., $f(x) = a_0 + a_1x_1 + \cdots + a_nx_n$. Then, the set of maximizers consists of the vectors $\tilde{e}^j$ that solve problem (22.16) and of their convex combinations (as the reader can easily check). That is,
$$\operatorname{co}\left(\arg\max_{x \in \{\tilde{e}^1, \dots, \tilde{e}^n\}} f(x)\right) = \arg\max_{x \in C} f(x) \tag{22.17}$$
where the left-hand side is the convex envelope of the vectors in $\arg\max_{x \in \{\tilde{e}^1, \dots, \tilde{e}^n\}} f(x)$, which is a polytope. For instance, consider the problem
$$\max_x\; a_0 + a_1x_1 + a_2x_2 + a_3x_3 \quad \text{sub } x \in \left\{\left(\frac{1}{\lambda_1}, 0, 0\right), \left(0, \frac{1}{\lambda_2}, 0\right), \left(0, 0, \frac{1}{\lambda_3}\right)\right\} \tag{22.18}$$
For instance, if $a_1/\lambda_1 > a_2/\lambda_2 > a_3/\lambda_3$, then $\tilde{e}^1 = (1/\lambda_1, 0, 0)$ is the only solution of problem (22.18), so of problem (22.17). On the other hand, if $a_1/\lambda_1 = a_2/\lambda_2 > a_3/\lambda_3$, then $\tilde{e}^1 = (1/\lambda_1, 0, 0)$ and $\tilde{e}^2 = (0, 1/\lambda_2, 0)$ solve problem (22.18), so the polytope
$$\operatorname{co}\left\{\tilde{e}^1, \tilde{e}^2\right\} = \left\{t\tilde{e}^1 + (1-t)\tilde{e}^2 : t \in [0,1]\right\} = \left\{\left(\frac{t}{\lambda_1}, \frac{1-t}{\lambda_2}, 0\right) : t \in [0,1]\right\}$$
is the solution set.

To sum up, some simple arguments show that optimization problems featuring convex objective functions and linear constraints have corner solutions. Section 22.6.2 will discuss these problems, which often arise in applications. N
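The corner-solution logic lends itself to a one-line search: evaluate $f$ at the candidate vertices $\tilde{e}^i$ and keep the best. A sketch with illustrative values of $\lambda$ (the objective and coefficients are arbitrary choices):

```python
import numpy as np

lam = np.array([1.0, 2.0, 3.0])
f = lambda x: np.sum(x**2)  # strictly convex objective

# candidate corner solutions e~i = e_i / lambda_i
corners = [np.eye(3)[i] / lam[i] for i in range(3)]
values = [f(e) for e in corners]
best = int(np.argmax(values))
print(corners[best], values[best])  # e~1 = (1, 0, 0) wins: lambda_1 is smallest
```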
(i) Let $u : \mathbb{R}^2_+ \to \mathbb{R}$ be the CES utility function
$$u(x) = \left(\alpha x_1^\rho + (1-\alpha)x_2^\rho\right)^{\frac{1}{\rho}}$$
with $\alpha \in [0,1]$ and $\rho \in (0,1]$. In this case the consumption set is $A = \mathbb{R}^2_+$.
(ii) Let $u : \mathbb{R}^2_{++} \to \mathbb{R}$ be the log-linear utility function
$$u(x) = a\log x_1 + (1-a)\log x_2$$
with $a \in (0,1)$. Here the consumption set is $A = \mathbb{R}^2_{++}$. CES and log-linear consumers therefore have different consumption sets.
(iii) Suppose that the consumer has a subsistence bundle $\bar{x} \gg 0$, so that he can consider only bundles $x \ge \bar{x}$ (in order to survive). In this case it is natural to take as consumption set the closed and convex set
$$A = \{x \in \mathbb{R}^n_+ : x \ge \bar{x}\}$$
For instance, we can consider the restrictions of CES and log-linear utility functions to this set $A$. N
Denote by $p = (p_1, p_2, \dots, p_n) \in \mathbb{R}^n_+$ the vector of the market prices of the goods. Suppose that the consumer has income $w \ge 0$. The budget set of the consumer
$$B(p,w) = \{x \in \mathbb{R}^n_+ : p \cdot x \le w\}$$
consists of the affordable bundles, i.e., of the bundles that he can purchase given the vector of prices $p$ and his income $w$. We write $B(p,w)$ to highlight the dependence of the budget set on $p$ and on $w$. For example,
$$w \le w' \implies B(p,w) \subseteq B(p,w') \tag{22.20}$$
that is, to a greater income there corresponds a larger budget set. Analogously,
$$p \ge p' \implies B(p,w) \subseteq B(p',w) \tag{22.21}$$
that is, to lower prices there corresponds a larger budget set.
Indeed, given $x, y \in B(p,w)$ and $\alpha \in [0,1]$,
$$p \cdot (\alpha x + (1-\alpha)y) = \alpha(p \cdot x) + (1-\alpha)(p \cdot y) \le \alpha w + (1-\alpha)w = w$$
Interestingly, the compactness of budget sets requires the price of each good to be strictly positive, so that none of them is free.
The importance of the no-free-goods condition $p \gg 0$ is obvious: if some good were free (and available in unlimited quantity), the consumer could obtain any quantity of it and the budget set would then be unbounded. Note that when $w = 0$ and $p \gg 0$ we have $B(p,w) = \{0\}$ and so, being a singleton, the budget set is trivially compact.
Proof Let $p \gg 0$. Let us show that $B(p,w)$ is bounded. Suppose, by contradiction, that there is a sequence of bundles $\{x^k\} \subseteq B(p,w)$ such that
$$\lim_{k\to\infty} x_i^k = +\infty$$
for some good $i$ (i.e., such that the quantity of good $i$ gets unboundedly larger and larger along the sequence $\{x^k\}$ of bundles). Since $p \gg 0$, in particular $p_i > 0$. As $x^k \ge 0$ for all $k \ge 1$, we therefore have
$$p \cdot x^k \ge p_ix_i^k \to +\infty$$
which contradicts $p \cdot x^k \le w$ for all $k \ge 1$. So $B(p,w)$ is bounded; since it is also closed (as is easily checked), it is compact.
The consumer (optimization) problem consists in maximizing the consumer's utility function $u : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$ on the budget set $B(p,w)$, that is,
$$\max_x u(x) \quad \text{sub } x \in B(p,w) \cap A \tag{22.22}$$
Given prices and income, the intersection $B(p,w) \cap A$ of the budget set and of the consumption set is the choice set of the consumer problem, consisting of the bundles that are both affordable and relevant for the consumer. Let us denote by $C(p,w)$ this choice set, that is,
$$C(p,w) = B(p,w) \cap A = \{x \in A : p \cdot x \le w\}$$
Consumers with different consumption sets may feature different choice sets even when they confront the same budget set: for instance, for a log-linear consumer we have $C(p,w) = \{x \in \mathbb{R}^n_{++} : p \cdot x \le w\}$, while for a CES consumer we have $C(p,w) = B(p,w)$. In particular, the CES consumer exemplifies the important case $A = \mathbb{R}^n_+$, in which the consumer problem takes the simpler form
$$\max_x u(x) \quad \text{sub } x \in B(p,w) \tag{22.23}$$
A bundle $\hat{x} \in C(p,w)$ is optimal when it solves the optimization problem (22.22), i.e., when
$$u(\hat{x}) \ge u(x) \qquad \forall x \in C(p,w)$$
In particular, $\max_{x \in C(p,w)} u(x) = u(\hat{x})$ is the maximum utility that the consumer can attain.
The function $v$ defined by $v(p,w) = \max_{x \in C(p,w)} u(x)$ is called the indirect utility function.¹⁰ When prices and income vary, it indicates how the maximum utility that the consumer may attain varies.
Example 994 The unique optimal bundle for the log-linear utility function $u(x) = a\log x_1 + (1-a)\log x_2$, with $a \in (0,1)$, is given by $\hat{x}_1 = aw/p_1$ and $\hat{x}_2 = (1-a)w/p_2$ (Example 987). It follows that the indirect utility function associated with the log-linear utility function is
$$\begin{aligned}
v(p,w) = u(\hat{x}) &= a\log\frac{aw}{p_1} + (1-a)\log\frac{(1-a)w}{p_2}\\
&= a(\log a + \log w - \log p_1) + (1-a)(\log(1-a) + \log w - \log p_2)\\
&= \log w + a\log a + (1-a)\log(1-a) - (a\log p_1 + (1-a)\log p_2)
\end{aligned}$$
for every $(p,w) \in \mathbb{R}^n_{++} \times \mathbb{R}_{++}$. N
10 Here, we are tacitly assuming that a maximizer exists for every pair $(p,w)$ of prices and income. Later in the chapter we will present results, namely Weierstrass' and Tonelli's theorems, that guarantee this.
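A quick check of this computation (illustrative sketch; the prices, income and parameter $a$ below are arbitrary choices):

```python
import numpy as np

def indirect_utility(p1, p2, w, a=0.3):
    # closed form for the log-linear consumer (Example 994)
    return (np.log(w) + a * np.log(a) + (1 - a) * np.log(1 - a)
            - (a * np.log(p1) + (1 - a) * np.log(p2)))

def utility_at_optimum(p1, p2, w, a=0.3):
    # evaluate u at the optimal bundle x1 = a w / p1, x2 = (1 - a) w / p2
    x1, x2 = a * w / p1, (1 - a) * w / p2
    return a * np.log(x1) + (1 - a) * np.log(x2)

print(np.isclose(indirect_utility(2.0, 5.0, 10.0),
                 utility_at_optimum(2.0, 5.0, 10.0)))  # True
```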
Thanks to (22.20) and (22.21), the property of monotonicity seen in Proposition 980
takes the following form for indirect utility functions.
In other words, consumers always bene t both from a higher income and from lower
prices, regardless of their utility functions (provided they are continuous).
As previously observed (Section 6.4.4), it is natural to assume that the utility function
u : A Rn+ ! R is, at least, increasing. By Proposition 979, if we assume that u is actually
strongly increasing, the solution of the consumer problem belongs to the boundary of the
budget set. Yet, a sharper result holds because of the particular form of the budget set. To
ease matters, we assume that A = Rn+ , so the consumer problem takes the form (22.23).
Proof Let $x \in B(p,w)$ be such that $p \cdot x < w$. It is easy to see that there exists $y \ge x$, with $y \ne x$, such that $p \cdot y \le w$. Indeed, taking any $0 < \varepsilon < (w - p \cdot x)/\sum_{i=1}^n p_i$, it is sufficient to set $y = x + \varepsilon\mathbf{1}$, where $\mathbf{1} = (1, \dots, 1)$. Since $u$ is strongly increasing, we have $u(y) > u(x)$ and therefore $x$ cannot be a solution of the consumer problem.
The consumer therefore allocates all his income to the purchase of an optimal bundle $\hat{x}$, that is, $p \cdot \hat{x} = w$. This property is called Walras' law. Thanks to it, in the consumer problem with strongly increasing utility functions $u : \mathbb{R}^n_+ \to \mathbb{R}$ we can replace the budget set $B(p,w)$ with the budget line
$$\{x \in \mathbb{R}^n_+ : p \cdot x = w\}$$
which is the form of the consumer problem often studied in introductory courses.
Turn now to a producer who must decide the quantity $y$ of a given output to produce. In taking such a decision the producer must consider both the revenue $r(y)$ that he will obtain by selling the quantity $y$ and the cost $c(y)$ that he will bear to produce it.

Let $r : [0,\infty) \to \mathbb{R}$ be the revenue function and $c : [0,\infty) \to \mathbb{R}$ the cost function of the producer. His profit is therefore represented by the function $\pi : [0,\infty) \to \mathbb{R}$ given by
$$\pi(y) = r(y) - c(y) \tag{22.24}$$
The producer (optimization) problem is to maximize the profit function $\pi : [0,\infty) \to \mathbb{R}$, that is,
$$\max_y \pi(y) \quad \text{sub } y \ge 0 \tag{22.25}$$
An output $\hat{y} \ge 0$ is profit maximizing when
$$\pi(\hat{y}) \ge \pi(y) \qquad \forall y \ge 0$$
while $\max_{y \in [0,\infty)} \pi(y)$ is the maximum profit that the producer can obtain. The set of the (profit) maximizing outputs is $\arg\max_{y \in [0,\infty)} \pi(y)$.
The form of the revenue function depends on the structure of the market in which the
producer sells the output, while that of the cost function depends on the structure of the
market where the producer buys the inputs necessary to produce the good. Let us consider
some classic market structures.
(i) The output market is perfectly competitive, so that the sale price $p \ge 0$ of the output is independent of the quantity that the producer decides to produce. In such a case the revenue function $r : [0,\infty) \to \mathbb{R}$ is given by
$$r(y) = py$$
(ii) The producer is a monopolist on the output market. Let us suppose that the demand
function on this market is D : [0; 1) ! R, where D (y) denotes the unit price at
which the market absorbs the quantity y of the output. Usually, for obvious reasons,
we assume that the demand function is decreasing: the market absorbs greater and
greater quantities of output as its unit price gets lower and lower. The revenue function
r : [0; 1) ! R is therefore given by
r (y) = yD (y)
(iii) The input markets are perfectly competitive, that is, the vectors
$$x = (x_1, x_2, \dots, x_n)$$
of inputs are bought at prices $w = (w_1, \dots, w_n)$ that are independent of the quantities that the producer decides to buy ($w_i$ is the price of the $i$-th input). The cost of a vector $x$ of inputs is thus equal to $w \cdot x = \sum_{i=1}^n w_ix_i$. But how does this cost translate into a cost function $c(y)$?
To answer this question, assume that $f : \mathbb{R}^n_+ \to \mathbb{R}$ is the production function that the producer has at his disposal to transform a vector $x \in \mathbb{R}^n_+$ of inputs into the quantity $f(x)$ of output. The cost $c(y)$ of producing the quantity $y$ of output is then obtained by minimizing the cost $w \cdot x$ among all the vectors $x \in \mathbb{R}^n_+$ that belong to the isoquant
$$f^{-1}(y) = \{x \in \mathbb{R}^n_+ : f(x) = y\}$$
that is, among all the vectors that allow the producer to obtain the quantity $y$ of output. Indeed, in terms of production the inputs in $f^{-1}(y)$ are equivalent, and so the producer will opt for the cheapest ones. In other terms, the cost function $c : [0,\infty) \to \mathbb{R}$ is given by
$$c(y) = \min_{x \in f^{-1}(y)} w \cdot x$$
that is, it is equal to the minimum value of the minimization problem for the cost $w \cdot x$ on the isoquant $f^{-1}(y)$. Since the linear objective function $w \cdot x$ is continuous, by the Weierstrass Theorem this problem has a solution (so the cost function is well defined) when the isoquant $f^{-1}(y)$ is compact.
To sum up, a producer who, for example, is a monopolist in the output market and faces perfect competition in the input markets has the profit function
$$\pi(y) = yD(y) - c(y)$$
Instead, a producer who faces perfect competition in all markets, for the output and the inputs, has the profit function
$$\pi(y) = py - c(y)$$
22.1.5 Comments
Ordinality Properties of functions that are preserved under strictly increasing transformations are called ordinal, as we mentioned when discussing utility theory (Sections 6.4.4 and 17.3.3). In view of Proposition 978, a property may hold for all equivalent objective functions only if it is ordinal. For instance, all of them can be quasi-concave but not concave (quasi-concavity, but not concavity, is an ordinal property). So, if we are interested in a property of solutions and wonder which properties of objective functions would ensure it, ideally we should look for ordinal properties. If we come up with sufficient conditions that are not ordinal (for instance, concavity or continuity conditions), chances are that there exist more general sufficient conditions that are ordinal. In any case, any necessary condition must be ordinal in that it has to hold for all equivalent objective functions.
To illustrate this subtle, yet important, methodological point, consider the uniqueness of solutions, a most desirable property for comparative statics exercises (as we remarked earlier in the chapter). We will soon learn that strict quasi-concavity is an ordinal property that ensures such uniqueness (Theorem 1032). So does strict concavity, which however is not an ordinal property. Conceptually, strict quasi-concavity is thus the best way to frame this sufficient condition, though, operationally, strict concavity might be the workable version. What about a necessary condition for uniqueness of solutions? At the end of the chapter we will digress on cuneiformity, an ordinal property that is both necessary and sufficient for uniqueness (Proposition 1067). As soon as we look for necessary conditions, ordinality takes center stage.
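The ordinality of the solution set can be seen numerically in a minimal sketch: the maximizer of an objective function is unchanged by the strictly increasing transformation h(t) = eᵗ, even though concavity is lost along the way. The function f below is our illustrative choice.

    import numpy as np

    x = np.linspace(-3.0, 3.0, 10_001)
    f = -(x - 1.0) ** 2     # a concave objective function
    g = np.exp(f)           # strictly increasing transformation h(t) = e^t of f

    # The maximizer is ordinal: it is the same for f and for h composed with f ...
    print(x[np.argmax(f)], x[np.argmax(g)])   # both print 1.0

    # ... while concavity is not: g is a bell-shaped, non-concave function.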
Rationality Optimization problems are fundamental also in the natural sciences, as Leonida Tonelli well explains in a 1940 piece: "Maximum and minimum questions have always had a great importance also in the interpretation of natural phenomena because they are governed by a general principle of parsimony. Nature, in its manifestations, tends to save the most possible of what it uses; therefore, the solutions that it finds are always solutions of either minimization or maximization problems". The general principle to which Tonelli alludes, the so-called principle of minimum action, is a metaphysical principle (in the most basic meaning of this term). Not by chance, Tonelli continues by writing "Euler said that, since the construction of the world is the most perfect and was established by the wisest creator, nothing happens in this world without an underlying maximum or minimum principle". In economics, instead, the centrality of optimization problems rests on a (secular) assumption of rationality of economic agents. The resulting optimal choices of the agents, for example optimal bundles for consumers and optimal outputs for producers, are the natural benchmark with respect to which to assess any suboptimal, boundedly rational, behavior that agents may exhibit.
In an optimization problem one must beware of ad hoc constraints, lest they undermine its role as an organizing principle.
The Weierstrass Theorem ensures that the optimization problem

max f(x)  sub  x ∈ C

admits a solution whenever f is continuous and C is compact. This holds also for the dual optimization problem with min in place of max.

Proposition 998 If the utility function u : A ⊆ Rⁿ₊ → R is continuous on the closed set A, then the consumer problem

max u(x)  sub  x ∈ C(p, w)

admits a solution, provided p ≫ 0.

Proof By Proposition 992, the budget set B(p, w) is compact when p ≫ 0. As A is closed, the set C(p, w) = B(p, w) ∩ A is compact (why?). By the Weierstrass Theorem, the consumer problem then has a solution.

In words, when the utility function is continuous and the consumption set is closed, optimal bundles exist as long as there are no free goods. These conditions are fairly mild and often satisfied.¹² In particular, the most important case of a closed consumption set is A = Rⁿ₊, as happens for the CES consumer of the next example.

Example 999 The CES utility function u : R²₊ → R defined by

u(x) = (αx₁^ρ + (1 − α)x₂^ρ)^{1/ρ}

with α ∈ [0, 1] and ρ ∈ (0, 1], is continuous. By the Weierstrass Theorem, the consumer problem with this utility function has a solution provided p ≫ 0. N
The next example shows what may happen with free goods, that is, when the Weierstrass Theorem cannot be applied.
Example 1000 Consider the strictly monotone utility function u : R²₊ → R defined by u(x) = x₁ + x₂. Let p₂ = 0. The budget set B(p, w) = {x ∈ R²₊ : p₁x₁ ≤ w} is unbounded (so not compact) because any amount, however large, of the free good is affordable and so belongs to the budget set. The consumer problem

max_{(x₁,x₂)} x₁ + x₂  sub  p₁x₁ ≤ w

has no solution. Indeed, suppose that x̂ = (x̂₁, x̂₂) were an optimal bundle. For every ε > 0 the bundle x̂_ε = (x̂₁, x̂₂ + ε) still belongs to the budget set, and

u(x̂_ε) = x̂₁ + x̂₂ + ε > x̂₁ + x̂₂ = u(x̂)

This contradicts the optimality of x̂, and so we conclude that there are no optimal bundles. Intuitively, since the two goods are perfect substitutes for him, the consumer shifts all his consumption to the free good, but then enters a hopeless consumption spree.

Assume that, for some reason, the consumer loses interest in the free good. He now features the strongly, but not strictly, monotone utility function u(x) = x₁. In this case,

arg max_{x∈B(p,w)} u(x) = {(w/p₁, x̂₂) : x̂₂ ≥ 0}

The consumer thus exhausts all his wealth on good 1 and is indifferent over any amount of good 2. There exist uncountably many optimal bundles: the presence of a free good now results in an abundance of optimal bundles. N

¹² Free goods short-circuit the consumer problem, so constraints may actually help consumers to focus: homo oeconomicus e vinculis ratiocinatur (to paraphrase a sentence of Carl Schmitt).
Given the importance of the Weierstrass Theorem, we close the section with two possible
proofs. First, we need an important remark on notation.
Notation In the rest of the book, to simplify notation, we denote sequences of vectors too by {xₙ}, the same symbol used for sequences of scalars. If needed, the writing {xₙ} ⊆ Rⁿ should clarify the vector nature of the sequence, even though here n denotes both the dimension of the space Rⁿ and the generic term xₙ of the sequence. It is a slight abuse of notation, as the same letter denotes two altogether different entities, but hopefully it should not cause any confusion.
Lemma 1001 Let A be a subset of the real line. There exists a sequence {aₙ} ⊆ A that converges to sup A.

Proof Set α = sup A. Suppose that α ∈ R. By Proposition 127, for every ε > 0 there exists a_ε ∈ A such that a_ε > α − ε. By taking ε = 1/n for every n ≥ 1, it is therefore possible to build a sequence {aₙ} ⊆ A such that aₙ > α − 1/n for every n. It is immediate to see that aₙ → α.

Suppose now α = +∞. It follows that for every K > 0 there exists a_K ∈ A such that a_K ≥ K. By taking K = n for every n ≥ 1, we can therefore build a sequence {aₙ} such that aₙ ≥ n for every n. It is immediate to see that aₙ → +∞.

First proof of the Weierstrass Theorem Set α = sup_{x∈C} f(x), that is, α = sup f(C). By the previous lemma, there exists a sequence {aₙ} ⊆ f(C) such that aₙ → α. Let {xₙ} ⊆ C be such that aₙ = f(xₙ) for every n ≥ 1. Since C is compact, the Bolzano-Weierstrass Theorem yields a subsequence {x_{n_k}} ⊆ {xₙ} that converges to some x̂ ∈ C, that is, x_{n_k} → x̂ ∈ C. Since {aₙ} converges to α, also the subsequence {a_{n_k}} converges to α. Since f is continuous, it follows that

f(x̂) = lim_k f(x_{n_k}) = lim_k a_{n_k} = α

We conclude that x̂ is a solution and α = max f(C), that is, x̂ ∈ arg max_{x∈C} f(x) and α = max_{x∈C} f(x). A similar argument shows that arg min_{x∈C} f(x) is not empty.
The second proof of Weierstrass' Theorem is based on Proposition 597, which says that
the continuous image of a compact set is compact.
(f ≥ t) ∩ C = {x ∈ C : f(x) ≥ t}        (22.26)

Thus, a function is coercive on C when there is at least one upper contour set that has a non-empty and compact intersection with C. In particular, when A = C the function is simply said to be coercive, without further specification.

Next we show that, under some basic conditions, the condition t ∈ Im f can be relaxed to just t ∈ R, that is, the scalar t can be chosen freely, a useful simplification when checking coercivity.
This result is a simple consequence of the following important property of the upper and lower contour sets of continuous functions, which refines what we saw in Example 595.

The hypothesis that C is closed is crucial. Take for example the identity function f : R → R given by f(x) = x. If C = (0, 1), we have (f ≥ t) ∩ C = [t, 1) for every t ∈ (0, 1), and these sets are not closed.

¹³ Needless to say, the theorems of this section can be "flipped over" (just take −f) in order to guarantee the existence of minimizers, now without caring about maximizers.
[Graph of the function, with the horizontal line y = t cutting out an upper contour set.]

Example 1006 Consider the cosine function f : R → R given by f(x) = cos x, with graph:

[Graph of the cosine function.]

More generally, from the graph it is easy to see that the set {x ∈ [−π, π] : f(x) ≥ t} is non-empty and compact for every t ≤ 1, so f is coercive on [−π, π]. However, the function fails to be coercive on the entire real line: the set {x ∈ R : f(x) ≥ t} is unbounded, so not compact, for every t ≤ 1, and is empty for every t > 1 (as one can easily see from the graph). N
As the last example shows, coercivity is a joint property of the function f and of the set C, that is, of the pair (f, C). It is an ordinal property:

Proposition 1007 If f is coercive on C, then so is every strictly increasing transformation of f.
Example 1008 Thanks to Example 1005 and Proposition 1007, the famous Gaussian function f : R → R defined by f(x) = e^{−x²} is coercive. This should be clear by inspection of its graph:

[Graph of the Gaussian function.]

which is the well-known "bell curve" found in statistics courses (cf. Section 36.4). N
Continuous functions are coercive on compact sets, a simple consequence of the closedness of their upper contour sets established in Lemma 1004.

Continuous functions f on compact sets C are, thus, a first relevant example of pairs (f, C) exhibiting coercivity. Let us see a few more examples.
Example 1010 Consider the function f : R → R given by f(x) = 1 − x², with graph:

[Graph of the downward parabola y = 1 − x².]

The set {x ∈ R : f(x) ≥ t} is non-empty and compact for every t ≤ 1. For example, for t = 0 we have

{x ∈ R : f(x) ≥ 0} = [−1, 1]

which suffices to conclude that f is coercive; indeed, Definition 1002 requires the mere existence of at least one scalar t ∈ Im f for which the set {x ∈ R : f(x) ≥ t} is non-empty and compact. N
Example 1012 Define f : R → R by

f(x) = log |x|  if x ≠ 0,    f(0) = 0

Taking C = [−1, 1], we have

{x ∈ R : f(x) ≥ t} ∩ C = ∅  if t > 0,    and    {x ∈ R : f(x) ≥ t} ∩ C = [−1, −eᵗ] ∪ [eᵗ, 1] ∪ {0}  if t ≤ 0

Thus f is coercive on the compact set [−1, 1]. Note that f is discontinuous at 0, thus making Proposition 1009 inapplicable. N
22.3.2 Tonelli

The fact that coercivity and continuity of a function guarantee the existence of a maximizer is rather intuitive. The upper contour set (f ≥ t) indeed "cuts out the low part" of Im f, the part under the value t, leaving untouched the high part, where the maximum value lies. The following result, a version of a result of Leonida Tonelli, formalizes this intuition: if f is coercive and continuous on C, then arg max_{x∈C} f(x) is non-empty and, when C is closed, compact.

Proof Since f is coercive, there exists t ∈ R such that the upper contour set Λ = (f ≥ t) ∩ C is non-empty and compact. By Weierstrass' Theorem, there exists x̂ ∈ Λ such that f(x̂) ≥ f(x) for every x ∈ Λ. At the same time, if x ∈ C does not belong to Λ we have f(x) < t, and so f(x̂) ≥ t > f(x). It follows that f(x̂) ≥ f(x) for every x ∈ C, that is, f(x̂) = max_{x∈C} f(x).

It remains to show that arg max_{x∈C} f(x) is compact if C is closed. Since arg max_{x∈C} f(x) ⊆ Λ, it is enough to show that arg max_{x∈C} f(x) is closed (in that a closed subset of a compact set is, in turn, compact). Clearly, we have

arg max_{x∈C} f(x) = (f ≥ f(x̂)) ∩ C

which is closed by Lemma 1004, as desired.
Thanks to Proposition 1009, the hypotheses of Tonelli's Theorem are weaker than those of Weierstrass' Theorem. On the other hand, weaker hypotheses lead to a weaker result (as always, no free meals) in which only the existence of a maximizer is guaranteed, without any mention of minimizers. Since, as we already noted, in many economic optimization problems one is interested in the existence of maximizers, Tonelli's Theorem is important because it allows us to "trim off" the hypotheses of Weierstrass' Theorem that are overabundant with respect to our needs. In particular, we can use Tonelli's Theorem in optimization problems where the choice set is not compact; for example, in Chapter 37 we will use it with open choice sets.
In sum, the optimization problem

max f(x)  sub  x ∈ C

has a solution if f is coercive and continuous on C. Under such hypotheses, one cannot say anything about the dual minimization problem with min instead of max.
Example 1014 The functions f, g : R → R defined by f(x) = 1 − x² and g(x) = e^{−x²} are both coercive (see Examples 1010 and 1008). Since they are continuous as well, by Tonelli's Theorem we can say that arg max_{x∈R} f(x) ≠ ∅ and arg max_{x∈R} g(x) ≠ ∅; as easily seen from their graphs, for both functions the origin is the global maximizer. Note that, instead, arg min_{x∈R} f(x) = arg min_{x∈R} g(x) = ∅. Indeed, the set R is not compact, thus making Weierstrass' Theorem inapplicable. N
N.B. The coercivity of f on C amounts to saying that there exists a non-empty compact set K such that

arg max_{x∈C} f(x) ⊆ K ⊆ C
22.3.3 Supercoercivity

In light of Tonelli's Theorem, it becomes important to identify classes of coercive functions. A first relevant example is that of supercoercive functions, i.e., the functions f such that f(xₙ) → −∞ whenever ‖xₙ‖ → +∞.¹⁵
Example 1018 (i) The function f : R² → R given by f(x) = −(x₁ − x₂)² is not supercoercive. Consider the sequence xₙ = (n, n). One has that f(xₙ) = 0 for every n ≥ 1, although ‖xₙ‖ = n√2 → +∞.

(ii) The exponential function f : R → R given by f(x) = eˣ is not supercoercive: just consider the sequence xₙ = n. Its cousin f(x) = −e^{x²} is, instead, easily checked to be supercoercive.

(iii) The negative quadratic function f : R → R defined by f(x) = −x² is supercoercive, as previously checked. Its strictly increasing transformation e^{−x²} is, however, not supercoercive: just observe that the upper contour set (e^{−x²} ≥ 0) is equal to the real line, so it is unbounded. N
The last example shows that supercoercivity, unlike coercivity, is not an ordinal property. Yet, it implies coercivity for functions f that are continuous on a closed set C. As a result, Tonelli's Theorem can be applied to the pair (f, C).
Proof The last result implies that, for every t ∈ R, the sets (f ≥ t) ∩ C are bounded. Since f is continuous and C is closed, such sets are also closed. Indeed, take {xₙ} ⊆ (f ≥ t) ∩ C such that xₙ → x ∈ Rⁿ. By Theorem 174, to show that (f ≥ t) ∩ C is closed it suffices to show that x ∈ (f ≥ t) ∩ C. As C is closed, we have x ∈ C. Since f is continuous, we have lim f(xₙ) = f(x). Since f(xₙ) ≥ t for every n ≥ 1, it follows that f(x) ≥ t, that is, x ∈ (f ≥ t). Hence, x ∈ (f ≥ t) ∩ C, and the set (f ≥ t) ∩ C is closed. Since it is bounded, it is compact.

The reader should note that, for a supercoercive and continuous function, all the sets (f ≥ t) ∩ C are compact, while coercivity requires only that at least one of them be non-empty and compact. This shows, once again, how much stronger a property supercoercivity is than coercivity. However, it is simpler both to formulate and to verify, which explains its appeal.

Proof Let {xₙ} ⊆ Rⁿ be such that ‖xₙ‖ → +∞. This implies that there exists n̄ ≥ 1 such that ‖xₙ‖ ≥ k, and so g(xₙ) ≤ f(xₙ), for every n ≥ n̄. At the same time, since f is supercoercive, the sequence {f(xₙ)} is such that f(xₙ) → −∞. This implies that for each K ∈ R there exists n_K ≥ 1 such that f(xₙ) < K for all n ≥ n_K. For each K ∈ R, set n̄_K = max{n̄, n_K}. We then have g(xₙ) ≤ f(xₙ) < K for all n ≥ n̄_K, thus proving that g(xₙ) → −∞ as well.
Hyperplanes of Rⁿ are the sets of the form

H = {x ∈ Rⁿ : a · x = b}

with 0 ≠ a ∈ Rⁿ and b ∈ R. That is, they are the level curves of linear functions (cf. Section 16.6). A hyperplane H defines two closed half-spaces

H₊ = {x ∈ Rⁿ : a · x ≥ b}    and    H₋ = {x ∈ Rⁿ : a · x ≤ b}
¹⁶ For instance, see the proofs of Theorems 1530, 1532 and 1539.

(ii) strictly separated if there exists a hyperplane H such that X ⊆ int H₊ and Y ⊆ int H₋.

In words, two sets are (strictly) separated when they belong to opposite closed (open) half-spaces. Intuitively, the separating hyperplane acts like a watershed between them. Next we give a straightforward, yet useful, characterization.

(ii) strictly separated if and only if there exist 0 ≠ a ∈ Rⁿ and b ∈ R such that a · x > b > a · y for all x ∈ X and y ∈ Y.
This characterization suggests a more stringent notion of separation, in which the sets
are completely contained in opposite open half-spaces.
Definition 1024 Two sets X and Y of Rⁿ are strongly separated if there exists 0 ≠ a ∈ Rⁿ such that

inf_{x∈X} a · x > sup_{y∈Y} a · y

i.e., there exist b ∈ R and ε > 0 such that a · x ≥ b + ε > b ≥ a · y for all x ∈ X and y ∈ Y.
Our first result studies the basic case of separation between convex sets and single points.
Proof We only prove (i), while we omit the non-trivial proof of (ii). Without loss of generality (why?), assume that x₀ = 0 ∉ C. Consider the continuous function f : Rⁿ → R given by f(x) = −‖x‖². This function is supercoercive (Example 1017). By Proposition 1019, f is coercive on the closed set C, so it has a maximizer c ∈ C by Tonelli's Theorem. If x is any point of C, by convexity λc + (1 − λ)x ∈ C for every λ ∈ (0, 1), so we have ‖c‖² ≤ ‖λc + (1 − λ)x‖². Hence

‖c‖² ≤ λ²‖c‖² + (1 − λ)²‖x‖² + 2λ(1 − λ) c · x

that is,

(1 + λ)‖c‖² ≤ (1 − λ)‖x‖² + 2λ c · x

Letting λ → 1 we get c · x ≥ ‖c‖² > 0 for every x ∈ C. So, setting a = c, we have inf_{x∈C} a · x ≥ ‖c‖² > 0 = a · x₀, and the sets {x₀} and C are strongly separated.
The strong separation in point (i) continues to hold if we replace singletons with compact
convex sets, as the next important result shows.
Theorem 1026 (Strong Hyperplane Separation Theorem) A compact convex set and
a closed convex set are strongly separated if they are disjoint.
Proof Let K be a compact convex set and C be a closed convex set, with K ∩ C = ∅. The set K − C = {x − y : x ∈ K, y ∈ C} is a closed and convex set (Proposition 964) that does not contain the origin 0 since K ∩ C = ∅. By Proposition 1025-(i), the sets {0} and K − C are then strongly separated. So, there exist 0 ≠ a ∈ Rⁿ, b ∈ R and ε > 0 such that 0 = a · 0 ≤ b < b + ε ≤ a · (x − y) for all x ∈ K and y ∈ C. This implies a · x ≥ b + ε + a · y for all x ∈ K and y ∈ C. Since K is compact, by the Weierstrass Theorem there exists x̂ ∈ K such that a · x ≥ a · x̂ ≥ b + ε + a · y for all x ∈ K and y ∈ C. Hence, a · x̂ ≥ b + ε + sup_{y∈C} a · y, that is, min_{x∈K} a · x > sup_{y∈C} a · y. We conclude that K and C are strongly separated.
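A small numerical sketch of the projection argument behind Proposition 1025-(i): as the convex set C we take a closed ball that does not contain the origin (an illustrative choice), compute the point c of C nearest to the origin, and check on sampled points of C that the normal a = c separates strongly, i.e., a · x ≥ ‖c‖² > 0 = a · 0 on C.

    import numpy as np

    rng = np.random.default_rng(0)

    # Closed convex set C: ball of center m and radius r, with 0 outside C.
    m, r = np.array([3.0, 4.0]), 2.0        # ||m|| = 5 > r

    c = m * (1.0 - r / np.linalg.norm(m))   # point of C nearest to the origin
    a = c                                   # normal of the separating hyperplane

    # Sample points of C and verify a.x >= ||c||^2 > 0 (strong separation).
    box = m + r * rng.uniform(-1.0, 1.0, size=(2000, 2))
    samples = box[np.linalg.norm(box - m, axis=1) <= r]

    print((samples @ a).min())      # bounded below by ...
    print(np.linalg.norm(c) ** 2)   # ... ||c||^2 = 9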
The strict separation in point (ii) of Proposition 1025 leads to the following separation result based on interiors.

Proposition 1027 Two convex sets are separated if they have disjoint and non-empty interiors.

Proof Let A and B be two convex sets with int A and int B non-empty and disjoint. Then, int A − int B is an open and convex set (Proposition 964) that does not contain the origin 0. By Proposition 1025-(ii), the sets {0} and int A − int B are then strictly separated. So, there exist 0 ≠ a ∈ Rⁿ and b ∈ R such that 0 = a · 0 < b < a · (x − y) for all x ∈ int A and y ∈ int B. This implies

a · x > b + a · y        (22.29)

for all x ∈ int A and y ∈ int B. Since A and B are convex, A ⊆ cl(int A) and B ⊆ cl(int B) (why?). So, if x ∈ A and y ∈ B, there exist sequences {xₙ} ⊆ int A and {yₙ} ⊆ int B such that xₙ → x and yₙ → y. By (22.29), a · xₙ > b + a · yₙ for all n ≥ 1, and so a · x = limₙ a · xₙ ≥ b + limₙ a · yₙ = b + a · y. Since b > 0, we conclude that a · x ≥ b + a · y ≥ a · y for all x ∈ A and all y ∈ B, as desired.
As the reader can check, this argument can be adapted to prove, more generally, that
two convex sets are separated if they have only boundary points in common and at least one
of them has a non-empty interior.
[Graph of a function with several peaks.]

The highest peak is the (global) maximum value, but intuitively the other peaks, too, correspond to points that, locally, are maximizers. The next definition formalizes this simple idea.
The value f(x̂) of the function at x̂ is called the local maximum value of f on C.
A global maximizer on C is obviously also a local maximizer. The notion of local max-
imizer is, indeed, much weaker than that of global maximizer. As the next example shows,
it may happen that there are (even many) local maximizers and no global maximizers.
[Graph of the function in (i).]

In particular, the origin x = 0 is a local maximizer, but not a global one. Indeed, lim_{x→+∞} f(x) = lim_{x→−∞} f(x) = +∞, thus the function has no global maximizers.
(ii) Let f : R → R be given by

f(x) = cos x  if x ≤ 0,    f(x) = x  if x > 0

with the graph
[Graph of the function in (ii).]
The function has infinitely many local maximizers (i.e., x = −2kπ for k ∈ N), but no global ones. N

O.R. The most important part of the definition of a local maximizer is "if there exists a neighborhood". A common mistake is to replace the correct "if there exists a neighborhood" with "for every neighborhood": so modified, the definition would characterize global, not local, maximizers. H
O.R. An isolated point x₀ of C is always both a local maximizer and a local minimizer. Indeed, by definition there is a neighborhood B_ε(x₀) of x₀ such that B_ε(x₀) ∩ C = {x₀}, so the inequalities f(x₀) ≥ f(x) and f(x₀) ≤ f(x) for every x ∈ B_ε(x₀) ∩ C reduce to f(x₀) ≥ f(x₀) and f(x₀) ≤ f(x₀), which are trivially true.
Considering isolated points as both local maximizers and local minimizers is a bit odd. To avoid this, we could reformulate the definition of local maximizer and minimizer by requiring x̂ to be a limit point of C. However, an even more unpleasant consequence would result: if an isolated point were a global extremal (e.g., recall the example at the end of Section 22.1.1), we should say that it is not so in the local sense. Thus, the remedy would be worse than the disease. H
Proof Let x̂ ∈ C be a local maximizer. By definition, there exists a neighborhood B_ε(x̂) such that

f(x̂) ≥ f(x)    ∀x ∈ B_ε(x̂) ∩ C        (22.31)

Suppose, by contradiction, that x̂ is not a global maximizer. Then, there exists y ∈ C such that f(y) > f(x̂). Since f is concave, for every t ∈ (0, 1) we have

f(tx̂ + (1 − t)y) ≥ tf(x̂) + (1 − t)f(y) > tf(x̂) + (1 − t)f(x̂) = f(x̂)        (22.32)

For t close enough to 1, the point tx̂ + (1 − t)y belongs to B_ε(x̂) ∩ C, so (22.32) contradicts (22.31). We conclude that x̂ is a global maximizer.
[Graph of the function.]
This function is quasi-concave because it is monotone. All points x > 1 are local maximizers,
but not global maximizers. N
When f is quasi-concave, the solution set arg max_{x∈C} f(x) is convex, as was the case under concavity (Proposition 982). Indeed, let x̂₁, x̂₂ ∈ arg max_{x∈C} f(x) and let t ∈ [0, 1]. By quasi-concavity,

f(tx̂₁ + (1 − t)x̂₂) ≥ min{f(x̂₁), f(x̂₂)} = max_{x∈C} f(x)

and therefore

f(tx̂₁ + (1 − t)x̂₂) = max_{x∈C} f(x)

i.e., tx̂₁ + (1 − t)x̂₂ ∈ arg max_{x∈C} f(x).
As we discussed before, when the solution set is convex there are either no solutions, a unique solution, or infinitely many solutions. The uniqueness of solutions is ensured by strict quasi-concavity, as the next important result shows.

Proof The proof is similar to that of Proposition 984, once observed that (22.8) becomes, by strict quasi-concavity,

f(z) = f(½x̂₁ + ½x̂₂) > min{f(x̂₁), f(x̂₂)} = max_{x∈C} f(x)
22.6.2 Minima

Minimization problems for concave functions also have some noteworthy properties.

Proof Suppose arg min_{x∈C} f(x) ≠ ∅ (otherwise the result is trivially true). (i) Let x̂ ∈ arg min_{x∈C} f(x). Since f is not constant, there exists y ∈ C such that f(y) > f(x̂). Suppose, by contradiction, that x̂ is an interior point of C. Set z_λ = λx̂ + (1 − λ)y with λ ∈ R. The points z_λ are the points of the straight line that passes through x̂ and y. Since x̂ is an interior point of C, there exists λ̄ > 1 such that z_λ̄ ∈ C. On the other hand, x̂ = (1/λ̄)z_λ̄ + (1 − 1/λ̄)y. Therefore, by concavity we get the contradiction

f(x̂) = f((1/λ̄)z_λ̄ + (1 − 1/λ̄)y) ≥ (1/λ̄)f(z_λ̄) + (1 − 1/λ̄)f(y) > (1/λ̄)f(x̂) + (1 − 1/λ̄)f(x̂) = f(x̂)

It follows that x̂ ∈ ∂C, as desired. (ii) Let x̂ ∈ arg min_{x∈C} f(x). Suppose, by contradiction, that x̂ ∉ ext C. Then, there exist x, y ∈ C with x ≠ y and λ ∈ (0, 1) such that x̂ = λx + (1 − λ)y. By strict quasi-concavity, f(x̂) = f(λx + (1 − λ)y) > min{f(x), f(y)} ≥ f(x̂), a contradiction. We conclude that x̂ ∈ ext C, as desired.
Hence, under (i) the search for minimizers can be restricted to the boundary points of C.¹⁷ More is true under (ii), where the search can be restricted to the extreme points of C, an even smaller set (Proposition 782).

Extreme points take center stage in the compact case, a remarkable fact because the set of extreme points can be a small subset of the frontier; for instance, if C is a polytope we can restrict the search for minimizers to the vertices.
and

∅ ≠ arg min_{x∈ext C} f(x) ⊆ arg min_{x∈C} f(x) ⊆ co(arg min_{x∈ext C} f(x))        (22.34)

(ii) If, in addition, f is strictly quasi-concave, then

∅ ≠ arg min_{x∈C} f(x) ⊆ ext C
Relative to the previous result, now Weierstrass' Theorem ensures the existence of minimizers, and so the equality (22.33) is meaningful. More interestingly, by building on Minkowski's Theorem this theorem says that a quasi-concave function attains its minimum value at some extreme point. Under the hypotheses of Bauer's Theorem, in terms of value attainment the minimization problem

min f(x)  sub  x ∈ C        (22.35)

thus reduces to the simpler problem

min f(x)  sub  x ∈ ext C        (22.36)

that only involves extreme points. In particular, when f is strictly quasi-concave we can take advantage of both (i) and (ii), so

∅ ≠ arg min_{x∈ext C} f(x) = arg min_{x∈C} f(x)

The minimization problem (22.35) thus reduces to the simpler problem (22.36) in terms of both solutions and value attainment.
Proof By Weierstrass' Theorem, arg min_{x∈C} f(x) ≠ ∅. Point (ii) thus follows from the last result. As to (i), we first prove that

arg min_{x∈C} f(x) ⊆ co(arg min_{x∈C} f(x) ∩ ext C)        (22.37)

That is, minimizers are convex combinations of extreme points which are, themselves, minimizers. Let x̂ ∈ arg min_{x∈C} f(x). By Minkowski's Theorem, we have C = co(ext C). Therefore, there exist finite collections {xᵢ}_{i∈I} ⊆ ext C and {λᵢ}_{i∈I} ⊆ (0, 1], with Σ_{i∈I} λᵢ = 1, such that x̂ = Σ_{i∈I} λᵢxᵢ. Since x̂ is a minimizer, we have f(xᵢ) ≥ f(x̂) for all i ∈ I. Together with quasi-concavity, this implies that

f(x̂) = f(Σ_{i∈I} λᵢxᵢ) ≥ min_{i∈I} f(xᵢ) ≥ f(x̂)        (22.38)

Hence, we conclude that Σ_{i∈I} λᵢ f(xᵢ) = f(x̂), which is easily seen to imply f(xᵢ) = f(x̂) for all i ∈ I. It follows that for each i ∈ I we have xᵢ ∈ arg min_{x∈C} f(x) ∩ ext C, proving (22.37).

We are ready to prove (22.34). By (22.37), we have arg min_{x∈C} f(x) ∩ ext C ≠ ∅. Consider x̄ ∈ arg min_{x∈C} f(x) ∩ ext C and let x̂ ∈ arg min_{x∈ext C} f(x). Since x̄ ∈ ext C, we have f(x̂) ≤ f(x̄). Since x̄ ∈ arg min_{x∈C} f(x), we have f(x̄) ≤ f(x̂). This implies that f(x̄) = f(x̂) and, therefore, x̂ ∈ arg min_{x∈C} f(x). Since x̂ was arbitrarily chosen, it follows that arg min_{x∈ext C} f(x) ⊆ arg min_{x∈C} f(x) ∩ ext C, proving the first inclusion in (22.34). Clearly, ext C ∩ arg min_{x∈C} f(x) ⊆ arg min_{x∈ext C} f(x). So, ext C ∩ arg min_{x∈C} f(x) = arg min_{x∈ext C} f(x), and (22.37) yields the second inclusion in (22.34).

It remains to prove (22.33). Let x̂ ∈ arg min_{x∈C} f(x). By (22.34), there exist finite collections {x̂ᵢ}_{i∈I} ⊆ arg min_{x∈ext C} f(x) and {λᵢ}_{i∈I} ⊆ (0, 1], with Σ_{i∈I} λᵢ = 1, such that x̂ = Σ_{i∈I} λᵢx̂ᵢ. By quasi-concavity:

min_{x∈C} f(x) = f(x̂) = f(Σ_{i∈I} λᵢx̂ᵢ) ≥ min_{i∈I} f(x̂ᵢ) = min_{x∈ext C} f(x) ≥ min_{x∈C} f(x)
Example 1037 (i) The function f in Example 1034 is strictly concave. In particular, we have arg min_{x∈ext C} f(x) = arg min_{x∈C} f(x) = {−1, 1}, while co(arg min_{x∈ext C} f(x)) = [−1, 1].

(ii) Consider the simplex Δ² = {x ∈ R³₊ : Σ³ᵢ₌₁ xᵢ = 1} of R³. Define f : Δ² → R by

f(x) = −(1/2)(1 − x₁ − x₂)² − (1/2)(1 − x₃)²

It is easy to check that f is continuous and concave. Since Δ² is convex and compact, with extreme points the versors e¹, e², e³, by Bauer's Theorem-(i) we have

∅ ≠ arg min_{i∈{1,2,3}} f(eⁱ) ⊆ arg min_{x∈Δ²} f(x) ⊆ co(arg min_{i∈{1,2,3}} f(eⁱ))        (22.39)

It is immediate to check that f(eⁱ) = −1/2 for all i ∈ {1, 2, 3}, that is, all three versors minimize f over the extreme points.

Let x̄ = (1/4, 1/4, 1/2) ∈ Δ² and x̂ = (1/2, 1/2, 0). We have f(x̄) = −1/4 > −1/2 = f(x̂), so x̄ does not belong to arg min_{x∈Δ²} f(x) but, clearly, belongs to co(arg min_{i∈{1,2,3}} f(eⁱ)). Moreover, x̂ belongs to arg min_{x∈Δ²} f(x) but, clearly, does not belong to arg min_{i∈{1,2,3}} f(eⁱ). This proves that the inclusions in (22.39) are strict. N
22.7 Affinity

22.7.1 Quasi-affine objective functions

If we consider quasi-affine functions, i.e., functions that are both quasi-concave and quasi-convex, we have the following corollary of Bauer's Theorem.

max_{x∈C} f(x) = max_{x∈ext C} f(x)    and    min_{x∈C} f(x) = min_{x∈ext C} f(x)        (22.40)

as well as

∅ ≠ arg max_{x∈C} f(x) = co(arg max_{x∈ext C} f(x))        (22.41)

and

∅ ≠ arg min_{x∈C} f(x) = co(arg min_{x∈ext C} f(x))        (22.42)

When f is affine, the hypothesis of continuity becomes superfluous by Proposition 836.

Proof By (22.33) we have (22.40). The sets in (22.41) and (22.42) are non-empty by Weierstrass' Theorem. Since f is quasi-affine, it is quasi-concave; by (22.34), we have (22.42) because arg min_{x∈C} f(x) is convex, given that f is quasi-affine. A symmetric argument yields (22.41) for arg max_{x∈C} f(x).
In sum, under the hypotheses of this corollary, maximizing or minimizing f on C reduces to the corresponding problem on ext C, which only involves extreme points. Moreover, by (22.40), the values attained are the same. So, the simpler problems are equivalent to the original ones in terms of both solutions and value attainment.

An earlier instance of such a remarkable simplification afforded by quasi-affine objective functions was discussed in Example 989-(ii). Next we provide another couple of examples.
f(e³) = 4 < f(e¹) = 6 < f(e²) = 7

By (22.41) and (22.42), arg max_{x∈C} f(x) = {e²} and arg min_{x∈C} f(x) = {e³}.

(ii) Consider the affine function f : R³ → R defined by f(x) = x₁ + 2x₂ + 2x₃ + 5. Now we have

f(e¹) = 6 < f(e²) = f(e³) = 7

By (22.41) and (22.42), arg max_{x∈C} f(x) = co{e², e³} and arg min_{x∈C} f(x) = {e¹}.
A set of the form

P = {x ∈ Rⁿ : Ax ≤ b}

where A is an m × n matrix and b ∈ Rᵐ, is called a polyhedron. Let us write explicitly the row vectors a¹, ..., aᵐ of the matrix A. Each row vector aⁱ thus identifies an inequality constraint aⁱ · x ≤ bᵢ that a vector x ∈ Rⁿ has to satisfy in order to belong to the polyhedron. We can indeed write P as the intersection

P = ∩ᵐᵢ₌₁ Hᵢ

of the half-spaces Hᵢ = {x ∈ Rⁿ : aⁱ · x ≤ bᵢ}.
Example 1040 (i) Affine sets are the polyhedra featuring equality constraints (Proposition 793). (ii) Simplices are polyhedra: for instance, Δ² in R³ can be written as {x ∈ R³ : Ax ≤ b} with b = (0, 0, 0, 1) ∈ R⁴ and

A = [ −1  0  0 ]
    [  0 −1  0 ]
    [  0  0 −1 ]
    [  1  1  1 ]

Clearly, simplices are examples of compact polyhedra. N
Example 1042 The elements of a polyhedron are often required to be positive, so let P = {x ∈ Rⁿ₊ : Ax ≤ b}. This polyhedron can be written, however, in the standard form P′ = {x ∈ Rⁿ : A′x ≤ b′} via suitable A′ and b′. For instance, if we require the elements of the polyhedron of the previous example to be positive, we have b′ = (1, 1, 2, 0, 0, 0) and

A′ = [  1  2  2 ]
     [  0  2  1 ]
     [  0  1  1 ]
     [  0  0 −1 ]
     [  0 −1  0 ]
     [ −1  0  0 ]

in which we added the (negative) versors to the matrix A. In sum, the standard formulation of polyhedra easily includes positivity constraints. N
We can characterize the extreme points of polyhedra. To this end, denote by Aₓ the submatrix of A that consists of the rows aⁱ of A featuring constraints that are binding at x, i.e., such that aⁱ · x = bᵢ. Clearly, ρ(Aₓ) ≤ ρ(A) ≤ min{m, n}.

In other words, a vector is an extreme point of a polyhedron of Rⁿ if and only if there exist n linearly independent binding constraints at that vector. Besides its theoretical interest, this characterization operationalizes the search for extreme points by reducing it to checking a matrix property, as the sketch after Example 1044 below illustrates.

Proof We prove the "if", leaving the converse to the reader. Suppose that ρ(Aₓ) = n. We want to show that x is an extreme point. Suppose, by contradiction, that there exist λ ∈ (0, 1) and two distinct vectors x′, x″ ∈ P such that x = λx′ + (1 − λ)x″. Denote by I(x) = {i ∈ {1, ..., m} : aⁱ · x = bᵢ} the set of binding constraints. Then,

bᵢ = aⁱ · x = λ aⁱ · x′ + (1 − λ) aⁱ · x″    ∀i ∈ I(x)

so

aⁱ · x′ = aⁱ · x″ = bᵢ    ∀i ∈ I(x)

This implies that x′ and x″ are distinct solutions of the linear system

aⁱ · x = bᵢ    ∀i ∈ I(x)

In view of Theorem 744, this contradicts the hypothesis ρ(Aₓ) = n. We conclude that x is an extreme point of P.
Example 1044 Let us check that the versors e¹, e² and e³ are the extreme points of the simplex Δ². For each x ∈ R³, the system Ax = b reads

x₁ = 0,    x₂ = 0,    x₃ = 0,    x₁ + x₂ + x₃ = 1

So,

A_{e¹} = [ 0 −1  0 ]
         [ 0  0 −1 ]
         [ 1  1  1 ]

By Proposition 1043, the versor e¹ is an extreme point of Δ² because ρ(A_{e¹}) = 3. A similar argument shows that e² and e³, too, are extreme points of Δ². Moreover, it is easy to see that no other point x of Δ² satisfies ρ(Aₓ) = 3 (indeed, to have ρ(Aₓ) > 2 at least two coordinates of x have to be 0). N
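The rank test of Proposition 1043 is straightforward to turn into a brute-force search: for every choice of n rows, if they are linearly independent, solve the resulting square system and keep the solution when it belongs to the polyhedron. The sketch below runs this search on an inequality representation of Δ² (our assumption: the equality x₁ + x₂ + x₃ = 1 is written as a pair of opposite inequalities, so that Δ² = {x : Ax ≤ b}) and recovers the versors.

    import numpy as np
    from itertools import combinations

    # Delta^2 as {x : Ax <= b}: x_i >= 0 plus the pair sum <= 1 and sum >= 1.
    A = np.array([[-1.0, 0.0, 0.0],
                  [0.0, -1.0, 0.0],
                  [0.0, 0.0, -1.0],
                  [1.0, 1.0, 1.0],
                  [-1.0, -1.0, -1.0]])
    b = np.array([0.0, 0.0, 0.0, 1.0, -1.0])

    n = A.shape[1]
    extreme_points = []
    for rows in combinations(range(A.shape[0]), n):
        A_S, b_S = A[list(rows)], b[list(rows)]
        if np.linalg.matrix_rank(A_S) < n:
            continue                          # need n independent binding rows
        x = np.linalg.solve(A_S, b_S)         # unique candidate with A_S x = b_S
        if np.all(A @ x <= b + 1e-9):         # keep it only if it belongs to P
            extreme_points.append(tuple(np.round(x, 9)))

    print(sorted(set(extreme_points)))        # the versors e1, e2, e3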
The last result has the following important consequence (cf. Example 783).
Corollary 1045 A polyhedron has at most a finite number of extreme points.

Proof Let P be a polyhedron. Using the notation of the last proof, if x is an extreme point of P, then ρ(Aₓ) = n and so, by Proposition 741, x is the unique solution in Rⁿ of the linear system Aₓx = bₓ, where bₓ is the vector that consists of the scalars {bᵢ : i ∈ I(x)}. Thus,

I(x) = I(x′) ⟹ Aₓ = Aₓ′ ⟹ x = x′    ∀x, x′ ∈ ext P

Equivalently, we have

x ≠ x′ ⟹ I(x) ≠ I(x′)    ∀x, x′ ∈ ext P        (22.43)

Since I(x) ⊆ {1, ..., m} for all x ∈ ext P, and {1, ..., m} has finitely many subsets, the set ext P is at most finite.

Polyhedra are easily seen to be closed. So, they are compact if and only if they are bounded. Bounded polyhedra are actually old friends.

Proposition 1046 A convex set in Rⁿ is a bounded polyhedron if and only if it is a polytope.

A bounded polyhedron P can thus be written as the convex envelope of a collection of vectors xᵢ ∈ Rⁿ, i.e., as a polytope P = co{x₁, ..., xₘ}.¹⁸

Proof We only prove the "only if". Let P be a bounded, so compact, polyhedron. By Minkowski's Theorem, P = co(ext P). By Corollary 1045, ext P is finite and so P is a polytope.
Consider now the problem of optimizing a linear objective function c · x on a polyhedron P. In view of Corollary 1038, we can solve this optimization problem when P is bounded (so compact).

¹⁸ Recall (16.2).
and

∅ ≠ arg max_{x∈P} c · x = co(arg max_{x∈{y∈P : ρ(A_y)=n}} c · x)        (22.46)

For instance, this is the case for the problem

max c · x  sub  x ∈ Δⁿ⁻¹

A general study of optimization problems with equality and inequality constraints will be carried out in Chapter 39. Linear programming is the special case of a concave optimization problem (Section 39.4) in which the objective function is linear and the constraints are expressed via affine functions.¹⁹
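As a minimal illustration, an off-the-shelf linear programming routine solves the simplex problem above; consistently with Corollary 1038, the returned solution is a versor. The objective vector c is made up for this sketch (and linprog minimizes, so we pass −c).

    import numpy as np
    from scipy.optimize import linprog

    # max c.x subject to x >= 0 and x1 + x2 + x3 = 1 (the simplex).
    c = np.array([2.0, 7.0, 4.0])

    res = linprog(-c,                                  # flip sign: linprog minimizes
                  A_eq=np.ones((1, 3)), b_eq=[1.0],
                  bounds=[(0.0, None)] * 3)

    print(res.x, -res.fun)   # solution e2 = (0, 1, 0) with value max_i c_i = 7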
22.8 Consumption
22.8.1 Optimal bundles
Let us go back to the consumer problem:

max u(x)  sub  x ∈ C(p, w)

This powerful theorem generalizes Proposition 998 and covers most cases of interest in consumer theory. For instance, consider the log-linear utility function u : Rⁿ₊₊ → R given by u(x) = Σⁿᵢ₌₁ aᵢ log xᵢ, with aᵢ > 0 and Σⁿᵢ₌₁ aᵢ = 1. It has an open consumption set Rⁿ₊₊, so Proposition 998 cannot be applied. Fortunately, the following lemma shows that it is coercive on C(p, w). Since it is also continuous and strictly concave, by Theorem 1049 the consumer problem with log-linear utility has a unique solution.

Lemma 1050 The log-linear utility function u : Rⁿ₊₊ → R is coercive on C(p, w), provided p ≫ 0.
Proof By Proposition 1007, it suffices to show that the result holds for the Cobb-Douglas utility function u(x) = Πⁿᵢ₌₁ xᵢ^{aᵢ} defined over Rⁿ₊₊. We begin by showing that the upper contour sets (u ≥ t) are closed for every t > 0. So let t > 0 and note that (u ≥ t) ≠ ∅. Consider a sequence {xₙ} ⊆ (u ≥ t) that converges to a bundle x̃ ∈ Rⁿ. To prove that (u ≥ t) is closed, we need to show that x̃ ∈ (u ≥ t). Since {xₙ} ⊆ Rⁿ₊₊, we have x̃ ≥ 0. Let us show that x̃ ≫ 0. Suppose, by contradiction, that x̃ has at least one null coordinate. This implies that u(xₙ) → Πⁿᵢ₌₁ x̃ᵢ^{aᵢ} = 0, thus contradicting

u(xₙ) ≥ t > 0    ∀n ≥ 1

In conclusion, x̃ ≫ 0. Hence, x̃ belongs to the domain of u, so by continuity we have u(xₙ) → u(x̃). As u(xₙ) ≥ t for every n, we conclude that u(x̃) ≥ t, that is, x̃ ∈ (u ≥ t), as desired.

It is easily seen that, for t > 0 small enough, the intersection (u ≥ t) ∩ C(p, w) is non-empty. Since C(p, w) is compact when p ≫ 0, this intersection is also compact, so u is coercive on C(p, w).

When the consumer problem has a unique solution x̂(p, w) for each pair (p, w), the demand function D is well defined: we have

D(p, w) = x̂(p, w)    ∀(p, w) ∈ Rⁿ₊₊ × R₊
The study of the demand function is usually based on methods of constrained optimization that rely on differential calculus, as we will see in Section 38.5. However, in the important case of log-linear utility functions the demand for good i is, in view of Example 987,

Dᵢ(p, w) = aᵢ (w / pᵢ)        (22.47)

The demanded quantity of good i depends on income w, on its price pᵢ, and on the relative importance aᵢ that the log-linear utility function assigns to it with respect to the other goods. Specifically, the larger aᵢ is, the higher is good i's relative importance and, ceteris paribus (i.e., keeping prices and income constant), the higher is its demand.
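The closed form (22.47) is easy to verify numerically; in the sketch below the weights, prices and income are illustrative choices, and the bundle aᵢw/pᵢ is compared with the output of a generic constrained optimizer run on the budget line.

    import numpy as np
    from scipy.optimize import minimize

    a = np.array([0.5, 0.3, 0.2])   # log-linear weights, summing to 1
    p = np.array([2.0, 1.0, 4.0])   # strictly positive prices
    w = 10.0                        # income

    demand = a * w / p              # closed form (22.47): D_i = a_i * w / p_i

    # Numeric check: maximize sum_i a_i log x_i on the budget line p.x = w.
    res = minimize(lambda x: -(a @ np.log(x)),
                   x0=w / (3.0 * p),                    # a feasible starting bundle
                   constraints=[{"type": "eq", "fun": lambda x: p @ x - w}],
                   bounds=[(1e-9, None)] * 3,
                   method="SLSQP")

    print(demand)    # [2.5, 3.0, 0.5]
    print(res.x)     # approximately the same bundle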
The proof is straightforward: it is enough to note that the budget set does not change if one multiplies prices and income by the same scalar λ > 0, that is,

B(λp, λw) = B(p, w)    ∀λ > 0

As simple as it may seem, this proposition underscores an important economic concept: only relative prices matter. To see why, choose any good among those in bundle x, for example the first good x₁, and call it the numeraire, that is, the unit of account. By setting its price to 1, we can express income and the other goods' prices in terms of the numeraire:

(1, p₂/p₁, ..., pₙ/p₁, w/p₁)

By Proposition 1051, the demand remains the same:

x̂(p₁, ..., pₙ, w) = x̂(1, p₂/p₁, ..., pₙ/p₁, w/p₁)    ∀p ≫ 0
²⁰ Demand functions are a first, important, illustration of the importance of the uniqueness of the solution of an optimization problem.

As an example, suppose that bundle x consists of different kinds of fruit: apples, bananas, oranges, and so on. Assume that good 1, the numeraire, is apples. Set w̃ = w/p₁ and qᵢ = pᵢ/p₁ for every i = 2, ..., n, so that

(1, p₂/p₁, p₃/p₁, ..., pₙ/p₁, w/p₁) = (1, q₂, q₃, ..., qₙ, w̃)

In terms of the "apple" numeraire, the price of one unit of fruit 2 is q₂ apples, the price of one unit of fruit 3 is q₃ apples, ..., the price of one unit of fruit n is qₙ apples, while the value of income is w̃ apples. To give a concrete example, if

(1, p₂/p₁, p₃/p₁, ..., pₙ/p₁, w/p₁) = (1, 3, 7, ..., 5, 12)

then the price of one unit of fruit 2 is 3 apples, the price of one unit of fruit 3 is 7 apples, ..., the price of one unit of good n is 5 apples, while the value of income is 12 apples.
Any good in bundle x can be chosen as numeraire: it is merely a conventional choice within an economy (justified by political reasons, availability of the good itself, etc.); consumers can solve their optimization problems using any numeraire whatsoever. Such a role, however, can also be played by an artificial object, money, for instance euros. In this case, we say that the price of a unit of apples is p₁ euros, the price of a unit of fruit 2 is p₂ euros, the price of a unit of fruit 3 is p₃ euros, ..., the price of a unit of fruit n is pₙ euros, while the value of income is w euros. It is a mere change of scale, akin to measuring quantities of fruit in kilograms rather than in pounds. In conclusion, in consumer theory money is a mere unit of account, nothing but a "veil". The choice of optimal bundles does not vary as long as the relative prices p₂/p₁, ..., pₙ/p₁ and the relative income w/p₁ remain unchanged. "Nominal" price and income variations do not affect consumers' choices.
where the income w = p · ωⁱ now depends on prices because agent i can fund his consumption by trading his endowment at the market price p, thus earning up to p · ωⁱ euros. The vector z = x − ωⁱ is the vector of net trades, for each good, of agent i if he selects bundle x.²²

As a trader, agent i exchanges goods at the market price. As a consumer, agent i solves the optimization problem

max uᵢ(x)  sub  x ∈ B(p, p · ωⁱ)

Agents thus play two roles in this economy. Their trader role is, however, ancillary to their consumer role: what agent i cares about is consumption, trading being only instrumental to it.

Assume that there is a unique optimal bundle x̂ᵢ(p, p · ωⁱ). Since it only depends on the price vector p, the demand function Dᵢ : Rⁿ₊ → Rⁿ₊ of agent i can be defined by

Dᵢ(p) = x̂ᵢ(p, p · ωⁱ)    ∀p ∈ Rⁿ₊

The individual demand Dᵢ still has the remarkable invariance property Dᵢ(λp) = Dᵢ(p) for every λ > 0. So, nominal changes in prices do not affect agents' consumption behavior. Moreover, if uᵢ : Rⁿ₊ → R is strongly increasing, then Walras' law is easily seen to hold for agent i, i.e.,

p · Dᵢ(p) = p · ωⁱ        (22.49)

We can now aggregate individual behavior. The aggregate demand function D : Rⁿ₊ → Rⁿ is defined by

D(p) = Σ_{i∈I} Dᵢ(p)

Note that the aggregate demand function inherits the invariance property of the individual demand functions, that is,

D(λp) = D(p)    ∀λ > 0        (22.50)

So, nominal changes do not affect the aggregate demand of goods. Condition A.2 of the Arrow-Debreu Theorem (Chapter 14) is thus satisfied.

Let ω = Σ_{i∈I} ωⁱ be the sum of the individual endowments, that is, the total resources of the economy. The aggregate supply function S : Rⁿ₊ → Rⁿ is given by such sum, i.e.,

S(p) = ω

So, in this simplified exchange economy the aggregate supply function does not depend on prices. It is a "flat" supply.
In this economy we have the weak Walras' law

p · E(p) ≤ 0

where E : Rⁿ₊ → Rⁿ is the excess demand function defined by E(p) = D(p) − ω. Indeed,

p · D(p) = p · Σ_{i∈I} Dᵢ(p) = Σ_{i∈I} p · Dᵢ(p) ≤ Σ_{i∈I} p · ωⁱ = p · ω

If Walras' law (22.49) holds for each agent i ∈ I, then its aggregate version

p · E(p) = 0

holds as well. So, besides condition A.2, also conditions W.1 and W.2 used in the Arrow-Debreu Theorem naturally arise in this simple exchange economy.

²² We say "net trade" because z may be the outcome of several market operations, here not modelled, in which agents may have been on both sides of the market (i.e., as buyers and as sellers).
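Here is a minimal sketch of this aggregation exercise for log-linear (Cobb-Douglas) agents, whose individual demand for good j has the closed form aᵢⱼ(p · ωⁱ)/pⱼ; the weights and endowments are made up. The code checks the aggregate Walras' law p · E(p) = 0 and the invariance property (22.50).

    import numpy as np

    # Two agents, two goods: log-linear weights (rows sum to 1) and endowments.
    a = np.array([[0.6, 0.4],
                  [0.2, 0.8]])
    omega = np.array([[1.0, 0.0],
                      [0.0, 1.0]])

    def aggregate_demand(p):
        incomes = omega @ p                              # w_i = p . omega^i
        return (a * incomes[:, None] / p).sum(axis=0)    # sum of the D_i(p)

    p = np.array([1.0, 2.0])
    E = aggregate_demand(p) - omega.sum(axis=0)          # E(p) = D(p) - omega

    print(p @ E)                                          # Walras' law: 0
    print(aggregate_demand(p), aggregate_demand(3 * p))   # D(3p) = D(p)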
The wellbeing of each agent i in the economy E depends on the bundle of goods xⁱ = (xᵢ₁, ..., xᵢₙ) ∈ Rⁿ₊ that he receives, as ranked via a utility function uᵢ : Rⁿ₊ → R. A consumption allocation of such bundles is a vector

x = (x¹, ..., x^{|I|}) ∈ (Rⁿ₊)^{|I|}

Next we define allocations that may arise via market exchanges which are, at the same time, voluntary and feasible.

Definition 1052 A pair (p, x) ∈ Rⁿ₊ × (Rⁿ₊)^{|I|} of prices and consumption allocations is a weak Arrow-Debreu (market) equilibrium of the exchange economy E if

(i) xⁱ ∈ arg max_{x∈B(p, p·ωⁱ)} uᵢ(x) for each i ∈ I;

(ii) Σ_{i∈I} xⁱ ≤ ω.

The optimality condition (i) requires that allocation x consist of bundles that, at the price level p, are optimal for each agent i; so, as a trader, agent i is freely trading. The market clearing condition (ii) requires that such allocation x rely on trades that are feasible in the market. Jointly, conditions (i) and (ii) ensure that allocation x is attained via market exchanges that are both voluntary and feasible.
The Arrow-Debreu equilibrium notion thus aggregates individual behavior. What distinguishes a weak equilibrium from an equilibrium is that in the latter optimal bundles exhaust endowments, so no resources are left unused. The next result is mathematically trivial yet of great economic importance, in that it shows that the aggregate equilibrium notions of Chapter 14 can be interpreted in terms of a simple exchange economy.

Lemma 1053 Given a pair (p, x) ∈ Rⁿ₊ × (Rⁿ₊)^{|I|} of prices and consumption allocations, set q = Σ_{i∈I} xⁱ. The pair (p, x) is a:

(i) Arrow-Debreu equilibrium if and only if (14.8) holds, i.e., q = D(p) = S(p);

(ii) weak Arrow-Debreu equilibrium if and only if (14.10) holds, i.e., q = D(p) ≤ S(p).

In view of this result, we can establish the existence of a weak market equilibrium of the exchange economy E using the existence results of Chapter 14, in particular the Arrow-Debreu Theorem. For simplicity, we next consider the existence of a weak market price equilibrium, i.e., a price p̄ such that E(p̄) ≤ 0 (so, at p̄ there is no excess demand).

Proposition 1054 Let E = {(uᵢ, ωⁱ)}_{i∈I} be an economy in which, for each agent i ∈ I, the endowment ωⁱ is strictly positive and the utility function uᵢ is continuous and strictly quasi-concave on a convex and compact consumption set Aᵢ. Then, a weak Arrow-Debreu equilibrium of the exchange economy E exists.
In sum, in this simple exchange economy we have connected individual and aggregate behavior via an equilibrium notion. In particular, the existence of a (weak) market equilibrium is established only via conditions on agents' individual characteristics, i.e., utility functions and endowments, as methodological individualism prescribes. Indeed, aggregating individual behavior via an equilibrium notion is a common mode of analysis in economics.

A caveat, however, is in order: how does a market price equilibrium come about? The previous analysis provides conditions under which it exists but says nothing about what kind of individual choices may actually implement it. A deus ex machina, the "market", sets equilibrium prices, a significant limitation of the analysis from a methodological individualism viewpoint.

All allocations in C(ω) can, in principle, be attained via trading; for this reason, we call them attainable allocations. Yet, if there exists a mighty planner, say a pharaoh, endowed with a vector ω of goods, the attainable allocations may result not from trading but from an arbitrary consumption allocation selected by the pharaoh, who decides which bundle each agent can consume.
The operator f : (Rⁿ₊)^{|I|} → R^{|I|} given by

f(x) = (u₁(x¹), ..., u_{|I|}(x^{|I|}))        (22.51)

represents the utility profile across agents of each allocation. So, the image

f(C(ω)) = {f(x) : x ∈ C(ω)}

consists of all the utility profiles that agents can achieve at attainable allocations. Because of its importance, we denote such image by the more evocative symbol U_E, i.e., we set U_E = f(C(ω)). The subscript reminds us that this set depends on the individual characteristics, utility functions and endowments, of the agents in the economy.
A vector x ∈ (Rⁿ₊)^{|I|} is said to be a (weak, resp.) equilibrium market allocation of economy E if there is a non-zero price vector p such that the pair (p, x) is a (weak, resp.) Arrow-Debreu equilibrium.

Theorem 1055 (First Welfare Theorem) Let E = {(uᵢ, ωⁱ)}_{i∈I} be an economy in which ω ≫ 0 and, for each agent i ∈ I, the utility function uᵢ : Rⁿ₊ → R is concave and strongly increasing. An equilibrium allocation of economy E is (if it exists) Pareto optimal.
Thus, it is not possible to Pareto improve upon an equilibrium allocation. The First Welfare Theorem can be viewed as a possible formalization of the famous invisible hand of Adam Smith. Indeed, an exchange economy reaches, via feasible and voluntary exchanges, an equilibrium allocation that even a benevolent pharaoh would not be able to Pareto improve upon, i.e., he would not be able to select a different attainable allocation that makes at least one agent strictly better off, yet none worse off.
Proof Suppose there exists an equilibrium allocation x ∈ C(ω) under a non-zero price vector p. Suppose, by contradiction, that there exists a different x′ ∈ C(ω) such that f(x′) > f(x). Let i ∈ I. If uᵢ(x′ⁱ) > uᵢ(xⁱ), then p · x′ⁱ > p · ωⁱ because xⁱ is an optimal bundle. If uᵢ(x′ⁱ) = uᵢ(xⁱ), then p · x′ⁱ ≥ p · ωⁱ; indeed, if p · x′ⁱ < p · ωⁱ then x′ⁱ is an optimal bundle that violates the individual Walras' law, a contradiction because uᵢ is strongly increasing and Aᵢ is closed under majorization (Proposition 996). Being f(x′) > f(x), we conclude that p · Σ_{i∈I} x′ⁱ > p · ω. On the other hand, from x′ ∈ C(ω) it follows that p · ω ≥ p · Σ_{i∈I} x′ⁱ because p > 0. We thus reach the contradiction p · Σ_{i∈I} x′ⁱ > p · ω ≥ p · Σ_{i∈I} x′ⁱ. This proves that x is a Pareto optimum.
22.10 Least squares

A linear system

A_{m×n} x_{n×1} = b_{m×1}        (22.52)

may not have a solution. This is often the case when the system has more equations than unknowns, i.e., m > n.

When a system has no solution, there is no vector x̂ ∈ Rⁿ such that Ax̂ = b. That said, one may wonder whether there is a surrogate for a solution, a vector x̄ ∈ Rⁿ that minimizes the approximation error

‖Ax − b‖        (22.53)

that is, the distance between the vector of constants b and the image Ax of the linear operator F(x) = Ax. The error is null in the fortunate case where x solves the system: Ax − b = 0. In general, the error (22.53) is positive, as the norm always is.

By Proposition 978, minimizing the approximation error is equivalent to minimizing its quadratic transformation ‖Ax − b‖². This justifies the following definition.
The least squares solution is an approximate solution of the linear system: it is the best we can do to minimize the distance between the vectors Ax and b in Rᵐ. As ‖·‖² is a sum of squares, finding the least squares solution by solving the optimization problem (22.54) is called the least squares method. The fathers of this method are Gauss and Legendre, who proposed it to analyze astronomical data at the beginning of the nineteenth century.

As we remarked, when it exists the linear system's solution is also a least squares solution. To be a good surrogate, a least squares solution should exist also when the system has no solution. In other words, the more general the conditions ensuring the existence of solutions of the optimization problem (22.54), the more useful the least squares method. The following fundamental result shows that such solutions do indeed exist and are unique under the hypothesis that ρ(A) = n. In the more relevant case where m > n, this amounts to requiring that the matrix A have maximum rank. The result relies on Tonelli's Theorem for existence and on Theorem 1032 for uniqueness.

Theorem 1057 Let m ≥ n. The optimization problem (22.54) has a unique solution if ρ(A) = n.
Later in the book we will see the form of this unique solution (Sections 24.4 and 31.5.1).
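Meanwhile, the content of Theorem 1057 is easy to check numerically: for a random m × n matrix with m > n, the rank condition ρ(A) = n holds with probability one, and a standard least squares routine returns the minimizer of ‖Ax − b‖. A minimal sketch:

    import numpy as np

    rng = np.random.default_rng(1)

    m, n = 50, 3
    A = rng.normal(size=(m, n))   # random matrix: rank n with probability one
    b = rng.normal(size=m)

    x_ls, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)

    print(rank)                          # 3: the uniqueness hypothesis holds
    print(np.linalg.norm(A @ x_ls - b))  # the minimal error ||Ax - b||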
To prove the result, let us consider the function g : Rⁿ → R defined by

g(x) = −‖Ax − b‖²

and the optimization problem

max g(x)  sub  x ∈ Rⁿ        (22.55)

The following lemma illustrates the remarkable properties of the objective function g, which allow us to use Tonelli's Theorem and Theorem 1032. Note that the condition ρ(A) = n is equivalent to requiring injectivity of the linear operator F(x) = Ax (Corollary 689).

Proof Let us start by showing that g is strictly concave. Take x₁, x₂ ∈ Rⁿ with x₁ ≠ x₂ and λ ∈ (0, 1). The condition ρ(A) = n implies that F is injective, hence F(x₁) ≠ F(x₂). Therefore,

‖F(λx₁ + (1 − λ)x₂) − b‖² = ‖λ(F(x₁) − b) + (1 − λ)(F(x₂) − b)‖² < λ‖F(x₁) − b‖² + (1 − λ)‖F(x₂) − b‖²

where the strict inequality follows from the strict convexity of ‖·‖².²³ So,

g(λx₁ + (1 − λ)x₂) = −‖F(λx₁ + (1 − λ)x₂) − b‖² > −λ‖F(x₁) − b‖² − (1 − λ)‖F(x₂) − b‖² = λg(x₁) + (1 − λ)g(x₂)

Let us now show that g is coercive. Define f : Rᵐ → R by f(y) = −‖y − b‖². Since

‖y‖ = ‖y − b + b‖ ≤ ‖y − b‖ + ‖b‖

we have

‖y‖ → +∞ ⟹ ‖y − b‖ → +∞ ⟹ f(y) = −‖y − b‖² → −∞

so f is supercoercive. Set B_t = {y ∈ Im F : f(y) ≥ t} = (f ≥ t) ∩ Im F for t ∈ R. As f is supercoercive and continuous, by Proposition 1019 f is coercive on the closed set Im F and the sets B_t = (f ≥ t) ∩ Im F are compact for every t. Furthermore,

(g ≥ t) = {x ∈ Rⁿ : f(F(x)) ≥ t} = {x ∈ Rⁿ : F(x) ∈ B_t} = F⁻¹(B_t)

Since F is injective and linear, its inverse is continuous on Im F, so each set (g ≥ t) = F⁻¹(B_t) is compact and g is coercive.
Proof of Theorem 1057 In light of the previous lemma, problem (22.55), and so problem
(22.54), has a solution thanks to Tonelli's Theorem because g is coercive. Such a solution is
unique thanks to Theorem 1032 because g is strictly concave.
However, the farmer does have data on the pairs (xᵢ, yᵢ) of input and output over the previous m years, that is, for i = 1, ..., m. The farmer wishes to find the linear production function f(x) = αx, with α ∈ R, that best fits his data. Linearity is assumed for the sake of simplicity: once one becomes familiar with the method, more complex specifications of f can be considered.

It is still unclear what "best fits his data" means precisely. This is, indeed, the crux of the matter. According to the least squares method, it consists in requiring the function to be f(x) = α̂x, where the coefficient α̂ minimizes

Σᵐᵢ₌₁ (yᵢ − αxᵢ)²

that is, the sum of the squares of the errors yᵢ − αxᵢ made by using the production function f(x) = αx to evaluate output. Therefore, one is faced with the following optimization problem

min Σᵐᵢ₌₁ (yᵢ − αxᵢ)²  sub  α ∈ R

Denoting by X = (x₁, ..., xₘ) and Y = (y₁, ..., yₘ) the data vectors regarding input and output, the problem can be restated as

min ‖αX − Y‖²  sub  α ∈ R        (22.57)

which is the special case n = 1 of the optimization problem (22.54), with the notation A = X, x = α and b = Y.²⁴

By Theorem 1057, problem (22.57) has a unique solution α̂ ∈ R because the rank condition is trivially satisfied when n = 1. The farmer can use the production function

f(x) = α̂x

to decide how much fertilizer to use for the next crop, for whichever level of output he might choose. Given the data he has at hand and the (possibly simplistic) choice of a linear production function, the least squares method suggests to the farmer that this is the production function that best fits the available data.
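A hedged numerical sketch of the farmer's problem, with made-up data: for a single regressor, a direct computation shows that the unique solution of (22.57) is α̂ = (X · Y)/(X · X), which the code evaluates.

    import numpy as np

    # Illustrative input/output data over m = 6 years (made up for the sketch).
    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    Y = np.array([1.1, 1.9, 3.2, 3.9, 5.2, 5.8])

    # One-regressor least squares: min over alpha of ||alpha*X - Y||^2.
    alpha = (X @ Y) / (X @ X)

    print(alpha)                           # fitted slope of f(x) = alpha*x
    print(np.linalg.norm(alpha * X - Y))   # the minimized approximation error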
²⁴ Unfortunately, the notation we have used, which is standard in statistics, is not consistent with that of problem (22.54): in particular, here α plays the role of x in (22.54). The only benefit of inconsistent notation is that it provides a litmus test to check whether a topic has been understood.

[Scatter plot of the data (xᵢ, yᵢ) with the fitted line y = α̂x.]

Such a procedure can be used in the analysis of data regarding any pair of variables. The independent variable x, referred to as the regressor, is generally not unique. For example, suppose the same farmer needs n kinds of input x₁, x₂, ..., xₙ, that is, n regressors, to produce a quantity y of output. The data collected by the farmer are thus

X₁ = (x₁₁, x₁₂, ..., x₁ₘ)
X₂ = (x₂₁, x₂₂, ..., x₂ₘ)
Lemma 1060 We have arg max_{x∈C} W_λ(x) ⊆ arg opt_{x∈C} f(x) for every λ ≫ 0.

Proof Fix λ ≫ 0 with Σᵐᵢ₌₁ λᵢ = 1. Let x̂ ∈ arg max_{x∈C} W_λ(x). The point x̂ is clearly a Pareto optimizer: otherwise, there would exist x ∈ C such that f(x) > f(x̂); but, being λ ≫ 0, this would imply W_λ(x) = λ · f(x) > λ · f(x̂) = W_λ(x̂), a contradiction.
This lemma implies the next Weierstrass-type result that ensures the existence of solu-
tions for an operator optimization problem.
In this case, by suitably choosing the vector λ of weights we can retrieve all the optimizers. The next examples show that this may, or may not, happen.
Example 1062 (i) Consider f : [0, 1] → R² given by f(x) = (eˣ, e⁻ˣ). All the points of the unit interval are Pareto optimizers for f. The welfare function W_λ : [0, 1] → R is given by W_λ(x) = λeˣ + (1 − λ)e⁻ˣ, where λ ∈ (0, 1). Its maximizer is x̂ = 0 if (1 − λ)/λ ≥ e and x̂ = 1 otherwise. Hence, only the two Pareto optimizers {0, 1} can be found through scalarization.

(ii) Consider f : [0, 1] → R² given by f(x) = (x², −x²). Again, all the points of the unit interval are Pareto optimizers for f. The welfare function W_λ : [0, 1] → R is given by W_λ(x) = λx² − (1 − λ)x² = (2λ − 1)x², where λ ∈ (0, 1). We have

arg max_{x∈C} W_λ(x) = {0}  if λ < 1/2,    [0, 1]  if λ = 1/2,    {1}  if λ > 1/2

and so (22.60) holds. In this case, all Pareto optimizers can be retrieved via scalarization. N
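The scalarization failure in point (i) can be replicated numerically: scanning the weight λ over (0, 1) and maximizing W_λ on a grid of [0, 1] only ever returns the endpoints 0 and 1, although every point of the interval is a Pareto optimizer. A minimal sketch:

    import numpy as np

    x = np.linspace(0.0, 1.0, 1001)            # grid of the choice set C = [0, 1]

    found = set()
    for lam in np.linspace(0.01, 0.99, 99):    # scan the scalarization weights
        W = lam * np.exp(x) + (1 - lam) * np.exp(-x)   # W_lam of Example 1062-(i)
        found.add(round(float(x[np.argmax(W)]), 6))

    print(sorted(found))   # only 0.0 and 1.0: interior Pareto optimizers are missed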
Given f : (Rⁿ₊)^{|I|} → R^{|I|} defined in (22.51), i.e., f(x) = (u₁(x¹), ..., u_{|I|}(x^{|I|})), the operator optimization problem of the planner is

opt f(x)  sub  x ∈ C(ω)        (22.62)

The solutions of this problem, i.e., the Pareto optimizers, are called Pareto optimal allocations (in accordance with the terminology of the First Welfare Theorem).

In view of the previous discussion, the planner can tackle his problem through a welfare function W_λ(x) = Σ_{i∈I} λᵢuᵢ(xⁱ) and the associated optimization problem

max W_λ(x)  sub  x ∈ C(ω)        (22.63)

Unless (22.60) holds, some Pareto optimizers will be missed by a planner who relies on this scalar optimization problem, whatever weights he chooses to scalarize with.
Example 1063 Consider an exchange economy with two agents and one good. Assume that the total amount of the good in the economy is ω > 0. For the sake of simplicity, assume that the two agents have the same preferences over this single good. In this way, they share the same utility function, for example the linear u : R₊ → R defined by

u₁(x) = u₂(x) = x

A planner has to allocate the total endowment ω between the two agents. In other words, he has to choose an attainable vector x = (x₁, x₂) ∈ R²₊, that is, such that x₁ + x₂ ≤ ω, where x₁ will be the share of ω allotted to the first agent and x₂ the share of the second agent. Indeed, each agent can only receive a positive quantity of the good, x ∈ R²₊, and the planner cannot allocate to the agents more than what is available in the economy, x₁ + x₂ ≤ ω. Here the collection (22.61) of attainable allocations is

C(ω) = {x ∈ R²₊ : x₁ + x₂ ≤ ω}

Define f : R²₊ → R²₊ by

f(x₁, x₂) = (x₁, x₂)

In other words, the function f associates to each allocation x the utility profile (u₁(x₁), u₂(x₂)) ∈ R²₊, the vector of the utilities that the two agents derive from the feasible allocation x. The planner's operator optimization problem (22.59) is here

opt f(x)  sub  x ∈ C(ω)

and

arg opt_{x∈C(ω)} f(x) = {x ∈ R²₊ : x₁ + x₂ = ω}

that is, the allocations that exhaust total resources are the Pareto optimizers of f on C(ω). Since the agents' utility functions are linear, the Pareto frontier is {x ∈ R²₊ : x₁ + x₂ = ω}. N

Example 1064 If in the previous example we have two agents and two goods, we get back to the setup of the Edgeworth box (Section 2.5). Recall that we assumed that there is one unit of each good to split between the two agents (Albert and Barbara), so ω = (1, 1). They have the same utility function uᵢ : R²₊ → R defined by

uᵢ(xᵢ₁, xᵢ₂) = √(xᵢ₁xᵢ₂)

Define f : (R²₊)² → R²₊ by f(x¹, x²) = (√(x₁₁x₁₂), √(x₂₁x₂₂)). The planner's operator optimization problem (22.59) is here

opt f(x)  sub  x ∈ C(ω)

By Proposition 62,

arg opt_{x∈C(ω)} f(x) = {x ∈ (R²₊)² : 0 ≤ x₁₁ = x₁₂ = 1 − x₂₁ = 1 − x₂₂ ≤ 1}

that is, the allocations that are symmetric, i.e., in which each agent receives the same quantity of both goods, and that exhaust total resources are the Pareto optimizers of f on C(ω). The Pareto frontier is

{(√(x₁₁x₁₂), √(x₂₁x₂₂)) ∈ R²₊ : 0 ≤ x₁₁ = x₁₂ = 1 − x₂₁ = 1 − x₂₂ ≤ 1}    N
O.R. As the First Welfare Theorem suggests, there is a close connection between Pareto optimal
allocations and the equilibrium allocations that would arise if agents were given individual
endowments and could trade among themselves at a price vector. We do not discuss
this topic further; readers will study it in microeconomics courses. Just note that, through
this connection, the possible equilibrium allocations may be found by solving the operator
optimization problem (22.62) or, under condition (22.60), the standard optimization problem
(22.63). H
  f(z) = f(½x + ½y) > ½f(x) + ½f(y) ≥ min{f(x), f(y)}
(ii) Injective functions f : A → ℝ are cuneiform. Let x, y ∈ A be any two distinct elements
of A. Since injectivity implies f(x) ≠ f(y), without loss of generality we can assume that
f(x) > f(y). So, x itself can play the role of z in Definition 1065. An important class
of cuneiform functions is thus that of the strictly monotone (increasing or decreasing) functions
defined on any subset, finite or not, of the real line. N
The next result shows that being cuneiform is a necessary and sufficient condition for the
uniqueness of solutions. In view of the last example, this result generalizes the uniqueness
result that we established for strictly quasi-concave functions.
Proof "If". Let f : A → ℝ be cuneiform. We want to show that there exists at most one
maximizer in A. Suppose, by contradiction, that there exist in A two such points x′ and x″,
i.e., f(x′) = f(x″) = max_{x∈A} f(x). Since f is cuneiform, there exists z ∈ A such that

  f(z) > min{f(x′), f(x″)} = max_{x∈A} f(x)

which contradicts the optimality of x′ and x″. "Only if". Suppose that there exists at most
one maximizer in A. Let x′ and x″ be any two distinct elements of A. If there are no
maximizers, then in particular x′ and x″ are not maximizers; so, there exists z ∈ A such
that f(z) > min{f(x′), f(x″)}. We conclude that f is cuneiform. On the other hand, if
there is one maximizer, it is easy to check that it plays the role of z in Definition 1065. Also
in this case f is cuneiform.
Though for brevity we omit the details, it is easy to see that there is a dual notion in which the
inequality in the previous definition is reversed and the previous result holds for minimizers.
there do not exist any three positive integers x, y and z such that xⁿ + yⁿ = zⁿ (Section
1.3.2)
Let us consider the optimization problem of minimizing over C = (0, +∞)³ × {n ∈ ℕ : n ≥ 3} the function

  f(x, y, z, n) = (xⁿ + yⁿ − zⁿ)² + (1 − cos 2πx)² + (1 − cos 2πy)² + (1 − cos 2πz)²

We have

  inf_{(x,y,z,n)∈C} f(x, y, z, n) = 0

since lim_{n→∞} f(1, 1, ⁿ√2, n) = lim_{n→∞} (1 − cos 2π ⁿ√2)² = 0. Indeed, lim_{n→∞} ⁿ√2 = 1
(Proposition 346).
The infimum value is thus 0. The question is whether there is a solution of the problem,
that is, a vector (x̂, ŷ, ẑ, n̂) ∈ C such that f(x̂, ŷ, ẑ, n̂) = 0. Since f is a sum of squares, this
requires that in such a vector they all be null:

  x̂^n̂ + ŷ^n̂ − ẑ^n̂ = 1 − cos 2πx̂ = 1 − cos 2πŷ = 1 − cos 2πẑ = 0

The last three equalities imply that the points x̂, ŷ and ẑ are integers.²⁹ In order to belong
to the set C, they must be positive. Therefore, the vector (x̂, ŷ, ẑ, n̂) ∈ C must be made
up of three positive integers such that x̂^n̂ + ŷ^n̂ = ẑ^n̂ for n̂ ≥ 3. This is possible if and only
if Fermat's Last Theorem is false. Now that we know it to be true, we can conclude that
this optimization problem has no solution. We could not have made such a statement before
1994: till then, it would have been unclear whether this optimization problem had a solution.
Be that as it may, solving this optimization problem, which has only four variables, amounts
to solving one of the most well-known problems in mathematics.
²⁹ Recall that cos 2πx = 1 if and only if x is an integer.
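It is instructive to check (our computation, not part of the text) that the restriction n̂ ≥ 3 in C is what ties the problem to Fermat's Last Theorem: with n = 2 the Pythagorean triple (3, 4, 5) would make f vanish,

  f(3, 4, 5, 2) = (3² + 4² − 5²)² + (1 − cos 6π)² + (1 − cos 8π)² + (1 − cos 10π)² = 0

so the infimum would be attained and the problem would have a solution.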
Chapter 23
Semicontinuous optimization
In some optimization problems, continuity turns out to be too strong a property, and a weakened
notion of continuity, called semicontinuity, comes to play a key role. Fortunately, a more
general version of Tonelli's Theorem continues to hold. We first introduce semicontinuity,
and then present this ultimate version of Tonelli's Theorem.¹
  ‖x − x₀‖ < δε ⟹ f(x₀) − ε < f(x) < f(x₀) + ε  ∀x ∈ A   (23.1)

If in this definition we keep only the second inequality, we have the following weakening of
continuity.

Definition 1068 A function f : A ⊆ ℝⁿ → ℝ is upper semicontinuous at a point x₀ ∈ A if,
for each ε > 0, there exists δε > 0 such that

  ‖x − x₀‖ < δε ⟹ f(x) < f(x₀) + ε  ∀x ∈ A

A function that is upper semicontinuous at each point of a set E is called upper semicontinuous
on E. The function is called upper semicontinuous when it is upper semicontinuous
at all the points of its domain.³

Upper semicontinuity has a dual notion of lower semicontinuity, with f(x) > f(x₀) − ε
in place of f(x) < f(x₀) + ε.
Proof The "if" is obvious. As to the converse, assume that f is both upper and lower
semicontinuous at x₀ ∈ A. Fix ε > 0. There exist δ′ε, δ″ε > 0 such that, for each x ∈ A,

  ‖x − x₀‖ < δ′ε ⟹ f(x) < f(x₀) + ε  and  ‖x − x₀‖ < δ″ε ⟹ f(x) > f(x₀) − ε

So, by taking δε = min{δ′ε, δ″ε}, we have

  ‖x − x₀‖ < δε ⟹ f(x₀) − ε < f(x) < f(x₀) + ε  ∀x ∈ A

In view of (23.1), we conclude that f is continuous at x₀, as desired.
The study of the two forms of semicontinuity, upper and lower, is analogous: indeed, it
is easy to see that f is upper semicontinuous if and only if −f is lower semicontinuous. For
this reason, we will focus on upper semicontinuity because it is more relevant for the study
of maximizers.
Proposition 1070 A function f : A ⊆ ℝⁿ → ℝ is upper semicontinuous at x₀ ∈ A if and only if

  lim sup f(xₙ) ≤ f(x₀)

for every sequence {xₙ} ⊆ A such that xₙ → x₀.

By Proposition 552, for continuous functions we have lim f(xₙ) = f(x₀), so this sequential
characterization of semicontinuous functions helps to understand to what extent upper
semicontinuity generalizes continuity. For lower semicontinuous functions, we have the dual condition
lim inf f(xₙ) ≥ f(x₀).⁴
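For a concrete instance of this characterization (an illustration of ours, not an example from the text), consider f : ℝ → ℝ with f(0) = 1 and f(x) = 0 for x ≠ 0. For any sequence xₙ → 0 we have f(xₙ) ≤ 1, so

  lim sup f(xₙ) ≤ 1 = f(0)

and f is upper semicontinuous at 0; taking xₙ = 1/n gives lim inf f(xₙ) = 0 < f(0), so f is not lower semicontinuous there.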
Proof Let f be upper semicontinuous at the point x₀. Let {xₙ} be such that xₙ → x₀. Fix
ε > 0. There is nε ≥ 1 such that ‖xₙ − x₀‖ < δε for all n ≥ nε. By Definition 1068, we then
have f(xₙ) < f(x₀) + ε for each n ≥ nε. Therefore, lim sup f(xₙ) ≤ f(x₀) + ε. Since this
is true for each ε > 0, we conclude that lim sup f(xₙ) ≤ f(x₀).

Suppose now that lim sup f(xₙ) ≤ f(x₀) for each sequence {xₙ} such that xₙ → x₀.
Suppose, per contra, that f is not upper semicontinuous at x₀. Then there exists
ε > 0 such that for each δ > 0 there is x_δ with ‖x_δ − x₀‖ < δ and f(x_δ) ≥ f(x₀) + ε.
Setting δ = 1/n, it follows that for each n there exists xₙ such that ‖xₙ − x₀‖ < 1/n and
f(xₙ) ≥ f(x₀) + ε. In this way we can construct a sequence {xₙ} such that xₙ → x₀
and f(xₙ) ≥ f(x₀) + ε for each n. Therefore, lim inf f(xₙ) ≥ f(x₀) + ε > f(x₀), which
contradicts lim sup f(xₙ) ≤ f(x₀). We conclude that f is upper semicontinuous at x₀.
[Figure: graph of the function (23.2), with the isolated value f(1) = 2 above the rest of the graph near x₀ = 1]
The function is upper semicontinuous at x₀ = 1. In fact, let {xₙ} ⊆ ℝ with xₙ → 1. For every
such xₙ we have f(xₙ) ≤ 1 and therefore lim sup f(xₙ) ≤ 1 < 2 = f(1). By Proposition
1070, f is then upper semicontinuous at x₀ (so, it is upper semicontinuous because it is
continuous at each x ≠ x₀). N

This last example shows that, in general, if a function f has a removable discontinuity
at a point x₀, i.e., the limit lim_{x→x₀} f(x) exists but is different from f(x₀), then f is
at x₀ either upper semicontinuous, if f(x₀) > lim_{x→x₀} f(x), or lower semicontinuous, if
f(x₀) < lim_{x→x₀} f(x).
there continuous from the left, that is, lim_{x→x₀⁻} f(x) = f(x₀). For example, let us modify
the function (23.2) at x₀ = 1, so to have

  f(x) = 2 if x > 1;  x if x ≤ 1
23.1.2 Properties
The upper contour sets of continuous functions are closed (Example 1004 and Lemma 1004).
Remarkably, this property is still true for upper semicontinuous functions, so this weaker
notion of continuity preserves this important property.
In words, the function 1_C takes on value 1 on C and 0 elsewhere. Though not continuous, it
is upper semicontinuous. Indeed, let x₀ ∈ ℝⁿ. If x₀ ∈ C, then f(x₀) ≥ f(x) for all x ∈ ℝⁿ,
so it trivially holds that lim sup f(xₙ) ≤ f(x₀) whenever xₙ → x₀. If x₀ ∉ C, then it belongs
to the open set Cᶜ. Given any ε > 0, if xₙ → x₀ then there is nε ≥ 1 such that xₙ ∈ Cᶜ, so
f(xₙ) = 0, for all n ≥ nε. Thus, lim f(xₙ) = f(x₀) = 0. By Proposition 1070, we conclude
that f is upper semicontinuous since x₀ was arbitrarily chosen. Its upper contour sets are:

  (1_C ≥ t) = ℝⁿ if t ≤ 0;  C if t ∈ (0, 1];  ∅ if t > 1
From the previous result it follows that Proposition 1009 also continues to hold under
upper semicontinuity.
Example 1078 The union of the closed sets Aₙ = [−1 + 1/n, 1 − 1/n] is the open interval
(−1, 1), as noted after Corollary 166. The supremum of the infinitely many upper semicontinuous
functions

  fₙ(x) = 1_{[−1+1/n, 1−1/n]}(x)

is the function

  h(x) = sup_{n∈ℕ} 1_{[−1+1/n, 1−1/n]}(x) = 1_{(−1,1)}(x)

which is not upper semicontinuous: its upper contour set (h ≥ 1) = (−1, 1) is not closed. N
Proof of Proposition 1077 Let x₀ ∈ A. Given ε > 0, there exists i ∈ I such that
fᵢ(x₀) < g(x₀) + ε. Since fᵢ is upper semicontinuous, there exists δε > 0 such that

  ‖x − x₀‖ < δε ⟹ fᵢ(x) < fᵢ(x₀) + ε  ∀x ∈ A

So,

  ‖x − x₀‖ < δε ⟹ g(x) ≤ fᵢ(x) < fᵢ(x₀) + ε < g(x₀) + 2ε  ∀x ∈ A

that is,

  ‖x − x₀‖ < δε ⟹ g(x) < g(x₀) + 2ε  ∀x ∈ A
This proves that g is upper semicontinuous at x₀ ∈ A. We leave to the reader the proof that
h is upper semicontinuous at x₀ ∈ A when I is finite.

Dual properties hold for lower semicontinuous functions: lower semicontinuity is preserved
by suprema over sets of functions of any cardinality, while it is preserved under infima
only over finite sets of functions. Now the analogy is with the stability properties of open
sets relative to intersections and unions. Indeed, a tight connection, dual to the one established
in Example 1075, is easily seen to exist between lower semicontinuous functions and open sets.

In view of Proposition 1069, we then have the following important corollary about the
"finite" stability of continuous functions.
The proof is a slight modification of the first proof of Weierstrass' Theorem, which essentially
still goes through under upper semicontinuity (a further sign that upper semicontinuity
is the relevant notion of continuity to establish the existence of maximizers).
Proof Since f is coercive, there exists t ∈ ℝ such that the upper contour set K = (f ≥ t) ∩ C
is non-empty and compact. Set α = sup_{x∈K} f(x), that is, α = sup f(K). By Lemma
1001, there exists a sequence {aₙ} ⊆ f(K) such that aₙ → α. Let {xₙ} ⊆ K be such that
aₙ = f(xₙ) for every n ≥ 1. Since K is compact, the Bolzano-Weierstrass Theorem yields a
subsequence {x_{n_k}} ⊆ {xₙ} that converges to some x̂ ∈ K, that is, x_{n_k} → x̂ ∈ K. Since {aₙ}
converges to α, also the subsequence {a_{n_k}} converges to α. Since f is upper semicontinuous,
it follows that

  α = lim_{k→∞} a_{n_k} = lim_{k→∞} f(x_{n_k}) ≤ f(x̂) ≤ α

Here the penultimate inequality is due to upper semicontinuity. So, α = f(x̂) and we thus
conclude that f(x̂) ≥ f(x) for every x ∈ K. At the same time, if x ∈ C∖K we have
f(x) < t and so f(x̂) ≥ t > f(x). It follows that f(x̂) ≥ f(x) for every x ∈ C, that is,
f(x̂) = max_{x∈C} f(x).
It remains to show that arg max_{x∈C} f(x) is compact if C is closed. Since arg max_{x∈C} f(x)
⊆ K, it is enough to show that arg max_{x∈C} f(x) is closed. Clearly, arg max_{x∈C} f(x) =
(f ≥ max_{x∈C} f(x)) ∩ C, which is closed because the upper contour sets of an upper
semicontinuous function are closed and C is closed.
The compactness of the set of maximizers makes coercivity a necessary condition for
global optimality for upper semicontinuous objective functions.

If arg max_{x∈C} f(x) is not compact, coercivity is no longer a necessary condition for
optimality, as a constant function on ℝⁿ shows (recall the discussion after Example 1014).
Proof "If". Let arg max_{x∈C} f(x) be non-empty and compact. Since arg max_{x∈C} f(x) =
(f ≥ max_{x∈C} f(x)) ∩ C, the function f is coercive on C. "Only if". Let f be coercive on C.
By Tonelli's Theorem, arg max_{x∈C} f(x) is non-empty and compact.
We conclude by observing that dual versions of the results we established hold for
minimizers, with for instance lower contour sets in place of the upper ones (as readers can
check).
Like the upper contour sets of upper semicontinuous functions, also those of upper quasi-continuous
functions are closed, but with an important caveat: the "levels" have to be images
of the function.
We can now state and prove a general ordinal version of Tonelli's Theorem in which
upper quasi-continuity replaces upper semicontinuity.⁵

Theorem 1088 (Ordinal Tonelli) A function f : A ⊆ ℝⁿ → ℝ which is coercive and
upper quasi-continuous on a subset C of A admits (at least) a maximizer in C. If, in
addition, C is closed, then arg max_{x∈C} f(x) is compact.
The proof relies on a sharpening of Lemma 1001.

Lemma 1089 Let A be a subset of the real line. There exists an increasing sequence
{aₙ} ⊆ A such that aₙ ↑ sup A.

Proof Set α = sup A. Suppose that α ∈ ℝ. In the proof of Lemma 1001 we proved the
existence of a sequence {aₙ} ⊆ A such that aₙ ≤ α and aₙ → α. Set bₙ = max{a₁, …, aₙ}.
Then 0 ≤ α − bₙ ≤ α − aₙ → 0, so bₙ ↑ α. Suppose now α = +∞. In the proof of Lemma
1001 we proved the existence of a sequence {aₙ} ⊆ A such that aₙ → +∞. Again, by setting
bₙ = max{a₁, …, aₙ}, we have bₙ ↑ +∞.
Proof Since f is coercive, there exists t ∈ ℝ such that the upper contour set K = (f ≥ t) ∩ C
is non-empty and compact. Set α = sup_{x∈K} f(x). By Lemma 1089, there exists a sequence
{aₙ} ⊆ f(K) such that aₙ ↑ α. Let {xₙ} be such that aₙ = f(xₙ) for every n ≥ 1. Since
K is compact, the Bolzano-Weierstrass Theorem yields a subsequence {x_{n_k}} ⊆ {xₙ} that
converges to some x̂ ∈ K. We want to show that α = f(x̂). Suppose, by contradiction, that
f(x̂) < α. Since a_{n_k} ↑ α, there exists k̄ ≥ 1 large enough so that a_{n_k} > f(x̂)
for all k ≥ k̄. Hence, f(x_{n_k}) ≥ f(x_{n_k̄}) for all k ≥ k̄. Since f is upper quasi-continuous at
x̂, we then have f(x̂) ≥ f(x_{n_k̄}) > f(x̂), a contradiction.⁶ We conclude that α = f(x̂). So,
f(x̂) ≥ f(x) for every x ∈ K. At the same time, if x ∈ C∖K we have f(x) < t and so
f(x̂) ≥ t > f(x). It follows that f(x̂) ≥ f(x) for every x ∈ C, as desired.
In view of Proposition 1087, the compactness of arg max_{x∈C} f(x) can be proved along
the lines of the semicontinuous Tonelli's Theorem.

Corollary 1082 continues to hold in the upper quasi-continuous case, as readers can check.

The ordinal Tonelli's Theorem is the most general form of this existence theorem that we
present. The earlier pre-coda version of Tonelli's Theorem for continuous functions, Theorem
1013, is enough for the results of the book. Yet, when readers later come
across topics that rely on Tonelli's Theorem, they may wonder how much generality
would be gained via its stronger semicontinuous and quasi-continuous versions.
⁵ We leave to readers the dual minimization version, based on a lower quasi-continuity notion.
⁶ Here x_{n_k̄} plays the role of y in (23.3).
The vectors in RC are called directions of recession. Intuitively, along these directions
the set C is unbounded.
Example 1091 In the plane, the recession cones of the convex sets

  C₁ = {(x₁, x₂) ∈ ℝ² : x₂ ≥ 1/x₁ and x₁ > 0}  and  C₂ = {(x₁, x₂) ∈ ℝ² : x₂ ≥ x₁²}

are the positive orthant and the vertical axis, respectively. That is, R_{C₁} = ℝ²₊ and R_{C₂} =
{(0, x₂) : x₂ ≥ 0}. N
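To verify the second claim (our check, not part of the example), take y = (y₁, y₂) ∈ R_{C₂} and x = (0, 0) ∈ C₂. Then x + ty = (ty₁, ty₂) ∈ C₂ requires

  ty₂ ≥ t²y₁²,  i.e.,  y₂ ≥ ty₁²  ∀t > 0

which forces y₁ = 0 and y₂ ≥ 0. Conversely, if (x₁, x₂) ∈ C₂ then (x₁, x₂ + ty₂) ∈ C₂ for all y₂, t ≥ 0.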
Proof Let y ∈ R_C, so that x + ty ∈ C for all x ∈ C and all t ≥ 0. For each λ ≥ 0, we then
have x + t(λy) = x + (tλ)y ∈ C for all x ∈ C and all t ≥ 0. So, λy ∈ R_C and we conclude
that R_C is a cone. To show that it is convex, let y′, y″ ∈ R_C and α ∈ [0, 1]. Then,

  x + t(αy′ + (1 − α)y″) = α(x + ty′) + (1 − α)(x + ty″) ∈ C
The next lemma gives some basic properties of recession cones of closed convex sets.
Observe that, by point (iii), to check whether a vector y is a direction of recession it is enough
to check a single x ∈ C.

Lemma 1093 Let C be a closed convex subset of ℝⁿ. Then, the following conditions are
equivalent:

(i) y ∈ ℝⁿ belongs to R_C;

(ii) there exist {xₙ} ⊆ C and {λₙ} ⊆ ℝ₊, with λₙ ↑ ∞, such that lim (xₙ/λₙ) = y;

(iii) there exists x ∈ C such that x + ty ∈ C for all t ≥ 0.
Proof (i) trivially implies (iii). (iii) implies (ii). Suppose there is x ∈ C such that x + ty ∈ C
for all t ≥ 0. Hence xₙ = x + ny ∈ C for all n ≥ 1. If we set λₙ = n, we have xₙ/λₙ → y.

(ii) implies (i). Suppose y ∈ ℝⁿ is such that lim (xₙ/λₙ) = y for some {xₙ} ⊆ C and
{λₙ} ⊆ ℝ₊, with λₙ ↑ ∞. Given any x ∈ C and t ≥ 0, set

  zₙ = (1 − t/λₙ)x + (t/λₙ)xₙ  ∀n ≥ 1

We have zₙ ∈ C for all n large enough (i.e., such that t/λₙ ≤ 1), and limₙ zₙ = x + ty. Since
C is closed, we have x + ty ∈ C. So, y ∈ R_C.
Example 1094 (i) Let C = ℝ²₊₊ ∪ {0}. It is easy to see that R_C = C, so the recession cone
might not be closed when C is not closed. Note that properties (ii) and (iii) of the last
lemma are not true for this set C. N
Proof We prove only that R_{C∩D} ⊆ R_C ∩ R_D, since the converse inclusion is trivial. Let y ∈ R_{C∩D}
and x ∈ C ∩ D. Then, x + ty ∈ C ∩ D for all t ≥ 0. By Lemma 1093, y belongs to both R_C
and R_D.
Since the directions of recession of a set are, intuitively, the directions along which the
set is unbounded, it is natural to expect that a set without such directions be bounded. The
next important result confirms that this intuition is correct as long as the set is closed and
convex. Thus, this is the class of sets where the notion of recession cone is most meaningful.
Theorem 1097 Let C be a closed convex subset of ℝⁿ. Then, C is bounded if and only if
R_C = {0}.
So, a closed and convex set is compact if and only if its recession cone is trivial. This is a
remarkable characterization of compactness for closed convex sets.
Example 1098 (i) Let C = {(x₁, 0) : x₁ ≥ 0} ∪ {(0, x₂) : x₂ ≥ 0} be the union of the horizontal
and vertical positive half-axes, a non-convex and unbounded set of the plane. We have R_C = {0},
which shows that in the last theorem the hypothesis of convexity is key.

Since R_{C₁} = C₁ where C₁ = {(x₁, 0) : x₁ ≥ 0}, this example also shows that larger sets
might well not feature larger recession cones.

(ii) Let C = ((0, 1) × [0, ∞)) ∪ {(0, 0), (1, 0)}, a non-closed and unbounded set of the plane.
We have R_C = {0}, and so in the last theorem the closedness hypothesis is also key. N
The vectors in L_C are called directions of constancy. Along them, going both backward
and forward, the set C is "symmetrically" unbounded. A vector space, not just a cone, then
results.

  L_C = R_C ∩ R_{−C} = R_C ∩ (−R_C)   (23.5)

Thus, a vector y is a direction of constancy if and only if both y and −y are directions of
recession. A direction of constancy can thus be regarded as a two-sided direction of recession.

Proof It is easy to check that L_C is a vector space. Let y ∈ R_C ∩ (−R_C). Given any t < 0
and x ∈ C, consider x + ty. Since −y ∈ R_C, we have x + (−t)(−y) ∈ C, i.e., x + ty ∈ C.
Along with Corollary 1096, this proposition has the following interesting consequence.

Corollary 1102 We have L_P = {y ∈ ℝⁿ : Ay = 0}.

Proof By Corollary 1096, R_P = {y ∈ ℝⁿ : Ay ≤ 0} and −R_P = {y ∈ ℝⁿ : Ay ≥ 0}. By
(23.5), we have L_P = {y ∈ ℝⁿ : Ay = 0}.
The following result clarifies the nature of R_f for concave functions by showing that
R_f = R_{(f≥α)} for all non-empty upper contour sets (f ≥ α).

Lemma 1105 Let f : C → ℝ be an upper semicontinuous concave function defined on a
closed convex subset C of ℝⁿ. Then,

  R_f = R_{(f≥α)}

for all α ∈ ℝ such that (f ≥ α) ≠ ∅.

Hence, R_A = {(y, 0) : y ∈ R_{(f≥α)}}. On the other hand, A = hypo f ∩ (ℝⁿ × {α}). Since
R_{ℝⁿ×{α}} = ℝⁿ × {0}, by Proposition 1095 we have

  R_A = R_{hypo f} ∩ (ℝⁿ × {0})

We conclude that

  {(y, 0) : y ∈ R_{(f≥α)}} = R_{hypo f} ∩ (ℝⁿ × {0})

i.e., R_{(f≥α)} = {y ∈ ℝⁿ : (y, 0) ∈ R_{hypo f}}. Since α was arbitrarily chosen, we conclude that
R_{(f≥α)} = R_f.
(i) R_f = {0};
In view of Proposition 1016, this corollary implies that an upper semicontinuous concave
function f : ℝⁿ → ℝ has a trivial recession cone if and only if it is supercoercive. So, the
condition R_f = {0} can be viewed as a general condition of supercoercivity for concave
functions defined on closed convex sets of ℝⁿ.
Proof In view of the last lemma, the only non-trivial implication is that (ii) implies (iii). By
hypothesis, all upper contour sets of f are closed and convex. Suppose that one of them, say
(f ≥ ᾱ) for some ᾱ ∈ ℝ, is compact. By Theorem 1097, R_{(f≥ᾱ)} = {0}. By the last lemma,
we then have R_{(f≥α)} = R_{(f≥ᾱ)} = {0} for all α ∈ ℝ with (f ≥ α) ≠ ∅. Again by Theorem 1097,
(f ≥ α) is then compact for all α ∈ ℝ.
Next we characterize Rf for concave functions. In particular, point (iii) shows that Rf
is the set of all directions of increase of f , while point (v) provides a remarkable asymptotic
characterization of the elements of Rf .
(i) y ∈ R_f;
The proof relies on the following lemma, which reports yet another remarkable property
of concave functions.
Lemma 1108 For a concave function φ : [0, ∞) → ℝ the following properties are equivalent:

(i) φ is increasing;

(ii) φ is bounded below;

(iii) lim_{t→+∞} φ(t)/t ≥ 0.
Proof (iii) implies (i). Suppose, by contradiction, that φ is not increasing. So, there exist
w > y ≥ 0 with φ(w) < φ(y). Let z > w. There is λ ∈ (0, 1) such that w = λz + (1 − λ)y.
Since φ is concave, we have φ(w) ≥ λφ(z) + (1 − λ)φ(y) > λφ(z) + (1 − λ)φ(w), and so
φ(w) > φ(z). Since z was arbitrarily chosen, we have φ(w + h) < φ(w) for all h > 0. In
turn, this implies φ′₊(w) ≤ 0. In view of Corollary 1552, from φ(y) > φ(w) > φ(z) it follows
that 0 ∉ ∂φ(w). By Proposition 1518, we have ∂φ(w) = [φ′₊(w), φ′₋(w)]. Since 0 ∉ ∂φ(w) and
φ′₊(w) ≤ 0, we have φ′₋(w) < 0. By the definition of superdifferential (Section 32.1), we
have

  φ(x) ≤ φ(w) + φ′₋(w)(x − w)  ∀x ≥ 0

We can write each scalar x > w as x = tw for some t > 1. For each t ≥ 1 we thus have
φ(tw) ≤ φ(w) + φ′₋(w)(tw − w) = φ(w) + (t − 1)φ′₋(w)w, which in turn implies

  lim_{x→+∞} φ(x)/x = lim_{t→+∞} φ(tw)/(tw) ≤ lim_{t→+∞} [φ(w)/(tw) + (t − 1)φ′₋(w)w/(tw)]
                    = lim_{t→+∞} φ(w)/(tw) + φ′₋(w) lim_{t→+∞} (t − 1)/t = φ′₋(w) < 0

because φ′₋(w) < 0. We reached a contradiction, so we conclude that φ is increasing.

(i) trivially implies (ii). (ii) implies (iii). Suppose that φ is bounded below, i.e., there
exists some k ∈ ℝ such that φ ≥ k. We have φ(t)/t ≥ k/t for all t > 0, so lim_{t→+∞} φ(t)/t ≥
lim_{t→+∞} k/t = 0.
Proof (i) implies (ii). Let x ∈ C and let t ≥ 0. We have x ∈ (f ≥ f(x)), and so y ∈
∩_{α∈ℝ} R_{(f≥α)} implies x + ty ∈ (f ≥ f(x)) for all t ≥ 0, i.e., f(x + ty) ≥ f(x).

(ii) implies (iii). Let x ∈ C. Let t′ > t″ ≥ 0. As f(x + ty) ≥ f(x) for all t ≥ 0, we have
x + ty ∈ C for all t ≥ 0. Hence, f(x + t′y) = f(x + (t′ − t″)y + t″y) ≥ f(x + t″y) since
x + t″y ∈ C.

(iii) trivially implies (iv) because lim_{t→+∞} f(x + ty) ≥ f(x).
as desired. By (iv), φ is bounded below. By Lemma 1108, φ is then increasing. So, φ(t)/t ≥
φ(0)/t for all t > 0, which implies lim_{t→+∞} φ(t)/t ≥ 0.
Proposition 1109 Let f : C → ℝ be an upper semicontinuous concave function defined on
a closed convex subset C of ℝⁿ. Then, the following conditions are equivalent:

(i) y ∈ R_f;

(ii) inf_{t≥0} f(x₀ + ty) > −∞ for some x₀ ∈ C;

(iii) lim_{t→+∞} f(x₀ + ty)/t ≥ 0 for some x₀ ∈ C.
Proof (i) implies (iii) by Proposition 1107. (iii) implies (ii). Let x₀ ∈ C be such that
lim_{t→+∞} f(x₀ + ty)/t ≥ 0. Again, define the concave function φ : [0, ∞) → ℝ by φ(t) =
f(x₀ + ty). By Lemma 1108, φ is bounded below, so inf_{t≥0} f(x₀ + ty) > −∞.

(ii) implies (i). Let x₀ ∈ C be such that inf_{t≥0} f(x₀ + ty) > −∞. By Lemma 1108,
φ is increasing. Hence, f(x₀ + ty) = φ(t) ≥ φ(0) = f(x₀). That is, (x₀, f(x₀)) ∈ hypo f is
such that (x₀, f(x₀)) + t(y, 0) ∈ hypo f for all t ≥ 0. This implies (y, 0) ∈ R_{hypo f} and so, by
Lemma 1105, y ∈ R_f.
Next we show that also the vector spaces L_{(f≥α)} coincide for the non-empty upper contour
sets (f ≥ α) of concave functions.

  L_f = L_{(f≥α)}   (23.6)

for all α ∈ ℝ such that (f ≥ α) ≠ ∅. Fix such an α. We have R_{−(f≥α)} = R_{(g≥α)}, where
g(x) = f(−x). In fact, y ∈ R_{−(f≥α)} if and only if x + ty ∈ −(f ≥ α) for all
x ∈ −(f ≥ α) and all t ≥ 0, i.e., if and only if g(x + ty) ≥ α for all
x ∈ (g ≥ α) and all t ≥ 0. By Lemma 1105, (23.6) holds.

We conclude that the cones R_{−(f≥α)} are all equal provided (f ≥ α) ≠ ∅. In turn, this implies
L_f = L_{(f≥α)} for all α with (f ≥ α) ≠ ∅.
As readers can check, Propositions 1107 and 1109 take the following form for lineality
spaces, where the real line replaces the positive half-line.
(i) y ∈ L_f;
23.4.3 Maxima
We can finally state and prove the main result of this section, an existence result for maximizers
of concave functions that does not rely on any compactness assumption. In reading the
result recall that, under the hypotheses of the theorem, the set arg max_{x∈C} f(x) is convex.
Theorem 1116 Let f : C → ℝ be an upper semicontinuous concave function defined on a
closed convex subset C of ℝⁿ. If

  R_C ∩ R_f ⊆ L_C ∩ L_f

then arg max_{x∈C} f(x) is non-empty.

Along with Theorem 1097, this existence result yields the following characterization.

Corollary 1117 Under the assumptions of Theorem 1116, the following conditions are equivalent:

(i) R_C ∩ R_f = {0};

(ii) f is coercive on C;

(iii) arg max_{x∈C} f(x) is non-empty and compact.
Proof (i) implies (iii). Suppose R_C ∩ R_f = {0}. Then, the condition R_C ∩ R_f ⊆ L_C ∩ L_f is trivially
satisfied, and so arg max_{x∈C} f(x) ≠ ∅ by Theorem 1116. Since ∅ ≠ arg max_{x∈C} f(x) =
(f ≥ max_{x∈C} f(x)) ∩ C, by Proposition 1095 we have R_{(f≥max_{x∈C} f(x))∩C} = R_{(f≥max_{x∈C} f(x))} ∩
R_C = R_f ∩ R_C = {0}. By Theorem 1097, the set (f ≥ max_{x∈C} f(x)) ∩ C is then compact,
as desired.

(iii) implies (ii). Suppose that the convex set arg max_{x∈C} f(x) is non-empty and compact.
Since arg max_{x∈C} f(x) = (f ≥ max_{x∈C} f(x)) ∩ C, the function f is coercive.

(ii) implies (i). By coercivity, there is t ∈ ℝ such that (f ≥ t) ∩ C is compact and non-empty.
So, by Proposition 1095 and Theorem 1097 we have R_C ∩ R_f = R_C ∩ R_{(f≥t)} =
R_{C∩(f≥t)} = {0}, as desired.
The proof of Theorem 1116 relies on the next lemma, which gives a condition under which
a monotone sequence of closed convex sets has non-empty intersection.

Lemma 1118 Let {Cₙ} be a decreasing sequence of non-empty closed convex subsets of ℝⁿ
such that ∩ₙ R_{Cₙ} ⊆ ∩ₙ L_{Cₙ}. Then, ∩ₙ Cₙ ≠ ∅.

Proof Observe first that there exists, for every n ≥ 1, an element xₙ ∈ Cₙ of minimal norm,
i.e.,

  xₙ ∈ arg min_{x∈Cₙ} ‖x‖
Indeed, given any x̄ₙ ∈ Cₙ the set {x ∈ Cₙ : ‖x‖ ≤ ‖x̄ₙ‖} is compact, so a minimizer exists
by (a dual) Tonelli's Theorem because the norm is coercive on Cₙ.

Because of the monotonicity of the sequence {Cₙ}, the sequence {‖xₙ‖} is easily seen to
be increasing: ‖x₁‖ ≤ ‖x₂‖ ≤ ⋯ ≤ ‖xₙ‖ ≤ ⋯. It is also bounded. Suppose, by contradiction,
that ‖xₙ‖ ↑ ∞. The sequence {xₙ/‖xₙ‖} belongs to the unit ball B₁(0) of ℝⁿ, which is
compact. By the Bolzano-Weierstrass Theorem, there exist a subsequence {x_{n_k}/‖x_{n_k}‖} and a
vector y ∈ ∂B₁(0) such that lim_k x_{n_k}/‖x_{n_k}‖ = y. Fix m ≥ 1. Then, x_{n_k} ∈ C_m for all
k large enough. By Lemma 1093, y ∈ R_{C_m}. Since m is arbitrary, y ∈ ∩ₙ R_{Cₙ} and so, by
hypothesis, y ∈ ∩ₙ L_{Cₙ}. Since lim_k x_{n_k}/‖x_{n_k}‖ = y, we have

  lim_k (x_{n_k} − ‖x_{n_k}‖y)/‖x_{n_k}‖ = 0   (23.7)

Moreover, y ∈ ∩ₙ L_{Cₙ} implies x_{n_k} + t_k y ∈ C_{n_k} for all t_k ∈ ℝ and all k ≥ 1. Then,

  ‖x_{n_k} + t_k y‖ ≥ ‖x_{n_k}‖

for all k ≥ 1, since by construction each x_{n_k} is a minimum norm vector. Setting t_k = −‖x_{n_k}‖,
we then have

  ‖x_{n_k} − ‖x_{n_k}‖y‖ / ‖x_{n_k}‖ ≥ 1  ∀k ≥ 1   (23.8)

which contradicts (23.7). We conclude that the monotone sequence {‖xₙ‖}ₙ is bounded.
Then, by Corollary 324 there is a vector z ∈ ℝⁿ to which a subsequence of {xₙ} converges. It
is easy to check that z ∈ ∩ₙ Cₙ, and so ∩ₙ Cₙ ≠ ∅.
Proof of Theorem 1116 Set β = sup_{x∈C} f(x) and consider an increasing sequence
{αₙ} ⊆ f(C) with αₙ ↑ β (Lemma 1089). Set Cₙ = (f ≥ αₙ) ∩ C for all n ≥ 1. Since f is
upper semicontinuous and concave on C, each Cₙ is closed and convex. Then, by Proposition
1095 we have R_{Cₙ} = R_{(f≥αₙ)∩C} = R_{(f≥αₙ)} ∩ R_C = R_f ∩ R_C. Similarly, L_{Cₙ} = L_f ∩ L_C,
and so R_{Cₙ} = L_{Cₙ} for all n ≥ 1. By Lemma 1118, ∩ₙ Cₙ ≠ ∅. Let x* ∈ ∩ₙ Cₙ. We have
f(x*) ≥ αₙ for all n ≥ 1, and so f(x*) ≥ β = sup_{x∈C} f(x). We conclude that x* is a
maximizer, as desired.
where the objective function f : ℝⁿ → ℝ is constant. Here Theorem 1116, but not Tonelli's
Theorem, applies. In fact, R_f = L_f = R_{ℝⁿ} = L_{ℝⁿ} = ℝⁿ, so the condition R_C ∩ R_f ⊆
L_C ∩ L_f is trivially satisfied. Instead, condition R_C ∩ R_f = {0} is not satisfied and, indeed,
arg max_{x∈ℝⁿ} f(x) = ℝⁿ. N
In view of Examples 1110 and 1115, by Theorem 1116 this problem has solutions provided

  {x ∈ R_C : f(x) ≥ 0} ⊆ {x ∈ L_C : f(x) = f(−x) = 0}

which is equivalent to

  f(x) ≥ 0 ⟹ x ∈ L_C and f(x) = f(−x) = 0  ∀x ∈ R_C   (23.10)

In particular, by Corollary 1117 the set of solutions is non-empty and compact if and only if

  f(x) ≥ 0 ⟹ x = 0  ∀x ∈ R_C   (23.11)
  f(x) = min_{i=1,…,n} xᵢ
Intuitively, arg max_{x∈C} f(x) is non-empty if the set C has no positive directions of recession,
i.e., if R_C ∩ ℝⁿ₊ = {0}. Indeed, these are the directions along which f can keep growing. Condition
(23.10) makes this simple insight precise. For, let x ∈ R_C be such that f(x) ≥ 0. It then
follows that x ≥ 0, and so R_C ∩ ℝⁿ₊ = {0} implies x = 0. Hence, f(x) = f(−x) = 0 and x ∈ L_C,
so condition (23.10) holds. Under the condition R_C ∩ ℝⁿ₊ = {0} condition (23.11) also holds,
so we conclude that arg max_{x∈C} f(x) is non-empty and compact. N
Example 1122 If in the last example the objective function f is linear, by Example 1115
we have L_f = {0}. So, by Corollary 1117 in this case the set of solutions is non-empty and
compact if and only if R_C ∩ R_f = {0}. This is the case when {x ∈ R_C : f(x) ≥ 0} = {0},
that is,

  f(x) ≥ 0 ⟹ x = 0  ∀x ∈ R_C

For instance, given a vector c ∈ ℝⁿ and a non-empty polyhedron P = {x ∈ ℝⁿ : Ax ≤ b},
consider the linear programming problem

  max c·x  sub x ∈ P

with f(x) = c·x. This problem has a non-empty and compact set of solutions if and only if

  R_P ∩ R_f = {x ∈ ℝⁿ : Ax ≤ 0} ∩ {x ∈ ℝⁿ : c·x ≥ 0} = {0}
that is, if and only if x = 0 is the unique solution of the following system of linear inequalities:

  −c₁x₁ − c₂x₂ − ⋯ − cₙxₙ ≤ 0
  a₁₁x₁ + a₁₂x₂ + ⋯ + a₁ₙxₙ ≤ 0
  a₂₁x₁ + a₂₂x₂ + ⋯ + a₂ₙxₙ ≤ 0
  ⋮
  aₘ₁x₁ + aₘ₂x₂ + ⋯ + aₘₙxₙ ≤ 0
This is the case, for instance, if P is bounded because R_P = {0} by Theorem 1097 (cf. the
Fundamental Theorem of Linear Programming). N
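The condition R_P = {0} is also easy to probe numerically. The sketch below (ours, with a hypothetical bounded polyhedron, not an example from the text) solves such a linear program with scipy:

    import numpy as np
    from scipy.optimize import linprog

    # Hypothetical LP: maximize c.x over P = {x : Ax <= b}.
    c = np.array([1.0, 2.0])
    A = np.array([[ 1.0,  1.0],
                  [-1.0,  0.0],
                  [ 0.0, -1.0]])
    b = np.array([4.0, 0.0, 0.0])    # x1 + x2 <= 4, x1 >= 0, x2 >= 0

    # Here Ay <= 0 forces y = 0, so R_P = {0}: P is compact (Theorem 1097)
    # and a solution exists for every c.
    res = linprog(-c, A_ub=A, b_ub=b)   # linprog minimizes, so pass -c
    print(res.x, -res.fun)              # [0. 4.] 8.0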
In the rest of the book we will never invoke Theorem 1116. However, when we use
Tonelli's Theorem for an optimization problem with a concave objective function, the coda
reader should wonder about the use of the more general Theorem 1116. Finally, we leave to
readers an ordinal version of Theorem 1116.
Chapter 24
Projections and approximations

[Figure: a vector x in the plane, a straight line V through the origin, the point m ∈ V closest to x, and the approximation error ‖x − m‖]
Clearly, the problem is trivial if x belongs to V: just set m = x. Things become interesting
when x is not in V. In this regard, note that we can paraphrase the problem by saying that
it consists in finding in V the best approximation of a given x ∈ ℝⁿ: the vector subspace
V thus represents the space of "admissible approximations" and x − m is interpreted as an
"approximation error" because it represents the error made by approximating x with m.

The problem just described is an optimization problem that consists in minimizing
‖x − y‖ under the constraint y ∈ V, that is,

  min ‖x − y‖  sub y ∈ V   (24.1)
The following theorem addresses all these questions. It relies on the notions of orthogonal-
ity we studied earlier in the book (Chapter 4). In particular, recall that two vectors x; y 2 Rn
are orthogonal, written x?y, when their inner product is null. When x is orthogonal to all
vectors in a subset S of Rn , we write x?S.
Note that the uniqueness of m implies that ‖x − m‖ < ‖x − y‖ for each y ∈ V different
from m.
This remarkable result ensures the existence and uniqueness of the solution, thus answering
the first two questions, and characterizes the solution as the vector in V that makes the
approximation error orthogonal to V itself. The orthogonality of the error is a key
property of the solution that has a number of consequences in applications. Furthermore,
Theorem 1128 will show how orthogonality allows for identifying the solution in closed form
in terms of a basis of V, thus fully answering also the last question.
Thanks to the following lemma, one can apply Tonelli's Theorem and Theorem 1032 to
this optimization problem.
Proof The proof is analogous to that of Lemma 1058 and is thus left to the reader (note
that, by Proposition 861, V is a closed and convex subset of ℝⁿ).

Proof of the Projection Theorem In light of the previous lemma, problem (24.2), and so
problem (24.1), has a solution by Tonelli's Theorem because f is coercive on V, and such a
solution is unique by Theorem 1032 because f is strictly concave.
It remains to show that, if m minimizes ‖x − y‖, then (x − m) ⊥ V. Suppose, by contradiction,
that there is a ỹ ∈ V which is not orthogonal to x − m. Without loss of generality,
suppose that ‖ỹ‖ = 1 (otherwise, it would suffice to take ỹ/‖ỹ‖, which has norm 1) and that
(x − m)·ỹ = α ≠ 0. Denote by y′ the element of V given by y′ = m + αỹ. We have that

  ‖x − y′‖² = ‖x − m − αỹ‖² = ‖x − m‖² − 2α(x − m)·ỹ + α²
            = ‖x − m‖² − α²
            < ‖x − m‖²
thus contradicting the assumption that m minimizes ‖x − y‖, as the element y′ would make
‖x − y‖ even smaller. The contradiction proves the desired result.
Denote by V⊥ = {x ∈ ℝⁿ : x ⊥ V} the set of vectors that are orthogonal to V. The reader
can easily check that this set is a vector subspace of ℝⁿ. It is thus called the orthogonal
complement of V.
Example 1125 Let V = span{y₁, …, yₖ} be the vector subspace generated by the vectors
{yᵢ}ᵢ₌₁ᵏ and let Y ∈ M(k, n) be the matrix whose rows are these vectors. Given x ∈ ℝⁿ,
we have x ⊥ V if and only if Yx = 0. Therefore, V⊥ consists of all the solutions of this
homogeneous linear system. N
In words, any vector can be uniquely represented as the sum of a vector in V and a vector in its
orthogonal complement V⊥, and this can be done for any vector subspace V of ℝⁿ. The
uniqueness of this decomposition is remarkable, as it entails that the vectors y and z are
uniquely determined. For this reason we say that ℝⁿ is a direct sum of the subspaces V and V⊥,
in symbols ℝⁿ = V ⊕ V⊥, as we will see momentarily in Section 24.5. In many applications it
is important to be able to regard ℝⁿ as a direct sum of one of its subspaces and its orthogonal
complement.
24.2 Projections

Given a vector subspace V of ℝⁿ, the solution of the minimization problem (24.1) is called the
projection of x onto V. In this way one can define an operator P_V : ℝⁿ → ℝⁿ, called the
projection, that associates to each x ∈ ℝⁿ its projection P_V(x).

Therefore,

  (αP_V(x) + βP_V(y) − (αx + βy)) ⊥ V

and, by the Projection Theorem and the uniqueness of decomposition (24.3), αP_V(x) +
βP_V(y) is the projection of αx + βy onto V, that is, P_V(αx + βy) = αP_V(x) + βP_V(y).
Being linear, projections have a matrix representation. To find it, consider a set {yᵢ}ᵢ₌₁ᵏ
of vectors that generate the subspace V, that is, V = span{y₁, …, yₖ}. Given x ∈ ℝⁿ, by the
Projection Theorem we have (x − P_V(x)) ⊥ V, so

  (x − P_V(x))·yᵢ = 0  ∀i = 1, …, k

which are called the normal equations of the projection. Since P_V(x) ∈ V, we can write it as a
linear combination P_V(x) = Σⱼ₌₁ᵏ αⱼyⱼ. The normal equations then become:

  (x − Σⱼ₌₁ᵏ αⱼyⱼ)·yᵢ = 0  ∀i = 1, …, k

that is,

  Σⱼ₌₁ᵏ αⱼ(yⱼ·yᵢ) = x·yᵢ  ∀i = 1, …, k

We thus end up with the system

  α₁(y₁·y₁) + α₂(y₂·y₁) + ⋯ + αₖ(yₖ·y₁) = x·y₁
  α₁(y₁·y₂) + α₂(y₂·y₂) + ⋯ + αₖ(yₖ·y₂) = x·y₂
  ⋮
  α₁(y₁·yₖ) + α₂(y₂·yₖ) + ⋯ + αₖ(yₖ·yₖ) = x·yₖ
Let Y ∈ M(n, k) be the matrix that has as columns the generating vectors {yᵢ}ᵢ₌₁ᵏ. We can
rewrite the system in matrix form as

  YᵀYα = Yᵀx   (24.4)

where YᵀY is of order k × k, α is k × 1 and Yᵀx is k × 1. We thus end up with the Gram
square matrix YᵀY, which has rank equal to that of Y by Proposition 692, that is, ρ(YᵀY) = ρ(Y).

If the vectors {yᵢ}ᵢ₌₁ᵏ are linearly independent, the matrix Y has full rank k and so the Gram
matrix is invertible. By multiplying both sides of system (24.4) by the inverse (YᵀY)⁻¹ of the
Gram matrix, we then have

  α = (YᵀY)⁻¹Yᵀx

So, the projection is given by

  P_V(x) = Σⱼ₌₁ᵏ αⱼyⱼ = Yα = Y(YᵀY)⁻¹Yᵀx  ∀x ∈ ℝⁿ   (24.5)

In conclusion, the matrix Y(YᵀY)⁻¹Yᵀ represents the linear operator P_V.
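The closed form Y(YᵀY)⁻¹Yᵀ is easy to sandbox numerically. The following sketch (ours, with hypothetical generating vectors, not an example from the text) checks the two defining features of a projection matrix:

    import numpy as np

    # Columns of Y are hypothetical generating vectors of V in R^3.
    Y = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [0.0, 2.0]])

    # Matrix of P_V, as derived above: P = Y (Y^T Y)^{-1} Y^T.
    P = Y @ np.linalg.inv(Y.T @ Y) @ Y.T

    x = np.array([3.0, -1.0, 2.0])
    m = P @ x                        # projection of x onto V

    # Projection Theorem: the error x - m is orthogonal to V, i.e.,
    # orthogonal to each generating column of Y.
    print(Y.T @ (x - m))             # approximately [0, 0]
    print(np.allclose(P @ P, P))     # idempotence: True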
Geometrically, V is the straight line determined by y (cf. Example 87). So, P_V(x) is the
point αy on this line that is closest to x. Its coefficient is the ratio

  α = (y·x)/(y·y) = Σᵢ₌₁ⁿ xᵢyᵢ / Σᵢ₌₁ⁿ yᵢ²   (24.6)

This can also be checked directly because the optimization problem (24.1) here takes the
form

  min Σᵢ₌₁ⁿ (xᵢ − αyᵢ)²  sub α ∈ ℝ

and the value of α that solves this problem is easily checked to be (24.6). For instance, let
y = (2, 3) ∈ ℝ², so that V = {(2α, 3α) : α ∈ ℝ}. We have

  P_V(x) = ((2x₁ + 3x₂)/13)(2, 3)

for all x ∈ ℝ². N
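As a quick consistency check (ours, not part of the example), take x = (1, 0): then P_V(x) = (2/13)(2, 3) = (4/13, 6/13), and the error x − P_V(x) = (9/13, −6/13) is indeed orthogonal to y = (2, 3):

  (9/13)·2 + (−6/13)·3 = 18/13 − 18/13 = 0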
  f(x) = θ·x  ∀x ∈ V

By Theorem 762, such a set is non-empty. Remarkably, the projections onto V of its elements
are all the same. Indeed, writing θ = P_V(θ) + y with y ∈ V⊥, for every x ∈ V we have

  f(x) = θ·x = (P_V(θ) + y)·x = P_V(θ)·x + y·x = P_V(θ)·x

If θ′ ∈ Θ we have

  f(x) = P_V(θ′)·x = P_V(θ)·x  ∀x ∈ V

and so (P_V(θ′) − P_V(θ))·x = 0 for every x ∈ V. It follows that P_V(θ′) − P_V(θ) ∈ V⊥, that is,
P_V(θ′) − P_V(θ) ∈ V ∩ V⊥, since by definition P_V(θ′) − P_V(θ) ∈ V. However, V ∩ V⊥ = {0} and so
P_V(θ′) − P_V(θ) = 0, that is, P_V(θ′) = P_V(θ).
In light of this lemma, let us denote the common projection by θ*, that is, θ* = P_V(θ) with
θ ∈ Θ. By the decomposition (24.3), every θ ∈ Θ can be uniquely written as θ = θ* + ε, where
ε ∈ V⊥, so that the vectors ε and θ* are orthogonal. In other words, Θ = {θ* + ε : ε ∈ V⊥}.
Since

  f(x) = θ·x = (θ* + ε)·x = θ*·x + ε·x = θ*·x  ∀x ∈ V

the projection θ* is the only vector in V that represents f. We have thus proven the following
version of Riesz's Theorem for vector subspaces: there exists a unique vector θ* ∈ V such that

  f(x) = θ*·x  ∀x ∈ V
In what follows, when mentioning Riesz's Theorem we will refer to this general version
of the result.

Projections have made it possible to resolve the multiplicity of vectors that afflicted
Theorem 762, which in turn resulted from the multiplicity of the extensions f̄ : ℝⁿ → ℝ of
a linear function f defined on a subspace of ℝⁿ provided by the Hahn-Banach Theorem (Section 15.11).

In particular, if f̄ : ℝⁿ → ℝ is a linear function on ℝⁿ and θ is the unique vector of ℝⁿ
such that f̄(x) = θ·x for every x ∈ ℝⁿ, for its restriction f̄|_V to a vector subspace V the
vector θ* = P_V(θ) is the only vector in V such that f̄(x) = θ*·x for every x ∈ V. By (24.5),
we then have the following remarkable formula

  θ* = Y(YᵀY)⁻¹Yᵀθ
Least squares The least squares solution x* ∈ ℝⁿ solves the minimization problem

  min ‖Ax − b‖  sub x ∈ ℝⁿ

At the same time, since the image Im F of the linear operator F(x) = Ax is a vector subspace
of ℝᵐ, the projection P_{Im F}(b) of the vector b ∈ ℝᵐ solves the optimization problem

  min ‖y − b‖  sub y ∈ Im F

that is,

  ‖P_{Im F}(b) − b‖ ≤ ‖y − b‖  ∀y ∈ Im F
that is, if and only if its image Ax* is the projection of b onto the vector subspace Im F
generated by the columns of A. The image Ax* is often denoted by y*. With this
notation, (24.8) can be rewritten as y* = P_{Im F}(b).
Errors Equality (24.8) shows the tight relationship between projections and least squares.
In particular, by the Projection Theorem the error Ax* − b is orthogonal to the vector
subspace Im F:

  (Ax* − b) ⊥ Im F

or, equivalently, (y* − b) ⊥ Im F.

The vector subspace Im F is generated by the columns of A, which are therefore orthogonal
to the approximation error. For example, in the statistical interpretation of least squares
of Section 22.10.2, the matrix A is denoted by X and has the form (22.58); each column Xᵢᵀ of
X displays the data on the i-th regressor in every period. If we identify each such column with
the regressor whose data it portrays, we can see Im F as the vector subspace of ℝᵐ generated
by the regressors. The least squares method is equivalent to considering the projection of
the output vector Y on the subspace generated by the regressors X₁, …, Xₙ. In particular,
the regressors are orthogonal to the approximation error:

  (Xβ* − Y) ⊥ Xᵢ  ∀i = 1, …, n

By setting Y* = Xβ*, one equivalently has that (Y* − Y) ⊥ Xᵢ for every i = 1, …, n, a classic
property of least squares that we already mentioned.
Solution's formula Assume that ρ(A) = n, so that the matrix A has full rank and the linear
operator F is injective (Corollary 689). In this case, we have

  x* = F⁻¹(P_{Im F}(b))   (24.9)

so that the least squares solution can be determined via the projection. Equality (24.9) is
even more significant if we can express it in matrix form. In doing so, note that the linearly
independent (since ρ(A) = n) columns of A generate the subspace Im F, thus taking the
role of the matrix Y of Section 24.2. By Theorem 1128, we have

  Ax* = P_{Im F}(b) = A(AᵀA)⁻¹Aᵀb

This is the matrix representation of (24.9) that is made possible by the matrix representation
of projections established in Theorem 1128. Cramer's Theorem is the special case when A is
an invertible square matrix of order n. Indeed, in this case the transpose Aᵀ is also invertible
(Proposition 717), so by Proposition 704 we have

  x* = (AᵀA)⁻¹Aᵀb = A⁻¹(Aᵀ)⁻¹Aᵀb = A⁻¹b

We have thus found the least squares solution when the matrix A has full rank. Using
the statistical notation, we end up with the well-known least squares formula

  β* = (XᵀX)⁻¹XᵀY
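The formula lends itself to a quick numerical check; the following sketch (ours, with made-up data, not from the text) verifies it against numpy's least squares solver:

    import numpy as np

    # Hypothetical data: m = 4 observations on n = 2 regressors.
    X = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0]])
    Y = np.array([1.0, 2.0, 2.0, 4.0])

    # Least squares formula beta* = (X^T X)^{-1} X^T Y from the text.
    beta = np.linalg.inv(X.T @ X) @ X.T @ Y

    # Cross-check against the library solver.
    beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
    print(np.allclose(beta, beta_lstsq))   # True

    # The fitted vector X beta* is the projection of Y on the column
    # space, so the residual is orthogonal to each regressor.
    print(X.T @ (Y - X @ beta))            # approximately [0, 0]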
  x = x₁ + x₂

When V₁ ∩ V₂ = {0}, we say that V₁ + V₂ is a direct sum of the two vector subspaces and
denote it by

  V₁ ⊕ V₂

A basic illustration is the plane as the direct sum of its Cartesian axes.

In view of the last lemma, when x belongs to a direct sum V₁ ⊕ V₂, it can be uniquely
written as a sum x = x₁ + x₂ with x₁ ∈ V₁ and x₂ ∈ V₂. In particular, this uniqueness
permits to define the projections P_{V₁} : V₁ ⊕ V₂ → V₁ and P_{V₂} : V₁ ⊕ V₂ → V₂ by

  P_{V₁}(x) = x₁  and  P_{V₂}(x) = x₂
  V₂₃ ⊕ V₁₃ = V₃

that is, the direct sum of these two axes is the horizontal plane V₃ = {x ∈ ℝ³ : x₃ = 0} of
ℝ³. In particular, the projections P_{V₂₃} : V₃ → V₂₃ and P_{V₁₃} : V₃ → V₁₃ are given by

  P_{V₂₃}(x) = (x₁, 0, 0)  and  P_{V₁₃}(x) = (0, x₂, 0)

for all x = (x₁, x₂, 0) ∈ V₃. Clearly, x = P_{V₂₃}(x) + P_{V₁₃}(x) for all x ∈ V₃. N
Since P_{V₁}(x) + P_{V₁}(x′) ∈ V₁ and P_{V₂}(x) + P_{V₂}(x′) ∈ V₂, we thus have

  P_{V₁}(x + x′) = P_{V₁}(x) + P_{V₁}(x′)  and  P_{V₂}(x + x′) = P_{V₂}(x) + P_{V₂}(x′)

as desired.
  ℝⁿ = V ⊕ V⊥

for all vector subspaces V of ℝⁿ. In this case, the direct sum of the two vector subspaces V
and V⊥ is the entire space ℝⁿ. This orthogonal form is the only possible one that such a
direct sum can take, as we show next.

Proposition 1136 If the space ℝⁿ is the direct sum of two vector subspaces V₁ and V₂,
then V₁ = V₂⊥ and V₂ = V₁⊥.

Thus, ℝⁿ can be the direct sum only of a vector subspace and of its orthogonal complement.
In this case, we are back to the projection P_V of Section 24.2 and to its twin
P_{V⊥}.
where yᵢⱼ represents its payoff if state sᵢ obtains. Portfolios of primary assets can be formed
in the market, each identified by a vector of weights x = (x₁, …, xₙ) ∈ ℝⁿ, where xⱼ is the
traded quantity of primary asset yⱼ. If xⱼ ≥ 0 (resp., xⱼ ≤ 0) the portfolio is long (resp.,
short) on asset yⱼ, that is, it buys (resp., sells) xⱼ units of the asset (cf. Example 917).
In particular, the primary asset y₁ is identified by the portfolio e¹ = (1, 0, …, 0) ∈ ℝⁿ, the
primary asset y₂ by e² = (0, 1, 0, …, 0) ∈ ℝⁿ, and so on.

The linear combination

  Σⱼ₌₁ⁿ xⱼyⱼ ∈ ℝᵏ
Example 1137 Suppose the payments of the primary assets depend on the state of the
economy (e.g., dividends if the assets are shares), which can be of three types: recession (s₁),
stasis (s₂) and growth (s₃). Each asset yⱼ is thus described by a vector yⱼ = (y₁ⱼ, y₂ⱼ, y₃ⱼ) ∈ ℝ³,
in which yᵢⱼ is the payment of the asset in case state sᵢ obtains, for i = 1, 2, 3. Suppose there
exist only four assets on the market, with L = {y₁, y₂, y₃, y₄}. Let xⱼ be the quantity of
asset yⱼ held, so that the vector of coefficients x = (x₁, x₂, x₃, x₄) ∈ ℝ⁴ represents a portfolio
formed by these assets. The quantities xⱼ can be both positive and negative. In the first
case we are long in the asset and we are paid yᵢⱼ in case state sᵢ obtains; when xⱼ is negative
we are instead short on the asset and we have to pay yᵢⱼ when sᵢ obtains. The payoff
of a portfolio x ∈ ℝ⁴ in the different states is, therefore, given by the linear combination
x₁y₁ + x₂y₂ + x₃y₃ + x₄y₄ ∈ ℝ³. For instance, suppose

  y₁ = (−1, 0, 2),  y₂ = (−3, 0, 3),  y₃ = (0, 2, 4),  y₄ = (−2, 0, 2)   (24.10)

Then, the portfolio x = (1, 2, 1, 2) has payoff y₁ + 2y₂ + y₃ + 2y₄ = (−11, 2, 16) ∈ ℝ³. N
Example 1138 In the previous example the market generated by the four primary assets
(24.10) is easily seen to be complete. On the other hand, suppose that only the first two
assets are available, that is, L = {y₁, y₂}. Then, W = span L = {(x, 0, y) : x, y ∈ ℝ}, so
the market is now incomplete. Indeed, it is not possible to replicate contingent claims that
feature non-zero payments when state s₂ obtains. N
is the linear operator that describes the contingent claim determined by portfolio x. In other
words, Rᵢ(x) is the payoff of portfolio x if state sᵢ obtains. Clearly, W = Im R and so the
rank ρ(R) of the linear operator R : ℝⁿ → ℝᵏ is the dimension of the market W.

To derive the matrix representation of the payoff operator R, consider the payoff matrix

            ⎡ y₁₁ y₁₂ ⋯ y₁ₙ ⎤
  Y = (yᵢⱼ) = ⎢ y₂₁ y₂₂ ⋯ y₂ₙ ⎥
            ⎢  ⋮   ⋮      ⋮  ⎥
            ⎣ yₖ₁ yₖ₂ ⋯ yₖₙ ⎦

It has k rows (states) and n columns (assets). Entry yᵢⱼ represents the payoff of primary
asset yⱼ in state sᵢ. In words, Y is the matrix rendering of the collection L of primary assets.
It is easy to see that the payoff operator R : ℝⁿ → ℝᵏ can be represented as

  R(x) = Yx

The payoff matrix Y is thus the matrix associated with the operator R. Its rank is then the
dimension of the market W (see Section 15.4.2).
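As a minimal numerical rendering (ours) of Example 1137, the payoff operator R(x) = Yx can be evaluated directly:

    import numpy as np

    # Payoff matrix of Example 1137: 3 states (rows) by 4 assets (columns).
    Y = np.array([[-1.0, -3.0, 0.0, -2.0],
                  [ 0.0,  0.0, 2.0,  0.0],
                  [ 2.0,  3.0, 4.0,  2.0]])

    x = np.array([1.0, 2.0, 1.0, 2.0])   # the portfolio of the example
    print(Y @ x)                          # R(x) = Yx = [-11.  2. 16.]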
In a frictionless market, the (market) value

  v(x) = p·x = Σⱼ₌₁ⁿ pⱼxⱼ

of a portfolio x is its (today) cost caused by the market operations it requires.² The (market)
value function v : ℝⁿ → ℝ is the linear function that assigns to each portfolio x its value
v(x). In particular, the value of primary assets is their price. For, recalling that the primary
asset yⱼ is identified by the portfolio eʲ, we have

  v(eʲ) = p·eʲ = pⱼ   (24.11)

Note that it is the frictionless nature of the market that ensures the linearity of the value
function. For instance, if there are transaction costs and so the price of asset yⱼ depends on
the traded quantity (e.g., v(2eʲ) < 2pⱼ), then the value function is no longer linear.

² Since there are no restrictions to trade, and so it is possible to go long or short on assets, to be precise
v(x) is actually a cost if positive, but a benefit if negative.
Definition 1139 The financial market (L, p) satisfies the law of one price (LOP) if, for all
portfolios x, x′ ∈ ℝⁿ,

  R(x) = R(x′) ⟹ v(x) = v(x′)   (24.12)

In words, portfolios that induce the same contingent claim must share the same market
value. Indeed, the contingent claims that they determine are all that matters in portfolios,
which are just instruments to achieve them. If two portfolios inducing the same contingent
claim had different market values, a (sure) saving opportunity would be missed in the market.
The LOP requires that the financial market take advantage of any such opportunity.

Since W = Im R, we have R(x) = R(x′) if and only if x, x′ ∈ R⁻¹(w) for some w ∈ W.
The LOP can then be equivalently stated as follows: given any replicable claim w ∈ W,

  x, x′ ∈ R⁻¹(w) ⟹ v(x) = v(x′)   (24.13)

All portfolios x that replicate a contingent claim w thus share the same value v(x). It is
then natural to regard this common value as the price of the claim.

In words, p_w is the market cost v(x) incurred today to form a portfolio x that tomorrow
will ensure the contingent claim w, that is, w = R(x). By the form (24.13) of the LOP, the
definition is well posed: it is immaterial which specific replicating portfolio x is considered
to determine the price p_w. The LOP thus permits to price all replicable claims.
For primary assets we get back to (24.11), that is, pⱼ = v(eʲ). In general, we have

  p_w = v(x) = Σⱼ₌₁ⁿ pⱼxⱼ  ∀x ∈ R⁻¹(w)   (24.14)

The price of a contingent claim in the market is thus a linear combination of the prices of
the primary assets held in any replicating portfolio, weighted according to the assets' weights in
such a portfolio.

In this way we obtain the pricing rule f : W → ℝ defined by f(w) = p_w,
which permits to price all contingent claims in the market, starting from the market prices
of primary assets.
Theorem 1142 Suppose the financial market (L, p) satisfies the LOP. Then, the pricing
rule f : W → ℝ is linear.

Proof First observe that, by the LOP, v = f ∘ R, that is, v(x) = f(R(x)) for each x ∈ ℝⁿ.
Let us prove the linearity of f. Let w, w′ ∈ W and α, β ∈ ℝ. We want to show that
f(αw + βw′) = αf(w) + βf(w′). Since W = Im R, there exist vectors x, x′ ∈ ℝⁿ such that
R(x) = w and R(x′) = w′. By Definition 1140, p_w = v(x) and p_{w′} = v(x′). By the linearity
of R and v, we then have

  f(αw + βw′) = f(αR(x) + βR(x′)) = f(R(αx + βx′)) = v(αx + βx′)
              = αv(x) + βv(x′) = αp_w + βp_{w′} = αf(w) + βf(w′)
The fact that the linearity of the pricing rule characterizes the (frictionless) financial
markets in which the LOP holds is a remarkable result, upon which modern asset pricing
theory relies. It permits to price all contingent claims in the market in terms of other
contingent claims, thus generalizing formula (24.14). For, suppose a contingent claim w
can be written as a linear combination of some replicable contingent claims, that is, w =
Σⱼ₌₁ᵐ αⱼwⱼ. Then w is replicable, with

  p_w = f(w) = f(Σⱼ₌₁ᵐ αⱼwⱼ) = Σⱼ₌₁ᵐ αⱼf(wⱼ) = Σⱼ₌₁ᵐ αⱼp_{wⱼ}   (24.15)

Formula (24.14) is the special case where the contingent claims wⱼ are primary assets and
their weights are the portfolio ones. In general, it may be easier (e.g., more natural from a
financial standpoint) to express a contingent claim in terms of other contingent claims rather
than in terms of primary assets. The pricing formula

  p_w = Σⱼ₌₁ᵐ αⱼp_{wⱼ}   (24.16)

permits to price contingent claims when expressed in terms of other contingent claims.
Inspection of the proof of Theorem 1142 shows that the pricing rule inherits its linearity
from that of the value function, which in turn depends on the frictionless nature of the
financial market. We conclude that, in the final analysis, the pricing rule is linear because
the financial market is frictionless. Whether or not the market is complete is, instead,
irrelevant.
Theorem 1143 Suppose the financial market (L, p) satisfies the LOP. Then, there exists a
unique vector π ∈ W such that

  f(w) = π·w  ∀w ∈ W   (24.17)

The representing vector π is called the pricing kernel. When the market is complete,
π ∈ ℝᵏ. In this case we have πᵢ = p_{eⁱ}, where p_{eⁱ} is the price of the Arrow contingent claim
eⁱ; indeed, by (24.17)

  p_{eⁱ} = f(eⁱ) = π·eⁱ = πᵢ

In words, the i-th component πᵢ of the pricing kernel is the price of the Arrow contingent
claim that corresponds to state sᵢ. That is, πᵢ is the cost of having, for sure, one euro
tomorrow if state sᵢ obtains (and zero otherwise).
As a result, when the market is complete the price of a contingent claim w is the weighted
average

  p_w = f(w) = π·w = Σᵢ₌₁ᵏ πᵢwᵢ   (24.18)

of its payments in the different states, each state weighted according to how much it costs
today to have one euro tomorrow in that state. Consequently, the knowledge of the pricing
kernel (i.e., of the prices of the Arrow contingent claims) permits to price all contingent
claims in the market via the pricing formula

  p_w = Σᵢ₌₁ᵏ πᵢwᵢ   (24.19)

The earlier pricing formulas (24.14) and (24.16) require, to price each claim, the knowledge
of replicating portfolios or of the prices of some other contingent claims. In contrast, the pricing
formula (24.19) requires only a single piece of information, the pricing kernel,
to price all claims. In particular, for primary assets it takes the form pⱼ = Σᵢ₌₁ᵏ πᵢyᵢⱼ.
Example 1144 In the three-state economy of Example 1137, there are three Arrow contingent
claims e¹, e², and e³. Suppose that today's market price of having tomorrow one
euro in the recession state (and zero otherwise) is higher than in the stasis state, which is in
turn higher than in the growth state, say p_{e¹} = 3, p_{e²} = 2, and p_{e³} = 1. Then, the pricing
kernel is π = (3, 2, 1) and the pricing formula (24.19) becomes p_w = 3w₁ + 2w₂ + w₃ for all
w ∈ W. For instance, the price of the contingent claim w = (2, 1, 4) is p_w = 12. N
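The pricing formulas above are easy to exercise numerically. The sketch below (ours) combines the kernel of Example 1144 with the payoff matrix of Example 1137; the implied asset prices are hypothetical, since the text assigns none:

    import numpy as np

    pi = np.array([3.0, 2.0, 1.0])         # pricing kernel of Example 1144
    Y = np.array([[-1.0, -3.0, 0.0, -2.0], # payoff matrix of Example 1137
                  [ 0.0,  0.0, 2.0,  0.0],
                  [ 2.0,  3.0, 4.0,  2.0]])

    # Primary asset prices via p_j = sum_i pi_i y_ij, i.e., p = Y^T pi.
    p = Y.T @ pi
    print(p)                                # [-1. -6.  8. -4.]

    # Price of the claim w = (2, 1, 4) via the pricing formula (24.19).
    w = np.array([2.0, 1.0, 4.0])
    print(pi @ w)                           # 12.0

    # In this complete market the kernel is recoverable from (Y, p).
    pi_rec, *_ = np.linalg.lstsq(Y.T, p, rcond=None)
    print(np.allclose(pi_rec, pi))          # True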
24.6.6 Arbitrage

A portfolio x ∈ ℝⁿ is an arbitrage if either of the following conditions holds:

  I: Yx ≥ 0 and p·x < 0    II: Yx > 0 and p·x ≤ 0

A portfolio that satisfies condition I has a strictly negative market value and, nevertheless,
ensures a positive payment in all states. On the other hand, a portfolio that satisfies condition
II has a negative market value and, nevertheless, a strictly positive payoff in at least some
states. Well-functioning financial markets should be able to take advantage of any such
opportunity of a sure gain, and so they should feature no arbitrage portfolios.
In this section we will study such well-functioning markets. In particular, in a market
without arbitrages I we have:

  R(x) ≥ 0 ⟹ v(x) ≥ 0  ∀x ∈ ℝⁿ   (24.20)

The first no arbitrage condition is enough to ensure that the market satisfies the LOP.

Lemma 1145 A financial market (L, p) that has no arbitrages I satisfies the LOP.

Proof Applying (24.20) to −x, we have

  R(−x) ≥ 0 ⟹ v(−x) ≥ 0  ∀x ∈ ℝⁿ

that is,

  R(x) ≤ 0 ⟹ v(x) ≤ 0  ∀x ∈ ℝⁿ

Along with (24.20), this implies

  R(x) = 0 ⟹ v(x) = 0  ∀x ∈ ℝⁿ

Let x and x′ be two portfolios such that R(x) = R(x′). The linearity of R implies
R(x − x′) = 0, and so v(x′ − x) = 0, i.e., v(x′) = v(x) by the linearity of v.
Consider a complete market, that is, W = ℝᵏ. Thanks to the lemma, the no arbitrage
condition (24.20) implies that contingent claims are priced according to formula (24.17).
But much more is true: under this no arbitrage condition the vector π is positive, and so the
pricing rule becomes linear and increasing. Better claims command higher market prices.

Proposition 1146 A complete financial market (L, p), with p ≠ 0, satisfies the no arbitrage
condition (24.20) if and only if the pricing rule is linear and increasing, that is, there exists
a unique vector π ∈ ℝᵏ₊ such that

  f(w) = π·w  ∀w ∈ W   (24.22)
Proof "If". Let R(x) ≥ 0. Then, v(x) = f(R(x)) = π·R(x) ≥ 0 since π ≥ 0 by hypothesis.
"Only if". Since the market is complete, we have W = Im R = ℝᵏ. By Lemma 1145, the
LOP holds and so f is linear (Theorem 1142). We need to show that f is increasing. Since
f is linear, this amounts to showing that it is positive, i.e., that w ≥ 0 implies f(w) ≥ 0. Let
w ∈ ℝᵏ with w ≥ 0. Being Im R = ℝᵏ, there exists x ∈ ℝⁿ such that R(x) = w. We thus
have R(x) = w ≥ 0, and so the no arbitrage condition (24.20) implies v(x) ≥ 0.
Hence, f(w) = f(R(x)) = v(x) ≥ 0. We conclude that the linear function f is positive, and
so increasing. By the Riesz-Markov Theorem, there exists a unique positive vector π ∈ ℝᵏ₊
such that f(z) = π·z for every z ∈ ℝᵏ.
The result becomes sharper when the market also satisfies the second no arbitrage condition
(24.21): the vector π then becomes strictly positive, so that the pricing rule is linear
and strictly increasing. Strictly better claims thus command strictly higher market prices.

As the no arbitrage conditions (24.20) and (24.21) are both compelling, a well-functioning
market should actually satisfy both of them. We thus have the following important result
(as its demanding name shows).³
Theorem 1147 (Fundamental Theorem of Finance) A complete financial market (L, p),
with p ≠ 0, satisfies the no arbitrage conditions (24.20) and (24.21) if and only if the pricing
rule is linear and strictly increasing, that is, there exists a unique vector π ∈ ℝᵏ₊₊ such that

  f(w) = π·w  ∀w ∈ W   (24.23)
Proof "If". Let R(x) > 0. Then, v(x) = f(R(x)) = π·R(x) > 0 because π ≫ 0 by
hypothesis. "Only if". By Proposition 1146, f is linear and increasing. We need to show
that f is strictly increasing. Since f is linear, this amounts to showing that it is strictly positive,
i.e., that w > 0 implies f(w) > 0. Let w ∈ ℝᵏ with w > 0. Being Im R = ℝᵏ, there
exists x ∈ ℝⁿ such that R(x) = w. We thus have R(x) = w > 0, and so the no arbitrage
condition (24.21) implies v(x) > 0. Hence, f(w) = f(R(x)) = v(x) > 0. We conclude
that the linear function f is strictly positive, and so strictly increasing. By the Riesz-Markov
Theorem, there exists a unique strongly positive vector π ∈ ℝᵏ₊₊ such that f(z) = π·z for
every z ∈ ℝᵏ.
The price of any replicable contingent claim w is thus the weighted average

  p_w = f(w) = π·w = Σᵢ₌₁ᵏ πᵢwᵢ

of its payments in the different states, with strictly positive weights. If market prices do not
have this form, the market is not exhausting all arbitrage opportunities. Some sure gains
are still possible.
3
We refer interested readers to Cochrane (2005) and Ross (2005).
  x⁺ = x ∨ 0,  x⁻ = −(x ∧ 0)  and  |x| = x ∨ (−x)

A Riesz subspace thus contains all the joins x ∨ y and meets x ∧ y of all pairs of its vectors
x and y. Therefore, it also contains the positive parts x⁺ and negative parts x⁻, as well as
the moduli |x|, of all its elements x.
  |y| ≤ |x| ⟹ y ∈ V  ∀y ∈ ℝⁿ
Proposition 1151 A Riesz subspace V is an ideal if and only if, for all x ∈ V,

  0 ≤ y ≤ x ⟹ y ∈ V  ∀y ∈ ℝⁿ

In other words, a Riesz subspace is an ideal if and only if it contains all intervals [0, x]
determined by its positive elements x.
  V_I = {x ∈ ℝⁿ : xᵢ = 0 for all i ∈ I}

The ideal V_I is easily seen to be isomorphic to ℝ^{n−|I|}. This is actually the general form of
an ideal, as we show next.
Proposition 1153 A Riesz subspace V is an ideal if and only if there is an index set
I ⊆ {1, …, n} such that V = V_I.

This result implies, inter alia, that ℝⁿ is the only ideal that contains non-zero constant
vectors.
Proof In view of the last example, we just need to prove the "only if". Let V be an
ideal in ℝⁿ. If V = ℝⁿ we have V = V_∅. So, assume that V ≠ ℝⁿ. For each x ∈ V set
I(x) = {i ∈ {1, …, n} : xᵢ = 0}.

Claim For each x ∈ V there exists 0 ≤ y ∈ V such that I(x) = I(y). Moreover, for any
finite family {xᵢ}ᵢ₌₁ᵏ of positive vectors in V there exists 0 ≤ x ∈ V such that

  ∩ᵢ₌₁ᵏ I(xᵢ) = I(x)   (24.24)

Let 0 ≤ x ∈ V. If x ≫ 0, then for each versor eⁱ there is αᵢ > 0 small enough so that
αᵢeⁱ ∈ [0, x]. In turn, this implies span[0, x] = ℝⁿ and so V = ℝⁿ, a contradiction. Thus,
I(x) ≠ ∅ for all 0 ≤ x ∈ V. By the Claim, I(x) ≠ ∅ for all x ∈ V. Set

  I = ∩_{x∈V} I(x) = {i ∈ {1, …, n} : xᵢ = 0 for all x ∈ V}

Since 0 ∈ V, this intersection is non-empty. Moreover, by the Claim we have I = ∩_{0≤x∈V} I(x).
Since there are at most finitely many distinct sets {I(x) : 0 ≤ x ∈ V}, by (24.24) there exists
0 ≤ x_I ∈ V such that I(x_I) = I.
Definition 1154 Two vectors x and y in Rⁿ are disjoint, written x ⊥ y, if |x| ∧ |y| = 0.

Clearly, ⊥ is a symmetric relation: we have x ⊥ y if and only if y ⊥ x. We can naturally extend disjointness to pairs of sets by requiring that all their elements be pairwise disjoint. In particular, two vector subspaces V₁ and V₂ are disjoint when x₁ ⊥ x₂ for all x₁ ∈ V₁ and all x₂ ∈ V₂. This notion provides an order-theoretic angle on direct sums of ideals.
Proposition 1155 Two ideals V₁ and V₂ are disjoint if and only if V₁ ∩ V₂ = {0}.
Proof "If". Suppose that V₁ ∩ V₂ = {0}. Let x₁ ∈ V₁ and x₂ ∈ V₂. We have |x₁| ∈ V₁ and |x₂| ∈ V₂. Since |x₁| ∧ |x₂| ≤ |x₁| and |x₁| ∧ |x₂| ≤ |x₂|, we have |x₁| ∧ |x₂| ∈ V₁ ∩ V₂ because V₁ and V₂ are ideals. Hence, |x₁| ∧ |x₂| = 0.
"Only if". Suppose that V₁ and V₂ are disjoint. Let x ∈ V₁ ∩ V₂. Then |x| = |x| ∧ |x| = 0, and so x = 0.
Proposition 1156 Let V₁ and V₂ be disjoint ideals. Then the projections P_{V₁} and P_{V₂} are monotone: x ≤ y implies P_{V₁}(x) ≤ P_{V₁}(y) and P_{V₂}(x) ≤ P_{V₂}(y), for all x, y ∈ V₁ ⊕ V₂.
This result implies that, for each x 0 that belongs to V1 V2 , in the unique decomposi-
tion x = y1 + y2 we have y1 0 and y2 0. The direct sum thus acquires an order-theoretic
nature.
Proof We prove the result for P_{V₁} (the argument for P_{V₂} is similar). Since P_{V₁} is a linear operator, it is enough to prove that it is positive (Proposition 650). Let x ≥ 0. Since x − P_{V₁}(x) = P_{V₂}(x) ∈ V₂, we have |x − P_{V₁}(x)| ∧ |P_{V₁}(x)| = 0. If 0 = |P_{V₁}(x)| ≤ |x − P_{V₁}(x)|, then P_{V₁}(x) = 0 ≥ 0. If 0 = |x − P_{V₁}(x)| ≤ |P_{V₁}(x)|, then P_{V₁}(x) − x = 0 and so P_{V₁}(x) = x ≥ 0. We conclude that the linear operator P_{V₁} is positive, as desired.
Corollary 1157 Let V be a vector subspace such that V and V⊥ are ideals. Then

x ≤ y ⟹ P_V(x) ≤ P_V(y)

for all x, y ∈ Rⁿ.
Proposition 1158 Let V be a vector subspace. If the projections P_V and P_{V⊥} are positive, then V and V⊥ are ideals.

Proof We only prove that V is an ideal, the argument for V⊥ being similar. Given x ∈ V, let y ∈ Rⁿ be such that 0 ≤ y ≤ x. In view of Proposition 1151, we need to show that y ∈ V. Set z = x − y ≥ 0. It holds

x = y + z = (P_V(y) + P_V(z)) + (P_{V⊥}(y) + P_{V⊥}(z))   (24.25)

and so

P_{V⊥}(y) + P_{V⊥}(z) = x − (P_V(y) + P_V(z)) ∈ V ∩ V⊥ = {0}

because x and P_V(y) + P_V(z) belong to V, while the left-hand side belongs to V⊥. Thus, P_{V⊥}(y) + P_{V⊥}(z) = 0. Since P_{V⊥}(y) and P_{V⊥}(z) are both positive, we then have P_{V⊥}(y) = P_{V⊥}(z) = 0. In turn, this implies y = P_V(y) ∈ V, as desired. We conclude that V is an ideal.
Given a vector subspace V of Rⁿ, the dual cone of its positive cone V₊ is the set

V′₊ = {x ∈ Rⁿ : x · y ≥ 0 ∀y ∈ V₊}

In view of Riesz's Theorem, the dual cone is isomorphic to the collection

{f ∈ V′ : f(y) ≥ 0 ∀y ∈ V₊}

of positive linear functions on V; similarly, the set V′₊₊ of vectors x ∈ Rⁿ with x · y > 0 for all 0 ≠ y ∈ V₊ is isomorphic to the collection

{f ∈ V′ : f(y) > 0 ∀0 ≠ y ∈ V₊}

of strictly positive ones.
Each vector subspace V thus induces the following binary relations on Rⁿ:

(i) x ≥* y if x − y ∈ V′₊;
(ii) x >* y if x ≥* y but not x ≍ y;⁴
(iii) x ≫* y if x − y ∈ V′₊₊.
We call these binary relations dual orders. They are transitive and, in particular, ≥* is a preorder (i.e., it is reflexive and transitive). It is immediate to see that subspaces with larger positive cones induce coarser dual orders. In particular, the standard orders on Rⁿ can be seen as the dual orders induced by Rⁿ₊, the largest positive cone, and so are coarser than all other dual orders:

x ≥ y ⟹ x ≥* y ,  x > y ⟹ x >* y ,  x ≫ y ⟹ x ≫* y

for all x, y ∈ Rⁿ. Next we give an extremal characterization of dual orders based on the Minkowski Theorem. Here

Δ(V) = {x ∈ V₊ : Σᵢ₌₁ⁿ xᵢ = 1}

is the simplex of the subspace V, which is non-empty when V₊ ≠ {0}. It reduces to the standard simplex Δₙ₋₁ when V = Rⁿ.
Proposition 1160 For all x, y ∈ Rⁿ:

(i) x ≥* y if and only if x · z ≥ y · z for all z ∈ ext Δ(V);
(ii) x >* y if and only if, in addition, x · z > y · z for some z ∈ ext Δ(V);
(iii) x ≫* y if and only if x · z > y · z for all z ∈ ext Δ(V).

Proof (i) We prove only the "if", the converse being trivial. Assume that x · z ≥ y · z for all z ∈ ext Δ(V). By the Minkowski Theorem, each z ∈ Δ(V) is a convex combination z = Σᵢ₌₁ⁿ αᵢzᵢ of extreme points zᵢ of the simplex Δ(V). Thus,

x · z ≥ y · z ⟺ x · Σᵢ αᵢzᵢ ≥ y · Σᵢ αᵢzᵢ ⟺ Σᵢ αᵢ(x · zᵢ) ≥ Σᵢ αᵢ(y · zᵢ)

This implies that x · z ≥ y · z for all z ∈ Δ(V). In turn, this readily implies x · z ≥ y · z for all z ∈ V₊, that is, x ≥* y. This completes the proof of (i). The other points are similarly proved.
⁴ We write x ≍ y if both x ≥* y and y ≥* x.
Dual orders take a familiar form when V is an ideal. To show this, we call support of a vector subspace V of Rⁿ, written supp V, the set

{i ∈ {1, ..., n} : ∃x ∈ V, xᵢ ≠ 0}

A vector space has full support when supp V = {1, ..., n}. By Proposition 1153, an ideal V does not have full support if and only if it is distinct from Rⁿ: indeed, supp V is the complement Iᶜ of the index set I.
Proposition 1161 Let V be an ideal. Then, for all x, y ∈ Rⁿ, x ≥* y if and only if xᵢ ≥ yᵢ for every i ∈ supp V, and analogously for >* and ≫*. In particular, when V = Rⁿ the dual orders reduce to the standard ones (≥, >, ≫).
Proof In view of Proposition 1160, it is enough to observe that the extreme points of the simplex of the ideal V are ext Δ(V) = {eⁱ : i ∈ supp V}, as is easily checked.
Next we further illustrate the dual orders with a couple of one-dimensional vector spaces.
Example 1162 (i) Let V be the vector subspace {(α, −α) : α ∈ R} of the plane R² generated by the point (1, −1). Its positive cone V₊ = {0} is trivial and so for all vectors x, y ∈ R² we have x ≥* y. Hence, for no x, y ∈ R² do we have x >* y, let alone x ≫* y. These two partial orders are here empty and so have no bite.

(ii) Let V be the vector subspace {(α, α) : α ∈ R} of the plane R² generated by the point (1, 1), graphically the 45-degree line. Its positive cone is V₊ = {(α, α) : α ≥ 0}. In particular, we have

x ≥* y ⟺ x₁ + x₂ ≥ y₁ + y₂

for all x, y ∈ R² because, being Δ(V) = {(1/2, 1/2)},

x ≥* y ⟺ (x₁, x₂) · (1/2, 1/2) ≥ (y₁, y₂) · (1/2, 1/2) ⟺ (1/2)x₁ + (1/2)x₂ ≥ (1/2)y₁ + (1/2)y₂ ⟺ x₁ + x₂ ≥ y₁ + y₂

Similarly, we have

x >* y ⟺ x ≫* y ⟺ x₁ + x₂ > y₁ + y₂

for all x, y ∈ R². N
Next we state and prove the ultimate version of the Riesz-Markov Theorem. Its representation formula reads

f(x) = λ · x   ∀x ∈ V
Before proving this result, we illustrate it using the vector subspaces discussed in the last
example.
If λᵢ > 0 for all i ∈ supp V, the function f is strictly increasing, while it is strongly increasing if λᵢ > 0 for at least some i ∈ supp V. Finally, note that

(P_V(1))ᵢ = 1 if i ∈ supp V, and 0 otherwise.
The proof of the Riesz-Markov Theorem relies on a lemma of independent interest that generalizes Corollary 1157.

for all x, y ∈ Rⁿ.

⁵ For each θ ∈ R² with θ₁ = θ₂, it holds f(x) = θ · x for all x ∈ V. Among them, θ = 0 is the only one that belongs to V.
⁶ For each θ ∈ R² with θ₁ + θ₂ = f(1, 1), it holds f(x) = θ · x for all x ∈ V. Among them, (24.26) is the only one that belongs to V.
Hence, λ ∈ Δ(V).
"Only if". Let f(x) = λ · x for all x ∈ V, with λ ∈ Δ(V). Then, since λ ∈ V we have

f(P_V(1)) = λ · (P_V(1) + P_{V⊥}(1)) = λ · 1 = Σᵢ₌₁ⁿ λᵢ = 1

as desired.
Chapter 25

Forms and spectra

In this chapter we consider two important topics, eigenvalues of symmetric matrices and quadratic forms. The latter are an important class of functions that plays a key role in optimization, as will be seen later in the book. The first topic is also instrumental to the second one, so we begin by studying eigenvalues.
25.1 Spectra
25.1.1 Eigenvalues
Definition 1166 Let A be a symmetric matrix of order n. A scalar λ ∈ R is called an eigenvalue of A, and a vector 0 ≠ x ∈ Rⁿ an eigenvector of A, if they jointly solve the equation

Ax = λx   (25.1)
A scalar λ ∈ R thus belongs to the spectrum σ(A), the set of the eigenvalues of A, if and only if

det(A − λI) = 0   (25.2)
Proof "Only if". Let λ ∈ σ(A). So, there is 0 ≠ x ∈ Rⁿ such that (A − λI)x = 0, which in turn implies that the matrix A − λI is singular, that is, det(A − λI) = 0.
"If". Suppose λ ∈ R is such that det(A − λI) = 0. Then A − λI is singular, so there exists 0 ≠ x ∈ Rⁿ such that (A − λI)x = 0, that is, such that Ax = λx. We conclude that λ ∈ σ(A).
To find an eigenpair (λ, x), one can first find the eigenvalue λ by solving (25.2) and then find the eigenvector x by solving the homogeneous linear system (A − λI)x = 0.
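As a numerical companion (a sketch added here, not part of the text, with a hypothetical symmetric matrix; the numpy library is assumed), a routine can carry out both steps at once, after which one can verify the defining equation (25.1):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 3.0]])  # a hypothetical symmetric matrix

# eigh is designed for symmetric matrices: it returns the real eigenvalues in
# ascending order and the associated normalized eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(A)

for lam, x in zip(eigenvalues, eigenvectors.T):
    # Check the eigenpair equation Ax = lambda*x up to rounding error.
    assert np.allclose(A @ x, lam * x)
    print(lam, x)
```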
This example shows two things: (i) there can be multiple eigenvalues; (ii) to each such eigenvalue there may correspond multiple eigenvectors. The latter point is best clarified by the next result.
Proposition 1169 Let A be a symmetric matrix of order n. If λ ∈ σ(A), then the collection

W_λ = {x ∈ Rⁿ : (A − λI)x = 0}

is a vector subspace of Rⁿ, called the eigenspace of λ.

The eigenvectors associated to an eigenvalue λ are thus the non-zero elements of the eigenspace W_λ.
Proof Let x, x′ ∈ W_λ. Then

(A − λI)(x + x′) = (A − λI)x + (A − λI)x′ = 0

and so x + x′ ∈ W_λ; a similar computation applies to scalar multiples αx, as desired.
and

W₂₋√₅ = { α((1 − √5)/2, 1) : α ∈ R }

Both eigenspaces have dimension 1 (geometrically, they are the straight lines determined by any eigenvector). N
By the rank-nullity theorem, ν_λ + ρ(A − λI) = n, where ν_λ = dim W_λ.
Clearly, the dimension ν_λ of the eigenspace represents the number of linearly independent eigenvectors associated with λ ∈ σ(A). Since the matrix A − λI is singular, we thus have 1 ≤ ν_λ ≤ n. In the last example, ν₂₊√₅ = ν₂₋√₅ = 1.
For instance, let A = I. It is easy to check that λ = 1 is its only eigenvalue. Indeed,

det(I − λI) = det((1 − λ)I) = (1 − λ)ⁿ det I = (1 − λ)ⁿ

where the third equality follows from Proposition 717. This implies that I − 1·I = O. Therefore, W₁ = ker O = Rⁿ and so ν₁ = n. The same conclusion holds when A = O. It is easy to check that λ = 0 is its only eigenvalue. Indeed,

det(O − λI) = det(−λI) = (−λ)ⁿ det I = (−λ)ⁿ

where, again, the third equality follows from Proposition 717. This implies that O − 0·I = O. Therefore, W₀ = ker O = Rⁿ and so ν₀ = n. N
Proposition 1172 Let A be a symmetric matrix of order n. For all eigenpairs (λ, x) and (λ′, x′) of A, we have

λ ≠ λ′ ⟹ x ⊥ x′

Proof Let λ, λ′ ∈ σ(A), with λ ≠ λ′. We have

Ax · x′ = λx · x′ = λ(x · x′)

as well as

Ax · x′ = x · Aᵀx′ = x · Ax′ = x · λ′x′ = λ′(x · x′)

So, λ(x · x′) = λ′(x · x′), which implies x · x′ = 0 because λ ≠ λ′.
Having established that there are at most n eigenvalues, it remains to address the most basic question: do eigenvalues exist? Formally, is the spectrum σ(A) non-empty? To address this question we introduce an important notion.
This is the characteristic polynomial of A, the polynomial p_A : R → R given by p_A(t) = det(tI − A), which has the form

p_A(t) = tⁿ + αₙ₋₁tⁿ⁻¹ + ⋯ + α₁t + (−1)ⁿ det A
In view of this result we conclude that 1 ≤ |σ(A)| ≤ n because, as is well known, the characteristic polynomial has at least 1 and at most n roots, some of which may be repeated. In particular, denote by m(λ) the multiplicity of the eigenvalue λ ∈ σ(A), that is, its multiplicity as a root of the characteristic polynomial. We then have

Σ_{λ∈σ(A)} m(λ) = n
For a symmetric matrix, the geometric multiplicity coincides with the algebraic one:

dim W_λ = m(λ)

We omit the proof of this result. To understand its scope, we need the following important orthogonalization result, whose proof introduces a classic orthogonalization procedure based on the Projection Theorem.²
So, we can always generate a vector subspace through an orthonormal basis. In particular, Proposition 1175 then implies that there exists an orthonormal basis of Rⁿ formed by eigenvectors of a symmetric matrix.³ Such a basis is called an eigenbasis.
We have σ(A) = {1, 3} with m(1) = 1 and m(3) = 2. The normalized eigenvector associated to the eigenvalue 1 is

(1/√3, −√(2/3), 0)

So,

W₁ = { α(1/√3, −√(2/3), 0) : α ∈ R }

and

W₃ = { α(0, 0, 1) + β(√(2/3), 1/√3, 0) : α, β ∈ R }

and the normalized eigenvectors (1/√3, −√(2/3), 0), (0, 0, 1) and (√(2/3), 1/√3, 0) form an eigenbasis of R³. N
Proof of Proposition 1176 Let S = {x₁, ..., xₖ} be a set of k linearly independent vectors of Rⁿ. We can turn S into an orthonormal basis of V via the so-called Gram-Schmidt orthonormalization. Define a family S̃ = {x̃₁, ..., x̃ₖ} of vectors as follows. If x₂ · x₁ = 0, we can just take x̃₁ = x₁ and x̃₂ = x₂. So, suppose x₂ · x₁ ≠ 0. Define first

x̃₁ = x₁ / ‖x₁‖

To define x̃₂, first consider the auxiliary vector y₂ = x₂ − (x₂ · x̃₁)x̃₁ and then set

x̃₂ = y₂ / ‖y₂‖

Clearly, x̃₁, x̃₂ ∈ span{x₁, x₂}, so span{x̃₁, x̃₂} ⊆ span{x₁, x₂}. Note that, since x̃₁ · x̃₁ = ‖x̃₁‖² = 1, we have

y₂ = x₂ − P_span{x̃₁}(x₂)

That is, we defined y₂ by subtracting from x₂ its projection on the vector subspace generated by x̃₁ (cf. Example 1129). By the Projection Theorem, we then have x̃₁ ⊥ y₂ and so x̃₁ ⊥ x̃₂. This can be easily checked directly:

x̃₂ · x̃₁ = ([x₂ − (x₂ · x̃₁)x̃₁] / ‖y₂‖) · x̃₁ = (1/‖y₂‖)[x̃₁ · x₂ − (x₂ · x̃₁)(x̃₁ · x̃₁)] = (1/‖y₂‖)[x̃₁ · x₂ − x₂ · x̃₁] = 0

We also have x̃₂ ≠ 0; otherwise

x₂ = ((x₂ · x₁)/‖x₁‖²) x₁

and so x₂ and x₁ would be linearly dependent. Clearly, x₁ ∈ span{x̃₁, x̃₂}. On the other hand, x₂ = y₂ + (x₂ · x̃₁)x̃₁ = ‖y₂‖x̃₂ + (x₂ · x̃₁)x̃₁, so x₂ ∈ span{x̃₁, x̃₂}. Thus, span{x₁, x₂} ⊆ span{x̃₁, x̃₂}. We conclude that span{x₁, x₂} = span{x̃₁, x̃₂}.

We can continue by induction till we define the auxiliary vector yₖ = xₖ − Σⱼ₌₁ᵏ⁻¹ (xₖ · x̃ⱼ)x̃ⱼ, which is non-zero because of the linear independence of the vectors in S, and then set x̃ₖ = yₖ / ‖yₖ‖. One can then prove that the collection S̃ = {x̃₁, ..., x̃ₖ} so constructed is such that span S = span S̃.
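The following sketch (added here, not part of the text; the numpy library is assumed) mirrors the construction in the proof: each auxiliary vector is obtained by subtracting from the next vector its projections on the orthonormal vectors built so far, and is then normalized.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors of R^n."""
    basis = []
    for x in vectors:
        # Subtract from x its projection on the span of the vectors built so far.
        y = x - sum((x @ e) * e for e in basis)
        # y is non-zero by linear independence; normalize it.
        basis.append(y / np.linalg.norm(y))
    return basis

S = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0])]
for e in gram_schmidt(S):
    print(e)
# The printed vectors are orthonormal and span the same subspace as S.
```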
If a symmetric matrix is invertible, then its inverse matrix is symmetric as well. Indeed,

I = Iᵀ = (A⁻¹A)ᵀ = Aᵀ(A⁻¹)ᵀ = A(A⁻¹)ᵀ

and so (A⁻¹)ᵀ = A⁻¹. This motivates the following elegant result, which shows that the eigenvalues of the inverse matrix are the reciprocals of the eigenvalues of the original matrix.
Proposition 1179 A symmetric matrix A is invertible if and only if all its eigenvalues are non-zero. In this case,

λ ∈ σ(A) ⟺ λ⁻¹ ∈ σ(A⁻¹)   (25.4)

Proof To prove the equivalence, just note that p_A(0) = det(−A) = (−1)ⁿ det A, so 0 ∈ σ(A) if and only if det A = 0. It remains to prove (25.4). Let A be invertible and λ ∈ σ(A). By what was just established, λ ≠ 0. Let 0 ≠ x ∈ Rⁿ be an eigenvector associated with λ. Then Ax = λx implies

A⁻¹(Ax) = A⁻¹(λx) = λA⁻¹x

so x = λA⁻¹x, that is, A⁻¹x = λ⁻¹x. Hence λ⁻¹ ∈ σ(A⁻¹).
25.1.2 Diagonalization
We are now ready to move towards the result that motivates, for us, the study of eigenvalues. A square matrix B is orthogonal if BᵀB = BBᵀ = I, that is, if Bᵀ = B⁻¹. Orthogonal matrices generalize a key feature of identity matrices, namely that they have orthonormal rows as well as orthonormal columns (given by the orthonormal set e¹, ..., eⁿ). The next proposition clarifies.
Proposition 1181 For a square matrix B, the following conditions are equivalent:

(i) B is orthogonal;
(ii) the rows of B are orthonormal;
(iii) the columns of B are orthonormal.
Proof We prove the equivalence of (i) and (ii), and leave (iii) to readers. First note that, by definition,

(BBᵀ)ᵢⱼ = Σₖ₌₁ⁿ bᵢₖbⱼₖ = ‖bᵢ‖² if i = j, and bᵢ · bⱼ if i ≠ j

"Only if". Suppose that the rows b₁, ..., bₙ of B are orthonormal, i.e., bᵢ · bⱼ = 0 for all 1 ≤ i ≠ j ≤ n and ‖bᵢ‖² = 1 for all i = 1, ..., n. Then

(BBᵀ)ᵢⱼ = 1 if i = j, and 0 if i ≠ j

that is, BBᵀ = I.

"If". Suppose that BBᵀ = I. Then

(BBᵀ)ᵢⱼ = 1 if i = j, and 0 if i ≠ j

and so bᵢ · bⱼ = 0 for all 1 ≤ i ≠ j ≤ n, as well as Σₖ₌₁ⁿ b²ᵢₖ = ‖bᵢ‖² = 1.
Proof of Proposition 1181 Before starting, recall that by Binet's Theorem det(BᵀB) = det Bᵀ det B and det(BBᵀ) = det B det Bᵀ. This implies that B is invertible. Hence

BᵀB = I ⟹ (BᵀB)B⁻¹ = B⁻¹ ⟹ BᵀI = B⁻¹ ⟹ Bᵀ = B⁻¹

BBᵀ = I ⟹ B⁻¹(BBᵀ) = B⁻¹ ⟹ IBᵀ = B⁻¹ ⟹ Bᵀ = B⁻¹
For instance, the vectors

(1/√3, 1/√3, 1/√3) ,  (−2/√6, 1/√6, 1/√6) ,  (0, −1/√2, 1/√2)

are orthonormal. So, by Proposition 1181 the matrix

A = [ 1/√3    1/√3   1/√3
      −2/√6   1/√6   1/√6
      0       −1/√2  1/√2 ]   (25.6)

is orthogonal. Note that, for any orthogonal matrix B, Binet's Theorem gives (det B)² = det(BBᵀ) = det I = 1, and so det B = ±1. N
Example 1184 In view of Proposition 1172, a matrix of order n whose rows are normalized eigenvectors associated to n distinct eigenvalues is orthogonal. For instance, consider the symmetric matrix

A = [ 13/5  4/5
      4/5   7/5 ]

We have the eigenpairs (λ₁, x₁) = (1, (−1/2, 1)) and (λ₂, x₂) = (3, (2, 1)). The normalized eigenvectors are

x₁/‖x₁‖ = (−1/√5, 2/√5)   and   x₂/‖x₂‖ = (2/√5, 1/√5)

So, the matrix

B = [ −1/√5  2/√5
      2/√5   1/√5 ]   (25.7)

is orthogonal. Its determinant is −1. Note that this orthogonal matrix is symmetric. N
We can now state the main result of this section.

Theorem 1185 A symmetric matrix A is orthogonally diagonalizable, that is, there exists an orthogonal matrix B such that

BᵀAB = Λ

where Λ is the diagonal matrix that has the eigenvalues of A as its entries, each repeated according to its multiplicity.
Since A = BBᵀABBᵀ = BΛBᵀ, the diagonalization implies that a symmetric matrix can be decomposed as

A = BΛBᵀ

which is a most convenient spectral decomposition of a symmetric matrix. Note that, by Binet's Theorem,

det A = (det B)² det Λ = det Λ = λ₁λ₂⋯λₙ

because det B = ±1 (see Example 1183). So, the determinant of a symmetric matrix is the product of its eigenvalues (with their multiplicities):

det A = λ₁λ₂⋯λₙ   (25.8)
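Numerically, one can verify the decomposition on the matrix of Example 1184 (a sketch added here; the numpy library is assumed, and eigh returns the eigenvectors as the columns of B):

```python
import numpy as np

A = np.array([[13/5, 4/5],
              [4/5,  7/5]])

lam, B = np.linalg.eigh(A)      # eigenvalues [1, 3], eigenvectors as columns
Lam = np.diag(lam)

assert np.allclose(B.T @ A @ B, Lam)   # orthogonal diagonalization B^T A B = Lam
assert np.allclose(A, B @ Lam @ B.T)   # spectral decomposition A = B Lam B^T
print(np.linalg.det(A), lam.prod())    # both equal 3 = 1 * 3, as in (25.8)
```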
With a radiation metaphor, we can think of the spectrum of the matrix A as its "X-ray" plate: if we radiated A we would observe its skeleton Λ. The orthogonal matrix B consists of the "soft tissues" of the matrix A that let X-rays pass through.
The next proof clarifies how to construct the orthogonal matrix B through the eigenvectors. So, the diagonalization is actually a joint outcome of the eigenpairs.
Proof Let A be a symmetric matrix of order n. Assume first that it has n distinct eigenvalues and denote by xⁱ a normalized eigenvector associated to the eigenvalue λᵢ, for i = 1, ..., n. Let B be the matrix whose rows are these normalized eigenvectors:

B = [ x¹
      x²
      ⋮
      xⁿ ]
We have

xⁱ · Axⁱ = xⁱ · λᵢxⁱ = λᵢ(xⁱ · xⁱ) = λᵢ‖xⁱ‖² = λᵢ

where the last equality holds because xⁱ is normalized, i.e., ‖xⁱ‖ = 1. Moreover, for i ≠ j we have

xⁱ · Axʲ = xⁱ · λⱼxʲ = λⱼ(xⁱ · xʲ) = 0

where the last equality holds because xⁱ ⊥ xʲ when i ≠ j. In turn, this implies
BABᵀ = [ x¹ · Ax¹   x¹ · Ax²   ⋯   x¹ · Axⁿ
         x² · Ax¹   x² · Ax²   ⋯   x² · Axⁿ
         ⋮
         xⁿ · Ax¹   xⁿ · Ax²   ⋯   xⁿ · Axⁿ ] = [ λ₁  0   ⋯  0
                                                  0   λ₂  ⋯  0
                                                  ⋮
                                                  0   0   ⋯  λₙ ] = Λ
as desired. Finally, if some eigenvalue is repeated, one takes correspondingly many orthonormal eigenvectors from its eigenspace.
The proof shows that the role of the orthogonal matrix in the diagonalization can be played by the matrix whose columns are normalized eigenvectors associated to distinct eigenvalues. The next examples illustrate this (and show that other orthogonal matrices can also be considered).
We have σ(A) = {1, 3} with m(1) = m(3) = 1. By (25.8), det A = 1 · 3 = 3. The matrix (25.7) of normalized eigenvectors is

B = [ −1/√5  2/√5
      2/√5   1/√5 ]
B = [ 1/√3     0  √(2/3)
      −√(2/3)  0  1/√3
      0        1  0 ]

is a matrix of normalized eigenvectors. Through it, we get the orthogonal diagonalization of the matrix A:

A = [ 1/√3     0  √(2/3)    [ 1  0  0     [ 1/√3    −√(2/3)  0
      −√(2/3)  0  1/√3        0  3  0       0       0        1
      0        1  0      ]    0  0  3 ]     √(2/3)  1/√3     0 ]
A different orthogonal diagonalization of the matrix A is:

A = [ 1/√3   1/√3   1/√3     [ 1  0  0     [ 1/√3  −2/√6  0
      −2/√6  1/√6   1/√6       0  3  0       1/√3  1/√6   −1/√2
      0      −1/√2  1/√2 ]     0  0  3 ]     1/√3  1/√6   1/√2 ]

where the orthogonal matrix (25.6) appears. N
We close by noting that eigenvalues and eigenvectors can be defined for any square matrix, not necessarily symmetric. Yet, in this general case eigenvalues have to be allowed to take complex values, so we do not discuss this important topic here and leave it to more advanced courses.
25.2 Forms
25.2.1 Forms
A function f : Rⁿ → R is a form if it is the sum of monomials of the same degree, called the degree of the form.
For instance, the functions f(x₁, x₂, x₃) = x₁⁸ + 5x₁x₂³x₃⁴ and f(x₁, x₂, x₃) = x₁x₃ + 5x₂x₃ + x₃² are forms, while the function f(x₁, x₂, x₃) = x₁x₂x₃ + x₁x₂⁵x₃ is not a form.

A form is linear if it is the sum of monomials of first degree, which we can write as f(x) = Σᵢ₌₁ⁿ kᵢxᵢ. By Riesz's Theorem, linear forms are thus the linear functions.

A form is quadratic if it is the sum of monomials of second degree. For example, f(x₁, x₂, x₃) = 3x₁x₃ − x₂x₃ is a quadratic form because it is the sum of the monomials of second degree 3x₁x₃ and −x₂x₃. It is easy to see that the following functions are quadratic forms:

f(x) = x²
f(x₁, x₂) = x₁² + x₂² − 4x₁x₂
f(x₁, x₂, x₃, x₄) = x₁x₄ − 2x₁² + 3x₂x₃
Quadratic forms are the most important nonlinear forms. In what follows we study them
in detail.
Proposition 1189 A function f : Rⁿ → R is a quadratic form if and only if there exists a unique symmetric matrix A of order n such that

f(x) = x · Ax   ∀x ∈ Rⁿ   (25.9)

In other words, given a symmetric matrix A of order n there exists a unique quadratic form f : Rⁿ → R for which (25.9) holds. Vice versa, given a quadratic form f : Rⁿ → R there exists a unique symmetric matrix A of order n for which (25.9) holds.⁴

⁴ To ease notation we write x · Ax instead of the more precise x · Axᵀ (cf. the discussion on vector notation in Section 15.2.4). So, we drop all the "T"s.
Proof For the "if", just note that x · Ax = Σᵢ₌₁ⁿ aᵢᵢxᵢ² + 2Σ_{1≤i<j≤n} aᵢⱼxᵢxⱼ, so f(x) = x · Ax is a quadratic form. As to the converse, let f : Rⁿ → R be a quadratic form. It is easy to see that f(x) = x · Ax where aᵢᵢ corresponds to the coefficient of xᵢ² and aᵢⱼ + aⱼᵢ corresponds to the coefficient of xᵢxⱼ. In particular, A is symmetric if and only if aᵢⱼ and aⱼᵢ are both equal to half of the coefficient of xᵢxⱼ. So, there is a unique symmetric matrix A for which we have f(x) = x · Ax.
The matrix A = (aᵢⱼ) is called the symmetric matrix associated to the quadratic form f. We can write (25.9) in an extended manner as

f(x) = Σᵢ₌₁ⁿ aᵢᵢxᵢ² + 2Σ_{1≤i<j≤n} aᵢⱼxᵢxⱼ

The coefficients of the squares x₁², x₂², ..., xₙ² are therefore the elements (a₁₁, a₂₂, ..., aₙₙ) of the diagonal of A, while for every i ≠ j the coefficient of the monomial xᵢxⱼ is 2aᵢⱼ. It is therefore simple to move from the symmetric matrix to the quadratic form and vice versa. Let us give some examples.
Example 1190 The symmetric matrix associated to the quadratic form f(x₁, x₂, x₃) = 3x₁x₃ − x₂x₃ is given by

A = [ 0    0     3/2
      0    0     −1/2
      3/2  −1/2  0 ]
For instance, the matrices

[ 0  0  3      [ 0  0   0
  0  0  −1       0  0   0
  0  0  0 ]      3  −1  0 ]   (25.10)

are such that f(x) = x · Ax, although they are not symmetric. What we lose without symmetry is the bijective correspondence between quadratic forms and matrices. Indeed, while given the quadratic form f(x₁, x₂, x₃) = 3x₁x₃ − x₂x₃ there exists a unique symmetric matrix for which (25.9) holds, this is no longer true if we do not require the symmetry of the matrix, as the two matrices in (25.10) show: for both of them, (25.9) holds. N
Example 1191 As to the quadratic form f(x₁, x₂) = x₁² + x₂² − 4x₁x₂, we have

A = [ 1   −2
      −2  1 ]

Indeed,

x · Ax = (x₁, x₂) · (x₁ − 2x₂, −2x₁ + x₂) = x₁² − 2x₁x₂ − 2x₁x₂ + x₂² = x₁² + x₂² − 4x₁x₂

N
Example 1192 Let f : Rⁿ → R be defined by f(x) = ‖x‖² = Σᵢ₌₁ⁿ xᵢ². The symmetric matrix associated to this quadratic form is the identity matrix I. Indeed, x · Ix = x · x = Σᵢ₌₁ⁿ xᵢ². More generally, let f(x) = Σᵢ₌₁ⁿ αᵢxᵢ² with αᵢ ∈ R for every i = 1, ..., n. It is easy to see that the symmetric matrix associated to f is the diagonal matrix

[ α₁  0   0   ⋯  0
  0   α₂  0   ⋯  0
  0   0   α₃  ⋯  0
  ⋮
  0   0   0   ⋯  αₙ ]
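In code, moving between a quadratic form and its symmetric matrix is immediate (a sketch added here, not part of the text, using the form of Example 1190; the numpy library is assumed):

```python
import numpy as np

# Symmetric matrix associated to f(x1, x2, x3) = 3*x1*x3 - x2*x3 (Example 1190).
A = np.array([[0.0, 0.0,  1.5],
              [0.0, 0.0, -0.5],
              [1.5, -0.5, 0.0]])

def f(x):
    return x @ A @ x  # the quadratic form x . Ax of (25.9)

x = np.array([1.0, 2.0, 3.0])
print(f(x))                        # 3.0
print(3*x[0]*x[2] - x[1]*x[2])     # same value, computed from the monomials
```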
Definition 1193 A quadratic form f : Rⁿ → R is said to be:

(i) positive definite if f(x) > 0 for every 0 ≠ x ∈ Rⁿ;
(ii) positive semi-definite if f(x) ≥ 0 for every x ∈ Rⁿ;
(iii) indefinite if there exist x, x′ ∈ Rⁿ such that f(x) < 0 < f(x′).

Negative definite and negative semi-definite forms are defined analogously, with the reversed inequalities.
Note a basic duality: f is negative definite (semi-definite) if and only if −f is positive definite (semi-definite). So, properties established for the positive cases have dual versions for the negative ones. For this reason, often only the positive case is explicitly considered.

In view of Proposition 1189, we have a parallel classification for symmetric matrices that translates that of the corresponding quadratic form. In particular, a symmetric matrix A of order n is:

(i) positive definite if x · Ax > 0 for every 0 ≠ x ∈ Rⁿ;
(ii) positive semi-definite if x · Ax ≥ 0 for every x ∈ Rⁿ;
(iii) indefinite if there exist x, x′ ∈ Rⁿ such that x · Ax < 0 < x′ · Ax′.

Here the previous duality takes the following form: a symmetric matrix A is negative definite (semi-definite) if and only if −A is positive definite (semi-definite). So, to check whether a matrix A is negative definite (semi-definite) one can check whether −A is positive definite (semi-definite). Criteria that establish whether a symmetric matrix is positive definite (semi-definite) thus have dual versions for the negative case.
Example 1194 Consider the symmetric matrix

A = [ 2  1
      1  1/2 ]

We have x · Ax = 2(x₁ + x₂/2)², so A is positive semi-definite but not positive definite. For instance, for x = (1, −2) we have x · Ax = 0. N
Proposition 1195 A positive definite matrix is invertible. Its inverse matrix is also positive definite.

Proof Let A be a positive definite matrix. Suppose, by contradiction, that it is not invertible. So, there exists 0 ≠ x ∈ Rⁿ such that Ax = 0. In turn, this implies x · Ax = 0, which contradicts the positive definiteness of A.
It remains to prove that A⁻¹ is a positive definite matrix. Let 0 ≠ x ∈ Rⁿ. Set y = A⁻¹x, so that 0 ≠ y ∈ Rⁿ. We then have

x · A⁻¹x = x · y = Σᵢ₌₁ⁿ xᵢyᵢ = Σᵢ₌₁ⁿ (Σⱼ₌₁ⁿ aᵢⱼyⱼ) yᵢ = y · Ay > 0

as desired.
Proposition 1196 A positive semi-definite matrix is invertible if and only if it is positive definite.

Positive definite matrices can thus be regarded as the positive semi-definite matrices that are invertible. So, the positive semi-definite matrices that are not positive definite are the singular ones (cf. Example 1194).
The proof of this remarkable property relies on the following lemma, of independent interest.
Lemma 1197 Let A be a positive semi-definite matrix. For each x ∈ Rⁿ, we have x · Ax = 0 if and only if Ax = 0.

When A is positive semi-definite, the homogeneous linear system Ax = 0 is thus equivalent to the quadratic equation x · Ax = 0.
Proof We prove the "only if", the converse being trivial. Let A be a positive semi-definite matrix. Let x ∈ Rⁿ be such that x · Ax = 0. Define an auxiliary map p : R → R by p(t) = (tx + Ax) · A(tx + Ax).⁵ Some algebra shows that

p(t) = 2t(Ax · Ax) + Ax · A(Ax)

Since A is positive semi-definite, we have p(t) ≥ 0 for all t ∈ R. So, Ax · Ax = 0 because, otherwise, we would have p(t) < 0 for some t < 0. In turn, from Ax · Ax = 0 it follows that Ax = 0, as desired.
Proof of Proposition 1196 We prove the "only if", as the converse is given by Proposition 1195. Let A be a positive semi-definite matrix which is invertible. Let x ∈ Rⁿ be such that x · Ax = 0. By the last lemma, Ax = 0. Since A is invertible, we have x = 0. So, A is positive definite.
Since positive definite matrices are invertible, their eigenvalues are non-zero (Proposition 1179). Much more is true: next we show that eigenvalues provide a key characterization of positive definite matrices.

Proposition 1198 A symmetric matrix is:

(i) positive definite if and only if all its eigenvalues are strictly positive;
(ii) positive semi-definite if and only if all its eigenvalues are positive.
The positivity of eigenvalues is thus what characterizes positive semi-definite matrices among symmetric matrices, as well as positive definite matrices among symmetric invertible matrices (cf. Proposition 1179). The proof of this spectral characterization relies on the orthogonal diagonalization of symmetric matrices previously established in this chapter.
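Numerically, the spectral criterion is easy to apply (a sketch added here, not part of the text; the second matrix is the positive semi-definite one of Example 1194, and the numpy library is assumed):

```python
import numpy as np

def classify(A, tol=1e-12):
    """Classify a symmetric matrix via the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(A)
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam >= -tol):
        return "positive semi-definite"
    if np.any(lam > tol) and np.any(lam < -tol):
        return "indefinite"
    return "negative (semi-)definite"

print(classify(np.array([[2.0, 0.0], [0.0, 1.0]])))   # positive definite
print(classify(np.array([[2.0, 1.0], [1.0, 0.5]])))   # positive semi-definite
```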
Proof We only prove (i) because the proof of (ii) is similar.⁶ "Only if". Suppose the symmetric matrix A is positive definite. Let λ ∈ σ(A). From the proof of Theorem 1185 we know that λ = x · Ax, where x is a normalized eigenvector associated to the eigenvalue λ. Since x ≠ 0, we have

λ = x · Ax > 0

as desired. "If". Suppose that λ > 0 for all λ ∈ σ(A). By Theorem 1185, there is an orthogonal matrix B such that BᵀAB = Λ, that is, A = BΛBᵀ. Let 0 ≠ x ∈ Rⁿ. Set y = Bᵀx. Since B is invertible, y ≠ 0, and so

x · Ax = (Bᵀx) · Λ(Bᵀx) = y · Λy = Σᵢ₌₁ⁿ λᵢyᵢ² > 0

as desired.
We close by showing that positive definite matrices are invertible matrices of a familiar form: they are Gram matrices. It is another remarkable dividend of the orthogonal diagonalization.
Theorem 1200 A symmetric matrix A is positive definite if and only if there exists an invertible matrix B such that A = BᵀB.

Proof "If". Assume that there exists an invertible matrix B such that A = BᵀB. Let 0 ≠ x ∈ Rⁿ and set y = Bx. Since B is invertible, y ≠ 0. Then x · Ax = x · BᵀBx = y · y > 0, and so A is positive definite.
"Only if". Assume that A is a positive definite matrix. By Theorem 1185, there is an orthogonal matrix C such that A = CΛCᵀ. Then

A = (CΛ^{1/2})(Λ^{1/2}Cᵀ)

where Λ^{1/2} is the diagonal matrix whose entries are the square roots of the corresponding entries of Λ (since A is positive definite, all the diagonal terms of Λ are strictly positive). By Binet's Theorem, det(CΛ^{1/2}) = det C det Λ^{1/2} = ±√(λ₁λ₂⋯λₙ) ≠ 0, so CΛ^{1/2} is invertible. Moreover, (CΛ^{1/2})ᵀ = Λ^{1/2}Cᵀ. If we set B = Λ^{1/2}Cᵀ, we conclude that A = (CΛ^{1/2})(Λ^{1/2}Cᵀ) = BᵀB.
Interestingly, inspection of the proof shows that, via the change of variable y = Bx, we can write a positive definite quadratic form f(x) = x · Ax as an inner product y · y, that is, as a sum of squares. This observation sheds further light on the nature of positive definite quadratic forms (cf. also Brioschi's Theorem below).
O.R. It can be proved that the matrix B in the last theorem can be uniquely chosen to be upper triangular with strictly positive diagonal entries. This is important from a computational viewpoint because triangular matrices are especially easy to handle. So, denote by L the transpose of B, which is lower triangular with strictly positive diagonal entries. The "triangular" version of the factorization established in this last result is often written as

A = LLᵀ

and is called the Cholesky factorization. If one is able to compute L, this factorization may greatly simplify, for example, the computation of the solution of a linear system Ax = b when A is positive definite. Indeed, in this case one first finds the solution y* of the system Ly = b; then, solving the system Ax = b amounts to solving the system Lᵀx = y*. In both steps computations are substantially simplified by the triangular nature of the matrices involved (as readers will learn in more advanced courses). H
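The two triangular steps can be sketched as follows (an illustration added here, not part of the text, with a hypothetical positive definite matrix; the numpy and scipy libraries are assumed):

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])        # hypothetical positive definite matrix
b = np.array([2.0, 3.0])

L = np.linalg.cholesky(A)         # lower triangular factor, A = L L^T
y = solve_triangular(L, b, lower=True)       # first step:  L y* = b
x = solve_triangular(L.T, y, lower=False)    # second step: L^T x = y*
assert np.allclose(A @ x, b)      # x solves the original system Ax = b
```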
Proposition 1201 Let A be a symmetric matrix of order n. If A is positive definite, then aᵢᵢ > 0 for all i = 1, ..., n; if A is positive semi-definite, then aᵢᵢ ≥ 0 for all i = 1, ..., n.

In the negative cases, the inequalities are reversed. These conditions are only necessary: for instance, the matrix

[ 1  2
  2  1 ]

is easily checked not to be positive definite, despite its strictly positive diagonal. Yet, as previously remarked, the conditions become sufficient for "diagonal" quadratic forms f(x) = Σᵢ₌₁ⁿ αᵢxᵢ².
This result gives simple necessary "diagonal" conditions that are mostly useful for a preliminary inspection: if these conditions are violated, the matrix cannot be definite or semi-definite. For a deeper analysis, we need more sophisticated criteria. Among them, we will next present the classic Sylvester-Jacobi criterion. To introduce it, we need some terminology about submatrices and minors (Section 15.6.7). Let A be a square matrix of order n. We call:

(i) principal submatrices the square submatrices of A obtained by eliminating k rows and columns that have the same indexes (places), with 0 ≤ k ≤ n − 1;

(ii) principal minors the determinants of the principal submatrices;

(iii) leading principal submatrices the principal submatrices obtained by eliminating the last k rows and columns, with 0 ≤ k ≤ n − 1, that is, in symbols

A₁ = [a₁₁] ,  A₂ = [ a₁₁  a₁₂
                     a₂₁  a₂₂ ] ,  A₃ = [ a₁₁  a₁₂  a₁₃
                                          a₂₁  a₂₂  a₂₃
                                          a₃₁  a₃₂  a₃₃ ] ,  ... ,  Aₙ = A

(iv) leading principal minors the determinants of the leading principal submatrices.
For example, consider the square matrix

A = [ 1   3  2
      10  1  2
      3   5  7 ]   (25.11)

Its principal minors are obtained via the elimination of:

- 1 row and 1 column of index 3 (i.e., the elimination of the last row and column), which gives det [1 3; 10 1] = −29;
- 1 row and 1 column of index 1 (i.e., the elimination of the first row and column), which gives det [1 2; 5 7] = −3;
- 1 row and 1 column of index 2 (i.e., the elimination of the middle row and column), which gives det [1 2; 3 7] = 1;
- 2 rows and 2 columns of indexes 2 and 3, as well as 2 rows and 2 columns of indexes 1 and 3 (in both cases we end up with the square submatrix [1]), which give det [1] = 1;
- 2 rows and 2 columns of indexes 1 and 2, which gives det [7] = 7;
- no rows and columns (k = 0), which gives det A = −101.

The leading principal minors are

det A₁ = det [1] = 1 ,  det A₂ = det [1 3; 10 1] = −29 ,  det A₃ = det A = −101
Theorem 1203 (Brioschi) Let A be a symmetric matrix of order n with non-zero leading principal minors. Then, there exists an upper triangular matrix C of order n with unitary diagonal entries such that, for all x ∈ Rⁿ,

x · Ax = Σₖ₌₁ⁿ (det Aₖ / det Aₖ₋₁) zₖ²   (25.12)

where z = Cx (with the convention det A₀ = 1).

Proof We only prove the case n = 2, originally established by Lagrange (a complete proof can be found in Debreu, 1952). The left-hand side of (25.12) is

x · Ax = a₁₁x₁² + 2a₁₂x₁x₂ + a₂₂x₂² = a₁₁(x₁ + (a₁₂/a₁₁)x₂)² + ((a₁₁a₂₂ − a₁₂²)/a₁₁)x₂² = (det A₁/det A₀)z₁² + (det A₂/det A₁)z₂²

which is (25.12) with z = Cx for the upper triangular matrix C = [1 a₁₂/a₁₁; 0 1] with unitary diagonal entries.
Next we generalize the simple necessary conditions of Proposition 1201, which are the special case of the next result involving only the "diagonal" principal minors of the form aᵢᵢ.
Proposition 1204 Let A be a symmetric matrix of order n.

(i) If A is positive definite, then its principal minors are all strictly positive.
(ii) If A is positive semi-definite, then its principal minors are all positive.

Proof We only prove (i) because the proof of (ii) is similar. Assume that A is positive definite. Let Aᵢᵢ be the (n−1) × (n−1) principal submatrix of A obtained by eliminating the row and column of index i, e.g., A₁₁ results from the elimination of the first row and column. Let x ∈ Rⁿ⁻¹ and consider x̃ ∈ Rⁿ such that x̃₋ᵢ = x and x̃ᵢ = 0.⁸ Then, x · Aᵢᵢx = x̃ · Ax̃ > 0 for all 0 ≠ x ∈ Rⁿ⁻¹. So, Aᵢᵢ is positive definite. A similar argument, just notationally messier, proves that any principal submatrix B of A is positive definite. By Corollary 1199, det B > 0.
This proposition empowers the preliminary analysis that was based on Proposition 1201: now it is enough to exhibit any principal minor that violates the positivity conditions to conclude that the matrix is not definite or semi-definite. Yet, the main interest of this result is as a stepping stone towards the Sylvester-Jacobi criterion, which we can now state and prove.
⁸ Recall the notation x₋ᵢ introduced in Section 14.
Theorem 1205 (Sylvester-Jacobi) A symmetric matrix A of order n is:

(i) positive definite if and only if its leading principal minors are all strictly positive;

(ii) negative definite if and only if its leading principal minors are non-zero and alternate in sign, starting with a negative sign;

(iii) indefinite if its leading principal minors are non-zero and the sequence of their signs respects neither (i) nor (ii).
Remarkably, it is enough to check just the leading principal minors to establish whether a symmetric matrix is positive or negative definite, a computationally much lighter task than checking all principal minors.

Proof (i) The "only if" part follows from Proposition 1204. The "if" part follows from Brioschi's Theorem. Indeed, let 0 ≠ x ∈ Rⁿ. Let C be the triangular matrix of Brioschi's Theorem. Since it has unitary diagonal entries, it is invertible. Thus, we have z = Cx ≠ 0. So, (25.12) implies x · Ax > 0. Point (ii) is just the dual "negative" version of (i) because det(−A) = (−1)ⁿ det A for a square matrix of order n (see Proposition 717). Finally, point (iii) is a straightforward consequence of points (i) and (ii).
Example 1206 Let f(x₁, x₂, x₃) = x₁² + 2x₂² + x₃² + (x₁ + x₃)x₂. The symmetric matrix associated to f is:

A = [ 1    1/2  0
      1/2  2    1/2
      0    1/2  1 ]

Indeed, we have

x · Ax = (x₁, x₂, x₃) · (x₁ + (1/2)x₂, (1/2)x₁ + 2x₂ + (1/2)x₃, (1/2)x₂ + x₃) = x₁² + 2x₂² + x₃² + (x₁ + x₃)x₂

The leading principal minors are

det A₁ = 1 > 0 ,  det A₂ = det [1 1/2; 1/2 2] = 7/4 > 0 ,  det A₃ = det A = 3/2 > 0

Hence, by the Sylvester-Jacobi criterion the quadratic form f is positive definite. N
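The criterion is easy to implement (a sketch added here, not part of the text, applied to the matrix of Example 1206; the numpy library is assumed):

```python
import numpy as np

def leading_principal_minors(A):
    """Determinants of the top-left k-by-k submatrices, k = 1, ..., n."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

A = np.array([[1.0, 0.5, 0.0],
              [0.5, 2.0, 0.5],
              [0.0, 0.5, 1.0]])

minors = leading_principal_minors(A)
print(minors)                        # approximately [1.0, 1.75, 1.5]
print(all(m > 0 for m in minors))    # True: by Sylvester-Jacobi, A is positive definite
```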
Proposition 1207 A symmetric matrix is positive semi-definite if all its principal minors, leading or not, are positive.

Here we need to consider all principal minors, not just the leading ones as in the positive definite case. The Sylvester-Jacobi criterion is thus computationally heavier, so less attractive, in the positive semi-definite case.
For instance, all the leading principal minors of the matrix

A = [ 0  0
      0  a₂₂ ]

are positive, yet the matrix is positive semi-definite if and only if a₂₂ ≥ 0. N
Part VI

Differential calculus
Chapter 26
Derivatives
Consider a producer with cost function c. A variation Δx ∈ {..., −3, −2, −1, 1, 2, 3, ...} in the produced quantity x determines a variation in cost

Δc = c(x + Δx) − c(x)
As the production increases, while the average cost decreases, the difference quotient increases. This means that each additional unit costs more and more: increasing production is, "at the margin", more and more expensive for the producer. In particular, the last additional unit has determined an increase in costs of 5 euros: for the producer such an increase in production is profitable if (and only if) there is an at least equal increase in the difference quotient of the return R(x), that is, in the return of each additional unit:

ΔR/Δx = (R(x + Δx) − R(x)) / Δx   (26.2)
Let us add to the table two columns with the returns and their difference quotients:

x   c(x)   c(x)/x   Δc/Δx   R(x)   ΔR/Δx
The first two increases in production are profitable for the producer: they determine a difference quotient of the returns equal to 50 euros and 33.3 euros, respectively, versus a difference quotient of the costs equal to 3 euros and 3.3 euros, respectively. After the last increment in production, the difference quotient of the returns decreases to only 4 euros, lower than the corresponding value of 5 euros of the difference quotient of the costs. The producer will find it profitable, therefore, to increase the production to 105 units, but not to 106. That this choice is correct is confirmed by the trend of the profit π(x) = R(x) − c(x), which for convenience we add to the table:

x     c(x)    c(x)/x   Δc/Δx   R(x)    ΔR/Δx   π(x)
100   4,494   44.94            5,000           506
The profit of the producer continues to increase up to the level 105 of produced output, but decreases in case of a further increase to 106. The "incremental" information, quantified by difference quotients such as (26.1) and (26.2), is therefore key for the producer's ability to assess his production decisions. In contrast, the information on average costs or on average returns is, for instance, completely irrelevant (in our example it is actually misleading: the decrease in average costs can lead to wrong decisions). In the economics jargon, the producer should decide based on what happens at the margin, not on average.
Until now we have considered the ratio (26.1) for discrete variations Δx. Idealizing, let us consider arbitrary non-zero variations Δx ∈ R and, in particular, smaller and smaller variations, that is, Δx → 0. Their limit c′(x) is given by

c′(x) = lim_{Δx→0} (c(x + Δx) − c(x)) / Δx   (26.3)
When it exists and is finite, c′(x) is called the marginal cost at x: it indicates the variation in cost determined by infinitesimal variations of output with respect to the "initial" quantity x.
This idealization permits us to frame marginal analysis within differential calculus, a fundamental mathematical theory that will be the subject matter of the chapters of this part of the book. Because it formalizes marginal analysis, differential calculus pervades economics.
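The passage from discrete difference quotients to the marginal cost can be illustrated numerically (a sketch added here, not part of the text, with a hypothetical cost function; the numpy conventions of this chapter's other sketches are kept, though plain Python suffices):

```python
def c(x):
    return 0.01 * x**2 + 5.0 * x   # hypothetical cost function

x = 100.0
for dx in [10.0, 1.0, 0.1, 0.001]:
    print(dx, (c(x + dx) - c(x)) / dx)   # difference quotients as in (26.1)

# As dx -> 0 the quotients approach the marginal cost
# c'(100) = 0.02*100 + 5 = 7 of the limit (26.3).
```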
26.2 Derivatives
For a function f : (a, b) → R, the difference quotient (26.1) takes the form

Δf/Δx = (f(x + h) − f(x)) / ((x + h) − x) = (f(x + h) − f(x)) / h   (26.4)
Definition 1209 A function f : (a, b) → R is said to be derivable at x₀ ∈ (a, b) if the limit¹

lim_{h→0} (f(x₀ + h) − f(x₀)) / h   (26.5)

exists and is finite. In this case, the limit is called the derivative of f at x₀ and is denoted by f′(x₀).

Therefore, the derivative is nothing but the limit of the difference quotient, when it exists and is finite. Other notations used for the derivative at x₀ are

Df(x₀)   and   (df/dx)(x₀)

The notation f′(x₀), which we will mostly use, is probably the most convenient; we will sometimes also use the other two notations, whenever convenient.²
Note the double requirement that the limit exist and be finite: if at a point the limit of the difference quotient (26.5) exists but is infinite, the function does not have a derivative at that point (see Example 1213).
A few remarks are in order. (i) Differential calculus, of which derivatives are a first key notion, originated in the works of Leibniz and Newton in the second part of the seventeenth century. Newton was motivated by physics, which indeed features a classic example of a derivative: let t be the time and s the distance covered by a moving object. Suppose the function s(t) indicates the total distance covered until time t. The difference quotient Δs/Δt is the average velocity over a time interval of length Δt. Therefore, its derivative at a point t₀ can be interpreted as the instantaneous velocity at t₀. If space is measured in kilometers and time in hours, the velocity is measured in km/h, that is, in "kilometers per hour" (as speedometers do).
(ii) In applications, the dependent and independent variables y and x that appear in a function y = f(x) take a concrete meaning and are both evaluated in terms of a unit of measure (€, $, kg, liters, years, miles, parsecs, etc.): if we denote by T the unit of measure of the dependent variable y and by S that of the independent variable x, the difference quotient Δy/Δx (and so the derivative, if it exists) is then expressed in the unit of measure T/S. For instance, if in the initial example the cost is expressed in euros and the quantity produced in quintals, the difference quotient (26.1) is expressed in €/q, that is, in "euros per quintal".

¹ Since the domain (a, b) is an open interval, for h sufficiently small we have x + h ∈ (a, b).
² Different notations for the same mathematical object can be convenient in different contexts. For this reason, it may be important to have several notations at hand (provided they are then used consistently).
(iii) The notation df/dx (or the equivalent dy/dx) is meant to suggest that the derivative is a limit of ratios.³ Note, however, that df/dx is only a symbol, not a true ratio: indeed, it is the limit of ratios. Nevertheless, heuristically it is often treated as a true ratio (see, for example, the remark on the chain rule at the end of Section 26.9). This can be a useful trick to help our intuition, as long as whatever is found in this way is then checked formally.
(iv) The terminology "derivable at" is not so common, but its motivation will become apparent in Section 26.12.2. In any case, a function f : (a, b) → R which is derivable at each point of (a, b) is called derivable, without any further qualification.
26.3 Geometric interpretation

[Figure: the secant line through the points (x₀, f(x₀)) and (x₀ + h, f(x₀ + h)) on the graph of f over (a, b)]
y = f(x₀) + ((f(x₀ + h) − f(x₀)) / h) (x − x₀)   (26.6)

³ This notation is due to Leibniz, while the f′ notation is due to Lagrange.
which is the equation of the sought-after straight line passing through the points (x₀, f(x₀)) and (x₀ + h, f(x₀ + h)). Taking the limit as h → 0, we get

y = f(x₀) + f′(x₀)(x − x₀)

that is, the equation of the straight line which is tangent to the graph of f at the point (x₀, f(x₀)) ∈ Gr f.
As h tends to 0, the straight line (26.6) thus tends to the tangent (straight) line, whose slope is the derivative f′(x₀). The graph of the tangent line is:
[Figure: the tangent line to the graph of f at the point (x₀, f(x₀))]
In sum, geometrically the derivative can be regarded as the slope of the tangent line at the point (x₀, f(x₀)). In turn, the tangent line can be regarded as a local approximation of the function f at x₀, a key observation that will be developed through the fundamental notion of differential (Section 26.12).
Example 1210 Let f : R → R be given by f(x) = x² − 1. For every x ∈ R we have

f′(x) = lim_{h→0} ((x + h)² − 1 − (x² − 1)) / h = lim_{h→0} (2xh + h²) / h = 2x

The derivative exists at each x ∈ R and is given by 2x. For example, the derivative at x = 1 is f′(1) = 2, with tangent line

y = f(1) + f′(1)(x − 1) = 2x − 2
[Figure: the graph of f(x) = x² − 1 with its tangent line at x = 1]

[Figure: the graph of f(x) = x² − 1 with its tangent line at x = 0]

At x = 0 we have f′(0) = 0: in this case the tangent line is horizontal (constant) and is always equal to −1. N
Example 1211 Consider a constant function f : R → R, that is, f(x) = k for every x ∈ R. For every h ≠ 0 we have

(f(x + h) − f(x)) / h = (k − k) / h = 0

and therefore f′(x) = 0 for every x ∈ R. The derivative of a constant function is zero. N
Example 1212 Let f : R → R be defined by f(x) = 1/x if x ≠ 0 and f(0) = 0, with graph:

[Figure: the graph of f(x) = 1/x]
At a point x ≠ 0 we have

f′(x) = lim_{h→0} (f(x + h) − f(x)) / h = lim_{h→0} (1/(x + h) − 1/x) / h = lim_{h→0} (x − (x + h)) / (hx(x + h)) = lim_{h→0} (−h) / (hx(x + h)) = lim_{h→0} (−1) / (x(x + h)) = −1/x²

The derivative exists at each x ≠ 0 and is given by −x⁻². For example, the derivative at x = 1 is f′(1) = −1, and at x = −2 it is f′(−2) = −1/4.
If we consider the origin x = 0 we have, for h ≠ 0,

(f(0 + h) − f(0)) / h = (1/h − 0) / h = 1/h²

so that

lim_{h→0} (f(0 + h) − f(0)) / h = +∞

The limit is not finite and hence the function does not have a derivative at x = 0. Recall that the function is not continuous at this point (Example 554). N
Example 1213 Let f : R → R be defined by

f(x) = √x if x ≥ 0,  and  f(x) = −√(−x) if x < 0
with graph:

[Figure: the graph of f, with a vertical tangent at the origin]
Definition 1214 Let f : (a, b) → R be a function with domain of derivability D ⊆ (a, b). The function f′ : D → R that to each x ∈ D associates the derivative f′(x) is called the derivative function of f.
The derivative function f′ describes the derivative of f at the different points where it exists, thus describing its overall behavior. In the examples previously discussed:
(iii) for f(x) = 1/x = x⁻¹, the derivative function f′ : R∖{0} → R is given by f′(x) = −x⁻².
The notion of derivative function allows us to frame in a bigger picture the computations that we did in the examples of the last section: to compute the derivative of a function f at a generic point x of the domain amounts to computing its derivative function f′. When we found that the derivative of f(x) = x² − 1 is, at any point x ∈ R, given by 2x, we actually found that its derivative function f′ : R → R is given by f′(x) = 2x.
Example 1215 Let r : R₊ → R be the return function and c : R₊ → R the cost function of a producer (see Section 22.1.4). The derivative function r′ : D ⊆ R₊ → R is called the marginal return function, and the derivative function c′ : D ⊆ R₊ → R is called the marginal cost function. Their economic interpretation should be, by now, clear. N
26.5 One-sided derivatives

Definition 1216 A function f : (a, b) → R is said to be derivable from the right at the point x₀ ∈ (a, b) if the one-sided limit

lim_{h→0⁺} (f(x₀ + h) − f(x₀)) / h   (26.8)

exists and is finite, and to be derivable from the left at x₀ ∈ (a, b) if the one-sided limit

lim_{h→0⁻} (f(x₀ + h) − f(x₀)) / h   (26.9)

exists and is finite.
When it exists and is finite, the limit (26.8) is called the right derivative of f at x₀, and it is denoted by f′₊(x₀). Analogously, when it exists and is finite, the limit (26.9) is called the left derivative of f at x₀, and it is denoted by f′₋(x₀). Since two-sided limits exist if and only if both one-sided limits exist and coincide (Proposition 521), f is derivable at x₀ if and only if it is derivable from the right and from the left at x₀ with f′₊(x₀) = f′₋(x₀).
Through unilateral derivatives we can classify two important classes of points where derivability fails. Specifically, a point x₀ of the domain of f is called:

(i) a corner point if the right derivative and the left derivative exist but are different, i.e., f′₊(x₀) ≠ f′₋(x₀);

(ii) a cuspidal point (or a cusp) if the right and left limits of the difference quotient are infinite with different signs:

lim_{h→0⁺} (f(x₀ + h) − f(x₀)) / h = ±∞   and   lim_{h→0⁻} (f(x₀ + h) − f(x₀)) / h = ∓∞
Example 1219 Consider the absolute value function f(x) = |x|, with graph:

[Figure: the graph of f(x) = |x|, with a corner at the origin]
At x₀ = 0 we have

(f(x₀ + h) − f(x₀)) / h = |h|/h = 1 if h > 0, and −1 if h < 0

The two-sided limit of the difference quotient does not exist at 0, so the function is not derivable at 0. Nevertheless, at 0 the one-sided derivatives exist. In particular,

f′₊(0) = lim_{h→0⁺} (f(0 + h) − f(0)) / h = 1 ;  f′₋(0) = lim_{h→0⁻} (f(0 + h) − f(0)) / h = −1

The origin x₀ = 0 is, therefore, a corner point. The reader can check that the function is derivable at each point x ≠ 0, with

f′(x) = 1 if x > 0, and −1 if x < 0
Example 1220 The function f : R → R defined by

f(x) = √x if x ≥ 0,  and  f(x) = √(−x) if x < 0

that is, f(x) = √|x|,
has a cuspidal point at the origin x = 0, as we can see from its graph:

[Figure: the graph of f(x) = √|x|, with a cusp at the origin]
We close by noting that the right and left derivative functions are defined in the same way, mutatis mutandis, as the derivative function. In Example 1219, the one-sided derivative functions f′₊ : R → R and f′₋ : R → R are given by

f′₊(x) = 1 if x ≥ 0, and −1 if x < 0   and   f′₋(x) = 1 if x > 0, and −1 if x ≤ 0
26.6 Derivability and continuity

Proposition 1221 If f : (a, b) → R is derivable at x₀ ∈ (a, b), then f is continuous at x₀.

Proof We have to prove that lim_{x→x₀} f(x) = f(x₀). Since f is derivable at x₀, the limit of the difference quotient exists, is finite, and equals f′(x₀):

lim_{h→0} (f(x₀ + h) − f(x₀)) / h = f′(x₀)

Let us rewrite the limit by setting x = x₀ + h, so that h = x − x₀. Observing that, as h tends to 0, x tends to x₀, we get:

lim_{x→x₀} (f(x) − f(x₀)) / (x − x₀) = f′(x₀)

Therefore, by the algebra of limits (Proposition 333) we have:

lim_{x→x₀} (f(x) − f(x₀)) = lim_{x→x₀} [(f(x) − f(x₀)) / (x − x₀)] (x − x₀) = lim_{x→x₀} (f(x) − f(x₀)) / (x − x₀) · lim_{x→x₀} (x − x₀) = f′(x₀) · 0 = 0
where the last equality holds since f′(x₀) exists and is finite. We have thus proved that lim_{x→x₀} (f(x) − f(x₀)) = 0. On the other hand, again by the algebra of limits, we have:

0 = lim_{x→x₀} (f(x) − f(x₀)) = lim_{x→x₀} f(x) − lim_{x→x₀} f(x₀) = lim_{x→x₀} f(x) − f(x₀)

and so lim_{x→x₀} f(x) = f(x₀), as desired.
Derivability at a point thus implies continuity at that point. The converse is false: the absolute value function f(x) = |x| is continuous at x = 0 but is not derivable at that point (Example 1219). In other words, continuity is a necessary, but not sufficient, condition for derivability.⁴
Proposition 1221, and the examples seen until now, allow us to identify five possible causes of non-derivability at a point x. Among them:

(iv) f has at x a one-sided derivative on one side while, on the other side, the limit of the difference quotient is +∞ or −∞; for example, the function

f(x) = √x if x ≥ 0,  and  f(x) = x if x < 0

similar to the one seen in Example 1213, has a vertical tangent from the right at x = 0 because lim_{h→0⁺} f(h)/h = +∞.
The five cases just identified are, however, not exhaustive: there are other sources of non-derivability. For example, the function

f(x) = x sin(1/x) if x ≠ 0,  and  f(0) = 0

is continuous everywhere.⁵ At the origin x₀ = 0 it is, however, not derivable because the limit

lim_{h→0} (f(x₀ + h) − f(x₀)) / h = lim_{h→0} (h sin(1/h) − 0) / h = lim_{h→0} sin(1/h)
⁴ In the coda we say more on this important issue.
⁵ Indeed, lim_{x→0} x sin(1/x) = 0 because |sin(1/x)| ≤ 1 and so −x ≤ x sin(1/x) ≤ x.
does not exist. The origin is not a corner point and there is no vertical tangent at this point. The lack of derivability here is due to the fact that f has, in any neighborhood of the origin, infinitely many oscillations, which are such that the difference quotient sin(1/h) oscillates infinitely many times between −1 and 1. Note that in this example the one-sided derivatives f′₊(0) and f′₋(0) do not exist either.
We close with an important piece of terminology, often used in the rest of the book.

Terminology When f is derivable at all the interior points of (a, b) and is one-sided derivable at the endpoints a and b, we say that it is derivable on the closed interval [a, b]. It is immediate to see that f is then also continuous on such an interval.

The next proposition clarifies the import of this piece of terminology by showing that a function is derivable on a closed interval if and only if it is the restriction to such an interval of a function which is derivable on the entire real line. So, one can always regard a function derivable on a closed interval as a restriction of a function derivable everywhere.
Proof We prove only the "only if", the converse being obvious. So, let f : [a, b] → R be derivable on [a, b], that is, derivable at all the interior points of (a, b) and one-sided derivable at the endpoints a and b. Define g : R → R by

g(x) = f(a) + f′₊(a)(x − a) if x < a ;  g(x) = f(x) if x ∈ [a, b] ;  g(x) = f(b) + f′₋(b)(x − b) if x > b

Clearly, f is the restriction of g to [a, b], i.e., g(x) = f(x) for all x ∈ [a, b]. Next we show that g is derivable on R. It is easily seen to be derivable on (−∞, a) ∪ (a, b) ∪ (b, ∞). Indeed, g′(x) = f′(x) for all x ∈ (a, b), g′(x) = f′₊(a) for all x < a, and g′(x) = f′₋(b) for all x > b. The only delicate points are x = a and x = b. By hypothesis, we have g′₊(a) = f′₊(a) and g′₋(b) = f′₋(b). At the same time, g′₋(a) = f′₊(a) and g′₊(b) = f′₋(b) because g is affine, with slopes f′₊(a) and f′₋(b), to the left of a and to the right of b, respectively. Hence g is derivable at a and b as well.
26.7 Derivatives of elementary functions

The power function f : R → R given by f(x) = xⁿ, with n ∈ N, is derivable at each x ∈ R, with derivative function

f′(x) = nxⁿ⁻¹   (26.10)
For example, the function f(x) = x⁵ has derivative function f′(x) = 5x⁴ and the function f(x) = x³ has derivative function f′(x) = 3x².
We give two proofs of this basic result.
Proof 1 By Newton's binomial formula,

f′(x) = lim_{h→0} (f(x + h) − f(x)) / h = lim_{h→0} ((x + h)ⁿ − xⁿ) / h
      = lim_{h→0} (Σₖ₌₀ⁿ [n! / (k!(n − k)!)] xⁿ⁻ᵏhᵏ − xⁿ) / h
      = lim_{h→0} (xⁿ + nxⁿ⁻¹h + (n(n − 1)/2)xⁿ⁻²h² + ⋯ + nxhⁿ⁻¹ + hⁿ − xⁿ) / h
      = lim_{h→0} (nxⁿ⁻¹ + (n(n − 1)/2)xⁿ⁻²h + ⋯ + nxhⁿ⁻² + hⁿ⁻¹)
      = nxⁿ⁻¹

as claimed.
Proof 2 We establish (26.10) by induction, using the derivative of the product of functions (see Section 26.8). First, we show that the derivative of the function f(x) = x is equal to 1. The limit of the difference quotient of f is

lim_{h→0} (f(x + h) − f(x)) / h = lim_{h→0} (x + h − x) / h = lim_{h→0} h/h = 1

Therefore f′(x) = 1, so (26.10) holds for n = 1. Suppose that (26.10) holds for n − 1 (induction hypothesis), that is,

D(xⁿ⁻¹) = (n − 1)xⁿ⁻²

Consider the function xⁿ = x · xⁿ⁻¹. Using the derivative of the product of functions (see (26.13) below) and the induction hypothesis, we have

D(xⁿ) = 1 · xⁿ⁻¹ + x · D(xⁿ⁻¹) = xⁿ⁻¹ + x(n − 1)xⁿ⁻² = (1 + n − 1)xⁿ⁻¹ = nxⁿ⁻¹
The exponential function f : R → R given by f(x) = αˣ, with α > 0, is derivable at each x ∈ R, with derivative function

f′(x) = αˣ log α

In particular, deˣ/dx = eˣ, that is, the derivative function of the exponential function is the exponential function itself. So, the exponential function equals its derivative function, a truly remarkable invariance property that gives the exponential function a special status in differential calculus.
Proof We have

f′(x) = lim_{h→0} (f(x + h) − f(x)) / h = lim_{h→0} (αˣ⁺ʰ − αˣ) / h = lim_{h→0} αˣ(αʰ − 1) / h = αˣ lim_{h→0} (αʰ − 1) / h = αˣ log α

where the last equality follows from the basic limit (12.34).
The sine function f : R → R given by f(x) = sin x is derivable at each x ∈ R, with derivative function

f′(x) = cos x
Proof From the basic trigonometric formula sin(a + b) = sin a cos b + cos a sin b, it follows that

f′(x) = lim_{h→0} (f(x + h) − f(x)) / h = lim_{h→0} (sin(x + h) − sin x) / h
      = lim_{h→0} (sin x cos h + cos x sin h − sin x) / h
      = lim_{h→0} (sin x (cos h − 1) + cos x sin h) / h
      = sin x lim_{h→0} (cos h − 1) / h + cos x lim_{h→0} (sin h) / h = cos x

The last equality follows from the basic limits (12.32) and (12.31) for cos x and sin x, respectively.
In a similar way it is possible to prove that the function f : R → R given by f(x) = cos x is derivable at each x ∈ R, with derivative function f′ : R → R given by

f′(x) = −sin x

26.8 Algebra of derivatives
Proposition 1226 Let f, g : (a, b) → R be two functions derivable at x ∈ (a, b). The sum function f + g : (a, b) → R is derivable at x, with

(f + g)′(x) = f′(x) + g′(x)   (26.11)

The result actually holds more generally: for any linear combination αf + βg : (a, b) → R, with α, β ∈ R, we have

(αf + βg)′(x) = αf′(x) + βg′(x)   (26.12)

In particular, the derivative of αf(x) is αf′(x).
Proof We prove the result directly in the more general form (26.12). We have

(αf + βg)′(x) = lim_{h→0} ((αf + βg)(x + h) − (αf + βg)(x)) / h
= lim_{h→0} (αf(x + h) + βg(x + h) − αf(x) − βg(x)) / h
= lim_{h→0} [α (f(x + h) − f(x)) / h + β (g(x + h) − g(x)) / h]
= α lim_{h→0} (f(x + h) − f(x)) / h + β lim_{h→0} (g(x + h) − g(x)) / h
= αf′(x) + βg′(x)

as desired.
Thus, the sum behaves in a simple manner with respect to derivatives: the "derivative of a sum" is the "sum of the derivatives".⁶ More subtle is the case of the product of functions.

Proposition 1227 Let f, g : (a, b) → R be two functions derivable at x ∈ (a, b). The product function fg : (a, b) → R is derivable at x, with

(fg)′(x) = f′(x)g(x) + f(x)g′(x)   (26.13)
Proof We have

(fg)′(x) = lim_{h→0} ((fg)(x + h) − (fg)(x)) / h = lim_{h→0} (f(x + h)g(x + h) − f(x)g(x)) / h
= lim_{h→0} (f(x + h)g(x + h) − f(x)g(x + h) + f(x)g(x + h) − f(x)g(x)) / h
= lim_{h→0} [g(x + h)(f(x + h) − f(x)) / h + f(x)(g(x + h) − g(x)) / h]
= lim_{h→0} g(x + h) · lim_{h→0} (f(x + h) − f(x)) / h + f(x) · lim_{h→0} (g(x + h) − g(x)) / h
= g(x)f′(x) + f(x)g′(x)
⁶ The converse does not hold: if the sum of two functions has a derivative, it is not necessarily true that the individual functions have a derivative (for example, the origin is a corner point of both f(x) = |x| and g(x) = −|x|, but the sum f + g is a constant function that has a derivative at every point of the real line). The same is true for the multiplication and division operations on functions.
as desired. In the last step we used lim_{h→0} g(x + h) = g(x), which holds thanks to the continuity of g, in turn ensured by its derivability.
The derivative of the product, therefore, is not the product of the derivatives, but is given by the more subtle product rule (26.13). A similar rule, the so-called quotient rule, holds mutatis mutandis for the quotient.
Proposition 1228 Let f, g : (a, b) → R be two functions derivable at x ∈ (a, b), with g(x) ≠ 0. The quotient function f/g : (a, b) → R is derivable at x, with

(f/g)′(x) = (f′(x)g(x) − f(x)g′(x)) / g(x)²   (26.14)
Proof We start with the case in which f is constant and equal to 1. We have

(1/g)′(x) = lim_{h→0} (1/g(x + h) − 1/g(x)) / h = lim_{h→0} (g(x) − g(x + h)) / (g(x)g(x + h)h)
= −(1/g(x)) lim_{h→0} (g(x + h) − g(x)) / h · lim_{h→0} 1/g(x + h) = −g′(x) / g(x)²

Now consider any f : (a, b) → R. Thanks to (26.13), we have

(f/g)′(x) = (f · (1/g))′(x) = f′(x)(1/g)(x) + f(x)(1/g)′(x)
= f′(x)/g(x) − f(x)g′(x)/g(x)² = (f′(x)g(x) − f(x)g′(x)) / g(x)²

as desired.
Example 1229 (i) Let f, g : R → R be given by f(x) = x³ and g(x) = sin x. We have f′(x) = 3x² and g′(x) = cos x, and

(fg)′(x) = 3x² sin x + x³ cos x   ∀x ∈ R

as well as

(f/g)′(x) = (3x² sin x − x³ cos x) / sin² x   ∀x ∈ R∖{nπ : n ∈ Z}

In the last formula {nπ : n ∈ Z} is the set of the points {..., −2π, −π, 0, π, 2π, ...} where the function g(x) = sin x in the denominator is zero.
(ii) Let f : R → R be given by f(x) = tan x. Since tan x = sin x / cos x, we have

f′(x) = 1 + tan² x = 1 / cos² x
Consider a derivable cost function c. The corresponding average cost function cₘ : (0, ∞) → R is defined by

cₘ(x) = c(x) / x
By the quotient rule, we have

c′ₘ(x) = (xc′(x) − c(x)) / x² = (c′(x) − c(x)/x) / x = (c′(x) − cₘ(x)) / x

Since x > 0, we have

c′ₘ(x) ≥ 0 ⟺ c′(x) ≥ cₘ(x)   (26.15)

Therefore, at a point x the variation in average costs is positive if and only if marginal costs are larger than average costs. In other words, average costs keep increasing as long as they are lower than marginal costs (cf. the numerical examples with which we began the chapter).
More generally, the same reasoning holds for each function f : [0, ∞) → R that represents, as x ≥ 0 varies, an economic "quantity": return, profit, etc. The function fₘ : (0, ∞) → R defined by

fₘ(x) = f(x) / x

is the corresponding "average quantity" (average return, average profit, etc.), while the derivative function f′(x) represents the "marginal quantity" (marginal return, marginal profit, etc.). At each x > 0, the value f′(x) can be interpreted geometrically as the slope of the tangent line to f at x, while fₘ(x) is the slope of the straight line passing through the origin and the point (x, f(x)).
[Figure: two panels showing the graph of f together with the tangent line of slope f′(x) and the ray through the origin of slope f(x)/x]
Geometrically, (26.15) says that the variation of the average f_m is positive at a point x > 0, that is, f_m'(x) ≥ 0, as long as the slope of the tangent line is larger than that of the straight line passing through the origin and the point (x, f(x)), that is, f'(x) ≥ f_m(x). N
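Relation (26.15) is easy to see numerically. In the following sketch the cost function c(x) = x² + 4 is an illustrative choice (it is not one of the chapter's opening examples): its average cost x + 4/x falls while it exceeds the marginal cost 2x, and rises once it is below it.

    # average cost c(x)/x versus marginal cost c'(x) for c(x) = x^2 + 4
    for x in (0.5, 1.0, 2.0, 3.0, 4.0):
        marginal = 2 * x
        average = (x**2 + 4) / x
        trend = "average rising" if marginal > average else "average falling"
        print(x, marginal, average, trend)   # the two curves cross at x = 2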
Proposition 1230 Let f : (a, b) → R and g : (c, d) → R be two functions with Im f ⊆ (c, d). If f is derivable at x ∈ (a, b) and g is derivable at f(x), then the composite function g ∘ f : (a, b) → R is derivable at x, with
\[ (g \circ f)'(x) = g'(f(x))\, f'(x) \]
Thus, the chain rule features the product of the derivatives g' and f', where g' has as its argument the image f(x). Before proving it, we provide a simple heuristic argument. For h small enough, we have
\[ \frac{g(f(x+h)) - g(f(x))}{h} = \frac{g(f(x+h)) - g(f(x))}{f(x+h) - f(x)} \cdot \frac{f(x+h) - f(x)}{h} \]
If h → 0, then
\[ \lim_{h\to 0} \frac{g(f(x+h)) - g(f(x))}{h} = \lim_{h\to 0} \frac{g(f(x+h)) - g(f(x))}{f(x+h)-f(x)} \cdot \lim_{h\to 0} \frac{f(x+h) - f(x)}{h} = g'(f(x))\, f'(x) \]
Note that we tacitly assumed that the denominator f(x+h) − f(x) is always different from zero, something that the hypotheses of the theorem do not guarantee. For this reason, we need a rigorous proof, based on a first-order expansion (26.17) of g at f(x),
\[ g(f(x) + k) - g(f(x)) = \left[ g'(f(x)) + o(1) \right] k \]
which holds also for k = 0. Choose h small enough and set k = f(x+h) − f(x). Since f is derivable at x, f is continuous at x, so k → 0 as h → 0. By (26.17), we have
\[ g(f(x+h)) - g(f(x)) = \left[ g'(f(x)) + o(1) \right] \left( f(x+h) - f(x) \right) \]
It follows that
\[ \frac{g(f(x+h)) - g(f(x))}{h} = \left[ g'(f(x)) + o(1) \right] \frac{f(x+h) - f(x)}{h} \to g'(f(x))\, f'(x) \]
proving the statement.
and
\[ (f \circ g)'(x) = f'(g(x))\, g'(x) = 3 \sin^2 x \cos x \]

Example 1232 Let f : (a, b) → R be any function derivable at each x ∈ (a, b) and let g(x) = eˣ. We have
\[ (g \circ f)'(x) = g'(f(x))\, f'(x) = e^{f(x)} f'(x) \tag{26.18} \]
For example, if f(x) = x⁴, then (g ∘ f)(x) = e^{x⁴} and (26.18) becomes (g ∘ f)'(x) = 4x³ e^{x⁴}. N
The chain rule is very useful to compute the derivative of a function that can be written
as a composition of other functions.
Example 1233 Let φ : R → R be given by φ(x) = sin³(9x + 1). To calculate φ'(x) it is useful to write φ as
\[ \varphi = f \circ g \circ h \tag{26.19} \]
where f(x) = x³, g(x) = sin x, and h(x) = 9x + 1. By the chain rule,
\[
\varphi'(x) = f'((g \circ h)(x))\,(g \circ h)'(x) = f'((g \circ h)(x))\,g'(h(x))\,h'(x) = 3\sin^2(9x+1)\cdot\cos(9x+1)\cdot 9 = 27 \sin^2(9x+1)\cos(9x+1)
\]
Expressing the function φ as in (26.19) thus simplifies the computation of its derivative. N
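As a numerical counterpart of Example 1233 – a sketch, with an arbitrary test point – the closed-form derivative of φ agrees with a central difference quotient:

    import math

    phi = lambda x: math.sin(9 * x + 1) ** 3
    dphi = lambda x: 27 * math.sin(9 * x + 1) ** 2 * math.cos(9 * x + 1)

    x, h = 0.4, 1e-6
    numeric = (phi(x + h) - phi(x - h)) / (2 * h)   # difference quotient
    print(numeric, dphi(x))                         # agree to ~1e-7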
O.R. If we write z = f(x) and y = g(z), we clearly have y = g(f(x)). What we have proved can be summarized by stating that
\[ \frac{dy}{dx} = \frac{dy}{dz}\,\frac{dz}{dx} \]

O.R. The chain rule has an onion flavor because the derivative of a composite function is obtained by successively "peeling" the function from the outside:
H
\[ \left( f^{-1} \right)'(y_0) = \frac{1}{f'(x_0)} \tag{26.20} \]
In short, the derivative of the inverse function of f, at y₀, is the reciprocal of the derivative of f, at x₀. The geometric intuition of this result should be clear once one remembers that the graph of the inverse function is the mirror image of the graph of the function with respect to the 45-degree line (see Section 6.4.2).

It would be nice to invoke the chain rule and say that, from y₀ = f(f⁻¹(y₀)), it follows that 1 = f'(f⁻¹(y₀)) (f⁻¹)'(y₀), so that 1 = f'(x₀)(f⁻¹)'(y₀), which is formula (26.20). Unfortunately, we cannot use the chain rule because we are not sure (yet) that f⁻¹ is derivable: indeed, this is what we first need to prove in this theorem.
Proof Set f(x₀ + h) = y₀ + k and observe that, by the continuity of f, when h → 0, also k → 0. By the definition of inverse function, x₀ = f⁻¹(y₀) and x₀ + h = f⁻¹(y₀ + k). Therefore, h = f⁻¹(y₀ + k) − f⁻¹(y₀). By hypothesis, there exists
\[ \lim_{h\to 0} \frac{f(x_0+h) - f(x_0)}{h} = f'(x_0) \]
But
\[
\frac{f(x_0+h) - f(x_0)}{h} = \frac{y_0 + k - y_0}{f^{-1}(y_0+k) - f^{-1}(y_0)} = \frac{k}{f^{-1}(y_0+k) - f^{-1}(y_0)}
\]
Therefore, provided f'(x₀) ≠ 0, the limit of the ratio
\[ \frac{f^{-1}(y_0+k) - f^{-1}(y_0)}{k} \]
as k → 0 also exists, and it is the reciprocal of the previous one, i.e., \((f^{-1})'(y_0) = 1/f'(x_0)\).
The derivative of the inverse function is thus given by a unit fraction in which the denominator is the derivative f' evaluated at the preimage f⁻¹(y), that is,
\[ \left( f^{-1} \right)'(y) = \frac{1}{f'(x)} = \frac{1}{f'(f^{-1}(y))} \]
This example, along with the chain rule, yields the important formula
\[ \frac{d \log f(x)}{dx} = \frac{f'(x)}{f(x)} \]
for strictly positive derivable functions f. It is the logarithmic version of (26.18).
The last example, again along with the chain rule, also leads to an important generalization of Proposition 1223: the power function f(x) = xᵃ, with a ∈ R, is derivable at each x > 0, with
\[ f'(x) = a x^{a-1} \]
Proof We have
\[ x^a = e^{\log x^a} = e^{a \log x} \tag{26.21} \]
Setting f(x) = eˣ and g(x) = a log x, from (26.21) it follows that
\[ \frac{d(x^a)}{dx} = f'(g(x))\, g'(x) = e^{a \log x}\,\frac{a}{x} = x^a\,\frac{a}{x} = a x^{a-1} \]
as desired.
Condition f'(x₀) ≠ 0 cannot be omitted in Theorem 1234. Indeed, when this condition fails, anything goes: the inverse may or may not be derivable at y₀ = f(x₀), something that one has to check directly. The next example illustrates.

Example 1239 Define f, g : R → R by f(x) = x³ and g(x) = x^{1/3}. We have:
O.R. Denoting by y = f(x) a function and by x = f⁻¹(y) its inverse, we can summarize what we have seen by writing
\[ \frac{dx}{dy} = \frac{1}{\dfrac{dy}{dx}} \]
Again the symbol d·/d· behaves like a true ratio, a further proof of its Pinocchio nature. H
We relegate to an example the derivative of a function with variable base and exponent. Let f and g be derivable, with f strictly positive, and set F(x) = f(x)^{g(x)}. Since
\[ F(x) = e^{\log f(x)^{g(x)}} = e^{g(x) \log f(x)} \]
the chain rule yields
\[ F'(x) = e^{g(x) \log f(x)}\, D\left[ g(x) \log f(x) \right] = F(x) \left[ g'(x) \log f(x) + g(x)\,\frac{f'(x)}{f(x)} \right] \]
For instance, the derivative of F(x) = xˣ is
\[ \frac{dx^x}{dx} = x^x \left( \log x + x\,\frac{1}{x} \right) = x^x (1 + \log x) \]
while the derivative of F(x) = x^{x²} is
\[ \frac{dx^{x^2}}{dx} = x^{x^2} \left( 2x \log x + x^2\,\frac{1}{x} \right) = x^{x^2+1}(1 + 2\log x) \]
The reader can try to calculate the derivative of F(x) = x^{xˣ}. N
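These variable-base, variable-exponent derivatives are easy to confirm symbolically. A sketch using sympy (assumed available); each printed difference should be 0:

    import sympy as sp

    x = sp.symbols('x', positive=True)
    # d/dx x^x = x^x (1 + log x)
    print(sp.simplify(sp.diff(x**x, x) - x**x * (1 + sp.log(x))))
    # d/dx x^(x^2) = x^(x^2 + 1) (1 + 2 log x)
    print(sp.simplify(sp.diff(x**(x**2), x)
                      - x**(x**2 + 1) * (1 + 2 * sp.log(x))))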
26.11 Formulary

The chain rule makes it possible to broaden considerably the scope of the results on the derivatives of elementary functions seen in Section 26.7. In Example 1232 we already saw how to calculate the derivative of a generic function e^{f(x)}, which is much more general than the exponential eˣ of Proposition 1224.

In a similar way it is possible to generalize all the results on the derivatives of elementary functions seen until now. We summarize all this in two tables: the first one lists the derivatives of the elementary functions, while the second one contains their generalizations, obtained through the chain rule.
    f            f'                              Reference
    k            0                               Example 1211
    x^a          a x^{a-1}                       Proposition 1236
    e^x          e^x                             Proposition 1224
    α^x          α^x log α                       Proposition 1224
    log x        1/x                             Example 1235
    log_a x      1/(x log a)                     Exercise for the reader
    sin x        cos x                           Proposition 1225
    cos x        -sin x                          Observation 26.11
    tan x        1/cos² x = 1 + tan² x           Example 1229
    cotan x      -1/sin² x = -(1 + cotan² x)     Exercise for the reader
    arcsin x     1/√(1 - x²)                     Example 1237
    arccos x     -1/√(1 - x²)                    Exercise for the reader
    arctan x     1/(1 + x²)                      Example 1238
    arccotan x   -1/(1 + x²)                     Exercise for the reader
Given their importance in so many contexts, it is useful to memorize the previous table, just as one learned the multiplication tables by heart as a child. Let us now see its general version, obtained through the chain rule. In the next table, f stands for the elementary functions of the previous table, while g is any derivable function with image A; most of the derivatives that arise in applications can be computed by combining its rows.
    f ∘ g           (f ∘ g)'                                   Image of g
    g(x)^a          a g(x)^{a-1} g'(x)                         A ⊆ R
    e^{g(x)}        g'(x) e^{g(x)}                             A ⊆ R
    α^{g(x)}        g'(x) α^{g(x)} log α                       A ⊆ R
    log g(x)        g'(x)/g(x)                                 A ⊆ (0, ∞)
    log_a g(x)      g'(x)/(g(x) log a)                         A ⊆ (0, ∞)
    sin g(x)        g'(x) cos g(x)                             A ⊆ R
    cos g(x)        -g'(x) sin g(x)                            A ⊆ R
    tan g(x)        g'(x)/cos² g(x) = g'(x)(1 + tan² g(x))     A ⊆ R
    arcsin g(x)     g'(x)/√(1 - g²(x))                         A ⊆ [-1, 1]
    arccos g(x)     -g'(x)/√(1 - g²(x))                        A ⊆ [-1, 1]
    arctan g(x)     g'(x)/(1 + g²(x))                          A ⊆ R
26.12.1 Differential

A fundamental question is whether it is possible to approximate a function f : (a, b) → R locally – that is, in a neighborhood of a given point of its domain – by an affine function, namely, by a straight line (recall Proposition 820). If this were possible, we could locally approximate the function – however complicated – by the simplest function: a straight line.
To make this idea precise, given a function f : (a, b) → R and a point x₀ ∈ (a, b), suppose that there exists an affine function r : R → R that approximates f at x₀ in the sense that
\[ f(x_0 + h) = r(x_0 + h) + o(h) \quad \text{as } h \to 0 \tag{26.22} \]
Two requirements make r an adequate approximation of f at x₀. First, the straight line must coincide with f at x₀, that is, f(x₀) = r(x₀): at the point x₀ the approximation must be exact, without any error. Second, and most important, the approximation error
\[ f(x_0+h) - r(x_0+h) \]
at x₀ + h is o(h), that is, as x₀ + h approaches x₀, the error goes to zero faster than h: the approximation is (locally) "very good".

Since the straight line r can be written as r(x) = mx + q, the condition f(x₀) = r(x₀) implies
\[ r(x_0+h) = m(x_0+h) + q = mh + mx_0 + q = mh + f(x_0) \]
Denote by l : R → R the linear function defined by l(h) = mh, which geometrically is a straight line passing through the origin. The approximation condition (26.22) can be equivalently written as
\[ f(x_0+h) - f(x_0) = l(h) + o(h) \tag{26.23} \]
This expression (26.23) emphasizes the linearity of the approximation l(h) of the difference f(x₀+h) − f(x₀), as well as the goodness of this approximation: the difference f(x₀+h) − f(x₀) − l(h) is o(h). This emphasis is important and motivates the following definition.
for every h ∈ (a − x₀, b − x₀). In other words, the definition requires that there exists a number m ∈ R, independent of h but dependent on x₀, such that
\[ f(x_0 + h) - f(x_0) = mh + o(h) \]
O.R. Differentiability says that a function can be well approximated by a straight line – that is, by the simplest type of function – at least near the point of interest. The approximation is good in the close proximity of the point but, as we move away from it, its quality in general deteriorates rapidly. Such an approximation, even if rough, conveys at least two valuable pieces of information:

(i) its mere existence ensures that the function is well behaved (it is continuous);

(ii) it reveals whether the function goes up or down and, with its slope, it tells us approximately what the rate of change of the function is at the point studied.

These two pieces of information are often useful in applications. Chapter 29 will study these issues in more depth and will present sharper local approximations. H
The differential at a point can thus be written in terms of the derivative at that point. Inter alia, this also shows the uniqueness of the differential df(x₀). Indeed, the linear function l : R → R is a straight line passing through the origin, so there exists m ∈ R such that l(h) = mh. Hence
\[ df(x_0)(h) = f'(x_0)\, h \]
Differentiability and derivability are, therefore, equivalent notions for scalar functions. When they hold, we have, as h → 0,
\[ f(x_0 + h) = f(x_0) + f'(x_0) h + o(h) \]
or, equivalently, as x → x₀,
\[ f(x) = f(x_0) + f'(x_0)(x - x_0) + o(x - x_0) \]
Here
\[ y = f(x_0) + f'(x_0)(x - x_0) \]
is the equation of the tangent line at x₀. This confirms the natural intuition that such a line is the affine approximation that makes f differentiable at x₀. Graphically:
[Figure: the graph of f with its tangent line at x₀, and the values f(x₀) and f(x₀ + h) marked on the vertical axis]
O.R. The difference f(x₀+h) − f(x₀) is called the increment of f at x₀ and is often denoted by Δf(x₀)(h). When f is differentiable at x₀, we have
\[ \Delta f(x_0)(h) = df(x_0)(h) + o(h) \]
So,
\[ \Delta f(x_0) \sim df(x_0) \quad \text{as } h \to 0 \]
The two infinitesimals Δf(x₀) and df(x₀) are, therefore, of the same order. This is another way of saying that, when f is differentiable at x₀, the differential approximates the true increment well. H
The converse is clearly false, as shown by the absolute value function f(x) = |x| at x₀ = 0.

Therefore, f is continuous at x₀.
Example 1245 (i) The quadratic function f : R → R given by f(x) = x² is twice derivable at all points of the real line, with derivative function f' : R → R given by
\[ f'(x) = 2x \]

Let
\[ D' \subseteq D \]
be the domain of derivability of f' (i.e., the collection of points where the second derivative exists). The second derivative function f'' : D' → R associates to every x ∈ D' the second derivative f''(x).
Example 1246 (i) The quadratic function has second derivative function f'' : R → R given by f''(x) = 2 for all x ∈ R. (ii) The logarithmic function has second derivative function f'' : (0, ∞) → R given by f''(x) = −x⁻². N

If
\[ D'' \subseteq D' \tag{26.31} \]
denotes the domain of derivability of f'', we can write the third derivative function as f''' : D'' → R.
Example 1247 (i) The quadratic function is three times differentiable at all points of the real line: its second derivative function f'' : R → R has derivative f'''(x) = 0 at each x ∈ R. Thus, the third derivative function f''' : R → R of the quadratic function is the zero function.

(ii) The logarithmic function is three times differentiable at all strictly positive points of the real line: its second derivative function f'' : (0, ∞) → R has derivative f'''(x) = 2x⁻³ at each x > 0. So, the third derivative function f''' : (0, ∞) → R of the logarithmic function is given by f'''(x) = 2x⁻³.

(iii) Define f : R → R by
\[ f(x) = |x|^3 = \begin{cases} x^3 & \text{for } x \ge 0 \\ (-x)^3 & \text{for } x < 0 \end{cases} \]
We have
\[ f'(x) = \begin{cases} 3x^2 & \text{for } x \ge 0 \\ -3(-x)^2 & \text{for } x < 0 \end{cases} = \begin{cases} 3x^2 & \text{for } x \ge 0 \\ -3x^2 & \text{for } x < 0 \end{cases} \]
and
\[ f''(x) = 6|x| = \begin{cases} 6x & \text{for } x \ge 0 \\ -6x & \text{for } x < 0 \end{cases} \]
Thus, f'''(0) does not exist and
\[ f'''(x) = \begin{cases} 6 & \text{for } x > 0 \\ -6 & \text{for } x < 0 \end{cases} \]
N
We can iterate ad libitum, with fourth derivative, fifth derivative, and so on. Denoting by
\[ f^{(n)} : D^{(n-1)} \to \mathbb{R} \]
the n-th derivative function of f : (a, b) → R, we can define by recurrence the differentiability of higher order of a function: f is n times differentiable at x if the limit
\[ \lim_{h\to 0} \frac{f^{(n-1)}(x+h) - f^{(n-1)}(x)}{h} \tag{26.32} \]
exists and is finite.

For n = 0 we put f⁽⁰⁾ = f. When n = 1, we have ordinary differentiability and the limit (26.32) defines the (first) derivative. When n = 2, this limit defines the second derivative, and so on.
Example 1249 (i) The quadratic function is n times differentiable for each n ≥ 1. Its n-th derivative function f⁽ⁿ⁾ : R → R is given, for each n ≥ 3, by f⁽ⁿ⁾(x) = 0.
(ii) The function f : R → R given by f(x) = e⁻ˣ is n times differentiable for each n ≥ 1. Its n-th derivative function f⁽ⁿ⁾ : R → R is given by f⁽ⁿ⁾(x) = (−1)ⁿ e⁻ˣ. N
Functions can be classified according to their degree of differentiability. Specifically, when the derivative function f⁽ⁿ⁾ : D⁽ⁿ⁻¹⁾ → R is continuous on a subset E of its domain, we say that f is n times continuously differentiable on E. As usual, when E = (a, b) the function is said to be n times continuously differentiable, without further specification.

We denote by Cⁿ(E) the collection of all n times continuously differentiable functions on E. For n = 1 we get back the class C¹(E) of continuously differentiable functions, previously introduced.
For instance, the function f : R → R given by f(x) = x⁴ has
\[ f'(x) = 4x^3, \quad f''(x) = 12x^2, \quad f'''(x) = 24x, \quad f^{(iv)}(x) = 24, \quad f^{(v)}(x) = 0 \]
and f⁽ⁿ⁾(x) = 0 for every n ≥ 5. The function f is thus n times continuously differentiable, so f ∈ Cⁿ(R), for all n ≥ 1.

(ii) The function f : R → R given by
\[ f(x) = \begin{cases} x^2 & \text{for } x > 0 \\ 0 & \text{for } x \le 0 \end{cases} \]
As derivability implies continuity, we have the following simple but interesting result.

Example 1252 The quadratic function and the function f : R → R given by f(x) = x⁴ both belong to C^∞(R). N

Observe that the difference quotient in (26.30) is meaningful when the point x ∈ D is interior, i.e., when
\[ x \in \operatorname{int} D \]
8 Infinitely differentiable means that f has derivatives of all orders on (a, b). To ease notation we write Cⁿ(a, b) in place of Cⁿ((a, b)).
This interiority condition ensures the existence of a neighborhood B_ε(x) = (x − ε, x + ε) of x contained in D, so that the values f'(x + h) are well defined for all x + h ∈ B_ε(x), i.e., for all −ε < h < ε. Thus, D' consists of interior points, that is,
\[ D' \subseteq \operatorname{int} D \]
Similarly, we have:
\[ D^{(n)} \subseteq \operatorname{int} D^{(n-1)} \]
For this reason, in Definition 1248 we tacitly (to ease exposition) assumed that x ∈ int D⁽ⁿ⁻¹⁾, a hypothesis that we now make explicit. This awareness leads to a lemma that will come in handy when studying Taylor approximations.
Proof Let f be n times differentiable at x₀ ∈ (a, b). Hence, x₀ belongs to the domain D⁽ⁿ⁻¹⁾ of the n-th derivative function f⁽ⁿ⁾. Since D⁽ⁿ⁻¹⁾ ⊆ int D⁽ⁿ⁻²⁾, there exists a neighborhood B_ε(x₀) included in D⁽ⁿ⁻²⁾, i.e., in the domain of the (n−1)-th derivative function f⁽ⁿ⁻¹⁾. As, by definition, f is n − 1 times differentiable on D⁽ⁿ⁻²⁾, so it is on B_ε(x₀).
This formula generalizes formula (26.12). The same is true for products fg, with the so-called Leibniz formula
\[ (fg)^{(n)}(x_0) = \sum_{k=0}^{n} \binom{n}{k} f^{(n-k)}(x_0)\, g^{(k)}(x_0) \]
This interesting formula reduces to the standard product formula (26.13) when n = 1:
\[ (fg)'(x_0) = \binom{1}{0} f^{(1)}(x_0)\, g^{(0)}(x_0) + \binom{1}{1} f^{(0)}(x_0)\, g^{(1)}(x_0) = f'(x_0)\, g(x_0) + f(x_0)\, g'(x_0) \]
Let us turn to composition. The chain rule (g ∘ f)'(x) = g'(f(x)) f'(x) gives the first derivative of a composite function g ∘ f. It is sometimes important in applications to find the higher order derivatives of the composite function g ∘ f, that is, to extend the chain rule to higher order derivatives. Some simple algebra shows that, when the involved derivatives exist, we have:
\[ (g \circ f)''(x) = g''(f(x))\, f'(x)^2 + g'(f(x))\, f''(x) \tag{26.34} \]
\[ (g \circ f)'''(x) = g'''(f(x))\, f'(x)^3 + 3 g''(f(x))\, f''(x)\, f'(x) + g'(f(x))\, f'''(x) \tag{26.35} \]
A pattern seems to emerge in these formulas. Indeed, the next result establishes a powerful combinatorial formula that gives the derivatives of any order of a composite function, a higher order chain rule.
Theorem 1254 (Faà di Bruno) Let f : (a, b) → R and g : (c, d) → R be two functions with Im f ⊆ (c, d). If f is n times differentiable at x ∈ (a, b) and g is n times differentiable at f(x), then the composite function g ∘ f : (a, b) → R is n times differentiable at x, with
\[
(g \circ f)^{(n)}(x) = \sum \frac{n!}{k_1!\,k_2!\cdots k_n!}\; g^{(k)}(f(x)) \left( \frac{f'(x)}{1} \right)^{k_1} \left( \frac{f''(x)}{2!} \right)^{k_2} \cdots \left( \frac{f^{(n)}(x)}{n!} \right)^{k_n}
\]
where k = k₁ + ⋯ + kₙ and the sum is over all natural numbers k₁, k₂, ..., kₙ that solve the equation
\[ n = k_1 + 2k_2 + \cdots + n k_n \]
This combinatorial formula for the derivative (g ∘ f)⁽ⁿ⁾ is the so-called Faà di Bruno formula.⁹ For instance, for n = 2 the sum is over all natural numbers k₁ and k₂ that solve the equation
\[ 2 = k_1 + 2k_2 \]
There are two solutions. The first one is k₁ = 2 and k₂ = 0, which gives the term
\[ \frac{2!}{2!\,0!}\; g^{(2)}(f(x)) \left( \frac{f'(x)}{1} \right)^2 \left( \frac{f''(x)}{2!} \right)^0 = g''(f(x))\, f'(x)^2 \]
The second one is k₁ = 0 and k₂ = 1, which gives the term
\[ \frac{2!}{0!\,1!}\; g^{(1)}(f(x)) \left( \frac{f'(x)}{1} \right)^0 \left( \frac{f''(x)}{2!} \right)^1 = g'(f(x))\, f''(x) \]
Formula (26.34) is thus a special case of the Faà di Bruno formula. The reader can check that also (26.35) is a special case of this formula, for n = 3.
\[ \Delta_h^k f(x_0) = \frac{1}{h^k} \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} f(x_0 + ih) \]
where the last equality follows from (10.5).¹⁰ By definition, the first derivative is the limit, as h approaches 0, of the difference quotient Δ_h f(x₀). Interestingly, the next result shows that the second difference quotient also converges to the second derivative, the third difference quotient converges to the third derivative, and so on.

¹⁰ Here it is convenient to start the sequence at n = 0.
Proof We only prove the case n = 2. In Chapter 29 we will establish the following quadratic approximation:
\[ f(x_0+h) = f(x_0) + f'(x_0) h + \frac{1}{2} f''(x_0) h^2 + o(h^2) \]
Then, f(x₀+2h) = f(x₀) + 2f'(x₀)h + 2f''(x₀)h² + o(h²), so
\[ \Delta_h^2 f(x_0) = \frac{f(x_0+2h) - 2f(x_0+h) + f(x_0)}{h^2} = f''(x_0) + \frac{o(h^2)}{h^2} \to f''(x_0) \]
as desired.¹¹
Conceptually, this result shows that derivatives can be viewed as limits of finite differences, so that the "discrete" and "continuous" calculi are consistent. Indeed, some important continuous properties can be viewed as inherited, via limits, from discrete ones: for instance, the algebra of derivatives can be easily deduced from that of finite differences via limits. All this is important (and, in a sense, reassuring) because discrete properties are often much easier to grasp intuitively.

By establishing a "direct" characterization of second and higher order derivatives, this proposition is also important for their numerical computation. For instance, inspection of the proof shows that f''(x₀) = Δ²ₕ f(x₀) + o(1) as h → 0. In general, Δ²ₕ f(x₀) is much easier to compute numerically than f''(x₀), with o(1) being the magnitude of the approximation error.
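For instance – a minimal numerical sketch, with f(x) = eˣ at x₀ = 0 as an illustrative choice – the second difference quotient Δ²ₕ f(x₀) approaches f''(0) = 1 as h shrinks:

    import math

    def second_diff(f, x0, h):
        # second difference quotient (f(x0+2h) - 2 f(x0+h) + f(x0)) / h^2
        return (f(x0 + 2 * h) - 2 * f(x0 + h) + f(x0)) / h**2

    for h in (0.1, 0.01, 0.001):
        print(h, second_diff(math.exp, 0.0, h))   # -> f''(0) = 1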
So far so good. Yet, from this formula one might be tempted to take finer and finer subdivisions by letting n → +∞. For each k we have
\[ n^{(k)} \sim n^k \]
as well as
\[ \Delta_h^k f(x_0) \approx f^{(k)}(x_0) \]
provided f is infinitely differentiable. Indeed, by Proposition 1256 we have Δᵏₕ f(x₀) → f⁽ᵏ⁾(x₀) as h → 0, so as n → +∞. Unfortunately, the equivalence relation does not necessarily go through sums, let alone through infinite ones (cf. Lemma 355). Yet, if we take a leap of faith – in an eighteenth-century style – we "then" have a series expansion
\[ f(x) \approx \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k \qquad \forall x \in \mathbb{R} \]
Fortunately, Chapter 30 will later make all this rigorous by showing that infinitely differentiable functions that are analytic admit an (exact) series expansion, something that makes them the most tractable class of functions. Though rough, the previous heuristic argument based on discrete calculus thus opens a door onto a key topic.
satisfies the hypotheses of the theorem, so it is continuous and nowhere differentiable. This function is defined, point by point, via a trigonometric series that can be proved to be convergent. For instance, at x = 0 we have
\[ f(0) = \sum_{n=0}^{\infty} \frac{1}{2^n} \cos 0 = \sum_{n=0}^{\infty} \frac{1}{2^n} = \frac{1}{1 - \frac{1}{2}} = 2 \]
Though f is definitely not a friendly function, we can still get a sense of its behavior via its graph:

[Figure: graph of the Weierstrass function]
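Partial sums of such a series are also easy to evaluate numerically. In the sketch below the series is Σₙ 2⁻ⁿ cos(bⁿx); the frequency parameter b is an illustrative assumption, since the passage above only pins down the coefficients 1/2ⁿ. At x = 0 the partial sums approach the value 2 computed above.

    import math

    def weierstrass_partial(x, b=9, terms=30):
        # partial sum of sum_{n>=0} cos(b^n x) / 2^n
        return sum(math.cos(b**n * x) / 2**n for n in range(terms))

    print(weierstrass_partial(0.0))   # ~ 2.0, matching the series value at 0
    print(weierstrass_partial(0.5))   # well defined, even though the limit
                                      # function is nowhere differentiable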
We omit the proof of this result, a version of a much more general theorem due to Hans Rademacher.¹³ To appreciate its scope, recall that concave and convex functions defined on open convex sets are locally Lipschitz continuous (Theorem 904).

¹³ He actually qualified the "somewhere" by showing that the set of points at which the function is not differentiable is small in a well-defined sense (it has "measure zero", as readers will learn in more advanced courses).
Chapter 27

Differential calculus in several variables
Our study of differential calculus has so far focused on functions of a single variable. Its extension to functions of several variables is a fundamental, but subtle, topic. We can begin, however, with a simple notion of differentiation in Rⁿ: partial differentiation. Let us start with the two-dimensional case. Consider the origin x = (0, 0) in the plane. There are, intuitively, two main directions along which to approach the origin: the horizontal one – that is, moving along the horizontal axis – and the vertical one – that is, moving along the vertical axis.
[Figure: the horizontal and vertical directions through the origin of the plane]
As we can approach the origin along the two main directions, vertical and horizontal, the same can be done for any point x in the plane.
[Figure: the horizontal and vertical directions through a generic point x of the plane]
To formalize this intuition, let us consider the two versors e¹ = (1, 0) and e² = (0, 1) in R². For every x = (x₁, x₂) ∈ R² and every scalar h ∈ R, we have
\[ x + h e^1 = (x_1 + h, x_2) \quad\text{and}\quad x + h e^2 = (x_1, x_2 + h) \]
Graphically:

[Figure: the point x + he¹, obtained from x by moving h along the horizontal direction]
The set
\[ \{ x + h e^1 : h \in \mathbb{R} \} \]
is, therefore, formed by the vectors of R² with the same second coordinate, but with a different first coordinate.
[Figure: the horizontal straight line {x + he¹ : h ∈ R} through the point x]
Graphically, it is the horizontal straight line that passes through the point x. For example, if x is the origin (0, 0), the set
\[ \{ x + h e^1 : h \in \mathbb{R} \} = \{ (h, 0) : h \in \mathbb{R} \} \]
is the horizontal axis.

Similarly, we can move from x along the vertical direction to the points x + he². Graphically:

[Figure: the point x + he², obtained from x by moving h along the vertical direction]
In this case the set {x + he² : h ∈ R} is formed by the vectors of R² with the same first coordinate, but with a different second coordinate.
[Figure: the vertical straight line {x + he² : h ∈ R} through the point x]
Graphically, it is the vertical straight line that passes through the point x. When x is the
origin (0; 0), the set x + he2 : h 2 R is the vertical axis.
Though key for understanding the meaning of partial derivatives, (27.1) and (27.2) are less useful for computing them. To this end, for a fixed x ∈ R² we introduce two auxiliary scalar functions, called projections, φ₁, φ₂ : R → R, defined by φ₁(t) = f(t, x₂) and φ₂(t) = f(x₁, t). The partial derivative ∂f/∂xᵢ is nothing but the ordinary derivative φᵢ' of the scalar function φᵢ calculated at t = xᵢ, with i = 1, 2:
\[ \frac{\partial f}{\partial x_1}(x) = \varphi_1'(x_1) \tag{27.3} \]
\[ \frac{\partial f}{\partial x_2}(x) = \varphi_2'(x_2) \tag{27.4} \]
Thus, using the auxiliary functions φᵢ we go back to the differentiation of scalar functions studied in the last chapter. Formulas (27.3) and (27.4) are very useful for the computation of partial derivatives, which is thus reduced to the computation of standard derivatives of scalar functions.
(ii) Let f : R² → R be given by f(x₁, x₂) = x₁² x₂. Let us compute the partial derivatives of f at x = (1, 2). We have
\[ \varphi_1(t) = f(t, 2) = 2t^2, \qquad \varphi_2(t) = f(1, t) = t \]
Therefore, at the point t = 1 we have φ₁'(1) = 4 and at the point t = 2 we have φ₂'(2) = 1. Hence,
\[ \frac{\partial f}{\partial x_1}(1, 2) = \varphi_1'(1) = 4, \qquad \frac{\partial f}{\partial x_2}(1, 2) = \varphi_2'(2) = 1 \]
Again, more generally, at any point x ∈ R² we have
\[ \varphi_1(t) = t^2 x_2, \qquad \varphi_2(t) = x_1^2 t \]
Therefore, their derivatives are φ₁'(x₁) = 2x₁x₂ and φ₂'(x₂) = x₁², so
\[ \frac{\partial f}{\partial x_1}(x) = \varphi_1'(x_1) = 2x_1 x_2, \qquad \frac{\partial f}{\partial x_2}(x) = \varphi_2'(x_2) = x_1^2 \]
N
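The projection device translates directly into code. A sketch for the function of point (ii), with each projection differentiated by a central difference (step size is an arbitrary choice):

    f = lambda x1, x2: x1**2 * x2

    def num_deriv(phi, t, h=1e-6):
        return (phi(t + h) - phi(t - h)) / (2 * h)

    phi1 = lambda t: f(t, 2.0)    # x2 frozen at 2
    phi2 = lambda t: f(1.0, t)    # x1 frozen at 1

    print(num_deriv(phi1, 1.0))   # ~ 4 = df/dx1 (1, 2)
    print(num_deriv(phi2, 2.0))   # ~ 1 = df/dx2 (1, 2)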
The partial derivatives at (x1 ; x2 ) are therefore simply the slopes of the two projections at
this point. H
exist and are finite. These limits are called the partial derivatives of f at x. The limit (27.5) is the i-th partial derivative of f at x, denoted by either f'ₓᵢ(x) or
\[ \frac{\partial f}{\partial x_i}(x) \]
Often, it is actually convenient to write
\[ \frac{\partial f(x)}{\partial x_i} \]
The choice among these alternatives will be just a matter of convenience. The vector
\[ \nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \frac{\partial f}{\partial x_2}(x), \ldots, \frac{\partial f}{\partial x_n}(x) \right) \in \mathbb{R}^n \]
is called the gradient of f at x.
Also in the general case of n independent variables, to calculate the partial derivatives at a point x one can introduce the projections φᵢ defined by
\[ \varphi_i(t) = f(x_1, \ldots, x_{i-1}, t, x_{i+1}, \ldots, x_n) \]
so that
\[ \frac{\partial f}{\partial x_i}(x) = \varphi_i'(x_i) \]
which generalizes to Rⁿ formulas (27.3) and (27.4), reducing in this case, too, the calculation of partial derivatives to that of standard derivatives of scalar functions.
and therefore, differentiating the projections,
\[ \frac{\partial f}{\partial x_1}(x) = \varphi_1'(x_1) = 1, \qquad \frac{\partial f}{\partial x_2}(x) = \varphi_2'(x_2) = x_3 e^{x_2 x_3} \]
\[ \frac{\partial f}{\partial x_3}(x) = \varphi_3'(x_3) = x_2 e^{x_2 x_3}, \qquad \frac{\partial f}{\partial x_4}(x) = \varphi_4'(x_4) = 4x_4 \]
By putting them together, we have the gradient
\[ \nabla f(x) = \left( 1,\; x_3 e^{x_2 x_3},\; x_2 e^{x_2 x_3},\; 4x_4 \right) \]
N
As in the special case n = 2, in the general case too, calculating the partial derivative ∂f(x)/∂xᵢ through the projection φᵢ amounts to considering f as a function of the single variable xᵢ, keeping the other n − 1 variables constant. We then calculate the ordinary derivative at xᵢ of this scalar function. In other words, we study the incremental behavior of f with respect to variations of xᵢ only, keeping the other variables constant.
The partial derivatives
\[ \frac{\partial f}{\partial x_1}(x) = x_2 x_3, \qquad \frac{\partial f}{\partial x_2}(x) = x_1 x_3, \qquad \frac{\partial f}{\partial x_3}(x) = x_1 x_2 \]
of the function f(x₁, x₂, x₃) = x₁x₂x₃ are functions on all of R³. Together they form the derivative operator
\[ \nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \frac{\partial f}{\partial x_2}(x), \frac{\partial f}{\partial x_3}(x) \right) = (x_2 x_3, x_1 x_3, x_1 x_2) \]
of f. N
with g : R → R strictly increasing. Since u and g ∘ u are utility functions that are equivalent from the ordinal point of view, this shows that the differences (27.8) per se have no meaning. For this reason, ordinalist consumer theory uses marginal rates of substitution rather than marginal utilities – as we will see in Section 34.3.2. Nevertheless, marginal utility remains a notion commonly used in economics because of its intuitive appeal.
27.2 Differential

The notion of differential introduced in Definition 1241 naturally extends to functions of several variables.

i.e.,
\[ \lim_{h\to 0} \frac{f(x+h) - f(x) - df(x)(h)}{\|h\|} = \lim_{h\to 0} \frac{o(\|h\|)}{\|h\|} = 0 \]
Being linear, the differential can be written as
\[ df(x)(h) = \alpha \cdot h \]
for a suitable vector α ∈ Rⁿ. The next important theorem identifies such a vector and shows that differentiability guarantees both continuity and partial derivability.
of the differential in the scalar case (Theorem 1243). In this case, approximation (27.13) thus reduces to the scalar one (26.27).

But:

(i) lim_{h→0} l(h) = l(0) = 0, since linear functions l : Rⁿ → R are continuous (Theorem 646);

(ii) by the definition of little-o, lim_{h→0} o(‖h‖) = 0.

⁴ In the scalar case the clause "for every h ∈ Rⁿ such that x₀ + h ∈ U" reduces to the clause "for every h ∈ (a − x₀, b − x₀)" of Definition 1241.
⁵ As in the scalar case, note that in df(x)(h) the argument of the differential df(x) : Rⁿ → R is h. In other words, df(x) is a function of the variable h, while x denotes the specific point at which the differential approximates the function f.
To show the existence of partial derivatives at x, let us consider the case n = 2 (the general case presents no novelties, except in notation). In this case, (27.10) implies the existence of α = (α₁, α₂) ∈ R² such that
\[ \lim_{(h_1,h_2)\to(0,0)} \frac{f(x_1+h_1, x_2+h_2) - f(x_1, x_2) - \alpha_1 h_1 - \alpha_2 h_2}{\sqrt{h_1^2 + h_2^2}} = 0 \tag{27.15} \]
Setting h₂ = 0 in (27.15), we have
\[
0 = \lim_{h_1\to 0} \frac{\left| f(x_1+h_1, x_2) - f(x_1, x_2) - \alpha_1 h_1 \right|}{|h_1|} = \lim_{h_1\to 0} \left| \frac{f(x_1+h_1, x_2) - f(x_1, x_2)}{h_1} - \alpha_1 \right|
\]
and therefore
\[ \alpha_1 = \lim_{h_1\to 0} \frac{f(x_1+h_1, x_2) - f(x_1, x_2)}{h_1} = \frac{\partial f(x_1, x_2)}{\partial x_1} \]
In a similar way it is possible to prove that α₂ = ∂f(x₁, x₂)/∂x₂, that is, ∇f(x₁, x₂) = α. In conclusion, both partial derivatives exist, so the function f is partially derivable, with
\[ df(x_1, x_2)(h_1, h_2) = \nabla f(x_1, x_2) \cdot (h_1, h_2) \]
This proves (27.12).
generalizes the tangent line (26.29). The approximation (27.10) assumes the form f(x) = r(x) + o(‖x − x₀‖), that is,
\[ f(x) = f(x_0) + \nabla f(x_0) \cdot (x - x_0) + o(\|x - x_0\|) \]
In the special case n = 2, the affine function (27.16) that best approximates a function f : U ⊆ R² → R at a point x₀ = (x₀₁, x₀₂) ∈ U takes the form⁶
\[ r(x_1, x_2) = f(x_{01}, x_{02}) + \frac{\partial f(x_0)}{\partial x_1}(x_1 - x_{01}) + \frac{\partial f(x_0)}{\partial x_2}(x_2 - x_{02}) \]
[Figure: the graph of a function of two variables with its tangent plane at x₀]
For n ≥ 3, the affine function (27.16) that best approximates a function in the neighborhood of a point x₀ of its domain is called the tangent hyperplane. For obvious reasons, it cannot be visualized graphically.

We close with a piece of terminology. When f is differentiable at all points of a subset E of U, for brevity we say that f is differentiable on E. When f is differentiable at all points of its domain, it is called differentiable, without further specification.

⁶ Here x₀₁ and x₀₂ denote the components of the vector x₀.
But this is not possible. Indeed, if for example we consider the points on the straight line x₂ = x₁, that is, of the form (t, t), we get
\[ \sqrt{\frac{|hk|}{h^2 + k^2}} = \sqrt{\frac{t^2}{t^2 + t^2}} = \sqrt{\frac{1}{2}} \qquad \forall t \neq 0 \]
This shows that f is not differentiable at (0, 0),⁷ even if it has partial derivatives at (0, 0). N
Summing up:

differentiability implies partial derivability (Theorem 1268), but not vice versa when n ≥ 2 (Example 1270);

differentiability implies continuity (Theorem 1268);

partial derivability does not imply continuity when n ≥ 2 (Example 1263).

It is natural to ask which additional hypotheses are required for partial derivability to imply differentiability (and so continuity). The answer is given by the next remarkable result, which extends Theorem 1243 to the vector case by showing that, under a simple regularity hypothesis (the continuity of the partial derivatives), a partially derivable function is also differentiable (and so continuous).
⁷ For the more demanding reader: note that each neighbourhood of the origin contains points of the type (t, t) with t ≠ 0. For such points we have √(|hk|/(h² + k²)) = √(1/2). Therefore, for 0 < ε < √(1/2) there is no neighbourhood of the origin such that, for all its points (h, k) ≠ (0, 0), we have |√(|hk|/(h² + k²)) − 0| < ε.
Theorem 1271 Let f : U → R be partially derivable. If the partial derivatives are continuous, then f is differentiable.
Proof⁸ For simplicity of notation, we consider the case in which n = 2, the function f is defined on the entire plane R², and the partial derivatives ∂f/∂x₁ and ∂f/∂x₂ exist on R². Apart from more complicated notation, the general case can be proved in a similar way.

Therefore, let f : R² → R and x ∈ R². Assume that ∂f/∂x₁ and ∂f/∂x₂ are both continuous at x. By adding and subtracting f(x₁ + h₁, x₂), for each h ∈ R² we have:
\[ f(x+h) - f(x) = \left[ f(x_1+h_1, x_2) - f(x_1, x_2) \right] + \left[ f(x_1+h_1, x_2+h_2) - f(x_1+h_1, x_2) \right] \tag{27.17} \]
The partial derivative ∂f/∂x₁(x) is the derivative of the function ψ₁ : R → R defined by ψ₁(x₁) = f(x₁, x₂), in which x₂ is considered as a constant. By the Mean Value Theorem,⁹ there exists z₁ ∈ (x₁, x₁ + h₁) ⊆ R such that
\[ f(x_1+h_1, x_2) - f(x_1, x_2) = \psi_1'(z_1)\, h_1 \]
Similarly, there exists z₂ ∈ (x₂, x₂ + h₂) such that
\[ f(x_1+h_1, x_2+h_2) - f(x_1+h_1, x_2) = \psi_2'(z_2)\, h_2 \]
where ψ₂(x₂) = f(x₁ + h₁, x₂). Since by construction ∂f/∂x₁(z₁, x₂) = ψ₁'(z₁) and ∂f/∂x₂(x₁ + h₁, z₂) = ψ₂'(z₂), we can rewrite (27.17) as:
\[ f(x+h) - f(x) = \frac{\partial f}{\partial x_1}(z_1, x_2)\, h_1 + \frac{\partial f}{\partial x_2}(x_1+h_1, z_2)\, h_2 \]

⁸ Since this proof uses the Mean Value Theorem for scalar functions, which will be presented in the next chapter, it is best understood after learning that result. The same remark applies to the proof of Schwarz's Theorem.
⁹ The Mean Value Theorem for scalar functions will be studied in the next chapter.
On the other hand, by definition ∇f(x) · h = ∂f/∂x₁(x₁, x₂) h₁ + ∂f/∂x₂(x₁, x₂) h₂. Thus:
\[
\begin{aligned}
&\lim_{h\to 0} \frac{\left| f(x+h) - f(x) - \nabla f(x) \cdot h \right|}{\|h\|} \\
&= \lim_{h\to 0} \frac{\left| \left( \frac{\partial f}{\partial x_1}(z_1, x_2) - \frac{\partial f}{\partial x_1}(x_1, x_2) \right) h_1 + \left( \frac{\partial f}{\partial x_2}(x_1+h_1, z_2) - \frac{\partial f}{\partial x_2}(x_1, x_2) \right) h_2 \right|}{\|h\|} \\
&\le \lim_{h\to 0} \left| \frac{\partial f}{\partial x_1}(z_1, x_2) - \frac{\partial f}{\partial x_1}(x_1, x_2) \right| \frac{|h_1|}{\|h\|} + \lim_{h\to 0} \left| \frac{\partial f}{\partial x_2}(x_1+h_1, z_2) - \frac{\partial f}{\partial x_2}(x_1, x_2) \right| \frac{|h_2|}{\|h\|} \\
&\le \lim_{h\to 0} \left| \frac{\partial f}{\partial x_1}(z_1, x_2) - \frac{\partial f}{\partial x_1}(x_1, x_2) \right| + \lim_{h\to 0} \left| \frac{\partial f}{\partial x_2}(x_1+h_1, z_2) - \frac{\partial f}{\partial x_2}(x_1, x_2) \right|
\end{aligned}
\]
where the last inequality holds because
\[ 0 \le \frac{|h_1|}{\|h\|} \le 1 \quad\text{and}\quad 0 \le \frac{|h_2|}{\|h\|} \le 1 \]
On the other hand, since z₁ ∈ (x₁, x₁+h₁) and z₂ ∈ (x₂, x₂+h₂), we have z₁ → x₁ as h₁ → 0 and z₂ → x₂ as h₂ → 0. Therefore, ∂f/∂x₁ and ∂f/∂x₂ being both continuous at x, we have
\[ \lim_{h\to 0} \frac{\partial f}{\partial x_1}(z_1, x_2) = \frac{\partial f}{\partial x_1}(x_1, x_2) \quad\text{and}\quad \lim_{h\to 0} \frac{\partial f}{\partial x_2}(x_1+h_1, z_2) = \frac{\partial f}{\partial x_2}(x_1, x_2) \]
which implies
\[ \lim_{h\to 0} \left| \frac{\partial f}{\partial x_1}(z_1, x_2) - \frac{\partial f}{\partial x_1}(x_1, x_2) \right| = \lim_{h\to 0} \left| \frac{\partial f}{\partial x_2}(x_1+h_1, z_2) - \frac{\partial f}{\partial x_2}(x_1, x_2) \right| = 0 \]
In conclusion, we have proved that
\[ \lim_{h\to 0} \frac{\left| f(x+h) - f(x) - \nabla f(x) \cdot h \right|}{\|h\|} = 0 \]
and the function f is thus differentiable at x.
Example 1272 (i) Consider the function f : Rⁿ → R given by f(x) = ‖x‖². Its gradient is
\[ \nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_n}(x) \right) = (2x_1, \ldots, 2x_n) = 2x \qquad \forall x \in \mathbb{R}^n \]
The partial derivatives are continuous on Rⁿ and therefore f is differentiable on Rⁿ. By (27.10), at each x ∈ Rⁿ we have
\[ df(x)(h) = \nabla f(x) \cdot h \qquad \forall h \in \mathbb{R}^n \]
and
\[ \|x+h\|^2 - \|x\|^2 = 2x \cdot h + o(\|h\|) \]
as ‖h‖ → 0.

(ii) Consider the function f : Rⁿ₊₊ → R given by \(f(x) = \sum_{i=1}^n \log x_i\). Its gradient is
\[ \nabla f(x) = \left( \frac{1}{x_1}, \ldots, \frac{1}{x_n} \right) \qquad \forall x \in \mathbb{R}^n_{++} \]
The partial derivatives are continuous on Rⁿ₊₊ and therefore f is differentiable on Rⁿ₊₊. By (27.10), at each x ∈ Rⁿ₊₊ we have
\[ df(x)(h) = \nabla f(x) \cdot h \qquad \forall h \in \mathbb{R}^n \]
so that, as ‖h‖ → 0,
\[ \sum_{i=1}^n \log(x_i + h_i) - \sum_{i=1}^n \log x_i = \sum_{i=1}^n \frac{h_i}{x_i} + o(\|h\|) \]
N
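Point (i) can be illustrated numerically: for f(x) = ‖x‖² the approximation error f(x+h) − f(x) − 2x·h equals ‖h‖², which is o(‖h‖). In the sketch below the test point is an arbitrary (random) choice.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(4)

    for scale in (1e-1, 1e-2, 1e-3):
        h = scale * rng.standard_normal(4)
        err = np.dot(x + h, x + h) - np.dot(x, x) - 2 * np.dot(x, h)
        print(np.linalg.norm(h), err / np.linalg.norm(h))   # ratio -> 0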
Example 1273 Let u : Rⁿ₊₊ → R be the log-linear utility function \(u(x_1, \ldots, x_n) = \sum_{i=1}^n a_i \log x_i\), with aᵢ > 0 and \(\sum_{i=1}^n a_i = 1\). Its total differential is
\[ du = \frac{a_1}{x_1} dx_1 + \cdots + \frac{a_n}{x_n} dx_n \]
The impact of each infinitesimal variation dxᵢ on the overall variation of utility du is determined by the coefficient aᵢ/xᵢ. N
However evocative, one should not forget that the total differential (27.18) is only a heuristic version of the differential df(x), which is the rigorous notion.¹⁰

For the composition of a scalar function f : R → R with a function g of several variables, the chain rule gives
\[ \nabla(f \circ g)(x) = f'(g(x))\, \nabla g(x) = \left( f'(g(x)) \frac{\partial g(x)}{\partial x_1}, \ldots, f'(g(x)) \frac{\partial g(x)}{\partial x_n} \right) \]
In the scalar case n = 1, we get back the classic rule (f ∘ g)'(x) = f'(g(x)) g'(x). Moreover, by Theorem 1268 the differential of the composition f ∘ g is:
\[ d(f \circ g)(x)(h) = f'(g(x)) \sum_{i=1}^n \frac{\partial g(x)}{\partial x_i} h_i \tag{27.19} \]
In heuristic terms,
\[ d(f \circ g) = \frac{df}{dg} \frac{\partial g}{\partial x_1} dx_1 + \cdots + \frac{df}{dg} \frac{\partial g}{\partial x_n} dx_n \tag{27.20} \]
The variation of f ∘ g can be decomposed according to the different infinitesimal variations dxᵢ, each of which induces the variation (∂g/∂xᵢ) dxᵢ on g, which in turn causes a variation df/dg on f. Summing these partial effects we get the overall variation d(f ∘ g).
¹⁰ As we have already remarked a few times, heuristics plays an important role in the quest for new results (a "vanguard of heuristic efforts towards the new", wrote Carlo Emilio Gadda). The rigorous verification of the results so obtained is, however, key; only a few outstanding mathematicians, dear to the gods, can rely on intuition without caring too much about rigor. Yet one of them, the great Archimedes, writes in his Method: "... certain things became clear to me by a mechanical method, although they had to be demonstrated by geometry afterwards because their investigation by the said method did not furnish an actual demonstration." (Trans. Heath).
Example 1275 (i) Let f : R → R be given by f(x) = e^{2x} and let g : R² → R be given by g(x) = x₁x₂². Let us calculate with the chain rule the differential of the composite function f ∘ g : R² → R given by
\[ (f \circ g)(x) = e^{2 x_1 x_2^2} \]
We have
\[ \nabla(f \circ g)(x) = \left( 2x_2^2 e^{2x_1 x_2^2},\; 4x_1 x_2 e^{2x_1 x_2^2} \right) \]
and therefore
\[ d(f \circ g)(x)(h) = 2 e^{2x_1 x_2^2} \left( x_2^2 h_1 + 2x_1 x_2 h_2 \right) \]
for every h ∈ R². The total differential is
\[ d(f \circ g) = 2 e^{2x_1 x_2^2} \left( x_2^2\, dx_1 + 2x_1 x_2\, dx_2 \right) \]
(ii) Let f : (0, ∞) → R be given by f(x) = log x and let g : R²₊₊ ∪ R²₋₋ → R be given by \(g(x_1, x_2) = \sqrt{x_1 x_2}\). Here the function g must be restricted to R²₊₊ ∪ R²₋₋ to satisfy the condition Im g ⊆ (0, ∞). Let us calculate with the chain rule the differential of the composite function f ∘ g : R²₊₊ ∪ R²₋₋ → R given by
\[ (f \circ g)(x) = \log \sqrt{x_1 x_2} \]
We have
\[ \frac{\partial g(x)}{\partial x_1} = \frac{1}{2} \sqrt{\frac{x_2}{x_1}} \quad\text{and}\quad \frac{\partial g(x)}{\partial x_2} = \frac{1}{2} \sqrt{\frac{x_1}{x_2}} \]
so that
\[
\nabla(f \circ g)(x) = \left( f'(g(x)) \frac{\partial g(x)}{\partial x_1},\; f'(g(x)) \frac{\partial g(x)}{\partial x_2} \right) = \left( \frac{1}{\sqrt{x_1 x_2}} \frac{1}{2} \sqrt{\frac{x_2}{x_1}},\; \frac{1}{\sqrt{x_1 x_2}} \frac{1}{2} \sqrt{\frac{x_1}{x_2}} \right) = \left( \frac{1}{2x_1},\; \frac{1}{2x_2} \right)
\]
and
\[ d(f \circ g)(x)(h) = \frac{1}{2x_1} h_1 + \frac{1}{2x_2} h_2 \]
for every h ∈ R². The total differential is
\[ d(f \circ g) = \frac{1}{2x_1} dx_1 + \frac{1}{2x_2} dx_2 \]
(iii) Let g : Rⁿ₊₊ → R and f : R₊ → R be given by \(g(x) = \sum_{i=1}^n a_i x_i^\rho\) and \(f(x) = x^{1/\rho}\), with aᵢ ∈ R and ρ ≠ 0, so that f ∘ g : Rⁿ₊₊ → R is
\[ (f \circ g)(x) = \left( \sum_{i=1}^n a_i x_i^\rho \right)^{\frac{1}{\rho}} \]
We have
\[ \nabla g(x) = \left( \rho a_1 x_1^{\rho-1}, \ldots, \rho a_n x_n^{\rho-1} \right) \]
so that
\[
\begin{aligned}
\nabla(f \circ g)(x) &= \left( f'(g(x)) \frac{\partial g(x)}{\partial x_1}, \ldots, f'(g(x)) \frac{\partial g(x)}{\partial x_n} \right) \\
&= \left( \frac{1}{\rho} \left( \sum_{i=1}^n a_i x_i^\rho \right)^{\frac{1}{\rho}-1} \rho a_1 x_1^{\rho-1}, \ldots, \frac{1}{\rho} \left( \sum_{i=1}^n a_i x_i^\rho \right)^{\frac{1}{\rho}-1} \rho a_n x_n^{\rho-1} \right) \\
&= \left( a_1 \left( \sum_{i=1}^n a_i x_i^\rho \right)^{\frac{1}{\rho}-1} x_1^{\rho-1}, \ldots, a_n \left( \sum_{i=1}^n a_i x_i^\rho \right)^{\frac{1}{\rho}-1} x_n^{\rho-1} \right)
\end{aligned}
\]
and
\[ d(f \circ g)(x)(h) = \sum_{i=1}^n a_i \left( \sum_{i=1}^n a_i x_i^\rho \right)^{\frac{1}{\rho}-1} x_i^{\rho-1} h_i = \left( \sum_{i=1}^n a_i x_i^\rho \right)^{\frac{1}{\rho}-1} \sum_{i=1}^n a_i x_i^{\rho-1} h_i \]
for every h ∈ Rⁿ.

(iv) Similarly, for f(x) = log x and \(g(x) = \sum_{i=1}^n a_i e^{x_i}\) with aᵢ > 0, we have
\[ d(f \circ g)(x)(h) = \sum_{i=1}^n \frac{a_i e^{x_i}}{\sum_{i=1}^n a_i e^{x_i}} h_i = \frac{1}{g(x)} \sum_{i=1}^n a_i e^{x_i} h_i \]
for every h ∈ Rⁿ. The total differential is
\[ d(f \circ g) = \frac{1}{g(x)} \sum_{i=1}^n a_i e^{x_i} dx_i \]
N
Hence, it makes sense to talk about the existence of partial derivatives of the partial derivative functions ∂f/∂xᵢ : U → R at a point x ∈ U. In this case, for every i, j = 1, ..., n we have the partial derivative
\[ \frac{\partial \left( \frac{\partial f}{\partial x_i} \right)}{\partial x_j}(x) \]
with respect to xⱼ of the partial derivative ∂f/∂xᵢ. These partial derivatives are called second-order partial derivatives of f and are denoted by
\[ \frac{\partial^2 f}{\partial x_i \partial x_j}(x) \]
Example 1277 Let f : R³ → R be given by f(x) = e^{x₁x₂} + 3x₂x₃ for every x ∈ R³, and let us compute its Hessian matrix. We have:
\[ \frac{\partial f}{\partial x_1}(x) = x_2 e^{x_1 x_2}, \qquad \frac{\partial f}{\partial x_2}(x) = x_1 e^{x_1 x_2} + 3x_3, \qquad \frac{\partial f}{\partial x_3}(x) = 3x_2 \]
and so
\[
\nabla^2 f(x) = \begin{bmatrix}
x_2^2 e^{x_1 x_2} & (1 + x_1 x_2) e^{x_1 x_2} & 0 \\
(1 + x_1 x_2) e^{x_1 x_2} & x_1^2 e^{x_1 x_2} & 3 \\
0 & 3 & 0
\end{bmatrix}
\]
The second-order partial derivatives can, in turn, be seen as functions of several variables. We can therefore look for their partial derivatives, which (if they exist) are called the third-order partial derivatives. We can then move to their partial derivatives (if they exist) and get the fourth-order derivatives, and so on.

For instance, going back to the previous example, consider the partial derivative
\[ \frac{\partial^2 f}{\partial x_1 \partial x_2}(x) = (1 + x_1 x_2) e^{x_1 x_2} \]
Schwarz's Theorem shows that, when the second-order partial derivatives are continuous,
\[ \frac{\partial^2 f}{\partial x_i \partial x_j}(x) = \frac{\partial^2 f}{\partial x_j \partial x_i}(x) \tag{27.21} \]
for every i, j = 1, ..., n.
Proof For simplicity we consider the case n = 2, where (27.21) reduces to:
\[ \frac{\partial^2 f}{\partial x_1 \partial x_2} = \frac{\partial^2 f}{\partial x_2 \partial x_1} \tag{27.22} \]
Again for simplicity, we also assume that the domain A is the whole space R², so that we consider a function f : R² → R. By definition,
\[ \frac{\partial f}{\partial x_1}(x) = \lim_{h_1\to 0} \frac{f(x_1+h_1, x_2) - f(x_1, x_2)}{h_1} \]
and therefore:
\[
\frac{\partial^2 f}{\partial x_1 \partial x_2}(x) = \lim_{h_2\to 0} \frac{\frac{\partial f}{\partial x_1}(x_1, x_2+h_2) - \frac{\partial f}{\partial x_1}(x_1, x_2)}{h_2}
= \lim_{h_2\to 0} \frac{1}{h_2} \left[ \lim_{h_1\to 0} \frac{f(x_1+h_1, x_2+h_2) - f(x_1, x_2+h_2)}{h_1} - \lim_{h_1\to 0} \frac{f(x_1+h_1, x_2) - f(x_1, x_2)}{h_1} \right]
\]
Setting
\[ \Delta(h_1, h_2) = f(x_1+h_1, x_2+h_2) - f(x_1, x_2+h_2) - f(x_1+h_1, x_2) + f(x_1, x_2) \]
we can write
\[ \frac{\partial^2 f}{\partial x_1 \partial x_2}(x) = \lim_{h_2\to 0} \lim_{h_1\to 0} \frac{\Delta(h_1, h_2)}{h_2 h_1} \tag{27.23} \]
Consider in addition the scalar auxiliary function ψ₁ : R → R defined by ψ₁(x) = f(x, x₂+h₂) − f(x, x₂) for each x ∈ R. We have:
\[ \psi_1'(x) = \frac{\partial f}{\partial x_1}(x, x_2+h_2) - \frac{\partial f}{\partial x_1}(x, x_2) \tag{27.24} \]
Moreover, by the Mean Value Theorem there exists z₁ ∈ (x₁, x₁+h₁) such that
\[ \psi_1'(z_1) = \frac{\psi_1(x_1+h_1) - \psi_1(x_1)}{h_1} = \frac{\Delta(h_1, h_2)}{h_1} \tag{27.25} \]
Let ψ₂ : R → R be another auxiliary scalar function defined by ψ₂(x) = ∂f/∂x₁(z₁, x) for each x ∈ R. We have:
\[ \psi_2'(x) = \frac{\partial^2 f}{\partial x_2 \partial x_1}(z_1, x) \tag{27.26} \]
By the Mean Value Theorem, there exists z₂ ∈ (x₂, x₂+h₂) such that
\[ \psi_2'(z_2) = \frac{\psi_2(x_2+h_2) - \psi_2(x_2)}{h_2} = \frac{\frac{\partial f}{\partial x_1}(z_1, x_2+h_2) - \frac{\partial f}{\partial x_1}(z_1, x_2)}{h_2} \]
and therefore, by (27.26), such that
\[ \frac{\partial^2 f}{\partial x_2 \partial x_1}(z_1, z_2) = \frac{\frac{\partial f}{\partial x_1}(z_1, x_2+h_2) - \frac{\partial f}{\partial x_1}(z_1, x_2)}{h_2} \]
Together with (27.24) and (27.25), this implies that
\[ \frac{\partial^2 f}{\partial x_2 \partial x_1}(z_1, z_2) = \frac{\Delta(h_1, h_2)}{h_2 h_1} \tag{27.27} \]
Go back now to (27.23). Thanks to (27.27), expression (27.23) becomes:
\[ \frac{\partial^2 f}{\partial x_1 \partial x_2}(x) = \lim_{h_2\to 0} \lim_{h_1\to 0} \frac{\partial^2 f}{\partial x_2 \partial x_1}(z_1, z_2) \tag{27.28} \]
Since z₁ ∈ (x₁, x₁+h₁) and z₂ ∈ (x₂, x₂+h₂), we have (z₁, z₂) → (x₁, x₂) as (h₁, h₂) → (0, 0); the continuity of ∂²f/∂x₂∂x₁ then yields (27.22).
Thus, when the second-order partial derivatives are continuous, the order in which we take them does not matter: we can compute first the partial derivative with respect to xᵢ and then the one with respect to xⱼ, or vice versa, with the same result. So, we can choose the way that seems computationally easier, obtaining then "for free" the other second-order partial derivative. This simplifies considerably the computation of derivatives and, moreover, results in an elegant symmetry property of the Hessian matrix.
Example 1280 (i) Let f : R³ → R be given by f(x₁, x₂, x₃) = x₁²x₂x₃. Simple calculations show that:
\[ \frac{\partial^2 f}{\partial x_1 \partial x_2}(x) = \frac{\partial^2 f}{\partial x_2 \partial x_1}(x) = 2x_1 x_3 \]
in accordance with Schwarz's Theorem, because the second-order partial derivatives are continuous.
(ii) Let f : R³ → R be given by f(x₁, x₂, x₃) = cos(x₁x₂) + e^{−x₃}. The Hessian matrix of f is
\[
\nabla^2 f(x) = \begin{bmatrix}
-x_2^2 \cos(x_1 x_2) & -\sin(x_1 x_2) - x_1 x_2 \cos(x_1 x_2) & 0 \\
-\sin(x_1 x_2) - x_1 x_2 \cos(x_1 x_2) & -x_1^2 \cos(x_1 x_2) & 0 \\
0 & 0 & e^{-x_3}
\end{bmatrix}
\]
N
Consider now the function f : R² → R defined by
\[ f(x_1, x_2) = \begin{cases} \dfrac{x_1 x_2 \left( x_1^2 - x_2^2 \right)}{x_1^2 + x_2^2} & \text{if } (x_1, x_2) \neq (0, 0) \\ 0 & \text{if } (x_1, x_2) = (0, 0) \end{cases} \]
The reader can verify that: (i) f has continuous partial derivatives ∂f/∂x₁ and ∂f/∂x₂; (ii) f has second-order partial derivatives ∂²f/∂x₁∂x₂ and ∂²f/∂x₂∂x₁ defined on all of R², but discontinuous at the origin (0, 0). Therefore, the hypothesis of continuity of the second-order partial derivatives of Schwarz's Theorem does not hold at the origin, so the theorem cannot say anything about the behavior of these derivatives there. Let us calculate them:
\[ \frac{\partial^2 f}{\partial x_1 \partial x_2}(0, 0) = -1 \quad\text{and}\quad \frac{\partial^2 f}{\partial x_2 \partial x_1}(0, 0) = 1 \]
So,
\[ \frac{\partial^2 f}{\partial x_1 \partial x_2}(0, 0) \neq \frac{\partial^2 f}{\partial x_2 \partial x_1}(0, 0) \]
The continuity of the second-order partial derivatives is, therefore, needed for the validity of equality (27.21). N
partial derivatives consider infinitesimal variations along the basic directions identified by the versors eⁱ. But what about the other directions? Intuitively, there are infinitely many ways to approach a point in Rⁿ and one may wonder about infinitesimal variations along them. In particular, are they consistent, in some sense, with the variations along the basic directions? In this section we address this issue and, in so doing, we expatiate on the incremental (marginal) viewpoint in multivariable differential calculus.

To take into account the infinitely many directions along which we can approach a point in Rⁿ, we generalize the quotient (27.30) as follows:
\[ \lim_{h\to 0} \frac{f(x + hy) - f(x)}{h} \]
This limit represents the infinitesimal increments of the function f at the point x when we move along the direction determined by the vector y of Rⁿ, which is no longer required to be a versor eⁱ. Graphically:

[Figure: moving from x along the direction determined by the vector y]
Not all lines ⟨x, x + y⟩ identify different directions: the next result shows that, given a vector y ∈ Rⁿ, all vectors λy identify the same direction when λ ≠ 0.

"If". Let y' = λy with λ ≠ 0. We have
\[ x + y' = x + \lambda y = (1 - \lambda) x + \lambda (x + y) \]
and therefore x + y' ∈ ⟨x, x + y⟩. This implies ⟨x, x + y'⟩ ⊆ ⟨x, x + y⟩. Since y = (1/λ) y', by proceeding in a similar way we can prove that ⟨x, x + y⟩ ⊆ ⟨x, x + y'⟩. We conclude that ⟨x, x + y⟩ = ⟨x, x + y'⟩. "Only if". Suppose that ⟨x, x + y'⟩ = ⟨x, x + y⟩ and that y ≠ y' (otherwise the result is trivially true). At least one of them then has to be non-zero, say y'. Since x + y' ∈ ⟨x, x + y⟩ and y' ≠ 0, there exists h ≠ 0 such that x + y' = (1 − h) x + h (x + y). This implies y' = hy and therefore, by setting λ = h, we have the desired result.
The next corollary shows that this redundancy of the directions translates, in a simple and elegant way, into the homogeneity of the directional derivative, a property that permits one to determine the value f'(x; λy) for every scalar λ once we know the value of f'(x; y).
For λ ≠ 0, a change of variable gives
\[ f'(x; \lambda y) = \lim_{h\to 0} \frac{f(x + h \lambda y) - f(x)}{h} = \lambda \lim_{\lambda h \to 0} \frac{f(x + (\lambda h) y) - f(x)}{\lambda h} = \lambda f'(x; y) \]
For λ = 0,
\[ f'(x; \lambda y) = f'(x; 0) = \lim_{h\to 0^+} \frac{f(x + 0) - f(x)}{h} = 0 \]
Therefore, f'(x; λy) = 0 = λ f'(x; y), which completes the proof.
Partial derivatives are nothing but the directional derivatives computed along the fundamental directions in Rⁿ represented by the versors eⁱ. That is,
\[ f'(x; e^i) = \frac{\partial f(x)}{\partial x_i} \]
for each i = 1, 2, ..., n. So, functions that are derivable at x are partially derivable there. The converse is false, as the next example shows: the function f : R² → R defined by
\[ f(x_1, x_2) = \begin{cases} 0 & \text{if } x_1 x_2 = 0 \\ 1 & \text{if } x_1 x_2 \neq 0 \end{cases} \]
is partially derivable at the origin. However, it is not derivable at the origin 0 = (0, 0). Indeed, consider x = 0 and y = (1, 1). We have
\[ \frac{f(0 + hy) - f(0)}{h} = \frac{f(h, h)}{h} = \frac{1}{h} \qquad \forall h \neq 0 \]
so the limit as h → 0 does not exist.
In sum, partial derivability is a weaker notion than derivability, which is not surprising: the former notion controls only two directions out of the infinitely many controlled by the latter.
27.5.2 Algebra

Like that of partial derivatives, the calculus of directional derivatives can also be reduced to the calculus of ordinary derivatives of scalar functions. Given a point x ∈ Rⁿ and a direction y ∈ Rⁿ, define an auxiliary scalar function ψ by ψ(h) = f(x + hy) for every h ∈ R. The domain of ψ is the set {h ∈ R : x + hy ∈ U}, which is an open set in R containing 0. By definition of derivative, we have
\[ f'(x; y) = \psi'(0) \tag{27.33} \]
Example 1286 (i) Let f : R³ → R be defined by f(x₁, x₂, x₃) = x₁² + x₂² + x₃². Let us compute the directional derivative of f at x = (1, −1, 2) along the direction y = (2, 3, 5). We have:
\[ \psi(h) = f(x + hy) = (1 + 2h)^2 + (-1 + 3h)^2 + (2 + 5h)^2 = 38h^2 + 18h + 6 \]
It follows that ψ'(h) = 76h + 18 and, by (27.33), we conclude that f'(x; y) = ψ'(0) = 18.
(ii) Let us generalize the previous example and consider the function f : Rⁿ → R defined by f(x) = ‖x‖². We have
\[ \psi'(h) = \frac{d}{dh} \sum_{i=1}^n (x_i + h y_i)^2 = 2 \sum_{i=1}^n y_i (x_i + h y_i) = 2\, y \cdot (x + hy) \]
Therefore, f'(x; y) = ψ'(0) = 2x · y. The directional derivative of f(x) = ‖x‖² thus exists at all points and along all possible directions, that is, f is derivable on Rⁿ. Its general form is
\[ f'(x; y) = 2\, x \cdot y \]
In the special direction y = (2, 3, 5) of point (i), we indeed have f'(x; y) = 2 (1, −1, 2) · (2, 3, 5) = 18.
(iii) Consider the function f : R² → R defined by
\[ f(x_1, x_2) = \begin{cases} \dfrac{x_1 x_2^2}{x_1^2 + x_2^2} & \text{if } (x_1, x_2) \neq (0, 0) \\ 0 & \text{if } (x_1, x_2) = (0, 0) \end{cases} \]
Consider the origin 0 = (0, 0). For every y ∈ R² we have
\[ \psi(h) = f(hy) = \frac{h\, y_1 y_2^2}{y_1^2 + y_2^2} \]
and so f'(0; y) = ψ'(0) = y₁y₂²/(y₁² + y₂²). In conclusion,
\[ f'(0; y) = f(y) \]
for every y ∈ R². So, the function f is derivable at the origin and equals its own directional derivative there. N
Using the auxiliary functions ψ, it is easy to prove that the usual algebraic rules hold for directional derivatives:

We already learned that partial derivability does not imply differentiability (Example 1270). Now we have learned that even full-fledged derivability is not enough to imply differentiability. It is, indeed, not even enough to imply continuity: there exist functions that are derivable at some point but discontinuous there, as the following example shows.
Define f : R² → R by
\[ f(x_1, x_2) = \begin{cases} \dfrac{x_1^4 x_2^2}{x_1^8 + x_2^4} & \text{if } (x_1, x_2) \neq (0, 0) \\ 0 & \text{if } (x_1, x_2) = (0, 0) \end{cases} \]
For every y ∈ R² with y₂ ≠ 0 we have
\[
f'(0; y) = \lim_{h\to 0} \frac{f(hy) - f(0)}{h} = \lim_{h\to 0} \frac{h^6 y_1^4 y_2^2}{h^5 \left( h^4 y_1^8 + y_2^4 \right)} = \lim_{h\to 0} \frac{h\, y_1^4 y_2^2}{h^4 y_1^8 + y_2^4} = 0
\]
while the case y₂ = 0 is immediate. Therefore, f'(0; y) = 0 for every y ∈ R² and the directional derivative at the origin 0 is then the null linear function. It follows that f is derivable at 0. However, it is not continuous at 0 (a fortiori, it is not differentiable at 0 by Theorem 1268). Indeed, consider the points (t, t²) ∈ R² that lie on the graph of the parabola x₂ = x₁². For each t ≠ 0 we have
\[ f(t, t^2) = \frac{t^4 (t^2)^2}{t^8 + (t^2)^4} = \frac{t^8}{t^8 + t^8} = \frac{1}{2} \]
Along these points the function is constant and takes on value 1/2. It follows that lim_{t→0} f(t, t²) = 1/2 and, since f(0) = 0, the function is discontinuous at 0. N
differentiability implies derivability (Theorem 1287), but not vice versa when n ≥ 2 (Example 1288);

These relations sharpen some of the findings of Section 27.2.1 on partial derivability.
This definition generalizes Definition 1267, which is the special case m = 1. The linear approximation is now given by a linear operator with values in Rᵐ, while in the numerator of the incremental ratio in (27.35) we find a norm instead of an absolute value, because we now have to deal with vectors in Rᵐ.
The differential for operators satisfies properties similar to those we saw in the case m = 1. Naturally, instead of the vector representation of Theorem 1268 we now have a more general matrix representation, based on the operator version of Riesz's Theorem (Theorem 672). To see its form, we introduce the Jacobian matrix. Recall that an operator f : U → Rᵐ can be regarded as an m-tuple (f₁, ..., fₘ) of functions defined on U and with values in R. The Jacobian matrix Df(x) of an operator f : U → Rᵐ at x ∈ U is, then, the m × n matrix given by:
\[
Df(x) = \begin{bmatrix}
\frac{\partial f_1}{\partial x_1}(x) & \frac{\partial f_1}{\partial x_2}(x) & \cdots & \frac{\partial f_1}{\partial x_n}(x) \\
\frac{\partial f_2}{\partial x_1}(x) & \frac{\partial f_2}{\partial x_2}(x) & \cdots & \frac{\partial f_2}{\partial x_n}(x) \\
\vdots & \vdots & & \vdots \\
\frac{\partial f_m}{\partial x_1}(x) & \frac{\partial f_m}{\partial x_2}(x) & \cdots & \frac{\partial f_m}{\partial x_n}(x)
\end{bmatrix}
\]
that is,
\[
Df(x) = \begin{bmatrix} \nabla f_1(x) \\ \nabla f_2(x) \\ \vdots \\ \nabla f_m(x) \end{bmatrix} \tag{27.36}
\]
We can now give the matrix representation of differentials, which shows that the Jacobian matrix Df(x) is, indeed, the matrix associated with the linear operator df(x). This representation generalizes the vector representation of Theorem 1268 because the Jacobian matrix Df(x) reduces to the gradient ∇f(x) in the special case m = 1.

Proof We begin by considering a simple property of the norm. Let x = (x₁, ..., xₙ) ∈ Rⁿ. For every j = 1, ..., n we have:
\[ |x_j| = \sqrt{x_j^2} \le \sqrt{\sum_{j=1}^n x_j^2} = \|x\| \tag{27.37} \]
Differentiability at x implies that, for each versor eʲ,
\[ \lim_{t\to 0} \frac{\left\| f(x + te^j) - f(x) - t\, df(x)(e^j) \right\|}{|t|} = 0 \tag{27.38} \]
By (27.37), this implies that
\[ \lim_{t\to 0} \left| \frac{f_i(x + te^j) - f_i(x)}{t} - df_i(x)(e^j) \right| = 0 \]
for each i = 1, ..., m. We can therefore conclude that, for every i = 1, ..., m and every j = 1, ..., n, we have:
\[ \frac{\partial f_i}{\partial x_j}(x) = df_i(x)(e^j) \]
Recall that the matrix A associated with a linear operator has as columns the images of the versors,
\[ A = \left[ f(e^1), f(e^2), \ldots, f(e^n) \right] \]
Applied to the linear operator df(x), whose columns df(x)(eʲ) have entries dfᵢ(x)(eʲ) = ∂fᵢ/∂xⱼ(x), this gives A = Df(x), as desired.
For instance, let f : R³ → R² be defined by f(x) = (2x₁² + x₂ + x₃, x₁ − x₂⁴). We have ∇f₁(x) = (4x₁, 1, 1) and ∇f₂(x) = (1, −4x₂³, 0), and so
\[ Df(x) = \begin{bmatrix} 4x_1 & 1 & 1 \\ 1 & -4x_2^3 & 0 \end{bmatrix} \]
By Theorem 1291, the differential at x is given by the linear operator df(x) : R³ → R² defined by
\[ df(x)(h) = Df(x)\, h = \left( 4x_1 h_1 + h_2 + h_3,\; h_1 - 4x_2^3 h_2 \right) \]
for each h ∈ R³. For example, at x = (2, 5, 3) we have df(x)(h) = (8h₁ + h₂ + h₃, h₁ − 500h₂). N
Example 1294 Let f : R → R³ be defined by f(x) = (x, sin x, cos x). For example, if x = π, then f(x) = (π, 0, −1) ∈ R³. We have f₁'(x) = 1, f₂'(x) = cos x, and f₃'(x) = −sin x, and so
\[ Df(x) = \begin{bmatrix} 1 \\ \cos x \\ -\sin x \end{bmatrix} \]
By Theorem 1291, the differential at x is given by the linear operator df(x) : R → R³ defined by
\[ df(x)(h) = Df(x)\, h = (h,\; h \cos x,\; -h \sin x) \]
for each h ∈ R. For example, at x = π we have df(x)(h) = (h, −h, 0). N
Example 1295 (i) Let f : Rⁿ → Rᵐ be the linear operator represented by f(x) = Ax, with
\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\]
Let a¹, ..., aᵐ be the row vectors of A, that is, a¹ = (a₁₁, a₁₂, ..., a₁ₙ), ..., aᵐ = (aₘ₁, aₘ₂, ..., aₘₙ). We have:
\[ f(x) = \left( a^1 \cdot x, \ldots, a^m \cdot x \right) \quad\text{so that}\quad \nabla f_i(x) = a^i \]
which implies Df(x) = A. Hence, the Jacobian matrix of a linear operator coincides with the associated matrix A. By Theorem 1291, the differential of a linear operator f is therefore given by the linear operator itself, i.e., at each x ∈ Rⁿ it holds that
\[ df(x)(h) = f(h) = Ah \qquad \forall h \in \mathbb{R}^n \tag{27.40} \]
This naturally generalizes the well-known result that, for scalar functions of the form f(x) = ax with a ∈ R, the differential is df(x)(h) = ah.

(ii) Let f : U → Rⁿ be the identity operator f(x) = x for all x ∈ U. By (27.40), at each x ∈ U the differential df(x) is the identity operator itself, i.e.,
\[ df(x)(h) = h \qquad \forall h \in \mathbb{R}^n \]
The Jacobian matrix is the identity matrix, i.e., Df(x) = I at each x ∈ U. N
The right-hand side is the product of the linear operators df(g(x)) and dg(x). By Theorem 677, its matrix representation is given by the product Df(g(x)) Dg(x) of the Jacobian matrices. We thus have the fundamental chain rule formula:
\[ D(f \circ g)(x) = Df(g(x))\, Dg(x) \]
In the scalar case n = m = q = 1, the rule takes its basic form (f ∘ g)'(x) = f'(g(x)) g'(x), studied in Proposition 1230.
Another important special case is when q = 1. In this case we have f : B ⊆ Rᵐ → R and g = (g₁, ..., gₘ) : U ⊆ Rⁿ → Rᵐ, with g(U) ⊆ B. For the composite function f ∘ g : U ⊆ Rⁿ → R the chain rule takes the form:
\[
\begin{aligned}
\nabla(f \circ g)(x) &= \nabla f(g(x))\, Dg(x) \\
&= \left( \frac{\partial f}{\partial x_1}(g(x)), \ldots, \frac{\partial f}{\partial x_m}(g(x)) \right)
\begin{bmatrix}
\frac{\partial g_1}{\partial x_1}(x) & \frac{\partial g_1}{\partial x_2}(x) & \cdots & \frac{\partial g_1}{\partial x_n}(x) \\
\frac{\partial g_2}{\partial x_1}(x) & \frac{\partial g_2}{\partial x_2}(x) & \cdots & \frac{\partial g_2}{\partial x_n}(x) \\
\vdots & \vdots & & \vdots \\
\frac{\partial g_m}{\partial x_1}(x) & \frac{\partial g_m}{\partial x_2}(x) & \cdots & \frac{\partial g_m}{\partial x_n}(x)
\end{bmatrix} \\
&= \left( \sum_{i=1}^m \frac{\partial f}{\partial x_i}(g(x)) \frac{\partial g_i}{\partial x_1}(x), \ldots, \sum_{i=1}^m \frac{\partial f}{\partial x_i}(g(x)) \frac{\partial g_i}{\partial x_n}(x) \right)
\end{aligned}
\]
Grouping the terms for ∂f/∂xᵢ, we get the following equivalent form:
\[ d(f \circ g)(x)(h) = \sum_{i=1}^n \frac{\partial f}{\partial x_1}(g(x)) \frac{\partial g_1}{\partial x_i}(x)\, h_i + \cdots + \sum_{i=1}^n \frac{\partial f}{\partial x_m}(g(x)) \frac{\partial g_m}{\partial x_i}(x)\, h_i \]
This is the formula of the total differential for the composite function f ∘ g. The total variation d(f ∘ g) of f ∘ g is the sum of the effects on the function f of the variations of the single functions gᵢ determined by infinitesimal variations dxᵢ of the different variables.

When n = 1, so that the gᵢ are scalar functions of a single variable x, the differential is
\[ d(f \circ g)(x)(h) = \sum_{i=1}^m \frac{\partial f}{\partial x_i}(g(x)) \frac{dg_i}{dx}(x)\, h \]
In heuristic terms,
\[ d(f \circ g) = \frac{\partial f}{\partial g_1} \frac{dg_1}{dx}\, dx + \cdots + \frac{\partial f}{\partial g_m} \frac{dg_m}{dx}\, dx \tag{27.44} \]
Therefore,
\[ d(f \circ g)(t)(h) = 6 e^{t} \left( \frac{1}{t^2} - \frac{1}{t} \right) h \qquad \forall h \in \mathbb{R} \]
and the total differential (27.44) is
\[ d(f \circ g) = 6 e^{t} \left( \frac{1}{t^2} - \frac{1}{t} \right) dt \]
N
Going back to the operators g(x) = (2x₁² + x₂ + x₃, x₁ − x₂⁴) and f(y₁, y₂) = (y₁, y₁y₂), we have
\[ Dg(x) = \begin{bmatrix} 4x_1 & 1 & 1 \\ 1 & -4x_2^3 & 0 \end{bmatrix} \quad\text{and}\quad Df(x) = \begin{bmatrix} 1 & 0 \\ x_2 & x_1 \end{bmatrix} \]
and therefore
\[ Df(g(x)) = \begin{bmatrix} 1 & 0 \\ x_1 - x_2^4 & 2x_1^2 + x_2 + x_3 \end{bmatrix} \]
It follows that:
\[
Df(g(x))\, Dg(x) = \begin{bmatrix} 1 & 0 \\ x_1 - x_2^4 & 2x_1^2 + x_2 + x_3 \end{bmatrix} \begin{bmatrix} 4x_1 & 1 & 1 \\ 1 & -4x_2^3 & 0 \end{bmatrix} = \begin{bmatrix} 4x_1 & 1 & 1 \\ 6x_1^2 - 4x_1 x_2^4 + x_2 + x_3 & x_1 - 8x_1^2 x_2^3 - 5x_2^4 - 4x_2^3 x_3 & x_1 - x_2^4 \end{bmatrix}
\]
so that
\[
d(f \circ g)(x)(h) = \begin{bmatrix} 4x_1 & 1 & 1 \\ 6x_1^2 - 4x_1 x_2^4 + x_2 + x_3 & x_1 - 8x_1^2 x_2^3 - 5x_2^4 - 4x_2^3 x_3 & x_1 - x_2^4 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ h_3 \end{bmatrix}
\]
Naturally, though it is in general more complicated, the Jacobian matrix of the composition f ∘ g can be computed directly, without using the chain rule, by writing explicitly the form of f ∘ g and computing its partial derivatives. In this example, f ∘ g : R³ → R² is given by
\[ (f \circ g)(x) = \left( 2x_1^2 + x_2 + x_3,\; \left( 2x_1^2 + x_2 + x_3 \right) \left( x_1 - x_2^4 \right) \right) \]
and we have:
and we have:
@ (f g)1 @ (f g)1 @ (f g)1
= 4x1 ; = 1; =1
@x1 @x2 @x3
@ (f g)2
= 6x21 4x1 x42 + x2 + x3
@x1
@ (f g)2
= x1 8x21 x32 5x42 4x32 x3
@x2
@ (f g)2
= x1 x42
@x3
This equality is called Euler's Formula.¹³ The most interesting cases are α = 0 and α = 1. For instance, the indirect utility function v : Rⁿ₊₊ × R₊ → R is easily seen to be homogeneous of degree 0 (cf. Proposition 1051). By Euler's Formula, we have:
\[ \sum_{i=1}^n \frac{\partial v(p, w)}{\partial p_i}\, p_i = - \frac{\partial v(p, w)}{\partial w}\, w \]
Proof Fix x ∈ Rⁿ₊₊ and consider the scalar function φ : (0, ∞) → R defined by φ(t) = f(tx). If we define g : (0, ∞) → Rⁿ₊₊ by g(t) = tx, we can write φ = f ∘ g. By (27.44), we have φ'(t) = ∇f(tx) · x. On the other hand, homogeneity implies φ(t) = t^α f(x), so φ'(t) = α t^{α−1} f(x). We conclude that ∇f(tx) · x = α t^{α−1} f(x). For t = 1, this is Euler's Formula.
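A one-line numerical illustration of Euler's Formula ∇f(x) · x = αf(x); the homogeneous function f(x₁, x₂) = x₁²x₂, of degree α = 3, and the test point are illustrative choices.

    import numpy as np

    f = lambda x: x[0]**2 * x[1]                        # homogeneous of degree 3
    grad = lambda x: np.array([2 * x[0] * x[1], x[0]**2])

    x = np.array([1.5, 0.7])
    print(np.dot(grad(x), x), 3 * f(x))                 # equal, by Euler's Formula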
We have
\[ f(g(x+h)) - f(g(x)) = df(g(x))\left( g(x+h) - g(x) \right) + \rho\left( g(x+h) - g(x) \right) \]
where ρ denotes the remainder of the differential of f at g(x). Consider the linear operator df(g(x)). By Lemma 899, there exists k > 0 such that ‖df(g(x))(h)‖ ≤ k‖h‖ for each h ∈ Rᵐ. Since g(x+h) − g(x) ∈ Rᵐ for each h ∈ Rⁿ, we therefore have ‖df(g(x))(g(x+h) − g(x))‖ ≤ k‖g(x+h) − g(x)‖. On the other hand, f is differentiable at g(x), and so
\[ \lim_{k\to 0} \frac{\|\rho(k)\|}{\|k\|} = 0 \tag{27.49} \]
Fix ε > 0. By (27.49), there exists δ_ε > 0 such that ‖k‖ ≤ δ_ε implies ‖ρ(k)‖/‖k‖ ≤ ε. In other words, there exists δ_ε > 0 such that ‖g(x+h) − g(x)‖ ≤ δ_ε implies
\[ \frac{\left\| \rho\left( g(x+h) - g(x) \right) \right\|}{\left\| g(x+h) - g(x) \right\|} \le \varepsilon \]
On the other hand, since g is continuous at x, there exists δ₁ > 0 such that ‖h‖ ≤ δ₁ implies ‖g(x+h) − g(x)‖ ≤ δ_ε. Therefore, for ‖h‖ sufficiently small we have ‖ρ(g(x+h) − g(x))‖ ≤ ε‖g(x+h) − g(x)‖. By applying Lemma 899 to the linear operator dg(x), there exists k' > 0 such that, for ‖h‖ small enough,
\[ \|g(x+h) - g(x)\| \le \left( k' + 1 \right) \|h\| \]
Since ε was fixed arbitrarily, it can be taken as small as we like. Therefore:
\[ \lim_{h\to 0} \frac{\left\| f(g(x+h)) - f(g(x)) - df(g(x))\left( dg(x)(h) \right) \right\|}{\|h\|} = 0 \]
as desired.

¹³ The reader can also check that the partial derivatives of a homogeneous function of degree α are homogeneous of degree α − 1.
Chapter 28

Differential methods
and so
\[ f'_{|U}(x_0) = \lim_{h\to 0} \frac{f(x_0+h) - f(x_0)}{h} \]
We can therefore consider directly the limit
\[ \lim_{h\to 0} \frac{f(x_0+h) - f(x_0)}{h} \]
and say that its value, denoted by f'(x₀), is the derivative of f at the interior point x₀ if it exists and is finite.

In sum, derivability and differentiability are local notions that use only the properties of the function in a neighborhood, however small, of the point at hand. They can therefore be defined at any interior point of any set.
\[ f'(\hat{x}) = 0 \tag{28.1} \]
Proof Let x̂ ∈ C be an interior point and a local maximizer on C (a similar argument holds if it is a local minimizer). There exists therefore B_ε(x̂) such that (22.30) holds, that is, f(x̂) ≥ f(x) for every x ∈ B_ε(x̂) ∩ C. For every h > 0 sufficiently small, that is, h ∈ (0, ε), we have x̂ + h ∈ B_ε(x̂). Hence
\[ \frac{f(\hat{x} + h) - f(\hat{x})}{h} \le 0 \qquad \forall h \in (0, \varepsilon) \]
which implies
\[ \lim_{h\to 0} \frac{f(\hat{x} + h) - f(\hat{x})}{h} = \lim_{h\to 0^+} \frac{f(\hat{x} + h) - f(\hat{x})}{h} \le 0 \tag{28.2} \]
where the limits exist and are equal because f is differentiable at x̂.

On the other hand, for every h < 0 sufficiently small, that is, h ∈ (−ε, 0), we have x̂ + h ∈ B_ε(x̂). Therefore,
\[ \frac{f(\hat{x} + h) - f(\hat{x})}{h} \ge 0 \qquad \forall h \in (-\varepsilon, 0) \]
which implies
\[ \lim_{h\to 0} \frac{f(\hat{x} + h) - f(\hat{x})}{h} = \lim_{h\to 0^-} \frac{f(\hat{x} + h) - f(\hat{x})}{h} \ge 0 \tag{28.3} \]
where, again, the limits exist and are equal because f is differentiable at x̂.

Together, inequalities (28.2) and (28.3) imply
\[ f'(\hat{x}) = \lim_{h\to 0} \frac{f(\hat{x} + h) - f(\hat{x})}{h} = 0 \]
as desired.
there (otherwise, an infinitesimal decrease in $x$ would be beneficial). Thus, the derivative, if it exists, must be zero.¹
The first-order condition (28.1) will turn out to be key in solving optimization problems, hence the important instrumental role of local extremal points. Conceptually, it tells us that in order to maximize (or minimize) an objective function we need to consider what happens at the margin: a point cannot be a maximizer if there is still room for improvement through infinitesimal changes, be they positive or negative. At a maximizer, all marginal opportunities must have been exhausted.
The fundamental principle highlighted by the first-order condition is that, to maximize levels of utility (or of production, or of welfare, and so on), one needs to work at the margin. In economics, the understanding of this principle was greatly facilitated by a proper mathematical formalization of the optimization problem that made it possible to rely on differential calculus, so, on the shoulders of the giants who created it. What becomes crystal clear through calculus is highly non-trivial otherwise, in particular if we just use a purely literary analysis. Only in the 1870s was the marginal principle fully understood, becoming the heart of the marginalist theory of value pioneered by Jevons, Menger, and Walras. This approach has continued to evolve since then (at first with the works of Edgeworth, Marshall, and Pareto) and, over the years, has shown a surprising ability to shed light on economic phenomena. In all this, the first-order condition and its generalizations (momentarily we will see its multivariable version) is, like Shakespeare's Julius Caesar, the colossus that bestrides the economics world.
That said, let us continue with the analysis of Fermat's Theorem. It is important to focus on the following aspects:
(i) The hypothesis that $\hat{x}$ is an interior point of $C$ is essential for Fermat's Theorem. Indeed, consider for example $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x$, and let $C = [0, 1]$. The boundary point $x = 0$ is a global minimizer of $f$ on $[0, 1]$, but $f'(0) = 1 \ne 0$. In the same way, the boundary point $x = 1$ is a maximizer, but $f'(1) = 1 \ne 0$. Therefore, if $x$ is a boundary local extremal point, it is not necessarily true that $f'(x) = 0$.
(ii) Fermat's Theorem cannot be applied to functions that, even if they have interior maximizers or minimizers, are not differentiable at these points. A classic example is the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = |x|$: the point $x = 0$ is a global minimizer, but $f$ does not admit a derivative at that point, so the condition $f'(x) = 0$ is not relevant in this case. Another example is the following.

¹ This heuristic argument can also be articulated as follows. Since $f$ is derivable at $x_0$, we have $f(x_0 + h) - f(x_0) = f'(x_0)h + o(h)$. Heuristically, we can set $f(x_0 + h) - f(x_0) = f'(x_0)h$ by neglecting the term $o(h)$. If $f'(x_0) > 0$, we have $f(x_0 + h) > f(x_0)$ if $h > 0$, so a strict increase is strictly beneficial; if $f'(x_0) < 0$, we have $f(x_0 + h) > f(x_0)$ if $h < 0$, so a strict decrease is strictly beneficial. Only if $f'(x_0) = 0$ can such strictly beneficial variations not occur, so $f$ may be maximized at $x_0$.
Example 1301 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = \sqrt[3]{(x^2 - 5x + 6)^2}$, with graph

[Figure: graph of $f$, with zeros at $x = 2$ and $x = 3$ and a hump at $x = 5/2$]

We have
$$f'(x) = \frac{2}{3}\left(x^2 - 5x + 6\right)^{-\frac{1}{3}}(2x - 5) = \frac{2(2x - 5)}{3\sqrt[3]{x^2 - 5x + 6}}$$
and so it does not exist where $x^2 - 5x + 6$ is zero, that is, at the two minimizers! The point $x = 5/2$ is such that $f'(x) = 0$ and is a local maximizer (being unbounded above, this function has no global maximizers). N
(iii) Lastly, the condition $f'(x) = 0$ is only necessary. The following simple example should not leave any doubt on this. Consider the cubic function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^3$, with graph

[Figure: graph of the cubic function]

We have $f'(0) = 0$, although the origin $x_0 = 0$ is neither a local maximizer nor a local minimizer.² Condition (28.1) is therefore necessary, but not sufficient, for a point to be a local extremum. N
We now address the multivariable version of Fermat's Theorem. In this case the first-order condition (28.1) takes the more general form (28.4), in which gradients replace derivatives: if $\hat{x} \in \operatorname{int} C$ is a local extremal point at which $f$ is differentiable, then
$$\nabla f(\hat{x}) = 0 \qquad (28.4)$$
We leave the proof to the reader. Indeed, mutatis mutandis, it is the same as that of Fermat's Theorem.³
The observations (i)-(iii), just made for the scalar case, continue to hold in the multivariable case. In particular, as in the scalar case, the first-order condition is necessary, but not sufficient, as the next example shows.
The unique solution of this system is $(0, 0)$, which in turn is the unique point in $\mathbb{R}^2$ where $f$ satisfies condition (28.4). It is easy to see that this point is neither a maximizer nor a minimizer. Indeed, at any point $(0, x_2)$ different from the origin on the vertical axis and at any point $(x_1, 0)$ different from the origin on the horizontal axis, the function takes values of opposite signs. In every neighborhood of the point $(0, 0)$ there are, therefore, both points in which the function is strictly positive and points in which it is strictly negative: as we can see from the figure

[Figure: surface plot of $f$ near the origin]

the origin $(0, 0)$ is a "saddle" point of $f$, which is neither a maximizer nor a minimizer. N
Next we introduce a classification of points that will play a key role in our analysis.

Definition 1305 A point $x \in A$ such that $\nabla f(x) = 0$ (i.e., for $n = 1$, a point such that $f'(x) = 0$) is said to be a critical or stationary point of $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$.

Throughout the book we use interchangeably the terms critical and stationary. Using this terminology, Theorem 1303 can be paraphrased as saying that a necessary condition for an interior point $x$ to be a local minimizer or maximizer is to be stationary.

Example 1306 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = 10x^3(x - 1)^2$. The first-order condition (28.1) becomes
$$10x^2(x - 1)(5x - 3) = 0$$
The points that satisfy it are $x = 0$, $x = 1$ and $x = 3/5$. They are the stationary points of $f$. N
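The first-order condition of Example 1306 can also be solved symbolically (a minimal sketch, assuming SymPy; this is just an illustration, not part of the text):

import sympy as sp

x = sp.symbols('x')
f = 10*x**3*(x - 1)**2
stationary = sp.solve(sp.diff(f, x), x)   # roots of f'(x) = 0
print(stationary)                         # [0, 3/5, 1]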
$$\begin{cases} 4x_1 - 3 + x_2 = 0 \\ 2x_2 - 3 + x_1 = 0 \end{cases}$$
It is easy to see that $x = (3/7, 9/7)$ is the unique solution of the system, so it is the unique stationary point of $f$. N

that is,
$$\begin{cases} x_1^3 = x_2 \\ x_2^3 = x_1 \end{cases}$$
The stationary points are $(0, 0)$, $(1, 1)$, and $(-1, -1)$. Among them we have to look for the possible solutions of the unconstrained optimization problem.⁴ N

⁴ Recall that in Section 22.1 optimization problems were called unconstrained when $C$ is open.
Rolle's Theorem states that if $f : [a, b] \to \mathbb{R}$ is continuous on $[a, b]$, differentiable on $(a, b)$, and such that $f(a) = f(b)$, then there exists $\hat{x} \in (a, b)$ with $f'(\hat{x}) = 0$. Graphically:

[Figure: a function with $f(a) = f(b)$ and a horizontal tangent at an interior point $c$]
Proof By Weierstrass' Theorem, there exist $x_1, x_2 \in [a, b]$ such that $f(x_1) = \min_{x \in [a,b]} f(x)$ and $f(x_2) = \max_{x \in [a,b]} f(x)$. Denote $m = \min_{x \in [a,b]} f(x)$ and $M = \max_{x \in [a,b]} f(x)$. If $m = M$, then $f$ is constant, that is, $f(x) = m = M$, and therefore $f'(x) = 0$ for every $x \in (a, b)$. If $m < M$, then at least one of the points $x_1$ and $x_2$ is interior to $[a, b]$. Indeed, they cannot both be boundary points because $f(a) = f(b)$. If $x_1$ is an interior point of $[a, b]$, that is, $x_1 \in (a, b)$, then by Fermat's Theorem we have $f'(x_1) = 0$, so $\hat{x} = x_1$. Analogously, if $x_2 \in (a, b)$, we have $f'(x_2) = 0$, and therefore $\hat{x} = x_2$.
Example 1310 Let $f : [-1, 1] \to \mathbb{R}$ be given by $f(x) = \sqrt{1 - x^2}$. This function is continuous on $[-1, 1]$ and differentiable on $(-1, 1)$. Since $f(-1) = f(1) = 0$, by Rolle's Theorem there exists a critical point $\hat{x} \in (-1, 1)$, that is, a point such that $f'(\hat{x}) = 0$. In particular, from $f'(x) = -x\left(1 - x^2\right)^{-\frac{1}{2}}$ it follows that this point is $\hat{x} = 0$. N
Given a function $f : [a, b] \to \mathbb{R}$, consider the points $(a, f(a))$ and $(b, f(b))$ of its graph. The straight line passing through these points has equation
$$y = f(a) + \frac{f(b) - f(a)}{b - a}(x - a) \qquad (28.6)$$
as the reader can verify by solving the system
$$\begin{cases} f(a) = ma + q \\ f(b) = mb + q \end{cases}$$
This straight line plays a key role in the important Mean Value (or Lagrange's) Theorem, which we now state and prove.

Theorem 1311 (Mean Value) Let $f : [a, b] \to \mathbb{R}$ be continuous on $[a, b]$ and differentiable on $(a, b)$. Then, there exists $\hat{x} \in (a, b)$ such that
$$f'(\hat{x}) = \frac{f(b) - f(a)}{b - a} \qquad (28.7)$$

Rolle's Theorem is the special case in which $f(a) = f(b)$, so that condition (28.7) becomes $f'(\hat{x}) = 0$.
Note that
$$\frac{f(b) - f(a)}{b - a}$$
is the slope of the straight line (28.6) passing through the points $(a, f(a))$ and $(b, f(b))$ of the graph of $f$, while $f'(x)$ is the slope of the straight line tangent to the graph of $f$ at the point $(x, f(x))$. The Mean Value Theorem establishes, therefore, a simple sufficient condition for the existence of a point $\hat{x} \in (a, b)$ such that the straight line tangent at $(\hat{x}, f(\hat{x}))$ is parallel to the straight line passing through the points $(a, f(a))$ and $(b, f(b))$. Graphically:

[Figure: a secant line through $(a, f(a))$ and $(b, f(b))$ and a parallel tangent line at an interior point $c$]
Note that the increment $f(b) - f(a)$ on the whole interval $[a, b]$ can be written, thanks to the Mean Value Theorem, as
$$f(b) - f(a) = f'(\hat{x})(b - a)$$

Proof Consider the auxiliary function $g : [a, b] \to \mathbb{R}$ defined by
$$g(x) = f(x) - \left(f(a) + \frac{f(b) - f(a)}{b - a}(x - a)\right)$$
It is the difference between $f$ and the straight line passing through the points $(a, f(a))$ and $(b, f(b))$. The function $g$ is continuous on $[a, b]$ and differentiable on $(a, b)$. Moreover, $g(a) = g(b) = 0$. By Rolle's Theorem, there exists $\hat{x} \in (a, b)$ such that $g'(\hat{x}) = 0$. But
$$g'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}$$
and therefore
$$f'(\hat{x}) - \frac{f(b) - f(a)}{b - a} = 0$$
That is, $\hat{x}$ satisfies condition (28.7).
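The mean-value point $\hat{x}$ can be located numerically (a minimal sketch, assuming SciPy; the function $f(x) = x^3$ on $[0, 2]$ is an illustrative choice): we solve $f'(x) = (f(b) - f(a))/(b - a)$ with a root finder.

import numpy as np
from scipy.optimize import brentq

f = lambda x: x**3
df = lambda x: 3*x**2
a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)              # slope of the secant, = 4

x_hat = brentq(lambda x: df(x) - slope, a, b)
print(x_hat, np.isclose(df(x_hat), slope))   # 2/sqrt(3) ~ 1.1547, True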
A first interesting application of the Mean Value Theorem shows that constant functions are characterized by having a zero derivative at every point.

Corollary 1312 Let $f : [a, b] \to \mathbb{R}$ be continuous on $[a, b]$ and differentiable on $(a, b)$. Then $f'(x) = 0$ for every $x \in (a, b)$ if and only if $f$ is constant, that is, if and only if there exists $k \in \mathbb{R}$ such that
$$f(x) = k \qquad \forall x \in [a, b]$$

Proof Let us prove the "only if", since the "if" is the simple property of derivatives seen in Example 1211. Let $x \in (a, b]$ and let us apply the Mean Value Theorem on the interval $[a, x]$. It yields a point $\hat{x} \in (a, x)$ such that
$$0 = f'(\hat{x}) = \frac{f(x) - f(a)}{x - a}$$
that is, $f(x) = f(a)$. Since $x$ is any point in $(a, b]$, it follows that $f(x) = f(a)$ for any $x \in [a, b]$.
This characterization of constant functions will prove important in the theory of integration. In particular, the following simple generalization of Corollary 1312 will be key.

Corollary 1313 Let $f, g : [a, b] \to \mathbb{R}$ be continuous on $[a, b]$ and differentiable on $(a, b)$. Then $f'(x) = g'(x)$ for every $x \in (a, b)$ if and only if there exists $k \in \mathbb{R}$ such that
$$f(x) = g(x) + k \qquad \forall x \in [a, b]$$

Two functions that have the same first derivative are, thus, equal up to an (additive) constant $k$.

Proof Here too we prove the "only if", the "if" being obvious. Let $h : [a, b] \to \mathbb{R}$ be the auxiliary function $h(x) = f(x) - g(x)$. We have $h'(x) = f'(x) - g'(x) = 0$ for every $x \in (a, b)$. Therefore, by Corollary 1312, $h$ is constant on $[a, b]$. That is, there exists $k \in \mathbb{R}$ such that $h(x) = k$ for every $x \in [a, b]$, so $f(x) = g(x) + k$ for every $x \in [a, b]$.
Via higher order derivatives, next we establish the ultimate version of the Mean Value Theorem.⁵

Theorem 1314 Let $f : [a, b] \to \mathbb{R}$ be $n - 1$ times continuously differentiable on $[a, b]$ and $n$ times differentiable on $(a, b)$. Then, there exists $\hat{x} \in (a, b)$ such that
$$f(b) - f(a) = \sum_{k=1}^{n-1} \frac{f^{(k)}(a)}{k!}(b - a)^k + \frac{f^{(n)}(\hat{x})}{n!}(b - a)^n \qquad (28.8)$$

The Mean Value Theorem is the special case $n = 1$ because (28.7) can be equivalently written as
$$f(b) - f(a) = f'(\hat{x})(b - a)$$
The mean-value formula (28.8) can be seen as a version of Taylor's formula, arguably the most important formula of calculus, which will be studied in detail later in the book (Chapter 29). For this reason, we call it the Lagrange-Taylor formula.
Proof Define $g : [a, b] \to \mathbb{R}$ by
$$g(x) = f(b) - f(x) - \sum_{j=1}^{n-1} \frac{f^{(j)}(x)}{j!}(b - x)^j - k\,\frac{(b - x)^n}{n!}$$
where $k$ is a scalar to be chosen momentarily. The function $g$ is continuous on $[a, b]$ and differentiable on $(a, b)$. Some algebra shows that
$$g'(x) = \frac{(b - x)^{n-1}}{(n - 1)!}\left(k - f^{(n)}(x)\right)$$
Let the scalar $k$ be such that $g(a) = 0$, i.e.,
$$k = \left(f(b) - f(a) - \sum_{j=1}^{n-1} \frac{f^{(j)}(a)}{j!}(b - a)^j\right)\frac{n!}{(b - a)^n}$$
Since $g(b) = 0$ as well, by Rolle's Theorem there exists $\hat{x} \in (a, b)$ such that $g'(\hat{x}) = 0$, that is, $f^{(n)}(\hat{x}) = k$. By the definition of $k$, this is exactly (28.8).
We close by noting that, as easily checked, there is a dual version of (28.8) involving the derivatives at the other endpoint of the interval:
$$f(a) - f(b) = \sum_{k=1}^{n-1} \frac{f^{(k)}(b)}{k!}(a - b)^k + \frac{f^{(n)}(\hat{x})}{n!}(a - b)^n \qquad (28.9)$$
where, again, $\hat{x} \in (a, b)$.

⁵ In the statement we adopt the convention that "0 times continuous differentiability" just amounts to continuity. Moreover, $f^{(0)} = f$.
Although it might be discontinuous, the derivative function still satisfies the intermediate value property of Lemma 574, as the next important result proves.

Theorem 1316 (Darboux) Let $f : [a, b] \to \mathbb{R}$ be differentiable, with $f'(a) < f'(b)$. If
$$f'(a) \le z \le f'(b)$$
then there exists $a \le c \le b$ such that $f'(c) = z$. If $f'$ is strictly increasing, such $c$ is unique.

Proof Let $f'(a) < z < f'(b)$ (otherwise the result is trivially true). Set $g(x) = f(x) - zx$. We have $g'(x) = f'(x) - z$, and therefore $g'(a) < 0$ and $g'(b) > 0$. The function $g$ is continuous on $[a, b]$ and, therefore, by Weierstrass' Theorem it has a minimizer $x_m$ on $[a, b]$. Let us prove that the minimizer $x_m$ is interior. Since $g'(a) < 0$, there exists a point $x_1 \in (a, b)$ such that $g(x_1) < g(a)$. Analogously, since $g'(b) > 0$, there exists a point $x_2 \in (a, b)$ such that $g(x_2) < g(b)$. This implies that neither $a$ nor $b$ is a minimizer of $g$ on $[a, b]$, so $x_m \in (a, b)$. By Fermat's Theorem, $g'(x_m) = 0$, that is, $f'(x_m) = z$. In conclusion, there exists $c \in (a, b)$ such that $f'(c) = z$.

As in Lemma 574, the case $f'(a) > f'(b)$ is analogous. We can thus say that, for any $z$ such that
$$\min\{f'(a), f'(b)\} \le z \le \max\{f'(a), f'(b)\}$$
there exists $a \le c \le b$ such that $f'(c) = z$. If $f'$ is strictly monotone, such $c$ is unique.
Since in general the derivative function is not continuous (so Weierstrass' Theorem cannot be invoked), Darboux's Theorem does not imply, unlike Lemma 574, a version of the Intermediate Value Theorem for the derivative function. Still, Darboux's Theorem is per se a remarkable continuity-like property of the derivative function, which implies, inter alia, that such a function can only have essential, non-removable discontinuities.
Moreover, the function is said to be (locally) strictly increasing at $x_0$ if these inequalities are all strict. For instance, the quadratic function $f(x) = x^2$ is strictly increasing at all points $x_0 > 0$ and strictly decreasing at all points $x_0 < 0$; at the origin, it is neither increasing nor decreasing. N
Proof If $f$ is increasing at $x_0$, the difference quotients of $f$ at $x_0$ are all positive (at least for $h$ sufficiently small), so their limit is $\ge 0$. If instead $f'(x_0) > 0$, the difference quotients are, at least for $h$ close to $0$, strictly positive by the Theorem on the permanence of sign. It follows that $f(x_0 + h) > f(x_0)$ for $h > 0$ and $f(x_0 + h) < f(x_0)$ for $h < 0$, with $h$ sufficiently small, so $f$ is strictly increasing at $x_0$.

Note the asymmetry between points (i) and (ii) of the previous proposition.
We might think that, if a function is monotone at each point of a set $A$, it enjoys the same type of monotonicity on the entire set $A$, i.e., globally. This is not the case. Indeed, consider the function $f(x) = -1/x$ defined on the open set $\mathbb{R} \setminus \{0\}$. It is strictly increasing at each point of its domain because $f'(x) = 1/x^2 > 0$ for every $x \ne 0$. However, it is not increasing at all because, for example, $-1 < 1$, while $f(-1) = 1 > -1 = f(1)$. Graphically:

[Figure: graph of $f(x) = -1/x$, increasing on each of its two branches but not globally]
Therefore, monotonicity at each point of a set does not imply global monotonicity (of the same type). Intuitively, this may happen because, if such a set is a union of disjoint intervals, then at each interval the function "gets back to the beginning". The next important result confirms this intuition by showing that the implication does hold when the set is an interval (so ruling out the unions of disjoint intervals just mentioned). It is the classic differential criterion of monotonicity.

Proposition 1322 Let $f : (a, b) \to \mathbb{R}$ be a differentiable function, with $a, b \in \overline{\mathbb{R}}$. Then, $f$ is (globally) increasing on $(a, b)$ if and only if $f'(x) \ge 0$ for every $x \in (a, b)$.

Because of the clause $a, b \in \overline{\mathbb{R}}$, the interval $(a, b)$ can be unbounded, for example $(a, b) = \mathbb{R}$. A dual result, with negativity of the derivative on $(a, b)$, holds for decreasing monotonicity. Note that Corollary 1312 is then a special case: $f'(x) = 0$ for every $x \in (a, b)$ is equivalent to having both $f'(x) \ge 0$ and $f'(x) \le 0$ for every $x \in (a, b)$, and therefore, being simultaneously increasing and decreasing, $f$ is constant.
Proof "Only if". Suppose that $f$ is increasing. Let $x \in (a, b)$. For every $h > 0$ we have $f(x + h) \ge f(x)$, hence
$$\frac{f(x + h) - f(x)}{h} \ge 0$$
It follows that
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0^+} \frac{f(x + h) - f(x)}{h} \ge 0$$
"If". Let $f'(x) \ge 0$ for every $x \in (a, b)$. Let $x_1, x_2 \in (a, b)$ with $x_1 < x_2$. By the Mean Value Theorem applied on $[x_1, x_2]$, there exists $\hat{x} \in (x_1, x_2)$ such that
$$f'(\hat{x}) = \frac{f(x_2) - f(x_1)}{x_2 - x_1} \qquad (28.13)$$
Since $f'(\hat{x}) \ge 0$ and $x_2 - x_1 > 0$, this shows that $f(x_2) \ge f(x_1)$.
Example 1323 (i) Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = 3x^5 + 2x^3$. Since $f'(x) = 15x^4 + 6x^2 \ge 0$ for every $x \in \mathbb{R}$, by Proposition 1322 the function is increasing. (ii) Let $f : \mathbb{R} \to \mathbb{R}$ be the quadratic function $f(x) = x^2$. We have $f'(x) = 2x$ and hence Proposition 1322 (and its analog for decreasing monotonicity) shows that $f$ is neither increasing nor decreasing on $\mathbb{R}$, and that it is increasing on $(0, +\infty)$ and decreasing on $(-\infty, 0)$, as previously argued (Example 1319). N
Next we show that the strict positivity of the derivative implies strict increasing monotonicity, thus providing a most useful differential criterion of strict monotonicity: if $f : (a, b) \to \mathbb{R}$ is differentiable with $f'(x) > 0$ for every $x \in (a, b)$, then $f$ is strictly increasing on $(a, b)$ (Proposition 1324).

Proof The proof is similar to that of Proposition 1322 and is a simple application of the Mean Value Theorem. Let $f'(x) > 0$ for every $x \in (a, b)$ and let $x_1, x_2 \in (a, b)$ with $x_1 < x_2$. By the Mean Value Theorem, there exists $c \in (x_1, x_2)$ such that
$$f'(c) = \frac{f(x_2) - f(x_1)}{x_2 - x_1} \qquad (28.14)$$
Since $f'(c) > 0$ and $x_2 - x_1 > 0$, from (28.14) it follows that $f(x_2) > f(x_1)$.

The next example shows that the converse of the last result is false, so that for the derivative of a strictly increasing function on an interval we can only say that it is $\ge 0$ (and not $> 0$).⁶
Propositions 1322 and 1324 give very useful differential criteria for the monotonicity of scalar functions (dual versions hold for decreasing monotonicity). They hold also for closed or half-closed intervals, once the derivatives at the boundary points are understood as one-sided ones.
We illustrate Proposition 1324 with an example.

Example 1326 (i) By Proposition 1324 (and its analog for decreasing monotonicity), the quadratic function $f(x) = x^2$ is strictly increasing on $(0, +\infty)$ and strictly decreasing on $(-\infty, 0)$.
(ii) By Proposition 1324, the function $f(x) = 3x^5 + 2x^3$ is strictly increasing both on $(-\infty, 0)$ and on $(0, +\infty)$. Nevertheless, the proposition cannot say anything about the strict increasing monotonicity of $f$ on $\mathbb{R}$ because $f'(0) = 0$. We can, however, check whether $f$ is strictly increasing on $\mathbb{R}$ through the definition of strict increasing monotonicity. To this end, note that $f(y) < f(0) = 0 < f(x)$ for every $y < 0 < x$, so $f$ is indeed strictly increasing on the entire real line. N
⁶ Later in the book, we will see that the converse holds under concavity (Corollary 1441).
That said, we close with a curious characterization of strict monotonicity that, in a sense, completes Proposition 1324: a differentiable increasing function $f : (a, b) \to \mathbb{R}$ is strictly increasing if and only if every subinterval of $(a, b)$ contains a point at which $f'$ is strictly positive (Proposition 1327).
Thus, it is the strict positivity at the points of an "order dense" subset of the domain $(a, b)$ that characterizes strictly increasing functions. In view of Proposition 218, for a differentiable monotone function this strict positivity amounts to injectivity.
We can revisit the last two examples in view of Proposition 1327. Indeed, by this result we can say that the cubic function and the function $f(x) = 3x^5 + 2x^3$ are both strictly increasing because their derivatives are everywhere strictly positive except at the origin.
A final twist: under continuous differentiability, the "dense" strict positivity of the derivative actually characterizes strictly increasing functions.
Proof In view of Proposition 1327, it is enough to show that $f' \ge 0$ if for every $a \le x' < x'' \le b$ there exists $x' \le z \le x''$ such that $f'(z) > 0$. Let $x \in (a, b)$. For each $n$ large enough so that $x + 1/n \in (a, b)$, there is a point $x \le z_n \le x + 1/n$ with $f'(z_n) > 0$. Since $f'$ is continuous, from $z_n \to x$ it follows that $f'(x) = \lim f'(z_n) \ge 0$. Since $x$ was arbitrarily chosen, we conclude that $f' \ge 0$.
The sufficient condition in which we are interested is based on a simple intuition: for $x_0$ to be a local maximizer, there must exist a neighborhood of $x_0$ in which the function first increases (i.e., $f'(x) > 0$ if $x < x_0$) and then, once it has reached the maximum value at $x_0$, decreases (i.e., $f'(y) < 0$ if $y > x_0$). Graphically:

[Figure: a function increasing to the left of $x_0$ and decreasing to its right]
Proposition 1329 captures this intuition: if $f$ is continuous at $x_0$ and, on a neighborhood $B_\varepsilon(x_0) \subseteq C$,
$$x < x_0 < y \implies f'(x) \ge 0 \ge f'(y) \qquad (28.15)$$
then $x_0$ is a local maximizer. If the inequalities in (28.15) are strict, the local maximizer is strong (so unique).
In a dual way, $x_0$ is a local minimizer if in (28.15) we have $f'(x) \le 0 \le f'(y)$, which is strong if $f'(x) < 0 < f'(y)$.⁷ Note that the differentiability of $f$ at $x_0$ is not required, only its continuity.

Proof Without loss of generality, assume that $B_\varepsilon(x_0) = (x_0 - \varepsilon, x_0 + \varepsilon) \subseteq C$. Let $x \in (x_0 - \varepsilon, x_0)$. By the Mean Value Theorem, there exists $\xi \in (x, x_0)$ such that
$$\frac{f(x_0) - f(x)}{x_0 - x} = f'(\xi)$$
By (28.15), we have $f'(\xi) \ge 0$, from which we deduce that $f(x_0) \ge f(x)$. In a similar way, we can prove that $f(x_0) \ge f(y)$ for every $y \in (x_0, x_0 + \varepsilon)$. So, $f(x_0) \ge f(x)$ for every $x \in B_\varepsilon(x_0)$ and therefore $x_0$ is a local maximizer.
In particular, the following classic corollary of Proposition 1329 holds. Though weaker, in many cases it is good enough.

⁷ In particular, if in (28.15) we have $f'(x) = 0 = f'(y)$, the point $x_0$ is simultaneously a local maximizer and a local minimizer, that is, the function $f$ is locally constant at $x_0$.
If the inequalities in (28.16) are strict, the local maximizer is strong (so unique).

Example 1332 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = -|x|$ and take $x_0 = 0$. The function is continuous at $x_0$ and differentiable at each $x \ne 0$. We have
$$f'(x) = \begin{cases} 1 & \text{if } x < 0 \\ -1 & \text{if } x > 0 \end{cases}$$
and hence (28.15) is satisfied in a strict sense. By Proposition 1329, $x_0$ is a strong local maximizer. Note that in this case Corollary 1330 cannot be applied. N
The previous sufficient condition can be substantially simplified if we assume that the function is twice continuously differentiable. In this case, it is indeed sufficient to evaluate the sign of the second derivative at the point: by Corollary 1333, if $f'(x_0) = 0$ and $f''(x_0) < 0$, then $x_0$ is a strong local maximizer.

Proof Thanks to the continuity of $f''$ at $x_0$, we have $\lim_{x \to x_0} f''(x) = f''(x_0) < 0$. The Theorem on the permanence of sign implies the existence of a neighborhood $B_\varepsilon(x_0)$ such that $f''(x) < 0$ for every $x \in B_\varepsilon(x_0)$. Hence, by Proposition 1324, the first derivative $f'$ is strictly decreasing in $B_\varepsilon(x_0)$, so that (28.15) holds in a strict sense and Proposition 1329 applies.

Example 1334 Going back to Example 1331, in view of Corollary 1333 it is actually sufficient to observe that $f''(0) = -2 < 0$ to conclude that $x_0 = 0$ is a strong local maximizer. Instead, Corollary 1333 cannot be applied to Example 1332 because $f(x) = -|x|$ is not differentiable at $x_0 = 0$. N
The next example shows that the condition $f''(x_0) < 0$ is sufficient, but not necessary: there exist local maximizers $x_0$ for which we do not have $f''(x_0) < 0$.
28.5.2 Searching local extremal points via first and second-order conditions
Let $x_0$ be an interior point of $C$, that is, $x_0 \in \operatorname{int} C$. In view of Corollary 1333, we can say that:
(iii) $f'(x_0) = 0$ and $f''(x_0) \ge 0$ does not exclude that $x_0$ is a local maximizer;
(iv) $f'(x_0) = 0$ and $f''(x_0) \le 0$ does not exclude that $x_0$ is a local minimizer.
Summing up (Corollary 1336):
(i) a necessary condition for a point $x_0 \in \operatorname{int} C$ to be a local maximizer is that $f'(x_0) = 0$ and $f''(x_0) \le 0$;
(ii) a sufficient condition for a point $x_0 \in \operatorname{int} C$ to be a strong local maximizer is that $f'(x_0) = 0$ and $f''(x_0) < 0$.
Intuitively, if $f'(x_0) = 0$ and $f''(x_0) < 0$, the derivative function $f'$ at $x_0$ is zero and strictly decreasing (because its derivative $f''$ is strictly negative): therefore it goes, being zero at $x_0$, from positive values to negative ones. Hence, the function is increasing before $x_0$, stationary at $x_0$, and decreasing after $x_0$. It follows that $x_0$ is a maximizer.⁸ A similar intuition holds for the necessity part.
As it should be clear by now, (i) is a necessary but not sufficient condition, while (ii) is a sufficient but not necessary condition. It is an unavoidable asymmetry which we have to live with.
Terminology The conditions on the second derivatives of the last corollary are called second-order conditions. In particular:
(i) the inequality $f''(x_0) \le 0$ (resp., $f''(x_0) \ge 0$) is called the second-order necessary condition for a maximizer (resp., for a minimizer);
(ii) the inequality $f''(x_0) < 0$ (resp., $f''(x_0) > 0$) is called the second-order sufficient condition for a maximizer (resp., for a minimizer).
The interest of Corollary 1336 lies in allowing us to establish a procedure for the search of local maximizers and minimizers on $C$ of a twice-differentiable function $f : A \subseteq \mathbb{R} \to \mathbb{R}$. Though it will be considerably refined in Section 29.2.2, it is often good enough.
Suppose that $f$ is twice continuously differentiable on the set of the interior points $\operatorname{int} C$ of $C$. The procedure has two stages, based on the first and second-order sufficient conditions. Specifically:

⁸ Alternatively, at $x_0$ the function $f$ is stationary and concave (see below), so it admits a maximizer.
1. Determine the set $S \subseteq \operatorname{int} C$ of the stationary interior points of $f$; in other words, solve the first-order condition $f'(x) = 0$.
2. Compute $f''$ at each of the stationary points $x \in S$ and check the second-order sufficient conditions: the point $x$ is a strong local maximizer if $f''(x) < 0$, while it is a strong local minimizer if $f''(x) > 0$. If $f''(x) = 0$ the procedure fails.
The procedure is based on Corollary 1336-(ii). The first stage, i.e., the solution of the first-order condition, is based on Fermat's Theorem: stationary points are the only interior points that are possible candidates for local extremal points. Hence, the knowledge acquired in the first stage is "negative": it rules out all the interior points that are not stationary, as none of them can be a local maximizer or minimizer.
The second stage, i.e., the check of the second-order condition, examines one by one the possible candidates from the first stage to see if they meet the sufficient condition established in Corollary 1336-(ii).
Example 1337 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = 10x^3(x - 1)^2$ and $C = \mathbb{R}$. Via the procedure, we search the local extremal points of $f$ on $\mathbb{R}$. We have $C = \operatorname{int} C = \mathbb{R}$ and $f$ is twice continuously differentiable on $\mathbb{R}$. As to stage 1, by recalling what we saw in Example 1306, we have:
$$S = \{0, 1, 3/5\}$$
The stationary points in $S$ are the unique candidates for local extremal points. As to stage 2, we have
$$f''(x) = 60x(x - 1)^2 + 120x^2(x - 1) + 20x^3$$
and therefore $f''(0) = 0$, $f''(1) > 0$ and $f''(3/5) < 0$. Hence, the point $1$ is a strong local minimizer, the point $3/5$ is a strong local maximizer, while the nature of the point $0$ remains undetermined. N
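The two stages of the procedure can be mirrored in a few lines of symbolic code (a minimal sketch, assuming SymPy, for the function of Example 1337):

import sympy as sp

x = sp.symbols('x')
f = 10*x**3*(x - 1)**2
fp, fpp = sp.diff(f, x), sp.diff(f, x, 2)

for s in sp.solve(fp, x):          # stage 1: S = {0, 3/5, 1}
    curv = fpp.subs(x, s)          # stage 2: sign of f'' at each point
    if curv < 0:
        print(s, 'strong local maximizer')
    elif curv > 0:
        print(s, 'strong local minimizer')
    else:
        print(s, 'undetermined')   # the procedure fails here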
The procedure, although very useful, has important limitations. First of all, it can deal only with the interior points of $C$ at which $f$ is twice continuously differentiable. It is, instead, completely silent on the other points of $C$, that is, on its boundary points as well as on its interior points at which $f$ is not twice continuously differentiable.
The boundary points $0$ and $1$ are maximizers, but the procedure is not able to recognize them as such. N
A further limitation of the procedure is its indeterminacy in the case $f''(x) = 0$, as the simple function $f(x) = x^4$ most eloquently shows: whether or not the stationary point $x = 0$ is a local minimizer cannot be determined through the procedure because $f''(0) = 0$. Let us see another example which is as trivial as disconcerting (for the procedure's self-esteem).
28.5.3 Searching global extremal points via first and second-order conditions
We can apply what we just learned to the unconstrained optimization problem (28.5), refining for the scalar case the analysis of Section 28.1.3. So, consider the unconstrained optimization problem
$$\max_x f(x) \quad \text{sub} \quad x \in C$$
1. Determine the set $S \subseteq C$ of the stationary interior points of $f$ by solving the first-order condition $f'(x) = 0$.
Note that the procedure is not conclusive because a key piece of information is lacking: whether the problem actually admits a solution. The differential methods of this chapter do not ensure the existence of a solution, which only Weierstrass' and Tonelli's Theorems are able to guarantee (in the absence of concavity properties of the objective functions). In Chapter 37, we will show how the elimination method refines, in a resolutive way, the procedure that we outlined here by combining such existence theorems with the differential methods.
Example 1340 As usual, the study of the cubic function $f(x) = x^3$ is of illuminating simplicity: though the unconstrained optimization problem
$$\max_x x^3 \quad \text{sub} \quad x \in \mathbb{R}$$
does not admit solutions, the procedure nevertheless determines the singleton $S = \{0\}$. According to the procedure, the point $0$ is the unique candidate solution of the problem: unfortunately, the solution does not exist and it is, therefore, a useless candidacy. N
It is important to observe how the global nature of the solution gives a different perspective on Corollary 1336. Of this result, we are now interested in point (i), which provides a necessary condition for local maximizers (the second-order necessary condition $f''(x) \le 0$). In the previous search for local extremal points we considered, instead, point (ii) of that result, which covers sufficiency (the second-order sufficient condition $f''(x) < 0$).
From the "global" point of view, the fact that $f''(x) < 0$ implies that $x$ is a strong local maximizer is of secondary importance. Indeed, it is not conclusive: the point could be just a local maximizer and, moreover, we could also have solutions where $f''(x) = 0$.⁹ In contrast, the information $f''(x) > 0$ is conclusive in that it excludes, ipso facto, that $x$ may be a solution.
This is another example of how the global point of view, the one in which we are really interested in applications, can lead us to view things in a different way relative to a local point of view.¹⁰
Naturally, $x < x_0 < y$ implies $f'(x) \le 0 \le f'(y)$ is the dual version of (28.17) that leads to global minimizers.

Proof Let $x \in (a, b)$ be such that $x < x_0$. Fixing any $\varepsilon \in (x_0 - x, x_0 - a)$, it follows that $x \in (x_0 - \varepsilon, x_0)$. By the Mean Value Theorem there exists $\xi \in (x, x_0)$ such that
$$\frac{f(x_0) - f(x)}{x_0 - x} = f'(\xi)$$
By (28.17), we have $f'(\xi) \ge 0$, so $f(x_0) \ge f(x)$. Similarly, $f(x_0) \ge f(y)$ for every $y > x_0$, and we conclude that $x_0$ is a global maximizer.
Despite being attractive because of its simplicity, the global hypothesis (28.17) on derivatives is less relevant than one may think prima facie because in applications it is typically subsumed by concavity. Indeed, under concavity the first derivative (if it exists) is decreasing (cf. Corollary 1426), so condition (28.17) automatically holds provided the first-order condition $f'(x_0) = 0$ holds. Though condition (28.17) can be used to find the maximizers of functions that are not concave (e.g., in Section 36.4 we will apply it to the Gaussian function, which is neither concave nor convex), it is much more convenient to consider a general property of a function, like concavity, that does not require, a priori, the identification of a point $x_0$ on which to check (28.17). All this explains the brevity of this section (and its title). The role of concavity, instead, will be studied at length later in the book.
Theorem 1345 (de l'Hospital) Let $f, g : (a, b) \to \mathbb{R}$ be differentiable on $(a, b)$, with $a, b \in \overline{\mathbb{R}}$ and $g'(x) \ne 0$ for every $x \in (a, b)$, and let $x_0 \in [a, b]$, with
$$\lim_{x \to x_0} \frac{f'(x)}{g'(x)} = L \in \overline{\mathbb{R}} \qquad (28.18)$$
If either $\lim_{x \to x_0} f(x) = \lim_{x \to x_0} g(x) = 0$ or $\lim_{x \to x_0} f(x) = \lim_{x \to x_0} g(x) = \pm\infty$, then
$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = L$$

Thus, de l'Hospital's rule says that, under the hypotheses just indicated, we have
$$\lim_{x \to x_0} \frac{f'(x)}{g'(x)} = L \implies \lim_{x \to x_0} \frac{f(x)}{g(x)} = L$$
i.e., the calculation of the limit $\lim_{x \to x_0} f(x)/g(x)$ can be reduced to the calculation of the limit of the ratio of the derivatives $\lim_{x \to x_0} f'(x)/g'(x)$. The simpler the second limit compared to the original one, the greater the usefulness of the rule.
Note that the, by now usual, clause $a, b \in \overline{\mathbb{R}}$ allows the interval $(a, b)$ to be unbounded. The rule holds, therefore, also for limits as $x \to \pm\infty$. Moreover, it applies also to one-sided limits, even if for brevity we have omitted this case in the statement.
¹¹ The result is actually due to Johann Bernoulli.
We omit the proof of de l'Hospital's Theorem. Next we illustrate the rule with some examples.

Example 1347 Let $f, g : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = \sin x$ and $g(x) = x$. Set $x_0 = 0$ and consider the classic limit $\lim_{x \to x_0} f(x)/g(x)$. In every interval $(-\varepsilon, \varepsilon)$ the hypotheses of de l'Hospital's rule are satisfied, so
$$\lim_{x \to 0} \frac{\sin x}{x} = \lim_{x \to 0} \frac{\cos x}{1} = 1$$
It is nice to see how de l'Hospital's rule solves, in a simple way, this classic limit. N
The next example shows that for the solution of some limits it may be necessary to apply de l'Hospital's rule several times. Consider, for instance, $f, g : \mathbb{R} \to \mathbb{R}$ given by $f(x) = e^x$ and $g(x) = x^2$, and the limit $\lim_{x \to +\infty} f(x)/g(x)$. In every interval $(a, +\infty)$, with $a > 0$, the hypotheses of de l'Hospital's rule are satisfied, so
$$\lim_{x \to +\infty} \frac{f'(x)}{g'(x)} = \lim_{x \to +\infty} \frac{e^x}{2x} = \frac{1}{2}\lim_{x \to +\infty} \frac{e^x}{x} \implies \lim_{x \to +\infty} \frac{f(x)}{g(x)} = \lim_{x \to +\infty} \frac{e^x}{x^2} = \frac{1}{2}\lim_{x \to +\infty} \frac{e^x}{x} \qquad (28.19)$$
obtaining a simpler limit, but one still not solved.
Let us apply again de l'Hospital's rule, now to the derivative functions $f', g' : \mathbb{R} \to \mathbb{R}$ given by $f'(x) = e^x$ and $g'(x) = 2x$. Again, in every interval $(a, +\infty)$, with $a > 0$, the hypotheses of the rule are satisfied, so
$$\lim_{x \to +\infty} \frac{f''(x)}{g''(x)} = \lim_{x \to +\infty} \frac{e^x}{2} = +\infty \implies \lim_{x \to +\infty} \frac{f'(x)}{g'(x)} = \lim_{x \to +\infty} \frac{e^x}{2x} = +\infty$$
In conclusion,
$$\lim_{x \to +\infty} \frac{f(x)}{g(x)} = \lim_{x \to +\infty} \frac{e^x}{x^2} = +\infty$$
N
Example 1350 In a similar way it is possible to calculate the limit of the ratio between $f(x) = 1 - \cos x$ and $g(x) = x^2$ as $x \to 0$:
$$\lim_{x \to 0} \frac{1 - \cos x}{x^2} = \lim_{x \to 0} \frac{\sin x}{2x} = \lim_{x \to 0} \frac{\cos x}{2} = \frac{1}{2}$$
N

On the other hand, a mechanical application of the rule may backfire. For the ratio of $f(x) = e^{x^2}$ and $g(x) = e^x$ as $x \to +\infty$, the rule gives $\lim_{x \to +\infty} 2xe^{x^2}/e^x$, and therefore the application of de l'Hospital's rule has led to a more complicated limit than the original one. In this case, the rule is useless, while the limit can be solved very easily in a direct way:
$$\lim_{x \to +\infty} \frac{e^{x^2}}{e^x} = \lim_{x \to +\infty} e^{x^2 - x} = \lim_{x \to +\infty} e^{x(x - 1)} = +\infty$$
As usual, cogito ergo solvo: mindless mechanical arguments may well lead astray. N

In other cases the limit $\lim_{x \to x_0} f'(x)/g'(x)$ does not exist. If we tried to compute the simple limit $\lim_{x \to x_0} f(x)/g(x) = 0$ through de l'Hospital's rule, we would have used a tool that is both useless, given the simplicity of the limit, and ineffective. Again, a mechanical use of the rule can be very misleading. N
Summing up, de l'Hospital's rule is a useful tool in the computation of limits, but its usefulness must be evaluated case by case. Moreover, it is important to note that de l'Hospital's Theorem states that, if $\lim f'/g'$ exists, then $\lim f/g$ exists too, and the two limits are equal. The converse does not hold: it may happen that $\lim f/g$ exists but not $\lim f'/g'$. We have already seen an example of this, but we show two other examples, a bit more complicated.
Example 1353 Given $f(x) = x - \sin x$ and $g(x) = x + \sin x$, we have
$$\lim_{x \to \infty} \frac{f(x)}{g(x)} = \lim_{x \to \infty} \frac{x - \sin x}{x + \sin x} = \lim_{x \to \infty} \frac{1 - \frac{\sin x}{x}}{1 + \frac{\sin x}{x}} = 1$$
but
$$\lim_{x \to \infty} \frac{f'(x)}{g'(x)} = \lim_{x \to \infty} \frac{1 - \cos x}{1 + \cos x}$$
does not exist because both the numerator and the denominator oscillate between $0$ and $2$, so the ratio oscillates between $0$ and $+\infty$. N
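Numerically, this behavior is easy to see (a minimal sketch, assuming NumPy): along a grid of large $x$, the ratio $f/g$ settles near $1$ while $f'/g'$ keeps oscillating.

import numpy as np

x = np.linspace(100.0, 120.0, 7)
ratio = (x - np.sin(x)) / (x + np.sin(x))        # approaches 1
deriv_ratio = (1 - np.cos(x)) / (1 + np.cos(x))  # keeps oscillating
print(np.round(ratio, 4))
print(np.round(deriv_ratio, 4))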
Example 1354 Given $f(x) = x^2 \sin\frac{1}{x}$ and $g(x) = x$, we have
$$\lim_{x \to 0} \frac{f(x)}{g(x)} = \lim_{x \to 0} \frac{x^2 \sin\frac{1}{x}}{x} = \lim_{x \to 0} x\sin\frac{1}{x} = 0$$
But
$$\lim_{x \to 0} \frac{f'(x)}{g'(x)} = \lim_{x \to 0} \frac{2x\sin\frac{1}{x} - \cos\frac{1}{x}}{1}$$
does not exist because in the numerator the first summand tends to $0$ and the second one does not admit a limit. N
Indeterminate forms of type $0 \cdot \infty$ can be handled by writing
$$\lim_{x \to x_0} f(x)g(x) = \lim_{x \to x_0} \frac{f(x)}{\frac{1}{g(x)}}$$
with $\lim_{x \to x_0} 1/g(x) = 0$, so that de l'Hospital's rule is applicable to the functions $f$ and $1/g$. If $f$ is different from zero in a neighborhood of $x_0$, we can also write
$$\lim_{x \to x_0} f(x)g(x) = \lim_{x \to x_0} \frac{g(x)}{\frac{1}{f(x)}}$$
with $\lim_{x \to x_0} 1/f(x) = \infty$. In this case, de l'Hospital's rule can be applied to the functions $g$ and $1/f$. Which one of the two possible applications of the rule is more convenient must be evaluated case by case.
For indeterminate forms of type $\infty - \infty$ we can write
$$\lim_{x \to x_0} (f(x) + g(x)) = \lim_{x \to x_0} f(x)\left(1 + \frac{g(x)}{f(x)}\right) \qquad (28.20)$$
and apply de l'Hospital's rule to the limit $\lim_{x \to x_0} g(x)/f(x)$, which has the form $\infty/\infty$. Alternatively, we can consider
$$\lim_{x \to x_0} (f(x) + g(x)) = \lim_{x \to x_0} \frac{\frac{1}{f(x)} + \frac{1}{g(x)}}{\frac{1}{f(x)g(x)}} \qquad (28.21)$$
Chapter 29

Approximation

A local approximation of a function has two key properties:
(i) the simplicity of the approximating function;
(ii) the quality of the approximation, given by the error term $o(h)$.
Intuitively, there is a tension between these two properties: the simpler the approximating function, the worse the quality of the approximation. In other terms, the simpler we desire the approximating function to be, the higher the error we may incur.
In this section we study in detail the trade-off between these two key properties when the approximating function is a general polynomial of degree $n$, not necessarily of degree $1$, i.e., a straight line like (29.1). The desideratum that we posit is that to a more complex approximating polynomial, i.e., to a polynomial with a higher degree $n$, there corresponds an improved error term of magnitude $o(h^n)$ that, as $h \to 0$, goes to zero faster than $h^n$. An increase in the complexity of the approximating polynomial should thus be compensated by an improvement in the quality of the approximation.
To formalize these ideas, we introduce polynomial expansions. Recall that a polynomial $p_n : \mathbb{R} \to \mathbb{R}$ of, at most, degree $n \ge 0$ has the form $p_n(h) = \alpha_0 + \alpha_1 h + \alpha_2 h^2 + \cdots + \alpha_n h^n$.
Definition 1357 A function $f : (a, b) \to \mathbb{R}$ admits a polynomial expansion of degree $n$ at $x_0 \in (a, b)$ if there exists a polynomial $p_n$ of degree at most $n$ such that
$$f(x_0 + h) = p_n(h) + o(h^n) \quad \text{as } h \to 0 \qquad (29.2)$$
For $n = 1$, for instance, an expansion reads
$$f(x_0 + h) = \alpha_0 + \alpha_1 h + o(h) \quad \text{as } h \to 0$$
Next we establish a key property: when they exist, polynomial expansions are unique.

Lemma 1358 A function $f : (a, b) \to \mathbb{R}$ has at most one polynomial expansion of degree $n$ at each point $x_0 \in (a, b)$.

To better understand this important lemma, we first prove it in the special quadratic case and then in full generality.

Quadratic case We want to show that $f$ has at most one quadratic expansion at each point $x_0 \in (a, b)$. Suppose that, for every $0 \ne h \in (a - x_0, b - x_0)$,¹ there exist two quadratic expansions
$$\alpha_0 + \alpha_1 h + \alpha_2 h^2 + o(h^2) = \beta_0 + \beta_1 h + \beta_2 h^2 + o(h^2) \qquad (29.4)$$
To show that they are equal we need to show that their coefficients are equal, i.e., that $\alpha_0 = \beta_0$, $\alpha_1 = \beta_1$ and $\alpha_2 = \beta_2$. To this end, we first observe that
$$\alpha_0 = \lim_{h \to 0}\left(\alpha_0 + \alpha_1 h + \alpha_2 h^2 + o(h^2)\right) = \lim_{h \to 0}\left(\beta_0 + \beta_1 h + \beta_2 h^2 + o(h^2)\right) = \beta_0$$

¹ The condition $0 \ne h \in (a - x_0, b - x_0)$ ensures that (29.2) holds for every $h \ne 0$ with $x_0 + h \in (a, b)$, i.e., for every $h \ne 0$ where $f(x_0 + h)$ is well defined.
so that (29.4) becomes
$$\alpha_1 h + \alpha_2 h^2 + o(h^2) = \beta_1 h + \beta_2 h^2 + o(h^2) \qquad (29.5)$$
Dividing both sides by $h$, we get
$$\alpha_1 + \alpha_2 h + o(h) = \beta_1 + \beta_2 h + o(h)$$
Hence,
$$\alpha_1 = \lim_{h \to 0}\left(\alpha_1 + \alpha_2 h + o(h)\right) = \lim_{h \to 0}\left(\beta_1 + \beta_2 h + o(h)\right) = \beta_1$$
In turn, this implies that (29.5) becomes
$$\alpha_2 h^2 + o(h^2) = \beta_2 h^2 + o(h^2)$$
Dividing both sides by $h^2$, we get
$$\alpha_2 + \frac{o(h^2)}{h^2} = \beta_2 + \frac{o(h^2)}{h^2}$$
By taking the limits as $h \to 0$, we get $\alpha_2 = \beta_2$. This completes the proof that the two quadratic expansions in (29.4) are equal.
Proof of Lemma 1358 Suppose that, for every $0 \ne h \in (a - x_0, b - x_0)$, there exist two expansions
$$\alpha_0 + \alpha_1 h + \alpha_2 h^2 + \cdots + \alpha_n h^n + o(h^n) = \beta_0 + \beta_1 h + \beta_2 h^2 + \cdots + \beta_n h^n + o(h^n) \qquad (29.6)$$
We want to show that they are equal, i.e., that they have equal coefficients. We begin by observing that
$$\alpha_0 = \lim_{h \to 0}\left(\alpha_0 + \alpha_1 h + \cdots + \alpha_n h^n + o(h^n)\right) = \lim_{h \to 0}\left(\beta_0 + \beta_1 h + \cdots + \beta_n h^n + o(h^n)\right) = \beta_0$$
Hence, subtracting $\alpha_0 = \beta_0$ from both sides of (29.6) and dividing by $h$,
$$\alpha_1 = \lim_{h \to 0}\left(\alpha_1 + \alpha_2 h + \cdots + \alpha_n h^{n-1} + o(h^{n-1})\right) = \lim_{h \to 0}\left(\beta_1 + \beta_2 h + \cdots + \beta_n h^{n-1} + o(h^{n-1})\right) = \beta_1$$
Iterating this argument, we conclude that $\alpha_k = \beta_k$ for every $k = 0, 1, \ldots, n$.
Definition 1359 Let $f : (a, b) \to \mathbb{R}$ be a function $n$ times differentiable at a point $x_0 \in (a, b)$. The polynomial $T_n : \mathbb{R} \to \mathbb{R}$ of degree at most $n$ given by
$$T_n(h) = f(x_0) + f'(x_0)h + \frac{1}{2}f''(x_0)h^2 + \cdots + \frac{1}{n!}f^{(n)}(x_0)h^n = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}h^k$$
is called the Taylor polynomial of degree $n$ of $f$ at $x_0$.

To ease notation we put $f^{(0)} = f$. The polynomial $T_n$ has as coefficients the derivatives of $f$ at the point $x_0$, up to order $n$. In particular, at the origin $x_0 = 0$ the Taylor polynomial is called the Maclaurin polynomial.
The next approximation result, fundamental and of great elegance, shows that when $f$ has a suitable number of derivatives at $x_0$, the unique polynomial expansion at $x_0$ is given precisely by the Taylor polynomial:² if $f : (a, b) \to \mathbb{R}$ is $n$ times differentiable at $x_0 \in (a, b)$, then
$$f(x_0 + h) = T_n(h) + o(h^n) \quad \text{as } h \to 0 \qquad (29.8)$$
that is,
$$f(x_0 + h) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}h^k + o(h^n) \quad \text{as } h \to 0 \qquad (29.9)$$
The Taylor polynomial $T_n$ is the unique polynomial of degree at most $n$ that satisfies Definition 1357, i.e., which is able to approximate $f(x_0 + h)$ with error $o(h^n)$.
For $n = 1$, the Taylor-Peano Theorem coincides with the "if" part of Theorem 1243 because
$$T_1(h) = f(x_0) + df(x_0)(h)$$
For $n = 1$, the polynomial approximation (29.9) thus reduces to the linear approximation (26.27); for $n = 2$, it gives the quadratic approximation
$$f(x_0 + h) = f(x_0) + f'(x_0)h + \frac{1}{2}f''(x_0)h^2 + o(h^2) \quad \text{as } h \to 0 \qquad (29.10)$$
and so on for higher orders.

² The formula is named after Brook Taylor, who came up with it in 1715. Approximation (29.8) was proved in 1884 by Giuseppe Peano (see annotation 67 of Genocchi and Peano, 1884, which also contains historical remarks on Taylor's formula).
Approximation (29.9) is key in applications and is the differential form of the aforementioned tension between the complexity of the approximating polynomial and the goodness of the approximation. The trade-off must be solved case by case, according to the relative importance that the two properties of the approximation, complexity and quality, have in the particular application in which we are interested. That said, in many cases the quadratic approximation (29.10) is a good compromise and so, among all the possible approximations, it has a special importance.

O.R. The linear approximation is, graphically, as by now we know very well, the straight line tangent to the graph of the function. The quadratic approximation is, instead, the parabola that shares at $x_0$ the same value of the function, the same slope (first derivative), and the same curvature (second derivative). For this reason it is called the osculating parabola.³ H
In view of the importance of the Taylor-Peano Theorem, we first prove the special quadratic case, with a simplified argument that uses a stronger hypothesis, and then prove the result in full generality and rigor.

Quadratic case Let $x_0 \in (a, b)$. Assume that $f$ is twice differentiable on the entire interval $(a, b)$, not just at $x_0$. We want to establish the quadratic approximation (29.10). Define the auxiliary function $\varphi : (a, b) \to \mathbb{R}$ by
$$\varphi(h) = f(x_0 + h) - f(x_0) - f'(x_0)h - \frac{1}{2}f''(x_0)h^2$$
We must show that
$$\lim_{h \to 0} \frac{\varphi(h)}{h^2} = 0 \qquad (29.11)$$
It holds
$$\varphi'(h) = f'(x_0 + h) - f'(x_0) - f''(x_0)h$$
As $f$ is twice differentiable, by Proposition 1244 both $f$ and $f'$ are continuous at $x_0$. Hence,
$$\lim_{h \to 0} \varphi(h) = \varphi(0) = 0 \quad \text{and} \quad \lim_{h \to 0} \varphi'(h) = \varphi'(0) = 0$$
³ From the Latin os, mouth, so it is the "kissing" parabola (where the kiss is with $f$ at $x_0$).
By de l'Hospital's rule, to prove (29.11) it is then enough to show that
$$\lim_{h \to 0} \frac{\varphi'(h)}{h} = 0$$
We have
$$\lim_{h \to 0} \frac{\varphi'(h)}{h} = \lim_{h \to 0} \frac{f'(x_0 + h) - f'(x_0) - f''(x_0)h}{h} = \lim_{h \to 0}\left(\frac{f'(x_0 + h) - f'(x_0)}{h} - f''(x_0)\right) = f''(x_0) - f''(x_0) = 0$$
as desired.
Thanks to (29.13) and (29.14), setting $\psi(h) = h^n$ we can apply de l'Hospital's rule $n - 1$ times, and get
$$\lim_{h \to 0} \frac{\varphi(h)}{\psi(h)} = \lim_{h \to 0} \frac{\varphi^{(n-1)}(h)}{\psi^{(n-1)}(h)} = L \qquad (29.15)$$
with $L \in \overline{\mathbb{R}}$. Simple calculations show that $\psi^{(n-1)}(h) = n!h$. From (29.15) it then follows that
$$\lim_{h \to 0} \frac{\varphi^{(n-1)}(h)}{h} = \lim_{h \to 0} \frac{\varphi^{(n-1)}(h)}{\psi^{(n-1)}(h)} = 0 \implies \lim_{h \to 0} \frac{\varphi(h)}{\psi(h)} = \lim_{h \to 0} \frac{\varphi(h)}{h^n} = 0$$
To prove (29.12) it thus remains to show that
$$\lim_{h \to 0} \frac{\varphi^{(n-1)}(h)}{h} = 0$$
We have
$$\lim_{h \to 0} \frac{\varphi^{(n-1)}(h)}{h} = \lim_{h \to 0} \frac{f^{(n-1)}(x_0 + h) - f^{(n-1)}(x_0) - hf^{(n)}(x_0)}{h} = \lim_{h \to 0}\left(\frac{f^{(n-1)}(x_0 + h) - f^{(n-1)}(x_0)}{h} - f^{(n)}(x_0)\right) = f^{(n)}(x_0) - f^{(n)}(x_0) = 0$$
as desired.
For instance, for $f(x) = x^4 - 3x^3$ we have
$$f'(x) = 4x^3 - 9x^2, \quad f''(x) = 12x^2 - 18x, \quad f'''(x) = 24x - 18, \quad f^{(iv)}(x) = 24$$
and so
$$\alpha_0 = f(0) = 0, \quad \alpha_1 = f'(0) = 0, \quad \alpha_2 = \frac{f''(0)}{2!} = 0, \quad \alpha_3 = \frac{f'''(0)}{3!} = -\frac{18}{6} = -3, \quad \alpha_4 = \frac{f^{(iv)}(0)}{4!} = \frac{24}{24} = 1$$
N
$$\log(1 + x_0 + h) = \log(1 + x_0) + \frac{h}{1 + x_0} - \frac{h^2}{2(1 + x_0)^2} + \frac{h^3}{3(1 + x_0)^3} + \cdots + (-1)^{n+1}\frac{h^n}{n(1 + x_0)^n} + o(h^n) = \log(1 + x_0) + \sum_{k=1}^{n} (-1)^{k+1}\frac{h^k}{k(1 + x_0)^k} + o(h^n)$$
A simple polynomial thus locally approximates the logarithmic function. In particular, the Maclaurin expansion of order $n$ of $f$ is
$$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} + \cdots + (-1)^{n+1}\frac{x^n}{n} + o(x^n) = \sum_{k=1}^{n} (-1)^{k+1}\frac{x^k}{k} + o(x^n) \qquad (29.18)$$
Example 1363 In a similar way the reader can verify the Maclaurin expansions of order $n$ of the following elementary functions:
$$e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + o(x^n) = \sum_{k=0}^{n} \frac{x^k}{k!} + o(x^n)$$
$$\sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 + \cdots + \frac{(-1)^n}{(2n+1)!}x^{2n+1} + o(x^{2n+1}) = \sum_{k=0}^{n} \frac{(-1)^k}{(2k+1)!}x^{2k+1} + o(x^{2n+1})$$
$$\cos x = 1 - \frac{1}{2}x^2 + \frac{1}{4!}x^4 + \cdots + \frac{(-1)^n}{(2n)!}x^{2n} + o(x^{2n}) = \sum_{k=0}^{n} \frac{(-1)^k}{(2k)!}x^{2k} + o(x^{2n})$$
Here too it is important to observe how these functions can be locally (well) approximated by simple polynomials. N
The function $f : (-1, +\infty) \to \mathbb{R}$ given by $f(x) = \log(1 + x^3) - 3\sin^2 x$ is infinitely differentiable at each point of its domain. Let us calculate the second-order Maclaurin expansion. We have
$$f'(x) = \frac{3x^2}{1 + x^3} - 6\cos x \sin x, \qquad f''(x) = \frac{-3x^4 + 6x}{(1 + x^3)^2} - 6\left(\cos^2 x - \sin^2 x\right)$$
So,
$$f(x) = f(0) + f'(0)x + \frac{1}{2}f''(0)x^2 + o(x^2) = -3x^2 + o(x^2) \qquad (29.19)$$
N
The next function is also infinitely differentiable at each point of its domain. We leave it to the reader to verify that its third-order Taylor expansion at $x_0 = 3$ is given by
$$f(x) = \frac{\log 4}{e^3} + 1 + \frac{5 - 4\log 4}{4e^3}(x - 3) + \frac{16\log 4 - 25}{32e^3}(x - 3)^2 + \frac{63 - 32\log 4}{192e^3}(x - 3)^3 + o\left((x - 3)^3\right)$$
N
As a first illustration of the usefulness of Taylor expansions, we show how they significantly simplify the calculation of limits. Indeed, by suitably expanding $f$ at $x_0$ we reduce the original limit to a simple limit of polynomials. We illustrate this with a couple of examples.
(i) Consider the limit
$$\lim_{x \to 0} \frac{\log(1 + x^3) - 3\sin^2 x}{\log(1 + x)}$$
Since the limit involves $x \to 0$, we can use the second-order Maclaurin expansions (29.19) and (29.18) to approximate the numerator and the denominator, respectively. Using Lemma 547 and the little-o algebra, we have
$$\lim_{x \to 0} \frac{\log(1 + x^3) - 3\sin^2 x}{\log(1 + x)} = \lim_{x \to 0} \frac{-3x^2 + o(x^2)}{x + o(x)} = \lim_{x \to 0} \frac{-3x^2}{x} = 0$$
The calculation of the limit has, therefore, been considerably simplified through the combined use of Maclaurin expansions and of the comparison of infinitesimals seen in Lemma 547.
(ii) Consider the limit
$$\lim_{x \to 0} \frac{x\sin x}{\log^2(1 + x)}$$
This limit can also be calculated by combining an expansion and a comparison of infinitesimals:
$$\lim_{x \to 0} \frac{x\sin x}{\log^2(1 + x)} = \lim_{x \to 0} \frac{x(x + o(x))}{(x + o(x))^2} = \lim_{x \to 0} \frac{x^2 + o(x^2)}{x^2 + o(x^2)} = \lim_{x \to 0} \frac{x^2}{x^2} = 1$$
N
If $f : (a, b) \to \mathbb{R}$ is $n + 1$ times differentiable, with $f^{(n+1)}$ continuous at $x_0$, then for every $h \ne 0$ with $x_0 + h \in (a, b)$ there exists $\vartheta_h \in (0, 1)$ such that
$$f(x_0 + h) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}h^k + \frac{f^{(n+1)}(x_0 + \vartheta_h h)}{(n + 1)!}h^{n+1} \qquad (29.20)$$
In particular,
$$\frac{f^{(n+1)}(x_0 + \vartheta_h h)}{(n + 1)!}h^{n+1} = o(h^n) \quad \text{as } h \to 0 \qquad (29.21)$$
Under the hypotheses of this theorem,⁴ the error term $o(h^n)$ can thus be taken equal to
$$\frac{f^{(n+1)}(x_0 + \vartheta_h h)}{(n + 1)!}h^{n+1} \qquad (29.22)$$
where the $(n + 1)$-th derivative is computed at an intermediate point $x_0 + \vartheta_h h$ between $x_0$ and $x_0 + h$. This expression allows us to better control the approximation error: if $|f^{(n+1)}(x)| \le k$ for all $x \in (a, b)$, then
$$\left|f(x_0 + h) - \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}h^k\right| \le \frac{k}{(n + 1)!}|h|^{n+1}$$
Error term (29.22) is called the Lagrange remainder, while $o(h^n)$ is called the Peano remainder. The former permits error estimates, as just remarked, but the latter is often enough to express the quality of the approximation.
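The Lagrange error estimate can be verified numerically (a minimal sketch, assuming NumPy; the choice $f = \sin$, for which all derivatives are bounded by $k = 1$, is purely illustrative):

import math
import numpy as np

def taylor_sin(h, n):
    # Maclaurin polynomial of sin of degree n
    return sum((-1)**j * h**(2*j + 1) / math.factorial(2*j + 1)
               for j in range(n // 2 + 1) if 2*j + 1 <= n)

h, n = 0.7, 5
err = abs(np.sin(h) - taylor_sin(h, n))
bound = abs(h)**(n + 1) / math.factorial(n + 1)   # k = 1 for sin
print(err <= bound, err, bound)                   # True ...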
Proof Let $0 \ne h \in (a - x_0, b - x_0)$, i.e., such that $x_0 + h \in (a, b)$. Suppose that $h > 0$ (a similar argument holds when $h < 0$). Consider the interval $[x_0, x_0 + h] \subseteq (a, b)$. By formula (28.8), we have
$$f(x_0 + h) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}h^k + \frac{f^{(n+1)}(\hat{x})}{(n + 1)!}h^{n+1}$$
for some $\hat{x} \in (x_0, x_0 + h)$. Thus, for some $0 < t < 1$ we have $\hat{x} = tx_0 + (1 - t)(x_0 + h)$, so $\hat{x} = x_0 + \vartheta h$ by setting $\vartheta = 1 - t$. We thus get (29.20). As the number $\vartheta$ depends on $h$, we write $\vartheta_h$.
So far we only needed $f$ to be $n$ times continuously differentiable. Now, the $n + 1$ times continuous differentiability at $x_0$ allows us to write
$$\lim_{h \to 0} \frac{f^{(n+1)}(x_0 + \vartheta_h h)}{(n + 1)!}\,h = 0$$
so that the Lagrange remainder is $o(h^n)$, which proves (29.21).
For the coda reader, an elegant application of what we found in this example is the following series expansion of the logarithmic function, which inter alia generalizes Proposition 407.

Corollary 1369 It holds
$$\log(1 + x) = \sum_{k=1}^{\infty} (-1)^{k+1}\frac{x^k}{k} \qquad \forall x \in (-1, 1] \qquad (29.25)$$

Proof Let $x \in [0, 1]$. In view of the last example, for each $n$ there exists $\vartheta_{x,n} \in (0, 1)$ such that
$$\left|\log(1 + x) - \sum_{k=1}^{n} (-1)^{k+1}\frac{x^k}{k}\right| = \frac{1}{1 + n}\left(\frac{x}{1 + \vartheta_{x,n}x}\right)^{n+1} \le \frac{1}{n + 1}$$
where the inequality holds because $0 \le x/(1 + \vartheta_{x,n}x) \le 1$. As $n \to +\infty$, we get (29.25). For the case $x \in (-1, 0)$ one needs the so-called Cauchy remainder (for brevity, we omit details).
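At $x = 1$, the series (29.25) gives the classic alternating series for $\log 2$; its partial sums converge at the slow rate suggested by the $1/(n + 1)$ bound in the proof (a minimal sketch):

import math

def partial(n):
    return sum((-1)**(k + 1) / k for k in range(1, n + 1))

for n in (10, 100, 1000):
    print(n, abs(partial(n) - math.log(2)))   # error of order 1/(2n)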
29.2 Omnibus
29.2.1 Omnibus proposition for local extremal points
Although for simplicity we have studied the Taylor-Peano Theorem for functions defined on intervals $(a, b)$, it holds at any interior point $x_0$ of any set $A$ where $f$ is $n$ times differentiable. This version allows us to state an "omnibus" proposition for local extremal points that includes and extends both the necessary condition $f'(x_0) = 0$ of Fermat's Theorem and the sufficient condition $f'(x_0) = 0$ and $f''(x_0) < 0$ of Corollary 1333 (see also Corollary 1336-(ii)).
Suppose that $f^{(k)}(x_0) = 0$ for every $1 \le k \le n - 1$ and $f^{(n)}(x_0) \ne 0$. Then (Proposition 1370):
(i) If $n$ is even and $f^{(n)}(x_0) < 0$, the point $x_0$ is a strong local maximizer.
(ii) If $n$ is even and $f^{(n)}(x_0) > 0$, the point $x_0$ is a strong local minimizer.
(iii) If $n$ is odd, the point $x_0$ is not a local extremal point.
For $n = 1$, point (iii) is nothing but the fundamental first-order necessary condition $f'(x_0) = 0$. Indeed, for $n = 1$, point (iii) states that if $f'(x_0) \ne 0$, then $x_0$ is not a local extremal point (i.e., neither a local maximizer nor a local minimizer). By taking the contrapositive, this amounts to saying that if $x_0$ is a local extremal point, then $f'(x_0) = 0$. Hence, (iii) extends to higher order derivatives the first-order necessary condition.
Point (i) instead, together with the hypothesis $f^{(k)}(x_0) = 0$ for every $1 \le k \le n - 1$, extends to higher order derivatives the second-order sufficient condition $f''(x_0) < 0$ for strong local maximizers. Indeed, for $n = 2$, (i) is exactly the condition $f''(x_0) < 0$. Analogously, (ii) extends the analogous condition $f''(x_0) > 0$ for minimizers.⁵
N.B. In this and in the next section we will focus on the generalization of the sufficiency point (ii) of Corollary 1336. It is possible to generalize in a similar way its necessity point (i), as readers can check. O

Proof (i). Let $n$ be even and let $f^{(n)}(x_0) < 0$. By the Taylor-Peano Theorem, from the hypothesis $f^{(k)}(x_0) = 0$ for every $1 \le k \le n - 1$ and $f^{(n)}(x_0) \ne 0$ it follows that
$$f(x_0 + h) - f(x_0) = \frac{f^{(n)}(x_0)}{n!}h^n + o(h^n) = \frac{f^{(n)}(x_0)}{n!}h^n\left(1 + \frac{o(h^n)}{h^n}\right)$$
where, by the little-o algebra, the $o(h^n)$ term has been rescaled by the nonzero constant $n!/f^{(n)}(x_0)$. Since $\lim_{h \to 0} o(h^n)/h^n = 0$, there exists $\delta > 0$ such that $|h| < \delta$ implies $|o(h^n)/h^n| < 1$. Hence,
$$h \in (-\delta, \delta) \implies 1 + \frac{o(h^n)}{h^n} > 0$$
Since $f^{(n)}(x_0) < 0$, we therefore have, because $h^n > 0$ being $n$ even,
$$f(x_0 + h) - f(x_0) < 0 \qquad \forall\, 0 < |h| < \delta$$
So, $x_0$ is a strong local maximizer. This proves (i). In a similar way we prove (ii). Finally, (iii) can be proved by adapting in a suitable way the proof of Fermat's Theorem.

⁵ Observe that, given what has been proved about the Taylor approximation, the case $n = 2$ presents an interesting improvement with respect to Corollary 1333: it is required that the function $f$ be twice differentiable on the neighborhood $B_\varepsilon(x_0)$, but $f''$ is not required to be continuous.
and therefore $f''(0)$ does not exist. Thus, Proposition 1370 cannot be applied, and so it is not able to say anything about the nature of the stationary point $x = 0$. Nevertheless, the graph of $f$ shows that the origin is not a local extremal point, since $f$ has infinitely many oscillations in any neighborhood of zero. N
Example 1373 The general version of the previous example considers the function $f : \mathbb{R} \to \mathbb{R}$ defined, for $n \ge 1$, by
$$f(x) = \begin{cases} x^n \sin\frac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$$
and shows that it does not have derivatives of order $n$ at the origin (in the case $n = 1$, this means that at the origin the first derivative does not exist). We leave to the reader the analysis of this example. N
Suppose that $f$ is twice differentiable on the interior points of $C$, that is, on $\operatorname{int} C$. The omnibus procedure consists of the following two stages:
1. Determine the set $S$ of stationary points by solving the first-order condition $f'(x) = 0$. If $S = \emptyset$ the procedure ends (we conclude that, since there are no stationary points, there are no extremal ones); otherwise we move to the next step.
2. Compute $f''$ at each point of $S$: the point $x$ is a strong local maximizer if $f''(x) < 0$ and a strong local minimizer if $f''(x) > 0$.
This is the classic procedure to find local extremal points based on first-order and second-order conditions of Section 28.5.2. The version just presented improves what we have seen there because, using again what we observed in a previous footnote, it requires only that the function have two derivatives on $\operatorname{int} C$, not necessarily continuous. However, we are still left with the other limitations discussed in Section 28.5.2.
Suppose that $f$ is infinitely differentiable on $\operatorname{int} C$. The omnibus procedure consists of the following stages:
1. Determine the set $S$ of the stationary points by solving the equation $f'(x) = 0$. If $S = \emptyset$, the procedure ends; otherwise move to the next step.
2. Compute $f''$ at each point of $S$: the point $x$ is a strong local maximizer if $f''(x) < 0$ and a strong local minimizer if $f''(x) > 0$. Call $S^{(2)}$ the subset of $S$ in which $f''(x) = 0$. If $S^{(2)} = \emptyset$, the procedure ends; otherwise move to the next step.
3. Compute $f'''$ at each point of $S^{(2)}$: if $f'''(x) \ne 0$, the point $x$ is not an extremal one. Call $S^{(3)}$ the subset of $S^{(2)}$ in which $f'''(x) = 0$. If $S^{(3)} = \emptyset$, the procedure ends; otherwise move to the next step.
4. Compute $f^{(iv)}$ at each point of $S^{(3)}$: the point $x$ is a strong local maximizer if $f^{(iv)}(x) < 0$, a strong local minimizer if $f^{(iv)}(x) > 0$. Call $S^{(4)}$ the subset of $S^{(3)}$ in which $f^{(iv)}(x) = 0$. If $S^{(4)} = \emptyset$, the procedure ends; otherwise move to the next step.
The procedure thus ends if there exists $n$ such that $S^{(n)} = \emptyset$. Otherwise, the procedure iterates ad libitum (or ad nauseam).
Example 1374 Consider again the function $f(x) = -x^4$, with $C = \mathbb{R}$. We saw in Example 1335 that for its maximizer $x_0 = 0$ it was not possible to apply the sufficient condition $f'(x_0) = 0$ and $f''(x_0) < 0$. We have, however,
$$f'(x) = -4x^3, \quad f''(x) = -12x^2, \quad f'''(x) = -24x, \quad f^{(iv)}(x) = -24$$
so that
$$S = S^{(2)} = S^{(3)} = \{0\} \quad \text{and} \quad S^{(4)} = \emptyset$$
Stage 1 identifies the set $S = \{0\}$, about which stage 2 has, however, nothing to say since $f''(0) = 0$. Also stage 3 does not add any extra information since $f'''(0) = 0$. Stage 4 instead is conclusive: since $f^{(iv)}(0) < 0$, we can assert that $x = 0$ is a strong local maximizer (actually, it is a global maximizer, but this procedure does not allow us to say this). N
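The omnibus procedure is easy to mirror symbolically (a minimal sketch, assuming SymPy; the helper classify is a hypothetical name introduced here for illustration): at each stationary point, scan higher-order derivatives until a nonzero one appears and classify by its order and sign.

import sympy as sp

x = sp.symbols('x')
f = -x**4

def classify(f, point, max_order=10):
    # scan f'', f''', ... at the point until a nonzero derivative appears
    for n in range(2, max_order + 1):
        d = sp.diff(f, x, n).subs(x, point)
        if d != 0:
            if n % 2 == 1:
                return 'not a local extremal point'
            return 'strong local maximizer' if d < 0 else 'strong local minimizer'
    return 'undetermined'

for s in sp.solve(sp.diff(f, x), x):   # S = {0}
    print(s, classify(f, s))           # 0 strong local maximizer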
Naturally, the procedure is of practical interest when it ends after a few stages.
Expression (29.28) is called the quadratic (or second-order) Taylor expansion (or formula). The polynomial in the variable $x$
$$f(x_0) + \nabla f(x_0) \cdot (x - x_0) + \frac{1}{2}(x - x_0) \cdot \nabla^2 f(x_0)(x - x_0)$$
is called the Taylor polynomial of second degree at the point $x_0$. The second-degree term is a quadratic form. Its associated matrix, the Hessian $\nabla^2 f(x)$, is symmetric by Schwarz's Theorem.
In the format (29.26) the quadratic Taylor expansion is
$$f(x_0 + h) = f(x_0) + \nabla f(x_0) \cdot h + \frac{1}{2}h \cdot \nabla^2 f(x_0)h = f(x_0) + \sum_{i=1}^{n} \frac{\partial f(x_0)}{\partial x_i}h_i + \frac{1}{2}\sum_{i=1}^{n} \frac{\partial^2 f(x_0)}{\partial x_i^2}h_i^2 + \frac{1}{2}\sum_{i=1}^{n}\sum_{j \ne i} \frac{\partial^2 f(x_0)}{\partial x_j \partial x_i}h_ih_j$$
where we have also written the expansion through sums, a version that may be useful to carry out calculations.
Naturally, if terminated at the first order the Taylor expansion reduces to (29.26) and (29.27). Moreover, observe that in the scalar case the Taylor polynomial assumes the well-known form:
$$f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2}f''(x_0)(x - x_0)^2$$

⁶ In the rest of this section $U$ is an open convex set.
As in the scalar case, here too we have a trade-off between the simplicity of the approximation and its accuracy. Indeed, the first-order approximation (29.27) has the advantage of simplicity compared to the quadratic one: we approximate with a linear function rather than with a second-degree polynomial, but to the detriment of the degree of accuracy of the approximation, given by $o(\|x - x_0\|)$ instead of the better $o(\|x - x_0\|^2)$.
Also in the multivariable case, therefore, the choice of the order at which to terminate the Taylor expansion depends on the particular use we are interested in, and on which aspect of the approximation is more important, simplicity or accuracy.
Example 1376 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by $f(x_1, x_2) = 3x_1^2e^{x_2^2}$. We have:
$$\nabla f(x) = \left(6x_1e^{x_2^2},\; 6x_1^2x_2e^{x_2^2}\right)$$
and
$$\nabla^2 f(x) = \begin{bmatrix} 6e^{x_2^2} & 12x_1x_2e^{x_2^2} \\ 12x_1x_2e^{x_2^2} & 6x_1^2e^{x_2^2}\left(1 + 2x_2^2\right) \end{bmatrix}$$
By Theorem 1375, the Taylor expansion at $x_0 = (1, 1)$ is
$$f(x) = 3e + 6e(x_1 - 1) + 6e(x_2 - 1) + 3e(x_1 - 1)^2 + 12e(x_1 - 1)(x_2 - 1) + 9e(x_2 - 1)^2 + o\left(\|x - x_0\|^2\right)$$
N
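The gradient, Hessian, and quadratic Taylor polynomial of Example 1376 can be reproduced symbolically (a minimal sketch, assuming SymPy):

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = 3*x1**2*sp.exp(x2**2)
x0 = {x1: 1, x2: 1}

grad = sp.Matrix([f.diff(x1), f.diff(x2)]).subs(x0)
hess = sp.hessian(f, (x1, x2)).subs(x0)

h = sp.Matrix([x1 - 1, x2 - 1])      # displacement from x0 = (1, 1)
T2 = f.subs(x0) + (grad.T*h)[0] + sp.Rational(1, 2)*(h.T*hess*h)[0]
print(sp.expand(T2))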
Proof of Theorem 1375 For simplicity, assume that the domain of $f$ is all $\mathbb{R}^n$. Fix a point $y \in \mathbb{R}^n$ and introduce the auxiliary functions $\psi : \mathbb{R} \to \mathbb{R}$ and $\gamma : \mathbb{R} \to \mathbb{R}^n$ defined by $\psi(t) = f(x_0 + ty)$ and $\gamma(t) = x_0 + ty$ for each $t \in \mathbb{R}$. We have $\psi(t) = f(\gamma(t))$ for every $t \in \mathbb{R}$, i.e., $\psi = f \circ \gamma$. In particular,
$$\psi(t) = \psi(0) + \psi'(0)\, t + \frac{1}{2}\psi''(0)\, t^2 + o\left(t^2\right) \tag{29.31}$$
By the chain rule, $\psi'(t) = \sum_{j=1}^{n} (\partial f/\partial x_j)(x_0 + ty)\, y_j$, so that $\psi'(0) = \nabla f(x_0) \cdot y$. Consider now the auxiliary scalar function $\varphi_i : \mathbb{R} \to \mathbb{R}$ defined by $\varphi_i(t) = (\partial f/\partial x_i)(x_0 + ty)$ for each $t \in \mathbb{R}$ and each $i = 1, \dots, n$. We have $\varphi_i = (\partial f/\partial x_i) \circ \gamma$, and so by the chain rule we have:
$$\varphi_i'(t) = \sum_{j=1}^{n} \frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_i}\right)(\gamma(t))\,\gamma_j'(t) = \sum_{j=1}^{n} \frac{\partial^2 f}{\partial x_j \partial x_i}(x_0 + ty)\, y_j$$
and therefore
$$\psi''(0) = \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\partial^2 f}{\partial x_j \partial x_i}(x_0)\, y_j y_i = y \cdot \nabla^2 f(x_0)\, y \tag{29.34}$$
By definition, taking $y = (x - x_0)/\|x - x_0\|$,
$$\psi(\|x - x_0\|) = f\left(x_0 + \|x - x_0\|\,\frac{x - x_0}{\|x - x_0\|}\right) = f(x)$$
Therefore, we conclude that:
$$f(x) = f(x_0) + \nabla f(x_0) \cdot (x - x_0) + \frac{1}{2}(x - x_0) \cdot \nabla^2 f(x_0)(x - x_0) + o\left(\|x - x_0\|^2\right)$$
as desired.
We close with a first-order approximation with Lagrange remainder that sharpens the approximation (29.26) with Peano remainder.⁷
⁷ Higher-order approximations with Lagrange remainders are notationally cumbersome, and we leave them to more advanced courses.
So,
$$\frac{b}{\sqrt{2}} - \frac{a}{\sqrt{2}} = \frac{\partial f(t, t)}{\partial x_1} + \frac{\partial f(t, t)}{\partial x_2} = \begin{cases} \frac{1}{\sqrt{2}}(a + b) & \text{if } t > 0 \\ 0 & \text{if } t = 0 \\ -\frac{1}{\sqrt{2}}(a + b) & \text{if } t < 0 \end{cases}$$
that is,
$$\frac{b - a}{a + b} = \begin{cases} 1 & \text{if } t > 0 \\ 0 & \text{if } t = 0 \\ -1 & \text{if } t < 0 \end{cases}$$
But this contradicts the arbitrary nature of the scalars $a$ and $b$. For instance, if $b = 2a$ then $(b - a)/(a + b) = 1/3$. N
Proof (i) Let $\hat{x}$ be a local maximizer on $U$. We want to prove that the quadratic form $h \cdot \nabla^2 f(\hat{x})\, h$ is negative semi-definite. For simplicity, let us suppose that $\hat{x}$ is the origin $0 = (0, \dots, 0)$. First of all, let us prove that $v \cdot \nabla^2 f(0)\, v \le 0$ for every unit vector $v$ of $\mathbb{R}^n$. We will then prove that $h \cdot \nabla^2 f(0)\, h \le 0$ for every vector $h \in \mathbb{R}^n$.
Since $0$ is a local maximizer and $U$ is open, there exists a small enough neighborhood $B_\varepsilon(0)$ so that $B_\varepsilon(0) \subseteq U$ and $f(0) \ge f(x)$ for every $x \in B_\varepsilon(0)$. Note that every vector $x \in B_\varepsilon(0)$ can be written as $x = tv$, where $v$ is a unit vector of $\mathbb{R}^n$ (i.e., $\|v\| = 1$) and $t \in \mathbb{R}$.¹⁰ Clearly, $tv \in B_\varepsilon(0)$ if and only if $|t| < \varepsilon$. Fix an arbitrary unit vector $v$ in $\mathbb{R}^n$, and define the function $\phi_v : (-\varepsilon, \varepsilon) \to \mathbb{R}$ by $\phi_v(t) = f(tv)$. Since $tv \in B_\varepsilon(0)$ for $|t| < \varepsilon$, we have $\phi_v(t) = f(tv) \le f(0) = \phi_v(0)$ for every $t \in (-\varepsilon, \varepsilon)$. It follows that $t = 0$ is a local maximizer for the function $\phi_v$ and hence, $\phi_v$ being differentiable and $t = 0$ an interior point of its domain, by applying Corollary 1336 we get $\phi_v'(0) = 0$ and $\phi_v''(0) \le 0$. By applying the chain rule to the function $\phi_v$, as in (29.34) we get $\phi_v''(0) = v \cdot \nabla^2 f(0)\, v$, so that $v \cdot \nabla^2 f(0)\, v \le 0$. Now write any vector $h \in \mathbb{R}^n$ as $h = t_h v$, where $v$ is a unit vector and $t_h \in \mathbb{R}$:
[Figure: the vector $h$ and the unit vector $v$ in the same direction, with $h = t_h v$.]
¹⁰ Intuitively, $v$ represents the direction of $x$ and $t$ its norm (indeed, $\|x\| = |t|$).
Then
$$h \cdot \nabla^2 f(0)\, h = t_h v \cdot \nabla^2 f(0)\,(t_h v) = t_h^2\; v \cdot \nabla^2 f(0)\, v$$
Since $v \cdot \nabla^2 f(0)\, v \le 0$, we have also $h \cdot \nabla^2 f(0)\, h \le 0$. This holds for every $h \in \mathbb{R}^n$, so the quadratic form $h \cdot \nabla^2 f(0)\, h$ is negative semi-definite.
(ii) We prove the result for maximizers (a similar argument holds for minimizers). Since $f$ is twice continuously differentiable, we have, for each $x \in U$,
$$f(x) = f(\hat{x}) + \nabla f(\hat{x}) \cdot (x - \hat{x}) + \frac{1}{2}(x - \hat{x}) \cdot \nabla^2 f(\hat{x})(x - \hat{x}) + o\left(\|x - \hat{x}\|^2\right)$$
Consider the unit sphere $\partial B_1(0) = \{h \in \mathbb{R}^n : \|h\| = 1\}$. Define the quadratic form $g : \mathbb{R}^n \to \mathbb{R}$ by
$$g(h) = h \cdot \nabla^2 f(\hat{x})\, h$$
Clearly, $g$ is continuous. Moreover, by hypothesis $g$ is negative definite; hence, $g(h) < 0$ for all $h \in \partial B_1(0)$. As the unit sphere is compact, by the Weierstrass Theorem there exists a maximizer $\bar{h} \in \partial B_1(0)$ of $g$, that is, $g(h) \le g(\bar{h}) < 0$ for all $h \in \partial B_1(0)$.
Set $\varepsilon = -g(\bar{h})/2 > 0$. By definition of little-o, there exists $\delta > 0$ such that, for all $x \in U$,
$$0 < \|x - \hat{x}\| < \delta \implies \left|\frac{o\left(\|x - \hat{x}\|^2\right)}{\|x - \hat{x}\|^2}\right| < \varepsilon \tag{29.38}$$
Since $\hat{x}$ is a stationary point, $\nabla f(\hat{x}) = 0$. Writing $x - \hat{x} = t_h h$ with $h$ a unit vector, for $0 < \|x - \hat{x}\| < \delta$ we thus have
$$\frac{f(x) - f(\hat{x})}{\|x - \hat{x}\|^2} = \frac{\frac{1}{2}(x - \hat{x}) \cdot \nabla^2 f(\hat{x})(x - \hat{x})}{\|x - \hat{x}\|^2} + \frac{o\left(\|x - \hat{x}\|^2\right)}{\|x - \hat{x}\|^2} = \frac{1}{2}\frac{t_h h \cdot \nabla^2 f(\hat{x})\,(t_h h)}{\|t_h h\|^2} + \frac{o\left(\|x - \hat{x}\|^2\right)}{\|x - \hat{x}\|^2}$$
$$= \frac{1}{2}\frac{h \cdot \nabla^2 f(\hat{x})\, h}{\|h\|^2} + \frac{o\left(\|x - \hat{x}\|^2\right)}{\|x - \hat{x}\|^2} = \frac{1}{2}g(h) + \frac{o\left(\|x - \hat{x}\|^2\right)}{\|x - \hat{x}\|^2} \le \frac{1}{2}g(\bar{h}) + \frac{o\left(\|x - \hat{x}\|^2\right)}{\|x - \hat{x}\|^2} < -\varepsilon + \varepsilon = 0$$
Hence $f(x) < f(\hat{x})$ whenever $0 < \|x - \hat{x}\| < \delta$, proving that $\hat{x}$ is a strong local maximizer.
In the scalar case we get back to the usual second-order conditions, based on the sign of the second derivative $f''(\hat{x})$. Indeed, we already observed in (29.29) that in the scalar case one has
$$x \cdot \nabla^2 f(\hat{x})\, x = f''(\hat{x})\, x^2$$
Thus, in this case the sign of the quadratic form depends only on the sign of $f''(\hat{x})$. That is, it is negative (positive) definite if and only if $f''(\hat{x}) < 0$ ($> 0$), and it is negative (positive) semi-definite if and only if $f''(\hat{x}) \le 0$ ($\ge 0$).
Naturally, as in the scalar case, also in this general multivariable case condition (i) is only necessary for $\hat{x}$ to be a local maximizer on $U$. Note that this condition implies that, if the quadratic form $h \cdot \nabla^2 f(\hat{x})\, h$ is indefinite, the point $\hat{x}$ is neither a local maximizer nor a local minimizer on $U$.
Example 1380 Consider the function $f(x_1, x_2) = x_1^2 x_2$. At the origin $\hat{x} = 0$ we have $\nabla^2 f(0) = O$. The corresponding quadratic form $x \cdot \nabla^2 f(0)\, x$ is identically zero and is therefore both negative and positive semi-definite. Nevertheless, $\hat{x} = 0$ is neither a local maximizer nor a local minimizer. Indeed, taking a generic neighborhood $B_\varepsilon(0)$, let $x = (x_1, x_2) \in B_\varepsilon(\hat{x})$ be such that $x_1 = x_2$. Let $t$ be such a common value, so that
$$(t, t) \in B_\varepsilon(0) \iff \|(t, t)\| = \sqrt{t^2 + t^2} = |t|\sqrt{2} < \varepsilon \iff |t| < \frac{\varepsilon}{\sqrt{2}}$$
Since $f(t, t) = t^3$, for every $(t, t) \in B_\varepsilon(0)$ we have $f(t, t) < f(0)$ if $t < 0$ and $f(0) < f(t, t)$ if $t > 0$, which shows that $\hat{x} = 0$ is neither a local maximizer nor a local minimizer.¹¹ N
Example 1381 Consider the function $f(x_1, x_2) = (x_2^2 - 2x_1)(x_2^2 - x_1) = 2x_1^2 - 3x_1 x_2^2 + x_2^4$. We have
$$\nabla f(x) = \left(\frac{\partial f}{\partial x_1}(x), \frac{\partial f}{\partial x_2}(x)\right) = \left(4x_1 - 3x_2^2,\; 4x_2^3 - 6x_1 x_2\right)$$
and
$$\nabla^2 f(x) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2}(x) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(x) \\ \frac{\partial^2 f}{\partial x_2 \partial x_1}(x) & \frac{\partial^2 f}{\partial x_2^2}(x) \end{bmatrix} = \begin{bmatrix} 4 & -6x_2 \\ -6x_2 & 12x_2^2 - 6x_1 \end{bmatrix}$$
So, at the origin $\hat{x} = 0$ the Hessian is the positive semi-definite matrix
$$\nabla^2 f(0) = \begin{bmatrix} 4 & 0 \\ 0 & 0 \end{bmatrix}$$
Given a scalar $m \in \mathbb{R}$, let $E_m$ be the collection of points $(x_1, x_2) \in \mathbb{R}^2$ of the plane such that $x_2 = \sqrt{2mx_1}$, that is,
$$E_m = \begin{cases} \left\{\left(x_1, \sqrt{2mx_1}\right) : x_1 \ge 0\right\} & \text{if } m > 0 \\ \{(x_1, 0) : x_1 \in \mathbb{R}\} & \text{if } m = 0 \\ \left\{\left(x_1, \sqrt{2mx_1}\right) : x_1 \le 0\right\} & \text{if } m < 0 \end{cases}$$
¹¹ In an alternative way, it is sufficient to observe that at each point of the I or II quadrant, except the axes, we have $f(x_1, x_2) > 0$, and that at each point of the III or IV quadrant, except the axes, we have $f(x_1, x_2) < 0$. Every neighborhood of the origin necessarily contains both points of the I and II quadrants (except the axes), for which we have $f(x_1, x_2) > 0 = f(0)$, and points of the III and IV quadrants (except the axes), for which we have $f(x_1, x_2) < 0 = f(0)$. Hence $0$ is neither a local maximizer nor a local minimizer.
Clearly, $0 \in E_m$ for all $m \in \mathbb{R}$. Each $E_m$ with $m \neq 0$ is the graph of a root function. For instance, for $m = 1/2$ we have $E_{1/2} = \left\{\left(x_1, \sqrt{x_1}\right) \in \mathbb{R}^2 : x_1 \ge 0\right\}$, which is the graph of the basic root function.
For each $x \in E_m$, we have
$$f(x_1, x_2) = f\left(x_1, \sqrt{2mx_1}\right) = (2mx_1 - 2x_1)(2mx_1 - x_1) = 4x_1^2(m - 1)\left(m - \frac{1}{2}\right)$$
Thus
$$m \in \left(\frac{1}{2}, 1\right) \implies f(x_1, x_2) < 0 \quad \forall\, 0 \neq (x_1, x_2) \in E_m$$
$$m \notin \left[\frac{1}{2}, 1\right] \implies f(x_1, x_2) > 0 \quad \forall\, 0 \neq (x_1, x_2) \in E_m$$
Since $f(0) = 0$ we conclude that the origin is neither a local minimizer nor a local maximizer. Yet, the origin is a maximizer of $f$ on the set $E_m$ if $m \in (1/2, 1)$, while it is a minimizer of $f$ on the set $E_m$ if $m \notin [1/2, 1]$. So, if we approach the origin along the graph of a root function, we reach at the origin a maximum value if $m \in (1/2, 1)$ and a minimum value if $m \notin [1/2, 1]$. The behavior of this function is quite peculiar. N
Similarly, condition (ii) is only sufficient for $\hat{x}$ to be a local maximizer.
Example 1382 Consider the function $f(x) = -x_1^2 x_2^2$. The origin $\hat{x} = 0$ is clearly a (global) maximizer for the function $f$, but $\nabla^2 f(0) = O$, so the corresponding quadratic form $x \cdot \nabla^2 f(0)\, x$ is not negative definite. N
The Hessian $\nabla^2 f(\hat{x})$ is the symmetric matrix associated with the quadratic form $x \cdot \nabla^2 f(\hat{x})\, x$. We can therefore equivalently state Theorem 1379 in the following way:
- a necessary condition for $\hat{x}$ to be a maximizer (minimizer) is that the Hessian matrix $\nabla^2 f(\hat{x})$ is negative (positive) semi-definite;
- a sufficient condition for $\hat{x}$ to be a strong maximizer (minimizer) is that the Hessian matrix is negative (positive) definite.
This Hessian version is important operationally because there exist criteria, such as the Sylvester-Jacobi one, to determine whether a symmetric matrix is positive/negative definite or semi-definite. For instance, consider a generic function of two variables $f : \mathbb{R}^2 \to \mathbb{R}$ that is twice continuously differentiable. Let $x_0 \in \mathbb{R}^2$ be a stationary point, $\nabla f(x_0) = (0, 0)$, and let
$$\nabla^2 f(x_0) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2}(x_0) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(x_0) \\ \frac{\partial^2 f}{\partial x_2 \partial x_1}(x_0) & \frac{\partial^2 f}{\partial x_2^2}(x_0) \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \tag{29.39}$$
be the Hessian matrix computed at the point $x_0$. Since the gradient at $x_0$ is zero, the point is a candidate to be a maximizer or minimizer of $f$. To determine its exact nature, it is necessary to analyze the Hessian matrix at the point. By Theorem 1379, $x_0$ is a maximizer if the Hessian is negative definite, a minimizer if it is positive definite, and neither a maximizer nor a minimizer if it is indefinite. If the Hessian is only semi-definite, positive or negative, it is not possible to draw conclusions on the nature of $x_0$. Applying the Sylvester-Jacobi criterion to the matrix (29.39), we have that:
(i) if $a > 0$ and $ad - bc > 0$, the Hessian is positive definite, so $x_0$ is a strong local minimizer;
(ii) if $a < 0$ and $ad - bc > 0$, the Hessian is negative definite, so $x_0$ is a strong local maximizer;
(iii) if $ad - bc < 0$, the Hessian is indefinite, and therefore $x_0$ is neither a local maximizer nor a local minimizer.
In all the other cases it is not possible to say anything on the nature of the point $x_0$.
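As an illustration, here is a minimal Python sketch of this three-case test (the helper name is our own):

# Sylvester-Jacobi test for a 2x2 Hessian [[a, b], [c, d]] at a
# stationary point, following cases (i)-(iii) above.
def classify_stationary_point(a, b, c, d):
    det = a * d - b * c
    if det > 0 and a > 0:
        return "strong local minimizer (positive definite Hessian)"
    if det > 0 and a < 0:
        return "strong local maximizer (negative definite Hessian)"
    if det < 0:
        return "neither maximizer nor minimizer (indefinite Hessian)"
    return "inconclusive (Hessian only semi-definite)"

# Example 1383 below: Hessian [[6, 0], [0, 2]] at the stationary point (-1, 0).
print(classify_stationary_point(6, 0, 0, 2))  # strong local minimizer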
Example 1383 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by $f(x_1, x_2) = 3x_1^2 + x_2^2 + 6x_1$. We have $\nabla f(x) = (6x_1 + 6,\, 2x_2)$ and
$$\nabla^2 f(x) = \begin{bmatrix} 6 & 0 \\ 0 & 2 \end{bmatrix}$$
It is easy to see that the unique point where the gradient vanishes is $x_0 = (-1, 0) \in \mathbb{R}^2$, that is, $\nabla f(-1, 0) = (0, 0)$. Moreover, in view of the previous discussion, since $a > 0$ and $ad - bc > 0$, the point $x_0 = (-1, 0)$ is a strong local minimizer of $f$. N
Example 1384 Let $f : \mathbb{R}^3 \to \mathbb{R}$ be given by $f(x_1, x_2, x_3) = x_1^3 + x_2^3 + 3x_3^2 - 2x_3 + x_1^2 x_2^2$. We have
$$\nabla f(x) = \left(3x_1^2 + 2x_1 x_2^2,\; 3x_2^2 + 2x_1^2 x_2,\; 6x_3 - 2\right)$$
and
$$\nabla^2 f(x) = \begin{bmatrix} 6x_1 + 2x_2^2 & 4x_1 x_2 & 0 \\ 4x_1 x_2 & 6x_2 + 2x_1^2 & 0 \\ 0 & 0 & 6 \end{bmatrix}$$
The stationary points are $x' = (-3/2, -3/2, 1/3)$ and $x'' = (0, 0, 1/3)$. At $x'$, we have
$$\nabla^2 f(x') = \begin{bmatrix} -\frac{9}{2} & 9 & 0 \\ 9 & -\frac{9}{2} & 0 \\ 0 & 0 & 6 \end{bmatrix}$$
and therefore
$$\det\left(-\frac{9}{2}\right) < 0, \qquad \det\begin{bmatrix} -\frac{9}{2} & 9 \\ 9 & -\frac{9}{2} \end{bmatrix} < 0, \qquad \det \nabla^2 f(x') < 0$$
By the Sylvester-Jacobi criterion the Hessian matrix is indefinite. By Theorem 1379, the point $x' = (-3/2, -3/2, 1/3)$ is neither a local minimizer nor a local maximizer. For the point $x'' = (0, 0, 1/3)$ we have
$$\nabla^2 f(x'') = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 6 \end{bmatrix}$$
which is positive semi-definite since $x \cdot \nabla^2 f(x'')\, x = 6x_3^2$ (note that it is not positive definite: for example, we have $(1, 1, 0) \cdot \nabla^2 f(x'')\,(1, 1, 0) = 0$). N
Consider again the unconstrained optimization problem
$$\max_x f(x) \quad \text{sub } x \in C$$
where $C$ is an open convex set of $\mathbb{R}^n$. Assume that $f \in C^2(C)$. By Theorem 1379-(i), the procedure of Section 28.5.3 assumes the following form:
1. Determine the set $S \subseteq C$ of the stationary interior points of $f$ by solving the first-order condition $\nabla f(x) = 0$ (Section 28.1.3).
2. Calculate the Hessian matrix $\nabla^2 f$ at each of the stationary points $x \in S$ and determine the set
$$S_2 = \left\{x \in S : \nabla^2 f(x) \text{ is negative semi-definite}\right\}$$
Also here the procedure is not conclusive because nothing ensures the existence of a solution. Later in the book we will discuss this crucial problem by combining, in the method of elimination, such existence theorems with the differential methods.
Here $C = \mathbb{R}^2_{++}$ is the first quadrant of the plane without the axes (hence an open set). We have
$$\nabla f(x) = \left(-4x_1 + 3 - x_2,\; -2x_2 + 3 - x_1\right)$$
Therefore, from the first-order condition $\nabla f(x) = 0$ it follows that the unique stationary point is $x = (3/7, 9/7)$, that is, $S = \{(3/7, 9/7)\}$. We have
$$\nabla^2 f(x) = \begin{bmatrix} -4 & -1 \\ -1 & -2 \end{bmatrix}$$
By the Sylvester-Jacobi criterion, the Hessian matrix $\nabla^2 f(x)$ is negative definite.¹² Hence, $S_2 = \{(3/7, 9/7)\}$. Since $S_2$ is a singleton, we have trivially $S_3 = S_2$. In conclusion, the point $x = (3/7, 9/7)$ is the unique candidate to be a solution of the unconstrained optimization problem. One can show that this point is indeed the solution of the problem. For the moment we can only say that, by Theorem 1379-(ii), it is a local maximizer. N
¹² Since $\nabla^2 f(x)$ is negative definite for all $x \in \mathbb{R}^2_{++}$, this also proves that $f$ is concave.
Chapter 30
Analytic functions
Lagrange assumed that each function $f$ can be expanded, at every point $x_0$ of its domain, in a power series of the increment $h$:
$$f(x_0 + h) = f(x_0) + \alpha_1 h + \alpha_2 h^2 + \dots + \alpha_k h^k + \dots \tag{30.1}$$
The alpha coefficients depend, of course, on the point $x_0$. Lagrange defined the first derivative $f'(x_0)$ at $x_0$ as the first coefficient $\alpha_1$. As $x_0$ varies, this defines a new function $f'$, the derivative function, which is assumed to admit, like the parent function, a power expansion in $h$ at each $x_0$. Lagrange defined the second derivative $f''(x_0)$ as the first coefficient of the expansion of $f'$. One can continue in this way to define derivatives of all orders. These functions are, indeed, assumed to be infinitely differentiable.
Example 1386 By the Newton binomial formula, for the power function $f(x) = x^n$ we have
$$f(x_0 + h) = (x_0 + h)^n = x_0^n + \binom{n}{1}x_0^{n-1}h + \binom{n}{2}x_0^{n-2}h^2 + \dots + h^n = f(x_0) + nx_0^{n-1}h + \frac{n(n-1)}{2}x_0^{n-2}h^2 + \dots + h^n$$
So, $f'(x_0) = nx_0^{n-1}$. In turn,
$$f'(x_0 + h) = n(x_0 + h)^{n-1} = nx_0^{n-1} + n\binom{n-1}{1}x_0^{n-2}h + \dots + nh^{n-1} = f'(x_0) + n(n-1)x_0^{n-2}h + \frac{n(n-1)(n-2)}{2}x_0^{n-3}h^2 + \dots + nh^{n-1}$$
So, $f''(x_0) = n(n-1)x_0^{n-2}$. By iterating, one finds the higher order derivatives of the power function (which are $0$ for orders $> n$). N
Lagrange then proves, in a key result, that the alpha coefficients can be expressed in terms of the higher order derivatives, so that (30.1) becomes
$$f(x_0 + h) = f(x_0) + f'(x_0)h + \frac{1}{2}f''(x_0)h^2 + \dots + \frac{f^{(k)}(x_0)}{k!}h^k + \dots = f(x_0) + \sum_{k=1}^{\infty}\frac{f^{(k)}(x_0)}{k!}h^k \tag{30.2}$$
This exact Taylor formula makes these functions the best behaved from a differential viewpoint, a calculus paradise.
Lagrange's calculus thus builds on power series, an approach to calculus different from the incremental one adopted in this book, which has been the standard approach since the nineteenth-century works, in primis by Cauchy and Weierstrass, that made calculus rigorous by freeing it from the uncertain metaphysical status of infinitesimals. Indeed, such a status was a main concern for Lagrange, whose approach was motivated by the desire to introduce derivatives without using any, suspiciously metaphysical, notion of infinitesimal.¹
In this chapter we will study, within the incremental approach that we adopted, the functions that admit a power series representation (30.1), the so-called analytic functions. In particular, in Proposition 1398 we will establish for them an exact Taylor formula (30.2), thus recovering in our setup Lagrange's important representation. Many differentiable functions are not analytic, as should be obvious by now, and indeed analytic functions no longer play the central theoretical role that they had at the time of Lagrange. Yet, they are widely used in applications because of their remarkable differential properties.
The polynomial expansions, in which
$$\varphi_n(x) = (x - x_0)^n$$
are the special case of (30.3) in which the asymptotic scale is formed by the power functions. Contrary to the polynomial case, where $x_0$ had to be a scalar, now we can take $x_0 = \pm\infty$. Indeed, general expansions are relevant because, relative to the special case of polynomial expansions, they may allow us to approximate functions for large values of the argument, that is, asymptotically.
In symbols, condition (30.3) can be expressed as
$$f(x) \sim \sum_{k=0}^{n}\alpha_k \varphi_k(x) \quad \text{as } x \to x_0$$
By using the scale of power functions, we get back to the usual polynomial quadratic approximation
$$f(x) \sim \alpha_0 + \alpha_1 x + \alpha_2 x^2 \quad \text{as } x \to 0$$
⁴ When, as in this example, we have $x_0 = +\infty$, the interval $(a, b)$ is understood to be unbounded, with $b = +\infty$ (the example of the negative power function scale was made by Poincare himself).
The key uniqueness property of polynomial expansions (Lemma 1358) still holds in the general case.
Lemma 1390 A function $f : (a, b) \to \mathbb{R}$ has, at each point $x_0 \in [a, b]$, at most a unique expansion of order $n$ with respect to the scale $\varphi$.
Proof Consider the expansion $\sum_{k=0}^{n}\alpha_k\varphi_k(x) + o(\varphi_n)$ at $x_0 \in [a, b]$. We have
$$\lim_{x \to x_0}\frac{f(x)}{\varphi_0(x)} = \lim_{x \to x_0}\frac{\sum_{k=0}^{n}\alpha_k\varphi_k(x) + o(\varphi_n)}{\varphi_0(x)} = \alpha_0 \tag{30.5}$$
$$\lim_{x \to x_0}\frac{f(x) - \alpha_0\varphi_0(x)}{\varphi_1(x)} = \lim_{x \to x_0}\frac{\sum_{k=1}^{n}\alpha_k\varphi_k(x) + o(\varphi_n)}{\varphi_1(x)} = \alpha_1 \tag{30.6}$$
$$\vdots$$
$$\lim_{x \to x_0}\frac{f(x) - \sum_{k=0}^{n-1}\alpha_k\varphi_k(x)}{\varphi_n(x)} = \alpha_n \tag{30.7}$$
Suppose that, for every $x \in (a, b)$, there are two different expansions
$$\sum_{k=0}^{n}\alpha_k\varphi_k(x) + o(\varphi_n) = \sum_{k=0}^{n}\beta_k\varphi_k(x) + o(\varphi_n) \tag{30.8}$$
Equalities (30.5)-(30.7) must hold for both expansions. Hence, by (30.5) we have that $\alpha_0 = \beta_0$. Iterating such a procedure, from equality (30.6) we get $\alpha_1 = \beta_1$, and so on until $\alpha_n = \beta_n$.
Limits (30.5)-(30.7) are crucial: it is easy to prove that the expansion (30.3) holds if and only if the limits exist (and are finite).⁵ Such limits, in turn, determine the expansion coefficients $\{\alpha_k\}_{k=0}^{n}$.
⁵ The "only if" part is shown in the previous proof; the reader can verify the converse.
Example 1391 Let us determine the quadratic asymptotic approximation with respect to the scale of negative power functions for the function $f : (-1, \infty) \to \mathbb{R}$ defined by
$$f(x) = \frac{1}{1 + x}$$
Thanks to the equalities (30.5)-(30.7), we have
$$\alpha_0 = \lim_{x \to +\infty}\frac{f(x)}{\varphi_0(x)} = \lim_{x \to +\infty}\frac{\frac{1}{1+x}}{1} = \lim_{x \to +\infty}\frac{1}{1 + x} = 0$$
$$\alpha_1 = \lim_{x \to +\infty}\frac{f(x) - \alpha_0\varphi_0(x)}{\varphi_1(x)} = \lim_{x \to +\infty}\frac{\frac{1}{1+x}}{\frac{1}{x}} = \lim_{x \to +\infty}\frac{x}{1 + x} = 1$$
$$\alpha_2 = \lim_{x \to +\infty}\frac{f(x) - \alpha_0\varphi_0(x) - \alpha_1\varphi_1(x)}{\varphi_2(x)} = \lim_{x \to +\infty}\frac{\frac{1}{1+x} - \frac{1}{x}}{\frac{1}{x^2}} = \lim_{x \to +\infty}\frac{-x}{1 + x} = -1$$
Hence, the desired approximation is
$$\frac{1}{1 + x} \sim \frac{1}{x} - \left(\frac{1}{x}\right)^2 \quad \text{as } x \to +\infty$$
By the previous lemma, it is the only quadratic asymptotic approximation with respect to the scale of negative power functions. N
If we change the scale, the expansion changes as well. For example, approximation (30.4) is a quadratic approximation for $1/(x - 1)$ with respect to the scale of negative power functions. However, by changing scale one obtains a different quadratic approximation: for example, when at $x_0 = +\infty$ we consider the asymptotic scale $\varphi_n(x) = (x + 1)/x^{2n}$, we obtain the quadratic asymptotic approximation
$$\frac{1}{x - 1} \sim \frac{x + 1}{x^2} + \frac{x + 1}{x^4} \quad \text{as } x \to +\infty$$
In fact,
$$\frac{1}{x - 1} - \frac{x + 1}{x^2} - \frac{x + 1}{x^4} = \frac{1}{(x - 1)x^4} = o\left(\frac{x + 1}{x^4}\right) \quad \text{as } x \to +\infty$$
In conclusion, different asymptotic scales lead to different unique approximations (as long as they exist). Different functions can, however, share the same expansion, as the next example shows.
This function is log-convex (cf. Example 949). Moreover, it satisfies a key recursion.
Lemma 1394 $\Gamma(x + 1) = x\,\Gamma(x)$ for all $x > 0$.
Proof By integrating by parts, one obtains that for every $0 < a < b$
$$\int_a^b t^x e^{-t}\,dt = \left[-e^{-t}t^x\right]_a^b + x\int_a^b t^{x-1}e^{-t}\,dt = -e^{-b}b^x + e^{-a}a^x + x\int_a^b t^{x-1}e^{-t}\,dt$$
The asymptotic expansion is, for every given $x$, a geometric series. Therefore, it converges for every $x > 1$, i.e., for every $x$ in the domain of $f$, with
$$f(x) = \sum_{k=1}^{\infty}\frac{1}{x^k}$$
In this (fortunate) case the asymptotic expansion is correct: the series determined by the asymptotic expansion converges to $f(x)$ for every $x \in (a, b)$.
(ii) Also the function $f : (1, \infty) \to \mathbb{R}$ defined by
$$f(x) = \frac{1 + e^{-x}}{x - 1}$$
has, with respect to the scale of negative power functions, the asymptotic expansion (30.9) for $x \to +\infty$. However, in this case we have, for every $x > 1$,
$$f(x) \neq \sum_{k=1}^{\infty}\frac{1}{x^k}$$
In this example the asymptotic expansion is merely an approximation, with degree of accuracy $x^{-n}$ for every $n$.
(iii) Consider the function $f : (1, \infty) \to \mathbb{R}$ defined by⁸
$$f(x) = e^{-x}\int_1^x \frac{e^t}{t}\,dt$$
The asymptotic expansion thus determines a divergent series. In this, very unfortunate, case not only does the series not converge to $f$, but it even diverges. N
⁸ This example is from de Bruijn (1961).
Let us go back to the polynomial case, in which the asymptotic expansion of a function $f : (a, b) \to \mathbb{R}$ at a point $x_0 \in (a, b)$ has the power series form:
$$f(x) \sim \sum_{k=0}^{\infty}\alpha_k(x - x_0)^k \quad \text{as } x \to x_0$$
So, the Taylor series is a power series in $h$ (so, in the difference $x - x_0$), a key observation. But when can we turn $\sim$ into $=$, that is, when can these approximations become, at least locally, exact? To answer this important question, we introduce the following classic notion.
Lemma 1390 implies that $\alpha_k = f^{(k)}(x_0)/k!$ for every $1 \le k \le n$. Since $n$ was arbitrarily chosen, the desired result follows.
This proposition shows that analytic functions are, indeed, the class of functions studied by Lagrange in his 1813 book. A particularly interesting case occurs when one can take $B(x_0)$ equal to the entire domain $(a, b)$ at a point $x_0 \in (a, b)$, so that the analytic function $f$ admits at $x_0$ a global exact Taylor expansion
$$f(x) = \sum_{k=0}^{\infty}\frac{f^{(k)}(x_0)}{k!}(x - x_0)^k \quad \forall x \in (a, b)$$
At the origin, this global exact Taylor expansion takes the convenient Maclaurin form
$$f(x) = \sum_{k=0}^{\infty}\frac{f^{(k)}(0)}{k!}x^k \quad \forall x \in (a, b)$$
A global exact expansion, in particular at the origin, renders $f$ most tractable. Not surprisingly, next we show that this property is enjoyed, at all points of its domain, by the marvelous exponential function.
The exponential function is thus analytic. Remarkably, it admits at each point $x_0$ of its domain a global exact Taylor expansion, the Maclaurin one (30.12) being the most convenient. N
The domain $A$ is the interval of convergence of the power series on the right-hand side. Its interior is the open interval of convergence $(-r, r)$ identified by the radius of convergence $r$ of the power series (Chapter 11).
Generating functions are, by definition, expandable as power series on their domains: next we show that, as one may expect, this property makes them analytic on the open interval of convergence. In so doing, we generalize Proposition 480.
Proposition 1400 Let $\sum_{n=0}^{\infty}a_n x^n$ be a power series with radius of convergence $r \in (0, \infty]$. The function $f : (-r, r) \to \mathbb{R}$ given by $f(x) = \sum_{n=0}^{\infty}a_n x^n$ is analytic, with
$$a_n = \frac{f^{(n)}(0)}{n!} \quad \forall n \ge 0 \tag{30.13}$$
Proof Let $x_0 \in (-r, r)$ and $B_\varepsilon(x_0) \subseteq (-r, r)$. By the binomial formula, for each $x \in B_\varepsilon(x_0)$ we have
$$f(x) = \sum_{n=0}^{\infty}a_n x^n = \sum_{n=0}^{\infty}a_n(x - x_0 + x_0)^n = \sum_{n=0}^{\infty}a_n\sum_{m=0}^{n}\binom{n}{m}x_0^{n-m}(x - x_0)^m = \sum_{m=0}^{\infty}\left(\sum_{n=m}^{\infty}a_n\binom{n}{m}x_0^{n-m}\right)(x - x_0)^m$$
where for the change in the order of summation in the last step we refer readers to, e.g., Rudin (1976) p. 176. By setting
$$b_m = \sum_{n=m}^{\infty}a_n\binom{n}{m}x_0^{n-m}$$
we then have $f(x) = \sum_{m=0}^{\infty}b_m(x - x_0)^m$ for all $x \in B_\varepsilon(x_0)$. This proves the analyticity of $f$.
that is, we can take $B(0) = (-r, r)$. We can thus express $f$ in differential form.
For perspective, we close with an easy corollary for general power series, not necessarily centered at the origin.
Corollary 1401 Let $\sum_{n=0}^{\infty}a_n(x - x_0)^n$ be a power series with $x_0 \in \mathbb{R}$ and radius of convergence $r \in (0, \infty]$. The function $f : (x_0 - r, x_0 + r) \to \mathbb{R}$ given by $f(x) = \sum_{n=0}^{\infty}a_n(x - x_0)^n$ is analytic, with
$$a_n = \frac{f^{(n)}(x_0)}{n!} \quad \forall n \ge 0$$
As the reader can check, here we have a global exact Taylor expansion at $x_0$.
converges (trivially) to the zero function, not to $f(x)$, for all $x \neq x_0$. As a result, the function $f$ is not analytic although it is infinitely differentiable.
(ii) Relatedly, the function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} e^{-\frac{1}{x}} & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases} \tag{30.16}$$
is infinitely differentiable, but non-analytic at the origin. Indeed, here as well one can prove that $f^{(n)}(0) = 0$ for every $n \ge 1$. So, the Maclaurin series converges to the zero function, not to $f(x)$, for all $x > 0$. N
In these examples analyticity fails at the origin because the Maclaurin series, i.e., the Taylor series at the origin, converges but, unfortunately, not to the function $f$. There exist more dramatic examples, discovered around the year 1880, where analyticity fails at a point because the Taylor series does not even converge (except, trivially, at the origin itself). Functions of this kind are called nowhere analytic. The existence of this class of functions is counter-intuitive; interestingly, they came up at about the same time when nowhere differentiable functions, another counter-intuitive class of functions, emerged.
Rather than reporting a (typically involved) example of a nowhere analytic function, here we observe that their existence is a consequence of the Borel-Peano Theorem that will be presented in the coda. Indeed, consider the power series
$$\sum_{n=0}^{\infty}n!\,x^n$$
whose radius of convergence is zero since $\limsup_n \sqrt[n]{n!} = +\infty$. In sum, the Taylor series of an infinitely differentiable function at $x_0$ either converges to the "wrong" function, i.e., not to $f$, or, more dramatically, does not converge at all (except, trivially, at the point $x_0$ itself).
For an infinitely differentiable function $f : (a, b) \to \mathbb{R}$, define
$$r_f(x) = \frac{1}{\limsup_{k \to \infty}\sqrt[k]{\left|\frac{f^{(k)}(x)}{k!}\right|}}$$
The value $r_f(x)$ is, by the Cauchy-Hadamard Theorem, the radius of convergence of the Taylor series of $f$ at $x$. If $f$ is analytic, we clearly have
$$r_f(x) > 0 \quad \forall x \in (a, b) \tag{30.17}$$
Indeed, $r_f(x_0) = 0$ at a point $x_0 \in (a, b)$ would imply that the Taylor series of $f$ at $x_0$ does not converge at any point $x \neq x_0$. Condition (30.17) is thus necessary for analyticity. It is, however, not sufficient: for instance, it is satisfied by the non-analytic function $f$ defined in (30.16) since $f$ is analytic at all $x \neq 0$ and $r_f(0) = +\infty$.¹⁰ A stronger version becomes, however, sufficient.
Example 1404 Consider the hyperbola $f : (0, \infty) \to \mathbb{R}$ defined by $f(x) = 1/x$. It holds
$$f^{(n)}(x) = (-1)^n\frac{n!}{x^{n+1}} \quad \forall n \ge 0$$
and so
$$r_f(x) = \frac{1}{\lim_{n \to \infty}\sqrt[n]{\frac{1}{x^{n+1}}}} = x$$
By the Pringsheim criterion, $f$ is analytic on the interval $(a, \infty)$ for every $a > 0$. In turn, this readily implies the analyticity of $f$. N
The second, quite striking, analyticity criterion is based on the sign of the derivatives. It
is the occasion to introduce an important class of functions.
Proof Let $f : (a, b) \to \mathbb{R}$ be absolutely monotone. We prove the theorem in the special case when, at each $x \in (a, b)$, there is a constant $M_x > 0$ such that
$$\frac{f^{(n)}(x)}{n!} \le M_x \quad \forall n \ge 1 \tag{30.19}$$
Let $x_0 \in (a, b)$ and fix $0 < h < 1$ such that $x_0 + h < b$. By the Lagrange-Taylor formula (28.8), we have
$$f(x_0 + h) - f(x_0) = \sum_{k=1}^{n-1}\frac{f^{(k)}(x_0)}{k!}h^k + \frac{f^{(n)}(\hat{x})}{n!}h^n$$
¹⁰ Indeed, the coefficients of the Maclaurin series are all zero and so it converges on the entire real line, though not to $f(x)$ for $x > 0$.
Because of the positivity of the derivatives, the function $f$ and its derivatives are increasing functions. So,
$$0 \le \frac{f^{(n)}(\hat{x})}{n!}h^n \le \frac{f^{(n)}(x_0 + h)}{n!}h^n \le M_{x_0 + h}\,h^n \to 0 \quad \text{as } n \to \infty$$
under the hypothesis (30.19). In turn, this implies
$$f(x_0 + h) - f(x_0) = \sum_{k=1}^{\infty}\frac{f^{(k)}(x_0)}{k!}h^k \tag{30.20}$$
Similarly, by the dual formula (28.9), if we take $-1 < h < 0$ such that $x_0 + h > a$ we have
$$f(x_0 + h) - f(x_0) = \sum_{k=1}^{n-1}\frac{f^{(k)}(x_0)}{k!}h^k + \frac{f^{(n)}(\hat{x})}{n!}h^n$$
and
$$0 \le \left|\frac{f^{(n)}(\hat{x})}{n!}h^n\right| \le \frac{f^{(n)}(x_0)}{n!}|h|^n \le M_{x_0}|h|^n \to 0 \quad \text{as } n \to \infty$$
under the hypothesis (30.19). In turn, this implies (30.20).
We conclude that there exists a small enough neighborhood $B(x_0)$ of $x_0$ such that
$$f(x) = f(x_0) + \sum_{k=1}^{\infty}\frac{f^{(k)}(x_0)}{k!}(x - x_0)^k$$
Example 1407 (i) The exponential function $e^x$ is, clearly, absolutely monotone on the real line. (ii) The function $f : (-\infty, 0) \to \mathbb{R}$ defined by $f(x) = -\log(-x)$ is easily seen to be absolutely monotone. N
Consider, for instance, the function $f : (-\infty, 1) \to \mathbb{R}$ defined by
$$f(x) = \frac{1}{1 - x}$$
We have, for all $k \ge 1$,
$$f^{(k)}(x) = \frac{k!}{(1 - x)^{k+1}}$$
Indeed, we can proceed by induction. For $k = 1$, the result is obvious. If we assume that the result is true for $k - 1$ (induction hypothesis), then
$$f^{(k)}(x) = \frac{d f^{(k-1)}(x)}{dx} = \frac{d}{dx}\left[\frac{(k-1)!}{(1 - x)^{k}}\right] = (k-1)!\,\frac{d(1 - x)^{-k}}{dx} = (k-1)!\,k\,(1 - x)^{-(k+1)} = \frac{k!}{(1 - x)^{k+1}}$$
as desired. At all $x < 1$ we thus have $f^{(k)}(x) \ge 0$ for all $k \ge 1$. That is, the function $f$ is absolutely monotone, so analytic by Bernstein's Theorem, on the interval $(-\infty, 1)$.
Hence, at all points $x_0 < 1$ there is a neighborhood $B(x_0) \subseteq (-\infty, 1)$ such that
$$f(x) = \sum_{k=0}^{\infty}\frac{f^{(k)}(x_0)}{k!}(x - x_0)^k = \sum_{k=0}^{\infty}\frac{1}{1 - x_0}\left(\frac{x - x_0}{1 - x_0}\right)^k \quad \forall x \in B(x_0)$$
because $|(x - x_0)/(1 - x_0)| < 1$ if and only if $x \in (2x_0 - 1, 1)$. So, we can take $B(x_0) = (2x_0 - 1, 1)$, a neighborhood of $x_0$ of radius $1 - x_0$.¹¹ For instance, at $x_0 = 0$ we have
$$f(x) = \sum_{k=0}^{\infty}\frac{f^{(k)}(0)}{k!}x^k = \sum_{k=0}^{\infty}x^k \quad \forall x \in (-1, 1)$$
Relatedly, consider the binomial function $f(x) = (1 + x)^{\alpha}$, whose derivatives are
$$f^{(k)}(x) = \alpha(\alpha - 1)\cdots(\alpha - k + 1)(1 + x)^{\alpha - k} \quad \forall k \ge 0$$
This is the beautiful formula (11.12) that in Example 485 permitted us to say that $f$ is the generating function of the binomial sequence (11.11).¹² As remarked back then, it generalizes formula (B.8). To see why it holds, observe that, for each natural number $k \ge 0$,
$$f^{(k+1)}(x) = \frac{d}{dx}\left[\alpha(\alpha - 1)\cdots(\alpha - k + 1)(1 + x)^{\alpha - k}\right] = \alpha(\alpha - 1)\cdots(\alpha - k + 1)(\alpha - k)(1 + x)^{\alpha - (k+1)}$$
as desired. N
¹¹ Note that $x_0 < 1$ implies $2x_0 - 1 < 1$.
¹² The relevant terminology is in Example 485.
Example 1411 (i) The function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = e^{-\lambda x}$, with $\lambda > 0$, is completely monotone. (ii) The hyperbola $f : (0, \infty) \to \mathbb{R}$ given by $f(x) = 1/x$ is completely monotone. N
Complete and absolute monotonicity are dual notions: a function $f(x)$ is absolutely monotone if and only if $f(a + b - x)$ is completely monotone.¹³ In turn, this duality readily proves the following corollary of Bernstein's Theorem.
In sum, absolutely and completely monotone functions are important examples of analytic functions. For instance, the analyticity of the exponential function can be seen as a consequence of its absolute monotonicity.
Proof We prove only (i) and leave (ii) to the reader. Let $x_0$ be any point of the interval $(a, b)$. We want to show that the function $\alpha f + \beta g : (a, b) \to \mathbb{R}$ is analytic at $x_0$. By definition, there exist neighborhoods $B^f(x_0)$ and $B^g(x_0)$ as well as scalar sequences $\{\alpha_k^f\}_{k=0}^{\infty}$ and $\{\alpha_k^g\}_{k=0}^{\infty}$ such that
$$f(x) = \sum_{k=0}^{\infty}\alpha_k^f(x - x_0)^k \quad \forall x \in B^f(x_0), \qquad g(x) = \sum_{k=0}^{\infty}\alpha_k^g(x - x_0)^k \quad \forall x \in B^g(x_0)$$
Thus,
$$(\alpha f + \beta g)(x) = \sum_{k=0}^{\infty}\left(\alpha\alpha_k^f + \beta\alpha_k^g\right)(x - x_0)^k \quad \forall x \in B^f(x_0) \cap B^g(x_0)$$
¹³ To form the sum $a + b - x$ presupposes that the interval $(a, b)$ is bounded. Yet, the duality is readily extended to unbounded open intervals. For instance, on the real line $f(x)$ is absolutely monotone if and only if $f(-x)$ is completely monotone.
Proposition 1414 If the functions $f : (a, b) \to \mathbb{R}$ and $g : (c, d) \to \mathbb{R}$ are both analytic, with $\operatorname{Im} f \subseteq (c, d)$, then their composite function $g \circ f : (a, b) \to \mathbb{R}$ is analytic.
Since the Faa di Bruno formula gives the derivatives of all orders of the composite function $g \circ f$, this result provides a chain rule for analytic functions.
Combined with analyticity criteria, these stability properties under linear combinations, products and compositions permit us to establish that many functions of interest are analytic. The following result shows that, indeed, some classic elementary functions are analytic.
(ii) The trigonometric functions sine and cosine are analytic, with
$$\sin x = \sum_{k=0}^{\infty}\frac{(-1)^k}{(2k + 1)!}x^{2k+1} \quad\text{and}\quad \cos x = \sum_{k=0}^{\infty}\frac{(-1)^k}{(2k)!}x^{2k} \quad \forall x \in \mathbb{R}$$
Remarkably, when one looks at their polynomial asymptotic expansions, exponential and logarithmic functions look much more similar to trigonometric functions than they appear to be prima facie. The emergence of deeper connections between what may otherwise appear to be very different classes of functions is a further dividend of analyticity.
Trigonometric functions thus have global exact Maclaurin expansions. In contrast, the logarithmic function $\log(1 + x)$ has an exact Maclaurin expansion only for $x \in (-1, 1]$.
¹⁴ For a proof see Krantz and Parks (2002) p. 19.
Yet, by the last result $\log(1 + x)$ is analytic on the entire interval $(-1, \infty)$. Thus, in view of (29.17), for each $x_0 > -1$ there is a neighborhood $B(x_0) \subseteq (-1, \infty)$ such that
$$\log(1 + x) = \log(1 + x_0) + \sum_{k=1}^{\infty}\frac{(-1)^{k+1}}{k}\frac{(x - x_0)^k}{(1 + x_0)^k} \quad \forall x \in B(x_0)$$
This formula was proved earlier in the book in Proposition 407 with a direct method. We can now see of which forest it is a tree.
Proposition 1417 A non-zero analytic function has at most countably many zeros.
Lemma 1418 The zeros of a non-zero analytic function are isolated points.
Proof Let $f : (a, b) \to \mathbb{R}$ be a non-zero analytic function and let
$$Z = \{x \in (a, b) : f(x) = 0\}$$
be the collection of the zero points of $f$. Let $x_0 \in Z$. We want to show that $x_0$ is an isolated point. To this end, observe that, $f$ being analytic, there is a neighborhood $B(x_0)$ and a scalar sequence $\{\alpha_k\}_{k=0}^{\infty}$ such that
$$f(x) = \sum_{k=0}^{\infty}\alpha_k(x - x_0)^k \quad \forall x \in B(x_0)$$
Let $\bar{k}$ be the position of the first non-zero $\alpha_k$ (as $f$ is non-zero, such a position exists). Define $\varphi : B(x_0) \to \mathbb{R}$ by
$$\varphi(x) = \sum_{k=\bar{k}}^{\infty}\alpha_k(x - x_0)^{k - \bar{k}} = \alpha_{\bar{k}} + \alpha_{\bar{k}+1}(x - x_0) + \alpha_{\bar{k}+2}(x - x_0)^2 + \cdots$$
Clearly, $\varphi(x_0) = \alpha_{\bar{k}} \neq 0$. As $\varphi$ is continuous, there is a small enough $\delta > 0$ such that $\varphi(x) \neq 0$ for all $x \in B_\delta(x_0)$. As $f(x) = (x - x_0)^{\bar{k}}\varphi(x)$ for all $x \in B(x_0)$, we conclude that $f(x) \neq 0$ for all $x_0 \neq x \in B_\delta(x_0)$. Thus $\{x_0\} = Z \cap B_\delta(x_0)$, proving that the zero point $x_0$ is isolated.
An analytic function is thus either identically zero or has at most countably many zeros. This result nicely complements the Bolzano-type theorems that ensure the existence of zeros. In particular, it implies that an equation defined by an analytic function has at most countably many solutions.
Next we present a couple of interesting consequences of this cardinality property that further illustrate the remarkable properties of analytic functions. The first one is about the cardinality of level sets.
Corollary 1419 The level sets of a non-constant analytic function are at most countable.
Proof Let $(f = k)$ be a non-empty level set, with $k \in \mathbb{R}$. A point of the level set $(f = k)$ is a zero of the analytic function $h = f - k$. As $h$ is non-zero since $f$ is non-constant, by Proposition 1417 the level set has at most countably many elements.
Corollary 1420 Two analytic functions $f$ and $g$ that are equal on an open interval $I \subseteq (a, b)$ are equal everywhere.
Proof Let $f(x) = g(x)$ for all $x \in I$. The analytic function $h = f - g$ vanishes on $I$, so it has uncountably many zeros; by Proposition 1417, it is then identically zero, that is, $f = g$ everywhere.
30.4 Coda
30.4.1 Hille-Taylor's formula
We can now state a beautiful version of Taylor's formula, due to Einar Hille, for continuous functions.¹⁵
Theorem 1421 (Hille) Let $f : (0, \infty) \to \mathbb{R}$ be a bounded continuous function and $x_0 > 0$. Then, for each $h > 0$,
$$f(x_0 + h) = \lim_{\delta \to 0^+}\sum_{k=0}^{\infty}\frac{\Delta_\delta^k f(x_0)}{k!}\,h^k \tag{30.23}$$
¹⁵ We omit its non-trivial proof (see Feller, 1966).
We call the limit (30.23) the Hille-Taylor formula, where $\Delta_\delta^k f(x_0)$ denotes the $k$-th difference quotient of $f$ at $x_0$ with step $\delta$. When $f$ is infinitely differentiable, this formula intuitively should approach the series expansion (30.11), i.e.,
$$f(x_0 + h) = \sum_{k=0}^{\infty}\frac{f^{(k)}(x_0)}{k!}\,h^k$$
because $\lim_{\delta \to 0^+}\Delta_\delta^k f(x_0) = f^{(k)}(x_0)$ for every $k \ge 1$ (Proposition 1256). This is actually true when $f$ is analytic because in this case (30.11) and (30.23) together imply
$$\lim_{\delta \to 0^+}\sum_{k=0}^{\infty}\frac{\Delta_\delta^k f(x_0)}{k!}\,h^k = \sum_{k=0}^{\infty}\frac{f^{(k)}(x_0)}{k!}\,h^k$$
The Hille-Taylor formula, however, holds when $f$ is just bounded and continuous, thus providing a remarkable generalization of the Taylor expansion for analytic functions.
and let $g : \mathbb{R} \to \mathbb{R}$ be the zero function (a trivially analytic function). We have $f^{(k)}(0) = g^{(k)}(0)$ for all $k \ge 0$, so $f$ and $g$ are an example of two distinct infinitely differentiable functions that have the same Maclaurin series. Indeed, Taylor series pin down uniquely only analytic functions.
But do coefficients of Taylor (in particular, of Maclaurin) series have some characterizing property? Is there some peculiar property that such coefficients satisfy? For analytic functions the answer is positive: the Cauchy-Hadamard Theorem requires
$$\limsup_n \sqrt[n]{|\alpha_n|} < +\infty$$
So, only scalar sequences $\{\alpha_k\}_{k=0}^{\infty}$ satisfying such a bound may qualify to be coefficients of a Taylor series of some analytic function. Yet, we do not live in the Lagrange calculus paradise: there exist infinitely differentiable functions that are not analytic. Indeed, the next deep theorem, whose highly non-trivial proof we omit, shows that, in general, the previous questions have a negative answer.¹⁶
where the scalar sequence $\{a_k\}$ is arbitrary, while the scalar sequence $\{b_k\}$ is positive and chosen to ensure the convergence of the series in a neighborhood of the origin. Peano showed that, given any scalar sequence $\{c_k\}$, by judiciously choosing the sequences $\{a_k\}$ and $\{b_k\}$ one can establish (30.24).
So, anything goes: given any scalar sequence $\{\alpha_k\}_{k=0}^{\infty}$, there is an infinitely differentiable function $f$ (not analytic when $\limsup_n \sqrt[n]{|\alpha_n|} = +\infty$) such that $f^{(k)}(0) = \alpha_k k!$ for all $k$, so with those scalars as the coefficients of its Maclaurin series.
This function is actually not unique: given any function $f$ satisfying (30.24) and any scalar $\lambda$, the function $f_\lambda : \mathbb{R} \to \mathbb{R}$ defined by
$$f_\lambda(x) = \begin{cases} f(x) + \lambda e^{-\frac{1}{x^2}} & \text{if } x \neq 0 \\ f(0) & \text{if } x = 0 \end{cases}$$
is easily seen to satisfy (30.24) as well. A continuum of infinitely differentiable functions that satisfy (30.24) thus exists.
Chapter 31
Concavity and differentiability
Concave functions have remarkable differential properties that confirm the great tractability of these widely used functions. The study of these properties is the subject matter of this chapter. We begin with scalar functions and then move to functions of several variables. Throughout the chapter $C$ always denotes a convex set (so an interval in the scalar case). For brevity, we will focus on concave functions, leaving to the readers the dual results that hold for convex functions.
Given two distinct points $x, y \in C$, the difference quotient
$$\frac{f(y) - f(x)}{y - x}$$
is the slope of the chord joining the points $(x, f(x))$ and $(y, f(y))$ of the graph of $f$, as one can verify with a simple modification of what was done for (26.6). Graphically:
[Figure: the chord joining $(x, f(x))$ and $(y, f(y))$, with vertical increment $f(y) - f(x)$ and horizontal increment $y - x$.]
If the function $f$ is concave, the slope of the chord decreases when we move the chord rightward. This basic geometric property characterizes concavity, as the next lemma shows.
Lemma 1423 A function $f : C \subseteq \mathbb{R} \to \mathbb{R}$ is concave if and only if, for any four points $x, w, y, z \in C$ with $x \le w \le y \le z$, $x \neq y$ and $w \neq z$, we have
$$\underbrace{\frac{f(y) - f(x)}{y - x}}_{\text{slope } AC} \ge \underbrace{\frac{f(z) - f(w)}{z - w}}_{\text{slope } BD} \tag{31.1}$$
In other words, by moving rightward from $[x, y]$ to $[w, z]$, the slope of the chords decreases. Graphically:
[Figure: points $A$, $B$, $C$, $D$ on the graph of $f$ at abscissae $x$, $w$, $y$, $z$; the chord $AC$ is steeper than the chord $BD$.]
Proof "Only if". Let $f$ be concave. The proof is divided into two steps: first we show that the chord $AC$ has a greater slope than the chord $BC$:
[Figure: the chords $AC$ and $BC$, with $A$, $B$, $C$ at abscissae $x$, $w$, $y$.]
Then, we show that the chord BC has a greater slope than the chord BD:
[Figure: the chords $BC$ and $BD$, with $B$, $C$, $D$ at abscissae $w$, $y$, $z$.]
The first step amounts to proving (31.1) for $z = y$. Since $x \le w < y$, there exists $\lambda \in [0, 1]$ such that $w = \lambda x + (1 - \lambda)y$. Since $f$ is concave, we have $f(w) \ge \lambda f(x) + (1 - \lambda)f(y)$, so that
$$\frac{f(y) - f(w)}{y - w} \le \frac{f(y) - \lambda f(x) - (1 - \lambda)f(y)}{y - \lambda x - (1 - \lambda)y} = \frac{f(y) - f(x)}{y - x} \tag{31.2}$$
This completes the first step. We now move to the second step, which amounts to proving (31.1) for $x = w$. Since $w < y \le z$, there exists $\lambda \in [0, 1]$ such that $y = \lambda w + (1 - \lambda)z$. Further, since $f$ is concave we have $f(y) \ge \lambda f(w) + (1 - \lambda)f(z)$, so that
$$\frac{f(y) - f(w)}{y - w} \ge \frac{\lambda f(w) + (1 - \lambda)f(z) - f(w)}{(1 - \lambda)(z - w)} = \frac{f(z) - f(w)}{z - w}$$
as desired.
"If". Assume (31.1). Let $x, z \in C$, with $x < z$, and $\lambda \in [0, 1]$. Set $y = \lambda x + (1 - \lambda)z$. If in (31.1) we set $w = x$, we have
$$\frac{f(\lambda x + (1 - \lambda)z) - f(x)}{\lambda x + (1 - \lambda)z - x} \ge \frac{f(z) - f(x)}{z - x}$$
Since $\lambda x + (1 - \lambda)z - x = (1 - \lambda)(z - x)$, we then have
$$\frac{f(\lambda x + (1 - \lambda)z) - f(x)}{(1 - \lambda)(z - x)} \ge \frac{f(z) - f(x)}{z - x}$$
that is, $f(\lambda x + (1 - \lambda)z) - f(x) \ge (1 - \lambda)(f(z) - f(x))$. In turn, this implies that $f$ is concave, as desired.
The geometric property (31.1) has the following analytical counterpart, of great economic significance.
Proposition 1424 Let $f : C \subseteq \mathbb{R} \to \mathbb{R}$ be concave. Then
$$f(x + h) - f(x) \ge f(y + h) - f(y) \tag{31.5}$$
for all $x \le y$ in $C$ and all $h \ge 0$ such that $x + h, y + h \in C$.
Proof Let $x \le y$ and $h \ge 0$. Then the points $y$ and $x + h$ belong to the interval $[x, y + h]$. Under the change of variable $z = y + h$, we have $x + h, z - h \in [x, z]$. Hence there is $\lambda \in [0, 1]$ for which $x + h = \lambda x + (1 - \lambda)z$. It is immediate to check that $z - h = (1 - \lambda)x + \lambda z$. By the concavity of $f$, we then have $f(x + h) \ge \lambda f(x) + (1 - \lambda)f(z)$ and $f(z - h) \ge (1 - \lambda)f(x) + \lambda f(z)$. Adding the two inequalities, we have
$$f(x + h) + f(z - h) \ge f(x) + f(z)$$
that is,
$$f(x + h) - f(x) \ge f(z) - f(z - h) = f(y + h) - f(y)$$
The inequality (31.5) does not change if we divide both sides by a scalar $h > 0$. Hence,
$$f'_+(x) = \lim_{h \to 0^+}\frac{f(x + h) - f(x)}{h} \ge \lim_{h \to 0^+}\frac{f(y + h) - f(y)}{h} = f'_+(y)$$
provided the limits exist. Similarly $f'_-(x) \ge f'_-(y)$, and so $f'(x) \ge f'(y)$ when the (bilateral) derivative exists. Concave functions $f$ thus feature decreasing marginal effects as their argument increases, and so embody a fundamental economic principle: additional units have a lower and lower marginal impact on levels (of utility, of production, and so on; we then talk of decreasing marginal utility, decreasing marginal returns, and so on). It is through this principle that forms of concavity first entered economics.¹
The next result establishes this property rigorously by showing that one-sided derivatives exist and are decreasing.
Proposition 1425 Let $f : C \subseteq \mathbb{R} \to \mathbb{R}$ be concave. Then:
(i) the right $f'_+(x)$ and left $f'_-(x)$ derivatives exist at each $x \in \operatorname{int} C$;²
(ii) the right $f'_+$ and left $f'_-$ derivative functions are both decreasing on $\operatorname{int} C$;
(iii) $f'_+(x) \le f'_-(x)$ at each $x \in \operatorname{int} C$.
Proof Let $x_0 \in \operatorname{int} C$. Since $x_0$ is an interior point, it has a neighborhood $(x_0 - \varepsilon, x_0 + \varepsilon)$ included in $C$, that is, $(x_0 - \varepsilon, x_0 + \varepsilon) \subseteq C$. Let $0 < a < \varepsilon$, so that we have $[x_0 - a, x_0 + a] \subseteq C$. Let $\phi : [-a, a] \setminus \{0\} \to \mathbb{R}$ be defined by
$$\phi(h) = \frac{f(x_0 + h) - f(x_0)}{h}$$
Property (31.1) implies that $\phi$ is decreasing, so the one-sided limits $f'_+(x_0) = \lim_{h \to 0^+}\phi(h)$ and $f'_-(x_0) = \lim_{h \to 0^-}\phi(h)$ exist, proving (i); moreover, $\phi$ being decreasing, $f'_+(x_0) = \sup_{h > 0}\phi(h) \le \inf_{h < 0}\phi(h) = f'_-(x_0)$, proving (iii). As to (ii), given $x \le y$ in $\operatorname{int} C$, by (31.5) we have
$$\frac{f(x + h) - f(x)}{h} \ge \frac{f(y + h) - f(y)}{h}$$
for all small enough $h > 0$. Hence,
$$f'_+(x) = \lim_{h \to 0^+}\frac{f(x + h) - f(x)}{h} \ge \lim_{h \to 0^+}\frac{f(y + h) - f(y)}{h} = f'_+(y)$$
which implies that the right derivative function is decreasing. A similar argument holds for the left derivative function.
Example 1427 (i) The concave function $f(x) = -|x|$ does not have a derivative at $x = 0$. Nevertheless, the one-sided derivatives exist at each point of the domain, with
$$f'_+(x) = \begin{cases} 1 & \text{if } x < 0 \\ -1 & \text{if } x \ge 0 \end{cases} \qquad f'_-(x) = \begin{cases} 1 & \text{if } x \le 0 \\ -1 & \text{if } x > 0 \end{cases}$$
Therefore, $f'_+(x) \le f'_-(x)$ for every $x \in \mathbb{R}$ and both one-sided derivative functions are decreasing.
(ii) The concave function
$$f(x) = \begin{cases} x + 1 & \text{if } x \le -1 \\ 0 & \text{if } -1 < x < 1 \\ 1 - x & \text{if } x \ge 1 \end{cases}$$
does not have a derivative at $x = -1$ and at $x = 1$. Yet, the one-sided derivatives exist at each point of the domain, with
$$f'_+(x) = \begin{cases} 1 & \text{if } x < -1 \\ 0 & \text{if } -1 \le x < 1 \\ -1 & \text{if } x \ge 1 \end{cases} \qquad f'_-(x) = \begin{cases} 1 & \text{if } x \le -1 \\ 0 & \text{if } -1 < x \le 1 \\ -1 & \text{if } x > 1 \end{cases}$$
Therefore, $f'_+(x) \le f'_-(x)$ for every $x \in \mathbb{R}$ and both one-sided derivative functions are decreasing.
(iii) The concave function $f(x) = 1 - x^2$ is differentiable on $\mathbb{R}$ with $f'(x) = -2x$. The derivative function is decreasing. N
Proposition 1425 says, inter alia, that at the interior points $x$ we have $f'_+(x) \le f'_-(x)$. The next result says that we actually have $f'_+(x) = f'_-(x)$, and so $f$ is differentiable at $x$, at all points $x \in C$ except those belonging to an, at most, countable subset of $C$. For the three concave functions of the previous example, such a set of non-differentiability is $\{0\}$, $\{-1, 1\}$ and $\emptyset$, respectively.
The proof relies on a few interesting lemmas that, in a crescendo, shed further light on one-sided derivatives of concave functions. The first one significantly refines the inequality established in Proposition 1425-(iii).
Proof Let $x \in \operatorname{int} C$. We show only that $\lim_{y \to x^+}f'_+(y) = f'_+(x)$, as a similar argument proves that $\lim_{y \to x^-}f'_-(y) = f'_-(x)$. Since $f'_+ : \operatorname{int} C \to \mathbb{R}$ is decreasing, the limit $\lim_{y \to x^+}f'_+(y)$ exists. In particular, $f'_+(y) \le f'_+(x)$ for all $x < y \in C$, and so
$$\lim_{y \to x^+}f'_+(y) \le f'_+(x) \tag{31.8}$$
The final lemma refines the previous one by showing that the one-sided derivative functions of a concave function can have jump discontinuities only of the form $\left[f'_+(x), f'_-(x)\right]$.
and
$$\lim_{y \to x^+}f'_-(y) = f'_+(x) \le f'_-(x) = \lim_{y \to x^-}f'_-(y)$$
Proof By Lemma 1429, we have $f'_-(x) \le f'_+(y) \le f'_-(y)$ for all $y < x$ in $\operatorname{int} C$. So, by Lemma 1430 we have
$$f'_-(x) \le \lim_{y \to x^-}f'_+(y) \le \lim_{y \to x^-}f'_-(y) = f'_-(x)$$
We conclude that $\lim_{y \to x^-}f'_+(y) = f'_-(x)$ for all $x \in C$. A similar argument proves that $f'_+(x) = \lim_{y \to x^+}f'_-(y)$ for all $x \in C$.
A final remark on this proof. Since $f'(x) = f'_+(x) = f'_-(x)$ at all $x \in C \setminus D$, the derivative function $f'$ is continuous on $C \setminus D$ (cf. Section 23.1.1). The concave function $f$ is thus continuously differentiable on $C \setminus D$.
Proof Let $f$ be concave and let $x$ and $y$ be two distinct points of $(a, b)$. If $\lambda \in (0, 1)$, we have
$$f(x + (1 - \lambda)(y - x)) = f(\lambda x + (1 - \lambda)y) \ge \lambda f(x) + (1 - \lambda)f(y)$$
Therefore,
$$\frac{f(x + (1 - \lambda)(y - x)) - f(x)}{1 - \lambda} \ge f(y) - f(x)$$
⁴ Since $f'(x) = f'_+(x) = f'_-(x)$ at all $x \in C \setminus D$, the derivative function $f'$ is continuous on $C \setminus D$ (cf. Section 23.1.1). So, $f$ is continuously differentiable on $C \setminus D$.
that is,
$$\frac{f(x + (1 - \lambda)(y - x)) - f(x)}{(1 - \lambda)(y - x)}\,(y - x) \ge f(y) - f(x)$$
This inequality holds for every $\lambda \in (0, 1)$. Hence, thanks to the differentiability of $f$ at $x$, we have
$$\lim_{\lambda \to 1}\frac{f(x + (1 - \lambda)(y - x)) - f(x)}{(1 - \lambda)(y - x)}\,(y - x) = f'(x)(y - x)$$
Therefore, $f'(x)(y - x) \ge f(y) - f(x)$, as desired.
Let $f$ be strictly concave. Suppose there exists $y \in (a, b)$, with $y \neq x$, such that $f(y) = f(x) + f'(x)(y - x)$. Then,
$$f\left(\frac{1}{2}x + \frac{1}{2}y\right) > \frac{1}{2}f(x) + \frac{1}{2}f(y) = \frac{1}{2}f(x) + \frac{1}{2}\left[f(x) + f'(x)(y - x)\right] = f(x) + f'(x)\,\frac{y - x}{2} \ge f\left(\frac{1}{2}x + \frac{1}{2}y\right)$$
where the last inequality follows from (31.9). This contradiction completes the proof.
The right-hand side of inequality (31.9) is the tangent line of $f$ at $x$, that is, the linear approximation of $f$ that holds, locally, at $x$. By Theorem 1432, such a line always lies above the graph of the function: the approximation is in "excess".
Geometrically, this remarkable property is clear: the definition of concavity requires that the straight line passing through the two points $(x, f(x))$ and $(y, f(y))$ lies below the graph of $f$ in the interval between $x$ and $y$, and hence that it lies above it outside that interval.⁵ Letting $y$ tend to $x$, the straight line becomes tangent and lies entirely above the curve.
[Figure: the tangent line $f(x) + f'(x)(y - x)$ at $x$ lies above the graph of $f$ at all points $y$.]
⁵ For completeness, let us prove it. Let $z$ be outside the interval $[x, y]$: suppose that $z > y$. We can then write $y = \lambda x + (1 - \lambda)z$ with $\lambda \in (0, 1)$ and, by the concavity of $f$, we have $f(y) \ge \lambda f(x) + (1 - \lambda)f(z)$, that is, $f(z) \le (1 - \lambda)^{-1}f(y) - \lambda(1 - \lambda)^{-1}f(x)$. Setting $\mu = 1/(1 - \lambda) > 1$, so that $-\lambda/(1 - \lambda) = 1 - \mu$, this reads $f(z) = f(\mu y + (1 - \mu)x) \le \mu f(y) + (1 - \mu)f(x)$. If $z < x$, we reason similarly.
Theorem 1433 Let $f : (a, b) \to \mathbb{R}$ be differentiable on $(a, b)$. Then, $f$ is concave if and only if
$$f(y) \le f(x) + f'(x)(y - x) \quad \forall x, y \in (a, b) \tag{31.10}$$
Thus, for a function $f$ differentiable on an open interval, a necessary and sufficient condition for the concavity of $f$ is that the tangent lines at the various points of its domain all lie above its graph.
Proof The "only if" follows from the previous theorem. We prove the "if". Suppose that inequality (31.10) holds and consider the point $z = \lambda x + (1 - \lambda)y$. Let us consider (31.10) twice: first with the points $z$ and $x$, and then with the points $z$ and $y$. Then:
$$f'(z)(1 - \lambda)(x - y) \ge f(x) - f(\lambda x + (1 - \lambda)y)$$
$$f'(z)\lambda(y - x) \ge f(y) - f(\lambda x + (1 - \lambda)y)$$
Let us multiply the first inequality by $\lambda$, the second one by $(1 - \lambda)$, and add them. We get
$$0 \ge \lambda f(x) + (1 - \lambda)f(y) - f(\lambda x + (1 - \lambda)y)$$
Given the arbitrariness of $x$ and $y$, we conclude that $f$ is concave. A similar argument shows that, if inequality (31.10) is strict when $x \neq y$, then $f$ is strictly concave.
Theorem 1434 Let $f : C \subseteq \mathbb{R} \to \mathbb{R}$ be continuous. Then:
(i) $f$ is concave if and only if the right derivative function $f'_+$ exists and is decreasing on $\operatorname{int} C$;
(ii) $f$ is strictly concave if and only if the right derivative function $f'_+$ exists and is strictly decreasing on $\operatorname{int} C$.
Proof (i) We only prove the "if" since the converse follows from Proposition 1425. For simplicity, assume that $f$ is differentiable on the open interval $\operatorname{int} C$. By hypothesis, $f'$ is decreasing on $\operatorname{int} C$. Let $x, y \in \operatorname{int} C$, with $x < y$, and $\lambda \in (0, 1)$. Set $z = \lambda x + (1 - \lambda)y$, so that $x < z < y$. By the Mean Value Theorem, there exist $\xi_x \in (x, z)$ and $\xi_y \in (z, y)$ such that
$$f'(\xi_x) = \frac{f(z) - f(x)}{z - x}, \qquad f'(\xi_y) = \frac{f(y) - f(z)}{y - z}$$
Since $f'$ is decreasing, $f'(\xi_x) \ge f'(\xi_y)$. Hence,
$$\frac{f(\lambda x + (1 - \lambda)y) - f(x)}{\lambda x + (1 - \lambda)y - x} \ge \frac{f(y) - f(\lambda x + (1 - \lambda)y)}{y - \lambda x - (1 - \lambda)y}$$
Being $\lambda x + (1 - \lambda)y - x = (1 - \lambda)(y - x)$ and $y - \lambda x - (1 - \lambda)y = \lambda(y - x)$, we then have
$$\frac{f(\lambda x + (1 - \lambda)y) - f(x)}{(1 - \lambda)(y - x)} \ge \frac{f(y) - f(\lambda x + (1 - \lambda)y)}{\lambda(y - x)}$$
In turn, this easily implies $f(\lambda x + (1 - \lambda)y) \ge \lambda f(x) + (1 - \lambda)f(y)$, as desired.⁶ (ii) This part is left to the reader.
A similar result, left to the reader, holds for the other one-sided derivative $f'_-$. This theorem thus establishes a differential characterization of concavity by showing that it is equivalent to the decreasing monotonicity of the one-sided derivative functions.
Example 1435 Consider the function $f : \mathbb{R} \to \mathbb{R}$ defined by
$$f(x) = \begin{cases} x + x^3 & \text{if } x \le 0 \\ x - x^3 & \text{if } x > 0 \end{cases}$$
The function $f$ is continuous. It has one-sided derivatives at each point of the domain, with
$$f'_+(x) = \begin{cases} 1 + 3x^2 & \text{if } x < 0 \\ 1 - 3x^2 & \text{if } x \ge 0 \end{cases} \qquad f'_-(x) = \begin{cases} 1 + 3x^2 & \text{if } x \le 0 \\ 1 - 3x^2 & \text{if } x > 0 \end{cases}$$
To see that this is the case, consider the origin, which is the most delicate point. We have
$$f'_+(0) = \lim_{h \to 0^+}\frac{f(h) - f(0)}{h} = \lim_{h \to 0^+}\frac{h - h^3}{h} = \lim_{h \to 0^+}\left(1 - h^2\right) = 1$$
and
$$f'_-(0) = \lim_{h \to 0^-}\frac{f(h) - f(0)}{h} = \lim_{h \to 0^-}\frac{h + h^3}{h} = \lim_{h \to 0^-}\left(1 + h^2\right) = 1$$
Therefore, $f'_+(x) \le f'_-(x)$ for every $x \in \mathbb{R}$ and both one-sided derivative functions are decreasing. By Theorem 1434, the function $f$ is concave. N
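One can also probe the delicate point numerically; the following short Python sketch (using the piecewise formula for $f$ written above) computes both one-sided difference quotients at the origin:

# One-sided derivatives of Example 1435 at the origin, numerically.
def f(x):
    return x + x**3 if x <= 0 else x - x**3

for h in [1e-2, 1e-4, 1e-6]:
    right = (f(h) - f(0)) / h       # -> f'_+(0) = 1
    left = (f(-h) - f(0)) / (-h)    # -> f'_-(0) = 1
    print(h, right, left)
# Both quotients tend to 1: here f is in fact differentiable at 0.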
⁶ Using a version of the Mean Value Theorem for one-sided derivatives, we can prove the result without any differentiability assumption on $f$.
One-sided derivatives are key in the previous theorem because concavity per se only ensures their existence, not that of the two-sided derivatives. One-sided derivatives are, however, less easy to handle than the two-sided derivative. So, in applications differentiability is often assumed. In this case we have the following simple consequence of the previous theorem, which provides a useful concavity criterion for functions.
Corollary 1436 Let $f : C \subseteq \mathbb{R} \to \mathbb{R}$ be differentiable on $\operatorname{int} C$ and continuous on $C$.⁷ Then:
(i) $f$ is concave if and only if the derivative function $f'$ is decreasing on $\operatorname{int} C$;
(ii) $f$ is strictly concave if and only if the derivative function $f'$ is strictly decreasing on $\operatorname{int} C$.
Proof We only prove (i), as (ii) is similar. Let $f : C \subseteq \mathbb{R} \to \mathbb{R}$ be differentiable on $\operatorname{int} C$ and continuous on $C$. If $f$ is concave, Theorem 1434 implies that $f' = f'_+$ is decreasing. Vice versa, if $f' = f'_+$ is decreasing, then Theorem 1434 implies that $f$ is concave.
⁷ When $C$ is open, the continuity assumption becomes superfluous (a similar observation applies to Corollary 1438 below).
The derivatives are strictly decreasing and therefore f and g are strictly concave thanks to
Corollary 1436. N
The previous corollary provides a simple differential criterion of concavity that reduces the test of concavity to that, often operationally simple, of a property of first derivatives. The next result shows that it is actually possible to do even better by recalling the differential characterization of monotonicity seen in Section 28.4.
Corollary 1438 Let $f : C \subseteq \mathbb{R} \to \mathbb{R}$ be twice differentiable on $\operatorname{int} C$ and continuous on $C$. Then:
(i) $f$ is concave if and only if $f''(x) \le 0$ for every $x \in \operatorname{int} C$;
(ii) $f$ is strictly concave if $f''(x) < 0$ for every $x \in \operatorname{int} C$.
Proof (i) It is sufficient to observe that, thanks to the "decreasing" version of Proposition 1322, the first derivative $f'$ is decreasing on $\operatorname{int} C$ if and only if $f''(x) \le 0$ for every $x \in \operatorname{int} C$. (ii) It follows from the "strictly decreasing" version of Proposition 1324.
Under the further hypothesis that $f$ is twice differentiable on $\operatorname{int} C$, concavity thus becomes equivalent to the negativity of the second derivative, a condition often easier to check than the decreasing monotonicity of the first derivative. In any case, thanks to the last two corollaries we now have powerful differential tests of concavity.⁸
⁸ As the reader can check, dual results hold for convex functions, with increasing monotonicity instead of decreasing monotonicity (and $f'' \ge 0$ instead of $f'' \le 0$).
Note the asymmetry between points (i) and (ii): while in (i) the condition $f'' \le 0$ is a necessary and sufficient condition for concavity, in (ii) the condition $f'' < 0$ is only a sufficient condition for strict concavity, as the function $f(x) = -x^4$ exemplifies. This follows from the analogous asymmetry for monotonicity between Propositions 1322 and 1324.
Example 1439 (i) The functions $f(x) = \sqrt{x}$ and $g(x) = \log x$ have, respectively, derivatives $f'(x) = 1/(2\sqrt{x})$ and $g'(x) = 1/x$ that are strictly decreasing. Therefore, they are strictly concave. The second derivatives $f''(x) = -1/(4x^{3/2}) < 0$ and $g''(x) = -1/x^2 < 0$ confirm this conclusion.
(ii) The function $f(x) = x^2$ has derivative $f'(x) = 2x$ that is strictly increasing. Therefore, it is strictly convex. Indeed, $f''(x) = 2 > 0$.
(iii) The function $f(x) = x^3$ has derivative $f'(x) = 3x^2$ that is strictly decreasing on $(-\infty, 0]$ and strictly increasing on $[0, \infty)$. Indeed, the second derivative $f''(x) = 6x$ is $\le 0$ on $(-\infty, 0]$ and $\ge 0$ on $[0, \infty)$. N
Example 1440 (i) The power functions $f : [0, \infty) \to \mathbb{R}$ defined by $f(x) = x^n$ are strictly convex for all $n > 1$. Indeed, we have $f''(x) = n(n - 1)x^{n-2} > 0$ for all $x > 0$. (ii) The fractional power functions $f : [0, \infty) \to \mathbb{R}$ defined by $f(x) = x^{\frac{1}{n}}$ are strictly concave for all $n > 1$. Indeed, we have
$$f''(x) = \frac{1}{n}\left(\frac{1}{n} - 1\right)x^{\frac{1}{n} - 2} < 0$$
for all $x > 0$. (iii) The negative exponential function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = -e^{-x}$ is strictly concave because $f''(x) = -e^{-x} < 0$. N
We close with a monotonicity twist: the converse of Proposition 1324 holds under concavity. So, under this property the strict monotonicity of a function and the strict positivity of its derivative become equivalent properties.
Corollary 1441 Let $f : (a, b) \to \mathbb{R}$ be concave and differentiable. Then $f$ is strictly increasing if and only if $f'(x) > 0$ for all $x \in (a, b)$.
The negative exponential $f(x) = -e^{-x}$ illustrates this result. On the other hand, the cubic function $f(x) = x^3$, often used to show the failure of the converse in Proposition 1324, is not concave (cf. Example 1325).
Proof The "if" follows from Proposition 1324. As to the converse, we have $f' \ge 0$ since $f$ is increasing (Proposition 1322). By Corollary 1436-(i), $f'$ is decreasing since $f$ is concave. It remains to prove that $f' > 0$. Suppose, by contradiction, that $f'(x_0) = 0$ for some $x_0 \in (a, b)$. Since $f'$ is decreasing and $f' \ge 0$, this implies $f'(x) = 0$ for all $x \in [x_0, b)$. By Corollary 1312, $f$ is constant on any closed subinterval of $[x_0, b)$, thus contradicting the strict monotonicity of $f$. We conclude that $f' > 0$.
We could further strengthen the previous corollary by dropping the differentiability assumption. Since concave functions always admit right and left derivatives at each interior point of their domain (Proposition 1425), the result continues to hold if we replace $f'$ with either $f'_-$ or $f'_+$.
So far so good. Yet, given two functions $h$ and $g$, it is not easy in general to check directly the existence of a concave and strictly increasing transformation $f$ such that $h = f \circ g$ or $g = f \circ h$, as the last definition requires. Fortunately, there is a simple differential criterion. To state it, assume that the convex set $C$ is an open interval $(a, b)$, possibly unbounded, and that $g$ is twice differentiable with $g'(x) \neq 0$ for all $x \in (a, b)$. The function $\rho_g : (a, b) \to \mathbb{R}$ defined by
$$\rho_g(x) = -\frac{g''(x)}{g'(x)} \quad \forall x \in (a, b)$$
is called the index of (relative) concavity of $g$.⁹ The computation of this index requires only the knowledge of the first and second derivatives of the function at hand.
Example 1444 (i) Let $g : (0, \infty) \to \mathbb{R}$ be the logarithmic function $g(x) = \log x$. We have $g'(x) = 1/x$ and $g''(x) = -1/x^2$. So, $\rho_g : (0, \infty) \to \mathbb{R}$ is given by
$$\rho_g(x) = -\frac{g''(x)}{g'(x)} = -\frac{-\frac{1}{x^2}}{\frac{1}{x}} = \frac{1}{x}$$
⁹ This index was introduced by de Finetti (1952) p. 700 and, independently, by Pratt (1964) and Arrow (1971), who developed its far-reaching economic applications. For this reason, it is often called the Arrow-Pratt index (and Theorem 1445 is named after them).
(ii) Let $g : \mathbb{R} \to \mathbb{R}$ be the negative exponential function $g(x) = -e^{-kx}$, with $k > 0$. We have
$$\rho_g(x) = -\frac{g''(x)}{g'(x)} = \frac{k^2 e^{-kx}}{k e^{-kx}} = k$$
(iii) Let $g : (0, \infty) \to \mathbb{R}$ be the power function $g(x) = x^p$, with $p \neq 0$. We have
$$\rho_g(x) = -\frac{g''(x)}{g'(x)} = -\frac{p(p - 1)x^{p-2}}{p\,x^{p-1}} = \frac{1 - p}{x}$$
Since $\rho_g'(x) = (p - 1)/x^2$, this concavity index is strictly increasing if $p > 1$, strictly decreasing if $p < 1$, and constant if $p = 1$ (i.e., if $g$ is a straight line). N
The next remarkable result shows that the comparison of concavity of two functions can be performed via their concavity indexes. This result makes operational the comparative notion introduced in the last definition.
Theorem 1445 (Arrow-Pratt) Let $g, h : (a, b) \to \mathbb{R}$ be twice differentiable, with $g', h' > 0$. Then the following conditions are equivalent:
(i) $h$ is more concave than $g$;
(ii) $\rho_h \ge \rho_g$.¹⁰
Lemma 1446 Let $f : (a, b) \to \mathbb{R}$ be strictly monotone and differentiable. If $f$ is twice differentiable at $x_0 \in (a, b)$, with $f'(x_0) \neq 0$, then its inverse function $f^{-1}$ is twice differentiable at $y_0 = f(x_0)$, with
$$\left(f^{-1}\right)''(y_0) = -\frac{f''(x_0)}{\left[f'(x_0)\right]^3}$$
This formula is a special case of a combinatorial formula for the higher order derivatives of inverses that is reminiscent of that of Faa di Bruno.¹¹ For instance, the third derivative of the inverse is:
$$\left(f^{-1}\right)'''(y_0) = 3\frac{\left[f''(x_0)\right]^2}{\left[f'(x_0)\right]^5} - \frac{f'''(x_0)}{\left[f'(x_0)\right]^4}$$
Proof Suppose that $f$ is strictly increasing (the decreasing case is similar), so that $f'(x_0) > 0$. Since $f$ is twice differentiable, the derivative function $f'$ is continuous. By Theorem 532, there exists a neighborhood $B_\varepsilon(x_0) \subseteq (a, b)$ such that $f'(x) > 0$ for all $x \in B_\varepsilon(x_0)$. Without loss of generality, assume that $B_\varepsilon(x_0) = (a, b)$, i.e., that $f' > 0$.
¹⁰ That is, $\rho_h(x) \ge \rho_g(x)$ for all $x \in (a, b)$.
¹¹ We refer interested readers to Johnson (2002).
To ease notation, denote $f^{-1}$ by $\varphi$. Set $\varphi(y_0 + h) = x_0 + k$ and observe that, by the continuity of $\varphi$, when $h \to 0$, also $k \to 0$. Moreover, we have $h = f(x_0 + k) - f(x_0)$ because $\varphi^{-1} = f$. That said, by Theorem 1234 we have $\varphi'(y) = 1/f'(\varphi(y))$ for all $y \in \operatorname{Im} f$. We then have
$$\lim_{h \to 0}\frac{\varphi'(y_0 + h) - \varphi'(y_0)}{h} = \lim_{h \to 0}\frac{\frac{1}{f'(\varphi(y_0 + h))} - \frac{1}{f'(x_0)}}{h} = \lim_{h \to 0}\frac{f'(x_0) - f'(\varphi(y_0 + h))}{h\,f'(x_0)\,f'(\varphi(y_0 + h))}$$
$$= \frac{1}{f'(x_0)}\lim_{h \to 0}\frac{f'(x_0) - f'(\varphi(y_0 + h))}{h}\cdot\lim_{h \to 0}\frac{1}{f'(\varphi(y_0 + h))} = \frac{1}{\left[f'(x_0)\right]^2}\lim_{k \to 0}\frac{f'(x_0) - f'(x_0 + k)}{h}$$
$$= \frac{1}{\left[f'(x_0)\right]^2}\lim_{k \to 0}\frac{f'(x_0) - f'(x_0 + k)}{f(x_0 + k) - f(x_0)} = \frac{1}{\left[f'(x_0)\right]^2}\,\frac{\lim_{k \to 0}\frac{f'(x_0) - f'(x_0 + k)}{k}}{\lim_{k \to 0}\frac{f(x_0 + k) - f(x_0)}{k}} = \frac{1}{\left[f'(x_0)\right]^2}\cdot\frac{-f''(x_0)}{f'(x_0)} = -\frac{f''(x_0)}{\left[f'(x_0)\right]^3}$$
where the fourth equality follows from the continuity of $f'$ and the seventh one from the condition $f'(x_0) \neq 0$.
Proof of the Arrow-Pratt Theorem (i) implies (ii). Suppose that there exists a concave and strictly increasing function $f : \operatorname{Im} g \to \mathbb{R}$ such that $h = f \circ g$. Since $h$ and $g$ are injective, simple algebra shows that $f = h \circ g^{-1}$. Since $g$ is twice differentiable, with $g' > 0$, by Lemma 1446 the inverse $g^{-1}$ is also twice differentiable. Since $h$ is twice differentiable, also $f$ is then twice differentiable (cf. the Theorem of Faa di Bruno).
Since $h = f \circ g$, by the chain rule we have
$$h'(x) = f'(g(x))\,g'(x) \quad\text{and}\quad h''(x) = f''(g(x))\left[g'(x)\right]^2 + g''(x)\,f'(g(x)) \tag{31.12}$$
So,
$$\rho_h(x) = -\frac{h''(x)}{h'(x)} = -\frac{f''(g(x))\left[g'(x)\right]^2 + g''(x)\,f'(g(x))}{f'(g(x))\,g'(x)} = \rho_g(x) - \frac{f''(g(x))}{f'(g(x))}\,g'(x) \tag{31.13}$$
Since $f$ is concave and strictly increasing, we have $f'' \le 0$ and $f' > 0$; as $g' > 0$, (31.13) then yields $\rho_h \ge \rho_g$.
(ii) implies (i). Suppose that $\rho_h \ge \rho_g$ and define $f = h \circ g^{-1}$, so that $h = f \circ g$. Since $g$ is twice differentiable, with $g' > 0$, by Lemma 1446 the inverse $g^{-1}$ is twice differentiable as well. Since $h$ is twice differentiable, also $f$ is twice differentiable (again, cf. the Theorem of Faa di Bruno).
Since $h = f \circ g$ and $h$ is twice differentiable, (31.12) holds. In particular, from $h', g' > 0$ it follows that $f' > 0$, while by (31.13) we have that $\rho_g \le \rho_h$ implies $f'' \le 0$. We conclude that $f$ is strictly increasing and concave.
Inspection of the proof shows that, under the hypotheses of the last theorem, the trans-
formation f in (31.11) is twice differentiable.
Example 1447 (i) In view of the last example, the negative exponential function can be
written as -e^{-\lambda x}, where the scalar \lambda \in \mathbb{R} is the constant value of the concavity index of
this function. Given any two such functions g(x) = -e^{-\lambda_1 x} and h(x) = -e^{-\lambda_2 x}, by the last
theorem g is more averse than h if and only if \lambda_1 \geq \lambda_2.

The coefficient \lambda thus characterizes the degree of concavity of the negative exponential
function, which is higher the greater is this coefficient. It is a remarkable property of this
function.

(ii) Consider the power functions g, h : (0, +\infty) \to \mathbb{R} given by g(x) = x^p and h(x) = x^q,
with non-zero p, q \in \mathbb{R}. On (0, +\infty) we have g', h' > 0, so we can invoke the Arrow-Pratt
Theorem. In particular, it implies that h is more concave than g if and only if

\frac{1 - q}{x} \geq \frac{1 - p}{x}   \forall x > 0

that is, if and only if q \leq p. Note that these power functions are concave when their
exponents are < 1 and convex when they are > 1. This example thus confirms that, as
remarked before, we can compare the degree of concavity also of non-concave functions. N
that is, as an inequality between a weighted arithmetic mean and a quasi-arithmetic mean
(Section 15.10). This "quasi-arithmetic" angle on the Jensen inequality paves the way to the
next result.

Theorem 1448 Let g, h : C \to \mathbb{R} be strictly increasing. Then, h is more concave than g if
and only if

h^{-1}\left(\sum_{i=1}^n \alpha_i h(x_i)\right) \leq g^{-1}\left(\sum_{i=1}^n \alpha_i g(x_i)\right)   (31.15)

for every finite collection \{x_1, x_2, ..., x_n\} in C and all \alpha_i \geq 0 such that \sum_{i=1}^n \alpha_i = 1.

Thus, quasi-arithmetic means can be ranked according to the degree of concavity of the
strictly increasing functions that define them. The Jensen inequality for strictly increasing
functions is the special case in which h is the identity function h(x) = x, so that g is
concave (cf. Example 1443).

This quasi-arithmetic generalization of the Jensen inequality plays a key role in risk
theory, in particular in the study of risk aversion. It was independently proved in 1931 by
Bruno de Finetti and Børge Jessen.
Proof (i) implies (ii). Suppose that h is more concave than g, i.e., h = f \circ g with f :
Im g \to \mathbb{R} concave and strictly increasing. Since both g and f are strictly increasing, we
have h^{-1} = (f \circ g)^{-1} = g^{-1} \circ f^{-1}. Let \{x_1, x_2, ..., x_n\} be a finite collection in C and let
\alpha_i \geq 0 be such that \sum_{i=1}^n \alpha_i = 1. We have:

h^{-1}\left(\sum_{i=1}^n \alpha_i h(x_i)\right) = \left(g^{-1} \circ f^{-1}\right)\left(\sum_{i=1}^n \alpha_i (f \circ g)(x_i)\right)
\leq \left(g^{-1} \circ f^{-1}\right)\left(f\left(\sum_{i=1}^n \alpha_i g(x_i)\right)\right)
= g^{-1}\left(f^{-1}\left(f\left(\sum_{i=1}^n \alpha_i g(x_i)\right)\right)\right) = g^{-1}\left(\sum_{i=1}^n \alpha_i g(x_i)\right)

where the inequality holds because the function g^{-1} \circ f^{-1} is strictly increasing and, by
the Jensen inequality, the concavity of f implies \sum_{i=1}^n \alpha_i (f \circ g)(x_i) \leq f\left(\sum_{i=1}^n \alpha_i g(x_i)\right).
Moreover, the last equality holds because f is strictly increasing, so f^{-1}(f(x)) = x for all
x \in Im g.
(ii) implies (i). Suppose that (31.15) holds. Since g is strictly increasing, we can define
f : Im g \to \mathbb{R} as f = h \circ g^{-1}. We then have

h^{-1}(h(x)) = x = g^{-1}(g(x))   \forall x \in C

So

h(x) = h\left(g^{-1}(g(x))\right) = \left(h \circ g^{-1}\right)(g(x)) = (f \circ g)(x)   \forall x \in C

We thus have h = f \circ g. It remains to show that f is concave and strictly increasing.
The latter property holds because g^{-1} is strictly increasing, being g strictly increasing. As
to concavity, let \{x_1, x_2, ..., x_n\} be a finite collection in C and let \alpha_i \geq 0 be such that
\sum_{i=1}^n \alpha_i = 1. By (31.15) we have

h\left(g^{-1}\left(\sum_{i=1}^n \alpha_i g(x_i)\right)\right) \geq h\left(h^{-1}\left(\sum_{i=1}^n \alpha_i h(x_i)\right)\right)

that is, f\left(\sum_{i=1}^n \alpha_i g(x_i)\right) \geq \sum_{i=1}^n \alpha_i f(g(x_i)). Since the points g(x_i) range over
Im g, this means that f is concave.
Example 1449 (i) If we let h(x) = -e^{-\lambda_1 x} and g(x) = -e^{-\lambda_2 x} with \lambda_1 \geq \lambda_2, then for every finite collection
\{x_1, x_2, ..., x_n\} of scalars we have

-\frac{1}{\lambda_1} \log\left(\sum_{i=1}^n \alpha_i e^{-\lambda_1 x_i}\right) \leq -\frac{1}{\lambda_2} \log\left(\sum_{i=1}^n \alpha_i e^{-\lambda_2 x_i}\right)

for all \alpha_i \geq 0 such that \sum_{i=1}^n \alpha_i = 1. Note that this inequality involves log-exponential
functions (Example 912).

(ii) Since h(x) = 2^{-1} \log x is more concave than g(x) = \sqrt{x} on (0, +\infty), for every finite
collection \{x_1, x_2, ..., x_n\} of strictly positive scalars we have

e^{2 \sum_{i=1}^n \alpha_i \log \sqrt{x_i}} \leq \left(\sum_{i=1}^n \alpha_i \sqrt{x_i}\right)^2

for all \alpha_i \geq 0 such that \sum_{i=1}^n \alpha_i = 1. Indeed, g^{-1}(x) = x^2 for x \geq 0 and h^{-1}(x) = e^{2x}. N
Corollary 1450 Let -\infty < q \leq p < +\infty be non-zero. For every finite collection \{x_1, x_2, ..., x_n\}
of strictly positive scalars, we have

\left(\sum_{i=1}^n \alpha_i x_i^q\right)^{1/q} \leq \left(\sum_{i=1}^n \alpha_i x_i^p\right)^{1/p}   (31.17)

for all \alpha_i \geq 0 such that \sum_{i=1}^n \alpha_i = 1.

Inequality (31.17) is called power mean inequality because it compares power means.
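A quick numerical illustration of (31.17) (a sketch of ours, assuming NumPy; the helper name is hypothetical):

```python
import numpy as np

def power_mean(x, alpha, p):
    # Weighted power mean of order p (p nonzero)
    return (alpha @ x ** p) ** (1.0 / p)

x = np.array([1.0, 4.0, 9.0])
alpha = np.array([0.2, 0.5, 0.3])  # nonnegative weights summing to 1

for q, p in [(-1.0, 1.0), (0.5, 2.0), (1.0, 3.0)]:
    mq, mp = power_mean(x, alpha, q), power_mean(x, alpha, p)
    print(q, p, mq <= mp)  # True in each case, as (31.17) predicts
```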
\alpha = \frac{1}{f(b) - f(a)} > 0  and  \beta = -\frac{f(a)}{f(b) - f(a)}

So, f_{a,b} = \varphi \circ f, where \varphi is the strictly increasing affine function \varphi(x) = \alpha x + \beta.

The function f and its normalization f_{a,b} share the same monotonicity, continuity, and
concavity properties. More importantly, they share the same degree of concavity. So, if we
take any two functions f, g : [a, b] \to \mathbb{R}, their normalizations f_{a,b} and g_{a,b} factor out some
differences that are immaterial for the comparison of their degrees of concavity.
More generally, we can normalize f on any subinterval [c, d] \subseteq [a, b], so that it assumes
value 0 at c and 1 at d, via its transformation f_{c,d} : [a, b] \to \mathbb{R} defined by

f_{c,d}(x) = \frac{f(x) - f(c)}{f(d) - f(c)}   \forall x \in [a, b]

Now, f_{c,d} = \alpha f + \beta where

\alpha = \frac{1}{f(d) - f(c)} > 0  and  \beta = -\frac{f(c)}{f(d) - f(c)}   (31.18)

Again, the function f and its normalization f_{c,d} share the same monotonicity, continuity,
and concavity properties as well as the same degree of concavity.

The next definition builds upon these normalizations.

Definition 1451 Given any two functions f, g : [a, b] \to \mathbb{R}, we say that f hereditarily
dominates g if f_{c,d} \geq g_{c,d} for all a \leq c < d \leq b.12
Theorem 1452 Let f, g : [a, b] \to \mathbb{R} be strictly increasing, continuous and concave func-
tions. Then, f is more concave than g if and only if f hereditarily dominates g.

Proof Since f and g are strictly increasing and continuous, there exists a strictly increasing
and continuous function h : Im g \to Im f such that f = h \circ g. Let \alpha, \alpha' > 0 and \beta, \beta' \in \mathbb{R} be
such that \tilde{f} = \alpha f + \beta and \tilde{g} = \alpha' g + \beta'. Define \tilde{h} on Im \tilde{g} to be such that

\tilde{h}(t) = \alpha h\left(\frac{t - \beta'}{\alpha'}\right) + \beta   \forall t \in Im \tilde{g}   (31.19)

12 That is, f_{c,d}(x) \geq g_{c,d}(x) for all x \in [c, d].

Moreover, we have

h(s) = \frac{\tilde{h}(\alpha' s + \beta') - \beta}{\alpha}   \forall s \in Im g   (31.20)

as well as

\tilde{f}(x) = \alpha f(x) + \beta = \alpha h(g(x)) + \beta = \tilde{h}\left(\alpha' g(x) + \beta'\right) = \tilde{h}(\tilde{g}(x))   \forall x \in [a, b]

which yields \tilde{f} = \tilde{h} \circ \tilde{g}.
(i) implies (ii). The proof of this implication is divided in a few claims.

Claim 1 f is more concave than g if and only if \alpha f + \beta is more concave than \alpha' g + \beta' for all
\alpha, \alpha' > 0 and \beta, \beta' \in \mathbb{R}.

Proof of Claim 1 Observe that f is more concave than g if and only if h is concave if and
only if \tilde{h} is concave if and only if \tilde{f} is more concave than \tilde{g}.

Claim 2 If \alpha f + \beta is more concave than \alpha' g + \beta' for all \alpha, \alpha' > 0 and \beta, \beta' \in \mathbb{R}, then f_{c,d} is
more concave than g_{c,d} for all a \leq c < d \leq b.

Proof of Claim 2 Consider c, d \in [a, b] such that c < d. Observe that f_{c,d} = \alpha f + \beta and
g_{c,d} = \alpha' g + \beta', where \alpha and \beta are defined as in (31.18) and \alpha' and \beta' are defined similarly
with g in place of f. By hypothesis and since c and d were arbitrarily chosen, this implies
the thesis.
Claim 3 If f_{c,d} is more concave than g_{c,d} for all a \leq c < d \leq b, then f hereditarily dominates
g.

Proof of Claim 3 Consider c, d \in [a, b] such that c < d. Since f_{c,d} is more concave than
g_{c,d}, there exists a concave h_{c,d} : Im g_{c,d} \to Im f_{c,d} such that
f_{c,d} = h_{c,d} \circ g_{c,d}. Since f_{c,d}(c) = 0 = g_{c,d}(c) and f_{c,d}(d) = 1 = g_{c,d}(d), we have that
h_{c,d}(0) = 0 and h_{c,d}(1) = 1. Since h_{c,d} is concave, it follows that

h_{c,d}(t) \geq t   \forall t \in [0, 1]

Since g_{c,d}(c) \leq g_{c,d}(x) \leq g_{c,d}(d) for all x \in [c, d], we conclude that f_{c,d} = h_{c,d} \circ g_{c,d} \geq g_{c,d}.
Since c and d were arbitrarily chosen, the implication follows.
Together, these claims prove that (i) implies (ii) and, on the way, establish some proper-
ties of independent interest.

(ii) implies (i). By contradiction, assume that f is not more concave than g. This implies
that h is not concave. It follows that there exist t_1, t_2, t_3 \in Im g such that t_1 < t_2 < t_3 and

h(t_2) < \frac{t_3 - t_2}{t_3 - t_1} h(t_1) + \frac{t_2 - t_1}{t_3 - t_1} h(t_3)

which yields

\frac{h(t_2) - h(t_1)}{h(t_3) - h(t_1)} < \frac{t_2 - t_1}{t_3 - t_1}

By construction, there exist x_1, x_2, x_3 \in [a, b] such that g(x_i) = t_i for all i = 1, 2, 3. Note
that x_1 < x_2 < x_3. We have that

\frac{f(x_2) - f(x_1)}{f(x_3) - f(x_1)} = \frac{h(t_2) - h(t_1)}{h(t_3) - h(t_1)} < \frac{t_2 - t_1}{t_3 - t_1} = \frac{g(x_2) - g(x_1)}{g(x_3) - g(x_1)}   (31.21)

Since \alpha, \alpha' > 0 and \beta, \beta' \in \mathbb{R} were arbitrarily chosen, this latter fact is true if we choose
them to be such that \tilde{f} = f_{x_1,x_3} and \tilde{g} = g_{x_1,x_3}. By (31.22) and (31.21), we conclude that
f_{x_1,x_3}(x_2) < g_{x_1,x_3}(x_2), contradicting the hypothesis that f hereditarily dominates g.
That said, given any n \times n matrix A, its symmetric part is the n \times n symmetric matrix
A_s defined by

A_s = \frac{A + A^T}{2}

To see the symmetry of A_s note that, by (15.6), we have

A_s^T = \left(\frac{1}{2}\left(A + A^T\right)\right)^T = \frac{1}{2}\left(A^T + A\right) = A_s

In particular, A is symmetric if and only if A = A_s.

So, by considering their symmetric parts the general classification of square matrices of
the last definition reduces to that of symmetric matrices of Section 29.3. Nevertheless, the
next subsection will illustrate the usefulness of the general classification.
Proof We have

x \cdot Ax = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j = \sum_{i=1}^n \sum_{j=1}^n \left(\frac{1}{2} a_{ij} + \frac{1}{2} a_{ij}\right) x_i x_j
= \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j + \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j = \frac{1}{2} x \cdot Ax + \frac{1}{2} \sum_{j=1}^n \sum_{i=1}^n a_{ij} x_j x_i
= \frac{1}{2} x \cdot Ax + \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n a_{ji} x_i x_j = \frac{1}{2} x \cdot Ax + \frac{1}{2} x \cdot A^T x = x \cdot A_s x

as desired.
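A short numerical check of this identity (our own sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))        # a generic (non-symmetric) square matrix
As = (A + A.T) / 2                 # its symmetric part

x = rng.normal(size=4)
print(x @ A @ x, x @ As @ x)       # equal up to rounding: x.Ax = x.As x
```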
When the inequality (31.23) is reversed, we have an inner increasing operator, which
becomes strictly increasing when the inequality is strict. A function is inner monotone when
it is inner decreasing or increasing.13

Example 1456 Let T : \mathbb{R}^n \to \mathbb{R}^n be the linear operator T(x) = Ax. For all x, y \in \mathbb{R}^n we
have (T(x) - T(y)) \cdot (x - y) = A(x - y) \cdot (x - y), so T is inner increasing if and only if

Az \cdot z = T(z) \cdot z \geq 0   \forall z \in \mathbb{R}^n

We conclude that T is:

(i) inner decreasing (increasing) if and only if A is negative (positive) semi-definite;

(ii) strictly inner decreasing (increasing) if and only if A is negative (positive) definite.

13 We use the adjective "inner" to distinguish this notion, based on inner products, from the more standard
notion of monotonicity in \mathbb{R}^n. Yet, this adjective is not standard.
The inner monotonicity of a linear operator T and the definiteness of its matrix A are
two faces of the same coin. N
Example 1457 Define f : \mathbb{R}^2 \to \mathbb{R}^2 by

f(x_1, x_2) = (x_1, x_1 + x_2)

For all x, y \in \mathbb{R}^2 we have

(f(x) - f(y)) \cdot (x - y) = (f_1(x) - f_1(y))(x_1 - y_1) + (f_2(x) - f_2(y))(x_2 - y_2)
= (x_1 - y_1)(x_1 - y_1) + (x_1 + x_2 - y_1 - y_2)(x_2 - y_2)
= (x_1 - y_1)^2 + (x_1 - y_1)(x_2 - y_2) + (x_2 - y_2)^2
\geq \frac{1}{2}(x_1 - y_1)^2 + \frac{1}{2}(x_2 - y_2)^2 \geq 0

so f is inner increasing. The penultimate inequality follows from the high school inequality:

a^2 + b^2 + ab \geq a^2 + b^2 - \left(\frac{1}{2} a^2 + \frac{1}{2} b^2\right) = \frac{1}{2}\left(a^2 + b^2\right) \geq 0   \forall a, b \in \mathbb{R}
The reader can verify that for n = 1 inner monotonicity is equivalent to standard mono-
tonicity. When n \geq 2, the two notions become altogether independent: in the next example
we present an inner monotone operator that is not monotone as well as a monotone operator
that is not inner monotone.

Example 1458 (i) Define g : \mathbb{R}^2 \to \mathbb{R}^2 by

g(x_1, x_2) = (x_2, -x_1)

For all x, y \in \mathbb{R}^2,

(g(x) - g(y)) \cdot (x - y) = (g_1(x) - g_1(y))(x_1 - y_1) + (g_2(x) - g_2(y))(x_2 - y_2)
= (x_2 - y_2)(x_1 - y_1) - (x_1 - y_1)(x_2 - y_2) = 0

so g is inner monotone, though it is easily seen not to be monotone. (ii) The operator
g(x_1, x_2) = (x_2, x_1) is monotone; yet, taking x = (1, 0) and y = (0, 1), we have

(g(x) - g(y)) \cdot (x - y) = (g_1(x) - g_1(y))(x_1 - y_1) + (g_2(x) - g_2(y))(x_2 - y_2)
= -1 \cdot 1 + 1 \cdot (-1) = -2

so g is not inner monotone. N
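The sketch below (ours, assuming NumPy; the helper name is hypothetical) spot-checks the two operators of this example on random pairs of points:

```python
import numpy as np

def inner_product_test(g, x, y):
    # Sign of (g(x) - g(y)) . (x - y): >= 0 on all pairs suggests inner increasing
    return (g(x) - g(y)) @ (x - y)

rotation = lambda v: np.array([v[1], -v[0]])  # inner monotone, not monotone
swap = lambda v: np.array([v[1], v[0]])       # monotone, not inner monotone

rng = np.random.default_rng(1)
for _ in range(3):
    x, y = rng.normal(size=2), rng.normal(size=2)
    print(inner_product_test(rotation, x, y),  # always 0
          inner_product_test(swap, x, y))      # can be negative
```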
When g is inner decreasing and the vectors x and y have equal components, except for
an index i, then

x_i > y_i \implies g_i(x) \leq g_i(y)   (31.24)

because in this case (g(x) - g(y)) \cdot (x - y) = (g_i(x) - g_i(y))(x_i - y_i). The sharper impli-
cation

x_i > y_i \implies g_i(x) < g_i(y)   (31.25)

holds when g is strictly inner decreasing.

Proposition 1459 Let g : C \subseteq \mathbb{R}^n \to \mathbb{R}^n be continuously differentiable on an open and
convex set C. Then:

(i) g is inner decreasing if and only if the Jacobian matrix Dg(x) is negative semi-definite
for all x \in C;

(ii) g is strictly inner decreasing if the Jacobian matrix Dg(x) is negative definite for all
x \in C.
This differential criterion is the multivariable version of Propositions 1322 and 1324. A
dual "positive" version holds for inner increasing operators, as the reader can check.

Proof We only prove (i) and leave (ii) to the reader. Suppose that g is inner decreasing.
Let x \in C and y \in \mathbb{R}^n. Then, for a scalar h > 0 small enough we have (g(x + hy) - g(x)) \cdot
((x + hy) - x) \leq 0. Since g is continuously differentiable, we have

y \cdot Dg(x) y = \lim_{h \to 0^+} \frac{(g(x + hy) - g(x)) \cdot y}{h} \leq 0

so Dg(x) is negative semi-definite. Conversely, suppose that Dg(x) is negative semi-definite
for all x \in C. Let x_1, x_2 \in C and define \varphi : [0, 1] \to \mathbb{R} by

\varphi(t) = (g(t x_1 + (1 - t) x_2) - g(x_2)) \cdot (x_1 - x_2)

To prove that g is inner decreasing it is enough to show that \varphi(1) \leq 0. But \varphi(0) = 0 and
\varphi is decreasing since, for all t \in (0, 1),

\varphi'(t) = (x_1 - x_2) \cdot Dg(t x_1 + (1 - t) x_2)(x_1 - x_2) \leq 0
A market demand function D : \mathbb{R}^n_+ \to \mathbb{R}^n_+ (Section 22.9) is a strictly inner decreasing
operator if, for all price vectors p \neq p',

(D(p) - D(p')) \cdot (p - p') < 0

that is, if D satisfies the law of demand. In this case, (31.25) takes the form

p_i > p_i' \implies D_i(p) < D_i(p')

which means that, ceteris paribus, a higher price of good i results in a lower demand for this
good. Inner monotonicity thus formalizes a key economic concept. Its Jacobian characteri-
zation established in the last proposition plays an important role in demand theory.
With this motivation, we turn to an interesting property of inner monotone operators.
To state it, we need a piece of terminology: a vector \alpha \in \mathbb{R}^n defines the level set (g = \alpha) =
\{x \in C : g(x) = \alpha\} of an operator g : C \subseteq \mathbb{R}^n \to \mathbb{R}^n.

Proposition 1461 The level sets of an inner monotone operator are convex.

This result generalizes the simple observation that the level sets of monotone scalar
functions are intervals. It holds trivially, however, for strictly inner monotone operators as
they are automatically injective: (g(x) - g(y)) \cdot (x - y) \neq 0 and x \neq y imply g(x) \neq g(y).
Proof Suppose that g is inner decreasing (the increasing case is analogous). Let x, y \in
(g = \alpha), with x \neq y, and let \lambda \in (0, 1). Since \lambda x + (1 - \lambda) y - x = (1 - \lambda)(y - x) and
\lambda x + (1 - \lambda) y - y = \lambda(x - y), inner decreasingness yields

(g(\lambda x + (1 - \lambda) y) - g(x)) \cdot (y - x) \leq 0  and  (g(\lambda x + (1 - \lambda) y) - g(y)) \cdot (x - y) \leq 0

that is,

(g(\lambda x + (1 - \lambda) y) - g(x)) \cdot (y - x) + (g(\lambda x + (1 - \lambda) y) - g(y)) \cdot (x - y)
= (g(\lambda x + (1 - \lambda) y) - g(x) - g(\lambda x + (1 - \lambda) y) + g(y)) \cdot (y - x) = (g(y) - g(x)) \cdot (y - x) = 0

Thus,

(g(\lambda x + (1 - \lambda) y) - g(x)) \cdot (y - x) = (g(\lambda x + (1 - \lambda) y) - g(y)) \cdot (x - y) = 0

because two negative scalars that add up to zero are, in turn, both equal to zero. As x \neq y,
we conclude that g(\lambda x + (1 - \lambda) y) = g(x) = \alpha, as desired.
The following duality result between the two one-sided directional derivative functions is useful.

This result implies, inter alia, that f'_+(x; \cdot) is superlinear if and only if f'_-(x; \cdot) is sub-
linear.
Proof Assume that f is derivable from the right at x \in U. For each y \in \mathbb{R}^n we then have:

f'_+(x; -y) = \lim_{h \to 0^+} \frac{f(x + h(-y)) - f(x)}{h} = \lim_{h \to 0^+} \frac{f(x + (-h) y) - f(x)}{h}
= -\lim_{h \to 0^-} \frac{f(x + hy) - f(x)}{h} = -f'_-(x; y)

So, f is derivable from the left at x, and (31.27) holds. A similar argument shows that
derivability from the left yields derivability from the right.
(i) the right f'_+(x; \cdot) : \mathbb{R}^n \to \mathbb{R} and left f'_-(x; \cdot) : \mathbb{R}^n \to \mathbb{R} directional derivative functions
exist at each x \in C;

The proof relies on the following lemma, which shows that the difference quotient is
decreasing.

Lemma 1465 Let f : C \to \mathbb{R} be concave. Given any x \in C and y \in \mathbb{R}^n there exists \varepsilon > 0
such that x + hy \in C for all h \in (-\varepsilon, \varepsilon) and the function

h \mapsto \frac{f(x + hy) - f(x)}{h}   (31.28)

is decreasing on (-\varepsilon, 0) \cup (0, \varepsilon).
Proof Let x \in C and y \in \mathbb{R}^n. If y = 0, the statement is trivially true. So, let y \neq 0. Since
C is open, there exists a ball B_\delta(x) contained in C, where \delta > 0. If we set \varepsilon = \delta / (2\|y\|), then
x + hy \in C for all h \in (-\varepsilon, \varepsilon). Define g : (-\varepsilon, \varepsilon) \to \mathbb{R} by g(h) = f(x + hy). Since f is
concave, the reader can easily verify that g is concave too. Next, observe that
Proof of Proposition 1464 (i) In view of Proposition 1463, we can focus on the right
derivative function f'_+(x; \cdot) : \mathbb{R}^n \to \mathbb{R}. By Lemma 1465, the difference quotient is decreasing
on (-\varepsilon, 0) \cup (0, \varepsilon), so the limit (31.26) exists.
To show that it is finite observe that, the difference quotient being decreasing on (-\varepsilon, 0) \cup (0, \varepsilon),
for each h \in (0, \varepsilon) we have

\frac{f(x + hy) - f(x)}{h} \leq \frac{f\left(x - \frac{\varepsilon}{2} y\right) - f(x)}{-\frac{\varepsilon}{2}} \in \mathbb{R}

(ii) The proof of the positive homogeneity of f'_+(x; \cdot) is analogous to that of the homo-
geneity of f'(x; \cdot) in Corollary 1284. For superadditivity, since f'_+(x; \cdot) is easily seen to be
concave, we have

f'_+(x; y_1 + y_2) = f'_+\left(x; 2 \frac{y_1 + y_2}{2}\right) = 2 f'_+\left(x; \frac{y_1 + y_2}{2}\right)
\geq 2\left(\frac{f'_+(x; y_1)}{2} + \frac{f'_+(x; y_2)}{2}\right) = f'_+(x; y_1) + f'_+(x; y_2).
The last result leads to an interesting characterization of derivability via one-sided derivative
functions.
(i) f is derivable at x;

A concave function derivable at a point has, thus, a linear directional derivative function
represented via the inner product (31.30). Since, in general, the directional derivative func-
tion is only homogeneous (Corollary 1284), it is a further noteworthy property of concavity
that the much stronger property of linearity, with its inner product representation, holds.
Proof (iv) implies (iii). Assume that f'_-(x; \cdot) : \mathbb{R}^n \to \mathbb{R} is linear. By (31.27), we have
f'_+(x; y) = -f'_-(x; -y) for all y \in \mathbb{R}^n, so f'_+(x; \cdot) is linear as well.

(ii) implies (i). Assume that f'_+(x; \cdot) = f'_-(x; \cdot). By (31.27), for each y \in \mathbb{R}^n we have

\lim_{h \to 0^+} \frac{f(x + hy) - f(x)}{h} = f'_+(x; y) = f'_-(x; y) = \lim_{h \to 0^-} \frac{f(x + hy) - f(x)}{h}

and so the bilateral limit

f'(x; y) = \lim_{h \to 0} \frac{f(x + hy) - f(x)}{h}

exists finite. We conclude that f is derivable at x.

(i) implies (iv). Assume that f is derivable at x. In view of Proposition 1464, the
directional derivative function f'(x; \cdot) : \mathbb{R}^n \to \mathbb{R} is linear because it is both superlinear,
being f'(x; \cdot) = f'_+(x; \cdot), and sublinear, being f'(x; \cdot) = f'_-(x; \cdot). Thus, f'(x; \cdot) : \mathbb{R}^n \to \mathbb{R}
is linear. This completes the proof of the equivalence among conditions (i)-(iv).
Finally, assume that f is derivable (so, partially derivable) at x. By what just proved,
f'(x; \cdot) : \mathbb{R}^n \to \mathbb{R} is linear. By Riesz's Theorem, there is a vector \alpha \in \mathbb{R}^n such that
f'(x; y) = \alpha \cdot y for every y \in \mathbb{R}^n. Then,

\frac{\partial f(x)}{\partial x_i} = f'(x; e^i) = \alpha \cdot e^i = \alpha_i   \forall i = 1, ..., n

Thus, \alpha = \nabla f(x).
A remarkable property of concave functions of several variables is that for them partial
derivability and differentiability are equivalent notions.

Thus, for concave functions we recover the remarkable equivalence between derivability
and differentiability that holds for scalar functions but fails, in general, for functions of
several variables (cf. Section 27.2.1). This is another sign of the great analytical convenience
of concavity.
Proof Since (iii) implies (i) by Theorem 1268, it is enough to prove that (i) implies (ii)
and that (ii) implies (iii). (i) implies (ii). Suppose f is partially derivable at x. Then,
f'_+(x; e^i) = f'_-(x; e^i) for each versor e^i of \mathbb{R}^n. By Proposition
1464, f'_+(x; \cdot) is superlinear and f'_-(x; \cdot) is sublinear. So, f'_+(x; 0) = f'_-(x; 0) = 0. Let
0 \neq y \in \mathbb{R}^n_+. Since f'_+(x; e^i) = f'_-(x; e^i), we have:

f'_+(x; y) = \left(\sum_{i=1}^n y_i\right) f'_+\left(x; \sum_{i=1}^n \frac{y_i}{\sum_{i=1}^n y_i} e^i\right)
\geq \left(\sum_{i=1}^n y_i\right) \sum_{i=1}^n \frac{y_i}{\sum_{i=1}^n y_i} f'_+(x; e^i)
= \left(\sum_{i=1}^n y_i\right) \sum_{i=1}^n \frac{y_i}{\sum_{i=1}^n y_i} f'_-(x; e^i)
\geq \left(\sum_{i=1}^n y_i\right) f'_-\left(x; \sum_{i=1}^n \frac{y_i}{\sum_{i=1}^n y_i} e^i\right) = f'_-(x; y)
So, f'_+(x; y) = f'_-(x; y) because, again by Proposition 1464, f'_+(x; y) \leq f'_-(x; y). We
conclude that f'_+(x; \cdot) = f'_-(x; \cdot) on \mathbb{R}^n_+. A similar argument, based on f'_+(x; -e^i) =
f'_-(x; -e^i), applies to the remaining orthants.
By Proposition 1464, we conclude that f'_+(x; y) = f'_-(x; y) for all y. In turn, this implies f'_+(x; \cdot) =
f'_-(x; \cdot) on \mathbb{R}^n. By Corollary 1466, f is derivable.
(ii) implies (iii). Suppose f is derivable at x. To show that f is differentiable at x, in
view of the last corollary we need to show that

\lim_{h \to 0} \frac{f(x + h) - f(x) - \nabla f(x) \cdot h}{\|h\|} = 0
(ii) f is derivable;

(iii) f is differentiable.

Proof Properties (i)-(iii) are equivalent by the last theorem. As (iv) implies (iii) by Theorem
1271, it remains to prove that (iii) implies (iv). So, let f be differentiable. We want to show
that it is continuously differentiable. We will actually prove that, at each x \in C,

\lim_{n \to \infty} \nabla f(x_n) = \nabla f(x)   (31.31)

for all sequences \{x_n\} \subseteq C that converge to x. To this end, first observe that, since f is
derivable on C, at each x \in C it holds

f'_+(x; y) = f'_-(x; y) = f'(x; y) = \nabla f(x) \cdot y   \forall y \in \mathbb{R}^n

This key equality will be tacitly used throughout the proof. Let x \in C. Take \{x_n\} \subseteq C such
that x_n \to x. Let k > 0 be such that f'_+(x; y) < k. Then, there exists h > 0 small enough
so that

\frac{f(x + hy) - f(x)}{h} < k
Set k_m = f'_+(x; y) + 1/m. By (31.32), \limsup f'_+(x_n; y) \leq k_m for each m. Hence,

\limsup f'(x_n; y) = \limsup f'_+(x_n; y) \leq \lim_{m \to \infty} k_m = f'_+(x; y) = f'(x; y)

Moreover,

\liminf f'(x_n; y) = \liminf f'_-(x_n; y) = -\limsup f'_+(x_n; -y)
\geq -f'_+(x; -y) = f'_-(x; y) = f'(x; y)

We conclude that

\limsup f'(x_n; y) \leq f'(x; y) \leq \liminf f'(x_n; y)

that is,

\limsup f'(x_n; y) = f'(x; y) = \liminf f'(x_n; y)

In view of Proposition 412, this implies (31.31).
Define \varphi_{x,y} : C_{x,y} \to \mathbb{R} by

\varphi_{x,y}(t) = f((1 - t) x + t y)   (31.33)
Proof We consider the concave case, and leave to the reader the strictly concave one. (i)
implies (ii). Suppose f is concave. Let x, y \in C and t_1, t_2 \in C_{x,y}. Then, for each \lambda \in [0, 1],

\varphi_{x,y}(\lambda t_1 + (1 - \lambda) t_2) = f\left(\left(1 - (\lambda t_1 + (1 - \lambda) t_2)\right) x + (\lambda t_1 + (1 - \lambda) t_2) y\right)
= f\left(\lambda\left((1 - t_1) x + t_1 y\right) + (1 - \lambda)\left((1 - t_2) x + t_2 y\right)\right)
\geq \lambda f\left((1 - t_1) x + t_1 y\right) + (1 - \lambda) f\left((1 - t_2) x + t_2 y\right)
= \lambda \varphi_{x,y}(t_1) + (1 - \lambda) \varphi_{x,y}(t_2)

and so \varphi_{x,y} is concave.

Since (ii) trivially implies (iii), it remains to prove that (iii) implies (i). Let x, y \in C.
Since \varphi_{x,y} is concave on [0, 1], we have

f((1 - t) x + t y) = \varphi_{x,y}(t) \geq t \varphi_{x,y}(1) + (1 - t) \varphi_{x,y}(0) = (1 - t) f(x) + t f(y)

for all t \in [0, 1], as desired.
Proof Let f be concave. Fix x, y \in C. Let \varphi_{x,y} : C_{x,y} \to \mathbb{R} be given by (31.33). By Lemma
1469, C_{x,y} is an open interval, and by Proposition 1470 the function \varphi_{x,y} is concave on C_{x,y}.
Since f is differentiable at x, note that:14

\lim_{\varepsilon \to 0} \frac{\varphi_{x,y}(\varepsilon) - \varphi_{x,y}(0)}{\varepsilon} = \lim_{\varepsilon \to 0} \frac{f(x + \varepsilon(y - x)) - f(x)}{\varepsilon}

where the latter limit exists and is finite. So, \varphi_{x,y} is differentiable at 0 \in C_{x,y}. Since [0, 1] \subseteq
C_{x,y}, by (31.9) we have

\varphi_{x,y}(1) \leq \varphi_{x,y}(0) + \varphi'_{x,y}(0) = f(x) + f'(x; y - x)

i.e., f(y) \leq f(x) + \nabla f(x) \cdot (y - x) (Theorem 1287). So, the inequality (31.34) holds. We
leave to the reader the strictly concave part.
Proof The "only if" follows from (31.34). As to the converse, suppose that (31.35) holds.
For each x \in C, consider the function F_x : C \to \mathbb{R} given by F_x(y) = f(x) + \nabla f(x) \cdot (y - x).
By (31.35), f(y) \leq F_x(y) for all x, y \in C. Since F_x(x) = f(x), we conclude that f(y) =
\min_{x \in C} F_x(y) for each y \in C. Since each F_x is affine, we conclude that f is concave since,
as the reader can check, a function that is a minimum of a family of concave functions is
concave. We leave to the reader the strictly concave part.
(ii) f is strictly concave if and only if \nabla f : C \to \mathbb{R}^n is strictly inner decreasing, i.e., the
previous inequality is strict when x \neq y.

This result is a major motivation for the study of inner monotonicity. Yet, not all inner
monotone operators are gradient operators of concave functions: for instance, the operator
in Example 1457 is inner monotone but not a gradient operator, as the reader can check.

Proof Suppose f is concave. By inequality (31.34), for all x, y \in C we have f(y) \leq f(x) +
\nabla f(x) \cdot (y - x) and f(x) \leq f(y) + \nabla f(y) \cdot (x - y).
So, \nabla f(x) \cdot (x - y) \leq f(x) - f(y) \leq \nabla f(y) \cdot (x - y). In turn, this implies (\nabla f(x) - \nabla f(y)) \cdot
(x - y) \leq 0 and we conclude that \nabla f : C \to \mathbb{R}^n is inner decreasing.
Conversely, suppose \nabla f : C \to \mathbb{R}^n is inner decreasing, i.e., (31.36) holds. Suppose first
that n = 1. Let x \in C, and define \psi_x : C \to \mathbb{R} by \psi_x(y) = f(y) - f(x) - \nabla f(x)(y - x).

Since x was arbitrary, we conclude that f(y) \leq f(x) + \nabla f(x)(y - x) for all x, y \in C. By
Theorem 1472, f is concave. This completes the proof for n = 1.
Suppose now that n > 1. Let x, y \in C and let \varphi_{x,y} : C_{x,y} \to \mathbb{R} be given by (31.33). By
Lemma 1469, C_{x,y} is an open interval, with [0, 1] \subseteq C_{x,y}. Then, \varphi_{x,y} is differentiable on C_{x,y},
with

\varphi'_{x,y}(t) = \nabla f((1 - t) x + t y) \cdot (y - x)   \forall t \in C_{x,y}   (31.37)

Let t_2 \geq t_1 \in C_{x,y}. Since \nabla f is inner decreasing, \varphi'_{x,y}(t_2) \leq \varphi'_{x,y}(t_1), so \varphi_{x,y} is concave
and

f((1 - t) x + t y) = \varphi_{x,y}(t) \geq (1 - t) \varphi_{x,y}(0) + t \varphi_{x,y}(1) = (1 - t) f(x) + t f(y)

Consider now x, y \in C such that x \neq y. By Proposition 1470, \varphi_{x,y} is strictly concave and
differentiable on the open interval C_{x,y}. By Corollary 1436, \varphi'_{x,y} is strictly decreasing and

\varphi'_{x,y}(t) = \nabla f((1 - t) x + t y) \cdot (y - x)   \forall t \in C_{x,y}

Since \varphi'_{x,y} is strictly decreasing, we have that

(\nabla f(y) - \nabla f(x)) \cdot (y - x) = \varphi'_{x,y}(1) - \varphi'_{x,y}(0) < 0
proving one implication. Assume now that (31.36) holds with < whenever x \neq y. Let x, y \in C
be such that x \neq y and consider one more time the function \varphi_{x,y} : C_{x,y} \to \mathbb{R}. Again, observe
that \varphi_{x,y} is differentiable on C_{x,y}, with

\varphi'_{x,y}(t) = \nabla f((1 - t) x + t y) \cdot (y - x)   \forall t \in C_{x,y}

Let t_1 < t_2 \in C_{x,y}. Since (31.36) holds strictly, we have

0 < (\nabla f((1 - t_1) x + t_1 y) - \nabla f((1 - t_2) x + t_2 y)) \cdot (y - x) = \varphi'_{x,y}(t_1) - \varphi'_{x,y}(t_2)

and we conclude that \varphi'_{x,y}(t_2) < \varphi'_{x,y}(t_1), i.e., \varphi'_{x,y} is strictly decreasing on C_{x,y}.
By Corollary 1436 and Proposition 1470, \varphi_{x,y} is strictly concave and so is f.
A dual result, with opposite inequality, characterizes convex functions. The next result
makes this characterization truly operational via a negativity condition on the Hessian
matrix \nabla^2 f(x) of f (that is, the matrix of second partial derivatives of f), which generalizes
the condition f''(x) \leq 0 of Corollary 1438. In other words, in the general case the role of
the second derivative is played by the Hessian matrix.
(i) f is concave if and only if \nabla^2 f(x) is negative semi-definite for every x \in C;

Proof The result follows from Proposition 1459 once one remembers that the Hessian matrix
of a function of several variables is the Jacobian matrix of its derivative operator (Exercise
1292). So, the Hessian matrix \nabla^2 f(x) of f is the Jacobian matrix of the derivative operator
\nabla f : C \to \mathbb{R}^n, which plays here the role of g in Proposition 1459.

This is the most useful differential criterion to establish concavity and strict concavity
for functions of several variables. Naturally, dual results hold for convex functions, which
are characterized by having positive semi-definite Hessian matrices.
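In practice, the criterion can be checked pointwise with a numerical Hessian. Below is a small sketch of ours (assuming NumPy; the test function and helper name are our own choices, not the text's) that estimates \nabla^2 f by finite differences and inspects its eigenvalues at a sample point.

```python
import numpy as np

def hessian(f, x, h=1e-4):
    # Central finite-difference Hessian of f at x
    n = len(x)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i] * h, np.eye(n)[j] * h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
    return H

f = lambda x: np.log(x[0]) + np.log(x[1])  # a concave function on (0, inf)^2
x = np.array([1.5, 2.0])
print(np.linalg.eigvalsh(hessian(f, x)))   # all eigenvalues <= 0
```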
and we saw how its Hessian matrix was positive definite. By Proposition 1474, f is strictly
convex.

(ii) Consider the CES production function f : \mathbb{R}^2_+ \to \mathbb{R} defined by

f(x) = \left(\alpha x_1^\rho + (1 - \alpha) x_2^\rho\right)^{1/\rho}

with \alpha \in [0, 1] and \rho > 0 (Example 865). Some tedious algebra shows that the Hessian
matrix is

\nabla^2 f(x) = \alpha (1 - \alpha)(\rho - 1) t^{\frac{1}{\rho} - 2} x_1^{\rho - 2} x_2^{\rho - 2} H

where t = \alpha x_1^\rho + (1 - \alpha) x_2^\rho and

H = \begin{bmatrix} x_2^2 & -x_1 x_2 \\ -x_1 x_2 & x_1^2 \end{bmatrix}

If \xi = (\xi_1, \xi_2), we have

\xi \cdot H \xi = x_2^2 \xi_1^2 - 2 x_1 x_2 \xi_1 \xi_2 + x_1^2 \xi_2^2 = (x_2 \xi_1 - x_1 \xi_2)^2 \geq 0

Thus, the matrix H is positive semi-definite. It follows that for \rho > 1 the matrix \nabla^2 f(x)
is positive semi-definite for all x_1, x_2 > 0, so by Proposition 1474 f is convex, while f is
concave when 0 < \rho < 1.

In Corollary 880 we already established the concavity of the CES functions without doing
any calculation. Readers can compare the pros and cons of the two approaches. N
The next example considers an important class of functions and shows how fruitful it
can be to integrate the differential criteria of the last result with some ad hoc reasoning that
exploits the specific features of the functions at hand.

Example 1476 Given a square symmetric matrix A of order n, a vector b \in \mathbb{R}^n and a scalar
c \in \mathbb{R}, define the linear-quadratic function f : \mathbb{R}^n \to \mathbb{R} by

f(x) = \frac{1}{2} x \cdot Ax + b \cdot x + c

f(\lambda x + (1 - \lambda) 0) = f(\lambda x) = b \cdot (\lambda x) + c = \lambda (b \cdot x) + \lambda c + (1 - \lambda) c = \lambda f(x) + (1 - \lambda) f(0)
f(x + h) - f(x) \geq f(y + h) - f(y)

Ultramodular functions are supermodular. Indeed, from the equality (20.1), we can set
h = x \vee y - y = x - x \wedge y \geq 0. So, if f is ultramodular, we have

f(x) - f(x \wedge y) \leq f(x \vee y) - f(y)

which implies that f is supermodular. The converse is false: for instance, the function
f(x_1, x_2) = \sqrt{x_1 x_2} is supermodular but not ultramodular (Example 1482).

The next result further clarifies the relations between supermodularity and ultramodu-
larity.
Earlier in the chapter we learned the remarkable differential properties of concave func-
tions (Section 31.3). It is useful to compare them with those of inframodular functions, which
are also sharp (inframodular functions are, indeed, much better behaved than submodular
functions).16 A first important result is that, like for concave functions (Theorem 1467), also
for inframodular functions partial derivability is equivalent to differentiability.17

\frac{\partial^2 f(x)}{\partial x_i \partial x_j} \leq 0   \forall i, j = 1, ..., n

15 That is, each section f(\cdot, x_{-i}) : [a_i, b_i] \to \mathbb{R} is convex in x_i.
16 We omit the proofs of these differentiability results (their inframodular, rather than ultramodular,
versions are self-explanatory).
17 In reading the result, recall from Section 2.3 that (a, b) = \{x \in \mathbb{R}^n : a_i < x_i < b_i\}.
The differential characterizations established in the last two results show that, unlike in the
scalar case, inframodularity and concavity are quite unrelated properties in the multivariable
case, as we remarked at the beginning of this section.

Example 1482 (i) Define f : \mathbb{R}^2_+ \to \mathbb{R} by f(x_1, x_2) = x_1^{\alpha_1} x_2^{\alpha_2}, with \alpha_1, \alpha_2 > 0. This
function is supermodular (Example 937). Its Hessian matrix is

\nabla^2 f(x) = \begin{bmatrix} \alpha_1 (\alpha_1 - 1) x_1^{\alpha_1 - 2} x_2^{\alpha_2} & \alpha_1 \alpha_2 x_1^{\alpha_1 - 1} x_2^{\alpha_2 - 1} \\ \alpha_1 \alpha_2 x_1^{\alpha_1 - 1} x_2^{\alpha_2 - 1} & \alpha_2 (\alpha_2 - 1) x_1^{\alpha_1} x_2^{\alpha_2 - 2} \end{bmatrix}

So, f is ultramodular if and only if \alpha_1, \alpha_2 \geq 1. (ii) In view of the previous point, the
concave and supermodular function f : \mathbb{R}^2_+ \to \mathbb{R} defined by f(x_1, x_2) = \sqrt{x_1 x_2} is neither
ultramodular nor inframodular. (iii) The convex function f : \mathbb{R}^2 \to \mathbb{R} defined by f(x_1, x_2) =
\log(e^{x_1} + e^{x_2}) is neither ultramodular nor inframodular: its Hessian matrix is

\nabla^2 f(x) = \begin{bmatrix} \frac{e^{x_1 + x_2}}{(e^{x_1} + e^{x_2})^2} & -\frac{e^{x_1 + x_2}}{(e^{x_1} + e^{x_2})^2} \\ -\frac{e^{x_1 + x_2}}{(e^{x_1} + e^{x_2})^2} & \frac{e^{x_1 + x_2}}{(e^{x_1} + e^{x_2})^2} \end{bmatrix}
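A numerical spot-check of point (iii) (our sketch, assuming NumPy): the Hessian of the log-exponential function has positive diagonal and negative off-diagonal entries, so it fails both sign patterns, yet it is positive semi-definite.

```python
import numpy as np

x1, x2 = 0.3, -0.7
s = np.exp(x1) + np.exp(x2)

# Closed-form Hessian entries of f(x1, x2) = log(e^x1 + e^x2)
d = np.exp(x1 + x2) / s**2      # diagonal entries, > 0
o = -np.exp(x1 + x2) / s**2     # off-diagonal entries, < 0
H = np.array([[d, o], [o, d]])

print(H)
print(np.linalg.eigvalsh(H))    # eigenvalues 0 and 2d: positive semi-definite (f convex)
```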
Example 1484 (i) Consider the function f : \mathbb{R} \to \mathbb{R} given by f(x) = -(x+1)^4 + 2. We have
f''(x) = -12(x + 1)^2 \leq 0. The function is concave on \mathbb{R} and it is therefore sufficient to find
a point where its first derivative is zero to find a maximizer. We have f'(x) = -4(x + 1)^3.
Hence f' is zero only at \hat{x} = -1. The point \hat{x} = -1 is the unique global maximizer, and the
maximum value of f on \mathbb{R} is f(-1) = 2.

(ii) Consider the function f : \mathbb{R} \to \mathbb{R} given by f(x) = x(1 - x). Because f'(1/2) = 0
and f''(x) = -2 < 0, the point \hat{x} = 1/2 is the unique global maximizer of f on \mathbb{R}. N
Proof In view of Fermat's Theorem, we need to prove the "if" part, that is, sufficiency. So,
let \hat{x} \in int C be such that \nabla f(\hat{x}) = 0. We want to show that \hat{x} is a global maximizer. By
inequality (31.34), we have

f(y) \leq f(\hat{x}) + \nabla f(\hat{x}) \cdot (y - \hat{x})   \forall y \in int C

Since f is continuous, the inequality is easily seen to hold for all y \in C. Since \nabla f(\hat{x}) = 0,
we conclude that f(y) \leq f(\hat{x}) for all y \in C, as desired.
Example 1486 Consider the function f : \mathbb{R}^2 \to \mathbb{R} given by f(x_1, x_2) = -(x_1 - 1)^2 - (x_2 +
3)^2 - 6. We have

\nabla^2 f(x_1, x_2) = \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix}

Since -2 < 0 and \det \nabla^2 f(x_1, x_2) = 4 > 0, the Hessian matrix is negative definite for every
(x_1, x_2) \in \mathbb{R}^2 and hence f is strictly concave. We have

\nabla f(x_1, x_2) = (-2(x_1 - 1), -2(x_2 + 3))

The unique point where the gradient is zero is (1, -3) which is, therefore, the unique global
maximizer. The maximum value of f on \mathbb{R}^2 is f(1, -3) = -6. N
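As a sanity check (our sketch, assuming NumPy), one can confirm numerically that the gradient vanishes at (1, -3) and that nearby points give lower values:

```python
import numpy as np

f = lambda x: -(x[0] - 1)**2 - (x[1] + 3)**2 - 6
grad = lambda x: np.array([-2*(x[0] - 1), -2*(x[1] + 3)])

xhat = np.array([1.0, -3.0])
print(grad(xhat))                      # [0. 0.]: first-order condition holds
rng = np.random.default_rng(2)
trials = xhat + rng.normal(scale=0.5, size=(5, 2))
print(all(f(p) <= f(xhat) for p in trials))  # True: xhat is a maximizer
```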
Example 1487 In Section 22.10 we considered the least squares optimization problem;
its solution can be obtained also from Theorem 1485. Indeed, \nabla g(x) = 2A^T(Ax - b), and so
the first-order condition 2A^T(Ax - b) = 0 can be written as the linear system

A^T A x = A^T b   (31.39)
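For instance (a sketch of ours, assuming NumPy), solving the normal equations (31.39) reproduces the least squares solution returned by a library routine:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(6, 3))
b = rng.normal(size=6)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # solve A^T A x = A^T b
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]   # library least squares
print(np.allclose(x_normal, x_lstsq))            # True
```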
Example 1488 Let A be a square and negative definite symmetric matrix of order n. Con-
sider the quadratic optimization problem

\max_x \frac{1}{2} x \cdot Ax + b \cdot x   sub x \in \mathbb{R}^n

The first-order condition is

Ax + b = 0

Since A is invertible (cf. Proposition 1195), by Theorem 1485 we conclude that the unique
maximizer is \hat{x} = -A^{-1} b. N

These examples show, inter alia, that optimizing quadratic functions amounts to solving
linear equations, the simplest and best understood class of equations. This explains the
popularity of quadratic optimization problems.
We close by noting that for scalar functions f : (a; b) ! R, where C = (a; b), the last
theorem also follows from Proposition 1343. That said, the last theorem is the result used in
applications because of the conceptual and analytical appeal of concavity (cf. the discussion
that ends Section 28.5.4).
Proposition 1489 Let f : C \to \mathbb{R} be concavifiable. Then there exists a concave function
co f : C \to \mathbb{R} such that

(i) co f \geq f;

(ii) co f \leq g for every concave function g : C \to \mathbb{R} such that g \geq f.

In words, a concavifiable function admits a smallest concave function that pointwise
dominates it.

Proof Let \{g_i\}_{i \in I} be the collection of all concave functions g_i : C \to \mathbb{R} such that g_i \geq f.
This collection is not empty because f is concavifiable. Define co f : C \to \mathbb{R} by

co f(x) = \inf_{i \in I} g_i(x)

For each x \in C, the scalar f(x) is a lower bound for the set \{g_i(x) : i \in I\}. By the Least
Upper Bound Principle, \inf_{i \in I} g_i(x) exists, so the function co f is well-defined. It is easily
seen to be concave. Indeed, let \lambda \in [0, 1] and x, y \in C. By Proposition 127, for each \varepsilon > 0
there exists i_\varepsilon \in I such that

co f(\lambda x + (1 - \lambda) y) + \varepsilon \geq g_{i_\varepsilon}(\lambda x + (1 - \lambda) y) \geq \lambda g_{i_\varepsilon}(x) + (1 - \lambda) g_{i_\varepsilon}(y)
\geq \lambda co f(x) + (1 - \lambda) co f(y)

Since this inequality holds for every \varepsilon > 0, we conclude that co f(\lambda x + (1 - \lambda) y) \geq \lambda co f(x) +
(1 - \lambda) co f(y), so co f is concave. In turn, this implies that co f(x) = \min_{i \in I} g_i(x). In par-
ticular, co f satisfies properties (i) and (ii).
Example 1490 (i) Both the sine and cosine functions are concavifiable. Their concave
envelope is constant to 1, i.e., co sin(x) = co cos(x) = 1 for all x \in \mathbb{R}. (ii) Let f : \mathbb{R} \to \mathbb{R} be
the Gaussian function f(x) = e^{-x^2}. It is concavifiable, with

co f(x) = \begin{cases} f(x) & x \in \left[-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right] \\ e^{-\frac{1}{2}} & \text{else} \end{cases}

(iii) The quadratic function is not concavifiable on the real line. (iv) Functions that have
at least a global maximizer are automatically concavifiable: just take the function constant
to the maximum value. For instance, continuous supercoercive functions f : \mathbb{R}^n \to \mathbb{R} are
concavifiable. N
This remarkable result shows how concavity is deeply connected to global maximization,
more than it may appear prima facie. It is a result, however, mostly of theoretical interest
because concave envelopes are, in general, not easy to compute. Indeed, Theorem 1485 can
be regarded as its operational special case.

The proof relies on two elegant lemmas of independent interest.

In words, global maximizers of concavifiable functions are global maximizers of their
concave envelopes, and they share the same maximum value. Indeed, if \hat{x} is a global
maximizer of f, the constant function equal to f(\hat{x}) is concave and dominates f, so

f(\hat{x}) \geq co f(x)   \forall x \in C   (31.40)

Since co f \geq f, it follows that

f(\hat{x}) = co f(\hat{x}) \geq co f(x) \geq f(x)   \forall x \in C
In view of Lemma 1493, in optimization problems with convex choice sets (e.g., consumer
problems, since budget sets are convex), in terms of value attainment one can assume that
the objective function is concave. Thus, if in such problems we are interested only in the
value functions, without any loss we can just deal with concave objective functions.

This is no longer the case, however, if we are interested also in the solutions per se, i.e.,
in the solution correspondence. Indeed, in this regard Lemma 1493 says only that

\arg\max_{x \in C} f(x) \subseteq \arg\max_{x \in C} co f(x)

So, by replacing an objective function with its concave envelope we do not lose solutions,
but we might well get intruders that solve the concavified problem but not the original one.
To understand the scope of this issue, note that co(\arg\max_{x \in C} f(x)) \subseteq \arg\max_{x \in C} co f(x)
because the solutions of a concave objective function form a convex set. Thus, the best one
can hope is that

co\left(\arg\max_{x \in C} f(x)\right) = \arg\max_{x \in C} co f(x)
Even in such a best case, there might well be many vectors that solve the optimization problem
for the concave envelope co f but not for the original objective function f. We thus might
end up overestimating the solution correspondence. For instance, if in a consumer problem
we replace a utility function with its concave envelope, we do not lose any optimal bundle
but we might well get "extraneous" bundles, optimal for the concave envelope but not for
the original utility function. For an analytical example, if we maximize the cosine function
over the real line, the maximizers are the points \hat{x} = 2k\pi with k \in \mathbb{Z} (Example 976). If we
replace the cosine function with its concave envelope, the maximizers become all the points
of the real line. So, the solution set is vastly inflated. Still, the common maximum value is
1.

A final remark: there is a dual notion of convex envelope of a function, as the largest
dominated convex function, relevant for minimization problems (the reader can establish the
dual version of Theorem 1491).
for all x > 0. Now, by applying this inequality to 1/x we get \log(1/x) \leq (1/x) - 1 for all
x > 0, which implies (x - 1)/x \leq \log x. This completes the proof of (31.41), as we leave to
the reader to check the equality part.

We can sharpen the last inequality through the following finding of Jovan Karamata: for
all x > 0 it holds

\frac{\log x}{x - 1} \leq \frac{1}{\sqrt{x}}

This inequality permits to refine (31.41) once we consider separately the two cases when
x is greater or lower than 1. Indeed, for 0 < x \leq 1 we have

\frac{x - 1}{\sqrt{x}} \leq \log x \leq x - 1   (31.42)

while for x \geq 1 we have

\frac{x - 1}{x} \leq \log x \leq \frac{x - 1}{\sqrt{x}}   (31.43)

At the cost of a slightly more complicated expression, with two cases to consider, we thus
get sharper bounds for the logarithm. Next we illustrate them.

Karamata's logarithmic inequality is part of the following nice result of 1949,18 with
third and fourth order sharpenings of Karamata's inequality that are, however, increasingly
complex.
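A quick numerical look at the bounds (31.42)-(31.43) (a sketch of ours, assuming NumPy):

```python
import numpy as np

for x in [0.1, 0.5, 0.9, 1.5, 4.0, 10.0]:
    lo, hi = (x - 1) / np.sqrt(x), x - 1            # bounds for 0 < x <= 1
    if x >= 1:
        lo, hi = (x - 1) / x, (x - 1) / np.sqrt(x)  # bounds for x >= 1
    print(x, lo <= np.log(x) <= hi)                 # True for every sample
```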
is concave.

So,

strong concavity \implies strict concavity \implies concavity

Intuitively, a strongly concave function is "so concave" that it remains concave even when added
to the quadratic, so strictly convex, function k\|x\|^2. Note that the sum of a concave
function and of a strongly concave function is strongly concave, so there is a simple way to
construct strongly concave functions.

Proof Let f : C \to \mathbb{R} be strongly concave. By definition, there exists k > 0 such that the
function g : C \to \mathbb{R} defined by g(x) = f(x) + k\|x\|^2 is concave. Let x, y \in C, with x \neq y,
and \lambda \in (0, 1). Since \|x\|^2 = \sum_{i=1}^n x_i^2 is strictly convex, we have

f(\lambda x + (1 - \lambda) y) = g(\lambda x + (1 - \lambda) y) - k\|\lambda x + (1 - \lambda) y\|^2
> \lambda g(x) + (1 - \lambda) g(y) - k\left(\lambda\|x\|^2 + (1 - \lambda)\|y\|^2\right)
= \lambda f(x) + (1 - \lambda) f(y)

as desired.

Strong concavity is, thus, a strong version of strict concavity. The next result shows the
great interest of such a stronger version.
18 Reported on pp. 77-78 (Karamata) and 156-157 (Blanusa) of the 1949 issue of the Bull. Soc. Math.
Phys. Serbie (see also Mitrinovic, 1970, p. 272).
In Example 1010 we showed that the function f(x) = 1 - x^2 is coercive. Since this
function is easily seen to be strongly concave, the example can now be seen as an illustration
of the proposition just stated.

The proof relies on a lemma of independent interest.
Proof Since f is concave and upper semicontinuous, the convex set hypo f is closed. For,
let \{(x_n, t_n)\} \subseteq hypo f be such that (x_n, t_n) \to (x, t) \in \mathbb{R}^{n+1}. We need to show that (x, t) \in
hypo f. By definition, t_n \leq f(x_n) for each n \geq 1, so t = \lim t_n \leq \limsup f(x_n) \leq f(x)
because f is upper semicontinuous. This shows that (x, t) \in hypo f.

Let (x_0, t_0) \notin hypo f, with x_0 \in C and t_0 > f(x_0). By Proposition 1025, hypo f and
(x_0, t_0) are strongly separated, that is, there exist (a, c) \in \mathbb{R}^{n+1}, b \in \mathbb{R} and \varepsilon > 0 such that

a \cdot x_0 + c t_0 \geq b + \varepsilon > b \geq a \cdot x + c t   \forall (x, t) \in hypo f   (31.45)

We have c > 0. For, suppose that c = 0. Then, a \cdot x_0 \geq b + \varepsilon > b \geq a \cdot x for all x \in C, so
in particular a \cdot x_0 > a \cdot x_0 by taking x = x_0, a contradiction. Next, suppose c < 0. Again
by taking x = x_0 and t = f(x_0), from (31.45) it follows that c t_0 > c f(x_0). So t_0 < f(x_0),
which contradicts t_0 > f(x_0).

In sum, c > 0. Without loss of generality, set c = 1. Define the affine function r : C \to \mathbb{R}
by r(x) = a \cdot (x_0 - x) + t_0. We then have r(x) \geq t for all (x, t) \in hypo f. In particular, this
is the case for (x, f(x)) for all x \in C, so r(x) \geq f(x) for all x \in C. We conclude that r is
the sought-after affine function.
Proof of Proposition 1499 We first show that every upper contour set (f \geq t) is bounded.
If C is bounded, then the statement is trivially true, being (f \geq t) a subset of C. Otherwise,
suppose, by contradiction, that there exists an unbounded sequence \{x_n\} \subseteq (f \geq t), i.e.,
such that \|x_n\| \to +\infty. Since f is strongly concave, there exists k > 0 such that the
function g(x) = f(x) + k\|x\|^2 is concave. Since f is upper semicontinuous, so is g. Since
g is concave and upper semicontinuous, by the previous lemma there is an affine function
r : C \to \mathbb{R}, with r(x) = a \cdot x + b for some a \in \mathbb{R}^n and b \in \mathbb{R}, such that r \geq g. This implies
that a \cdot x_n + b \geq f(x_n) + k\|x_n\|^2 for all n. By the Cauchy-Schwarz inequality we have
a \cdot x_n \leq \|a\|\|x_n\| for all n, so

t \leq f(x_n) \leq \|a\|\|x_n\| + |b| - k\|x_n\|^2 \to -\infty

a contradiction.
Along with Tonelli's Theorem (Theorem 1080), this proposition readily implies the fol-
lowing remarkable unique existence result that combines the best of the two worlds of coer-
civity and concavity: coercivity ensures that a maximizer exists, strict concavity guarantees
uniqueness.

In view of this remarkable result one may wonder whether there are strong concavity
criteria. The next result shows that this is, indeed, the case.19
Proof "Only if". Since f is strongly concave, there exists k > 0 such that the function
g(x) = f(x) + k\|x\|^2 is concave. Since f is twice continuously differentiable, so is g. Since g
is concave, we have y \cdot \nabla^2 g(x) y \leq 0 for all x \in C and all y \in \mathbb{R}^n (Proposition 1474). Some
simple algebra shows that \nabla^2 g(x) = \nabla^2 f(x) + 2kI for all x \in C, where I is the identity
matrix of order n (note that \|x\|^2 = x \cdot Ix). By setting c = 2k, this proves the implication.

"If". Set k = c/2 > 0. Define g to be such that g(x) = f(x) + k\|x\|^2. Since f is twice
continuously differentiable, so is g and \nabla^2 g(x) = \nabla^2 f(x) + 2kI = \nabla^2 f(x) + cI for all x \in C.
Since \nabla^2 f(x) + cI is negative semi-definite for all x \in C, so is \nabla^2 g(x). By Proposition 1474,
it follows that g is concave, yielding that f is strongly concave.
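Numerically, the criterion asks that \nabla^2 f(x) + cI stay negative semi-definite uniformly in x. A sketch of ours (assuming NumPy), for the strongly concave f(x) = -\|x\|^2 whose Hessian is the constant matrix -2I:

```python
import numpy as np

H = -2 * np.eye(3)          # Hessian of f(x) = -||x||^2 on R^3
c = 1.0                     # candidate constant in the criterion
shifted = H + c * np.eye(3) # must be negative semi-definite
print(np.linalg.eigvalsh(shifted))  # all eigenvalues -1 <= 0: f is strongly concave
```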
Theorem 1503 (Projection Theorem) Let C be a closed and convex set of \mathbb{R}^n. For
every x \in \mathbb{R}^n, the optimization problem

\min_y \|x - y\|   sub y \in C   (31.47)

admits a unique solution m \in C, i.e., there is a unique m \in C such that

\|x - m\| \leq \|x - y\|   \forall y \in C   (31.50)

Moreover, m is characterized by the condition

(x - m) \cdot (m - y) \geq 0   \forall y \in C   (31.48)

Proof Let C be a vector subspace. By taking y = 0 and y = 2m, condition (31.48) is easily
seen to imply (x - m) \cdot m = 0. So, (x - m) \cdot (m - y) = (x - m) \cdot (-y) \geq 0 for all y \in C. Fix
y \in C. Then, (x - m) \cdot (ty) \geq 0 for t = \pm 1, so (x - m) \cdot y = 0. Since y was arbitrarily chosen,
we conclude that (x - m) \cdot y = 0 for all y \in C, i.e., (x - m) \perp C.

Conversely, assume (x - m) \perp C. Since C is a vector subspace and m \in C, we have
m - y \in C for all y \in C, yielding (x - m) \cdot (m - y) = 0 for all y \in C, proving (31.48).
To prove this general form of the Projection Theorem, given an x \in \mathbb{R}^n we consider the
function f : \mathbb{R}^n \to \mathbb{R} defined by f(y) = -\|x - y\|^2. Problem (31.47) can be rewritten as

\max_y f(y)   sub y \in C   (31.49)

Thanks to the following lemma, we can apply Theorem 1501 to this optimization prob-
lem.20

Proof Simple algebra shows that \nabla^2 f(y) = -2I for all y \in \mathbb{R}^n, so z \cdot \nabla^2 f(y) z = -2\|z\|^2 \leq
-\|z\|^2 for all y \in \mathbb{R}^n and all z \in \mathbb{R}^n. By taking c = 1, condition (31.46) is satisfied. This
proves that f is strongly concave on \mathbb{R}^n and, in particular, when we restrict the domain to
C.

20 The reader should compare this result with Lemma 1124. In a similar vein, the function of Lemma
1058 can be shown to be strongly concave. In these cases, strong concavity combines strict concavity and
coercivity, thus confirming its dual role across concavity and coercivity.
if and only if it is a solution of the optimization problem (31.49). In view of the previous
lemma, f is strongly concave on C. Clearly, it is also continuous. Since C is a closed and
convex set of \mathbb{R}^n, by Theorem 1501 there exists a unique solution m \in C of the optimization
problem (31.49).

It remains to show that conditions (31.48) and (31.50) are equivalent, so that condition
(31.48) characterizes the minimizer m.21 Fix any y \in C and let y_t = t y + (1 - t) m for
t \in [0, 1]. From (31.50) it follows that, for each t \in (0, 1], we have

0 \geq \|x - m\|^2 - \|x - y_t\|^2 = -\|m - y_t\|^2 - 2(x - m) \cdot (m - y_t)
= -\|m - ty - (1 - t) m\|^2 - 2(x - m) \cdot (m - ty - (1 - t) m)
= -t^2\|m - y\|^2 - 2t(x - m) \cdot (m - y)

Dividing by 2t and letting t \to 0^+, we get (x - m) \cdot (m - y) \geq 0, so (31.48) holds. Conversely,
since \|x - y\|^2 = \|x - m\|^2 + \|m - y\|^2 + 2(x - m) \cdot (m - y), condition
(31.48) implies \|x - m\|^2 - \|x - y\|^2 \leq 0, so (31.50). Summing up, we proved that
conditions (31.48) and (31.50) are equivalent.
To prove that the projection on the affine subspace C = \{y \in \mathbb{R}^n : Ay = b\} is given by

P_C(x) = x + A^T\left(AA^T\right)^{-1}(b - Ax)   (31.51)

we have two possible methods. The first one verifies condition (31.48). To
this end, we start by making a few observations. First, note that for each square matrix B of
order n and each pair of vectors x and y in \mathbb{R}^n,

x \cdot By = x^T B y = \left(B^T x\right)^T y = B^T x \cdot y

21 Here we follow Zarantonello (1971).
Define the square matrix D of order n by D = A^T\left(AA^T\right)^{-1} A. We leave to the
reader to verify that D = D^T = D^2. Next, note that

A P_C(x) = A\left(x + A^T\left(AA^T\right)^{-1}(b - Ax)\right) = Ax + AA^T\left(AA^T\right)^{-1}(b - Ax)
= Ax + (b - Ax) = b

that is, P_C(x) \in C. With this in mind, we can show that P_C(x) satisfies (31.48). Let y \in C.
Since Ay = b, observe that

(x - P_C(x)) \cdot (P_C(x) - y)
= \left(-A^T\left(AA^T\right)^{-1}(Ay - Ax)\right) \cdot \left(A^T\left(AA^T\right)^{-1}(Ay - Ax) + x - y\right)
= -D(y - x) \cdot D(y - x) - D(y - x) \cdot (x - y)
= -\left(D^T D\right)(y - x) \cdot (y - x) - D(y - x) \cdot (x - y)
= -D(y - x) \cdot (y - x) - D(y - x) \cdot (x - y)
= D(y - x) \cdot (x - y) - D(y - x) \cdot (x - y) = 0

Since y was arbitrarily chosen in C, (31.48) holds.
The second method to prove (31.51) relies on results about constrained optimization that
we will study later in the book. Consider the optimization problem

\min_y \|x - y\|^2   sub y \in C
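The closed form (31.51) is easy to test numerically; the sketch below (ours, assuming NumPy and a full-row-rank A) compares it with the characterization (31.48) on random points:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(2, 4))   # full row rank (generically)
b = rng.normal(size=2)

def project(x):
    # P_C(x) = x + A^T (A A^T)^{-1} (b - A x), projection onto C = {y : Ay = b}
    return x + A.T @ np.linalg.solve(A @ A.T, b - A @ x)

x = rng.normal(size=4)
m = project(x)
print(np.allclose(A @ m, b))              # m lies in C
y = project(rng.normal(size=4))           # another point of C
print((x - m) @ (m - y) >= -1e-10)        # (31.48) holds (here with equality)
```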
Example 1508 The uniqueness of the solution of the minimization problem (31.47) relies
on the convexity of C. Indeed, consider the optimization problem

\min_y \|x - y\|   sub y \in \partial B_1(0)

where \partial B_1(0) = \{x \in \mathbb{R}^n : \|x\| = 1\} is the unit sphere, a closed but non-convex set. If we
take the origin x = 0, it is easy to see (just draw a picture for the case n = 2) that

\arg\min_{y \in \partial B_1(0)} \|x - y\| = \partial B_1(0)

That is, every element of the unit sphere is a projection of the origin onto the unit sphere it-
self. The lack of convexity of the unit sphere thus caused a dramatic failure of the uniqueness
of the projection of the origin. N
for all x, y \in \mathbb{R}^n.

Proof By (31.48), applied with P_C(y) and P_C(x) in the role of y, we have

(x - P_C(x)) \cdot (P_C(x) - P_C(y)) \geq 0

as well as

(y - P_C(y)) \cdot (P_C(y) - P_C(x)) \geq 0

By adding, we get (31.53). In turn, (31.53) can be rewritten as

(x - y) \cdot (P_C(x) - P_C(y)) \geq (P_C(x) - P_C(y)) \cdot (P_C(x) - P_C(y)) \geq 0   (31.54)

as desired. Moreover,

\|P_C(x) - P_C(y)\|^2 = (P_C(x) - P_C(y)) \cdot (P_C(x) - P_C(y)) \leq (x - y) \cdot (P_C(x) - P_C(y))
\leq \|x - y\|\|P_C(x) - P_C(y)\|

where the last inequality follows from the Cauchy-Schwarz inequality. In turn, this implies
(31.55).

A nonexpansive operator is thus a Lipschitz operator with unit coefficient. Being nonex-
pansive, projector operators are continuous, a final noteworthy property that we report for
later reference.
Convex Analysis
This inequality has a natural geometric interpretation: the tangent hyperplane (line, in the
scalar case) lies above the graph of f, which it touches only at (x, f(x)). Remarkably, next
we show that this property actually characterizes the differentiability of concave functions.
In other words, this geometric property is peculiar to the tangent hyperplanes of concave
functions.

Proof "If". Suppose \beta \in \mathbb{R}^n satisfies (32.1). Let z \in \mathbb{R}^n. Since C is open, for h > 0 small
enough we have x + hz \in C, so
so \beta satisfies (32.1).

"Only if". Assume that \beta \in \mathbb{R}^n satisfies (32.2). Let y \in C. Since C is open, there is
h > 0 small enough so that x + h(y - x) \in C. Then, by Lemma 1465,

\beta \cdot (y - x) \geq f'_+(x; y - x) \geq \frac{f(x + t(y - x)) - f(x)}{t}   \forall t \in (0, 1]   (32.3)

which is (32.1) when t = 1.
By choosing z = e^i (the i-th versor) with i \in \{1, ..., n\}, (32.4) yields that the i-th component
of \nabla f(x) is smaller than or equal to \beta_i for all i \in \{1, ..., n\}. By choosing z = -e^i with
i \in \{1, ..., n\}, (32.4) yields that the i-th component of \nabla f(x) is greater than or equal to \beta_i
for all i \in \{1, ..., n\}. This proves that \nabla f(x) = \beta.

"If". Assume there is a unique vector \beta \in \mathbb{R}^n such that (32.1) holds. By the previous
lemma, \beta is the unique vector satisfying f'_+(x; z) \leq \beta \cdot z for all z \in \mathbb{R}^n. By Proposition 1464,
we have that f'_+(x; \cdot) : \mathbb{R}^n \to \mathbb{R} is superlinear. By Theorem 1564 (there is no circularity
in using this result in the current proof), there exists a non-empty, compact and convex set
D \subseteq \mathbb{R}^n such that f'_+(x; z) = \min_{\beta' \in D} \beta' \cdot z for all z \in \mathbb{R}^n. This implies that each vector
\beta' \in D satisfies f'_+(x; z) \leq \beta' \cdot z for all z \in \mathbb{R}^n. Since \beta is the unique vector satisfying this
condition, \beta = \beta' for all \beta' \in D, that is, D = \{\beta\}. We can conclude that f'_+(x; z) = \beta \cdot z
for all z \in \mathbb{R}^n and, in particular, f'_+(x; \cdot) : \mathbb{R}^n \to \mathbb{R} is a linear function. By Corollary 1466,
f is derivable at x. By Theorem 1467, f is differentiable at x.
The superdifferential thus consists of all vectors \beta (and so of the linear functions) for which
(32.1) holds. There may not exist any such vector (Example 1522 below); in this case the
superdifferential is empty and the function is not superdifferentiable at the basepoint.

In words, r is equal to f at the basepoint x and dominates f elsewhere. It follows that \partial f(x)
identifies the set of all affine functions that touch the graph of f at x and that lie above this
graph at all other points of the domain. In the scalar case, affine functions are the straight
lines. So, in the next figure the straight lines r, r', and r'' belong to the superdifferential
\partial f(x) of a concave scalar function.
It is easy to see that, at the points where the function is differentiable, the only straight
line that satisfies conditions (32.6)-(32.7) is the tangent line f(x) + f'(x)(y - x). But, at
the points where the function is not differentiable, we might well have several straight lines
r : \mathbb{R} \to \mathbb{R} that satisfy such conditions, that is, that touch the graph of the function at
the basepoint x and that lie above such graph elsewhere. The superdifferential, being the
collection of these straight lines, can thus be viewed as a surrogate of the tangent line, i.e.,
of the differential. This is the idea behind the superdifferential: it is a surrogate of the
differential when the latter does not exist. The next result, an immediate consequence of Theorem
1513, confirms this intuition.

In the following example we determine the superdifferential of a simple scalar function.

Example 1517 Consider f : \mathbb{R} \to \mathbb{R} defined by f(x) = 1 - |x|. The only point where f
is not differentiable is x = 0. By Proposition 1516, we have \partial f(x) = \{f'(x)\} for each
1002 CHAPTER 32. CONVEX ANALYSIS
x 6= 0. It remains to determine @f (0). This amounts to nding the scalars that satisfy
the inequality
1 jyj 1 j0j + (y 0) 8y 2 R
i.e., the scalars such that jyj y for each y 2 R. If y = 0, this inequality trivially
holds for all if y = 0. If y =
6 0, we have
y
1 (32.8)
jyj
Since
y 1 if y 0
=
jyj 1 if y < 0
from (32.8) it follows both 1 and ( 1) 1. That is, 2 [ 1; 1]. We conclude that
@f (0) = [ 1; 1]. Thus: 8
>
> 1 if x > 0
<
@f (x) = [ 1; 1] if x = 0
>
>
:
1 if x < 0
N
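A numerical illustration (our sketch, assuming NumPy; the helper name is hypothetical): testing the defining inequality for f(x) = 1 - |x| at x = 0 over a grid of candidate slopes recovers the interval [-1, 1].

```python
import numpy as np

f = lambda y: 1 - np.abs(y)
ys = np.linspace(-5, 5, 1001)

def in_superdifferential(beta, x=0.0):
    # Check f(y) <= f(x) + beta (y - x) on the grid
    return np.all(f(ys) <= f(x) + beta * (ys - x) + 1e-12)

betas = np.linspace(-2, 2, 41)
accepted = [b for b in betas if in_superdifferential(b)]
print(min(accepted), max(accepted))  # -1.0 and 1.0
```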
Next we show that this is always the case for scalar functions.

In words, the superdifferential of a scalar function consists of all coefficients that lie
between the right and left derivatives. This makes precise the geometric intuition we gave
above on scalar functions.

Proof We only prove that \partial f(x) \subseteq [f'_+(x), f'_-(x)]. Let \beta \in \partial f(x). Given any h \neq 0, by
definition we have f(x + h) \leq f(x) + \beta h. If h > 0, we then have

\frac{f(x + h) - f(x)}{h} \leq \beta

so f'_+(x) \leq \beta; if h < 0, the inequality is reversed, so \beta \leq f'_-(x).

Example 1519 Consider the concave function f : \mathbb{R}^n \to \mathbb{R} defined by f(x) = \min_{i = 1, ..., n} x_i,
and let \beta \in \partial f(0). Since

\beta_i = \beta \cdot e^i \geq f(e^i) = 0   \forall i = 1, ..., n

\sum_{i=1}^n \beta_i = \beta \cdot (1, ..., 1) \geq f(1, ..., 1) = 1

-\sum_{i=1}^n \beta_i = \beta \cdot (-1, ..., -1) \geq f(-1, ..., -1) = -1

we conclude that \sum_{i=1}^n \beta_i = 1 and \beta_i \geq 0 for each i = 1, ..., n. That is, \beta belongs to the
simplex \Delta_{n-1}. Thus, \partial f(0) \subseteq \Delta_{n-1}. On the other hand, if \beta \in \Delta_{n-1}, then

f(y) = \min_{i = 1, ..., n} y_i \leq \sum_{i=1}^n \beta_i y_i = f(0) + \beta \cdot y   \forall y \in \mathbb{R}^n

and so \beta \in \partial f(0). We conclude that \partial f(0) = \Delta_{n-1}, that is, the superdifferential at the
origin is the simplex. The reader can check that, for every x \in \mathbb{R}^n,

\partial f(x) = \{\beta \in \Delta_{n-1} : \beta \cdot x = f(x)\}

i.e., \partial f(x) consists of the vectors \beta of the simplex such that \beta \cdot x = f(x). N
Example 1520 We can generalize the previous example by showing that for any positively
homogeneous function f : \mathbb{R}^n \to \mathbb{R} we have

\partial f(x) = \{\beta \in \partial f(0) : \beta \cdot x = f(x)\}   (32.10)

Indeed, let \beta \in \partial f(x). Taking y = 2x and y = 0 in (32.5), positive homogeneity yields
f(x) \leq \beta \cdot x and f(x) \geq \beta \cdot x, so f(x) = \beta \cdot x for all \beta \in \partial f(x). In turn, this implies that
(32.5) takes the form

f(y) \leq \beta \cdot y   \forall y \in \mathbb{R}^n

for all \beta \in \partial f(x), i.e., \partial f(x) \subseteq \partial f(0). In turn, this is easily seen to imply (32.10).2 N
Earlier we argued that the superdifferential is a surrogate of the differential. To be a useful
surrogate, however, it is necessary that it often exists, otherwise it would be of little help.
The next key result shows that, indeed, concave functions are everywhere superdifferentiable
and that, moreover, this property exactly characterizes them (another proof of the tight
connection between superdifferentiability and concavity).

2 The argument shows that (32.10) actually holds for any superhomogeneous function f : \mathbb{R}^n \to \mathbb{R} with
f(0) = 0.
Theorem 1521 A function f : C \to \mathbb{R} is concave if and only if \partial f(x) is non-empty for all
x \in C.

Proof "If". Suppose \partial f(x) \neq \emptyset at all x \in C. Let x_1, x_2 \in C and t \in [0, 1]. Let \beta \in
\partial f(t x_1 + (1 - t) x_2). By (32.5),

f(x_1) \leq f(t x_1 + (1 - t) x_2) + (1 - t) \beta \cdot (x_1 - x_2)  and  f(x_2) \leq f(t x_1 + (1 - t) x_2) + t \beta \cdot (x_2 - x_1)

that is,

t f(x_1) + (1 - t) f(x_2) \leq f(t x_1 + (1 - t) x_2) + t(1 - t) \beta \cdot (x_1 - x_2) + (1 - t) t \beta \cdot (x_2 - x_1)

Hence,

f(t x_1 + (1 - t) x_2) \geq t f(x_1) - t(1 - t) \beta \cdot (x_1 - x_2) + (1 - t) f(x_2) - (1 - t) t \beta \cdot (x_2 - x_1)
= t f(x_1) + (1 - t) f(x_2)

as desired.

"Only if". Suppose f is concave. Let x \in C. By Proposition 1464, we have that f'_+(x; \cdot) :
\mathbb{R}^n \to \mathbb{R} is superlinear. By Theorem 1564 (there is no circularity in using this result in
the current proof), there exists a non-empty, compact and convex set D \subseteq \mathbb{R}^n such that
f'_+(x; z) = \min_{\beta \in D} \beta \cdot z for all z \in \mathbb{R}^n. Since D is non-empty, there exists \beta \in \mathbb{R}^n such that
f'_+(x; z) \leq \beta \cdot z for all z \in \mathbb{R}^n. By Lemma 1514, \partial f(x) is non-empty.
The maintained hypothesis that C is open is key for the last two propositions, as the
next example shows.

Example 1522 Consider f : [0, +\infty) \to \mathbb{R} defined by f(x) = \sqrt{x}. The only point of the
(closed) domain in which the function is not differentiable is the boundary point x = 0. The
superdifferential \partial f(0) is given by the scalars \beta such that

\sqrt{y} \leq \sqrt{0} + \beta(y - 0)   \forall y \geq 0   (32.11)

i.e., such that \sqrt{y} \leq \beta y for each y \geq 0. If y = 0, this inequality holds for all \beta. If y > 0,
the inequality is equivalent to \beta \geq \sqrt{y}/y = 1/\sqrt{y}. But, letting y tend to 0, this implies
\beta \geq \lim_{y \to 0^+} 1/\sqrt{y} = +\infty. Therefore, there is no scalar \beta for which (32.11) holds. It follows
that \partial f(0) = \emptyset. We conclude that f is not superdifferentiable at the boundary point 0. N

N.B. In this section we focus on open convex sets C to ease matters and fix ideas. Yet, this
example shows that non-open domains may be important. Fortunately, the results of this
section can be extended to such domains. For instance, Theorem 1521 can be stated for any
convex set C with non-empty interior (that is, C is possibly not open, but int C \neq \emptyset) by
further requiring continuity.3
There is a tight relationship between superdifferentials and directional derivatives, as the
next result shows. Note that (32.13) generalizes (32.9) to the multivariable case.

and

f'_+(x; y) = \min_{\beta \in \partial f(x)} \beta \cdot y   \forall y \in \mathbb{R}^n   (32.14)

Proof Lemma 1514 implies (32.12), while (32.13) follows from (32.12) via (31.27). Finally,
by (32.12), we have that f'_+(x; y) \leq \min_{\beta \in \partial f(x)} \beta \cdot y for all y \in \mathbb{R}^n. On the other hand, by
Proposition 1464, we have that f'_+(x; \cdot) : \mathbb{R}^n \to \mathbb{R} is superlinear. By Theorem 1564 (there is
no circularity in using this result in the current proof), there exists a non-empty, compact
and convex set D \subseteq \mathbb{R}^n such that f'_+(x; y) = \min_{\beta \in D} \beta \cdot y for all y \in \mathbb{R}^n. Observe that if
\beta \in D, then f'_+(x; y) \leq \beta \cdot y for all y \in \mathbb{R}^n. By Lemma 1514, this implies that \beta \in \partial f(x),
that is, D \subseteq \partial f(x). We can conclude that f'_+(x; y) = \min_{\beta \in D} \beta \cdot y \geq \min_{\beta \in \partial f(x)} \beta \cdot y for all
y \in \mathbb{R}^n, proving the opposite inequality.
32.1.2 Properties

We begin with a first important property of the superdifferential.

Proof It is easy to check that \partial f(x) is closed and convex. To show that \partial f(x) is compact,
assume that it is non-empty (otherwise the result is trivially true) and, without loss of
generality, that 0 \in C and x = 0. Since f is continuous on the open set C (Theorem 833),
by Lemma 905 there exists a neighborhood B_\varepsilon(0) \subseteq C and a constant k > 0 such that
|f(y)| \leq k\|y\| for all y \in B_\varepsilon(0). Let \beta \in \partial f(0). Since y \in B_\varepsilon(0) if and only if -y \in B_\varepsilon(0),
by (32.5) we have:

\beta \cdot y \geq f(y) \geq -k\|y\|  and  -\beta \cdot y \geq f(-y) \geq -k\|y\|   \forall y \in B_\varepsilon(0)

Hence, |\beta \cdot y| \leq k\|y\| for all y \in B_\varepsilon(0). For each versor e^i, there is \delta > 0 small enough so
that \delta e^i \in B_\varepsilon(0). Hence,

|\beta_i| \delta = |\beta \cdot \delta e^i| \leq k\|\delta e^i\| = k\delta   \forall i = 1, ..., n

3 If the domain C is not assumed to be open, we need to require continuity (which is otherwise automatically
satisfied by Theorem 833).

so |\beta_i| \leq k for each i = 1, ..., n. Since \beta was arbitrarily chosen in \partial f(0), by Proposition 169
we conclude that \partial f(0) is a bounded (so, compact) set.
As is the case for the standard differential, the monotonicity of the function affects
the sign of the superdifferential.

Later, Theorem 1564 will establish a converse of this result for superlinear functions.

Proof Assume that \partial f(x) \neq \emptyset, otherwise the result is trivially true. Let \beta \in \partial f(x). Set
y = x + \delta e^i with \delta > 0. By taking \delta small enough, we have y \in C. Since f is increasing,
we have

0 \leq f(x + \delta e^i) - f(x) \leq \delta \beta \cdot e^i = \delta \beta_i

Proof We prove the result for k = 0 and leave the more general case to readers. Assume
that \partial f(0) \neq \emptyset, otherwise the result is trivially true. Let

\beta \in \partial f(0) = \{\beta \in \mathbb{R}^n : f(x) \leq \beta \cdot x \; \forall x \in \mathbb{R}^n\}

Since f is increasing, \beta \geq 0 by Proposition 1525. Moreover, \sum_{i=1}^n \beta_i = \beta \cdot 1 \geq f(1) = 1 and
-\sum_{i=1}^n \beta_i = \beta \cdot (-1) \geq f(-1) = -1, so that \sum_{i=1}^n \beta_i = 1. We conclude that \beta \in \Delta_{n-1}, as
desired.

Assume that, in addition, f is translation invariant. Let \beta \in \partial f(k). We have

f(x) \leq f(k) + \beta \cdot (x - k)

for all x \in \mathbb{R}^n. We conclude that \beta \in \partial f(0). As to the converse, let \beta \in \partial f(0). We have
A dual result, with a dual notion of cyclically increasing function, holds for convex
functions, as the reader can check.

32.1.3 Supercalculus

There is a non-trivial calculus for superdifferentials. Next we give a non-trivial sum rule that
generalizes the analogous classic differential rule. In words, the "superdifferential of a sum" is
the "sum of the superdifferentials," where the latter is a sum of sets.4
We want to show that there exist \beta_f \in \partial f(x) and \beta_g \in \partial g(x) such that

\beta = \beta_f + \beta_g   (32.17)

Define \varphi, \psi : \mathbb{R}^n \to \mathbb{R} by

\varphi(y) = f(y) - f(x)  and  \psi(y) = g(x) - g(y) + \beta \cdot (y - x)

The sets epi \psi and hypo \varphi are sets with disjoint interiors. In particular, the set epi \psi is convex because \psi is a
convex function and the set hypo \varphi is convex because \varphi is a concave function. Since both
functions are continuous, both sets are closed with non-empty interiors. By Proposition 1027,
the sets epi \psi and hypo \varphi are separated, i.e., there exists 0 \neq (\alpha, \gamma) \in \mathbb{R}^n \times \mathbb{R} such that, for
all y \in \mathbb{R}^n, all t \in \mathbb{R} with (y, t) \in epi \psi, all z \in \mathbb{R}^n and all s \in \mathbb{R} with (z, s) \in hypo \varphi,

\alpha \cdot y + \gamma t \geq \alpha \cdot z + \gamma s

By taking z = x, we obtain

g(y) \leq \beta_g \cdot (y - x) + g(x)   \forall y \in \mathbb{R}^n

f(z) \leq f(x) + \beta_f \cdot (z - x)   \forall z \in \mathbb{R}^n
Next we present a noteworthy chain rule for superdifferentials. Note that it requires a
monotonicity assumption.

Proof By definition, \partial g_i(x) = \{\beta_i \in \mathbb{R}^n : g_i(y) \leq g_i(x) + \beta_i \cdot (y - x) \; \forall y \in \mathbb{R}^n\}. Let
\beta = \sum_{i=1}^k \lambda_i \beta_i, where \lambda \in \partial f(g(x)) and \beta_i \in \partial g_i(x) for each i = 1, ..., k. We have:

(f \circ g)(y) = f(g_1(y), ..., g_k(y)) \leq f(g_1(x) + \beta_1 \cdot (y - x), ..., g_k(x) + \beta_k \cdot (y - x))
\leq f(g_1(x), ..., g_k(x)) + \lambda \cdot (g_1(x) + \beta_1 \cdot (y - x) - g_1(x), ..., g_k(x) + \beta_k \cdot (y - x) - g_k(x))
= (f \circ g)(x) + \lambda \cdot (\beta_1 \cdot (y - x), ..., \beta_k \cdot (y - x))
= (f \circ g)(x) + \sum_{i=1}^k \lambda_i \sum_{j=1}^n \beta_{ij}(y_j - x_j) = (f \circ g)(x) + \sum_{j=1}^n (y_j - x_j) \sum_{i=1}^k \lambda_i \beta_{ij}
= (f \circ g)(x) + \sum_{j=1}^n (y_j - x_j) \beta_j
In turn, this property implies that $A \subseteq \operatorname{epi} f = \{(z, t) \in \mathbb{R}^k \times \mathbb{R} : f(z) \le t\}$. Indeed, since $\gamma \in \partial (f \circ g)(x)$ we have, for all $h \in \mathbb{R}^n$,
$$(g(x+h), f(g(x)) + \gamma \cdot h),\ (g(x+h), f(g(x+h))) \in \operatorname{Gr}(f \circ g)$$
$$(g(x+h) - z,\ f(g(x)) + \gamma \cdot h),\ (g(x+h),\ f(g(x)) + \gamma \cdot h)$$
$$\beta \cdot y + \gamma t \ge \beta \cdot g(x+h) + \gamma\,(f(g(x)) + \gamma \cdot h)$$
$$\beta \cdot g(x) \ge \beta \cdot g(x+h) + \gamma \cdot h \qquad \forall h \in \mathbb{R}^n$$
for all $k \in \mathbb{R}$. N

6 This separation argument, somewhat reminiscent of that used in Theorem 1530, is based on Lemaire (1985), p. 106.
The next result shows the ordinal nature of this notion, thus justifying its name.

Proposition 1535 Let $g : C \to \mathbb{R}$ be ordinally superdifferentiable at $x \in C$. If $f : D \subseteq \mathbb{R} \to \mathbb{R}$ is strictly increasing, with $\operatorname{Im} g \subseteq D$, then $f \circ g : C \subseteq \mathbb{R}^n \to \mathbb{R}$ is ordinally superdifferentiable at $x$, with $\partial^o (f \circ g)(x) = \partial^o g(x)$.

Proof Given $x, y \in C$, it is enough to observe that $g(y) \le g(x)$ if and only if $(f \circ g)(y) \le (f \circ g)(x)$ (cf. Proposition 221).
Because of its ordinal nature, the ordinal superdifferential is a convex semicone, as the next result shows.

Proposition 1536 Let $f : C \to \mathbb{R}$ be ordinally superdifferentiable at $x \in C$. Then, $\partial^o f(x)$ is a convex semicone.

Proof Let $\alpha, \alpha' \in \partial^o f(x)$. In view of Proposition 885, we need to show that $\lambda \alpha + \mu \alpha' \in \partial^o f(x)$ whenever $\lambda, \mu \ge 0$ and $\lambda + \mu > 0$. Let $y \in C$ be such that
$$\lambda\, \alpha \cdot (y - x) + \mu\, \alpha' \cdot (y - x) = (\lambda \alpha + \mu \alpha') \cdot (y - x) \le 0$$
It follows that at least one addendum must be non-positive. Without loss of generality, say the first: $\lambda\, \alpha \cdot (y - x) \le 0$. We have two cases: either $\lambda > 0$ or $\lambda = 0$. In the former case, we have that $\alpha \cdot (y - x) \le 0$. In the latter case, since $\lambda + \mu > 0$, $\mu\, \alpha' \cdot (y - x) \le 0$ and $\mu > 0$. We can conclude that either $\alpha \cdot (y - x) \le 0$ or $\alpha' \cdot (y - x) \le 0$, which implies $f(y) \le f(x)$, given that $\alpha, \alpha' \in \partial^o f(x)$, yielding that $\lambda \alpha + \mu \alpha' \in \partial^o f(x)$.
Next we show that for concave functions the notions of ordinal superdifferential and of superdifferential are connected. Before doing so, we introduce an ancillary result which shows how monotonicity is captured by the ordinal superdifferential.

Proposition 1537 Let $f : C \to \mathbb{R}$ be ordinally superdifferentiable at $x \in C$. If $f$ is strongly increasing, then $\partial^o f(x) \subseteq \mathbb{R}^n_+ \setminus \{0\}$.

So, the elements of the ordinal superdifferential of a strongly increasing function are positive and non-zero vectors.
Proof Assume that $\partial f(x) \neq \emptyset$, otherwise the result is trivially true. Let $\alpha \in \partial f(x)$. By definition, we have that $f(y) \le f(x) + \alpha \cdot (y - x)$ for all $y \in C$. This implies that if $y \in C$ and $\alpha \cdot (y - x) \le 0$, then $f(y) \le f(x)$, yielding that $\alpha \in \partial^o f(x)$ and $\partial f(x) \subseteq \partial^o f(x)$. Now, assume that $f$ is concave, strongly increasing, and $x \in C$. Note that $\partial f(x)$ is non-empty. By the previous part of the proof, we have that $\partial f(x) \subseteq \partial^o f(x)$. By Proposition 1536, it follows that $\lambda\, \partial f(x) \subseteq \partial^o f(x)$ for every $\lambda > 0$. Vice versa, consider $\alpha \in \partial^o f(x)$. By Proposition 1537 and since $f$ is strongly increasing, we have that $\alpha > 0$. Let $y \in \mathbb{R}^n$ be such that $\alpha \cdot y = 0$. It follows that for every $h > 0$ small enough $x + hy \in C$ and $\alpha \cdot (x + hy) \le \alpha \cdot x$. Since $\alpha \in \partial^o f(x)$, it follows that $f(x + hy) - f(x) \le 0$ for every $h > 0$ small enough. We can conclude that
$$f'_+(x; y) = \lim_{h \to 0^+} \frac{f(x + hy) - f(x)}{h} \le 0$$
Since $y$ was arbitrarily chosen, it follows that $f'_+(x; y) \le 0$ for all $y \in \mathbb{R}^n$ such that $\alpha \cdot y = 0$. Define $V = \{y \in \mathbb{R}^n : \alpha \cdot y = 0\}$ and $g : V \to \mathbb{R}$ by $g(y) = 0$. Clearly, $V$ is a vector subspace and $g$ is linear. By the Hahn-Banach Theorem (Theorem 1563) and since $f'_+(x; \cdot) \le g$ on $V$ and $f'_+(x; \cdot)$ is superlinear, it follows that $g$ admits a linear extension $\bar g$ such that $f'_+(x; y) \le \bar g(y)$ for every $y \in \mathbb{R}^n$. By Riesz's Theorem, there exists $\alpha' \in \mathbb{R}^n$ such that $\bar g(y) = \alpha' \cdot y$ for every $y \in \mathbb{R}^n$. We conclude that
$$\alpha \cdot y = 0 \implies \alpha' \cdot y = 0 \qquad (32.24)$$
By Theorem 1523, it follows that $\alpha' \in \partial f(x)$. Since $f$ is strongly increasing, we also have that $\alpha' > 0$.7 We are left to show that $\alpha = \lambda \alpha'$ for some $\lambda > 0$. By Theorem 1562 and since (32.24) holds, we have that $\alpha' = \mu \alpha$ for some $\mu \in \mathbb{R}$. Since $\alpha > 0$ and $\alpha' > 0$, we have that $\mu > 0$: it is enough to set $\lambda = 1/\mu > 0$.
The next result shows that the ordinal superdifferential is to quasi-concave functions what the superdifferential is to concave functions (Theorem 1521).

Proof Let $x \in C$. We have two cases: either $x$ is a maximizer of $f$ on $C$ or not. In the first case, choose $\alpha = 0$. The implication
$$\alpha \cdot (y - x) \le 0 \implies f(y) \le f(x)$$
trivially holds because $f(y) \le f(x)$ for all $y \in C$, $x$ being a maximizer. Thus, $\alpha \in \partial^o f(x)$ and this latter set is non-empty. In the second case, since $x$ is not a maximizer and $f$ is lower semicontinuous and quasi-concave, the strict upper contour set
$$(f > f(x)) = \{y \in C : f(y) > f(x)\}$$
is non-empty, open and convex.8 Since $x$ does not belong to it, by Proposition 1025 there exists $\alpha \in \mathbb{R}^n$ such that if $y \in (f > f(x))$, that is $f(y) > f(x)$, then $\alpha \cdot y > \alpha \cdot x$. By taking the contrapositive, we have that $\alpha \in \partial^o f(x)$ and this latter set is non-empty.

7 By the previous part of the proof, $\alpha' \in \partial^o f(x)$. By Proposition 1537 and since $f$ is strongly increasing, $\alpha' \in \partial^o f(x) \subseteq \mathbb{R}^n_+ \setminus \{0\}$.
As to the converse, assume that $f$ is bounded above, i.e., there exists $M \in \mathbb{R}$ such that $f(y) \le M$ for all $y \in C$. We need to introduce two connected ancillary objects. We start with the function $G : \mathbb{R}^n \times C \to \mathbb{R}$ such that, for every $\alpha \in \mathbb{R}^n$ and every $x \in C$,
$$G(\alpha, x) = \sup \{f(y) : \alpha \cdot y \le \alpha \cdot x\}$$
and continue with the function $\hat f : C \to \mathbb{R}$ defined by
$$\hat f(x) = \inf_{\alpha \in \mathbb{R}^n} G(\alpha, x)$$
Observe that $f(x) \le \hat f(x) \le M$ for every $x \in C$. Note that $\hat f$ is also quasi-concave on $C$ (why?).

We can now prove the quasi-concavity of $f$. Consider $x \in C$. Let $\alpha \in \partial^o f(x)$. This implies that if $y \in C$ is such that $\alpha \cdot (y - x) \le 0$, then $f(y) \le f(x)$. It follows that
$$G(\alpha, x) = \sup \{f(y) : \alpha \cdot y \le \alpha \cdot x\} \le f(x)$$
This implies that $\hat f(x) = f(x)$. Since $x \in C$ was arbitrarily chosen, we can conclude that $f = \hat f$, yielding that $f$ is quasi-concave.
In the next result, which relates ordinal superdifferentiability and differentiability, the semicone nature of the ordinal superdifferential further emerges,

provided $\nabla f(x) \neq 0$.

Proof Let $\alpha$ be an element of the set on the left-hand side. To prove the inclusion, let $y \in C$ be such that $\alpha \cdot y \le \alpha \cdot x$. We want to show that $f(y) \le f(x)$. By assumption, if $\alpha \cdot (y - x) < 0$, then $f(y) \le f(x)$. Suppose then that $\alpha \cdot y = \alpha \cdot x$. Since $\alpha \neq 0$, there is some $z \in \mathbb{R}^n$ such that $\alpha \cdot z > 0$. Let $y_n = y - z/n$. Since $C$ is open, we have $y_n \in C$ for $n$ sufficiently large. Clearly, $\alpha \cdot y_n = \alpha \cdot y - \alpha \cdot z/n < \alpha \cdot x$. By assumption, it follows that $f(y_n) \le f(x)$. Since $f$ is continuous, by taking the limit we have $f(y) = \lim_{n \to \infty} f(y_n) \le f(x)$, as desired. We conclude that $\alpha \in \partial^o f(x)$.

The condition $\nabla f(x) \neq 0$ and the strong increasing monotonicity assumption in this proposition are needed, as we next show.

Example 1542 For the continuous and quasi-concave function $f(x) = x^3$ we have $0 = f'(0) \notin \partial^o f(0) = (0, \infty)$. On the other hand, for the continuous and concave function $f(x) = -x^2$, the origin is a global maximum and $0 = f'(0) \in \partial^o f(0) = \mathbb{R}$. N
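As a quick numerical companion to this example, here is a minimal sketch (ours, not from the book; a brute-force grid test stands in for the quantifier over all $y$) that checks candidate slopes $p$ for membership in the ordinal superdifferential at $x = 0$ via the defining implication $p(y - x) \le 0 \implies f(y) \le f(x)$.

```python
# Grid test of ordinal superdifferential membership at x = 0 (approximate sketch).
import numpy as np

def in_ordinal_superdiff(f, p, x=0.0, grid=np.linspace(-2, 2, 4001)):
    # p belongs to the ordinal superdifferential at x if no grid point violates
    # the implication  p*(y - x) <= 0  =>  f(y) <= f(x)
    viol = (p * (grid - x) <= 0) & (f(grid) > f(x))
    return not viol.any()

f_cube = lambda y: y**3        # quasi-concave, f'(0) = 0
f_par  = lambda y: -y**2       # concave, 0 is the global maximizer

print(in_ordinal_superdiff(f_cube, 0.0))   # False: 0 = f'(0) is not in (0, +inf)
print(in_ordinal_superdiff(f_cube, 1.0))   # True:  any p > 0 works
print(in_ordinal_superdiff(f_par, 0.0))    # True:  at a maximizer every p works
```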
Proof We begin by proving the "only if" part. Note that, by contraposition, (32.26) is equivalent to the following property for each $x, y \in C$:
$$f(y) \ge f(x) \implies \nabla f(x) \cdot (y - x) \ge 0$$
Consider $x, y \in C$ and assume that $f(y) \ge f(x)$. Since $f$ is quasi-concave, it follows that $f((1-t)x + ty) \ge f(x)$ for every $t \in (0, 1)$. By Theorem 1287, we have that
$$\nabla f(x) \cdot (y - x) = \lim_{t \to 0^+} \frac{f(x + t(y - x)) - f(x)}{t} = \lim_{t \to 0^+} \frac{f((1-t)x + ty) - f(x)}{t} \ge 0$$
yielding that $\nabla f(x) \cdot (y - x) \ge 0$.

We next prove the "if" part. By contradiction, assume that there exist $x, y \in C$ and $\hat t \in (0, 1)$ such that $f((1 - \hat t)x + \hat t y) < \min \{f(x), f(y)\}$. Define $\varphi : [0, 1] \to \mathbb{R}$ by $\varphi(t) = f((1-t)x + ty)$ for all $t \in [0, 1]$. Since $f$ is differentiable on $C$, we have that $\varphi$ is differentiable on $(0, 1)$, continuous on $[0, 1]$, and $\varphi(0) = f(x)$ as well as $\varphi(1) = f(y)$. Consider $A = \arg\min_{t \in [0,1]} \varphi(t)$. By Weierstrass' Theorem, $A$ is non-empty. Moreover, $A$ is a closed subset of $[0, 1]$. It follows that $\tilde t = \max A$ is well defined. Since $\varphi(\hat t) = f((1 - \hat t)x + \hat t y) < \min \{f(x), f(y)\}$, we have that $\varphi(\tilde t) \le \varphi(\hat t) < \min \{\varphi(0), \varphi(1)\}$, yielding that $\tilde t \in (0, 1)$.

Clearly, there exists $\bar t \in (\tilde t, 1)$ such that $\varphi'(\bar t) > 0$. Otherwise, we would have that $\varphi'(t) \le 0$ for all $t \in (\tilde t, 1)$, yielding that $\varphi$ is decreasing on $(\tilde t, 1)$ and, in particular, by continuity, on $[\tilde t, 1]$. This would imply that $\varphi(\tilde t) \ge \varphi(1) \ge \min \{\varphi(0), \varphi(1)\}$, a contradiction. Consider then the set $L = \{t \in (\tilde t, 1) : \varphi'(t) > 0\}$. Since $\bar t \in L$, $L$ is non-empty and bounded from below by $\tilde t$. We can define $s = \inf L$. Next, we prove that $s = \tilde t$. Clearly, $s \ge \tilde t$. By contradiction, assume that $s > \tilde t$. By definition of $L$, this would imply that $\varphi'(t) \le 0$ for all $t \in (\tilde t, s)$, yielding that $\varphi$ is decreasing on $(\tilde t, s)$ and, in particular, by continuity, on $[\tilde t, s]$. It would follow that $\varphi(s) \le \varphi(\tilde t)$, that is, $s \in A$ and $s > \tilde t$, a contradiction with $\tilde t = \max A$.

By a dual version of Lemma 1001, there exists a sequence $\{s_n\} \subseteq L$ such that $s_n \to s$. Since $\varphi$ is continuous, $s = \tilde t$, and $\varphi(s) = \varphi(\tilde t) < \min \{\varphi(0), \varphi(1)\}$, there exists $\bar n$ such that $\varphi(s_{\bar n}) < \min \{\varphi(0), \varphi(1)\}$ as well as $\varphi'(s_{\bar n}) > 0$. Next, define also $y_{\bar n} = (1 - s_{\bar n})x + s_{\bar n} y$. Note that
$$y_{\bar n} - x = s_{\bar n}(y - x) \quad\text{and}\quad \varphi'(s_{\bar n}) = \nabla f(y_{\bar n}) \cdot (y - x)$$
Since $\varphi'(s_{\bar n}) > 0$ and $s_{\bar n} > 0$, this implies that $0 < s_{\bar n} \varphi'(s_{\bar n}) = \nabla f(y_{\bar n}) \cdot (y_{\bar n} - x)$. By assumption,
$$\nabla f(y_{\bar n}) \cdot (x - y_{\bar n}) < 0 \implies f(x) < f(y_{\bar n})$$
We can conclude that
$$\min \{\varphi(0), \varphi(1)\} \le \varphi(0) = f(x) < f(y_{\bar n}) = \varphi(s_{\bar n})$$
a contradiction.
The next result is the quasi-concave counterpart of Theorem 1473, where a suitable notion of quasi-monotonicity is used.

The function $\varphi$ is thus decreasing on $(0, 1)$. By continuity, $\varphi$ is decreasing on $[0, 1]$. Since $\varphi(1) \ge \varphi(0)$, this implies that $\varphi$ is constant on $[0, 1]$. Since $\varphi'_+(0) = \nabla f(x) \cdot (y - x)$ (why?), in turn, this implies that $0 = \varphi'_+(0) = \nabla f(x) \cdot (y - x) < 0$, a contradiction.

"Only if" Let $f$ be quasi-concave and suppose that (32.28) does not hold. So, there exists a pair $x, y \in C$ for which the implication in (32.28) fails.
32.2.3 A normalization

A simple consequence of formula (32.23) is that when a concave $f$ is strongly increasing and normalized, we have the sharp equality $\partial f(x) = \partial^o f(x) \cap \Delta^{n-1}$. This observation suggests a normalized version of the ordinal superdifferential, whose elements are required to belong to the simplex.

This notion is best suited for strongly increasing functions, for which it has the same scope as ordinal superdifferentiability.

Proof The "if" part is obvious because $\partial^{no} f(x) \subseteq \partial^o f(x)$. As to the converse, suppose that $\partial^o f(x) \neq \emptyset$. By Proposition 1537, $\partial^o f(x) \subseteq \mathbb{R}^n_+ \setminus \{0\}$. Let $\alpha \in \partial^o f(x)$, i.e.,
$$f(y) > f(x) \implies \alpha \cdot (y - x) > 0 \qquad \forall y \in C \qquad (32.30)$$
From $\alpha \ge 0$ and $\alpha \neq 0$ it follows that $\sum_{i=1}^n \alpha_i > 0$, so that (32.30) amounts to
$$f(y) > f(x) \implies \frac{1}{\sum_{i=1}^n \alpha_i}\, \alpha \cdot (y - x) > 0 \qquad \forall y \in C$$
The inequality characterization of this lemma makes it possible to prove the relevance of the normalized ordinal superdifferential for strongly increasing quasi-concave functions.

Proof By Theorem 1539, $\partial^{no} f(x)$ is non-empty and convex. It remains to show that it is closed (so, compact, because the simplex is compact). Suppose that the sequence $\{\alpha_n\} \subseteq \partial^{no} f(x)$ converges to $\alpha \in \mathbb{R}^n$. Let $y \in \mathbb{R}^n$ be such that $\alpha \cdot y < \alpha \cdot x$. To prove that $\partial^{no} f(x)$ is closed we need to show that $f(y) < f(x)$, so that $\alpha \in \partial^{no} f(x)$. Eventually, we have $\alpha_n \cdot y < \alpha_n \cdot x$. Indeed, set $\varepsilon = \alpha \cdot x - \alpha \cdot y > 0$; eventually, $\alpha_n \cdot x > \alpha \cdot x - \varepsilon/2$ and $\alpha_n \cdot y < \alpha \cdot y + \varepsilon/2$, so that $\alpha_n \cdot y < \alpha \cdot y + \varepsilon/2 = \alpha \cdot x - \varepsilon/2 < \alpha_n \cdot x$. We conclude that, eventually, $\alpha_n \cdot y < \alpha_n \cdot x$. By Lemma 1547, this implies that $f(y) < f(x)$, as desired.
32.3 Optimization

A main motivation for the study of superdifferentials is that they permit neat characterizations of (global) maximizers, as we next show. Here $f$ is a generic real-valued function, possibly non-concave, on a generic domain $A$, possibly non-convex.

(i) a point $\hat x \in A$ is a maximizer;

(ii) $f$ is superdifferentiable at $\hat x$ and $0 \in \partial f(\hat x)$;

(iii) $f$ is ordinally superdifferentiable at $\hat x$ and $0 \in \partial^o f(\hat x)$.

Proof (i) implies (ii). Let $\hat x \in A$ be a maximizer. We have $f(x) \le f(\hat x) + 0 \cdot (x - \hat x)$ for every $x \in A$, and so $0 \in \partial f(\hat x)$. (ii) implies (iii). It is enough to observe that $\partial f(x) \subseteq \partial^o f(x)$ for all $x \in A$. (iii) implies (i). Let $0 \in \partial^o f(\hat x)$. It follows that if $y \in A$ and $0 \cdot (y - \hat x) \le 0$, then $f(y) \le f(\hat x)$. Since $0 \cdot (y - \hat x) \le 0$ holds for every $y \in A$, we have that $f(y) \le f(\hat x)$ for all $y \in A$, i.e., $\hat x \in A$ is a maximizer.

This theorem gives as a corollary the most general version of the first-order condition for concave functions. Indeed, in view of Proposition 1516, the earlier Theorem 1485 is a special case of this result.

Proof It is enough to observe that, by Theorem 1521, $\partial f(x) \neq \emptyset$ for all $x \in C$, i.e., $f$ is superdifferentiable at all $x \in C$.
The next simple example shows how this corollary makes it possible to find maximizers of concave functions even when Fermat's Theorem does not apply because there are points where the function is not differentiable.

Example 1553 For the concave function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = 1 - |x|$ we have (Example 1517):
$$\partial f(x) = \begin{cases} \{-1\} & \text{if } x > 0 \\ [-1, 1] & \text{if } x = 0 \\ \{1\} & \text{if } x < 0 \end{cases}$$
By the last corollary, $\hat x = 0$ is the unique maximizer because $0 \in \partial f(0)$ and $0 \notin \partial f(x)$ for all $x \neq 0$. N
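The corollary can also be probed numerically. The sketch below (ours, not from the book; the grid test only approximates the quantifier over all $y$) verifies that $0$ belongs to the superdifferential of $f(x) = 1 - |x|$ exactly at the maximizer $\hat x = 0$.

```python
# Grid check: p is in the superdifferential at x when f(y) <= f(x) + p*(y - x).
import numpy as np

f = lambda y: 1 - np.abs(y)
grid = np.linspace(-3, 3, 6001)

def in_superdiff(p, x):
    return bool(np.all(f(grid) <= f(x) + p * (grid - x) + 1e-12))

print(in_superdiff(0.0, 0.0))   # True:  0 belongs to [-1, 1], the superdifferential at 0
print(in_superdiff(0.0, 0.5))   # False: at x != 0 the superdifferential is {-1} or {1}
```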
Theorem 1551 also permits proving a general first-order condition for quasi-concave functions.

Proof It is enough to observe that, by Theorem 1539, $\partial^o f(x) \neq \emptyset$ for all $x \in C$, i.e., $f$ is ordinally superdifferentiable at all $x \in C$.

$$x \in f(x)$$
For instance, for the self-correspondence $f : [0, 1] \rightrightarrows [0, 1]$ given by $f(x) = [0, x^2]$, the endpoints $0$ and $1$ are fixed points in that $0 \in f(0) = \{0\}$ and $1 \in f(1) = [0, 1]$.
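A direct scan illustrates the same point. The following minimal sketch (ours, not from the book) tests the membership $x \in f(x) = [0, x^2]$ on a grid of $[0, 1]$; only the two endpoints survive.

```python
# Scan [0, 1] for fixed points of the self-correspondence F(x) = [0, x^2].
import numpy as np

def is_fixed_point(x):
    return 0.0 <= x <= x**2   # membership of x in the interval [0, x^2]

candidates = [float(x) for x in np.linspace(0, 1, 101) if is_fixed_point(x)]
print(candidates)  # only 0.0 and 1.0 survive, as claimed in the text
```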
The next important theorem establishes the existence of fixed points by generalizing Brouwer's Theorem (we omit its non-trivial proof).9

9 It is named after Shizuo Kakutani, who proved it in 1941.
$$0 \in E(p) \qquad (32.32)$$
and $q \in D(p)$.

The existence of equilibria thus reduces to the solution of inclusion equations defined by the excess market demand correspondence. To solve these inclusion equations, and thus establish the existence of equilibria, we consider the following assumptions on such correspondence:

We denoted the assumptions as in the earlier Section 14.1.3 because they have the same economic interpretation (upon which we already expatiated). We use the letter "E" for the first and fourth assumptions because they have to adapt their mathematical form to the more general setting of correspondences.

We can now state and prove a general version of the Arrow-Debreu Theorem.

10 The inequality $p \cdot E(p) \le 0$ means $\sum_{i \in I} p_i z_i \le 0$ for all $z \in E(p)$.
Theorem 1559 (Arrow-Debreu) Under assumptions E.1, A.2, and W.1 a weak market equilibrium exists. If, in addition, assumptions E.4 and W.2 hold, then a market equilibrium exists.

Proof We follow Debreu (1959). Since $E$ is bounded, there is a compact set $K$ in $\mathbb{R}^n$ such that $E(p) \subseteq K$ for all $p \in \mathbb{R}^n_+$. Without loss of generality, we can assume that $K$ is convex. By E.2, we can limit ourselves to the upper hemicontinuous restriction $E : \Delta^{n-1} \rightrightarrows K$. Define $g : K \rightrightarrows \Delta^{n-1}$ by
$$g(z) = \arg\max_{p \in \Delta^{n-1}} p \cdot z$$
so that, at a fixed point $z$ of the resulting composite correspondence,
$$z_i = e^i \cdot z \le 0 \qquad \forall i \in I$$

In the optimization problem that, in his consumer role, agent $i$ solves, we no longer assume that the solution is unique, but permit multiple optimal bundles. Consequently, we now have a demand correspondence $D_i : \mathbb{R}^n_+ \rightrightarrows \mathbb{R}^n_+$ and an aggregate demand correspondence $D = \sum_{i \in I} D_i$, where now, though, the sum is in the sense of (21.3). The aggregate demand correspondence still inherits the invariance property of individual demand correspondences, i.e., $D(\lambda p) = D(p)$ for all $\lambda > 0$, since this invariance property is easily seen to continue to hold for each agent.

The aggregate supply function $S : \mathbb{R}^n_+ \to \mathbb{R}^n$ continues to be $S(p) = \{\omega\}$. So, the weak Walras law still takes the form $p \cdot E(p) \le 0$, where $E : \mathbb{R}^n_+ \rightrightarrows \mathbb{R}^n$ is the excess demand correspondence defined by $E(p) = D(p) - \{\omega\}$. If Walras' law holds for each agent $i \in I$, i.e., $p \cdot D_i(p) = p \cdot \omega_i$ for each $i \in I$, then its aggregate version $p \cdot E(p) = 0$ holds.
Here a pair $(p, x) \in \mathbb{R}^n_+ \times \mathbb{R}^{n|I|}_+$ of prices and consumption allocations is a weak Arrow-Debreu (market) equilibrium of the exchange economy $\mathcal{E}$ if (i) for each agent $i \in I$ the bundle $x_i$ is optimal given prices $p$, and (ii) $\sum_{i \in I} x_i \le \omega$.

The pair $(p, x)$ becomes an Arrow-Debreu (market) equilibrium if in the market clearing condition (ii) we have equality, so that optimal bundles exhaust endowments.

The next result, a general version of Lemma 1053, connects the Arrow-Debreu and the aggregate market equilibrium notions.

Lemma 1560 Given a pair $(p, x) \in \mathbb{R}^n_+ \times \mathbb{R}^{n|I|}_+$ of prices and consumption allocations, set $q = \sum_{i \in I} x_i$. The pair $(p, x)$ is a:

(i) Arrow-Debreu equilibrium if and only if $p$ solves the inclusion equation (32.31) and $q \in D(p)$;

(ii) weak Arrow-Debreu equilibrium if and only if $p$ solves the inclusion equation (32.32) and $q \in D(p)$.

We can now establish which properties of the utility functions and endowments of the agents of economy $\mathcal{E}$ imply the properties of the aggregate demand correspondence that the Arrow-Debreu Theorem requires. For simplicity, we consider weak equilibria and prove the desired existence result that generalizes Proposition 1054.

Proposition 1561 Let $\mathcal{E} = \{(u_i, \omega_i)\}_{i \in I}$ be an economy in which, for each agent $i \in I$, the endowment $\omega_i$ is strictly positive and the utility function $u_i$ is continuous and quasi-concave on a consumption set $A_i = [0, b_i]$ where $b_i \in \mathbb{R}^n_{++}$. Then, a weak market price equilibrium of the exchange economy $\mathcal{E}$ exists.

This existence result generalizes Proposition 1054 in that utility functions are only required to be quasi-concave and not strictly quasi-concave.
Proof The "if" part is obvious and left to the reader. "Only if" Before starting, we introduce some derived objects, since reasoning in terms of linear functions rather than vectors will simplify things quite significantly. Define $f_i : \mathbb{R}^n \to \mathbb{R}$ by $f_i(x) = \alpha_i \cdot x$ for each $i = 1, \ldots, k$. Similarly, define $f : \mathbb{R}^n \to \mathbb{R}$ by $f(x) = \alpha \cdot x$. Next, define the operator $F : \mathbb{R}^n \to \mathbb{R}^k$ to be such that the $i$-th component of $F(x)$ is $F(x)_i = f_i(x)$. Since $F$ is linear (why?), note that $\operatorname{Im} F$ is a vector subspace of $\mathbb{R}^k$. Next, we define the function $g : \operatorname{Im} F \to \mathbb{R}$ as follows:
$$g(v) = f(x) \quad \text{where } x \in F^{-1}(v) \qquad \forall v \in \operatorname{Im} F$$
First, we need to show that $g$ is well defined. In other words, we need to check that to each vector of $\operatorname{Im} F$ the function $g$ assigns one and only one value. In fact, by definition, given $v \in \operatorname{Im} F$ there always exists a vector $x \in \mathbb{R}^n$ such that $F(x) = v$. The potential issue is that there might exist a second vector $y \in \mathbb{R}^n$ such that $F(y) = v$, but $f(x) \neq f(y)$. We next show that this latter inequality never holds. Indeed, since $F$ is linear, if $F(y) = v$, then $F(x) - F(y) = 0$ and $F(x - y) = 0$. By definition of $F$, we have $\alpha_i \cdot (x - y) = f_i(x - y) = 0$ for every $i = 1, \ldots, k$. By (32.33), this yields that $\alpha \cdot (x - y) = f(x - y) = 0$, that is, $f(x) = f(y)$. We just proved that $g$ is well defined. The reader can verify that $g$ is also linear. By the Hahn-Banach Theorem (Theorem 760), $g$ admits an extension to $\mathbb{R}^k$. By Riesz's Theorem, there exists a vector $\beta \in \mathbb{R}^k$ such that $g(v) = \sum_{i=1}^k \beta_i v_i$ for all $v \in \mathbb{R}^k$.11

11 Readers who struggle with this last step should consult the proof of Riesz's Theorem (in particular, the part dealing with "uniqueness").
Chapter 33

Nonlinear Riesz's Theorems

Superdifferentials permit us to establish representation results for superlinear functions that generalize Riesz's Theorem. This beautiful topic is the subject matter of this chapter (for coda readers).
Proof Let $\dim V = k \le n$ and let $\{x^1, \ldots, x^k\}$ be a basis of $V$. If $k = n$, there is nothing to prove since $V = \mathbb{R}^n$. Otherwise, by Theorem 92 there are $n - k$ vectors $\{x^{k+1}, \ldots, x^n\}$ such that the overall set $\{x^1, \ldots, x^n\}$ is a basis of $\mathbb{R}^n$. Let $V_1 = \operatorname{span}\{x^1, \ldots, x^{k+1}\}$. Clearly, $V \subseteq V_1$. Given any $x \in V_1$, there exists a unique collection of scalars $\{\lambda_i\}_{i=1}^{k+1} \subseteq \mathbb{R}$ such that $x = \sum_{i=1}^k \lambda_i x^i + \lambda_{k+1} x^{k+1}$. Since $\sum_{i=1}^k \lambda_i x^i \in V$, every element of $V_1$ can be uniquely written as $x + \lambda x^{k+1}$, with $x \in V$ and $\lambda \in \mathbb{R}$. That is, $V_1 = \{x + \lambda x^{k+1} : x \in V, \lambda \in \mathbb{R}\}$.

Let $r$ be an arbitrary scalar. Define $f_1 : V_1 \to \mathbb{R}$ by $f_1(x + \lambda x^{k+1}) = f(x) + \lambda r$ for all $x \in V$ and all $\lambda \in \mathbb{R}$. The function $f_1$ is linear, with $f_1(x^{k+1}) = r$, and is equal to $f$ on $V$. We need to show that $r$ can be chosen so that $f_1(x) \ge g(x)$ for all $x \in V_1$.

If $\lambda > 0$, we have that for every $x \in V$
$$f_1(x + \lambda x^{k+1}) \ge g(x + \lambda x^{k+1}) \iff f(x) + \lambda r \ge g(x + \lambda x^{k+1}) \iff r \ge \frac{g(x + \lambda x^{k+1}) - f(x)}{\lambda}$$
Summing up, we have $f_1(x) \ge g(x)$ for all $x \in V_1$ if and only if we choose $r \in \mathbb{R}$ so that
$$\inf_{y \in V,\, \mu > 0} \frac{f(y) - g(y - \mu x^{k+1})}{\mu} \ \ge\ r\ \ge\ \sup_{x \in V,\, \lambda > 0} \frac{g(x + \lambda x^{k+1}) - f(x)}{\lambda} \qquad (33.1)$$
Note that
$$f(\lambda y + \mu x) = (\lambda + \mu)\, f\!\left(\frac{\lambda}{\lambda + \mu}\, y + \frac{\mu}{\lambda + \mu}\, x\right) \ge (\lambda + \mu)\, g\!\left(\frac{\lambda}{\lambda + \mu}\, y + \frac{\mu}{\lambda + \mu}\, x\right)$$
$$= (\lambda + \mu)\, g\!\left(\frac{\lambda}{\lambda + \mu}\,(y - \mu x^{k+1}) + \frac{\mu}{\lambda + \mu}\,(x + \lambda x^{k+1})\right) \ge \lambda\, g(y - \mu x^{k+1}) + \mu\, g(x + \lambda x^{k+1})$$
Thus, for all $x, y \in V$ and all $\lambda, \mu > 0$, we have
$$\frac{f(y) - g(y - \mu x^{k+1})}{\mu} \ge \frac{g(x + \lambda x^{k+1}) - f(x)}{\lambda}$$
In turn, this implies (33.1), as desired. We conclude that there exists a linear function $f_1 : V_1 \to \mathbb{R}$ that extends $f$ and such that $f_1(x) \ge g(x)$ for all $x \in V_1$.

Consider now $V_2 = \operatorname{span}\{x^1, \ldots, x^{k+1}, x^{k+2}\}$. By proceeding as before, we can show the existence of a linear function $f_2 : V_2 \to \mathbb{R}$ that extends $f_1$ and such that $f_2(x) \ge g(x)$ for all $x \in V_2$. In particular, being $V \subseteq V_1 \subseteq V_2$, the linear function $f_2$ is such that $f_2(x) = f_1(x) = f(x)$ for all $x \in V$. So, $f_2$ extends $f$ to $V_2$. By iterating, we reach a final extension $f_{n-k} : \mathbb{R}^n \to \mathbb{R}$ that extends $f$ and is such that $f_{n-k}(x) \ge g(x)$ for all $x \in V_{n-k} = \operatorname{span}\{x^1, \ldots, x^n\} = \mathbb{R}^n$. This completes the proof.
Proof We prove the "only if" part, as the "if" follows from Example 873. Suppose $f$ is superlinear. By the Hahn-Banach Theorem, $\partial f(0)$ is not empty. Indeed, let $x \in \mathbb{R}^n$ and consider the vector subspace $V_x = \{\lambda x : \lambda \in \mathbb{R}\}$ generated by $x$ (see Example 87). Define $l_x : V_x \to \mathbb{R}$ by $l_x(\lambda x) = \lambda f(x)$ for all $\lambda \in \mathbb{R}$. The function $l_x$ is linear on the vector subspace $V_x$. Since $f$ is superlinear, recall that $f(x) + f(-x) \le 0$, that is, $f(x) \le -f(-x)$. We next show that $l_x \ge f$ on $V_x$. Since $f$ is superlinear, if $\lambda \ge 0$, then $l_x(\lambda x) = \lambda f(x) = f(\lambda x)$. If $\lambda < 0$, then $l_x(\lambda x) = \lambda f(x) = -\lambda\,(-f(x)) \ge -\lambda\, f(-x) = f(\lambda x)$, proving that $l_x \ge f$ on $V_x$. By the Hahn-Banach Theorem, there exists $l \in (\mathbb{R}^n)'$ such that $l \ge f$ on $\mathbb{R}^n$ and $l = l_x$ on $V_x$.1 By Riesz's Theorem, there exists $\alpha \in \mathbb{R}^n$ such that $l(x) = \alpha \cdot x$ for all $x \in \mathbb{R}^n$. We have thus shown that $\alpha \in \partial f(0)$ and $f(x) = \alpha \cdot x$. The first fact implies that $\partial f(0)$ is not empty, hence $\min_{\beta \in \partial f(0)} \beta \cdot x \ge f(x)$ for all $x \in \mathbb{R}^n$, while the second fact implies that the minimum is attained, so that $f(x) = \min_{\beta \in \partial f(0)} \beta \cdot x$.

Since $x$ was arbitrarily chosen, (33.3) holds for every $x \in \mathbb{R}^n$. Next, suppose $C, C' \subseteq \mathbb{R}^n$ are any two non-empty convex and compact sets such that
$$f(x) = \min_{\alpha \in C} \alpha \cdot x = \min_{\alpha \in C'} \alpha \cdot x \qquad \forall x \in \mathbb{R}^n$$
We conclude that $C = C'$. In turn, in view of (33.3) this implies that $\partial f(0)$ is the unique non-empty compact and convex set in $\mathbb{R}^n$ for which (33.2) holds.
(i) Let $\partial f(0) \subseteq \mathbb{R}^n_+$. If $x, y \in \mathbb{R}^n$ are such that $x \ge y$, then $\alpha \cdot x \ge \alpha \cdot y$ for all $\alpha \in \partial f(0)$. Let $\alpha_y \in \partial f(0)$ be such that $f(y) = \alpha_y \cdot y$. Then,
$$f(x) = \min_{\alpha \in \partial f(0)} \alpha \cdot x \ge \min_{\alpha \in \partial f(0)} \alpha \cdot y = \alpha_y \cdot y = f(y)$$
as desired. Conversely, assume that $f$ is increasing. Then, for each $i = 1, \ldots, n$ we have
$$0 \le f(e^i) = \min_{\alpha \in \partial f(0)} \alpha \cdot e^i = \min_{\alpha \in \partial f(0)} \alpha_i$$
so that $\partial f(0) \subseteq \mathbb{R}^n_+$.

So, $0 \notin \partial f(0)$.

(iii) The proof is similar to (i) and left to the reader.

(iv) Let $\partial f(0) \subseteq \Delta^{n-1}$. By (i), $f$ is increasing. It remains to prove that it is translation invariant. Let $x \in \mathbb{R}^n$ and $k \in \mathbb{R}$. We have $\alpha \cdot k\mathbf{1} = k$ because $\alpha \in \Delta^{n-1}$. So,
$$f(x + k\mathbf{1}) = \min_{\alpha \in \partial f(0)} \alpha \cdot (x + k\mathbf{1}) = \min_{\alpha \in \partial f(0)} (\alpha \cdot x + k) = k + \min_{\alpha \in \partial f(0)} \alpha \cdot x = f(x) + k$$
as desired. Conversely, assume that $f$ is increasing and translation invariant. By point (i), $\partial f(0) \subseteq \mathbb{R}^n_+$. Moreover, since $f(k\mathbf{1}) = k$ for all $k \in \mathbb{R}$, we have
$$\sum_{i=1}^n \alpha_i = \alpha \cdot \mathbf{1} \ge \min_{\beta \in \partial f(0)} \beta \cdot \mathbf{1} = f(\mathbf{1}) = 1 \qquad \forall \alpha \in \partial f(0)$$
and
$$-\sum_{i=1}^n \alpha_i = \alpha \cdot (-\mathbf{1}) \ge \min_{\beta \in \partial f(0)} \beta \cdot (-\mathbf{1}) = f(-\mathbf{1}) = -1 \qquad \forall \alpha \in \partial f(0)$$
So, we have both $\sum_{i=1}^n \alpha_i \ge 1$ and $\sum_{i=1}^n \alpha_i \le 1$, which implies $\sum_{i=1}^n \alpha_i = 1$. We conclude that $\partial f(0) \subseteq \Delta^{n-1}$.
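A small numerical sketch (ours, not from the book, with a made-up finite set $C$ standing in for $\partial f(0)$) illustrates the representation: a function of the form $f(x) = \min_{\alpha \in C} \alpha \cdot x$ is indeed superadditive and positively homogeneous, i.e., superlinear.

```python
# Check superlinearity of f(x) = min over alpha in C of alpha . x on random samples.
import numpy as np

C = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])   # rows: candidate kernels
f = lambda x: (C @ x).min()

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    t = rng.uniform(0, 10)
    assert f(x + y) >= f(x) + f(y) - 1e-12       # superadditivity
    assert abs(f(t * x) - t * f(x)) < 1e-9       # positive homogeneity
print("superlinearity verified on random samples")
```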
Proof Let $f$ be superlinear. By Riesz's Theorem, if $f$ is linear, then there exists $\alpha \in \mathbb{R}^n$ such that $f(x) = \alpha \cdot x$ for all $x \in \mathbb{R}^n$. Note that $\alpha \in \partial f(0)$. Consider $\beta \in \partial f(0)$. Define $l : \mathbb{R}^n \to \mathbb{R}$ by $l(x) = \beta \cdot x$ for all $x \in \mathbb{R}^n$. Since $\beta \in \partial f(0)$ and $l \in (\mathbb{R}^n)'$, we have that $l \ge f$. By (18.5), this implies that $f = l$ and, in particular, $\beta = \alpha$, that is, $\partial f(0) = \{\alpha\}$. Conversely, if $\partial f(0)$ is a singleton, say $\partial f(0) = \{\alpha\}$, then (33.2) implies $f(x) = \alpha \cdot x$ for all $x \in \mathbb{R}^n$, proving linearity.

We can actually say something more about the domain of additivity of a superlinear function. To this end, consider the collection $A_f = \{x \in \mathbb{R}^n : f(x) = -f(-x)\}$ of all vectors where the gap $-f(-x) - f(x)$ closes.
$$f(x + y) = \min_{\alpha \in \partial f(0)} \alpha \cdot (x + y) = \min_{\alpha \in \partial f(0)} (\alpha \cdot x + \alpha \cdot y) = -\alpha \cdot (-x) - \alpha \cdot (-y) = -(f(-x) + f(-y)) \ge -f(-x - y) \ge f(x + y)$$
The decomposition
$$x = x^+ - x^-$$
can be interpreted as a trading strategy: if $x$ denotes a portfolio, its positive and negative parts $x^+$ and $x^-$ describe, respectively, the long and short positions that it involves, i.e., how much one has to buy and sell of each primary asset to form portfolio $x$. To describe how much it costs to form a portfolio $x$, we then need the ask market value $v_a : \mathbb{R}^n \to \mathbb{R}$ defined by
$$v_a(x) = \sum_{j=1}^n x_j^+ p_j^a - \sum_{j=1}^n x_j^- p_j^b \qquad \forall x \in \mathbb{R}^n \qquad (33.6)$$
So, $v_a(x)$ is the cost of portfolio $x$. In particular, since each primary asset $y_j$ corresponds to the portfolio $e^j$, we have $v_a(e^j) = p_j^a$. Note that we can attain the primary assets' holdings of portfolio $x$ also by buying and selling according to any pair of positive vectors $x'$ and $x''$ such that $x = x' - x''$. In this case, the cost of $x$ would be
$$\sum_{j=1}^n x_j' p_j^a - \sum_{j=1}^n x_j'' p_j^b \qquad (33.7)$$
A moment's reflection shows that there are actually infinitely many possible decompositions of $x$ as a difference of two positive vectors $x'$ and $x''$. Each of them is a possible trading strategy that delivers the assets' holdings that portfolio $x$ features. Yet, as observed in Example 919, we have
$$x^+ \le x' \quad\text{and}\quad x^- \le x''$$
The positive and negative parts thus represent the minimal holdings of the primary assets needed to construct portfolio $x$. As a result, they are readily seen to be the cheapest among such trading strategies, and so we can focus on them and forget about alternative, more expensive, buying and selling pairs $x'$ and $x''$.
Proposition 1567 The ask market value $v_a : \mathbb{R}^n \to \mathbb{R}$ is such that, for each $x \in \mathbb{R}^n$,
$$v_a(x) = \min \left\{ \sum_{j=1}^n x_j' p_j^a - \sum_{j=1}^n x_j'' p_j^b : x', x'' \ge 0 \text{ and } x = x' - x'' \right\}$$
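The minimality claim of Proposition 1567 is easy to probe numerically. In the sketch below (ours, not from the book; the prices and the portfolio are made up, with $p^a \ge p^b$), every alternative decomposition $x = x' - x''$ costs at least $v_a(x)$.

```python
# Cost of the canonical decomposition vs. random alternative decompositions.
import numpy as np

pa = np.array([10.0, 5.0])   # hypothetical ask prices, pa >= pb
pb = np.array([ 9.0, 4.0])   # hypothetical bid prices

def va(x):
    return np.maximum(x, 0) @ pa - np.maximum(-x, 0) @ pb

x = np.array([2.0, -3.0])
rng = np.random.default_rng(1)
costs = []
for _ in range(10000):
    extra = rng.uniform(0, 5, size=2)   # x' = x+ + extra, x'' = x- + extra
    costs.append((np.maximum(x, 0) + extra) @ pa - (np.maximum(-x, 0) + extra) @ pb)
print(va(x), min(costs))   # va(x) = 8.0 is (weakly) below every alternative cost
```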
Indeed, if $x = x' - x''$ with $x', x'' \ge 0$, then $x' - x'' = x^+ - x^-$, so that
$$x' + x^- = x'' + x^+$$
In particular, we have $v_b(e^j) = p_j^b$ for each primary asset $j$. There is a tight relationship between bid and ask market values, as we next show.
$$v_b(x) = -v_a(-x) \qquad \forall x \in \mathbb{R}^n \qquad (33.8)$$
In particular, $v_b$ is superlinear.

So, ask and bid market values are each the dual of the other. The superlinearity of $v_b$ is a first dividend of this duality.

Proof By definition of $v_a$ and $v_b$ and since $p_j^a \ge p_j^b$ for each $j$, we have that $v_b(x) \le v_a(x)$ for all $x \in \mathbb{R}^n$. If $x \in \mathbb{R}^n$, then
$$-v_a(-x) = -\left( \sum_{j=1}^n (-x)_j^+ p_j^a - \sum_{j=1}^n (-x)_j^- p_j^b \right) = -\left( \sum_{j=1}^n x_j^- p_j^a - \sum_{j=1}^n x_j^+ p_j^b \right)$$
$$= \sum_{j=1}^n x_j^+ p_j^b - \sum_{j=1}^n x_j^- p_j^a = v_b(x)$$
proving the first part of the statement. Consider now $x, x' \in \mathbb{R}^n$. Since $v_a$ is sublinear, we have that $v_a(-x - x') \le v_a(-x) + v_a(-x')$, yielding that
$$v_b(x + x') = -v_a(-x - x') \ge -v_a(-x) - v_a(-x') = v_b(x) + v_b(x')$$
By Proposition 1566, the set of portfolios without bid-ask spreads $\{x \in \mathbb{R}^n : v_b(x) = v_a(x)\}$ is a vector subspace of $\mathbb{R}^n$ over which the bid and ask market values are linear.
Definition 1570 The financial market $(L, p^b, p^a)$ satisfies the law of one price (LOP) if, for all portfolios $x, x' \in \mathbb{R}^n$, we have
$$R(x) = R(x') \implies v_a(x) = v_a(x') \qquad (33.9)$$
or, equivalently,
$$R(x) = R(x') \implies v_b(x) = v_b(x') \qquad (33.10)$$
Conditions (33.9) and (33.10) are equivalent because of the bid-ask duality (33.8), so the definition is well posed. Note that if $p_i^a = p_i^b$ for all $i$, then we get back to the LOP of Section 24.6 since $v_a = v_b = v$. The rationale behind this more general version of the LOP is, mutatis mutandis, the same: portfolios that induce the same contingent claims should have the same market value, whether we form or liquidate them.

In a market with bid-ask spreads, the LOP allows us to define a pair of pricing rules. Specifically, the ask pricing rule $f_a : W \to \mathbb{R}$ and the bid pricing rule $f_b : W \to \mathbb{R}$ are the functions that associate to each replicable contingent claim $w \in W$ its ask and bid prices, respectively. That is, for each $w \in W$ we have
$$f_a(w) = v_a(x) \quad\text{and}\quad f_b(w) = v_b(x)$$
where $x \in R^{-1}(w)$. Clearly, we have $f_b \le f_a$ and, by the bid-ask duality (33.8), the pricing rules are dual as well:
$$f_b(w) = -f_a(-w) \qquad \forall w \in W \qquad (33.11)$$
Next we show that they also inherit the shape of their corresponding market values.
Theorem 1571 Suppose the financial market $(L, p^b, p^a)$ satisfies the LOP. Then, the ask pricing rule $f_a : W \to \mathbb{R}$ is sublinear and the bid pricing rule $f_b : W \to \mathbb{R}$ is superlinear.

In sum, the pricing of contingent claims made possible by the LOP inherits the bid and ask duality of the underlying market values.

Proof First, we verify that $f_a$ is well defined. In other words, we are going to check that to each vector $w$ of $W$ the rule defining $f_a$ assigns one and only one value. Indeed, assume that there exist $x, x' \in \mathbb{R}^n$ such that $R(x) = w = R(x')$. The potential issue could be that $v_a(x) \neq v_a(x')$. But the LOP exactly prevents this from happening. Next, consider $w, w' \in W$. By definition, there exist $x, x' \in \mathbb{R}^n$ such that $R(x) = w$ and $R(x') = w'$. Since $R$ is linear, we also have that $R(x + x') = R(x) + R(x') = w + w'$. Since $v_a$ is sublinear, this yields that
$$f_a(w + w') = v_a(x + x') \le v_a(x) + v_a(x') = f_a(w) + f_a(w')$$
proving that $f_a$ is subadditive. Consider now $w \in W$ and $\lambda \ge 0$. By definition, there exists $x \in \mathbb{R}^n$ such that $R(x) = w$. Since $R$ is linear, we also have that $R(\lambda x) = \lambda R(x) = \lambda w$. Since $v_a$ is sublinear, this yields that
$$f_a(\lambda w) = v_a(\lambda x) = \lambda v_a(x) = \lambda f_a(w)$$
Theorem 1572 Suppose the financial market $(L, p^b, p^a)$ is complete and satisfies the LOP. Then, there exists a unique non-empty, compact, and convex set $C \subseteq \mathbb{R}^k$ such that
$$f_a(w) = \max_{\pi \in C} \pi \cdot w \quad\text{and}\quad f_b(w) = \min_{\pi \in C} \pi \cdot w \qquad \forall w \in \mathbb{R}^k$$
Compared to the linear case of Section 24.6, bid-ask spreads result in a multiplicity of pricing kernels $\pi$, given by the set $C$. In particular, the ask and bid prices of a claim $w$ can be expressed as $f_a(w) = \pi_w^a \cdot w$ and $f_b(w) = \pi_w^b \cdot w$ via pricing kernels $\pi_w^a$ and $\pi_w^b$ in $C$ that, respectively, attain the maximum and the minimum in the linear pricing $\pi \cdot w$.

Let us continue to consider a complete market. In such a market there are no arbitrages I if, for all $x, x' \in \mathbb{R}^n$,
$$R(x') \ge R(x) \implies v_a(x') \ge v_a(x) \qquad (33.12)$$
or, equivalently,2 if
$$R(x') \ge R(x) \implies v_b(x') \ge v_b(x) \qquad (33.13)$$
Without bid-ask spreads, the unique pricing rule is linear, so each of these two conditions reduces to (24.20) because for linear functions positivity and monotonicity are equivalent properties (Proposition 650). Here we need to make explicit the monotonicity assumption that in the linear case was implicitly assumed.

It is easy to see that the no-arbitrage conditions (33.12) and (33.13) imply the LOPs (33.9) and (33.10). Under such stronger conditions we can get a stronger version of the last result, in which the pricing kernels are positive, thus generalizing Proposition 1146.

Proposition 1573 Suppose the financial market $(L, p^b, p^a)$ is complete and has no arbitrages I. Then, there exists a non-empty, compact, and convex set $C \subseteq \mathbb{R}^k_+$ such that
$$f_a(w) = \max_{\pi \in C} \pi \cdot w \quad\text{and}\quad f_b(w) = \min_{\pi \in C} \pi \cdot w$$
for all $w \in \mathbb{R}^k$. If, in addition, the risk-free contingent claim $\mathbf{1}$ has no bid-ask spread, with $f_a(\mathbf{1}) = f_b(\mathbf{1}) = 1$, then $C \subseteq \Delta^{k-1}$.

2 To see the equivalence, note that $R(x') \ge R(x) \implies R(-x) \ge R(-x') \implies v_a(-x) \ge v_a(-x') \implies -v_a(-x') \ge -v_a(-x) \implies v_b(x') \ge v_b(x)$.
Since the market is complete, by Proposition 1566 the set of contingent claims without bid-ask spreads $A_{f_b} = \{w \in \mathbb{R}^k : f_b(w) = f_a(w)\}$ is a vector subspace of $\mathbb{R}^k$ over which the bid and ask pricing rules are linear. The second part of the result says that if the constant (so, risk-free) contingent claim $\mathbf{1}$ belongs to such subspace and if its price is normalized to $1$, then the pricing kernels are actually probability measures.3

Proof Under condition (33.12), the superlinear function $f_b$ is easily seen to be increasing. By Theorem 1564-(i), we then have $C = \partial f_b(0) \subseteq \mathbb{R}^k_+$. If $\mathbf{1} \in A_{f_b}$, then $f_b$ is translation invariant. By Theorem 1564-(iv), we then have $C = \partial f_b(0) \subseteq \Delta^{k-1}$ provided $f_a(\mathbf{1}) = f_b(\mathbf{1}) = 1$.

Finally, the absence of arbitrages II is here modelled via strong monotonicity. So, the resulting nonlinear version of the Fundamental Theorem of Finance, in which $C \subseteq \mathbb{R}^k_{++}$, relies on Theorem 1564-(iii). We leave the details to readers.

3 A similar normalization holds in Proposition 1146, as the reader can check.
Chapter 34

Implicit functions

A scalar function $f$ is usually presented in the explicit form
$$y = f(x)$$
This form separates the independent variable $x$ from the dependent one $y$, so it permits determining the values of the latter from those of the former. The same function can be rewritten in implicit form through an equation that keeps all the variables on the same side of the equality sign:
$$g(x, f(x)) = 0$$
where
$$g(x, y) = f(x) - y$$

Example 1574 (i) The function $f(x) = x^2 + x - 3$ can be written in implicit form as $g(x, f(x)) = 0$ with $g(x, y) = x^2 + x - 3 - y$. (ii) The function $f(x) = 1 + \lg x$ can be written in implicit form as $g(x, f(x)) = 0$ with $g(x, y) = 1 + \lg x - y$. N

Note that
$$g^{-1}(0) \cap (A \times \operatorname{Im} f) = \operatorname{Gr} f$$
The graph of the function $f$ thus coincides with the level curve $g^{-1}(0)$ of the function $g$ of two variables.1

1 The rectangle $A \times \operatorname{Im} f$ has as its factors (its edges, geometrically) the domain and image of $f$. Clearly, $\operatorname{Gr} f \subseteq A \times \operatorname{Im} f$. For example, for the function $f(x) = \sqrt{x}$ this rectangle is the orthant $\mathbb{R}^2_+$ of the plane, while for the function $f(x) = \sqrt{x - x^2}$ it is contained in the unit square $[0, 1] \times [0, 1]$ of the plane.
The implicit rewriting of a scalar function $f$ whose explicit form is known is little more than a curiosity, because the explicit form contains all the relevant information on $f$, in particular on the dependence between the independent variable $x$ and the dependent one $y$. Unfortunately, applications often feature important scalar functions that are not given in a "ready to use" explicit form, but only in implicit form through equations $g(x, y) = 0$. For this reason, it is important to consider the inverse problem: does an equation of the type $g(x, y) = 0$ implicitly define a scalar function $f$? In other words, does a function $f$ exist such that $g(x, f(x)) = 0$? If so, which properties does it have? For instance, is it unique? Is it convex or concave? Is it differentiable?

This chapter addresses these motivating questions by showing that, under suitable regularity conditions, this function $f$ exists and is unique (locally or globally, as will become clear) and that it may enjoy remarkable properties. As usual, we will emphasize a global viewpoint, the one most relevant for applications.

The next lemma characterizes the implicit functions that belong to a posited space $B^A$ (cf. Section 6.3.2). It is a purely set-theoretic result that considers generic sets $A$, $B$, $C$ and $D$.
that is, the level curve $g^{-1}(k)$ of the function $g$ is described on the rectangle $A \times B$ by the function $f$ of a single variable. Thus, $f$ provides a "functional description" of this level curve that specifies the relationship existing between the arguments $x$ and $y$ of $g$ when they belong to the level curve $g^{-1}(k)$. By the lemma, for a function $f$ to be implicit (so, to satisfy condition (34.1)) thus amounts to providing a functional description of the level curve.

Proof (i) implies (ii). We first show that $\operatorname{Gr} f \subseteq g^{-1}(k) \cap (A \times B)$. Let $(x, y) \in \operatorname{Gr} f$. By definition, $(x, y) \in A \times B$ and $y = f(x)$, thus $g(x, y) = g(x, f(x)) = k$. This implies $(x, y) \in g^{-1}(k) \cap (A \times B)$, so $\operatorname{Gr} f \subseteq g^{-1}(k) \cap (A \times B)$. As to the converse inclusion, let $(\bar x, \bar y) \in g^{-1}(k) \cap (A \times B)$. We want to show that $\bar y = f(\bar x)$. Suppose not, i.e., $\bar y \neq f(\bar x)$. Define $\tilde f : A \to B$ by $\tilde f(x) = f(x)$ if $x \neq \bar x$ and $\tilde f(\bar x) = \bar y$. Since $g(\bar x, \bar y) = k$, we have $g(x, \tilde f(x)) = k$ for every $x \in A$. Since $(\bar x, \bar y) \in A \times B$, we have $\tilde f \in B^A$. Being by construction $\tilde f \neq f$, this contradicts the uniqueness of $f$. We conclude that (34.2) holds, as desired.

(ii) implies (i). Let $f \in B^A$ be such that (34.2) holds. By definition, $(x, f(x)) \in \operatorname{Gr} f$ for each $x \in A$. By (34.2), we have $(x, f(x)) \in g^{-1}(k)$, so $g(x, f(x)) = k$ for each $x \in A$. It remains to prove the uniqueness of $f$. Let $h \in B^A$ satisfy (34.1). We have $\operatorname{Gr} h \subseteq g^{-1}(k) \cap (A \times B)$, since we can argue as in the first inclusion of the first part of the proof. By (34.2), this inclusion then yields $\operatorname{Gr} h \subseteq \operatorname{Gr} f$. In turn, this implies $h = f$. Indeed, if we consider $x \in A$, then $(x, h(x)) \in \operatorname{Gr} h \subseteq \operatorname{Gr} f$. Since $(x, h(x)) \in \operatorname{Gr} f$, then $(x, h(x)) = (x', f(x'))$ for some $x' \in A$. This implies $x = x'$ and $h(x) = f(x')$, and so $h(x) = f(x)$. Since $x$ was arbitrarily chosen, we conclude that $f = h$, as desired.
In this case we say that the equation $g(x, y) = k$ implicitly defines $f$ on the rectangle $A \times B$.

Proof For simplicity, let $k = 0$. Let $x_0 \in A$. By condition (34.3), there exist scalars $y', y'' \in B$, say with $y' \le y''$, such that $g(x_0, y') \le 0 \le g(x_0, y'')$. Since $g$ is continuous, by Bolzano's Theorem there exists $y_0 \in [y', y'']$ such that $g(x_0, y_0) = 0$. Since $x_0$ was arbitrarily chosen, this proves the existence of the implicit function $f$.

Next comes the uniqueness of the implicit function. Recall that a function is strictly monotone if it is either strictly increasing or strictly decreasing.

Proof Let $f, h : A \to B$ be such that $g(x, f(x)) = g(x, h(x)) = k$ for all $x \in A$. We want to show that $h = f$. Suppose, by contradiction, that $h \neq f$. So, there is at least some $x \in A$ with $h(x) \neq f(x)$, say $h(x) > f(x)$. The function $g$ is strictly monotone in $y$, say increasing. Thus, $k = g(x, h(x)) > g(x, f(x)) = k$, a contradiction. We conclude that $h = f$.

The last two propositions show that, if $g$ is continuous and strictly monotone in $y$ and satisfies condition (34.3), then the equation $g(x, y) = k$ implicitly defines a unique function $f$ on the rectangle $A \times B$.

When $g$ is partially derivable in $y$, a convenient differential condition that ensures the strict monotonicity of $g$ in $y$ is that either
$$\frac{\partial g}{\partial y}(x, y) > 0 \qquad \forall (x, y) \in A \times B$$
or that the opposite inequality holds for all $(x, y) \in A \times B$. This type of differential monotonicity condition will play a key role in what follows, in particular in the local and global versions of the Implicit Function Theorem.
$$g(x, y) = 0$$
By Propositions 1577 and 1578, there is a unique implicit function $f : \mathbb{R} \to \mathbb{R}$ such that $g(x, f(x)) = 0$ for all $x \in \mathbb{R}$. Note that we are not able to write $y$ as an explicit function of $x$, that is, we are not able to provide the explicit form of $f$. N

Having discussed existence and uniqueness, we can now turn to the properties that the implicit function $f$ inherits from $g$. In short, the continuity of $g$ is passed on to the implicit function, as are its monotonicity and convexity, although reversed.

(i) $f$ is strictly decreasing if $g$ is strictly increasing in $x$, and strictly increasing if $g$ is strictly decreasing in $x$;

(ii) $f$ is (strictly) convex if $g$ is (strictly) quasi-concave, provided the sets $A$, $B$ and $C$ are convex;

(iii) $f$ is (strictly) concave if $g$ is (strictly) quasi-convex, provided the sets $A$, $B$ and $C$ are convex;

(iv) $f$ is continuous.

We leave to the reader the dual version of this result, in which the strict monotonicity of $g$ in $y$ changes from increasing to decreasing.

Proof (i) Suppose first that $g$ is strictly increasing in $x$. Let us show that $f$ is strictly decreasing. Take $x, x' \in A$ with $x > x'$. Suppose, by contradiction, that $f(x) \ge f(x')$. We have
$$0 = g(x, f(x)) > g(x', f(x)) \ge g(x', f(x')) = 0$$
This contradiction shows that $f(x) < f(x')$. We conclude that $f$ is strictly decreasing.

Now, suppose that $g$ is strictly decreasing in $x$. To show that $f$ is strictly increasing, take again $x, x' \in A$ with $x > x'$ and suppose, by contradiction, that $f(x) \le f(x')$. We now have
$$0 = g(x, f(x)) \le g(x, f(x')) < g(x', f(x')) = 0$$
This contradiction shows that $f(x) > f(x')$. Thus, $f$ is strictly increasing.

(ii) Let $g$ be quasi-concave. Let us show that $f$ is convex. Let $x, x' \in A$ and $\lambda \in [0, 1]$. From $g(x, f(x)) = g(x', f(x'))$ it follows that
$$g(\lambda x + (1 - \lambda)x', \lambda f(x) + (1 - \lambda)f(x')) \ge g(x, f(x)) = g(\lambda x + (1 - \lambda)x', f(\lambda x + (1 - \lambda)x'))$$
Since $g$ is strictly increasing in $y$, this yields $\lambda f(x) + (1 - \lambda)f(x') \ge f(\lambda x + (1 - \lambda)x')$, so that $f$ is convex.

(iv) Given $\bar x \in A$, set $\bar y = f(\bar x)$. For each $m \in \mathbb{N}$, by the strict monotonicity and continuity of $g$ there exists a neighborhood $\tilde B_\varepsilon(\bar x)$ such that
$$g\left(x, \bar y - \frac{1}{m}\right) < k < g\left(x, \bar y + \frac{1}{m}\right) \qquad \forall x \in \tilde B_\varepsilon(\bar x)$$
We conclude that $\lim f(x_n) = f(\bar x)$. Since $\bar x$ was arbitrarily chosen, the function $f$ is continuous.
We turn, for the case $n = 1$, to the all-important issue of the differentiability of the implicit function.

(ii) $g$ is continuously differentiable on $A \times B$, with either $\partial g(x, y)/\partial y > 0$ for all $(x, y) \in A \times B$ or $\partial g(x, y)/\partial y < 0$ for all $(x, y) \in A \times B$.

In the next section we will discuss at length the differential formula (34.5), which plays a fundamental role in applications.

Proof Since either $\partial g(x, y)/\partial y > 0$ for all $(x, y) \in A \times B$ or the opposite inequality holds, $g$ is strictly monotone in $y$. By Proposition 1578, $f$ is then the unique function in $B^A$ such that $g(x, f(x)) = k$ for all $x \in A$. Let $x \in A$ and $y = f(x)$. Set $h_2 = f(x + h_1) - f(x)$. Since $g$ is continuously differentiable, for every $h_1, h_2 \neq 0$ there exists $0 < \vartheta < 1$ such that2
$$g(x + h_1, y + h_2) = g(x, y) + \frac{\partial g(x + \vartheta h_1, y + \vartheta h_2)}{\partial x}\, h_1 + \frac{\partial g(x + \vartheta h_1, y + \vartheta h_2)}{\partial y}\, h_2$$
If $h_1$ is small enough so that $x + h_1 \in A$ and $y + h_2 \in B$, we then have
$$0 = \frac{\partial g(x + \vartheta h_1, y + \vartheta h_2)}{\partial x}\, h_1 + \frac{\partial g(x + \vartheta h_1, y + \vartheta h_2)}{\partial y}\, h_2 \qquad (34.6)$$
By Proposition 1581-(iv), the implicit function $f$ is continuous. Hence, if $h_1 \to 0$ then $h_2 \to 0$. So, by (34.6) we have
$$f'(x) = \lim_{h_1 \to 0} \frac{h_2}{h_1} = -\lim_{h_1 \to 0} \frac{\partial g(x + \vartheta h_1, y + \vartheta h_2)/\partial x}{\partial g(x + \vartheta h_1, y + \vartheta h_2)/\partial y} = -\frac{\partial g(x, y)/\partial x}{\partial g(x, y)/\partial y} \qquad (34.7)$$
because of the continuity of $\partial g/\partial x$ and of $\partial g/\partial y$. In turn, this shows that the continuity of the derivative function $f'$ is a direct consequence of the continuity of $\partial g/\partial x$ and of $\partial g/\partial y$. From (34.7) it follows that
$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x, f(x))}{\dfrac{\partial g}{\partial y}(x, f(x))} \qquad \forall x \in A$$

2 It is a cruder version of approximation (29.35).
$$g(x, y) = x^2 - 2y - e^y = 0$$
By formula (34.5), we have
$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} = \frac{2x}{2 + e^y} \qquad \forall (x, y) \in g^{-1}(0)$$
Though we were not able to provide the explicit form of $f$, we have formula (34.5) for its derivative. For instance, at each $(x_0, y_0) \in g^{-1}(0)$ we can then write the first-order approximation
$$f(x) = f(x_0) + f'(x_0)(x - x_0) + o(x - x_0) = y_0 + \frac{2x_0}{2 + e^{y_0}}(x - x_0) + o(x - x_0)$$
that gives us some precious information on $f$. N
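Numerically, one can recover the implicit function point by point and check formula (34.5) on this example. The following sketch (ours, not from the book; it assumes SciPy is available and exploits the fact that $g$ is strictly decreasing in $y$) compares a finite-difference derivative of $f$ with the closed formula $2x/(2 + e^y)$.

```python
# Solve g(x, y) = 0 for y by bracketing, then compare derivatives.
import numpy as np
from scipy.optimize import brentq

g = lambda x, y: x**2 - 2*y - np.exp(y)

def f(x):
    # g is strictly decreasing in y, so the root is unique
    return brentq(lambda y: g(x, y), -50, 50)

x0 = 1.5
y0 = f(x0)
h = 1e-6
fd = (f(x0 + h) - f(x0 - h)) / (2 * h)      # finite-difference slope
print(fd, 2*x0 / (2 + np.exp(y0)))          # the two values agree
```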
In the next example we use the results of this section without explicitly referring to them, a task that we leave to the readers. Suppose we want to study the solutions $(x, y)$ of the equation
$$x \ln\left(e^{-\frac{4}{3}y} + e^{-\frac{2}{3}y}\right) + \frac{xy}{3} = 0$$
where both scalars $x$ and $y$ are strictly positive. To this end, define $g : (0, \infty) \times (0, b) \to \mathbb{R}$, with $b \in (0, \infty]$, by
$$g(y, x) = x \ln\left(e^{-\frac{4}{3}y} + e^{-\frac{2}{3}y}\right) + \frac{xy}{3}$$
Along with the continuous differentiability of $g$, the easily checked differential condition (34.9) thus ensures that locally, near the point $(x_0, y_0)$, there exists a unique and continuously differentiable implicit function $f : B(x_0) \to V(y_0)$. It is a remarkable achievement: the hypotheses of the global results of the previous section (Propositions 1577, 1578 and 1582) are clumsier. Yet, the global viewpoint, the most relevant for applications, will be partly vindicated by the Global Implicit Function Theorem of the next chapter and, more important here, the proof of the Implicit Function Theorem will show how this theorem actually builds on the previous global results.

To emphasize the local perspective of the Implicit Function Theorem, here we say that the equation $g(x, y) = 0$ implicitly defines a unique $f$ at the point $(x_0, y_0) \in g^{-1}(0)$.
Proof Suppose, without loss of generality, that (34.9) takes the positive form
$$\frac{\partial g}{\partial y}(x_0, y_0) > 0 \qquad (34.12)$$
Since $g$ is continuously differentiable, by the Theorem on the permanence of sign there exists a neighborhood $\tilde B(x_0, y_0) \subseteq U$ on which
$$\frac{\partial g}{\partial y}(x, y) > 0 \qquad \forall (x, y) \in \tilde B(x_0, y_0) \qquad (34.13)$$
Let $\varepsilon > 0$ be small enough so that
$$[x_0 - \varepsilon, x_0 + \varepsilon] \times [y_0 - \varepsilon, y_0 + \varepsilon] \subseteq \tilde B(x_0, y_0)$$
Since $\partial g(x, y)/\partial y > 0$ for every $(x, y) \in [x_0 - \varepsilon, x_0 + \varepsilon] \times [y_0 - \varepsilon, y_0 + \varepsilon]$, the function $g(x, \cdot)$ is strictly increasing in $y$ for every $x \in [x_0 - \varepsilon, x_0 + \varepsilon]$. So, $g(x_0, y_0 - \varepsilon) < 0 = g(x_0, y_0) < g(x_0, y_0 + \varepsilon)$. The functions $g(\cdot, y_0 - \varepsilon)$ and $g(\cdot, y_0 + \varepsilon)$ are both continuous in $x$, so again by the Theorem on the permanence of sign there exists a small enough neighborhood $B(x_0) \subseteq [x_0 - \varepsilon, x_0 + \varepsilon]$ such that
$$g(x, y_0 - \varepsilon) < 0 < g(x, y_0 + \varepsilon) \qquad \forall x \in B(x_0)$$
By Bolzano's Theorem, for each $x \in B(x_0)$ there exists $y_0 - \varepsilon < y < y_0 + \varepsilon$ such that $g(x, y) = 0$. By the strict monotonicity of $g(x, \cdot)$ on $[y_0 - \varepsilon, y_0 + \varepsilon]$, such $y$ is unique. By setting $V(y_0) = (y_0 - \varepsilon, y_0 + \varepsilon)$, we have thus defined a unique implicit function $f : B(x_0) \to V(y_0)$ on the rectangle $B(x_0) \times V(y_0)$ such that (34.10) holds.4

Having established the existence of a unique implicit function, its differential properties now follow from Proposition 1582.
Since the function $f : B(x_0) \to V(y_0)$ defined implicitly by the equation $g(x, y) = 0$ at $(x_0, y_0)$ is unique, in view of Proposition 1576 the relation (34.10) is equivalent to
$$g^{-1}(0) \cap (B(x_0) \times V(y_0)) = \operatorname{Gr} f \qquad (34.16)$$
Thus, the level curve $g^{-1}(0)$, that is, the set of solutions of the equation $g(x, y) = 0$, can be represented locally by the graph of the implicit function. In the final analysis, this is the reason why the theorem is so important in applications (as we will see shortly in Section 34.3.2).

Inspection of the proof of the Implicit Function Theorem shows that on the rectangle $B(x_0) \times V(y_0)$ we have either $\partial g(x, y)/\partial y > 0$ or $\partial g(x, y)/\partial y < 0$. Assume the former, so that $g$ is strictly increasing in $y$. By Proposition 1581, we then have that:

Thus, some basic properties of the implicit function provided by the Implicit Function Theorem can be easily established. Note that formula (34.11) permits the computation of the first derivative of the implicit function even without knowing the function in explicit form. Since the first derivative is often what is really needed for such a function (because, for example, we are interested in solving a first-order condition), this is a most useful feature of the Implicit Function Theorem.
We can provide a heuristic derivation of formula (34.11) through the total differential
$$dg = \frac{\partial g}{\partial x}\, dx + \frac{\partial g}{\partial y}\, dy$$
of the function $g$. We have $dg = 0$ for variations $(dx, dy)$ that keep us along the level curve $g^{-1}(0)$. Therefore,
$$\frac{\partial g}{\partial x}\, dx = -\frac{\partial g}{\partial y}\, dy$$
which "yields" (the power of heuristics!):
$$\frac{dy}{dx} = -\frac{\partial g / \partial x}{\partial g / \partial y}$$
It is a rather rough (and incorrect) argument, yet useful for remembering formula (34.11).
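The heuristic can be made mechanical with a computer algebra system. The sketch below (ours, not from the book) uses sympy's idiff, which differentiates along the level curve $g(x, y) = 0$, and confirms that it reproduces $-g_x/g_y$ on the example above.

```python
# Symbolic implicit differentiation along g(x, y) = 0.
import sympy as sp

x, y = sp.symbols('x y')
g = x**2 - 2*y - sp.exp(y)                  # the example above
dy_dx = sp.idiff(g, y, x)                   # dy/dx with y seen as a function of x
print(sp.simplify(dy_dx - (-sp.diff(g, x) / sp.diff(g, y))))   # prints 0
```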
In general, at each point $(x, y) \in g^{-1}(0) \cap (B(x_0) \times V(y_0))$ at which $\partial g(x, y)/\partial y \neq 0$, we have
$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} = -\frac{2x - y^3}{5y^4 - 3xy^2} = \frac{y^3 - 2x}{5y^4 - 3xy^2}$$
In particular, the first-order local approximation in a neighborhood of $x_0$ is $f(x) = y_0 + f'(x_0)(x - x_0) + o(x - x_0)$. For instance, at the point $(4, 2) \in g^{-1}(0)$ we have
$$f'(4) = -\frac{\dfrac{\partial g}{\partial x}(4, 2)}{\dfrac{\partial g}{\partial y}(4, 2)} = -\frac{0}{32} = 0$$
$$f'(x) = -\frac{\partial g(x, y)/\partial x}{\partial g(x, y)/\partial y} = -\frac{14x}{2 - e^y} \qquad (34.18)$$
so that, as $x \to x_0$,
$$f(x) = f(x_0) + f'(x_0)(x - x_0) + o(x - x_0) = y_0 - \frac{14x_0}{2 - e^{y_0}}(x - x_0) + o(x - x_0)$$
at $(x_0, y_0) \in g^{-1}(0)$. For example, at the point $(1/\sqrt{7}, 0) \in g^{-1}(0)$ we have, as $x \to 1/\sqrt{7}$,
$$f(x) = -2\sqrt{7}\left(x - \frac{1}{\sqrt{7}}\right) + o\left(x - \frac{1}{\sqrt{7}}\right)$$

$$f'(x) = -\frac{\partial g(x, y)/\partial x}{\partial g(x, y)/\partial y} = -\frac{3x^2 + 4ye^x + e^y}{4e^x + 2y + xe^y}$$

5 The reader can verify that also $(-12, -2) \in g^{-1}(0)$ and $\partial g/\partial y(-12, -2) \neq 0$, and calculate $f'(-12)$ for the implicit function defined at $(-12, -2)$.
6 This function is different from the previous implicit function defined at the other point $(4, 2)$.
for every $(x, y) \in g^{-1}(0) \cap (B(x_0) \times V(y_0))$. The first-order local approximation is determined by
$$f'(0) = -\frac{\partial g(0, 0)/\partial x}{\partial g(0, 0)/\partial y} = -\frac{1}{4}$$
and, as $x \to 0$,
$$f(x) = y_0 + f'(0)\, x + o(x) = -\frac{1}{4}\, x + o(x)$$
N
By exchanging the variables in the Implicit Function Theorem, we can say that the continuity of the partial derivatives of $g$ in a neighborhood of $(x_0, y_0)$ and the condition $\partial g(x_0, y_0)/\partial x \neq 0$ ensure the existence of a (unique) implicit function $x = \varphi(y)$ such that locally $g(\varphi(y), y) = 0$. It follows that, if at least one of the two partial derivatives $\partial g(x_0, y_0)/\partial x$ and $\partial g(x_0, y_0)/\partial y$ is not zero, there is locally a univocal tie between the two variables. As a result, the Implicit Function Theorem fails to apply only when both partial derivatives $\partial g(x_0, y_0)/\partial y$ and $\partial g(x_0, y_0)/\partial x$ are zero.

For example, if $g(x, y) = x^2 + y^2 - 1$, then for every point $(x_0, y_0)$ that satisfies the equation $g(x, y) = 0$ we have $\partial g(x_0, y_0)/\partial y = 2y_0$, which is zero only for $y_0 = 0$ (and hence $x_0 = \pm 1$). At the two points $(1, 0)$ and $(-1, 0)$ the equation does not define any implicit function of the type $y = f(x)$. But $\partial g(\pm 1, 0)/\partial x = \pm 2 \neq 0$ and, therefore, at such points the equation defines an implicit function of the type $x = \varphi(y)$. Symmetrically, at the two points $(0, 1)$ and $(0, -1)$ the equation defines an implicit function of the type $y = f(x)$ but not one of the type $x = \varphi(y)$.

This last remark suggests a final important observation on the Implicit Function Theorem. Suppose that, as at the beginning of the chapter, $\varphi$ is a standard function defined in explicit form, which can be written in implicit form as
$$g(x, y) = \varphi(x) - y = 0$$
Given $(x_0, y_0) \in g^{-1}(0)$, suppose $\partial g(x_0, y_0)/\partial x \neq 0$. The Implicit Function Theorem, in "exchanged" form, then ensures the existence of neighborhoods $B(y_0)$ and $V(x_0)$ and of a unique function $f : B(y_0) \to V(x_0)$ such that
$$g(f(y), y) = 0 \qquad \forall y \in B(y_0)$$
The function $f$ is, therefore, the inverse of $\varphi$ on the neighborhood $B(y_0)$. The Implicit Function Theorem thus implies the existence, locally around the point $y_0$, of the inverse of $\varphi$. In particular, formula (34.11) here becomes
$$f'(y_0) = -\frac{\dfrac{\partial g}{\partial y}(x_0, y_0)}{\dfrac{\partial g}{\partial x}(x_0, y_0)} = \frac{1}{\varphi'(x_0)}$$
which is the classic formula (26.20) of the derivative of the inverse function. In sum, there is a close connection between implicit and inverse functions, which the reader will see later in the book (Section 35.1).
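The inverse-function reading of the theorem is easy to verify numerically. In the following sketch (ours, not from the book; $\varphi(x) = x^3 + x$ is a made-up strictly increasing function), a finite-difference derivative of the locally defined inverse matches $1/\varphi'(x_0)$.

```python
# Local inverse of a strictly increasing phi, recovered by root finding.
from scipy.optimize import brentq

phi = lambda x: x**3 + x
x0 = 1.0
y0 = phi(x0)

f = lambda y: brentq(lambda x: phi(x) - y, -10, 10)   # local inverse of phi
h = 1e-6
fd = (f(y0 + h) - f(y0 - h)) / (2 * h)
print(fd, 1 / (3 * x0**2 + 1))    # both approximately 0.25
```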
Proposition 1589 Let $g : U \to \mathbb{R}$ be defined (at least) on an open set $U$ of $\mathbb{R}^2$ and let $g(x_0, y_0) = k$. If $g$ is continuously differentiable on a neighborhood of $(x_0, y_0)$, and
$$\frac{\partial g}{\partial y}(x_0, y_0) \neq 0$$
then there exist neighborhoods $B(x_0)$ and $V(y_0)$ and a unique function $f : B(x_0) \to V(y_0)$ such that
$$g(x, f(x)) = k \qquad \forall x \in B(x_0)$$
The function $f$ is continuously differentiable on $B(x_0)$, with
$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} \qquad (34.20)$$

This is the version of the Implicit Function Theorem to which we will refer in the rest of the section when discussing marginal rates.

In view of Proposition 1576, the implicit function $f : B(x_0) \to V(y_0)$ makes it possible to establish a functional representation of the level curve $g^{-1}(k)$ through the basic relation
$$g^{-1}(k) \cap (B(x_0) \times V(y_0)) = \operatorname{Gr} f \qquad (34.21)$$
which is the general form of (34.16) for any $k \in \mathbb{R}$. Implicit functions thus describe the link between the variables $x$ and $y$ that belong to the same level curve, thus making it possible to formulate through them some key properties of these curves. The great effectiveness of this formulation explains the importance of implicit functions, as mentioned right after (34.15).
For example, the isoquant $g^{-1}(k)$ is a level curve of the production function $g : \mathbb{R}^2_+ \to \mathbb{R}$, which features two inputs, $x$ and $y$, and one output. The points $(x, y)$ that belong to the isoquant are all the input combinations that keep the quantity of output produced constant. The implicit function $y = f(x)$ tells us, locally, how the quantity $y$ has to change, when $x$ varies, to keep the output produced constant. Therefore, the properties of the function $f : B(x_0) \to V(y_0)$ characterize, locally, the relations between the inputs that guarantee the level $k$ of output. We usually assume that $f$ is:

(i) decreasing, that is, $f'(x) \le 0$ for every $x \in B(x_0)$: the two inputs are partially substitutable and, to keep the quantity produced unchanged at the level $k$, lower quantities of the input $x$ have to be matched by larger quantities of the input $y$ (and vice versa);

(ii) convex, that is, $f''(x) \ge 0$ for every $x \in B(x_0)$: at greater levels of $x$, larger and larger quantities of $y$ are needed to compensate (negative) infinitesimal variations of $x$ so as to keep production at level $k$.

Remarkably, as noted after the proof of the Implicit Function Theorem, via Proposition 1581 we can tell which properties of $g$ induce these desirable properties.
Example 1590 Consider a Cobb-Douglas production function $g : \mathbb{R}^2_{++} \to \mathbb{R}$ given by $g(x, y) = x^\alpha y^{1-\alpha}$, with $0 < \alpha < 1$. Given any $k > 0$, let $(x_0, y_0) \in \mathbb{R}^2_{++}$ be such that $g(x_0, y_0) = k$. Since $g : \mathbb{R}^2_{++} \to \mathbb{R}$ is continuously differentiable, with $\partial g(x_0, y_0)/\partial y \neq 0$, by the Implicit Function Theorem there exist neighborhoods $B(x_0)$ and $V(y_0)$ and a unique implicit function $f_k : B(x_0) \to V(y_0)$ such that $g(x, f_k(x)) = k$ for all $x \in B(x_0)$. The implicit function $f_k$ is continuously differentiable, as well as strictly decreasing and strictly convex, because $g$ is strictly increasing and strictly concave (Proposition 1581).7 N

The absolute value $|f'|$ of the derivative of the implicit function is called the marginal rate of transformation because, for infinitesimal variations of the inputs, it describes their degree of substitutability, that is, the variation of $y$ that balances an increase in $x$. Thanks to the functional representation (34.21) of the isoquant, geometrically the marginal rate of transformation can be interpreted as the slope of the isoquant at $(x, y)$. This is the classic interpretation of the rate, which follows from (34.21).

The Implicit Function Theorem implies the classic formula
$$MRT_{x,y} = |f'(x)| = \frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} \qquad (34.22)$$
This is the usual form in which the notion of marginal rate of transformation $MRT_{x,y}$ appears.

For the Cobb-Douglas production function, (34.22) gives $MRT_{x,y} = \frac{\alpha}{1-\alpha}\,\frac{y}{x}$. For example, at a point at which we use equal quantities of the two inputs, that is, $x = y$, if we increase the first input by one unit, the second one must decrease by $\alpha/(1 - \alpha)$ units to leave the quantity of output produced unchanged: in particular, when $\alpha = 1/2$, the decrease of the second one must be of one unit. At a point at which we use a quantity of the second input five times larger than that of the first input, that is, $y = 5x$, an increase of one unit of the first input is compensated by a decrease of $5\alpha/(1 - \alpha)$ units of the second one. N

7 Later in the chapter we will revisit this example (Example 1607).
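The worked numbers of this example can be reproduced in a few lines. The sketch below (ours, not from the book) evaluates the Cobb-Douglas marginal rate of transformation $\frac{\alpha}{1-\alpha}\frac{y}{x}$ implied by (34.22).

```python
# Cobb-Douglas MRT at the two input mixes discussed in the text.
a = 0.5
mrt = lambda x, y: (a / (1 - a)) * (y / x)

print(mrt(3.0, 3.0))    # x = y: one unit, as in the text (alpha = 1/2)
print(mrt(1.0, 5.0))    # y = 5x: 5*alpha/(1 - alpha) = 5 units
```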
Similar considerations hold for the level curves of a utility function $u : \mathbb{R}^2_+ \to \mathbb{R}$, that is, for its indifference curves $u^{-1}(k)$. The implicit functions provided by the Implicit Function Theorem tell us, locally, how one has to vary the quantity $y$, when $x$ varies, to keep the overall utility level constant. For them we assume properties of monotonicity and convexity similar to those assumed for the implicit functions defined by isoquants. The monotonicity of the implicit function reflects the partial substitutability of the two goods: it is possible to consume a bit less of one good and a bit more of the other and yet keep the overall level of utility unchanged. The convexity of the implicit function models the classic hypothesis of decreasing rates of substitution: when the quantity of a good, say $x$, increases, we then need greater and greater "compensative" variations of the other good $y$ to stay on the same indifference curve, i.e., to have $u(x, y) = u(x + \Delta x, y + \Delta y)$.

Here as well, it is important to note that via Proposition 1581 we can tell which properties of the utility function $u$ induce these desirable properties, thus for instance making rigorous the common expression "convex indifference curves" (cf. Chapter 17). Indeed, they have a functional representation via convex implicit functions.

In the present case the absolute value $|f'|$ of the derivative of the implicit function is called the marginal rate of substitution: it measures the (negative) variation in $y$ that marginally balances an increase in $x$. Geometrically, it is the slope of the indifference curve at $(x, y)$. Thanks to the Implicit Function Theorem, we have
$$MRS_{x,y} = |f'(x)| = \frac{\dfrac{\partial u}{\partial x}(x, y)}{\dfrac{\partial u}{\partial y}(x, y)}$$
Let $h$ be a scalar function with a strictly positive derivative, so that it is strictly increasing and $h \circ u$ is then a utility function equivalent to $u$. By the chain rule,
$$\frac{\dfrac{\partial (h \circ u)}{\partial x}(x, y)}{\dfrac{\partial (h \circ u)}{\partial y}(x, y)} = \frac{h'(u(x, y))\, \dfrac{\partial u}{\partial x}(x, y)}{h'(u(x, y))\, \dfrac{\partial u}{\partial y}(x, y)} = \frac{\dfrac{\partial u}{\partial x}(x, y)}{\dfrac{\partial u}{\partial y}(x, y)} \qquad (34.23)$$
Since we can cancel the derivative $h'(u(x, y))$, the marginal rate of substitution is the same for $u$ and for all its increasing transformations $h \circ u$. Thus, the marginal rate of substitution is an ordinal notion, invariant under strictly increasing (differentiable) transformations. It does not depend on which of the two equivalent utility functions, $u$ or $h \circ u$, is considered. This explains the centrality of this ordinal notion in consumer theory, where after Pareto's ordinalist revolution it replaced the cardinal notion of marginal utility (cf. Section 38.5).
Example 1592 To illustrate (34.23), consider on $\mathbb{R}^2_{++}$ the equivalent Cobb-Douglas utility function $u(x, y) = x^a y^{1-a}$ and log-linear utility function $\log u(x, y) = a \log x + (1 - a) \log y$. We have
$$MRS_{x,y} = \frac{\dfrac{\partial u}{\partial x}(x, y)}{\dfrac{\partial u}{\partial y}(x, y)} = \frac{a x^{a-1} y^{1-a}}{(1 - a)\, x^a y^{-a}} = \frac{a}{1 - a}\,\frac{y}{x} = \frac{\dfrac{\partial \log u}{\partial x}(x, y)}{\dfrac{\partial \log u}{\partial y}(x, y)}$$
The two utility functions have the same marginal rate of substitution. N
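The ordinality of the marginal rate of substitution can also be confirmed symbolically. The following sketch (ours, not from the book, using sympy) computes the MRS of $u$ and of $\log u$ and finds the same expression.

```python
# Symbolic check of (34.23): u and log(u) share the same MRS.
import sympy as sp

x, y, a = sp.symbols('x y a', positive=True)
u = x**a * y**(1 - a)
mrs = lambda v: sp.simplify(sp.diff(v, x) / sp.diff(v, y))
print(mrs(u))           # a*y/(x*(1 - a)), up to sympy's ordering
print(mrs(sp.log(u)))   # the same expression
```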
Finally, let us consider a consumer who consumes in two periods, today and tomorrow, with intertemporal utility function $U : \mathbb{R}^2_+ \to \mathbb{R}$ given by
$$U(c_1, c_2) = u(c_1) + u(c_2)$$
where we assume the same instantaneous utility function $u$ in the two periods. Given a utility level $k$, let
$$U^{-1}(k) = \{(c_1, c_2) \in \mathbb{R}^2_+ : U(c_1, c_2) = k\}$$
be the intertemporal indifference curve and let $(c_1, c_2)$ be a point on it. When the hypotheses of the Implicit Function Theorem, with the variables exchanged, are satisfied at $(c_1, c_2)$, there exists an implicit function $f : B(c_2) \to V(c_1)$ such that
$$U(f(c_2), c_2) = k \qquad \forall c_2 \in B(c_2)$$
The scalar function $c_1 = f(c_2)$ tells us how much consumption today, $c_1$, has to vary when consumption tomorrow, $c_2$, varies, so as to keep the overall utility $U$ constant. We have:
$$f'(c_2) = -\frac{\dfrac{\partial U}{\partial c_2}(c_1, c_2)}{\dfrac{\partial U}{\partial c_1}(c_1, c_2)} = -\frac{u'(c_2)}{u'(c_1)}$$
When the number
$$IMRS_{c_1, c_2} = |f'(c_2)| = \frac{u'(c_2)}{u'(c_1)} \qquad (34.24)$$
exists, it is called the intertemporal marginal rate of substitution: it measures the (negative) variation in $c_1$ that balances an increase in $c_2$.
Example 1593 For the power utility function u (c) = c = , with for > 0, we have
c1 c2
U (c1 ; c2 ) = +
Theorem 1594 In the Implicit Function Theorem, if the function g is n times continuously differentiable, so is the implicit function f.⁸ In particular, for n = 2 we have

$$
f''(x) = -\frac{\dfrac{\partial^2 g}{\partial x^2}\left(\dfrac{\partial g}{\partial y}\right)^{2} - 2\,\dfrac{\partial^2 g}{\partial x \partial y}\,\dfrac{\partial g}{\partial x}\,\dfrac{\partial g}{\partial y} + \dfrac{\partial^2 g}{\partial y^2}\left(\dfrac{\partial g}{\partial x}\right)^{2}}{\left(\dfrac{\partial g}{\partial y}\right)^{3}}
\tag{34.25}
$$

where all the derivatives of g are evaluated at (x, y).
Proof We omit the proof of the first part of the statement. Suppose f is twice differentiable and let us apply the chain rule to (34.11), that is, to

$$
f'(x) = -\frac{\partial g(x, f(x))/\partial x}{\partial g(x, f(x))/\partial y} = -\frac{g'_x(x, f(x))}{g'_y(x, f(x))}
$$

For the sake of brevity we do not make the dependence of the derivatives of g on (x, f(x)) explicit, so we can write

$$
f''(x) = -\frac{\left(g''_{xx} + g''_{xy} f'(x)\right) g'_y - g'_x \left(g''_{yx} + g''_{yy} f'(x)\right)}{g'^{\,2}_y}
= -\frac{\left(g''_{xx} - g''_{xy}\dfrac{g'_x}{g'_y}\right) g'_y - g'_x \left(g''_{yx} - g''_{yy}\dfrac{g'_x}{g'_y}\right)}{g'^{\,2}_y}
= -\frac{g''_{xx} g'^{\,2}_y - 2 g''_{xy} g'_x g'_y + g''_{yy} g'^{\,2}_x}{g'^{\,3}_y}
$$

as desired.
The two previous theorems allow us to give local approximations for an implicitly defined function. As we know, one is rarely able to write the explicit formulation of a function which is implicitly defined by an equation: being able to give approximations is hence of great importance.

If g is of class C¹ on an open set U, the first-order approximation of the implicitly defined function at a point (x₀, y₀) ∈ U such that g(x₀, y₀) = 0 is

$$
f(x) = y_0 - \frac{\dfrac{\partial g}{\partial x}(x_0, f(x_0))}{\dfrac{\partial g}{\partial y}(x_0, f(x_0))}\,(x - x_0) + o(x - x_0)
$$

as x → x₀.

⁸ Analyticity is also preserved: if g is analytic, so is f.
If g is of class C² on an open set U, the second-order (or quadratic) approximation of the implicit function at a point (x₀, y₀) ∈ U such that g(x₀, y₀) = 0 is, as x → x₀,

$$
f(x) = y_0 - \frac{g'_x}{g'_y}(x - x_0) - \frac{g''_{xx} g'^{\,2}_y - 2 g''_{xy} g'_x g'_y + g''_{yy} g'^{\,2}_x}{2\,g'^{\,3}_y}(x - x_0)^2 + o\left((x - x_0)^2\right)
$$

where we omitted the dependence of the derivatives on the point (x₀, f(x₀)).
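These approximations are easy to test numerically. The following minimal sketch (plain Python; the choice g(x, y) = x² + y² − 1 around (x₀, y₀) = (0, 1), where the explicit solution f(x) = √(1 − x²) is available, is an arbitrary illustrative one) contrasts the first- and second-order approximations with the exact implicit function:

```python
# Numerical check of the first- and second-order approximations of the
# implicit function for g(x, y) = x^2 + y^2 - 1 around (x0, y0) = (0, 1).
import math

x0, y0 = 0.0, 1.0
gx, gy = 2 * x0, 2 * y0            # g'_x = 2x, g'_y = 2y at (x0, y0)
gxx, gxy, gyy = 2.0, 0.0, 2.0      # the second derivatives are constant

f1 = -gx / gy                                                  # f'(x0)
f2 = -(gxx * gy**2 - 2 * gxy * gx * gy + gyy * gx**2) / gy**3  # f''(x0), by (34.25)

for h in (0.2, 0.1, 0.05):
    exact = math.sqrt(1 - h**2)            # f(x0 + h) = sqrt(1 - h^2)
    linear = y0 + f1 * h
    quadratic = y0 + f1 * h + 0.5 * f2 * h**2
    print(h, exact - linear, exact - quadratic)  # quadratic error is o(h^2)
```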
Let us now consider the more general equation

$$
g(x_1, \dots, x_n, y) = 0
$$

in which x = (x₁, ..., xₙ) is a vector, while y remains a scalar. Among the n + 1 arguments of the function g : A ⊆ Rⁿ⁺¹ → R, we thus separate one of them, denoted by y, from the other ones. The choice of which argument to label y is, again from a formal standpoint, arbitrary. Yet, in applications an argument may stand out in terms of interpretation, thus becoming the one of substantive interest (e.g., y is an output and x is a vector of inputs).

In any case, here we regard x as a vector of independent variables and y as a dependent variable, so the function implicitly defined by the equation g(x, y) = 0 is a function f of n variables. Fortunately, the Implicit Function Theorem easily extends to this case, mutatis mutandis: since f is a function of several variables, the partial derivatives ∂f(x)/∂xₖ now take the place of the derivative f′(x) that we had in the scalar case.
Theorem 1596 Let g : U → R be defined (at least) on an open set U of Rⁿ⁺¹ and let g(x₀, y₀) = 0. If g is continuously differentiable on a neighborhood of (x₀, y₀), with

$$
\frac{\partial g}{\partial y}(x_0, y_0) \neq 0
$$

then there exist neighborhoods B(x₀) ⊆ Rⁿ and V(y₀) ⊆ R and a unique function f : B(x₀) → V(y₀) such that

$$
g(x, f(x)) = 0 \qquad \forall x \in B(x_0)
\tag{34.26}
$$

The function f is continuously differentiable on B(x₀), with

$$
\frac{\partial f}{\partial x_k}(x) = -\frac{\dfrac{\partial g}{\partial x_k}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} \qquad k = 1, \dots, n
\tag{34.27}
$$

for every (x, y) ∈ g⁻¹(0) ∩ (B(x₀) × V(y₀)).

We omit the proof of this result. By using gradients, formula (34.27) can be written as

$$
\nabla f(x) = -\frac{\nabla_x g(x, y)}{\dfrac{\partial g}{\partial y}(x, y)}
$$

where ∇ₓg denotes the partial gradient of g with respect to x₁, x₂, ..., xₙ only. Moreover, f being unique, also in this more general case (34.26) is equivalent to (34.15) and (34.16).
Example 1597 Let g : R³ → R be defined by g(x₁, x₂, y) = x₁² − x₂² + y³ and let (x₁, x₂, y₀) = (6, 3, −3). We have g ∈ C¹(R³) with (∂g/∂y)(x, y) = 3y², so that

$$
\frac{\partial g}{\partial y}(6, 3, -3) = 27 \neq 0
$$

By the Implicit Function Theorem, there exists a unique y = f(x₁, x₂) defined on a neighborhood of (6, 3), which is differentiable there and takes values in a neighborhood V(−3). Since

$$
\frac{\partial g}{\partial x_1}(x, y) = 2x_1 \quad\text{and}\quad \frac{\partial g}{\partial x_2}(x, y) = -2x_2
$$

we have

$$
\frac{\partial f}{\partial x_1}(x) = -\frac{2x_1}{3y^2} \quad\text{and}\quad \frac{\partial f}{\partial x_2}(x) = \frac{2x_2}{3y^2}
$$

In particular,

$$
\nabla f(6, 3) = \left(-\frac{12}{27}, \frac{6}{27}\right)
$$
The reader can check that a global implicit function f : R² → R exists and, after having recovered the explicit expression (which is available because of the simplicity of g), can verify that formula (34.27) correctly computes ∇f(x). N
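This check can also be carried out by symbolic differentiation of the explicit solution y = −(x₁² − x₂²)^{1/3}, which is valid near (6, 3) where x₁² − x₂² > 0. A minimal sketch (Python with sympy):

```python
# Verifying the gradient in Example 1597 against the explicit solution
# y = f(x1, x2) = -(x1^2 - x2^2)^(1/3), valid near (6, 3).
import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)
f = -(x1**2 - x2**2)**sp.Rational(1, 3)

grad = [sp.diff(f, v) for v in (x1, x2)]
print([sp.simplify(d.subs({x1: 6, x2: 3})) for d in grad])
# [-4/9, 2/9], i.e. (-12/27, 6/27), as predicted by formula (34.27)
```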
N.B. Global versions, in the spirit of Proposition 1582, of Theorems 1594 and 1596 can be easily established, as readers can check. O
Finally, let us consider the general equation

$$
g(x, y) = 0
$$

in which both x = (x₁, ..., xₙ) and y = (y₁, ..., yₘ) are vectors. Here g = (g₁, ..., gₘ) : A ⊆ Rⁿ⁺ᵐ → Rᵐ is an operator and the equation implicitly defines an operator f = (f₁, ..., fₘ) between Rⁿ and Rᵐ such that

$$
\begin{cases}
g_1(x_1, \dots, x_n, f_1(x_1, \dots, x_n), \dots, f_m(x_1, \dots, x_n)) = 0\\
g_2(x_1, \dots, x_n, f_1(x_1, \dots, x_n), \dots, f_m(x_1, \dots, x_n)) = 0\\
\qquad\vdots\\
g_m(x_1, \dots, x_n, f_1(x_1, \dots, x_n), \dots, f_m(x_1, \dots, x_n)) = 0
\end{cases}
\tag{34.28}
$$
Let us focus directly on this general case. Here the following square submatrix of the Jacobian matrix of g, formed by the partial derivatives with respect to the variables y₁, ..., yₘ,

$$
D_y g(x, y) = \left[\frac{\partial g_i}{\partial y_j}(x, y)\right]_{i,j = 1, \dots, m}
$$

plays a key role.
We can now state, without proof, the operator version of the Implicit Function Theorem, which is the most general form of this result that we consider.

Theorem 1598 Let g : U → Rᵐ be defined (at least) on an open set U of Rⁿ⁺ᵐ and let g(x₀, y₀) = 0. If g is continuously differentiable on a neighborhood of (x₀, y₀), with

$$
\det D_y g(x_0, y_0) \neq 0
\tag{34.29}
$$

then there exist neighborhoods B(x₀) ⊆ Rⁿ and V(y₀) ⊆ Rᵐ and a unique operator f = (f₁, ..., fₘ) : B(x₀) → V(y₀) such that (34.28) holds for every x ∈ B(x₀). The operator f is continuously differentiable on B(x₀), with

$$
Df(x) = -\left(D_y g(x, y)\right)^{-1} D_x g(x, y)
\tag{34.30}
$$

for every (x, y) ∈ g⁻¹(0) ∩ (B(x₀) × V(y₀)).
The Jacobian of the implicit operator is thus pinned down by formula (34.30). To better understand this formula, it is convenient to write it as an equality

$$
\underbrace{D_y g(x, y)}_{m \times m}\;\underbrace{Df(x)}_{m \times n} = -\underbrace{D_x g(x, y)}_{m \times n}
$$

of two m × n matrices. In terms of the (i, j) ∈ {1, ..., m} × {1, ..., n} entry of each such matrix, the equality is

$$
\sum_{k=1}^{m} \frac{\partial g_i}{\partial y_k}(x, y)\,\frac{\partial f_k}{\partial x_j}(x) = -\frac{\partial g_i}{\partial x_j}(x, y)
$$

For each independent variable xⱼ, we can determine the sought-after m-dimensional vector

$$
\left(\frac{\partial f_1}{\partial x_j}(x), \dots, \frac{\partial f_m}{\partial x_j}(x)\right)
$$

by solving the following linear system with m equations:

$$
\begin{cases}
\displaystyle\sum_{k=1}^{m} \frac{\partial g_1}{\partial y_k}(x, y)\,\frac{\partial f_k}{\partial x_j}(x) = -\frac{\partial g_1}{\partial x_j}(x, y)\\[1ex]
\displaystyle\sum_{k=1}^{m} \frac{\partial g_2}{\partial y_k}(x, y)\,\frac{\partial f_k}{\partial x_j}(x) = -\frac{\partial g_2}{\partial x_j}(x, y)\\
\qquad\vdots\\
\displaystyle\sum_{k=1}^{m} \frac{\partial g_m}{\partial y_k}(x, y)\,\frac{\partial f_k}{\partial x_j}(x) = -\frac{\partial g_m}{\partial x_j}(x, y)
\end{cases}
$$
By doing this for each j, we can finally determine the Jacobian Df(x) of the implicit operator.
Example 1599 Define g = (g₁, g₂) : R⁴ → R² by

$$
g_1(x_1, x_2, y_1, y_2) = 3x_1 - 4e^{x_2} + y_1^2 - 6y_2
$$
$$
g_2(x_1, x_2, y_1, y_2) = 2x_1 y_2^2 - 4x_2 e^{y_1} + y_1^2 - 1
$$

and let (x₀, y₀) = (1, 0, 1, 0). The submatrix of the Jacobian matrix of the operator g containing the partial derivatives of g with respect to y₁ and y₂ is given by

$$
D_y g(x, y) = \begin{bmatrix} 2y_1 & -6 \\ -4x_2 e^{y_1} + 2y_1 & 4x_1 y_2 \end{bmatrix}
$$

while that reporting the partial derivatives with respect to x₁ and x₂ is

$$
D_x g(x, y) = \begin{bmatrix} 3 & -4e^{x_2} \\ 2y_2^2 & -4e^{y_1} \end{bmatrix}
$$

The determinant of D_y g(x, y) is det D_y g(x, y) = 8x₁y₁y₂ − 24x₂e^{y₁} + 12y₁, so det D_y g(x₀, y₀) = 12 ≠ 0. Condition (34.29) is thus satisfied. By the last theorem, there exists an implicit operator f = (f₁, f₂) : B(x₀) → V(y₀) which is continuously differentiable on B(x₀). The partial derivatives

$$
\frac{\partial f_1}{\partial x_1}(x) \quad\text{and}\quad \frac{\partial f_2}{\partial x_1}(x)
$$

satisfy the following system:

$$
\begin{bmatrix} 2y_1 & -6 \\ -4x_2 e^{y_1} + 2y_1 & 4x_1 y_2 \end{bmatrix}
\begin{bmatrix} \dfrac{\partial f_1}{\partial x_1}(x) \\[1.5ex] \dfrac{\partial f_2}{\partial x_1}(x) \end{bmatrix}
= -\begin{bmatrix} 3 \\ 2y_2^2 \end{bmatrix}
$$

N
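At the point (x₀, y₀) = (1, 0, 1, 0) the system is easily solved numerically. A minimal sketch (Python with numpy):

```python
# Solving the system of Example 1599 at (x0, y0) = (1, 0, 1, 0) for the
# first column of the Jacobian of the implicit operator, (df1/dx1, df2/dx1).
import numpy as np

x1, x2, y1, y2 = 1.0, 0.0, 1.0, 0.0

Dy = np.array([[2 * y1, -6.0],
               [-4 * x2 * np.exp(y1) + 2 * y1, 4 * x1 * y2]])
dx1 = np.array([3.0, 2 * y2**2])       # first column of Dx g

print(np.linalg.solve(Dy, -dx1))       # [0.  0.5]: df1/dx1 = 0, df2/dx1 = 1/2
```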
Our previous discussion implies, inter alia, that in the special case m = 1 formula (34.30) reduces to

$$
\frac{\partial g}{\partial y}(x, y)\,\frac{\partial f}{\partial x_j}(x) = -\frac{\partial g}{\partial x_j}(x, y)
$$

which is formula (34.27) of the vector-function version of the Implicit Function Theorem. Since condition (34.29) reduces to (∂g/∂y)(x₀, y₀) ≠ 0, we conclude that the vector-function version is, indeed, the special case m = 1. Everything fits together.
34.4 A global perspective

We now return to the global perspective of Section 34.2 and take a deeper look at some of the motivating questions that we posed in the first section. For simplicity, we will focus on the basic equation g(x, y) = 0, where g : C ⊆ R² → R is a function of two variables x and y. But, before starting the analysis, we introduce projections, which will play a key role.

Let A be a subset of the plane R², with typical element (x, y). Its projection

$$
\pi_1(A) = \{x \in \mathbb{R} : (x, y) \in A \text{ for some } y \in \mathbb{R}\}
$$

on the horizontal axis is the set of points x on the horizontal axis for which there exists a point y on the vertical axis such that the pair (x, y) belongs to A.⁹ Likewise, define the projection

$$
\pi_2(A) = \{y \in \mathbb{R} : (x, y) \in A \text{ for some } x \in \mathbb{R}\}
$$

on the vertical axis, that is, the set of points y on the vertical axis for which there exists (at least) one point x on the horizontal axis such that (x, y) belongs to A.

The projections π₁(A) and π₂(A) are nothing but the "shadows" of the set A ⊆ R² on the two axes, as the following figure illustrates:

⁹ This notion of projection is not to be confused with the altogether different one seen in Section 27.1.
[Figure: a set A in the plane and its projections π₁(A) and π₂(A) on the two axes]
For instance, for a neighborhood B_ε(x, y) of a point (x, y) of the plane we have

$$
\pi_1(B_\varepsilon(x, y)) = B_\varepsilon(x) = (x - \varepsilon, x + \varepsilon)
\quad\text{and}\quad
\pi_2(B_\varepsilon(x, y)) = B_\varepsilon(y) = (y - \varepsilon, y + \varepsilon)
$$

In particular, π₁(Gr f) is the domain of f and π₂(Gr f) is the image Im f. This holds in general: if f : A ⊆ R → R, one has π₁(Gr f) = A and π₂(Gr f) = Im f. N
If the implicit function f exists, its domain will be included in π₁(g⁻¹(0)) and its codomain will be included in π₂(g⁻¹(0)). This observation motivates the following definition: a rectangle A × B, with A ⊆ π₁(g⁻¹(0)) and B ⊆ π₂(g⁻¹(0)), is called a frame.

If we draw the graph of the level curve g⁻¹(0), a frame A × B isolates a portion of the graph. On such a portion builds the next definition: the equation g(x, y) = 0 is explicitable on the frame A × B if there exists a unique function f : A → B such that

$$
g(x, f(x)) = 0 \qquad \forall x \in A
$$
The following example illustrates these ideas. In particular, it shows that in some frames the graph is explicitable, while in other, less fortunate, ones it is not. By changing the framing we can tell apart different parts of the graph according to their explicitability.

Example 1603 Let g : R² → R be given by g(x, y) = x² + y² − 1, so that the level curve

$$
g^{-1}(0) = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}
$$

is the unit circle. Since π₁(g⁻¹(0)) = π₂(g⁻¹(0)) = [−1, 1], the possible implicit functions on a rectangle A × B take the form f : A → B with A ⊆ [−1, 1] and B ⊆ [−1, 1]. Let us fix x ∈ [−1, 1], so as to analyze the set

$$
S(x) = \{y \in [-1, 1] : x^2 + y^2 = 1\}
$$
The set has two elements, except for x = ±1. In other words, for every −1 < x < 1 there are two values of y for which g(x, y) = 0. Let us consider the frame given by the projections' rectangle

$$
A \times B = [-1, 1] \times [-1, 1]
$$

Any function f : A → B such that

$$
f(x) \in S(x) \qquad \forall x \in [-1, 1]
$$

entails that

$$
g(x, f(x)) = 0 \qquad \forall x \in [-1, 1]
$$

and is thus implicitly defined by g on the frame A × B. Such functions are infinitely many; for example, this is the case for the function

$$
f(x) = \begin{cases} \sqrt{1 - x^2} & \text{if } x \in \mathbb{Q} \cap [-1, 1]\\ -\sqrt{1 - x^2} & \text{otherwise} \end{cases}
$$
Therefore, there are infinitely many functions implicitly defined by g on the rectangle A × B = [−1, 1] × [−1, 1].¹⁰ The equation g(x, y) = 0 is therefore not explicitable on this rectangle, which makes this case hardly interesting. Let us consider instead the less ambitious frame

$$
\tilde{A} \times \tilde{B} = [-1, 1] \times [0, 1]
$$

The function f : [−1, 1] → [0, 1] defined by f(x) = √(1 − x²) is the only function such that

$$
g(x, f(x)) = g\left(x, \sqrt{1 - x^2}\right) = 0 \qquad \forall x \in [-1, 1]
$$

that is, f is the only function implicitly defined by g on the rectangle Ã × B̃. Equation g(x, y) = 0 is then explicitable on the frame Ã × B̃, with

$$
g^{-1}(0) \cap (\tilde{A} \times \tilde{B}) = \operatorname{Gr} f
$$

¹⁰ Note that most of them are somewhat irregular; the only continuous ones among them are the two in (34.33).
[Figure: the graphs of the two continuous implicit functions, the upper semicircle y = √(1 − x²) on the frame Ã × B̃ and the lower semicircle y = −√(1 − x²) on the frame Ā × B̄ = [−1, 1] × [−1, 0]]
To sum up, there are infinitely many implicit functions on the frame A × B, while uniqueness obtains when we restrict ourselves to the smaller frames Ã × B̃ and Ā × B̄. The study of implicit functions is of interest on these two rectangles because the unique implicit function f defined thereon describes a univocal relationship between the variables x and y that the equation g(x, y) = 0 implicitly determines. N
This example shows, inter alia, how important it is to study, for each x ∈ π₁(g⁻¹(0)), the solution set

$$
S(x) = \{y \in \pi_2(g^{-1}(0)) : g(x, y) = 0\}
$$

The scalar functions f : π₁(g⁻¹(0)) → π₂(g⁻¹(0)), with f(x) ∈ S(x) for every x in their domain, are the possible implicit functions. In particular, when the rectangle A × B is such that S(x) ∩ B is a singleton for each x ∈ A, we have a unique implicit function f : A → B. In this case, for each x ∈ A there is a unique solution y ∈ B to the equation g(x, y) = 0.
Let us see another simple example, warning the reader that, though useful to fix ideas, these are fortunate cases: usually constructing S(x) is far from easy (though a local result, the Implicit Function Theorem is key in this regard).
Example 1604 Let g : R²₊ → R be given by g(x, y) = √(xy) − 1. We have

$$
g^{-1}(0) = \{(x, y) \in \mathbb{R}^2_+ : xy = 1\}
$$

so that π₁(g⁻¹(0)) = π₂(g⁻¹(0)) = (0, ∞) and, for each x ∈ (0, ∞),

$$
S(x) = \{y \in (0, \infty) : xy = 1\}
$$

Since

$$
S(x) = \left\{\frac{1}{x}\right\} \qquad \forall x \in (0, \infty)
$$

we consider A × B = R²₊₊ and f : (0, ∞) → (0, ∞) given by f(x) = 1/x. We have

$$
g(x, f(x)) = g\left(x, \frac{1}{x}\right) = 0 \qquad \forall x \in (0, \infty)
$$

N
A final remark. When writing g(x, y) = 0, the variables x and y play symmetric roles, so that we can think of a relationship of type y = f(x) or of type x = φ(y) indifferently. In what follows we will always consider a function y = f(x), as the case x = φ(y) can be easily recovered via an analysis parallel to the one conducted here.
34.4.3 Comparative statics

In economic applications, an equation

$$
g(x, y) = 0
\tag{34.34}
$$

typically arises in two ways:¹¹

(i) equilibrium analysis, where equation (34.34) derives from an equilibrium condition in which y is an equilibrium (endogenous) variable and x is an (exogenous) parameter;

(ii) optimization problems, where equation (34.34) comes from a first-order condition in which y is a choice variable and x is a parameter.
The analysis of the relationship between x and y, that is, between the values of the parameter and the resulting choice or equilibrium variable, is a comparative statics exercise. In view of what we learned in this chapter, it consists in studying the function f implicitly defined by the economic relation (34.34). The uniqueness of f, and so the explicitability of equation (34.34), is essential to best conduct comparative statics exercises.

The following two subsections present these two comparative statics problems.¹²
Equilibrium comparative statics Consider the market of a given good, as seen in Chapter 13. Let D : [0, b] → R and S : [0, b] → R be the demand and supply functions, respectively. A pair (p, q) ∈ [0, b] × R₊ of prices and quantities is said to be a market equilibrium if

$$
D(p) = q = S(p)
\tag{34.35}
$$

In particular, having found the equilibrium price p̂ by solving the equation D(p) = S(p), the equilibrium quantity is q̂ = D(p̂) = S(p̂).
Suppose that the demand for the good (also) depends on an exogenous variable τ ≥ 0. For example, τ may be the level of indirect taxation, which influences the demanded quantity. The demand thus takes the form D(p, τ) and is a function D : [0, b] × R₊ → R, that is, it depends on both the market price p and the value of the exogenous variable. The equilibrium condition (34.35) now becomes

$$
D(p, \tau) = S(p)
\tag{34.36}
$$

and the equilibrium price p̂ varies as τ changes. What is the relationship between the taxation level and equilibrium prices? Which properties does such a relationship have?

Answering these simple, yet important, economic questions is equivalent to asking oneself: (i) whether a (unique) function p = f(τ) which connects taxation and equilibrium prices (i.e., the exogenous and endogenous variables of this simple market model) exists, and (ii) which properties such a function has.

To deal with this problem, we introduce the function g : [0, b] × R₊ → R given by g(p, τ) = S(p) − D(p, τ), so that the equilibrium condition (34.36) can be written as

$$
g(p, \tau) = 0
$$
¹¹ Though in that section we adopted a local angle, the marginal analysis can be carried out globally, as readers can check (cf. Example 1607 below).
¹² In Chapter 41 we will further study comparative statics exercises in optimization problems.
In particular,

$$
g^{-1}(0) = \{(p, \tau) \in [0, b] \times \mathbb{R}_+ : g(p, \tau) = 0\}
$$

is the set of all pairs of equilibrium prices/taxation levels (i.e., of endogenous/exogenous variables).

The two questions asked above are now equivalent to asking oneself:

(i) whether there exists a (unique) function p = f(τ) implicitly defined by the equation g(p, τ) = 0;

(ii) if so, which are the properties of such a function f: for example, whether it is decreasing, so that higher indirect taxes correspond to lower equilibrium prices.

Problems as such, where the relationship among endogenous and exogenous variables is studied, in particular how changes in the latter impact the former, are of central importance in economic theory and in its empirical tests.
To fix ideas, let us examine the simple linear case, where everything is straightforward.

Example 1605 Let the demand and supply functions be linear:

$$
D(p, \tau) = \alpha - \beta(p + \tau) \quad\text{and}\quad S(p) = a + bp
$$

with β, b > 0. Then

$$
g(p, \tau) = a + bp - \alpha + \beta(p + \tau)
$$

and the function

$$
f(\tau) = \frac{\alpha - a}{b + \beta} - \frac{\beta}{\beta + b}\,\tau
\tag{34.37}
$$

clearly satisfies (34.36). The equation g(p, τ) = 0 thus implicitly defines (and in this case also explicitly) the function f given by (34.37). Its properties are obvious: for example, it is strictly decreasing, so that changes in the taxation level bring about opposite changes in equilibrium prices.

Regarding the equilibrium quantity q̂, for every τ it is

$$
\hat{q} = D(f(\tau), \tau) = S(f(\tau))
$$

N
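The computations of Example 1605 can be reproduced symbolically. A minimal sketch (Python with sympy; the symbols alpha and beta refer to the demand coefficients hypothesized above):

```python
# Solving the linear equilibrium condition of Example 1605 for p and
# checking that the resulting f is strictly decreasing in the tax level t.
import sympy as sp

p, t, a, b, alpha, beta = sp.symbols('p t a b alpha beta', positive=True)

g = a + b * p - alpha + beta * (p + t)     # S(p) - D(p, t)
f = sp.solve(sp.Eq(g, 0), p)[0]

print(sp.simplify(f))        # (alpha - a - beta*t)/(b + beta), as in (34.37)
print(sp.diff(f, t))         # -beta/(b + beta) < 0
```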
Optimization comparative statics Consider a producer, say of potatoes, who takes the market price p ≥ 0 as given and chooses the production level y that maximizes profit:

$$
\max_{y > 0}\; \pi(p, y) = py - c(y)
\tag{34.39}
$$

where c is the (differentiable) cost function. Since the interval (0, ∞) is open, by Fermat's Theorem a necessary condition for y > 0 to be optimal is that it satisfies the first-order condition

$$
\frac{\partial \pi(p, y)}{\partial y} = p - c'(y) = 0
\tag{34.40}
$$
The key aspect of the producer's problem is to assess how the optimal production of potatoes varies as the market price of potatoes changes, i.e., how the production of potatoes is affected by their price. Such a relevant relationship between prices and quantities is expressed by the scalar function f such that

$$
p - c'(f(p)) = 0 \qquad \forall p \geq 0
$$

that is, by the function implicitly defined by the first-order condition (34.40). The function f is referred to as the producer's supply function (of potatoes). For each price level p, it gives the optimal quantity y = f(p). Its existence and properties (for example, whether it is increasing, so that higher prices lead to larger produced quantities of potatoes, hence larger supplied quantities in the market) are of central importance in studying a good's market. In particular, the sum of the supply functions of all producers who are present in the market constitutes the market supply function S(p) which we saw in Chapter 13.

To formalize the derivation of the supply function from the optimization problem (34.39), we define a function g : [0, ∞) × (0, ∞) → R by

$$
g(p, y) = p - c'(y)
$$

so that the first-order condition (34.40) takes the form

$$
g(p, y) = 0
$$

If there exists an implicit function y = f(p) such that g(p, f(p)) = 0, it is nothing but the supply function itself. Let us see a simple example where the function f and its properties can be recovered with simple computations.
Example 1606 Consider quadratic costs c(y) = y² for y ≥ 0. Here g(p, y) = p − 2y, so the only function f : [0, ∞) → [0, ∞) implicitly defined by g on R²₊ is f(p) = p/2. In particular, f is strictly increasing, so that higher prices entail a higher production, and hence a larger supply. N
34.4.4 Properties

The first important problem one faces when analyzing implicit functions is that of determining which conditions on the function g guarantee that equation g(x, y) = 0 is explicitable on a frame, that is, that it defines a unique implicit function there. Later in the book we will establish a Global Implicit Function Theorem (Section 35.4), a deep result. Here we can, however, establish a few simple, yet quite interesting, facts that follow from Propositions 1577 and 1578.

For simplicity, set A = π₁(g⁻¹(0)) and B = π₂(g⁻¹(0)) and let us focus on the frame given by the projections' rectangle A × B.¹³ For the problem to be well posed, it is necessary that

$$
S(x) = \{y \in B : g(x, y) = 0\} \neq \emptyset \qquad \forall x \in A
\tag{34.41}
$$

So, for every possible x at least one solution (x, y) to the equation g(x, y) = 0 exists. As previously noted, every scalar function f : A → B with f(x) ∈ S(x) for all x ∈ A is a possible implicit function.
In view of Proposition 1577, the non-emptiness condition (34.41) holds under the hypotheses of that proposition.

The results of Section 34.2 permit to ascribe some notable properties to the implicit function. Specifically, let f : A → B be the unique function such that g(x, f(x)) = 0 for all x ∈ A. By Propositions 1581 and 1582, if g is strictly increasing in y, then f is:¹⁴

(i) (strictly) decreasing if g is (strictly) increasing in x;

(ii) convex if g is quasi-concave;

(iii) concave if g is quasi-convex;

(iv) continuous if g is continuous;

(v) continuously differentiable, with

$$
f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x, y)}{\dfrac{\partial g}{\partial y}(x, y)} \qquad \forall (x, y) \in g^{-1}(0)
$$

if g is continuously differentiable on A × B, with either ∂g(x, y)/∂y > 0 for all (x, y) ∈ A × B or ∂g(x, y)/∂y < 0 for all (x, y) ∈ A × B.
¹³ So, this rectangle is included in the domain of g. In any case, what we establish in what follows is easily seen to hold for any frame of g.
¹⁴ In points (ii) and (iii) we tacitly assume that the domain of f is convex, while in points (iv) and (v) we assume that it is open.
Point (ii) makes rigorous in a global sense, in contrast to the local one already remarked in Section 34.3.2, the expression "convex indifference curves" by showing that they are, indeed, represented via convex implicit functions.

Thus, the results of Section 34.2 are all we need in this example; there is no need to invoke the Implicit Function Theorem. For instance, the continuous differentiability of fₖ follows from Proposition 1582 since ∂g(x, y)/∂y > 0 for all (x, y) ∈ R²₊₊. In sum, here the Implicit Function Theorem actually delivers an inferior, local rather than global, result. N
Back to the equilibrium comparative statics exercise, assume that:

(i) D : [0, b] × R₊ → R and S : [0, b] → R are continuous and such that D(0, τ) ≥ S(0) and D(b, τ) ≤ S(b) for every τ;

(ii) g(p, τ) = S(p) − D(p, τ) is strictly increasing in p.

Then, by Propositions 1577 and 1578,¹⁵ there exists a unique function f : R₊ → [0, b] such that

$$
g(f(\tau), \tau) = 0 \qquad \forall \tau \geq 0
$$

By Proposition 1581, the function f is:

(i) continuous;

(ii) (strictly) decreasing if D is (strictly) decreasing in τ;

(iii) (strictly) convex if S is (strictly) quasi-concave and D is (strictly) quasi-convex.

¹⁵ Indeed, D and S are continuous and, furthermore, D(0, τ) ≥ S(0) and D(b, τ) ≤ S(b) for every τ.
Property (ii) is especially interesting. Under the natural hypothesis that D is strictly decreasing in τ, we have that f is strictly decreasing: changes in taxation bring about opposite changes in equilibrium prices (increases in τ entail decreases in p, and decreases in τ determine increases in p).

In the linear case of Example 1605, the existence and properties of f follow from simple computations. The results in this section allow us to extend the same conclusions to much more general demand and supply functions.
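To illustrate beyond the linear case, the next minimal sketch (plain Python; the demand D(p, τ) = e^{−p−τ} and supply S(p) = p are arbitrary illustrative choices) computes the equilibrium price by bisection and checks the implicit-function formula f′(τ) = −(∂g/∂τ)/(∂g/∂p) against a finite-difference estimate:

```python
# Equilibrium comparative statics with D(p, t) = exp(-p - t) and S(p) = p:
# g(p, t) = p - exp(-p - t) is strictly increasing in p, so a unique
# equilibrium price f(t) exists; we check f'(t) = -g_t/g_p numerically.
import math

def g(p, t):
    return p - math.exp(-p - t)

def price(t, lo=0.0, hi=1.0):
    for _ in range(100):               # bisection: g(0, t) < 0 < g(1, t)
        mid = (lo + hi) / 2
        if g(mid, t) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t0, h = 0.5, 1e-5
p0 = price(t0)
gp = 1 + math.exp(-p0 - t0)            # dg/dp
gt = math.exp(-p0 - t0)                # dg/dt
print(-gt / gp)                                  # implicit-function formula
print((price(t0 + h) - price(t0 - h)) / (2 * h)) # finite-difference estimate
```

Both prints agree, and are negative: higher taxes lower the equilibrium price, as Property (ii) predicts.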
More generally, consider a parametric optimization problem

$$
\max_{c \in (a, b)} F(\theta, c)
$$

where c is the choice variable and θ ≥ 0 parameterizes the objective function F : (a, b) × [0, ∞) → R. Assume that F is partially derivable. If the partial derivative ∂F(θ, c)/∂c is strictly increasing in c (for example, ∂²F(θ, c)/∂c² > 0 if F is twice differentiable) and if condition (34.3) holds, then by Propositions 1577 and 1578 the first-order condition

$$
g(c, \theta) = \frac{\partial F(\theta, c)}{\partial c} = 0
$$

implicitly defines a unique function f : [0, ∞) → (a, b) such that

$$
\frac{\partial F(\theta, f(\theta))}{\partial c} = 0 \qquad \forall \theta \geq 0
$$

By Proposition 1581, the function f then inherits monotonicity and convexity properties from those of g, along the lines seen above for the equilibrium case.
In the special case of the producer's problem, market prices p are the parameters and production levels y are the choice variables. So, F(p, y) = py − c(y) is the profit function and

$$
g(p, y) = \frac{\partial F(p, y)}{\partial y} = p - c'(y)
$$

The strict monotonicity of g in y is equivalent to the strict monotonicity of the derivative function c′ (that is, to the strict convexity or concavity of c). In particular, in the standard case when c′ is strictly increasing (so, c is strictly convex), the supply function y = f(p) = (c′)⁻¹(p) exists; since g is strictly increasing in p, the supply function is strictly increasing in p. If, in addition, c′ is concave, then g is convex, which implies that the supply function is convex.
Chapter 35

Equations and inverse functions

35.1 Equations
Let

$$
f : A \to B
$$

be an operator between two subsets A and B of Rⁿ. A general form of an equation is

$$
f(x) = y_0
\tag{35.1}
$$

that is,

$$
\begin{cases}
f_1(x_1, \dots, x_n) = y_{01}\\
f_2(x_1, \dots, x_n) = y_{02}\\
\qquad\vdots\\
f_n(x_1, \dots, x_n) = y_{0n}
\end{cases}
$$

where y₀ is a given element of B.¹ The variable x is the unknown of the equation and y₀ is the known term. The solutions of the equation are all x ∈ A such that f(x) = y₀.
A basic taxonomy: equation (35.1) is

(i) linear if f is a linear operator;

(ii) homogeneous if y₀ = 0;

(iii) polynomial if the functions fᵢ are polynomials.

Earlier in the book we studied homogeneous equations (Chapter 14) and linear equations (Section 15.7). First-degree and second-degree equations are polynomial equations familiar from, at least, high school.
Two basic existence and uniqueness questions can be asked about the solutions of equation (35.1), from global and local angles:

(Q.i) can the equation be solved globally: for every y₀ ∈ B, is there x ∈ A that satisfies (35.1)? If so, is the solution unique?

(Q.ii) can the equation be solved locally: given a y₀ ∈ B, is there x ∈ A that satisfies (35.1)? If so, is the solution unique?

Before further discussing these questions, let us formalize them. To this end, observe that the set of all solutions of equation (35.1) is given by the preimage

$$
f^{-1}(y_0) = \{x \in A : f(x) = y_0\}
$$

So, the previous questions can be addressed via the inverse correspondence f⁻¹ : B ⇉ A defined by

$$
f^{-1}(y) = \{x \in A : f(x) = y\} \qquad \forall y \in B
$$

with domain Im f ⊆ B (cf. Example 950). We say that f is weakly invertible at y ∈ B if f⁻¹(y) is non-empty, that is, if y ∈ Im f. When, in addition, f⁻¹(y) is a singleton, we say that f is invertible at y.

The global question (Q.i) is more demanding, but also more important, than the local one (Q.ii). The unique existence of solutions at each y₀ ∈ B amounts to the existence of the inverse function f⁻¹ : B → A, which describes how solutions vary as the known term varies.

¹ We write y₀ to emphasize that it should be regarded as a fixed element of Rⁿ and not as a variable.
Example 1608 Consider the second-order equation ax² + bx + c = 0, where a, b and c are scalar coefficients with a ≠ 0. As is well known, the solution formula is

$$
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
$$

We can write it in the format (35.1) as

$$
ax^2 + bx = y
$$

via the second-degree polynomial f : R → R given by f(x) = ax² + bx and the known term y = −c. Solutions are then described by the inverse correspondence f⁻¹ : R ⇉ R given by:²

$$
f^{-1}(y) = \begin{cases}
\left\{\dfrac{-b - \sqrt{b^2 + 4ay}}{2a},\; \dfrac{-b + \sqrt{b^2 + 4ay}}{2a}\right\} & \text{if } y \geq -\dfrac{b^2}{4a}\\[2ex]
\emptyset & \text{if } y < -\dfrac{b^2}{4a}
\end{cases}
$$

Thus, the knowledge of the solution formula amounts to the knowledge of the inverse correspondence. As the known term y varies, solutions may exist or not, and may be unique or not.

² The condition y ≥ −b²/4a amounts to the positivity of the discriminant b² + 4ay.
For instance, for the quadratic f(x) = x², the inverse correspondence f⁻¹ : R ⇉ R is given by

$$
f^{-1}(y) = \begin{cases} \left\{-\sqrt{y}, \sqrt{y}\right\} & \text{if } y \geq 0\\ \emptyset & \text{if } y < 0 \end{cases}
$$

So, a unique solution exists when y = 0, that is, when the equation is homogeneous. Two distinct solutions exist when y > 0, and no solution exists when y < 0. In this case, where we posited A = B = R, we can only answer the local question (Q.ii). But we may be less ambitious and restrict ourselves to A = B = (0, ∞). In this case, there exist unique solutions described by the inverse function f⁻¹ : (0, ∞) → R given by f⁻¹(y) = √y. N
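The inverse correspondence of Example 1608 can be coded directly. A minimal sketch (plain Python):

```python
# The inverse correspondence of f(x) = a*x^2 + b*x: the set of real
# solutions of a*x^2 + b*x = y, which may be empty, a singleton, or a pair.
import math

def inverse_correspondence(a, b, y):
    disc = b * b + 4 * a * y
    if disc < 0:
        return set()                          # no real solution
    r = math.sqrt(disc)
    return {(-b - r) / (2 * a), (-b + r) / (2 * a)}

print(inverse_correspondence(1, 0, 4))        # {-2.0, 2.0}: two solutions
print(inverse_correspondence(1, 0, -1))       # set(): no solution
print(inverse_correspondence(1, 2, -1))       # {-1.0}: the double root
```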
As this example clarifies, the choice of the solution domain A and of the set of known terms B determines the nature of the equation: the same function, in the example the quadratic one, defines different equations under different sets A and B.

A couple of further remarks are in order. First, every equation (35.1) can be put in a homogeneous form f_{y₀}(x) = 0 via the auxiliary function f_{y₀}(x) = f(x) − y₀. If we are interested in addressing question (Q.ii), so what happens at a given y₀, it is then without loss of generality to consider homogeneous equations. This is what we did, for example, in Chapter 14 where we presented a few non-trivial results about homogeneous equations. However, for the global question (Q.i) it is important to keep track of the known term by studying the general form f(x) = y₀.

Second, though in this chapter we focus on the basic case m = n, equations can be defined more generally through operators f : A ⊆ Rⁿ → B ⊆ Rᵐ between different Euclidean spaces. For linear equations, we first studied the case n = m (Section 15.7) and then generalized the analysis to any n and m (Section 15.8). In this chapter such a generalization is not pursued.
35.2 Well-posed equations

A third natural question concerns the behavior of solutions as the known term varies:

(Q.iii) if the equation has a unique solution for each known term y₀ ∈ B, does the solution depend continuously on y₀?

In words, if a unique solution exists for each value of the known term, does it change continuously as the known term changes?

This is a question about the "robustness" of unique solutions: whether they change abruptly, discontinuously, under small changes of the known term. If they did, the equation would feature an unpleasant instability because small changes in the known term would result in significant changes in its solutions.

To address this question, we introduce a piece of terminology.

Definition 1609 Equation (35.1) is said to be well posed if f is a bijection with continuous inverse f⁻¹.

For instance, this is the case for a linear equation

$$
\underbrace{A}_{n \times n} x = b
$$

when the square matrix A is invertible: the inverse operator is then linear, hence continuous.
The general study of well-posed equations is based on the following deep, innocuous-looking result, proved by Luitzen Brouwer in 1911.³

Theorem (Domain Invariance) Let f : U → Rⁿ be an injective and continuous function defined on an open set U of Rⁿ. Then, the image f(U) is an open set of Rⁿ.

In general, continuity does not preserve openness, as Example 596 shows. Yet, under a basic condition like injectivity it does, a remarkable fact.

Corollary An injective and continuous function f : A ⊆ Rⁿ → Rⁿ maps open subsets of A into open sets.

Proof Let U be an open subset of the domain A. We want to show that its image f(U) is open. The restriction f|U of f to U is injective and continuous. By the Domain Invariance Theorem, the set f(U) = Im f|U is open.
Proof of the Domain Invariance Theorem We prove this result only in a special differential case:⁴ we assume that the injective function f : U → Rⁿ is continuously differentiable, with det Df(x) ≠ 0 for all x ∈ U. We first show that its image Im f is open. To this end, let x₀ ∈ U and consider a neighborhood B_ε(x₀) ⊆ U. Define φ : ∂B_ε(x₀) → R by φ(x) = ‖f(x) − f(x₀)‖. The function φ is easily seen to be continuous. It is also strictly positive: since f is injective, we have f(x) ≠ f(x₀) for all x ∈ ∂B_ε(x₀), and so φ(x) > 0 for all x ∈ ∂B_ε(x₀). Since ∂B_ε(x₀) is compact, by the Weierstrass Theorem φ has then a minimum value m > 0.

Consider the neighborhood V_{m/2}(f(x₀)). We want to show that V_{m/2}(f(x₀)) ⊆ f(U), thus proving that f(U) is open. Fix y ∈ V_{m/2}(f(x₀)). We want to show that y ∈ f(U). To this end, define

$$
h_y : \bar{B}_\varepsilon(x_0) = \{x \in U : \|x - x_0\| \leq \varepsilon\} \to \mathbb{R}
$$

by h_y(x) = ‖f(x) − y‖. Since B̄_ε(x₀) is compact, by the Weierstrass Theorem h_y has a minimizer x_y ∈ B̄_ε(x₀). For each x ∈ ∂B_ε(x₀) we have

$$
h_y(x) = \|f(x) - y\| \geq \|f(x) - f(x_0)\| - \|f(x_0) - y\| > m - \frac{m}{2} = \frac{m}{2}
$$

where the first inequality follows from (4.11). Yet, h_y(x₀) = ‖f(x₀) − y‖ < m/2 because y ∈ V_{m/2}(f(x₀)). So, the minimizer x_y does not belong to the boundary ∂B_ε(x₀); that is, it belongs to B_ε(x₀). Clearly, x_y also minimizes h²_y, given by h²_y(x) = Σⁿᵢ₌₁ (fᵢ(x) − yᵢ)². By Fermat's Theorem, ∇h²_y(x_y) = 0, that is,

$$
\sum_{i=1}^{n} \left(f_i(x_y) - y_i\right) \frac{\partial f_i(x_y)}{\partial x_j} = 0 \qquad \forall j = 1, \dots, n
$$

Since det Df(x_y) ≠ 0, this is an homogeneous linear system that, by Cramer's Theorem, has 0 as its unique solution. Therefore, fᵢ(x_y) = yᵢ for each i = 1, ..., n. We conclude that f(x_y) = y, so that y ∈ f(U), as desired.

It remains to prove that the inverse function f⁻¹ : Im f → Rⁿ is continuous. Clearly, Im f⁻¹ = U. So, let G be any open subset of U. By Corollary 601, it is enough to prove that the set f(G) = (f⁻¹)⁻¹(G) is open. Let f|G be the restriction of f to G. By what has just been proved, Im f|G is an open set. Since Im f|G = f(G), we conclude that f⁻¹ is continuous.
Two open sets U and V that admit an homeomorphism f : U → V are called homeomorphic.⁵ This map has the nature of a topological isomorphism between the two sets. To see why, denote by 𝒰 and 𝒱 the collections of all open subsets of U and V, respectively. In view of the Domain Invariance Theorem, it is easy to see that f induces a bijective correspondence between the elements of 𝒰 and 𝒱: we have f(U′) ∈ 𝒱 for all U′ ∈ 𝒰 as well as f⁻¹(V′) ∈ 𝒰 for all V′ ∈ 𝒱. We can thus move back and forth between the collections 𝒰 and 𝒱 through f. The homeomorphic sets U and V are thus isomorphic from a topological standpoint.

By the Domain Invariance Theorem, it is enough to check that a bijective operator f : U → f(U) is continuous to establish that its domain U and its image V = f(U) are two homeomorphic open sets. There is no need to check the properties of the inverse, in particular the openness of its domain and its continuity: a big relief.
Example 1614 (i) Bounded open intervals are homeomorphic: given any two of them, (a, b) and (c, d), it is enough to consider the continuous bijection f : (a, b) → (c, d) given by

$$
f(x) = \frac{d - c}{b - a}(x - a) + c
\tag{35.2}
$$

(ii) The open unit ball B₁(0) = {x ∈ Rⁿ : ‖x‖ < 1} is homeomorphic to Rⁿ: the map f : B₁(0) → Rⁿ given by

$$
f(x) = \frac{x}{\sqrt{1 - \|x\|^2}}
\tag{35.3}
$$

is a continuous bijection (cf. Example 213). N

⁵ In more advanced courses, readers will learn that this notion extends well beyond open sets.
Let us write U ≃₀ V when the open sets U and V are homeomorphic.

Proposition 1615 The relation ≃₀ is reflexive, symmetric, and transitive among open sets of Rⁿ.

Proof As the other properties are trivial, we consider only transitivity. Let U, V and G be three open sets of Rⁿ with U ≃₀ V and V ≃₀ G. By definition, there exist homeomorphisms f : U → V and g : V → G. Their composition g ∘ f : U → G is easily checked to be an homeomorphism (cf. Proposition 566), i.e., homeomorphisms are closed under composition. Thus, U ≃₀ G.

The transitivity of ≃₀ permits to find new pairs of homeomorphic sets from old ones. For instance, it is easy to check that the neighborhoods of points of Rⁿ are homeomorphic to the open unit ball (why?). The transitivity of ≃₀ then immediately implies that any two neighborhoods in Rⁿ are homeomorphic.
Not all open sets of Rⁿ are homeomorphic, as the next basic example shows.

Example 1616 In the real line, the open sets U = (0, 1) and V = (2, 3) ∪ (5, 6) are not homeomorphic. For, suppose per contra that there exists a continuous injective map f : U → V. By Proposition 580, V = Im f is an interval, a contradiction. N

We close this little homeomorphic excursion with an interesting result, whose proof we omit, that covers Example 1614.

Proposition 1617 Any two (non-empty) open convex sets of Rⁿ are homeomorphic.
Back to equations, our main object of interest. The Domain Invariance Theorem shows that, in the important case when the set B of known terms is open, an equation

$$
f(x) = y_0
$$

with an open solution domain A and a continuous f is well posed as soon as f is a bijection: the continuity of the inverse is then automatic.

Definition 1619 An equation (35.1), with open solution domain A, is said to be smoothly well posed if f : A → B is a C¹-diffeomorphism.

This is the best possible kind of stability for an equation. By the Domain Invariance Theorem, the set B of known terms is then open.
We close by briefly discussing a classification of open sets according to their differential similarity. Let us write U ≃₁ V when the open sets U and V are linked via a C¹-diffeomorphism; such sets are called diffeomorphic. Clearly,

$$
U \simeq_1 V \implies U \simeq_0 V
$$

A natural question is when the converse holds, that is, when two homeomorphic open sets are actually diffeomorphic, so linked via a C¹-diffeomorphism rather than "just" via an homeomorphism. For instance, this is the case for the homeomorphic open sets in Example 1614 since the maps (35.2) and (35.3) are both easily checked to be C¹-diffeomorphisms. This is, indeed, a special case of the following result, which significantly improves Proposition 1617.⁶

Theorem Any two (non-empty) open convex sets of Rⁿ are diffeomorphic.

Any two open convex sets U and V of Rⁿ are thus linked through a C¹-diffeomorphism f : U → V, a remarkable property. Yet, this is just the beginning of a long journey into the differential structure of the sets of Rⁿ, left to more advanced courses.⁷

⁶ See, e.g., Gonnord and Tosel (1998) p. 60.
⁷ It is the subject matter of differential topology (economic applications of this topic can be found in Mas-Colell, 1985).
35.3 Local analysis

Proposition 1623 If f : U → V is a C¹-diffeomorphism between open sets of Rⁿ, then det Df(x) ≠ 0 for every x ∈ U and

$$
Df^{-1}(y) = \left(Df(x)\right)^{-1}
\tag{35.4}
$$

where y = f(x).

So, the "differential of the inverse" coincides with the "inverse of the differential". Formula (35.4) thus ensures the mutual consistency of the linear approximations at x of f and of its inverse f⁻¹. This consistency is a key regularity property of C¹-diffeomorphisms.
The converse of Proposition 1623 is false for n ≥ 2, even on a convex domain:⁸ later in the chapter, in Example 1646, we will see a continuously differentiable operator f : R² → R², with det Df(x) ≠ 0 for each x ∈ R², which is not injective, let alone a C¹-diffeomorphism. Next we give another example of this kind.

Example 1624 Let U be the open set R² \ {0}. Define a continuously differentiable operator f : U → R² by f(x₁, x₂) = (x₁² − x₂², 2x₁x₂). We have

$$
Df(x) = \begin{bmatrix} 2x_1 & -2x_2 \\ 2x_2 & 2x_1 \end{bmatrix}
$$

and so det Df(x) = 4(x₁² + x₂²) > 0 for all x = (x₁, x₂) ∈ U. Yet, f(x) = f(−x) for all x ∈ U, and so f is not injective. N
As to the scalar case n = 1, for a convex domain we can state the converse.

Proposition 1625 Let U be an open interval, bounded or not, of the real line. A continuously differentiable function f : U → f(U) is a C¹-diffeomorphism if and only if f′(x) ≠ 0 for all x ∈ U.

Proof In view of the last result, we only need to prove the "if" part. Let f : U → R be a continuously differentiable function with f′(x) ≠ 0 for all x ∈ U. Let x₀ ∈ U be such that f′(x₀) > 0 (the case < 0 is similar). Then, f′ > 0. For, suppose that f′(x₁) < 0 for some x₁ ∈ U. Since the derivative function f′ : U → R is continuous, by the Bolzano Theorem there would exist x ∈ U such that f′(x) = 0, a contradiction. We conclude that f′ > 0. Thus, f is strictly increasing, so injective. Set V = Im f. By the Domain Invariance Theorem, f : U → V is an homeomorphism. By Theorem 1234, f⁻¹ : V → R is differentiable with (f⁻¹)′(y) = 1/f′(x) for y = f(x). Take any y ∈ V and let {yₙ} ⊆ V converge to y. There exists a sequence {xₙ} ⊆ U with f(xₙ) = yₙ for each n; since f⁻¹ is continuous, xₙ = f⁻¹(yₙ) → f⁻¹(y) = x. As f′ is continuous,

$$
\lim (f^{-1})'(y_n) = \lim \frac{1}{f'(x_n)} = \frac{1}{f'(x)} = (f^{-1})'(y)
$$

We conclude that (f⁻¹)′ is continuous at y ∈ V. Thus, f : U → V is a C¹-diffeomorphism.
The fundamental local result is the following theorem, whose proof relies on the operator version of the Implicit Function Theorem.⁹

Theorem (Inverse Function Theorem) Let f : U → Rⁿ be k-times continuously differentiable on an open set U of Rⁿ and let x₀ ∈ U. If

$$
\det Df(x_0) \neq 0
\tag{35.6}
$$

then there exists an open set A ⊆ U containing x₀ such that

$$
f : A \to f(A)
\tag{35.7}
$$

is a C^k-diffeomorphism.

The local Jacobian condition (35.6) thus ensures that locally, over a small enough open set A containing x₀, the operator f is a C^k-diffeomorphism. As a result, by Proposition 1623 we then have

$$
\det Df(x) \neq 0 \qquad \forall x \in A
$$

as well as

$$
Df^{-1}(y) = \left(Df(x)\right)^{-1} \qquad \forall x \in A
\tag{35.8}
$$

where y = f(x).

The role of the Jacobian condition (35.6) in this important theorem suggests the following classification: a point x₀ ∈ U at which det Df(x₀) ≠ 0 is called a regular point of f; otherwise, it is called a singular point.
The proof of the Inverse Function Theorem uses two lemmas.

Lemma 1628 Let f : U → Rⁿ be continuously differentiable on an open set U of Rⁿ. If det Df(x₀) ≠ 0, then there is a neighborhood of x₀ on which f is injective.

Proof We consider only the easy scalar case n = 1, when the condition det Df(x₀) ≠ 0 takes the basic form f′(x₀) ≠ 0.¹⁰ Let f′(x₀) > 0 (the case < 0 is similar). Being f′ continuous, there is a neighborhood B(x₀) of x₀ on which f′ is strictly positive. Thus, f is strictly increasing, so injective, on B(x₀).
The next lemma follows from the previous one along with the Domain Invariance Theorem.

Lemma 1629 Let f : U → Rⁿ be continuously differentiable on an open set U of Rⁿ, with det Df(x) ≠ 0 for all x ∈ U. Then f maps open subsets of U into open sets.

⁹ Also the converse is true, so one can first prove either theorem and get the other as a simple consequence (cf. Theorem 1648).
¹⁰ For the proof of the more involved general case, we refer readers to Apostol (1974) p. 370.
Proof Let G be an open subset of U. We want to show that f(G) is open. Let x ∈ G. By Lemma 1628, there is a neighborhood B(x) ⊆ G on which f is injective. By the Domain Invariance Theorem, the set f(B(x)) is open. Since G = ⋃_{x∈G} B(x), we have

$$
f(G) = f\left(\bigcup_{x \in G} B(x)\right) = \bigcup_{x \in G} f(B(x))
$$

We conclude that the set f(G) is open because it is a union of open sets.
Proof of the Inverse Function Theorem By Lemma 1628, there is a neighborhood B(x₀) on which f is injective. By setting V = f(B(x₀)), the function f : B(x₀) → V is bijective. By the Domain Invariance Theorem, the set V is open. Since f is continuously differentiable, the real-valued Jacobian function det Df(x) is continuous. We can thus assume, without loss of generality, that det Df(x) ≠ 0 for all x ∈ B(x₀). In the rest of the proof we consider the restriction f : B(x₀) → V of f.

The set B(x₀) × V is open in R²ⁿ. Define g : B(x₀) × V → Rⁿ by

$$
g(x, y) = f(x) - y
\tag{35.9}
$$

The operator version of the Implicit Function Theorem (Theorem 1598), in "exchanged" form, then ensures the existence of neighborhoods Ṽ(y₀) ⊆ V and B̃(x₀) ⊆ B(x₀) and of a unique function φ : Ṽ(y₀) → B̃(x₀) such that φ(y₀) = x₀ and

$$
g(\varphi(y), y) = 0 \qquad \forall y \in \tilde{V}(y_0)
$$

By (35.9),

$$
f(\varphi(y)) = y \qquad \forall y \in \tilde{V}(y_0)
$$

This relation implies that φ is injective. Since f(f⁻¹(y)) = y for all y ∈ V, we conclude that f⁻¹ = φ on Ṽ(y₀). Thus, f⁻¹(Ṽ(y₀)) = Im φ, so that Im φ is an open set. Therefore, f : Im φ → Ṽ(y₀) is a continuous bijection defined on an open set that contains x₀ (because φ(y₀) = x₀). By the Domain Invariance Theorem, f is an homeomorphism. Since Im φ ⊆ B(x₀), we have det Df(x) ≠ 0 for all x ∈ Im φ. Finally, the k-times continuous differentiability of f⁻¹ also follows from the Implicit Function Theorem. By setting A = Im φ, the result is proved.
The Inverse Function Theorem relies on two hypotheses: the Jacobian condition (35.6) and the continuous differentiability hypothesis (of some order k ≥ 1). In general, it fails when we remove either hypothesis. A non-trivial, omitted, example can be given to show that plain differentiability is not enough for the theorem; so, continuous differentiability is needed. The next simple example shows that the Jacobian condition is needed.

Example 1630 The quadratic and cubic functions f(x) = x² and f(x) = x³ are continuously differentiable functions on the real line that do not satisfy the Jacobian condition (35.6) at the origin. The quadratic function is not locally invertible at the origin: there is no neighborhood of the origin over which we can restrict the quadratic function and make it injective. The cubic function is injective, but its inverse function f⁻¹(x) = x^{1/3} is not differentiable at the origin. N
The Inverse Function Theorem permits to solve equations locally, thus answering question (Q.ii). To see why, it is convenient to denote by B the image f(A) and write the C^k-diffeomorphism (35.7) as

$$
f : A \to B
$$

With this, suppose that, by skill or luck, we have been able to find a solution x₀ of the equation f(x) = y₀. Based on this knowledge, when x₀ is a regular point of f the Inverse Function Theorem ensures, first, that x₀ is locally the unique solution and, second, that there exists an open set B, the image of a small enough open set A containing x₀, such that for each known term y belonging to B the equation f(x) = y has a unique solution as well, described by f⁻¹ : B → A.

In sum, if a regular point x₀ happens to solve the equation, it is locally the unique solution and, locally around x₀, the equation is smoothly well posed. The next result, an immediate consequence of the Inverse Function Theorem, formalizes this discussion (for simplicity, we consider the case k = 1).

Proposition 1631 Let f : U → Rⁿ be continuously differentiable on an open set U of Rⁿ and let x₀ ∈ U be a solution of the equation

$$
f(x) = y_0
$$

If x₀ is a regular point of f, there exist open sets A and B containing x₀ and y₀, respectively, such that the equation

$$
f(x) = y
$$

has a unique solution in A for each y ∈ B. The inverse function f⁻¹ : B → A, which associates a solution to each known term, is continuously differentiable.
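Computationally, the local solving described by Proposition 1631 is what Newton's method delivers. A minimal sketch (Python with numpy; the operator f(x₁, x₂) = (x₁² − x₂², 2x₁x₂) of Example 1624 and the starting point are illustrative choices):

```python
# Newton's method solves f(x) = y near a regular point via the iteration
# x <- x - Df(x)^{-1} (f(x) - y), which converges locally.
import numpy as np

def f(x):
    return np.array([x[0]**2 - x[1]**2, 2 * x[0] * x[1]])

def Df(x):
    return np.array([[2 * x[0], -2 * x[1]],
                     [2 * x[1],  2 * x[0]]])

def newton(y, x, steps=20):
    for _ in range(steps):
        x = x - np.linalg.solve(Df(x), f(x) - y)
    return x

x_star = newton(np.array([0.0, 2.0]), np.array([1.5, 0.5]))
print(x_star)          # [1. 1.]: indeed f(1, 1) = (0, 2)
```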
35.4 Global analysis

To study the global questions we need some further topological notions. An open cover of a set A of Rⁿ is a collection {Gᵢ}_{i∈I} of open sets such that

$$
A \subseteq \bigcup_{i \in I} G_i
$$

In words, these open sets cover, through their union, the set A. A finite subcover of the open cover {Gᵢ}_{i∈I} is a finite collection of sets, say {G₁, ..., Gₙ}, taken from the open cover {Gᵢ}_{i∈I} that is still able to cover the set A, that is, such that

$$
A \subseteq \bigcup_{i=1}^{n} G_i
$$

Theorem 1632 (Heine-Borel) A subset A of Rⁿ is compact if and only if each open cover of A has a finite subcover.

This characterization of compactness may appear prima facie not that thrilling, but it becomes key when one goes beyond Rⁿ, as more advanced courses will show. Because of its unimposing appearance, we introduce it here; momentarily, we will be able to show that it is a powerful characterization already in Rⁿ.
Proof "If". Let A be a set of Rⁿ such that each open cover of A admits a finite subcover. We want to show that it is closed and bounded. We first prove that Aᶜ is open. Fix x ∈ Aᶜ. For each y ∈ A, let B_ε(y) and B_{ε_y}(x) be, respectively, neighborhoods of y and x with radius lower than ‖x − y‖/2. This is easily seen to imply that B_ε(y) ∩ B_{ε_y}(x) = ∅. Since the collection {B_ε(y)}_{y∈A} is an open cover of A, by hypothesis there exists a finite subcover {B_ε(yᵢ)}ⁿᵢ₌₁ of A. Set V = ⋂ⁿᵢ₌₁ B_{ε_{yᵢ}}(x) and B = ⋃ⁿᵢ₌₁ B_ε(yᵢ). Clearly, A ⊆ B. We also have V ∩ B = ∅. Indeed,

$$
V \cap B = V \cap \bigcup_{i=1}^{n} B_\varepsilon(y_i) = \bigcup_{i=1}^{n} \left(V \cap B_\varepsilon(y_i)\right) \subseteq \bigcup_{i=1}^{n} \left(B_{\varepsilon_{y_i}}(x) \cap B_\varepsilon(y_i)\right) = \emptyset
$$

Since V is a neighborhood of x disjoint from A ⊆ B, the point x is interior to Aᶜ. This proves that Aᶜ is open, that is, A is closed. As for boundedness, fix ε > 0 and consider the open cover {B_ε(y)}_{y∈A} of A; by hypothesis it has a finite subcover {B_ε(yᵢ)}ⁿᵢ₌₁. Set M = max_{1≤i,j≤n} ‖yᵢ − yⱼ‖. Thus, by taking any y ∈ A, we have A ⊆ B_{2ε+M}(y). This implies that the set A is bounded.
"Only if". We only consider the scalar case when A is a compact interval [a, b] of R. We begin with a claim.

Claim Let {[aₙ, bₙ]}_{n≥1} be a collection of compact intervals of R with [aₙ₊₁, bₙ₊₁] ⊆ [aₙ, bₙ] for each n ≥ 1. Then ⋂_{n≥1} [aₙ, bₙ] ≠ ∅.

Proof of the Claim Given the collection {[aₙ, bₙ]}_{n≥1}, let A = {a₁, a₂, ..., aₙ, ...}. We have A ⊆ [a₁, b₁] and therefore A is a bounded set. By the Least Upper Bound Principle, we can set x̄ = sup A. Since each bₙ is an upper bound for A, we have bₙ ≥ x̄ for each n ≥ 1. Hence, aₙ ≤ x̄ ≤ bₙ for each n ≥ 1 and, therefore, x̄ ∈ ⋂_{n≥1} [aₙ, bₙ]. This implies ⋂_{n≥1} [aₙ, bₙ] ≠ ∅, as desired.
Back to the proof, suppose per contra that there exists an open cover {Gᵢ}_{i∈I} of [a, b] that does not contain any finite subcover of [a, b]. Let δ = b − a and c₁ = (a + b)/2. The collection {Gᵢ}_{i∈I} is an open cover also of the intervals [a, c₁] and [c₁, b]. Therefore, at least one of these two intervals has no finite subcover of {Gᵢ}_{i∈I}; otherwise, from [a, b] = [a, c₁] ∪ [c₁, b] it would follow that [a, b] itself has such a subcover. Without loss of generality, suppose therefore that [a, c₁] has no finite subcover of {Gᵢ}_{i∈I}. Set c₂ = (a + c₁)/2. By repeating the argument just seen, we can assume that [a, c₂] also does not have such a finite subcover. By proceeding in this way we can construct a collection of intervals {[a, cₙ]}_{n≥1} such that [a, cₙ₊₁] ⊆ [a, cₙ] and

$$
c_n - a = \frac{\delta}{2^n}
$$

for each n ≥ 1. Moreover, none of these closed intervals has a finite subcover of {Gᵢ}_{i∈I}. By the Claim, ⋂_{n≥1} [a, cₙ] ≠ ∅. Let x ∈ ⋂_{n≥1} [a, cₙ]. Since [a, b] ⊆ ⋃_{i∈I} Gᵢ, there exists Gᵢ such that x ∈ Gᵢ. As x is an interior point of Gᵢ, there exists a neighborhood (x − ε, x + ε) such that (x − ε, x + ε) ⊆ Gᵢ. For n sufficiently large, we have δ/2ⁿ < ε and, therefore,

$$
[a, c_n] \subseteq (x - \varepsilon, x + \varepsilon) \subseteq G_i
$$

Consequently, the singleton {Gᵢ} is a finite subcover of {Gᵢ}_{i∈I} that covers [a, cₙ], which contradicts the fact that none of the intervals [a, cₙ] has such a subcover. From this contradiction it follows that [a, b] has a finite subcover of {Gᵢ}_{i∈I}.
Example 1633 The open interval A = (−1, 1) is not compact, as the Heine-Borel Theorem confirms. For each n ≥ 1, let

$$
G_n = \left(-1 + \frac{1}{n}, 1 - \frac{1}{n}\right)
$$

The collection {Gₙ} is an open cover of A since ⋃ₙ Gₙ = (−1, 1). It does not admit any finite subcover: given any finite subcollection {G_{n₁}, ..., G_{n_k}} of {Gₙ}, where n₁ < n₂ < ⋯ < n_k, we have

$$
\bigcup_{i=1}^{k} G_{n_i} = G_{n_k} = \left(-1 + \frac{1}{n_k}, 1 - \frac{1}{n_k}\right) \subsetneq A
$$

Thus, the finite subcollection {G_{n₁}, ..., G_{n_k}} does not cover A. N
A set of Rⁿ is discrete if all its points are isolated. Intuitively, discrete sets have a granular structure. This intuition is confirmed by the next result (which shows, inter alia, the usefulness of the open cover characterization of compactness).

Proposition 1634 A discrete set is at most countable. It is finite if and only if it is compact.

Proof Let A be a discrete set in Rⁿ. For each x ∈ A there exists a small enough neighborhood B_ε(x) such that B_ε(x) ∩ A = {x}. By the density of the rationals (Proposition 42), it is easy to see that there exists a point q_x ∈ B_ε(x) with rational components, i.e., q_x ∈ Qⁿ. To distinct elements x and x′ of A we can associate distinct points q_x and q_{x′}. Thus, the map A ∋ x ↦ q_x ∈ Qⁿ is injective. Hence, |A| ≤ |Qⁿ|. As the set Qⁿ is countable (why?), we conclude that the set A is at most countable.

If A is finite, it is compact (cf. Example 171). Conversely, let A be compact. For each x ∈ A there exists a small enough neighborhood B_ε(x) such that B_ε(x) ∩ A = {x}. As A ⊆ ⋃_{x∈A} B_ε(x), by the Heine-Borel Theorem there exists a finite collection {B_ε(xᵢ)}ᵏᵢ₌₁ such that A ⊆ ⋃ᵏᵢ₌₁ B_ε(xᵢ). Thus, A = {x₁, ..., x_k}.
We continue this topological analysis by introducing a new class of functions. In the rest of the section, C denotes a closed subset of Rⁿ.

Definition 1635 An operator f : C ⊆ Rⁿ → Rᵐ is proper if ‖f(xₙ)‖ → +∞ for every sequence {xₙ} ⊆ C such that ‖xₙ‖ → +∞.

Properness requires the norm of the images of f to diverge to +∞ along any possible unbounded sequence {xₙ} ⊆ C, i.e., such that ‖xₙ‖ → +∞. In words, the function cannot keep taking values of bounded norm along sequences that "dash off" to infinity.

Example 1636 (i) If m = 1, supercoercive functions are proper. Indeed, for them we have

$$
\lim_{\|x\| \to +\infty} \frac{f(x)}{\|x\|} = +\infty
$$

so that, a fortiori, |f(xₙ)| → +∞ whenever ‖xₙ‖ → +∞. The converse is false: the cubic function f(x) = x³ is proper, but not supercoercive. In view of Proposition 1016, supercoercive functions f : R → R are actually the proper functions that feature bounded upper contour sets (momentarily, Proposition 1637 will put this remark in perspective).

(ii) Define f : R² → R² by f(x) = (x₁² − x₂², 2x₁x₂). We have

$$
\|f(x)\|^2 = x_1^4 + x_2^4 + 2x_1^2 x_2^2 = \|x\|^4
$$

and so f is easily checked to be proper.¹¹

(iii) Let B be a symmetric square matrix of order n and c a vector in Rⁿ. The multivariable quadratic function f : Rⁿ → R defined by

$$
f(x) = \frac{1}{2}\,x \cdot Bx + c \cdot x
$$

is proper if B is positive definite. Since x · Bx > 0 for all 0 ≠ x ∈ Rⁿ, by the Weierstrass Theorem x · Bx has a minimum value m > 0 over the unit sphere {x ∈ Rⁿ : ‖x‖ = 1}. Thus,

$$
x \cdot Bx = \underbrace{\frac{x}{\|x\|} \cdot B \frac{x}{\|x\|}}_{\geq\, m}\, \|x\|^2 \geq m \|x\|^2 \qquad \forall\, 0 \neq x \in \mathbb{R}^n
$$

so that f(x) ≥ (m/2)‖x‖² − ‖c‖‖x‖ → +∞ as ‖x‖ → +∞, proving properness. N
By now, the next characterization of proper functions should not be that surprising; we omit the easy proof.

Proposition 1637 An operator f : C → Rᵐ is proper if and only if the preimage f⁻¹(K) of every bounded set K of Rᵐ is bounded.

In view of Proposition 600, we have the following simple, yet interesting, corollary that neatly shows what properness adds to continuity.

Corollary 1638 A continuous operator f : C → Rᵐ is proper if and only if the preimage f⁻¹(K) of every compact set K of Rᵐ is compact.

Linear operators provide a first important class of proper maps.

Proposition 1639 Invertible linear operators f : Rⁿ → Rⁿ are proper.

Proof Invertible linear operators are bijective and their inverses are linear operators (see Chapter 15). By Lemma 899, there exists a constant k > 0 such that ‖f⁻¹(x)‖ ≤ k‖x‖ for every x ∈ Rⁿ. Let {xₙ} ⊆ Rⁿ be such that ‖xₙ‖ → +∞. Then, ‖xₙ‖ = ‖f⁻¹(f(xₙ))‖ ≤ k‖f(xₙ)‖, so ‖f(xₙ)‖ → +∞. We conclude that f is proper.

¹¹ In Example 1624 we considered the restriction of this function to the set R² \ {0}.
Proposition 1640 A continuous proper operator f : C → Rᵐ maps closed sets into closed sets.

We omit the proof. Next, the Inverse Function Theorem yields a key property of level sets.

Proposition Let f : A → Rⁿ be continuously differentiable on an open set A of Rⁿ. If every point of A is regular, then every level set f⁻¹(y) is discrete.

Proof Let y ∈ Im f and let x ∈ f⁻¹(y). By the Inverse Function Theorem, there exist open sets B_x and V_y, containing x and y, such that the map f : B_x → V_y is bijective. Since f : B_x → V_y is bijective, the set f⁻¹(y) ∩ B_x contains at most a single point: the one whose image is y. Thus, f⁻¹(y) ∩ B_x = {x}. This proves that x is an isolated point. We conclude that the level set f⁻¹(y) is discrete.

Thus, two simple hypotheses, continuous differentiability and everywhere regularity, ensure that the level sets are discrete sets, a significant finding that motivates the following definition: an operator f : C → Rᵐ is level-proper if all its level sets f⁻¹(y) are compact.
This is a large class of functions; it is easy to come up with examples. For instance, by Proposition 1637 proper continuous functions are level-proper (this also clarifies the "level-proper" terminology). Yet, constant functions on the real line are a basic example of functions that are not level-proper.

Combining the last results with Proposition 1634, we obtain:

Corollary Let f : A → Rⁿ be continuously differentiable and level-proper on an open set A of Rⁿ. If every point of A is regular, then every level set f⁻¹(y) is finite.

All this has interesting consequences for the study of equations. Take an equation, not necessarily well posed, with an open solution domain A and defined via a continuously differentiable and level-proper function f : A → Rⁿ. If each point of A is regular under f, the last corollary ensures that, for each known term y₀ ∈ Rⁿ, the solution set f⁻¹(y₀) is at most finite. That is, the equation has at most finitely many solutions.

Simple differential conditions like continuous differentiability and regularity, paired with a mild requirement on level sets, thus deliver a remarkable finiteness property of solution sets.
Let us now turn to the global inverse. A first result characterizes, via the Jacobian condition

$$
\det Df(x) \neq 0 \qquad \forall x \in U
\tag{35.10}
$$

the injective operators that are diffeomorphisms.

Proposition 1644 Let f : U → Rⁿ be injective and k-times continuously differentiable on an open set U of Rⁿ. Then f : U → f(U) is a C^k-diffeomorphism if and only if condition (35.10) holds.

Proof We prove the "if", as the converse is trivially true. By the Domain Invariance Theorem, Im f is an open set. Let f : U → Im f be bijective, with inverse f⁻¹ : Im f → U. By the Inverse Function Theorem, each x ∈ U has a neighborhood B(x) such that f : B(x) → f(B(x)) is a C^k-diffeomorphism. Thus, the inverse f⁻¹ is k-times continuously differentiable on the open set f(B(x)). As Im f = ⋃_{x∈U} f(B(x)), we conclude that the inverse f⁻¹ is k-times continuously differentiable, as desired.

In view of Proposition 1639, Cramer's Theorem is a special case of the next theorem: for linear operators T(x) = Ax we have DT(x) = A, so the Jacobian condition (35.10) holds when det A ≠ 0.

Theorem 1645 A continuously differentiable operator f : Rⁿ → Rⁿ is a C¹-diffeomorphism onto Rⁿ if and only if it is proper and satisfies the Jacobian condition (35.10).
Proof We prove the "if" part up to a final step. For each integer k ≥ 1, consider the set

$$
L_k = \left\{y \in \mathbb{R}^n : \left|f^{-1}(y)\right| \geq k\right\}
$$

of known terms with at least k solutions. By the Inverse Function Theorem, each Lₖ is open. Let us show that each Lₖ is also closed. Let {yₘ} ⊆ Lₖ converge to y ∈ Rⁿ. For each m, there exist k distinct points x₁ᵐ, ..., xₖᵐ such that f(xᵢᵐ) = yₘ for i = 1, ..., k. Since f is proper and {yₘ} is bounded, these sequences are bounded; by passing to subsequences, we can assume that each {xᵢᵐ} converges to some limit point xᵢ*, with, by continuity,

$$
f(x_i^*) = y \qquad i = 1, \dots, k
\tag{35.11}
$$

These limit points are distinct. For, suppose per contra that xᵢ* = xⱼ* for some i ≠ j with 1 ≤ i, j ≤ k. Call x* this common limit point. Then, there exist distinct xᵢᵐ and xⱼᵐ, with yₘ = f(xᵢᵐ) = f(xⱼᵐ), that converge to x*. But this contradicts the fact that, by the Inverse Function Theorem, there exists a neighborhood B_ε(x*) such that f : B_ε(x*) → f(B_ε(x*)) is injective. We conclude that the limit points xᵢ* are distinct. In view of (35.11), we have

$$
\{x_1^*, \dots, x_k^*\} \subseteq f^{-1}(y)
$$

and so y ∈ Lₖ, as desired.
Each set Lₖ is thus both open and closed, so it is either empty or equal to Rⁿ. Now, let y ∈ Im f. Being f proper and everywhere regular, the set f⁻¹(y) is finite with, say, k elements. Thus, Lₖ = Rⁿ. As y ∉ Lₖ₊₁, we also have Lₖ₊₁ = ∅. Thus,

$$
\left\{y \in \mathbb{R}^n : \left|f^{-1}(y)\right| = k\right\} = L_k \setminus L_{k+1} = L_k = \mathbb{R}^n
$$

We conclude that there exists a natural number k such that

$$
\left|f^{-1}(y)\right| = k
\tag{35.12}
$$

for all y ∈ Rⁿ. It remains to prove that k = 1, a non-trivial task that we omit. Observe, however, that at this point we could conclude the proof by just making a grain of an injectivity assumption: that there exists some y ∈ Im f with singleton preimage f⁻¹(y), i.e.,

$$
\exists y \in \operatorname{Im} f \text{ such that } \left|f^{-1}(y)\right| = 1
\tag{35.13}
$$

In view of (35.12), this mild assumption implies right away that k = 1. In any event, having proved that k = 1, one concludes that f : Rⁿ → Rⁿ is a bijective function. By Proposition 1644, f is then a C¹-diffeomorphism, as desired.
Consider an equation f(x) = y₀ with solution domain Rⁿ, defined through a continuously differentiable and proper function f. If each point of Rⁿ is regular under f, the equation is well posed: for every possible known term y₀ ∈ Rⁿ, there exists a unique solution, given by x = f⁻¹(y₀). Since f⁻¹ is continuously differentiable, solutions change smoothly. At a theoretical level, questions (Q.i) and (Q.iii) are best answered in this case.¹³

The "if" part of the last theorem fails, in general, without the hypothesis that f is proper, as the next classic example shows.
Example 1646 Define f : R² → R² by f(x₁, x₂) = (e^{x₁} cos x₂, e^{x₁} sin x₂). This operator is continuously differentiable, with

$$
Df(x) = \begin{bmatrix} e^{x_1}\cos x_2 & -e^{x_1}\sin x_2 \\ e^{x_1}\sin x_2 & e^{x_1}\cos x_2 \end{bmatrix}
$$

Thus,

$$
\det Df(x) = e^{2x_1}\cos^2 x_2 + e^{2x_1}\sin^2 x_2 = e^{2x_1} > 0 \qquad \forall x \in \mathbb{R}^2
$$

and so condition (35.10) holds. However, this function is not proper. Indeed, if we take xₙ = (0, n), then ‖xₙ‖ = n but ‖f(xₙ)‖ = ‖(cos n, sin n)‖ = √(cos²n + sin²n) = 1, which does not diverge. Moreover, f is periodic in x₂, so it is not injective. N
The next example complements the previous one by showing that properness alone is not enough for the "if".

Example 1647 The function f : R² → R² defined by f(x) = (x₁² − x₂², 2x₁x₂) is proper (Example 1636). Its Jacobian matrix is

$$
Df(x) = \begin{bmatrix} 2x_1 & -2x_2 \\ 2x_2 & 2x_1 \end{bmatrix}
$$

As det Df(x) = 4(x₁² + x₂²) is zero at the origin, condition (35.10) fails. As remarked in Example 1624, this function is not injective. N
We close with a global implicit function theorem that Theorem 1645 makes possible; it is the result announced in Section 34.4.4.

Theorem 1648 Let g : Rⁿ⁺ᵐ → Rᵐ be continuously differentiable and proper, with

$$
\det D_y g(x, y) \neq 0 \qquad \forall (x, y) \in \mathbb{R}^{n+m}
\tag{35.14}
$$

Then there exists a unique continuously differentiable function f : Rⁿ → Rᵐ such that

$$
g(x, f(x)) = 0 \qquad \forall x \in \mathbb{R}^n
$$

with

$$
Df(x) = -\left(D_y g(x, y)\right)^{-1} D_x g(x, y) \qquad \forall x \in \mathbb{R}^n
\tag{35.15}
$$

where y = f(x).

Proof Define F : Rⁿ⁺ᵐ → Rⁿ⁺ᵐ by F(x, y) = (x, g(x, y)). Since g is proper, so is F. Indeed, if ‖(x, y)‖ → +∞, then ‖g(x, y)‖ → +∞, so ‖F(x, y)‖ → +∞ because ‖g(x, y)‖ ≤ ‖F(x, y)‖.

Since

$$
F_i(x, y) = x_i \qquad \forall i = 1, \dots, n
\quad\text{and}\quad
F_{n+j}(x, y) = g_j(x, y) \qquad \forall j = 1, \dots, m
$$
we have the block matrix

$$
DF(x, y) = \begin{bmatrix} I_n & 0 \\ D_x g(x, y) & D_y g(x, y) \end{bmatrix}
$$

where Iₙ is the identity matrix of order n. So,

$$
\det DF(x, y) = \det D_y g(x, y)
$$
By (35.14), det DF(x, y) ≠ 0 on Rⁿ⁺ᵐ, so by Theorem 1645 the operator F is a C¹-diffeomorphism of Rⁿ⁺ᵐ onto itself. Defining f : Rⁿ → Rᵐ through

$$
F^{-1}(x, 0) = (x, f(x)) \qquad \forall x \in \mathbb{R}^n
$$

we obtain g(x, f(x)) = 0 for all x ∈ Rⁿ. Since F⁻¹ is differentiable, it can be proved that this implies that f is differentiable. Differentiating the identity

$$
g(x, f(x)) = 0 \qquad \forall x \in \mathbb{R}^n
$$

via the chain rule yields D_x g(x, f(x)) + D_y g(x, f(x)) Df(x) = 0. So, formula (35.15) holds because condition (35.14) ensures that the matrix D_y g(x, f(x)) is invertible at all x ∈ Rⁿ.
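In the scalar case n = m = 1 the global implicit function of Theorem 1648 is easy to compute. A minimal sketch (plain Python; the equation g(x, y) = y³ + y − x = 0, with ∂g/∂y = 3y² + 1 never vanishing and associated F(x, y) = (x, g(x, y)) proper, is an illustrative choice):

```python
# Computing the global implicit function f of g(x, y) = y^3 + y - x = 0
# by Newton's method in y, and checking f'(x) = -g_x/g_y = 1/(3y^2 + 1).
def g(x, y):
    return y**3 + y - x

def f(x, y=0.0):
    for _ in range(50):                  # Newton iterations in y
        y = y - g(x, y) / (3 * y**2 + 1)
    return y

x0 = 2.0
y0 = f(x0)                               # y0 = 1, since 1 + 1 = 2
print(g(x0, y0))                         # ~0: (x0, y0) is on the level curve
print(1 / (3 * y0**2 + 1))               # formula (35.15): f'(x0) = 0.25
h = 1e-6
print((f(x0 + h) - f(x0 - h)) / (2 * h)) # finite-difference check
```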
35.5 Monotone equations

A last noteworthy property of an equation f(x) = y₀ is monotonicity. We say that the equation is monotone increasing when a unique solution exists for each known term and the inverse function f⁻¹ is strictly increasing:

$$
y_0' > y_0 \implies f^{-1}(y_0') > f^{-1}(y_0)
$$

that is, a larger known term results in a larger solution. The opposite is true when f⁻¹ is strictly decreasing. The monotonicity properties of the inverse function thus determine the concordance between the changes in the known term and the resulting changes in the solutions.
Consider, for instance, the linear equation

$$
Ax = b
\tag{35.16}
$$

defined via a linear operator T : Rⁿ → Rⁿ given by T(x) = Ax. For this equation, monotonicity amounts to the unique existence of positive solutions in correspondence of positive known terms, i.e.,

$$
b > 0 \implies A^{-1}b > 0 \qquad \forall b \in \mathbb{R}^n
\tag{35.17}
$$

This is a classic comparative statics exercise in economics.¹⁴

A straightforward application of Collatz's Theorem permits to characterize monotone equations.
Proposition 1650 Let A and B be two subsets of Rⁿ. An equation defined via a surjective map f : A → B is monotone increasing if and only if

$$
f(x) \geq f(x') \implies x \geq x' \qquad \forall x, x' \in A
\tag{35.18}
$$

A dual result holds for the decreasing case. We can illustrate this result with the linear equation (35.16). To this end, we need some matrix terminology. A square matrix A is:

(i) monotone if

$$
Ax \geq 0 \implies x \geq 0 \qquad \forall x \in \mathbb{R}^n
\tag{35.19}
$$

(ii) inverse-positive if it is non-singular and its inverse A⁻¹ is positive.

It is easy to check that a square matrix A is inverse-positive if and only if the linear operator T is invertible with a strictly increasing inverse operator T⁻¹. By Collatz's Theorem, a square matrix A thus satisfies condition (35.19) if and only if it is inverse-positive. In this case, (35.17) holds and so to a positive known term b > 0 there corresponds a unique positive solution A⁻¹b > 0.
To check the monotonicity of a linear equation (35.16) thus amounts to check whether its
matrix A is inverse-positive. This can be done by checking directly that A is non-singular and
has a positive inverse. For some classes of matrices, however, there are some computationally
convenient criteria that do not require the computation of the inverse matrix. For instance,
the next result of Hawkins and Simon (1949) characterizes through the positivity of principal
minors the inverse positivity of Leontief matrices, a class of matrices relevant, for instance,
in input-output and in general equilibrium analyses.15
We omit the proof of this result.16 The following simple example illustrates it.
2 3
A=
4 7
is positive. N
Proof The \if" follows from Proposition 1204. The converse is proved in Fiedler (1986) p.
114.
15
See, e.g., the discussion on gross substitution in Mas-Colell et al. (1995) p. 612.
16
We refer to Fiedler (1986) p. 114.
35.6. PARAMETRIC EQUATIONS AND IMPLICIT FUNCTIONS 1097
f (x; ) = y0 (35.20)
Sy0 ( ) = fx 2 A : f (x; ) = y0 g
It associates to each parameter value the corresponding solution set of equation (35.20). The
solution correspondence describes how solutions vary as the parameter varies. The previous
questions then become:
(QP.i) is the set Sy0 ( ) not empty for all 2 ? If so, is it a singleton?
(QP.ii) is the set Sy0 ( ) not empty for some 2 ? If so, is it a singleton?
(QP.iii) if Sy0 is, locally or globally, a function, is it continuous (or even di erentiable)?
f (Sy0 ( ) ; ) = y0
So, to answer the unique existence questions (QP.i) and (QP.ii) amounts to check whether
Sy0 is a function implicitly de ned, locally or globally, by equation (35.20). That is, Sy0
gives the functional representation of the level curve
1
f (y0 ) = f(x; ) 2 A : f (x; ) = y0 g
The study of the solutions of a parametric equation given a known term and the study of
the functional representation of a level curve are, mathematically, equivalent exercises.
In sum, to answer the questions (QP.i) and (QP.ii) we need suitable versions of the
Implicit Function Theorem: local versions of such theorem would give local answers, while
global versions would give global answers. In any case, a deja vu: in our discussions of
implicit functions we already (implicitly) took this angle, which in economics is at heart
of comparative statics analysis (cf. Section 34.4.3). Indeed, conditions that ensure the
existence, at least locally, of a solution function Sy0 : ! Rn permit to e ectively describe
how solutions { the endogenous variables { react to changes in the parameters { the exogenous
variables.
1098 CHAPTER 35. EQUATIONS AND INVERSE FUNCTIONS
For brevity, we leave readers to revisit those discussions through the lenses of this section.
To help them, we close with a consequence of the operator version of the Implicit Function
Theorem (Theorem 1598) that deals with a parametric homogeneous equation
f (x; ) = 0 (35.21)
Theorem 1655 (Peano) There exists a continuous and surjective map f : Rn ! Rm for
all m; n 2 N.
The basic case of a Peano map f : R ! R2 is already striking: the plane is the continuous
image of the real line. By just following a scalar prescription, the Peano map enables you
to ll the plane without ever lifting the pencil. Formally, to every point y = (y1 ; y2 ) of the
plane corresponds a scalar x such that (f1 (x) ; f2 (x)) = (y1 ; y2 ).
Proof The case m = n = 1 is trivial (just take the identity function).17 In his original
work Peano constructed a highly non-trivial, here omitted, example of a continuous and
surjective map { the so-called Peano curve { from the closed unit interval [0; 1] to the closed
unit square [0; 1] [0; 1]. In a variation on Peano's theme, it is possible to show that there
exists a continuous and surjective map from the real line to the plane. By building on this,
next we show by induction that there exists a continuous and surjective map f1;m : R ! Rm
for all m 2. Initial step: as just remarked, there exists a continuous and surjective map
f1;2 : R ! R2 . Induction step: suppose now that there exists a continuous and surjective
map f1;m 1 : R ! Rm 1 . De ne f2;m : R2 ! Rm by
17
This proof is based on Section 1.8 of Aron et al. (2016), to which we refer for missing details.
35.8. ULTRACODA: EQUATIONS IN SCIENCE 1099
Clearly, this function is continuous. To show that it is also surjective, take any y =
(y1 ; y2 ; :::; ym ) 2 Rm . By the induction hypothesis, there exists a scalar a such that fm 1 (a) =
(y2 ; :::; ym ). By setting x = (y1 ; a) 2 R2 , we thus have f2;m (x) = y. We conclude that
Im f2;m = Rm . The composition f2;m f1;2 : R ! Rm is, therefore, a continuous and
surjective map. By setting f1;m = f2;m f1;2 , this completes the induction argument.
To conclude the proof it is enough to de ne fn;m : Rn ! Rm by fn;m (x1 ; :::; xn ) =
f1;m (x1 ). This function is continuous and surjective.
In sum, any two Euclidean spaces Rn and Rm are the continuous image (Peano) as well
the bijective image (Cantor) one of the other, regardless of their dimension. Peano's map is
not injective. Cantor's map is not continuous. Still, after all these surprises, one is left to
wonder whether it is possible to combine their striking ndings by showing the existence of
an homeomorphism between Euclidean spaces of di erent dimension.
In 1911, fortunately, Brouwer showed that this is not possible. Indeed, as a direct conse-
quence of his Domain Invariance Theorem we have the following result.
Theorem 1656 (Brouwer) Two spaces Rn and Rm , with n 6= m, are not homeomorphic.
At last, topology vindicates intuition: spaces with di erent dimensions are similar set-
theoretically but not topologically. In particular, continuous maps f : Rn ! Rm are either
surjective or injective when n 6= m, they cannot enjoy simultaneously both properties.18 A
basic topological failure characterizes a change in dimension.
Proof Let n 6= m, say n < m. Suppose, per contra, that there exists an homeomorphism
f : Rn ! Rm . De ne : Rn ! Rm by (x1 ; :::; xn ) = (x1 ; :::; xn ; 0; :::; 0). As (Rn ) Rm ,
this continuous map embeds Rn into Rm . In particular,
(Rn ) = f(x1 ; :::; xn ; 0; :::; 0) : (x1 ; :::; xn ) 2 Rn g
is a closed subset of Rm (with empty interior). Thus, the set f 1 ( (Rn )) is closed in Rn
because f is continuous. But, the map f 1 is continuous, so by the Domain Invariance
Theorem the set f 1 ( (Rn )) is also open. As f 1 ( (Rn )) is neither empty nor equal to
Rn , we reached a contradiction. We conclude that there is no homeomorphism between Rn
and Rm .
Summing up, we conclude { on the shoulders of Cantor, Peano and Brouwer { that the
dimension of an Euclidean space is a topological, but not a set-theoretic, invariant.
We can consider four main problems about a scienti c inquiry described by a triple (X; Y; M ).
We formalize them by means of the evaluation function g : X M ! Y de ned by g (x; m) =
m (x) that relates causes, e ects and models through the expression
y = g (x; m) (35.22)
(i) Direct problems: Given a model m and a cause x, what is the resulting e ect y?
formally, which is the (unique) value y = g (x; m) given x 2 X and m 2 M ?
(ii) Causation problems: Given a model m and an e ect y, what is the underlying cause
x? formally, which are the (possibly multiple) values of x that solve equation (35.22)
given y 2 Y and m 2 M ?
(iii) Identi cation problems: Given a cause x and an e ect y, what is the underlying model
m? formally, which are the (possibly multiple) values of m 2 M that solve equation
(35.22) given x 2 X and y 2 Y ?
(iv) Induction problems: Given an e ect y, what are the underlying cause x and model m?
formally, which are the (possibly multiple) values of x 2 X and m 2 M that solve
equation (35.22) given x 2 X?
The latter three problems { causation, identi cation and induction { are formalized by
regarding (35.22) as an equation. For this reason, we call them inverse problems.19 We can
thus view the study of equations as a way to address such problems. In this regard, note
that:
1. In causation and identi cation problems, the equation (35.22) is parametric. In the
former problem, x is the unknown, y is the known term and m is a parameter; in the
latter problems, m is the unknown, y is the known term and x is a parameter.
2. In induction problems, y is the known term of equation (35.22), while x and m are the
unknowns.
Example 1657 Consider an orchard with several apple trees that produce a quantity of
apples according to the summer weather conditions; in particular, the summer could be
either cold or hot or mild. Here m is an apple tree that belongs to the collection M of the
apple trees of the orchard, y is the apple harvest with Y = [0; 1), and x is the average
summer temperature with X = [0; 1). We interpret m (x) as the quantity of apples that
the tree m produces when the summer weather is x. The trees in the orchard thus di er in
their performance in the di erent weather conditions.
In this example the previous four problems takes the form:
(i) Given a tree m and an average summer temperature x, what is the resulting apple
harvest y?
19
In this chapter we considered the case X; Y Rn , but the study of equations can be carried out more
generally, as readers will learn in more advanced courses.
35.8. ULTRACODA: EQUATIONS IN SCIENCE 1101
(ii) Given a tree m and an apple harvest y, what is the underlying average summer tem-
perature x?
(iii) Given an average summer temperature x and an apple harvest y, what is the underlying
tree m?
(iv) Given an apple harvest y, what are the underlying average summer temperature x and
tree m? N
1102 CHAPTER 35. EQUATIONS AND INVERSE FUNCTIONS
Chapter 36
Study of functions
It is often useful to have, roughly, a sense of how a function looks like. In this chapter we
will outline a qualitative study of functions. To this end, we rst introduce couple of classes
of points.
A dual de nition holds for (strict) convexity at a point. From Corollary 1438 it immedi-
ately follows the next result.
Example 1660 (i) The function f : R ! R given by f (x) = 2x2 3 is strictly convex at
each point because f 00 (x) = 4 > 0 at each x. (ii) The function f : R ! R given by f (x) = x3
is strictly convex at x0 = 5 since f 00 (5) = 30 > 0, and it is strictly concave at x0 = 1 since
f 00 ( 1) = 6 < 0. N
1103
1104 CHAPTER 36. STUDY OF FUNCTIONS
5 10
y y
0 f(x )
0
6
-5 4 f(x )
0
-10
O x x O x x
0 0
-15 -2
0 1 2 3 4 5 6 -1 0 1 2 3 4 5 6 7
O.R. Like the rst derivative of a function at a point gives information on its increase or
decrease, so the second derivative gives information on concavity or convexity at a point.
The greater jf 00 (x0 )j, the more pronounced the curvature (the \belly") of f at x0 { and the
\belly" is upward if f 00 (x0 ) < 0 and downward if f 00 (x0 ) > 0, as the previous gure shows.
Economic applications often consider the ratio
f 00 (x0 )
f 0 (x0 )
which does not depend on the unit of measure of f (x) (cf. Section 31.1.4). Indeed, let T and
S be the units of measure of the dependent and independent variables, respectively. Then,
the units of measure of f 0 and of f 00 are T =S and T =S 2 , so the unit of measure of f 00 =f 0 is
T
S2 1
T
=
S
S
In short, in an in ection point the \sign" of the concavity of the function changes. By
Proposition 1659, we have the following simple result.
(ii) If f 00 (x0 ) = 0 and f 000 (x0 ) 6= 0, then x0 is an in ection point for f (provided f is three
times continuously di erentiable at x0 ).
36.2. ASYMPTOTES 1105
Example 1663 (i) The origin is an in ection point of the cubic function f (x) = x3 . (ii)
2 2
Let f : R ! R be the Gaussian function f (x) = e x . Then f 0 (x) = 2xe x and f 00 (x) =
2
4x2 2 e x , so the function is concave for
1 1
p <x< p
2 2
p p
and convex
p for jxj > 1= 2. The two points 1= 2 are therefore in ection points. Indeed,
f 00 ( 1= 2) = 0. We will continue the study of this function later in the chapter in Section
36.4. N
For di erentiable functions, geometrically at a point of in ection x0 the tangent line cuts
the graph: it cannot lie (locally) above or below it. In particular, if f 0 (x0 ) = f 00 (x0 ) = 0
then the tangent line is horizontal and cuts the graph of the function: we talk of a point of
in ection with horizontal tangent.
Example 1664 The origin is an in ection point with horizontal tangent of the cubic func-
tion, as well as of any function f (x) = xn with n odd. N
36.2 Asymptotes
Intuitively, an asymptote is a straight line to which the graph of a function gets arbitrarily
close. Such straight lines can be vertical, horizontal, or oblique.
(i) When at least one of the two following conditions is satis ed:
lim f (x) = +1 or 1
x!x+
0
lim f (x) = +1 or 1
x!x0
(ii) When
lim f (x) = L (or lim f (x) = L)
x!+1 x! 1
(iii) When
lim (f (x) ax b) = 0 (or lim (f (x) ax b) = 0)
x!+1 x! 1
that is, when the distance between the function and the straight line y = ax + b tends
to 0 as x ! +1 (or ! 1), the straight line of equation y = ax + b is an oblique
asymptote for f to +1 (or to 1).
Horizontal asymptotes are actually the special case of oblique asymptotes with a = 0.
Moreover, it is evident that there can be at most one oblique asymptote as x ! 1 or as
x ! +1. It is, instead, possible that f has several vertical asymptotes.
1106 CHAPTER 36. STUDY OF FUNCTIONS
7
f (x) = 3
x2 +1
with graph
2
y
1.5
0.5
-0.5
-1
-1.5
O x
-2
-2.5
-3
-3.5
-5 0 5
Since limx!+1 f (x) = limx! 1 f (x) = 3; the straight line y = 3 is both a right and a
left horizontal asymptote for f (x). N
1
f (x) = +2
x+1
with graph
8
y
6
0
O x
-2
-4
-5 0 5
1
f (x) =
x2 +x 2
36.2. ASYMPTOTES 1107
with graph
3
y
0
O x
-1
-2
-3
-4 -3 -2 -1 0 1 2 3 4 5
Since limx!1+ f (x) = +1 and limx!1 f (x) = 1, the straight line x = 1 is a vertical
asymptote for f (x). Moreover, since limx! 2+ = 1 and limx! 2 = +1, also the
straight line x = 2 is a vertical asymptote for f (x). N
2x2
f (x) =
x+1
with graph
20
y
15
10
0
O x
-5
-10
-15
-20
-6 -4 -2 0 2 4 6
Vertical and horizontal asymptotes are easily identi ed. We thus shift our attention to
oblique asymptotes. To this end, we provide two simple results.
Proof \If". When f (x) =x ! a, consider the di erence f (x) ax. If it tends to a nite
limit b, then (and only then) f (x) ax b ! 0. \Only if". From f (x) ax b ! 0 it
follows that f (x) ax ! b and, by dividing by x, that f (x) =x a ! 0.
Proposition 1669 gives a necessary and su cient condition for the search of oblique
asymptotes, while Proposition 1670 only provides a su cient condition. To use this latter
condition, the limits involved must exist. In this regard, consider the following example.
cos x2
f (x) = x +
x
as x ! 1 we have
f (x) cos x2
=1+ !1
x x2
and
cos x2
f (x) x= !0
x
Therefore, y = x is an oblique asymptote of f as x ! 1. Nevertheless, the rst derivative
of f is
2x2 sin x2 cos x2 cos x2
f 0 (x) = 1 + = 1 2 sin x2
x2 x2
It is immediate to verify that the limit of f 0 (x) as x ! 1 does not exist. N
and as x ! +1
r 1 !
p 1 1 2
f (x) x= x2 x x=x 1 x=x 1 1
x x
1
1
1 x
2
1 1
= 1 !
x
2
Therefore,
1
y=x
2
is an oblique asymptote as x ! +1 for f . N
(i) If f (x) = g (x) + h (x) and h (x) ! 0 as x ! 1, then f and g share the possible
oblique asymptotes.
p 1 a1
y= n
a0 x +
n a0
Let us verify only (ii) for n odd (for n even the calculations are analogous). If n is odd
as x ! 1, we have
p q
n
a xn n 1 + a1 + ::: + an
f (x) 0 a0 x a0 x p
= ! n a0
x x
p
hence the slope of the oblique asymptote is n a0 . Moreover
" 1 #
p p a1 xn 1+ ::: + an n
f (x) n
a0 x = n
a0 x 1+ 1 =
a0 xn
1
a 1 xn 1 +:::+a n
1+ a 0 xn
n
1
p a1 xn 1+ ::: + an
= n
a0 x a 1 xn 1 +:::+a
a0 xn n
a 0 xn
1110 CHAPTER 36. STUDY OF FUNCTIONS
Since as x ! 1
1
a 1 xn 1 +:::+a n
1+ a 0 xn
n
1
1 p a1 xn 1 + ::: + an p a1
a1 xn 1 +:::+an
! and n
a0 x ! n
a0
n a0 xn a0
a 0 xn
we have, as x ! 1,
p p a1 1
f (x) n
a0 x ! n
a0
a0 n
In the previous example we had n = 2, a0 = 1, and a1 = 1. Indeed, as x ! +1, the
asymptote had the equation
p
2 1 1 1
y= 1 x+ =x
2 1 2
(i) We rst calculate the limits of f at the boundary points of the domain, and also as
x ! 1 when A is unbounded.
(ii) We determine the sets on which the function is positive, f (x) 0, increasing, f 0 (x)
0, and concave/convex, f 00 (x) Q 0. Once it is also determined the intersections of the
graph with the axes by nding the set f (0) on the vertical axis and the set f 1 (0) on
the horizontal axis, we begin to have a rst idea of its graph.
(iii) We look for candidate extremal points via rst and second-order conditions (or, more
generally, via the omnibus procedure of Section 29.2.2).
(iv) We look, via the condition f 00 (x) = 0, for candidate in ection points; they are certainly
so if at them f 000 6= 0 (provided f is three times continuously di erentiable at x).
Example 1674 Let f : R ! R be given by f (x) = x6 3x2 + 1. We look for possible local
extremal points. The rst-order condition f 0 (x) = 0 has the form
6x5 6x = 0
minimizers. From limx!+1 f (x) = limx! 1f (x) = +1 if follows that the graph of this
function is:
2
y
1.5
0.5
0
O x
-0.5
-1
-2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5
N
Example 1675 De ne f : [0; 1) ! R by
(
x log x x>0
f (x) =
0 x=0
This function is continuous (why at the origin?) and is zero at the points 0 and 1. In view
of Example 1685, it is strictly concave and its unique maximizer is the point 1=e. Since
limx!+1 f (x) = 1, we conclude that the graph of f is
N
Example 1676 Let f : R ! R be given by f (x) = x3 7x2 + 12x. We have
lim f (x) = 1 , lim f (x) = +1
x! 1 x!+1
3. Since f 00 (x) = 6x 14, it is zero for x = 7=3. The second derivative is 0 when
x 7=3.
00 ((7
p
4. Since
p f 13)=3) < 0, the point is a local maximizer; since instead f 00 ((7 +
13)=3) > 0, the point is a local minimizer. Finally, the point 7=3 is of in ection.
10
y
8
0
O x
-2
-3 -2 -1 0 1 2 3 4 5 6 7
Example 1677 Let f : R ! R be given by f (x) = xex . Its limits are limx! 1 xe
x =0
and limx!+1 xex = +1. We then have:
1. f (x) 0 () x 0.
2. f 0 (x) = (x + 1) ex 0 () x 1.
3. f 00 (x) = (x + 2) ex 0 () x 2.
4. f (0) = 0, so the origin is the unique point of intersection with the axes.
36.3. STUDY OF FUNCTIONS 1113
10
9
y
8
0
O x
-1
-6 -4 -2 0 2 4 6
lim x2 ex = 0+ , lim x2 ex = +1
x! 1 x!+1
We then have:
p p
3. f 00 (x) = x2 + 4x + 2 ex 0 () x 2 ( 1; 2 2] [ [ 2 + 2; +1).
p
5. The two points of abscissae 2 2 are in ection points.
1114 CHAPTER 36. STUDY OF FUNCTIONS
8 y
7
0
O x
-1
-4 -3 -2 -1 0 1 2 3 4 5
lim x3 ex = 0 , lim x3 ex = +1
x! 1 x!+1
1. f (0) = 0; f (x) 0 () x 0.
p p
3. f 00 (x) = x3 + 6x2 + 6x ex 0 () x 2 3 3; 3 + 3 [ [0; 1).
p
5. The three points of abscissae 3 3 and 0 are in ection points.
36.3. STUDY OF FUNCTIONS 1115
8
y
7
0
O x
-1
-2
-6 -5 -4 -3 -2 -1 0 1 2 3
1
f (x) = 2x + 3 +
x 2
1. f (0) = 3 0:5 = 2:5; we have f (x) = 0 when (2x + 3) (x 2) = 1, that is, when
2x2 x 5 = 0, i.e., for
p
1 41
x= ' 1:35 and 1:85
4
3. Since
2
f 00 (x) =
(x 2)3
is positive
p for every x p
> 2 and negative for every x < 2, the two stationary points
2 + (1= 2) and 2 (1= 2) are, respectively, a local minimizer and a local maximizer.
1116 CHAPTER 36. STUDY OF FUNCTIONS
1
lim [f (x) 2x] = lim 3+ =3
x! 1 x! 1 x 2
25
20 y
15
10
0
O x
-5
-10
-15
-20
-25
-5 0 5 10
Note that
1
f (x) as x ! 2 and f (x) 2x + 3 as x ! 1
x 2
Thus, near the point 2 the function f (x) behaves like 1= (x 2), i.e., it diverges, while
for x su ciently large it behaves like the straight line 2x + 3. N
36.4 Bells
Gaussian function Let f : R ! R be the Gaussian function
x2
f (x) = e
Both limits, as x ! 1, are 0. So, the horizontal axis is a horizontal asymptote. The
function is always strictly positive and f (0) = 1. Next, we look for possible local extremal
2
points. The rst-order condition f 0 (x) = 0 has the form 2xe x = 0, so the origin x = 0
is the unique critical point. The second derivative is
x2 x2 x2
f 00 (x) = 2e + ( 2x) e ( 2x) = 2e 2x2 1
by Proposition 1343 the origin is actually a strong global maximizer. Moreover, we have
1 1
f 00 (x) < 0 () 2x2 1 < 0 () x 2 p ;p
2 2
1
f 00 (x) = 0 () 2x2 1 = 0 () x = p
2
1 1
f 00 (x) > 0 () 2x2 1 > 0 () x 2 1; p [ p ; +1
2 2
p
So, the
p points
p x = 1= 2 are in ection points, with fp concave on
p the open interval
( 1= 2; 1= 2) and convex on the open intervals ( 1; 1= 2) and (1= 2; +1). The graph
of the function is the famous Gaussian bell:
y
1.5
0.5
0
O x
-0.5
-1
-2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5
Versiera of Agnesi The versiera of Agnesi is a curve that can be constructed as follows.1
Consider a circle of radius a > 0 centered at the point (0; a) and a straight line of equation
y = 2a. Take all straight lines that pass through the origin 0. Each of them intersects the
straight line y = 2a at a unique point A = (A1 ; A2 ). In turn, the segment 0A intersects
twice the circle, at the origin and at a point B = (B1 ; B2 ). The versiera is the locus of
points (A1 ; B2 ) that have as a rst coordinate that of A and as a second coordinate that of
1
It is named after Maria Agnesi, who studied it in her 1748 book Instituzioni analitiche.
1118 CHAPTER 36. STUDY OF FUNCTIONS
B. Graphically:
y
1.5
A A A
3 1 2
1
0.5
B B
3
1
0 B
2
-0.5
-1
O x
-1.5
-2
-2 -1 0 1 2
8a3
f (x) = 8x 2 R
x2 + 4a2
Remarkably, its graph is bell-shaped and reminds that of the Gaussian function:
5
y
4
0
O x
-1
-5 0 5
Indeed, we have
8a3
f 0 (x) = 2x
(x2 + 4a2 )2
36.4. BELLS 1119
which is zero only at the origin x = 0. Since xf 0 (x) < 0 for all x 6= 0,2 the origin is the
(global) maximizer. We also have
2
That is, f 0 (x) > 0 for all x < 0 and f 0 (x) < 0 for all x > 0.
1120 CHAPTER 36. STUDY OF FUNCTIONS
Part VII
Di erential optimization
1121
Chapter 37
Unconstrained optimization
1123
1124 CHAPTER 37. UNCONSTRAINED OPTIMIZATION
S = fx 2 C : rf (x) = 0g
f (^
x) f (x) 8x 2 S (37.2)
then x
^ is a solution for the optimization problem (37.1).
In other words, once the conditions for Tonelli's Theorem to be applied are veri ed, one
constructs the set of critical points S. A point x ^ in S where f attains its top value is a
solution of the optimization problem and f (^x) is the maximum value of f on C.
N.B. If the function f is twice continuously di erentiable, in phase 1 instead of S one can
consider the subset S2 S of the critical points that satisfy the second-order necessary
condition (Sections 28.5.3 and 29.3.3). O
The rationale of the elimination method is simple. By Fermat's Theorem, the set S
consists of all points in C which are candidate local solutions for the optimization problem
(37.1). On the other hand, if f is continuous and coercive on C, by Tonelli's Theorem there
exists at least a solution for this optimization problem. Such a solution must belong to
the set S (as long as it is non-empty) because a solution of the optimization problem is, a
fortiori, a local solution. Hence, the solutions of the \restricted" optimization problem
are also solutions of the optimization problem (37.1). But, the solutions of the restricted
problem (37.3) are the points x^ 2 S for which condition (37.2) holds, which are then the
solutions of optimization problem (37.1), as phase 2 of the elimination method states.
As the following examples show, the elimination method elegantly and e ectively com-
bines Tonelli's global result with Fermat's local one. Note how Tonelli's Theorem is crucial
since in unconstrained di erential optimization problems the choice set C is open, so Weier-
strass' Theorem inapplicable (as it requires C to be compact).
The smaller is the set S of critical points, the better the method works in that phase 2
requires a direct comparison of f at all points of S. For this reason, the method is particularly
e ective when we can consider, instead of S, its subset S2 consisting of all critical points
which satisfy the second-order necessary condition.
2
Example 1681 Let f : Rn ! R be given by f (x) = (1 kxk2 )ekxk and let C = Rn . The
function f is coercive on Rn . Indeed, it is supercoercive: by taking tn = kxn k, it follows that
2 2
f (xn ) = 1 kxn k2 ekxn k = 1 t2n etn ! 1
for any sequence fxn g of vectors such that tn = kxn k ! +1. Since it is continuous, f is
coercive on Rn by Proposition 1019. The unconstrained di erential optimization problem
2
max 1 kxk2 ekxk sub x 2 Rn (37.4)
x
37.2. COERCIVE PROBLEMS 1125
of Example 1341. Let us check that this di erential problem is coercive. By setting g (x) = ex
and h (x) = x4 x2 , it follows that f = g h. We have limx! 1 h (x) = limx! 1 x4 +x2 =
1. So, by Proposition 1019 the function h is coercive on R. Since g is strictly increasing,
the function f is a strictly increasing transformation of a coercive function. By Proposition
1007, f is coercive.
This unconstrained di erential optimization problem is thus coercive and can be solved
with the elimination method.
p p
Phase 1: From Example 1341 we know that S2 = 1= 2; 1= 2 .
p p p
Phase 2: We have f ( 1= 2) = f (1= 2), so both points x ^ = 1= 2 are solutions of the
unconstrained optimization problem. The elimination method allowed us to identify the
nature of such points, something not possible by using solely di erential methods as in
Example 1341. N
is said to be concave if the set C A is both open and convex and if the function f : A
Rn ! R is both di erentiable and concave on C.
As we learned earlier in the book (Section 31.5.1), in a such a problem the rst-order
condition rf (^ x) = 0 becomes necessary and su cient for a point x ^ 2 C to be a solution.
This remarkable property explains the importance of concavity in optimization problems.
But, more is true: by Theorem 1032, such a solution is unique if f is strictly quasi-concave.
Besides existence, also the study of the uniqueness of solutions { key for comparative statics
exercises { is best carried out under concavity.
The necessary and su cient status of the rst-order condition leads to the concave (elim-
ination) method to solve the concave problem (37.6). It consists of a single phase:
1. Find the set S = fx 2 C : rf (x) = 0g of the stationary points of f on C; all, and only,
the points x
^ 2 S solve the optimization problem.
It requires the concavity of the objective function, a demanding condition that, however, is
often assumed in economic applications, as remarked before.3
Example 1685 Let f : (0; 1) ! R be given by x log x and let C = (0; 1). The function
f is strictly concave since f 00 (x) = 1=x < 0 for all x > 0 (Corollary 1438). Let us solve the
concave problem
max x log x sub x > 0 (37.7)
x
We have
1
f 0 (x) = 0 () log x = 1 () elog x = e 1
() x =
e
According to the concave method, x
^ = 1=e is the unique solution of problem (37.7). N
Example 1686 Let f : R2 ! R be given by f (x) = 2x2 3xy 6y 2 and let C = R2 . The
function f is strictly concave since the Hessian
4 3
3 12
We have
4x 3y = 0
rf (x) = 0 () () x = (0; 0)
12y 3x = 0
By the concave method, the origin x
^ = (0; 0) is the unique solution of problem (37.8). N
Example 1687 For bundles with two goods, the Cobb-Douglas utility function u : R2+ ! R
is u (x1 ; x2 ) = xa1 x12 a , with a 2 (0; 1). Consider the consumer problem
w
max f (x1 ) sub x1 2 0;
x1 p1
Since f (0) = f (w=p1 ) = 0 and f 0, the maximizers are easily seen to belong to the open
interval (0; w=p1 ). Therefore, we can consider the nicer unconstrained problem
w
max f (x1 ) sub x1 2 0;
x1 p1
a p1 1 (w p1 x1 )
g 0 (x1 ) = 0 () = (1 a) w p1 x1 () a = p1 (1 a)
x1 p2 p2
x1
Since g is easily checked to be strictly concave, by the concave method the unique maximizer
is
w
x
^1 = a
p1
By replacing it in the budget constraint, we conclude that
w w
x
^= a ; (1 a)
p1 p2
The last example shows that via some little \cogito ergo solvo" it may possible to reduce
a constrained optimization problem into a simpler unconstrained one. Next we give another
simple illustration of this important fact.
The function f is positive on [0; 1], strictly positive on (0; 1) and zero at the points 0 and 1.
So, any solution of this problem must belong to (0; 1) and we can thus consider the simpler
unconstrained problem
max f (x) sub x 2 (0; 1)
x
In view of Example 1685, x^ = 1=e is the unique solution of problem (37.10). This nding is
consistent with the graph of f described in Example 1675, i.e.,
1. The two classes are not exhaustive: there are unconstrained di erential optimization
problems which are neither coercive nor concave. For example, the unconstrained
di erential optimization problem
is neither coercive nor concave: the cosine function is neither coercive on the real line
(see Example 1006) nor concave. Nonetheless, the problem is trivial: as one can easily
infer from the graph of the cosine function, its solutions are the points x = 2k con
k 2 Z. As usual, common sense gives the best guidance in solving any problem (in
particular, optimization ones), more so than any classi cation.
2. The two classes are not disjoint: there are unconstrained di erential optimization prob-
lems which are both coercive and concave. For example, the unconstrained di erential
optimization problem
max 1 x2 sub x 2 R
x
1130 CHAPTER 37. UNCONSTRAINED OPTIMIZATION
is both coercive and concave: the function 1 x2 is indeed both coercive (see Example
1010) and strictly concave on the real line. In cases such as this one, we use the more
powerful concave method.5
3. The two classes are distinct: there are unconstrained di erential optimization problems
which are coercive but not concave, and vice versa.
3
y
1
1
0
O x
-1
-2
-3
-3 -2 -1 0 1 2 3 4 5
shows how it is concave, but not coercive. The optimization problem is thus
concave, but not coercive.
(b) The unconstrained di erential optimization problem
x2
max e sub x 2 R
x
is coercive but not concave: the Gaussian function e x2 is indeed coercive (Ex-
5
As coda readers may have noted, this objective function is strongly concave. Indeed, it is for such a class
of concave functions that the overlaps of the two classes of unconstrained di erential optimization problems
works at best.
37.5. RELAXATION 1131
ample 1008) but not concave, as its famous bell graph shows
y
2.5
1.5
0.5
0
O x
-0.5
-1
-4 -3 -2 -1 0 1 2 3 4
37.5 Relaxation
An optimization problem
max f (x) sub x 2 C
x
with objective function f : A Rn ! R may be solved by relaxation, that is, by considering
an ancillary optimization problem
which is characterized by a larger choice set C B A which is, however, analytically more
convenient (for example it may be convex or open), so that the relaxed problem becomes
coercive or concave. If, crossing ngers, a solution of the relaxed problem happens to belong
to the original choice set C, it automatically solves the original problem as well. The following
examples should clarify this simple yet powerful idea, which may permit to solve optimization
problems that are neither coercive nor concave.
where Qn+ is the set of vectors in Rn whose coordinates are rational and positive. An obvious
relaxing of the problem is
2
max 1 kxk2 ekxk sub x 2 Rn
x
whose choice set is larger yet analytically more convenient. Indeed the relaxed problem is
coercive and a simple application of the elimination method shows that its solution is the
^ = 0 (Example 1681). Since it belongs to Qn+ , we conclude that the origin is also
origin x
the unique solution of problem (37.11). It would have been far more complex to reach such
a conclusion by studying the original problem directly.
1132 CHAPTER 37. UNCONSTRAINED OPTIMIZATION
Exercise 1690 (ii) Consider the consumer problem with log-linear utility
n
X
max ai log xi sub x 2 C (37.12)
x
i=1
where C = B (p; w) \ Qn is the set of bundles with rational components (a realistic assump-
tion). Consider the relaxed version
n
X
max ai log xi sub x 2 B (p; w)
x
i=1
with a larger yet convex { thus analytically more convenient { choice set. Indeed, convexity
itself allowed us to conclude in Section 22.6 that the unique solution of the problem is the
bundle x ^ such that x
^i = ai w=pi for every good i = 1; :::; n. If ai ; pi ; w 2 Q for every i, the
bundle x ^ belongs to C, so is the unique solution of problem (37.12). It would have been far
more complex to reach such a conclusion by studying problem (37.12) directly. N
In sum, it is sometimes convenient to ignore some of the constraints that the choice
features set when doing so makes the choice set larger yet more analytically tractable, in the
hope that some solutions of the relaxed problem belong to the original choice set.
We close with an example, based on an example of Leonida Tonelli, that nicely illustrates
some of the themes explored so far in this chapter.
Example 1691 We want to divide a natural number n in three parts n1 , n2 , and n3 such
that the sum of their cubes is minimal. To solve this problem amounts to solving the following
optimization problem:
sub n1 + n2 + n3 = n and n1 ; n 2 ; n 3 2 N
sub x1 + x2 + x3 = n and x1 ; x2 ; x3 2 R+
C = x 2 R2+ : x1 + x2 n
37.5. RELAXATION 1133
Its interior is the convex set int C = x 2 R2++ : x1 + x2 < n . The unconstrained optimiza-
tion problem
max f (x1 ; x2 ) sub (x1 ; x2 ) 2 int C (37.14)
x1 ;x2
f (x) = y0 (37.16)
If a vector x x) y0 k2 =
^ 2 A solves equation (37.16), then it solves problem (37.17). Indeed, kf (^
0. The converse is false because the optimization problem might have solutions even though
the equation has no solutions. Even in this case, however, the optimization connection is
important because the solutions of the optimization problems are the best approximations
{ i.e., the best surrogates { of the missing solutions. A classic example is a system of linear
equations Ax = b, which has the form (37.17) via the linear function f (x) = Ax de ned on
Rn and the known term b 2 Rm , i.e.,
In this case (37.17) is a least squares problem and, when the system has no solutions, we
have the least squares solutions studied in Section 22.10.
In sum, the solutions of the optimization problem (37.17) are candidate solutions of equa-
tion (37.16). If they turn out not to be solutions, they are nevertheless best approximations.
As to problem (37.17), assume that the image of f is a closed convex set of Rn . Consider
the auxiliary problem
min ky y 0 k2 sub y 2 Im f
y
By the general Projection Theorem (Section 31.6), there is a unique solution y^ 2 Im f , which
is characterized by the condition
(y0 y^) (^
y y) 0 8y 2 Im f
that admits at least a solution, i.e., arg maxx2C f (x) 6= ;. To ease notation, we denote the
maximum value by f^ = maxx2C f (x).
In words, a sequence fxn g in the choice set is relaxing if the objective function assumes
larger and larger values, so it gets closer and closer to the maximum value f^, as n increases.
The following notion gives some computational content to problem (37.19).
xn+1 = h (xn )
1
f^ f (xn ) = O
nk
The sequence of iterates fxn g is de ned recursively via h. Their images f (xn ) converge
to the maximum value f^ at a rate k, that is, there is a constant c > 0 such that
c
f^ f (xn )
nk
We consider the convergences of images f (xn ) because one should be primarily interested
in getting, as fast as possible, to values that are almost optimal. Indeed, solutions have per
se only an instrumental role, ultimately what matters is the value that they permit to attain.
In particular, given a threshold " > 0, iterates xn are "-optimal if
1
c k
n
"
So, if we are willing to accept an " deviation from the maximum value, it is enough to
1
perform (c=") k iterates.
6
We refer interested readers to Nesterov (2004) for a authoritative presentation of this topic.
1136 CHAPTER 37. UNCONSTRAINED OPTIMIZATION
^k2
2 kx0 x
f^ f (xn ) (37.22)
n
for the sequence fxn g of its iterates.
Thus, objective functions that are -smooth and concave have a optimal decision pro-
cedure (37.21), called gradient descent, with unitary speed { i.e., with O (1=n) errors. The
gradient descent procedure prescribes that, if at x we have @f (x) =@xi > 0 (resp., < 0), in
the next iterate we increase (resp., decrease) the component i of the vector x. If one draws
the graph of a scalar concave function, the intuition behind this rule should be apparent.7
This rule reminds a basic rule of thumb when trying to reach the peak of a mountain: at a
crossroad, always take the rising path.
The proof relies on the following lemma of independent interest (it is a rst-order ap-
proximation with integral remainder).
By (44.64), we have
Z 1 Z 1
0
f (y) f (x) = (1) (0) = (t) dt = rf (x + t (y x)) (y x) dt
0 0
as desired.
The next lemma reports some important inequalities for -smooth functions.
Proof of Theorem 1694 Set g = f . Clearly, also the function g is -smooth. Since g is
convex, we then have
f^ f (xn ) rf (xn+1 ) (^
x xn ) krf (xn )k kxn x
^k kx0 x
^k krf (xn )k
By (37.27), 0 dn+1 dn . Assume dn > 0 for each n, otherwise xn is the maximizer. Then
dn 1 1 1
1 2 (dn dn+1 ) kx0 ^ k2
x =2 kx0 ^ k2
x
dn+1 dn dn+1 dn+1 dn
that is,
1 1 1
dn+1 dn 2 kx0 ^k2
x
By iterating we get
1 1 1
2 +
d1 2 kx0 x
^k d0
1 1 1 1 1 1 2 1
d2 2 + d 2 + 2 +
d0
= 2 +
d0
2 kx0 x
^k 1 2 kx0 x
^k 2 kx0 x
^k 2 kx0 x
^k
1 n 1
2 +
dn 2 kx0 x
^k d0
^k2
2 kx0 x
0 < dn
n
This proves (37.22).
Example 1697 Given a matrix A , with n m, consider the least squares optimization
m n
problem (37.18), i.e.,
max g (x) sub x 2 Rn
x
with g : Rn ! R de ned by g (x) = kAx bk2 . Then, rg (x) = AT (Ax b), so for
some > 0 we have
where the last inequality holds because the Gram matrix AT A induces a linear operator
g : Rn ! Rn de ned by g (x) = AT Ax, which is Lipschitz continuous by Theorem 898.
1140 CHAPTER 37. UNCONSTRAINED OPTIMIZATION
We conclude that g is -smooth. Since it is also concave, by the last theorem the map
h : Rn ! Rn de ned by
1 T
h (x) = x A (Ax b)
^k2
2 kx0 x
f^ f (xn )
n
for the sequence of iterates
1
xn+1 = xn AT (Axn b)
generated by h. N
De nition 1698 A sequence fxn g C is maximizing for problem (37.1) if lim f (xn ) = f^.
Next we show that under some standard conditions maximizing sequences converge to
solutions.
Theorem 1699 Let f : C ! R be continuous and coercive. If problem (37.1) has a unique
solution x
^ 2 C, then a sequence fxn g C is maximizing for this problem if and only if it
converges to x
^.
Proof We prove the \if" because the converse is trivial. Let x ^ be the unique solution of
problem (37.1). Let fxn g C be maximizing, i.e., lim f (xn ) = f^. We want to show that
xn ! x ^. Suppose, by contradiction, that there exists " > 0 and a subsequence fxnk g such
that kxnk x ^k " for all k (cf. Proposition 2115). Since f is coercive, there is a scalar t < f^
such that (f t) \ C is compact. Since limk!+1 f (xnk ) = f^, eventually all terms of the
subsequence fxnk g belong to the set (f t)\C. By the Bolzano-Weierstrass' Theorem, there
exists a subsubsequence xnks that converges to some x 2 (f t). Since f is continuous,
we have lims!+1 f xnks = f (x ) f^ = lims!+1 f xnks , where the equality follows
from lim f (xn ) = f^. So, f^ = f (x ). In turn, this implies x^ = x . We thus reached the
contradiction:
0<" xnks x
^ xnks x + kx x
^k = xnks x !0
We conclude that xn ! x
^.
The following simple example shows that this result is, in general false.
37.7. CODA: COMPUTATIONAL ISSUES 1141
^ = 0, so f^ = 1. The sequence xn = 1
The unique maximizer of f is x 1=n is maximizing
because
1
f (xn ) = 1 !1
n
Yet, xn ! 1. N
The following consequence of the previous theorem is especially relevant for problem
(37.20).
Proof We prove the \if" because the converse is trivial. Let f : Rn ! R be strictly concave
and supercoercive. The function f is continuous because it is concave (Theorem 833). By
Tonelli's Theorem, problem (37.20) has then a solution, which is unique because f is strictly
concave (Theorem 1032). The result now follows from Theorem 1699.
Example 1702 In the last example, assume that (A) = n . By Theorem 1057, the function
g is strictly concave and supercoercive. So, the iterates
1
xn+1 = xn AT (Axn b) (37.29)
1
^ = AT A
converge to the least squares solution x AT b. The iteration does not require any
matrix inversion. N
Thus, for optimization problems featuring strictly concave and supercoercive objective
functions, the sequence recursively de ned via a decision procedure converges to the solution.
If we make the stronger assumption that the objective function is strongly concave,8 then
we can bound the rate of convergence to solutions of maximizing sequences.
Thus, for a the sequence fxn g recursively de ned via a decision procedure with speed of
order k we have p
c
kxn x ^k k
n2
provided the objective function is strongly concave.
8
Recall that strongly concave functions are strictly concave and supercoercive (Section 31.6).
1142 CHAPTER 37. UNCONSTRAINED OPTIMIZATION
The proof of Proposition 1703 is an easy consequence of the following lemma that sharp-
ens for strongly concave functions a classic inequality that holds for concave functions (cf.
Theorem 1471).
Proof By de nition, there is k > 0 such that the function g : U ! R de ned by g (x) =
f (x) + k kxk2 is concave. Then, for all x; y 2 U we have
g (y) g (x) + rg (x) (y x)
so that
f (y) + k kyk2 f (x) + k kxk2 + rf (x) (y x) + 2kx (y x) (37.31)
We have
k kxk2 k kyk2 + 2kx (y x) = k kxk2 kyk2 + 2x (y x)
= k kxk2 kyk2 + 2x y x x
Proof of Proposition 1703 Assume that f is strongly concave with constant k > 0. By
(37.30), we have f (x) pf p (^
x) + rf (^
x) (x x ^) k kx x ^k2 = f (^x) k kx x ^k2pfor all
n
x 2 R . So, k^ x xk k f (^ n
x) f (x) for all x 2 R . In turn, by setting = k this
easily implies the desired result.
where PC : Rn ! C is the projection operator (Section 31.6). Indeed, the projection ensures
that the next iterate keeps being an element of the choice set C.
37.7. CODA: COMPUTATIONAL ISSUES 1143
1
If (A) = m, by (31.51) we have PC (x) = x + AT AAT (b Ax) for all x 2 Rn . So
1 1 1 1
h (x) = PC x+ rf (x) =x+ rf (x) + AT AAT b A x+ rf (x)
1 1 1 1
= x+ rf (x) + AT AAT b AT AAT A x+ rf (x)
provided f is di erentiable.
(ii) Let C = Rn+ be the positive orthant. Consider an optimization problem
+
By (31.51), PC (x) = x+ for all x 2 Rn , so h (x) = x + 1 rf (x) provided f is di eren-
tiable. N
Finally, there exist \accelerated" decision procedure that have speed of order 2, i.e.,
f^ f (xn ) = O 1=n2 . Roughly speaking, they have a bivariate form
1
yn+1 = xn + rf (xn )
xn+1 = n yn+1 + n yn
Equality constraints
38.1 Introduction
The classic necessary condition for local extremal points given by Fermat's Theorem considers
interior points of the choice set C, something that greatly limits its use in nding candidate
solutions of optimization problems coming from economics. Indeed, in many of them the
hypotheses of monotonicity of Proposition 979 hold and, therefore, the possible solutions are
on the boundary of the choice set, not in its interior. A classic example is the consumer
problem
max u (x) sub x 2 B (p; w) (38.1)
x
Under a standard hypothesis of monotonicity, by Walras' law the problem can be rewritten
as
max u (x) sub x 2 (p; w)
x
int (p; w) = ;
Fermat's Theorem is thus useless for nding the candidate solutions of the consumer prob-
lem. The equality constraint, with its drastic topological consequences, deprives us of this
fundamental result in the study of the consumer problem. Fortunately, there is an equally
important result of Lagrange that rescue us, as this chapter will show.
1145
1146 CHAPTER 38. EQUALITY CONSTRAINTS
the functions f and gi are continuously di erentiable on a non-empty and open subset D of
their domain A; that is, ; =
6 D int A.
The set
C = fx 2 A : gi (x) = bi 8i = 1; :::; mg (38.3)
is the subset of A identi ed by the constraints. Therefore, optimization problem (38.2) can
be equivalently formulated in canonical form as
Nevertheless, for this special class of optimization problems we will often use the more
evocative formulation (38.2).
In what follows we will rst study in detail the important special case of a single con-
straint, which we will then generalize in Section 38.7 to the case of several constraints.
The next fundamental lemma gives the key to nding the solutions of problem (38.4).
The hypothesis x ^ 2 C \ D requires that x^ be a point of the choice set at which f and g are
both continuously di erentiable. Moreover, we require that rg (^ x) 6= 0. In this regard, note
that a point x 2 D is said to be regular (with respect to the constraints) if rg (x) = 0, and
singular otherwise. According to this terminology, the condition rg (^ x) 6= 0 requires point
x
^ to be regular.
x) = ^ rg (^
rf (^ x) (38.5)
@f @g
x) = ^
(^ (^
x) 8k = 1; :::; n
@xk @xk
Thus, a necessary condition for x ^ to be a local solution of the optimization problem (38.4)
is that the gradients of the functions f and g are proportional. The \hat" above reminds
us that this scalar depends on the point x^ considered.
Next we give a proof of this remarkable fact based on the Implicit Function Theorem.
38.3. ONE CONSTRAINT 1147
Proof We prove the lemma for n = 2 (the extension to arbitrary n is routine if one uses a
version of the Implicit Function Theorem for functions of n variables). Since rg (^ x) 6= 0, at
least one of the two partial derivatives @g=@x1 or @g=@x2 is non-zero at x ^. Let for example
@g=@x2 (^x) 6= 0 (in the case @g=@x1 (^ x) 6= 0 the proof is symmetric). As seen in Section
34.3.2, the Implicit Function Theorem can be applied also to study locally points belonging
to the level curves g 1 (b) with b 2 R. Since x ^ = (^ ^2 ) 2 g 1 (b), this theorem yields
x1 ; x
neighborhoods U (^ x1 ) and V (^x2 ) and a unique di erentiable function h : U (^ x1 ) ! V (^x2 )
such that x^2 = h (^
x1 ) and g (x1 ; h(x1 )) = b for each x1 2 U (^ x1 ), with
@g
@x1 (x1 ; x2 )
h0 (x1 ) = @g
8 (x1 ; x2 ) 2 g 1
(b) \ (U (^
x1 ) V (^
x2 ))
@x2 (x1 ; x2 )
0 @f @f
(x1 ) = (x1 ; h(x1 )) + (x1 ; h(x1 )) h0 (x1 )
@x1 @x2
Since x
^ is a local solution of the optimization problem (38.4), there exists a neighborhood
B" (^
x) of x
^ such that
f (^
x) f (x) 8x 2 g 1 (b) \ B" (^
x) (38.6)
Without loss of generality, suppose that " is su ciently small so that
(^
x1 "; x
^1 + ") U (^
x1 ) and (^
x2 "; x
^2 + ") V (^
x2 )
Hence, B" (^
x) U (^
x1 ) V (^
x2 ). This permits to rewrite (38.6) as
f (^
x1 ; h (^
x1 )) f (x1 ; h (x1 )) 8x1 2 (^
x1 "; x
^1 + ")
that is, (^
x1 ) (x1 ) for every x1 2 (^ x1 "; x^1 + "). The point x ^1 is, therefore, a local
maximizer for . The rst-order condition reads
@g
!
0 @f @f @x1 (^
x1 ; x^2 )
(x1 ) = (^
x1 ; x
^2 ) (^
x1 ; x
^2 ) @g =0 (38.7)
@x1 @x2 (^
x 1 ; x
^ 2 )
@x2
If (@g=@x1 ) (^
x1 ; x
^2 ) 6= 0, we have
@f @f
@x1 (^
x1 ; x
^2 ) @x2 (^
x1 ; x
^2 )
@g
= @g
@x1 (^
x1 ; x
^2 ) @x2 (^
x1 ; x
^2 )
If (@g=@x1 ) (^
x1 ; x
^2 ) = 0, then (38.7) yields
@f
(^
x1 ; x
^2 ) = 0
@x1
The next example shows that condition (38.5) is necessary, but not su cient.
x31 + x32
max sub x1 x2 = 0 (38.8)
x1 ;x2 2
is of the form (38.4), where f; g : R2 ! R are given by f (x) = 2 1 (x31 + x32 ) and g (x) =
x1 x2 , while b = 0. We have rf (0; 0) = (0; 0) and rg (0; 0) = (1; 1), so ^ = 0 is such
that rf (0; 0) = ^ rg (0; 0). Hence, the origin (0; 0) satis es condition (38.5) with ^ = 0.
But, the origin is not a solution of problem (38.8):
Note that the origin is not even a constrained (global) minimizer since f (t; t) = t3 < 0 for
every t < 0. N
@f @f @g @g
(^
x) ; (^
x) =^ (^
x) ; (^
x)
@x1 @x2 @x1 @x2
that is,
@f @g @f @g
x) = ^
(^ (^
x) and x) = ^
(^ (^
x) (38.10)
@x1 @x1 @x2 @x2
The condition rg (^ x) 6= 0 means that at least one of the partial derivatives (@g=@xi ) (^
x) is
di erent from zero. If, for convenience, we suppose that both are non-zero and that ^ 6= 0,
then (38.10) is equivalent to
@f @f
@x1 (^
x) @x2 (^
x)
@g
= @g
(38.11)
@x1 (^
x) @x2 (^
x)
38.3. ONE CONSTRAINT 1149
@f @f
df (^
x) (h) = rf (^
x) h = (^
x) h1 + (^
x) h2 8h 2 R2
@x1 @x2
@g @g
dg (^
x) (h) = rg (^
x) h = (^
x) h1 + (^
x) h2 8h 2 R2
@x1 @x2
They linearly approximate the di erences f (^ x + h) f (^x) and g (^x + h) g (^x), that is, the
e ect of moving from x ^ to x
^ +h on f and g. As we know well by now, such an approximation is
the better the smaller h. Suppose, ideally, that h is in nitesimal and that the approximation
is exact, so that f (^ x + h) f (^ x) = df (^x) (h) and g (^
x + h) g (^ x) = dg (^
x) (h). This is
clearly incorrect formally, but here we are proceeding heuristically.
Continuing in our heuristic reasoning, let us start now from the point x ^ and let us
consider variations x ^ + h with h in nitesimal. The rst issue to worry about is whether they
are legitimate, i.e., whether they satisfy the equality constraint g (^ x + h) = b. This means
that g (^
x + h) = g (^x), so h must be such that dg (^ x) (h) = 0. It follows that
@g @g
(^
x) h1 + (^
x) h2 = 0
@x1 @x2
and so
@g
@x2 (^
x)
h1 = @g
h2 (38.12)
@x1 (^
x)
The e ect of moving from x ^ to x
^ + h on the objective function f is given by df (^
x) (h). When
h is legitimate, by (38.12) this e ect is given by
@g
!
@f @x2 (^
x) @f
df (^
x) (h) = (^
x) @g
h2 + (^
x) h2 (38.13)
@x1 (^
x) @x 2
@x1
If x
^ is solution of the optimization problem, we must necessarily have df (^ x) (h) = 0 for
every legitimate variation h. Otherwise, if, say df (^x) (h) > 0, one would have a point x ^+h
that satis es the equality constraint, but such that f (^ x + h) > f (^x). On the other hand, if
instead df (^
x) (h) < 0 the same observation could be made this time for h, which is obviously
a legitimate variation, and that would lead to the point x ^ h with f (^ x h) > f (^ x).
The necessary condition df (^ x) (h) = 0 together with (38.13) gives
@g
!
@f @x2 (^
x) @f
(^
x) @g
h2 + (^
x) h2 = 0
@x1 (^
x) @x 2
@x1
which is precisely expression (38.11). At an intuitive level, all this explains why (38.5) is
necessary for x
^ to be solution of the problem.
1150 CHAPTER 38. EQUALITY CONSTRAINTS
This function, called the Lagrangian, plays a key role in optimization problems. Its gradient
is
@L @L @L
rL (x; ) = (x; ) ; :::; (x; ) ; (x; ) 2 Rn+1
@x1 @xn @
It is important to distinguish in this gradient the two parts rx L and r L given by
@L @L
rx L (x; ) = (x; ) ; :::; (x; ) 2 Rn
@x1 @xn
and
@L
r L (x; ) = (x; ) 2 R
@
Using this notation, we have
and
r L (x; ) = b g (x) (38.16)
which leads to the following fundamental formulation of the necessary condition of optimality
of Lemma 1707 in terms of the Lagrangian function.
Proof Let x^ be solution of the optimization problem (38.4). By Lemma 1707, there exists
^ 2 R such that
rf (^x) ^ rg (^x) = 0
By (38.15), the condition is equivalent to
x; ^ ) = 0
rx L(^
x; ^ ) = 0
On the other hand, by (38.15) we have r L (x; ) = b g (x), so we have also r L(^
since b g (^
x) = 0. It follows that (^ ^
x; ) is a stationary point of L.
Thanks to Lagrange's Theorem, the search for local solutions of the constrained opti-
mization problem (38.4) reduces to the search for the stationary points of a suitable function
of several variables, the Lagrangian function. It is a more complicated function than the
38.3. ONE CONSTRAINT 1151
original function f because of the new variable , but through it the search for the solutions
of the optimization problem can be done by solving a standard rst-order condition, similar
to the ones seen for unconstrained optimization problems.
Needless to say, we are discussing a condition that is only necessary: there is no guarantee
that the stationary points are actually solutions of the problem. It is already a remarkable
achievement, however, to be able to use the simple ( rst-order) condition
rL (x; ) = 0 (38.17)
to search for the possible candidate solutions of the constrained optimization problem (38.4).
In the next section we will see that this condition plays a fundamental role in the search for
the local solutions of problem (38.4) with the Lagrange's method, which in turn may lead to
the global solutions through a version of the elimination method.
x; ^ ) is not
We close with two important remarks. First, observe that in general the pair (^
a maximizer of the Lagrangian function, even when x ^ turns out to solve the optimization
problem. The pair (^ ^
x; ) is just a stationary point for the Lagrangian function, nothing
more. Therefore, it is erroneous to assert that the search for solutions of the constrained
optimization problem reduces to the search for maximizers of the Lagrangian function.
Second, note that problem (38.4) has a symmetric version
min f (x) sub g (x) = b
x
in which, instead of looking for maximizers, we look for minimizers. Condition (38.5) is
necessary also for this version of problem (38.4) and, therefore, the stationary points of the
Lagrangian function could be minimizers instead of maximizers. However, it can be the
case that they are neither maximizers nor minimizers. This is the usual ambiguity of rst-
order conditions, encountered also in unconstrained optimization: it re ects the fact that
rst-order conditions are only necessary conditions.
On the other hand, again by a heuristic application of the chain rule we have
df (^
x (b)) x (b)) dx (b)
df (^
=
db dx db
@f (^x (b)) ^ @g (^
x (b)) ^ @g (^x (b)) dx (b)
= (b) + (b)
@x @x @x db
@f (^x (b)) ^ @g (^
x (b)) 0 x (b)) dx (b)
@g (^
= (b) x^ (b) + ^ (b)
@x @x @x db
| {z }
=0 by (38.5)
x (b)) dx (b) ^
@g (^
= ^ (b) = (b)
@x db
where the last equality follows from (38.18). Summing up, for every scalar b we have
df (^
x (b)) ^
= (b)
db
The multiplier is thus the \marginal maximum value" in that it quanti es the marginal
e ect on the attained maximum value of (slightly) altering the constraint. For instance, in
the consumer problem the scalar b is the income of the consumer, so the multiplier quanti es
the marginal e ect on the attained maximum utility of a (small) variation in income.
N.B. We are using the word \altering" rather than \relaxing" because by changing b the
choice set (38.3) does not get larger. It just becomes di erent. So, a priori, a change in b
might not be bene cial (indeed, the sign of the multiplier can be positive or negative). In
contrast, the word \relaxing" becomes appropriate in studying variations of the scalars that
de ne inequality constraints (cf. the discussion in Section 41.7). O
1. determine the set D where the functions f and g are continuously di erentiable;
2. determine the set C D of the points of the constraint set where the functions f and
g are not continuously di erentiable;
4. determine the set S of the regular points x 2 C \ (D D0 ) for which there exists a
Lagrange multiplier 2 R such that the pair (x; ) 2 Rn+1 is a stationary point of the
Lagrangian function, that is, it satis es the rst-order condition (38.17);1
1
Note that S C because the points that satisfy condition (38.17) also satisfy the constraints. It is
therefore not necessary to check if for a point x 2 S we have also x 2 C.
38.4. THE METHOD OF ELIMINATION 1153
5. the local solutions of the optimization problem (38.4), if they exist, belong to the set
S [ (C \ D0 ) [ (C D) (38.19)
Thus, according to Lagrange's method, the possible local solutions of the optimization
problem (38.4) must be searched among the points of the subset (38.19) of C. Indeed, a
local solution that is a regular point will belong to the set S thanks to Lagrange's Theorem.
However, this theorem does not say anything about possible local solutions that are singular
points { and so belong to the set C \ D0 { as well as about possible local solutions where
the functions do not have a continuous derivative { and so belong to the set C D.
In conclusion, a necessary condition for a point x 2 C to be a local solution for the
optimization problem (38.4) is that it belongs to the subset S [ (C \ D0 ) [ (C D) C.
This is what this procedure, a key dividend of Lagrange's Theorem, establishes. Clearly, the
smaller such a set is, the more e ective the application of the theorem is: the search for local
solutions can be then restricted to a signi cantly smaller set than the original set C.
That said, what about global solutions? If the objective function f is coercive and
continuous on C, the ve phases of the Lagrange's method plus the following extra sixth
phase provide a version of the elimination method to nd global solutions.
f (^
x) f (x) 8x 2 S [ (C \ D0 ) [ (C D) (38.20)
then x
^ is a (global) solution of the optimization problem (38.4).
In other words, the points of the set (38.19) in which f attains its maximum value are the
solutions of the optimization problem. Indeed, by Lagrange's method this is the set of the
possible local solutions; global solutions, whose existence is ensured by Tonelli's Theorem,
must then belong to such a set. Hence, the solutions of the \restricted" optimization problem
are also the solutions of the optimization problem (38.4). Phase 6 is based on this remarkable
fact. As for the Lagrange's method, the smaller the set (38.19) is, the more e ective the
application of the elimination method is. In particular, in the lucky case when it is a single-
ton, the elimination method determines the unique solution of the optimization problem, a
remarkable achievement.
2 P
is of the form (38.4), where f; g : Rn ! R are given by f (x) = e kxk and g (x) = ni=1 xi ,
and b = 1. The functions are both continuously di erentiable on the entire plane, so D = R2 .
We then trivially have C D = ;: at all the points of the constraint set, the functions f
and g are both are continuously di erentiable. We have therefore completed phases 1 and 2
of Lagrange's method.
Since rg (x) = (1; 1; :::; 1), there are no singular points, that is, D0 = ;. This completes
phase 3 of Lagrange's method.
The Lagrangian function L : Rn+1 ! R is given by
n
!
2 X
L (x; ) = e kxk + 1 xi
i=1
To nd the set of its stationary points, it is necessary to solve the rst-order condition (38.17)
given here by the following (nonlinear) system of n + 1 equations:
( @L kxk2
@xi = 2xi e = 0 8i = 1; :::; n
@L P n
@ =1 i=1 xi = 0
We observe that for no solution we can have = 0. Indeed, otherwise the rst n equations
would imply xi = 0, which contradicts the last equation. It follows that for every solution
we have 6= 0. The rst n equations yield
2
xi = ekxk
2
and, upon substituting these values in the last equation, we get
2
1 + n ekxk = 0
2
that is
2 kxk2
= e
n
Substituting this value of in any of the rst n equations we nd xi = 1=n, so the only
point (x; ) 2 Rn+1 that satis es the rst-order condition (38.17) is
1 1 1 2 1
; ; :::; ; e n
n n n n
S [ (C \ D0 ) [ (C D) = S (38.23)
Thus, in this example the rst-order condition (38.17) turns out to be necessary for any local
solution of the optimization problem (38.22). The unique element of S is, therefore, the only
candidate to be a local solution of the problem. This completes Lagrange's method.
38.4. THE METHOD OF ELIMINATION 1155
Turn now to the elimination method, which we can use since the continuous function f
is coercive on the (non compact, being closed but unbounded) set
( n
)
X
C = x = (x1 ; :::; xn ) 2 Rn : xi = 1
i=1
Indeed: 8 n
< R
> if t 0
p
(f t) = x 2 Rn : kxk lg t if t 2 (0; 1]
>
:
; if t > 1
so the set (f t) is compact and non-empty for each t 2 (0; 1]. Since the set in (38.23) is
a singleton, the elimination method allows us to conclude that (1=n; :::; 1=n) is the unique
solution of the optimization problem (38.22). N
To nd the set of its stationary points we need to solve the rst-order condition (38.17),
given here by the following system (nonlinear) of n + 1 equations
( @L pi
@xi = xi =0 8i = 1; :::; n
@L Pn
@ =1 i=1 xi = 0
Because the coordinates of the vector p are all di erent from zero, one cannot have = 0
for any solution. PIt followsPthat for each solution 6= 0. Because x 2 Rn++ , the rst n
equations
P imply pi = xi , and by substituting these values in the last equation we
nd ni=1 pi =P . Then, by substituting this value of in each of the rst n equations we
nd xi = pi = ni=1 pi . Thus, the unique point (x; ) 2 Rn+1 that satis es the rst- order
condition (38.17) is ( )
n
X
p1 p2 pn
Pn ; Pn ; :::; Pn ; pi
i=1 pi i=1 pi i=1 pi i=1
so that S is the singleton
p p2 pn
S= Pn 1 ; Pn ; :::; Pn
i=1 pi i=1 pi i=1 pi
2
That is, all coordinates of p are either strictly positive or strictly negative.
1156 CHAPTER 38. EQUALITY CONSTRAINTS
S [ (C \ D0 ) [ (C D) = S (38.25)
Thus, also in this example the rst-order condition (38.17) is necessary for each local solution
of the optimization problem (38.24). Again, the unique element of S is the only candidate to
be a local solution of the optimization problem (38.22). This completes Lagrange's method.
We can apply the elimination method because P the continuous function f is, by Lemma
1050, also coercive on the set C = x 2 Rn++ : ni=1 xi = 1 , which is not compact because
it is not closed. In view of (38.25), the elimination method implies that
p1 pn
( Pn ; :::; Pn )
i=1 pi i=1 pi
When the elimination method is based on Weierstrass' Theorem, rather than on the
weaker (but more widely applicable) Tonelli's Theorem, as a \by-product" we can also nd
the global minimizers, that is, the points x 2 C that solve problem minx f (x) sub x 2 C.
Indeed, it is easy to see that such are the points that minimize f over S [(C \ D0 )[(C D).
Clearly, this is no longer true with Tonelli's Theorem because it only ensures the existence
of maximizers and remains silent on possible minimizers.
is of the form (38.4), with f; g : R2 ! R are given by f (x1 ; x2 ) = 2x21 5x22 and g (x1 ; x2 ) =
x21 + x22 , while b = 1. Both f and g are continuously di erentiable on the entire plane, so
D = R2 . Hence, C D = ;: at all the points of the constraint set the functions f and g are
continuously di erentiable. This completes phases 1 and 2 Lagrange's method.
We have rg (x) = (2x1 ; 2x2 ), so the origin (0; 0) is the unique singular point, that is,
D0 = f(0; 0)g. This singular point does not satisfy the constraint, so C \ D0 = ;. This
completes phase 3 of Lagrange's method.
The Lagrangian function L : R3 ! R is given by
To nd the set of its stationary points we must solve the rst-order condition (38.17):
8 @L
>
> =0
< @x1
@L
> @x2 = 0
>
: @L
@ =0
so that
S = f(0; 1) ; (0; 1) ; (1; 0) ; ( 1; 0)g
As in the last two examples, the rst-order condition is necessary for any local solution of
the optimization problem (38.26).
By having completed Lagrange's method, let us turn to elimination method to nd the
global solutions. Since the set C = (x1 ; x2 ) 2 R2 : x21 + x22 = 1 is compact and the function
f is continuous, we can use such method through Weierstrass' Theorem. In view of (38.27),
in phase 6 we have:
The points (0; 1) and (0; 1) are thus the (global) solutions of the optimization problem
(38.26), while the reliance here of the elimination method on Weierstrass' Theorem makes it
possible to say that the points (1; 0) and ( 1; 0) are global minimizers. N
is of the form (38.4), with f; g : R2 ! R given by f (x) = e x1 and g (x) = x31 x22 , and
b = 0. We have D = R2 , hence C D = ;. Phases 1 and 2 of Lagrange's method have been
completed.
Moreover, we have
rg (x) = 3x21 ; 2x2
so the origin (0; 0) is the unique singular point and it also satis es the constraint, i.e.,
D0 = C \ D0 = f(0; 0)g. This completes phase 3 of Lagrange's method.
The Lagrangian function L : R3 ! R is given by
x1
L (x1 ; x2 ; ) = e + x22 x31
3
Note that there are no other points that satisfy rL = 0: Indeed, suppose that rL(^ ^2 ; ^ ) = 0, with
x1 ; x
x
^1 6= 0 and x
^2 6= 0. Then, from @L=@x1 = 0 we deduce = 2, whereas from @L=@x2 = 0 we deduce = 5.
1158 CHAPTER 38. EQUALITY CONSTRAINTS
To nd the set of its stationary points, we need to solve the rst-order condition (38.17),
given here by the following (nonlinear) system of three equations:
8 @L x1
> @x1 = e
> 3 x21 = 0
<
@L
> @x2 = 2 x2 = 0
>
: @L 2 x31 = 0
@ = x2
Note that for no solution we can have = 0. Indeed, for = 0 the rst equation becomes
e x1 = 0, which does not have solution. Let us suppose therefore 6= 0. The second equation
implies x2 = 0, hence from the third one it follows that x1 = 0. The rst equation becomes
1 = 0, and the contradiction shows that the system does not have solutions. Therefore,
there are no points that satisfy the rst-order condition (38.17), so S = ;. Phase 4 of
Lagrange's method shows that
S [ (C \ D0 ) [ (C D) = C \ D0 = f(0; 0)g (38.29)
By Lagrange's method, the unique possible local solution of the optimization problem (38.28)
is the origin (0; 0).
Turn now to the elimination method. To use it we need to show that the continuous f is
coercive on the (non compact, being closed but unbounded) set C = (x1 ; x2 ) 2 R2 : x31 = x22 .
Note that: 8 2
>
> R if t 0
<
(f t) = ( 1; lg t] R if t 2 (0; 1]
>
>
:
; if t > 1
Thus, f is not coercive on the entire plane but it is coercive on C, which is all that matters
here. Indeed, note that x1 can satisfy the constraint x31 = x22 only if x1 0, so that
C R+ R and
(f t) \ C (( 1; lg t] R) \ (R+ R) = [0; lg t] R 8t 2 (0; 1]
p p
If x1 2 [0; lg t], the constraint implies x22 2 0; lg3 t , i.e., x22 2 [ lg3 t; lg3 t]. It
follows that
q q
3
(f t) \ C [0; lg t] lg t; lg3 t 8t 2 (0; 1]
where (p; w) = x 2 Rn+ : p x = w , with strictly positive prices p 0. To best solve this
problem with the di erential methods of this chapter, assume also that the utility function
u : A Rn+ ! R is continuously di erentiable on int A.5
For instance, consumer problems that satisfy such assumptions
Pare the those featuring
n
a log-linear utility function u : Rn++ ! R de ned by u (x) = i=1 i log x
a Pi , with A =
int A = R++ , or a separable utility function u : R+ ! R de ned by u (x) = ni=1 xi , with
n n
Let us rst nd the local solutions of the consumer problem through Lagrange's method.
The function g (x) = p x expresses the constraint, so
Hence, the set (p; w) D consists of the boundary points of A that satisfy the constraint.6
Note that when A = int A, as in the log-linear case, we have (p; w) D = ;.
From
rg (x) = p 8x 2 Rn
it follows that there are no singular points, that is, D0 = ;. Hence,
(p; w) \ D0 = ;
L (x; ) = u (x) + (w p x)
so to nd the set of its stationary points, it is necessary to solve the rst-order condition:
8 @u(x)
> @L
>
> @x1 (x; ) = @x1 p1 = 0
>
>
>
>
>
<
>
>
>
> @L
(x; ) = @u(x)
pn = 0
>
> @xn @xn
>
>
: @L
@ (x; ) = w p x=0
In a more compact way, we write
@u (x)
= pi 8i = 1; :::; n (38.30)
@xi
p x=w (38.31)
The fundamental condition (38.30) is read in a di erent way according to the interpretation,
cardinalist or ordinalist, of the utility function. Let us suppose, for simplicity, that 6= 0.
In the cardinalist interpretation, the condition is recast in the equivalent form
@u(x) @u(x)
@x1 @xn
= =
p1 pn
5
Note that A Rn + implies int A Rn
++ , i.e., the interior points of A have strictly positive coordinates.
6
Here the choice set, (p; I), is by de nition included in the domain A, so @A\A\ (p; I) = @A\ (p; I).
1160 CHAPTER 38. EQUALITY CONSTRAINTS
which emphasizes that, at a bundle x which is a (local) solution of the consumer problem,
the marginal utilities of the income spent for the various goods, measured by the ratios
@u(x)
@xi
pi
are all equal. Note that 1=pi is the quantity of good i that can be purchased with one unit
of income.
In the ordinalist interpretation, where the notion of marginal utility becomes meaningless,
condition (38.30) is rewritten as
@u(x)
@xi pi
@u(x)
=
pj
@xj
for every pair of goods i and j of the solution bundle x. At such a bundle, therefore,
the marginal rate of substitution between each pair of goods must be equal to the ratio
between their prices, that is, M RSxi ;xj = pi =pj . For n = 2 we have the classic geometric
interpretation of the optimality condition for a bundle (x1 ; x2 ) as equality between the slope
of the indi erence curve (in the sense of Section 34.3.2) and the slope of the straight line of
the budget constraint.
2
x
2
1.5
0.5
-0.5
O x
1
-1
-1 0 1 2 3 4 5 6 7
The ordinalist interpretation does not require the cardinalist notion of marginal utility, a
notion that { by Occam's razor { becomes thus super uous for the study of the consumer
problem. The observation dates back to a classic 1900 work of Vilfredo Pareto and repre-
sented a turning point in the history of utility theory, so much that we talk of an \ordinalist
revolution".
In any case, relations (38.30) and (38.31) are rst-order conditions for the consumer
problem and their resolution determines the set S of the stationary points. In conclusion,
Lagrange's method implies that the local solutions of the consumer problem must be looked
for among the points of the set
S [ (@A \ (p; w)) (38.32)
38.5. THE CONSUMER PROBLEM 1161
Besides points that satisfy the rst-order conditions (38.30) and (38.31), local solutions can
therefore be boundary points @A \ (p; w) of the set A that satisfy the constraint.7
When u is coercive and continuous on (p; w), we can apply the elimination method
to nd the (global) solutions of the consumer problem, that is, the optimal bundles (which
are the economically meaningful notions, consumers do not care about bundles that are just
locally optimal). In view of (38.32), the solutions are the bundles x
^ 2 S [ (@A \ (p; w))
such that
u (^
x) u (x) 8x 2 S [ (@A \ (p; w))
In other words, we have to compare the utility levels attained by the stationary points in S
and by the boundary points that satisfy the constraint in @A \ (p; w). As the comparison
requires the computation of all these utility levels, the smaller the set S [ (@A \ (p; w))
the more e ective the elimination method.
Example 1714 Consider the log-linear utility function in the case n = 2, i.e.,
u (x1 ; x2 ) = a log x1 + (1 a) log x2
with a 2 (0; 1). The rst-order condition at each (x1 ; x2 ) 2 R2++ takes the form
a 1 a
= p1 ; = p2 (38.33)
x1 x2
p1 x1 + p2 x2 = w (38.34)
Relation (38.33) implies
a 1 a
=
p1 x1 p2 x2
Substituting this in (38.34), we have
1 a
p1 x1 + p1 x1 = w
a
and hence
w w
x1 = a ; x2 = (1 a)
p1 p2
In conclusion,
w w
S= ; (1 a)a (38.35)
p1 p2
Since @A = ;, we have @A \ (p; w) = ;. By Lagrange's method, the unique possible local
solution of the consumer problem is the bundle
w w
x= a ; (1 a) (38.36)
p1 p2
We turn now to the elimination method that we can use because the continuous function u
is, by Lemma 1050, coercive on the set (p; w) \ A = x 2 R2++ : p1 x1 + p2 x2 = w , which
is not compact since it is not closed. In view of (38.35), the elimination method implies
that the bundle (38.36) is the unique solution of the log-linear consumer problem, that is,
the unique optimal bundle. Note that this nding con rms what we already proved and
discussed in Section 22.8, in a more general and elegant way through Jensen's inequality. N
7
When A = Rn + , they lie on the axes and are called corner solutions in the economics jargon (as remarked
earlier in the book). In the case n = 2 and A = R2+ , corner solutions can be (0; I=p2 ) and (I=p1 ; 0).
1162 CHAPTER 38. EQUALITY CONSTRAINTS
where g = (g1 ; :::; gm ) : A Rn ! Rm and b = (b1 ; :::; bm ) 2 Rm . All functions f and gi are
assumed to be continuously di erentiable on a non-empty open subset D A. Thus, at all
points x 2 D we can de ne the Jacobian matrix Dg (x) by
2 3
rg1 (x)
6 rg (x) 7
Dg (x) = 6
4
2 7
5
rgm (x)
A point x 2 D is called regular (with respect to the constraints) if Dg (x) has full rank,
otherwise is called singular. For instance, the Jacobian Dg (^
x) has full rank if the gradients
rg1 (^
x),...,rgm (^ n
x) are linearly independent vectors of R . In such a case, the full rank
condition requires m n, that is, that the number m of constraints be smaller than the
dimension n of the space.
Two observations about regularity: (i) when m = n, the Jacobian has full rank if and
only if it is a non-singular square matrix, that is, det Dg (x) 6= 0;8 (ii) when m = 1, we have
Dg (x) = rg (x) and so the full rank condition amounts to require rg (x) 6= 0, which brings
us back to the notions of regular and singular points seen in the case m = 1 of a single
constraint.
The following result extends Lemma 1707 to the case with multiple constraints and shows
that the regularity condition rg (^x) 6= 0 from such lemma can be generalized by requiring
the Jacobian Dg (^x) to have full rank. In other words, x
^ must not be a singular point here
either.9
Lemma 1716 Let x ^ 2 C \ D be the local solution of the optimization problem (38.37). If
x) has full rank, then there is a vector ^ 2 Rm such that
Dg (^
n
X
rf (^
x) = ^ i rgi (^
x) (38.38)
i=1
for every (x; ) 2 A Rm , and Lagrange's Theorem takes the following general form.
8
So, in this case a point x is singular if its Jacobian matrix Dg (x) is a singular matrix. The notion of
singular point is thus consistent with the notion of singular matrix (Section 15.6.6).
9
We omit the proof, which generalizes that of Lemma 1707 by means of a suitable version of the Implicit
Function Theorem. We then also omit the simple proof of Theorem 1717, which is similar to that of the
special case of a single constraint.
1164 CHAPTER 38. EQUALITY CONSTRAINTS
The comments that we made for Lagrange's Theorem also hold in this more general case.
In particular, the search for local candidate solutions for the constrained problem must still
be conducted following Lagrange's method, while the elimination method can be still used
to check whether such local candidates actually solve the optimum problem. The examples
will momentarily illustrate all this.
From an operational standpoint note that, however, the rst-order condition
rL (x; ) = 0
is now based on a Lagrangian L that has the more complex form (38.39). Also the form of
the set of singular points D0 is more complex: the study of the Jacobian's determinant may
be complex, thus making the search for singular points quite hard. The best thing is often
to directly look for the singular points which satisfy the constraints { i.e., for the set C \ D0
{ instead of trying to determine the set D0 rst and the intersection C \ D0 afterwards (as
we did in the case with one constraint). The points x 2 C \ D0 are such that gi (x) = bi and
the gradients rgi (x) are linearly dependent. So, we must verify whether the system
8 Pm
>
> i=1 i rgi (x) = 0
>
>
>
> g1 (x) = b1
>
<
>
>
>
>
>
>
>
:
gm (x) = bm
admits solutions (x; ) 2 Rn Rm with = ( 1 ; :::; m ) 6= 0, that is, with i that are not
all null. Such possible solutions identify the singular points that satisfy the constraints. To
ease calculations, it is useful to note that the system can be written as
8 Pm @gi (x)
>
> i=1 i @x1 = 0
>
>
>
>
>
>
>
>
>
>
>
>
>
< Pm
> @gi (x)
i=1 i @xn = 0
(38.40)
>
> g (x) = b
>
> 1 1
>
>
>
>
>
>
>
>
>
>
>
>
:
gm (x) = bm
38.7. SEVERAL CONSTRAINTS 1165
has the form (38.37), where f : R3 ! R and g = (g1 ; g2 ) : R3 ! R2 are given by f (x1 ; x2 ) =
7x1 3x3 ; g1 (x1 ; x2 ; x3 ) = x21 + x22 and g2 (x1 ; x2 ; x3 ) = x1 + x2 x3 , while b = (1; 1) 2 R2 .
These functions are all continuously di erentiable on R3 , so D = R3 . Hence, C D = ;:
at all points of the constraint set, the functions f , g1 and g2 are all continuously di erentiable.
This completes phases 1 and 2 of Lagrange's method.
Let us nd the singular points satisfying the constraint, that is, the set C \ D0 . The
system (38.40) becomes 8
>
> 2 1 x1 + 2 = 0
>
>
>
< 2 1 x2 + 2 = 0
>
2 =0
>
>
>
> x1 + x22 = 1
2
>
>
:
x1 + x2 x3 = 1
Since 2 = 0, 1 is di erent from 0. This implies that x1 = x2 = 0, thus contradicting the
fourth equation. Therefore, there are no singular points satisfying the constraint, that is,
C \ D0 = ;. Phase 3 of Lagrange's method is thus completed.
The Lagrangian L : R5 ! R is
To nd the set of its critical points we must solve the rst-order condition (38.17), which is
given by the following non-linear system of ve equations
8
> @L
>
> @x1 = 7 2 1 x1 2 =0
>
>
>
> @L
>
> @x2 = 2 1 x2 2 =0
<
@L
> @x3 = 3 + 2 = 0
>
>
>
> @L
x21 x22 = 0
>
> @ 1 =1
>
>
: @L = 1 x x2 + x3 = 0
@ 2 1
4 3 4 5 4 3 6 5
; ; ; ;3 ; ; ; ; ;3
5 5 5 2 5 5 5 2
so that
4 3 4 4 3 6
S= ; ; ; ; ;
5 5 5 5 5 5
1166 CHAPTER 38. EQUALITY CONSTRAINTS
thus proving that in the example the rst-order condition (38.17) is necessary for any local
solution of the optimization problem (38.41).
We now turn to the elimination method. Clearly, the set
is closed. It is also bounded (and so compact). For the x1 and x2 such that x21 + x22 = 1
we have x1 ; x2 2 [ 1; 1], while for the x3 such that x3 = x1 + x2 1 and x1 ; x2 2 [ 1; 1] we
have x3 2 [ 3; 1]. It follows that C [0; 1] [0; 1] [ 3; 1], and so C is bounded. Since f is
continuous, we can thus use the elimination method through Weierstrass' Theorem. In view
of (38.42), in the last phase of the elimination method we have
4 3 4 4 3 7 49
f ; ; =8 and f ; ; =
5 5 5 5 5 5 5
Hence, (4=5; 3=5; 4=5) solves the optimum problem (38.41), while ( 4=5; 3=5; 7=5) is a
minimizer. N
has also the form (38.37), where f : R3 ! R and g = (g1 ; g2 ) : R3 ! R2 are given by
f (x1 ; x2 ) = x1 , g1 (x1 ; x2 ; x3 ) = x21 + x32 , g2 (x1 ; x2 ; x3 ) = x23 + x22 2x2 , while b = (0; 0) 2
R2 .
As before, these functions all are continuously di erentiable on R3 , so D = R3 . Therefore,
C D = ;: at all points of the constraint set, the functions f , g1 and g2 are all continuously
di erentiable. This completes phases 1 and 2 of Lagrange's method.
Let us nd the set C \ D0 of the singular points satisfying the constraint. The system
(38.40) becomes 8
>
> 2 1 x1 = 0
>
>
> 3 1 x2 + 2 (2x2 2) = 0
>
< 2
2 2 x3 = 0
>
>
>
> x21 + x32 = 0
>
>
: 2
x3 + x22 2x2 = 0
In light of the rst and the third equations, we must consider three cases:
In conclusion, the origin f(0; 0; 0)g is the unique singular point that satis es the con-
straints, so C \ D0 = f(0; 0; 0)g. This completes phase 3 of Lagrange's method.
The Lagrangian L : R4 ! R is given by
The rst-order condition (38.17) given by the following (non-linear) system of ve equations
8
> @L
> @x1 = 1 + 2 1 x1 = 0
>
>
>
>
> @x@L
= 3 1 x22 2 2 (x2 1) = 0
>
>
< 2
@L
> @x3 = 2 2 x3 = 0
>
>
>
> @L 2 x32 = 0
>
> @ 1 = x1
>
>
: @L = x2 x2 + 2x = 0
@ 2 3 2 2
and so n p p o
S= 8; 2; 0 ; 8; 2; 0
Among such three points one must search for the possible local solutions of the optimization
problem (38.43).
As to the elimination method, also here the set
is clearly closed. It is also bounded (and so compact). In fact, the second constraint can be
written as x23 + (x2 1)2 = 1, and so the x2 and x3 that satisfy it are such that x2p2 [0; p 2]
and x3 2 [ 1; 1]. Now, the constraint x2 = x3 implies x2 2 [0; 8], and so x 2 [ 8; 8].
p p 1 2 1 1
We conclude that C [ 8; 8] [0; 2] [ 1; 1] and so C is bounded. As in the previous
1168 CHAPTER 38. EQUALITY CONSTRAINTS
example, we can thus use the elimination method through Weierstrass' Theorem. In view of
(38.44), in the last phase of the elimination method we have
p p
f 8; 2; 0 = 8 and f (0; 0; 0) = 0
p
Hence, the origin (0; 0; 0) solves the optimum problem (38.43), while ( 8; 2; 0) is a minimizer.
N
We have that 1 = 2x2 and 2 = 2x3 , which, if substituted in the rst equation, lead to
the following non-linear system in three equations:
8
< x1 + 2x1 x2 x3 = 0
>
1 x21 + x2 = 0
>
:
1 x1 x3 = 0
10
At a \mechanical" level, one can easily verify that no value of x1 can be such that the matrix Dg (x)
does not have full rank.
38.7. SEVERAL CONSTRAINTS 1169
From the last two equations it follows that x2 = x21 1 and x3 = 1p x1 , which, if substituted
in the rst equation, imply that 2x31 1 = 0, from which x1 = 1= 3 2 follows and so
1 1
x2 = p
3
1 and x3 = 1 p
3
4 2
Therefore, the Lagrangian has a unique critical point
1 1 1 2 2
p
3
;p
3
1; 1 p
3
;p
3
2; 2 + p
3
2 4 2 4 2
so that
1 1 1
S= p
3
;p
3
1; 1 p
3
2 4 2
This completes all phases of Lagrange's method. In conclusion, C D = ; and D0 = ; we
have
1 1 1
S [ (C \ D0 ) [ (C D) = S = p
3
;p
3
1; 1 p
3
(38.46)
2 4 2
There is a unique candidate local solution of the optimization problem (38.45).
Let us consider the elimination method. The set
is closed but
p not bounded (so it is not compact). In fact, consider the sequence fxn g given
by xn = 1 + n; n; 1 n . The sequence belongs to C, but kxn k ! +1 and so there is
no neighborhood in R3 that may contain it. On the other hand, by Proposition 1019 the
function f is coercive and continuous on C. As in the last two examples, we can thus use the
elimination method but this time via Tonelli's Theorem. In view of (38.46), the elimination
method implies that the point
1 1 1
p
3
;p
3
1; 1 p
3
2 4 2
is the solution of the optimization problem (38.45). In this case the elimination method
is silent about possible minimizers because it relies on Tonelli's Theorem rather than on
Weierstrass' Theorem. N
1170 CHAPTER 38. EQUALITY CONSTRAINTS
Chapter 39
Inequality constraints
39.1 Introduction
Let us go back to the consumer problem seen at the beginning of the previous chapter, in
which we considered a consumer with utility function u : A Rn ! R and income w 0.
Given the vector p 2 Rn+ of prices of the goods, because of Walras' law we wrote his budget
constraint as
C (p; w) = fx 2 A : p x = wg
and his optimization problem as:
In this formulation we assumed that the consumer exhausts his budget (so the equality in
the budget constraint) and we did not impose other constraints on the bundle x except that
of satisfying the budget constraint. However, the hypothesis that income is entirely spent
may be too strong, so one may wonder what happens to the consumer optimization problem
if we weaken the constraint to p x w, that is, if the constraint is given by an inequality
and not anymore by an equality.
As to the bundles of goods x, in many cases it is meaningless to talk of negative quantities.
Think for example of the purchase of physical goods, say fruit or vegetables in an open air
market, in which the quantity purchased has to be positive. This suggests to impose the
positivity constraint x 0 in the optimization problem.
By keeping in mind these observations, the consumer problem becomes:
the optimization problem still takes the form (39.1), but the budget constraint C (p; w) is
now di erent.
1171
1172 CHAPTER 39. INEQUALITY CONSTRAINTS
The general form of an optimization problem with both equality and inequality constraints
is:
where I and J are nite sets of indexes (possibly empty), f : A Rn ! R is the objective
function, the functions gi : A Rn ! R and the associated scalars bi characterize jIj
equality constraints, while the functions hj : A Rn ! R with the associated scalars cj
induce jJj inequality constraints. We continue to assume, as in the previous chapter, that
the functions f , gi and hj are continuously di erentiable on a non-empty and open subset
D of their domain A.
The optimization problem (39.4) can be equivalently formulated in canonical form as
(i) A constraint of the form h (x) c can be included in the formulation (39.4) by consid-
ering h (x) c. In particular, the constraint x 0 can be included by considering
x 0;
(ii) A constrained minimization problem for f can be written in the formulation (39.4) by
considering f .
These two observations show the scope and exibility of formulation (39.4). In particular,
in light of (ii) it should be clear that also the choice of the sign in expressing the inequality
constraints is just a convention. That said, next we give some discipline to this formulation.
De nition 1721 The problem (39.4) is said to be well posed if, for each j 2 J, there exists
x 2 C such that hj (x) < c.
To understand this de nition observe that an equality constraint g (x) = b can be written
in form of inequality constraint as g (x) b and g (x) b. This makes uncertain the
distinction between equality and inequality constraints in (39.4). To avoid this, and so
to have a clear distinction between the two types of constraints, in what follows we will
always consider optimization problems (39.4) that are well posed, so that it is not possible
to express equality constraints in the form of inequality constraints. Naturally, De nition
1721 is automatically satis ed when J = ;, so there are no inequality constraints to worry
about.
39.1. INTRODUCTION 1173
is of the form (39.4) with jIj = jJj = 1, f (x) = x21 + x22 + x33 , g (x) = x1 + x2 x3 ,
h (x) = x21 + x22 and b = c = 1.1 These functions are continuously di erentiable, so D = R3 .
Moreover, C = x 2 R3 : x1 + x2 x3 = 1 and x21 + x22 1
(ii) The optimization problem:
max x1
x1 ;x2 ;x3
is of the form (39.4) with I = f1; 2g, J = ;, f (x) = x1 , g1 (x) = x21 + x32 , g2 (x) =
x23 + x22 2x2 and b1 = b2 = 0. These functions are continuously di erentiable, so D = R3 .
Moreover, C = x 2 R3 : x32 = x21 and x23 + x22 = 2x2
(iii) The optimization problem:
1 1
C= x 2 R3 : x1 + x2 + x3 = 1, x21 + x22 + x23 = , x1 0 and x2
2 10
sub x1 + x2 1 and x1 + x2 1
is of the form (39.4) with I = ;; J = f1; 2g ; f (x) = x31 x32 , h1 (x) = x1 + x2 , h2 (x) =
x2 + x1 and c1 = c2 = 1. These functions are continuously di erentiable, so D = R2 .
Moreover, C = x 2 R2 : x1 + x2 1 and x2 1 + x1
(v) The minimum problem:
min x1 + x2 + x3
x1 ;x2 ;x3
1
sub x1 + x2 = 1 and x22 + x23
2
1
To be pedantic, here we should have set I = J = f1g ; g1 (x) = x1 + x2 x3 , h1 (x) = x21 + x22 and
b1 = c1 = 1. But, in this case of a single equality constraint and of a single inequality constraint, the
subscripts just make the notation heavy.
1174 CHAPTER 39. INEQUALITY CONSTRAINTS
max (x1 + x2 + x3 )
x1 ;x2 ;x3
1
sub x1 + x2 = 1 and x22 x23
2
N
In words, A (x) is the set of the indices of the so-called binding constraints at x, that is, of
the constraints that hold as equalities at the given point x. For example, in the problem
max f (x1 ; x2 ; x3 )
x1 ;x2 ;x3
In other words, a point x 2 D is regular if the gradients of the functions that induce
constraints binding at such point are linearly independent. This condition generalizes the
notion of regularity upon which Lemma 1716 was based. Indeed, if we form the matrix whose
rows consist of the gradients of the functions that induce binding constraints at the point
considered, the regularity of the point amounts to require that such a matrix has full rank.
Note that in view of Corollary 94-(ii) a point is regular only if jA (x)j n, that is, only
if the number of the binding constraints at x does not exceed the dimension of the space on
which the optimization problem is de ned.
We can now state the generalization of Lemma 1716 for problem (39.4). In reading it,
note how the vector ^ associated to the inequality constraints has positive sign, while there
is no restriction on the sign of the vector ^ associated to the equality constraints.
X X
rf (^
x) = ^ i rgi (^
x) + ^ j rhj (^
x) (39.8)
i2I j2J
^ j (c hj (^
x)) = 0 8j 2 J (39.9)
@f X @gi X @hj
(^
x) = ^i (^
x) + ^j (^
x) 8k = 1; :::; n
@xk @xk @xk
i2I j2J
This lemma generalizes both Fermat's Theorem and Lemma 1716. Indeed:
The novelty of Lemma 1724 relative to these previous results is, besides the positivity of
the vector ^ associated to the inequality constraints, the condition (39.9). To understand
the role of this condition, it is useful the following characterization.
Lemma 1725 Condition (39.9) holds if and only if ^ j = 0 for each j such that hj (^
x) < cj ,
that is, for each j 2
= A (^
x).
Proof Assume (39.9). Since for each j 2 J we have hj (^ x) cj , from the positive sign of ^
it follows that (39.9) implies cj hj (^ x) = 0 for each j 2 J, and therefore ^ j = 0 for each j
such that hj (^x) < cj . Conversely, if this last property holds we have
^ j (cj hj (^
x)) = 0 8j 2 J (39.10)
because, being hj (^
x) cj for each j 2 J, we have hj (^
x) < cj or hj (^
x) = cj . Condition
(39.10) immediately implies (39.9).
1176 CHAPTER 39. INEQUALITY CONSTRAINTS
In words, (39.9) is equivalent to require the nullity of each ^ j associated to a not binding
constraint. Hence, we can have ^ j > 0 only if the constraint j is binding in correspondence
of the solution x^.
For example, if x^ is such that hj (^
x) < cj for each j 2 J, i.e., if in correspondence of x ^
all the inequality constraints are not binding, then we have ^ j = 0 for each j 2 J and the
vector ^ does not play any role in the determination of x ^. Naturally, this re ects the fact
that for this solution x
^ the inequality constraints themselves do not play any role.
The next example shows that conditions (39.8) and (39.9) are necessary, but not su cient
(something not surprising since the same is true for Fermat's Theorem and for Lemma 1716).
x31 + x32
max (39.11)
x1 ;x2 2
sub x1 x2 0
It is a simple modi cation of Example 1708, and has the form (39.4) with f; h : R2 ! R
given by f (x) = 2 1 (x31 + x32 ) and h (x) = x1 x2 , while c = 0. We have:
and
rf (0; 0) = rg (0; 0)
(0 0) = 0
The origin (0; 0) satis es with = 0 the conditions (39.8) and (39.9), but it is not solution
of the optimization problem (39.11), as (38.9) shows. N
We defer the proof of Lemma 1724 to the appendix.3 It is possible, however, to give a
heuristic proof of this lemma by reducing problem (39.4) to a problem with only equality
constraints, and then by exploiting the results seen in the previous chapter. For simplicity,
we give this argument for the special case
jJj
for each (x; ; ) 2 A RjIj R+ . Note that the vector is required to be positive.
The next famous result, proved in 1951 by Harold Kuhn and Albert Tucker, generalizes
Lagrange's Theorem to the optimization problem (39.4). We omit the proof because it is
analogous to that of Lagrange's Theorem.
^; ^ ; ^ = 0
rLx x (39.15)
^ j rL j
^; ^ ; ^ = 0
x 8j 2 J (39.16)
rL ^; ^ ; ^ = 0
x (39.17)
rL ^; ^ ; ^
x 0 (39.18)
The components ^ i and ^ j of the vectors ^ and ^ are called Lagrange (or Kuhn-Tucker )
multipliers, while (39.15)-(39.18) are called Kuhn-Tucker conditions. The points x 2 A
jJj
for which there exists a pair ( ; ) 2 RjIj R+ such that the triple (x; ; ) satis es the
conditions (39.15)-(39.18) are called points of Kuhn-Tucker.
Kuhn-Tucker points are, therefore, the solutions of the { typically nonlinear { system of
equations and inequalities given by the Kuhn-Tucker conditions. By Kuhn-Tucker's Theo-
rem, a necessary condition for a regular point x to be solution of the optimization problem
(39.4) is that it is a point of Kuhn-Tucker.7 Observe, however, that a Kuhn-Tucker point
(x; ; ) is not necessarily a stationary point for the Lagrangian function: the condition
(39.18) only requires rL (x; ; ) 0, not the stronger property rL (x; ; ) = 0.
Later in the book, in Section 41.7, we will present a marginal interpretation of the
multipliers ( ^ ; ^ ), along the lines sketched in the case of equality constraints (Section 38.3.3).
Let D0 be the set of the singular points x 2 D where the regularity condition of the
constraints does not hold, and let D1 be, instead, the set of the points x 2 A where this
condition holds. The method of elimination consists of four phases:
1. Verify if Tonelli's Theorem can be applied, that is, if f is continuous and coercive on
C;
2. determine the set D where the functions f and gi are continuously di erentiable;
3. determine the set C D of the points of the constraint set where the functions f and
g are not continuously di erentiable;
4. determine the set C \ D0 of the singular points that satisfy the constraints;
5. determine the set S of the regular Kuhn-Tucker points, i.e., the points x 2 C \(D D0 )
jJj
for which there exists a pair ( ; ) 2 RjIj R+ of Lagrange multipliers such that the
triple (x; ; ) satis es the Kuhn-Tucker conditions (39.15)-(39.18);8
f (^
x) f (x) 8x 2 S [ (C \ D0 ) [ (C D)
then such x
^ is solution of the optimization problem (39.4).
The rst phase of the method of elimination is the same of the previous chapter, while
the other phases are the obvious extension of the method to the case of the problem (39.4).
has the form (39.4), where f; h : R2 ! R are given by f (x1 ; x2 ) = x1 2x22 and h (x1 ; x2 ) =
x21 + x22 , while b = 1. Since C is compact, the rst phase is completed through Weierstrass'
Theorem.
The functions f and h are continuously di erentiable, so D = R2 and C D = ;. We
have rh (x) = (2x1 ; 2x2 ), so the constraint is regular at each point x 2 C, that is, C \D0 = ;.
This completes the rst four phases of the elimination method.
The Lagrangian function L : R3 ! R is given by
and to nd the set S of its Kuhn-Tucker points it is necessary to solve the system
8
> @L
>
> @x1 = 1 2 x1 = 0
>
>
>
> @L
= 4x2 2 x2 = 0
>
>
< @x2
@L
@ = 1 x21 x22 = 0
>
>
>
> @L
>
> = 1 x21 x22 0
> @
>
>
: 0
We start by observing that 6= 0, that is, > 0. Indeed, if = 0 the rst equation
becomes 1 = 0, a contradiction. We therefore assume that > 0. The second equation
implies x2 = 0, and in turn the third equation implies x1 = 1. From the rst equation it
follows = (1=2), and hence the only solution of the system is ( 1; 0; (1=2)). The only
Kuhn-Tucker point is therefore ( 1; 0) , i.e., S = f( 1; 0)g.
In sum, since the sets C \ D0 and C D are both empty, we have
S [ (C \ D0 ) [ (C D) = S = f( 1; 0)g
The method of elimination allows us to conclude that ( 1; 0) is the only solution of the
optimization problem 39.19. Note that in this solution the constraint is binding (i.e., it is
satis ed with equality); indeed = (1=2) > 0, as required by Proposition 1728. N
Therefore,
n
X
2 x2i =0
i=1
Pn 2
that is, = 2 i=1 xi . We conclude that 0.
If xi = 0, from the condition @L=@xi = 0 it follows that = i . Since i 0 and 0,
it follows that i = 0. In turn, this implies = 0 and hence using again the condition
@L=@xi = 0 we P conclude that xi = = 0 for each i = 1; :::; n. But this contradicts the
n
condition (1 i=1 xi ) = 0, and we therefore conclude that xi 6= 0, that is, xi > 0.
Since this holds for each i = 1; :::; n, it follows that xi > 0 for each i = 1; :::; n. From
the condition i xi = 0 it follows that i = 0 for each i = 1; :::; n, and the rst n equations
become:
2xi =0 8i = 1; :::; n
P
that is, xi = =2 for each i = 1; :::; n. The xi are therefore all equal; from ni=1 xi = 1 it
follows that
1
xi = 8i = 1; :::; n
n
In conclusion,
1 1
S= ; :::;
n n
Since C D = ; and D0 = ;, we have
1 1
S [ (C \ D0 ) = ; :::;
n n
1182 CHAPTER 39. INEQUALITY CONSTRAINTS
The method of elimination allows us to conclude that the point (1=n; :::; 1=n) is the solution
of the optimization problem (39.20). N
has solution (1=n; :::; 1=n). It is the unique solution if h is strictly concave.
Pn
If h (xi ) = xi log xi , the function i=1 h (xi ) is called entropy (Examples 239 and 1685).
P
Proof Let x1 ; x2 ; :::; xn 2 [0; 1] with the constraint ni=1 xi = 1. Since h is concave, by the
Jensen's inequality we have
n n
!
X 1 1X 1
h (xi ) h xi = h
n n n
i=1 i=1
Namely,
n
X 1 1 1
h (xi ) nh =h + +h
n n n
i=1
P
This shows that (1=n; :::; 1=n) is a solution. Clearly, ni=1 h (xi ) is strictly concave if h is.
Hence, the uniqueness of the solution is ensured by Theorem 1032.
Proposition 1732 Let A be convex. If the functions gi are a ne for each i 2 I and the
functions hj are convex for each j 2 J, then the choice set C de ned in (39.5) is convex.
It is easy to give examples where C is no longer convex when the conditions of convexity
and a nity used in this result are not satis ed. Note that the convexity condition of the
hj is much weaker than that of a nity on the gi . This shows that the convexity of the
choice set is more natural for inequality constraints than for equality ones. This is a crucial
\structural" di erence between the two types of constraints { which are more di erent than
it may appear prima facie.
De nition 1733 The optimization problem (39.4) is said to be concave if the objective
function f is concave, the functions gi are a ne and the functions hj are convex on the open
and convex set A.
x=b (39.23)
In a similar vein, when also the functions hj happen to be a ne, say hj (x) = j x + qi ,
we can write also the inequality constraints in the matrix form Hx c, where H is the
jJj n matrix with rows j and c 2 RjJj . Thus, when all constraints are identi ed by a ne
functions, the choice set is a polyhedron C = fx 2 Rn : x = b and Hx cg. This case often
arises in applications. Indeed, if also the objective function is a ne, we are back to linear
programming, an important class of concave problem that we already studied via convexity
arguments (Section 22.7.2).
1184 CHAPTER 39. INEQUALITY CONSTRAINTS
Theorem 1734 The Kuhn-Tucker points solve a concave optimization problem in which the
functions f; fgi gi2I and fhj gj2J are di erentiable.
Proof Let (x ; ; ) be a Kuhn-Tucker point for the optimization problem (39.4), that is,
(x; ; ) satis es the conditions (39.15)-(39.18). In particular, this means that
X X
rf (x ) = i rgi (x )+ j rhj (x ) (39.24)
i2I j2A(x )\J
rhj (x ) (x x ) 0 8j 2 A (x ) ; 8x 2 C
rgi (x ) (x x )=0 8i 2 I; 8x 2 C
f (x) f (x ) + rf (x ) (x x ) 8x 2 A
has the form (39.4), where f; h1 ; h2 : R3 ! R are continuously di erentiable functions given
by f (x1 ; x2 ; x3 ) = x1 x2 x23 , h1 (x1 ; x2 ; x3 ) = x21 +x22 2x1 , h2 (x1 ; x2 ; x3 ) = x21 +x22 +2x1 ,
while c1 = c2 = 0.
Clearly, f is concave and h1 and h2 are convex, so (39.27) is a concave optimization
problem. The system of inequalities
rf (0; 0; 0) = ( 1; 1; 0)
By combining Kuhn-Tucker's Theorem and Theorem 1734 we get the following necessary
and su cient optimality condition.
Theorem 1736 Consider a concave optimization problem in which the functions f; fgi gi2I
and fhj gj2J are continuously di erentiable. A regular point x 2 A is a solution of the
problem if and only if it is a Kuhn-Tucker point.
3. determine the set C \ D0 of the singular points that satisfy the constraints;
5. if S 6= ;, then all the points of S are solutions of the problem,9 while also a singular
point x 2 C \ D0 is a solution if and only if f (x) = f (^
x) for some x
^ 2 S;
Since either phase 5 or 6 applies, depending on whether or not S is empty, the actual
phases of the convex method are ve.
The convex method works thanks to Theorems 1734 and 1736. Indeed, if S 6= ; then
by Theorem 1734 all points of S are solutions of the problem. In this case, a singular point
x 2 C \ D0 can in turn be a solution when its value f (x) is equal to that of any point in S.
When, instead, we have S = ;, then Theorem 1736 guarantees that no regular point in A
is solution of the problem. At this stage, if Tonelli's Theorem is able to ensure the existence
of at least a solution, we can restrict the search to the set C \ D0 of the singular points that
satisfy the constraints. In other words, it is su cient to nd the maximizers of f on C \ D0 :
they are also solutions of problem (39.4), and vice versa.
Clearly, the convex method becomes especially powerful when S 6= ; because in such a
case there is no need to verify the validity of global existence theorems a la Weierstrass or
Tonelli, but it is su cient to nd the Kuhn-Tucker points.
If we content ourselves with solutions that are regular points, without worrying about
the possible existence of singular solutions, we can give a short version of the convex method
that is based only on Theorem 1734. We can call it the short convex method. It is based
only on three phases:
Indeed, by Theorem 1734 all regular Kuhn-Tucker points are solutions of the problem.
The short convex method is simpler than the convex method, and it does not require the use
of global existence theorems. The price of this simpli cation is in the possible inaccuracy of
this method: being based on su cient conditions, it is not able to nd the solutions where
these conditions are not satis ed (by Theorem 1736, such solutions would be singular points).
Furthermore, the short method cannot be applied when S = ;; in such a case, it is necessary
to apply the complete convex method.
The short convex method is especially powerful when the objective function f is strictly
concave, as often assumed in applications. Indeed, in such a case a solution found with the
short method is necessarily also the unique solution of the concave optimization problem.
The next example illustrates.
9
The set S is at most a singleton when f is strictly concave because in such a case there is at most a
solution of the problem (Theorem 1032).
39.4. CONCAVE OPTIMIZATION 1187
This problem is of the form (39.4), where f; h1 ; h2 : R3 ! R are given by f (x) = x21 + x22 + x23 ,
h1 (x) = (3x1 + x2 + 2x3 ) and h2 (x) = x1 , while c1 = 1 and c2 = 0.
Using Theorem 1474 it is easy to verify that f is strictly concave, while it is immediate
to verify that h1 and h2 are convex. Therefore, (39.28) is a concave optimization problem.
Moreover, the functions f , h1 and h2 are continuously di erentiable. This completes the
rst two phases of the short convex method, which we apply here since f is strictly concave.
Let us nd the Kuhn-Tucker points. The Lagrangian function L : R5 ! R is given by
To nd the set S of its Kuhn-Tucker points it is necessary to solve the system of equalities
and inequalities:
8 @L
>
> @x1 = 2x1 + 3 1 + 2 = 0
>
>
> @L = 2x +
>
>
> @x2 2 1 =0
>
>
>
> @L
>
> @x3 = 2x3 + 2 1 = 0
>
>
>
>
< 1 @@L = 1 ( 1 + 3x1 + x2 + 2x3 ) = 0
1
(39.29)
>
> @L
= x = 0
>
> 2 @ 2 2 1
>
>
> @L
> @ = 1 + 3x1 + x2 + 2x3 0
>
>
>
>
1
>
> @L
>
> @ 2 = x1 0
>
>
:
1 0; 2 0
We consider four cases, depending on the fact that the multipliers 1 and 2 are zero or
not.
Case 1: 1 > 0 and 2 > 0. The conditions 2 @L=@ 2 = @L=@x1 = 0 imply x1 = 0 and
3 1 + 2 = 0. This last equation does not have strictly positive solutions 1 and 2 , and
hence we conclude that we cannot have 1 > 0 and 2 > 0.
Case 2: 1 = 0 and 2 > 0. The conditions 2 @L=@ 2 = @L=@x1 = 0 imply x1 = 0 and
3 1 = 0, that is 1 = 0. This contradiction shows that we cannot have 1 = 0 and 2 > 0.
Case 3: 1 > 0 and 2 = 0. The conditions 1 @L=@ 1 = @L=@x1 = @L=@x2 = @L=@x3 =
0 imply: 8
> 2x1 + 3 1 = 0
>
>
< 2x + =0 2 1
>
> 2x3 + 2 =0
>
:
1
3x1 + x2 + 2x3 = 1
Solving for 1 , we get 1 = 1=7, and hence x1 = 3=14, x2 = 1=14 and x3 = 1=7. The
quintuple (3=14; 1=14; 1=7; 1=7; 0) solves the system (39.29), and hence (3=14; 1=14; 1=7) is a
Kuhn-Tucker point.
1188 CHAPTER 39. INEQUALITY CONSTRAINTS
We close with an important observation. The solution methods seen in this chapter are
based on the search of the Kuhn-Tucker points, and therefore they require the resolution of
systems of nonlinear equations. In general, these systems are not easy to solve and this limits
the computational usefulness of these methods, whose importance is mostly theoretical. At
a numerical level, other methods are used (which the interested reader can nd in books of
numerical analysis).
Lemma 1738 (i) The function y = x jxj is continuously di erentiable in R and Dx jxj =
2
2 jxj. (ii) The square (x+ ) of the function x+ = max fx; 0g is continuously di erentiable on
2
R, and D (x+ ) = 2x+ .
Proof (i) Observe that x jxj is in nitely di erentiable for x 6= 0 and its rst derivative is,
by the product rule for di erentiation,
jxj
Dx jxj = xD jxj + jxj Dx = x + jxj = 2 jxj
x
This is true for x 6= 0. Now it su ces to invoke a basic calculus result that asserts: let f : I !
R be continuous on a real interval, and f be di erentiable at I fx0 g; if limx!x0 Df (x) = ,
then f is di erentiable at x0 and Df (x0 ) = . As an immediate consequence, Dx jxj = 2 jxj
also at x = 0. (ii) We have x+ = 2 1 (x + jxj). Therefore
2 1 1 1
x+ = (x + jxj)2 = x2 + x jxj
4 2 2
2 2
It follows that (x+ ) is continuously di erentiable and D (x+ ) = x + jxj = 2x+ .
Proof of Lemma 1724 Let k k be the Euclidean norm. We have hj (^ x) < cj for each
j 2
= A (^ x). Since A is an open, there exists ~" > 0 su ciently small such that B~" (^ x) =
fx 2 A : kx x ^k ~"g A. Moreover, since each hj is continuous, for each j 2 = A (^
x) there
exists "j su ciently small such that hj (x) < cj for each x 2 B"j (^ x) = fx 2 A : kx x ^k "j g.
Let "0 = minj 2A(^
= x) "j and ^ " = min f~"; "0 g; in other words, ^" is the minimum between ~" and
10
The objective function is easily see to be strongly concave. So, coda readers may note that the existence
and uniqueness of the solution would also follow from Theorem 1501.
39.5. APPENDIX: PROOF OF A KEY LEMMA 1189
the "j . In this way we have B^" (^ x) = fx 2 A : kx x ^k ^"g A and hj (x) < cj for each
x 2 B^" (^
x) and each j 2 = A (^x).
Given " 2 (0; ^"], the set S" (^
x) = fx 2 A : kx x ^k = "g is compact. Moreover, by what
just seen hj (x) < cj for each x 2 S" (^ x) and each j 2= A (^
x), that is, in S" (^
x) all the non
binding constraints are always satis ed.
For each j 2 J, let h~ j : A Rn ! R be de ned by
~ 2 2 C 1 (A) and
for each x 2 A. By Lemma 1738, h j
~ 2 (x)
@h + @hj (x)
j ~ j (x)
=2 h cj ; 8p = 1; :::; n (39.30)
@xp @xp
Fact 1. For each " 2 (0; ^"], there exists N > 0 such that
f (x) f (^
x) kx x ^ k2 (39.31)
0 1
X X 2
N@ x))2 +
(gi (x) gi (^ ~ j (x)
h ~ j (^
h x) A<0
i2I i2J\A(^
x)
Proof of Fact 1 We proceed by contradiction, and we assume therefore that there exists
" 2 (0; ^"] for which there is no N > 0 such that (39.31) holds. Take an increasing sequence
fNn gn with Nn " +1, and for each of these Nn take xn 2 S" (^ x) for which (39.31) does not
hold, that is, xn such that:
f (xn ) f (^
x) kxn x ^k2
0 1
X X 2
Nn @ x))2 +
(gi (xn ) gi (^ ~ j (xn )
h ~ j (^
h x) A 0
i2I i2J\A(^
x)
f (xn ) f (^
x) kxn ^k2
x X
(gi (xn ) x))2
gi (^ (39.32)
Nn
i2I
X 2
+ ~ j (xn )
h ~ j (^
h x)
j2J\A(^
x)
Since the sequence fxn g just constructed is contained in the compact set S" (^ x), by the
Bolzano-Weierstrass' Theorem there exists a subsequence fxnk gk convergent in S" (^ x), i.e.,
there exists x 2 S" (^
x) such that xnk ! x . Inequality (39.32) implies that, for each k 1,
1190 CHAPTER 39. INEQUALITY CONSTRAINTS
we have:
f (xnk ) f (^
x) kxnk ^ k2
x X
(gi (xnk ) x))2
gi (^ (39.33)
Nnk
i2I
X 2
+ ~ j (xn )
h ~ j (^
h x)
k
j2J\A(^
x)
f (xnk ) f (^
x) kxnk ^k2
x
lim =0
k Nnk
~j ,
and hence (39.33) implies, thanks to the continuity of the functions gi and h
X X 2
(gi (x ) x))2 +
gi (^ ~ j (x )
h ~ j (^
h x)
i2I i2J\A(^
x)
0 1
X X 2
= lim @ (gi (xnk ) x))2 +
gi (^ ~ j (xn )
h k
~ j (^
h x) A=0
k
i2I j2J\A(^
x)
2
It follows that (gi (x ) x))2 =
gi (^ ~ j (x )
h ~ j (^
h x)
= 0 for each i 2 I and for each
j 2 J \ A (^x), from which gi (x ) = gi (^
x) = bi for each i 2 I and h~ j (x ) = h
~ j (^
x) = cj for
each j 2 J \ A (^x).
Since in S" (^x) the non binding constraints are always satis ed, i.e., hj (x) < cj for each
x 2 S" (^x) and each j 2 = A (^
x), we can conclude that x satis es all the constraints. We
therefore have f (^x) f (x ) given that x ^ solves the optimization problem.
On the other hand, since xnk 2 S" (^ x) for each k 1, (39.33) implies
f (xnk ) f (^
x)
0 1
X X 2
kxnk ^k2 + Nnk @
x (gi (xnk ) x))2 +
gi (^ ~ j (xn )
h k
~ j (^
h x) A "2
i2I j2J\A(^
x)
for each k 1, and hence f (xnk ) f (^x) + "2 for each k 1. Thanks to the continuity of
f , this leads to
f (x ) = lim f (xnk ) f (^x) + "2 > f (^
x)
k
which contradicts f (^
x) f (x ). This contradiction proves Fact 1. 4
Using Fact 1, we prove now a second property that we will need. Here we set S =
SRjIj+jJj+1 = x 2 RjIj+jJj+1 : kxk = 1 .
Fact 2. For each " 2 (0; ^"], there exist x" 2 B" (^
x) and a vector
X X
" @f @gi " " @hj
0 (x" ) 2 x"j x
^j "
i (x ) j (x" ) = 0 (39.34)
@xz @xz @xz
i2I j2J\A(^
x)
Proof of Fact 2 Given " 2 (0; ^"], let N" > 0 be the positive constant whose existence is
guaranteed by Fact 1. De ne the function " : A Rn ! R as:
0 1
X X 2
" (x) = f (x) f (^
x) kx x ^k2 N" @ x))2 +
(gi (x) gi (^ ~ j (x) h
h x) A
~ j (^
i2I j2J\A(^
x)
The function " is continuous on the compact set B" (^ x) = fx 2 A : kx x ^k "g and, by
Weierstrass' Theorem, there exists x" 2 B" (^ x) such that " (x" ) " (x) for each x 2 B" (^
x).
"
In particular, " (x ) " "
" (^
x) = 0, and hence (39.35) implies that kx k < ", that is, x 2
x). Point x" is therefore a maximizer on the open set B" (^
B" (^ x) and by Fermat's Theorem
we have r " (x" ) = 0. Therefore, by (39.30), we have:
0 1
Xm X
@f
(x" ) 2 (x"z x ^z ) 2N" @ gi (x" )
@gi "
(x ) + ~ j (x" ) @hj (x" )A = 0 (39.36)
h
@xz @xz @xz
i=1 j2J\A(^
x)
so that (39.34) is obtained by dividing (39.36) by c" . Observe that "i 0 for each j 2 J
P " 2 P 2
" " "
and that i2I ( i ) + j2J "j = 1, i.e., " "
0 ; 1 ; :::; jIj ; 1 ; :::; jJj 2 S. 4
Using Fact 2, we can now complete the proof.nTake a decreasing sequence o f"n gn (0; ^"]
n n n n n
with "n # 0, and consider the associated sequence 0 ; 1 ; :::; jIj ; 1 ; :::; jJj S whose
n
existence is guaranteednby Fact 2. o
n n n n n
Since the sequence 0 ; 1 ; :::; jIj ; 1 ; :::; jJj is contained in the compact set S, by
n
the Bolzano-Weierstrass' Theorem there exists a subsequence
n o
nk nk nk nk nk
0 ; 1 ; :::; jIj ; 1 ; :::; jJj k
1192 CHAPTER 39. INEQUALITY CONSTRAINTS
convergent in S, that is, there exists 0; 1 ; :::; jIj ; 1 ; :::; jJj 2 S such that
nk nk nk nk nk
0 ; 1 ; :::; jIj ; 1 ; :::; jJj ! 0; 1 ; :::; jIj ; 1 ; :::; jJj
nk @f X nk @gi nk X nk @hj nk
0 (xnk ) 2 (xnk x
^z ) i (x ) j (x ) = 0
@xz @xz @xz
i2I j2J\A(^
x)
for each z = 1; :::; n. Consider the sequence fxnk gk so constructed. From xnk 2 B"nk (^
x) it
follows that kxn k x^k < "nk ! 0 and hence, for each z = 1; :::; n,
@f X @gi X @hj
0 (^
x) i (^
x) j (x) (39.37)
@xz @xz @xz
i2I j2J\A(^
x)
0 1
nk @f X nk @gi nk X nk @hj nk A
= lim @ 0 xk 2 (xnk x
^z ) i (x ) j (x )
k @xz @xz @xz
i2I j2J\A(^
x)
= 0:
The linear independence of the gradients associated to the constraints that holds for the
hypothesis of regularity of the constraints implies i = 0 for each i 2 I, which contradicts
0 ; 1 ; :::; jIj ; 1 ; :::; jJj 2 S.
General constraints
where X is a subset of A and the other elements are as in the optimization problem (39.4).
This problem includes as special cases the optimization problems that we have seen so far:
we get back to the optimization problem (39.4) when X = A and to an unconstrained
optimization problem when I = J = ; and C = X is open.
Besides its own interest, formulation (40.1) may be also useful when there are conditions
on the sign or on the value of the choice variables xi . The classic example is the non-negativity
condition of the xi , which are best expressed as a constraint x 2 Rn+ rather than through n
inequalities xi 0. Here a constraint of the form x 2 X simpli es the exposition.
In this chapter we address the general optimization problem (40.1). If X is open, the
solution techniques of Section 39.2 can be easily adapted by restricting the analysis on X
itself, which then becomes the ad hoc domain of the objective function f . Matters are more
interesting when X is not open, in particular when it is a closed and convex set. This is
the case that we will consider. We will thus focus on the concave optimization problems
studied in Section 39.4, widely used in applications. Consequently, throughout the chapter
we assume that:
1
Sometimes this distinction is made by talking of implicit and explicit constraints. Di erent authors,
however, may give an opposite meaning to this terminology (that, in any case, we do not adopt).
1193
1194 CHAPTER 40. GENERAL CONSTRAINTS
The set C is closed and convex. As it is often the case, the best way to proceed is to abstract
from the speci c problem at hand, with its potentially distracting details. For this reason,
we will consider the following general optimization problem:
The next lemma gives a simple and elegant way to unify these two cases.
Proposition 1739 If x
^ 2 [a; b] is solution of the optimization problem (40.4), then
f 0 (^
x) (x x
^) 0 8x 2 [a; b] (40.5)
Proof We divide the proof in three parts, one for each of the equivalences to prove.
(i) Let x^ 2 (a; b). We prove that (40.5) is equivalent to f 0 (^
x) = 0. If f 0 (^
x) = 0 holds,
0
then f (^ x) (x x ^) = 0 for each x 2 [a; b], and hence (40.5) holds. Vice versa, suppose that
(40.5) holds. Setting x = a, we have (a x ^) < 0 and so (40.5) implies f 0 (^ x) 0. On
the other hand, setting x = b, we have (b x ^) > 0 and so (40.5) implies f (^ 0 x) 0. In
conclusion, x ^ 2 (a; b) implies f 0 (^
x) = 0.
(iii) Let x ^ = b. We prove that (40.5) is equivalent to f 0 (b) 0. Let f 0 (b) 0. Since
0
(x b) 0 for each x 2 [a; b], we have f (b) (x b) 0 for each x 2 [a; b] and (40.5) holds.
Vice versa, suppose that (40.5) holds. By taking x 2 [a; b), we have (x b) < 0 and so (40.5)
implies f 0 (b) 0.
Proof of Proposition 1739 In view of Lemma 1740, it only remains to prove that (40.5)
becomes a su cient condition when f is concave. Suppose, therefore, that f is concave and
that x
^ 2 [a; b] is such that (40.5) holds. We prove that this implies that x ^ is solution of
problem (40.4). Indeed, by (31.10) we have f (x) f (^ x) + f 0 (^
x) (x x ^) for each x 2 [a; b],
which implies f (x) f (^ x) f 0 (^
x) (x x ^) for each x 2 [a; b]. Thus, (40.5) implies that
f (x) f (^x) 0, that is, f (x) f (^ x) for each x 2 [a; b]. Hence, x^ solves the optimization
problem (40.4).
rf (^
x) (x x
^) 0 8x 2 C (40.6)
As in the scalar case, the variational inequality uni es the optimality necessary conditions
for interior and boundary points. Indeed, it is easy to check that, when x ^ is an interior point
of C, (40.6) reduces to the classic rst-order condition rf (^ x) = 0 of Fermat's Theorem.
1196 CHAPTER 40. GENERAL CONSTRAINTS
0 (t) (0) f (^
x + t (x x ^)) f (^
x)
+ (0) = lim = lim
t!0+ t t!0 + t
df (^
x) (t (x x ^)) + o (kt (x x ^)k)
= lim
t!0 + t
o (t kx x ^k)
= df (^
x) (x x ^) + lim = df (^
x) (x x^) = rf (^
x) (x x
^)
t!0 + t
For each t 2 [0; 1] we have (0) = f (^x) f (zt ) = (t), and so : [0; 1] ! R has a (global)
maximizer at t = 0. It follows that 0+ (0) 0, which implies rf (^ x) (x x ^) 0, as desired.
As to the converse, assume that f is concave. By (31.35), f (x) f (^x) + rf (^
x) (x x ^)
for each x 2 C, and therefore (40.6) implies f (x) f (^x) for each x 2 C.
For the dual minimum problems, the variational inequality is easily seen to take the dual
form rf (^
x) (x x ^) 0 for each x 2 C. For interior solutions, instead, the condition
x) = 0 is the same in both maximization and minimization problems.2
rf (^
min x2 sub x 0
x
rf (^
x) (x x
^) = 2^
x (x x
^) 0 8x 0
NC (x) = fy 2 Rn : y (x x) 0 8x 2 Cg
Next we provide a couple of important properties of NC (x). In particular, (i) ensures that
normal cones are, indeed, cones and (ii) shows that they are non-trivial only for boundary
points.
Proof (i) The set NC (x) is clearly closed. Moreover, given y; z 2 NC (x) and ; 0, we
have
( y + z) (x x) = y (x x) + z (x x) 0 8x 2 C
and so y + z 2 NC (x). By Proposition 859, NC (x) is a convex cone. (ii) We only prove
the \if" part. Let x be an interior point of C. Suppose, by contradiction that there is a
vector y 6= 0 in NC (x). As x is interior, we have that x + ty 2 C for t > 0 su ciently
small. Hence we would have y (x + ty x) = ty y = t kyk2 0. This implies y = 0, a
contradiction. Hence NC (x) = f0g.
To see the importance of normal cones, note that condition (40.6) can be written as:
rf (^
x) 2 NC (^
x) (40.7)
Therefore, x
^ solves the optimization problem (40.3) only if the gradient rf (^
x) belongs to the
normal cone of C with respect to x ^. This way of writing condition (40.6) is useful because,
given a set C, if we can describe the form of the normal cone { something that does not
require any knowledge of the objective function f { we can then have a sense of which form
takes the \ rst-order condition" for the optimization problems that have C as a choice set.
In other words, (40.7) can be seen as a general rst-order condition in which we can dis-
tinguish the part, NC (^ x), determined by the constraint C, and the part, rf (^
x), determined
by the objective function. This distinction between the roles of the objective function and
of the constraint is illuminating.3 For this reason, we report it formally.
The next result characterizes the normal cone for convex cones.
NC (x) = fy 2 Rn : y x = 0 and y x 0 8x 2 Cg
another all-important closed and convex set. To this end, given x 2 n 1 set
The set f y 2 Rn : y 2 I (x) and 0g is easily seen to be the smallest convex cone
that contains I (x). The normal cone is thus such a set.
Proof of Proposition 1747 Suppose that P (x) is not a singleton and let i; j 2 P (x).
Clearly, 0 < xi ; xj < 1. Consider the points x" 2 Rn having coordinates x"i = xi + ",
x"j = xj ", and x"k = xk for all k 6= i and k 6= j; while the parameter " runs over
Pn [ ""0 ; "0 ]
with "0 > 0 su ciently small in order that x " 0 for " 2 [ "0 ; "0 ]. Note that k=1 xk = 1
and so x" 2 n 1 . Let y 2 N n 1 (x). By de nition, y (x" x) 0 for every " 2 [ "0 ; "0 ].
Namely, "yi "yj = " (yi yj ) 0, which implies yi = yj . Hence, it must hold yi = for
all i 2 P (x). That is, the values of y must be constant on P (x). This is trivially true when
P (x) is singleton. Let now j 2= P (x). Consider the vector xj 2 Rn , where xjj = 1 and xjk = 0
for each k 6= j: If y 2 N n 1 (x), then y xj x 0. That is,
X X X
yj yk xk = yj yk xk = yj xk = yj 0
k6=j k2P (x) k2P (x)
Therefore, N n 1 (x) f y 2 Rn : y 2 I (x) and 0g. We now show the converse inclu-
sion. Let y 2 Rn be such that, for some 0, we have yi = for all i 2 P (x) and yk
for each k 2
= P (x). If x 2 n 1 , then
n
X X X
y (x x) = yi (xi xi ) = yi (xi xi ) + yi (xi xi )
i=1 i2P (x) i2P
= (x)
0 1
X X X X
= (xi xi ) + yi xi = @ xi A + yi xi
i2P (x) i2P
= (x) i2P (x) i2P
= (x)
0 1
X X
@ xi A + xi = 0
i2P (x) i2P
= (x)
Hence y 2 N n 1 (x).
C = C1 \ \ Cn
A natural question is whether the n relaxed optimization problems that correspond to the
larger choice sets Ci can be combined to inform on the original optimization problem. The
next result is key, as it provides a condition under which holds an \intersection rule" for
normal cones. It involves the sum
n
( n )
X X
NCi (x) = yi : yi 2 NCi (x) 8i = 1; :::; n
i=1 i=1
int C1 \ \ int Cn 6= ;
where the set Ci itself can replace its interior int Ci if it is a ne.
P
Proof Let xP 2 C. Suppose y = ni=1 yi , with yi 2 NCi (x) for every i = 1; :::; n. Then,
y (x x) = ni=1 yi (x x) 0, and so y 2 NC (x). This proves the inequality. We omit
the proof that Slater's condition implies the equality.
In words, under Slater's condition the normal cone of an intersection of sets is the sum
of their normal cones. Hence, a point x ^ satis es the rst-order condition (40.7) if and only
if there is a vector y^ = (^
y1 ; :::; y^n ) such that
( P
rf (^ x) = ni=1 y^i
y^i 2 NCi (^
x) 8i = 1; :::; n
A familiar \multipliers" format emerges. The next section will show how the Kuhn-Tucker's
Theorem ts in this general framework.
Lemma 1751 The set C satis es Slater's condition if there is x 2 int X such that gi (x) = bi
for all i 2 I and hj (x) < cj for all j 2 J.
40.3. OPENING THE BLACK BOX 1201
\ \
Proof The level sets Ci are a ne (Proposition 828). Since x 2 X \ Ci \ int Cj ,
i2I j2J
this intersection is non-empty and so C satis es Slater's condition.
In what follows we thus assume the existence of such x.5 In view of Proposition 1749, it
now becomes key to characterize the normal cones of the sets Ci and Cj .
where A (x) is the collection of the binding inequality constraints de ned in (39.7). Since in
this concave problem the rst-order condition (40.7) is a necessary and su cient optimality
condition, we can say that x ^ 2 C solves the optimization problem (40.1) if and only if there
^ jJj
exists a triple of vectors ( ; ^ ; ^ ) 2 RjIj R+ Rn such that
8 P P
x) = ^ + i2I ^ i rgi (^
< rf (^ x) + j2J ^ j rhj (^
x)
(40.11)
:
^ j (c hj (^
x)) = 0 8j 2 J
Indeed, as we noted in Lemma 1725, the second condition amounts to require ^ j = 0 for
each j 2
= A (^
x).
To sum up, under a Slater's condition we get back the Kuhn-Tucker's conditions (39.8)
and (39.9), suitably modi ed to cope with the new constraint x 2 X. We leave to the reader
the formulation of these conditions via a Lagrangian function.
5
This also ensures that the problem is well posed in the sense of De nition 1721.
1202 CHAPTER 40. GENERAL CONSTRAINTS
So, condition (40.11) can be equivalently written (with unzipped gradients) as:
8 @f (^x) P
> ^ @gi (^x) P @hj (^
x)
>
> @xk i2I i @xk + j2J ^ j @xk 8k = 1; :::; n
>
>
<
^ j (c hj (^
x)) = 0 8j 2 J
>
>
>
>
: @f (^x) P ^ i @gi (^x) P
> @hj (^x)
@xk i2I @xk j2J ^ j @xk x
^k = 0 8k = 1; :::; n
By considering f , it is easy to see the result continues to hold when in (40.13) we replace
with . The proof is an application of the Brouwer Theorem. On the other hand, it can
be shown6 that the Multivariable Bolzano Theorem implies the Poincare-Miranda Theorem
and so, through it, the Brouwer Theorem. These three theorems are thus equivalent.
Proof Let PK be the projection of K (Section 31.6). Let x 2 Rn . By (31.48), for each y 2 C
it holds
(x PK (x)) (y PK (x)) 0
and so x PK (x) 2 NK (PK (x)). In view of (40.13), we then have
' = PK f PK
This function is bounded. Indeed, since the projection operator is continuous (Corollary
1512) by the Weierstrass Theorem we can set = maxx2K jPK (x)j and = maxx2K jf (x)j
and so, for each x 2 Rn , it holds
j' (x)j = jPK (x) f (PK (x))j jPK (x)j + jf (PK (x))j +
Thus, ' is a self-map on B + (0). We can thus write ' : B + (0) ! B + (0). By the
Brouwer Theorem, there exists a xed point c 2 B + (0) such that
where the last inequality follows from (40.14). We conclude that c 2 K, as desired.
When K is the closed ball B" (0) we get a direct generalization of the scalar Bolzano
Theorem, in which condition (40.13) takes a sharp version.
In the scalar case [ "; "], condition (40.15) amounts to require f ( ") 0 f (") or
f ( ") 0 f ("), that is, f ( ") f (") 0. In turn, this easily implies the scalar version of
Bolzano's Theorem (Theorem 568).
This corollary is an immediate consequence of the Multivariable Bolzano Theorem thanks
to the following characterization of the normal cone of B" (0).
6
See, e.g., Mawhin (2020), from which we also take the next proof.
1204 CHAPTER 40. GENERAL CONSTRAINTS
x (x x) = [x x x x] 0 8x 2 B" (0)
and so x 2 NB" (0) (x). This proves that f x 2 Rn : 0g NB" (0) (x). As to the converse
inclusion, let 0 6= y 2 NB" (0) (x). Since 0 2 B" (0), we have y x 0. On the other hand,
since "y= kyk 2 B" (0) we have
y y y y x
y " x 0 () " y x () " kyk y x () "
kyk kyk kyk
We conclude that
y x
" (40.16)
kyk
By the Cauchy-Schwarz inequality,
that is,
y x
"
kyk
Along with (40.16), this implies that
y x = kyk kxk
The vectors y and x are thus collinear (cf. Theorem 109). As both vectors are di erent from
0, this means that there exists 0 6= 2 R such that y = x. As 0 y x = (x x) and
x x > 0, we conclude that > 0. Hence, y 2 f x 2 Rn : 0g, as desired.
Chapter 41
41.1 De nition
Given a set Rm of parameters and an all-inclusive choice space A Rn , suppose that
each value of the parameter vector determines a choice (or feasible) set ' ( ) A. Choice
sets are thus identi ed, as the parameter varies, by a feasibility correspondence ' : A.
An objective function f : A ! R, de ned over pairs (a; ) of choices a and parameters
, has to be optimized over the feasible sets determined by the correspondence ' : A.
Jointly, ' and f thus determine an optimization problem in parametric form:
When f ( ; ) is, for every 2 , concave (quasi-concave) on the convex set A and ' is
convex-valued, this problem is called concave (quasi-concave).
A vector x
^ 2 ' ( ) is a solution for 2 if it is an optimal choice given , that is,
f (^
x; ) f (x; ) 8x 2 ' ( )
It associates to each the corresponding solution set, i.e., the set of optimal choices. Its
domain S is the solution domain, that is, the collection of all s for which problem
(41.1) admits a solution. If such solution is unique at all 2 S, then is single-valued, that
is, it is a function. In this case : S ! A is a solution function.
1205
1206 CHAPTER 41. PARAMETRIC OPTIMIZATION PROBLEMS
Example 1758 The parametric optimization problem with equality and inequality con-
straints has the form
If f does not depend on the parameter, and if i (x; ) = gi (x) bi for every i 2 I and
j (x; ) = hj (x) cj for every j 2 J (so that m = jIj + jJj), we get back to the familiar
problem (39.4) studied in Chapter 39, that is,
max f (x)
x
sub gi (x) = bi 8i 2 I
hj (x) cj 8j 2 J
In this case, if we set b = b1 ; :::; bjIj 2 RjIj and c = c1 ; :::; cjJj 2 RjJj , the parameter set
consists of all = (b; c) 2 RjIj RjJj . N
is a parametric optimization problem (Section 22.1.4). The set A is the consumption set.
The space Rn+1+ of all price and income pairs is the parameter set , with generic element
= (p; w). The budget correspondence B : Rn+1 + Rn+ is the feasibility correspondence
and the utility function u : A ! R is the objective function (interestingly, in this important
example the objective function does not depend on the parameter).
Let S be the set of all parameters (p; w) for which the consumer problem has
solution (i.e., an optimal bundle). The demand correspondence D : S Rn+ is the solution
n
correspondence, which becomes a demand function D : S ! R+ when optimal bundles are
unique. Finally, the indirect utility function v : S ! R is the value function. N
function smoothly translate in changes in the value functions? To address this question,
in the parametric optimization problem (41.1) we consider two di erent objective functions
f; g : A ! R and, to ease notation, we denote by vf : Sf ! R and vg : Sg ! R their
value functions.
Proposition 1760 Let 2 Sf \ Sg . For each " > 0, if jf (a; ) g (a; )j " for all a 2 A
then jvf ( ) vg ( )j ".
Fortunately, the translation is thus smooth: objective functions that, action by action,
are close induce value functions that, in turn, are close (e.g., close utility functions induce
close indirect utility functions). In terms of value attainment, nothing dramatic happens,
regardless of what happens to the solutions (about them, this result is silent).
f (^
af ; ) + " f (^
ag ; ) + " g (^
ag ; ) f (^
ag ; ) "
So, jvf ( ) vg ( )j = jf (^
af ; ) g (^
ag ; )j ", as desired.
Later in the chapter a much deeper stability result, the Maximum Theorem, will address
other fundamental continuity questions. Unlike this preliminary result, it will be able to say
something about solutions.
41.2 An illustration
Given an element 0 of the simplex n 1, de ne the parametric objective function
f : n 1 Rn ! R by
n
X n
X xi
f (x; ) = i xi + xi log
i=1 i=1 i
with the convention 0 log 0 = 0.1 We consider the parametric optimization problem
The objective function is continuous and strictly convex in x since it can be written as
n
X n
X n
X
f (x; ) = i xi + xi log xi xi log i
i=1 i=1 i=1
P
and the entropy ni=1 xi log xi is continuous and strictly convex (Example 1495). So, problem
(41.4) is concave and, for each 2 Rn , has a unique solution. It thus features a solution
function, which the next result identi es.
1
Recall from Example 1495 the meaning of this convention.
1208 CHAPTER 41. PARAMETRIC OPTIMIZATION PROBLEMS
e i
i
( ) = Pn 8i = 1; :::; n (41.5)
i=1 e
i
i
n
X
v( ) = log e i
i (41.6)
i=1
In particular,
rv ( ) = ( ) 8 2 Rn (41.7)
Later in the chapter, we will study in some detail these solution and value functions
(Section 41.9).
Pn
Proof Fix 0 y 2 n 1 and de ne hy : n 1 ! R by hy (x) = i=1 xi log (xi =yi ), with
the convention 0 log 0 = 0. By the log-sum inequality, we have
n n
! Pn
X xi X xi
hy (x) = xi log > xi log Pi=1
n = 0 = hy (y) 8y 6= x 2 n 1 (41.8)
yi i=1 yi
i=1 i=1
We have
n
X n
X
e i
i e i
i e i
f (^
x; ) = i Pn + Pn log Pn
i=1 e i=1 e i=1 e
i i i
i=1 i i=1 i i
n n n
!!
1 X X X
= Pn ie
i
i + e i
i i log e i
i
i=1 e
i
i i=1 i=1 i=1
n n n
! n
!
1 X X X X
= Pn i ie
i
ie
i
i e i
i log e i
i
i=1 e
i
i i=1 i=1 i=1 i=1
n
! n n
1 X X X
= Pn e i
i log e i
i = log e i
i
i=1 e
i
i i=1 i=1 i=1
41.3. BASIC PROPERTIES 1209
Xn
e i
= i + log Pn xi + hx^ (x)
i=1 e
i
i=1 i
n n
!!
X X
= i + i log e i
i xi + hx^ (x)
i=1 i=1
n
X
= log e i
i + hx^ (x)
i=1
We conclude that (41.5) is the solution function of problem (41.4) and that (41.6) is its value
function. Finally, (41.7) is readily checked.
We now turn to convexity properties. In the next three results we assume that the set A
is convex and, to ease matters, that is viable.
1210 CHAPTER 41. PARAMETRIC OPTIMIZATION PROBLEMS
f (^
x1 ; ) f( x
^1 + (1 )x
^2 ; ) min ff (^
x1 ; ) ; f (^
x2 ; )g = f (^
x1 ; ) = f (^
x2 ; ) = v ( )
and so f ( x
^1 + (1 )x
^2 ; ) = v ( ), i.e., x
^1 + (1 )x
^2 2 ( ).
The convexity of the solution set means inter alia that, when non-empty, such a set is
either a singleton or an in nite set. That is, either the solution is unique or there is an
in nite number of them. Next we give the most important su cient condition that ensures
uniqueness.
Turn now to value functions. In the following result we assume the convexity of the graph
of '. As we already remarked, this is a substantially stronger assumption than the convexity
of the images ' (x).
v( 1 + (1 ) 2) f( x
^1 + (1 )x
^2 ; 1 + (1 ) 2)
= f ( (^
x1 ; 1) + (1 ) (^
x2 ; 2 ))
min ff (^
x1 ; 1) ; f (^
x2 ; 2 )g = min fv ( 1 ) ; v ( 2 )g
41.4. MAXIMUM THEOREM 1211
v( 1 + (1 ) 2) f( x
^1 + (1 )x
^2 ; 1 + (1 ) 2)
= f ( (^
x1 ; 1) + (1 ) (^
x2 ; 2 ))
f (^
x1 ; 1) + (1 ) f (^
x2 ; 2) = v ( 1 ) + (1 ) v ( 2)
So, v is concave.
Example 1766 In the consumer problem, the graph of the budget correspondence is convex
if the consumption set is convex. Indeed, let ((p; w) ; x) ; ((p0 ; w0 ) ; x0 ) 2 Gr B and let 2
[0; 1]. Then, p ( x + (1 ) x0 ) w+(1 ) w0 , so the set Gr B is convex. By Proposition
1763, the demand correspondence is convex-valued if the utility function is quasi-concave,
while by Proposition 1765 the indirect utility is quasi-concave (concave) if the utility function
is quasi-concave (concave). N
If ' is bounded and continuous and f is continuous, then is viable, bounded, compact-valued
and upper hemicontinuous, and v is continuous.
Under the continuity of both the objective function and feasibility correspondence, the
optimization problem is thus stable under changes in parameters: both the value function
and the solution correspondence are continuous. The Maximum Theorem is an important
result in applications because, as remarked before, the stability that it ensures is often a
desirable property of the optimization problems that they feature. Natura non facit saltus
as long as the hypotheses of the Maximum Theorem are satis ed.
Lemma 1768 Given any bounded sequence of scalars fan g, if lim supn!1 an = a then there
exists a subsequence fank g such that limk!1 ank = a.
2
It is named after Claude Berge, who proved it in 1959.
1212 CHAPTER 41. PARAMETRIC OPTIMIZATION PROBLEMS
Proof For k = 1 de ne
n1 = min fn 1 : jan aj < 1g
and, recursively, for each k 2,
1
nk = min n 1 : n > nk 1 and jan aj <
k
In this way, fank g is a subsequence of fan g because by construction nk > nk 1 for every k
2. At the same time, fank g converges to a as, again by construction, it holds jank aj < 1=k
for every k 1. Thus, fank g is the subsequence we are looking for, provided we show that
it is well de ned. This requires to show that the sets whose minima we are taking are not
empty, so that these minima well de ned. The rest of the proof is devoted to show exactly
this.
For each n 1, set bn = supm n am 2 R. Recall that a = limn!1 bn = inf n An . Fix
any " > 0. Since bn converges to a, there exists some n" 1 such that bn" a < "=2. Since
bn = supm n am there is some m n" such that bn" "=2 am bn" . In turn, this easily
implies that
jam aj = jam bn" + bn" aj jam bn" j + jbn" aj < "
| {z } | {z }
"=2 <"=2
Summing up, for each " > 0 the set fn 1 : jam aj < "g is not empty. In turn, this is
easily seen to imply that the sets used to de ne nk are not empty, as desired.
Proof of the Maximum Theorem Since ' is bounded, recall that there exists a compact
set K such that ' ( ) K A for all 2 . Suppose that ' and f are continuous. By
Proposition 958, the set ' ( ) is closed for each 2 . Since ' is bounded, ' ( ) turns out
to be compact as well. By Proposition 1762, S = and is compact-valued. Fix any point
2 and consider a sequence f n g such that limn!1 n = . Next, we rst prove
that fv ( n )g is bounded. By contradiction, assume that supn jv ( n )j = +1. It follows that
there exists a subsequence f nk g such that jv ( nk )j k for every k 1. For each s 1, let
x
^nk 2 ' ( nk ) such that v ( nk ) = f (^
xnk ; nk ) for every s. By Bolzano-Weierstrass' Theorem
and since ' is bounded, there exists a subsequence x ^ks that converges to x 2 K. Since '
is continuous and lims!1 nks = , we can conclude that x 2 ' . Since f is continuous,
this implies that
+1 = lim v nks = lim f x
^nks ; nks = f x; < +1
s!1 s!1
so limn!1 v ( n ) = v .
It remains to show that is upper hemicontinuous at . Let n ! and xn ! x
with xn 2 ( n ). We want to show that x 2 . Since ( n ) ' ( n ) and ' is upper
hemicontinuous, clearly x 2 ' . By the continuity of both f and v, we then have
and so x 2 , as desired.
The next example show that the joint continuity of the objective function in its two
arguments is needed in the Maximum Theorem.
Example 1769 De ne f : R2 ! R by
(
0 if (x; ) = (0; 0)
f (x; ) = 2x
x2 + 2 else
As noted by Schwarz (1872) p. 220, this function is separately continuous (why?), but
discontinuous at the origin (cf. Example 559). Indeed, let us approach the origin along the
45 degree line (x; ) = (t; t), with t 2 R. We have
2t2
lim f (t; t) = lim = 1 6= 0 = f (0)
t!0 t!0 t2 + t2
where ' ( ) = [ 1; 1] for each 2 [ 1; 1] (so, ' is both bounded and continuous). We have
(
0 if = 0
v( ) =
1 else
The value function is thus discontinuous at 0 (it is actually lower semicontinuous there, as
remarked by Baire, 1927). N
1214 CHAPTER 41. PARAMETRIC OPTIMIZATION PROBLEMS
The continuity properties of demand correspondences and indirect utility functions follow
from the Maximum Theorem, a remarkable rst dividend of this result. To this end, we need
the following continuity property of the budget correspondence.
Proposition 1770 The budget correspondence is continuous at all (p; w) such that w > 0.
Proof Let (p; w) 2 Rn+ R++ . We rst show that B is upper hemicontinuous at (p; w). Let
(pn ; wn ) ! (p; w), xn ! x and xn 2 B (pn ; wn ). We want to show that x 2 B (p; w). Since
p xn wn for each n, it holds p x = limn!1 p xn limn!1 wn = w, that is, x 2 B (p; w).
We conclude that B is upper hemicontinuous at (p; w).
The correspondence B is also lower hemicontinuous at (p; w) 2 Rn+1
+ . Let (pn ; wn ) !
(p; w) and x 2 B (p; w). We want to show that there is a sequence fxn g such that xn 2
B (pn ; wn ) and xn ! x. We consider two cases.
(i) Suppose p x < w. Since (pn ; wn ) ! (p; w), there is n large enough so that pn x < wn
for all n n. Hence, the constant sequence xn = x is such that xn 2 B (pn ; wn ) for all
n n and xn ! x.
(ii) Suppose p x = w. Since w > 0, there is x 2 Rn+ such that p x < w. Since
(pn ; wn ) ! (p; w), there is n large enough so that pn x < wn for all n n. Set
1 1
xn = 1 x+ x
n n
In both cases it then easily follows the existence of a sequence fxn g such that xn 2
B (pn ; wn ) and xn ! x. We conclude that B is lower hemicontinuous at (p; w).
We can now apply the Maximum Theorem to the consumer problem that, under a mild
continuity hypothesis on the utility function, turns out to be stable with respect to changes
in prices and wealth.
(i) the demand correspondence is compact-valued and upper hemicontinuous at (p; w);
Proof Since the consumption set is compact, the budget correspondence is bounded and
continuous on Rn+ R++ . Since the utility function is continuous, the result then follows
from the Maximum Theorem.
Observe that (i) implies that demand functions are continuous at (p; w) since upper
hemicontinuity and continuity coincide for bounded functions (Proposition 960).
41.5. ENVELOPE THEOREMS I: FIXED CONSTRAINT 1215
The Maximum Theorem has some remarkable consequences on the study of equations.
Indeed, consider the parametric equation
f (x; ) = y0
Corollary 1772 Assume that Sy0 ( ) 6= ; for all 2 . If f is continuous and A is bounded,
then Sy0 : Rn is viable, compact-valued and upper hemicontinuous.
Proof Consider the parametric version of the optimization problem (37.17), i.e.,
where the feasibility correspondence is constant, with ' ( ) = C A for all 2 . The
parameter only a ects the objective function. To ease matters, throughout the section we
also assume that S = .
1216 CHAPTER 41. PARAMETRIC OPTIMIZATION PROBLEMS
@f ( ( 0 ) ; 0) 0 @f ( ( 0 ) ; 0)
v0 ( 0) = ( 0) +
@x @
Remarkably, the rst term is null because by Fermat's Theorem (@f =@x) ( ( 0 ) ; 0) = 0
(provided the solution is interior). Thus,
@f ( ( 0 ) ; 0)
v0 ( 0) = (41.11)
@
Next we make general and rigorous this important nding.
@v ( 0 ) @f (^
x; 0 )
= 8i = 1; :::; k (41.12)
@ i @ i
@v ( 0 ) @f ( ( 0 ) ; 0)
= 8i = 1; :::; k
@ i @ i
We thus have
w( 0 + tu) w ( 0) v( 0 + tu) v ( 0)
t t
for all u 2 Rk and t > 0 su ciently small. Hence,
@f (x; 0 ) f x ( 0) ; + hei
0 f (x ( 0 ) ; 0) w 0 + hei w ( 0)
= lim = lim
@ i h!0+ h h!0+ h
v i
0 + he v ( 0) @v ( 0 )
lim =
h!0+ h @ i
for all u 2 Rk and t < 0 su ciently small. By proceeding as before, we then have
@f (x; 0 ) @v ( 0 )
@ i @ i
This proves (41.12).
The hypothesis that v is di erentiable is not that appealing because it is not in terms
of the primitive elements f and C of problem (41.10). Indeed, to check it we need to know
the value function. Remarkably, in concave problems this di erentiability hypothesis follows
from hypotheses that are directly on the objective function.
Theorem 1774 Let C and be convex. Suppose f (x; ) is, for every x 2 C, di erentiable
at 0 2 int . If f is concave on C , then v is di erentiable at 0 .
w( ) v( ) v ( 0) + ( 0) = w ( 0) + ( 0)
and ( ) is the unique solution that corresponds to . A heuristic application of the chain
rule suggests that, if exists, the derivative of v at 0 is
@f ( ( 0 ) ; 0) ^ ( 0) @ ( ( 0) ; 0)
v0 ( 0) =
@ @
where ^ ( 0 ) is the Lagrange multiplier that corresponds to the unique solution ( 0 ). Indeed,
being ( ( ) ; ) = 0 for every 2 , by a heuristic application of the chain rule we have
@ ( ( 0) ; 0) 0 @ ( ( 0) ; 0)
( 0) + =0
@x @
On the other hand, being v ( ) = f ( ( ) ; ) for every 2 , again by a heuristic application
of the chain rule we have
@f ( ( 0 ) ; 0 ) 0 @f
v0 ( 0) = ( 0) + ( ( 0) ; 0)
@x @
@f ( ( 0 ) ; 0 ) ^ @ ( ( 0) ; 0) ^ @ ( ( 0) ; 0) 0
= ( 0) + ( 0) ( 0)
@x @x @x
@f ( ( 0 ) ; 0 )
+
@
@f ( ( 0 ) ; 0 ) ^ @ ( ( 0) ; 0) 0
= ( 0 ) 0 ( ( 0 )) 0 ( 0 ) + ^ ( 0 ) ( 0)
@x @x
| {z }
=0
@f ( ( 0 ) ; 0 )
+
@
@f ( ( 0 ) ; 0 ) ^ @ ( ( 0) ; 0)
= ( 0)
@ @
as desired. Next we make more rigorous and general the result. We study the case of unique
solutions, common in applications.
Theorem 1775 Suppose that problem (41.14) has a unique solution ( ) at all 2 .3
Suppose that the sets A and are open and that f and are continuously di erentiable on
A . If the determinant of the Jacobian of the operator (rx L; ) is non-zero on , then
rv ( ) = r f ( ( ) ; ) ^( ) r ( ( ); ) 8 2
for all 2 .
Proof As in the heuristic argument, we consider the case n = k = m = 1 (the general case
being just notationally messier). By hypothesis, there is a solution function : ! A. By
3
Earlier in the chapter we saw which conditions ensure the existence and uniqueness of solutions.
41.7. MARGINAL INTERPRETATION OF MULTIPLIERS 1219
Lagrange's Theorem, is then the unique function that, along with a \multiplier" function
^ : ! R, satis es for all 2 the equations
@f ( ( ) ; ) ^ @ ( ( ); )
rx L( ( ) ; ^ ( )) = ( ) =0
@x @x
r L( ( ) ; ^ ( )) = ( ( ); ) = 0
@v ( 0 ) @f (^
x; 0 ) X X @ ( ( 0) ; 0)
= ^i ( 0) @ i( ( 0) ; 0)
^j ( 0)
j
(41.18)
@ s @ s @ s @ s
i2I j2J
jJj
for every s = 1; :::; k. Here ( ^ ( 0 ) ; ^ ( 0 )) 2 RjIj R+ are the Lagrange multipliers associated
with the solution ( 0 ), assumed to be unique (for simplicity).
We can derive heuristically this formula with the heuristic argument that we just used for
the equality case. Indeed, if we denote by A ( ( 0 )) be the set of the binding constraints at
0 , by Lemma 1725 we have ^ j = 0 for each j 2 = A ( ( 0 )). So, the non-binding constraints
at 0 do not a ect the derivation because their multipliers are null.
That said, let us consider the standard problem (39.4) in which the objective function does
not depend on the parameter, i (x; ) = gi (x) bi for every i 2 I, and j (x; ) = hj (x) cj
for every j 2 J (Example 1758). Formula (41.18) then implies
@v (b; c)
= ^ i (b; c) 8i 2 I
@bi
@v (b; c)
= ^ j (b; c) 8j 2 J
@cj
1220 CHAPTER 41. PARAMETRIC OPTIMIZATION PROBLEMS
Interestingly, the multipliers describe the marginal e ect on the value function of relaxing
the constraints, that is, how much it is valuable to relax them. In particular, we have
@v (b; c) =@cj = ^ j (b; c) 0 because it is always bene cial to relax an inequality constraint:
more alternatives become available. In contrast, this might not be the case for an equality
constraint, so the sign of @v (b; c) =@bi = ^ i (b; c) is ambiguous.
The next class of functions will play a key role in our analysis.
@f (x; ) @f (x; )
=0 81 i 6= j m and =1 81 i n; 81 j m
@ i@ j @xi @ j
We can now address the question that we posed at the beginning of this section. To ease
matters, from now on we assume that problem (41.19) has a solution for every 2 (e.g.,
I is compact and f is continuous in x), so that we can write the solution correspondence as
: Rn . In most applications, comparative statics exercises actually feature solution
functions : ! Rn rather than correspondences (as we already argued several times).
This motivates the next result.
Example 1782 From the last example we know that, given a supermodular function :
I ! R, the function f : I Rn Rn ! R de ned by f (x; ) = (x) + x is
parametrically supermodular. Consider the parametric problem
where the feasibility correspondence ' is ascending. By the last corollary, the solution
correspondence of this problem is ascending. For instance, consider a Cobb-Douglas pro-
duction function (x1 ; x2 ) = x1 1 x2 2 , with 1 ; 2 > 0. If q 0 is the output's price and
p = (p1 ; p2 ) 0 are the inputs' prices, the pro t function (x; q) = qx1 1 x2 2 p1 x1 p2 x2 is
parametrically supermodular because is supermodular (see Example 937). The producer
problem is
max (x; q) sub x1 ; x2 0
x1 ;x2
where output's price q plays the role of the parameter . Since the pro t function is strictly
concave, solutions are unique (if they exist). In particular, a solution of the producer problem
is an optimal amount of inputs that the producer will demand. By the last corollary,5 the
solution function is increasing: if the output's price increases, the inputs' demand of the
producer increases. N
and
0 0
f y; > f x _ y; =) f (x ^ y; ) > f (x; ) (41.22)
Proof In view of Example 912, g is Lipschitz, concave and translation invariant. It is easy
to check that g is also strongly increasing and normalized: g (k) = k for all k 2 R.
To prove the sandwich (41.23), set x = min fx1 ; :::; xn g and = min f 1 ; :::; n g 2
(0; 1). Since g is strongly increasing Pand normalized, we have g (x) g (x ) = x for all
x 2 Rn . On the other hand, we have ni=1 i e xi e x and so
n
X
1 xi log
g (x) = log ie x
i=1
as desired.
The sandwich (41.23) shows that, as diverges to +1, the log-exponential function
better and better approximate the Leontief function:
n
X
1 xi
lim log ie = min fx1 ; :::; xn g 8x 2 Rn (41.24)
!+1
i=1
6 1 Pn xi
Recall that in Example 912 we brie y studied a more general version g (x) = log i=1 ie of
this function. Here we require = .
1224 CHAPTER 41. PARAMETRIC OPTIMIZATION PROBLEMS
So,
n
X
1 xi
lim log ie = max fx1 ; :::; xn g 8x 2 Rn (41.26)
!+1
i=1
The softmax operator f = (f1 ; :::; fn ) : Rn ! n 1 indexed by > 0 is de ned by
e xi
i
fi (x) = Pn xi
8i = 1; :::; n
i=1 e i
The softmax operator is the gradient operator of the log-exponential function,7 i.e.,
f (x) = rg (x) 8x 2 X
Being the gradient of a Lipschitz, strongly increasing and concave function, the softmax oper-
ator is, ipso facto, cyclically monotone (Theorem 1529): for any nite sequence x0 ; x1 ; :::; xm
of vectors in Rn , it holds
To see another remarkable property of the softmax operator, observe that the vector of Rn
de ned by
e xi i
pi = Pn xi
8i = 1; :::; n
i=1 e i
is an element of the simplex, i.e., p 2 n 1 . In particular, we can interpret pi as the
probability that the component xi of vector x 2 Rn is selected, say by a suitable random
device calibrated with these probabilities.
Thus, the probability that the random device selects a non-minimum component of x
goes to 0 as diverges to +1. So, with a higher and higher probability the random device
select a minimum component of the vector x. In particular, when the minimum component
is unique, the random device eventually selects it, i.e.,
8j 6= i; xi < xj =) lim pi = 1
!+1
7
Sometimes we talk of softmax functions, a legitimate (as operators are functions) but less precise termi-
nology.
41.9. APPROXIMATIONS: THE LAPLACE METHOD 1225
Proof Since x is non-constant, assume without loss of generality that x1 > xn = min fx1 ; :::; xn g.
We want to show that lim !+1 p1 = 0. Then
e x1 1 1
1
p1 = Pn xi
= Pn e xi = Pn 1 !0
i=1 e
(x1 xi ) (x1 xn )
i=1 e +e
i i n
i i=1 e x1
1 1 1
As to the max, we can consider the dual softmax operator f = (f1 ; :::; fn ) : Rn ! n 1
de ned by
e xi i
fi (x) = Pn xi
8i = 1; :::; n
i=1 e i
e xi
i
qi = Pn xi
8i = 1; :::; n
i=1 e i
Suppose, without loss of generality, that the set Z = fz1 ; :::; zn g consists of n alternatives. Set
xi = u (zi ) for each i = 1; :::; n and consider a uniform , i.e., i = 1=n for each i = 1; :::; n.
By (41.26) and (41.28), it is easy to see that
n
X
1 u(zi )
lim log e = max u (z)
!+1 z2Z
i=1
and, if z^ is unique,
e u(^
z)
lim Pn u(z)
=1
i=1 e
!+1
Thus, as diverges to +1, the log-exponential function better and better approximates the
maximum value maxz2Z u (z), while via the softmax operator we can construct a random
device that, with a higher and higher probability, selects the maximizer z^.
1226 CHAPTER 41. PARAMETRIC OPTIMIZATION PROBLEMS
Back to the subject matter of this chapter, consider a parametric optimization problem
where ' ( ) is a nite set for each 2 . Assume that there is a unique solution for each
2 . In view of what we just proved, for each 2 we have
1 X
u(z)
v ( ) = lim log e
!+1
z2'( )
and
e u( ( ))
lim Pn u(z)
=1
i=1 e
!+1
We thus have approximations, deterministic and probabilistic, of the solution and value
functions : ! Z and v : ! R. Jointly, they form the Laplace (approximation)
method.
Chapter 42
Interdependent optimization
So far we have considered individual optimization problems. Many economic and social phe-
nomena, however, are characterized by the interplay of several such problems, in which the
outcomes of agents' decisions depend on their decisions as well as on the decisions of other
agents. Market interactions are an obvious example of interdependence among agents' deci-
sions: for instance, in an oligopoly problem the pro ts that each producer can earn depends
both on his production decision and on the production decisions of the other oligopolists.
Interdependent decisions must coexist: the mutual compatibility of agents' decisions is
the novel conceptual issue that emerges in the study of interdependent optimization. Equi-
librium notions address this issue. In this chapter we present an introductory mathematical
analysis of this most important topic, which is the subject matter of game theory and is at
the heart of economic analysis. In particular, the theorems of von Neumann and Nash that
we will present in this chapter are wonderful examples of deep mathematical results that
have been motivated by economic applications.
In other words, (^
x1 ; x
^2 ) is a saddle point if the function f (^
x1 ; ) : C2 ! R has a minimum
at x
^2 and the function f ( ; x ^2 ) : C1 ! R has a maximum at x ^1 . To visualize these points,
think of centers of horse saddles: these points at the same time maximize f along one
dimension and minimize it along the other, perpendicular, one. This motivates their name.
Their nature is clari ed by the next characterization.
1
Since we have inf and sup, we must allow the values 1 and +1, respectively.
1227
1228 CHAPTER 42. INTERDEPENDENT OPTIMIZATION
(i) the function inf x2 2C2 f ( ; x2 ) : C1 ! [ 1; +1) attains its maximum value at x
^1 ,
(ii) the function supx1 2C1 f (x1 ; ) : C2 ! ( 1; +1] attains its minimum value at x
^2 ,
This characterization consists of two optimization conditions, (i) and (ii), and a nal
condition, (iii), that requires their mutual consistency. Let us consider these conditions one
by one.
By condition (i), the component x ^1 of a saddle point, called maximinimizer, solves the
following optimization problem, called maximinimization (or primal ) problem,
where inf x2 2C2 f ( ; x2 ) : C1 ! [ 1; +1) is the objective function. If f does not depend on
x2 , this problem reduces to the standard maximization problem
where supx1 2C1 f (x1 ; ) : C2 ! ( 1; +1] is the objective function. If f does not depend
on x1 , this problem reduces to the standard minimization problem
The optimization conditions (i) and (ii) have standard optimization (maximization or
minimization) problems as special cases, so conceptually they are generalizations of famil-
iar notions. In contrast, the consistency condition (iii) is the actual novel feature of the
characterization in that it introduces a notion of mutual consistency between optimization
problems, which are no longer studied in isolation, as we did so far. The scope of this
condition will become more clear with the notion of Nash equilibrium.
The proof of Proposition 1790 relies on the following simple but important lemma (inter
alia, it shows that the more interesting part in an equality sup inf = inf sup is the inequality
sup inf inf sup).
42.1. MINIMAX THEOREM 1229
Then, inf x2 2A2 supx1 2A1 f (x1 ; x2 ) supx1 2A1 inf x2 2A2 f (x1 ; x2 ).
inf f (^
x1 ; x2 ) = f (^ ^2 ) = sup f (x1 ; x
x1 ; x ^2 ) (42.6)
x2 2C2 x1 2C1
So,
sup inf f (x1 ; x2 ) f (^
x1 ; x
^2 ) inf sup f (x1 ; x2 )
x1 2C1 x2 2C2 x2 2C2 x1 2C1
By the previous lemma, the inequalities are actually equalities, that is,
inf f (^
x1 ; x2 ) = sup inf f (x1 ; x2 ) and sup f (x1 ; x
^2 ) = inf sup f (x1 ; x2 )
x2 2C2 x1 2C1 x2 2C2 x1 2C1 x2 2C2 x1 2C1
inf f (^
x1 ; x2 ) = f (^
x1 ; x
^2 ) = sup f (x1 ; x
^2 )
x2 2C2 x1 2C1
The last proposition implies the next remarkable interchangeability property of saddle
points.
In words, if we interchange the two components of a saddle point, we get a new saddle
point.
supx1 2C1 f (x1 ; ) : C2 ! ( 1; +1] attains its minimum value at x ^02 . In turn, by the \if"
part of Proposition 1790 this implies that (^
x1 ; x0
^2 ) is a saddle point of f on C1 C2 .
@f (x1 ; x2 ) @f (x1 ; x2 )
rx1 f (x1 ; x2 ) = ; ::::;
@x11 @x1m
@f (x1 ; x2 ) @f (x1 ; x2 )
rx2 f (x1 ; x2 ) = ; ::::;
@x21 @x2n
This distinction is key for the next di erential characterization of saddle points.
(i) Ci is a closed and convex subset of the open and convex set Ai for i = 1; 2;
If (^
x1 ; x
^2 ) 2 C1 C2 is a saddle point of f on C1 C2 , then
rx1 f (^
x1 ; x
^2 ) (x1 x
^1 ) 0 8x1 2 C1 (42.7)
rx2 f (^
x1 ; x
^2 ) (x2 x
^2 ) 0 8x2 2 C2 (42.8)
When x
^1 is an interior point, condition (42.7) takes the simpler Fermat's form
rx1 f (^
x1 ; x
^2 ) = 0
and the same is true for condition (42.8) if x
^2 is an interior point. Remarkably, conditions
(42.7) and (42.8) become necessary and su cient when f is a saddle function on C1 C2 ,
i.e., when f is concave in x1 2 C1 and convex in x2 2 C2 . Saddle functions have therefore
for saddle points the remarkable status that concave and convex functions have in standard
optimization problems for maximizers and minimizers, respectively.
Example 1794 Consider the saddle function f : R2 ! R de ned by f (x1 ; x2 ) = x21 x22 .
Since
@f (x1 ; x2 ) @f (x1 ; x2 )
= = 0 () x1 = x2 = 0
@x1 @x2
from the last theorem it follows that the origin (0; 0) is the only saddle point of f on R2 (cf.
Example 1304). Graphically:
0
x3
-2
-4
2
1 2
0 1
0
-1
-1
x2 -2 -2
x1
The previous result establishes, inter alia, the existence of saddle points under di eren-
tiability and concavity assumptions on the function f . Next we give a fundamental existence
result, the Minimax Theorem, that relaxes these requirements on f , in particular it drops
any di erentiability assumption. It requires, however, the sets C1 and C2 to be compact (as
usual, there are no free meals).
Theorem 1795 (Minimax) Let f : A1 A2 Rn Rm ! R be a real-valued function and
C1 and C2 subsets of A1 and A2 , respectively. Suppose that:
(i) C1 and C2 are convex and compact subsets of A1 and A2 , respectively;
(ii) f ( ; x2 ) : A1 ! R is continuous and quasi-concave on C1 ;
(iii) f (x1 ; ) : A2 ! R is continuous and quasi-convex on C2 .
Then, f has a saddle point on C1 C2 , with
max min f (x1 ; x
^2 ) = f (^
x1 ; x
^2 ) = min max f (x1 ; x2 ) (42.9)
x1 2C1 x2 2C2 x2 2C2 x1 2C1
1232 CHAPTER 42. INTERDEPENDENT OPTIMIZATION
Proof The existence of the saddle point follows from Nash's Theorem, which will be proved
below. Since the sets C1 and C2 are compact and the function f is continuous in x1 and
in x2 , by Weierstrass' Theorem we can de ne the functions minx2 2C2 f ( ; x2 ) : C1 ! R and
maxx1 2C1 f (x1 ; ) : C2 ! R. So, (42.2) implies (42.9).
The Minimax Theorem was proved in 1928 by John von Neumann in his seminal paper
on game theory. Interestingly, the choice sets C1 and C2 are required to be convex, so they
have to be in nite (unless they are singletons, a trivial case).
A simple, yet useful, corollary of the Minimax Theorem is that continuous saddle func-
tions on a compact convex set C1 C2 have a saddle point on C1 C2 . If, in addition, they
are di erentiable, conditions (42.7) and (42.8) then characterize any such point.
fi : C1 Cn ! R
For instance, the objective function f1 of agent 1 depends on the agent decision x1 , as well
on the decisions x2 , ...., xn of the other agents. In the oligopoly example below, x1 is the
production decision of agent 1, while x2 , ...., xn are the production decisions of the other
agents.
Decisions are simultaneous, described by a vector (x1 ; :::; xn ). The operator f = (f1 ; :::; fn ) :
C1 Cn ! Rn , with
describes the value fi (x1 ; :::; xn ) that each agent attains at (x1 ; :::; xn ). The operator f is an
interdependent objective function called game.
Example 1796 Consider n rms that produce the same output, say potatoes, that they
sell in the same market. The market price of the output depends on the total output
that together all rms o er. Assume that the output has a strictly decreasing demand
function 1
Pn D : [0; 1) ! [0; 1) in the market. So, D (q) is the market price of the output if
q = i=1 qi is the sum of the individual quantities qi 0 of the output produced by each
n
rm i = 1; :::; n. The pro t function i : R+ ! R of rm i is
1
i (q1 ; :::; qn ) =D (q) qi ci (qi )
where ci : [0; 1) ! R is its cost function. Thus, the pro t of rm i depends via q on
the production decisions of all rms, not just on their own decisions qi . We thus have an
interdependent optimization problem, called Cournot oligopoly. Here the choice sets Ci are
the positive half-line [0; 1) and the game f is given by = ( 1 ; :::; n ) : Rn+ ! Rn . N
5
In game theory agents are often called players (or co-players or opponents).
42.2. NASH EQUILIBRIA 1233
To introduce the next equilibrium notion, to x ideas we rst consider the case n = 2
of two agents. Here f : C1 C2 ! R2 with f (x1 ; x2 ) = (f1 (x1 ; x2 ) ; f2 (x1 ; x2 )). Suppose a
decision pro le (^
x1 ; x
^2 ) 2 C1 C2 is such that
f1 (^
x1 ; x
^2 ) f1 (x1 ; x
^2 ) 8x1 2 C1 (42.10)
f2 (^
x1 ; x
^2 ) f2 (^
x1 ; x2 ) 8x2 2 C2
In this case, each agent is doing his best given what the other agent does. Agent i has no
incentive to deviate from x ^i { that is, to select a di erent decision { as long as he knows
that the other agent (his \opponent"), denoted i, is playing x ^ i .6 In this sense, decisions
(^
x1 ; x
^2 ) are mutually compatible.
All this motivates the following classic de nition proposed in 1950 by John Nash, which
is the most important equilibrium notion in economics. Here for each agent i we denote by
x i 2 C i = j6=i Cj the decision pro le of his opponents.
fi (^
x) fi (xi ; x
^ i) 8xi 2 Ci (42.11)
The vector x^ is thus a Nash equilibrium of the game f : C ! Rn . In the case n = 2, the
equilibrium conditions becomes (42.10). The interpretation is similar: each agent i has no
incentive to deviate from x ^i as long as he knows that his opponents are playing x
^ i . Note
that the de nition of Nash equilibrium does not require any structure on the choice sets Ci .
The scope of this de nition is, therefore, huge. Indeed, it has been widely applied in many
disciplines, within and outside the social sciences.
N.B. Nash equilibrium is de ned purely in terms of agents' individual decisions xi , unlike the
notion of Arrow-Debreu equilibrium (Section 22.9) that involves a variable, the price vector,
which is not under the control of agents. In this sense, the Arrow-Debreu equilibrium is a
spurious equilibrium notion from a methodological individualism standpoint, though most
useful in understanding markets' behavior.7 O
where the opponents' decisions x i play the role of the parameter. The solution corre-
spondence i : C i Ci de ned by i (x i ) = arg maxxi fi (xi ; x i ) is called best reply
correspondence. We can reformulate the equilibrium condition (42.11) as
x
^i 2 i (^
x i) 8i = 1; :::; n (42.12)
6
How such mutual understanding among agents emerges is a non-trivial conceptual issue from which we
abstract away, leaving it to game theory courses.
7
Methodological principles are important but a pragmatic attitude should be kept not to transform them
in dogmas.
1234 CHAPTER 42. INTERDEPENDENT OPTIMIZATION
max fi (xi ; x
^ i) sub xi 2 Ci (42.13)
xi
In turn, this easily leads to a di erential characterization of Nash equilibria via Stam-
pacchia's Theorem. To ease matters, we assume that each Ai is a subset of the same space
Rm , so that both A and C are subsets of (Rm )n .
(i) Ci is a closed and convex subset of the open and convex set Ai ;
If x
^ = (^
x1 ; :::; x
^n ) 2 C is a Nash equilibrium of f on C, then, for each i = 1; :::; n,
rxi fi (^
x) (xi x
^i ) 0 8xi 2 Ci (42.14)
When m = 1, so that each Ai is a subset of the real line, the condition takes the simpler
form:
@fi (^
x)
(xi x^i ) 0 8xi 2 Ci
@xi
Moreover, when x
^i is an interior point of Ci , the condition takes the Fermat's form
rxi fi (^
x) = 0 8xi 2 Ci (42.15)
Example 1799 In the Cournot oligopoly, assume that both the demand and cost functions
are linear, where D 1 (q) = a bq and ci (qi ) = cqi with a > c and b > 0. Then, the pro t
function of rm i is i (q1 ; :::; qn ) = (a bq) qi cqi , which is strictly concave in qi . The
choice set of rm i is the set Ci = [0; +1). By the last proposition, the rst-order condition
(42.14) is necessary and su cient for a Nash equilibrium (^ q1 ; :::; q^n ). This condition is, for
every i,
@ i (^
q1 ; :::; q^n )
(qi q^i ) = (a b^
q b^
qi c) (qi q^i ) 0 8qi 0
@qi
So, for every i we have a b^ q b^ qi = c if q^i > 0, and (a b^
q b^
qi ) c if q^i = 0.
We have q^i > 0 for every i. Indeed, assume by contradiction that q^i = 0 for some i. The
rst-order condition then implies a b^ q c, which in turn implies a c, thus contradicting
a > c. We conclude that q^i > 0 for every i. Then, the rst-order condition implies
a c b^
q
q^i = 8i = 1; :::; n
b
42.3. NASH EQUILIBRIA AND SADDLE POINTS 1235
1 a c
q^i = 8i = 1; :::; n
1+n b
The best reply formulation (42.12) permits to establish the existence of Nash equilibria
via a xed point argument based on Kakutani's Theorem.
(^
x1 ; :::; x
^n ) 2 ' (^
x1 ; :::; x
^n ) = 1 (^
x 1) n (^
x n)
So, x
^i 2 i (^
x i) for each i = 1; :::; n, as desired.
Example 1801 When ' (x) = x, we have f2 = f1 . This strictly competitive game f is
called zero-sum. It is the polar case that may arise, for example, in military interactions.
This is the case originally studied by von Neumann and Morgenstern in their celebrated
(wartime) 1944 opus. N
1236 CHAPTER 42. INTERDEPENDENT OPTIMIZATION
(' f1 ) (^
x1 ; x
^2 ) (' f1 ) (^
x1 ; x2 ) () f1 (^
x1 ; x
^2 ) f1 (^
x1 ; x2 )
f1 (^
x1 ; x
^2 ) f1 (x1 ; x
^2 ) 8x1 2 C1
f1 (^
x1 ; x
^2 ) f1 (^
x1 ; x2 ) 8x2 2 C2
that is,
f1 (^
x1 ; x2 ) f1 (^
x1 ; x
^2 ) f1 (x1 ; x
^2 )
In this case, a pair (^
x1 ; x
^2 ) is a Nash equilibrium if and only if it is a saddle point of f
on C1 C2 . We have thus proved the following mathematically simple, yet conceptually
important, result.
Saddle points are thus Nash equilibria of strictly competitive games. In particular, the
Minimax Theorem is the special case of Nash's Theorem for strictly competitive games. This
further clari es the nature of saddle points as a way to model individual optimization prob-
lems that are \negatively" interdependent, so agents expect the worst from their opponent
and best reply by maxminimizing.
rxi fi (^
x) 2 N m 1 (^
x) 8xi 2 Ci
8
Recall Section 40.2.2.
42.5. PARAMETRIC INTERDEPENDENT OPTIMIZATION 1237
So, the result follows from Proposition 1747 and from Stampacchia's Theorem.
max fi (xi ; x
^ i) = max fi (xi ; x
^ i) (42.16)
xi 2 m 1 xi 2fe1 ;:::;em g
and
;=
6 arg max fi (xi ; x
^ i ) = co arg max fi (xi ; x
^ i) (42.17)
xi 2 m 1 xi 2fe1 ;:::;em g
By (42.17), the set of Nash equilibria is a non-empty set that consists of the n-tuples
(^
x1 ; :::; x
^n ) 2 m 1 m 1 such that
x
^i 2 co arg max fi (xi ; x
^ i)
xi 2fe1 ;:::;em g
max Vi (xi ; x
^ i) sub xi 2 e1 ; :::; em
xi
that only involves the versors. In the next section we will discuss the signi cance of all this
for games and decisions under randomization.
fi (^
x; ) fi (xi ; x
^ i; ) 8xi 2 Ci (42.20)
( ) = NE ( )
So, the correspondence associates to each parameter the corresponding set of Nash
equilibria. Its domain S is the collection of all parameters for which Nash equilibria
exist. If such equilibria are unique for all 2 S, then is a Nash equilibrium function.
Example 1806 In the last Cournot oligopoly example, with D 1 (q) = a bq and ci (qi ) =
cqi , it is natural to regard the coe cients as parameters. Thus, i (q1 ; :::; qn ; ) = (a bq) qi
cqi with = (a; b; c). In particular, the Nash equilibrium function is given by
1 a c 1 a c
( )= ; :::; 2 Rn
1+n b 1+n b
N
Example 1807 In the Cournot oligopoly example, assume that D 1 (q) = 10 q and
ci (qi ) = ci qi . In this case, i (q1 ; :::; qn ; i ) = (10 q) qi ci qi with i = ci . Thus, =
(c1 ; :::; cn ). N
Often we are in an intermediate case where the parameter space has the form =
~ 1 n , where ~ is a common parameter space across all agents and, instead, each
i is an individual parameter space that is relevant only for player i. So, = (~; 1 ; :::; n )
and we can write fi : A ~ n
i !R .
Example 1808 In the Cournot oligopoly example, now assume that D 1 (q) = a bq and
ci (qi ) = ci qi . Then i (q1 ; :::; qn ; ~; i ) = (a bq) qi ci qi with ~ = (a; b) and i = ci . So,
= (a; b; c1 ; :::; cn ). N
9
For simplicity, we abstract from any dependence of the choice sets on parameters (otherwise we should
consider a feasibility correspondence ' : C).
42.6. APPLICATIONS 1239
fi x
~i ; x i ; > fi x; = lim fi xn ; n
n!1
lim fi x
~i ; (xn ) i; n = fi x
~i ; x i ;
n!1
42.6 Applications
42.6.1 Randomization in games and decisions
Suppose that an agent has a set S = fs1 ; s2 ; :::; sm g of m pure actions (or strategies),
evaluated with a utility function u : S ! R. Since the set S is nite, it is not convex (unless
it is a singleton), so we cannot use the powerful results { such as Nash's Theorem { that
throughout the book we saw to hold for concave (or convex) functions de ned on convex
sets. A standard way to embed S in a convex set is via randomization, as readers will learn
in game theory courses. Here we just outline the argument to illustrate the results of the
chapter.
Speci cally, by randomizing via some random device { coin tossing, roulette wheels,
and the like { agents can select a mixed (or randomized ) action in which (sk ) is the
probability that the random device assigns to the pure action sk . Denote by (S) the set of
all randomized actions. According to the expected utility criterion, an agent evaluates the
randomized action via the function U : (S) ! R de ned by
m
X
U( )= u (sk ) (sk )
k=1
In words, the randomized action is evaluated by taking the average of the utilities of
the pure actions weighted by their probabilities under .10 Note that each pure action sk
corresponds to the \degenerated" randomized action that assigns it probability 1, i.e.,
10
Weighted averages are discussed in Section 15.10.
1240 CHAPTER 42. INTERDEPENDENT OPTIMIZATION
(sk ) = 1. Via this identi cation, we can regard S as a subset of (S) and thus write, with
an abuse of notation, S (S).
Under randomization, agents aim to select the best randomized action by solving the
optimization problem
max U ( ) sub 2 (S) (42.21)
(sk ) ! xk
In particular, a degenerate , with (sk ) = 1, is identi ed with the versor ek . That is, pure
actions can be identi ed with the versors of the simplex, i.e., with its extreme points. For
instance, if is such that (s2 ) = 1, then it corresponds to the versor e2 .
Summing up, we have the following identi cations and inclusions:
S ! ext m 1
(S) ! m 1
In this way, we have \convexi ed" S by identifying it with a subset of the simplex, which is
a convex set in Rm . In this sense, we have convexi ed S.
Here we have:
S = fs1 ; s2 ; s3 g ! ext 2 = e1 ; e2 ; e3
For instance, if 2 (S) is such that (s1 ) = (s2 ) = 1=4, and (s3 ) = 1=2, then it
corresponds to x = (1=4; 1=4; 1=2). N
By setting uk = u (sk ) for each k, the expected utility function U can be identi ed with
the a ne function V : m 1 ! R de ned by
m
X
V (x) = uk xk = u x
k=1
where u = (u1 ; u2 ; :::; um ) 2 Rm . The optimization problem (42.21) of the agent becomes
It is a nice concave optimization problem in which the objective function V is a ne and the
choice set m 1 is a convex and compact set of Rm . In particular, by Proposition 1804 we
have
max V (x) = max V (x) (42.23)
x2 m 1 x2fe1 ;:::;em g
and
;=
6 arg max V (x) = co arg max V (x) (42.24)
x2 m 1 x2fe1 ;:::;em g
By (42.23), agents' optimal mixed actions are convex combinations of pure actions that,
in turn, are optimal. So, the optimal x
^ is such that
That is, the pure actions that are assigned a strictly positive weight by an optimal mixed
action are, in turn, optimal. By (42.24), in terms of value attainment problem (42.22) is
equivalent to the much simpler problem
Ui (^ i ; ^ i ) Ui ( i ; ^ i ) 8 i 2 (Si )
for each i = 1; 2.
The mixed actions (Si ) can be identi ed with the simplex m 1 , with its extreme points
ei representing the pure actions si . De ne ui : f1; :::; mg f1; :::; mg ! R by ui (k 0 ; k 00 ) =
ui (s1k0 ; s2k00 ). We can then identify Ui with the function Vi : m 1 m 1 ! R de ned by
X
Vi (x1 ; x2 ) = x1k0 x2k00 ui k 0 ; k 00 = x1 Ui x2
(k0 ;k00 )2f1;:::;mg f1;:::;mg
where Ui is the square matrix of order m that has the values ui (k 0 ; k 00 ) as entries.
The function Vi is a ne in xi . A pair (^
x1 ; x
^2 ) 2 m 1 m 1 is a Nash equilibrium if
Vi (^
xi ; x
^ i) Vi (xi ; x
^ i) 8xi 2 m 1
1242 CHAPTER 42. INTERDEPENDENT OPTIMIZATION
max Vi (xi ; x
^ i) = max Vi (xi ; x
^ i) (42.25)
xi 2 m 1 xi 2fe1 ;:::;em g
and
;=
6 arg max Vi (xi ; x
^ i ) = co arg max Vi (xi ; x
^ i) (42.26)
xi 2 m 1 xi 2fe1 ;:::;em g
By (42.26), equilibrium mixed actions are convex combinations of pure actions that, in turn,
best reply to the opponent's mixed action. So, the equilibrium x
^i is such that (42.18) holds,
i.e.,
x^ik > 0 =) ek 2 arg max Vi (xi ; x ^ i)
xi 2 m 1
for each i = 1; 2. That is, the pure actions ek that are assigned a strictly positive weight x
^ik
by an equilibrium mixed action x ^i of an agent are, in turn, best replies to the opponent's
equilibrium mixed action x ^ i . Moreover, by (42.25) in terms of value attainment agent i can
solve the optimum problem
max Vi (xi ; x
^ i) sub xi 2 e1 ; :::; em
xi
x; ^ ) 2 A
A pair (^ Rm
+ is a saddle point of L on A Rm
+ if
L (^
x; ) x; ^ )
L(^ L(x; ^ ) 8x 2 A; 8 0
(i) f (^
x) f (x) + ^ (b g (x)) for every x 2 A;
(ii) g (^
x) b and ^ i (bi gi (^
x)) = 0 for all i = 1; :::; m.
11
Later we will invoke Slater's condition: till then, this setup actually includes also equality constraints (cf.
the discussion at the end of Section 39.1). For this reason we use the letters g and (rather than h and ).
42.6. APPLICATIONS 1243
x; ^ ) 2 A Rm
Proof \Only if". Let (^ + be a saddle point of the Lagrangian function L :
m
A R+ ! R. Since L (^ x; ) L(^ x; ^ ) for all 0, it follows that
( ^ ) (b g (^
x)) 0 8 0 (42.28)
f (^ x; ^ )
x) = L(^ L(x; ^ ) = f (x) + ^ (b g (x)) 8x 2 A (42.29)
^ (b g (^
x)) = 0 (42.30)
x; ^ )
L(^ x; ) = ( ^
L (^ ) (b g (^
x)) = (b g (^
x)) 0
x; ^ )
which implies L(^ L (^
x; ) for all 0. On the other hand, (i) and (42.30) imply
x; ^ ) = f (^
L(^ x) f (x) + ^ (b g (x)) = L(x; ^ ) 8x 2 A
x; ^ )
so that L(^ L(x; ^ ) for all x 2 A. We conclude that (^
x; ^ ) is a saddle point of L on
A R+ .m
Proposition 1812 A vector x ^ 2 A solves problem (42.27) if there exists ^ 0 such that
x; ^ ) is a saddle point of the Lagrangian function L on A Rm
(^ +.
So, the existence of a saddle point for the Lagrangian function implies the existence of
a solution for the underlying optimization problem with inequality constraints. No assump-
tions are made on the functions f and gi . If we make some standard assumptions on them,
the converse becomes true, thus establishing the following remarkable \saddle" version of
Kuhn-Tucker's Theorem.
(i) x
^ 2 A solves problem (42.27);
1244 CHAPTER 42. INTERDEPENDENT OPTIMIZATION
(iii) there exists a vector ^ 0 such that the Kuhn-Tucker conditions hold
x; ^ ) = 0
rx L(^ (42.31)
^ i r L(^x; ^ ) = 0 8i = 1; :::; m (42.32)
i
r L(^x; ^ ) 0 (42.33)
Proof (ii) implies (i) by the last proposition. (i) implies (iii) by what we learned in Section
40.3. (iii) implies (ii) by Theorem 1793. Indeed the Kuhn-Tucker conditions are nothing but
conditions (42.7) and (42.8) for the Lagrangian function (cf. Example 1746). First, note that
condition (42.7) takes the form rx L(^ x; ^ ) = 0 because the set A is open. As to condition
(42.8), here it becomes
r L(^x; ^ ) ( ^) 0 8 0 (42.34)
This condition is equivalent to (42.32) and (42.33). From (42.32) it follows r L(^ x; ^ )
^ = 0, while from (42.33) it follows that r L(^ x; ^ ) 0 for all 0. So, (42.34) holds.
Conversely, by taking = 0 in (42.34), we have r L(^ ^
x; ) ^ 0 and by taking = 2 ^ we
have r L(^ x; ^ ) ^ x; ^ ) ^ = 0. Finally, by taking = ^ + ei in (42.34), we
0, so r L(^
easily get r L(^ x; ^ ) 0. Since r L(^ x; ^ ) = b g (x), from b g (x) and the positivity of
^ it follows that r L(^ x; ) = 0 is equivalent to ^ i r L(^
^ ^ x; ^ ) = 0 for all i = 1; :::; m. In
i
sum, the Kuhn-Tucker conditions are the form that conditions (42.7) and (42.8) take here.
Since the Lagrangian function is easily seen to be a saddle function when f concave and each
gi convex, this prove that properties (ii) and (iii) are equivalent, thus completing the proof.
(i) x
^ solves the primal problem
The primal problem is actually equivalent to the original problem (42.27). Indeed, let us
write problem (42.27) in canonical form as
we have (
1 if x 2
=C
inf L (x; ) =
0 f (x) if x 2 C
because inf 0 (b g (x)) = 1 if x 2
= C and inf 0 (b g (x)) = 0 if x 2 C.
We conclude that
max inf L (x; ) = max f (x)
x2A 0 x2C
and
arg max inf L (x; ) = arg max f (x)
x2A 0 x2C
so the primal and the original problem are equivalent in terms of both solutions and value
attainment. We thus have the following corollary of the last theorem, which relates the
original and dual problems.
Corollary 1814 Let f : A Rn ! R and gi : A Rn ! R be continuously di erentiable
on an open and convex set A, with f concave and each gi convex. If x^ 2 A solves problem
(42.27) and Slater's condition holds, then there exists ^ 0 that solves the dual problem
(42.35), with maxx2C f (x) = min 0 supx2A L (x; ).
Summing up, in concave optimization problems with inequality constraints the solution
^ and the multiplier ^ solve dual optimization problems that are mutually consistent. In
x
particular, multipliers admit a dual optimization interpretation in which they can be viewed
as (optimally) chosen by some ctitious, yet malevolent, opponent (say, nature). An individ-
ual optimization problem is thus solved by embedding it in a ctitious game against nature,
a surprising paranoid twist on multipliers.
Under such game-theoretic interpretation, the Kuhn-Tucker conditions characterize a
saddle point of the Lagrangian function in that they are the form that conditions (42.7) and
(42.8) take for the Lagrangian function. We can write them explicitly as:
x; ^ )
@L(^ x; ^ )
@f (^
= i (bi gi (^
x)) = 0 8i = 1; :::; n
@xi @xi
x; ^ ) ^
@L(^
i =0 8i = 1; :::; m
@ i
x; ^ )
@L(^
= bi gi (x) 0 8i = 1; :::; m
@ i
This is our last angle on Kuhn-Tucker's Theorem, the deepest one.
As the proof clari es, the two problems (42.36) and (42.37) are one the dual of the other,
either providing the multipliers to the other. In particular, solutions exists if either of the
two polyhedra P and is bounded (Corollary 1038).
L (x; ) = c x + (b Ax)
Its dual problem is
min sup L (x; ) sub 0 (42.38)
x 0
We have
sup L (x; ) = sup c x + (b Ax) = b + sup c x Ax
x 0 x 0 x 0
n m
!
X X
= b + sup cj aij i xj = b + sup c AT x
x 0 j=1 i=1 x 0
~ : Rm
In turn, the Lagrangian function L Rn+ ! R of this problem is
+
n m
!
X X
~ ( ; x) =
L b+x c + AT = b+ cj + aij i xj
j=1 i=1
= c x (b Ax) = L (x; )
x; ^ ) is a saddle point of L if and only if ( ^ ; x
So, (^ ~ We conclude
^) is a saddle point of L.
that the linear programs (42.36) and (42.39) are one dual to the other, each providing the
multipliers to the other. By Corollary 1814 the result then follows.
42.6. APPLICATIONS 1247
Since 2 3
1 0 0
6 2 2 1 7
AT = 6
4 2
7
1 1 5
1 2 3
the dual problem is
min 1 +3 2 +2 3
1; 2; 3
sub 1 1; 2 ( 2 1) + 3 2; 2 1 2 3 4, 1 +2 2 +3 3 2
1 0; 2 0, 3 0
In view of the Duality Theorem of Linear Programming, if the two problems satisfy Slater's
condition (do they?) then either problem has a solution if the other does, with
In this nal chapter of this part we study variational inequality problems, a topic started
in the early 1960s with the seminal works of Gaetano Fichera and Guido Stampacchia that
elegantly uni es the analysis of concave optimization problems and of operator equations.
43.1 De nition
We begin with a key notion.1
Find x
^ 2 C such that (43.1) holds
where fy = f y.
fi (^
x) x
^i = 0 8i = 1; :::; n and f (^
x) y
N
1 n
Throughout this section, C denotes a closed and convex set of R . When compact, it is denoted by K.
The term \equalizer" is not standard.
1249
1250 CHAPTER 43. VARIATIONAL INEQUALITY PROBLEMS
Remarkably, two apparently di erent classes of problems are uni ed by variational in-
equality problems. Results for equalizers thus deliver, as corollaries, results for solutions of
operator equations and of concave optimization problems.
43.2 Properties
We denote by
arg var fy
C
the collection of equalizers, i.e., the solution set of the variational inequality problem (43.3).
We begin with an interesting property of interior solutions.
Lemma 1819 If x
^ 2 arg varC fy is an interior point of C, then f (^
x) = y.
Thus, interior solutions of the variational inequality problem (43.3) are solutions of the
equation f (x) = y. Hence,
arg var fy \ @C
C
consists of the solutions of this variational inequality problem that do not solve equation
f (x) = y.
Proof As x ^ 2 int C, there exists " > 0 such that B" (^ x) C. There exists " > 0 small
enough so that (1 ") x
^ 2 B" (^
x). By taking x = (1 ") x
^ in (43.2) we get (f (^
x) y) x ^=
0. Hence, (43.2) becomes (f (^x) y) x = 0 for all x 2 C. There exists > 0 small enough
^ + ei 2 B" (^
so that x x) for each i = 1; :::; n. By taking x = x ^ + ei for each i = 1; :::; n we
then have f (^
x) y = 0, as desired.
Lemma 1820 If x
^1 ; x
^2 2 arg varC fy , then
(f (^
x1 ) f (^
x2 )) (^
x1 x
^2 ) 0 (43.4)
43.2. PROPERTIES 1251
The operator f is thus inner increasing on this solution set. An immediate consequence
is that arg varC fy is at most a singleton { i.e., solutions are unique if they exist { when f is
strictly inner decreasing. Next we state a deeper result.
This result has, as special cases, the convexity of the solution sets in concave optimization
problems and in operator equations de ned via inner decreasing functions (cf. Proposition
1461). The proof relies on an ingenious lemma proved in Minty (1962).
Proof of Proposition 1821 Suppose that arg varC fy 6= ;. For each x 2 C, set
Ex = f~
x 2 C : (f (x) y) (x x
^) 0g
This set is closed and convex. By Minty's Lemma,
\
arg var fy = Ex
C
x2C
f (xn ) (xn x0 )
kxn k ! +1 =) ! 1 (43.6)
kxn k
An operator is trivially inner coercive when its domain C is bounded. So, this notion
has a bite on unbounded domains, otherwise it automatically holds.
Proof Let fxn g C be such that kxn k ! +1. We want to show that kf (xn )k ! +1. As
f is inner coercive,
jf (xn ) (xn x0 )j
! +1
kxn k
Let " > 0. As kxn k ! +1, there is n large enough so that kx0 k = kxn k < ". By the
Cauchy-Schwarz inequality, for n large enough we then have
Being proper operators, the preimages of inner coercive operators are bounded sets (cf.
Proposition 1637). It is then not surprising that inner coercivity implies the boundedness of
solution sets.
f (^
xn ) (^
xn x0 )
! 1 (43.7)
k^
xn k
The following immediate consequence of Propositions 1821 and 1825 completes our anal-
ysis of solution sets.
43.3 Existence
We now turn to the all-important problem of the existence of equalizers. This problem is
addressed by the following classic existence result proved in the mid-1960s by Felix Browder
and by Philip Hartman and Guido Stampacchia.2
arg var fy 6= ;
C
If, in addition, f is inner decreasing, then arg varC fy is closed and convex.
The proof relies on two interesting lemmas. The rst one considers the important special
case when C is compact (here continuity is enough as inner coercivity is automatically
satis ed, as previously remarked). It is the variational inequality counterpart of Weierstrass'
Theorem, while the theorem is that of Tonelli's Theorem.
(f (^
x) y) (x x
^) 0 8x 2 K (43.8)
PK (f (^
x) + x
^) = (PK g) (^
x) = x
^
(f (^
x) + x
^ PK (f (^
x) x
^)) (PK (f (^
x) + x
^) x) 0 8x 2 K
that is,
(f (^
x) + x
^ x
^) (^
x x) 0 8x 2 K
Thus, f (^
x) (x x ^) 0 for all x 2 K, as desired.
Finally, when y 6= 0 it is enough to consider the continuous function fy : K ! Rn de ned
by fy (x) = f (x) y. By what has been just proved, there exists x ^ 2 K such that
(f (^
x) y) (x x
^) = fy (^
x) (x x
^) 0
(f (^
x) y) (x x
^) 0 8x 2 B" (^
x) \ C (43.9)
then x
^ 2 arg varC fy .
2
We refer to Kinderlehrer and Stampacchia (1980) for references. In the proof we follow them.
1254 CHAPTER 43. VARIATIONAL INEQUALITY PROBLEMS
(f (^
x) y) (x x
^) = n (f (^
x) y) (~
xn x
^) 0
(f (^
x) y) (x x
^) 0 8x 2 C
that is, x
^ 2 arg varC fy .
f (xn ) xn
kxn k ! +1 =) ! 1 (43.10)
kxn k
f (^
x) (x x
^) 0 8x 2 K (43.12)
As 0 2 K, this implies f (^
x) x
^ 0. By (43.11), we then have
k^
xk < kc (43.13)
f (^
x) (x x
^) 0 8x 2 B" (^
x) \ C
f (^
x) (x x
^) 0 8x 2 C (43.14)
Now, let C be any convex and closed set, not necessarily with 0 2 C. Let C0 = C x0 .
Clearly, 0 2 C0 . De ne f0 : C0 ! R by
f0 (x) = f (x + x0 )
43.3. EXISTENCE 1255
f (x) (x x0 ) f0 (z) z
=
kxk kz + x0 k
f (xn ) (xn x0 )
! 1
kxn k
Fix " 2 (0; 1). There is n large enough so that kzn k = kzn + x0 k 1 ". Hence,
and so
f0 (zn ) zn
! 1
kzn k
Thus, f0 is inner coercive since 0 2 C0 . By (43.14), there exists z^ 2 C0 such that f0 (^ z)
(z z^) 0 for all z 2 C0 . Let x ^ = z^ + x0 , so that f (^
x) = f0 (^
z ). For each x 2 C, we then
have
f (^
x) (x x ^) = f (^x x) ((x x0 ) (^ x x0 )) = f0 (^ z ) (z z^) 0
as desired. This completes the proof when y = 0. For the case y 6= 0, it is enough to
observe that the function fy : C ! Rn given by fy = f y is easily seen to inherit both
continuity and inner coercivity from f . This completes the proof that arg varC fy 6= ;. The
inner decreasing part follows from Proposition 1821.
Clearly, a strongly inner decreasing operator is strictly inner decreasing. The converse is
false: the quadratic function on [0; 1) is strictly but not strongly inner increasing.
The next result shows the importance of this strong notion of inner monotonicity by
proving that it provides the di erential characterization of strong concavity. In so doing, it
completes the di erential characterizations of concavity established in Theorem 1473.
Proof We prove the \only if" and leave the converse to the reader. Let f be strongly
concave. So, there exists k > 0 such that g = f + k k k is concave. Hence,
It holds a 0 because g is concave and so, by Theorem 1473, its gradient operator is inner
decreasing. Thus,
and, by setting = 2k < 0, this proves that the gradient operator rf : C ! Rn is strongly
inner decreasing.
Strongly inner decreasing operators are easily seen to be inner coercive. They thus feature
both inner monotonicity and coercivity. For them we can then establish an elegant unique
existence result that comes with a key regularity property of the solution map.
f (^
xy ) y 2 NC (^
x)
This result can be seen as the counterpart here of Theorem 1501, which established the
remarkable optimality properties of strongly concave functions.
(f (^
x) y) (^
xn x
^) 0 and (f (^
xn ) yn ) (^
x x
^n ) 0
By adding up,
(f (^
x) f (^
xn ) + yn y) (^
xn x
^) 0
By the Cauchy-Schwarz inequality,
(f (^
x) f (^
xn )) (^
xn x
^) kyn yk k^
xn x
^k (43.15)
(f (^
xn ) f (^
x)) (^
xn x
^) k^
xn ^ k2
x
Hence,
(f (^
x) f (^
xn )) (^
xn x
^) ( ) k^
xn ^k2
x
By (43.15), we then have
kyn yk k^
xn x
^k ( ) k^
xn ^ k2
x
that is,
kyn yk ( ) k^
xn x
^k
We conclude that
1
k (yn ) (y)k kyn yk
that is, the solution function is Lipschitz continuous with coe cient 1.
43.4. EQUATIONS 1257
43.4 Equations
An important dividend of the previous analysis is the following existence result for solutions
of equations.
f (x) = y (43.16)
has a solution. If, in addition, f is inner decreasing, the solution set f 1 (y) is closed and
convex.
Proof As f is inner decreasing, by Proposition 1461 its level sets are convex. As f is
continuously di erentiable, by Proposition 1641 its level sets are discrete sets. Hence, they
are singletons, i.e., f : U ! Im f is a bijective function. As f is k-times continuously
di erentiable, by Proposition 1644 it is a C k -di eomorphism.
This lemma, along with the Browder-Minty Theorem, has as an immediate consequence
the following global inversion result, a monotone version of the classic Caccioppoli-Hadamard
Theorem.
Proposition 1835 An inner decreasing, inner coercive and k-times di erentiable operator
f : Rn ! Rn , with 1 k 1, is a C k -di eomorphism if and only if it
det Df (x) 6= 0 8x 2 Rn
Integration
1259
Chapter 44
6
y
1
O a b x
0
0 1 2 3 4 5 6
The problem is how to make this natural intuition rigorous. As the gure shows, the
plane region A f[a;b] is a \curved" trapezoid with three straight sides and a curved one.
So, it is not an elementary geometric gure that we know how to compute its area. To
our rescue comes a classic procedure known as the method of exhaustion. It consists in ap-
proximating from above and below the area of a non-trivial geometric gure (such as our
trapezoid) through the areas of simple circumscribed and inscribed elementary geometric
gures, typically polygons (in our case, the so-called \plurirectangles"), whose measure can
be calculated in an elementary way. Assume that the resulting upper and lower approxima-
tions can be made more and more precise via polygons having more and more sides, till in
the limit of \in nitely many sides" they reach a common limit value, we then take such a
1261
1262 CHAPTER 44. THE RIEMANN INTEGRAL (SDOGANATO)
common value as the sought-after area of the non-trivial geometric gure (in our case, the
area of the trapezoid, so the integral of f on [a; b]).
In the next sections we will make rigorous the heuristic procedure just outlined. The
method of exhaustion originates in Greek mathematics, where it found wonderful applications
in the works of Eudoxus of Cnidus and Archimedes of Syracuse, who with this method were
able to compute or approximate the areas of some highly non-trivial geometric gures.1
44.2 Plurirectangles
We know how to calculate the areas of elementary geometric gures. Among them, the
simplest ones are rectangles, whose area is given by the product of the side lengths. A
simple, but key for our purposes, generalization of a rectangle is the plurirectangle, that is,
the polygon formed by contiguous rectangles. Graphically:
-1
-1 0 1 2 3 4 5 6 7 8 9
Clearly, the area of a plurirectangle is just the sum of the areas of the individual rectangles
that compose it.
Let us go back now to the plane region A f[a;b] under the graph of a positive function f
on [a; b]. It is easy to see how such region can be sandwiched between inscribed plurirectangles
and circumscribed plurirectangles. For example, the following plurirectangle
1
For instance, Example 2095 of Appendix C reports the famous Archimedes approximation of , the area
of the closed unit ball, via the method of exhaustion based on circumscribed and inscribed regular polygons.
44.2. PLURIRECTANGLES 1263
4 y
3.5
2.5
1.5
0.5
0
O a b x
-0.5
-1
0 1 2 3 4 5 6
4 y
3.5
2.5
1.5
0.5
0
O a b x
-0.5
-1
0 1 2 3 4 5 6
Naturally, the area of A f[a;b] is larger than the area of any inscribed plurirectangle and
smaller than the area of any circumscribed plurirectangle. The area of A f[a;b] is, therefore,
in between the areas of the inscribed and circumscribed plurirectangles.
We thus have a rst key observation: the area of A f[a;b] can always be sandwiched
between areas of plurirectangles. This yields simple lower approximations (the areas of
the inscribed plurirectangles) and upper approximations (the areas of the circumscribed
plurirectangles) of the area of A f[a;b] .
A second key observation is that such a sandwich, and consequently the relative ap-
proximations, can be made better and better by considering ner and ner plurirectangles,
obtained by subdividing further and further their bases:
1264 CHAPTER 44. THE RIEMANN INTEGRAL (SDOGANATO)
4 y 4 y
3.5 3.5
3 3
2.5 2.5
2 2
1.5 1.5
1 1
0.5 0.5
0 0
O a b x O a b x
-0.5 -0.5
-1 -1
0 1 2 3 4 5 6 0 1 2 3 4 5 6
Indeed, by subdividing further and further the bases, the area of the inscribed plurirectangles
becomes larger and larger, though it remains always smaller than the area of A f[a;b] . On
the other hand, the area of the circumscribed plurirectangles becomes smaller and smaller,
though it remains always larger than the area of A f[a;b] . In other words, the two slices of
the sandwich that include the region A f[a;b] { i.e., the lower and the upper approximations
{ take values that become closer and closer to each other.
If by considering ner and ner plurirectangles, corresponding to ner and ner sub-
divisions of the bases, in the limit the lower and upper approximations coincide { so, the
two slices of the sandwich merge { such a limit common value can be rightfully taken to be
the area of A f[a;b] . In this way, starting with objects, the plurirectangles, that are sim-
ple to measure we are able to measure via better and better approximations a much more
complicated object such as the area of the plane region A f[a;b] under f . The method of
exhaustion is one of the most powerful ideas in mathematics.
44.3 De nition
We now formalize the method of exhaustion. We rst consider positive and bounded func-
tions f : [a; b] ! R+ . In the next section, we will then consider general bounded functions,
not necessarily positive
Let us construct on them the largest plurirectangle inscribed in the plane region under f .
In particular, for the i-th base, the maximum height mi of an inscribed rectangle with base
[xi 1 ; xi ] is
mi = inf f (x)
x2[xi 1 ;xi ]
Since f is bounded, by the Least Upper Bound Principle this in mum exists and is nite,
that is, mi 2 R. Since the length xi of each base [xi 1 ; xi ] is
xi = xi xi 1
In a similar way, let us construct on the contiguous bases (44.2) determined by the subdivision
, the smallest plurirectangle that circumscribes the plane region under f . For the i-th base,
the minimum height Mi of a circumscribed rectangle with base [xi 1 ; xi ] is
Mi = sup f (x)
x2[xi 1 ;xi ]
Graphically:
4
M
i
0
m
i
-1
-2 x x
i-1 i
-3
-2 -1 0 1 2 3 4
As before, since f is bounded by the Least Upper Bound Principle the supremum exists
and is nite, that is, Mi 2 R. Therefore, the area S (f; ) of the minimal circumscribed
plurirectangle is
Xn
S (f; ) = Mi xi (44.4)
i=1
In particular, the area of the plane region under f lies between these two values. Hence,
I (f; ) gives a lower approximation of this area, while S (f; ) gives an upper approximation
of it. They are called the lower and upper integral sums of f with respect to , respectively.
De nition 1837 Given two subdivisions and 0 of [a; b], we say that 0 re nes if 0.
In other words, the ner subdivision 0 is obtained by adding further points to . For
example, the subdivision
0 1 1 3
= 0; ; ; ; 1
4 2 4
of the unit interval [0; 1] re nes the subdivision = f0; 1=2; 1g.
It is easy to see that if 0 re nes , then
0 0
I (f; ) I f; S f; S (f; ) (44.6)
In other words, a ner subdivision 0 yields a better approximation, both lower and upper, of
the area under f .2 By starting from any subdivision, we can always re ne it, thus improving
(or, at least, not worsening) the approximations given by the corresponding plurirectangles.
The same can be done by starting from any two subdivisions and 0 , not necessarily
nested. Indeed, the subdivision 00 = [ 0 formed by all the points that belong to the two
subdivisions and 0 re nes both of them. In other words, 00 is a common re nement of
and 0 .
1 1 2 0 1 1 3
= 0; ; ; ; 1 and = 0; ; ; ; 1
3 2 3 4 2 4
of [0; 1]. They are not nested: neither re nes 0 nor 0 re nes . However, the subdivision
00 0 1 1 1 2 3
= [ = 0; ; ; ; ; ; 1
4 3 2 3 4
and
0 00 00 0
I f; I f; S f; S f; (44.8)
The common re nement 00 gives a better approximation, both lower and upper, of the area
under f than the original subdivisions and 0 .
All this motivates the next de nition.
2
For sake of brevity, we write \area under f " instead of the more precise expression \area of the plane
region that lies under the graph of f and above the horizontal axis".
44.3. DEFINITION 1267
A rst important question is whether the lower and upper integrals of a bounded function
exist. Fortunately, this is the case, as next we show.
Lemma 1840 If f : [a; b] ! R+ is a bounded function, then both the lower integral and the
upper integral exist and are nite, with
Z b Z b
f (x) dx f (x) dx (44.11)
a a
Proof Since f is positive and bounded, there exists M 0 such that 0 f (x) M for
every x 2 [a; b]. Therefore, for every subdivision = fxi gni=0 we have
and so
0 I (f; ) S (f; ) M (b a) 8 2
By the Least Upper Bound Principle, the supremum in (44.9) and the in mum in (44.10)
Rb Rb
exist and are nite and positive, that is, f (x) dx 2 R+ and a f (x) dx 2 R+ .
a
We still need to prove the inequality (44.11). Let us suppose, by contradiction, that
Z b Z b
f (x) dx f (x) dx = " > 0
a a
By the previous lemma, every bounded function f : [a; b] ! R+ has both the lower
integral and the upper integral, with
Z b Z b
f (x) dx f (x) dx
a a
The area under f lies between these two values. The last inequality is the most re ned
version of (44.6). The lower and upper integrals are, respectively, the best lower and upper
approximations of the area under f that can be obtained through plurirectangles. In partic-
Rb Rb
ular, when f (x) dx = a f (x) dx, the area under f will be assumed to be such common
a
value. This motivates the next fundamental de nition.
For brevity, in the rest of the chapter we will often talk about integrals and integrable
functions, omitting the clause \in the sense of Riemann". Since there are other notions of
integral, it is important however to keep always in mind such quali cation. In addition, note
that the de nition applies only to bounded functions. When in the sequel we will consider
integrable functions, they will be assumed to be bounded (even if not stated explicitly).
Rb
O.R. The notation
Pn a f (x) dx reminds us that P
the integral is obtained as the limit of sums
of the type
R i=1 i xi , in which the symbol is replaced by the integral sign (\a long
letter s") , the length xi by dx, and the values i of the function by f (x). H
Let us illustrate the de nition of the integral with, rst, an example of an integrable
function and, then, of a non-integrable one.
44.3. DEFINITION 1269
Example 1842 Let a 0 and f : [a; b] ! R be de ned by f (x) = x. For any subdivision
fxi gni=0 we have
n
X
I (f; ) = x0 x1 + x1 x2 + + xn 1 xn = xi 1 xi
i=1
n
X
S (f; ) = x1 x1 + x2 x2 + + xn xn = xi xi
i=1
Therefore,
n
X
S (f; ) I (f; ) = (x1 x0 ) x1 + (x2 x1 ) x2 + + (xn xn 1) xn = ( xi )2
i=1
By choosing the subdivision such that x0 = a and xi = xi 1 + (b a) =n for all i 2 f1; :::; ng,
Z b Z b
(b a)2
0 f (x) dx f (x) dx (b a) max xj = !0
a a
j2f1;:::;ng n
Rb Rb
as n ! 1. Thus, af (x) dx = f (x) dx and we conclude that f (x) = x is integrable. N
a
Finally, let us introduce a useful quantity that characterizes the \ ness" of a subdivision
of [a; b].
5 y
2
+
1
O - x
0
-1
-2
-3 -2 -1 0 1 2 3 4
Intuitively, the integral is now the di erence between the area of the positive part and the
area of the negative part. If they have equal value, the integral is zero: this is the case, for
example, of the function f (x) = sin x on the interval [0; 2 ].
To make this idea rigorous, it is useful to decompose a function into its positive and
negative parts.
The function f + is called the positive part of f , while f is called the negative part.
0 x<0 x x<0
f + (x) = and f (x) =
x x 0 0 x 0
Graphically:
3 3
y y
2.5 2.5
2 2
+ -
f f
1.5 1.5
1 1
0.5 0.5
0 0
O x O x
-0.5 -0.5
-1 -1
-1.5 -1.5
-2 -2
-4 -2 0 2 4 6 -4 -2 0 2 4 6
and 8 [
>
< 0 x2 [2n ; (2n + 1) ]
f (x) = n2Z
>
:
sin x otherwise
Graphically:
4 4
y y
3 3
-
f
2 2
+
f
1 1
0 0
O x O x
-1 -1
-2 -2
-8 -6 -4 -2 0 2 4 6 8 10 -8 -6 -4 -2 0 2 4 6 8 10
N
1272 CHAPTER 44. THE RIEMANN INTEGRAL (SDOGANATO)
f (x) = max ff (x) ; 0g + min ff (x) ; 0g = max ff (x) ; 0g ( min ff (x) ; 0g)
+
= f (x) f (x)
f = f+ f (44.14)
of its positive and negative parts.4 Such a decomposition permits to extend in a natural way
the notion of integral to any function, not necessarily positive. Indeed, since both functions
f + and f are positive, the de nition of Riemann integral for positive functions applies to
the areas under each of them. The di erence between their integrals
Z b Z b
f + (x) dx f (x) dx
a a
is the di erence between the areas under f + and f . So, it is the integral which we were
looking for.
All of this motivates the following de nition of Riemann integral for general bounded
functions, not necessarily positive.
This de nition makes rigorous and transparent the idea of considering with di erent sign
the areas of the plane regions bounded by f that lie, respectively, above and below the
horizontal axis.
To this end, we rst note that, given a subdivision = fxi gni=0 , we can still de ne for
any bounded function f : [a; b] ! R the sums I (f; ) and S (f; ) as in (44.3) and (44.4),
that is,
Xn Xn
I (f; ) = mi xi and S (f; ) = Mi xi
i=1 i=1
For general functions, too, the sums I(f; ) and S(f; ) are called the lower and upper
integral sum of f with respect to , respectively. The reader can easily verify that for these
sums the properties (44.5), (44.6), (44.7) and (44.8) continue to hold. In particular,
Moreover, for any bounded function f : [a; b] ! R, positive or not, we can still de ne the
lower and upper integrals
Z b Z b
f (x) dx = sup I (f; ) and f (x) dx = inf S (f; ) (44.15)
2 a 2
a
in perfect analogy with what we did for positive functions. The next result shows that
everything ts together: the notion of Riemann integral obtained through the decomposition
(44.14) into positive and negative part is given by the equality between upper and lower
integrals of (44.15).
Rb
Proposition 1848 A bounded function f : [a; b] ! R is integrable if and only if f (x) dx =
a
Rb
a f (x) dx. In this case,
Z b Z b Z b
f (x) dx = f (x) dx = f (x) dx
a a a
The proof is based on the next three lemmas. The rst one establishes a general property
of the suprema and in ma of sums of functions, the second one has also a theoretical interest
for the theory of integration (as we will explain at the end of the section), while the last one
has a more technical nature.
Lemma 1849 For any two bounded functions g; h : A ! R, we have supx2A (g + h) (x)
supx2A g (x) + supx2A h (x) and inf x2A (g + h) (x) inf x2A g (x) + inf x2A h (x).
Thus, supx2A g (x)+supx2A h (x) is an upper bound for the collection f(g + h) (y)gy2A . Since
the sup is the least upper bound,
The reader can prove, in a similar way, that inf x2A (g + h) (x) inf x2A g (x) + inf x2A h (x).
1274 CHAPTER 44. THE RIEMANN INTEGRAL (SDOGANATO)
Lemma 1850 Let f : [a; b] ! R be a bounded function. Then, for every subdivision =
fxi gni=0 of [a; b], we have
S (f; ) = S f + ; I f ; (44.16)
and
I (f; ) = I f + ; S f ; (44.17)
Proof Let f : [a; b] ! R be a bounded function and let = fxi gni=0 be a subdivision of [a; b].
For a generic interval [xi 1 ; xi ], set = supx2[xi 1 ;xi ] f (x). Since f is bounded, exists by
the Least Upper Bound Principle. We have
0 =) = sup f + (x)
x2[xi 1 ;xi ]
and
< 0 =) sup f + (x) = 0 and = inf f (x)
x2[xi 1 ;xi ]
x2[xi 1 ;xi ]
So,
sup f + (x) inf f (x)
x2[xi 1 ;xi ]
x2[xi 1 ;xi ]
On the other hand, by Lemma 1849 for any pair of functions g; h : A ! R we have
and so
In sum,
= sup f + (x) inf f (x)
x2[xi 1 ;xi ]
x2[xi 1 ;xi ]
Lemma 1851 Let f : [a; b] ! R be a bounded function. Then, for every subdivision =
fxi gni=0 of [a; b],
Putting together (44.20), (44.21) and (44.5) applied to both f + and f , we get the inequality
(44.19).
Rb Rb
Proof of Proposition 1848 We begin with the \if": suppose f (x) dx = af (x) dx. We
a
show that f + and f are integrable. From (44.19) it follows
So
sup I f + ; inf S f ; = inf S f + ; sup I f ;
2 2 2 2
which implies
which implies
It remains to prove the \only if". Suppose that f be integrable, that is, that f + and f
are both integrable. We will show that
By (44.19), we have
Z b Z b
sup I (f; ) f + (x) dx f (x) dx inf S (f; ) (44.24)
2 a a 2
1276 CHAPTER 44. THE RIEMANN INTEGRAL (SDOGANATO)
Since f + and f are both integrable, by the integrability criterion of Proposition 1852 we
have that, for every " > 0, there exist subdivisions and 0 such that5
S f +; 00
I f +; 00
< " and S f ; 00
I f ; 00
<"
as desired.
N.B. The Riemann integral is often de ned directly for general functions, not necessarily
positive, through the lower and upper sums. What is lost in de ning these sums for not nec-
essarily positive functions is the geometric intuition. While for positive functions I(f; ) is
the area of the inscribed plurirectangles and S(f; ) the area of the circumscribed plurirect-
angles, this is no longer true for a generic function that takes positive and negative values,
as (44.16) and (44.17) show. The formulation we adopt with De nition 1847 is suggested
by pedagogical motivations and is equivalent to the usual formulation, as Proposition 1848
shows. O
Proof \If". Suppose that, for every " > 0, there exists a subdivision such that S (f; )
I (f; ) < ". Then
Z b Z b
0 f (x) dx f (x) dx S (f; ) I (f; ) < "
a a
5
The integrability criterion of Proposition 1852 for positive functions (all we need here) can be proved
directly via De nition 1841. Thus, there is no circularity in using in the current proof such criterion.
44.4. INTEGRABILITY CRITERIA 1277
Rb Rb
and therefore, since " > 0 is arbitrary, we have a f (x) dx = f (x) dx.
a
Rb Rb
\Only if". Suppose that a f (x) dx = f (x) dx. By Proposition 127, for every " > 0
a
Rb
there exist a subdivision 0 such that S (f; 0 ) a f (x) dx < " and a subdivision
00 such
Rb
that f (x) dx I (f; 00 ) < ". Let be a subdivision that re nes both 0 and 00 . Thanks
a
to (44.6), we have I (f; 00 ) I (f; ) S (f; ) S (f; 0 ), so
Z b Z b
0 00
S (f; ) I (f; ) S f; I f; < f (x) dx + " f (x) dx + " = 2"
a a
as desired.
The next result shows that, if two functions are equal except at a nite number of points,
then their integrals (if they exist) are equal. It is an important property of stability of the
integral, whose value does not change if we modify a function f : [a; b] ! R at a nite
number of points.
Proof It is su cient to prove the statement for the case in which g di ers from f at only
one point x^ 2 [a; b]. The case of n points is then proved by ( nite) induction by adding one
point at a time.
Suppose, therefore, that f (^ x) 6= g(^ x) with x^ 2 [a; b]. Without loss of generality, suppose
that f (^
x) > g(^x). Setting k = f (^ x) g(^ x) > 0, let h : [a; b] ! R be the function h = f g.
Then
0 x 6= x ^
h(x) =
k x=x ^
Rb
Let us prove that h is integrable and that a h(x)dx = 0. Let " > 0. Consider an arbitrary
subdivision = fx0 ; x1 ; :::; xn g of [a; b] such that j j < "=2k. Since h(x) = 0 for every x 6= x^,
we have
I(h; ) = 0
Next, we turn to S(h; ). Since x ^ 2 [a; b], there are two possibilities: (i) x
^ is not an interme-
diate point of the subdivision, that is, we have either x ^ 2 fx0 ; xn g or x
^ 2 (xi 1 ; xi ) for some
i = 1; :::; n; (ii) x
^ is a point of the subdivision, with the exclusion of the extremes, that is,
x
^ = xi for some i = 1; :::; n 1. In case (i), with either x ^ 2 fx0 ; xn g or x ^ 2 (xi 1 ; xi ) for
some i = 1; :::; n, we have 6
" "
S(h; ) = k xi < k = <"
2k 2
In case (ii), with x
^ = xi for some i = 1; :::; n 1, we have
"
S(h; ) = k ( xi + xi+1 ) < 2k ="
2k
6
If x
^ = x0 , we have S(h; ) = k x1 , while if x
^ = xn , we have S(h; ) = k xn . In both cases, we have
S(h; ) < ".
1278 CHAPTER 44. THE RIEMANN INTEGRAL (SDOGANATO)
Therefore, in both cases (i) and (ii) we have S(h; ) I(h; ) < ". Since " > 0 is arbitrary,
by Proposition 1852 h is integrable on [a; b]. Hence,
Z b
h(x)dx = sup I(h; ) = inf S(h; ) (44.25)
a 2 2
sup I(h; ) = 0
2
By applying the linearity of the integral (Theorem 1862), the di erence g = f h is integrable
because f and h are both integrable. In particular, again by the linearity of the integral, we
have Z Z Z Z
b b b b
g(x)dx = f (x)dx h(x)dx = f (x)dx
a a a a
as desired.
O.R. In view of Proposition 1853, even if a function f is not de ned at a nite number of
points of the interval [a; b], we can still talk about its integral. Indeed, it can be regarded as
the integral of any function g : [a; b] ! R that coincides with f on the domain of f . With
this, the integrals of f on the intervals [a; b], (a; b], [a; b) and (a; b) always coincide, thus
Z b
making unambiguous the notation f (x) dx. H
a
Proof Let " > 0. Since g is continuous on [m; M ], by Theorem 603 the function g is
uniformly continuous on [m; M ], that is, there exists " > 0 such that
and therefore
n
" #
X
S (g f; ) I (g f; ) = sup (g f ) (x) inf (g f ) (x) xi
x2[xi 1 ;xi ]
x2[xi 1 ;xi ]
i=0
" #
X
= sup (g f ) (x) inf (g f ) (x) xi
x2[xi 1 ;xi ]
x2[xi 1 ;xi ]
i2I
" #
X
+ sup (g f ) (x) inf (g f ) (x) xi
x2[xi 1 ;xi ]
x2[xi 1 ;xi ]
i2I
=
X X
" xi + 2 max jg (y)j xi < " (b a) + 2 max jg (y)j "
y2[m;M ] y2[m;M ]
i2I i2I
=
Since the function g (x) = jxj is continuous, a simple but important consequence of
Proposition 1854 is that the integrability of a bounded function f : [a; b] ! R implies the
integrability of the function absolute value jf j : [a; b] ! R. Note that the converse is false:
the function 8
< 1 if x 2 Q \ [0; 1]
f (x) = (44.27)
:
1 if x 2 (R Q) \ [0; 1]
is a simple modi cation of the Dirichlet function and hence it is not integrable, contrary to
its absolute value jf j which is the constant function equal to 1 on the interval [0; 1].
Finally, observe that the rst integrability criterion of this section, Proposition 1852,
opens an interesting perspective on the Riemann integral. Given any subdivision = fxi gni=0 ,
by de nition we have mi f (x0i ) Mi for every x0i 2 [xi 1 ; xi ], so that
n
X
I (f; ) f x0i xi S (f; )
i=1
7
Here i 2
= I stands for i 2 f1; 2; : : : ; ng I.
1280 CHAPTER 44. THE RIEMANN INTEGRAL (SDOGANATO)
Hence, since
Z b
I(f; ) f (x) dx S(f; )
a
we have
n
X Z b
I (f; ) S (f; ) f x0i xi f (x) dx S (f; ) I (f; )
i=1 a
which is equivalent to
n
X Z b
f x0i xi f (x) dx S (f; ) I (f; )
i=1 a
By Proposition 1852, for every " > 0 there exists a su ciently ne subdivision for which
n
X Z b
f x0i xi f (x) dx S (f; ) I (f; ) < "
i=1 a
Rb
That is, the Riemann integral a f (x) dx can P be seen as a limit, for smaller and smaller
meshes j j of the subdivisions , of the sums ni=1 f (x0i ) xi .8 It is an equivalent way to see
Riemann integral, which is indeed sometimes de ned directly in these terms through (44.28).
Even if evocative, the limit limj j!0 is not among the notions of limit, for sequences or
functions, discussed in the book (indeed, it requires a more subtle de nition, which coda
readers can see in Section 47.9.3). Moreover, the de nition we have adopted is particularly
well suited for generalizations of the Riemann integral, as the reader will see in more advanced
courses on integration.
De nition 1855 A function f : [a; b] ! R is called step function if there exist a subdivision
= fxi gni=0 and a set fci gni=1 of constants such that
n
X1
f (x) = ci 1[xi 1 ;xi )
(x) + cn 1[xn 1 ;xn ]
(x) (44.30)
i=1
and
n
X
g (x) = c1 1[x0 ;x1 ] (x) + ci 1(xi 1 ;xi ]
(x) (44.31)
i=2
are step functions where, for every set A in R, we denote by 1A : R ! R the indicator
function
(
1 if x 2 A
1A (x) = (44.32)
0 if x 2
=A
The two following gures give, for n = 4, examples of functions f and g described by (44.30)
and (44.31). Note that f and g are, respectively, continuous from the right and from the
left, that is, limx!^x+ f (x) = f (^
x) and limx!^x g (x) = g (^
x) for all x
^ 2 [a; b].
7 7
6 6
5 f(x) 5 g(x)
4 c 4 c
4 4
3 c 3 c
2 2
2 c 2 c
3 3
1 c 1 c
1 1
0 0
x x x x x x x x x x
0 1 2 3 4 0 1 2 3 4
-1 -1
-1 0 1 2 3 4 5 6 7 8 9 -1 0 1 2 3 4 5 6 7 8 9
On the intervals
[x0 ; x1 ) [ (x1 ; x2 ) [ (x2 ; x3 ) [ (x3 ; x4 ]
4 c
4
3 c
2
2 c
3
1 c
1
0
x x x x x
0 1 2 3 4
-1
-1 0 1 2 3 4 5 6 7 8 9
determined by the subdivision fxi g4i=0 and by the constants fci g4i=1 . Nevertheless, at the
points x1 < x2 < x3 the functions f and g di er and it is easy to verify that on the entire
interval [x0 ; x4 ] they do not generate this plurirectangle, as the next gure shows. Indeed,
the dashed segment at x2 is not under f and the dashed segments at x1 and x3 are not under
g.
7 7
6 6
5 f(x) 5 g(x)
4 c 4 c
4 4
3 c 3 c
2 2
2 c 2 c
3 3
1 c 1 c
1 1
0 0
x x x x x x x x x x
0 1 2 3 4 0 1 2 3 4
-1 -1
-1 0 1 2 3 4 5 6 7 -1
8 90 1 2 3 4 5 6 7
But, thanks to Proposition 1853, such a discrepancy at a nite number of points is irrelevant
for the integral. The next result shows that the area under the step functions f and g is,
actually, equal to that of the corresponding plurirectangle (independently of the values of
the function at the points x1 < x2 < x3 ).
Proposition 1856 A step function f : [a; b] ! R, determined by the subdivision fxi gni=0
and by the constants fci gni=1 according to (44.29), is integrable, with
Z b n
X
f (x) dx = ci xi (44.33)
a i=1
44.5. CLASSES OF INTEGRABLE FUNCTIONS 1283
All the step functions that are determined by a subdivision fxi gni=0 and a set of constants
fci gni=1 according to (44.29), share therefore the same integral (44.33). In particular, this
holds for the step functions (44.30) and (44.31).
In the special case of a constant function, say f (x) = c for all x 2 [a; b], formula (44.33)
reduces to Z b
c dx = c (b a)
a
where on the left hand side c denotes the constant function f .9
Rb Rb
Proof Since f is bounded, an immediate extension of Lemma 1840 shows that f (x) dx; a f (x) dx 2
a
R. Let m = inf x2[a;b] f (x) and M = supx2[a;b] f (x). Fix " > 0 su ciently small, and consider
the subdivision " given by
x0 < x0 + " < x1 " < x1 + " < x2 " < x2 + " < < xn 1 " < xn 1 + " < xn " < xn
We have
I (f; ") =" inf f (x) + c1 (x1 " x0 ") + 2" inf f (x) + c2 (x2 " x1 ")
x2[x0 ;x0 +"] x2[x1 ";x1 +"]
+ 2" inf f (x) + + 2" inf f (x) + cn (xn " xn 1 ") + " inf f (x)
x2[x2 ";x2 +"] x2[xn 1 ";xn 1 +"] x2[xn ";xn ]
n
X n
X1
=" inf f (x) + inf f (x) + ci ( xi 2") + 2" inf f (x)
x2[x0 ;x0 +"] x2[xn ";xn ] x2[xi ";xi +"]
i=1 i=1
n
X n
X
2"m + ci xi 2"M n + 2"m (n 1) = ci xi 2"n (M m)
i=1 i=1
Since " > 0 is arbitrary, Proposition 1852 shows that f is integrable. Moreover, since
Z b
I (f; " ) f (x) dx S (f; " )
a
we have Z
n
X b n
X
ci xi (K 1) " f (x) dx ci xi + (K 1) "
i=1 a i=1
Rb Pn
which, given the arbitrariness of " > 0, guarantees that a f (x) dx = i=1 ci xi .
9
So, in this formula c denotes two altogether di erent notions, a constant function on the left hand side
and a real number on the right hand side. It is a standard abuse of notation that makes the formula easier
to understand.
1284 CHAPTER 44. THE RIEMANN INTEGRAL (SDOGANATO)
and
Z b Z b
f (x) dx = inf h (x) dx : h f and h 2 S ([a; b]) (44.35)
a a
That is, if and only if the lower approximation given by the integrals of step functions smaller
than f coincides, at the limit, with the upper approximation given by the integrals of step
functions larger than f . In this case the method of exhaustion assumes a more analytic and
less geometric aspect with the approximation by elementary polygons (the plurirectangles)
replaced by the one given by elementary functions (the step functions).10
This suggests a di erent approach to the Riemann integral, more analytic and less geo-
metric. In such an approach, we rst de ne the integrals of step functions (that is, the area
under them), which can be determined on the basis of elementary geometric considerations
based on plurirectangles. We then use these \elementary" integrals to suitably approximate
the areas under more complicated functions. In particular, we de ne the lower integral of
a bounded function f : [a; b] ! R as the best approximation \from below" obtained by
means of step functions h f , and, analogously, the upper integral of a bounded function
f : [a; b] ! R as the best approximation \from above" obtained by means of step functions
h f.
Thanks to (44.34) and (44.35), this more analytic interpretation of the method of ex-
haustion is equivalent to the geometric one previously adopted. The analytic approach is
quite fruitful, as readers will learn in more advanced courses.
Proof Since f is continuous on [a; b], by Weierstrass' Theorem f is bounded. Let " > 0. By
Theorem 603, f is uniformly continuous, that is, there exists " > 0 such that
Let = fxi gni=0 be a subdivision of [a; b] such that j j < ". By (44.36), for every i =
1; 2; : : : ; n we therefore have
where max and min exist thanks to Weierstrass' Theorem. It follows that
n
X n
X
S (f; ) I (f; ) = max f (x) xi min f (x) xi
x2[xi 1 ;xi ] x2[xi 1 ;xi ]
i=1 i=1
n
X
= max f (x) min f (x) xi
x2[xi 1 ;xi ] x2[xi 1 ;xi ]
i=1
Xn
<" xi = " (b a)
i=1
Because of the stability of the integral seen in Proposition 1853, we have the following
immediate generalization of the last result: every function f : [a; b] ! R that has at most
a nite number of removable discontinuities is integrable. Indeed, by recalling (13.9) of
Chapter 13, if S = fxi gni=1 is the set of points where f has removable discontinuities, the
function
f (x) if x 2
=S
f~ (x) =
limy!x f (y) if x 2 S
is continuous (so, integrable) and is equal to f except at the points of S.
More is true: the hypothesis that the discontinuities are removable is actually super uous,
and we can actually allow for countably many points of discontinuity.
Theorem 1859 Every bounded function f : [a; b] ! R with at most countably many discon-
tinuities is integrable.
is continuous at all the points of [0; 1], except at the two extreme points 0 and 1. By Theorem
1859, the function f is integrable.
(ii) Consider the countable set
1
E= :n 1 [0; 1]
n
The function f : [0; 1] ! R de ned by
x2 if x 2
=E
f (x) =
0 if x 2 E
is continuous at all the points of [0; 1], except at the points of E.11 Since E is a countable
set, by Theorem 1859 the function f is integrable. N
The result follows immediately from Theorem 1859 because monotone functions have at
most countably many points of discontinuity (Proposition 564). Next we give, however, a
simple direct proof of the result.
Proof Let us suppose that f is increasing (the argument for f decreasing is analogous).
First, observe that f is obviously bounded. Now, let " > 0. Let = fxi gni=0 be a subdivision
of [a; b] such that j j < ". We have
inf f (x) = f (xi 1) f (a) and sup f (x) = f (xi ) f (b)
x2[xi 1 ;xi ] x2[xi 1 ;xi ]
and therefore
n
X n
X
S (f; ) I (f; ) = sup f (x) xi inf f (x) xi
x2[xi 1 ;xi ]
i=1 x2[xi 1 ;xi ] i=1
Xn n
X n
X
= f (xi ) xi f (xi 1 ) xi = (f (xi ) f (xi 1 )) xi
i=1 i=1 i=1
n
X
j j (f (xi ) f (xi 1 )) < " (f (b) f (a))
i=1
Theorem 1862 Let f; g : [a; b] ! R be two integrable functions. Then, for every ; 2R
the function f + g : [a; b] ! R is integrable, with
Z b Z b Z b
( f + g) (x) dx = f (x) dx + g (x) dx (44.37)
a a a
Proof The proof is divided into two parts. First we will prove homogeneity, that is,
Z b Z b
f (x) dx = f (x) dx 8 2R (44.38)
a a
whenever f and g are integrable. Together, relations (44.38) and (44.39) are equivalent to
(44.37).
(i) Homogeneity. First, recall that an integrable function is, by de nition, bounded.
Thus, f is bounded for all 2 R. Let = fxi gni=0 be a subdivision of [a; b]. If 0 we
have I ( f; ) = I (f; ) and S ( f; ) = S (f; ). Therefore, f is integrable, with
Z b Z b
f (x) dx = f (x) dx (44.40)
a a
Z b Z b
( f ) (x) dx = sup I ( f; ) = sup S (f; ) = inf S (f; ) = f (x) dx
a 2 2 2 a
1288 CHAPTER 44. THE RIEMANN INTEGRAL (SDOGANATO)
Now let < 0. We have f = ( )( f ) with > 0. Then, by applying (44.40) we obtain
Z b Z b Z b
( f ) (x) dx = ( ) ( f ) (x) dx = ( ) ( f ) (x) dx
a a a
Z b Z b
=( ) f (x) dx = f (x) dx
a a
Therefore,
Z b Z b
f (x) dx = f (x) dx 8 2R (44.41)
a a
that is, (44.38).
(ii) Additivity. First, observe that the sum f + g is bounded since integrable functions
are, by de nition, bounded. Let us prove (44.39). Let " > 0. Since f and g are integrable,
by Proposition 1852 there exists a subdivision of [a; b] such that S (f; ) I (f; ) < " and
there exists 0 such that S (g; 0 ) I (g; 0 ) < ". Let 00 be a subdivision of [a; b] that re nes
both and 0 . Thanks to (44.6), we have S (f; 00 ) I (f; 00 ) < " and S (g; 00 ) I (g; 00 ) < ".
Moreover, by applying the inequalities of Lemma 1849,
00 00 00 00 00 00
I f; + I g; I f + g; S f + g; S f; + S g; (44.42)
and therefore
00 00 00 00 00 00
S f + g; I f + g; S f; I f; + S g; I g; < 2"
that is,
Z b Z b
I (f; ) f (x) dx + I (g; ) g (x) dx
a a
Z b Z b Z b
(f + g)(x)dx f (x) dx + g (x) dx
a a a
Z b Z b
S (f; ) f (x) dx + S (g; ) g (x) dx
a a
44.6. PROPERTIES OF THE INTEGRAL 1289
Since f and g are integrable, given any " > 0 we can nd a subdivision " such that, for
h = f; g, we have
Z b Z b
" " " "
I (h; ) h (x) dx > and S (h; ) h (x) dx <
a 2 a 2
Therefore,
Z b Z b Z b
"< (f + g)(x)dx f (x) dx + g (x) dx <"
a a a
An important consequence of the linearity of the integral is that the product of two
integrable functions is integrable.
Corollary 1863 If f; g : [a; b] ! R are integrable functions, then their product f g : [a; b] !
R is integrable.
O.R. Thanks to the linearity of the integral, knowing the integrals of f and g allows one
to calculate the integral of f + g. It is not so for the product or for the composition of
integrable functions: the integrability of f guarantees the integrability of f 2 , but knowing
the value of
R bthe integral ofR f does not help in the calculation of the integral of f 2 { indeed,
b
in general a f (x) dx 6= ( a f (x) dx)2 . More generally, knowing that g f is integrable does
2
not give any useful indication for the computation of the integral of the composite function.
H
Finally, the linearity of the integral implies that it is possible to freely subdivide the
domain of integration [a; b] into subintervals.
Corollary 1864 Let f : [a; b] ! R be a bounded and integrable function. If a < c < b, then
Z b Z c Z b
f (x) dx = f (x) dx + f (x) dx (44.44)
a a c
1290 CHAPTER 44. THE RIEMANN INTEGRAL (SDOGANATO)
Vice versa, if f1 : [a; c] ! R and f2 : [c; b] ! R are bounded and integrable, then the function
f : [a; b] ! R de ned by (
f1 (x) if x 2 [a; c]
f (x) =
f2 (x) if x 2 (c; b]
is also bounded and integrable, with
Z b Z c Z b
f (x) dx = f1 (x) dx + f2 (x) dx
a a c
Proof Before proving the statement, we make an observation. We rst consider a generic
bounded and integrable function f : [a; b] ! R. We show that:12
Z b Z c
1[a;c] f (x) dx = f (x) dx (44.45)
a a
and Z Z
b b
1(c;b] f (x) dx = f (x) dx (44.46)
a c
Note that the left-hand side of (44.45) and (44.46) is well-de ned. Indeed, since indicator
functions are step functions, they are bounded and integrable. By Corollary 1863 and since
f is bounded and integrable, 1[a;c] f and 1(c;b] f are bounded and integrable.
We proceed by proving (44.45). By Proposition 1853, we have that
Z b Z b
1[a;c] f (x) dx = 1[a;c) f (x) dx
a a
for 1[a;c] f is equal to 1[a;c) f except at most at c. Next, consider f~ : [a; c] ! R de ned by
f (x) if x 2 [a; c)
f~ (x) =
0 if x = c
By Proposition 1853, if f~ and fj[a;c] are bounded and integrable, we have that
Z c Z c
fj[a;c] (x) dx = f~ (x) dx
a a
for fj[a;c] is equal to f~ except at most at c. Thus, in order to prove (44.45), it is enough to
show that f~ is integrable (it is clearly bounded) and
Z b Z c
1[a;c) f (x) dx = f~ (x) dx (44.47)
a a
12
Rc
The careful reader might have noticed that the symbol a f (x) dx implicitly suggests that the domain
of f is [a; c]. Nonetheless, in our
R c current case, f has been de ned over the larger interval [a; b]. Thus, as
it
Rc should appear natural, with a
f (x) dx, we refer to the integral Rof the restriction of f to [a; c], that is,
c
f
a Rj[a;c]
(x)dx. For notational convenience, it is usual to just write a f (x) dx. A similar observation holds
b
for c f (x) dx.
44.6. PROPERTIES OF THE INTEGRAL 1291
Let " > 0. Since 1[a;c) f is bounded and integrable, by Proposition 1852, there exists a
subdivision of [a; b] such that
S(1[a;c) f; ) I(1[a;c) f; ) < "
Let 0 = fxi gni=0 be a re nement of that has c as point of subdivision, say c = xj . Then
we have
S(1[a;c) f; 0 ) I(1[a;c) f; 0 ) < "
Let 00 = 0 \ [a; c]. In other words, 00 = fx0 ; x1 ; :::xj g is the restriction of the subdivision
0 to the interval [a; c]. Since 1 ~
[a;c) f (x) = f (x) for all x 2 [a; c], this implies that for each
i 2 f1; :::; jg
mi = inf 1[a;c) f (x) = inf f~ (x)
x2[xi 1 ;xi ] x2[xi 1 ;xi ]
and
Mi = sup 1[a;c) f (x) = sup f~ (x)
x2[xi 1 ;xi ] x2[xi 1 ;xi ]
and
n
X j
X
S(1[a;c) f; 0
)= Mi xi = Mi xi = S(f~; 00
) (44.49)
i=1 i=1
Therefore,
S(f~; 00 ) I(f~; 00 ) < "
By Proposition 1852, it follows that f~ : [a; c] ! R is integrable and bounded. Moreover,
from (44.48) and (44.49) we deduce that
Z b Z c
1[a;c) f (x) dx = f~(x)dx
a a
proving (44.47) which in turn implies (44.45). A similar argument yields (44.46).
We can now prove the rst part of the statement. Observe that
f = 1[a;c] f + 1(c;b] f
Thus, by (44.45) and (44.46) and since 1[a;c] f and 1(c;b] f are bounded and integrable, the
linearity of the integral implies that
Z b Z b Z b
f (x) dx = 1[a;c] f + 1(c;b] f (x) dx = 1[a;c] f (x) + 1(c;b] f (x) dx
a a a
Z b Z b Z c Z b
= 1[a;c] f (x) dx + 1(c;b] f (x) dx = f (x) dx + f (x) dx
a a a c
1292 CHAPTER 44. THE RIEMANN INTEGRAL (SDOGANATO)
proving (44.44).
Let us prove the second part. First, de ne f~1 ; f~2 : [a; b] ! R by
Clearly, f~1 and f~2 are bounded. We next prove that f~1 is integrable. A similar argument
yields the integrability of f~2 . Consider f^1 : [a; b] ! R de ned by
f~1 (x) if x 6= c
f^1 (x) =
0 if x = c
By Proposition 1853, it is enough to show that f^1 is integrable. Let " > 0. Since f1 is
integrable over [a; c], there exists a subdivision 0 = fxi gni=0 of [a; c] such that
0 0
S(f1 ; ) I(f1 ; )<"
Consider the subdivision 00 = fyi gn+1i=0 of [a; b] where yi = xi for all i 2 f0; :::; ng and
^
yn+1 = b. Since f1 (x) = 0 for all x 2 [c; b], note that
S(f^1 ; 00
) = S(f1 ; 0
) and I(f^1 ; 00
) = I(f1 ; 0
)
yielding that S(f^1 ; 00 ) I(f^1 ; 00 ) < ". By Proposition 1852, f^1 is integrable and so is f~1 .
Observe that f = 1[a;c] f~1 + 1(c;b] f~2 . Since f~1 , f~2 , and the indicator functions are bounded
and integrable, so is f . By the linearity of the integral and (44.45) as well as (44.46), we
have
Z b Z b Z b
f (x) dx = ~
1[a;c] f1 (x) dx + 1(c;b] f~2 (x) dx
a a a
Z c Z b Z c Z b
= f~1 (x) dx + f~2 (x) dx = f1 (x)dx + f2 (x)dx
a c a c
as desired.
The next property of monotonicity of the integral shows that to larger functions there
correspond larger integrals. The writing f g means f (x) g (x) for every x 2 [a; b], i.e.,
the function f is pointwise smaller than the function g.
Rb
Theorem 1865 Let f; g : [a; b] ! R be two integrable functions. If f g, then a f (x) dx
Rb
a g (x) dx.
Proof Since f g, it follows that I (f; ) I (g; ) for all 2 . In turn, as f and g are
Rb Rb
integrable, this implies a f (x) dx a g (x) dx.
From the monotonicity of the integral we obtain an important inequality between \ab-
solute values of integrals" and \integrals of absolute values", the latter being larger. In
reading the result keep in mind that, as observed after Proposition 1854, the integrability of
jf j follows from that of f .
44.6. PROPERTIES OF THE INTEGRAL 1293
44.6.2 Panini
The monotonicity of the integral allows us to establish an interesting sandwich property for
integrals.
Proof We have
m f (x) M 8x 2 [a; b]
Hence, by the monotonicity of the integral,
Z b Z b Z b
mdx f (x) dx M dx
a a a
Rb Rb
Clearly, a mdx = m (b a) and a M dx = M (b a). This shows that (44.51) holds.
In turn, the previous sandwich property leads to the classic Integral Mean Value Theorem.
Theorem 1868 (Integral Mean Value) Let f : [a; b] ! R be a bounded and integrable
function. Then, setting m = inf [a;b] f (x) and M = sup[a;b] f (x), there exists a scalar 2
[m; M ] such that
Z b
f (x) dx = (b a) (44.52)
a
In particular, if f is continuous, there exists c 2 [a; b] such that f (c) = , that is,
Z b
f (x) dx = f (c) (b a)
a
For this reason, is called the mean value (of the images) of f : the value of the integral
does not change if we replace each value f (x) by the constant value .
O.R. The Integral Mean Value Theorem is quite intuitive: there exists a rectangle with base
[a; b] and height , with area equal to the one under f on [a; b]:
25
y
20
15
10
0
O a b x
-2 0 2 4 6 8
If, moreover, the function f is continuous, the height of such a rectangle coincides with
the image of some point c in [a; b], that is, f (c) = . H
One may expect that, among the positive integrable functions, only zero functions can
have zero integrals. In general, however, this is not the case: the positive function f : [0; 1] !
R given by (
0 if x 2 (0; 1)
f (x) = 1
2 if x 2 f0; 1g
is
R 1 integrable by Theorem 1859 (cf. also Proposition 1853), yet it is easy to see that
0 f (x) dx = 0. This function is not continuous. As a consequence of the Integral Mean
Value Theorem, next we show that under continuity only zero functions can, indeed, have
zero integrals.
Rb
Corollary 1869 Let f : [a; b] ! R be a continuous and positive function. If a f (x) dx = 0,
then f = 0.
44.6. PROPERTIES OF THE INTEGRAL 1295
In this case, it is under continuity that the behavior of the Riemann integral best accords
with intuition.
Rb
Proof Let a f (x) dx = 0. We want to show that f = 0. Suppose, by contradiction, that
there exists x0 2 [a; b] such that f (x0 ) 6= 0. Since f 0, we have f (x0 ) > 0. Assume rst
that x0 2 (a; b). Since f is continuous, there exists " > 0 small enough so that f (x) > 0 for
all x 2 [x0 "; x0 + "] [a; b] (cf. the theorem on the permanence of sign). The function f
is continuous on [x0 "; x0 + "]. By (44.44), we have
Z x0 " Z x0 +" Z b Z b
f (x) dx + f (x) dx + f (x) dx = f (x) dx = 0
a x0 " x0 +" a
R x0 +"
So, f (x) dx = 0 because the three addends are positive since f
x0 " 0. By the Integral
R x0 +"
Mean Value Theorem, there exists c 2 [x0 "; x0 + "] such that 0 = x0 " f (x) dx = f (c) 2".
Thus, f (c) = 0 and this contradicts the strict positivity of f on [x0 "; x0 + "]. A similar
contradiction is easily obtained if x0 2 fa; bg. We conclude that f = 0.
We close with a nice dividend of the properties of integrals proved in this section, an
integral version of Stone-Weierstrass' Theorem (with the sandwich avor of Corollary 608)
that shows how the integral of any function can be approximated, arbitrarily well, by the
integral of a polynomial.
Proposition 1870 Let f : [a; b] ! R be a bounded function. If f is integrable, then for each
" > 0 there exist two polynomials p; P : [a; b] ! R such that p f P and
Z b
(P (x) p (x)) dx "
a
N.B. Given a function f : [a; b] ! R, until now we have considered the Riemann integral of
Rb
f from a to b, that is, a f (x)dx. Sometimes it is useful to consider the integral of f from b
1296 CHAPTER 44. THE RIEMANN INTEGRAL (SDOGANATO)
Ra Ra
to a, that is, b f (x)dx,13 as well as the integral of f from a to a, that is, a f (x)dx. What
do we mean by such expressions? By convention we de ne, for a < b,
Z a Z b
f (x)dx = f (x)dx (44.53)
b a
and Z a
f (x)dx = 0 (44.54)
a
Rb
Thanks to these conventions, it is no longer essential that in a we have a < b: in the case in
which a b the integral assumes the meaning given to it by (44.53) and (44.54). Moreover, it
Rb
is possible to prove that the properties established for the integral a f (x)dx extend, mutatis
mutandis, also in the case a b. O
P 0 (x) = f (x) 8x 2 I
We denote a primitive of f by Z
f (x) dx
13
This happens, for example, if f is integrable on an interval [a; b] and we take two generic points x; y 2 [a; b],
without specifying if x < y or x y, and then consider the integral of f between x and y.
44.7. INTEGRAL CALCULUS 1297
In other words, moving from the function f to its primitive P can be seen as the inverse
procedure with respect to moving from P to f through di erentiation. In this sense, the
primitive function is the inverse of the derivative function (indeed, sometimes it is called
antiderivative).
Let us provide a couple of examples. Here it is important to keep in mind that, as Example
1876 will show, a function might not have a primitive, so the search of the primitive of a
function might be vain. In any case, by Corollary 1317 a necessary condition for a function
f to have a primitive is that it has no removable or jump discontinuities.
Example 1872 Let f : [0; 1] ! R be given by f (x) = x. The function P : [0; 1] ! R given
by P (x) = x2 =2 is a primitive of f . Indeed, P 0 (x) = 2x=2 = x. N
N.B. If I1 and I2 are two nested intervals, with I1 I2 , then a primitive of f on I2 is also a
primitive on I1 . For example, if we consider the restriction of f (x) = x= 1 + x2 on [0; 1],
that is, the function f~ : [0; 1] ! R given by f~ (x) = x= 1 + x2 , then a primitive on [0; 1]
remains P (x) = 2 1 log 1 + x2 . O
Proof The "if" is obvious, given our previous discussion. Let us prove the "only if". Let $I = [a,b]$ and let $P_1, P_2 : [a,b] \to \mathbb{R}$ be two primitive functions of $f$ on $[a,b]$. Since $P_1'(x) = f(x) = P_2'(x)$ for every $x \in [a,b]$, by Corollary 1313 we have $P_2 = P_1 + k$.

Let now $I$ be an open and bounded interval $(a,b)$. Let $\varepsilon > 0$ be sufficiently small so that $a + \varepsilon < b - \varepsilon$. We have
$$(a,b) = \bigcup_{n=1}^{\infty} \left[a + \frac{\varepsilon}{n},\, b - \frac{\varepsilon}{n}\right]$$
By what has just been proved, for every $n \ge 1$ there exists a constant $k_n \in \mathbb{R}$ such that
$$P_2(x) = P_1(x) + k_n \qquad \forall x \in \left[a + \frac{\varepsilon}{n},\, b - \frac{\varepsilon}{n}\right] \quad (44.55)$$
Let $x_0 \in (a,b)$ be such that $a + \varepsilon < x_0 < b - \varepsilon$, so that $x_0 \in [a + \varepsilon/n, b - \varepsilon/n]$ for every $n \ge 1$. From (44.55) it follows that $P_2(x_0) = P_1(x_0) + k_n$ for every $n \ge 1$. Therefore, $k_n = P_2(x_0) - P_1(x_0)$ for every $n \ge 1$, that is, $k_1 = k_2 = \cdots = k_n = \cdots$. There exists, therefore, $k \in \mathbb{R}$ such that $P_2(x) = P_1(x) + k$ for every $x \in (a,b)$.

In a similar way one can show the result when $I$ is a half-open and bounded interval $(a,b]$ or $[a,b)$. If $I = \mathbb{R}$, we proceed as in the case $(a,b)$, observing that $\mathbb{R} = \bigcup_{n=1}^{\infty} [-n,n]$. A similar argument, which we leave to the reader, holds also for unbounded intervals.
This proposition is another important application of the Mean Value Theorem (of differential calculus). Thanks to it, once a primitive $P$ of a function $f$ is identified, we can write the family of all the primitives as $\{P + k\}_{k\in\mathbb{R}}$.
Example 1875 Let us go back to Examples 1872 and 1873. For the function $f : [0,1] \to \mathbb{R}$ given by $f(x) = x$, we have
$$\int f(x)\,dx = \frac{x^2}{2} + k$$
For the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x/(1+x^2)$ we have
$$\int f(x)\,dx = \frac{1}{2}\log(1+x^2) + k$$
N
We close the section by showing that not all functions admit a primitive, and so an indefinite integral.
Example 1876 The signum function $\operatorname{sgn} : \mathbb{R} \to \mathbb{R}$ given by (13.17), i.e.,
$$\operatorname{sgn}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases}$$
does not admit a primitive. Let us suppose, by contradiction, that there exists a primitive $P : \mathbb{R} \to \mathbb{R}$, i.e., a differentiable function such that $P'(x) = \operatorname{sgn} x$. By Proposition 1874 and by focusing separately on the intervals $(-\infty,0)$ and $(0,\infty)$, there exist $k_1, k_2 \in \mathbb{R}$ such that
$$P(x) = -x + k_2 \;\text{ if } x < 0 \qquad\text{and}\qquad P(x) = x + k_1 \;\text{ if } x > 0$$
Since $P$ is differentiable, $P$ is continuous, yielding that $P(0) = k_1 = k_2$. Therefore, $P(x) = |x| + k_1$ for every $x \in \mathbb{R}$. But this function is not differentiable at the origin, which contradicts what has been assumed on $P$.

We could have reached the same conclusion by noting that the signum function has a jump discontinuity at $0$. By Corollary 1317, this prevents the signum function from being the derivative of any other function. Finally, observe that, despite not admitting a primitive, by Proposition 1856 the signum function is integrable on any closed and bounded interval. In fact, it is a step function. N
The Riemann integral $\int_a^b f(x)\,dx$ is often called a definite integral to distinguish it from the indefinite integral introduced above. Note that the indefinite integral is a differential calculus notion. The Riemann integral, with its connection with the method of exhaustion, is a conceptually much deeper notion.
44.7.2 Formulary
The next table, obtained by "reversing" the corresponding table of the basic derivatives, records some fundamental indefinite integrals.

$f(x)$ | $\int f(x)\,dx$ | validity
$x^a$ | $\dfrac{x^{a+1}}{a+1} + k$ | $-1 \ne a \in \mathbb{R}$ and $x > 0$
$x^n$ | $\dfrac{x^{n+1}}{n+1} + k$ | $n \in \mathbb{N}$ and $x \in \mathbb{R}$
$\dfrac{1}{x}$ | $\log x + k$ | $x > 0$
$\dfrac{1}{x}$ | $\log(-x) + k$ | $x < 0$
$\cos x$ | $\sin x + k$ | $x \in \mathbb{R}$
$\sin x$ | $-\cos x + k$ | $x \in \mathbb{R}$
$e^x$ | $e^x + k$ | $x \in \mathbb{R}$
$\alpha^x$ | $\dfrac{\alpha^x}{\log\alpha} + k$ | $\alpha > 0$, $\alpha \ne 1$, and $x \in \mathbb{R}$
$\dfrac{1}{\sqrt{1-x^2}}$ | $\arcsin x + k$ | $x \in (-1,1)$
$\dfrac{1}{(\cos x)^2}$ | $\tan x + k$ | $x \ne \dfrac{\pi}{2} + n\pi$
$\dfrac{1}{1+x^2}$ | $\arctan x + k$ | $x \in \mathbb{R}$
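Each row of the table can be verified mechanically, since differentiating the candidate primitive must return the integrand. A minimal sketch of ours in Python (using sympy; not part of the text):

```python
import sympy as sp

x, a = sp.symbols('x a', positive=True)

# (primitive, integrand) pairs from the table; differentiating the
# first component must give back the second on the stated domain
pairs = [
    (x**(a + 1) / (a + 1), x**a),
    (sp.log(x), 1 / x),
    (sp.sin(x), sp.cos(x)),
    (-sp.cos(x), sp.sin(x)),
    (sp.exp(x), sp.exp(x)),
    (sp.asin(x), 1 / sp.sqrt(1 - x**2)),
    (sp.atan(x), 1 / (1 + x**2)),
    (sp.tan(x), 1 / sp.cos(x)**2),
]
for P, f in pairs:
    assert sp.simplify(sp.diff(P, x) - f) == 0
print("all table entries verified")
```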
Proof Let $\pi = \{x_i\}_{i=0}^n$ be a subdivision of $[a,b]$. If we add and subtract $P(x_i)$ for every $i = 1, 2, \ldots, n-1$, we have
$$P(b) - P(a) = P(x_n) - P(x_{n-1}) + P(x_{n-1}) - \cdots - P(x_1) + P(x_1) - P(x_0) = \sum_{i=1}^n \left(P(x_i) - P(x_{i-1})\right)$$
which implies
$$I(f,\pi) \le P(b) - P(a) \le S(f,\pi) \quad (44.57)$$
Since $\pi$ is any subdivision, (44.57) holds for every $\pi \in \Pi$ and therefore
$$\sup_{\pi\in\Pi} I(f,\pi) \le P(b) - P(a) \le \inf_{\pi\in\Pi} S(f,\pi)$$
Let us illustrate the theorem with some examples, which use the primitives computed in
Examples 1872 and 1873.
For integrable functions without primitives, such as the signum function, the last theorem cannot be applied and the calculation of integrals cannot be done through formula (44.56). In some simple cases it is, however, possible to calculate the integral using directly the definition. For example, the signum function is a step function and therefore we can apply Proposition 1856 in which, using the definition of the integral, we determined the value of the integral for this class of functions. In particular, we have
$$\int_a^b \operatorname{sgn} x\,dx = \begin{cases} b-a & \text{if } a \ge 0 \\ a+b & \text{if } a < 0 < b \\ a-b & \text{if } b \le 0 \end{cases}$$
The cases $a \ge 0$ and $b \le 0$ are obvious using (44.33). Let us consider the case $a < 0 < b$. Using (44.33) and (44.44), we have
$$\int_a^b \operatorname{sgn} x\,dx = \int_a^0 \operatorname{sgn} x\,dx + \int_0^b \operatorname{sgn} x\,dx = \int_a^0 (-1)\,dx + \int_0^b 1\,dx = (-1)(0-a) + (1)(b-0) = a+b$$
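The three cases lend themselves to a quick numerical cross-check. A small sketch of ours (the helper names are our own), comparing the closed form with midpoint Riemann sums:

```python
import numpy as np

def sgn_integral(a, b):
    # closed form derived above
    if a >= 0:
        return b - a
    if b <= 0:
        return a - b
    return a + b

def riemann(a, b, n=200_000):
    # midpoint Riemann sum of sgn on [a, b]
    t = np.linspace(a, b, n + 1)
    mid = (t[:-1] + t[1:]) / 2
    return np.sum(np.sign(mid)) * (b - a) / n

for a, b in [(1.0, 3.0), (-2.0, 5.0), (-4.0, -1.0)]:
    print(a, b, sgn_integral(a, b), round(riemann(a, b), 6))
```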
A final remark is due. Recall that Darboux's Theorem shows that derivative functions, though in general discontinuous, enjoy a remarkable property of continuity, namely, they satisfy a version of the Intermediate Value Theorem. One might then wonder if this grain of continuity is enough to make derivative functions integrable. If so, the hypothesis of integrability in the First Fundamental Theorem of Calculus would be redundant. Unfortunately, this is not the case: Vito Volterra published in 1881 a highly non-trivial example of a derivative function which is not integrable.14
In other words, the value $F(x)$ of the integral function is the (signed) area under $f$ on the interval $[a,x]$, as $x$ varies.15

N.B. The integral function $F(x) = \int_a^x f(t)\,dt$ is a function $F : [a,b] \to \mathbb{R}$ that has, as its variable, the upper limit of integration $x$, which, as it varies, determines a different Riemann integral $\int_a^x f(t)\,dt$. The value of this integral (which is a scalar) is the image $F(x)$ of the integral function. In this regard, note that $F$ is defined on $[a,b]$ since, $f$ being integrable on this interval, it is integrable on all the subintervals $[a,x] \subseteq [a,b]$. O
Proof Since $f$ is bounded, there exists $M > 0$ such that $|f(x)| \le M$ for all $x \in [a,b]$. We next show that $|F(x) - F(y)| \le M|x-y|$ for all $x,y \in [a,b]$. Consider $x,y \in [a,b]$. To avoid the trivial case $x = y$, assume that $x \ne y$ and, without loss of generality, assume also that $x > y$. By the definition of integral function, we have $F(x) - F(y) = \int_y^x f(t)\,dt$. Thanks to (44.50), we have
$$|F(x) - F(y)| = \left|\int_y^x f(t)\,dt\right| \le \int_y^x |f(t)|\,dt \le \int_y^x M\,dt = M|x-y|$$
Armed with the notion of integral function, we can address the problem that opened the
section: the next important result, the Second Fundamental Theorem of Calculus, shows
that the integral function is a primitive of a continuous f . Continuity is, therefore, a simple
condition that guarantees the existence of a primitive.
In view of (44.56), the fact that the integral function may be a primitive is then not that
surprising. The proof of this fundamental result gives a rigorous argument. It relies on an
interesting lemma.
Lemma 1883 Let $f : [a,b] \to \mathbb{R}$ be a continuous function. Then, for each $x \in [a,b)$,
$$\lim_{h\to 0^+} \frac{\int_x^{x+h} f(t)\,dt}{h} = f(x) \quad (44.60)$$
and, for each $x \in (a,b]$,
$$\lim_{h\to 0^+} \frac{\int_{x-h}^{x} f(t)\,dt}{h} = f(x) \quad (44.61)$$
Proof Let $x_0 \in [a,b)$. Since $f$ is continuous, for each $\varepsilon > 0$ there exists $\delta_\varepsilon > 0$ such that, for each $x \in [a,b]$,
$$|x - x_0| < \delta_\varepsilon \implies |f(x) - f(x_0)| < \varepsilon$$
Fix $\varepsilon > 0$ and take $h \in (0, \delta_\varepsilon)$ with $x_0 + h < b$. By the Integral Mean Value Theorem, there exists $c_h \in [x_0, x_0+h]$ such that
$$\int_{x_0}^{x_0+h} f(t)\,dt = f(c_h)\,h$$
Since $|c_h - x_0| \le h < \delta_\varepsilon$, we have $|f(c_h) - f(x_0)| < \varepsilon$, and therefore
$$\left|\frac{\int_{x_0}^{x_0+h} f(t)\,dt}{h} - f(x_0)\right| = |f(c_h) - f(x_0)| < \varepsilon$$
Since $\varepsilon$ was arbitrarily chosen, we conclude that (44.60) holds. A similar argument proves (44.61).
Note that (44.60) and (44.61) imply that, for each $x \in (a,b)$,
$$\lim_{h\to 0^+} \frac{\int_{x-h}^{x+h} f(t)\,dt}{2h} = f(x)$$
Proof of the Second Fundamental Theorem of Calculus Let $x_0 \in (a,b)$. By (44.59), it holds
$$F(x_0+h) - F(x_0) = \int_{x_0}^{x_0+h} f(t)\,dt \qquad\text{and}\qquad F(x_0) - F(x_0-h) = \int_{x_0-h}^{x_0} f(t)\,dt$$
By (44.60) and (44.61), the right and left difference quotients of $F$ at $x_0$ both tend to $f(x_0)$ as $h \to 0^+$. Therefore, $F$ is differentiable at $x_0$, with $F'(x_0) = f(x_0)$.
On the other hand, a differentiable function $f : [a,b] \to \mathbb{R}$ is, obviously, a primitive of its derivative function $f' : [a,b] \to \mathbb{R}$. By the First Fundamental Theorem of Calculus, if $f'$ is integrable (e.g., if $f$ is continuously differentiable, cf. Proposition 1858), then formula (44.56) takes the form
$$\int_a^x f'(t)\,dt = f(x) - f(a) \quad (44.63)$$
The next example shows that continuity is only a sufficient, but not necessary, condition for an integrable function to admit a primitive.

Indeed, for $x \ne 0$ this can be verified by differentiating $x^2\sin(1/x)$, while for $x = 0$ one observes that
$$P'(0) = \lim_{h\to 0}\frac{P(h) - P(0)}{h} = \lim_{h\to 0}\frac{h^2\sin\frac{1}{h}}{h} = \lim_{h\to 0} h\sin\frac{1}{h} = 0 = f(0)$$
So, there exist discontinuous functions that have primitives. Moreover, on the interval $[0,1]$ the function $f$ is integrable by Theorem 1859, so the First Fundamental Theorem of Calculus can be applied. N
The signum function, which has no primitive (Example 1876), is an example of a discontinuous function for which the last theorem altogether fails. The next example shows that, in the Second Fundamental Theorem of Calculus, even if $F$ is assumed to be differentiable, the continuity of $f$ cannot be weakened to integrability of $f$. In other words, if $f$ is integrable and $F$ is differentiable, it might be the case that $F' \ne f$.

The function $f$ is a modification of the Dirichlet function, known as Thomae's function (after its 1875 discoverer). It is continuous at each irrational point and discontinuous at each non-zero rational point of the unit interval. So, unlike the Dirichlet function, it is continuous somewhere and, by Theorem 1859, this property makes it integrable. In particular, its integral is zero, i.e., $\int_0^1 f(t)\,dt = 0$. It is a useful and non-trivial exercise to check all this. Thomae's function is thus an instance of an integrable function which is discontinuous at infinitely many points.

For the integral function $F : [0,1] \to \mathbb{R}$, given by $F(x) = \int_0^x f(t)\,dt$, we then have $F(x) = 0$ for every $x \in [0,1]$ since $0 \le F(x) \le \int_0^1 f(t)\,dt = 0$. Hence, $F$ is trivially differentiable, with $F'(x) = 0$ for every $x \in [0,1]$. But $F' \ne f$ because $F'(x) = f(x)$ if and only if $x$ is either irrational or $0$. We conclude that (44.58) does not hold, and so the Second Fundamental Theorem of Calculus fails because $F$ is not a primitive of $f$. Nevertheless, we have $F(x) = \int_0^x F'(t)\,dt$ for every $x \in [0,1]$. N

16 Coda readers will find a neat version of this duality in the Barrow-Torricelli Theorem (Section 48.7).
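Readers who want to experiment can approximate the Riemann sums of Thomae's function directly; the sketch below (ours, using exact rational arithmetic) shows the midpoint sums shrinking toward the integral $0$:

```python
from fractions import Fraction

def thomae(x: Fraction) -> Fraction:
    # Thomae's function at a rational point: f(p/q) = 1/q in lowest terms
    return Fraction(1, x.denominator) if x != 0 else Fraction(0)

# midpoint Riemann sums on [0, 1]; every tag is rational, the "worst" case
for n in [10, 100, 1000, 10000]:
    s = sum(thomae(Fraction(2 * i + 1, 2 * n)) for i in range(n)) / n
    print(n, float(s))
```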
O.R. The operation of integration makes a function more regular: the integral function $F$ of $f$ is always continuous and, if $f$ is continuous, it is differentiable. In contrast, the operation of differentiation makes a function more irregular. Specifically, integration scales the regularity up by one degree: $F$ is always continuous; if $f$ is continuous, $F$ is differentiable; and, continuing in this way, if $f$ is differentiable, $F$ is twice differentiable, and so on and so forth. Differentiation, instead, scales the regularity of a function down by one degree. H
(ii) we calculate the difference $P(b) - P(a)$: this difference is often denoted by $P(x)|_a^b$ or $[P(x)]_a^b$.

Next we present some properties of the indefinite integral that simplify its calculation. A first observation is that the linearity of derivatives, established in (26.12), implies the linearity of the indefinite integral.17

Proposition 1886 Let $f, g : I \to \mathbb{R}$ be two functions that admit a primitive. Then, for every $\alpha, \beta \in \mathbb{R}$ the function $\alpha f + \beta g : I \to \mathbb{R}$ admits a primitive and
$$\int (\alpha f + \beta g)(x)\,dx = \alpha\int f(x)\,dx + \beta\int g(x)\,dx + k \quad (44.65)$$
for some $k \in \mathbb{R}$.
Proof For ease of notation, denote $P_f = \int f(x)\,dx$ and $P_g = \int g(x)\,dx$. Since $f, g : I \to \mathbb{R}$ admit a primitive, both objects are well defined. By (26.12), we have
$$(\alpha P_f + \beta P_g)' = \alpha P_f' + \beta P_g' = \alpha f + \beta g$$
So, $\alpha P_f + \beta P_g = \alpha\int f(x)\,dx + \beta\int g(x)\,dx$ is a primitive of $\alpha f + \beta g$. By Proposition 1874, (44.65) follows.
A simple application of the result is the calculation of the indefinite integral of a polynomial. Namely, given a polynomial $f(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_n x^n$, it follows from (44.65) that
$$\int f(x)\,dx = \int\left(\sum_{i=0}^n \alpha_i x^i\right)dx = \sum_{i=0}^n \alpha_i\frac{x^{i+1}}{i+1} + k$$
Using partial fraction expansions, we can then also calculate the indefinite integral of rational fractions, as the next example illustrates.
The product rule for di erentiation leads to an important formula for the calculation of
the inde nite integral, called integration by parts.
Proof By the product rule (26.13), $(fg)' = f'g + fg'$. Hence, $fg = P_{f'g+fg'}$, and thanks to (44.65) we have
$$f(x)g(x) = \int (f'g + fg')(x)\,dx = \int f'(x)g(x)\,dx + \int f(x)g'(x)\,dx + \hat{k}$$
Formula (44.66) is useful because sometimes there is a strong asymmetry in the computability of the indefinite integrals $\int f'(x)g(x)\,dx$ and $\int f(x)g'(x)\,dx$: one of them may be much simpler to calculate than the other. By exploiting this asymmetry, thanks to (44.66) we may be able to calculate the more complicated integral as the difference between $f(x)g(x)$ and the simpler integral.
Example 1889 Let us calculate the indefinite integral $\int \log x\,dx$. Let $f, g : (0,\infty) \to \mathbb{R}$ be defined by $f(x) = \log x$ and $g(x) = x$, so that $\int \log x\,dx$ can be rewritten as $\int \log x\,g'(x)\,dx$. By formula (44.66), we have
$$\int x f'(x)\,dx + \int \log x\,dx = x\log x + k$$
that is,
$$\int x\,\frac{1}{x}\,dx + \int \log x\,dx = x\log x + k$$
So,
$$\int \log x\,dx = x(\log x - 1) + k$$
N
Example 1890 Let us calculate the indefinite integral $\int x\sin x\,dx$. Let $f, g : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x$ and $g(x) = -\cos x$, so that $\int x\sin x\,dx$ can be rewritten as $\int f(x)g'(x)\,dx$. By formula (44.66),
$$\int f'(x)g(x)\,dx + \int x\sin x\,dx = -x\cos x + k$$
that is,
$$\int x\sin x\,dx = \int \cos x\,dx - x\cos x + k = \sin x - x\cos x + k$$
N
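Both worked examples can be double-checked by differentiation, e.g. with sympy; a minimal sketch of ours:

```python
import sympy as sp

x = sp.symbols('x', positive=True)

# Example 1889: the primitive of log x is x (log x - 1)
assert sp.simplify(sp.diff(x * (sp.log(x) - 1), x) - sp.log(x)) == 0

# Example 1890: the primitive of x sin x is sin x - x cos x
assert sp.simplify(sp.diff(sp.sin(x) - x * sp.cos(x), x) - x * sp.sin(x)) == 0
print("both primitives check out")
```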
Note that in the last example, if instead we set $f(x) = \sin x$ and $g(x) = x^2/2$, formula (44.66) becomes useless. Also with such a choice of $f$ and $g$, it is still possible to rewrite $\int x\sin x\,dx$ as $\int f(x)g'(x)\,dx$. Yet, here (44.66) implies
$$\int f'(x)g(x)\,dx + \int x\sin x\,dx = \frac{x^2}{2}\sin x + k$$
that is,
$$\int x\sin x\,dx = \frac{x^2}{2}\sin x - \frac{1}{2}\int x^2\cos x\,dx + k$$
which actually complicates things because the integral $\int x^2\cos x\,dx$ is more difficult to compute than the original integral $\int x\sin x\,dx$. This shows that integration by parts cannot proceed in a mechanical way: it requires a bit of imagination and experience.
O.R. Example 1890 shows that to calculate an integral $\int x^n h(x)\,dx$, where $h$ is a function whose primitive has a similar "complexity" (e.g., $h$ is $\sin x$, $\cos x$ or $e^x$), a good choice is to set $f(x) = x^n$ and $g'(x) = h(x)$. Indeed, after having differentiated $f(x)$ $n$ times, the polynomial factor disappears and one is left with $g(x)$ or $g'(x)$, which is immediately integrable. Such a choice has been used in Example 1890. H
The two factors of the product $f(x)g'(x)\,dx$ are called, respectively, the finite factor, $f(x)$, and the differential factor, $g'(x)\,dx$. So, the formula says that "the integral of the product between the finite factor and a differential factor is equal to the product between the finite factor and the integral of the differential factor, minus the integral of the product between the derivative of the finite factor and the integral just found". We stress that it is important to carefully choose which of the two factors to take as finite factor and which as differential factor.
Theorem 1891 Let $\varphi : [c,d] \to [a,b]$ be a differentiable and strictly increasing function such that $\varphi' : [c,d] \to \mathbb{R}$ is integrable. If $f : [a,b] \to \mathbb{R}$ is continuous, then the function $(f \circ \varphi)\varphi' : [c,d] \to \mathbb{R}$ is integrable and
$$\int_c^d f(\varphi(t))\,\varphi'(t)\,dt = \int_{\varphi(c)}^{\varphi(d)} f(x)\,dx \quad (44.68)$$
If $\varphi$ is surjective, we have $a = \varphi(c)$ and $b = \varphi(d)$. Formula (44.68) can therefore be rewritten as
$$\int_c^d f(\varphi(t))\,\varphi'(t)\,dt = \int_a^b f(x)\,dx \quad (44.69)$$
Heuristically, (44.68) can be seen as the result of the change of variable $x = \varphi(t)$ and of the corresponding change
$$dx = \varphi'(t)\,dt = d\varphi(t) \quad (44.70)$$
in $dx$. At a mnemonic and calculation level, this observation can be useful, even if the writing (44.70) is per se meaningless.
Moreover, by the Second Fundamental Theorem of Calculus and since $f$ is continuous, the chain rule implies
$$(F \circ \varphi)'(t) = F'(\varphi(t))\,\varphi'(t) = (f \circ \varphi)(t)\,\varphi'(t)$$
that is, $F \circ \varphi$ is a primitive of $(f \circ \varphi)\varphi' : [c,d] \to \mathbb{R}$. By Proposition 1854, the composite function $f \circ \varphi : [c,d] \to \mathbb{R}$ is integrable. By Corollary 1863 and since $\varphi' : [c,d] \to \mathbb{R}$ is integrable, so is the product function $(f \circ \varphi)\varphi' : [c,d] \to \mathbb{R}$. By the First Fundamental Theorem of Calculus, we have
$$\int_c^d (f \circ \varphi)(t)\,\varphi'(t)\,dt = (F \circ \varphi)(d) - (F \circ \varphi)(c) \quad (44.72)$$
as desired.
Theorem 1891, besides having a theoretical interest, can be useful in the calculation of integrals. Formula (44.68), and its rewriting (44.69), can be used both from "right to left" and from "left to right". In the first case, from right to left in (44.69), the objective is to calculate the integral $\int_a^b f(x)\,dx$ by finding a suitable change of variable $x = \varphi(t)$ that leads to an integral $\int_{\varphi^{-1}(a)}^{\varphi^{-1}(b)} f(\varphi(t))\,\varphi'(t)\,dt$ that is easier to calculate. The difficulty is in finding a suitable change of variable $x = \varphi(t)$: indeed, nothing guarantees that there exists a "simplifying" change and, even if it existed, it might not be obvious how to find it.

On the other hand, the application in the direction left to right of formula (44.68) is useful to calculate an integral that can be written as $\int_c^d f(\varphi(t))\,\varphi'(t)\,dt$ for some function $f$ for which we know the primitive $F$. In such a case, the corresponding integral $\int_{\varphi(c)}^{\varphi(d)} f(x)\,dx$, obtained by setting $x = \varphi(t)$, is easier to calculate since
$$\int f(\varphi(x))\,\varphi'(x)\,dx = F(\varphi(x)) \quad (44.73)$$
In such a case the difficulty is in recognizing the composite form $\int_c^d f(\varphi(t))\,\varphi'(t)\,dt$ in the integral that we want to calculate. Also here, nothing guarantees that the integral can be rewritten in this form, nor that, when possible, it is easy to recognize. Only experience (and exercise) can be of help. The next example presents some classic integrals that can be calculated with this technique.
For example,
$$\int \sin^4 x\,\cos x\,dx = \frac{1}{5}\sin^5 x + k$$
For example,
$$\int \sin(3x^3 - 2x^2)\,(9x^2 - 4x)\,dx = -\cos(3x^3 - 2x^2) + k$$
(iv) We have
$$\int e^{\varphi(x)}\,\varphi'(x)\,dx = e^{\varphi(x)} + k$$
For example,
$$\int xe^{x^2}\,dx = \frac{1}{2}\int 2xe^{x^2}\,dx = \frac{1}{2}e^{x^2} + k$$
We now present three examples that illustrate the two possible applications of formula (44.68). The first example considers the case right to left; the second and third examples consider the case left to right. For simplicity we use the variables $x$ and $t$ as they appear in (44.68), even if this is obviously a mere convenience, without substantial value.
Example 1893 Consider the integral
$$\int_a^b \sin\sqrt{x}\,dx$$
with $[a,b] \subseteq [0,\infty)$. Set $t = \sqrt{x}$, so that $x = t^2$. Here we have $\varphi(t) = t^2$ and, thanks to (44.69),
$$\int_a^b \sin\sqrt{x}\,dx = \int_{\sqrt{a}}^{\sqrt{b}} 2t\sin t\,dt = 2\int_{\sqrt{a}}^{\sqrt{b}} t\sin t\,dt$$
In Example 1890, by integrating by parts, we computed the indefinite integral $\int t\sin t\,dt$. In light of that example, we have
$$\int_{\sqrt{a}}^{\sqrt{b}} t\sin t\,dt = \left(\sin t - t\cos t\right)\Big|_{\sqrt{a}}^{\sqrt{b}} = \sin\sqrt{b} - \sin\sqrt{a} + \sqrt{a}\cos\sqrt{a} - \sqrt{b}\cos\sqrt{b}$$
and so
$$\int_a^b \sin\sqrt{x}\,dx = 2\left(\sin\sqrt{b} - \sin\sqrt{a} + \sqrt{a}\cos\sqrt{a} - \sqrt{b}\cos\sqrt{b}\right)$$
Note how the starting point has been to set $t = \sqrt{x}$, that is, to specify the inverse function $t = \varphi^{-1}(x) = \sqrt{x}$. This is often the case because it is simpler to think of which transformation of $x$ may simplify the integration. N
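A numerical cross-check of the closed form against direct quadrature (a sketch of ours; the endpoints are chosen arbitrarily and scipy is assumed available):

```python
import numpy as np
from scipy.integrate import quad

def closed_form(a, b):
    # 2 (sin √b - sin √a + √a cos √a - √b cos √b), as derived above
    sa, sb = np.sqrt(a), np.sqrt(b)
    return 2 * (np.sin(sb) - np.sin(sa) + sa * np.cos(sa) - sb * np.cos(sb))

a, b = 1.0, 9.0  # arbitrary endpoints in [0, ∞)
val, _ = quad(lambda x: np.sin(np.sqrt(x)), a, b)
print(val, closed_form(a, b))  # the two values should agree
```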
In the integral we recognize a form of type (i) of Example 1892, that is, an integral of the type
$$\int \varphi(t)^a\,\varphi'(t)\,dt$$
with $\varphi(t) = 1 + \sin t$ and $a = -3$. Since $\int \varphi(t)^a\varphi'(t)\,dt = \frac{\varphi(t)^{a+1}}{a+1} + k$, we have
$$\int_0^{\pi/2} \frac{\cos t}{(1+\sin t)^3}\,dt = \left[-\frac{1}{2(1+\sin t)^2}\right]_0^{\pi/2} = -\frac{1}{8} + \frac{1}{2} = \frac{3}{8}$$
with $[c,d] \subseteq (0,\infty)$. Here we recognize again a form of type (i) of Example 1892, an integral of the type
$$\int \varphi(t)^a\,\varphi'(t)\,dt$$
with $\varphi(t) = \log t$ and $a = 1$. Since again $\int \varphi(t)^a\varphi'(t)\,dt = \frac{\varphi(t)^{a+1}}{a+1} + k$, we have
$$\int_c^d \frac{\log t}{t}\,dt = \left[\frac{\log^2 t}{2}\right]_c^d = \frac{1}{2}\left(\log^2 d - \log^2 c\right)$$
N
Chapter 45

Improper Riemann integrals (sdoganato)
So far we have considered the integrals of bounded functions on bounded intervals $[a,b]$. In this chapter we extend Riemann integration beyond this case, as applications often require. We consider two cases: unbounded intervals of integration, with integrands that are bounded or not, and bounded intervals of integration with integrands that, however, are unbounded near some point of the interval.

Most of the chapter will focus on the first case, except for the last section. In any event, in both cases we talk of improper integrals to distinguish them from the "canonical" integrals of the previous chapter.
[Figure: bell-shaped graph of the Gaussian function $e^{-x^2}$]
seen in Section 36.4 and whose area is given by the Gauss integral
$$\int_{-\infty}^{+\infty} e^{-x^2}\,dx \quad (45.1)$$
In this case the domain of integration is the whole real line $(-\infty,+\infty)$.
Let us begin with domains of integration of the form $[a,+\infty)$. Given a function $f : [a,+\infty) \to \mathbb{R}$, consider the integral function $F : [a,+\infty) \to \mathbb{R}$ given by
$$F(x) = \int_a^x f(t)\,dt$$
The definition of the improper integral $\int_a^{+\infty} f(x)\,dx$ is based on the limit $\lim_{x\to+\infty} F(x)$, that is, on the asymptotic behavior of the integral function. For such behavior, we can have three cases:
Definition 1896 Let $f : [a,+\infty) \to \mathbb{R}$ be a function integrable on every interval $[a,b] \subseteq [a,+\infty)$ with integral function $F$. If $\lim_{x\to+\infty} F(x) \in \overline{\mathbb{R}}$, we set
$$\int_a^{+\infty} f(x)\,dx = \lim_{x\to+\infty} F(x)$$
and the function $f$ is said to be integrable in the improper sense on $[a,+\infty)$. The value $\int_a^{+\infty} f(x)\,dx$ is called the improper (or generalized) Riemann integral.

For brevity, in the sequel we will say that a function $f$ is integrable on $[a,+\infty)$, omitting "in the improper sense". We have the following terminology:

(i) the integral $\int_a^{+\infty} f(x)\,dx$ converges if $\lim_{x\to+\infty} F(x) \in \mathbb{R}$;

(ii) the integral $\int_a^{+\infty} f(x)\,dx$ diverges positively (resp., negatively) if $\lim_{x\to+\infty} F(x) = +\infty$ (resp., $-\infty$);

(iii) finally, if $\lim_{x\to+\infty} F(x)$ does not exist, we say that the integral $\int_a^{+\infty} f(x)\,dx$ does not exist (or that it is oscillating).
Example 1897 Fix $\alpha > 0$ and let $f : [1,\infty) \to \mathbb{R}$ be given by $f(x) = x^{-\alpha}$. The integral function $F : [1,\infty) \to \mathbb{R}$ is
$$F(x) = \int_1^x \frac{1}{t^\alpha}\,dt = \begin{cases} \dfrac{1}{1-\alpha}\left(x^{1-\alpha} - 1\right) & \text{if } \alpha \ne 1 \\ \log x & \text{if } \alpha = 1 \end{cases}$$
So,
$$\lim_{x\to+\infty} F(x) = \begin{cases} +\infty & \text{if } \alpha \le 1 \\ \dfrac{1}{\alpha-1} & \text{if } \alpha > 1 \end{cases}$$
It follows that the improper integral
$$\int_1^{+\infty} \frac{1}{x^\alpha}\,dx$$
exists for every $\alpha > 0$: it converges if $\alpha > 1$ and diverges positively if $\alpha \le 1$. N
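The dichotomy at $\alpha = 1$ is easy to see numerically; a small sketch of ours using scipy's quad on growing truncations:

```python
from scipy.integrate import quad

# truncations of the integral of x^(-alpha) on [1, b]: as b grows they
# stabilize near 1/(alpha - 1) when alpha > 1 and blow up when alpha <= 1
for alpha in [0.5, 1.0, 1.5, 2.0]:
    vals = [quad(lambda x: x**(-alpha), 1, b)[0] for b in (10, 1e3, 1e6)]
    print(alpha, [round(v, 3) for v in vals])
```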
Example 1898 Define $f : [0,\infty) \to \mathbb{R}$ by $f(x) = (-1)^{[x]}$, where $[x]$ is the integer part of $x \ge 0$ (Section 1.4.3). So, $f(x) = (-1)^n$ if $n \le x < n+1$, that is,
$$f(x) = \begin{cases} 1 & \text{if } 0 \le x < 1 \\ -1 & \text{if } 1 \le x < 2 \\ 1 & \text{if } 2 \le x < 3 \\ -1 & \text{if } 3 \le x < 4 \\ \;\vdots \end{cases}$$
This implies
$$F(n) = \int_0^n f(x)\,dx = \begin{cases} 0 & \text{if } n \text{ is even} \\ 1 & \text{if } n \text{ is odd} \end{cases}$$
We conclude that $\lim_{x\to+\infty} F(x)$ does not exist, so the integral $\int_0^{+\infty} f(x)\,dx$ is oscillating. N
Example 1899 A continuous time version of the discrete time intertemporal problem of Section 9.1.2 features an infinitely lived consumer who chooses over consumption streams $f : [0,\infty) \to [0,\infty)$ of a single good. Such streams are evaluated by a continuous time intertemporal utility function $U : A \subseteq \mathbb{R}^{[0,\infty)} \to \mathbb{R}$, often defined by the improper integral
$$U(f) = \int_0^\infty u(f(t))\,e^{-\delta t}\,dt$$
Example 1900 Let $h : [0,\infty) \to \mathbb{N}$ be the function that associates to each positive scalar its nearest integer, with the convention $h(n + 1/2) = n$. For instance, $h(2.7) = h(3.2) = h(3.5) = 3$ and $h(2.5) = 2$. Define $f : [0,\infty) \to \mathbb{R}$ by
$$f(x) = \begin{cases} h(x) - h^2(x)\,2^{h(x)}\,|x - h(x)| & \text{if } h(x) \ne 0 \text{ and } |x - h(x)| \le \dfrac{1}{h(x)2^{h(x)}} \\ 0 & \text{otherwise} \end{cases}$$
so that
$$f(n+k) = n - n^2 2^n |k| \qquad \forall k \in \left[-\frac{1}{n2^n}, \frac{1}{n2^n}\right]$$
In words, there is an isosceles triangle centered at each $n$, with base $1/(n2^{n-1})$ and height $n$. The triangles thus have area $1/2^n$, so they shrink as $n$ grows; the graph of $f$ is a sequence of such triangular spikes.

The function $f$ is continuous (so, integrable on each interval $[0,a] \subseteq [0,\infty)$) as well as unbounded, because $f(n) = n$ for all $n \ge 1$. Moreover, since $f$ is positive its integral function $F : [0,\infty) \to \mathbb{R}$ is positive and increasing, with
$$F\left(n + \frac{1}{n2^n}\right) = \int_0^{n + \frac{1}{n2^n}} f(t)\,dt = \sum_{k=1}^n \frac{1}{2^k} = 1 - \frac{1}{2^n} \qquad \forall n \ge 1$$
Improper integrals and series have a clear analogy. Intuitively, integral functions are to improper integrals as partial sums are to series. The next result clarifies this important analogy.

Proposition 1902 Let $f : [1,\infty) \to \mathbb{R}$ and set $a_n = \int_n^{n+1} f(x)\,dx$ for every $n \ge 1$. If the integral $\int_1^{+\infty} f(x)\,dx$ converges, then the series $\sum_{n=1}^{\infty} a_n$ converges, with
$$\sum_{n=1}^{\infty} a_n = \int_1^{+\infty} f(x)\,dx$$
It is easy to see that this definition does not depend on the choice of the point $a \in \mathbb{R}$. Often, for convenience, we take $a = 0$.

Also the improper integral $\int_{-\infty}^{+\infty} f(x)\,dx$ is called convergent or divergent according to whether its value is finite or is equal to $\pm\infty$.

Next we illustrate this notion with a couple of examples. Note that it is necessary to compute separately the two integrals $\int_a^{+\infty} f(x)\,dx$ and $\int_{-\infty}^{a} f(x)\,dx$, whose values must then be summed (unless the indeterminate form $\infty - \infty$ arises).

The value of the integral in the previous example is consistent with the geometric interpretation of the integral as the area (with sign) of the region under $f$. Indeed, such a figure is a big rectangle with infinite base and height $k$. Its area is $+\infty$ if $k > 0$, zero if $k = 0$, and $-\infty$ if $k < 0$.
Example 1905 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = xe^{-x^2}$. We have
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_0^{+\infty} f(x)\,dx + \int_{-\infty}^0 f(x)\,dx = \lim_{x\to+\infty}\int_0^x te^{-t^2}\,dt + \lim_{x\to-\infty}\int_x^0 te^{-t^2}\,dt$$
$$= \lim_{x\to+\infty}\frac{1}{2}\left(1 - e^{-x^2}\right) + \lim_{x\to-\infty}\frac{1}{2}\left(e^{-x^2} - 1\right) = \frac{1}{2} - \frac{1}{2} = 0$$
Therefore, the improper integral
$$\int_{-\infty}^{+\infty} xe^{-x^2}\,dx$$
exists and is equal to $0$. N
$$= \lim_{x\to+\infty}\frac{x^2}{2} + \lim_{x\to-\infty}\left(-\frac{x^2}{2}\right) = \infty - \infty$$
So, the improper integral
$$\int_{-\infty}^{+\infty} x\,dx$$
does not exist because we have the indeterminate form $\infty - \infty$. N
[Figure: graph of $f(x) = x$, with the regions under the graph for $x > 0$ and $x < 0$ marked $(+)$ and $(-)$]
The areas of the two regions under $f$ for $x < 0$ and $x > 0$ are two "big triangles" that are, intuitively, equal because they are perfectly symmetric with respect to the vertical axis, but of opposite sign, as indicated by the signs $(+)$ and $(-)$ in the figure. It is then natural to think that they compensate each other, resulting in an integral equal to $0$. Nevertheless, the last definition requires the separate calculation of the two integrals as $x \to +\infty$ and as $x \to -\infty$, which in this case generates the indeterminate form $\infty - \infty$.
instead of the two separate limits in (45.2). This motivates the following definition.

Definition 1907 Let $f : \mathbb{R} \to \mathbb{R}$ be a function integrable on each interval $[a,b]$. The Cauchy principal value, denoted by $\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx$, of the integral $\int_{-\infty}^{+\infty} f(x)\,dx$ is given by
$$\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{k\to+\infty}\int_{-k}^{k} f(x)\,dx$$
In place of the two limits upon which the definition of the improper integral is based, the principal value considers only the limit of $\int_{-k}^{k} f(x)\,dx$. We will see in examples below that, with this definition, the geometric intuition of the integral as the area (with sign) of the region under $f$ is preserved. It is, however, a weaker notion than the improper integral. Indeed:
(i) when the improper integral exists, the principal value also exists and one has
$$\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \int_{-\infty}^{+\infty} f(x)\,dx$$
(ii) the principal value may exist also when the improper integral does not exist: in Example 1906 the improper integral $\int_{-\infty}^{+\infty} x\,dx$ does not exist, yet
$$\mathrm{PV}\int_{-\infty}^{+\infty} x\,dx = \lim_{k\to+\infty}\int_{-k}^{k} x\,dx = 0$$
and therefore $\mathrm{PV}\int_{-\infty}^{+\infty} x\,dx$ exists and is finite.

In sum, the principal value may exist even when the improper integral does not. To better illustrate this key relation between the two notions of integral on $(-\infty,\infty)$, let us consider a more general version of Example 1906.
$$= \lim_{x\to+\infty}\left(\frac{x^2}{2} + \alpha x\right) + \lim_{x\to-\infty}\left(-\frac{x^2}{2} - \alpha x\right) = \infty - \infty$$
So the improper integral
$$\int_{-\infty}^{+\infty} (x + \alpha)\,dx$$
does not exist because we have the indeterminate form $\infty - \infty$. By taking the principal value, we have
$$\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{k\to+\infty}\int_{-k}^{k} (x+\alpha)\,dx = \lim_{k\to+\infty}\left(\int_{-k}^{k} x\,dx + 2\alpha k\right) = 2\alpha\lim_{k\to+\infty} k = \begin{cases} +\infty & \text{if } \alpha > 0 \\ 0 & \text{if } \alpha = 0 \\ -\infty & \text{if } \alpha < 0 \end{cases}$$
The principal value thus exists: $\mathrm{PV}\int_{-\infty}^{+\infty} (x+\alpha)\,dx = \pm\infty$ (with the sign of $\alpha$), unless $\alpha$ is zero. N
In the last example the principal value agrees with the geometric intuition of the integral as area with sign. Indeed, when $\alpha = 0$ the intuition is obvious (see the figure and the comment after Example 1906). In the case $\alpha > 0$, look at the figure
[Figure: graph of $f(x) = x + \alpha$ with $\alpha > 0$, with the regions marked $(+)$ and $(-)$]
The negative area of the "big triangle" indicated by $(-)$ on the negative part of the horizontal axis is equal and opposite to the positive area of the big triangle indicated by $(+)$ on the positive part of the horizontal axis. If we imagine that such areas cancel each other, what "is left" is the area of the dotted figure, which is clearly infinite and with $+$ sign (lying above the horizontal axis). For $\alpha < 0$ similar considerations hold:
[Figure: graph of $f(x) = x + \alpha$ with $\alpha < 0$, with the regions marked $(+)$ and $(-)$]
The negative area of the "big triangle" indicated by $(-)$ on the negative part of the horizontal axis is equal and opposite to the positive area of the big triangle indicated by $(+)$ on the positive part of the horizontal axis. If we imagine that such areas cancel each other out, what "is left" is here again the area of the dotted figure, which is clearly infinite and with negative sign (lying below the horizontal axis).
Therefore, the improper integral does not exist because we have the indeterminate form $\infty - \infty$. By calculating the principal value, we have instead
$$\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{k\to+\infty}\int_{-k}^{k} \frac{x}{1+x^2}\,dx = \lim_{k\to+\infty}\left(\frac{1}{2}\log(1+k^2) - \frac{1}{2}\log(1+k^2)\right) = 0$$
and so
$$\mathrm{PV}\int_{-\infty}^{+\infty} \frac{x}{1+x^2}\,dx = 0$$
N
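The contrast between the vanishing symmetric truncations and the diverging one-sided integrals can be seen numerically; a sketch of ours (scipy assumed available):

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: x / (1 + x**2)

# symmetric truncations: the principal value exists and equals 0 ...
for k in [10, 100, 1000]:
    print(k, quad(f, -k, k, limit=200)[0])

# ... while each one-sided integral diverges like (1/2) log(1 + k^2)
for k in [10, 100, 1000]:
    print(k, quad(f, 0, k, limit=200)[0], 0.5 * np.log(1 + k**2))
```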
45.4.1 Properties
Being defined as limits, the properties of improper integrals follow from the properties of limits of functions (Section 12.4). In particular, the improper integral retains the linearity and monotonicity properties of the Riemann integral.

Let us begin with linearity, which follows from the algebra of limits established in Proposition 536.
Proposition 1910 Let $f, g : [a,+\infty) \to \mathbb{R}$ be two functions integrable on $[a,+\infty)$. Then, for every $\alpha, \beta \in \mathbb{R}$, the function $\alpha f + \beta g : [a,+\infty) \to \mathbb{R}$ is integrable on $[a,+\infty)$ and
$$\int_a^{+\infty} (\alpha f + \beta g)(x)\,dx = \alpha\int_a^{+\infty} f(x)\,dx + \beta\int_a^{+\infty} g(x)\,dx \quad (45.3)$$
Proof By the linearity of the Riemann integral, and by points (i) and (ii) of Proposition 536, we have
$$\lim_{x\to+\infty}\int_a^x (\alpha f + \beta g)(t)\,dt = \lim_{x\to+\infty}\left(\alpha F(x) + \beta G(x)\right) = \alpha\lim_{x\to+\infty}F(x) + \beta\lim_{x\to+\infty}G(x) = \alpha\int_a^{+\infty} f(x)\,dx + \beta\int_a^{+\infty} g(x)\,dx$$
The property of monotonicity of limits of functions (see Proposition 535 and its scalar variant) yields the property of monotonicity of the improper integral.

Proposition 1911 Let $f, g : [a,+\infty) \to \mathbb{R}$ be two functions integrable on $[a,+\infty)$. If $f \le g$, then $\int_a^{+\infty} f(x)\,dx \le \int_a^{+\infty} g(x)\,dx$.

Proof Thanks to the monotonicity of the Riemann integral, $F(x) \le G(x)$ for every $x \in [a,+\infty)$. By the monotonicity of limits of functions, we therefore have $\lim_{x\to+\infty} F(x) \le \lim_{x\to+\infty} G(x)$.
As we have seen in Example 1904, $\int_a^{+\infty} 0\,dx = 0$. So, a simple consequence of Proposition 1911 is that $\int_a^{+\infty} f(x)\,dx \ge 0$ whenever $f$ is positive and integrable on $[a,+\infty)$.

Proposition 1912 Let $f : [a,+\infty) \to \mathbb{R}$ be a function positive and integrable on every interval $[a,b] \subseteq [a,+\infty)$. Then, $f$ is integrable on $[a,+\infty)$ and
$$\int_a^{+\infty} f(t)\,dt = \sup_{x\in[a,+\infty)} F(x) \quad (45.4)$$
In particular, $\int_a^{+\infty} f(t)\,dt$ converges only if $\lim_{x\to+\infty} f(x) = 0$ (provided this limit exists).
Positive functions $f : [a,+\infty) \to \mathbb{R}$ are therefore integrable in an improper sense, that is, $\int_a^{+\infty} f(t)\,dt \in [0,\infty]$. In particular, their integral $\int_a^{+\infty} f(t)\,dt$ either converges or diverges positively: tertium non datur. We have convergence if and only if $\sup_{x\in[a,+\infty)} F(x) < +\infty$, and only if $f$ is infinitesimal as $x \to +\infty$ (provided $\lim_{x\to+\infty} f(x)$ exists). Otherwise, $\int_a^{+\infty} f(t)\,dt$ diverges positively.

The condition $\lim_{x\to+\infty} f(x) = 0$ is only necessary for convergence, as Example 1897 with $0 < \alpha \le 1$ shows. For instance, if $\alpha = 1$ we have $\lim_{x\to+\infty} 1/x = 0$, but for every $a > 0$ we have
$$\int_a^{+\infty} \frac{1}{t}\,dt = \lim_{x\to+\infty}\int_a^x \frac{1}{t}\,dt = \lim_{x\to+\infty}\log\frac{x}{a} = +\infty$$
and therefore $\int_a^{+\infty} (1/t)\,dt$ diverges positively.

In stating the necessary condition $\lim_{x\to+\infty} f(x) = 0$ we added the clause "provided this limit exists". The next simple example shows that the clause is important because the limit may not exist even if the integral $\int_a^{+\infty} f(t)\,dt$ converges.
The proof of Proposition 1912 rests on the following simple property of limits of monotone functions, which is the version for functions of Theorem 323 for monotone sequences.

Lemma 1914 Let $\varphi : [a,+\infty) \to \mathbb{R}$ be an increasing function. Then, $\lim_{x\to+\infty} \varphi(x) = \sup_{x\in[a,+\infty)} \varphi(x)$.

Proof Let us first consider the case $\sup_{x\in[a,+\infty)} \varphi(x) \in \mathbb{R}$. Let $\varepsilon > 0$. Since $\sup_{x\in[a,+\infty)} \varphi(x) = \sup\varphi([a,+\infty))$, thanks to Proposition 127 there exists $x_\varepsilon \in [a,+\infty)$ such that $\varphi(x_\varepsilon) > \sup_{x\in[a,+\infty)} \varphi(x) - \varepsilon$. Since $\varphi$ is increasing, we have
$$\sup_{x\in[a,+\infty)}\varphi(x) - \varepsilon < \varphi(x_\varepsilon) \le \varphi(x) \le \sup_{x\in[a,+\infty)}\varphi(x) \qquad \forall x \ge x_\varepsilon$$
Proof of Proposition 1912 Since $f$ is positive, its integral function $F : [a,+\infty) \to \mathbb{R}$ is increasing and therefore, by Lemma 1914, $\lim_{x\to+\infty} F(x) = \sup_{x\in[a,+\infty)} F(x)$, which proves (45.4).

Suppose that $\lim_{x\to+\infty} f(x)$ exists. Let us show that the integral converges only if $\lim_{x\to+\infty} f(x) = 0$. Suppose, by contradiction, that $\lim_{x\to+\infty} f(x) = L \in (0,+\infty]$. Given $0 < \varepsilon < L$, there exists $x_\varepsilon > a$ such that $f(x) \ge L - \varepsilon > 0$ for every $x \ge x_\varepsilon$. Therefore
$$\int_a^{+\infty} f(t)\,dt = \int_a^{x_\varepsilon} f(t)\,dt + \int_{x_\varepsilon}^{+\infty} f(t)\,dt \ge \int_{x_\varepsilon}^{+\infty} f(t)\,dt = \lim_{x\to+\infty}\int_{x_\varepsilon}^{x} f(t)\,dt \ge \lim_{x\to+\infty}\int_{x_\varepsilon}^{x} (L-\varepsilon)\,dt = (L-\varepsilon)\lim_{x\to+\infty}(x - x_\varepsilon) = +\infty$$
i.e., $\int_a^{+\infty} f(t)\,dt$ diverges positively.
The next result is a simple comparison criterion to determine whether the improper integral of a positive function is convergent or divergent.

Corollary 1915 Let $f, g : [a,+\infty) \to \mathbb{R}$ be two positive functions integrable on every $[a,b] \subseteq [a,+\infty)$, with $f \le g$. Then
$$\int_a^{+\infty} g(x)\,dx \in [0,\infty) \implies \int_a^{+\infty} f(x)\,dx \in [0,\infty) \quad (45.5)$$
and
$$\int_a^{+\infty} f(x)\,dx = +\infty \implies \int_a^{+\infty} g(x)\,dx = +\infty \quad (45.6)$$
The study of the integral (45.1) of the Gaussian function $f(x) = e^{-x^2}$, to which we will devote the next section, is a remarkable application of this corollary.

Proof By Proposition 1911, $\int_a^{+\infty} f(x)\,dx \le \int_a^{+\infty} g(x)\,dx$, while thanks to Proposition 1912 we have $\int_a^{+\infty} f(x)\,dx \in [0,\infty]$ and $\int_a^{+\infty} g(x)\,dx \in [0,\infty]$. Therefore, $\int_a^{+\infty} f(x)\,dx$ converges if $\int_a^{+\infty} g(x)\,dx$ converges, while $\int_a^{+\infty} g(x)\,dx$ diverges positively if $\int_a^{+\infty} f(x)\,dx$ diverges positively.
Example 1917 Let $a > 0$ and $f : [a,\infty) \to \mathbb{R}$ be the positive function given by
$$f(x) = \frac{\sin^3\frac{1}{x} + \frac{1}{x^2}}{\frac{1}{x} + \frac{1}{x^3}}$$
As $x \to +\infty$, we have
$$f \sim \frac{1}{x}$$
By Proposition 1916, $\int_a^{+\infty} f(x)\,dx = +\infty$, i.e., the integral diverges positively. N
N.B. As the reader can check, what has been proved for positive functions extends easily to functions $f : [a,+\infty) \to \mathbb{R}$ that are eventually positive, that is, such that there exists $c > a$ for which $f(x) \ge 0$ for every $x \ge c$. O
Definition 1919 Let $f : [a,+\infty) \to \mathbb{R}$ be a function integrable on every interval $[a,b] \subseteq [a,+\infty)$. The improper integral $\int_a^{+\infty} f(x)\,dx$ is said to be absolutely convergent if the improper integral $\int_a^{+\infty} |f(x)|\,dx$ is convergent.

As for series, absolute convergence plays a key role also for improper integration, as the next result clarifies.

Theorem 1920 Let $f : [a,+\infty) \to \mathbb{R}$ be a function integrable on every interval $[a,b] \subseteq [a,+\infty)$. The improper integral $\int_a^{+\infty} f(x)\,dx$ converges if it converges absolutely. In this case,
$$\left|\int_a^{+\infty} f(x)\,dx\right| \le \int_a^{+\infty} |f(x)|\,dx$$
This result is the analog of Theorem 402 and is an easy consequence (by taking $g = |f|$) of the following lemma, which is in turn the analog of Lemma 406.

Lemma 1921 Let $f : [a,+\infty) \to \mathbb{R}$ be a function integrable on every interval $[a,b] \subseteq [a,+\infty)$. Suppose that $g : [a,+\infty) \to \mathbb{R}$ is positive and integrable on $[a,+\infty)$ and such that $\int_a^{+\infty} g(x)\,dx$ converges, with

(i) $f + g \ge 0$,
Theorem 1920 permits to study the integrability of general integrands through the integrability criteria previously established for positive integrands. For instance, the comparison criterion for positive integrands (Corollary 1915) implies the following general comparison criterion: the improper integral $\int_a^{+\infty} f(x)\,dx$ converges if there exists a positive function $g : [a,+\infty) \to \mathbb{R}$ with $|f| \le g$ for which $\int_a^{+\infty} g(x)\,dx$ converges. In symbols,
$$|f| \le g \;\text{ and }\; \int_a^{+\infty} g(x)\,dx \in [0,\infty) \implies \int_a^{+\infty} f(x)\,dx \in \mathbb{R} \quad (45.7)$$
Indeed, by Corollary 1915 the condition $|f| \le g$ ensures the convergence of $\int_a^{+\infty} |f|(x)\,dx$, and so that of $\int_a^{+\infty} f(x)\,dx$ by Theorem 1920.

We close by observing that, as for series, also for improper integrals the converse of Theorem 1920 fails: there exist improper integrals that converge but not absolutely, as readers can check.
If $x > 0$, we have
$$g(x) \ge f(x) \iff e^{-x} \ge e^{-x^2} \iff x \le x^2 \iff x \ge 1$$
By (45.5) of Corollary 1915, if $\int_1^{+\infty} g(x)\,dx$ converges, then $\int_1^{+\infty} f(x)\,dx$ also converges. In turn, this implies that $\int_a^{+\infty} f(x)\,dx$ converges for every $a \in \mathbb{R}$. This is obvious if $a \ge 1$. If $a < 1$, we have
$$\int_a^{+\infty} f(x)\,dx = \int_a^1 f(x)\,dx + \int_1^{+\infty} f(x)\,dx \quad (45.8)$$
Since $\int_a^1 f(x)\,dx$ exists because of the continuity of $f$ on $[a,1]$, the convergence of $\int_1^{+\infty} f(x)\,dx$ then implies that of $\int_a^{+\infty} f(x)\,dx$.

Thus, it remains to show that $\int_1^{+\infty} g(x)\,dx$ converges. We have
$$G(x) = \int_1^x g(t)\,dt = e^{-1} - e^{-x}$$
The Gauss integral is central in probability theory, where it is usually presented in the form
$$\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx$$
By proceeding by substitution, it is easy to verify that, for every pair of scalars $a \in \mathbb{R}$ and $b > 0$, one has
$$\int_{-\infty}^{+\infty} e^{-\frac{(x+a)^2}{b^2}}\,dx = b\sqrt{\pi} \quad (45.12)$$
By setting $b = \sqrt{2}$ and $a = 0$, we then have
$$\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx = 1 \quad (45.13)$$
The function therefore has unit integral and, thus, it is a density function (as will be seen in Section 48.8). This explains the importance of this particular form of the Gaussian function.
Definition 1924 Let $f : [a,b) \to \mathbb{R}$ be a continuous function such that $\lim_{x\to b^-} f(x) = \pm\infty$. If the limit
$$\lim_{z\to b^-}\int_a^z f(x)\,dx = \lim_{z\to b^-}\left[F(z) - F(a)\right]$$
exists, its value is called the improper integral of $f$ on $[a,b)$.

If the unboundedness of the function concerns the other endpoint $a$, or both endpoints, we can give a similar definition based on $\lim_{z\to a^+}$.
So,
$$\lim_{x\to b^-} F(x) = \begin{cases} \dfrac{(b-a)^{1-\alpha}}{1-\alpha} & \text{if } 0 < \alpha < 1 \\ +\infty & \text{if } \alpha \ge 1 \end{cases}$$
It follows that the improper integral
$$\int_a^b \frac{1}{(b-x)^\alpha}\,dx$$
exists for every $\alpha > 0$: it converges if $0 < \alpha < 1$ and diverges positively if $\alpha \ge 1$. N
Proposition 1916 holds also for these improper integrals and allows us to state that $\int_a^b f(x)\,dx$ converges if there exists $\alpha \in (0,1)$ such that
$$f \sim \frac{1}{(b-x)^\alpha} \qquad\text{or}\qquad f = o\left(\frac{1}{(b-x)^\alpha}\right) \qquad\text{as } x \to b^-$$
The next example requires the version of the last definition involving the limit $\lim_{z\to a^+}$. We have
$$\int_0^1 \frac{1}{\sqrt{x}}\,dx = \lim_{z\to 0^+}\int_z^1 \frac{1}{\sqrt{x}}\,dx = \lim_{z\to 0^+} 2\sqrt{x}\,\Big|_z^1 = \lim_{z\to 0^+}\left(2 - 2\sqrt{z}\right) = 2$$
O.R. Intuitively, when the interval is unbounded, for the improper integral to converge the function $f$ must converge to zero quite rapidly, e.g., as $x^{-\alpha}$ with $\alpha > 1$. When the function is unbounded, instead, $f$ must converge to infinity fairly slowly, e.g., as $x^{-\alpha}$ with $\alpha \in (0,1)$. Both things are quite natural: for the area of an unbounded surface to be finite, its portion "that escapes to infinity" must be very narrow.

To see what may go wrong, consider for instance the function $f : (0,1) \to (0,\infty)$ defined by $f(x) = 1/x$. Observe that $\int_1^{+\infty} f(x)\,dx = \lim_{x\to+\infty}\log x = +\infty$, so the integral diverges. Similarly, $\int_0^1 f(x)\,dx = \lim_{z\to 0^+}(-\log z) = +\infty$ and again the integral diverges. H
Chapter 46

Parametric Riemann integrals (sdoganato)
Unlike the integrals seen so far, the value of the definite integral (46.1) depends on the value of the variable $\theta$, which is usually interpreted as a parameter (hence the choice of the symbol $\theta$). Such an integral, referred to as a parametric integral, therefore defines a scalar function $F : [c,d] \to \mathbb{R}$ in the following way:
$$F(\theta) = \int_a^b f(x,\theta)\,dx \quad (46.2)$$
Note that, although the function $f$ is of two variables, the function $F$ is scalar. Indeed, it does not depend in any way on the variable $x$, which here plays the role of a dummy variable of integration.

Functions of type (46.2) appear in applications more frequently than one may initially think. Therefore, having the appropriate instruments to study them is important.
46.1 Properties
We will study two properties of the function $F$, namely continuity and differentiability. Let us start with continuity.
Formula (46.3) is referred to as the "passage of the limit under the integral sign".

Proof Take $\varepsilon > 0$. We must show that there exists a $\delta > 0$ such that $|\theta - \theta_0| < \delta$ implies
$$|F(\theta) - F(\theta_0)| = \left|\int_a^b \left(f(x,\theta) - f(x,\theta_0)\right)dx\right| \le \int_a^b |f(x,\theta) - f(x,\theta_0)|\,dx < \varepsilon$$
By hypothesis, $f$ is continuous on the compact set $[a,b]\times[c,d]$. By Theorem 603, it is therefore uniformly continuous on $[a,b]\times[c,d]$, so there is a $\delta > 0$ such that
$$\|(x,\theta) - (x',\theta_0)\| < \delta \implies |f(x,\theta) - f(x',\theta_0)| < \frac{\varepsilon}{b-a} \quad (46.4)$$
for every $(x,\theta), (x',\theta_0) \in [a,b]\times[c,d]$. Therefore, for every $\theta \in [c,d] \cap (\theta_0-\delta, \theta_0+\delta)$ we have
$$|F(\theta) - F(\theta_0)| \le \int_a^b |f(x,\theta) - f(x,\theta_0)|\,dx < \frac{\varepsilon}{b-a}(b-a) = \varepsilon$$
as desired.
Proposition 1928 Suppose that $f : [a,b]\times[c,d] \to \mathbb{R}$ and its partial derivative $\partial f/\partial\theta : [a,b]\times[c,d] \to \mathbb{R}$ are both continuous.1 Then, the function $F : [c,d] \to \mathbb{R}$ is differentiable on $(c,d)$, with
$$F'(\theta) = \int_a^b \frac{\partial}{\partial\theta} f(x,\theta)\,dx \quad (46.5)$$
Formula (46.5) is referred to as "differentiation under the integral sign". Since
$$F'(\theta_0) = \lim_{h\to 0}\frac{F(\theta_0+h) - F(\theta_0)}{h} = \lim_{h\to 0}\int_a^b \frac{f(x,\theta_0+h) - f(x,\theta_0)}{h}\,dx$$
and
$$\int_a^b \frac{\partial}{\partial\theta} f(x,\theta_0)\,dx = \int_a^b \lim_{h\to 0}\frac{f(x,\theta_0+h) - f(x,\theta_0)}{h}\,dx$$
formula (46.5) amounts to passing the limit under the integral sign.

1 That is, each section $f(x,\cdot) : [c,d] \to \mathbb{R}$ is differentiable, i.e., the partial derivative $\partial f(x,\theta)/\partial\theta$ exists for each $(x,\theta) \in [a,b]\times[c,d]$.
Proof Let $\theta_0 \in (c,d)$. For every $x \in [a,b]$ the function $f(x,\cdot) : [c,d] \to \mathbb{R}$ is by hypothesis differentiable. Take $h$ small enough so that $\theta_0 + h \in [c,d]$. By the Mean Value Theorem, there exists $\lambda_x \in [0,1]$ such that
$$\frac{f(x,\theta_0+h) - f(x,\theta_0)}{h} = \frac{\partial f}{\partial\theta}(x, \theta_0 + \lambda_x h)$$
Being $f$ continuous, the difference quotient above is continuous in $x$, thus integrable (note that $\lambda_x$ depends on $x$). Let us write the difference quotient of the function $F$ at $\theta_0$:
$$\left|\frac{F(\theta_0+h) - F(\theta_0)}{h} - \int_a^b \frac{\partial f}{\partial\theta}(x,\theta_0)\,dx\right| = \left|\int_a^b \frac{f(x,\theta_0+h) - f(x,\theta_0)}{h}\,dx - \int_a^b \frac{\partial f}{\partial\theta}(x,\theta_0)\,dx\right|$$
$$= \left|\int_a^b \left(\frac{\partial f}{\partial\theta}(x,\theta_0+\lambda_x h) - \frac{\partial f}{\partial\theta}(x,\theta_0)\right)dx\right| \le \int_a^b \left|\frac{\partial f}{\partial\theta}(x,\theta_0+\lambda_x h) - \frac{\partial f}{\partial\theta}(x,\theta_0)\right|dx \quad (46.6)$$
The partial derivative $\partial f/\partial\theta$ is continuous on the compact set $[a,b]\times[c,d]$, so it is also uniformly continuous. Thus, given any $\varepsilon > 0$, there exists a $\delta > 0$ such that
$$\|(x,\theta) - (x,\theta_0)\| < \delta \implies \left|\frac{\partial f}{\partial\theta}(x,\theta) - \frac{\partial f}{\partial\theta}(x,\theta_0)\right| < \frac{\varepsilon}{b-a} \quad (46.7)$$
for every $\theta \in [c,d]$. Therefore, for $|h| < \delta$ we have that
$$\|(x,\theta_0+\lambda_x h) - (x,\theta_0)\| = \lambda_x|h| \le |h| < \delta \qquad \forall x \in [a,b]$$
Thanks to conditions (46.6) and (46.7), this implies that
$$\left|\frac{F(\theta_0+h) - F(\theta_0)}{h} - \int_a^b \frac{\partial f}{\partial\theta}(x,\theta_0)\,dx\right| < \varepsilon \qquad \forall |h| < \delta$$
proving that
$$\lim_{h\to 0}\frac{F(\theta_0+h) - F(\theta_0)}{h} = \int_a^b \frac{\partial f}{\partial\theta}(x,\theta_0)\,dx$$
as desired.
Example 1929 Set $f(x,\theta) = x^2 + \theta^2 x$ and
$$F(\theta) = \int_a^b \left(x^2 + \theta^2 x\right)dx$$
As the hypotheses of Proposition 1928 are satisfied, we differentiate under the integral sign:
$$F'(\theta) = \int_a^b 2\theta x\,dx = \theta\left(b^2 - a^2\right)$$
The following result extends Proposition 1928 to the case of variable limits of integration.

Proposition 1930 Suppose that $f : [a,b]\times[c,d] \subseteq \mathbb{R}^2 \to \mathbb{R}$ and its partial derivative $\partial f/\partial\theta$ are both continuous. If $\alpha, \beta : [c,d] \to (a,b)$ are continuously differentiable, then the function $G : [c,d] \to \mathbb{R}$ given by $G(\theta) = \int_{\alpha(\theta)}^{\beta(\theta)} f(x,\theta)\,dx$ is differentiable on $(c,d)$, with
$$G'(\theta) = \int_{\alpha(\theta)}^{\beta(\theta)} \frac{\partial f}{\partial\theta}(x,\theta)\,dx + \beta'(\theta)\,f(\beta(\theta),\theta) - \alpha'(\theta)\,f(\alpha(\theta),\theta) \quad (46.9)$$
Formula (46.9) is referred to as Leibniz's rule. Heuristically, we can derive this rule, when $\alpha(\theta) < \beta(\theta)$, via the auxiliary function $H : [a,b]\times[c,d] \to \mathbb{R}$ defined by
$$H(x,\theta) = \int_a^x f(t,\theta)\,dt \quad (46.10)$$
Indeed,
$$G'(\theta) = \frac{\partial H}{\partial\theta}(\beta(\theta),\theta) - \frac{\partial H}{\partial\theta}(\alpha(\theta),\theta) + \frac{\partial H}{\partial x}(\beta(\theta),\theta)\,\beta'(\theta) - \frac{\partial H}{\partial x}(\alpha(\theta),\theta)\,\alpha'(\theta)$$
$$= \int_{\alpha(\theta)}^{\beta(\theta)} \frac{\partial f}{\partial\theta}(x,\theta)\,dx + \beta'(\theta)\,f(\beta(\theta),\theta) - \alpha'(\theta)\,f(\alpha(\theta),\theta)$$
Proof The auxiliary function (46.10) has two sections $H_\theta : [a,b] \to \mathbb{R}$ and $H^x : [c,d] \to \mathbb{R}$ defined by $H_\theta(x) = H(x,\theta)$ and $H^x(\theta) = H(x,\theta)$.2 Fix $\theta \in (c,d)$. Since $f$ is continuous, by the Second Fundamental Theorem of Calculus the section $H_\theta$ is differentiable, with
$$\frac{\partial H}{\partial x}(x,\theta) = \frac{dH_\theta}{dx}(x) = f(x,\theta) \qquad \forall x \in [a,b]$$
Since $f$ is continuous, this implies that $\partial H/\partial x$ is a continuous function on $(a,b)\times(c,d)$. By Proposition 1928, the section $H^x$ is, for each $x$, differentiable on $(c,d)$, with
$$\frac{\partial H}{\partial\theta}(x,\theta) = \frac{dH^x}{d\theta}(\theta) = \int_a^x \frac{\partial}{\partial\theta} f(t,\theta)\,dt \qquad \forall\theta \in (c,d)$$
Since $\partial f/\partial\theta$ is continuous, by Propositions 1881 and 1927 the function $\partial H/\partial\theta$ is, on the open rectangle $(a,b)\times(c,d)$, Lipschitz continuous in $x$ and continuous in $\theta$. In particular, the Lipschitz constant can be chosen to be independent of $\theta$. This implies that $\partial H/\partial\theta$ is continuous on $(a,b)\times(c,d)$. Indeed, if we take a sequence $\{(x_n,\theta_n)\} \subseteq (a,b)\times(c,d)$ that converges to a vector $(x,\theta) \in (a,b)\times(c,d)$, we have
$$\left|\frac{\partial H}{\partial\theta}(x,\theta) - \frac{\partial H}{\partial\theta}(x_n,\theta_n)\right| \le \left|\frac{\partial H}{\partial\theta}(x,\theta) - \frac{\partial H}{\partial\theta}(x,\theta_n)\right| + \left|\frac{\partial H}{\partial\theta}(x,\theta_n) - \frac{\partial H}{\partial\theta}(x_n,\theta_n)\right| \le \left|\frac{\partial H}{\partial\theta}(x,\theta) - \frac{\partial H}{\partial\theta}(x,\theta_n)\right| + M|x - x_n| \to 0$$
Moreover,
$$G(\theta) = \int_{\alpha(\theta)}^{\beta(\theta)} f(x,\theta)\,dx = H(\beta(\theta),\theta) - H(\alpha(\theta),\theta) \quad (46.11)$$
Since $H$ is differentiable and $\alpha$ and $\beta$ are continuously differentiable, by the Chain rule (Theorem 1296) and Corollary 1864 we have
$$G'(\theta) = \frac{\partial H}{\partial x}(\beta(\theta),\theta)\,\beta'(\theta) + \frac{\partial H}{\partial\theta}(\beta(\theta),\theta) - \frac{\partial H}{\partial x}(\alpha(\theta),\theta)\,\alpha'(\theta) - \frac{\partial H}{\partial\theta}(\alpha(\theta),\theta)$$
$$= \frac{\partial H}{\partial\theta}(\beta(\theta),\theta) - \frac{\partial H}{\partial\theta}(\alpha(\theta),\theta) + \frac{\partial H}{\partial x}(\beta(\theta),\theta)\,\beta'(\theta) - \frac{\partial H}{\partial x}(\alpha(\theta),\theta)\,\alpha'(\theta)$$
$$= \int_{\alpha(\theta)}^{\beta(\theta)} \frac{\partial f}{\partial\theta}(x,\theta)\,dx + \beta'(\theta)\,f(\beta(\theta),\theta) - \alpha'(\theta)\,f(\alpha(\theta),\theta)$$
We are left with the case $\alpha(\theta) > \beta(\theta)$. Recall that in this case
$$G(\theta) = \int_{\alpha(\theta)}^{\beta(\theta)} f(x,\theta)\,dx = -\int_{\beta(\theta)}^{\alpha(\theta)} f(x,\theta)\,dx \qquad \forall\theta \in (c,d)$$

2 See Section 20.4.1.
(i) Set
$$G(\theta) = \int_{\sin\theta}^{\cos\theta} \left(x^2 + \theta^2\right)dx$$
The hypotheses of Proposition 1930 are satisfied on any compact interval, so by Leibniz's rule we have:
$$G'(\theta) = \int_{\sin\theta}^{\cos\theta} 2\theta\,dx - \sin\theta\left(\cos^2\theta + \theta^2\right) - \cos\theta\left(\sin^2\theta + \theta^2\right)$$
$$= 2\theta(\cos\theta - \sin\theta) - \left[\sin\theta\left(\cos^2\theta + \theta^2\right) + \cos\theta\left(\sin^2\theta + \theta^2\right)\right]$$
(ii) Let $f(x,\theta) = \sin\theta x$, $\alpha(\theta) = \sqrt{\theta}$ and $\beta(\theta) = \theta^3$. Set
$$G(\theta) = \int_{\sqrt{\theta}}^{\theta^3} \sin\theta x\,dx$$
The hypotheses of Proposition 1930 are satisfied on any compact interval of $(0,\infty)$, so by Leibniz's rule we have:
$$G'(\theta) = \int_{\sqrt{\theta}}^{\theta^3} x\cos\theta x\,dx + 3\theta^2\sin\theta^4 - \frac{1}{2\sqrt{\theta}}\sin\theta^{3/2}$$
The extension of Proposition 1928 to the improper case is a delicate issue that requires a dominance condition. For simplicity, in the statement we make the assumption that the domain of integration is the real line and the parameter set a compact interval. An analogous result, which we omit for brevity, holds when the domain of integration is a half-line and the parameter set an unbounded interval.
$$|f(x,\theta)| \le g(x) \quad\text{and}\quad \left|\frac{\partial}{\partial\theta} f(x,\theta)\right| \le g(x) \qquad \forall (x,\theta) \in \mathbb{R}\times(c,d) \quad (46.13)$$
The proof of this result is not simple, so we omit it. Note that the dominance condition (46.13), which is based on the auxiliary function $g$, guarantees inter alia that the integrals
$$\int_{-\infty}^{+\infty} f(x,\theta)\,dx \qquad\text{and}\qquad \int_{-\infty}^{+\infty} \frac{\partial}{\partial\theta} f(x,\theta)\,dx$$
converge for every $\theta \in (c,d)$.
We have
$$\frac{\partial}{\partial\theta} f(x,\theta) = -2\theta\sin x\,e^{-\theta^2 - x^2}$$
and so $\partial f(x,\theta)/\partial\theta$ is continuous on $\mathbb{R}\times(-1,1)$. Let $g$ be the Gaussian-type function $g(x) = 2e^{-x^2}$. We have $\int_{-\infty}^{+\infty} 2e^{-x^2}\,dx < +\infty$ and, for each $\theta \in (-1,1)$,
$$\left|\sin x\,e^{-\theta^2 - x^2}\right| = |\sin x|\,e^{-\theta^2}e^{-x^2} \le e^{-\theta^2}e^{-x^2} \le e^{-x^2} \le g(x)$$
as well as
$$\left|-2\theta\sin x\,e^{-\theta^2 - x^2}\right| = 2|\theta|\,|\sin x|\,e^{-\theta^2 - x^2} \le g(x)$$
The hypotheses of Proposition 1932 are satisfied, so formula (46.14) takes the form
$$F'(\theta) = \int_{-\infty}^{+\infty} \frac{\partial}{\partial\theta}\left(\sin x\,e^{-\theta^2 - x^2}\right)dx = -2\theta\int_{-\infty}^{+\infty} \sin x\,e^{-\theta^2 - x^2}\,dx = -2\theta F(\theta)$$
Besides its intrinsic interest, it will help us to illustrate a key methodological principle: sometimes the best way to deal with a tree is to address first the entire forest, and then to go back to the tree itself.

We first establish the convergence of the Dirichlet integral, and then we solve it.
Proof Define $f : [0,\infty) \to \mathbb{R}$ by
$$f(x) = \begin{cases} \dfrac{\sin x}{x} & \text{if } x > 0 \\ 1 & \text{if } x = 0 \end{cases}$$
We can write the Dirichlet integral as $\int_0^{+\infty} f(x)\,dx$. Since $\lim_{x\to 0}(\sin x)/x = 1$, the integrand is continuous. So, the integral $\int_0^b f(x)\,dx$ exists for all $b > 0$. By integrating by parts, for all $0 < a < b$, we have3
$$\int_a^b \frac{\sin x}{x}\,dx = \frac{\cos a}{a} - \frac{\cos b}{b} - \int_a^b \frac{\cos x}{x^2}\,dx$$
We now split the domain of integration in the same way as we did to solve the Gauss integral (cf. (45.8)). We thus have
$$\int_0^b \frac{\sin x}{x}\,dx = \int_0^a \frac{\sin x}{x}\,dx + \int_a^b \frac{\sin x}{x}\,dx = \int_0^a \frac{\sin x}{x}\,dx + \frac{\cos a}{a} - \frac{\cos b}{b} - \int_a^b \frac{\cos x}{x^2}\,dx$$
The value of the Dirichlet integral is quite remarkable. The method used to solve this integral is, however, even more remarkable. Indeed, since solving it directly is difficult, we embed it in a larger class of integrals by introducing an ad hoc positive parameter via the function $F : [0,\infty) \to \mathbb{R}$ given by
$$F(\theta) = \int_0^{+\infty} \frac{\sin x}{x}\,e^{-\theta x}\,dx$$
This function is easily seen to be well defined (why?). The integral of interest corresponds to $\theta = 0$. To find it, in the proof we will compute, all at once, the integrals corresponding to all the values of the parameter $\theta$.

This proof strategy illustrates the methodological principle previously mentioned: sometimes the best way to solve a specific problem (a tree) is to embed it, via a suitable parameterization, in a general class of problems (the forest) and to solve directly the general problem.
Proof It is possible to prove that the function $F$ is continuous at $0$ and that it is possible to differentiate under the integral sign.4 So, for every $\theta > 0$ we have
$$F'(\theta) = \int_0^{+\infty} \frac{d}{d\theta}\left(\frac{\sin x}{x}\,e^{-\theta x}\right)dx = \int_0^{+\infty} (-x)\,\frac{\sin x}{x}\,e^{-\theta x}\,dx = -\int_0^{+\infty} \sin x\,e^{-\theta x}\,dx$$
Since
$$\int \sin x\,e^{-\theta x}\,dx = -\frac{e^{-\theta x}\left(\cos x + \theta\sin x\right)}{1+\theta^2} + c$$
we have $\int_0^{+\infty} \sin x\,e^{-\theta x}\,dx = 1/(1+\theta^2)$. Hence, $F'(\theta) = -1/(1+\theta^2)$ for every $\theta > 0$, that is,
$$F(\theta) = -\arctan\theta + k \qquad \forall\theta > 0$$
for some $k \in \mathbb{R}$ (cf. Example 1238). Recall that $\arctan x : \mathbb{R} \to (-\pi/2,\pi/2)$ is the inverse of the function $\tan x$ (Section 6.5.3), with graph

4 Suitable versions of Propositions 1927 and 1932 are needed, as readers can check (see, e.g., Roussos, 2014, pp. 85-90).
[Figure: graph of $\arctan x$]
Now,
$$\lim_{\theta\to+\infty}\left(-\arctan\theta + k\right) = -\frac{\pi}{2} + k$$
Since $0 \le |F(\theta)| \le \int_0^{+\infty} e^{-\theta x}\,dx = 1/\theta \to 0$, this limit must be zero, so $k = \pi/2$ and
$$F(\theta) = -\arctan\theta + \frac{\pi}{2} \qquad \forall\theta > 0$$
By the continuity of $F$ at $0$, we thus have $F(0) = \pi/2$.
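The value $\pi/2$ can be approached numerically on growing truncations (convergence is slow because of the oscillation); a sketch of ours:

```python
import numpy as np
from scipy.integrate import quad

# truncated Dirichlet integrals approach π/2
for b in [10, 100, 1000]:
    val, _ = quad(lambda x: np.sin(x) / x, 0, b, limit=500)
    print(b, val)
print(np.pi / 2)
```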
Chapter 47
Stieltjes' integral
$$\sum_{k=1}^n m_k\left(g(x_k) - g(x_{k-1})\right) \qquad\text{and}\qquad \sum_{k=1}^n M_k\left(g(x_k) - g(x_{k-1})\right) \quad (47.2)$$
where $g$ is a scalar function. Clearly, (47.1) is the special case of (47.2) that corresponds to the identity function $g(x) = x$.
But why are the more general sums (47.2) relevant? Recall that the sums (47.1) arise in Riemann integration because every interval $[x_{i-1}, x_i]$, obtained by subdividing $[a,b]$, is measured according to its length $\Delta x_i = x_i - x_{i-1}$. Clearly, the length is the most natural way to measure an interval. However, it is not the only way: in some problems it might be more suitable to measure an interval in a different way. For example, if $[x_{i-1}, x_i]$ represents levels of production between $x_{i-1}$ and $x_i$, the most appropriate economic measure for such an interval may be the additional cost that a higher production level entails: if $C(x)$ is the total cost for producing $x$, the measure that must be assigned to $[x_{i-1}, x_i]$ is then the difference $C(x_i) - C(x_{i-1})$. If $[x_{i-1}, x_i]$ represents, instead, an interval in which a random variable may assume values and $F(x)$ is the probability that such value is at most $x$, then the most natural way to measure $[x_{i-1}, x_i]$ is the difference $F(x_i) - F(x_{i-1})$. In such cases, which are quite common in economic applications (see, e.g., Section 47.8), the Stieltjes' integral is the natural notion of integral to use.

Besides its interest for applications, however, Stieltjes integration also sheds further light on Riemann integration. Indeed, we will see in this chapter that some results that we established for Riemann's integrals are actually best understood in terms of the more general Stieltjes' integral.
47.1 Definition
Consider two functions $f, g : [a,b] \subseteq \mathbb{R} \to \mathbb{R}$ with $f$ bounded and $g$ increasing.1 For every subdivision $\pi = \{x_i\}_{i=0}^n \in \Pi$ of $[a,b]$, with $a = x_0 < x_1 < \cdots < x_n = b$, and for every interval $[x_{i-1}, x_i]$ we can define the following quantities

is referred to as the upper Stieltjes sum. It can be easily shown that, for every subdivision of $[a,b]$, we have
$$I(f,g,\pi) \le S(f,g,\pi)$$
$$I(f,g,\pi) \le I(f,g,\pi') \le S(f,g,\pi') \le S(f,g,\pi) \quad (47.3)$$
Using the lower and upper Stieltjes sums, we define the Stieltjes' integral.

The common value, denoted by $\int_a^b f(x)\,dg(x)$, is called the integral in the sense of Stieltjes (or Stieltjes' integral) of $f$ with respect to $g$ on $[a,b]$.

When $g(x) = x$, we get back to Riemann's integral (cf. Proposition 1848). The functions $f$ and $g$ are called integrand function and integrator function, respectively. For brevity, we will often write $\int_a^b f\,dg$, thus omitting the arguments of such functions.

N.B. In the rest of the chapter (except in the coda) we will tacitly assume that $g$ is an increasing scalar function, but not constant (that is, $g(b) > g(a)$). If $g$ is constant, then the Stieltjes' integral is always defined for any bounded function $f$ and $\int_a^b f\,dg$ is trivially equal to $0$. O

1 If $g$ were decreasing, we could consider $h = -g$ instead, which is clearly increasing.
As for Riemann's integral, it is important to know which are the classes of integrable functions. As one may expect, the answer depends on the regularity of both functions $f$ and $g$ (recall that we assumed $g$ to be increasing).

Proposition 1938 The integral $\int_a^b f\,dg$ exists if at least one of the following two conditions is satisfied:

(i) $f$ is continuous;

(ii) $f$ is monotone and $g$ is continuous.

Note that (i) and (ii) generalize, respectively, Propositions 1858 and 1861 for Riemann's integral.
Proof (i) The proof relies on the same steps as that of Proposition 1858. Since $f$ is continuous on $[a,b]$, it is also bounded (Weierstrass' Theorem) and uniformly continuous (Theorem 603). Take $\varepsilon > 0$. There exists $\delta_\varepsilon > 0$ such that
$$|x - y| < \delta_\varepsilon \implies |f(x) - f(y)| < \varepsilon \qquad \forall x, y \in [a,b] \quad (47.4)$$
Let $\pi \in \Pi$ be a subdivision of $[a,b]$ such that $|\pi| < \delta_\varepsilon$. By condition (47.4), for every $i = 1, 2, \ldots, n$ we have
$$\max_{x\in[x_{i-1},x_i]} f(x) - \min_{x\in[x_{i-1},x_i]} f(x) < \varepsilon$$
(ii) Since $f$ is monotone, $f$ is bounded. Since $g$ is continuous on $[a,b]$, it is also bounded and uniformly continuous. Let $\varepsilon > 0$. There exists $\delta_\varepsilon > 0$ such that
$$|x - y| < \delta_\varepsilon \implies |g(x) - g(y)| < \varepsilon \qquad \forall x, y \in [a,b]$$
Let $\pi \in \Pi$ be a subdivision of $[a,b]$ such that $|\pi| < \delta_\varepsilon$. For every pair of consecutive points of such a subdivision, we then have $g(x_i) - g(x_{i-1}) = |g(x_i) - g(x_{i-1})| < \varepsilon$. The proof now follows the same steps as that of Proposition 1861. Suppose that $f$ is increasing (if $f$ is decreasing the argument is analogous). We have
$$S(f,g,\pi) - I(f,g,\pi) = \sum_{i=1}^n \sup_{x\in[x_{i-1},x_i]} f(x)\left(g(x_i) - g(x_{i-1})\right) - \sum_{i=1}^n \inf_{x\in[x_{i-1},x_i]} f(x)\left(g(x_i) - g(x_{i-1})\right)$$
$$= \sum_{i=1}^n f(x_i)\left(g(x_i) - g(x_{i-1})\right) - \sum_{i=1}^n f(x_{i-1})\left(g(x_i) - g(x_{i-1})\right) = \sum_{i=1}^n \left(f(x_i) - f(x_{i-1})\right)\left(g(x_i) - g(x_{i-1})\right) < \varepsilon\sum_{i=1}^n \left(f(x_i) - f(x_{i-1})\right) = \varepsilon\left(f(b) - f(a)\right)$$
Lastly, we partially extend Theorem 1859 to Stieltjes' integral by requiring that $g$ does not share any point of discontinuity with $f$.2

Proposition 1939 Every bounded function $f : [a,b] \to \mathbb{R}$ with finitely many discontinuities is Stieltjes integrable with respect to $g : [a,b] \to \mathbb{R}$, provided $g$ is continuous at such points.

We omit the proof of this remarkable result which, inter alia, generalizes Proposition 1938-(i). However, while Theorem 1859 allowed for infinitely many discontinuities, in this more general setting we restrict ourselves to considering finitely many of them.
47.3 Calculus
When $g$ is differentiable, the Stieltjes' integral can be written as a Riemann's integral.

2 In other words, we require the two functions $f$ and $g$ not to be discontinuous at the same points.
Proof Since $g'$ is Riemann integrable, for any given $\varepsilon > 0$ there exists a subdivision $\pi = \{x_i\}_{i=0}^n \in \Pi$ such that
$$\sum_{i=1}^n \left(\sup_{x\in[x_{i-1},x_i]} g'(x) - \inf_{x\in[x_{i-1},x_i]} g'(x)\right)\Delta x_i = S(g',\pi) - I(g',\pi) < \varepsilon \quad (47.6)$$
From (47.6) we also deduce that, for any pair of points $s_i, t_i \in [x_{i-1},x_i]$ and for any $i = 1, \ldots, n$, we have
$$\sum_{i=1}^n \left|g'(s_i) - g'(t_i)\right|\Delta x_i < \varepsilon \quad (47.7)$$
By the Mean Value Theorem and since $g$ is differentiable, for each $i = 1, \ldots, n$ there exists a point $t_i \in [x_{i-1},x_i]$ such that
$$\Delta g_i = g(x_i) - g(x_{i-1}) = g'(t_i)\,\Delta x_i$$
Denoting by $M$ an upper bound of $|f|$ on $[a,b]$, it follows that
$$\left|\sum_{i=1}^n f(s_i)\,\Delta g_i - \sum_{i=1}^n f(s_i)g'(s_i)\,\Delta x_i\right| = \left|\sum_{i=1}^n f(s_i)\left(g'(t_i) - g'(s_i)\right)\Delta x_i\right| \le M\sum_{i=1}^n \left|g'(s_i) - g'(t_i)\right|\Delta x_i \le M\varepsilon$$
yielding that
$$S(f,g,\pi) \le S(fg',\pi) + M\varepsilon \quad (47.8)$$
Thus, for every $\varepsilon > 0$ there exists $\pi \in \Pi$ such that inequality (47.10) holds. This implies that
$$\int_a^b f(x)\,dg(x) = \int_a^b f(x)g'(x)\,dx \quad (47.11)$$
From (47.11) and (47.12) one can see that $fg'$ is Riemann integrable if and only if $f$ is Stieltjes integrable with respect to $g$, in which case we get (47.5).
This greatly simplifies computations, because the techniques developed to solve Riemann's integrals can then be used for Stieltjes' integrals.3

From a theoretical standpoint, Stieltjes' integral substantially extends the scope of Riemann's integral, while keeping (also thanks to (47.5)) its remarkable analytical properties. Such a remarkable balance between generality and tractability explains the importance of Stieltjes' integral.
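Formula (47.5) is easy to probe numerically: Riemann-Stieltjes sums with a smooth integrator converge to the Riemann integral of $fg'$. A sketch of ours with $f(x) = x$ and $g(x) = x^2$, both our own choices:

```python
import numpy as np

# Riemann-Stieltjes sums for the integral of f dg on [0, 1] with f(x) = x
# and g(x) = x^2, against the reduction: integral of f(x) g'(x) dx = 2/3
f = lambda x: x
g = lambda x: x**2

def stieltjes_sum(n):
    x = np.linspace(0.0, 1.0, n + 1)
    tags = (x[:-1] + x[1:]) / 2              # tag points in each subinterval
    return np.sum(f(tags) * np.diff(g(x)))   # sum of f(s_i)(g(x_i) - g(x_{i-1}))

for n in [10, 100, 1000]:
    print(n, stieltjes_sum(n))
print("limit:", 2 / 3)
```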
Proposition 1941 Let $g$ be the integral function of a Riemann integrable function $\psi : [a,b] \to \mathbb{R}$, that is, $g(x) = \int_a^x \psi(t)\,dt$ for every $x \in [a,b]$. If $f : [a,b] \to \mathbb{R}$ is continuous, we have
$$\int_a^b f(x)\,dg(x) = \int_a^b f(x)\psi(x)\,dx$$
We omit the proof of this result. However, when $\psi$ is continuous (so, Riemann integrable) it follows from the previous result because, by the Second Fundamental Theorem of Calculus, the function $g$ is differentiable with $g' = \psi$.
We call (47.13) Ito's formula because it is a precursor of that celebrated formula (a much deeper result that readers will study in probability courses).4 This formula is easily checked when $g : [a,b] \to \mathbb{R}$ is continuously differentiable. Indeed, by the chain rule $f \circ g : [a,b] \to \mathbb{R}$ is differentiable at each $x \in (a,b)$, with $(f \circ g)'(x) = f'(g(x))g'(x)$. So, $(f \circ g)'$ is continuous. We then have:
$$f(g(x)) - f(g(a)) = \int_a^x (f \circ g)'(t)\,dt = \int_a^x f'(g(t))\,g'(t)\,dt = \int_a^x f'(g(t))\,dg(t)$$
where the first equality follows from the First Fundamental Theorem of Calculus and the last one from Proposition 1940. That said, next we prove Ito's formula in full generality.
Proof Let $\pi = \{x_i\}_{i=0}^n$ be a subdivision of $[a,x]$. So, in particular, $x_n = x$. Consider the function $h = f \circ g : [a,b] \to \mathbb{R}$. If we add and subtract $h(x_i) = f(g(x_i))$ for each $i = 1, 2, \ldots, n-1$, we have
$$f(g(x)) - f(g(a)) = \sum_{i=1}^n \left(f(g(x_i)) - f(g(x_{i-1}))\right)$$
Let $i \in \{1, \ldots, n\}$. Consider the interval $[g(x_{i-1}), g(x_i)]$. We have two cases:

(i) Let $g(x_{i-1}) < g(x_i)$. Consider $f$ on $[g(x_{i-1}), g(x_i)]$. By the Mean Value Theorem, there exists $\hat{y}_i \in (g(x_{i-1}), g(x_i))$ such that
$$f'(\hat{y}_i) = \frac{f(g(x_i)) - f(g(x_{i-1}))}{g(x_i) - g(x_{i-1})}$$
Since $g$ is continuous, there exists $\hat{x}_i \in (x_{i-1}, x_i)$ such that $g(\hat{x}_i) = \hat{y}_i$, so that
$$f'(g(\hat{x}_i)) = \frac{f(g(x_i)) - f(g(x_{i-1}))}{g(x_i) - g(x_{i-1})}$$
which yields
Example 1943 (i) For f(x) = x^α with α ≥ 1 and g continuous, Ito's formula becomes

$$g^{\alpha}(x)-g^{\alpha}(a)=\alpha\int_a^x g^{\alpha-1}(t)\,dg(t)$$

which allows us to compute a Stieltjes integral that has the same integrand and integrator.

(ii) Let f(x) = log x and Im g ⊆ (0, ∞). Then, Ito's formula becomes

$$\log g(x)-\log g(a)=\int_a^x \frac{1}{g(t)}\,dg(t)$$

which allows us to compute a Stieltjes integral where the integrand is the reciprocal of the integrator. N
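Case (ii) can be watched numerically. A minimal Python sketch (the integrator g(t) = 1 + t², with image in (0, ∞), is an illustrative choice) compares log g(1) − log g(0) with the Stieltjes sums of 1/g against g:

import math

def stieltjes_sum(f, g, grid):
    # left-endpoint Stieltjes sums of f with respect to g
    return sum(f(t0) * (g(t1) - g(t0)) for t0, t1 in zip(grid, grid[1:]))

n = 100_000
grid = [i / n for i in range(n + 1)]     # subdivision of [0, 1]
g = lambda t: 1.0 + t * t                # Im g contained in (0, +infinity)

lhs = math.log(g(1.0)) - math.log(g(0.0))           # log 2
rhs = stieltjes_sum(lambda t: 1.0 / g(t), g, grid)  # Stieltjes sum
print(lhs, rhs)   # both ~0.693147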
Ito's formula generalizes the First Fundamental Theorem of Calculus for continuously differentiable f. Indeed, if g(x) = x then formula (47.13) reduces to the version (44.63) of the First Fundamental Theorem of Calculus, that is,

$$f(x)-f(a)=\int_a^x f'(t)\,dt$$
Ito's formula allows us to compute Stieltjes integrals featuring integrands that one is able to recognize to have the form f′ ∘ g, where g is the integrator. In this regard, note that if H : [c, d] → R is a primitive of a continuous function h : [c, d] → R, we can rewrite Ito's formula as

$$H(g(x))-H(g(a))=\int_a^x h(g(t))\,dg(t) \tag{47.18}$$

If we compare this version of Ito's formula with the change of variable formula

$$\int_{g(a)}^{g(x)} h(t)\,dt=\int_a^x h(g(t))g'(t)\,dt \tag{47.19}$$

we see that Ito's formula can actually be viewed as a Stieltjes elaboration of the change of variable formula of Riemann integration.⁵
47.4 Properties

Properties similar to those of the Riemann integral hold for the Stieltjes integral. The only substantial novelty lies in a linearity property that now holds with respect to both the integrand function f and the integrator function g. Next we list the properties without proving them (the proofs being similar to those of Section 44.6).

(iv) Monotonicity:

$$f_1\le f_2\implies\int_a^b f_1\,dg\le\int_a^b f_2\,dg$$
$$g(x_0^+)-g(x_0^-)$$

is therefore the potential positive jump of g at x₀ (recall that g was assumed to be increasing).

In other words, the Stieltjes integral is the sum of all the jumps of the integrator at the points of discontinuity, multiplied by the value of the integrand at such points. A similar argument yields that the same equality holds if g is decreasing (cf. Section 47.9).
Proof Since f is continuous on [a, b], it is also bounded (Weierstrass' Theorem) and uniformly continuous (Theorem 603). In particular, for each ε > 0 there exists δ_ε > 0 such that

$$|x-y|<\delta_\varepsilon\implies|f(x)-f(y)|<\varepsilon\qquad\forall x,y\in[a,b] \tag{47.24}$$

Fix ε > 0. Let π̂ = {x_i}_{i=0}^k be a subdivision of [a, b] such that: 1) |π̂| < δ_ε and 2) each d_j belongs to at most one of the intervals {[x_{i−1}, x_i]}_{i=1}^k. Given the second property, we denote by i_j the index in {1, ..., k} such that the corresponding interval contains d_j. Without loss of generality, we assume that the discontinuity points have been listed so that d₁ ≤ d₂ ≤ ... ≤ d_n. By condition (47.24), for each i = 1, 2, ..., k we have

$$M_i-m_i=\max_{x\in[x_{i-1},x_i]}f(x)-\min_{x\in[x_{i-1},x_i]}f(x)<\varepsilon$$

where max and min exist by Weierstrass' Theorem. This implies that

$$S(f,g,\hat\pi)-I(f,g,\hat\pi)=\sum_{i=1}^{k}(M_i-m_i)(g(x_i)-g(x_{i-1}))<\varepsilon(g(b)-g(a)) \tag{47.25}$$

Next, we compute g(x_i) − g(x_{i−1}) for all i = 1, 2, ..., k. If [x_{i−1}, x_i] does not contain any discontinuity point of g, then g(x_i) − g(x_{i−1}) = 0. If instead [x_{i−1}, x_i] contains a discontinuity point d_j of g, then we have three cases:

2. d_j ∈ (a, b). In this case, we have that d_j ∈ [x_{i−1}, x_i] for some i ∈ {1, ..., k}. Moreover, since each d_j belongs to at most one of the intervals {[x_{i−1}, x_i]}_{i=1}^k, it follows that d_j ∈ (x_{i−1}, x_i). We can conclude that g(x_i) − g(x_{i−1}) = g(d_j^+) − g(d_j^-).
$$g\left(\tfrac12^+\right)=\tfrac34\ ;\quad g\left(\tfrac12\right)=0\ ;\quad g\left(\tfrac23^+\right)=1\ ;\quad g\left(\tfrac23\right)=\tfrac34$$
Consider an integrator step function with unitary jumps, that is, for every i we have

$$g(d_i^+)-g(d_i^-)=1$$

Equation (47.23) then becomes

$$\int_a^b f\,dg=\sum_{i=1}^{n}f(d_i)$$

In particular, if f is the identity we get

$$\int_a^b f\,dg=\sum_{i=1}^{n}d_i$$

The Stieltjes integral thus includes addition as a particular case. More generally, we will soon see that expected values are represented by Stieltjes integrals (Section 48.9).
Proposition 1946 Given any two increasing and continuous functions f, g : [a, b] → R, it holds

$$\int_a^b f\,dg+\int_a^b g\,df=f(b)g(b)-f(a)g(a) \tag{47.28}$$
Proof By Proposition 1938, both integrals ∫_a^b f dg and ∫_a^b g df exist. So, for every ε > 0 there are two subdivisions, π = {x_i}_{i=0}^n and π′ = {y_i}_{i=0}^{n′}, of [a, b] such that

$$\left|\int_a^b f\,dg-\sum_{i=1}^{n}f(x_{i-1})(g(x_i)-g(x_{i-1}))\right|<\frac{\varepsilon}{2}$$

and

$$\left|\int_a^b g\,df-\sum_{i=1}^{n'}g(y_i)(f(y_i)-f(y_{i-1}))\right|<\frac{\varepsilon}{2}$$

Let π″ = {z_i}_{i=0}^{n″} be the subdivision π″ = π ∪ π′. By (47.3), the two inequalities still hold for the subdivision π″. Moreover, note that

$$\sum_{i=1}^{n''}f(z_{i-1})(g(z_i)-g(z_{i-1}))+\sum_{i=1}^{n''}g(z_i)(f(z_i)-f(z_{i-1}))=f(b)g(b)-f(a)g(a)$$

which implies

$$\left|\int_a^b f\,dg+\int_a^b g\,df-f(b)g(b)+f(a)g(a)\right|<\varepsilon$$

Since ε was arbitrarily chosen, we reach the desired conclusion.
If on top both φ and g are differentiable, by Proposition 1940 we then have

$$\int_c^d f(\varphi(t))g'(\varphi(t))\varphi'(t)\,dt=\int_a^b f(x)\,dg(x)$$

In particular, if g(x) = x we get back to the Riemann formula (44.69), that is,

$$\int_c^d f(\varphi(t))\varphi'(t)\,dt=\int_a^b f(x)\,dx$$

The more general Stieltjes formula thus clarifies the nature of this earlier formula, besides extending its scope. After integration by parts, the change of variable formula is thus another result that is best understood in terms of the Stieltjes integral.
where π = {t_i}_{i=0}^n is a subdivision of [0, T], that is, 0 = t₀ < t₁ < ... < t_{n−1} < t_n = T. At each time t ∈ [t_{i−1}, t_i) the portfolio x(t) thus features c_i units of the asset, the outcome of trading at time t_{i−1}. Till time t_i the portfolio does not change, so no trading is made. The last trading occurs at t_{n−1}, so at T the position does not change.⁸

How do a portfolio's gains and losses cumulate over time? This is a most basic bookkeeping question that we need to answer to assess a portfolio's performance. To this end, define the integral function G_x : [0, T] → R, called the gains' process, by the Stieltjes integral

$$G_x(t)=\int_0^t x(s)\,dp(s) \tag{47.30}$$

where x is the integrand and p is the integrator. Since x is a step function, it is easy to see that
that
8
>
> c1 (p (t) p (t0 )) if t 2 [t0 ; t1 )
< P
k 1
Gx (t) = i=1 ci (p (ti ) p (ti 1 )) + ck (p (t) p (tk 1 )) if t 2 [tk 1 ; tk ) ; k = 2; :::; n
>
> Pn
:
i=1 ci (p (ti ) p (ti 1 )) if t = T
The gains' process describes how a portfolio's gains and losses cumulate over time, thus answering the previous question. To fix ideas, suppose that each c_i is positive – i.e., x ≥ 0 – and consider t ∈ [t₀, t₁). Throughout the time interval [t₀, t₁), the portfolio x features c₁ units of the asset. These units were traded at time 0 at a price p(0) and at time t their price is p(t). The change in price is p(t) − p(t₀), so the portfolio's gains/losses up to time t are

At time t₁, our position changed from c₁ to c₂ and then remained constant throughout the time interval [t₁, t₂). To obtain this new position, we could have for example sold c₁ at time t₁ and bought simultaneously c₂, or just directly acquired the difference c₂ − c₁. If markets are frictionless, these possible trading strategies are equivalent. So, let us focus on the former. It yields that, up to time t ∈ [t₁, t₂), the portfolio's cumulated gain is

Indeed, c₁(p(t₁) − p(t₀)) are the gains/losses matured in the period [0, t₁], coming from buying c₁ units at 0 and selling them at time t₁, while c₂(p(t) − p(t₁)) are the gains/losses occurred in [t₁, t), given by the new position c₂. By iterating this reasoning, the Stieltjes integral (47.30) follows immediately – indeed, (47.31) and (47.32) correspond to t ∈ [t₀, t₁) and t ∈ [t₁, t₂) in such integral. In particular, if one operates in the market throughout, from time 0 through time T, so as to keep the long and short positions of portfolio x, then one ends up with the gains/losses G_x(T).

⁸ For simplicity, we do not consider any dividend, so the cumulated gains/losses only come from trading ("capital gains" in the finance jargon).
Finally, we can relax the assumption that portfolios are adjusted only finitely many times: as long as the functions x and p satisfy, for example, the hypotheses of Proposition 1939, the gains' process defined via the Stieltjes integral (47.30) is well defined and can be interpreted in terms of gains/losses. Also, as the next section will show, we do not need to assume that p is increasing, which is clearly not a realistic assumption for prices.
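The piecewise formula for G_x is straightforward to evaluate. The following minimal Python sketch does so for an illustrative step portfolio and an illustrative increasing price function; trading dates, positions and the price path are made-up numbers:

trade_times = [0.0, 0.25, 0.5, 0.75]   # t_0, ..., t_{n-1}; the horizon is T = 1
positions   = [10.0, -5.0, 8.0, 3.0]   # c_1, ..., c_n, units held on [t_{k-1}, t_k)
T = 1.0
price = lambda t: 100.0 + 10.0 * t - 4.0 * t * t   # increasing on [0, T]

def gains(t):
    # G_x(t): for each trading interval, add c_k times the price change
    # over the part of [t_{k-1}, t_k] that lies before t
    total = 0.0
    cuts = trade_times + [T]
    for c, t0, t1 in zip(positions, cuts, cuts[1:]):
        if t <= t0:
            break
        total += c * (price(min(t, t1)) - price(t0))
    return total

for t in (0.1, 0.3, 0.8, T):
    print(t, gains(t))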
where the supremum is taken over all subdivisions π = {x_i}_{i=0}^n ∈ Π of [a, b].

Intuitively, the total variation describes the variability of a function. Indeed, because of the absolute value, here the ups and downs of the function add up. So, the lower the total variation, the lower the cumulative magnitude of the variations that the function features (for instance, the reader can check that a function has zero total variation if and only if it is constant, so it has no variability).

The next definition singles out the class of functions that have finite variability.

A function has bounded variation if its variability is finite, however large it may be. Otherwise, we say that it is of unbounded variation.
A first simple property:

Proof Let g : [a, b] → R be of bounded variation. Given any x ∈ (a, b), if we take the subdivision {a, x, b} we have

$$\max\{|g(b)-g(x)|,|g(x)-g(a)|\}\le|g(b)-g(x)|+|g(x)-g(a)|\le t_g$$

yielding that |g(b) − g(x)| ≤ t_g and |g(x) − g(a)| ≤ t_g. So, for all x ∈ (a, b) we have min{g(b), g(a)} − t_g ≤ g(x) ≤ max{g(b), g(a)} + t_g, as desired. It is immediate to see that this inequality holds for all x ∈ [a, b], proving boundedness.
Proposition 1951 If f, g : [a, b] → R are two functions of bounded variation, then also the function αf + βg is of bounded variation for all α, β ∈ R.

Proof Let f, g : [a, b] → R be of bounded variation and let α, β ∈ R. For each subdivision π ∈ Π of [a, b], we have

$$\sum_{i=1}^{n}|(\alpha f+\beta g)(x_i)-(\alpha f+\beta g)(x_{i-1})|=\sum_{i=1}^{n}|\alpha(f(x_i)-f(x_{i-1}))+\beta(g(x_i)-g(x_{i-1}))|\le|\alpha|\sum_{i=1}^{n}|f(x_i)-f(x_{i-1})|+|\beta|\sum_{i=1}^{n}|g(x_i)-g(x_{i-1})|$$

So,

$$\sup_{\pi\in\Pi}\sum_{i=1}^{n}|(\alpha f+\beta g)(x_i)-(\alpha f+\beta g)(x_{i-1})|\le|\alpha|\sup_{\pi\in\Pi}\sum_{i=1}^{n}|f(x_i)-f(x_{i-1})|+|\beta|\sup_{\pi\in\Pi}\sum_{i=1}^{n}|g(x_i)-g(x_{i-1})|<+\infty$$
where the first equality follows from the monotonicity of g because g(x_i) ≥ g(x_{i−1}). The next key result shows that monotonicity and bounded variation are closely connected, thus clarifying the nature of functions of bounded variation.

⁹ So the space of the functions of bounded variation is an example of a vector space, as readers will study in more advanced courses.
$$g=g_1-g_2 \tag{47.33}$$

In words, a function is of bounded variation if and only if it can be written as the difference of two increasing functions. In particular, (47.33) is called the Jordan decomposition of g. Such a decomposition is not unique: given any increasing function h : [a, b] → R, we also have g = (g₁ + h) − (g₂ + h) (observe that g₁ + h and g₂ + h are increasing).
$$p(x)=\sup_{\pi\in\Pi_x}\sum_{i=1}^{n}[g(x_i)-g(x_{i-1})]^+=\sup_{\pi\in\Pi_x}\sum_{i=1}^{n}\max\{g(x_i)-g(x_{i-1}),0\}$$

$$n(x)=\sup_{\pi\in\Pi_x}\sum_{i=1}^{n}[g(x_i)-g(x_{i-1})]^-=\sup_{\pi\in\Pi_x}\sum_{i=1}^{n}\left(-\min\{g(x_i)-g(x_{i-1}),0\}\right)$$

$$t(x)=\sup_{\pi\in\Pi_x}\sum_{i=1}^{n}|g(x_i)-g(x_{i-1})|$$

where the supremum is taken over all subdivisions π = {x_i}_{i=0}^n ∈ Π_x of [a, x], that is, a = x₀ < x₁ < ... < x_n = x. Since g is of bounded variation, we have t(x) ∈ [0, ∞) for all x ∈ [a, b]. From 0 ≤ p(x) ≤ t(x) and 0 ≤ n(x) ≤ t(x), it then follows that p(x), n(x) ∈ [0, ∞) for all x ∈ [a, b]. It is easy to see that p, n and t are increasing functions.
In view of (44.13), we have that for each x ∈ [a, b] and each π ∈ Π_x

$$\sum_{i=1}^{n}[g(x_i)-g(x_{i-1})]^+-\sum_{i=1}^{n}[g(x_i)-g(x_{i-1})]^-=\sum_{i=1}^{n}\max\{g(x_i)-g(x_{i-1}),0\}-\left(-\sum_{i=1}^{n}\min\{g(x_i)-g(x_{i-1}),0\}\right)=\sum_{i=1}^{n}\left[\max\{g(x_i)-g(x_{i-1}),0\}+\min\{g(x_i)-g(x_{i-1}),0\}\right]=\sum_{i=1}^{n}[g(x_i)-g(x_{i-1})]=g(x)-g(a)$$
That is,

$$g(x)=p(x)-[n(x)-g(a)]\qquad\forall x\in[a,b] \tag{47.34}$$

Since the functions p : [a, b] → R and n − g(a) : [a, b] → R are both increasing, we conclude that g is the difference of two increasing functions defined on [a, b]. So, (47.34) is the sought-after decomposition.

Assume that g is continuous. We show that t : [a, b] → R is continuous. Let x̄ ∈ (a, b]. We first show that t is continuous at x̄ from the left, i.e., lim_{x→x̄⁻} t(x) = t(x̄). Let x_k ↑ x̄. Fix ε > 0. By Theorem 603, g is uniformly continuous. So, there exists δ > 0 such that

$$|x-x'|<\delta\implies|g(x)-g(x')|<\frac{\varepsilon}{2}\qquad\forall x,x'\in[a,b] \tag{47.35}$$

By the definition of t, there exists a subdivision π ∈ Π_{x̄} of [a, x̄] such that

$$t(\bar x)-\sum_{i=1}^{n}|g(x_i)-g(x_{i-1})|\le\frac{\varepsilon}{2}$$

and x̄ − x_{n−1} < δ. Otherwise, if x̄ − x_{n−1} ≥ δ one can always add points to the subdivision π, something that in any case preserves the last inequality because it increases the term Σ_{i=1}^n |g(x_i) − g(x_{i−1})|. So, by (47.35) we have |g(x̄) − g(x_{n−1})| < ε/2. In turn, this implies

$$t(\bar x)-t(x_{n-1})\le t(\bar x)-\sum_{i=1}^{n-1}|g(x_i)-g(x_{i-1})|\le\varepsilon \tag{47.36}$$

Since x_k ↑ x̄, there exists k_ε such that x_{n−1} ≤ x_k ≤ x̄ for all k ≥ k_ε. Since t is increasing, from (47.36) we have

$$0\le t(\bar x)-t(x_k)\le\varepsilon\qquad\forall k\ge k_\varepsilon$$

This implies lim_{x→x̄⁻} t(x) = t(x̄), as desired. A similar argument shows that lim_{x→x̄⁺} t(x) = t(x̄) for x̄ ∈ [a, b), that is, t is right continuous. So, lim_{x→x̄} t(x) = t(x̄) and we conclude that t is continuous at x̄ for all x̄ ∈ [a, b]. In turn, this implies that the functions p, n : [a, b] → R are both continuous. Indeed, from t = p + n and (47.34) it follows that

$$n(x)=\frac{t(x)-g(x)+g(a)}{2}\qquad\text{and}\qquad p(x)=\frac{t(x)+g(x)-g(a)}{2}$$

If g is continuous, then the increasing functions p : [a, b] → R and n − g(a) : [a, b] → R are both continuous.
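The construction of p and n in this proof can be mimicked numerically. In the Python sketch below, g(x) = sin 2πx on [0, 1] is an illustrative continuous function of bounded variation, and a fine grid stands in for the supremum over subdivisions; the decomposition (47.34) is checked along the way:

import math

g = lambda x: math.sin(2 * math.pi * x)   # continuous, of bounded variation

n_pts = 10_000
xs = [i / n_pts for i in range(n_pts + 1)]   # grid on [0, 1]

p = n = 0.0
for x0, x1 in zip(xs, xs[1:]):
    d = g(x1) - g(x0)
    p += max(d, 0.0)    # positive variation increment
    n += -min(d, 0.0)   # negative variation increment
    # check (47.34): g(x) = p(x) - [n(x) - g(a)], here with g(a) = g(0) = 0
    assert abs(g(x1) - (p - (n - g(0.0)))) < 1e-9

print(p, n, p + n)   # ~2, ~2, and total variation t_g ~ 4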
In view of Jordan's Theorem, functions of bounded variation inherit the following remarkable continuity property of monotone functions (cf. Proposition 564).
Corollary 1953 A function of bounded variation can have at most countably many jump discontinuities.

As readers will learn in more advanced courses, monotone functions defined on an interval are differentiable at "almost" every point. Jordan's Theorem allows us to conclude that functions of bounded variation are also differentiable at almost every point. Thus, we can say that nowhere differentiable functions, in particular Weierstrass' monsters (Section 26.15), are examples of functions of unbounded variation. Indeed, their graphs feature frantic ups and downs that add up to +∞. Moreover, this observation shows that continuity is not sufficient to guarantee bounded variation (Weierstrass' monsters are continuous functions). The result below though shows that a stronger form of continuity, namely Lipschitz continuity, is enough.
So,

$$\sum_{i=1}^{n}|g(x_i)-g(x_{i-1})|\le\sum_{i=1}^{n}k(x_i-x_{i-1})=k(b-a)$$

as desired.

We close with a "differential" criterion for bounded variation (cf. Example 896), which leads to an interesting integral characterization of total variation that sheds further light on its nature as a variability measure.
Proof Since g′ is bounded, there exists a constant k > 0 such that |g′(x)| ≤ k for all x ∈ (a, b). Let x₁, x₂ ∈ [a, b]. Without loss of generality, assume that x₁ ≤ x₂. If x₁ = x₂, then trivially |g(x₂) − g(x₁)| ≤ k|x₂ − x₁|. By the Mean Value Theorem, if x₁ < x₂, then there exists x̂ ∈ (x₁, x₂) such that

$$\frac{|g(x_2)-g(x_1)|}{|x_2-x_1|}=|g'(\hat x)|\le k$$

Since |g′| is integrable, it follows that ∫_a^b |g′(x)| dx = inf_{π∈Π} S(|g′|, π), yielding that

$$\inf_{\pi\in\Pi}S(|g'|,\pi)<\sup_{\pi\in\Pi}\sum_{i=1}^{n}|g(x_i)-g(x_{i-1})|$$
It follows that there exist two subdivisions π′ = {x′_i}_{i=0}^{n′} ∈ Π and π″ = {x″_i}_{i=0}^{n″} ∈ Π such that

$$\sum_{i=1}^{n'}\sup_{x\in[x'_{i-1},x'_i]}|g'(x)|\,(x'_i-x'_{i-1})<\sum_{i=1}^{n''}\left|g(x''_i)-g(x''_{i-1})\right|$$
Example 1957 In Example 1372, we saw that the oscillating (so highly non-monotone) function g : R → R defined by

$$g(x)=\begin{cases}x^2\sin\frac{1}{x} & x\ne0\\0 & x=0\end{cases}$$

is differentiable, with

$$g'(x)=\begin{cases}2x\sin\frac{1}{x}-\cos\frac{1}{x} & x\ne0\\0 & x=0\end{cases}$$

By the last proposition, g is of bounded variation on [−1, 1]. In contrast, the reader can check that the function g : R → R defined by

$$g(x)=\begin{cases}x\sin\frac{1}{x} & x\ne0\\0 & x=0\end{cases}$$

is not of bounded variation on [−1, 1]. N
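The contrast in this example can be seen numerically. In the Python sketch below (grid sizes are illustrative), the variation sums of x² sin(1/x) stabilize as the grid is refined, while those of x sin(1/x) keep growing, albeit slowly:

import math

def variation(g, n):
    xs = [-1.0 + 2.0 * i / n for i in range(n + 1)]   # grid on [-1, 1]
    return sum(abs(g(b) - g(a)) for a, b in zip(xs, xs[1:]))

g_bv  = lambda x: x * x * math.sin(1.0 / x) if x != 0 else 0.0   # bounded variation
g_ubv = lambda x: x * math.sin(1.0 / x) if x != 0 else 0.0       # unbounded variation

for n in (10_000, 100_000, 1_000_000):
    print(n, variation(g_bv, n), variation(g_ubv, n))
# the second column stabilizes; the third keeps increasing with n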
is denoted by ∫_a^b f(x) dg(x) and is called the integral in the sense of Stieltjes (or Stieltjes integral) of f with respect to g on [a, b].
In the special case when g is increasing, we get back to the earlier definition of the Stieltjes integral. More importantly, this definition is well posed. Indeed, consider two Jordan decompositions g = g₁ − g₂ = g₁′ − g₂′. Then, g₁ + g₂′ = g₁′ + g₂ and so by (47.21) we have

$$\int_a^b f\,dg_1+\int_a^b f\,dg_2'=\int_a^b f\,d(g_1+g_2')=\int_a^b f\,d(g_1'+g_2)=\int_a^b f\,dg_1'+\int_a^b f\,dg_2$$

The value of the integral is thus independent of the specific Jordan decomposition considered, so the definition is well posed.

Integrators of bounded variation substantially extend the scope of Stieltjes integrals. For example, in the gains' process (47.30) we can consider any price function p of bounded variation, not necessarily increasing (a demanding assumption).
(i) f is continuous;
Proof (i) Since g is of bounded variation, there exist two increasing functions g₁, g₂ : [a, b] → R such that g = g₁ − g₂. By Proposition 1938, the integrals ∫_a^b f dg₁ and ∫_a^b f dg₂ exist. So, the integral ∫_a^b f dg exists and is given by the difference (47.38).

(ii) Since f and g are of bounded variation, there exist increasing functions f₁, f₂, g₁, g₂ : [a, b] → R such that f = f₁ − f₂ and g = g₁ − g₂. In particular, since g is continuous, we can assume that both g₁ and g₂ are continuous. By Proposition 1938, the integrals ∫_a^b f₁ dg₁, ∫_a^b f₁ dg₂, ∫_a^b f₂ dg₁ and ∫_a^b f₂ dg₂ exist. So, by (47.20) the integrals ∫_a^b f dg₁ = ∫_a^b (f₁ − f₂) dg₁ = ∫_a^b f₁ dg₁ − ∫_a^b f₂ dg₁ and ∫_a^b f dg₂ = ∫_a^b (f₁ − f₂) dg₂ = ∫_a^b f₁ dg₂ − ∫_a^b f₂ dg₂ exist. In turn, this implies the existence of the integral ∫_a^b f dg = ∫_a^b f dg₁ − ∫_a^b f dg₂.
A consequence of the last proposition is the following general integration by parts formula that greatly extends the earlier formula (47.28).

Proposition 1960 Given any two continuous functions of bounded variation f, g : [a, b] → R, it holds

$$\int_a^b f\,dg+\int_a^b g\,df=f(b)g(b)-f(a)g(a)$$
Proof The integrals ∫_a^b f dg and ∫_a^b g df exist by the last proposition. Let f₁, f₂, g₁, g₂ : [a, b] → R be as in the last proof. By Proposition 1938, the integrals ∫_a^b f₁ dg₁, ∫_a^b f₁ dg₂, ∫_a^b f₂ dg₁ and ∫_a^b f₂ dg₂ exist. Then,

$$\begin{aligned}\int_a^b f\,dg+\int_a^b g\,df &=\int_a^b f_1\,dg_1-\int_a^b f_2\,dg_1-\int_a^b f_1\,dg_2+\int_a^b f_2\,dg_2\\ &\quad+\int_a^b g_1\,df_1-\int_a^b g_2\,df_1-\int_a^b g_1\,df_2+\int_a^b g_2\,df_2\\ &=\left(\int_a^b f_1\,dg_1+\int_a^b g_1\,df_1\right)-\left(\int_a^b f_2\,dg_1+\int_a^b g_1\,df_2\right)\\ &\quad-\left(\int_a^b f_1\,dg_2+\int_a^b g_2\,df_1\right)+\left(\int_a^b f_2\,dg_2+\int_a^b g_2\,df_2\right)\\ &=f_1(b)g_1(b)-f_1(a)g_1(a)-(f_2(b)g_1(b)-f_2(a)g_1(a))\\ &\quad-(f_1(b)g_2(b)-f_1(a)g_2(a))+f_2(b)g_2(b)-f_2(a)g_2(a)\\ &=f_1(b)g(b)-f_1(a)g(a)-f_2(b)g(b)+f_2(a)g(a)\\ &=f(b)g(b)-f(a)g(a)\end{aligned}$$

as desired.
Other results established for increasing integrators extend to bounded variation ones, as readers can check. Here we close by noting that full-fledged linearity holds for the general Stieltjes integral with integrators of bounded variation: if g₁, g₂ : [a, b] → R are functions of bounded variation, then

$$\int_a^b f\,d(\alpha g_1+\beta g_2)=\alpha\int_a^b f\,dg_1+\beta\int_a^b f\,dg_2\qquad\forall\alpha,\beta\in\mathbb{R}$$

In contrast, in property (ii) of Section 47.4 we remarked that for increasing integrators only positive coefficients α and β were permitted.
In words, finer subdivisions feature higher variations. To see why this is the case, take the unit interval and let

$$\pi=\left\{0,\frac12,1\right\}\qquad\text{and}\qquad\pi'=\left\{0,\frac14,\frac12,\frac34,1\right\} \tag{47.40}$$

Then

$$t_g^{\pi}=\left|g\left(\frac12\right)-g(0)\right|+\left|g(1)-g\left(\frac12\right)\right|\le\left|g\left(\frac12\right)-g\left(\frac14\right)\right|+\left|g\left(\frac14\right)-g(0)\right|+\left|g(1)-g\left(\frac34\right)\right|+\left|g\left(\frac34\right)-g\left(\frac12\right)\right|=t_g^{\pi'}$$
where finer subdivisions are identified via smaller meshes |π|. Next we make this notion of limit along subdivisions rigorous.

When the total variation is infinite, i.e., t_g = +∞, this limit characterization has a natural version. In particular, we write lim_{|π|→0} t_g^π = +∞ when, for every M > 0, there exists δ_M > 0 such that

$$|\pi|<\delta_M\implies t_g^{\pi}>M\qquad\forall\pi\in\Pi$$

The analogy with (12.9) is obvious.
Proof "Only if". We assume that L = t_g and show that (47.42) holds.¹⁰ Fix ε > 0. Since sup_{π∈Π} t_g^π = t_g = L ∈ [0, ∞), there exists a subdivision π̃ = {x̃_i}_{i=0}^{ñ} such that t_g − t_g^{π̃} < ε/2. By (47.39), we have

$$t_g-t_g^{\pi}<\frac{\varepsilon}{2}\qquad\forall\pi\supseteq\tilde\pi \tag{47.43}$$

Since g is continuous on [a, b], by Theorem 603 it is also uniformly continuous on [a, b]. So, there exists δ̃ > 0 such that |x − y| < δ̃ implies

$$|g(x)-g(y)|<\frac{\varepsilon}{4\tilde n}$$

¹⁰ The proof is based on Wheeden and Zygmund (2015) p. 22.
for all x, y ∈ [a, b]. Let δ_ε = min{δ̃, m}, where m = min_{{i : x̃_i ≠ x̃_{i−1}}} |x̃_i − x̃_{i−1}| > 0. Let π = {x_i}_{i=0}^n ∈ Π be any subdivision such that |π| < δ_ε. Since δ_ε ≤ m, each interval (x_{i−1}, x_i) can contain at most one element of the subdivision π̃. Denote by I the collection of i ∈ {1, ..., n} such that (x_{i−1}, x_i) contains one element of π̃. Clearly, I has at most ñ elements. This implies

$$t_g^{\pi}=\sum_{i=1}^{n}|g(x_i)-g(x_{i-1})|=\sum_{i\in I}|g(x_i)-g(x_{i-1})|+\sum_{i\notin I}|g(x_i)-g(x_{i-1})|\le\sum_{i\in I}\left(|g(x_i)-g(\tilde x_i)|+|g(\tilde x_i)-g(x_{i-1})|\right)+\sum_{i\notin I}|g(x_i)-g(x_{i-1})|=t_g^{\tilde\pi\cup\pi}$$

where x̃_i is the unique element of the subdivision π̃ that belongs to (x_{i−1}, x_i). We have

$$\sum_{i\in I}\left(|g(x_i)-g(\tilde x_i)|+|g(\tilde x_i)-g(x_{i-1})|\right)\le2\tilde n\,\frac{\varepsilon}{4\tilde n}=\frac{\varepsilon}{2}$$
Let π̂ be any subdivision such that |π̂| < δ_ε. Consider π′ = π̂ ∪ π. It is immediate to see that |π′| < δ_ε and, by (47.39), that t_g^{π′} ≥ t_g^{π̂} > L + 1 > L. This would imply that

$$1<t_g^{\pi'}-L=\left|L-t_g^{\pi'}\right|<1$$

a contradiction, proving that t_g is finite. We are left to show that indeed t_g = L. Let ε > 0. Since t_g is finite, there exists a subdivision π̃ such that t_g − t_g^{π̃} < ε/2. Moreover, recall that

$$t_g-t_g^{\pi}<\frac{\varepsilon}{2}\qquad\forall\pi\supseteq\tilde\pi$$

Since (47.42) holds, let π be any subdivision such that |π| < δ_{ε/2}, so that |L − t_g^π| < ε/2. If we consider π″ = π ∪ π̃, it follows that π″ ⊇ π̃ and |π″| < δ_{ε/2}, yielding that

$$|t_g-L|\le\left(t_g-t_g^{\pi''}\right)+\left|t_g^{\pi''}-L\right|<\frac{\varepsilon}{2}+\frac{\varepsilon}{2}=\varepsilon$$

Since ε > 0 was arbitrarily chosen, this implies that |t_g − L| = 0, that is, t_g = L.
N.B. In a similar spirit, we can formalize the suggestive limit (44.28) as follows: a function f : [a, b] → R is Riemann integrable, with ∫_a^b f(x) dx = I ∈ R, if and only if for every ε > 0 there exists δ_ε > 0 such that

$$|\pi|<\delta_\varepsilon\implies\left|\sum_{i=1}^{n}f(x'_i)\Delta x_i-I\right|<\varepsilon$$

for any chosen {x′_i}, with x′_i ∈ [x_{i−1}, x_i], and any subdivision π ∈ Π. In this case, we write

$$\lim_{|\pi|\to0}\sum_{i=1}^{n}f(x'_i)\Delta x_i=\int_a^b f(x)\,dx$$
The limit (47.41) clarifies that only arbitrarily fine subdivisions matter for total variation. For all scalars a we trivially have a² ≤ |a| if |a| ≤ 1. If g is continuous on [a, b], so uniformly continuous, over such arbitrarily fine subdivisions we have (g(x_i) − g(x_{i−1}))² ≤ |g(x_i) − g(x_{i−1})|. One can then conjecture that, by assessing variations over subdivisions via squares rather than via absolute values, one may get a smaller notion of variation. All this motivates the following definition.
Definition 1962 The second (total) variation of a function g : [a, b] → R is the quantity

$$t_g^2=\sup_{\pi\in\Pi}\sum_{i=1}^{n}(g(x_i)-g(x_{i-1}))^2$$

where the supremum is taken over all subdivisions π = {x_i}_{i=0}^n ∈ Π of [a, b].
Variations are now described through squares rather than absolute values. Remarkably, a continuous function of bounded variation has zero second variation.

Proof Assume that t_g < +∞. By Theorem 603, g is uniformly continuous on [a, b]. Fix ε > 0. There exists δ_ε > 0 such that |x_i − x_{i−1}| < δ_ε implies |g(x_i) − g(x_{i−1})| < ε. Now, fix a subdivision π = {x_i}_{i=0}^n and take a finer subdivision π′ = {x′_i}_{i=0}^{n′} such that |π′| < δ_ε. Then

$$\sum_{i=1}^{n}(g(x_i)-g(x_{i-1}))^2=\sum_{i=1}^{n}|g(x_i)-g(x_{i-1})|\,|g(x_i)-g(x_{i-1})|\le\sum_{i=1}^{n'}\left|g(x'_i)-g(x'_{i-1})\right|\left|g(x'_i)-g(x'_{i-1})\right|\le\varepsilon\sum_{i=1}^{n'}\left|g(x'_i)-g(x'_{i-1})\right|\le\varepsilon t_g$$
In view of this result, a function can have either finite variability, or infinite variability and finite volatility, or both infinite variability and infinite volatility. For two functions f, g : [a, b] → R, we thus have the following mutually exclusive comparisons:

(i) f exhibits less variability than g if t_f ≤ t_g < +∞;

(ii) f exhibits less volatility than g if t_f = t_g = +∞ and t²_f ≤ t²_g < +∞.
That said, let

$$t_g^{\pi,2}=\sum_{i=1}^{n}(g(x_i)-g(x_{i-1}))^2$$

so that t²_g = sup_{π∈Π} t_g^{π,2}. Remarkably, the monotonicity property (47.39) no longer holds for the second variation. Indeed, take for instance the subdivisions π and π′ in (47.40) and consider the identity function g(x) = x; we have t_g^{π,2} = 1/2 > t_g^{π′,2} = 1/4.
The failure of the monotonicity property is a major difference between total variation and second variation. In particular, it means that the limit lim_{|π|→0} t_g^{π,2} is trickier to handle and its relations with t²_g are less clear. That said, we close with a notion of second-order variation defined directly through limits. To this end, say that a sequence of subdivisions π^k is tightly nested if π^k ⊆ π^{k+1} for all k ≥ 1 and lim_{k→+∞} |π^k| = 0. So, subdivisions in a tightly nested sequence are nested one into another – i.e., subdivision π^{k+1} is obtained from subdivision π^k by adding one or more points – and their meshes vanish.
Definition 1964 The quadratic variation of a function g : [a, b] → R is the quantity

$$s_g^2=\lim_{k\to+\infty}\sum_{i=1}^{n_k}\left(g(x_i^k)-g(x_{i-1}^k)\right)^2$$

where the limit is taken along a tightly nested sequence of subdivisions π^k = {x_i^k}_{i=0}^{n_k} ∈ Π of [a, b].
For the quadratic variation to be well defined, the limit has to be independent of the specific tightly nested sequence considered. In this case, we clearly have s²_g ≤ t²_g. A function might thus have finite quadratic variation and yet infinite second variation. Indeed, in probability theory this notion is used – for instance, in dealing with Brownian phenomena – as readers will learn in more advanced courses.

For two functions f, g : [a, b] → R, we now have three mutually exclusive comparisons:

(i) f exhibits less variability than g if t_f ≤ t_g < +∞;

(ii) f exhibits less volatility than g if t_f = t_g = +∞ and t²_f ≤ t²_g < +∞;

(iii) f exhibits less "quadratic" volatility than g if t²_f = t²_g = +∞ and s²_f ≤ s²_g < +∞.
Chapter 48

Introductory Probability Theory
48.1 Measures

Let 2^Ω be the power set of a set Ω, that is, the collection

$$2^{\Omega}=\{A:A\subseteq\Omega\}$$

of all its subsets, typically denoted by A and B. When Ω is finite, with cardinality |Ω|, the cardinality of the power set 2^Ω is 2^{|Ω|}.¹

Therefore, set functions are functions with domain the power set 2^Ω and codomain the real line.²
Example 1966 (i) Let Ω = {ω₁, ω₂, ω₃} be a set with three elements. Its power set is

$$2^{\Omega}=\{\emptyset,\{\omega_1\},\{\omega_2\},\{\omega_3\},\{\omega_1,\omega_2\},\{\omega_1,\omega_3\},\{\omega_2,\omega_3\},\Omega\}$$

(ii) Let Ω be the set of all citizens of a country. A subset A of Ω represents a group of citizens. A basic set function is the counting measure ♯ : 2^Ω → R that associates to each group of citizens the number of its members, i.e.,

$$\sharp(A)=|A|\qquad\forall A\subseteq\Omega$$

Assume that, say using data from a population census, we can construct the function α : Ω → R that associates the age α(ω) to each citizen ω. The set function ᾱ : 2^Ω → R defined

¹ See Proposition 280. If we write in extenso Ω = {ω₁, ..., ω_n}, the cardinality of 2^Ω is 2ⁿ.

² In the notation of Definition 177, we have A = 2^Ω and B = R.
by

$$\bar\alpha(A)=\begin{cases}\dfrac{1}{|A|}\displaystyle\sum_{\omega\in A}\alpha(\omega) & \text{if } A\ne\emptyset\\[8pt] 0 & \text{if } A=\emptyset\end{cases}$$

indicates the average age of each group A of citizens.³ For example, if A is the subset of the female citizens, ᾱ(A) is their average age. On the other hand, ᾱ(Ω) is the average age within the country. N
(v) normalized if μ(Ω) = 1.
Example 1968 (i) The counting measure ♯ is readily seen to satisfy all these properties except the last one, that is, it is grounded, positive, monotone and additive, but not normalized.

(ii) The average age set function ᾱ is grounded and positive, but does not satisfy the other properties. Intuitively, ᾱ is not monotone because, by enlarging a group, the average age can either increase or decrease: for instance, the average age of a group of undergraduate students increases (decreases) if seniors (toddlers) join them. As to additivity, for each pair of nonempty disjoint subsets A and B we have only the subadditivity property ᾱ(A ∪ B) < ᾱ(A) + ᾱ(B). Indeed,

$$\begin{aligned}\bar\alpha(A\cup B)&=\frac{\sum_{\omega\in A\cup B}\alpha(\omega)}{|A\cup B|}=\frac{\sum_{\omega\in A}\alpha(\omega)+\sum_{\omega\in B}\alpha(\omega)}{|A\cup B|}\\[4pt]&=\frac{\sum_{\omega\in A}\alpha(\omega)}{|A\cup B|}+\frac{\sum_{\omega\in B}\alpha(\omega)}{|A\cup B|}\\[4pt]&=\frac{|A|}{|A\cup B|}\,\frac{\sum_{\omega\in A}\alpha(\omega)}{|A|}+\frac{|B|}{|A\cup B|}\,\frac{\sum_{\omega\in B}\alpha(\omega)}{|B|}\\[4pt]&<\frac{\sum_{\omega\in A}\alpha(\omega)}{|A|}+\frac{\sum_{\omega\in B}\alpha(\omega)}{|B|}=\bar\alpha(A)+\bar\alpha(B)\end{aligned}$$
(iii) Let Ω be the set of all taxpayers of a country. Now, let τ : Ω → [0, ∞) be the function that indicates, for each taxpayer ω, the amount of taxes paid τ(ω). The set function T : 2^Ω → [0, ∞) defined by

$$T(A)=\sum_{\omega\in A}\tau(\omega)$$

records the total amount of taxes paid by a group A of taxpayers. It is easy to see that the taxpayer set function is, like the counting measure, grounded, positive, monotone and additive, but not normalized. Its normalized version is given by the set function P : 2^Ω → [0, ∞) defined by P = T/T(Ω), i.e., by

$$P(A)=\frac{1}{T(\Omega)}\sum_{\omega\in A}\tau(\omega)\qquad\forall A\subseteq\Omega$$

Since T(Ω) is the total amount of taxes collected in the country, P(A) is the proportion of taxes paid by a group A of taxpayers. For example, if A consists of the taxpayers working in the industrial sector and P(A) = 1/4, it means that these taxpayers bear 25% of the overall tax burden. The set function P satisfies all the properties (i)-(v). N
Definition 1969 A grounded, positive and additive set function is called a (positive) measure.
Proposition 1971 Let μ : 2^Ω → [0, ∞) be a measure. For every finite collection {A₁, ..., A_n} of pairwise disjoint subsets of Ω,⁴ it holds

$$\mu\left(\bigcup_{i=1}^{n}A_i\right)=\sum_{i=1}^{n}\mu(A_i) \tag{48.1}$$

⁴ That is, A_i ∩ A_j = ∅ for all distinct indices i, j ∈ {1, ..., n}.
This property is called finite additivity. It generalizes additivity, which is the special case n = 2.

Proof We proceed by induction. Initial step: for n = 2 equality (48.1) holds by the additivity of the measure μ. Induction step: suppose that this equality holds for n − 1 (induction hypothesis). We want to show that it holds for n. Consider a collection of pairwise disjoint events {A₁, ..., A_n}. Set A = ∪_{i=1}^{n−1} A_i. It holds A_i ∩ A_n = ∅ for all i = 1, ..., n − 1. Thus,

$$A\cap A_n=\left(\bigcup_{i=1}^{n-1}A_i\right)\cap A_n=\bigcup_{i=1}^{n-1}(A_i\cap A_n)=\emptyset$$

By the induction hypothesis, μ(∪_{i=1}^{n−1} A_i) = Σ_{i=1}^{n−1} μ(A_i). Since μ is additive, we conclude that

$$\mu\left(\bigcup_{i=1}^{n}A_i\right)=\mu\left(\left(\bigcup_{i=1}^{n-1}A_i\right)\cup A_n\right)=\mu(A)+\mu(A_n)=\sum_{i=1}^{n-1}\mu(A_i)+\mu(A_n)=\sum_{i=1}^{n}\mu(A_i)$$

as desired.
Thus, the measure of a finite set is uniquely pinned down by the measures of its elements. Indeed, the singletons {ω_i} are pairwise disjoint and so, by (48.1),

$$\mu(A)=\mu(\{\omega_1,...,\omega_n\})=\mu\left(\bigcup_{i=1}^{n}\{\omega_i\}\right)=\sum_{i=1}^{n}\mu(\omega_i)$$

without using indices. In particular, when the space Ω itself is finite we can write

$$\mu(\Omega)=\sum_{\omega\in\Omega}\mu(\omega)$$
$$|A\cup B|+|A\cap B|=|A|+|B|$$

$$A=(A-B)\cup(A\cap B)\qquad\text{and}\qquad(A-B)\cap(A\cap B)=\emptyset$$

$$\mu(A-B)=\mu(A)-\mu(A\cap B)$$

$$\mu(B-A)=\mu(B)-\mu(A\cap B)$$

$$A\cup B=(A-B)\cup(A\cap B)\cup(B-A)$$

$$\begin{aligned}\mu(A\cup B)&=\mu(A-B)+\mu(A\cap B)+\mu(B-A)\\&=\mu(A)-\mu(A\cap B)+\mu(A\cap B)+\mu(B)-\mu(A\cap B)\\&=\mu(A)+\mu(B)-\mu(A\cap B)\end{aligned}$$

as desired.
48.2 Probabilities

48.2.1 Generalities

The most important class of measures μ : 2^Ω → [0, ∞) are the normalized ones, i.e., those for which μ(Ω) = 1. They play a fundamental role in the study of uncertainty, as their name suggests.

We can write P : 2^Ω → [0, 1] because a probability takes values only in [0, 1]. Indeed, by monotonicity (Proposition 1970) we have, for each A ⊆ Ω,
Example 1974 (i) An agent bets on the outcome of the toss of a single coin, winning (losing) if the coin lands heads (tails) up. There are two states, Head and Tail, so the state space is

$$\Omega=\{H,T\}$$

Its power set

$$2^{\Omega}=\{\emptyset,\{H\},\{T\},\Omega\}$$

consists of 2² = 4 events. When, instead, the bet depends on the toss of two coins, the state space becomes the Cartesian product:⁷

$$\Omega=\{H,T\}\times\{H,T\}=\{HH,HT,TH,TT\}$$

Its power set now consists of 2⁴ = 16 events. For instance, the event A = {HH, HT} obtains when "the first toss is heads", while the event B = {HT, TH} obtains when "the two tosses have different outcomes".

⁶ In this chapter we use the terms "agent" and "decision maker" interchangeably.

⁷ To ease notation, we write HH instead of (H, H) and so on.
(ii) Now, our agent bets on the outcome of the roll of a die. There are six states, one per die face, numbered 1 to 6. The state space is

$$\Omega=\{1,...,6\}$$

Its power set consists of 2⁶ = 64 events. For example, the event A = {2, 4, 6} obtains when "an even face comes out".

(iii) Finally, our agent bets on the drawing of a ball from an urn containing 100 balls, numbered from 1 to 100. In this case, the state space is

$$\Omega=\{1,...,100\}$$

with a power set with 2¹⁰⁰ events (sic!). For instance, event A = {1, 2, 3, 4, 5} obtains when "a ball is drawn with a number ≤ 5". N
Next we present a few classic probabilities.

Example 1975 (i) The simplest example of a probability on a finite state space Ω, like the ones just seen in the last example, is the uniform probability P that assigns the same probability to all states, i.e.,

$$P(\omega)=\frac{1}{|\Omega|}\qquad\forall\omega\in\Omega$$

In the single coin state space Ω = {H, T}, the uniform P assigns equal probability to heads and tails, i.e.,

$$P(H)=P(T)=\frac12$$

It models the toss of a fair coin. In the two-coin state space Ω = {TH, TT, HT, HH}, fair coins deliver the uniform probability P defined by

$$P(TH)=P(TT)=P(HT)=P(HH)=\frac14$$

Similarly, the uniform probability P on the roll-of-a-die state space Ω = {1, ..., 6} is defined by

$$P(1)=P(2)=P(3)=P(4)=P(5)=P(6)=\frac16$$

It models an unbiased die. Finally, the uniform probability P on the urn state space Ω = {1, ..., 100} is defined by

$$P(n)=\frac{1}{100}\qquad\forall n\in\Omega$$

It models a blind drawing.
Example 1976 Fix a state ω₀ in any state space Ω, finite or infinite. The set function P : 2^Ω → R defined by

$$P(A)=\begin{cases}1 & \text{if }\omega_0\in A\\0 & \text{if }\omega_0\notin A\end{cases}$$

is easily checked to be a probability. It assigns probability 1 to any event containing state ω₀ and probability 0 otherwise. It is denoted by δ_{ω₀}.
Example 1977 Take Ω = N = {0, 1, ..., n, ...}, i.e., the states are the natural numbers. Fix a scalar λ > 0 and define the scalar sequence:⁸

$$p_n=e^{-\lambda}\frac{\lambda^n}{n!}\qquad\forall n\in\mathbb{N} \tag{48.5}$$

For each event A ⊆ N define the sequence {a_n} by

$$a_n=\begin{cases}1 & n\in A\\0 & n\notin A\end{cases} \tag{48.6}$$

or, equivalently, by

$$P(A)=\sum_{n\in A}p_n$$

for all A ⊆ N. For a singleton A = {n}, we get

$$P(n)=p_n$$

Thus, p_n is the probability of state n. The Poisson probability is well defined because the sandwich

$$0\le a_np_n\le p_n\qquad\forall n\in\mathbb{N}$$

implies that the positive series Σ_{n=0}^∞ a_n p_n converges to a number in [0, 1] (why?). When A = ∅ we trivially have a_n = 0 for all n ∈ N and so

$$P(\emptyset)=\sum_{n=0}^{\infty}a_np_n=0$$

When A = N we, instead, have a_n = 1 for all n ∈ N and so, by Theorem 399,

$$P(\Omega)=\sum_{n=0}^{\infty}a_np_n=\sum_{n=0}^{\infty}p_n=\sum_{n=0}^{\infty}e^{-\lambda}\frac{\lambda^n}{n!}=1$$
To prove that P is a probability it remains to check additivity. Take two disjoint events A and B in N. As in (48.6), define the sequence {a_n} for the event A. In a similar way, define the sequence {b_n} for the event B and the sequence {c_n} for the event A ∪ B. Since A and B are disjoint, it is easy to see that c_n = a_n + b_n for all n ∈ N. In turn, this implies that

$$P(A\cup B)=\sum_{n=0}^{\infty}c_np_n=\sum_{n=0}^{\infty}a_np_n+\sum_{n=0}^{\infty}b_np_n=P(A)+P(B)$$

proving additivity. When the Poisson probability is used, Ω = N is often interpreted as time. For example, state n may describe the state "a light bulb breaks after n periods". N

⁸ These are the coefficients of the Poisson power series (Example 474).
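A quick numerical sanity check of the Poisson probability (the rate λ = 2.5 and the truncation point are illustrative; the neglected tail is negligible here):

import math

lam = 2.5
p = lambda n: math.exp(-lam) * lam ** n / math.factorial(n)   # (48.5)

N = 100  # truncation point for numerical sums
P = lambda A: sum(p(n) for n in A)

print(sum(p(n) for n in range(N)))   # ~1: P(Omega) = 1, up to the tiny tail
A, B = {0, 2, 4}, {1, 3}             # two disjoint events
print(P(A | B), P(A) + P(B))         # equal: additivity on disjoint events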
Probabilities over the naturals thus involve series. The Poisson probability suggests a general rule to construct these probabilities.

The arguments used in the Poisson case of the last example yield that P is a probability. N

Example 1979 The average taxpayer set function P : 2^Ω → R is, formally, a probability measure. Of course, in this case the uncertainty interpretation is meaningless. As always, it is important to distinguish interpretation and formal analysis (which might well admit alternative interpretations). N
$$A\subseteq B\implies P(A)\le P(B)$$

As natural, larger events are more likely to obtain. Additivity readily implies that, for each event A,

$$P(A^c)=1-P(A) \tag{48.8}$$

Thus, either an event or its complement obtains, tertium non datur.

Finite additivity also holds for probabilities, i.e.,

$$P\left(\bigcup_{i=1}^{n}A_i\right)=\sum_{i=1}^{n}P(A_i) \tag{48.9}$$

for any collection {A₁, ..., A_n} of pairwise disjoint events (Proposition 1971). In particular, for finite events we have, by (48.3),

$$P(A)=\sum_{\omega\in A}P(\omega) \tag{48.10}$$

That is, the probability of a finite event is just the sum of the probabilities of its states. For instance, in the last example for the two-coin event A = {HH, HT} we have

$$P(A)=P(HH)+P(HT)$$

That is, the probability that the first toss is heads is the sum of the probabilities of the states HH, two consecutive heads, and HT, first heads and then tails. When the coins are fair this event has probability 1/2 since

$$P(A)=\frac14+\frac14=\frac12$$

Clearly, in this two-coin example we have

$$1=P(\Omega)=P(TH)+P(TT)+P(HT)+P(HH)$$

that is, at least one of the four states obtains. In general, when the state space Ω is finite we have

$$1=P(\Omega)=\sum_{\omega\in\Omega}P(\omega) \tag{48.11}$$
Definition 1980 A probability measure P : 2^Ω → [0, 1] is simple if there exists a finite event E such that P(E) = 1.

Probabilities defined on a finite state space are trivially simple. This notion gets traction when the state space is infinite. In this case, it requires that all the mass be concentrated on a finite set of states E. Indeed, by (48.8) we have

$$P(E^c)=1-P(E)=0 \tag{48.12}$$

So, event E gets all the mass, nothing is left in its complement E^c.
Example 1981 (i) Dirac probabilities are simple: the set E can be chosen to be the singleton {ω₀}. (ii) Take Ω = R and let P : 2^R → R be the probability with P(−2) = 1/3 and P(π) = 2/3, that is, for each event A,

$$P(A)=\begin{cases}1 & -2,\pi\in A\\[2pt]\frac13 & -2\in A\text{ and }\pi\notin A\\[2pt]\frac23 & \pi\in A\text{ and }-2\notin A\\[2pt]0 & -2,\pi\notin A\end{cases} \tag{48.13}$$
Proof Let ω ∉ E. Then, {ω} ⊆ E^c and so, by (48.12) and the monotonicity of P,

$$0\le P(\omega)\le P(E^c)=0$$

Thus, P(ω) = 0.
The finite event E is not unique. Indeed, once such an event is found, any larger finite event can play the same role: if F ⊇ E then 1 ≥ P(F) ≥ P(E) = 1 and so P(F) = 1. Yet, there is a smallest one. To this end, consider the set

$$\{\omega\in\Omega:P(\omega)>0\}$$

called the support of P and denoted by supp P. It consists of all states that the probability P actually deems possible. For instance, for the simple probability (48.13) we have supp P = {−2, π}.
Lemma 1983 The support of a simple probability P : 2^Ω → [0, 1] is a finite event with probability 1, that is,

$$P(\operatorname{supp}P)=1$$

Moreover, P(A) = 1 implies supp P ⊆ A for all events A.

Proof Since P is simple, by definition there exists a finite event E with P(E) = 1. By Lemma 1982, supp P ⊆ E. Thus, supp P is a finite event. Since P is additive, we have

This proves that P(supp P) = 1. To conclude, let A be any event with P(A) = 1. We want to show that supp P ⊆ A. Suppose, per contra, that there exists ω ∈ supp P such that ω ∉ A. As P(ω) > 0, by the additivity of P we have
Proposition 1984 Let P : 2^Ω → [0, 1] be a simple probability measure. For each event A,

$$P(A)=\sum_{\omega\in A\cap\operatorname{supp}P}P(\omega) \tag{48.14}$$

In words, the probability of an event is the sum of the probabilities of its states that belong to the probability support.

$$0=P((\operatorname{supp}P)^c)\ge P(A\cap(\operatorname{supp}P)^c)\ge0$$

as desired.
Corollary 1985 Let P : 2^Ω → [0, 1] be a simple probability measure. For each event A,

$$P(A)=\sum_{\omega\in\operatorname{supp}P}P(\omega)\,\delta_{\omega}(A) \tag{48.16}$$

In words, P is the weighted sum of the Dirac probabilities centered at the points of its support, with weights given by the probabilities of these points. For instance, the simple probability (48.13) can be written as the sum

$$P=\frac13\delta_{-2}+\frac23\delta_{\pi}$$

of the two Dirac probabilities centered at the two points −2 and π of its support.
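Computationally, a simple probability is fully described by its support, as (48.16) suggests. A minimal Python sketch stores the probability (48.13) through its support and evaluates events given as predicates (the representation is an illustrative design choice):

import math

support = {-2.0: 1 / 3, math.pi: 2 / 3}   # P(omega) for omega in supp P

def P(A):
    # P(A) = sum over the support of P(omega) * delta_omega(A), as in (48.16)
    return sum(w for omega, w in support.items() if A(omega))

print(P(lambda w: True))               # P(Omega) = 1, up to float rounding
print(P(lambda w: w < 0))              # 1/3: only -2 is negative
print(P(lambda w: w not in support))   # 0: all the mass sits on the support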
for any finite collection {A_i}_{i=1}^n of pairwise disjoint events. The property of σ-additivity extends this property to countable collections of this kind.

$$A_n=\left[0,1-\frac{1}{n}\right]$$

and A = [0, 1), we have A_n ↑ A. Finally, if A_n = (−n, n) we have A_n ↑ R, while if A_n = [n, +∞) we have A_n ↓ ∅. N
The next proposition shows that countable additivity is, as previously mentioned, a continuity property of probabilities. To this end, observe that the monotonicity of probabilities implies

$$A_n\uparrow A\implies P(A_1)\le\cdots\le P(A_n)\le\cdots\le1$$

as well as

$$A_n\downarrow A\implies P(A_1)\ge\cdots\ge P(A_n)\ge\cdots\ge0$$

In both cases, lim_n P(A_n) exists because it is the limit of a bounded monotone sequence of scalars {P(A_n)}. What characterizes countable additivity is that the value of these limits is indeed P(A), hence its continuity nature.
Proposition 1988 Let P : 2^Ω → [0, 1] be a probability. The following statements are equivalent:

(i) P is countably additive;

(ii) if A_n ↑ A, then P(A_n) ↑ P(A);

(iii) if A_n ↓ A, then P(A_n) ↓ P(A).

Proof (i) implies (ii). Consider a countable collection of events {A_n} with A_n ↑ A. Define the collection of events {E_n} by setting

$$E_1=A_1,\quad E_2=A_2-A_1,\quad\dots,\quad E_n=A_n-\bigcup_{i=1}^{n-1}A_i,\quad\dots$$

By construction, the events {E_n} are pairwise disjoint. Next, note that A_n = ∪_{i=1}^n E_i for all n ≥ 1 and, in particular, A = ∪_{i=1}^∞ E_i. Since P is countably additive, we conclude that

$$P(A)=P\left(\bigcup_{i=1}^{\infty}E_i\right)=\sum_{i=1}^{\infty}P(E_i)=\lim_n\sum_{i=1}^{n}P(E_i)=\lim_nP(A_n)$$

as desired.

(ii) implies (iii). Consider a countable collection of events {A_n} with A_n ↓ A. Clearly, A_n^c ↑ A^c. By hypothesis,

$$P(A)=1-P(A^c)=1-\lim_nP(A_n^c)=\lim_n[1-P(A_n^c)]=\lim_nP(A_n)$$

as desired.

(iii) implies (i). Consider a countable collection {A_i} of pairwise disjoint events. Define

$$E_n=\bigcup_{i=1}^{n}A_i$$

for all n ≥ 1. By construction, E_n ↑ ∪_{i=1}^∞ A_i and so E_n^c ↓ (∪_{i=1}^∞ A_i)^c. By hypothesis,

$$P(E_n^c)\downarrow P\left(\left(\bigcup_{i=1}^{\infty}A_i\right)^c\right)$$

Thus,

$$P\left(\bigcup_{i=1}^{\infty}A_i\right)=1-P\left(\left(\bigcup_{i=1}^{\infty}A_i\right)^c\right)=1-\lim_nP(E_n^c)=\lim_n[1-P(E_n^c)]=\lim_nP(E_n)=\lim_n\sum_{i=1}^{n}P(A_i)=\sum_{i=1}^{\infty}P(A_i)$$

as desired.
By (48.16),

$$P(A)=\sum_{\omega\in\operatorname{supp}P}P(\omega)\,\delta_{\omega}(A)\qquad\forall A\subseteq\Omega$$
We next show that the Poisson probability is also countably additive. Similar arguments will then yield that, more generally, all the probabilities on the naturals defined in Example 1978 are countably additive. To prove this property we first further elaborate on the continuity conditions that we showed to characterize countable additivity. Interestingly, it is enough to check continuity at either the empty set or the entire space.
Lemma 1990 Let P : 2^Ω → [0, 1] be a probability. The following statements are equivalent:

Proof By Proposition 1988, (ii) implies (iii). As (iii) trivially implies its special case (iv), it remains to prove that (i) implies (ii) and that (iv) implies (i).

(i) implies (ii). Consider a countable collection of events {A_n} with A_n ↑ A. For each n ≥ 1, define

$$B_n=A_n\cup A^c$$

(iv) implies (i). Consider a countable collection of events {A_n} with A_n ↑ Ω. Clearly, A_n^c ↓ ∅. By hypothesis, P(A_n^c) ↓ 0. This implies that

as desired.
$$\bar n=\max_{l\in\{1,...,k_\varepsilon\}}n_l$$

for all n ≥ n̄. By the definition of the Poisson probability, we conclude that, for each n ≥ n̄,

$$P(A_n)=1-P(A_n^c)\le1-P(\{1,...,k_\varepsilon\})=1-\sum_{k=1}^{k_\varepsilon}p_k=\sum_{k=k_\varepsilon+1}^{\infty}p_k\le\sum_{k=k_\varepsilon}^{\infty}p_k\le\varepsilon$$

Hence, 0 ≤ P(A_n) ≤ ε for all n ≥ n̄. As ε was arbitrarily chosen, this proves that lim_n P(A_n) = 0, as desired.
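Continuity at the empty set can be watched numerically for the Poisson probability: with A_n = {n, n + 1, ...} we have A_n ↓ ∅, and P(A_n) is a tail sum that vanishes. A Python sketch (rate and truncation point are illustrative):

import math

lam = 2.5
p = lambda k: math.exp(-lam) * lam ** k / math.factorial(k)

def tail(n, N=100):
    # P(A_n) approximated by the truncated tail sum; the series converges fast
    return sum(p(k) for k in range(n, N))

for n in (1, 5, 10, 20):
    print(n, tail(n))   # decreases to 0, illustrating P(A_n) down to 0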
Proof Suppose, per contra, that there exists a countably additive probability P : 2^N → [0, 1] satisfying (48.18). Set k = P(n) for all n ∈ N. Clearly, k ≥ 0. As

$$\mathbb{N}=\bigcup_{n\in\mathbb{N}}\{n\}$$
Example 1994 (i) Back to Example 1974, consider an agent who bets on the outcome of the toss of a single coin, winning (losing) 50 euros if the coin lands heads (tails) up. The function f : {H, T} → R defined by

$$f(H)=50\ ;\qquad f(T)=-50 \tag{48.19}$$
$$\Omega=[0,\infty)$$

this future price being the payoff-relevant contingency. The financial operation is represented by the random variable f : [0, ∞) → R defined by

$$f(\omega)=\max\{\omega-50,0\}-p\qquad\forall\omega\in[0,\infty)$$

Indeed, if ω < 50 the investor will not exercise the option and will just bear its cost p. If ω ≥ 50, the investor will instead exercise the option, with a gain of ω − 50 − p euros because, one year from now, the unit of the asset is paid at the agreed strike price of 50 euros and sold at its market price ω. N
Example 1995 The plant of a manufacturing company is subject to a failure that can be either small or large. To repair a small (large) failure costs 100 (1000) euros and takes one (five) days of production interruption, with a profit loss of 300 euros per day. The state space is

$$\Omega=\{s,l,n\}$$

where n covers the happy case of no failure. The random variable f : Ω → R defined by

$$f(\omega)=\begin{cases}100+300 & \text{if }\omega=s\\1000+5\cdot300 & \text{if }\omega=l\\0 & \text{if }\omega=n\end{cases}=\begin{cases}400 & \text{if }\omega=s\\2500 & \text{if }\omega=l\\0 & \text{if }\omega=n\end{cases}$$

represents the uncertain loss of the company. This random variable may then be used in a decision problem, for instance in the choice of either an insurance or a maintenance contract. Yet, when the random variable is constructed this possible decision problem is not specified. N
Let us call the pair (Ω, P) a probability space, i.e., a space Ω endowed with a probability measure. In this probabilistic context, when can we declare two random variables to be indistinguishable? Of course, if they are equal at all states, they are trivially indistinguishable. But this trivial case neglects the probabilistic information that P embodies. To use it, let us consider the following notion.

Definition 1996 Two random variables f, g : Ω → R are equal P-almost everywhere (for short, P-a.e.) when

$$P(\omega\in\Omega:f(\omega)=g(\omega))=1$$

or, more compactly, when P(f = g) = 1.

In words, two random variables are equal P-a.e. when the set of all states where they are equal has probability 1. Equivalently, when

$$P(\omega\in\Omega:f(\omega)\ne g(\omega))=0$$

that is, when the set of states where they differ has probability zero. In this case, we regard them as probabilistically indistinguishable, thus answering the previous question. As usual, simple probabilities clarify.
Proof \If". Let f (!) = g (!) for all ! 2 supp P . Then, supp P (f = g) and so, by the
monotonicity of P , we have P (f = g) = 1. \Only if". Suppose P (f = g) = 1. By Lemma
1983, supp P (f = g). Hence, f (!) = g (!) for all ! 2 .
In the simple case, two random variables are thus equal P -a.e. when they agree on the
support of the probability P , so on the states that P deems possible. What happens at
the zero probability states is not relevant: according to P they will not occur and so the
behavior of the random variables at these states is of no probabilistic concern.
Definition 1998 The expected (or mean) value of a random variable f : Ω → R with respect to a simple probability P is the quantity

$$\mathbb{E}_P(f)=\sum_{\omega\in\operatorname{supp}P}f(\omega)P(\omega)$$

The expected value considers the images f(ω) of the states in the support of P and adds them up, weighted according to their probability P(ω). It is thus a general notion of weighted average (cf. Section 15.10).
Example 1999 (i) Back to Example 1981, let P : 2^R → [0, 1] be the simple probability with P(−2) = 1/3 and P(π) = 2/3. The expected value of a random variable f : R → R is

$$\mathbb{E}_P(f)=f(-2)P(-2)+f(\pi)P(\pi)=\frac{f(-2)+2f(\pi)}{3}$$

(ii) In the single coin toss example, assume that P deems heads and tails equally likely. The expected value of bet (48.19) is

$$\mathbb{E}_P(f)=f(H)P(H)+f(T)P(T)=50\cdot\frac12+(-50)\cdot\frac12=0$$

Bets with a zero expected value are called fair.

(iii) In the plant example, take P with P(s) = P(l) = 1/4 and P(n) = 1/2. Then,

$$\mathbb{E}_P(f)=f(s)P(s)+f(l)P(l)+f(n)P(n)=400\cdot\frac14+2500\cdot\frac14=725$$

is the average loss of the company. N
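The three computations of this example are reproduced in the Python sketch below (in case (i) the random variable f(x) = x² is an illustrative choice, since the example leaves f generic):

import math

def expected(f, P):
    # E_P(f) = sum over supp P of f(omega) * P(omega)
    return sum(f(w) * pr for w, pr in P.items())

# (i) P(-2) = 1/3, P(pi) = 2/3, with the illustrative f(x) = x^2
print(expected(lambda x: x * x, {-2.0: 1 / 3, math.pi: 2 / 3}))

# (ii) the fair coin bet (48.19): expected value 0, a fair bet
print(expected({"H": 50, "T": -50}.get, {"H": 0.5, "T": 0.5}))

# (iii) the plant losses: average loss 725
print(expected({"s": 400, "l": 2500, "n": 0}.get, {"s": 0.25, "l": 0.25, "n": 0.5}))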
We begin with a key invariance property of expected values, an immediate but important consequence of Proposition 1997.

Thus, random variables that are probabilistically indistinguishable have the same expected value. Next we discuss a few other basic properties of expected values. In particular, they are monotone and linear.

(ii) E_P(f) ≤ E_P(g) if f ≤ g;

(iii) E_P(f) = Σ_{ω∈A} f(ω)P(ω) if the event A is finite and contains supp P.
(ii) Let f ≤ g, i.e., f(ω) ≤ g(ω) for all ω ∈ Ω. As P ≥ 0, we then have f(ω)P(ω) ≤ g(ω)P(ω) for all ω ∈ Ω. In turn, this implies

$$\sum_{\omega\in\operatorname{supp}P}f(\omega)P(\omega)\le\sum_{\omega\in\operatorname{supp}P}g(\omega)P(\omega)$$

(iii) Let A be a finite event with A ⊇ supp P. By Lemma 1982, P(ω) = 0 for all ω ∈ A with ω ∉ supp P. Thus,

$$\sum_{\omega\in\operatorname{supp}P}f(\omega)P(\omega)=\sum_{\omega\in A}f(\omega)P(\omega)$$

as desired.
The states ω that belong to A but not to supp P have zero probability, so they are superfluous but do not create problems either. This simple remark is important when dealing with a finite state space Ω. In this case one often writes

$$\mathbb{E}_P(f)=\sum_{\omega\in\Omega}f(\omega)P(\omega)$$

without bothering to specify the support of P. We can afford this neglect because when Ω is finite the sum is always well defined.
$$P_i=P(\omega_i)$$

$$P=\left(\frac14,\frac14,\frac16,\frac13\right)\in\Delta^3$$

$$P(\omega_1)=P(\omega_2)=\frac14\ ;\quad P(\omega_3)=\frac16\ ;\quad P(\omega_4)=\frac13$$

The probability of the other twelve events can then be found via formula (48.10). N
Example 2003 In the single coin toss example, assume that the coin is fair. The two states H and T are then equally likely, that is, P(H) = P(T) = 1/2. The probability P is identified by the vector

$$\left(\frac12,\frac12\right)\in\Delta^1$$

In a similar vein, in the example of a roll of a die assume that each face is equally likely, that is, P(ω) = 1/6 for all ω ∈ Ω = {1, ..., 6}. In this case, the probability P is identified by the vector

$$\left(\frac16,\frac16,\frac16,\frac16,\frac16,\frac16\right)\in\Delta^5$$

N

⁹ Recall that Δ^{n−1} = {x ∈ R₊ⁿ : Σ_{i=1}^n x_i = 1} is the standard simplex of Rⁿ (see Example 774).
In sum, the standard simplex Δ^{n−1} can be seen as the collection of all possible probabilities defined on a finite state space Ω with n elements. Similarly, random variables f : Ω → R defined on this state space can be identified with vectors of Rⁿ. Indeed, for each index i = 1, ..., n now set

$$f_i=f(\omega_i) \tag{48.21}$$

We can then identify f with the vector

$$f=(f_1,...,f_n)\in\mathbb{R}^n$$

Thus, the space Rⁿ itself can be seen as the collection of all random variables that can be defined on a finite state space with n elements. We can then write the expected value as the inner product

$$\mathbb{E}_P(f)=f\cdot P=\sum_{i=1}^{n}f_iP_i \tag{48.22}$$

of vectors P ∈ Δ^{n−1} and f ∈ Rⁿ.
$$P=\left(\frac{2}{10},\frac{1}{10},\frac{3}{10},\frac{15}{100},\frac{25}{100}\right)\in\Delta^4$$

$$f=(5,10,8,6,4)\in\mathbb{R}^5$$

identifies a random variable on Ω, for instance representing a financial asset that pays 5 euros if ω₁ occurs, 10 euros if ω₂ occurs, 8 euros if ω₃ occurs, 6 euros if ω₄ occurs and 4 euros if ω₅ occurs. The expected value is

$$\mathbb{E}_P(f)=(5,10,8,6,4)\cdot\left(\frac{2}{10},\frac{1}{10},\frac{3}{10},\frac{15}{100},\frac{25}{100}\right)=\frac{63}{10}$$

$$P=\left(\frac{1}{n},...,\frac{1}{n}\right)\in\Delta^{n-1}$$

identifies the uniform probability P on Ω. The expected value of any random variable f = (f₁, ..., f_n) ∈ Rⁿ is then the arithmetic mean of its payoffs:

$$\mathbb{E}_P(f)=\frac{1}{n}\sum_{i=1}^{n}f_i$$

N
As previously mentioned (cf. Example 1994), financial assets are a notable example of random variables f : Ω → R, where f(ω) is the payment of the asset when state ω occurs. With a finite state space Ω = {ω₁, ..., ω_n}, an asset can be indicated with the vector

$$y=(y_1,...,y_n)\in\mathbb{R}^n$$

where

$$y_i=f(\omega_i)$$

is the payment of the asset when state ω_i ∈ Ω occurs. Thus, y is the vector of the possible payments of the asset in the different states. This notation, used for example in Section 24.6, is completely consistent with considering assets as random variables.

Example 2005 In the single coin toss example, consider the original bet f given by (48.19) as well as two other bets g and h with

$$g(H)=25\ ,\quad g(T)=-25\qquad\text{and}\qquad h(H)=h(T)=0$$

When P(H) = P(T) = 1/2, the three bets have a zero expected value. But, of course, the first random variable seems more variable than the second one, which in turn is, obviously, more variable than the third, constant, one. N
The variance helps to quantify the variability that distinguishes the three random variables of this example. In particular, at a state ω the quantity

$$(f(\omega)-\mathbb{E}_P(f))^2 \tag{48.23}$$

measures the deviation, in that state, of f from its expected value E_P(f). Since a priori we do not care about the specific direction of the deviation, we square the quantity to remove its sign. Of course, at different states ω we might well have different deviations (48.23). Thus, to get a representative measure of variability we have to average out these deviations through the probability P.

(i) V_P(f) = E_P(f²) − E_P(f)²;

The characterization of the variance in (i) is often useful. Point (ii) shows that all translates f + β of a random variable share the same variance, while its multiples αf get the coefficient squared.
$$V_P(f)=\mathbb{E}_P\left((f-\mathbb{E}_P(f))^2\right)=\mathbb{E}_P\left(f^2-2\mathbb{E}_P(f)f+\mathbb{E}_P(f)^2\right)=\mathbb{E}_P(f^2)-2\mathbb{E}_P(f)\mathbb{E}_P(f)+\mathbb{E}_P(f)^2=\mathbb{E}_P(f^2)-\mathbb{E}_P(f)^2$$

as desired.

$$(\alpha f(\omega)+\beta-\mathbb{E}_P(\alpha f+\beta))^2=\alpha^2(f(\omega)-\mathbb{E}_P(f))^2$$

By (48.24), we have

$$V_P(\alpha f+\beta)=\mathbb{E}_P\left((\alpha f+\beta-\mathbb{E}_P(\alpha f+\beta))^2\right)=\mathbb{E}_P\left(\alpha^2(f-\mathbb{E}_P(f))^2\right)=\alpha^2\mathbb{E}_P\left((f-\mathbb{E}_P(f))^2\right)=\alpha^2V_P(f)$$

as desired.
Another measure of variability, strictly related to the variance, is the standard deviation

$$\sigma_P(f)=\sqrt{V_P(f)}$$

It is nothing but the square root of the variance. It has the advantage over the variance of being expressed in the same units as the expected value: for instance, if the outcomes f(ω) of a random variable in the different states ω are expressed in euros, this is the case also for both the expected value and the standard deviation. Note that

$$\sigma_P(\alpha f+\beta)=\sqrt{\alpha^2V_P(f)}=|\alpha|\,\sigma_P(f)$$
in the opposite direction or independently from one another. To this end, observe that at a state ω the inequality

$$(f(\omega)-\mathbb{E}_P(f))(g(\omega)-\mathbb{E}_P(g))>0$$

reveals that

$$f(\omega)-\mathbb{E}_P(f)>0\iff g(\omega)-\mathbb{E}_P(g)>0$$

as well as

$$f(\omega)-\mathbb{E}_P(f)<0\iff g(\omega)-\mathbb{E}_P(g)<0$$

So f and g are, in state ω, moving in the same direction. As we did for the variance, to get a representative measure of co-variability we need to average the values (f(ω) − E_P(f))(g(ω) − E_P(g)) through the probability P.

We can interpret the inequality Cov_P(f, g) > 0 as saying that on average f and g move together, and the opposite inequality Cov_P(f, g) < 0 as saying that, instead, on average f and g move in opposite directions. Accordingly, we can interpret Cov_P(f, g) = 0 as saying that on average f and g move independently.
The variance can be seen to be the special case of a "solo" covariance:

$$\mathrm{Cov}_P(f,f)=V_P(f)$$

The next result extends to the covariance the properties established for the variance (cf. Proposition 2007).
as desired.

$$(\alpha f(\omega)+\beta-\mathbb{E}_P(\alpha f+\beta))(\gamma g(\omega)+\delta-\mathbb{E}_P(\gamma g+\delta))=\alpha\gamma(f(\omega)-\mathbb{E}_P(f))(g(\omega)-\mathbb{E}_P(g))$$

$$\mathrm{Cov}_P(\alpha f+\beta,\gamma g+\delta)=\mathbb{E}_P\left((\alpha f+\beta-\mathbb{E}_P(\alpha f+\beta))(\gamma g+\delta-\mathbb{E}_P(\gamma g+\delta))\right)=\mathbb{E}_P\left(\alpha\gamma(f-\mathbb{E}_P(f))(g-\mathbb{E}_P(g))\right)=\alpha\gamma\,\mathrm{Cov}_P(f,g)$$

as desired.
The covariance allows us to express the variance of a sum of random variables. At first sight, we might be tempted to say that the variability of f + g should be the variability of f plus the variability of g. Yet, even at an intuitive level, we may immediately realize that the volatility of the sum f + g may actually shrink when f and g move in opposite directions, that is, when Cov_P(f, g) < 0. The next formula formalizes this intuition:

$$V_P(f+g)=V_P(f)+V_P(g)+2\,\mathrm{Cov}_P(f,g)$$

as desired.
Proof Let supp P = {ω₁, ..., ω_n} be the support of the simple probability P. First, assume that E_P(f) = E_P(g) = 0. For each i = 1, ..., n, set

$$x_i=f(\omega_i)\sqrt{P(\omega_i)}\qquad\text{and}\qquad y_i=g(\omega_i)\sqrt{P(\omega_i)}$$
$$|\mathrm{Cov}_P(f,g)|=\left|\sum_{\omega\in\operatorname{supp}P}f(\omega)g(\omega)P(\omega)\right|=\left|\sum_{i=1}^{n}x_iy_i\right|=|x\cdot y|\le\|x\|\,\|y\|=\sqrt{\sum_{i=1}^{n}x_i^2}\,\sqrt{\sum_{i=1}^{n}y_i^2}=\sqrt{\sum_{\omega\in\operatorname{supp}P}f^2(\omega)P(\omega)}\,\sqrt{\sum_{\omega\in\operatorname{supp}P}g^2(\omega)P(\omega)}=\sigma_P(f)\,\sigma_P(g)$$
$$\tilde f=f-\mathbb{E}_P(f)\qquad\text{and}\qquad\tilde g=g-\mathbb{E}_P(g)$$

As E_P(f̃) = E_P(g̃) = 0, by what was just proved we have

$$|\mathrm{Cov}_P(\tilde f,\tilde g)|\le\sigma_P(\tilde f)\,\sigma_P(\tilde g)$$

By (ii) of Propositions 2007 and 2009, σ_P(f̃) = σ_P(f), σ_P(g̃) = σ_P(g) and Cov_P(f̃, g̃) = Cov_P(f, g). We conclude that (48.26) holds for all random variables f, g : Ω → R.
$$\rho_P(\alpha f+\beta,\gamma g+\delta)=\frac{\alpha\gamma}{|\alpha|\,|\gamma|}\,\rho_P(f,g) \tag{48.27}$$

for all α, β, γ, δ ∈ R with α, γ ≠ 0. Thus, when α, γ > 0 we have

$$\rho_P(\alpha f+\beta,\gamma g+\delta)=\rho_P(f,g)$$

In words, the correlation coefficient is invariant under positive affine transformations.¹⁰ This implies, inter alia, that it does not depend on the units in which the random variables are expressed: for instance, if f and g are financial assets, their correlation coefficient is the same regardless of the currency in which they are expressed, say dollars or euros.

By the inequality (48.26), it holds

$$-1\le\rho_P(f,g)\le1$$

i.e., the correlation coefficient takes values between −1 and 1. The extreme values ±1 correspond to a perfect correlation between the random variables that, intuitively, occurs when there is a linear relationship between them. The next result confirms this intuition.

¹⁰ A transformation φ : R → R is affine if it is an affine function φ(x) = αx + β, with α, β ∈ R. Clearly, φ(f) = αf + β. It is a positive affine transformation when α > 0 (in this case φ is strictly increasing).
$$|\rho_P(f,g)|=1$$

if and only if there exist α ≠ 0 and β ∈ R such that, P-almost everywhere,

$$f=\alpha g+\beta \tag{48.28}$$

In particular,

Thus, perfect correlation corresponds to a linear, P-a.e., relationship between the random variables f and g, positive when ρ_P(f, g) = 1 and negative when ρ_P(f, g) = −1.

as desired.

"Only if". Let |ρ_P(f, g)| = 1. By proceeding as in the last proof, it is easy to see that by the Cauchy-Schwarz equality (Theorem 109) there exist α₁, α₂ ∈ R, not both zero, such that

$$\alpha_1\tilde f(\omega)=\alpha_2\tilde g(\omega)\qquad\forall\omega\in\operatorname{supp}P$$

Hence, by setting β = α₁E_P(f) − α₂E_P(g), we get

By setting α = α₂/α₁ we conclude that (48.28) holds. Finally, we leave (48.29) to the reader.
48.7 Intermezzo

To continue the analysis we state and prove a neat version of the fundamental duality between differentiation and integration, which we discussed earlier in the book.^{12} A piece of notation: we denote by $C_0^1([a,b])$ the class of the continuously differentiable functions $g : [a,b] \to \mathbb{R}$ such that $g(a) = 0$.^{13}

^{11} That is, $f(\omega) \ne f(\omega')$ and $g(\omega) \ne g(\omega')$ for some $\omega, \omega' \in \mathrm{supp}\,P$ ($f$ and $g$ are not constant on the support of $P$).
^{12} Recall the discussion around equations (44.62) and (44.64).
^{13} The condition $g(a) = 0$ is a normalization needed to make the duality sharp: indeed, primitives are unique up to a constant (cf. Proposition 1874), and this condition pins down one of them.
Theorem A function $g : [a,b] \to \mathbb{R}$ belongs to $C_0^1([a,b])$ if and only if there is a (unique) $\varphi \in C([a,b])$ such that
$$g(x) = \int_a^x \varphi(t)\,dt \qquad \forall x \in [a,b] \tag{48.31}$$

The bijective function $T : C_0^1([a,b]) \to C([a,b])$ that to each $g \in C_0^1([a,b])$ associates the function $\varphi \in C([a,b])$ satisfying (48.31) is the differential operator
$$T(g) = g' \tag{48.32}$$
Its inverse function $T^{-1} : C([a,b]) \to C_0^1([a,b])$, which to each $\varphi \in C([a,b])$ associates $T^{-1}(\varphi) \in C_0^1([a,b])$, is the integral operator
$$T^{-1}(\varphi)(x) = \int_a^x \varphi(t)\,dt \qquad \forall x \in [a,b]$$
The function $T$ describes the duality between differentiation and integration, which thus works at its best for continuously differentiable functions, a further sign of the special role that this class of functions plays in calculus (cf. Section 27.4).^{14}
Proof "If". Assume that there is a $\varphi \in C([a,b])$ such that $g(x) = \int_a^x \varphi(t)\,dt$ for all $x \in [a,b]$. By the Second Fundamental Theorem of Calculus, we have $g' = \varphi$ and, trivially, $g(a) = 0$. So, $g \in C_0^1([a,b])$.

"Only if". Assume that $g \in C_0^1([a,b])$. By the First Fundamental Theorem of Calculus and since $g(a) = 0$, $g(x) = \int_a^x g'(t)\,dt$ for all $x \in [a,b]$. So, $\varphi$ can be chosen to be $g' \in C([a,b])$. It remains to prove that $\varphi$ is unique. If $\psi \in C([a,b])$ is such that $g(x) = \int_a^x \psi(t)\,dt$ for all $x \in [a,b]$, by the Second Fundamental Theorem of Calculus we then have $\psi = g'$, proving uniqueness.

Consider now $T$. Since to each function $g \in C_0^1([a,b])$, $T$ associates the function $\varphi \in C([a,b])$ that satisfies (48.31), we have already seen that $\varphi = g'$, proving (48.32). We next show that $T$ is bijective. We start by showing it is injective. Let $g_1, g_2 \in C_0^1([a,b])$ be such that $T(g_1) = T(g_2)$. Set $\varphi = T(g_1) = T(g_2)$. Thus, $\varphi$ satisfies (48.31) for both $g_1$ and $g_2$, that is, $g_1(x) = \int_a^x \varphi(t)\,dt = g_2(x)$ for all $x \in [a,b]$, and so $g_1 = g_2$. It remains to show that $T$ is surjective. Let $\varphi \in C([a,b])$. Define $g$ as in (48.31). The previous part of the proof and the Second Fundamental Theorem of Calculus imply that $g \in C_0^1([a,b])$ and $T(g) = g' = \varphi$, as desired.
48.8 Distribution functions

The distribution function $\Phi : \mathbb{R} \to [0,1]$ of a random variable $f : \Omega \to \mathbb{R}$ under a probability $P$ is defined by $\Phi(x) = P(f \le x)$. The next continuity result is a dividend of countable additivity, a further sign of the continuity nature of this property.

Proposition 2016 If $P$ is countably additive, then $\Phi$ is right continuous, with
$$\lim_{x \to -\infty} \Phi(x) = 0 \quad\text{and}\quad \lim_{x \to +\infty} \Phi(x) = 1$$
Proof Take $x \in \mathbb{R}$ and a scalar sequence $\{x_n\}$ with $x_n \downarrow x$. Consider the set $A_n = (f \le x_n)$ for all $n \ge 1$. Since $\{x_n\}$ is a decreasing sequence, we have $A_{n+1} \subseteq A_n$ for all $n \ge 1$. It holds that
$$\bigcap_n A_n = (f \le x)$$
For, if $f(\omega) \le x$ then $f(\omega) \le x \le x_n$ for all $n \ge 1$. Vice versa, if $f(\omega) \le x_n$ for all $n \ge 1$, by passing to the limit we get $f(\omega) \le x$. By Proposition 1988, we then have
$$\lim_n \Phi(x_n) = \lim_n P(A_n) = P(f \le x) = \Phi(x)$$
proving the right continuity. Take now a sequence $\{x_n\}$ with $x_n \downarrow -\infty$. Again, consider the set $A_n = (f \le x_n)$ for all $n \ge 1$. Since $\{x_n\}$ is a decreasing sequence, we have that $A_{n+1} \subseteq A_n$ for all $n \ge 1$ as well as $\bigcap_n A_n = \emptyset$ (why?). By Proposition 1988, this implies that $\lim_n \Phi(x_n) = \lim_n P(A_n) = P(\emptyset) = 0$. Since $\Phi$ is increasing, we have $\lim_{x \to -\infty} \Phi(x) = 0$. We leave it to the reader to show that $\lim_{x \to +\infty} \Phi(x) = 1$.
At a point $x_0$ the distribution function $\Phi$ may jump. Next we show that, remarkably, the size of this jump is the probability $P(f = x_0)$ that $f$ takes on the value $x_0$. In so doing, we also characterize the continuity of $\Phi$. The key observation is the inclusion, for every $\varepsilon > 0$,
$$(f \le x_0 - \varepsilon) \subseteq (f < x_0) \subseteq (f \le x_0)$$
Example 2018 Let $\delta_{\omega_0}$ be the Dirac probability centered at some state $\omega_0 \in \Omega$. The distribution function of a random variable $f : \Omega \to \mathbb{R}$ is given by
$$\Phi(x) = \begin{cases} 0 & \text{if } x < x_0 \\ 1 & \text{if } x \ge x_0 \end{cases}$$
where $x_0 = f(\omega_0)$.
$$P([-5,5]) = 1$$
That is, $P$ concentrates all its mass on the interval $[-5,5]$. For the quadratic function $f(x) = x^2$ it holds that
$$P(-25 \le f \le 25) = 1$$
Thus, $f$ is unbounded, yet probabilistically bounded under our $P$: with probability $1$ it takes values between $-25$ and $25$. This motivates the following definition.
Definition 2019 A random variable $f : \Omega \to \mathbb{R}$ is essentially bounded (under $P$) if there exist scalars $m \le M$ such that $P(m \le f \le M) = 1$.

All bounded random variables are, trivially, essentially bounded. When $P$ is simple, all random variables are, again trivially, essentially bounded. Next we show that, remarkably, distribution functions of essentially bounded random variables eventually become constant.

Proposition 2020 If $f : \Omega \to \mathbb{R}$ is essentially bounded, there exist scalars $a$ and $b$ such that $\Phi(x) = 0$ for all $x \le a$ and $\Phi(x) = 1$ for all $x \ge b$.

Proof Since $f$ is essentially bounded, there exist scalars $m \le M$ such that $P(m \le f \le M) = 1$. Let $x \ge M$. Then
$$(m \le f \le M) \subseteq (f \le x)$$
and so
$$1 \ge \Phi(x) = P(f \le x) \ge P(m \le f \le M) = 1$$
Thus, $\Phi(x) = 1$. Now, if we set $b = M$ and take any $a < m$, it holds that $\Phi(x) = 0$ for all $x \le a$ and $\Phi(x) = 1$ for all $x \ge b$, as desired.
Example 2021 (i) We say that a random variable $f : \Omega \to \mathbb{R}$ is simple when it takes only finitely many distinct values, that is, $\mathrm{Im}\,f = \{x_1, \dots, x_n\}$. Without loss of generality, we assume that
$$x_1 < \cdots < x_n$$
The probability that $f$ takes the value $x_i$ is given by
$$P\big(f^{-1}(x_i)\big) = \Phi(x_i) - \Phi(x_i^-)$$
for each $i = 1, \dots, n$. With this, the distribution function is easily seen to be the step function
$$\Phi(x) = \begin{cases} 0 & \text{if } x < x_1 \\ P\big(f^{-1}(x_1)\big) & \text{if } x_1 \le x < x_2 \\ P\big(f^{-1}(x_1)\big) + P\big(f^{-1}(x_2)\big) & \text{if } x_2 \le x < x_3 \\ \ \vdots & \\ \sum_{i=1}^n P\big(f^{-1}(x_i)\big) = 1 & \text{if } x \ge x_n \end{cases} \tag{48.36}$$
Any interval $[a, x_n]$ with $a < x_1$ is a carrier of $\Phi$. Observe that $\Phi$ is right continuous, so $\Phi(x_i^+) = \Phi(x_i)$, even though $P$ is not required to be countably additive.
(ii) Random variables defined on a finite set $\Omega$ are automatically simple, so their distribution function is a step function. For instance, in the production plant example, we have the simple random variable
$$f(\omega) = \begin{cases} 400 & \text{if } \omega = s \\ 2500 & \text{if } \omega = l \\ 0 & \text{if } \omega = n \end{cases}$$
Here $\mathrm{Im}\,f = \{0, 400, 2500\}$ and
$$P(n) = P\big(f^{-1}(0)\big)\ ;\quad P(l) = P\big(f^{-1}(2500)\big)\ ;\quad P(s) = P\big(f^{-1}(400)\big)$$
The distribution function of $f$ is the right-continuous step function
$$\Phi(x) = \begin{cases} 0 & \text{if } x < 0 \\ P(n) & \text{if } 0 \le x < 400 \\ P(n) + P(s) & \text{if } 400 \le x < 2500 \\ P(n) + P(s) + P(l) = 1 & \text{if } x \ge 2500 \end{cases}$$
Any interval $[a, 2500]$ with $a < 0$ is a carrier of $\Phi$. If, for concreteness, we take $P$ with $P(s) = P(l) = 1/4$ and $P(n) = 1/2$, we have
$$\Phi(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1/2 & \text{if } 0 \le x < 400 \\ 3/4 & \text{if } 400 \le x < 2500 \\ 1 & \text{if } x \ge 2500 \end{cases}$$
Observe that there is no smallest carrier: the interval $[0, 2500]$ is not a carrier since $\Phi(0) = 1/2$.

(iii) When the probability $P$ itself is simple, the distribution function of any random variable $f$ is a right-continuous step function, as the reader can check. N
The plant example shows that smallest carriers need not exist. Example 2023-(i) will show that, on the other hand, they may exist. More importantly, Example 2023-(ii) will show that carriers might well not exist at all. As will become clear as the analysis unfolds, for our purposes what matters is the existence of carriers, not their possible minimality.
A density function of a distribution function $\Phi$ is a positive function $\phi$ such that
$$\Phi(x) = \int_{-\infty}^x \phi(t)\,dt \qquad \forall x \in \mathbb{R}$$
in particular when $\Phi$ has a carrier $[a,b]$. In this case, a necessary condition for a distribution function to be an integral function is to be continuous (see Proposition 1881). So, integrable density functions require continuous distribution functions.
Example 2023 (i) Given any two scalars $a < b$, consider the uniform distribution function
$$\Phi(x) = \begin{cases} 0 & \text{if } x < a \\ \dfrac{x-a}{b-a} & \text{if } a \le x \le b \\ 1 & \text{if } x > b \end{cases}$$
The interval $[a,b]$ is the smallest carrier of $\Phi$. Its density function, called uniform, is
$$\phi(x) = \begin{cases} \dfrac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{else} \end{cases}$$
because
$$\int_{-\infty}^x \phi(t)\,dt = \int_a^x \frac{1}{b-a}\,dt = \Phi(x) \qquad \forall x \in [a,b]$$
and $\int_{-\infty}^{+\infty} \phi(x)\,dx = 1$.

(ii) The Gaussian distribution function is
$$\Phi(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2}{2}}\,dt$$
This distribution has no carriers. Its density function, called Gaussian, is
$$\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}$$
because $\int_{-\infty}^{+\infty} \phi(t)\,dt = 1$ (see Section 45.5). N
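As a quick sanity check, a simple midpoint Riemann sum confirms numerically that both densities integrate to 1; the interval endpoints below are hypothetical.

    import math

    def integral(phi, lo, hi, m=20_000):
        """Midpoint Riemann sum of phi over [lo, hi]."""
        h = (hi - lo) / m
        return h * sum(phi(lo + (i + 0.5) * h) for i in range(m))

    a, b = -1.0, 3.0
    uniform = lambda x: 1 / (b - a) if a <= x <= b else 0.0
    gaussian = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

    print(integral(uniform, a, b))        # ~ 1
    print(integral(gaussian, -10, 10))    # ~ 1 (tails beyond +-10 are negligible)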
Continuous densities are especially important. Indeed, they are well behaved: they are zero where they should be.

Proposition 2024 Let $\Phi$ be a distribution function with a carrier $[a,b]$. If $\Phi$ has a continuous density function $\phi$, then
$$\phi(x) = 0$$
for all $x \notin [a,b]$.

Thus, a continuous density is zero outside any carrier of its distribution function.

Proof By definition, $\Phi(x) = \int_{-\infty}^x \phi(t)\,dt$ for all $x \in \mathbb{R}$. Fix $z \ge b$. Since $\int_{-\infty}^{+\infty} \phi(x)\,dx = \int_a^b \phi(x)\,dx = 1$, we have $\int_b^z \phi(x)\,dx = 0$ for all $z \ge b$. By Corollary 1869, $\phi(t) = 0$ for all $b \le t \le z$. Since $z$ was chosen arbitrarily, we conclude that $\phi(z) = 0$ for all $z \ge b$. A similar argument shows that $\phi(z) = 0$ for all $z \le a$.
This result requires continuity: we can always make $\phi$ discontinuous, so non-zero, at any point $\tilde{x} \notin [a,b]$ as follows:
$$\tilde{\phi}(x) = \begin{cases} \phi(x) & \text{if } x \ne \tilde{x} \\ \tilde{y} & \text{if } x = \tilde{x} \end{cases}$$
with $\tilde{y} \ne \phi(\tilde{x}) = 0$. By Theorem 1859, it still holds that $\Phi(x) = \int_{-\infty}^x \tilde{\phi}(t)\,dt$ for all $x \in \mathbb{R}$, so $\tilde{\phi}$ is still a density of $\Phi$. Yet, it is not $0$ outside $[a,b]$.
When $\Phi$ has a carrier, the Barrow-Torricelli Theorem has the following immediate, yet remarkable, consequence.

Proposition 2025 A distribution function $\Phi$ with a carrier $[a,b]$ has a unique continuous density function $\phi$ if and only if it is continuously differentiable. In this case, $\phi = \Phi'$.
48.9 Expected values II

Let us go back to expected values. The next result shows that, when $P$ is simple, the expected value of $f$ is a Stieltjes integral with respect to its distribution function $\Phi$.

Proposition 2026 Let $P$ be a simple probability and $f : \Omega \to \mathbb{R}$ a random variable whose distribution function $\Phi$ has a carrier $[a,b]$.^{17} Then $E_P(f) = \int_a^b x\,d\Phi$.

Proof Let $\mathrm{supp}\,P = \{\omega_1, \dots, \omega_n\}$ and set
$$x_i = f(\omega_i) \qquad \forall i = 1, \dots, n$$
To ease matters, assume that these values are distinct. It is then without loss of generality to let
$$x_1 < \cdots < x_n \tag{48.38}$$
Hence, $a < x_1$ and $b \ge x_n$ (why?). With this, for each $i = 1, \dots, n$ we have
$$P(\omega_i) = P(f = x_i)$$
and so
$$E_P(f) = \sum_{i=1}^n f(\omega_i) P(\omega_i) = \sum_{i=1}^n x_i P(f = x_i)$$
On the other hand, $\Phi$ is the step function (48.36), so
$$\int_a^b x\,d\Phi = \sum_{i=1}^n x_i \big(\Phi(x_i) - \Phi(x_i^-)\big) = \sum_{i=1}^n x_i P(f = x_i)$$
where $a < x_1 \le x_n \le b$.^{18} We conclude that $E_P(f) = \int_a^b x\,d\Phi$, as desired. We leave to the reader the general case when the inequalities in (48.38) are weak, with some of the $x_i$ possibly equal.

^{17} For convenience we picked a compact carrier, but the choice of a specific carrier is actually immaterial (in any case, $\Phi(x) = 0$ for all $x \le a$ and $\Phi(x) = 1$ for all $x \ge b$).
In this result the choice of the carrier of $\Phi$ is irrelevant. Indeed, take any scalars $c$ and $d$ with $c < a \le b < d$. By (47.22),
$$\int_c^d x\,d\Phi = \int_c^a x\,d\Phi + \int_a^b x\,d\Phi + \int_b^d x\,d\Phi$$
As $\Phi(x) = 0$ on $[c,a]$ and $\Phi(x) = 1$ on $[b,d]$, it holds that $\int_c^a x\,d\Phi = \int_b^d x\,d\Phi = 0$. We conclude that
$$\int_a^b x\,d\Phi = \int_c^d x\,d\Phi$$
In the rest of the analysis it is convenient to use improper Stieltjes integrals. They are defined in a similar way to the improper Riemann integral, and for them the properties (i)-(v) of Section 47.4 continue to hold. Improperly armed, we continue the analysis by observing that the last result suggests a general, Stieltjes-based, notion of expected value.

Definition 2027 The expected (or mean) value of a random variable $f : \Omega \to \mathbb{R}$ with distribution function $\Phi$ under a probability $P : 2^\Omega \to [0,1]$ is the improper Stieltjes integral
$$E_P(f) = \int_{-\infty}^{+\infty} x\,d\Phi$$
when it exists.

The choice of the carrier is irrelevant (in analogy with what we remarked after the last result). When $\Phi$ has a carrier $[a,b]$, in particular when $P$ is simple (the distribution function then always has a carrier $[a,b]$), the expected value reduces to a standard Stieltjes integral:
$$E_P(f) = \int_{-\infty}^{+\infty} x\,d\Phi = \int_a^b x\,d\Phi \tag{48.39}$$
In view of Proposition 2026, this shows that this notion of expected value indeed subsumes the earlier one of Section 48.4. At the same time, it considerably enlarges its scope because it applies to probabilities that need not be simple.

^{18} The choice of $a$ and $b$ is irrelevant because, in any case, $\Phi(x) = 0$ for all $x \le a$ and $\Phi(x) = 1$ for all $x \ge b$.
When $\Phi$ has a density $\phi$, by Proposition 1940 (which continues to hold in the improper case) it holds that
$$E_P(f) = \int_{-\infty}^{+\infty} x\,d\Phi(x) = \int_{-\infty}^{+\infty} x\,\phi(x)\,dx$$
This makes it possible to reduce an expected value to an improper Riemann integral.
Example 2029 (i) For the uniform density it holds that
$$E_P(f) = \int_{-\infty}^{+\infty} x\,\phi(x)\,dx = \int_a^b x\,\frac{1}{b-a}\,dx = \frac{1}{b-a}\int_a^b x\,dx = \frac{1}{b-a}\,\frac{b^2 - a^2}{2} = \frac{a+b}{2}$$
(ii) For the Gaussian density it holds that
$$E_P(f) = \int_{-\infty}^{+\infty} x\,\phi(x)\,dx = \int_{-\infty}^{+\infty} x\,\frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\,dx = \int_0^{+\infty} x\,\frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\,dx + \int_{-\infty}^0 x\,\frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\,dx$$
$$= \int_0^{+\infty} x\,\frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\,dx - \int_{-\infty}^0 (-x)\,\frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\,dx = \int_0^{+\infty} x\,\frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\,dx - \int_0^{+\infty} x\,\frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\,dx = 0$$
N
Riemann integration also comes up without appealing to densities. Indeed, integration by parts (which takes an elegant form in the Stieltjes integral, see Proposition 1946) makes it possible to express expected values as Riemann integrals with distribution functions as integrands.

Theorem 2030 (Cavalieri) If the random variable $f : \Omega \to \mathbb{R}$ is essentially bounded, it holds that
$$E_P(f) = \int_0^{+\infty} (1 - \Phi(x))\,dx - \int_{-\infty}^0 \Phi(x)\,dx \tag{48.40}$$

If $f$ is positive, we have $\Phi(x) = 0$ for all $x < 0$ and so the Cavalieri formula (48.40) takes the elegant form
$$E_P(f) = \int_0^{+\infty} (1 - \Phi(x))\,dx$$
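The Cavalieri formula lends itself to a direct numerical check. The sketch below, for the uniform distribution on the hypothetical interval $[-1, 3]$, recovers the mean $(a+b)/2 = 1$ from the distribution function alone.

    def Phi(x, a=-1.0, b=3.0):
        """Uniform distribution function on [a, b]."""
        return 0.0 if x < a else 1.0 if x > b else (x - a) / (b - a)

    def integral(h, lo, hi, m=100_000):
        step = (hi - lo) / m
        return step * sum(h(lo + (i + 0.5) * step) for i in range(m))

    # Phi vanishes below -1 and equals 1 above 3, so finite ranges suffice.
    mean = integral(lambda x: 1 - Phi(x), 0, 3) - integral(Phi, -1, 0)
    print(mean)   # ~ 1 = (a + b)/2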
Proof We prove the result for a bounded random variable $f : \Omega \to \mathbb{R}$, leaving the "essential" case to the reader. By hypothesis, there exist scalars $m$ and $M$ with $m \le f(\omega) \le M$ for all $\omega \in \Omega$. In formula (47.28) let $f = \Phi$ and $g = \mathrm{Id}_{\mathbb{R}}$, i.e., $g(x) = x$. We consider two cases.
Cavalieri's formula is used in the proof of the next important result, which shows that linearity and monotonicity continue to hold for the general notion of expected value.

Proposition 2031 Let $f, g : \Omega \to \mathbb{R}$ be random variables with finite expected values. Then:

(i) $E_P(f + g) = E_P(f) + E_P(g)$;

(ii) $E_P(f) \ge E_P(g)$ if $f \ge g$;

(iii) $E_P(c) = c$ for every constant $c \in \mathbb{R}$.^{19}

Note that (i) and (iii) together imply that, for each $k \in \mathbb{R}$,
$$E_P(f + k) = E_P(f) + k$$

Proof Point (iii) holds because, for a constant $c$, the distribution function $\Phi_c$ jumps from $0$ to $1$ at $c$, so that $E_P(c) = \int_a^b x\,d\Phi_c = c$ because $\Phi_c(b) = 1$ and $\Phi_c(a) = 0$ for $a < c < b$. We prove (ii) when $f$ is bounded, leaving the rest of the proof to more advanced courses. We begin by considering two positive and bounded random variables $f$ and $g$ with $f \ge g$. We have, for each $x \in \mathbb{R}$,
$$\Phi_f(x) = P(f \le x) \le P(g \le x) = \Phi_g(x)$$

^{19} With a standard abuse of notation (cf. Section 44.5.1), $c$ denotes both a scalar $c$ (on the left-hand side) and a function constant to $c$ (under the integral sign).
Along with the Cavalieri Theorem, the monotonicity of the Riemann integral then implies
$$E_P(f) = \int_0^{+\infty} (1 - \Phi_f(x))\,dx \ge \int_0^{+\infty} (1 - \Phi_g(x))\,dx = E_P(g)$$
Now, let $f$ and $g$ be any two bounded random variables with $f \ge g$, not necessarily positive. As they are bounded, there exists a scalar $k \ge 0$ large enough so that $f + k \ge g + k \ge 0$. By point (i) and by what was just proved in the positive case,
$$E_P(f) + k = E_P(f + k) \ge E_P(g + k) = E_P(g) + k$$
as desired.
Theorem 2032 Let $f : \Omega \to \mathbb{R}$ and $\varphi : \mathrm{Im}\,f \to \mathbb{R}$ be such that the Stieltjes integral $\int_{-\infty}^{+\infty} \varphi(x)\,d\Phi(x)$ exists finite. Then,
$$E_P(\varphi \circ f) = \int_{-\infty}^{+\infty} \varphi(x)\,d\Phi(x) \tag{48.41}$$

Proof Let $g = \varphi \circ f$ and denote by $\Psi$ the distribution function of $g$. We prove the result only when $\varphi$ is strictly increasing and surjective, leaving a complete proof to more advanced courses. We have:
$$\Psi(x) = P(g \le x) = P(\varphi \circ f \le x) = P\big(f \le \varphi^{-1}(x)\big) = \Phi\big(\varphi^{-1}(x)\big)$$
Thus,
$$E(\varphi \circ f) = \int_{-\infty}^{+\infty} x\,d\Psi(x) = \int_{-\infty}^{+\infty} x\,d\Phi\big(\varphi^{-1}(x)\big) = \int_{-\infty}^{+\infty} \varphi(z)\,d\Phi(z)$$
under the change of variable $z = \varphi^{-1}(x)$, as the reader can check.
Example 2033 Let $u : \mathbb{R} \to \mathbb{R}$ be a utility function. As discussed in the coda, the expected value $E_P(u \circ f)$ is called the expected utility of $f$. By the last result, we can write
$$E_P(u \circ f) = \int_{-\infty}^{+\infty} u(x)\,d\Phi(x)$$
N

Concave transformations are especially important. For them we can establish a Stieltjes integral version of the all-important Jensen inequality.
Proposition 2034 Let $\Phi$ be a distribution function with a carrier $[a,b]$. Given a concave function $\varphi : I \to \mathbb{R}$ defined on an open interval $I$ of the real line and a continuous function $\psi : [a,b] \to I$, we have
$$\varphi\Big(\int_a^b \psi\,d\Phi\Big) \ge \int_a^b (\varphi \circ \psi)\,d\Phi$$

Proof The Stieltjes integrals on both sides exist because both $\psi$ and $\varphi$ are continuous (cf. Theorem 833). Set $y_0 = \int_a^b \psi\,d\Phi$. The superdifferential $\partial\varphi(y_0)$ of $\varphi$ at $y_0$ is not empty (Theorem 1521). Let $\lambda \in \partial\varphi(y_0)$, so that $\varphi(y) \le \varphi(y_0) + \lambda(y - y_0)$ for all $y \in I$. In particular,
$$\varphi(\psi(x)) \le \varphi(y_0) + \lambda(\psi(x) - y_0) \qquad \forall x \in [a,b]$$
By the monotonicity of the Stieltjes integral and by the last lemma, we then have
$$\int_a^b \varphi(\psi(x))\,d\Phi(x) \le \int_a^b \big[\varphi(y_0) + \lambda(\psi(x) - y_0)\big]\,d\Phi(x) = \varphi(y_0) + \lambda\Big(\int_a^b \psi(x)\,d\Phi(x) - y_0\Big) = \varphi(y_0) + \lambda(y_0 - y_0) = \varphi(y_0) = \varphi\Big(\int_a^b \psi\,d\Phi\Big)$$
as desired.
48.10.2 Moments

When $\varphi(x) = x^n$ we get expected values of powers, an important class of expected values.

Definition 2035 The $n$-th moment of a distribution function $\Phi$ is given by the Stieltjes integral
$$\mu_n = \int_{-\infty}^{+\infty} x^n\,d\Phi(x) \tag{48.42}$$

For instance, $\mu_1$ is the first moment of $\Phi$, $\mu_2$ is its second moment, $\mu_3$ is its third moment, and so on. In particular, by formula (48.41) it holds, for each $n \ge 1$,
$$E_P(f^n) = \int_{-\infty}^{+\infty} x^n\,d\Phi(x) = \mu_n$$
So, the moments of $\Phi$ correspond to the moments of the underlying random variable $f$. In particular, the first moment corresponds to the expected value of $f$ and, as easily checked,
$$V_P(f) = \mu_2 - \mu_1^2$$
That said, it is convenient to carry out the analysis of moments directly in terms of distribution functions.
Proposition 2036 If the moment $\mu_n$ exists, then all lower moments $\mu_k$, with $k \le n$, exist.

To assume the existence of higher and higher moments is, therefore, a more and more demanding requirement. For instance, to assume the existence of the second moment is a stronger hypothesis than to assume the existence of the first moment.

Proof To ease matters, assume that there is a scalar $a$ such that $\Phi(a) = 0$, so that $\mu_n = \int_a^{+\infty} x^n\,d\Phi(x)$. Since $x^k = o(x^n)$ if $k < n$, the version for improper Stieltjes integrals of Proposition 1916-(ii) ensures the convergence of $\int_a^{+\infty} x^k\,d\Phi(x)$, that is, the existence of $\mu_k$.
When $\Phi$ has a density $\phi$, by (48.41) we have
$$\mu_n = \int_{-\infty}^{+\infty} x^n\,\phi(x)\,dx \tag{48.43}$$
In this case, we are back to Riemann integration and we directly say that $\mu_n$ is the $n$-th moment of the density $\phi$. For instance, for the uniform density on $[a,b]$ it holds that
$$\mu_n = \int_{-\infty}^{+\infty} x^n\,\phi(x)\,dx = \int_a^b x^n\,\frac{1}{b-a}\,dx = \frac{1}{n+1}\,\frac{b^{n+1} - a^{n+1}}{b-a}$$
(ii) For the Gaussian density, an integration by parts shows that $\mu_2 = 1$; here we adapted (44.67) to the improper case, with $g(x) = x/\sqrt{2\pi}$ and $f'(x) = x e^{-\frac{x^2}{2}}$, so that $g'(x) = 1/\sqrt{2\pi}$ and $f(x) = -e^{-\frac{x^2}{2}}$.
(iii) The Cauchy density, a version of Agnesi's versiera, is given by
$$\phi(x) = \frac{1}{\pi}\,\frac{1}{x^2 + 1}$$
A primitive of $\phi$ is $\frac{1}{\pi}\arctan x$, so its distribution function is
$$\Phi(x) = \int_{-\infty}^x \phi(t)\,dt = \int_{-\infty}^0 \phi(t)\,dt + \int_0^x \phi(t)\,dt = \lim_{c \to -\infty} \frac{1}{\pi}\arctan t\,\Big|_c^0 + \frac{1}{\pi}\arctan t\,\Big|_0^x = \frac{1}{2} + \frac{1}{\pi}\arctan x$$
In view of Example 1909, the mean of the Cauchy density does not exist. N
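The failure of the Cauchy mean can be seen numerically: the truncated integral of $|x|\phi(x)$ grows without bound (roughly like $(2/\pi)\log M$), as the following sketch suggests.

    import math

    cauchy = lambda x: 1 / (math.pi * (x * x + 1))

    def truncated_abs_mean(M, m=100_000):
        """Midpoint approximation of the integral of |x| phi(x) over [-M, M]."""
        step = 2 * M / m
        total = 0.0
        for i in range(m):
            x = -M + (i + 0.5) * step
            total += abs(x) * cauchy(x) * step
        return total

    for M in (10, 100, 1000):
        print(M, truncated_abs_mean(M))   # keeps growing: no finite mean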
The next result, a consequence of the Stone-Weierstrass Theorem, shows under which conditions moments uniquely pin down probability densities.

Proposition 2037 Let $f_1, f_2 : [a,b] \to \mathbb{R}$ be continuous functions. If
$$\int_a^b x^n f_1(x)\,dx = \int_a^b x^n f_2(x)\,dx \qquad \forall n \ge 0$$
then $f_1 = f_2$.

Moments thus uniquely pin down probability densities of continuously differentiable distribution functions with a carrier $[a,b]$. Indeed, assume that $\phi_1$ and $\phi_2$ are any two continuous probability densities that are $0$ outside $[a,b]$. If
$$\int_a^b x^n \phi_1(x)\,dx = \int_a^b x^n \phi_2(x)\,dx \qquad \forall n \ge 1$$
then $\phi_1 = \phi_2$.^{20} For any such density, to know the moments amounts to knowing the density itself. For instance, the uniform density on $[a,b]$ is the only density that has the moments
$$\mu_n = \frac{1}{n+1}\,\frac{b^{n+1} - a^{n+1}}{b-a} \qquad \forall n \ge 1$$
To specify these moments amounts to specifying the uniform density.

^{20} Here $n \ge 1$ suffices because, being probability densities, $\phi_1$ and $\phi_2$ automatically satisfy $\int_a^b \phi_1(x)\,dx = \int_a^b \phi_2(x)\,dx = 1$, i.e., the case $n = 0$.
Proof Set $h = f_1 - f_2$. By hypothesis and by the linearity of the integral,
$$\int_a^b p(x) h(x)\,dx = 0 \tag{48.45}$$
for any polynomial $p : [a,b] \to \mathbb{R}$. Let $\varepsilon > 0$. By the Stone-Weierstrass Theorem, there exists a polynomial $p_\varepsilon : [a,b] \to \mathbb{R}$ such that
$$|h(x) - p_\varepsilon(x)| \le \varepsilon \qquad \forall x \in [a,b]$$
Since $h$ is continuous on the compact interval $[a,b]$, by the Weierstrass Theorem the maximum value $M = \max_{x \in [a,b]} |h(x)|$ exists finite. So,
$$h^2(x) - p_\varepsilon(x) h(x) = h(x)\big(h(x) - p_\varepsilon(x)\big) \le |h(x)|\,|h(x) - p_\varepsilon(x)| \le \varepsilon M \qquad \forall x \in [a,b]$$
Then,
$$p_\varepsilon(x) h(x) - \varepsilon M \le h^2(x) \le p_\varepsilon(x) h(x) + \varepsilon M \qquad \forall x \in [a,b]$$
By the monotonicity of the integral, we then have:
$$\int_a^b p_\varepsilon(x) h(x)\,dx - \varepsilon(b-a)M \le \int_a^b h^2(x)\,dx \le \int_a^b p_\varepsilon(x) h(x)\,dx + \varepsilon(b-a)M$$
Hence,
$$\Big|\int_a^b h^2(x)\,dx - \int_a^b p_\varepsilon(x) h(x)\,dx\Big| \le \varepsilon(b-a)M$$
By (48.45),
$$\int_a^b h^2(x)\,dx \le \varepsilon(b-a)M$$
Since $\varepsilon$ was arbitrarily chosen, we conclude that $\int_a^b h^2(x)\,dx = 0$. By Corollary 1869, this implies $h = 0$, as desired.
In words, a sequence $\{x_n\}$ is completely monotone if its finite differences keep alternating sign across their orders: $(-1)^k \Delta^k x_n \ge 0$ for all $n$ and $k$, where $\Delta^0 x_n = x_n$ and $\Delta^k x_n = \Delta^{k-1} x_{n+1} - \Delta^{k-1} x_n$. A completely monotone sequence is positive because $\Delta^0 x_n = x_n$, as well as decreasing because $\Delta x_n \le 0$ (Lemma 419). It is the discrete analog for sequences of the differential notion of complete monotonicity for functions on open intervals (Section 30.3).

Theorem 2040 (Hausdorff) A sequence $\{\mu_n\}$ is the sequence of moments
$$\mu_n = \int_0^1 t^n\,dg(t) \tag{48.46}$$
of some distribution function $g$ on $[0,1]$ if and only if it is completely monotone.

Proof We prove the "only if" part, the converse being significantly more complicated. So, let $\{\mu_n\}$ be a sequence of moments (48.46). It suffices to show that
$$(-1)^k \Delta^k \mu_n = \int_0^1 t^n (1-t)^k\,dg(t) \ge 0$$
We proceed by induction on $k$. For $k = 0$ we trivially have $(-1)^0 \Delta^0 \mu_n = \mu_n = \int_0^1 t^n\,dg(t)$ for all $n$. Assume $(-1)^{k-1} \Delta^{k-1} \mu_n = \int_0^1 t^n (1-t)^{k-1}\,dg(t)$ for all $n$ (induction hypothesis). Then,
$$\Delta^k \mu_n = \Delta\big(\Delta^{k-1} \mu_n\big) = \Delta^{k-1} \mu_{n+1} - \Delta^{k-1} \mu_n = (-1)^{k-1}\Big[\int_0^1 t^{n+1} (1-t)^{k-1}\,dg(t) - \int_0^1 t^n (1-t)^{k-1}\,dg(t)\Big]$$
$$= (-1)^{k-1}\Big[-\int_0^1 t^n (1-t)^{k-1} (1-t)\,dg(t)\Big] = (-1)^k \int_0^1 t^n (1-t)^k\,dg(t)$$
so that $(-1)^k \Delta^k \mu_n = \int_0^1 t^n (1-t)^k\,dg(t) \ge 0$, as desired.

The characterizing property of moment sequences is, thus, complete monotonicity. It is truly remarkable that a property of finite differences is able to pin down moment sequences. Note that for this result the Stieltjes integral is required: in the "if" part the integrator, whose moments turn out to be the terms of the given completely monotone sequence, might well be non-differentiable (so, the Riemann version (48.43) might not hold).
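For a concrete instance of complete monotonicity, the moments $\mu_n = 1/(n+1)$ of the uniform density on $[0,1]$ can be checked with exact rational arithmetic; the order cutoffs below are arbitrary.

    from fractions import Fraction

    # Moments mu_n = 1/(n+1) of the uniform density on [0, 1].
    mu = [Fraction(1, n + 1) for n in range(30)]

    def diff(seq):
        """Forward difference: Delta x_n = x_{n+1} - x_n."""
        return [seq[i + 1] - seq[i] for i in range(len(seq) - 1)]

    seq = mu
    for k in range(10):
        assert all((-1) ** k * x >= 0 for x in seq), f"fails at order {k}"
        seq = diff(seq)
    print("(-1)^k Delta^k mu_n >= 0 for k = 0, ..., 9")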
The analysis of moments leads to the following integral, defined for a distribution function $\Phi$:
$$\int_{-\infty}^{+\infty} e^{yx}\,d\Phi(x) \tag{48.47}$$
For each $y \in \mathbb{R}$, it has a positive integrand $e^{yx}$ and so it is either finite or equal to $+\infty$. Let
$$D_\Phi = \Big\{y \in \mathbb{R} : \int_{-\infty}^{+\infty} e^{yx}\,d\Phi(x) < +\infty\Big\}$$

Lemma 2041 The set $D_\Phi$ is a non-empty interval that contains the origin.

Proof We have $0 \in D_\Phi$ because $\int_{-\infty}^{+\infty} d\Phi(x) = 1$. Let $y, y' \in D_\Phi$ and $\alpha \in [0,1]$. By the convexity of the exponential function and the monotonicity of the integral, we have
$$\int_{-\infty}^{+\infty} e^{[\alpha y + (1-\alpha)y']x}\,d\Phi(x) = \int_{-\infty}^{+\infty} e^{\alpha yx + (1-\alpha)y'x}\,d\Phi(x) \le \alpha \int_{-\infty}^{+\infty} e^{yx}\,d\Phi(x) + (1-\alpha)\int_{-\infty}^{+\infty} e^{y'x}\,d\Phi(x) < +\infty$$
as desired.
The set $D_\Phi$ can be the entire real line, but it can also fail to include any neighborhood of the origin. The following examples illustrate. In reading them, observe that when $\Phi$ has a density $\phi$ the integral (48.47) becomes
$$\int_{-\infty}^{+\infty} e^{yx}\,d\Phi(x) = \int_{-\infty}^{+\infty} e^{yx}\,\phi(x)\,dx$$

Example 2042 (i) Suppose that $\Phi$ has a carrier $[a,b]$. Then, for each $y \in \mathbb{R}$,
$$\int_{-\infty}^{+\infty} e^{yx}\,d\Phi(x) = \int_a^b e^{yx}\,d\Phi(x) \le \int_a^b e^{|y|\,|x|}\,d\Phi(x) \le \int_a^b e^{|y|\max\{|a|,|b|\}}\,d\Phi(x) = e^{|y|\max\{|a|,|b|\}} < +\infty$$
We conclude that $D_\Phi = \mathbb{R}$.

(ii) Let
$$\phi(x) = \begin{cases} \dfrac{1}{x^2} & \text{if } x > 1 \\ 0 & \text{else} \end{cases}$$
be the so-called Pareto density (recall from Example 1897 that $\int_1^{+\infty} x^{-2}\,dx = 1$). For every $y > 0$,
$$\int_{-\infty}^{+\infty} e^{yx}\,d\Phi(x) = \int_{-\infty}^{+\infty} e^{yx}\,\phi(x)\,dx = \int_1^{+\infty} \frac{e^{yx}}{x^2}\,dx = +\infty$$
while for $y \le 0$ the integral is finite since $e^{yx} \le 1$ on $[1, +\infty)$. Therefore, $D_\Phi = (-\infty, 0]$: no $y > 0$ belongs to $D_\Phi$, so the origin is not interior to $D_\Phi$.

(iii) Let
$$\phi(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{else} \end{cases}$$
be the so-called exponential density with parameter $\lambda > 0$ (it holds that $\int_0^{+\infty} e^{-\lambda x}\,dx = 1/\lambda$). We have:
$$\int_{-\infty}^{+\infty} e^{yx}\,d\Phi(x) = \int_{-\infty}^{+\infty} e^{yx}\,\phi(x)\,dx = \int_0^{+\infty} e^{yx}\,\lambda e^{-\lambda x}\,dx = \lambda \int_0^{+\infty} e^{(y - \lambda)x}\,dx = \begin{cases} \dfrac{\lambda}{\lambda - y} & \text{if } y < \lambda \\ +\infty & \text{if } y \ge \lambda \end{cases}$$
Thus, $D_\Phi = (-\infty, \lambda)$. N
Denote by $I_\Phi$ the interior of the interval $D_\Phi$, i.e., $I_\Phi = \mathrm{int}\,D_\Phi$. The Pareto density shows that the origin may fail to belong to $I_\Phi$. When $0 \in I_\Phi$, the following important function is well defined on a neighborhood of the origin.

Definition 2043 The moment generating function $F : D_\Phi \to \mathbb{R}$ of a distribution function $\Phi$ is defined by
$$F(y) = \int_{-\infty}^{+\infty} e^{yx}\,d\Phi(x)$$

It is easy to check that $F$ is a convex function. The next result shows its importance.

Theorem 2044 If $0 \in I_\Phi$, the moment generating function is analytic on a neighborhood $B(0)$ of the origin, with
$$F(y) = \sum_{n=0}^{\infty} \frac{\mu_n}{n!}\,y^n \qquad \forall y \in B(0) \tag{48.48}$$
In particular,
$$\mu_n = F^{(n)}(0) \qquad \forall n \ge 1 \tag{48.49}$$

Formula (48.48) is thus the exact Maclaurin expansion of the moment generating function.
Proof Since $0 \in I_\Phi$, there is a small enough neighborhood $B(0) = (-\varepsilon, \varepsilon)$ included in $I_\Phi$. Let $y \in B(0)$. By Theorem 399,
$$e^{yx} = 1 + yx + \frac{y^2 x^2}{2} + \frac{y^3 x^3}{3!} + \cdots + \frac{y^n x^n}{n!} + \cdots = \sum_{n=0}^{\infty} \frac{y^n x^n}{n!}$$
For each $n \ge 1$,
$$\sum_{k=0}^n \frac{|yx|^k}{k!} \le e^{|yx|} \le e^{\varepsilon|x|} \le e^{-\varepsilon x} + e^{\varepsilon x}$$
and so, by the additivity of the integral,
$$\sum_{k=0}^n \int_{-\infty}^{+\infty} \frac{|yx|^k}{k!}\,d\Phi(x) = \int_{-\infty}^{+\infty} \sum_{k=0}^n \frac{|yx|^k}{k!}\,d\Phi(x) \le \int_{-\infty}^{+\infty} e^{-\varepsilon x}\,d\Phi(x) + \int_{-\infty}^{+\infty} e^{\varepsilon x}\,d\Phi(x) = F(-\varepsilon) + F(\varepsilon)$$
This is easily seen to imply that $\Phi$ has finite moments $\mu_n$ of all orders. Moreover, it can be proved that:^{21}
$$F(y) = \int_{-\infty}^{+\infty} e^{yx}\,d\Phi(x) = \int_{-\infty}^{+\infty} \sum_{n=0}^{\infty} \frac{y^n x^n}{n!}\,d\Phi(x) = \sum_{n=0}^{\infty} \frac{y^n}{n!} \int_{-\infty}^{+\infty} x^n\,d\Phi(x) = \sum_{n=0}^{\infty} \frac{y^n}{n!}\,\mu_n$$
As this holds for all $y \in B(0)$, by Proposition 1400 the restriction $F : (-\varepsilon, \varepsilon) \to \mathbb{R}$ is analytic. Hence, it is infinitely differentiable on $(-\varepsilon, \varepsilon)$. Formula (48.49) follows from Proposition 1398, but it is also easily checked by direct computation.

^{21} The third equality follows from a dominated convergence theorem for the Riemann integral, proved by Cesare Arzela in 1885, that readers will learn in more advanced courses.
The derivative of order $n$ at $0$ of the moment generating function $F$ is, therefore, the $n$-th moment of $\Phi$. That is,
$$F'(0) = \mu_1\ ;\quad F''(0) = \mu_2\ ;\quad \dots\ ;\quad F^{(n)}(0) = \mu_n\ ;\quad \dots$$
This property of moment generating functions, which justifies their name, is important because it is often convenient to compute moments through them.
Example 2045 For the Gaussian density $\phi(x) = e^{-\frac{x^2}{2}}/\sqrt{2\pi}$ we have
$$F(y) = \int_{-\infty}^{+\infty} e^{yx}\,\phi(x)\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{yx}\, e^{-\frac{x^2}{2}}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-\frac{1}{2}(x^2 - 2yx)}\,dx$$
$$= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-\frac{1}{2}(x^2 - 2yx + y^2 - y^2)}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-\frac{1}{2}(x^2 - 2yx + y^2) + \frac{y^2}{2}}\,dx = e^{\frac{y^2}{2}}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-\frac{1}{2}(x-y)^2}\,dx$$
Since $\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-\frac{1}{2}(x-y)^2}\,dx = 1$, we conclude that
$$F(y) = e^{\frac{y^2}{2}}$$
We have $F'(y) = y e^{\frac{y^2}{2}}$ and $F''(y) = e^{\frac{y^2}{2}}(1 + y^2)$, so $\mu_1 = F'(0) = 0$ and $\mu_2 = F''(0) = 1$. N
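Both the closed form $F(y) = e^{y^2/2}$ and the recovery of $\mu_2$ via derivatives at the origin can be checked numerically, as in the following sketch (the integration bounds and step sizes are hypothetical tuning choices).

    import math

    def F(y, M=12.0, m=100_000):
        """Midpoint approximation of the Gaussian moment generating function."""
        step = 2 * M / m
        total = 0.0
        for i in range(m):
            x = -M + (i + 0.5) * step
            total += math.exp(y * x) * math.exp(-x * x / 2) / math.sqrt(2 * math.pi) * step
        return total

    print(F(0.7), math.exp(0.7 ** 2 / 2))        # the two values should agree
    h = 1e-3
    print((F(h) - 2 * F(0) + F(-h)) / h ** 2)    # ~ mu_2 = 1 (central difference)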
The exact Maclaurin expansion (48.48) may hold on the entire real line, as we show next.

Proposition 2046 If $I_\Phi = \mathbb{R}$, then
$$F(y) = \sum_{n=0}^{\infty} \frac{\mu_n}{n!}\,y^n \qquad \forall y \in \mathbb{R} \tag{48.50}$$

Thus, here the moment generating function $F$ is the generating function of the sequence $\{\mu_n/n!\}$.
Proof Let $I_\Phi = \mathbb{R}$. By proceeding as in the proof of Theorem 2044, for each neighborhood $(-n, n)$ of the origin it holds that
$$F(y) = \sum_{k=0}^{\infty} \frac{\mu_k}{k!}\,y^k \qquad \forall y \in (-n, n)$$
As the union of all the intervals $(-n, n)$, with $n \ge 1$, is the real line, we conclude that (48.50) holds.
Example 2047 Consider the uniform distribution on $[0,1]$. Its distribution function $\Phi$ has a carrier, so $D_\Phi = \mathbb{R}$ (Example 2042) and hence $I_\Phi = \mathbb{R}$. Thus, (48.50) holds. Let us compute the moment generating function $F$. For each $y \ne 0$, we have
$$F(y) = \int_{-\infty}^{+\infty} e^{yx}\,d\Phi(x) = \int_0^1 e^{yx}\,dx = \frac{e^y - 1}{y}$$
As $F(0) = \int_0^1 dx = 1$, we conclude that the moment generating function $F : \mathbb{R} \to \mathbb{R}$ is given by
$$F(y) = \begin{cases} \dfrac{e^y - 1}{y} & \text{if } y \ne 0 \\ 1 & \text{if } y = 0 \end{cases}$$
It holds, for each $y \ne 0$,
$$\frac{e^y - 1}{y} = \frac{1}{y}\Big(\sum_{n=0}^{\infty} \frac{y^n}{n!} - 1\Big) = \frac{1}{y}\sum_{n=1}^{\infty} \frac{y^n}{n!} = \sum_{n=0}^{\infty} \frac{y^n}{(n+1)!}$$
In view of (48.48), we conclude that $\mu_n = n!/(n+1)! = 1/(n+1)$, in accordance with the moments of the uniform density computed earlier.
48.11 Coda oscura

Under a simple probability, the states outside the support have probability zero (Lemma 1982), a probability naturally interpreted as indicating that they cannot obtain.

This ease of interpretation motivates our detailed study of simple probabilities. Yet, applications often use probabilities that, though formally perfectly legitimate, accord less well with our intuition. To introduce them, let $\Omega = [0,1]$, i.e., take as a state space the closed unit interval. With some imagination, suppose that, blindfolded, you pick a point $\omega$ of $[0,1]$. It seems natural to assume that, being blindfolded, the probability that you pick a point in the interval
$$\Big[\frac{1}{4}, \frac{3}{4}\Big]$$
is equal to its length, that is, $1/2$. Similarly, the probability that you pick a point in any interval
$$[a,b] \subseteq [0,1]$$
is equal to its length, that is, $b - a$. It can be proved that there exists a probability $P : 2^{[0,1]} \to [0,1]$ such that
$$P([a,b]) = b - a \tag{48.51}$$
for all $0 \le a \le b \le 1$. Take the middle point $\omega = 1/2$ of the closed unit interval: what is the probability that you pick this specific point? For all $n \ge 2$,
$$\Big[\omega - \frac{1}{n}, \omega + \frac{1}{n}\Big] \subseteq [0,1]$$
and so
$$0 \le P(\omega) \le P\Big(\Big[\omega - \frac{1}{n}, \omega + \frac{1}{n}\Big]\Big) = \frac{2}{n}$$
Letting $n \to \infty$, we conclude that
$$P\Big(\frac{1}{2}\Big) = 0$$
The same argument shows that, for every state $\omega \in [0,1]$,
$$P(\omega) = 0$$
Our probability $P$ ends up assigning a zero probability to all states. But, of course, at least one of them obtains.
This simple example shows that a zero probability event is not, in general, an event that for sure will not happen. This puzzling fact is yet another surprising feature of actual infinities, a Pandora's box (cf. Section 7.3). Indeed, observe that for each probability value $0 \le x \le 1$ there is some event $A \subseteq [0,1]$ with $P(A) = x$ (just take an interval in $[0,1]$ of length $x$). So, the probability $P$ takes uncountably many values. It is thus drastically different from the simple, finitely valued, probabilities for which our finitist intuition works so well.

This discussion motivates the next definition. Here $\Omega$ is any state space whatsoever, finite or infinite.

Definition 2048 A probability $P : 2^\Omega \to [0,1]$ is diffuse if $P(\omega) = 0$ for every state $\omega \in \Omega$.
Theorem 2049 (Ulam) Let $I$ be any interval, bounded or not, of the real line. There is no countably additive probability measure $P : 2^I \to [0,1]$ that is diffuse.

The power set thus turns out to be too large a domain for a probability on the interval $I$ that, at the same time, is diffuse and countably additive: there are too many sets to take care of. Thus, either we give up a most convenient property like countable additivity or we have to look for a smaller family of subsets of $I$ over which it is possible to define a diffuse and countably additive probability. This latter possibility is explored next.
Definition 2050 The family of Borel sets of $\mathbb{R}$, denoted by $\mathcal{B}$, is the smallest collection of subsets of $\mathbb{R}$ that contains:

(i) the empty set as well as all intervals, bounded or not, of $\mathbb{R}$ (including $\mathbb{R}$ itself);

(ii) all finite and countable unions and intersections of its elements;^{23}

(iii) the complements of its elements.^{24}

Thus, $\mathcal{B}$ is the smallest family of sets of the real line that contains all intervals, a most important class of sets, and is closed under finite and countable unions and intersections as well as under complementation. These closure properties render $\mathcal{B}$ a suitable domain for countably additive probability measures. More importantly, $\mathcal{B}$ is adequate for applications: most sets of the real line of interest are Borel. Indeed, by definition this is the case for all intervals. This implies, inter alia, that all singletons, being one-element closed intervals, are Borel sets. By property (ii), all finite and countable sets are then Borel as well. Next we show that also the topologically significant sets of the real line are Borel.
Proposition 2051 Open and closed sets of the real line are Borel.

The proof relies on the following structure lemma for open sets.

Lemma 2052 Each open set of the real line is the union, finite or countable, of disjoint open intervals.

^{22} That is, that $\mathfrak{c} = \aleph_1$, as discussed in Section 7.3.
^{23} That is, if $\{B_i\}_{i \in I}$ is any finite or countable collection of Borel sets, their union $\bigcup_{i \in I} B_i$ and intersection $\bigcap_{i \in I} B_i$ are still Borel sets.
^{24} That is, if $B$ is a Borel set, then $B^c$ is also a Borel set.
Proof of Lemma 2052 Let $G$ be an open set and $x \in G$. Set $a_x = \inf\{a : (a, x] \subseteq G\}$, with $a_x$ finite if $G$ is bounded below, and $b_x = \sup\{b : [x, b) \subseteq G\}$, with $b_x$ finite if $G$ is bounded above. Set $I_x = (a_x, b_x)$. We have $I_x \subseteq G$ (why?). Let $(a,b) \subseteq G$ be an open interval that contains $x$. Clearly, $(a,b) \subseteq I_x$. Thus, $I_x$ is the largest interval with $x \in I_x$ and $I_x \subseteq G$. As a result,
$$G = \bigcup_{x \in G} I_x$$
Let $x, y \in G$ with $x \ne y$. The intervals $I_x$ and $I_y$ are either equal or disjoint. For, suppose that $I_x \cap I_y \ne \emptyset$. Then, the union $I_x \cup I_y$ is an open interval (why?) containing both $x$ and $y$. Hence, by the maximality of $I_x$ and $I_y$, it holds that
$$I_x \subseteq I_x \cup I_y \subseteq I_x \quad ; \quad I_y \subseteq I_x \cup I_y \subseteq I_y$$
that is, $I_x = I_y$. Denote by $\mathcal{I}$ the collection of the distinct intervals $I_x$. Each of them contains a rational number $q(x)$, and disjoint intervals contain distinct rationals. The map
$$\mathcal{I} \ni I_x \longmapsto q(x) \in \mathbb{Q}$$
thus defines an injective function $q : \mathcal{I} \to \mathbb{Q}$: for any two distinct intervals in $\mathcal{I}$, there exist two distinct rationals. Thus, $|\mathcal{I}| \le |\mathbb{Q}|$ (cf. Section 7.3), that is, the collection $\mathcal{I}$ is at most countable, as desired.
The family $\mathcal{B}$ is significantly smaller than the entire power set $2^{\mathbb{R}}$: a non-trivial result of set theory shows that $\mathcal{B}$ has the cardinality of the real line, i.e., $|\mathcal{B}| = \mathfrak{c}$, and so, by Cantor's Theorem, $|\mathcal{B}| < |2^{\mathbb{R}}|$. The real line thus features plenty of non-Borel sets. Yet, they are not easy to construct, a further sign that $\mathcal{B}$ contains the sets of interest.

With this, let us denote by $\mathcal{B}_I$ the family of all Borel sets that belong to an interval $I$. It can be proved that it is possible to define diffuse and countably additive probabilities
$$P : \mathcal{B}_I \to [0,1]$$
Here $P(B)$ is the probability of a Borel set $B \subseteq I$. So, the family $\mathcal{B}_I$ is large enough to contain the sets of the real line of interest, but small enough not to run into an Ulam-type impossibility result. In particular, by taking $I = [0,1]$ there exists a countably additive probability $P : \mathcal{B}_{[0,1]} \to [0,1]$ satisfying (48.51). What we discussed in this section can be made fully rigorous, despite its puzzling interpretative aspects, even under countable additivity.

We close with a simple but interesting result about the cardinality of supports, i.e., about the number of states with non-zero probability.

Proposition 2053 The support
$$\mathrm{supp}\,P = \{\omega \in \Omega : P(\omega) > 0\} \tag{48.52}$$
of any probability $P : 2^\Omega \to [0,1]$ is at most countable.
In words, there exist at most countably many states with a non-zero probability. Of course, diffuse probabilities have no states of this kind: their support is empty and has therefore no relevance. In contrast, for a simple probability the support is a key notion, a further indication of the dramatic difference between diffuse and simple probabilities.

Proof For each $n \ge 1$, consider the set $E_n = \{\omega \in \Omega : P(\omega) > 1/n\}$. We claim that each $E_n$ is finite. Suppose, by contradiction, that some $E_n$ is infinite, and let $D_k = \{\omega_1, \dots, \omega_k\}$ be any set of $k$ distinct elements of $E_n$. Then
$$P(E_n) \ge P(D_k) = P(\omega_1) + \cdots + P(\omega_k) \ge \frac{k}{n}$$
We thus reach the contradiction
$$1 = P(\Omega) \ge P(E_n) \ge \lim_{k \to \infty} \frac{k}{n} = +\infty$$
So each $E_n$ is finite. Since $\mathrm{supp}\,P = \bigcup_{n \ge 1} E_n$, the support (48.52) is a countable union of finite sets and, therefore, is at most countable (cf. Theorem 273).
Given any $x \in \mathbb{R}$, we have
$$(f = x) = \{\omega \in \Omega : f(\omega) = x\} = f^{-1}(x)$$
that is, $(f = x)$ is the preimage of $x$ through $f$. More generally, we can take the preimage
$$f^{-1}(A) = \{\omega \in \Omega : f(\omega) \in A\}$$
of any Borel set $A$ of the real line. This preimage consists of all states at which $f$ takes a value $x$ in $A$: if any of them obtains, $f$ delivers an outcome $x$ in $A$. The quantity
$$P\big(f^{-1}(A)\big)$$
is then the probability that an outcome in $A$ obtains under $f$. These remarks motivate the following result.

Proposition 2054 Given a probability $P : 2^\Omega \to [0,1]$ and a random variable $f : \Omega \to \mathbb{R}$, the set function $P_f : \mathcal{B} \to [0,1]$ defined by $P_f(A) = P(f^{-1}(A))$ is a probability measure, called the law of $f$. Moreover, (i) $P_f$ is countably additive if $P$ is countably additive, and (ii) $P_f$ is simple if $P$ is simple.
The proof of this proposition relies on the nice behavior of preimages under unions and intersections.

Lemma 2055 Let $f : X \to Y$ be a function between any two sets $X$ and $Y$. We have
$$f^{-1}\Big(\bigcup_{i \in I} A_i\Big) = \bigcup_{i \in I} f^{-1}(A_i) \quad\text{and}\quad f^{-1}\Big(\bigcap_{i \in I} A_i\Big) = \bigcap_{i \in I} f^{-1}(A_i)$$
Proof of Proposition 2054 It is easy to see that $P_f(\emptyset) = 0$, $P_f(\mathbb{R}) = 1$ and $P_f(A) \ge 0$ for all $A \in \mathcal{B}$. It remains to check additivity. So, let $A$ and $B$ be two disjoint Borel subsets of $\mathbb{R}$. Their preimages $f^{-1}(A)$ and $f^{-1}(B)$ are disjoint: by Lemma 2055, $\emptyset = f^{-1}(\emptyset) = f^{-1}(A \cap B) = f^{-1}(A) \cap f^{-1}(B)$. Thus, again by Lemma 2055,
$$P_f(A \cup B) = P\big(f^{-1}(A \cup B)\big) = P\big(f^{-1}(A) \cup f^{-1}(B)\big) = P\big(f^{-1}(A)\big) + P\big(f^{-1}(B)\big) = P_f(A) + P_f(B)$$
(i) Let $P$ be countably additive. Let $\{A_i\}_{i=1}^{\infty}$ be any countable collection of pairwise disjoint Borel subsets of $\mathbb{R}$. Their preimages $f^{-1}(A_i)$ are easily seen to be pairwise disjoint. By Lemma 2055,
$$P_f\Big(\bigcup_{i=1}^{\infty} A_i\Big) = P\Big(f^{-1}\Big(\bigcup_{i=1}^{\infty} A_i\Big)\Big) = P\Big(\bigcup_{i=1}^{\infty} f^{-1}(A_i)\Big) = \sum_{i=1}^{\infty} P\big(f^{-1}(A_i)\big) = \sum_{i=1}^{\infty} P_f(A_i)$$
(ii) Let $P$ be simple, with support $E = \mathrm{supp}\,P$. For every Borel set $A$ we have $P_f(A) = P(f^{-1}(A)) = P(f^{-1}(A) \cap E)$, so $P_f$ concentrates its mass on $f(E)$. As the set $f(E) \subseteq \mathbb{R}$ is finite (so Borel), this proves that $P_f$ is simple.
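On a finite state space, the law $P_f$ is just the pushforward of $P$ through $f$. A minimal Python sketch, using the plant example probabilities from the text:

    # Push the probability P forward through f to obtain the law P_f.
    P = {"s": 0.25, "l": 0.25, "n": 0.5}
    f = {"s": 400.0, "l": 2500.0, "n": 0.0}

    law = {}
    for omega, p in P.items():
        law[f[omega]] = law.get(f[omega], 0.0) + p   # P_f(x) = P(f^{-1}(x))

    print(law)                 # {400.0: 0.25, 2500.0: 0.25, 0.0: 0.5}
    print(sum(law.values()))   # 1.0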
The law $P_f$ accounts for the outcomes' probabilities under $f$. For instance, when $f$ describes a financial asset, the preimage
$$f^{-1}([0, \infty)) = \{\omega \in \Omega : f(\omega) \in [0, \infty)\} = \{\omega \in \Omega : f(\omega) \ge 0\} = (f \ge 0)$$
of the set $A = [0, \infty)$ is the collection of the states where this asset pays a positive amount of money. Thus,
$$P_f([0, \infty)) = P(f \ge 0)$$
represents the probability of a gain with this asset. Similarly,
$$P_f((-\infty, 0]) = P(f \le 0)$$
represents the probability of a loss.
The distribution function of $f$ can be expressed through its law:
$$\Phi(x) = P_f((-\infty, x])$$
It is easy to check that the properties of $\Phi$ seen in Propositions 2015-2017 can be proved using those of $P_f$. To illustrate, let us prove in this alternative way the right continuity of $\Phi$ when $P$ is countably additive. By the last result, $P_f$ as well is countably additive. Take $x \in \mathbb{R}$ and a scalar sequence $\{x_n\}$ with $x_n \downarrow x$. As $\bigcap_n (-\infty, x_n] = (-\infty, x]$, by the continuity of $P_f$ (see Proposition 1988) we have
$$\lim_n \Phi(x_n) = \lim_n P_f((-\infty, x_n]) = P_f((-\infty, x]) = \Phi(x)$$
In the simple case, using laws we can express expected values as weighted averages of outcomes:
$$E_P(f) = \sum_{x \in f(\mathrm{supp}\,P)} x\,P_f(x)$$
To extend this outcome perspective beyond the simple case, however, we have to wait for a more general notion of integral.
We can use $\Phi$ to retrieve the values of $P_f$ on intervals. Clearly, for all $x, y \in \mathbb{R}$ with $y < x$ it holds that
$$P_f((y, x]) = \Phi(x) - \Phi(y)$$
When $P$ is countably additive, by Proposition 2017 we have
$$P_f(x) = \Phi(x) - \lim_{y \to x^-} \Phi(y) \tag{48.53}$$
This allows us to retrieve the values of $P_f$ on all intervals, bounded or not. For example, $P_f((y,x)) = P_f((y,x]) - P_f(x)$. More interestingly, (48.53) has the following immediate, stark implication.

Proposition 2058 The law of a random variable with a continuous distribution function is diffuse.
48.12 Ultracoda: expected utility

Consider a decision maker who ranks random variables through a preference relation $\succsim$.^{26} According to the expected utility criterion,
$$f \succsim g \iff E_P(u(f)) \ge E_P(u(g)) \tag{48.55}$$
Thus, the expected utility criterion ranks a random variable $f$ via the expected value of its utility profile
$$u \circ f : \Omega \to \mathbb{R}$$
that to each state $\omega$ associates the utility $u(f(\omega))$ of the consequence $f(\omega)$. Random variables are ranked higher when, on average, they provide a greater utility. In particular,
$$f \sim g \iff E_P(u(f)) = E_P(u(g))$$
that is, random variables are indifferent when, on average, they yield the same utility.
The expected utility criterion was introduced in 1738 by Daniel Bernoulli in a beautiful work that was well ahead of its time. In this work Bernoulli used the logarithmic utility function $u(c) = \ln c$, for which
$$E_P(u(f)) = \sum_{\omega \in \mathrm{supp}\,P} \ln(f(\omega))\,P(\omega)$$

^{26} Formally, $\succsim$ is a binary relation (see Appendix A).
Other basic examples of utility functions are the linear $u(c) = c$, the power $u(c) = c^\alpha$ with $0 < \alpha < 1$, and the negative exponential $u(c) = -e^{-c}$. Accordingly, $E_P(u(f))$ is
$$\sum_{\omega \in \mathrm{supp}\,P} f(\omega) P(\omega)\ ;\quad \sum_{\omega \in \mathrm{supp}\,P} f(\omega)^\alpha P(\omega)\ ;\quad -\sum_{\omega \in \mathrm{supp}\,P} e^{-f(\omega)} P(\omega)$$
As their arguments are monetary amounts, it is natural to assume that $u$ is strictly increasing, that is,
$$x > y \implies u(x) > u(y) \qquad \forall x, y \in \mathbb{R}$$
In this way decision makers (strictly) prefer greater amounts of money, a property satisfied by all the aforementioned specifications of $u$.
Different specifications of $u$ result in different rankings of the random variables. For instance, in the single coin toss example with $\Omega = \{H, T\}$, consider again the bet $f : \Omega \to \mathbb{R}$ defined by
$$f(H) = 50\ ;\quad f(T) = -50$$
as well as the certain random variable $g : \Omega \to \mathbb{R}$ that pays $0$ for sure (i.e., in each state). For a decision maker with linear $u$ we have
$$E_P(u(g)) = 0 = E_P(u(f))$$
and so, by (48.55), this decision maker is indifferent between $f$ and $g$. In contrast, for a decision maker with negative exponential $u$ we have
$$E_P(u(f)) = -\frac{1}{2}\big(e^{-50} + e^{50}\big) < -1 = E_P(u(g))$$
and so this decision maker (strictly) prefers $g$ to $f$. As the reader will learn in more advanced courses, it is the concavity of the negative exponential utility that underlies this, intuitively, more prudent behavior.
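The comparison between the linear and the negative exponential decision maker can be reproduced in a few lines of Python; the bet below is the one from the text.

    import math

    P = {"H": 0.5, "T": 0.5}
    f = {"H": 50.0, "T": -50.0}   # the risky bet
    g = {"H": 0.0, "T": 0.0}      # the sure amount 0

    def EU(u, h):
        """Expected utility E_P(u(h))."""
        return sum(u(h[w]) * p for w, p in P.items())

    linear = lambda c: c
    negexp = lambda c: -math.exp(-c)

    print(EU(linear, f), EU(linear, g))    # 0.0 vs 0.0: indifference
    print(EU(negexp, f) < EU(negexp, g))   # True: the concave agent prefers g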
48.12.2 Lotteries

The law $P_f : \mathcal{B} \to [0,1]$ of a random variable $f : \Omega \to \mathbb{R}$ is given by
$$P_f(B) = P\big(f^{-1}(B)\big) \qquad \forall B \in \mathcal{B}$$
It assigns a probability to each (Borel) set of consequences $B$. For instance, $P_f([0,\infty))$ is the probability of a gain.

The probability measure $P_f$ is called the lottery induced by $f$. This terminology recalls the gambling origins of the calculus of probability in the sixteenth and seventeenth centuries. Lotteries make it possible to speak of probabilities of consequences, not just of states as we did so far. The lottery $P_f$ thus informs us about the probabilities of the various consequences that the random variable $f$ may deliver. Yet, different random variables, say $f$ and $g$, may happen to induce the same lottery, that is, $P_f = P_g$.
Example 2060 In the single coin toss example, with $\Omega = \{H, T\}$ and $P(H) = P(T) = 1/2$, consider the two bets
$$f(\omega) = \begin{cases} 10 & \text{if } \omega = T \\ 0 & \text{if } \omega = H \end{cases} \qquad ; \qquad g(\omega) = \begin{cases} 0 & \text{if } \omega = T \\ 10 & \text{if } \omega = H \end{cases}$$
In this case,
$$P_f(10) = P_f(0) = \frac{1}{2} = P_g(10) = P_g(0)$$
and so $P_f = P_g$. N
Although the lottery $P_f$ contains essential information about the random variable $f$, this example shows that some important information is lost in translating a random variable into a lottery: it is no longer known which state/event generated a consequence, only its probability.

Theorem 2061 Let $P : 2^\Omega \to [0,1]$ be a simple probability. For each random variable $f : \Omega \to \mathbb{R}$,
$$E_P(u \circ f) = E_{P_f}(u)$$
That is,
$$\sum_{\omega \in \mathrm{supp}\,P} u(f(\omega))\,P(\omega) = \sum_{c \in \mathrm{supp}\,P_f} u(c)\,P_f(c)$$

The expected utility $E_P(u \circ f)$ of $f$ is thus equal to the expected utility $E_{P_f}(u)$ of the induced lottery. As long as we consider the expected utility criterion, it is equivalent to rank random variables or the lotteries that they induce. What was lost in translation is, therefore, immaterial for this criterion. This parsimony is one of the reasons why this criterion is widely used.
Proof Define the distinct scalars $\{x_1, \dots, x_n\}$ as the values that $f$ takes on $\mathrm{supp}\,P$. Set $D = \{x_1, \dots, x_n\}$. By construction, $D = f(\mathrm{supp}\,P)$. Define
$$E_i = f^{-1}(x_i) \cap \mathrm{supp}\,P \qquad \forall i = 1, \dots, n$$
Since the scalars $x_i$ are distinct, the sets $\{E_1, \dots, E_n\}$ are pairwise disjoint. Moreover,
$$\bigcup_{i=1}^n E_i = \mathrm{supp}\,P$$
and
$$P_f(x_i) = P\big(f^{-1}(x_i)\big) = P\big(f^{-1}(x_i) \cap \mathrm{supp}\,P\big) = P(E_i) \qquad \forall i = 1, \dots, n$$
Since $P$ is additive,
$$P_f(D) = \sum_{i=1}^n P_f(x_i) = \sum_{i=1}^n P(E_i) = P\Big(\bigcup_{i=1}^n E_i\Big) = P(\mathrm{supp}\,P) = 1$$
Moreover,
$$E_P(u \circ f) = \sum_{\omega \in \mathrm{supp}\,P} u(f(\omega)) P(\omega) = \sum_{i=1}^n \sum_{\omega \in E_i} u(f(\omega)) P(\omega) = \sum_{i=1}^n u(x_i) P(E_i) = \sum_{i=1}^n u(x_i) P_f(x_i) = \sum_{x \in D} u(x) P_f(x)$$
as desired.
Appendices

Appendix A

Binary relations: modelling connections

A.1 Definition
Throughout the book we have already encountered binary relations a few times to model connections between elements, but we never formally introduced them. In a nutshell, the notion of binary relation formalizes the idea that an element $x$ is in a relation with an element $y$. It is an abstract notion that is best understood after having seen a few concrete examples that make it possible to appreciate its unifying power. We discuss it in an appendix, so that readers can decide if and when to go through it.

A first example of a binary relation is the relation "being greater than or equal to" among natural numbers: given any two natural numbers $x$ and $y$, we can always say whether $x$ is greater than or equal to $y$. For instance, $6$ is greater than or equal to $4$. In this example, $x$ and $y$ are natural numbers and "being in relation with" is equivalent to saying "being greater than or equal to".
Imagination is the only limit to the number of binary relations one can think of. Set theory is the language that we can use to formalize the idea that two objects are related to each other, that is, connected. For example, given the set of citizens $C$ of a country, we could say that citizen $x$ is in relation with citizen $y$ if $x$ is the mother of $y$. In this case, "being in relation with" amounts to "being the mother of".

Economics is a source of examples of binary relations. For instance, consider an agent and a set of alternatives $X$. The preference relation $\succsim$ is a binary relation. In this case, "$x$ is in relation with $y$" is equivalent to saying that the agent regards "$x$ as at least as good as $y$".
What do all these examples have in common? First, in all of them we considered two elements $x$ and $y$ of a set $X$. Second, these elements were in a specific order: one thing is to say that $x$ is in relation with $y$, another is to say that $y$ is in relation with $x$. So, the pair formed by $x$ and $y$ is an ordered pair $(x,y)$ that belongs to the Cartesian product $X \times X$. Finally, in all three examples it might well happen that a generic pair of elements $x$ and $y$ is actually unrelated. For instance, if in our second example $x$ and $y$ are siblings, obviously neither is the mother of the other. In other words, a given notion of "being in relation with" might not include all pairs of elements of $X$.

We are now ready to give a (set-theoretic) definition of binary relations.

Definition 2062 A binary relation $R$ on a set $X$ is a subset of the Cartesian product $X \times X$, that is, $R \subseteq X \times X$.

In terms of notation, we write $xRy$ in place of $(x,y) \in R$. Indeed, the notation $xRy$, which reads "$x$ is in the relation $R$ with $y$", is more evocative of what the concept of binary relation is trying to capture. So, in what follows we will adopt it.
To get acquainted with this new mathematical notion, let us now formalize our first three examples.

Example 2063 (i) Let $X$ be the set of natural numbers $\mathbb{N}$. The binary relation $\ge$ can be viewed as the subset of $\mathbb{N} \times \mathbb{N}$ given by
$$R = \{(x,y) \in \mathbb{N} \times \mathbb{N} : x \ge y\}$$
Indeed, it contains all pairs in which the first element $x$ is greater than or equal to the second element $y$.

(ii) Let $X$ be the set of all citizens $C$ of a country. The binary relation "being the mother of" can be viewed as the subset of $C \times C$ given by
$$R = \{(x,y) \in C \times C : x \text{ is the mother of } y\}$$
Indeed, it contains all pairs in which the first element is the mother of the second element.

(iii) Let $X$ be the set of all consumption bundles $\mathbb{R}^n_+$. The binary relation $\succsim$ can be seen as the subset of $\mathbb{R}^n_+ \times \mathbb{R}^n_+$ given by
$$R = \{(x,y) \in \mathbb{R}^n_+ \times \mathbb{R}^n_+ : x \succsim y\}$$
Indeed, it contains all pairs of bundles in which the first bundle is at least as good as the second one. N
A binary relation associates to each element $x$ of $X$ some elements $y$ of the same set (possibly $x$ itself, i.e., $x = y$). We denote by $R(x) = \{y \in X : xRy\}$ the image of $x$ through $R$, i.e., the collection of all $y$ that stand in the relation $R$ with a given $x$.

Example 2064 (i) For the binary relation $\ge$ on $\mathbb{N}$, the image $R(x) = \{y \in \mathbb{N} : x \ge y\}$ of $x \in \mathbb{N}$ consists of all natural numbers that are smaller than or equal to $x$. (ii) For the binary relation "being the mother of" on $C$, the image $R(x)$ consists of all the children of $x$.^1 (iii) For the binary relation $\succsim$ on $\mathbb{R}^n_+$, the image $R(x) = \{y \in \mathbb{R}^n_+ : x \succsim y\}$ of $x \in \mathbb{R}^n_+$ consists of all bundles that are at most as good as $x$. N
Example 2065 A self-map $f : X \to X$ can be viewed as the binary relation
$$R_f = \{(x, f(x)) \in X \times X : x \in X\}$$
on $X$ consisting of all pairs $(x, f(x))$. The image $R_f(x) = \{f(x)\}$ is a singleton consisting of the image $f(x)$. Indeed, self-maps can be regarded as the binary relations on $X$ that have singleton images, i.e., that associate to each element of $X$ a unique element of $X$. N

This last example is the occasion to remark that, though to fix ideas we focus on binary relations $R \subseteq X \times X$, the analysis easily extends to general binary relations $R \subseteq X \times Y$ with $X$ and $Y$ possibly distinct. For instance, a function $f : X \to Y$ can be viewed as the binary relation
$$R_f = \{(x, f(x)) \in X \times Y : x \in X\}$$
It is easy to see that binary relations $R \subseteq X \times Y$ correspond to the correspondences $\varphi : X \rightrightarrows Y$ defined by $\varphi(x) = R(x)$. We close by referring readers to Section D.7 for an important logic perspective on binary relations.
A.2 Properties

A binary relation $R$ can satisfy several properties. In particular, a binary relation $R$ on a set $X$ is:

(i) reflexive if $xRx$ for all $x \in X$;

(ii) complete if $xRy$ or $yRx$ for all $x, y \in X$;

(iii) symmetric if $xRy$ implies $yRx$;

(iv) asymmetric if $xRy$ implies that $yRx$ does not hold;

(v) antisymmetric if $xRy$ and $yRx$ imply $x = y$;

(vi) transitive if $xRy$ and $yRz$ imply $xRz$.

Often we will consider binary relations that satisfy more than one of these properties. However, some of them are incompatible, for example asymmetry and symmetry, while others are related, for example completeness implies reflexivity.^2
Example 2066 (i) Consider the binary relation $\ge$ on $\mathbb{N}$. Clearly, $\ge$ is complete (so, it is reflexive). Indeed, given any two natural numbers $x$ and $y$, one of them is greater than or equal to the other. Moreover, if both $x \ge y$ and $y \ge x$, then $x = y$. Thus, $\ge$ is antisymmetric. Finally, $\ge$ is transitive but it is neither symmetric nor asymmetric.

(ii) Let $R$ be the binary relation "being the mother of" on $C$. Individuals cannot be their own mothers, so $R$ is not reflexive (thus, it is not complete either). Similarly, $R$ is not symmetric since if $x$ is the mother of $y$, then $y$ cannot be the mother of $x$. We leave it to the reader to verify that $R$ is not transitive. N

^2 Indeed, a complete binary relation $R$ on $X$ is, in particular, able to compare an element of $X$ with itself.
Example 2067 Let $R$ be the binary relation "being married to" on $C$. This relation consists of all pairs of citizens $(x,y) \in C \times C$ such that $x$ is the spouse of $y$. That is, $xRy$ means that $x$ is married to $y$. The image $R(x)$ is a singleton consisting of the spouse of $x$. The "married to" relation is neither reflexive (individuals cannot be married to themselves) nor antisymmetric (married couples do not merge into single individuals). It is symmetric, since spouses are married to each other, while transitivity fails: $xRy$ and $yRz$ imply $x = z$, and individuals cannot be married to themselves. Finally, this relation is not complete since it is not reflexive. N
The relation $\ge$ on $\mathbb{N}$ is the prototype of the following important class of binary relations.

Definition 2068 A binary relation $R$ on a set $X$ is a partial order if it is reflexive, transitive and antisymmetric. A complete partial order is called a complete order.

For example, the binary relation $\ge$ on $\mathbb{R}^n$ satisfies reflexivity, transitivity and antisymmetry, so it is a partial order (cf. Section 2.3). If $n = 1$, this binary relation is complete, thus $\ge$ is a complete order. If $n > 1$, this is no longer the case, as we emphasized several times in the text: for instance, the vectors $(1,2)$ and $(2,1)$ cannot be ordered by the relation $\ge$.
Example 2069 (i) Consider the space $\mathbb{R}^\infty$ of all the sequences of real numbers (Section 8.2). The componentwise order $\ge$ on $\mathbb{R}^\infty$ defined by $x \ge y$ if $x_n \ge y_n$ for each $n \ge 1$ is easily seen to be a partial order. (ii) Given any set $A$, consider the space $\mathbb{R}^A$ of real-valued functions $f : A \to \mathbb{R}$ (Section 6.3.2). The pointwise order $\ge$ on $\mathbb{R}^A$ defined by $f \ge g$ if $f(x) \ge g(x)$ for all $x \in A$ is also easily seen to be a partial order (the componentwise order on $\mathbb{R}^\infty$ is the special case $A = \mathbb{N}$). (iii) Consider the power set $2^X = \{A : A \subseteq X\}$ of a set $X$, i.e., the collection of all its subsets (cf. Section 7.3). The inclusion relation $\subseteq$ on $2^X$ is a partial order that is not complete: e.g., if $X = \{a,b,c\}$ the sets $\{a,b\}$ and $\{b,c\}$ cannot be ordered by the inclusion relation. N
The preference relation $\succsim$ is typically assumed to be reflexive and transitive (Section 6.8). It is also often assumed to be complete. In contrast, antisymmetry is too strong a property for a preference relation, in that it rules out the possibility that two different alternatives be indifferent. For example, if $X$ is a set of sports cars, an agent could rightfully declare a Ferrari as good as a Lamborghini, and obviously these two objects are quite different cars. This important example motivates the next definition.

Definition 2070 A binary relation $R$ on a set $X$ is a preorder if it is reflexive and transitive. A complete preorder is a preorder that is also complete.

So, the preference relations that one usually encounters in economics are an important example of complete preorders. Interestingly, we also encountered a preorder when we discussed the notion of "having cardinality less than or equal to" (Section 7.3).
Example 2071 Let $2^{\mathbb{R}}$ be the collection of all subsets of the real line. Define the binary relation $\succsim$ on $2^{\mathbb{R}}$ by $A \succsim B$ if $|A| \ge |B|$, i.e., if $A$ has cardinality higher than or equal to that of $B$ (Section 7.3). By Proposition 283, $\succsim$ is reflexive and transitive, so it is a preorder. It is not, however, a partial order because antisymmetry is clearly violated: for example, the sets $A = \{1, \pi\}$ and $B = \{2, 5\}$ have the same cardinality (i.e., both $A \succsim B$ and $B \succsim A$) yet they are different, i.e., $A \ne B$. N
Clearly, a partial order is a preorder, while this example shows that the converse is false.

Given a preorder $R$ on a set $X$, define the induced binary relation $I$ on $X$ by $xIy$ if both $xRy$ and $yRx$.

Proposition 2072 Let $R$ be a preorder on a set $X$. The induced binary relation $I$ is reflexive, symmetric and transitive.

This result is the general abstract version of what Lemma 261 established for a preference relation.

Proof Consider $x \in X$ and $y = x$. Since $R$ is reflexive and $x = y$, we have both $xRy$ and $yRx$. So, by definition $xIx$, proving the reflexivity of $I$. Next assume that $xIy$. By definition, we have that $xRy$ and $yRx$, which means that $yRx$ and $xRy$, yielding that $yIx$ and proving symmetry. Finally, assume that $xIy$ and $yIz$. It follows that $xRy$ and $yRx$ as well as $yRz$ and $zRy$. By $xRy$ and $yRz$ and the transitivity of $R$, we conclude that $xRz$. By $yRx$ and $zRy$ and the transitivity of $R$, we conclude that $zRx$. So, we have both $xRz$ and $zRx$, yielding $xIz$ and proving the transitivity of $I$.
Definition 2073 A binary relation on a set $X$ is an equivalence relation if it is reflexive, symmetric and transitive.

The indifference relation $\sim$ is, of course, an important economic example of an equivalence relation. More generally, the induced relation $I$ of a preorder is an equivalence relation by the last proposition. Equivalence relations play an important role in both mathematics and applications because they formalize a notion of similarity. Reflexivity captures the idea that an object must be similar to itself, while symmetry amounts to saying that if $x$ is similar to $y$, then $y$ is similar to $x$. As for transitivity, an analogous argument holds.

Given an equivalence relation $R$ on $X$ and an element $x \in X$, consider the set
$$[x] = \{y \in X : xRy\}$$
The collection $[x]$, which is nothing but the image $R(x)$ of $x$, is called the equivalence class of $x$.
Proposition 2074 Let $R$ be an equivalence relation on a set $X$. If $y \in [x]$, then $[y] = [x]$.

Thus, the choice of the representative $x$ in defining the equivalence class is immaterial: any element of the equivalence class can play that role.

Proof Let $y \in [x]$. Then $[y] \subseteq [x]$. Indeed, if $y' \in [y]$, then $yRy'$ and so, being $xRy$, by transitivity $xRy'$, i.e., $y' \in [x]$. On the other hand, $y \in [x]$ implies $x \in [y]$ by symmetry. So, $[x] \subseteq [y]$. We conclude that $[y] = [x]$.
For a preference relation, the equivalence classes are the indifference classes, i.e., $[x]$ is the collection of all alternatives indifferent to $x$. Let us see another classic example.

Example 2075 The preorder $\succsim$ on $2^{\mathbb{R}}$ of Example 2071 induces the equivalence relation $\sim$ on $2^{\mathbb{R}}$ defined by $A \sim B$ if and only if $|A| = |B|$, i.e., if $A$ has the same cardinality as $B$. If we consider the set $\mathbb{Q}$, the equivalence class $[\mathbb{Q}]$ is the class of all sets that are countable, for example $\mathbb{N}$ and $\mathbb{Z}$. Intuitively, this binary relation declares two sets similar if they share the same number of elements. N
At this point the reader might think that all equivalence relations of interest are derived from an underlying preorder, so have the form $I$ and are a derived notion. This is not the case: the following classic equivalence relation has an independent interest and is not obtained via a meaningful preorder.

Example 2076 Let $n \in \mathbb{Z}$ be such that $n \ge 2$. Consider the binary relation $R$ on the set of integers $\mathbb{Z}$ such that $xRy$ if and only if $n$ divides $x - y$, that is, there exists $k \in \mathbb{Z}$ such that $x - y = kn$. Clearly, for any $x \in \mathbb{Z}$, we have $xRx$ since $x - x = kn$ with $k = 0$; so $R$ is reflexive. At the same time, if $x$ and $y$ in $\mathbb{Z}$ are such that $xRy$, then $x - y = kn$ for some $k \in \mathbb{Z}$, yielding that $y - x = (-k)n$. It follows that $yRx$, proving that $R$ is symmetric. Finally, if $x$, $y$ and $z$ in $\mathbb{Z}$ are such that $xRy$ and $yRz$, then $x - y = kn$ and $y - z = k'n$ for some $k, k' \in \mathbb{Z}$, yielding that $x - z = (k + k')n$. It follows that $xRz$, proving that $R$ is transitive. We conclude that $R$ is an equivalence relation. It is often denoted by $x \equiv y \pmod{n}$. N
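A small computational sketch shows the congruence relation at work: on a finite window of $\mathbb{Z}$ (hypothetically, the integers from $-9$ to $9$) it partitions the integers into $n = 3$ pairwise disjoint classes.

    n = 3
    window = range(-9, 10)

    def related(x, y):
        """x R y iff n divides x - y."""
        return (x - y) % n == 0

    classes = {}
    for x in window:
        classes.setdefault(x % n, []).append(x)

    print(classes)   # three pairwise disjoint classes covering the window
    # Consistency check: x and y are related iff they sit in the same class.
    assert all(related(x, y) == (x % n == y % n) for x in window for y in window)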
The next result shows that equivalence relations are closely connected to partitions of $X$, that is, to subdivisions of the set of interest $X$ into mutually exclusive classes. It generalizes the basic property that indifference curves are disjoint (Lemma 262).

Proposition 2077 If $R$ is an equivalence relation on a set $X$, the collection of its equivalence classes $\{[x] : x \in X\}$ is a partition of $X$. Vice versa, any partition $\Pi = \{A_i\}_{i \in I}$ of $X$ is the collection of equivalence classes of the equivalence relation $R$ defined by $xRy$ if there exists $A \in \Pi$ such that $x, y \in A$.

Proof Let us prove that the equivalence classes $\{[x] : x \in X\}$ are pairwise disjoint. Given any $x, y \in X$, suppose that $[x] \cap [y] \ne \emptyset$. We want to show that $[x] = [y]$. Since we can interchange the roles of $x$ and $y$, it is enough to prove that $[y] \subseteq [x]$. So, let $y' \in [y]$, that is, $yRy'$. Since $[x] \cap [y] \ne \emptyset$, there exists $z \in [x] \cap [y]$, that is, $xRz$ and $yRz$. By symmetry, $zRy$ and so, by transitivity, $zRy'$. Again by transitivity, along with $xRz$ this implies $xRy'$, that is, $y' \in [x]$. This proves the inclusion $[y] \subseteq [x]$. We leave the rest of the statement to the reader.
Example 2078 (i) The relation "having the same age" is an equivalence relation on $C$, whose equivalence classes consist of all citizens that have the same age, that is, who belong to the same age cohort. The quotient space, i.e., the collection of all equivalence classes, has, as points, the age cohorts. (ii) For the indifference relation $\sim$ on $\mathbb{R}^n_+$, the quotient space has, as points, the indifference curves. N
Appendix B

Permutations

B.1 Generalities

Combinatorics is an important area of discrete mathematics, useful in many applications. Here we focus on a few combinatorial notions that are important to understand some of the topics of the book.
We start with a simple problem. We have at our disposal three pairs of pants and five T-shirts. If there are no chromatic pairings that hurt our aesthetic sense, in how many ways can we possibly dress? The answer is simple: in $3 \cdot 5 = 15$ ways. Indeed, let us call the pairs of pants $a$, $b$, $c$ and the T-shirts $1$, $2$, $3$, $4$, $5$: since the choice of a certain T-shirt does not impose any (aesthetic) restriction on the choice of the pants, the possible pairings are
$$\begin{matrix} a1 & a2 & a3 & a4 & a5 \\ b1 & b2 & b3 & b4 & b5 \\ c1 & c2 & c3 & c4 & c5 \end{matrix}$$
We can therefore conclude that if we have to make two independent choices, one among $n$ different alternatives and the other among $m$ different alternatives, the total number of possible choices is $n \cdot m$. In particular, suppose that $A$ and $B$ are any two sets with $n$ and $m$ elements, respectively. Their Cartesian product $A \times B$, which is the set of ordered pairs $(a,b)$ with $a \in A$ and $b \in B$, has $n \cdot m$ elements. That is:
$$|A \times B| = |A| \cdot |B|$$
What has been said can be easily extended to the case of more than two choices: if we have to make multiple choices, none of which imposes restrictions on the others, the total number of possible choices is the product of the numbers of alternatives for each choice. Formally, if the $i$-th choice has $n_i$ alternatives, with $i = 1, \dots, k$, the total number of possible choices is $n_1 \cdot n_2 \cdots n_k$.
Example 2081 (i) How many Italian licence plates are possible? They have the form AA 000 AA, with two letters, three digits and again two letters. There are 22 letters that can be used and, obviously, 10 digits. The number of (different) plates is, therefore, $22 \cdot 22 \cdot 10 \cdot 10 \cdot 10 \cdot 22 \cdot 22 = 234{,}256{,}000$. (ii) In a multiple choice test, in each question students have to select one of three possible answers. If there are 13 questions, then the overall number of possible selections is $3^{13} = 1{,}594{,}323$. (iii) A three-course meal in an American restaurant consists of an appetizer, a main course and a dessert. If a menu lists 3 appetizers, 4 main courses and 2 desserts, we can then have $3 \cdot 4 \cdot 2 = 24$ different three-course meals. N
B.2 Permutations

Intuitively, a permutation of $n$ distinct objects is a possible arrangement of these objects. For instance, with three objects $a$, $b$, $c$ there are 6 permutations:
$$abc, \quad acb, \quad bac, \quad bca, \quad cab, \quad cba \tag{B.1}$$
Formally, permutations are nothing but the bijective functions $f : X \to X$. Though combinatorics typically considers finite sets $X$, the definition is fully general. For instance, if $X = \{a,b,c\}$ the permutations $f : \{a,b,c\} \to \{a,b,c\}$ that correspond to the arrangements (B.1) are the six bijections that map the triple $(a,b,c)$ to, respectively, $(a,b,c)$, $(a,c,b)$, $(b,a,c)$, $(b,c,a)$, $(c,a,b)$ and $(c,b,a)$. In general, a set of $n$ distinct objects has $n!$ permutations.

Example 2084 (i) A deck of 52 cards can be reshuffled in $52!$ different ways. (ii) Six passengers can occupy a six-passenger car in $6! = 720$ different ways. N

The factorial grows very quickly: indeed, Lemma 361 showed that $\alpha^n = o(n!)$. The already very fast exponentials are actually slower than factorials, which definitely deserve their exclamation mark.
B.3 Anagrams

We now drop the requirement that the objects be distinct and allow for repetitions. Specifically, in this section we consider n objects of $h \le n$ different types, each type i with multiplicity $k_i$, where $i = 1, \ldots, h$ and $\sum_{i=1}^{h} k_i = n$.¹ For instance, consider the 6 objects

a, a, b, b, b, c

of $h = 3$ types, with multiplicities $k_1 = 2$, $k_2 = 3$ and $k_3 = 1$.
Proposition 2085 The number of distinct arrangements, called permutations with repetitions (or anagrams), is
$$\frac{n!}{k_1! \, k_2! \cdots k_h!} \qquad (B.2)$$
The integers (B.2) are called multinomial coefficients, sometimes denoted by
$$\binom{n}{k_1 \, k_2 \, \cdots \, k_h}$$
Example 2086 (i) The possible anagrams of the word ABA are $3!/(2! \, 1!) = 3$. They are ABA, AAB, BAA. (ii) The possible anagrams of the word MAMMA are $5!/(3! \, 2!) = 120/(6 \cdot 2) = 10$. They are MAMMA, MAMAM, MMAMA, MMAAM, MAAMM, AMMMA, AMMAM, AAMMM, MMMAA, AMAMM. N
¹ Note that, because of repetitions, these n objects do not form a set X. The notion of "multiset" is sometimes used for collections in which repetitions are permitted (cf. Section 3.7).
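As an illustration of formula (B.2), the following Python sketch (ours, not the book's; the helper name anagrams is our own) counts anagrams both by the formula and by brute-force enumeration:

```python
from itertools import permutations
from math import factorial

def anagrams(word: str) -> int:
    """Number of distinct arrangements, formula (B.2)."""
    n = factorial(len(word))
    for letter in set(word):
        n //= factorial(word.count(letter))
    return n

# Example 2086, verified against brute-force enumeration.
assert anagrams("ABA") == len(set(permutations("ABA"))) == 3
assert anagrams("MAMMA") == len(set(permutations("MAMMA"))) == 10
```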
In the important two-type case, $h = 2$, we have k objects of one type and $n - k$ of the other type. By (B.2), the number of distinct arrangements is
$$\frac{n!}{k! \, (n-k)!} \qquad (B.3)$$
The integers (B.3) are called binomial coefficients and are denoted by
$$\binom{n}{k} = \frac{n!}{k! \, (n-k)!} = \frac{n (n-1) \cdots (n-k+1)}{k!}$$
with
$$\binom{n}{0} = \frac{n!}{0! \, n!} = 1$$
The following identity can be easily proved, for $0 \le k \le n$,
$$\binom{n}{k} = \binom{n}{n-k} \qquad (B.4)$$
It captures a natural symmetry: the number of distinct arrangements remains the same, regardless of which of the two types we focus on.
Example 2087 (i) In a parking lot, spots can be either free or busy. Suppose that 15 out of the 20 available spots are busy. The possible arrangements of the 5 free spots (or, symmetrically, of the 15 busy spots) are:
$$\binom{20}{5} = \binom{20}{15} = 15{,}504$$
(ii) We repeat an experiment 100 times: each time we can record either a "success" or a "failure", so a string of 100 outcomes like FSFF...S results. Suppose that we have recorded 92 "successes" and 8 "failures". The number of different strings that may result is:
$$\binom{100}{92} = \binom{100}{8} = 186{,}087{,}894{,}300$$
N
Another identity that can be easily proved is, for $1 \le k \le n$,
$$\binom{n}{k} = \frac{n}{k} \binom{n-1}{k-1} \qquad (B.5)$$
This formula relates binomial coefficients with the corresponding ratios and establishes a recurrence for binomial coefficients.
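The identities (B.3)-(B.5) are easy to machine-check; here is a small Python sketch (our own illustration) using the standard library's math.comb:

```python
from math import comb

n = 20
for k in range(n + 1):
    # (B.4): symmetry of the binomial coefficients.
    assert comb(n, k) == comb(n, n - k)
    # (B.5): the recurrence, rewritten as k * C(n,k) = n * C(n-1,k-1).
    if k >= 1:
        assert k * comb(n, k) == n * comb(n - 1, k - 1)

# Example 2087.
assert comb(20, 5) == comb(20, 15) == 15_504
assert comb(100, 92) == comb(100, 8) == 186_087_894_300
```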
B.4 A set-theoretic angle
Proposition 2088 Let A be a set with n elements. The number of subsets of A that have k elements is $\binom{n}{k}$.

So, the binomial coefficient $\binom{n}{k}$ gives the number of ways in which we can select k different elements from a set that has n elements.
Example 2089 For the set $A = \{a_1, a_2, a_3, a_4, a_5\}$, the number of subsets of A that have 2 elements is
$$\binom{5}{2} = \frac{5!}{2! \, 3!} = 10$$
Indeed, these sets are $\{a_1, a_2\}$, $\{a_1, a_3\}$, $\{a_1, a_4\}$, $\{a_1, a_5\}$, $\{a_2, a_3\}$, $\{a_2, a_4\}$, $\{a_2, a_5\}$, $\{a_3, a_4\}$, $\{a_3, a_5\}$, $\{a_4, a_5\}$. N
Example 2090 Consider two urns I and II, and 10 balls numbered from 1 to 10. If urn I can contain 3 balls, there are $\binom{10}{3}$ different ways in which the balls can fill the two urns. N

In view of the last example, we can say that the binomial coefficient $\binom{n}{k}$ is the number of ways in which n numbered (so, distinguishable) balls can fill 2 urns that can contain k and $n - k$ balls, respectively.
In a similar vein, rather than 2 consider h different urns that can contain $k_1, k_2, \ldots, k_h$ balls. If we set $n = \sum_{i=1}^{h} k_i$, the multinomial coefficient
$$\binom{n}{k_1 \, k_2 \, \cdots \, k_h}$$
is the number of ways in which the n numbered balls can fill the h urns.
Example 2091 Let $B = \{b_1, b_2, b_3, b_4, b_5\}$ be a set of 5 numbered balls. Assume there are 3 different urns I, II and III that can contain 1 ball, 2 balls and again 2 balls, respectively. The number of ways in which these 5 numbered balls can fill urns I, II and III is
$$\frac{5!}{1! \, 2! \, 2!} = 30 \qquad (B.6)$$
For each of the 5 balls that we may select to put in urn I, there are $\binom{4}{2} = 6$ ways to fill urns II and III with the remaining balls. So, we have $5 \cdot 6 = 30$ ways to fill the urns, in accordance with the multinomial coefficient (B.6).
N
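A brute-force Python sketch (ours, not the book's) confirms the count of Example 2091 by enumerating the fillings directly:

```python
from itertools import combinations
from math import factorial

# Distribute 5 numbered balls into urns of capacities 1, 2, 2.
balls = {1, 2, 3, 4, 5}
ways = 0
for urn1 in combinations(balls, 1):
    rest = balls - set(urn1)
    for urn2 in combinations(rest, 2):
        ways += 1  # urn III is forced: it gets the remaining 2 balls
assert ways == factorial(5) // (factorial(1) * factorial(2) * factorial(2)) == 30
```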
B.5 Newton's binomial formula

The binomial coefficients owe their name to the following celebrated formula.

Proposition 2092 For every $a, b \in \mathbb{R}$ and every $n \ge 1$,
$$(a + b)^n = a^n + \binom{n}{1} a^{n-1} b + \binom{n}{2} a^{n-2} b^2 + \cdots + \binom{n}{n-1} a b^{n-1} + b^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k \qquad (B.7)$$
Proof We proceed by induction. The initial step, that is, the veracity of the statement for $n = 1$, is trivially verified. Indeed:
$$(a + b)^1 = a + b = \binom{1}{0} a^1 b^0 + \binom{1}{1} a^0 b^1 = \sum_{k=0}^{1} \binom{1}{k} a^{1-k} b^k$$
We next prove the induction step. We assume the statement holds for n, that is,
$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k$$
and we show it holds for $n + 1$ as well. In doing so, we will use the combinatorial identity (10.6), that is,
$$\binom{n+1}{i} = \binom{n}{i-1} + \binom{n}{i} \qquad \forall i = 1, \ldots, n$$
Note that
$$(a + b)^{n+1} = (a + b)(a + b)^n = (a + b) \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k$$
$$= \sum_{k=0}^{n} \binom{n}{k} a^{n+1-k} b^k + \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^{k+1}$$
$$= \sum_{i=0}^{n} \binom{n}{i} a^{n+1-i} b^i + \sum_{i=1}^{n+1} \binom{n}{i-1} a^{n+1-i} b^i$$
$$= a^{n+1} + \sum_{i=1}^{n} \binom{n}{i} a^{n+1-i} b^i + \sum_{i=1}^{n} \binom{n}{i-1} a^{n+1-i} b^i + b^{n+1}$$
$$= a^{n+1} + \sum_{i=1}^{n} \left[ \binom{n}{i-1} + \binom{n}{i} \right] a^{n+1-i} b^i + b^{n+1}$$
$$= a^{n+1} + \sum_{i=1}^{n} \binom{n+1}{i} a^{n+1-i} b^i + b^{n+1} = \sum_{i=0}^{n+1} \binom{n+1}{i} a^{n+1-i} b^i$$
So, the statement holds for n + 1, thus proving the induction step and the main statement.
Formula (B.7) is called the Newton binomial formula. It motivates the name of binomial coefficients for the integers $\binom{n}{k}$. In particular,
$$(1 + x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k \qquad (B.8)$$
If we take $x = 1$ we obtain the remarkable relation
$$\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n$$
which can be used to prove that if a finite set has cardinality n, then its power set has cardinality $2^n$ (cf. Proposition 280). Indeed, by Proposition 2088 there is only one, $1 = \binom{n}{0}$, subset with 0 elements (the empty set), $n = \binom{n}{1}$ subsets with only one element, $\binom{n}{2}$ subsets with two elements, ..., and finally only one, $1 = \binom{n}{n}$, subset with all the n elements: the set itself.
More generally, Newton's formula extends to the multinomial formula
$$(a_1 + a_2 + \cdots + a_h)^n = \sum \binom{n}{k_1 \, k_2 \, \cdots \, k_h} a_1^{k_1} a_2^{k_2} \cdots a_h^{k_h}$$
where the sum is taken over all the multiplicities $k_1, k_2, \ldots, k_h \ge 0$ such that $\sum_{i=1}^{h} k_i = n$. This formula motivates the name of multinomial coefficients for the integers (B.2). For instance, the classic formula
$$(a_1 + a_2 + a_3)^3 = a_1^3 + a_2^3 + a_3^3 + 3 a_1 a_2^2 + 3 a_1 a_3^2 + 3 a_1^2 a_2 + 3 a_1^2 a_3 + 3 a_2 a_3^2 + 3 a_2^2 a_3 + 6 a_1 a_2 a_3$$
is the case $h = 3$ and $n = 3$.
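Both Newton's formula (B.7) and the relation $\binom{n}{0} + \cdots + \binom{n}{n} = 2^n$ are easy to verify numerically; here is a minimal Python sketch of ours, with arbitrarily chosen a, b and n:

```python
from math import comb

def binomial_expansion(a: int, b: int, n: int) -> int:
    """Right-hand side of Newton's formula (B.7)."""
    return sum(comb(n, k) * a ** (n - k) * b ** k for k in range(n + 1))

# Exact integer check of (B.7) and of the x = 1 special case.
assert binomial_expansion(3, 5, 7) == (3 + 5) ** 7
assert sum(comb(10, k) for k in range(11)) == 2 ** 10
```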
Appendix C

Notions of trigonometry (sdoganato)
C.1 Generalities
We call trigonometric circle the circle with center at the origin and radius 1, oriented counterclockwise, and on which one moves starting from the point of coordinates $(1, 0)$.¹
[Figure: the trigonometric circle, with starting point (1, 0).]
Clearly, each point on the circle determines an angle between the positive horizontal axis and the straight line joining the point with the origin; vice versa, each angle determines a point on the circle. This correspondence between points and angles can be, equivalently, viewed as a correspondence between points and arcs of circle. In the following figure, the point P determines the angle α as well as the corresponding arc α′.

¹ For an introduction to trigonometry we refer readers to Gelfand and Saul (2001).
[Figure: a point P on the trigonometric circle, the angle α it determines and the corresponding arc α′.]
Angles are usually measured in either degrees or radians. A degree is the 360th part of a round angle (corresponding to a complete round of the circle); a radian is an, apparently strange, unit of measure that assigns measure 2π to a round angle, of which it is therefore the 2π-th part. We will use the radian as unit of measure of angles because it presents some advantages over the degree. In any case, the next table lists some equivalent values of degrees and radians.

degrees   0   30    45    60    90    180   270    360
radians   0   π/6   π/4   π/3   π/2   π     3π/2   2π

Angles that differ by one or more complete rounds of the circle are identical: to write α or α + 2kπ, with $k \in \mathbb{Z}$, is the same. We will therefore always take $0 \le \alpha < 2\pi$.
Fix a point $P = (P_1, P_2)$ on the trigonometric circle, as in the previous figure. The sine of the angle α determined by the point P is the ordinate $P_2$ of such point, while the cosine of α is the abscissa $P_1$.
The sine and the cosine of the angle α are denoted, respectively, by sin α and cos α. The sine is positive in the quadrants I and II, and negative in the quadrants III and IV. The cosine is positive in the quadrants I and IV, and negative in the quadrants II and III. For example,

α       0   π/4    π/2   π    3π/2   2π
sin α   0   √2/2   1     0    −1     0
cos α   1   √2/2   0     −1   0      1
In particular, for the tangent $\tan \alpha = \sin \alpha / \cos \alpha$ one has
$$\sin^2 \alpha = \frac{\tan^2 \alpha}{1 + \tan^2 \alpha}$$
Finally, the reciprocals of sine, cosine and tangent are called cosecant, secant and cotangent, respectively.
Next we list some formulas that we do not prove (in any case, it would be enough to prove the first two because the other ones are simple consequences).
Addition and subtraction formulas:
$$\sin(\alpha + \beta) = \sin \alpha \cos \beta + \cos \alpha \sin \beta \, ; \qquad \cos(\alpha + \beta) = \cos \alpha \cos \beta - \sin \alpha \sin \beta$$
and
$$\sin(\alpha - \beta) = \sin \alpha \cos \beta - \cos \alpha \sin \beta \, ; \qquad \cos(\alpha - \beta) = \cos \alpha \cos \beta + \sin \alpha \sin \beta \qquad (C.4)$$
Half-angle formulas:
$$\left| \sin \frac{\alpha}{2} \right| = \sqrt{\frac{1 - \cos \alpha}{2}} \, ; \qquad \left| \cos \frac{\alpha}{2} \right| = \sqrt{\frac{1 + \cos \alpha}{2}}$$
Prostaphaeresis formulas (addition and subtraction):
$$\sin \alpha + \sin \beta = 2 \sin \frac{\alpha + \beta}{2} \cos \frac{\alpha - \beta}{2} \, ; \qquad \sin \alpha - \sin \beta = 2 \cos \frac{\alpha + \beta}{2} \sin \frac{\alpha - \beta}{2}$$
and
$$\cos \alpha + \cos \beta = 2 \cos \frac{\alpha + \beta}{2} \cos \frac{\alpha - \beta}{2} \, ; \qquad \cos \alpha - \cos \beta = -2 \sin \frac{\alpha + \beta}{2} \sin \frac{\alpha - \beta}{2}$$
C.2 Concerto d'archi (string concert)

We close with a few classic theorems that show how trigonometry is intimately linked to the study of triangles. In these theorems a, b, c denote the lengths of the three sides of a triangle and α, β, γ the angles opposite to them.

Theorem 2093 (Law of Sines) Sides are proportional to the sines of their opposite angles, that is,
$$\frac{a}{\sin \alpha} = \frac{b}{\sin \beta} = \frac{c}{\sin \gamma}$$

An interesting consequence of the law of sines is that the area of a triangle can be expressed in trigonometric form via the lengths of two sides and the angle opposite to the third side. Specifically, if the two sides are b and c, the area is
$$\frac{1}{2} \, b c \sin \alpha \qquad (C.5)$$
Indeed, draw in the last figure a perpendicular from the top vertex to the side of length c, and denote its length by h. From, at least, high school we know that the area of the triangle is $ch/2$ (it is the classic "half the base times the height" formula). Consider the right triangle that has the side of length b as hypotenuse and the perpendicular of length h as a cathetus. By the law of sines,
$$\frac{h}{\sin \alpha} = \frac{b}{\sin \frac{\pi}{2}}$$
So, $h = b \sin \alpha$. The trigonometric formula (C.5) then follows from the high school formula $ch/2$.
Example 2094 Some important geometric figures in the plane can be subdivided in triangles, so their area can be recovered by adding up the areas of such triangles. For instance, consider a regular polygon with n sides of equal length and n central angles of equal measure 2π/n. For example, a hexagon has six sides of equal length and six central angles of equal measure π/3 (i.e., 60 degrees).
Denote by r the radius of this regular polygon. Each regular polygon is partitioned in n identical isosceles triangles with two sides of equal length r; for instance, in the hexagon there are six such triangles. By formula (C.5), the area of each of these identical isosceles triangles is $\frac{1}{2} r^2 \sin \frac{2\pi}{n}$, so the area of the polygon is
$$\frac{n}{2} \, r^2 \sin \frac{2\pi}{n} \qquad (C.6)$$
For example, the area of the hexagon is $3 \sqrt{3} \, r^2 / 2$ since $\sin(\pi/3) = \sqrt{3}/2$.
The subdivision of geometric figures of the plane in triangles is called triangulation, an important technique that may permit reducing the study of geometric figures to that of triangles (by taking limits via arbitrarily small triangles, the technique becomes especially powerful). N
Example 2095 The famous number π can be defined as the area of the closed unit ball of the plane.

[Figure: the closed unit ball of R².]
To compute π amounts to computing this area, a problem that Archimedes famously approached via the method of exhaustion.² This method considers the areas of inscribed and circumscribed polygons, which provide lower and upper approximations for π, respectively. Indeed, the area of any inscribed polygon is always ≤ π, while the area of any circumscribed polygon is always ≥ π. For instance, consider a regular polygon inscribed in the closed unit ball, like the hexagon.
By increasing the number of sides, we get larger and larger inscribed regular polygons that provide better and better lower approximations of π. The area of each such polygon is given by formula (C.6). Since their radius r is 1, we thus have the lower approximations
$$\frac{n}{2} \sin \frac{2\pi}{n} \le \pi \qquad \forall n \ge 1$$
² A remarkable early estimate of π can be found in the Bible. In the first Book of Kings (written around the VI or V century B.C.), one reads that Solomon made a wash basin for ablution with a diameter of 10 cubits and a circumference of 30 cubits. As a result, here π = 3.
Similarly, by increasing the number of sides we get smaller and smaller circumscribed regular polygons that provide better and better upper approximations of π. The radius r of the circumscribed regular polygon with n sides is the length of the equal sides of the isosceles triangles in which it can be partitioned, so $r = 1 / \cos(\pi/n) > 1$, as the reader can check with a picture. By (C.6), the areas of the circumscribed polygons thus give the upper approximations
$$\pi \le \frac{n}{2} \, \frac{1}{\cos^2 \frac{\pi}{n}} \sin \frac{2\pi}{n} \qquad \forall n \ge 1$$
that are better and better as n increases. At the limit, by setting again $x = 2\pi/n$ we have:
$$\lim_{n \to \infty} \frac{n}{2} \, \frac{1}{\cos^2 \frac{\pi}{n}} \sin \frac{2\pi}{n} = \lim_{x \to 0} \frac{1}{\cos^2 \frac{x}{2}} \, \pi \, \frac{\sin x}{x} = \pi$$
Summing up,
$$\frac{n}{2} \sin \frac{2\pi}{n} \le \pi \le \frac{n}{2} \, \frac{1}{\cos^2 \frac{\pi}{n}} \sin \frac{2\pi}{n} \qquad \forall n \ge 1 \qquad (C.7)$$
Via a trigonometric argument, we thus showed that the areas of the inscribed and circumscribed regular polygons provide lower and upper approximations of π that, as the number of sides increases, better and better sandwich π till, in the limit of "infinitely many sides", they reach π as their common limit value.³

³ The role of π in the approximations is to identify radians, so the actual knowledge of π is not needed (thus, there is no circularity in using these approximations for π).
The trigonometric approximations (C.7) thus justify the use of the method of exhaustion to compute π. Archimedes was able to compute the areas of the inscribed and circumscribed regular polygons till $n = 96$, getting the remarkable approximation
$$3.1408 \simeq 3 + \frac{10}{71} \le \pi \le 3 + \frac{1}{7} \simeq 3.1429$$
By computing the areas of the inscribed and circumscribed regular polygons for larger and larger n, we get better and better approximations of π. N
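The sandwich (C.7) is easy to explore numerically. The following Python sketch (our addition; the sample values of n are arbitrary) prints the lower and upper polygonal approximations:

```python
from math import sin, cos, pi

# Lower and upper area approximations of the sandwich (C.7); pi enters
# only to express the angles in radians, as footnote 3 remarks.
for n in (6, 12, 96, 1000):
    lower = (n / 2) * sin(2 * pi / n)
    upper = (n / 2) * sin(2 * pi / n) / cos(pi / n) ** 2
    assert lower <= pi <= upper
    print(n, round(lower, 6), round(upper, 6))
# For n = 96 the bracket is roughly [3.1394, 3.1427], in the spirit of
# Archimedes' estimate; it tightens as n grows.
```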
We close with a result that generalizes Pythagoras' Theorem.

Theorem (Law of Cosines) We have $a^2 = b^2 + c^2 - 2bc \cos \alpha$.

Pythagoras' Theorem is the special case when the triangle is right and side a is the hypotenuse: indeed, $\cos \alpha = \cos(\pi/2) = 0$.
C.3 Perpendicularity

The trigonometric circle consists of the points $x \in \mathbb{R}^2$ of unit norm, that is, $\|x\| = 1$. Hence, any point $x = (x_1, x_2) \in \mathbb{R}^2$ can be moved back on the unit circle by dividing it by its norm $\|x\|$ since
$$\left\| \frac{x}{\|x\|} \right\| = 1$$
It follows that
$$\sin \alpha = \frac{x_2}{\|x\|} \quad \text{and} \quad \cos \alpha = \frac{x_1}{\|x\|} \qquad (C.8)$$
that is,
$$x = (\|x\| \cos \alpha, \|x\| \sin \alpha)$$
This trigonometric representation of the vector x is called polar. The components $\|x\| \cos \alpha$ and $\|x\| \sin \alpha$ are called polar coordinates.
The angle α can be expressed through the inverse trigonometric functions arcsin x, arccos x and arctan x. To this end, observe that
$$\tan \alpha = \frac{\sin \alpha}{\cos \alpha} = \frac{x_2 / \|x\|}{x_1 / \|x\|} = \frac{x_2}{x_1}$$
so that
$$\alpha = \arctan \frac{x_2}{x_1} = \arccos \frac{x_1}{\|x\|} = \arcsin \frac{x_2}{\|x\|}$$
The equality $\alpha = \arctan(x_2 / x_1)$ is especially important because it permits expressing the angle α as a function of the coordinates of the point $x = (x_1, x_2)$.
Let x and y be two vectors in the plane $\mathbb{R}^2$ that determine the angles α and β, respectively. By (C.4), we have
$$\cos \alpha \cos \beta + \sin \alpha \sin \beta = \cos(\alpha - \beta)$$
that is,
$$\frac{x \cdot y}{\|x\| \, \|y\|} = \cos(\alpha - \beta)$$
where $\alpha - \beta$ is the angle that is the difference of the angles determined by the two points.
This angle is a right one, i.e., the vectors x and y are "perpendicular", when
$$\frac{x \cdot y}{\|x\| \, \|y\|} = \cos \frac{\pi}{2} = 0$$
that is, if and only if $x \cdot y = 0$. In other words, two vectors in the plane $\mathbb{R}^2$ are perpendicular when their inner product is zero.
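A small Python sketch (our own illustration; the vectors x and y are arbitrary choices with $x \cdot y = 0$) checks both the polar identity and the perpendicularity criterion:

```python
import math

def angle_of(x1: float, x2: float) -> float:
    """Angle of the vector (x1, x2), via atan2 (a robust arctan(x2/x1))."""
    return math.atan2(x2, x1)

x, y = (1.0, 2.0), (-4.0, 2.0)
inner = x[0] * y[0] + x[1] * y[1]      # inner product x . y
diff = angle_of(*x) - angle_of(*y)     # alpha - beta
norm = math.hypot(*x) * math.hypot(*y)
# x . y / (||x|| ||y||) = cos(alpha - beta)
assert abs(inner / norm - math.cos(diff)) < 1e-12
assert inner == 0.0                    # so x and y are perpendicular
```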
Appendix D

Elements of intuitive logic

In this chapter we will introduce some basic notions of logic. Though, "logically", these notions should actually be introduced at the beginning of a textbook, they can be best appreciated after having learned some mathematics (even if in a logically disordered way). This is why this chapter is an Appendix, leaving it to the reader to judge when it is best to read it.
D.1 Propositions
We call proposition a statement that can be either true or false. For example, "ravens are black" and "in the year 1965 it rained in Milan" are propositions. On the contrary, the statement "in the year 1965 it was cold in Milan" is not a proposition, unless we specify the meaning of cold, for example with the proposition "in the year 1965 the temperature went below zero in Milan".
We will denote propositions by letters such as p, q, .... Moreover, we will denote by 1 and 0, respectively, the truth and the falsity of a proposition: these are called truth values. Thus, a true proposition has truth value 1, while a false proposition has truth value 0.
D.2 Operations
Let us introduce some operations on propositions.
(i) Negation. Let p be a proposition; the negation, denoted by ¬p, is the proposition that is true when p is false and that is false when p is true. We can summarize the definition with the following truth table

p   ¬p
1   0
0   1

which reports the truth values of p and ¬p. For instance, if p is "in the year 1965 it rained in Milan", then ¬p is "in the year 1965 it did not rain in Milan".
(ii) Conjunction. Let p and q be two propositions; the conjunction of p and q, denoted by p ∧ q, is the proposition that is true when p and q are both true and is false when at least one of them is false. The truth table is:

p   q   p ∧ q
1   1   1
1   0   0
0   1   0
0   0   0

For instance, if p is "in the year 1965 it rained in Milan" and q is "in the year 1965 the temperature went below zero in Milan", then p ∧ q is "in the year 1965 it rained in Milan and the temperature went below zero".
(iii) Disjunction. Let p and q be two propositions; the disjunction of p and q, denoted by p ∨ q, is the proposition that is true when at least one between p and q is true and is false when both of them are false.¹ The truth table is:

p   q   p ∨ q
1   1   1
1   0   1
0   1   1
0   0   0

For instance, with the previous examples of p and q, then p ∨ q is "in the year 1965 it rained in Milan or the temperature went below zero".
(iv) Conditional. Let p and q be two propositions; the conditional of p and q, denoted by p ⟹ q, is the proposition that is false only when p is true and q is false. The truth table is:

p   q   p ⟹ q
1   1   1
1   0   0
0   1   1
0   0   1        (D.1)

The conditional is therefore true if, when p is true, also q is true, or if p is false (in which case the truth value of q is irrelevant). The proposition p is called the antecedent and q is the consequent. For instance, suppose the antecedent p is "I go on vacation" and the consequent q is "I go to the sea"; the conditional p ⟹ q is "If I go on vacation, then I go to the sea".
(v) Biconditional. Let p and q be two propositions; the biconditional of p and q, denoted by p ⟺ q, is the conjunction of the two conditionals p ⟹ q and q ⟹ p. The truth table is:

p   q   p ⟹ q   q ⟹ p   p ⟺ q
1   1   1        1        1
1   0   0        1        0
0   1   1        0        0
0   0   1        1        1

The biconditional is, therefore, true when p and q are either both true or both false, that is, when the two involved conditionals are both true. With the last example of p and q, the biconditional p ⟺ q is "I go on vacation if and only if I go to the sea".
These five logical operations allow us to build new propositions from old ones. Starting from the three propositions p, q and r, through negation, disjunction and conditional we can build, for example, the proposition
¬((p ∨ ¬q) ⟹ r)        (D.2)
For example, in your local newspaper you may read that "if this winter is colder or rainier than last winter, more people will get the flu". Let us rewrite this sentence in a less catchy but more accurate form, amenable to a logical analysis: "if (in our city) in this winter the daily average temperature will be lower than last winter or the daily average rainfall will be higher, then doctors will diagnose a higher number of flu cases". This pedantic rewriting shows that the newspaper claim corresponds to the following proposition
(p ∨ ¬q) ⟹ r
where p is the proposition "in this winter the daily average temperature will be lower than last year", q is the proposition "in this winter the daily average rainfall will not be higher than last year" and r is the proposition "in this winter doctors will diagnose a higher number of flu cases".
What about the negation (D.2)? It corresponds to a rival local newspaper that tells its readers "Forget about the other guys, just think of the opposite of what they say".
O.R. The true-false dichotomy originates in the Eleatic school, which based its dialectics upon it (Section 1.8). Apparently, it first appears as "[a thing] is or it is not" in the poem of Parmenides (trans. Raven). A serious challenge to the universal validity of the true-false dichotomy has been posed by some, old and new, paradoxes. We already encountered the set-theoretic paradox of Russell (Section 1.1.4). A simpler, much older, paradox is that of the liar: consider the self-referential proposition "this proposition is false". Is it true or false? Maybe it is both.² Be that as it may, in many matters, in mathematics, let alone in the empirical sciences, the dichotomy can be safely assumed.
A proposition that is true under all possible truth values of its components is called a tautology. For instance, the proposition p ⟹ (q ⟹ p) is a tautology, as its truth table shows:

p   q   q ⟹ p   p ⟹ (q ⟹ p)
1   1   1        1
1   0   1        1
0   1   0        1
0   0   1        1

In symbols, p ⟹ (q ⟹ p) ≡ 1.
Two propositions p and q are said to be (logically) equivalent, written p ≡ q, when they have the same truth values, i.e., they are always both true or both false. In other words, two propositions p and q are equivalent when the biconditional p ⟺ q is a tautology, i.e., it is always true. The relation ≡ is called logical equivalence.
The following properties are evident: the conjunction p ∧ ¬p is always false (the law of non-contradiction), while the disjunction p ∨ ¬p is always true (the law of excluded middle):

p   ¬p   p ∧ ¬p   p ∨ ¬p
1   0    0         1
0   1    0         1

If p is the proposition "all ravens are black", the contradiction p ∧ ¬p is "all ravens are both black and non-black" and the tautology p ∨ ¬p is "all ravens are either black or non-black".
The celebrated de Morgan's laws state that
¬(p ∧ q) ≡ ¬p ∨ ¬q   and   ¬(p ∨ q) ≡ ¬p ∧ ¬q
They can be proved through the truth tables; we confine ourselves to the first law:

p   q   p ∧ q   ¬(p ∧ q)   ¬p   ¬q   ¬p ∨ ¬q
1   1   1       0           0    0    0
1   0   0       1           0    1    1
0   1   0       1           1    0    1
0   0   0       1           1    1    1

The table shows that the truth values of ¬(p ∧ q) and of ¬p ∨ ¬q are identical, as claimed. Note an interesting duality: the laws of non-contradiction and of the excluded middle can be derived one from the other via de Morgan's laws.
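Truth tables lend themselves to machine checking. A minimal Python sketch (ours, not the book's) verifies both de Morgan's laws by enumerating all truth values:

```python
from itertools import product

# Enumerate all truth values of (p, q), exactly as a truth table does.
for p, q in product([True, False], repeat=2):
    assert (not (p and q)) == ((not p) or (not q))   # first law
    assert (not (p or q)) == ((not p) and (not q))   # second law
```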
A conditional is equivalent to its contrapositive:
p ⟹ q ≡ ¬q ⟹ ¬p        (D.3)
Indeed:

p   q   p ⟹ q   ¬p   ¬q   ¬q ⟹ ¬p
1   1   1        0    0    1
1   0   0        0    1    0
0   1   1        1    0    1
0   0   1        1    1    1
Another important equivalence is
p ⟹ q ≡ ¬p ∨ q        (D.4)
That is, the conditional p ⟹ q is equivalent to the disjunction of q and the negation of p. Indeed:

p   q   p ⟹ q   ¬p   ¬p ∨ q
1   1   1        0    1
1   0   0        0    0
0   1   1        1    1
0   0   1        1    1

For instance, the proposition "If I go on vacation, then I go to the sea" is equivalent to the proposition "I do not go on vacation or I go to the sea".
Similarly, for biconditionals we have the equivalence
p ⟺ q ≡ (¬p ∨ q) ∧ (p ∨ ¬q)
Finally, note that by the de Morgan laws, (D.4) implies the equivalence
¬(p ⟹ q) ≡ p ∧ ¬q        (D.5)
Indeed:

p   q   p ⟹ q   ¬(p ⟹ q)   p ∧ ¬q
1   1   1        0            0
1   0   0        1            1
0   1   1        0            0
0   0   1        0            0

For instance, the proposition "If I go on vacation, then I go to the sea" is false (true) if and only if the proposition "I go on vacation and I do not go to the sea" is true (false). Indeed, what about a mountain vacation?
N.B. Given two equivalent propositions, one of them is a tautology if and only if the other one is so. O
D.4 Deduction
D.4.1 Logical consequences
An equivalence is a biconditional which is a tautology, i.e., which is always true. In a similar vein, we call implication a conditional which is a tautology, that is, one such that (p ⟹ q) ≡ 1. In this case, if p is true then also q is true.³ We say that q is a logical consequence of p, written
p ⊨ q

³ When p is false the implication is automatically true, as the truth table (D.1) shows. Ex falso sequitur quodlibet.
The antecedent p is now called hypothesis and the consequent q thesis. Naturally, we have p ≡ q when simultaneously p ⊨ q and q ⊨ p.
Two classic implications are modus ponens and modus tollens:
(p ⟹ q) ∧ p ⊨ q   and   (p ⟹ q) ∧ ¬q ⊨ ¬p
In words, modus ponens says that if the conditional and the antecedent are both true, then the consequent is true. Modus tollens, instead, says that if the conditional is true and the consequent is false, then the antecedent is false. Thus, modus ponens is about the status of the consequent, while modus tollens is about the status of the antecedent.
We check only modus ponens:

p   q   p ⟹ q   (p ⟹ q) ∧ p   ((p ⟹ q) ∧ p) ⟹ q
1   1   1        1              1
1   0   0        0              1
0   1   1        0              1
0   0   1        0              1
Another classic implication is the hypothetical syllogism
(p ⟹ q) ∧ (q ⟹ r) ⊨ p ⟹ r
The transitive essence of this implication will be soon clarified by Lemma 2098. Let p and q be as before and r be the proposition "I swim". The hypothetical syllogism ensures that if it is true that "if I go on vacation, then I go to the sea" and that "if I go to the sea, then I swim", then it is also true that "if I go on vacation, then I swim".
A theorem p ⊨ q can be established in three main ways:

(a) direct proof: p ⊨ q, i.e., we establish directly that, if p is true, also q is so;

(b) proof by contraposition: ¬q ⊨ ¬p, i.e., we establish that the contrapositive ¬q ⟹ ¬p is a tautology (i.e., that if q is false, so is p);

(c) proof by contradiction (reductio ad absurdum): p ∧ ¬q ⊨ r ∧ ¬r, i.e., we establish that the conditional p ∧ ¬q ⟹ r ∧ ¬r is a tautology (i.e., that, if p is true and q is false, we reach a contradiction r ∧ ¬r).
The proof by contraposition relies on the equivalence (D.3) and is, basically, an upside down direct proof (for instance, Theorem 2103 will be proved by contraposition). For this reason, in what follows we will focus on the two main types of proofs, direct and by contradiction.

N.B. (i) When both p ⊨ q and q ⊨ p hold, the theorem takes the equivalence form p ≡ q. The implications p ⊨ q and q ⊨ p are independent and each of them requires its own proof (this is why in the book we studied separately the "if" and the "only if").
(ii) When, as is often the case, the hypothesis is the conjunction of several propositions, we write
p₁ ∧ ··· ∧ pₙ ⊨ q        (D.6)
So, the scope of the implication p ⊨ q is broader than it may appear prima facie. O
Direct proofs are, however, often articulated in several steps, in a divide et impera spirit. In this regard, the next result is key.

Lemma 2098 If p ⊨ r and r ⊨ q, then p ⊨ q.

Proof Assume p ⊨ r and r ⊨ q. We have to show that p ⟹ q is a tautology, that is, that if p is true, then q is true. Assume that p is true. Then, r is true because p ⊨ r. In turn, this implies that q is true because r ⊨ q.
Example 2099 (i) Assume that p is "n² + 1 is odd" and q is "n is even". To prove p ⊨ q, let us consider the auxiliary proposition r given by "n² is even". The implication p ⊨ r is obvious, while the implication r ⊨ q will be proved momentarily (Theorem 2102). Jointly, these two implications provide a direct proof p ⊨ r ⊨ q of p ⊨ q, that is, of the proposition "if n² + 1 is odd, then n is even".
(ii) Assume that p is "the scalar function f is differentiable" and q is "the scalar function f is integrable". To prove p ⊨ q it is natural to consider the auxiliary proposition r given by "the scalar function f is continuous". The implications p ⊨ r and r ⊨ q are basic calculus results that, jointly, provide a direct proof p ⊨ r ⊨ q of p ⊨ q, that is, of the proposition "if the scalar function f is differentiable, then it is integrable". N
The proof by contradiction relies, instead, on the equivalence
(p ⟹ q) ≡ (p ∧ ¬q ⟹ r ∧ ¬r)        (D.8)
which the following truth table confirms:

p   q   p ∧ ¬q   r ∧ ¬r   p ⟹ q   p ∧ ¬q ⟹ r ∧ ¬r
1   1   0         0         1        1
1   0   1         0         0        0
0   1   0         0         1        1
0   0   0         0         1        1

It does not matter what the proposition r is because, in any case, r ∧ ¬r is a contradiction. In a more compact way, we can indeed rewrite the last equivalence as
(p ⟹ q) ≡ (p ∧ ¬q ⟹ 0)
The proof by contradiction is, intellectually, the most intriguing; recall Section 1.8 on the birth of the deductive method. We illustrate it with one of the gems of Greek mathematics that we saw in the first chapter. For brevity, we do not repeat the proof of the first chapter and just present its logical analysis.

Theorem 2101 $\sqrt{2} \notin \mathbb{Q}$.
Logical analysis In this, as in other theorems, it might seem that there is no hypothesis, but it is not so: simply, the hypothesis is concealed. For example, here the concealed hypothesis is "the axioms of arithmetic, in particular those about arithmetical operations, hold". Let a be this concealed hypothesis,⁵ let q be the thesis "$\sqrt{2} \notin \mathbb{Q}$" and let r be the proposition "m/n is reduced to its lowest terms". The scheme of the proof is a ∧ ¬q ⊨ r ∧ ¬r, i.e., if arithmetical operations apply, the negation of the thesis leads to a contradiction.
An important special case of the equivalence (D.8) is when the role of r is played by the hypothesis p itself. In this case, (D.8) becomes
(p ⟹ q) ≡ (p ∧ ¬q ⟹ p ∧ ¬p)
The following truth table

p   q   p ⟹ q   p ∧ ¬q   ¬p   p ∧ ¬q ⟹ ¬p   p ∧ ¬q ⟹ p ∧ ¬p
1   1   1        0         0    1               1
1   0   0        1         0    0               0
0   1   1        0         1    1               1
0   0   1        0         1    1               1

shows that we also have (p ⟹ q) ≡ (p ∧ ¬q ⟹ ¬p). This is the scheme of the next proof.

Theorem 2102 If n² is even, then n is even.

Proof Let us assume, by contradiction, that n is odd. Then n² is odd, which contradicts the hypothesis.

Logical analysis Let p be the hypothesis "n² is even" and q the thesis "n is even". The scheme of the proof is p ∧ ¬q ⊨ ¬p.
⁵ This discussion will become clearer after the next section on the deductive method. In any case, we can think of a = a₁ ∧ ··· ∧ aₙ as the conjunction of a collection A = {a₁, ..., aₙ} of axioms of arithmetic (in our naive setup, we do not worry whether all such axioms can be expressed via propositional calculus, an issue that readers will study in more advanced courses).
D.4.5 Summing up
Proofs require, in general, some inspiration: there are no recipes or mechanical rules that can help us in finding, in a proof by contradiction, an auxiliary proposition r that determines the contradiction or, in a direct proof, the auxiliary propositions rᵢ that permit to articulate a direct argument.
That said, as to terminology the implication p ⊨ q can be read in different, but equivalent, ways:

(i) p implies q;

(ii) if p, then q;

(iii) p only if q;

(iv) q if p;

(v) p is a sufficient condition for q;

(vi) q is a necessary condition for p.

The choice among these versions is a matter of expositional convenience. Similarly, the equivalence p ≡ q can be read as: p if and only if q; or, p is a necessary and sufficient condition for q.
For example, the next simple result shows that the implication "a > 1 ⊨ a² > 1" is true, i.e., that "a > 1 is a sufficient condition for a² > 1", i.e., that "a² > 1 is a necessary condition for a > 1".
Proofs are at the heart of all mathematical investigations, pure and applied; they are their holy precincts. As such, their style has to be clear yet concise, with no frills: every word or symbol should be there for a reason. The main purpose of a proof is to prove that a theorem is correct and, sometimes, in doing so it may also shed light on the result itself (a major example is the proof of the irrationality of $\sqrt{2}$ via the odd-even dichotomy for natural numbers). But, unfortunately, proofs might well not be that illuminating; indeed, they might be the outcome of as much perspiration as inspiration.⁶
⁶ Le sudate carte of Leopardi's poem A Silvia.
More generally, a proposition q is a logical consequence of a collection Γ of propositions, written
Γ ⊨ q
when q is true whenever all the propositions in Γ are true. Logical consequences are established via deductive reasoning. Such reasoning might well be sequential, according for example to the deduction scheme (D.7).
If all propositions in Γ are true, so are their logical consequences. We say that Γ is (logically) consistent if no contradiction is a logical consequence of Γ, and complete if, for each proposition p, either p or ¬p is a logical consequence of Γ.
In a mathematical theory, theorems take the form
A ⊨ p        (D.9)
that stands for a ⊨ p, where the hypothesis a = a₁ ∧ ··· ∧ aₙ is the conjunction of the axioms A = {a₁, ..., aₙ}. The thesis p can, of course, be a proposition defined in terms of simpler propositions via some logical operations. For instance, theorems in a mathematical theory often have the "if..., then..." form
A ⊨ p ⟹ q
where the thesis is a conditional p ⟹ q. In this case, (D.9) takes the special form
A ∪ {p} ⊨ q        (D.10)
Indeed,
a ⟹ (p ⟹ q) ≡ ¬a ∨ (p ⟹ q)        (step 1)
≡ ¬a ∨ (¬p ∨ q)        (step 2)
≡ (¬a ∨ ¬p) ∨ q        (step 3)
≡ ¬(a ∧ p) ∨ q        (step 4)
≡ (a ∧ p) ⟹ q        (step 5)
where steps 1, 2 and 5 follow from (D.4), step 3 from the associativity of ∨ and step 4 from the de Morgan laws.
The scope of a mathematical theory is given by the propositions that, via theorems (D.9), can be derived from the axioms. Yet, to ease exposition, axioms are typically omitted in theorems' statements because they are taken for granted within the mathematical theory at hand. So, for theorems of the form (D.10) one just writes p ⊨ q in place of A ∪ {p} ⊨ q. A classic instance is Pythagoras' Theorem: "if a triangle is right, then the area of the square whose side is the hypotenuse is equal to the sum of the areas of the squares whose sides are the two catheti". In the mathematics jargon, p is called hypothesis of the theorem and q thesis. In view of the last proposition, it is a correct terminology with the caveat of the omitted axioms, which are theorems' veritable convitati di pietra (stone guests).
In a similar vein, some statements of theorems of the form A ⊨ q may appear to have no hypothesis, an optical illusion that we already noted for Theorem 2101. Many theorems of Euclidean geometry actually have this form. For instance, the important theorem "the sum of the three interior angles of a triangle equals two right angles" tacitly assumes Euclid's axioms, in particular the parallel postulate. This famous axiom is peculiar to Euclid's geometry and, indeed, this theorem is no longer true in non-Euclidean geometries. Even if not explicitly mentioned in the theorem, the parallel postulate thus looms in the background.
A.1 The proposition a₁ = "x ∼ x for all x ∈ I" is true (i.e., ∼ is reflexive).

A.2 The proposition a₂ = "x ∼ z and y ∼ z imply x ∼ y for all x, y, z ∈ I" is true.

Theorem 2105 The proposition q = "x ∼ y implies y ∼ x for all x, y ∈ I" is true (i.e., ∼ is symmetric).

Proof We have a₂ ⊨ r, where r = "z ∼ z and y ∼ z imply z ∼ y for all y, z ∈ I". So, the proof relies on the deduction scheme a₁ ∧ a₂ ⊨ a₁ ∧ r ⊨ q.⁸

Thus, under axioms A.1 and A.2 the binary relation ∼ is symmetric. It is easily checked to be also transitive.
The same argument applies to any abstract set X endowed with a binary relation R such that:

A.1 the proposition a₁ = "xRx for all x ∈ X" is true (i.e., R is reflexive);

A.2 the proposition a₂ = "xRz and yRz imply xRy for all x, y, z ∈ X" is true.

If we call Tarskian the property in A.2, we can state the abstract version of Theorem 2105 in a legible way: a reflexive and Tarskian binary relation is symmetric.
⁸ It is easy to check using truth tables that from q ⊨ r it follows p ∧ q ⊨ p ∧ r for all propositions p, q and r.
In all models of the abstract structure (X, R) that assume axioms A.1 and A.2, this theorem holds and will be suitably interpreted. The relations between the abstract structure (X, R) and the models that we discussed can be diagrammed as follows:

(X, R)
↓                          ↓
(segments, congruence)   (numbers, congruence)

D.6 Intermezzo: the logic of empirical scientific theories

A truth assignment is an assignment v of truth values to all the propositions of a collection P that is consistent with the logical operations; we denote by V the collection of all truth assignments. In particular:

Lemma 2107 For all propositions p, q ∈ P:

(i) v(p ∨ q) = max{v(p), v(q)} and v(p ∧ q) = min{v(p), v(q)} for all v ∈ V;

(ii) p ⊨ q if and only if v(p) ≤ v(q) for all v ∈ V.
⁹ Realism is a methodological position, widely held in the practice of natural and social science, that asserts the existence of an external, objective, reality that it is the purpose of scientific inquiries to investigate.
¹⁰ Of course, behind this sentence there are a number of highly non-trivial conceptual issues about meaning, truth, reality, etc. (an early classical analysis of these issues can be found in Carnap, 1936).
¹¹ The importance of propositions whose truth value is independent of any interpretation was pointed out by Ludwig Wittgenstein in his famous, yet often elusive (if not evanescent), Tractatus (the use of the term tautology in logic is due to him; he also popularized the use of truth tables to handle truth assignments).
¹² Debreu (1959) is a classic axiomatic work in economics. In the preface of his book, Debreu writes that "Allegiance to rigor dictates the axiomatic form of the analysis where the theory, in the strict sense, is logically entirely disconnected from its interpretations."
Proof As all other points are easily checked, we just prove the "only if" of (ii). If p ⊨ q then p ∧ q ≡ p and so, by (i), v(p) = v(p ∧ q) = min{v(p), v(q)} ≤ v(q) for all v ∈ V.

Denote by v* the true configuration of the empirical reality under investigation. A scientific theory takes a stance about the empirical reality that it is studying by positing a consistent collection A = {a₁, ..., aₙ} of propositions, called axioms, that are assumed to be true under the (unknown) true configuration v*, i.e., it is assumed that
v*(aᵢ) = 1   ∀i = 1, ..., n
All propositions that are logical consequences of the axioms are then assumed to be true under v*.¹³ In particular, if A is complete, the truth value of all propositions in P can be, in principle, decided. So, the function v* is identified.
Example 2108 (i) In economics, a choice theory studies the behavior of a consumer who faces different bundles of goods. Consider a choice theory that has two primitive terms, I and ∼ (cf. Section D.5.3). The symbol I indicates the set of all bundles of goods available to the consumer. The symbol ∼ indicates the consumer's indifference relation between the bundles, so that x ∼ y reads as "for the consumer bundle x is indifferent to bundle y".¹⁴ If the theory assumes axioms A.1 and A.2, so the truth of propositions a₁ and a₂, then ∼ is symmetric (Theorem 2105) and transitive. By assuming these two axioms, the theory takes a stance about the consumer's behavior, which is the empirical reality that it is studying. The theory is empirically correct as long as these axioms are empirically true, i.e., v*(a₁) = v*(a₂) = 1. Unlike a mathematical theory, which is concerned only about the logical consistency of its axioms, an empirical theory is also concerned about their empirical status.
(ii) In physics, special relativity is based on two axioms: a₁ = "invariance of the laws of physics in all inertial frames of reference" and a₂ = "the velocity of light in vacuum is the same in all inertial frames of reference". If v* is the true physical configuration, the theory is true if v*(a₁) = v*(a₂) = 1. Special relativity is a most brilliant example of the ability to pursue relentlessly all logical implications of the posited axioms, even if this means challenging fundamental ideas, for example on time, firmly held till then. N
This operational asymmetry between verification and falsification, emphasized by Karl Popper in the 1930s, is an important methodological aspect. Indirect falsification is, in general, the kind of falsification that one might hope for. It is the so-called testing of the implications of a scientific theory. In this indirect case, however, it is unclear which one of the posited axioms actually fails: in fact, ¬(p₁ ∧ ··· ∧ pₙ) ≡ ¬p₁ ∨ ··· ∨ ¬pₙ. If not all the posited axioms have the same status, only some of them being "core" axioms (as opposed to auxiliary ones), it is then unclear how serious the falsification is. Indeed, falsification is often a chimera (especially in the social sciences), as even the highly stylized setup of this section should suggest.
That said, with all its limitations, logical argumentation is the basic method of rational investigation of an empirical science, with theoretical reasoning at its core as a way to understand and organize empirical data, in a tradition started by the Ionians and revived in modern times by Galileo (recall the celebrated Saggiatore passage¹⁵ about the book of nature written in a mathematical language).
We say that two propositions p and q are disjoint if they cannot be both true, that is, if their conjunction is a contradiction: p ∧ q ≡ 0. Their truth table is:

p   q   p ∧ q
1   0   0
0   1   0
0   0   0

The most basic instance of two disjoint propositions is given by a proposition and its negation, i.e., p and ¬p. Indeed, according to the law of non-contradiction we have p ∧ ¬p ≡ 0. Of course, two propositions can be disjoint without being one the negation of the other: the two disjoint propositions "in the year 1965 the average daily temperature in Milan was 15 degrees" and "in the year 1965 the average daily temperature in Milan was 16 degrees" are one such example.
The next result captures what is peculiar to the proposition/negation case among pairs of disjoint propositions.

Proposition 2109 Two disjoint propositions are one the negation of the other if and only if their disjunction is a tautology.
¹⁵ "... questo grandissimo libro ... è scritto in lingua matematica, e i caratteri son triangoli, cerchi, ed altre figure geometriche, senza i quali mezi è impossibile a intenderne umanamente parola." (trans. "... this grand book ... is written in the language of mathematics, and its characters are triangles, circles, and other geometric figures, without which it is impossible for man to understand its words").
Proof The "only if" is the law of excluded middle. As to the converse, assume that p and q are two disjoint propositions such that p ∨ q ≡ 1. This implies that p and q cannot be false at the same time, so exactly one of them has to be true. By adding a disjunction column to the last truth table we have

p   q   p ∧ q   p ∨ q
1   0   0       1
0   1   0       1

So q is true precisely when p is false, that is, q ≡ ¬p.
We say that two propositions p and q are exhaustive if they cannot be false at the same time, that is, if their disjunction p ∨ q is a tautology: p ∨ q ≡ 1. Their truth table is:

p   q   p ∨ q
1   1   1
0   1   1
1   0   1
Interestingly, also the most basic instance of two exhaustive propositions is given by a proposition p and its negation ¬p. Indeed, by the law of excluded middle we have p ∨ ¬p ≡ 1. Yet, we might well have two propositions p and q that are exhaustive without being one the negation of the other, i.e., without having p ∧ q ≡ 0. For example, if in our city the oldest person is 100 years old, the two propositions p and q given by "our fellow citizen Mario is < 50 years old" and "our fellow citizen Mario is ≥ 30 years old" are exhaustive but not disjoint.
Two exhaustive propositions are easily seen to be disjoint if and only if one is the negation of the other. Along with the last proposition, this implies the following simple, yet interesting, result.

Proposition 2110 Two propositions are both disjoint and exhaustive if and only if one is the negation of the other.

What characterizes, among all binary collections, the ones of the form {p, ¬p} is thus that their elements are disjoint as well as exhaustive.
When two propositions are disjoint, let us denote their disjunction p ∨ q by p + q. For instance, we write the law of excluded middle as
p + ¬p ≡ 1
In general, if the elements of a finite collection Γ of propositions are pairwise disjoint, we denote their disjunction by
$$\sum_{p \in \Gamma} p$$
Truth assignments turn out to be additive over collections of pairwise disjoint propositions.
Lemma 2111 Let Γ be a finite collection of pairwise disjoint propositions. For each truth assignment v ∈ V, we have
$$v \Big( \sum_{p \in \Gamma} p \Big) = \sum_{p \in \Gamma} v(p)$$

Proof Let v ∈ V. Since the elements of Γ are pairwise disjoint propositions, at most one of them is true. That is, either v(p) = 0 for all p ∈ Γ (all propositions in Γ are false) or there exists p̄ ∈ Γ such that v(p̄) = 1 and v(q) = 0 for every other q ∈ Γ. In the former case, the proposition $\sum_{p \in \Gamma} p$ is also false, so $v(\sum_{p \in \Gamma} p) = 0 = \sum_{p \in \Gamma} v(p)$. In the latter case, the proposition $\sum_{p \in \Gamma} p$ is true, so $v(\sum_{p \in \Gamma} p) = 1 = \sum_{p \in \Gamma} v(p)$.
Inspired by Proposition 2110, we single out a key class of collections of pairwise disjoint propositions. Specifically, we say that a finite collection Γ of propositions is a partition if its elements are pairwise disjoint and if their disjunction is a tautology. In symbols,
p ∧ q ≡ 0 for all distinct p, q ∈ Γ   and   $\sum_{p \in \Gamma} p \equiv 1$
Among collections of pairwise disjoint propositions, partitions thus have the extra property that their elements cannot be all false at the same time. So, a partition is an exhaustive collection.
Clearly, a binary collection {p, ¬p} is the most basic partition. Actually, by Proposition 2110 a binary collection is a partition if and only if it has this form, i.e., each element is the negation of the other. For an example of a non-binary partition, consider again our city whose oldest person is 100 years old. The propositions pₙ given by "our fellow citizen Mario is n years old" form a partition, with n = 0, ..., 100.¹⁶
The elements of a partition Γ have two key features: they are mutually exclusive (at most one of them is true under any truth assignment v) and exhaustive (at least one of them is true under v). Indeed, this is what characterizes partitions, as next we show.

Proposition 2112 A finite collection Γ of propositions is a partition if and only if one and only one proposition in Γ is true, that is, for each truth assignment v ∈ V, there exists p̄ ∈ Γ such that v(p̄) = 1 and v(q) = 0 for every other q ∈ Γ.

In view of this result, the elements of a partition are called atoms. To know the truth values of a partition under a truth assignment v ∈ V amounts to knowing which one of its atoms is true under v (the others being then false automatically).
Proof "If". Assume that, for each v ∈ V, there exists p̄ ∈ Γ such that v(p̄) = 1 and v(q) = 0 for every other q ∈ Γ. Let p′, p″ ∈ Γ be distinct. In view of Lemma 2107, we have v(p′ ∧ p″) = min{v(p′), v(p″)} = 0 for all v ∈ V because at most one proposition between p′ and p″ is true under each v. So, v(p′ ∧ p″) = 0 for all v ∈ V, which implies that p′ ∧ p″ is a contradiction.

¹⁶ Here 0 is the age of a baby who is not yet 12 months old.
To make further progress in our atomic quest, observe that partitions can be refined. Specifically, say that a partition Γ′ is finer than a partition Γ (or that Γ is coarser than Γ′) if, for each element p of Γ, there exists an element p′ of Γ′ that logically implies it, that is, p′ ⊨ p.

Proposition 2113 Let Γ′ be a partition finer than a partition Γ. Then each p ∈ Γ is equivalent to the disjunction of the atoms of Γ′ that logically imply it.

Proof Let Γ and Γ′ be two partitions, with Γ′ being finer than Γ. Let p ∈ Γ. Consider the collection Γ′ₚ of all atoms p′ of Γ′ that logically imply p, i.e., Γ′ₚ = {p′ ∈ Γ′ : p′ ⊨ p}. It is easy to check that $\sum_{p' \in \Gamma'_p} p' \equiv p$.
So, if we know the truth values of the finer partition Γ′ under a truth assignment v ∈ V, that is, which atom of Γ′ is true, then we also know the truth values of the partition Γ under v. Atoms of a finer partition can thus be regarded as more "fundamental". This naturally raises a question: does there exist a finest partition? Indeed, its atoms could then be regarded as genuine, irreducible, logical atoms.
The next result provides an answer to this important question when P is finite. We leave the easy proof to the reader.

Proposition 2114 A finite collection P = {p₁, ..., pₙ} of propositions, closed with respect to the logical operations ∨, ∧ and ¬, admits a finest partition. Its atoms have the form
$$p_1^{i_1} \wedge p_2^{i_2} \wedge \cdots \wedge p_n^{i_n} \qquad (D.11)$$
where, for each k,
$$p_k^{i_k} = \begin{cases} \neg p_k & \text{if } i_k = 0 \\ p_k & \text{if } i_k = 1 \end{cases}$$

For instance, for the binary case P = {p₁, p₂} we have the 4 atoms
p₁ ∧ p₂,   p₁ ∧ ¬p₂,   ¬p₁ ∧ p₂,   ¬p₁ ∧ ¬p₂

Atoms (D.11) that are different from 0 (i.e., non-contradictory) are called constituents of P. Denote by C their collection. Its cardinality is at most 2ⁿ (which is attained when all atoms are different from 0).
The constituents are the ultimate, irreducible, logical components of the collection P, its "logical atoms". In view of Proposition 2113, each proposition p in P is equivalent to a disjunction of constituents, so it can be expressed in their terms. Specifically, we have
$$p = \sum_{c \in C : \, c \models p} c \qquad \forall p \in P \qquad (D.12)$$
which is called the canonical form of p. So, each proposition p ∈ P can be retrieved from the constituents via its canonical form. Moreover, (D.12) implies that, for each truth assignment v ∈ V, we have
$$v(p) = \sum_{c \in C : \, c \models p} v(c) \qquad \forall p \in P \qquad (D.13)$$
Once we know the truth values of the constituents, i.e., which one of the constituents is true, via this formula we can recover the truth values of all propositions in P under any truth assignment v ∈ V. So, if a truth assignment v is a possible configuration of the empirical reality described by P, each such configuration is uniquely pinned down by the constituents' values
v(C) = {v(c) : c ∈ C}
Different configurations correspond to different constituents' values, which are all one needs to know to retrieve the values that a configuration assigns to all propositions in P.
Summing up, both syntactically (via the disjunctions in formula (D.12)) and semantically (via the sums in formula (D.13)) the constituents are the logical elementary particles of P.
The converse is also true because, in analogy with (D.12), for each v ∈ V we have
$$v(e) = \sum_{c \in C : \, c \models e} v(c) \qquad \forall e \in E$$
So, the knowledge of the truth values of the elements of E amounts to that of its constituents.
That said, for each proposition p ∈ P define the set Eₚ = {c ∈ C : c ⊨ p} of constituents that logically imply p. We say that p is certain (under E) if Eₚ ≠ ∅ and
$$p \equiv \sum_{c \in E_p} c$$
In this case, there is no uncertainty about the truth value of p once one knows the truth values of the constituents, so of the elements of E.
Define by A the collection of all propositions that are certain, that is,
$$A = \Big\{ p \in P : \exists \, \emptyset \neq B \subseteq C, \; p \equiv \sum_{c \in B} c \Big\}$$
In view of what we just observed, our agent knows all the answers for the propositions in A: there is no uncertainty about them being true or false. So, A is the collection of propositions whose truth value the agent can infer from the knowledge of the propositions in E.
In contrast, this is no longer the case for propositions that do not belong to A: their truth values are unknown, so uncertain, to him. To talk about them we need probability theory, as readers will learn in other courses.
A moment's reflection shows, however, that A should consist of all propositions that either can be constructed from the elements of E via the logical operations ∨, ∧ and ¬, or that are equivalent to propositions that are constructed in this way (e.g., recall (D.4) for conditionals). Indeed, the truth value of any such proposition, say ¬e₁ ∨ (e₂ ∧ e₃), is automatically known via truth tables once the truth values of the elements of E are known. Though we do not pursue it analytically, this heuristic remark should nevertheless shed further light on the nature of A, which can be constructed by carrying out all possible logical operations on the elements of E.
Be that as it may, now we know what our agent can infer with certainty from his knowledge and what remains, instead, uncertain for him.
D.7 Predicates and quantifiers

D.7.1 Generalities

If we wrote
∀x ∈ R, x² = 1        (D.15)
we would assert a falsehood: not every real number has square equal to 1. If, instead, we wrote
∃x ∈ R, x² = 1        (D.16)
we would assert a (simple) truth: there is some real number (there are actually two of them: x = ±1) whose square is 1.
To understand the role of quantifiers, we consider expressions, called (logical) predicates and denoted by p(x), that contain an argument x that varies in a given set X, the domain (or universe of discourse). For example, the predicate p(x) can be "x² = 1" or "in the year x it rained in Milan". Once a specific value x of the domain is considered, we have a proposition p(x) that may be either true or false. For instance, if X is the real line and x = 3, the proposition "x² = 1" is false; it becomes true if and only if x = ±1.
The propositions
∃x ∈ X, p(x)        (D.17)
and
∀x ∈ X, p(x)        (D.18)
mean that p(x) is true at least for some x in the domain and that p(x) is true for every such x, respectively. For example, when p(x) is "x² = 1" propositions (D.17) and (D.18) reduce, respectively, to propositions (D.16) and (D.15), while for the weather predicate they become the propositions "there exists a year in the last century in which it rained in Milan" and "every year in the last century it rained in Milan" (here X is the set of the last century's years).
Note that when the domain is finite, say X = {x₁, ..., xₖ}, the propositions (D.17) and (D.18) can be written as p(x₁) ∨ ··· ∨ p(xₖ) and p(x₁) ∧ ··· ∧ p(xₖ), respectively.
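For finite domains this observation has a direct computational counterpart: Python's built-in any and all implement exactly these finite disjunctions and conjunctions. A small sketch of ours, with an arbitrarily chosen finite domain X:

```python
# p(x): "x**2 == 1" on a finite domain X.
X = [-3, -1, 0, 1, 2]
p = lambda x: x ** 2 == 1

assert any(p(x) for x in X)       # (D.17): holds, e.g. at x = 1
assert not all(p(x) for x in X)   # (D.18): fails, e.g. at x = 0
```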
Quantifiers transform, therefore, predicates into propositions, that is, into statements that are either true or false. If X is infinite, however, to verify whether proposition (D.18) is true requires an infinite number of checks: for each x ∈ X we have to verify whether p(x) is true. Operationally, such a truth value cannot be determined and so universal propositions are typically not verifiable.
In contrast, to verify whether (D.18) is false it is enough to exhibit a single x ∈ X such that p(x) is false. Though coming up with such an element might not be obvious at all, still there is a clear asymmetry between the operational content of the two truth values of (D.18). One actually often confronts propositions like "∀x ∈ X, p₁(x) ∧ ··· ∧ pₙ(x)", the universal version of the propositions p₁ ∧ ··· ∧ pₙ discussed in the Intermezzo (when talking about verifiability and falsifiability). In this case, a large n further magnifies the asymmetry.
A dual asymmetry holds for the existential quantifier. The existential proposition (D.17) can be verified via an element x ∈ X such that p(x) is true. Of course, if X is large (let alone if it is infinite), it may be operationally not obvious how to find such an element. Be that as it may, falsification is in much bigger trouble: to verify that proposition (D.17) is false we should check that, for all x ∈ X, the proposition p(x) is false. Operationally, existential propositions are typically not falsifiable.
The following table summarizes our operational discussion:

                  verifiable   falsifiable
∀x ∈ X, p(x)      no           yes
∃x ∈ X, p(x)      yes          no

N.B. (i) Sometimes one writes "p(x) ∀x ∈ X" instead of "∀x ∈ X, p(x)". It is a common way to handle universal quantifiers. (ii) If X = X₁ × ··· × Xₙ is a Cartesian product, the predicate takes the form p(x₁, ..., xₙ) because x = (x₁, ..., xₙ). O
D.7.2 Algebra

In a sense, ∀ and ∃ represent the negation of one another. So,¹⁷
¬(∃x, p(x)) ≡ ∀x, ¬p(x)
and, symmetrically,
¬(∀x, p(x)) ≡ ∃x, ¬p(x)
In the example where p(x) is "x² = 1", we can equally well write:
¬(∀x, x² = 1)   or   ∃x, x² ≠ 1
(respectively: it is not true that x² = 1 for every x, and it is true that for some x one has x² ≠ 1).
More generally,
¬(∀x, ∃y, p(x, y)) ≡ ∃x, ∀y, ¬p(x, y)
For example, let p(x, y) be the proposition "x + y² = 0". We can equally assert that
¬(∀x, ∃y, x + y² = 0)
(it is not true that, for every x ∈ R, we can find a value of y ∈ R such that the sum x + y² is zero: it is sufficient to take x = 5) or
∃x, ∀y, x + y² ≠ 0
17
To ease notation, in the quanti ers we omit the clause \2 X".
D.7. PREDICATES AND QUANTIFIERS 1481
(it is true that there exists some value of x ∈ R such that x + y² ≠ 0 for every y ∈ R: again, it is sufficient to take x = 5, since then x + y² ≥ 5 > 0).
Note that the last few lines show that quantifiers permit reducing binary predicates to unary predicates or, even, to propositions. For instance, if the domain of p(x, y) consists of a group of people X and p(x, y) is interpreted as "x is the friend of y", then ∃y, p(x, y) is the predicate "x has a friend", while ∀x, ∃y, p(x, y) is the proposition "each x has a friend".
Recall that a set $\{x^i\}_{i=1}^m$ of vectors has been called linearly independent if
$$\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m = 0 \implies \alpha_1 = \alpha_2 = \cdots = \alpha_m = 0$$
The set $\{x^i\}_{i=1}^m$ has been, instead, called linearly dependent if it is not linearly independent, i.e., if there exists a set $\{\alpha_i\}_{i=1}^m$ of real numbers, not all equal to zero, such that $\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m = 0$.
We can write these notions by making the role of predicates explicit. Let $p(\alpha_1, \ldots, \alpha_m)$ and $q(\alpha_1, \ldots, \alpha_m)$ be the predicates "$\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m = 0$" and "$\alpha_1 = \alpha_2 = \cdots = \alpha_m = 0$", respectively. The set $\{x^i\}_{i=1}^m$ is linearly independent when
$$\forall \{\alpha_i\}_{i=1}^m, \; p(\alpha_1, \ldots, \alpha_m) \implies q(\alpha_1, \ldots, \alpha_m)$$
and linearly dependent when
$$\exists \{\alpha_i\}_{i=1}^m, \; \neg \big( p(\alpha_1, \ldots, \alpha_m) \implies q(\alpha_1, \ldots, \alpha_m) \big)$$
that is, by (D.5), when
$$\exists \{\alpha_i\}_{i=1}^m, \; p(\alpha_1, \ldots, \alpha_m) \wedge \neg q(\alpha_1, \ldots, \alpha_m)$$
In other words, a sequence $\{x_n\}$ does not converge to a point $L \in \mathbb{R}$ if there exists $\varepsilon > 0$ such that for each $k \ge 1$ there is $n \ge k$ such that
$$|x_n - L| \ge \varepsilon$$
By denoting by $n_k$ any such $n \ge k$,¹⁸ we define a subsequence $\{x_{n_k}\}$ such that $|x_{n_k} - L| \ge \varepsilon$ for all $k \ge 1$. So, we have the following useful characterization of non-convergence to a given point.

Proposition 2115 A sequence $\{x_n\}$ does not converge to a point $L \in \mathbb{R}$ if and only if there exist $\varepsilon > 0$ and a subsequence $\{x_{n_k}\}$ such that $|x_{n_k} - L| \ge \varepsilon$ for all $k \ge 1$.
The order of quantifiers matters. Consider the two propositions
∃x ∈ X, ∀y ∈ Y, p(x, y)
which is true if there exists x ∈ X such that p(x, y) is true for all y ∈ Y, and
∀x ∈ X, ∃y ∈ Y, p(x, y)
which is true if for all x ∈ X there exists y ∈ Y such that p(x, y) is true.
For instance, with X = Y = R, the proposition
∃x ∈ R, ∀y ∈ R, x ≥ y
is false because it states that "there is a greatest scalar," while the proposition
∀x ∈ R, ∃y ∈ R, x ≥ y
is true: given any x ∈ R, it is enough to take y = x.
A predicate p(x) with domain X can be identified with the set A = {x ∈ X : p(x) is true} formed by the elements of the domain at which it is true:
p(x) is true ⟺ x ∈ A
So, predicates and sets are two sides of the same coin. Indeed, predicates formalize the specification of sets via a property that their elements have in common, as we mentioned at the very beginning of the book. The set {x ∈ X : p(x) is true} is called the extension of p.
In a similar vein, a binary predicate p(x, y), with two arguments x ∈ X and y ∈ Y, can be identified with the binary relation R ⊆ X × Y consisting of all pairs (x, y) such that the proposition p(x, y) is true, i.e., R = {(x, y) ∈ X × Y : p(x, y) is true}. Indeed,
p(x, y) is true ⟺ xRy
We conclude that binary relations are the extensions, so the set-theoretic counterparts, of binary predicates.
Example 2116 (i) If X is a set of years and Y a set of cities, the binary predicate p(x, y) given by "in the year x it rained in city y" corresponds to the binary relation
R = {(x, y) ∈ X × Y : in the year x it rained in city y}
(ii) Let X = Y = N. The binary predicate p(x, y) given by "x ≥ y" corresponds to the binary relation
≥ = {(x, y) ∈ N × N : x is greater than or equal to y}
(iii) Let C be the set of all citizens of a country. If X = Y = C, the binary predicate p(x, y) given by "x is the mother of y" corresponds to the binary relation
M = {(x, y) ∈ C × C : x is the mother of y}
It contains all pairs in which the first element is the mother of the second element.
(iv) Let $\mathbb{R}^n_+$ be the set of all consumption bundles. If X = Y = $\mathbb{R}^n_+$, the binary predicate p(x, y) given by "bundle x is at least as good as y" corresponds to the binary relation
≿ = {(x, y) ∈ $\mathbb{R}^n_+ \times \mathbb{R}^n_+$ : bundle x is at least as good as y}
It contains all pairs of bundles in which the first bundle is at least as good as the second one. N
In general, predicates with n arguments can be identified with n-ary relations, as readers will learn in more advanced courses. In any case, the set-theoretic translations of some key logical notions are a further wonder of Cantor's paradise.
Appendix E
Mathematical induction
(sdoganato)
E.1 Generalities
Suppose that we want to prove that a proposition p(n), formulated for every natural number n, is true for every such number n. Intuitively, it is sufficient to show that the "initial" proposition p(1) is true and that the truth of each proposition p(n) implies that of the "subsequent" one p(n + 1). Next we formalize this domino argument:¹

Theorem 2117 (Induction principle) Let p(n) be a proposition stated in terms of each natural number n. Suppose that:

(i) p(1) is true;

(ii) if p(n) is true, then p(n + 1) is true.

Then, p(n) is true for every natural number n.
Proof Suppose, by contradiction, that proposition p(n) is false for some n. Denote by n₀ the smallest such n, which exists since every non-empty collection of natural numbers has a smallest element.² By (i), n₀ > 1. Moreover, by the definition of n₀, the proposition p(n₀ − 1) is true. By (ii), p(n₀) is then true, a contradiction.
We illustrate this important type of proof by computing some important sums.
¹ There are many soldiers, one next to the other. The first has the "right scarlet fever", a rare form of scarlet fever that instantaneously infects whoever is to the right of the sick person. All the soldiers catch it because the first one infects the second one, the second one infects the third one, and so on.
² In the set-theoretic jargon, we say that N is a well ordered set.
(i) We have
$$1 + 2 + \cdots + n = \sum_{s=1}^{n} s = \frac{n(n+1)}{2}$$
Initial step. For $n = 1$ the property is trivially true:
$$1 = \frac{1(1+1)}{2}$$
Induction step. Assume it is true for $n = k$ (induction hypothesis), that is,
$$\sum_{s=1}^{k} s = \frac{k(k+1)}{2}$$
We prove that it is then true for $n = k + 1$. Indeed,3
$$\sum_{s=1}^{k+1} s = \sum_{s=1}^{k} s + (k + 1) = \frac{k(k+1)}{2} + k + 1 = \frac{(k+1)(k+2)}{2}$$
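A closed form proved by induction is easy to spot-check numerically; such a check is of course no substitute for the proof. A quick Python verification of the formula just obtained:

# Spot-check 1 + 2 + ... + n = n(n+1)/2 for small n (a check, not a proof).
for n in range(1, 100):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
print("formula (i) verified for n = 1, ..., 99")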
(ii) We have
$$1^2 + 2^2 + \cdots + n^2 = \sum_{s=1}^{n} s^2 = \frac{n(n+1)(2n+1)}{6}$$
Initial step. For $n = 1$ the property is trivially true:
$$1^2 = \frac{1(1+1)(2+1)}{6}$$
Induction step. By proceeding as above, we get:
$$\sum_{s=1}^{k+1} s^2 = \sum_{s=1}^{k} s^2 + (k+1)^2 = \frac{k(k+1)(2k+1)}{6} + (k+1)^2 = \frac{(k+1)\left[k(2k+1) + 6(k+1)\right]}{6} = \frac{(k+1)\left(2k^2 + 7k + 6\right)}{6} = \frac{(k+1)(k+2)(2k+3)}{6}$$
as claimed.
3
Alternatively, this sum can be derived by observing that the sum of the first and the last addends is $n + 1$, the sum of the second and the second-to-last is again $n + 1$, and so on. There are $n/2$ such pairs, and therefore the sum is $(n + 1) n/2$.
(iii) We have
$$1^3 + 2^3 + \cdots + n^3 = \sum_{s=1}^{n} s^3 = \left( \sum_{s=1}^{n} s \right)^2 = \frac{n^2 (n+1)^2}{4}$$
Initial step. For $n = 1$ the property is trivially true:
$$1^3 = \frac{1^2 (1+1)^2}{4}$$
Induction step. By proceeding as above, we get:
$$\sum_{s=1}^{k+1} s^3 = \sum_{s=1}^{k} s^3 + (k+1)^3 = \frac{k^2 (k+1)^2}{4} + (k+1)^3 = \frac{(k+1)^2 \left[k^2 + 4(k+1)\right]}{4} = \frac{(k+1)^2 (k+2)^2}{4}$$
as claimed.
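The formulas in (ii) and (iii) can be spot-checked in the same way; the check also displays the curious identity, already visible in the statement of (iii), that the sum of the first $n$ cubes is the square of the sum of the first $n$ integers:

# Spot-check the sum-of-squares and sum-of-cubes formulas (checks, not proofs).
for n in range(1, 100):
    assert sum(s * s for s in range(1, n + 1)) == n * (n + 1) * (2 * n + 1) // 6
    cubes = sum(s ** 3 for s in range(1, n + 1))
    assert cubes == (n * (n + 1) // 2) ** 2  # square of 1 + 2 + ... + n
print("formulas (ii) and (iii) verified for n = 1, ..., 99")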
(iv) We have
$$a + aq + aq^2 + \cdots + aq^{n-1} = \sum_{s=1}^{n} a q^{s-1} = a \, \frac{1 - q^n}{1 - q}$$
for the sum of $n$ terms in the geometric progression with first term $a$ and common ratio $q \neq 1$.
Initial step. For $n = 1$ the formula is trivially true:
$$a = a \, \frac{1 - q}{1 - q}$$
Induction step. By proceeding as above, we get:
$$\sum_{s=1}^{k+1} a q^{s-1} = \sum_{s=1}^{k} a q^{s-1} + a q^{k} = a \, \frac{1 - q^{k}}{1 - q} + a q^{k} = a \, \frac{1 - q^{k} + q^{k}(1 - q)}{1 - q} = a \, \frac{1 - q^{k+1}}{1 - q}$$
as claimed.
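Exact rational arithmetic makes the same kind of spot-check possible for the geometric sum; here the first term $a$ and ratio $q$ are arbitrary illustrative values of our own choosing:

# Spot-check the geometric sum with exact rationals (a check, not a proof).
from fractions import Fraction

a, q = Fraction(3), Fraction(1, 2)  # illustrative first term and common ratio, q != 1
for n in range(1, 50):
    lhs = sum(a * q ** (s - 1) for s in range(1, n + 1))
    assert lhs == a * (1 - q ** n) / (1 - q)
print("geometric sum formula verified for n = 1, ..., 49")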
E.2 The harmonic Mengoli
We close with a classical result, due to Pietro Mengoli: the harmonic series $\sum_{k=1}^{\infty} 1/k$ diverges. The proof is based on a couple of lemmas, the second of which is proven by induction.
Lemma For every integer $k \geq 2$,
$$\frac{3}{k} \leq \frac{1}{k-1} + \frac{1}{k} + \frac{1}{k+1}$$
Proof Consider the convex function $f : (0, \infty) \to (0, \infty)$ defined by $f(x) = 1/x$. Since
$$k = \frac{1}{3}(k-1) + \frac{1}{3} k + \frac{1}{3}(k+1)$$
Jensen's inequality implies
$$\frac{1}{k} = f(k) = f\left( \frac{1}{3}(k-1) + \frac{1}{3} k + \frac{1}{3}(k+1) \right) \leq \frac{1}{3} \left( f(k-1) + f(k) + f(k+1) \right) = \frac{1}{3} \left( \frac{1}{k-1} + \frac{1}{k} + \frac{1}{k+1} \right)$$
as claimed.
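The lemma's inequality can likewise be confirmed with exact rationals over a range of $k$; again, this is a sanity check, not a replacement for Jensen's inequality:

# Check 3/k <= 1/(k-1) + 1/k + 1/(k+1) exactly for a range of k (a sanity check).
from fractions import Fraction

for k in range(2, 500):
    assert Fraction(3, k) <= Fraction(1, k - 1) + Fraction(1, k) + Fraction(1, k + 1)
print("lemma verified for k = 2, ..., 499")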
Let $s_n = \sum_{k=1}^{n} x_k$ be the partial sum of the harmonic series, where $x_k = 1/k$.
Lemma For every $n \geq 1$,
$$s_{3n+1} \geq 1 + s_n$$
Proof We proceed by induction. Initial step: $n = 1$. We apply the previous lemma for $k = 3$:
$$s_{3 \cdot 1 + 1} = s_4 = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} \geq 1 + \frac{3}{3} = 1 + s_1$$
Induction step: let us assume that the statement holds for $n \geq 1$. We prove that it holds for $n + 1$. We apply the previous lemma for $k = 3n + 3$:
$$s_{3(n+1)+1} = s_{3n+4} = s_{3n+1} + \frac{1}{3n+2} + \frac{1}{3n+3} + \frac{1}{3n+4} \geq s_n + 1 + \frac{1}{3n+2} + \frac{1}{3n+3} + \frac{1}{3n+4} \geq s_n + 1 + \frac{3}{3n+3} = s_n + 1 + \frac{1}{n+1} = s_{n+1} + 1$$
which completes the induction step. In conclusion, the result holds thanks to the induction principle.
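The inequality $s_{3n+1} \geq 1 + s_n$ can also be observed directly on the partial sums, once more with exact rationals:

# Observe s_{3n+1} >= 1 + s_n on exact partial sums of the harmonic series.
from fractions import Fraction

def s(n):
    return sum(Fraction(1, k) for k in range(1, n + 1))

for n in range(1, 30):
    assert s(3 * n + 1) >= 1 + s(n)
print("second lemma verified for n = 1, ..., 29")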
Proof of the theorem Since the harmonic series has positive terms, the sequence $\{s_n\}$ of its partial sums is monotone increasing. Therefore, it either converges or diverges. By contradiction, let us assume that it converges, i.e., $s_n \uparrow L < \infty$. From the last lemma it follows that
$$L = \lim_n s_{3n+1} \geq \lim_n (1 + s_n) = 1 + \lim_n s_n = 1 + L$$
which is a contradiction.
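Numerically, the divergence is very slow: the partial sums grow roughly like $\log n$, as the following sketch suggests (the logarithmic comparison is a well-known fact, not proved here):

# Partial sums of the harmonic series grow without bound, but only logarithmically.
import math

s, n = 0.0, 0
for k in range(1, 7):
    while n < 10 ** k:
        n += 1
        s += 1.0 / n
    print(f"n = {n:>7}  s_n = {s:8.4f}  log n = {math.log(n):7.4f}")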
Appendix F
Cast of characters
Bibliography
[2] Maria G. Agnesi, Instituzioni analitiche ad uso della gioventù italiana, Regia Ducal Corte, Milano, 1748 (trans. as Analytical Institutions, Taylor and Wilks, London, 1801).
[3] Tom M. Apostol, Mathematical analysis, 2nd ed., Addison Wesley, Boston, 1974.
[4] Richard M. Aron, Luis Bernal González, Daniel M. Pellegrino, Juan B. Seoane Sepúlveda, Lineability, CRC Press, Boca Raton, 2015.
[5] Kenneth J. Arrow, Essays in the theory of risk-bearing, Markham, Chicago, 1971.
[6] Kenneth J. Arrow, Methodological individualism and social knowledge, American Eco-
nomic Review, 84, 1-9, 1994.
[7] Emil Artin, The gamma function, Holt, Rinehart and Winston, New York, 1964.
[8] René Baire, Sur les fonctions de variables réelles, Annali di Matematica Pura ed Applicata, 3, 1-123, 1899.
[11] Claude Berge, Espaces topologiques et fonctions multivoques, Dunod, Paris, 1959 (trans.
as Topological spaces, Oliver and Boyd, Edinburgh, 1963).
[12] Daniel Bernoulli, Specimen theoriae novae de mensura sortis, Commentarii Academiae
Scientiarum Imperialis Petropolitanae, 1738 (trans. in Econometrica, 1954).
[13] Luitzen E. J. Brouwer, Über Abbildung von Mannigfaltigkeiten, Mathematische Annalen, 71, 97-115, 1912.
[16] Rudolf Carnap, Testability and meaning, Philosophy of Science, 3, 419-471, 1936.
[18] Ernesto Cesàro, Sur la convergence des séries, Nouvelles Annales de Mathématiques, 7, 49-59, 1888.
[19] Ernesto Cesàro, Sur la multiplication des séries, Bulletin des Sciences Mathématiques, 14, 114-120, 1890.
[20] Ernesto Cesàro, Corso di analisi algebrica con introduzione al calcolo infinitesimale, Bocca, Torino, 1894.
[21] Oscar Chisini, Sul concetto di media, Periodico di Matematiche, 4, 106-116, 1929.
[22] John H. Cochrane, Asset pricing, Princeton University Press, Princeton, 2005.
[24] Gérard Debreu, Definite and semidefinite quadratic forms, Econometrica, 295-300, 1952.
[25] Gérard Debreu, Theory of value, Yale University Press, New Haven, 1959.
[27] Bruno de Finetti, Sul concetto di media, Giornale dell'Istituto Italiano degli Attuari,
2, 369-396, 1931.
[28] Bruno de Finetti, Sulle stratificazioni convesse, Annali di Matematica Pura e Applicata, 30, 173-183, 1949.
[29] Bruno de Finetti, Sulla preferibilità, Giornale degli Economisti, NS 11, 685-709, 1952 (trans. in Giornale degli Economisti, 2012).
[30] René Descartes, Géométrie, Leiden, 1637 (trans. in D. Smith and M. L. Latham, The Geometry of René Descartes, Dover, New York, 1954).
[31] Peter Lejeune Dirichlet, Sur la convergence des séries trigonométriques qui servent à représenter une fonction arbitraire entre des limites données, Journal für die Reine und Angewandte Mathematik, 4, 157-169, 1829.
[32] Ivar Ekeland and Roger Temam, Convex analysis and variational problems, Siam,
Philadelphia, 1999.
[33] Federigo Enriques, Sul procedimento di riduzione all'assurdo, Bollettino Mathesis, 11, 6-14, 1919.
[34] Paul Erdős, The difference of consecutive primes, Duke Mathematical Journal, 6, 438-441, 1940.
[35] Leonhard Euler, Variae observationes circa series infinitas, Commentarii Academiae Scientiarum Imperialis Petropolitanae, 1744.
[36] William Feller, An introduction to probability theory and its applications, 2nd ed., Wiley,
New York, 1971.
[37] Werner Fenchel, Convex cones, sets, and functions, Princeton University Press, Prince-
ton, 1953.
[38] Irving Fisher, The application of mathematics to the social sciences, Bulletin of the
American Mathematical Society, 36, 225-243, 1930.
[39] Miroslav Fiedler, Special matrices and their applications in numerical mathematics, Nijhoff Publishers, Dordrecht, 1986.
[41] Carl F. Gauss, Disquisitiones Arithmeticae, Fleischer, Lipsia, 1801 (trans. Yale Uni-
versity Press, New Haven, 1966).
[42] Izrail M. Gelfand and Mark Saul, Trigonometry, Birkhauser, Boston, 2001.
[43] Angelo Genocchi and Giuseppe Peano, Calcolo differenziale e principii di calcolo integrale, Fratelli Bocca, Roma, 1884.
[44] Enrico Giusti, Matematica e commercio nel Liber abaci, in Un ponte sul Mediterraneo
(E. Giusti, ed.), Edizioni Polistampa, Firenze, 2002.
[45] Stéphane Gonnord and Nicolas Tosel, Calcul différentiel, Ellipses, Paris, 1998.
[46] Peter Gordon, Numerical cognition without words: Evidence from Amazonia, Science,
306, 496-499, 2004.
[47] Harvey P. Greenberg and William P. Pierskalla, Quasiconjugate functions and surrogate duality, Cahiers du Centre d'Étude de Recherche Opérationnelle, 15, 437-448, 1973.
[48] Paul Halmos, Naive set theory, Van Nostrand, Princeton, 1960.
[49] Godfrey H. Hardy, Orders of infinity, Cambridge University Press, Cambridge, 1910.
[50] Godfrey H. Hardy, Divergent series, Oxford University Press, Oxford, 1949.
[51] Godfrey H. Hardy, John E. Littlewood and George Polya, Inequalities, Cambridge
University Press, Cambridge, 1934.
[52] David Hawkins and Herbert A. Simon, Some conditions of macroeconomic stability,
Econometrica, 17, 245-248, 1949.
[53] Roger A. Horn and Charles R. Johnson, Matrix analysis, 2nd ed., Cambridge University
Press, Cambridge, 2013.
[54] Aleksandar Ivic, The Riemann zeta-function, Wiley, New York, 1985.
[55] Johan Jensen, Sur les fonctions convexes et les inégalités entre les valeurs moyennes, Acta Mathematica, 30, 175-193, 1906.
[56] Børge Jessen, Bemærkninger om konvekse funktioner og uligheder imellem middelværdier, Matematisk Tidsskrift, B, 17-28 and 84-95, 1931.
[59] Shizuo Kakutani, A generalization of Brouwer's fixed point theorem, Duke Mathematical Journal, 8, 457-459, 1941.
[62] Konrad Knopp, Infinite sequences and series, Dover, New York, 1956.
[63] Tjalling C. Koopmans, Stationary ordinal utility and impatience, Econometrica, 28,
287-309, 1960.
[65] Harold W. Kuhn and Albert W. Tucker, Nonlinear programming, Proceedings of the
Second Berkeley Symposium, 481-492, University of California Press, Berkeley, 1951.
[66] Steven G. Krantz and Harold R. Parks, A primer of real analytic functions, Birkhauser,
Boston, 2002.
[68] Joseph L. Lagrange, Théorie des fonctions analytiques, Ve. Courcier, Paris, 1813.
[69] Gabriel Lamé, Note sur la limite du nombre des divisions dans la recherche du plus grand commun diviseur entre deux nombres entiers, Comptes Rendus de l'Académie des Sciences de Paris, 19, 867-870, 1844.
[71] Lucio Lombardo Radice, L'infinito, Editori Riuniti, Roma, 1981.
[72] Andrej Markoff, On mean values and exterior densities, Recueil Mathématique, 4, 165-191, 1938.
[73] Jean Mawhin, Variations on the Brouwer fixed point theorem: A survey, Mathematics, 8, 501, 1-14, 2020.
[74] Andreu Mas-Colell, The theory of general economic equilibrium: A differentiable approach, Cambridge University Press, Cambridge, 1985.
[75] Andreu Mas-Colell, Michael D. Whinston and Jerry R. Green, Microeconomic theory, Oxford University Press, Oxford, 1995.
[76] James Maynard, Gaps between primes, Proceedings of the International Congress of Mathematicians 2018 (B. Sirakov, P. Ney de Souza and M. Viana, eds.), v. 1, 343-360, World Scientific, Singapore, 2018.
[77] Lionel McKenzie, Matrices with dominant diagonals and economic theory, in Mathe-
matical Methods in the Social Sciences (K. J. Arrow, S. Karlin and P. Suppes, eds.),
Stanford University Press, Palo Alto, 1959.
[78] George J. Minty, Monotone (nonlinear) operators in Hilbert space, Duke Mathematical
Journal, 29, 341-346, 1962.
[79] Leon Mirsky, An introduction to linear algebra, Oxford University Press, Oxford, 1955.
[81] Katta G. Murty and Santosh N. Kabadi, Some NP-complete problems in quadratic
and nonlinear programming, Mathematical Programming, 39, 117-129, 1987.
[82] John F. Muth, Rational expectations and the theory of price movements, Economet-
rica, 29, 315-335, 1961.
[83] John Napier, Mirifici logarithmorum canonis descriptio, Hart, Edinburgh, 1614.
[84] John Nash, Equilibrium points in n-person games, Proceedings of the National Academy
of Sciences, 36, 48-49, 1950.
[85] Yurii Nesterov, Introductory lectures on convex optimization, Kluwer, Boston, 2004.
[86] Otto Neugebauer, The exact sciences in antiquity, Brown University Press, Providence,
1957.
[87] Efe Ok, Real analysis with economic applications, Princeton University Press, Prince-
ton, 2007.
[88] Richard S. Palais, Natural operations on di erential forms, Transactions of the Amer-
ican Mathematical Society, 92, 125-141, 1959.
[89] Vilfredo Pareto, Sunto di alcuni capitoli di un nuovo trattato di economia pura, Giornale degli Economisti, 20, 216-235, 1900 (trans. in Giornale degli Economisti, 2008).
[91] Pierre Pica, Cathy Lemer, Véronique Izard and Stanislas Dehaene, Exact and approximate arithmetic in an Amazonian indigene group, Science, 306, 499-503, 2004.
[92] Henri Poincaré, Sur les intégrales irrégulières des équations linéaires, Acta Mathematica, 8, 295-334, 1886.
[93] John W. Pratt, Risk aversion in the small and in the large, Econometrica, 32, 122-136,
1964.
[94] Giovanni Ricci, Ricerche aritmetiche sui polinomi, II: Intorno a una proposizione non
vera di Legendre, Rendiconti del Circolo Matematico di Palermo, 58, 190-208, 1934.
[96] James Ritter, Egyptian mathematics, in Mathematics across cultures (H. Selin, ed.),
Kluwer, Dordrecht, 2000.
[97] R. Tyrrell Rockafellar, Lagrange multipliers and optimality, SIAM Review, 35, 183-238,
1993.
[98] Stephen A. Ross, Neoclassical finance, Princeton University Press, Princeton, 2005.
[99] Ioannis M. Roussos, Improper Riemann integrals, CRC Press, Boca Raton, 2014.
[100] Walter Rudin, Principles of mathematical analysis, McGraw-Hill, New York, 1964.
[101] Lucio Russo, La rivoluzione dimenticata, Feltrinelli, 1996 (trans. as The Forgotten
Revolution, Springer, Berlin, 2004).
[103] Waclaw Sierpinski, Elementary theory of numbers, 2nd ed., North-Holland, Amsterdam,
1988.
[104] Jean-Luc Solère, L'ordre axiomatique comme modèle d'écriture philosophique dans l'Antiquité et au Moyen Âge, Revue d'histoire des sciences, 56, 323-345, 2003.
[105] Thomas J. Stieltjes, Recherches sur les fractions continues, Annales de la Faculté des Sciences de Toulouse, J1-J122 and A5-A47, 1894 and 1895.
[106] George J. Stigler, The development of utility theory I, II, Journal of Political Economy,
58, 307-327 and 373-396, 1950.
[107] Josef Stoer and Christoph Witzgall, Convexity and optimization in finite dimensions, Springer-Verlag, Heidelberg, 1970.
[108] Dirk J. Struik, A source book in mathematics, 1200-1800, Princeton University Press,
Princeton, 1986.
[109] Patrick Suppes, Axiomatic set theory, Van Nostrand, Princeton, 1960.
[110] Arpad Szabo, The beginnings of Greek mathematics, Reidel Publishing Company, Dor-
drecht, 1978.
[111] Alfred Tarski, Introduction to logic and to the methodology of the deductive sciences,
4th ed., Oxford University Press, Oxford, 1994.
[112] Leonida Tonelli, L'analisi funzionale nel calcolo delle variazioni, Annali della Scuola
Normale Superiore di Pisa, 9, 289-302, 1940.
[115] Gregory Vlastos, Studies in Greek philosophy, v. 1, Princeton University Press, Prince-
ton, 1996.
[116] Vito Volterra, Sui principii del calcolo integrale, Giornale di Matematiche, 19, 333-372,
1881.
[117] John von Neumann, Zur Theorie der Gesellschaftsspiele, Mathematische Annalen, 100, 295-320, 1928 (trans. in R. D. Luce and A. W. Tucker, eds., Contributions to the theory of games IV, 13-42, Princeton University Press, Princeton, 1959).
[118] John von Neumann and Oskar Morgenstern, Theory of games and economic behavior,
Princeton University Press, Princeton, 1944.
[120] Richard Wheeden and Antoni Zygmund, Measure and integral: An introduction to real
analysis, 2nd ed., CRC Press, Boca Raton, 2015.
[121] David Wootton, The invention of science: A new history of the scientific revolution, Penguin, London, 2015.
[122] Eduardo H. Zarantonello, Projections on convex sets in Hilbert space and spectral
theory, in Contributions to nonlinear functional analysis (E. H. Zarantonello, ed.),
Academic Press, New York, 1971.
[123] Antoni Zygmund, Trigonometric series, 3rd ed., Cambridge University Press, Cam-
bridge, 2002.
Index
Vectors
collinear, 66
column, 467
disjoint, 757
linearly dependent, 65
linearly independent, 65
orthogonal, 83
product, 47
row, 468
scalar multiplication, 47
sum, 47
Venn diagrams, 4
Versors, 64, 83
Volatility, 1369
Weights, 547