Principles of Mathematical Economics
Kam Yu
Lakehead University
Thunder Bay, Ontario
Canada
Phone +1-807-343-8229
Email: [email protected]
Preface

Every year a first-year student, out of curiosity or because of some program requirements, takes principles of economics. She is surprised to learn that a subject in social science can be studied in such a logical and scientific manner. After getting a good grade, she decides to major in economics. The advisor in the department tells the student that quantitative analysis is very important in the study of economics. A degree in economics, however, is a liberal arts degree, which provides a well-rounded general education and a good preparation for professional schools. Many economics majors, the advisor says, achieve high scores on the LSAT, the entrance examination for faculties of law.
The student diligently selects courses in calculus, linear algebra, probability and statistics, and econometrics in her curriculum and does well in those courses. After graduation she decides to pursue graduate study in economics. To the student's surprise, the verbal and graphical analysis she was trained to use in solving economic problems has all but disappeared. Instead, most of the graduate textbooks contain material that more or less resembles applied mathematics. It seems that there is a wide gap between the general liberal arts degree and graduate studies in economics.
The objective of this book is to fill that gap. It is written for a one-semester course as an introduction to mathematical economics for first-year graduate and senior undergraduate students. The material assumes that students have backgrounds in college calculus and linear algebra. A course in mathematical economics at the undergraduate level is helpful but not necessary. Chapters 1 through 5 aim to build up the students' skills in formal proof, the axiomatic treatment of linear algebra, and elementary vector differentiation. Chapters 6 and 7 present the basic tools needed for microeconomic analysis. I have taught much of the material up to Chapter 7 in such a class for many years. Due to the time constraint of a short semester, many proofs of the theorems are omitted. For example, although the inverse function theorem and the implicit function theorem are important tools in economic analysis, experience has shown that the marginal benefits of going through the proofs in class are not worth the distraction. Therefore throughout the text I have included notes that point to references for students who want to explore particular topics a bit further.
For the above reasons, the book is not meant to be a manual or a reference in mathematical economics. My focus is on providing graduate students with a quick, one-semester training that helps them acquire a somewhat elusive quality called mathematical maturity. Greg Mankiw advises his students to take mathematics courses "until it hurts." Different students, of course, have different levels of tolerance. There were occasional moments of delight, when a student exclaimed that the proof of a theorem was simply beautiful. Nevertheless, my apologies if this book really hurts.
There are two bonus chapters. Most graduate-level econometrics courses assume that students have some training in mathematical probability and statistics. Chapter 8 provides a quick introduction to, or review of, probability theory. It is, however, by no means a substitute for formal training. Chapter 9 is an introduction to dynamic modelling, which I have used as supplementary reading for a course in macroeconomic theory. Both chapters sometimes refer to concepts covered in the core chapters.
Part of the book was written when I was visiting Vancouver, British Columbia. On one summer day I joined a tour with my friend to Whistler. While the tour guide was introducing the Coast Mountains and other points of interest, my mind at one moment drifted back to the time when I was new to the province. I saw afresh how beautiful the country was. In writing this book, there were moments when I thought about the material from a student's perspective and rediscovered the beauty of mathematics. I hope you enjoy the journey.
Last but not least, I would like to thank my friends Eric, Erwin, and Renee, who continue to give me their support, in good times and bad.
Contents
Preface iii
Table of Contents v
2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3 Basic Topology 49
3.1 Introduction to Metric Space . . . . . . . . . . . . . . . . . . . . . . . 49
3.1.1 Topological Space . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.1.2 Metric Space . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.1.3 Definitions in Metric Space . . . . . . . . . . . . . . . . . . . . 50
3.1.4 Some Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.1.5 Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.1.6 Series in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.2 Continuous Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.2.1 Properties of Continuous Functions . . . . . . . . . . . . . . . 58
3.2.2 Semicontinuity . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2.3 Uniform Continuity . . . . . . . . . . . . . . . . . . . . . . . . 61
3.2.4 Fixed Point Theorems . . . . . . . . . . . . . . . . . . . . . . 62
3.3 Sequences and Series of Functions . . . . . . . . . . . . . . . . . . . . 63
3.3.1 Pointwise Convergence . . . . . . . . . . . . . . . . . . . . . . 63
3.3.2 Uniform Convergence . . . . . . . . . . . . . . . . . . . . . . . 64
3.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4 Linear Algebra 75
4.1 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.2 Basic Properties of Vector Spaces . . . . . . . . . . . . . . . . . . . . 80
4.3 Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.3.2 Matrix Representations of Transformations . . . . . . . . . . . 87
4.3.3 Linear Functionals . . . . . . . . . . . . . . . . . . . . . . . . 89
4.3.4 Multilinear Transformations . . . . . . . . . . . . . . . . . . . 91
4.4 Inner Product Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.5 Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.5.1 Cauchy-Schwarz Inequality . . . . . . . . . . . . . . . . . . . . 93
4.5.2 Triangular Inequality . . . . . . . . . . . . . . . . . . . . . . . 94
4.6 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.7 Inverse Transformations . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.7.1 Systems of Linear Equations . . . . . . . . . . . . . . . . . . . 96
4.7.2 Inverse of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . 97
4.7.3 Cramer’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7 Optimization 153
7.1 Unconstrained Optimization . . . . . . . . . . . . . . . . . . . . . . . 153
7.2 Equality-Constrained Optimization . . . . . . . . . . . . . . . . . . . 155
8 Probability 173
8.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
8.2 Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . . . . 177
8.3 Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . 180
8.4 Mathematical Expectations . . . . . . . . . . . . . . . . . . . . . . . 182
8.5 Some Common Probability Distributions . . . . . . . . . . . . . . . . 186
8.5.1 Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . 186
8.5.2 Geometric Distribution . . . . . . . . . . . . . . . . . . . . . . 187
8.5.3 Gamma and Chi-Square Distributions . . . . . . . . . . . . . . 187
8.5.4 Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . 188
8.6 Multivariate Distributions . . . . . . . . . . . . . . . . . . . . . . . . 190
8.7 Stochastic Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
8.7.1 Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
8.7.2 Moving Average Models . . . . . . . . . . . . . . . . . . . . . 193
8.7.3 Autoregressive Models . . . . . . . . . . . . . . . . . . . . . . 194
8.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
Bibliography 233
List of Figures
List of Tables
Chapter 1

Logic and Proof
Example 1.1.
(a) Aurorae are formed when solar particles interact with the Earth’s magnetic field.
(b) The cardinality of the set of real numbers is bigger than that of the rational
numbers.
(c) All prime numbers are odd numbers.
(d) Every child should be entitled to a good education.
(e) x^2 + 1 = 0.
The first three sentences are statements. Statement (a) is true based on inductive reasoning. That is, it can be verified by repeated empirical testing and observation, which is the foundation of scientific inquiry. Statement (b) is true because of deductive reasoning. Mathematical knowledge is derived from agreements among mathematicians on assumptions called axioms. Theorems and propositions are derived from the axioms by logical rules. Therefore pure mathematics in principle is not science, since it does not rely on empirical observations. Statement (c) is false
since 2 is prime but even. The sentence in (d) is a moral declaration. It belongs
to the discussion in ethics and cannot be in principle assigned the value of true or
false. The last sentence is not a statement because there are some numbers which
satisfy the equation, while others do not. We shall later impose more structure on
the sentence to make it a statement. Notice that we do not have to know the truth
value before deciding if a sentence is a statement. All we need is that a truth value
can in principle be determined.
1.2 Logical Connectives
p q p⇒q
T T T
T F F
F T T
F F T
That is, an implication is false when the hypothesis is true but the conclusion is
false. When the hypothesis is false, the implication is true whether the conclusion is
true or false.
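Readers comfortable with a little programming can generate this truth table mechanically. Here is a minimal Python sketch (the helper name `implies` is ours; p ⇒ q is evaluated as (∼p) ∨ q):

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """p => q is false only when the hypothesis p is true and the conclusion q is false."""
    return (not p) or q

# Enumerate the four rows of the truth table, True before False as in the text.
for p, q in product([True, False], repeat=2):
    print(p, q, implies(p, q))
```

Only the row with a true hypothesis and a false conclusion evaluates to False, matching the table above.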
Example 1.2. Consider the implication “If I am the queen of England, then I would
make you a knight.” Since the hypothesis is false (I am not the queen), whether I
keep my promise or not is irrelevant. Therefore I do not make a false statement.
Implication is one of the most common forms of logical statements. There are
many different ways to express the same idea. The following statements all mean “If
p then q.”
• p implies q.
• q whenever p.
• q if p.
• p only if q.
3
CHAPTER 1. LOGIC AND PROOF
Converse: q ⇒ p.
Inverse: (∼ p) ⇒ (∼ q).
Contrapositive: (∼ q) ⇒ (∼ p).
In other words, p is equivalent to q when p implies q and the converse is also true.
Often we write equivalent statements as “p if and only if q”. All definitions are
equivalent statements. However, the following example illustrates that we use “if”
in definitions when we actually mean "if and only if".
• p: n is an odd integer.
• q: n = 2m + 1 where m is an integer.
Intuitively, equivalent statements have the same meaning and therefore can be
used as alternative definitions. Often we call finding equivalent statements charac-
terization of the original statement.
1.3 Quantified Statements

∃ x ∈ R ∋ x^2 + 1 = 0.
The symbol ∃ above means "there exists" and is formally called an existential quantifier. The symbol ∈ means "in" or "belongs to", and R denotes the set of real numbers (we shall define them in detail in the next chapter). The symbol ∋ reads "such that", which is sometimes abbreviated as "s.t." The statement means that there is at least one real number x that satisfies the equation that follows, which in this case is false. The reason is that no real number satisfies the equation x^2 + 1 = 0, since it implies that x = ±√−1. We can, however, make the statement true by expanding the choice of x to include the set of complex numbers, C:

∃ x ∈ C ∋ x^2 + 1 = 0.
∀ x ∈ R, x^2 ≥ 0.
The symbol ∀ denotes "for all" or "for every". The statement requires that all members of the set R satisfy the condition that follows. In other words, if we can find even one real number x such that x^2 < 0, then the statement is false. For example, the statement

∀ x ∈ R, x^2 > 0 (1.1)
∀ x, p(x).
The above statement is true if every x in the context of our discussion makes p(x)
true. On the other hand, if there is one x such that p(x) is false, the universal
statement is false. It follows that the negation of a universal statement, ∼ (∀ x, p(x))
is equivalent to
∃ x ∋ ∼ p(x).
∃ x ∈ R ∋ x^2 ≤ 0,
which reads “there exists a real number such that its square is less than or equal to
zero.”
The general form of an existential statement is
∃ x ∋ p(x). (1.2)
For the statement to be true, we need to find at least one x such that p(x) is true.
On the other hand, the statement is false if every x in question makes p(x) a false
statement. Therefore the negation of the existential statement in (1.2) is equivalent
to
∀ x, ∼ p(x).
When we write quantified statements involving more than one quantifier, the order of the quantifiers is important because it can affect the meaning of the statement. Consider the following statements in the context of real numbers:
∃ x ∋ ∀ y, x > y, (1.3)
and
∀ y, ∃ x ∋ x > y. (1.4)
Statement (1.3) means that we can find a real number that is greater than any real
number, which is false. On the other hand, statement (1.4) asserts that for any given
real number, we can find a greater one.
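On a finite domain the effect of quantifier order can be checked directly by encoding ∀ as `all` and ∃ as `any`. The sketch below uses an illustrative predicate of our own choosing, x + y = 0, over a small set of integers:

```python
D = range(-2, 3)  # the finite domain {-2, -1, 0, 1, 2}

# "For all x there exists y such that x + y = 0": true, since y = -x is in D.
forall_exists = all(any(x + y == 0 for y in D) for x in D)

# "There exists y such that for all x, x + y = 0": false,
# since no single y cancels every x.
exists_forall = any(all(x + y == 0 for x in D) for y in D)

print(forall_exists, exists_forall)  # True False
```

Swapping the quantifiers turns a true statement into a false one, just as in statements (1.3) and (1.4).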
1.4 Mathematical Proof
(p ⇒ q) ⇔ (p ⇒ p1 ⇒ p2 ⇒ · · · ⇒ q2 ⇒ q1 ⇒ q).
1 For a more detailed discussion of axioms and proof see Devlin (2002).
That is, we can take a number of intermediate steps between p and q, and in each
step we arrive at a new true statement (p1 , p2 , . . . , q2 , q1 , etc.). We know that the
intermediate conditional statements are true by invoking axioms, definitions, and
proven results.
For example, suppose we want to prove that if n is a positive integer, then n^2 + 3n + 8 is an even number. The key ingredients of this conditional statement are
For a direct proof, the first step is to assume that p is true, that is, n is a positive integer. It is not obvious at this stage how we can arrive at the conclusion q. One strategy is to do some backward thinking about what statement would imply q, that is, what is q1? In this case the definition of an even number is helpful. An even number can be expressed as 2k, where k is an integer. Therefore somehow we must be able to express n^2 + 3n + 8 as 2k. Another useful property is that (n + 1)(n + 2) is even since one of n + 1 or n + 2 is even and the product of an even number with any integer is even. Now we can write down the formal proof as follows.
n^2 + 3n + 8 = n^2 + 3n + 2 + 6
= (n + 1)(n + 2) + 6
= 2m + 2(3)
= 2(m + 3)
= 2k,
You should be able to write down the intermediate statements in the above proof.
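As a sanity check (not a substitute for the proof, of course), the claim can be verified numerically over an arbitrary range of positive integers:

```python
# Verify that n^2 + 3n + 8 is even for the first few thousand positive integers.
assert all((n * n + 3 * n + 8) % 2 == 0 for n in range(1, 5001))
print("n^2 + 3n + 8 is even for n = 1, ..., 5000")
```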
The method depends on the following equivalent statements (see exercise 10):
(p ⇒ q) ⇔ [(∼ q) ⇒ (∼ p)].
Therefore we proceed by assuming that the negation of the conclusion is true and showing that the negation of the hypothesis is true.
Example 1.4. Suppose that a < b and f (x) is a real-valued function. Prove that if

∫_a^b f (x) dx ≠ 0,

then f (x) ≠ 0 for some x between a and b. Here

• p : ∫_a^b f (x) dx ≠ 0,

• q : f (x) ≠ 0 for some x between a and b,

and we need to prove p ⇒ q. You can try the direct proof but the contrapositive of the statement provides an easy way out. The negations of p and q are

• ∼ p : ∫_a^b f (x) dx = 0,

• ∼ q : f (x) = 0 for all x between a and b.

Proof. Suppose that a < b and f (x) is a real-valued function. If f (x) = 0 for all x between a and b, then by the definition of integration ∫_a^b f (x) dx = 0.
(p ⇒ q) ⇔ [p ∧ (∼ q) ⇒ c],
true but the conclusion is false. Then we show that this implies a statement that is
known to be false or contradicts our assumption.
• p: x is rational and y is irrational.

• q: x + y is irrational.
Proof. Suppose that x is rational and y is irrational but x + y is rational. By definition x = a/b and x + y = c/d, where a, b, c, d are integers and b, d ≠ 0. It follows that

y = c/d − a/b = (bc − ad)/(bd).

Since bc − ad and bd are integers and bd ≠ 0, y is rational. This contradicts our assumption that y is irrational. Therefore x + y is irrational.
p ⇔ (∼ p ⇒ c).
That is, instead of showing directly that the statement p is true, we can show that its negation implies a contradiction.
Example 1.6. The number √2 is irrational.

Proof. Assume on the contrary that √2 is rational. Hence √2 = m/n for some integers m and n with no common factors. It follows that m = √2 n so that m^2 = 2n^2. This means that m^2 and therefore m are even integers. Writing m = 2k for some integer k, we have 2n^2 = m^2 = 4k^2, or n^2 = 2k^2. This implies that n is also even, which contradicts our assumption that m and n have no common factors.
Of course the roles of the statements q and r are symmetric. That is, we can show (p ∧ (∼ r)) ⇒ q instead.
That is, we show that both cases ensure the conclusion r. The following example
employs this technique at the end of the proof.
Example 1.8. Show that there exist irrational numbers x and y such that x^y is rational.

Proof. We have established in example 1.6 above that √2 is irrational. But we observe that

(√2^√2)^√2 = √2^(√2·√2) = √2^2 = 2,

which is rational. Now we have two cases. First, if √2^√2 is rational, then we let x = y = √2. Second, if √2^√2 is irrational, then we let x = √2^√2 and y = √2 as required.

Notice that in the above example we do not know whether √2^√2 is rational or irrational. But in both cases we can find two irrational numbers to give the result.
Example 1.9. Suppose that n is an integer. Prove that n is odd if and only if n^2 is odd.
Proof. In the first part we show that n being odd implies that n^2 is odd. By definition of an odd number, n = 2k + 1 for some integer k. Therefore
Example 1.10. Prove that for every real number x, there exists a number y such that y > x.
Proof. Suppose that x is an arbitrary real number. Let y = x + 1. Then y > x as required.
Example 1.11. Prove that the square of every odd integer can be expressed as 8k + 1 for some integer k.
Proof. Let n be any odd integer. Then n = 2m + 1 for some integer m. Now

n^2 = (2m + 1)^2 = 4m^2 + 4m + 1 = 4m(m + 1) + 1.

Since one of m and m + 1 is even, m(m + 1) = 2k for some integer k. Therefore n^2 = 8k + 1.
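A quick numerical check of the claim, over an arbitrary sample of odd integers:

```python
# The square of every odd integer leaves remainder 1 when divided by 8.
odd_squares = [(2 * m + 1) ** 2 for m in range(-100, 101)]
assert all(s % 8 == 1 for s in odd_squares)
print("all tested odd squares have the form 8k + 1")
```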
Example 1.12. Prove that there exists a real number x such that for all y and z,
if y > z, then y > x + z.
Proof. Choose x = 0. Then for all real numbers y and z such that y > z, y > 0 + z.
Example 1.13. Suppose that f (n) = 1 − 1/n for n = 1, 2, 3, . . .. For all ε > 0, show that there exists a positive integer N such that for all n > N,

|f (n) − 1| < ε. (1.5)

We need to take an arbitrary value of a positive ε and then find an N such that all n > N satisfy the inequality in (1.5). Now

|f (n) − 1| = |1 − 1/n − 1| = 1/n.

Therefore we need to have 1/n < ε, or n > 1/ε. Hence we can let N = ⌈1/ε⌉, where ⌈x⌉ is the ceiling function, defined as the smallest integer greater than or equal to the number x. For example, if ε = 0.07, then N = ⌈1/0.07⌉ = ⌈14.29⌉ = 15. Here is the formal proof.

Proof. Given any ε > 0, let N = ⌈1/ε⌉. Then for all n > N,

|f (n) − 1| = 1/n < 1/N ≤ ε,

as required.
the statement. A lot of simple proofs need just a few steps between the definitions. In making those steps, recall any known results that you or others have developed. Sometimes an example or a diagram may help you to clarify some ideas. Keep in mind, however, that examples and diagrams are not formal proofs. One exception is when you are finding a counter-example to disprove a universal statement. For example, to disprove a universal statement such as ∀ x ∈ R, p(x), all you need is to find one number x such that p(x) is false. On the other hand, to prove such a statement, always begin your proof with "Let x be a real number. . . ."
My final advice is that studying mathematics is not unlike learning a foreign language or participating in a sport. Practising is as important as reading and memorizing. Therefore working on the exercises at the end of each chapter is an important part of developing intuition, stimulating creativity, and gaining experience. Over time you will be able to acquire that elusive quality called mathematical maturity. As one Olympic swimmer once said, "To be a good swimmer, you have to feel the water."
1.6 Exercises
1. Determine whether each of the following sentences is a statement. Provide a brief
justification.
(a) All presidents of the United States, past or present, are men.
(b) Given the function f (x) = x2 . If f (x1 ) = f (x2 ), then x1 = x2 .
(c) If the equilibrium price of a commodity goes up, then the equilibrium quan-
tity must decrease.
(d) For all ε > 0, there exists δ < 0 such that δ ≥ ε.
(e) This statement is not true.
(a) Express the statement in four simple statements p, q, r, and s with logical
symbols ⇒, ∀, ∃, ∋, ∨, ∧, etc.
(b) Find the negation of the statement in logical symbols.
(a) ∼ (p ∧ q) ⇔ (∼ p) ∨ (∼ q).
(b) ∼ (p ∨ q) ⇔ (∼ p) ∧ (∼ q).
Is the second statement the logical inverse of the first statement? Discuss.
11. Use a truth table to show that each of the following statements is a tautology:
(a) [∼ q ∧ (p ⇒ q)] ⇒ ∼ p.
(b) ∼ (p ⇒ q) ⇔ (p ∧ ∼ q).
13. A prime number is a natural number greater than 1 that is divisible only by 1 and itself. Let P be the set of all prime numbers. Consider the statement "If n is a prime number, then 2n + 1 is also a prime number."
15. In each part below, the hypotheses are assumed to be true. Use the tautologies
from table 1.1 to establish the conclusion. Indicate which tautology you are using
to justify each step.
(a) Hypotheses: r, ∼ t, (r ∧ s) ⇒ t
Conclusion: ∼ s
(b) Hypotheses: s ⇒ p, s ∨ r, q ⇒ ∼ r
Conclusion: p ∨ ∼ q
(c) Hypotheses: ∼ s ⇒ t, t ⇒ r
Conclusion: ∼ r ⇒ s
(d) Hypotheses: r ⇒ (∼ p ∨ q)
Conclusion: p ⇒ (q ∨ ∼ r)
(a) Hypotheses: r ⇒ ∼ p, ∼ r ⇒ q
Conclusion: p ⇒ q
(b) Hypotheses: q ⇒ ∼ p, r ⇒ s, p ∨ r
Conclusion: ∼ q ∨ s
(c) Hypotheses: p ⇒ s, s ⇒ q, ∼ q
Conclusion: ∼ p
(d) Hypotheses: ∼ s ⇒ ∼ p, (∼ q ∧ s) ⇒ r
Conclusion: p ⇒ (q ∨ r)
(a) 2n is divisible by 4.
(b) If n^2 is an odd number, then n is an odd number.
18. Let n be a positive integer. Prove or disprove the following two statements:
18
1.6. EXERCISES
22. Prove or disprove: For every positive integer n, the function given by

f (n) = n^2 + n + 13
24. Let m and n be integers. Prove that mn is even if and only if m is even or n is
even.
Chapter 2

Sets and Relations
We describe the set A by listing all of the elements between the braces { }. Elements are separated by commas. Although the elements seem to be unrelated objects, their meanings are nevertheless well defined. The set B contains the integers from 1 to 9. The elements represented by the ellipsis ". . . " are to be understood in the context. The set C contains any element x which satisfies the property described
In general, if p(x) is a statement whose truth value depends on the variable x, then
the set {x : p(x)} contains all elements x such that p(x) is true.
Being a general idea, a set can contain elements which are themselves sets. For
example, S = {A, B, C}, where A, B, and C are sets defined above. A set of sets is
often called a class or a collection.
Given two sets A and B, we say A is a subset of B, or A ⊆ B, if x ∈ A implies that x ∈ B. That is, all the elements belonging to A are also in B. Two sets are equal
if they contain the same elements. In other words,
(A = B) ⇔ [(A ⊆ B) ∧ (B ⊆ A)].
Real Numbers R

Complex Numbers C = {a + bi : a, b ∈ R, i = √−1}
The set of natural numbers N and the number zero can be formally constructed within set theory using an idea called ordinals. Alternatively, we can accept the fact that many animals, including humans, have the innate ability to count and to add.1 Then
given a natural number n, we ask the question whether there exists a number x such
that n + x = 0. This gives rise to the idea of negative integers. Next, given two
integers n and m, we ask the question if there exists a number x such that n = mx.
The results are the rational numbers. We shall not give a precise definition of the
set of real numbers here. Interested readers can find the process of constructing
1 See Holt (2008) or Economist (2008).
real numbers from the rational numbers in many books on analysis.2 The set R contains all rational numbers in Q and all irrational numbers such as √2, √5, π, etc.
Geometrically, it contains all points on the real line. You should convince yourself
that
N ⊂ Z ⊂ Q ⊂ R ⊂ C.
P(S) = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}.
These are called finite sets because they contain a finite number of elements. Sets
whose elements can be listed one by one are called countable or denumerable.
For example, N, Z, and Q are countable sets. Also, any finite set is countable. On
the other hand, R and C are uncountable.
Theorem 2.1. Let S be a finite set with cardinality |S| = n. Then |P(S)| = 2^n.
Proof. We shall give an informal proof here. In each subset of S, an element is either
in the set or not in the set. Therefore there are two choices for each element. The
total number of combinations is therefore

2 × 2 × · · · × 2 = 2^n
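Theorem 2.1 is easy to confirm computationally. The sketch below builds the power set of a small set with `itertools` (an illustrative check, not a proof; the helper name `power_set` is ours):

```python
from itertools import chain, combinations

def power_set(s):
    """Return all subsets of s, from the empty set up to s itself."""
    items = list(s)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

S = {"a", "b", "c"}
subsets = power_set(S)
assert len(subsets) == 2 ** len(S)  # |P(S)| = 2^n with n = 3
print(len(subsets))  # 8
```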
Since the natural numbers are unbounded, there is an infinite number of them.
This is also true for Z, Q, R, and C. It turns out, however, that there are different
sizes of infinities. The cardinality of the set of natural numbers is denoted by |N| = ℵ0
(pronounced as aleph naught), while the cardinality of the set of real numbers is denoted by |R| = c. Some facts on cardinality are listed as follows.3
(a, ∞) = {x ∈ R : x > a}
represents the set of all real numbers greater than a. We do not generally treat ∞
as a number. More importantly, infinity is implied by the logical structure of the
number systems. As a concept in science, however, it is unreachable, unobservable,
and untestable. This distinction between mathematics and reality is emphasized by
Ellis (2012) in what he calls Hilbert’s Golden Rule:
A ∩ B = {x : (x ∈ A) ∧ (x ∈ B)}.
3 For a formal treatment, see for example Gerstein (2012). A brief discussion can be found in Matso (2007).
It is a trivial exercise to show that (A ∩ B) ⊆ (A ∪ B). Two sets A and B are said
to be disjoint if they have no element in common. That is, A ∩ B = ∅.
The difference of two sets, denoted by A \ B, is the set which contains the
elements in A but not in B:
A \ B = {x : (x ∈ A) ∧ (x ∉ B)}.
The set A^c is called the complement of A, which contains all the elements not in A. In other words, A^c = U \ A.4
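Python's built-in `set` type implements these operations directly, which makes small experiments easy (the universal set U below is an arbitrary finite choice of ours):

```python
U = set(range(10))          # a finite universal set for illustration
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

union = A | B               # A ∪ B
intersection = A & B        # A ∩ B
difference = A - B          # A \ B
complement_A = U - A        # A^c = U \ A

assert intersection <= union  # (A ∩ B) ⊆ (A ∪ B)
print(union, intersection, difference, complement_A)
```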
4 Some writers denote the complement of A by A′, while others prefer Ā.
These are examples of indexed collections of sets. The set N is called an index set. The index set can be finite, countably infinite, or uncountable.
Let S be a set and I an index set. We say that an indexed collection of subsets of S, {Aα ⊆ S : α ∈ I}, is a partition of S if

1. Aα ≠ ∅ for all α ∈ I,

2. Aα ∩ Aβ = ∅ for all α, β ∈ I with α ≠ β,

3. ∪α∈I Aα = S.
[Figure: A Delicious Partition]
A1 = {x ∈ S : x is a Monday},
A2 = {x ∈ S : x is a Tuesday},
...
A7 = {x ∈ S : x is a Sunday}.
In defining a set, the order of the elements in the list is not important. For
example, {a, b} = {b, a}. In some situations, however, we want to define an ordered
set. For example, an ordered pair of two elements a and b is defined as

(a, b) = {{a}, {a, b}},

whereas
(b, a) = {{b}, {a, b}}.
It is obvious that (a, b) ≠ (b, a). Therefore, given two ordered pairs, (a, b) = (c, d) if
and only if a = c and b = d. We are now ready to define the Cartesian product
of two sets A and B:
A × B = {(a, b) : a ∈ A, b ∈ B}.
A1 × A2 × · · · × An = {(a1 , a2 , . . . , an ) : ai ∈ Ai , i = 1, 2, . . . , n}.
A × B = {(a, 1), (a, 2), (a, 3), (b, 1), (b, 2), (b, 3)}.
Generally, if A and B are finite sets with |A| = n and |B| = m, then |A × B| = nm.
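The function `itertools.product` in Python generates exactly this set of ordered pairs; the sketch below reproduces the example with A = {a, b} and B = {1, 2, 3}:

```python
from itertools import product

A = ["a", "b"]
B = [1, 2, 3]

cartesian = list(product(A, B))
print(cartesian)
# [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3)]

# |A x B| = |A| * |B| = nm
assert len(cartesian) == len(A) * len(B)
```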
Example 2.7. The set of complex numbers C is the product set R^2 with two operations, namely addition and multiplication. For any (a, b) and (c, d) in C,

(a, b) + (c, d) = (a + c, b + d),
(a, b)(c, d) = (ac − bd, ad + bc).

The first number a in (a, b) is called the real part and the second number b is the imaginary part.
Example 2.8. Define R_+ = {x ∈ R : x ≥ 0}, that is, the set of nonnegative real numbers. In consumer analysis, x = (x1, x2, . . . , xn) ∈ R^n_+ is called a consumption bundle with n goods and services. Each component xi in the n-tuple represents the quantity of product i in the bundle. Similarly, define R_++ = {x ∈ R : x > 0}, the set of positive real numbers. Then the corresponding prices of the goods and services in a bundle is the n-tuple p = (p1, p2, . . . , pn) ∈ R^n_++. Total expenditure is therefore the dot product of the two n-tuples, defined by
p · x = p1 x1 + p2 x2 + · · · + pn xn.
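Total expenditure is then a one-line computation (a sketch with hypothetical prices and quantities of our own):

```python
# Hypothetical 3-good bundle and its prices.
p = (2.0, 5.0, 1.5)   # prices, each strictly positive
x = (4.0, 1.0, 10.0)  # quantities, each nonnegative

# The dot product p . x = p1*x1 + p2*x2 + ... + pn*xn.
expenditure = sum(pi * xi for pi, xi in zip(p, x))
print(expenditure)  # 2*4 + 5*1 + 1.5*10 = 28.0
```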
π = p · y = p1 y1 + p2 y2 + · · · + pn yn.

2.3 Binary Relations
Example 2.10. Let N be a set of nuts and B be a set of bolts. We can define
(n, b) ∈ R if the size of nut n fits the size of bolt b.
Example 2.11. Let A be the set of people. A relation R on A can be defined as “is
the mother of”. The statement “Mary is the mother of James.” can be expressed as
(Mary, James) ∈ R.
Example 2.12. A relation R on N can be defined as "is a prime factor of". For example, (5, 20) and (7, 56) are in R but (3, 13) ∉ R.
Example 2.14. Let R^n_+ be the consumption set of n goods and services of a consumer. We can define a preference relation ≽ on R^n_+. For consumption bundles a, b ∈ R^n_+, a ≽ b means that bundle a is at least as good as bundle b.
Given a relation R between A and B, we call the set A the domain of R and B the codomain of R. The range of R is defined as the set of elements in the codomain that are in the relation. That is,

range R = {b ∈ B : a R b for some a ∈ A}.

By definition the range is a subset of the codomain of a relation. We can also define an inverse relation R^-1 from B to A such that

b R^-1 a ⇔ a R b,
or formally,
R^-1 = {(b, a) ∈ B × A : (a, b) ∈ R}.
Complete: (a R b) ∨ (b R a).
Reflexive: a R a.
Symmetric: (a R b) ⇒ (b R a).
Asymmetric: (a R b) ⇒ ∼ (b R a).
Example 2.15. Consider the relation "divides" on Z from example 2.13. As the example shows, not every pair of numbers divides each other, so the relation is not complete. It is true that for all n ∈ Z, n|n. Therefore the relation is reflexive. If m|n, it is not necessary that n|m unless m = n. It follows that the relation is not symmetric but antisymmetric. If m|n and n|p, then by definition there exist k, l ∈ Z such that n = km and p = ln. Hence p = (lk)m so that m|p but in general p ∤ m. We conclude that | is transitive but not circular. Finally, reflexivity implies that the relation is not asymmetric.
Example 2.16. Let S be a set. The relation ⊆ on the power set of S, P(S), is a
partial order.
A complete partial order is called a linear order. For example, the relation ≥
on R is a linear order. Notice that completeness implies reflexivity (but the converse
is not true) so the latter assumption is redundant.
Another important class of relations in mathematics is the equivalence rela-
tion, which is defined as a relation R on a set S that is reflexive, transitive, and
symmetric. The relation "=" on the set of real numbers R is an example of an equivalence relation. An equivalence relation is often denoted by an ordered pair (S, ∼), where we use the symbol ∼ in place of R. For any a ∈ S, we define the equivalence class of a as
∼ (a) = {x ∈ S : x ∼ a}.
The following theorem shows that the collection of the distinct equivalence classes
in fact forms a partition of the set.
Theorem 2.2. Suppose that (S, ∼) is an equivalence relation. Then the collection of distinct equivalence classes {∼(a) : a ∈ S} forms a partition of S.

Proof. By reflexivity, for every a, b ∈ S, ∼(a) and ∼(b) are nonempty. Suppose that ∼(a) ∩ ∼(b) ≠ ∅. Then there exists an x ∈ ∼(a) such that x is also in ∼(b). This means that x ∼ a and x ∼ b, and by symmetry and transitivity, a ∼ b. It follows that ∼(a) = ∼(b). Finally, a ∈ ∼(a) for all a ∈ S. Therefore ∪a∈S ∼(a) = S. We conclude that {∼(a) : a ∈ S} is a partition of S.
We can also show that the converse of theorem 2.2 is true.

Theorem 2.3. Suppose that I is an index set and {Aα ⊆ S : α ∈ I} is a partition of S. Then there exists an equivalence relation (S, ∼) such that for each a ∈ S, the equivalence class ∼ (a) = Aα for some α ∈ I.
Example 2.18. In example 2.4 we partition the days in a year into seven subsets,
A1 , . . . , A7 , according to the day of the week. We can define an equivalence relation
as a ∼ b if a and b belong to the same day of the week. If a is a Monday, then
∼ (a) = A1 . If b is a Tuesday, then ∼ (b) = A2 , and so on.
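A quick computational illustration of theorem 2.2: grouping the day numbers of a (non-leap) year by their remainder mod 7, a stand-in for “same day of the week”, yields seven pairwise disjoint classes whose union recovers the whole set. The variable names are illustrative choices:

```python
# Partition a set by the equivalence "same remainder mod 7".
days = range(1, 366)            # day numbers in a non-leap year
classes = {}                    # remainder -> equivalence class
for d in days:
    classes.setdefault(d % 7, set()).add(d)

blocks = list(classes.values())
# The blocks are pairwise disjoint and their union is the whole set,
# exactly as theorem 2.2 asserts.
assert all(b1.isdisjoint(b2) for b1 in blocks for b2 in blocks if b1 is not b2)
assert set().union(*blocks) == set(days)
print(len(blocks))  # 7
```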
With the above notations on an ordering (S, %), we can define different types of
intervals:
[a, b] = {x ∈ S : (x % a) ∧ (b % x)},

(a, b) = {x ∈ S : (x ≻ a) ∧ (b ≻ x)},

(a, b] = {x ∈ S : (x ≻ a) ∧ (b % x)},

[a, b) = {x ∈ S : (x % a) ∧ (b ≻ x)},

where ≻ denotes the strict part of %.
The first definition is called a closed interval, the second an open interval. The last two definitions are sometimes called half-open intervals. Unbounded intervals are expressed as contour sets. In particular, for any a ∈ S, the upper contour set of a is defined as

% (a) = {x ∈ S : x % a},

and the lower contour set of a as

≾ (a) = {x ∈ S : a % x}.
Example 2.20. The countries that constitute the United Nations are supposed to be equal. Therefore the heads of state of the countries are unrelated to each other in their lines of authority, so all of them are maximal elements at the U.N. Within its own government, however, each head of state is the greatest element in the political ranking.
Example 2.21. In consumer theory we usually assume that the consumption set
Rn+ has no greatest element under the preference relation % since the set Rn+ is
unbounded. Nevertheless, the choice set of the consumer is always a bounded subset
of Rn+ because of financial and other constraints. A maximal element in the choice
set is called an optimal bundle.
Let A be a nonempty subset of S and define the set of upper bounds of A as

B = {s ∈ S : s % x for all x ∈ A}.

Then the least element or a minimal element in B is called the least upper bound or supremum of A, which is denoted by sup A. Notice that sup A may or may not belong to A.
Theorem 2.4. Let (S, %) be a linear order and suppose that A is a nonempty bounded subset of S. Then sup A is unique.
Proof. Suppose that x and y are both sup A. Then by definition x % y and y % x.
Since a linear order is antisymmetric, x = y.
A linearly ordered set (S, %) is said to be well-ordered if the relation % is

1. complete,

2. transitive,

3. antisymmetric,

and, in addition, every nonempty subset of S has a least element. That is, S is a linearly ordered set with a least element for every nonempty subset. The sets
Z, Q, and R are not well-ordered with respect to the linear order ≥. For example, the set of negative integers, {. . . , −3, −2, −1}, does not have a least element. The set {1/n : n ∈ N}, either as a subset of Q or of R, does not have a least element either. It turns out that the set of natural numbers N satisfies the requirements. We generally regard this property as an axiom called the Well-Ordering Principle:

Axiom 2.1 (Well-Ordering Principle). Every nonempty subset of N has a least element.

With the Well-Ordering Principle, we can prove a useful proof technique called the Principle of Mathematical Induction. It is used to prove statements that depend on the natural numbers, which economists frequently encounter in dynamic models.
Theorem 2.5 (Principle of Mathematical Induction). Let p(n) be a statement whose truth value depends on n ∈ N. Suppose that

1. p(1) is true, and

2. for every k ∈ N, p(k) implies p(k + 1).

Then p(n) is true for all n ∈ N.
Proof. Suppose that p(1) is true and p(k) implies p(k + 1) for any integer k ≥ 1. Suppose on the contrary that there exists m such that p(m) is false. Define the nonempty set

A = {m ∈ N : p(m) is false}.

By axiom 2.1, there exists a number l ∈ A such that m ≥ l for all m ∈ A. Since p(1) is true, l must be greater than 1. Then l − 1 ∉ A since l is the least element in A. This means that p(l − 1) is true. By hypothesis p(l) is true, contradicting the fact that l ∈ A. We conclude that p(n) is true for all n ∈ N.
Example 2.19. Show that for every n ∈ N, the statement p(n) that

∑_{i=1}^{n} i = n(n + 1)/2

is true.

Proof. For n = 1, ∑_{i=1}^{n} i = 1 = n(n + 1)/2, so p(1) is true. Next, assume that

∑_{i=1}^{k} i = k(k + 1)/2.

Then

∑_{i=1}^{k+1} i = ∑_{i=1}^{k} i + (k + 1) = k(k + 1)/2 + (k + 1) = (k + 1)(k + 2)/2.

This shows that p(k + 1) is true and so by induction p(n) is true for all n ∈ N.
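Induction establishes the identity for all n at once; a numerical spot-check over a finite range, as below, is no proof but is a useful sanity check (the range tested is an arbitrary choice):

```python
# Spot-check the formula proved by induction above: sum of 1..n = n(n+1)/2.
def sum_formula(n):
    return n * (n + 1) // 2

for n in range(1, 1001):
    assert sum(range(1, n + 1)) == sum_formula(n)
print(sum_formula(100))  # 5050
```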
A similar induction argument shows that 2^n ≥ n² for all integers n ≥ 5. The base case holds since 2⁵ = 32 ≥ 25 = 5². Assuming 2^k ≥ k² for some k > 4, we have

2^{k+1} = 2(2^k)
        ≥ 2(k²)
        ≥ k² + 5k    (since k > 4)
        = k² + 2k + 3k
        ≥ k² + 2k + 1
        = (k + 1)²,

which completes the induction step.
2.5 Functions
Let A and B be nonempty sets. A function f from A to B, denoted by f : A → B,
is a relation such that each element in A is related to at most one element in B.
Instead of (a, b) ∈ f or a R b, we write f (a) = b. Therefore, if f (a) = b and f (a) = c, then b = c in B. A function is often called a mapping or a transformation. As with any relation, the set A is the domain of f , and B is the codomain.
The image of the domain of f is called the range of f , that is, range f = f (A). It is clear that the range of a function is a subset of the codomain. For any D ⊆ B, the pre-image of D, denoted by f −1 (D), is the subset of A given by

f −1 (D) = {a ∈ A : f (a) ∈ D}.
Injective: Each element in the codomain has at most one pre-image. That is, for
all a, c ∈ A, if f (a) = f (c), then a = c. Injective functions are often called
one-to-one functions.
Surjective: For each b ∈ B, there exists a ∈ A such that f (a) = b. This implies that the range of f is equal to the codomain. Surjective functions are often called onto functions.
Example 2.26. The function defined in example 2.25 is not everywhere defined
since the element 3 has no image. It is not injective because 2 and 4 have the same
image. Also, it is not surjective since b has no pre-image.
Example 2.28. Define the rotation function f : R² → R² by

f (x, y) = (x cos θ − y sin θ, x sin θ + y cos θ),

where θ ∈ [0, 2π). It rotates a point in the anticlockwise direction by the angle θ. It can be shown that the function is a bijection.
If f is a bijection, the inverse function f −1 : B → A satisfies

f −1 (f (a)) = a,    f (f −1 (b)) = b,

for any a ∈ A and b ∈ B.
When the domain and the codomain of a function are the same set, we sometimes call the function an operator on the set. Suppose that f is an operator on A. An element p ∈ A is called a fixed point of f if f (p) = p. Fixed points are important in economic analysis because they represent the equilibria or the steady states of dynamical systems.
Example 2.29. Let A = R \ {0} and let f : A → A be defined as f (x) = 1/x. Then
f is a bijection. The inverse function is f −1 (y) = 1/y. The points p = −1, 1 are
fixed points of f and f −1 .
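The claims of this example can be verified with exact rational arithmetic; the sample of test points below is an arbitrary illustrative choice:

```python
# f(x) = 1/x on A = R \ {0}: f is its own inverse, and p = -1, 1 are the
# only fixed points among a sample of rationals.
from fractions import Fraction

def f(x):
    return 1 / x

sample = [Fraction(p, q) for p in range(-5, 6) for q in range(1, 6) if p != 0]
assert all(f(f(x)) == x for x in sample)          # f^{-1} = f
fixed = sorted({x for x in sample if f(x) == x})
print(fixed)  # [Fraction(-1, 1), Fraction(1, 1)]
```

Algebraically, f (p) = p means p² = 1, so p = ±1 are indeed the only fixed points.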
Now let g : A → R be defined as g(x) = x². Then g ◦ f : A → R becomes (g ◦ f )(x) = g(1/x) = 1/x².

Example 2.30. Let A = R \ {1} and B = R \ {2}, and define f : A → B by f (x) = (2x + 3)/(x − 1). To show that f is one-to-one, suppose that f (a) = f (b), that is,

(2a + 3)/(a − 1) = (2b + 3)/(b − 1).
Then (2a + 3)(b − 1) = (2b + 3)(a − 1), which can be reduced to a = b. Therefore f
is one-to-one. To show that f is onto, let y ∈ B and consider x = (y + 3)/(y − 2).
Then
f (x) = (2(y + 3)/(y − 2) + 3) / ((y + 3)/(y − 2) − 1) = (2(y + 3) + 3(y − 2)) / ((y + 3) − (y − 2)) = 5y/5 = y.
Notice that f −1 (y) = (y + 3)/(y − 2).
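The bijection and its inverse can be checked with exact rational arithmetic on a grid of sample points (the grid itself is an illustrative choice):

```python
# Verify numerically that f(x) = (2x+3)/(x-1) on A = R\{1} has inverse
# g(y) = (y+3)/(y-2) and never attains the value 2.
from fractions import Fraction

def f(x):
    return (2 * x + 3) / (x - 1)

def g(y):
    return (y + 3) / (y - 2)

xs = [Fraction(p, q) for p in range(-10, 11) for q in range(1, 5)
      if Fraction(p, q) != 1]
assert all(g(f(x)) == x for x in xs)              # g inverts f on A
ys = [f(x) for x in xs]
assert all(y != 2 for y in ys)                    # 2 is never attained
assert all(f(g(y)) == y for y in ys)
print("bijection checks passed")
```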
Example 2.31. The following are some special functions that we frequently en-
counter. Here we let f : A → B be a function.
4. Suppose that addition is defined on A and on B. Then f is called additive if

f (x + y) = f (x) + f (y).

5. Suppose that the scalar multiplication αx is defined for all α > 0 and x ∈ A. Then f is called linearly homogeneous if

f (αx) = αf (x).
Contour sets can be defined using numbers instead of an element in S. For example,
the upper contour set
%f (x) = {a ∈ S : f (a) ≥ x}
consists of all the elements in S that have values greater than or equal to the number
x. The subscript f means that the ordering % is induced from f . Similarly the lower
contour set of x is
≾f (x) = {a ∈ S : f (a) ≤ x}.
[Figure 2.2: the graph of a function f (x) on the interval (x1, x2), with the epigraph epi f shaded above the curve.]
Figure 2.2 depicts the epigraph of the function f on the interval (x1 , x2 ). On the other hand, the hypograph is the subset below the graph:

hyp f = {(x, y) : y ≤ f (x)}.
Let (A, %A) and (B, %B) be ordered sets. A function f : A → B is called increasing if

(a %A b) ⇒ (f (a) %B f (b)),

and decreasing if

(a %A b) ⇒ (f (b) %B f (a)).
2.6 Exercises
1. Define the following sets with mathematical symbols:
3. Let M denote Mark Messier, E denote the Edmonton Oilers hockey team in 1988, and H denote the National Hockey League. What are the relationships between E, H, and M ?
A = {x ∈ R : x ≥ 0},
B = {x = y 2 : y ∈ R}.
Show that A = B.
(a) A = ∅
(b) B = {∅}
(c) C = {∅, {∅}}
(d) D = {∅, {∅}, {∅, {∅}}}
7. Let
A = {a, b, c, d, e},
B = {1, 2, 3, 4, 5},
C = {x : x is a letter in the word ‘concerto’},
P = {x ∈ N : x is a prime number}.
(a) A \ (B ∪ C) ⊆ (A \ B) ∩ (A \ C),
(b) A∆B = (A ∪ B) \ (A ∩ B),
(c) A ∪ (U \ A) = U .
(a) (A ∩ B) ⊆ (A ∪ B),
(b) A \ B = A ∩ B c ,
(c) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
(a) S = {a, b, c}, R = {(a, a), (b, b), (c, c), (c, a), (b, c), (b, a)}.
(b) S = {rock, scissors, paper}, a R b means “a beats b”.
(c) S is a set of people, a R b means “a is connected to b on Facebook”.
(d) S is the set of all commercial and investment banks in an economy, a R b
means “Bank a has an interbank loan from Bank b”.
19. Let S be the set of all straight lines in the plane. Define the relation R on S as
“is perpendicular to.” Determine if the relation is
(a) complete,
(b) reflexive,
(c) transitive,
(d) circular,
(e) symmetric,
(f) equivalence.
20. Let S be the set of all statements. Define a relation “⇒” on S which means
“implies”. Explain whether the relation is
(a) complete,
(b) reflexive,
(c) symmetric,
(d) transitive,
(e) anti-symmetric,
(f) asymmetric.
21. Let S be a set. Show that the relation ⊆ on the power set of S, P(S), is a partial
order.
22. Determine whether the relation R on N defined as “is a prime factor of” is a partial order.
23. Suppose that a relation R on a set A is reflexive and circular. Show that it is an
equivalence relation.
R = {(x, y) ∈ R × R : x − y ∈ Z}.
27. Suppose that (S, ∼) is an equivalence relation. Prove that for all a, b ∈ S, a ∼ b
if and only if ∼ (a) = ∼ (b).
28. Find the equivalence relation corresponding to the partition defined in exam-
ple 2.5.
29. Give the formal definitions of a minimal element and the least element of an
ordering (S, %).
30. Consider the ordered set (R, ≥). Let the set A = {−1} ∪ (0, 1).
31. Let (S, %) be an ordering. For all x and y in S, define the following induced
relations as
(a) x ∼ y ⇔ x % y and y % x,
(b) x ≻ y ⇔ x % y and ∼ (y % x).
5 % 3 ≻ 1 % 4 ∼ 2 % 6.
34. Let (S, %) be an ordering. Prove that for every x, y ∈ S, if x % y, then ≾ (y) ⊆ ≾ (x).
35. Let (S, ≥) be a linear order and suppose that A is a nonempty bounded subset
of S. Show that inf A is unique.
36. Suppose that % is a preference relation on the consumption set Rn+ . Let a ∈ Rn+ .
Is the indifference set ∼ (a) a bounded set? If yes find sup ∼ (a) and inf ∼ (a).
38. Show that for all n ≥ 4, n2 ≤ n!. (Recall that n factorial is defined as n! =
n(n − 1) · · · (2)(1).)
where A ⊆ X.
(a) Is f one-to-one?
(b) Is f onto?
(c) What is the graph of f ?
(d) What is the hypograph of f ?
42. Determine whether the following functions on R are everywhere defined, one-to-
one, and onto.
(a) f (x) = ex ,
(b) g(x) = log x,
(c) h(x) = (g ◦ f )(x).
Show that the function

f (x) = (5x + 1)/(x − 2)

is a bijection.
f (f −1 (B)) ⊆ B.
46. Suppose that A, B, and C are sets. Let %A , %B , and %C be ordered relations on
A, B, and C respectively.
is increasing.
(a) −f is decreasing,
(b) f + g is increasing,
(c) g ◦ f is increasing.
48. Let f and g be functionals on an ordered set (S, %). Suppose f is increasing and
g is decreasing. Prove that f − g is increasing.
Chapter 3
Basic Topology
A topology T on a set S is a collection of subsets of S such that

1. ∅ and S are in T ,

2. the union of any collection of sets in T is in T ,

3. the intersection of any finite collection of sets in T is in T .
For any set S, the collection T = {∅, S} is a trivial topology. On the other hand,
T = P(S), the power set of S, is called the discrete topology. In economics we often
focus on a specific type of topological space called metric space.
A metric on a set X is a function ρ : X × X → R such that for all x, y, z ∈ X,

1. ρ(x, y) ≥ 0, ρ(x, x) = 0, and ρ(x, y) > 0 if x ≠ y,

2. ρ(x, y) = ρ(y, x),

3. ρ(x, y) ≤ ρ(x, z) + ρ(z, y).
The first axiom requires that the metric, or distance, between two points must be positive unless x = y. The second axiom is about symmetry: it does not matter whether we measure the distance from x to y or from y to x. The last one is called the triangle inequality. It generalizes the fact that the sum of two sides of a triangle is greater than or equal to the third side. We often denote a metric space by (X, ρ). Any subset of X with the metric ρ also satisfies the above definition and is therefore a metric space itself.
Example 3.1. The most important example of metric spaces is the Euclidean
space Rn with the metric
ρ(x, y) = ((x1 − y1)² + · · · + (xn − yn)²)^{1/2},    (3.1)

which can also be written as

ρ(x, y) = ‖x − y‖.
Open Ball: An open ball of a point p is a set Br (p) = {x ∈ X : ρ(x, p) < r}. The
positive number r is called the radius of the ball. Sometimes an open ball is
called a neighbourhood of p.
Limit Point: A point p is called a limit point of A if every open ball Br (p) contains a point q ∈ A with q ≠ p.
Perfect Set: A closed set A is called perfect if every point in A is a limit point of
A.
Compact Set: A compact set in the Euclidean space can be defined as a closed
and bounded set.
3. The limit points of a set A do not necessarily belong to A. For example, let A = {1/n : n = 1, 2, . . .}. The point 0 is not in A but it is a limit point of A. Note that A is neither open nor closed. What is Ā?
4. The closure of a set A can also be defined as the union of A and all its limit
points.
6. Separated sets are disjoint, but disjoint sets are not necessarily separated. For example, the intervals [a, b] and (b, c) are disjoint but not separated. What about (a, b) and (b, c)?
7. The set of rational numbers Q is neither a closed set nor an open set in R. It is, however, dense in R. Is it connected?
11. The sets A = [3, π] and B = {−4, e, π} are both closed in R. But A is perfect
while B is not.
The proofs of some of the above statements can be found in Rudin (1976).
3.1.5 Sequences
A sequence is a list of points, x1 , x2 , . . . , in a metric space (X, ρ). If the list is
finite, it is similar to an n-tuple. Often we use the notation {xn } to represent an
infinite sequence. Formally, an infinite sequence is a function from N into X, with
f (n) = xn . Notice that not all points in a sequence need to be distinct. The set of
all the points in a sequence is called the range of {xn }. The sequence is said to be
bounded if the range has a finite diameter.
A sequence {xn} in a metric space (X, ρ) is said to converge to a limit x ∈ X if for all ε > 0, there exists an integer N such that for all n > N , ρ(xn , x) < ε. Often a converging sequence is denoted by

xn → x  or  lim_{n→∞} xn = x.
Proof. Let ε > 0 be any positive real number. Let N = ⌈1/ε⌉. Then for all n > N ,

ρ(xn , 0) = 1/n < 1/N ≤ ε.

Therefore xn → 0. Notice that the range of the sequence is infinite, and the sequence is bounded.
The following result shows that the concept of the limit of a sequence and a limit
point of a set are related.
Theorem 3.1. Suppose that A is a subset of a metric space (X, ρ) and p is a limit point of A. Then there exists a sequence in A that converges to p.

Proof. For each n ∈ N, the open ball B_{1/n}(p) contains a point xn ∈ A with xn ≠ p. Given any ε > 0, choose an integer N ≥ 1/ε. Then for all n > N , ρ(xn , p) < 1/n < ε. Therefore xn → p.
A Cauchy sequence {xn} in a metric space (X, ρ) is a sequence such that for all ε > 0, there exists an integer N such that ρ(xn , xm ) < ε if n ≥ N and m ≥ N . It can be shown that any convergent sequence is a Cauchy sequence. If the converse is also true, (X, ρ) is called a complete metric space. Euclidean spaces are complete. Complete metric spaces have a number of interesting properties and are very useful in economic analysis. Interested readers can consult Chapter 10 in Royden and Fitzpatrick (2010) for more details.
A sequence {xn} of real numbers is said to be increasing if xn+1 ≥ xn for all n, and decreasing if xn+1 ≤ xn for all n. Increasing and decreasing sequences are called monotone sequences. A useful fact about a monotone sequence is that it converges if and only if it is bounded.
The concept of convergence is important in statistics.
Example 3.3 (Law of Large Numbers). Let X̄n = (X1 + X2 + · · · + Xn )/n denote the mean of a random sample of size n from a distribution that has mean µ. Then
{X̄n} is a sequence in R. The Law of Large Numbers states that X̄n converges to µ in probability, that is, for any ε > 0,

lim_{n→∞} P(|X̄n − µ| ≥ ε) = 0.
3.1.6 Series in R
Suppose that {an } is a sequence in R. The partial sum of {an } is a sequence {sn }
defined by
sn = a1 + a2 + · · · + an = ∑_{i=1}^{n} ai .
We also call the sequence {sn} the infinite series, or simply series, which is denoted by ∑_{n=1}^{∞} an . We say that the series converges to a number s ∈ R if

∑_{n=1}^{∞} an = s.
Otherwise we say that the series diverges. Since R is a complete metric space, we can apply the Cauchy criterion for convergence to the series. That is, ∑_{n=1}^{∞} an converges if and only if for all ε > 0, there exists an integer N such that for m ≥ n > N ,

|∑_{i=n}^{m} ai | < ε.    (3.2)
Consider the geometric series ∑_{n=0}^{∞} axⁿ with a ≠ 0, whose partial sum is

sn = a + ax + ax² + · · · + ax^{n−1}.

Multiplying by x and subtracting,

sn − xsn = a − axⁿ,

or

sn = a(1 − xⁿ)/(1 − x).

The series converges if −1 < x < 1, that is,

∑_{n=0}^{∞} axⁿ = lim_{n→∞} sn = lim_{n→∞} a(1 − xⁿ)/(1 − x) = a/(1 − x).    (3.3)
where we have applied the geometric series in (3.3) to the last equality with a = 1
and x = 1/2. Therefore the series converges to a number that we call e. It can be
shown that e is irrational. An alternative definition is
e = lim_{n→∞} (1 + 1/n)ⁿ.
Example 3.7. For every real number x, we define the exponential function as²

eˣ = ∑_{n=0}^{∞} xⁿ/n!.
It can be shown that the series converges for every real number, so the function is everywhere defined on R. The function is increasing, so it is one-to-one. Some of the important properties of the exponential function are listed below. For all x, y ∈ R,
1. ex > 0, e0 = 1;
3. ex+y = ex ey ;
5. the function increases faster than any power function of x, that is,

lim_{x→∞} xⁿ/eˣ = 0

for all n ∈ N;
6. the function is a bijection from R to R++ , the inverse function is the natural
log function, f (y) = log y.
On the other hand, the series 1 + 2 + 3 + · · · diverges to infinity. By exploiting the paradoxes of infinity, some scientists claim that the series converges to −1/12. Overbye (2014) provides an entertaining description of their claim.
² The exponential function is also defined for complex numbers, but we shall not pursue the analysis here.
Let (X, ρX) and (Y, ρY) be metric spaces. A function f : X → Y is continuous at a point p ∈ X if for every ε > 0 there exists a δ > 0 such that

ρX (x, p) < δ

implies that

ρY (f (x), f (p)) < ε.
Example 3.9. We show that the function f : R → R given by f (x) = eˣ with the Euclidean metric is continuous. That is, we have to show that for any p ∈ R and ε > 0, we can find a δ > 0 such that whenever

ρX (x, p) = |x − p| < δ,

we have

ρY (f (x), f (p)) = |eˣ − eᵖ| < ε.
Theorem 3.2. Let f be a function which maps a metric space X into a metric space
Y . The following statements are equivalent:
4. Let {xn } be any sequence in X which converges to a point x. Then the sequence
{f (xn )} converges to f (x) in Y .
Proof. Proof of part 1 can be found in Rudin (1976, p. 89). Here we prove part
2 by contradiction. Assume that E is connected but f (E) is separated, that is,
f (E) = A ∪ B, where A, B ⊂ Y are non-empty and separated. Let G = E ∩ f −1 (A)
and H = E ∩ f −1 (B). Notice that E = G ∪ H and both G and H are non-empty
sets.
Since G ⊆ f −1 (A), G ⊆ f −1 (Ā). By the above corollary, Ā is closed implies that
f −1 (Ā) is also closed. Therefore Ḡ ⊆ f −1 (Ā), or f (Ḡ) ⊆ Ā. Since f (H) = B and
Ā ∩ B = ∅, we have f (Ḡ) ∩ f (H) = ∅. This implies that Ḡ ∩ H = ∅, otherwise any
x ∈ Ḡ ∩ H will have f (x) ∈ f (Ḡ) ∩ f (H).
By a similar argument we can show that G ∩ H̄ = ∅. Together these mean
that G and H are separated, which is a contradiction since E = G ∪ H and E is
connected.
Proof. Let M = sup f (E) and m = inf f (E). Since E is compact, by theorem 3.3 f (E) is compact. It follows that M and m are in f (E), which proves the theorem.
You should be able to prove the following results from the definition of continuity.
3.2.2 Semicontinuity
Theorem 3.4 implies that solutions for optimization problems on a compact set al-
ways exist if the objective function is continuous. In a typical economic application,
however, we only need to find either the maximum or the minimum but not both.
In these situations we can relax the continuity requirement a little bit. For exam-
ple, for a maximum solution to exist, we require the objective function to be upper
semicontinuous.
Formally, a functional f on a metric space X is upper semicontinuous if for every α ∈ R, the upper contour set

{x ∈ X : f (x) ≥ α}

is closed.
[Figure: an upper semicontinuous function (left) and a lower semicontinuous function (right), each with a jump at x0.]
A function f : X → Y is uniformly continuous if for every ε > 0 there exists a δ > 0 such that for all p, q ∈ X, whenever

ρX (p, q) < δ,

then

ρY (f (p), f (q)) < ε.

The difference from ordinary continuity is that here δ depends only on ε, not on the points p and q.
Theorem 3.2 guarantees that the image of a convergent sequence is also convergent under a continuous mapping. The following example shows that this is not necessarily so for a Cauchy sequence.
Example 3.10. Define the continuous function f (x) = log x on R++ = (0, ∞).
Consider the Cauchy sequence {xn } = {e−n }. The image of the sequence with f is
{yn } = {−n}, which is not a Cauchy sequence.
Theorem 3.8. Let f be a uniformly continuous function which maps a metric space
X into a metric space Y and let {xn } be a Cauchy sequence in X. Then the image
{f (xn )} is also a Cauchy sequence in Y .
In example 3.10, f is not uniformly continuous on R++ . On the other hand, R++
is not complete and so the Cauchy sequence {xn } = {e−n } is not convergent.
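Example 3.10 can be seen numerically: consecutive terms of xn = e^{−n} become arbitrarily close, while their images under log stay exactly distance 1 apart. The index range sampled below is an illustrative choice:

```python
# x_n = e^{-n} is Cauchy in (0, oo), but its image under f = log is {-n},
# whose consecutive terms remain a fixed distance 1 apart.
import math

xs = [math.exp(-n) for n in range(1, 30)]
gap_x = max(abs(xs[i + 1] - xs[i]) for i in range(20, 28))
gap_y = abs(math.log(xs[21]) - math.log(xs[20]))
print(gap_x, gap_y)   # gap_x is tiny, gap_y is (essentially) 1.0
```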
Suppose that f : [a, b] → [a, b] is continuous. Then there exists at least one fixed point. To show this, let g(x) = f (x) − x on [a, b]. Then g(a) ≥ 0 ≥ g(b). By theorem 3.5 there exists a point x ∈ [a, b] such that g(x) = 0, which implies that f (x) = x. An extension of this result to a more general setting is as follows.
Interested readers can consult McLennan (2014) for extensive discussions on fixed
point theorems.
Example 3.11. For each n ∈ N, define fn : [0, 1] → R by

fn (x) = xⁿ.

Then {fn} converges pointwise to the limit function f with f (x) = 0 for 0 ≤ x < 1 and f (1) = 1.
Example 3.11 shows that although each of the functions fn in the sequence is
continuous on the interval [0, 1], the limit function f (x) is not. In other words, when
x = 1,
lim_{t→x} lim_{n→∞} fn (t) ≠ lim_{n→∞} lim_{t→x} fn (t).
It turns out that properties in differentiation and integration are not preserved under
pointwise convergence as well. We need a stricter criterion for convergence that can
carry over these properties to the limit function.
for all x ∈ S. The key difference between pointwise convergence and uniform convergence is that in the former N depends on ε and x, whereas uniform convergence means that N depends only on ε. In other words, inequality (3.4) is satisfied uniformly for all x ∈ S once N is chosen. A formal definition of uniform convergence for the series ∑ fn (x) is left as an exercise. Not surprisingly, the Cauchy criterion applies to uniform convergence.
for all x ∈ S.
Consider the functions fn (x) = x/(1 + nx²) on R. For any ε > 0, let N > 1/(4ε²). Then for all n > N and all x ≠ 0, since 1 + nx² ≥ 2√n |x| by the AM–GM inequality,

|fn (x)| = |x|/(1 + nx²) ≤ |x|/(2√n |x|) = 1/(2√n) < 1/(2√N) < ε,

while |fn (0)| = 0 < ε trivially. This shows that {fn (x)} converges uniformly to f (x) = 0 on R.
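Uniform convergence here is visible through the sup norm: sup_x |fn (x)| = 1/(2√n), attained at x = 1/√n, and this bound vanishes as n → ∞. The grid used below is an illustrative numerical check:

```python
# The sup of |f_n(x)| = |x|/(1 + n x^2) over R equals 1/(2 sqrt(n)),
# so the sup-norm distance to the zero function vanishes with n.
import math

def f(n, x):
    return x / (1 + n * x * x)

for n in (1, 4, 100, 10000):
    xs = [i / 1000 for i in range(-5000, 5001)]   # grid on [-5, 5]
    sup_grid = max(abs(f(n, x)) for x in xs)
    assert abs(sup_grid - 1 / (2 * math.sqrt(n))) < 1e-3
print(1 / (2 * math.sqrt(100)))  # 0.05
```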
Example 3.13 (Central Limit Theorem). Let X̄n = (X1 + X2 + · · · + Xn )/n denote
the mean of a random sample of size n from a distribution that has mean µ and
variance σ². Let

Zn = √n (X̄n − µ) / σ.
Then {Zn } is a sequence of random variables with probability density functions {fn }.
The Central Limit Theorem states that {fn } converges uniformly to the probability
density function of a normal distribution with mean 0 and variance 1. That is, the
limit function is
f (x) = (1/√(2π)) e^{−x²/2}.
This is a remarkable result because the standardized sample mean of any distribution with a finite mean and variance converges to the standard normal distribution, and knowledge of the functional form of fn is not necessary.
Theorem 3.12. Suppose that {fn } is a sequence of continuous functions that con-
verges uniformly to a limit function f on a set S. Then f is continuous on S.
Interested readers can consult Rudin (1976, chapter 7) for a proof of this result
and the relation between uniform convergence and differentiation and integration.
For series of functions, there is a simple test for uniform convergence.

Theorem 3.13 (Weierstrass M-test). Let {fn} be a sequence of functions on a set S. Suppose that |fn (x)| ≤ Mn for all x ∈ S and n ∈ N, and that ∑ Mn converges. Then ∑ fn converges uniformly on S.

Proof. The proof uses the Cauchy criterion for series of functions. Since ∑ Mn converges, for any ε > 0 there exists an integer N such that for all m, n > N ,

|∑_{i=m}^{n} fi (x)| ≤ ∑_{i=m}^{n} |fi (x)| ≤ ∑_{i=m}^{n} Mi < ε

for all x ∈ S. Therefore ∑ fn (x) converges uniformly on S.
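As a numerical illustration of the M-test, take fn (x) = sin(nx)/n² with Mn = 1/n² (a standard example, not from the text): every partial tail of the series is dominated by the corresponding tail of ∑ Mn , uniformly in x.

```python
# Weierstrass M-test sketch: |sin(nx)/n^2| <= 1/n^2 =: M_n, and sum M_n
# converges, so sum sin(nx)/n^2 converges uniformly. Check a tail bound
# numerically on a grid of x values.
import math

def tail(x, m, n):
    return sum(math.sin(i * x) / i**2 for i in range(m, n + 1))

M_tail = sum(1 / i**2 for i in range(50, 101))
worst = max(abs(tail(x / 10, 50, 100)) for x in range(-60, 61))
assert worst <= M_tail
print(worst <= M_tail)  # True
```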
Consider a one-sector growth model in which consumption in period t is output plus undepreciated capital minus next period’s capital stock:

ct = F (kt) + (1 − δ)kt − kt+1,    (3.5)

where F is the aggregate production function, kt is the capital stock, and δ is the depreciation rate of capital. The function F is usually assumed to be increasing and concave, reflecting decreasing returns to scale. Then the economy will converge to an equilibrium kt = k∗. From (3.5) the maximum consumption is cM = F (k∗), which means that the economy consumes the whole output and makes zero net investment, that is,

kt+1 − (1 − δ)kt = 0.
3.4 Exercises
1. Let S = {a, b, c}. Show that
is a topology on S.
Prove that (X, ρ) is a metric space. Give an example of proper subset of X that
is open. Are there any closed sets and compact sets?
3. Show that the set Rⁿ with the distance function ρ(x, y) = max_{1≤i≤n} |xi − yi| is a metric space.
4. (Symbolic dynamics) Suppose that the set Ω2 contains all bi-infinite sequences of
two symbols, say 1 and 2. A point a ∈ Ω2 is the sequence
a = {. . . , a−2 , a−1 , a0 , a1 , a2 , . . . },
can represent two distinct sequences, one with a0 = 2 and the other with a0 = 1.
The metric between two points a, b ∈ Ω2 is defined by

ρ(a, b) = ∑_{i=−∞}^{∞} δ(ai , bi)/2^{|i|},

where

δ(ai , bi) = 0 if ai = bi , and δ(ai , bi) = 1 if ai ≠ bi .
5. Determine whether the following sets are (i) open, (ii) closed, (iii) both open and
closed, or (iv) neither open nor closed:
(a) Z ⊆ R
(b) {(x, y) : 1 < x < 2, y = x} ⊆ R2 ,
(c) {x ∈ Q : 1 ≤ x ≤ 2} ⊆ R
6. Prove or disprove: Let A and B be connected in a metric space. Then there exists
a boundary point in one of the sets which belongs to the other set.
7. Consider the Euclidean metric on R. Let A = (−1, 1) ∪ (1, 2), that is, A is the
union of two intervals. Answer the following questions with explanations.
10. Consider the Euclidean space R2 . Let A = {(x, 0) : a < x < b}, that is, A is an
open interval on the horizontal axis. Determine and explain whether A is
(a) open,
(b) closed,
(c) perfect,
(d) compact.
11. Prove that a set is closed if and only if its complement is open.
13. Let (X, ρ) be a metric space. Suppose that A and B are separated subsets of X.
Prove or disprove:
Ā ⊆ B c .
14. Define the sequence {xn } as {1, 1/2, 1/4, 1/6, . . .}.
(c) xn = (−1)n .
(d) xn = 1 + (−1)n /n.
In each case determine if the sequence is bounded and if the range is finite.
16. Let {xn} be a sequence in a metric space (X, ρ) which converges to a point a. Let A denote the range of {xn}. Is a always a limit point of the set A?
18. Let {xn } and {yn } be two sequences in a metric space (X, ρ).
(a) Show that limn→∞ xn = x if and only if for all r > 0, the open ball Br (x)
contains all but finitely many terms of {xn }.
(b) Suppose limn→∞ xn = x and ρ(xn , yn ) → 0. Show that limn→∞ yn = x.
(c) Consider the case that X = R so that the metric is ρ(x, y) = |x−y|. Suppose
limn→∞ xn = x and limn→∞ yn = y. Show that limn→∞ (xn + yn ) = x + y.
20. Consider the natural order ≥ on the set of rational numbers Q. Define a sequence
{xn } in Q as follows:
x1 = 3,
x2 = 3.1,
x3 = 3.14,
x4 = 3.141,
x5 = 3.1415,
x6 = 3.14159,
..
.
That is, xn+1 includes one more digit of the number π than xn . Note that π ∉ Q. Let A be the range of {xn}.
21. Let {xn } be an increasing sequence. Suppose {xn } is bounded. Prove that the
sequence converges by following the hints below:
22. Find

∑_{n=0}^{∞} (r/(1 + r))ⁿ,

where r is the market interest rate.
23. Show that the function f : R → R given by f (x) = x² with the Euclidean metric is continuous. Hint: you may find the identity |x² − p²| = |x + p| |x − p| useful.
24. Show that the rotation function defined in example 2.28 of chapter 2 is continuous.
is nowhere continuous.
26. Let B̄r (p) = {x ∈ X : ρ(x, p) ≤ r} be a closed ball centred at p with radius r in
a metric space X. Let f be a continuous functional on X. Show that f (B̄r (p)) is
bounded.
28. Suppose that S is a compact set in a metric space X. Show that every continuous
functional f : S → R is bounded, that is, f (S) is bounded.
35. Let S = [0, 1] ⊆ R. Show that a continuous function f : S → S has at least one fixed point.
36. Give a formal definition of uniform convergence for the series of functions ∑ fn (x) on a set S.
fn (x) = 1/(1 + xⁿ)
Chapter 4
Linear Algebra
Linear algebra is the starting point of multivariate analysis, due to its analytical and
computational simplicity. In many economic applications, a linear model is often
adequate. Even in a more realistic nonlinear model, variables are linearized at the
point of interest, usually at a steady-state equilibrium, to study their behaviours
in the neighbourhood. In this chapter we first study the basic properties of vector
spaces. Then we turn our attention to transformations between vector spaces.
1. Commutative in addition: x + y = y + x.

2. Associative in addition: (x + y) + z = x + (y + z).

3. There exists a zero vector 0 ∈ V such that x + 0 = x for all x ∈ V .

4. For each x ∈ V there exists a vector −x ∈ V such that x + (−x) = 0.

5. 1x = x.
6. (αβ)x = α(βx).
7. α(x + y) = αx + αy.
8. (α + β)x = αx + βx.
Axioms 1 and 2 imply that the order of vector addition is not important. For
example,
x + y + z = z + x + y.
In axiom 8, the addition symbol on the left hand side is for real numbers, while the
one on the right is for vectors.
Example 4.1. Let V = Rⁿ, the set of all n-tuples of real numbers. For

x = (x1 , x2 , . . . , xn ),
y = (y1 , y2 , . . . , yn ),

define vector addition and scalar multiplication componentwise:

x + y = (x1 + y1 , x2 + y2 , . . . , xn + yn ),
αx = (αx1 , αx2 , . . . , αxn ).
It is straightforward to verify that the definitions satisfy the eight axioms. Figure 4.1 illustrates what we learned in high school about vector addition. Vectors are represented as arrows, which are defined by their length and direction. To find a + b, we move the tail of vector b to the head of vector a, while keeping the direction unchanged. The vector from the tail of a to the head of b is a + b. If we instead put
[Figure 4.1: vector addition by the parallelogram rule; following a then b gives the same sum a + b = b + a as following b then a.]
the tail of a on the head of b, we get b + a. And as the diagram shows, the results
confirm axiom 1, that vector additions are commutative. The scalar multiplication
αa, on the other hand, means that the length of a is multiplied by α. If α is negative,
then αa and a have opposite directions.
Example 4.2. Let V = P (n) be the set of all polynomials of degree at most n. For any f, g ∈ P (n), we have

f (x) = a0 + a1 x + a2 x² + · · · + an xⁿ,
g(x) = b0 + b1 x + b2 x² + · · · + bn xⁿ,

and we define

(f + g)(x) = (a0 + b0) + (a1 + b1)x + · · · + (an + bn)xⁿ,
(αf )(x) = αa0 + αa1 x + · · · + αan xⁿ.

In fact, the operations are the same as those in example 4.1 if we let (a0 , a1 , . . . , an ) and (b0 , b1 , . . . , bn ) be (n + 1)-tuples.
Example 4.3. Let V = S be the set of all sequences in R. For all {xn }, {yn } ∈ S
and α ∈ R, we can define
and
α{xn } = {αxn }.
Example 4.4. Define V = Mm×n as the set of all matrices on real numbers with m
rows and n columns. That is, any A ∈ Mm×n is in the form
      ⎡ a11 a12 · · · a1n ⎤
      ⎢ a21 a22 · · · a2n ⎥
A =   ⎢  ⋮    ⋮   ⋱    ⋮  ⎥
      ⎣ am1 am2 · · · amn ⎦
Example 4.5. Let V = F (S, R) be the set of all functionals on a set S. For all f, g ∈ F (S, R) and α ∈ R, vector addition and scalar multiplication are defined pointwise by

(f + g)(x) = f (x) + g(x),
(αf )(x) = αf (x).
From the above eight axioms, we can establish a few basic results in vector spaces. Let us label the axioms as A1 to A8. Now, if α is a scalar and 0 is the zero vector, then by A3 and A7,

α0 = α(0 + 0) = α0 + α0.

Adding −(α0) to both sides (using A4) gives

α0 = 0. (4.1)
That is, any scalar multiplication of the zero vector gives you the zero vector. Simi-
larly, we can show that
0x = 0. (4.2)
That is, any vector multiplied by the scalar 0 becomes the zero vector. Observe that

0x = (0 + 0)x = 0x + 0x,

and adding −(0x) to both sides gives (4.2). Next, suppose that αx = 0 for some scalar α ≠ 0. Multiplying both sides by 1/α gives

(1/α)(αx) = 0.
But by A5 and A6,

(1/α)(αx) = ((1/α)α)x = 1x = x.
Therefore we conclude that x = 0.
The reason is

0 = 0x                 (by (4.2))
  = (α − α)x
  = αx + (−α)x.        (using A8)
Proof. By A4 there exists a vector −αx. Adding this vector to both sides of αx = βx gives

αx + (−αx) = βx + (−αx). (4.4)

It follows that (α − β)x = 0, and since x ≠ 0, we must have α = β.
The set of all linear combinations of A is called the linear span or linear hull of A. That is, we define

span(A) = {∑_{i=1}^{n} αi xi ∈ V : αi ∈ R, i = 1, 2, . . . , n}.
Example 4.6. Suppose that V = R3 . Let A = {(1, 0, 0), (0, 3, 0)}. Then span(A) is
the x-y plane in the three-dimensional space. That is,

span(A) = {(x1 , x2 , 0) : x1 , x2 ∈ R}.
Example 4.7. Consider the vector space P (n) of all n-degree polynomials in exam-
ple 4.2. Let B = {f0 , f1 , . . . , fn } be the subset of polynomials defined by
fi (x) = xi , i = 0, 1, . . . , n.
Then span(B) = P (n). In this case we say that the set B spans P (n).
Theorem 4.2. A set S is a subspace of a vector space V if and only if for any
vectors x, y ∈ S and any scalar α, the linear combination αx + y is in S.
S = {αx : α ∈ R}
2. The set P (n) of all n-degree polynomials defined in example 4.2 is a subspace
of the vector space F (S, R) defined in example 4.5 with S = R.
3. Let S be the vector space of all sequences in R defined in example 4.3. Then
the set of all convergent sequences in R is a subspace of S.
Let A and B be any nonempty subsets of a vector space V . Then the Minkowski
sum, or simply the sum of sets A and B is defined as
A + B = {a + b : a ∈ A, b ∈ B}.
Example 4.9. In example 2.9, chapter 2 we define the production set of a firm.
Suppose that an economy has n commodities and l firms. The firms’ production sets
are represented by Yj ⊆ Rn , j = 1, . . . , l. In the analysis of the whole economy, we
can add up the production of all firms into an aggregate firm, with the aggregate
production set
Y = Y1 + Y2 + · · · + Yl = ∑_{j=1}^{l} Yj .
A set of vectors A = {x1 , x2 , . . . , xn } is said to be linearly dependent if there
exist scalars α1 , α2 , . . . , αn , not all zero, such that
α1 x1 + α2 x2 + · · · + αn xn = 0. (4.5)
That is, the zero vector is a linear combination of A with some nonzero scalars.
Otherwise, the set A is called linearly independent.
Alternatively, we can define a set of vectors A = {x1 , x2 , . . . , xn } to be linearly
independent if the condition in equation (4.5) implies that αi = 0 for i = 1, 2, . . . , n.
You can verify the following consequences readily from the definitions:
x = α1 x1 + α2 x2 + · · · + αn xn ,
then
Example 4.10. Consider the vectors x1 = (1, 2) and x2 = (2, 1) in R2 . To check for
linear independence, suppose that αx1 + βx2 = 0. Componentwise this gives
α + 2β = 0,
2α + β = 0,
which gives α = β = 0. Therefore the two vectors are linearly independent. For any
vector (x1 , x2 ) ∈ R2 , we can write
α + 2β = x1 ,
2α + β = x2 .
The two vectors span R2 and therefore form a basis for the vector space.
Example 4.11. The standard basis for Rn consists of the vectors
e1 = (1, 0, 0, . . . , 0),
e2 = (0, 1, 0, . . . , 0),
...
en = (0, 0, 0, . . . , 1).
It is clear from examples 4.10 and 4.11 that the coordinates of a vector change
with the choice of the basis for a vector space. We summarize some important facts
on basis and dimension of a vector space as follows.
1. If B is a finite set of linearly independent vectors that spans the vector space
V , then any other basis for V must have the same cardinality as B. That is,
dim V is a unique number.
4. The set {0} is called the zero subspace of V . But by definition the zero subspace
is not linearly independent. Therefore we define dim{0} = 0.
Example 4.12. You can verify the dimensions of the vector spaces in examples 4.1
to 4.5 by identifying their standard basis:
• dim Rn = n,
• dim P (n) = n + 1,
• S is infinite-dimensional,
• dim Mm×n = m × n,
• F (S, R) is infinite-dimensional.
Recall from example 4.7 that the set B = {f0 , f1 , . . . , fn } of polynomials, where
fi (x) = xi , i = 0, 1, . . . , n,
is a basis for the vector space P (n) of all n-degree polynomials. In this case the order
of the set B is somewhat important because polynomials are expressed in increasing
power of x. For this reason we define an ordered basis as an n-tuple of independent
vectors in the desirable order, B = (f0 , f1 , . . . , fn ). Similarly, the standard ordered
basis for Rn is B = (e1 , e2 , . . . , en ).
Example 4.13. In matrix analysis, vectors in Rn are written as Mn×1 column ma-
trices to conform with matrix multiplication. Let A be an n × n invertible matrix.
The column vectors of A, denoted by a1 , a2 , . . . , an ∈ Rn , form a basis for Rn . This
is because for any vector x in Rn , we can define a vector α = (α1 , α2 , . . . , αn ) such
that, written in matrix form, x = Aα, or
x = α1 a1 + α2 a2 + · · · + αn an .
α = A−1 x. (4.6)
In example 4.10,

A = [ 1  2 ]
    [ 2  1 ].
The readers can verify the values of α and β using equation (4.6).
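As a quick numerical check, equation (4.6) can be evaluated directly. The sketch below uses Python with NumPy (not part of the text); the vector x = (3, 3) is an arbitrary choice for illustration:

```python
import numpy as np

# Columns of A are the basis vectors (1, 2) and (2, 1) from example 4.10.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

x = np.array([3.0, 3.0])      # an arbitrary vector in R^2, chosen for illustration

alpha = np.linalg.inv(A) @ x  # equation (4.6): alpha = A^{-1} x

# x should equal alpha_1 * a_1 + alpha_2 * a_2, i.e. A @ alpha.
reconstructed = A @ alpha
```

Here alpha comes out as (1, 1), since (3, 3) = 1 · (1, 2) + 1 · (2, 1).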
4.3.1 Introduction
A function f from a vector space V into a vector space W is called a linear
transformation if, for all x, y ∈ V and α ∈ R,

f (x + y) = f (x) + f (y)   and   f (αx) = αf (x).

That is, f is additive and linearly homogeneous. If f maps a vector space V into
itself, it is called a linear operator.
The pre-image of the zero vector in W , f −1 (0) is called the kernel or null space
of f . It can be shown that the kernel of f is a subspace of V . The dimension of the
kernel of f is called the nullity of f . Also, the range of f , f (V ) is a subspace of
W . If V is finite-dimensional, the dimension of f (V ) is called the rank of f . The
following theorem relates the rank and nullity of f .
A proof of this important result can be found in Hoffman and Kunze (1971,
71–72). A transformation f is said to have full rank if
The set of all linear transformations from V into W can be denoted by L(V, W ).
For every f, g ∈ L(V, W ) and α ∈ R, define
4.3. LINEAR TRANSFORMATIONS
Let BV = (v1 , . . . , vn ) be an ordered basis for V and

BW = (w1 , . . . , wm )

an ordered basis for W . For each j = 1, . . . , n, write

f (vj ) = a1j w1 + a2j w2 + · · · + amj wm ,

where (a1j , . . . , amj ) are the coordinates of f (vj ) relative to the basis BW . The
mn scalars aij can be arranged in the form of a matrix
A = [ a11  a12  ···  a1n ]
    [ a21  a22  ···  a2n ]
    [  ..   ..        .. ]
    [ am1  am2  ···  amn ]
That is, we write the coordinates of f (vj ) as the j-th column in A. A is called the
matrix representation of f relative to the ordered bases BV and BW . Since BV is
a basis of V , any vector x ∈ V can be expressed as a linear combination
x = x 1 v 1 + · · · + xn v n ,
f (x) = f (x1 v1 + · · · + xn vn )
      = x1 f (v1 ) + · · · + xn f (vn )
      = x1 ∑_{i=1}^{m} ai1 wi + · · · + xn ∑_{i=1}^{m} ain wi
      = ∑_{j=1}^{n} ∑_{i=1}^{m} xj aij wi
      = ∑_{i=1}^{m} ( ∑_{j=1}^{n} aij xj ) wi
      = ∑_{i=1}^{m} yi wi

where in the last line we define yi = ∑_{j=1}^{n} aij xj for i = 1, . . . , m. In matrix form we
have
y = Ax.
[Figure: rotation of the standard basis vectors e1 and e2 through an angle θ to f (e1 ) and f (e2 ).]
where pi is the market price of good i, i = 1, . . . , n. The following are some other
important examples.
Example 4.17. Let [a, b] be an interval in R and let C ([a, b]) be the set of all
continuous functions on [a, b]. Then for every g ∈ C ([a, b]), the integral of g,
L(g) = ∫_a^b g(t) dt

is a linear functional on C ([a, b]).
A = [a1 a2 · · · an ] (4.7)
and
f (x) = a1 x1 + a2 x2 + · · · + an xn ,
where (x1 , . . . , xn ) are the coordinates of x relative to BV . The collection of all linear
functionals on V , L(V, R) is a vector space. This vector space V ∗ is often called the
dual space of V . It is clear from (4.7) that dim V ∗ = dim V = n.
By definition the rank of a linear functional is 1. Therefore by the Dimension
Theorem
nullity f = dim V − rank f = n − 1.
For example,
Hf (π) = {y ∈ Rn : p1 y1 + · · · + pn yn = π} (4.9)
Hf (M ) = {x ∈ Rn : p1 x1 + · · · + pn xn = M }
defines the budget constraint in consumer analysis with prices p = (p1 , . . . , pn ) and
total expenditure M . Notice that Hf (c) is not a subspace of V unless c = 0.
¹ Recall that in functional analysis a hyperplane is the contour of f at c.
4.4. INNER PRODUCT SPACES
1. (x + y)T z = xT z + yT z,
2. (αx)T y = α(xT y),
3. xT y = yT x,
4. xT x ≥ 0, xT x = 0 implies that x = 0.
Example 4.18. When V = Rn , the Euclidean space of dimension n, the dot product
of two vector is defined as
x T y = x1 y 1 + · · · + xn y n ,
Example 4.19. An alternative inner product on R2 can be defined as
xT y = x1 y1 − x2 y1 − x1 y2 + 4x2 y2 .
Example 4.20. Let C ([0, 1]) be the set of all continuous functions on [0, 1]. For
any f, g ∈ C ([0, 1]), an inner product can be defined as
f T g = ∫_0^1 f (t)g(t) dt.
The study of inner products is concerned with concepts of the “length” of a vector
and the “angle” between two vectors. The first idea involves the definition of the
norm of a vector as follows: For all x ∈ V ,
kxk = (xT x)1/2 .
The above definition of course satisfies the three axioms of a normed vector space,
that is, for all x, y ∈ V and α ∈ R,

kxk ≥ 0, with kxk = 0 if and only if x = 0, (4.10)
kαxk = |α| kxk, (4.11)
kx + yk ≤ kxk + kyk. (4.12)

Axioms (4.10) and (4.11) are straightforward to verify. Axiom (4.12) requires the
application of the Cauchy-Schwarz Inequality and will be presented in section 4.5
below.
The following definitions are concerned with vectors being “perpendicular” to
each other. Let x, y ∈ V . The angle θ between x and y is implicitly defined as
cos θ = (xT y) / (kxk kyk).
It follows that θ is π/2 or 90◦ when the inner product of x and y is zero. Formally,
x and y are orthogonal to each other, sometimes written as x ⊥ y, if xT y = 0. A
set of vectors S ⊆ V is called an orthogonal set if all pairs of distinct vectors in S
are orthogonal. Moreover, if every vector in S has norm equal to 1, S is called an
orthonormal set. The orthogonal complement of any set S ⊆ V is defined as the set of
all vectors in V which are orthogonal to every vector in S, that is,

S ⊥ = {x ∈ V : xT y = 0 for all y ∈ S}.
4.5 Inequalities
The Cauchy-Schwarz Inequality states that for all x, y ∈ V , |xT y| ≤ kxk kyk. To
prove it, observe that for any α ∈ R we have
0 ≤ kx − αyk2
= (x − αy)T (x − αy)
= xT (x − αy) − αyT (x − αy)
= xT x − αxT y − αyT x + α2 yT y
= xT x − 2αxT y + α2 yT y.
Setting α = xT y/(yT y) for y ≠ 0 and rearranging gives (xT y)2 ≤ (xT x)(yT y),
which is the desired inequality. For y = 0 the inequality holds trivially.
The Triangle Inequality states that

kx + yk ≤ kxk + kyk.

To prove it, observe that
kx + yk2 = (x + y)T (x + y)
= xT x + xT y + y T x + y T y
= kxk2 + 2xT y + kyk2
≤ kxk2 + 2|xT y| + kyk2
≤ kxk2 + 2kxk kyk + kyk2
= (kxk + kyk)2
and the result follows. Note that in the second inequality above we have applied the
Cauchy-Schwarz Inequality.
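Both inequalities are easy to check numerically. The following Python sketch (NumPy is an assumption, not part of the text) tests them on a batch of random vector pairs:

```python
import numpy as np

rng = np.random.default_rng(0)

cauchy_schwarz_holds = True
triangle_holds = True
for _ in range(100):
    x = rng.normal(size=5)
    y = rng.normal(size=5)
    # Cauchy-Schwarz: |x^T y| <= ||x|| ||y||
    cauchy_schwarz_holds &= abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12
    # Triangle inequality: ||x + y|| <= ||x|| + ||y||
    triangle_holds &= np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y) + 1e-12
```

The small tolerance 1e-12 guards against floating-point rounding in the borderline cases.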
4.6 Determinants
Let L(V ) be the collection of all linear operators on a finite-dimensional vector space
V . Our goal is to find a necessary and sufficient condition for any f in L(V ) to have
an inverse operator. That is, we want to know under what condition f is a bijection.
It turns out that a functional on L(V ) can do the trick. The functional, however,
is more conveniently defined using the matrix representations of L(V ) relative to a
basis BV . Therefore we define a functional D which assigns each n × n matrix A a
real number and satisfies the following properties:
1. D is multilinear (recall section 4.3.4) in the row vectors of A. That is, if we let
xi be the i-th row of A, i = 1, . . . , n so that we can write
D(A) = D(x1 , . . . , xn ),
then
Any functional on L(V ) which satisfies the above three conditions is called a de-
terminant function. It turns out that the determinant function is unique.² The
determinant of A is often written as det A or |A|.
The formula for the determinant of A can be expressed inductively as follows.
The determinant of a 1 × 1 matrix A = (a) is simply a. For n > 1, define the minor,
Aij , of the element aij of A as the determinant of the (n − 1) × (n − 1) matrix
obtained by deleting the i-th row and j-th column of A. The determinant of A is given by
|A| = ∑_{j=1}^{n} (−1)^{i+j} aij Aij . (4.13)
² See Hoffman and Kunze (1971, Chapter 5).
Notice that in (4.13) the row i is not specified. That means we can find the deter-
minant by choosing any row. In fact we can interchange the indices i and j and
evaluate |A| along any column instead.
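A direct implementation of the cofactor expansion (4.13) makes the row-invariance easy to verify. The Python sketch below (NumPy assumed, not part of the text) expands along any chosen row:

```python
import numpy as np

def det_cofactor(A, row=0):
    """Determinant by cofactor expansion (4.13) along the given row (0-indexed)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # Minor A_ij: delete the chosen row and column j.
        minor = np.delete(np.delete(A, row, axis=0), j, axis=1)
        total += (-1) ** (row + j) * A[row, j] * det_cofactor(minor)
    return total
```

Expanding along different rows gives the same value, which also matches np.linalg.det.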
Some important properties of the determinant function are given below:
1. |AB| = |A||B|.
2. |AT | = |A|.
7. Suppose

   A = [ A11  A12 ]
       [ 0    A22 ]

   where A11 and A22 are square matrices and 0 represents a zero matrix with
   conforming dimension. Then |A| = |A11 | |A22 |.
Ax = b (4.14)
4.7. INVERSE TRANSFORMATIONS
where
A = [ a11  a12  ···  a1n ]
    [ a21  a22  ···  a2n ]
    [  ..   ..        .. ]
    [ an1  an2  ···  ann ]
is the matrix representation of f relative to the standard basis of V , and the coor-
dinates of x and b are written as column vectors:
x = (x1 , x2 , . . . , xn )T ,    b = (b1 , b2 , . . . , bn )T .
Now we show that if we replace the row ai1 , ai2 , . . . , ain in (4.16) with another
row k in A, then
∑_{j=1}^{n} akj Cij = 0. (4.17)
³ See exercise 34 at the end of the chapter.
To see this, let B be the matrix obtained from A by replacing the i-th row with the
k-th row. Then B has two identical rows and therefore |B| = 0. The minors of the
i-th row of B, however, are still the same as those of A. That is, Bij = Aij for
j = 1, . . . , n. Now we have

∑_{j=1}^{n} akj Cij = ∑_{j=1}^{n} (−1)^{i+j} akj Aij
                   = ∑_{j=1}^{n} (−1)^{i+j} bij Bij
                   = |B| = 0.
A−1 = (1/|A|)(adj A) = (1/|A|) C T ,

where C is the n × n matrix with elements Cij , the cofactor of aij . If
|A| = 0, then the linear operator f is not a bijection and the inverse operator does
not exist. In this case f and A are called singular.
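The adjugate formula can be implemented directly. A Python sketch (NumPy assumed; np.linalg.det is used for the cofactors):

```python
import numpy as np

def inverse_adjugate(A):
    """Compute A^{-1} = (1/|A|) adj(A) = (1/|A|) C^T, with C the cofactor matrix."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    detA = np.linalg.det(A)
    if np.isclose(detA, 0.0):
        raise ValueError("A is singular; the inverse does not exist")
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)  # cofactor C_ij
    return C.T / detA
```

For the pair of matrices in exercise 36 below, applying inverse_adjugate to the first matrix returns the second.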
2. Replace the i-th column of A by the column vector b. Call this modified matrix
Ai . In other words,
Ai = [ a11  ···  b1  ···  a1n ]
     [ a21  ···  b2  ···  a2n ]
     [  ..        ..       .. ]
     [ an1  ···  bn  ···  ann ]
3. Find |Ai |.
4. The solution is

   xi = |Ai | / |A| .
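The steps above translate directly into code. A Python sketch of Cramer's rule (NumPy assumed, not part of the text):

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b by Cramer's rule: x_i = |A_i| / |A|."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    detA = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                      # step 2: replace the i-th column with b
        x[i] = np.linalg.det(Ai) / detA   # steps 3 and 4
    return x
```

For instance, the system x1 − 2x2 = −8, 5x1 + 3x2 = −1 from the exercises has solution (−2, 3).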
2. (A−1 )−1 = A.
4. |A−1 | = 1/|A|.
6. Suppose

   A = [ A11  0   ]
       [ 0    A22 ]

   where A11 and A22 are square matrices and 0 represents a zero matrix with
   conforming dimension. Then

   A−1 = [ (A11)−1  0       ]
         [ 0        (A22)−1 ].
4.8.1 Definitions
Let f be a linear operator on an n-dimensional vector space V . Then a nonzero
vector x ∈ V is an eigenvector of f if

f (x) = λx

for some scalar λ, which is called the eigenvalue of f associated with x. Notice
that any nonzero scalar multiple of x is also an eigenvector with the same
eigenvalue, since

αf (x) = α(λx)
4.8. EIGENVECTORS AND EIGENVALUES OF LINEAR OPERATORS
or

f (αx) = λ(αx).

In matrix form, the eigenvalue equation can be written as Ax = λx, where A is the
matrix representation of f , so that

(A − λI)x = 0. (4.21)
To normalize x1 we let α = 1/√(2² + 1²) = 1/√5 so that the normalized eigenvector is

x1 = (2/√5, 1/√5)T .

Similarly, for λ2 = 0, the normalized eigenvector is

x2 = (1/√5, −2/√5)T .
We can observe some properties of f from the results. First, the determinant of A is
zero since it has a zero eigenvalue. Second, the rank of f is 1. Third, B = {x1 , x2 }
is an orthonormal basis for R2 and
P = (x1 x2 ) = (1/√5) [ 2   1 ]
                      [ 1  −2 ]
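The matrix A of this example is not reproduced above; a symmetric rank-one candidate consistent with the eigenvectors x1 and x2 (an assumption made here for illustration) is A = [[4, 2], [2, 1]], which has eigenvalues 5 and 0. A Python sketch (NumPy assumed) verifies the spectral decomposition A = P ΛP T :

```python
import numpy as np

# Hypothetical symmetric matrix consistent with the normalized eigenvectors above.
A = np.array([[4.0, 2.0],
              [2.0, 1.0]])

# eigh is NumPy's routine for symmetric matrices; eigenvalues return in ascending order.
eigenvalues, P = np.linalg.eigh(A)

Lam = np.diag(eigenvalues)
A_reconstructed = P @ Lam @ P.T   # spectral decomposition A = P Lambda P^T
```

The columns of P are orthonormal, so P T P = I, and the reconstruction recovers A exactly.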
f (x)T y = xT f (y).
4.9. QUADRATIC FORMS
A = P ΛP T
8. The eigenvalues of A2 (f ◦ f ) are λ1², λ2², . . . , λn², but the eigenvectors are the
same as those of A.
3. All principal minors of A are positive, that is, |Ak | > 0 for k = 1, 2, . . . , n.
Statements 2 and 3 of theorem 4.4 imply that a positive definite matrix is
nonsingular. Testing for positive semi-definiteness is more involved. First let
K = {1, 2, . . . , k} and let Π be the set of all permutations of K. For example, for
k = 3, Π consists of 3! = 3 × 2 × 1 = 6 permutations: π1 = {1, 2, 3}, π2 = {1, 3, 2},
π3 = {2, 1, 3}, π4 = {2, 3, 1}, π5 = {3, 1, 2}, and π6 = {3, 2, 1}. Now denote by Ak^π
the k × k principal matrix of A with the row and column positions rearranged according
to the permutation π ∈ Π. For example, the first two permutations of A3 are:
A3^π1 = [ a11  a12  a13 ]
        [ a21  a22  a23 ]
        [ a31  a32  a33 ]
⁴ Any asymmetric matrix B can be converted into a symmetric matrix by taking A = (1/2)(B + B T ).
The values of the quadratic form will be the same.
and
A3^π2 = [ a11  a13  a12 ]
        [ a31  a33  a32 ]
        [ a21  a23  a22 ].
Theorem 4.5. The following three statements are equivalent:
1. A is positive semi-definite.
1. A is negative definite.
1. A is negative semi-definite.
Notice, however, that the quadratic form Q(x) is negative (semi-)definite if and
only if −Q(x) is positive (semi-)definite. Therefore if A is negative (semi-)definite,
we can apply theorems 4.4 and 4.5 to −A.
The following example illustrates why we need to examine the permutation ma-
trices when checking for semi-definiteness. Suppose that
A = [ 0  0   0 ]
    [ 0  1   0 ]
    [ 0  0  −1 ].
In this case |A1 | = |A2 | = |A| = 0. Therefore A does not satisfy the requirement of a
positive or negative definite matrix. It does, however, satisfy the weak inequality of
condition 3 in theorem 4.5 and condition 3 in theorem 4.7 without any permutation.
We might want to conclude that A is both positive semi-definite and negative semi-
definite. This turns out not to be the case because e2T Ae2 = 1 > 0 and e3T Ae3 =
−1 < 0, so that A is in fact indefinite.⁵ The indefiniteness of A is best seen from its
eigenvalues, which are 0, 1, and −1; these obviously do not satisfy the condition of
any definite matrix. On the other hand, a permutation of A can also reveal its
indefiniteness. Take π4 = {2, 3, 1}. Then |A1^π4 | = 1 and

|A2^π4 | = | 1   0 |
           | 0  −1 | = −1,
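Eigenvalue signs give a convenient computational check of definiteness. A Python sketch (NumPy assumed, not part of the text) classifies a symmetric matrix and confirms that the matrix above is indefinite:

```python
import numpy as np

def definiteness(A):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(A)
    if np.all(lam > 0):
        return "positive definite"
    if np.all(lam < 0):
        return "negative definite"
    if np.all(lam >= 0):
        return "positive semi-definite"
    if np.all(lam <= 0):
        return "negative semi-definite"
    return "indefinite"

A = np.diag([0.0, 1.0, -1.0])   # the example matrix with eigenvalues 0, 1, -1
```

Here definiteness(A) returns "indefinite", in line with the discussion above.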
4.10 Exercises
1. Verify that the sets and operations in examples 4.1 to 4.5 satisfy the eight axioms
of a vector space.
4. Let Mn×n be the vector space of all n × n square matrices. A symmetric matrix
A is a square matrix equal to its transpose, that is, A = AT . In other words, its
elements aij = aji for all i, j = 1, . . . , n. Prove that the set of all n × n symmetric
matrices is a subspace of Mn×n .
(a) Is S ∪ T a subspace of R2 ?
(b) Is S + T a subspace of R2 ?
6. Suppose that A is a nonempty subset of a vector space V . Show that the linear
span of A is the intersection of all subspaces of V which contain A.
⁵ The vector ei , i = 1, . . . , n, is the i-th standard basis vector of Rn . Therefore e2 = (0, 1, 0) in this
example.
7. Determine whether each of the following sets is a subspace of the vector space
R3 . Justify your answers.
9. Prove or disprove: Suppose that S and T are subspaces of a vector space V . Then
S ∪ T is a subspace of V .
10. Prove or disprove: Suppose that S and T are subspaces of a vector space V . Then
S + T is a subspace of V .
12. Suppose that V = M2×2 is the vector space of all 2 × 2 square matrices. Let
A = [ 1  2 ]
    [ 3  4 ].
14. Suppose that the linear operator f on the vector space R3 is defined by
20. Show that each of the examples in section 4.3.3 is a linear functional.
21. Verify that each of the examples in section 4.4 is an inner product space.
24. Let V be a normed vector space. Show that (V, ρ) is a metric space, where for
every x, y ∈ V ,
ρ(x, y) = kx − yk.
25. Let V be a normed vector space. Prove the parallelogram equality: For every
x, y ∈ V ,
kx + yk2 + kx − yk2 = 2(kxk2 + kyk2 ).
(a) reflexive,
(b) transitive,
(c) circular,
(d) symmetric,
(e) asymmetric,
(f) antisymmetric.
27. Suppose that (2x + y) ⊥ (x − 3y) and kxk = 2kyk. Find the angle between x
and y.
30. Give a geometric interpretation of the dot product in R3 . What if one of the
vector is a unit vector?
31. Suppose that L(V ) is the set of all linear operators on an n-dimensional vector
space V . Let D : L(V ) → R be the determinant function. Prove or disprove: D
is a linear functional.
35. Let the matrix representation relative to the standard basis of the linear operator
f : R2 → R2 be

    A = [ 2  3 ]
        [ 3  4 ].
36. Let
2 0 −1 3 −1 1
A = 5 1 0 , B = −15 6 −5 .
0 1 3 5 −2 2
Show that A is the inverse of B.
(b) [ −3   6  −11 ]
    [  3  −4    6 ]
    [  4  −8   13 ]
38. Suppose that f and g are two invertible linear operators on a vector space V .
Prove that
(g ◦ f )−1 (x) = (f −1 ◦ g −1 )(x)
for all x ∈ V .
x1 − 2x2 = −8,
5x1 + 3x2 = −1.
2x1 − x2 + 2x3 = 2
x1 + 10x2 − 3x3 = 5
−x1 + x2 + x3 = −3.
44. Find the eigenvalues and normalized eigenvectors of the following matrices:
(i) A = [ 3   4 ] , (ii) A = [ 1  1 ] , (iii) A = [ −5   2 ] , (iv) A = [ 1  2 ] .
        [ 4  −3 ]            [ 4  1 ]             [  2  −2 ]            [ 2  1 ]
Show that ∑_{i=1}^{n} λi = ∑_{i=1}^{n} aii , P T P = I, A = P ΛP T , and |Λ| = |A|.
47. Prove statement 1 in section 4.8.2. Hint: Suppose that A has a complex eigenvalue
λ + iβ, where i = √−1, with the corresponding eigenvector x + iy. Then
Show that β = 0.
48. Prove that for distinct eigenvalues λ1 and λ2 of a symmetric operator A, the
corresponding eigenvectors x1 and x2 are orthogonal.
49. Prove the Spectral Theorem in section 4.8.2. Hint: Recall that columns in P are
pairwise orthonormal, that is,

    xiT xj = 1 if i = j, and 0 otherwise.
50. Prove statement 6 in section 4.8.2. Hint: first show that |P | = ±1 and use the
Spectral Theorem.
and
M2 = In − X(X T X)−1 X T
54. Let
0 0 0 1 0 0 −3 0 0
A = 0 −3 0 , B = 0 3 0 , C = 0 −1 0 .
0 0 2 0 0 0 0 0 −2
56. Let A be a positive definite matrix. Show that A = QQT where Q is an invertible
matrix.
57. Let A be a positive definite matrix. Prove that A−1 is also positive definite.
xT f (y) = (1/4) Q(x + y) − (1/4) Q(x − y).
Chapter 5
Vector Calculus
5.1.1 Introduction
A function f : R → R is said to be differentiable at x if there exists a number
f ′(x) such that

lim_{h→0} [ f (x + h) − f (x) − f ′(x)h ] / h = 0. (5.1)
If the limit exists at every point in a set E ⊆ R, we say that f is differentiable on E.
Differentiability implies that the function is continuous on E, but the converse is not
true. There are many examples of continuous functions which are not differentiable.
which gives f ′(x) = 2x. For any particular point, say x = −1, the derivative
f ′(−1) = −2. The last term in the numerator of the limit in (5.1) is f ′(x)h = −2h,
which is a linear function in h. Geometrically, f ′(x) is the slope of the function f (x)
at the point x.
ρ(x, y) = kx − yk,
5.1. VECTOR DIFFERENTIATION
It is clear that the limit is zero if Df (x) = A for all x ∈ Rn . If m = n, then Df (x)
is a square matrix and the Jacobian is Jf (x) = |A|, the determinant of A. For
m = n = 1, the function reduces to the single variable case that f (x) = ax so that
f ′(x) = a.
6. Product rule: Suppose that f and g are functionals on S and define f g(x) =
f (x)g(x). Then
D(f g)(x) = g(x)Df (x) + f (x)Dg(x).
Next we define the partial derivatives of f . Suppose that f maps an open set
S ⊆ Rn into Rm . Let (e1 , e2 , . . . , en ) and (u1 , u2 , . . . , um ) be the standard bases
for Rn and Rm respectively. Then f can be expressed as its component functionals
relative to (u1 , u2 , . . . , um ):
f (x) = (f1 (x), f2 (x), . . . , fm (x))T ,
¹ See Rudin (1976, p. 213) for a proof.
See Rudin (1976, p. 215–6) for a proof of this theorem. Consequently, the matrix
representation of Df (x) relative to the above standard bases is
[ ∂f1 (x)/∂x1   ∂f1 (x)/∂x2   ···   ∂f1 (x)/∂xn ]
[ ∂f2 (x)/∂x1   ∂f2 (x)/∂x2   ···   ∂f2 (x)/∂xn ]
[      ..            ..                  ..      ]
[ ∂fm (x)/∂x1   ∂fm (x)/∂x2   ···   ∂fm (x)/∂xn ]
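When the partial derivatives are tedious to obtain analytically, the matrix of partials can be approximated numerically. A Python sketch using central differences (NumPy and the illustrative map f are assumptions, not from the text):

```python
import numpy as np

def jacobian_fd(f, x, h=1e-6):
    """Approximate the m x n matrix of partials [df_i/dx_j] by central differences."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x))
    J = np.empty((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (np.asarray(f(x + e)) - np.asarray(f(x - e))) / (2 * h)
    return J

def f(x):
    # An illustrative map from R^2 to R^2 (not from the text).
    return np.array([x[0] ** 2 + x[1], x[0] * x[1]])
```

At x = (1, 2) the true matrix of partials is [[2, 1], [2, 1]], which the approximation recovers to high accuracy.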
Notice that ∇f (x) is a linear functional for any particular value of x. For example,
∂²f (x)/∂xi ∂xj = ∂²f (x)/∂xj ∂xi .
Table 5.1 lists the gradients and Hessians of some commonly used functionals. The
square matrix yyT in example 4 is called the outer product of vector y. In ex-
ample 5, if A is symmetric, the gradient and Hessian of xT Ax are 2Ax and 2A
respectively. Also, if A is the identity matrix, the results reduce to those in exam-
ple 3.
Geometrically, the partial derivative of a functional f with respect to the standard
basis, ∂f (x)/∂xj , is the rate of increase of f (x) at x in the direction of the j-th
coordinate. The definition can be modified to find the rate of increase in an arbitrary
direction. Suppose u ∈ Rn is a unit vector, with kuk = 1. Then the directional
derivative of f at x in the direction of u is defined as
Du f (x) = lim_{t→0} [ f (x + tu) − f (x) ] / t.
Du f (x) = ∇f (x)T u.
It is clear that Du f (x) achieves its maximum value when u has the same direction
as ∇f (x). In other words, the gradient points to the direction in which the value
of f is increasing at the fastest rate, with the rate of increase given by k∇f (x)k.
Figure 5.1 illustrates the relationship between the gradient and the graph of f in the
case of n = 2. Notice that ∇f (x0 , y0 ) is orthogonal to the tangent line to the level
curve at (x0 , y0 ).
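These facts are straightforward to check numerically. In the Python sketch below (NumPy assumed; the functional is an illustrative choice, not from the text), the directional derivative is computed as ∇f (x)T u and is largest in the direction of the gradient:

```python
import numpy as np

def grad_fd(f, x, h=1e-6):
    """Central-difference approximation of the gradient of a functional f at x."""
    x = np.asarray(x, dtype=float)
    g = np.empty_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        g[j] = (f(x + e) - f(x - e)) / (2 * h)
    return g

f = lambda x: x[0] ** 2 + 3 * x[1]         # illustrative functional on R^2
x0 = np.array([1.0, 1.0])
g = grad_fd(f, x0)                         # gradient, approximately (2, 3)

u = np.array([3.0, 4.0]) / 5.0             # a unit vector
D_u = g @ u                                # directional derivative D_u f(x0)
D_steepest = g @ (g / np.linalg.norm(g))   # along the gradient: equals ||g||
```

The rate of increase along the gradient direction equals k∇f (x)k and dominates the rate in any other direction.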
Theorem 5.2 (Mean Value Theorem). Suppose that f is a differentiable functional
² See exercise 6 at the end of the chapter.
5.2. APPROXIMATION OF FUNCTIONS
[Figure: Taylor polynomial approximations P1 (x), P3 (x), P5 (x), and P7 (x) of sin x.]
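The Taylor polynomials in the figure can be generated directly. A sketch in Python (standard library only; the function name is mine, not the text's):

```python
import math

def taylor_sin(x, order):
    """Taylor polynomial of sin about 0, including terms up to the given order."""
    total, k = 0.0, 0
    while 2 * k + 1 <= order:
        total += (-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
        k += 1
    return total
```

Near the origin P7 is already very close to sin x, while the approximation deteriorates as |x| grows, as the figure suggests.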
f (x) = f (x0 ) + ∇f (x0 )T (x − x0 ) + (1/2)(x − x0 )T ∇2 f (x0 )(x − x0 ) + r(x0 , x),
where limx→x0 r(x0 , x) = 0.
f (x) = α0 + αT x + (1/2) xT Ax, (5.3)
[Figure 5.3: The function log(1 + x) and the 45-degree line.]
parameters to be estimated from the regression model. The total number of unknown
parameters is 1+n+n(n+1)/2 and so if n is big the regression needs a large number
of observations. Another popular flexible functional form is the translog function,
defined as
log f (x) = β0 + β T log x + (1/2)(log x)T B log x,
where log x = (log x1 log x2 · · · log xn )T , and β0 ∈ R, β ∈ Rn and B = B T , an n × n
symmetric matrix, are again the unknown parameters.
The function f (x) = log(1+x) is tangent to the 45 degree line at x = 0 (See figure 5.3
and exercise 12 below). Therefore for small values of x, a useful approximation is
log(1 + x) ' x.
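A quick numerical check of this approximation (plain Python, not from the text):

```python
import math

# The error of log(1 + x) ~ x shrinks rapidly as x gets small.
errors = {x: abs(math.log(1 + x) - x) for x in (0.10, 0.01)}
```

At x = 0.01 the error is below 10^-4, which is why log differences are routinely used as growth or inflation rates in practice.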
For example, let Pt be the price level in period t and let pt = log Pt . Then
That is, the log difference of price levels between the two periods is approximately
equal to the inflation rate πt . Similarly, the log difference of output levels between
two periods can be used as the growth rate in practice.
On the other hand, for x around 1, the approximation can be modified to become
Figure 5.4 depicts ex and its log-linear approximation at x∗ = 1. See Clausen (2006)
for more discussions on the topic. See Wickens (2011, p. 506) for an application of
the log-linear approximation in solving a real business cycle model.
5.3. INVERSE FUNCTIONS
[Figure 5.4: The function ex and its log-linear approximation e(1 + log x) at x∗ = 1.]
with Jacobian Jf (x) = x1² − x2². Therefore f is not invertible when x1 = ±x2 .
Put x = (1, 0) and let y = f (x). Then by (5.6)
Df −1 (y) = [Df (x)]−1 = [ 1  0 ]−1 = [ 1  0 ]
                         [ 0  1 ]     [ 0  1 ]
x = g(θ), (5.7)
f (g(φ), φ) = 0 for all φ ∈ W , (5.8)
Dg(θ) = −[Dx f (x, θ)]−1 Dθ f (x, θ). (5.9)
Equation (5.7) means that the solution of the endogenous variable x exists as a
function of the exogenous variable θ in the neighbourhood W . Equation (5.8) ensures
that the solutions g(φ) for all φ ∈ W satisfy the system of equations f (x, θ) = 0.
And finally, although an explicit solution of the function g may be difficult to obtain,
we can get the derivative of g by equation (5.9). The elements of the matrix Dg(θ)
are the rates of change of the endogenous variables in x with respect to the exogenous
variables in θ.
Example 5.5. We look at the simple case that f is a linear transformation from
Rn+m to Rn . Then f can be represented by a matrix relative to the standard basis.
We are interested in all the vectors (x, θ) that belong to the kernel of f . That is,
" #
h i x
f (x, θ) = A B = 0 ∈ Rn ,
θ
Ax + Bθ = 0.
If A is invertible, then
x = g(θ) = −A−1 Bθ,
which verifies equation (5.8). Finally, the rate of change of x with respect to θ is
Dg(θ) = −A−1 B.
This is equation (5.9) with the facts that Dx f (x, θ) = A and Dθ f (x, θ) = B.
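A numerical illustration of this linear case (Python with NumPy; the particular coefficient matrices are hypothetical, chosen only so that A is invertible):

```python
import numpy as np

# Hypothetical coefficient matrices for the linear system A x + B theta = 0,
# with n = 2 endogenous and m = 3 exogenous variables.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])

Dg = -np.linalg.inv(A) @ B    # equation (5.9): Dg(theta) = -A^{-1} B

theta = np.array([1.0, 2.0, 3.0])
x = Dg @ theta                # x = g(theta) = -A^{-1} B theta

residual = A @ x + B @ theta  # equation (5.8): should be the zero vector
```

The residual is zero (up to rounding), confirming that g(θ) = −A−1 Bθ solves the system for every θ.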
Example 5.6. Consider the following Keynesian model for the aggregate economy:
The first equation is often called the IS equation and the second LM equation. In this
model, Y (output) and r (interest rate) are endogenous and P (price), G (government
expenditure), t (tax rate), and M (money supply) are exogenous. C, I, and L are
functions for consumption, investment, and money demand respectively. In the
context of theorem 5.6, we have n = 2, m = 4, with x = [Y r]T and θ = [t G M P ]T .
The function f is
f (x, θ) = [ Y − C[(1 − t)Y ] − I(r, Y ) − G ]
           [ M/P − L(Y, r)                  ] .
In order for the model to work and to have an implicit solution, Dx f (x, θ) must be
invertible, that is, the Jacobian cannot be zero. In this IS-LM model, we normally
assume that 1 − C 0 [·](1 − t) − ∂I/∂Y is a positive fraction, ∂I/∂r < 0, ∂L/∂Y > 0,
and ∂L/∂r < 0. Therefore the Jacobian is
Using (5.9) in Theorem 5.6 we can obtain Dg(θ) using (5.10) and (5.11), which tells
the effects of changes in t, G, M , and P on output Y and interest rate r. If the purpose
of the analysis is to study the impact of one exogenous variable on one endogenous
variable only, then we can use Cramer’s rule instead of inverting Dx f (x, θ). For
example, to find the impact of a tax increase on output, we replace the first column
of Dx f (x, θ) with the first column of Dθ f (x, θ), divide the determinant with J, and
add a negative sign:
∂Y /∂t = −(1/J) | C ′ [·]Y   −∂I/∂r |
                | 0          −∂L/∂r |
       = C ′ [·]Y (∂L/∂r) / J < 0.
5.4 Exercises
1. Let f be a linear operator on Rn . Find the derivative of f by using the partial
derivatives in theorem 5.1.
4. Find the Hessian ∇2 f (x) of the function f : R2++ → R at x = (2, 1) such that
f (x) = √x1 + x2 .
f (x, y) = x + 2 log y
f (x, y) = 1/(1 + x2 + y 2 ).
(a) Calculate the directional derivative of f at the point (1, 0) in the direction
of the vector v = (4, 3).
(b) In which direction the value of f increases at the fastest rate? Explain.
11. Find the second order Taylor formula for the following functions about the given
point x0 in R2 .
12. Find the first-order and second-order Taylor approximations of the following func-
tions at the indicated points:
β [U ′ (ct+1 )/U ′ (ct )] [F ′ (kt+1 ) + 1 − δ] = 1,
where U is a utility function, F is a production function, and β and δ are given pa-
rameters. Let x = [ct+1 ct kt+1 ]T . Show that the first-order Taylor approximation
of the Euler equation about the steady-state values x∗ = [c∗ c∗ k ∗ ]T is
β { F ′ (k ∗ ) + 1 − δ + [U ′′ (c∗ )/U ′ (c∗ )][F ′ (k ∗ ) + 1 − δ](ct+1 − ct ) + F ′′ (k ∗ )(kt+1 − k ∗ ) } ' 1.
ŷt = (c∗ /y ∗ ) ĉt + (g ∗ /y ∗ ) ĝt .
where D is an n × n diagonal matrix with diagonal elements x∗1 , x∗2 , . . . , x∗n and
Find the set of points that the inverse of f does not exist.
(a) Show that Df (x) is invertible at any point of R2 . Thus every point of R2
has a neighbourhood in which f is a bijection.
(b) Find an explicit formula for f −1 . Derive Df −1 (x), and verify the inverse
function theorem.
The Hénon map exhibits a number of interesting properties in the study of com-
plex dynamics. See Devaney (2003) for details.
xy⁵ + yu⁵ + zv⁵ = 1
x⁵y + y⁵u + z⁵v = 1
where Y (output) and r (interest rate) are endogenous and P (price), G (govern-
ment expenditure), t (tax rate) , M (money supply), and B (government bonds)
are exogenous. C, I, and L are functions for consumption, investment, and money
demand respectively. Using Cramer’s rule, find the following partial derivatives
in studying the comparative statics of the model.
(a) ∂Y /∂G,
(b) ∂r/∂M ,
(c) ∂Y /∂B.
What assumptions do you have to make in order for the model to work?
22. Suppose a consumer wants to maximize the following function with respect to
consumption ct and ct+1 :
with x = ct+1 and θ = ct , apply the implicit function theorem to find the
marginal rate of time preference, dct+1 /dct .
(b) Find the slope dct+1 /dct of the intertemporal budget constraint (5.13) as
well.
(c) Combine your results to get the Euler equation:
(1 + r) βU ′ (ct+1 )/U ′ (ct ) = 1. (5.14)
The Euler equation (5.14) is the cornerstone of the dynamic general equilibrium
model used in macroeconomic analysis.
23. (Wickens, 2011, chapter 8) In the transaction cost approach to money demand,
households incur a transaction cost in consumption. The following two conditions
are satisfied in the steady state:
c + πm + T (c, m) − x − θb = 0, (5.15)
Tm (c, m) + R = 0, (5.16)
where Tc means ∂T /∂c, and Tmc means ∂ 2 T /∂m∂c, etc. Equations (5.15) and
(5.16) can be written as f (x, y) = 0, where f : R2 × R3 → R2 , x = (c, m)T is the
Chapter 6
Convex Analysis
Models in economic analysis often assume sets or functions are convex. These con-
vexity properties make solutions of optimization problem analytically convenient. In
this chapter we discuss various concepts of convexity in Euclidean spaces and their
implications. The readers can consult Rockafellar (1970) for more details.
Let x and y be two linearly independent vectors in a vector space V . The set of all
linear combinations of x and y whose coefficients sum to one,
M = {αx + (1 − α)y : α ∈ R}
is called an affine set and is no longer two dimensional or a subspace. To see this,
we can express a vector z ∈ M as
z = αx + (1 − α)y = y + α(x − y)
for some α ∈ R. Geometrically, x − y is the vector going from the head of y towards
the head of x. Therefore z is a vector on the straight line passing through the heads
of x and y. Since this line in general does not include the zero vector, M is not a
subspace of V . A vector in M is called an affine combination of x and y.
Now we further restrict the value of α to be between 0 and 1. Then the set
C = {αx + (1 − α)y : 0 ≤ α ≤ 1}
is called the convex combinations of x and y. It is clear that this is the set of
vectors lying between the heads of x and y. The idea can be extended to more than
two vectors. Let A = {x1 , x2 , . . . , xn } be a set of vectors in V , then the convex
combinations of A is defined as the set
C = { Σ_{i=1}^{n} α_i x_i : α_i ≥ 0, i = 1, . . . , n, Σ_{i=1}^{n} α_i = 1 }.    (6.1)
The convex hull of A, denoted by conv(A), is the smallest convex set in V that contains A. In other words, conv(A) is the intersection of all convex sets that contain A. When A is a finite set of, say, n vectors, conv(A) is the set defined in equation (6.1).
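The definition in equation (6.1) can be illustrated numerically: random convex combinations of a finite set of points never leave the convex hull of those points. The sketch below uses the triangle spanned by (0, 0), (1, 0), (0, 1), for which hull membership has a simple closed form.

```python
# Sketch of equation (6.1): random convex combinations of the points
# (0,0), (1,0), (0,1) always land in the triangle they span, i.e. the
# set {(x, y) : x >= 0, y >= 0, x + y <= 1}.
import random

vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

def convex_combination(points):
    """Draw random weights a_i >= 0 with sum 1 and mix the points."""
    raw = [random.random() for _ in points]
    total = sum(raw)
    alphas = [w / total for w in raw]          # a_i >= 0, sum a_i = 1
    x = sum(a * p[0] for a, p in zip(alphas, points))
    y = sum(a * p[1] for a, p in zip(alphas, points))
    return x, y

random.seed(0)
for _ in range(1000):
    x, y = convex_combination(vertices)
    assert x >= 0 and y >= 0 and x + y <= 1 + 1e-12
```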
Example 6.1. Let B = (e1 , e2 , . . . , en ) be the standard basis of Rn . Then the convex
hull of B is called a standard simplex in Rn .
Theorem 6.1. Let A and B be subsets of a vector space V . Then the convex hull of their Minkowski sum is the Minkowski sum of their convex hulls. That is,
conv(A + B) = conv(A) + conv(B).
[Figure: two vectors x and y; the convex combinations of x and y form the line segment joining their heads.]
the theorem applies to any finite number of sets, that is, for any sets A_1, A_2, . . . , A_n,
conv(Σ_{i=1}^{n} A_i) = Σ_{i=1}^{n} conv(A_i).    (6.2)
n converges to the sum in equation (6.2). See Starr (2008) for a formal statement of
the theorem.
The Shapley-Folkman theorem implies that even though the production sets Y_i of individual firms are not convex, the aggregate production set Y can be assumed to be convex, making the analysis more tractable.
The empty set ∅ and V are trivial examples of affine sets, and so is the set {x} which contains any single vector x ∈ V. In general, affine sets are "linear" sets such as a straight line or a plane in a Euclidean space.
Theorem 6.2. An affine set which contains the zero vector is a subspace.
Proof. It is clear that if S ⊆ V is a subspace then it is an affine set containing the zero vector. Conversely, let M be an affine set which contains 0. For M to be a subspace,
we need to show that it is closed under addition and scalar multiplication. Let
x, y ∈ M . Then for any α ∈ R,
αx = αx + (1 − α)0 ∈ M.
Also,
(1/2)(x + y) = (1/2)x + (1 − 1/2)y ∈ M,
so that
x + y = 2 · (1/2)(x + y) ∈ M
as required.
Given an affine set M and a vector a ∈ V, define
M + a = {x + a : x ∈ M}.
It is easy to verify that M + a is also an affine set. In fact we say an affine set L is
parallel to an affine set M if L = M + a for some a ∈ V . Theorem 6.2 implies that
every non-empty affine set M is parallel to a unique subspace S, given by
S = {x − a : x, a ∈ M }.
S^⊥ = {x ∈ V : b^T x = 0}.    (6.3)
S^⊥ + a = {x + a : b^T x = 0}
        = {y : b^T (y − a) = 0}    (6.4)
        = {y : b^T y = c},    (6.5)
where c = b^T a.
H^+ = {x ∈ V : b^T x ≥ c} and H^− = {x ∈ V : b^T x ≤ c}.
All half-spaces are convex sets. If c = 0, then the half-space becomes a convex cone. Obviously, H^+ ∩ H^− is a hyperplane. Half-spaces are useful in defining feasible sets in economics. For example, the budget set B of a consumer can be represented by the intersection of n + 1 half-spaces:
B = {x : p^T x ≤ M} ∩ {x : e_1^T x ≥ 0} ∩ · · · ∩ {x : e_n^T x ≥ 0},
where p is the market price vector of the n goods and M is the income of the consumer. Notice that the intersection of any number of half-spaces is a convex set.
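A minimal sketch of the budget set as an intersection of half-spaces; the prices and income below are made-up numbers for illustration.

```python
# The budget set B as an intersection of n + 1 half-spaces:
# p^T x <= M together with x_i >= 0 for each good i.
p = [2.0, 5.0, 1.0]   # illustrative market prices of n = 3 goods
M = 20.0              # illustrative income

def in_budget_set(x, p, M):
    """True iff x satisfies p^T x <= M and every x_i >= 0."""
    if any(xi < 0 for xi in x):          # the half-spaces e_i^T x >= 0
        return False
    return sum(pi * xi for pi, xi in zip(p, x)) <= M   # p^T x <= M

print(in_budget_set([1.0, 2.0, 3.0], p, M))   # 2+10+3 = 15 <= 20 -> True
print(in_budget_set([5.0, 3.0, 0.0], p, M))   # 10+15 = 25 > 20  -> False
print(in_budget_set([-1.0, 0.0, 0.0], p, M))  # negative quantity -> False
```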
It is obvious that the intersection of epi f and hypo f is the graph of f. We call f a convex function if epi f is a convex subset of V × R. The definition implies that the domain S of f is a convex set in V. If hypo f is a convex subset of V × R, then f is called a concave function. A function is called an affine function if it is both convex and concave. It is straightforward to show that f is convex if and only if −f is concave. An important characterization of convex functions is as follows.
Proof. Suppose that f is a convex function on S. By definition (x, f (x)) and (z, f (z))
are in epi f . Since epi f is convex, for 0 < α < 1,
((1 − α)x + αz, (1 − α)f (x) + αf (z)) = (1 − α)(x, f (x)) + α(z, f (z)) ∈ epi f,
which implies (6.6). The converse is straightforward and is left as an exercise.
f (x) = aT x + b
where a ∈ V and b ∈ R.
Example 6.6. The following are some examples of convex functions of a single variable, each of which can be characterized by f″(x) ≥ 0.
1. f(x) = e^{ax}, a ∈ R;
Setting α_i = 1/m for i = 1, . . . , m, the inequality in (6.7) shows that the arithmetic mean of m positive numbers is bounded below by their geometric mean.
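The arithmetic mean-geometric mean inequality just derived is easy to spot-check numerically:

```python
# AM-GM from (6.7) with alpha_i = 1/m: for positive numbers, the
# arithmetic mean is bounded below by the geometric mean.
import math
import random

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    return math.prod(xs) ** (1.0 / len(xs))

random.seed(0)
for _ in range(1000):
    xs = [random.uniform(0.1, 10.0) for _ in range(5)]
    assert arithmetic_mean(xs) >= geometric_mean(xs) - 1e-12

# Equality holds exactly when all the numbers coincide.
assert math.isclose(arithmetic_mean([3, 3, 3]), geometric_mean([3, 3, 3]))
```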
New convex functions can be constructed using simple convex functions. The
following results are useful for this purpose and for recognizing convex functions.
1. Suppose f is a convex function on a convex set S in a vector space V and
g : R → R is an increasing and convex function. Then g ◦ f is also a convex
function. For example, if f is convex, then h(x) = ef (x) is convex.
1. Function f is concave.
2. Function −f is convex.
5. For every x, y ∈ S,
6. For every x, x′ ∈ S,
Quasi-convex and quasi-concave functions are less restrictive than convex and con-
cave functions. Some properties of these functions are listed below. More can be
found in Carter (2001, p. 336–342).
2. Function −f is quasi-convex.
Figure 6.3 illustrates the classification of curvatures for the simple case of sin-
gle variable functions. Parts (a) and (b) are strictly concave functions but (b) is
monotone (increasing in this case). Part (c) is an affine function and therefore both
concave and convex. Parts (d) and (e) are strictly convex with (d) being mono-
tone. All except (e), however, are quasi-concave functions. Similarly, all except (a)
are quasi-convex functions. Part (f) is neither convex nor concave but since it is
monotone it is both quasi-convex and quasi-concave.
[Figure 6.3: six panels (a)-(f), each plotting y against x, illustrating the curvature classes discussed above.]
1. x > y if x ≥ y and x ≠ y,
2. x ≫ y if x_i > y_i for i = 1, . . . , n.
For example, in consumer analysis, the non-satiation axiom implies that the utility
function U of a consumer is increasing. That is, if x and y are two consumption
bundles and x ≥ y, then U (x) ≥ U (y).
f(αx) = α^k f(x).
f(x, y) = ax^β y^γ
is homogeneous of degree β + γ.
A functional f on a convex cone S is called homothetic if, for every α > 0 and
x, z ∈ S,
f (x) = f (z)
implies that
f (αx) = f (αz).
All linear functions are linearly homogeneous but the converse is not true. For example, the Cobb-Douglas function f(x) = Π_i x_i^{β_i}, with β_i > 0 and Σ_i β_i = 1, is linearly homogeneous but not linear. Nevertheless, linearly homogeneous functions have properties that closely resemble those of linear functions. The following result illustrates this point.
1. f(x) = ∇f(x)^T x,
2. ∇²f(x)x = 0.
Setting α = 1 gives the first result. Now differentiating both sides of (6.8) with respect to x gives
α∇f(αx) = α∇f(x),
¹ In the context of consumer theory, y is replaced by a given utility level u and C is often called an expenditure function.
or
∇f(αx) = ∇f(x).
Differentiating both sides with respect to α then gives
∇²f(αx)x = 0.
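Euler's theorem can be checked numerically for a concrete linearly homogeneous function. The sketch below uses a Cobb-Douglas example with illustrative exponents and a central-difference approximation of the gradient.

```python
# Numerical illustration of Euler's theorem for a linearly homogeneous
# function: the Cobb-Douglas f(x1, x2) = x1^0.3 * x2^0.7 satisfies
# f(x) = grad f(x)^T x. The gradient is approximated numerically.

def f(x1, x2):
    return x1 ** 0.3 * x2 ** 0.7

def grad(f, x1, x2, h=1e-6):
    """Central-difference approximation of the gradient."""
    g1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    g2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    return g1, g2

x1, x2 = 2.0, 3.0
g1, g2 = grad(f, x1, x2)
print(f(x1, x2))           # value of f at (2, 3)
print(g1 * x1 + g2 * x2)   # grad f(x)^T x: matches f(x) up to O(h^2)
```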
Applying the first part of Euler's theorem to the cost function, we have C(p, y) = ∇_p C(p, y)^T p, and the second part gives ∇²_p C(p, y)p = 0, or
D_p h(p, y)p = 0.
In component form,
Σ_{j=1}^{n} [∂h_i(p, y)/∂p_j] p_j = 0    (6.10)
for i = 1, . . . , n. Since each Hicksian demand function is decreasing in its own price, i.e., ∂h_i(p, y)/∂p_i < 0, equation (6.10) implies that at least one of the ∂h_i(p, y)/∂p_j must be positive. In other words, at least one of the other n − 1 inputs is a substitute for input i. In the case of n = 2, the two inputs normally are substitutes of each other.²
² An exception is the Leontief production function, f(x) = min{x_1/a_1, . . . , x_n/a_n}, where a_1, . . . , a_n > 0. In this case ∇²_p C(p, y) is a zero matrix. Note that f is not differentiable in this case.
f(x) ≥ c or f(x) ≤ c
and there exists s ∈ H_f(c) ∩ S. For example, in production theory, the cost minimization problem involves finding the supporting hyperplane of the upper contour
set of the production function that is orthogonal to the input price vector p. The
following result is a form of separating hyperplane theorem.
Theorem 6.8. Suppose S is a closed convex subset of a normed vector space V and z ∉ S. There exists a vector p ∈ V and a vector s ∈ S such that for every x ∈ S,
p^T x ≥ p^T s > p^T z.    (6.11)
H_f(c) = {y : p^T (y − s) = 0}
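Theorem 6.8 can be illustrated for a concrete set. Below, S is the closed unit disk and z a point outside it; taking s as the projection of z onto S and p = s − z (one standard construction, not the only possible choice of p) yields the inequalities in (6.11).

```python
# A concrete check of Theorem 6.8 for the closed unit disk
# S = {x : ||x|| <= 1} and a point z outside it. With s the projection
# of z onto S and p = s - z, we get p^T x >= p^T s > p^T z for x in S.
import math
import random

z = (3.0, 4.0)                            # ||z|| = 5, so z is not in S
norm_z = math.hypot(*z)
s = (z[0] / norm_z, z[1] / norm_z)        # projection of z onto the disk
p = (s[0] - z[0], s[1] - z[1])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

assert dot(p, s) > dot(p, z)              # strict part of (6.11)

random.seed(0)
for _ in range(1000):                     # sample points x in S
    r, t = random.random(), random.uniform(0, 2 * math.pi)
    x = (r * math.cos(t), r * math.sin(t))
    assert dot(p, x) >= dot(p, s) - 1e-9  # p^T x >= p^T s
```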
3. Two closed convex sets which intersect at a single point only have a common
separating hyperplane which is also a supporting hyperplane of both sets.
6.7 Exercises
1. Show that a set A in a vector space is convex if and only if A = conv(A).
5. Show that the orthogonal projection of a convex set on a subspace is also convex.
6. Let F (S, R) be the set of all functionals on a metric space S. Let C(S, R) be the
subset of continuous functions in F (S, R). Prove or disprove: C(S, R) is a convex
set in F (S, R).
7. Let L(V ) be the set of all linear operators on a finite-dimensional vector space
V . Prove or disprove: The set of all invertible operators in L(V ) is a convex set.
12. Suppose S and T are convex cones in a vector space V . Show that S + T is also
a convex cone.
13. Let F (X) be the set of all functionals on a set X. Show that the set of all
increasing functionals on X is a cone in F (X).
15. The conic hull of a set A in a vector space V is the smallest cone in V that
contains A, and is denoted by cone(A).
17. Let a and b be two distinct vectors in a vector space V . Show that
A = {x : aT x ≤ 0} ∩ {x : bT x ≤ 0}
is a convex cone.
18. List the counterparts of theorem 6.5 for a convex function.
20. Let f : R^n_+ → R_+ be an increasing and concave function. Show that if there exists a z ≫ 0 such that f(z) = 0, then f(x) = 0 for all x ∈ R^n_+. (Hint: First show that for all α ≥ 0, f(αz) = 0. Then for any x ∈ R^n_+, we can choose a large enough α such that αz ≫ x.)
f (x + z) ≤ f (x) + f (z).
kf(x) = ∇f(x)^T x.
S = {(x_1, x_2) ∈ R² : x_1x_2 ≥ 1, x_1 ≤ 0, x_2 ≤ 0}
and the origin. Find the supporting hyperplane of S that is orthogonal to (1, 1).
f(x, y) = y − x².
(a) Let S = ≿_f(0) = {(x, y) : f(x, y) ≥ 0} be the upper contour set of the value 0 induced by f (see section 2.5). Is S a convex set? Explain.
(b) Find two supporting hyperplanes of S which pass through the point z =
(−3, 8).
27. Let L(V) be the set of linear operators on an n-dimensional vector space V. Show that the determinant function D : L(V) → R is homogeneous of degree n.
28. Suppose that f is a linear transformation from a vector space V to a vector space
W . Show that the graph of f is a subspace in the product set V × W .
Chapter 7
Optimization
max_x f(x, θ)
Let us ignore the parameter θ for now. Suppose that x∗ is a local maximum of f .
That is, there exists a neighbourhood B containing x∗ such that for every x ∈ G∩B,
or
∇f (x∗ )T (x − x∗ ) ≤ 0.
Notice that the expression on the left hand side of the above inequality is the direc-
tional derivative of f at x∗ in the direction of (x − x∗ ) multiplied by its length. If
x∗ is an interior point of G, then the inequality becomes an equality. Otherwise if
there exists an x such that
∇f (x∗ )T (x − x∗ ) < 0,
Any point that satisfies (7.3) is called a stationary point of f . In the special
case that the control variable is constrained to Rn+ , the necessary condition can be
summarized by the following complementary slackness condition:
∇f (x∗ ) ≤ 0, x∗ ≥ 0, ∇f (x∗ )T x∗ = 0.
The above example shows that we need to find a sufficient condition for a local maximum. It turns out that such a condition exists for an interior strict local maximum. That is, f(x*) > f(x) for all x ≠ x* in a neighbourhood of x*. The quadratic
(Taylor) approximation of f(x) about x* is
f(x) = f(x*) + ∇f(x*)^T (x − x*) + (1/2)(x − x*)^T ∇²f(x*)(x − x*)
     = f(x*) + (1/2)(x − x*)^T ∇²f(x*)(x − x*),
where in the first equality above ∇f(x*)^T (x − x*) = 0 from the necessary condition in (7.3). Therefore f(x*) > f(x) if and only if
(x − x*)^T ∇²f(x*)(x − x*) < 0,
that is, if and only if the Hessian ∇²f(x*) is negative definite.
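A strict interior local maximum requires the Hessian ∇²f(x*) to be negative definite. In the 2 × 2 symmetric case this can be checked with leading principal minors, as in this small sketch:

```python
# For a 2x2 symmetric Hessian H, negative definiteness (and hence a
# strict local maximum at an interior stationary point) holds iff
# H[0][0] < 0 and det(H) > 0 (leading principal minor test).

def is_negative_definite_2x2(H):
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return H[0][0] < 0 and det > 0

# f(x, y) = -x^2 - 2y^2 has Hessian diag(-2, -4) at its maximum (0, 0).
print(is_negative_definite_2x2([[-2.0, 0.0], [0.0, -4.0]]))  # True
# The saddle f(x, y) = x^2 - y^2 has Hessian diag(2, -2): not definite.
print(is_negative_definite_2x2([[2.0, 0.0], [0.0, -2.0]]))   # False
```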
max_x f(x)
subject to g(x) = 0.
maximum.
∇f (x∗ ) = Dg(x∗ )T λ.
Then we set the gradients of L with respect to x and λ to zero as if we are maximizing
L, that is,
∇x L(x∗ , λ) = ∇f (x∗ ) − Dg(x∗ )T λ = 0 (7.4)
and
∇λ L(x∗, λ) = −g(x∗) = 0,
which give us the first order conditions as in the Lagrange Multiplier Theorem.
The second order conditions are a bit more complicated than those for the un-
constrained optimization. First recall that the kernel of the linear function Dg(x∗ )
is defined as
K = {h ∈ Rn : Dg(x∗ )h = 0}.
Theorem 7.2. Suppose there exist points x∗ ∈ G and λ ∈ Rm such that Dg(x∗ ) has
rank m and ∇f (x∗ ) = Dg(x∗ )T λ.
1. If f has a local maximum on G at x∗ , then hT ∇2 Lh ≤ 0 for all h ∈ K.
for j = 1, . . . , n. Now we can state the second-order conditions for equality-constrained maximization:
Example 7.2. Find the maximum of the function f(x, y) = x² − y² in the constraint set {(x, y) ∈ R² : x² + y² = 1}.
Here n = 2, m = 1, with g(x, y) = x² + y² − 1. The Lagrangian is L(x, y, λ) = x² − y² − λ(x² + y² − 1). The first-order condition is
∇L(x, y, λ) = (2x − 2λx, −2y − 2λy, 1 − x² − y²)^T = 0.
The solutions to the equation are (x∗ , y ∗ , λ) = (1, 0, 1), (−1, 0, 1), (0, 1, −1), and
(0, −1, −1). Evaluating f at these four points gives f (1, 0) = f (−1, 0) = 1 and
f (0, 1) = f (0, −1) = −1. Therefore (1, 0) and (−1, 0) are the maximum points. We
can verify this by the second-order condition. Now
∇²_x L(x*, λ) = [ 2 − 2λ      0     ]
                [   0      −2 − 2λ  ]

and Dg(x*, y*) = (2x*, 2y*). The bordered Hessian matrices at the above four points are, respectively,

[ 0  2  0 ]   [  0 −2  0 ]   [ 0  0  2 ]   [  0  0 −2 ]
[ 2  0  0 ] , [ −2  0  0 ] , [ 0  4  0 ] , [  0  4  0 ] .
[ 0  0 −4 ]   [  0  0 −4 ]   [ 2  0  0 ]   [ −2  0  0 ]
Table 7.1 lists the values of (−1)^j |B_j|, j = 1, 2 for the four points, which confirms our results.
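The conclusion of example 7.2 can be cross-checked by brute force: sampling the unit circle densely and evaluating f directly.

```python
# Brute-force check of example 7.2: maximize f(x, y) = x^2 - y^2 on the
# unit circle x^2 + y^2 = 1 by sampling the circle densely. The largest
# value should be 1, attained at (1, 0) and (-1, 0).
import math

def f(x, y):
    return x * x - y * y

n = 10000
values = []
for k in range(n):
    t = 2 * math.pi * k / n
    values.append((f(math.cos(t), math.sin(t)), math.cos(t), math.sin(t)))

best, bx, by = max(values)
print(round(best, 6))        # -> 1.0
print(round(abs(bx), 6))     # -> 1.0  (a maximizer is (1, 0) or (-1, 0))
print(round(min(v for v, _, _ in values), 6))  # -> -1.0 at (0, +-1)
```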
The function in the above example is called a saddle (see Figure 7.1). The example in section 7.1 shows that the function has neither a maximum nor a minimum. Maxima and minima do exist, however, on the constraint set of the unit circle as demonstrated.
∇f (x∗ ) = λp,
that is, ∇f (x∗ ) and p are pointing in the same direction. Another way to say this is
at x∗ , the budget hyperplane is tangent to the indifference surface, or, in the case of
two goods, the marginal rate of substitution (MRS) is equal to the price ratio p_1/p_2.
Next we put the parameters θ back into the objective function and the constraints. The maximum point is then a function of the parameters, x* = φ(θ), so that f(x*) = f(φ(θ)), which we call the value function. We want to study the effect on the value function when we change θ. In economics this is called comparative statics. The following result is very useful.
max_x f(x, θ)
subject to g(x, θ) = 0
or in matrix form,
∇θ f (x∗ , θ) = ∇θ L(x∗ , θ)
where L(x∗ , θ) = f (x∗ , θ) − g(x∗ , θ)T λ is the Lagrangian, and λ ∈ Rm is the vector
of Lagrange multipliers.
Differentiating both sides of the above equation with respect to θ and using the chain rule, we have
In many economic applications the equality constraints are too restrictive. For
example, in consumer analysis, instead of restricting a consumer to select bundles
on the budget hyperplane, it may be desirable to expand the feasible set to a set in
Rn defined by the intersection of the half-spaces pT x ≤ M, x1 ≥ 0, . . . , xn ≥ 0. The
intersection of these n+1 half-spaces is a compact and convex set called a polytope.
Formally, an inequality-constrained optimization problem can be stated as
max_x f(x)
subject to g(x) ≤ 0,
The second group, called the set of slack constraints, is defined by those components of g at x* with strict inequalities,
S(x*) = {j : g_j(x*) < 0}.
For all j ∈ S(x*), the constraint g_j(x*) < 0 is non-binding. This means that there exists a neighbourhood of x* contained in the set defined by these constraints, so that they have no effect on the maximum status of x* even if they are relaxed. On the other hand, for all i ∈ B(x*), x* is the solution to the problem
max_x f(x)
max_x f(x)
Each of the binding equations in (7.5) is relaxed by replacing the zero with the parameter θ_i in (7.7). Using the envelope theorem, for each i ∈ B(x*),
∂f(φ(θ), θ)/∂θ_i = λ_i.
Since relaxing the constraints will only increase the value of f (φ(θ), θ), λi ≥ 0 for all
i ∈ B(x∗ ). Since gi (x∗ ) = 0, we have
λi gi (x∗ ) = 0, i ∈ B(x∗ ).
λj gj (x∗ ) = 0, j ∈ S(x∗ )
where λ ∈ Rm . The signs of the λi ’s and gi (x∗ )’s can be summarized by the comple-
mentary slackness condition: For i = 1, . . . , m,
λi ≥ 0, gi (x∗ ) ≤ 0, λi gi (x∗ ) = 0.
∇f (x∗ ) = Dg(x∗ )T λ;
λ ≥ 0; and g(x∗ )T λ = 0.
as before but we set the gradient of L with respect to x only to zero and not λ.
Instead we impose the complementary slackness conditions. Therefore the necessary
conditions for inequality constrained maximization are
∇x L(x∗ , λ) = 0; (7.8)
λ ≥ 0; g(x∗ ) ≤ 0; g(x∗ )T λ = 0. (7.9)
max_{x,y} log x + log(y + 5)
subject to x + y ≤ 4,
x ≥ 0,
y ≥ 0.
We have n = 2, m = 3, and
L(x, y, λ_1, λ_2, λ_3) = log x + log(y + 5) − λ_1(x + y − 4) + λ_2 x + λ_3 y.
1. Assume that we have an interior solution, that is, x > 0 and y > 0. Then by
(7.13) and (7.14) λ2 = λ3 = 0.
3. The above result gives us a hint that we should assume x > 0 and y = 0. Then λ_2 = 0 and λ_3 ≥ 0. We also keep the assumption that x + y = 4 and λ_1 ≥ 0. With (7.10) and (7.11) we have
1/x − λ_1 = 0,
1/5 − λ_1 + λ_3 = 0,
x = 4.
The solution (x*, y*) = (4, 0), with λ_1 = 1/4, λ_2 = 0, and λ_3 = 1/4 − 1/5 = 1/20, satisfies the implications of the Kuhn-Tucker Theorem, with f(x*, y*) = log 4 + log 5. As an exercise you should try the assumptions that x = 0 and y > 0.
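The Kuhn-Tucker solution above can be cross-checked by a crude grid search over the feasible set:

```python
# Brute-force check of the Kuhn-Tucker solution (x*, y*) = (4, 0) for
#   max log x + log(y + 5)  s.t.  x + y <= 4, x >= 0, y >= 0.
import math

def objective(x, y):
    return math.log(x) + math.log(y + 5)

best = (-math.inf, None, None)
steps = 400
for i in range(1, steps + 1):          # x > 0 (log x undefined at 0)
    for j in range(steps + 1):
        x, y = 4 * i / steps, 4 * j / steps
        if x + y <= 4:                 # feasibility
            best = max(best, (objective(x, y), x, y))

value, x_star, y_star = best
print(round(x_star, 3), round(y_star, 3))   # -> 4.0 0.0
print(round(value, 6))                      # -> log 20 = 2.995732
```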
7.4 Exercise
1. Let f : S → R where S ⊆ Rn is a convex set. Suppose f is a C 1 concave function
and ∇f (x∗ ) = 0. Show that x∗ ∈ S is a global maximum for f over S.
f(x, y) = xy + 3y − x² − y².
f(x, y) = x sin y.
7. The owner of a toy store can purchase yo-yos from the manufacturer for $4.20
each. The owner estimates that 80 will be sold at a price of $8.00 and that 10
more will be sold for every $0.50 decrease in price. What price should be charged
to maximize profit?
0 ≤ z^T z
  = (x + αy)^T (x + αy)
  = α² y^T y + 2α x^T y + x^T x
  = f(α).    (7.16)
The inequality in (7.16) is true for every real number α. Now minimize f (α) with
respect to α. Denote the minimum point as α∗ . Calculate f (α∗ ) and it will turn
out that the inequality f (α∗ ) ≥ 0 is equivalent to (7.15).
y = x^T β,
The set in part (a) is called a cusp manifold (Figure 7.2). The set in part (c)
is called the bifurcation set. The function f is useful in the study of stability of
dynamical systems. See, for example, Poston and Stewart (1978, p. 78–83) for
details.
12. Verify that the other two points in Table 7.1 are minimum points.
14. Given the production function f (x) = (x1 x2 )1/2 where x1 , x2 are input quantities,
a firm is trying to minimize the input cost given a particular output level y.
Suppose the market prices for x1 and x2 are $2 and $5 per unit respectively.
15. Find the bordered Hessian of the consumer maximization problem in example 7.3.
16. Let f : Rn+ → R+ be the C 2 production function of a firm. For y ∈ f (Rn+ ) and
p ∈ Rn++ , define the producer’s cost function by
min_z w_1z_1 + w_2z_2
subject to y = z_1z_2,
where z ∈ R²_+ and w ∈ R²_{++}. Show that the minimized value is w_1z_1* + w_2z_2* = 2(w_1w_2y)^{1/2} by representing it as an unconstrained minimization problem in one variable.
max_z a_0 − a^T z + (1/2) z^T Az
20. Suppose a consumer maximizes her utility by buying two products with amounts x_1 and x_2 according to the utility function U(x_1, x_2) = x_1^α x_2^{1−α}. Her budget constraint is p_1x_1 + p_2x_2 = y where p_1, p_2 are the prices and y is her weekly expenditure on the two products.
(a) Find the weekly ordinary demand function for the two products.
(b) Define the indirect utility function V(p_1, p_2, y) = U(x_1*, x_2*) to be the value function of the optimal choice. Show that
∂V(p_1, p_2, y)/∂y = λ,
21. Find the maximum of the function f(x, y) = x² − y² in the constraint set {(x, y) ∈ R² : x² + y² ≤ 1}.
max_{x,y} 2 log x + y
subject to x + 2y ≤ 1, x ≥ 0, y ≥ 0.
23. The utility function of a consumer is given by U(x_1, x_2) = (x_1x_2)^{1/2}, where prices for goods 1 and 2 are p_1 > 0 and p_2 > 0. Given that the consumer has income y > 0, solve the utility maximization problem subject to the constraints p_1x_1 + p_2x_2 ≤ y, x_1 ≥ 0, x_2 ≥ 0 as follows:
24. Suppose that a consumer's preferences are represented by an increasing and quasi-concave C² utility function u = f(x), where x ∈ R^n_+ is the consumption bundle. The budget constraint is given by p^T x = y, where p ∈ R^n_{++} is the price vector and y > 0 is income.
(b) Let x∗ be the optimal consumption bundle and V (p, y) be the value function
(called the indirect utility function). Use the envelope theorem to prove
Roy’s identity:
x_i* = − [∂V(p, y)/∂p_i] / [∂V(p, y)/∂y],    i = 1, . . . , n.
25. A competitive firm produces a single output Q with capital K and labour L with
production technology according to an increasing and strictly concave production
function
Q = F (L, K).
The market price of the output is p and the market prices of capital and labour
are r and w respectively.
26. Use the Kuhn-Tucker Theorem to solve the following optimization problem:
subject to x + y ≤ 4,
x + 3y ≤ 9.
27. A consumer who buys two goods has a utility function U(x_1, x_2) = min{x_1, x_2}, given income y > 0 and prices p_1 > 0 and p_2 > 0.
Maximize xy
subject to x + 2y ≤ 5, x ≥ 0, y ≥ 0.
Maximize x + log(1 + y)
subject to x + 2y = 1, x ≥ 0, y ≥ 0.
max_{x,y} x² − y²
subject to x² + y² ≤ 4,
y ≥ 0,
y ≤ x.
(a) Derive the Kuhn-Tucker conditions by assuming that only the first constraint
is binding. Determine whether a solution exists with this assumption.
(b) Find the solution of the problem.
Chapter 8
Probability
Most graduate level econometrics textbooks require the reader to have some background in intermediate level mathematical probability and statistics. Many students, however, took only business and economic statistics before their econometrics courses in the undergraduate program. This chapter gives a quick introduction to the formal theory. It is by no means a replacement for proper training. Knowledge of probability theory is also essential in the study of the economics of risk and uncertainty, financial economics, game theory, and macroeconomics.
Throughout history, people involved in decision making and gambling have developed an intuitive idea of probability: the likelihood that some event will happen. The formal theory was first constructed by Blaise Pascal and Pierre de Fermat in the seventeenth century (see Ross, 2004).
We start with the idea of a random experiment or random trial. Before the
experiment, it is assumed that we know the set of all possible outcomes, which is
called the sample space, denoted by Ω. At the end of the experiment, one outcome
is realized.
Example 8.1. A coin is tossed once; there are two possible outcomes:
Let us call the outcome on the left "head", or simply H, and the one on the right "tail", or T. Then the random experiment is the tossing of the coin, and the sample space is Ω = {H, T}. At the end of the experiment either H or T is the outcome, but not both.
Example 8.2. A die is thrown once. The sample space Ω consists of six outcomes:
[Figure: the six faces of the die, showing one to six dots.]
X(H) = 0, X(T ) = 1,
so that the range of the random variable X is {0, 1}. In example 8.2, the natural
way to assign values to the variable is by the number of dots on the face of the die.
In this case the range of X is {1, 2, 3, 4, 5, 6}. It is clear that a random variable is
everywhere defined but not necessarily one-to-one or onto.
Nevertheless, the assignment of a random variable to the sample space may not
be meaningful, as the following example shows.
Example 8.3. A card is chosen randomly from a deck of cards. The sample space
is
Ω = {♠A, ♠2, . . . , ♠10, ♠J, ♠Q, ♠K, ♥A, . . . , ♥K, ♦A, . . . , ♦K, ♣A, . . . , ♣K},
with 52 possible outcomes. The sample space can also be seen as a product set
Ω = S × T , where
S = {♠, ♥, ♦, ♣},
T = {A, 2, . . . , 10, J, Q, K}.
We shall see later in example 8.11 that independent events can be constructed from
the product sets.
A subset of the sample space is called an event. Since events are sets, the set operations defined in chapter 2 can be used to define new events. For example, the event A ∪ B means that the outcome belongs to A or to B. Two events A and B are said to be mutually exclusive if A and B are disjoint. The complement of A, A^c, is the event that A does not happen.
Example 8.4. Let A be the event that the randomly chosen card is a seven. Then A = {♠7, ♥7, ♦7, ♣7}. Let B be the event that the card is a heart, that is, B = {♥A, . . . , ♥K}. Then A \ B is the event of drawing a seven that is not ♥7. That is,
A \ B = {♠7, ♦7, ♣7}.
Example 8.5. In example 8.1 a coin is tossed once, with Ω = {H, T}. Suppose now that the coin is tossed twice. The sample space is
Ω × Ω = {HH, HT, TH, TT}.
Let A be the event that we have two heads, B the event of two tails, and C the event of one head and one tail. Then {A, B, C} is a partition of the sample space. In other words, the events are pairwise mutually exclusive and A ∪ B ∪ C = Ω × Ω.
2. P (Ω) = 1.
Example 8.6. Suppose a coin is tossed once. If we assume that the coin is fair, there is no reason to believe that the outcome H is more likely than T, or vice versa. Therefore it is natural to assign the probabilities P({H}) = P({T}) = 1/2.
Example 8.7. Consider the experiment in example 8.5. Again if we assume a fair
coin, the probabilities of the four outcomes are the same. Then P (A) = P (B) = 1/4.
By the axioms above P (C) = 1/2.
Example 8.8. It is often more convenient to define the probabilities on the ran-
dom variable instead of events. Consider the experiment in throwing a fair die in
example 8.2. It is natural to assign equal probability of 1/6 to each outcome in Ω.
Therefore
P (X = 1) = P (X = 2) = · · · = P (X = 6) = 1/6.
The probability of the event that X is less than or equal to 2 can be written as
P (X ≤ 2) = P (X = 1) + P (X = 2) = 1/3.
Similarly,
P (X is odd) = P (X = 1) + P (X = 3) + P (X = 5) = 1/2.
The following results can be proven directly from the three axioms of probability:
4. P (A \ B) = P (A) − P (A ∩ B).
5. P (∅) = 0.
When the sample space Ω is a finite or countable set, the set of all events is the
power set of Ω. The probability function defined above is well-defined. Things are
a bit more complicated if Ω is uncountable. In order for the probability function
to be well-defined, we often narrow the set of events to a collection of sets called
a σ-algebra or σ-field, which is closed under countable unions, intersections, and
complements. Formally, a σ-field F is a collection of subsets of the sample space Ω
which satisfies the following three axioms:
1. ∅ ∈ F,
In the general case, it depends on P (A ∩ B). Since we know that B has occurred,
the sample space is no longer Ω but B. Therefore we have to “rescale” P (A ∩ B)
using B as the sample space so that all the mutually exclusive events still sum to 1.
It follows that
P(A|B) = P(A ∩ B) × P(Ω)/P(B) = P(A ∩ B)/P(B).    (8.2)
Notice that the two special cases above satisfy equation (8.2) as well. In the first case, P(A ∩ B) = 0 and in the second P(A ∩ B) = P(B). Also, since event B is realized, P(B) ≠ 0, so that equation (8.2) is well-defined.
Example 8.9. Consider the probability of getting the queen of hearts in a random draw from a deck of cards, which is P(A) = 1/52. Before the card is revealed, we are told that the card is a heart. It is clear that with this new information the probability of getting the queen of hearts is 1/13. Formally, A is the event of getting the queen of hearts and B is the event of getting a heart, with P(B) = 1/4. In this case A ⊆ B so that P(A ∩ B) = P(A) = 1/52. It follows that
P(A|B) = P(A ∩ B)/P(B) = (1/52)/(1/4) = 1/13.
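The same conditional probability can be obtained by brute-force enumeration of the 52-card sample space:

```python
# Enumerating the 52-card sample space of example 8.9 to confirm
# P(queen of hearts | heart) = 1/13.
from fractions import Fraction
from itertools import product

suits = ["spade", "heart", "diamond", "club"]
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = list(product(suits, ranks))          # Omega = S x T, 52 outcomes

A = {("heart", "Q")}                        # the queen of hearts
B = {card for card in deck if card[0] == "heart"}

P = lambda event: Fraction(len(event), len(deck))
print(P(A))                      # -> 1/52
print(P(A & B) / P(B))           # P(A|B) = (1/52)/(13/52) -> 1/13
```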
Psychologists Amos Tversky and Daniel Kahneman (1974) point out that intuition is very unreliable when people form subjective probabilities. The following example, taken from the Economist (1999), illustrates the importance of the concept of conditional probability. Most people, unfortunately probably including your family doctor, get the answer wrong the first time.
Example 8.10. You are given the following information. (i) In random testing, you
test positive for a disease. (ii) In 5% of cases, this test shows positive even when
the subject does not have the disease. (iii) In the population at large, one person in
1,000 has the disease. What is the probability that you have the disease?
Let A be the event that you have the disease, and B be the event that you test
positive for the disease. So P (A) = 0.001 and P (B) = 0.05. We can assume that if
you have the disease you will be tested positive as well. Therefore, A ⊆ B so that
P(A|B) = P(A ∩ B)/P(B) = 0.001/0.05 = 0.02.
That is, even though you test positive, the probability of having the disease is only 2%. As the article points out, most people's answer is 95%. They fail to include the information in (iii) in their subjective probabilities.
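The calculation above takes P(B) = 0.05 directly. As a cross-check, the sketch below computes P(B) with the law of total probability; the conclusion, roughly 2%, is essentially unchanged.

```python
# Example 8.10 with the overall positive rate P(B) computed by the law
# of total probability rather than taken to be 0.05.
p_disease = 0.001          # P(A): prevalence in the population
p_pos_given_disease = 1.0  # the test always catches the disease (as assumed)
p_pos_given_healthy = 0.05 # false-positive rate

p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))   # P(B)
p_disease_given_positive = p_pos_given_disease * p_disease / p_positive

print(round(p_disease_given_positive, 4))   # -> 0.0196, i.e. about 2%
```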
Suppose that A and B are two events from a sample space Ω. We say that A
is independent of B if P (A|B) = P (A). That is, the probability of event A is
unaffected by whether event B happens or not. Using equation (8.2), we have
P(A|B) = P(A ∩ B)/P(B) = P(A),
which gives
P (A ∩ B) = P (A)P (B). (8.3)
Example 8.11. Let the event A be drawing a 7 from a deck of cards and B be
drawing a diamond. Then P (A) = 1/13, P (B) = 1/4, and P (A ∩ B) = 1/52. It
follows that
P(A|B) = P(A ∩ B)/P(B) = (1/52)/(1/4) = 1/13 = P(A),
P(B|A) = P(A ∩ B)/P(A) = (1/52)/(1/13) = 1/4 = P(B).
Independent events arise from situations where the sample spaces are product sets, as examples 8.3 and 8.11 show. A common situation is when we repeat a random experiment n times, each with sample space Ω. The sample space of all n trials is Ω^n. Then any event in one particular trial is independent of any event in another trial. For example, if we roll a die ten times, even if the outcomes of the first nine rolls are all even, the probability of the last roll being odd is still one half.
P(B|A) = P(A ∩ B)/P(A),
which gives
P (A ∩ B) = P (B|A)P (A). (8.4)
P(A|B) = P(B|A)P(A)/P(B).
There are many interpretations of Bayes' Theorem. Here we provide one in line with the question we asked at the beginning of the section. Suppose we have a prior belief that the probability of event A happening is P(A). Then we learn that another event B has occurred. Given the new information, how should we update our belief about event A? Bayes' Theorem says that we should update P(A) by multiplying it by the factor P(B|A)/P(B), the ratio of the conditional probability of B given A to the probability of B.
Example 8.12. Consider the experiment of rolling a fair die once. The support of
the random variable X is S = {1, 2, 3, 4, 5, 6}. The probability mass function is
f(x) = 1/6 for x = 1, 2, . . . , 6, and f(x) = 0 elsewhere.
By the axioms of probability, the probability mass function (pmf) satisfies f(x) ≥ 0 for every x ∈ S and Σ_{x∈S} f(x) = 1.
The pdf of X is
f(x) = 1 for 0 ≤ x ≤ 1, and f(x) = 0 elsewhere.
Other similar distributions, with the pdf equal to a constant value on an interval, are generally called uniform distributions.
then we can define the mathematical expectation of u(X) in the discrete case as
E[u(X)] = Σ_{x∈S} u(x)f(x),
Given a random variable X, let V (X) be the set of functions of X that satisfy
inequality (8.6) in the discrete case and inequality (8.7) in the continuous case.
Then it can be shown that V (X) is a vector space. By the properties of integration,
mathematical expectation is a linear functional on V(X). That is, for any u, v ∈
V(X) and α ∈ R,

    E[αu(X) + v(X)] = αE[u(X)] + E[v(X)].    (8.8)
Some of the important expectations for the continuous case are discussed as
follows. The counterparts for the discrete case are similar by using summation
instead of integration.
If u(X) = X, then

    E(X) = ∫_{−∞}^{∞} x f(x) dx
is called the expected value of X or simply the mean of X, and sometimes denoted
by µ. Intuitively, it gives the notion of the average of the outcomes if we repeat the
random experiment many times. The concept is also analogous to the centre of
gravity of a rigid body in physics. For this reason the mean is sometimes called the
first moment of the probability distribution about the point zero.
If u(X) = (X − µ)², then

    σ² = E[(X − µ)²] = ∫_{−∞}^{∞} (x − µ)² f(x) dx

is called the variance of X. Intuitively, it is the average
of the square of the distance of each of the outcomes from the mean, weighted by the probabilities.
Therefore σ 2 is a measure of how spread out the outcomes are away from the mean.
For this reason it is sometimes called the second moment of the distribution about
the mean. The positive square root of the variance, σ, is called the standard
deviation of X. Some statisticians prefer the use of a dimensionless quantity to
measure the dispersion, which is called the coefficient of variation and is defined
as
    cv = σ/µ,

provided that µ ≠ 0.
By the linear property described in equation (8.8), we have

    σ² = E[(X − µ)²] = E(X²) − 2µE(X) + µ² = E(X²) − µ².
Example 8.15. Consider again the experiment of rolling a fair die once. The pmf
is given in example 8.12. The mean is

    µ = E(X) = Σ_{x=1}^{6} x · (1/6) = 3.5.

The variance is

    σ² = E[(X − µ)²] = Σ_{x=1}^{6} (x − µ)² · (1/6) = 2.917.

The standard deviation is σ = √2.917 = 1.708. The coefficient of variation is
cv = 1.708/3.5 = 0.488.
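The computations in example 8.15 can be verified with a few lines of Python (our own illustration, not part of the text), using exact rational arithmetic for the mean and variance:

```python
from fractions import Fraction
from math import sqrt

# Exact computation of the moments in example 8.15 (rolling a fair die once).
f = Fraction(1, 6)                               # pmf: f(x) = 1/6, x = 1, ..., 6
support = range(1, 7)

mu = sum(x * f for x in support)                 # mean, 7/2
var = sum((x - mu) ** 2 * f for x in support)    # variance, 35/12
sigma = sqrt(var)                                # standard deviation
cv = sigma / float(mu)                           # coefficient of variation

print(float(mu))                                            # 3.5
print(round(float(var), 3), round(sigma, 3), round(cv, 3))  # 2.917 1.708 0.488
```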
We have

    µ = E(X) = ∫_{−a}^{a} x · (1/(2a)) dx = 0,

and

    σ² = E[(X − µ)²] = ∫_{−a}^{a} x² · (1/(2a)) dx = a²/3.

This gives σ = a/√3. The coefficient of variation cv is undefined since µ = 0.
The following results involve simple functions of a random variable. The proofs
are left as exercises.
Theorem 8.2. Suppose that X is a random variable with mean µ and variance σ 2 .
2. Let Z = (X − µ)/σ. Then Z is called the standardized random variable of X,
with mean equal to 0 and variance 1.
The third moment of the standardized random variable is defined as the coefficient
of skewness, which can be written as

    cs = [E(X³) − 3µσ² − µ³] / σ³.    (8.9)
The fourth moment of the standardized random variable is defined as the coef-
ficient of kurtosis, which is
    ck = E[(X − µ)⁴] / σ⁴.
There are different interpretations of ck . The most common one is that it is a measure
of the weight of the tails of the distribution. A large coefficient means that X has
fat tails. The concept is similar to the moment of inertia of beams and columns in
structural mechanics.1
where the binomial coefficient

    (n choose x) = n! / (x!(n − x)!)
is the number of combinations of taking x outcomes from a sample space of size
n without repetitions. The binomial distribution has mean µ = np and variance
σ 2 = np(1 − p).
Example 8.18. A fair die is rolled 6 times. Let A be the event that the outcome is
either 1 or 2 each time. Let X be the number of times that event A occurs. Then
we have n = 6 and p = 1/3. The values of the pmf of X are listed below:
x 0 1 2 3 4 5 6
f (x) 0.088 0.263 0.329 0.219 0.082 0.016 0.001
The mean of the distribution is µ = 2 and the variance is σ² = 4/3. The reader can
verify that Σ_{x=0}^{6} f(x) = 1.
¹See, for example, Timoshenko (1940, p. 343).
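The pmf table in example 8.18 can be reproduced directly from the binomial formula; the following sketch (our own illustration) uses Python's `math.comb`:

```python
from math import comb

# The binomial pmf f(x) = C(n, x) p^x (1 - p)^(n - x) for example 8.18,
# with n = 6 trials and success probability p = 1/3.
def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 6, 1 / 3
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

print([round(v, 3) for v in pmf])  # [0.088, 0.263, 0.329, 0.219, 0.082, 0.016, 0.001]
print(round(sum(pmf), 6))          # 1.0
mean = sum(x * v for x, v in zip(range(n + 1), pmf))
print(round(mean, 6))              # 2.0
```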
8.5. SOME COMMON PROBABILITY DISTRIBUTIONS
Example 8.19. A fair die is rolled until the event A = {1, 2} occurs. Let X be the
number of the trial on which A first occurs. Then p = 1/3. The values of the pmf and
cdf for the first six trials are as follows.
x 1 2 3 4 5 6
f (x) 0.333 0.222 0.148 0.099 0.066 0.044
F (x) 0.333 0.556 0.704 0.802 0.868 0.912
The cumulative distribution function is F(x) = 1 − (1 − p)^x, x = 1, 2, 3, . . . .
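A quick consistency check (our own sketch) of the table in example 8.19, using f(x) = p(1 − p)^(x−1) and F(x) = 1 − (1 − p)^x:

```python
# pmf and cdf of the geometric distribution in example 8.19, with p = 1/3:
# f(x) = p(1 - p)^(x - 1) and F(x) = 1 - (1 - p)^x.
p = 1 / 3

def pmf(x):
    return p * (1 - p) ** (x - 1)

def cdf(x):
    return 1 - (1 - p) ** x

# The cdf equals the running sum of the pmf.
for x in range(1, 7):
    assert abs(cdf(x) - sum(pmf(k) for k in range(1, x + 1))) < 1e-12

print([round(pmf(x), 3) for x in range(1, 7)])  # [0.333, 0.222, 0.148, 0.099, 0.066, 0.044]
print([round(cdf(x), 3) for x in range(1, 7)])  # [0.333, 0.556, 0.704, 0.802, 0.868, 0.912]
```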
The gamma function satisfies the recursive relation Γ(x + 1) = xΓ(x), so that for
any positive integer n, Γ(n + 1) = n!.
²See Artin (1964) for the properties of the gamma function.
Also,

    Γ(1) = ∫_0^∞ e^{−t} dt = 1.

The pdf of the gamma distribution is

    f(x) = (1 / (Γ(α)β^α)) x^{α−1} e^{−x/β},  0 < x < ∞,
where α > 0 and β > 0 are two parameters. The mean of the distribution is µ = αβ
and the variance is σ 2 = αβ 2 . The gamma distribution is a very good model of
income distribution because of the flexibility provided by the two parameters.
A special case of the gamma distribution is when α = k/2 and β = 2. The pdf
becomes

    f(x) = (1 / (Γ(k/2) 2^{k/2})) x^{k/2−1} e^{−x/2},  0 < x < ∞.

This special case is called the chi-square distribution with k degrees of freedom,
and is denoted by χ²(k). The distribution has mean µ = k and variance σ² = 2k.
The χ2 distribution has a close relationship with the normal distribution, which we
shall define next.
The normal distribution is often denoted by N(µ, σ²). The mean of the distribution
is µ and the variance is σ². In practice, it is often convenient to transform the distribution
to its standardized form, with

    Z = (X − µ)/σ.

By theorem 8.2, Z has mean 0 and variance 1, with pdf

    f(z) = (1/√(2π)) exp(−z²/2),  −∞ < z < ∞.
[Figure 8.1: pdf of the standard normal distribution.]
The normal distribution is symmetric about its mean and therefore the coefficient
of skewness cs is zero. Because of the shape of its graph, it is often called the bell
curve (see figure 8.1). The following two results are very useful in statistical analysis.
Theorem 8.3. Let X be a normally distributed random variable with mean µ and
variance σ 2 . Let Z = (X − µ)/σ be the standardized random variable of X. Then
Y = Z 2 is χ2 (1).
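Theorem 8.3 can be illustrated by simulation (a sketch of ours, not from the text): squaring standard normal draws should produce a sample whose mean is close to k = 1 and whose variance is close to 2k = 2, the moments of the χ²(1) distribution:

```python
import random

# Monte Carlo illustration of theorem 8.3: if Z ~ N(0, 1) then Y = Z^2 is
# chi-square with k = 1 degree of freedom, so E[Y] = k = 1 and Var(Y) = 2k = 2.
random.seed(1)
n = 200_000
y = [random.gauss(0.0, 1.0) ** 2 for _ in range(n)]

mean = sum(y) / n
var = sum((v - mean) ** 2 for v in y) / n
print(round(mean, 2), round(var, 2))  # close to 1 and 2
```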
When we have two random variables, say X1 and X2 , we consider the joint distribu-
tion with pdf f : R2 → R+ where
    ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x₁, x₂) dx₁ dx₂ = 1.
The marginal distribution of X2 can be defined similarly. The means and variances
of X₁ and X₂ are then defined by their marginal pdf. Mathematical expectation is
a linear operator. That is, for any scalar α,

    E[αX₁ + X₂] = αE[X₁] + E[X₂].
In addition to their means, µ₁ and µ₂, and variances, σ₁² and σ₂², we need to ask
whether they are correlated or not. The correlation is expressed by their covariance,

    σ₁₂ = E[(X₁ − µ₁)(X₂ − µ₂)].

The conditional pdf of X₂ given X₁ = x₁ is defined as

    f_{2|1}(x₂|x₁) = f(x₁, x₂) / f₁(x₁).    (8.11)
8.6. MULTIVARIATE DISTRIBUTIONS
We say that the two random variables are independent if their joint pdf is sepa-
rable, that is,
f (x1 , x2 ) = f1 (x1 )f2 (x2 ).
It is also clear from (8.11) that if X₁ and X₂ are statistically independent then
f_{2|1}(x₂|x₁) = f₂(x₂).
For any constant vector c ∈ Rⁿ,

    E(cᵀX) = cᵀE(X) = cᵀµ

and

    Var(cᵀX) = cᵀΣc.
The most frequently encountered joint distribution is the multivariate normal distribution,
with joint pdf

    f(x) = (1 / ((2π)^{n/2} |Σ|^{1/2})) exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ)).
In general, for a nonlinear function u,

    E[u(X)] ≠ u(E[X]).

Similarly,

    E[1/X] ≠ 1/E[X].
In these cases the functions are linearized for further analysis. For example, let the
discounted value of consumption in period t + 1 be βct+1 where 0 < β < 1 is the
discount factor. Using the first-order Taylor approximation at c∗ , the steady-state
consumption level,
    E_t[1/(βc_{t+1})] ≈ (1/β) E_t[1/c* − (c_{t+1} − c*)/(c*)²]
                      = (1/(βc*)) (2 − E_t[c_{t+1}]/c*).
Also, in general,

    E(X₁X₂) ≠ E(X₁)E(X₂)

unless X₁ and X₂ are independent.
8.7. STOCHASTIC PROCESSES
8.7.1 Martingales
A stochastic process is called a martingale if for t = 0, 1, . . . ,
1. E[|Xt |] < ∞,
2. E[Xt+1 |X0 , X1 , . . . , Xt ] = Xt .
Taking expectations on both sides of the second condition gives

    E[X_{t+1}] = E[X_t].

Therefore a martingale has constant mean. If the increments X_{t+1} − X_t are
independently and identically distributed, then the process is called a random walk.
A white noise process {ε_t} satisfies

1. E[ε_t] = 0,

2. E[ε_t²] = σ²,

3. Cov(ε_t, ε_s) = 0 for t ≠ s.

A first-order moving average process, written MA(1), is defined as

    X_t = µ + ε_t + θε_{t−1},

where {ε_t} is white noise.
The variance is

    Var(X_t) = E[(X_t − µ)²] = (1 + θ²)σ².

A first-order autoregressive process, written AR(1), is defined as

    X_t = c + φX_{t−1} + ε_t.
Finding the mean and the variance of this AR(1) process is left as an exercise. If
−1 < φ < 1, the process has a constant mean and variance and is said to be stationary.
On the other hand, if |φ| ≥ 1, the process is non-stationary. Comparing
equations (8.12) and (8.13) reveals that a stationary AR(1) process can be expressed as an
MA(∞) process with µ = c/(1 − φ) and θ_i = φ^i. In general a stationary AR process
of any order can be inverted into an MA process and vice versa.
It is often advantageous to use a mixed autoregressive and moving average model.
For example, an ARMA(1, 1) process is defined as
    X_t = c + φX_{t−1} + ε_t + θε_{t−1}.
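The stationary mean of the AR(1) process, c/(1 − φ), can be checked by simulation; the sketch below (with illustrative parameter values of our own) uses Gaussian white noise:

```python
import random

# Simulate the AR(1) process X_t = c + phi X_{t-1} + eps_t with Gaussian
# white noise and compare the sample mean with the stationary mean
# c / (1 - phi).  Parameter values are illustrative.
random.seed(0)
c, phi, sigma = 1.0, 0.5, 1.0
T = 100_000

x = c / (1 - phi)              # start at the stationary mean, 2.0
xs = []
for _ in range(T):
    x = c + phi * x + random.gauss(0.0, sigma)
    xs.append(x)

print(round(sum(xs) / T, 2))   # close to c / (1 - phi) = 2.0
```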
More on stochastic processes can be found in Taylor and Karlin (1998), Box, Jenkins,
and Reinsel (1994), and Hamilton (1994). See Grinstead and Snell (1997), Hogg
and Craig (1995), or any book in mathematical probability and statistics for more
detailed discussions on probability theory.
8.8 Exercises
1. In a random experiment, a coin is flipped until the first tail appears. Describe the
sample space of the experiment.
2. In the game of craps, two dice are thrown and the sum of the numbers of dots on
the dice is recorded. Describe the sample space of each throw.
4. Show that the probability function defined in equation (8.1) satisfies the three
axioms of probability measure.
(a) F = {∅, Ω}
(b) F = {∅, {a}, {d}, {a, d}, {b, c}, Ω}
(c) F = {∅, {a, b}, {c, d}, Ω}
9. (Wiggins, 2006) A patient goes to see a doctor. The doctor performs a test with
99 percent reliability—that is, 99 percent of people who are sick test positive and
99 percent of the healthy people test negative. The doctor knows that only 1
percent of the people in the country are sick. If the patient tests positive, what
are the chances the patient is sick?
10. (Paulos, 2011) Assume that you’re presented with three coins, two of them fair
and the other a counterfeit that always lands heads. If you randomly pick one
of the three coins, the probability that it’s the counterfeit is 1 in 3. This is the
prior probability of the hypothesis that the coin is counterfeit. Now after picking
the coin, you flip it three times and observe that it lands heads each time. Seeing
this new evidence that your chosen coin has landed heads three times in a row,
what is the revised posterior probability that it is the counterfeit?
12. Consider example 8.5 of tossing a fair coin twice. Let X be the random variable
of the number of times that a tail shows up.
14. Find the mean, variance, and coefficient of variation of the random variable de-
fined in example 8.14.
17. The price of a new iTab computer is $500. It is known that a fraction d of the
iTabs are defective. All consumers value the non-defective computers at $550
each. The same model of second-hand iTab can be found on eBay for $50, which
includes shipping costs.
(a) Do you expect to find non-defective iTabs for sale on eBay? Why or why
not?
(b) If the consumers are risk neutral, what is d?
18. A fair die is rolled 6 times. Let A be the event that the outcome is 5 each time.
Let X be the number of times that event A occurs. Find the pmf, the mean, and
the variance of X.
21. Mr. Smith has two children, at least one of whom is a boy. What is the probability
that the other is a boy? (Confused? See the discussion in Wikipedia.)
22. Consider the stock price model in section 9.1.2. Suppose the dividends of the
stock follow a random walk, that is,
dt = dt−1 + et ,
23. Find the mean and variance of the AR(1) process if −1 < φ < 1.
24. Consider the following passage from Grinstead and Snell (1997, p. 405):
Chapter 9
Dynamic Modelling
In this chapter we extend economic modelling to include the time dimension. Dy-
namic modelling is the essence of macroeconomic theory. Our discussions provide
the basic ingredients of the so-called dynamic general equilibrium model.
it to equation (9.1). Now assume that equation (9.2) holds for x_{t+n}. Then

    ax_{t+n} + b = a [aⁿx_t + b(1 − aⁿ)/(1 − a)] + b
                 = a^{n+1}x_t + ab(1 − aⁿ)/(1 − a) + b
                 = a^{n+1}x_t + b [a(1 − aⁿ)/(1 − a) + 1]
                 = a^{n+1}x_t + b [(a(1 − aⁿ) + (1 − a))/(1 − a)]
                 = a^{n+1}x_t + b [(a − a^{n+1} + 1 − a)/(1 − a)]
                 = a^{n+1}x_t + b(1 − a^{n+1})/(1 − a)
                 = x_{t+n+1},

which shows that equation (9.2) holds for x_{t+n+1}. The proof for the case a = 1 is
left as an exercise.
If −1 < a < 1, then lim_{n→∞} aⁿ = 0. By equation (9.2) we have

    lim_{n→∞} x_{t+n} = b/(1 − a).
That is, if −1 < a < 1, x_t converges to a steady-state value of b/(1 − a). Figure 9.1
plots the values of x_t with a = 0.3, b = 10, and initial value x₀ = 1. The series
quickly converges to the steady-state value of 10/(1 − 0.3) = 14.286 within ten
periods.
On the other hand, if a ≥ 1 or a ≤ −1, then xt diverges and does not achieve a
steady state.
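The convergence shown in figure 9.1 can be reproduced by iterating the difference equation directly (a minimal sketch of ours with the same parameter values a = 0.3, b = 10, x₀ = 1):

```python
# Iterate x_{t+1} = a x_t + b with a = 0.3, b = 10 and x_0 = 1 (figure 9.1).
a, b, x = 0.3, 10.0, 1.0
for t in range(15):
    x = a * x + b

print(round(x, 3))              # 14.286
print(round(b / (1 - a), 3))    # 14.286, the steady state b/(1 - a)
```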
Alternatively, equation (9.1) can be seen as a dynamical system, with
9.1. FIRST-ORDER DIFFERENCE EQUATION
[Figure 9.1: the path of x_t with a = 0.3, b = 10, and x₀ = 1, converging to the steady state 14.286.]
[Figure 9.2: cobweb diagrams of x_{t+1} = ax_t + b against the 45° line — (a) 0 < a < 1, (b) a > 1.]
Figure 9.2(a) illustrates the case when 0 < a < 1. The initial value of x is equal
to x0 . The value of x1 = ax0 + b is reflected back to the x-axis by the 45◦ line
and the process goes on. Eventually x converges to the fixed point at x∗ . Starting
points with values greater than x∗ go through a similar process converging to x∗ . In
figure 9.2(b) the coefficient a is greater than one. Values of x not equal to the fixed
point x∗ will move away from it. You should draw the cases of −1 < a < 0 and
a < −1 to reveal the dynamics of the so-called cobweb models.
For the case a > 1 or a < −1, we can solve the difference equation "forward":

    x_t = (1/a)(−b_t + x_{t+1})
        = (1/a)[−b_t + (1/a)(−b_{t+1} + x_{t+2})]
        = (1/a){−b_t + (1/a)[−b_{t+1} + (1/a)(−b_{t+2} + x_{t+3})]}
        ⋮
        = −(1/a)(b_t + b_{t+1}/a + b_{t+2}/a² + ···) + lim_{s→∞} x_{t+s}/a^s
        = −(1/a) Σ_{s=0}^{∞} b_{t+s}/a^s + lim_{s→∞} x_{t+s}/a^s
        = −Σ_{s=0}^{∞} b_{t+s}/a^{s+1} + lim_{s→∞} x_{t+s}/a^s.
For the case of −1 < a < 1, we can solve the difference equation (9.3) "backward"
to get

    x_t = Σ_{s=1}^{∞} a^{s−1} b_{t−s}.    (9.5)
In the special case that b_t = b for all t, the sum is a geometric series, and so

    x_t = b/(1 − a).
See Goldberg (1986) for more on difference equations. Chapter 2 in Hamilton (1994)
also contains discussions on difference equations using lag operators.
Example 9.1. (Azariadis, 1993, Chapter 3) Suppose that investors have two choices
of asset. The first is a riskless government bond that pays an interest at the rate of
r per period. The second is a stock that pays dividend dt in period t. The stock is
bought or sold at market price pt before the dividend is paid out. Then a risk-neutral
investor faces
(1 + r)pt = dt + Et [pt+1 ], (9.6)
that is, an amount of pt invested in government bond should be equal to the expected
return from investing in the stock. The investor has adaptive expectations in every period:

    E_t[p_{t+1}] = E_{t−1}[p_t] + λ(p_t − E_{t−1}[p_t]),    (9.7)

with 0 < λ < 1. This means that the forecasted price E_t[p_{t+1}] is partially adjusted by a
factor of λ towards the forecasting error p_t − E_{t−1}[p_t] in the last period. Substituting
the expectation in equation (9.6) into (9.7), we get
    p_{t+1} = ap_t + b_t,

where

    a = (1 + r)(1 − λ) / (1 + r − λ),

and

    b_t = [d_{t+1} − (1 − λ)d_t] / (1 + r − λ).
Given that r > 0, it is clear that 0 < a < 1. We can therefore apply equation (9.5)
to obtain the price of the stock in period t,
    p_t = Σ_{s=1}^{∞} a^{s−1} b_{t−s}.
In the special case that dividends are constant, that is, d_t = d for all t,

    b_t = b = λd / (1 + r − λ),

so that the price converges to the steady-state value

    p_t = b/(1 − a) = d/r.
Figure 9.3 shows the price/earning ratio of the U.S. stock market, which is pt /dt in
our model. The average ratio in the past 50 years is 19.5.
9.2. DYNAMICAL SYSTEMS
where
• t ∈ Z = {. . . , −2, −1, 0, 1, 2, . . .} denotes discrete time periods,
where F is the aggregate production function and δ is the depreciation rate of capital.
The output is the utility of the aggregate household so that the payoff function in each
period is U (ct ). The objective of the central planner is to maximize the discounted
values of present and future utility, that is,
    max_{c_t} Σ_{t=0}^{∞} β^t U(c_t),
where β = 1/(1 + θ) is the social discount factor. The following Euler equation is
obtained from the necessary conditions of optimization (see section 9.3):
    β [U′(c_{t+1}) / U′(c_t)] [F′(k_{t+1}) + 1 − δ] = 1.    (9.12)
Assuming the utility function U is strictly concave so that U 00 (c) < 0 for all values
of c, the Euler equation can be solved for ct+1 as a function of ct and kt (the term
kt+1 can be substituted by ct and kt using equation (9.11)). The result is a two-
dimensional autonomous dynamical system with xt = [kt ct ]T .
Consider an autonomous and stationary dynamical system with given initial state
x. After two periods we have f (f (x)) = f 2 (x). In general, f t+s (x) = f s (f t (x)). The
sequence {xt } for t ∈ Z is called the orbit of x.1 When the sequence only includes
t = {0, 1, 2, . . .}, it is called a forward orbit. Similarly, a backward orbit consists
of t = {0, −1, −2, . . .}. A fixed point p is defined by f (p) = p. The set of all fixed
points of f is written as Fix(f ). In economics a fixed point is generally called an
equilibrium.
In dynamic general equilibrium models, the system is sometimes linearized by
first-order Taylor approximation about the steady-state values. An autonomous
linear dynamical system can be written as
¹Orbits are often called trajectories or flows.
²Boyd (2008) provides a detailed analysis of linear dynamical systems.
9.3. DYNAMIC PROGRAMMING
where β ∈ (0, 1] is a discount factor. The value of the state variable in period 0, x0 ,
is given. In other words, we want to maximize the sum of the present values of the
output variables from period t = 0 onward. Notice that the payoff function g and
the transition function f are stationary. This will allow us to express the system in a
recursive form below. Often we assume that g is differentiable and concave and the
feasible set S = {(xt+1 , xt ) : xt+1 ≤ f (xt , ut ), t = 0, 1, . . .} is convex and compact.
If the solution of the above problem exists, ut = h(xt ) is called the policy
function. With the initial state x₀ and the transition equation (9.14), the sequence
{(x_t, u_t)} is obtained. By putting the sequence {(x_t, u_t)} in the objective function
in (9.13), we get the value function
    v(x₀) = max_{u_t} { Σ_{t=0}^{∞} β^t g(x_t, u_t) : x_{t+1} = f(x_t, u_t) }.
After some time periods, say s, we obtain the state variable x_s. The structure of the
problem remains the same since g and f are stationary. Therefore the value function,
now depending on x_s, is
    v(x_s) = max_{u_t} { Σ_{t=s}^{∞} β^{t−s} g(x_t, u_t) : x_{t+1} = f(x_t, u_t) }.
In general, in any period t, the value function is the sum of current payoff function
In fact, since the value function has the same structure in every period, we can
express the above equation recursively as
We can substitute the maximizer u with the policy function h(x) in the above equa-
tion to obtain
v(x) = g(x, h(x)) + βv(f (x, h(x))). (9.17)
Equations (9.15), (9.16), and (9.17) are different versions of the Bellman equation.
Given g and f , equation (9.17) is a functional equation with unknown functions v
and h to be found.
Existence of the solution is confirmed by the following theorem. The proof can
be found in Stokey and Lucas (1989, p. 79), and Ljungqvist and Sargent (2004,
p. 1011–1012).
Theorem 9.1. Suppose that g is continuous, concave, and bounded, β ∈ (0, 1), and
the feasible set S is nonempty, compact, and convex. Then there exists a unique
function v that solves the Bellman equation (9.17).
Methods for solving the Bellman equation for the policy function h and value
function v include iterations of h, iterations of v, the method of undetermined co-
efficients, and numerical analysis using computers. You can consult Stokey and
Lucas (1989) or Ljungqvist and Sargent (2004) for details. In what follows we as-
sume that the value function v is differentiable and derive the first-order conditions
for maximizing the Bellman equation. Then we apply the envelope theorem to the
value function v. Together we obtain the same set of necessary conditions as in the
Lagrange method.
Let λt = ∇v(xt ) and λt+1 = ∇v(xt+1 ). The first-order condition of the maxi-
mization problem in (9.15) is
Next, we differentiate the functional equation (9.17) with respect to the state variable
x:
By the first-order condition (9.18), the term inside the square bracket in the last
equality above is the zero vector in Rm . Therefore we have
Table 9.1 keeps track of the dimensions of the derivatives in the above algebra.
You should verify that all matrix multiplications in applying the chain rule are
compatible.
Equation (9.19) is similar to the envelope theorem in static optimization. It
effectively means that the rate of change (gradient) of the value function v with
respect to the state variable xt is equal to the partial derivative of the Bellman
equation with respect to xt .
9.3.2 Recipe
Here is a set of cookbook style procedures to follow as an alternative method to the
Lagrange method in analyzing dynamic optimization problems:
1. Make sure that the objective function is additively separable for each period
and the payoff function g(xt , ut ) is stationary.
2. Make sure that the transition function is stationary and expressed in the form
5. Differentiate the right-hand side of the Bellman equation with respect to the
control variable ut and set the gradient equal to the zero vector 0 ∈ Rm as in
equation (9.18).
6. Differentiate the right-hand side of the Bellman equation with respect to the
state variable xt and set the gradient equal to λt ∈ Rn as in equation (9.19).
7. The results in steps 5 and 6, together with the transition equation (9.20) form
the set of necessary conditions for optimization.
Example 9.3. Consider the Ramsey model in example 9.2. In this case m = n = 1.
The Bellman equation is
U 0 (ct ) − βλt+1 = 0,
β[F 0 (kt+1 ) + 1 − δ]λt+1 = λt .
From the first equation λ_{t+1} = U′(c_t)/β, so that λ_t = U′(c_{t−1})/β. Substituting these
into the second equation and shifting one period forward gives the Euler equation
in (9.12).
subject to

    dk(t)/dt = F(k(t)) − δk(t) − c(t).
The problem can be solved using the technique of optimal control theory and is popu-
lar among theorists of economic growth and natural resource economics. The results
given by the continuous time version are similar to the discrete time version. But
since economic data such as output, investment, consumption, and inflation come in
discrete form, the dynamic programming approach is more convenient in empirical
analysis. An excellent exposition of optimal control theory and its applications in
economics can be found in Weitzman (2003).
Example 9.4. In this example we analyse the dynamics of durable goods set up in
Wickens (2011, p. 74–76) using dynamic programming. Utility U (ct , Dt ) in each pe-
riod depends on consumption of nondurable goods ct and the stock of durable goods
Dt . The two state variables are Dt and asset holding at . The two control variables
are ct and the purchase of durable goods in each period, dt , which has a relative
price equal to pt . The transition equations are the durable goods accumulation and
the budget constraint:
Dt+1 = dt + (1 − δ)Dt ,
at+1 = xt + (1 + rt )at − ct − pt dt ,
where δ is the depreciation rate, xt is exogenous income, and rt is the interest rate.
The Bellman equation is
The second line in the above equation gives the Euler equation. The first line, using
the Euler equation, can be expressed as
    U_{D,t} = (U_{c,t−1}/β) [p_{t−1} − ((1 − δ)/(1 + r_t)) p_t].
See Adda and Cooper (2003, Chapter 7) for a survey on the economics of durable
consumption.
A popular class of dynamic programming models is the linear quadratic model. Such
models are useful in solving rational expectations models and as second-order approximations
to nonlinear models. The structure of the model is that the payoff function g is the
sum of two quadratic forms of the state variable xt and the control variables ut ,
while the transition function f is a linear function of xt and ut . The problem is
    max_{u_t} − Σ_{t=0}^{∞} β^t (x_tᵀ R x_t + u_tᵀ Q u_t),  β ∈ (0, 1),
subject to
xt+1 = Axt + But , x0 given.
The Bellman equation is

    v(x_t) = max_{u_t} { −(x_tᵀ R x_t + u_tᵀ Q u_t) + βv(Ax_t + Bu_t) }.
and

    λ_t = 2(βBBᵀ)⁻¹ BQu_{t−1}.
This is a form of Euler equation which gives the intertemporal relation of the control
variable u. If m > n, we need more constraints among the components of u such as
government budget constraint, money supply rule, etc.
subject to
xt+1 = f (xt , ut ), t = 0, . . . , T − 1.
As before the initial value of the state variable, x0 , is given. The additional term
in the objective function, β T v(xT ), is the discounted scrap value, which is usually
specified in the model. In fact, the problem can be solved by starting from the last
period and working backward. The following example is taken from Carter (2001,
Chapter 7).3 It illustrates an application of dynamic programming in operations
management. The structure is similar to the “cake eating problem” described in
Adda and Cooper (2003, p. 14–15).
Example 9.5. A mining company has acquired a licence to extract a rare earth
element for three years. The geophysicist of the company has estimated that there
are 128 tonnes of the ore in the mine. The market price of the ore is $1 million per
tonne. The total cost of extraction is u2t /xt , where ut is the rate of extraction and
xt is the stock of the ore remaining in the ground. The discount rate β is assumed
to be 1. The problem can be set up as

    max_{u_t} Σ_{t=0}^{2} (u_t − u_t²/x_t) + v(x₃),

subject to

    x_{t+1} = x_t − u_t,  t = 0, 1, 2.
Since any quantity of ore remaining in the ground after year 3 has no value to the
company, we can assume that v(x₃) = 0. The Bellman equation in period t = 2 is

    v(x₂) = max_u { u − u²/x₂ }.

The first-order condition gives u = x₂/2, so that

    v(x₂) = x₂/2 − x₂²/(4x₂) = x₂/4.
The Bellman equation in period t = 1 is

    v(x₁) = max_u { u − u²/x₁ + v(x₂) }
          = max_u { u − u²/x₁ + (x₁ − u)/4 }.
9.4. AUTONOMOUS DYNAMICAL SYSTEMS
The first-order condition gives u = 3x₁/8, so that

    v(x₁) = 3x₁/8 − 9x₁²/(64x₁) + (1/4)(x₁ − 3x₁/8) = (25/64)x₁.
Similarly, the Bellman equation in period t = 0 is

    v(x₀) = max_u { u − u²/x₀ + v(x₁) }
          = max_u { u − u²/x₀ + (25/64)(x₀ − u) }.
The first-order condition gives u = (39/64)x₀/2 = 39 for x₀ = 128. The optimal
extraction plan is summarized below:

    t    x_t                  u_t
    0    128                  39
    1    89                   (3/8) × 89 = 33.4
    2    (5/8) × 89 = 55.6    (5/16) × 89 = 27.8
Notice that the cost of extraction is inversely related to the stock, so it is not possible
to extract all the mineral in the ground.
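The backward induction in example 9.5 can be automated. Since the scrap value v(x₃) = 0 is linear, the value function in every period is linear, v(x_t) = m_t x_t; the first-order condition 1 − 2u/x − m_{t+1} = 0 then gives u = (1 − m_{t+1})x/2 and the recursion m_t = (1 − m_{t+1})²/4 + m_{t+1} (our own restatement of the algebra above):

```python
# Backward induction for example 9.5.  With a linear continuation value
# v(x_{t+1}) = m_{t+1} x_{t+1}, the first-order condition of
# max_u { u - u^2/x + m_{t+1} (x - u) } gives u = (1 - m_{t+1}) x / 2
# and v(x_t) = m_t x_t with m_t = (1 - m_{t+1})^2 / 4 + m_{t+1}.
m = 0.0                       # slope of the scrap value v(x_3) = 0
slopes = [m]
for _ in range(3):            # work backward through t = 2, 1, 0
    m = (1 - m) ** 2 / 4 + m
    slopes.append(m)
slopes.reverse()              # slopes[t] = m_t; m_2 = 1/4, m_1 = 25/64

# Roll the optimal policy forward from x_0 = 128.
x = 128.0
plan = []
for t in range(3):
    u = (1 - slopes[t + 1]) * x / 2
    plan.append((t, round(x, 1), round(u, 1)))
    x -= u

print(plan)  # [(0, 128.0, 39.0), (1, 89.0, 33.4), (2, 55.6, 27.8)]
```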
    A = PΛPᵀ,

so that

    x_{t+1} = PΛPᵀx_t.

Iterating, and using PᵀP = I,

    x_{t+2} = Ax_{t+1} = A²x_t = (PΛPᵀ)(PΛPᵀ)x_t = PΛ²Pᵀx_t,

and in general

    x_{t+s} = PΛˢPᵀx_t.
It is obvious that for the system to converge to the origin, each of the eigenvalues must
be less than one in absolute value. In other words, the spectral radius of A, defined
as the largest of the absolute values of the eigenvalues, must be less than one. We
have shown the following result.⁴
have shown the following result.4
Theorem 9.2. Suppose that the transition function f is linear and all eigenvalues
of f have absolute values less than one. Then

    lim_{t→∞} f^t(x) = 0

for all x ∈ Rⁿ.

⁴Theorem 9.2 applies to asymmetric linear functions as well. See Devaney (2003) or Hasselblatt
and Katok (2003, Chapter 3) for details.
Notice that in the above example S₁ and S₂ are invariant subspaces; that is,
once an orbit enters one of the subspaces, it cannot escape, since Ax = λx. Formally,
a set S is invariant under a transition function f if x_t ∈ S implies that x_{t+s} ∈ S
for all s ≥ 1. Moreover, S1 is called an unstable subspace since any forward orbit
in it diverges. On the other hand, S2 is a stable subspace since any forward orbit
entering it converges to the origin.
[Figure 9.4: an orbit x_t, x_{t+1}, . . . near a saddle point, with subspaces S₁ and S₂ spanned by the eigenvectors v₁ and v₂.]
A fixed point p of a nonlinear system x_{t+1} = f(x_t) is called

1. a sink or attracting point if all the eigenvalues of Df(p) are less than one
in absolute value,
2. a source or repelling point if all the eigenvalues of Df (p) are greater than
one in absolute value,
3. a saddle point if some eigenvalues of Df (p) are less than one and some are
greater than one in absolute value.
Theorem 9.3. Suppose f has a sink at p. Then there exists an open set U with p ∈ U
in which the forward orbit of any point converges to p.
The largest such open set is called the basin of attraction. The proof of theorem
9.3 for the case of R² can be found in Devaney (2003, p. 216). A corollary of
the theorem is that if p is a source and f is invertible, then there exists an open set
in which the backward orbit of any point converges to p.
The case of a saddle point is more interesting. Recall the example of a linear
transition function in section 9.4.1 with eigenvalues λ₁ = 2 and λ₂ = 1/2. Therefore
p = 0 is a saddle point. There we found a stable and an unstable subspace, spanned
by the eigenvectors of λ₂ and λ₁ respectively. The equivalent invariant subsets
in the nonlinear case are not linear subspaces anymore but manifolds. Manifolds
are high-dimensional analogues of smooth curves (one-dimensional manifolds) or
surfaces (two-dimensional manifolds). A stable manifold is often called a saddle
path. Another example will illustrate the point.
so that y = x³ is the stable manifold. Any orbit entering the saddle path y = x³ will
converge to the origin. Otherwise it will diverge.
[Figure 9.5: phase diagrams — (a) the phase line ∆x = 0, (b) the phase line ∆y = 0, (c) the two combined, showing the direction of forward orbits in each region.]
    ∆x = Ax − x = (A − I)x
y ≥ −3x. The phase line is shown in figure 9.5(b), with points above the line going
upward in their forward orbits and points below going downward. Putting the two
phase lines together as in figure 9.5(c), we can infer the directions of the forward
orbits in different regions. The location of the saddle path can be approximated
using the information, which lies in the stable subspace in the direction of v2 in
figure 9.4.
In a linear system with a saddle point, a phase line is a hyperplane passing
through the fixed point. The hyperplane divides the whole space into two half-
spaces, with orbits going in opposite directions.
Example 9.9. Consider the Ramsey model we analyzed in examples 9.2 and 9.3.
We have x = [k c]T . The transition equations are the capital accumulation equa-
tion (9.11) and the Euler equation (9.12) of intertemporal consumption. In the
steady state, k = k ∗ and c = c∗ . The Euler equation reduces to F 0 (k ∗ ) = θ + δ so
that

    k* = (F′)⁻¹(θ + δ).

From the capital accumulation equation (9.11), the steady-state consumption is

    c* = F(k*) − δk*.

Also from (9.11), ∆k ≥ 0 if and only if

    c ≤ F(k) − δk.
With the usual assumptions on the production function F (the Inada condition),
the phase line ∆k = 0 has the shape shown in figure 9.6(a). Capital stock above this
phase line is decreasing and below increasing. From the Euler equation,
∆c ≥ 0 ⇔ ct+1 ≥ ct
⇔ U 0 (ct+1 ) ≤ U 0 (ct )
⇔ β(F 0 (kt+1 ) + 1 − δ) ≥ 1
⇔ F 0 (kt+1 ) ≥ θ + δ
⇔ kt+1 ≤ (F 0 )−1 (θ + δ)
⇔ kt+1 ≤ k ∗ .
[Figure 9.6: phase diagram of the Ramsey model — (a) the phase line ∆k = 0, (b) the phase line ∆c = 0 at k = k*, (c) the two combined, with the saddle path converging to (k*, c*).]
Since kt+1 = F (kt ) + (1 − δ)kt − ct , the above result means that ∆c ≥ 0 if and only if
F (k) + (1 − δ)k − c ≤ k ∗ ,
or
c ≥ F (k) + (1 − δ)k − k ∗ .
Figure 9.6(b) shows the phase line ∆c = 0. Consumption above this phase line is
increasing and below decreasing. Combining all the information in figure 9.6(c), we
see that the stable manifold or saddle path converges to the steady-state equilibrium
(k ∗ , c∗ ) in the regions between the two phase lines.
In practice, with nonlinear functional forms for U and F , an explicit formulation
for the stable manifold may not be possible. Many analysts choose to linearize the
system about the fixed point (k ∗ , c∗ ). Others may solve the nonlinear system by
using numerical methods.
2. The covariance matrix between two time periods, E[(xt+s − µ)(xt − µ)T ], de-
pends on the separation of time s = 0, 1, 2, . . ., but not on t.
Vector Autoregression
Some economists suggest that the stock market is a leading indicator for the ag-
gregate economy, but not the other way round.5 Moreover, equity returns are by
themselves random walks. Let xt = [rt γt ]T be the rate of return of the stock market
and growth rate of the economy respectively in period t and c = [c1 c2 ]T be their
average rates in the long run. The relation can be represented by the model
    [ r_{t+1} ]   [ c₁ ]   [ φ₁₁  φ₁₂ ] [ r_t ]   [ ε_{t1} ]
    [ γ_{t+1} ] = [ c₂ ] + [ φ₂₁  φ₂₂ ] [ γ_t ] + [ ε_{t2} ],

where ε_{t1} and ε_{t2} are white noise. The above ideas can be tested by the hypotheses
φ₁₁ = φ₁₂ = 0 and φ₂₁ > 0.
In general, let x_t be a state variable in period t and c be a constant vector, both
in R^n. Suppose ε_t is an n-dimensional vector of white noise, that is,
1. E[ε_t] = 0 ∈ R^n,
2. E[ε_t ε_s^T] = 0 ∈ R^{n×n} for t ≠ s,
3. E[ε_t ε_t^T] = Ω.
The first-order vector autoregression, or VAR(1), process is then
x_t = c + Φx_{t−1} + ε_t,    (9.25)
The process is stationary if all the eigenvalues of Φ have absolute values less than one. That is, x_t has constant mean and variance in every period.
Taking expected values of x_t in equation (9.25) gives E[x_t] = c + ΦE[x_{t−1}], which, with E[x_t] = E[x_{t−1}] = µ, gives
µ = E[x_t] = E[x_{t−1}] = (I_n − Φ)^{−1}c.
Finding the covariance matrix Σ of x_t is more involved, but it can be shown that Σ
satisfies the following discrete Lyapunov matrix equation:
Σ = ΦΣΦ^T + Ω.    (9.26)
See Stock and Watson (2001) for an introduction to applying VAR to macroeconomics.
Hamilton (1994) and Lütkepohl (2005) provide extensive coverage of the analysis
and estimation of VAR models.
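Equation (9.26) can also be solved directly: applying the vec operator turns Σ = ΦΣΦ^T + Ω into a linear system, since vec(ΦΣΦ^T) = (Φ ⊗ Φ)vec(Σ). A sketch, with arbitrary illustrative values for Φ and Ω (not from the text), solves the system and checks the answer by fixed-point iteration:

```python
import numpy as np

# illustrative VAR(1) coefficient matrix and noise covariance (assumed values)
Phi = np.array([[0.5, 0.1],
                [0.2, 0.4]])
Omega = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
n = Phi.shape[0]

# stationarity: all eigenvalues of Phi inside the unit circle
assert np.all(np.abs(np.linalg.eigvals(Phi)) < 1)

# vec form of the Lyapunov equation: (I - Phi (x) Phi) vec(Sigma) = vec(Omega)
Sigma = np.linalg.solve(np.eye(n * n) - np.kron(Phi, Phi),
                        Omega.flatten()).reshape(n, n)

# check by iterating Sigma <- Phi Sigma Phi^T + Omega to its fixed point
S = np.zeros_like(Omega)
for _ in range(500):
    S = Phi @ S @ Phi.T + Omega
print(np.allclose(Sigma, S))          # prints True: both methods agree
```

The Kronecker-product solve is exact in one step, while the iteration converges geometrically at the rate of the largest eigenvalue product; the agreement of the two confirms equation (9.26).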
Markov Chains
Now imagine we draw a random person from the work force in period t and let
π_t = [π_{t1} π_{t2}]^T be the probabilities that she is employed or unemployed respectively.
Here π_{t2} = 1 − π_{t1} is the unemployment rate of the economy in period t. The
probability distribution of this person's employment status in the next period is
\begin{bmatrix} π_{t+1,1} \\ π_{t+1,2} \end{bmatrix} = \begin{bmatrix} p_{11} & p_{21} \\ p_{12} & p_{22} \end{bmatrix} \begin{bmatrix} π_{t1} \\ π_{t2} \end{bmatrix},    (9.27)
where p_{ij} denotes the probability of moving from state i to state j.
Suppose that the average income of an employed worker is ȳ_1 and that of an unemployed
person is ȳ_2. Write ȳ = [ȳ_1 ȳ_2]^T. Then y_t = ȳ^T x_t is a random variable
of income in period t. To summarize, {y_t} is a stochastic process of labour income
with probability distribution π_t in period t.
In the general setup, the state of the world consists of n possible outcomes,
denoted by the state vectors x_t ∈ {e_1, . . . , e_n}. The conditional probability that
state j occurs in period t + 1 given that state i occurs in period t is
p_{ij} = Pr(x_{t+1} = e_j | x_t = e_i),    i, j = 1, . . . , n.
The probability distribution of the n states in each period t is given by an n × 1 vector
π_t = [π_{t1} · · · π_{tn}]^T, with \sum_{i=1}^{n} π_{ti} = 1. The transition equation for the probability
distribution is therefore6
π_{t+1} = P^T π_t.
6
Some authors prefer to write π_t as a row vector, so that the transition equation is π_{t+1} = π_t P.
Others define the transition matrix P as the transpose of the one specified in equation (9.28);
consequently the transition equation is π_{t+1} = Pπ_t.
It can be shown (see exercise 23 below) that P^T has at least one eigenvalue equal
to one. That is, there exists an eigenvector π ∈ S such that
π = P^T π,    (9.30)
which implies that π is a fixed point, called in this context a stationary probability
distribution of the Markov chain. A Markov chain is called regular if some power
of the transition matrix has only positive elements, that is, if there is an integer k
such that P^k is a positive matrix. It turns out that for a regular Markov chain, all
initial probability distributions converge to a unique fixed point. The process is
said to be asymptotically stationary, with
lim_{t→∞} P^t = Π,
where every row of Π equals π^T. In practice, the stationary distribution can be computed by combining the normalization
1^T π = 1,    (9.31)
with the stationarity condition
(I_n − P^T)π = 0.    (9.32)
The rows of I_n − P^T sum to the zero vector, so the n equations in (9.32) are linearly dependent, and we may replace the first of them with (9.31). The result is a nonsingular system
Aπ = e_1,    (9.33)
where A is I_n − P^T with its first row replaced by a row of ones and e_1 = [1 0 · · · 0]^T.
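The system (9.33) gives a direct way to compute π. A sketch with a hypothetical two-state employment transition matrix (the probabilities below are invented for illustration) solves Aπ = e_1 and confirms that powers of P^T drive an arbitrary initial distribution to the same fixed point:

```python
import numpy as np

# hypothetical two-state transition matrix (rows sum to one):
# state 1 = employed, state 2 = unemployed
P = np.array([[0.9, 0.1],
              [0.6, 0.4]])
n = P.shape[0]

# build A: (I - P^T) with its first row replaced by a row of ones
A = np.eye(n) - P.T
A[0, :] = 1.0
e1 = np.zeros(n)
e1[0] = 1.0

pi = np.linalg.solve(A, e1)           # stationary distribution

# regularity: P itself is positive, so any initial distribution converges
pi0 = np.array([0.5, 0.5])
print(np.allclose(np.linalg.matrix_power(P.T, 100) @ pi0, pi))  # prints True
```

For this P the stationary unemployment rate is π_2 = 1/7, and since the chain is regular, P^t converges at the rate of the second eigenvalue, p_{11} + p_{22} − 1 = 0.3.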
See Grinstead and Snell (1997) and Hamilton (1994) for introductions to Markov
chains; Levin, Peres, and Wilmer (2009) provides a more advanced treatment.
9.5 Exercise
1. Draw the graphs showing the dynamics of equation (9.1) for the cases of −1 <
a < 0 and a < −1.
a_{t+1} = (1 + r_t)(a_t + x_t − c_t),
(a) Set up the Bellman equation and show that the Euler equation is
βE_t[(U′(c_{t+1})/U′(c_t))(1 + r_{t+1})] = 1.
(b) Suppose that utility is represented by the quadratic function U(c_t) = −(c_t −
γ)^2 and r_t = θ for all periods. Show that consumption is a martingale, that
is, E_t c_{t+1} = c_t.
7. Consider the lump-sum taxation model where the central planner maximizes household
utility
\sum_{s=0}^{∞} β^s c_{t+s}^{α} g_{t+s}^{1−α},    0 < α < 1,
subject to w_{t+1} = w_t − c_t with the initial size of the cake w_0 given. In each period
the consumer eats part of the cake (c_t) but must save the rest.
10. (Ljungqvist and Sargent, 2004, p. 110) Consider the linear quadratic model in
section 9.3.3 with β = 1. Suppose we know that the value function is also a
11. In a linear dynamical system in R^2 where |λ_1| > 1 and |λ_2| < 1, what is the
necessary condition for the system to converge to the origin?
13. Describe the dynamics of the following linear maps with matrix representations
(a) \begin{pmatrix} 1/2 & 1 \\ 1 & 1 \end{pmatrix},    (b) \begin{pmatrix} −2 & 0 \\ 0 & 2 \end{pmatrix},    (c) β\begin{pmatrix} \cos α & −\sin α \\ \sin α & \cos α \end{pmatrix},
with 0 ≤ α < 2π and β > 0.
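For intuition on case (c): the map rotates each point by α and scales its distance from the origin by β, so orbits spiral inward when β < 1 and outward when β > 1. A numerical sketch with arbitrarily chosen α and β:

```python
import numpy as np

alpha, beta = np.pi / 6, 0.9  # arbitrary rotation angle and scaling factor
R = np.array([[np.cos(alpha), -np.sin(alpha)],
              [np.sin(alpha),  np.cos(alpha)]])
A = beta * R

x = np.array([1.0, 0.0])
radii = []
for _ in range(20):
    x = A @ x                      # rotate by alpha, scale by beta
    radii.append(np.linalg.norm(x))

# rotation preserves the norm, so the radius shrinks geometrically at rate beta
print(np.allclose(radii, beta ** np.arange(1, 21)))  # prints True
```

Since the rotation leaves the norm unchanged, the distance from the origin after t steps is exactly β^t times the initial distance, which is the key fact for describing the dynamics.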
14. Suppose that all the eigenvalues of a square matrix A have absolute values less than
one.
(a) Show that the matrix I − A is invertible. (Hint: Let (I − A)x = 0, where x
is a column vector. Show that x = 0.)
(b) Prove that (I − A)^{−1} = I + A + A^2 + · · · .
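Part (b) is the matrix version of the geometric series. A numerical sanity check, with an arbitrary matrix whose eigenvalues (0.5 and 0.2) lie inside the unit circle:

```python
import numpy as np

A = np.array([[0.4, 0.2],
              [0.1, 0.3]])  # eigenvalues 0.5 and 0.2, inside the unit circle
assert np.all(np.abs(np.linalg.eigvals(A)) < 1)

inv = np.linalg.inv(np.eye(2) - A)

# partial sum I + A + A^2 + ... ; the terms A^k vanish geometrically
S, term = np.zeros((2, 2)), np.eye(2)
for _ in range(200):
    S += term
    term = term @ A

print(np.allclose(S, inv))  # prints True
```

The partial sums converge because ||A^k|| shrinks at the rate of the largest eigenvalue, which is exactly the condition in the exercise.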
x_{t+1} = Ax_t + b,
[Figure: the 45◦ line in the (x, y) plane, with the points a and b marked on the horizontal axis.]
16. Derive the equations for the phase lines for the dynamical system in the example
of section 9.4.2:
f(x, y) = (x/2 + 15, 2y − (3/8)x).
Draw the phase diagram and locate the stable and unstable manifolds.
where k_t is the capital stock in period t, and s, δ, and n are the exogenous rates
of saving, depreciation, and population growth, respectively. The production
function F satisfies the Inada conditions.
18. Growth in the Solow-Swan model is forced by the strong assumptions on the
production function. Here is another model that is more realistic at low income
levels. Let x_t be the income per person of an economy at period t. Suppose that
x_{t+1} = f(x_t) where
f(x) = \begin{cases} (b − a)^{1−α}(x − a)^{α} + a, & \text{if } a ≤ x ≤ b, \\ (x − b)^{β} + b, & \text{if } x > b. \end{cases}
The region a ≤ x ≤ b is called a poverty trap. See figure 9.7 and Banerjee and
Duflo (2011, p. 12). The book provides justifications for the shape of the growth
curve.
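To see the dynamics, the map can be iterated from different starting incomes. A sketch with arbitrarily chosen parameters a = 1, b = 2, α = β = 1/2 (the text fixes none of these): incomes starting inside (a, b] converge to b, the upper edge of the trap, while incomes starting above b converge to the higher steady state b + 1.

```python
# assumed parameter values, chosen only for illustration
a, b, alpha, beta = 1.0, 2.0, 0.5, 0.5

def f(x):
    if a <= x <= b:
        return (b - a)**(1 - alpha) * (x - a)**alpha + a
    return (x - b)**beta + b   # branch for x > b

def limit(x0, n=200):
    """Iterate the growth map n times from initial income x0."""
    x = x0
    for _ in range(n):
        x = f(x)
    return x

# inside the trap the economy stalls at b; above it, growth continues to b + 1
print(round(limit(1.5), 4), round(limit(2.5), 4))
```

With these concave branches, f(b) = b is a fixed point that incomes in the trap region approach but never cross, which is one way to read the poverty trap: where you end up depends entirely on which side of b you start.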
19. Consider a stationary VAR(1) process as in equation (9.25). Let x̄_t = x_t − µ, the
deviation of x_t from the mean.
(a) Show that x̄_t = Φx̄_{t−1} + ε_t.
(b) As in the theory of ordinary least squares, the error term ε_t is assumed to be
uncorrelated with x̄_{t−1}, that is, E[x̄_{t−1}ε_t^T] = 0 ∈ R^{n×n}. Show that the
covariance matrix Σ of x_t satisfies equation (9.26).
E[x_{t+1} | x_t] = P^T x_t.
7
“Six Years into a Lost Decade,” The Economist, August 6, 2011.
22. Show that the eigenvalues of the two-state Markov chain defined in equation (9.27)
are λ_1 = 1 and λ_2 = p_{11} + p_{22} − 1.
Bibliography
Adda, Jérôme and Russell Cooper (2003) Dynamic Economics, Cambridge: The
MIT Press.
Artin, Emil (1964) The Gamma Function, New York: Holt, Rinehart and Winston.
Banerjee, Abhijit and Esther Duflo (2011) Poor Economics: A Radical Rethinking
of the Way to Fight Global Poverty, New York: PublicAffairs.
Beck, Matthias and Ross Geoghegan (2010) The Art of Proof, New York: Springer
Science+Business Media.
Box, George E.P., Gwilym M. Jenkins, and Gregory C. Reinsel (1994) Time Series
Analysis: Forecasting and Control, Third edition, Englewood Cliffs: Prentice-
Hall, Inc.
Brosowski, Bruno and Frank Deutsch (1981) “An Elementary Proof of the Stone-
Weierstrass Theorem,” Proceedings of the American Mathematical Society,
81(1), 89–92.
Day, Richard H. (1999) Complex Economic Dynamics, Volume II, Cambridge: The
MIT Press.
Devlin, Keith (1993) The Joy of Sets, Second Edition, New York: Springer Sci-
ence+Business Media.
Ellis, George (2012) “On the Philosophy of Cosmology,” Talk at Granada Meeting,
2011. Available at <http://www.mth.uct.ac.za/~ellis/philcosm_18_04_2012.pdf>
Hasselblatt, Boris and Anatole Katok (2003) A First Course in Dynamics: with a
Panorama of Recent Developments, Cambridge: Cambridge University Press.
Hoffman, Kenneth and Ray Kunze (1971) Linear Algebra, Second Edition, Engle-
wood Cliffs: Prentice-Hall.
Holt, Jim (2008) “Numbers Guy,” The New Yorker, March 3 issue.
Jehle, Geoffrey A. and Philip J. Reny (2011) Advanced Microeconomic Theory, Third
edition, Harlow: Pearson Education Limited.
Lancaster, Kelvin (1968) Mathematical Economics, New York: The Macmillan Com-
pany.
Lay, Steven R. (2000) Analysis with an Introduction to Proof, Third edition, Upper
Saddle River: Prentice Hall.
Levin, David A., Yuval Peres, and Elizabeth L. Wilmer (2009) Markov Chains and
Mixing Times, Providence: American Mathematical Society. (Chapters 1 and
2 downloadable from the AMS web site.)
Lütkepohl, Helmut (2005) New Introduction to Multiple Time Series Analysis, Berlin:
Springer-Verlag.
Marsden, Jerrold E. and Anthony J. Tromba (1988) Vector Calculus, Third edition,
W.H. Freeman and Company.
Matson, John (2007) “Strange but True: Infinity Comes in Different Sizes,” Scientific
American, July 19 issue.
McLennan, Andrew (2014) Advanced Fixed Point Theory for Economics, Available
at <http://cupid.economics.uq.edu.au/mclennan/Advanced/advanced fp.pdf>.
Overbye, Dennis (2014) “In the End, It All Adds Up to −1/12,” The New York
Times, February 3 issue.
Paulos, John Allen (2011) “The Mathematics of Changing Your Mind,” The New
York Times, August 5 issue.
Poston, Tim and Ian Stewart (1978) Catastrophe Theory and its Applications, Lon-
don: Pitman.
Rao, C.R. (1973) Linear Statistical Inference and Its Applications, Second Edition,
New York: John Wiley & Sons.
Rosenlicht, Maxwell (1968) Introduction to Analysis, Scott, Foresman and Co. (1986
Dover edition).
Ross, John F. (2004) “Pascal’s legacy,” EMBO Reports, Vol. 5, Special issue, S7–
S10.
Royden, H.L. and P.M. Fitzpatrick (2010) Real Analysis, Boston: Prentice Hall.
Rudin, Walter (1976) Principles of Mathematical Analysis, Third edition, New York:
McGraw-Hill.
Stokey, Nancy L. and Robert E. Lucas (1989) Recursive Methods in Economic Dy-
namics, Cambridge: Harvard University Press.
Strogatz, Steven (2010) “Division and Its Discontents,” The New York Times, Febru-
ary 21 issue.
Tversky, Amos and Daniel Kahneman (1974) “Judgment under Uncertainty: Heuris-
tics and Biases,” Science, 185(4157), 1124–1131.
Weitzman, Martin L. (2003) Income, Wealth, and the Maximum Principle, Cam-
bridge: Harvard University Press.