
Math 99r: Mathematical Economics

Peter S. Park

April 29, 2020


Preferences: Choice correspondence

Consider the problem of representing the preference of a


decision-maker (DM) who consistently chooses among a menu of
alternatives in a choice set X .
The first way is by cataloguing the DM’s choice for each nonempty
subset (menu) A ⊆ X of alternatives.
This can be done with a choice correspondence (on the set M(X ) of
subsets of X ).

Definition
A choice correspondence c : M(X) ⊸ X is a correspondence defined on
the menus A ⊆ X that satisfies ∅ ≠ c(A) ⊆ A, representing the DM's
possible choices when choosing between the alternatives in A.
Preferences: Preference relation

When only cataloguing the DM’s choice for pairs A = {x, y } of


alternatives, we instead have the second way of representing the DM’s
preference: a preference relation.

Definition
A preference relation ≾ is a binary relation (i.e., a subset of X × X ) where
x ≾ y means that the DM weakly prefers y over x.

A naive translation to the language of choice correspondences would


be that x ≾ y should be equivalent to y ∈ c({x, y }).
Preferences: Utility function

The third standard way to represent the DM’s preference is through a


utility function u : X → R.
The utility function has the following meaning: for x, y ∈ X , we have
x ≾ y if and only if u(x) ≤ u(y ).
In order of parsimony, we have

choice correspondence ≤ preference relation ≤ utility function,

with the utility function being the most parsimonious, perhaps the
reason for its ubiquity in economics.
Not all choice correspondences can be represented by preference relations,
and not all preference relations can be represented by utility functions.
Let’s try to see precisely when we can do so.
Preferences: Revealed preference

Given a choice correspondence c, define its revealed preference


relation ≾ by
x ≾ y ⇐⇒ y ∈ c({x, y }).
It is clear that the revealed preference relation is unique.
The problem is trying to invert this procedure.
Doing so, if possible, would be to—when given ≾—define c≾ by

c≾ (A) = {x ∈ A : y ≾ x for all y ∈ A}.

But in general, such a maximal element x of A may not exist.


Preferences: Axiom of completeness

Let’s list possible obstructions to the existence of a maximal element


of a menu A.
First, there cannot be a maximal element if one cannot pairwise
compare all elements.
This motivates assuming ≾ satisfies the axiom of completeness.

Definition
A preference relation ≾ is complete if and only if for any two x, y ∈ X, we
have x ≾ y or y ≾ x.
Preferences: Axiom of transitivity
Even if ≾ is complete, there might not be a maximal element.
For example, suppose X is the list of candidates running for a political
position and ≾ denotes pairwise comparisons for candidates x, y ∈ X .
Define majority voting by the voting system that crowns the candidate
who wins more votes in pairwise comparisons with every other
candidate.
It is well-known that majority voting may fail to produce a winner, i.e.,

{x ∈ X : y ≾ x for all y ∈ X}

may be empty.


How so? The obstruction is that there could be more than two
candidates x1 , . . . , xn such that

x1 ≺ x2 ≺ · · · ≺ xn ≺ x1 ,

so that none of x1 , . . . , xn is weakly preferred to all the others.


Preferences: Axiom of transitivity, pt. 2

This second obstruction to the existence of a maximal element motivates
assuming ≾ satisfies the axiom of transitivity.

Definition
A preference relation ≾ is transitive if and only if for every x, y , z ∈ X such
that x ≾ y and y ≾ z, we have x ≾ z.

Colloquially, the axiom represents a “consistency of ordering”:
the (at least weak) preference for x over y that the DM reveals in her
choice from one menu A applies consistently to her choice from any other
menu B as well.
The axioms of completeness and transitivity together comprise
rationality.
Preferences: Rationality

We can analogously axiomatize rationality in the context of choice


correspondences rather than that of preference relations.
Sen did so in the following way.

Definition (Sen’s Property α, also called Arrow’s Property IIA)


Whenever for menus A and B and element x ∈ B ⊆ A we have x ∈ c(A),
it follows that x ∈ c(B).

Definition (Sen’s Property β)


Whenever for menus A and B and elements x, y ∈ c(A) we have A ⊆ B
and y ∈ c(B), it follows that x ∈ c(B).
Preferences: Weak axiom of revealed preference

Together, they comprise the well-known Weak Axiom of Revealed


Preference (WARP), the rationality axiom that is omnipresent in
classical economic theory.

Definition (WARP)
Whenever for menus A and B and elements x, y ∈ A ∩ B we have x ∈ c(A)
and y ∈ c(B), it follows that x ∈ c(B).

Exercise
Prove that a choice correspondence c satisfies WARP if and only if it
satisfies Sen’s Properties α and β.
Preferences: Representations of rational preference

One can ponder the relationship between transitivity, the rationality


axiom of preference relations, and WARP, the rationality axiom of
choice correspondences.

Theorem (Arrow)
For a choice correspondence c, there exists a complete and transitive
preference ≾ such that c = c≾ if and only if c satisfies WARP. The only
such preference relation is the revealed preference relation.

Proof. ( ⇐= ): Define ≾ by the revealed preference

x ≾ y ⇐⇒ y ∈ c({x, y }).

Since c({x, y }) is nonempty, x or y is in c({x, y }), which proves


completeness.
Preferences: Representations of rational preference, pt. 2

Suppose x ≾ y and y ≾ z. We need to show x ≾ z.


The nonempty set c({x, y, z}) contains some element.
If it is z, we are done: by WARP (specifically, Sen's Property α), z ∈ c({x, z}), i.e., x ≾ z.
If it is y, then by WARP z is also in c({x, y, z}), and we are back in the previous case.
If it is x, then by WARP y is also in c({x, y, z}), and then by WARP
z is also in c({x, y, z}).
We have proven transitivity.
Preferences: Representations of rational preference, pt. 3

( =⇒ ): Under our hypothesis, suppose we have menus A and B and


elements x, y ∈ A ∩ B such that x ∈ c(A) and y ∈ c(B).
Since x ∈ c(A), we have z ≾ x for all z ∈ A, and in particular, y ≾ x.
Also, since y ∈ c(B), we have z ≾ y, and by transitivity
z ≾ x, for all z ∈ B.
It follows that x ∈ c(B).
Preferences: Representations of rational preference, pt. 4

Proving c(A) ⊆ c≾ (A): Assume x ∈ c(A).


By WARP (specifically, Sen’s Property α), we have x ∈ c({x, y }), i.e.,
y ≾ x, for any y ∈ A.
Thus, x ∈ c≾ (A) = {z ∈ A : y ≾ z for all y ∈ A}.
Proving c(A) ⊇ c≾ (A): Next, assume x ∈ c≾ (A), i.e., for any y ∈ A,
we have x ∈ c({x, y}).
By nonemptiness, some y ∈ c(A); by Sen's Property α, y ∈ c({x, y}) as well.
But then by WARP (specifically, Sen's Property β applied to {x, y} ⊆ A), we have x ∈ c(A).
Uniqueness: Clear. Q.E.D.
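For a finite X, the objects in Arrow's theorem can be checked mechanically. A minimal Python sketch (illustrative, not from the handout; the helper names and the three-element example are assumptions): it checks WARP directly, extracts the revealed preference from two-element menus, and verifies that c = c≾.

from itertools import combinations

def menus(X):
    # All nonempty menus A ⊆ X.
    return [frozenset(s) for r in range(1, len(X) + 1) for s in combinations(X, r)]

def satisfies_warp(c):
    # WARP: x, y ∈ A ∩ B, x ∈ c(A), y ∈ c(B)  =>  x ∈ c(B).
    for A, cA in c.items():
        for B, cB in c.items():
            for x in cA:
                for y in cB:
                    if x in B and y in A and x not in cB:
                        return False
    return True

def revealed_preference(c):
    # x ≾ y iff y ∈ c({x, y}); returned as a set of ordered pairs (x, y).
    pref = set()
    for A, cA in c.items():
        if len(A) == 1:
            (x,) = tuple(A)
            pref.add((x, x))
        elif len(A) == 2:
            x, y = tuple(A)
            if y in cA:
                pref.add((x, y))
            if x in cA:
                pref.add((y, x))
    return pref

def c_from_preference(A, pref):
    # c≾(A) = {x ∈ A : y ≾ x for all y ∈ A}.
    return {x for x in A if all((y, x) in pref for y in A)}

# Example: X = {a, b, d} ranked a ≾ b ≾ d, with c(A) = the top-ranked elements of A.
X = ["a", "b", "d"]
rank = {"a": 0, "b": 1, "d": 2}
c = {A: {x for x in A if rank[x] == max(rank[y] for y in A)} for A in menus(X)}
assert satisfies_warp(c)
pref = revealed_preference(c)
assert all(c[A] == c_from_preference(A, pref) for A in menus(X))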
Preferences: Reverse-engineering the preference

The above problem—guaranteeing the nonemptiness of the


maximizing set of every menu—hints at the sheer informational
quantity in providing a choice for every menu A.
If you have n sets, then it is easy to construct a choice function that
selects an element in each of the n sets (e.g., just choose your favorite
element n times).
But what if n is infinite?
In general, you cannot choose an element for infinitely many sets, at
least not without assuming you can (axiom of choice).
Preferences: Reverse-engineering the preference, pt. 2

Even when comparing finite observations, the amount of data needed


to reverse-engineer the DM’s preference may be unrealistically large.
Realistically, we only have choice data on a subset M ⊆ M(X ).

Theorem (Sen)
Suppose that a choice correspondence c is defined on a subset M ⊆ M(X )
that contains all menus of up to three elements. There exists a complete
and transitive preference ≾ such that c = c≾ if and only if c satisfies
WARP. The only such preference relation is the revealed preference relation.

Exercise
Prove the above theorem. Does it still hold if we only assume that M
contains all menus of up to two elements?
Preferences: Utility representations

We next discuss utility representations of (rational) preferences.


For any order-preserving (i.e., strictly increasing) transformation
T : R → R, we have that any utility function u : X → R and T ◦ u
both represent the same preference.
In other words, the map sending each utility function to the preference
it represents is not injective.
However, the map sending each equivalence class of utility functions u
to the preference it represents—where two utility functions u and v
are equivalent if and only if v = T ◦ u for some strictly increasing
map T : R → R—is clearly injective.
A question remains: what is the image of this map?
Preferences: What is the image of this map?

Essentially, the answer is the subset of rational preferences.


The qualifier “essentially” is not needed when X is countable.

Proposition
Suppose X is finite or countably infinite. The preferences that correspond
to utility-function equivalence classes are precisely those that correspond to
complete and transitive preference relations.

Proof. ( =⇒ ): Clear.
Preferences: Rational = utility-representable
( ⇐= ): For every x ∈ X , define

W (x) = {z ∈ X : z ≾ x}.

Let µ be an arbitrary probability distribution on X whose support is all


of X .
Let's show that the utility function

u(x) = Σ_{z∈W(x)} µ(z)

works, i.e., x ≾ y if and only if u(x) ≤ u(y).


For the forward direction, transitivity means that x ≾ y implies
W (x) ⊆ W (y ), so u(x) ≤ u(y ).
For the backward direction, assume that x ≾ y is not true.
Then, by completeness, y ≺ x, which implies W(y) ⊊ W(x) (the
inclusion is strict since x ∈ W(x) \ W(y)).
So, u(y) < u(x), as needed. Q.E.D.
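A small numeric sketch (illustrative, not from the handout) of this construction on a finite X, with µ the uniform distribution:

# Finite X with a complete, transitive preference encoded by a rank (ties allowed).
X = ["w", "x", "y", "z"]
rank = {"w": 0, "x": 1, "y": 1, "z": 2}      # x ~ y, both strictly above w and below z

def weakly_below(a, b):                       # a ≾ b
    return rank[a] <= rank[b]

mu = {a: 1 / len(X) for a in X}               # uniform full-support distribution on X

def u(a):                                     # u(a) = Σ_{z ∈ W(a)} µ(z)
    return sum(mu[b] for b in X if weakly_below(b, a))

for a in X:
    for b in X:
        assert weakly_below(a, b) == (u(a) <= u(b))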
Preferences: Uncountably infinite case

The above does not hold if X is uncountably infinite.


For example, consider X = R2 with the lexicographic preference
relation ≾, in which (x1 , y1 ) ≾ (x2 , y2 ) if and only if x1 < x2 or
[x1 = x2 and y1 ≤ y2 ].
Notice that

(x1, 0) ≺ (x1, 1) ≺ (x2, 0) ≺ (x2, 1)

for all real numbers x1 < x2.
So, if a utility representation u existed, the positive-measure intervals
[u((x, 0)), u((x, 1))] for x ∈ R would all be disjoint.

Exercise
Why can’t uncountably infinitely many positive-measure intervals disjointly
fit into R?
Preferences: Uncountably infinite case, pt. 2

One way to guarantee utility representations for preference relations on


uncountably infinite X is to restrict to continuous preference
relations/utility functions.

Definition
A preference relation ≾ is continuous if and only if the upper- and
lower-contour sets, {y ∈ X : x ≾ y } and {y ∈ X : y ≾ x}, are closed.

Theorem (Main theorem)


Let ≾ be a continuous preference relation on an arbitrary topological space
X . Then, ≾ has a utility representation if and only if it has a continuous
utility representation.
Preferences: Continuous utility representations
To prove this, we will use a theorem of Wold, the first person to prove
an existence result for continuous utility representations.

Theorem (Wold)
Let ∼ be an equivalence relation on [0, 1] such that every equivalence class
is a closed interval. Then, there exists a continuous and nondecreasing map
T : [0, 1] → [0, 1] such that T (x) = T (y ) if and only if x ∼ y .

Exercise
Prove this. (Hint: If we have a sequence of continuous and nondecreasing
maps Tₙ : [0, 1] → [0, 1] such that |Tₙ₊₁(x) − Tₙ(x)| ≤ Mₙ for constants
Mₙ that satisfy Σ_{n=0}^∞ Mₙ < ∞, we have a well-defined continuous and
nondecreasing map T(x) = lim_{n→∞} Tₙ(x) = T₀(x) + Σ_{n=0}^∞ (Tₙ₊₁(x) − Tₙ(x)),
since the series converges uniformly by the Weierstrass M-test. Construct
the Tₙ by iteration so that their uniform limit T will have the desired
property that T(x) = T(y) if and only if x ∼ y.)
Preferences: Continuous utility representations, pt. 2
We define an equivalence relation ∼ on [0, 1]:
Fix a subset S ⊆ [0, 1].
A lacuna of S is an interval with at least two points that is disjoint
from S and has upper and lower bounds in S.
A gap is a maximal lacuna.
If there are two gaps of S of the form [a, b) and (b, c], then label
[a, c] as an equivalence class.
Any half-open, half-closed gap of S that does not satisfy the above
condition has its closure as an equivalence class.
All other points are single-element equivalence classes.

Lemma (Gap lemma)


Any function T that satisfies Wold’s theorem for the above equivalence
relation ∼ is strictly increasing on S. Furthermore, no gap of T (S) is of
the form (a, b] or [a, b).
Preferences: Motivation for the gap lemma

Intuition: for a strictly increasing function T , continuity can be


“encapsulated” by the lack of half-open, half-closed gaps in its image.
e.g., Draw strictly increasing functions on [0, 1] that are continuous
except at 1/2,
draw strictly increasing functions on [0, 1] ∪ [2, 3] that are continuous,
and look at the gaps of the images.
(A continuous, strictly increasing function T on [0, 1] ∪ (2, 3] can have
a half-open, half-closed gap, but such a gap is avoidable since you can
set T(1) and lim_{x→2⁺} T(x) to be equal.)
Goal: Given a utility function u : X → R that is not necessarily
continuous, apply a “continuizing” T that eliminates all half-open,
half-closed gaps (i.e., such that no gaps of T (u(X )) are of this form),
and show that the equivalent utility function T ◦ u is continuous.
Preferences: Proof of the gap lemma

Proof. The intersection of S with any equivalence class is at most one


point.
So, since the nondecreasing function T is only constant on
equivalence classes, T is strictly increasing on S.
Suppose for the sake of a contradiction that (a, b] is a gap of T (S).
Then, there exists maximal c ∈ S such that T (c) = a, and a
decreasing sequence of numbers dn ∈ S such that T (dn ) is decreasing
and converging to b.
We have that ⋂_{n=0}^∞ (c, dₙ) is disjoint from S, so dₙ converges to some d
such that T(d) = b.
But then, (c, d] is a gap of S.
By the definition of ∼, we obtain the contradiction T (c) = T (d).
The proof for [a, b) is analogous. Q.E.D.
Preferences: Proof of the main theorem

Proof. ( ⇐= ): Immediate.
( =⇒ ): Let u be a utility representation of ≾. Composing with a strictly increasing
homeomorphism from R onto (0, 1), we may assume u(X) ⊆ [0, 1]. Apply Wold's theorem
to the equivalence relation ∼ constructed from S = u(X) as above, and let T be the
resulting map. By the gap lemma, T is strictly increasing on u(X), so f = T ◦ u is
another utility representation of ≾, and no gap of f(X) = T(u(X)) is half-open, half-closed.
We now show that f is continuous. First, we show that it is
upper-semicontinuous, i.e., f⁻¹([r, ∞)) is closed for all r.
Case 1: r ∈ f (X ), say f (a) = r .
Case 2: r bounds f (X ) from above.
Case 3: r bounds f (X ) from below.
Case 4: The gap of f (X ) containing r is open, say (c, d).
Case 5: The gap of f (X ) containing r is closed, say [c, d].
These exhaust the casework, since no gap of f (X ) is half-open and
half-closed.
The proof of lower-semicontinuity is analogous.
Preferences: Proof of the main theorem, pt. 2

Case 1: Suppose r ∈ f (X ), say f (a) = r .


Then,
f −1 ([r , ∞)) = {x ∈ X : a ≾ x},
which is closed by the definition of continuity for preference relations.
Case 2: Suppose r bounds f (X ) from above.
Then,
f −1 ([r , ∞)) = ∅,
which is trivially closed.
Case 3: Suppose r bounds f (X ) from below.
Then,
f −1 ([r , ∞)) = X ,
which is trivially closed.
Preferences: Proof of the main theorem, pt. 3

Case 4: Suppose the gap of f (X ) containing r is open, say (c, d).


We have f (b) = d for some b ∈ X .
Then, f −1 ([r , ∞)) = f −1 ([d, ∞)) = {x ∈ X : b ≾ x}, which is closed.
Case 5: Suppose the gap of f (X ) containing r is closed, say [c, d].
There is a sequence of elements aₙ ∈ X such that f(aₙ) is
increasing and converges to c. Then,

f⁻¹([r, ∞)) = f⁻¹([c, ∞)) = ⋂_{n=0}^∞ f⁻¹([f(aₙ), ∞)) = ⋂_{n=0}^∞ {x ∈ X : aₙ ≾ x},

which is an intersection of closed sets and therefore closed. Q.E.D.


Preferences: Rational = utility-representable (continuous)

We have just shown that if a continuous preference relation on an


arbitrary choice space has a utility representation, then we can modify
the utility function to be continuous.
We now show that a continuous preference relation on a “nice” choice
space has a utility representation, and consequently, has a continuous
utility representation.
There are at least two notions of “nice” that work, and both are useful
in different situations.
The first is due to the topologist Eilenberg, and the second is due to
the mathematical economist Debreu.
Preferences: Eilenberg’s result

Theorem (Eilenberg)
Suppose X is connected and separable. The preferences that correspond to
continuous-utility-function equivalence classes are precisely those that
correspond to complete, transitive, and continuous preference relations.

Proof. ( =⇒ ): Clear.
( ⇐= ): By the main theorem, it suffices to show that ≾ has a utility
representation (we don’t need to check continuity).
By assumption, X has a countable dense subset Z = {zₙ : n ∈ Z≥0}.
For every x ∈ X, define
N(x) = {n ∈ Z≥0 : zₙ ≺ x}.
Let µ be an arbitrary probability distribution on Z≥0 whose support is
all of Z≥0.
Let's show that the utility function u(x) = Σ_{n∈N(x)} µ(n) works, i.e.,
x ≾ y if and only if u(x) ≤ u(y).
Preferences: Eilenberg’s result, pt. 2

Need to show: x ≾ y if and only if u(x) ≤ u(y ).


For the forward direction, transitivity means that x ≾ y implies
N(x) ⊆ N(y ), so u(x) ≤ u(y ).
For the backward direction, assume that x ≾ y is not true.
Then, by completeness, y ≺ x.
By the continuity of ≾, the nonempty, disjoint sets {a ∈ X : a ≾ y}
and {a ∈ X : x ≾ a} are closed.
By the connectedness of X, their union is not all of X.
So, the open set {a ∈ X : y ≺ a ≺ x} is nonempty, and thus contains
some zₙ.
Then, n ∈ N(x) \ N(y ), so u(y ) < u(x), as needed. Q.E.D.
Preferences: Debreu’s result

Theorem (Debreu)
Suppose X is second-countable. The preferences that correspond to
continuous-utility-function equivalence classes are precisely those that
correspond to complete, transitive, and continuous preference relations.

Proof. ( =⇒ ): Clear.
( ⇐= ): By our main theorem, it suffices to show that ≾ has a utility
representation (we don’t need to check continuity).
By assumption, X has a countable basis {Uₙ}_{n∈Z≥0}.
For every x ∈ X, define
N(x) = {n ∈ Z≥0 : z ≺ x for all z ∈ Uₙ}.
Let µ be an arbitrary probability distribution on Z≥0 whose support is
all of Z≥0.
Let's show that the utility function u(x) = Σ_{n∈N(x)} µ(n) works, i.e.,
x ≾ y if and only if u(x) ≤ u(y).
Preferences: Debreu’s result, pt. 2

Need to show: x ≾ y if and only if u(x) ≤ u(y ).


For the forward direction, transitivity means that x ≾ y implies
N(x) ⊆ N(y ), so u(x) ≤ u(y ).
For the backward direction, assume that x ≾ y is not true.
Then, by completeness, y ≺ x.
By the continuity of ≾, the set V = {a ∈ X : a ≺ x} is an open
neighborhood of y.
Thus, for some n, we have y ∈ Uₙ ⊆ V.
It follows that n ∈ N(x) \ N(y): every z ∈ Uₙ satisfies z ≺ x, while y ∈ Uₙ does not satisfy y ≺ y.
So, u(y) < u(x), as needed. Q.E.D.
Preferences: Expected utility theory

“Risk is present when future events occur with measurable probability...


Uncertainty is present when the likelihood of future events is indefinite
or incalculable.” (Knight 1921)
Let’s table our discussion of uncertainty, and focus on risk.
On a finite outcome space X , define the simplex ∆(X ) of lotteries
(i.e., probability distributions) on X .
∆(X ) is convex, i.e., a compound lottery (convex combination of
lotteries) is still a lottery.
Expected utility hypothesis: rationality = preference on the lottery
simplex is utility-represented by a (possibly subjective) expected value.
Preferences: Defining rationality under risk

Definition
A utility function u : ∆(X ) → R is a von Neumann–Morgenstern (vNM)
utility function if and only if there exists U = (U1 , . . . , Un )—corresponding
to the outcomes x1 , . . . , xn comprising X —such that

u(v ) = U · v ∀v ∈ ∆(X ).

i.e., vNM utility functions are precisely the ones that are linear.
In general, linearity is not preserved by a strictly increasing
transformation T : R → R.

Exercise
Suppose that u : ∆(X) → R is a vNM utility function representing a
preference relation ≾ on ∆(X). Then, f : ∆(X) → R is also a vNM utility function
representing ≾ if and only if ∃a and b > 0 such that f(v) = a + bu(v) ∀v ∈ ∆(X).
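A short numerical sketch (illustrative; the payoff vector U is an assumption) of the linearity of a vNM utility function and of the affine-transformation claim in the exercise:

import numpy as np

U = np.array([0.0, 1.0, 4.0])            # assumed vNM payoffs for outcomes x1, x2, x3

def u(v):                                 # vNM utility: u(v) = U · v
    return U @ v

v, w = np.array([0.5, 0.5, 0.0]), np.array([0.2, 0.3, 0.5])
alpha = 0.3

# Linearity: u(alpha*v + (1-alpha)*w) = alpha*u(v) + (1-alpha)*u(w).
assert np.isclose(u(alpha * v + (1 - alpha) * w), alpha * u(v) + (1 - alpha) * u(w))

# An affine transform f = a + b*u with b > 0 ranks lotteries identically...
a, b = 2.0, 3.0
f = lambda lottery: a + b * u(lottery)
assert (u(v) <= u(w)) == (f(v) <= f(w))

# ...but a nonlinear increasing transform of u is generally no longer linear in v.
g = lambda lottery: u(lottery) ** 2
assert not np.isclose(g(alpha * v + (1 - alpha) * w), alpha * g(v) + (1 - alpha) * g(w))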
Preferences: Axiomatizing rationality under risk

Consider a preference ≾ on the lottery simplex ∆(X ).


Completeness: defined before.
Transitivity: defined before.
Archimedean axiom: For any v , v ′ , v ′′ ∈ ∆(X ) such that v ≾ v ′ ≾ v ′′ ,
there exists α ∈ [0, 1] such that

αv + (1 − α)v ′′ ∼ v ′ .

Independence: For any v , v ′ , v ′′ ∈ ∆(X ) and α ∈ [0, 1],

v ≾ v ′ ⇐⇒ αv + (1 − α)v ′′ ≾ αv ′ + (1 − α)v ′′ .
Preferences: Allais’ paradox

Gamble 1A: 100% chance of 1 million dollars


Gamble 1B: 89% chance of 1 million dollars, 1% chance of 0 dollars,
10% chance of 5 million dollars.
Gamble 2A: 89% chance of 0 dollars, 11% chance of 1 million dollars.
Gamble 2B: 90% chance of 0 dollars, 10% chance of 5 million dollars.
Many people report preferring 1A to 1B but 2B to 2A.
This pattern is inconsistent with the independence axiom,

v ≾ v′ ⇐⇒ αv + (1 − α)v″ ≾ αv′ + (1 − α)v″:

with α = 0.11, Gambles 1A and 1B are the mixtures αδ_{1M} + (1 − α)δ_{1M} and αL + (1 − α)δ_{1M},
while Gambles 2A and 2B are the mixtures αδ_{1M} + (1 − α)δ₀ and αL + (1 − α)δ₀,
where L gives 5 million dollars with probability 10/11 and 0 dollars otherwise.
So the independence axiom may not always hold descriptively.

Preferences: Expected utility representation

Theorem (von Neumann–Morgenstern)


A complete and transitive preference relation ≾ on ∆(X ) satisfies the
Archimidean and independence axioms if and only if there exists a vNM
utility function u : ∆(X ) → R representing it.

Exercise
Prove the ⇐= direction.

Proof: ( =⇒ ): By construction of u.
Let v̄, v̲ ∈ ∆(X) be the most and least preferred lotteries, respectively, under
the preference ≾.
If v̲ ∼ v̄, then take u to be any constant function.
The remaining case is v̲ ≺ v̄.
Preferences: Expected utility representation, pt. 2

Let's show that for any v ∈ ∆(X), there exists a unique λ_v such that

λ_v v̄ + (1 − λ_v)v̲ ∼ v.

Existence: Follows from the Archimedean axiom, since v̲ ≾ v ≾ v̄.

Uniqueness: We show uniqueness by monotonicity.
Let 0 < α < β < 1. We claim that

v̲ ≺ αv̄ + (1 − α)v̲ ≺ βv̄ + (1 − β)v̲ ≺ v̄.

The first and last ≺ follow from the independence axiom, since
v̲ = αv̲ + (1 − α)v̲ and v̄ = βv̄ + (1 − β)v̄.
The second ≺ also follows from the independence axiom, since
αv̄ + (1 − α)v̲ = (β − α)v̲ + [αv̄ + (1 − β)v̲] and
βv̄ + (1 − β)v̲ = (β − α)v̄ + [αv̄ + (1 − β)v̲].
Preferences: Expected utility representation, pt. 3
We show that u(v) = λ_v is a vNM utility representation of ≾.
Represents ≾:

w ≾ z ⇐⇒ λ_w v̄ + (1 − λ_w)v̲ ≾ λ_z v̄ + (1 − λ_z)v̲ ⇐⇒ λ_w ≤ λ_z.

vNM: Need to show linearity, i.e., for all α ∈ [0, 1] and w, z ∈ ∆(X),

αu(w) + (1 − α)u(z) = u(αw + (1 − α)z).

We have w ∼ u(w)v̄ + (1 − u(w))v̲ and z ∼ u(z)v̄ + (1 − u(z))v̲.
Check that

αw + (1 − α)z
∼ α (u(w)v̄ + (1 − u(w))v̲) + (1 − α) (u(z)v̄ + (1 − u(z))v̲)
∼ (αu(w) + (1 − α)u(z)) v̄ + (1 − αu(w) − (1 − α)u(z)) v̲.

This is precisely what we wanted. Q.E.D.


Preferences: An application to social choice

“By utility is meant that property in any object, whereby it tends to
produce benefit, advantage, pleasure, good, or happiness (all this in
the present case comes to the same thing) or (what comes again to the
same thing) to prevent the happening of mischief, pain, evil, or
unhappiness to the party whose interest is considered: if that party be
the community in general, then the happiness of the community; if a
particular individual, then the happiness of that individual.” (Bentham
1789)
Utilitarianism: max Σᵢ Uᵢ, where Uᵢ is the utility of individual i.
Criticized for assuming that utility is cardinal and can be
interpersonally compared (Robbins 1969[1935]).
Solution: max f (U1 , . . . , UI ) for social choice function f that is
increasing in each input (Bergson 1938).
Preferences: An application to social choice, pt. 2

Generalized to settings of risk: Social choice function f can be defined


on the space of lotteries of outcomes (Harsanyi 1955).
Harsanyi’s Utilitarian Theorem: Suppose
(1) society maximizes the expected value of some social choice
function f ,
(2) each individual maximizes the expected value Ui = E(ui ) of some
utility function ui (vNM preference), and
(3) whenever all individuals are indifferent between two lotteries,
society is also indifferent between them.
Then, f is a weighted sum Σᵢ αᵢUᵢ of the individual utilities.
Preferences: An application to social choice, pt. 3

Updated approach to utilitarianism: max Σᵢ αᵢUᵢ.
vNM setting imbues utility with cardinal meaning.
After utility representations Ui of preferences ≾i are fixed, scaling the
moral weights αi may help justify interpersonal comparisons of utility.
Decision-making under risk as if behind the veil of ignorance.
Preferences: Monetary risks

Specialize to the case when the space X is of monetary outcomes.


Let X be a real-valued interval (think of x ∈ X as number of dollars).
Each probability distribution in ∆(X ) can be represented by its
cumulative distribution function F (·), where

F (x) = p ⇐⇒ the probability of receiving ≤ x dollars is p.

A vNM utility function on ∆(X) is precisely of the following form:

U(F) = E_{A⇝F}(u(A)) = ∫_{x∈X} u(x) dF(x)

for some Bernoulli utility function u : X → R (required to be strictly
increasing and continuous).
Preferences: Risk aversion

Definition
A vNM utility function U on ∆(X) is risk-averse if and only if for any
non-point-mass F ∈ ∆(X), the utility function U prefers, over F, the
point-mass lottery δ_{E_{A⇝F}(A)} yielding the mean payoff

E_{A⇝F}(A) = ∫_{x∈X} x dF(x)

with probability 1.

In other words, risk aversion is equivalent to

∫_{x∈X} u(x) dF(x) ≤ u(∫_{x∈X} x dF(x))

for all F ∈ ∆(X).

This is Jensen's inequality, a defining property of concave functions.
Preferences: Risk aversion, pt. 2

Theorem (Jensen)
A vNM utility function U on ∆(X ) is risk-averse if and only if u is concave.

Proof: ( =⇒ ). Jensen’s inequality for each two-point-mass lottery


and the point-mass lottery yielding its mean payoff.
( ⇐= ). It suffices to prove Jensen’s inequality for F ∈ ∆(X )
supported on finitely many points.

Exercise
Why does the above suffice to prove Jensen’s inequality for a general
probability distribution F ∈ ∆(X )?
Preferences: Risk aversion, pt. 3
Proof by induction. Base case is trivial.
Suppose claim holds for all lotteries supported on < n points.
Consider arbitrary F supported on x₁, . . . , xₙ ∈ X, with p(xᵢ) the probability of xᵢ. Then,

E_{A⇝F}(u(A)) = p(x₁)u(x₁) + Σ_{i=2}^n p(xᵢ)u(xᵢ)
             = p(x₁)u(x₁) + (1 − p(x₁)) Σ_{i=2}^n [p(xᵢ)/(1 − p(x₁))] u(xᵢ)
             ≤ p(x₁)u(x₁) + (1 − p(x₁)) u( Σ_{i=2}^n [p(xᵢ)/(1 − p(x₁))] xᵢ )
             ≤ u( p(x₁)x₁ + (1 − p(x₁)) Σ_{i=2}^n [p(xᵢ)/(1 − p(x₁))] xᵢ )
             = u(E_{A⇝F}(A))

by the inductive hypothesis and the definition of concavity. Q.E.D.
Preferences: Measuring risk aversion

Definition
The certainty equivalent c(F, u) is the number of dollars c such that

u(c) = ∫_{x∈X} u(x) dF(x).

It is evident that c(F, u) ≤ E_{A⇝F}(A) for concave u (apply Jensen's
inequality and the fact that u is increasing).

One way to measure risk-aversion: c(F, u) ≤ c(F, v) for all F should mean
that u is more risk-averse than v.
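A minimal numeric sketch (illustrative; the lottery and the square-root Bernoulli utility are assumptions) of computing a certainty equivalent and checking c(F, u) ≤ E_{A⇝F}(A):

import numpy as np

payoffs = np.array([100.0, 400.0, 900.0])            # dollar outcomes
probs = np.array([0.5, 0.3, 0.2])                     # lottery F

u = np.sqrt                                           # a concave Bernoulli utility
u_inv = lambda y: y ** 2                              # its inverse on [0, ∞)

expected_payoff = probs @ payoffs                     # E_{A⇝F}(A) = 350
expected_utility = probs @ u(payoffs)                 # ∫ u(x) dF(x) = 17
certainty_equivalent = u_inv(expected_utility)        # c(F, u) = 289, since u(289) = 17

assert certainty_equivalent <= expected_payoff        # Jensen's inequality for concave u
# For u(x) = √x, the Arrow–Pratt coefficient of the next slide is A(x, u) = 1/(2x).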
Preferences: Measuring risk aversion, pt. 2

Another way to measure risk-aversion: concavity.


For a C² function u, its concavity can be measured by its curvature.

Definition
For C 2 Bernoulli utility functions u, define the Arrow–Pratt coefficient

A(x, u) = − u″(x) / u′(x).
Preferences: Measuring risk aversion, pt. 3

Goal: compare the risk-aversion degrees of an arbitrary pair of C 2


Bernoulli utility functions u and v .

Theorem (Pratt)
The following are equivalent.
1 Whenever u prefers a lottery F to a point-mass lottery δx , then so
does v .
2 For every F ∈ ∆(X ), we have c(F , u) ≤ c(F , v ).
3 There exists a concave order-preserving transformation T such that
u = T ◦ v.
4 For every x ∈ X , we have A(x, u) ≥ A(x, v ).

Exercise
Prove this.
Preferences: Descriptive decision-making under risk

Risk-seeking behavior.
Gambling addicts repeatedly take negative expected value gambles
that increase variance.
Can a risk-seeking Bernoulli utility function (i.e., with positive
curvature) be consistent with rationality?
Preferences: Descriptive decision-making under risk, pt. 2

Modelling risk-averse behavior.


Curvature is not plausible as a uniform measure of risk aversion.
Suppose a vNM preference U with a uniformly curved Bernoulli utility
function u would reject the 50/50 lottery of losing 100 dollars or
gaining 110 dollars.
Then, it would also reject the 50/50 lottery of losing 1000 dollars or
gaining 1010 dollars (Rabin 2001).
Example suggests that loss aversion may be really what’s going on.
Motivates prospect theory, a behavioral alternative to modelling
decision-making under risk (Kahneman–Tversky 1979).
Preferences: Descriptive decision-making under risk, pt. 3
Framing effects.
“Imagine that the U.S. is preparing for the outbreak of an unusual
Asian disease, which is expected to kill 600 people. Two alternative
programs to combat the disease have been proposed. Assume that the
exact scientific estimates of the consequences of the programs are as
follows: If program A is adopted, 200 people will be saved. If program
B is adopted, there is a one-third probability that 600 people will be
saved and a two-thirds probability that no people will be saved. Which
of the two programs would you favor?...
Imagine that the U.S. is preparing for the outbreak of an unusual
Asian disease, which is expected to kill 600 people. Two alternative
programs to combat the disease have been proposed. Assume that the
exact scientific estimates of the consequences of the programs are as
follows: If program C is adopted, 400 people will die. If program D is
adopted, there is a one-third probability that nobody will die and a
two-thirds probability that 600 people will die. Which of the two
programs would you favor?” (Kahneman–Tversky 1981).
Preferences: Prospect theory

Kahneman and Tversky (1979, 1992) developed prospect theory: an


alternative to subjective expected utility theory that was pioneering in
its use of psychology experiments.
95% chance to gain £10,000 vs. 100% chance to gain £9499.
Fear of losing the sure gain =⇒ risk averse.
95% chance to lose £10,000 vs. 100% chance to lose £9499.
Hope of avoiding the sure loss. =⇒ risk seeking.
5% chance to gain £10,000 vs. 100% chance to gain £501.
Hope of large gain. =⇒ risk seeking.
5% chance to lose £10,000 vs. 100% chance to lose £501.
Fear of large loss. =⇒ risk averse.
Preferences: Concluding remarks

We have a lot of useful tools that can deal with utility functions.
The assumption that human decision-making follows a utility
function—the assumption of rationality—makes models parsimonious
and tractable.
Helps explain the omnipresence of utilitarianism in economic theory.
(Economists generally do not make normative moral claims about
what society’s objective should be, e.g., the social choice function.
They only make recommendations on how to achieve such an
objective when one is chosen by others, e.g., voters and politicians.)
We now know of several empirical inconsistencies regarding this
assumption of rationality.
We will explore more of these issues of descriptive decision theory later
in the course.
General equilibrium (GE) theory: Invisible-Hand Hypothesis

Adam Smith, The Wealth of Nations (1776)


“...by directing that industry in such a manner as its produce may be of the
greatest value, [an individual] intends only his own gain, and he is in this,
as in many other cases, led by an invisible hand to promote an end which
was no part of his intention.”
GE theory: Arrow–Debreu model

Smith’s well-known hypothesis states that in a free market comprised


of rational agents, their combined pursuit of self-interest leads the
market to an efficient allocation.
In an Arrow–Debreu model of agent types 1, . . . , I , each type i has
infinitely many atomistic agents.
However, we aggregate them into one representative agent who has a
preference over the space X = R^L_{≥0} of quantity vectors of goods
1, . . . , L.
We assume that the preference of each representative agent i can be
represented by a utility function ui : X → R.
GE theory: Arrow–Debreu model, pt. 2

Representative agent i has a starting endowment of ωi ∈ X .


We call the array ω of starting endowments the economy ω.
Agents may want to make trades with other willing agents to increase
their respective utilities.
Define an allocation to be an array x = (x₁, . . . , x_I) ∈ X^I = R^{LI}_{≥0}.
We call x an allocation of the economy ω if

Σ_{i=1}^I xᵢ ≤ Σ_{i=1}^I ωᵢ.
GE theory: A notion of market equilibrium

We assume that the market is competitive, i.e., all trades happen


under a shared price vector p ∈ R^L_{≥0}.
Each agent wants to sell a part of her endowment and use the
resulting income to purchase goods: in a way that reaches her optimal
consumption vector given the scarcity constraints.
A competitive equilibrium (x, p) of an economy ω ∈ R^{LI}_{>0} is a pair of
an allocation x of ω and a price vector p ∈ R^L_{≥0} such that
each xᵢ is the argument maximum of agent i's utility maximization
program

max_{yᵢ∈X} uᵢ(yᵢ) such that pᵗyᵢ ≤ pᵗωᵢ.
If the domain is compact and ui is continuous, the maximum of the
above utility maximization program is realized.
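A minimal sketch (illustrative; it assumes Cobb–Douglas utility, which the model does not require) of one agent's utility maximization program: with uᵢ(y) = Σ_ℓ α_ℓ log y_ℓ, the argument maximum has the closed form y_ℓ = α_ℓ pᵗωᵢ / p_ℓ.

import numpy as np

def cobb_douglas_demand(alpha, p, omega):
    """Maximize sum_l alpha_l * log(y_l) subject to p·y <= p·omega (alpha sums to 1)."""
    wealth = p @ omega
    return alpha * wealth / p

alpha = np.array([0.3, 0.7])          # assumed Cobb–Douglas weights
p = np.array([1.0, 2.0])              # price vector
omega = np.array([4.0, 3.0])          # endowment

y = cobb_douglas_demand(alpha, p, omega)
assert np.isclose(p @ y, p @ omega)   # budget constraint binds, as under monotonicity (Walras' law)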
GE theory: A notion of efficiency

An allocation x Pareto dominates an allocation y if

uᵢ(xᵢ) ≥ uᵢ(yᵢ) for all i, with strict inequality for at least one i.

An allocation x of the economy ω is Pareto efficient if no other


allocation of ω Pareto dominates it:
i.e., if x maximizes the Pareto program

max_{x∈X^I} uᵢ(xᵢ)

s.t. Σ_{i=1}^I xᵢ ≤ Σ_{i=1}^I ωᵢ and uⱼ(xⱼ) ≥ ūⱼ ∀j ≠ i

for some vector ū = (ūⱼ)_{j≠i} ∈ R^{I−1}.


The optimization program is defined on a compact domain, so if the
ui are continuous, the maximum is realized.
GE theory: Model assumptions

Monotonicity: For each i, we have ui (x) < ui (x ′ ) if x ≪ x ′ .


Continuity: Each ui is continuous.
Concavity: Each ui is strictly concave. (Economic meaning: Law of
Diminishing Marginal Utility)
Positive endowments: For each i, we have ωi ≫ 0.
There are also underlying assumptions already built into the model.
Perfect information: All agents face the same price vector p.
Perfect competition: All agents take these prices as given, i.e., they do
not strategically consider their actions’ effect on them.
Complete set of markets: A market that all agents can participate in
exists for every good.
No externalities: Each agent’s utility only depends on the vector of
goods she owns, not on that of any other agent.
GE theory: Walras’ law

Theorem (Walras)
Let u be a monotone utility function and p ≫ 0. For any v, w ∈ X that
maximize the utility maximization program

max_{z∈X} u(z) such that pᵗz ≤ c

for c = c_v and c = c_w respectively, where c_v, c_w > 0, we have the following.
1 The constraint is satisfied with equality, i.e., pᵗv = c_v and pᵗw = c_w.
2 We have u(v) ≤ u(w) ⇐⇒ c_v ≤ c_w.

Exercise
Prove Walras’ law.
GE theory: First welfare theorem

Is the invisible-hand hypothesis—that a decentralized price system


leads to an efficient societal allocation—true?

Theorem (First Welfare Theorem)


Suppose the utility functions ui are monotone. For any economy ω ∈ RLI >0 ,
any allocation x of ω comprising a competitive equilibrium (x, p) of ω is
Pareto efficient.

Proof. Suppose for the sake of a contradiction that there exists an


allocation y of ω that Pareto dominates x,
i.e., uᵢ(yᵢ) ≥ uᵢ(xᵢ) for all i, with strict inequality for at least one i, say i = j.
GE theory: First welfare theorem, pt. 2

By Walras' law, uᵢ(yᵢ) ≥ uᵢ(xᵢ) implies pᵗyᵢ ≥ pᵗxᵢ,
and uⱼ(yⱼ) > uⱼ(xⱼ) implies pᵗyⱼ > pᵗxⱼ.
Since prices are nonnegative, we must have

Σ_{i=1}^I (yᵢ)_ℓ > Σ_{i=1}^I (xᵢ)_ℓ = Σ_{i=1}^I (ωᵢ)_ℓ

for some good ℓ (used Walras' law at the end).


Contradicts that y is an allocation of ω. Q.E.D.
GE theory: Second welfare theorem

Theorem (Second Welfare Theorem)


Suppose the utility functions uᵢ are monotone, continuous, and strictly
concave; and the endowments ωᵢ ≫ 0. For any economy ω and any Pareto
efficient allocation x ≫ 0 of ω, there exist transfers t ∈ R^{LI} with
Σ_{i=1}^I tᵢ = 0 such that (x, p) is a competitive equilibrium of ω + t for
some price vector p ∈ R^L_{>0}.
GE theory: A lemma to separate convex sets

Lemma (Separating Hyperplane Theorem)


Let B be an open convex set and x ∉ B a point. Then, there exists a
nonzero vector p such that

pᵗx ≤ pᵗb

for all b in the closure B̄ of B.

Exercise
Prove this.
GE theory: Proof of the second welfare theorem

Proof. We will show that the choice of ti = xi − ωi works.


Define the bettering set

Bi = {b ∈ RL : xi + b > 0, ui (xi + b) > ui (xi )},

and
B = B1 + · · · + BI .
Since x is Pareto optimal, it follows that 0 ∉ B.
Each Bᵢ is open and convex. Indeed, since uᵢ is strictly concave, for b, b′ ∈ Bᵢ and t ∈ (0, 1),

uᵢ(xᵢ + tb + (1 − t)b′) ≥ t uᵢ(xᵢ + b) + (1 − t) uᵢ(xᵢ + b′) > t uᵢ(xᵢ) + (1 − t) uᵢ(xᵢ) = uᵢ(xᵢ)

and xᵢ + tb + (1 − t)b′ ≥ min(xᵢ + b, xᵢ + b′) > 0 (componentwise).


Thus, B is open convex.
GE theory: Proof of the second welfare theorem, pt. 2

By the Separating Hyperplane Theorem, ∃p ∈ R^L \ {0} such that
pᵗb ≥ 0 for any b ∈ B.
Let’s show that this separating price vector p ̸= 0 is actually ≫ 0.
If p had a negative entry p_ℓ < 0, then for 0 ≪ bᵢ ∈ Bᵢ, take
(bᵢ)_ℓ → ∞ while taking all other (bᵢ)ⱼ sufficiently small.
Aggregate the bi to b ∈ B that contradicts p t b ≥ 0.
If p had a zero entry pℓ = 0, then for b such that (bi )j = −ε for j ̸= ℓ,
take ε sufficiently small and (bi )ℓ sufficiently large so that bi ∈ Bi .
Aggregate the bi to b ∈ B that contradicts p t b ≥ 0.
GE theory: Proof of the second welfare theorem, pt. 3

It remains to show that (x, p) is a competitive equilibrium of ω + t = x,
i.e., that any bundle agent i′ strictly prefers to x_{i′} is unaffordable at prices p and wealth pᵗx_{i′}.
Suppose that u_{i′}(x_{i′} + b_{i′}) > u_{i′}(x_{i′}).
Take bᵢ = 0 for all i ≠ i′ to get b = b_{i′} ∈ B.
We thus have pᵗ(x_{i′} + b_{i′}) ≥ pᵗx_{i′}.
To show that the inequality is strict, perturb x_{i′} + b_{i′} to be slightly
smaller while keeping its utility greater than that of x_{i′};
this yields pᵗ(x_{i′} + b_{i′}) > pᵗx_{i′}. Q.E.D.
GE theory: Was Adam Smith correct? pt. 1

The efficiency of the free market in the normative-welfare sense


requires strong assumptions.
It requires not only the weak rationality of the agents, the assumption
that their preferences are utility-representable;
but also their strong rationality, the assumption that each such agent’s
utility-representable revealed preference = her normative preference.
Even assuming this, the (Pareto) efficiency of the free market requires
more strong assumptions.
GE theory: Was Adam Smith correct? pt. 2

Essentially necessary condition for Pareto efficiency of the free market


= that the following do not occur.
Imperfect competition (e.g., monopolies, cartels): antitrust policies,
regulations, or public ownership/operation.
Imperfect information (e.g., adverse selection, moral hazard):
educational interventions, regulations, taxes/subsidies.
Incomplete set of markets (e.g., people cannot buy insurance against
certain types of risks): social safety net.
Externalities (e.g., greenhouse gas emissions): Pigouvian taxes (e.g.,
price on carbon) or regulations.
The optimal governmental intervention—one that achieves Pareto
efficiency—is often known for an Arrow–Debreu economy with a single
distortion (e.g., Greenwald–Stiglitz 1986).
In general, it is not known for one with multiple distortions.
GE theory: The Pareto frontier
Define the utility possibility set:

U = { u ∈ R^I : u ≤ (uᵢ(xᵢ))_{i=1}^I for some x ∈ X^I s.t. Σ_{i=1}^I xᵢ ≤ Σ_{i=1}^I ωᵢ }.

Theorem (Negishi)
Suppose U is convex. A Pareto efficient allocation is precisely an argument
maximum, for some α ∈ R^I_{≥0} \ {0}, of the Negishi program

max_{x∈X^I} Σ_{i=1}^I αᵢuᵢ(xᵢ) s.t. Σ_{i=1}^I xᵢ ≤ Σ_{i=1}^I ωᵢ.

Exercise
Prove Negishi’s Theorem.
GE theory: Negishi’s theorem and social choice

Once the representative agents’ utility functions are chosen,


choosing an allocation on the Pareto utility frontier
= choosing a Negishi weighting vector α
(relative moral weights of each agent type).
For example, consider the allocation where all goods are owned by
agents of type 1 (e.g., nobles).
This is Pareto efficient with weighting vector α = (1, 0, . . . , 0).
In other words, the allocation values the nobles infinitely more than all
other people.
This example goes to show that from a welfare perspective, Pareto
efficiency is the bare minimum and by itself is nothing to celebrate.
GE theory: Constrained optimization

The general form of a constrained optimization program is

min_{x∈X} f(x) s.t. g(x) ≤ 0 and h(x) = 0,

where X ⊂ R^n is an open subset,
g(x) = (g₁(x), . . . , g_m(x)) : R^n → R^m, and
h(x) = (h₁(x), . . . , h_ℓ(x)) : R^n → R^ℓ.
(Suppose that all these functions are continuously differentiable.)
The questions we would like to answer are:

Is there an easily checkable criterion that the solution to the optimization


program must necessarily satisfy? Is this criterion also sufficient?
GE theory: KKT necessary condition

Let I = {i : gi (x̄) = 0}.

Theorem (Karush–Kuhn–Tucker)
Consider x̄ ∈ X that satisfies all the constraints of the optimization
program. Suppose further that the gradients ∇gᵢ(x̄) for i ∈ I and ∇hⱼ(x̄)
for all j are linearly independent. A necessary condition for x̄ to be a local
minimum is that there exists u ≥ 0 and w such that

∇f (x̄) + Jg (x̄)t u + Jh(x̄)t w = 0, and

ui gi (x̄) = 0 for all i.


GE theory: Deriving the KKT condition

Consider the Pareto and Negishi programs for which all relevant
functions are assumed to be continuously differentiable.
The negated form of the Pareto program has objective function

−uᵢ ◦ Coordᵢ,

inequality constraint functions

gⱼ = ūⱼ − uⱼ ◦ Coordⱼ for all j ≠ i,

and

G_ℓ = Σ_{i=1}^I (xᵢ)_ℓ − Σ_{i=1}^I (ωᵢ)_ℓ for all 1 ≤ ℓ ≤ L.
GE theory: Deriving the KKT condition, pt. 2
The KKT necessary condition restricted to the vector entries corresponding
to agent i reads

∇uᵢ(x̄ᵢ) = Σ_{ℓ=1}^L λ_{G_ℓ} ∇ᵢG_ℓ(x̄) = Σ_{ℓ=1}^L λ_{G_ℓ} e_ℓ,

where e_ℓ is the vector with 1 in the ℓth entry and 0 in all others.
Thus, λ_{G_ℓ} is the partial derivative of uᵢ with respect to the ℓth entry,
evaluated at x̄ᵢ.
This means that the KKT necessary condition restricted to the vector
entries corresponding to agent j ≠ i is:

λ_{gⱼ} ∇uⱼ(x̄ⱼ) = Σ_{ℓ=1}^L λ_{G_ℓ} ∇ⱼG_ℓ(x̄) = ∇uᵢ(x̄ᵢ).
GE theory: Deriving the KKT condition, pt. 3

Exercise
Derive the KKT necessary condition for the Negishi program. Suppose that
U is convex, so the Pareto and Negishi programs are equivalent. If x̄ is a
solution to both problems, then show that λgj defined above = αj /αi .
(Intuition: At a local optimum, all agents have the same gradient
direction—the direction of maximal utility ascent—since otherwise, some of
them can trade so that they all gain. Furthermore, the magnitude ratio of
the agents’ gradient vectors is α1 : · · · : αI , since otherwise the agent
shortchanged by the gradient magnitude ratio can take goods from agents
who were overrewarded by the gradient magnitude ratio to increase
Σ_{i=1}^I αᵢuᵢ(xᵢ),

the Negishi objective function.)


GE theory: When is the KKT condition also sufficient?

Theorem (Karush–Kuhn–Tucker)
Consider x̄ ∈ X that satisfies all the constraints of the optimization
program. Suppose u ≥ 0 and w satisfy

∇f (x̄) + Jg (x̄)t u + Jh(x̄)t w = 0, and

ui gi (x̄) = 0 for all i.


If X is convex as a set, f and gi are convex as functions, and hj are linear,
then x̄ is the global minimum of the optimization program among x ∈ X
satisfying the constraints.

(Aside: For slightly more generality, the convexity assumptions can be


relaxed to pseudo-/quasi-convexity.)
GE theory: Convex optimization in equilibrium analysis

To find market equilibria by using the KKT condition (also called


first-order conditions), economists often assume that utility functions
are differentiable and strictly concave.
The concavity assumption has an economic meaning: the Law of
Diminishing Marginal Utility.
We have seen that its mathematical meaning yields:
the uniqueness of the solution to each agent’s utility maximization
program,
the Second Welfare Theorem,
the equivalence of the Pareto and Negishi programs, and
the sufficiency of the KKT condition.
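A small numerical sketch (illustrative; the log utilities, endowments, and Negishi weights are assumptions) of solving a two-agent, two-good Negishi program with scipy.optimize.minimize and checking the first-order condition from the previous slides: at the optimum, the weighted gradients αᵢ∇uᵢ(x̄ᵢ) coincide across agents.

import numpy as np
from scipy.optimize import minimize

alpha = np.array([0.4, 0.6])                    # assumed Negishi weights
omega = np.array([[4.0, 1.0], [2.0, 5.0]])      # endowments; row i = agent i's omega_i
total = omega.sum(axis=0)                        # aggregate endowment, good by good

def u(x_i):                                      # assumed log utility (strictly concave, monotone)
    return np.log(x_i).sum()

def negishi_negated(x_flat):                     # minimize the negated Negishi objective
    x = x_flat.reshape(2, 2)
    return -sum(alpha[i] * u(x[i]) for i in range(2))

constraints = [{"type": "ineq",                  # resource constraint per good l: total_l - sum_i x_il >= 0
                "fun": lambda x_flat, l=l: total[l] - x_flat.reshape(2, 2)[:, l].sum()}
               for l in range(2)]
res = minimize(negishi_negated, x0=0.5 * omega.flatten(), bounds=[(1e-6, None)] * 4,
               constraints=constraints, method="SLSQP")
x_bar = res.x.reshape(2, 2)

# First-order condition: alpha_i * grad u_i(x_bar_i) = alpha_i / x_bar_i is the same for both agents
# (both rows should be approximately [1/6, 1/6] here).
print(alpha[:, None] / x_bar)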
Game theory: Motivation

In the GE model of the economy, we assumed that every agent is an


atomistic price-taker.
In reality, the agent has more options,
including actions such as undercutting her competitors to punish them,
or colluding with them as a cartel.
Need a theory of how rational agents act in a situation where the
outcome is determined by the agents’ choices of actions and how they
interact.
Game theory: A tale of two Johns from Princeton

Game theory was born in Princeton due to the pioneering work of two
mathematicians named John:
John von Neumann,
who (with economist Oskar Morgenstern) began the study of game
theory
and used fixed point theorems in a way that has since been
fundamental to game theory and economics in general;
and John Nash,
who formulated the notion of a Nash equilibrium.
The Nash equilibrium has had “a fundamental and pervasive impact in
economics and the social sciences which is comparable to that of the
discovery of the DNA double helix in the biological sciences.”
(Myerson 1999)
Game theory: Nash equilibrium

A strategic game is a collection of:


1 A finite set I of players,
2 For each player i ∈ I , a nonempty set Ai of actions available to i,
3 For each player i ∈ I , her preference relation ≾i on A = ×j∈I Aj :
alternatively, her utility (payoff) representation ui on A.
A Nash equilibrium of a strategic game is an I-tuple of actions a* ∈ A
such that for every player i ∈ I,

(a*₋ᵢ, aᵢ) ≾ᵢ (a*₋ᵢ, aᵢ*) for all aᵢ ∈ Aᵢ.

(The subscript ·−i denotes the entries corresponding to the players


other than i.)
Game theory: Prisoners’ dilemma

                                 Suspect 2
                          Stay quiet      Confess
  Suspect 1  Stay quiet     −1, −1         −3, 0
             Confess         0, −3         −2, −2

Whatever Suspect 1 does, Suspect 2 is better off confessing than not


confessing, and vice versa.
Thus, the unique Nash equilibrium is (Confess, Confess).
The prisoners’ dilemma models the general problem of free-riding.
e.g., climate change: all countries would benefit from a non-warming
climate, but no country wants to cut its own GHG emissions.
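A minimal sketch (illustrative, not from the handout) that enumerates the pure-strategy Nash equilibria of the prisoners' dilemma above by checking the no-profitable-deviation condition:

from itertools import product

actions = ["Quiet", "Confess"]
# payoffs[(a1, a2)] = (payoff to Suspect 1, payoff to Suspect 2)
payoffs = {("Quiet", "Quiet"): (-1, -1), ("Quiet", "Confess"): (-3, 0),
           ("Confess", "Quiet"): (0, -3), ("Confess", "Confess"): (-2, -2)}

def is_nash(a1, a2):
    u1, u2 = payoffs[(a1, a2)]
    no_dev_1 = all(payoffs[(d, a2)][0] <= u1 for d in actions)   # Suspect 1 cannot gain by deviating
    no_dev_2 = all(payoffs[(a1, d)][1] <= u2 for d in actions)   # Suspect 2 cannot gain by deviating
    return no_dev_1 and no_dev_2

print([a for a in product(actions, actions) if is_nash(*a)])     # [('Confess', 'Confess')]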
Game theory: Strategic games with multiple Nash equilibria

Consider a sealed-bid auction,


in which there are I players, agent i’s valuation of the auctioned
object is vi (where v1 > v2 > · · · > vI > 0),
and the right (and obligation) to buy the object is won by the player
who has made the highest sealed bid,
where bids are in A1 = · · · = AI = R≥0 .
(Tiebreaker is the lowest index.)
In a first-price sealed-bid auction, price = the highest bid.
In a second-price sealed-bid auction, price = the highest bid among
non-winners.
Game theory: Sealed-bid auctions

Notice that agents have a bare-minimum option of bidding zero,


i.e., they are always guaranteed a net zero payoff.
Thus, any agent i will never bid more than vᵢ to win,
since she would then be worse off than under the bare-minimum option.
Even if she bid more than vᵢ and lost, she would be no better off than under
the bare-minimum option.
When bidding exactly vᵢ, player i is always at least as well off as under the
bare-minimum option.
Game theory: First-price sealed-bid auction

When i faces the others’ actions a−i , her best response is:
If max a₋ᵢ ≥ vᵢ,
then the bare-minimum payoff is the best she can do, so her best
response is to bid any number < max a₋ᵢ.
If max a−i < vi ,
then she gains positive payoff by bidding some number in the interval
(max a−i , vi ).
But if the highest bidder (with the tiebreaker of lowest-index) has
index less than i, then i has no best response.
(Indeed, decreasing choices of ai that converge to max a−i will
continually improve her payoff vi − max a−i ,
but at max a−i the payoff decreases to zero.)
On the other hand, if the highest bidder has index greater than i,
then i has a best response of bidding max a−i , since she would win.
Game theory: First-price sealed-bid auction, pt. 2

Consider a Nash equilibrium a in which i, faced with others’ actions


a−i , bids to positively gain.
There cannot be a player j whose valuation vj is greater than ai ,
since she can then outbid player i to positively gain.
Thus, ai ≥ vj for all j ̸= i.
This means that necessarily i = 1,
and max a−1 < v1 so that player 1’s strictly gaining best response is to
bid max a−1 .
But for (max a−1 , a−1 ) to be a Nash equilibrium, it is necessary that
max a−1 ≥ v2 ,
since otherwise another player can strictly gain by outbidding player 1.
Thus, the Nash equilibrium a must be of the form (c, a−1 ) such that
max a−1 = c ∈ [v2 , v1 ).
Game theory: First-price sealed-bid auction, pt. 3

Finally, consider the remaining case: a Nash equilibrium a in which no


player positively gains.
The winner must be player 1 bidding v1 ,
since if it were player i ̸= 1 bidding vi , player 1 could outbid her to
positively gain.
Thus, the Nash equilibrium a must be of the form (v₁, a₋₁) such that
max a₋₁ ≤ v₁.
Complete characterization of Nash equilibria:
(c, a−1 ) such that max a−1 = c ∈ [v2 , v1 ].
The auction winner is Player 1 in all Nash equilibria.
Game theory: Second-price sealed-bid auction

Also called a Vickrey auction, the second-price sealed-bid auction is


well-known for its incentive compatibility,
i.e., every rational player can achieve her best outcome by acting
according to her true preference.

Exercise
Prove that in our second-price sealed-bid auction, each player i’s strategy
of bidding vi is a weakly dominant strategy, i.e., for any a−i , player i’s
payoff is at least as large as if she had submitted any other bid. Thus,
(v1 , . . . , vI ) is a Nash equilibrium.
Find all the other Nash equilibria: in particular, there are Nash equilibria in
which the auction winner is not player 1.
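A small simulation sketch (illustrative; it assumes player i loses all ties, a pessimistic tiebreak) of the weak-dominance claim in the exercise: against randomly drawn opposing bids, bidding the true valuation in a second-price auction never does worse than any alternative bid.

import numpy as np

rng = np.random.default_rng(1)
v_i = 0.6                                        # player i's true valuation

def payoff(bid, other_bids):
    # Second-price payoff for player i, assuming i loses all ties (a pessimistic tiebreak).
    if bid > other_bids.max():
        return v_i - other_bids.max()            # win the object, pay the highest losing bid
    return 0.0                                    # lose, keep the bare-minimum payoff of zero

for _ in range(10_000):
    others = rng.uniform(0.0, 1.0, size=3)
    truthful = payoff(v_i, others)
    for alt in rng.uniform(0.0, 1.0, size=5):    # a handful of alternative bids
        assert payoff(alt, others) <= truthful + 1e-12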
Game theory: Strategic games with no Nash equilibria

                                    Player 2 (George)
                               Even (Two)    Odd (One)
  Player 1 (Jerry)  Even (Two)     0, 1          1, 0
                    Odd (One)      1, 0          0, 1

Consider the game odds and evens, in which two players bid either one
(odd) or two (even);
player 1 wins if the sum is odd, and player 2 wins if the sum is even.
There is no Nash equilibrium, since for any action profile a, the losing
player can deviate to win.
However, what if players can use randomized strategies?
Game theory: Mixed strategies

The probability distributions in ∆(Ai ) also correspond to randomized


strategies over the action space Ai .
A mixed strategy is an element of ∆(Ai ), representing the action of
playing each action with its corresponding probability.
(The probability distributions of all players’ mixed strategies are
assumed to be independent.)
When mixed strategies are permitted, the payoff occurs as a
probability distribution over A = ×j∈I Aj .
From here on out, we assume each player’s preference on ∆(A) is
vNM-utility-representable.
(All payoffs are expected values.)
A mixed-strategy Nash equilibrium is a mixed-strategy profile in which
no player can on expectation strictly gain by deviating.
Game theory: Mixed strategies in odds and evens

                                    Player 2 (George)
                               Even (Two)    Odd (One)
  Player 1 (Jerry)  Even (Two)     0, 1          1, 0
                    Odd (One)      1, 0          0, 1

The unique mixed-strategy Nash equilibrium is


((1/2, 1/2), (1/2, 1/2)).
(If one of the players played a mixed strategy ̸= (1/2, 1/2), then the
other player’s best response would be (1, 0) or (0, 1),
but the argument from before shows that no mixed-strategy Nash
equilibrium can have one of the players play (1, 0) or (0, 1).)
The expected payoff vector is (1/2, 1/2), same as a coin flip.
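A short sketch (illustrative; it assumes a fully mixed equilibrium, so the indifference conditions pin down the probabilities) that recovers the mixed-strategy Nash equilibrium of a 2×2 game: each player mixes so that the opponent is indifferent between her two actions.

import numpy as np

# Odds and evens; rows/columns ordered (Even, Odd). A = Jerry's payoffs, B = George's.
A = np.array([[0, 1], [1, 0]])
B = np.array([[1, 0], [0, 1]])

# Jerry plays Even with probability p chosen so that George is indifferent between his columns:
#   p*B[0,0] + (1-p)*B[1,0] = p*B[0,1] + (1-p)*B[1,1].
p = (B[1, 1] - B[1, 0]) / (B[0, 0] - B[0, 1] - B[1, 0] + B[1, 1])
# George plays Even with probability q chosen so that Jerry is indifferent between his rows.
q = (A[1, 1] - A[0, 1]) / (A[0, 0] - A[1, 0] - A[0, 1] + A[1, 1])

print(p, q)   # 0.5 0.5, matching the slide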
Game theory: Two-player, zero-sum games
A two-player strategic game is zero-sum if for any action profiles a, b ∈ A,
a ≾₁ b if and only if b ≾₂ a. (e.g., odds and evens)

Exercise
Prove that in a two-player, zero-sum game with Player 1's preference
represented by u (which means −u represents that of Player 2):
If (x*, y*) is a Nash equilibrium, then

max_{x∈A₁} min_{y∈A₂} u(x, y) = min_{y∈A₂} max_{x∈A₁} u(x, y) = u(x*, y*).

In particular, all Nash equilibria result in the same payoffs.
Conversely, if

max_{x∈A₁} min_{y∈A₂} u(x, y) = min_{y∈A₂} max_{x∈A₁} u(x, y),

with x* attaining the outer maximum on the left-hand side and y* attaining
the outer minimum on the right-hand side, then (x*, y*) is a Nash equilibrium.
Game theory: Nash’s theorem

A preference relation ≾ᵢ on A is quasiconcave if for every a* ∈ A, the
level set

{aᵢ ∈ Aᵢ : a* ≾ᵢ (aᵢ, a*₋ᵢ)}

is convex.

Theorem (Nash)
The strategic game ⟨I , (Ai ), (≾i )⟩ has a Nash equilibrium if: each Ai is a
nonempty compact convex Euclidean subset, and each ≾i is continuous
and quasiconcave on Ai .
Game theory: Nash’s theorem, pt. 2

The proof of Nash’s theorem requires a fixed point theorem, in the


von Neumann style.

Lemma (Kakutani’s Fixed Point Theorem)


Let X be a nonempty compact convex Euclidean subset, and f : X ⊸ X
be a correspondence such that
the set f (x) is nonempty and convex for all x ∈ X , and
the graph of f is closed, i.e., for any pair of sequences (xn ) and (yn ) of
points in X such that yn ∈ f (xn ) for all n, we have

xn → x ∈ X , yn → y ∈ X as n → ∞ =⇒ y ∈ f (x).

Then, there exists a fixed point, i.e., x ∗ ∈ X such that x ∗ ∈ f (x ∗ ).


Game theory: Nash’s theorem, pt. 3

Proof. Define the best-response correspondence

Bi (a−i ) = {a ∈ Ai : (a′ , a−i ) ≾i (a, a−i ) for all a′ ∈ Ai }.

Since ≾i is continuous and Ai is compact, Bi (a−i ) is nonempty.


By the definition of quasiconcavity, Bi (a−i ) is convex.
Thus, the correspondence B : A ⊸ A defined by

B(a) = ×_{i∈I} Bᵢ(a₋ᵢ)

has the property that B(a) is nonempty and convex.


The graph of B is closed, since ≾i is continuous.
By Kakutani’s Fixed Point Theorem, B has a fixed point, which is
precisely a Nash equilibrium. Q.E.D.
Game theory: Nash’s theorem, pt. 4

A strategic game ⟨I , (Ai ), (≾i )⟩ is finite if every Ai is finite.

Corollary
Every finite strategic game ⟨I , (Ai ), (≾i )⟩ which allows mixed strategies has
a mixed-strategy Nash equilibrium.

Proof. This strategic game with mixed strategies is equivalent to a


strategic game whose pure strategies correspond to the
aforementioned mixed strategies.
The latter satisfies the conditions of Nash’s theorem. Q.E.D.

Exercise
Prove the assertion that B has the closed graph property because ≾i is
continuous.
Aside: Coronavirus (COVID-19)

Figure: COVID-19 mortality rate by age (Gal 2020, sourced from Chinese Center
for Disease Control and Prevention; found in Bendix 2020, click for hyperlink)
Aside: How to protect yourself and others (COVID-19)

Social distancing (e.g., remotely attend lectures and office hours


through Zoom or Skype; reconsider large gatherings and travel plans)
“Avoid close contact with people who are sick.
Avoid touching your eyes, nose, and mouth.
Stay home when you are sick.
Cover your cough or sneeze with a tissue, then throw the tissue in the
trash.
Clean and disinfect frequently touched objects and surfaces using a
regular household cleaning spray or wipe.
Wash your hands often with soap and water.
Aside: How to avoid getting sick (COVID-19)

...Don’t smoke.
Eat a diet high in fruits, vegetables, and whole grains.
Take a multivitamin if you suspect that you may not be getting all the
nutrients you need through your diet.
Exercise regularly.
Maintain a healthy weight.
Control your stress level.
Control your blood pressure.
If you drink alcohol, drink only in moderation (no more than one to
two drinks a day for men, no more than one a day for women).
Get enough sleep.
Take steps to avoid infection, such as washing your hands frequently
and trying not to touch your hands to your face, since harmful germs
can enter through your eyes, nose, and mouth.
Aside: How to prepare for day-to-day life (COVID-19)
...For peace-of-mind, try to plan ahead for a possible outbreak.
For example, if there is an outbreak in your community, you may not
be able to get to a store, or stores may be out of supplies, so it will be
important for you to have extra supplies on hand.
Talk with family members and loved ones about how they would be
cared for if they got sick, or what would be needed to care for them in
your home.
Consider what you might do if your child’s school or daycare shuts
down, or if you need to or are asked to work from home.
Stay up-to-date with reliable news resources, such as the website of
your local health department. If your town or neighborhood has a
website or social media page, consider joining it to maintain access to
neighbors, information, and resources.”
(Harvard Health Publishing 2020; click for hyperlink)
Harvard University COVID-19 website (click for hyperlink)
Massachusetts COVID-19 website (click for hyperlink)
Game theory: Cooperation in the prisoners’ dilemma?

The prisoners’ dilemma is often cited as a realpolitikal prediction:


that in a pandemic, arms race, or a warming climate,
any cooperation will break down due to a self-interested deviation.
But once we take long-term strategic interactions into account,
cooperation between self-interested agents can occur.
Game theory: Repeated games

Suppose players repeat a strategic game G = ⟨I , (Ai ), (ui )⟩


T times, possibly infinitely many times.
We assume each Ai is finite (allowing for mixed strategies) and each
ui is continuous.
We need to assume what the players’ preferences on sequences (over
time) of payoffs are.
Game theory: Repeated games, pt. 2
Descriptive temporal preference on payoff sequences:
‘I value immediate gratification, even if it means forgoing a higher
delayed gratification.’
(e.g., Stanford marshmallow experiments, Mischel–Ebbesen 1972)
Temporal discounting is almost always exponential discounting in
dynamic models (e.g., economic models on climate-change policy):
multiply by δ t for δ ∈ (0, 1).
It is more accurately modelled by hyperbolic discounting, e.g., multiply
by 1/(1 + kt).
“...when presented a choice between doing seven hours of an
unpleasant activity on April 1 versus eight hours on April 15, if asked
on February 1 virtually everyone would prefer the seven hours on April
1. But come April 1, given the same choice, most of us are apt to put
off the work until April 15.” (O’Donoghue–Rabin 1999)
Evolutionary pressures due to uncertain risks may help rationalize this
empirical phenomenon (Dasgupta–Maskin 2005).
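A minimal sketch of the preference reversal described in the quotation (the per-day exponential factor δ = 0.995, the hyperbolic parameter k = 0.02, and the 59-day lead time are assumed values chosen only to make the reversal visible): exponential discounting gives the same answer at both decision dates, while hyperbolic discounting reverses.

```python
# Costs are 7 hours of work at delay t1 versus 8 hours at delay t1 + 14 days;
# the option with the smaller discounted cost is preferred.
def exp_discount(t, delta=0.995):
    return delta ** t

def hyp_discount(t, k=0.02):
    return 1.0 / (1.0 + k * t)

def prefers_early(discount, t1):
    return 7 * discount(t1) < 8 * discount(t1 + 14)

for t1 in (59, 0):   # asked roughly two months ahead vs. asked on the day
    print(f"delay {t1:>2} days:",
          "exponential prefers early =", prefers_early(exp_discount, t1),
          "| hyperbolic prefers early =", prefers_early(hyp_discount, t1))
# Exponential discounting prefers the earlier, smaller task at both delays;
# hyperbolic discounting prefers it far in advance but postpones once the
# date arrives.
```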
Game theory: Repeated games, pt. 3

Normative temporal preference on payoff sequences:


People have finite life spans, and
the mortality hazard rate is not independent of age,
which otherwise would normatively rationalize exponential discounting.
“Stern Review” of climate change: utilitarian impartiality =⇒ there
should normatively be little to no temporal discounting (Stern 2007).
Objections to Stern’s introduction of moral philosophy to economics,
e.g., Nordhaus 2007a,b; Weitzman 2007; Dasgupta 2008.
Priority of future (generations’) well-being over that of the present:
exponential ≪ hyperbolic ≪ zero discounting = utilitarian impartiality.
For a survey of temporal discounting from the perspective of moral
philosophy, see Mintz-Woo 2018.
Aside: Zoom recording rules and recommendations

Zoom lectures will be recorded for the benefit of students who may
not be able to attend at the normal time, due to technological
difficulties or time-zone differences.
Please let me know in advance by email if you cannot attend one or
more of the remaining lectures.
“RECOMMENDATION: You should consider advising students that
Zoom provides an on-screen notification to meeting participants,
whenever a session is being recorded...
RULE: Instructors must not allow or enable students to record class
sessions, including by using Zoom...
RULE: Instructors may post links to class session recordings only in
the Zoom link of their Canvas course webpages...
RULE: Instructors must instruct students not to disclose any Zoom
recording URL — or any copies of the recording the student might
create or obtain — to anyone outside the class...
Aside: Zoom recording rules and recommendations, pt. 2

RULE: Unless instructors have a compelling pedagogical reason not to


do so, instructors must advise students of the following additional
measures they may take to protect their privacy:
Students may select audio-only participation in Zoom class sessions.
Students may access Zoom class sessions under a pseudonymous
username.
In order to facilitate class participation, students would be expected to
communicate their pseudonyms offline to instructors.
A BEST PRACTICE for instructors would be to call on/ address the
student using the student’s pseudonym.
RULE: Instructors must not share or disclose a Zoom recording of a
class session to anyone outside of the course, without first obtaining
the approval of the Office of the Vice Provost for Advances in
Learning.
Aside: Zoom recording rules and recommendations, pt. 3

RULE: The following information must be provided to students,


regarding the Zoom recording function:
Instructors can use Zoom to record class sessions.
If an instructor uses Zoom to record a class session, Zoom provides
audio and visual indicators to inform you when the recording starts,
stops, is in progress, and is paused/unpaused.
You may not yourself record a class session.
Links to class session recordings, if available, will be posted in the
Zoom meetings section of the Canvas course webpage.
Links to Zoom class session recordings will be removed at the end of
the academic term.
You may not disclose the link to/URL of a class session recording or
copies of recordings to anyone, for any reason. It is available to your
class only.
Aside: Zoom recording rules and recommendations, pt. 4

RULE: Unless you have a compelling pedagogical reason not to do so,


you should advise students of the following additional measures they
may take to protect their privacy:
You have the option to appear in an audio-only mode, such that your
webcam is disabled during the class.
You have the option to access Zoom class sessions under a
pseudonymous username.
In order to facilitate class participation, you are expected to
communicate your pseudonym offline to your instructor.”
(Harvard University Information Technology 2020)
Game theory: Terminology

For a strategic game ⟨I , (Ai ), (ui )⟩, a payoff profile is w ∈ RI such
that there exists an action profile a = (ai ) for which wi = ui (a) for all i ∈ I .
A feasible payoff profile is a convex combination of payoff profiles.
Player i’s minimax payoff is defined by

$v_i = \min_{a_{-i} \in A_{-i}} \max_{a_i \in A_i} u_i(a_i, a_{-i}).$

A payoff profile w ∈ RI is (strictly) enforceable if and only if wi ≥ vi


(strictly, i.e., without equality) for all i ∈ I .
An action profile a ∈ A is (strictly) enforceable if and only if u(a) is
(strictly) enforceable.
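A minimal sketch of the minimax computation (the prisoners'-dilemma payoff numbers are assumptions, and only pure actions are considered for simplicity):

```python
# v_i = min over the opponent's action of the max over player i's own action.
import numpy as np

# u[i, a1, a2] = payoff to player i when player 1 plays a1 and player 2 plays a2.
# Actions: 0 = Cooperate, 1 = Defect.
u = np.array([
    [[3, 0], [4, 1]],    # player 1
    [[3, 4], [0, 1]],    # player 2
])

v1 = u[0].max(axis=0).min()   # opponent (player 2) minimizes player 1's best reply
v2 = u[1].max(axis=1).min()   # opponent (player 1) minimizes player 2's best reply
print("minimax payoffs:", v1, v2)   # 1 and 1 (the mutual-defection payoffs)
# So a payoff profile w is strictly enforceable here iff w1 > 1 and w2 > 1,
# e.g., the cooperative profile (3, 3).
```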
Game theory: Terminology, pt. 2

Suppose we have fixed a notion of the overall payoff, i.e., the utility
representation on sequences of payoffs.
e.g., Suppose preferences on payoff sequences are given by the
limit-of-means
i.e., agent i has overall payoff
$(w_i^t)_{t \ge 1} \mapsto \lim_{T \to \infty} \sum_{t=1}^{T} \frac{w_i^t}{T}.$

(Note that this assumes the limit exists, which is not always true.)
Then, v is the overall payoff profile of (at ) precisely when each vi is
the limit-of-means of (ui (at )).
Game theory: When specifying the overall payoff is enough
Suppose that agents can only use memory-n strategies,
i.e., their actions in the T th stage game GT can only depend on the
n-history, the action profiles of GT −n , . . . , GT −1 .
Then, after fixing the agents’ memory-n strategies,
the set of I -tuples of agents’ n-histories can be thought of as a finite
state space, on which each passing stage game probabilistically acts as
a Markov chain.
Under certain conditions, this Markov chain has a unique invariant
distribution which is also its limiting distribution.
Thus, the limit of means
$\lim_{T \to \infty} \sum_{t=1}^{T} \frac{w_i^t}{T}$

always exists and is equal to the expected payoff profile of the


invariant/limiting distribution on the n-histories.
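A minimal sketch of this Markov-chain observation (the stage game, the memory-1 "tit-for-tat with trembles" strategies, and the noise level eps are assumptions): the 1-history evolves as an ergodic Markov chain, and the limit-of-means payoff equals the expected stage payoff under its invariant (= limiting) distribution.

```python
import numpy as np

eps = 0.05                               # probability of an unintended action
states = [('C', 'C'), ('C', 'D'), ('D', 'C'), ('D', 'D')]
payoff1 = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 4, ('D', 'D'): 1}

P = np.zeros((4, 4))                     # transition matrix over 1-histories
for s, (a1, a2) in enumerate(states):
    intended1, intended2 = a2, a1        # tit-for-tat: copy the opponent's last action
    for t, (b1, b2) in enumerate(states):
        p1 = 1 - eps if b1 == intended1 else eps
        p2 = 1 - eps if b2 == intended2 else eps
        P[s, t] = p1 * p2                # independent trembles

pi = np.full(4, 0.25)                    # power iteration -> invariant distribution
for _ in range(10_000):
    pi = pi @ P

limit_of_means_1 = sum(pi[k] * payoff1[st] for k, st in enumerate(states))
print("invariant distribution:", np.round(pi, 3))
print("player 1's limit-of-means payoff:", round(limit_of_means_1, 3))
```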
Game theory: Well-defining the limit-of-means preference

We will instead consider the setting in which agents have infinite


memory,
i.e., their actions can depend on the entire history, comprised of the
action profiles of all past stage games.
Then, the limit of means is not always well-defined.
Two ways to make the Nash equilibrium of undiscounted ∞ly repeated
games well-defined:
First, we can define the overall payoff by $\liminf_{T \to \infty} \sum_{t=1}^{T} \frac{w_i^t}{T}$, the
pessimistic view of the payoff sequence;
or $\limsup_{T \to \infty} \sum_{t=1}^{T} \frac{w_i^t}{T}$, the optimistic view of the payoff sequence.
Second, we can create a new definition of Nash equilibrium specialized
to undiscounted ∞ly repeated games:
the limit-of-means Nash equilibrium is a sequence of action profiles for
which any deviation by a player will result in a payoff sequence whose
limsup is not strictly larger than the previous limsup=liminf=lim.
Game theory: Undiscounted ∞ly repeated games

Want to characterize the overall payoff profiles of the Nash equilibria


in repeated games.
Easy direction: the overall payoff profile of any Nash equilibrium
(of an infinitely repeated game, whose overall payoff is given by the
limit of means)
is enforceable.

Proposition
Consider an infinitely repeated (T = ∞) strategic game ⟨I , (Ai ), (ui )⟩ in
which the players prioritize the limit of means. Every overall payoff profile
of a Nash equilibrium of the repeated game is an enforceable payoff profile
of G .

Proof: Otherwise, some player i can strictly gain by defecting. Q.E.D.


Game theory: Undiscounted ∞ly repeated games, pt. 2

“Converse” results—characterizing Nash equilibrium payoff profiles of


repeated games—are called folk theorems.
A Nash folk theorem for infinitely repeated games with the overall
payoff given by the limit-of-means:

Theorem (Aumann–Shapley)
Every feasible strictly enforceable payoff profile w is the overall payoff
profile of a Nash equilibrium of the repeated game.
Game theory: Undiscounted ∞ly repeated games, pt. 3

Proof. By definition, w is a convex combination of payoff profiles.


Thus, there exists a = (at ) such that
$\lim_{T \to \infty} \sum_{t=1}^{T} \frac{u_i(a^t)}{T} = w_i$

for all i ∈ I .

Exercise
Prove that there exists a = (at ) such that the above is true. (Hint: Begin
by showing that any point in the convex hull of a finite set of points S can
be approximated by rational-coefficient linear sums of points in S.)
Game theory: Undiscounted ∞ly repeated games, pt. 4

For each i ∈ I , there exists a minimaxing action profile p−i of
players −i.
Define player i’s strategy si = (sit ) of the repeated game by the one
in which she plays sit = ait until the first defection of a player j ̸= i
from a,
after which in all subsequent periods, player i plays to punish the first
defector j, i.e., sit = (p−j )i .
The strategy profile s is a Nash equilibrium.
Indeed, if player i defects, then she receives at most her minimax
payoff vi ,
which is < wi by the definition of strict enforceability. Q.E.D.
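A minimal numeric sketch of the punishment argument above (the payoff numbers, the deviation period, and the finite truncation T are assumptions reusing the prisoners'-dilemma values from earlier): a one-shot gain followed by permanent minimaxing leaves the deviator with a limit-of-means payoff of v_i < w_i.

```python
T = 100_000                 # long truncation approximating the limit of means
w_i, v_i, temptation = 3, 1, 4
t0 = 50                     # period in which the deviation occurs

on_path = [w_i] * T
deviating = [w_i] * (t0 - 1) + [temptation] + [v_i] * (T - t0)

print("on-path mean payoff   :", sum(on_path) / T)              # 3.0
print("deviator's mean payoff:", round(sum(deviating) / T, 4))  # close to v_i = 1
```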
Game theory: Finitely repeated games

The lack of an infinite horizon changes things.


The outcome in the last stage game must be a Nash equilibrium,
and this casts a noncooperative shadow on the latter part of the
repeated game.
This noncooperative shadow envelops the entire repeated game
when every player, in every Nash equilibrium of the constituent game,
can only obtain her minimax payoff.
Let GT denote the T -times repeated strategic game
G = ⟨I , (Ai ), (ui )⟩ in which the players prioritize the mean.

Proposition
Suppose the payoff profile of each Nash equilibrium of G is precisely the
vector of minimax payoffs. Then, for any Nash equilibrium (at ) of GT ,
each at must be a Nash equilibrium of the constituent game G .
Game theory: Finitely repeated games, pt. 2

Proof: By reverse induction.


The last action profile, aT , must be a Nash equilibrium,
(whose payoff profile is then determined, as the vector of minimax
payoffs).
Assume that the claim is proved for t = τ + 1, . . . , T .
Abstracting away the payoff profiles of stage games τ + 1, . . . , T as
minimax payoff vector acquisitions,
the “last” action profile, aτ , must be a Nash equilibrium,
(whose payoff profile is then determined, as the vector of minimax
payoffs).
Game theory: Finitely repeated games, pt. 3

The noncooperative shadow is in general not all-encompassing.


A Nash folk theorem for finitely repeated games GT in which players
prioritize the mean payoff:

Theorem (Benoit–Krishna)
Suppose G has a Nash equilibrium â for which each player’s payoff is
strictly greater than her minimax payoff vi . Then, for any feasible strictly
enforceable action profile a∗ and ε > 0, there exists T ∗ such that if
T > T ∗ , a vector that differs from u(a∗ ) in each entry by at most ε
is the overall payoff profile of a Nash equilibrium of the repeated game GT .
Game theory: Finitely repeated games, pt. 4
Proof. Let L be arbitrary; we will set it later so that deviations are
unprofitable.
For each player i, choose a minimaxing action profile p−i of players −i.
Construct player i’s abstract machine Mi that determines her strategy:
The set of states of Mi is

{Standardt : 1 ≤ t ≤ T − L} ∪ {Punish(j) : j ̸= i} ∪ {Nash}.

In state Standardt , player i does the action ai∗ , and the state updates
to Standardt+1 if t < T − L and to Nash if t = T − L.
In the permanent state Nash, player i does the action âi .
At any point when a player j ̸= i plays an action ̸= aj∗ during her
standard state Standardt , all other players’ states update to the
permanent state Punish(j),
and when acting in this state, they play their respective actions in the
minimaxing action profile p−j .
Game theory: Finitely repeated games, pt. 5
We choose L sufficiently large so that

$\max_{a_i \in A_i} u_i(a_i, a^*_{-i}) - u_i(a^*) \le L\,(u_i(\hat a) - v_i)$

for all i ∈ I ,
which guarantees that no player can profitably deviate in a standard
state.
Finally, choose T ∗ large enough so that
$\left| \frac{(T^* - L)\,u_i(a^*) + L\,u_i(\hat a)}{T^*} - u_i(a^*) \right| \le \varepsilon,$
i.e., so that the mean payoff of player i differs from ui (a∗ ) by at most
ε, for all players i. Q.E.D.
Exercise
Prove this result under the weaker hypothesis that G has Nash equilibria
â1 , . . . , âI such that, in âi , player i’s payoff is strictly greater than her minimax
payoff vi .
Game theory: Discounted ∞ly repeated games
With (limit of) mean preferences, a dichotomy:
Infinitely repeated games have full potential for cooperation,
since any profit from one-time deviation cannot matter in the grand
scheme of things.
Finitely repeated games have limited potential for cooperation,
because the above can matter.
An analogous dichotomy arises when we take into account
nonconstant temporal preferences,
i.e., agents who are impatient rather than patient.
Suppose now that preferences on payoff sequences are given by
exponential discounting, i.e., for a fixed δ ∈ (0, 1), the overall payoff is

$(w_i^t)_{t \ge 1} \mapsto (1 - \delta) \sum_{t=1}^{\infty} \delta^{t-1} w_i^t.$
Game theory: Discounted ∞ly repeated games, pt. 2

Nash folk theorem for infinitely repeated games with the overall payoff
given by exponential discounting:

Theorem (Fudenberg–Maskin)
Let w be a feasible strictly enforceable payoff profile of G = ⟨I , (Ai ), (ui )⟩.
For every ε > 0, there exists $\underline{\delta} \in (0, 1)$ such that

$\delta > \underline{\delta}$ =⇒ the δ-discounted infinitely repeated game of G has a Nash equilibrium
whose overall payoff profile w ′ differs from w in each entry by at most ε.

Exercise
Prove the Fudenberg–Maskin theorem. Hint: A proof similar to that of
Aumann–Shapley works.
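A small sketch in the spirit of the exercise (the prisoners'-dilemma payoffs are assumptions): scan for the smallest discount factor at which grim-trigger cooperation survives a one-shot deviation in the δ-discounted repeated game.

```python
# Deviating gains (4 - 3) today but loses (3 - 1) in every later period, so
# cooperation is sustainable iff (4 - 3) <= (delta / (1 - delta)) * (3 - 1).
coop, temptation, punishment = 3, 4, 1

def cooperation_sustainable(delta):
    one_shot_gain = temptation - coop
    future_loss = (delta / (1 - delta)) * (coop - punishment)
    return one_shot_gain <= future_loss

threshold = next(d / 1000 for d in range(1, 1000)
                 if cooperation_sustainable(d / 1000))
print("cooperation sustainable for delta >=", threshold)   # ~ 1/3
```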
Game theory: Extensive-form games
An extensive game is comprised of the following:
a finite set I of players;
a history set H of sequences (finite or infinite) that satisfies three
properties,
the empty sequence ∅ is in H,
every subhistory (ak )k=1,...,L of (ak )k=1,...,K ∈ H is also in H, and
if every subhistory of an infinite sequence (ak )k=1,...,∞ is in H, then it
is also in H;
a player function P : H \ Z → I ∪ {c}, where c denotes “chance”;
for every h ∈ P −1 (c), a probability distribution fc (·|h) on A(h), the
action space of history h;
for each player i, a partition Ii of P −1 (i) into information sets, each
of which has shared action space A(h);
for each player i, a complete and transitive preference relation ≾i on
the set Z of terminal histories.
Game theory: Extensive-form games, pt. 2
A strategy of player i is a function that, to every nonterminal history
h ∈ P −1 (i), assigns an action in A(h).
A Nash equilibrium is a strategy profile (si∗ )i∈I such that every player i
cannot deviate to another strategy to achieve a better outcome, i.e.,
$O(s_{-i}^{*}, s_i) \precsim_i O(s_{-i}^{*}, s_i^{*})$ for all strategies $s_i$ of player i.

The type of game we studied previously is called a strategic game.

Exercise
Show that every strategic game (thought of as simultaneous actions by the
players) can be converted into an extensive game. Show that every
extensive game (thought of as: each player submits a strategy in the
beginning) can be converted into a strategic game.

Thus, our definitions and results for strategic games carry over to
extensive games (e.g., on Nash equilibria).
Game theory: The chain store’s dilemma

Figure: Extensive game representation of chain store’s dilemma (Ho–Su 2011)


Game theory: Nash equilibria of the chain store’s dilemma

The chain store’s dilemma has multiple Nash equilibria.


Every terminal history in which the outcome of every interaction is
(5,1) or (2,2) is achieved by a Nash equilibrium:
the former by the (sub)strategies (Enter 7→ Fight, Stay Out),
the latter by the (sub)strategies (Enter 7→ Share, Enter).
But assuming the chain store is rational, its threat to fight in its
deterrent substrategy, Enter 7→ Fight, is not credible.
When considering temporal structure, we need a new definition of
equilibrium that incorporates the axiom that rational agents would
change course for self-interest.
Game theory: Subgame perfect equilibrium
Suppose all information sets are singletons (perfect information).
A subgame perfect equilibrium of an extensive game Γ is a strategy
profile that is a Nash equilibrium in every subgame of Γ.
The chain store’s dilemma only has one SPE,
((Enter 7→ Share)10 , Enter, . . . , Enter).
To show this, the following criterion for SPE is useful:

Exercise (One deviation property)


Prove this. Let Γ = ⟨I , H, P, (≾i )⟩ be a finite-horizon extensive game of
perfect information. The strategy profile s ∗ is an SPE if and only if for
every i ∈ I , we have for all h ∈ P −1 (i)
$O_h(s_{-i}^{*}|_h, s_i) \precsim_i O_h(s_{-i}^{*}|_h, s_i^{*}|_h)$

for every strategy si in the subgame Γ(h) that differs from si∗ |h only in the
entry of the action after the initial history of Γ(h). Hint: reverse induction.
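A minimal backward-induction sketch (not from the original notes): a one-entrant chain-store stage game whose payoffs are listed as (entrant, chain store); Stay Out gives (1, 5), Enter then Share gives (2, 2), and the (0, 0) payoff after Fight is an assumption for illustration. Backward induction is the reverse-induction idea behind the one deviation property, and it recovers the unique SPE path Enter, then Share.

```python
def backward_induction(tree, mover_at_depth, depth=0):
    """tree is either a payoff tuple (terminal history) or a dict action -> subtree.
    mover_at_depth[d] is the index of the player who moves at depth d."""
    if isinstance(tree, tuple):
        return tree, []
    mover = mover_at_depth[depth]
    best_payoffs, best_path = None, None
    for action, subtree in tree.items():
        payoffs, path = backward_induction(subtree, mover_at_depth, depth + 1)
        if best_payoffs is None or payoffs[mover] > best_payoffs[mover]:
            best_payoffs, best_path = payoffs, [action] + path
    return best_payoffs, best_path

game = {
    "Stay Out": (1, 5),
    "Enter": {"Fight": (0, 0), "Share": (2, 2)},
}
mover_at_depth = {0: 0, 1: 1}          # depth 0: entrant, depth 1: chain store

payoffs, path = backward_induction(game, mover_at_depth)
print("SPE path:", path, "payoffs:", payoffs)   # ['Enter', 'Share'] (2, 2)
```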
Game theory: Punishing for a limited period of time

We can analogously define SPE for repeated strategic games:


a strategy profile such that for any subgame, no player has a deviation
to strictly profit.
Perfect folk theorem for infinitely repeated games with overall payoff
given by the limit of means:

Theorem (Aumann-Shapley/Rubinstein)
Every feasible strictly enforceable payoff profile w is the overall payoff
profile of an SPE of the repeated game.
Game theory: Punishing for a limited period of time, pt. 2

Proof: Like last time, there exist action profiles at whose


limit-of-means payoff vector is w .
A deviation (say, by player i) from this coordinated cooperation is
punished in the minimax way,
but only for a finite period of time chosen so that the gain from deviating is
outweighed by the summed difference between player i’s payoff under
coordination and her minimax payoff over that period.
No deviation is strictly profitable.
Furthermore, no deviation is strictly profitable in any subgame,
since what happens in a finite period of time does not change the
limit-of-means payoff of any punisher. Q.E.D.
Game theory: Rewarding the punishers
Perfect folk theorem for infinitely repeated games with overall payoff
given by exponential discounting.
To make punishment credible, the punishment of a deviator (say,
player i) only happens for a finite period of time,
and the punishers are rewarded afterwards to offset the sacrifice they
made to minimax punish i.
Requires the ability to reward punishers without rewarding player i.

Theorem (Fudenberg–Maskin)
For feasible strictly enforceable action profile a∗ , suppose there exists a
collection a(1), . . . , a(I ) of strictly enforceable action profiles such that for
every player i, we have a(i) ≺i a∗ and a(i) ≺i a(j) for all j ̸= i. Then,
there exists $\underline{\delta} \in (0, 1)$ such that

$\delta > \underline{\delta}$ =⇒ the δ-discounted infinitely repeated game of G has an SPE
whose overall payoff profile is u(a∗ ).
Game theory: Rewarding the punishers, pt. 2

Proof: For convenience, set a(0) = a∗ .


Set of states: C (n) for n ∈ {0} ∪ I ,
P(j, t) for j ∈ I and 1 ≤ t ≤ L.
Initial state: C (0).
Output: C (j) 7→ a(j),
P(j, t) 7→ (p−j , b(p−j )), where p−j is the minimax punishment
(I − 1)-tuple of actions, and b(p−j ) is the best response to it.
Transitions: From C (j), stay in C (j) until i deviates. Then, switch to
P(i, L).
From P(i, t), if j ̸= i deviates, then switch to P(j, L);
otherwise switch to P(i, t − 1), where P(i, 0) means to switch to C (i).
Game theory: Rewarding the punishers, pt. 3

We set L and δ so that no profitable deviations exist in any subgame.


For M = maxi,a ui (a), take L large enough so that for all i ∈ I and
j ∈ {0} ∪ I ,
M − ui (a(j)) < L(ui (a(j)) − vi ).
To deter a deviation from state C (j), choose δ 1 close enough to 1 so
that for all δ ∈ (δ 1 , 1), i ∈ I , and j ∈ {0} ∪ I ,
$M - u_i(a(j)) < \sum_{k=1}^{L} \delta^k \,(u_i(a(j)) - v_i)$

and in particular (since ui (a(i)) < ui (a(j)))


$M + \sum_{k=1}^{L} \delta^k v_i + \sum_{\ell = L+1}^{\infty} \delta^{\ell} u_i(a(i)) < \sum_{k=0}^{\infty} \delta^k u_i(a(j)).$
Game theory: Rewarding the punishers, pt. 4
To deter a deviation from state P(i, t) by player j, we need that
$M + \sum_{k=1}^{L} \delta^k v_j + \sum_{\ell = L+1}^{\infty} \delta^{\ell} u_j(a(j)) < \sum_{k=0}^{t-1} \delta^k u_j(p_{-i}, b(p_{-i})) + \sum_{\ell = t}^{\infty} \delta^{\ell} u_j(a(i)).$

This can be guaranteed by requiring that δ ∈ (δ2 , 1) for δ2
sufficiently close to 1, since uj (a(j)) < uj (a(i)).
Take δ = max(δ 1 , δ 2 ). Q.E.D.?

Exercise
The above proof of the folk theorem implicitly requires that any deviation
from the minimax punishment (I − 1)-tuple would be detected, which
requires some restriction. Examples include the condition that the
comprising actions be pure strategies (whereas in general, they can be
mixed strategies; e.g., odds and evens) and the condition that each
punishing player’s random generator can be observed after her action.
Complete the proof of the folk theorem for the general setting.
Game theory: More folk theorems

Figure: “Folk theorem (game theory)”, Wikipedia; references in article


Mechanism design: Reversing game theory

We have studied how rational agents act when in a strategic game.


We now study the reverse problem:
Can we design a system of game rules so that,
no matter which players (with whichever preferences over the set of
outcomes) participate,
the set of predicted outcomes = the set of socially desired outcomes?
Mechanism design: Reversing game theory, pt. 2

Let C be a set of outcomes.


A strategic form with players in I and consequences in C is defined by
⟨(Ai ), g ⟩,
i.e., an action space Ai for each player i ∈ I and an outcome function
g : A → C.
An environment ⟨I , C , P, G ⟩ consists of:
a set of players I ,
a set of outcomes C ,
a set of I -tuples of preference profiles P, and
a set G of strategic forms with players in I and consequences in C .
For an environment ⟨I , C , P, G ⟩, a strategic game is given by a pair of
choices ((≾i ), x) ∈ P × G .
Specifically, it is the strategic game in which player i has the
preference over A induced (through g ) by her preference ≾i over C .
Mechanism design: King Solomon’s dilemma

Players 1 and 2 both claim to be the real mother of a baby.


Each of them know the truth about who is the real mother, but she
cannot prove her motherhood.
King Solomon tries to extract the truth by threatening to cut the baby
in half and give one to each player,
knowing that the false mother would prefer this over the outcome in
which the true mother would get the baby,
but that the true mother would rather the false mother get the baby
than this.
Mechanism design: King Solomon’s dilemma, pt. 2

The players are I = {1, 2}.


The outcome space C = {a, b, d},
comprised of a (player 1 gets the baby), b (player 2 gets the baby),
and d (the baby is cut in half).
The preference profiles P = {(≾1 , ≾2 ), (≾′1 , ≾′2 )},
where (≾i ) represents the state of the world in which player 1 is the
true mother,
d ≺1 b ≺1 a and a ≺2 d ≺2 b;
and (≾′i ) represents the state of the world in which player 2 is the true
mother,
b ≺′1 d ≺′1 a and d ≺′2 a ≺′2 b.
G is the set of all strategic forms with players in I and consequences
in C .
Mechanism design: Predicted outcomes

Recall that we alluded to the notion of a predicted outcome.


Prediction: players’ action profile will be a Nash equilibrium.
Stronger prediction: each player will play a weakly dominant strategy,
if it exists,
e.g., bid one’s true valuation in a Vickrey auction.
(Unlike a Nash equilibrium, a dominant strategy equilibrium, i.e., an
action profile comprised completely of weakly dominant strategies,
may not necessarily exist.)
A solution concept for the environment ⟨I , C , P, G ⟩ is a
correspondence S : P × G ⊸ C .
S = Nash (the output is the set of Nash equilibria of the game given
by a choice in P × G ) and S = DSE (the output is the set of
dominant strategy equilibria) are solution concepts:
Mechanism design: Socially desired outcomes
From this point on, we fix an arbitrary environment ⟨I , C , P, G ⟩.
We wish to construct a strategic form for which the set of predicted
outcomes is equal to the set of socially desired outcomes.
A social choice rule is a correspondence f : P ⊸ C .
A strategic form ⟨(Ai ), g ⟩ ∈ G is said to S-implement the social
choice rule f
if and only if for every preference profile (≾i ) ∈ P, we have

S((≾i ), ⟨(Ai ), g ⟩) = f ((≾i )),

i.e., the set of predicted outcomes equals the set of socially desired
outcomes.
One example of a social choice rule is f = Pareto,
which maps an I -tuple of preferences (≾i ) over C to the subset of C
comprised of outcomes that are not Pareto dominated by another
outcome, assuming the players have preferences (≾i ).
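A minimal sketch of the Pareto social choice rule (the outcome set and the two preference rankings are made-up examples): return the outcomes of C that are not Pareto dominated by another outcome under the given profile.

```python
C = ["a", "b", "d"]
rankings = [
    {"a": 3, "b": 2, "d": 1},   # player 1: a > b > d
    {"b": 3, "d": 2, "a": 1},   # player 2: b > d > a
]

def pareto(rankings, C):
    def dominates(x, y):
        weakly_better = all(r[x] >= r[y] for r in rankings)
        strictly_better = any(r[x] > r[y] for r in rankings)
        return weakly_better and strictly_better
    return [y for y in C if not any(dominates(x, y) for x in C)]

print(pareto(rankings, C))   # ['a', 'b']: d is Pareto dominated by b
```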
Mechanism design: S-implementation

Which social choice rules can we DSE-implement?


Which social choice rules can we Nash-implement?
Policymakers and market designers want to smartly design
non-exploitable systems of incentives that lead to (some notion of)
socially desired outcomes,
regardless of the participants’ private information (preferences).
Moral philosophy: Which social choice rules should (in the normative
sense) we implement?
Mechanism design: Solomon’s S-implementation problem

Recall King Solomon’s environment ⟨{1, 2}, {a, b, d}, {(≾i ), (≾′i )}, G ⟩
f = TrueMother is defined by (≾i ) 7→ {a} and (≾′i ) 7→ {b}.
His original strategic form: Announce d will happen.
Hope exactly one player will say “don’t cut the baby, give it to the
other player.”
Then, this player is the true mother, so do a or b depending on
whether she is player 1 or 2.
If both players say this, then...?
Despite his wisdom, the strategic form is not completely defined; this trick
probably won’t work again when the dilemma recurs in the future.
Can King Solomon devise a strategic form which can DSE -implement
the social choice function f = TrueMother?
Can a strategic form Nash-implement f = TrueMother?
Mechanism design: DSE-implementation
It turns out that DSE-implementation is difficult.
A social choice rule f is dictatorial if and only if there is a player j ∈ I
such that for any (≾i ) ∈ P,

f ((≾i )) ⊆ {a ∈ C : b ≾j a for all b ∈ C }.

Theorem (Gibbard—Satterthwaite)
Suppose that in the environment ⟨I , C , P, G ⟩, the cardinality |C | is at least
three, P is the set of all strict preference profiles over C , and G is the set
of all strategic forms of the environment. If a social choice rule f is
DSE-implementable and satisfies

for every a ∈ C , there exists (≾i ) ∈ P such that f ((≾i )) = {a},

then f is dictatorial.
Mechanism design: DSE-implementation, pt. 2

Proof. By assumption, we have a strategic form ⟨(Ai ), g ⟩


DSE-implementing f .
We will use this to construct another strategic form ⟨(A∗i ), g ∗ ⟩.
In it, each action space A∗i is the set of preference relations over C (note: not
profiles). The construction will be such that
for any (≾i ) ∈ P, the action profile (≾i ) is a DSE equilibrium of the
strategic game given by plugging in (≾i ) into ⟨(A∗i ), g ∗ ⟩.
Furthermore, it will satisfy g ∗ ((≾i )) ∈ f (≾i ).
Mechanism design: DSE-implementation, pt. 3

The set of weakly dominant actions of any player j in ⟨(Ai ), g ⟩


depends only on ≾j .
So, choose a weakly dominant action aj (≾j ) ∈ Aj for each player j
and each preference relation ≾j .
Since ⟨(Ai ), g ⟩ DSE-implements f , it follows that

g ((ai (≾i ))i∈I ) ∈ f ((≾j , ≾−j ))

for any preference relations ≾−j of players −j.

Define g ∗ ((≾i )) = g ((ai (≾i ))i∈I ).
We have shown above that g ∗ ((≾i )) ∈ f ((≾i )).
By construction, (≾i ) is a DSE equilibrium of the strategic game given
by plugging in (≾i ) into ⟨(A∗i ), g ∗ ⟩.
Mechanism design: DSE-implementation, pt. 4
Observe that g ∗ is a single-valued social choice rule; it is an
honest-to-god function P → C (a ranked choice voting system).
Since f satisfies the theorem condition, the selection g ∗ also satisfies
it, i.e., is surjective.
Furthermore, it is stable, defined by the property that

g ∗ ((≾−i , ≾′i )) ≾i g ∗ ((≾−i , ≾i )) for all strict orderings ≾′i .

We can thus apply the following to conclude that g ∗ is dictatorial.

Theorem (Gibbard—Satterthwaite Impossibility Theorem)


Retain the hypotheses on the environment ⟨I , C , P, G ⟩ from the
Gibbard–Satterthwaite Theorem. If the social choice function f : P → C is
surjective and stable, then it is dictatorial.

Colloquially, the only incentive-compatible ranked choice voting


systems are dictatorial (under the aforementioned hypotheses).
Mechanism design: Order theory

Our proof of the Gibbard—Satterthwaite Impossibility Theorem uses


order theory: specifically, the notion of filters and ultrafilters.
A filter F on a nonempty set I is a family of subsets satisfying
A, B ∈ F =⇒ A ∩ B ∈ F,
A ∈ F and A ⊆ B ⊆ I =⇒ B ∈ F,
and ∅ ∉ F.
An ultrafilter F is a filter that is maximal among filters with respect
to inclusion: one that is not contained in any other filter.
An ultrafilter F is principal if and only if it is precisely the family of all
subsets containing some element i ∈ I .

Exercise
Show that if I is finite, every ultrafilter on I is principal.
Mechanism design: Proof of Impossibility Theorem

Our proof of the Gibbard—Satterthwaite Impossibility Theorem will


show that a certain family of subsets in I is an ultrafilter.
For every distinct pair x, y ∈ C , define the preventing family Fxy by
the family of all subsets S ⊆ I that have the property that
if all members of S strictly prefer x to y in the preference profile (≾i ),
then f ((≾i )) ̸= y .
We will show that all the preventing families Fxy are equal,
and that this family is an ultrafilter.
Thus, it is principal for some player j ∈ I , which implies that f is
dictatorial favoring player j.
Mechanism design: Proof of Impossibility Theorem, pt. 2
To show that the preventing family is a (principal) ultrafilter, we will
use the following sufficient condition.
Lemma
If a family F of subsets of I satisfies
1 X ∈ F and X ⊆ Y ⊆ I =⇒ Y ∈ F,
2 X ∈ F ⇐⇒ (I \ X ) ∉ F, and
3 X , Y , Z ∈ F =⇒ X ∩ Y ∩ Z ̸= ∅,
then it is an ultrafilter.
Proof. If ∅ ∈ F, then I = I \ ∅ ∉ F by condition (2); but ∅ ⊆ I , so
condition (1) would force I ∈ F, a contradiction. Thus ∅ ∉ F.
Assume A, B ∈ F, and suppose that A ∩ B ∉ F.
Then, I \ (A ∩ B) ∈ F by condition (2),
but applying condition (3) to A, B, and I \ (A ∩ B) gives
A ∩ B ∩ (I \ (A ∩ B)) ̸= ∅, a contradiction since this intersection is empty.
Thus, F is a filter; maximality follows from condition (2), since any filter strictly
containing F would contain both some X and I \ X , hence ∅. So F is an ultrafilter. Q.E.D.
Mechanism design: Proof of Impossibility Theorem, pt. 3

We begin the proof of the Gibbard–Satterthwaite Impossibility


Theorem.
Let Vxy ((≾i )) = {i ∈ I : y ≺i x}, the set of voters preferring x to y .

Lemma
A social choice function f : P → C is stable if and only if

[x ̸= y , f ((≾i )) = x, Vxy ((≾i )) ⊆ Vxy ((≾′i ))] =⇒ f ((≾′i )) ̸= y .

Proof. We say that (x, y ) is optimal for a strict ordering ≾ if the top
choice is x and the second top choice is y .
We say that {x, y } is optimal for a strict ordering ≾ if the top two
choices are, in any order, x and y .
Mechanism design: Proof of Impossibility Theorem, pt. 4

Define the self-correspondences φ^{xy}_j : P ⊸ P by

φ^{xy}_j ((≾i )) = {(≾′i ) : ≾′−j = ≾−j , {x, y } is optimal for ≾′j ,
and (x, y ) is optimal for ≾′j if j ∈ Vxy ((≾i ))},

and φ^{xy} by the composition of all the φ^{xy}_j .
This construction has the property that for any stable social choice
function f : P → C , we have

[f ((≾i )) = x and (≾′i ) ∈ φ^{xy}_j ((≾i ))] =⇒ f ((≾′i )) = x;

and furthermore, by applying each φ^{xy}_j in turn, we have

[f ((≾i )) = x and (≾′i ) ∈ φ^{xy} ((≾i ))] =⇒ f ((≾′i )) = x.


Mechanism design: Proof of Impossibility Theorem, pt. 5

We have shown that applying certain types of manipulations to the


preference profile does not change its value under f .
Now suppose, for the sake of a contradiction, that x ̸= y , f ((≾i )) = x,
Vxy ((≾i )) ⊆ Vxy ((≾′i )), and f ((≾′i )) = y .
Such manipulations can get us from both (≾i ) and (≾′i ) to a common profile (≾∗i ),
defined by the following:
if j ∈ Vxy ((≾i )), force (x, y ) to be optimal for ≾j (bring them to the
top);
if j ∈ Vyx ((≾′i )), force (y , x) to be optimal for ≾j ;
otherwise, force {x, y } to be optimal in either order;
do the above for each j.
By construction, (≾∗i ) is in both φxy ((≾i )) and φyx ((≾′i )).
But this means f ((≾∗i )) is both x and y , a contradiction.

Exercise
Prove the converse of this lemma.
Mechanism design: Proof of Impossibility Theorem, pt. 6
Due to the above lemma and the surjectivity of f , we obtain the weak
Pareto property: if every agent i prefers x to y , then y is not the
social choice.
Lemma
The preventing family Fxy is invariant under the choice of x ̸= y .

Proof. First, we show that Fxy ⊆ Fxz , thus showing that Fxy is
invariant under the choice of y .
Let S ∈ Fxy and T ∈ Fyz be arbitrary.
There exists a strict preference profile (≾i ) such that x, y , and z are the top
three options of every agent, and
y ≾i z ≾i x for all i ∈ S \ T ,
x ≾i z ≾i y for all i ∈ T \ S,
z ≾i y ≾i x for all i ∈ S ∩ T , and,
x ≾i y ≾i z for all i ∈ I \ (S ∪ T ).
Mechanism design: Proof of Impossibility Theorem, pt. 7

Since every agent’s top three choices are x, y , z, the weak Pareto
property implies that f ((≾i )) is x, y , or z.
But we have constructed (≾i ) so that f ((≾i )) cannot be y or z.
Thus, it must be x.
Since Vxz ((≾i )) is equal to S,
it follows from the previous lemma that S ∈ Fxz .
(Because if f ((≾i )) = x, then for any other preference profile (≾′i )
that satisfies S ⊆ Vxz ((≾′i )), we have f ((≾′i )) ̸= z.)
A similar proof shows that Fxy ⊆ Fzy , thus showing that Fxy is
invariant under the choice of x. Q.E.D.

Exercise
Do the second case: write an analogous proof showing that Fxy ⊆ Fzy .
Mechanism design: Proof of Impossibility Theorem, pt. 8
We now show that F is an ultrafilter.
Condition (1) follows from the second last lemma and the weak Pareto
property.
Condition (2) follows from the weak Pareto property.
Condition (3): Suppose for the sake of a contradiction that
S, T , U ∈ F, but S ∩ T ∩ U = ∅.
We force a Condorcet cycle: there exists a strict preference profile (≾i ) such
that x, y , and z are the top three choices for all agents,

x ≾i y if i ∈ S,

y ≾i z if i ∈ T , and
z ≾i x if i ∈ U.
By the weak Pareto property, f ((≾i )) must be one of x, y , and z.
But it cannot be any of them, by our construction. Contradiction.
Q.E.D. of G–S Impossibility Theorem, and thus, of the G–S Theorem
Mechanism design: Exercises

Exercise
What happens to the G–S Impossibility Theorem when |C | = 2?

Exercise
Use a similar argument to prove Arrow’s Impossibility Theorem: Let L
denote the set of strict orderings of C . If the cardinality |C | is at least
three, then it is impossible for a social aggregating function F : LI → L to
simultaneously satisfy the following axioms.
1 Unanimity: If (≾i ) satisfies that x ≾i y for all agents i, then xF (≾i )y .
2 Independence of irrelevant alternatives: For any two strict orderings
(≾i ) and (≾′i ) such that for all agents i, we have x ≾i y if and only if
x ≾′i y , it follows that xF ((≾i ))y if and only if xF ((≾′i ))y .
3 Non-dictatorship: No agent j enjoys the property that F ((≾i )) =≾j
for all (≾i ).
Mechanism design: When is DSE-implementation possible?

DSE-implementation can only be done non-dictatorially if the


preference profiles are assumed to be in a “nice” restriction.
For example, suppose that the agents’ utility functions are quasi-linear,
i.e., there is an item and a currency allowed to vary over (−∞, ∞)
(the agents are allowed to go into arbitrary debt to pay for the item),
so that agent i’s utility can be expressed as the difference between her
valuation θi of the item and the amount of money she pays.
Then, the social choice function f defined by f ((θi )) = the set of
outcomes in which some k̂ ∈ arg maxk θk wins the object and pays
maxk̸=k̂ θk is DSE-implementable by a Vickrey auction.

Theorem
The second-price, sealed-bid auction DSE-implements the above social
choice rule f .
Mechanism design: Generalized Vickrey auction
Before our proof, we generalize the setting to the generalized Vickrey auction.
Let x̄ be a vector of auctioned objects.
Let vi (·) denote player i’s value function (true valuation of each
subvector of objects).
Each player self-reports a value function v̂i .
The auctioneer then chooses an allocation maximizing the summed
self-reported value functions,

$x^* \in \arg\max_{\sum_i x_i \le \bar x} \sum_i \hat v_i(x_i).$

Player i pays $\alpha_i - \sum_{j \ne i} \hat v_j(x_j^*)$, where

$\alpha_i = \max_{\sum_{j \ne i} x_j \le \bar x} \sum_{j \ne i} \hat v_j(x_j).$

Notice that αi does not depend on the bid of player i.


Mechanism design: Example of generalized Vickrey auction

Suppose there are players 1, 2 and goods A, B.


They submit v̂i (A), v̂i (B), v̂i (AB) (WLOG assuming that v̂i (0) = 0).
WLOG assume that v̂1 (AB) > v̂2 (AB) and
v̂1 (A) + v̂2 (B) > v̂1 (B) + v̂2 (A).
Case 1: If v̂1 (AB) > v̂1 (A) + v̂2 (B), then player 1 wins both goods.
Player 1 pays v̂2 (AB) − v̂2 (0) = v̂2 (AB), and player 2 pays
v̂1 (AB) − v̂1 (AB) = 0.
Case 2: If v̂1 (AB) < v̂1 (A) + v̂2 (B), then player 1 wins good A and
player 2 wins good B.
Player 1 pays v̂2 (AB) − v̂2 (B), and player 2 pays v̂1 (AB) − v̂1 (A).
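A minimal sketch of the generalized Vickrey (VCG) rule on the two-player, two-good example above (the numeric bids are assumptions chosen to fall into Case 2; bundles are represented as frozensets mapping to self-reported values):

```python
from itertools import product

GOODS = frozenset({"A", "B"})
BUNDLES = [frozenset(), frozenset({"A"}), frozenset({"B"}), GOODS]

def allocations():
    """All ways to give the two players disjoint bundles inside {A, B}."""
    for x1, x2 in product(BUNDLES, BUNDLES):
        if not (x1 & x2):
            yield x1, x2

def vcg(bids):
    # Efficient allocation with respect to the reported values.
    x_star = max(allocations(), key=lambda x: bids[0][x[0]] + bids[1][x[1]])
    payments = []
    for i in (0, 1):
        j = 1 - i
        alpha_i = max(bids[j][xj] for xj in BUNDLES)   # best the other could do alone
        payments.append(alpha_i - bids[j][x_star[j]])
    return x_star, payments

# Example bids with v1(AB) < v1(A) + v2(B): player 1 should win A, player 2 wins B.
bids = [
    {frozenset(): 0, frozenset({"A"}): 6, frozenset({"B"}): 2, GOODS: 7},
    {frozenset(): 0, frozenset({"A"}): 1, frozenset({"B"}): 4, GOODS: 5},
]
allocation, payments = vcg(bids)
print("allocation:", [sorted(b) for b in allocation])  # [['A'], ['B']]
print("payments  :", payments)  # player 1 pays v2(AB)-v2(B)=1, player 2 pays v1(AB)-v1(A)=1
```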
Mechanism design: Vickrey auction is incentive-compatible

Theorem (Vickrey)
Truthtelling is the unique weakly dominant bid for every player i.

Proof. Consider any fixed bid profile v̂−i for −i.


Assuming player i bids truthfully, choose a value-maximizing allocation
x ∗ and the resulting payment vector p ∗ ;
and assuming player i bids an arbitrary v̂i , choose a value-maximizing
allocation x̂ and the resulting payment vector p̂.
Player i’s utility from bidding v̂i is
$v_i(\hat x_i) - \hat p_i = v_i(\hat x_i) - \alpha_i + \sum_{j \ne i} \hat v_j(\hat x_j) \le \max_{\sum_k x_k \le \bar x}\Big(v_i(x_i) + \sum_{j \ne i} \hat v_j(x_j)\Big) - \alpha_i$
Mechanism design: Vickrey auction is IC, pt. 2
Player i’s utility from bidding v̂i is
$v_i(\hat x_i) - \hat p_i = v_i(\hat x_i) - \alpha_i + \sum_{j \ne i} \hat v_j(\hat x_j)$
$\le \max_{\sum_k x_k \le \bar x}\Big(v_i(x_i) + \sum_{j \ne i} \hat v_j(x_j)\Big) - \alpha_i$
$= v_i(x_i^*) + \sum_{j \ne i} \hat v_j(x_j^*) - \alpha_i$
$= v_i(x_i^*) - p_i^*,$

i.e., at most her utility from bidding her true valuation vi .

Exercise
Complete the proof: show that truthtelling is the only weakly dominant
strategy. This entails constructing a bid v̂−i such that a given non-truthful
bid v̂i gives strictly less utility than the truthful bid vi .
Mechanism design: Nash-implementation

Unlike DSE-implementation, Nash-implementation is possible in a


myriad of settings.
A social choice rule f : P ⊸ C is Maskin-monotonic if for any
preference profiles (≾i ), (≾′i ) ∈ P,

[c ∈ f ((≾i )) and there exist no i ∈ I and b ∈ C such that

b ≾i c and c ≺′i b] =⇒ c ∈ f ((≾′i )).


In Solomon’s environment, f = TrueMother is not Maskin-monotonic.
TrueMother selects the outcome in which the baby goes to the unique player
who prefers that the other player get the baby rather than d (the death of
the baby),
but each player’s relative rankings of a and b are always the same in
either possible state of the world.
Mechanism design: Nash-implementation, pt. 2

Theorem (Maskin)
Let ⟨I , P, C , G ⟩ be an environment in which G is the set of all (relevant)
strategic game forms. If a social choice rule f : P ⊸ C is
Nash-implementable, then it is Maskin-monotonic.

Proof. We are given that f is Nash-implemented by some strategic


game form x = ⟨(Ai ), g ⟩.
Suppose that c ∈ f ((≾i )) and c ∉ f ((≾′i )) for (≾i ), (≾′i ) ∈ P.
Then, there is some action profile a ∈ A such that g (a) = c which is a
Nash equilibrium of the strategic game ((≾i ), x),
but not a Nash equilibrium of the strategic game ((≾′i ), x),
i.e., there is a player j and a deviating action a′j for whom g (a) ≺′j g (a−j , a′j )
even though g (a−j , a′j ) ≾j g (a); taking b = g (a−j , a′j ) shows the hypothesis of
Maskin-monotonicity cannot hold for c. Q.E.D.
Mechanism design: Nash-implementation, pt. 3

Exercise
Construct an environment and a Maskin-monotonic social choice function
that shows that the converse of the above theorem is not true.

Theorem (Maskin)
Let ⟨I , P, C , G ⟩ be an environment with at least three players (|I | ≥ 3) in which
G is the set of all (relevant) strategic game forms. Suppose that a social choice
rule f : P ⊸ C is Maskin-monotonic, and furthermore, that it has the no-veto-power
property [c ∈ f ((≾i )) whenever at least I − 1 players maximally prefer c].
Then, f is Nash-implementable.

Proof. By constructing the strategic game form x = ⟨(Ai ), g ⟩.


Mechanism design: Nash-implementation, pt. 4

The action spaces are Ai = P × C × Z≥0 .


First, if all players agree on the true preference profile (≾i ) ∈ P and
the outcome c ∈ f ((≾i )),
then define the outcome to be c.
Second, if all but one player agree on the above [preference profile
(≾i ) and outcome c ∈ f ((≾i ))],
the majority prevails (define the outcome to be c)
unless the exceptional player has bid an outcome b that does not
make her strictly better off compared to c,
in which case define the outcome to be b.
In all other cases, choose a loudest-shouting player (highest bid in
Z≥0 ) to prevail; define the outcome to be her bid c.
Mechanism design: Nash-implementation, pt. 5

Need to show that f ((≾i )) = Nash((≾i ), x).


Step 1: f ((≾i )) ⊆ Nash((≾i ), x). Let c ∈ f ((≾i )).
Everyone bidding ((≾i ), c, 0) is a Nash equilibrium,
since a single deviator either doesn’t change the outcome
or changes it to something she doesn’t strictly prefer.
Step 2: f ((≾i )) ⊇ Nash((≾i ), x). Let a∗ be a Nash equilibrium of the
game ((≾i ), x), say with outcome c ∗ .
Two cases of showing that c ∗ ∈ f ((≾i )):
Case 1: In a∗ , everyone agrees on (≾′i ) and c ∗ ∈ f ((≾′i )).
There exist no i ∈ I and b ∈ C such that b ≾′i c ∗ and c ∗ ≺i b,
because if there were, player i would deviate to ((≾i ), b, ·) to strictly
profit.
Thus, by Maskin-monotonicity, c ∗ ∈ f ((≾i )).
Mechanism design: Nash-implementation, pt. 6
Case 2: otherwise. Subcase 2.1: all ai∗ = ((≾′i ), c ∗ , m) are the same.
Then, c ∗ must be each player’s favorite outcome, since otherwise she
can shout louder to get any strictly preferred outcome.
The no-veto-power property is more-than satisfied, so c ∗ ∈ f ((≾i )).
Subcase 2.2: Some ai∗ ̸= aj∗ .
Need to show that c ∗ is the favorite of at least I − 1 players, so that
c ∗ ∈ f ((≾i )) by the no-veto-power property.
Since I ≥ 3, at least one of i and j (WLOG i) has the property that
the other players are not in agreement.
Player i may not be able to deviate to strictly gain by shouting loudly,
since all the other players may be in agreement in the above way.
However, for every player ℓ ̸= i, it is guaranteed that the other players
are not in agreement, so ℓ could deviate by shouting loudest to obtain any outcome.
The fact that (by the definition of a Nash equilibrium) these players do not
deviate shows that c ∗ is their favorite outcome.
So c ∗ ∈ f ((≾i )) by the no-veto-power property. Q.E.D.
Descriptive human decision-making: Overview

This section will follow Park 2020.


When payoffs occur with high uncertainty, a human decision-maker
may not meaningfully retain observations of them,
a behavior which may have evolved to avoid risks from overcommitting
attention in a past environment.
We propose that in this case, she defaults to assuming the payoff
structure of said past environment when estimating future payoffs:
the then-fitness-maximizing, but now-irrational strategy of doing so.
A past environment in which humans learned either by innovation or
imitation, but a priori could not distinguish between the two, can
explain the non-monotonicity puzzle:
the empirical paradox that when a person does not meaningfully retain
observations of her past payoffs, her estimate of her ability to obtain
future payoffs—her confidence—varies non-monotonically with respect
to its true value.
Descriptive HDM: Non-monotonicity puzzle
In their experiment investigating human learning of a novel task,
Sanchez and Dunning (2018) unexpectedly found human confidence to
have three phases: a beginning phase of increase, an intermediate
phase of decrease, and a final phase that returns to increase.
In each experimental variant, subjects learned a new task having a
payoff structure with fixed uncertainty: classifying profiles with lists of
properties (e.g., symptoms) into categories (e.g., made-up diseases).
Specifically, the subjects attempted this type of task 60 times while
simultaneously giving their confidence: their self-estimate of the
probability of their answer being correct.
After each of their 60 answers, they received immediate feedback.
While the average confidence of subjects was non-monotonic with
respect to trial number in the aforementioned way, the proportion of
subjects that actually answered correctly grew monotonically.
Thus, confidence was on average non-monotonic, not only as a
function of trial number, but also as a function of the true value: the
true probability of giving the correct answer.
Descriptive HDM: Non-monotonicity puzzle, pt. 2

In retrospect, the non-monotonicity of confidence is not surprising,


given that it has surfaced in previous studies.
For instance, the experiments of Kruger and Dunning (1999) on
confidence as a function of true ability—as well as replications (Haun
et al. 2000; Burks et al. 2006)—have often yielded a non-monotonic
relationship between the two measured variables.
This literature has injected into the public consciousness a famous
non-monotonic graph attributed to the namesake Dunning–Kruger
effect: the cognitive bias in which incompetent people are unable to
recognize their incompetence.
Also, the work of Hoffman and Burks (2020) investigating truckers’
self-estimates of the number of miles they have driven each week has
found their average to be non-monotonic, and the average of the true
value, monotonic in the level of experience: consistent with the
findings of Sanchez and Dunning.
Descriptive HDM: Non-monotonic confidence graphs

Figures: Figure 1, Figure 2, and Figure 3 from Sanchez and Dunning 2018; Figure 4 from Zawadka et al. 2019
Descriptive HDM: Bayesian DM hypothesis
Bayes’ rule: For mental model M ∈ {Mθ : model parameter θ} and
observation E ,
$P(M \mid E) = \frac{P(E \mid M)\,P(M)}{P(E)}.$
P(M | E ) is the posterior, the conditional probability of the true
model being M given event E .
P(E | M) is the likelihood,
P(M) is the prior, and
P(E ) normalizes to make the probability distribution sum to 1.

Exercise (Independent of observation orders)


For a sequence of i.i.d. observations E = (e1 , . . . , en ),

$P(M \mid E) = \frac{P(E \mid M)\,P(M)}{\sum_{\theta} P(E \mid M_\theta)\,P(M_\theta)},$

where $P(E \mid M) = \prod_{k} P(e_k \mid M)$; in particular, the posterior does not
depend on the order of the observations.
Descriptive HDM: Bayesian DM hypothesis, pt. 2
Bayesian inference: Let e1 , . . . , en ∼ p(X | Mθ ).
Start with prior distribution π(Mθ ) and update it to the posterior

$\left( \frac{p(e_1, \ldots, e_n \mid M_\theta)\,\pi(M_\theta)}{m(e_1, \ldots, e_n)} \right)_{\theta}, \qquad m(e_1, \ldots, e_n) = \int p(e_1, \ldots, e_n \mid M_\theta)\,\pi(M_\theta)\,d\theta.$

Bayesian CLT: In nice model spaces,

$(\pi(M_\theta \mid e_1, \ldots, e_n))_{\theta} \approx N\!\left(\hat\theta,\ \frac{1}{n\,I(\hat\theta)}\right)$

for n large, where θ̂ is the maximum likelihood estimator and I (θ0 ) is
the Fisher information at the true population parameter.
Relevant part: After a Bayesian DM makes many observations, her
prior converges to a normal distribution with the “true” mean and
vanishingly small variance.
Prior converges to the Dirac delta distribution at the true mean.
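A minimal sketch of this "relevant part" (the Beta–Bernoulli model, the uniform prior, and the true success probability 0.7 are assumptions): the posterior mean approaches the truth and the posterior standard deviation shrinks toward zero as observations accumulate.

```python
import random

random.seed(0)
true_p = 0.7
alpha, beta = 1.0, 1.0            # uniform Beta(1, 1) prior over the success rate

for n in (10, 100, 1000, 10000):
    a, b = alpha, beta
    for _ in range(n):            # conjugate Bayesian update per Bernoulli draw
        a, b = (a + 1, b) if random.random() < true_p else (a, b + 1)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    print(f"n={n:>5}  posterior mean={mean:.3f}  posterior sd={var ** 0.5:.4f}")
```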
Descriptive HDM: Bayesian DM hypothesis, pt. 3

The “relevant part” of the Bayesian CLT can fail in many ways,
but the failure we have in mind is the following:
“...we should be skeptical of a person’s ‘Bayesianness’ if we either
consistently observe that she repeatedly changes her mind a relatively
large amount without growing more confident, or, conversely,
consistently ends up very confident but with relatively little
fluctuation in beliefs.” (Augenblick–Rabin 2019; emphasis mine)
Suppose for the sake of a contradiction that one’s self-reported
confidence is a prior that is updated Bayesianly.
Persistent overconfidence, i.e., a large, positive difference between the
prior and the observations’ mean, is a contradiction.
Descriptive HDM: Non-monotonic confidence graphs

Figures: Figure 1, Figure 2, and Figure 3 from Sanchez and Dunning 2018; Figure 4 from Zawadka et al. 2019
Descriptive HDM: BDM method
Subjects of Sanchez–Dunning experiments robustly overbid compared
to their true performance (mean of the past observations), even in a
Vickrey auction equivalent: the Becker–DeGroot–Marschak method.
“Specifically, participants were told that at the end of the zombie
diagnosis task one of their diagnoses would be selected randomly to
see whether they would win the additional $5. The confidence level
they expressed for that diagnosis, however, would determine whether
they would win the $5 based on the accuracy of their diagnosis or
instead in a random lottery that they could switch to. The key to the
lottery was that we would not announce the chance of winning until it
was time to play. The question the participant had to decide for
themselves was, would they rather bet that their diagnosis was right or
instead on the lottery for each possible chance of winning we might
name (e.g., 40%, 50%, 60%). In other words, for each diagnosis, they
were asked to indicate the probability level at which they would rather
switch from betting on their diagnosis to taking their chances on the
lottery...
Descriptive HDM: BDM method, pt. 2
...For example, participants were told that if they were 70% confident
in their diagnosis, that meant that they wanted to bet on their
diagnosis instead of any lottery with a chance of winning at 70% or
less, but that they would want to switch to the lottery if it offered a
chance of winning that was 71% or above. Similarly, a 40% confidence
meant they wanted to make the switch from betting on their diagnosis
to the lottery if the chance of winning at the lottery were 41% or
higher...
...We then randomly selected the same diagnostic case for everyone in
any experimental session and played the additional bet, paying off
those participants who won. The chance of winning the lottery was
announced to be 72%. For those who expressed confidence in their
diagnosis equal or greater than that, they were paid $5 if their
diagnosis was correct. For the rest, the experimenter consulted a
computerized random number generator, paying the participant if the
computer then generated a two-digit number (from 00 to 99) less than
72.” (Sanchez–Dunning 2018)
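A minimal sketch of the incentive property behind this procedure (the uniform draw of the announced lottery chance and the true accuracy p = 0.6 are assumptions): when the subject bets on her diagnosis whenever her reported confidence is at least the announced chance, and on the lottery otherwise, reporting her true probability of being correct maximizes her chance of winning the prize.

```python
def win_probability(report, true_p, grid_size=1000):
    """Expected chance of winning when the lottery's announced winning chance q
    is (approximately) uniform on [0, 1]."""
    total = 0.0
    for k in range(grid_size):
        q = (k + 0.5) / grid_size          # announced lottery chance of winning
        total += true_p if report >= q else q
    return total / grid_size

true_p = 0.6
reports = [i / 100 for i in range(101)]
best_report = max(reports, key=lambda r: win_probability(r, true_p))
print("best report:", best_report)                                   # 0.6
print("win prob at truth            :", round(win_probability(0.6, true_p), 4))
print("win prob if overconfident 0.9:", round(win_probability(0.9, true_p), 4))
```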
Descriptive HDM: Three puzzles
First, overconfidence dominates instead of decaying to zero, an
immediate contradiction of Bayesianness.
(Unlike in the current setting, overconfidence may be beneficial in
evolutionary-game-theoretic settings of group competition when
opponents’ levels of competitive ability are observed with error; see
Johnson and Fowler 2016.)
Second, confidence somehow always behaves in the same qualitative
way regardless of differences in the rate of increase of the true value,
the initial comparison between confidence and the true value, and
other properties of the observations.
Finally, confidence is non-monotonic with respect to the level of
experience, even when the true value is monotonic.
A human consumer may not always bid her rational valuation—the
Bayesian estimate of value added—in a Vickrey auction.
In fact, since her bid may vary non-monotonically with respect to her
rational valuation, the former may not even be a sufficient statistic for
the latter.
Descriptive HDM: Possibility of evolution

All three patterns are considered cognitive biases today.


However, we propose that they may be products of a rational
payoff-estimation strategy for a past environment: one with
unfavorable fitness tradeoffs from overcommitting attention to retain
observations of high-uncertainty payoffs
It is feasible that in the evolutionary past, the expected benefit of
retaining payoff observations sometimes fell short of the expected cost.
The former would have been negligible if payoffs had occurred with
high statistical noise.
The latter may have been large in comparison due to external risks
from overcommitting attention, such as increased vulnerability to
ambushes by other humans.
Bowles (2009) estimates that ancestral hunter-gatherers experienced a
14% rate of mortality due to warfare.
Descriptive HDM: Possibility of evolution, pt. 2
Indeed, one’s estimate of future payoffs may not meaningfully update
with high-variance observations of previous payoffs,
a phenomenon seen in gamblers who—even after observing numerous
realizations of payoff lotteries—fail to conclude that the lotteries have
negative expected value.
We propose that she may instead assume the payoff structure of said
past environment.
This explains underinference: why subjects who were incentivized by
a BDM mechanism to estimate their future payoff systematically
underinferred from previous payoff observations.
Indeed, not having retained these observations, the subjects may have
defaulted to using an innate estimate of the expected payoff optimized
for the evolutionary past.
This innate estimate can be greater than the Bayesian estimate—if
the underlying task is difficult—or less than it—if the task is
easy—which produces the hard-easy effect.
Descriptive HDM: Possibility of evolution, pt. 3

The non-monotonicity of this innate estimate of expected payoff, with


respect to the level of experience, can be explained by the uncertainty
early humans faced regarding whether a given task was learnable.
i.e., we introduce a distribution of tasks comprised of two types: those
requiring innovation learning and those requiring imitation learning.
Innovation learning is not guaranteed to complete in finite time,
because the task may be impossible.
On the other hand, this does not occur for imitation learning, since
the teacher has already completely learned the task.
The expected payoff is monotonically increasing when the current task
is guaranteed to complete in finite time, but eventually decays to zero
when the task may be impossibly difficult with positive probability.
Descriptive HDM: Possibility of evolution, pt. 4

Thus, we hypothesize that the non-monotonicity of the past


environment’s expected payoff arises from its piecewise definition.
The increasing, then decreasing portion of the expected payoff
function is conditional on the fact that the current task may be of
either innovation or imitation learning.
The final increasing portion is conditional on having ruled out tasks of
innovation learning, because these tasks should optimally be quit at an
intermediate level of experience.
Descriptive HDM: Innovation vs. imitation

Evidence shows that even in hunter-gatherer societies like those of the


evolutionary past, social learning occurs before individual learning;
in fact, their adolescents learn almost entirely from adults rather than
from solitary innovation (Lew-Levy et al., 2017).
However, just because an adult purports to be a teacher who has
already learned the task at hand—as non-science-based healers did, for
instance—that does not mean that she actually is one.
The learner would thus stand to benefit from knowing whether her
teacher is a genuine expert giving beneficial advice—imitation
learning—or a charlatan whose advice does not meaningfully
help—innovation learning—especially since a charlatan cannot
guarantee that the task is learnable.
Descriptive HDM: Innovation vs. imitation, pt. 2

However, we hypothesize that the learner cannot initially differentiate


between the learning speeds of the two types of tasks, because they
are too similar.
Fortunately, the speed of learning increases over time
(Sanchez–Dunning, 2020).
We hypothesize that it increases faster for imitation learning than it
does for innovation learning, which enables the learner to eventually
differentiate between the two types of tasks by a time-measurement
experiment.
Thus, the learner performs a time-measurement experiment at the
aforementioned intermediate level of experience, quitting if the type of
learning is innovation and not doing so if it is imitation.
Descriptive HDM: Learning from charlatans
A charlatan whose advice consistently yields low payoffs is vulnerable
to coordinated discrediting from the students who observe them and
thereby decide to quit.
However, if payoffs occur with high uncertainty, then their
observations by default may not be retained,
so—under our theory of learning—quitting due to a lack of meaningful
teaching may instead occur at a specific level of experience.
In that case, the points of time at which her students quit may differ,
since they have different levels of experience in general.
Thus, charlatans may be much less vulnerable to coordinated
discrediting from their students when payoffs occur with high
uncertainty or are unobserved for other reasons.
This may help explain charlatans’ lasting influence on human society
(de Francesco, 1939),
and by extension, the lasting evolutionary pressure posed by the risk of
being forever led astray by their unhelpful advice.
Descriptive HDM: Application to education economics
In addition, our theory of learning and quitting in the context of
unobserved payoffs may help explain the disagreement in the empirical
literature as to the efficacy of educational interventions,
such as additional schooling or decreased class size.
Under our hypothesis that quitting due to a lack of meaningful
teaching occurs at a singular level of experience,
such educational interventions—which marginally lengthen the time
students spend meaningfully learning from a teacher—can either be:
unusually effective (Angrist and Krueger, 1991; Harmon and Walker,
1995; Acemoglu and Angrist, 1999; Duflo, 2001; Oreopoulos, 2006a,b;
Aakvik et al., 2010; Unterman, 2014; Barrow et al., 2015),
or essentially ineffective (Meghir and Palme, 2005; Evan et al., 2006;
Pischke and Wachter, 2008; Grenet, 2013; Stephens and Yang, 2014).
The outcome depends on whether the students’ level of experience is
located just below the distinguished quitting point,
since passing it while learning from a teacher ensures that they will
not give up in the future.
Descriptive HDM: Uniform convergence

Suppose fn : [a, b] → R is a pointwise convergent sequence of


functions, say f (x) = limn→∞ fn (x) for each x ∈ [a, b].
Pointwise convergence does not allow for interchanging limn→∞
and limx→c .
e.g., fn (x) = x n for x ∈ [0, 1].
Pointwise convergence does not allow for interchanging limn→∞
and ∫_a^b (· · · ) dx.
e.g., fn(x) = 2n²x/(1 + n²x²)² for x ∈ [0, 1], whose integrals over [0, 1] tend to 1 even though fn → 0 pointwise.
Pointwise convergence does not allow for interchanging limn→∞
and d/dx.
e.g., fn(x) = sin(n²x)/n, which converges uniformly to 0 even though fn′(0) = n does not converge to 0.
(A numerical check of the integral example appears below.)
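As a quick sanity check of the integral counterexample, the following minimal Python sketch (ours, purely illustrative) integrates fn numerically: the pointwise values shrink to zero while the integrals over [0, 1] approach 1.

```python
# Numerical check (illustrative only): for f_n(x) = 2 n^2 x / (1 + n^2 x^2)^2,
# f_n -> 0 pointwise on [0, 1], yet the integrals over [0, 1] tend to 1, not 0.
import numpy as np

def f(n, x):
    return 2 * n**2 * x / (1 + n**2 * x**2) ** 2

x = np.linspace(0.0, 1.0, 200001)
for n in [1, 10, 100, 1000]:
    y = f(n, x)
    integral = np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))  # trapezoid rule for the integral
    print(n, f(n, 0.5), integral)  # pointwise value at x = 0.5 shrinks; the integrals approach 1
```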
Descriptive HDM: Uniform convergence, pt. 2
A pointwise convergent sequence fn : [a, b] → R is uniformly
convergent if for every ε > 0, there exists Nε such that
n ≥ Nε =⇒ |f (x) − fn (x)| < ε for all x ∈ [a, b]
Assuming fn uniformly converges to f , prove the following.
Exercise
If each fn is continuous, then the uniform limit f is continuous.

Exercise
If each fn is (Riemann) integrable, then the uniform limit f is integrable and
\[ \lim_{n\to\infty} \int_a^b f_n(x)\, dx = \int_a^b f(x)\, dx. \]

Exercise
If each fn is continuously differentiable such that fn′ uniformly converges to
g , then the uniform limit f is continuously differentiable such that f ′ = g .
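Before attempting the exercises, a small numerical companion (ours) to the definition may help: estimating the sup norm on a grid shows that x^n converges uniformly on [0, 0.9] but not on [0, 1].

```python
# Estimate sup_x |f_n(x) - f(x)| on a fine grid (illustrative check of uniform convergence).
import numpy as np

def sup_gap(n, a, b, f_limit):
    x = np.linspace(a, b, 100001)
    return np.max(np.abs(x**n - f_limit(x)))

limit_01 = lambda x: np.where(x < 1.0, 0.0, 1.0)   # pointwise limit of x^n on [0, 1]
limit_09 = lambda x: np.zeros_like(x)              # pointwise (and uniform) limit on [0, 0.9]
for n in [5, 50, 500]:
    print(n, sup_gap(n, 0.0, 1.0, limit_01), sup_gap(n, 0.0, 0.9, limit_09))
    # the first gap stays near 1 (not uniform); the second is 0.9^n -> 0 (uniform)
```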
Descriptive HDM: Guaranteeing uniform convergence

Theorem (Arzelà–Ascoli)
Let (fn ) be a sequence of continuous functions [a, b] → R that is uniformly
bounded

[There exists M such that |fn (x)| ≤ M for all x ∈ [a, b] and n]

and uniformly equicontinuous

[For every ε > 0, there exists δ > 0 such that for all x, y ∈ [a, b] and n,
|x − y | < δ =⇒ |fn (x) − fn (y )| < ε].

Then, there exists a subsequence (fnk ) that is uniformly convergent.

Exercise
Prove the Arzelà–Ascoli theorem.
Descriptive HDM: Guaranteeing uniform convergence, pt. 2

Theorem (Dini)
Let (fn ) be a sequence of continuous functions [a, b] → R that is monotone

[fn (x) ≤ fn+1 (x) for all x ∈ [a, b] and n]

and pointwise converges to f . Then, the convergence is uniform.

Exercise
Prove Dini’s theorem. 6$PS7
Descriptive HDM: The learning model
The human decision-maker (DM) faces a task that requires an amount
a ∈ R>0 ∪ {∞} of knowledge to master.
(Information in the complement of payoff observations is not
straightforward to quantify—hence our use of more general model
parameters than previous papers on learning.)
She knows b ≤ a of this amount of knowledge.
The values of b and a, once fixed, deterministically fix the expected
marginal payoff fa (b) the DM can achieve.
By scaling, we suppose that fa : [0, a] → [0, 1], which we also assume
is continuous.
When fixing b, the payoff fa (b) is decreasing in a;
when fixing a, it is increasing in b.
We suppose that fa (a) = 1, the maximum payoff.
All payoffs are subject to time-discounting by an exponential factor
δ ∈ (0, 1).
Descriptive HDM: The learning model, pt. 2
The amount of knowledge that the DM knows, b, increases over time
in discrete jumps, each following the acquisition of a payoff in the
form of a high-uncertainty lottery, until it reaches a.
b increases as a discrete learning function

LS : R≥0 → R≥0 ,

a right-continuous step function for which LS (0) = 0 and


limt→∞ LS (t) = ∞.
Specifically, b = min(LS (t), a), where t denotes the point in time.
The step intervals [ti−1 , ti ] of LS represent the learning periods, each
of which comprises one of the DM’s attempts at the task.
At the end of a learning period [ti−1, ti], it yields a payoff of expected value
\[ f_a(\min(LS(t_{i-1}), a)) \int_{t_{i-1}}^{t_i} \delta^t\, dt, \]

after which b jumps discontinuously to the next value of LS .


Descriptive HDM: The learning model, pt. 3

A Bayesian DM would be able to learn from payoff observations.


An obvious example of this occurs when the payoffs fa (b) are realized
with no uncertainty. Then, the observed values of fa (b) may comprise
a sufficient statistic for the true value of a.
However, recall that humans by default may not retain observations of
payoffs when they occur in the form of high-uncertainty lotteries,
a behavior that may have evolved in response to unfavorable fitness
tradeoffs from overcommitting attention.
Thus, only the information in the complement of payoff
observations—which we aggregate into an abstract statistic, the level
b of knowledge—is available to the human DM in our setting.
Descriptive HDM: The learning model, pt. 4

Let t1 < t2 < · · · denote the jump discontinuities of LS , which bound


the learning periods.
Let bi = LS(ti) denote the level of knowledge that
the DM reaches after the ith learning period,
and ∆i = ti − ti−1 denote the length of time of the ith learning period,
where we have used the notation t0 = 0 and b0 = 0.
Note that pairs of sequences {∆i }i>0 and {bi }i>0 (where all ∆i > 0
and 0 < b1 < b2 < · · · ) bijectively correspond to discrete learning
functions.
The DM’s total payoff from time 0 to t ∗ ∈ [0, ∞) is
\[ \Pi_{LS}(t^*) = \sum_{i>0:\, t_i \le t^*} f_a(\min(LS(t_{i-1}), a)) \int_{t_{i-1}}^{t_i} \delta^t\, dt. \]
Descriptive HDM: The continuous learning model

Since the human DM cannot retain payoff observations by assumption,


we can employ a continuous approximation of nature, assuming its
discrete learning function is sufficiently fine.
Specifically, we consider a sequence of discrete learning functions LSn
that monotonically converge to
a continuous learning function L : R≥0 → R≥0 , a bijective, strictly
increasing function.
The DM’s payoff from time 0 to t ∗ in the corresponding continuous
learning model is
\[ \Pi_L(t^*) = \int_0^{t^*} \delta^t f_a(\min(L(t), a))\, dt, \]

which ΠLSn (t ∗ ) converges to as n → ∞, by the dominated


convergence theorem.
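The following sketch (ours; the concrete choices L(t) = t, fa(b) = min(b, a)/a, a = 2, δ = 0.9, and t∗ = 5 are illustrative assumptions, not taken from the model above) compares the total payoff of increasingly fine discrete learning schedules with the continuous-limit integral.

```python
# Illustrative check that the discrete total payoff approaches its continuous approximation.
import numpy as np

a, delta, t_star = 2.0, 0.9, 5.0     # hypothetical difficulty, discount factor, horizon

def f(b):
    return min(b, a) / a             # illustrative marginal payoff at knowledge level b

def discrete_payoff(n):
    # n equal learning periods on [0, t_star]; knowledge during period i is L(t_{i-1}) = t_{i-1}
    t = np.linspace(0.0, t_star, n + 1)
    total = 0.0
    for i in range(1, n + 1):
        flow = (delta ** t[i] - delta ** t[i - 1]) / np.log(delta)  # integral of delta^t over the period
        total += f(t[i - 1]) * flow
    return total

def continuous_payoff(m=200000):
    t = np.linspace(0.0, t_star, m + 1)
    y = delta ** t * np.array([f(ti) for ti in t])
    return np.sum((y[1:] + y[:-1]) / 2 * np.diff(t))    # trapezoid rule

for n in [2, 10, 100, 1000]:
    print(n, discrete_payoff(n))
print("continuous limit", continuous_payoff())
```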
Descriptive HDM: When do lim and ∫ commute?

Figure: Convergence theorems (Math3ma, 2015)


Descriptive HDM: Convergence theorems

Theorem (Dominated convergence theorem)


Suppose fn : X → R is a sequence of measurable functions on a measure space (X, Σ, µ) such that
fn → f pointwise almost everywhere, and there exists a Lebesgue-integrable
function (definition: measurable + Lebesgue integral is finite) g ≥ |fn| for
all n. Then, their Lebesgue integrals satisfy
\[ \lim_{n\to\infty} \int_X f_n\, d\mu = \int_X f\, d\mu. \]
Descriptive HDM: Convergence theorems, pt. 2

Theorem (Monotone convergence theorem)


Suppose fn : X → [0, ∞) is a monotone sequence of measurable functions
on a measure space (X , Σ, µ) such that fn → f pointwise almost
everywhere. Then, their Lebesgue integrals satisfy
\[ \lim_{n\to\infty} \int_X f_n\, d\mu = \int_X f\, d\mu. \]

Exercise
Prove a version of the monotone convergence theorem under the additional
assumption that fn are Riemann integrable. (Hint: Start from Dini’s
theorem).
Descriptive HDM: Convergence theorems, pt. 3

Theorem (Fatou’s lemma)


Suppose fn : X → [0, ∞] is a sequence of nonnegative functions on a
measure space (X , Σ, µ). Then, lim inf n→∞ fn is measurable and their
Lebesgue integrals satisfy
\[ \int_X \liminf_{n\to\infty} f_n\, d\mu \le \liminf_{n\to\infty} \int_X f_n\, d\mu. \]
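A quick numerical illustration (ours) of why Fatou's lemma can be a strict inequality: take fn = n·1_{(0,1/n]}, so that lim inf fn = 0 pointwise while every ∫ fn = 1.

```python
# Illustrative check of strictness in Fatou's lemma: the liminf of the integrals is 1,
# but the integral of the pointwise liminf (the zero function) is 0.
import numpy as np

x = np.linspace(0.0, 1.0, 1000001)
for n in [10, 100, 1000]:
    fn = np.where((x > 0) & (x <= 1.0 / n), float(n), 0.0)
    integral = np.sum((fn[1:] + fn[:-1]) / 2 * np.diff(x))   # trapezoid rule
    print(n, integral)   # stays near 1 even though f_n -> 0 pointwise
```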
Descriptive HDM: Two types of tasks
Tasks that require innovation learning (task of the first type), which
occur with probability 1 − q
vs. tasks that require imitation learning (task of the second type),
which occur with probability q.
A continuous learning model (L1 , L2 ) is defined by the data of
the continuous learning functions Lj of tasks of the jth type,
the probability distributions µj of difficulty values a of tasks of the jth
type,
the aforementioned probability value q ∈ [0, 1],
and the payoff functions fa (·).
A discrete learning model (LS , LS′ ) is defined by the same data,
except the continuous learning functions Lj are replaced with discrete
learning functions: LS for tasks of the first type and LS′ for tasks of
the second type.
Additionally, we require that the values bi to which the DM’s level of
knowledge jumps at the end of the ith learning period are shared
between LS and LS′ .
Descriptive HDM: Limit of discrete learning models

Consider a sequence of discrete learning models {(LSn , LS′n )}n>0 .


From this point on, we denote the points of time bounding the
learning periods of LSn by {tn,i }i>0 ,
those of LS′n by {t′n,i}i>0,
the lengths of the learning periods of LSn by {∆n,i }i>0 ,
those of LS′n by {∆′n,i }i>0 ,
and the knowledge values—which are shared by LSn and LS′n , as we
have assumed—by {bn,i }i>0 .
Descriptive HDM: Limit of discrete learning models, pt. 2

We say that the sequence of discrete learning models {(LSn , LS′n )}n>0
converges to a continuous learning model (L1 , L2 ) if the following
conditions hold.
The sequence of functions {LSn }n>0 monotonically converges to L1 in
a way such that L1 (tn,i ) = bn,i for all n and i.
The difficulty values of tasks of the first type are given by the same
distribution µ1 for L1 and all LSn .
The sequence of functions {LS′n }n>0 monotonically converges to L2 in
a way such that L2(t′n,i) = bn,i for all n and i.
The difficulty values of tasks of the second type are given by the same
distribution µ2 for L2 and all LS′n .
The same payoff functions fa (·) are shared by (L1 , L2 ) and all
(LSn , LS′n ).
Descriptive HDM: Differences of the two task types

First, tasks of the first type have positive probability p on the event
that a = ∞,
i.e., that the task—due to its impossibility—can never be learned to
completion and always gives a marginal payoff of f∞(b) = 0;
tasks of the second type do not.
Second, imitation learning is faster than innovation learning.
In the continuous learning model (L1 , L2 ), the latter difference
corresponds to the following condition.
Assumption C1 We have L1 (t) ≤ L2 (t) for all t ∈ R≥0 .
When a sequence of discrete learning models {(LSn , LS′n )}n>0
converges to the continuous learning model (L1 , L2 ),
the above condition on (L1 , L2 ) follows from an analogous one applied
to (LSn , LS′n ).
Assumption D1 We have ∆n,i ≥ ∆′n,i for all n and i.
Descriptive HDM: Information set

Recall our assumption that in the discrete learning model (LSn , LS′n ),
the knowledge jumps from bn,i−1 to bn,i that occur during innovation
learning are indistinguishable from those that occur during imitation
learning.
Thus, the only way to tell the two types of tasks apart is by observing
whether the amount of time spent in the ith learning period is ∆n,i
or the possibly smaller value ∆′n,i .
Furthermore, this can only be done when these two values, ∆n,i and
∆′n,i , are distinct.
Descriptive HDM: Information set, pt. 2

In the approximating continuous learning model (L1 , L2 ), the DM’s


marginal information set is comprised of:
her level of knowledge b on her current task,
whether b has reached a and thus stopped increasing,
and whether she has ruled out the possibility that j = 1 or j = 2.
The latter will be subject to the constraint detailed in Assumption C2
(to come).
Descriptive HDM: Action space

It remains to define the DM’s marginal action space in the


continuous learning model (L1 , L2 ).
The DM maximizes her expected payoff within the continuous learning
game’s marginal action space—which only pertains to quitting—
and this payoff determines the main term (as n → ∞) of her optimal
profit in the approximated discrete learning model (LSn , LS′n ).
We will later define the discrete learning model’s finer action
space—containing strategic considerations other than quitting—which
determines the error term
and explains the evolutionary benefit of both
1) having an accurate innate estimate of expected marginal profit
2) and ruling out the possibility that the current task is of the first
type at the latest possible moment: right before quitting.
Descriptive HDM: Action space, pt. 2

As we have stated, the DM’s marginal action space in the continuous


learning model (L1 , L2 ) is comprised of two options:
continuing to learn the current task (the default option, which in
particular is always taken when b has reached a)
and quitting for an opportunity-cost task (which resets b to zero).
Whenever the latter action is taken, as well as in the very beginning, a
task is drawn from the true distribution µ of tasks,
each of which is comprised of a type j ∈ {1, 2} and a difficulty value
a ∈ R>0 .
When a task is drawn from µ, the value of j is first determined: j = 1
with probability 1 − q and j = 2 with probability q.
Then, the value of a is drawn from the distribution µj of tasks of the
jth type.
The DM begins learning every task without knowing the current task’s
state j.
Descriptive HDM: Action space, pt. 3
We introduce the following constraint to identifying j and quitting
that are exogenous to the continuous learning model,
but will be endogenously justified for the discrete learning model that
it approximates.
Assumption C2: To identify j, the DM must commit to quitting the
task if j = 1 and not quitting if j = 2 immediately afterwards. She can
only perform this identification when her level of knowledge b is ≥ β
for a fixed β > 0.
We qualitatively describe the DM’s optimal quitting strategy when all
tasks are of the second type (q = 1), of the first type (q = 0), or are
nontrivially divided between the two types (0 < q < 1).
When q = 1, the DM never quits. When q = 0, the DM always quits
at a finite level of knowledge.
Finally, when 0 < q < 1, the DM never quits tasks of the second type,
and quits tasks of the first type at a finite level of knowledge.
The constraint imposed by Assumption C2 plays a vital role in
determining the optimal quitting strategy in the latter case.
Descriptive HDM: Summary of continuous learning model

Figure: Continuous learning model (L1 , L2 ), a limit of discrete learning models


(LSn , LS′n ), as an optimal stopping game against nature

Information space: The DM knows b, as well as whether b < a or b = a.
Action space: The DM's sole action is quitting, which can be done at any level of knowledge b ≥ β.
Payoff: The DM obtains a flow payoff of δ^t fa(b) dt.
Assumption C1: We have L1(t) ≤ L2(t) for all t ∈ R≥0.
Assumption C2: To identify j, the DM must commit to quitting the task if j = 1 and not quitting if j = 2 immediately afterwards. She can only perform this identification when her level of knowledge b is ≥ β.
···
Descriptive HDM: No tasks of infinite difficulty

Suppose that all tasks are of the second type (q = 1),


i.e., that there are no tasks of infinite difficulty, a = ∞.
Assumption C3: The payoff functions fa (·) satisfy

fb+m (b) < fb′ +m (b ′ )

for all b < b ′ and m > 0.


This axiom is reasonable because a fixed amount of knowledge m
constitutes a larger fraction of total knowledge of an easy task than of
a difficult task;
consequently, not knowing it causes a harsher penalty in the former
case.
Descriptive HDM: No tasks of infinite difficulty, pt. 2

It is evident that Assumption C3 implies the first-order stochastic


dominance of being at a level of knowledge b that has not yet caught
up to the true task difficulty a,
over being at a lower level of knowledge b′ < b that has also not yet caught
up to a.
As a consequence, we deduce two conclusions when the distribution
µ2 of task difficulties is exponential.
First, define the expected payoff function
\[ g(b) = \frac{\int_{a>b} f_a(b)\, d\mu(a)}{\int_{a>b} d\mu(a)}, \]
the expected value of the marginal payoff at level of knowledge b that has not yet caught up to a.
The expected payoff function g(b) is then monotonically increasing.
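A numerical sketch (ours) of this monotonicity: under an exponential difficulty distribution and the illustrative payoff family fa(b) = b/a (our choice, which satisfies Assumption C3; it is not specified in the notes), g(b) increases toward 1.

```python
# Illustrative computation of g(b) under an exponential difficulty distribution,
# with the hypothetical payoff family f_a(b) = b / a (satisfies Assumption C3).
import numpy as np
from scipy.integrate import quad

lam = 1.0   # rate of the exponential difficulty distribution (our choice)

def g(b):
    num, _ = quad(lambda a: (b / a) * lam * np.exp(-lam * a), b, np.inf)
    den = np.exp(-lam * b)          # probability mass on {a > b}
    return num / den

for b in [0.5, 1.0, 2.0, 4.0, 8.0]:
    print(b, g(b))                  # increases toward 1, as asserted in the theorem that follows
```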
Descriptive HDM: No tasks of infinite difficulty, pt. 3

Second, define the quitting value function V (b) by the expected total
profit from quitting at level of knowledge b ∈ R≥β ∪ {∞} that has
not yet caught up to a.
i.e., the solution to
\[
V = \int_0^b \left( \int_0^{L_2^{-1}(a)} \delta^t f_a(L_2(t))\, dt + \int_{L_2^{-1}(a)}^{\infty} \delta^t\, dt \right) d\mu(a)
+ \int_b^{\infty} \left( \int_0^{L_2^{-1}(b)} \delta^t f_a(L_2(t))\, dt + \delta^{L_2^{-1}(b)} V \right) d\mu(a).
\]
Then, V(b) is also monotonic.
In particular, V(b) is maximized at b = ∞.
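Because the recursion is linear in V, it can be solved as V(b) = A(b)/(1 − B(b)), where A(b) collects the flow payoffs and B(b) = δ^{L2⁻¹(b)} µ({a > b}) is the discounted weight on restarting after quitting. The sketch below (ours; L2(t) = t, fa(b) = b/a, exponential difficulties, and δ = 0.9 are illustrative assumptions, not the notes' choices) evaluates V(b) numerically.

```python
# Illustrative evaluation of the quitting value function V(b) = A(b) / (1 - B(b)),
# assuming L_2(t) = t, f_a(b) = b / a, exponential difficulties, and delta = 0.9
# (all of these concrete choices are ours, not the notes').
import numpy as np
from scipy.integrate import quad

lam, delta = 1.0, 0.9
log_d = np.log(delta)

def A(b):
    # tasks with a <= b: learn fully, then collect the full payoff flow forever
    mastered, _ = quad(
        lambda a: (quad(lambda t: delta**t * t / a, 0.0, a)[0] - delta**a / log_d)
        * lam * np.exp(-lam * a),
        0.0, b)
    # tasks with a > b: collect the flow payoff up to knowledge b, then quit
    inner_b, _ = quad(lambda t: delta**t * t, 0.0, b)
    unfinished, _ = quad(lambda a: (inner_b / a) * lam * np.exp(-lam * a), b, np.inf)
    return mastered + unfinished

def B(b):
    return delta**b * np.exp(-lam * b)   # discounted weight on restarting after quitting at b

def V(b):
    return A(b) / (1.0 - B(b))

for b in [0.5, 1.0, 2.0, 4.0, 8.0]:
    print(b, V(b))                       # increasing in b, consistent with the theorem below
```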
Descriptive HDM: No tasks of infinite difficulty, pt. 4

Theorem
Suppose q = 1 and µ = µ2 is an exponential distribution of decay factor
η ∈ (0, 1). Under Assumption C3, the following are true.
1 The quitting value function V (b) is strictly increasing. In particular,
b ∗ = ∞ is the unique quitting point maximizing V (·).
2 The expected payoff function g (b) is strictly increasing.

Proof. The conditional probability distributions µ2 |a>b are naturally


isomorphic for all b (memoryless property);
all are exponential distributions of the same decay factor η.
Assumption C3 implies that for a DM whose learning has not completed yet, the conditional distribution of payoffs at knowledge b is first-order stochastically dominated by that at any greater knowledge b′ > b.
As a result, the DM—conditional on b not having yet reached a—would always prefer to be at a greater level of knowledge, and she would never reset b to zero by quitting. Q.E.D.
Descriptive HDM: Memoryless property

A random variable X is said to have the memoryless property if


P(X > a + b|X > b) = P(X > a).
Used to model situations where the “waiting time” until a certain
event occurs does not depend on how much time has passed so far.
e.g., Let X be the time of death. Memorylessness of X entails the
property that from any temporal reference point (0 years old, 50 years
old, 100 years old),
the probability of surviving a more years (from that point of time) is
the same.

Exercise
Show that every positive continuous random variable X with the
memoryless property is an exponential distribution.
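A quick empirical check (ours) of the property for the exponential distribution, via simulation:

```python
# Simulate an exponential lifetime and compare P(X > a + b | X > b) with P(X > a).
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=2_000_000)   # illustrative scale parameter

a, b = 1.5, 3.0
conditional = np.mean(x[x > b] > a + b)   # survive a more units, given survival past b
unconditional = np.mean(x > a)            # survive a units from time zero
print(conditional, unconditional)         # agree up to sampling noise (memorylessness)
```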
Descriptive HDM: With tasks of infinite difficulty

Suppose that all tasks are of the first type (q = 0),


i.e., that tasks with infinite difficulty a = ∞ occur with positive
probability.
Specifically, we assume µ = µ1 is the distribution with probability
p ∈ (0, 1) on the event a = ∞ and the remaining probability
distributed exponentially on R>0 with decay factor η.
Consider the following two differential conditions:
Descriptive HDM: Differential conditions
Assumption C4: The learning function L1 is C 1 and satisfies
L1′(t) ≪ η^{−L1(t)} as t → ∞.
The continuous differentiability of L1 guarantees the continuous
differentiability of the quitting value function V (b), which in our
current case is given by the solution to
\[
V = \int_0^b \left( \int_0^{L_1^{-1}(a)} \delta^t f_a(L_1(t))\, dt + \int_{L_1^{-1}(a)}^{\infty} \delta^t\, dt \right) d\mu(a)
+ \int_{a>b} \left( \int_0^{L_1^{-1}(b)} \delta^t f_a(L_1(t))\, dt + \delta^{L_1^{-1}(b)} V \right) d\mu(a).
\]

The asymptotic upper bound on the derivative implies that the


introduction of infinite-difficulty tasks changes
the expected payoff function g (b), which previously was monotonically
increasing, to go to zero as b → ∞;
and b = ∞, which previously was the unique optimal quitting point,
to be suboptimal.
Descriptive HDM: Differential conditions, pt. 2

Another differential condition—which constrains the derivatives of the


payoff functions fa (·) from being uniformly too large—
guarantees that the expected payoff function g (b) is monotonically
decreasing for all sufficiently large b.
Assumption C5: The payoff functions fa(·) are C¹ and satisfy
\[ \int_{a>b} \frac{\partial}{\partial b} f_a(b)\, d\mu(a) \ll \eta^{b} \]
as b → ∞.
Then, we have the following non-monotonicity:
Descriptive HDM: Non-monotonicity

Theorem
Suppose q = 0 and µ = µ1 is the distribution with probability p > 0 on the
event a = ∞ and the remaining probability distributed exponentially on
R>0 with decay factor η ∈ (0, 1).
1 Under Assumption C4, the derivative of the quitting value function,
V ′ (b), is negative for all sufficiently large b. In particular, the one or
more quitting points b ∗ that maximize V (·) are finite.
2 The expected payoff function g (b) converges to zero as b → ∞. In
particular, g (·) attains its maximum at one or more finite points b.
3 Under Assumption C5, g is C 1 such that g ′ (b) is negative for all
sufficiently large b.
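A numerical sketch (ours) of parts 2 and 3: with a point mass p on impossible tasks and the illustrative payoff family fa(b) = b/a (our choice, not the notes'), the expected payoff first rises and then decays toward zero.

```python
# Illustrative computation of the expected payoff with probability p on a = infinity
# (where the payoff is identically zero) and an exponential distribution otherwise.
import numpy as np
from scipy.integrate import quad

lam, p = 1.0, 0.1   # our illustrative decay rate and impossible-task probability

def g(b):
    num, _ = quad(lambda a: (b / a) * lam * np.exp(-lam * a), b, np.inf)
    return (1 - p) * num / (p + (1 - p) * np.exp(-lam * b))

for b in [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]:
    print(b, g(b))   # rises at first, then decays toward zero as the point mass dominates
```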
Descriptive HDM: Non-monotonicity, pt. 2
To prove part (1), we show that the derivative
\[
V'(b) = \frac{1}{\Big(1 - \int_{a>b} \delta^{L_1^{-1}(b)}\, d\mu(a)\Big)^2}
\Bigg[ \Big(1 - \int_{a>b} \delta^{L_1^{-1}(b)}\, d\mu(a)\Big)
\Big( \mu(b) \int_{L_1^{-1}(b)}^{\infty} \delta^t\, dt
+ \int_{a>b} \frac{\delta^{L_1^{-1}(b)} f_a(b)}{L_1'(L_1^{-1}(b))}\, d\mu(a) \Big)
- \Big( \mu(b)\, \delta^{L_1^{-1}(b)}
+ \int_{a>b} \frac{\delta^{L_1^{-1}(b)} \log\frac{1}{\delta}}{L_1'(L_1^{-1}(b))}\, d\mu(a) \Big)
\Big( \int_0^b \Big( \int_0^{L_1^{-1}(a)} \delta^t f_a(L_1(t))\, dt + \int_{L_1^{-1}(a)}^{\infty} \delta^t\, dt \Big) d\mu(a)
+ \int_{a>b} \int_0^{L_1^{-1}(b)} \delta^t f_a(L_1(t))\, dt\, d\mu(a) \Big) \Bigg]
\]
is negative for all sufficiently large b.


Descriptive HDM: Non-monotonicity, pt. 3

Indeed, the distribution of
\[ \int_{a>b} d\mu(a) \]
becomes dominated by the point mass on the event a = ∞ as b → ∞.
Using Assumption C4 to bound the error term, it follows that
\[ V'(b) < -\kappa\, \frac{\delta^{L_1^{-1}(b)}}{\Big(1 - \int_{a>b} \delta^{L_1^{-1}(b)}\, d\mu(a)\Big)^2\, L_1'(L_1^{-1}(b))} \]
for all sufficiently large b (where κ > 0 is some constant).


Part 2 follows from the fact that the conditional distribution µ1 |a>b is
dominated by the event a = ∞ for large b.
Descriptive HDM: Non-monotonicity, pt. 4
To prove part 3, we show that the derivative
\[
g'(b) = \frac{1}{\Big(p + \int_{a>b,\, a\neq\infty} d\mu(a)\Big)^2}
\Bigg[ \Big(p + \int_{a>b,\, a\neq\infty} d\mu(a)\Big)
\Big( -\mu(b) + \int_{a>b,\, a\neq\infty} \frac{\partial}{\partial b} f_a(b)\, d\mu(a) \Big)
+ \mu(b) \int_{a>b,\, a\neq\infty} f_a(b)\, d\mu(a) \Bigg]
\]
is negative for all sufficiently large b.
Using Assumption C5 to bound the error term, we find that
\[ g'(b) < -\kappa\, \frac{\eta^b}{\Big(p + \int_{a>b,\, a\neq\infty} d\mu(a)\Big)^2} \]
for all sufficiently large b (where κ > 0 is some constant). Q.E.D.


Descriptive HDM: Non-monotonicity, pt. 5
Allowing impossible tasks to occur with positive probability prevents
the expected payoff function from being monotonically increasing,
which was the case without impossible tasks.
Also, the optimal quitting point(s) b ∗ must be finite. (These were the
statements of the previous theorem.)
In fact, they can be made arbitrarily large in our continuous learning
model by suitably shrinking the probability mass on the event a = ∞.
Colloquially, it is feasible that the DM does not quit soon, for
arbitrarily generous notions of “soon.”

Corollary
In the setting of the previous theorem, fix all choices except that of p. For
every γ ≥ β ≥ 0, there exists a choice of p such that any optimal quitting
point b ∗ for which the event a = ∞ has probability p satisfies

b ∗ ≥ γ.
Descriptive HDM: DM does not quit soon, pt. 1

Proof. Let µp be the distribution that has probability mass p on


a = ∞ and probability mass 1 − p distributed exponentially on R>0
with a fixed decay factor η ∈ (0, 1).
Let Vµp (·) denote the quitting value function (of the previous
theorem’s game) for which the event a = ∞ has probability p.
Note that Vµ0 is a continuously differentiable function that is strictly
increasing.
Also, the continuously differentiable functions V_{µ_{1/n}} for n ∈ {2, 3, . . .} pointwise converge to V_{µ_0}.
Since |V′_{µ_{1/n}}(b)| is uniformly bounded over all n and b ∈ [β, γ + 1], applying the Arzelà–Ascoli theorem demonstrates that the convergence of V_{µ_{1/n}} to V_{µ_0} on [β, γ + 1] is uniform on a subsequence.
Descriptive HDM: DM does not quit soon, pt. 2
Let ε = V_{µ_0}(γ + 1) − V_{µ_0}(γ).
By uniform convergence, there exists N such that for all n ≥ N in our subsequence, we simultaneously have
\[ |V_{\mu_{1/n}}(\gamma + 1) - V_{\mu_0}(\gamma + 1)| < \frac{\varepsilon}{2} \]
and
\[ |V_{\mu_{1/n}}(b) - V_{\mu_0}(b)| < \frac{\varepsilon}{2} \]
for all b ∈ [β, γ].
Since
\[ V_{\mu_0}(b) \le V_{\mu_0}(\gamma) < V_{\mu_0}(\gamma + 1) \]
for all b ∈ [β, γ] and the last inequality has a difference of ε, it follows that for all n ≥ N in our subsequence,
\[ V_{\mu_{1/n}}(b) < V_{\mu_{1/n}}(\gamma + 1) \]
for all b ∈ [β, γ].
In particular, no b ∈ [β, γ] maximizes V_{µ_{1/n}}. Q.E.D.
Descriptive HDM: Mixed case

We now consider the mixed case (0 < q < 1) with both innovation
and imitation learning,
which we offer as an approximation of a realistic learning model in the
setting of unobserved payoffs.
We assume that µ = (1 − q)µ1 + qµ2 for q ∈ (0, 1), where µ1 and µ2
are of the form defined previously.
(We have abused notation here, because our probability distributions
µj were defined on the domains of difficulty values a: not on the
domains of tasks (j, a), which also include the data of the type j.)
We can repeat our previous analysis to show that the DM always
prefers being at a positive level of knowledge b (that has not yet
caught up to a) for a task of the second type,
to being at zero knowledge for a task of the second type,
which is—by Assumption C1—always preferred to being at zero
knowledge for a task of the first type.
Descriptive HDM: Mixed case, pt. 2

Thus, any optimal quitting strategy must be of the following form:


never quit tasks of the second type,
and quit tasks of the first type only when the level of knowledge b
reaches some value b ∗ ∈ (0, ∞) ∪ {∞} before catching up to a.
Denote this by the quitting strategy b ∗ .
Recall the following content of Assumption C2:
when employing quitting strategy b ∗ , the DM has not ruled out either
j = 1 or j = 2 when b < b ∗ ;
but when b = b ∗ , she finds out the type of the task, quits if j = 1,
and does not quit if j = 2.
Descriptive HDM: Mixed case, pt. 3

Define the conditional quitting value functions V1 (b) and V2 (b) by the
expected total payoff from quitting at level of knowledge b that has
not yet caught up to a,
conditional on the task being of type j = 1 or j = 2, respectively.
Also, define the unconditional quitting value function
Vu (b) = (1 − q)V1 (b) + qV2 (∞) by the expected total payoff from
employing quitting strategy b:
that of quitting tasks of the first type at level of knowledge b that has
not yet caught up to a.
Descriptive HDM: Mixed case, pt. 4
Define the conditional expected payoff functions g1 (b) and g2 (b) by
the expected marginal payoff at level of knowledge b that has not yet
caught up to a, conditional on the task being of type j = 1 or j = 2,
respectively.
Also, define the unconditional expected payoff function
gu (b) = (1 − q)g1 (b) + qg2 (b) by the expected marginal payoff at
level of knowledge b that has not yet caught up to a, unconditional on
the task type.
Distinguish the latter from the true expected payoff function gb∗ (b)
associated to quitting strategy b ∗ , defined by the piecewise function
given by gu (b) if b < b ∗ and g2 (b) if b ≥ b ∗ .
Theorem
Suppose 0 < q < 1 and µ = (1 − q)µ1 + qµ2 , where µ1 is the distribution
with probability p > 0 on the event a = ∞ and the remaining probability
distributed exponentially on R>0 with decay factor η ∈ (0, 1), and µ2 is the
exponential distribution of decay factor η...
Descriptive HDM: Desired non-monotonicity

Theorem
...Under Assumptions C1 and C2, the following are true.
1 The conditional quitting value function V2 (b) is strictly increasing
under Assumption C3, while under Assumption C4, Vu (b) has
negative derivative for all sufficiently large b. In particular, the one or
more quitting strategies b∗ maximizing max_{b∈R≥β ∪{∞}} Vu(b) are finite under Assumption C4.


2 The conditional expected payoff function g2 (b) is strictly increasing
under Assumption C3, while the unconditional expected payoff
function gu (b) converges to zero as b → ∞.
3 Under Assumption C5, gu (b) is decreasing for all sufficiently large b.
Descriptive HDM: Desired non-monotonicity, pt. 2

Exercise
Prove this theorem. (Note: The proof can be found in Park 2020, and you
can use it without citing it. But please write the solution in your own
words.)

Note that if β = 0, then:


the optimal quitting point would always be b ∗ = 0,
i.e., only tasks of the second type are ever learned, and the associated
true expected payoff function is monotonically increasing.
Consequently, a non-monotonic true expected payoff function gb∗ (b)
requires that β > 0.
Descriptive HDM: DM still does not quit soon

Corollary
Retain the setting of the previous theorem and fix all choices except those
of p and q. For every γ ≥ β > 0, there exist choices of p and q such that
any optimal quitting point b ∗ for which tasks of the first type have
difficulty a = ∞ with probability p and tasks of the second type occur with
probability q satisfies
b ∗ ≥ γ.

Exercise
Prove this corollary. (Note: The proof can be found in Park 2020, and you
can use it without citing it. But please write the solution in your own
words.)
Descriptive HDM: Summary of CLM, pt. 2

Assumption C3: The payoff functions fa(·) satisfy f_{b+m}(b) < f_{b′+m}(b′) for all b < b′ and m > 0. Colloquially, a fixed amount of knowledge m constitutes a larger fraction of total knowledge of an easy task than of a difficult task, so ignorance causes a harsher penalty in the former case.
Assumption C4: The learning function L1 is C¹ and satisfies L1′(t) ≪ η^{−L1(t)} as t → ∞. Colloquially, learning does not occur at too high a rate.
Assumption C5: The payoff functions fa(·) are C¹ and satisfy
\[ \int_{a>b} \frac{\partial}{\partial b} f_a(b)\, d\mu(a) \ll \eta^{b} \]
as b → ∞. Colloquially, the first derivatives of the payoff functions are well-behaved.
Descriptive HDM: Continuous learning model

In summary, for any optimal quitting strategy b ∗ ,


which can be made arbitrarily late by a suitable choice of model
parameters,
the associated true expected payoff function gb∗ (b) is a piecewise
defined function that is given by an increasing function, g2 (b), for
b ≥ b∗ ;
and by a function that is eventually decreasing, gu (b), for b < b ∗ .
Descriptive HDM: Discrete learning model

Regarding the approximation of a fine discrete learning model with a


continuous learning model, two issues remain.
First, we have not yet justified the exogenous Assumption C2.
Second, the continuous learning model does not explain the
evolutionary benefit of innately possessing a then-accurate estimate of
expected marginal payoff.
In this section, we simultaneously address both these issues by
introducing side opportunities that provide positive, but negligible
payoffs;
and a less negligible, endogenous cost to identifying the type j of the
current task by a time-measurement experiment.
Our overall story then holds for a sufficiently fine discrete learning
model in a sequence of discrete learning models that converges to a
continuous learning model.
Descriptive HDM: Main task vs. side opportunities

Evidence shows that even in hunter-gatherer societies like those of the


evolutionary past, humans take on highly specialized roles, which
require copious amounts of experience and knowledge (Hooper et al.,
2016).
These specializations can be thought of as analogous to modern
workers’ full-time jobs.
However, hunter-gatherers also face incentives to be opportunistic: to
accurately appraise—and based on the result of said appraisal, possibly
procure—additional foraging opportunities as they arise (Bird-David,
1992).
These opportunities can be thought of as analogous to the side gigs
that modern workers do in addition to their jobs.
It is thus realistic for our learning model, which in its continuous form
only considered the tasks analogous to full-time jobs,
to also consider unrelated opportunities analogous to side gigs.
Descriptive HDM: Main task vs. side opportunities, pt. 2

To the discrete form of our learning model, we proceed to add side


opportunities that require accurate estimation of payoffs from the
default foraging task to optimally exploit—taking such an opportunity
only when its marginal payoff exceeds the expected marginal payoff of
the default foraging method—
but whose payoff values decay to zero in the continuous limit.
i.e., The continuous approximation (L1 , L2 ) of nature’s discrete
learning model (LSn , LS′n ) creates a lexicographic evolutionary objective.
Payoffs from the default foraging method comprise the Θ(1) main
term, and payoffs from side opportunities comprise the o(1) error term.
The quitting strategy affects the main term, and is thus of primary
importance.
The strategy of which side opportunities should be taken in lieu of the
default option does not affect the main term; it only affects the error
term, and is thus of secondary importance.
Descriptive HDM: Main task vs. side opportunities, pt. 3
Specifically, we suppose that for every discrete learning model in our
sequence of (LSn , LS′n ) converging to (L1 , L2 ),
the ith learning period contains a side opportunity whose procurement
requires an expected fraction Rn,i ∈ (0, 1) of the learning period’s time
(taking into account the time-discounting).
The DM can choose whether or not to devote an expected fraction
Rn,i of the learning period’s time to the side opportunity’s marginal
payoff P, drawn from some random distribution whose support is [0, 1].
In other words, the DM chooses between the default expected payoff
\[ f_a(\min(LS(t_{i-1}), a)) \int_{t_{i-1}}^{t_i} \delta^t\, dt, \]
and its altered form,
\[ \Big( R_{n,i} P + (1 - R_{n,i})\, f_a(\min(b_{n,i-1}, a)) \Big) \int_{t_{n,i-1}}^{t_{n,i}} \delta^t\, dt, \]
that results from taking the side opportunity.


Descriptive HDM: Main task vs. side opportunities, pt. 4
Then, an accurate mental estimate g (·) of the expected marginal
payoff from the default foraging method allows for the maximization
of additional payoffs from side opportunities.
Indeed, the DM should choose to take the side opportunity if
g(bn,i−1) < P and to forgo it if g(bn,i−1) > P.
The following assumption endogenizes the above discussion that side
opportunities provide positive, but negligible additional payoffs.
In the continuous limit, the total additional payoff decays to zero and
thus ceases to factor into the optimization problem max_{b∈R≥β ∪{∞}} Vu(b) of the quitting strategy.


Assumption D2: As n → ∞, the expected fraction Rn,i of the ith
learning period’s time that can be used for a side opportunity is
uniformly o(1).
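A minimal sketch (ours, with hypothetical parameters) of the resulting decision rule: since both options share the factor ∫ δ^t dt, taking the side opportunity is optimal exactly when its observed payoff P exceeds the DM's estimate of the default task's expected marginal payoff.

```python
# Illustrative side-opportunity choice: compare the default period payoff against the
# altered payoff from devoting an expected fraction R of the period to the side gig.
import numpy as np

delta = 0.9                                   # hypothetical discount factor

def period_discount(t0, t1):
    return (delta**t1 - delta**t0) / np.log(delta)   # integral of delta^t over [t0, t1]

def take_side_opportunity(P, g_b, R, t0, t1):
    default = g_b * period_discount(t0, t1)
    altered = (R * P + (1 - R) * g_b) * period_discount(t0, t1)
    return altered > default                  # equivalent to P > g_b whenever R > 0

print(take_side_opportunity(P=0.7, g_b=0.5, R=0.05, t0=2.0, t1=2.5))   # True: take it
print(take_side_opportunity(P=0.3, g_b=0.5, R=0.05, t0=2.0, t1=2.5))   # False: forgo it
```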
Descriptive HDM: Time-measurement experiment

It remains to endogenize Assumption C2 with an explanation of why


the DM switches priors—from j ∈ {1, 2} to either j = 1 or
j = 2—only just before quitting at the optimal level of knowledge b ∗ .
Consider our sequence of discrete learning models {(LSn , LS′n )}n>0
satisfying Assumption D1 that converges to the continuous learning
model (L1, L2).
The lengths ∆n,i ≥ ∆′n,i of the associated learning periods are either
equal or distinct.
In the former case, ∆n,i = ∆′n,i , the DM has no way of distinguishing
between the possibilities j = 1 and j = 2 within that learning period.
Thus, the following assumption, a revision of Assumption D1,
endogenizes the part of Assumption C2 that constrains quitting to
only occur after a fixed level of knowledge β > 0.
Assumption D1*: For all i such that bn,i−1 < β, we have
∆n,i = ∆′n,i . For all other i, we have ∆n,i > ∆′n,i .
Descriptive HDM: Time-measurement experiment, pt. 2

Assumption D1*: For all i such that bn,i−1 < β, we have


∆n,i = ∆′n,i . For all other i, we have ∆n,i > ∆′n,i .
This assumption is consistent with the finding of Sanchez and
Dunning (2020) that each progressive trial in their experiment
decreased in length of time.
We hypothesize that this decrease occurs faster for innovation learning
than it does for imitation learning,
so that while the lengths of time ∆n,i and ∆′n,i are indistinguishable in
the beginning, they decrease and eventually branch off from one
another as i → ∞.
As a result, the DM can keep track of the amount of time it takes her
to traverse the ith learning period,
and depending on whether that amount is ∆n,i or ∆′n,i , she can
conclude that j = 1 or j = 2, respectively.
Descriptive HDM: Time-measurement experiment, pt. 3

If the DM incurs no cost in performing this time-measurement


experiment, then she would do so at the earliest possible time: at the
beginning of the ith learning period for the smallest i such that
∆n,i > ∆′n,i .
This is consistent with our story if b ∗ = β, i.e., if the DM quits at the
earliest possible quitting point.
However, in the more feasible case that b ∗ > β, the DM would
costlessly identify j at the earliest possible level of knowledge, β,
so that the resulting change in the confidence estimate from gu (b) to
g1 (b) or g2 (b) would increase the payoff from side opportunities.
Then, the remaining part of Assumption C2—which states that after
any identification of j, the DM immediately quits if j turns out to be
1—does not hold true.
Thus, we need an additional assumption like the following to
endogenize this condition.
Descriptive HDM: Time-measurement experiment, pt. 4

Assumption D3: For every n, the undiscounted expected cost of a


time-measurement experiment Cn,i during the ith learning period is
uniformly o(1) as n → ∞. However, it is greater than the expected
additional payoff the DM would obtain from side opportunities by
improving her confidence estimate from gu (b) to g1 (b) or g2 (b), over
the remainder of the learning game (accounting for time discounting).
This o(1) cost can be explained by the increased risk of evolutionary
threats—such as ambush by other humans—resulting from
overcommitting attention to the time-measurement experiment.
Descriptive HDM: Time-measurement experiment, pt. 5

The DM quits if j = 1, and otherwise, she changes priors to j = 2 and


continues with the current task.
The time-measurement experiment provides a Θ(1) benefit for an o(1)
cost, so the DM must perform this experiment before reaching the level of knowledge b∗n,i at which she quits tasks of the first type.
Assumption D3 guarantees that the DM does so at the latest possible moment: during the learning period that precedes the jump to b∗n,i.
Indeed, the o(1) cost of the time-measurement experiment (which can
otherwise be of any functional form) is assumed to always be greater
than the o(1) expected increase in payoffs from identifying j (again,
which can otherwise be of any general form):
information that allows the DM to more optimally take
side-opportunity payoffs in lieu of default payoffs, and thus would be
taken as early as possible if it were free.
Descriptive HDM: Summary of discrete learning model

Information space: The DM knows b, as well as whether b = bn,i < a or b = a. She may also have come to possess the information j = 1 or j = 2, without which she operates under the starting belief, j ∈ {1, 2}.
Action space: Full payoffs are guaranteed from the main task if b = a. Otherwise, the DM makes two choices before each learning period: first, whether to perform a time-measurement experiment; and second, whether to devote part of the learning period to a side opportunity—whose marginal payoff P, which is observed, is drawn from a distribution with support equal to [0, 1]—in lieu of part of the main task's payoff. At the end of the learning period, she obtains the payoff, the result of the time-measurement experiment if any, and the opportunity to quit.
Descriptive HDM: Summary of discrete learning model, pt. 2

Payoff: We only write the payoff of the ith learning period when j = 1, since that of j = 2 is analogous. It is
\[ \Big( I R_{n,i} P + (1 - I R_{n,i})\, f_a(\min(b_{n,i-1}, a)) \Big) \int_{t_{n,i-1}}^{t_{n,i}} \delta^t\, dt, \]
where I is equal to 1 if the DM procures the side opportunity and to 0 otherwise. The cost −δ^{t_{n,i}} Cn,i is added if she has performed the time measurement.
Assumption C3: The payoff functions fa(·) satisfy f_{b+m}(b) < f_{b′+m}(b′) for all b < b′ and m > 0. Colloquially, a fixed amount of knowledge m constitutes a larger fraction of total knowledge of an easy task than of a difficult task, so ignorance causes a harsher penalty in the former case.
Descriptive HDM: Summary of discrete learning model, pt. 3

Assumption D1*: For all i such that bn,i−1 < β, we have ∆n,i = ∆′n,i. For all other i, we have ∆n,i > ∆′n,i. Colloquially, the speeds of innovation and imitation learning are indistinguishable at first, but branch off.
Assumption D2: As n → ∞, the expected fraction Rn,i of the ith learning period's time that can be used for a side opportunity is uniformly o(1). Colloquially, side opportunities comprise nontrivial, but negligible opportunity costs.
Descriptive HDM: Summary of discrete learning model, pt. 4

Assumption D3: For every n, the undiscounted expected cost of a time-measurement experiment Cn,i during the ith learning period is uniformly o(1) as n → ∞. However, it is greater than the expected additional payoff the DM would obtain from side opportunities by improving her confidence estimate from gu(b) to g1(b) or g2(b), over the remainder of the learning game (accounting for time discounting). Colloquially, the only reason to perform the time-measurement experiment is to immediately quit tasks of innovation learning.
Descriptive HDM: Putting it all together

In the absence of infinite-difficulty tasks, the introduction of side


opportunities and time-measurement costs is not necessary to prove
for discrete learning models the monotonicity of the expected payoff
function.
Indeed, the proof based on first-order stochastic dominance also
demonstrates that a discrete learning model with only tasks of the
second type has both a monotonically increasing quitting value
function and a monotonically increasing expected marginal payoff
function.
Specifically, the proof shows that the DM would always prefer being at
a positive level of knowledge b (that has not yet caught up to a) for a
task of the second type,
to being at zero knowledge for a task of the second type,
which is preferred to being at zero knowledge for a task of the first
type.
Descriptive HDM: Putting it all together, pt. 2
Thus, any optimal quitting strategy of a discrete learning model
(LSn , LS′n ) must be to only quit in the following situation:
when the level of knowledge b reaches some bn,i−1 —which is allowed
to be infinite—before catching up to a, perform a time-measurement
experiment;
if the type is j = 2, never quit;
and if the type is j = 1, quit at level of knowledge bn,i , the start of the
ith learning period. Denote this by the discrete quitting strategy bn,i .
Define the discrete quitting value function
Vn : {bn,i : β ≤ bn,i } → R≥0 by the expected total payoff from
employing the discrete quitting strategy bn,i .
Recall that we have denoted its counterpart, associated with the
continuous learning model (L1 , L2 ), by Vu ;
and the expected payoff functions (conditional on b not having caught
up to a), g1 , g2 , and gu .
The latter are also the expected marginal payoff functions of the
discrete learning models (LSn , LS′n ).
Descriptive HDM: Putting it all together, pt. 3
To recap our hypothesis, side opportunities incentivized humans of the
evolutionary past to estimate the expected marginal payoff from their
primary foraging method as a function of their level of knowledge on it.
Furthermore, when the speeds of innovation learning and imitation
learning branch off, a time-measurement experiment can differentiate
between the two types of tasks.
However, the evolutionary cost of performing this experiment made it
suboptimal to do so until the last possible moment: the intermediate
learning period after which tasks of innovation learning should be quit.
Due to the negligibility of these factors, the discrete learning model is
well-approximated by a continuous learning model.
Theorem
Suppose we have a sequence of discrete learning models
{(LSn , LS′n )}n>0 —all having task distribution µ of the form in the previous
theorem; satisfying Assumptions D1*, D2, and D3; and sharing payoff
functions fa (·) satisfying Assumption C3—that converges to the continuous
learning model (L1, L2)...
Descriptive HDM: Putting it all together, pt. 4

Theorem (continued)
1 When faced with a side opportunity of marginal value P, the DM takes it if P is greater than
\[
\begin{cases}
1 & \text{if } b_{n,i} \text{ has caught up to } a, \\
g_u(b_{n,i}) & \text{if } b_{n,i} \text{ has not caught up to } a \text{, and } j \text{ is unknown}, \\
g_2(b_{n,i}) & \text{if } b_{n,i} \text{ has not caught up to } a \text{, and } j \text{ is known to be } j = 2;
\end{cases}
\]
leaves it if the reverse inequality holds; and is indifferent between the two options if equality holds.
2 For every ε > 0, the discrete quitting value function Vn is ε-close in sup norm to Vu for all sufficiently large n. In particular, if (L1, L2) satisfies Assumption C4, then for all sufficiently large n, every optimal discrete quitting strategy b∗n,i maximizing Vn(bn,i) is finite.
Descriptive HDM: Putting it all together, pt. 5

We hypothesize that human learners, when informationally lacking a Bayesian estimate of future payoffs, default to an innate estimate
\[
b_{n,i} \mapsto
\begin{cases}
g_u(b_{n,i}) & \text{if } b_{n,i} < b^*_{n,i}, \\
g_2(b_{n,i}) & \text{if } b_{n,i} \ge b^*_{n,i},
\end{cases}
\]
associated to an optimal discrete quitting strategy b∗n,i of the evolutionary past's learning environment (LSn, LS′n).


Recall from the previous section that g2 is monotonically increasing,
while gu eventually decays to zero;
in fact, the latter is eventually monotonically decreasing under
Assumption C5.
Thus, the aforementioned piecewise function can be non-monotonic in
the desired way—general increase except for an intermediate period of
decrease—depending on the choice of model parameters.
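To make the shape concrete, the following sketch (ours; the payoff family fa(b) = b/a, the exponential difficulty distribution, and every numeric parameter below are illustrative assumptions, not taken from the notes) assembles the hypothesized innate estimate from gu and g2.

```python
# Illustrative innate estimate: g_u below the quitting point b*, g_2 at and above it.
import numpy as np
from scipy.integrate import quad

lam, p, q, b_star = 1.0, 0.1, 0.5, 4.0   # decay rate, P(a = inf | type 1), P(type 2), quitting point

def mean_payoff(b):
    # E[f_a(b) | a > b, a finite] under the exponential difficulty distribution, with f_a(b) = b / a
    num, _ = quad(lambda a: (b / a) * lam * np.exp(-lam * a), b, np.inf)
    return num / np.exp(-lam * b)

def g2(b):                                # imitation tasks: no impossible tasks, so increasing
    return mean_payoff(b)

def g1(b):                                # innovation tasks: impossible with probability p
    finite_mass = (1 - p) * np.exp(-lam * b)
    return finite_mass * mean_payoff(b) / (p + finite_mass)

def gu(b):                                # unconditional estimate (1 - q) g1 + q g2, as in the notes
    return (1 - q) * g1(b) + q * g2(b)

def innate_estimate(b):                   # hypothesized piecewise default estimate
    return gu(b) if b < b_star else g2(b)

for b in [0.5, 1.0, 2.0, 3.0, 3.9, 4.0, 6.0, 8.0]:
    print(b, round(innate_estimate(b), 3))   # rises, dips before b*, then increases again
```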
Descriptive HDM: Proof

Proof. Only part 2 requires proof.


Choose N large enough that the expected payoff deviation due to the
procurement of side opportunities in the discrete learning model
(LSn , LS′n ) is less than ε/3 for all n ≥ N, which is possible due to
Assumption D2.
By possibly making N larger, the expected payoff deviation due to
time-measurement experiments in the discrete learning model
(LSn , LS′n ) is less than ε/3 for all n ≥ N,
where we have used the hypothesis of uniform o(1) decay in
Assumption D3 and the quitting constraint b ≥ β implied by
Assumption D1*.
Descriptive HDM: Proof, pt. 2

Furthermore, by possibly making N even larger, the difference between


the expected total payoff of the discrete learning model
(LSn , LS′n )—henceforward excluding deviations due to side
opportunities and time-measurement costs—
and that of its approximating continuous learning model (L1 , L2 ) is
less than ε/3 for all n ≥ N.
For this purpose, we may as well assume that in each discrete learning
model (LSn , LS′n ), the payoff of the ith learning period,
\[ f_a(\min(b_{n,i-1}, a)) \int_{t_{n,i-1}}^{t_{n,i}} \delta^t\, dt, \]
is obtained as a flow payoff of δ^t fa(min(bn,i−1, a)) dt.


Using this equivalence, we can “connect the dots” of Vn in the
following sense.
Descriptive HDM: Proof, pt. 3
Define the unconditional quitting value function Vu,n (b) of this
flow-payoff game analogously to Vu (b) of the continuous learning
model’s flow-payoff game, by the expected total payoff of quitting
tasks of the first type at time L_1^{-1}(b) unless learning has completed by then.
In other words,
Vu,n (b) = V1,n (b) + V2,n (∞),
where V2,n (∞) is the total payoff conditional on j = 2, calculated in
terms of the discrete learning function LS′n (t);
and V1,n (b), the total payoff conditional on j = 1, is defined by the
solution to
\[
V = \int_0^b \left( \int_0^{L_1^{-1}(a)} \delta^t f_a(LS_n(t))\, dt + \int_{L_1^{-1}(a)}^{\infty} \delta^t\, dt \right) d\mu(a)
+ \int_{a>b} \left( \int_0^{L_1^{-1}(b)} \delta^t f_a(LS_n(t))\, dt + \delta^{L_1^{-1}(b)} \big( (1-q)V + qV_{2,n}(\infty) \big) \right) d\mu(a).
\]
Descriptive HDM: Proof, pt. 4

Then, {Vu,n }n>0 is a sequence of continuous functions on the


compactified space [β, ∞) ∪ {∞} monotonically converging to
another continuous function, Vu .
Thus, this convergence is uniform by Dini’s theorem.
In particular, we have
\[ \sup_{b_{n,i} \ge \beta} |V_u(b_{n,i}) - V_{u,n}(b_{n,i})| \le \sup_{b \ge \beta} |V_u(b) - V_{u,n}(b)| < \frac{\varepsilon}{3} \]

for all sufficiently large n, as desired.


Our overall result then follows from the triangle inequality. Q.E.D.
Descriptive HDM: Discussion, pt. 1

We have constructed a quite general family of learning models whose


fitness-maximizing strategies can simultaneously exhibit the following
desired properties.
Underinference: The expected marginal payoff function does not
meaningfully vary across different sequences of realizations of
high-uncertainty payoff lotteries, and only varies with the length of the
sequence.
Non-monotonicity: It first increases, then decreases, and finally reverts
back to increasing.
Situational effectiveness of marginal educational interventions:
Quitting due to the lack of meaningful teaching occurs at a unique
intermediate level of knowledge.
The dichotomy between innovation and imitation
learning—specifically, the possibility of infinitely difficult tasks—is
necessary for non-monotonicity.
Descriptive HDM: Discussion, pt. 2
To account for cognitive biases, policymakers and market designers
often estimate a person’s rational valuation of a good, service, or
resource—the Bayesian estimate of value added—from her revealed
valuation, by using a structural behavioral model.
Doing so requires that the latter comprise a sufficient statistic for the
former.
However, we have shown that human decision-making may be
optimized for an outdated environment, in which a priori different
situations yielded the same expected payoff and might as well have
been considered equivalent.
Consider, for example, a situation in which effectively obtaining value
from an auctioned good, service, or resource requires expertise.
Due to the anachronistic non-monotonicity of confidence, two people
with differing levels of said expertise—and accordingly, differing
rational valuations of the auctioned object—may nevertheless estimate
it at the same level and value said object at the same price: even in a
Vickrey auction.
Descriptive HDM: Discussion, pt. 3
We propose three solutions to this problem.
First, making static mechanisms dynamic may help differentiate the
types of people with otherwise indistinguishable revealed preferences.
(To illustrate, the aforementioned two auction consumers may be
distinguishable by their histories of past bids for the object if they have
been enhancing their expertise over time.)
Second, researchers of human decision-making can search for new
psychological motives in order to add the necessary dimensions to
complete a sufficient statistic for people’s rational valuations, which
would allow for their structural estimation.
Finally, an educational intervention of statistical training may help
people in modern learning environments—for which retaining payoff
observations is often much less costly than it was for past learning
environments—better incorporate payoff observations of high
uncertainty into their valuations, such as by keeping track of the mean
payoff.
Concluding remarks

In the social sciences, there are a lot of questions that we do not know
how to satisfactorily answer yet.
An important bar to clear is the Lucas critique:
“Given that the structure of an econometric model consists of optimal
decision rules of economic agents, and that optimal decision rules vary
systematically with changes in the structure of series relevant to the
decision maker, it follows that any change in policy will systematically
alter the structure of econometric models.” (Lucas 1976)
Generally, in order to extrapolate from a model, the model should not be based solely on statistical analysis;
it should necessarily take into account the “true” parameters that
determine the behavior of the model’s constituents (humans, particles,
etc.).
Concluding remarks, pt. 2
“It turns out that saying ‘high inflation has always been correlated with
low unemployment, so we can tackle unemployment by accepting
higher inflation’ is a bit like saying ‘Fort Knox has never been robbed,
so we can save money by sacking the guards.’ You can’t look just at
the empirical data, you need also to think about incentives. High
inflation has tended to mean lower unemployment because employers
and job seekers expected a particular rate of inflation and were
occasionally surprised by a surge in prices; employers mistook it for
increased demand and tried to hire workers, and workers thought that
they were being offered higher real wages. Both were mistaken: in
fact, what was really happening was that the economy was suffering
unexpected inflation and they’d been slow to notice. The problem was
that people wouldn’t keep on being surprised by inflation if
policymakers, beguiled by the Phillips curve, kept deliberately creating
inflation with the aim of suppressing unemployment. Nobody would be
fooled; they would see the inflation coming from a mile off. Inflation
would rise but unemployment would not fall.” (Harford 2013)
Concluding remarks, pt. 3

Dynamic stochastic general equilibrium (DSGE) models attempt to


model the macroeconomy while clearing the Lucas-critique bar.
“...a Lucas-robust DSGE model has the potential to make its wielders
a LOT of money.
This is especially true in the current environment, where correlations
are high and macro events have become much more important to
investors’ performance.
But not necessarily. Being Lucas-robust is necessary for making
optimal policy-contingent forecasts, but it is not sufficient.
You also need the model to be a good model of the economy.
If your parameters are all structural, but you’ve assumed the wrong
microfoundations, then your model will make bad predictions even
though it’s Lucas-robust...
Concluding remarks, pt. 4

DSGE models...have failed a key test of usefulness. Their main selling


point - satisfying the Lucas critique - should make them very valuable
to industry. But industry shuns them.
Many economic technologies pass the industry test. Companies pay
people lots of money to use auction theory. Companies pay people lots
of money to use vector autoregressions. Companies pay people lots of
money to use matching models.
But companies do not, as far as I can tell, pay people lots of money to
use DSGE to predict the effects of government policy. Maybe this will
change someday, but it’s been 32 years, and no one’s touching these
things.” (Smith 2014)
In a world we do not yet understand...

Accept uncertainty: There is a lot we don’t know yet, and that’s OK.
Treat overconfident claims with skepticism: The world is
uncertain, but the human mind is hard-wired to avoid uncertainty
(Berker et al. 2016).
As a result, there will be many claims made about subjects like the
economy,
and in the absence of a Lucas-robust, experiment-tested model (e.g.,
general relativity, germ theory),
these claims will often be overconfident.
When data have high variance, the human mind may not
meaningfully retain them (Park 2020): record them while being
careful of statistical bias.
Goal: Find the “true” parameters of human decision-making
whose Lucas-robust model makes consistently accurate
predictions.
