LectureNotes201b_v9
Important Notice
This material is copyrighted and the author retains all rights. Your use of it is
subject to the following. You may make copies of it for your personal use, but
not for distribution or resale. Any copy must contain this notice page. You are
prohibited from posting or distributing this material electronically.
Faculty interested in assigning all or part of this material in their courses should
contact the author at [email protected] .
Contents

Preface  v

I  Pricing  1

Purpose  3

2  Linear Tariffs  17
   2.1  Elasticity and Linear Pricing  21
   2.2  Welfare Analysis  23
   2.3  An Application  27
   2.4  Pass-Through  29

II  Mechanism Design  73

Purpose  75

11  Auctions  133
   11.1  Efficient Allocation  133
   11.2  Allocation via Bayesian-Nash Mechanisms  137
   11.3  Common Value Auctions  148
   11.4  Appendix to Lecture Note 11: Stochastic Orders  155
   11.5  Appendix to Lecture Note 11: Affiliation  159

Bibliography  223

Index  227
Preface
These lecture notes are intended to supplement the lectures and other materials
for the second half of Economics 201b at the University of California, Berkeley.
A Word on Notation
Various typographic conventions are used to help guide you through these notes.
Notes in margins: These denote important “takeaways.”

Text that looks like this is an important definition. On the screen or printed using a color printer, such definitions should appear blue.

The symbol in the margin denotes a paragraph that may be hard to follow and, thus, requires particularly close attention (not that you should read any of these notes without paying close attention).
Mathematics Notation
Vectors are typically denoted in bold (e.g., x and w are vectors). In some
instances, when the focus is on the nth element of a vector x, I will write
x = (xn , x−n ). The vector x−n is the subvector formed by removing the nth
element (note its dimension is one less than x’s).
Script letters (e.g., A, B, etc.) generally denote sets. Objects in curly
brackets, { and }, are typically elements of sets. Hence, {a, b, c} is the three-
element set with elements a, b, and c. The notation {x|P} indicates the set of
x satisfying the property P. For instance, {x|x < 2 or x ≥ 3} is the set of all
numbers less than 2 or not less than 3. The usual set notation of ∪ for set union
and ∩ for set intersection will be used. The empty set is denoted ∅.
Intervals can be open, closed, or open at one end and closed at the other:
(a, b) = {x|a < x < b} is open, [a, b] = {x|a ≤ x ≤ b} is closed, (a, b] = {x|a <
x ≤ b} is open to the left and closed to the right, and [a, b) = {x|a ≤ x < b} is
open to the right and closed to the left. The sets (a, ∞) and [a, ∞) are the sets
of all numbers greater than a and all numbers not less than a, respectively.
A function, f, will be indicated by its mapping; that is, f : D → R. The set D is the domain of the function, the set R the range, and the set f(D) ≡ {y | y = f(x) for some x ∈ D} is the image of f. A function can be denoted either by its symbol (e.g., f); or, when it might be confused for a variable, by f(·). Note the difference between f(x) and f(·): the former is the value of f evaluated at x, the latter is the function itself. In some instances, it is easier to define a function by showing its mapping. Hence, I might write “the function defined by x ↦ 1 − x².”
rather than

∫_{X_1} · · · ∫_{X_N} f(x) dx_N · · · dx_1 .
Two frequently used abbreviations are lhs for the left-hand side of an expression and rhs for the right-hand side, as illustrated below:

lhs ⋛ rhs .
Other mathematical notation is summarized in Table 1.
1 Recall that a distribution function gives the probability that the realization of the random
variable in question will not exceed the indicated value. That is, if F (·) is a distribution
function, then F (x) is the probability that the realization of the random variable in question
is no greater than x. Some authors refer to a distribution function as a cumulative distribution
function and use the abbreviation cdf for it.
Table 1: Other mathematical notation

Symbol   Meaning
∈        Element of
∀        For all
∃        There exists
·        Dot (vector) multiplication (i.e., x · y = Σ_i x_i y_i)
Purpose
If one had to distill economics down to a single-sentence description, one probably couldn’t do better than describe economics as the study of how prices are and should be set. This portion of the Lecture Notes is primarily focused on the normative half of that sentence, how prices should be set, although I hope it offers some positive insights as well.
Because I’m less concerned with how prices are set, these notes don’t consider price setting by the Walrasian auctioneer or other competitive models. Nor are they concerned with pricing in oligopoly. Our attention will be exclusively on pricing by a single seller who is not constrained by competitive or strategic pressures (e.g., a monopolist).
Now, one common way to price is to set a price, p, per unit of the good in
question. So, for instance, I might charge $10 per coffee mug. You can buy as
many or as few coffee mugs as you wish at that price. The revenue I receive
is $10 times the number of mugs you purchase. Or, more generally, at price p
per unit, the revenue from selling x units is px. Because px is the formula for a
line through the origin with slope p, such pricing is called linear pricing. If we define T to be the function—tariff—that relates quantity purchased to amount paid the seller—so acquiring x units means paying the seller T(x)—linear pricing represents a tariff of the form T(x) = px. Such a tariff is called a linear tariff.
If you think about it, you’ll recognize that linear tariffs are not the only type of tariffs you see. Generically, tariffs in which revenue is not a linear function of the amount sold are called nonlinear tariffs.² An example of nonlinear pricing would be if I gave a 10% discount if you purchased five or more mugs (e.g., revenue is $10x if x < 5 and $9x if x ≥ 5).
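To make the distinction concrete, here is a minimal Python sketch of the two tariffs just described; the function names are mine, and the numbers come from the mug example:

```python
def linear_tariff(x, p=10.0):
    """Linear tariff: total payment is price times quantity, T(x) = p * x."""
    return p * x

def discount_tariff(x):
    """Nonlinear tariff from the mug example: $10 per mug, but only $9 per
    mug if five or more mugs are purchased."""
    return 10.0 * x if x < 5 else 9.0 * x

print(linear_tariff(4))    # 40.0
print(discount_tariff(4))  # 40.0 -- identical below the threshold
print(discount_tariff(5))  # 45.0 -- versus 50.0 under the linear tariff
```

At the threshold the discount applies to every unit, so five mugs cost $45 rather than $50; revenue is no longer a linear function of the amount sold.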
Buyers and Demand
A seller sets prices and buyers respond. To understand how they respond, we need to know what their objectives are. If they are consumers, the standard assumption is that they wish to maximize utility. If they are firms, the presumption is they wish to maximize profits.
max_x u(x)   (1.1)

subject to p · x ≤ I ,
Before solving this problem, recall one can’t add apples and oranges; that is,
there must be agreement in units across the terms being added. The amounts px
and I are in dollars (or whatever the relevant currency is).5 It follows, therefore,
that v(x) must also be a dollar amount.
Solving the optimization program (1.3), we have the first-order condition
v ′ (x) = p . (1.4)
Observe the last term is the benefit (gross of expenditure) the consumer obtains from x units. Utilizing expression (1.3), we see that utility at the utility-maximizing quantity is

∫_{0}^{x} p(q) dq − xp(x)   (1.5)

plus a constant. The quantity in (1.5) equals the area below the inverse demand curve and above the price, p(x). See Figure 1.1. You may also recall that (1.5) is the formula for consumer surplus (cs).
⁵ …centric as it might at first seem: for all you know I am thinking of Australian, Canadian, or Singaporean dollars.

[Figure 1.1: Consumer surplus (CS) at quantity x is the area beneath the inverse demand curve (p(q)) and above inverse demand at x, p(x). Axes: price ($/unit) versus units.]

Another way to think about this is to consider the first unit the individual purchases. It provides him or her (approximate) benefit v′(1) and costs him or her p. His or her surplus or profit is, thus, v′(1) − p. For the second unit the surplus is v′(2) − p. And so forth. Total surplus from x units, where v′(x) = p, is, therefore,

Σ_{q=1}^{x} ( v′(q) − p ) ;

or, passing to the continuum (i.e., replacing the sum with an integral),

∫_{0}^{x} ( v′(q) − p ) dq = ∫_{0}^{x} v′(q) dq − px = ∫_{0}^{x} p(q) dq − px .

Yet another way to think about this is to recognize that the consumer wishes to maximize his or her surplus (or profit), which is total benefit, v(x), minus his or her total expenditure (or cost), px. As always, the solution is found by equating marginal benefit, v′(x), to marginal cost, p.
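The unit-by-unit sum and its continuum limit can be compared numerically. The sketch below uses a hypothetical affine inverse demand p(q) = 100 − q (my choice, not the text’s), evaluated at the quantity where marginal benefit equals price:

```python
def p(q):
    """Hypothetical inverse demand (marginal benefit): p(q) = 100 - q."""
    return 100.0 - q

x = 50          # quantity at which marginal benefit equals the market price
price = p(x)    # = 50.0

# Unit-by-unit surplus: sum of (v'(q) - price) over whole units q = 1, ..., x.
discrete_cs = sum(p(q) - price for q in range(1, x + 1))

# Continuum version: integral_0^x p(q) dq - x * price.  For affine demand
# this is the area of a triangle with height p(0) - price and base x.
integral_cs = 0.5 * (p(0) - price) * x

print(discrete_cs)  # 1225.0
print(integral_cs)  # 1250.0 -- the discrete sum approximates the integral
```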
Bibliographic Note
One of the best treatments of the issues involved in measuring consumer surplus
can be found in Chapter 10 of Varian (1992). This is a good place to go to get
full details on the impact that income effects have on measures of consumer
welfare.
Quasi-linear utility allows us to be correct in using consumer surplus as a measure of consumer welfare. But even if utility is not quasi-linear, the error from using consumer surplus instead of the correct measures, compensating or equivalent variation, …
The right-hand side of (1.9) is just the area to the left of the factor demand
curve that’s above price pn . Equivalently, it’s the area below the inverse factor
demand curve and above price pn . The left-hand side is π(pn , p−n )−π(∞, p−n ).
The term π(∞, p−n ) is the firm’s profit if it doesn’t use the nth factor (which
could be zero if production is impossible without the nth factor). Hence, the
left-hand side is the increment in profits that comes from going from being
unable to purchase the nth factor to being able to purchase it at price pn . This
establishes
Proposition 1.1 The area beneath the factor demand curve and above a given
price for that factor is the total net benefit that a firm enjoys from being able to
purchase the factor at that given price.
In other words—as we could with quasi-linear utility—we can use the “con-
sumer” surplus that the firm gets from purchasing a factor at a given price as
the value the firm places on having access to that factor at the given price.
1.3 Demand Aggregation
Observation 1.1 One might wonder why we have such a general result with
factor demand, but we didn’t with consumer demand. The answer is that with
factor demands there are no income effects. Income effects are what keep con-
sumer surplus from capturing the consumer’s net benefit from access to a good at
its prevailing price. Quasi-linear utility eliminates income effects, which allows
us to treat consumer surplus as the right measure of value or welfare.
Consequently, if CS(p) is aggregate consumer surplus and csj (p) is buyer j’s
consumer surplus, then
CS(p) = ∫_{p}^{∞} X(q) dq = ∫_{p}^{∞} Σ_{j=1}^{J} x_j(q) dq = Σ_{j=1}^{J} ∫_{p}^{∞} x_j(q) dq = Σ_{j=1}^{J} cs_j(p) ;
…number in the interval [0, 1]; and think of all names as being used). Rather than thinking of the number of consumers—which would here be uncountably infinite—we think about their measure; that is, being rather loose, a function related to the length of the interval.
It might at first seem odd to model consumers as a continuum. One way to
think about it, however, is the following. Suppose there are J consumers. Each
consumer has demand
x(p) = 1, if p ≤ v; x(p) = 0, if p > v ,   (1.10)
where v is a number, the properties of which will be considered shortly. In other
words, a given consumer wants at most one unit and is willing to pay up to v
for it.
Assume, for each consumer, that v is a random draw from the interval [v0 , v1 ]
according to the distribution function F : R → [0, 1]. Assume the draws are
independent. Each consumer knows the realization of his or her v prior to
making his or her purchase decision.
In this case, each consumer’s expected demand is the probability that he or
she wants the good at the given price; that is, the probability that his or her v ≥
p. That probability is 1−F (p) ≡ Σ(p). The function Σ(·) is known in probability
theory as the survival function.6 Aggregate expected demand is, therefore,
JΣ(p) (recall the consumers’ valuations, v, are independently distributed).
Observe, mathematically, this demand function would be the equivalent of
assuming that there are a continuum of consumers living on the interval [v0 , v1 ],
each consumer corresponding to a fixed valuation, v. Assume further that the
measure of consumers on the interval between v and v′ is JF(v′) − JF(v) or, equivalently, JΣ(v) − JΣ(v′). As before, consumers want at most one unit and they are willing to pay at most their valuation. Aggregate demand at p is the measure of consumers in [p, v1]; that is,

JΣ(p) − JΣ(v1) = JΣ(p) ,

where the equality follows because Σ(v1) = 0 (it is impossible to draw a v greater
than v1 ). In other words, the assumption of a continuum of consumers can be
considered shorthand for a model with a finite number of consumers, each of
whom has a demand that is stochastic from the seller’s perspective.
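This shorthand is easy to check by simulation. The sketch below (my code; the seed and trial count are arbitrary) draws J = 2 valuations uniformly from [0, 1] and compares the frequencies of realized demand with expected demand JΣ(p):

```python
import random

random.seed(0)

J, p = 2, 0.5            # two consumers, price of one half
trials = 200_000
counts = {0: 0, 1: 0, 2: 0}
for _ in range(trials):
    # each consumer draws v uniformly on [0, 1] and buys iff v >= p
    demand = sum(1 for _ in range(J) if random.random() >= p)
    counts[demand] += 1

freqs = {k: counts[k] / trials for k in counts}
expected_demand = sum(k * f for k, f in freqs.items())

# Realized demand is 0, 1, or 2 with probabilities p^2, 2p(1-p), (1-p)^2,
# i.e., roughly 0.25, 0.5, 0.25 here; expected demand is J * Sigma(p) = 1.
print(freqs)
print(expected_demand)
```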
A possible objection is that assuming a continuum of consumers on, say, [v0, v1] with aggregate demand JΣ(p) is a deterministic specification, whereas J consumers with random demand is a stochastic specification. In particular, there is variance in realized demand with the latter, but not the former.⁷ In many contexts, though, this is not important because other assumptions make the seller risk neutral.⁸

⁶ The name has an actuarial origin. If the random variable in question is age at death, then Σ(age) is the probability of surviving to at least that age. Admittedly, a more natural mnemonic for the survival function would be S(·); S(p), however, is “reserved” in economics for the supply function.

⁷ For example, if J = 2, [v0, v1] = [0, 1], and F(v) = v on [0, 1] (the uniform distribution), then realized demand at p ∈ (0, 1) is 0 with probability p², 1 with probability 2p(1 − p), and 2 with probability (1 − p)².

1.4 Additional Topics
Assume Σ(·) is a differentiable survival function. Let f (p) = −Σ′ (p). The func-
tion f (·) is the density function associated with the survival function Σ(·) (or,
equivalently, with the distribution function 1 − Σ(p) ≡ F (p)). In demography
or actuary science, an important concept is the death rate at a given age, which
is the probability someone that age will die within the year. Treating time con-
tinuously, the death rate can be seen as the instantaneous probability of dying
at time t conditional on having survived to t. (Why conditional? Because you
can’t die at time t unless you’ve lived to time t.) The unconditional probability
of dying at t is f (t), the probability of surviving to t is Σ(t), hence the death rate
is f (t)/Σ(t). Outside of demographic and actuarial circles, the ratio f (t)/Σ(t) is
known as the hazard rate. Let h(t) denote the hazard rate. A key relationship
between hazard rates and survival functions is

Σ(p) = exp( −∫_{p0}^{p} h(z) dz ) ,

where p0 denotes the lowest possible realization, so that Σ(p0) = 1.

Proof:

d log Σ(p)/dp = −f(p)/Σ(p) = −h(p) .

Solving the differential equation:

log Σ(p) = −∫_{p0}^{p} h(z) dz + lim_{z↓p0} log Σ(z) ,

and the limit term equals zero because Σ(z) → 1 as z ↓ p0.
If demand at zero price, X(0), is finite, and if limp→∞ X(p) = 0, then any
demand function is a multiplicative scalar of a survival function; that is,
X(p) = X(0)Σ(p) ,
⁸ It would be incorrect to appeal to the law of large numbers and act as if J → ∞ means …
Definition 1.1 The hazard rate associated with demand function X(·) is de-
fined as
hX(p) = −X′(p)/X(p) .   (1.11)
When the demand function is clear from context, the subscript X may be omit-
ted. In this context, hX (p) is the proportion of all units demanded at price p
that will vanish if the price is increased by an arbitrarily small amount; that is,
it’s the hazard (death) rate of sales that will vanish (die) if the price is increased.
You may recall that the price elasticity of demand, ε, is minus one times the percentage change in demand per a one-percentage-point change in price.¹⁰ In other words,

ε = −1 × ( (ΔX/X) × 100% ) ÷ ( (Δp/p) × 100% ) ,   (1.12)
where Δ denotes “change in.” If we pass to the continuum, we see that (1.12) can be reëxpressed as

ε = −(dX(p)/dp) × (p/X(p)) = p hX(p) ,   (1.13)
where the last equality follows from (1.11). In other words, price elasticity places
a monetary value on the proportion of sales lost from a price increase.
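As an illustration, take exponential demand X(p) = X0 e^(−λp), for which the hazard rate is the constant λ, so (1.13) gives ε(p) = λp. The sketch below (hypothetical parameter values, my code) recovers both numerically from (1.11) and (1.13):

```python
import math

X0, lam = 100.0, 0.5          # hypothetical demand X(p) = X0 * exp(-lam * p)

def X(p):
    return X0 * math.exp(-lam * p)

def hazard(p, dp=1e-6):
    """h_X(p) = -X'(p)/X(p), with X'(p) approximated by a central difference."""
    dX = (X(p + dp) - X(p - dp)) / (2 * dp)
    return -dX / X(p)

def elasticity(p):
    """epsilon(p) = p * h_X(p), expression (1.13)."""
    return p * hazard(p)

# The hazard rate is lam at every price, so elasticity grows linearly in p.
print(round(hazard(2.0), 4))      # 0.5
print(round(elasticity(4.0), 4))  # 2.0
```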
(iii) X(p) = 0 for p ≥ p̄ and there is some finite positive constant X0 such that

X(p) = X0 exp( −∫_{0}^{p} h(z) dz ) .   (1.15)
¹⁰ Whether one multiplies or not by −1 is a matter of taste; some authors (including this …
The amount p̄ is known as the choke price: it is, as we will shortly see, the price
at which demand goes to zero (is choked off).
Given condition (i) of Definition 1.2, expression (1.15) entails that X(0) =
X0 < ∞. That is, if demand is a generalized survival function, demand at zero
is finite. Because the h(·) in Definition 1.2 is non-negative, it follows from (1.15)
that X(·) is non-increasing. It is, therefore, differentiable almost everywhere.
Where it is differentiable, it follows from the fundamental theorem of calculus that

X′(p) = X0 exp( −∫_{0}^{p} h(z) dz ) × ( −h(p) ) = −h(p)X(p) .
Lemma 1.2 Consider differentiable demands that are generalized survival func-
tions.
(i) For any such demand function, X(·),
X(p) = X(0) exp( −∫_{0}^{p} (ε(z)/z) dz ) ,

where ε(p) is the price elasticity of demand at price p.
(ii) Let X1(·) and X2(·) be such demand functions. Suppose X1(p) = ζX2(p) for all p, where ζ is a positive constant. Let εi(·) be the price-elasticity function associated with Xi(·). Then ε1(p) = ε2(p) for all p.
… which is both unrealistic and not interesting; the latter because, then, welfare would always be maximized (assuming finite costs) and there would, thus, be little to study.
Proof: Recall the definition of consumer surplus is area to the left of demand
and above price:
CS(p) = ∫_{p}^{∞} X(b) db
      = [ bX(b) ]_{p}^{∞} − ∫_{p}^{∞} bX′(b) db   (1.17)
      = −pX(p) − ∫_{p}^{∞} bX′(b) db
      = ∫_{p}^{∞} pX′(b) db − ∫_{p}^{∞} bX′(b) db .   (1.18)
Exercise 1.4.4: Prove that if lim_{b→∞} bX(b) > 0, then consumer surplus at any price p is infinite. Hints: Suppose lim_{b→∞} bX(b) = L > 0. Fix an η ∈ (0, L). Show there is a b̄ such that X(b) ≥ (L − η)/b for all b ≥ b̄. Does

∫_{b̄}^{∞} ((L − η)/b) db

converge? Show the answer implies ∫_{p}^{∞} X(b) db does not converge (i.e., is infinite).
A function, g(·), is log concave if log g(·) is a concave function.

Lemma 1.4 If g(·) is a concave (positive-valued) function, then g(·) is log concave.¹²

¹² A stronger result can be established, namely that r∘s(·) is concave if both r(·) and s(·) are concave and r(·) is non-decreasing. This requires more work than is warranted given what we need.
Proof: If g(·) were twice differentiable, then the result would follow trivially
using calculus (Exercise: do such a proof). More generally, let x0 and x1 be
two points in the domain of g(·) and define xλ = λx0 +(1−λ)x1 . The conclusion
follows, by the definition of concavity, if we can show
log g(xλ ) ≥ λ log g(x0 ) + (1 − λ) log g(x1 ) (1.20)
for all λ ∈ [0, 1]. Because log(·) is order preserving, (1.20) will hold if g(xλ) ≥ g(x0)^λ g(x1)^(1−λ). By concavity, g(xλ) ≥ λg(x0) + (1 − λ)g(x1); and, by Pólya's generalization of the arithmetic-mean–geometric-mean inequality,¹³ λg(x0) + (1 − λ)g(x1) ≥ g(x0)^λ g(x1)^(1−λ), which completes the proof.
Observe the converse of Lemma 1.4 need not hold. For instance, x² is log concave (2 log(x) is clearly concave in x), but x² is not itself concave. In other words, log-concavity is a weaker requirement than concavity. As we will see, log concavity is often all we need for our analysis, so we gain a measure of generality by assuming log-concavity rather than concavity.
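The x² example can be checked numerically with the midpoint form of concavity (the helper function below is mine):

```python
import math

def midpoint_concave(f, x0, x1):
    """Midpoint test: concavity requires f((x0+x1)/2) >= (f(x0) + f(x1)) / 2."""
    return f((x0 + x1) / 2) >= (f(x0) + f(x1)) / 2

g = lambda x: x * x                  # g(x) = x^2, considered on x > 0
log_g = lambda x: math.log(x * x)    # log g(x) = 2 log x

print(midpoint_concave(log_g, 1.0, 9.0))  # True  -- 2 log x is concave
print(midpoint_concave(g, 1.0, 9.0))      # False -- x^2 is not concave
```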
¹³ Pólya's generalization is readily proved. Define A ≡ Σ_{i=1}^{n} λ_i a_i. Observe that x + 1 ≤ e^x (the former is tangent to the latter at x = 0 and the latter is convex). Hence, a_i/A ≤ e^{a_i/A − 1}. Because both sides are positive, (a_i/A)^{λ_i} ≤ e^{λ_i a_i/A − λ_i}. We therefore have

Π_{i=1}^{n} (a_i/A)^{λ_i} ≤ Π_{i=1}^{n} e^{λ_i a_i/A − λ_i} = e^{(Σ_{i=1}^{n} λ_i a_i)/A − Σ_{i=1}^{n} λ_i} = e^{1−1} = 1 ,

the last equality because Σ_{i=1}^{n} λ_i a_i = A and Σ_{i=1}^{n} λ_i = 1. Hence, Π_{i=1}^{n} a_i^{λ_i} ≤ A = Σ_{i=1}^{n} λ_i a_i.
Lemma 1.5 A demand function is (strictly) log concave if and only if the cor-
responding hazard rate satisfies the (strict) monotone hazard rate property.
Proof: It is sufficient for X(·) to be log concave that the first derivative of log X(·), which is

d log X(p)/dp = X′(p)/X(p) = −hX(p) ,

be non-increasing; equivalently, that hX(·) be non-decreasing, which is the monotone hazard rate property. The strict case follows similarly.
Exercise 1.4.5: Let f(·) and g(·) be concave functions. Prove that the function defined as x ↦ f(x) + g(x) is concave.
Exercise 1.4.6: Suppose the hazard rate is a constant. Prove that pX(p) is, therefore,
everywhere strictly log concave.
Exercise 1.4.7: Prove that linear (affine) demand (e.g., X(p) = a − bp, a and b
positive constants) is log concave.
Exercise 1.4.8: Prove that if demand is log concave, then elasticity is increasing in
price (i.e., ǫ(·) is an increasing function).
Linear Tariffs 2
Consider a firm that sells all units of a given product or service at a constant
price per unit; that is, that deploys a linear tariff . If p is that price and it sells x
units, then its revenue is px. Use of a linear tariff is also referred to as engaging
in linear pricing or uniform pricing.
As in the previous lecture note, let X(p) denote aggregate demand for the
given product or service at price p and let P (·) denote the corresponding inverse
demand function. In other words,
X(P(x)) = x and P(X(p)) = p .
Recall this means that the maximum price at which the firm can sell all x units
is P (x). To avoid pathological and unrealistic cases in which the optimal price
tends toward either zero or infinity or in which profit is infinite, we assume
lim_{p→0} pX(p) = 0 ;  lim_{p→∞} pX(p) = 0 ;  and X(p) < ∞ ∀ p > 0 .   (A2.1)
Let C(x) denote the firm’s cost of producing x units. Let R(x) denote the
firm’s revenue from selling x units. Given that pricing is linear, this means
R(x) = xP (x). The firm’s profit is revenue minus cost, R(x) − C(x). The
profit-maximizing amount to sell maximizes this difference.
Lemma 2.1 Given assumption (A2.1), the firm’s profit under linear pricing is
bounded above.
Proof: Because cost is non-negative, it is sufficient to prove that revenue is
bounded above. Fix an η > 0. By assumption, there exist p0 and p1 , both
positive and finite, such that pX(p) < η for all p either less than p0 or greater
than p1 . Consider the interval [p0 , p1 ]. If the interval is empty (i.e., p1 < p0 ),
then we have that pX(p) < η for all prices, which means revenue is bounded.
Suppose the interval is not empty. Let X̂ = supp∈[p0 ,p1 ] X(p). By assumption
X̂ < ∞. Note, too, p1 X̂ < ∞. Let p be an arbitrary element of [p0 , p1 ]. Clearly,
pX(p) ≤ p1 X̂; that is, pX(p) is bounded above for all p ∈ [p0 , p1 ].
Lemma 2.2 Assume P (·) and C(·) are continuous on [0, ∞). Assume, too,
that (A2.1) holds. Then there exists a finite quantity x∗ that maximizes the
firm’s profit.
The assumption that P(·) and C(·) are continuous on [0, ∞) is fairly innocuous insofar as both are monotone functions (demand curves slope down and cost functions are nondecreasing). For the same reason, they are also differentiable
almost everywhere and there is, again, little loss of generality in assuming they
are differentiable everywhere. Where this assumption might not be innocuous
is when we assume C(·) is continuous at 0 because this rules out there being
fixed (overhead) costs of production.2 Often there are fixed costs of production,
so the cost function is of the form
C(x) = 0, if x = 0; C(x) = c(x) + F, if x > 0 ,
where F > 0 is the fixed cost. For the usual reasons c(0) = 0. But even if c(·)
is continuous, clearly C(·) is not continuous at zero. Fortunately, this is not
really a problem if c(·) is continuous: because F is a constant, if x maximizes
R(x) − C(x), then it also maximizes R(x) − c(x). Applying Lemma 2.2, using
c(·) instead of C(·), an x ∈ [0, ∞) exists, which maximizes R(x) − c(x). If that x = 0, then we have

R(x̂) − C(x̂) = R(x̂) − c(x̂) − F < R(x̂) − c(x̂) ≤ R(0) − c(0) = 0 = R(0) − C(0)

for all x̂ > 0; that is, x = 0 maximizes R(x) − C(x). Suppose that the x that maximizes R(x) − c(x) is positive. Now either

R(x) − c(x) ≥ F or R(x) − c(x) < F .

If the former, then x maximizes R(x) − C(x). If the latter, then 0 is the maximizer. Either way, a maximizer and, thus, a maximum exists. We have established:
Corollary 2.1 Assume P (·) and C(·) are continuous on (0, ∞). Assume, too,
that (A2.1) holds. Then there exists a finite quantity x∗ that maximizes the
firm’s profit.
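The two-step logic above (maximize R − c ignoring F, then operate only if the resulting variable profit covers the fixed cost) can be sketched as follows; the demand and cost specifications are hypothetical and the grid search is purely illustrative:

```python
def best_quantity(F):
    """Profit-maximizing quantity with fixed cost F, for hypothetical
    inverse demand P(x) = 20 - x and variable cost c(x) = 4x."""
    P = lambda x: 20.0 - x
    R = lambda x: x * P(x)                 # revenue under linear pricing
    c = lambda x: 4.0 * x                  # variable cost
    grid = [i / 100 for i in range(2001)]  # candidate quantities in [0, 20]
    x_star = max(grid, key=lambda x: R(x) - c(x))  # maximizer of R - c
    variable_profit = R(x_star) - c(x_star)
    # shut down (x = 0) if the variable profit cannot cover the fixed cost
    return x_star if variable_profit >= F else 0.0

# R - c = x(16 - x) peaks at x = 8, yielding variable profit 64.
print(best_quantity(F=10.0))  # 8.0 (64 >= 10: operate)
print(best_quantity(F=70.0))  # 0.0 (64 < 70: shut down)
```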
In light of the previous lemmas and assuming inverse demand and cost are
differentiable, the profit-maximizing quantity is either zero or some positive
amount that satisfies the first-order condition:

R′(x) − C′(x) = 0 ;

that is, letting MR denote marginal revenue (R′) and MC marginal cost (C′),

MR(x) = MC(x) .
Exercise 2.0.2: Prove that if (i) profit is strictly pseudo-concave, (ii) x = 0 does not
maximize profit, and (iii) we assume (A2.1), then there is a unique profit-maximizing
quantity. Show, moreover, that this quantity satisfies MR = MC .
Exercise 2.0.3: Prove that if an x∗ that maximizes profit exists and profit is strictly
pseudo-concave, then (2.2) holds. Does (2.2) imply that profit is pseudo-concave?
It can readily be seen that (2.2) implies that profit is strictly quasi-concave.3
Exercise 2.0.4: Prove that if (i) there exists an x∗ ∈ (0, ∞) that solves max_x f(x); and (ii) f ′(x)(x − x∗) < 0 for all x ≠ x∗, then f is strictly quasi-concave and x∗ is its unique maximizer.
Not all strictly quasi-concave profit functions, however, satisfy (2.2): if the profit
function had a point of inflection,4 then the inequality in (2.2) would fail to hold
at that point of inflection. Moreover, at the point of inflection we would have
MR = MC , but we would not be at the profit-maximizing quantity. Condition
(2.2) is, in this context, equivalent to the property Edlin and Hermalin (2000)
define as ideally quasi-concave.5
Exercise 2.0.5: Prove that if (2.2) holds, then x∗ is the unique profit-maximizing
quantity.
Exercise 2.0.6: Suppose R(·) is concave and C(·) is convex, at least one strictly.
Prove that (2.2) holds.
Because demand curves slope down, P ′ (x) < 0; hence, MR(x) < P (x), except
at x = 0 where MR(0) = P (0). See Figure 2.1.
The expression for marginal revenue might, at first, seem odd. Naïvely, one might expect marginal revenue to equal the price received for the last unit sold. But such a naïve view ignores that, to sell an additional item (i.e., go from X(p) to X(p) + 1), the firm must lower the price (i.e., recall, P(x + 1) < P(x)),
³ Recall a function f : Rⁿ → R is strictly quasi-concave if, for any x and x′ in the domain …
⁴ An inflection point, recall, is a value at which the first derivative of the function is zero, but the point is neither a local maximum nor a local minimum (e.g., if f(x) = x³, then x = 0 is a point of inflection). At a point of inflection, the function goes from being locally concave to locally convex or the reverse.
5 Edlin and Hermalin define a differentiable function f : R → R to be ideally quasi-concave
if
f ′(x0) = 0 ⇒ f ′(x)(x − x0) < 0 , ∀x ≠ x0 .
It is readily seen that any x0 such that f ′ (x0 ) = 0 is the unique maximizer of f (·).
[Figure 2.1: Relation between inverse demand, P(x), and marginal revenue, MR, under linear pricing; and the determination of the profit-maximizing quantity, x∗M, and price, P(x∗M). Axes: price ($/unit) versus units; curves shown: MC, P(x), MR.]
which affects all units sold. So marginal revenue has two components: The price
received on the marginal unit, P (x), less the revenue lost on the infra-marginal
units from having to lower the price, |xP ′ (x)| (i.e., the firm gets |P ′ (x)| less on
each of the x infra-marginal units).
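The two components can be seen in a discrete, one-unit calculation. The sketch below uses the hypothetical inverse demand P(x) = 100 − x, so that selling one more unit requires cutting the price by $1 on every unit:

```python
P = lambda x: 100.0 - x   # hypothetical inverse demand
R = lambda x: x * P(x)    # revenue

x = 40
delta_R = R(x + 1) - R(x)        # actual change in revenue from one more unit
gain = P(x + 1)                  # price received on the marginal unit
loss = x * (P(x) - P(x + 1))     # revenue lost on the x infra-marginal units

print(delta_R)      # 19.0
print(gain - loss)  # 19.0 -- the two-component decomposition matches
```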
Summary 2.1 Under linear pricing, the profit-maximizing quantity, x∗M , solves
MR(x) = P (x) + xP ′ (x) = MC(x) . (2.3)
And the monopoly price, p∗M , equals P (x∗M ). Because P ′ (x) < 0, expression
(2.3) reveals that p∗M > MC(x∗M ); that is, price is marked up over marginal
cost.
Exercise 2.0.7: Use (2.3) to show that, under linear pricing, the profit-maximizing price, p∗M, solves

p∗M + X(p∗M)/X′(p∗M) = MC( X(p∗M) ) .
2.1 Elasticity and Linear Pricing

…on page 12). Observe we can rewrite the continuous formula for elasticity, expression (1.13), as follows:

ε = −P(x)/(xP′(x)) .   (2.4)

Revenue, xP(x), is increasing in units sold if and only if MR(x) = P(x) + xP′(x) > 0; dividing through by P(x) > 0, that condition is equivalent to

−1 < xP′(x)/P(x) = −1/ε ,   (2.5)

where the equality in (2.5) follows from (2.4). Multiplying both sides of (2.5) by −ε (a negative quantity) we have that revenue is increasing if and only if

ε > 1 .   (2.6)
When ε satisfies (2.6), we say that demand is elastic. When demand is elastic, revenue is increasing with units sold. If ε < 1, we say that demand is inelastic. Reversing the various inequalities, it follows that, when demand is inelastic, revenue is decreasing with units sold. The case where ε = 1 is called unit elasticity.

[Margin note: Elasticity & Revenue: Where demand is elastic/inelastic/unit elastic, revenue is increasing/decreasing/unchanging in units sold, respectively.]

Recall that a firm produces the number of units that equates MR to MC. The latter is positive, which means that a profit-maximizing firm engaged in linear pricing operates only on the elastic portion of its demand curve. This makes intuitive sense: If it were on the inelastic portion, then, were it to produce less, it would both raise revenue and lower cost; that is, increase profit. Hence, it can’t maximize profit operating on the inelastic portion of demand.
Recall the first-order condition for profit maximization, equation (2.3). Rewrite it as

P(x) − MC(x) = −xP′(x)

and divide both sides by P(x) to obtain

( P(x) − MC(x) )/P(x) = −xP′(x)/P(x) = 1/ε ,   (2.7)

where the second equality follows from (2.4). Expression (2.7) is known as the Lerner markup rule. In English, it says that the price markup over marginal cost, P(x) − MC(x), as a proportion of the price is equal to 1/ε. Hence, the less elastic is demand (i.e., as ε decreases towards 1), the greater the percentage of the price that is a markup over cost. Obviously, the portion of the price that is a markup over cost can’t be greater than the price itself, which again shows that the firm must operate on the elastic portion of demand.
Recall that ε = p h(p), where h(·) is the hazard rate implied by the demand curve (see expression (1.13) above). Consequently, an alternative expression of the Lerner markup rule is

p − MC( X(p) ) = 1/h(p) .   (2.8)
6 The convexity and log concavity assumptions imply differentiability almost everywhere;
hence, the further loss in generality from assuming differentiability everywhere is rather min-
imal.
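Both forms of the Lerner rule are easy to verify numerically. The sketch below uses a hypothetical affine demand with constant marginal cost (all parameter values are mine):

```python
a, b, c = 20.0, 2.0, 4.0      # hypothetical X(p) = a - b*p, with MC = c

def X(p):
    return a - b * p

# profit (p - c) * X(p) is maximized where a - 2bp + bc = 0, i.e.,
p_star = (a / b + c) / 2      # = 7.0
h = b / X(p_star)             # hazard rate -X'(p)/X(p) at p* (here X' = -b)
eps = p_star * h              # elasticity at p*; 7/3 > 1, so demand is elastic

print(round((p_star - c) / p_star, 6))  # 0.428571 = 3/7, the Lerner markup
print(round(1 / eps, 6))                # 0.428571 = 1/epsilon, matching (2.7)
print(p_star - c, 1 / h)                # 3.0 3.0, matching (2.8)
```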
…curve, X(·), from the prevailing price to infinity or if we calculate it as the area beneath inverse aggregate demand and above the prevailing price from 0 to the number of units consumed (recall the discussion on page 9); that is,

CS = ∫_{0}^{x} ( P(z) − P(x) ) dz .   (2.9)
Lemma 2.4 If a total of x units trade at price P(x), where P(·) is inverse aggregate demand, then total benefit realized by consumers is ∫_{0}^{x} P(z) dz; that is, the area beneath inverse aggregate demand from 0 to the number of units traded.
(i.e., benefit equals consumer surplus plus expenditure). Aggregating across the J consumers, aggregate benefit is

Σ_{j=1}^{J} v_j( x_j(p) ) = CS(p) + px = ∫_{0}^{x} ( P(z) − P(x) ) dz + P(x)x = ∫_{0}^{x} P(z) dz ,
where the first equality follows from (2.10) and Proposition 1.2, the second
equality from (2.9) and the fact that p = P (x), and the third from simplifying
the expression.
⁷ The “approximately” and “essentially” arise because we’re considering a discrete inter…
Exercise 2.2.1: Prove that, under the standard assumptions for linear pricing, (2.12)
is a sufficient as well as necessary condition for the welfare-maximizing quantity.
What is the welfare loss from linear pricing? It is the amount of welfare
forgone because only x∗M units are traded rather than x∗W units:
( ∫_{0}^{x∗W} P(z) dz − C(x∗W) ) − ( ∫_{0}^{x∗M} P(z) dz − C(x∗M) )

  = ∫_{x∗M}^{x∗W} P(z) dz − ( C(x∗W) − C(x∗M) )

  = ∫_{x∗M}^{x∗W} P(z) dz − ∫_{x∗M}^{x∗W} MC(z) dz

  = ∫_{x∗M}^{x∗W} ( P(z) − MC(z) ) dz .   (2.13)
[Figure 2.2: The deadweight loss from linear pricing is the shaded triangular region. Axes: price ($/unit) versus units; curves shown: MC, P(x), MR; quantities marked: x∗M and x∗W.]
The area in (2.13) is called the deadweight loss associated with linear pricing. It
is the area beneath the demand curve and above the marginal cost curve between
x∗M and x∗W . Because P (x) and MC (x) meet at x∗W , this area is triangular (see
Figure 2.2) and, thus, the area is often called the deadweight-loss triangle.
The existence of a deadweight-loss triangle is one reason why governments
and antitrust authorities typically seek to discourage monopolization of in-
dustries and, instead, seek to encourage competition. Competition tends to
drive price toward marginal cost, which causes output to approach the welfare-
maximizing quantity.
The welfare loss associated with linear pricing is a motive to change the
industry structure (i.e., encourage competition). It is also—at least from the
firm’s perspective—a motive to change the method of pricing. The deadweight
loss is, in a sense, money left on the table. As we will see, beginning in Chapter 3,
clever pricing schemes by the firm can often allow it to pick up some of this
money left on the table.
An Example
To help make all this more concrete, consider the following example. A monopoly
has cost function C(x) = 2x; that is, MC = 2. It faces inverse demand
P (x) = 100 − x.
Exercise 2.2.2: Verify that this example satisfies the standard assumptions for linear
pricing.
Marginal revenue under linear pricing is P (x)+xP ′ (x), which equals 100−x+x×
(−1) = 100 − 2x. Equating MR with MC yields 100 − 2x = 2; hence, x∗M = 49.
The profit-maximizing price is 100 − 49 = 51. Profit is revenue minus cost; that is, 51 × 49 − 2 × 49 = 2401. Consumer surplus is $\int_0^{49} (100 - t - 51)\,dt = \frac{1}{2} \times 49^2 = 1200.5$.
Total welfare, however, is maximized by equating price and marginal cost:
P (x) = 100 − x = 2 = MC . So x∗W = 98. Deadweight loss is, thus,
$\int_{49}^{98} \bigl(\underbrace{100 - t}_{P(x)} - \underbrace{2}_{MC}\bigr)\,dt = \Bigl[\,98t - \tfrac{1}{2}t^2\,\Bigr]_{49}^{98} = 1200.5$ .
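The example's numbers are easy to reproduce mechanically; the following sketch (Python, assuming nothing beyond the example's own primitives) recomputes quantity, price, profit, consumer surplus, and deadweight loss:

```python
# Verify the worked example: C(x) = 2x (so MC = 2), inverse demand P(x) = 100 - x.

def P(x):
    return 100 - x

c = 2                       # constant marginal cost
x_M = 49                    # solves MR = 100 - 2x = 2 = MC
price = P(x_M)              # = 51
profit = price * x_M - c * x_M

# consumer surplus and deadweight loss via the exact antiderivatives
cs = 0.5 * x_M ** 2                      # integral of (49 - t) from 0 to 49
x_W = 98                                 # solves P(x) = MC
dwl = 0.5 * (x_W - x_M) * (P(x_M) - c)   # area of the deadweight-loss triangle

print(x_M, price, profit, cs, dwl)
```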
Exercise 2.2.3: Prove that if inverse demand is an affine function (i.e., the function
for a line in units-price space), then marginal revenue is also affine with a slope that
is twice as steep as inverse demand.
Exercise 2.2.4: Prove that if inverse demand is P(x) = a − bx and MC = c, a constant, then $x_M^* = \frac{a-c}{2b}$ and $P(x_M^*) = \frac{a+c}{2}$. (Note: the smallest price at which demand is zero is called the choke price. Hence, this result is sometimes summarized as the profit-maximizing price with linear demand and constant marginal cost is the average of the choke price and marginal cost.)
Exercise 2.2.5: Prove that profit under linear pricing is $\frac{1}{b}\left(\frac{a-c}{2}\right)^2$ under the assumptions of the previous exercise.
Exercise 2.2.6: Prove that consumer surplus under linear pricing is $\frac{(a-c)^2}{8b}$ under the assumptions of exercise 2.2.4.
Exercise 2.2.7: Derive the general condition for deadweight loss for affine demand
and constant marginal cost (i.e., under the assumptions of exercise 2.2.4).
2.3 An Application
We often find linear pricing in situations that don’t immediately appear to be
linear-pricing situations. For example, suppose that a risk-neutral seller faces
a single buyer. Let the seller have single item to sell (e.g., an artwork). Let
the buyer’s value for this artwork be v. The buyer knows v, but the seller does
not. All the seller knows is that v is distributed according to the differentiable distribution function F(·). That is, the probability that v ≤ v̂ is F(v̂). Assume
F ′ (·) > 0 on the support of v.8 Let the seller’s value for the good—her cost—be
8 The support of a distribution is, being somewhat rough, the set of values of the random
variable that can occur with positive probability. If the distribution is discrete, this definition
is precise. For instance, if the random variable Y equals 0 if a coin lands tails and equals 1 if
the coin lands heads, then the support is {0, 1}. For a non-discrete distribution—call it G(·)—
c. Assume F(c) < 1. Finally, assume the hazard function associated with F(·) satisfies mhrp (the monotone hazard rate property).
Suppose that the seller wishes to maximize her expected profit. Suppose,
too, that she makes a take-it-or-leave-it (tioli) offer to the buyer; that is, the
seller quotes a price, p, at which the buyer can purchase the good if he wishes.
If he doesn’t wish to purchase at that price, he walks away and there is no trade.
Clearly, the buyer buys if and only if p ≤ v; hence, the probability of a sale, x,
is given by the formula x = 1 − F (p). The use of “x” is intentional—we can
think of x as the (expected) quantity sold at price p. Note, too, that, because
the formula x = 1 − F (p) relates quantity sold to price charged, it is a demand
curve. Of course, 1 − F (p) is also a survival function, so we again see that
demand curves are effectively survival functions.
The seller’s expected cost is cx; that is, with probability x she forgoes pos-
session of an object that she values at c. Marginal cost is, therefore, c.
Utilizing a variant of the Lerner markup rule, expression (2.8), we have
$p - c = \frac{1 - F(p)}{F'(p)}$ . (2.14)
The seller’s profit-maximizing price is the price that solves (2.14).
For example, if v is distributed uniformly on [0, 1], then F (v) = v and
F′(v) = 1. Expression (2.14) becomes
$p - c = \frac{1 - p}{1}$ .
Consequently, the profit-maximizing price is $\frac{1+c}{2}$. Note: Because the uniform
distribution is equivalent to linear demand and we have constant marginal cost,
we know the profit-maximizing price is the average of the choke price and
marginal cost (recall the exercises at the end of the last section); that is, here,
the average of 1 and c.
Note that there is a deadweight loss: Efficiency requires that the good change hands whenever v > c. But given linear pricing, the good only changes hands when
$v \ge \frac{1+c}{2} > c$
(c, recall, is less than 1 because we assumed F(c) < 1). For instance, suppose c = 1/2; then the profit-maximizing price is 3/4. So although the good should change hands whenever v ≥ 1/2, half of these times it doesn't.
a value y is in the support of G(·) if there exists no δ > 0 such that G(y + δ) − G(y − δ) = 0.
For example, let Y be a random variable and suppose for any y ∈ [0, 1/3), Pr{Y ≤ y} = 3y/2;
and for any y ∈ (2/3, 1], Pr{1/3 < Y ≤ y} = 3y/2 − 1. Then
$G(y) = \begin{cases} \frac{3}{2}y\,, & \text{if } y < \frac{1}{3} \\ \frac{1}{2}\,, & \text{if } \frac{1}{3} \le y \le \frac{2}{3} \\ \frac{3}{2}y - \frac{1}{2}\,, & \text{if } \frac{2}{3} < y \le 1 \end{cases}$
and the support is [0, 1/3] ∪ [2/3, 1].
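The uniform-distribution case can be verified by brute force. The sketch below (Python; c = 1/2 is the assumed value from the text's illustration) recovers p∗ = (1 + c)/2 by grid search and confirms that only half of the efficient trades occur:

```python
# TIOLI pricing with v ~ Uniform[0,1]: F(v) = v, so (2.14) gives p - c = 1 - p,
# i.e. p* = (1 + c)/2. Confirm by grid search and measure the inefficiency.

c = 0.5

def expected_profit(p):
    # a sale occurs with probability 1 - F(p) = 1 - p
    return (p - c) * (1 - p)

prices = [i / 1000 for i in range(1001)]
p_star = max(prices, key=expected_profit)

efficient = 1 - c        # probability mass of buyers with v > c
realized = 1 - p_star    # probability mass of buyers with v >= p*
print(p_star, realized / efficient)
```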
2.4 Pass-Through
A comparative statics question worth asking is: what is the consequence of a shock, such as an increase in the firm's factor costs or the imposition of a sales tax, on price, quantity traded, and welfare?
Suppose the program
$\max_{x \in \mathbb{R}^m} f(x, \mathbf{y})$
has a solution. Let x̂ be the solution to that program for a particular vector y
and let x̂′ be the solution to that program for another vector y′ . Then, by the
definition of a maximum (i.e., there is nothing bigger), we have
$f(\hat{x}, \mathbf{y}) \ge f(\hat{x}', \mathbf{y})$ . (2.15)
This insight is often referred to as revealed preference. This term arises because
if a decision maker chooses x̂ when conditions are y, she reveals she prefers x̂
to all other x she could have chosen under conditions y, including an x she
might choose under different conditions (e.g., the x̂′ she would choose were the
conditions y′ ). Expressions such as (2.15) are often said to “follow by revealed
preference” and this line of argumentation is often called a “revealed-preference
argument.”
Let f : R2 → R. If, for any x > x′ and y > y ′ , we have
$\frac{\partial^2 f(x, y)}{\partial y \partial x} > 0$ (2.18)
almost everywhere (proof: move ∂f (x, y ′ )/∂x to the lhs of (2.17), divide by
y − y ′ , and take the limit as y ′ → y). This last expression is sometimes called
a cross-partial condition. When f is sufficiently differentiable, the cross-partial
condition is also sufficient for f to exhibit increasing differences:
Lemma 2.5 Suppose f : (x, x̄) × (y, ȳ) → R is at least twice continuously
differentiable in each of its arguments.9 If f satisfies the cross-partial condition
(2.18), then f exhibits increasing differences.
If the cross-partial derivative of f exists for all x and y, what is its sign?
Theorem 2.1 Suppose f exhibits increasing differences. If x̂ solves $\max_x f(x, y)$ and x̂′ solves $\max_x f(x, y')$, where y > y′, then x̂ ≥ x̂′.
Proof: Suppose, contrary to the theorem, that x̂ < x̂′ . By revealed preference,
we have
f (x̂, y) ≥ f (x̂′ , y) and f (x̂′ , y ′ ) ≥ f (x̂, y ′ ) .
9 Note (x, x̄) and (y, ȳ) denote intervals. In other words, the domain is a rectangle in which
each side is parallel to one axis and perpendicular to the other. The lower-left corner is (x, y)
and the upper-right corner is (x̄, ȳ). This condition ensures we don’t integrate across points
not in the domain of f .
But if x̂′ > x̂, then this last expression contradicts the fact that f exhibits increasing differences with respect to its second argument (y, recall, is greater than y′). Reductio ad absurdum, we can conclude that x̂ ≥ x̂′.
In words, Theorem 2.1 states that if increasing y raises the marginal return
from x (i.e., f exhibits increasing differences in y), then the value of x that
maximizes f is nondecreasing in y.
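A quick numerical illustration of this monotonicity: the sketch below uses the assumed objective f(x, y) = xy − x²/2, whose cross-partial is 1 > 0, and verifies that the grid maximizer is nondecreasing in y:

```python
# Monotone comparative statics: with increasing differences (here the
# cross-partial of f is identically 1 > 0), the maximizer rises with y.

def f(x, y):
    return x * y - x ** 2 / 2

def argmax_x(y, grid):
    return max(grid, key=lambda x: f(x, y))

grid = [i / 100 for i in range(501)]   # x in [0, 5], step 0.01
maximizers = [argmax_x(y, grid) for y in (1.0, 2.0, 3.0)]
print(maximizers)   # nondecreasing in y (here, equal to y itself)
```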
In some contexts we want to be able to say definitively that the maximizer
is increasing in the second argument. This is relatively straightforward with
differentiable functions:
Theorem 2.2 Let X be a closed and bounded subset of R and let f : X × R →
R be at least twice differentiable in both arguments. Suppose, too, that f satisfies
the cross-partial condition (2.18). Then if x̂ solves maxx f (x, y) and x̂′ solves
maxx f (x, y ′ ), where y > y ′ , and at least one of x̂′ and x̂ is an interior solution,
then x̂ > x̂′ .
Proof: From Lemma 2.5 and Theorem 2.1, it is sufficient to show that x̂ ≠ x̂′. Suppose x̂′ is an interior solution. It follows that
$\frac{\partial f(\hat{x}', y')}{\partial x} = 0$
given that x̂′ is in the interior of X. If x̂ = x̂′, then
$\frac{\partial f(\hat{x}, y')}{\partial x} = 0$ (2.20)
and x̂ is in the interior of X. By assumption, it follows that
$0 < \int_{y'}^{y} \frac{\partial^2 f(\hat{x}, z)}{\partial x \partial y}\,dz = \frac{\partial f(\hat{x}, y)}{\partial x} - \frac{\partial f(\hat{x}, y')}{\partial x} = \frac{\partial f(\hat{x}, y)}{\partial x}$ , (2.21)
where the last equality follows from (2.20). But ∂f(x̂, y)/∂x ≠ 0 contradicts the necessary first-order condition for an interior point to maximize f(·, y). Reductio ad absurdum, we can conclude that x̂ ≠ x̂′. The proof when x̂ is an interior solution is similar and left to the reader.
Figure 2.3 illustrates the logic behind Theorem 2.2. The figure also shows
why we require at least one solution being an interior solution: if, for instance,
the values of x were restricted to lie to the left of the dotted line in Figure 2.3,
then the optimal x would be at that dotted line for both y ′ and y.
[Figure 2.3: illustration of the logic behind Theorem 2.2, with the maximizers x̂′ and x̂ marked on the horizontal axis and a dotted vertical line between them.]
An analogous result holds with the sign reversed: suppose instead that
$\frac{\partial^2 f(x, y)}{\partial y \partial x} < 0$
for all x and y. Then if x̂ solves $\max_x f(x, y)$ and x̂′ solves $\max_x f(x, y')$, where y > y′, and at least one of x̂′ and x̂ is an interior solution, then x̂ < x̂′.
A cost shock
Suppose the firm’s cost of producing x is C(x, ω), where ω ∈ R is some parameter
reflecting some condition or state relevant to the firm’s costs (e.g., ω is the price
of a necessary input). Suppose cost exhibits increasing differences in ω; that is,
if x > x′ and ω > ω′, then
$C(x, \omega) - C(x', \omega) \ge C(x, \omega') - C(x', \omega')$ . (2.22)
Proposition 2.3 Given expression (2.22), if ω > ω ′ , then the amount the firm
sells when the state is ω is not greater than the amount it sells when the state
is ω ′ . If all functions are at least twice differentiable and a positive amount is
sold when the state is ω ′ , then that amount is strictly greater than the amount
sold when the state is ω. (In this latter case, the claim is that dx∗M /dω < 0.)
Proposition 2.3 implies, because demand curves slope down, that an upward cost shock causes an increase in the price.
The effect on welfare of a marginal increase in ω is
$\underbrace{\Bigl(P(x_M^*) - MC(x_M^*, \omega)\Bigr)}_{>0} \times \underbrace{\frac{dx_M^*}{d\omega}}_{<0} - \underbrace{\frac{\partial C(x_M^*, \omega)}{\partial \omega}}_{?}$ , (2.24)
where the first sign follows because output under linear pricing is less than the
welfare-maximizing amount; and the second sign follows from Proposition 2.3.
The last term is ambiguous because, although an increase in ω increases variable
costs, it is possible that it is also reducing fixed costs. If, however, ω affects
variable cost only—as would be the case, for example, if ω were the price of
some component of the final good—then (2.24) implies that a cost shock would
be welfare reducing.
Although it is possible that a cost shock can have an ambiguous effect on
welfare, any shock that raises price must make buyers worse off; that is, con-
sumer surplus must fall with ω. To see this, recall that we can express consumer
surplus as
$-\int_p^{\infty} (b - p)X'(b)\,db$ (2.25)
(see Lemma 1.3). The derivative of (2.25) with respect to p is, applying Leibniz's rule,10
$\int_p^{\infty} X'(b)\,db < 0$ ,
where the inequality follows because demand curves slope down. To summarize:
Proposition 2.4 If a cost shock raises marginal cost (i.e., expression (2.23)
holds), then equilibrium consumer surplus falls in response to a cost shock.
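Propositions 2.3 and 2.4 can be illustrated with an assumed specification C(x, ω) = ωx and P(x) = 100 − x (so ω is marginal cost and cost trivially exhibits increasing differences in ω):

```python
# Cost-shock comparative statics with assumed C(x, w) = w*x and P(x) = 100 - x:
# a rise in w (the input price) lowers output, raises price, and lowers CS.

def outcomes(w, a=100.0, b=1.0):
    # monopoly with P(x) = a - b*x and constant MC = w: MR = MC gives x* = (a - w)/(2b)
    x = (a - w) / (2 * b)
    p = a - b * x
    cs = 0.5 * b * x ** 2   # triangle beneath demand and above price
    return x, p, cs

x1, p1, cs1 = outcomes(2.0)
x2, p2, cs2 = outcomes(10.0)   # upward cost shock: w rises from 2 to 10
print(x1, p1, cs1)
print(x2, p2, cs2)
```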
Sales Tax
There are two kinds of sales taxes. One is an ad valorem tax, which means the
tax is based on the price; the other is an excise tax, which means the tax is
based on the quantity (i.e., $k per unit). Because an excise tax is equivalent to
a cost shock, we’ll consider only ad valorem taxes here.
If the tax rate is τ, then the consumer pays p(1 + τ) if the posted price is p. The firm gets p and the government gets pτ. Observe that, at a posted price of p, the firm's demand is X(p(1 + τ)) because consumers care about the amount they pay, not the posted price per se. Inverting, we have P(x) = p(1 + τ); hence, from the firm's perspective its inverse demand curve is
firm’s perspective its inverse demand curve is
$\tilde{P}(x) = \frac{P(x)}{1 + \tau}$ . (2.26)
The firm's profit is
$x\tilde{P}(x) - C(x) = \frac{1}{1+\tau}\,xP(x) - C(x)$ .
Using the envelope theorem, it is immediate that the firm’s equilibrium (maxi-
mum) profit is falling in τ .
Observe that the cross-derivative of the firm’s profit with respect to x and
τ is negative; that is, the firm’s marginal profit with respect to output is falling
in τ . From Theorem 2.4, the firm sells less the greater is τ . To summarize our
results to this point:
Proposition 2.5 As the rate of the ad valorem tax increases, the firm’s profits
and output both fall (assuming they are not both zero).
Proposition 2.5 explains why businesses oppose increases in sales taxes. Even
though the statutory incidence could be on the consumers (they physically pay
the tax), businesses lose from the tax.
The posted price is P̃ (x∗M ). How does that vary with the tax rate? Because
demand curves slope down and the firm produces less, the numerator of (2.26)
increases as τ increases. The denominator, however, is also increasing. Conse-
quently, a definitive answer is not possible in general. We can, however, derive
conditions under which a definitive answer is possible.
Proof: Let p and p̃ be the profit-maximizing posted prices when the tax rates
are τ and τ̃ , respectively. By revealed preference, we have
$X\bigl(\tilde{p}(1+\tilde{\tau})\bigr)(\tilde{p} - c) \ge X\bigl(p(1+\tilde{\tau})\bigr)(p - c)$ (2.27)
and
$X\bigl(p(1+\tau)\bigr)(p - c) \ge X\bigl(\tilde{p}(1+\tau)\bigr)(\tilde{p} - c)$ . (2.28)
From (2.30), we see that p is less than or greater than p̃ according to whether the function
$\psi(z \mid \tau, \tilde{\tau}) \equiv \frac{X\bigl(z(1+\tau)\bigr)}{X\bigl(z(1+\tilde{\tau})\bigr)}$
is decreasing or increasing in z.
For example, if demand is affine, X(p) = a − bp, then simple calculations reveal
that ψ ′ (·|τ, τ̃ ) has the same sign as ab(τ̃ − τ ) > 0; hence, if demand is affine and
marginal cost is constant, the posted price is falling as the tax rate increases.
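This affine case is easy to check numerically. In the sketch below, the closed-form posted price p∗ = a/(2b(1 + τ)) + c/2 follows from the first-order condition for maximizing (p − c)X(p(1 + τ)) with X(p) = a − bp; the demand parameters are assumed for illustration:

```python
# Ad valorem tax with assumed affine demand X(p) = a - b*p and MC = c.
# The posted price falls in tau while the consumer price p*(1+tau) rises.

a, b, c = 100.0, 1.0, 2.0

def posted_price(tau):
    # first-order condition for max_p (p - c)*(a - b*p*(1+tau))
    return a / (2 * b * (1 + tau)) + c / 2

taus = [0.0, 0.1, 0.2]
posted = [posted_price(t) for t in taus]
paid = [p * (1 + t) for p, t in zip(posted, taus)]

# independent grid-search check of the closed form at tau = 0.1
grid = [i / 100 for i in range(2, 10001)]
best = max(grid, key=lambda p: (p - c) * (a - b * p * 1.1))
print(posted, paid, best)
```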
We also have
Proposition 2.6 Suppose that marginal cost is a constant, c. If demand, X(·),
is log-concave, then an increase in the ad valorem tax rate causes the posted price
to fall.
Proof: In light of Lemma 2.6, we need to show that ψ(·|τ, τ̃ ) is increasing
when τ < τ̃ . Observe that
$\frac{\partial^2 \log X\bigl(p(1+\tau)\bigr)}{\partial \tau\, \partial p} = \frac{X'\bigl(p(1+\tau)\bigr)}{X\bigl(p(1+\tau)\bigr)} + (1+\tau)\frac{\partial}{\partial \tau}\!\left(\frac{X'\bigl(p(1+\tau)\bigr)}{X\bigl(p(1+\tau)\bigr)}\right) < 0$ , (2.31)
where the sign follows because the first term on the right-hand side of the
equation is negative because demand curves slope down and the second term
on the right-hand side of the equation is negative because X(·) is log concave.11
Expression (2.31) implies that $\log X\bigl(p(1+\tau)\bigr)$ exhibits decreasing differences in p and τ; that is, if p > p̃ and τ < τ̃ we have
$\log X\bigl(p(1+\tau)\bigr) - \log X\bigl(\tilde{p}(1+\tau)\bigr) > \log X\bigl(p(1+\tilde{\tau})\bigr) - \log X\bigl(\tilde{p}(1+\tilde{\tau})\bigr)$ .
11 Exercise: Why does f (·) log concave imply that f ′ (z)/f (z) is a decreasing function of
z?
Hence,
$\frac{X\bigl(p(1+\tau)\bigr)}{X\bigl(p(1+\tilde{\tau})\bigr)} > \frac{X\bigl(\tilde{p}(1+\tau)\bigr)}{X\bigl(\tilde{p}(1+\tilde{\tau})\bigr)}$ .
Because p, p̃, τ , and τ̃ were arbitrary, this implies ψ(·|τ, τ̃ ) is increasing for all
τ and τ̃ , τ < τ̃ .
Regardless of what happens to the posted price, the amount the consumers
pay goes up with the tax rate: The consumers pay P (x∗M ), x∗M falls as the tax
rate rises, and demand curves slope down. Observe, too, that the consumers’
demand curve—that is, their marginal-benefit schedule—remains P (·). Given
that they consume less as the tax rate rises, it follows that their consumer
surplus must fall as the tax rate rises. To summarize:
Proposition 2.7 As the rate of the ad valorem tax increases, the total amount
consumers pay per unit goes up and their consumer surplus goes down.
Given that the firm’s profits and consumers’ surplus fall as the tax rate rises,
it must be that their collective welfare is falling as the tax rate rises. This is not
to say, however, that overall welfare is falling because the government presum-
ably uses the revenue it raises, τ P (x∗M )x∗M , for some purpose. Without knowing
what that purpose is, it is impossible to say what happens to overall welfare.
One question we can answer, however, is whether the firm and consumers’ lost
welfare is greater than the revenue taken in by the government. The answer is
yes, as the following analysis shows. Total value created (i.e., consumer surplus,
firm profit, and government revenue) is
$V \equiv \int_0^{x_M^*} \bigl(P(x) - C'(x)\bigr)\,dx$ .
Hence, dV /dx∗M = P (x∗M ) − C ′ (x∗M ) > 0 (the demand curve lies above the
marginal-cost curve). Consequently, a reduction in x∗M , as will result from
increasing the tax rate, reduces total value. Given total value is reduced, it
must be that tax revenue is less than the losses suffered by consumers and the
firm. We have proved:
Proposition 2.8 An increase in the ad valorem tax rate reduces total value
created.
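The decomposition behind Proposition 2.8 can be traced numerically; the sketch below (assumed affine demand and constant marginal cost, as in the earlier examples) shows total value falling in τ and the private losses exceeding the government's revenue gain:

```python
# Welfare decomposition under the ad valorem tax (assumed X(p) = a - b*p, MC = c):
# as tau rises, CS + profit + tax revenue (total value) falls, and the combined
# loss of firm and consumers exceeds the revenue the government collects.

a, b, c = 100.0, 1.0, 2.0

def split(tau):
    p = a / (2 * b * (1 + tau)) + c / 2    # posted price (the firm receives p)
    q = a - b * p * (1 + tau)              # quantity sold
    cs = 0.5 * q ** 2 / b                  # consumers face price p*(1+tau)
    profit = (p - c) * q
    revenue = tau * p * q
    return cs, profit, revenue

cs0, pi0, rev0 = split(0.0)
cs1, pi1, rev1 = split(0.2)
total0 = cs0 + pi0 + rev0
total1 = cs1 + pi1 + rev1
private_loss = (cs0 - cs1) + (pi0 - pi1)
print(total0, total1, private_loss, rev1 - rev0)
```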
3 First-degree Price Discrimination
We saw in Section 2.2 that linear pricing “leaves money on the table,” in the
sense that there are gains to trade—the deadweight loss—that are not realized.
There is money to be made if the number of units traded can be increased from
x∗M to x∗W .
Why has money been left on the table? The answer is that trade bene-
fits both buyers and seller. The seller profits to the extent that the revenue
received exceeds cost and the buyers profit to the extent that their benefit en-
joyed exceeds their expenditure (cost). The seller, however, does not consider
the positive externality she creates for the buyers by selling them goods. The
fact that their marginal benefit schedule (i.e., inverse demand) lies above their
marginal cost (i.e., the price the seller charges) is irrelevant to the seller insofar
as she doesn’t capture any of this gain enjoyed by the buyers. Consequently,
she underprovides the good. This is the usual problem with positive externali-
ties: The decision maker doesn’t internalize the benefits others derive from her
action, so she does too little of it from a social perspective. In contrast, were
the action decided by a social planner seeking to maximize social welfare, then
more of the action would be taken because the social planner does consider the
externalities created. The cure to the positive externalities problem is to change
the decision maker’s incentives so she effectively faces a decision problem that
replicates the social planner’s problem.
One way to make the seller internalize the externality is to give her the
social benefit of each unit sold. Recall the marginal benefit of the xth unit is P(x). So let the seller get P(1) if she sells one unit, P(1) + P(2) if she sells two, P(1) + P(2) + P(3) if she sells three, and so forth. Given that her revenue from x units is $\int_0^x P(z)\,dz$, her marginal revenue schedule is P(x). Equating
marginal revenue to marginal cost, she produces x∗W , the welfare-maximizing
quantity.
In general, allowing the seller to vary price unit by unit, so as to march
down the demand curve, is impractical. But, as we will see, there are ways for
the seller to effectively duplicate marching down the demand curve. When the
seller can march down the demand curve or otherwise capture all the surplus,
she’s said to be engaging in first-degree price discrimination. This is sometimes
called perfect price discrimination.
39
where p is price per unit and f is the entry fee, the amount the buyer must
pay to have access to any units. The scheme in (3.1) is called a two-part tariff
because there are two parts to what the buyer pays (the tariff), the unit price
and the entry fee.
The buyer will buy only if f is not set so high that he loses all his consumer
surplus. That is, he buys provided
$f \le \int_0^x \bigl(p(z) - p(x)\bigr)\,dz = \int_0^x p(z)\,dz - x\,p(x)$ . (3.2)
The seller's problem is to choose f and x to maximize
$f + x\,p(x) - C(x)$ (3.3)
subject to (3.2). Observe that (3.2) must bind: If it didn't, then the seller could
raise f slightly, keeping x fixed, thereby increasing her profits without violating
the constraint. Note this means that the entry fee is set equal to the consumer
surplus that the consumer receives. Because (3.2) is binding, we can substitute
it into (3.3) to obtain the unconstrained problem:
$\max_x \int_0^x p(z)\,dz - x\,p(x) + x\,p(x) - C(x)$ .
The first-order condition is p(x) = MC (x); that is, the profit-maximizing quan-
tity is the welfare-maximizing quantity. The unit price is $p(x_W^*)$ and the entry fee is $\int_0^{x_W^*} p(t)\,dt - x_W^*\,p(x_W^*)$.
Proposition 3.1 A seller who sells to a single buyer with known demand does
best to offer a two-part tariff with the unit price set to equate demand and
marginal cost and the entry fee set equal to the buyer’s consumer surplus at that
unit price. Moreover, this solution maximizes welfare.
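Proposition 3.1 is easy to illustrate with the running example's primitives (inverse demand p(x) = 100 − x and C(x) = 2x, assumed here for concreteness): the optimal two-part tariff earns the seller the entire welfare triangle, double the linear-pricing profit of 2401:

```python
# Optimal two-part tariff for a single buyer with assumed inverse demand
# p(x) = 100 - x and C(x) = 2x: unit price = MC, entry fee = CS at that price.

def p(x):
    return 100.0 - x

c = 2.0
x_W = 98.0                     # solves p(x) = MC
unit_price = p(x_W)            # = 2
entry_fee = 0.5 * x_W ** 2     # integral of (p(z) - 2) = (98 - z) from 0 to 98

profit = entry_fee + (unit_price - c) * x_W   # seller captures all surplus
welfare = 0.5 * x_W * (p(0) - c)              # maximal total surplus (triangle)
print(unit_price, entry_fee, profit, welfare)
```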
Of course, a seller rarely faces a single buyer. If, however, the buyers all
have the same demand, then a two-part tariff will also achieve efficiency and
allow the seller to achieve the maximum possible profits. Let there be J buyers
all of whom are assumed to have the same demand curve. As before, let P (·)
denote aggregate inverse demand. The seller’s problem in designing the optimal
two-part tariff is
$\max_{f,x}\; Jf + x\,P(x) - C(x)$ (3.4)
subject to
$f \le cs_j\bigl(P(x)\bigr)$ for all j, (3.5)
where csj (p) denotes the jth buyer’s consumer surplus at price p. Because the
buyers are assumed to have identical demand, the subscript j is superfluous
and constraint (3.5) is either satisfied for all buyers or it is satisfied for no
buyer. As before, (3.5) must bind, otherwise the seller could profitably raise f .
Substituting the constraint into (3.4), we have
$\max_x\; J \cdot cs\bigl(P(x)\bigr) + x\,P(x) - C(x)$ ,
which, because aggregate consumer surplus is the sum of the individual surpluses
(recall Proposition 1.2 on page 9), can be rewritten as
$\max_x\; \underbrace{\int_0^x P(z)\,dz - x\,P(x)}_{\text{aggregate } CS} + x\,P(x) - C(x)$ .
The solution is $x_W^*$. Hence, the unit price is $P(x_W^*)$ and the entry fee, f, is
$\frac{1}{J}\left(\int_0^{x_W^*} P(z)\,dz - x_W^*\,P(x_W^*)\right)$ .
Proposition 3.2 A seller who sells to J buyers, all with identical demands,
does best to offer a two-part tariff with the unit price set to equate demand and
marginal cost and the entry fee set equal to 1/Jth of aggregate consumer surplus
at that unit price. This maximizes social welfare and allows the seller to capture
all of social welfare.
We see many examples of two-part tariffs in real life. A classic example is
an amusement park that charges an entry fee and a per-ride price (the latter,
sometimes, being set to zero). Another example is a price for a machine (e.g.,
a Polaroid instant camera or a punchcard sorting machine), which is a form of
entry fee, and a price for an essential input (e.g., instant film or punchcards),
which is a form of per-unit price.1 Because, in many instances, the per-unit price
1 Such schemes can also be seen as a way of providing consumer credit: the price of the
capital good (e.g., camera or machine) is set lower than the profit-maximizing price (absent
liquidity constraints), with the difference being essentially a loan to the consumer that he
repays by paying more than the profit-maximizing price (absent a repayment motive) for the
ancillary good (e.g., film or punchcards).
is set to zero, some two-part tariffs might not be immediately obvious (e.g., an annual service fee that allows unlimited "free" service calls, a telephone calling plan in which the user pays so much per month for unlimited "free" phone calls, or an amusement park that allows unlimited rides with paid admission).
Packaging is another way to design a two-part tariff (in effect, a disguised two-part tariff). For instance, a grocery store could create a two-part tariff in the following way. Suppose that, rather
than being sold in packages, sugar were kept in a large bin and customers could
purchase as much or as little as they liked (e.g., like fruit at most groceries or
as is actually done at “natural” groceries). Suppose that, under the optimal
two-part tariff, each consumer would buy x pounds, which would yield him
surplus of cs, which would be captured by the store using an entry fee of f = cs.
Alternatively, but equivalently, the grocery could package sugar. Each bag of
sugar would have x pounds and would cost px + cs per bag. Each consumer
would face the binary decision of whether to buy 0 pounds or x pounds. Each
consumer’s total benefit from x pounds is px + cs, so each would just be willing
to pay px + cs for the package of sugar. Because the entry fee is paid on every
x-pound bag, the grocery has devised a (disguised) two-part tariff that is also
arbitrage-proof. In other words, packaging—taking away consumers ability to
buy as much or as little as they wish—can represent an arbitrage-proof way of
employing a two-part tariff.
When the seller was limited to just one price parameter, p —that is, engaged in
linear pricing—she made less money than when she controlled two parameters,
p and f . One way to explain this is that a two-part tariff allows the seller to
face the social planner’s problem of maximizing welfare and, moreover, capture
all welfare. Because society can do no better than maximize welfare and the
seller can do no better than capture all of social welfare, she can’t do better
than a two-part tariff in this context.
But this begs the question of why she couldn’t do as well with a single price
parameter. Certainly, she could have maximized social welfare; all she needed
to do was set P (x) = MC (x). But the problem with that solution is there is
no way for her to capture all the surplus she generates. If she had an entry fee,
then she could use this to capture the surplus; but with linear pricing we’ve
forbidden her that instrument.
The problem with using just the unit price is that we’re asking one instru-
ment to do two jobs. One is to determine allocation. The other is to capture
surplus for the seller. Only the first has anything to do with efficiency, so the
fact that the seller uses it for a second purpose is clearly going to lead to a
distortion. If we give the seller a second instrument, the entry fee, then she has
two instruments for the two jobs and she can “assign” each job an instrument.
This is a fairly general idea—efficiency is improved by giving the mechanism
designer more instruments—call this the two-instruments principle.
[Figure 3.1: the consumer's kinked budget constraint under a two-part tariff, with entry fee f and slope −p, shown tangent to the indifference curve Y(x) at x∗.]
It might seem that the analysis of two-part tariffs is dependent on our assump-
tion of quasi-linear utility. In fact, this is not the case. To see this, consider a
single consumer with utility u(x, y). Normalize the price of y to 1. Assume the
individual has income I. Define Y (x) to be the indifference curve that passes
through the bundle (0, I); that is, the bundle in which the consumer purchases
only the y-good. See Figure 3.1. Assume MC = c.
Consider the seller of the x good. If she imposes a two-part tariff, then she
transforms the consumer’s budget constraint to be the union of the vertical line
segment {(0, y)|I − f ≤ y ≤ I} and the line y = (I − f ) − px, x > 0. If we
define ȳ = I − f , then this budget constraint is the thick dark curve shown
in Figure 3.1. Given that the consumer can always opt to purchase none of
the x good, the consumer can’t be put below the indifference curve through
(0, I); that is, below Y (x). For a given p, the seller increases profit by raising
f , the entry fee. Hence, the seller’s goal is to set f so that this kinked budget
constraint is just tangent to the indifference Y (x). This condition is illustrated
in Figure 3.1, where the kinked budget constraint and Y (x) are tangent at x∗ .
If the curves are tangent at x∗, then
$-p = Y'(x^*)$ . (3.6)
Summary 3.1 The conclusion that the optimal two-part tariff with one con-
sumer or homogeneous consumers entails setting the unit price equal to marginal
cost is not dependent on the assumption of quasi-linear utility.
Bibliographic Note
For more on two-part and multi-part tariffs, see Wilson (1993). Among other
topics, Wilson investigates optimal two-part tariffs with heterogeneous con-
sumers. Varian (1989) is also a useful reference.
that is, a given consumer wants at most one unit and only if price does not
exceed θ. Suppose θ varies across consumers. In this context, θ is a consumer’s
type (index). The set of possible values that θ can take is the type space. For
instance, suppose that consumers come in one of two types, θ = 10 or θ = 15.
In this case, the type space, Θ, can be written as Θ = {10, 15}. Alternatively,
we could have a continuous type space; for instance, Θ = [a, b], a and b ∈ R+ .
Consider the second example. Suppose there is a continuum of consumers
of measure J whose types are distributed uniformly over [a, b]. Consequently, at a uniform price of p, demand is $J\,\frac{b-p}{b-a}$. Suppose the firm's cost of x units is cx; observe MC = c. As there would otherwise be no trade, assume c < b. If the firm engaged in linear pricing, its profit-maximizing price would be
$p^* = \begin{cases} \dfrac{b+c}{2}\,, & \text{if } \dfrac{b+c}{2} \ge a \\[4pt] a\,, & \text{if } \dfrac{b+c}{2} < a \end{cases}$
(Exercise: Verify.) Rather than deal with both cases, assume a < (b + c)/2.
Given this last assumption, there is no further loss of generality in normalizing
the parameters so a = 0, b = 1, and 0 ≤ c < 1.
it makes more sense to cover third-degree price discrimination before second-degree price
discrimination.
46 Lecture Note 4: Third-degree Price Discrimination
Clearly, this exceeds the profit given in (4.1). Of course, we knew this would
be the case without calculating (4.2): linear pricing, because it generates a
deadweight loss and leaves some surplus in consumer hands, cannot yield the
firm as great a profit as perfect discrimination.
The example illustrates that, with heterogenous consumers, the ideal from
the firm’s perspective would be to base its prices on consumers’ types. In most
settings, however, that is infeasible. The question then becomes how closely can
the firm approximate that ideal through its pricing.
Characteristic-based
Discrimination 4.2
When a seller cannot observe consumers’ types, she has two choices. One, she
can essentially ask consumers their type; this, as we will see later, is what
second-degree price discrimination is all about. Two, she can base her prices on
observable characteristics of the consumers, where the observable characteristics
are correlated in some way with the underlying types. This is third-degree price
discrimination.
Examples of third-degree price discrimination are pricing based on observ-
able characteristics such as age, gender, student status, geographic location, or
temporally different markets.2 The idea is that, say, student status is correlated
with willingness to pay; on average, students have a lower willingness to pay for
an event (e.g., a movie) than do working adults.
Formally, consider a seller who can discriminate on the basis of M observable
differences. Let m denote a particular characteristic (e.g., m = 1 is student and
m = 2 = M is other adult). Based on the distribution of types conditional on
m, the firm’s demand from those with characteristic m is Xm (·). Let Pm (·) be
the corresponding inverse demand.
2 Although pricing differently at different times could also be part of a second-degree price discrimination scheme.
4.3 Welfare Considerations 47
• If marginal cost is not constant, then the markets cannot be treated inde-
pendently: how much the seller wishes to sell in one market is dependent
on how much she sells in other markets. In particular, if marginal cost
is not constant and there is a shift in demand in one market, then the
quantity sold in all markets can change.
• Marginal revenue across the M markets is the same at the optimum; that
is, if the seller found herself with one more unit of the good, it wouldn’t
matter in which market (to which group) she sold it.
3 To determine x_m^U, let X(p) = X_1(p) + X_2(p) be aggregate demand across the two markets,
and let P(x) = X^{−1}(x) be aggregate inverse demand. Solve P(x) + xP′(x) = c for x (i.e.,
solve for optimal aggregate production assuming one price). Call that solution x_M^*. Then
x_m^U = X_m(P(x_M^*)).
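The construction in the footnote is easy to run numerically. Here is a minimal sketch with assumed linear demands X_1(p) = 40 − p and X_2(p) = 100 − p and an assumed constant marginal cost c = 4 (these numbers are illustrative, not from the text):

```python
# Uniform-price benchmark vs. third-degree discrimination, following
# the footnote's recipe.  Demands and cost are illustrative assumptions:
#   X1(p) = 40 - p,  X2(p) = 100 - p,  constant marginal cost c = 4.

def x_discrim(a, c):
    """Monopoly quantity in a market with inverse demand P(x) = a - x."""
    return (a - c) / 2.0

c = 4.0
x1_star, x2_star = x_discrim(40, c), x_discrim(100, c)   # 18.0 and 48.0

# Aggregate demand when both markets are active: X(p) = 140 - 2p,
# so aggregate inverse demand is P(x) = 70 - x/2 and marginal revenue
# is P(x) + x P'(x) = 70 - x.  Solving 70 - x = c gives the uniform
# optimum.
x_agg = 70.0 - c                     # 66.0
p_uniform = 70.0 - x_agg / 2.0       # 37.0
x1_U, x2_U = 40.0 - p_uniform, 100.0 - p_uniform   # 3.0 and 63.0

print(x1_star, x2_star, p_uniform, x1_U, x2_U)
```

With these numbers the discriminatory prices, 22 and 52, straddle the uniform price of 37.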
Likewise,

B_m(x_m^U) < B_m(x_m^*) + P_m(x_m^*) · (x_m^U − x_m^*) . (4.6)
Going from a uniform price across markets to different prices (i.e., to third-degree
price discrimination) changes welfare by

(p^U − c)(∆x_1 + ∆x_2) > ∆W > (p_1^* − c)∆x_1 + (p_2^* − c)∆x_2 . (4.8)
Bibliographic Note
Arbitrage 4.4
We have assumed, so far, in our investigation of price discrimination that arbi-
trage is impossible. That is, for instance, a single buyer can’t pay the entry fee,
then resell his purchases to other buyers, who, thus, escape the entry fee. Simi-
larly, a good purchased in a lower-price market cannot be resold in a higher-price
market.
In real life, however, arbitrage can occur. This can make utilizing nonlinear
pricing difficult; moreover, the possibility of arbitrage helps to explain why
we see nonlinear pricing in some contexts, but not others. For instance, it is
difficult to arbitrage amusement park rides to those who haven’t paid the entry
fee. But it is easy to resell supermarket products. Hence, we see two-part tariffs at
amusement parks, but we typically don’t see them at supermarkets.4 Similarly,
senior-citizen discounts to a show are either handled at the door (i.e., at time
of admission), or through the use of color-coded tickets, or through some other
means to discourage seniors from reselling their tickets to their juniors.
If the seller cannot prevent arbitrage, then the separate markets collapse into
one and there is a single uniform price across the markets. The welfare con-
sequences of this are, as shown in the previous section, ambiguous. Aggregate
welfare may either be increased or decreased depending on the circumstances.
The seller, of course, is made worse off by arbitrage—given that she could, but
didn’t, choose a uniform price indicates that a uniform price yields lower profits
than third-degree price discrimination.
If the capacity of the venue, K, is greater than x∗1 +x∗2 , then there is no problem.
As a convention, assume that P2 (x∗2 ) > P1 (x∗1 ) (e.g., group 1 are students and
group 2 are non-students).
Suppose, however, that K < x∗1 + x∗2 . Then a different solution is called
for. It might seem, given a binding capacity constraint, that the seller would
4 Remember, however, that packaging can be a way for supermarkets to use arbitrage-proof
two-part tariffs.
(recall we’re assuming no physical costs that vary with tickets sold) subject to
x1 + x2 ≤ K .
Given that we know the unconstrained problem violates the constraint, the
constraint must bind. Let λ be the Lagrange multiplier on the constraint. The
first-order conditions are, thus,

P_1(x_1) + x_1 P_1′(x_1) − λ = 0 and
P_2(x_2) + x_2 P_2′(x_2) − λ = 0 .
Observe that the marginal revenue from each group is set equal to λ, the shadow
price of the constraint. Note, too, that the two marginal revenues are equal.
This makes intuitive sense—what is the marginal cost of selling a ticket to a
group-1 customer? It’s the opportunity cost of that ticket, which is the forgone
revenue of selling it to a group-2 customer; that is, the marginal revenue of
selling to a group-2 customer.
Now we can see why the seller might not want to sell only to the high-paying
group. Suppose, by coincidence, that x∗2 = K; that is, the seller could sell out
the event at price P_2(x_2^*). She wouldn't, however, do so because

P_2(x_2^*) + x_2^* P_2′(x_2^*) = 0 < P_1(0)
(the equality follows from the definition of x∗2 given that physical marginal cost
is 0). The marginal revenue of the Kth seat, if sold to a group-2 customer, is
clearly less than its marginal (opportunity) cost.
As an example, suppose that P1 (x) = 40 − x and P2 (x) = 100 − x.
Exercise 4.5.1: Suppose there was no capacity constraint. Verify that x∗1 = 20 and
x∗2 = 50 given third-degree price discrimination.
Exercise 4.5.2: Suppose there is a capacity constraint: K = 50. Verify that the
seller could just sell out if she set a uniform price of $50 (i.e., did not discriminate).
To whom would she sell? Verify that her (accounting) profit would be $2500.
The seller could do better than a uniform price of $50. To see this, equate the
marginal revenues:
40 − 2x1 = 100 − 2x2 . (4.9)
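Solving (4.9) together with the binding capacity constraint x1 + x2 = 50 is a two-equation linear system; a quick check of the resulting allocation and revenue:

```python
# Capacity-constrained discrimination from the text's example:
# P1(x) = 40 - x, P2(x) = 100 - x, K = 50, zero physical marginal cost.
import numpy as np

K = 50.0
# Equal marginal revenues: 40 - 2 x1 = 100 - 2 x2  =>  -2 x1 + 2 x2 = 60
# Capacity:                x1 + x2 = K
A = np.array([[-2.0, 2.0],
              [ 1.0, 1.0]])
b = np.array([60.0, K])
x1, x2 = np.linalg.solve(A, b)                  # 10.0 and 40.0

revenue = (40.0 - x1) * x1 + (100.0 - x2) * x2  # 300 + 2400 = 2700
print(x1, x2, revenue)
```

The seller charges $30 to group 1 and $60 to group 2, earning $2,700 rather than the $2,500 available at a uniform price of $50.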
4.6 Transportation Costs 51
Observe that the transport cost is equivalent to an excise tax. Hence, this
model also applies to a situation where transportation costs are the same across
markets, but the firm faces different excise taxes (e.g., the different markets
are in separate countries and tn is the duty paid to import a unit of the good
for n = 1, . . . , N . (For convenience, let’s limit attention to the case in which the
firm wishes to operate in all markets.) The first-order conditions imply that
( P(x_n) + x_n P′(x_n) ) − ( P(x_m) + x_m P′(x_m) ) = t_n − t_m (4.12)
Exercise 4.6.1: Prove that, under the standard assumptions of linear pricing, mar-
ginal revenue is a decreasing function whenever marginal revenue is non-negative.
Let y and z be vectors in R^n. We define the join and meet of the two vectors
as

y ∨ z = ( max{y_1, z_1} , . . . , max{y_n, z_n} ) (join) and
y ∧ z = ( min{y_1, z_1} , . . . , min{y_n, z_n} ) (meet).

A function f : R^n → R is supermodular if

f(y ∨ z) + f(y ∧ z) ≥ f(y) + f(z) (4.13)

for all y and z in the domain of f(·).6 If the inequality in (4.13) is strict for
y ≠ y ∨ z and y ≠ y ∧ z, then the function is strictly supermodular.
From an economic perspective, we can view supermodularity as a statement
about the complementarity of the inputs to f(·). Recall, for example, that
the production function √(KL) exhibits complementarity between capital, K,
and labor, L, insofar as the marginal product of either input is greater the
greater is the other input. Observe this production function is supermodular:
let K_M ≥ K_m and L_M ≥ L_m. Clearly, the vectors (K_m, L_m) and (K_M, L_M)
satisfy (4.13), so we need only check the vectors (K_m, L_M) and (K_M, L_m). To
that end, observe:

√K_M ( √L_M − √L_m ) ≥ √K_m ( √L_M − √L_m )
⇒ √(K_M L_M) − √(K_M L_m) ≥ √(K_m L_M) − √(K_m L_m)
⇒ √(K_M L_M) + √(K_m L_m) ≥ √(K_M L_m) + √(K_m L_M) .
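As a numerical sanity check (an illustration, not a proof), the inequality (4.13) can be verified for f(K, L) = √(KL) on a small, arbitrary grid of input values:

```python
# Check f(y ∨ z) + f(y ∧ z) >= f(y) + f(z) for f(K, L) = sqrt(K * L)
# at every pair of grid points.
import itertools
import math

def f(k, l):
    return math.sqrt(k * l)

grid = [0.5, 1.0, 2.0, 4.0]
points = list(itertools.product(grid, grid))
for (k1, l1), (k2, l2) in itertools.product(points, repeat=2):
    join = f(max(k1, k2), max(l1, l2))   # f at the componentwise max
    meet = f(min(k1, k2), min(l1, l2))   # f at the componentwise min
    assert join + meet >= f(k1, l1) + f(k2, l2) - 1e-12

print("supermodularity inequality holds at all grid pairs")
```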
Lemma 4.2 If f (·) and g(·) are supermodular functions, then so too is αf (·) + βg(·),
α > 0 and β > 0.
Lemma 4.3 Let the domain of f (·) be [y 1 , ȳ1 ]×[y 2 , ȳ2 ], where [y i , ȳi ] ⊂ R for i = 1 , 2.
Suppose that f (y1 , y2 ) = g(y1 − y2 ), where g : R → R is a concave function. Then f (·)
is supermodular.
6 Note we also require y ∨ z and y ∧ z to be in the domain of f (·) for all y and z in the
domain of f (·). This is described by saying that the domain of f (·) is a lattice.
7 Recall y ≥ y′ if and only if yi ≥ yi′ for all i.
A related lemma is
Lemma 4.4 Maintain the assumptions of Topkis’s Monotonicity Theorem, ex-
cept assume f (·, z) is strictly supermodular for any given z. In addition, suppose
argmaxy f (y, z) has at least two distinct elements and let y∗ and y∗∗ denote two
such elements. Then y∗ ≥ y∗∗ or y∗∗ ≥ y∗ .
8 The statement of the theorem follows Milgrom and Roberts (1990), who appear to be the
ones to have named it Topkis’s Monotonicity Theorem. In Topkis’s original work (Topkis,
1978), it is (essentially) Theorem 6.1.
Proof: Suppose the claim is false. Then, by the definition of strict supermod-
ularity, we have

f(y^* ∨ y^{**}) + f(y^* ∧ y^{**}) > f(y^*) + f(y^{**}) .

But Topkis's Monotonicity Theorem (with z = z′) tells us that the four terms
in that expression all equal the same thing, which yields a contradiction. The
result follows reductio ad absurdum.
y ∨ y′ = (y_1, . . . , y_m, y′_{m+1}, . . . , y′_n) and
y ∧ y′ = (y′_1, . . . , y′_m, y_{m+1}, . . . , y_n) .

Because the vectors cannot be ordered, 1 < m < n. Define w^{i,j} by the formula

w_k^{i,j} = y′_k if k ≤ m − i or k ≥ n + 1 − j, and w_k^{i,j} = y_k otherwise. (4.20)
9 Again Milgrom and Roberts (1990) appear to be the ones who named the theorem.
10 The following analysis draws heavily from Topkis (1978).
w0,0 = y ∧ y′
wm,n−m = y ∨ y′
wm,0 = y
w0,n−m = y′
The chain of inequalities in (4.22), considering only the first and last parts,
implies
f (y ∨ y′ ) − f (y′ ) ≥ f (y) − f (y ∧ y′ ) ,
which in turn proves supermodularity (i.e., implies (4.13)).
erywhere on its domain. Then f(·) exhibits increasing differences on any two
dimensions—that is, exhibits the property that, for any i and j, i ≠ j, y_i > y′_i,
y_j > y′_j, and y_k = y′_k for k ≠ i and k ≠ j,

f(y) − f(y′_i, y_{−i}) ≥ f(y_j, y′_{−j}) − f(y′) . (4.23)
Proof: Consider the “only if” part. Suppressing the other arguments for
notational convenience, we have

f(y_i, y_j) − f(y′_i, y_j) ≥ f(y_i, y′_j) − f(y′_i, y′_j) .

Dividing both sides by y_i − y′_i and taking the limit as y′_i → y_i, we have

∂f(y_i, y_j)/∂y_i ≥ ∂f(y_i, y′_j)/∂y_i .

Hence,

0 ≤ ∫_{y′_i}^{y_i} ( ∂f(y, y_j)/∂y_i − ∂f(y, y′_j)/∂y_i ) dy
= ( f(y_i, y_j) − f(y′_i, y_j) ) − ( f(y_i, y′_j) − f(y′_i, y′_j) ) .

Straightforward algebra on the last expression yields (4.23).
Let’s return to the question that motivated the development of these additional
tools for comparative statics, which was what happens if the transportation cost
to one market increases. Given that the indices of the markets are arbitrary,
there is no loss of generality in assuming that it is the transportation cost to
market 1 that increases.
Given the toolkit just assembled, we would like the profit function, expres-
sion (4.10) on page 51, to be supermodular in the units to be sold (i.e., in
x) and exhibit increasing differences with respect to the transportation costs
(specifically, with respect to t1 ). We can quickly see that it fails to do so. The
cross-partial derivative between any x_i and x_j is

−c′′( Σ_{n=1}^N x_n ) < 0
To proceed, define R(X) as the maximal revenue, net of transport costs, attain-
able from markets 2, . . . , N given total quantity X:

R(X) = max_{x_2 , . . . , x_N} Σ_{n=2}^N ( P(x_n) − t_n ) x_n subject to Σ_{n=2}^N x_n = X .
We can think of the firm solving its profit-maximization problem in two steps.
First, it asks itself: if it were constrained to capacity X for markets 2 , . . . , N ,
what would be the maximum profit it could achieve (that is the amount R(X))?
It is readily seen that R(·) is an increasing function (at least it will be for the
relevant domain of X).11 Second, the firm asks itself what values of y1 and X
does it want given it wishes to maximize
R(X) + ( t_1 − P(−y_1) ) y_1 − c(X − y_1) . (4.25)
Exercise 4.6.5: Verify that (4.25) is supermodular in y1 and X and exhibits increas-
ing differences in t1 for both y1 and X (note we still have increasing differences even
if (4.14) is an equality).
Consider two values for the transportation cost to market 1: t1 > t′1 . Given the
usual assumptions, maximizing (4.25) with respect to X and y1 has a unique
solution. Hence, if we write the solution as (X^*(t), y_1^*(t)), it follows from Corol-
lary 4.1 that

( X^*(t_1), y_1^*(t_1) ) ≥ ( X^*(t′_1), y_1^*(t′_1) ) . (4.26)
Recalling that y1 = −x1 , we have shown that an increase in the transportation
cost to market 1 causes the firm to sell no more units in market 1 and no fewer
units in the other markets.
11 These assumptions also guarantee it is differentiable by the Implicit Function Theorem.
In fact, we can get a more precise prediction. Suppose that the solution
given t_1 and t′_1 were the same. The relevant first-order conditions with respect
to y_1 would, then, imply

t_1 − P(−y_1^*) + y_1^* P′(−y_1^*) + c′(X^* − y_1^*) = 0 = t′_1 − P(−y_1^*) + y_1^* P′(−y_1^*) + c′(X^* − y_1^*) .

But that cannot hold given t_1 > t′_1. So y_1^*(t_1) ≠ y_1^*(t′_1). It follows from (4.26)
that y_1^*(t_1) > y_1^*(t′_1). We can now show that X^*(t_1) > X^*(t′_1). If that weren't
the case, then we would have

R′(X^*) − c′( X^* − y_1^*(t_1) ) = 0 = R′(X^*) − c′( X^* − y_1^*(t′_1) ) .

But that cannot hold given that c′( X^* − y_1^*(t_1) ) < c′( X^* − y_1^*(t′_1) ). It must
thus be that X^*(t_1) > X^*(t′_1). To summarize our analysis to this point:
Lemma 4.7 Maintain the standard assumptions. An increase in transportation
cost to one market results in fewer units being sold in that market and more
units in aggregate being sold in the other markets.
What can we say about sales in any given market n, n > 1, given an increase
in t1 ? We know sales in at least one of these markets must increase given X has
increased. Suppose there were a market i, i > 1, that saw no increase. Let j,
j > 1, be a market that saw an increase. Write x∗n (t) for the profit-maximizing
sales in market n when transportation cost to market 1 is t. We then have the
following chain of inequalities and equalities:
P( x_i^*(t_1) ) − t_i + x_i^*(t_1) P′( x_i^*(t_1) ) ≥ P( x_i^*(t′_1) ) − t_i + x_i^*(t′_1) P′( x_i^*(t′_1) )
= P( x_j^*(t′_1) ) − t_j + x_j^*(t′_1) P′( x_j^*(t′_1) ) > P( x_j^*(t_1) ) − t_j + x_j^*(t_1) P′( x_j^*(t_1) ) ,
where the first inequality follows because we assumed market i’s sales did not
increase and marginal revenue is a decreasing function; the equality follows from
(4.12); and the last inequality follows because market j has seen an increase in
sales and marginal revenue is a decreasing function. But the first expression
and the last expression in that chain must be equal in light of (4.12). By
contradiction, it cannot be that sales don’t increase in market i. To conclude:
Proposition 4.2 Maintain the standard assumptions. An increase in trans-
portation cost to one market results in fewer units being sold in that market
and more units being sold in each of the other markets.
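Proposition 4.2 can be illustrated with a hypothetical parametrization (all numbers below are assumptions, not from the text): identical inverse demand P(x) = 100 − x in two markets, convex production cost c(q) = q²/2, and transport costs t_1 (varied) and t_2 = 0. Because the first-order conditions are linear here, they can be solved directly:

```python
# Effect of raising t1 on sales in both markets.  Profit is
#   (100 - x1 - t1) x1 + (100 - x2) x2 - (x1 + x2)^2 / 2,
# so the FOCs are  100 - 2 x1 - t1 - (x1 + x2) = 0  and
#                  100 - 2 x2      - (x1 + x2) = 0.
import numpy as np

def optimal_sales(t1):
    A = np.array([[3.0, 1.0],
                  [1.0, 3.0]])
    b = np.array([100.0 - t1, 100.0])
    return np.linalg.solve(A, b)

x1_lo, x2_lo = optimal_sales(0.0)   # 25.0 and 25.0
x1_hi, x2_hi = optimal_sales(8.0)   # 22.0 and 26.0

# Market 1 shrinks, market 2 expands, as Proposition 4.2 predicts.
assert x1_hi < x1_lo and x2_hi > x2_lo
print(x1_lo, x2_lo, x1_hi, x2_hi)
```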
Proposition 4.2 sheds light on an important issue in international trade.
Consider a firm that produces domestically only but sells both domestically and
abroad. For convenience, let’s treat “abroad” as a single market. Suppose the
domestic government imposed an export duty on the firm’s product (i.e., taxed
units being shipped abroad). Recall that an excise tax is like a transportation
cost, so the effect of this action is to raise the “transportation cost” to the
abroad market. Who wins and loses from this? The government wins—it gets
additional revenue. Domestic consumers win because the firm will sell more
domestically, which drives down the price they pay. Foreign consumers lose via
the same logic run in reverse. Finally, the firm loses. In a democracy,
there is an obvious temptation for a government to impose an export duty:
Lots of voters benefit (domestic consumers) and few voters lose (assuming the
shareholders and stakeholders of the firm are not especially numerous), plus the
government obtains funds without resorting to domestic taxation.
This reasoning helps to explain why the U.S. Constitution contains a prohibition
on export duties. In the negotiations over the Constitution, representatives
of the Southern states, which had both an export-driven economy and, relative
to the North, little population, feared that future national legislatures would,
catering to the majority of the voters, impose export duties (here, the South is
like the firm). Hence, they insisted on a prohibition on export duties.
Second-degree Price
Discrimination 5
In many contexts, a seller knows that different types or groups of consumers
have different demand, but she can’t readily identify from which group any
given buyer comes. For example, it is known that business travelers are willing
to pay more for most flights than are tourists. But it is impossible to know
whether a given flier is a business traveler or a tourist.
A well-known solution is to offer different kinds of tickets. For instance,
because business travelers don’t wish to stay over the weekend or often can’t
book much in advance, the airlines charge more for round-trip tickets that don’t
involve a Saturday-night stayover or that are purchased within a few days of
the flight (i.e., in the latter situation, there is a discount for advance purchase).
Observe an airline still can’t observe which type of traveler is which, but by
offering different kinds of service it hopes to induce revelation of which type
is which. When a firm induces different types to reveal their types for the
purpose of differential pricing, we say the firm is engaged in second-degree price
discrimination.
Restricted tickets are one example of price discrimination. They are an
example of second-degree price discrimination via quality distortions. Other
examples include:
• Different classes of service (e.g., first and second-class carriages on trains).
The classic example here is the French railroads in the 19th century, which
removed the roofs from second-class carriages to create third-class car-
riages.
• Hobbling a product. This is popular in high-tech, where, for instance, Intel
produced two versions of a chip by “brain-damaging” the state-of-the-art
chip. Another example is software, where “regular” and “pro” versions
(or “home” and “office” versions) of the same product are often sold.
• Restrictions. Saturday-night stayovers and advance-ticketing requirements
are a classic example. Another example is limited versus full memberships
at health clubs.
The other common form of second-degree price discrimination is via quantity
discounts. This is why, for instance, the liter bottle of soda is typically less
than twice as expensive as the half-liter bottle. Quantity discounts can often
be operationalized through multi-part tariffs, so many multi-part tariffs are
examples of price discrimination via quantity discounts (e.g., choices in calling
plans between say a low monthly fee, few “free” minutes, and a high per-minute
62 Lecture Note 5: Second-degree Price Discrimination
charge thereafter versus a high monthly fee, more “free” minutes, and a lower
per-minute charge thereafter).
Analysis 5.1
Consider two consumer types, 1 and 2, indexed by θ. Assume the two types
occur equally in the population. Assume that each consumer has quasi-linear
utility
v(x, θ) − T ,
where x is either consumption of a good or the quality of the single unit of the
good he consumes (in the latter case, treat x = 0 as not receiving the good
at all) and T is the payment (transfer) from the consumer to the seller of that
good. Assume the following order condition on marginal utility:

∂²v(x, θ)/∂θ ∂x > 0 . (5.1)
max_{x_1 , x_2 , T_1 , T_2} (1/2)(T_1 − cx_1) + (1/2)(T_2 − cx_2) (5.3)
• Observe that (5.10) and (5.11) can be combined so that I(x2 ) ≥ U2 −U1 ≥
I(x1 ). Ignoring the middle term for the moment, the fact that I(·) is
increasing means that x2 ≥ x1 . Moreover, if x1 > 0, then U2 − U1 ≥
I(x1 ) > 0. Hence U2 > 0, which means (5.9) is slack.
• We’ve established that, if x1 > 0, then (5.8) binds, (5.9) is slack, and at
least one of (5.10) and (5.11) binds. Observe that we can rewrite the ic
constraints as I(x2 ) ≥ U2 ≥ I(x1 ). The seller’s profit is greater the smaller
is U2 , so it is the lower bound in this last expression that is important.
That is, (5.11) binds. Given that I(x2 ) ≥ I(x1 ), as established above,
we’re free to ignore (5.10).
So our reasoning tells us that, provided x1 > 0, we need only pay attention
to two constraints, (5.8) and (5.11). Using them to solve for U1 and U2 , we can
turn the seller’s problem into the following unconstrained problem:
max_{x_1 , x_2} (1/2)( v(x_1, 1) − cx_1 ) + (1/2)( v(x_2, 2) − I(x_1) − cx_2 ) . (5.13)
The first-order conditions are

∂v(x_1^*, 1)/∂x − I′(x_1^*) − c = 0 and (5.14)

∂v(x_2^*, 2)/∂x − c = 0 . (5.15)
Note that (5.15) is the condition for maximizing welfare were the seller sell-
ing only to type-2 customers; that is, we have efficiency in the type-2 “mar-
ket.” Because, however, I ′ (·) > 0, we don’t have the same efficiency vis-à-vis
type-1 customers; in the type-1 “market,” we see too little output relative to
the welfare-maximizing amount. This is a standard result—efficiency at the top and
distortion at the bottom.
To make this more concrete, suppose v(x, θ) = 5(θ + 1) ln(x + 1) and c = 1.
Then x∗2 = 14 and x∗1 = 4. Consequently, T1 ≈ 16.1 and T2 = v(x∗2 , 2) − I(x∗1 ) ≈
32.6. Under the interpretation that x is quantity, observe that a type-2 consumer
purchases more than three times as much, but pays only roughly twice as much
as compared to a type-1 consumer—this is quantity discounts in action!
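The example's numbers are easy to verify; a minimal sketch:

```python
# Two-type screening example: v(x, θ) = 5(θ + 1) ln(x + 1), c = 1,
# so I(x) = v(x, 2) - v(x, 1) = 5 ln(x + 1) and I'(x) = 5/(x + 1).
import math

c = 1.0
# FOC (5.15): 15/(x + 1) - c = 0              =>  x2* = 15/c - 1 = 14
x2 = 15.0 / c - 1.0
# FOC (5.14): 10/(x + 1) - 5/(x + 1) - c = 0  =>  x1* = 5/c - 1 = 4
x1 = 5.0 / c - 1.0

def v(x, theta):
    return 5.0 * (theta + 1.0) * math.log(x + 1.0)

def I(x):
    return v(x, 2) - v(x, 1)

T1 = v(x1, 1)           # type 1 is held to zero surplus
T2 = v(x2, 2) - I(x1)   # type 2 keeps the information rent I(x1*)

print(x1, x2, round(T1, 1), round(T2, 1))   # 4.0 14.0 16.1 32.6
```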
5.2 A Graphical Approach to Quantity Discounts 65
Figure 5.1: The individual demands of the two types of consumers (family and
single), df (·) and ds (·), respectively, are shown. Under the ideal
third-degree price discrimination scheme, a single would buy a
package with qs units and pay an amount equal to area A (gray
area). A family would buy a package with qf units and pay an
amount equal to the sum of all the shaded areas (A, B, and G).
A Graphical Approach to
Quantity Discounts 5.2
Now we consider an alternative, but ultimately equivalent, analysis of quantity
discounts.
Consider a firm that produces some product. Continue to assume the
marginal cost of production is constant, c. Suppose the population of potential
buyers is divided into families (indexed by f ) and single people (indexed by s).
Let df (·) denote the demand of an individual family and let ds (·) denote the
demand of an individual single. Figure 5.1 shows the two demands. Note that,
at any price, a family’s demand exceeds a single’s demand.
The ideal would be if the firm could engage in third-degree price discrimination
by offering two different two-part tariffs to the two populations. That
is, if the firm could freely identify singles from families, it would sell to each
member of each group the quantity that equated that member’s relevant inverse
demand to cost (i.e., qs or qf in Figure 5.1 for a single or a family, respectively).
It could make the per-unit charge c and the entry fee the respective consumer
surpluses. Equivalently—and more practically—the firm could use packaging.
The package for singles would have qs units and sell for a single’s total benefit,
bs (qs ). This is the area labeled A in Figure 5.1. Similarly, the family package
would have qf units and sell for a family’s total benefit of bf (qf ). This is the
sum of the three labeled areas in Figure 5.1.
The ideal is not, however, achievable. The firm cannot freely distinguish
singles from families. It must induce revelation; that is, it must devise a second-
degree scheme. Observe that the third-degree scheme won't work as a second-
degree scheme. Although a single would still purchase a package of qs units
at bs (qs ), a family would not purchase a package of qf units at bf (qf ). Why?
Well, were the family to purchase the latter package it would, by design, earn
no consumer surplus. Suppose, instead, it purchased the package intended for
singles. Its total benefit from doing so is the sum of areas A and G in Figure 5.1.
It pays bs (qs ), which is just area A, so it would enjoy a surplus equal to area
G. In other words, the family would deviate from the intended package, with
qf units, which yields it no surplus, to the unintended package, with qs units,
which yields it a positive surplus equal to area G.
Observe that the firm could induce revelation—that is, get the family to buy
the intended package—if it cut the price of the qf -unit package. Specifically, if
it reduced the price to the sum of areas A and B, then a family would enjoy a
surplus equal to area G whether it purchased the qs -unit package (at price = area
A) or it purchased the intended qf -unit package (at price = area A + area B).
Area G is a family’s information rent.
Although that scheme induces revelation, it is not necessarily the profit-
maximizing scheme. To see why, consider Figure 5.2. Suppose that the firm
reduced the size of the package intended for singles. Specifically, suppose it
reduced it to q̂s units, where q̂s = qs − h. Given that it has shrunk the package,
it would need to reduce the price it charges for it. The benefit that a single would
derive from q̂s units is the area beneath its inverse demand curve between 0 and
q̂s units; that is, the area labeled A′ . Note that the firm is forgoing revenues
equal to area J by doing this. But the surplus that a family could get by
purchasing a q̂s -unit package is also smaller; it is now the area labeled G′ . This
means that the firm could raise the price of the qf -unit package by the area
labeled H. Regardless of which package it purchases, a family can only keep
surplus equal to area G′ . In other words, by reducing the quantity sold to the
“low type” (a single), the firm reduces the information rent captured by the
“high type” (a family).
Is it worthwhile for the firm to trade area J for area H? Observe that the
profit represented by area J is rather modest: While selling the additional h
units to a single adds area J in revenue it also adds ch in cost. As drawn, the
profit from the additional h units is the small triangle at the top of area J. In
contrast, area H represents pure profit—regardless of how many it intends to
sell to singles, the firm is selling qf units to each family (i.e., cqf is a sunk
expenditure with respect to how many units to sell each single). So, as drawn,
this looks like a very worthwhile trade for the firm to make.
One caveat, however: The figure only compares a single family against a
single single. What if there were lots of singles relative to families? Observe
that the total net loss of reducing the package intended for singles by h is
(area J − ch) × Ns ,
where Ns is the number of singles in the population. The gain from reducing
Figure 5.2: By reducing the quantity in the package intended for singles, the
firm loses revenue equal to area J, but gains revenue equal to area
H.
that package is
area H × Nf ,
where Nf is the number of families. If Ns is much larger than Nf , then this
reduction in package size is not worthwhile. On the other hand if the two
populations are roughly equal in size or Nf is larger, then reducing the package
for singles by more than h could be optimal.
How do we determine the amount by which to reduce the package intended
for singles (i.e., the smaller package)? That is, how do we figure out what h
should be? As usual, the answer is that we fall back on our M R = M C rule.
Consider a small expansion of the smaller package from q̂s . Because we are using
an implicit two-part tariff (packaging) on the singles, the change in revenue—
that is, marginal revenue—is the change in a single’s benefit (i.e., mbs (q̂s )) times
the number of singles. That is,

MR(q̂_s) = N_s × mb_s(q̂_s) .

Recall that the marginal benefit schedule is inverse demand. So if we let ρ_s(·)
denote the inverse individual demand of a single (i.e., ρ_s(·) = d_s^{−1}(·)), then we
can write

MR(q̂_s) = N_s ρ_s(q̂_s) . (5.16)
What about MC ? Well, if we increase the amount in the smaller package we
incur costs from two sources. First, each additional unit raises production costs
by c. Second, we increase each family’s information rent (i.e., area H shrinks).
Observe that area H is the area between the two demand curves (thus, between
the two inverse demand curves) between q̂s and q̂s + h. This means that the
4. On the other hand, it will often be the case that the profit-maximizing
q̂s is positive, in which case it will be determined by equating expressions
(5.16) and (5.17).
Extended Example
hence,

ρ_s(q_s) = 45 − q_s/10 and
ρ_f(q_f) = 65 − q_f/10 .
Using expression (5.16), marginal revenue from q_s is, therefore,

MR(q_s) = N_s ρ_s(q_s) = N_s × ( 45 − q_s/10 ) .
Marginal cost of q_s (including forgone surplus extraction from the f type) is

MC(q_s) = N_s c + N_f ( ρ_f(q_s) − ρ_s(q_s) )
= 5N_s + 1,000,000 × ( 65 − q_s/10 − 45 + q_s/10 )
= 5N_s + 20,000,000 .
that is, if

45 N_s < 5N_s + 20,000,000 .

So, if N_s < 500,000, then q_s^* = 0 and the price for 600 minutes (i.e., q_f^*) is
b_f(600), which is 21,000 cents or $210.
Suppose that N_s ≥ 500,000. Then, equating MR and MC, we have

N_s × ( 45 − q_s/10 ) = 5N_s + 20,000,000 ;
hence,

q_s^* = 400 − 200,000,000 / N_s .
The low type retains no surplus, so the price for q_s^* minutes is b_s(q_s^*), which
equals the area under ρ_s(·) from 0 to q_s^*. This can be shown (see derivation of
b_f(600) above) to be

b_s(q_s^*) = 45 q_s^* − (q_s^*)²/20

cents. To summarize: the firm sells a package with 600 minutes, purchased by the f types, for 210 − q_s^*/5
dollars; and it also sells a package with q_s^* minutes for a price of b_s(q_s^*) dollars.
For example, if Ns = 5, 000, 000, then the two plans are (i) 600 minutes for
$138; and (ii) 360 minutes for $97.20.
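Putting the pieces of this example together in a short script (parameters as above: prices in cents, c = 5, N_f = 1,000,000, and a 600-minute plan for the f types):

```python
# Calling-plan menu as a function of the number of singles, Ns.
# rho_s(q) = 45 - q/10 and rho_f(q) = 65 - q/10, in cents per minute.

def plans(Ns):
    """Return (qs*, price of small plan, price of 600-minute plan), prices in dollars."""
    if Ns < 500_000:
        return 0.0, None, 21_000 / 100        # only the 600-minute plan, $210
    qs = 400 - 200_000_000 / Ns               # from MR(qs) = MC(qs)
    price_s = (45 * qs - qs**2 / 20) / 100    # b_s(qs*), converted to dollars
    price_f = 210 - qs / 5                    # b_f(600) less the information rent
    return qs, price_s, price_f

print(plans(5_000_000))   # (360.0, 97.2, 138.0)
```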
Mechanism Design
Purpose
Our purpose is to consider the problem of hidden information; that is, a game
between two economic actors, one of whom possesses mutually relevant informa-
tion that the other does not. This is a common situation: The classic example—
covered in the previous part of these lecture notes—is the “game” between
a monopolist, who doesn’t know the consumer’s willingness to pay, and the
consumer, who obviously does. Within the realm of contract theory, relevant
situations include a seller who is better informed than a buyer about the cost
of producing a specific good; an employee who alone knows the difficulty of
completing a task for his employer; a divisional manager who can conceal infor-
mation about his division’s investment opportunities from headquarters; and a
leader with better information than her followers about the value of pursuing
a given course of action. In each of these situations, having private informa-
tion gives the player possessing it a potential strategic advantage in his dealings
with the other player. For example, consider a seller who has better information
about his costs than his buyer. By behaving as if he had high costs, the seller
can seek to induce the buyer to pay him more than she would if she knew he
had low costs. That is, he has an incentive to use his superior knowledge to
capture an “information rent.” Of course, the buyer is aware of this possibility;
so, if she has the right to propose the contract between them, she will propose
a contract that works to reduce this information rent. Indeed, how the con-
tract proposer—the principal—designs contracts to mitigate the informational
disadvantage she faces will be a major focus of this part of the lecture notes.
Bibliographic Note
This part of the lecture notes draws heavily from a set of notes that I co-authored
with Bernard Caillaud.
Not surprisingly, given the many applications of the screening model, this
coverage cannot hope to be fully original. The books by Laffont and Tirole
(1993) and Salanié (1997) include similar chapters. Surveys have also appeared
in journals (e.g., Caillaud et al., 1988). Indeed, while there are idiosyncratic
aspects to the approach pursued here, the treatment is quite standard.
The Basics of Contractual Screening 6

To begin, the problem of interest is broadly as follows:
• Two players are involved in a strategic relationship; that is, each player’s
wellbeing depends on the play of the other player. In particular, the
players must contract with each other to achieve some desired outcome.
• One player is better informed than the other; that is, he has private in-
formation about some state of nature relevant to the relationship. This
player is the informed player. In situations of contractual screening, he is
often called the agent. Consistent with the literature, call the informed
player’s (the agent’s) information his type. The player without the infor-
mation is the uninformed player. In situations of contractual screening,
she is often called the principal.
• Critical to the analysis is the bargaining game between the players. In
contractual screening, it is assumed the principal (the uninformed player)
has all the bargaining power; that is, she makes a take-it-or-leave-it (tioli)
offer of a contract to the agent (the informed player). The agent either
accepts, in which case the contract is binding on both parties; or he rejects,
in which case the game is over and the players receive their default (no-
trade) payoffs. It is the assumption that the uninformed player makes the
tioli offer that makes this a screening model. Were, instead, the informed
player the contract proposer, we would have a signaling model.
• In this context, a contract can be seen as setting the rules of a secondary
game to be played by the principal and the agent.
The asymmetry of information in this game is assumed to arise exogenously.
It could, for instance, reflect the agent’s superior experience or expertise, which
provides him the payoff-relevant information. For example, past jobs may tell a
contractor how efficient he is—and thus what his costs will be—while ignorance
of these past jobs means the entity hiring him (e.g., a firm in need of his services
or a home owner seeking to remodel) has a less precise estimate of what his costs
will be.
Note, critically, that the informed player’s (agent’s) information is assumed
superior to the uninformed player’s (principal’s); that is, the analysis excludes
situations in which each player has his or her own private information.1
1 Put formally, the uninformed player’s information partition is coarser than the informed
Lecture Note 7: The Two-Type Screening Model
To begin formalizing the previous lecture note’s ideas, let’s start with as simple a
model as possible: the two-type model. That is, the agent’s private information
can take one of only two possible values.
Before proceeding, it is important to emphasize that the convenience of as-
suming only two types is not without cost. Beyond an obvious loss of generality,
the two-type model is “treacherous,” insofar as it may suggest conclusions that
seem general, but which are not. For example, the conclusion that we will shortly
reach with this model that the optimal contract implies distinct outcomes for
distinct states of nature—a result called separation—is not as general as it may
seem. Moreover, the assumption of two types conceals, in essence, a variety of
assumptions that must be made clear. It similarly conceals the richness of the
screening problem in complex, more realistic, relationships. Few prescriptions
and predictions should be reached from considering just a two-type model.
b(x) − C_θ(x) ;

x^fi_θ = argmax_{x≥0} b(x) − C_θ(x) .   (7.1)

The Pareto optimal solution, also referred to as the ex post efficient solution, is
then given by

b′(x^fi_E) = C′_E(x^fi_E)  and  [b′(x^fi_I) − C′_I(x^fi_I)] x^fi_I = 0

(where only the larger non-negative root of the second equation is relevant).¹ As
always, the optimal amount to trade rises as marginal cost falls; hence, x^fi_I < x^fi_E.
As a benchmark, consider the case of symmetric or full information: suppose,
contrary to the situation of interest, the principal knew the agent’s type. Given
she has the bargaining power, the principal would offer the contract ⟨x, t⟩ that
maximized her payoff subject to the constraint that the agent accept; that is,
her problem is
max b(x) − t
x,t
subject to
t − Cθ (x) ≥ 0 (7.2)
(the zero on the rhs of this last expression—the agent’s reservation utility—is,
recall, what he gets if he rejects). As her payoff increases as t decreases, the
1 Assumptions made earlier ensure these first-order conditions are also sufficient.
principal wishes to make t as small as possible; hence, the constraint must bind.
We can thus write her program as
Comparing that expression to (7.1), it is clear the principal will choose allocation
x^fi_θ. Using the binding participation constraint, expression (7.2), it follows that
the transfer, t^fi_θ, will be given by

t^fi_θ = C_θ(x^fi_θ) .

Were these full-information contracts offered under asymmetric information, an
efficient agent who chose the inefficient type’s contract would earn
t^fi_I − C_E(x^fi_I) = C_I(x^fi_I) − C_E(x^fi_I) > 0
(recall t^fi_I = C_I(x^fi_I)). In other words, if the full-information contracts are
offered, an efficient agent does better pretending to be inefficient. The principal
would, therefore, need to be rather naı̈ve if she expected an efficient agent to
choose ⟨x^fi_E, t^fi_E⟩ when he has the option of choosing ⟨x^fi_I, t^fi_I⟩.
The efficient agent’s gain from deception, C_I(x^fi_I) − C_E(x^fi_I), is called an
information rent. This is a loss to the principal, but a gain to the agent. There
is, however, an additional loss suffered by the principal that is not recaptured
by the agent: the agent’s deception means inefficiently little is produced; that
is, a real deadweight loss of

[b(x^fi_E) − C_E(x^fi_E)] − [b(x^fi_I) − C_E(x^fi_I)]

is suffered.
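To make these two losses concrete, here is a numerical sketch under assumed functional forms that are not from the text: b(x) = 2√x and constant marginal costs C_θ(x) = c_θ x with c_E < c_I, so the first-order conditions have a closed form.

```python
import math

# Assumed (illustrative) primitives: b(x) = 2*sqrt(x), C_theta(x) = c_theta*x.
c_E, c_I = 1.0, 2.0

def b(x):
    return 2.0 * math.sqrt(x)

def x_fi(c):
    # First-order condition b'(x) = 1/sqrt(x) = c  =>  x = 1/c**2.
    return 1.0 / c**2

x_E, x_I = x_fi(c_E), x_fi(c_I)   # full-information allocations; x_I < x_E

# Efficient type's gain from mimicking the inefficient type:
# R(x_I^fi) = C_I(x_I^fi) - C_E(x_I^fi).
rent = (c_I - c_E) * x_I

# Deadweight loss: surplus at x_E^fi minus surplus at x_I^fi, costs being C_E.
dwl = (b(x_E) - c_E * x_E) - (b(x_I) - c_E * x_I)

print(x_E, x_I, rent, dwl)
```

With these numbers both the rent and the deadweight loss happen to equal 0.25; the point is only that both are strictly positive whenever x^fi_I > 0 and c_I > c_E.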
Given this analysis, it is clear the principal should not expect the agent
to reveal his type freely. What should the principal do, instead? That is,
what kind of contracts will she offer? Because the principal does not know the
agent’s type, she may want to delegate the choice of allocation to the agent
under a payment schedule that implicitly rewards the agent for not acting as
though he was inefficient when he is truly efficient. This payment schedule, τ(·),
specifies what payment, t = τ(x), is to be paid the agent should he choose to
supply x units. Wilson (1993) provides evidence that such payment schedules
are common in real-world contracting.
If the agent accepts such a contract, the agent’s allocation choice, xθ , is
given by
xθ ∈ argmax τ (x) − Cθ (x) . (7.3)
x≥0
Assume this program has a solution and let u_θ denote the value of this maxi-
mization program. By definition,

u_θ = τ(x_θ) − C_θ(x_θ) .

Letting t_θ ≡ τ(x_θ) denote the associated payment, we can write

t_θ = u_θ + C_θ(x_θ) .

Define R(·) to be the function

x ↦ C_I(x) − C_E(x)

(i.e., R(x) = C_I(x) − C_E(x)). The function R(·) is the information-rent function.
Previously made assumptions imply that R(0) = 0, R(x) > 0 if x > 0, and R(·)
is strictly increasing.
Exercise 7.2.1: Prove that R(0) = 0, R(x) > 0 if x > 0, and R(·) is strictly increasing.
Combining expressions (7.4) and (7.7) implies t_E > t_I (unless x_E = x_I, in which
case (7.4) and (7.5) imply t_E = t_I).
An additional requirement is that the agent be willing to accept the contract
proposed by the principal. This means
uI ≥ 0 ; and (7.8)
uE ≥ 0 . (7.9)
or, equivalently,

f × [b(x_I) − C_I(x_I) − u_I] + (1 − f) × [b(x_E) − C_E(x_E) − u_E] .
given 0 < xI < xE .2 Because Cθ (·) is increasing in x, the agent would never
choose an x other than 0, xI , or xE (his marginal income is zero for any x
other than those three). The ir constraints ensure that (xθ , tθ ) is (weakly)
preferable to (0, 0). The ic constraints ensure that a type-θ agent prefers (xθ , tθ )
to (x¬θ , t¬θ ), where ¬θ is the type other than θ. That is, we’ve shown that faced
with this schedule, the type-I agent’s solution to (7.3) is xI —as required—and
the type-E agent’s solution is xE —as required.
2 If xI = 0, then tI = 0. If xI = xE , then tI = tE .
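The argument above can be checked mechanically. The schedule below is purely illustrative (linear costs C_E(x) = x and C_I(x) = 2x, with menu numbers picked by hand; none of it is from the text): each type’s solution to (7.3) is its intended allocation, with ties broken toward larger x in keeping with the convention that an indifferent agent acts in the principal’s interest.

```python
# Hypothetical three-point payment schedule tau(x); costs are assumed linear:
# C_E(x) = 1.0*x, C_I(x) = 2.0*x. The menu was chosen so ic and ir hold.
tau = {0.0: 0.0, 0.25: 0.50, 1.00: 1.25}
cost_rate = {"I": 2.0, "E": 1.0}

def best_choice(theta):
    # The agent solves (7.3): max over x of tau(x) - C_theta(x); ties go to
    # the larger x (an indifferent agent acts in the principal's interest).
    return max(tau, key=lambda x: (tau[x] - cost_rate[theta] * x, x))

assert best_choice("I") == 0.25   # type I picks x_I, earning exactly zero
assert best_choice("E") == 1.00   # type E picks x_E, earning the rent R(x_I)
```

Note the type-E agent is exactly indifferent between his contract and type I’s, mirroring the binding ic constraint derived below.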
subject to (7.4), (7.5), (7.8), and (7.9). Solving this problem using the standard
Lagrangean method is straightforward, albeit tedious. Because, however, such
a mechanical method provides little intuition, we pursue a different, though
equivalent, line of reasoning.
• One can check that ignoring the ic constraints (treating them as not bind-
ing) leads us back to the full-information solution. But, as shown at the
beginning of this section, that solution violates the ic constraint of the
efficient type. Conclusion: at least one of the ic constraints must bind.
• The ic constraint when the agent is efficient implies that: uE ≥ R(xI ) +
uI ≥ uI . Therefore, if an inefficient agent is willing to accept the contract,
so too must an efficient agent. Conclusion: constraint (7.9) is slack and
can be ignored.
• It is, however, the case that (7.8) must bind at the optimum. To see this,
suppose not: the principal could, then, lower both utility terms uI and uE
by some ε > 0 without violating the participation constraints. Moreover,
given the two utilities have been changed by the same amount, this can’t
affect the ic constraints. But, from (7.10), lowering the utilities raises the
principal’s expected payoff—which means our “optimum” wasn’t optimal.
• Using the fact that (7.8) is binding, expression (7.6)—the pair of incentive-
compatibility constraints—reduces to
R(xI ) ≤ uE ≤ R(xE ) .
For any target pair (xI , xE ), the principal wants the efficient agent’s in-
formation rent to be as small as possible. It follows, therefore, that
uE = R(xI ). The ic constraint (7.4) is, thus, slack, provided the nec-
essary monotonicity condition (7.7) holds.
max_{(x_I, x_E) | x_I ≤ x_E}  f × [b(x_I) − C_I(x_I)] + (1 − f) × [b(x_E) − C_E(x_E) − R(x_I)] .

The solution is

x_E = x^fi_E = argmax_{x≥0} b(x) − C_E(x)  and   (7.11)

x_I = x*_I(f) ≡ argmax_{x≥0} b(x) − C_I(x) − ((1 − f)/f) R(x) .   (7.12)
The only step left is to verify that the monotonicity condition (7.7) is satisfied
for these values. If we consider the last two terms in the maximand of (7.12)
to be cost, we see that the effective marginal cost of output from the inefficient
type is
C′_I(x) + ((1 − f)/f) R′(x) > C′_I(x) > C′_E(x)

for x > 0.³ The greater the marginal-cost schedule given a fixed marginal-
revenue schedule, the less is traded; that is, it must be that x*_I(f) < x^fi_E—the
monotonicity condition (7.7) is satisfied.
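The effective-marginal-cost reading lends itself to a quick numerical sketch. All functional forms here are assumptions for illustration (b(x) = x − x²/2, so b′(x) = 1 − x, and constant marginal costs), not forms taken from the text:

```python
# Assumed forms (illustrative only): b(x) = x - x**2/2, so b'(x) = 1 - x;
# C_theta(x) = c_theta*x, so R(x) = (c_I - c_E)*x and R'(x) = c_I - c_E.
c_E, c_I = 0.25, 0.5

def x_I_star(f):
    # (7.12): equate b'(x) = 1 - x with the effective marginal cost
    # c_I + ((1 - f)/f)*(c_I - c_E), truncating at zero if they never cross.
    mc_eff = c_I + (1.0 - f) / f * (c_I - c_E)
    return max(1.0 - mc_eff, 0.0)

x_E_fi = 1.0 - c_E          # (7.11): the efficient type is undistorted

for f in (0.8, 0.5, 0.25):
    assert x_I_star(f) < x_E_fi      # monotonicity (7.7) holds
    print(f, x_I_star(f))            # the allocation shrinks as f falls
```

The loop makes the comparative static visible: as f falls, the effective marginal cost rises and the inefficient type’s allocation shrinks, eventually hitting zero.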
It is worth summarizing the nature and properties of the optimal price sched-
ule for the principal to propose:
• The allocation if the agent is relatively inefficient, x*_I(f), is less than the
full-information efficient allocation, x^fi_I; that is, x*_I(f) < x^fi_I.
In addition:
To verify the last point, observe that, from the principal’s perspective, it is as if
the inefficient agent has a marginal cost of

C′_I(x) + ((1 − f)/f) R′(x) .
That “effective” marginal cost is falling in f. By the usual comparative statics,
it follows that x*_I(·) is non-decreasing. Because R(·) is an increasing function,
R(x*_I(·)) must be similarly non-decreasing. Observe too that this effective
marginal cost is actual marginal cost if f = 1: if the principal were certain the

³ Because x^fi_E > 0, this is the relevant domain of output to consider.
agent was inefficient, then she would stipulate the welfare-maximizing allocation
(i.e., x*_I(1) = x^fi_I). Conversely, as f ↓ 0, this effective marginal cost tends to
+∞ for x > 0. Given her marginal benefit is bounded, this means there must
be some f̲ ∈ (0, 1) such that, if f ≤ f̲, the principal stipulates a zero allocation
for the inefficient agent. In the parlance of mechanism design, she shuts down
or shuts out that type of agent when f ≤ f̲ (i.e., x*_I(f) = 0 for f ≤ f̲).
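The threshold can be computed in closed form for an assumed parametrization (b(x) = x − x²/2, so marginal benefit is bounded by b′(0) = 1, and linear costs C_θ(x) = c_θ x; these forms are illustrative, not the text’s):

```python
# Assumed (illustrative) forms: b(x) = x - x**2/2 and C_theta(x) = c_theta*x.
c_E, c_I = 0.25, 0.5

def x_I_star(f):
    mc_eff = c_I + (1.0 - f) / f * (c_I - c_E)   # effective marginal cost
    return max(1.0 - mc_eff, 0.0)

# Shutdown: x_I*(f) = 0 once the effective marginal cost at x = 0 reaches
# b'(0) = 1; solving 1 = c_I + ((1 - f)/f)*(c_I - c_E) for f gives:
f_bar = (c_I - c_E) / (1.0 - c_E)    # equals 1/3 with these numbers

assert x_I_star(f_bar) <= 1e-12      # (numerically) shut down at the threshold
assert x_I_star(f_bar + 0.1) > 0.0   # strictly positive allocation above it
print(f_bar)
```

The tolerance in the first assertion is only a guard against floating-point rounding; analytically x*_I(f̲) = 0 exactly.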
Intuition for Proposition 7.1 can be gained from Figure 7.1. This figure
shows one indifference curve for an inefficient (type-I) agent and three for an
efficient (type-E) agent in output-payment space. The type-I indifference curve
is that type’s zero-profit curve (hence, by necessity, it passes through the origin).
It corresponds to the ir constraint for that type. Similarly, the lowest type-E
indifference curve is that type’s zero-profit curve. Suppose that, under full
information, points a and b would be the contracts offered. Under asymmetric
information, however, contract b is not incentive compatible for type E: were
that type to pretend to be type I (i.e., choose contract a), then he would be on
a higher (more profitable) indifference curve (the highest of its three curves).
Under asymmetric information, an incentive compatible pair of contracts
that induce the full-information allocations are a and c.
Exercise 7.2.2: Explain why, given the assumptions of this model, we know c lies
directly above b.
The problem with this “solution,” however, is that type E earns a large in-
formation rent, equal to the distance between points b and c. The principal
can reduce this rent by distorting downward the quantity asked from a type-I
agent. For example, by lowering the allocation to x∗I (f ), the principal signifi-
cantly reduces the information rent (it’s now the distance between points b and
e). How much distortion the principal will impose depends on the likelihood
of the two types. When f is small, the expected savings in information rent
is large, while the expected cost of too-little allocation is small, so the down-
ward distortion in type I’s allocation is big. Conversely, when f is large, the
expected savings on rent are small and the expected cost of misallocation is
large, so the downward distortion is small. The exact location of point d is
determined by finding where the expected marginal cost of distorting type I’s
output, f × [b′(x_I) − C′_I(x_I)], just equals the expected marginal reduction in
type E’s information rent, (1 − f) × R′(x_I).
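This trade-off can be verified numerically at an interior optimum. The parametrization is assumed purely for illustration (b(x) = x − x²/2 and linear costs, so b′(x) = 1 − x and R′(x) = c_I − c_E):

```python
# Assumed (illustrative) forms: b'(x) = 1 - x, C_I'(x) = c_I, R'(x) = c_I - c_E.
c_E, c_I, f = 0.25, 0.5, 0.5

# Interior solution of (7.12): 1 - x = c_I + ((1 - f)/f)*(c_I - c_E).
x_star = 1.0 - (c_I + (1.0 - f) / f * (c_I - c_E))

# Expected marginal cost of distorting type I's output versus the expected
# marginal reduction in type E's information rent:
cost_of_distortion = f * ((1.0 - x_star) - c_I)     # f * (b'(x) - C_I'(x))
rent_reduction     = (1.0 - f) * (c_I - c_E)        # (1 - f) * R'(x)

assert abs(cost_of_distortion - rent_reduction) < 1e-12
print(x_star, cost_of_distortion, rent_reduction)
```

The two margins coincide exactly at x*_I(f), which is just the first-order condition of (7.12) rearranged.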
Because the first best is not achieved, it is natural to ask: could the principal
do better than the solution described in Proposition 7.1 were she to use some
more sophisticated mechanism? The answer is no and the proof is, as we will
see later, quite general. Whatever sophisticated mechanism the principal uses,
note that it must boil down to a pair of points, (xI , tI ) and (xE , tE ), once exe-
cuted; that is, an allocation and a transfer for each possible type. Consequently,
whatever complicated play an alternative mechanism induces, both parties can
see through it; that is, forecast that the equilibrium outcomes correspond to
Figure 7.1: The full-information contracts, points a and b, are not incentive
compatible. The principal finds the full-information allocations, x^fi_E
and x^fi_I, too costly due to the information rent (the distance from point
b to point c). To reduce the rent, the principal distorts a type-I
agent’s allocation (from x^fi_I to x*_I(f)), which reduces the rent (by
the distance between points c and e).
these two points. Hence, the final outcome can always be generated by a simple
(non-linear) payment schedule like the one derived above. We’ve, thus, estab-
lished that the outcome described in Proposition 7.1 cannot be improved on by
using more sophisticated or alternative mechanisms.
Finally, note that we don’t need an entire payment schedule, τ (·). In par-
ticular, there is a well-known alternative: a direct-revelation mechanism. In a
direct-revelation mechanism, the principal commits to pay the agent tE for xE
or tI for xI depending on the agent’s announcement of its type. Failure by the
agent to announce his type (i.e., failure to announce a θ̂ ∈ {I, E}) is equiva-
lent to his rejecting the contract.4 It is immediate that this direct-revelation
mechanism is equivalent to the optimal payment schedule derived above. It is
also simpler, in that it only deals with the relevant part of the payment sched-
ule. Admittedly, it is not terribly realistic,5 but, as this discussion suggests, a
direct-revelation contract can be turned into a more realistic mechanism (this
is Proposition 8.2 below). More importantly, as we will see, in terms of deter-
mining what is the optimal feasible outcome, there is no loss of generality in
restricting attention to direct-revelation mechanisms.
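A direct-revelation mechanism can be sketched as nothing more than a map from announcements to outcomes. The menu values and linear costs below are hypothetical, chosen only to illustrate the truth-telling property:

```python
# Hypothetical direct mechanism: announce a type, receive (x_theta, t_theta).
mechanism = {"I": (0.25, 0.50), "E": (1.00, 1.25)}
cost_rate = {"I": 2.0, "E": 1.0}   # assumed linear costs C_theta(x) = c_theta*x

def payoff(true_type, announcement):
    x, t = mechanism[announcement]
    return t - cost_rate[true_type] * x

# Truth-telling is an equilibrium: neither type gains by announcing the other.
for theta, other in (("I", "E"), ("E", "I")):
    assert payoff(theta, theta) >= payoff(theta, other)
    assert payoff(theta, theta) >= 0.0    # and both types accept (ir)

print("truthful announcement is a best response for both types")
```

The mechanism is just the two relevant points of the payment schedule re-indexed by announcement, which is the equivalence claimed in the text.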
Other Applications
Through the following exercises, we explore two other applications of the two-
type model.
The first set of exercises assume the following: A manager (the agent) can reduce the
cost per unit of producing a given product. The manager knows something about how
easy it will be to successfully reduce costs, but his superior (the principal)—who gets
to make him a tioli offer—does not. Specifically, the manager’s utility is y − d(r, θ),
where y is his income (pay), r is the reduction, per unit, in cost he achieves, θ is his
type—reflecting the ease of cost reduction, and d : R2+ → R+ is the disutility he suffers
from reducing costs. The manager is free to quit and will do so if his utility will be less
than zero. Suppose θ ∈ {θ1 , θ2 }, where θ2 > θ1 . Assume the firm in question expects
to produce X units over the relevant time frame. Suppose for both θ that
4 If the allocation is determined by the agent’s action (e.g., x is the number of units he
supplies, the quality of his workmanship, etc.), then it is further assumed that the contract
imposes a severe punishment on the agent if he fails to produce the contractually specified x
given his announcement (i.e., if his choice of x is not xθ̂ ).
5 Although see Gonik (1978) for a real-life example of a direct-revelation mechanism.
Lastly, assume
∂d(r, θ₁)/∂r > ∂d(r, θ₂)/∂r   ∀r > 0 .
Exercise 7.2.4: Were this a situation of full-information, what expressions would
define the optimal full-information solution as a function of θ? (Assume the manager’s
superior seeks to maximize firm profit.)
Exercise 7.2.5: If r^fi_n is the full-information solution given θ_n, prove r^fi_1 < r^fi_2.
Exercise 7.2.6: Let the probability of type θ1 be q, 0 < q < 1. What conditions
define the optimal solution under asymmetric information?
The next set of exercises assume the following: There is an r&d lab. The manager
of the lab knows its type, τ . Let p be the probability that the lab will successfully
develop a new product worth V > 1 to the firm. Through his efforts, the manager
determines p; that is, the manager chooses p. The manager’s supervisor does not know
τ , but she does know it equals 1 with probability γ and 2 with probability 1 − γ. Let
the manager’s utility be

y − p/((1 − p)τ) ,
where y is his income (pay). The manager is free to quit and will do so if his utility
will be less than zero.
Exercise 7.2.7: Solve for the optimal full-information levels of p as a function of τ .
Exercise 7.2.8: Suppose, somewhat unrealistically, that the supervisor can verify
p and, thus, base a contract on it. What is the optimal solution given asymmetric
information about τ ? (Assume the supervisor seeks to maximize expected firm profit.)
Exercise 7.2.9: Consider the more realistic assumption that the supervisor can only
verify the lab’s success or failure. Now what is the optimal solution given asymmetric
information about τ ? (Hint: the contract will have a pair of payments (sτ , fτ ), where
sτ is paid if success and fτ is paid if failure; that is, the mechanism has the manager
choose between hs1 , f1 , p1 i ≡ C1 and hs2 , f2 , p2 i ≡ C2 , where, if the manager chooses
Cτ , he gets paid sτ if successful, but fτ if he fails.)
Lecture Note 8: General Screening Framework
The two-type screening model yielded strong results. But buried within it is a
lot of structure and some restrictive assumptions. If we are really to use the
screening model to understand economic relationships, we need to deepen our
understanding of the phenomena it unveils, the assumptions they require, and
the robustness of its conclusions. The approach in this lecture note is, thus, to
start from a very general formalization of the problem.
The principal and agent have an interest in setting an allocation x ∈ X . In
addition to the allocation, both principal and agent care about money, specifi-
cally the value of any transfer between them.
The agent has information—his type—which consists of knowing the value
of a payoff-relevant parameter θ. Let Θ denote the set of possible types, the
type space. Nature draws θ from Θ according to a commonly known probability
distribution, F : Θ → [0, 1]. While the agent learns the value of θ perfectly, the
principal only knows that it was drawn from the commonly known probability
distribution.
Let B(x, t, θ) denote the principal’s utility as a function of allocation, trans-
fer, and payoff-relevant parameter; that is, B : X × R × Θ → R. Let U (x, t, θ)
denote the agent’s utility; that is, U : X × R × Θ → R. As in the previous
lecture note, interpret t > 0 as a transfer to the agent and t < 0 as a transfer
from the agent. Consistent with this interpretation, B(x, ·, θ) is a decreasing
function and U (x, ·, θ) an increasing function for all (x, θ) ∈ X × Θ.
For example, in the two-type model of the previous lecture note, we had
X = R+ , Θ = {I, E},
As that model illustrated, it is not necessary that both the principal and agent’s
utility depend on the agent’s type.
Mechanisms
A contractual outcome is an allocation and transfer pair, (x, t). As we will see,
a contractual outcome can be the outcome of either a deterministic or stochastic
mechanism. With respect to a stochastic mechanism, let ∆(X × R) denote the
set of all possible probability distributions over outcomes (i.e., over the space
X × R). Let σ denote a generic element of ∆(X × R); that is, σ is a particular
distribution over outcomes. We can now define, quite generally, a contractual
mechanism:
that is, the agent states whether he is inefficient or efficient and his announce-
ment fixes his output target and his payment.1
For notational convenience, I will write U(σ(m, n), θ) rather than the technically
correct

E_{σ(m,n)} [U(x, t, θ)] .

Of course, if σ(m, n) allocates all weight to a single (x, t) pair, there is nothing
incorrect about the notation U(σ(m, n), θ). Given that mechanisms with
random outcomes are rare in the literature, the notational convenience seems
worth the possible confusion.
A direct mechanism is a mechanism in which M = Θ; that is, one in which
the agent’s action is limited to making announcements about his type. The
consequences of this announcement are then built into the outcome function, σ.
For instance, as we just saw and as was also discussed earlier, we can interpret
the solution of the previous lecture note’s model as a direct mechanism.
A direct-revelation mechanism is a direct mechanism for which it is an equi-
librium strategy for the agent to announce his type truthfully. In other words,
if m : Θ → Θ is the agent’s strategy (a mapping from type into announcement
about type), then we have a direct-revelation mechanism if, in equilibrium,
m(θ) = θ for all θ ∈ Θ. For truth-telling to be an equilibrium strategy, it must
be a best response to the agent’s type and his expectation of the principal’s
action n:
U(σ(m(θ), n), θ) ≥ U(σ(m(θ′), n), θ)   ∀θ′ ∈ Θ ;
But if that expression is true, then the agent prefers to play m(θ′ ) instead of
m(θ) in the original mechanism. This contradicts the assumption that m(·) is
an equilibrium best response to n in the original game. It follows, reductio ad
absurdum, that σ̂ induces truth-telling.
2 The revelation principle is often attributed to Myerson (1979), although Gibbard (1973)
and Green and Laffont (1977) could be identified as earlier derivations. Suffice it to say
that the revelation principle has been independently derived a number of times and was a
well-known result before it received its name.
3 Observe that the agent’s strategy can be conditioned on θ, which he knows, while the
Moreover, because σ̂(θ) = σ(m(θ), n), the same distribution over outcomes
is implemented in equilibrium.
Participation
UR(θ) ≡ U(x₀, 0, θ) .
The quantity UR(θ) is called the reservation utility of a type-θ agent. In some
models, UR(θ) is a constant (does not vary with θ). This was, for instance, the
case in the previous lecture note’s model, where the reservation utility was 0 for
both types. When reservation utility is a constant, I will write UR.

Because a mechanism could always map θ into (x₀, 0), there is never any
loss of generality with respect to mechanism design in assuming that all types
participate in equilibrium.

Participation: It is without loss of generality to assume all types participate in
equilibrium.
It might strike one as odd, at least in some situations, that the agent’s
reservation utility equals his utility given no trade. What if an employee’s utility
if paid nothing and required to do nothing by the principal were zero, but he
has the alternative to work elsewhere, which will yield him some positive utility
U(θ)? Because behavior is unaffected by an additive constant to the utility
function, this situation is equivalent to one in which we add U(θ) to the original
utility function so the agent’s no-trade utility is U(θ) (i.e., if paid nothing and
asked to do no work—that is, essentially free to pursue the alternative—he gets
utility U(θ)). In other words, there is no loss of generality in equating the
agent’s reservation utility with his no-trade utility.

No-Trade and Reservation Utilities: There is no loss of generality in treating
these two utilities as the same.
Condition 8.1 Let T ⊆ R be the set of permitted transfers. Then there exists
a t ∈ T such that
sup_{(ξ,θ)∈X×Θ} U(ξ, t, θ) ≤ inf_{θ∈Θ} UR(θ) .   (8.3)
Proof: First, let’s establish that s(·) is unambiguously defined. To that end,
suppose, to the contrary, that there exist θ and θ′ such that x = x(θ) = x(θ′),
but t(θ) ≠ t(θ′). Without loss of generality, we may take t(θ) > t(θ′). Because
U (x, ·, θ) is increasing, we then have
but this implies type θ′ would do better to lie than tell the truth, which con-
tradicts the assumption that σ is a direct-revelation mechanism. Reductio ad
absurdum, it follows that t(θ) = t(θ′ ) and, therefore, that s(·) is unambiguously
defined.
Next we need to verify that the punishment deters the agent from choosing
an x ∉ x(Θ) (i.e., an x for which no θ exists such that x = x(θ)). Consider an
arbitrary type θ′ . We have
where the first inequality follows because (X \ x(Θ)) × {θ′} ⊂ X × Θ, the third
inequality because {θ′} ⊂ Θ, and the final equality because t(θ′) = s(x(θ′)) by
construction. Hence, no type would choose an x ∉ x(Θ).
Finally, we need to verify that each type chooses the same x given the com-
pensation schedule s(·) as he would have played under the original mechanism.
Suppose there were a type θ who played differently. As just shown, we know he
chooses an x ∈ x(Θ). Suppose he chooses x(θ′ ). By supposition, this is better
for him than x(θ); hence,
U(x(θ′), s(x(θ′)), θ) > U(x(θ), s(x(θ)), θ) .

But since t(·) = s(x(·)) by construction, this implies

U(x(θ′), t(θ′), θ) > U(x(θ), t(θ), θ) ,
1 That the space be bounded below at 0 is not critical—any lower bound would do. Alter-
natively, by appropriate changes to the utility functions, we could allow the allocation space
to be unbounded. Zero is simply a convenience.
Lecture Note 9: The Standard Framework
In light of the continuity of w(·, θ), these last two assumptions ensure that
w(x, θ) has an interior maximum with respect to x for all θ ∈ (θL , θH ].
Before proceeding, it is worth observing that the standard framework is
restrictive in potentially important ways:
• As noted, the utility functions are separable in money and allocation; the
marginal utility of income is independent of the state of nature; and the
marginal utility of income is constant, which means both players are risk
neutral with respect to gambles over money. The gains from these as-
sumptions are that we can compute the transfer function t(·) in terms of
the allocation function x(·), which means our optimization problem is a
standard optimal-control problem with a unique control, x(·). In addition,
risk neutrality insulates us from problems that exogenously imposed risk
might otherwise create (e.g., the need to worry about mutual insurance).
On the other hand, when the agent is risk averse, the ability to threaten
him with endogenously imposed risk (from the contract itself) can provide
the principal an additional tool with which to improve the ultimate allo-
cation. For a discussion of some of these issues see Edlin and Hermalin
(2000). Note we still have the flexibility to endogenously impose risk over
the allocation (the x).
In order to screen types, the principal must be able to exploit differences across
the tradeoffs that different types are willing to make between money and allo-
cation. Otherwise a strategy, for instance, of decreasing the x expected from
the agent in exchange for slightly less pay wouldn’t work to induce one type to
reveal himself to be different than another type. Recall, for instance, because
marginal cost differed between the efficient and inefficient agents in Lecture
Note 7, the principal could design a contract to induce revelation. Different
willingnesses to make tradeoffs means we require that different types of agents
have different indifference curves in allocation-money (transfer) space. In fact,
2 See, e.g., Caillaud and Hermalin (1993, §3) for a development of the standard framework
in which the type space is finite. Also see Hermalin (2014) for a more detailed treatment of the standard
framework with a finite type space.
³ That is, the mechanism that maps θ to a distribution σ(θ) over payments is equivalent
Figure 9.1: The Spence-Mirrlees Condition: Through any point (e.g., a or b),
the indifference curve through that point for a higher type (red)
crosses the indifference curve through that point for a lower type
(blue) from above.
we want, for any point in that space, that these slopes vary monotonically with
respect to type. Such a monotonicity-of-indifference-curves condition is known
as a Spence-Mirrlees condition.
The slope of an indifference curve in allocation-money space is equal to
−∂u/∂x. Hence, we require that −∂u/∂x or, equivalently and more naturally,
∂u/∂x vary monotonically in θ. Specifically, we assume:
Condition 9.1 (Spence-Mirrlees) For all possible allocations x,
∂u(x, θ)/∂x > ∂u(x, θ′)/∂x

if θ > θ′ .
That is, if θ > θ′ —so θ is a higher type than θ′ —then the slope of type θ’s
indifference curve is, at any point, less than the slope of type θ′ ’s indifference
curve. Observe that a consequence of Condition 9.1 is that a given indifference
curve for one type can cross a given indifference curve of another type at most
once. For this reason, the Spence-Mirrlees Assumption is sometimes called a
single-crossing condition. Figure 9.1 illustrates.
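To illustrate single crossing concretely, take the assumed form u(x, θ) = θ√x (not from the text); its cross-partial, 1/(2√x), is positive, so (9.1) holds, and the gap between two types’ indifference curves through a common point changes sign exactly once:

```python
import math

# Assumed utility satisfying (9.1): u(x, theta) = theta*sqrt(x), so the
# cross-partial d2u/(dtheta dx) = 1/(2*sqrt(x)) > 0 for x > 0.
def u(x, theta):
    return theta * math.sqrt(x)

theta_L, theta_H = 1.0, 2.0
x0, t0 = 1.0, 0.0    # a common point both indifference curves pass through

def indiff(x, theta):
    # Transfer t(x) holding u(x, theta) + t constant at its value at (x0, t0).
    return u(x0, theta) + t0 - u(x, theta)

gap = lambda x: indiff(x, theta_H) - indiff(x, theta_L)

# The high type's curve lies above for x < x0, meets the low type's at x0,
# and lies below for x > x0: one crossing, from above, as in Figure 9.1.
assert gap(0.25) > 0 and gap(1.0) == 0.0 and gap(4.0) < 0
```

Here the gap is (θ_H − θ_L)(√x₀ − √x), which is strictly decreasing in x, so the single crossing is immediate analytically as well.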
Lemma 9.1 If u(·, ·) is at least twice differentiable in both arguments, then the
Spence-Mirrlees assumption (Condition 9.1) is implied by
∂²u(x, θ)/∂θ∂x > 0 .   (9.1)

Moreover, Condition 9.1 implies

∂²u(x, θ)/∂θ∂x ≥ 0 .   (9.2)
Proof:
Exercise 9.0.1: Prove, via integration, that (9.1) implies Condition 9.1.
Exercise 9.0.2: Prove, by taking the limit of the appropriate expression as θ → θ′ ,
that Condition 9.1 implies (9.2).
Exercise 9.0.3: For the standard framework (including the differentiability assump-
tions), prove that Condition 9.1 implies Condition 9.1′ .
This generalized Spence-Mirrlees condition states that we can order the types
so that if a low type (under this order) prefers, at least weakly, an outcome with
more x (with “more” being defined by the order ≻x ) than a second outcome,
then a higher type must strictly prefer the first outcome to the second. Figure
9.1 illustrates: Since the low type prefers point c to a (weakly), the high type
must strictly prefer c to a, which the figure confirms (c lies above the high
type’s indifference curve through a). Similarly, since the low type prefers c to b
(strictly), the high type must also strictly prefer c to b, which the figure likewise
confirms. See Milgrom and Shannon (1994) for a more complete discussion of
the relationship between Condition 9.1 and Condition 9.1′ .
As suggested at the beginning of this section, a consequence of the Spence-
Mirrlees assumption is that it is possible to separate any two types; by which
we mean it is possible to find two outcomes (x1 , t1 ) and (x2 , t2 ) such that a
type-θ1 agent prefers (x1 , t1 ) to (x2 , t2 ), but a type-θ2 agent has the opposite
preferences. For instance, in Figure 9.1, let point a be (x1 , t1 ) and let d be
102 Lecture Note 9: The Standard Framework
(x2 , t2 ). If θ2 is the higher type, then it is clear that given the choice between
a and d, θ1 would select a and θ2 would select d; that is, this pair of contracts
separates the two types.
Henceforth, we assume that the Spence-Mirrlees condition holds.
Individual rationality (ir) requires that each type be willing to participate:

t(θ) + u(x(θ), θ) ≥ UR (θ)

for all θ ∈ [θL , θH ], where, recall, UR (θ) is a type-θ agent’s reservation utility.
We assume that the agent acts in the principal’s interest when the agent is
otherwise indifferent. In particular, he accepts a contract when he is indiffer-
ent between accepting and rejecting it and he tells the truth when indifferent
between being honest and lying. This is simply a necessary condition for an
equilibrium to exist and, as such, should not be deemed controversial.
In Lecture Note 7, the assumption was that both types of agent had the
same reservation utility (specifically, zero). It is possible, however, to imagine
models in which the reservation utility indeed varies with θ. For instance, sup-
pose that a more-able agent could, if not employed by the principal, pursue
a more remunerative alternative than a less-able agent. Then the reservation
utility of the former would exceed that of the latter. A number of authors
(Lewis and Sappington, 1989, and Maggi and Rodriguez-Clare, 1995, among
others) have studied the role of such type-dependent reservation utilities in con-
tractual screening models.4 Type dependence can, however, greatly complicate
the analysis. We will, therefore, adopt the more standard assumption of type-
independent reservation utilities; that is, we assume UR (θ) ≡ UR .
We now consider the necessary conditions imposed on any direct-revelation
mechanism by the ic constraints. For any types θ and θ′ , incentive compatibility requires

t(θ) + u(x(θ), θ) ≥ t(θ′ ) + u(x(θ′ ), θ) and
t(θ′ ) + u(x(θ′ ), θ′ ) ≥ t(θ) + u(x(θ), θ′ ) ;

that is, neither type can gain by announcing the other type.
4 Models with type-dependent reservation utilities are sometimes called models of counter-
vailing incentives.
As is often the case in contract design, it is easier to work with utilities than
transfers (payments). To enable us to do so, define
v(θ) ≡ t(θ) + u(x(θ), θ) .
Observe that v(θ) is the type-θ agent’s equilibrium utility. The above pair of
inequalities can then be written as
v(θ) ≥ v(θ′ ) − u(x(θ′ ), θ′ ) + u(x(θ′ ), θ) and
v(θ′ ) ≥ v(θ) − u(x(θ), θ) + u(x(θ), θ′ )

(note that v(θ′ ) − u(x(θ′ ), θ′ ) = t(θ′ )).
Using the fundamental theorem of calculus, this last expression can be rewritten
as

∫_{θ′}^{θ} ∂u(x(θ′ ), z)/∂z dz ≤ v(θ) − v(θ′ ) ≤ ∫_{θ′}^{θ} ∂u(x(θ), z)/∂z dz .    (9.5)
Ignoring, for the moment, the middle term of (9.5), we can use the funda-
mental theorem of calculus again to obtain
∫_{θ′}^{θ} ∫_{x(θ′)}^{x(θ)} ∂²u(x, z)/(∂x∂z) dx dz ≥ 0 .    (9.6)
A second implication of (9.5) is the following. Via a result from real analysis,
it can be shown that v(·) is almost everywhere differentiable.5 Dividing all parts
of (9.5) by θ − θ′ and taking the limit as θ′ → θ, we see this derivative must be
dv(θ)/dθ = ∂u(x(θ), θ)/∂θ
where tL ≡ v(θL ).
Proof: Necessity was established by Lemmas 9.2 and 9.3. We thus only need
to establish sufficiency. Suppose we have an allocation schedule x(·) that is non-
decreasing and a transfer schedule t(·) given by (9.8). We wish to show that
such a scheme induces a type-θ agent to truthfully announce his type rather
5 This follows from the Lebesgue Differentiation Theorem. See, e.g., Yeh (2006, p. 278).
than announce some other type θ′ . Suppose a type-θ agent announces θ′ . Recalling that t(θ) + u(x(θ), θ) = v(θ) and that u(x(θ′ ), θ) − u(x(θ′ ), θ′ ) = −∫_{θ}^{θ′} ∂u(x(θ′ ), z)/∂θ dz, his utility from that announcement is, using (9.8),

u(x(θ′ ), θ) + t(θ′ ) = t(θ) + u(x(θ), θ) − ∫_{θ}^{θ′} ∂u(x(θ′ ), z)/∂θ dz + ∫_{θ}^{θ′} ∂u(x(z), z)/∂θ dz

≤ v(θ) − ∫_{θ}^{θ′} ∂u(x(θ′ ), z)/∂θ dz + ∫_{θ}^{θ′} ∂u(x(θ′ ), z)/∂θ dz = v(θ) ,
where the inequality in the last line follows because the Spence-Mirrlees condition means ∂u(·, z)/∂θ is a non-decreasing function. Consequently, if θ′ > θ,
then

∫_{θ}^{θ′} ∂u(x(z), z)/∂θ dz ≤ ∫_{θ}^{θ′} ∂u(x(θ′ ), z)/∂θ dz

because the integrand on the rhs exceeds the integrand on the lhs and integration is in the positive direction; alternatively, if θ′ < θ, then the inequality
holds because the integrand on the rhs is less than the integrand on the lhs
and integration is in the negative direction.
Because we have established
u(x(θ′ ), θ) + t(θ′ ) ≤ v(θ) ,
we have shown that the mechanism will induce truth-telling insofar as an agent
of a given type would not wish to pretend to be a different type.
This characterization is now a well-known result and can be found, implicitly at least, in almost every mechanism-design paper. Given its importance, it is worth understanding how our assumptions drive this result. In
particular, be aware that the necessity of (9.8) does not depend on the Spence-
Mirrlees assumption. The Spence-Mirrlees assumption’s role is to establish (i)
that a monotonic allocation function is necessary and (ii) that, if x (·) is mono-
tonic, then (9.8) is sufficient to ensure a truth-telling equilibrium.
This discussion also demonstrates a point that was implicit in our earlier
discussion of the Spence-Mirrlees assumption: what is critical is not that ∂²u/∂θ∂x
be positive, but rather that it keep a constant sign over the relevant domain.
If, instead of being positive, this cross-partial derivative were negative everywhere, then our analysis would remain valid, except that it would give us the reverse conclusion: the allocation schedule x(·) would need to be non-increasing rather than non-decreasing.
The previous analysis has given us, within the standard framework at least, a
complete characterization of the space of direct-revelation (incentive-compatible)
contracts. We can now concentrate on the principal’s problem of designing an
optimal contract.
The principal’s goal is to maximize her expected utility. In light of the
revelation principle, there is no loss in seeking to solve her problem within
the space of direct-revelation mechanisms. Moreover, from Proposition 9.1, we
know that any such mechanism must have a non-decreasing allocation schedule
and a transfer schedule given by (9.8). As noted earlier, it is also without loss of
generality to require that the mechanism satisfy the ir constraint for each type
(i.e., v(θ) ≥ UR ).
The principal’s expected utility under the mechanism is
∫_{θL}^{θH} ( b(x(θ), θ) − t(θ) ) f (θ)dθ .    (9.9)
The principal’s objective is to maximize (9.9) with respect to x(·) and v(·)
subject to (i) the ir constraint, v(θ) ≥ UR ; and (ii) that x(·) be non-decreasing.
We can rather quickly substitute out the v(·):
Lemma 9.4 Under a direct-revelation mechanism, the agent’s equilibrium util-
ity is an increasing function of type (i.e., v(·) is increasing).
for all θ. From Spence-Mirrlees (sm) we know that θ′ > θ′′ implies

∂u(z, θ′ )/∂x > ∂u(z, θ′′ )/∂x .

Hence,

∫_{0}^{x} ( ∂u(z, θ′ )/∂x − ∂u(z, θ′′ )/∂x ) dz > 0 .

Integrating, we get

0 < u(x, θ′ ) − u(x, θ′′ ) − ( u(0, θ′ ) − u(0, θ′′ ) ) = u(x, θ′ ) − u(x, θ′′ ) ,

where the equality follows because u(0, θ′ ) = u(0, θ′′ ) = UR . But, as θ′ and θ′′ were arbitrary, this implies that u(x, ·) is an increasing function and, therefore, that (9.10) holds.
No Rent at the Bottom: The lowest type of agent earns no rent—his equilibrium utility equals his reservation utility.

Lemma 9.4 implies that if v(θL ) ≥ UR , then v(θ) ≥ UR for all θ ∈ [θL , θH ].
Hence, the ir constraint is met for all θ if v(θL ) ≥ UR . From expression (9.9)
we see that the greater is v(θL ), the lower is the principal’s utility. Hence, she
wants to set v(θL ) as low as possible, which means the constraint v(θL ) ≥ UR
is binding. We can, therefore, substitute out the v(·) from the problem: the
principal seeks to choose x(·) to maximize
∫_{θL}^{θH} ( b(x(θ), θ) + u(x(θ), θ) − ∫_{θL}^{θ} ∂u(x(z), z)/∂θ dz − UR ) f (θ)dθ    (9.11)
subject to x(·) being non-decreasing. Before solving this program, two addi-
tional transformations will prove helpful. First,
b(x(θ), θ) + u(x(θ), θ) = w(x(θ), θ) ,
where, recall, w(x, θ) is total welfare if x units are allocated when the agent’s type is θ.
Second, integrating by parts,6

∫_{θL}^{θH} ( ∫_{θL}^{θ} ∂u(x(z), z)/∂θ dz ) f (θ)dθ

= −(1 − F (θ)) ∫_{θL}^{θ} ∂u(x(z), z)/∂θ dz |_{θL}^{θH} + ∫_{θL}^{θH} (1 − F (θ)) ∂u(x(θ), θ)/∂θ dθ

= ∫_{θL}^{θH} ( ∂u(x(θ), θ)/∂θ × (1 − F (θ))/f (θ) ) f (θ)dθ ,

where the evaluated term is zero because F (θH ) = 1 and the inner integral vanishes at θ = θL .
Proof: The result follows immediately from (9.12) because that expression
differs from the expression for maximizing expected welfare,
∫_{θL}^{θH} w(x(θ), θ) f (θ)dθ ,
6 For differentiable functions g and h, (gh)′ = g′h + gh′; hence ∫ g(θ)h′(θ)dθ = g(θ)h(θ) − ∫ g′(θ)h(θ)dθ. Using this insight is known as integration by parts. In our use of integration by parts here, h = −(1 − F ) and g(θ) = ∫_{θL}^{θ} ∂u(x(z), z)/∂θ dz.
Efficiency at the Top: The optimal mechanism from the principal’s perspective tends to maximize welfare with respect to the allocation for the best type.

Proposition 9.4 Assume the optimal mechanism for the principal can be found
via pointwise optimization. The allocation for the highest type (θH ) under that
mechanism is the first-best (full-information) allocation for that type.

Proposition 9.4 is sometimes summarized as efficiency at the top.

An Example: Let the agent manage a division. The principal, the agent’s
superior, is concerned with the division’s profit, x. The effort necessary to
generate profit imposes disutility on the agent—assume his utility function is
u(x, θ) = −x²/(1 + θ), so that his payoff is t − x²/(1 + θ), and assume θ is
distributed uniformly on [0, 1].
Exercise 9.0.5: Verify that this example satisfies the assumptions of the standard
framework, including the Spence-Mirrlees condition.
Solving, we have

x∗ (θ) = (θ + 1)²/4 .
Clearly, this solution is non-decreasing in θ. Moreover, the second-order condi-
tion is readily verified. We can, therefore, conclude that pointwise optimization
is valid for this example.
Exercise 9.0.6: Verify that the first-best (full-information) allocation, xfi (·), is given
by xfi (θ) = (θ + 1)/2.
Exercise 9.0.7: Verify that xfi (θ) > x∗ (θ) for all θ < 1. Verify that xfi (1) = x∗ (1).
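The example's solutions can be confirmed by brute-force maximization. A sketch assuming welfare w(x, θ) = x − x²/(1 + θ) and θ uniform on [0, 1], the specification consistent with xfi(θ) = (θ + 1)/2 and x∗(θ) = (θ + 1)²/4 (the functional form is inferred from those solutions, not quoted from the surviving text):

```python
import numpy as np

def w(x, th):            # assumed total welfare: profit minus managerial disutility
    return x - x**2 / (1 + th)

def omega(x, th):        # virtual surplus with (1 - F)/f = 1 - theta for U[0, 1]
    return w(x, th) - (1 - th) * x**2 / (1 + th)**2

xs = np.linspace(0.0, 2.0, 200001)
for th in [0.0, 0.25, 0.5, 0.75, 1.0]:
    x_star = xs[np.argmax(omega(xs, th))]   # pointwise optimum of virtual surplus
    x_fi = xs[np.argmax(w(xs, th))]         # first-best allocation
    assert abs(x_star - (th + 1)**2 / 4) < 1e-4
    assert abs(x_fi - (th + 1) / 2) < 1e-4
    assert x_fi >= x_star                   # downward distortion except at the top
print("x*(theta) = (theta+1)^2/4 and xfi(theta) = (theta+1)/2 confirmed")
```

The grid search also makes efficiency at the top visible: the two argmaxes coincide only at θ = 1.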
Given the usefulness of being able to solve the principal’s optimization problem
pointwise, we would like to have some conditions that ensure the validity of
pointwise optimization. What we require are conditions such that the solution
to (9.13) is non-decreasing in θ. To this end, define the virtual surplus function:
Ω(x, θ) = w(x, θ) − ( (1 − F (θ))/f (θ) ) ∂u(x, θ)/∂θ ;
and

f (θ)/(1 − F (θ))    (9.16)

is non-decreasing in θ.8
The mhrp plays a big role in much of the economic analysis of contracts. Observe that if mhrp is satisfied, then the inverse hazard rate

(1 − F (θ))/f (θ)

is non-increasing in θ.
Condition 9.4 The marginal change in utility with respect to type—the func-
tion ∂u/∂θ : R+ ×[θL , θH ] → R—exhibits decreasing differences.
Exercise 9.0.8: Consider the example on page 109. Verify that welfare exhibits
increasing differences.
Exercise 9.0.9: Verify that mhrp holds for any random variable with a uniform
distribution.
Exercise 9.0.10: Consider the example on page 109. Verify that Condition 9.4 holds.
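Exercise 9.0.9 can be spot-checked numerically: for θ uniform on [a, b], the hazard rate is f(θ)/(1 − F(θ)) = 1/(b − θ), which is increasing in θ. A quick sketch:

```python
import numpy as np

a, b = 0.0, 1.0
theta = np.linspace(a, b - 1e-6, 1000)
f = np.full_like(theta, 1.0 / (b - a))       # uniform density
F = (theta - a) / (b - a)                    # uniform cdf
hazard = f / (1.0 - F)                       # equals 1/(b - theta) for U[a, b]
assert np.allclose(hazard, 1.0 / (b - theta))
assert np.all(np.diff(hazard) > 0)           # mhrp: hazard rate is increasing
print("uniform distribution satisfies the mhrp")
```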
Assume θ is distributed uniformly on [0, 1]. Observe the agent’s no-trade utility
is zero (i.e., UR = 0). We have
Ω(x, θ) = ( θ³(2θ − 3)²/(4(1 + θ)²) ) log(x) − x²/(θ + 1) + (1 − θ)x²/(θ + 1)² .
Pointwise optimization yields

x(θ) = (1/4)√(9θ² − 12θ³ + 4θ⁴) ,
which is not non-decreasing everywhere on [0, 1] (see Figure 9.2).
Let x∗ (·) denote the solution to (9.17). From Figure 9.2, it is clear that
x∗ (θ) = x(θ) for all θ ∈ [θL , θ̄] and x∗ (θ) = x(θ̄) for all θ ∈ [θ̄, θH ], for some critical type θ̄. The question
is what is θ̄? We want to choose θ̄ to maximize (9.17); that is, to solve

max_{θ̄} ∫_{θL}^{θ̄} Ω(x(θ), θ)f (θ)dθ + ∫_{θ̄}^{θH} Ω(x(θ̄), θ)f (θ)dθ .
[Figure 9.2: plots of x∗ (θ) and x(θ) against θ.]
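The non-monotonicity of the pointwise solution is easy to confirm numerically. A sketch exploiting the simplification 9θ² − 12θ³ + 4θ⁴ = θ²(3 − 2θ)², so that x(θ) = θ(3 − 2θ)/4 on [0, 1]:

```python
import numpy as np

theta = np.linspace(0.0, 1.0, 100001)
x = 0.25 * np.sqrt(9 * theta**2 - 12 * theta**3 + 4 * theta**4)

# The simplification x(theta) = theta*(3 - 2*theta)/4 holds on [0, 1].
assert np.allclose(x, theta * (3 - 2 * theta) / 4)

# x(.) rises until theta = 3/4 and falls thereafter, so it is not non-decreasing.
peak = theta[np.argmax(x)]
assert abs(peak - 0.75) < 1e-3
d = np.diff(x)
assert np.all(d[theta[:-1] < 0.74] > 0)   # increasing below the peak
assert np.all(d[theta[:-1] > 0.76] < 0)   # decreasing above the peak
print("pointwise solution peaks at theta = 3/4; ironing is required")
```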
Hence, we see that for θ and θ′ both greater than θ̄, we have t(θ) = t(θ′ ) = t(θ̄).
Of course, this must be the case if the mechanism is to be incentive compatible
given that x(θ) = x(θ′ ) = x(θ̄)—we couldn’t expect truthful revelation if different types got different payments for the same allocation.
Mechanism Design
with Multiple Agents 10
We now consider the problem of a principal facing an allocation problem involv-
ing some number of agents, N . Let X denote the set of possible allocations.
Examples of allocation problems are:
The principal can also arrange transfers among the N agents and, possibly,
to or from herself. If tn is the transfer to agent n, the amount transferred from
the principal is
Σ_{n=1}^{N} tn .
B(x, t, θ) = B(x, t′ , θ)
if
Σ_{n=1}^{N} tn = Σ_{n=1}^{N} t′n .
If the principal were able to operate under full information, then her problem
would be
max B(x, t, θ) (10.1)
x∈X ,t∈T
Mechanisms 10.1
The set of possible outcomes is X × T . It is conceivable, but rare in the lit-
erature, that the principal would want to choose a randomization over the set
of outcomes. Let ∆(X × T ) denote the set of all such randomizations over
outcomes. We can now define a mechanism:
for all θ ∈ Θ, where Fn : Θn → [0, 1]. If (10.3) doesn’t hold, then we have
dependent types. In this latter case, it is possible that agent n’s knowledge of
his own type provides information about the types of the other agents. When
types are independent, a given agent’s knowledge of his own type provides no
information about the types of the other agents. For the case of dependent
types, let F−n (θ −n |θn ) denote agent n’s beliefs (conditional distribution) over
the types of the other agents given his knowledge of his own type. When types
are independent, F−n (θ −n |θn ) = Π_{j≠n} Fj (θj ); that is, beliefs do not vary with θn .
But as this contradicts (10.4), which holds by assumption, our supposition that
such an n, θn , and θ′ exist must be false. It follows, reductio ad absurdum, that
σ̂ induces truth-telling.
Moreover, because σ̂(θ) = σ m(θ), p for all θ ∈ Θ, the same distribution
over outcomes is implemented in equilibrium.
The intuition is the same as in Lecture Note 8—see the discussion following
Proposition 8.1 on page 93.
In some allocation problems, the allocation assigned each agent can be made independently of the allocation
assigned another. For example, the production quota, xn , assigned one agent
can be wholly independent of the quota, xm , assigned another agent. In other
problems, there are cross-agent dependencies. For instance, allocating an indi-
visible good to one agent necessarily precludes allocating it to another. Or it
could be impossible to differentiate among agents with respect to the benefits
they accrue from a public good. We will refer to the first kind of problems
as independent-allocation problems. The second kind are dependent-allocation
problems. In this section, we consider the first kind.
Independent Types
Dependent Types
If types are dependent, then knowledge of one agent’s type can convey informa-
tion about another agent’s type. That, in turn, can allow the principal to reduce
the information rent the other agent receives, which is to her benefit. This sug-
gests that when the agents’ types are dependent (correlated), the principal can
do better than she would if she treated the problem as simply repeating the
single-agent problem.
To begin, let’s consider the case of perfect correlation: suppose there is some
state of the world, ω, drawn from Ω ⊆ R. Assume ω ∼ Ψ : Ω → [0, 1]. Assume
for each agent n there is a strictly monotonic mapping gn : Ω → Θn . Because
the mapping is strictly monotonic, it is invertible; that is, each agent n is able to
infer ω from his realization of θn . Consequently, there is no loss of generality in
treating the N agents as all having the same type because we are free to define
un (x, tn |ω) = Un (x, tn |gn (ω))
and
b(x, t, ω) = B(x, t, g1 (ω), . . . , gN (ω))
as agent n’s payoff and the principal’s payoff, respectively.
Given this common type space, the principal must know that at least one
agent has lied if ω̂m 6= ω̂n for some pair of agents m and n, where ω̂j denotes
the type agent j announces. Provided there is no limit on how severely the
principal can punish the agents, it follows that the principal can support an
equilibrium of truthtelling simply by imposing severe punishments on all agents
if their announcements disagree. Specifically, suppose that, for each agent n,
there exists (xn , tn ) ∈ Xn × Tn such that

sup_{ω∈Ω} un (xn , tn , ω) < inf_{ω∈Ω} uR_n (ω) ,    (10.5)

where uR_n (ω) is the reservation utility of agent n in state ω.
Suppose agent n believes that all other agents will always announce truthfully.
Then, because

un (xfi (ω), tfi_n (ω), ω) ≥ uR_n (ω) > un (xn , tn , ω) ,
it is a best response for agent n to announce truthfully for all ω, where the
first inequality follows from the ir constraint (expression (10.2)) and the sec-
ond inequality from (10.5). This establishes that an equilibrium induced by the
mechanism (10.6) is truthtelling by all agents. Because, if the agents tell the
truth, that mechanism induces the full-information outcome, the result follows.
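The logic of the shoot-them-all argument can be illustrated with a toy simulation. A sketch under assumed primitives (the payoff numbers are hypothetical, chosen only so that the punishment outcome satisfies (10.5)):

```python
# Hypothetical primitives: payoffs are illustrative, not from the text.
states = range(10)                     # Omega, a grid of states
RESERVATION = 0.0                      # u^R_n(omega), constant for simplicity
PUNISH = -10.0                         # payoff from the punishment outcome; satisfies (10.5)
FULL_INFO = 1.0                        # agent's payoff at the full-information outcome

def payoff(own_report, other_report):
    # Announcements disagree => both agents are punished severely.
    return FULL_INFO if own_report == other_report else PUNISH

# If the other agent reports truthfully, truth-telling is a best response:
for omega in states:
    truthful = payoff(omega, omega)
    best_deviation = max(payoff(lie, omega) for lie in states if lie != omega)
    assert truthful > best_deviation and truthful >= RESERVATION
print("truth-telling is an equilibrium of the shoot-them-all mechanism")
```

Note the fragility the text goes on to flag: two agents who can coordinate on a common lie defeat this mechanism, since only disagreement is punished.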
for all x and x′ such that x > x′ . Assume the principal’s payoff is
x 1 + x 2 − t1 − t2 .
and
Because Pr{(H, L)} = Pr{(L, H)}, observe that consistency requires f = 1/2.
1 In some contexts, we also require that mechanisms be balanced; that is, Σ_{n=1}^{N} tn ≡ 0
always. Given that the tn are punishments, we should expect Σ_{n=1}^{N} tn < 0; that is, the
shoot-them-all mechanism is unbalanced, at least for some out-of-equilibrium outcomes.
2 The reader interested in more general treatments of mechanism design with correlated
types should consult Crémer and McLean (1985, 1988) and McAfee and Reny (1992).
3 Recall Pr{A|B} ≡ Pr{A ∩ B}/ Pr{B}, so Pr{A ∩ B} = Pr{A|B} Pr{B}.
xfi_n (θn , θm ) = argmax_x ( x − c(x, θn ) )  =⇒  xfi_n (θn , θ) = xfi_n (θn , ¬θ)
for all n and θn . This suggests the following strategy for solving the principal’s
problem: can she design transfers to (i) implement full-information allocations
(i.e., xs) and (ii) such that, in expectation, no agent earns an information rent
regardless of his type (i.e., such that the ir constraints are binding for both agents
regardless of their realized types)? If the answer is yes, then clearly this is the
solution because there is no way for the principal to do better than this. If the
answer is no, then we have more work to do. Fortunately, as we will see, the
answer will be yes.
Given the agents are identical, we can drop the subscript ns in what follows.
The answer to the question of the previous paragraph will be “yes” if
T (L|L) − c(xfi (L), L) = 0 ,    (binding ir for low type)
T (H|H) − c(xfi (H), H) = 0 ,    (binding ir for high type)
T (L|L) − c(xfi (L), L) ≥ T (H|L) − c(xfi (H), L) ,    (ic for low type)
and
T (H|H) − c(xfi (H), H) ≥ T (L|H) − c(xfi (L), H) .    (ic for high type)
10.3 Dependent Allocations 123
Observe a solution to this system is T (θ̂|θ) = c(xfi (θ̂), θ). So the problem
devolves to whether the following has a solution:

[ ρ       0       1 − ρ   0     ] [ t(L, L) ]   [ c(xfi (L), L) ]
[ 0       1 − ρ   0       ρ     ] [ t(H, L) ] = [ c(xfi (H), H) ]
[ 0       ρ       0       1 − ρ ] [ t(L, H) ]   [ c(xfi (H), L) ]
[ 1 − ρ   0       ρ       0     ] [ t(H, H) ]   [ c(xfi (L), H) ]
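Whether this system has a solution turns on the invertibility of the coefficient matrix; its determinant is ±(2ρ − 1)², so it is invertible for any ρ ≠ 1/2. A numerical sketch with hypothetical cost values:

```python
import numpy as np

def transfer_matrix(rho):
    # Rows: T(L|L), T(H|H), T(H|L), T(L|H); columns: t(L,L), t(H,L), t(L,H), t(H,H).
    return np.array([
        [rho,     0.0,     1 - rho, 0.0    ],
        [0.0,     1 - rho, 0.0,     rho    ],
        [0.0,     rho,     0.0,     1 - rho],
        [1 - rho, 0.0,     rho,     0.0    ],
    ])

rho = 0.7                                   # any correlation rho != 1/2 works
c = np.array([1.0, 3.0, 2.5, 1.2])          # hypothetical costs c(x_fi(.), .)
M = transfer_matrix(rho)

assert abs(abs(np.linalg.det(M)) - (2 * rho - 1)**2) < 1e-12
assert abs(np.linalg.det(transfer_matrix(0.5))) < 1e-12   # singular at rho = 1/2

t = np.linalg.solve(M, c)                   # realized transfers t(., .)
assert np.allclose(M @ t, c)                # expected transfers hit the targets exactly
print("transfers exist whenever rho != 1/2")
```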
Proposition 10.3 For the two-agent model considered here, provided there is
any correlation in the agents’ types (i.e., provided ρ > 1/2), there exists a
mechanism that achieves the full-information allocation at (in expectation) the
full-information cost to the principal.
Proposition 10.3 suggests that the principal can do a lot better than the
one-agent model (e.g., the model of Chapter 8) would suggest if she is dealing
with agents with correlated types. At the same time, some caution is warranted:
the lack of robustness of Propositions 10.2 and 10.3 to collusion by the agents
makes these results suspect.
Much of the theory in this section stems from analyzing the problem of a
benevolent social planner who seeks to choose the optimal amount of a public
good. For instance, the social planner could seek to determine how much land
should be set aside for a public park. In an organizational context, the prob-
lem might be a principal who seeks to determine a corporate-level action (e.g.,
investment in information technology, common advertising, etc.) that benefits
many agents within the organization.
To be concrete, suppose the principal wishes to choose a level of a public
good, x ∈ X ⊂ R+ . There are N agents, where each agent n has utility
Un (x, tn , θn ). Following most of the literature, we consider the case in which
each agent’s utility is additively separable between transfer and allocation:

Un (x, tn , θn ) = vn (x, θn ) + λn tn ,

where λn > 0 for all n. Little insight is gained by maintaining the added generality of the λs varying across agents. For convenience, then, we will normalize
the λs to each be 1. The principal’s objective is to choose x to solve
max_{x∈X} Σ_{n=1}^{N} vn (x, θn ) .    (10.11)
Assume that a unique solution to (10.11) exists for all θ ∈ Θ and denote that
solution as x∗ (θ). Because the problem would otherwise be of no interest,
assume there exist θ and θ ′ ∈ Θ such that x∗ (θ) 6= x∗ (θ ′ ).
Recall the principal does not know the realization of θ. Hence, she must induce
the agents to announce their types truthfully (by the Revelation Principle, we
are free to limit attention to direct-revelation mechanisms). Consider a mecha-
nism of the following form:
x(θ) = x∗ (θ) (10.12)
and

tn (θ) = τn + Σ_{j≠n} vj (x(θ), θj ) ,    (10.13)
where τn is a constant that could depend on the identity of the agent, but not
his (or others’) type(s). Observe that by selecting τn large enough, the principal
can ensure agent n’s participation; hence, we are free to ignore participation (ir)
constraints in what follows.
Lemma 10.1 Given the mechanism defined by (10.12) and (10.13), it is a dom-
inant strategy for any given agent to announce his type truthfully.
Proof: Substituting for x using (10.12), each agent n faces the program:

max_{θ̂n ∈Θn} τn + Σ_{j=1}^{N} vj (x∗ (θ −n , θ̂n ), θj ) .    (10.14)

By definition, x∗ (θ −n , θn ) maximizes Σ_{j} vj (x, θj ); hence, announcing θ̂n = θn solves (10.14);
that is, agent n can do no better than to announce truthfully regardless of what
his fellow agents announce. Truthtelling is, thus, a dominant strategy as was to
be shown.
Intuitively, under the transfer function given by (10.13), each agent is induced
to face precisely the same optimization program as the principal (i.e., expression
(10.14) is the same program as (10.11)). Consequently, he will do as the principal
would do.
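The dominant-strategy property can be confirmed by brute force. A sketch assuming hypothetical quadratic preferences vn(x, θn) = θn x − x²/2, for which x∗(θ) is the mean of the announced types:

```python
import itertools
import numpy as np

def v(x, theta):
    # hypothetical quasi-linear preference: v_n(x, theta_n) = theta_n*x - x^2/2
    return theta * x - x**2 / 2

def x_star(reports):
    # maximizes sum_n v_n(x, theta_n) for these preferences
    return np.mean(reports)

def gcv_payoff(true_type, report, others_reports, tau=0.0):
    x = x_star([report] + list(others_reports))
    # own utility plus the transfer: tau + sum_{j != n} v_j at the announcements
    return v(x, true_type) + tau + sum(v(x, r) for r in others_reports)

grid = np.linspace(0.0, 1.0, 21)            # possible announcements (step 0.05)
true_type = 0.4                              # a grid point, so truth is feasible
for others in itertools.product([0.0, 0.35, 1.0], repeat=2):
    best = max(grid, key=lambda r: gcv_payoff(true_type, r, others))
    assert abs(best - true_type) < 1e-9      # truth maximizes, whatever others report
print("truth-telling is a dominant strategy under gcv")
```

Because the agent's payoff coincides (up to the constant τn) with the planner's objective, the check succeeds for every profile of others' reports, truthful or not.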
A mechanism in which the agents always have dominant strategies is known
as a dominant-strategy mechanism. An immediate consequence of Lemma 10.1
is, therefore:
Proposition 10.4 Assume the principal’s objective function is given by (10.11)
and it has a unique solution for each realization of agents’ types. Then there
exists a dominant-strategy direct-revelation mechanism that implements that so-
lution.
The mechanism defined by (10.12) and (10.13) is known as a Groves-Clarke-
Vickrey mechanism, which we will abbreviate gcv.
One question we might ask about the gcv mechanism is whether it is unique
within the space of dominant-strategy mechanisms that implement x∗ (·). The
answer is, for all practical intents, yes, a fact that we can readily establish by
imposing slightly more structure on the problem. To that end, assume:
Condition 10.1 The space of possible allocations, X , is R+ . The type space
for any agent is an interval in R.5 For any agent n, n = 1, . . . , N , and any
5 This assumption is, here, without loss of much generality insofar as we do not require full
support on these intervals; that is, some types in the interval could have zero probability of
occurring.
The last property ensures that there is an interior solution to maxx∈R+ vn (x, θn )
for all n and θn ∈ Θn . It follows that the principal’s program, expression (10.11),
has an interior solution for all θ ∈ Θ. Because each vn (·, θn ) is strictly concave,
so too is the principal’s program (10.11). It further follows from Condition 10.1
that the program (10.11) has a unique solution for all θ ∈ Θ, x∗ (θ). By the
implicit function theorem (see, e.g., Körner, 2004, §13.2), x∗ (·) is differentiable
in each of its arguments.
The differentiability and concavity assumptions entail that x∗ (θ) is charac-
terized by
Σ_{n=1}^{N} ∂vn (x∗ (θ), θn )/∂x = 0 .
For future reference, observe that this, in turn, entails

∂vn (x∗ (θ), θn )/∂x = − Σ_{j≠n} ∂vj (x∗ (θ), θj )/∂x .    (10.15)
Suppose there exist differentiable transfer functions t1 (·), . . . , tN (·) such that
truthtelling is a dominant strategy for each agent n. This would require that
θ ∈ argmax_{θ̂∈Θn} ( tn (θ −n , θ̂) + vn (x∗ (θ −n , θ̂), θ) )    (10.16)
for all n, all θ ∈ Θn , and all θ −n ∈ Θ−n . A necessary condition, given our
differentiability assumptions, is that
∂tn (θ −n , θ)/∂θn + ( ∂vn (x∗ (θ −n , θ), θ)/∂x ) ( ∂x∗ (θ −n , θ)/∂θn ) = 0    (10.17)
for all n, all θ ∈ Θn , and all θ −n ∈ Θ−n . Rearranging and recognizing this
must hold for all θ ∈ Θ, we have the differential equation

∂tn (θ −n , θ)/∂θn = − ( ∂vn (x∗ (θ −n , θ), θ*)/∂x ) ( ∂x∗ (θ −n , θ)/∂θn ) ,    (10.18)

where the star marks the direct appearance of agent n’s type as vn ’s second argument.
Were it not for the starred term in expression (10.18), this would be a trivial
differential equation to solve—one would simply undo the chain rule. But the
starred term complicates matters. Fortunately we can substitute out for that
partial derivative using expression (10.15). This yields the differential equation:
∂tn (θ)/∂θn = Σ_{j≠n} ( ∂vj (x∗ (θ), θj )/∂x ) ( ∂x∗ (θ)/∂θn ) .    (10.19)
Balanced Mechanisms
Bayesian Mechanisms
for all θ ∈ Θ.
6 The adjective “Bayesian” reflects that each agent knows only his type but is uncertain about the types of his fellow agents.
Proof: We have

Σ_{n=1}^{N} tn (θ) = Σ_{n=1}^{N} ( pn (θn ) − (1/(N − 1)) Σ_{j≠n} pj (θj ) )

= Σ_{n=1}^{N} pn (θn ) − (1/(N − 1)) Σ_{n=1}^{N} (N − 1)pn (θn ) = 0 .
Observe that Lemma 10.2 does not depend on any properties of the pn (·)s.
Consider

pn (θn ) = E−n { Σ_{j≠n} vj (x∗ (θ), θj ) } + hn ;    (10.23)
that is, up to an additive constant hn , pn (θn ) is the expected sum of all other
agents’ utilities under the optimal allocation x∗ (·), given that agent n is type
θn and all other agents are announcing their types truthfully. Observe the
similarity of the pn (·) functions here and the transfers in a gcv mechanism;
they are different, however, in that pn (θn ) is an expected value, whereas the
transfers in a gcv mechanism are actual (realized) values.
Proposition 10.7 Let x∗ : Θ → X denote the optimal allocation (i.e., the
solution to (10.11)). If transfers are given by (10.22), where the pn (·) functions
are given by (10.23), then a mechanism with those transfers and an allocation
rule x(θ) = x∗ (θ) has a Bayesian-Nash equilibrium in which the agents all
announce their types truthfully. In this equilibrium, the allocation rule x∗ (·) is
implemented and transfers are balanced.
Proof: The mechanism is always balanced by Lemma 10.2. Clearly, if truth-
telling is an equilibrium, x∗ (·) is implemented. Hence, the only point to verify is
that truth-telling is an equilibrium. Consider agent n. If he anticipates his fellow
agents will tell the truth, his expected utility, as a function of his announcement,
θ̂, is
E−n { vn (x∗ (θ̂, θ −n ), θn ) + tn (θ̂, θ −n ) }

= E−n { vn (x∗ (θ̂, θ −n ), θn ) } + E−n { Σ_{j≠n} vj (x∗ (θ̂, θ −n ), θj ) − k(θ −n ) + hn }

= E−n { Σ_{j=1}^{N} vj (x∗ (θ̂, θ −n ), θj ) } + K ,    (10.24)

where k(θ −n ) ≡ Σ_{j≠n} pj (θj )/(N − 1),
where the law of iterated expectations was used to derive the second equality
and where K is a constant equal to the expectation of the term k(θ −n ). Agent n
chooses his announcement, θ̂, to maximize (10.24). The agent can do no better
than if he could choose x to maximize
Σ_{j=1}^{N} vj (x, θj )
for each realization of θ. But he can do precisely this by announcing his type
truthfully given the other agents are being honest. Hence, truth-telling is his
best response to truth-telling, as was to be shown.
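The whole construction can be checked end-to-end in a small discrete example. A sketch assuming two agents with independent, equally likely types and the hypothetical preference v(x, θ) = θx − x²/2 (so x∗(θ) = (θ1 + θ2)/2), with hn = 0 and transfers as in the proof of Lemma 10.2:

```python
import itertools

types = [0.2, 0.8]                       # hypothetical type space, each prob 1/2
prob = {t: 0.5 for t in types}

def v(x, th):                            # hypothetical preference v(x, theta)
    return th * x - x**2 / 2

def x_star(th1, th2):                    # maximizes v(x, th1) + v(x, th2)
    return (th1 + th2) / 2

def p(own):                              # expected externality, h_n = 0, as in (10.23)
    return sum(prob[o] * v(x_star(own, o), o) for o in types)

def transfers(th1, th2):                 # the (10.22) form with N = 2
    return p(th1) - p(th2), p(th2) - p(th1)

# Budget balance holds state by state (Lemma 10.2):
for th1, th2 in itertools.product(types, repeat=2):
    assert abs(sum(transfers(th1, th2))) < 1e-12

# Bayesian incentive compatibility: truthful report maximizes agent 1's
# expected payoff when agent 2 reports truthfully (the model is symmetric).
def expected_payoff(true_th, report):
    return sum(prob[o] * (v(x_star(report, o), true_th) + transfers(report, o)[0])
               for o in types)

for true_th in types:
    best = max(types, key=lambda r: expected_payoff(true_th, r))
    assert best == true_th
print("balanced and Bayesian incentive compatible")
```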
Not only are the pn (·) functions sufficient to achieve an efficient solution
(implementation of the first-best allocation), they are also effectively necessary
given Condition 10.1:
for all θ ∈ Θn .9 Using the same insight that gave us expression (10.19) above,
we can rewrite that first-order condition as
p′n (θ) = E−n { Σ_{j≠n} ( ∂vj (x∗ (θ, θ −n ), θj )/∂x ) ( ∂x∗ (θ, θ −n )/∂θn ) } .
8 By imposing conditions on the vn functions, it would be possible to show that the pn (·)
functions are absolutely continuous. This, in turn, would allow us to dispense with assuming
the pn (·) functions are differentiable. See the proof of Lemma 9.2 for an indication of how
that line of reasoning would be pursued.
9 Recall we can pass the differentiation operator through the expectation operator.
Bibliographic Note
134 Lecture Note 11: Auctions
Given the differentiability of the distribution functions, two bidders have the
same valuation with probability zero; hence, ties can be ignored.
In this section, we will see whether a welfare-maximizing allocation can be
part of a direct-revelation mechanism. A mechanism in this context is ⟨x, p⟩ :
Θ → Λ^N × R^N ; that is, an allocation rule and a payment schedule.
We impose two restrictions. First, every bidder must wish to participate;
that is, regardless of his type, his equilibrium utility must be non-negative. This
is our usual individual rationality constraint and, as such, warrants no special
attention. Second, we require p(θ) ∈ RN + ; that is, the seller never pays a bidder.
This constraint can be thought of as a “realism” constraint: If it were possible
to get paid just to participate in an auction, the auction would be flooded with
bidders and the seller would go bankrupt.
We begin our hunt for a welfare-maximizing mechanism in a somewhat odd
way. Namely, we will limit attention to dominant-strategy mechanisms. Dom-
inant-strategy mechanisms are mechanisms in which an agent finds a truthful
announcement his best response no matter what he believes the other agents’
types are.2 This is an odd start because normally we cannot gain by constraining
the choice set (i.e., the set of mechanisms). On the other hand, if this constraint
fails to bind, in the sense that we find a welfare-maximizing mechanism within
the set of dominant-strategy mechanisms, then imposing the constraint is not
costly. The advantage to imposing this constraint is that it allows us to operate
without taking expected values.
nisms. However, by the Revelation Principle, we know there is no loss in limiting attention
to direct-revelation mechanisms—see Exercise 11.1.1 infra.
11.1 Efficient Allocation 135
Exercise 11.1.1: Say that a mechanism has a solution in dominant strategies if, for all n and all θn ∈ Θn , there exists a
strategy mn (θn ) ∈ Mn such that
Un (σ(mn (θn ), m−n )|θn ) ≥ Un (σ(m, m−n )|θn )
for all m ∈ Mn and all m−n ∈ M1 × · · · × Mn−1 × Mn+1 × · · · × MN . Prove
that if a mechanism has a solution in dominant strategies, then there exists a direct
revelation mechanism also solvable in dominant strategies that generates the same
distribution over outcomes in equilibrium as the original mechanism.
and

θn − pw (θ −n ) ≥ 0    (11.2)

for all θn ≥ max θ −n ; similarly,

θn − pw (θ −n ) ≤ 0    (11.3)

for all θn ≤ max θ −n (where the rhs of (11.3) is zero by Lemma 11.2). Conditions (11.2) and (11.3) can be satisfied only if pw (θ −n ) = max θ −n ; that is, only
if the payment made if awarded the good equals the second highest valuation
for the good. To summarize, we have proved:
Lemma 11.3 If a dominant-strategy mechanism exists that implements the
welfare-maximizing allocation rule, then that mechanism requires a payment
only from the bidder awarded the good (i.e., the bidder with the highest valu-
ation) and the payment made by that bidder equals the next highest valuation.
That is, if the allocation rule xn (θ) = 1 if and only if θn = θ [1] is implementable
via a dominant-strategy mechanism, then pn (θ) = θ [2] if θn = θ [1] and equals 0
otherwise.
(The notation θ [m] denotes the mth largest element of θ.)
The last question is to verify that the mechanism defined by

xn (θ) = { 0, if θn < θ [1] ; 1, if θn = θ [1] }   and   pn (θ) = { 0, if θn < θ [1] ; θ [2] , if θn = θ [1] }    (11.4)
is incentive compatible and individually rational. Individual rationality follows
immediately because pℓ ≡ 0 and θn ≥ max θ −n if θn = θ [1] . Incentive com-
patibility follows because if bidder n lied and claimed to be θn′ < θn , then his
relative payoffs are:
Condition Payoff Lies Payoff Tells Truth
max θ −n > θn : 0 = 0
θn ≥ max θ −n > θn′ : 0 < θn − max θ −n
θn′ ≥ max θ −n : θn − max θ −n = θn − max θ −n
In other words, he is never better off lying and is possibly strictly worse off
lying.
Exercise 11.1.2: Verify that bidder n would never wish to lie by claiming to be
θn′′ > θn .
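The payoff comparisons in the table can be verified by simulation. A sketch (ties are broken in favor of the bidder in question, which is immaterial because ties occur with probability zero under continuous types):

```python
import numpy as np

rng = np.random.default_rng(0)

def second_price_payoff(bid, value, other_bids):
    # Bidder wins iff his bid is (weakly) highest; the winner pays the
    # second-highest bid, i.e., the highest competing bid.
    best_other = max(other_bids)
    return value - best_other if bid >= best_other else 0.0

# Truthful bidding weakly dominates any misreport, realization by realization.
for _ in range(2000):
    value = rng.uniform(0, 1)
    others = rng.uniform(0, 1, size=3)
    truthful = second_price_payoff(value, value, others)
    for lie in rng.uniform(0, 1, size=5):
        assert truthful >= second_price_payoff(lie, value, others) - 1e-12
print("truthful bidding is weakly dominant in the second-price auction")
```

The comparison holds bid profile by bid profile, not merely in expectation, which is exactly the dominant-strategy property being established.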
(where the inequality in the expression on the lhs is in the vector sense). We
can therefore define the joint probability density function over Θ as

    f(θ) ≡ f1(θ1) × ··· × fN(θN) ;

similarly, f−n(θ−n) denotes the joint density over Θ−n of the types other than bidder n's.
Given that bidder n announces his type as θ̂n , then his probability of ob-
taining the good—given truthful revelation by the other N − 1 bidders—is
    ξn(θ̂n) ≡ ∫_{Θ−n} xn(θ̂n, θ−n) f−n(θ−n) dθ−n .
Likewise, bidder n’s expected payment if he announces his type as θ̂n —given
truthful revelation by the other N − 1 bidders—is
    ρn(θ̂n) ≡ ∫_{Θ−n} pn(θ̂n, θ−n) f−n(θ−n) dθ−n .
In equilibrium, bidder n will announce his type truthfully, which means his
probability of getting the good is ξn (θn ) and his expected payment is ρn (θn ).
Consequently, his equilibrium expected utility is

    vn(θn) ≡ θn ξn(θn) − ρn(θn) ,

where the rhs follows from (11.5). It follows that the truth-telling or incentive-compatibility constraint for a type-θn bidder can be written as

    vn(θn) ≥ θn ξn(θ̂n) − ρn(θ̂n) for all θ̂n ∈ Θn .     (11.6)
θλ ≡ λθ + (1 − λ)θ′ .
and
As the rhs of this last expression is the same as the rhs of (11.7), we have
established (11.7) as desired.
We can, in fact, use the convexity of vn (·) to show that ξn (·) is non-decreasing
everywhere:
Proof: If θn and θn′ are points at which vn (·) is differentiable, the result follows
from the corollary. Therefore, let θn be a point at which vn (·) is not differen-
tiable. Although not differentiable at θn , the right and left derivatives of vn (θn )
exist because vn (·) is convex (the left and right derivatives of a convex function
exist everywhere—see, e.g., van Tiel, 1984, p. 4).3 Where vn (·) is differentiable,
its right and left derivatives are equal and equal the derivative. The left and
right derivatives of a convex function are non-decreasing (van Tiel, 1984, p. 4).
From (11.8), ξn (θn ) lies between the left and right derivatives of vn (θn ). Be-
cause vn (·) is differentiable almost everywhere (i.e., has equal left and right
derivatives almost everywhere), it follows that ξn (θn ) ≤ ξn (θn′′ ) for all θn′′ > θn
and ξn (θn ) ≥ ξn (θn′ ) for all θn′ < θn .
3 The left derivative of a function g, denoted g′−, is defined as

    g′−(z) = lim_{ε↓0} [ g(z) − g(z − ε) ] / ε .

The right derivative, denoted g′+, is defined as

    g′+(z) = lim_{ε↓0} [ g(z + ε) − g(z) ] / ε .
11.2 Allocation via Bayesian-Nash Mechanisms 141
Proof: Suppose all bidders other than n will truthfully announce their types.
We need to show that truth telling is a best response for n. Consider any two
types θn and θn′ ∈ Θn . Suppose bidder n’s type is θn . We need to show he
doesn’t wish to announce θn′ . Suppose, first, θn′ < θn . Given (11.9), we have
    vn(θn) − vn(θn′) = ∫_{θn′}^{θn} ξn(z) dz .
Rearranging (11.10) yields (11.6) (with θn′ instead of θ̂n). But (11.6) implies bidder n prefers announcing the truth, θn, to lying by claiming to be a lower type (e.g., θn′).
Exercise 11.2.1: Complete the proof by considering the case in which θn′ > θn ; that
is, show the bidder doesn’t want to lie by claiming to be a higher type than he is.
We are now in position to prove one of the more important results in auction
theory, the Revenue Equivalence Theorem:
Rearranging, we have
    ρn(θn) = ρn(0) + θn ξn(θn) − ∫_0^{θn} ξn(z) dz .     (11.11)
Because (11.11) holds for any mechanism that has an allocation rule leading
to ξn(·), it follows that the difference in expected payments between any two mechanisms is ρn¹(0) − ρn²(0), a constant (where the superscripts index the mechanisms).
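Revenue equivalence can be illustrated numerically. The sketch below compares a first-price and a second-price auction with values drawn iid from the uniform distribution on [0, 1]; it uses the standard symmetric first-price equilibrium bid, ((N − 1)/N)θ, which is stated here without derivation:

```python
# Monte Carlo illustration of revenue equivalence with N bidders whose
# values are iid uniform on [0, 1].  First-price equilibrium bid:
# ((N - 1)/N) * theta (a standard result, used without derivation here).
# Second-price: bid truthfully; the winner pays the second-highest value.
# Both formats should earn the seller about (N - 1)/(N + 1).
import random

random.seed(0)
N, draws = 4, 200_000

fp_total = sp_total = 0.0
for _ in range(draws):
    theta = sorted(random.random() for _ in range(N))
    fp_total += (N - 1) / N * theta[-1]  # winner pays his shaded bid
    sp_total += theta[-2]                # winner pays second-highest value

fp_rev, sp_rev = fp_total / draws, sp_total / draws
print(fp_rev, sp_rev)  # both close to (N - 1)/(N + 1) = 0.6
```

Here ρn(0) = 0 in both formats, so the theorem predicts identical expected revenue, which the simulation confirms.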
ρn (0) ≤ 0 . (ir)
In other words, if (ir) holds for all n, then all bidders will participate regardless
of their type.
The seller is limited to implementing allocation rules that yield non-decreasing
ξn (·) functions and that satisfy (11.9). As seen in the proof of the Revenue
Equivalence Theorem, the latter condition is equivalent to requiring
    ρn(θn) = ρn(0) + θn ξn(θn) − ∫_0^{θn} ξn(z) dz .     (11.12)
Given that the seller seeks to maximize her expected profit, it follows from (11.12) that she would like ρn(0) to be as large as possible. Given (ir), it follows, then, that she sets ρn(0) = 0. To summarize:
Lemma 11.7 In the profit-maximizing mechanism, the expected payment from
a zero-type bidder is zero; that is, ρn (0) = 0 for all n.
Recall that the seller’s value for the good is zero, so her expected profit is
    Σ_{n=1}^{N} ∫_0^{θ̄n} ρn(θ) fn(θ) dθ .
Using (11.12) and Lemma 11.7, we can rewrite her expected profit as
    Σ_{n=1}^{N} ∫_0^{θ̄n} ( θ ξn(θ) − ∫_0^{θ} ξn(z) dz ) fn(θ) dθ .
Substituting the virtual valuation function into the seller’s expected profit,
expression (11.13), we can carry out the following series of manipulations:
    Σ_{n=1}^{N} ∫_0^{θ̄n} Ωn(θn) ξn(θn) fn(θn) dθn

      = Σ_{n=1}^{N} ∫_0^{θ̄n} Ωn(θn) ( ∫_{Θ−n} xn(θn, θ−n) f−n(θ−n) dθ−n ) fn(θn) dθn

      = Σ_{n=1}^{N} ∫_Θ Ωn(θn) xn(θ) f(θ) dθ

      = ∫_Θ ( Σ_{n=1}^{N} Ωn(θn) xn(θ) ) f(θ) dθ .     (11.15)
The seller seeks to maximize (11.15) with respect to x(·) subject to the
constraint that resulting ξn (·) functions be non-decreasing (i.e., that the mech-
anism be incentive compatible). Ignoring, for the moment, that constraint,
observe that the way to maximize the expression within the large parentheses
in (11.15) is put all weight on the bidder with the largest virtual valuation
provided this largest virtual valuation is non-negative. If no bidder has non-
negative virtual valuation, then the seller keeps the good (i.e., xn (θ) = 0 for all
n). Mathematically, the expected-profit-maximizing allocation rule is
    xn(θ) = { 0 , if Ωn(θn) < max{0, maxj Ωj(θj)}
            { 1 , if Ωn(θn) = max{0, maxj Ωj(θj)}     (11.16)
Given that this rule maximizes profit pointwise, it must maximize expected profit. The only question is whether this allocation rule is incentive compatible.
To check if the allocation rule given by (11.16) is incentive compatible, con-
sider θn and θn′ , θn > θn′ . Because Ωn (·) is increasing, we have:
xn (θn′ , θ −n ) = 1 =⇒ xn (θn , θ −n ) = 1 and
xn (θn , θ −n ) = 0 =⇒ xn (θn′ , θ −n ) = 0 ;
hence, xn (θn , θ −n ) ≥ xn (θn′ , θ −n ) and, thus,
    ξn(θn) = ∫_{Θ−n} xn(θn, θ−n) f−n(θ−n) dθ−n
           ≥ ∫_{Θ−n} xn(θn′, θ−n) f−n(θ−n) dθ−n = ξn(θn′) .
So we see that this allocation rule yields non-decreasing ξn (·) functions and is,
therefore, incentive compatible. To conclude, we’ve shown:
Lemma 11.8 The expected-profit-maximizing allocation rule awards the good
to the bidder with the greatest non-negative virtual valuation and has the seller
keep it if no bidder has a non-negative virtual valuation; that is, it is the rule
given by expression (11.16).
Note that the good is awarded on the basis of virtual valuation and not true
valuation.
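The distinction can be made concrete with a small sketch. The exponential specification below is an assumption chosen for illustration: for F(t) = 1 − e^{−t} and f(t) = e^{−t}, the virtual valuation is Ω(t) = t − (1 − F(t))/f(t) = t − 1.

```python
# Sketch of allocation rule (11.16) with unit-exponential types:
# F(t) = 1 - exp(-t) and f(t) = exp(-t), so the virtual valuation is
# Omega(t) = t - (1 - F(t))/f(t) = t - 1.
def omega(theta):
    return theta - 1.0  # virtual valuation in the unit-exponential case

def allocate(thetas):
    # Index of the winner under (11.16); None means the seller keeps
    # the good because every virtual valuation is negative.
    omegas = [omega(t) for t in thetas]
    return None if max(omegas) < 0 else omegas.index(max(omegas))

winner = allocate([2.5, 1.7, 0.4])  # bidder 0 has the top virtual valuation
no_sale = allocate([0.9, 0.2])      # all virtual valuations are negative
print(winner, no_sale)
```

With symmetric bidders the virtual-valuation ranking coincides with the true-valuation ranking, but with asymmetric distributions (as in the exercise below) the two can differ.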
Exercise 11.2.2: Suppose the valuation of each bidder is an independent draw from
the uniform distribution on [0, 100]. Prove that the expected-profit-maximizing allo-
cation rule awards the good to the bidder with the highest valuation, but only if that
valuation is not less than 50.
Exercise 11.2.3: Suppose there are two bidders. Bidder one has his valuation drawn
from the uniform distribution on [0, 20]; bidder two from the uniform distribution on
[0, 30]. Verify that the expected-profit-maximizing allocation rule is
    x1(θ) = { 0 , if θ1 < 10 or θ1 < θ2 − 5
            { 1 , if θ1 ≥ 10 and θ1 ≥ θ2 − 5
and
    x2(θ) = { 0 , if θ2 < 15 or θ2 ≤ θ1 + 5
            { 1 , if θ2 ≥ 15 and θ2 > θ1 + 5 .
If θ1 = 18 and θ2 = 20, is the allocation welfare maximizing?
is incentive compatible and achieves the maximum possible revenue given the
allocation rule x(·).
Define

    Θwn(θ−n) = { θ ∈ Θn | Ωn(θ) ≥ 0 and Ωn(θ) ≥ max_{j≠n} Ωj(θj) } ;

that is, Θwn(θ−n) is the set of types of bidder n who will be awarded (will win) the good under the allocation rule given by (11.16) if the types of the other bidders are θ−n. Because all inequalities are weak inequalities, Θwn(θ−n) has a minimum element:

    θwn(θ−n) ≡ { min Θwn(θ−n) , if Θwn(θ−n) ≠ ∅
               { ∞ , if Θwn(θ−n) = ∅ .
if θn ≥ θwn(θ−n). This establishes:
Proposition 11.5 Assume for all bidders that their distributions over type sat-
isfy mhrp. Then the expected-profit-maximizing mechanism awards the object
to the bidder with the greatest virtual valuation, assuming that virtual valuation
is non-negative, and otherwise does not award the object to any bidder. If a
bidder is awarded the object, then he pays the seller the larger of the reserve
price specific to him or the smallest true valuation he could have and still have
the greatest virtual valuation. That is, if bidder n is awarded the object, he pays the larger of Ωn⁻¹(0) and Ωn⁻¹( max_{j≠n} Ωj(θj) ).
Exercise 11.2.4: Suppose the valuation of each bidder is an independent draw from
the uniform distribution on [0, 100]. Calculate pn (·).
Exercise 11.2.5: Suppose there are two bidders. Bidder one has his valuation drawn
from the uniform distribution on [0, 20]; bidder two from the uniform distribution on
[0, 30]. Calculate p1 (θ1 , θ2 ) and p2 (θ1 , θ2 ).
Exercise 11.2.6: Prove that the reserve price specific to a given bidder n, Ωn⁻¹(0), satisfies 0 < Ωn⁻¹(0) < θ̄n.
The purpose of the reserve price is to effectively exclude low types of a given
bidder from the auction. The seller wishes to do this because this permits her
to reduce the information rents of the high types. Because Ωn⁻¹(0) > 0, the use
of reserve prices means that the auction cannot be efficient; there are types of
bidders that welfare maximization dictates should get the good as opposed to
leaving it in the seller’s hands, but who don’t get the good. This welfare loss
is similar to the deadweight loss created when a monopolist engages in linear
pricing.
To appreciate this last point, observe that the Proposition 11.5 mechanism
is still the solution even if there is only one bidder. In the case of one bidder,
we see the bidder gets the good if and only if he bids above the reserve price.
If he gets the good, he pays the reserve price. By definition, the reserve price,
r, is the solution to
    r − (1 − F(r))/f(r) = 0 .     (11.19)
Suppose, instead, we treat this as a linear-pricing problem: the seller sets a price r to maximize her expected profit, which is r(1 − F(r)).4 The first-order condition is
1 − F (r) − rf (r) = 0 .
Exercise 11.2.7: Verify that if F (·) satisfies mhrp, then this first-order condition is
sufficient as well as necessary.
It is readily seen that we can rearrange this first-order condition to get (11.19).
In other words, the mechanism with one bidder is equivalent to the profit-
maximizing linear pricing strategy.
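This equivalence is easy to verify numerically. The sketch below uses a unit-exponential distribution (an assumption for illustration), for which (11.19) gives r = 1, and confirms that the same r maximizes the linear-pricing profit r(1 − F(r)):

```python
# One bidder with a unit-exponential valuation: F(r) = 1 - exp(-r) and
# f(r) = exp(-r), so the reserve-price condition (11.19) reduces to
# r - 1 = 0.  A grid search over the linear-pricing profit r*(1 - F(r))
# should recover the same price.
import math

def profit(r):
    return r * math.exp(-r)  # r * (1 - F(r)) for the unit exponential

grid = [i / 1000 for i in range(5000)]  # candidate prices in [0, 5)
r_star = max(grid, key=profit)
print(r_star)  # matches the solution of (11.19), r = 1
```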
4 The survival function is like the demand curve for this problem. Recall, in fact, that at a basic level there is an essential equivalence between demand curves and survival functions—see discussion on page 12.

Symmetric Bidders
(where Ω(·) is the common virtual valuation function). It follows that if a bidder is awarded the good, he pays the larger of the common reservation price, Ω⁻¹(0),
and the value of the bidder with the next highest valuation. To summarize:
Proposition 11.6 Assume the bidders are symmetric and that their common
distribution function over types satisfies mhrp. Then the expected-profit-maxi-
mizing mechanism is a sealed-bid second-price auction with a common reserve
price, Ω⁻¹(0).
It is worth noting that many online auctions, such as those on eBay, are essen-
tially sealed-bid second-price auctions with a reserve price.
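The revenue value of the reserve can be illustrated by simulation. The sketch below uses unit-exponential values (an illustrative assumption), so Ω(t) = t − 1 and the common reserve is Ω⁻¹(0) = 1:

```python
# Monte Carlo comparison of a second-price auction with and without the
# optimal reserve.  Two bidders with unit-exponential values, so
# Omega(t) = t - 1 and the common reserve is Omega^{-1}(0) = 1
# (per Proposition 11.6).
import random

random.seed(1)

def revenue(theta, reserve):
    hi, lo = max(theta), min(theta)
    if hi < reserve:
        return 0.0           # no sale: the seller keeps the good
    return max(lo, reserve)  # winner pays max(second-highest bid, reserve)

draws = 200_000
rev_reserve = rev_none = 0.0
for _ in range(draws):
    theta = [random.expovariate(1.0) for _ in range(2)]
    rev_reserve += revenue(theta, 1.0)
    rev_none += revenue(theta, 0.0)

rev_reserve /= draws
rev_none /= draws
print(rev_reserve, rev_none)  # the reserve raises expected revenue
```

The reserve trades off occasionally losing the sale against extracting higher payments from high types; at Ω⁻¹(0) the trade-off is resolved optimally.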
Exercise 11.2.8: Consider two symmetric-bidder auctions. In the first, the common distribution over types is F; in the second it is G. Suppose that F ≥hrd G (i.e., F dominates G in the sense of hazard-rate dominance; see Section 11.4). If rF is the reservation price in the first auction and rG is the reservation price in the second auction, what relation must hold between rF and rG?
Exercise 11.2.9: Consider a symmetric-bidder auction. Let the common distribution
over types be 1 − exp(−θ²/2). What is the reservation price for this auction? What is
the probability that any given bidder has a valuation less than the reservation price?
Suppose there are N bidders; what is the probability of deadweight loss (failure to
allocate the good to a bidder)? What is that value if N = 6? What happens to
that probability as N → ∞? What does this suggest about the practical efficiency of
symmetric-bidder auctions?
gamble $0 with probability 2/3 and $1000 with probability 1/3 is likely different
than the value I place on it. Hence, common values requires that either there is
no uncertainty left once all the information is shared or all participants have the
same attitudes toward risk. In the case of major corporations bidding for tracts
of land, the latter is probably reasonable insofar as we typically assume large
corporations are essentially risk neutral. In other cases, however, the assump-
tion of common attitudes toward risk could be suspect. Despite the potential
problems with the common value assumption, it is a useful basis from which to
operate and the models are tractable and yield sensible predictions.
As before, assume there is a single good or object. There is a seller whose
value for the good we normalize to zero. There are N bidders. The common
value of the good is V , where V is a random variable from the perspective of
the bidders; that is, at the time they submit their bids, no bidder knows what
V is. Similar to before, a bidder’s utility is χV − p, where χ = 1 if he gets the
good and χ = 0 otherwise, and p is his payment.
The timing of the game is as follows:
1. Nature determines V and signals θ1 , . . . , θN according to the joint distri-
bution F : V × Θ1 × · · · × ΘN → [0, 1], where V is the set of possible
values of V and Θn is the set of possible values of θn . The distribution F
is common knowledge.
2. Bidder n alone learns θn . This is bidder n’s private information (type).
3. The bidders decide whether to participate in an auction (play the mecha-
nism).
4. The auction is conducted and the good allocated.
Some useful notation is the following:
• Consider a subset of the first N positive integers, K, and let ki denote the
ith element of K (e.g., if N = 4, then a possible K = {2, 3}, so k1 = 2 and
k2 = 3). Let
θK = (θk1 , . . . , θkI ),
where I is the number of elements in θ K . Let
ΘK = Θk 1 × · · · × Θk I .
Observe θ K ∈ ΘK .
• If y is a vector, let [y] denote the vector formed from the elements of
y such that elements of [y] are ordered from largest to smallest; that is,
m < n implies [y]m ≥ [y]n . Observe
    (d/dζ) Ψ(ζ1M |ξ) = Σ_{m=1}^{M} ∂Ψ(ζ1M |ξ)/∂zm = M ∂Ψ(ζ1M |ξ)/∂z1 ,

where the last equality follows from symmetry. To conclude, then, the probability density function of ζ = max z conditional on ξ is

    γ(ζ|ξ) = M ∂Ψ(ζ1M |ξ)/∂z1 .     (11.21)
then a strategy for a bidder is a mapping from what he knows, namely his type,
into a bid. Let Bn : Θ → R denote the strategy of bidder n. Given that the
bidders are symmetric, it is reasonable to look for a symmetric equilibrium;
hence, the subscript n will be dropped from Bn in what follows.
We are now in position to state and prove our main result:
Proposition 11.7 Assume common value and symmetric bidders. A symmet-
ric equilibrium exists for a sealed-bid second-price auction. In this equilibrium,
each bidder n’s equilibrium strategy is B(θn ) = v(θn , θn ).
Proof: Consider bidder n. We wish to show that the strategy B(·) given in
the statement of the proposition is a best response by bidder n if all the other
bidders are playing that strategy. Consider a bid, b, by bidder n. Given that
v(·, ·) is increasing in each argument, B(·) must be an increasing function. As
such, it is invertible. Because the highest bidder gets the good, bidder n is
awarded the good provided B⁻¹(b) ≥ [θ−n]1. Let τ = [θ−n]1. From (11.21), if g(τ|θn) is the conditional probability density of τ conditional on θn, then

    g(τ|θn) = (N − 1) ∂F(τ1N−1 |θn)/∂θ1 .
The expected surplus of bidder n if he wins conditional on the next highest
valuation’s being τ is the difference between v(θn , τ ) and the second highest
bid, B(τ ). Hence, bidder n’s unconditional expected surplus is
    ∫_θ^{B⁻¹(b)} ( v(θn, τ) − B(τ) ) g(τ|θn) dτ
      = ∫_θ^{B⁻¹(b)} ( v(θn, τ) − v(τ, τ) ) g(τ|θn) dτ ,     (11.22)
where θ = inf Θ and the equality follows because the other bidders are assumed
to be playing B(θ) = v(θ, θ). The first-order condition for maximizing (11.22)
with respect to b is equivalent to
v θ, B −1 (b) − v B −1 (b), B −1 (b) = 0 (11.23)
Because v ·, B −1 (b) is an increasing function, we can satisfy the first-order con-
dition if and only if θ = B −1 (b), which is to say b = B(θ), as was to be shown.
Exercise 11.3.1: In the proof of Proposition 11.7, we neglected to check the second-order condition. Verify it is satisfied. (Hint: What is the sign of v(θn, B⁻¹(b)) − v(B⁻¹(b), B⁻¹(b)) if b < B(θn)? If b > B(θn)?)
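Proposition 11.7 can be checked numerically in a tractable special case — the "wallet game," an assumption of this sketch rather than a model from the notes: V = θ1 + θ2 with signals iid uniform on [0, 1], so v(θ, τ) = θ + τ and B(θ) = v(θ, θ) = 2θ. Fixing the rival's strategy at B(τ) = 2τ, bidder 1's expected surplus from bid b is, per (11.22), the integral of (θ1 − τ) over τ ∈ [0, B⁻¹(b)]:

```python
# Numeric check of Proposition 11.7 in the "wallet game" (an assumption
# of this sketch): two bidders, V = theta_1 + theta_2, signals iid
# uniform on [0, 1].  Then v(theta, tau) = theta + tau, so the proposed
# equilibrium bid is B(theta) = v(theta, theta) = 2 * theta.  With the
# rival bidding B(tau) = 2 * tau, bidder 1's expected surplus from a bid
# b is the integral of (theta_1 - tau) over tau in [0, B^{-1}(b)].
def surplus(b, theta1):
    t = min(b / 2.0, 1.0)            # B^{-1}(b), capped at sup(Theta)
    return theta1 * t - t * t / 2.0  # closed form of the integral

theta1 = 0.6
grid = [i / 1000 for i in range(2001)]  # candidate bids in [0, 2]
best_bid = max(grid, key=lambda b: surplus(b, theta1))
print(best_bid)  # 2 * theta1, as Proposition 11.7 predicts
```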
One fact we might wish to know is how bids in the Proposition 11.7 equilibrium
compare with the bidders’ pre-bid estimates of the good’s value. In other words,
how does E{V |θn } compare to B(θn )? Recall with private values, they were the
same. Is this true with common value?
The answer, as we will demonstrate in a moment, is no. In fact, the bid will
always be less than the bidder’s expected value of the good given his information
(i.e., his signal). This is due to what is known as the winner’s curse: given the
bidding rule of Proposition 11.7, if you win the good, then your signal must have
been better than that of any other bidder. Learning this fact must cause you
to revise downward your estimate of the good’s value. In other words, winning
is “bad news”—like a “curse”—insofar as it causes a downward revision in the
winning bidder’s estimate of the good’s value. The rational bidder takes the
winner’s curse into account; that is, he realizes he wins only when the good is
not worth as much as he thinks it is. In other words, the value of a won object
is less than the expected value of the object absent knowledge of who will win.
Given the auction is sealed-bid second-price, a bidder bids what the good is
worth to him if he wins. Since the good he gets, should he win, is the won
object, his bid is rationally less than the expected value of the good based on
his signal alone.
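The winner's curse can also be seen in a simulation. The model below is an assumption of this sketch (V uniform, conditionally independent additive signal noise), not the general affiliated model of the text:

```python
# Monte Carlo illustration of the winner's curse.  The model is an
# assumption of this sketch: V is uniform on [0, 1] and each of three
# bidders observes theta_i = V + eps_i, eps_i uniform on [-0.1, 0.1].
# Fix a band of signals for bidder 0 and compare his estimate of V
# before and after learning that his signal was the highest.
import random

random.seed(2)
est_signal, est_win = [], []
for _ in range(400_000):
    V = random.random()
    theta = [V + random.uniform(-0.1, 0.1) for _ in range(3)]
    if 0.45 <= theta[0] <= 0.55:    # signal lands in the fixed band
        est_signal.append(V)
        if theta[0] == max(theta):  # ... and turns out to be the highest
            est_win.append(V)

mean_signal = sum(est_signal) / len(est_signal)  # E{V | signal}
mean_win = sum(est_win) / len(est_win)           # E{V | signal, win}
print(mean_signal, mean_win)  # winning lowers the estimate of V
```

Conditional on the same signal, learning that the signal was the highest tilts the noise term upward and therefore the estimate of V downward — exactly the "bad news" of winning.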
This intuition is readily seen graphically. Consider Figure 11.1. In the figure,
X = [0, θ̄] and there are three bidders, n, 1, and 2. If bidder n wins the good, he
knows that the realized pair of signals of the other bidders (θ1 , θ2 ) must fall into
the smaller blue square. Before winning the good, he knows only that this realized pair
of signals falls into the larger square (blue and green regions combined). On
average, signals drawn from the smaller square will be less than signals drawn
from the larger square. Because the true value of the good, V , is strongly
affiliated with those signals, V will be larger on average when the signals are
drawn from the larger square than from the smaller square. Hence, winning—
which indicates the other bidders’ signals were drawn from the smaller square
rather than the larger square—is bad news and the expected value of the good
upon winning is less than it is conditional on the bidder’s signal alone.
To demonstrate the winner’s curse formally, recall that v(θn , θn ) is bidder
n’s conditional expectation of V conditional on his knowing (i) his signal is θn
and (ii) no other bidder drew a signal greater than θn (i.e., the value of the
“won object”). Hence,
    v(θn, θn) = ∫_θ^{θn} ··· ∫_θ^{θn} E{V |θn, θ−n} [ f(θ−n|θn) / F(θn1N−1 |θn) ] dθ−n ,     (11.24)
where θ = inf Θ and f (·|θn )/F (θn 1N −1 |θn ) is the probability density function
over the other bidders’ signals conditional on knowing (i) θn and (ii) that no
other bidder has a signal greater than θn .
Let N−n = {1, . . . , n − 1, n + 1, . . . , N }; that is, N−n are the first N positive
integers less integer n. Let Kℓ and Kh be two subsets of N−n such that
Kℓ ∪ Kh = N−n and Kℓ ∩ Kh = ∅ ;
that is, Kℓ and Kh are a partition of N−n. Note Kℓ could be the empty set, in which case Kh = N−n.
[Figure: the space of the other bidders' signals, (θ1, θ2) ∈ [0, θ̄]². The smaller square [0, θn]² is the region that determines v(θn, θn); the full square [0, θ̄]² determines E{V |θn}.]
Figure 11.1: Illustration of the Winner’s Curse.
where θ̄ = sup Θ and Ij is the number of elements in Kj , j ∈ {ℓ, h}. F̄Kℓ ,Kh (θn )
is the probability, conditional on θn , that a given set of Iℓ bidders other than n
have signals less than θn , while a given set of Ih bidders have a signal greater
than θn . Note, given symmetry, F̄Kℓ ,Kh (θn ) can also be written:
    F̄Iℓ,Ih(θn) = ∫_{θn}^{θ̄} ··· ∫_{θn}^{θ̄} ∫_θ^{θn} ··· ∫_θ^{θn} f(θ−n|θn) dθ−n ,

where the first Ih integrals are over (θn, θ̄) and the remaining Iℓ are over (θ, θn).
[Figure: the space [0, θ̄]² of the other bidders' signals, partitioned at θn; the region with θ1 < θn < θ2 has probability F̄1,1(θn) and the region with both signals above θn has probability F̄0,2(θn).]
Figure 11.2: Probabilities of the Different Regions of θ −n . Figure assumes
three bidders: 1, 2, and n.
ways.
Consider the following chain:
    E{V |θn} = ∫_θ^{θ̄} ··· ∫_θ^{θ̄} E{V |θn, θ−n} f(θ−n|θn) dθ−n

             = ∫_θ^{θn} ··· ∫_θ^{θn} E{V |θn, θ−n} f(θ−n|θn) dθ−n
               + Σ_{I=1}^{N−1} (N−1 choose I) ∫_{θn}^{θ̄} ··· ∫_{θn}^{θ̄} ∫_θ^{θn} ··· ∫_θ^{θn} E{V |θn, θ−n} f(θ−n|θn) dθ−n

             = F(θn1N−1 |θn) ∫_θ^{θn} ··· ∫_θ^{θn} E{V |θn, θ−n} [ f(θ−n|θn) / F(θn1N−1 |θn) ] dθ−n
               + Σ_{I=1}^{N−1} (N−1 choose I) F̄N−1−I,I(θn) ∫_{θn}^{θ̄} ··· ∫_{θn}^{θ̄} ∫_θ^{θn} ··· ∫_θ^{θn} E{V |θn, θ−n} [ f(θ−n|θn) / F̄N−1−I,I(θn) ] dθ−n ,

where, in each summed term, the first I integrals run over (θn, θ̄) and the remaining N−1−I run over (θ, θn).
Observe that each integral expression in the last two lines is an expected value
of V conditional on the number of bidders other than n that have signals greater
than n's. Moreover, (11.25) tells us that each such expected value in the last line exceeds the expected value in the penultimate line. The integral expression in the penultimate line is, from (11.24), v(θn, θn). The last two lines—and hence,
11.4 Appendix to Lecture Note 11: Stochastic Orders
If (11.26) holds, write F ≥fsd G. The following theorem can be proved:

Theorem 11.1 F ≥fsd G if and only if F(x) ≤ G(x) for all x ∈ X .
We will not, however, prove that here; instead, we will prove a somewhat simpler
version:
Proposition 11.9 Suppose that F : X → [0, 1] and G : X → [0, 1] are differentiable distributions and X is a bounded interval. Then F ≥fsd G if and only if F(x) ≤ G(x) for all x ∈ X .
Proof: Let f and g denote the derivatives of (densities associated with) F and
G, respectively.
Suppose F (x) ≤ G(x) for all x ∈ X . Let γ : X → R be a non-decreasing
function. Because γ is monotone, it is differentiable almost everywhere. Integration by parts yields

    ∫_X γ(x)( f(x) − g(x) ) dx = sup_{x∈X} γ(x)( F(x) − G(x) ) − inf_{x∈X} γ(x)( F(x) − G(x) ) − ∫_X γ′(x)( F(x) − G(x) ) dx .
The first and second terms on the rhs are zero because supx∈X F (x) − G(x) =
1 − 1 = 0 and inf x∈X F (x) − G(x) = 0 − 0 = 0. The integrand in the third term
is everywhere non-positive by assumption. We can thus conclude:
    ∫_X γ(x) f(x) dx ≥ ∫_X γ(x) g(x) dx ,     (11.27)
which is (11.26).
Suppose that (11.26) holds for all non-decreasing γ(·), which is to say (11.27)
holds. Suppose
    γ(x) = { 0 , if x ≤ z
           { 1 , if x > z
for some z ∈ X . Given (11.27), we have 1 − F (z) ≥ 1 − G(z), which implies
F (z) ≤ G(z). Because z was arbitrary, we can conclude F (x) ≤ G(x) for all
x ∈ X.
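The proposition can be illustrated numerically. In the sketch below, F(x) = x² and G(x) = x on [0, 1] (so F(x) ≤ G(x) everywhere), and γ is an arbitrary non-decreasing function:

```python
# Numeric illustration of Proposition 11.9 on X = [0, 1]:
# F(x) = x**2 lies below G(x) = x, so F >=fsd G, and any non-decreasing
# gamma should have a weakly higher expectation under F.  The densities
# are f(x) = 2x and g(x) = 1.
def expectation(gamma, density, n=100_000):
    # midpoint-rule approximation of the integral of gamma * density on [0, 1]
    h = 1.0 / n
    return sum(gamma((i + 0.5) * h) * density((i + 0.5) * h) for i in range(n)) * h

gamma = lambda x: x ** 3  # an arbitrary non-decreasing function
e_F = expectation(gamma, lambda x: 2.0 * x)  # exact value: 2/5
e_G = expectation(gamma, lambda x: 1.0)      # exact value: 1/4
print(e_F, e_G)
```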
The first and second terms on the rhs are zero because supx∈X F (x) − G(x) = 0
and inf x∈X F (x) − G(x) = 0. The integrand in the third term is positive by
assumption. We can thus conclude:
    ∫_X γ(x) f(x) dx > ∫_X γ(x) g(x) dx ,
which is (11.28).
An important extension of
Another extension is
Proposition 11.12 Let F and G be two differentiable distribution functions
with support (x0 , x3 ). Suppose F (x) ≤ G(x) for all x ∈ (x0 , x3 ) and there
exists an interval (x1 , x2 ) ⊂ (x0 , x3 ), x1 < x2 , such that F (x) < G(x) for all
x ∈ (x1 , x2 ). Then the expectation of x under F strictly exceeds the expectation
of x under G.
Proof: As before, let f and g be the densities associated with F and G,
respectively. We have
    EF(x) − EG(x) = Σ_{i=0}^{2} ∫_{xi}^{xi+1} x ( f(x) − g(x) ) dx

      = Σ_{i=0}^{2} [ xi+1( F(xi+1) − G(xi+1) ) − xi( F(xi) − G(xi) ) − ∫_{xi}^{xi+1} ( F(x) − G(x) ) dx ]

      = Σ_{i=0}^{2} ∫_{xi}^{xi+1} ( G(x) − F(x) ) dx ≥ ∫_{x1}^{x2} ( G(x) − F(x) ) dx > 0 ,
where the penultimate implication follows from Lemma 1.1. The result then
follows from Theorem 11.1.
Proof: Observe:
    rF(x) ≥ rG(x) ⇒ −∫_x^{∞} rF(z) dz ≤ −∫_x^{∞} rG(z) dz ⇒ F(x) ≤ G(x) ,
[Figure: two vectors y = (y1, y2) and y′ = (y1′, y2′) in the plane, together with their join y ∨ y′ (the componentwise maximum) and meet y ∧ y′ (the componentwise minimum).]
Figure 11.3: Illustration of Join and Meet for M = 2.
Let X and Y be a random vector and variable, respectively, with joint density
f (·, ·). If X and Y are affiliated, then for all x ≥ x′ and y ≥ y ′ we have from
(11.32):
f (x, y)f (x′ , y ′ ) ≥ f (x, y ′ )f (x′ , y) .
Hence,
    f(x, y)/f(x, y′) ≥ f(x′, y)/f(x′, y′) .     (11.34)
Let fX (·) denote the marginal density of X and recall that f (x, y) = f (y|x)fX (x).
We can rewrite (11.34) as
    f(y|x)fX(x)/f(y′|x)fX(x) ≥ f(y|x′)fX(x′)/f(y′|x′)fX(x′)  ⇒  f(y|x)/f(y′|x) ≥ f(y|x′)/f(y′|x′)
                                                             ⇒  f(y|x)/f(y|x′) ≥ f(y′|x)/f(y′|x′) .     (11.35)
Because y ≥ y′, (11.35) implies that F(·|x) ≥lrd F(·|x′). Given Proposition 11.15, we then have F(·|x) ≥hrd F(·|x′). Proposition 11.13, in turn, yields F(·|x) ≥fsd F(·|x′).
Let X be a one-element vector (a scalar). Then X and Y affiliated means that if x > x′, then F(·|x) ≥fsd F(·|x′). First-order stochastic dominance, in turn, implies E{Y |x} ≥ E{Y |x′} (Proposition 11.11). Hence, a regression of Y on X must yield a non-decreasing regression line; that is, X and Y are non-negatively correlated. Formally,
5 For our purposes in this text, a supermodular function is a function, g : RK → R, that satisfies g(y ∨ y′) + g(y ∧ y′) ≥ g(y) + g(y′) for all y and y′.
Proposition 11.17 If random variables X and Y are affiliated, then they are
non-negatively correlated.
Proof:
    cov(X, Y) = ∫_X ∫_Y ( x − E{x} ) y f(x, y) dy dx
              = ∫_X ∫_Y ( x − E{x} ) y f(y|x) fX(x) dy dx
              = ∫_X ( x − E{x} ) E{Y |x} fX(x) dx ≥ 0 ,

where the inequality follows because x − E{x} and E{Y |x} are both increasing in x.
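A quick numerical illustration: jointly normal variables with a non-negative correlation coefficient are a standard example of affiliated random variables (that claim is used here as an assumption, not proved in these notes), and their sample covariance is indeed non-negative:

```python
# Monte Carlo illustration of Proposition 11.17.  Jointly normal (X, Y)
# with correlation rho >= 0 is a standard example of affiliated random
# variables (an assumption of this sketch, not proved in these notes);
# the sample covariance should be non-negative.
import random

random.seed(3)
rho, n = 0.6, 200_000
xs, ys = [], []
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    y = rho * x + (1.0 - rho ** 2) ** 0.5 * random.gauss(0.0, 1.0)
    xs.append(x)
    ys.append(y)

mean_x, mean_y = sum(xs) / n, sum(ys) / n
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
print(cov)  # close to rho = 0.6, in particular non-negative
```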
Hidden Action and Incentives
Purpose
A common economic occurrence is the following: Two parties, principal and
agent, are in a situation—typically of their choosing—in which actions by the
agent impose an externality on the principal. Not surprisingly, the principal
will want to influence the agent’s actions. This influence will often take the
form of a contract that has the principal compensating the agent contingent on
either his actions or the consequences of his actions. Table 2 lists some examples
of situations like this. Note that, in many of these examples, the principal is
buying a good or service from the agent. That is, many buyer-seller relationships
naturally fit into the principal-agent framework. This part of the notes covers
the basic tools and results of agency theory.
To an extent, the principal-agent problem finds its root in the early literature
on insurance. There, the concern was that someone who insures an asset might
then fail to maintain the asset properly (e.g., park his car in a bad neighbor-
hood). Typically, such behavior was either unobservable by the insurance company or too difficult to contract against directly; hence, the insurance contract
could not be directly contingent on such behavior. But because this behavior—
known as moral hazard—imposes an externality on the insurance company (in
this case, a negative one), insurance companies were eager to develop contracts
that guarded against it. So, for example, many insurance contracts have de-
ductibles—the first k dollars of damage must be paid by the insured rather than
the insurance company. Because the insured now has $k at risk, he’ll think
twice about parking in a bad neighborhood. That is, the insurance contract
is designed to mitigate the externality that the agent—the insured—imposes
on the principal—the insurance company. Although principal-agent analysis is
more general than this, the name “moral hazard” has stuck and, so, the types
of problems considered here are often referred to as moral-hazard problems. A
more descriptive name, which is also used in the literature, is hidden-action
problems.
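The deductible logic can be captured in a toy calculation; all numbers below are assumptions of this sketch:

```python
# Toy numeric illustration of how a deductible mitigates moral hazard;
# all numbers are assumptions of this sketch.  A loss of $10,000 occurs
# with probability 0.1 if the insured takes care and 0.3 if he does not;
# care costs him $500 worth of effort.
LOSS, P_CARE, P_NO_CARE, EFFORT_COST = 10_000, 0.1, 0.3, 500

def expected_cost(deductible, care):
    # the insured bears the first `deductible` dollars of any loss
    p = P_CARE if care else P_NO_CARE
    return p * min(LOSS, deductible) + (EFFORT_COST if care else 0)

# Full insurance (no deductible): care is a pure cost, so it is shirked.
shirk_full = expected_cost(0, True) > expected_cost(0, False)
# A $3,000 deductible puts enough of the loss on the insured that care pays.
care_deductible = expected_cost(3000, True) < expected_cost(3000, False)
print(shirk_full, care_deductible)
```

With $3,000 at risk, taking care costs the insured 0.1 × 3,000 + 500 = $800 in expectation versus 0.3 × 3,000 = $900 from shirking, so the contract restores his incentive to take care.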
Bibliographic Note
This part of the lecture notes draws heavily from a set of notes that I co-authored
with Bernard Caillaud.
12 The Moral-Hazard Setting

We begin with a general picture of the situation we wish to analyze.
1. Two players are in an economic relationship characterized by the following
two features: First, the actions of one player, the agent, affect the well-
being of the other player, the principal. Second, the players can agree
ex ante to a reward schedule by which the principal pays the agent.1
The reward schedule represents an enforceable contract (i.e., if there is a
dispute about whether a player has lived up to the terms of the contract,
then a court or similar body can adjudicate the dispute).
2. The agent’s action is hidden; that is, he knows what action he has taken
but the principal does not directly observe his action. (Although we will
consider, as a benchmark, the situation in which the action can be con-
tracted on directly.) Moreover, the agent has complete discretion in choos-
ing his action from some set of feasible actions.2
For example, consider a salesperson who has discretion over the amount of
time or effort he expends promoting his company’s products. Many of these
actions are unobservable by his company. The company can, however, measure
in a verifiable way the number of orders or revenue he generates. Because these
measures are, presumably, correlated with his actions (i.e., the harder he works,
the more sales he generates on average), it may make sense for the company to
base his pay on his sales—put him on commission—to induce him to expend the appropriate level of effort.

1 A payment could be negative; that is, the principal fines or otherwise punishes the agent.
2 Typically, this set is assumed to be exogenous to the relationship. One could, however, imagine situations in which the principal has some control over this set ex ante (e.g., she decides what tools the agent will have available).
3 Information is verifiable if it can be observed perfectly (i.e., without error) by third parties who might be called upon to adjudicate a dispute between principal and agent.
Here, we will also be imposing some additional structure on the situation:
• The players are symmetrically informed at the time they agree to a reward
schedule.
• Bargaining is take-it-or-leave-it (tioli): The principal proposes a contract
(reward schedule), which the agent either accepts or rejects. If he rejects
it, the game ends and the players receive their reservation utilities (their
expected utilities from pursuing their next best alternatives). If he accepts,
then both parties are bound by the contract.
• Contracts cannot be renegotiated.
• Once the contract has been agreed to, the only player to take further
actions is the agent.
• The game is played once. In particular, there is only one period in which
the agent takes actions and the agent completes his actions before any
performance measures are realized.
All of these are common assumptions and, indeed, might be taken to constitute
part of the “standard” principal-agent model.
The link between actions and performance can be seen as follows. Perfor-
mance is a random variable and its probability distribution depends on the ac-
tions taken by the agent. So, for instance, a salesperson’s efforts could increase
his average (expected) sales, but he still faces upside risk (e.g., an economic
boom in his sales region) and downside risk (e.g., introduction of a rival prod-
uct). Because the performance measure is only stochastically related to the
action, it is generally impossible to infer perfectly the action from the realiza-
tion of the performance measure. That is, the performance measure does not,
generally, reveal the agent’s action—it remains “hidden” despite observing the
performance measure.
The link between actions and performance can also be viewed in an indirect
way in terms of a state-space model. Performance is a function of the agent’s
actions and of the state of nature; that is, a parameter (scalar or vector) that
describe the economic environment (e.g., the economic conditions in the sales-
person’s territory). In this view, the agent takes his action before knowing the
state of nature. Typically, we assume that the state of nature is not observ-
able to the principal. If she could observe it, then she could perfectly infer the
agent’s action by inverting from realized performance. In this model, it is not
important whether the agent later observes the state of nature or not, given that he
could deduce it from his observation of his performance and his knowledge of
his actions.
There is a strong assumption of physical causality in this setting, namely that
actions by the agent determine performance. Moreover, the process is viewed
as a static production process: There are neither dynamics nor feedback. In
particular, the contract governs one period of production and the game between
principal and agent encompasses only this period. In addition, when choosing
his actions, the agent’s information is identical to the principal’s. Specifically,
he cannot adjust his actions as the performance measures are realized. The
sequentiality between actions and performance is strict: actions are completed
first and, only then, is performance realized.
Lecture Note 13: Basic Two-Action Model

We start with the simplest principal-agent model. Admittedly, it is so simple
that a number of the issues one would like to understand about contracting
under moral hazard disappear. On the other hand, many issues remain and, for
pedagogical purposes at least, it is a good place to start.1
1 But the pedagogical value of this model should not make us incautious; caution
is indeed necessary, as the model delivers some conclusions that are far from general. One could
say that the two-action model is tailored to fit naïve intuition and to lead to the desired results
without allowing us to see fully the (implicit) assumptions on which we are relying.
U (s, x, a) = u(s) − aC ,
where, taking advantage of there being just two actions, we have normalized
the utility component from action 0 to be 0. Assume C > 0; that is, the agent
prefers action 0 to action 1 ceteris paribus. In the salesperson example, think
of action 1 as corresponding to the salesperson working hard, which he dislikes,
and action 0 corresponding to his taking it easy. The function u : R → R is
the agent’s utility for income. As such, it is an increasing function. As we
will see, for there to truly be an agency problem, one requires either that the
agent be risk averse—so u is strictly concave—or that there be a lower limit
on permissible compensation to the agent (typically, a requirement that s ≥ 0);
when there is a lower limit, we say the agent is protected by limited liability.
Here, though, we limit attention to a risk-averse agent. See Sappington (1983)
for an analysis with limited liability.
The usual assumption in agency models such as this is that the principal’s
payoff is a function of the outcome, x, and not directly of the agent’s action.
Further, because, if need be, we could reasonably suppose some increasing map-
ping from outcome to money, one assumes the outcome is a monetary payoff to
the principal (so X ⊆ R). Hence, the principal’s payoff is standardly
B(s, x, a) = b(x − s) ,
where b(·) is an increasing function; commonly, b is taken to be the identity, so that
B(s, x, a) = x − s .
Full-Information Benchmark
Suppose, as a benchmark, that the principal could observe and establish (verify)
the agent’s choice of action. Call this benchmark the full or perfect information
case. Then the principal could make the contract with the agent contingent on
the agent’s effort, a. Moreover, because the agent is risk averse, while the princi-
pal is risk neutral, efficiency dictates the principal absorb all the risk. Hence, in
this benchmark case, there is no need to make the agent’s compensation depend
on the outcome; it will depend on his action only. Consider a contract of the
form:
s = s_0, if a = 0;  and  s = s_1, if a = 1.
13.1 The Two-action Model
Suppose the principal wants the agent to choose a = 1, then she must choose
s0 and s1 to satisfy two conditions. First, conditional on accepting the contract,
the agent must prefer action 1 to action 0; that is,
u(s_1) − C ≥ u(s_0) . (ic)
Second, the agent must prefer accepting the contract to rejecting it; that is,
u(s_1) − C ≥ U_R . (ir)
The principal's problem is, thus,
max_{s_0, s_1} E_1{x} − s_1
subject to the constraints (ic) and (ir), where E_a denotes expectation given
action a. Provided the equation
u(s_1) − C = U_R (13.1)
and the inequality u(s) < U_R
both have solutions within the domain of u(·), then the solution to the principal's
problem is straightforward: s_1 solves (13.1) and s_0 is a solution to u(s) < U_R.
It is readily seen that this solution satisfies the constraints. Moreover, because
u(·) is strictly increasing, there is no smaller payment that the principal could
give the agent and still have him accept the contract. This contract is known
as a forcing contract.2 For future reference, let s_1^F be the solution to (13.1).
2 The solution to the principal's maximization problem depends on the domain and range
of the utility function u(·). Let D, an interval in R, be its domain and R its range. Let s̲
be inf D (i.e., the greatest lower bound of D) and let s̄ be sup D (i.e., the least upper bound
of D). As shorthand for lim_{s↓s̲} u(s) and lim_{s↑s̄} u(s), write u(s̲) and u(s̄), respectively. If
u(s) − C < U_R for all s ≤ s̄, then no contract exists that satisfies (ir). In this case, the best
the principal could do is implement a = 0. Similarly, if u(s̄) − C < u(s̲), then no contract
exists that satisfies (ic). The principal would have to be satisfied with implementing a = 0.
Hence, a = 1 can be implemented if and only if u(s̄) − C > max{U_R, u(s̲)}. Assuming this
condition is met, a solution is s_0 ↓ s̲ and s_1 solving
u(s_1) − C ≥ max{U_R, u(s̲)} .
Generally, conditions are imposed on u(·) such that a solution exists to u(s) < U_R and (13.1).
Henceforth, we will assume that these conditions have, indeed, been imposed. For an example
of an analysis that considers bounds on D that are more binding, see Sappington (1983).
Observe that s_1^F = u^{-1}(U_R + C), where u^{-1}(·) is the inverse of the function
u(·).
Another option for the principal is, of course, just to let the agent choose
the action he prefers absent any incentives to the contrary; that is, action 0.
There are many contracts that would accomplish this goal, although the most
“natural” is perhaps a non-contingent contract: s0 = s1 . Observe, here, there
is no ic constraint—the agent inherently prefers a = 0—and the only constraint
is the ir constraint:
u (s0 ) ≥ UR .
The expected-profit-maximizing (cost-minimizing) payment is the smallest pay-
ment satisfying that expression. Given that u(·) is increasing, this entails
s_0 = u^{-1}(U_R). We will refer to this value of s_0 as s_0^F.
The principal’s expected payoff conditional on inducing a under the optimal
full-information contract for inducing a is Ea {x} − sF
a . The principal will, thus,
prefer to induce a = 1 if
E1 {x} − sF F
1 > E0 {x} − s0 .
In what follows, we will assume that this condition is met: That is, in our
benchmark case of verifiable action, the principal prefers to induce a = 1 (the
action the agent intrinsically dislikes).
Observe the steps taken in solving this benchmark case: first, for each pos-
sible action we solved for the optimal contract that induces that action. Then
we calculated the principal’s expected payoff under each such contract. The
action that the principal chooses to induce in equilibrium is, then, the one that
yields her the largest expected payoff. This two-step process for solving for the
optimal contract is frequently used in hidden-action agency problems, as we will
see.
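As a numerical sketch of this two-step process, consider the following. All specifics here are assumptions chosen purely for illustration: u(s) = √s (so u^{-1}(v) = v²), U_R = 1, C = 0.5, and expected outcomes E_0{x} = 4 and E_1{x} = 10.

```python
# Hypothetical primitives (not from the text): u(s) = sqrt(s), so the
# inverse utility is u_inv(v) = v**2; reservation utility U_R; effort
# cost C; expected outcomes E_a{x} for a = 0, 1.
U_R, C = 1.0, 0.5
E_x = {0: 4.0, 1: 10.0}

def u_inv(v):
    # inverse of u(s) = sqrt(s)
    return v ** 2

# Step 1: cheapest full-information contract inducing each action.
# Inducing a = 0: only (ir) binds, so u(s_0) = U_R.
# Inducing a = 1: (ir) binds, so u(s_1) - C = U_R (equation (13.1)).
s_F = {0: u_inv(U_R), 1: u_inv(U_R + C)}

# Step 2: compare the principal's expected payoff E_a{x} - s_a^F.
payoff = {a: E_x[a] - s_F[a] for a in (0, 1)}
best = max(payoff, key=payoff.get)

print(s_F[0], s_F[1])   # 1.0 2.25
print(best)             # 1 -> inducing a = 1 is optimal for these numbers
```

With these numbers, s_1^F = (1.5)² = 2.25 and the principal prefers to induce a = 1 because 10 − 2.25 > 4 − 1.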
3 We assume, when indifferent among a group of actions, that the agent chooses from
that group the action that the principal prefers. This assumption, although often troubling to
those new to agency theory, is not truly a problem. Recall that the agency problem is a game.
Consistent with game theory, we’re looking for an equilibrium of this game; i.e., a situation
in which players are playing mutual best responses and in which they correctly anticipate
13.2 The Optimal Incentive Contract
If this inequality is violated, the agent prefers a = 0. Observe that this is the
incentive compatibility (ic) constraint in this case.
The game we analyze is in fact a simple Stackelberg game, where the prin-
cipal is the first mover—she chooses the payment schedule—to which she is
committed; and the agent the second mover—choosing his action in response to
the payment schedule; that is, choosing the solution to
max_a E_a{ u(S(x)) } − aC
(with ties going to the principal). The solution is the agent’s equilibrium choice
of action and it is a function of the payment function S(·). Solving this contract-
ing problem then requires us to understand what kind of contract the principal
could and will offer.
Observe first that if she were to offer the fixed-payment contract S(x) = s_0^F
for all x, then, as above, the agent would accept the contract and not bother
to expend effort. Among all contracts that induce the agent to choose action
a = 0 in equilibrium, this is clearly the cheapest one for the principal.
On the other hand, the fixed-payment contract s_1^F will no longer work given
the hidden-action problem: the agent gains s_1^F whatever his efforts, so he will
choose the action that costs him less, namely a = 0. It is in fact immediate
that any fixed-payment contract, which would be optimal if the only concern
were efficient risk-sharing, will induce an agent to choose his least costly action.
As, by assumption, the principal did better in the full-information benchmark
inducing a = 1, it seems plausible that she would still wish to induce that here,
although—as we’ve just seen—that must mean inefficient (relative to the first
best) allocation of risk to the agent.
We now face two separate questions. First, conditional on the principal’s
wanting the agent to choose a = 1, what is the optimal—least-expected-cost—
contract for the principal to offer? Second, are the principal’s expected payoffs
greater doing this than not inducing a = 1 (i.e., greater than her expected
payoffs from offering the fixed-payment contract S(x) = s_0^F)?
As in the benchmark case, not only must the contract give the agent an
incentive to choose the desired action (i.e., meet the ic constraint), it must also
be individually rational:
E_1{ u(S(x)) } − C ≥ U_R . (ir′)
the best responses of their opponents. Were the agent to behave differently when indifferent,
then we wouldn’t have an equilibrium because the principal would vary her strategy—offer a
different contract—so as to break this indifference. Moreover, it can be shown that in many
models the only equilibrium has the property that the agent chooses among his best responses
(the actions among which he is indifferent given the contract) the one most preferred by the
principal.
The next few sections will consider the solution to (13.2) under a number of
different assumptions about the distribution functions Fa .
Two assumptions, additional to those given previously, about the agent’s
utility-for-income function, u, will be common to these analyses:
2. lim_{s↓s̲} u(s) = −∞ and lim_{s↑s̄} u(s) = ∞; that is, the image of u is the
entire set R.
F_a(x_L) = 1 − q_a ,
subject to
q u(s_H) + (1 − q) u(s_L) − C ≥ u(s_L) (ic)
and
q u(s_H) + (1 − q) u(s_L) − C ≥ U_R (ir)
We could solve this problem mechanically using the usual techniques for max-
imizing a function subject to constraints, but it is far easier, here, to use a
4 Because its domain is an open interval and it is concave, u(·) is continuous everywhere on its domain.
But this means that {s̃n } satisfies both constraints and yields a greater expected
payoff, which contradicts the optimality of {s∗n }. Therefore, by contradiction,
we can conclude that ir also binds under the optimal contract for inducing
a = 1.
We’re now in a situation where the two constraints must bind at the optimal
contract. But, given we have only two unknown variables, sH and sL , this means
we can solve for the optimal contract merely by solving the constraints. Doing
so yields
ŝ_H = u^{-1}( U_R + C/q )  and  ŝ_L = u^{-1}( U_R ) . (13.3)
Observe that the payments vary with the state (as we knew they must because
fixed payments fail the ic constraint).
(1 − q) − λ(1 − q) u′(s_L) = 0
and
q − λq u′(s_H) = 0 ,
respectively. Solving, u′(s_L) = u′(s_H) = 1/λ; because u′(·) is strictly monotone, it
follows that s_L = s_H. The proof when u(·) is not (everywhere)
differentiable is only slightly harder and is left to the reader.
Recall that were the agent's action verifiable (i.e., in the full-information
benchmark), the contract would be S(x) = s_1^F = u^{-1}(U_R + C). Rewriting
(13.3) we see that
ŝ_H = u^{-1}( u(s_1^F) + ((1 − q)/q) C )  and  ŝ_L = u^{-1}( u(s_1^F) − C ) ;
that is, one payment is above the payment under full information, while the other
is below the payment under full information. Moreover, the expected payment
to the salesperson is greater than s_1^F:
q ŝ_H + (1 − q) ŝ_L = q u^{-1}( u(s_1^F) + ((1 − q)/q) C ) + (1 − q) u^{-1}( u(s_1^F) − C )
  ≥ u^{-1}( q [ u(s_1^F) + ((1 − q)/q) C ] + (1 − q) [ u(s_1^F) − C ] ) (13.4)
  = u^{-1}( u(s_1^F) ) = s_1^F ;
where the inequality follows from Jensen’s inequality.6 Provided the agent is
strictly risk averse, the above inequality is strict: inducing the agent to choose
a = 1 costs strictly more in expectation when the principal cannot verify the
agent’s action.
Before proceeding, it is worth considering why the principal suffers from her
inability to verify the agent’s action (i.e., from the existence of a hidden-action
problem). Ceteris paribus, the agent prefers a = 0 to a = 1 because the latter
costs him more. Hence, when the principal wishes to induce a = 1, her interests
and the agent’s are not aligned. To align their interests, she must offer the agent
incentives to choose a = 1. The problem is that the principal cannot directly
tie these incentives to the variable in which she is interested, namely the action
itself. Rather, she must tie these incentives to outcomes, which are imperfectly
correlated with action. These incentives, therefore, expose the agent to risk. We
know, relative to the first best, that this is inefficient. Someone must bear the
cost of this inefficiency. Because the bargaining game always yields the agent
the same expected utility (i.e., ir is always binding), the cost of this inefficiency
must, thus, be borne by the principal.
Another way to view this last point is that because the agent is exposed to
risk, which he dislikes, he must be compensated. This compensation takes the
form of a higher expected payment.
To begin to appreciate the importance of the hidden-action problem, observe
6 Jensen’s inequality for convex functions states that if g (·) is convex function, then
that
lim_{q↑1} [ q ŝ_H + (1 − q) ŝ_L ] = lim_{q↑1} ŝ_H = u^{-1}( u(s_1^F) ) = s_1^F .
subject to
q u_H + (1 − q) u_L − C ≥ u_L and (ic″)
q u_H + (1 − q) u_L − C ≥ U_R (ir″)
Observe that, in this space, the agent’s indifference curves are straight lines, with
lines farther from the origin corresponding to greater expected utility. The prin-
cipal’s iso-expected-payoff curves are concave relative to the origin—reflecting
7 The support of distribution G over random variable X, sometimes denoted supp{X}, is the smallest closed set to which G assigns probability one.
Figure 13.1: Representative indifference curves in utility space (axes u_L and
u_H, with the 45° line shown) for the principal and agent. Straight
lines (in olive) are the agent's; curved lines (in violet) are the
principal's. The agent enjoys greater expected utility moving up;
the principal, greater expected payoff moving down.
that u^{-1}(·) is a convex function—with curves closer to the origin corresponding
to greater expected payoffs (i.e., lower expected payments). Figure 13.1 illustrates. Observe the agent's
indifference curves and the principal’s can be tangent only at the 45◦ line, a
well-known result from the insurance literature.8 This illustrates why efficiency
(in a first-best sense) requires that the agent not bear risk.
We can re-express (ic′′ ) as
u_H ≥ u_L + C/q . (13.5)
Hence, the set of contracts that are incentive compatible lies on or above a line
parallel to, but above, the 45° line. Graphically, we now see that an incentive-
compatible contract requires that we abandon non-contingent contracts. Fig-
ure 13.2 shows the space of incentive-compatible contracts.
8 Proof: Let φ(·) = u^{-1}(·). Then the mrs for the principal is
−(1 − q) φ′(u_L) / ( q φ′(u_H) ) ;
whereas the mrs for the agent is
−(1 − q)/q .
Because φ(·) is strictly convex, φ′(·) is strictly monotone. Consequently, the two mrs's can be
equal only on the 45° line.
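The tangency claim in footnote 8 is easy to verify numerically. Here is a sketch under the assumption (hypothetical, for illustration) that u(s) = √s, so φ(v) = u^{-1}(v) = v² and φ′(v) = 2v, with q = 0.7:

```python
import math

q = 0.7
phi_prime = lambda v: 2.0 * v    # phi(v) = v**2, for u(s) = sqrt(s)

def mrs_principal(uL, uH):
    # slope of the principal's iso-expected-payoff curve in (uL, uH) space
    return -(1 - q) * phi_prime(uL) / (q * phi_prime(uH))

mrs_agent = -(1 - q) / q         # slope of the agent's indifference line

# The slopes coincide on the 45-degree line (uL = uH)...
print(math.isclose(mrs_principal(2.0, 2.0), mrs_agent))   # True
# ...but not off it, because phi' is strictly monotone.
print(math.isclose(mrs_principal(1.0, 3.0), mrs_agent))   # False
```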
13.3 Two-outcome Model
Figure 13.2: The space of contracts in (u_L, u_H): the 45° line, the ic boundary
(u_H = u_L + C/q), and the ir boundary. Shown are the individually
rational contracts, the incentive-compatible contracts, their
intersection, and the optimal contract at the corner of that
intersection.
The set of individually rational contracts consists of those that lie on or above the
line defined by (ir″). This is also illustrated in Figure 13.2. The intersection of
these two regions then constitutes the set of feasible contracts for inducing the
salesperson to choose a = 1. Observe that the principal’s lowest iso-expected-
payoff curve that intersects this set is the one that passes through the “corner” of
the set—consistent with our earlier conclusion that both constraints are binding
at the optimal contract.
Lastly, let’s consider the variable q. We can interpret q as representing the
correlation—or, more accurately, the informativeness—of sales to the action
taken. At first glance, it might seem odd to be worried about the informa-
tiveness of sales since, in equilibrium, the principal can accurately predict the
agent’s choice of action from the structure of the game and her knowledge of the
contract. But that’s not the point: the principal is forced to design a contract
that pays the agent based on performance measures that are informative about
the variable upon which she would truly like to contract, namely his action.
The more informative these performance measures are—loosely, the more cor-
related they are with action—the closer the principal is getting to the ideal of
contracting on the agent’s action.
In light of this discussion it wouldn’t be surprising if the principal’s expected
profit under the optimal contract for inducing a = 1 increases as q increases.
q x_H + (1 − q) x_L ,
q ŝ_H + (1 − q) ŝ_L ,
Let us now see that the contract {ũn } satisfies both ir and ic when q = q2 .
Observe, first, that
q_2 ũ_H + (1 − q_2) ũ_L = r q_2 u_H^1 + [ (1 − r) q_2 + 1 − q_2 ] u_L^1
 = q_1 u_H^1 + (1 − q_1) u_L^1 .
finds least costly requires a contract that is fully contingent on the performance
measure. This is a consequence of the action being unobservable to the principal,
not the agent’s risk aversion. When, however, the agent is risk averse, then the
principal’s expected cost of solving the hidden-action problem is greater than it
would be in the benchmark full-information case: exposing the agent to risk is
inefficient (relative to the first best) and the cost of this inefficiency is borne by
the principal. The size of this cost depends on how good an approximation the
performance measure is for the variable upon which the principal really desires
to contract, the agent’s action. The better an approximation (statistic) it is, the
lower is the principal’s expected cost. If, as here, that shift also raises expected
revenue, then a more accurate approximation means a greater expected payoff.
It is also worth pointing out that one result, which might seem as though it
should be general, is not: namely, the result that compensation is increasing
with performance (e.g., ŝH > ŝL ). Although this is true when there are only two
possible realizations of the performance measure (as we’ve proved), this result
does not hold generally when there are more than two possible realizations.
subject to
∫_0^∞ u(S(x)) dF_1(x) − C ≥ U_R and (13.6)
∫_0^∞ u(S(x)) dF_1(x) − C ≥ ∫_0^∞ u(S(x)) dF_0(x) , (13.7)
9 Note that we’ve again established the result that, absent an incentive problem, a risk-
neutral player should absorb all the risk when trading with a risk-averse player.
13.4 Multiple-outcomes Model
r(x) ≡ f_0(x)/f_1(x) .
This ratio has a clear statistical meaning: it measures how much more likely
it is that the distribution from which sales have been determined is F_0 rather
than F_1 when outcome x is observed. When r(x) is high, observing x allows the
principal to draw the statistical inference that it is much more likely that the
distribution was actually F0 ; that is, the agent did not choose the action she
wished him to take. In this case,
λ + µ( 1 − f_0(x)/f_1(x) )
is small (but necessarily positive) and S(x) must be small as well. When
r(x) is small, the principal can feel rather confident that her agent acted as
desired and she should, then, reward him highly. That is, outcomes that are
relatively more likely when the agent has behaved in the desired manner result
in larger payments to the agent than outcomes that would be relatively rare if
the agent had behaved in the desired manner.
The minimum-cost incentive contract that induces the costly action a = 1 in
essence commits the principal to behave like a Bayesian statistician who holds
some diffuse prior over which action the agent has taken:10 She should use
the outcome to revise her beliefs about what action the agent took and she
should reward the agent more for outcomes that cause her to revise upward
her beliefs that he took the desired action and she should reward him less
(punish him) for outcomes that cause a downward revision in her beliefs.11 As
a consequence, the payment schedule is connected to outcomes only through
the outcomes’ statistical properties (the relative differences in the densities),
not through their accounting properties. In particular, there is now no reason
to believe that higher outcomes (larger x) should be rewarded more than lower
ones.
As an example of non-monotonic compensation, suppose that there are three
possible outcomes: low, medium, and high (xL , xM , and xH , respectively). Let
10 A diffuse prior is one that assigns positive probability to each possible action.
11 Of course, as a rational player of the game, the principal can infer that, if the contract
is incentive compatible, the agent will have taken the desired action. Thus, there is not, in
some sense, a real inference problem. Rather the issue is that, to be incentive compatible, the
principal must commit to act as if there were an inference problem.
Then
λ + µ( 1 − f_0(x)/f_1(x) ) = λ , if x = x_L ;
 = λ − µ , if x = x_M ;
 = λ + µ/3 , if x = x_H .
Hence, the low outcome is rewarded more than the medium one—a low outcome
is uninformative about the agent’s action, whereas a medium outcome suggests
that the agent has not taken the desired action. Admittedly, non-monotonic
compensation is rarely, if ever, observed in real life. We will see below what
additional properties are required, in this model, to ensure monotonic compen-
sation.
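The three-outcome example can be reproduced numerically. Everything below is hypothetical: the note does not display the underlying densities, so f_1 = (0.2, 0.2, 0.6) and f_0 = (0.2, 0.4, 0.4) are chosen only because they yield likelihood ratios (1, 2, 2/3), matching the case display above; the multipliers λ = 2 and µ = 0.9 and the utility u(s) = √s are likewise illustrative. The first-order condition (cf. (13.10)) gives 1/u′(S(x)) = λ + µ(1 − r(x)); with u(s) = √s, this means S(x) = ([λ + µ(1 − r(x))]/2)².

```python
# Hypothetical reconstruction of the three-outcome example.
# Densities chosen so that r(x) = f0/f1 equals (1, 2, 2/3) at (xL, xM, xH);
# multipliers lam, mu and u(s) = sqrt(s) are illustrative assumptions.
outcomes = ["xL", "xM", "xH"]
f1 = [0.2, 0.2, 0.6]
f0 = [0.2, 0.4, 0.4]
lam, mu = 2.0, 0.9      # keep lam - mu > 0 so 1/u'(S) stays positive

r = [a / b for a, b in zip(f0, f1)]                 # (1.0, 2.0, 2/3)
inv_marginal = [lam + mu * (1 - ri) for ri in r]    # = 1/u'(S(x))

# With u(s) = sqrt(s), u'(s) = 1/(2 sqrt(s)), so S = (inv_marginal/2)**2.
S = [(v / 2) ** 2 for v in inv_marginal]

for x, s in zip(outcomes, S):
    print(x, round(s, 4))
# Compensation is non-monotone: S(xL) > S(xM) even though xM > xL.
```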
Note that somewhat implicit in our analysis to this point is an assumption that
f_1(x) > 0 except, possibly, on a set of x that is impossible (has zero
measure). Without this assumption, (13.8) would entail division by zero, which
is, of course, not permitted. If, however, we let f_1(·) go to zero on some subset
of x that had positive measure under F_0(·), then we see that µ must also tend
to zero because
λ + µ( 1 − f_0(x)/f_1(x) )
must be positive. In essence, then, the shadow price (cost) of the incentive
constraint vanishes as f_1(·) goes to zero. This makes sense: were f_1(·) zero on
some subset of x that could occur (had positive measure) under F0 (·), then the
occurrence of any x in this subset, X0 , would be proof that the agent had failed
to take the desired action. We can use this, then, to design a contract that
induces a = 1, but which costs the principal no more than the optimal full-information
fixed-payment contract S(x) = s_1^F. That is, the incentive problem
ceases to be costly; so, not surprisingly, its shadow cost is zero.
To see how we can construct such a contract when f_1(x) = 0 for all x ∈ X_0,
let
S(x) = s̲ + ε , if x ∈ X_0 ;  and  S(x) = s_1^F , if x ∉ X_0 ,
where ε > 0 is arbitrarily small (s̲, recall, is the greatest lower bound of the
domain of u(·)). Then
∫_0^∞ u(S(x)) dF_1(x) = u(s_1^F)  and
∫_0^∞ u(S(x)) dF_0(x) = ∫_{X_0} u(s̲ + ε) dF_0(x) + ∫_{R_+ \ X_0} u(s_1^F) dF_0(x)
 = u(s̲ + ε) F_0(X_0) + u(s_1^F) ( 1 − F_0(X_0) ) .
13.5 Monotonicity of the Optimal Contract
From the last expression, it is clear that ∫_0^∞ u(S(x)) dF_0(x) → −∞ as ε → 0;
hence, the ic constraint is met trivially. By the definition of s_1^F, ir is also met.
We see, therefore, that this contract implements a = 1 at full-information cost.
Again, as we saw in the two-outcome model, having a shifting support (i.e., the
property that F0 (X0 ) > 0 = F1 (X0 )) allows us to implement the desired action
at full-information cost.
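The shifting-support construction can be checked on a discrete example; all the numbers below are hypothetical. Assume u(s) = ln s on domain (0, ∞), so u(s̲) = −∞ at s̲ = 0; take U_R = 0 and C = 0.5, giving s_1^F = e^C; and suppose F_0(X_0) = 0.3.

```python
import math

# Hypothetical primitives: u(s) = ln(s) on (0, inf), so u(s) -> -inf as
# s -> 0; U_R = 0; C = 0.5; X0 has probability 0.3 under F0, 0 under F1.
U_R, C = 0.0, 0.5
F0_X0 = 0.3
s_F1 = math.exp(U_R + C)   # full-information payment: u(s_F1) - C = U_R

eps = 1e-6                 # near-maximal punishment on X0

# Expected utility from a = 1: X0 never occurs, so the agent gets s_F1.
EU1 = math.log(s_F1) - C
print(EU1 >= U_R)          # True: (ir) holds (with equality)

# Expected utility from a = 0: X0 occurs with probability 0.3, and the
# punishment u(eps) is hugely negative.
EU0 = F0_X0 * math.log(eps) + (1 - F0_X0) * math.log(s_F1)
print(EU1 - C >= EU0 or EU1 >= EU0)   # True: (ic) holds; EU0 -> -inf as eps -> 0

# The expected payment under F1 is exactly s_F1: the principal implements
# a = 1 at full-information cost.
```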
To conclude this section, we need to answer one final question. Does the
principal prefer to induce a = 1 or a = 0 given the “high” cost of the former?
The principal’s choice can be viewed as follows: either she offers the fixed-
payment contract s_0^F, which induces action a = 0, or she offers the contract S(·)
derived above, which induces action a = 1. The expected-profit-maximizing
choice results from the simple comparison of these two contracts; that is, the
principal offers the incentive contract S(·) if and only if
E_0{x} − s_0^F < E_1{x} − E_1{S(x)} . (13.9)
The rhs of this inequality corresponds to the value of the maximization program
(13.2). Given that the incentive constraint is binding in this program, this value
is strictly smaller than the value of the same program without the incentive
constraint; hence, just as we saw in the two-outcome case, the value is smaller
than full-information profits, E_1{x} − s_1^F. Observe, therefore, that it is possible
that
E_1{x} − s_1^F > E_0{x} − s_0^F > E_1{x} − E_1{S(x)} .
In other words, under full information, the principal would induce a = 1, but
not if there’s a hidden-action problem. In this case, imperfect observability of
the agent’s action imposes a cost on the principal that may induce her to distort
the action that she induces the agent to take.
Definition 13.1 The likelihood ratio r(x) = f_0(x)/f_1(x) satisfies the monotone
likelihood ratio property (mlrp) if r(·) is non-increasing almost everywhere and
strictly decreasing on at least some set of outcomes that occur with positive
probability given action a = 1.
The mlrp states that the greater is the outcome (i.e., x), the greater is the relative
probability of x given a = 1 versus given a = 0. In other words, under mlrp,
better outcomes are more likely when the agent pursues the desired action than
when he doesn't. To summarize:
Proposition 13.1 In the model of this section, if the likelihood ratio, r(·), satisfies
the monotone likelihood ratio property, then the optimal incentive contract
for inducing the agent to choose a = 1 is non-decreasing everywhere.
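Definition 13.1 is easy to check on a discrete example. The densities below are hypothetical: f_1 puts more weight on higher outcomes than the uniform f_0, so the likelihood ratio r = f_0/f_1 is decreasing and mlrp holds.

```python
# Hypothetical discrete densities over four outcomes (low to high).
f1 = [0.1, 0.2, 0.3, 0.4]        # desired action: higher outcomes likelier
f0 = [0.25, 0.25, 0.25, 0.25]    # least-cost action: uniform

r = [a / b for a, b in zip(f0, f1)]   # likelihood ratio f0/f1

# mlrp: r is non-increasing (here, strictly decreasing everywhere),
# so the optimal contract is non-decreasing in the outcome.
print(all(hi >= lo for hi, lo in zip(r, r[1:])))   # True
```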
this, suppose it were not true; that is, suppose that r(·) is almost everywhere non-decreasing
under distribution F1 . Note this entails that x and r(x) are non-negatively correlated under
F1 . To make the exposition easier, suppose for the purpose of this aside that fa (·) = Fa′ (·).
Then
E_1{ f_0(x)/f_1(x) } = ∫_0^∞ ( f_0(x)/f_1(x) ) f_1(x) dx = ∫_0^∞ f_0(x) dx = 1 .
Because x and r(x) are non-negatively correlated, we have
∫_0^∞ x ( r(x) − E_1{r(x)} ) f_1(x) dx ≥ 0 .
The left-hand side equals E_0{x} − E_1{x} (since ∫_0^∞ x r(x) f_1(x) dx = E_0{x} and
E_1{r(x)} = 1); hence E_0{x} ≥ E_1{x}.
But this contradicts our assumption that a = 1 yields a greater expected outcome than does
a = 0. Hence, by contradiction it must be that r(·) is decreasing over some measurable
set. But then this means that S(·) is increasing over some measurable set as well. However,
without mlrp, we can’t conclude that it’s not also decreasing over some other measurable set.
Conclusion: If E0 {x} < E1 {x}, then S(·) is increasing over some set of x that has positive
probability of occurring given action a = 1 even if mlrp does not hold.
13 This can most readily be seen from the previous footnote: simply assume that r(·) satisfies
mlrp, which implies x and r(x) are negatively correlated. Then, following the remaining steps,
it quickly falls out that E1 {x} > E0 {x}.
13.6 Informativeness of the Performance Measure
In this section, we explore again the role played by the informativeness of the
performance measure. In particular, we ask what if the principal has multiple
performance measures on which she could base a contract?
To be more concrete, suppose that there is a second performance measure
y. For example, if the agent is a salesperson, x could be his sales of one good,
y his sales of a second good. Both outcomes speak to the salesperson’s overall
effort.
Let f0 (x, y) and f1 (x, y) denote the joint probability densities of x and y
for actions 0 and 1, respectively. An incentive contract can now be a function
of both performance variables; that is, s = S(x, y). It is immediate that the
same approach as before carries through and yields the following optimality
condition:14
u′( S(x, y) ) [ λ + µ( 1 − f_0(x, y)/f_1(x, y) ) ] − 1 = 0 . (13.10)
When is it optimal to make compensation a function of y as well as of x?
The answer is straightforward: when the likelihood ratio,
r(x, y) = f_0(x, y)/f_1(x, y) ,
actually depends upon y. Conversely, when the likelihood ratio is independent
of y, then there is no gain from contracting on y to induce a = 1; indeed, it
would be sub-optimal in this case because such a compensation scheme would
fail to satisfy (13.10).
The likelihood ratio is independent of y if and only if the following holds:
there exist three functions h(·, ·), g0 (·), and g1 (·) such that, for all (x, y),
f_a(x, y) = g_a(x) h(x, y) , a ∈ {0, 1} . (13.11)
14 Although we use the same letters for the Lagrange multipliers, it should be clear that
their values at the optimum are not related to their values in the previous, one-performance-
measure, contracting problem.
Sufficiency is obvious: divide f0 (x, y) by f1 (x, y) and observe the resulting ra-
tio, g0 (x)/g1 (x), is independent of y. Necessity is also straightforward: set
h(x, y) = f1 (x, y), g1 (x) = 1, and g0 (x) = r(x). This condition of multiplica-
tive separability, (13.11), has a well-established meaning in statistics: if (13.11)
holds, then x is a sufficient statistic for the action a given data (x, y). In words,
were we trying to infer a, our inference would be just as good if we observed
only x as it would be if we observed the pair (x, y). That is, conditional on
knowing x, y tells us nothing more about a.
The irrelevance of y when x is a sufficient statistic for a is quite intuitive.
Recall that the value of performance measures to our contracting problem rests
solely on their statistical properties. The optimal contract should be based on
all performance measures that convey information about the agent’s decision;
but it is not desirable to include performance measures that are statistically
redundant with other measures. As a corollary, there is no gain from considering
ex post random contracts (e.g., a contract that based rewards on x + η, where η
is some random variable—noise—distributed independently of a that is added
to x). As a second corollary, if the principal could freely eliminate noise in the
performance measure—that is, switch from observing x + η to observing x—she
would do better (at least weakly).
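The sufficiency condition (13.11) can be illustrated with a small sketch. The particular factors g_0, g_1, and h below are hypothetical, unnormalized weights chosen only to show that multiplicative separability forces the likelihood ratio f_0/f_1 = g_0(x)/g_1(x) to be independent of y.

```python
# Hypothetical factors (unnormalized weights; normalization does not
# affect the likelihood ratio).
def g0(x): return 1.0 + x
def g1(x): return 2.0 * x
def h(x, y): return 1.0 / (1.0 + x + y)   # common factor

def f(a, x, y):
    # multiplicatively separable "densities": f_a(x, y) = g_a(x) h(x, y)
    return (g0(x) if a == 0 else g1(x)) * h(x, y)

for x in (1, 2, 3):
    # collect the likelihood ratio for several y at a fixed x
    ratios = {round(f(0, x, y) / f(1, x, y), 12) for y in (0, 1, 2)}
    print(x, ratios)    # a single ratio per x: y is uninformative about a
```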
Proposition 13.2 If the agent is strictly risk averse, there is no shifting sup-
port, and the principal seeks to implement the action the agent finds costly (i.e.,
a = 1), then the principal’s expected payoffs are smaller than under full (perfect)
information. In some instances, this reduction in expected payoffs may lead the
principal to implement the less costly action (i.e., a = 0).
• When (13.9) holds, the reward schedule imposes risk on the risk-averse
agent: performances that are more likely when the agent takes the correct
action a = 1 are rewarded more than performances that are more likely
under a = 0.
• Under mlrp (or when the agent can sabotage outcomes), the optimal re-
ward schedule is non-decreasing in performance.
• The optimal reward schedule depends only upon performance measures that
are sufficient statistics for the agent’s action.
To conclude, let me stress the two major themes that I would like you to
remember from this section. First, imperfect information implies that the con-
tractual reward designed by the principal should perform two tasks: share the
risks involved in the relationship and provide incentives to induce the agent to
choose the desired action.
Bibliographic Notes
• a principal;
• an agent;
• a set of possible actions, A, from which the agent chooses (we take A to
be exogenously determined here);
• a set of benefits, B, for the principal that are affected by the agent’s action
(possibly stochastically);
1 We could also worry about whether the principal wants to participate—even make a
take-it-or-leave-it offer—but because our focus is on the contract design and its execution,
stages of the game not reached if she doesn't wish to participate, we will not explicitly
consider this issue here.
Lecture Note 14: General Framework
In many settings, including the one explored above, the principal’s benefit is
the same as the verifiable performance measure (i.e., b = x). But this need not
be the case. We could, for instance, imagine that there is a function mapping
the elements of A onto B. For example, the agent’s action could be fixing
the “true” quality of a product produced for the principal. This quality is also,
then, the principal’s benefit (i.e., b = a). The only verifiable measure of quality,
however, is some noisy (i.e., stochastic) measure of true quality (e.g., x = a + η,
where η is some randomly determined distortion). As yet another possibility,
the benchmark case of full information entails X = X ′ × A, where X ′ is some
set of performance measures other than the action.
We need to impose some structure on X and B and their relationship to A:
We take X to be a Euclidean vector space and we let dF (·|a) denote the prob-
ability measure over X conditional on a. Similarly, we take B to be a Euclidean
vector space and we let dG (·, ·|a) denote the joint probability measure over B
and X conditional on a (when b ≡ x, we will write dF (·|a) instead of dG (·, ·|a)).
This structure is rich enough to encompass the possibilities enumerated in the
previous paragraph (and more).
Although we could capture the preferences of the principal and agent without
assuming the validity of the expected-utility approach to decision-making under
uncertainty (we could, for instance, take as primitives the indifference curves
shown in Figures 13.1 and 13.2), this approach has not been taken in the
literature.2 Instead, the expected-utility approach is assumed to be valid and we
let W (s, x, b) and U (s, x, a) denote the respective von Neumann-Morgenstern
utilities of the principal and of the agent, where s denotes the transfer from the
principal to the agent (to principal from agent if s < 0).
In this situation, the obvious contract is a function that maps X into R. We
define such a contract, S : X → R, as a simple contract.
1992; Rabin, 1997), this might, at first, seem somewhat surprising. However, as Epstein, §2.5,
notes, many of the predictions of expected-utility theory are robust to relaxing some of the
more stringent assumptions that support it (e.g., the independence axiom). Given the
tractability of the expected-utility theory combined with the general empirical support for the
predictions of agency theory, the gain from sticking with expected-utility theory would seem
to outweigh the losses, if any, associated with that theory.
must report which action he has chosen. We could even let the principal make
a “good faith” report of what action she believes the agent took, although
this creates its own moral-hazard problem because, in most circumstances, the
principal could gain ex post by claiming she believes the agent’s action was
unacceptable. It turns out, as we will show momentarily, that there is nothing
to be gained by considering such elaborate contracts; that is, there is no such
contract that can improve over the optimal simple contract.
To see this, let us suppose that a contract determines a normal-form game to
be played by both players after the agent has taken his action.3,4 In particular,
suppose the agent takes an action h ∈ H after choosing his action, but prior to
the realization of x; that he takes an action m ∈ M after the realization of x; and
that the principal also takes an action n ∈ N after x has been realized. One or
more of these sets could, but need not, contain a single element, a “null” action.
We assume that the actions in these sets are costless—if we show that costless
elaboration does no better than simple contracts, then costly elaboration also
cannot do better than simple contracts. Finally, let the agent’s compensation
under this elaborate contract be: s = S̃(x, h, m, n). We can now establish the
following:
Proposition 14.1 (Simple contracts are sufficient) For any general con-
tract ⟨H, M, N, S̃(·)⟩ and associated (perfect Bayesian) equilibrium, there ex-
ists a simple contract S(·) that yields the same equilibrium outcome.
Suppose that, facing this contract, the agent chooses an action a different from
a∗ . This implies that:
\[
\int_{\mathcal{X}} U\bigl(S(x), x, a\bigr)\, dF(x|a) > \int_{\mathcal{X}} U\bigl(S(x), x, a^*\bigr)\, dF(x|a^*),
\]
3 Note this may require that there be some way that the parties can verify that the agent
has taken an action. This may simply be the passage of time: The agent must take his action
before a certain date. Alternatively, there could be a verifiable signal that the agent has acted
(but which does not reveal how he’s acted).
4 Considering an extensive-form game with the various steps just considered would not alter
the reasoning that follows; so, we avoid these unnecessary details by restricting attention to
a normal-form game.
Because, in the equilibrium of the normal-form game that commences after the
agent chooses his action, h∗ (·) and m∗ (·, ·) must satisfy the following inequality:
\[
\int_{\mathcal{X}} U\bigl(\tilde S(x, h^*(a), m^*(x,a), n^*(x)), x, a\bigr)\, dF(x|a) \ge
\int_{\mathcal{X}} U\bigl(\tilde S(x, h^*(a^*), m^*(x,a^*), n^*(x)), x, a\bigr)\, dF(x|a),
\]
it follows that
\[
\int_{\mathcal{X}} U\bigl(\tilde S(x, h^*(a), m^*(x,a), n^*(x)), x, a\bigr)\, dF(x|a) >
\int_{\mathcal{X}} U\bigl(\tilde S(x, h^*(a^*), m^*(x,a^*), n^*(x)), x, a^*\bigr)\, dF(x|a^*).
\]
This contradicts the fact that a∗ is an equilibrium action in the game defined
by the original contract. Hence, the simple contract S(·) gives rise to the same
action choice, and therefore to the same distribution of outcomes, as the more
complicated contract.
Observe that choosing S(·) amounts to choosing a as well, at least when there
exists a unique optimal choice for the agent. To take care of the possibility of
multiple optima for the agent, one can simply imagine that the principal chooses
a pair (S(·), a) subject to the incentive constraint (14.1). The ir constraint takes
the simple form:
\[
\max_{a'} \int_{\mathcal{X}} U\bigl(S(x), x, a'\bigr)\, dF(x|a') \ge U_R. \tag{14.2}
\]
In light of the second assumption, we can always satisfy (14.2) for any action a
(there is no guarantee, however, that we can also satisfy (14.1)).
With these two assumptions in hand, suppose that we're in the full-information
case; that is, X = X ′ × A (note X ′ could be a single-element space, so
that we’re also allowing for the possibility that, effectively, the only performance
measure is the action itself). In the full-information case, the principal can rely
on forcing contracts; that is, contracts that effectively leave the agent with no
choice over the action he chooses. Hence, writing (x′ , a) for an element of X , a
forcing contract for implementing â is
\[
S(x', a) = \begin{cases} s_P & \text{if } a \neq \hat a \\ S^F(x') & \text{if } a = \hat a, \end{cases}
\]
where S^F(·) satisfies (14.2). Given that S^F(x′) = s_R satisfies (14.2) by
assumption, we know that we can find an S^F(·) function that satisfies (14.2).
In equilibrium, the agent will choose to sign the contract—the ir constraint is
met—and he will take action â since this is his only possibility for getting at
least his reservation utility. Forcing contracts are very powerful because they
transform the contracting problem into a simple ex ante Pareto program:
\[
\max_{(S(\cdot),\,a)} \int_{\mathcal{X}} W\bigl(S(x), x, b\bigr)\, dG(b, x|a) \tag{14.5}
\]
s.t. (14.2),
where only the agent’s participation constraint matters. This ex ante Pareto
program determines the efficient risk-sharing arrangement for the full-information
optimal action, as well as the full-information optimal action itself. Its
solution characterizes the optimal contract under perfect information.
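As a concrete sketch of this full-information program under a forcing contract, the following Python snippet solves it by grid search for hypothetical primitives: u(s) = √s, so the cheapest payment delivering the reservation utility is C^F(a) = (U_R + c(a))², with B(·) and c(·) chosen purely for illustration.

```python
# A sketch of the full-information program under a forcing contract.
# All primitives here are hypothetical: u(s) = sqrt(s), so the cheapest
# payment delivering utility U_R + c(a) is C_F(a) = (U_R + c(a))**2.

U_R = 0.0

def c(a):        # agent's disutility of action a (hypothetical)
    return a

def B(a):        # principal's expected benefit from action a (hypothetical)
    return 4 - 4 * (a - 0.5) ** 2

def C_F(a):      # full-information cost of implementing a
    return (U_R + c(a)) ** 2

# Grid search over a fine action grid for the surplus-maximizing action.
grid = [i / 1000 for i in range(1001)]
a_star = max(grid, key=lambda a: B(a) - C_F(a))

# With these primitives the first-order condition -8(a - 1/2) = 2a
# gives a = 0.4, and the principal's payoff is B(0.4) - C_F(0.4) = 3.8.
assert abs(a_star - 0.4) < 1e-3
assert abs(B(a_star) - C_F(a_star) - 3.8) < 1e-3
```

Only the participation constraint matters here, exactly as the text notes: the forcing contract removes the agent's choice, so the program reduces to picking the surplus-maximizing action.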
At this point, we’ve gone about as far as we can go without imposing more
structure on the problem. The next couple of Lecture Notes consider more
structured variations of the problem.
Lecture Note 15: The Finite Model
In this Lecture Note, we will assume A, the set of possible actions, is finite with
J elements. Likewise, the set of possible verifiable performance measures, X ,
is also taken to be finite with N elements, indexed by n (although, at the end
of this section, we’ll discuss the case where X = R). In many ways, this is the
most general version of the principal-agent model (but, alas, not necessarily the
most analytically tractable one).
Assume that the agent’s utility is additively separable between payments
and action. Moreover, it is not, directly, dependent on performance. Hence,
As before, we consider as a benchmark the case where the principal can observe
and verify the agent’s action. Consequently, as we discussed at the end of
Lecture Note 14, the principal can implement any action â that she wants using
a forcing contract: The contract punishes the agent sufficiently for choosing
actions a 6= â that he would never choose any action other than â; and the
contract rewards the agent sufficiently for choosing â that he is just willing to
sign the principal's contract. This last condition can be stated formally as
\[
u(\hat s) - c(\hat a) = U_R,
\]
where ŝ is what the agent is paid if he chooses action â. Solving this last
expression for ŝ yields
\[
\hat s = u^{-1}\bigl(U_R + c(\hat a)\bigr) \equiv C^F(\hat a).
\]
The function C^F(·) gives the cost, under full information, of implementing
actions.
\[
\min_{s}\; f(\hat a) \cdot s
\]
subject to
\[
\sum_{n=1}^{N} f_n(\hat a)\, u(s_n) - c(\hat a) \ge U_R
\]
and the corresponding incentive constraints.1 Writing u for the vector whose
nth component is u_n = u(s_n), these constraints can be stated as
\[
f(\hat a) \cdot u - c(\hat a) \ge U_R \tag{ir}
\]
and
\[
f(\hat a) \cdot u - c(\hat a) \ge f(a) \cdot u - c(a) \quad \forall a \in \mathcal{A}. \tag{ic}
\]
1 Observe, given the separability between the principal's benefit and cost, minimizing her
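To make the (ir)/(ic) program concrete, here is a minimal Python sketch with two outcomes, two actions, and u(s) = √s; all numbers are hypothetical. With both constraints binding, the contract solves a 2×2 linear system, and the resulting expected cost exceeds the full-information cost C^F(â).

```python
# A minimal numerical sketch of the (ir)/(ic) program with two outcomes
# and two actions. All numbers are hypothetical; utility is u(s) = sqrt(s),
# so a promised utility level u_n costs the principal s_n = u_n ** 2.

U_R = 1.0
c_hat, c_low = 1.0, 0.0            # costs of a_hat and the low action
f_hat = (0.25, 0.75)               # density over outcomes under a_hat
f_low = (0.75, 0.25)               # density under the low action

# With both (ir) and (ic) binding, (u1, u2) solves the linear system
#   f_hat . u - c_hat = U_R
#   (f_hat - f_low) . u = c_hat - c_low
# Here: 0.25 u1 + 0.75 u2 = 2 and -0.5 u1 + 0.5 u2 = 1, so u2 = u1 + 2.
u1 = (U_R + c_hat - 0.75 * 2.0) / (0.25 + 0.75)
u2 = u1 + 2.0
assert abs(u1 - 0.5) < 1e-12 and abs(u2 - 2.5) < 1e-12

expected_cost = f_hat[0] * u1 ** 2 + f_hat[1] * u2 ** 2
full_info_cost = (U_R + c_hat) ** 2     # C_F(a_hat) = u^{-1}(U_R + c)**... here (U_R + c)^2

# The hidden-action cost strictly exceeds the full-information cost.
assert expected_cost > full_info_cost
assert abs(expected_cost - 4.75) < 1e-12
```

The gap between 4.75 and 4 is the risk premium the principal must pay to impose outcome-contingent pay on a risk-averse agent, anticipating the themes of Propositions 15.7 and 15.8.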
Proof: Suppose not: let u be a contract that implements â and suppose that
(ir) is slack under it. Define
\[
\varepsilon = f(\hat a) \cdot u - c(\hat a) - U_R > 0
\]
and consider the contract ũ with ũ_n = u_n − ε for every n. Because
\[
f(a) \cdot \tilde u = f(a) \cdot u - \varepsilon
\]
for all a ∈ A, this new contract also satisfies (ic). Observe, too, that this new
contract is superior to u: it satisfies the constraints, but costs the principal less.
Hence, a contract cannot be optimal unless (ir) is an equality under it.
Because (ir) binds at an optimum (Proposition 15.1),
\[
f(\hat a) \cdot u = U_R + c(\hat a) \tag{15.2}
\]
and
\[
f(a) \cdot u \le U_R + c(a) \quad \forall a \in \mathcal{A} \tag{15.3}
\]
(where (15.3) follows from (ic) and (15.2)). We are now in position to establish
the following proposition:
Proof: Let j = 1, . . . , J − 1 index the elements in A other than â. Then the
system (15.2) and (15.3) can be written as J + 1 inequalities:
\[
\begin{aligned}
f(\hat a) \cdot u &\le U_R + c(\hat a) \\
[-f(\hat a)] \cdot u &\le -U_R - c(\hat a) \\
f(a_1) \cdot u &\le U_R + c(a_1) \\
&\;\;\vdots \\
f(a_{J-1}) \cdot u &\le U_R + c(a_{J-1})
\end{aligned}
\]
By a well-known result in convex analysis (see, e.g., Rockafellar, 1970, page 198),
there is a u that solves this system if and only if there is no non-negative vector
(μ̂₊, μ̂₋, μ₁, . . . , μ_{J−1}) such that
\[
\hat\mu_+ f(\hat a) - \hat\mu_- f(\hat a) + \sum_{j=1}^{J-1} \mu_j f(a_j) = 0 \tag{15.4}
\]
and
\[
\hat\mu_+ \bigl(U_R + c(\hat a)\bigr) + \hat\mu_- \bigl(-U_R - c(\hat a)\bigr) + \sum_{j=1}^{J-1} \mu_j \bigl(U_R + c(a_j)\bigr) < 0. \tag{15.5}
\]
Observe that if such a µ exists, then (15.5) entails that not all elements can be
zero. Define µ∗ = µ̂+ −µ̂− . By post-multiplying (15.4) by 1N (an N -dimensional
vector of ones), we see that
\[
\mu^* + \sum_{j=1}^{J-1} \mu_j = 0. \tag{15.6}
\]
and
\[
c(\hat a) > \sum_{j=1}^{J-1} \sigma_j c(a_j); \tag{15.8}
\]
that is, there is a contract u that solves the above system of inequalities if and
only if there is no (mixed) strategy that induces the same density over the per-
formance measures as â (i.e., satisfies (15.7)) and that has lower expected cost
(i.e., satisfies (15.8)).
Proof: Consider the fixed-payment contract that pays the agent u_n = U_R + c(ã)
for all n. This contract clearly satisfies (ir) and, because c(ã) ≤ c(a) for
all a ∈ A, it also satisfies (ic). The cost of this contract to the principal is
2 We can formalize this notion of informationally distinct as follows: the condition that
no strategy duplicate the density over performance measures induced by â is equivalent to
saying that there is no density (strategy) (σ1 , . . . , σJ−1 ) over the other J − 1 elements of A
such that
\[
f(\hat a) = \sum_{j=1}^{J-1} \sigma_j f(a_j).
\]
Mathematically, that's equivalent to saying that f (â) is not a convex combination of
{f (a)}_{a∈A\{â}}; or, equivalently, that f (â) is not in the convex hull of {f (a)|a ≠ â}. See
Hermalin and Katz (1991) for more on this "convex-hull" condition and its interpretation.
Finally, from Proposition 15.2, the condition that f (â) not be in the convex hull of {f (a)|a ≠ â}
is sufficient for â to be implementable.
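The convex-hull condition is easy to check numerically. With two outcomes, f(â) is pinned down by its second component, so f(â) lies in the convex hull of the other actions' densities exactly when f₂(â) lies between their f₂ values; the densities below are hypothetical.

```python
# A numerical check of the "convex-hull" condition in footnote 2, with
# hypothetical two-outcome densities. With N = 2 outcomes, f(a) is pinned
# down by its second component, so f(a_hat) lies in the convex hull of
# {f(a) | a != a_hat} iff f_2(a_hat) lies between the other f_2 values.

f2 = {"a_hat": 0.6, "a1": 0.3, "a2": 0.9}   # hypothetical f_2(a) values

lo = min(f2["a1"], f2["a2"])
hi = max(f2["a1"], f2["a2"])
in_hull = lo <= f2["a_hat"] <= hi
assert in_hull   # here f(a_hat) IS a convex combination of f(a1), f(a2)

# The weight sigma on a1 solving f(a_hat) = sigma f(a1) + (1-sigma) f(a2):
sigma = (f2["a_hat"] - f2["a2"]) / (f2["a1"] - f2["a2"])
assert abs(sigma - 0.5) < 1e-12
```

When `in_hull` is `False`, the sufficiency direction of Proposition 15.2 guarantees â is implementable; when it is `True`, implementability turns on whether the duplicating mixed strategy is also cheaper, i.e., on condition (15.8).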
u^{-1}(U_R + c(ã)) = C^F(ã), the full-information cost.
where the last inequality follows from the fact that (ir) is binding.
Define
\[
t^* = B(a^*) - C^F(a^*).
\]
Suppose the principal offers to sell the right to her benefit to the agent for t∗ .
If the agent accepts, then the principal will enjoy the same expected payoff she
would have enjoyed under full information. Will the agent accept? If he accepts,
he faces the problem
\[
\max_{a \in \mathcal{A}} \int_{\mathcal{B}} (b - t^*)\, dG(b|a) - c(a).
\]
This is equivalent to
People often dismiss the case where the agent is risk neutral by claiming that
there is no agency problem because the principal could "sell the store (productive
asset)" to the agent. As this last proposition makes clear, such a conclusion
relies critically on the ability to literally sell the asset; that is, if the principal’s
benefit is not alienable, then this conclusion might not hold.4 In other words, it
is not solely the agent’s risk aversion that causes problems with a hidden action.
Corollary 15.1 Assume the agent is risk neutral; that is, u(·) is affine. As-
sume, too, that the principal’s benefit equals the performance measure (i.e.,
B = X and G(·|a) = F (·|a)). Then the principal can achieve the same expected
utility with a hidden-action problem as she could under full information.
Exercise 15.1.1: Prove Corollary 15.1 (Hint: let s(x) = x − t, where t is a constant.)
Now we turn our attention to the case where u(·) is strictly concave (the
agent is risk averse). Observe (i) this entails that u−1 (·) is strictly convex; (ii)
because S is an open interval, that u(·) is continuous; and (iii) that u−1 (·) is
continuous.
4 To see this, suppose the benefit is unalienable. Assume, too, that A = {1/4, 1/2, 3/4},
X = {1, 2}, c(a) = a, f₂(a) = √a, U_R = 0, and B(a) = 4 − 4(a − 1/2)². Then it is readily
seen that a∗ = 1/2. However, from Proposition 15.2, a∗ is not implementable, so the full-
information outcome is unobtainable when the action is hidden (even though the agent is risk
neutral).
Proposition 15.7 Assume that the agent is strictly risk averse in income; that
is, u (·) is strictly concave. If â is implementable, then there exists a unique
contract that implements â at minimum expected cost.
The strict convexity and continuity of u−1 (·) imply that Ω is also a strictly convex
and continuous function. Observe that the principal's problem is to choose
u to minimize Ω(u) subject to (ir) and (ic). Let U be the set of contracts
that satisfy (ir) and (ic) (by assumption, U is not empty). Were U closed and
bounded, then a solution to the principal's problem would certainly exist
because Ω is a continuous real-valued function.6 Unfortunately, U is not bounded
(although it is closed given that all the inequalities in (ir) and (ic) are weak
inequalities). Fortunately, we can artificially bound U by showing that any
solution outside some bound is inferior to a solution inside the bound. Consider
any contract u0 ∈ U and consider the contract u∗ , where u∗n = UR + c(â). Let
U ir be the set of contracts that satisfy (ir). Note that U ⊂ U ir . Note, too, that
both U and U ir are convex sets.
5 The existence portion of this proof is somewhat involved mathematically and can be
(15.9)). It is readily seen that if these two contracts each satisfy both the (ir)
and (ic) constraints, then any convex combination of them must as well (i.e.,
both are elements of U , which is convex). That is, the contract
\[
u_\lambda \equiv \lambda u + (1 - \lambda)\tilde u,
\]
λ ∈ (0, 1), must be feasible (i.e., satisfy (ir) and (ic)). Because Ω is strictly
convex, Jensen's inequality implies
\[
\Omega(u_\lambda) < \lambda \Omega(u) + (1 - \lambda)\Omega(\tilde u) = \Omega(u).
\]
But this contradicts the optimality of u. By contradiction, uniqueness is
established.
We’ve already seen (Proposition 15.1) that the ir constraint binds, hence λ > 0.
Because â is not a least-cost action and there is no shifting support, it is readily
shown that at least one ic constraint binds (i.e., ∃j such that µj > 0). It’s
convenient to rewrite the first-order condition as
\[
\frac{1}{u'\bigl(u^{-1}(u_n)\bigr)} = \lambda + \sum_{j=1}^{J-1} \mu_j \left(1 - \frac{f_n(a_j)}{f_n(\hat a)}\right); \quad n = 1, \ldots, N. \tag{15.10}
\]
Note the resemblance between (15.10) and (13.8) in Section 13.4. The difference
is that, now, we have more than one Lagrange multiplier on the actions (as
we now have more than two actions). In particular, we can give a similar
interpretation to the likelihood ratios, fn (aj )/fn (â), that we had in that earlier
section; with the caveat that we now must consider more than one action.
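To see how (15.10) translates likelihood ratios into pay, consider a small Python sketch with u(s) = √s, so that 1/u′(s) = 2√s and the payment solves s_n = (rhs/2)². The multipliers and likelihood ratios below are hypothetical, chosen only to illustrate the monotone pattern.

```python
# How (15.10) shapes pay: with u(s) = sqrt(s), 1/u'(s) = 2*sqrt(s), so
#   s_n = ((lam + sum_j mu_j * (1 - L[j][n])) / 2) ** 2,
# where L[j][n] = f_n(a_j) / f_n(a_hat) is a likelihood ratio. Multipliers
# and ratios below are hypothetical, chosen only to illustrate monotonicity.

lam = 2.0
mu = [0.8, 0.4]                      # multipliers on two binding (ic)'s
# Likelihood ratios f_n(a_j)/f_n(a_hat), decreasing in n (mlrp-like):
L = [[1.8, 1.0, 0.4],                # action a_1
     [1.5, 1.1, 0.6]]                # action a_2

def payment(n):
    rhs = lam + sum(mu[j] * (1 - L[j][n]) for j in range(len(mu)))
    return (rhs / 2) ** 2

s = [payment(n) for n in range(3)]
# Outcomes with lower likelihood ratios (stronger evidence of a_hat)
# are rewarded more: payments increase across n here.
assert s[0] < s[1] < s[2]
```

As in the two-action case, outcomes that are relatively more likely under the cheaper actions carry low pay; the difference is simply that the evidence is now aggregated across several binding constraints.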
3. Consider two principal-agent models that are identical except that the
information structure (i.e., {f (a)|a ∈ A}) in one is more informative than
the information structure in the other. How do the costs of implementing
actions vary between these two models?
Because u(·) is strictly concave, the principal’s expected cost if the agent chooses
â under contract u, Ω(u), is a strictly convex function of u. By Jensen's
inequality and the fact that there is no shifting support, Ω, therefore, has a unique
minimum in U ir , namely u∗ . Clearly, Ω(u∗ ) = C^F(â). The result, then,
follows if we can show that u∗ is not incentive compatible. Given that â is not a
least-cost action, there exists an a such that c(â) > c(a). But
Assuming â to be implementable, note the elements that go into this last
proposition: there must be an agency problem—misalignment of interests (i.e., â is
not least cost); there must, in fact, be a significant hidden-action problem (i.e.,
no shifting support); and the agent must be risk averse. We saw earlier that
without any one of these elements, an implementable action is implementable
at full-information cost (Propositions 15.3–15.5); that is, each element is
individually necessary for cost to increase when we go from full information to a
hidden action. This last proposition shows, inter alia, that they are collectively
sufficient for the cost to increase.
15.2 Properties of the Optimal Contract
Next we turn to the second question. We already know from our analysis of
the two-action model that the assumptions we have so far made are insufficient
for us to conclude that compensation will be monotonic. From our analysis
of that model, we might expect that we need some monotone likelihood ratio
property. In particular, we assume
Intuitively, mlrp is the condition that actions that the agent finds more costly
be more likely to produce better outcomes.
Unlike the two-action case, however, mlrp is not sufficient for us to obtain
monotone compensation (see Grossman and Hart, 1983, for an example in which
mlrp is satisfied but compensation is non-monotone). We need an additional
assumption:
Another way to state the cdfp is that the distribution over performance is
better—more likely to produce high signals—if the agent plays a pure strategy
than it is if he plays any mixed strategy over two actions when that mixed
strategy has the same expected disutility as the pure strategy.
We can now answer the second question:
Proposition 15.9 Assume there is no shifting support, that u(·) is strictly
concave and differentiable, and that mlrp and cdfp are met. Then the optimal
contract given the hidden-action problem satisfies s1 ≤ · · · ≤ sN .
The result then follows if we can show that this contract remains optimal when
we expand A′ to A—adding actions cannot reduce the cost of implementing â,
hence we are done if we can show that the optimal contract for the restricted
problem is incentive compatible in the unrestricted problem. That is, if there
is no a, c(a) > c(â), such that
But this and (15.12) are inconsistent with (15.11); that is, (15.11) cannot hold,
as was required.
and
\[
P_2 = \bigl\langle \mathcal{A}, \mathcal{X}, F_2, B(\cdot), c(\cdot), u(\cdot), U_R \bigr\rangle.
\]
Suppose there exists a stochastic transformation matrix Q (i.e., a garbling),9
such that f 2 (a) = Qf 1 (a) for all a ∈ A, where f i (a) denotes an element of
Fi . Then, for all a ∈ A, the principal’s expected cost of optimally implementing
action a in the first principal-agent problem, P1 , is not greater than her expected
cost of optimally implementing a in the second principal-agent problem, P2 .
9 A stochastic transformation matrix is a matrix in which each column is a probability
density (i.e., has non-negative elements that sum to one).
Proposition 15.10 states that if two principal-agent problems are the same,
except that they have different information structures, where the information
structure of the first problem is more informative than the information structure
of the second problem (in the sense of Blackwell’s Theorem), then the principal’s
expected cost of optimally implementing any action is no greater in the first
problem than in the second problem. By strengthening the assumptions slightly,
we can, in fact, conclude that the principal’s expected cost is strictly less in the
first problem. In other words, making the signal more informative about the
agent’s action makes the principal better off. This is consistent with our earlier
findings that (i) the value of the performance measures is solely their statistical
properties as correlates of the agent’s action; and (ii) the better correlates—
technically, the more informative—they are, the lower the cost of the hidden-
action problem.
It is worth observing that Proposition 15.10 implies that the optimal incentive
scheme never entails paying the agent with lotteries over money (i.e., randomly
mapping the realized performance levels via weights Q into payments).
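Proposition 15.10 can be illustrated numerically. In the two-outcome, two-action setting with u(s) = √s and binding (ir) and (ic), garbling the information structure raises the principal's cost of implementing â; all numbers below are hypothetical.

```python
# A numerical sketch of Proposition 15.10 with two outcomes, two actions,
# and u(s) = sqrt(s). All numbers are hypothetical. The second information
# structure is a garbling (via a column-stochastic Q) of the first, and the
# principal's cost of implementing a_hat is higher under the garbled one.

U_R, c_hat, c_low = 1.0, 1.0, 0.0

def cost(f_hat, f_low):
    """Expected cost with (ir) and (ic) binding; payments s_n = u_n**2."""
    du = (c_hat - c_low) / (f_hat[1] - f_low[1])   # u2 - u1 from (ic)
    u1 = U_R + c_hat - f_hat[1] * du               # from (ir), f sums to 1
    u2 = u1 + du
    return f_hat[0] * u1 ** 2 + f_hat[1] * u2 ** 2

f1_hat, f1_low = (0.1, 0.9), (0.9, 0.1)            # informative structure

# Garble with Q = [[0.8, 0.2], [0.2, 0.8]]: f2(a) = Q f1(a).
def garble(f):
    return (0.8 * f[0] + 0.2 * f[1], 0.2 * f[0] + 0.8 * f[1])

f2_hat, f2_low = garble(f1_hat), garble(f1_low)

# Less informative signals => higher implementation cost.
assert cost(f1_hat, f1_low) < cost(f2_hat, f2_low)
```

The garbled densities are closer together, so larger spreads in promised utility are needed to separate the actions, and the risk-averse agent must be compensated for bearing that extra risk.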
15.3 A Continuous Performance Measure
Suppose that X were a real interval—which, without loss of generality, we can
take to be R—rather than a discrete space and suppose, too, that F (x|a) were
10 For this proof it is necessary to distinguish between row vectors and column vectors, as
well as transposes of matrices. All vectors should be assumed to be column vectors. To make
a vector x a row vector, we write x⊤ . Observe that H⊤ , where H is a matrix, is the transpose
of H. Observe that x⊤ y is the dot-product of x and y (what we’ve been writing as x · y).
subject to
\[
\int_{-\infty}^{\infty} u(x) f(x|\hat a)\, dx - c(\hat a) \ge U_R; \quad \text{and}
\]
\[
\int_{-\infty}^{\infty} u(x) f(x|\hat a)\, dx - c(\hat a) \ge \int_{-\infty}^{\infty} u(x) f(x|a)\, dx - c(a) \quad \forall a \in \mathcal{A}.
\]
11 Where our existence proof “falls down” when X is continuous is that our proof relies on
the fact that a continuous function from RN → R has a minimum on a closed and bounded
set. But, here, the contract space is no longer a subset of RN , but rather the space of all
functions from X → R; and there is no general result guaranteeing the existence of a minimum
in this case.
12 Page (1987) considers conditions for existence in this case (actually he also allows for
A to be a continuous space). Most of the assumptions are technical, but not likely to be
considered controversial. Arguably a problematic assumption in Page is that the space of
possible contracts is constrained; that is, assumptions are imposed on an endogenous feature
of the model, the contracts. In particular, if S is the space of permitted contracts, then there
exist L and M ∈ R such that L ≤ s(x) ≤ M for all s(·) ∈ S and all x ∈ X . Moreover,
S is closed under the topology of pointwise convergence. On the other hand, it could be
argued that the range of real-life contracts must be bounded: legal and other constraints on
what payments the parties can make effectively limit the space of contracts to some set of
bounded functions.
13 That is,
\[
\{x \mid f(x|a) > 0\} = \{x \mid f(x|a') > 0\}
\]
for all a and a′ in A.
Bibliographic Note
Much of the analysis in this section has been drawn from Grossman and Hart
(1983). In particular, they deserve credit for Propositions 15.1, 15.5, and 15.7–
15.10 (although, here and there, we've made slight modifications to the
statements or proofs). Proposition 15.2 is based on Hermalin and Katz (1991). The
rest of the analysis represents well-known results.
Lecture Note 16: Continuous Action Space
So far, we’ve limited attention to finite action spaces. Realistic though this may
be, it can serve to limit the tractability of many models, particularly when we
need to assume the action space is large. A large action space can be problematic
for two, related, reasons. First, under the two-step approach, we are obligated
to solve for the optimal contract for each a ∈ A (or at least each a ∈ AI ) then,
letting C(a) be the expected cost of inducing action a under its corresponding
optimal contract, we next maximize B(a) − C(a)—expected benefit net of expected
cost. If A is large, then this is clearly a time-consuming and potentially
impractical method for solving the principal-agent problem. The second reason a
large action space can be impractical is that it can mean many constraints
in the optimization program involved with finding the optimal contract for a
given action (recall, e.g., that we had J − 1 constraints—one for each action
other than the given action). Again, this raises issues about the practicality of
solving the problem.
These problems suggest that we would like a technique that allows us to
solve program (14.3) on page 197,
\[
\max_{(S(\cdot),\,a)} \int_{\mathcal{X}} W\bigl(S(x), x, b\bigr)\, dG(b, x|a) \tag{16.1}
\]
subject to
\[
a \in \arg\max_{a'} \int_{\mathcal{X}} U\bigl(S(x), x, a'\bigr)\, dF(x|a') \tag{16.2}
\]
and
\[
\max_{a'} \int_{\mathcal{X}} U\bigl(S(x), x, a'\bigr)\, dF(x|a') \ge U_R,
\]
directly, in a one-step procedure. Generally, to make such a maximization
program tractable, we would take A to be a compact and continuous space (e.g., a
closed interval on R), and employ standard programming techniques. A number
of complications arise, however, if we take such an approach.
Most of these complications have to do with how we treat the ic constraint,
expression (16.2). To make life simpler, suppose that A = [a, ā] ⊂ R, X = R,
that F (·|a) is differentiable and, moreover, that the expression in (16.2) is itself
differentiable for all a ∈ A. Then, a natural approach would be to observe that
if a ∈ (a, ā) maximizes that expression, it must necessarily satisfy the
first-order condition associated with (16.2):
\[
\int_{\mathcal{X}} \Bigl[ U_a\bigl(S(x), x, a\bigr) f(x|a) + U\bigl(S(x), x, a\bigr) f_a(x|a) \Bigr] dx = 0 \tag{16.3}
\]
\[
W\bigl(S(x), x, b\bigr) = x - S(x),
\]
Observe Assumptions 5–7 allow us, inter alia, to assume that c (a) = a without
loss of generality. Assumption 5 is known as a spanning condition.
In what follows, the following result will be critical:
so we have
\[
\frac{d}{da} \int_{\mathcal{X}} U\bigl(S(x), x, a\bigr)\, dF(x|a) = \int_{\underline{x}}^{\bar x} u\bigl(S(x)\bigr) \bigl(f_H(x) - f_L(x)\bigr) \gamma'(a)\, dx > 0
\]
whenever S(·) is increasing, which is what justifies substituting the agent's
first-order condition for (16.2).
We’ll proceed as follows. We’ll suppose that S(·) is increasing and we’ll solve
the principal’s problem. Of course, when we’re done, we’ll have to double check
that our solution indeed yields an increasing S(·). It will, but if it didn’t, then
our approach would be invalid. The principal’s problem is
\[
\max_{S(\cdot),\,a} \int_{\underline{x}}^{\bar x} \bigl(x - S(x)\bigr) f(x|a)\, dx
\]
As we’ve shown many times now, this last constraint must be binding; so we
have a classic constrained optimization program. Letting λ be the Lagrange
multiplier on the ir constraint and letting µ be the Lagrange multiplier on
(16.5), we obtain the first-order conditions:
\[
-f(x|a) + \mu u'\bigl(S(x)\bigr) \bigl(f_H(x) - f_L(x)\bigr) \gamma'(a) + \lambda u'\bigl(S(x)\bigr) f(x|a) = 0
\]
where r(x) = fL (x)/fH (x). Recall that 1/u′ (·) is an increasing function; hence,
to test whether S(·) is indeed increasing, we need to see whether the rhs is
decreasing in r(x) given r(·) is decreasing. Straightforward calculations reveal
that the derivative of the rhs with respect to r(x) is
\[
\frac{-\gamma'(a)}{\bigl(r(x) + (1 - r(x))\gamma(a)\bigr)^2} < 0.
\]
We’ve therefore shown that S(·) is indeed increasing as required; that is, our
use of (16.5) for (16.2) was valid.
Observe, from (16.6), that, because the agent’s second-order condition is
met, the first line in (16.6) must be positive; that is,
\[
\int_{\underline{x}}^{\bar x} \bigl(x - S(x)\bigr) \bigl(f_H(x) - f_L(x)\bigr)\, dx > 0.
\]
But this implies that, for this S(·), the principal’s problem is globally concave
in a:
\[
\frac{d^2}{da^2} \int_{\underline{x}}^{\bar x} \bigl(x - S(x)\bigr) f(x|a)\, dx = \int_{\underline{x}}^{\bar x} \bigl(x - S(x)\bigr) \bigl(f_H(x) - f_L(x)\bigr) \gamma''(a)\, dx < 0.
\]
Moreover, for any S(·), the principal’s problem is (trivially) concave in S(·).
Hence, we can conclude that the first-order approach is, indeed, valid for this
problem.
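A quick numerical sketch (hypothetical densities and multipliers, with u(s) = √s) confirms the logic above: under the spanning condition, the contract implied by the first-order condition is increasing whenever the likelihood ratio f_L/f_H is decreasing.

```python
# A numerical sketch of the optimal contract under the spanning condition
#   f(x|a) = gamma(a) f_H(x) + (1 - gamma(a)) f_L(x),  with u(s) = sqrt(s).
# From the first-order condition, 1/u'(S(x)) = 2 sqrt(S(x)) equals
#   lam + mu * gamma'(a) * (f_H - f_L) / f,
# so S(x) rises as r(x) = f_L/f_H falls. Multipliers and densities below
# are hypothetical, chosen only to illustrate the shape of S(.).

lam, mu, gamma_a, dgamma = 2.0, 0.5, 0.6, 1.0   # gamma(a), gamma'(a)

# Discrete performance grid; f_L/f_H decreasing in x (mlrp-like).
f_H = [0.1, 0.2, 0.3, 0.4]
f_L = [0.4, 0.3, 0.2, 0.1]

def S(n):
    f = gamma_a * f_H[n] + (1 - gamma_a) * f_L[n]
    rhs = lam + mu * dgamma * (f_H[n] - f_L[n]) / f
    return (rhs / 2) ** 2

pay = [S(n) for n in range(4)]
assert pay == sorted(pay)   # S(.) is increasing, as the argument requires
```

Higher realizations of x are relatively more likely to have been drawn from the favorable distribution F_H, so they earn the agent more; this is the self-consistency check that validates the use of (16.5) in place of (16.2).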
Admittedly, the spanning condition is a fairly stringent condition, although
it does have an economic interpretation. Suppose there are two distributions
from which the performance measure could be drawn, “favorable” (i.e., FH ) and
“unfavorable” (i.e., FL ). The harder the agent chooses to work—the higher is
a—the greater the probability, γ(a), that the performance measure will be drawn
from the favorable distribution. For instance, suppose there are two types of
potential customers, those who tend to buy a lot—the H type—and those who
tend not to buy much—the L type. By investing more effort, a, in learning his
territory, a salesperson (agent) increases the probability that he will sell to H
types rather than L types.
Bibliographic Note
The first papers to use the first-order approach were Holmstrom (1979) and
Shavell (1979). Grossman and Hart (1983) was, in large part, a response to the
potential invalidity of the first-order approach. The analysis under the spanning
condition draws, in part, from Hart and Holmström (1987).
Bibliography
Basov, Suren, Multidimensional Screening, Berlin: Springer-Verlag, 2010.
van Tiel, Jan, Convex Analysis: An Introductory Text, New York: John Wiley
& Sons, 1984.
Yeh, James, Real Analysis: Theory of Measure and Integration, 2nd ed., Sin-
gapore: World Scientific, 2006.