LectureNotes201b_v9

This document contains lecture notes by Benjamin E. Hermalin, covering topics in economics, including pricing, mechanism design, and incentives. It is copyrighted material, allowing personal use but prohibiting distribution or resale. The notes are intended to supplement lectures for Economics 201b at UC Berkeley.


Copyright © 2015 by Benjamin E. Hermalin. All rights reserved.

Important Notice


This material is copyrighted and the author retains all rights. Your use of it is
subject to the following. You may make copies of it for your personal use, but
not for distribution or resale. Any copy must contain this notice page. You are
prohibited from posting or distributing this material electronically.
Faculty interested in assigning all or part of this material in their courses should
contact the author at [email protected].
Contents

Preface v

I Pricing 1
Purpose 3

1 Buyers and Demand 5


1.1 Consumer Demand . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Firm Demand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Demand Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Additional Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Linear Tariffs 17
2.1 Elasticity and Linear Pricing . . . . . . . . . . . . . . . . . . . . 21
2.2 Welfare Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 An Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4 Pass-Through . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3 First-degree Price Discrimination 39


3.1 Two-Part Tariffs . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

4 Third-degree Price Discrimination 45


4.1 The Notion of Type . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2 Characteristic-based Discrimination . . . . . . . . . . . . . . . . 46
4.3 Welfare Considerations . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4 Arbitrage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.5 Capacity Constraints . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.6 Transportation Costs . . . . . . . . . . . . . . . . . . . . . . . . . 51

5 Second-degree Price Discrimination 61


5.1 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.2 A Graphical Approach to Quantity Discounts . . . . . . . . . . . 65

II Mechanism Design 73
Purpose 75


6 The Basics of Contractual Screening 77

7 The Two-Type Screening Model 79


7.1 A Simple Two-Type Screening Situation . . . . . . . . . . . . . . 79
7.2 Contracts under Incomplete Information . . . . . . . . . . . . . . 81

8 General Screening Framework 91

9 The Standard Framework 97

10 Mechanism Design with Multiple Agents 115


10.1 Mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
10.2 Independent Allocations . . . . . . . . . . . . . . . . . . . . . . . 118
10.3 Dependent Allocations . . . . . . . . . . . . . . . . . . . . . . . . 123

11 Auctions 133
11.1 Efficient Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . 133
11.2 Allocation via Bayesian-Nash Mechanisms . . . . . . . . . . . . . 137
11.3 Common Value Auctions . . . . . . . . . . . . . . . . . . . . . . . 148
11.4 Appendix to Lecture Note 11: Stochastic Orders . . . . . . . . . 155
11.5 Appendix to Lecture Note 11: Affiliation . . . . . . . . . . . . . . 159

III Hidden Action and Incentives 163


Purpose 165

12 The Moral-Hazard Setting 167

13 Basic Two-Action Model 171


13.1 The Two-action Model . . . . . . . . . . . . . . . . . . . . . . . . 171
13.2 The Optimal Incentive Contract . . . . . . . . . . . . . . . . . . 174
13.3 Two-outcome Model . . . . . . . . . . . . . . . . . . . . . . . . . 176
13.4 Multiple-outcomes Model . . . . . . . . . . . . . . . . . . . . . . 183
13.5 Monotonicity of the Optimal Contract . . . . . . . . . . . . . . . 187
13.6 Informativeness of the Performance Measure . . . . . . . . . . . . 189
13.7 Conclusions from the Two-action Model . . . . . . . . . . . . . . 190

14 General Framework 193

15 The Finite Model 199


15.1 The “Two-step” Approach . . . . . . . . . . . . . . . . . . . . . . 200
15.2 Properties of the Optimal Contract . . . . . . . . . . . . . . . . . 209
15.3 A Continuous Performance Measure . . . . . . . . . . . . . . . . 213

16 Continuous Action Space 217


16.1 The First-order Approach with a Spanning Condition . . . . . . 218

Bibliography 223

Index 227
Preface
These lecture notes are intended to supplement the lectures and other materials
for the second half of Economics 201b at the University of California, Berkeley.

A Word on Notation
Various typographic conventions are used to help guide you through these notes.
Text that looks like this is an important definition; on the screen, or printed
using a color printer, such definitions should appear blue. (A note in the margin,
“These denote important ‘takeaways,’” flags them.)
A symbol in the margin denotes a paragraph that may be hard to follow and,
thus, requires particularly close attention (not that you should read any of
these notes without paying close attention).

Mathematics Notation

Vectors are typically denoted in bold (e.g., x and w are vectors). In some
instances, when the focus is on the nth element of a vector x, I will write
x = (xn , x−n ). The vector x−n is the subvector formed by removing the nth
element (note its dimension is one less than x’s).
Script letters (e.g., A, B, etc.) generally denote sets. Objects in curly
brackets, { and }, are typically elements of sets. Hence, {a, b, c} is the three-
element set with elements a, b, and c. The notation {x|P} indicates the set of
x satisfying the property P. For instance, {x|x < 2 or x ≥ 3} is the set of all
numbers less than 2 or not less than 3. The usual set notation of ∪ for set union
and ∩ for set intersection will be used. The empty set is denoted ∅.
Intervals can be open, closed, or open at one end and closed at the other:
(a, b) = {x|a < x < b} is open, [a, b] = {x|a ≤ x ≤ b} is closed, (a, b] = {x|a <
x ≤ b} is open to the left and closed to the right, and [a, b) = {x|a ≤ x < b} is
open to the right and closed to the left. The sets (a, ∞) and [a, ∞) are the sets
of all numbers greater than a and all numbers not less than a, respectively.
A function, f, will be indicated by its mapping; that is, f : D → R. The
set D is the domain of the function, the set R the range, and the set
f(D) ≡ {y | y = f(x) for some x ∈ D} is the image of f. A function can be denoted
either by its symbol (e.g., f); or, when it might be confused for a variable, by
f(·). Note the difference between f(x) and f(·): the former is the value of f
evaluated at x, the latter is the function itself. In some instances, it is easier to
define a function by showing its mapping. Hence, I might write “the function
defined by x ↦ 1 − x².”


The notation x ∼ F means that the random variable x is distributed according
to the distribution F.1 Sometimes this will be written x ∼ F : X → [0, 1],
where X is a set with the property that F (z) = 0 for all z < inf X and F (z) = 1
for all z > sup X , where inf and sup denote infimum and supremum, respec-
tively. Recall the infimum of a set is its greatest lower bound and the supremum
of a set is its least upper bound.
If sup X ∈ X , the supremum is the maximum element of X , denoted max X .
A similar relation exists for infimum and minimum, denoted min X . Although
a set bounded above has a supremum, it may fail to have a maximum; for
example, sup(0, 1) = 1 but max(0, 1) does not exist (there is no largest number
strictly less than one). On the other hand, max(0, 1] exists and equals sup(0, 1].
Single primes denote derivatives of single-variable functions (e.g., f′(x) = df(x)/dx).
Double primes denote second derivatives (e.g., f″(x) = d²f(x)/dx²).
An asterisk will often denote a solution to an optimization program. Hence,
the solution to maximize x − x² is x* = 1/2. In some instances, I may write
1/2 ∈ argmax_x x − x². Observe argmax_x f(x) is a set. When the set has
a single value (the maximizer is unique), one may “cheat” and write 1/2 =
argmax_x x − x² rather than the truly kosher {1/2} = argmax_x x − x². Note the
difference between argmax_x x − x², which is {1/2}, and max_x x − x², which is
the value of x − x² when maximized (i.e., 1/4). Often an expression such as
max_x x − x² indicates that the problem before us is to find the x that maximizes
that expression.
Limits will be written limx→y . Occasionally, when it is important that we
are considering only x less than y, I may write limx↑y . Similarly, if we are
considering only x greater than y, I may write limx↓y .
If integration is to be carried out over elements of a set Z, I will often write

    ∫_Z f(z) dz .

If X = X₁ × ··· × X_N and x = (x₁, …, x_N) ∈ X, I will tend to write

    ∫_X f(x) dx

rather than

    ∫_{X₁} ··· ∫_{X_N} f(x) dx_N ··· dx₁ .

Two frequently used abbreviations are lhs for the left-hand side of an expression
and rhs for the right-hand side.
Other mathematical notation is summarized in Table 1.
1 Recall that a distribution function gives the probability that the realization of the random

variable in question will not exceed the indicated value. That is, if F (·) is a distribution
function, then F (x) is the probability that the realization of the random variable in question
is no greater than x. Some authors refer to a distribution function as a cumulative distribution
function and use the abbreviation cdf for it.

Symbol          Meaning

∈               Element of

∀               For all

∃               There exists

·               Dot (vector) multiplication (i.e., x · y = Σ_i x_i y_i)

s.t.            Such that

a.e.            Almost everywhere (i.e., true everywhere except, possibly,
                on a set of measure zero)

×_{n=1}^N X_n   The Cartesian product space formed from the spaces X_n,
                n = 1, …, N

R               The set of real numbers. Rⁿ = ×_{i=1}^n R is the n-dimensional
                Euclidean space. R₊ are the non-negative reals.

E               The expectations operator. If X is a random variable, then
                E{X} is the expected value of X.

X\Y             Set difference; that is, X\Y is the set of elements that are in
                X that are not also in Y. Note X\Y = X ∩ Y^c, where Y^c is
                the complement of Y.

Table 1: Some Mathematical Notation


Part I: Pricing
Purpose
If one had to distill economics down to a single-sentence description, one prob-
ably couldn’t do better than describe economics as the study of how prices are
and should be set. This portion of the Lecture Notes is primarily focused on
the normative half of that sentence, how prices should be set, although I hope
it offers some positive insights as well.
Because I’m less concerned with how prices are set, these notes don’t consider
price setting by the Walrasian auctioneer or other competitive models. Nor are
they concerned with pricing in oligopoly. Our attention will be exclusively on
pricing by a single seller who is not constrained by competitive or strategic
pressures (e.g., a monopolist).
Now, one common way to price is to set a price, p, per unit of the good in
question. So, for instance, I might charge $10 per coffee mug. You can buy as
many or as few coffee mugs as you wish at that price. The revenue I receive
is $10 times the number of mugs you purchase. Or, more generally, at price p
per unit, the revenue from selling x units is px. Because px is the formula for a
line through the origin with slope p, such pricing is called linear pricing. If we
define T to be the tariff, the function that relates quantity purchased to the
amount paid the seller (so acquiring x units means paying the seller T(x)), then
linear pricing represents a tariff of the form T(x) = px. Such a tariff is called a
linear tariff.
If you think about it, you’ll recognize that linear tariffs are not the only type
of tariffs you see. Generically, tariffs in which revenue is not a linear function of
the amount sold are called nonlinear tariffs.2 An example of nonlinear pricing
would be if I gave a 10% discount if you purchased five or more mugs (e.g., revenue
is $10x if x < 5 and $9x if x ≥ 5).

2 Remember in mathematics a function is linear if αf(x₀) + βf(x₁) = f(αx₀ + βx₁), where
α and β are scalars. Note, then, that a function from R to R is linear only if it has the
form f(x) = Ax.
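The distinction can be checked directly with the linearity test from footnote 2. The sketch below is mine (the function names are my own); the prices are the mug prices from the example above, and the quantity-discount tariff fails the test while the linear tariff passes:

```python
def linear_tariff(x, p=10.0):
    """Linear tariff: T(x) = p * x."""
    return p * x

def discount_tariff(x):
    """Nonlinear tariff from the text: $10/unit, but $9/unit if x >= 5."""
    return 10.0 * x if x < 5 else 9.0 * x

# Linearity test from footnote 2: is f(a*x0 + b*x1) == a*f(x0) + b*f(x1)?
a, b, x0, x1 = 2.0, 1.0, 3.0, 1.0         # a*x0 + b*x1 = 7, which is >= 5
lhs = discount_tariff(a * x0 + b * x1)    # 9 * 7 = 63
rhs = a * discount_tariff(x0) + b * discount_tariff(x1)  # 2*30 + 1*10 = 70
print(lhs == rhs)  # False: the quantity-discount tariff is nonlinear
print(linear_tariff(a * x0 + b * x1) ==
      a * linear_tariff(x0) + b * linear_tariff(x1))     # True
```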

Lecture Note 1: Buyers and Demand

A seller sets prices and buyers respond. To understand how they respond, we
need to know what their objectives are. If they are consumers, the standard
assumption is that they wish to maximize utility. If they are firms, the
presumption is they wish to maximize profits.

1.1 Consumer Demand


In the classic approach to deriving demand,1 we maximize an individual’s utility
subject to a budget constraint; that is,

    max_x u(x)  subject to  p · x ≤ I ,     (1.1)

where x is an N-dimensional vector of goods, p is the N-dimensional price
vector, and I is income. Solving this problem yields the individual’s demand
curve for each good n, x*_n(p_n | p_{−n}, I) (where the subscript −n indicates
the (N − 1)-dimensional subvector of prices other than the price of the nth
good). Unfortunately, while this analysis is fine for studying linear pricing, it
works less well for nonlinear pricing. In particular, the literature on nonlinear
pricing typically requires that the inverse of individual demand also represent
the consumer’s marginal benefit curve.2 Unless there are no income effects, this
is not a feature of demand curves.
For this reason, we will limit attention to quasi-linear utility. Assume that an
individual purchases two goods. The amount of the one in which we’re interested
(i.e., the one whose pricing we’re studying) is denoted x. The amount of the
other good is denoted y. We can (and will) normalize the price of good y to 1.
We can, if we like, consider y to be the amount of consumption other than of
good x. The utility function is assumed to have the form

u(x, y) = v(x) + y . (1.2)

1 As set forth, for instance, in Mas-Colell et al. (1995) or Varian (1992).


2 Recall that a demand function is a function from price to quantity; that is, for any given
price, it tells us the amount the consumer wishes to purchase. Because demand curves slope
down—Giffen goods don’t exist—the demand function is invertible. Its inverse, which is a
function from quantity to price, tells us for any quantity the price at which the consumer
would be willing to purchase exactly that quantity (assuming linear pricing).


Observe there is no further loss of generality in assuming that v(0) = 0, an


assumption we now impose.3
With two goods, we can maximize utility by first solving the constraint in
(1.1) for y, yielding y = I − px (recall y’s price is 1), and then substituting that
into the utility function to get an unconstrained maximization problem:4

    max_x v(x) − px + I .     (1.3)

Before solving this problem, recall one can’t add apples and oranges; that is,
there must be agreement in units across the terms being added. The amounts px
and I are in dollars (or whatever the relevant currency is).5 It follows, therefore,
that v(x) must also be a dollar amount.
Solving the optimization program (1.3), we have the first-order condition

v ′ (x) = p . (1.4)

Because (1.4) gives price as a function of quantity, it is the individual’s inverse
demand curve. By definition v′(·) is also the marginal benefit schedule. Hence,
as desired, we have marginal benefit of x equal to inverse demand. If we define
p(x) to be the inverse demand curve, then we have

    ∫_0^x p(q) dq = ∫_0^x v′(q) dq = v(x) .

Observe the last term is the benefit (gross of expenditure) the consumer obtains
from x units. Utilizing expression (1.3), we see that utility at the utility-maximizing
quantity is

    ∫_0^x p(q) dq − x p(x)     (1.5)

plus a constant. The quantity in (1.5) equals the area below the inverse demand
curve and above the price of x. See Figure 1.1. You may also recall that (1.5)
is the formula for consumer surplus (cs).
Another way to think about this is to consider the first unit the individual
purchases. It provides him or her (approximate) benefit v ′ (1) and costs him or

3 To see why this is without loss of generality, suppose, instead, that


U (x, y) = V (x) + y ,
where V (0) 6= 0. Define v(x) = V (x) − V (0). Because V (0) is a constant, the utility function
u(x, y) = v(x) + y
must reflect the same preferences and is, thus, a valid representation of those preferences.
Exercise: Prove that U (x, y) ≥ U (x′ , y ′ ) if and only if u(x, y) ≥ u(x′ , y ′ ) for any bundles
(x, y) and (x′ , y ′ ).
4 Well, actually, we need to be careful; there is an implicit constraint that y ≥ 0. In what

follows, we assume that this constraint doesn’t bind.


5 Hereafter, the currency used in these Notes will be dollars. This is not as American-

centric as it might at first seem: for all you know I am thinking of Australian, Canadian, or
Singaporean dollars.

[Figure 1.1: Consumer surplus (CS) at quantity x is the area beneath the inverse
demand curve (p(q)) and above inverse demand at x, p(x).]

her p. His or her surplus, or profit, is, thus, v′(1) − p. For the second unit the
surplus is v′(2) − p. And so forth. Total surplus from x units, where v′(x) = p,
is, therefore,

    Σ_{q=1}^{x} (v′(q) − p) ;

or, passing to the continuum (i.e., replacing the sum with an integral),

    ∫_0^x (v′(q) − p) dq = ∫_0^x v′(q) dq − px = ∫_0^x p(q) dq − px .

Yet another way to think about this is to recognize that the consumer wishes
to maximize his or her surplus (or profit), which is total benefit, v(x), minus
his or her total expenditure (or cost), px. As always, the solution is found by
equating marginal benefit, v ′ (x), to marginal cost, p.
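The logic above can be illustrated numerically. In the sketch below I pick, purely for illustration, the quasi-linear benefit function v(x) = 10x − x²/2 (my own choice, not one from the text), so inverse demand is v′(x) = 10 − x; the code solves v′(x) = p and evaluates the consumer-surplus formula (1.5) by a midpoint-rule integral:

```python
# Illustrative quasi-linear example: v(x) = 10x - x**2/2 (my own choice),
# so marginal benefit v'(x) = 10 - x is the inverse demand curve.
def v_prime(x):
    return 10.0 - x

def demand(p):
    """Solve v'(x) = p for x (valid for 0 <= p <= 10)."""
    return 10.0 - p

def consumer_surplus(p, n=100_000):
    """Evaluate (1.5): integral of inverse demand on [0, x*] minus x* * p."""
    x_star = demand(p)
    step = x_star / n
    integral = sum(v_prime((i + 0.5) * step) for i in range(n)) * step
    return integral - x_star * p

p = 4.0
print(demand(p))                      # 6.0 units demanded at p = 4
print(round(consumer_surplus(p), 6))  # 18.0, the triangle area (1/2)*6*6
```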

Bibliographic Note

One of the best treatments of the issues involved in measuring consumer surplus
can be found in Chapter 10 of Varian (1992). This is a good place to go to get
full details on the impact that income effects have on measures of consumer
welfare.
Quasi-linear utility allows us to be correct in using consumer surplus as a
measure of consumer welfare. But even if utility is not quasi-linear, the error
from using consumer surplus instead of the correct measures, compensating or
equivalent variation (see Chapter 10 of Varian), is quite small under assumptions


that are reasonable for most goods. See Willig (1976). Hence, as a general rule,
we can use consumer surplus as a welfare measure even when there’s no reason
to assume quasi-linear utility.

1.2 Firm Demand


Consider a firm that produces F(x) units of a good using inputs x. Let the
factor prices be p and let R(·) be the revenue function (i.e., what it takes in as
a function of the amount it sells). Then the firm maximizes

    R(F(x)) − p · x .     (1.6)

The first-order condition with respect to input x_n is

    R′(F(x)) ∂F/∂x_n − p_n = 0 .     (1.7)

Let x*(p) denote the set of factor demands, which is found by solving the set
of equations (1.7). Define the profit function as

    π(p) = R(F(x*(p))) − p · x*(p) .

Utilizing the envelope theorem, it follows that

    ∂π/∂p_n = −x*_n(p_n | p_{−n}) .     (1.8)

Consequently, integrating (1.8) with respect to the price of the nth factor, we
have

    −∫_{p_n}^∞ [∂π(p, p_{−n})/∂p] dp = ∫_{p_n}^∞ x*_n(p | p_{−n}) dp .     (1.9)

The right-hand side of (1.9) is just the area to the left of the factor demand
curve that’s above price pn . Equivalently, it’s the area below the inverse factor
demand curve and above price pn . The left-hand side is π(pn , p−n )−π(∞, p−n ).
The term π(∞, p−n ) is the firm’s profit if it doesn’t use the nth factor (which
could be zero if production is impossible without the nth factor). Hence, the
left-hand side is the increment in profits that comes from going from being
unable to purchase the nth factor to being able to purchase it at price pn . This
establishes
Proposition 1.1 The area beneath the factor demand curve and above a given
price for that factor is the total net benefit that a firm enjoys from being able to
purchase the factor at that given price.
In other words—as we could with quasi-linear utility—we can use the “con-
sumer” surplus that the firm gets from purchasing a factor at a given price as
the value the firm places on having access to that factor at the given price.
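Proposition 1.1 can be checked numerically. The functional forms below are my own illustrative assumptions: a single input with F(x) = √x and output price normalized to 1, so that factor demand is x*(p) = 1/(4p²) and profit is π(p) = 1/(4p), with π(∞) = 0:

```python
import math

# Illustrative single-input firm (my own functional forms): F(x) = sqrt(x),
# output price normalized to 1, so the firm maximizes sqrt(x) - p*x.
# The FOC 1/(2*sqrt(x)) = p gives factor demand x*(p) = 1/(4 p**2),
# and profit pi(p) = 1/(4 p), with pi(inf) = 0.
def factor_demand(p):
    return 1.0 / (4.0 * p ** 2)

def profit(p):
    x = factor_demand(p)
    return math.sqrt(x) - p * x

def area_above_price(p, n=100_000):
    """Area under factor demand above p, i.e., the rhs of (1.9).
    The substitution z = 1/u maps [p, inf) onto (0, 1/p]."""
    u_max = 1.0 / p
    step = u_max / n
    return sum(factor_demand(1.0 / u) / u ** 2
               for u in ((i + 0.5) * step for i in range(n))) * step

p = 0.5
print(round(profit(p), 6))            # 0.5 = 1/(4p)
print(round(area_above_price(p), 6))  # 0.5, matching Proposition 1.1
```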

Observation 1.1 One might wonder why we have such a general result with
factor demand, but we didn’t with consumer demand. The answer is that with
factor demands there are no income effects. Income effects are what keep con-
sumer surplus from capturing the consumer’s net benefit from access to a good at
its prevailing price. Quasi-linear utility eliminates income effects, which allows
us to treat consumer surplus as the right measure of value or welfare.

1.3 Demand Aggregation


Typically, a seller sells to more than one buyer. For some forms of pricing it is
useful to know total demand as a function of price.
Consider two individuals. If, at a price of $3 per unit, individual one buys
4 units and individual two buys 7 units, then total or aggregate demand at $3
per unit is 11 units. More generally, if we have J buyers indexed by j, each
of whom has individual demand x_j(p) as a function of price, p, then aggregate
demand is Σ_{j=1}^J x_j(p) ≡ X(p).
How does aggregate consumer surplus (i.e., the area beneath aggregate de-
mand and above price) relate to individual consumer surplus? To answer this,
observe that we get the same area under demand and above price whether we
integrate with respect to quantity or price. That is, if x(p) is a demand function
and p(x) is the corresponding inverse demand, then

    ∫_0^x (p(q) − p(x)) dq = ∫_p^∞ x(ρ) dρ .

Consequently, if CS(p) is aggregate consumer surplus and cs_j(p) is buyer j’s
consumer surplus, then

    CS(p) = ∫_p^∞ X(q) dq = ∫_p^∞ (Σ_{j=1}^J x_j(q)) dq
          = Σ_{j=1}^J ∫_p^∞ x_j(q) dq = Σ_{j=1}^J cs_j(p) ;

that is, we have:


Proposition 1.2 Aggregate consumer surplus is the sum of individual con-
sumer surplus.
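Proposition 1.2 is easy to verify numerically; the two linear individual demands below are my own illustrative choices:

```python
# Two illustrative linear individual demands (my own numbers):
def x1(p): return max(10.0 - p, 0.0)
def x2(p): return max(6.0 - 2.0 * p, 0.0)
def X(p):  return x1(p) + x2(p)   # aggregate demand

def cs(demand_fn, p, upper=10.0, n=100_000):
    """Consumer surplus: area under the demand curve above price p."""
    step = (upper - p) / n
    return sum(demand_fn(p + (i + 0.5) * step) for i in range(n)) * step

p = 2.0
print(round(cs(X, p), 3))               # 33.0: aggregate consumer surplus
print(round(cs(x1, p) + cs(x2, p), 3))  # 33.0: sum of individual surpluses
```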

1.4 Additional Topics


A Continuum of Consumers

In many modeling situations, it is convenient to imagine a continuum of con-


sumers (e.g., think of each consumer having a unique “name,” which is a real
number in the interval [0, 1]; and think of all names as being used). Rather
than thinking of the number of consumers—which would here be uncountably
infinite—we think about their measure; that is, being rather loose, a function
related to the length of the interval.
It might at first seem odd to model consumers as a continuum. One way to
think about it, however, is the following. Suppose there are J consumers. Each
consumer has demand

    x(p) = 1 if p ≤ v, and x(p) = 0 if p > v ,     (1.10)
where v is a number, the properties of which will be considered shortly. In other
words, a given consumer wants at most one unit and is willing to pay up to v
for it.
Assume, for each consumer, that v is a random draw from the interval [v0 , v1 ]
according to the distribution function F : R → [0, 1]. Assume the draws are
independent. Each consumer knows the realization of his or her v prior to
making his or her purchase decision.
In this case, each consumer’s expected demand is the probability that he or
she wants the good at the given price; that is, the probability that his or her v ≥
p. That probability is 1−F (p) ≡ Σ(p). The function Σ(·) is known in probability
theory as the survival function.6 Aggregate expected demand is, therefore,
JΣ(p) (recall the consumers’ valuations, v, are independently distributed).
Observe, mathematically, this demand function would be the equivalent of
assuming that there are a continuum of consumers living on the interval [v0 , v1 ],
each consumer corresponding to a fixed valuation, v. Assume further that the
measure of consumers on the interval between v and v′ is JF(v′) − JF(v)
or, equivalently, JΣ(v) − JΣ(v ′ ). As before, consumers want at most one unit
and they are willing to pay at most their valuation. Aggregate demand at p is
the measure of consumers in [p, v1 ]; that is,

JΣ(p) − JΣ(v1 ) = JΣ(p) ,

where the equality follows because Σ(v1 ) = 0 (it is impossible to draw a v greater
than v1 ). In other words, the assumption of a continuum of consumers can be
considered shorthand for a model with a finite number of consumers, each of
whom has a demand that is stochastic from the seller’s perspective.
A possible objection is that assuming a continuum of consumers on, say,
[v0 , v1 ] with aggregate demand JΣ(p) is a deterministic specification, whereas
J consumers with random demand is a stochastic specification. In particular,
there is variance in realized demand with the latter, but not the former.7 In

6 The name has an actuarial origin. If the random variable in question is age at death,

then Σ(age) is the probability of surviving to at least that age. Admittedly, a more natural
mnemonic for the survival function would be S(·); S(p), however, is “reserved” in economics
for the supply function.
7 For example, if, J = 2, [v , v ] = [0, 1], and F (v) = v on [0, 1] (uniform distribution),
0 1
then realized demand at p ∈ (0, 1) is 0 with positive probability p2 , 1 with positive probability
2p(1 − p), and 2 with positive probability (1 − p)2 .
many contexts, though, this is not important because other assumptions make
the seller risk neutral.8
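Footnote 7’s two-consumer example can be checked by simulation; the sketch below draws valuations from the uniform distribution and tallies realized demand:

```python
import random

# Simulation of footnote 7: J = 2 consumers, v ~ Uniform[0, 1] i.i.d.; each
# buys one unit iff v >= p. At p = 0.5, realized demand should be 0, 1, or 2
# with probabilities p**2 = 0.25, 2*p*(1 - p) = 0.5, and (1 - p)**2 = 0.25.
random.seed(0)
p, trials = 0.5, 200_000
counts = [0, 0, 0]
for _ in range(trials):
    realized = sum(1 for _ in range(2) if random.random() >= p)
    counts[realized] += 1
freqs = [c / trials for c in counts]
print([round(f, 2) for f in freqs])  # approximately [0.25, 0.5, 0.25]

mean_demand = sum(d * c for d, c in enumerate(counts)) / trials
print(round(mean_demand, 2))         # approximately 1.0 = J * Sigma(p)
```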

Survival, Density, and Hazard Functions

Assume Σ(·) is a differentiable survival function. Let f (p) = −Σ′ (p). The func-
tion f (·) is the density function associated with the survival function Σ(·) (or,
equivalently, with the distribution function 1 − Σ(p) ≡ F (p)). In demography
or actuary science, an important concept is the death rate at a given age, which
is the probability someone that age will die within the year. Treating time con-
tinuously, the death rate can be seen as the instantaneous probability of dying
at time t conditional on having survived to t. (Why conditional? Because you
can’t die at time t unless you’ve lived to time t.) The unconditional probability
of dying at t is f (t), the probability of surviving to t is Σ(t), hence the death rate
is f (t)/Σ(t). Outside of demographic and actuarial circles, the ratio f (t)/Σ(t) is
known as the hazard rate. Let h(t) denote the hazard rate. A key relationship
between hazard rates and survival functions is

Lemma 1.1 Consider a survival function Σ(·) and corresponding hazard rate
h(·). Let p₀ = sup{z | Σ(z) = 1}.9 Then Σ(p) = exp(−∫_{p₀}^p h(z) dz).

Proof:

    d log(Σ(p))/dp = −f(p)/Σ(p) = −h(p) .

Solving the differential equation:

    log(Σ(p)) = −∫_{p₀}^p h(z) dz + lim_{z↓p₀} log(Σ(z)) ,

where the limit equals zero because Σ(z) → 1 as z ↓ p₀. The result follows by
exponentiating both sides.

Demand as a Survival Function

If demand at zero price, X(0), is finite, and if limp→∞ X(p) = 0, then any
demand function is a multiplicative scalar of a survival function; that is,

X(p) = X(0)Σ(p) ,

8 It would be incorrect to appeal to the law of large numbers and act as if J → ∞ means

realized demand tends to JΣ(p) according to some probability-theoretic convergence criterion.


If, however, there is a continuum of consumers with identical and independently distributed
demands, then it can be shown that realized demand is almost surely mean demand (see, e.g.,
Uhlig, 1996).
9 Note the convention that sup ∅ = −∞. So if the distribution is the normal, p₀ = −∞. If,
instead, the distribution were uniform on [a, b], then sup{z | Σ(z) = 1} = a.
where Σ(p) ≡ X(p)/X(0). To see that Σ(·) is a survival function on R₊, observe
that Σ(0) = 1, lim_{p→∞} Σ(p) = 0, and, because demand curves slope down, Σ(·)
is non-increasing.
This connection between survival functions and demand suggests the follow-
ing:

Definition 1.1 The hazard rate associated with demand function X(·) is defined as

    h_X(p) = −X′(p)/X(p) .     (1.11)
When the demand function is clear from context, the subscript X may be omit-
ted. In this context, hX (p) is the proportion of all units demanded at price p
that will vanish if the price is increased by an arbitrarily small amount; that is,
it’s the hazard (death) rate of sales that will vanish (die) if the price is increased.
You may recall that the price elasticity of demand, ǫ, is minus one times the
percentage change in demand per a one-percentage-point change in price.10 In
other words

    ǫ = −1 × [(ΔX/X) × 100%] ÷ [(Δp/p) × 100%] ,     (1.12)

where Δ denotes “change in.” If we pass to the continuum, we see that (1.12)
can be reëxpressed as

    ǫ = −(dX(p)/dp) × (p/X) = p h_X(p) ,     (1.13)

where the last equality follows from (1.11). In other words, price elasticity places
a monetary value on the proportion of sales lost from a price increase.
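Relation (1.13), ǫ = p h_X(p), can be checked on the constant-elasticity demand X(p) = ξp^{−k} of Exercise 1.4.2 (the parameter values here are my own picks): the hazard rate should be k/p and the elasticity the constant k.

```python
# Constant-elasticity demand X(p) = xi * p**(-k) (parameter values my own).
# Per (1.11) the hazard rate should be k/p, and per (1.13) the elasticity
# epsilon = p * h_X(p) should equal the constant k.
xi, k = 5.0, 2.0

def X(p):
    return xi * p ** (-k)

def hazard(p, eps=1e-6):
    """h_X(p) = -X'(p)/X(p), with X' approximated by a central difference."""
    x_prime = (X(p + eps) - X(p - eps)) / (2.0 * eps)
    return -x_prime / X(p)

p = 3.0
print(round(hazard(p), 4))      # 0.6667 = k/p
print(round(p * hazard(p), 4))  # 2.0 = k, the same at every price
```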

Definition 1.2 A demand function, X : R₊ → R₊, is a generalized survival
function if there exists a function h(·) : [0, p̄) → R₊, where p̄ ≤ ∞, such that

(i) The integral

    ∫_0^p h(z) dz     (1.14)

exists (is defined) for all p ∈ [0, p̄);

(ii) The integral in (1.14) goes to infinity as p ↑ p̄; and

(iii) X(p) = 0 for p ≥ p̄ and there is some finite positive constant X₀ such that

    X(p) = X₀ exp(−∫_0^p h(z) dz)     (1.15)

for p < p̄.

10 Whether one multiplies or not by −1 is a matter of taste; some authors (including this

one on occasion) choose not to.



The amount p̄ is known as the choke price: it is, as we will shortly see, the price
at which demand goes to zero (is choked off).
Given condition (i) of Definition 1.2, expression (1.15) entails that X(0) =
X₀ < ∞. That is, if demand is a generalized survival function, demand at zero
is finite. Because the h(·) in Definition 1.2 is non-negative, it follows from (1.15)
that X(·) is non-increasing. It is, therefore, differentiable almost everywhere.
Where it is differentiable, it follows from the fundamental theorem of calculus
that

    X′(p) = X₀ exp(−∫_0^p h(z) dz) × (−h(p)) = −h(p)X(p) .

Consequently, at any point of differentiability h(·) = h_X(·), where the latter is
the hazard rate associated with X(·). Moreover, it follows that h(·) must be
uniquely defined almost everywhere.
For future reference, let’s consider some relations among demand, the asso-
ciated hazard rate, and price elasticity.

Lemma 1.2 Consider differentiable demands that are generalized survival functions.

(i) For any such demand function, X(·),

    X(p) = X(0) exp(−∫_0^p [ǫ(z)/z] dz) ,

where ǫ(p) is the price elasticity of demand at price p.

(ii) Let X₁(·) and X₂(·) be such demand functions. Suppose X₁(p) = ζX₂(p)
for all p, where ζ is a positive constant. Let ǫᵢ(·) be the price-elasticity
function associated with Xᵢ(·). Then ǫ₁(p) = ǫ₂(p) for all p.

Exercise 1.4.1: Prove Lemma 1.2.


Exercise 1.4.2: Suppose X(p) = ξp^{−k}, where k > 0 and ξ > 0. Verify by direct
calculation (i.e., using the formula ǫ = −pX′(p)/X(p)) that this demand function has
a constant elasticity.
Exercise 1.4.3: Is the demand function X(p) = ξp^{−k}, with k > 0 and ξ > 0, a
generalized survival function?

Lemma 1.3 Suppose $\lim_{p\to\infty} pX(p) = 0$.¹¹ Consumer surplus at price p, CS(p), satisfies
$$CS(p) = \int_p^\infty (b - p)\bigl(-X'(b)\bigr)\,db\,. \tag{1.16}$$
11 This is an innocuous assumption because, otherwise, consumer surplus would be infinite,

which is both unrealistic and not interesting; the latter because, then, welfare would always
be maximized (assuming finite costs) and there would, thus, be little to study.

Proof: Recall the definition of consumer surplus is the area to the left of demand and above price:
$$\begin{aligned}
CS(p) &= \int_p^\infty X(b)\,db\\
&= bX(b)\Big|_p^\infty - \int_p^\infty bX'(b)\,db &(1.17)\\
&= -pX(p) - \int_p^\infty bX'(b)\,db\\
&= \int_p^\infty pX'(b)\,db - \int_p^\infty bX'(b)\,db\,. &(1.18)
\end{aligned}$$
Expression (1.17) follows by integration by parts. Expression (1.18) follows from the fundamental theorem of calculus.

When demand is a generalized survival function, the integral in (1.16) is an expected value because we can rewrite it as
$$CS(p) = X(0)\int_0^\infty \max\{b - p, 0\}\left(-\frac{X'(b)}{X(0)}\right)db \tag{1.19}$$
and −X′(b)/X(0) is a density. If we thought of the benefit enjoyed from each unit of the good as a random variable drawn from the distribution 1 − X(·)/X(0), then the integral in (1.19) is the expected net benefit (surplus) a given unit will yield a consumer (net because he or she purchases only if surplus is generated). The amount X(0) can be thought of as the total number of units (given that at most X(0) units trade). So the overall expression can be interpreted as expected (or average) surplus per item times the total number of items.
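The equivalence between the area definition of consumer surplus and the expected-value reading in (1.19) can be checked numerically. The demand curve X(p) = X0·e^{−p} below is an assumption chosen for illustration because CS(p) then has the closed form X0·e^{−p}.

```python
import math

X0 = 50.0
X = lambda p: X0 * math.exp(-p)    # assumed demand curve
dX = lambda p: -X0 * math.exp(-p)  # its derivative X'(p)

def integrate(f, a, b, n=100000):
    # simple midpoint rule on [a, b]
    step = (b - a) / n
    return sum(f(a + (i + 0.5) * step) for i in range(n)) * step

p = 1.0
B = 40.0  # truncation point standing in for infinity; the tail is negligible here

cs_area = integrate(X, p, B)  # area to the left of demand, above price p
cs_expected = integrate(lambda b: max(b - p, 0.0) * (-dX(b)), 0.0, B)  # (1.19)
exact = X0 * math.exp(-p)  # closed form for this particular demand

assert abs(cs_area - exact) < 1e-3
assert abs(cs_expected - exact) < 1e-3
```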

Exercise 1.4.4: Prove that if lim_{b→∞} bX(b) > 0, then consumer surplus at any price p is infinite. Hints: Suppose lim_{b→∞} bX(b) = L > 0. Fix an η ∈ (0, L). Show there is a b̄ such that X(b) ≥ (L − η)/b for all b ≥ b̄. Does
$$\int_{\bar b}^\infty \frac{L - \eta}{b}\,db$$
converge? Show the answer implies $\int_p^\infty X(b)\,db$ does not converge (i.e., is infinite).


A function, g(·), is log concave if log g(·) is a concave function.

Lemma 1.4 If g(·) is a positive concave function, then it is log concave.12

12 A stronger result can be established, namely that the composition r(s(·)) is concave if both r(·) and s(·) are concave and r(·) is non-decreasing. This requires more work than is warranted given what we need.
Proof: If g(·) were twice differentiable, then the result would follow trivially using calculus (Exercise: do such a proof). More generally, let x0 and x1 be two points in the domain of g(·) and define xλ = λx0 + (1 − λ)x1. The conclusion follows, by the definition of concavity, if we can show
$$\log\bigl(g(x_\lambda)\bigr) \ge \lambda \log\bigl(g(x_0)\bigr) + (1 - \lambda)\log\bigl(g(x_1)\bigr) \tag{1.20}$$
for all λ ∈ [0, 1]. Because log(·) is order preserving, (1.20) will hold if
$$g(x_\lambda) \ge g(x_0)^\lambda\, g(x_1)^{1-\lambda}\,. \tag{1.21}$$
Because g(·) is concave by assumption,
$$g(x_\lambda) \ge \lambda g(x_0) + (1 - \lambda)g(x_1)\,.$$
Expression (1.21) will, therefore, hold if
$$\lambda g(x_0) + (1 - \lambda)g(x_1) \ge g(x_0)^\lambda\, g(x_1)^{1-\lambda}\,. \tag{1.22}$$
Expression (1.22) follows from Pólya's generalization of the arithmetic mean–geometric mean inequality13 (see, e.g., Steele, 2004, Chapter 2), which states that if a_1, …, a_n ∈ ℝ₊, λ_1, …, λ_n ∈ [0, 1], and $\sum_{i=1}^n \lambda_i = 1$, then
$$\sum_{i=1}^n \lambda_i a_i \ge \prod_{i=1}^n a_i^{\lambda_i}\,. \tag{1.23}$$

Observe the converse of Lemma 1.4 need not hold. For instance, x² is log concave (2 log(x) is clearly concave in x), but x² is not itself concave. In other words, log-concavity is a weaker requirement than concavity. As we will see, log concavity is often all we need for our analysis, so we gain a measure of generality by assuming log-concavity rather than concavity.
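Inequality (1.23) is also easy to spot-check by simulation (an illustration only; the sampled values and weights are arbitrary):

```python
import math
import random

random.seed(0)

# Spot-check (1.23): sum_i lam_i * a_i >= prod_i a_i ** lam_i whenever the
# a_i are positive and the weights lam_i are nonnegative and sum to one.
for _ in range(1000):
    n = random.randint(2, 5)
    a = [random.uniform(0.01, 10.0) for _ in range(n)]
    w = [random.random() for _ in range(n)]
    total = sum(w)
    lam = [wi / total for wi in w]  # normalize the weights to sum to one
    arith = sum(l * x for l, x in zip(lam, a))
    geom = math.prod(x ** l for l, x in zip(lam, a))
    assert arith >= geom - 1e-12
```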

13 Pólya's generalization is readily proved. Define $A \equiv \sum_{i=1}^n \lambda_i a_i$. Observe that x + 1 ≤ e^x (the former is tangent to the latter at x = 0 and the latter is convex). Hence, $\frac{a_i}{A} \le e^{\frac{a_i}{A} - 1}$. Because both sides are positive, $\left(\frac{a_i}{A}\right)^{\lambda_i} \le e^{\frac{\lambda_i a_i}{A} - \lambda_i}$. We therefore have
$$\prod_{i=1}^n \left(\frac{a_i}{A}\right)^{\lambda_i} \le \prod_{i=1}^n e^{\frac{\lambda_i a_i}{A} - \lambda_i} = e^{\sum_{i=1}^n \left(\frac{\lambda_i a_i}{A} - \lambda_i\right)}\,.$$
The last term simplifies to 1, so we have
$$\frac{\prod_{i=1}^n a_i^{\lambda_i}}{A^{\sum_{i=1}^n \lambda_i}} \le 1\,.$$
Because the denominator on the left is just A, the result follows.



The hazard rate is monotone if it is either everywhere non-increasing or


everywhere non-decreasing. The latter case is typically the more relevant (e.g.,
the death rate for adults increases with age) and is a property of many familiar
distributions (e.g., the uniform, the normal, the logistic, among others). When
a hazard rate is non-decreasing, we say it satisfies the monotone hazard rate
property. This is sometimes abbreviated as mhrp.

Lemma 1.5 A demand function is (strictly) log concave if and only if the cor-
responding hazard rate satisfies the (strict) monotone hazard rate property.

Proof: It is sufficient for X(·) to be (strictly) log concave that the first derivative of log X(·), which is
$$\frac{d \log X(p)}{dp} = \frac{X'(p)}{X(p)} = -h_X(p)\,,$$
be non-increasing (decreasing) in p. Clearly it will be if h_X(·) is non-decreasing (increasing). To prove necessity, assume X(·) is (strictly) log concave; then −h_X(·) must be non-increasing (decreasing), which is to say that h_X(·) is non-decreasing (increasing).

Lemma 1.6 If demand is log concave, then so too is consumer expenditure under linear pricing; that is, X(·) log concave implies that p ↦ pX(p) is log concave.

Proof: The result follows because log(pX(p)) ≡ log(p) + log(X(p)) and the sum of concave functions is concave (recall log(·) is a concave function).

Exercise 1.4.5: Let f (·) and g(·) be concave functions. Prove that the function
defined as x 7→ f (x) + g(x) is concave.
Exercise 1.4.6: Suppose the hazard rate is a constant. Prove that pX(p) is, therefore,
everywhere strictly log concave.
Exercise 1.4.7: Prove that linear (affine) demand (e.g., X(p) = a − bp, a and b
positive constants) is log concave.
Exercise 1.4.8: Prove that if demand is log concave, then elasticity is increasing in
price (i.e., ǫ(·) is an increasing function).
Linear Tariffs 2
Consider a firm that sells all units of a given product or service at a constant
price per unit; that is, that deploys a linear tariff . If p is that price and it sells x
units, then its revenue is px. Use of a linear tariff is also referred to as engaging
in linear pricing or uniform pricing.
As in the previous lecture note, let X(p) denote aggregate demand for the
given product or service at price p and let P (·) denote the corresponding inverse
demand function. In other words,
 
$$X\bigl(P(x)\bigr) = x \qquad\text{and}\qquad P\bigl(X(p)\bigr) = p\,.$$
Recall this means that the maximum price at which the firm can sell all x units
is P (x). To avoid pathological and unrealistic cases in which the optimal price
tends toward either zero or infinity or in which profit is infinite, we assume
$$\lim_{p\to 0} pX(p) = 0\,;\qquad \lim_{p\to\infty} pX(p) = 0\,;\qquad\text{and}\qquad X(p) < \infty\ \ \forall\, p > 0\,. \tag{A2.1}$$

Exercise 2.0.1: Prove that assumption (A2.1) implies
$$\lim_{x\to 0} P(x)\,x = 0 \qquad\text{and}\qquad \lim_{x\to\infty} P(x)\,x = 0\,.$$

Let C(x) denote the firm’s cost of producing x units. Let R(x) denote the
firm’s revenue from selling x units. Given that pricing is linear, this means
R(x) = xP (x). The firm’s profit is revenue minus cost, R(x) − C(x). The
profit-maximizing amount to sell maximizes this difference.
Lemma 2.1 Given assumption (A2.1), the firm’s profit under linear pricing is
bounded above.
Proof: Because cost is non-negative, it is sufficient to prove that revenue is
bounded above. Fix an η > 0. By assumption, there exist p0 and p1 , both
positive and finite, such that pX(p) < η for all p either less than p0 or greater
than p1 . Consider the interval [p0 , p1 ]. If the interval is empty (i.e., p1 < p0 ),
then we have that pX(p) < η for all prices, which means revenue is bounded.
Suppose the interval is not empty. Let X̂ = supp∈[p0 ,p1 ] X(p). By assumption
X̂ < ∞. Note, too, p1 X̂ < ∞. Let p be an arbitrary element of [p0 , p1 ]. Clearly,
pX(p) ≤ p1 X̂; that is, pX(p) is bounded above for all p ∈ [p0 , p1 ].


Lemma 2.2 Assume P (·) and C(·) are continuous on [0, ∞). Assume, too,
that (A2.1) holds. Then there exists a finite quantity x∗ that maximizes the
firm’s profit.

Proof: If R(x) − C(x) ≤ 0 for all x, then x = 0 maximizes profit given


that R(0) − C(0) = 0. (C(0) = 0 because the cost of not producing must be
zero given that an expense that is incurred even if production is zero is a sunk
expenditure and, thus, not a cost.) Alternatively, suppose an x̂ exists such that
0 < R(x̂) − C(x̂) ≡ η. From (A2.1), there exists an x̄ such that R(x) < η for all
x > x̄. Hence, x̂ yields greater profit than any x > x̄. Consequently, there is no
loss of generality in restricting attention to x ∈ [0, x̄]. That a maximum exists
then follows because a bounded continuous function on a closed interval must
have a maximum.1

The assumption that P (·) and C(·) are continuous on (0, ∞) is fairly innocuous
insofar as both are monotone functions (demand curves slope down and cost
functions are nondecreasing). For the same reason, they are also differentiable
almost everywhere and there is, again, little loss of generality in assuming they
are differentiable everywhere. Where this assumption might not be innocuous
is when we assume C(·) is continuous at 0 because this rules out there being
fixed (overhead) costs of production.2 Often there are fixed costs of production,
so the cost function is of the form
$$C(x) = \begin{cases} 0\,, & \text{if } x = 0\,,\\ c(x) + F\,, & \text{if } x > 0\,,\end{cases}$$

where F > 0 is the fixed cost. For the usual reasons c(0) = 0. But even if c(·)
is continuous, clearly C(·) is not continuous at zero. Fortunately, this is not
really a problem if c(·) is continuous: because F is a constant, if x maximizes
R(x) − C(x), then it also maximizes R(x) − c(x). Applying Lemma 2.2, using
c(·) instead of C(·), an x ∈ [0, ∞) exists, which maximizes R(x) − c(x). If that
x = 0, then we have

0 = R(0) − C(0) = R(0) − c(0) ≥ R(x̂) − c(x̂) > R(x̂) − C(x̂)

for all x̂ > 0; that is, x = 0 maximizes R(x) − C(x). Suppose that the x that
maximizes R(x) − c(x) is positive. Now either

R(x) − c(x) − F ≥ 0 or R(x) − c(x) − F < 0 .


1 This is a well-known result that can be found in almost any book on real analysis (see,
e.g., Benedetto and Czaja, 2009, Proposition 1.3.3, p. 12). The result is often attributed to
Weierstrass and is, thus, sometimes called Weierstrass’s Theorem. Reflecting Weierstrass’s
genius, a number of results in math are called Weierstrass’s Theorem, so there is always a
certain ambiguity in calling this result Weierstrass’s Theorem. For our purposes in this text,
though, it is fine for us to so label the result.
2 The terminology “fixed costs” is standard, but unfortunate insofar as “fixed” suggests

immutable and something that is immutable—unaffected by actions—cannot be a cost in the


economic sense.

If the former, then x maximizes R(x) − C(x). If the latter, then 0 is the maxi-
mizer. Either way, a maximizer and, thus, a maximum exists. We have estab-
lished:
Corollary 2.1 Assume P (·) and C(·) are continuous on (0, ∞). Assume, too,
that (A2.1) holds. Then there exists a finite quantity x∗ that maximizes the
firm’s profit.
In light of the previous lemmas and assuming inverse demand and cost are
differentiable, the profit-maximizing quantity is either zero or some positive
amount that satisfies the first-order condition:

R′ (x) − C ′ (x) = 0 ;

or, as it is typically written,

MR(x) = MC (x) ,

where MR denotes marginal revenue and MC denotes marginal cost. Note


MR(x) = MC (x) is a necessary condition, not a sufficient condition. Typically,
though, we put enough structure on the problem so that R′ (x) = C ′ (x) is also
sufficient. For example, a differentiable function f : ℝ → ℝ is a strictly pseudo-concave function if, for all x1 and x2, x1 ≠ x2,
$$f'(x_1)\,(x_2 - x_1) \le 0 \;\Rightarrow\; f(x_2) < f(x_1)\,. \tag{2.1}$$

Because pseudo-concavity applies to differentiable functions only, we may take


the statement “f is (strictly) pseudo-concave” to imply f is differentiable.
Lemma 2.3 If profit is strictly pseudo-concave, then MR(x) = MC(x) is also
sufficient for x to maximize profit.
Proof: Follows immediately from (2.1) given zero times anything is zero, which
does not exceed zero (here, f (x) = R(x) − C(x)).

Exercise 2.0.2: Prove that if (i) profit is strictly pseudo-concave, (ii) x = 0 does not
maximize profit, and (iii) we assume (A2.1), then there is a unique profit-maximizing
quantity. Show, moreover, that this quantity satisfies MR = MC .

An assumption related to strict pseudo-concavity is
$$x^* \text{ maximizes profit} \;\Rightarrow\; \bigl(MR(x) - MC(x)\bigr)(x - x^*) < 0\,,\ \forall x \ne x^*\,. \tag{2.2}$$

Exercise 2.0.3: Prove that if an x∗ that maximizes profit exists and profit is strictly
pseudo-concave, then (2.2) holds. Does (2.2) imply that profit is pseudo-concave?

It can readily be seen that (2.2) implies that profit is strictly quasi-concave.3

Exercise 2.0.4: Prove that if (i) there exists an x∗ ∈ (0, ∞) that solves max_x f(x); and (ii) f′(x)(x − x∗) < 0 for all x ≠ x∗, then f is strictly quasi-concave and x∗ is its unique maximizer.

Not all strictly quasi-concave profit functions, however, satisfy (2.2): if the profit
function had a point of inflection,4 then the inequality in (2.2) would fail to hold
at that point of inflection. Moreover, at the point of inflection we would have
MR = MC , but we would not be at the profit-maximizing quantity. Condition
(2.2) is, in this context, equivalent to the property Edlin and Hermalin (2000)
define as ideally quasi-concave.5

Exercise 2.0.5: Prove that if (2.2) holds, then x∗ is the unique profit-maximizing
quantity.
Exercise 2.0.6: Suppose R(·) is concave and C(·) is convex, at least one strictly.
Prove that (2.2) holds.

Substituting xP (x) for R(x), we see that marginal revenue is

MR(x) = P (x) + xP ′ (x) .

Because demand curves slope down, P ′ (x) < 0; hence, MR(x) < P (x), except
at x = 0 where MR(0) = P (0). See Figure 2.1.
The expression for marginal revenue might, at first, seem odd. Naïvely, one might expect marginal revenue to equal the price received for the last unit sold. But such a naïve view ignores that, to sell an additional item (i.e., go from X(p) to X(p) + 1), the firm must lower the price (i.e., recall, P(x + 1) < P(x)),

3 Recall a function f : ℝⁿ → ℝ is strictly quasi-concave if, for any x and x′ in the domain of f and any λ ∈ (0, 1),
$$f(x) \ge f(x')\ \text{implies}\ f\bigl(\lambda x + (1 - \lambda)x'\bigr) > f(x')\,.$$

4 An inflection point, recall, is a value at which the first derivative of the function is zero, but the point is neither a local maximum nor a local minimum (e.g., if f(x) = x³, then x = 0 is a point of inflection). At a point of inflection, the function goes from being locally concave to locally convex or the reverse.
5 Edlin and Hermalin define a differentiable function f : ℝ → ℝ to be ideally quasi-concave if
$$f'(x_0) = 0 \;\Rightarrow\; f'(x)(x - x_0) < 0\,,\ \forall x \ne x_0\,.$$
It is readily seen that any x0 such that f′(x0) = 0 is the unique maximizer of f(·).
Figure 2.1: Relation between inverse demand, P(x), and marginal revenue, MR, under linear pricing; and the determination of the profit-maximizing quantity, x∗M, and price, P(x∗M). [Figure: the axes are units (horizontal) and price in $/unit (vertical); the P(x), MR, and MC curves are shown.]

which affects all units sold. So marginal revenue has two components: The price
received on the marginal unit, P (x), less the revenue lost on the infra-marginal
units from having to lower the price, |xP ′ (x)| (i.e., the firm gets |P ′ (x)| less on
each of the x infra-marginal units).
Summary 2.1 Under linear pricing, the profit-maximizing quantity, x∗M , solves
MR(x) = P (x) + xP ′ (x) = MC(x) . (2.3)
And the monopoly price, p∗M , equals P (x∗M ). Because P ′ (x) < 0, expression
(2.3) reveals that p∗M > MC(x∗M ); that is, price is marked up over marginal
cost.
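Condition (2.3) is straightforward to solve numerically. The sketch below assumes, purely for illustration, the inverse demand P(x) = 100/(1 + x) and constant marginal cost MC = 4; then MR(x) = 100/(1 + x)², so MR = MC gives x∗M = 4 and a price of 20, marked up over marginal cost as Summary 2.1 says.

```python
P = lambda x: 100.0 / (1.0 + x)  # assumed inverse demand
MC = 4.0                         # assumed constant marginal cost

def MR(x, eps=1e-7):
    # numerical derivative of revenue R(x) = x * P(x)
    R = lambda q: q * P(q)
    return (R(x + eps) - R(x - eps)) / (2 * eps)

def bisect(f, lo, hi, tol=1e-10):
    # root of f on [lo, hi], assuming f(lo) and f(hi) bracket a sign change
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

x_star = bisect(lambda x: MR(x) - MC, 0.01, 50.0)

assert abs(x_star - 4.0) < 1e-3      # analytic solution of 100/(1+x)^2 = 4
assert P(x_star) > MC                # price is marked up over marginal cost

# The Lerner markup rule of Section 2.1 also holds at the optimum:
# (P - MC)/P = 1/elasticity, where elasticity = -P(x)/(x * P'(x)).
elasticity = -P(x_star) / (x_star * (-100.0 / (1.0 + x_star) ** 2))
assert abs((P(x_star) - MC) / P(x_star) - 1.0 / elasticity) < 1e-3
```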

Exercise 2.0.7: Use (2.3) to show that, under linear pricing, the profit-maximizing price, p∗M, solves
$$p^*_M + \frac{X(p^*_M)}{X'(p^*_M)} = MC\bigl(X(p^*_M)\bigr)\,.$$

Elasticity and Linear Pricing 2.1


Recall that the price elasticity of demand, ǫ, is minus one times the percentage change in demand per a one-percentage-point change in price (see the discussion on page 12). Observe we can rewrite the continuous formula for elasticity, expression (1.13), as follows:
$$\epsilon = -\frac{P(x)}{x P'(x)}\,, \tag{2.4}$$
which expresses elasticity as a function of quantity.


We know that the revenue from selling 0 units is 0. Under assumption (A2.1),
lim_{x→∞} xP(x) = 0. In between these extremes, provided P(x) ≢ 0, revenue is
positive. Hence, we know that revenue must increase over some range of output
and decrease over another. Revenue is increasing if and only if marginal revenue
is positive; that is, if and only if

P (x) + xP ′ (x) > 0 or xP ′ (x) > −P (x) .

Divide both sides by P(x) to get
$$-1 < \frac{x P'(x)}{P(x)} = -\frac{1}{\epsilon}\,, \tag{2.5}$$

where the equality in (2.5) follows from (2.4). Multiplying both sides of (2.5)
by −ǫ (a negative quantity) we have that revenue is increasing if and only if

ǫ > 1. (2.6)

When ǫ satisfies (2.6), we say that demand is elastic. When demand is elastic, revenue is increasing with units sold. If ǫ < 1, we say that demand is inelastic. Reversing the various inequalities, it follows that, when demand is inelastic, revenue is decreasing with units sold. The case where ǫ = 1 is called unit elasticity. (Elasticity & Revenue: Where demand is elastic/inelastic/unit elastic, revenue is increasing/decreasing/unchanging in units sold, respectively.)
Recall that a firm produces the number of units that equates MR to MC. The latter is positive, which means that a profit-maximizing firm engaged in linear pricing operates only on the elastic portion of its demand curve. This makes intuitive sense: If it were on the inelastic portion, then, were it to produce less, it would both raise revenue and lower cost; that is, increase profit. Hence, it can’t maximize profit operating on the inelastic portion of demand.

Summary 2.2 A profit-maximizing firm engaged in linear pricing operates on


the elastic portion of its demand curve.

Recall the first-order condition for profit maximization, equation (2.3). Rewrite it as
$$P(x) - MC(x) = -x P'(x)$$
and divide both sides by P(x) to obtain
$$\frac{P(x) - MC(x)}{P(x)} = -\frac{x P'(x)}{P(x)} = \frac{1}{\epsilon}\,, \tag{2.7}$$
where the second equality follows from (2.4). Expression (2.7) is known as the
Lerner markup rule. In English, it says that the price markup over marginal
cost, P (x) − MC (x), as a proportion of the price is equal to 1/ǫ. Hence, the less
elastic is demand (i.e., as ǫ decreases towards 1), the greater the percentage of
the price that is a markup over cost. Obviously, the portion of the price that
is a markup over cost can’t be greater than the price itself, which again shows
that the firm must operate on the elastic portion of demand.
Recall that ǫ = ph(p), where h(·) is the hazard rate implied by the demand curve (see expression (1.13) above). Consequently, an alternative expression of the Lerner markup rule is
$$p - MC\bigl(X(p)\bigr) = \frac{1}{h(p)}\,. \tag{2.8}$$

We often assume that marginal cost is nondecreasing in quantity. Because demand curves slope down, the composite function −MC(X(·)) is a nondecreasing function if marginal cost is nondecreasing. Hence, the lhs of (2.8) is increasing
in p if marginal cost is nondecreasing. If we assume that mhrp applies to the
demand function, then the rhs of (2.8) is decreasing. Given these two assump-
tions, there can be at most one price that satisfies (2.8), which is to say there
is at most one (positive) profit-maximizing quantity. If there is only one max-
imum, then the first-order condition MR = MC must be sufficient as well as
necessary. Recalling that MC nondecreasing is equivalent to C(·)’s being convex
and an increasing hazard rate is equivalent to demand’s being log concave, we
have established the following:

Proposition 2.1 Assume (A2.1). Assume too that demand is differentiable


and cost is differentiable on (0, ∞). If demand is log concave and cost is convex,
then there is at most one positive quantity that maximizes profit under linear
pricing. A necessary and sufficient condition for a quantity x∗ > 0 to uniquely
maximize profit is MR(x∗ ) = MC(x∗ ) and R(x∗ ) − C(x∗ ) > 0.
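A concrete illustration of the hazard-rate form (2.8): assume (for illustration only) the exponential demand X(p) = e^{−λp}, whose hazard rate is the constant λ, and a constant marginal cost c. Then (2.8) reads p − c = 1/λ, so the unique profit-maximizing price is p∗ = c + 1/λ, which a brute-force grid search confirms.

```python
import math

lam, c = 2.0, 1.0                 # assumed hazard rate and marginal cost
X = lambda p: math.exp(-lam * p)  # exponential demand; hazard rate = lam
profit = lambda p: (p - c) * X(p)

# brute-force grid search over prices in (0, 6]
grid = [i / 10000 for i in range(1, 60001)]
p_star = max(grid, key=profit)

assert abs(p_star - (c + 1 / lam)) < 1e-3  # (2.8) predicts p* = c + 1/lam = 1.5
```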

In what follows, we will refer to (A2.1), the assumption that demand is differentiable and log concave, and the assumption that cost is convex and differentiable on (0, ∞) as the standard assumptions of linear pricing.6 Generally, unless there is a statement to the contrary, it should be understood that we’re operating henceforth under the standard assumptions of linear pricing.

6 The convexity and log concavity assumptions imply differentiability almost everywhere; hence, the further loss in generality from assuming differentiability everywhere is rather minimal.

Welfare Analysis 2.2

As a preliminary, note that area is area. In particular, it doesn’t matter if we calculate aggregate consumer surplus as the area beneath the aggregate demand curve, X(·), from the prevailing price to infinity or if we calculate it as the area
beneath inverse aggregate demand and above the prevailing price from 0 to the
number of units consumed (recall the discussion on page 9); that is,
$$CS = \int_0^x \bigl(P(z) - P(x)\bigr)\,dz \tag{2.9}$$
if x units trade at price P(x).


We also want to interpret the area beneath inverse aggregate demand between 0 and the units traded; that is, the quantity $\int_0^x P(z)\,dz$.

Lemma 2.4 If a total of x units trade at price P(x), where P(·) is inverse aggregate demand, then total benefit realized by consumers is $\int_0^x P(z)\,dz$; that is, the area beneath inverse aggregate demand from 0 to the number of units traded.

Proof: Assume there are J consumers. We have that

x = x1 (p) + · · · + xJ (p) , (2.10)

where xj(·) is individual j’s demand function. Recall j’s benefit is
$$v_j\bigl(x_j(p)\bigr) = cs_j(p) + p\,x_j(p)$$
(i.e., benefit equals consumer surplus plus expenditure). Aggregating across the J consumers, aggregate benefit is
$$\sum_{j=1}^J v_j\bigl(x_j(p)\bigr) = CS(p) + px = \int_0^x \bigl(P(z) - P(x)\bigr)\,dz + P(x)\,x = \int_0^x P(z)\,dz\,,$$

where the first equality follows from (2.10) and Proposition 1.2, the second
equality from (2.9) and the fact that p = P (x), and the third from simplifying
the expression.

Another way to understand Lemma 2.4 is as follows. By construction, given there are no income effects, there is some consumer who derives benefit equal to approximately P(1) from the first unit (i.e., if total trade were limited to one unit and price set at P(1), a unit would exchange hands and the buyer’s benefit would be essentially P(1)).7 There is a consumer (possibly the same) who derives benefit P(2) from the second unit and so forth. Hence, the total benefit of trading N units is approximately $\sum_{n=1}^N P(n)$. Passing to the continuum, this is $\int_0^x P(z)\,dz$.
Total welfare is the sum of surpluses of the relevant actors (including firms).

7 The “approximately” and “essentially” arise because we’re applying a discrete interpretation to a potentially continuous setting.


In the context of linear pricing, total welfare is
$$\underbrace{x P(x) - C(x)}_{\text{profit}} + \underbrace{\int_0^x \bigl(P(z) - P(x)\bigr)\,dz}_{CS} = x P(x) - C(x) + \int_0^x P(z)\,dz - x P(x) = \int_0^x P(z)\,dz - C(x)\,. \tag{2.11}$$
Observe, first, that neither the firm’s revenue, xP(x), nor the consumers’ expenditure, xP(x), appears in (2.11). This is the usual rule that monetary transfers made among agents are irrelevant to the amount of total welfare. Welfare is determined by the allocation of the real good; that is, the benefit, $\int_0^x P(z)\,dz$, that consumers obtain and the cost, C(x), that the producer incurs.
Next observe that the derivative of (2.11) is P (x) − MC (x). From (2.3)
on page 21, recall that P (x∗M ) > MC (x∗M ), where x∗M is the profit-maximizing
quantity under linear pricing. In other words, at the profit-maximizing quantity,
welfare could be increased were the firm to trade more units of the good. This
implies that linear pricing leads to too little output from the perspective of
maximizing welfare—if the firm produced more, welfare would increase. We
have established:
Proposition 2.2 Under linear pricing, the monopolist produces too little output
from the perspective of total welfare.
If we assume—as is generally reasonable given that demand slopes down—
that demand crosses marginal cost once from above, then the welfare-maximizing
quantity satisfies
P (x) − MC (x) = 0 . (2.12)
Let x∗W be the solution to (2.12). From Proposition 2.2, x∗W > x∗M .

Exercise 2.2.1: Prove that, under the standard assumptions for linear pricing, (2.12)
is a sufficient as well as necessary condition for the welfare-maximizing quantity.

What is the welfare loss from linear pricing? It is the amount of welfare forgone because only x∗M units are traded rather than x∗W units:
$$\begin{aligned}
\left(\int_0^{x^*_W} P(z)\,dz - C(x^*_W)\right) &- \left(\int_0^{x^*_M} P(z)\,dz - C(x^*_M)\right)\\
&= \int_{x^*_M}^{x^*_W} P(z)\,dz - \bigl(C(x^*_W) - C(x^*_M)\bigr)\\
&= \int_{x^*_M}^{x^*_W} P(z)\,dz - \int_{x^*_M}^{x^*_W} MC(z)\,dz\\
&= \int_{x^*_M}^{x^*_W} \bigl(P(z) - MC(z)\bigr)\,dz\,. &(2.13)
\end{aligned}$$

Figure 2.2: The deadweight loss from linear pricing is the shaded triangular region. [Figure: the axes are units (horizontal) and price in $/unit (vertical); the region lies beneath P(x) and above MC between x∗M and x∗W.]

The area in (2.13) is called the deadweight loss associated with linear pricing. It
is the area beneath the demand curve and above the marginal cost curve between
x∗M and x∗W . Because P (x) and MC (x) meet at x∗W , this area is triangular (see
Figure 2.2) and, thus, the area is often called the deadweight-loss triangle.
The existence of a deadweight-loss triangle is one reason why governments
and antitrust authorities typically seek to discourage monopolization of in-
dustries and, instead, seek to encourage competition. Competition tends to
drive price toward marginal cost, which causes output to approach the welfare-
maximizing quantity.
The welfare loss associated with linear pricing is a motive to change the
industry structure (i.e., encourage competition). It is also—at least from the
firm’s perspective—a motive to change the method of pricing. The deadweight
loss is, in a sense, money left on the table. As we will see, beginning in Chapter 3,
clever pricing schemes by the firm can often allow it to pick up some of this
money left on the table.

An Example

To help make all this more concrete, consider the following example. A monopoly
has cost function C(x) = 2x; that is, MC = 2. It faces inverse demand
P (x) = 100 − x.

Exercise 2.2.2: Verify that this example satisfies the standard assumptions for linear
pricing.

Marginal revenue under linear pricing is P(x) + xP′(x), which equals 100 − x + x × (−1) = 100 − 2x. Equating MR with MC yields 100 − 2x = 2; hence, x∗M = 49. The profit-maximizing price is 100 − 49 = 51. Profit is revenue minus cost; that is, 51 × 49 − 2 × 49 = 2401. Consumer surplus is $\int_0^{49}(100 - t - 51)\,dt = \tfrac12 \times 49^2$. Total welfare, however, is maximized by equating price and marginal cost: P(x) = 100 − x = 2 = MC. So x∗W = 98. Deadweight loss is, thus,
$$\int_{49}^{98}\bigl(\underbrace{100 - t}_{P(x)} - \underbrace{2}_{MC}\bigr)\,dt = \left[98t - \frac{t^2}{2}\right]_{49}^{98} = 1200.5\,.$$
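The numbers in this example are easily verified by direct computation:

```python
# Direct verification of the example: C(x) = 2x and P(x) = 100 - x.
P = lambda x: 100.0 - x
MC = 2.0

x_m = 49.0                       # solves MR(x) = 100 - 2x = 2 = MC
price = P(x_m)                   # 51
profit = price * x_m - MC * x_m  # 2401

def integrate(f, a, b, n=100000):
    # midpoint rule; exact for the linear integrands used here
    step = (b - a) / n
    return sum(f(a + (i + 0.5) * step) for i in range(n)) * step

cs = integrate(lambda t: P(t) - price, 0.0, x_m)  # consumer surplus
x_w = 98.0                                        # solves P(x) = MC
dwl = integrate(lambda t: P(t) - MC, x_m, x_w)    # deadweight loss

assert price == 51.0 and profit == 2401.0
assert abs(cs - 0.5 * 49 ** 2) < 1e-6   # 1200.5
assert abs(dwl - 1200.5) < 1e-6
```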

Exercise 2.2.3: Prove that if inverse demand is an affine function (i.e., the function for a line in units-price space), then marginal revenue is also affine with a slope that is twice as steep as inverse demand.
Exercise 2.2.4: Prove that if inverse demand is P(x) = a − bx and MC = c, a constant, then x∗M = (a − c)/(2b) and P(x∗M) = (a + c)/2. (Note: the smallest price at which demand is zero is called the choke price. Hence, this result is sometimes summarized as: the profit-maximizing price with linear demand and constant marginal cost is the average of the choke price and marginal cost.)
Exercise 2.2.5: Prove that profit under linear pricing is $\frac{1}{b}\left(\frac{a - c}{2}\right)^2$ under the assumptions of the previous exercise.
Exercise 2.2.6: Prove that consumer surplus under linear pricing is $\frac{(a - c)^2}{8b}$ under the assumptions of Exercise 2.2.4.
Exercise 2.2.7: Derive the general condition for deadweight loss for affine demand and constant marginal cost (i.e., under the assumptions of Exercise 2.2.4).

An Application 2.3
We often find linear pricing in situations that don’t immediately appear to be
linear-pricing situations. For example, suppose that a risk-neutral seller faces
a single buyer. Let the seller have a single item to sell (e.g., an artwork). Let
the buyer’s value for this artwork be v. The buyer knows v, but the seller does
not. All the seller knows is that v is distributed according to the differentiable distribution function F(·). That is, the probability that v ≤ v̂ is F(v̂). Assume
F ′ (·) > 0 on the support of v.8 Let the seller’s value for the good—her cost—be

8 The support of a distribution is, roughly speaking, the set of values of the random

variable that can occur with positive probability. If the distribution is discrete, this definition
is precise. For instance, if the random variable Y equals 0 if a coin lands tails and equals 1 if
the coin lands heads, then the support is {0, 1}. For a non-discrete distribution—call it G(·)—

c. Assume F (c) < 1. Finally, assume the hazard function associated with F (·)
satisfies mhrp.
Suppose that the seller wishes to maximize her expected profit. Suppose,
too, that she makes a take-it-or-leave-it (tioli) offer to the buyer; that is, the
seller quotes a price, p, at which the buyer can purchase the good if he wishes.
If he doesn’t wish to purchase at that price, he walks away and there is no trade.
Clearly, the buyer buys if and only if p ≤ v; hence, the probability of a sale, x,
is given by the formula x = 1 − F (p). The use of “x” is intentional—we can
think of x as the (expected) quantity sold at price p. Note, too, that, because
the formula x = 1 − F (p) relates quantity sold to price charged, it is a demand
curve. Of course, 1 − F (p) is also a survival function, so we again see that
demand curves are effectively survival functions.
The seller’s expected cost is cx; that is, with probability x she forgoes pos-
session of an object that she values at c. Marginal cost is, therefore, c.
Utilizing a variant of the Lerner markup rule, expression (2.8), we have
$$p - c = \frac{1 - F(p)}{F'(p)}\,. \tag{2.14}$$
The seller’s profit-maximizing price is the price that solves (2.14).
For example, if v is distributed uniformly on [0, 1], then F(v) = v and F′(v) = 1. Expression (2.14) becomes
$$p - c = \frac{1 - p}{1}\,.$$
Consequently, the profit-maximizing price is (1 + c)/2. Note: Because the uniform distribution is equivalent to linear demand and we have constant marginal cost, we know the profit-maximizing price is the average of the choke price and marginal cost (recall the exercises at the end of the last section); that is, here, the average of 1 and c.
Note that there is a deadweight loss: Efficiency requires that the good change hands whenever v > c; but given linear pricing, the good only changes hands when
$$v \ge \frac{1 + c}{2} > c$$
(c, recall, is less than 1 because we assumed F(c) < 1). For instance, suppose c = 1/2; then the profit-maximizing price is 3/4. So although the good should change hands whenever v ≥ 1/2, half of these times it doesn’t.

a value y is in the support of G(·) if there exists no δ > 0 such that G(y + δ) − G(y − δ) = 0. For example, let Y be a random variable and suppose for any y ∈ [0, 1/3), Pr{Y ≤ y} = 3y/2; and for any y ∈ (2/3, 1], Pr{1/3 < Y ≤ y} = 3y/2 − 1. Then
$$G(y) = \begin{cases} \frac{3}{2}y\,, & \text{if } y < \frac{1}{3}\,,\\ \frac{1}{2}\,, & \text{if } \frac{1}{3} \le y \le \frac{2}{3}\,,\\ \frac{3}{2}y - \frac{1}{2}\,, & \text{if } \frac{2}{3} < y \le 1\,,\end{cases}$$
and the support is [0, 1/3] ∪ [2/3, 1].
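The take-it-or-leave-it example can be confirmed by brute force. The sketch below grid-searches expected profit (p − c)(1 − F(p)) = (p − c)(1 − p) for the uniform case and recovers p∗ = (1 + c)/2:

```python
def optimal_price(c, steps=100000):
    # expected profit when v ~ Uniform[0, 1]: demand is x = 1 - F(p) = 1 - p
    profit = lambda p: (p - c) * (1.0 - p)
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=profit)

for c in (0.0, 0.25, 0.5):
    assert abs(optimal_price(c) - (1 + c) / 2) < 1e-4

# With c = 1/2 the good trades only when v >= 3/4, although efficiency calls
# for trade whenever v >= 1/2: half of the efficient trades are lost.
assert optimal_price(0.5) == 0.75
```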

Pass-Through 2.4
A comparative statics question worth asking is: what is the consequence of a shock, such as an increase in the firm’s factor costs or the imposition of a sales tax, on price, quantity traded, and welfare?

Basic Comparative Statics

Before launching into a specific comparative-statics analysis of linear pricing, it


is worth first establishing some general results for comparative-statics analysis.
Consider f : ℝᵐ × ℝⁿ → ℝ. Suppose, for any y ∈ ℝⁿ, that the program
$$\max_{x \in \mathbb{R}^m} f(x, y)$$

has a solution. Let x̂ be the solution to that program for a particular vector y
and let x̂′ be the solution to that program for another vector y′ . Then, by the
definition of a maximum (i.e., there is nothing bigger), we have

f (x̂, y) ≥ f (x̂′ , y) and


(2.15)
f (x̂′ , y′ ) ≥ f (x̂, y′ ) .

This insight is often referred to as revealed preference. This term arises because
if a decision maker chooses x̂ when conditions are y, she reveals she prefers x̂
to all other x she could have chosen under conditions y, including an x she
might choose under different conditions (e.g., the x̂′ she would choose were the
conditions y′ ). Expressions such as (2.15) are often said to “follow by revealed
preference” and this line of argumentation is often called a “revealed-preference
argument.”
Let f : R² → R. If, for any x > x′ and y > y′, we have

    f(x, y) − f(x′, y) > f(x, y′) − f(x′, y′) ,   (2.16)

then we say that f exhibits increasing differences in y. Expression (2.16) can be read as saying that the greater is y, the bigger will be the change in f when the x term is increased. If f is differentiable in its first argument, then (2.16) implies

    ∂f(x, y)/∂x > ∂f(x, y′)/∂x   (2.17)

almost everywhere if y > y′ (proof: divide both sides of (2.16) by x − x′ and take the limit as x′ → x). Expression (2.17) states that the marginal effect of x is greater for all x the greater is y; that is, the marginal effect as a function of x (i.e., the derivative ∂f(x, y)/∂x) is shifted up by an increase in y. Finally, if ∂f(x, y)/∂x is differentiable in y, expression (2.17) implies

    ∂²f(x, y)/∂y∂x > 0   (2.18)

almost everywhere (proof: move ∂f(x, y′)/∂x to the lhs of (2.17), divide by y − y′, and take the limit as y′ → y). This last expression is sometimes called a cross-partial condition. When f is sufficiently differentiable, the cross-partial condition is also sufficient for f to exhibit increasing differences:
Lemma 2.5 Suppose f : (x, x̄) × (y, ȳ) → R is at least twice continuously
differentiable in each of its arguments.9 If f satisfies the cross-partial condition
(2.18), then f exhibits increasing differences.

Proof: Let x > x′ and y > y′. From (2.18), it follows that

    ∫_{y′}^{y} ∫_{x′}^{x} ∂²f(w, z)/∂y∂x dw dz > 0 .

Calculating the inner integral, it follows that

    ∫_{y′}^{y} ( ∂f(x, z)/∂y − ∂f(x′, z)/∂y ) dz > 0 .

Calculating the last integral, it follows that

    ( f(x, y) − f(x′, y) ) − ( f(x, y′) − f(x′, y′) ) > 0 ,

which clearly implies (2.16).
Exercise 2.4.1: Suppose the function f : R² → R exhibits decreasing differences; that is, for any x > x′ and y > y′, we have

    f(x, y) − f(x′, y) < f(x, y′) − f(x′, y′) .   (2.19)

If the cross-partial derivative of f exists for all x and y, what is its sign?

Increasing (and decreasing) differences play a huge role in comparative-statics analysis:
Theorem 2.1 Let f : R² → R. Suppose f exhibits increasing differences with respect to its second argument. If x̂ solves max_x f(x, y) and x̂′ solves max_x f(x, y′), where y > y′, then x̂ ≥ x̂′.
Proof: Suppose, contrary to the theorem, that x̂ < x̂′. By revealed preference, we have

    f(x̂, y) ≥ f(x̂′, y)   and   f(x̂′, y′) ≥ f(x̂, y′) .
9 Note (x, x̄) and (y, ȳ) denote intervals. In other words, the domain is a rectangle in which each side is parallel to one axis and perpendicular to the other. The lower-left corner is (x, y) and the upper-right corner is (x̄, ȳ). This condition ensures we don't integrate across points not in the domain of f.
Those expressions can be combined as

    f(x̂′, y) − f(x̂, y) ≤ 0 ≤ f(x̂′, y′) − f(x̂, y′) .
But if x̂′ > x̂, then this last expression contradicts the fact that f exhibits increasing differences with respect to its second argument (y, recall, is greater than y′). Reductio ad absurdum, we can conclude that x̂ ≥ x̂′.

In words, Theorem 2.1 states that if increasing y raises the marginal return
from x (i.e., f exhibits increasing differences in y), then the value of x that
maximizes f is nondecreasing in y.
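This monotone comparative-statics result can be illustrated numerically; a sketch with the assumed test function f(x, y) = xy − x²/2 (illustrative, not from the notes), whose cross-partial is 1 > 0:

```python
import numpy as np

# Illustration of Theorem 2.1 with the assumed test function
# f(x, y) = x*y - x**2/2, whose cross-partial d2f/(dy dx) = 1 > 0,
# so f exhibits increasing differences in y.
xs = np.linspace(0.0, 5.0, 50_001)

def argmax_x(y):
    return xs[int(np.argmax(xs * y - xs**2 / 2.0))]

x_low, x_high = argmax_x(1.0), argmax_x(2.0)
print(x_low, x_high)   # ~1.0 and ~2.0: the maximizer is nondecreasing in y
```

Here the maximizer is exactly x = y, so raising y from 1 to 2 raises the argmax from 1 to 2, as the theorem requires.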
In some contexts we want to be able to say definitively that the maximizer
is increasing in the second argument. This is relatively straightforward with
differentiable functions:
Theorem 2.2 Let X be a closed and bounded subset of R and let f : X × R → R be at least twice differentiable in both arguments. Suppose, too, that f satisfies the cross-partial condition (2.18). If x̂ solves max_x f(x, y) and x̂′ solves max_x f(x, y′), where y > y′, and at least one of x̂′ and x̂ is an interior solution, then x̂ > x̂′.
Proof: From Lemma 2.5 and Theorem 2.1, it is sufficient to show that x̂ ≠ x̂′. Suppose x̂′ is an interior solution. It follows that

    ∂f(x̂′, y′)/∂x = 0

given that x̂′ is in the interior of X. If x̂ = x̂′, then

    ∂f(x̂, y′)/∂x = 0   (2.20)

and x̂ is in the interior of X. By assumption, it follows that

    0 < ∫_{y′}^{y} ∂²f(x̂, z)/∂x∂y dz = ∂f(x̂, y)/∂x − ∂f(x̂, y′)/∂x = ∂f(x̂, y)/∂x ,   (2.21)

where the last equality follows from (2.20). But ∂f(x̂, y)/∂x ≠ 0 contradicts the necessary first-order condition for an interior point to maximize f(·, y). Reductio ad absurdum, we can conclude that x̂ ≠ x̂′. The proof when x̂ is an interior solution is similar and left to the reader.
Figure 2.3 illustrates the logic behind Theorem 2.2. The figure also shows why we require that at least one solution be an interior solution: if, for instance, the values of x were restricted to lie to the left of the dotted line in Figure 2.3, then the optimal x would be at that dotted line for both y′ and y.

Exercise 2.4.2: Suppose f : R² → R exhibits decreasing differences (see Exercise 2.4.1 above). Prove the following theorem:
[Figure 2.3: Illustration of the logic behind Theorem 2.2 (y > y′). Curves shown: ∂f(·, y′)/∂x, with zero at x̂′, and ∂f(·, y)/∂x.]

Theorem 2.3 Let f : R² → R. Suppose f exhibits decreasing differences with respect to its second argument. If x̂ solves max_x f(x, y) and x̂′ solves max_x f(x, y′), where y > y′, then x̂ ≤ x̂′.

Exercise 2.4.3: Prove the following theorem:

Theorem 2.4 Let X be a closed and bounded subset of R and let f : X × R → R be at least twice differentiable in both arguments. Suppose, too, that f satisfies

    ∂²f(x, y)/∂y∂x < 0

for all x and y. If x̂ solves max_x f(x, y) and x̂′ solves max_x f(x, y′), where y > y′, and at least one of x̂′ and x̂ is an interior solution, then x̂ < x̂′.

A cost shock

Suppose the firm's cost of producing x is C(x, ω), where ω ∈ R is a parameter reflecting some condition or state relevant to the firm's costs (e.g., ω is the price of a necessary input). Suppose cost exhibits increasing differences in ω; that is, if x > x′ and ω > ω′, then

    C(x, ω) − C(x′, ω) > C(x, ω′) − C(x′, ω′) .   (2.22)

As previously noted, if C(·, ·) is twice differentiable in its arguments, then this condition is equivalent to

    ∂²C(x, ω)/∂ω∂x > 0 ;   (2.23)

that is, to the assumption that marginal cost is increasing in ω. Interpret a change in ω as a cost shock. Observe that if C(x, ω) exhibits increasing differences, then the function −C exhibits decreasing differences (proof: multiply (2.22) by −1 and suitably rearrange terms).
Proposition 2.3 Given expression (2.22), if ω > ω ′ , then the amount the firm
sells when the state is ω is not greater than the amount it sells when the state
is ω ′ . If all functions are at least twice differentiable and a positive amount is
sold when the state is ω ′ , then that amount is strictly greater than the amount
sold when the state is ω. (In this latter case, the claim is that dx∗M /dω < 0.)

Proof: Profit is R(x) − C(x, ω), which exhibits decreasing differences in ω (revenue does not depend on ω and −C exhibits decreasing differences). That the profit-maximizing quantity is no greater when the state is ω than when it is ω′ follows from Theorem 2.3. That, in the latter case, dx∗M/dω < 0 follows from Theorem 2.4.
Proposition 2.3 implies, because demand curves slope down, that an upward cost shock causes an increase in the price.

Corollary 2.2 Given expression (2.23), the profit-maximizing price is increasing in ω (unless the firm does best to shut down).
What about welfare? Welfare, recall, is

    ∫_0^{x∗M} P(x)dx − C(x∗M, ω) .

The derivative with respect to ω is

    ( P(x∗M) − MC(x∗M, ω) ) × dx∗M/dω − ∂C(x∗M, ω)/∂ω ,   (2.24)

where the first factor, P(x∗M) − MC(x∗M, ω), is positive because output under linear pricing is less than the welfare-maximizing amount; the second factor, dx∗M/dω, is negative by Proposition 2.3; and the sign of the last term is ambiguous because, although an increase in ω increases variable costs, it is possible that it is also reducing fixed costs. If, however, ω affects variable cost only—as would be the case, for example, if ω were the price of some component of the final good—then (2.24) implies that a cost shock would be welfare reducing.
Although it is possible that a cost shock can have an ambiguous effect on welfare, any shock that raises price must make buyers worse off; that is, consumer surplus must fall with ω. To see this, recall that we can express consumer surplus as

    −∫_p^∞ (b − p) X′(b) db   (2.25)

(see Lemma 1.3). The derivative of (2.25) with respect to p is, applying Leibniz's rule,10

    ∫_p^∞ X′(b) db < 0 ,

where the inequality follows because demand curves slope down. To summarize:

Proposition 2.4 If a cost shock raises marginal cost (i.e., expression (2.23) holds), then equilibrium consumer surplus falls in response to the shock.

A final question we might ask is whether a change in ω affects marginal cost more or less than it affects the profit-maximizing price. To be precise, does the markup, as a percentage of price, increase or decrease as ω increases? From the Lerner markup rule (2.7), we know that the markup as a proportion of price equals 1/ǫ, where ǫ is the elasticity. Given that x∗M is decreasing in ω, our question boils down to whether elasticity is increasing or decreasing with output. If demand is becoming more elastic as output falls (as would be the case if demand were affine), then the rightmost term of the Lerner markup rule (2.7) is getting smaller, which means the proportion of price that is a markup must be falling as ω increases. In other words, if demand is becoming more elastic as quantity falls, then an m% increase in marginal cost yields less than an m% increase in price.

[Sidebar: Incidence of a Cost Shock: If demand is more elastic at lower quantities, then an increase in marginal cost is not fully reflected in price.]

Exercise 2.4.4: Prove that an increase in ω causes marginal cost to increase by a greater percentage than the profit-maximizing price if inverse demand is log concave (assuming, of course, the cost shock doesn't cause the firm to shut down).
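A numerical illustration of this incidence result; a sketch assuming affine demand X(p) = 1 − p and constant marginal cost (illustrative values, not from the notes):

```python
import numpy as np

# With assumed affine demand X(p) = 1 - p and constant marginal cost c,
# the profit-maximizing price is p* = (1 + c)/2, so only half of a cost
# increase is passed through.
prices = np.linspace(0.0, 1.0, 100_001)

def p_star(c):
    return prices[int(np.argmax((1.0 - prices) * (prices - c)))]

c0, c1 = 0.40, 0.44          # a 10% increase in marginal cost
p0, p1 = p_star(c0), p_star(c1)
print(p0, p1)                # ~0.70 and ~0.72
print((p1 - p0) / p0)        # ~0.029: price rises by under 3%, not 10%
```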

Sales Tax

There are two kinds of sales taxes. One is an ad valorem tax, which means the
tax is based on the price; the other is an excise tax, which means the tax is
based on the quantity (i.e., $k per unit). Because an excise tax is equivalent to
a cost shock, we’ll consider only ad valorem taxes here.

Exercise 2.4.5: Why is an excise tax equivalent to a cost shock?

If the tax rate is τ, then the consumer pays p(1 + τ) if the posted price is p. The firm gets p and the government gets pτ. Observe that, at a price of p, the firm's demand is X(p(1 + τ)) because consumers care about the amount they

10 Recall that Leibniz's rule tells us that

    d/dz ∫_{α(z)}^{β(z)} g(w, z)dw = g(β(z), z)β′(z) − g(α(z), z)α′(z) + ∫_{α(z)}^{β(z)} ∂g(w, z)/∂z dw ,

where α(·), β(·), and g(w, ·) are all differentiable functions.
pay, not the price per se. Inverting, we have P(x) = p(1 + τ); hence, from the firm's perspective its inverse demand curve is

    P̃(x) = P(x)/(1 + τ) .   (2.26)

The firm's profit is

    xP̃(x) − C(x) = (1/(1 + τ)) xP(x) − C(x) .

Using the envelope theorem, it is immediate that the firm's equilibrium (maximum) profit is falling in τ.
Observe that the cross-derivative of the firm’s profit with respect to x and
τ is negative; that is, the firm’s marginal profit with respect to output is falling
in τ . From Theorem 2.4, the firm sells less the greater is τ . To summarize our
results to this point:

Proposition 2.5 As the rate of the ad valorem tax increases, the firm’s profits
and output both fall (assuming they are not both zero).

Proposition 2.5 explains why businesses oppose increases in sales taxes. Even
though the statutory incidence could be on the consumers (they physically pay
the tax), businesses lose from the tax.
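Proposition 2.5 can be illustrated numerically; a sketch with assumed primitives P(x) = 1 − x and C(x) = 0.2x (illustrative, not from the notes):

```python
import numpy as np

# Sketch of Proposition 2.5 with assumed primitives P(x) = 1 - x and
# C(x) = 0.2*x: maximized profit x*P(x)/(1 + tau) - C(x) falls in tau.
xs = np.linspace(0.0, 1.0, 100_001)

def max_profit(tau):
    return float(np.max(xs * (1.0 - xs) / (1.0 + tau) - 0.2 * xs))

pi0, pi1 = max_profit(0.0), max_profit(0.1)
print(pi0, pi1)   # ~0.16, then a strictly smaller value at the higher rate
```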
The posted price is P̃ (x∗M ). How does that vary with the tax rate? Because
demand curves slope down and the firm produces less, the numerator of (2.26)
increases as τ increases. The denominator, however, is also increasing. Consequently, a definitive answer is not possible in general. We can, however, derive
conditions under which a definitive answer is possible.

Lemma 2.6 Suppose that marginal cost is a constant, c. Suppose demand, X(·), is such that

    ψ(p|τ, τ̃) ≡ X(p(1 + τ)) / X(p(1 + τ̃))

is monotone in p for any ad valorem tax rates τ and τ̃, τ < τ̃. If ψ(·|τ, τ̃) is increasing, then the posted price decreases with the tax rate; if it is decreasing, then the posted price increases with the tax rate.

Proof: Let p and p̃ be the profit-maximizing posted prices when the tax rates are τ and τ̃, respectively. By revealed preference, we have

    X(p̃(1 + τ̃))(p̃ − c) ≥ X(p(1 + τ̃))(p − c)   (2.27)

and

    X(p(1 + τ))(p − c) ≥ X(p̃(1 + τ))(p̃ − c) .   (2.28)

Expressions (2.27) and (2.28) can be combined as

    X(p̃(1 + τ̃)) / X(p(1 + τ̃)) ≥ (p − c)/(p̃ − c) ≥ X(p̃(1 + τ)) / X(p(1 + τ)) .   (2.29)

Expression (2.29) holds if and only if

    X(p(1 + τ)) / X(p(1 + τ̃)) ≥ X(p̃(1 + τ)) / X(p̃(1 + τ̃)) .   (2.30)

From (2.30), we see that p is less than or greater than p̃ whenever the function

    X(z(1 + τ)) / X(z(1 + τ̃))

is a decreasing or increasing function of z, respectively.
For example, if demand is affine, X(p) = a − bp, then simple calculations reveal
that ψ ′ (·|τ, τ̃ ) has the same sign as ab(τ̃ − τ ) > 0; hence, if demand is affine and
marginal cost is constant, the posted price is falling as the tax rate increases.
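This affine-demand conclusion can be verified by direct computation; a sketch assuming X(p) = 1 − p and c = 0.2 (illustrative values):

```python
import numpy as np

# Sketch with assumed primitives: affine demand X(p) = 1 - p and
# constant marginal cost c = 0.2. The firm posts p to maximize
# X(p(1 + tau)) * (p - c), as in the text.
prices = np.linspace(0.0, 1.0, 200_001)
c = 0.2

def posted_price(tau):
    demand = np.clip(1.0 - prices * (1.0 + tau), 0.0, None)
    return prices[int(np.argmax(demand * (prices - c)))]

p_low_tax, p_high_tax = posted_price(0.0), posted_price(0.1)
print(p_low_tax, p_high_tax)   # ~0.600 and ~0.555: the posted price falls
print(p_high_tax * 1.1)        # ~0.610: yet the amount consumers pay rises
```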
We also have
Proposition 2.6 Suppose that marginal cost is a constant, c. If demand, X(·),
is log-concave, then an increase in the ad valorem tax rate causes the posted price
to fall.
Proof: In light of Lemma 2.6, we need to show that ψ(·|τ, τ̃) is increasing when τ < τ̃. Observe that

    ∂²log X(p(1 + τ))/∂τ∂p = X′(p(1 + τ))/X(p(1 + τ)) + (1 + τ) ∂/∂τ [ X′(p(1 + τ))/X(p(1 + τ)) ] < 0 ,   (2.31)

where the sign follows because the first term on the right-hand side is negative because demand curves slope down and the second term on the right-hand side is negative because X(·) is log concave.11 Expression (2.31) implies that log X(p(1 + τ)) exhibits decreasing differences in p and τ; that is, if p > p̃ and τ < τ̃ we have

    log X(p(1 + τ)) − log X(p̃(1 + τ)) > log X(p(1 + τ̃)) − log X(p̃(1 + τ̃)) .

Rearranging, this expression yields

    log X(p(1 + τ)) − log X(p(1 + τ̃)) > log X(p̃(1 + τ)) − log X(p̃(1 + τ̃)) .
11 Exercise: Why does f(·) log concave imply that f′(z)/f(z) is a decreasing function of z?
Hence,

    X(p(1 + τ)) / X(p(1 + τ̃)) > X(p̃(1 + τ)) / X(p̃(1 + τ̃)) .

Because p, p̃, τ, and τ̃ were arbitrary, this implies ψ(·|τ, τ̃) is increasing for all τ and τ̃, τ < τ̃.

Regardless of what happens to the posted price, the amount the consumers
pay goes up with the tax rate: The consumers pay P (x∗M ), x∗M falls as the tax
rate rises, and demand curves slope down. Observe, too, that the consumers’
demand curve—that is, their marginal-benefit schedule—remains P (·). Given
that they consume less as the tax rate rises, it follows that their consumer
surplus must fall as the tax rate rises. To summarize:

Proposition 2.7 As the rate of the ad valorem tax increases, the total amount
consumers pay per unit goes up and their consumer surplus goes down.

Given that the firm's profits and consumers' surplus fall as the tax rate rises, it must be that their collective welfare is falling as the tax rate rises. This is not to say, however, that overall welfare is falling, because the government presumably uses the revenue it raises, τP̃(x∗M)x∗M, for some purpose. Without knowing what that purpose is, it is impossible to say what happens to overall welfare. One question we can answer, however, is whether the firm's and consumers' lost welfare is greater than the revenue taken in by the government. The answer is yes, as the following analysis shows. Total value created (i.e., consumer surplus, firm profit, and government revenue) is

    V ≡ ∫_0^{x∗M} ( P(x) − C′(x) ) dx .

Hence, dV/dx∗M = P(x∗M) − C′(x∗M) > 0 (the demand curve lies above the marginal-cost curve). Consequently, a reduction in x∗M, as will result from increasing the tax rate, reduces total value. Given total value is reduced, it must be that tax revenue is less than the losses suffered by consumers and the firm. We have proved:

Proposition 2.8 An increase in the ad valorem tax rate reduces total value
created.
Lecture Note 3: First-degree Price Discrimination
We saw in Section 2.2 that linear pricing “leaves money on the table,” in the
sense that there are gains to trade—the deadweight loss—that are not realized.
There is money to be made if the number of units traded can be increased from
x∗M to x∗W .
Why has money been left on the table? The answer is that trade benefits both buyers and seller. The seller profits to the extent that the revenue
received exceeds cost and the buyers profit to the extent that their benefit enjoyed exceeds their expenditure (cost). The seller, however, does not consider
the positive externality she creates for the buyers by selling them goods. The
fact that their marginal benefit schedule (i.e., inverse demand) lies above their
marginal cost (i.e., the price the seller charges) is irrelevant to the seller insofar
as she doesn’t capture any of this gain enjoyed by the buyers. Consequently,
she underprovides the good. This is the usual problem with positive externalities: The decision maker doesn't internalize the benefits others derive from her
action, so she does too little of it from a social perspective. In contrast, were
the action decided by a social planner seeking to maximize social welfare, then
more of the action would be taken because the social planner does consider the
externalities created. The cure to the positive externalities problem is to change
the decision maker’s incentives so she effectively faces a decision problem that
replicates the social planner’s problem.
One way to make the seller internalize the externality is to give her the social benefit of each unit sold. Recall the marginal benefit of the xth unit is P(x). So let the seller get P(1) if she sells one unit, P(1) + P(2) if she sells two, P(1) + P(2) + P(3) if she sells three, and so forth. Given that her revenue from x units is ∫_0^x P(z)dz, her marginal revenue schedule is P(x). Equating marginal revenue to marginal cost, she produces x∗W, the welfare-maximizing quantity.
In general, allowing the seller to vary price unit by unit, so as to march
down the demand curve, is impractical. But, as we will see, there are ways for
the seller to effectively duplicate marching down the demand curve. When the
seller can march down the demand curve or otherwise capture all the surplus,
she’s said to be engaging in first-degree price discrimination. This is sometimes
called perfect price discrimination.

3.1 Two-Part Tariffs
Consider a seller who faces a single buyer with inverse demand p(x). Let the seller offer a two-part tariff: the buyer pays

    T(x) = 0        if x = 0 ,
           px + f   if x > 0 ,   (3.1)

where p is the price per unit and f is the entry fee, the amount the buyer must pay to have access to any units. The scheme in (3.1) is called a two-part tariff because there are two parts to what the buyer pays (the tariff): the unit price and the entry fee.
The buyer will buy only if f is not set so high that he loses all his consumer surplus. That is, he buys provided

    f ≤ ∫_0^x ( p(z) − p(x) ) dz = ∫_0^x p(z)dz − xp(x) .   (3.2)
Constraints like (3.2) are known as participation constraints or individual-rationality (IR) constraints. These constraints often arise in pricing schemes and other mechanism-design settings. They reflect that, because participation in the scheme or mechanism is voluntary, it must be induced.
The seller's problem is to choose x (effectively, p) and f to maximize profit subject to (3.2); that is, to maximize

    f + xp(x) − C(x)   (3.3)

subject to (3.2). Observe that (3.2) must bind: if it didn't, then the seller could raise f slightly, keeping x fixed, thereby increasing her profits without violating the constraint. Note this means that the entry fee is set equal to the consumer surplus that the consumer receives. Because (3.2) is binding, we can substitute it into (3.3) to obtain the unconstrained problem:

    max_x ∫_0^x p(z)dz − xp(x) + xp(x) − C(x) .

The first-order condition is p(x) = MC(x); that is, the profit-maximizing quantity is the welfare-maximizing quantity. The unit price is p(x∗W) and the entry fee is ∫_0^{x∗W} p(t)dt − x∗W p(x∗W).

Proposition 3.1 A seller who sells to a single buyer with known demand does
best to offer a two-part tariff with the unit price set to equate demand and
marginal cost and the entry fee set equal to the buyer’s consumer surplus at that
unit price. Moreover, this solution maximizes welfare.
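A numerical sketch of Proposition 3.1, assuming the illustrative primitives p(x) = 1 − x and c = 0.2 (not from the notes):

```python
import numpy as np

# Sketch of Proposition 3.1 with assumed primitives: inverse demand
# p(x) = 1 - x and constant marginal cost c = 0.2. At unit price p the
# buyer demands x = 1 - p and enjoys triangle surplus (1 - p)*x/2,
# which the seller extracts as the entry fee.
c = 0.2
p = np.linspace(0.0, 1.0, 100_001)
x = 1.0 - p
fee = (1.0 - p) * x / 2.0        # consumer surplus at unit price p
profit = fee + (p - c) * x       # entry fee plus margin on units sold

p_best = p[int(np.argmax(profit))]
print(p_best, profit.max())   # ~0.2 and ~0.32 = (1 - c)**2/2, all of welfare
```

As the proposition asserts, the profit-maximizing unit price equals marginal cost and the seller captures the entire social surplus.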

Of course, a seller rarely faces a single buyer. If, however, the buyers all
have the same demand, then a two-part tariff will also achieve efficiency and
allow the seller to achieve the maximum possible profits. Let there be J buyers, all of whom are assumed to have the same demand curve. As before, let P(·) denote aggregate inverse demand. The seller's problem in designing the optimal two-part tariff is

    max_{f,x} Jf + xP(x) − C(x)   (3.4)

subject to consumer participation,

    f ≤ csj(P(x)) ,   (3.5)

where csj(p) denotes the jth buyer's consumer surplus at price p. Because the buyers are assumed to have identical demand, the subscript j is superfluous and constraint (3.5) is either satisfied for all buyers or it is satisfied for no buyer. As before, (3.5) must bind; otherwise the seller could profitably raise f. Substituting the constraint into (3.4), we have

    max_x J × cs(P(x)) + xP(x) − C(x) ,

which, because aggregate consumer surplus is the sum of the individual surpluses (recall Proposition 1.2 on page 9), can be rewritten as

    max_x [ ∫_0^x P(z)dz − xP(x) ] + xP(x) − C(x) ,

where the bracketed term is aggregate consumer surplus. The solution is x∗W. Hence, the unit price is P(x∗W) and the entry fee, f, is

    (1/J) ( ∫_0^{x∗W} P(z)dz − x∗W P(x∗W) ) .

Proposition 3.2 A seller who sells to J buyers, all with identical demands,
does best to offer a two-part tariff with the unit price set to equate demand and
marginal cost and the entry fee set equal to 1/Jth of aggregate consumer surplus
at that unit price. This maximizes social welfare and allows the seller to capture
all of social welfare.
We see many examples of two-part tariffs in real life. A classic example is
an amusement park that charges an entry fee and a per-ride price (the latter,
sometimes, being set to zero). Another example is a price for a machine (e.g.,
a Polaroid instant camera or a punchcard sorting machine), which is a form of
entry fee, and a price for an essential input (e.g., instant film or punchcards),
which is a form of per-unit price.1 Because, in many instances, the per-unit price
1 Such schemes can also be seen as a way of providing consumer credit: the price of the capital good (e.g., camera or machine) is set lower than the profit-maximizing price (absent liquidity constraints), with the difference being essentially a loan to the consumer that he repays by paying more than the profit-maximizing price (absent a repayment motive) for the ancillary good (e.g., film or punchcards).
is set to zero, some two-part tariffs might not be immediately obvious (e.g., an annual service fee that allows unlimited "free" service calls, a telephone calling plan in which the user pays so much per month for unlimited "free" phone calls, or an amusement park that allows unlimited rides with paid admission).

[Sidebar: Packaging: A disguised two-part tariff.]

Packaging is another way to design a two-part tariff. For instance, a grocery store could create a two-part tariff in the following way. Suppose that, rather than being sold in packages, sugar were kept in a large bin and customers could purchase as much or as little as they liked (e.g., like fruit at most groceries or as is actually done at "natural" groceries). Suppose that, under the optimal two-part tariff, each consumer would buy x pounds, which would yield him surplus of cs, which would be captured by the store using an entry fee of f = cs. Alternatively, but equivalently, the grocery could package sugar. Each bag of sugar would have x pounds and would cost px + cs per bag. Each consumer would face the binary decision of whether to buy 0 pounds or x pounds. Each consumer's total benefit from x pounds is px + cs, so each would just be willing to pay px + cs for the package of sugar. Because the entry fee is paid on every x-pound bag, the grocery has devised a (disguised) two-part tariff that is also arbitrage-proof. In other words, packaging—taking away consumers' ability to buy as much or as little as they wish—can represent an arbitrage-proof way of employing a two-part tariff.

The Two-Instruments Principle

When the seller was limited to just one price parameter, p—that is, engaged in linear pricing—she made less money than when she controlled two parameters,
p and f . One way to explain this is that a two-part tariff allows the seller to
face the social planner’s problem of maximizing welfare and, moreover, capture
all welfare. Because society can do no better than maximize welfare and the
seller can do no better than capture all of social welfare, she can’t do better
than a two-part tariff in this context.
But this raises the question of why she couldn't do as well with a single price parameter. Certainly, she could have maximized social welfare; all she needed to do was set P(x) = MC(x). But the problem with that solution is that there is no way for her to capture all the surplus she generates. If she had an entry fee, then she could use it to capture the surplus; but with linear pricing we've forbidden her that instrument.
The problem with using just the unit price is that we're asking one instrument to do two jobs.
surplus for the seller. Only the first has anything to do with efficiency, so the
fact that the seller uses it for a second purpose is clearly going to lead to a
distortion. If we give the seller a second instrument, the entry fee, then she has
two instruments for the two jobs and she can “assign” each job an instrument.
This is a fairly general idea—efficiency is improved by giving the mechanism
designer more instruments—call this the two-instruments principle.
[Figure 3.1: A general analysis of a two-part tariff. Shown: the indifference curve Y(x), the entry fee f (the distance I − ȳ), and the tangency quantity x∗.]

Two-Part Tariffs without Apology

It might seem that the analysis of two-part tariffs is dependent on our assump-
tion of quasi-linear utility. In fact, this is not the case. To see this, consider a
single consumer with utility u(x, y). Normalize the price of y to 1. Assume the
individual has income I. Define Y (x) to be the indifference curve that passes
through the bundle (0, I); that is, the bundle in which the consumer purchases
only the y-good. See Figure 3.1. Assume MC = c.
Consider the seller of the x good. If she imposes a two-part tariff, then she transforms the consumer's budget constraint into the union of the vertical line segment {(0, y) | I − f ≤ y ≤ I} and the line y = (I − f) − px, x > 0. If we define ȳ = I − f, then this budget constraint is the thick dark curve shown in Figure 3.1. Given that the consumer can always opt to purchase none of the x good, the consumer can't be put below the indifference curve through (0, I); that is, below Y(x). For a given p, the seller increases profit by raising f, the entry fee. Hence, the seller's goal is to set f so that this kinked budget constraint is just tangent to the indifference curve Y(x). This condition is illustrated in Figure 3.1, where the kinked budget constraint and Y(x) are tangent at x∗. If the curves are tangent at x∗, then

    −p = Y′(x∗) .   (3.6)

At x∗, the firm's profit is

    (p − c)x∗ + f   (3.7)

(recall we've assumed MC = c). As illustrated, f = I − ȳ. In turn, ȳ = Y(x∗) + px∗.2 We can, thus, rewrite (3.7) as

    (p − c)x∗ + I − Y(x∗) − px∗ = −Y(x∗) − cx∗ + I .   (3.8)

Maximizing (3.8) with respect to x∗, we find that c = −Y′(x∗). Substituting for Y′(x∗) using (3.6), we find that c = p; that is, as before, the seller maximizes profits by setting the unit price equal to marginal cost. The entry fee is I − (Y(x∗) + cx∗), where x∗ solves c = −Y′(x∗). Given that MC = c, it is clear this generalizes to multiple consumers.
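A numerical sketch of this non-quasi-linear analysis, with an assumed illustrative utility u(x, y) = log(1 + x) + log(y) and income I; the indifference curve through (0, I) is then Y(x) = I/(1 + x):

```python
import numpy as np

# Sketch of the non-quasi-linear analysis with an assumed utility
# u(x, y) = log(1 + x) + log(y) and income I. The indifference curve
# through (0, I) is Y(x) = I/(1 + x), so the seller maximizes
# expression (3.8): I - Y(x) - c*x.
I, c = 4.0, 1.0
xs = np.linspace(0.0, 3.0, 300_001)
seller_profit = I - I / (1.0 + xs) - c * xs
x_star = xs[int(np.argmax(seller_profit))]

# The unit price is the consumer's MRS at x*: p = -Y'(x*) = I/(1 + x*)**2.
p_star = I / (1.0 + x_star) ** 2
print(x_star, p_star)   # ~1.0 and ~1.0: unit price equals marginal cost
```

Even without quasi-linear utility, the computed unit price coincides with marginal cost, as Summary 3.1 asserts.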

Summary 3.1 The conclusion that the optimal two-part tariff with one con-
sumer or homogeneous consumers entails setting the unit price equal to marginal
cost is not dependent on the assumption of quasi-linear utility.

As a "check" on this analysis, observe that Y′(·) is the marginal rate of substitution (MRS). With quasi-linear utility, that is, u(x, y) = v(x) + y, the MRS is −v′(x). So x∗ satisfies c = −(−v′(x)) = v′(x) = P(x), where the last equality follows because, with quasi-linear utility, the consumer's inverse demand curve is just his marginal benefit (utility) of the good in question. This, of course, corresponds to what we found above (recall Proposition 3.1).

Bibliographic Note

For more on two-part and multi-part tariffs, see Wilson (1993). Among other
topics, Wilson investigates optimal two-part tariffs with heterogeneous con-
sumers. Varian (1989) is also a useful reference.

2 Because the line segment ȳ − px is tangent to Y (·) at x∗ , we have ȳ − px∗ = Y (x∗ ).


Lecture Note 4: Third-degree Price Discrimination1

In real life, consumers are rarely homogeneous with respect to their preferences and, hence, demands. We wish, therefore, to extend our analysis of price discrimination to accommodate heterogeneous consumers.

4.1 The Notion of Type
To analyze heterogeneous consumers, imagine that we can index the consumers' different utility functions, which here is equivalent to indexing their different demand functions. We refer to this index as the type space, a given type being a specific index number. All consumers of the same type have the same demand. For example, suppose all consumers have demand of the form

    x(p) = 1   if p ≤ θ ,
           0   if p > θ ;

that is, a given consumer wants at most one unit and only if price does not exceed θ. Suppose θ varies across consumers. In this context, θ is a consumer's type (index). The set of possible values that θ can take is the type space. For instance, suppose that consumers come in one of two types, θ = 10 or θ = 15. In this case, the type space, Θ, can be written as Θ = {10, 15}. Alternatively, we could have a continuous type space; for instance, Θ = [a, b], a and b ∈ R₊.
Consider the second example. Suppose there is a continuum of consumers
of measure J whose types are distributed uniformly over [a, b]. Consequently,
b−p
at a uniform price of p, demand is J b−a . Suppose the firm’s cost of x units is
cx; observe MC = c. As there would otherwise be no trade, assume c < b. If
the firm engaged in linear pricing, its profit-maximizing price would be
p∗ = (b + c)/2 , if (b + c)/2 ≥ a ;  p∗ = a , if (b + c)/2 < a .

(Exercise: Verify.) Rather than deal with both cases, assume a < (b + c)/2.
Given this last assumption, there is no further loss of generality in normalizing
the parameters so a = 0, b = 1, and 0 ≤ c < 1.

1 What happened to second-degree price discrimination? Despite the conventional ordering, it makes more sense to cover third-degree price discrimination before second-degree price discrimination.


The firm’s profit is

πLP = J(1 − c)²/4 .  (4.1)
(Exercise: Verify.)
Suppose, instead of linear pricing, the firm knew each consumer’s type and
could base its price to that consumer on his type. Because a consumer is willing
to pay up to θ, the firm maximizes its profit by charging each consumer θ
(provided θ ≥ c). Given that the firm is capturing all the surplus, this is perfect
discrimination. The firm’s profit is
πPD = J ∫_c^1 (θ − c) dθ = J(1 − c)²/2 .  (4.2)

Clearly, this exceeds the profit given in (4.1). Of course, we knew this would
be the case without calculating (4.2): linear pricing, because it generates a
deadweight loss and leaves some surplus in consumer hands, cannot yield the
firm as great a profit as perfect discrimination.
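These two profit levels are easy to confirm numerically. The Python sketch below recovers (4.1) and (4.2) by brute force; the parameter values J = 100 and c = 0.2 are illustrative assumptions, not taken from the text.

```python
# Numerical check of (4.1) and (4.2): linear pricing vs. perfect discrimination
# with consumer types uniform on [0, 1]. J and c are illustrative assumptions.
import numpy as np

J, c = 100.0, 0.2

# Linear pricing: demand at price p is J*(1 - p), so profit is (p - c)*J*(1 - p).
prices = np.linspace(c, 1.0, 100001)
profits = (prices - c) * J * (1.0 - prices)
pi_lp = profits.max()
p_star = prices[profits.argmax()]
assert abs(pi_lp - J * (1 - c) ** 2 / 4) < 1e-6   # matches (4.1)
assert abs(p_star - (1 + c) / 2) < 1e-6           # p* = (b + c)/2 with a = 0, b = 1

# Perfect discrimination: charge each served type theta its valuation theta.
# The integral of (theta - c) over [c, 1] is its average value times (1 - c).
thetas = np.linspace(c, 1.0, 100001)
pi_pd = J * np.mean(thetas - c) * (1 - c)
assert abs(pi_pd - J * (1 - c) ** 2 / 2) < 1e-6   # matches (4.2)
assert pi_pd > pi_lp                              # discrimination dominates
```

Note that with uniformly distributed types the comparison is stark: πPD is exactly twice πLP.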
The example illustrates that, with heterogeneous consumers, the ideal from
the firm’s perspective would be to base its prices on consumers’ types. In most
settings, however, that is infeasible. The question then becomes how closely can
the firm approximate that ideal through its pricing.

4.2 Characteristic-based Discrimination
When a seller cannot observe consumers’ types, she has two choices. One, she
can essentially ask consumers their type; this, as we will see later, is what
second-degree price discrimination is all about. Two, she can base her prices on
observable characteristics of the consumers, where the observable characteristics
are correlated in some way with the underlying types. This is third-degree price
discrimination.
Examples of third-degree price discrimination are pricing based on observ-
able characteristics such as age, gender, student status, geographic location, or
temporally different markets.2 The idea is that, say, student status is correlated
with willingness to pay; on average, students have a lower willingness to pay for
an event (e.g., a movie) than do working adults.
Formally, consider a seller who can discriminate on the basis of M observable
differences. Let m denote a particular characteristic (e.g., m = 1 is student and
m = 2 = M is other adult). Based on the distribution of types conditional on
m, the firm’s demand from those with characteristic m is Xm (·). Let Pm (·) be
the corresponding inverse demand.

2 Although pricing differently at different times could also be part of a second-degree price discrimination scheme.

The seller’s problem is

max_{x1 ,...,xM }  Σ_{m=1}^{M} xm Pm (xm ) − C( Σ_{m=1}^{M} xm ) .  (4.3)

Imposing assumptions sufficient to make the first-order condition sufficient as well as necessary (e.g., assuming (4.3) is concave), the solution is given by

Pm (xm ) + xm Pm′ (xm ) − MC( Σ_{m=1}^{M} xm ) = 0 , for m = 1, . . . , M .  (4.4)

Some observations based on conditions (4.4):

• If marginal cost is a constant (i.e., MC = c), then third-degree price


discrimination is nothing more than setting profit-maximizing linear prices
independently in M different markets.

• If marginal cost is not constant, then the markets cannot be treated inde-
pendently: how much the seller wishes to sell in one market is dependent
on how much she sells in other markets. In particular, if marginal cost
is not constant and there is a shift in demand in one market, then the
quantity sold in all markets can change.

• Marginal revenue across the M markets is the same at the optimum; that
is, if the seller found herself with one more unit of the good, it wouldn’t
matter in which market (to which group) she sold it.
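These observations can be illustrated numerically. The sketch below solves conditions (4.4) for two markets with linear inverse demands Pm(x) = am − bm x and an increasing marginal cost MC(X) = c0 + c1 X; the functional forms and parameter values are illustrative assumptions, not from the text.

```python
# Illustration of conditions (4.4) with increasing marginal cost. Inverse demands
# P_m(x) = a_m - b_m*x and MC(X) = c0 + c1*X are illustrative assumptions.
import numpy as np

a = np.array([40.0, 100.0])
b = np.array([1.0, 1.0])
c0, c1 = 5.0, 0.5

def solve(a_vec):
    # (4.4): a_m - 2*b_m*x_m - (c0 + c1*(x_1 + x_2)) = 0 for m = 1, 2 is a
    # linear system in (x_1, x_2).
    A = np.array([[2 * b[0] + c1, c1],
                  [c1, 2 * b[1] + c1]])
    return np.linalg.solve(A, a_vec - c0)

x = solve(a)
mr = a - 2 * b * x              # marginal revenue in each market
mc = c0 + c1 * x.sum()
assert np.allclose(mr, mc)      # MR is equated across markets and equals MC

# Because marginal cost is increasing, a demand shift in market 2 changes the
# quantity sold in market 1 as well:
x_shift = solve(np.array([a[0], a[1] + 10.0]))
assert x_shift[0] < x[0]
```

With c1 = 0 the off-diagonal terms vanish and the two first-order conditions decouple, which is exactly the first bullet point above.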

4.3 Welfare Considerations


Does allowing a seller to engage in third-degree price discrimination raise or lower welfare? That is, if she were restricted to setting a single price for all markets, would welfare increase or decrease?
We will answer this question for the case in which MC = c and there are two markets, m = 1, 2. Let Bm (x) = ∫_0^x Pm (z) dz; that is, Bm (x) is the gross aggregate benefit enjoyed in market m (by those in group m). Welfare is, therefore,
W (x1 , x2 ) = B1 (x1 ) + B2 (x2 ) − (x1 + x2 )c .
In what follows, let x∗m be the quantity traded in market m under third-degree price discrimination and let xU_m be the quantity traded in market m if the seller must charge a uniform price across the two markets.3 Because demand curves

3 To determine xU_m , let X (p) = X1 (p) + X2 (p) be aggregate demand across the two markets, and let P(x) = X −1 (x) be aggregate inverse demand. Solve P(x) + xP ′ (x) = c for x (i.e., solve for optimal aggregate production assuming one price). Call that solution x∗M . Then xU_m = Xm (P(x∗M )).

slope down, Bm (·) is a concave function, which means

Bm (x∗m ) < Bm (xU_m ) + Bm′ (xU_m ) · (x∗m − xU_m )
         = Bm (xU_m ) + Pm (xU_m ) · (x∗m − xU_m ) .  (4.5)

Likewise,

Bm (xU_m ) < Bm (x∗m ) + Pm (x∗m ) · (xU_m − x∗m ) .  (4.6)

If we let ∆xm = x∗m − xU_m , p∗m = Pm (x∗m ), pU = Pm (xU_m ) (note, by assumption, this last price is common across the markets), and ∆Bm = Bm (x∗m ) − Bm (xU_m ), then we can combine (4.5) and (4.6) as

pU ∆xm > ∆Bm > p∗m ∆xm .  (4.7)

Going from a uniform price across markets to different prices (i.e., to third-degree price discrimination) changes welfare by

∆W = ∆B1 + ∆B2 − (∆x1 + ∆x2 )c .

Hence, using (4.7), the change in welfare is bounded by

(pU − c)(∆x1 + ∆x2 ) > ∆W > (p∗1 − c)∆x1 + (p∗2 − c)∆x2 . (4.8)

Because pU − c > 0, if ∆x1 + ∆x2 ≤ 0, then switching from a single price to third-degree price discrimination must reduce welfare. In other words, if aggregate output falls (weakly), then welfare must be reduced. For example, suppose that c = 0 and Xm (p) = a − bm p; then x∗m = a/2. (Exercise: verify that.) Aggregate demand across the two markets is X (p) = 2a − (b1 + b2 )p and xU_1 + xU_2 = a. This equals x∗1 + x∗2 = 2(a/2), so there is no change in aggregate output. From (4.8), we can conclude that third-degree price discrimination results in a loss of welfare relative to a uniform price in this case.
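This linear-demand example can be checked numerically; the sketch below uses illustrative parameter values (a = 10, b1 = 1, b2 = 2, c = 0), which are assumptions, not values from the text.

```python
# Numerical check: with X_m(p) = a - b_m * p and c = 0, third-degree price
# discrimination leaves aggregate output unchanged and lowers welfare.
# Parameter values are illustrative assumptions.
a, b1, b2, c = 10.0, 1.0, 2.0, 0.0

def B(x, b):
    # Gross benefit: integral of inverse demand P_m(z) = (a - z)/b from 0 to x.
    return (a * x - x * x / 2) / b

# Discrimination: each market's monopoly quantity is a/2 (independent of b_m).
xd1 = xd2 = a / 2

# Uniform price: aggregate demand is 2a - (b1 + b2)*p, so revenue
# p*(2a - (b1 + b2)*p) is maximized at p_U = a/(b1 + b2).
pU = a / (b1 + b2)
xu1, xu2 = a - b1 * pU, a - b2 * pU

assert abs((xd1 + xd2) - (xu1 + xu2)) < 1e-9    # aggregate output unchanged
W_disc = B(xd1, b1) + B(xd2, b2) - c * (xd1 + xd2)
W_unif = B(xu1, b1) + B(xu2, b2) - c * (xu1 + xu2)
assert W_disc < W_unif                          # discrimination lowers welfare
```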
But third-degree price discrimination can also increase welfare. The quickest
way to see this is to suppose that, at the common monopoly price, one of the
two markets is shut out (e.g., market 1, say, has relatively little demand and no
demand at the monopoly price that the seller would set if obligated to charge
the same price in both markets). Then, if price discrimination is allowed, the
already-served market faces the same price as before—so there’s no change in
its consumption or welfare, but the unserved market can now be served, which
increases welfare in that market from zero to something positive.

Bibliographic Note

This discussion of welfare under third-degree price discrimination draws heavily from Varian (1989).

4.4 Arbitrage
We have assumed, so far, in our investigation of price discrimination that arbi-
trage is impossible. That is, for instance, a single buyer can’t pay the entry fee,
then resell his purchases to other buyers, who, thus, escape the entry fee. Simi-
larly, a good purchased in a lower-price market cannot be resold in a higher-price
market.
In real life, however, arbitrage can occur. This can make utilizing nonlinear
pricing difficult; moreover, the possibility of arbitrage helps to explain why
we see nonlinear pricing in some contexts, but not others. For instance, it is
difficult to arbitrage amusement park rides to those who haven’t paid the entry
fee. But it is easy to resell supermarket products. Hence, we see two-part tariffs at
amusement parks, but we typically don’t see them at supermarkets.4 Similarly,
senior-citizen discounts to a show are either handled at the door (i.e., at time
of admission), or through the use of color-coded tickets, or through some other
means to discourage seniors from reselling their tickets to their juniors.
If the seller cannot prevent arbitrage, then the separate markets collapse into
one and there is a single uniform price across the markets. The welfare con-
sequences of this are, as shown in the previous section, ambiguous. Aggregate
welfare may either be increased or decreased depending on the circumstances.
The seller, of course, is made worse off by arbitrage: that she could have chosen a uniform price but did not indicates that a uniform price yields lower profits than third-degree price discrimination.

4.5 Capacity Constraints


Third-degree price discrimination often comes up in the context of discounts for
certain groups to some form of entertainment (e.g., a play, movie, or sporting
event). Typically, the venue for the event has limited capacity and it’s worth
considering the implication that has for third-degree price discrimination.
Consider an event for which there are two audiences (e.g., students and non-
students). Assume the (physical) marginal cost of a seat is essentially 0. The
number of seats sold if unconstrained would be x∗1 and x∗2 , where x∗m solves

Pm (x) + xPm′ (x) = MC = 0 .

If the capacity of the venue, K, is greater than x∗1 +x∗2 , then there is no problem.
As a convention, assume that P2 (x∗2 ) > P1 (x∗1 ) (e.g., group 1 are students and
group 2 are non-students).
Suppose, however, that K < x∗1 + x∗2 . Then a different solution is called
for. It might seem, given a binding capacity constraint, that the seller would

4 Remember, however, that packaging can be a way for supermarkets to use arbitrage-proof two-part tariffs.

abandon discounts (e.g., eliminate student tickets), particularly if x∗2 ≥ K (i.e.,


the seller could sell out charging just the high-paying group its monopoly price).
This view, however, is naïve, as we will see.
The seller’s problem can be written as

max_{x1 ,x2 }  x1 P1 (x1 ) + x2 P2 (x2 )

(recall we’re assuming no physical costs that vary with tickets sold) subject to

x1 + x2 ≤ K .

Given that we know the unconstrained problem violates the constraint, the
constraint must bind. Let λ be the Lagrange multiplier on the constraint. The
first-order conditions are, thus,

P1 (x1 ) + x1 P1′ (x1 ) − λ = 0 and


P2 (x2 ) + x2 P2′ (x2 ) − λ = 0 .

Observe that the marginal revenue from each group is set equal to λ, the shadow
price of the constraint. Note, too, that the two marginal revenues are equal.
This makes intuitive sense—what is the marginal cost of selling a ticket to a
group-1 customer? It’s the opportunity cost of that ticket, which is the forgone
revenue of selling it to a group-2 customer; that is, the marginal revenue of
selling to a group-2 customer.
Now we can see why the seller might not want to sell only to the high-paying
group. Suppose, by coincidence, that x∗2 = K; that is, the seller could sell out
the event at price P2 (x∗2 ). She wouldn’t, however, do so because

MR1 (0) = P1 (0) > 0 = P2 (x∗2 ) + x∗2 P2′ (x∗2 ) ;

(the equality follows from the definition of x∗2 given that physical marginal cost
is 0). The marginal revenue of the Kth seat, if sold to a group-2 customer, is
clearly less than its marginal (opportunity) cost.
As an example, suppose that P1 (x) = 40 − x and P2 (x) = 100 − x.

Exercise 4.5.1: Suppose there was no capacity constraint. Verify that x∗1 = 20 and
x∗2 = 50 given third-degree price discrimination.
Exercise 4.5.2: Suppose there is a capacity constraint: K = 50. Verify that the
seller could just sell out if she set a uniform price of $50 (i.e., did not discriminate).
To whom would she sell? Verify that her (accounting) profit would be $2500.

The seller could do better than a uniform price of $50. To see this, equate the
marginal revenues:
40 − 2x1 = 100 − 2x2 . (4.9)

Substituting the constraint, x1 = 50 − x2 , into (4.9) yields

40 − 2(50 − x2 ) = 100 − 2x2 ; or


4x2 = 160 .

So, optimally, x2 = 40 and, thus, x1 = 10. The seller’s profit is 40 × (100 − 40) + 10 × (40 − 10) = 2700 dollars. As claimed, this amount exceeds her take from naïvely pricing only to the group-2 customers.
While the seller’s profit is greater when engaging in third-degree price discrimination (i.e., charging $30 for student tickets and $60 for regular tickets) than it is under uniform pricing (i.e., $50 per ticket), welfare is less under third-degree price discrimination. We know this, of course, from the discussion in Section 4.3—output hasn’t changed (it’s constrained to be 50)—so switching from uniform pricing to price discrimination must lower welfare. We can also see this by considering the last 10 tickets sold. Under uniform pricing, they go to group-2 consumers, whose value for them ranges from $60 to $50 and whose aggregate gross benefit is ∫_40^50 (100 − x) dx = 550 dollars. Under price discrimination, they are reserved for group-1 consumers (students), whose value for them ranges from $40 to $30 and whose aggregate gross benefit is just ∫_0^10 (40 − x) dx = 350 dollars. In other words, to capture more of the total surplus, the seller distorts the allocation from those who value the tickets more to those who value them less.

4.6 Transportation Costs


Consider the following situation. A firm sells in N geographically distinct mar-
kets. To allow us to focus on transportation costs, assume that demand in
each market is the same, X(·). Let P (·) be the corresponding inverse demand.
Assume, unlike the specification in (4.3), that the firm’s cost function is

C(x) = Σ_{n=1}^{N} tn xn + c( Σ_{n=1}^{N} xn ) ,

where x is the vector (x1 , . . . , xN ) and tn is the cost of transporting a unit of


the good to the nth market.
The firm’s profit is

Σ_{n=1}^{N} xn ( P (xn ) − tn ) − c( Σ_{n=1}^{N} xn ) .  (4.10)

Observe that the transport cost is equivalent to an excise tax. Hence, this
model also applies to a situation where transportation costs are the same across
markets, but the firm faces different excise taxes (e.g., the different markets
are in separate countries and tn is the duty paid to import a unit of the good

into country n). We will, however, here focus on the interpretation of tn as a


transportation cost.
The first-order conditions for profit maximization are

P (xn ) − tn + xn P ′ (xn ) − c′ ( Σ_{n=1}^{N} xn ) = 0  (4.11)

for n = 1, . . . , N . (For convenience, let’s limit attention to the case in which the
firm wishes to operate in all markets.) The first-order conditions imply that
 
( P (xn ) + xn P ′ (xn ) ) − ( P (xm ) + xm P ′ (xm ) ) = tn − tm  (4.12)

for any n and m.

Exercise 4.6.1: Prove that, under the standard assumptions of linear pricing, mar-
ginal revenue is a decreasing function whenever marginal revenue is non-negative.

From the preceding exercise, we know that marginal revenue is decreasing in


x. Consequently, (4.12) shows that tn > tm implies xn < xm . In words: sales
are less in a market with a greater transportation cost than in a market with a
smaller transportation cost ceteris paribus. Recalling that demand curves slope
down, we’ve established:
Proposition 4.1 A firm selling via linear pricing in distinct markets that differ
only in their transportation cost charges a higher price (and makes correspond-
ingly fewer sales) in one market relative to another if the transportation cost to
the one market is greater than the transportation cost to the other market.
Another question we can ask is what’s the effect of an increase in the trans-
portation cost to one market on the prices in the other markets? From (4.11),
the answer would be “none” if the firm has constant marginal costs of produc-
tion (i.e., if c′ (·) is a constant). But what if c′ (·) is increasing? To answer this
question, it is useful to develop some more comparative statics tools.

More Comparative Statics

Let y and z be vectors in Rn . We define the join and meet of the two vectors
as

( max{y1 , z1 } , . . . , max{yn , zn } )   (join)
( min{y1 , z1 } , . . . , min{yn , zn } ) ,   (meet)

respectively.5 We denote the join of y and z as y ∨ z and we denote the meet


of y and z as y ∧ z.
5 Technically, the join is defined as sup{y, z} and the meet as inf{y, z}. For our purposes, the definitions given in the text are adequate.



A function f : Rn → R is supermodular if

f (y ∨ z) + f (y ∧ z) ≥ f (y) + f (z) (4.13)

for all y and z in the domain of f (·).6 If the inequality in (4.13) is strict for y ≠ y ∨ z and y ≠ y ∧ z, then the function is strictly supermodular.
From an economic perspective, we can view supermodularity as a statement about the complementarity of the inputs to f (·). Recall, for example, that the production function √(KL) exhibits complementarity between capital, K, and labor, L, insofar as the marginal product of either input is greater the greater is the other input. Observe this production function is supermodular: let KM ≥ Km and LM ≥ Lm . Clearly, the vectors (Km , Lm ) and (KM , LM ) satisfy (4.13), so we need only check the vectors (Km , LM ) and (KM , Lm ). To that end, observe:

√KM ( √LM − √Lm ) ≥ √Km ( √LM − √Lm )
⇒ √(KM LM ) − √(KM Lm ) ≥ √(Km LM ) − √(Km Lm )
⇒ √(KM LM ) + √(Km Lm ) ≥ √(KM Lm ) + √(Km LM ) .

Hence, the vectors (Km , LM ) and (KM , Lm ) satisfy (4.13).
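The algebra above can also be spot-checked numerically. The following sketch draws random input vectors and verifies inequality (4.13) for f(K, L) = √(KL); it is a numerical check, not a proof.

```python
# Spot-check of the supermodularity inequality (4.13) for f(K, L) = sqrt(K*L)
# at randomly drawn pairs of input vectors (a numerical check, not a proof).
import math
import random

def f(k, l):
    return math.sqrt(k * l)

random.seed(0)
for _ in range(10_000):
    y = (random.uniform(0, 10), random.uniform(0, 10))
    z = (random.uniform(0, 10), random.uniform(0, 10))
    join = (max(y[0], z[0]), max(y[1], z[1]))
    meet = (min(y[0], z[0]), min(y[1], z[1]))
    assert f(*join) + f(*meet) >= f(*y) + f(*z) - 1e-12
```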



Exercise 4.6.2: Prove the production function √(KL) is strictly supermodular.

Exercise 4.6.3: Is the production function K + L supermodular? What is the
economic interpretation of your answer?
Exercise 4.6.4: Prove the following lemmas:

Lemma 4.1 Define f (y) = γ · y, where γ and y ∈ Rn . Prove f (·) is supermodular.

Lemma 4.2 If f (·) and g(·) are supermodular functions, then so too is αf (·) + βg(·),
α > 0 and β > 0.

Lemma 4.3 Let the domain of f (·) be [y 1 , ȳ1 ]×[y 2 , ȳ2 ], where [y i , ȳi ] ⊂ R for i = 1 , 2.
Suppose that f (y1 , y2 ) = g(y1 − y2 ), where g : R → R is a concave function. Then f (·)
is supermodular.

We can extend our definition of increasing differences to Rn+1 as follows.


Let f : Rn × R → R. If for any y ≥ y′ , y and y′ ∈ Rn , and z > z ′ we have7

f (y, z) − f (y′ , z) ≥ f (y, z ′ ) − f (y′ , z ′ ) , (4.14)

6 Note we also require y ∨ z and y ∧ z to be in the domain of f (·) for all y and z in the domain of f (·). This is described by saying that the domain of f (·) is a lattice.
7 Recall y ≥ y′ if and only if yi ≥ yi′ for all i.

then we say f exhibits increasing differences in z.


The main comparative statics result is the following.8
Theorem 4.1 (Topkis’s Monotonicity Theorem) Let f : Rn × R → R.
Suppose f (·, z) is supermodular for any given z and suppose, too, it exhibits in-
creasing differences in z. Consider z ≥ z ′ . Suppose y∗ ∈ M ≡ argmaxy f (y, z)
∗ ∗ ∗
and y′ ∈ M′ ≡ argmaxy f (y, z ′ ). Then y∗ ∨ y′ ∈ M and y∗ ∧ y′ ∈ M′ .
Proof: Observe it is sufficient for us to show

f (y∗ ∨ y′∗ , z) − f (y∗ , z) = 0  (4.15)

(because y∗ ∨ y′∗ needs to do as well as y∗ ) and

f (y∗ ∧ y′∗ , z ′ ) − f (y′∗ , z ′ ) = 0  (4.16)

(similar reason). To that end, we have

0 ≥ f (y∗ ∨ y′∗ , z) − f (y∗ , z)   (y∗ maximizes f (y, z))
  ≥ f (y∗ ∨ y′∗ , z ′ ) − f (y∗ , z ′ )   (increasing differences)
  ≥ f (y′∗ , z ′ ) − f (y∗ ∧ y′∗ , z ′ )   (supermodularity)
  ≥ 0 .   (y′∗ maximizes f (y, z ′ ))

Because the chain begins and ends at 0, every inequality in it must hold with equality, which establishes (4.15) and (4.16).

An immediate, but valuable, corollary is:


Corollary 4.1 Maintain the assumptions of Topkis’s Monotonicity Theorem.
Suppose, for all z, maxy f (y, z) has a unique solution; denote that solution by
y∗ (z). Then z > z ′ implies y∗ (z) ≥ y∗ (z ′ ).
Proof: From Topkis’s Monotonicity Theorem y∗ (z) ∨ y∗ (z ′ ) also maximizes
f (·, z). But given y∗ (z) is the unique maximizer, the only way this can be is if
y∗ (z) = y∗ (z) ∨ y∗ (z ′ ). Similar reasoning establishes y∗ (z ′ ) = y∗ (z) ∧ y∗ (z ′ ).
The result follows because
y∗ (z) ∨ y∗ (z ′ ) ≥ y∗ (z) ∧ y∗ (z ′ ) .
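Corollary 4.1 is easy to illustrate on a grid. The objective below is an illustrative assumption (not from the text): f(y, z) = z(y1 + y2) + y1 y2 − y1² − y2² is supermodular in y (cross-partial equal to 1) and exhibits increasing differences in z, so its maximizer should rise componentwise with z.

```python
# Grid illustration of Corollary 4.1 with an illustrative objective:
# f(y, z) = z*(y1 + y2) + y1*y2 - y1**2 - y2**2 is supermodular in y and has
# increasing differences in z, so argmax_y f(y, z) is monotone in z.
import itertools

def f(y1, y2, z):
    return z * (y1 + y2) + y1 * y2 - y1 ** 2 - y2 ** 2

grid = [i / 10 for i in range(51)]   # y1, y2 in {0.0, 0.1, ..., 5.0}

def argmax(z):
    return max(itertools.product(grid, grid), key=lambda y: f(y[0], y[1], z))

y_lo, y_hi = argmax(1.0), argmax(2.0)
assert y_hi[0] >= y_lo[0] and y_hi[1] >= y_lo[1]   # y*(z) rises with z
```

For this objective the unconstrained maximizer is y1 = y2 = z, so the grid argmax moves from (1, 1) at z = 1 to (2, 2) at z = 2.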

A related lemma is
Lemma 4.4 Maintain the assumptions of Topkis’s Monotonicity Theorem, ex-
cept assume f (·, z) is strictly supermodular for any given z. In addition, suppose
argmaxy f (y, z) has at least two distinct elements and let y∗ and y∗∗ denote two
such elements. Then y∗ ≥ y∗∗ or y∗∗ ≥ y∗ .
8 The statement of the theorem follows Milgrom and Roberts (1990), who appear to be the ones to have named it Topkis’s Monotonicity Theorem. In Topkis’s original work (Topkis, 1978), it is (essentially) Theorem 6.1.

Proof: Suppose the claim is false. Then, by the definition of strict supermod-
ularity, we have

f (y∗ ∨ y∗∗ , z) + f (y∗ ∧ y∗∗ , z) > f (y∗ , z) + f (y∗∗ , z) . (4.17)

But Topkis’s Monotonicity Theorem (with z = z ′ ) tells us that the four terms
in that expression all equal the same thing, which yields a contradiction. The
result follows reductio ad absurdum.

Directly verifying whether a given function is supermodular can get a bit tedious. Fortunately, we have the following characterization theorem:9

Theorem 4.2 (Topkis’s Characterization Theorem) Limit the domain of f , a real-valued function, to ×_{i=1}^{n} (y i , ȳi ) ⊂ Rn . Suppose f is twice continuously differentiable everywhere on its domain. Then f (·) is supermodular if and only if

∂²f (y)/∂yi ∂yj ≥ 0  (4.18)

for all y in the domain of f (·) and any i, j, i ≠ j (where, as usual, ym is the mth element of the vector y).

To prove this theorem requires first establishing some lemmas.10

Lemma 4.5 The function f : Rn → R is supermodular if and only if it exhibits increasing differences on any two dimensions; that is, if and only if for any y and y′ that are equal except on any two distinct dimensions i and j, with yi > yi′ and yj > yj′ , it follows that

f (y) − f (yi′ , y−i ) ≥ f (yj′ , y−j ) − f (y′ ) .  (4.19)

Proof: The “only if” part is straightforward: by construction, y′ = (yi′ , y−i ) ∧ (yj′ , y−j ) and y = (yi′ , y−i ) ∨ (yj′ , y−j ). So (4.19) follows from the definition of supermodularity.

Turn now to the “if” part. Consider two vectors y and y′ . If y ≥ y′ or y′ ≥ y, then expression (4.13) automatically holds. So suppose that the two vectors cannot be ordered. Because the indices are arbitrary, there is no loss of generality in assuming that

y ∨ y′ = (y1 , . . . , ym , ym+1′ , . . . , yn′ )  and
y ∧ y′ = (y1′ , . . . , ym′ , ym+1 , . . . , yn ) .

Because the vectors cannot be ordered, 1 ≤ m < n. Define wi,j by the formula

wki,j = yk′ , if k ≤ m − i or k ≥ n + 1 − j ;  wki,j = yk , otherwise ,  (4.20)

where i = 0 , . . . , m and j = 0 , . . . , n − m. It is to be understood that indices that are “out of bounds” (i.e., 0 or n + 1) are to be ignored.

9 Again Milgrom and Roberts (1990) appear to be the ones who named the theorem.
10 The following analysis draws heavily from Topkis (1978).

Observe

w0,0 = y ∧ y′
wm,n−m = y ∨ y′
wm,0 = y
w0,n−m = y′

Fix a j ≤ n − m − 1. Expression (4.19) implies

f (wi+1,j+1 ) − f (wi,j+1 ) ≥ f (wi+1,j ) − f (wi,j ) . (4.21)

Sum (4.21) over i = 0 , . . . , m − 1:


f (wm,j+1 ) − f (w0,j+1 ) = Σ_{i=0}^{m−1} [ f (wi+1,j+1 ) − f (wi,j+1 ) ]   (canceling like terms in the sum)
   ≥ Σ_{i=0}^{m−1} [ f (wi+1,j ) − f (wi,j ) ]   (from (4.21))
   = f (wm,j ) − f (w0,j ) .   (canceling like terms in the sum)

Because j was arbitrary, we have established the chain of inequalities:

f (wm,n−m ) − f (w0,n−m ) ≥ f (wm,n−m−1 ) − f (w0,n−m−1 ) ≥ · · ·
   ≥ f (wm,j+1 ) − f (w0,j+1 ) ≥ f (wm,j ) − f (w0,j ) ≥ · · ·   (4.22)
   ≥ f (wm,0 ) − f (w0,0 ) .

The chain of inequalities in (4.22), considering only the first and last parts,
implies
f (y ∨ y′ ) − f (y′ ) ≥ f (y) − f (y ∧ y′ ) ,
which in turn proves supermodularity (i.e., implies (4.13)).

Lemma 4.6 Suppose f : ×_{i=1}^{n} (y i , ȳi ) → R is at least twice differentiable everywhere on its domain. Then f (·) exhibits increasing differences on any two dimensions—that is, exhibits the property that, for any i and j, i ≠ j, yi > yi′ , yj > yj′ , and yk = yk′ , k ≠ i and k ≠ j,

f (y) − f (yi′ , y−i ) ≥ f (yj′ , y−j ) − f (y′ )  (4.23)

—if and only if on any two dimensions i and j

∂²f (y)/∂yi ∂yj ≥ 0 .  (4.24)

Proof: Consider the “only if” part. Suppressing the other arguments for notational convenience, we have

f (yi , yj ) − f (yi′ , yj ) ≥ f (yi , yj′ ) − f (yi′ , yj′ ) .

Dividing both sides by yi − yi′ and taking the limit as yi′ → yi , we have

∂f (yi , yj )/∂yi ≥ ∂f (yi , yj′ )/∂yi  ⇒  ∂f (yi , yj )/∂yi − ∂f (yi , yj′ )/∂yi ≥ 0 .

Divide the last expression by yj − yj′ and take the limit as yj′ → yj . This yields (4.24), as was to be shown.

Consider the “if” part. We have

0 ≤ ∫_{yj′}^{yj} [ ∂²f (y, z)/∂yi ∂yj ] dz = ∂f (y, yj )/∂yi − ∂f (y, yj′ )/∂yi .

Hence,

0 ≤ ∫_{yi′}^{yi} [ ∂f (y, yj )/∂yi − ∂f (y, yj′ )/∂yi ] dy
   = ( f (yi , yj ) − f (yi , yj′ ) ) − ( f (yi′ , yj ) − f (yi′ , yj′ ) ) .

Straightforward algebra on the last expression yields (4.23).

Proof of Topkis’s Characterization Theorem: The result follows immediately from Lemmas 4.5 and 4.6.
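Theorem 4.2 also suggests a practical numerical test: approximate the cross-partial derivatives by finite differences and check their signs. Applied to the production function √(KL) from earlier, this is a numerical sketch, not a proof; the exact cross-partial is 1/(4√(KL)) > 0.

```python
# Finite-difference check of condition (4.18) for f(K, L) = sqrt(K*L). The
# exact cross-partial is 1/(4*sqrt(K*L)) > 0, consistent with the
# supermodularity verified earlier (a numerical sketch, not a proof).
import math

def f(k, l):
    return math.sqrt(k * l)

h = 1e-5
for k0 in (0.5, 1.0, 4.0):
    for l0 in (0.5, 2.0, 9.0):
        cross = (f(k0 + h, l0 + h) - f(k0 + h, l0)
                 - f(k0, l0 + h) + f(k0, l0)) / h ** 2
        exact = 1 / (4 * math.sqrt(k0 * l0))
        assert cross > 0 and abs(cross - exact) < 1e-3
```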

More on Transportation Costs

Let’s return to the question that motivated the development of these additional
tools for comparative statics, which was what happens if the transportation cost
to one market increases. Given that the indices of the markets are arbitrary,
there is no loss of generality in assuming that it is the transportation cost to
market 1 that increases.
Given the toolkit just assembled, we would like the profit function, expres-
sion (4.10) on page 51, to be supermodular in the units to be sold (i.e., in
x) and exhibit increasing differences with respect to the transportation costs
(specifically, with respect to t1 ). We can quickly see that it fails to do so. The
cross-partial derivative between any xi and xj is

−c′′ ( Σ_{n=1}^{N} xn ) < 0

because manufacturing exhibits diminishing returns to scale (i.e., c(·) is convex). Moreover, the cross-partial of t1 and x1 is −1, which also has the wrong sign.

Fortunately, in circumstances like these there is a useful “trick,” which is to make an appropriate change of variables. Specifically, think of the firm seeking to maximize profits by choosing y1 and x−1 , where y1 ≡ −x1 . The firm’s profit is

( t1 − P (−y1 ) ) y1 + Σ_{n=2}^{N} xn ( P (xn ) − tn ) − c( −y1 + Σ_{n=2}^{N} xn ) .

Observe, with respect to y1 , the relevant cross-partial-derivative conditions are


now met.
We still have a problem, however, because with respect to xi and xj , i > 1
and j > 1, the objective function is still not supermodular. Fortunately, there
is a second useful trick. Define
N
X 
R(X) = max xn P (xn ) − tn
x−1
n=2

subject to
N
X
xn = X .
n=2

We can think of the firm solving its profit-maximization problem in two steps.
First, it asks itself: if it were constrained to capacity X for markets 2 , . . . , N ,
what would be the maximum profit it could achieve (that is the amount R(X))?
It is readily seen that R(·) is an increasing function (at least it will be for the
relevant domain of X).11 Second, the firm asks itself what values of y1 and X
does it want given it wishes to maximize

R(X) + ( t1 − P (−y1 ) ) y1 − c(X − y1 ) .  (4.25)

Exercise 4.6.5: Verify that (4.25) is supermodular in y1 and X and exhibits increas-
ing differences in t1 for both y1 and X (note we still have increasing differences even
if (4.14) is an equality).

Consider two values for the transportation cost to market 1: t1 > t′1 . Given the
usual assumptions, maximizing (4.25) with respect to X  and y1 has a unique
solution. Hence, if we write the solution as (X ∗ (t), y1∗ (t)), it follows from Corollary 4.1 that

(X ∗ (t1 ), y1∗ (t1 )) ≥ (X ∗ (t′1 ), y1∗ (t′1 )) .  (4.26)
Recalling that y1 = −x1 , we have shown that an increase in the transportation
cost to market 1 causes the firm to sell no more units in market 1 and no fewer
units in the other markets.
11 These assumptions also guarantee it is differentiable by the Implicit Function Theorem.

In fact, we can get a more precise prediction. Suppose that the solution given t1 and t′1 were the same. The relevant first-order conditions with respect to y1 would, then, imply

t1 − P (−y1∗ ) + y1∗ P ′ (−y1∗ ) = −c′ (X ∗ − y1∗ ) = t′1 − P (−y1∗ ) + y1∗ P ′ (−y1∗ ) .

But that cannot hold given t1 > t′1 . So y1∗ (t1 ) ≠ y1∗ (t′1 ). It follows from (4.26) that y1∗ (t1 ) > y1∗ (t′1 ). We can now show that X ∗ (t1 ) > X ∗ (t′1 ). If that weren’t the case, then we would have

R′ (X ∗ ) − c′ (X ∗ − y1∗ (t1 )) = 0 = R′ (X ∗ ) − c′ (X ∗ − y1∗ (t′1 )) .

But that cannot hold given that c′ (X ∗ − y1∗ (t1 )) < c′ (X ∗ − y1∗ (t′1 )). It must thus be that X ∗ (t1 ) > X ∗ (t′1 ). To summarize our analysis to this point:
Lemma 4.7 Maintain the standard assumptions. An increase in the transportation cost to one market results in fewer units being sold in that market and more units in aggregate being sold in the other markets.
What can we say about sales in any given market n, n > 1, given an increase in t1 ? We know sales in at least one of these markets must increase given X has increased. Suppose there were a market i, i > 1, that saw no increase. Let j, j > 1, be a market that saw an increase. Write x∗n (t) for the profit-maximizing sales in market n when the transportation cost to market 1 is t. We then have the following chain of inequalities and equalities:

P (x∗i (t1 )) − ti + x∗i (t1 )P ′ (x∗i (t1 )) ≥ P (x∗i (t′1 )) − ti + x∗i (t′1 )P ′ (x∗i (t′1 ))
   = P (x∗j (t′1 )) − tj + x∗j (t′1 )P ′ (x∗j (t′1 )) > P (x∗j (t1 )) − tj + x∗j (t1 )P ′ (x∗j (t1 )) ,

where the first inequality follows because we assumed market i’s sales did not increase and marginal revenue is a decreasing function; the equality follows from (4.12); and the last inequality follows because market j has seen an increase in sales and marginal revenue is a decreasing function. But the first expression and the last expression in that chain must be equal in light of (4.12). By contradiction, it cannot be that sales don’t increase in market i. To conclude:
Proposition 4.2 Maintain the standard assumptions. An increase in the transportation cost to one market results in fewer units being sold in that market and more units being sold in each of the other markets.
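Proposition 4.2 can be confirmed in a closed-form example. Assume P(x) = a − x in every market and c(X) = gX²/2; these functional forms and the parameter values below are illustrative assumptions. Then (4.11) reads a − 2xn − tn − gX = 0, and summing over n pins down aggregate output X.

```python
# Closed-form check of Proposition 4.2: P(x) = a - x in every market and
# c(X) = g * X**2 / 2 (illustrative assumptions). FOC (4.11) becomes
# a - 2*x_n - t_n - g*X = 0; summing over n gives X = (N*a - sum(t)) / (2 + N*g).
a, g = 100.0, 1.0

def optimum(t):
    N = len(t)
    X = (N * a - sum(t)) / (2 + N * g)
    return [(a - tn - g * X) / 2 for tn in t]

x_lo = optimum([10.0, 20.0, 30.0])
x_hi = optimum([20.0, 20.0, 30.0])   # raise the transport cost t1 from 10 to 20

assert x_hi[0] < x_lo[0]                                 # market 1 shrinks
assert all(h > l for h, l in zip(x_hi[1:], x_lo[1:]))    # every other market grows
```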
Proposition 4.2 sheds light on an important issue in international trade.
Consider a firm that produces domestically only but sells both domestically and
abroad. For convenience, let’s treat “abroad” as a single market. Suppose the
domestic government imposed an export duty on the firm’s product (i.e., taxed
units being shipped abroad). Recall that an excise tax is like a transportation
cost, so the effect of this action is to raise the “transportation cost” to the
abroad market. Who wins and loses from this? The government wins—it gets
additional revenue. Domestic consumers win because the firm will sell more
60 Lecture Note 4: Third-degree Price Discrimination

domestically, which drives down the price they pay. Foreign consumers lose via
the same logic run in reverse. Finally, the firm loses. In a democracy,
there is an obvious temptation for a government to impose an export duty:
Lots of voters benefit (domestic consumers) and few voters lose (assuming the
shareholders and stakeholders of the firm are not especially numerous), plus the
government obtains funds without resorting to domestic taxation.
This reasoning helps to explain why the US Constitution contains a prohibition on export duties. In the negotiations over the Constitution, representatives
of the Southern states, which had both an export-driven economy and, relative
to the North, little population, feared that future national legislatures would,
catering to the majority of the voters, impose export duties (here, the South is
like the firm). Hence, they insisted on a prohibition on export duties.
Lecture Note 5: Second-degree Price Discrimination
In many contexts, a seller knows that different types or groups of consumers
have different demand, but she can’t readily identify from which group any
given buyer comes. For example, it is known that business travelers are willing
to pay more for most flights than are tourists. But it is impossible to know
whether a given flier is a business traveler or a tourist.
A well-known solution is to offer different kinds of tickets. For instance,
because business travelers don’t wish to stay over the weekend or often can’t
book much in advance, the airlines charge more for round-trip tickets that don’t
involve a Saturday-night stayover or that are purchased within a few days of
the flight (i.e., in the latter situation, there is a discount for advance purchase).
Observe an airline still can’t observe which type of traveler is which, but by
offering different kinds of service it hopes to induce revelation of which type
is which. When a firm induces different types to reveal their types for the
purpose of differential pricing, we say the firm is engaged in second-degree price
discrimination.
Restricted tickets are one example of price discrimination. They are an
example of second-degree price discrimination via quality distortions. Other
examples include:
• Different classes of service (e.g., first and second-class carriages on trains).
The classic example here is the French railroads in the 19th century, which
removed the roofs from second-class carriages to create third-class car-
riages.
• Hobbling a product. This is popular in high-tech, where, for instance, Intel
produced two versions of a chip by “brain-damaging” the state-of-the-art
chip. Another example is software, where “regular” and “pro” versions
(or “home” and “office” versions) of the same product are often sold.
• Restrictions. Saturday-night stayovers and advance-ticketing requirements
are a classic example. Another example is limited versus full memberships
at health clubs.
The other common form of second-degree price discrimination is via quantity
discounts. This is why, for instance, the liter bottle of soda is typically less
than twice as expensive as the half-liter bottle. Quantity discounts can often
be operationalized through multi-part tariffs, so many multi-part tariffs are
examples of price discrimination via quantity discounts (e.g., choices in calling plans between, say, a low monthly fee, few “free” minutes, and a high per-minute charge thereafter versus a high monthly fee, more “free” minutes, and a lower per-minute charge thereafter).

5.1 Analysis
Consider two consumer types, 1 and 2, indexed by θ. Assume the two types
occur equally in the population. Assume that each consumer has quasi-linear
utility
v(x, θ) − T ,

where x is either consumption of a good or the quality of the single unit of the
good he consumes (in the latter case, treat x = 0 as not receiving the good
at all) and T is the payment (transfer) from the consumer to the seller of that
good. Assume the following order condition on marginal utility
∂/∂θ (∂v(x, θ)/∂x) > 0 .    (5.1)

Expression (5.1) is called a Spence-Mirrlees condition; it is a single-crossing condition. As will become evident, we often impose such an order assumption
on the steepness of the indifference curves across types. Another way to state
(5.1) is that the marginal utility of consumption (of quantity or quality) is
increasing in type for all levels of consumption.
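The Spence-Mirrlees condition is easy to verify symbolically for a concrete utility function. The minimal sketch below (using SymPy) checks that the cross-partial in (5.1) is strictly positive for v(x, θ) = 5(θ + 1) ln(x + 1), the functional form used in the worked example later in this section:

```python
# Verify the Spence-Mirrlees condition (5.1) for the example utility
# v(x, theta) = 5(theta + 1) ln(x + 1).
import sympy as sp

x, theta = sp.symbols('x theta', positive=True)
v = 5 * (theta + 1) * sp.log(x + 1)

# Cross-partial d/dtheta (dv/dx); (5.1) requires this to be positive.
cross = sp.diff(v, x, theta)
print(sp.simplify(cross))   # 5/(x + 1), strictly positive for all x >= 0
```

Because the cross-partial 5/(x + 1) does not depend on θ and is positive for every x, the higher type’s indifference curves are everywhere steeper in (x, T) space, exactly as (5.1) demands.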
Assume, as is standard, that the reservation utility if x = 0 is the same for
both types:
v(0, 1) = v(0, 2) = 0 . (5.2)

Assumption (5.2) is sometimes described as there being no countervailing incentives.
For convenience assume a constant marginal cost, c. This can be interpreted
either as it costs the firm cx to supply a unit of quality x or it costs it cx
to supply x units. The assumption of constant marginal cost means we can
consider the seller’s optimal strategy against a representative customer, who, as
previously assumed, is as likely to be type 1 as type 2.
In analyzing this problem, we can view the seller’s problem as one of de-
signing either two “versions” of the good or two “packages” of different sizes.
Regardless of interpretation, we can consider there to be two varieties. One will
be of quality or size x1 and sold for T1 ; the other will be of x2 and sold for T2 .
Obviously, the xθ variety is intended for the type-θ consumer. (For example,
one can think of these as being different size bottles of soda with xθ as the
number of liters in the θ bottle.) Hence, the seller’s problem is

max_{x1,x2,T1,T2} (1/2)(T1 − cx1) + (1/2)(T2 − cx2)    (5.3)
subject to participation (IR) constraints,

v(x1, 1) − T1 ≥ 0  and    (5.4)
v(x2, 2) − T2 ≥ 0 ,    (5.5)

and subject to revelation (IC) constraints,

v(x1, 1) − T1 ≥ v(x2, 1) − T2  and    (5.6)
v(x2, 2) − T2 ≥ v(x1, 2) − T1 .    (5.7)
As is often true of mechanism-design problems, it is easier here to work with
net utility (in this case, consumer surplus) rather than payments. To that end,
let
Uθ = v(xθ , θ) − Tθ .
Also define
I(x) = v(x, 2) − v(x, 1) = ∫_1^2 [∂v(x, t)/∂θ] dt .

Observe, then, that

I′(x) = ∫_1^2 [∂²v(x, t)/∂θ∂x] dt > 0 ,
where the inequality follows from the Spence-Mirrlees condition (5.1). Note,
given condition (5.2), this also implies I(x) > 0 if x > 0. The use of the
letter “I” for this function is not accidental; it is, as we will see, related to the
information rent that the type-2 consumer enjoys.
We can rewrite the constraints (5.4)–(5.7) as
U1 ≥ 0 (5.8)
U2 ≥ 0 (5.9)
U1 ≥ U2 − I(x2 ) and (5.10)
U2 ≥ U1 + I(x1 ) . (5.11)
We can also rewrite the seller’s problem (5.3) as
max_{x1,x2,U1,U2} (1/2)(v(x1, 1) − U1 − cx1) + (1/2)(v(x2, 2) − U2 − cx2) .    (5.12)
We could solve this problem by assigning four Lagrange multipliers to the four constraints and cranking through the algebra. This, however, would be tedious and, moreover, not much help for developing intuition. So let’s use a little logic first.
• Unless x1 = 0, one or both of (5.10) and (5.11) must bind. To see this,
suppose neither was binding. Then, since the seller’s profits are decreasing
in Uθ , she would make both U1 and U2 as small as possible, which is to
say 0. But given I(x1 ) > 0 if x1 > 0, this would violate (5.11).
• Observe that (5.10) and (5.11) can be combined so that I(x2 ) ≥ U2 −U1 ≥
I(x1 ). Ignoring the middle term for the moment, the fact that I(·) is
increasing means that x2 ≥ x1 . Moreover, if x1 > 0, then U2 − U1 ≥
I(x1 ) > 0. Hence U2 > 0, which means (5.9) is slack.

• Participation constraint (5.8), however, must bind at the seller’s optimum. If it didn’t, then there would exist an ε > 0 such that U1 and U2 could
both be reduced by ε without violating (5.8) or (5.9). Since such a change
wouldn’t change the difference in the Us, this change also wouldn’t lead to a violation of the IC constraints, (5.10) and (5.11). But from (5.12), lowering the Us by ε increases profit, so we’re not at an optimum if (5.8) isn’t binding.

• We’ve established that, if x1 > 0, then (5.8) binds, (5.9) is slack, and at
least one of (5.10) and (5.11) binds. Observe that we can rewrite the IC constraints as I(x2) ≥ U2 ≥ I(x1). The seller’s profit is greater the smaller
is U2 , so it is the lower bound in this last expression that is important.
That is, (5.11) binds. Given that I(x2 ) ≥ I(x1 ), as established above,
we’re free to ignore (5.10).

So our reasoning tells us that, provided x1 > 0, we need only pay attention
to two constraints, (5.8) and (5.11). Using them to solve for U1 and U2 , we can
turn the seller’s problem into the following unconstrained problem:

max_{x1,x2} (1/2)(v(x1, 1) − cx1) + (1/2)(v(x2, 2) − I(x1) − cx2) .    (5.13)

The first-order conditions are:

∂v(x1*, 1)/∂x − I′(x1*) − c = 0    (5.14)

∂v(x2*, 2)/∂x − c = 0 .    (5.15)

Note that (5.15) is the condition for maximizing welfare were the seller selling only to type-2 customers; that is, we have efficiency in the type-2 “market.” Because, however, I′(·) > 0, we don’t have the same efficiency vis-à-vis type-1 customers; in the type-1 “market,” we see too little output relative to the welfare-maximizing amount. This is a standard result—efficiency at the top and distortion at the bottom.
To make this more concrete, suppose v(x, θ) = 5(θ + 1) ln(x + 1) and c = 1. Then x2* = 14 and x1* = 4. Consequently, T1 ≈ 16.1 and T2 = v(x2*, 2) − I(x1*) ≈ 32.6. Under the interpretation that x is quantity, observe that a type-2 consumer purchases more than three times as much, but pays only roughly twice as much as compared to a type-1 consumer—this is quantity discounts in action!
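These numbers can be checked numerically. The sketch below (a brute-force grid search rather than an analytic solve) maximizes the unconstrained objective (5.13) for this example and then recovers the tariffs from the binding constraints, (5.8) and (5.11):

```python
# Numerical check of the worked example: v(x, theta) = 5(theta+1)ln(x+1), c = 1.
import math

def v(x, theta):
    # Quasi-linear gross utility from the example
    return 5 * (theta + 1) * math.log(x + 1)

def I(x):
    # Information-rent function I(x) = v(x, 2) - v(x, 1)
    return v(x, 2) - v(x, 1)

c = 1.0
grid = [i / 1000 for i in range(20001)]   # candidate allocations in [0, 20]

# The objective (5.13) is separable in x1 and x2, so maximize each part.
x1_star = max(grid, key=lambda x: v(x, 1) - I(x) - c * x)
x2_star = max(grid, key=lambda x: v(x, 2) - c * x)

T1 = v(x1_star, 1)                 # type 1's IR constraint (5.8) binds
T2 = v(x2_star, 2) - I(x1_star)    # type 2's IC constraint (5.11) binds

print(x1_star, x2_star)            # 4.0 14.0
print(round(T1, 1), round(T2, 1))  # 16.1 32.6
```

The grid search lands on the same allocations as the first-order conditions (5.14) and (5.15), and the tariffs confirm the quantity-discount observation: x2*/x1* = 3.5 while T2/T1 ≈ 2.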
[Figure 5.1 appears here: the individual demand curves ds(p) and df(p), marginal cost c, areas A and B, and quantities qs and qf, plotted with $/unit against units.]
Figure 5.1: The individual demands of the two types of consumers (family and
single), df (·) and ds (·), respectively, are shown. Under the ideal
third-degree price discrimination scheme, a single would buy a
package with qs units and pay an amount equal to area A (gray
area). A family would buy a package with qf units and pay an
amount equal to the sum of all the shaded areas (A, B, and G).

5.2 A Graphical Approach to Quantity Discounts
Now we consider an alternative, but ultimately equivalent, analysis of quantity
discounts.
Consider a firm that produces some product. Continue to assume the
marginal cost of production is constant, c. Suppose the population of potential
buyers is divided into families (indexed by f ) and single people (indexed by s).
Let df (·) denote the demand of an individual family and let ds (·) denote the
demand of an individual single. Figure 5.1 shows the two demands. Note that,
at any price, a family’s demand exceeds a single’s demand.
The ideal would be if the firm could engage in third-degree price discrimination by offering two different two-part tariffs to the two populations. That
is, if the firm could freely identify singles from families, it would sell to each
member of each group the quantity that equated that member’s relevant inverse
demand to cost (i.e., qs or qf in Figure 5.1 for a single or a family, respectively).
It could make the per-unit charge c and the entry fee the respective consumer
surpluses. Equivalently—and more practically—the firm could use packaging.
The package for singles would have qs units and sell for a single’s total benefit,
bs (qs ). This is the area labeled A in Figure 5.1. Similarly, the family package
would have qf units and sell for a family’s total benefit of bf (qf ). This is the
sum of the three labeled areas in Figure 5.1.
The ideal is not, however, achievable. The firm cannot freely distinguish singles from families. It must induce revelation; that is, it must devise a second-degree scheme. Observe that the third-degree scheme won’t work as a second-
degree scheme. Although a single would still purchase a package of qs units
at bs (qs ), a family would not purchase a package of qf units at bf (qf ). Why?
Well, were the family to purchase the latter package it would, by design, earn
no consumer surplus. Suppose, instead, it purchased the package intended for
singles. Its total benefit from doing so is the sum of areas A and G in Figure 5.1.
It pays bs (qs ), which is just area A, so it would enjoy a surplus equal to area
G. In other words, the family would deviate from the intended package, with
qf units, which yields it no surplus, to the unintended package, with qs units,
which yields it a positive surplus equal to area G.
Observe that the firm could induce revelation—that is, get the family to buy
the intended package—if it cut the price of the qf -unit package. Specifically, if
it reduced the price to the sum of areas A and B, then a family would enjoy a
surplus equal to area G whether it purchased the qs -unit package (at price = area
A) or it purchased the intended qf -unit package (at price = area A + area B).
Area G is a family’s information rent.
Although that scheme induces revelation, it is not necessarily the profit-
maximizing scheme. To see why, consider Figure 5.2. Suppose that the firm
reduced the size of the package intended for singles. Specifically, suppose it
reduced it to q̂s units, where q̂s = qs − h. Given that it has shrunk the package,
it would need to reduce the price it charges for it. The benefit that a single would
derive from q̂s units is the area beneath its inverse demand curve between 0 and
q̂s units; that is, the area labeled A′ . Note that the firm is forgoing revenues
equal to area J by doing this. But the surplus that a family could get by
purchasing a q̂s -unit package is also smaller; it is now the area labeled G′ . This
means that the firm could raise the price of the qf -unit package by the area
labeled H. Regardless of which package it purchases, a family can only keep
surplus equal to area G′ . In other words, by reducing the quantity sold to the
“low type” (a single), the firm reduces the information rent captured by the
“high type” (a family).
Is it worthwhile for the firm to trade area J for area H? Observe that the
profit represented by area J is rather modest: While selling the additional h
units to a single adds area J in revenue it also adds ch in cost. As drawn, the
profit from the additional h units is the small triangle at the top of area J. In
contrast, area H represents pure profit—regardless of how many it intends to
sell to singles, the firm is selling qf units to each family (i.e., cqf is a sunk
expenditure with respect to how many units to sell each single). So, as drawn,
this looks like a very worthwhile trade for the firm to make.
One caveat, however: The figure only compares a single family against a
single single. What if there were lots of singles relative to families? Observe
that the total net loss of reducing the package intended for singles by h is

(area J − ch) × Ns ,

where Ns is the number of singles in the population. The gain from reducing
[Figure 5.2 appears here: the demand curves ds(p) and df(p), marginal cost c, areas A′, J, B, G′, and H, and quantities q̂s, q̂s + h, and qf, plotted with $/unit against units.]

Figure 5.2: By reducing the quantity in the package intended for singles, the
firm loses revenue equal to area J, but gains revenue equal to area
H.

that package is
area H × Nf ,
where Nf is the number of families. If Ns is much larger than Nf, then this reduction in package size is not worthwhile. On the other hand, if the two populations are roughly equal in size or Nf is larger, then reducing the package for singles by more than h could be optimal.
How do we determine the amount by which to reduce the package intended
for singles (i.e., the smaller package)? That is, how do we figure out what h
should be? As usual, the answer is that we fall back on our MR = MC rule.
Consider a small expansion of the smaller package from q̂s . Because we are using
an implicit two-part tariff (packaging) on the singles, the change in revenue—
that is, marginal revenue—is the change in a single’s benefit (i.e., mbs (q̂s )) times
the number of singles. That is,

MR(q̂s ) = Ns mbs (q̂s ) .

Recall that the marginal benefit schedule is inverse demand. So if we let ρs(·) denote the inverse individual demand of a single (i.e., ρs(·) = ds⁻¹(·)), then we can write

MR(q̂s) = Ns ρs(q̂s) .    (5.16)
What about MC ? Well, if we increase the amount in the smaller package we
incur costs from two sources. First, each additional unit raises production costs
by c. Second, we increase each family’s information rent (i.e., area H shrinks).
Observe that area H is the area between the two demand curves (thus, between
the two inverse demand curves) between q̂s and q̂s + h. This means that the marginal reduction in area H is

ρf(q̂s) − ρs(q̂s) ,

where ρf(·) is the inverse demand of an individual family. Adding them together, and scaling by the appropriate population sizes, we have

MC(q̂s) = Ns c + Nf (ρf(q̂s) − ρs(q̂s)) .    (5.17)
Some observations:
1. Observe that if we evaluate expressions (5.16) and (5.17) at the qs shown in Figure 5.1, we have

MR(qs) = Ns ρs(qs)  and
MC(qs) = Ns c + Nf (ρf(qs) − ρs(qs)) .

Subtract the second equation from the first:

MR(qs) − MC(qs) = Ns (ρs(qs) − c) − Nf (ρf(qs) − ρs(qs))
= −Nf (ρf(qs) − ρs(qs))
< 0 ,

where the second equality follows because, as seen in Figure 5.1, ρs(qs) = c (i.e., qs is the quantity that equates inverse demand and marginal cost). Hence, provided Nf > 0, we see that the profit-maximizing second-degree pricing scheme sells the low type (e.g., singles) less than the welfare-maximizing quantity (i.e., there is a deadweight loss of area J − ch). In other words, as we saw previously, there is distortion at the bottom.
2. How do we know we want the family package to have qf units? Well,
clearly we wouldn’t want it to have more—the marginal benefit we could
capture would be less than our marginal cost. If we reduced the package
size, we would be creating deadweight loss. Furthermore, because we don’t have to worry about singles’ buying packages intended for families (that incentive-compatibility constraint is slack), we can’t gain by creating such a deadweight loss (unlike with the smaller package, where the deadweight loss is offset by the reduction in the information rent enjoyed by families).
We can summarize this as there being no distortion at the top.
3. Do we know that the profit-maximizing q̂s is positive? That is, do we
know that a solution to MR = MC exists in this situation? The answer
is no. It is possible, especially if there are a lot of families relative to
singles, that it might be profit-maximizing to set q̂s = 0; that is, sell only
one package, the qf -unit package, which only families buy. This will be
the case if MR(0) ≤ MC(0); in other words, if

Ns (ρs(0) − c) − Nf (ρf(0) − ρs(0)) ≤ 0 .    (5.18)

4. On the other hand, it will often be the case that the profit-maximizing
q̂s is positive, in which case it will be determined by equating expressions
(5.16) and (5.17).
Extended Example

Consider a cell-phone-service provider. It faces two types of customers, those who seldom have someone to talk to (indexed by s) and those who frequently have someone to talk to (indexed by f). Within each population, customers
are homogeneous. The marginal cost of providing connection to a cell phone is
5 cents a minute (for convenience, all currency units are cents). A member of
the s-population has demand:

ds(p) = 450 − 10p if p ≤ 45, and ds(p) = 0 if p > 45.
A member of the f-population has demand:

df(p) = 650 − 10p if p ≤ 65, and df(p) = 0 if p > 65.
There are 1,000,000 f -type consumers. There are Ns s-type consumers.
What is the profit-maximizing second-degree pricing scheme to use? How
many minutes are in each package? What are the prices?
It is clear that the f types are the high types (like families in our previous
analysis). There is no distortion at the top, so we know we sell an f type the
number of minutes that equates demand and marginal cost; that is,
qf* = df(c) = 600 .
We need to find qs*. To do this, we need to employ expressions (5.16) and (5.17). They, in turn, require us to know ρs(·) and ρf(·). Considering the regions of positive demand, we have

qs = 450 − 10ρs(qs)  and  qf = 650 − 10ρf(qf) ;

hence,

ρs(qs) = 45 − qs/10  and  ρf(qf) = 65 − qf/10 .
Using expression (5.16), marginal revenue from qs is, therefore,

MR(qs) = Ns ρs(qs) = Ns × (45 − qs/10) .

Marginal cost of qs (including forgone surplus extraction from the f type) is

MC(qs) = Ns c + Nf (ρf(qs) − ρs(qs))
= 5Ns + 1,000,000 × (65 − qs/10 − 45 + qs/10)
= 5Ns + 20,000,000 .
Do we want to shut out the s-types altogether? Employing expression (5.18), the answer is yes if

40Ns − 20,000,000 < 0 ;

that is, if

Ns < 500,000 .

So, if Ns < 500,000, then qs* = 0 and the price for 600 minutes (i.e., qf*) is bf(600), which is

bf(600) = Area under ρf(·) from 0 to 600
= ∫_0^600 ρf(z) dz
= ∫_0^600 (65 − z/10) dz
= 21,000

cents or $210.
Suppose that Ns ≥ 500,000. Then, equating MR and MC, we have

Ns × (45 − qs/10) = 5Ns + 20,000,000 ;

hence,

qs* = 400 − 200,000,000/Ns .
The low type retains no surplus, so the price for qs* minutes is bs(qs*), which equals the area under ρs(·) from 0 to qs*. This can be shown (see the derivation of bf(600) above) to be

bs(qs*) = Area under ρs(·) from 0 to qs*
= ∫_0^{qs*} (45 − q/10) dq
= 45qs* − (qs*)²/20 .
The price charged the f types for their 600 minutes is bf(600) less their information rent, which is the equivalent of area G′ in Figure 5.2:

Area G′ = ∫_0^{qs*} ((65 − q/10) − (45 − q/10)) dq = 20qs* .

So the price charged for 600 minutes is 21,000 − 20qs* cents ($210 − qs*/5).
To conclude: If Ns < 500,000, then the firm sells only a package with 600 minutes for $210. In this case, only f types buy. If Ns ≥ 500,000, then the firm sells a package with 600 minutes, purchased by the f types, for 210 − qs*/5 dollars; and it also sells a package with qs* minutes for a price of bs(qs*) cents. For example, if Ns = 5,000,000, then the two plans are (i) 600 minutes for $138; and (ii) 360 minutes for $97.20.
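The arithmetic of this example can be packaged into a short script. The sketch below (the function name plans is ours, not from the text) implements the shutdown check (5.18) and the MR = MC rule from (5.16) and (5.17) for the demands above:

```python
def plans(Ns, Nf=1_000_000, c=5):
    """Profit-maximizing menu for the cell-phone example.

    Ns, Nf are the numbers of s- and f-type consumers; c is marginal cost
    in cents per minute. Quantities are minutes; prices are in dollars.
    """
    # No distortion at the top: the f package carries df(c) minutes.
    qf = 650 - 10 * c                    # = 600 when c = 5
    bf = 65 * qf - qf ** 2 / 20          # area under rho_f from 0 to qf (cents)
    # Shutdown check, expression (5.18): is MR(0) <= MC(0)?
    if Ns * (45 - c) - Nf * 20 <= 0:
        return {'qs': 0, 'price_s': None, 'qf': qf, 'price_f': bf / 100}
    # MR = MC, equating (5.16) and (5.17): Ns*(45 - q/10) = Ns*c + Nf*20
    qs = 10 * (45 - c) - 200 * Nf / Ns   # = 400 - 200,000,000/Ns when c = 5
    bs = 45 * qs - qs ** 2 / 20          # area under rho_s from 0 to qs (cents)
    rent = 20 * qs                       # area G': each f-type's information rent
    return {'qs': qs, 'price_s': bs / 100,
            'qf': qf, 'price_f': (bf - rent) / 100}

print(plans(5_000_000))   # 360 minutes at $97.20 and 600 minutes at $138
print(plans(400_000))     # fewer than 500,000 s-types: only the $210 plan
```

Note how the f-type’s price falls as Ns grows: a larger s-population makes a bigger small package optimal, which in turn raises the information rent that must be ceded to the f types.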
Mechanism Design
Purpose
Our purpose is to consider the problem of hidden information; that is, a game
between two economic actors, one of whom possesses mutually relevant informa-
tion that the other does not. This is a common situation: the classic example—covered in the previous part of these lecture notes—is the “game” between
a monopolist, who doesn’t know the consumer’s willingness to pay, and the
consumer, who obviously does. Within the realm of contract theory, relevant
situations include a seller who is better informed than a buyer about the cost
of producing a specific good; an employee who alone knows the difficulty of
completing a task for his employer; a divisional manager who can conceal infor-
mation about his division’s investment opportunities from headquarters; and a
leader with better information than her followers about the value of pursuing
a given course of action. In each of these situations, having private informa-
tion gives the player possessing it a potential strategic advantage in his dealings
with the other player. For example, consider a seller who has better information
about his costs than his buyer. By behaving as if he had high costs, the seller
can seek to induce the buyer to pay him more than she would if she knew he
had low costs. That is, he has an incentive to use his superior knowledge to
capture an “information rent.” Of course, the buyer is aware of this possibility;
so, if she has the right to propose the contract between them, she will propose
a contract that works to reduce this information rent. Indeed, how the con-
tract proposer—the principal—designs contracts to mitigate the informational
disadvantage she faces will be a major focus of this part of the lecture notes.

Bibliographic Note

This part of the lecture notes draws heavily from a set of notes that I co-authored
with Bernard Caillaud.
Not surprisingly, given the many applications of the screening model, this
coverage cannot hope to be fully original. The books by Laffont and Tirole
(1993) and Salanié (1997) include similar chapters. Surveys have also appeared
in journals (e.g., Caillaud et al., 1988). Indeed, while there are idiosyncratic
aspects to the approach pursued here, the treatment is quite standard.

Lecture Note 6: The Basics of Contractual Screening

To begin, the problem of interest is broadly as follows:
• Two players are involved in a strategic relationship; that is, each player’s
wellbeing depends on the play of the other player. In particular, the
players must contract with each other to achieve some desired outcome.
• One player is better informed than the other; that is, he has private in-
formation about some state of nature relevant to the relationship. This
player is the informed player. In situations of contractual screening, he is
often called the agent. Consistent with the literature, call the informed
player’s (the agent’s) information his type. The player without the infor-
mation is the uninformed player. In situations of contractual screening,
she is often called the principal.
• Critical to the analysis is the bargaining game between the players. In
contractual screening, it is assumed the principal (the uninformed player)
has all the bargaining power; that is, she makes a take-it-or-leave-it (TIOLI)
offer of a contract to the agent (the informed player). The agent either
accepts, in which case the contract is binding on both parties; or he rejects,
in which case the game is over and the players receive their default (no-
trade) payoffs. It is the assumption that the uninformed player makes the
tioli offer that makes this a screening model. Were, instead, the informed
player the contract proposer, we would have a signaling model.
• In this context, a contract can be seen as setting the rules of a secondary
game to be played by the principal and the agent.
The asymmetry of information in this game is assumed to arise exogenously.
It could, for instance, reflect the agent’s superior experience or expertise, which
provides him the payoff-relevant information. For example, past jobs may tell a
contractor how efficient he is—and thus what his costs will be—while ignorance
of these past jobs means the entity hiring him (e.g., a firm in need of his services
or a home owner seeking to remodel) has a less precise estimate of what his costs
will be.
Note, critically, that the informed player’s (agent’s) information is assumed
superior to the uninformed player’s (principal’s); that is, the analysis excludes
situations in which each player has his or her own private information.1
1 Put formally, the uninformed player’s information partition is coarser than the informed player’s information partition.

Lecture Note 7: The Two-Type Screening Model
To begin formalizing the previous lecture note’s ideas, let’s start with as simple a
model as possible: the two-type model. That is, the agent’s private information
can take one of only two possible values.
Before proceeding, it is important to emphasize that the convenience of as-
suming only two types is not without cost. Beyond an obvious loss of generality,
the two-type model is “treacherous,” insofar as it may suggest conclusions that
seem general, but which are not. For example, the conclusion that we will shortly
reach with this model that the optimal contract implies distinct outcomes for
distinct states of nature—a result called separation—is not as general as it may
seem. Moreover, the assumption of two types conceals, in essence, a variety of
assumptions that must be made clear. It similarly conceals the richness of the
screening problem in complex, more realistic, relationships. Few prescriptions
and predictions should be reached from considering just a two-type model.

7.1 A Simple Two-Type Screening Situation
The payoffs of principal and agent depend on the allocation of something (amount
of a good supplied by the agent, hours worked by the agent, quality of the agent’s
work, etc.). Let x ∈ R+ denote an allocation. Absent an agreement (i.e., if the
parties don’t trade), the allocation will be some default value, which we nor-
malize to 0.
Let b(x) denote the principal’s benefit from an allocation of x. Assume that
b(·) is strictly concave and differentiable everywhere. Normalize the principal’s
benefit if no agreement to 0; that is, set b(0) = 0. Observe, in this simple model,
the principal’s benefit is independent of the agent’s type (state of the world).
Assume, here, that type corresponds to efficiency: an inefficient agent incurs
a cost of CI (x) if x is allocated, while an efficient agent incurs cost CE (x). Let
θ ∈ {I, E} denote an arbitrary type. Assume Cθ (·) is increasing, differentiable,
and convex for both types. Reflecting that it is a cost function, Cθ (0) = 0.
Reflecting the idea that type refers to efficiency, assume CI′(x) > CE′(x) for all x > 0; that is, the inefficient type’s marginal-cost schedule lies above the efficient type’s. Observe, via integration, that CI(x) > CE(x) for all x > 0.
The principal is uncertain about the agent’s type, whereas the agent knows
his type. Although uncertain about the agent’s type, the principal does hold
prior beliefs about the agent’s type. Specifically, she believes he is inefficient
with probability f . Because the principal is uncertain, it must be that 0 < f < 1.

79
80 Lecture Note 7: The Two-Type Screening Model

Observe total surplus (welfare) from allocating x units is

b(x) − Cθ (x)

when trade is with a type-θ agent. The problem is uninteresting if trade is never desirable, so assume that b′(0) > CE′(0) ≥ 0: some trade—a positive allocation—is surplus enhancing, at least with an efficient agent. Infinite trade is also without interest: assume, therefore, there exists a finite x̄ such that b′(x) − CE′(x) < 0 for all x > x̄.
In this context, an agreement (contract) between the parties fixes an allo-
cation and a transfer (payment), t. The convention is that t > 0 means the
principal is paying the agent; t < 0 means the agent is paying the principal.
Caution: t is not a per-unit price. The payoffs for the principal and agent are,
thus, b(x) − t and t − Cθ (x), respectively.
As assumed earlier, the principal makes a TIOLI offer, which the agent ac-
cepts or rejects. If he rejects, there is no transfer and the allocation is the default
amount, 0. Hence, if the agent rejects, each party’s payoff from this “transac-
tion” is zero. Because this no-agreement outcome is equivalent to agreeing to
an allocation of 0 and a transfer of 0, it is without loss of generality to assume
that the parties reach some agreement in equilibrium.
Let xθ^fi denote the Pareto-optimal allocation if trade is with a type-θ agent (the rationale for the superscript fi will become evident shortly); that is,

xθ^fi = argmax_{x≥0} b(x) − Cθ(x) .    (7.1)

The Pareto-optimal solution, also referred to as the ex post efficient solution, is then given by

b′(xE^fi) = CE′(xE^fi)  and  (b′(xI^fi) − CI′(xI^fi)) xI^fi = 0

(where only the larger non-negative root of the second equation is relevant).1 As always, the optimal amount to trade rises as marginal cost falls; hence, xI^fi < xE^fi.
As a benchmark, consider the case of symmetric or full information: suppose, contrary to the situation of interest, the principal knew the agent’s type. Given she has the bargaining power, the principal would offer the contract ⟨x, t⟩ that maximized her payoff subject to the constraint that the agent accept; that is, her problem is

max_{x,t} b(x) − t

subject to

t − Cθ(x) ≥ 0    (7.2)

(the zero on the RHS of this last expression—the agent’s reservation utility—is, recall, what he gets if he rejects). As her payoff increases as t decreases, the
1 Assumptions made earlier ensure these first-order conditions are also sufficient.
principal wishes to make t as small as possible; hence, the constraint must bind.
We can thus write her program as

max_x b(x) − Cθ(x) .

Comparing that expression to (7.1), it is clear the principal will choose allocation xθ^fi. Using the binding participation constraint, expression (7.2), it follows that the transfer, tθ^fi, will be given by

tθ^fi = Cθ(xθ^fi) .

The solution ⟨xθ^fi, tθ^fi⟩ is known as the full-information solution (contract).

7.2 Contracts under Incomplete Information
The full-information (benchmark) solution collapses when the principal is uninformed
about the agent’s type. To see why, suppose that the, now uncertain,
principal offered the agent the choice of two contracts, ⟨xfiE, tfiE⟩ or ⟨xfiI, tfiI⟩,
hoping the agent would choose the one appropriate to his type (i.e., the first
contract if efficient and the second if not). Observe that the principal is relying
on the agent to disclose his type honestly. Consider an efficient agent. If he
reveals he’s the E type (i.e., chooses contract ⟨xfiE, tfiE⟩), he will just be compensated
for his cost. His payoff would, therefore, be zero. On the other hand,
were he to pretend to be inefficient (i.e., choose the type-I contract ⟨xfiI, tfiI⟩), his
payment would be tfiI, but his cost only CE(xfiI). His payoff from this deception
would be

tfiI − CE(xfiI) = CI(xfiI) − CE(xfiI) > 0

(recall tfiI = CI(xfiI)). In other words, if the full-information contracts are
offered, an efficient agent does better pretending to be inefficient. The principal
would, therefore, need to be rather naïve if she expected an efficient agent to
choose ⟨xfiE, tfiE⟩ when he has the option of choosing ⟨xfiI, tfiI⟩.

The efficient agent’s gain from deception, CI(xfiI) − CE(xfiI), is called an
information rent. This is a loss to the principal, but a gain to the agent. There
is, however, an additional loss suffered by the principal that is not recaptured
by the agent: the agent’s deception means inefficiently little is produced; that
is, a real deadweight loss of

[b(xfiE) − CE(xfiE)] − [b(xfiI) − CE(xfiI)]

is suffered.
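A quick computation, under the same illustrative forms assumed in the earlier sketch (b(x) = x, CE(x) = x²/2, CI(x) = x²), confirms both the mimicry gain and the deadweight loss:

```python
# Mimicry under the full-information contracts. Illustrative (assumed) forms:
#   b(x) = x, C_E(x) = x**2 / 2, C_I(x) = x**2.
# Full-information contracts: (x_E, t_E) = (1, 1/2) and (x_I, t_I) = (1/2, 1/4).

def b(x): return x
def C_E(x): return x**2 / 2
def C_I(x): return x**2

x_E, t_E = 1.0, 0.5
x_I, t_I = 0.5, 0.25

payoff_honest = t_E - C_E(x_E)    # E takes his own contract: exactly zero
payoff_mimic = t_I - C_E(x_I)     # E takes I's contract instead
rent = C_I(x_I) - C_E(x_I)        # R(x_I^fi), the information rent

# Deadweight loss when E produces x_I instead of x_E:
dwl = (b(x_E) - C_E(x_E)) - (b(x_I) - C_E(x_I))
```

Here the deception pays the efficient agent 0.125 rather than 0, and the principal additionally loses a real deadweight loss of 0.125.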
Given this analysis, it is clear the principal should not expect the agent
to reveal his type freely. What should the principal do, instead? That is,
what kind of contracts will she offer? Because the principal does not know the
agent’s type, she may want to delegate the choice of allocation to the agent
under a payment schedule that implicitly rewards the agent for not acting as
82 Lecture Note 7: The Two-Type Screening Model

though he was inefficient when he is truly efficient. This payment schedule, τ(·),
specifies what payment, t = τ(x), is to be paid the agent should he choose to
supply x units. Wilson (1993) provides evidence that such payment schedules
are common in real-world contracting.

If the agent accepts such a contract, the agent’s allocation choice, xθ, is
given by

xθ ∈ argmax_{x≥0} {τ(x) − Cθ(x)} .     (7.3)

Assume this program has a solution and let uθ denote the value of this maxi-
mization program. By definition,

uθ = τ (xθ ) − Cθ (xθ ) .

Observe this means equilibrium payment, tθ , can be written as

tθ = uθ + Cθ (xθ ) .

Define the function R : R+ → R+ to be the mapping

x ↦ CI(x) − CE(x)

(i.e., R(x) = CI(x) − CE(x)). The function R(·) is the information-rent function.
Previously made assumptions imply that R(0) = 0, R(x) > 0 if x > 0, and R(·)
is strictly increasing.

Exercise 7.2.1: Prove that R(0) = 0, R(x) > 0 if x > 0, and R(·) is strictly increasing.

Let xI and xE be the equilibrium allocation decisions of a type-I and a
type-E agent, respectively. Revealed preference implies the following about xI
and xE:

uE = tE − CE(xE) ≥ tI − CE(xI) = uI + R(xI)     (7.4)

uI = tI − CI(xI) ≥ tE − CI(xE) = uE − R(xE) .     (7.5)

These inequalities are referred to by many names in the literature: incentive-compatibility
(ic) constraints, self-selection constraints, revelation constraints,
and truth-telling constraints.

What can we conclude from expressions (7.4) and (7.5)? First, rewriting
them as

R(xI) ≤ uE − uI ≤ R(xE),     (7.6)

we see that

xI ≤ xE,     (7.7)

because R(·) is strictly increasing. Observe, too, that expression (7.4) implies
uE > uI (except if xI = 0, in which case we only know uE ≥ uI). Finally,
expressions (7.4) and (7.7) imply tE > tI (unless xE = xI, in which case (7.4)
and (7.5) imply tE = tI).
An additional requirement is that the agent be willing to accept the contract
proposed by the principal. This means

uI ≥ 0 ;  and     (7.8)

uE ≥ 0 .     (7.9)

The constraints (7.8) and (7.9) are known as participation or individual-rationality
(ir) constraints. These constraints state that the agent accepts a
contract if and only if accepting does not mean suffering a loss.
The principal’s problem is to determine a payment schedule τ(·) that maximizes
her expected payoff (“expected” because, recall, she knows only the probabilities
as to which type will be realized). Specifically, she seeks to maximize

f × [b(xI) − tI] + (1 − f) × [b(xE) − tE] ;

or, equivalently,

f × [b(xI) − CI(xI) − uI] + (1 − f) × [b(xE) − CE(xE) − uE] ,

where (xθ, uθ) are determined by the agent’s optimization program (7.3) in
response to τ(·).

Observe that only two points on the entire payment schedule are directly
relevant to the principal: (xI, tI) and (xE, tE); or, equivalently, (xI, uI) and
(xE, uE). The principal’s optimization program can be performed with respect
to just these two points provided a complete payment schedule can subsequently
be recovered that satisfies the ic and ir constraints.
In fact, the ic and ir constraints on (xI, tI) and (xE, tE) are necessary and
sufficient for there to exist such a payment schedule. To prove this assertion,
let (xI, tI) and (xE, tE) satisfy those constraints and construct the rest of the
payment schedule as follows:

τ(x) = 0 ,   if 0 ≤ x < xI
       tI ,  if xI ≤ x < xE
       tE ,  if xE ≤ x ,

given 0 < xI < xE.2 Because Cθ(·) is increasing in x, the agent would never
choose an x other than 0, xI, or xE (his marginal income is zero for any x
other than those three). The ir constraints ensure that (xθ, tθ) is (weakly)
preferable to (0, 0). The ic constraints ensure that a type-θ agent prefers (xθ, tθ)
to (x¬θ, t¬θ), where ¬θ is the type other than θ. That is, we’ve shown that faced
with this schedule, the type-I agent’s solution to (7.3) is xI—as required—and
the type-E agent’s solution is xE—as required.

2 If xI = 0, then tI = 0. If xI = xE, then tI = tE.

The principal’s problem can thus be stated as

max_{xI,xE,uI,uE}  f × [b(xI) − CI(xI) − uI] + (1 − f) × [b(xE) − CE(xE) − uE]     (7.10)

subject to (7.4), (7.5), (7.8), and (7.9). Solving this problem using the standard
Lagrangean method is straightforward, albeit tedious. Because, however, such
a mechanical method provides little intuition, we pursue a different, though
equivalent, line of reasoning.

• One can check that ignoring the ic constraints (treating them as not bind-
ing) leads us back to the full-information solution. But, as shown at the
beginning of this section, that solution violates the ic constraint of the
efficient type. Conclusion: at least one of the ic constraints must bind.
• The ic constraint when the agent is efficient implies that: uE ≥ R(xI ) +
uI ≥ uI . Therefore, if an inefficient agent is willing to accept the contract,
so too must an efficient agent. Conclusion: constraint (7.9) is slack and
can be ignored.
• It is, however, the case that (7.8) must bind at the optimum. To see this,
suppose not: the principal could, then, lower both utility terms uI and uE
by some ε > 0 without violating the participation constraints. Moreover,
given the two utilities have been changed by the same amount, this can’t
affect the ic constraints. But, from (7.10), lowering the utilities raises the
principal’s expected payoff—which means our “optimum” wasn’t optimal.
• Using the fact that (7.8) is binding, expression (7.6)—the pair of incentive-compatibility
constraints—reduces to

R(xI) ≤ uE ≤ R(xE) .

For any target pair (xI, xE), the principal wants the efficient agent’s information
rent to be as small as possible. It follows, therefore, that
uE = R(xI). The ic constraint (7.5) is, thus, slack, provided the necessary
monotonicity condition (7.7) holds.

To summarize: uI = 0 and uE = R(xI). Substituting those values into the
principal’s program, expression (7.10), yields the following program:

max_{(xI,xE) | xI ≤ xE}  f × [b(xI) − CI(xI)] + (1 − f) × [b(xE) − CE(xE) − R(xI)] .

The solution is

xE = xfiE = argmax_{x≥0} b(x) − CE(x)  and     (7.11)

xI = x∗I(f) ≡ argmax_{x≥0} b(x) − CI(x) − ((1 − f)/f) R(x) .     (7.12)

The only step left is to verify that the monotonicity condition (7.7) is satisfied
for these values. If we consider the last two terms in the maximand of (7.12)
to be cost, we see that the effective marginal cost of output from the inefficient
type is

C′I(x) + ((1 − f)/f) R′(x) > C′I(x) > C′E(x)

for x > 0.3 The greater the marginal-cost schedule given a fixed marginal-revenue
schedule, the less is traded; that is, it must be that x∗I(f) < xfiE—the
monotonicity condition (7.7) is satisfied.
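Under the same illustrative forms assumed in the earlier sketches (b(x) = x, CE(x) = x²/2, CI(x) = x², so R(x) = x²/2), the first-order condition for (7.12) has the closed form x∗I(f) = f/(1 + f). The sketch below builds the second-best contracts from that and checks the ic constraints (7.4) and (7.5) directly:

```python
# Second-best contracts under illustrative (assumed) forms:
#   b(x) = x, C_E(x) = x**2/2, C_I(x) = x**2, hence R(x) = x**2/2.
# (7.11): x_E = x_E^fi = 1. (7.12) FOC: 1 = C_I'(x) + ((1-f)/f) R'(x)
#   = 2x + ((1-f)/f) x, which gives x_I*(f) = f / (1 + f).

def C(x, theta): return theta * x**2 / 2   # theta = 1 (type E) or 2 (type I)
def R(x): return C(x, 2) - C(x, 1)

def second_best(f):
    x_E, x_I = 1.0, f / (1.0 + f)
    u_I, u_E = 0.0, R(x_I)                 # ir binds for I; E earns R(x_I*)
    t_I = u_I + C(x_I, 2)
    t_E = u_E + C(x_E, 1)
    return (x_I, t_I), (x_E, t_E)

(x_I, t_I), (x_E, t_E) = second_best(0.5)

ic_E = (t_E - C(x_E, 1)) >= (t_I - C(x_I, 1)) - 1e-12   # binds at the optimum
ic_I = (t_I - C(x_I, 2)) >= (t_E - C(x_E, 2))           # slack at the optimum
```

With f = 1/2, the inefficient type's allocation is distorted down to 1/3 from the full-information level 1/2, and x∗I(f) is non-decreasing in f, as the comparative statics below (7.12) indicate.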
It is worth summarizing the nature and properties of the optimal price schedule
for the principal to propose:

Proposition 7.1 Consider a two-type model with a relatively efficient (lower-marginal-cost-schedule)
agent and a relatively inefficient (higher-marginal-cost-schedule)
agent. The best feasible (non-linear) payment schedule from the principal’s
perspective induces two possible outcomes:

• the allocation if the agent is relatively efficient equals the full-information
efficient allocation, xfiE; and

• the allocation if the agent is relatively inefficient, x∗I(f), is less than the
full-information efficient allocation, xfiI; that is, x∗I(f) < xfiI.

In addition:

• the inefficient agent earns no rent (uI = 0), but the efficient agent earns
an information rent of R(x∗I(f));

• the incentive-compatibility (revelation) constraint is binding for the efficient
agent, but slack for the inefficient agent;

• the individual-rationality (participation) constraint is binding for the inefficient
agent, but slack for the efficient agent;

• and, finally, x∗I(f) and R(x∗I(f)) are non-decreasing in the probability of
drawing an inefficient agent (i.e., are non-decreasing in f).

To verify the last point, observe, from the principal’s perspective, it is as if the
inefficient agent has a marginal cost of

C′I(x) + ((1 − f)/f) R′(x) .

That “effective” marginal cost is falling in f. By the usual comparative statics,
it follows that x∗I(·) is non-decreasing. Because R(·) is an increasing function,
R(x∗I(·)) must be similarly non-decreasing. Observe too that this effective
marginal cost is actual marginal cost if f = 1: if the principal were certain the
agent was inefficient, then she would stipulate the welfare-maximizing allocation
(i.e., x∗I(1) = xfiI). Conversely, as f ↓ 0, this effective marginal cost tends to
+∞ for x > 0. Given her marginal benefit is bounded, this means there must
be some f̲ ∈ (0, 1) such that, if f ≤ f̲, the principal stipulates a zero allocation
for the inefficient agent. In the parlance of mechanism design, she shuts down
or shuts out that type of agent when f ≤ f̲ (i.e., x∗I(f) = 0 for f ≤ f̲).

3 Because xfiE > 0, this is the relevant domain of output to consider.
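Shutdown can be illustrated with assumed linear costs, which make R′(0) > 0: take b(x) = ln(1 + x), CE(x) = 0.2x, and CI(x) = 0.5x, so R(x) = 0.3x. These forms, and the resulting threshold of 3/8, are an illustration, not the text's:

```python
# Shutdown of the inefficient type for low f, under assumed linear costs:
#   b(x) = ln(1 + x), so b'(x) = 1/(1 + x); C_E(x) = 0.2 x, C_I(x) = 0.5 x.
# The effective marginal cost of type I is m(f) = 0.5 + 0.3 * (1 - f)/f,
# which exceeds b'(0) = 1 once f <= 3/8, forcing x_I*(f) = 0.

c_E, c_I = 0.2, 0.5

def x_I_star(f):
    m = c_I + (c_I - c_E) * (1.0 - f) / f   # effective marginal cost of type I
    return max(0.0, 1.0 / m - 1.0)          # solves 1/(1 + x) = m, else corner

# At f = 1 the distortion vanishes: x_I_star(1.0) = 1/c_I - 1 = 1 = x_I^fi.
```

For f at or below the threshold the inefficient type is shut out (zero allocation), while for larger f the allocation is positive and non-decreasing in f.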
Intuition for Proposition 7.1 can be gained from Figure 7.1. This figure
shows one indifference curve for an inefficient (type-I) agent and three for an
efficient (type-E) agent in output-payment space. The type-I indifference curve
is that type’s zero-profit curve (hence, by necessity, it passes through the origin).
It corresponds to the ir constraint for that type. Similarly, the lowest type-E
indifference curve is that type’s zero-profit curve. Suppose that, under full
information, points a and b would be the contracts offered. Under asymmetric
information, however, contract b is not incentive compatible for type E: were
that type to pretend to be type I (i.e., choose contract a), then he would be on
a higher (more profitable) indifference curve (the highest of its three curves).
Under asymmetric information, an incentive compatible pair of contracts
that induce the full-information allocations are a and c.

Exercise 7.2.2: Explain why, given the assumptions of this model, we know c lies
directly above b.

The problem with this “solution,” however, is that type E earns a large in-
formation rent, equal to the distance between points b and c. The principal
can reduce this rent by distorting downward the quantity asked from a type-I
agent. For example, by lowering the allocation to x∗I (f ), the principal signifi-
cantly reduces the information rent (it’s now the distance between points b and
e). How much distortion the principal will impose depends on the likelihood
of the two types. When f is small, the expected savings in information rent
is large, while the expected cost of too-little allocation is small, so the down-
ward distortion in type I’s allocation is big. Conversely, when f is large, the
expected savings on rent are small and the expected cost of misallocation is
large, so the downward distortion is small. The exact location of point d is
determined by finding where the expected marginal cost of distorting type I’s
output, f × [b′(xI) − C′I(xI)], just equals the expected marginal reduction in
type E’s information rent, (1 − f) × R′(xI).
Because the first best is not achieved, it is natural to ask: could the principal
do better than the solution described in Proposition 7.1 were she to use some
more sophisticated mechanism? The answer is no and the proof is, as we will
see later, quite general. Whatever sophisticated mechanism the principal uses,
note that it must boil down to a pair of points, (xI , tI ) and (xE , tE ), once exe-
cuted; that is, an allocation and a transfer for each possible type. Consequently,
whatever complicated play an alternative mechanism induces, both parties can
see through it; that is, forecast that the equilibrium outcomes correspond to

[Figure 7.1 here: one indifference curve for the inefficient agent and three for the
efficient agent, drawn in output-payment space, with points a, b, c, and e marked
at allocations x∗I(f), xfiI, and xfiE.]

Figure 7.1: The full-information contracts, points a and b, are not incentive
compatible. The principal finds the full-information allocations, xfiE
and xfiI, too costly due to the information rent (distance from point
b to point c). To reduce the rent, the principal distorts a type-I
agent’s allocation (from xfiI to x∗I(f)), which reduces the rent (by
the distance between points c and e).

these two points. Hence, the final outcome can always be generated by a simple
(non-linear) payment schedule like the one derived above. We’ve, thus, established
that the outcome described in Proposition 7.1 cannot be improved on by
using more sophisticated or alternative mechanisms.

Finally, note that we don’t need an entire payment schedule, τ(·). In particular,
there is a well-known alternative: a direct-revelation mechanism. In a
direct-revelation mechanism, the principal commits to pay the agent tE for xE
or tI for xI depending on the agent’s announcement of his type. Failure by the
agent to announce his type (i.e., failure to announce a θ̂ ∈ {I, E}) is equivalent
to his rejecting the contract.4 It is immediate that this direct-revelation
mechanism is equivalent to the optimal payment schedule derived above. It is
also simpler, in that it only deals with the relevant part of the payment schedule.
Admittedly, it is not terribly realistic,5 but, as this discussion suggests, a
direct-revelation contract can be turned into a more realistic mechanism (this
is Proposition 8.2 below). More importantly, as we will see, in terms of determining
what is the optimal feasible outcome, there is no loss of generality in
restricting attention to direct-revelation mechanisms.

Exercise 7.2.3: Consider a two-type model in which the principal’s payoff is x − t
and the agent’s is t − x²/(2θ), where θ ∈ {1, 2}. If f = Pr{θ = 1}, what is the optimal
contract for the principal to offer given asymmetric information?

Other Applications

Through the following exercises, we explore two other applications of the two-
type model.

The first set of exercises assume the following: A manager (the agent) can reduce the
cost per unit of producing a given product. The manager knows something about how
easy it will be to successfully reduce costs, but his superior (the principal)—who gets
to make him a tioli offer—does not. Specifically, the manager’s utility is y − d(r, θ),
where y is his income (pay), r is the reduction, per unit, in cost he achieves, θ is his
type—reflecting the ease of cost reduction—and d : R²+ → R+ is the disutility he suffers
from reducing costs. The manager is free to quit and will do so if his utility will be less
than zero. Suppose θ ∈ {θ1, θ2}, where θ2 > θ1. Assume the firm in question expects
to produce X units over the relevant time frame. Suppose for both θ that

d(0, θ) = 0 ;  ∂d(0, θ)/∂r = 0 ;  ∂²d(r, θ)/∂r² > 0 ∀r > 0 ;  and  lim_{r→∞} ∂d(r, θ)/∂r = ∞.

4 If the allocation is determined by the agent’s action (e.g., x is the number of units he

supplies, the quality of his workmanship, etc.), then it is further assumed that the contract
imposes a severe punishment on the agent if he fails to produce the contractually specified x
given his announcement (i.e., if his choice of x is not xθ̂ ).
5 Although see Gonik (1978) for a real-life example of a direct-revelation mechanism.

Lastly, assume

∂d(r, θ1)/∂r > ∂d(r, θ2)/∂r  ∀r > 0 .
Exercise 7.2.4: Were this a situation of full-information, what expressions would
define the optimal full-information solution as a function of θ? (Assume the manager’s
superior seeks to maximize firm profit.)
Exercise 7.2.5: If rfin is the full-information solution given θn, prove rfi1 < rfi2.
Exercise 7.2.6: Let the probability of type θ1 be q, 0 < q < 1. What conditions
define the optimal solution under asymmetric information?
The next set of exercises assume the following: There is an r&d lab. The manager
of the lab knows its type, τ. Let p be the probability that the lab will successfully
develop a new product worth V > 1 to the firm. Through his efforts, the manager
determines p; that is, the manager chooses p. The manager’s supervisor does not know
τ, but she does know it equals 1 with probability γ and 2 with probability 1 − γ. Let
the manager’s utility be

y − p/((1 − p)τ) ,

where y is his income (pay). The manager is free to quit and will do so if his utility
will be less than zero.
Exercise 7.2.7: Solve for the optimal full-information levels of p as a function of τ .
Exercise 7.2.8: Suppose, somewhat unrealistically, that the supervisor can verify
p and, thus, base a contract on it. What is the optimal solution given asymmetric
information about τ ? (Assume the supervisor seeks to maximize expected firm profit.)
Exercise 7.2.9: Consider the more realistic assumption that the supervisor can only
verify the lab’s success or failure. Now what is the optimal solution given asymmetric
information about τ ? (Hint: the contract will have a pair of payments (sτ , fτ ), where
sτ is paid if success and fτ is paid if failure; that is, the mechanism has the manager
choose between hs1 , f1 , p1 i ≡ C1 and hs2 , f2 , p2 i ≡ C2 , where, if the manager chooses
Cτ , he gets paid sτ if successful, but fτ if he fails.)
Lecture Note 8: General Screening Framework
The two-type screening model yielded strong results. But buried within it is a
lot of structure and some restrictive assumptions. If we are really to use the
screening model to understand economic relationships, we need to deepen our
understanding of the phenomena it unveils, the assumptions they require, and
the robustness of its conclusions. The approach in this lecture note is, thus, to
start from a very general formalization of the problem.
The principal and agent have an interest in setting an allocation x ∈ X . In
addition to the allocation, both principal and agent care about money, specifi-
cally the value of any transfer between them.
The agent has information—his type—which consists of knowing the value
of a payoff-relevant parameter θ. Let Θ denote the set of possible types, the
type space. Nature draws θ from Θ according to a commonly known probability
distribution, F : Θ → [0, 1]. While the agent learns the value of θ perfectly, the
principal only knows that it was drawn from the commonly known probability
distribution.
Let B(x, t, θ) denote the principal’s utility as a function of allocation, trans-
fer, and payoff-relevant parameter; that is, B : X × R × Θ → R. Let U (x, t, θ)
denote the agent’s utility; that is, U : X × R × Θ → R. As in the previous
lecture note, interpret t > 0 as a transfer to the agent and t < 0 as a transfer
from the agent. Consistent with this interpretation, B(x, ·, θ) is a decreasing
function and U (x, ·, θ) an increasing function for all (x, θ) ∈ X × Θ.
For example, in the two-type model of the previous lecture note, we had
X = R+ , Θ = {I, E},

B(x, t, θ) = b(x) − t , and U (x, t, θ) = t − Cθ (x) .

As that model illustrated, it is not necessary that both the principal and agent’s
utility depend on the agent’s type.

Mechanisms

A contractual outcome is an allocation and transfer pair, (x, t). As we will see,
a contractual outcome can be the outcome of either a deterministic or stochastic
mechanism. With respect to a stochastic mechanism, let ∆(X × R) denote the
set of all possible probability distributions over outcomes (i.e., over the space
X × R). Let σ denote a generic element of ∆(X × R); that is, σ is a particular
distribution over outcomes. We can now define, quite generally, a contractual
mechanism:

92 Lecture Note 8: General Screening Framework

Mechanism: A contract or mechanism in the screening model is a game form,
⟨M, N, σ⟩, to be played by principal and agent. The set M denotes the agent’s
strategy set, the set N denotes the principal’s strategy set, and σ maps any
pair of strategies (m, n) to a probability distribution over outcomes (i.e., σ :
M × N → ∆(X × R)).

[Margin note: A mechanism is a game form chosen by the players.]

It is important to understand that the parties choose the mechanism to be
played. That is, as will become clear, the parties—in particular, the principal—decide
what M, N, and σ are.
Recalling the model of the previous lecture note, a mechanism-design interpretation
of the solution derived there is M = {I, E}, N = ∅, and

σ(m) = (xI, tI) , if m = I
       (xE, tE) , if m = E ;     (8.1)

that is, the agent states whether he is inefficient or efficient and his announcement
fixes his output target and his payment.1

For notational convenience, I will write U(σ(m, n), θ) rather than the technically
correct

E_{σ(m,n)}[U(x, t, θ)] .

Of course if σ(m, n) allocates all weight to a single (x, t) pair, there is nothing
incorrect about the notation U(σ(m, n), θ). Given that mechanisms with
random outcomes are rare in the literature, the notational convenience seems
worth the possible confusion.
A direct mechanism is a mechanism in which M = Θ; that is, one in which
the agent’s action is limited to making announcements about his type. The
consequences of this announcement are then built into the outcome function, σ.
For instance, as we just saw and as was also discussed earlier, we can interpret
the solution of the previous lecture note’s model as a direct mechanism.

A direct-revelation mechanism is a direct mechanism for which it is an equilibrium
strategy for the agent to announce his type truthfully. In other words,
if m : Θ → Θ is the agent’s strategy (a mapping from type into announcement
about type), then we have a direct-revelation mechanism if, in equilibrium,
m(θ) = θ for all θ ∈ Θ. For truth-telling to be an equilibrium strategy, it must
be a best response to the agent’s type and his expectation of the principal’s
action n:

U(σ(m(θ), n), θ) ≥ U(σ(m(θ′), n), θ)  ∀θ′ ∈ Θ ;

or, substituting for m(·),

U(σ(θ, n), θ) ≥ U(σ(θ′, n), θ)  ∀θ′ ∈ Θ .     (8.2)

1 To be absolutely technically correct, we should write

σ(m) = Pr{(xI, tI)} = 1 & Pr{(x, t)} = 0 ∀(x, t) ≠ (xI, tI) , if m = I
       Pr{(xE, tE)} = 1 & Pr{(x, t)} = 0 ∀(x, t) ≠ (xE, tE) , if m = E .

Consistent with the literature, however, we will use formulations like (8.1) when the mechanism
is deterministic; that is, when each (m, n) pair maps directly to a specific outcome.

Note that not every direct mechanism is a direct-revelation mechanism. Being
truthful in equilibrium is a property of a mechanism; that is, it depends on
σ(·).
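As an illustration, a direct mechanism can be represented as a lookup table from announcements to outcomes, and the truth-telling condition (8.2) checked by brute force. The utility function and numbers below are assumptions (the quadratic-cost forms and second-best contracts from the earlier sketches), not the text's:

```python
# Checking whether a direct mechanism is a direct-revelation mechanism,
# i.e., whether truth-telling condition (8.2) holds for every type.
# Illustrative (assumed) utility: U(x, t, theta) = t - theta * x**2 / 2,
# with theta = 1 (type E) or 2 (type I).

def U(x, t, theta):
    return t - theta * x**2 / 2

def is_direct_revelation(sigma, types, tol=1e-9):
    # sigma: dict mapping an announced type to an outcome (x, t)
    return all(
        U(*sigma[theta], theta) >= U(*sigma[report], theta) - tol
        for theta in types
        for report in types
    )

# The second-best contracts (at f = 1/2 in the assumed forms) are truthful...
second_best = {1: (1.0, 5 / 9), 2: (1 / 3, 1 / 9)}
# ...but the full-information contracts are not (type 1 gains by reporting 2):
full_info = {1: (1.0, 0.5), 2: (0.5, 0.25)}
```

This is exactly the sense in which being truthful in equilibrium depends on σ(·): the two dictionaries differ only in their outcome functions.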

The Revelation Principle

Observe that the design of a contract means choosing M, N, and σ. In theory,
the class of spaces and outcome functions is incomprehensibly large. How can
we find the optimal contract in such a large class? Indeed, given the inherent
difficulties in even characterizing such a large class, how can we ever be sure that
we’ve found the optimal contract? Fortunately, two simple, yet subtle, results—the
revelation principle and the taxation principle—allow us to avoid these
difficulties. From the revelation principle, the search for an optimal contract
reduces without loss of generality to the search for the optimal direct-revelation
mechanism. Moreover, if the outcome in the direct-revelation mechanism is a
deterministic function of the agent’s announcement, then the taxation principle
tells us we may further restrict attention to a payment schedule that is a function
of the allocation x (as we did in the sales & manufacturing-divisions example).

Proposition 8.1 (The Revelation Principle)2 For any general mechanism
⟨M, N, σ⟩ and associated Nash equilibrium, there exists a direct-revelation
mechanism such that the associated truthful Nash equilibrium generates the same
distribution over outcomes in equilibrium as the general mechanism.

Proof: A Nash equilibrium of the game ⟨M, N, σ⟩ is a pair of strategies
(m(·), n).3 Let us consider the following direct mechanism: σ̂(·) = σ(m(·), n).
The claim is that σ̂(·) induces truth-telling (is a direct-revelation mechanism).
To see this, suppose it were not true. Then there must exist a type θ such
that the agent does better to lie—announce some θ′ ≠ θ—when he is type θ.
Formally, there must exist θ and θ′ ≠ θ such that

U(σ̂(θ′), θ) > U(σ̂(θ), θ) .

Using the definition of σ̂(·), this, however, means that

U(σ(m(θ′), n), θ) > U(σ(m(θ), n), θ) .

But if that expression is true, then the agent prefers to play m(θ′) instead of
m(θ) in the original mechanism. This contradicts the assumption that m(·) is
an equilibrium best response to n in the original game. It follows, reductio ad
absurdum, that σ̂ induces truth-telling.
2 The revelation principle is often attributed to Myerson (1979), although Gibbard (1973)

and Green and Laffont (1977) could be identified as earlier derivations. Suffice it to say
that the revelation principle has been independently derived a number of times and was a
well-known result before it received its name.
3 Observe that the agent’s strategy can be conditioned on θ, which he knows, while the

principal’s cannot be (since she is ignorant of θ).



Moreover, because σ̂(θ) = σ(m(θ), n), the same distribution over outcomes
is implemented in equilibrium.

An intuitive way to grasp the revelation principle is to imagine that, before


he plays some general mechanism, the agent could delegate his play to some
trustworthy third party. There are two equivalent ways this delegation could
work. One, the agent could tell the third party to play the strategy m. Alterna-
tively, if the third party knows the agent’s equilibrium strategy—the mapping
m : Θ → M—then the agent could simply reveal (announce) his type to the
third party with the understanding that the third party should choose the ap-
propriate action, m(θ). But, because we can “incorporate this third party” into
the design of our direct-revelation mechanism, this equivalence means that there
is no loss of generality in restricting attention to direct-revelation mechanisms.
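The proof's construction can also be mimicked computationally: compose the outcome function with the agent's equilibrium strategy to get σ̂ = σ ∘ m, then check that the induced direct mechanism is truthful. The message labels, utility function, and numbers are illustrative assumptions, not the text's:

```python
# Sketch of the revelation principle's construction, sigma_hat = sigma ∘ m.
# Illustrative (assumed) primitives: U(x, t, theta) = t - theta * x**2 / 2,
# a two-message "general" mechanism, and an equilibrium strategy m(theta).

def U(x, t, theta):
    return t - theta * x**2 / 2

# General mechanism: message space M and outcome function sigma.
M = ("low", "high")
sigma = {"high": (1.0, 5 / 9), "low": (1 / 3, 1 / 9)}

def m(theta):                       # the agent's equilibrium strategy
    return "high" if theta == 1 else "low"

def sigma_hat(theta):               # the induced direct mechanism
    return sigma[m(theta)]

# m(theta) is a best response over all messages in M ...
best_response = all(
    U(*sigma[m(theta)], theta) >= U(*sigma[msg], theta) - 1e-9
    for theta in (1, 2) for msg in M
)
# ... hence truth-telling holds in the induced direct mechanism:
truthful = all(
    U(*sigma_hat(theta), theta) >= U(*sigma_hat(report), theta) - 1e-9
    for theta in (1, 2) for report in (1, 2)
)
```

The check mirrors the proof: any profitable lie in σ̂ would translate into a profitable deviation from m in the original mechanism, contradicting equilibrium.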

Participation

Mechanisms that do not need to induce the agent’s participation are of no
interest: if the agent always had to play, then the principal—who recall has all
the bargaining power—would simply require an infinite transfer from the agent
(i.e., set t = −∞). This is, of course, unrealistic.

By adding an element to X if necessary, we assume that there exists a
no-trade outcome, (x0, 0); that is, (x0, 0) is the outcome if no agreement is
reached—the agent rejects the principal’s contract offer. This means that the
agent can never be made worse off than

UR(θ) ≡ U(x0, 0, θ) .

The quantity UR(θ) is called the reservation utility of a type-θ agent. In some
models, UR(θ) is a constant (does not vary with θ). This was, for instance, the
case in the previous lecture note’s model, where the reservation utility was 0 for
both types. When reservation utility is a constant, I will write UR.

[Margin note: It is without loss of generality to assume all types participate in equilibrium.]

Because a mechanism could always map θ into (x0, 0), there is never any
loss of generality with respect to mechanism design in assuming that all types
participate in equilibrium.
It might strike one as odd, at least in some situations, that the agent’s
reservation utility equals his utility given no trade. What if an employee’s utility
if paid nothing and required to do nothing by the principal were zero, but he
has the alternative to work elsewhere which will yield him some positive utility
U(θ)? Because behavior is unaffected by an additive constant to the utility
function, this situation is equivalent to one in which we add U(θ) to the original
utility function so the agent’s no-trade utility is U(θ) (i.e., if paid nothing and
asked to do no work—that is, essentially free to pursue the alternative—he gets
utility U(θ)). In other words, there is no loss of generality in equating the
agent’s reservation utility with his no-trade utility.

[Margin note: There is no loss of generality in treating the no-trade and reservation utilities as the same.]

The Taxation Principle

The taxation principle requires a bit of structure, namely it must be possible
to inflict a punishment so severe on the agent that he can be deterred from any
undesired action.

Condition 8.1 Let T ⊆ R be the set of permitted transfers. Then there exists
a t̲ ∈ T such that

sup_{(ξ,θ)∈X×Θ} U(ξ, t̲, θ) ≤ inf_{θ∈Θ} UR(θ) .     (8.3)

In words: there exists a transfer (presumably negative—hence from agent to
principal) so low that no type of agent would do better choosing an allocation
that triggered that transfer than he could do if he chose simply not to participate.

Proposition 8.2 (The Taxation Principle)4 Assume Condition 8.1
holds. The equilibrium outcome under any deterministic direct-revelation mechanism,
σ, where σ is defined by θ ↦ (x(θ), t(θ)), is also an equilibrium outcome
of the game in which the principal allows the agent to choose the allocation, x,
in exchange for compensation s(x), where s(·) is defined by

s(x) = t(θ) , if θ ∈ x⁻¹(x) (i.e., such that x = x(θ) for some θ ∈ Θ)
       t̲ ,    otherwise ,

for a t̲ (a punishment) satisfying (8.3).

Proof: First, let’s establish that s(·) is unambiguously defined. To that end,
suppose, to the contrary, that there exist θ and θ′ such that x = x(θ) = x(θ′ ),
but t(θ) 6= t(θ′ ). Without loss of generality, we may take t(θ) > t(θ′ ). Because
U (x, ·, θ) is increasing, we then have

U x(θ) , t(θ), θ′ ) > U x(θ′ ), t(θ′ ), θ′ ) ,


|{z}
=x(θ ′ )

but this implies type θ′ would do better to lie than tell the truth, which con-
tradicts the assumption that σ is a direct-revelation mechanism. Reductio ad
absurdum, it follows that t(θ) = t(θ′ ) and, therefore, that s(·) is unambiguously
defined.
Next we need to verify that the punishment deters the agent from choosing an x ∉ x(Θ) (i.e., an x for which no θ exists such that x = x(θ)). Consider an arbitrary type θ′. We have

    sup_{ξ∈X\x(Θ)} U(ξ, t, θ′) ≤ sup_{(ξ,θ)∈X×Θ} U(ξ, t, θ) ≤ inf_{θ∈Θ} UR(θ)
        ≤ UR(θ′) ≤ U(x(θ′), t(θ′), θ′) = U(x(θ′), s(x(θ′)), θ′) ,

4 Rochet (1985) attributes the name “taxation principle” to Roger Guesnerie.


96 Lecture Note 8: General Screening Framework

where the first inequality follows because (X \ x(Θ)) × {θ′} ⊂ X × Θ, the second inequality by Condition 8.1, the third inequality because {θ′} ⊂ Θ, and the final equality because t(θ′) = s(x(θ′)) by construction. Hence, no type would choose an x ∉ x(Θ).
Finally, we need to verify that each type chooses the same x given the com-
pensation schedule s(·) as he would have played under the original mechanism.
Suppose there were a type θ who played differently. As just shown, we know he
chooses an x ∈ x(Θ). Suppose he chooses x(θ′ ). By supposition, this is better
for him than x(θ); hence,

    U(x(θ′), s(x(θ′)), θ) > U(x(θ), s(x(θ)), θ) .

But since t(·) = s(x(·)) by construction, this implies

    U(x(θ′), t(θ′), θ) > U(x(θ), t(θ), θ) ,

which contradicts the assumption that the original mechanism induces θ to choose x(θ). Reductio ad absurdum, we can conclude that all θ prefer to play x(θ) than any other x, as required.
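The construction in the proof can be illustrated with a small numerical sketch; every primitive below (the type set, the allocation rule, the transfers, and the utility) is a hypothetical choice, not taken from the text. Starting from a truthful direct mechanism, we build s(·) and check that each type, facing s(·) alone, still picks his assigned allocation, while the punishment deters off-menu choices.

```python
# A finite toy illustration of the taxation principle.  All primitives here
# are hypothetical choices, not taken from the text.
types = [1.0, 2.0, 3.0]
x_of = {1.0: 1.0, 2.0: 2.0, 3.0: 3.0}   # allocation rule x(theta) = theta
t_of = {1.0: 0.0, 2.0: 1.5, 3.0: 4.0}   # transfers chosen so truth-telling holds

def u(x, theta):
    """Agent's utility gross of the transfer."""
    return theta * x - x**2

PUNISH = -100.0                          # a t satisfying Condition 8.1

def s(x):
    """Compensation schedule induced by the direct mechanism."""
    for theta, x_theta in x_of.items():
        if x_theta == x:
            return t_of[theta]
    return PUNISH                        # off-menu allocations are punished

# Facing s(.) alone, each type's best allocation is the one the direct
# mechanism assigned to him; off-menu choices are deterred by the punishment.
choices = list(x_of.values()) + [0.5, 5.0]   # menu plus two off-menu options
for theta in types:
    best = max(choices, key=lambda x: u(x, theta) + s(x))
    assert best == x_of[theta]
```

The transfers above were built from the envelope formula so that the direct mechanism is truthful to begin with; the taxation principle then says nothing is lost by quoting the schedule s(·) instead of running the mechanism.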

Although payment schedules involve no loss of generality and are realistic, they are often nonlinear, which creates difficulties in working with them. In particular, when looking for the optimal nonlinear price schedule, one must be able to compute the mapping that associates to each schedule s(·) the action choice x(θ) that maximizes U(x, s(x), θ). Even assuming s(·) is differentiable—which is not ideal because one should refrain from making assumptions about endogenous entities—solving this problem can be difficult. Direct-revelation mechanisms, on the other hand, allow an easier mathematical treatment of the problem using standard convex analysis: the revelation constraints simply consist of writing that θ̂ = θ is a maximum of U(x(θ̂), t(θ̂), θ), and writing that a given point is a maximum is easier than characterizing an unknown maximum. For this reason, much of the mechanism-design literature has focused on direct-revelation mechanisms over optimal-payment schedules despite the latter's greater realism.
The Standard Framework

The general framework just introduced is more general than what is commonly used. In this section, we will therefore consider a restricted framework known as the standard framework. Much of the contractual screening literature can be placed within this standard framework. We begin by defining this framework and contrasting it to the more general framework introduced above.
In the standard framework, the allocation space, X, is R+.1 The type space, Θ, is an interval in R. Assume the interval is bounded at both ends; let θL denote the lower bound and θH the upper bound. The most critical assumptions
in going from the general framework to the standard framework involve the
utility functions: assume they are additively separable in the transfer and the
allocation. Moreover, assume the marginal value of money is type independent.
Hence,

U (x, t, θ) = t + u(x, θ) and


B(x, t, θ) = b(x, θ) − t .

Observe the utility functions in Lecture Note 7 fit this framework.


The value of additive separability is that it eliminates having to consider
income effects. Having the marginal value of money be type independent avoids complications from countervailing incentives; in particular, it ensures the type space satisfies an appropriate order condition (the Spence-Mirrlees condition).
More for convenience than anything else, assume u(·, ·) and b(·, ·) are twice
continuously differentiable. Aggregate surplus (welfare) is

w(x, θ) = u(x, θ) + b(x, θ) .

Because neither no trade nor infinite trade is particularly interesting (and, in


the case of the latter, realistic) assume
Some trade is desirable: For all θ ∈ (θL , θH ], ∂w(0, θ)/∂x > 0.
Too much of a good thing: For all θ ∈ [θL, θH] there exists a finite x̄(θ) such that

    w(x, θ) ≤ w(x̄(θ), θ)

for all x > x̄(θ).

1 That the space be bounded below at 0 is not critical—any lower bound would do. Alter-

natively, by appropriate changes to the utility functions, we could allow the allocation space
to be unbounded. Zero is simply a convenience.

98 Lecture Note 9: The Standard Framework

In light of the continuity of w(·, θ), these last two assumptions ensure that
w(x, θ) has an interior maximum with respect to x for all θ ∈ (θL , θH ].
Before proceeding, it is worth observing that the standard framework is
restrictive in potentially important ways:

• The type space is restricted to be one-dimensional. In many applications, this is a natural assumption. One can, however, conceive of applications where it doesn't fit: suppose, e.g., a principal cared about the quantity and quality of the goods received and the agent's type varied on both an efficiency dimension and a conscientiousness dimension (the latter affecting the cost of providing quality). Not surprisingly, restricting attention to one dimension is done for analytic tractability: a single dimension makes the order properties of the type space straightforward (i.e., greater and less than are well-defined on the real line). As will become evident, the order properties of the type space are critical in the subsequent analysis.

• The set of possible allocations is one-dimensional. Again, this is sufficient for some applications (e.g., the quantity supplied to the principal), but not others (e.g., when the principal cares about both quantity and quality). The difficulty in expanding to more than one dimension arises from difficulties in capturing how the agent's willingness to make tradeoffs among the dimensions (including his income) varies with his type. The reader interested in this extension should consult Rochet and Choné (1998) or Basov (2010).

• As noted, the utility functions are separable in money and allocation; the
marginal utility of income is independent of the state of nature; and the
marginal utility of income is constant, which means both players are risk
neutral with respect to gambles over money. The gains from these as-
sumptions are that we can compute the transfer function t(·) in terms of
the allocation function x(·), which means our optimization problem is a
standard optimal-control problem with a unique control, x(·). In addition,
risk neutrality insulates us from problems that exogenously imposed risk
might otherwise create (e.g., the need to worry about mutual insurance).
On the other hand, when the agent is risk averse, the ability to threaten
him with endogenously imposed risk (from the contract itself) can provide
the principal an additional tool with which to improve the ultimate allo-
cation. For a discussion of some of these issues see Edlin and Hermalin
(2000). Note we still have the flexibility to endogenously impose risk over
the allocation (the x).

Returning to our development of the standard framework, assume that na-


ture chooses the agent’s type, θ, according to the differentiable distribution
function F : [θL , θH ] → [0, 1]. Let f (·) be the associated density function (i.e.,
f (θ) = F ′ (θ)). Assume the density function is continuous on [θL , θH ]. Assume,
too, that it has full support; that is, f (θ) > 0 for all θ ∈ [θL , θH ]. Assuming a
continuum of types and a distribution without mass points is done largely for

convenience. Note, in addition to considering a continuum of types, as here,


another way to generalize beyond two types would be to have a finite number of
types, N > 2. The conclusions reached would be economically similar to those
we’ll shortly obtain with a continuum of types.2 The benefit of going all the
way to the continuum is it allows us to employ calculus, which streamlines the
analysis.
Recall that, at its most general, a direct-revelation mechanism is a mapping from the type space into a distribution over outcomes. Given the way
that money enters both players’ utility functions, we’re free to replace a dis-
tribution over payments with an expected payment, which means we’re free to
assume that the payment is fixed deterministically by the agent’s announce-
ment.3 What about random-allocation mechanisms? The answer depends on
the risk properties of the two players’ utilities over allocation. If we postulate
that u(·, θ) and b(·, θ) are concave for all θ, then, absent incentive concerns,
there would be no reason for the principal to prefer a random-allocation mech-
anism; indeed, if at least one utility function is strictly concave, then she would
strictly prefer not to employ a random-allocation mechanism absent incentive
concerns: her expected utility is greater with a deterministic mechanism and,
since the agent’s expected utility is greater, her payment to him will be less (a
benefit to her). Hence, we would only expect to see random-allocation mecha-
nisms if the randomness somehow relaxed the incentive concerns. Where does
this leave us? At this point, consistent with what is standardly done, we will
assume that both u(·, θ) and b(·, θ) are concave, with one at least being strictly
concave, for all θ (note this entails that w(·, θ), the social surplus function, is
also strictly concave). Hence, absent incentive concerns, we’d be free to ig-
nore random-allocation mechanisms. Although we do have incentive concerns,
we’ll nonetheless ignore random-allocation mechanisms at this juncture. Con-
sequently, we’re free to write hx(·), t(·)i instead of σ(·) for the mechanism.

The Spence-Mirrlees Condition

In order to screen types, the principal must be able to exploit differences across
the tradeoffs that different types are willing to make between money and allo-
cation. Otherwise a strategy, for instance, of decreasing the x expected from
the agent in exchange for slightly less pay wouldn’t work to induce one type to
reveal himself to be different than another type. Recall, for instance, because
marginal cost differed between the efficient and inefficient agents in Lecture
Note 7, the principal could design a contract to induce revelation. Different
willingnesses to make tradeoffs means we require that different types of agents
have different indifference curves in allocation-money (transfer) space. In fact,

2 See, e.g., Caillaud and Hermalin (1993, §3) for a development of the standard framework in which the type space is finite. Also see Hermalin (2014) for a more detailed treatment of the standard framework with a finite type space.
3 That is, the mechanism that maps θ to a distribution σ(θ) over payments is equivalent to a mechanism that maps θ to the deterministic payment t(θ) = E_{σ(θ)}{t}.



Figure 9.1: The Spence-Mirrlees Condition: Through any point (e.g., a or b), the indifference curve through that point for a higher type (red) crosses the indifference curve through that point for a lower type (blue) from above.

we want, for any point in that space, that these slopes vary monotonically with
respect to type. Such a monotonicity-of-indifference-curves condition is known
as a Spence-Mirrlees condition.
The slope of an indifference curve in allocation-money space is equal to
−∂u/∂x. Hence, we require that −∂u/∂x or, equivalently and more naturally,
∂u/∂x vary monotonically in θ. Specifically, we assume:
Condition 9.1 (Spence-Mirrlees) For all possible allocations x,

    ∂u(x, θ)/∂x > ∂u(x, θ′)/∂x

if θ > θ′.
That is, if θ > θ′ —so θ is a higher type than θ′ —then the slope of type θ’s
indifference curve is, at any point, less than the slope of type θ′ ’s indifference
curve. Observe that a consequence of Condition 9.1 is that a given indifference
curve for one type can cross a given indifference curve of another type at most
once. For this reason, the Spence-Mirrlees Assumption is sometimes called a
single-crossing condition. Figure 9.1 illustrates.
Lemma 9.1 If u(·, ·) is at least twice differentiable in both arguments, then the Spence-Mirrlees assumption (Condition 9.1) is implied by

    ∂²u(x, θ)/∂θ∂x > 0 .    (9.1)

If Condition 9.1 holds, then

    ∂²u(x, θ)/∂θ∂x ≥ 0 .    (9.2)
Proof:

Exercise 9.0.1: Prove, via integration, that (9.1) implies Condition 9.1.
Exercise 9.0.2: Prove, by taking the limit of the appropriate expression as θ → θ′ ,
that Condition 9.1 implies (9.2).
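As a quick numerical illustration of Lemma 9.1 (a sketch using a hypothetical utility u(x, θ) = θ√x, whose cross-partial ∂²u/∂θ∂x = 1/(2√x) is positive), the marginal utility ∂u/∂x computed by finite differences is everywhere larger for the higher type, which is exactly the ordering Condition 9.1 requires:

```python
# Numeric illustration of Lemma 9.1 with a hypothetical utility
# u(x, theta) = theta * sqrt(x): its cross-partial 1/(2 sqrt(x)) is positive,
# so the marginal utility du/dx should be everywhere higher for the higher
# type, which is the ordering in Condition 9.1.
import math

def u(x, theta):
    return theta * math.sqrt(x)

def du_dx(x, theta, h=1e-6):
    """Central finite difference for the marginal utility in x."""
    return (u(x + h, theta) - u(x - h, theta)) / (2 * h)

for x in [0.5, 1.0, 2.0, 4.0]:
    assert du_dx(x, 2.0) > du_dx(x, 1.0)   # higher type, steeper in x
```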

Before leaving the Spence-Mirrlees assumption, it is important to emphasize


that the Spence-Mirrlees assumption is an assumption about order. Conse-
quently, differentiability of u with respect to either x or θ is not necessary. Nor,
in fact, is it necessary that U be additively separable as we’ve been assuming.
At its most general, then, we can state the Spence-Mirrlees assumption as

Condition 9.1′ (Generalized Spence-Mirrlees): There exists a complete order ≻θ on the type space, Θ, such that if θ ≻θ θ′ and x ≻x x′ (where ≻x completely orders X), then the following is a valid implication:

    U(x, t, θ′) ≥ U(x′, t′, θ′) =⇒ U(x, t, θ) > U(x′, t′, θ)

Exercise 9.0.3: For the standard framework (including the differentiability assump-
tions), prove that Condition 9.1 implies Condition 9.1′ .

This generalized Spence-Mirrlees condition states that we can order the types
so that if a low type (under this order) prefers, at least weakly, an outcome with
more x (with “more” being defined by the order ≻x ) than a second outcome,
then a higher type must strictly prefer the first outcome to the second. Figure
9.1 illustrates: Since the low type prefers point c to a (weakly), the high type
must strictly prefer c to a, which the figure confirms (c lies above the high
type’s indifference curve through a). Similarly, since the low type prefers c to b
(strictly), the high type must also strictly prefer c to b, which the figure likewise
confirms. See Milgrom and Shannon (1994) for a more complete discussion of
the relationship between Condition 9.1 and Condition 9.1′ .
As suggested at the beginning of this section, a consequence of the Spence-
Mirrlees assumption is that it is possible to separate any two types; by which
we mean it is possible to find two outcomes (x1 , t1 ) and (x2 , t2 ) such that a
type-θ1 agent prefers (x1 , t1 ) to (x2 , t2 ), but a type-θ2 agent has the opposite
preferences. For instance, in Figure 9.1, let point a be (x1 , t1 ) and let d be

(x2 , t2 ). If θ2 is the higher type, then it is clear that given the choice between
a and d, θ1 would select a and θ2 would select d; that is, this pair of contracts
separates the two types.
Henceforth, we assume that the Spence-Mirrlees condition holds.

Characterization of Direct-Revelation Mechanisms

We now turn to characterizing the set of direct-revelation mechanisms (contracts) under the standard framework. In other words, we wish to know which are the contracts ⟨x, t⟩ : [θL, θH] → R+ × R that satisfy truthful revelation. This truthful-revelation—or incentive compatibility (ic)—requirement can be expressed as:

    t(θ) + u(x(θ), θ) ≥ t(θ′) + u(x(θ′), θ)    (9.3)

for all θ and θ′ in [θL, θH].


In addition, each type of agent, anticipating his equilibrium play, must prefer
to participate than not participate. (Recall it is without loss of generality to
limit attention to mechanisms that induce all types to participate.) As before,
we refer to this as the participation or individual-rationality (ir) constraint.
Given truth-telling in equilibrium, the ir constraint is

    t(θ) + u(x(θ), θ) ≥ UR(θ)    (9.4)

for all θ ∈ [θL , θH ], where, recall, UR (θ) is a type-θ agent’s reservation utility.
We assume that the agent acts in the principal’s interest when the agent is
otherwise indifferent. In particular, he accepts a contract when he is indiffer-
ent between accepting and rejecting it and he tells the truth when indifferent
between being honest and lying. This is simply a necessary condition for an
equilibrium to exist and, as such, should not be deemed controversial.
In Lecture Note 7, the assumption was that both types of agent had the
same reservation utility (specifically, zero). It is possible, however, to imagine
models in which the reservation utility indeed varies with θ. For instance, sup-
pose that a more-able agent could, if not employed by the principal, pursue
a more remunerative alternative than a less-able agent. Then the reservation
utility of the former would exceed that of the latter. A number of authors
(Lewis and Sappington, 1989, and Maggi and Rodriguez-Clare, 1995, among
others) have studied the role of such type-dependent reservation utilities in con-
tractual screening models.4 Type dependence can, however, greatly complicate
the analysis. We will, therefore, adopt the more standard assumption of type-
independent reservation utilities; that is, we assume UR (θ) ≡ UR .
We now consider the necessary conditions imposed on any direct-revelation mechanism by the ic constraints. For any types θ and θ′, the incentive-compatibility constraints imply

    t(θ) + u(x(θ), θ) ≥ t(θ′) + u(x(θ′), θ)  and
    t(θ′) + u(x(θ′), θ′) ≥ t(θ) + u(x(θ), θ′) .

4 Models with type-dependent reservation utilities are sometimes called models of countervailing incentives.

As is often the case in contract design, it is easier to work with utilities than
transfers (payments). To enable us to do so, define

    v(θ) = t(θ) + u(x(θ), θ) .

Observe that v(θ) is the type-θ agent’s equilibrium utility. The above pair of
inequalities can then be written as
 
v(θ) ≥ v(θ′ ) − u x(θ′ ), θ′ +u x(θ′ ), θ and
| {z }
=t(θ ′ )
 
v(θ′ ) ≥ v(θ) − u x(θ), θ + u x(θ), θ′ .

We can combine these inequalities to yield

    u(x(θ′), θ) − u(x(θ′), θ′) ≤ v(θ) − v(θ′) ≤ u(x(θ), θ) − u(x(θ), θ′) .

Using the fundamental theorem of calculus, this last expression can be rewritten as

    ∫_{θ′}^{θ} ∂u(x(θ′), z)/∂z dz ≤ v(θ) − v(θ′) ≤ ∫_{θ′}^{θ} ∂u(x(θ), z)/∂z dz .    (9.5)

Ignoring, for the moment, the middle term of (9.5), we can use the fundamental theorem of calculus again to obtain

    ∫_{θ′}^{θ} ∫_{x(θ′)}^{x(θ)} ∂²u(x, z)/∂x∂z dx dz ≥ 0 .    (9.6)

The integrand in (9.6) is positive given the Spence-Mirrlees condition (recall


Lemma 9.1). Expression (9.6) can hold only if the direction of integration of
both integrals is positive or if the direction of integration of both integrals is
negative; that is, it can hold only if

    (x(θ) − x(θ′)) × (θ − θ′) ≥ 0 .

It follows, therefore, that x(·) must be a non-decreasing function. A consequence


of that result is that x(·) is continuous almost everywhere. In summary, we have
established:

Lemma 9.2 Given the Spence-Mirrlees condition, a necessary condition for a


screening mechanism to be a direct-revelation mechanism is that the allocation,
x(·), be non-decreasing in type.

A second implication of (9.5) is the following. Via a result from real analysis, it can be shown that v(·) is almost everywhere differentiable.5 Dividing all parts of (9.5) by θ − θ′ and taking the limit as θ′ → θ, we see this derivative must be

    dv(θ)/dθ = ∂u(x(θ), θ)/∂θ

almost everywhere.6 Solving this differential equation, we have

    v(θ) = v(θL) + ∫_{θL}^{θ} ∂u(x(z), z)/∂θ dz .    (9.7)

Observe that we can rewrite (9.7) in terms of transfers:

    t(θ) = tL − u(x(θ), θ) + ∫_{θL}^{θ} ∂u(x(z), z)/∂θ dz ,    (9.8)

where tL ≡ v(θL).

Exercise 9.0.4: Verify that (9.7) implies (9.8).

To summarize, we have proved:

Lemma 9.3 A necessary condition for a screening mechanism to be a direct-


revelation mechanism is that equilibrium utility be given by (9.7); or, equiva-
lently, that the transfer schedule, t(·), be given by (9.8).

Not only are these necessary conditions for a direct-revelation mechanism,


they are also sufficient; that is, any mechanism hx(·), t(·)i such that x(·) is
non-decreasing and t(·) is given by (9.8) is a direct-revelation mechanism.

Proposition 9.1 Under the assumptions of the standard framework, includ-


ing the Spence-Mirrlees condition, a mechanism hx(·), t(·)i is a direct-revelation
mechanism if and only if t(·) is given by (9.8) and x(·) is non-decreasing.

Proof: Necessity was established by Lemmas 9.2 and 9.3. We thus only need
to establish sufficiency. Suppose we have an allocation schedule x(·) that is non-
decreasing and a transfer schedule t(·) given by (9.8). We wish to show that
such a scheme induces a type-θ agent to truthfully announce his type rather

5 This follows from the Lebesgue Differentiation Theorem. See, e.g., Yeh (2006, p. 278).

For details, see Hermalin (2014).


 
6 Note the important distinction between ∂u(x(θ), θ)/∂θ and du(x(θ), θ)/dθ. The former is the partial derivative of u with respect to its second argument evaluated at (x(θ), θ), while the latter is the total derivative of u.

than announce some θ′ ≠ θ. If he lies about his type, claiming to be θ′, θ′ ≠ θ, then his utility is

    u(x(θ′), θ) + t(θ′)
      = u(x(θ′), θ) − u(x(θ′), θ′) + tL + ∫_{θL}^{θ} ∂u(x(z), z)/∂θ dz + ∫_{θ}^{θ′} ∂u(x(z), z)/∂θ dz
      = t(θ) + u(x(θ), θ) − ∫_{θ}^{θ′} ∂u(x(θ′), z)/∂θ dz + ∫_{θ}^{θ′} ∂u(x(z), z)/∂θ dz
      ≤ v(θ) − ∫_{θ}^{θ′} ∂u(x(θ′), z)/∂θ dz + ∫_{θ}^{θ′} ∂u(x(θ′), z)/∂θ dz = v(θ) ,

where the first equality substitutes for t(θ′) using (9.8); the second equality uses tL + ∫_{θL}^{θ} ∂u(x(z), z)/∂θ dz = v(θ) = t(θ) + u(x(θ), θ) and u(x(θ′), θ) − u(x(θ′), θ′) = −∫_{θ}^{θ′} ∂u(x(θ′), z)/∂θ dz; and the inequality in the last line follows because the Spence-Mirrlees condition means ∂u(·, z)/∂θ is a non-decreasing function. Consequently, if θ′ > θ, then

    ∫_{θ}^{θ′} ∂u(x(z), z)/∂θ dz ≤ ∫_{θ}^{θ′} ∂u(x(θ′), z)/∂θ dz

because the integrand on the rhs exceeds the integrand on the lhs and integration is in the positive direction; alternatively, if θ′ < θ, then the inequality holds because the integrand on the rhs is less than the integrand on the lhs and integration is in the negative direction.
Because we have established

    u(x(θ′), θ) + t(θ′) ≤ v(θ) ,

we have shown that the mechanism will induce truth-telling insofar as an agent
of a given type would not wish to pretend to be a different type.

This characterization result is, now, a well-known result and can be found,
implicitly at least, in almost every mechanism-design paper. Given its impor-
tance, it is worth understanding how our assumptions drive this result. In
particular, be aware that the necessity of (9.8) does not depend on the Spence-
Mirrlees assumption. The Spence-Mirrlees assumption’s role is to establish (i)
that a monotonic allocation function is necessary and (ii) that, if x (·) is mono-
tonic, then (9.8) is sufficient to ensure a truth-telling equilibrium.
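Proposition 9.1 also lends itself to a numerical sanity check. The sketch below uses hypothetical primitives (u(x, θ) = θx − x²/2 and the non-decreasing rule x(θ) = θ² on [0, 1], with tL = 0; none of these are from the text), builds t(·) from (9.8) by quadrature, and confirms that truthful reporting is optimal on a grid of reports:

```python
# Numerical check of Proposition 9.1 under hypothetical primitives.
def u(x, th):
    return th * x - x**2 / 2

def x_of(th):
    return th**2                 # a non-decreasing allocation rule

def du_dtheta(x, th):
    return x                     # partial of u with respect to the type

def t_of(th, n=500):
    """Transfer from (9.8) with t_L = 0, computed by the trapezoid rule."""
    h = th / n
    integral = 0.0
    for k in range(n):
        a, b = k * h, (k + 1) * h
        integral += 0.5 * h * (du_dtheta(x_of(a), a) + du_dtheta(x_of(b), b))
    return -u(x_of(th), th) + integral

reports = [k / 100 for k in range(101)]
for th in [0.25, 0.5, 0.75, 1.0]:
    best = max(reports, key=lambda r: u(x_of(r), th) + t_of(r))
    assert abs(best - th) < 1e-9    # the truthful report wins on the grid
```

Note the division of labor the proposition describes: monotonicity of x(·) plus the transfer formula (9.8) is all the check needs; no separate incentive constraints are imposed.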
This discussion also demonstrates a point that was implicit in our earlier discussion of the Spence-Mirrlees assumption: what is critical is not that ∂²u/∂θ∂x be positive, but rather that it keep a constant sign over the relevant domain. If, instead of being positive, this cross-partial derivative were negative everywhere, then our analysis would remain valid, except that it would give us the

inverse monotonicity condition: x(·) would need to be non-increasing in type.


But with a simple change of the definition of type, θ̃ = −θ, we’re back to our
original framework. Because, as argued above, the definition of type is some-
what arbitrary, we see that our conclusion of a non-decreasing x(·) is simply
a consequence of the assumption that different types of agent have different
marginal rates of substitution between money and allocation and that an order-
ing of these marginal rates of substitution by type is invariant with respect to
which point in X × R we’re considering.
What if the Spence-Mirrlees assumption is violated? As our discussion in-
dicates, although we still have some necessary conditions concerning incentive-
compatible mechanisms, we no longer have any reason to expect x(·) to be
monotonic. Moreover—and more critically if we hope to characterize the set of
incentive-compatible mechanisms—we have no sufficiency results. It is not sur-
prising, therefore, that little progress has been made on the problem of designing
optimal contracts when the Spence-Mirrlees condition fails.

The Optimal Mechanism in the Standard Framework

The previous analysis has given us, within the standard framework at least, a
complete characterization of the space of direct-revelation (incentive-compatible)
contracts. We can now concentrate on the principal’s problem of designing an
optimal contract.
The principal’s goal is to maximize her expected utility. In light of the
revelation principle, there is no loss in seeking to solve her problem within
the space of direct-revelation mechanisms. Moreover, from Proposition 9.1, we
know that any such mechanism must have a non-decreasing allocation schedule
and a transfer schedule given by (9.8). As noted earlier, it is also without loss of generality to require that the mechanism satisfy the ir constraint for each type (i.e., v(θ) ≥ UR).
The principal’s expected utility under the mechanism is
    ∫_{θL}^{θH} ( b(x(θ), θ) − t(θ) ) f(θ) dθ .

Using (9.8), this can be rewritten as

    ∫_{θL}^{θH} ( b(x(θ), θ) + u(x(θ), θ) − ∫_{θL}^{θ} ∂u(x(z), z)/∂θ dz − v(θL) ) f(θ) dθ ,    (9.9)

where tL = v(θL); the sum of the last two terms in parentheses is −v(θ), so the integrand is (b(x(θ), θ) − t(θ)) f(θ).

The principal’s objective is to maximize (9.9) with respect to x(·) and v(·)
subject to (i) the ir constraint, v(θ) ≥ UR ; and (ii) that x(·) be non-decreasing.
We can rather quickly substitute out the v(·):
Lemma 9.4 Under a direct-revelation mechanism, the agent’s equilibrium util-
ity is an increasing function of type (i.e., v(·) is increasing).

Proof: Given that

    v(θ) = v(θL) + ∫_{θL}^{θ} ∂u(x(z), z)/∂θ dz ,

it is sufficient to show that

    ∂u(x(θ), θ)/∂θ > 0    (9.10)

for all θ. From Spence-Mirrlees (sm) we know that θ′ > θ′′ implies

    ∂u(x, θ′)/∂x > ∂u(x, θ′′)/∂x .

Hence,

    ∫_0^x ( ∂u(z, θ′)/∂x − ∂u(z, θ′′)/∂x ) dz > 0 .

Integrating, we get

    0 < ( u(x, θ′) − u(x, θ′′) ) − ( u(0, θ′) − u(0, θ′′) ) = u(x, θ′) − u(x, θ′′) ,

where the equality follows because u(0, θ′) = UR = u(0, θ′′). But, as θ′ and θ′′ were arbitrary, this implies that u(x, ·) is an increasing function and, therefore, that (9.10) holds.
No Rent at the Bottom: The lowest type of agent earns no rent—his equilibrium utility equals his reservation utility.

Lemma 9.4 implies that if v(θL) ≥ UR, then v(θ) ≥ UR for all θ ∈ [θL, θH]. Hence, the ir constraint is met for all θ if v(θL) ≥ UR. From expression (9.9) we see that the greater is v(θL), the lower is the principal's utility. Hence, she wants to set v(θL) as low as possible, which means the constraint v(θL) ≥ UR is binding. We can, therefore, substitute out the v(·) from the problem: the principal seeks to choose x(·) to maximize

    ∫_{θL}^{θH} ( b(x(θ), θ) + u(x(θ), θ) − ∫_{θL}^{θ} ∂u(x(z), z)/∂θ dz − UR ) f(θ) dθ    (9.11)

subject to x(·) being non-decreasing. Before solving this program, two additional transformations will prove helpful. First,

    b(x(θ), θ) + u(x(θ), θ) = w(x(θ), θ) ,

where, recall, w(x, θ) is total welfare if x units are allocated when the agent's

type is θ. Second, via integration by parts,7 we have

    ∫_{θL}^{θH} ( ∫_{θL}^{θ} ∂u(x(z), z)/∂θ dz ) f(θ) dθ

      = [ −(1 − F(θ)) ∫_{θL}^{θ} ∂u(x(z), z)/∂θ dz ]_{θL}^{θH} + ∫_{θL}^{θH} (1 − F(θ)) ∂u(x(θ), θ)/∂θ dθ

      = ∫_{θL}^{θH} ( ((1 − F(θ))/f(θ)) × ∂u(x(θ), θ)/∂θ ) f(θ) dθ ,

where the bracketed term vanishes because the inner integral is zero at θ = θL and 1 − F(θH) = 0.
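The integration-by-parts identity just derived can be spot-checked numerically. In the sketch below, F is uniform on [0, 1] (so f = 1) and g(z) = z²/(z + 1)² is a hypothetical stand-in for ∂u(x(z), z)/∂θ along some allocation rule; both sides are computed by the trapezoid rule and agree to numerical precision:

```python
# Numerical spot check of the integration-by-parts step (hypothetical
# choices: uniform F on [0, 1] and a stand-in integrand g).
def g(z):
    return z**2 / (z + 1)**2

N = 500
h = 1.0 / N

def inner(th):
    """Trapezoid-rule integral of g from 0 to th."""
    n = max(1, int(round(th / h)))
    step = th / n
    return sum(0.5 * step * (g(k * step) + g((k + 1) * step))
               for k in range(n))

# Left side: E[ integral_0^theta g(z) dz ].  Right side: E[ ((1-F)/f) g ].
lhs = sum(0.5 * h * (inner(k * h) + inner((k + 1) * h)) for k in range(N))
rhs = sum(0.5 * h * ((1 - k * h) * g(k * h)
                     + (1 - (k + 1) * h) * g((k + 1) * h)) for k in range(N))
assert abs(lhs - rhs) < 1e-5
```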

Using these two results, we can rewrite (9.11) as

    ∫_{θL}^{θH} ( w(x(θ), θ) − ((1 − F(θ))/f(θ)) × ∂u(x(θ), θ)/∂θ − UR ) f(θ) dθ .    (9.12)

The principal seeks to maximize (9.12) subject to x(·) being non-decreasing.

Proposition 9.2 Under the standard framework, the equilibrium mechanism


offered by the principal does not maximize expected welfare.

Proof: The result follows immediately from (9.12) because that expression differs from the expression for maximizing expected welfare,

    ∫_{θL}^{θH} w(x(θ), θ) f(θ) dθ ,

in a non-constant way.

Inefficiency: The equilibrium of a screening problem is generally inefficient.

Intuitively, because the principal wishes to maximize something other than expected social surplus and she chooses the contract, we can't expect the contract she proposes to maximize social surplus.
Maximizing (9.12) subject to x(·) being non-decreasing is an optimal-control problem. Fortunately, in many contexts, the constraint that x(·) be non-decreasing is not binding and, thus, the problem can be solved simply by maximizing (9.12) pointwise. Pointwise maximization means determining x(θ) as the solution to

    max_x  w(x, θ) − ((1 − F(θ))/f(θ)) × ∂u(x, θ)/∂θ .    (9.13)

7 The product rule of differentiation tells us d(gh) = h dg + g dh, g and h differentiable functions. Hence, it must be that

    ∫ g dh = ∫ d(gh) − ∫ h dg = gh − ∫ h dg .

Using this insight is known as integration by parts. In our use of integration by parts here, h = −(1 − F) and g = ∫_{θL}^{θ} ∂u(x(z), z)/∂θ dz.
L

The corresponding first-order condition is

    ∂w(x, θ)/∂x − ((1 − F(θ))/f(θ)) × ∂²u(x, θ)/∂x∂θ ≤ 0 .    (9.14)
The first-best means maximizing welfare, which has the corresponding first-
order condition ∂w(x, θ)/∂x = 0 (at least if all types ideally have positive al-
location). Comparing that first-order condition for welfare maximization to
(9.14) reëstablishes Proposition 9.2 for the case in which pointwise optimization
is valid. Moreover, we can sign the direction of the distortion:
Proposition 9.3 Assume the optimal mechanism for the principal can be found
via pointwise optimization. Consider the allocation schedule under that mech-
anism. Then for any type, except the very highest (θH ), a marginal increase
in the allocation for that type will raise welfare if the allocation for that type is
positive. Furthermore, if welfare is strictly quasi-concave in allocation for that
type, then the allocation under this mechanism cannot exceed the first-best (full-
information) allocation and is less than the first-best allocation if the allocation
under this mechanism is positive.
Proof: Let x∗(θ) denote the allocation for type θ under the optimal mechanism for the principal (i.e., the one that solves (9.14)). Let xfi(θ) denote the first-best (full-information) allocation.
Suppose, first, that x∗(θ) > 0. By the sm condition, the second term on the lhs of (9.14) is negative, so ∂w(x∗(θ), θ)/∂x > 0. The first part of the proposition follows.
Considering the “furthermore” part, the result is immediate if x∗(θ) = 0. Suppose, then, that x∗(θ) > 0. Expression (9.14) implies, given the sm condition, that ∂w(x∗(θ), θ)/∂x > 0. Given w(·, θ) is strictly quasi-concave, this implies xfi(θ) > x∗(θ).
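Both parts of Proposition 9.3 can also be seen by brute force. The sketch below grid-maximizes the pointwise objective (9.13) under hypothetical primitives (w(x, θ) = x − x²/(θ + 1) and θ uniform on [0, 1], so (1 − F)/f = 1 − θ and ∂u/∂θ = x²/(θ + 1)²) and compares the result with the welfare-maximizing allocation:

```python
# Brute-force view of Proposition 9.3 under hypothetical primitives:
# grid search over the pointwise objective (9.13) exhibits distortion
# below the top and efficiency at the top.
def objective(x, th):                    # virtual surplus, as in (9.13)
    return x - x**2 / (th + 1) - (1 - th) * x**2 / (th + 1)**2

def welfare(x, th):
    return x - x**2 / (th + 1)

xs = [k / 1000 for k in range(1001)]     # candidate allocations in [0, 1]
for th in [0.0, 0.25, 0.5, 0.75, 1.0]:
    x_star = max(xs, key=lambda x: objective(x, th))
    x_fb = max(xs, key=lambda x: welfare(x, th))
    if th < 1.0:
        assert x_star < x_fb             # distortion below the top
    else:
        assert x_star == x_fb            # efficiency at the top
```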
Distortion Below the Top: The optimal mechanism from the principal's perspective tends to distort downward allocations vis-à-vis welfare.

In other words, Proposition 9.3 indicates that there is local downward distortion in the allocations and, if welfare is quasi-concave, there is global downward distortion. These results are sometimes referred to as distortion below the top.
Observe 1 − F(θH) = 0 (there is no probability of drawing a type better than θH), hence (9.14) is the same as the first-order condition for maximizing welfare given an agent of type θH. Hence, the mechanism must yield an efficient solution for the highest type:

Efficiency at the Top: The optimal mechanism from the principal's perspective tends to maximize welfare with respect to the allocation for the best type.

Proposition 9.4 Assume the optimal mechanism for the principal can be found via pointwise optimization. The allocation for the highest type (θH) under that mechanism is the first-best (full-information) allocation for that type.

Proposition 9.4 is sometimes summarized as efficiency at the top.

An Example: Let the agent manage a division. The principal, the agent's superior, is concerned with the division's profit, x. The effort necessary to generate profit imposes disutility on the agent—assume his utility function is t + u(x, θ) = t − x²/(θ + 1). The principal's utility is just total profit, x − t. Although the agent knows θ, the principal knows only that θ is drawn from a uniform distribution on [0, 1]. The agent's utility under no trade is 0. His reservation utility is thus 0.

Exercise 9.0.5: Verify that this example satisfies the assumptions of the standard
framework, including the Spence-Mirrlees condition.

Observe that w(x, θ) = x − x²/(θ + 1). Expression (9.14) is, therefore,
\[
1 - \frac{2x}{\theta+1} - (1-\theta)\frac{2x}{(\theta+1)^2} = \frac{(\theta+1)^2 - 4x}{(\theta+1)^2} = 0\,.
\]

Solving, we have
\[
x^*(\theta) = \frac{(\theta+1)^2}{4}\,.
\]
Clearly, this solution is non-decreasing in θ. Moreover, the second-order condi-
tion is readily verified. We can, therefore, conclude that pointwise optimization
is valid for this example.

Exercise 9.0.6: Verify that the first-best (full-information) allocation, xfi(·), is given by xfi(θ) = (θ + 1)/2.
Exercise 9.0.7: Verify that xfi(θ) > x∗(θ) for all θ < 1. Verify that xfi(1) = x∗(1).

What about the transfer schedule, t(·)? From (9.8) we have
\[
t(\theta) = U_R + \frac{x^*(\theta)^2}{\theta+1} + \int_0^\theta \frac{x^*(z)^2}{(z+1)^2}\,dz
= 0 + \frac{(\theta+1)^3}{16} + \int_0^\theta \frac{(z+1)^2}{16}\,dz = \frac{3 + 12\theta + 12\theta^2 + 4\theta^3}{48}\,.
\]
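These closed forms are easy to confirm numerically. The sketch below (plain Python; the function names, the brute-force grid, and the trapezoidal rule are our own choices, not the text's) maximizes the pointwise objective and evaluates the transfer integral:

```python
# Numeric check of the worked example: the virtual surplus
# Ω(x, θ) = x - x²/(θ+1) - (1-θ)x²/(θ+1)² should be maximized at x*(θ) = (θ+1)²/4,
# and the transfer should equal (3 + 12θ + 12θ² + 4θ³)/48.

def omega(x, theta):
    """Virtual surplus for w(x, θ) = x - x²/(θ+1) with θ ~ U[0, 1]."""
    return x - x**2 / (theta + 1) - (1 - theta) * x**2 / (theta + 1) ** 2

def x_star(theta, grid=50_000):
    """Maximize Ω(·, θ) by grid search over [0, 2]."""
    best = max(range(grid + 1), key=lambda i: omega(2 * i / grid, theta))
    return 2 * best / grid

def transfer(theta, steps=20_000):
    """t(θ) = x*(θ)²/(θ+1) + ∫₀^θ x*(z)²/(z+1)² dz, using the closed-form x*."""
    xs = lambda z: (z + 1) ** 2 / 4
    g = lambda z: xs(z) ** 2 / (z + 1) ** 2
    h = theta / steps
    integral = h * (0.5 * (g(0.0) + g(theta)) + sum(g(i * h) for i in range(1, steps)))
    return xs(theta) ** 2 / (theta + 1) + integral

for th in (0.0, 0.3, 0.7, 1.0):
    assert abs(x_star(th) - (th + 1) ** 2 / 4) < 1e-3
    assert abs(transfer(th) - (3 + 12 * th + 12 * th**2 + 4 * th**3) / 48) < 1e-6
print("pointwise solution and transfer schedule verified")
```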

Validating Pointwise Optimization

Given the usefulness of being able to solve the principal’s optimization problem
pointwise, we would like to have some conditions that ensure the validity of
pointwise optimization. What we require are conditions such that the solution
to (9.13) is non-decreasing in θ. To this end, define the virtual surplus function:
\[
\Omega(x,\theta) = w(x,\theta) - \frac{1-F(\theta)}{f(\theta)}\,\frac{\partial u(x,\theta)}{\partial\theta}\,;
\]
hence, maximizing Ω(x, θ) is the same as maximizing (9.13).


In light of Theorem 2.1 on page 30, it is sufficient to show that Ω satisfies increasing differences; that is, if θ > θ′ and x > x′, then
\[
\Omega(x,\theta) - \Omega(x',\theta) > \Omega(x,\theta') - \Omega(x',\theta')\,. \qquad (9.15)
\]
This motivates two conditions:

Condition 9.2 The welfare function w : R+ ×[θL , θH ] → R exhibits increasing


differences.

and

Condition 9.3 (Monotone Hazard Rate Property) The distribution of types satisfies the Monotone Hazard Rate Property ( mhrp); that is,
\[
\frac{f(\theta)}{1-F(\theta)} \qquad (9.16)
\]
is non-decreasing in θ.8

The mhrp plays a big role in much of the economic analysis of contracts. Observe that if mhrp is satisfied, then
\[
\frac{1-F(\theta)}{f(\theta)}
\]
(a ratio sometimes called the Mills ratio) must be non-increasing.


If Condition 9.2 holds, then (9.15) holds if
\[
\frac{1-F(\theta)}{f(\theta)}\left(\frac{\partial u(x,\theta)}{\partial\theta} - \frac{\partial u(x',\theta)}{\partial\theta}\right)
< \frac{1-F(\theta')}{f(\theta')}\left(\frac{\partial u(x,\theta')}{\partial\theta} - \frac{\partial u(x',\theta')}{\partial\theta}\right).
\]
In light of the Spence-Mirrlees condition, the differences on each side of this expression are non-negative. The Mills ratio is positive. Hence, given mhrp, a sufficient condition for this last inequality to hold is

Condition 9.4 The marginal change in utility with respect to type—the func-
tion ∂u/∂θ : R+ ×[θL , θH ] → R—exhibits decreasing differences.

To summarize, we have established:

Proposition 9.5 Under the standard framework, including the Spence-


Mirrlees condition, the optimal allocation schedule for the principal can be found
by maximizing the virtual surplus function (i.e., expression (9.13)) with respect
to choice of allocation if Conditions 9.2–9.4 hold.

8 See the definition of a hazard rate on page 11.


It is important to understand that Proposition 9.5 merely supplies sufficient


conditions for when pointwise optimization of the principal’s problem is valid.
In particular, it says nothing about actually solving the problem; specifically,
one must still verify that the first-order condition (9.14) is sufficient as well as
necessary.

Exercise 9.0.8: Consider the example on page 109. Verify that welfare exhibits
increasing differences.
Exercise 9.0.9: Verify that mhrp holds for any random variable with a uniform
distribution.
Exercise 9.0.10: Consider the example on page 109. Verify that Condition 9.4 holds.
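For the example, Conditions 9.2–9.4 can also be checked on a grid rather than analytically; the following is a numeric sketch (our own construction, with the hazard rate of the uniform distribution hard-coded), not a proof:

```python
# Grid check of Conditions 9.2-9.4 for the example on page 109:
# w(x, θ) = x - x²/(θ+1), u(x, θ) = -x²/(θ+1), θ uniform on [0, 1].

w  = lambda x, t: x - x**2 / (t + 1)          # welfare
ut = lambda x, t: x**2 / (t + 1)**2           # ∂u/∂θ
hz = lambda t: 1.0 / (1.0 - t)                # hazard rate f/(1-F) of U[0,1]

xs = [i / 20 for i in range(1, 21)]
ts = [i / 20 for i in range(20)]              # stay below θ = 1, where hz blows up
for x, xp in zip(xs[1:], xs):                 # pairs with x > x'
    for t, tp in zip(ts[1:], ts):             # pairs with θ > θ'
        # Condition 9.2: w exhibits increasing differences
        assert w(x, t) - w(xp, t) >= w(x, tp) - w(xp, tp) - 1e-12
        # Condition 9.4: ∂u/∂θ exhibits decreasing differences
        assert ut(x, t) - ut(xp, t) <= ut(x, tp) - ut(xp, tp) + 1e-12
for t, tp in zip(ts[1:], ts):
    # Condition 9.3 (mhrp): the hazard rate is non-decreasing
    assert hz(t) >= hz(tp)
print("Conditions 9.2-9.4 hold on the grid")
```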

Bunching—Or What to do if Pointwise Optimization is Invalid

What if the solution to the pointwise optimization of the principal's expected utility, expression (9.12), does not yield a non-decreasing allocation schedule? We then have a (more complex) optimal-control problem.9 Specifically, we wish to solve
\[
\max_{x(\cdot),\,y(\cdot)} \int_{\theta_L}^{\theta_H} \Omega\bigl(x(\theta),\theta\bigr) f(\theta)\,d\theta \qquad (9.17)
\]
subject to
\[
x'(\theta) = y(\theta)\,, \ \text{for almost every } \theta \quad\text{and}\quad y(\theta) \ge 0\,, \ \text{for almost every } \theta\,.
\]

In the language of optimal-control problems, x(·) is the state variable, y(·) is the control variable, and the program imposes a non-negativity constraint on the control. Let λ(·) be the co-state variable. The Hamiltonian for this problem is, then,
\[
H\bigl(x(\cdot), y(\cdot), \lambda(\cdot), \theta\bigr) = \Omega\bigl(x(\theta),\theta\bigr) f(\theta) + \lambda(\theta) y(\theta)\,.
\]
The necessary first-order conditions for the program (9.17) are
• λ(·) is differentiable and non-positive;
• λ(θL) = λ(θH) = 0;
• y(θ) ∈ argmax_{y≥0} Ω(x(θ), θ)f(θ) + λ(θ)y; and
• λ′(θ) = −∂H/∂x = −f(θ) ∂Ω(x(θ), θ)/∂x.
The third condition is straightforward: if λ(θ) < 0, then y(θ) must equal zero; which, in turn, means that x(·) is a constant. Similarly, if λ(θ) = 0 for some interval, then λ′(θ) = 0 and, thus, x(θ) equals the pointwise optimum. We can, thus, conclude:
9 For a general treatment of optimal-control problems see, e.g., Kamien and Schwartz (1981).
Proposition 9.6 Under the standard framework, including the Spence-Mirrlees condition, suppose that Ω(·, θ) is concave for all θ. Then a contract is optimal for the principal if and only if it satisfies the following: The allocation schedule, x∗(·), is continuous, bounded, and for almost every θ either
• maximizes Ω(x, θ); or
• is a constant, equal to xi over some interval (θ̲i, θ̄i) such that, for all θ′ ∈ (θ̲i, θ̄i),
\[
\int_{\theta'}^{\bar\theta_i} \frac{\partial \Omega(x_i,\theta)}{\partial x}\, f(\theta)\,d\theta \le 0
\quad\text{and}\quad
\int_{\underline\theta_i}^{\bar\theta_i} \frac{\partial \Omega(x_i,\theta)}{\partial x}\, f(\theta)\,d\theta = 0
\]
and xi = x∗(θ̲i) = x∗(θ̄i).

We say there is bunching on any interval over which the allocation is constant (e.g., an interval such as (θ̲i, θ̄i)).
As an example, suppose that
\[
w(x,\theta) = \underbrace{\frac{\theta^3(2\theta-3)^2}{4(1+\theta)^2}\,\log(x)}_{=\,b(x,\theta)} - \underbrace{\frac{x^2}{\theta+1}}_{=\,-u(x,\theta)}\,.
\]
Assume θ is distributed uniformly on [0, 1]. Observe the agent's no-trade utility is zero (i.e., UR = 0). We have
\[
\Omega(x,\theta) = \frac{\theta^3(2\theta-3)^2}{4(1+\theta)^2}\,\log(x) - \frac{x^2}{\theta+1} + (1-\theta)\frac{x^2}{(\theta+1)^2}\,.
\]
Pointwise optimization yields
\[
x(\theta) = \frac{1}{4}\sqrt{9\theta^2 - 12\theta^3 + 4\theta^4}\,,
\]
which is not non-decreasing everywhere on [0, 1] (see Figure 9.2).
Let x∗(·) denote the solution to (9.17). From Figure 9.2, it is clear that x∗(θ) = x(θ) for all θ ∈ [θL, θ̄] and equals x(θ̄) for all θ ∈ [θ̄, θH]. The question is: what is θ̄? We want to choose θ̄ to maximize (9.17); that is, to solve
\[
\max_{\bar\theta}\ \int_{\theta_L}^{\bar\theta} \Omega\bigl(x(\theta),\theta\bigr) f(\theta)\,d\theta + \int_{\bar\theta}^{\theta_H} \Omega\bigl(x(\bar\theta),\theta\bigr) f(\theta)\,d\theta\,.
\]

The first-order condition is
\[
\int_{\bar\theta}^{\theta_H} \frac{\partial \Omega\bigl(x(\bar\theta),\theta\bigr)}{\partial x}\, f(\theta)\,d\theta = 0\,. \qquad (9.18)
\]
Equation (9.18) is a non-algebraic equation (there is a log(θ + 1) term). Solving it numerically yields θ̄ ≈ .625477. Hence, the overall solution to (9.17) is
\[
x^*(\theta) = \begin{cases} \frac{1}{4}\sqrt{9\theta^2 - 12\theta^3 + 4\theta^4}\,, & \text{if } \theta \le .625477 \\ .273497\,, & \text{if } \theta > .625477 \end{cases}\,.
\]
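The numerical step, solving (9.18) for the bunching threshold, can be reproduced with bisection and Simpson's rule; the script below is our own sketch, with f ≡ 1 because θ is uniform on [0, 1]:

```python
# Solve (9.18) numerically: with x̄(θ̄) = θ̄(3 - 2θ̄)/4 the pointwise maximizer,
# find θ̄ such that ∫_θ̄^1 ∂Ω(x̄(θ̄), θ)/∂x dθ = 0.

def dOmega_dx(x, t):
    """∂Ω/∂x for Ω = A(θ) log x - x²/(θ+1) + (1-θ)x²/(θ+1)²."""
    A = t**3 * (2 * t - 3) ** 2 / (4 * (1 + t) ** 2)
    return A / x - 2 * x / (t + 1) + 2 * (1 - t) * x / (t + 1) ** 2

def G(theta_bar, n=2000):
    """Left-hand side of (9.18), by Simpson's rule (n even)."""
    x_bar = theta_bar * (3 - 2 * theta_bar) / 4
    h = (1.0 - theta_bar) / n
    total = dOmega_dx(x_bar, theta_bar) + dOmega_dx(x_bar, 1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * dOmega_dx(x_bar, theta_bar + i * h)
    return total * h / 3

lo, hi = 0.5, 0.9                      # bracket: G(0.5) > 0 > G(0.9)
for _ in range(50):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if G(mid) > 0 else (lo, mid)
theta_bar = (lo + hi) / 2
x_bunch = theta_bar * (3 - 2 * theta_bar) / 4   # the bunched allocation
print(round(theta_bar, 6), round(x_bunch, 6))
```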
Figure 9.2: Illustration of Bunching. The thick blue curve is x(θ) = argmax_x Ω(x, θ). The thin red line is the actual solution, x∗(θ), which corresponds with x(θ) for θ ≤ θ̄ and exhibits bunching for θ > θ̄. Horizontal and vertical scales differ.

We can calculate the transfer schedule from (9.8):
\[
t(\theta) = \frac{x^*(\theta)^2}{\theta+1} + \int_{\theta_L}^{\theta} \frac{x^*(z)^2}{(z+1)^2}\,dz\,.
\]
Observe, for θ > .625477 = θ̄, the integral equals
\[
\int_{\theta_L}^{\bar\theta} \frac{x^*(z)^2}{(z+1)^2}\,dz + \int_{\bar\theta}^{\theta} \frac{x^*(\bar\theta)^2}{(z+1)^2}\,dz
= \int_{\theta_L}^{\bar\theta} \frac{x^*(z)^2}{(z+1)^2}\,dz - x^*(\bar\theta)^2\left(\frac{1}{\theta+1} - \frac{1}{\bar\theta+1}\right).
\]

Hence, we see that for θ and θ′, both greater than θ̄, we have t(θ) = t(θ′) = t(θ̄). Of course, this must be so if the mechanism is to be incentive compatible given that x(θ) = x(θ′) = x(θ̄): we couldn't expect truthful revelation if different types got different payments for the same allocation.
Lecture Note 10: Mechanism Design with Multiple Agents
We now consider the problem of a principal facing an allocation problem involv-
ing some number of agents, N . Let X denote the set of possible allocations.
Examples of allocation problems are:

• The allocation of an indivisible good, in which case X is the N-dimensional unit simplex (∆N) and a particular choice from X is the vector of probabilities with which the agents obtain the good. For example, if N = 3, a particular choice could be (1/3, 1/2, 1/6): agent 1 gets the good with probability 1/3, agent 2 with probability 1/2, and agent 3 with probability 1/6.

• The amount of a common (public or non-rivalrous) good, in which case


X ⊆ R+ . For example, x ∈ X could be the amount of overall corporate
brand advertising for a firm with N product lines.

• How much each worker is expected to produce, in which case X ⊆ R_+^N.

The principal can also arrange transfers among the N agents and, possibly, to or from herself. If tn is the transfer to agent n, the amount transferred from the principal is
\[
\sum_{n=1}^{N} t_n
\]
if that sum is non-negative. If it is negative, then the agents are transferring, on net, to the principal. Given this accounting, we can view the principal as setting a transfer vector, t = (t1, . . . , tN), from a set of feasible transfer vectors, T, where T = T1 × · · · × TN ⊆ R^N. Observe Tn ⊆ R.
Let Un (x, tn |θn ) denote agent n’s utility when the allocation from X is x,
his transfer is tn , and his type (private information) is θn . Assume θn is drawn
from the set Θn . Observe, the agent is assumed to care about his own transfer
only and not the transfers to others. It is assumed that all parties know the
function Un : X × Tn × Θn → R. What they don’t know—except for agent
n—is agent n’s type. It is, however, common knowledge that the θn are drawn
from the Θn according to a joint distribution function F . As examples:

• An agent’s type is the utility/value he assigns obtaining an indivisible


good. If his utility is additively separable in the good and transfer, then
Un (x, tn |θn ) = xn θn +tn , where xn is his probability of receiving the good.


• An agent’s type is the marginal return to the product line he manages


from x dollars of overall corporate advertising. If his utility is an affine
transformation of the line’s, then Un (x, tn |θn ) = xθn + tn .

• An agent’s type is a parameter in his disutility of effort function, cn (xn , θn ),


where xn is the amount he must produce. If his utility is additively sep-
arable in income and disutility of effort, then Un (x, tn |θn ) = un (tn ) −
cn (xn , θn ).

The principal’s utility is denoted B(x, t, θ), where θ = (θ1 , . . . , θN ). In


most situations, the principal does not directly care how transfers are allocated
across agents—she P lacks distributional motives—rather, she cares solely about
N
her total transfers, n=1 tn . In other words, in most settings

B(x, t, θ) = B(x, t′ , θ)

if
N
X N
X
tn = t′n .
n=1 n=1

The Full-Information Benchmark

If the principal were able to operate under full information, then her problem would be
\[
\max_{x\in\mathcal{X},\,t\in\mathcal{T}} B(x, t, \theta) \qquad (10.1)
\]
subject to the N participation constraints:
\[
U_n(x, t_n|\theta_n) \ge U_n^R(\theta_n)\,, \qquad (10.2)
\]
n = 1, . . . , N, where U_n^R(θn) is agent n's reservation utility when he is type θn. If we assume the principal lacks distributional motives and if we further assume, reasonably, that she weakly prefers to transfer less to the agents rather than more, then (10.2) will bind for all n. Assume the program (10.1) has a solution; let xfi(θ) denote the allocation portion of that solution and tfi(θ) denote the corresponding vector of transfers.

10.1 Mechanisms
The set of possible outcomes is X × T . It is conceivable, but rare in the lit-
erature, that the principal would want to choose a randomization over the set
of outcomes. Let ∆(X × T ) denote the set of all such randomizations over
outcomes. We can now define a mechanism:
Mechanism: A mechanism for N agents is a game form, ⟨M1, . . . , MN, P, σ⟩, to be played by the N agents and the principal. The set Mn denotes the strategy set of agent n, the set P denotes the strategy set of the principal, and σ maps an (N + 1)-tuple of strategies to a probability distribution over outcomes (i.e., σ : M1 × · · · × MN × P → ∆(X × T)).

It is important to understand that the principal chooses this game form; in


particular, she decides what M1 , . . . , MN , P, and σ are.
Let M = M1 × · · · × MN. Denote an element of M as m. As we did in Lecture Note 8, we will "cheat" somewhat on notation and write Un(σ(m, p)|θn) rather than the more technically correct
\[
\mathbb{E}_{\sigma(m,p)}\bigl[ U_n(x, t_n|\theta_n) \bigr]\,.
\]
Of course, when σ is a deterministic mapping (never puts weight on more than a single element of X × T), there is no problem writing Un(σ(m, p)|θn).

Define Θ = Θ1 × · · · × ΘN.
Let θ = (θ1 , . . . , θN ) denote an element of Θ. Let F : Θ → [0, 1] be the joint
distribution over types. We have independent types when

F (θ) = F1 (θ1 ) × · · · × FN (θN ) (10.3)

for all θ ∈ Θ, where Fn : Θn → [0, 1]. If (10.3) doesn’t hold, then we have
dependent types. In this latter case, it is possible that agent n’s knowledge of
his own type provides information about the types of the other agents. When
types are independent, a given agent’s knowledge of his own type provides no
information about the types of the other agents. For the case of dependent
types, let F−n (θ −n |θn ) denote agent n’s beliefs (conditional distribution) over
the types of the other agents given his knowledge of his own type. When types
are independent,

F−n (θ −n |θn ) = F1 (θ1 ) × · · · × Fn−1 (θn−1 ) × Fn+1 (θn+1 ) × · · · × FN (θN ) .

As we did in Lecture Note 8, we define a direct mechanism as a mechanism in which Mn = Θn for all n. That is, in a direct mechanism, each agent is limited to making announcements about his type.

A direct-revelation mechanism is a direct mechanism for which it is an equilibrium strategy for each agent to truthfully announce his type. In other words, if mn : Θn → Θn is the agent's strategy, then we have a direct-revelation mechanism if, in equilibrium, mn(θn) = θn for all n and all θn ∈ Θn.

In general, strategies (m1(·), . . . , mN(·), p) are consistent with equilibrium if
\[
\int_{\Theta_{-n}} U_n\Bigl(\sigma\bigl(\mathbf{m}(\theta_n, \theta_{-n}), p\bigr)\Big|\theta_n\Bigr)\,dF_{-n}(\theta_{-n}|\theta_n)
\ge \int_{\Theta_{-n}} U_n\Bigl(\sigma\bigl(m_n, \mathbf{m}_{-n}(\theta_{-n}), p\bigr)\Big|\theta_n\Bigr)\,dF_{-n}(\theta_{-n}|\theta_n) \qquad (10.4)
\]
for all n, mn ∈ Mn, and θn ∈ Θn, where
\[
\mathbf{m}(\theta) = \bigl(m_1(\theta_1), \ldots, m_N(\theta_N)\bigr) \quad\text{and}\quad
\mathbf{m}_{-n}(\theta_{-n}) = \bigl(m_1(\theta_1), \ldots, m_{n-1}(\theta_{n-1}), m_{n+1}(\theta_{n+1}), \ldots, m_N(\theta_N)\bigr)\,.
\]

The Revelation Principle

For mechanism design to be tractable, we need to be able to work with well-


defined strategy spaces. As we did in Lecture Note 8, we can accomplish this
by invoking the revelation principle.

Proposition 10.1 (The Revelation Principle) For any general mechanism ⟨M1, . . . , MN, P, σ⟩ and associated equilibrium, there exists a direct-revelation mechanism such that the associated truthful equilibrium generates the same distribution over outcomes in equilibrium as the general mechanism.

Proof: Let (m(·), p) be an equilibrium. Construct the mechanism σ̂(·) = σ(m(·), p). Our claim is that σ̂(·) induces truth-telling (is a direct-revelation mechanism). To see this, suppose it were not true. Then there must exist an agent n and a type θn such that agent n does better to lie (announce some θ′ ≠ θn) rather than tell the truth when he is type θn. Formally, there is an agent n, a θn ∈ Θn, and a θ′ ∈ Θn, θn ≠ θ′, such that
\[
\int_{\Theta_{-n}} U_n\bigl(\hat\sigma(\theta', \theta_{-n})|\theta_n\bigr)\,dF_{-n}(\theta_{-n}|\theta_n)
> \int_{\Theta_{-n}} U_n\bigl(\hat\sigma(\theta_n, \theta_{-n})|\theta_n\bigr)\,dF_{-n}(\theta_{-n}|\theta_n)\,.
\]
Using the definition of σ̂(·), this, however, means that
\[
\int_{\Theta_{-n}} U_n\Bigl(\sigma\bigl(m_n(\theta'), \mathbf{m}_{-n}(\theta_{-n}), p\bigr)\Big|\theta_n\Bigr)\,dF_{-n}(\theta_{-n}|\theta_n)
> \int_{\Theta_{-n}} U_n\Bigl(\sigma\bigl(\mathbf{m}(\theta_n, \theta_{-n}), p\bigr)\Big|\theta_n\Bigr)\,dF_{-n}(\theta_{-n}|\theta_n)\,.
\]
But as this contradicts (10.4), which holds by assumption, our supposition that such an n, θn, and θ′ exist must be false. It follows, reductio ad absurdum, that σ̂ induces truth-telling. Moreover, because σ̂(θ) = σ(m(θ), p) for all θ ∈ Θ, the same distribution over outcomes is implemented in equilibrium. ∎

The intuition is the same as in Lecture Note 8—see the discussion following
Proposition 8.1 on page 93.

10.2 Independent Allocations


As the examples with which we began the chapter show, there are a variety
of mechanism-design problems that can arise with multiple agents. In some,
the allocation assigned each agent can be made independently of the allocation
assigned another. For example, the production quota, xn , assigned one agent
can be wholly independent of the quota, xm , assigned another agent. In other
problems, there are cross-agent dependencies. For instance, allocating an indi-
visible good to one agent necessarily precludes allocating it to another. Or it
could be impossible to differentiate among agents with respect to the benefits
they accrue from a public good. We will refer to the first kind of problems
as independent-allocation problems. The second kind are dependent-allocation
problems. In this section, we consider the first kind.

Independent Types

If types are independent, as defined by expression (10.3) above, and if allocation


is also independent, then the problem with N agents is simply the problem with
an individual agent repeated N times. Hence, the solution for each agent will
be as given in Chapter 8.

Dependent Types

If types are dependent, then knowledge of one agent’s type can convey informa-
tion about another agent’s type. That, in turn, can allow the principal to reduce
the information rent the other agent receives, which is to her benefit. This sug-
gests that when the agents’ types are dependent (correlated), the principal can
do better than she would if she treated the problem as simply repeating the
single-agent problem.
To begin, let’s consider the case of perfect correlation: suppose there is some
state of the world, ω, drawn from Ω ⊆ R. Assume ω ∼ Ψ : Ω → [0, 1]. Assume
for each agent n there is a strictly monotonic mapping gn : Ω → Θn . Because
the mapping is strictly monotonic, it is invertible; that is, each agent n is able to
infer ω from his realization of θn . Consequently, there is no loss of generality in
treating the N agents as all having the same type because we are free to define

\[
u_n(x, t_n|\omega) = U_n\bigl(x, t_n|g_n(\omega)\bigr) \quad\text{and}\quad b(x, \mathbf{t}, \omega) = B\bigl(x, \mathbf{t}, g_1(\omega), \ldots, g_N(\omega)\bigr)
\]
as agent n’s payoff and the principal’s payoff, respectively.
Given this common type space, the principal must know that at least one agent has lied if ω̂m ≠ ω̂n for some pair of agents m and n, where ω̂j denotes the type agent j announces. Provided there is no limit on how severely the principal can punish the agents, it follows that the principal can support an equilibrium of truthtelling simply by imposing severe punishments on all agents if their announcements disagree. Specifically, suppose that, for each agent n, there exists (xn, tn) ∈ Xn × Tn such that
\[
\sup_{\omega\in\Omega} u_n(x_n, t_n, \omega) < \inf_{\omega\in\Omega} u_n^R(\omega)\,, \qquad (10.5)
\]
where u_n^R(ω) is the reservation utility of agent n in state ω.

Proposition 10.2 (Shoot-them-all Mechanism) Assume perfect correlation across agent types and that the principal can sufficiently punish the agents (i.e., expression (10.5) holds). Then there exists a mechanism that has, as an equilibrium, the full-information outcome.

Proof: Let ω̂ = (ω̂1, . . . , ω̂N), where ω̂n is agent n's announcement of the state, ω. Consider the mechanism:
\[
\bigl(\mathbf{x}(\hat{\boldsymbol\omega}), \mathbf{t}(\hat{\boldsymbol\omega})\bigr) =
\begin{cases}
\bigl(\mathbf{x}^{fi}(\hat\omega), \mathbf{t}^{fi}(\hat\omega)\bigr)\,, & \text{if } \hat\omega_m = \hat\omega_n\ \forall m, n \\
\bigl((x_1, \ldots, x_N), (t_1, \ldots, t_N)\bigr)\,, & \text{if } \hat\omega_m \ne \hat\omega_n\ \exists m, n
\end{cases}\,. \qquad (10.6)
\]
Suppose agent n believes that all other agents will always announce truthfully. Then, because
\[
u_n\bigl(x^{fi}(\omega), t_n^{fi}(\omega), \omega\bigr) \ge u_n^R(\omega) > u_n(x_n, t_n, \omega)\,,
\]
it is a best response for agent n to announce truthfully for all ω, where the first inequality follows from the ir constraint (expression (10.2)) and the second inequality from (10.5). This establishes that truthtelling by all agents is an equilibrium induced by the mechanism (10.6). Because, if the agents tell the truth, that mechanism induces the full-information outcome, the result follows. ∎

The mechanism of Proposition 10.2 is known as a "shoot-them-all" mechanism because, essentially, truth telling is induced by the threat to severely punish ("shoot") all the agents if there is evidence that any one of them lied. Because disagreement in their announcements is proof that at least one agent is lying, each agent has a strong incentive to make the same announcement as all other agents: as noted in the proof, the best response to a common announcement is to make that announcement yourself.
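The best-response logic can be illustrated with a toy enumeration; all numbers below (the states, the payoffs, and the punishment level) are our own illustrative choices, not the text's:

```python
# Toy check of the shoot-them-all best-response logic. With the other agents
# reporting the true state, unanimity can only be reached by telling the truth,
# so full_info_payoff is only ever paid at a truthful unanimous announcement here.

N = 3
STATES = [0, 1, 2]
full_info_payoff = {0: 1.0, 1: 2.0, 2: 1.5}   # u_n at the full-information outcome, all ≥ 0
punish_payoff = -100.0                        # below inf u_n^R, as (10.5) requires

def outcome_payoff(my_report, others_reports):
    reports = [my_report] + list(others_reports)
    if all(r == my_report for r in reports):  # unanimous announcements
        return full_info_payoff[my_report]
    return punish_payoff                      # any disagreement: everyone "shot"

for omega in STATES:
    others = [omega] * (N - 1)                # the other agents tell the truth
    assert all(outcome_payoff(omega, others) > outcome_payoff(dev, others)
               for dev in STATES if dev != omega)
print("matching the others' truthful announcement is the best response")
```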
Although the shoot-them-all mechanism works, it is not without its issues. First, suppose that there were a state, ω̄, such that
\[
u_n\bigl(x^{fi}(\bar\omega), t_n^{fi}(\bar\omega), \omega\bigr) \ge u_n\bigl(x^{fi}(\omega), t_n^{fi}(\omega), \omega\bigr) \qquad (10.7)
\]
for all n and all ω ∈ Ω, with strict inequality holding for some n, ω, or both. In other words, the agents all do better, at least weakly, under the outcome corresponding to state ω̄. It is readily seen that another equilibrium given the shoot-them-all mechanism (mechanism (10.6)) is for the agents to always announce ω̄ regardless of the true state.

Exercise 10.2.1: Prove there is an equilibrium under the shoot-them-all mechanism


in which the agents always announce ω̄.
Exercise 10.2.2: Does your proof in the last exercise depend on (10.7) holding?
That is, could there be an equilibrium in which they always announce some ω̄ even if
it doesn’t satisfy (10.7)?
As these exercises show, although truth-telling is an equilibrium of the shoot-


them-all mechanism, it is not the only equilibrium. Moreover, in situations such
as that described by (10.7), one might well expect the agents to seek to play
an equilibrium other than the truth-telling equilibrium. Given that, in many
contexts, it is reasonable to imagine agents have a capacity to collude, we should
be somewhat suspicious of the shoot-them-all mechanism.1
Consider now imperfect correlation. Rather than a fully general analysis, limit attention to two agents (i.e., N = 2); each of whom can have one of two types, L or H (i.e., Θ1 = Θ2 = {L, H}); who have the same utility function, defined by (xn, tn, θn) ↦ tn − c(xn, θn); and who have the same reservation utilities, normalized to zero for both types (i.e., U_n^R(θ) ≡ 0 for both n and both θ).2 An interpretation is that a principal wishes to employ two agents and provide each with an output target (an x). Other than, possibly, their realized types, the agents are identical. Each agent finds meeting the target personally costly; that is, c is a cost function with the usual properties for either θ: c(0, θ) = 0 and x > x′ implies c(x, θ) > c(x′, θ). Assume the high type (θ = H) has a lower marginal cost of production than the low type:

c(x, H) − c(x′ , H) < c(x, L) − c(x′ , L) (10.8)

for all x and x′ such that x > x′ . Assume the principal’s payoff is

x 1 + x 2 − t1 − t2 .

Assume the following information structure: the conditional probabilities are
\[
\Pr\{\theta_n = \theta \,|\, \theta_m = \theta\} = \rho \in \left(\tfrac{1}{2}, 1\right)
\]
for n ≠ m and all θ. Assume a common unconditional probability of the low type: Pr{θn = L} = f. By the definition of conditional probability, observe:3
\[
\Pr\{(H, L)\} = \Pr\{H|L\}\Pr\{L\} = (1-\rho)f
\quad\text{and}\quad
\Pr\{(L, H)\} = \Pr\{L|H\}\Pr\{H\} = (1-\rho)(1-f)\,.
\]
Because Pr{(H, L)} = Pr{(L, H)}, observe that consistency requires f = 1/2.
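The consistency argument can be made concrete with a short enumeration; the joint distribution below is built from ρ and a candidate f (the values ρ = 0.7 and the alternative f = 0.6 are illustrative, not from the text):

```python
# Making the consistency requirement f = 1/2 concrete: build the joint pmf over
# (θ1, θ2) from the agreement probability ρ and a candidate marginal f, then check
# the symmetry Pr{(H, L)} = Pr{(L, H)}.

def joint(rho, f):
    """Joint pmf implied by Pr{θn = L} = f and Pr{θn = θ | θm = θ} = ρ."""
    return {
        ("L", "L"): rho * f,
        ("H", "H"): rho * (1 - f),
        ("H", "L"): (1 - rho) * f,        # Pr{H|L} Pr{L}
        ("L", "H"): (1 - rho) * (1 - f),  # Pr{L|H} Pr{H}
    }

rho = 0.7
p = joint(rho, 0.5)
assert abs(sum(p.values()) - 1.0) < 1e-12    # a proper distribution
assert p[("H", "L")] == p[("L", "H")]        # symmetric: consistency holds
q = joint(rho, 0.6)                          # any f != 1/2 breaks the symmetry
assert q[("H", "L")] != q[("L", "H")]
print("f = 1/2 is the only consistent marginal")
```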

1 In some contexts, we also require that mechanisms be balanced; that is, Σ_{n=1}^N tn ≡ 0 always. Given that the tn are punishments, we should expect Σ_{n=1}^N tn < 0; that is, the shoot-them-all mechanism is unbalanced, at least for some out-of-equilibrium outcomes.
2 The reader interested in more general treatments of mechanism design with correlated

types should consult Crémer and McLean (1985, 1988) and McAfee and Reny (1992).
3 Recall Pr{A|B} ≡ Pr{A ∩ B}/ Pr{B}, so Pr{A ∩ B} = Pr{A|B} Pr{B}.
By the revelation principle, there is no loss in restricting the principal to direct-revelation mechanisms:
\[
\Bigl\langle \bigl(x_1(\theta_1,\theta_2),\, x_2(\theta_2,\theta_1)\bigr),\ \bigl(t_1(\theta_1,\theta_2),\, t_2(\theta_2,\theta_1)\bigr) \Bigr\rangle\,.
\]
Note the convention on the ordering of arguments. Define ¬θ as follows:
\[
\neg\theta = \begin{cases} L\,, & \text{if } \theta = H \\ H\,, & \text{if } \theta = L \end{cases}\,;
\]

that is, ¬θ is the other type to θ. Define
\[
T_n(\hat\theta|\theta) = \begin{cases}
\rho\, t_n(\hat\theta, \hat\theta) + (1-\rho)\, t_n(\hat\theta, \neg\hat\theta)\,, & \text{if } \hat\theta = \theta \\
(1-\rho)\, t_n(\hat\theta, \hat\theta) + \rho\, t_n(\hat\theta, \neg\hat\theta)\,, & \text{if } \hat\theta \ne \theta
\end{cases}\,.
\]
In other words, Tn(θ|θ) is agent n's expected compensation if he truthfully announces that his type is θ and Tn(¬θ|θ) is his expected compensation if he lies when his type is θ; where, observe, he calculates these expectations taking into account the correlation between his type and the other agent's type.
Although the principal may wish to use one agent’s announcement to limit
the information rent enjoyed by another agent—that is, to pay him less—there
is no obvious reason why she should want his output target (i.e., x) to depend
on the output target given the other agent. Moreover, it is clear that the full-
information values of x1 and x2 depend only on the individual agents’ types;
that is,
\[
x_n^{fi}(\theta_n, \theta_m) = \operatorname*{argmax}_{x}\ x - c(x, \theta_n) \implies x_n^{fi}(\theta_n, \theta) = x_n^{fi}(\theta_n, \neg\theta)
\]

for all n and θn. This suggests the following strategy for solving the principal's problem: can she design transfers to (i) implement the full-information allocations (i.e., the xs) and (ii) such that, in expectation, no agent earns an information rent regardless of his type (i.e., such that the ir constraints are binding for both agents regardless of their realized types)? If the answer is yes, then clearly this is the solution because there is no way for the principal to do better than this. If the answer is no, then we have more work to do. Fortunately, as we will see, the answer will be yes.
Given the agents are identical, we can drop the subscript ns in what follows. The answer to the question of the previous paragraph will be "yes" if
\[
\begin{aligned}
T(L|L) - c\bigl(x^{fi}(L), L\bigr) &= 0\,, && \text{(binding ir for low type)} \\
T(H|H) - c\bigl(x^{fi}(H), H\bigr) &= 0\,, && \text{(binding ir for high type)} \\
T(L|L) - c\bigl(x^{fi}(L), L\bigr) &\ge T(H|L) - c\bigl(x^{fi}(H), L\bigr)\,, && \text{(ic for low type)} \\
T(H|H) - c\bigl(x^{fi}(H), H\bigr) &\ge T(L|H) - c\bigl(x^{fi}(L), H\bigr)\,. && \text{(ic for high type)}
\end{aligned}
\]

Observe a solution to this system is T(θ̂|θ) = c(xfi(θ̂), θ). So the problem devolves to whether the following has a solution:
\[
\begin{pmatrix}
\rho & 0 & 1-\rho & 0 \\
0 & 1-\rho & 0 & \rho \\
0 & \rho & 0 & 1-\rho \\
1-\rho & 0 & \rho & 0
\end{pmatrix}
\begin{pmatrix} t(L,L) \\ t(H,L) \\ t(L,H) \\ t(H,H) \end{pmatrix}
=
\begin{pmatrix}
c\bigl(x^{fi}(L), L\bigr) \\ c\bigl(x^{fi}(H), H\bigr) \\ c\bigl(x^{fi}(H), L\bigr) \\ c\bigl(x^{fi}(L), H\bigr)
\end{pmatrix}.
\]
Because the "ρ" matrix is invertible, a solution must exist.4 To conclude:

Proposition 10.3 For the two-agent model considered here, provided there is
any correlation in the agents’ types (i.e., provided ρ > 1/2), there exists a
mechanism that achieves the full-information allocation at (in expectation) the
full-information cost to the principal.
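As a concrete check of Proposition 10.3, the sketch below solves the 4×4 system using the closed-form inverse reported in footnote 4. The primitives are our illustrative choices, not the text's: c(x, L) = x² and c(x, H) = x²/2 (these satisfy (10.8)), so xfi(L) = 1/2 and xfi(H) = 1, with ρ = 0.7.

```python
# Solve for the Proposition 10.3 transfers via the closed-form inverse in footnote 4,
# then verify T(θ̂|θ) = c(xfi(θ̂), θ): ir binds and ic holds with equality for every type.

rho = 0.7
d = 2 * rho - 1
# c[(θ̂, θ)] = c(xfi(θ̂), θ), with c(x,L) = x², c(x,H) = x²/2, xfi(L) = 1/2, xfi(H) = 1:
c = {("L", "L"): 0.25, ("H", "H"): 0.5, ("H", "L"): 1.0, ("L", "H"): 0.125}

# apply the inverse matrix to the vector (c(L,L), c(H,H), c(H,L), c(L,H)):
t_LL = (rho * c[("L", "L")] - (1 - rho) * c[("L", "H")]) / d
t_HL = (-(1 - rho) * c[("H", "H")] + rho * c[("H", "L")]) / d
t_LH = (-(1 - rho) * c[("L", "L")] + rho * c[("L", "H")]) / d
t_HH = (rho * c[("H", "H")] - (1 - rho) * c[("H", "L")]) / d
t = {("L", "L"): t_LL, ("H", "L"): t_HL, ("L", "H"): t_LH, ("H", "H"): t_HH}

def T(hat, true):
    """Expected transfer T(θ̂|θ) when the other agent reports truthfully."""
    other = "L" if hat == "H" else "H"
    if hat == true:
        return rho * t[(hat, hat)] + (1 - rho) * t[(hat, other)]
    return (1 - rho) * t[(hat, hat)] + rho * t[(hat, other)]

for hat in "LH":
    for true in "LH":
        assert abs(T(hat, true) - c[(hat, true)]) < 1e-9
print("full-information cost achieved in expectation")
```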

Exercise 10.2.3: What happens to the mechanism as ρ ↑ 1? Interpret economically.


Exercise 10.2.4: What happens to the mechanism as ρ ↓ 1/2? Interpret economi-
cally.
Exercise 10.2.5: Consider the Proposition 10.3 mechanism. Suppose that, prior to playing the mechanism and prior to learning their types, the agents could converse and agreed to always announce θ = L regardless of their true types. Is such an agreement self-enforcing? What does your answer suggest about the Proposition 10.3 mechanism?

Proposition 10.3 suggests that the principal can do a lot better than the
one-agent model (e.g., the model of Chapter 8) would suggest if she is dealing
with agents with correlated types. At the same time, some caution is warranted:
the lack of robustness of Propositions 10.2 and 10.3 to collusion by the agents
makes these results suspect.

10.3 Dependent Allocations


We now consider dependent allocations. Specifically, in this section we con-
sider the provision of a public good. Another well-studied dependent-allocation
problem—allocation of an indivisible object—is considered in the next chapter.

4 The inverse of the matrix is
\[
\begin{pmatrix}
\frac{\rho}{2\rho-1} & 0 & 0 & -\frac{1-\rho}{2\rho-1} \\
0 & -\frac{1-\rho}{2\rho-1} & \frac{\rho}{2\rho-1} & 0 \\
-\frac{1-\rho}{2\rho-1} & 0 & 0 & \frac{\rho}{2\rho-1} \\
0 & \frac{\rho}{2\rho-1} & -\frac{1-\rho}{2\rho-1} & 0
\end{pmatrix}.
\]
Much of the theory in this section stems from analyzing the problem of a
benevolent social planner who seeks to choose the optimal amount of a public
good. For instance, the social planner could seek to determine how much land
should be set aside for a public park. In an organizational context, the prob-
lem might be a principal who seeks to determine a corporate-level action (e.g.,
investment in information technology, common advertising, etc.) that benefits
many agents within the organization.
To be concrete, suppose the principal wishes to choose a level of a public
good, x ∈ X ⊂ R+ . There are N agents, where each agent n has utility
Un (x, tn , θn ). Following most of the literature, we consider the case in which
each agent’s utility is additively separable between transfer and allocation:

Un (x, tn , θn ) = vn (x, θn ) + tn . (10.9)

Suppose the principal is benevolent insofar as her objective is to maximize


some social-welfare function:

B(x, t, θ) = W(v1(x, θ1), . . . , vN(x, θN)). (10.10)

Observe that the principal is assumed to have no direct preferences concerning


transfers (i.e., t). Following most of the literature, assume that the welfare
function is a weighted “average” of the individual payoffs:
\[
W(v_1, \ldots, v_N) = \sum_{n=1}^{N} \lambda_n v_n\,,
\]

where λn > 0 for all n. Little insight is gained by maintaining the added gener-
ality of the λs varying across agents. For convenience, then, we will normalize
the λs to each be 1. The principal’s objective is to choose x to solve
\[
\max_{x\in\mathcal{X}} \sum_{n=1}^{N} v_n(x, \theta_n)\,. \qquad (10.11)
\]

Assume that a unique solution to (10.11) exists for all θ ∈ Θ and denote that solution as x∗(θ). Because the problem would otherwise be of no interest, assume there exist θ and θ′ ∈ Θ such that x∗(θ) ≠ x∗(θ′).

Solution via Dominant-Strategy Mechanisms

Recall the principal does not know the realization of θ. Hence, she must induce
the agents to announce their types truthfully (by the Revelation Principle, we
are free to limit attention to direct-revelation mechanisms). Consider a mecha-
nism of the following form:
\[
x(\theta) = x^*(\theta) \qquad (10.12)
\]
and
\[
t_n(\theta) = \tau_n + \sum_{j\ne n} v_j\bigl(x(\theta), \theta_j\bigr)\,, \qquad (10.13)
\]
where τn is a constant that could depend on the identity of the agent, but not
his (or others’) type(s). Observe that by selecting τn large enough, the principal
can ensure agent n’s participation; hence, we are free to ignore participation (ir)
constraints in what follows.
Lemma 10.1 Given the mechanism defined by (10.12) and (10.13), it is a dominant strategy for any given agent to announce his type truthfully.

Proof: Substituting for x using (10.12), each agent n faces the program:
\[
\max_{\hat\theta_n\in\Theta_n}\ \tau_n + \sum_{j=1}^{N} v_j\bigl(x^*(\theta_{-n}, \hat\theta_n), \theta_j\bigr)\,. \qquad (10.14)
\]
Let X̃ = x∗(θ−n, Θn); that is, X̃ is the set of xs that agent n can "choose" given the announcements of the other agents. Because x∗(θ) is unique for all θ ∈ Θ, it follows that
\[
\sum_{j=1}^{N} v_j\bigl(x^*(\theta_{-n}, \theta_n), \theta_j\bigr) \ge \max_{x\in\tilde{\mathcal{X}}} \sum_{j=1}^{N} v_j\bigl(x, \theta_j\bigr)\,;
\]
that is, agent n can do no better than to announce truthfully regardless of what his fellow agents announce. Truthtelling is, thus, a dominant strategy, as was to be shown. ∎

Intuitively, given the transfer function (10.13), each agent is induced to face precisely the same optimization program as the principal (i.e., expression (10.14) is the same program as (10.11)). Consequently, he will do as the principal would do.
A mechanism in which the agents always have dominant strategies is known
as a dominant-strategy mechanism. An immediate consequence of Lemma 10.1
is, therefore:
Proposition 10.4 Assume the principal’s objective function is given by (10.11)
and it has a unique solution for each realization of agents’ types. Then there
exists a dominant-strategy direct-revelation mechanism that implements that so-
lution.
The mechanism defined by (10.12) and (10.13) is known as a Groves-Clarke-
Vickrey mechanism, which we will abbreviate gcv.
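The dominant-strategy logic of Lemma 10.1 can be illustrated numerically. The quadratic payoffs vn(x, θn) = θn x − x²/2 below are our own choice (strictly concave, as Condition 10.1 later requires), giving x∗(θ) equal to the mean of the θs:

```python
# Numeric illustration of Lemma 10.1 / the GCV mechanism (10.12)-(10.13):
# under the transfers t_n = τ_n + sum_{j≠n} v_j, each agent's payoff equals τ_n
# plus total welfare, so truthful reporting is optimal whatever the others report.

def v(x, theta):
    return theta * x - x**2 / 2

def x_star(thetas):
    # first-order condition sum(θ_n) - N x = 0
    return sum(thetas) / len(thetas)

def payoff(n, report, reports, true_theta, tau=0.0):
    """Agent n's payoff t_n + v_n when he announces `report`."""
    profile = list(reports)
    profile[n] = report
    x = x_star(profile)
    others = sum(v(x, profile[j]) for j in range(len(profile)) if j != n)
    return tau + others + v(x, true_theta)

true_types = [0.2, 0.9, 0.5]
candidates = [k / 50 for k in range(51)]      # possible misreports on [0, 1]
for n, theta_n in enumerate(true_types):
    truthful = payoff(n, theta_n, true_types, theta_n)
    # no misreport does strictly better than the truth (dominant strategy)
    assert all(truthful >= payoff(n, r, true_types, theta_n) - 1e-12 for r in candidates)
print("truthtelling is optimal for every agent")
```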
One question we might ask about the gcv mechanism is whether it is unique
within the space of dominant-strategy mechanisms that implement x∗ (·). The
answer is, for all practical intents, yes, a fact that we can readily establish by imposing slightly more structure on the problem. To that end, assume:
Condition 10.1 The space of possible allocations, X , is R+ . The type space
for any agent is an interval in R.5 For any agent n, n = 1, . . . , N , and any
type θn ∈ Θn , vn (·, θn ) is strictly concave, twice differentiable, and satisfies the property that there exist x and x′ , x < x′ , such that
$$v_n(x,\theta_n) > \max\bigl\{ v_n(0,\theta_n),\, v_n(x',\theta_n) \bigr\}\,.$$
5 This assumption is, here, without loss of much generality insofar as we do not require full support on these intervals; that is, some types in the interval could have zero probability of occurring.
126 Lecture Note 10: Mechanism Design with Multiple Agents
The last property ensures that there is an interior solution to $\max_{x\in\mathbb{R}_+} v_n(x,\theta_n)$
for all n and θn ∈ Θn . It follows that the principal’s program, expression (10.11),
has an interior solution for all θ ∈ Θ. Because each vn (·, θn ) is strictly concave,
so too is the principal’s program (10.11). It further follows from Condition 10.1
that the program (10.11) has a unique solution for all θ ∈ Θ, x∗ (θ). By the
implicit function theorem (see, e.g., Körner, 2004, §13.2), x∗ (·) is differentiable
in each of its arguments.
The differentiability and concavity assumptions entail that x∗ (θ) is characterized by
$$\sum_{n=1}^{N} \frac{\partial v_n\bigl(x^*(\theta),\theta_n\bigr)}{\partial x} = 0\,.$$
For future reference, observe that this, in turn, entails
$$\frac{\partial v_n\bigl(x^*(\theta),\theta_n\bigr)}{\partial x} = -\sum_{j\neq n} \frac{\partial v_j\bigl(x^*(\theta),\theta_j\bigr)}{\partial x}\,. \tag{10.15}$$

Suppose there exist differentiable transfer functions t1 (·), . . . , tN (·) such that
truthtelling is a dominant strategy for each agent n. This would require that
$$\theta \in \operatorname*{argmax}_{\hat\theta \in \Theta_n}\; t_n(\theta_{-n},\hat\theta) + v_n\bigl(x^*(\theta_{-n},\hat\theta),\theta\bigr) \tag{10.16}$$
for all n, all θ ∈ Θn , and all θ −n ∈ Θ−n . A necessary condition, given our differentiability assumptions, is that
$$\frac{\partial t_n(\theta_{-n},\theta)}{\partial \theta_n} + \frac{\partial v_n\bigl(x^*(\theta_{-n},\theta),\theta\bigr)}{\partial x}\,\frac{\partial x^*(\theta_{-n},\theta)}{\partial \theta_n} = 0 \tag{10.17}$$
for all n, all θ ∈ Θn , and all θ −n ∈ Θ−n . Rearranging and recognizing this
must hold for all θ ∈ Θ, we have the differential equation
$$\frac{\partial t_n(\theta_{-n},\theta)}{\partial \theta_n} = -\frac{\partial v_n\bigl(x^*(\theta_{-n},\theta),\overbrace{\theta}^{*}\bigr)}{\partial x}\,\frac{\partial x^*(\theta_{-n},\theta)}{\partial \theta_n}\,. \tag{10.18}$$
Were it not for the starred term in expression (10.18), this would be a trivial
differential equation to solve—one would simply undo the chain rule. But the
starred term complicates matters. Fortunately we can substitute out for that
partial derivative using expression (10.15). This yields the differential equation:
$$\frac{\partial t_n(\theta)}{\partial \theta_n} = \sum_{j\neq n} \frac{\partial v_j\bigl(x^*(\theta),\theta_j\bigr)}{\partial x}\,\frac{\partial x^*(\theta)}{\partial \theta_n}\,. \tag{10.19}$$

We can reverse the chain rule on (10.19) to obtain the solution:
$$t_n(\theta) = \sum_{j\neq n} v_j\bigl(x^*(\theta),\theta_j\bigr) + \tau_n(\theta_{-n})\,, \tag{10.20}$$
where τn (θ −n ) is a constant of integration with respect to θn . Expression (10.20)


is the formulation for the transfer function to the nth agent in a gcv mechanism.
We have thus shown:
Proposition 10.5 Given Condition 10.1, a necessary condition for a dominant-
strategy mechanism to implement the optimal allocation, x∗ (·), is that it be a
Groves-Clarke-Vickrey ( gcv) mechanism.

Balanced Mechanisms

The Groves-Clarke-Vickrey mechanism is a powerful solution to the principal’s


allocation problem, because it makes truth-telling a dominant strategy for each
agent; hence, agents do not have to be especially sophisticated. In particular,
the agents do not need to engage in any higher-level reasoning about the strategies of their fellow agents, nor do they need to even believe their fellow agents are
rational actors.
The downside of gcv mechanisms is that they tend not to be balanced:
as a rule it is not possible to design a gcv mechanism so that the sum of
transfers across agents is zero for all realizations of types. In other words, it is generically not true that $\sum_{n=1}^{N} t_n(\theta) = 0$ for all θ ∈ Θ. When the principal is a social planner (e.g., a government), this can be a serious limitation: it is unlikely that the planner has funds beyond those available from the citizens (the agents); hence, $\sum_{n=1}^{N} t_n(\theta) \not> 0$. At the same time, it may be impossible for the government to commit to destroy excess funds (“burn money”); hence, $\sum_{n=1}^{N} t_n(\theta) \not< 0$.
Laffont and Maskin (1980) provides a general analysis showing gcv mech-
anisms are generically not balanced. Here, we will just consider an illustrative
example. Suppose that N = 2, vn (x, θn ) = θn x − x2 /2, and Θ1 = Θ2 = (0, θ̄).

Exercise 10.3.1: Verify that x∗ (θ) = (θ1 + θ2 )/2.

Given the exercise and using (10.20), we have that
$$t_n(\theta) = \theta_{-n}\,x^*(\theta) - \frac{x^*(\theta)^2}{2} + \tau_n(\theta_{-n}) = \frac{(3\theta_{-n}-\theta_n)(\theta_n+\theta_{-n})}{8} + \tau_n(\theta_{-n})\,.$$
Summing and simplifying yields
$$t_1(\theta) + t_2(\theta) = \frac{(\theta_1+\theta_2)^2}{4} + \tau_1(\theta_2) + \tau_2(\theta_1)\,. \tag{10.21}$$
If the mechanism were balanced, then (10.21) would equal zero for all θ; that is,
it would be an identity. If identically equal zero, then its derivative with respect
to either argument must also equal zero; hence,
$$0 = \frac{\partial\,\text{expression (10.21)}}{\partial \theta_1} = \frac{\theta_1+\theta_2}{2} + \tau_2'(\theta_1)\,.$$
That expression could be true only if τ2′ —and thus τ2 —is a function of θ2 , but
that contradicts that τ2 is a function of θ1 only. Reductio ad absurdum, it cannot
be that (10.21) is identically equal to zero. To summarize:
Proposition 10.6 For the case of two agents, n = 1 or 2, where agent n’s
utility is θn x − x2 /2 + t and θn ∈ (0, θ̄), no balanced Groves-Clarke-Vickrey
mechanism exists.
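A quick numerical check of this two-agent example (a sketch, setting the constants τ1 = τ2 = 0) confirms both the closed form for tn derived above and that the transfers sum to (θ1 + θ2)²/4 > 0 for every realization, so no choice of constants can balance the budget:

```python
import numpy as np

def x_star(t1, t2):
    # Optimal allocation from Exercise 10.3.1: x* = (theta_1 + theta_2) / 2
    return (t1 + t2) / 2

def gcv_t(theta_own, theta_other):
    # GCV transfer with tau_n = 0:
    # t_n = theta_{-n} x* - x*^2/2 = (3 theta_{-n} - theta_n)(theta_n + theta_{-n}) / 8
    x = x_star(theta_own, theta_other)
    return theta_other * x - x ** 2 / 2

rng = np.random.default_rng(1)
for th1, th2 in rng.uniform(0.0, 1.0, size=(5, 2)):
    total = gcv_t(th1, th2) + gcv_t(th2, th1)
    # Matches expression (10.21) with tau_1 = tau_2 = 0, and is strictly positive
    assert abs(total - (th1 + th2) ** 2 / 4) < 1e-12
    assert total > 0
```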

Bayesian Mechanisms

In light of gcv mechanisms’ shortcomings with respect to budget balance, it is worth considering another class of mechanisms: Bayesian mechanisms. A
is worth considering another class of mechanisms: Bayesian mechanisms. A
Bayesian mechanism is one in which truth-telling is a given agent’s best response
to the truth-telling strategies of his fellow agents. In other words, truth-telling
constitutes a Bayesian-Nash equilibrium of the direct-revelation game induced
by the mechanism.6 The benefit of Bayesian mechanisms, as will be shown,
is that they can always be designed to be budget-balanced. The downside of
Bayesian mechanisms is that they require more of the agents: agents must be
able to engage in higher-level strategic reasoning and they must believe that
their fellow agents are all rational.
Assume that types are independent; that is, the joint distribution over types
is multiplicatively separable as given by expression (10.3).7 Let E−n denote the
expectation operator with respect to the random vector θ −n . In other words,
E−n represents how the principal and agent n each form expectations with
respect to the types of the agents other than agent n.
Suppose we define
$$t_n(\theta) = p_n(\theta_n) - \frac{1}{N-1}\sum_{j\neq n} p_j(\theta_j)\,, \tag{10.22}$$
where the N functions pn : Θn → R are to be determined.


Lemma 10.2 If transfers are given by expression (10.22), then the sum of transfers is identically zero; that is,
$$\sum_{n=1}^{N} t_n(\theta) = 0$$
for all θ ∈ Θ.
6 The adjective “Bayesian” reflects that each agent knows only his type but is uncertain about the types of the other agents.
7 The problem of Bayesian mechanism design with dependent types is much more difficult and won’t be covered in this text.

Proof: We have
$$\sum_{n=1}^{N} t_n(\theta) = \sum_{n=1}^{N} p_n(\theta_n) - \frac{1}{N-1}\sum_{n=1}^{N}\sum_{j\neq n} p_j(\theta_j) = \sum_{n=1}^{N} p_n(\theta_n) - \frac{1}{N-1}\sum_{n=1}^{N}(N-1)\,p_n(\theta_n) = 0\,.$$
Observe that Lemma 10.2 does not depend on any properties of the pn (·)s.
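The balance property is purely mechanical, as this sketch with arbitrary made-up numbers standing in for the values pn(θn) illustrates:

```python
import numpy as np

def transfers(p_values):
    # t_n = p_n(theta_n) - (1/(N-1)) * sum_{j != n} p_j(theta_j), as in (10.22)
    p = np.asarray(p_values, dtype=float)
    return p - (p.sum() - p) / (len(p) - 1)

rng = np.random.default_rng(2)
for _ in range(5):
    p = rng.normal(size=4)                   # arbitrary realized values p_n(theta_n)
    assert abs(transfers(p).sum()) < 1e-12   # transfers always sum to zero
```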
Consider
$$p_n(\theta_n) = E_{-n}\Bigl[\, \sum_{j\neq n} v_j\bigl(x^*(\theta),\theta_j\bigr) \Bigr] + h_n\,; \tag{10.23}$$
that is, up to an additive constant hn , pn (θn ) is the expected sum of all other
agents’ utilities under the optimal allocation x∗ (·), given that agent n is type
θn and all other agents are announcing their types truthfully. Observe the
similarity of the pn (·) functions here and the transfers in a gcv mechanism;
they are different, however, in that pn (θn ) is an expected value whereas the transfers in a gcv mechanism are the actual (realized) values.
Proposition 10.7 Let x∗ : Θ → X denote the optimal allocation (i.e., the
solution to (10.11)). If transfers are given by (10.22), where the pn (·) functions
are given by (10.23), then a mechanism with those transfers and an allocation
rule x(θ) = x∗ (θ) has a Bayesian-Nash equilibrium in which the agents all
announce their types truthfully. In this equilibrium, the allocation rule x∗ (·) is
implemented and transfers are balanced.
Proof: The mechanism is always balanced by Lemma 10.2. Clearly, if truth-
telling is an equilibrium, x∗ (·) is implemented. Hence, the only point to verify is
that truth-telling is an equilibrium. Consider agent n. If he anticipates his fellow
agents will tell the truth, his expected utility, as a function of his announcement,
θ̂, is
$$E_{-n}\Bigl\{ v_n\bigl(x^*(\hat\theta,\theta_{-n}),\theta_n\bigr) + t_n(\hat\theta,\theta_{-n}) \Bigr\}$$
$$= E_{-n}\Bigl\{ v_n\bigl(x^*(\hat\theta,\theta_{-n}),\theta_n\bigr) \Bigr\} + E_{-n}\Bigl\{ \sum_{j\neq n} v_j\bigl(x^*(\hat\theta,\theta_{-n}),\theta_j\bigr) \Bigr\} - E_{-n}\Bigl\{ \overbrace{\frac{\sum_{j\neq n} p_j(\theta_j)}{N-1}}^{k(\theta_{-n})} \Bigr\} + h_n$$
$$= E_{-n}\Bigl\{ \sum_{j=1}^{N} v_j\bigl(x^*(\hat\theta,\theta_{-n}),\theta_j\bigr) \Bigr\} + K\,, \tag{10.24}$$

where the law of iterated expectations was used to derive the second equality and where K is a constant equal to hn minus the expectation of the term k(θ −n ). Agent n

chooses his announcement, θ̂, to maximize (10.24). The agent can do no better
than if he could choose x to maximize
$$\sum_{j=1}^{N} v_j\bigl(x,\theta_j\bigr)$$
for each realization of θ. But he can do precisely this by announcing his type
truthfully given the other agents are being honest. Hence, truth-telling is his
best response to truth-telling, as was to be shown.
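To illustrate (a sketch, not from the text): with two agents, hypothetical quadratic utilities vn(x, θn) = θn x − x²/2, and types uniform on (0, 1), the pn(·) functions of (10.23) can be estimated by Monte Carlo, and a grid search then confirms that a truthful report maximizes an agent’s expected payoff against a truthful opponent (terms constant in the report are dropped):

```python
import numpy as np

rng = np.random.default_rng(3)
others = rng.uniform(0.0, 1.0, size=200_000)    # draws of theta_{-n} ~ U(0, 1)

def v(x, theta):
    return theta * x - x ** 2 / 2

def x_star(report, other):
    # Optimal allocation for the quadratic example: x* = (sum of types) / 2
    return (report + other) / 2

def p_n(report):
    # Monte Carlo estimate of (10.23) with h_n = 0:
    # p_n(theta_hat) = E_{-n}[ v_{-n}( x*(theta_hat, theta_{-n}), theta_{-n} ) ]
    x = x_star(report, others)
    return v(x, others).mean()

def expected_payoff(true_type, report):
    # E_{-n}[ v_n(x*, theta_n) ] + p_n(report); report-independent terms omitted
    x = x_star(report, others)
    return v(x, true_type).mean() + p_n(report)

theta = 0.6
grid = np.linspace(0.0, 1.0, 101)
best = grid[np.argmax([expected_payoff(theta, r) for r in grid])]
assert abs(best - theta) < 0.01     # truthful report maximizes expected payoff
```

Because the same draws are reused for every candidate report, the estimated objective is exactly single-peaked at the true type here, mirroring the argument in the proof.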

Not only are the pn (·) functions sufficient to achieve an efficient solution
(implementation of the first-best allocation), they are also effectively necessary
given Condition 10.1:

Proposition 10.8 Given Condition 10.1, a necessary condition for a balanced-budget Bayesian mechanism with differentiable transfers to implement the optimal allocation, x∗ (·), is that the pn (·) functions be given by (10.23).8

Proof: Agent n solves an optimization program equivalent to
$$\max_{\hat\theta \in \Theta_n}\; E_{-n}\Bigl\{ v_n\bigl(x^*(\hat\theta,\theta_{-n}),\theta_n\bigr) \Bigr\} + p_n(\hat\theta)\,.$$
We require that θ̂ = θn be a solution to that program for all θn ∈ Θn . Given the differentiability of all functions, this means
$$E_{-n}\Bigl\{ \frac{\partial v_n\bigl(x^*(\theta,\theta_{-n}),\theta\bigr)}{\partial x}\,\frac{\partial x^*(\theta,\theta_{-n})}{\partial \theta_n} \Bigr\} + p_n'(\theta) = 0$$
for all θ ∈ Θn .9 Using the same insight that gave us expression (10.19) above,
we can rewrite that first-order condition as
 
$$p_n'(\theta) = E_{-n}\Bigl\{ \sum_{j\neq n} \frac{\partial v_j\bigl(x^*(\theta,\theta_{-n}),\theta_j\bigr)}{\partial x}\,\frac{\partial x^*(\theta,\theta_{-n})}{\partial \theta_n} \Bigr\}\,.$$

8 By imposing conditions on the vn functions, it would be possible to show that the pn (·) functions are absolutely continuous. This, in turn, would allow us to dispense with assuming the pn (·) functions are differentiable. See the proof of Lemma 9.2 for an indication of how that line of reasoning would be pursued.
9 Recall we can pass the differentiation operator through the expectation operator.

Recalling this must hold for all θ ∈ Θn , we can integrate:
$$p_n(\theta) = \int E_{-n}\Bigl\{ \sum_{j\neq n} \frac{\partial v_j\bigl(x^*(\theta,\theta_{-n}),\theta_j\bigr)}{\partial x}\,\frac{\partial x^*(\theta,\theta_{-n})}{\partial \theta_n} \Bigr\}\, d\theta$$
$$= E_{-n}\Bigl\{ \sum_{j\neq n} \int \frac{\partial v_j\bigl(x^*(\theta,\theta_{-n}),\theta_j\bigr)}{\partial x}\,\frac{\partial x^*(\theta,\theta_{-n})}{\partial \theta_n}\, d\theta \Bigr\}$$
$$= E_{-n}\Bigl\{ \sum_{j\neq n} v_j\bigl(x^*(\theta,\theta_{-n}),\theta_j\bigr) \Bigr\} + h_n\,.$$
Because hn is a constant with respect to θ −n , it passes through the expectation operator, so this last expression becomes (10.23).

Bibliographic Note

Bayesian mechanisms are due to d’Aspremont and Gérard-Varet (1979).


Lecture Note 11: Auctions
One means of allocating resources is via auction. In this chapter, we consider
auction design, building on our earlier insights from mechanism design. Given
that auctions are a broad topic worthy of book-length treatment, the discussion
here will only scratch the surface.1 Nevertheless, it should provide a basic
understanding of auctions and serve as a basis for further study.
Auctions come in many shapes and sizes. The “classic” auction—the one
most frequently depicted in movies and on tv—is one in which a particular
and unique item, a work of art say, is sold. But one could also hold a series of
auctions to sell off multiple units of an item one at a time or a single auction
to sell multiple units. In some cases, the bidders’ valuations for the object
being auctioned are independent of each other (e.g., knowing the value you
place on a Picasso tells you nothing about another bidder’s value). In other
cases, valuations are correlated (e.g., what you know about the value of a piece
of commercial real estate provides clues as to what other bidders’ values might
be). Finally, auctions come in many forms. The classic auction is one form, but
others, such as sealed-bid and descending-price auctions, also exist.
We will begin by considering the single auction of a single, indivisible object.
We will also assume, initially, that bidders’ valuations are independent. That
is, in the language of the previous chapter, we have independent types. In the
language of auction theory, we are assuming private values.

11.1 Efficient Allocation
There are N + 1 parties, a seller and N potential bidders. A single item is to
be allocated. As we are considering the private values case, there is no loss of
generality in normalizing the seller’s value for the item to zero. Each bidder has
a type, θn , drawn from the set Θn . Although θn is the bidder’s private infor-
mation, the set Θn is common knowledge. Similarly, the distribution function
over bidder n’s type, Fn : Θn → [0, 1], is common knowledge.
We assume that each bidder’s utility function is
un (χ, p, θn ) = χθn − p ,
where p is a payment made by the bidder and χ = 1 if the bidder obtains the
item and it equals 0 otherwise. A bidder’s type is, thus, his value for the item
1 Some recent books on auctions include Krishna (2002), Klemperer (2004), and Milgrom

(2004).


in question. As such Θn ⊆ R+ ; we will, in fact, take it to be an interval [0, θ̄n ].


A bidder’s utility if he declines to participate in the auction is zero.
The analysis is greatly facilitated if we assume Fn (·) is differentiable every-
where on [0, θ̄n ]. Let fn (·) denote the derivative, the probability density function
over types.
An allocation is an N -dimensional vector x = (x1 , . . . , xN ), where xn is the probability that bidder n is awarded the good. As they are probabilities, xn ≥ 0 for all n and $\sum_{n=1}^{N} x_n \le 1$. Note we are allowing for the possibility of leaving the good in the seller’s hands.
An allocation is welfare maximizing if it awards the good to the bidder with the highest valuation. An allocation rule is a mapping
$$x : \Theta \to \Bigl\{ (x_1,\ldots,x_N) \Bigm| x_n \ge 0\ \forall n \text{ and } \sum_{n=1}^{N} x_n \le 1 \Bigr\} \equiv \Lambda^N.$$

Observe an allocation rule maximizes welfare if
$$x_n(\theta) = 1 \text{ if and only if } \theta_n = \max\theta\,.$$
Given the differentiability of the distribution functions, two bidders have the
same valuation with probability zero; hence, ties can be ignored.
In this section, we will see whether a welfare-maximizing allocation can be
part of a direct-revelation mechanism. A mechanism in this context is ⟨x, p⟩ : Θ → ΛN × RN ; that is, an allocation rule and a payment schedule.
We impose two restrictions. First, every bidder must wish to participate;
that is, regardless of his type, his equilibrium utility must be non-negative. This
is our usual individual rationality constraint and, as such, warrants no special
attention. Second, we require p(θ) ∈ RN + ; that is, the seller never pays a bidder.
This constraint can be thought of as a “realism” constraint: If it were possible
to get paid just to participate in an auction, the auction would be flooded with
bidders and the seller would go bankrupt.
We begin our hunt for a welfare-maximizing mechanism in a somewhat odd
way. Namely, we will limit attention to dominant-strategy mechanisms. Dom-
inant-strategy mechanisms are mechanisms in which an agent finds a truthful
announcement his best response no matter what he believes the other agents’
types are.2 This is an odd start because normally we cannot gain by constraining
the choice set (i.e., the set of mechanisms). On the other hand, if this constraint
fails to bind, in the sense that we find a welfare-maximizing mechanism within
the set of dominant-strategy mechanisms, then imposing the constraint is not
costly. The advantage to imposing this constraint is that it allows us to operate
without taking expected values.

Exercise 11.1.1: Consider an arbitrary mechanism ⟨M1 , . . . , MN , σ⟩. This mechanism has a solution in dominant strategies if, for all n and all θn ∈ Θn , there exists a strategy mn (θn ) ∈ Mn such that
$$U_n\bigl(\sigma(m_n(\theta_n), m_{-n}) \mid \theta_n\bigr) \;\ge\; U_n\bigl(\sigma(m, m_{-n}) \mid \theta_n\bigr)$$
for all m ∈ Mn and all m−n ∈ M1 × · · · × Mn−1 × Mn+1 × · · · × MN . Prove that if a mechanism has a solution in dominant strategies, then there exists a direct-revelation mechanism also solvable in dominant strategies that generates the same distribution over outcomes in equilibrium as the original mechanism.
2 To be technical, we are limiting attention to dominant-strategy direct-revelation mechanisms. However, by the Revelation Principle, we know there is no loss in limiting attention to direct-revelation mechanisms—see Exercise 11.1.1 infra.

Given that we are limiting attention to dominant-strategy mechanisms, we


are free to treat the situation as one in which there is symmetric information
among the bidders—if truth-telling is a best response no matter what beliefs a
bidder has, then it remains a best response if he holds correct beliefs. If truth telling is a dominant strategy, then revealed preference implies:
xn (θn , θ −n )θn − pn (θn , θ −n ) ≥ xn (θn′ , θ −n )θn − pn (θn′ , θ −n )

and

xn (θn′ , θ −n )θn′ − pn (θn′ , θ −n ) ≥ xn (θn , θ −n )θn′ − pn (θn , θ −n )


for all θn , θn′ ∈ Θn and all θ −n ∈ Θ−n . We can combine these two inequalities
to obtain

$$\bigl(x_n(\theta_n,\theta_{-n}) - x_n(\theta_n',\theta_{-n})\bigr)\theta_n \ge p_n(\theta_n,\theta_{-n}) - p_n(\theta_n',\theta_{-n}) \ge \bigl(x_n(\theta_n,\theta_{-n}) - x_n(\theta_n',\theta_{-n})\bigr)\theta_n'\,. \tag{11.1}$$
We are seeking to see whether we can implement a welfare-maximizing allo-
cation rule. Without loss of generality, suppose θn > θn′ . If the rule is welfare-
maximizing, then xn (θn′ , θ −n ) = 1 implies xn (θn , θ −n ) = 1. Expression (11.1)
then implies pn (θn , θ −n ) = pn (θn′ , θ −n ). Similarly, xn (θn , θ −n ) = 0 implies
xn (θn′ , θ −n ) = 0. Expression (11.1) again implies pn (θn , θ −n ) = pn (θn′ , θ −n ).
Because θn , θn′ , and θ −n were arbitrary, we have just proved the following:
Lemma 11.1 If there is a dominant-strategy mechanism that supports the wel-
fare-maximizing allocation rule, then the payment bidder n makes if he obtains
the good can be a function of the other bidders’ types only and, similarly, the
payment he makes if he doesn’t obtain the good can be a function of the other
bidders’ types only. That is, if he obtains (wins) the good, he pays pw (θ −n );
and if he doesn’t (loses), he pays pℓ (θ −n ).
Keeping in mind that we can treat the bidders as if they are playing un-
der symmetric information, bidder n will expect not to obtain the good un-
der a welfare-maximizing allocation rule if θn < max θ −n . His utility will be
−pℓ (θ −n ). He won’t participate if his utility will be negative. Hence, it must
be that pℓ (θ −n ) ≤ 0 for all θ −n ∈ Θ−n . Given our requirement that the seller
never pays a bidder, we have pℓ (θ −n ) ≥ 0 for all θ −n ∈ Θ−n . These two
requirements can be met only if pℓ (θ −n ) = 0 for all θ −n ∈ Θ−n . To summarize:

Lemma 11.2 If there is a dominant-strategy mechanism that supports the wel-


fare-maximizing allocation rule, then the payment any bidder makes if he is not
allocated the good is zero. That is, pℓ (·) ≡ 0.
Welfare maximization and incentive compatibility dictate that

θn − pw (θ −n ) ≥ 0 (11.2)

for all θn ≥ max θ −n and

θn − pw (θ −n ) ≤ 0 (11.3)

for all θn ≤ max θ −n (where the rhs of (11.3) is zero by Lemma 11.2). Condi-
tions (11.2) and (11.3) can be satisfied only if pw (θ −n ) = max θ −n ; that is, only
if the payment made if awarded the good equals the second highest valuation
for the good. To summarize, we have proved:
Lemma 11.3 If a dominant-strategy mechanism exists that implements the
welfare-maximizing allocation rule, then that mechanism requires a payment
only from the bidder awarded the good (i.e., the bidder with the highest valu-
ation) and the payment made by that bidder equals the next highest valuation.
That is, if the allocation rule xn (θ) = 1 if and only if θn = θ [1] is implementable
via a dominant-strategy mechanism, then pn (θ) = θ [2] if θn = θ [1] and equals 0
otherwise.
(The notation θ [m] denotes the mth largest element of θ.)
The last question is to verify that the mechanism defined by
 
$$x_n(\theta) = \begin{cases} 0, & \text{if } \theta_n < \theta^{[1]} \\ 1, & \text{if } \theta_n = \theta^{[1]} \end{cases} \qquad \text{and} \qquad p_n(\theta) = \begin{cases} 0, & \text{if } \theta_n < \theta^{[1]} \\ \theta^{[2]}, & \text{if } \theta_n = \theta^{[1]} \end{cases} \tag{11.4}$$
is incentive compatible and individually rational. Individual rationality follows
immediately because pℓ ≡ 0 and θn ≥ max θ −n if θn = θ [1] . Incentive com-
patibility follows because if bidder n lied and claimed to be θn′ < θn , then his
relative payoffs are:
Condition                  Payoff if He Lies       Payoff if He Tells Truth
max θ −n > θn :            0                   =   0
θn ≥ max θ −n > θn′ :      0                   <   θn − max θ −n
θn′ ≥ max θ −n :           θn − max θ −n       =   θn − max θ −n
In other words, he is never better off lying and is possibly strictly worse off
lying.

Exercise 11.1.2: Verify that bidder n would never wish to lie by claiming to be
θn′′ > θn .

Putting it altogether, we have:



Proposition 11.1 Within the class of dominant-strategy mechanisms, the only


mechanism that implements the welfare-maximizing allocation rule is the mech-
anism that requires a payment only from the bidder awarded the good (i.e., the
bidder with the highest valuation) and the payment made by that bidder equals
the next highest valuation. That is, the only mechanism that implements the
welfare-maximizing allocation rule is given by expression (11.4).
The Proposition 11.1 mechanism is sometimes called a Vickrey mechanism (or
auction) in the literature in light of Vickrey (1961).
Observe that a Vickrey mechanism is the same as a sealed-bid second-price
auction. In such an auction, each bidder submits a single bid in an envelope.
Once everyone has bid, the envelopes are opened. The good being auctioned is
awarded to the highest bidder, but he pays the price of the second highest bidder.
The remaining bidders pay nothing. Our analysis shows that an equilibrium of
such an auction is for each bidder to bid his true valuation for the good; indeed,
such bids are dominant strategies.
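As a numerical illustration of that dominance argument (a sketch, not part of the text), simulate a sealed-bid second-price auction and check that no randomly drawn deviation from truthful bidding ever beats bidding one’s value:

```python
import numpy as np

def second_price_payoff(value, bid, rival_bids):
    # The highest bidder wins and pays the second-highest bid;
    # losers pay nothing (ties occur with probability zero here).
    if bid > max(rival_bids):
        return value - max(rival_bids)
    return 0.0

rng = np.random.default_rng(4)
for _ in range(1000):
    value = rng.uniform(0.0, 1.0)
    rivals = rng.uniform(0.0, 1.0, size=3)
    truthful = second_price_payoff(value, value, rivals)
    deviation = rng.uniform(0.0, 1.0)
    # Bidding one's true value weakly dominates the random deviation
    assert second_price_payoff(value, deviation, rivals) <= truthful + 1e-12
```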
The Vickrey mechanism is also essentially the same as the classic open-cry
ascending-bid auction (also called an English auction). To win, a bidder must
just beat the last bid. If everyone keeps bidding and only drops out when
the price has risen above his valuation, then the penultimate bid more or less
equals the second highest valuation and the bidder with the highest valuation
essentially pays that price (the final bid being only slightly greater). As the bid
increment tends to zero, the English auction becomes equivalent to a sealed-bid
second-price auction.

Exercise 11.1.3: Consider the following variation on an English auction. A display


shows the price, which rises slowly. To stay in, each bidder keeps his finger on a button.
If he lifts his finger from the button, he is out of the bidding. The last person with his
finger on his button is awarded the good and he pays the price shown on the display
at the moment the next-to-last person lifted his finger off his button. Each bidder has
a dominant strategy. What is it? What is the equilibrium?

11.2 Allocation via Bayesian-Nash Mechanisms
The restriction to dominant-strategy mechanisms is fine as long as we achieve
the optimum. Once we fail to achieve the optimum or when we are uncertain
what the optimum is (e.g., what is the maximum expected profit the seller
can make), we are obliged to return to a more general class of mechanisms.
In this section, we return to mechanisms in which the equilibrium concept is
Bayesian-Nash equilibrium.
Maintain all the assumptions of the previous section. To these, add the
assumption that fn (θn ) > 0 for all n and all θn ∈ Θn . This assumption is
sometimes expressed as each bidder’s type space has full support.
Because we have assumed independent values, we have
Pr{θ ≤ θ̂} = F1 (θ̂1 ) × · · · × FN (θ̂N )

(where the inequality in the expression on the lhs is in the vector sense). We
can therefore define the joint probability density function over Θ as

f (θ) = f1 (θ1 ) × · · · × fN (θN ) .

Likewise the joint probability density function over Θ−n is

f−n (θ −n ) = f1 (θ1 ) × · · · × fn−1 (θn−1 ) × fn+1 (θn+1 ) × · · · × fN (θN ) .

Given that bidder n announces his type as θ̂n , then his probability of ob-
taining the good—given truthful revelation by the other N − 1 bidders—is
$$\xi_n(\hat\theta_n) \equiv \int_{\Theta_{-n}} x_n(\hat\theta_n,\theta_{-n})\, f_{-n}(\theta_{-n})\, d\theta_{-n}\,.$$

Likewise, bidder n’s expected payment if he announces his type as θ̂n —given
truthful revelation by the other N − 1 bidders—is
$$\rho_n(\hat\theta_n) \equiv \int_{\Theta_{-n}} p_n(\hat\theta_n,\theta_{-n})\, f_{-n}(\theta_{-n})\, d\theta_{-n}\,.$$

In equilibrium, bidder n will announce his type truthfully, which means his
probability of getting the good is ξn (θn ) and his expected payment is ρn (θn ).
Consequently, his equilibrium expected utility is

vn (θn ) ≡ θn ξn (θn ) − ρn (θn ) . (11.5)

His expected utility if he lies is

θn ξn (θ̂n ) − ρn (θ̂n ) = vn (θ̂n ) + ξn (θ̂n )(θn − θ̂n ) ,

where the rhs follows from (11.5). It follows that the truth-telling or incentive-
compatibility constraint for a type-θn bidder can be written as

vn (θn ) ≥ vn (θ̂n ) + ξn (θ̂n )(θn − θ̂n ) (11.6)

for all θ̂n ∈ Θn .


Expression (11.6) resembles, in form, a first-order Taylor’s expansion. This
insight would be tremendously useful if we knew that vn (·) is a convex function.
Fortunately, as it turns out, vn (·) is a convex function:
Lemma 11.4 If the mechanism is incentive compatible, then a bidder’s equi-
librium utility is convex in his type (i.e., vn (·) as defined in (11.5) is a convex
function).
Proof: For convenience, let’s drop the n subscripts. We seek to show that if θ
and θ′ ∈ Θ, then

$$\lambda v(\theta) + (1-\lambda)v(\theta') \ge v\bigl(\lambda\theta + (1-\lambda)\theta'\bigr) \tag{11.7}$$

for any λ ∈ (0, 1). Define

θλ ≡ λθ + (1 − λ)θ′ .

Incentive compatibility implies that (11.6) holds. Hence,

λv(θ) ≥ λv(θλ ) + λξ(θλ )(θ − θλ )

and

(1 − λ)v(θ′ ) ≥ (1 − λ)v(θλ ) + (1 − λ)ξ(θλ )(θ′ − θλ ) .

Adding these two expressions yields

λv(θ) + (1 − λ)v(θ′ ) ≥ v(θλ ) .

As the rhs of this last expression is the same as the rhs of (11.7), we have
established (11.7) as desired.

A property of convex functions is that they are absolutely continuous, which


in turn means they are differentiable almost everywhere (see, e.g., van Tiel,
1984, p. 5). It follows from Lemma 11.4:
Lemma 11.5 If the mechanism is incentive compatible, then a bidder’s equi-
librium utility is differentiable almost everywhere and, where it is differentiable,
its derivative equals his equilibrium probability of receiving the good under the
mechanism (i.e., vn′ (θn ) = ξn (θn ) almost everywhere).
Proof: That vn′ (·) exists almost everywhere was established in the text. From
(11.6), we have

vn (θn + ε) ≥ vn (θn ) + ξn (θn )ε and


vn (θn − ε) ≥ vn (θn ) − ξn (θn )ε .

Combining these expressions, we have


$$\frac{v_n(\theta_n+\varepsilon) - v_n(\theta_n)}{\varepsilon} \;\ge\; \xi_n(\theta_n) \;\ge\; \frac{v_n(\theta_n) - v_n(\theta_n-\varepsilon)}{\varepsilon}\,. \tag{11.8}$$
For θn at which vn (·) is differentiable, the limit of the leftmost expression and
the rightmost expression must be the same as ε → 0. Consequently, this com-
mon limit is ξn (θn ).

The first derivative of a convex function is a non-decreasing function. Hence,


an immediate corollary of the last lemma is
Corollary 11.1 If the mechanism is incentive compatible, then a bidder’s equi-
librium probability of receiving the good is non-decreasing almost everywhere
(i.e., ξn (·) is non-decreasing almost everywhere).

We can, in fact, use the convexity of vn (·) to show that ξn (·) is non-decreasing
everywhere:

Lemma 11.6 If the mechanism is incentive compatible, then a bidder’s equi-


librium probability of receiving the good is non-decreasing everywhere (i.e., ξn (·)
is non-decreasing everywhere).

Proof: If θn and θn′ are points at which vn (·) is differentiable, the result follows
from the corollary. Therefore, let θn be a point at which vn (·) is not differen-
tiable. Although not differentiable at θn , the right and left derivatives of vn (θn )
exist because vn (·) is convex (the left and right derivatives of a convex function
exist everywhere—see, e.g., van Tiel, 1984, p. 4).3 Where vn (·) is differentiable,
its right and left derivatives are equal and equal the derivative. The left and
right derivatives of a convex function are non-decreasing (van Tiel, 1984, p. 4).
From (11.8), ξn (θn ) lies between the left and right derivatives of vn (θn ). Be-
cause vn (·) is differentiable almost everywhere (i.e., has equal left and right
derivatives almost everywhere), it follows that ξn (θn ) ≤ ξn (θn′′ ) for all θn′′ > θn
and ξn (θn ) ≥ ξn (θn′ ) for all θn′ < θn .

Every absolutely continuous function is the indefinite integral of its derivative


(see, e.g., Yeh, 2006, Theorem 13.17, p. 283). Hence, we can write
$$v_n(\theta_n) = v_n(0) + \int_0^{\theta_n} \xi_n(z)\, dz\,, \tag{11.9}$$

where vn (0) is arbitrary.


To summarize the analysis to this point, we have established:

Proposition 11.2 If the mechanism is incentive compatible, then (i) a bidder’s


probability of receiving the good must be non-decreasing in his announced type
and (ii) expression (11.9) must hold.

Similar to the situation in Chapter 9 (recall, Proposition 9.1 on page 104),


the conditions of Proposition 11.2 are not only necessary, but they are also
sufficient for an incentive-compatible mechanism:

Proposition 11.3 A mechanism in which (i) a bidder’s probability of receiving


the good is non-decreasing in his announced type and (ii) expression (11.9) holds
is incentive compatible; that is, there is a truth-telling equilibrium.

3 The left derivative of a function g, denoted g′− , is defined as
$$g'_-(z) = \lim_{\varepsilon \downarrow 0} \frac{g(z) - g(z-\varepsilon)}{\varepsilon}\,.$$
The right derivative, denoted g′+ , is defined as
$$g'_+(z) = \lim_{\varepsilon \downarrow 0} \frac{g(z+\varepsilon) - g(z)}{\varepsilon}\,.$$

Proof: Suppose all bidders other than n will truthfully announce their types.
We need to show that truth telling is a best response for n. Consider any two
types θn and θn′ ∈ Θn . Suppose bidder n’s type is θn . We need to show he
doesn’t wish to announce θn′ . Suppose, first, θn′ < θn . Given (11.9), we have
$$v_n(\theta_n) - v_n(\theta_n') = \int_{\theta_n'}^{\theta_n} \xi_n(z)\, dz\,.$$

By the intermediate-value theorem, there must exist a θ̃ ∈ (θn′ , θn ) such that


the integral on the rhs is equal to ξn (θ̃)(θn − θn′ ) and, thus, such that

vn (θn ) − vn (θn′ ) = ξn (θ̃)(θn − θn′ ) .

Because ξn (·) is non-decreasing, it follows that

vn (θn ) − vn (θn′ ) ≥ ξn (θn′ )(θn − θn′ ) . (11.10)

Rearranging (11.10) yields (11.6) (with θn′ instead of θ̂n ). But (11.6) implies
bidder n would prefer to announce the truth, θn , than to lie by claiming to be
a lower type (e.g., θn′ ).

Exercise 11.2.1: Complete the proof by considering the case in which θn′ > θn ; that
is, show the bidder doesn’t want to lie by claiming to be a higher type than he is.

We are now in position to prove one of the more important results in auction
theory, the Revenue Equivalence Theorem:

Proposition 11.4 (Revenue Equivalence Theorem) The expected pay-


ments in any two incentive-compatible mechanisms with the same allocation
rule are equal up to a constant.

Proof: In light of Proposition 11.3, we know we can restrict attention to


mechanisms satisfying (11.9). Expanding that expression, we have
$$\underbrace{\theta_n \xi_n(\theta_n) - \rho_n(\theta_n)}_{=\,v_n(\theta_n)} = \underbrace{0 \times \xi_n(0) - \rho_n(0)}_{=\,v_n(0)} + \int_0^{\theta_n} \xi_n(z)\, dz\,.$$
Rearranging, we have
$$\rho_n(\theta_n) = \rho_n(0) + \theta_n \xi_n(\theta_n) - \int_0^{\theta_n} \xi_n(z)\, dz\,. \tag{11.11}$$

Because (11.11) holds for any mechanism that has an allocation rule leading
to ξn (·), it follows that the difference in expected payments between any two
142 Lecture Note 11: Auctions

mechanism is ρ1n (0) − ρ2n (0), a constant (where the superscripts index the mech-
anisms).

Observe one consequence of the revenue equivalence theorem is the following. If the allocation rule is the welfare-maximizing rule, then any mechanism or auction that implements that rule yields the seller the same expected revenue up to an additive constant. Moreover, we often pin that constant down by requiring individual rationality on the part of the bidders (i.e., no one who expects to lose with probability one—a type 0 bidder—expects to pay a positive amount) and by a “realism” constraint that the principal won’t pay sure-to-lose bidders to participate (so type 0 bidders must expect to pay a non-negative amount), in which case we have ρn(0) ≡ 0. We therefore have the following corollary of the Revenue Equivalence Theorem:
Corollary 11.2 Any mechanism or auction that implements the welfare-maxi-
mizing allocation rule (e.g., a sealed-bid second-price auction) that satisfies the
bidders’ individual rationality constraints and does not have the principal sub-
sidizing sure-to-lose bidders yields the seller exactly the same expected revenue
as any other mechanism or auction that does the same and meets the same
requirements.
In other words, the corollary tells us that if the seller wishes to implement the welfare-maximizing allocation, then she can do no better than to employ a sealed-bid, second-price auction.
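As an aside, revenue equivalence is easy to check by simulation in a familiar special case (a sketch of our own, not from the notes): with N i.i.d. uniform valuations, both a sealed-bid second-price auction and a sealed-bid first-price auction, in which the symmetric equilibrium bid is the standard (N − 1)/N times one's valuation, implement the welfare-maximizing allocation with ρn(0) = 0, so their expected revenues should coincide.

```python
import random

def simulate_revenues(n_bidders=3, trials=200_000, seed=0):
    """Monte Carlo check of revenue equivalence for i.i.d. U[0,1] values.

    Second-price: the winner pays the second-highest value.
    First-price: each bidder bids (N-1)/N times his value (the symmetric
    equilibrium for uniform values), so revenue is the highest bid.
    """
    rng = random.Random(seed)
    shade = (n_bidders - 1) / n_bidders
    sp_total = fp_total = 0.0
    for _ in range(trials):
        values = sorted(rng.random() for _ in range(n_bidders))
        sp_total += values[-2]          # second-highest value
        fp_total += shade * values[-1]  # highest equilibrium bid
    return sp_total / trials, sp_total and fp_total / trials

sp, fp = simulate_revenues()
# Both estimates should be near (N - 1)/(N + 1) = 0.5 for N = 3.
```

The two estimates agree to within sampling error, as the corollary predicts.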

The Profit-Maximizing Mechanism

As a general rule, there is no reason to expect that a seller would wish to implement the welfare-maximizing allocation. The seller’s goal is presumably to maximize her expected profit. Let us now tackle the question of how she might accomplish this goal.
Because vn′ (θn ) = ξn (θn ) ≥ 0 (ξn (θn ) is, recall, a probability) and non-
participation by a bidder yields him the same payoff of zero regardless of his
type, it follows that if a type-0 bidder wishes to participate in the mechanism,
then so too must all types higher than he. Consequently, given that his expected
payoff is −ρn (0) if his type is zero (this latter observation is a consequence of
(11.9)), it follows that the individual-rationality constraint for bidder n is

ρn (0) ≤ 0 . (ir)

In other words, if (ir) holds for all n, then all bidders will participate regardless
of their type.
The seller is limited to implementing allocation rules that yield non-decreasing ξn(·) functions and that satisfy (11.9). As seen in the proof of the Revenue Equivalence Theorem, the latter condition is equivalent to requiring

ρn(θn) = ρn(0) + θn ξn(θn) − ∫_{0}^{θn} ξn(z)dz . (11.12)

Given that the seller seeks to maximize her expected profit, it follows from (11.12) that she would like ρn(0) to be as large as possible. Given (ir), it follows, then, that she sets ρn(0) = 0. To summarize:

Lemma 11.7 In the profit-maximizing mechanism, the expected payment from a zero-type bidder is zero; that is, ρn(0) = 0 for all n.
Recall that the seller’s value for the good is zero, so her expected profit is

Σ_{n=1}^{N} ∫_{0}^{θ̄n} ρn(θ)fn(θ)dθ .

Using (11.12) and Lemma 11.7, we can rewrite her expected profit as

Σ_{n=1}^{N} ∫_{0}^{θ̄n} ( θξn(θ) − ∫_{0}^{θ} ξn(z)dz ) fn(θ)dθ .

Observe, via integration by parts, that

∫_{0}^{θ̄n} ( ∫_{0}^{θ} ξn(z)dz ) ( −fn(θ) ) dθ = −∫_{0}^{θ̄n} ( 1 − Fn(θ) ) ξn(θ)dθ .

Hence, we can further rewrite the seller’s expected profit as

Σ_{n=1}^{N} ∫_{0}^{θ̄n} ( θ − (1 − Fn(θ))/fn(θ) ) ξn(θ)fn(θ)dθ . (11.13)

As in our earlier exploration of mechanism design (see, e.g., page 111), a desirable property for the Mills ratio (the multiplicative inverse of the hazard rate) is that it be non-increasing. This it will be if the distribution Fn satisfies the monotone hazard rate property (mhrp):

Condition 11.1 (Monotone Hazard Rate Property) For each bidder, n, his distribution of types, Fn, exhibits a non-decreasing hazard rate; that is,

fn(θ)/(1 − Fn(θ)) (11.14)

is non-decreasing in θ.
Observe Condition 11.1 is just a restatement of Condition 9.3.
Define

Ωn(θ) = θ − (1 − Fn(θ))/fn(θ) .

The function Ωn(·) is known as the virtual valuation function. As we will see, it plays a similar role to the virtual surplus function considered in our earlier exploration of mechanism design. If mhrp holds, then Ωn(·) is a strictly increasing function. As such, it is invertible.
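As a concrete sketch (ours, not the notes’), the virtual valuation has a closed form in the uniform case: for U[a, b], F(θ) = (θ − a)/(b − a) and f(θ) = 1/(b − a), so Ωn(θ) = 2θ − b, which is strictly increasing and trivially invertible. The helper names below are our own.

```python
def make_virtual_valuation(a: float, b: float):
    """Virtual valuation Ω(θ) = θ − (1 − F(θ))/f(θ) for U[a, b].

    For the uniform distribution this reduces to Ω(θ) = 2θ − b, so
    the inverse is (v + b)/2; mhrp holds because the hazard rate
    1/(b − θ) is increasing.
    """
    def omega(theta: float) -> float:
        F = (theta - a) / (b - a)
        f = 1.0 / (b - a)
        return theta - (1.0 - F) / f
    def omega_inv(v: float) -> float:
        return (v + b) / 2.0  # inverse of 2θ − b
    return omega, omega_inv

omega, omega_inv = make_virtual_valuation(0.0, 100.0)
# Ω(θ) = 2θ − 100 on [0, 100]; Ω⁻¹(0) = 50.
```

For the uniform-[0, 100] case, Ω(75) = 50 and Ω⁻¹(0) = 50.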

Substituting the virtual valuation function into the seller’s expected profit, expression (11.13), we can carry out the following series of manipulations:

Σ_{n=1}^{N} ∫_{0}^{θ̄n} Ωn(θn)ξn(θn)fn(θn)dθn
= Σ_{n=1}^{N} ∫_{0}^{θ̄n} Ωn(θn) ( ∫_{Θ−n} xn(θn, θ−n)f−n(θ−n)dθ−n ) fn(θn)dθn
= Σ_{n=1}^{N} ∫_{Θ} Ωn(θn)xn(θ)f(θ)dθ
= ∫_{Θ} ( Σ_{n=1}^{N} Ωn(θn)xn(θ) ) f(θ)dθ . (11.15)

The seller seeks to maximize (11.15) with respect to x(·) subject to the constraint that the resulting ξn(·) functions be non-decreasing (i.e., that the mechanism be incentive compatible). Ignoring, for the moment, that constraint, observe that the way to maximize the expression within the large parentheses in (11.15) is to put all weight on the bidder with the largest virtual valuation provided this largest virtual valuation is non-negative. If no bidder has a non-negative virtual valuation, then the seller keeps the good (i.e., xn(θ) = 0 for all n). Mathematically, the expected-profit-maximizing allocation rule is

xn(θ) = 0 , if Ωn(θn) < max{0, maxj Ωj(θj)} ;
xn(θ) = 1 , if Ωn(θn) = max{0, maxj Ωj(θj)} . (11.16)
Given this maximizes profit pointwise, it must maximize expected profit. The only question is whether this allocation rule is incentive compatible.

To check if the allocation rule given by (11.16) is incentive compatible, consider θn and θn′, θn > θn′. Because Ωn(·) is increasing, we have:

xn(θn′, θ−n) = 1 =⇒ xn(θn, θ−n) = 1 and
xn(θn, θ−n) = 0 =⇒ xn(θn′, θ−n) = 0 ;

hence, xn(θn, θ−n) ≥ xn(θn′, θ−n) and, thus,

ξn(θn) = ∫_{Θ−n} xn(θn, θ−n)f−n(θ−n)dθ−n ≥ ∫_{Θ−n} xn(θn′, θ−n)f−n(θ−n)dθ−n = ξn(θn′) .

So we see that this allocation rule yields non-decreasing ξn (·) functions and is,
therefore, incentive compatible. To conclude, we’ve shown:
Lemma 11.8 The expected-profit-maximizing allocation rule awards the good
to the bidder with the greatest non-negative virtual valuation and has the seller
keep it if no bidder has a non-negative virtual valuation; that is, it is the rule
given by expression (11.16).

Note that the good is awarded on the basis of virtual valuation and not true
valuation.
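The rule in (11.16) transcribes directly into code. Here is a minimal sketch (function names are ours), using the uniform-[0, 100] case, for which the common virtual valuation is Ω(θ) = 2θ − 100:

```python
def profit_maximizing_winner(thetas, omegas):
    """Allocation rule (11.16): award the good to the bidder whose
    virtual valuation is largest, provided that valuation is
    non-negative; return None when every virtual valuation is
    negative (the seller keeps the good)."""
    best_n, best_v = None, 0.0
    for n, (theta, omega) in enumerate(zip(thetas, omegas)):
        v = omega(theta)
        if v >= 0.0 and (best_n is None or v > best_v):
            best_n, best_v = n, v
    return best_n

# Symmetric bidders with U[0, 100] values: Ω(θ) = 2θ − 100, so only
# valuations of at least 50 can win.
omega_u100 = lambda t: 2.0 * t - 100.0
winner = profit_maximizing_winner([30.0, 60.0, 45.0], [omega_u100] * 3)
no_sale = profit_maximizing_winner([30.0, 45.0, 20.0], [omega_u100] * 3)
```

In the first call the virtual valuations are −40, 20, and −10, so the second bidder wins; in the second call all are negative, so the good is not awarded.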

Exercise 11.2.2: Suppose the valuation of each bidder is an independent draw from the uniform distribution on [0, 100]. Prove that the expected-profit-maximizing allocation rule awards the good to the bidder with the highest valuation, but only if that valuation is not less than 50.

Exercise 11.2.3: Suppose there are two bidders. Bidder one has his valuation drawn from the uniform distribution on [0, 20]; bidder two from the uniform distribution on [0, 30]. Verify that the expected-profit-maximizing allocation rule is

x1(θ) = 0 , if θ1 < 10 or θ1 < θ2 − 5 ;  x1(θ) = 1 , if θ1 ≥ 10 and θ1 ≥ θ2 − 5 ;

and

x2(θ) = 0 , if θ2 < 15 or θ2 ≤ θ1 + 5 ;  x2(θ) = 1 , if θ2 ≥ 15 and θ2 > θ1 + 5 .

If θ1 = 18 and θ2 = 20, is the allocation welfare maximizing?

Why is the expected-profit-maximizing allocation rule a function of virtual valuation and not valuation? The answer is the usual one in mechanism design: The mechanism designer—here the seller—faces a tradeoff between the efficiency of the allocation (the size of the pie) and minimizing the information rents enjoyed by high types (her share of the pie). To increase her share, the mechanism designer is willing, on some margins, to reduce the size of the pie.

What about the payment schedules? From (11.12) we have

ρn(θn) = ξn(θn)θn − ∫_{0}^{θn} ξn(z)dz ;

or, expanding ρn(·) and ξn(·),

∫_{Θ−n} pn(θn, θ−n)f−n(θ−n)dθ−n
= θn ∫_{Θ−n} xn(θn, θ−n)f−n(θ−n)dθ−n − ∫_{0}^{θn} ∫_{Θ−n} xn(z, θ−n)f−n(θ−n)dθ−n dz
= ∫_{Θ−n} ( θn xn(θn, θ−n) − ∫_{0}^{θn} xn(z, θ−n)dz ) f−n(θ−n)dθ−n .

It follows that the payment schedule

pn(θn, θ−n) = θn xn(θn, θ−n) − ∫_{0}^{θn} xn(z, θ−n)dz (11.17)

is incentive compatible and achieves the maximum possible revenue given the allocation rule x(·).
allocation rule x(·).
Define

Θwn(θ−n) = { θ ∈ Θn | Ωn(θ) ≥ 0 and Ωn(θ) ≥ max_{j≠n} Ωj(θj) } ;

that is, Θwn(θ−n) is the set of types of bidder n who will be awarded (will win) the good under the allocation rule given by (11.16) if the types of the other bidders are θ−n. Because all inequalities are weak inequalities, Θwn(θ−n) has a minimum element:

θwn(θ−n) ≡ min Θwn(θ−n) , if Θwn(θ−n) ≠ ∅ ;  θwn(θ−n) ≡ ∞ , if Θwn(θ−n) = ∅ .

Because Ωn(·) is increasing, θwn(θ−n) is the lowest type of bidder n who obtains the object when the types of the other bidders are θ−n.

Expression (11.17) reveals that pn(θn, θ−n) = 0 if θn < θwn(θ−n). From (11.17),

pn(θn, θ−n) = θn × 1 − ∫_{θwn(θ−n)}^{θn} 1 dz = θwn(θ−n) (11.18)

if θn ≥ θwn(θ−n). This establishes:

Lemma 11.9 Under the expected-profit-maximizing allocation rule, a bidder not awarded the good pays zero and a bidder, n, awarded the good pays the larger of Ωn⁻¹(0) and Ωn⁻¹(max_{j≠n} Ωj(θj)).

The quantity Ωn⁻¹(0) is the reserve price specific to bidder n.
To summarize our analysis:

Proposition 11.5 Assume for all bidders that their distributions over types satisfy mhrp. Then the expected-profit-maximizing mechanism awards the object to the bidder with the greatest virtual valuation, assuming that virtual valuation is non-negative, and otherwise does not award the object to any bidder. If a bidder is awarded the object, then he pays the seller the larger of the reserve price specific to him or the smallest true valuation he could have and still have the greatest virtual valuation. That is, if bidder n is awarded the object, he pays the larger of Ωn⁻¹(0) and Ωn⁻¹(max_{j≠n} Ωj(θj)).

Exercise 11.2.4: Suppose the valuation of each bidder is an independent draw from the uniform distribution on [0, 100]. Calculate pn(·).

Exercise 11.2.5: Suppose there are two bidders. Bidder one has his valuation drawn from the uniform distribution on [0, 20]; bidder two from the uniform distribution on [0, 30]. Calculate p1(θ1, θ2) and p2(θ1, θ2).

Exercise 11.2.6: Prove that the reserve price specific to a given bidder n, Ωn⁻¹(0), satisfies 0 < Ωn⁻¹(0) < θ̄n.
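When no closed form for Ωn⁻¹(0) is available, the reserve price can be found by bisection, since Ωn is strictly increasing under mhrp. A sketch (ours, not the notes’), using F(θ) = θ² on [0, 1], for which the hazard rate 2θ/(1 − θ²) is increasing and the root is 1/√3 ≈ 0.577:

```python
def reserve_price(F, f, theta_bar, tol=1e-10):
    """Solve Ω(θ) = θ − (1 − F(θ))/f(θ) = 0 by bisection.  Under mhrp,
    Ω is strictly increasing, negative near 0 and equal to θ̄ > 0 at θ̄,
    so the root (the bidder-specific reserve price) is unique."""
    def omega(theta):
        return theta - (1.0 - F(theta)) / f(theta)
    lo, hi = tol, theta_bar   # start just above 0 so f(θ) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if omega(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# F(θ) = θ² on [0, 1] (density 2θ): Ω(θ) = θ − (1 − θ²)/(2θ) = 0
# exactly at θ = 1/√3.
r = reserve_price(lambda t: t * t, lambda t: 2.0 * t, 1.0)
```

The computed r matches the analytic root 1/√3.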

The purpose of the reserve price is to effectively exclude low types of a given bidder from the auction. The seller wishes to do this because it permits her to reduce the information rents of the high types. Because Ωn⁻¹(0) > 0, the use of reserve prices means that the auction cannot be efficient; there are types of bidders that welfare maximization dictates should get the good, as opposed to leaving it in the seller’s hands, but who don’t get the good. This welfare loss is similar to the deadweight loss created when a monopolist engages in linear pricing.
To appreciate this last point, observe that the Proposition 11.5 mechanism is still the solution even if there is only one bidder. In the case of one bidder, we see the bidder gets the good if and only if he bids above the reserve price. If he gets the good, he pays the reserve price. By definition, the reserve price, r, is the solution to

r − (1 − F(r))/f(r) = 0 . (11.19)

Suppose, instead, we treat this as a linear-pricing problem: The seller sets a price r to maximize her expected profit, which is r(1 − F(r)).⁴ The first-order condition is

1 − F(r) − rf(r) = 0 .
1 − F (r) − rf (r) = 0 .

Exercise 11.2.7: Verify that if F (·) satisfies mhrp, then this first-order condition is
sufficient as well as necessary.

It is readily seen that we can rearrange this first-order condition to get (11.19).
In other words, the mechanism with one bidder is equivalent to the profit-
maximizing linear pricing strategy.
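This equivalence is easy to confirm numerically (our sketch, not the notes’): for F(r) = r² on [0, 1], expected profit is r(1 − r²), and a simple grid search returns r* = 1/√3, exactly the value at which the virtual valuation Ω(r) = r − (1 − r²)/(2r) crosses zero, i.e., the reserve price given by (11.19).

```python
def monopoly_price(profit, lo=0.0, hi=1.0, n=100_000):
    """Grid search for the price r maximizing expected profit r·(1 − F(r))."""
    best_r, best_pi = lo, float("-inf")
    for i in range(n + 1):
        r = lo + (hi - lo) * i / n
        pi = profit(r)
        if pi > best_pi:
            best_r, best_pi = r, pi
    return best_r

# With F(r) = r² on [0, 1], expected profit is r(1 − r²), whose
# first-order condition 1 − 3r² = 0 gives r* = 1/√3.
r_star = monopoly_price(lambda r: r * (1.0 - r * r))
```

The grid maximizer agrees with the first-order condition's analytic solution.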

Symmetric Bidders

In many situations it is unreasonable to expect the seller to be in a position to discriminate among the different bidders as called for by Proposition 11.5. Indeed, it is often unreasonable to expect that she has the information necessary to discriminate among the bidders. Rather, from her perspective, the bidders are all the same. Each bidder’s valuation (type) is a draw from a common interval, [0, θ̄], according to a common distribution F : [0, θ̄] → [0, 1] (the draws are still independent).
In such a situation of symmetric bidders, the virtual valuation functions are the same for all bidders. Hence, Ωn(θn) ≥ Ωm(θm) if and only if θn ≥ θm. Consequently, if a bidder gets the good, it must be the bidder with the highest true valuation. Hence, we see

θwn(θ−n) = max{ Ω⁻¹(0), max_{j≠n} θj }

(where Ω(·) is the common virtual valuation function). It follows that if a bidder is awarded the good, he pays the larger of the common reservation price, Ω⁻¹(0), and the valuation of the bidder with the next-highest valuation. To summarize:

Proposition 11.6 Assume the bidders are symmetric and that their common distribution function over types satisfies mhrp. Then the expected-profit-maximizing mechanism is a sealed-bid second-price auction with a common reserve price, Ω⁻¹(0).

It is worth noting that many online auctions, such as those on eBay, are essentially sealed-bid second-price auctions with a reserve price.

⁴The survival function is like the demand curve for this problem. Recall, in fact, that at a basic level there is an essential equivalence between demand curves and survival functions—see discussion on page 12.
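A Monte Carlo sketch (ours, not the notes’) of Proposition 11.6 with two uniform-[0, 1] bidders, for which Ω(θ) = 2θ − 1 and the optimal reserve is 1/2; the expected revenues 5/12 (with the optimal reserve) and 1/3 (without) are the standard values for this example.

```python
import random

def second_price_with_reserve(values, reserve):
    """Revenue of a sealed-bid second-price auction with a reserve:
    the good sells only if the highest value meets the reserve, and
    the winner pays max(reserve, second-highest value)."""
    ordered = sorted(values)
    if ordered[-1] < reserve:
        return 0.0
    return max(reserve, ordered[-2])

def expected_revenue(reserve, n_bidders=2, trials=200_000, seed=1):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        vals = [rng.random() for _ in range(n_bidders)]
        total += second_price_with_reserve(vals, reserve)
    return total / trials

rev_opt = expected_revenue(0.5)   # optimal reserve Ω⁻¹(0) = 1/2
rev_none = expected_revenue(0.0)  # no reserve
```

Imposing the reserve raises expected revenue from roughly 1/3 to roughly 5/12, at the cost of sometimes leaving the good unsold.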

Exercise 11.2.8: Consider two symmetric-bidder auctions. In the first, the common distribution over types is F; in the second it is G. Suppose that F ≥hrd G (i.e., F dominates G in the sense of hazard rate dominance; see Section 11.4). If rF is the reservation price in the first auction and rG is the reservation price in the second auction, what relation must hold between rF and rG?

Exercise 11.2.9: Consider a symmetric-bidder auction. Let the common distribution over types be 1 − exp(−θ²/2). What is the reservation price for this auction? What is the probability that any given bidder has a valuation less than the reservation price? Suppose there are N bidders; what is the probability of deadweight loss (failure to allocate the good to a bidder)? What is that value if N = 6? What happens to that probability as N → ∞? What does this suggest about the practical efficiency of symmetric-bidder auctions?

11.3 Common Value Auctions


We have so far limited attention to private values. In this section, we consider
the other extreme known as common value. In a common-value setting, if
all bidders had the same information, they would all assign the same value
to the good. For example, consider a tract of land that has an oil deposit of
unknown magnitude beneath it. Each potential bidder has access to information
that allows him to estimate how much oil there might be under the tract. If
the bidders pooled their information, then they would each arrive at the same
estimate. Moreover, given they all know the world price of oil, they would all
then place the same value on the tract of land; that is, they have a common
value for the land.
Notice that the assumption of common values has two potentially problem-
atic assumptions embedded in it: first, given the same information, the bidders
all arrive at the same estimate. Although a good approximation in many set-
tings, there are settings in which experts can arrive at different conclusions from
the same pieces of information (consider, e.g., the prevalence of multiple opin-
ions among doctors who examine the same patient). Second, even if the bidders
arrive at a common estimate—essentially, arrive at a common distribution over
the potential amounts of oil—the bidders could value them differently if they
have different attitudes toward risk. For instance, the value you place on the
gamble $0 with probability 2/3 and $1000 with probability 1/3 is likely different
than the value I place on it. Hence, common values requires that either there is
no uncertainty left once all the information is shared or all participants have the
same attitudes toward risk. In the case of major corporations bidding for tracts
of land, the latter is probably reasonable insofar as we typically assume large
corporations are essentially risk neutral. In other cases, however, the assump-
tion of common attitudes toward risk could be suspect. Despite the potential
problems with the common value assumption, it is a useful basis from which to
operate and the models are tractable and yield sensible predictions.
As before, assume there is a single good or object. There is a seller whose
value for the good we normalize to zero. There are N bidders. The common
value of the good is V , where V is a random variable from the perspective of
the bidders; that is, at the time they submit their bids, no bidder knows what
V is. Similar to before, a bidder’s utility is χV − p, where χ = 1 if he gets the
good and χ = 0 otherwise, and p is his payment.
The timing of the game is as follows:
1. Nature determines V and signals θ1, . . . , θN according to the joint distribution F : V × Θ1 × · · · × ΘN → [0, 1], where V is the set of possible values of V and Θn is the set of possible values of θn. The distribution F is common knowledge.
2. Bidder n alone learns θn . This is bidder n’s private information (type).
3. The bidders decide whether to participate in an auction (play the mechanism).
4. The auction is conducted and the good allocated.
Some useful notation is the following:

• Consider a subset of the first N positive integers, K, and let ki denote the ith element of K (e.g., if N = 4, then a possible K = {2, 3}, so k1 = 2 and k2 = 3). Let

θK = (θk1, . . . , θkI) ,

where I is the number of elements in K. Let

ΘK = Θk1 × · · · × ΘkI .

Observe θK ∈ ΘK.

• If y is a vector, let [y] denote the vector formed from the elements of y such that the elements of [y] are ordered from largest to smallest; that is, m < n implies [y]m ≥ [y]n. Observe

[y] = (y[1], . . . , y[M])

if y is an M-dimensional vector (recall y[n] denotes the nth largest element of y).

We assume that each bidder’s signal is drawn from a common space, Θ, so θ ∈ ΘN. We assume the function F(V, ·) is symmetric with respect to the signals; that is, if θ and θ̂ have the same elements, but in different order, F(V, θ) = F(V, θ̂) nonetheless. In the following, we will frequently refer to conditional distributions based on this joint distribution. Observe that conditional distributions inherit symmetry from F. In what follows, all distributions should be assumed to be differentiable.

We want the signals, the θ, to be informative about what the true value, V, is. This has implications for the conditional distributions of V given the signals or some subset thereof. Specifically, we assume that V, θ1, . . . , and θN are all strongly affiliated (for a review of affiliation, see Section 11.5). Among the implications of this assumption are the following. For any given K, if θK and θ′K are elements of ΘK and θK > θ′K, then F(·|θK) ≥fsd F(·|θ′K) strictly (this follows because affiliation implies stochastic dominance—see Section 11.5). It follows, therefore, that

E{V | θK} > E{V | θ′K} (11.20)

(this follows from Proposition 11.11).
The purpose of the next part of the analysis may seem opaque at first, but its usefulness will become clear later. Consider the conditional distribution Ψ(·|ξ) : Z^M → [0, 1]. Assume Ψ(·|ξ) is a symmetric function. Observe the probability that the greatest element in z ∈ Z^M is less than or equal to ζ conditional on ξ is

Ψ(ζ, . . . , ζ | ξ)  (M elements),

which we can write more succinctly as Ψ(ζ1M|ξ), where, recall, 1M is an M-dimensional vector of 1s. It follows that the probability density function of the largest element in z given ξ is

d/dζ Ψ(ζ1M|ξ) = Σ_{m=1}^{M} ∂Ψ(ζ1M|ξ)/∂zm = M ∂Ψ(ζ1M|ξ)/∂z1 ,

where the last equality follows from symmetry. To conclude, then, the probability density function of ζ = max z conditional on ξ is

γ(ζ|ξ) = M ∂Ψ(ζ1M|ξ)/∂z1 . (11.21)

Define v(θn, [θ−n]1) = E{V | θn, [θ−n]1}, where, by definition, [θ−n]1 is the largest element in θ−n. The quantity v(θn, [θ−n]1) is, thus, the expected value of the object conditional on bidder n’s signal and the largest signal observed by any other bidder. From (11.20), v(·, ·) is increasing in each argument.
As a rule, mechanism design with dependent types is difficult and doesn’t lend itself to general analyses such as the one in the previous section. We will, therefore, limit attention to analyzing a particular mechanism, namely a sealed-bid second-price auction. If the mechanism is a sealed-bid second-price auction, then a strategy for a bidder is a mapping from what he knows, namely his type, into a bid. Let Bn : Θ → R denote the strategy of bidder n. Given that the bidders are symmetric, it is reasonable to look for a symmetric equilibrium; hence, the subscript n will be dropped from Bn in what follows.
We are now in position to state and prove our main result:
Proposition 11.7 Assume common value and symmetric bidders. A symmetric equilibrium exists for a sealed-bid second-price auction. In this equilibrium, each bidder n’s equilibrium strategy is B(θn) = v(θn, θn).

Proof: Consider bidder n. We wish to show that the strategy B(·) given in the statement of the proposition is a best response by bidder n if all the other bidders are playing that strategy. Consider a bid, b, by bidder n. Given that v(·, ·) is increasing in each argument, B(·) must be an increasing function. As such, it is invertible. Because the highest bidder gets the good, bidder n is awarded the good provided B⁻¹(b) ≥ [θ−n]1. Let τ = [θ−n]1. From (11.21), if g(τ|θn) is the probability density of τ conditional on θn, then

g(τ|θn) = (N − 1) ∂F(τ1_{N−1}|θn)/∂θ1 .

The expected surplus of bidder n if he wins conditional on the next-highest valuation’s being τ is the difference between v(θn, τ) and the second-highest bid, B(τ). Hence, bidder n’s unconditional expected surplus is

∫_{θ}^{B⁻¹(b)} ( v(θn, τ) − B(τ) ) g(τ|θn)dτ = ∫_{θ}^{B⁻¹(b)} ( v(θn, τ) − v(τ, τ) ) g(τ|θn)dτ , (11.22)

where θ = inf Θ and the equality follows because the other bidders are assumed to be playing B(θ) = v(θ, θ). The first-order condition for maximizing (11.22) with respect to b is equivalent to

v(θn, B⁻¹(b)) − v(B⁻¹(b), B⁻¹(b)) = 0 . (11.23)

Because v(·, ·) is increasing in its first argument, v(θn, t) − v(t, t) is positive for t < θn and negative for t > θn; hence, the first-order condition is satisfied if and only if θn = B⁻¹(b), which is to say b = B(θn), as was to be shown.

Exercise 11.3.1: In the proof of Proposition 11.7, we neglected to check the second-order condition. Verify it is satisfied. (Hint: What is the sign of v(θn, B⁻¹(b)) − v(B⁻¹(b), B⁻¹(b)) if b < B(θn)? If b > B(θn)?)
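As a concrete check of the equilibrium, consider the two-bidder “wallet game” (our choice of specification, not the notes’): V = θ1 + θ2 with independent U[0, 1] signals, so v(θ, θ) = E{V | θ1 = θ, θ2 = θ} = 2θ and the Proposition 11.7 strategy is B(θ) = 2θ. The sketch below holds the opponent to that strategy and verifies by Monte Carlo, with common random draws across candidate bids, that bidding 2θn is a best response; all function names are ours.

```python
import random

def expected_surplus(bid, theta_n, trials=200_000, seed=2):
    """Bidder n's expected surplus in the two-bidder wallet game:
    V = θ1 + θ2, opponent bids B(θ2) = 2θ2.  Bidder n wins when his
    bid exceeds 2θ2 and then pays the second-highest bid, 2θ2.
    Re-seeding each call gives common random numbers across bids."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        theta_2 = rng.random()
        if bid > 2.0 * theta_2:
            total += (theta_n + theta_2) - 2.0 * theta_2
    return total / trials

theta_n = 0.6
bids = [0.8, 1.0, 1.2, 1.4, 1.6]   # candidate bids; B(0.6) = 1.2
best = max(bids, key=lambda b: expected_surplus(b, theta_n))
```

Against the equilibrium opponent, the payoff is single-peaked at b = 2θn, so the grid maximizer is the equilibrium bid 1.2.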

The Winner’s Curse

One fact we might wish to know is how bids in the Proposition 11.7 equilibrium
compare with the bidders’ pre-bid estimates of the good’s value. In other words,
how does E{V |θn } compare to B(θn )? Recall with private values, they were the
same. Is this true with common value?
The answer, as we will demonstrate in a moment, is no. In fact, the bid will
always be less than the bidder’s expected value of the good given his information
(i.e., his signal). This is due to what is known as the winner’s curse: given the
bidding rule of Proposition 11.7, if you win the good, then your signal must have
been better than that of any other bidder. Learning this fact must cause you
to revise downward your estimate of the good’s value. In other words, winning
is “bad news”—like a “curse”—insofar as it causes a downward revision in the
winning bidder’s estimate of the good’s value. The rational bidder takes the
winner’s curse into account; that is, he realizes he wins only when the good is
not worth as much as he thinks it is. In other words, the value of a won object
is less than the expected value of the object absent knowledge of who will win.
Given the auction is sealed-bid second-price, a bidder bids what the good is
worth to him if he wins. Since the good he gets, should he win, is the won
object, his bid is rationally less than the expected value of the good based on
his signal alone.
This intuition is readily seen graphically. Consider Figure 11.1. In the figure, Θ = [0, θ̄] and there are three bidders, n, 1, and 2. If bidder n wins the good, he
knows that the realized pair of signals of the other bidders (θ1 , θ2 ) must fall into
the smaller blue square. Before winning the good, he knows only that this realized pair
of signals falls into the larger square (blue and green regions combined). On
average, signals drawn from the smaller square will be less than signals drawn
from the larger square. Because the true value of the good, V , is strongly
affiliated with those signals, V will be larger on average when the signals are
drawn from the larger square than from the smaller square. Hence, winning—
which indicates the other bidders’ signals were drawn from the smaller square
rather than the larger square—is bad news and the expected value of the good
upon winning is less than it is conditional on the bidder’s signal alone.
To demonstrate the winner’s curse formally, recall that v(θn, θn) is bidder n’s conditional expectation of V conditional on his knowing (i) his signal is θn and (ii) no other bidder drew a signal greater than θn (i.e., the value of the “won object”). Hence,

v(θn, θn) = ∫_{θ}^{θn} · · · ∫_{θ}^{θn} E{V | θn, θ−n} [ f(θ−n|θn)/F(θn1_{N−1}|θn) ] dθ−n , (11.24)

where θ = inf Θ and f(·|θn)/F(θn1_{N−1}|θn) is the probability density function over the other bidders’ signals conditional on knowing (i) θn and (ii) that no other bidder has a signal greater than θn.

Let N−n = {1, . . . , n − 1, n + 1, . . . , N}; that is, N−n is the set of the first N positive integers excluding n. Let Kℓ and Kh be two subsets of N−n such that

Kℓ ∪ Kh = N−n and Kℓ ∩ Kh = ∅ ;

that is, Kℓ and Kh are a partition of N−n. Note Kℓ could be the empty set, in which case Kh = N−n.

[Figure 11.1: Illustration of the Winner’s Curse. Axes θ1 and θ2, each running from 0 to θ̄. The square of signal pairs with both coordinates at or below θn determines v(θn, θn); the full square determines E{V |θn}.]

Because of strong affiliation (i.e., expression (11.20)), we know

E{V | θn, θKℓ, θ′Kh} < E{V | θn, θKℓ, θKh} (11.25)

if θKh > θ′Kh (where θ−n,−m is the (N − 2)-dimensional vector formed from θ by dropping the nth and mth elements). Define
F̄Kℓ,Kh(θn) = ∫_{θn}^{θ̄} · · · ∫_{θn}^{θ̄} ∫_{θ}^{θn} · · · ∫_{θ}^{θn} f(θ−n|θn)dθ−n
(Ih integrals over (θn, θ̄); Iℓ integrals over (θ, θn)),

where θ̄ = sup Θ and Ij is the number of elements in Kj, j ∈ {ℓ, h}. F̄Kℓ,Kh(θn) is the probability, conditional on θn, that a given set of Iℓ bidders other than n have signals less than θn, while a given set of Ih bidders have signals greater than θn. Note, given symmetry, F̄Kℓ,Kh(θn) depends on Kℓ and Kh only through Iℓ and Ih, so it can also be written F̄Iℓ,Ih(θn). Figure 11.2 illustrates.


Observe that N−n can be partitioned into two sets of Iℓ and Ih elements in

C(N − 1, Iℓ) = (N − 1)!/(Iℓ! Ih!)

ways.

[Figure 11.2: Probabilities of the Different Regions of θ−n. The figure assumes three bidders: 1, 2, and n. The square of (θ1, θ2) pairs is divided at θn on each axis; the region with both signals below θn has probability F(θn1_2|θn), the two regions with exactly one signal above θn each have probability F̄1,1(θn), and the region with both signals above θn has probability F̄0,2(θn).]
Consider the following chain:

E{V |θn} = ∫_{θ}^{θ̄} · · · ∫_{θ}^{θ̄} E{V |θn, θ−n}f(θ−n|θn)dθ−n

= ∫_{θ}^{θn} · · · ∫_{θ}^{θn} E{V |θn, θ−n}f(θ−n|θn)dθ−n
+ Σ_{I=1}^{N−1} C(N − 1, I) ∫_{θn}^{θ̄} · · · ∫_{θn}^{θ̄} ∫_{θ}^{θn} · · · ∫_{θ}^{θn} E{V |θn, θ−n}f(θ−n|θn)dθ−n

= F(θn1_{N−1}|θn) ∫_{θ}^{θn} · · · ∫_{θ}^{θn} E{V |θn, θ−n} [ f(θ−n|θn)/F(θn1_{N−1}|θn) ] dθ−n
+ Σ_{I=1}^{N−1} C(N − 1, I) F̄_{N−1−I,I}(θn) ∫_{θn}^{θ̄} · · · ∫_{θn}^{θ̄} ∫_{θ}^{θn} · · · ∫_{θ}^{θn} E{V |θn, θ−n} [ f(θ−n|θn)/F̄_{N−1−I,I}(θn) ] dθ−n ,

where, in each term of the sums, I of the integrals run over (θn, θ̄) and N − 1 − I over (θ, θn).

Observe that each integral expression in the last two lines is an expected value of V conditional on the number of bidders other than n that have signals greater than n’s. Moreover, (11.25) tells us that each such expected value in the last line exceeds the expected value in the penultimate line. The integral expression in the penultimate line is, from (11.24), v(θn, θn). The last two lines—and hence, E{V |θn}—must therefore strictly exceed

F(θn1_{N−1}|θn)v(θn, θn) + Σ_{I=1}^{N−1} C(N − 1, I) F̄_{N−1−I,I}(θn)v(θn, θn) = v(θn, θn) ,

where the equality follows because

1 = F(θn1_{N−1}|θn) + Σ_{I=1}^{N−1} C(N − 1, I) F̄_{N−1−I,I}(θn) .

We have, therefore, verified the winner’s curse:

Proposition 11.8 (Winner’s Curse) Assume common value, symmetric bidders, and a sealed-bid second-price auction. In the symmetric equilibrium, each bidder n’s bid, B(θn), is strictly less than his expected value of the good based on his signal alone (i.e., strictly less than E{V |θn}).
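A numeric illustration of the downward revision upon winning (ours, with an assumed signal structure the notes do not use): let V ~ U[0, 1] and give each of four bidders the signal θi = V + εi, with εi ~ U[−0.25, 0.25] independent across bidders, a standard “mineral rights” specification. Conditioning on winning, i.e., on no rival signal exceeding one’s own, pulls the estimate of V down, exactly the revision described above.

```python
import random

def winners_curse_demo(n_bidders=4, trials=400_000, seed=3,
                       lo=0.58, hi=0.62):
    """Compare the mean of V conditional on bidder 1's signal lying in
    [lo, hi] with the mean conditional, in addition, on no rival
    signal exceeding θ1 (the event of winning under any symmetric
    increasing bidding strategy)."""
    rng = random.Random(seed)
    sum_all = n_all = 0
    sum_win = n_win = 0
    for _ in range(trials):
        v = rng.random()
        signals = [v + rng.uniform(-0.25, 0.25) for _ in range(n_bidders)]
        if lo <= signals[0] <= hi:
            sum_all += v
            n_all += 1
            if max(signals[1:]) <= signals[0]:
                sum_win += v
                n_win += 1
    return sum_all / n_all, sum_win / n_win

e_v_signal, e_v_win = winners_curse_demo()
# e_v_win comes out noticeably below e_v_signal: winning is bad news.
```

With this specification, E{V | θ1 ≈ 0.6} ≈ 0.6 while the win-conditioned mean is roughly 0.45, a substantial downward revision.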

11.4 Appendix to Lecture Note 11: Stochastic Orders
Let X ⊆ R. A distribution F : X → [0, 1] dominates distribution G : X → [0, 1] in the sense of first-order stochastic dominance (fsd) if and only if for all non-decreasing functions γ : X → R we have

∫_{X} γ(x)dF(x) ≥ ∫_{X} γ(x)dG(x) . (11.26)

If (11.26) holds, write F ≥fsd G. The following theorem can be proved:

Theorem 11.1 F ≥fsd G if and only if F(x) ≤ G(x) for all x ∈ X.
We will not, however, prove that here; instead, we will prove a somewhat simpler version:

Proposition 11.9 Suppose that F : X → [0, 1] and G : X → [0, 1] are differentiable distributions and X is a bounded interval. Then F ≥fsd G if and only if F(x) ≤ G(x) for all x ∈ X.
Proof: Let f and g denote the derivatives of (densities associated with) F and G, respectively.

Suppose F(x) ≤ G(x) for all x ∈ X. Let γ : X → R be a non-decreasing function. Because γ is monotone, it is differentiable almost everywhere. Integration by parts yields

∫_{X} γ(x)( f(x) − g(x) )dx = lim_{x→sup X} γ(x)( F(x) − G(x) ) − lim_{x→inf X} γ(x)( F(x) − G(x) ) − ∫_{X} γ′(x)( F(x) − G(x) )dx .

The first and second terms on the rhs are zero because F(x) − G(x) → 1 − 1 = 0 as x → sup X and F(x) − G(x) → 0 − 0 = 0 as x → inf X. The integrand in the third term is everywhere non-positive by assumption. We can thus conclude:

∫_{X} γ(x)f(x)dx ≥ ∫_{X} γ(x)g(x)dx , (11.27)

which is (11.26).

Suppose that (11.26) holds for all non-decreasing γ(·), which is to say (11.27) holds. Suppose

γ(x) = 0 , if x ≤ z ;  γ(x) = 1 , if x > z

for some z ∈ X. Given (11.27), we have 1 − F(z) ≥ 1 − G(z), which implies F(z) ≤ G(z). Because z was arbitrary, we can conclude F(x) ≤ G(x) for all x ∈ X.
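A small numeric illustration of Proposition 11.9 (ours): on [0, 1], F(x) = x² lies below G(x) = x pointwise, so F ≥fsd G and any non-decreasing γ should have a weakly larger expectation under F; we check with γ(x) = √x.

```python
import math

def check_fsd(F, G, grid):
    """Check F(x) <= G(x) on a grid (Theorem 11.1's characterization)."""
    return all(F(x) <= G(x) + 1e-12 for x in grid)

def expectation(density, a, b, gamma, n=10_000):
    """Midpoint-rule approximation of the integral of γ(x)·density(x)
    over [a, b]."""
    h = (b - a) / n
    return sum(gamma(a + (i + 0.5) * h) * density(a + (i + 0.5) * h)
               for i in range(n)) * h

grid = [i / 1000 for i in range(1001)]
dominates = check_fsd(lambda x: x * x, lambda x: x, grid)
ef = expectation(lambda x: 2 * x, 0.0, 1.0, math.sqrt)  # density of F is 2x
eg = expectation(lambda x: 1.0, 0.0, 1.0, math.sqrt)    # density of G is 1
```

Here E_F[√x] = 4/5 exceeds E_G[√x] = 2/3, consistent with (11.26).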

We can define a strict version of first-order stochastic dominance: distribution F : X → [0, 1] strictly dominates distribution G : X → [0, 1] in the sense of first-order stochastic dominance if and only if for all increasing differentiable functions γ : X → R we have

∫_{X} γ(x)dF(x) > ∫_{X} γ(x)dG(x) . (11.28)

Proposition 11.10 Suppose that F : X → [0, 1] and G : X → [0, 1] are differentiable distributions and X is a bounded interval. Then F ≥fsd G strictly if F(x) < G(x) for all x ∈ (inf X, sup X).
Proof: Again, let f and g denote the derivatives of (densities associated with) F and G, respectively.

Suppose F(x) < G(x) for all x ∈ (inf X, sup X). Let γ : X → R be an increasing differentiable function. Integration by parts yields

∫_{X} γ(x)( f(x) − g(x) )dx = lim_{x→sup X} γ(x)( F(x) − G(x) ) − lim_{x→inf X} γ(x)( F(x) − G(x) ) − ∫_{X} γ′(x)( F(x) − G(x) )dx .

The first and second terms on the rhs are zero because F(x) − G(x) → 0 at each endpoint of X. The third term is positive: its integrand, γ′(x)(F(x) − G(x)), is non-positive and strictly negative wherever γ′(x) > 0, which, because γ is increasing, must hold on a set of positive measure. We can thus conclude:

∫_{X} γ(x)f(x)dx > ∫_{X} γ(x)g(x)dx ,

which is (11.28).

An important extension of these results is

Proposition 11.11 If F ⪰fsd G (strictly), then the expectation of x under F weakly (strictly) exceeds the expectation of x under G.

Proof: Follows from the definitions, given that γ(x) = x is an increasing differentiable function.

Another extension is

Proposition 11.12 Let F and G be two differentiable distribution functions with support (x0, x3). Suppose F(x) ≤ G(x) for all x ∈ (x0, x3) and there exists an interval (x1, x2) ⊂ (x0, x3), x1 < x2, such that F(x) < G(x) for all x ∈ (x1, x2). Then the expectation of x under F strictly exceeds the expectation of x under G.

Proof: As before, let f and g be the densities associated with F and G, respectively. We have

  E_F(x) − E_G(x) = Σ_{i=0}^{2} ∫_{x_i}^{x_{i+1}} x (f(x) − g(x)) dx

  = Σ_{i=0}^{2} [ x_{i+1}(F(x_{i+1}) − G(x_{i+1})) − x_i(F(x_i) − G(x_i)) − ∫_{x_i}^{x_{i+1}} (F(x) − G(x)) dx ]

  = Σ_{i=0}^{2} ∫_{x_i}^{x_{i+1}} (G(x) − F(x)) dx ≥ ∫_{x_1}^{x_2} (G(x) − F(x)) dx > 0 ,

where the middle line follows by integration by parts; summed over i, its boundary terms telescope and vanish because F(x0) = G(x0) = 0 and F(x3) = G(x3) = 1.

Let hF and hG be the hazard rates associated with distributions F and G, respectively. Distribution F dominates G in the sense of hazard-rate dominance, denoted F ⪰hrd G, if hF(x) ≤ hG(x) for all x.
Proposition 11.13 F ⪰hrd G implies F ⪰fsd G.

Proof: Observe:

  0 ≥ hF(x) − hG(x) ⇒ 0 ≥ ∫_{−∞}^{x} (hF(z) − hG(z)) dz

  ⇒ ∫_{−∞}^{x} hG(z) dz ≥ ∫_{−∞}^{x} hF(z) dz ⇒ 1 − G(x) ≤ 1 − F(x) ⇒ F(x) ≤ G(x) ,

where the penultimate implication follows from Lemma 1.1. The result then follows from Theorem 11.1.
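As a concrete illustration (my own example, not from the text): exponential distributions have constant hazard rates, so their rate parameters order them by hazard-rate dominance, and Proposition 11.13 then predicts the CDFs are ordered pointwise.

```python
# Hazard-rate dominance for exponential distributions (illustrative sketch).
# For an exponential with rate lam, F(x) = 1 - exp(-lam*x) and the hazard
# rate f(x)/(1 - F(x)) = lam is constant. With lam_F <= lam_G, F hrd-dominates
# G, so Proposition 11.13 predicts F(x) <= G(x) for all x.
import math

lam_F, lam_G = 0.5, 2.0

F = lambda x: 1 - math.exp(-lam_F * x)
G = lambda x: 1 - math.exp(-lam_G * x)

for i in range(1, 200):
    x = 0.05 * i
    assert F(x) <= G(x)   # fsd-dominance, as the proposition predicts
```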

The reverse hazard rate is defined as r(x) = f(x)/F(x). In a mortality context, for example, it is the instantaneous probability of dying at age x conditional on not surviving past age x. Observe

  r(x) = d log F(x)/dx .

Hence, treating that as a differential equation, we see

  F(x) = exp( −∫_x^∞ r(z) dz ) .   (11.29)
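Equation (11.29) can be verified numerically. A sketch using the uniform distribution on [0, 1] (my own example): there r(z) = 1/z on the support and 0 above it, so the integral reduces to ∫_x^1 (1/z) dz = −log x and the right-hand side of (11.29) is exp(log x) = x = F(x).

```python
# Check (11.29): F(x) = exp(-int_x^inf r(z) dz) for the uniform on [0, 1],
# where r(z) = f(z)/F(z) = 1/z on (0, 1] and r(z) = 0 above the support
# (so the integral effectively stops at 1).
import math

def rhs(x, n=20_000):
    # midpoint-rule approximation of exp(-int_x^1 (1/z) dz)
    h = (1 - x) / n
    integral = sum(h / (x + (i + 0.5) * h) for i in range(n))
    return math.exp(-integral)

for x in (0.1, 0.25, 0.5, 0.9):
    assert abs(rhs(x) - x) < 1e-3   # F(x) = x for the uniform
```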

Distribution F dominates G in the sense of reverse hazard-rate dominance, denoted F ⪰rhrd G, if rF(x) ≥ rG(x) for all x.

Proposition 11.14 F ⪰rhrd G implies F ⪰fsd G.

Proof: Observe:

  rF(x) ≥ rG(x) ⇒ −∫_x^∞ rF(z) dz ≤ −∫_x^∞ rG(z) dz ⇒ F(x) ≤ G(x) ,

where the last implication follows from (11.29).

Differentiable distribution F dominates differentiable distribution G in the sense of likelihood-ratio dominance, denoted F ⪰lrd G, if x′ > x implies

  f(x′)/g(x′) ≥ f(x)/g(x) ,   (11.30)

where f and g are the densities associated with F and G, respectively. In words, F ⪰lrd G if high xs are relatively more likely under F than under G.
Proposition 11.15 F ⪰lrd G implies F ⪰hrd G.

Proof: Observe that (11.30) implies

  f(x′)/f(x) ≥ g(x′)/g(x)   (11.31)

if x′ > x. Expression (11.31) in turn implies

  ∫_x^∞ (f(z)/f(x)) dz ≥ ∫_x^∞ (g(z)/g(x)) dz ⇒ (1 − F(x))/f(x) ≥ (1 − G(x))/g(x) .

Because x was arbitrary, we can conclude

  hF(x) = f(x)/(1 − F(x)) ≤ g(x)/(1 − G(x)) = hG(x) .

Exercise 11.4.1: Prove

Proposition 11.16 F ⪰lrd G implies F ⪰rhrd G.
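Propositions 11.15 and 11.13 together give the chain lrd ⇒ hrd ⇒ fsd. A numerical sketch (the distributions are my own example): equal-variance normals that differ in mean are likelihood-ratio ordered, since φ(x − 1)/φ(x) = exp(x − 1/2) is increasing in x.

```python
# lrd => hrd => fsd, illustrated with F = N(1, 1) and G = N(0, 1).
import math

def phi(x, mu):   # unit-variance normal density
    return math.exp(-(x - mu) ** 2 / 2) / math.sqrt(2 * math.pi)

def Phi(x, mu):   # unit-variance normal cdf, via the error function
    return 0.5 * (1 + math.erf((x - mu) / math.sqrt(2)))

prev_ratio = 0.0
for i in range(61):
    x = -3 + 0.1 * i
    ratio = phi(x, 1) / phi(x, 0)           # likelihood ratio: increasing (lrd)
    assert ratio >= prev_ratio
    prev_ratio = ratio
    hF = phi(x, 1) / (1 - Phi(x, 1))        # hazard rates
    hG = phi(x, 0) / (1 - Phi(x, 0))
    assert hF <= hG + 1e-12                 # hrd-dominance
    assert Phi(x, 1) <= Phi(x, 0) + 1e-12   # fsd-dominance: F(x) <= G(x)
```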
[Figure 11.3: Illustration of Join and Meet for M = 2.]

11.5 Appendix to Lecture Note 11: Affiliation
Affiliation is related to the concept of correlation. It is, in a sense, a stronger
version of correlation insofar as affiliation implies correlation, but correlation
does not imply affiliation.
Consider the density function f : R^N → R_+. Let x and x′ be two N-dimensional random vectors. We say that random vectors drawn according to f are affiliated if

  f(x ∨ x′) f(x ∧ x′) ≥ f(x) f(x′) ,   (11.32)

where ∨ and ∧ denote the join and meet, respectively. If y and y′ are two vectors in R^M, then

  y ∨ y′ = ( max{y1, y1′}, …, max{yM, yM′} ) and

  y ∧ y′ = ( min{y1, y1′}, …, min{yM, yM′} ) .

See Figure 11.3.
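A minimal sketch of these definitions (the density below is my own example, not from the text): the unnormalized bivariate density f(y1, y2) = exp(y1·y2) on [0, 1]² is log-supermodular, so draws from it are affiliated and (11.32) should hold at every pair of points.

```python
# Join, meet, and the affiliation inequality (11.32), illustrated with the
# (unnormalized) density f(y1, y2) = exp(y1 * y2) on [0, 1]^2 -- my own
# example. The normalizing constant cancels from both sides of (11.32).
import itertools, math

def join(y, yp):
    return tuple(max(a, b) for a, b in zip(y, yp))

def meet(y, yp):
    return tuple(min(a, b) for a, b in zip(y, yp))

def f(y):
    return math.exp(y[0] * y[1])

grid = [0.1 * i for i in range(11)]
for y in itertools.product(grid, repeat=2):
    for yp in itertools.product(grid, repeat=2):
        assert f(join(y, yp)) * f(meet(y, yp)) >= f(y) * f(yp) - 1e-12
```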


We can think of affiliation as telling us that large values on one dimension tend to get matched with large values on other dimensions, while small values tend to get matched with small ones. That is, in terms of Figure 11.3, there tends to be more probability mass on the line connecting y ∧ y′ and y ∨ y′ than on the line connecting y and y′.
If we take logs of both sides of (11.32), we get

  log f(x ∨ x′) + log f(x ∧ x′) ≥ log f(x) + log f(x′) .   (11.33)

Because expression (11.33) holds for all x and x′, we see that log f(·) is a supermodular function.5 This is sometimes also described as saying that f(·) is log-supermodular. To summarize:

Lemma 11.10 Let f(·) be a density function. Random vectors drawn according to f(·) are affiliated if and only if f(·) is log-supermodular.
We say that random vectors drawn according to f are strongly affiliated if the inequality in (11.32) is strict for x ≠ x ∨ x′ and x ≠ x ∧ x′.

Exercise 11.5.1: A function g : R^K → R is strictly supermodular if

  g(u ∨ u′) + g(u ∧ u′) > g(u) + g(u′)

for all u and u′ in the domain of g such that u ≠ u ∨ u′ and u ≠ u ∧ u′. Prove that if f exhibits strong affiliation, then log f(·) is strictly supermodular.

Relation to Stochastic Dominance and Correlation

Let X and Y be a random vector and variable, respectively, with joint density f(·, ·). If X and Y are affiliated, then for all x ≥ x′ and y ≥ y′ we have from (11.32):

  f(x, y) f(x′, y′) ≥ f(x, y′) f(x′, y) .

Hence,

  f(x, y)/f(x, y′) ≥ f(x′, y)/f(x′, y′) .   (11.34)

Let fX(·) denote the marginal density of X and recall that f(x, y) = f(y|x) fX(x). We can rewrite (11.34) as

  (f(y|x) fX(x))/(f(y′|x) fX(x)) ≥ (f(y|x′) fX(x′))/(f(y′|x′) fX(x′)) ⇒ f(y|x)/f(y′|x) ≥ f(y|x′)/f(y′|x′)

  ⇒ f(y|x)/f(y|x′) ≥ f(y′|x)/f(y′|x′) .   (11.35)

Because y ≥ y′, (11.35) implies that F(·|x) ⪰lrd F(·|x′). Given Proposition 11.15, we then have F(·|x) ⪰hrd F(·|x′). Proposition 11.13, in turn, yields F(·|x) ⪰fsd F(·|x′).
Let X be a one-element vector (a scalar). Then X and Y being affiliated means that if x > x′, then F(·|x) ⪰fsd F(·|x′). First-order stochastic dominance, in turn, implies E{Y|x} ≥ E{Y|x′} (Proposition 11.11). Hence, a regression of Y on X must yield a non-decreasing regression line; that is, X and Y are non-negatively correlated. Formally,
5 For our purposes in this text, a supermodular function is a function g : R^K → R that has the property

  g(u ∨ u′) + g(u ∧ u′) ≥ g(u) + g(u′)

for all u and u′ in the domain of g. For example, the function g(u1, u2) = u1 u2 is supermodular.
Proposition 11.17 If random variables X and Y are affiliated, then they are non-negatively correlated.

Proof:

  cov(X, Y) = ∫_X ∫_Y (x − E{x}) y f(x, y) dy dx

  = ∫_X ∫_Y (x − E{x}) y f(y|x) fX(x) dy dx

  = ∫_X (x − E{x}) E{Y|x} fX(x) dx ≥ 0 ,

where the final inequality holds because x − E{x} and E{Y|x} are both increasing in x, so the last integral is the (non-negative) covariance between X and E{Y|X}.
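Proposition 11.17 can be checked numerically; the following sketch uses the affiliated (log-supermodular) density proportional to exp(x·y) on [0, 1]², which is my own example, and computes the covariance on a grid.

```python
# Numerical check of Proposition 11.17 for the affiliated density
# f(x, y) proportional to exp(x * y) on [0, 1]^2 (my own example).
import math

n = 200
h = 1.0 / n
pts = [(i + 0.5) * h for i in range(n)]

# unnormalized probability weights on the grid
w = {(x, y): math.exp(x * y) for x in pts for y in pts}
total = sum(w.values())

Ex = sum(x * wt for (x, y), wt in w.items()) / total
Ey = sum(y * wt for (x, y), wt in w.items()) / total
Exy = sum(x * y * wt for (x, y), wt in w.items()) / total

cov = Exy - Ex * Ey
assert cov > 0   # affiliation implies non-negative correlation
```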
Hidden Action and Incentives
Purpose
A common economic occurrence is the following: Two parties, principal and
agent, are in a situation—typically of their choosing—in which actions by the
agent impose an externality on the principal. Not surprisingly, the principal
will want to influence the agent’s actions. This influence will often take the
form of a contract that has the principal compensating the agent contingent on
either his actions or the consequences of his actions. Table 2 lists some examples
of situations like this. Note that, in many of these examples, the principal is
buying a good or service from the agent. That is, many buyer-seller relationships
naturally fit into the principal-agent framework. This part of the notes covers
the basic tools and results of agency theory.

Table 2: Examples of Moral-Hazard Problems

• Principal: Employer; Agent: Employee. Problem: induce the employee to take actions that increase the employer's profits, but which he finds personally costly. Solution: base the employee's compensation on the employer's profits.

• Principal: Plaintiff; Agent: Attorney. Problem: induce the attorney to expend costly effort to increase the plaintiff's chances of prevailing at trial. Solution: make the attorney's fee contingent on the damages awarded the plaintiff.

• Principal: Homeowner; Agent: Contractor. Problem: induce the contractor to complete work (e.g., remodel a kitchen) on time. Solution: give the contractor a bonus for completing the job on time.

• Principal: Landlord; Agent: Tenant. Problem: induce the tenant to make investments (e.g., in time or money) that preserve or enhance the property's value to the landlord. Solution: pay the tenant a fraction of the increased value (e.g., a share-cropping contract); alternatively, make the tenant post a deposit to be forfeited if the value declines too much.

To an extent, the principal-agent problem finds its root in the early literature
on insurance. There, the concern was that someone who insures an asset might
then fail to maintain the asset properly (e.g., park his car in a bad neighborhood). Typically, such behavior was either unobservable by the insurance company or too difficult to contract against directly; hence, the insurance contract
could not be directly contingent on such behavior. But because this behavior—
known as moral hazard—imposes an externality on the insurance company (in
this case, a negative one), insurance companies were eager to develop contracts
that guarded against it. So, for example, many insurance contracts have deductibles—the first k dollars of damage must be paid by the insured rather than
the insurance company. Because the insured now has $k at risk, he’ll think
twice about parking in a bad neighborhood. That is, the insurance contract
is designed to mitigate the externality that the agent—the insured—imposes
on the principal—the insurance company. Although principal-agent analysis is
more general than this, the name “moral hazard” has stuck and, so, the types
of problems considered here are often referred to as moral-hazard problems. A
more descriptive name, which is also used in the literature, is hidden-action
problems.

Bibliographic Note

This part of the lecture notes draws heavily from a set of notes that I co-authored
with Bernard Caillaud.
12 The Moral-Hazard Setting

We begin with a general picture of the situation we wish to analyze.
1. Two players are in an economic relationship characterized by the following
two features: First, the actions of one player, the agent, affect the well-
being of the other player, the principal. Second, the players can agree
ex ante to a reward schedule by which the principal pays the agent.1
The reward schedule represents an enforceable contract (i.e., if there is a
dispute about whether a player has lived up to the terms of the contract,
then a court or similar body can adjudicate the dispute).

2. The agent’s action is hidden; that is, he knows what action he has taken
but the principal does not directly observe his action. (Although we will
consider, as a benchmark, the situation in which the action can be con-
tracted on directly.) Moreover, the agent has complete discretion in choos-
ing his action from some set of feasible actions.2

3. The actions determine, usually stochastically, some performance measures.


In many models, these are identical to the benefits received by the princi-
pal, although in some contexts the two are distinct. The reward schedule
is a function of (at least some) of these performance variables. In partic-
ular, the reward schedule can be a function of the verifiable performance
measures.3

4. The structure of the situation is common knowledge between the players.

For example, consider a salesperson who has discretion over the amount of
time or effort he expends promoting his company’s products. Many of these
actions are unobservable by his company. The company can, however, measure
in a verifiable way the number of orders or revenue he generates. Because these
measures are, presumably, correlated with his actions (i.e., the harder he works,
the more sales he generates on average), it may make sense for the company to

1 Although we typically think in terms of positive payments, in many applications payments could be negative; that is, the principal fines or otherwise punishes the agent.
2 Typically, this set is assumed to be exogenous to the relationship. One could, however,

imagine situations in which the principal has some control over this set ex ante (e.g., she
decides what tools the agent will have available).
3 Information is verifiable if it can be observed perfectly (i.e., without error) by third parties

who might be called upon to adjudicate a dispute between principal and agent.


base his pay on his sales—put him on commission—to induce him to expend
the appropriate level of effort.
Here, we will also be imposing some additional structure on the situation:

• The players are symmetrically informed at the time they agree to a reward
schedule.
• Bargaining is take-it-or-leave-it (tioli): The principal proposes a contract
(reward schedule), which the agent either accepts or rejects. If he rejects
it, the game ends and the players receive their reservation utilities (their
expected utilities from pursuing their next best alternatives). If he accepts,
then both parties are bound by the contract.
• Contracts cannot be renegotiated.
• Once the contract has been agreed to, the only player to take further
actions is the agent.
• The game is played once. In particular, there is only one period in which
the agent takes actions and the agent completes his actions before any
performance measures are realized.

All of these are common assumptions and, indeed, might be taken to constitute
part of the “standard” principal-agent model.
The link between actions and performance can be seen as follows. Perfor-
mance is a random variable and its probability distribution depends on the ac-
tions taken by the agent. So, for instance, a salesperson’s efforts could increase
his average (expected) sales, but he still faces upside risk (e.g., an economic
boom in his sales region) and downside risk (e.g., introduction of a rival prod-
uct). Because the performance measure is only stochastically related to the
action, it is generally impossible to infer perfectly the action from the realiza-
tion of the performance measure. That is, the performance measure does not,
generally, reveal the agent’s action—it remains “hidden” despite observing the
performance measure.
The link between actions and performance can also be viewed in an indirect
way in terms of a state-space model. Performance is a function of the agent’s
actions and of the state of nature; that is, a parameter (scalar or vector) that
describes the economic environment (e.g., the economic conditions in the salesperson's territory). In this view, the agent takes his action before knowing the
state of nature. Typically, we assume that the state of nature is not observ-
able to the principal. If she could observe it, then she could perfectly infer the
agent’s action by inverting from realized performance. In this model, it is not
important whether the agent later observes the state of nature or not, given he
could deduce it from his observation of his performance and his knowledge of
his actions.
There is a strong assumption of physical causality in this setting, namely that
actions by the agent determine performance. Moreover, the process is viewed
as a static production process: There are neither dynamics nor feedback. In

particular, the contract governs one period of production and the game between
principal and agent encompasses only this period. In addition, when choosing
his actions, the agent’s information is identical to the principal’s. Specifically,
he cannot adjust his actions as the performance measures are realized. The
sequentiality between actions and performance is strict: actions are completed
first and, only then, is performance realized.
13 Basic Two-Action Model

We start with the simplest principal-agent model. Admittedly, it is so simple
that a number of the issues one would like to understand about contracting
under moral hazard disappear. On the other hand, many issues remain and, for
pedagogical purposes at least, it is a good place to start.1

13.1 The Two-action Model


There are two players: principal and agent. The principal’s problem is to design
incentives for the agent to take the action she prefers for him to take.
As this is a two-action model, there is no loss in labeling the agent’s possible
actions 0 and 1. Let a, for action, denote an arbitrary element of {0, 1}. As
a consequence of the agent’s action, some outcome, x ∈ X , is realized. For
instance, if a denotes a salesperson’s efforts, then x could be the units he sells
or the revenue he generates.
As a rule, hidden-action models posit a stochastic relation between action
and outcome. To wit, the agent’s action merely determines the distribution
from which the outcome is drawn. Continuing the salesperson example, we
could imagine that his sales are not only a function of his efforts, but also of
factors outside his control (e.g., the preferences of the customers he meets, how
many come to the store or are at home when he calls, etc.). From an ex ante
perspective, at least, such factors are uncertain. Let Fa : X → [0, 1] be the
distribution function from which x is drawn if the agent chooses action a.
The parties also have payoffs that depend on the agent’s action and, possibly,
the outcome. Their payoffs also depend on monetary transfers from principal
to agent, denoted by s ∈ R (an s < 0 means the agent is paying the principal).
Let the agent’s utility be U (s, x, a) and the principal’s be B(s, x, a).
In what follows, we limit attention to situations in which the principal has
all the bargaining power; specifically, she makes the agent an offer on a tioli
basis. It is rational for the agent to accept if his expected utility playing his best
response to the contract is not less than the utility he would earn if he rejected the
contract and to reject otherwise. Let UR denote the agent’s utility if he rejects.
That value is known as his reservation utility.

1 But the pedagogical value of this section should not lead us to forget caution. And caution

is indeed necessary as the model delivers some conclusions that are far from general. One could
say that the two-action model is tailored to fit naïve intuition and to lead to the desired results
without allowing us to see fully the (implicit) assumptions on which we are relying.


It is standard in hidden-action agency models to suppose that the agent’s


utility does not depend directly on the outcome and is also additively separable
in income and action. In other words, the standard assumption is

U (s, x, a) = u(s) − aC ,

where, taking advantage of there being just two actions, we have normalized
the utility component from action 0 to be 0. Assume C > 0; that is, the agent
prefers action 0 to action 1 ceteris paribus. In the salesperson example, think
of action 1 as corresponding to the salesperson working hard, which he dislikes,
and action 0 corresponding to his taking it easy. The function u : R → R is
the agent’s utility for income. As such, it is an increasing function. As we
will see, for there to truly be an agency problem, one requires either that the
agent be risk averse—so u is strictly concave—or that there be a lower limit
on permissible compensation to the agent (typically, a requirement that s ≥ 0);
when there is a lower limit, we say the agent is protected by limited liability.
Here, though, we limit attention to a risk-averse agent. See Sappington (1983)
for an analysis with limited liability.
The usual assumption in agency models such as this is that the principal’s
payoff is a function of the outcome, x, and not directly of the agent’s action.
Further, because, if need be, we could reasonably suppose some increasing map-
ping from outcome to money, one assumes the outcome is a monetary payoff to
the principal (so X ⊆ R). Hence, the principal’s payoff is standardly

B(s, x, a) = b(x − s) ,

where b : R → R is the principal’s utility from money and, so, an increas-


ing function. Although one could assume the principal is also risk averse, that
additional generality would have limited consequence for the analysis here. Con-
sequently, for simplicity and in keeping with what is standardly done, assume
the principal to be risk neutral; that is,

B(s, x, a) = x − s .

Full-Information Benchmark

Suppose, as a benchmark, that the principal could observe and establish (verify)
the agent’s choice of action. Call this benchmark the full or perfect information
case. Then the principal could make the contract with the agent contingent on
the agent’s effort, a. Moreover, because the agent is risk averse, while the princi-
pal is risk neutral, efficiency dictates the principal absorb all the risk. Hence, in
this benchmark case, there is no need to make the agent’s compensation depend
on the outcome; it will depend on his action only. Consider a contract of the
form:

  s = s0 if a = 0, and s = s1 if a = 1.

Suppose the principal wants the agent to choose a = 1, then she must choose
s0 and s1 to satisfy two conditions. First, conditional on accepting the contract,
the agent must prefer action 1 to action 0; that is,

u (s1 ) − C ≥ u (s0 ) . (ic)

Such a constraint is known as an incentive compatibility constraint (convention-


ally abbreviated ic): taking the action desired by the principal must maximize
the agent’s expected utility. Second, anticipating his rational play (i.e., here
knowing he will choose a = 1), he must prefer to sign the contract than to forgo
employment with the principal; that is,

u (s1 ) − C ≥ UR . (ir)

Such a constraint is known as an individual rationality constraint (convention-


ally abbreviated ir). The ir constraint is also referred to as a participation
constraint.
The principal’s objective in choosing s0 and s1 is to maximize her expected
payoff conditional on gaining acceptance of the contract and inducing a = 1.
That is, she wishes to solve

max E1 {x} − s1
s0 ,s1

subject to the constraints (ic) and (ir), where Ea denotes expectation given
action a. Provided

u (s0 ) < UR and


u (s1 ) − C = UR (13.1)

both have solutions within the domain of u(·), then the solution to the principal’s
problem is straightforward: s1 solves (13.1) and s0 is a solution to u(s) < UR .
It is readily seen that this solution satisfies the constraints. Moreover, because
u(·) is strictly increasing, there is no smaller payment that the principal could
give the agent and still have him accept the contract. This contract is known
as a forcing contract.2 For future reference, let s^F_1 be the solution to (13.1).

2 The solution to the principal's maximization problem depends on the domain and range of the utility function u(·). Let D, an interval in R, be its domain and R its range. Let s̲ be inf D (i.e., the greatest lower bound of D) and let s̄ be sup D (i.e., the least upper bound of D). As shorthand for lim_{s↓s̲} u(s) and lim_{s↑s̄} u(s), write u(s̲) and u(s̄), respectively. If u(s) − C < UR for all s ≤ s̄, then no contract exists that satisfies (ir). In this case, the best the principal could do is implement a = 0. Similarly, if u(s̄) − C < u(s̲), then no contract exists that satisfies (ic); the principal would have to be satisfied with implementing a = 0. Hence, a = 1 can be implemented if and only if u(s̄) − C > max{UR, u(s̲)}. Assuming this condition is met, a solution is s0 ↓ s̲ and s1 solving u(s1) − C ≥ max{UR, u(s̲)}. Generally, conditions are imposed on u(·) such that solutions exist to u(s) < UR and (13.1). Henceforth, we will assume that these conditions have, indeed, been imposed. For an example of an analysis that considers bounds on D that are more binding, see Sappington (1983).

Observe that s^F_1 = u^{−1}(UR + C), where u^{−1}(·) is the inverse of the function u(·).
Another option for the principal is, of course, just to let the agent choose
the action he prefers absent any incentives to the contrary; that is, action 0.
There are many contracts that would accomplish this goal, although the most
“natural” is perhaps a non-contingent contract: s0 = s1 . Observe, here, there
is no ic constraint—the agent inherently prefers a = 0—and the only constraint
is the ir constraint:
u (s0 ) ≥ UR .
The expected-profit-maximizing (cost-minimizing) payment is the smallest pay-
ment satisfying that expression. Given that u(·) is increasing, this entails
s0 = u^{−1}(UR). We will refer to this value of s0 as s^F_0.
The principal’s expected payoff conditional on inducing a under the optimal
full-information contract for inducing a is Ea {x} − sF
a . The principal will, thus,
prefer to induce a = 1 if
E1 {x} − sF F
1 > E0 {x} − s0 .

In what follows, we will assume that this condition is met: That is, in our
benchmark case of verifiable action, the principal prefers to induce a = 1 (the
action the agent intrinsically dislikes).
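As a quick numerical sketch of the benchmark (the utility function and parameter values are my own, not from the text): with u(s) = log s, UR = 0, and C = 0.5, the full-information payments are s^F_0 = u^{−1}(UR) = 1 and s^F_1 = u^{−1}(UR + C) = e^{0.5} ≈ 1.65.

```python
# Full-information benchmark payments for u(s) = log(s) -- illustrative
# parameter values (U_R = 0, C = 0.5) are my own, not from the text.
import math

U_R, C = 0.0, 0.5
u_inv = math.exp            # inverse of u(s) = log(s)

s_F0 = u_inv(U_R)           # payment inducing a = 0: u(s0) = U_R
s_F1 = u_inv(U_R + C)       # payment inducing a = 1: u(s1) - C = U_R

assert abs(s_F0 - 1.0) < 1e-12
assert abs(s_F1 - math.exp(0.5)) < 1e-12
# The principal induces a = 1 iff E1{x} - s_F1 > E0{x} - s_F0, i.e. iff the
# expected output gain exceeds s_F1 - s_F0 (about 0.65 here).
```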
Observe the steps taken in solving this benchmark case: first, for each pos-
sible action we solved for the optimal contract that induces that action. Then
we calculated the principal’s expected payoff under each such contract. The
action that the principal chooses to induce in equilibrium is, then, the one that
yields her the largest expected payoff. This two-step process for solving for the
optimal contract is frequently used in hidden-action agency problems, as we will
see.

13.2 The Optimal Incentive Contract
Now, we return to the case of interest: the agent’s action is hidden. Con-
sequently, the principal cannot make her payment directly contingent on his
action. The only verifiable measure is outcome, x; that is, x is the only pa-
rameter on which compensation can be made contingent. A contract, then, is
a function mapping outcome into compensation: s = S(x). Should he have ac-
cepted such a contract, the agent then freely chooses the action that maximizes
his expected utility. Consequently, the salesperson chooses action a = 1 if and only if:3

  E1{u(S(x))} − C ≥ E0{u(S(x))} .   (ic′)


3 We assume, when indifferent among a group of actions, that the agent chooses from

that group the action that the principal prefers. This assumption, although often troubling to
those new to agency theory, is not truly a problem. Recall that the agency problem is a game.
Consistent with game theory, we’re looking for an equilibrium of this game; i.e., a situation
in which players are playing mutual best responses and in which they correctly anticipate

If this inequality is violated, the agent prefers a = 0. Observe that this is the
incentive compatibility (ic) constraint in this case.
The game we analyze is in fact a simple Stackelberg game, where the prin-
cipal is the first mover—she chooses the payment schedule—to which she is
committed; and the agent the second mover—choosing his action in response to
the payment schedule; that is, choosing the solution to
  max_a Ea{u(S(x))} − aC

(with ties going to the principal). The solution is the agent’s equilibrium choice
of action and it is a function of the payment function S(·). Solving this contract-
ing problem then requires us to understand what kind of contract the principal
could and will offer.
Observe first that if she were to offer the fixed-payment contract S(x) = s^F_0
for all x, then, as above, the agent would accept the contract and not bother
to expend effort. Among all contracts that induce the agent to choose action
a = 0 in equilibrium, this is clearly the cheapest one for the principal.
On the other hand, the fixed-payment contract s^F_1 will no longer work given the hidden-action problem: the agent gains s^F_1 whatever his efforts, so he will
choose the action that costs him less, namely a = 0. It is in fact immediate
that any fixed-payment contract, which would be optimal if the only concern
were efficient risk-sharing, will induce an agent to choose his least costly action.
As, by assumption, the principal did better in the full-information benchmark
inducing a = 1, it seems plausible that she would still wish to induce that here,
although—as we’ve just seen—that must mean inefficient (relative to the first
best) allocation of risk to the agent.
We now face two separate questions. First, conditional on the principal’s
wanting the agent to choose a = 1, what is the optimal—least-expected-cost—
contract for the principal to offer? Second, are the principal’s expected payoffs
greater doing this than not inducing a = 1 (i.e., greater than her expected
payoffs from offering the fixed-payment contract S(x) = s^F_0)?
As in the benchmark case, not only must the contract give the agent an
incentive to choose the desired action (i.e., meet the ic constraint), it must also
be individually rational:
  E1{u(S(x))} − C ≥ UR .   (ir′)

The optimal contract is then the solution to the following program:

  max_{S(·)} E1{x − S(x)}   (13.2)

subject to (ic′ ) and (ir′ ).

the best responses of their opponents. Were the agent to behave differently when indifferent,
then we wouldn’t have an equilibrium because the principal would vary her strategy—offer a
different contract—so as to break this indifference. Moreover, it can be shown that in many
models the only equilibrium has the property that the agent chooses among his best responses
(the actions among which he is indifferent given the contract) the one most preferred by the
principal.

The next few sections will consider the solution to (13.2) under a number of
different assumptions about the distribution functions Fa .
Two assumptions, additional to those given previously, about the agent’s
utility-for-income function, u, will be common to these analyses:

1. The domain of u is (s̲, ∞), s̲ ≥ −∞.4

2. lim_{s↓s̲} u(s) = −∞ and lim_{s↑∞} u(s) = ∞; that is, the image of u is the entire set R.

An example of a utility function satisfying all our assumptions is log with s̲ = 0. For the most part, these assumptions are for convenience and are more restrictive than we need (for instance, many of our results will also hold if u(s) = √s, although this fails the second assumption). Some consequences of these and our earlier assumptions are
earlier assumptions are

• u(·) is invertible (a consequence of its strict monotonicity). Let u^{−1}(·) denote its inverse.

• The domain of u^{−1}(·) is R (a consequence of the last two assumptions).

• u^{−1}(·) is continuous, strictly increasing, and convex (a consequence of the continuity of u(·), its concavity, and that it is strictly increasing).

13.3 Two-outcome Model
Imagine that there are only two possible outcomes, high and low, denoted re-
spectively by xH and xL , with xH > xL . Assume that

Fa (xL ) = 1 − qa ,

where q ∈ (0, 1), a constant, is common knowledge.


A contract is sH = S(xH ) and sL = S(xL ). We can, thus, write program
(13.2) as
  max_{sH, sL} q(xH − sH) + (1 − q)(xL − sL)

subject to

qu (sH ) + (1 − q) u (sL ) − C ≥ 0 · u (sH ) + 1 · u (sL ) (ic)

and
qu (sH ) + (1 − q) u (sL ) − C ≥ UR (ir)
We could solve this problem mechanically using the usual techniques for max-
imizing a function subject to constraints, but it is far easier, here, to use a

4 Because its domain is an open interval and it is concave, u(·) is continuous everywhere on

its domain (see van Tiel, 1984, p. 5).



little intuition. To begin, we need to determine which constraints are binding.


Is ic binding? Well, suppose it were not. Then the problem would simply be
one of optimal risk sharing, because, by supposition, the incentive problem no
longer binds. But we know optimal risk sharing entails sH = sL ; that is, a
fixed-payment contract.5 As we saw above, however, a fixed-payment contract
cannot satisfy ic:
u (s) − C < u (s) .
Hence, ic must be binding.
What about ir? Is it binding? Suppose it were not (i.e., it were a strict
inequality) and let s∗L and s∗H be the optimal contract. Then there must exist
an ε > 0 such that

q( u(s∗H) − ε ) + (1 − q)( u(s∗L) − ε ) − C ≥ UR .

Let s̃n = u−1( u(s∗n) − ε ), n ∈ {L, H}. Clearly, s̃n < s∗n for both n, so that the
{s̃n} contract costs the principal less than the {s∗n} contract; or, equivalently,
the {s̃n} contract yields the principal a greater expected payoff than the {s∗n}
contract. Moreover, the {s̃n} contract satisfies ic:

qu(s̃H) + (1 − q)u(s̃L) − C = q( u(s∗H) − ε ) + (1 − q)( u(s∗L) − ε ) − C
                           = qu(s∗H) + (1 − q)u(s∗L) − C − ε
                           ≥ 0 · u(s∗H) + 1 · u(s∗L) − ε     ({s∗n} satisfies ic)
                           = u(s̃L) .

But this means that {s̃n } satisfies both constraints and yields a greater expected
payoff, which contradicts the optimality of {s∗n }. Therefore, by contradiction,
we can conclude that ir also binds under the optimal contract for inducing
a = 1.
We’re now in a situation where the two constraints must bind at the optimal
contract. But, given we have only two unknown variables, sH and sL , this means
we can solve for the optimal contract merely by solving the constraints. Doing
so yields

ŝH = u−1( UR + C/q )   and   ŝL = u−1(UR) .          (13.3)
q
Observe that the payments vary with the state (as we knew they must because
fixed payments fail the ic constraint).
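A quick numerical check of (13.3), taking u = log (so u−1 = exp and s = 0); the values of UR, C, and q below are illustrative assumptions, not values from the text:

```python
import math

# u = log (so u_inv = exp); U_R, C, q are illustrative assumptions
u, u_inv = math.log, math.exp
U_R, C, q = 0.0, 0.5, 0.8

s_H = u_inv(U_R + C / q)   # from (13.3)
s_L = u_inv(U_R)

# ir binds: expected utility from choosing a = 1 equals U_R
assert abs(q * u(s_H) + (1 - q) * u(s_L) - C - U_R) < 1e-12
# ic binds: a = 1 gives exactly the utility of a = 0 (which yields u(s_L) for sure)
assert abs(q * u(s_H) + (1 - q) * u(s_L) - C - u(s_L)) < 1e-12
```

Both constraints hold with equality, confirming that solving the two binding constraints delivers the optimal contract.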

5 The proof is straightforward if u(·) is differentiable: Let λ be the Lagrange multiplier on

(ir). The first-order conditions with respect to sL and sH are

1 − q − λ (1 − q) u′ (sL ) = 0

and
q − λqu′ (sH ) = 0 ,
respectively. Solving, it is clear that sL = sH . The proof when u (·) is not (everywhere)
differentiable is only slightly harder and is left to the reader.
178 Lecture Note 13: Basic Two-Action Model

Recall that were the agent’s action verifiable (i.e., in the full-information
benchmark), the contract would be S(x) = sF1 = u−1(UR + C). Rewriting
(13.3) we see that

ŝH = u−1( u(sF1) + ((1 − q)/q) C )   and   ŝL = u−1( u(sF1) − C ) ;
q

that is, one payment is above the payment under full information, while the other
is below the payment under full information. Moreover, the expected payment
to the salesperson is greater than sF1:

qŝH + (1 − q)ŝL = qu−1( u(sF1) + ((1 − q)/q) C ) + (1 − q)u−1( u(sF1) − C )
               ≥ u−1( q[ u(sF1) + ((1 − q)/q) C ] + (1 − q)[ u(sF1) − C ] )     (13.4)
               = u−1( u(sF1) ) = sF1 ;

where the inequality follows from Jensen’s inequality.6 Provided the agent is
strictly risk averse, the above inequality is strict: inducing the agent to choose
a = 1 costs strictly more in expectation when the principal cannot verify the
agent’s action.
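The strict inequality in (13.4) can likewise be verified numerically. Again u = log, and the parameter values are illustrative assumptions:

```python
import math

# u = log; U_R, C, q are illustrative assumptions for the demo
u_inv = math.exp
U_R, C, q = 0.0, 0.5, 0.8

s_F1 = u_inv(U_R + C)            # full-information payment
s_H = u_inv(U_R + C / q)         # from (13.3)
s_L = u_inv(U_R)

expected_cost = q * s_H + (1 - q) * s_L
assert expected_cost > s_F1      # Jensen: strict when the agent is strictly risk averse
```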
Before proceeding, it is worth considering why the principal suffers from her
inability to verify the agent’s action (i.e., from the existence of a hidden-action
problem). Ceteris paribus, the agent prefers a = 0 to a = 1 because the latter
costs him more. Hence, when the principal wishes to induce a = 1, her interests
and the agent’s are not aligned. To align their interests, she must offer the agent
incentives to choose a = 1. The problem is that the principal cannot directly
tie these incentives to the variable in which she is interested, namely the action
itself. Rather, she must tie these incentives to outcomes, which are imperfectly
correlated with action. These incentives, therefore, expose the agent to risk. We
know, relative to the first best, that this is inefficient. Someone must bear the
cost of this inefficiency. Because the bargaining game always yields the agent
the same expected utility (i.e., ir is always binding), the cost of this inefficiency
must, thus, be borne by the principal.
Another way to view this last point is that because the agent is exposed to
risk, which he dislikes, he must be compensated. This compensation takes the
form of a higher expected payment.
To begin to appreciate the importance of the hidden-action problem, observe

6 Jensen’s inequality for convex functions states that if g(·) is a convex function, then
E{g(X)} ≥ g(EX), where X is a random variable whose support is an interval of R and


E is the expectations operator with respect to X (see, e.g., van Tiel, 1984, p. 11, for a proof).
If g (·) is strictly convex and the distribution of X is not degenerate (i.e., does not concentrate
all mass on one point), then the inequality is strict. For concave functions, the inequalities
are reversed.

that

lim_{q↑1} ( qŝH + (1 − q)ŝL ) = lim_{q↑1} ŝH = u−1( u(sF1) ) = sF1 .

Hence, when q = 1, there is effectively no hidden-action problem: the low


outcome, xL , constitutes proof that the agent chose action 0, because Pr{x =
xL |a = 1} = 0 in that case. The principal is, thus, free to “punish” the agent
for a low outcome, thereby deterring a = 0. But because there is no risk of
the punishment being inflicted when a = 1, the principal does not have to
compensate the agent for bearing such risk; the ir constraint can be satisfied
paying the same compensation as under full information. When q = 1, we have
what is known as a shifting support.7 We will consider shifting supports in
greater depth later.
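A sketch of how the hidden-action premium behaves in q, under the same illustrative log-utility assumptions (UR and C are not values from the text): expected compensation falls as q rises and approaches sF1 as q ↑ 1.

```python
import math

# u = log; U_R and C are illustrative assumptions
u_inv = math.exp
U_R, C = 0.0, 0.5
s_F1 = u_inv(U_R + C)            # full-information payment

def expected_cost(q):
    # q * s_hat_H + (1 - q) * s_hat_L, with the payments from (13.3)
    return q * u_inv(U_R + C / q) + (1 - q) * u_inv(U_R)

assert expected_cost(0.5) > expected_cost(0.9) > s_F1   # premium shrinks in q
assert abs(expected_cost(0.999999) - s_F1) < 1e-4       # and vanishes as q -> 1
```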
To see the importance of the agent’s risk aversion, note that were he risk
neutral, then the inequality in (13.4) would, instead, be an equality and the
expected compensation paid the agent would equal the compensation paid under
full information. Given that the principal is risk neutral by assumption, she is
indifferent between an expected payment of sF1 and paying sF1 with certainty. In
other words, with a risk-neutral agent, there would be no loss, relative to full
information, of overcoming the hidden-action problem by basing compensation
on outcome. It is important to note, however, that assuming a risk-neutral agent
does not obviate the need to pay contingent compensation (i.e., we still need
sH > sL )—as can be seen by checking the ic constraint; agent risk neutrality
means only that the principal suffers no loss from the fact that the agent’s action
is hidden.
We can also analyze the situation graphically. To do so, it helps to switch
from compensation space to utility space; that is, rather than put sL and sH
on the axes, we put uL ≡ u(sL ) and uH ≡ u(sH ) on the axes. With this change
of variables, program (13.2) becomes:
 
max_{uL,uH}  q( xH − u−1(uH) ) + (1 − q)( xL − u−1(uL) )

subject to

quH + (1 − q)uL − C ≥ uL   and      (ic′′)
quH + (1 − q)uL − C ≥ UR           (ir′′)
Observe that, in this space, the agent’s indifference curves are straight lines, with
lines farther from the origin corresponding to greater expected utility. The prin-
cipal’s iso-expected-payoff curves are concave relative to the origin—reflecting
7 The support of distribution G over random variable X, sometimes denoted supp{X},
is the set of x’s such that for all ε > 0, G(x) − G(x − ε) > 0. Loosely speaking, it is the
set of x’s that have positive probability of occurring.

Figure 13.1: Representative indifference curves in utility space for the principal
and agent. Straight lines (in olive) are agent’s, curved lines (in violet) are
principal’s. Agent enjoys greater expected utility moving up. Principal greater
expected payoff moving down.

that u−1(·) is a convex function—with curves closer to the origin corresponding to
greater expected payoffs (lower payments). Figure 13.1 illustrates. Observe the agent’s
indifference curves and the principal’s can be tangent only at the 45◦ line, a
well-known result from the insurance literature.8 This illustrates why efficiency
(in a first-best sense) requires that the agent not bear risk.
We can re-express (ic′′) as

uH ≥ uL + C/q .          (13.5)
Hence, the set of contracts that are incentive compatible lie on or above a line
above, but parallel, to the 45◦ line. Graphically, we now see that an incentive-
compatible contract requires that we abandon non-contingent contracts. Fig-
ure 13.2 shows the space of incentive-compatible contracts.

8 Proof: Let φ(·) = u−1(·). Then the mrs for the principal is −(1 − q)φ′(uL)/( qφ′(uH) ),
whereas the mrs for the agent is −(1 − q)/q. Because φ(·) is strictly convex, φ′(·) is
strictly monotone. Consequently, the two mrs’s can be equal only on the 45◦ line.

Figure 13.2: The set of feasible contracts.

The set of individually rational contracts are those that lie on or above the
line defined by (ir′′ ). This is also illustrated in Figure 13.2. The intersection of
these two regions then constitutes the set of feasible contracts for inducing the
salesperson to choose a = 1. Observe that the principal’s lowest iso-expected-
payoff curve that intersects this set is the one that passes through the “corner” of
the set—consistent with our earlier conclusion that both constraints are binding
at the optimal contract.
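The corner can also be found by brute force. The sketch below grid-searches utility space (uH, uL), assuming u = log; the outcomes xH, xL and the parameters UR, C, q are illustrative assumptions. The best feasible point lands (up to grid resolution) at the corner where both constraints bind.

```python
import math

# Grid search in utility space; u = log, all parameter values are assumptions
u_inv = math.exp
U_R, C, q, x_H, x_L = 0.0, 0.5, 0.8, 10.0, 1.0

best, best_val = None, -math.inf
for i in range(201):
    u_L = -1.0 + 0.02 * i
    for j in range(201):
        u_H = -1.0 + 0.02 * j
        if u_H < u_L + C / q:                   # violates (ic'')
            continue
        if q * u_H + (1 - q) * u_L - C < U_R:   # violates (ir'')
            continue
        val = q * (x_H - u_inv(u_H)) + (1 - q) * (x_L - u_inv(u_L))
        if val > best_val:
            best, best_val = (u_H, u_L), val

corner = (U_R + C / q, U_R)   # both constraints binding
assert abs(best[0] - corner[0]) < 0.1 and abs(best[1] - corner[1]) < 0.1
```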
Lastly, let’s consider the variable q. We can interpret q as representing the
correlation—or, more accurately, the informativeness—of sales to the action
taken. At first glance, it might seem odd to be worried about the informa-
tiveness of sales since, in equilibrium, the principal can accurately predict the
agent’s choice of action from the structure of the game and her knowledge of the
contract. But that’s not the point: the principal is forced to design a contract
that pays the agent based on performance measures that are informative about
the variable upon which she would truly like to contract, namely his action.
The more informative these performance measures are—loosely, the more cor-
related they are with action—the closer the principal is getting to the ideal of
contracting on the agent’s action.
In light of this discussion it wouldn’t be surprising if the principal’s expected
profit under the optimal contract for inducing a = 1 increases as q increases.

Clearly her expected revenue,

qxH + (1 − q)xL ,

is increasing in q. Hence, it is sufficient to show her expected cost,

qŝH + (1 − q)ŝL ,

is non-increasing in q. To do so, it is convenient to work in terms of utility


(i.e., uH and uL ) rather than directly with compensation. Let q1 and q2 be two
distinct values of q, with q1 < q2 . Let {u1n } be the optimal contract (expressed
in utility terms) when q = q1 .
Define r = q1 /q2 and define

ũH = ru1H + (1 − r)u1L and


ũL = u1L .

Let us now see that the contract {ũn } satisfies both ir and ic when q = q2 .
Observe, first, that

q2ũH + (1 − q2)ũL = rq2u1H + ( (1 − r)q2 + 1 − q2 )u1L
                  = q1u1H + (1 − q1)u1L .

Given that the contract {u1n } satisfies ir and ic when q = q1 , it follows, by


transitivity, that {ũn } satisfies ir and ic when q = q2 . We need, now, simply
show that expected compensation is less; that is, that
 
q2u−1(ũH) + (1 − q2)u−1(ũL) ≤ q1u−1(u1H) + (1 − q1)u−1(u1L) .

To do so, observe that


 
u−1(ũH) ≤ ru−1(u1H) + (1 − r)u−1(u1L)   and   u−1(ũL) ≤ u−1(u1L)

by Jensen’s inequality (recall u−1 (·) is convex). We have, therefore, that


  
q2u−1(ũH) + (1 − q2)u−1(ũL) ≤ q2( ru−1(u1H) + (1 − r)u−1(u1L) ) + (1 − q2)u−1(u1L)
                            = q1u−1(u1H) + (1 − q1)u−1(u1L) .

In other words, the principal’s expected cost is no greater in the regime q = q2


than in the regime q = q1 —as was to be shown.
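The construction in the text can be traced numerically. Assuming u = log and illustrative values q1 < q2 (all parameters are assumptions for the demo), the tilde contract preserves expected utility, hence satisfies ir and ic at q2, while costing weakly less:

```python
import math

# u = log; U_R, C, q1, q2 are illustrative assumptions with q1 < q2
u_inv = math.exp
U_R, C = 0.0, 0.5
q1, q2 = 0.6, 0.8
r = q1 / q2

u1_H, u1_L = U_R + C / q1, U_R           # optimal contract at q1 (utility terms)
u_H = r * u1_H + (1 - r) * u1_L          # the "tilde" contract of the text
u_L = u1_L

# Expected utility is unchanged, so ir and ic carry over to q = q2
assert abs((q2 * u_H + (1 - q2) * u_L) - (q1 * u1_H + (1 - q1) * u1_L)) < 1e-12
assert u_H >= u_L + C / q2 - 1e-12       # (ic'') at q2, binding here

# Jensen (u_inv convex): expected compensation is no greater at q2
cost1 = q1 * u_inv(u1_H) + (1 - q1) * u_inv(u1_L)
cost2 = q2 * u_inv(u_H) + (1 - q2) * u_inv(u_L)
assert cost2 <= cost1
```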

Summary: Although we’ve considered the simplest of agency models in this


section, there are, nevertheless, some general lessons that come from this. First,
the optimal contract for inducing an action other than the one that the agent

finds least costly requires a contract that is fully contingent on the performance
measure. This is a consequence of the action being unobservable to the principal,
not the agent’s risk aversion. When, however, the agent is risk averse, then the
principal’s expected cost of solving the hidden-action problem is greater than it
would be in the benchmark full-information case: exposing the agent to risk is
inefficient (relative to the first best) and the cost of this inefficiency is borne by
the principal. The size of this cost depends on how good an approximation the
performance measure is for the variable upon which the principal really desires
to contract, the agent’s action. The better an approximation (statistic) it is, the
lower is the principal’s expected cost. If, as here, that shift also raises expected
revenue, then a more accurate approximation means a greater expected payoff.
It is also worth pointing out that one result, which might seem as though it
should be general, is not: namely, the result that compensation is increasing
with performance (e.g., ŝH > ŝL ). Although this is true when there are only two
possible realizations of the performance measure (as we’ve proved), this result
does not hold generally when there are more than two possible realizations.

Multiple-outcomes Model 13.4


Now we assume that there are multiple possible outcomes, including, possibly,
an infinite number. Without loss of generality, we may assume the set of possible
outcomes is (0, ∞), given that impossible outcomes can be assigned zero prob-
ability. We will also assume, henceforth, that u(·) exhibits strict risk aversion
(i.e., is strictly concave).
Recall that our problem is to solve program (13.2), on page 175. In this
context, we can rewrite the problem as
max_{S(·)} ∫_0^∞ ( x − S(x) ) dF1(x)

subject to

∫_0^∞ u( S(x) ) dF1(x) − C ≥ UR   and          (13.6)

∫_0^∞ u( S(x) ) dF1(x) − C ≥ ∫_0^∞ u( S(x) ) dF0(x) ,          (13.7)

which are the ir and ic constraints, respectively. In what follows, we assume


that there is a well-defined density function, fa(·), associated with Fa(·) for
both a. For instance, if Fa(·) is differentiable everywhere, then fa(·) = Fa′(·)
and the ∫ · dFa(x) notation could be replaced with ∫ fa(x) dx. Alternatively,
the possible outcomes could be discrete, x = x1, . . . , xN, in which case

fa(x̂) = Fa(x̂) − lim_{x↑x̂} Fa(x)

and the ∫ · dFa(x) notation could be replaced with Σ_{n=1}^N fa(xn).

We solve the above program using standard Kuhn-Tucker techniques. Let
µ be the (non-negative) Lagrange multiplier on the incentive constraint and
let λ be the (non-negative) Lagrange multiplier on the individual rationality
constraint. The Lagrangian of the problem is, thus,

L(S(·), λ, µ) = ∫_0^∞ ( x − S(x) ) dF1(x) + λ( ∫_0^∞ u( S(x) ) dF1(x) − C − UR )
              + µ( ∫_0^∞ u( S(x) ) dF1(x) − ∫_0^∞ u( S(x) ) dF0(x) − C ) .

The necessary first-order conditions are λ ≥ 0, µ ≥ 0, (13.6), (13.7),

u′( S(x) )( λ + µ( 1 − f0(x)/f1(x) ) ) − 1 = 0 ,          (13.8)

λ > 0 ⇒ ∫_0^∞ u( S(x) ) dF1(x) = C + UR ,   and

µ > 0 ⇒ ∫_0^∞ u( S(x) ) dF1(x) − ∫_0^∞ u( S(x) ) dF0(x) = C .
From our previous reasoning, we already know that the ic constraint is
binding. To see this again, observe that if it were not (i.e., µ = 0), then (13.8)
would reduce to u′( S(x) ) = 1/λ for all x; that is, a fixed payment.9 But we
know a fixed-payment contract is not incentive compatible. It is also immediate
that the participation constraint must be satisfied as an equality; otherwise, the
principal could reduce the payment schedule, thereby increasing her profits, in
a manner that preserved the incentive constraint (i.e., replace S∗(x) with S̃(x),
where S̃(x) = u−1( u( S∗(x) ) − ε )).
The necessary conditions above are also sufficient given the assumed concavity
of u(·). At every point where the Lagrangian is maximized with respect to s,

λ + µ( 1 − f0(x)/f1(x) )

must be positive—observe it equals 1/u′(s) > 0—so the second derivative with
respect to s must, therefore, be negative.
The least-cost contract inducing a = 1 therefore corresponds to a payment
schedule S(·) that varies with the level of sales in a non-trivial way given by
(13.8). That expression might look complicated, but its interpretation is central
to the model and easy to follow. Observe, in particular, that because u′ (·) is a

9 Note that we’ve again established the result that, absent an incentive problem, a risk-

neutral player should absorb all the risk when trading with a risk-averse player.

decreasing function (u(·), recall, is strictly concave), S(x) is positively correlated
with

λ + µ( 1 − f0(x)/f1(x) ) ;

that is, the larger (smaller) is this term, the larger (smaller) is S(x).
The reward for a given level of sales x depends upon the likelihood ratio

r(x) ≡ f0(x)/f1(x) .

This ratio has a clear statistical meaning: it measures how much more likely
it is that the distribution from which sales have been determined is F0 rather
than F1 when outcome x is observed. When r(x) is high, observing x allows the
principal to draw the statistical inference that it is much more likely that the
distribution was actually F0; that is, the agent did not choose the action she
wished him to take. In this case,

λ + µ( 1 − f0(x)/f1(x) )

is small (but necessarily positive) and S(x) must be small as well. When
r(x) is small, the principal can feel rather confident that her agent acted as
desired and she should, then, reward him highly. That is, outcomes that are
relatively more likely when the agent has behaved in the desired manner result
in larger payments to the agent than outcomes that would be relatively rare if
the agent had behaved in the desired manner.
The minimum-cost incentive contract that induces the costly action a = 1 in
essence commits the principal to behave like a Bayesian statistician who holds
some diffuse prior over which action the agent has taken:10 She should use
the outcome to revise her beliefs about what action the agent took and she
should reward the agent more for outcomes that cause her to revise upward
her beliefs that he took the desired action and she should reward him less
(punish him) for outcomes that cause a downward revision in her beliefs.11 As
a consequence, the payment schedule is connected to outcomes only through
the outcomes’ statistical properties (the relative differences in the densities),
not through their accounting properties. In particular, there is now no reason
to believe that higher outcomes (larger x) should be rewarded more than lower
ones.
As an example of non-monotonic compensation, suppose that there are three
possible outcomes: low, medium, and high (xL , xM , and xH , respectively). Let

10 A diffuse prior is one that assigns positive probability to each possible action.
11 Of course, as a rational player of the game, the principal can infer that, if the contract
is incentive compatible, the agent will have taken the desired action. Thus, there is not, in
some sense, a real inference problem. Rather the issue is that, to be incentive compatible, the
principal must commit to act as if there were an inference problem.

the density functions be

fa(x) = 1/3 if x = xL ;   (2 − a)/6 if x = xM ;   (2 + a)/6 if x = xH .

Then

λ + µ( 1 − f0(x)/f1(x) ) = λ if x = xL ;   λ − µ if x = xM ;   λ + µ/3 if x = xH .
Hence, the low outcome is rewarded more than the medium one—a low outcome
is uninformative about the agent’s action, whereas a medium outcome suggests
that the agent has not taken the desired action. Admittedly, non-monotonic
compensation is rarely, if ever, observed in real life. We will see below what
additional properties are required, in this model, to ensure monotonic compen-
sation.
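With u = log, 1/u′(s) = s, so (13.8) pins down S(x) = λ + µ(1 − r(x)) directly. The sketch below plugs in the example's densities with illustrative positive multipliers (assumed values, not the solved equilibrium ones) to exhibit the non-monotonicity:

```python
# Densities from the three-outcome example; lam and mu are illustrative
# positive multipliers, not the solved values.
f1 = {"L": 1/3, "M": 1/6, "H": 3/6}      # densities given a = 1
f0 = {"L": 1/3, "M": 2/6, "H": 2/6}      # densities given a = 0
r = {x: f0[x] / f1[x] for x in f1}       # likelihood ratios: 1, 2, 2/3
assert abs(r["M"] - 2.0) < 1e-12 and abs(r["H"] - 2/3) < 1e-12

lam, mu = 1.0, 0.5
# With u = log, (13.8) reads S(x) = lam + mu * (1 - r(x))
S = {x: lam + mu * (1 - r[x]) for x in r}
assert S["M"] < S["L"] < S["H"]          # medium pays least: non-monotone in x
```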
Note, somewhat implicit in our analysis to this point, is an assumption that
f1 (x) > 0 except, possibly, on a subset of x that are impossible (have zero
measure). Without this assumption, (13.8) would entail division by zero, which
is, of course, not permitted. If, however, we let f1 (·) go to zero on some subset
of x that had positive measure under F0 (·), then we see that µ must also tend
to zero because

λ + µ( 1 − f0(x)/f1(x) )
must be positive. In essence, then, the shadow price (cost) of the incentive
constraint vanishes as f1(·) goes to zero. This makes sense: were f1(·) zero on
some subset of x that could occur (had positive measure) under F0 (·), then the
occurrence of any x in this subset, X0 , would be proof that the agent had failed
to take the desired action. We can use this, then, to design a contract that
induces a = 1, but which costs the principal no more than the optimal full-
information fixed-payment contract S(x) = sF 1 . That is, the incentive problem
ceases to be costly; so, not surprisingly, its shadow cost is zero.
To see how we can construct such a contract when f1 (x) = 0 for all x ∈ X0 ,
let

S(x) = s + ε  if x ∈ X0 ,   and   S(x) = sF1  if x ∉ X0 ,
where ε > 0 is arbitrarily small (s, recall, is the greatest lower bound of the
domain of u(·)). Then

∫_0^∞ u( S(x) ) dF1(x) = u(sF1)   and

∫_0^∞ u( S(x) ) dF0(x) = ∫_{X0} u(s + ε) dF0(x) + ∫_{R+\X0} u(sF1) dF0(x)
                       = u(s + ε)F0(X0) + u(sF1)( 1 − F0(X0) ) .

From the last expression, it’s clear that ∫_0^∞ u( S(x) ) dF0(x) → −∞ as ε → 0;
hence, the ic constraint is met trivially. By the definition of sF1, ir is also met.
We see, therefore, that this contract implements a = 1 at full-information cost.
Again, as we saw in the two-outcome model, having a shifting support (i.e., the
property that F0 (X0 ) > 0 = F1 (X0 )) allows us to implement the desired action
at full-information cost.
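A numerical sketch of the shifting-support contract, assuming u = log (so s = 0) and illustrative values of UR, C, and the mass F0(X0) (all assumptions for the demo):

```python
import math

# u = log, so the lower bound of its domain is 0; parameters are assumptions
u = math.log
U_R, C = 0.0, 0.5
s_F1 = math.exp(U_R + C)          # full-information payment
F0_X0 = 0.25                      # Pr{x in X0 | a = 0}; Pr{x in X0 | a = 1} = 0

EU1 = u(s_F1) - C                 # a = 1 never lands in X0, so ir holds exactly
assert abs(EU1 - U_R) < 1e-12

for eps in (1e-1, 1e-3, 1e-6):
    EU0 = F0_X0 * u(eps) + (1 - F0_X0) * u(s_F1)
    assert EU0 < EU1              # ic holds; EU0 -> -infinity as eps -> 0
```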
To conclude this section, we need to answer one final question. Does the
principal prefer to induce a = 1 or a = 0 given the “high” cost of the former?
The principal’s choice can be viewed as follows: either she offers the fixed-
payment contract sF 0 , which induces action a = 0, or she offers the contract S(·)
derived above, which induces action a = 1. The expected-profit-maximizing
choice results from the simple comparison of these two contracts; that is, the
principal offers the incentive contract S(·) if and only if

E0{x} − sF0 < E1{x} − E1{S(x)} .          (13.9)

The rhs of this inequality corresponds to the value of the maximization program
(13.2). Given that the incentive constraint is binding in this program, this value
is strictly smaller than the value of the same program without the incentive
constraint; hence, just as we saw in the two-outcome case, the value is smaller
than full-information profits, E1{x} − sF1. Observe, therefore, that it is possible
that

E1{x} − sF1 > E0{x} − sF0 > E1{x} − E1{S(x)} .

In other words, under full information, the principal would induce a = 1, but
not if there’s a hidden-action problem. In this case, imperfect observability of
the agent’s action imposes a cost on the principal that may induce her to distort
the action that she induces the agent to take.

Monotonicity of the Optimal Contract 13.5
Let us suppose that (13.9) is, indeed, satisfied so that the contract S(·) de-
rived above is the optimal contract. Can we exhibit additional and meaningful
assumptions that would imply interesting properties of the optimal contract?
We begin with monotonicity, the idea that better outcomes (higher xs)
should mean greater compensation for the agent. As we saw above (page 185),
there is no guarantee in the multiple-outcome model that this property should
hold everywhere. From (13.8), it does hold if and only if the likelihood ratio,
r(·), is non-increasing everywhere and strictly decreasing at least somewhere. As this is an important
property, it has a name:

Definition 13.1 The likelihood ratio r(x) = f0(x)/f1(x) satisfies the monotone
likelihood ratio property (mlrp) if r(·) is non-increasing almost everywhere and
strictly decreasing on at least some set of outcomes that occur with positive
probability given action a = 1.

The mlrp states that the greater is the outcome (i.e., x), the greater the relative
probability of x given a = 1 than given a = 0. In other words, under mlrp,
better outcomes are more likely when the agent pursues the desired action than
when he doesn’t. To summarize:

Proposition 13.1 In the model of this section, if the likelihood ratio, r(·),
satisfies the monotone likelihood ratio property, then the optimal incentive
contract for inducing the agent to choose a = 1 is non-decreasing everywhere.

In fact, because we know that S(·) can’t be constant—a fixed-payment contract


is not incentive compatible for inducing a = 1—we can further conclude that
S(·) must, therefore, be increasing over some set of x.12
Is mlrp a reasonable assumption? To some extent it is simply a strengthening
of our assumption that E1{x} > E0{x}, given it can readily be shown that mlrp
implies E1{x} > E0{x}.13 Moreover, many standard distributions satisfy mlrp.
But it is quite easy to exhibit meaningful distributions that do not. For instance,
consider our example above (page 185). We could model these distributions as
the consequence of a two-stage stochastic phenomenon: with probability 1/3,
an event occurs that guarantees the low outcome (i.e., xL = 0) regardless of
the agent’s action. With probability 2/3, this event does not occur and it is
“business as usual,” with the possible outcomes being xM and xH ; the former
is more likely than the latter if the agent doesn’t choose a = 1 and is less likely
than the latter if he does. These compound distributions do not satisfy mlrp.
In such a situation, mlrp is not acceptable and the optimal reward schedule is
not monotonic, as we saw.

12 Even if mlrp does not hold, r(·) must be decreasing over some measurable range. To see

this, suppose it were not true; that is, suppose that r(·) is almost everywhere non-decreasing
under distribution F1 . Note this entails that x and r(x) are non-negatively correlated under
F1 . To make the exposition easier, suppose for the purpose of this aside that fa (·) = Fa′ (·).
Then

E1{ f0(x)/f1(x) } = ∫_0^∞ ( f0(x)/f1(x) ) f1(x) dx = ∫_0^∞ f0(x) dx = 1 .

Because x and r(x) are non-negatively correlated, we have

∫_0^∞ x( r(x) − E1{r(x)} ) f1(x) dx ≥ 0 .

Substituting, this implies

0 ≤ ∫_0^∞ x( f0(x)/f1(x) − 1 ) f1(x) dx = ∫_0^∞ x f0(x) dx − ∫_0^∞ x f1(x) dx = E0{x} − E1{x} .

But this contradicts our assumption that a = 1 yields a greater expected outcome than does
a = 0. Hence, by contradiction it must be that r(·) is decreasing over some measurable
set. But then this means that S(·) is increasing over some measurable set as well. However,
without mlrp, we can’t conclude that it’s not also decreasing over some other measurable set.
Conclusion: If E0 {x} < E1 {x}, then S(·) is increasing over some set of x that has positive
probability of occurring given action a = 1 even if mlrp does not hold.
13 This can most readily be seen from the previous footnote: simply assume that r(·) satisfies

mlrp, which implies x and r(x) are negatively correlated. Then, following the remaining steps,
it quickly falls out that E1 {x} > E0 {x}.

Although the early literature on agency devoted considerable attention to


the monotonicity issue, it may have been overemphasized. If we return to the
economic reality that the mathematics seeks to capture, the discussion relies
on the assumption that a payment schedule with the feature that the agent is
penalized for improving outcomes does not actually induce a new agency problem.
To wit, couldn’t such “perverse” incentives encourage the agent to sabotage the
outcome? In other words, if the agent can freely and secretly diminish his
performance, then it makes no sense for the principal to have a reward schedule
that is decreasing with performance over some range. In short, there is often
an economic justification for monotonicity even when mlrp doesn’t hold.

Informativeness of the Performance Measure 13.6
In this section, we explore again the role played by the informativeness of the
performance measure. In particular, we ask what if the principal has multiple
performance measures on which she could base a contract?
To be more concrete, suppose that there is a second performance measure
y. For example, if the agent is a salesperson, x could be his sales of one good,
y his sales of a second good. Both outcomes speak to the salesperson’s overall
effort.
Let f0 (x, y) and f1 (x, y) denote the joint probability densities of x and y
for actions 0 and 1, respectively. An incentive contract can now be a function
of both performance variables; that is, s = S(x, y). It is immediate that the
same approach as before carries through and yields the following optimality
condition:14

u′( S(x, y) )( λ + µ( 1 − f0(x, y)/f1(x, y) ) ) − 1 = 0 .          (13.10)
When is it optimal to make compensation a function of y as well as of x?
The answer is straightforward: when the likelihood ratio,
r(x, y) = f0(x, y)/f1(x, y) ,
actually depends upon y. Conversely, when the likelihood ratio is independent
of y, then there is no gain from contracting on y to induce a = 1; indeed, it
would be sub-optimal in this case because such a compensation scheme would
fail to satisfy (13.10).
The likelihood ratio is independent of y if and only if the following holds:
there exist three functions h(·, ·), g0 (·), and g1 (·) such that, for all (x, y),

fa (x, y) = h(x, y)ga (x). (13.11)

14 Although we use the same letters for the Lagrange multipliers, it should be clear that

their values at the optimum are not related to their values in the previous, one-performance-
measure, contracting problem.

Sufficiency is obvious: divide f0 (x, y) by f1 (x, y) and observe the resulting ra-
tio, g0 (x)/g1 (x), is independent of y. Necessity is also straightforward: set
h(x, y) = f1 (x, y), g1 (x) = 1, and g0 (x) = r(x). This condition of multiplica-
tive separability, (13.11), has a well-established meaning in statistics: if (13.11)
holds, then x is a sufficient statistic for the action a given data (x, y). In words,
were we trying to infer a, our inference would be just as good if we observed
only x as it would be if we observed the pair (x, y). That is, conditional on
knowing x, y tells us nothing more about a.
The irrelevance of y when x is a sufficient statistic for a is quite intuitive.
Recall that the value of performance measures to our contracting problem rests
solely on their statistical properties. The optimal contract should be based on
all performance measures that convey information about the agent’s decision;
but it is not desirable to include performance measures that are statistically
redundant with other measures. As a corollary, there is no gain from considering
ex post random contracts (e.g., a contract that based rewards on x + η, where η
is some random variable—noise—distributed independently of a that is added
to x). As a second corollary, if the principal could freely eliminate noise in the
performance measure—that is, switch from observing x + η to observing x—she
would do better (at least weakly).
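The factorization condition is easy to verify numerically: build densities of the form fa(x, y) = h(x, y)ga(x) from made-up positive functions on a small grid (the particular h, g0, g1 below are arbitrary assumptions) and check that the likelihood ratio never varies with y.

```python
# Made-up positive h, g0, g1 on a small grid; any such choice works
xs, ys = [1, 2, 3], [1, 2]
h = lambda x, y: 1 / (x + y)
g0 = lambda x: x / 6
g1 = lambda x: (4 - x) / 6

Z0 = sum(h(x, y) * g0(x) for x in xs for y in ys)   # normalizing constants
Z1 = sum(h(x, y) * g1(x) for x in xs for y in ys)
f0 = lambda x, y: h(x, y) * g0(x) / Z0
f1 = lambda x, y: h(x, y) * g1(x) / Z1

for x in xs:
    ratios = {round(f0(x, y) / f1(x, y), 12) for y in ys}
    assert len(ratios) == 1      # r(x, y) does not depend on y
```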

Conclusions from the Two-action Model 13.7
It may be worth summing up all the conclusions we have reached from the
two-action model in a proposition:

Proposition 13.2 If the agent is strictly risk averse, there is no shifting sup-
port, and the principal seeks to implement the action the agent finds costly (i.e.,
a = 1), then the principal’s expected payoffs are smaller than under full (perfect)
information. In some instances, this reduction in expected payoffs may lead the
principal to implement the less costly action (i.e., a = 0).

• When (13.9) holds, the reward schedule imposes risk on the risk-averse
agent: performances that are more likely when the agent takes the correct
action a = 1 are rewarded more than performances that are more likely
under a = 0.
• Under mlrp (or when the agent can sabotage outcomes), the optimal re-
ward schedule is non-decreasing in performance.
• The optimal reward schedule depends only upon performance measures that
are sufficient statistics for the agent’s action.

To conclude, let me stress the two major themes that I would like you to
remember from this section. First, imperfect information implies that the con-
tractual reward designed by the principal should perform two tasks: share the
risks involved in the relationship and provide incentives to induce the agent

to undertake the desired action. Except in trivial cases (e.g., a risk-neutral


agent or a shifting support), these two goals are in conflict. Consequently, the
optimal contract may induce an inefficient action and a Pareto suboptimal shar-
ing of risk.15 Second, the optimal reward schedule establishes a link between
rewards and performances that depends upon the statistical properties of the
performance measures with respect to the agent’s action.

Bibliographic Notes

The above analysis is fairly standard. The two-step approach—first determine,


separately, the optimal contracts for implementing a = 0 and a = 1, then choose
which yields greater profits—is due to Grossman and Hart (1983). The analysis,
in the two-outcome case, when q varies is also based on their work. They also
consider the monotonicity of the optimal contract, although our analysis here
draws more from Holmstrom (1979). Holmstrom is also the source for the
sufficient-statistic result. Finally, the expression
 
1/u′(S(x)) = λ + µ (1 − f0 (x)/f1 (x)) ,

which played such an important part in our analysis, is sometimes referred to


as the modified Borch sharing rule, in honor of Borch (1968), who worked out
the rules for optimal risk sharing absent a moral-hazard problem (hence, the
adjective “modified”).
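To see how the modified Borch rule shapes pay, here is a small numerical sketch (the utility function, multipliers, and densities are invented for illustration, not taken from the text): assume u(s) = 2√s, so 1/u′(s) = √s and the rule gives S(x) = (λ + µ(1 − f0(x)/f1(x)))².

```python
# Sketch of the modified Borch rule under the (assumed) utility
# u(s) = 2*sqrt(s), for which 1/u'(s) = sqrt(s), so the rule implies
#   S(x) = (lam + mu * (1 - f0(x)/f1(x)))**2.
# The multipliers and densities below are illustrative only.

lam, mu = 2.0, 1.0  # multipliers on the IR and IC constraints (assumed positive)

f0 = {1: 0.5, 2: 0.3, 3: 0.2}  # density over outcomes under a = 0
f1 = {1: 0.2, 2: 0.3, 3: 0.5}  # density under a = 1 (MLRP: f0/f1 falls in x)

def reward(x):
    """S(x) implied by the modified Borch sharing rule."""
    return (lam + mu * (1.0 - f0[x] / f1[x])) ** 2

schedule = [reward(x) for x in (1, 2, 3)]
# Under MLRP the likelihood ratio f0/f1 is decreasing in x, so S is increasing:
assert schedule[0] < schedule[1] < schedule[2]
```

Outcomes that are relatively more likely under a = 1 carry a lower likelihood ratio f0/f1 and hence a higher reward, which is precisely the monotonicity result summarized above.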

15 Think of this as yet another example of the two-instruments principle.


192 Lecture Note 13: Basic Two-Action Model
General Framework 14
As we’ve just seen, the two-action model yields strong results. But the model
incorporates a lot of structure and it relies on strong assumptions. Consequently,
it’s hard to understand which findings are robust and which are merely artifacts
of an overly simple formalization. The basic ideas behind the incentive model
are quite deep and it is worthwhile, therefore, to consider whether and how they
generalize in less constrained situations.
Our approach is to propose a very general framework that captures the
situation described in the opening section. Such generality comes at the cost of
tractability, so we will again find ourselves making specific assumptions. But
doing so, we will try to motivate the assumptions we have to make and discuss
their relevance or underline how strong they are.
The situation of incentive contracting under hidden action or imperfect mon-
itoring involves:

• a principal;

• an agent;

• a set of possible actions, A, from which the agent chooses (we take A to
be exogenously determined here);

• a set of verifiable signals or performance measures, X ;

• a set of benefits, B, for the principal that are affected by the agent’s action
(possibly stochastically);

• rules (functions, distributions, or some combination) that relate elements


of A, X , and B;

• preferences for the principal and agent; and

• a bargaining game that establishes the contract between principal and


agent (here, recall, we’ve fixed the bargaining game as the principal makes
a take-it-or-leave-it offer, so that the only element of the bargaining game
of interest here is the agent’s reservation utility, UR ).1

1 We could also worry about whether the principal wants to participate—even make a take-

it-or-leave-it offer—but because our focus is on the contract design and its execution, stages
of the game not reached if she doesn’t wish to participate, we will not explicitly consider this
issue here.


In many settings, including the one explored above, the principal’s benefit is
the same as the verifiable performance measure (i.e., b = x). But this need not
be the case. We could, for instance, imagine that there is a function mapping
the elements of A onto B. For example, the agent’s action could be fixing
the “true” quality of a product produced for the principal. This quality is also,
then, the principal’s benefit (i.e., b = a). The only verifiable measure of quality,
however, is some noisy (i.e., stochastic) measure of true quality (e.g., x = a + η,
where η is some randomly determined distortion). As yet another possibility,
the benchmark case of full information entails X = X ′ × A, where X ′ is some
set of performance measures other than the action.
We need to impose some structure on X and B and their relationship to A:
We take X to be a Euclidean vector space and we let dF (·|a) denote the prob-
ability measure over X conditional on a. Similarly, we take B to be a Euclidean
vector space and we let dG (·, ·|a) denote the joint probability measure over B
and X conditional on a (when b ≡ x, we will write dF (·|a) instead of dG (·, ·|a)).
This structure is rich enough to encompass the possibilities enumerated in the
previous paragraph (and more).
Although we could capture the preferences of the principal and agent without
assuming the validity of the expected-utility approach to decision-making under
uncertainty (we could, for instance, take as primitives the indifference curves
shown in Figures 13.1 and 13.2), this approach has not been taken in the lit-
erature.2 Instead, the expected-utility approach is assumed to be valid and we
let W (s, x, b) and U (s, x, a) denote the respective von Neumann-Morgenstern
utilities of the principal and of the agent, where s denotes the transfer from the
principal to the agent (to principal from agent if s < 0).
In this situation, the obvious contract is a function that maps X into R. We
define such a contract as

Definition 14.1 A simple incentive contract is a reward schedule S : X → R


that determines the level of reward s = S(x) to be paid as a function of the
realized performance level x.

There is admittedly no other verifiable variable that can be used to write


more elaborate contracts. There is, however, the possibility of creating verifiable
variables, by having one or the other or both players take verifiable actions from
some specified action spaces. Consistent with the mechanism-design approach,
the most natural interpretation of these new variables is that they are public
announcements made by the players; but nothing that follows requires this
interpretation. For example, suppose both parties have to report to the third
party charged with enforcing the contract their observation of x, or the agent
2 Given some well-documented deficiencies in expected-utility theory (see, e.g., Epstein,

1992; Rabin, 1997), this might, at first, seem somewhat surprising. However, as Epstein, §2.5,
notes, many of the predictions of expected-utility theory are robust to relaxing some of the
more stringent assumptions that support it (e.g., the independence axiom). Given the
tractability of the expected-utility theory combined with the general empirical support for the
predictions of agency theory, the gain from sticking with expected-utility theory would seem
to outweigh the losses, if any, associated with that theory.

must report which action he has chosen. We could even let the principal make
a “good faith” report of what action she believes the agent took, although
this creates its own moral-hazard problem because, in most circumstances, the
principal could gain ex post by claiming she believes the agent’s action was
unacceptable. It turns out, as we will show momentarily, that there is nothing
to be gained by considering such elaborate contracts; that is, there is no such
contract that can improve over the optimal simple contract.
To see this, let us suppose that a contract determines a normal-form game to
be played by both players after the agent has taken his action.3,4 In particular,
suppose the agent takes an action h ∈ H after choosing his action, but prior to
the realization of x; that he takes an action m ∈ M after the realization of x; and
that the principal also takes an action n ∈ N after x has been realized. One or
more of these sets could, but need not, contain a single element, a “null” action.
We assume that the actions in these sets are costless—if we show that costless
elaboration does no better than simple contracts, then costly elaboration also
cannot do better than simple contracts. Finally, let the agent’s compensation
under this elaborate contract be: s = S̃(x, h, m, n). We can now establish the
following:

Proposition 14.1 (Simple contracts are sufficient) For any general con-
tract ⟨H, M, N , S̃(·)⟩ and associated (perfect Bayesian) equilibrium, there ex-
ists a simple contract S(·) that yields the same equilibrium outcome.

Proof: Consider a (perfect Bayesian) equilibrium of the original contract, in-


volving strategies (a∗ , h∗ (·) , m∗ (·, ·)) for the agent, where h∗ (a) and m∗ (a, x)
describe the agent’s choice within H and M after he’s taken action a and per-
formance x has been observed. Similarly, n∗ (x) gives the principal’s choice of
action as a function of the observed performance. Let us now consider the simple
contract defined as follows: For all x ∈ X ,

S(x) ≡ S̃(x, h∗ (a∗ ) , m∗ (x, a∗ ), n∗ (x)).

Suppose that, facing this contract, the agent chooses an action a different from
a∗ . This implies that:
∫_X U (S(x), x, a) dF (x|a) > ∫_X U (S(x), x, a∗) dF (x|a∗),

3 Note this may require that there be some way that the parties can verify that the agent

has taken an action. This may simply be the passage of time: The agent must take his action
before a certain date. Alternatively, there could be a verifiable signal that the agent has acted
(but which does not reveal how he’s acted).
4 Considering an extensive-form game with the various steps just considered would not alter

the reasoning that follows; so, we avoid these unnecessary details by restricting attention to
a normal-form game.

or, using the definition of S(·),


∫_X U (S̃(x, h∗(a∗), m∗(x, a∗), n∗(x)), x, a) dF (x|a) >
∫_X U (S̃(x, h∗(a∗), m∗(x, a∗), n∗(x)), x, a∗) dF (x|a∗).

Because, in the equilibrium of the normal-form game that commences after the
agent chooses his action, h∗ (·) and m∗ (·, ·) must satisfy the following inequality:
∫_X U (S̃(x, h∗(a), m∗(x, a), n∗(x)), x, a) dF (x|a) ≥
∫_X U (S̃(x, h∗(a∗), m∗(x, a∗), n∗(x)), x, a) dF (x|a),

it follows that
∫_X U (S̃(x, h∗(a), m∗(x, a), n∗(x)), x, a) dF (x|a) >
∫_X U (S̃(x, h∗(a∗), m∗(x, a∗), n∗(x)), x, a∗) dF (x|a∗).

This contradicts the fact that a∗ is an equilibrium action in the game defined
by the original contract. Hence, the simple contract S(·) gives rise to the same
action choice, and therefore to the same distribution of outcomes as the more
complicated contract.

As a consequence, there is no need to consider sophisticated announcement


mechanisms in this setting, at least in the simple situation we have described.
The style in which we proved Proposition 14.1 should be familiar.
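The construction in the proof is mechanical enough to sketch: fix the equilibrium strategies h∗, m∗(·,·), and n∗(·), and evaluate the elaborate schedule S̃ along them. Every object below (the signal space, strategies, and payments) is a hypothetical stand-in, chosen only to make the collapse concrete.

```python
# Sketch of the proof's construction: collapse an elaborate contract
# S_tilde(x, h, m, n) into a simple contract S(x) by plugging in the
# equilibrium strategies. All strategies and payments here are hypothetical.

X = ["low", "high"]
a_star = 1  # the equilibrium action

def h_star(a):            # agent's pre-realization message
    return "h%d" % a

def m_star(x, a):         # agent's post-realization message
    return (x, a)

def n_star(x):            # principal's post-realization action
    return "audit" if x == "low" else "pass"

def S_tilde(x, h, m, n):  # elaborate reward schedule (illustrative)
    base = {"low": 1.0, "high": 4.0}[x]
    return base - (0.5 if n == "audit" else 0.0)

def S(x):
    """Simple contract replicating the elaborate contract's equilibrium path."""
    return S_tilde(x, h_star(a_star), m_star(x, a_star), n_star(x))

# The simple contract pays exactly what the elaborate one pays on path:
for x in X:
    assert S(x) == S_tilde(x, h_star(a_star), m_star(x, a_star), n_star(x))
```

Because S(x) reproduces the on-path payments state by state, it leaves the agent's incentive problem, and hence the equilibrium outcome, unchanged.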
The contracting problem under imperfect information can now easily be
stated. The principal, having the bargaining power in the negotiation process,
simply has to choose a (simple) contract, S(·), so as to maximize her expected
utility from the relationship given two constraints. First, the contract S(·)
induces the agent to choose an action that maximizes his expected utility (i.e.,
the ic constraint must be met). Second, given the contract and the action it
will induce, the agent must receive an expected utility at least as great as his
reservation utility (i.e., the ir constraint must be met). In this general setting,
the ic constraint can be stated as the action induced, a, must satisfy
a ∈ argmax_{a′} ∫_X U (S(x), x, a′) dF (x|a′) . (14.1)

Observe that choosing S(·) amounts to choosing a as well, at least when there
exists a unique optimal choice for the agent. To take care of the possibility of
multiple optima for the agent, one can simply imagine that the principal chooses
a pair (S(·), a) subject to the incentive constraint (14.1). The ir constraint takes
the simple form:

max_{a′} ∫_X U (S(x), x, a′) dF (x|a′) ≥ UR . (14.2)

The principal’s problem is, thus,


max_{(S(·),a)} ∫_X W (S(x), x, b) dG(b, x|a) (14.3)
s.t. (14.1) and (14.2).

Observe, as we did in Lecture Note 13, it is perfectly permissible to solve this


maximization program in two steps. First, for each action a, find the expected-
profit-maximizing contract that implements action a subject to the ic and ir
constraints; this amounts to solving a similar program, taking action a as fixed:
max_{S(·)} ∫_X W (S(x), x, b) dG(b, x|a) (14.4)
s.t. (14.1) and (14.2).

Second, optimize the principal’s objectives with respect to the action to be


implemented; if we let Sa (·) denote the expected-profit-maximizing contract for
implementing a, this second step consists of:
max_{a∈A} ∫_X W (Sa (x), x, b) dG(b, x|a).

In this more general framework, it’s worth revisiting the full-information


benchmark. Before doing that, however, it is worth assuming that the domain
of U (·, x, a) is sufficiently broad:

• Existence of a punishment: There exists some sP in the domain of


U (·, x, a) such that, for all a ∈ A,
∫_X U (sP , x, a) dF (x|a) < UR .

• Existence of a sufficient reward: There exists some sR in the do-


main of U (·, x, a) such that, for all a ∈ A,
∫_X U (sR , x, a) dF (x|a) ≥ UR .

In light of the second assumption, we can always satisfy (14.2) for any action a
(there is no guarantee, however, that we can also satisfy (14.1)).
With these two assumptions in hand, suppose that we’re in the full-infor-
mation case; that is, X = X ′ × A (note X ′ could be a single-element space, so
that we’re also allowing for the possibility that, effectively, the only performance
measure is the action itself). In the full-information case, the principal can rely
on forcing contracts; that is, contracts that effectively leave the agent with no

choice over the action he chooses. Hence, writing (x′ , a) for an element of X , a
forcing contract for implementing â is

S(x′, a) = sP if a ≠ â
         = S F (x′) if a = â,

where S F (·) satisfies (14.2). Given that S F (x′ ) = sR satisfies (14.2) by as-
sumption, we know that we can find an S F (·) function that satisfies (14.2).
In equilibrium, the agent will choose to sign the contract—the ir constraint is
met—and he will take action â since this is his only possibility for getting at
least his reservation utility. Forcing contracts are very powerful because they
transform the contracting problem into a simple ex ante Pareto computation
program:
max_{(S(·),a)} ∫_X W (S(x), x, b) dG(b, x|a) (14.5)
s.t. (14.2),

where only the agent’s participation constraint matters. This ex ante Pareto
program determines the efficient risk-sharing arrangement for the full-infor-
mation optimal action, as well as the full-information optimal action itself. Its
solution characterizes the optimal contract under perfect information.
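A forcing contract is easy to write down concretely. The sketch below (with illustrative payment levels) pays the sufficient reward sR on the desired action and the punishment sP otherwise, regardless of the other performance signal x′, exactly as in the schedule above.

```python
# Sketch of a forcing contract for the full-information case, where an
# element of X is a pair (x_prime, a). The payments s_P and s_R play the
# roles of the punishment and sufficient reward assumed in the text;
# the numerical values are illustrative only.

s_P = -100.0   # punishment: drives expected utility below U_R
s_R = 10.0     # reward satisfying the IR constraint

def forcing_contract(a_hat):
    """Return S(x_prime, a): pay s_R on the desired action, s_P otherwise."""
    def S(x_prime, a):
        return s_R if a == a_hat else s_P
    return S

S = forcing_contract(a_hat=1)
assert S("any signal", 1) == s_R   # desired action: IR-satisfying pay
assert S("any signal", 0) == s_P   # any deviation is punished, whatever x_prime
```

Since every deviation yields expected utility below UR, the agent's only way to secure his reservation utility is to take â, which is why only the participation constraint binds in (14.5).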
At this point, we’ve gone about as far as we can go without imposing more
structure on the problem. The next couple of Lecture Notes consider more
structured variations of the problem.
The Finite Model 15
In this Lecture Note, we will assume A, the set of possible actions, is finite with
J elements. Likewise, the set of possible verifiable performance measures, X ,
is also taken to be finite with N elements, indexed by n (although, at the end
of this section, we’ll discuss the case where X = R). In many ways, this is the
most general version of the principal-agent model (but, alas, not necessarily the
most analytically tractable one).
Assume that the agent’s utility is additively separable between payments
and action. Moreover, it is not, directly, dependent on performance. Hence,

U (s, x, a) = u (s) − c(a);

where u : S → R maps some subset S of R into R and c : A → R maps the


action space into R. As before, assume that S = (s̲, ∞), where u(s) → −∞ as
s ↓ s̲. Observe that this assumption entails the existence of a punishment, sP ,
as described previously. We further assume that u(·) is strictly monotonic
and concave (at least weakly). Typically, we will assume that u(·) is, in fact,
strictly concave, implying the agent is risk averse. Note that the monotonicity
of u(·) implies that the inverse function u−1 (·) exists and, since u(S) = R, is
defined for all u ∈ R.
We assume, now, that B ⊂ R and that the principal’s utility is a function
only of the difference between her benefit, b, and her payment to the agent; that
is,
W (s, x, b) = w (b − s) ,
where w (·) is assumed to be strictly increasing and concave. In fact, in most
applications, it is reasonable to assume that the principal is risk neutral. We
will maintain that assumption here (the reader interested in the case of a risk-
averse principal should consult Holmstrom, 1979, among other work). In what
follows, let B(a) = E{b|a}.
In addition to being discrete, we assume that there exists some partial order
on X (i.e., to give meaning to the idea of “better” or “worse” performance) and
that, with respect to this partial order, X is a chain (i.e., if ⪯ is the partial
order on X , then x ⪯ x′ or x′ ⪯ x for any two elements, x and x′ in X ). Because
identical signals are irrelevant, we may also suppose that no two elements of X
are the same (i.e., x ≺ x′ or x′ ≺ x for any two elements in X ). The most
natural interpretation is that X is a subset of distinct real numbers—different
“performance scores”—with ≤ as the partial order. Given these assumptions,
we can write X = {x1 , . . . , xN }, where xm ≺ xn if m < n. Likewise the


distribution function F (x|a) gives, for each x, the probability that an x′ ⪯ x
is realized conditional on action a. The corresponding density function is then
defined by

f (xn |a) = F (x1 |a) if n = 1,
f (xn |a) = F (xn |a) − F (xn−1 |a) if n > 1.
In much of what follows, it will be convenient to write fn (a) for f (xn |a). It
will also be convenient to write the density as a vector:

f (a) = (f1 (a) , . . . , fN (a)) .
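The conversion from the distribution F(·|a) to the density vector f(a) is a one-line first-difference operation; a minimal sketch (with an illustrative CDF):

```python
# Converting the distribution F(x_n | a) into the density f(x_n | a),
# following the definition in the text: f_1 = F(x_1 | a) and
# f_n = F(x_n | a) - F(x_{n-1} | a) for n > 1. Numbers are illustrative.

def density_from_cdf(F):
    """F is the list (F(x_1|a), ..., F(x_N|a)); returns (f_1, ..., f_N)."""
    return [F[0]] + [F[n] - F[n - 1] for n in range(1, len(F))]

F_a = [0.2, 0.5, 1.0]          # a valid CDF over x_1 < x_2 < x_3
f_a = density_from_cdf(F_a)
assert abs(f_a[0] - 0.2) < 1e-12
assert abs(f_a[1] - 0.3) < 1e-12
assert abs(f_a[2] - 0.5) < 1e-12
assert abs(sum(f_a) - 1.0) < 1e-12  # densities sum to one
```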

The “Two-step” Approach 15.1


As we have done already in Lecture Note 13, we will pursue a two-step approach
to solving the principal-agent problem:

Step 1: For each a ∈ A, the principal determines whether it can be imple-


mented. Let AI denote the set of implementable actions. For each a ∈ AI ,
the principal determines the least-cost contract for implementing a sub-
ject to the ic and ir constraints. Let C (a) denote the principal’s expected
cost (expected payment) of implementing a under this least-cost contract.
Step 2: The principal then determines the solution to the maximization prob-
lem
max B (a) − C (a) .
a∈AI

If a∗ is the solution to this maximization problem, the principal offers the


least-cost contract for implementing a∗ .

Note that this two-step process is analogous to a standard production prob-


lem, in which a firm, first, solves its cost-minimization problems to determine
the least-cost way of producing any given amount of output (i.e., derives its
cost function); and, then, it produces the amount of output that maximizes the
difference between revenues (benefits) and cost. As with production problems,
the first step is generally the harder step.
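The two-step logic can be sketched as code. Step 1's output, the implementation-cost function C(·) over the implementable set, is taken as given here (the tables are hypothetical placeholders); Step 2 is then a one-line maximization.

```python
# Skeleton of the two-step approach. B is the principal's expected benefit
# and C the least-cost implementation cost (the output of Step 1); both
# tables below are hypothetical placeholders for the objects in the text.

B = {"a0": 5.0, "a1": 9.0, "a2": 12.0}   # B(a) = E{b | a}
C = {"a0": 2.0, "a1": 4.0, "a2": 11.0}   # C(a) from the least-cost contracts
implementable = {"a0", "a1", "a2"}        # the set A_I

def step_two(B, C, implementable):
    """Step 2: pick the implementable action maximizing B(a) - C(a)."""
    return max(implementable, key=lambda a: B[a] - C[a])

a_star = step_two(B, C, implementable)
assert a_star == "a1"    # 9 - 4 = 5 beats 5 - 2 = 3 and 12 - 11 = 1
```

As the production analogy suggests, all of the real work hides inside the table C: each entry is itself the value of a constrained cost-minimization program.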

The full-information benchmark

As before, we consider as a benchmark the case where the principal can observe
and verify the agent’s action. Consequently, as we discussed at the end of
Lecture Note 14, the principal can implement any action â that she wants using
a forcing contract: The contract punishes the agent sufficiently for choosing
actions a ≠ â that he would never choose any action other than â; and the
contract rewards the agent sufficiently for choosing â that he is just willing to
sign the principal’s contract. This last condition can be stated formally as

u (ŝ) − c (â) = UR , (irF )



where ŝ is what the agent is paid if he chooses action â. Solving this last
expression for ŝ yields

ŝ = u−1 (UR + c (â)) ≡ C F (â) .

The function C F (·) gives the cost, under full information, of implementing ac-
tions.
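The full-information cost is just u⁻¹ evaluated at the agent's required utility. A numerical sketch, assuming (for illustration only) u(s) = 2√s so that u⁻¹(v) = (v/2)²:

```python
# Full-information implementation cost C_F(a) = u^{-1}(U_R + c(a)).
# Assume (for illustration only) u(s) = 2*sqrt(s), so u^{-1}(v) = (v/2)**2.
# The reservation utility and disutility levels below are hypothetical.

U_R = 4.0
c = {"easy": 0.0, "hard": 2.0}   # hypothetical disutility of each action

def u_inv(v):
    return (v / 2.0) ** 2

def C_F(a):
    """Cost of the forcing contract that just meets u(s_hat) - c(a) = U_R."""
    return u_inv(U_R + c[a])

assert C_F("easy") == 4.0   # u(4) = 2*2 = 4 = U_R + 0
assert C_F("hard") == 9.0   # u(9) = 2*3 = 6 = U_R + 2
```

Note the convexity of u⁻¹ at work: the harder action requires two extra utils, but more than twice the extra pay.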

The hidden-action problem

Now, and henceforth, we assume that a hidden-action problem exists. Conse-


quently, the only feasible contracts are those that make the agent’s compensation
contingent on the verifiable performance measure. Let s(x) denote the payment
made to the agent under such a contract if x is realized. It will prove convenient
to write sn for s(xn ) and to consider the compensation vector s = (s1 , . . . , sN ).
The optimal—expected-cost-minimizing—contract for implementing â (assum-
ing it can be implemented) is the contract that solves the following program:1

min_{s} f (â) · s

subject to
Σ_{n=1}^{N} fn (â)u(sn ) − c(â) ≥ UR

(the ir constraint) and


â ∈ argmax_{a} Σ_{n=1}^{N} fn (a)u(sn ) − c(a)

(the ic constraint—see (14.1)). Observe that an equivalent statement of the ic


constraint is
Σ_{n=1}^{N} fn (â)u(sn ) − c(â) ≥ Σ_{n=1}^{N} fn (a)u(sn ) − c(a) ∀a ∈ A .

As we’ve seen above, it is often easier to work in terms of utility payments


than in terms of monetary payments. Specifically, because u(·) is invertible,
we can express a contract as an N -dimensional vector of contingent utilities,
u = (u1 , . . . , uN ), where un = u(sn ). Using this “trick,” the principal’s program
becomes
min_{u} Σ_{n=1}^{N} fn (â)u−1 (un ) (15.1)

subject to
f (â) · u − c(â) ≥ UR (ir)

1 Observe, given the separability between the principal’s benefit and cost, minimizing her

expected wage payment is equivalent to maximizing her expected profit.



and
f (â) · u − c(â) ≥ f (a) · u − c(a) ∀a ∈ A . (ic)

Definition 15.1 An action â is implementable if there exists at least one con-


tract solving ( ir) and ( ic).

A key result is the following:

Proposition 15.1 If â is implementable, then there exists a contract that im-


plements â and satisfies ( ir) as an equality. Moreover, ( ir) is met as an equality
(i.e., is binding) under the optimal contract for implementing â.

Proof: Suppose not: let u be a contract that implements â and suppose that

f (â) · u − c(â) > UR .

Define
ε = f (â) · u − c(â) − UR .

By assumption, ε > 0. Consider a new contract, ũ, where ũn = un − ε. By


construction, this new contract satisfies (ir). Moreover, because

f (a) · ũ = f (a) · u − ε

for all a ∈ A, this new contract also satisfies (ic). Observe, too, that this new
contract is superior to u: it satisfies the constraints, but costs the principal less.
Hence, a contract cannot be optimal unless (ir) is an equality under it.

In light of this proposition, it follows that an action â can be implemented


if there is a contract u that solves the following system:

f (â) · u − c(â) = UR (15.2)

and

f (a) · u − c(a) ≤ UR ∀a ∈ A\{â} (15.3)

(where (15.3) follows from (ic) and (15.2)). We are now in position to establish
the following proposition:

Proposition 15.2 Action â is implementable if and only if there is no strategy


for the agent that induces the same density over signals as â and which costs
the agent less, in terms of expected disutility, than â (where “strategy” refers to
mixed, as well as pure, strategies).

Proof: Let j = 1, . . . , J − 1 index the elements in A other than â. Then the
system (15.2) and (15.3) can be written as J + 1 inequalities:

f (â) · u ≤ UR + c(â)
[−f (â)] · u ≤ −UR − c(â)
f (a1 ) · u ≤ UR + c(a1 )
..
.
f (aJ−1 ) · u ≤ UR + c(aJ−1 )

By a well-known result in convex analysis (see, e.g., Rockafellar, 1970, page 198),
there is a u that solves this system if and only if there is no vector

µ = (µ̂+ , µ̂− , µ1 , . . . , µJ−1 ) ≥ 0J+1

(where 0K is a K-dimensional vector of zeros) such that


µ̂+ f (â) + µ̂− [−f (â)] + Σ_{j=1}^{J−1} µj f (aj ) = 0N (15.4)

and

µ̂+ (UR + c(â)) + µ̂− (−UR − c(â)) + Σ_{j=1}^{J−1} µj (UR + c(aj )) < 0 . (15.5)

Observe that if such a µ exists, then (15.5) entails that not all elements can be
zero. Define µ∗ = µ̂+ −µ̂− . By post-multiplying (15.4) by 1N (an N -dimensional
vector of ones), we see that
µ∗ + Σ_{j=1}^{J−1} µj = 0. (15.6)

Equation (15.6) implies that µ∗ < 0. Define σj = µj /(−µ∗ ). By construction


each σj ≥ 0 (with at least some being strictly greater than 0) and, from (15.6),
Σ_{j=1}^{J−1} σj = 1. Hence, we can interpret these σj as probabilities and, thus, as
a mixed strategy over the elements of A\{â}. Finally, dividing both sides of
(15.4) and (15.5) by −µ∗ and rearranging, we see that (15.4) and (15.5) are
equivalent to
f (â) = Σ_{j=1}^{J−1} σj f (aj ) (15.7)

and
c(â) > Σ_{j=1}^{J−1} σj c(aj ) ; (15.8)

that is, there is a contract u that solves the above system of inequalities if and
only if there is no (mixed) strategy that induces the same density over the per-
formance measures as â (i.e., satisfies (15.7)) and that has lower expected cost
(i.e., satisfies (15.8)).

The truth of the necessity condition (only if part) of Proposition 15.2 is


straightforward: were there such a strategy—one that always produced the
same expected utility over outcomes as â, but which cost the agent less than
â—then it would clearly be impossible to implement â as a pure strategy. What
is less obvious is the sufficiency (if part) of the proposition. Intuitively, if the
density over the performance measure induced by â is distinct from the density
induced by any other strategy, then the performance measure is informative
with respect to determining whether â was the agent’s strategy or whether he
played a different strategy. Because the range of u(·) is unbounded, even a
small amount of information can be exploited to implement â by rewarding the
agent for performance that is relatively more likely to occur when he plays the
strategy â, or by punishing him for performance that is relatively unlikely to
occur when he plays the strategy â, or both.2 Of course, even if there are other
strategies that induce the same density as â, â is still implementable if the agent
finds these other strategies more costly than â.
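For a small example, the convex-hull condition behind Proposition 15.2 can be checked by hand. With one desired action and two alternatives, f(â) lies in the alternatives' convex hull iff f(â) = σf(a1) + (1 − σ)f(a2) for some σ ∈ [0, 1]; the sketch below (with invented densities and costs) solves for σ from one coordinate and verifies the rest.

```python
# Checking Proposition 15.2 in a small example with one desired action
# a_hat and two alternatives. With two alternatives, f(a_hat) lies in their
# convex hull iff f(a_hat) = sigma*f(a1) + (1 - sigma)*f(a2) for some
# sigma in [0, 1]. All densities and costs below are illustrative.

def mixture_weight(f_hat, f1, f2, tol=1e-9):
    """Return sigma in [0,1] with f_hat = sigma*f1 + (1-sigma)*f2, else None."""
    for k in range(len(f_hat)):
        if abs(f1[k] - f2[k]) > tol:
            sigma = (f_hat[k] - f2[k]) / (f1[k] - f2[k])
            break
    else:
        return None  # f1 == f2: degenerate case, handle separately
    if not (-tol <= sigma <= 1 + tol):
        return None
    ok = all(abs(sigma * f1[k] + (1 - sigma) * f2[k] - f_hat[k]) <= tol
             for k in range(len(f_hat)))
    return sigma if ok else None

f_hat = [0.3, 0.4, 0.3]        # density under a_hat
f_a1  = [0.6, 0.2, 0.2]        # a cheaper alternative
f_a2  = [0.0, 0.6, 0.4]        # another cheaper alternative

sigma = mixture_weight(f_hat, f_a1, f_a2)
assert sigma is not None and abs(sigma - 0.5) < 1e-9
# With c(a_hat) = 3 and c(a1) = c(a2) = 1, the 50/50 mixture costs the agent
# 0.5*1 + 0.5*1 = 1 < 3, so a_hat is NOT implementable in this example.
```

If `mixture_weight` returns `None` for every pair of alternatives (and, more generally, if no mixed strategy replicates f(â)), the unbounded range of u(·) guarantees â is implementable.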
Before solving the principal’s problem (Step 1, page 200), it’s worth con-
sidering, and then dismissing, two “pathological” cases. The first is the ability
to implement least-cost actions at their full-information cost. The second is
the ability to implement any action at its full-information cost when there is a
shifting support (of the right kind).

Definition 15.2 An action ã is a least-cost action if ã ∈ argmin_{a∈A} c(a). That
is, ã is a least-cost action if the agent's disutility from choosing any other action
is at least as great as his disutility from choosing ã.

Proposition 15.3 If ã is a least-cost action, then it is implementable at its
full-information cost.

Proof: Consider the fixed-payment contract that pays the agent un = UR + c(ã)
for all n. This contract clearly satisfies (ir) and, because c(ã) ≤ c(a) for
all a ∈ A, it also satisfies (ic). The cost of this contract to the principal is

2  We can formalize this notion of informationally distinct as follows: the condition that
no strategy duplicate the density over performance measures induced by â is equivalent to
saying that there is no density (strategy) (σ1 , . . . , σJ−1 ) over the other J − 1 elements of A
such that
f (â) = Σ_{j=1}^{J−1} σj f (aj ) .
Mathematically, that’s equivalent to saying that f (â) is not a convex combination of
{f (a)}a∈A\{â} ; or, equivalently, that f (â) is not in the convex hull of {f (a)|a ≠ â}. See
Hermalin and Katz (1991) for more on this “convex-hull” condition and its interpretation. Fi-
nally, from Proposition 15.2, the condition that f (â) not be in the convex hull of {f (a)|a ≠ â}
is sufficient for â to be implementable.


u−1 (UR + c(ã)) = C F (ã), the full-information cost.

Of course, there is nothing surprising about Proposition 15.3: When the principal
wishes to implement a least-cost action, her interests and the agent’s are
perfectly aligned; that is, there is no agency problem. Consequently, it is not
surprising that the full-information outcome obtains.
Definition 15.3 There is a meaningful shifting support associated with action
ã if there exists a subset X0 of X such that F (X0 |a) > 0 = F (X0 |ã) for all
actions a such that c(a) < c(ã).
Proposition 15.4 Let there be a meaningful shifting support associated with
action ã. Then action ã is implementable at its full-information cost.

Proof: Fix some arbitrarily small ε > 0 and define uP = u(s̲ + ε). Consider
the contract u that sets um = uP if xm ∈ X0 (where X0 is defined above) and
that sets un = UR + c(ã) if xn ∉ X0 . It follows that f (ã) · u = UR + c(ã); that
f (a) · u → −∞ as ε ↓ 0 for a such that c(a) < c(ã); and that

f (a) · u − c(a) ≤ UR + c(ã) − c(a) ≤ f (ã) · u − c(ã)

for all a such that c(a) ≥ c(ã). Consequently, this contract satisfies (ir) and (ic).
Moreover, the equilibrium cost of this contract to the principal is u−1 (UR + c(ã)),
the full-information cost.

Intuitively, when there is a meaningful shifting support, observing an x ∈ X0
is proof that the agent took an action other than ã. Because the principal has
this proof, she can punish the agent as severely as she wishes when such an x
appears (in particular, she doesn’t have to worry about how this punishment
changes the risk faced by the agent, given the agent is never in jeopardy of
suffering this punishment if he takes the desired action, ã).3 Moreover, such a
draconian punishment will deter the agent from taking an action that induces
a positive probability of suffering the punishment. In effect, such actions have
been dropped from the original game, leaving ã as a least-cost action of the new
game. It follows, then, from Proposition 15.3, that ã can be implemented at its
full-information cost.
It is worth noting that the full-information benchmark is just a special case of
Proposition 15.4, in which the support of f (a) lies on a separate plane, X ′ × {a},
for each action a.
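The shifting-support contract of Proposition 15.4 is easy to exhibit numerically. The sketch below works directly in utility units (the u-vector "trick" above), with invented densities, costs, and a punishment level standing in for u(s̲ + ε).

```python
# Sketch of the shifting-support contract in Proposition 15.4: pay a severe
# punishment on outcomes in X0 (impossible under a_tilde) and a flat
# IR-satisfying utility elsewhere. Contracts are expressed directly in
# utility units, as in the text; all numbers are illustrative.

U_R = 0.0
c = {"a_tilde": 2.0, "a_cheap": 1.0}
u_P = -1000.0                 # harsh punishment utility, standing in for u(s+eps)

f = {                          # densities over (x0, x1, x2); x0 lies in X0
    "a_tilde": [0.0, 0.5, 0.5],   # a_tilde never produces x0
    "a_cheap": [0.4, 0.3, 0.3],   # the cheaper action can produce x0
}

u_contract = [u_P, U_R + c["a_tilde"], U_R + c["a_tilde"]]

def expected_net_utility(a):
    """f(a) . u - c(a) under the shifting-support contract."""
    return sum(p * u for p, u in zip(f[a], u_contract)) - c[a]

assert expected_net_utility("a_tilde") == U_R          # IR binds exactly
assert expected_net_utility("a_cheap") < U_R           # deviation is deterred
```

The agent bears no risk on the equilibrium path, since the punishment outcome has probability zero under ã, which is why the full-information cost is attained.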
3 It is worth noting that this argument relies on the principal’s being able to punish the
agent sufficiently in the case of an x ∈ X0 . Whether the use of such punishments is really
feasible could, in some contexts, rely on assumptions that are overly strong. First, that
the agent hasn’t (or can contractually waive) protections against severe punishments. For
example, in the English common-law tradition, this is generally not true; moreover, courts
in these countries are generally loath to enforce contractual clauses that are deemed to be
penalties. Second, that the agent has faith in his understanding of the distributions (i.e., he
is sure that taking action ã guarantees that an x ∈ X0 won’t occur). Third, that the agent
has faith in his own rationality; that is, in particular, he is sufficiently confident that he won’t
make a mistake (i.e., choose an a such that F (X0 |a) > 0).

Typically, it is assumed that the action the principal wishes to implement


is neither a least-cost action, nor has a meaningful shifting support associated
with it. Henceforth, we will assume that the action that the principal wishes to
implement, â, is not a least-cost action (i.e., ∃a ∈ A such that c(a) < c(â)).
Moreover, we will rule out all shifting supports by assuming that fn (a) > 0 for
all n and all a.
We now consider whether there is a solution to Step 1 when there is no
shifting support and the action to be implemented is not a least-cost action.
That is, we ask the question: if â is implementable is there an optimal contract
for implementing it? We divide the analysis into two cases: u(·) affine (risk
neutral) and u(·) strictly concave (risk averse).
Proposition 15.5 Assume the agent is risk neutral; that is, u(·) is affine.
Assume, too, that â is implementable. Then â is implementable at its full-
information cost.
Proof: Let u solve (ir) and (ic). From Proposition 15.1, we may assume that (ir) is binding. Then, because u(·) and, thus, u⁻¹(·) are affine:

∑_{n=1}^{N} f_n(â) u⁻¹(u_n) = u⁻¹( ∑_{n=1}^{N} f_n(â) u_n ) = u⁻¹(U_R + c(â)) ,

where the last equality follows from the fact that (ir) is binding.
Note, given that we can't do better than implement an action at full-information cost, this proposition also tells us that, with a risk-neutral agent, an optimal contract exists for inducing any implementable action. The hidden-action problem (the lack of full information) is potentially costly to the principal for two reasons. First, it may mean a desired action is not implementable. Second, even
reasons. First, it may mean a desired action is not implementable. Second, even
if it is implementable, it may be implementable at a higher cost. Proposition
15.5 tells us that this second source of cost must be due solely to the agent’s
risk aversion; an insight consistent with what we saw earlier.
In fact, if we’re willing to assume that the principal’s benefit is alienable—
that is, she can sell the rights to receive it to the agent—and that the agent
is risk neutral, then we can implement the optimal full-information action, a∗
(i.e., the solution to Step 2 under full information) at full-information cost. In
other words, we can achieve the complete full-information solution in this case:
Proposition 15.6 (Selling the store) Assume that the agent is risk neutral;
that is, u(·) is affine. Assume, too, that the principal’s benefit is alienable. Then
the principal can achieve the same expected utility with a hidden-action problem
as she could under full information.
Proof: Note that because u(·) is affine, there is no loss of generality in assuming
it is the identity function.
Under full information, the principal would induce a∗, where

a∗ ∈ argmax_{a∈A} B(a) − C^F(a) .
Define

t∗ = B(a∗) − C^F(a∗) .
Suppose the principal offers to sell the right to her benefit to the agent for t∗ .
If the agent accepts, then the principal will enjoy the same expected payoff she
would have enjoyed under full information. Will the agent accept? If he accepts,
he faces the problem

max_{a∈A} ∫_B (b − t∗) dG(b|a) − c(a) .
This is equivalent to

max_{a∈A} B(a) − c(a) − B(a∗) + c(a∗) + U_R ; or to

max_{a∈A} B(a) − c(a) − U_R − B(a∗) + c(a∗) + 2U_R .

Because B(a) − c(a) − U_R = B(a) − C^F(a), rational play by the agent conditional on accepting means he will choose a∗ if he accepts. If he accepts and plays a∗, his utility will be U_R; hence, he will accept.
People often dismiss the case where the agent is risk neutral by claiming that
there is no agency problem because the principal could “sell the store (produc-
tive asset)” to the agent. As this last proposition makes clear, such a conclusion
relies critically on the ability to literally sell the asset; that is, if the principal’s
benefit is not alienable, then this conclusion might not hold.4 In other words, it
is not solely the agent’s risk aversion that causes problems with a hidden action.
Corollary 15.1 Assume the agent is risk neutral; that is, u(·) is affine. Assume, too, that the principal's benefit equals the performance measure (i.e., B = X and G(·|a) = F(·|a)). Then the principal can achieve the same expected utility with a hidden-action problem as she could under full information.

Exercise 15.1.1: Prove Corollary 15.1. (Hint: let s(x) = x − t, where t is a constant.)
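The hint can be illustrated numerically (this is a sketch, not a proof; the two performance levels, three actions, distributions, and U_R = 0 below are all hypothetical). Under s(x) = x − t, the risk-neutral agent is residual claimant: his problem differs from the full-information program max_a E[x|a] − c(a) only by the constant t, so he chooses the same action regardless of t.

```python
import numpy as np

# Hypothetical primitives (not from the text): two performance levels,
# three actions; row a of f is the distribution f(.|a) over x.
x = np.array([1.0, 4.0])
c = np.array([0.0, 0.5, 1.5])                       # c(a), increasing in a
f = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])  # rows: f(.|a)
U_R = 0.0

Ex = f @ x                                 # E[x|a] for each action

# Full-information benchmark: a* maximizes E[x|a] - c(a).
a_star = int(np.argmax(Ex - c))

# Contract s(x) = x - t: with u affine (taken to be the identity),
# the agent's expected payoff from action a is E[x|a] - t - c(a).
def agent_choice(t):
    return int(np.argmax(Ex - t - c))

# The chosen action is independent of t and equals a*.
assert all(agent_choice(t) == a_star for t in (0.0, 1.0, 2.5))

# Setting t so that (ir) binds leaves the principal the full-information payoff.
t = Ex[a_star] - c[a_star] - U_R
assert abs((Ex[a_star] - t - c[a_star]) - U_R) < 1e-12
```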
Now we turn our attention to the case where u(·) is strictly concave (the
agent is risk averse). Observe (i) this entails that u−1 (·) is strictly convex; (ii)
because S is an open interval, that u(·) is continuous; and (iii) that u−1 (·) is
continuous.
4 To see this, suppose the benefit is unalienable. Assume, too, that A = {1/4, 1/2, 3/4},
X = {1, 2}, c(a) = a, f_2(a) = a, U_R = 0, and B(a) = 4 − 4(a − 1/2)². Then it is readily
seen that a∗ = 1/2. However, from Proposition 15.2, a∗ is not implementable, so the full-information outcome is unobtainable when the action is hidden (even though the agent is risk
neutral).
Proposition 15.7 Assume that the agent is strictly risk averse in income; that is, u(·) is strictly concave. If â is implementable, then there exists a unique contract that implements â at minimum expected cost.

Proof: Existence.5 Define

Ω(u) = ∑_{n=1}^{N} f_n(â) u⁻¹(u_n) .   (15.9)
The strict convexity and continuity of u−1 (·) implies that Ω is also a strictly con-
vex and continuous function. Observe that the principal’s problem is to choose
u to minimize Ω(u) subject to (ir) and (ic). Let U be the set of contracts
that satisfy (ir) and (ic) (by assumption, U is not empty). Were U closed and
bounded, then a solution to the principal’s problem would certainly exist be-
cause Ω is a continuous real-valued function.6 Unfortunately, U is not bounded
(although it is closed given that all the inequalities in (ir) and (ic) are weak
inequalities). Fortunately, we can artificially bound U by showing that any so-
lution outside some bound is inferior to a solution inside the bound. Consider
any contract u0 ∈ U and consider the contract u∗ , where u∗n = UR + c(â). Let
U ir be the set of contracts that satisfy (ir). Note that U ⊂ U ir . Note, too, that
both U and U ir are convex sets.
Exercise 15.1.2: Prove that U is a convex set.

Exercise 15.1.3: Prove that U^ir is a convex set.
Because Ω has a minimum on U^ir, namely u∗, the set

V ≡ { u ∈ U^ir | Ω(u) ≤ Ω(u0) }

is closed, bounded, and convex.7 By construction, U ∩ V is non-empty; moreover, for any u1 ∈ U ∩ V and any u2 ∈ U \ V, Ω(u2) > Ω(u1). Consequently, nothing is lost by limiting the search for an optimal contract to U ∩ V. The set U ∩ V is closed and bounded and Ω is continuous, hence it follows that an optimal contract must exist.
Uniqueness. Suppose the optimal contract, u, were not unique; that is, suppose there exists another contract ũ such that Ω(u) = Ω(ũ) (where Ω(·) is defined by (15.9)). It is readily seen that if these two contracts each satisfy both the (ir) and (ic) constraints, then any convex combination of them must as well (i.e., both are elements of U, which is convex). That is, the contract

u_λ ≡ λu + (1 − λ)ũ ,

λ ∈ (0, 1), must be feasible (i.e., satisfy (ir) and (ic)). Because Ω is strictly convex, Jensen's inequality implies

Ω(u_λ) < λΩ(u) + (1 − λ)Ω(ũ) = Ω(u) .

But this contradicts the optimality of u. By contradiction, uniqueness is established.

5 The existence portion of this proof is somewhat involved mathematically and can be omitted without affecting later comprehension of the material.

6 This is a well-known result from analysis (see, e.g., Fleming, 1977, page 49).

7 The convexity of V follows because Ω is a convex function and U^ir is a convex set. That V is closed follows given U^ir is also closed. To see that V is bounded, recognize that, as one “moves away” from u∗—while staying in U^ir—Ω(u) increases. Because Ω is convex, any such movement away from u∗ must eventually (i.e., for finite u) lead to Ω(u) > Ω(u0) (convex functions are unbounded above). Hence V is bounded.
Having concluded that a solution to Step 1 exists, we can—at last—calculate what it is. From Proposition 15.5, the problem is trivial if u(·) is affine, so we will consider only the case in which it is strictly concave. The principal's problem is a standard nonlinear programming problem: minimize a convex function (i.e., ∑_{n=1}^{N} f_n(â) u⁻¹(u_n)) subject to J constraints (one individual rationality
constraint and J −1 incentive compatibility constraints, one for each action other
than â). If we further assume, as we do henceforth, that u(·) is differentiable,
then standard Lagrange-multiplier techniques can be employed. Specifically, let
λ be the Lagrange multiplier on the ir constraint and let µj be the Lagrange
multiplier on the ic constraint between â and aj , where j = 1, . . . , J − 1 indexes
the elements of A other than â. It is readily seen that the first-order conditions with respect to the contract are

f_n(â) / u′(u⁻¹(u_n)) − λ f_n(â) − ∑_{j=1}^{J−1} μ_j (f_n(â) − f_n(a_j)) = 0 ;   n = 1, . . . , N .
We’ve already seen (Proposition 15.1) that the ir constraint binds, hence λ > 0.
Because â is not a least-cost action and there is no shifting support, it is readily
shown that at least one ic constraint binds (i.e., ∃j such that µj > 0). It’s
convenient to rewrite the first-order condition as
1 / u′(u⁻¹(u_n)) = λ + ∑_{j=1}^{J−1} μ_j (1 − f_n(a_j)/f_n(â)) ;   n = 1, . . . , N .   (15.10)
Note the resemblance between (15.10) and (13.8) in Section 13.4. The difference
is that, now, we have more than one Lagrange multiplier on the actions (as
we now have more than two actions). In particular, we can give a similar
interpretation to the likelihood ratios, fn (aj )/fn (â), that we had in that earlier
section; with the caveat that we now must consider more than one action.
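To see the machinery of (15.10) in a concrete case, here is a numeric sketch with two performance levels and two actions; all numbers are hypothetical, and u(s) = ln s, so that 1/u′(u⁻¹(u_n)) = s_n. With N = 2 and both (ir) and the single (ic) constraint binding, the contract is pinned down directly; one can then back out λ and μ from (15.10), confirm both are positive (as claimed above), and check that the expected cost exceeds the full-information cost u⁻¹(U_R + c(â)), anticipating Proposition 15.8.

```python
import math

# Hypothetical primitives: two outcomes; actions a1 (cost 0) and â (cost 1).
f_hat = [0.25, 0.75]          # f(â)
f_a1  = [0.75, 0.25]          # f(a1)
c_hat, c_a1, U_R = 1.0, 0.0, 0.0

# u(s) = ln s, so u^{-1}(v) = e^v and 1/u'(u^{-1}(u_n)) = s_n.
# Binding (ic): (f(â) - f(a1)) . u = c(â) - c(a1)  =>  u_2 - u_1 = 2 here.
du = (c_hat - c_a1) / (f_hat[1] - f_a1[1])
# Binding (ir): f(â) . u = U_R + c(â).
u1 = U_R + c_hat - f_hat[1] * du
u2 = u1 + du

s = [math.exp(u1), math.exp(u2)]                 # s_n = u^{-1}(u_n)
expected_cost = f_hat[0] * s[0] + f_hat[1] * s[1]
full_info_cost = math.exp(U_R + c_hat)           # u^{-1}(U_R + c(â))
assert expected_cost > full_info_cost            # the hidden action is costly

# Back out the multipliers from (15.10): s_n = λ + μ (1 - f_n(a1)/f_n(â)).
LR = [f_a1[0] / f_hat[0], f_a1[1] / f_hat[1]]    # likelihood ratios (3 and 1/3)
mu = (s[1] - s[0]) / (LR[0] - LR[1])
lam = s[0] - mu * (1.0 - LR[0])
assert lam > 0 and mu > 0                        # (ir) and one (ic) bind
# Summing (15.10) against f(â) shows λ equals the expected cost of the contract.
assert abs(lam - expected_cost) < 1e-9
```

The last assertion is a handy consistency check: weighting (15.10) by f_n(â) and summing makes the μ-term vanish, so λ is exactly the principal's expected payment.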
15.2 Properties of the Optimal Contract
Having solved for the optimal contract, we can now examine its properties. In
particular, we will consider three questions:
1. Under what conditions does the expected cost of implementing an action under the optimal contract for the hidden-action problem exceed the full-information cost of implementing that action?

2. Recall the performance measures, x, constitute distinct elements of a chain. Under what conditions is the agent's compensation increasing with the value of the signal (i.e., when does x ≺ x′ imply s(x) ≤ s(x′))?

3. Consider two principal-agent models that are identical except that the information structure (i.e., {f(a) | a ∈ A}) in one is more informative than the information structure in the other. How do the costs of implementing actions vary between these two models?

The answer to the first question is given by
Proposition 15.8 Consider a hidden-action problem. Assume there is no shifting support (i.e., f_n(a) > 0 for all n and all a). Assume, too, that u(·) is strictly concave (the agent is risk averse). If â is not a least-cost action, then it cannot be implemented at its full-information cost.

Proof: If â is not implementable, then the result is obvious; hence, assume â is implementable. Define U^ir to be the set of all contracts that satisfy the ir constraint for â. Define u∗ as u∗_n = U_R + c(â) for all n. Note u∗ ∈ U^ir. Finally, define

Ω(u) = ∑_{n=1}^{N} f_n(â) u⁻¹(u_n) .
Because u(·) is strictly concave, the principal’s expected cost if the agent chooses
â under contract u, Ω(u), is a strictly convex function of u. By Jensen’s in-
equality and the fact that there is no shifting support, Ω, therefore, has a unique
minimum in U ir , namely u∗ . Clearly, Ω(u∗ ) = C F (â). The result, then, fol-
lows if we can show that u∗ is not incentive compatible. Given that â is not a
least-cost action, there exists an a such that c(â) > c(a). But

f(a) · u∗ − c(a) = U_R + c(â) − c(a) > U_R = f(â) · u∗ − c(â) ;

that is, u∗ is not incentive compatible.
Assuming â to be implementable, note the elements that go into this last propo-
sition: there must be an agency problem—misalignment of interests (i.e., â is
not least cost); there must, in fact, be a significant hidden-action problem (i.e.,
no shifting support); and the agent must be risk averse. We saw earlier that
without any one of these elements, an implementable action is implementable
at full-information cost (Propositions 15.3–15.5); that is, each element is indi-
vidually necessary for cost to increase when we go from full information to a
hidden action. This last proposition shows, inter alia, that they are collectively
sufficient for the cost to increase.
Next we turn to the second question. We already know from our analysis of
the two-action model that the assumptions we have so far made are insufficient
for us to conclude that compensation will be monotonic. From our analysis
of that model, we might expect that we need some monotone likelihood ratio
property. In particular, we assume
MLRP: Assume there is no shifting support. Then the monotone likelihood ratio property is said to hold if, for any a and â in A, c(a) ≤ c(â) implies that f_n(a)/f_n(â) is non-increasing in n.

Intuitively, mlrp is the condition that actions that the agent finds more costly
be more likely to produce better outcomes.
Unlike the two-action case, however, mlrp is not sufficient for us to obtain
monotone compensation (see Grossman and Hart, 1983, for an example in which
mlrp is satisfied but compensation is non-monotone). We need an additional
CDFP: The agency problem satisfies the concavity of distribution function property if, for any a, a′, and â in A,

c(â) = λc(a) + (1 − λ)c(a′) for some λ ∈ (0, 1)

implies that F(·|â) first-order stochastically dominates λF(·|a) + (1 − λ)F(·|a′).8
Another way to state the cdfp is that the distribution over performance is
better—more likely to produce high signals—if the agent plays a pure strategy
than it is if he plays any mixed strategy over two actions when that mixed
strategy has the same expected disutility as the pure strategy.
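As an aside, one family of problems satisfying cdfp automatically is the spanning structure studied in Lecture Note 16: F(·|a) = γ(a)F_H(·) + (1 − γ(a))F_L(·) with γ(·) concave, c(a) = a, and F_H first-order stochastically dominating F_L. The sketch below (hypothetical CDFs on a grid, γ(a) = √a) checks the defining dominance inequality pointwise.

```python
import numpy as np

# Hypothetical spanning structure: F(.|a) = g(a) F_H + (1 - g(a)) F_L,
# with g concave and F_H FOSD F_L (i.e., F_H <= F_L pointwise).
grid = np.linspace(0.0, 1.0, 101)
F_H = grid ** 2          # CDF putting more mass on high outcomes
F_L = np.sqrt(grid)      # CDF putting more mass on low outcomes
assert np.all(F_H <= F_L)

g = np.sqrt              # γ(a) = sqrt(a): increasing and strictly concave
c = lambda a: a          # disutility c(a) = a

a, a_prime, lam = 0.16, 0.64, 0.5
a_hat = lam * c(a) + (1 - lam) * c(a_prime)   # c(â) = λc(a) + (1-λ)c(a')

F_hat = g(a_hat) * F_H + (1 - g(a_hat)) * F_L
F_mix = lam * (g(a) * F_H + (1 - g(a)) * F_L) \
      + (1 - lam) * (g(a_prime) * F_H + (1 - g(a_prime)) * F_L)

# cdfp requires F(.|â) to first-order stochastically dominate the mixture,
# i.e., F_hat <= F_mix pointwise. Concavity of γ does the work:
# γ(â) >= λγ(a) + (1-λ)γ(a') while F_H - F_L <= 0 everywhere.
assert np.all(F_hat <= F_mix + 1e-12)
```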
We can now answer the second question:
Proposition 15.9 Assume there is no shifting support, that u(·) is strictly concave and differentiable, and that mlrp and cdfp are met. Then the optimal contract given the hidden-action problem satisfies s_1 ≤ · · · ≤ s_N.

Proof: Let â be the action the principal wishes to implement. If â is a least-cost action, then the result follows from Proposition 15.3; hence assume that â is not a least-cost action. Let A′ = {a | c(a) ≤ c(â)}; that is, A′ is the set of actions that cause the agent less disutility than â. Consider the principal's problem of implementing â under the assumption that the space of actions is
A′ . By mlrp, fn (a)/fn (â) is non-increasing in n for all a ∈ A′ , so it follows
from (15.10) that s1 ≤ · · · ≤ sN under the optimal contract for this restricted
problem.
Exercise 15.2.1: Prove this last claim.
8 Recall that distribution G first-order stochastically dominates distribution H if G(z) ≤ H(z) for all z (see Theorem 11.1).
The result then follows if we can show that this contract remains optimal when
we expand A′ to A—adding actions cannot reduce the cost of implementing â,
hence we are done if we can show that the optimal contract for the restricted
problem is incentive compatible in the unrestricted problem. That is, if there
is no a, c(a) > c(â), such that
f(a) · u − c(a) > f(â) · u − c(â) ,   (15.11)

where u = (u(s_1), . . . , u(s_N)). As demonstrated in the proof of Proposition 15.8,
the incentive compatibility constraint between â and at least one a′ ∈ A′ , c(a′ ) <
c(â), is binding:
f (a′ ) · u − c(a′ ) = f (â) · u − c(â) . (15.12)
Because c(â) ∈ (c(a′), c(a)), there exists a λ ∈ (0, 1) such that c(â) = (1 − λ)c(a′) + λc(a). Using cdfp and the fact that u(s_1) ≤ · · · ≤ u(s_N), we have

f(â) · u − c(â) ≥ (1 − λ)f(a′) · u + λf(a) · u − c(â)
             = (1 − λ)(f(a′) · u − c(a′)) + λ(f(a) · u − c(a)) .
But this and (15.12) are inconsistent with (15.11); that is, (15.11) cannot hold,
as was required.
Lastly, we come to question 3. An information structure for a principal-agent problem is F ≡ {f(a) | a ∈ A}. A principal-agent problem can, then, be summarized as P = ⟨A, X, F, B(·), c(·), u(·), U_R⟩.
Proposition 15.10 Consider two principal-agent problems that are identical except for their information structures; that is, consider

P1 = ⟨A, X, F1, B(·), c(·), u(·), U_R⟩

and

P2 = ⟨A, X, F2, B(·), c(·), u(·), U_R⟩ .
Suppose there exists a stochastic transformation matrix Q (i.e., a garbling),9
such that f 2 (a) = Qf 1 (a) for all a ∈ A, where f i (a) denotes an element of
Fi . Then, for all a ∈ A, the principal’s expected cost of optimally implementing
action a in the first principal-agent problem, P1 , is not greater than her expected
cost of optimally implementing a in the second principal-agent problem, P2 .
Proof: Fix a. If a is not implementable in P2, then the result follows immediately. Suppose, then, that a is implementable in P2 and let u² be the optimal contract for implementing a in that problem. Consider the contract u¹ = Q⊤u².10 We will show that u¹ implements a in P1. Because

f¹(a)⊤ u¹ = f¹(a)⊤ Q⊤ u² = f²(a)⊤ u² ,

the fact that u² satisfies ir and ic in P2 can readily be shown to imply that u¹ satisfies ir and ic in P1. The principal's cost of optimally implementing a in P1 is no greater than her cost of implementing a in P1 using u¹. By construction, u¹_n = q⊤_{·n} u², where q_{·n} is the nth column of Q. Because sⁱ_n = u⁻¹(uⁱ_n) and u⁻¹(·) is convex, it follows from Jensen's Inequality that

s¹_n ≤ ∑_{m=1}^{N} q_{mn} s²_m

(recall q_{·n} is a probability vector). Consequently,

∑_{n=1}^{N} f¹_n(a) s¹_n ≤ ∑_{n=1}^{N} f¹_n(a) ∑_{m=1}^{N} q_{mn} s²_m = ∑_{m=1}^{N} f²_m(a) s²_m .

The result follows.

9 A stochastic transformation matrix, sometimes referred to as a garbling, is a matrix in which each column is a probability density (i.e., has non-negative elements that sum to one).
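The constructive step in this proof can be seen in a small numeric sketch. All numbers below are hypothetical: two outcomes; actions a1 and a2 with costs 0 and 1; u(s) = ln s (so u⁻¹ = exp is strictly convex); and a garbling Q whose columns each sum to one. Starting from a contract u² that is feasible for implementing a2 in P2, the contract u¹ = Q⊤u² delivers the agent the same expected utility from every action in P1 (so ir and ic carry over), yet, by Jensen's Inequality, costs the principal strictly less.

```python
import numpy as np

# Hypothetical primitives: N = 2 outcomes; actions a1 (c = 0) and a2 (c = 1).
f1 = {"a1": np.array([0.7, 0.3]), "a2": np.array([0.2, 0.8])}  # structure F1
Q = np.array([[0.8, 0.3],
              [0.2, 0.7]])            # garbling: each column sums to one
f2 = {a: Q @ f1[a] for a in f1}       # F2 is a garbling of F1
c = {"a1": 0.0, "a2": 1.0}
U_R = 0.0

# A contract (in utility units) implementing a2 in P2, with (ir) binding.
u2 = np.array([-1.4, 2.6])
assert abs(f2["a2"] @ u2 - c["a2"] - U_R) < 1e-12            # (ir) in P2
assert f2["a2"] @ u2 - c["a2"] >= f2["a1"] @ u2 - c["a1"] - 1e-12  # (ic) in P2

# The induced contract for P1.
u1 = Q.T @ u2
for a in f1:                # identical expected utilities, action by action...
    assert abs(f1[a] @ u1 - f2[a] @ u2) < 1e-12
# ...so (ir) and (ic) carry over to P1 automatically.

# Monetary costs: s_n = u^{-1}(u_n) = exp(u_n); Jensen makes P1 cheaper.
cost1 = f1["a2"] @ np.exp(u1)
cost2 = f2["a2"] @ np.exp(u2)
assert cost1 < cost2
```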
Proposition 15.10 states that if two principal-agent problems are the same,
except that they have different information structures, where the information
structure of the first problem is more informative than the information structure
of the second problem (in the sense of Blackwell’s Theorem), then the principal’s
expected cost of optimally implementing any action is no greater in the first
problem than in the second problem. By strengthening the assumptions slightly,
we can, in fact, conclude that the principal’s expected cost is strictly less in the
first problem. In other words, making the signal more informative about the
agent’s action makes the principal better off. This is consistent with our earlier
findings that (i) the value of the performance measures is solely their statistical
properties as correlates of the agent’s action; and (ii) the better correlates—
technically, the more informative—they are, the lower the cost of the hidden-
action problem.
It is worth observing that Proposition 15.10 implies that the optimal incen-
tive scheme never entails paying the agent with lotteries over money (i.e., ran-
domly mapping the realized performance levels via weights Q into payments).
15.3 A Continuous Performance Measure
Suppose that X were a real interval—which, without loss of generality, we can
take to be R—rather than a discrete space and suppose, too, that F (x|a) were
10 For this proof it is necessary to distinguish between row vectors and column vectors, as

well as transposes of matrices. All vectors should be assumed to be column vectors. To make
a vector x a row vector, we write x⊤ . Observe that H⊤ , where H is a matrix, is the transpose
of H. Observe that x⊤ y is the dot-product of x and y (what we’ve been writing as x · y).
a continuous and differentiable function with corresponding probability density function f(x|a). How would this change our analysis? By one measure, the
answer is not much. Only three of our proofs rely on the assumption that
X is finite; namely the proofs of Propositions 15.2, 15.7 (and, there, only the
existence part),11 and 15.10. Moreover, the last of the three can fairly readily be
extended to the continuous case. Admittedly, it is troubling not to have general
conditions for implementability and existence of an optimal contract, but in
many specific situations we can, nevertheless, determine the optimal contract.12
With X = R, the principal's problem—the equivalent of (15.1)—becomes

min_{u(·)} ∫_{−∞}^{∞} u⁻¹(u(x)) f(x|â) dx

subject to

∫_{−∞}^{∞} u(x) f(x|â) dx − c(â) ≥ U_R ; and

∫_{−∞}^{∞} u(x) f(x|â) dx − c(â) ≥ ∫_{−∞}^{∞} u(x) f(x|a) dx − c(a)   ∀a ∈ A .

We know the problem is trivial if there is a shifting support, so assume the support of x, supp{x}, is invariant with respect to a.13 Assuming an optimal contract exists to implement â, that contract must satisfy the modified Borch sharing rule:

1 / u′(u⁻¹(u(x))) = λ + ∑_{j=1}^{J−1} μ_j (1 − f(x|a_j)/f(x|â))   for almost every x ∈ supp{x} .

Observe that this is just a variation on (13.8) or (15.10).
11 Where our existence proof “falls down” when X is continuous is that our proof relies on the fact that a continuous function from R^N → R has a minimum on a closed and bounded set. But, here, the contract space is no longer a subset of R^N, but rather the space of all functions from X → R; and there is no general result guaranteeing the existence of a minimum in this case.
12 Page (1987) considers conditions for existence in this case (actually he also allows for
A to be a continuous space). Most of the assumptions are technical, but not likely to be
considered controversial. Arguably a problematic assumption in Page is that the space of
possible contracts is constrained; that is, assumptions are imposed on an endogenous feature
of the model, the contracts. In particular, if S is the space of permitted contracts, then there
exist L and M ∈ R such that L ≤ s(x) ≤ M for all s(·) ∈ S and all x ∈ X . Moreover,
S is closed under the topology of pointwise convergence. On the other hand, it could be
argued that range of real-life contracts must be bounded: Legal and other constraints on
what payments the parties can make effectively limit the space of contracts to some set of
bounded functions.
13 That is, {x | f(x|a) > 0} = {x | f(x|a′) > 0} for all a and a′ in A.
Bibliographic Note
Much of the analysis in this section has been drawn from Grossman and Hart (1983). In particular, they deserve credit for Propositions 15.1, 15.5, and 15.7–15.10 (although, here and there, we've made slight modifications to the statements or proofs). Proposition 15.2 is based on Hermalin and Katz (1991). The rest of the analysis represents well-known results.
Lecture Note 16: Continuous Action Space
So far, we’ve limited attention to finite action spaces. Realistic though this may
be, it can serve to limit the tractability of many models, particularly when we
need to assume the action space is large. A large action space can be problematic
for two, related, reasons. First, under the two-step approach, we are obligated
to solve for the optimal contract for each a ∈ A (or at least each a ∈ A^I); then, letting C(a) be the expected cost of inducing action a under its corresponding optimal contract, we next maximize B(a) − C(a)—expected benefit net of expected cost. If A is large, then this is clearly a time-consuming and potentially impractical method for solving the principal-agent problem. The second reason a
large action space can be impractical is because it can mean many constraints
in the optimization program involved with finding the optimal contract for a
given action (recall, e.g., that we had J − 1 constraints—one for each action
other than the given action). Again, this raises issues about the practicality of
solving the problem.
These problems suggest that we would like a technique that allows us to
solve program (14.3) on page 197,
max_{(S(·),a)} ∫_X W(S(x), x, b) dG(b, x|a)   (16.1)

subject to

a ∈ argmax_{a′} ∫_X U(S(x), x, a′) dF(x|a′)   (16.2)

and

max_{a′} ∫_X U(S(x), x, a′) dF(x|a′) ≥ U_R ,
directly, in a one-step procedure. Generally, to make such a maximization pro-
gram tractable, we would take A to be a compact and continuous space (e.g., a
closed interval on R), and employ standard programming techniques. A number
of complications arise, however, if we take such an approach.
Most of these complications have to do with how we treat the ic constraint,
expression (16.2). To make life simpler, suppose that A = [a, ā] ⊂ R, X = R,
that F (·|a) is differentiable and, moreover, that the expression in (16.2) is itself
differentiable for all a ∈ A. Then, a natural approach would be to observe that
if a ∈ (a, ā) maximizes that expression, it must necessarily be the solution to
the first-order condition to (16.2):
∫_X [ U_a(S(x), x, a) f(x|a) + U(S(x), x, a) f_a(x|a) ] dx = 0   (16.3)

(where subscripts denote partial derivatives). Conversely, if we knew that the second-order condition was also met, (16.3) would be equivalent to (16.2) and we could use it instead of (16.2)—at least locally. Unhappily, we don't, in general, know (i) that the second-order condition is met and (ii) that, even if it is, the a
solving (16.3) is a global rather than merely local maximum. For many modeling
problems in economics, we would avoid these headaches by simply assuming that
(16.2) is globally strictly concave in a, which would ensure both the second-order
condition and the fact that an a solving (16.3) is a global maximum. We can’t,
however, do that here: The concavity of (16.2) will, in general, depend on S(x);
but since S(·) is endogenous, we can’t make assumptions about it. If, then, we
want to substitute (16.3) for (16.2), we need to look for other ways to ensure
that (16.3) describes a global maximum.
An additional complication concerns whether (16.1) also satisfies the properties that would allow us to conclude from first-order conditions that a global
maximum has, indeed, been reached. Fortunately, in many problems, this issue
is less severe because we typically impose the functional form
W(S(x), x, b) = x − S(x) ,

which gives the problem sufficient structure to allow us to validate a “first-order approach.”
In the rest of this section, we develop a simple model in which a first-order
approach is valid.
16.1 The First-order Approach with a Spanning Condition
Assume, henceforth, that A = [a, ā] ⊂ R, X = [x, x̄] ⊂ R, and that F (·|a) is
differentiable. Let f (·|a) be the associated probability density function for each
a ∈ A. We further assume that
1. U(S(x), x, a) = u(S(x)) − a;

2. u(·) is strictly increasing and strictly concave;

3. the domain of u(·) is (s̲, ∞), lim_{s↓s̲} u(s) = −∞, and lim_{s↑∞} u(s) = ∞;

4. f(x|a) > 0 for all x ∈ X and for all a ∈ A (i.e., there is no shifting support);

5. F(x|a) = γ(a)F_H(x) + (1 − γ(a))F_L(x) and f(x|a) = γ(a)f_H(x) + (1 − γ(a))f_L(x) for all x and a, where γ : A → [0, 1] and F_H and F_L are distribution functions on X;
6. γ(·) is strictly increasing, strictly concave, and twice differentiable; and
7. fL (x)/fH (x) satisfies the mlrp (i.e., fL (x)/fH (x) is non-increasing in x
and there exist x′ and x′′ in X , x′ < x′′ , such that fL (x′ )/fH (x′ ) >
fL (x′′ )/fH (x′′ )).
Observe that Assumptions 5–7 allow us, inter alia, to assume that c(a) = a without loss of generality. Assumption 5 is known as a spanning condition.
In what follows, the following result will be critical:
Lemma 16.1 F_H dominates F_L in the sense of first-order stochastic dominance.

Proof: This follows immediately from Propositions 11.13 and 11.15.
A consequence of the Lemma is that if φ(·) is an increasing function, then

∫_x^x̄ φ(x) (f_H(x) − f_L(x)) dx > 0 .   (16.4)
It follows, then, that if S(·) (and, thus, u(S(x))) is increasing, then (16.2) is globally concave in a. To see this, observe

∫_X U(S(x), x, a) dF(x|a) = ∫_x^x̄ u(S(x)) [γ(a)f_H(x) + (1 − γ(a))f_L(x)] dx − a ;

so, differentiating the integral term, we have

d/da ∫_x^x̄ u(S(x)) f(x|a) dx = γ′(a) ∫_x^x̄ u(S(x)) (f_H(x) − f_L(x)) dx > 0

by (16.4) and the assumption that γ′(·) > 0. Moreover,

d²/da² ∫_x^x̄ u(S(x)) f(x|a) dx = γ″(a) ∫_x^x̄ u(S(x)) (f_H(x) − f_L(x)) dx < 0

by (16.4) and the assumption that γ″(·) < 0. Because the disutility term, −a, is linear in a, the agent's objective is therefore globally concave in a. To summarize:
Corollary 16.1 If S(·) is increasing, then the agent's choice-of-action problem is globally concave. That is, we're free to substitute

γ′(a) ∫_x^x̄ u(S(x)) (f_H(x) − f_L(x)) dx − 1 = 0   (16.5)

for (16.2).
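A quick numeric sanity check of the Corollary, using hypothetical primitives: a discrete grid of outcomes, an increasing schedule of utilities u(S(x)), pmfs f_H and f_L with f_H first-order stochastically dominant, and γ(a) = √a (so γ′ > 0 > γ″). The agent's objective, γ(a)E_H + (1 − γ(a))E_L − a with c(a) = a, is then globally concave, which discrete second differences confirm.

```python
import numpy as np

# Hypothetical discrete version of the spanning model.
u_of_S = np.array([0.0, 1.0, 2.0, 3.0])   # u(S(x)) at 4 outcomes, increasing
f_H = np.array([0.1, 0.2, 0.3, 0.4])      # favorable pmf
f_L = np.array([0.4, 0.3, 0.2, 0.1])      # unfavorable pmf
E_H, E_L = u_of_S @ f_H, u_of_S @ f_L
assert E_H > E_L                          # discrete analogue of (16.4)

gamma = np.sqrt                           # γ(a) = sqrt(a): concave

def V(a):
    """Agent's expected utility from action a (with c(a) = a)."""
    return gamma(a) * E_H + (1.0 - gamma(a)) * E_L - a

a_grid = np.linspace(0.05, 1.0, 200)
vals = V(a_grid)
# Second differences are all negative: V is strictly concave, so the
# first-order condition (16.5) picks out the global maximum.
assert np.all(np.diff(vals, 2) < 0)
```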
We’ll proceed as follows. We’ll suppose that S(·) is increasing and we’ll solve
the principal’s problem. Of course, when we’re done, we’ll have to double check
that our solution indeed yields an increasing S(·). It will, but if it didn’t, then
our approach would be invalid. The principal's problem is

max_{S(·),a} ∫_x^x̄ (x − S(x)) f(x|a) dx
subject to (16.5) and the ir constraint,

∫_x^x̄ u(S(x)) f(x|a) dx − a ≥ U_R .
As we’ve shown many times now, this last constraint must be binding; so we
have a classic constrained optimization program. Letting λ be the Lagrange
multiplier on the ir constraint and letting µ be the Lagrange multiplier on
(16.5), we obtain the first-order conditions:
−f(x|a) + μ u′(S(x)) (f_H(x) − f_L(x)) γ′(a) + λ u′(S(x)) f(x|a) = 0

differentiating with respect to S(x); and

∫_x^x̄ (x − S(x)) (f_H(x) − f_L(x)) γ′(a) dx + μ ∫_x^x̄ u(S(x)) (f_H(x) − f_L(x)) γ″(a) dx = 0   (16.6)
differentiating with respect to a (there's no λ expression in the second condition because, by (16.5), the derivative of the ir constraint with respect to a is zero). We can rearrange the first condition into our familiar modified Borch sharing rule:

1/u′(S(x)) = λ + μ (f_H(x) − f_L(x)) γ′(a) / (γ(a)f_H(x) + (1 − γ(a))f_L(x))
           = λ + μ (1 − r(x)) γ′(a) / (γ(a)(1 − r(x)) + r(x)) ,
where r(x) = fL (x)/fH (x). Recall that 1/u′ (·) is an increasing function; hence,
to test whether S(·) is indeed increasing, we need to see whether the rhs is
decreasing in r(x) given r(·) is decreasing. Straightforward calculations reveal
that the derivative of the rhs with respect to r(x) is
−γ′(a) / [r(x) + (1 − r(x))γ(a)]² < 0 .
We’ve therefore shown that S(·) is indeed increasing as required; that is, our
use of (16.5) for (16.2) was valid.
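The sharing rule can also be traced out numerically. In the sketch below (all values hypothetical), take u(s) = ln s, so 1/u′(S(x)) = S(x) and the rule gives S(x) directly. Feeding in a decreasing likelihood ratio r(x) (mlrp) and positive multipliers yields an increasing payment schedule, matching the argument just given.

```python
import numpy as np

# Hypothetical values: multipliers and the implemented action.
lam, mu = 2.0, 1.0            # λ > 0, μ > 0
gamma_a, dgamma_a = 0.5, 1.0  # γ(a) and γ'(a) at the implemented a

# mlrp: r(x) = f_L(x)/f_H(x) is decreasing in x.
r = np.linspace(1.5, 0.2, 50)

# With u(s) = ln s we have 1/u'(s) = s, so the Borch rule gives S directly:
# S(x) = λ + μ (1 - r(x)) γ'(a) / (γ(a)(1 - r(x)) + r(x)).
S = lam + mu * (1.0 - r) * dgamma_a / (gamma_a * (1.0 - r) + r)

assert np.all(np.diff(S) > 0)   # S(.) is increasing in x, as claimed
assert np.all(S > 0)            # payments stay in the domain of ln
```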
Observe, from (16.6), that, because the agent’s second-order condition is
met, the first line in (16.6) must be positive; that is,
∫_x^x̄ (x − S(x)) (f_H(x) − f_L(x)) dx > 0 .
But this implies that, for this S(·), the principal's problem is globally concave in a:

d²/da² ∫_x^x̄ (x − S(x)) f(x|a) dx = γ″(a) ∫_x^x̄ (x − S(x)) (f_H(x) − f_L(x)) dx < 0 .
Moreover, for any S(·), the principal’s problem is (trivially) concave in S(·).
Hence, we can conclude that the first-order approach is, indeed, valid for this
problem.
Admittedly, the spanning condition is a fairly stringent one, although it does have an economic interpretation. Suppose there are two distributions
from which the performance measure could be drawn, “favorable” (i.e., FH ) and
“unfavorable” (i.e., FL ). The harder the agent chooses to work—the higher is
a—the greater the probability, γ(a), that the performance measure will be drawn
from the favorable distribution. For instance, suppose there are two types of
potential customers, those who tend to buy a lot—the H type—and those who
tend not to buy much—the L type. By investing more effort, a, in learning his
territory, a salesperson (agent) increases the probability that he will sell to H
types rather than L types.
Bibliographic Note
The first papers to use the first-order approach were Holmstrom (1979) and
Shavell (1979). Grossman and Hart (1983) was, in large part, a response to the
potential invalidity of the first-order approach. The analysis under the spanning
condition draws, in part, from Hart and Holmström (1987).
222 Lecture Note 16: Continuous Action Space
Bibliography
Basov, Suren, Multidimensional Screening, Berlin: Springer-Verlag, 2010.

Benedetto, John J. and Wojciech Czaja, Integration and Modern Analysis, Boston: Birkhäuser, 2009.

Borch, Karl H., The Economics of Uncertainty, Princeton, NJ: Princeton University Press, 1968.

Caillaud, Bernard and Benjamin E. Hermalin, “The Use of an Agent in a Signalling Model,” Journal of Economic Theory, June 1993, 60 (1), 83–113.

Caillaud, Bernard, Roger Guesnerie, Patrick Rey, and Jean Tirole, “Government Intervention in Production and Incentives Theory: A Review of Recent Contributions,” RAND Journal of Economics, Spring 1988, 19 (1), 1–26.

Crémer, Jacques and Richard P. McLean, “Optimal Selling Strategies under Uncertainty for a Discriminating Monopolist when Demands are Interdependent,” Econometrica, March 1985, 53 (2), 345–361.

Crémer, Jacques and Richard P. McLean, “Full Extraction of the Surplus in Bayesian and Dominant Strategy Auctions,” Econometrica, November 1988, 56 (6), 1247–1257.

d’Aspremont, Claude and Louis-André Gérard-Varet, “Incentives and Incomplete Information,” Journal of Public Economics, February 1979, 11 (1), 25–45.

Edlin, Aaron S. and Benjamin E. Hermalin, “Contract Renegotiation and Options in Agency Problems,” Journal of Law, Economics, and Organization, 2000, 16 (2), 395–423.

Epstein, Larry G., “Behavior Under Risk: Recent Developments in Theory and Applications,” in Jean-Jacques Laffont, ed., Advances in Economic Theory: Sixth World Congress, Vol. 2, Cambridge, England: Cambridge University Press, 1992.

Fleming, Wendell, Functions of Several Variables, Berlin: Springer-Verlag, 1977.

Gibbard, Allan, “Manipulation of Voting Schemes,” Econometrica, July 1973, 41 (4), 587–601.

Gonik, Jacob, “Tie Salesmen’s Bonuses to Their Forecasts,” Harvard Business Review, May–June 1978, pp. 116–122.

Green, Jerry and Jean-Jacques Laffont, “Characterization of Satisfactory Mechanisms for the Revelation of Preferences for Public Goods,” Econometrica, March 1977, 45 (2), 427–438.

Grossman, Sanford J. and Oliver D. Hart, “An Analysis of the Principal-Agent Problem,” Econometrica, 1983, 51, 7–46.

Hart, Oliver D. and Bengt Holmström, “Theory of Contracts,” in Truman Bewley, ed., Advances in Economic Theory: Fifth World Congress, Cambridge, England: Cambridge University Press, 1987.

Hermalin, Benjamin E., “Uncertainty and Imperfect Information in Markets,” in Mark Machina and W. Kip Viscusi, eds., Handbook of the Economics of Risk and Uncertainty, Amsterdam: North-Holland, 2014.

Hermalin, Benjamin E. and Michael L. Katz, “Moral Hazard and Verifiability: The Effects of Renegotiation in Agency,” Econometrica, November 1991, 59 (6), 1735–1753.

Holmstrom, Bengt, “Moral Hazard and Observability,” Bell Journal of Economics, Spring 1979, 10 (1), 74–91.

Kamien, Morton I. and Nancy L. Schwartz, Dynamic Optimization: The Calculus of Variations and Optimal Control, Amsterdam: North-Holland, 1981.

Klemperer, Paul, Auctions: Theory and Practice, Princeton: Princeton University Press, 2004.

Körner, Thomas W., A Companion to Analysis: A Second First and First Second Course in Analysis, Vol. 62 of Graduate Studies in Mathematics, Providence, RI: American Mathematical Society, 2004.

Krishna, Vijay, Auction Theory, San Diego: Academic Press, 2002.

Laffont, Jean-Jacques and Eric Maskin, “A Differential Approach to Dominant Strategy Mechanisms,” Econometrica, September 1980, 48 (6), 1507–1520.

Laffont, Jean-Jacques and Jean Tirole, A Theory of Incentives in Procurement and Regulation, Cambridge, MA: MIT Press, 1993.

Lewis, Tracy R. and David E.M. Sappington, “Countervailing Incentives in Agency Problems,” Journal of Economic Theory, December 1989, 49 (2), 294–313.

Maggi, Giovanni and Andres Rodriguez-Clare, “On the Countervailing Incentives,” Journal of Economic Theory, June 1995, 66 (1), 238–263.

Mas-Colell, Andreu, Michael Whinston, and Jerry Green, Microeconomic Theory, Oxford, England: Oxford University Press, 1995.

McAfee, R. Preston and Philip J. Reny, “Correlated Information and Mechanism Design,” Econometrica, March 1992, 60 (2), 395–421.

Milgrom, Paul R., Putting Auction Theory to Work, Cambridge, UK: Cambridge University Press, 2004.

Milgrom, Paul R. and Chris Shannon, “Monotone Comparative Statics,” Econometrica, January 1994, 62 (1), 157–180.

Milgrom, Paul R. and John Roberts, “Rationalizability, Learning, and Equilibrium in Games with Strategic Complementarities,” Econometrica, November 1990, 58 (6), 1255–1277.

Myerson, Roger B., “Incentive Compatibility and the Bargaining Problem,” Econometrica, 1979, 47, 61–73.
Page, Frank H., “The Existence of Optimal Contracts in the Principal-Agent Model,” Journal of Mathematical Economics, 1987, 16 (2), 157–167.

Rabin, Matthew, “Risk Aversion, Diminishing Marginal Utility, and Expected-Utility Theory: A Calibration Theorem,” 1997, mimeo, Department of Economics, University of California at Berkeley.

Rochet, Jean-Charles, “The Taxation Principle and Multi-Time Hamilton-Jacobi Equations,” Journal of Mathematical Economics, 1985, 14, 113–128.

Rochet, Jean-Charles and Philippe Choné, “Ironing, Sweeping, and Multidimensional Screening,” Econometrica, July 1998, 66 (4), 783–826.

Rockafellar, R. Tyrrell, Convex Analysis, Princeton: Princeton University Press, 1970.

Salanié, Bernard, The Economics of Contracts: A Primer, Cambridge, MA: MIT Press, 1997.

Sappington, David E.M., “Limited Liability Contracts between Principal and Agent,” Journal of Economic Theory, February 1983, 29 (1), 1–21.

Shavell, Steven, “Risk Sharing and Incentives in the Principal and Agent Relationship,” Bell Journal of Economics, Spring 1979, 10 (1), 55–73.

Steele, J. Michael, The Cauchy-Schwarz Master Class, Cambridge, England: Cambridge University Press, 2004.

Topkis, Donald M., “Minimizing a Submodular Function on a Lattice,” Operations Research, 1978, 26, 305–321.

Uhlig, Harald, “A Law of Large Numbers for Large Economies,” Economic Theory, January 1996, 8, 41–50.

van Tiel, Jan, Convex Analysis: An Introductory Text, New York: John Wiley & Sons, 1984.

Varian, Hal R., “Price Discrimination,” in Richard Schmalensee and Robert Willig, eds., Handbook of Industrial Organization, Vol. 1, Amsterdam: North-Holland, 1989.

Varian, Hal R., Microeconomic Analysis, 3rd ed., New York: W.W. Norton, 1992.

Vickrey, William, “Counterspeculation, Auctions, and Competitive Sealed Tenders,” Journal of Finance, March 1961, 16 (1), 8–37.

Willig, Robert D., “Consumer’s Surplus Without Apology,” American Economic Review, September 1976, 66 (4), 589–597.

Wilson, Robert B., Nonlinear Pricing, Oxford: Oxford University Press, 1993.

Yeh, James, Real Analysis: Theory of Measure and Integration, 2nd ed., Singapore: World Scientific, 2006.