Lecture 14
• Last time, we began decision-making under uncertainty
• We saw that risk aversion – preferring the expected value of a lottery to the lottery itself –
is the same as concavity of the Bernoulli utility function,
and that the Arrow-Pratt coefficient of absolute risk aversion, A(x) = −u′′(x)/u′(x),
gives a measure of “how risk-averse” someone is
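As a quick numerical sketch (not from the lecture): for CARA utility u(x) = −e^{−ax}, the coefficient A(x) should come out equal to a at every wealth level. Approximating the derivatives by finite differences:

```python
import numpy as np

def absolute_risk_aversion(u, x, h=1e-5):
    """A(x) = -u''(x)/u'(x), with derivatives approximated by central differences."""
    u1 = (u(x + h) - u(x - h)) / (2 * h)
    u2 = (u(x + h) - 2 * u(x) + u(x - h)) / h ** 2
    return -u2 / u1

# CARA utility with a = 0.5 (an arbitrary illustrative choice)
a = 0.5
u = lambda x: -np.exp(-a * x)
for x in [0.0, 1.0, 5.0]:
    # A(x) is constant in x and equal to a
    assert abs(absolute_risk_aversion(u, x) - a) < 1e-3
```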
2 Relative Risk Aversion
• We saw last time that the Coefficient of Absolute Risk Aversion, −u′′/u′,
gave a good local measure of how risk-averse you are;
we motivated it initially by showing that the risk premium for a small gamble,
or the probability premium for a small gamble,
are both proportional to it
• We saw that if two decisionmakers are ranked via the Coefficient of Absolute Risk Aversion,
this tells you a lot about their relative attitudes about risk;
and we could apply the same idea to one person at different wealth levels
• In that case, the relevant measure of risk aversion is the Arrow-Pratt Coefficient of
Relative Risk Aversion,
R(u, x) = −x u′′(x)/u′(x)
• (We could again calculate your risk premium for a small relative risk,
or your probability premium for a small relative risk,
and they would be proportional to R)
• If this function is increasing in x, you have increasing relative risk aversion,
and you become more averse to proportional risks as your wealth level increases;
so if you prefer a sure 2% gain to a fifty-fifty shot at a 5% gain at one wealth level,
you still prefer the sure thing at higher wealth levels;
if this is decreasing, you have decreasing relative risk aversion,
and you become less averse to proportional risks
• Finally, there’s a family of utility functions with constant relative risk aversion:
u(x) = x^{1−ρ}/(1−ρ)

for ρ ≥ 0 and ρ ≠ 1,
and its limit u(x) = ln(x) as ρ → 1
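A small numerical check of the claim (a sketch, with arbitrary test values of ρ and x): for the CRRA family, R(u, x) should be constant in x and equal to ρ.

```python
import numpy as np

def crra(x, rho):
    """CRRA Bernoulli utility x^(1-rho)/(1-rho), with the log limit at rho = 1."""
    if np.isclose(rho, 1.0):
        return np.log(x)
    return x ** (1 - rho) / (1 - rho)

def relative_risk_aversion(u, x):
    """R(u, x) = -x u''(x)/u'(x), via central finite differences."""
    h = 1e-4 * x
    u1 = (u(x + h) - u(x - h)) / (2 * h)
    u2 = (u(x + h) - 2 * u(x) + u(x - h)) / h ** 2
    return -x * u2 / u1

for rho in [0.5, 1.0, 2.0]:
    for x in [1.0, 10.0, 100.0]:
        # R is the same at every wealth level, and equal to rho
        assert abs(relative_risk_aversion(lambda z: crra(z, rho), x) - rho) < 1e-3
```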
3 Subjective Expected Utility
• I emphasized last time that we were dealing with objective probabilities –
events where we knew the exact probability with which they’ll occur
• Now that we’ve seen the von Neumann-Morgenstern result about when preferences over
objective lotteries can be represented by a utility function of the expected-utility form,
I want to make the link more explicit
• The details are a lot messier, but the core result is the same –
under the right assumptions about your preferences over lotteries,
we get a “subjective expected utility” representation,
where the probabilities you assign to each event are implied by your preferences over lotteries
¹ See Savage (1954), The Foundations of Statistics, ch. 2–3.
² Anscombe and Aumann (1963), “A Definition of Subjective Probability,” Annals of Mathematical Statistics 34(1).
4 When is a lottery better than another lottery?
• So far, we’ve established what it means for a decisionmaker to prefer one lottery to another –
when ∫u(x)dF(x) ≥ ∫u(x)dG(x)
• And we’ve established what it means for one decisionmaker to be more risk-averse than
another –
when −u′′/u′ ≥ −v′′/v′, or when u = g ◦ v and g is concave
• Next, we’ll think about what it means for one lottery to be better than another,
in the sense that every decisionmaker would prefer it
4.1 First-Order Stochastic Dominance
• To start: when would every decisionmaker – risk-averse, risk-neutral, or risk-loving, regardless
of u – agree that lottery F is better than lottery G?
• We could also phrase this as, how could we unambiguously improve a lottery?
• Consider a composite lottery where, for each prize x that you would have received under G,
you instead get a prize under F that’s at least x –
that seems to pretty clearly be better
• so, shifting some probability toward higher prizes seems like an uncontroversial way to improve
a lottery,
and seems to relate to lowering the value of the CDF in some places
• It turns out, this gives us a clear characterization of when all decisionmakers should
unanimously agree one lottery is better than another
• First, let’s prove the “only-if” – that F ≥FOSD G is necessary for this to hold,
or that if F ≱FOSD G, there’s some expected-utility maximizer who prefers G
• If F (x̄) > G(x̄) at some point x̄, take u to be (a slightly smoothed version of) the step function
that’s 0 below x̄ and 1 above it; then U (F ) ≈ 1 − F (x̄) < 1 − G(x̄) ≈ U (G),
so this expected-utility maximizer prefers G
• So clearly, if every expected-utility maximizer (for u increasing but not necessarily concave)
prefers F to G,
F must first-order stochastically dominate G, or F (x) ≤ G(x) everywhere
• Now let’s prove that if F ≥F OSD G, everyone agrees it’s better
• Then, writing u(x) = u(0) + ∫_0^x u′(s)ds (the constant u(0) terms cancel),

    U(F) − U(G) = ∫_0^∞ u(x)f(x)dx − ∫_0^∞ u(x)g(x)dx

                = ∫_0^∞ ( ∫_0^x u′(s)ds ) f(x)dx − ∫_0^∞ ( ∫_0^x u′(s)ds ) g(x)dx

                = ∫_0^∞ ( ∫_s^∞ f(x)dx ) u′(s)ds − ∫_0^∞ ( ∫_s^∞ g(x)dx ) u′(s)ds

                = ∫_0^∞ (1 − F(s))u′(s)ds − ∫_0^∞ (1 − G(s))u′(s)ds

                = ∫_0^∞ (G(s) − F(s))u′(s)ds

  which is nonnegative whenever G(s) ≥ F(s) everywhere and u is increasing (u′ ≥ 0)
• So if F ≽FOSD G,
we can generate F from G by shifting some probability to the right
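This characterization is easy to check numerically for discrete lotteries (the two lotteries below are hypothetical examples, not from the lecture):

```python
import numpy as np

# Two hypothetical discrete lotteries over the same prizes:
# F shifts probability from low prizes to high ones, relative to G.
prizes = np.array([10.0, 20.0, 30.0, 40.0])
f = np.array([0.1, 0.2, 0.3, 0.4])   # lottery F
g = np.array([0.4, 0.3, 0.2, 0.1])   # lottery G

# F first-order stochastically dominates G: F(x) <= G(x) everywhere
assert np.all(np.cumsum(f) <= np.cumsum(g) + 1e-12)

# So every increasing u -- concave, linear, or convex -- prefers F
for u in [np.sqrt, lambda x: x, lambda x: x ** 2]:
    assert u(prizes) @ f >= u(prizes) @ g
```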
4.2 Second-Order Stochastic Dominance
• Next, we’ll consider the case where two lotteries have the same expected value,
but one is unambiguously more risky than the other
• We could define “riskier” through particular operations, like adding noise; but we’d like a more general condition that captures the same intuition
• But we still get a very clean characterization of when one distribution is riskier than another
• Definition. F second-order stochastically dominates G, or F ≥SOSD G, if they have
the same expected value and
    ∫_{−∞}^x F(s)ds ≤ ∫_{−∞}^x G(s)ds

for all x.
(Note EF = EG implies that ∫_{−∞}^∞ (F(s) − G(s))ds = 0 when taken over the whole support.)
• Just like Pratt showed us the equivalence of several notions of a “more risk averse”
decisionmaker,
Rothschild and Stiglitz showed the equivalence of several notions of a riskier lottery
• Theorem (Rothschild and Stiglitz). Let F and G be two lotteries with the same expected
value. The following definitions of “G is more risky than F ” are equivalent:
1. ∫_{−∞}^x F(s)ds ≤ ∫_{−∞}^x G(s)ds for every x
2. F ≽ G for any risk-averse expected utility maximizer,
or ∫u(x)dF(x) ≥ ∫u(x)dG(x) for any increasing, concave u
3. G is derived from F via a mean-preserving spread –
the distribution of outcomes in G can be derived by first taking a draw from F and then
adding mean-0 noise
• (To be formal, if X ∼ F and Y ∼ G, condition three is that Y =_d X + Z,
where Z is another random variable which is not necessarily independent of X
but has E(Z|X = x) = 0 for every x, and =_d means “has the same distribution as”)
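Here's a small simulation sketch of condition three (the lottery and the noise are hypothetical; independent noise is just a special case of the E(Z|X) = 0 condition):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# X ~ F, a hypothetical lottery; Y = X + Z with E[Z | X] = 0 is then a
# draw from a mean-preserving spread G of F.
X = rng.choice([50.0, 100.0], size=n)
Z = rng.choice([-20.0, 20.0], size=n)   # mean-0 noise, independent of X
Y = X + Z

# Same expected value (up to simulation noise)...
assert abs(X.mean() - Y.mean()) < 0.5

# ...but every risk-averse (concave) u prefers F, by Jensen's inequality
for u in [np.sqrt, np.log]:
    assert u(X).mean() > u(Y).mean()
```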
• I’m not going to give the full proof
• to see that 1 and 2 are equivalent, in the case where u is twice differentiable,
when EF = EG , we can show³

    U(F) − U(G) = ∫_{−∞}^∞ (−u′′(x)) ( ∫_{−∞}^x G(s)ds − ∫_{−∞}^x F(s)ds ) dx

  so when the integral condition holds and u is concave (−u′′ ≥ 0), U(F) ≥ U(G)
• (for the converse, for each y consider u(x) = min{x, y},
just smoothed a little to be differentiable right around y;
plugging these u into the formula recovers the integral condition at each y)
³ We already showed that U(F) − U(G) = ∫u′(x)(G(x) − F(x))dx. If we assume u is twice differentiable, and
abuse notation a little and write u′(x) = u′(∞) − ∫_x^∞ u′′(s)ds, this is

    U(F) − U(G) = ∫_{−∞}^∞ ( u′(∞) − ∫_x^∞ u′′(s)ds ) (G(x) − F(x))dx

                = u′(∞) ∫_{−∞}^∞ (G(x) − F(x))dx + ∫_{−∞}^∞ ∫_x^∞ (−u′′(s))ds (G(x) − F(x))dx

Integration by parts allows us to show EF = EG ↔ ∫(G(x) − F(x))dx = 0, so the first term vanishes; switching the
order of integration on the second gives

    U(F) − U(G) = ∫_{−∞}^∞ ( ∫_{−∞}^s (G(x) − F(x))dx ) (−u′′(s))ds = ∫_{−∞}^∞ ( ∫_{−∞}^s G(x)dx − ∫_{−∞}^s F(x)dx ) (−u′′(s))ds
• Rothschild and Stiglitz show that the integral condition is also equivalent to being able to
get to G from F via a series of mean-preserving spreads,
by constructing the mean-preserving spreads for discrete distributions and making a limit
argument for the continuous case
• so, the integral condition ∫_{−∞}^x F(s)ds ≤ ∫_{−∞}^x G(s)ds holding at every x,
is equivalent to every risk-averse decisionmaker preferring F to G,
which is equivalent to “G = F + noise” – you can generate G from F via a series of
mean-preserving spreads
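The integral condition is also easy to verify on a discrete example (a hypothetical pair of equal-mean lotteries; the cumulative sums are a discrete analogue of the integrals on an evenly spaced grid):

```python
import numpy as np

# F pays $50 for sure; G is a fifty-fifty shot at $0 or $100
# (a mean-preserving spread of F).
prizes = np.array([0.0, 50.0, 100.0])
f = np.array([0.0, 1.0, 0.0])
g = np.array([0.5, 0.0, 0.5])
assert prizes @ f == prizes @ g          # same expected value

# Discrete analogue of the integral condition on this grid:
# running sums of F's CDF lie below running sums of G's CDF everywhere
assert np.all(np.cumsum(np.cumsum(f)) <= np.cumsum(np.cumsum(g)) + 1e-12)

# Every increasing, concave u then prefers F to G
for u in [np.sqrt, lambda x: np.log(1 + x), lambda x: -np.exp(-x / 50)]:
    assert u(prizes) @ f >= u(prizes) @ g
```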
4.3 a technical note I’ll skip in lecture
I’ll skip this in lecture, and feel free to skip over it, but I’m including it here because it’s neat. Here’s a
paragraph from Jonathan Levin’s 2006 lecture notes on Choice Under Uncertainty, available online:
“[The FOSD and SOSD proofs] rely heavily on integration by parts. There is, however, a general way to
prove stochastic dominance theorems that doesn’t require differentiability. The idea is that given a class
of utility functions U (e.g., all non-decreasing functions, all concave functions, etc.), it is often possible to
find a smaller set of “basis” functions B ⊂ U such that every function u ∈ U can be written as a convex
combination of functions in B. (If the set B is minimal, the elements of B are “extreme” points of U – think
convex sets.) It is then the case that ∫u dG ≥ ∫u dF for all u ∈ U if and only if ∫u dG ≥ ∫u dF for all
u ∈ B, so one can focus on the basis functions to establish statistical conditions on G and F . You can try
this yourself for first order stochastic dominance: let B be the set of “step” functions, i.e., functions equal
to 0 below some point x and equal to 1 above it. For more, see Gollier (2001) [Christian Gollier (2001), The
Economics of Risk and Time, MIT Press.]; this was also the topic of Susan Athey’s Stanford dissertation.”
The idea that “an increasing function is a convex combination of step functions” is clearest when you consider
the differentiable case, since (up to an additive constant) we can write differentiable u on R+ as
    u(x) = ∫_0^x u′(y)dy = ∫_0^∞ u′(y) b_y(x) dy

where

    b_y(x) = 0 if x < y,   1 if x ≥ y
So the step functions {b_y(x)} serve as a basis for the increasing and differentiable functions on R+, with
u′(y) ≥ 0 giving the weight assigned to each basis function. Since ∫_{−∞}^{+∞} b_y(x)dF(x) = 1 − F(y), it’s easy
to see that F(·) ≤ G(·) everywhere if and only if ∫b_y(x)dF(x) ≥ ∫b_y(x)dG(x) for every basis function b_y.
And if we write any increasing Bernoulli utility function u(·) as a convex combination of basis functions b_y(·)
with weights w(y), then

    ∫u(x)dF(x) = ∫ ( ∫w(y)b_y(x)dy ) dF(x) = ∫w(y) ( ∫b_y(x)dF(x) ) dy

so if we know ∫b_y(x)dF(x) ≥ ∫b_y(x)dG(x) for each basis function b_y, then it holds for every increasing
Bernoulli utility function.
Bernoulli utility function.
If you want to try this for second-order stochastic dominance, then up to an additive constant, a basis for
the increasing, concave functions is the kinked functions

    b_y(x) = min{x, y}

which are increasing with slope one on x < y and then constant on x > y.
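Both claims above can be checked numerically (F uniform on [0, 10] and u(x) = x² are hypothetical choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Claim 1: for any y, ∫ b_y(x) dF(x) = 1 − F(y).  Check by Monte Carlo
# with F uniform on [0, 10].
X = rng.uniform(0.0, 10.0, size=500_000)
for y in [2.0, 5.0, 8.0]:
    assert abs((X >= y).mean() - (1 - y / 10)) < 5e-3

# Claim 2: an increasing differentiable u is a weighted combination of
# step functions, u(x) = ∫ u′(y) b_y(x) dy.  Check for u(x) = x², u′(y) = 2y.
ys = np.linspace(0.0, 10.0, 100_001)
dy = ys[1] - ys[0]
for x in [1.0, 4.0, 9.0]:
    approx = np.sum(2 * ys * (ys <= x)) * dy   # Riemann sum of u′(y)b_y(x)
    assert abs(approx - x ** 2) < 1e-2
```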
5 Behavioral Critiques of Expected Utility Theory
• I mentioned last week that expected utility theory has been widely critiqued by a variety of
behavioral economics work
• Here are some classic examples of ways it doesn’t seem to match observations
5.1 Rabin’s Calibration Critique
• Rabin (2000)⁴ observed that over a small range of wealth, any smooth utility function is
approximately linear
• This makes sense, since risk aversion is related to the curvature of the utility function, not
the slope
• However, this means that expected utility maximizers should naturally become very nearly
risk-neutral when the stakes get small
• He uses this to basically argue that expected utility isn’t a useful theory at all levels –
that you can’t take seriously its implications on both large risks and small risks,
if you assume someone has a single, consistent expected utility function
• Rabin therefore argues that if people are risk-averse over small gambles,
it can’t be because their utility function is curved, but because it must be kinked at zero –
people get more disutility from a loss than utility from a gain, or are “loss-averse”
• (Ariel Rubinstein and others have responded that being that risk-averse “at any starting
wealth level” is unrealistic and that’s the “problem” here;
but I think it’s valid that concavity of u at the levels we want to assume for reasonable
“big-picture” behavior can’t generate substantial aversion to very small risky gambles;
if you don’t want to flip a coin to lose $1 or win $1.10, it can’t be because your u is concave)
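Rabin's point is easy to see numerically. A minimal sketch, assuming a CRRA agent with ρ = 2 at $50,000 of wealth (hypothetical but conventional numbers): the risk premium for a ±$10 coin flip is a fraction of a cent, while the premium for a ±$25,000 flip is enormous.

```python
def risk_premium(w, s, rho=2.0):
    """Expected value minus certainty equivalent of a fifty-fifty gamble
    over w - s and w + s, for CRRA utility u(x) = x^(1-rho)/(1-rho)."""
    u = lambda x: x ** (1 - rho) / (1 - rho)
    u_inv = lambda v: (v * (1 - rho)) ** (1 / (1 - rho))
    eu = 0.5 * u(w - s) + 0.5 * u(w + s)
    return w - u_inv(eu)    # the gamble's expected value is w

# Near risk-neutrality at small stakes: premium under a cent...
assert risk_premium(50_000.0, 10.0) < 0.01
# ...but substantial aversion at large stakes: premium over $5,000
assert risk_premium(50_000.0, 25_000.0) > 5_000
```

So curvature that matters for big risks is essentially invisible at small stakes, which is exactly why small-stakes risk aversion points toward a kink rather than concavity.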
⁴ M. Rabin (2000), “Risk Aversion and Expected-Utility Theory: A Calibration Theorem,” Econometrica 68(5).
5.2 Prospect Theory
Daniel Kahneman and Amos Tversky, “Prospect Theory” (1979) – the paper that basically launched
behavioral economics – showed a bunch of experiments demonstrating that people’s choices are often
inconsistent with expected utility maximization⁵
        A                              B
  $2500 w.p. .33
  $2400 w.p. .66                 $2400 w.p. 1
  $0    w.p. .01

        C                              D
  $2500 w.p. .33                 $2400 w.p. .34
  $0    w.p. .67                 $0    w.p. .66

(most subjects chose B over A, but C over D)
⁵ D. Kahneman and A. Tversky (1979), “Prospect Theory: An Analysis of Decision under Risk,” Econometrica 47(2).
• The Allais Paradox, to me, is really a rejection of the independence assumption
        X                              Y
  $2500 w.p. 33/34               $2400 w.p. 1
  $0    w.p. 1/34
• By completeness, either X ≽ Y or Y ≽ X
• And note that A = .34 ◦ X ⊕ .66 ◦ $2400 and B = .34 ◦ Y ⊕ .66 ◦ $2400,
while C = .34 ◦ X ⊕ .66 ◦ $0 and D = .34 ◦ Y ⊕ .66 ◦ $0;
so by independence, B ≻ A would require Y ≻ X, while C ≻ D would require X ≻ Y
• But there are no rational preferences with independence that allow B ≻ A and C ≻ D
• (To see more explicitly that expected utility maximization can’t allow B ≻ A and C ≻ D,
note that B ≻ A requires u(2400) > .33u(2500) + .66u(2400) + .01u(0),
or .34u(2400) > .33u(2500) + .01u(0);
but C ≻ D requires .33u(2500) + .67u(0) > .34u(2400) + .66u(0),
or .33u(2500) + .01u(0) > .34u(2400) – a contradiction
• So choice isn’t consistent with expected utility maximization, and seems to overvalue the
“sure thing”)
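In fact B − A and D − C are algebraically identical as expected-utility differences, so the two comparisons must always agree. A quick brute-force check over random utility values:

```python
import numpy as np

rng = np.random.default_rng(0)

# For any utility values u(0) < u(2400) < u(2500), expected utility ranks
# B over A exactly when it ranks D over C -- so B > A together with C > D
# is impossible for an expected-utility maximizer.
for _ in range(10_000):
    u0, u2400, u2500 = np.sort(rng.uniform(0, 1, size=3))
    A = .33 * u2500 + .66 * u2400 + .01 * u0
    B = u2400
    C = .33 * u2500 + .67 * u0
    D = .34 * u2400 + .66 * u0
    assert (B > A) == (D > C)    # the two comparisons always agree
```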
People overvalue small probabilities (skip?)
• At moderate probabilities people typically take the safer option,
but when both probabilities are small the pattern reverses:

    .01 ◦ $6000 ≻ .02 ◦ $3000
• (They interpret this as people overvaluing small probabilities, and argue this could
explain popularity of gambling as well as insurance)
• While they find people are typically risk-averse when it comes to gains,
they are often risk-loving with potential losses
• So while utility for money seems to be concave above the reference point, it seems to be convex below it!
• Kahneman and Tversky propose a different model – basically, using utility functions of
different shapes above and below a consumer’s starting point, and multiplying each utility term
by a function of the probability, not the probability itself – to try to capture the patterns
they found
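A minimal sketch of that kind of model, using the common parametric forms from Tversky and Kahneman's later (1992) work (the parameter values 0.88, 2.25, 0.61 are their estimates, used here purely for illustration):

```python
def value(x, alpha=0.88, lam=2.25):
    """Reference-dependent value: concave over gains, convex and steeper over losses."""
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha

def weight(p, gamma=0.61):
    """Probability weighting function that overweights small probabilities."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

# Loss aversion: a $100 loss hurts more than a $100 gain helps
assert -value(-100) > value(100)

# Overweighting small probabilities: with these parameters the model
# prefers .01 ∘ $6000 to .02 ∘ $3000, matching the observed pattern
assert weight(0.01) * value(6000) > weight(0.02) * value(3000)
```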
5.3 Discomfort over unknown uncertainty – the Ellsberg paradox
• One other fun example, introduced by Daniel Ellsberg
• a large container contains 300 balls; 100 are red, and the rest are some combination of white
and blue, but you don’t know exactly how many white and how many blue
• First, suppose you’re going to get $100 if you guess the color of the ball correctly; I’ll let you
guess either red or white – most people guess red
• Second, suppose you get $100 if you guess a color other than the color of the ball – most
people again guess red
• The first choice suggests people believe there are, on average, fewer white balls than red; the
second suggests they believe there are more white balls than red – a contradiction
• In fact, people instead seem to prefer gambles where they know the exact probability, rather
than bets where they don’t – they’re averse to “ambiguity”
6 One last point – rational choice
• So yes, particularly when it comes to choice under uncertainty –
but also in other areas of consumer theory –
there’s a significant literature on how real behavior (either experimental or observed) conflicts
with our model
• But that said, in a real sense, we don’t “need” people to always act rationally,
to believe there’s value in understanding a model of rational choice
⁶ Aumann (2019), “A Synthesis of Behavioural and Mainstream Economics,” Nature Human Behaviour 3.
⁷ David Friedman (1996), Hidden Order: The Economics of Everyday Life, pp. 4–5.
• For sure, in specific contexts,
people may behave in seemingly (and even systematically) “irrational” ways;
and if we know enough about a specific context,
we might have another model that offers more accurate and precise predictions
• and that’s why we’ve spent the last 7 weeks thinking about the question I posed day one –
“everyone optimizes; so what?”
• we’ve seen the implications when price-taking firms maximize their profits,
and when price-taking consumers maximize utility given a budget constraint;
seen what behavior is predicted by these models,
and therefore what observations would be consistent or inconsistent with them
• going forward, you all will be looking at similar questions – everyone optimizes, so what –
when we add in additional effects:
market-clearing – prices are affected by aggregate supply and demand –
or strategic concerns – my choices affect other people’s payoffs and vice versa –
or information – people’s behavior reveals information that may be relevant to me
• But the simple case is a key starting point to build out from
• I’ve enjoyed teaching you, and I wish you luck going forward