Gilboa Notes For Introduction To Decision Theory
Itzhak Gilboa
March 6, 2013
Contents
1 Preference Relations
2 Utility Representations
2.1 Representation of a preference order
2.2 Characterization theorems for maximization of utility
3 Semi-Orders
3.1 Just Noticeable Difference
3.2 A note on the proof
3.3 Uniqueness of the utility function
4 Choice Functions
7 de Finetti’s Theorem
7.1 Model and Theorem
7.2 Proof
8 Anscombe-Aumann’s Theorem
8.1 Model and Theorem
8.2 Proof
9 Savage’s Theorem
9.1 Set-up
9.2 Axioms
9.3 Results
9.3.1 Finitely additive measures
9.3.2 Non-atomic measures
9.3.3 Savage’s Theorem(s)
9.4 The proof and qualitative probabilities
13 References
These are notes for a basic class in decision theory. The focus is on decision
under risk and under uncertainty, with relatively little on social choice. The
notes contain the mathematical material, including all the formal models and
proofs that will be presented in class, but they do not contain the discussion of
background, interpretation, and applications. The course is designed for 30-40
hours.
1 Preference Relations
A set of ordered pairs of elements of a set X is called a binary relation on X. Formally, R is
a binary relation on X if R ⊂ X × X.
A binary relation R on X is
— reflexive if for every x ∈ X, xRx;
— complete if for every x, y ∈ X, xRy or yRx (or possibly both);
— symmetric if for every x, y ∈ X, xRy implies yRx;
— transitive if for every x, y, z ∈ X, xRy and yRz imply xRz.
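These properties are mechanical enough to check by brute force on a finite X. The following Python sketch (an added illustration, not part of the notes) encodes a relation as a set of ordered pairs; the example relation ≥ on {1, 2, 3} is reflexive, complete, and transitive, but not symmetric.

```python
from itertools import product

def is_reflexive(R, X):
    # xRx for every x in X
    return all((x, x) in R for x in X)

def is_complete(R, X):
    # xRy or yRx (or both) for every x, y in X
    return all((x, y) in R or (y, x) in R for x, y in product(X, X))

def is_symmetric(R, X):
    # xRy implies yRx
    return all((y, x) in R for (x, y) in R)

def is_transitive(R, X):
    # xRy and yRz imply xRz
    return all((x, z) in R
               for (x, y) in R for (w, z) in R if y == w)

# The relation ">=" on {1, 2, 3}
X = {1, 2, 3}
R = {(x, y) for x in X for y in X if x >= y}
```

Note that, as the proof below shows, completeness already forces reflexivity.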
Proof: Given x ∈ X, apply the definition of completeness to the pair (x, y) with
both elements equal to x. Then xRy or yRx, and in both cases this reads xRx. □
and
f (x) = { y | xRy } .
To see that (1) holds, assume, first, that xRy. Then y ∈ f (x) and by symmetry
also x ∈ f (y). Further, transitivity implies that z ∈ f (x) also satisfies z ∈ f (y)
and vice versa. Thus, f (x) = f (y). Conversely, if f (x) = f (y) we first note
that, by reflexivity, y ∈ f (y), hence y ∈ f (x) and xRy. □
The set { y | xRy } is called the equivalence class of x. The set A defined in
the proof, namely, the set of all equivalence classes (which obviously defines a
partition of X) is called the quotient set, denoted X/R.
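On a finite X, the equivalence classes and the quotient set can be computed directly from the definition. A small Python sketch (an added, assumed example, with "same parity" playing the role of R):

```python
def equivalence_class(R, X, x):
    # the equivalence class of x: { y | xRy }
    return frozenset(y for y in X if (x, y) in R)

def quotient_set(R, X):
    # X/R: the set of all equivalence classes; when R is reflexive,
    # symmetric, and transitive, this is a partition of X
    return {equivalence_class(R, X, x) for x in X}

# "same parity" on {0, ..., 5} has exactly two classes
X = set(range(6))
R = {(x, y) for x in X for y in X if x % 2 == y % 2}
classes = quotient_set(R, X)
```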
Proof: To see that ˜ is transitive, assume that x˜y and y˜z; we have (x % y
and y % x) as well as (y % z and z % y). The first two parts imply (by
transitivity of %) x % z, and the second two imply z % x, so we get x˜z.
To see that ≻ is transitive, assume x ≻ y and y ≻ z. That is, (x % y and
not y % x) as well as (y % z and not z % y). The first two parts imply x % z by
transitivity of % as above. We need to show that z % x does not hold. Indeed,
assume it did. Then we would have z % x and x % y, and transitivity (of %
again) would imply z % y, which is in contradiction to y ≻ z. Hence ¬(z % x)
and x ≻ z. □
2 Utility Representations
2.1 Representation of a preference order
Proof: (i) implies (ii): Let there be given x, y ∈ X. Assume first that
x ≻ y. If u(y) ≥ u(x), by (i) we have y % x, a contradiction to x ≻ y. Hence,
u(x) > u(y). Conversely, assume u(x) > u(y). If y % x we would have (by (i)
again) u(y) ≥ u(x), which isn’t true. Hence ¬(y % x). But completeness implies
that x % y or y % x has to hold, and if the latter doesn’t hold, the former does.
So we have x % y and ¬(y % x), that is, x ≻ y.
(ii) implies (i): Let there be given x, y ∈ X. Assume first that x % y.
If u(y) > u(x), by (ii) we have y ≻ x, a contradiction to x % y. Hence,
u(x) ≥ u(y). Conversely, assume that u(x) ≥ u(y). If x % y didn’t hold, we
would have, by completeness, y ≻ x, and then, applying (ii), u(y) > u(x), a
contradiction. Hence u(x) ≥ u(y) implies x % y.
(i) implies (iii): Let there be given x, y ∈ X. Assume first that x˜y. Then
x % y and y % x. Applying (i) we have u(x) ≥ u(y) as well as u(y) ≥ u(x), hence
u(x) = u(y). Conversely, assume that u(x) = u(y). Then we have u(x) ≥ u(y),
which implies (by (i)) x % y, as well as u(y) ≥ u(x) which implies y % x, and
x˜y follows. □
To see that (iii) is strictly weaker than (i) and (ii), take a representation u
of a relation % with more than one equivalence class, and define v = −u. Such
a v will still represent indifferences as in (iii) but not preferences as in (i) or (ii).
Proof: It is easy to see that (ii) implies (i), independently of the cardinality
of X. Indeed, if u represents %, the latter is complete because ≥ is complete on
the real numbers, and % is transitive because so is ≥ (on the real numbers).
The main part of the proof is to show that (i) implies (ii). To this end,
we can assume, without loss of generality, that the equivalence classes of ˜ are
singletons, that is, that no two distinct alternatives are equivalent. The reason is
that from each equivalence class of ˜, say A, we can choose a single representative
xA ∈ A. Clearly, if we restrict attention to % on { xA | A ∈ X/˜ } (where X/˜
denotes the quotient set, consisting of equivalence classes of ˜), the relation is still
complete and transitive and the cardinality of X/˜ is finite or countably infinite.
Thus, if we manage to prove that (i) implies (ii) in this restricted case (of
singleton equivalence classes), we will have a function u : { xA | A ∈ X/˜ } → R
that represents % on { xA | A ∈ X/˜ }. It remains to extend it to all of X in the
obvious way (that respects equivalence, that is, set u(y) = u(xA ) for all y ∈ A
and for every A ∈ X/˜).
(As we will shortly see, assuming that the equivalence sets are singletons
doesn’t make a huge difference. However, it’s a good exercise to go over this
reasoning if it’s not immediately obvious.)
So let us now turn to the proof that (i) implies (ii) when the equivalence
classes of ˜ are singletons. Let us start with a simple proof for the finite case:
assume that (i) holds, that X is finite, and define, for x ∈ X,
u (x) = # { y | x % y }
we basically counted, for each x, how many y’s it “beats”. This is as if an
alternative x collects “points” for its “victories” in the matches with other alternatives,
and we assume that the points for all alternatives are equal. However,
we could do the same trick with points that are not necessarily equal. Suppose
that, for each y there is a “weight” αy > 0. Then, defining
X
u (x) = αy
{ y | x% y }
(if this is a real number) the proof above goes through: transitivity proves that
x % y implies u (x) ≥ u (y), and, because αx > 0, x ≻ y implies u (x) > u (y).
All that is left is to choose weights αy > 0 such that the summation above is
always finite. This, however, can easily be done because X is countable. We can
take any enumeration of X, X = {x1 , ..., xn , ...}, and set αxn = 1/2^n . Since the
entire series ∑n αxn converges, u (x) is well-defined, that is, it is a real number
for every x.
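The weighted-count construction can be carried out verbatim in code for a finite initial segment of a countable X. In this Python sketch (an added illustration; the enumeration and the preference ≥ are assumed examples), each x collects the weight 2^{-(n+1)} of every x_n it weakly beats:

```python
def utility(enumeration, weakly_prefers):
    # weight 2^{-(n+1)} for the n-th element of the enumeration;
    # u(x) sums the weights of all y that x weakly beats
    weight = {x: 2.0 ** -(n + 1) for n, x in enumerate(enumeration)}
    return {x: sum(weight[y] for y in enumeration if weakly_prefers(x, y))
            for x in enumeration}

# Assumed example: preference over a few integers given by ">="
X_enum = [3, 1, 2, 0]
u = utility(X_enum, lambda x, y: x >= y)
```

Because every weight is strictly positive, strict preferences translate into strict utility gaps, exactly as in the argument above.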
Let us now look at a second proof, which uses induction. We consider an
enumeration of X, X = {x1 , ..., xn , ...}. Let Xn = {x1 , ..., xn } be the set consisting
of the first n elements of X according to this enumeration. Clearly, Xn ⊂ Xn+1
for all n ≥ 1 and X = ∪n≥1 Xn . We define u by induction: set u (x1 ) = 0 and
then, for each n ≥ 1, we are about to define u (xn+1 ) ∈ R given the definition
of u on Xn . We will prove that, according to this definition, for every n ≥ 1, if
u represents % on Xn , it will also represent % on Xn+1 , and then observe that
this means also that u represents % on X.
Observe that, when we say “u represents % on Xn ” we refer to the values of
u on Xn , which are the first n numbers defined in the proof. Formally speaking,
the function u on Xn is a different function than the function u defined on Xn+1 .
However, at stage n + 1 of the proof we only define u (xn+1 ) without changing
the values of u on Xn , and thus there is no need to use a different notation for
the function defined on the smaller set, Xn , and for its extension to the larger
set, Xn+1 .
The induction step is, however, trivial: given Xn and u that is defined on it,
let there be given xn+1 . If xn+1 ≺ xi for all i ≤ n, set u (xn+1 ) = mini≤n u(xi ) − 1.
Symmetrically, if xn+1 ≻ xi for all i ≤ n, set u (xn+1 ) = maxi≤n u(xi ) + 1.
Otherwise, xn+1 is “between” two alternatives xi , xj , that is, there are i, j ≤ n
such that
xi ≺ xn+1 ≺ xj
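The inductive step can be sketched as follows (Python; an added finite illustration with an assumed strict preference and no ties). For an element falling between two predecessors we take a midpoint, which is one concrete way of picking a value in the gap:

```python
def build_utility(enumeration, strictly_prefers):
    # assumes no ties: for distinct x, y exactly one of xPy, yPx holds
    u = {}
    for x in enumeration:
        below = [u[y] for y in u if strictly_prefers(x, y)]
        above = [u[y] for y in u if strictly_prefers(y, x)]
        if not u:
            u[x] = 0.0
        elif not above:                 # x beats all earlier elements
            u[x] = max(u.values()) + 1
        elif not below:                 # x loses to all earlier elements
            u[x] = min(u.values()) - 1
        else:                           # x is between two earlier elements
            u[x] = (max(below) + min(above)) / 2
    return u

X_enum = [5, 2, 9, 7]
u = build_utility(X_enum, lambda x, y: x > y)
```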
Proof: If there were such a function, then, for every value of x1 ∈ [0, 1] there
would be an open interval of utility values, I(x1 ) = (u (x1 , 0) , u (x1 , 1))
(with u (x1 , 1) > u (x1 , 0)), such that, for x1 > y1 , I(x1 ) and I(y1 ) are disjoint
(because u (x1 , 0) > u (y1 , 1)). However, on the real line we can only have
countably many disjoint open intervals (for instance, because each such interval
contains a rational number). □
One direction to follow (again, Debreu, 1959) is to assume that the set of
alternatives X is a topological space, and require that % be continuous with
respect to this topology. For example, in the case X = Rk , we define continuity
as follows: % is continuous if, for all x, y ∈ X and every {xn } ⊂ Rk such that
xn → x, (i) if xn % y for all n, then x % y, and (ii) if y % xn for all n, then
y % x.
We will not prove this theorem here. Rather, we follow the other direction,
due to Cantor (1915), where no additional assumptions are made on X. In
particular, it need not be a topological space, and if it happens to be one,
we still will not insist on continuity of u. Instead, Cantor used the notion of
separability: % is separable if there exists a countable set Z ⊂ X such that, for
every x, y ∈ X\Z, if x ≻ y, then there exists z ∈ Z such that x % z % y.
To see that the separability axiom has some mathematical power, and that it
may give us hope for utility representation, let us see why it rules out Debreu’s
example above. In this example, suppose that Z ⊂ X = [0, 1]2 is a countable
set. Consider its projection on the first coordinate, that is,
Clearly, Z1 is also countable. Consider x1 ∈ [0, 1]\Z1 and note that (x1 , 1) ≻
(x1 , 0). However, no element of Z can be between these two in terms of preference,
since it cannot have x1 as its first coordinate.
Proof of Theorem 9
As above, the discussion is simplified if we assume, without loss of generality,
that there are no equivalences.
We may go back to the two proofs of Theorem 5 and try to build on these
ideas. The second proof, by which the values of u (xn ) were defined by induction
on n, doesn’t generalize very gracefully. For a countable set, one can have an
enumeration of the elements, such that each one of them has only finitely many
predecessors. This allowed us to find a value for u (xn ) for each n, so that u
represented % on the elements up to xn . However, when X is uncountable, no
such enumeration exists. Thus, there will be (many) elements x of X that have
infinitely many predecessors. And then it might be impossible to find a value
u (x) that allows u to represent % on all the elements up to x. (For example,
assume that x ≻ z and yn ≻ x for all n, where we have already assigned the
values u (z) = 0 and u (yn ) = 1/n.)
However, the first proof does extend to the general set-up. Recall that, in
the countable case, we agreed that for each y there would be a “weight” αy > 0
and that, given these weights, we would define
u (x) = ∑{ y | x % y } αy .
You might think that, when X is uncountable, the corresponding idea would
be to have an integral (over all { y | x % y } for each x) instead of a sum. But
this would require a definition of an algebra on X (which is not the hard part)
and a definition of αy > 0 so that the function α· is integrable relative to
that algebra (which is harder). However, these complications are not necessary:
the separability requirement says that countably many elements “tell the entire
story”. Hence, we should take the sum not over all { y | x % y }, but only over
those elements of Z that are in this set.
Hence, for the proof that (i) implies (ii), let Z = {z1 , z2 , ...} and define
u(x) = ∑{ zi ∈ Z | x % zi } 1/2^i − ∑{ zi ∈ Z | zi ≻ x } 1/2^i .    (2)
Clearly u(x) ∈ R for all x (in fact, u(x) ∈ [−1, 1]). It is easy to see that x % y
implies u(x) ≥ u(y). To see the converse, assume that x ≻ y. If one of {x, y} is
in Z, u(x) > u(y) follows from the definition (2). Otherwise, invoke separability
to find z ∈ Z such that x % z % y and then use (2).
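Definition (2) can be computed directly. In this Python sketch (an added, assumed example: alternatives are real numbers compared by ≥, and Z is a small finite stand-in for the countable separating set), x = 0.3 and x = 0.4 receive the same value because no element of the chosen Z separates them, which is exactly why Z must be a separating set:

```python
def cantor_utility(x, Z, weakly_prefers):
    # u(x) = sum over z_i in Z with x % z_i of 2^{-i}
    #      minus sum over z_i in Z with z_i strictly preferred to x
    u = 0.0
    for i, z in enumerate(Z, start=1):
        if weakly_prefers(x, z):
            u += 2.0 ** -i
        else:        # by completeness, z % x and not x % z, i.e. z strictly beats x
            u -= 2.0 ** -i
    return u

# Assumed example: % is ">=" on reals, Z is a finite stand-in for Z = {z1, z2, ...}
Z = [0.0, 1.0, 0.5, 0.25]
prefers = lambda a, b: a >= b
```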
Another little surprise in this theorem, somewhat less pleasant, is how messy
the proof of the converse direction is. Normally we expect axiomatizations to
have sufficiency, which is a challenge to prove, and necessity which is simple. If
it is obvious that the axioms are necessary, they are probably quite transparent
and compelling. (If, by contrast, sufficiency is hard to prove, the theorem is
surprising in a good sense: the axioms take us a long way.) Yet, we should be
ready to sometimes work harder to prove the necessity of conditions such as
continuity, separability, and others that bridge the gap between the finite and
the infinite.
In our case, if we have a representation by a function u, and if the range
of u were the entire line R, we would know what to do: to select a set Z that
satisfies separability, take the rational numbers Q = {q1 , q2 , ...}, for each such
number qi select zi such that u(zi ) = qi , and then show that Z separates X.
The problem is that we may not find such a zi for each qi . In fact, it is even
possible that range(u) ∩ Q = ∅.
In some sense, we need not worry too much if a certain qi is not in the range
of u. In fact, life is a little easier: if no element will have this value, we will not
be asked to separate between x and y whose utility is this value. However, what
happens if we have values of u very close to qi on both sides, but qi is missing?
In this case, if we fail to choose elements with u-values close to qi , we may later
be confronted with x ≻ y such that u(x) ∈ (qi , qi + ε) and u(y) ∈ (qi − ε, qi ) and
we will not have an element z of Z with x % z % y.
Now that we see what the problem is, we can also find a solution: for each
qi , find a countable non-increasing sequence {zik }k such that
assuming that the sets on the right-hand sides are non-empty. The (countable)
union of these countable sets will do the job. □
v = f (u)
also represents %. (In Theorem 8 one would need to require that f be continuous
to guarantee that v is continuous on X, as is u. In the other two theorems we
don’t have a topology on X, and continuity of u or v is not defined. Therefore it
makes no difference if f is continuous or not.) For that reason, a utility function
u that represents % is called ordinal. Importantly, when we say that “u is ordinal”
we don’t refer to a property of the function u as a mathematical object per se,
but to the way we use it. Saying that u is ordinal is like saying “I’m using
the function u, but don’t take me too literally; it’s actually but an example of
a function, a representative of a large class of functions that are equivalent in
terms of their observable content, and any monotone transformation of u can
be used as “the” utility function just as well. I’ll try not to say anything that
depends on the particular function u and that would not hold true if I were to
replace u by a monotone transformation thereof, v.”
3 Semi-Orders
3.1 Just Noticeable Difference
or
S′/S > 1 + λ
or
log (S′) − log (S) > δ ≡ log (1 + λ) > 0.
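Weber's law and its logarithmic form can be sketched numerically (Python; the Weber fraction λ = 0.1 is an assumed value): taking u = log and δ = log(1 + λ) turns the ratio threshold into exactly the kind of comparison that an L-representation uses.

```python
import math

lam = 0.1                       # assumed Weber fraction (lambda)
delta = math.log(1 + lam)       # the jnd on the log scale

def noticeably_stronger(S_prime, S):
    # S' P S  iff  S'/S > 1 + lambda  iff  log(S') - log(S) > delta
    return math.log(S_prime) - math.log(S) > delta
```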
Inspired by this law, Luce (1956) was interested in strict preferences P that
can be described by a utility function u through the equivalence
If (3) holds for u : X → R and δ > 0, we say that the pair (u, δ) L-represents P .
Seeking to axiomatize L-representations, Luce considered the binary relation
P (on a set of alternatives X) as primitive. The relation P is interpreted as
strict preference, and I = (P ∪ P −1 )c is interpreted as absence of preference in
either direction, or “indifference”.1
Luce formulated three axioms, which are readily seen to be necessary for an
L-representation. He defined a relation P to be a semi-order if it satisfied these
three axioms, and showed that, if X is finite, the axioms are also sufficient for
L-representation.
To state the axioms, it will be useful to have a notion of concatenation of
relations. Given two binary relations B1 ,B2 ⊂ X × X, let B1 B2 ⊂ X × X be
defined as follows: for all x, y ∈ X,
We can finally state Luce’s axioms. The relation P (or (P, I)) is a semi-order
if:
L1. P is irreflexive (that is, xP x for no x ∈ X);
L2. P IP ⊂ P
L3. P P I ⊂ P .
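The concatenation operation and the three axioms are easy to test mechanically on finite examples. In this Python sketch (an added illustration), a jnd-type relation xPy ⇔ u(x) > u(y) + 1 passes, while the four-element relation P = {(x, y), (z, w)} used in part (iii) of the proposition below fails L2:

```python
from itertools import product

def compose(B1, B2):
    # B1B2 = { (x, y) : there is z with x B1 z and z B2 y }
    return {(x, y) for (x, z) in B1 for (w, y) in B2 if z == w}

def is_semi_order(P, X):
    # I: absence of preference in either direction (note xIx for all x)
    I = {(x, y) for x, y in product(X, X)
         if (x, y) not in P and (y, x) not in P}
    L1 = all((x, x) not in P for x in X)       # irreflexivity
    L2 = compose(compose(P, I), P) <= P        # PIP subset of P
    L3 = compose(compose(P, P), I) <= P        # PPI subset of P
    return L1 and L2 and L3

# A jnd-type relation: xPy iff x > y + 1 (utility is the identity, delta = 1)
X_jnd = [0.0, 0.5, 1.0, 1.6]
P_jnd = {(x, y) for x in X_jnd for y in X_jnd if x > y + 1}

# The relation of example (iii): violates L2
X_ex = ["x", "y", "z", "w"]
P_ex = {("x", "y"), ("z", "w")}
```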
For example, if X = R2 and P is defined by Pareto domination, P is transitive,
but you can verify that it satisfies neither L2 nor L3.
Conditions L2 and L3 restrict the indifference relation I. For the Pareto
relation P , the absence of preference, I, means intrinsic incomparability. Hence
we can have, say, xP zP wIy without being able to say much on the comparison
between x and y. It is possible that y is incomparable to any of x, z, w because
one of y’s coordinates is higher than the corresponding coordinate for all of
x, z, w. This is not the case if I only reflects the inability to discern small
differences. Thus, L2 and L3 can be viewed as saying that the incomparability
of alternatives, reflected in I, can only be attributed to issues of discernibility,
and not to fundamental problems as in the case of Pareto dominance.
Looking at Luce’s three conditions, you may wonder why not require also
L4. IP P ⊂ P .
The answer is that it follows from the previous two. More precisely:
Proof: (i) Assume L3. To see that L4 holds, let there be given x, y, z, w ∈ X
such that xIyP zP w. We need to show that xP w. If not, we have either wP x or
wIx. We argue that in either case, yP x. Indeed, if wP x, we have yP zP wP x.
Recall that P is transitive by L3. Hence yP x. If, however, wIx, we have
yP zP wIx and, by L3, yP x. However, this is a contradiction because xIy.
(ii) Assume L4. To see that L3 holds, let there be given x, y, z, w ∈ X such
that xP yP zIw. We need to show that xP w. If not, we have either wP x or wIx.
We argue that in either case, wP z. Indeed, if wP x, we have wP xP yP z and
(since L4 also implies transitivity of P ), wP z. If, however, wIx, then wIxP yP z
and L4 implies wP z. Thus, in both cases we obtain wP z, which contradicts
zIw.
(iii) Consider X = {x, y, z, w} and P = {(x, y) , (z, w)}. L3 and L4 hold
vacuously (as there are no chains of two P relations) but L2 doesn’t. (If it
did, we would have xP w because xP yIzP w, and, indeed, also zP y because
zP wIxP y.)
(iv) Consider X = {x, y, z, w} and P = {(x, y) , (y, z) , (x, z)}. L2 holds, as
P IP = {(x, z)} (because xP yIyP z but there is no other quadruple of elements
satisfying this chain of relations) and, indeed, (x, z) ∈ P . However, L3 does not
hold (if it did, xP yP zIw would imply xP w) nor does L4 (if it did, wIxP yP z
would imply wP z). □
We will not prove this theorem here, but we will make several comments
about it.
If we drop L3 (but do not add L4), we get a family of relations that Fishburn
(1970b, 1985) defined as interval relations. Fishburn proved that, if X is finite,
a relation is an interval relation if and only if it can be represented as follows:
for every x ∈ X we have an interval, (b(x), e(x)), with b(x) ≤ e(x), such that
that is, xP y iff the entire range of values associated with x, (b(x), e(x)), is higher
than the range of values associated with y, (b(y), e(y)).
Given such a representation, you can define u(x) = b(x) and δ(x) = e(x) −
b(x) to get an equivalent representation
Comparing (4) to (3), you can think of (4) as a representation with a variable
just noticeable difference, whereas (3) has a constant jnd, which is normalized
to 1.
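Fishburn's condition can be stated directly in code. A Python sketch (added; the three intervals are assumed examples, and whether the endpoint comparison is strict is a matter of convention):

```python
def interval_relation(intervals):
    # x P y iff the whole interval of x lies above that of y: b(x) > e(y)
    # (taking the strict inequality at the endpoints as our convention)
    return {(x, y)
            for x, (bx, ex) in intervals.items()
            for y, (by, ey) in intervals.items()
            if bx > ey}

# Assumed example; note the variable jnd delta(x) = e(x) - b(x)
iv = {"a": (0.0, 0.4), "b": (0.3, 0.5), "c": (0.9, 1.0)}
P = interval_relation(iv)
```

Here "b" overlaps "a", so neither is preferred to the other, while "c" lies strictly above both.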
3.2 A note on the proof
Claim 12 Q is transitive
Claim 13 E is transitive
Proof: Define ˜ as follows: x˜y if for every z, (xP z ⇔ yP z) and (zP x ⇔ zP y).
Clearly, ˜ is an equivalence relation. Also, x˜y implies xEy. To see the converse,
assume that xEy. If there exists z such that xP zP y (or yP zP x) then
xP y (yP x) and xEy cannot hold. Hence xP z ⇒ yP z and vice versa. Similarly,
zP x ⇒ zP y and vice versa. □
Moreover, one can get an L-representation of P by a function u that simultaneously
also satisfies
The functions u and v can be quite different on [0, 1]. But if x is such
that u(x) = 1, we will also have to have v(x) = 1. To see this, imagine that
v(x) > 1. Then there are alternatives y with v(y) ∈ (1, v(x)). This would mean
that, according to v, yP 0, while according to u, yI0, a contradiction.
The same logic applies to any point we start out with. That is, for every
x, y,
u(x) − u(y) = 1 ⇔ v(x) − v(y) = 1
and this obviously generalizes to (u(x) − u(y) = k ⇔ v(x) − v(y) = k) for every
k ∈ Z. Moreover, we obtain
we know that these inequalities will also hold for any other utility function v.
And the reason is, again, that utility differences became observable to a certain
degree: we have an observable distinction between “greater than the jnd” and
“smaller than (or equal to) the jnd”. This distinction, however coarse, gives
us some observable anchor by which we can measure distances along the utility
scale: we can count how many integer jnd steps exist between alternatives.
One can make a stronger claim if one recalls that the semi-orders were defined
for a given probability threshold, say, p = 75%. If one varies the probability,
one can obtain a different semi-order. Thus we have a family of semi-orders
{Pp }p>.5 . Under certain assumptions, all these semi-orders can be represented
simultaneously by the same utility function u, and a corresponding family of
jnd’s, {δ p }p>.5 such that
In this case, it is easy to see that the utility u will be unique to a larger degree
than before. We may even find that, as p → .5, δ p → 0, that is, that if we
are willing to make do with very low probabilities of detection, we will get very
low jnd’s, and correspondingly, any two functions u and v that L-represent the
semi-orders {Pp }p>.5 will be identical.
Observe that the uniqueness result depends discontinuously on the jnd δ:
the smaller δ is, the less freedom we have in choosing the function u, since
sup |u(x) − v(x)| ≤ δ. But when we consider the case δ = 0, we are back with a
weak order, for which u is only ordinal.
4 Choice Functions
The binary relation approach assumes that we observe choices between pairs
of alternatives. More generally, given a set of alternatives X, we may assume
that the choice is observed within various subsets of X, and not only between
pairs. Assume that X is finite, and denote by B ⊂ 2^X \{∅} the collection of
(non-empty) subsets of X that are choice sets, that is, such that choice within
each of them can be observed. This choice is assumed to be a subset of the set offered.
Thus we define a choice correspondence to be a function
C : B → 2^X \{∅}
with
C(B) ⊂ B ∀B ∈ B.
Further, we assume that B includes all subsets of size ≤ 3, so that the choice
functions we consider will be sufficiently informative.
This axiom states that, if in one context (B), where y was available (y ∈
B), x was chosen (x ∈ C(B)), then in any other context (B 0 ) where both are
available (x, y ∈ B 0 ), if y is good enough to be chosen (y ∈ C(B 0 )), then so is x
(x ∈ C(B 0 )). Thus, if in one instance x was observed to be at least as good as
y, we will never find that y is strictly better than x.
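WARP can be checked mechanically for a finite choice correspondence. In this Python sketch (an added illustration; both correspondences are assumed examples), choosing the maximum of each set satisfies the axiom, while a context-dependent chooser violates it:

```python
from itertools import combinations

def satisfies_warp(C):
    # C maps each choice set (a frozenset) to its non-empty chosen subset.
    # WARP: if x, y in B, x in C(B), and x, y in B' with y in C(B'),
    # then x in C(B') as well.
    for B, chosen in C.items():
        for x in chosen:
            for y in B:
                for Bp, chosenp in C.items():
                    if (x in Bp and y in Bp
                            and y in chosenp and x not in chosenp):
                        return False
    return True

X = [1, 2, 3]
sets = [frozenset(s) for r in (1, 2, 3) for s in combinations(X, r)]

# Choosing the maximum of every set satisfies WARP
C_good = {B: frozenset({max(B)}) for B in sets}

# A context-dependent chooser violates it: 1 beats 2 in {1, 2},
# yet 2 is chosen over 1 from {1, 2, 3}
C_bad = {frozenset({1, 2}): frozenset({1}),
         frozenset({1, 2, 3}): frozenset({2})}
```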
A choice function that satisfies WARP can be thought of as a binary relation.
To be precise, one may start with a choice function C, and, if it satisfies WARP,
define a binary relation %∗ such that C picks the %∗ -maximal elements in B for
every B ∈ B. Conversely, if one starts with a binary relation %, one may define
the choice function that selects the %-maximal elements (in B for every B ∈ B)
and show that it satisfies WARP. Details follow.
Let us first assume that a choice function C : B → 2^X \{∅} is given. Define
a binary relation %∗ = %∗ (C) as follows: for every x, y ∈ X, x %∗ y if (and only
if)2 there exists B ∈ B such that x, y ∈ B, and x ∈ C(B). That is, we say that
x %∗ y if there is a context in which x was revealed to be at least as desirable
as y.
Taking ≻∗ to be the asymmetric part of %∗ , we find that x ≻∗ y iff (i) for
at least one B ∈ B with x, y ∈ B, we have x ∈ C(B), but (ii) for no B ∈ B such
that x, y ∈ B is it the case that y ∈ C(B).
Note that, if x ≻∗ y, then there exists a B ∈ B such that x, y ∈ B, and x ∈ C(B)
but not y ∈ C(B). We could therefore say that x was “revealed to be strictly
preferred to” y. Indeed, it makes sense to define this formally: we write x ≻0 y
if there exists B ∈ B, with x, y ∈ B, such that x ∈ C(B) but y ∉ C(B). Thus,
x ≻∗ y implies x ≻0 y. But the converse isn’t generally true: it is possible that,
given one B, only x is chosen, and given another, B′, only y is chosen (while x, y
are in both B and B′). That is, the definition of ≻0 allows for the possibility
that x ≻0 y and y ≻0 x. By contrast, the definition of ≻∗ implies asymmetry:
if x ≻∗ y, we know that in some contexts (sets B ∈ B with x, y ∈ B), x was
chosen, but in none was y chosen.
C ∗ (B) = C ∗ (B, %) = { x ∈ B | x % y ∀y ∈ B}
(observe that this is not necessarily a choice function as we’re not guaranteed
that the set hereby defined is non-empty.)
We can now state formally the equivalence between binary orders that are
complete and transitive and choice functions that satisfy WARP. Let us start
with the more immediate result:
2 In case we haven’t mentioned this: definitions are always characterizations, that is, “if
and only if” statements. For this very reason, it is considered better style not to write “...
and only if” in definitions.
Proposition 14 If % is a weak order, then C ∗ (B, %) is a choice function that
satisfies WARP. Furthermore, the relation corresponding to C ∗ is %: % = %∗ (C ∗ ).
Conversely, let us now start with a choice function that satisfies WARP and
define the relation from it.
Proof: Let there be given a choice function C that satisfies WARP. To see
that %∗ = %∗ (C) is complete, consider B = {x, y} (which is in B as we assumed
that all sets with no more than three elements are in B). Because C({x, y}) ≠ ∅,
it has to be the case that x ∈ C({x, y}), and then x %∗ y, or y ∈ C({x, y}), and
then y %∗ x (or both).
To see that %∗ is transitive, assume that x %∗ y and y %∗ z, and we will
prove that x %∗ z. As x %∗ y, there exists D ∈ B such that x, y ∈ D and
x ∈ C(D). WARP then implies that the same would hold for D′ = {x, y} ∈ B:
x ∈ C({x, y}). Similarly, y %∗ z means that there exists some E ∈ B such
that y, z ∈ E and y ∈ C(E), and this implies also y ∈ C({y, z}). Let us now
consider B = {x, y, z} ∈ B. We need to show that x ∈ C({x, y, z}) (and then,
by definition of %∗ , x %∗ z is established). Assume that this is not the case, that
is, x ∉ C({x, y, z}). Can it be the case that y ∈ C({x, y, z})? The negative
answer is given by WARP: since x ∈ C({x, y}), x will be chosen whenever y
is (provided they are both available). Hence we find that x ∉ C({x, y, z})
implies y ∉ C({x, y, z}). But, by the same token, y ∉ C({x, y, z}) implies
z ∉ C({x, y, z}), and it follows that, if x ∉ C({x, y, z}), then C({x, y, z}) = ∅, a
contradiction to the definition of choice functions. Hence x ∈ C({x, y, z}) and
%∗ is transitive.
We now turn to show that, if we define C ∗ from the relation %∗ , we get the
function C that we started out with. That is, we wish to show that, for every
B ∈ B, C ∗ (B, %∗ ) = C(B).
Fix B. To see that C ∗ (B, %∗ ) ⊂ C(B), let x ∈ C ∗ (B, %∗ ), that is, x is a
%∗ -maximum in B. Choose y ∈ C(B). Since x is a %∗ -maximum in B, we know
that x %∗ y. By definition of %∗ , for some B 0 , x, y ∈ B 0 , x ∈ C(B 0 ). But then
WARP implies x ∈ C(B) and C ∗ (B, %∗ ) ⊂ C(B) is established.
To see the converse inclusion, namely, that, C(B) ⊂ C ∗ (B, %∗ ), let x ∈
C(B). By definition of %∗ , this implies that x %∗ y for every y ∈ B. That
is, x is a %∗ -maximum in B. But this, in turn, is precisely the definition of
C ∗ (B, %∗ ). Hence x ∈ C ∗ (B, %∗ ) and C(B) ⊂ C ∗ (B, %∗ ) also holds.
Finally, to see uniqueness of the relation %∗ , it suffices to consider the sets
B’s that are pairs, and to observe that C on these sets is sufficient to define %∗
uniquely. □
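The construction in the proof can be replayed on a small example (a Python sketch added here; X and the weak order ≥ are assumed): build C∗ from %, read off the revealed relation %∗, and check that it coincides with %.

```python
from itertools import combinations

def c_star(B, weakly_prefers):
    # C*(B, %): the %-maximal elements of B
    return frozenset(x for x in B if all(weakly_prefers(x, y) for y in B))

def revealed(C):
    # x %* y iff some B contains both x and y, and x is chosen from B
    return {(x, y) for B, chosen in C.items() for x in chosen for y in B}

X = [1, 2, 3]
prefers = lambda x, y: x >= y
sets = [frozenset(s) for r in (1, 2, 3) for s in combinations(X, r)]
C = {B: c_star(B, prefers) for B in sets}
```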
5 von Neumann-Morgenstern/Herstein-Milnor Theorem
In this section we present a theorem that is a combination of results by the
people whose names appear in the title. de Finetti was the first to indicate the
type of result he needed to have, and we’ll discuss the context of his result later
on. von Neumann and Morgenstern (vNM) had the famous theorem which we
will study later on. The theorem we present here is slightly more general than
the result they proved, as it will be used for other structures as well. The
generalized version suggested here is still a special case of the generalization of
vNM’s theorem provided by Herstein and Milnor (1953).
Let there be an underlying set A and suppose that we are interested in
objects of choice that are described as real-valued functions on A. Thus,
X ⊂ R^A .
A relation % ⊂ X × X will be assumed to satisfy the following three axioms:
A1. Weak order: % is complete and transitive.
A2. Continuity: For every x, y, z ∈ X, if x  y  z, there exist α, β ∈ (0, 1)
such that
αx + (1 − α)z  y  βx + (1 − β)z.
U (x) = cV (x) + d ∀x ∈ X.
(i) for every λ ∈ (0, 1),
x ≻ λx + (1 − λ)y ≻ y
x ≻ λx + (1 − λ)y ≻ μx + (1 − μ)y ≻ y.
x′ ≻ λ′ x′ + (1 − λ′ )y ≻ y
x ∼ λx + (1 − λ)y ∼ y
Proof: Let x ∼ y and assume that for some λ ∈ (0, 1), z ≡ λx + (1 − λ)y
does not satisfy z ∼ x. Assume that z ≻ x, y. (The proof for the case z ≺ x, y is
symmetric.) By the previous lemma, we know that
z ≻ αz + (1 − α)x ≻ x
z = λx + (1 − λ)y ≻ μx + (1 − μ)y ≻ x ∼ y.
Next consider μ > λ and observe that μx + (1 − μ)y ≻ y. Pick one such μ
and denote w = μx + (1 − μ)y, so that z ≻ w ≻ y.
Since w ≻ y, for every β ∈ (0, 1) we have
w ≻ βw + (1 − β) y ≻ y
but for β = λ/μ we have βw + (1 − β) y = λx + (1 − λ)y = z, so that w ≻ z, a
contradiction. Hence we have z ∼ x ∼ y. □
that is, that there exists ν < α such that νx + (1 − ν)z ≻ αy + (1 − α)z. But
this is in contradiction to
The next lemma is a key step in defining the utility value for an alternative:
Lemma 20 Assume that x, y, z ∈ X are such that x ≻ y and x % z % y. Then
there exists a unique α = α (x, y, z) ∈ [0, 1] such that z ∼ αx + (1 − α) y.
G = { α ∈ [0, 1] | αx + (1 − α)y ≻ z }
E = { α ∈ [0, 1] | αx + (1 − α)y ∼ z }
B = { α ∈ [0, 1] | αx + (1 − α)y ≺ z }
B = [0, α∗ ] ; G = (α∗ , 1]
or
B = [0, α∗ ) ; G = [α∗ , 1].
x ≻ z ≻ α∗ x + (1 − α∗ )y
λx + (1 − λ) [α∗ x + (1 − α∗ )y]
α∗ x + (1 − α∗ )y ≻ z ≻ y
λ [α∗ x + (1 − α∗ )y] + (1 − λ) y
for λ ∈ (0, 1).
αx + (1 − α)y ≻ z
Thus, continuity necessitates that both B and G be open intervals in [0, 1].
Since [0, 1] cannot be split into two disjoint open intervals, we find that E ≠ ∅.
□
It will be useful to have a notation for the alternatives that are, in terms of
preferences, in the range of a set of alternatives. For Y ⊂ X, define
[Y ]% = { x ∈ X | ∃g ∈ Y, g % x, and ∃b ∈ Y, x % b }
(We will only use this notation for finite, and rather small sets Y . Still, this
notation will save some lines.) For example, for two alternatives, b, g ∈ X such
that g % b,
[{b, g}]% = { x ∈ X | g % x % b }
In this case, we can also simplify notation and write [b, g]% for [{b, g}]% . With
this notation, we can state the following.
Proof: By Lemma 20, for every x ∈ [b, g]% there is a unique α = α (g, b, x) ∈
[0, 1] such that x ∼ αg + (1 − α) b. Define U (x) = α (g, b, x).
To see that U represents %, consider x, y ∈ [b, g]% . We have
and thus
x % y if f
32
which, in light of Lemma 17, is equivalent to
α (g, b, x) ≥ α (g, b, y)
or to
U (x) ≥ U (y) .
z = λx + (1 − λ)y    (6)
and from
y ∼ α(g, b, y)g + (1 − α(g, b, y))b.
Thus
Finally, we wish to prove that this U is unique. Assume that V : [b, g]≽ → R
is also affine and represents ≽. Define
c = V(g) − V(b) > 0 and d = V(b),
so that
V(x) = cUb,g(x) + d.
Clearly, we’re nearing the end of the proof. We have more or less what
we needed: an affine function that represents preferences. This function can be
defined over each preference interval separately, no matter how large it is. Thus,
if X happens to have a maximal and a minimal elements, we’re done: we only
need to apply Lemma 21 to the interval between the minimal and the maximal
element, which spans all of X. However, some more work is needed if maximal
or minimal elements fail to exist.
We define the function U as follows. If all elements in X are equivalent, we
set U (x) = 0 for all x. This function is affine, and it represents preferences.
Moreover, it is unique up to a positive affine transformation: any other function
that represents preferences has to be a constant as well. Otherwise, not all
elements of X are equivalent. Thus, there are b, g ∈ X such that g  b. Fix
these two alternatives until the end of the proof, and set U (b) = 0 and U (g) = 1.
For x ≠ b, g, define U(x) as follows:
(i) for x ∈ [b, g]≽, define U(x) = Ub,g(x);
(ii) for x ≻ g, define U(x) = 1/Ub,x(g), so that
for c = 1 − Ux,g(b) and d = Ux,g(b) (observe that 0 < c, d < 1).
Thus, for every x, U(x) is the unique number such that the vector (0, 1, U(x))
(which is not necessarily an increasing list of numbers) is an increasing affine
transformation of (U[{b,g,x}](b), U[{b,g,x}](g), U[{b,g,x}](x)). Put differently, for
each x there exists a unique function V[{b,g,x}] : [{b, g, x}] → R such that (i)
V[{b,g,x}] is an increasing affine transformation of U[{b,g,x}], so that V[{b,g,x}] is
affine and represents ≽ on [{b, g, x}]; (ii) V[{b,g,x}](b) = 0 and V[{b,g,x}](g) = 1.
And then U(x) = V[{b,g,x}](x).
We wish to show that U so defined satisfies the two conditions, namely,
that it represents preferences and that it is affine. Let there be given x, y ∈ X
and consider the set Y = {b, g, x, y}. We know that there exists an affine U[Y]
that represents preferences on all of [Y], Y = {b, g, x, y} included. It has a
unique increasing affine transformation, V[Y], that also satisfies V[Y](b) = 0 and
V[Y](g) = 1. Consider z ∈ [Y]. We wish to show that U(z) = V[Y](z). Indeed,
we know that U(z) = V[{b,g,z}](z); moreover, V[Y] (weakly) extends V[{b,g,z}]
from [{b, g, z}] to all of [Y]; since they are both affine, and both represent
preferences on [{b, g, z}], with V[{b,g,z}](b) = V[Y](b) = 0 and V[{b,g,z}](g) =
V[Y](g) = 1, we have V[{b,g,z}](·) = V[Y](·) on [{b, g, z}]. Hence V[{b,g,z}](z) = V[Y](z)
and U(z) = V[Y](z) follows. Because V[Y] represents preferences on [Y], we have,
in particular, x ≽ y iff V[Y](x) ≥ V[Y](y), that is, iff U(x) ≥ U(y).
6 vNM Expected Utility
6.1 Model and Theorem
Since the objects of choice are lotteries, the observable choices are modeled
by a binary relation on L, ≽ ⊂ L × L. The vNM axioms are:
Theorem 22 (vNM) ≽ ⊂ L × L satisfies V1-V3 if and only if there exists u :
X → R such that, for every P, Q ∈ L,
P ≽ Q iff Σ_{x∈X} P(x)u(x) ≥ Σ_{x∈X} Q(x)u(x).
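As a quick numerical illustration of the representation (outside the formal development), a lottery can be coded as a finitely supported probability assignment and compared by its expected utility. The utility values below are illustrative assumptions.

```python
# Sketch of the vNM representation: a lottery is a dict x -> P(x), and
# P >= Q iff sum_x P(x)u(x) >= sum_x Q(x)u(x).
def expected_utility(P, u):
    return sum(p * u[x] for x, p in P.items())

u = {"good": 1.0, "mid": 0.4, "bad": 0.0}   # illustrative utilities
P = {"good": 0.5, "bad": 0.5}               # 50-50 between best and worst
Q = {"mid": 1.0}                            # the middle outcome for sure

print(expected_utility(P, u))  # 0.5
print(expected_utility(Q, u))  # 0.4 -> P is preferred to Q
```

Note that the ranking of P against Q depends on where u places "mid" between the extremes, which is exactly the calibration question of the previous section.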
6.2 Proof
u (x) = U ([x])
The first lemmas of Theorem 16 are needed whichever way we look at the
vNM or Herstein-Milnor theorems. However, once we have established these, when
the time comes to define the utility function, there are two other ways to
continue. The proof provided above is relatively general, yet it makes use of very
little machinery. Moreover, it has the advantage of mimicking a process by
which the decision maker’s utility is calibrated. However, this proof does not
shed much light on the geometry of preferences. The following approaches add
something in this respect.
X = {x1, x2, x3} where x1 ≻ x2 ≻ x3. Every lottery in L is a vector (p1, p2, p3)
such that pi ≥ 0 and p1 + p2 + p3 = 1. For visualization, let us focus on the
probabilities of the best and worst outcomes. Formally, consider the p1p3 plane:
draw a graph in which the x axis corresponds to p1 and the y axis to p3. The
Marschak-Machina Triangle is
∆ = {(p1, p3) | p1, p3 ≥ 0, p1 + p3 ≤ 1}.
Thus, the point (1, 0) corresponds to the best lottery x1 (with probability 1),
(0, 0) to x2, and (0, 1) to the worst lottery x3. Every lottery P corresponds
to a unique point (p1, p3) in the triangle, and vice versa. We will refer to the
point (p1, p3) by P as well.
Consider the point (0, 0). By reasoning as in the previous proof, we conclude
that, along the segment connecting (1, 0) with (0, 1) there exists a unique point
which is equivalent to (0, 0). Such a unique point will exist along the segment
connecting (1, 0) with (0, c) for every c ∈ [0, 1]. The continuity axiom implies (in
the presence of the independence axiom) that these points generate a continuous
curve, which is the indifference curve of x2 .
Lemmas 17 and 18 imply that the indifference curves are linear. (Otherwise,
they will have to be “thick”, and for some c we will obtain intervals of indifference
on the segment connecting (1, 0) with (0, c).) We want to show that they are
also parallel.3
Consider two lotteries P ∼ Q. Consider another lottery R such that S =
R + (Q − P) is also in the triangle. (In this equation, the points are considered
as vectors in ∆.) We claim that R ∼ S. Indeed, if, say, R ≻ S, the independence
axiom would have implied ½R + ½Q ≻ ½S + ½Q, and, by P ∼ Q, also ½S + ½Q ∼
½S + ½P. We would have obtained ½R + ½Q ≻ ½S + ½P while we know that
these two lotteries are identical. (Not only equivalent, simply equal, because
S + P = R + Q.) Similarly, S ≻ R is impossible. That is, the line segment
3 You may suggest that linear indifference curves that are not parallel would intersect,
contradicting transitivity. But if the intersection is outside the triangle, such preferences may
well be transitive. See Chew (1983) and Dekel (1986).
connecting R and S is also an indifference curve. However, by P − Q = R − S
we realize that the indifference curve going through R, S is parallel to the one
going through P, Q. This argument can be repeated for practically every R if
Q is sufficiently close to P . (Some care is needed near the boundaries.) Thus
all indifference curves are linear and parallel.
The Independence axiom might bring to mind some high school geometry.
Geometrically, the Independence axiom states that indifference curves should
be parallel: consider P, Q, R, and draw a triangle whose base is PQ and whose
apex is R. Assume that P ∼ Q, so that the base of the triangle is an indifference
curve. Then, when you consider points on the edges PR and QR that are
proportionately removed from P (respectively, Q) in the direction of R, that is, αP + (1 − α)R
and αQ + (1 − α)R, you find that, by the Independence axiom, they are also
equivalent to each other. Thus, the segment connecting them is also part of
an indifference curve. But the proportionality means that we generated similar
triangles, and their bases are parallel.
Once we know that the indifference curves are linear and parallel, we’re more
or less done: linear and parallel lines can be described by a single linear function.
That is, one can choose two numbers a1 and a3 such that all the indifference
curves are of the form a1 p1 + a3 p3 = c (varying the constant c from one curve to
the other). Setting u(x1 ) = a1 , u(x2 ) = 0, and u(x3 ) = a3 , this is an expected
utility representation.
This argument can be repeated for any finite set of outcomes X. “Patching”
together the representations for all the finite subsets is done in the same way as
in the algebraic approach.
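A small numerical sketch of this geometry (with illustrative utility values, not taken from the text): in the triangle, expected utility is affine in (p1, p3), so every indifference curve is a line a1p1 + a3p3 = c with the same slope.

```python
# Sketch: in the Marschak-Machina triangle, the expected utility of
# (p1, p3) is u1*p1 + u2*(1 - p1 - p3) + u3*p3, so indifference curves
# are the parallel lines a1*p1 + a3*p3 = const with a1 = u1 - u2 and
# a3 = u3 - u2. The utility values are illustrative assumptions.
u1, u2, u3 = 1.0, 0.6, 0.0          # x1 > x2 > x3

def EU(p1, p3):
    return u1 * p1 + u2 * (1 - p1 - p3) + u3 * p3

a1, a3 = u1 - u2, u3 - u2
slope = -a1 / a3                     # dp3/dp1 along any indifference line

# Two points on the indifference curve through (0, 0), i.e., through x2:
p, q = (0.0, 0.0), (0.3, 0.3 * slope)
print(EU(*p), EU(*q))                # equal: both lie on x2's curve
```

The slope −a1/a3 does not depend on the curve's constant c, which is the "linear and parallel" conclusion of the argument above.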
Consider the sets
A = { P − Q ∈ R^X | P ≽ Q }
and
B = { P − Q ∈ R^X | Q ≻ P }.
where, for every R ∈ L, εR > 0. You may verify that this topology renders
vector operations continuous. (Observe that this is not the standard topology
on RX , even if X is finite, because εR need not be bounded away from 0. That
is, as we change the “target” R, the length of the interval coming out of P in
the direction of R, still inside the neighborhood, changes and may converge to
zero. Still, in each given direction R − P there is an open segment, leading from
P towards R, which is in the neighborhood.)
When we separate A from B by a linear functional, we can refer to the
functional as the utility function u. Linearity of the utility with respect to the
probability values guarantees affinity, i.e., that
u(αP + (1 − α)Q) = αu(P) + (1 − α)u(Q).
Since every P has a finite support, using this property inductively results in the
expected utility formula.
7 de Finetti’s Theorem
7.1 Model and Theorem
x ≽ y iff px ≥ py.
As a reminder, ∆^{n−1} is the set of probability vectors on {1, ..., n}. The
notation px refers to the inner product, that is, Σ_i p_i x_i, which is the expected
payoff of x relative to the probability p.
7.2 Proof
Let us first show that D1-D3 are equivalent to the existence of p ∈ Rⁿ such that
x ≽ y iff px ≥ py
for every x, y ∈ X.
Necessity of the axioms is immediate. To prove sufficiency, observe first that,
for every x, y ∈ X,
x ≽ y iff x − y ≽ 0.
Define
A = { x ∈ X | x ≽ 0 }
and
B = { x ∈ X | 0 ≻ x }.
We wish to show that they are convex. To this end, we start by observing
that, if x ≽ y, then x ≽ z ≽ y where z = (x + y)/2. This is true because, defining
d = (y − x)/2, we have x + d = z and z + d = y. D3 implies that x ≽ z ⇔ x + d ≽ z + d,
i.e., x ≽ z ⇔ z ≽ y. Hence z ≻ x would imply y ≻ z and y ≻ x, a contradiction.
Hence x ≽ z, and z ≽ y follows from x ≽ z.
Next we wish to show that if x ≽ y, then x ≽ z ≽ y for any z = λx + (1 − λ)y
with λ ∈ [0, 1]. If λ is a binary rational (i.e., of the form k/2^i for some k, i ≥ 1),
the conclusion follows from an inductive application of the previous claim (for
λ = 1/2). As for other values of λ, z ≻ x (or y ≻ z) would imply, by continuity,
the same preference in an open neighborhood of z, including binary rationals.
It follows that one can separate A from B by a linear function. That is,
there exists a linear f : X → R and a number c ∈ R such that
x ∈ A iff f(x) ≥ c.
One can verify that c may be taken to be 0 and, writing f(x) = px for some p ∈ Rⁿ,
x ≽ y
iff x − y ≽ 0
iff x − y ∈ A
iff f(x − y) ≥ 0
iff px ≥ py.
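A numerical sketch of the representation just obtained (the probability vector below is an illustrative assumption): preferences over payoff vectors are recovered by comparing inner products with p.

```python
# Sketch of de Finetti's representation: x >= y iff p.x >= p.y for a
# probability vector p. The vector p is an illustrative assumption.
p = [0.2, 0.5, 0.3]

def dot(p, x):
    return sum(pi * xi for pi, xi in zip(p, x))

def weakly_preferred(x, y):
    return dot(p, x) >= dot(p, y)

x = [10, 0, 0]    # pays 10 in state 1
y = [0, 4, 4]     # pays 4 in states 2 and 3
print(dot(p, x), dot(p, y))   # 2.0 vs 3.2 -> y is preferred
```

Note that the comparison depends only on x − y, mirroring the step "x ≽ y iff x − y ≽ 0" in the proof.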
8 Anscombe-Aumann’s Theorem
8.1 Model and Theorem
Anscombe-Aumann’s model has states of the world, and derives subjective prob-
abilities on them, as does de Finetti’s. However, in this model it is not assumes
that the outcomes are real numbers; rather, the outcomes are vNM lotteries.
So we have two levels of uncertainty: first, we do not know which state of the
world will obtain, and we don’t even have a probability for that uncertainty.
Second, given a state, the decision maker will be facing a lottery with known,
objective probabilities as in the vNM model.
Formally, we use the set-up introduced by Fishburn (1970). As a re-
minder, the vNM lotteries are
L = { P : X → [0, 1] | #{x | P(x) > 0} < ∞, Σ_{x∈X} P(x) = 1 }
and this set is endowed with a mixing operation: for every P, Q ∈ L and every
α ∈ [0, 1], αP + (1 − α)Q ∈ L is given by
(αP + (1 − α)Q)(x) = αP(x) + (1 − α)Q(x).
The state space is S. We wish to state that acts are functions from S to
L. In general we would need to endow S with a σ-algebra, and deal with
measurable and bounded acts. Both of these terms have to be defined in terms
of preferences, because we don’t have yet a utility function. Instead, we will
simplify our lives and assume that S is finite. However, the theorem holds also
for general measurable spaces.
The set of acts is F = L^S. We will endow F with a mixture operation as
well, performed pointwise. That is, for every f, g ∈ F and every α ∈ [0, 1],
αf + (1 − α)g ∈ F is given by
(αf + (1 − α)g)(s) = αf(s) + (1 − α)g(s).
P ≽ Q, understood as fP ≽ fQ where, for every R ∈ L, fR ∈ F is the constant
act given by fR(s) = R for all s ∈ S.
The interpretation is that, if the decision maker chooses f ∈ F and Nature
chooses s ∈ S, a roulette wheel is spun, with distribution f (s) over the outcomes
X, so that your probability to get outcome x is f (s)(x).
For a function u : X → R we will use the notation
E_P u = Σ_{x∈X} P(x)u(x)
for P ∈ L.
Thus, if you choose f ∈ F and Nature chooses s ∈ S, you will get a lottery
f(s), which has the expected u-value of
E_{f(s)} u = Σ_{x∈X} f(s)(x)u(x).
Anscombe-Aumann’s axioms are the following. The first three are identical
to the vNM axioms. Observe that they now apply to more complicated crea-
tures: rather than to specific vNM lotteries, we now deal with functions whose
values are such lotteries, or, if you will, with vectors of vNM lotteries, indexed
by the state space S. The next two axioms are almost identical to de Finetti’s
last two axioms, guaranteeing monotonicity and non-triviality:
Theorem 24 (Anscombe-Aumann) ≽ satisfies AA1-AA5 if and only if there
exist a probability measure μ on S and a non-constant function u : X → R such
that, for every f, g ∈ F,
f ≽ g iff ∫_S (E_{f(s)} u) dμ(s) ≥ ∫_S (E_{g(s)} u) dμ(s).
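For a finite S, the double expectation in the theorem is a finite sum, which the following sketch computes (μ and u below are illustrative assumptions, not derived from axioms).

```python
# Sketch of the Anscombe-Aumann representation: an act f maps states to
# lotteries; f >= g iff sum_s mu(s) * E_{f(s)} u >= the same for g.
mu = {"s1": 0.6, "s2": 0.4}        # illustrative subjective probabilities
u = {"win": 1.0, "lose": 0.0}      # illustrative vNM utility on outcomes

def EP(P, u):                       # objective (roulette) expectation
    return sum(p * u[x] for x, p in P.items())

def V(f):                           # subjective expected utility of an act
    return sum(mu[s] * EP(f[s], u) for s in mu)

f = {"s1": {"win": 1.0}, "s2": {"win": 0.5, "lose": 0.5}}
g = {"s1": {"win": 0.5, "lose": 0.5}, "s2": {"win": 1.0}}
print(V(f), V(g))                   # 0.8 vs 0.7 -> f is preferred
```

The two layers of the model are visible in the code: EP averages over the roulette lottery at a given state, and V averages those values over states with the subjective μ.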
8.2 Proof
The first part of the proof is a direct application of Theorem 16. The objects
of choice can be thought of as matrices whose columns are states in S and whose
rows are outcomes in X. For the sake of concreteness, let's assume that
X is also finite, as is S. Then, every act f can be thought of as a matrix of
non-negative numbers, such that in each column (that is, for every state s), the
numbers sum up to 1 (defining a probability distribution over the outcomes in
X). Viewed thus, an act f is an extreme point of the set F if, at each and
every column s, it assigns probability 1 to an outcome x. Thus, there are |X|^{|S|}
extreme points, and F is their convex hull.
The first three axioms mean that we can have a representation of ≽ by
an affine function U. We now wish to show that this affine function can be
represented as
U(f) = Σ_{x,s} f(s)(x) u(x, s)    (7)
for some u : X × S → R.
proof would be immediate. However, the set F has the additional constraint
that Σ_x f(s)(x) = 1 for each s separately, and this means that it has many
more extreme points, and a bit more needs to be said to obtain (7).
Let us choose x∗ ∈ X and shift U so that U ([x∗ ] , ..., [x∗ ]) = 0. This can be
done without loss of generality. Next, define, for each s,
u (x∗ , s) = 0.
us(·) ≡ u(·, s) : L → R,
that is, to have us be defined for all lotteries on X, with u(x, s) = u([x], s) =
us([x]); that is, to define us in such a way that the degenerate lottery [x],
assigning probability 1 to x, has the same value as the outcome x. (Obviously,
this is an abuse of notation, but we're accustomed to such sins by now.)
For P ∈ L, s ∈ S, define hP,s ∈ F by
hP,s(s′) = P if s′ = s, and hP,s(s′) = [x*] if s′ ≠ s,
and
us(P) = U(hP,s).
That is, us(P) is the U value of the act ([x*], ..., [x*], P, [x*], ..., [x*]) that obtains [x*] at each state s′ ≠ s
and takes the value P at s, where n = |S|.
Given f ∈ F, consider the act
f′ = (1/n) f + (1 − 1/n) ([x*], ..., [x*]).
We can think of f′ as the mixture of f and the constant act [x*]; because of our
definition of the mixture operation as a pointwise operation, affinity of U and
the normalization U(([x*], ..., [x*])) = 0 yield
U(f′) = (1/n) U(f).    (8)
On the other hand, we can also think of f′ as the n-fold mixture of acts,
each of which equals [x*] in all but one state. Formally, define gs ∈ F by
gs(s′) = h_{f(s),s}(s′) = f(s) if s′ = s, and [x*] if s′ ≠ s,
so that f′ = Σ_s (1/n) gs. Affinity of U then implies
(1/n) U(f) = U(f′) = Σ_s (1/n) U(gs)
and
U(f) = Σ_s U(gs).
Next, note that, by definition of gs, which is ([x*], ..., [x*], f(s), [x*], ..., [x*]),
and the definition of us(·), we get
U(gs) = us(f(s)),
so that
U(f) = Σ_s us(f(s)).
It only remains to note that, at each and every state s,
us(f(s)) = Σ_x f(s)(x) u(x, s),
as in the reasoning in the vNM case (where affinity of U yields the result
immediately, as the extreme points are the degenerate lotteries). ∎
v(x, s) = u(x, s) + β_s
then we get a matrix v that also satisfies (10): indeed, for every f ∈ F,
Σ_{x,s} f(s)(x) v(x, s) = Σ_{x,s} f(s)(x) [u(x, s) + β_s]
= Σ_{x,s} f(s)(x) u(x, s) + Σ_{x,s} f(s)(x) β_s
= Σ_{x,s} f(s)(x) u(x, s) + Σ_s β_s Σ_x f(s)(x)
= Σ_{x,s} f(s)(x) u(x, s) + Σ_s β_s
because, for every f ∈ F and s ∈ S, f(s) is a vNM lottery, so that Σ_x f(s)(x) =
1. Thus, shifting the utility numbers u(x, s) by a constant β_s in column s (for
every x) results in a constant shift (by Σ_s β_s) of U(f) = Σ_{x,s} f(s)(x) u(x, s), and thus in a new
matrix that still represents preferences as in (10).
Let us pick an outcome x∗ ∈ X and henceforth assume that u (x∗ , s) = 0 for
all s. In view of the above, this restriction entails no loss of generality. One may
verify that the remaining degree of freedom is only a positive multiplication of
all {u (x, s)}x,s (by the same positive number).
Define, for each s ∈ S,
us(x) = u(x, s).
We wish to show that the functions us are non-negative multiples of a single function u : X → R. More precisely,
we will distinguish between two types of states: those that are "null", intuitively
corresponding to having a zero subjective probability, and that do not matter for
the decision, and those that are "non-null", intuitively corresponding to positive
subjective probabilities. For any two non-null states s, s′, we wish to show that
us′ is a positive multiple of us. Then, we can fix one function u : X → R and
write u(x, s) = us(x) = μ_s u(x) for some μ_s > 0. Without loss of generality,
assume that the coefficients μ_s are normalized so that they sum up to 1. This
allows us to think of them as probabilities, writing
Σ_{x,s} f(s)(x) u(x, s) = Σ_{x,s} f(s)(x) μ_s u(x)
= Σ_s μ_s Σ_x f(s)(x) u(x)
= Σ_s μ_s (E_{f(s)} u),
namely, an expected utility: the inner expression is the expectation of u
relative to the objective probabilities given by the lottery f(s), and these
expectations are integrated with respect to the probability vector μ, interpreted as the
decision maker's subjective probability over the state space S.
us(x) = u(x, s).
Because f(s)(x) and g(s)(x) are independent of s, this can be written as
Σ_x Σ_s f(s)(x) u(x, s) ≥ Σ_x Σ_s g(s)(x) u(x, s),
that is,
Σ_x f(s)(x) u(x) ≥ Σ_x g(s)(x) u(x).
In other words, the sum of state-utilities, u = Σ_{s∈S} us, is a vNM function
that represents preferences over constant acts. We now wish to show that for
every s there is μ_s ≥ 0 such that us(·) = μ_s u(·).
Lemma Let a, b ∈ Rⁿ be such that
az ≥ 0 ⇒ bz ≥ 0
for every z ∈ Rⁿ with Σ_i z_i = 0. Then there are λ ≥ 0 and c ∈ R such that
b_i = λa_i + c
for every i ≤ n.
To see this, consider the linear programming problem (P): minimize bz
subject to
az ≥ 0
1z = 0.
The condition
az ≥ 0 ⇒ bz ≥ 0  ∀z ∈ Rⁿ, 1z = 0
is equivalent to (P) being bounded, which is equivalent to its dual being feasible.
The dual will have two variables, say, λ for the first constraint and c for the
second. Its objective function is
0λ + 0c = 0,
that is, the dual is
Max_{λ,c} 0
subject to
λa_i + c = b_i ∀i, λ ≥ 0,
and its feasibility yields
b_i = λa_i + c.
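A numerical sanity check of the easy direction of the lemma (with illustrative vectors): if b = λa + c·1 with λ ≥ 0, then az ≥ 0 implies bz ≥ 0 for every z with 1z = 0, because the constant c is wiped out by Σ_i z_i = 0.

```python
# Sketch checking the lemma's direction: with b = lambda*a + c*1 and
# lambda >= 0, we have bz = lambda*az whenever sum(z) = 0, so
# az >= 0 implies bz >= 0. Vectors a, lam, c are illustrative.
import random

a = [1.0, -2.0, 0.5, 3.0]
lam, c = 2.0, -7.0
b = [lam * ai + c for ai in a]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

random.seed(0)
for _ in range(1000):
    z = [random.uniform(-1, 1) for _ in a]
    z = [zi - sum(z) / len(z) for zi in z]     # force 1z = 0
    if dot(a, z) >= 0:
        assert dot(b, z) >= -1e-9              # bz = lam * az >= 0
```

The harder converse direction is what the LP duality argument above delivers.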
Monotonicity implies that, whenever this is the case (f(s) ≽ g(s)), we have
f ≽ g. However, f ≽ g is equivalent to
Σ_{x,s′} f(s′)(x) u(x, s′) ≥ Σ_{x,s′} g(s′)(x) u(x, s′)
and, since f(s′) = g(s′) for s′ ≠ s, also to
Σ_x f(s)(x) us(x) ≥ Σ_x g(s)(x) us(x)
or
Σ_x [f(s)(x) − g(s)(x)] us(x) ≥ 0,
that is,
[f(s)(·) − g(s)(·)] b ≥ 0.
Consider a vNM lottery P such that P(x) = 1/n for all x (where now n = |X|). Select f such
that f(s) = P. For z ∈ (−1/n, 1/n)ⁿ with 1z = 0, select g such that f(s′) = g(s′) for
s′ ≠ s and g(s)(x) = 1/n − z(x), so that f(s)(·) − g(s)(·) = z. For every such z
we therefore get that az ≥ 0 ⇒ bz ≥ 0. Due to homogeneity, this also implies
that az ≥ 0 ⇒ bz ≥ 0 holds for every vector z ∈ Rⁿ with 1z = 0. By the lemma, we
have λ ≥ 0 and c ∈ R such that
us(x_i) = λu(x_i) + c.
Since us(x*) = 0 and u(x*) = 0, we get c = 0, and hence
us(x) = λ_s u(x) ∀x ∈ X.
Conversely, if λ_s = 0, it follows that us(x) vanishes for all x, and then s is null.
Thus, λ_s > 0 iff s is non-null (and λ_s = 0 iff s is null). Since there are non-null
states, Σ_s λ_s > 0, and we can define
μ_s = λ_s / Σ_{s′} λ_{s′},
which delivers the representation for all f ∈ F.
9 Savage’s Theorem
9.1 Set-up
Savage’s model includes two primitive concepts: states and outcomes. The set
of states, S, should be thought of as an exhaustive list of all scenarios that might
unfold. An event is any subset A ⊂ S. There are no measurability constraints,
and S is not endowed with an algebra of measurable events. If you wish to
be more formal about it, you can define the set of events to be the maximal
σ-algebra, Σ = 2S , with respect to which all subsets are measurable.
The set of outcomes will be denoted by X. An outcome x is assumed to
specify all that is relevant to your well-being, insomuch as it may be relevant to
your decision.
The objects of choice are acts, which are defined as functions from states to
outcomes, and denoted by F . That is,
F = X S = {f | f : S → X} .
Acts whose payoffs do not depend on the state of the world s are constant
functions in F . We will abuse notation and denote them by the outcome they
result in. Thus, x ∈ X is also understood as x ∈ F with x(s) = x.
Since the objects of choice are acts, Savage assumes a binary relation ≽ ⊂
F × F. The relation will have its symmetric and asymmetric parts, ∼ and ≻,
defined as usual. It will also be extended to X with the natural convention.
Specifically, for two outcomes x, y ∈ X, we say that x ≽ y if and only if the
constant function that always yields x is related by ≽ to the constant function
that always yields y.
For two acts f, g ∈ F and an event A ⊂ S, define an act f_A^g by
f_A^g(s) = g(s) if s ∈ A, and f_A^g(s) = f(s) if s ∈ A^c.
Think of f_A^g as "f, where on A we replaced it by g".
An event A is null if, for every f, g ∈ F , f ∼A g. That is, if you know
that f and g yield the same outcomes if A does not occur, you consider them
equivalent.
9.2 Axioms
P1 ≽ is a weak order.
P4 For every A, B ⊂ S and every x ≻ y, z ≻ w,
y_A^x ≽ y_B^x iff w_A^z ≽ w_B^z.
P6 For every f ≻ g and every h ∈ X, there is a finite partition {A_1, ..., A_n} of S such that, for every i,
f_{A_i}^h ≻ g and f ≻ g_{A_i}^h.
9.3 Results
9.3.1 Finitely additive measures
and μ(Ω) = 1. Condition (11) is referred to as σ-additivity.
Finite additivity is the condition that μ(A ∪ B) = μ(A) + μ(B) whenever
A ∩ B = ∅, which is clearly equivalent to (11) if you replace ∞ by any
finite n:
μ(∪_{i=1}^{n} A_i) = Σ_{i=1}^{n} μ(A_i)    (12)
whenever i ≠ j ⇒ A_i ∩ A_j = ∅.
Setting B_n = ∪_{i=1}^{n} A_i, (11) means
μ(lim_{n→∞} B_n) = μ(∪_{i=1}^{∞} A_i) = Σ_{i=1}^{∞} μ(A_i) = lim_{n→∞} μ(B_n),
that is, σ-additivity of μ is equivalent to saying that the measure of the limit is
the limit of the measure, when increasing sequences of events are concerned.
In the case of a σ-additive μ, all three definitions coincide. But this is not
true for finite additivity. Moreover, the condition that Savage needs, and the
condition that turns out to follow from P6, is the strongest.
Hence, we will define a finitely additive measure μ to be non-atomic if for
every event A with μ(A) > 0, and for every r ∈ [0, 1], there is an event B ⊂ A
such that μ(B) = rμ(A).
Observe that this theorem restricts u to be bounded. (Of course, this was not
stated in Theorem 27 because when X is finite, u is bounded.) The boundedness
of u follows from P3. Indeed, if u is not bounded one can generate acts whose
expected utility is infinite (following the logic of the St. Petersburg Paradox).
This, in and of itself, is not an insurmountable difficulty, but P3 will not hold for
such acts: you may strictly improve f from, say, x to y on a non-null event A,
and yet the resulting act will be equivalent to the first one, both having infinite
expected utility. Hence, as stated, P3 implies that u is bounded. An extension
of Savage’s theorem to unbounded utilities is provided in Wakker (1993a).5
A corollary of the theorem is that an event A is null if and only if μ(A) = 0. In
Savage’s formulation, this fact is stated on par with the integral representation
(13).
Savage’s proof is too long and involved to be covered here. Savage (1954) de-
velops the proof step by step, alongside conceptual discussions of the axioms.
Fishburn (1970) provides a more concise proof, which may be a bit laconic, and
Kreps (1988, pp. 115-136) provides more details. Here I will only say a few
words about the strategy of the proof, and introduce another concept in this
context.
Savage first deals with the case |X| = 2. That is, there are two outcomes,
say, 1 and 0, with 1 ≻ 0. Thus every f ∈ F is characterized by an event A,
that is, f = 1_A. Correspondingly, ≽ ⊂ F × F can be thought of as a relation
≽ ⊂ Σ × Σ with Σ = 2^S.
In this set-up P4 has no bite. Let us translate P1-P3 and P5 to the language
of events. P1 would mean, again, that ≽ (understood as a relation on events)
is a weak order. P2 is equivalent to the condition:
A ≽ B iff A ∪ C ≽ B ∪ C
whenever C ∩ (A ∪ B) = ∅.
Fishburn reports that this became obvious during a discussion they had later on.
A binary relation on an algebra of events that satisfies these conditions was
defined by de Finetti to be a qualitative probability. The idea was that subjective
judgments of "at least as likely as" on events that satisfied certain regularities
might be representable by a probability measure, that is, that a probability
measure μ would satisfy
A ≽ B iff μ(A) ≥ μ(B).    (14)
If such a measure existed, and if it were unique, one could use the likelihood
comparisons ≽ as a basis for the definition of subjective probability. Observe
that such a definition would qualify as a definition by observable data if you are
willing to accept judgments such as "I find A at least as likely as B" as valid
data.6
de Finetti conjectured that every qualitative probability has a (quantitative)
probability measure that represents it. It turns out that this is true if |S| ≤ 4,
but a counterexample can be constructed for |S| = 5. Such a counterexample was
found by Kraft, Pratt, and Seidenberg (1959), who also provided a necessary
and sufficient condition for the existence of a representing measure.
You can easily convince yourself that even if such a measure exists, it will
typically not be unique. The set of measures that represent a given qualitative
probability is defined by finitely many inequalities. Generically, one can expect
that the set will not be a singleton.
However, Savage found that for |X| = 2 his relation was a qualitative prob-
ability defined on an infinite space, which also satisfied P6. This turned out to
be a powerful tool. With P6 one can show that every event A can be split into
two, B ⊂ A and A\B, such that B ∼ A\B.7 Equipped with such a lemma,
one can go on to find, for every n ≥ 1, a partition of S into 2ⁿ equivalent events,
Π_n = {A_1^n, ..., A_{2ⁿ}^n}. Moreover, using P2 we can show that the union of every k
events from Π_n is equivalent to the union of any other k events from the same
partition. Should there be a probability measure μ that represents %, it has to
6 We will discuss such cognitive data in Part IV.
7 Kopylov (2007) provides a different proof, which also generalizes Savage’s theorem.
satisfy μ(A_i^n) = 1/2ⁿ and μ(∪_{i=1}^{k} A_i^n) = k/2ⁿ.
Given an event B such that S ≻ B, one may ask, for every n, what is the
number k such that
∪_{i=1}^{k+1} A_i^n ≻ B ≽ ∪_{i=1}^{k} A_i^n.
Should μ represent ≽, it would have to satisfy
(k + 1)/2ⁿ > μ(B) ≥ k/2ⁿ.
With a little bit of work one can convince oneself that there is a unique μ(B)
that satisfies the above for all n. Moreover, it is easy to see that
B ≽ C implies μ(B) ≥ μ(C).    (15)
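A numerical sketch of this dyadic construction (with an illustrative setting, not Savage's abstract one): take S = [0, 1) with μ equal to length, and simulate the likelihood comparisons by μ itself; the partition counts k(n)/2ⁿ then converge to μ(B).

```python
# Sketch of Savage's dyadic approximation: with S = [0, 1), mu = length,
# and the n-th partition into 2^n equal intervals, k(n) is the largest k
# such that the union of k atoms is weakly less likely than B.
# Likelihood comparisons are simulated by mu itself (illustrative).
def k(n, mu_B):
    """Largest k with k/2^n <= mu_B."""
    return int(mu_B * 2**n)

mu_B = 0.3                         # an event B of measure 0.3
for n in (1, 4, 10, 20):
    print(n, k(n, mu_B) / 2**n)    # 0.0, 0.25, 0.2998..., -> 0.3
```

The squeeze (k + 1)/2ⁿ > μ(B) ≥ k/2ⁿ is what pins down the unique candidate value μ(B) across all n.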
The problem then is that the converse is not trivial. In fact, Savage provides
beautiful examples of qualitative probability relations, for which there exists a
unique μ satisfying (15) but not the converse direction.
Here P6 is used again. Savage shows that P6 implies that % (applied to
events) satisfies two additional conditions, which he calls fineness and tightness.
(Fineness has an Archimedean flavor, while tightness can be viewed as a conti-
nuity of sorts.) With these conditions, it can be shown that the only μ satisfying
(15) satisfies also
B ≻ C implies μ(B) > μ(C).
conditions. He shows a qualitative probability relation that has a unique μ satisfying (15),
which is fine but not tight, and one which is tight but not fine, and neither of these has a
probability that represents it as in (14).
proceed. He first shows that if two acts have the same distribution (with finite
support), according to μ, they are equivalent. This means that, for a finite X,
one can deal with equivalence classes defined by distributions over outcomes.
Then Savage proves that the preference relation over these classes satisfies the
vNM axioms, and finally he extends the representation to an infinite X.
10 Choquet Expected Utility
10.1 Capacities and Choquet Integration
with the convention x_{m+1} = 0. If v is additive, this integral is equivalent to
the Riemann integral (and to Σ_{j=1}^{m} x_j v(E_j)). You can also verify that (16)
is equivalent to the following definition, which applies to any bounded non-
negative f (even if S were infinite, as long as f were measurable with respect
to the algebra on which v is defined):
∫_S f dv = ∫_0^∞ v(f ≥ t) dt
where the integral on the right is a standard Riemann integral. (Observe that
it is well defined, because v(f ≥ t) is a non-increasing function of t.)
For functions that may be negative, the integral is defined so that, for every
function f and constant c,
∫_S (f + c) dv = ∫_S f dv + c,
a property that holds for non-negative f and c. So we make sure the property
holds in general: given a bounded f, take a c > 0 such that g = f + c ≥ 0, and define
∫_S f dv = ∫_S g dv − c.
10.2 Comonotonicity
The Choquet integral has many nice properties: it respects "shifts", namely,
the addition of a constant, as well as multiplication by a positive constant. It
is also continuous and monotone in the integrand. But it is not additive in
general. Indeed, if we had
∫_S (f + g) dv = ∫_S f dv + ∫_S g dv
for every f and g, we could take f = 1_A and g = 1_B for disjoint A and B, and
show that v(A ∪ B) = v(A) + v(B).
However, there are going to be pairs of functions f, g for which the Choquet
integral is additive. To see this, observe that (16) can be re-written as
∫_S f dv = Σ_{j=1}^{m} x_j [v(∪_{i=1}^{j} E_i) − v(∪_{i=1}^{j−1} E_i)].
Assume, without loss of generality, that each E_i is a singleton. (This is possible
because we only required a weak inequality x_j ≥ x_{j+1}.) That is, there is some
permutation of the states, π : S → S, defined by the order of the x_i's, such that
∪_{i=1}^{j} E_i consists of the first j states in this permutation. Given this π, define a
probability vector p_π on S by p_π(∪_{i=1}^{j} E_i) = v(∪_{i=1}^{j} E_i). It is therefore true that
∫_S f dv = ∫_S f dp_π,
that is, the Choquet integral of f equals the integral of f relative to some
additive probability pπ . Note, however, that pπ depends on f . Since different
f ’s have, in general, different permutations π that rank the states from high f
values to low f values, the Choquet integral is not additive in general.
Assume now that two functions, f and g, happen to have the same permu-
tation π. They will have the same p_π and then
∫_S f dv = ∫_S f dp_π and ∫_S g dv = ∫_S g dp_π.
In other words, if f and g are two functions such that there exists a permu-
tation of the states π, according to which both f and g are non-increasing, we
will have additivity of the integral for f and g. When will f and g have such a
permutation? It is not hard to see that a necessary and sufficient condition is
the following:
f and g are comonotonic if there are no s, t ∈ S such that f (s) > f (t) and
g(s) < g(t).
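The comonotonicity condition can be checked numerically: for comonotonic pairs the Choquet integral is additive, while an anti-comonotonic pair can break additivity. The capacity below is an illustrative non-additive one.

```python
# Sketch: the Choquet integral is additive for comonotonic f, g (they
# share a ranking permutation pi, hence the same p_pi), but can fail
# additivity otherwise.
def choquet(f, v):
    states = sorted(f, key=lambda s: f[s], reverse=True)
    total, union, v_prev = 0.0, set(), 0.0
    for s in states:
        union.add(s)
        v_cur = v(frozenset(union))
        total += f[s] * (v_cur - v_prev)
        v_prev = v_cur
    return total

v = lambda A: (len(A) / 2) ** 2       # illustrative non-additive capacity
f = {"s": 3.0, "t": 1.0}
g = {"s": 2.0, "t": 0.0}              # comonotonic with f
h = {"s": 0.0, "t": 2.0}              # NOT comonotonic with f

fg = {k: f[k] + g[k] for k in f}
fh = {k: f[k] + h[k] for k in f}
print(choquet(fg, v), choquet(f, v) + choquet(g, v))  # 2.0 and 2.0
print(choquet(fh, v), choquet(f, v) + choquet(h, v))  # 3.0 vs 2.0
```

The failure for f and h reflects the fact that f + h is constant: the sum destroys the hedging that the non-additive v rewards, which is exactly the phenomenon exploited by uncertainty aversion in the next section.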
For two acts f, g ∈ F, we say that f and g are comonotonic if there are no
s, t ∈ S such that f(s) ≻ f(t) and g(s) ≺ g(t).
(where the integrals are in the sense of Choquet). Furthermore, in this case v
is unique, and u is unique up to positive linear transformations.
still inside the set. Applying Theorem 16, one gets an equivalent of Anscombe-
Aumann representation, restricted to the cone of π-non-decreasing vectors. For
this cone, we therefore obtain a representation by a probability vector pπ . One
then proceeds to show that all these probability vectors can be described by a
single non-additive measure v.
11 Maxmin Expected Utility
11.1 Model and Theorem
Thus, uncertainty aversion requires that the decision maker have a preference
for mixing: two equivalent acts can only improve by mixing, or "hedging",
between them. Observe that uncertainty aversion is also a weakened version
of Anscombe-Aumann's independence axiom (which would have required αf +
(1 − α)g ∼ f whenever f ∼ g).
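A numerical sketch of why mixing helps under the maxmin criterion (the set of priors below is an illustrative assumption): two bets that are equivalent under the worst-case evaluation strictly improve when hedged.

```python
# Sketch of maxmin expected utility: J(f) = min over a set of priors of
# the expected utility of f. The priors are illustrative (p1 in [0.3, 0.7]).
priors = [(p1, 1 - p1) for p1 in (0.3, 0.4, 0.5, 0.6, 0.7)]

def J(f):
    return min(p1 * f[0] + p2 * f[1] for p1, p2 in priors)

f = (1.0, 0.0)            # bet on state 1
g = (0.0, 1.0)            # bet on state 2
m = (0.5, 0.5)            # the 50-50 mixture of f and g
print(J(f), J(g), J(m))   # 0.3, 0.3, 0.5: mixing strictly helps
```

The mixture is evaluated the same way by every prior, so the adversarial minimum cannot punish it; this is the uncertainty-aversion inequality J(½f + ½g) ≥ min{J(f), J(g)} holding strictly.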
11.2 Idea of Proof
f ≽ g ⇔ J(f) ≥ J(g)
by letting
J((c, c, ..., c)) = c
for the certainty equivalent c of f, so that
J(f) = c.
a_f f = J(f)
a_f g ≥ J(g) ∀g
and
J(f) = min_g a_g f
for all α ∈ [0, 1]. This implies that the supporting hyperplane defined by af has
to coincide with J on the segment connecting f and (c, c, ..., c). Hence
af (c, c, ..., c) = c
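With J(f) defined as the minimum over a set of priors of the expected value of f, the minimizing prior plays the role of a_f. A sketch (all numbers invented; a_f is taken to be the minimizing extreme prior) checks the three properties used above: a_f · f = J(f), a_f · g ≥ J(g) for every g, and a_f · (c, ..., c) = c because a_f sums to one:

```python
def dot(p, f):
    # Expected value of f under prior p.
    return sum(p[s] * f[s] for s in f)

priors = [{'s': 0.3, 't': 0.7}, {'s': 0.7, 't': 0.3}]

def J(f):
    # Worst-case expectation over the set of priors.
    return min(dot(p, f) for p in priors)

f = {'s': 1.0, 't': 0.0}
a_f = min(priors, key=lambda p: dot(p, f))   # supporting prior at f

g = {'s': 0.0, 't': 1.0}
c = 0.4
const = {'s': c, 't': c}

print(dot(a_f, f) == J(f))   # True: the hyperplane touches J at f
print(dot(a_f, g) >= J(g))   # True: and lies (weakly) above J everywhere
print(dot(a_f, const))       # close to 0.4 = c, since a_f sums to one
```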
12 Arrow’s Impossibility Theorem
Assume that there is a set of alternatives A = {1, 2, ..., m} with m ≥ 3 and a
set of individuals N = {1, 2, ..., n} with n ≥ 2.
Let the set of linear orderings be R = {≻ ⊂ A × A | ≻ is complete, transitive,
and asymmetric}.
A preference aggregation function maps profiles of preferences to a preference
that is attributed to society. That is, a preference aggregation function is f :
Rn → R.
Given such a function, define:
1. Unanimity: For all a, b ∈ A, if a ≻_i b ∀i ∈ N, then a f((≻_i)_i) b.

2. Independence of Irrelevant Alternatives (IIA): For all a, b ∈ A and all profiles (≻_i)_i, (≻′_i)_i,
if

a ≻_i b ⇔ a ≻′_i b   ∀i ∈ N

then

a f((≻_i)_i) b ⇔ a f((≻′_i)_i) b.
Proof: Clearly, all (the n different) dictatorial functions satisfy the two conditions.
The interesting (in fact, amazing) fact is that the converse is true as
well. We turn to prove it now (based on one of the short proofs provided by
Geanakoplos, 2005).
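The first, easy direction can even be checked exhaustively for a tiny case. The sketch below (with m = 3 and n = 2; the helper names `above` and `dictator` are ad hoc) enumerates all 36 profiles and verifies that a dictatorial rule satisfies both Unanimity and IIA:

```python
from itertools import permutations, product

alts = ('a', 'b', 'c')
orders = list(permutations(alts))           # the 6 strict linear orders

def above(order, x, y):
    # x is ranked above y in this order.
    return order.index(x) < order.index(y)

def dictator(profile):
    # Society simply adopts individual 0's order.
    return profile[0]

profiles = list(product(orders, repeat=2))  # n = 2 individuals

# Unanimity: if everyone ranks x above y, so does society.
for prof in profiles:
    soc = dictator(prof)
    for x in alts:
        for y in alts:
            if x != y and all(above(o, x, y) for o in prof):
                assert above(soc, x, y), "unanimity violated"

# IIA: the social ranking of {x, y} depends only on individual
# rankings of {x, y}.
for p1 in profiles:
    for p2 in profiles:
        for x in alts:
            for y in alts:
                if x == y:
                    continue
                if all(above(p1[i], x, y) == above(p2[i], x, y)
                       for i in range(2)):
                    assert above(dictator(p1), x, y) == \
                           above(dictator(p2), x, y), "IIA violated"

print("dictatorial rule satisfies Unanimity and IIA on all",
      len(profiles), "profiles")
```

The theorem's content is of course the converse: with m ≥ 3, no non-dictatorial f survives the same exhaustive check.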
Assume not. Then there exists a profile (≻_i)_i and an alternative a such
that a is extreme in each of the ≻_i but not in ≻. Thus, there are b, c ∈ A such that
b ≻ a ≻ c. We can modify the profile (≻_i)_i to get another profile (≻′_i)_i such
that

(i) a is top (bottom) at ≻′_i whenever it is top (bottom) at ≻_i;

(ii) c ≻′_i b for all i

— simply by switching between b and c, if needed, in each ≻_i.

The ranking between a and any other alternative has not changed (it is the
same in ≻′_i as in ≻_i for each i), and, by IIA, a is ranked, relative to b and c,
in ≻′ = f((≻′_i)_i) as it was in ≻ = f((≻_i)_i). Thus, b ≻′ a ≻′ c, while unanimity
implies c ≻′ b, a contradiction. □
b ≻_i c ⇒ b ≻ c   ∀b, c ∈ A\{a}
(I)  d ≻_j a   ∀j ≤ i
     a ≻_j d   ∀j > i
⇒ d ≻ a

and

(II) d ≻_j a   ∀j < i
     a ≻_j d   ∀j ≥ i
⇒ a ≻ d.
Given distinct b, c ∈ A\{a}, consider a profile in which

b ≻_i a ≻_i c
d ≻_j a   ∀d ≠ a, j < i
a ≻_j d   ∀d ≠ a, j > i.

Then on {b, a} preferences look like pattern (I), and b ≻ a follows. On {c, a}
preferences look like pattern (II), and a ≻ c follows. Hence b ≻ c. Finally,
due to IIA, this has to be the case whenever the individuals have the same
rankings between b and c as in such profiles. However, the b/c rankings of the
other individuals were not constrained above, which means that individual i's
ranking between b and c determines that of society. □
a ≻_{i(c)} b        (17)
b ≻_{i(a)} c
c ≻_{i(b)} a

which is possible unless i(a) = i(c) = i(b). However, in such a profile society's
preferences would be cyclical. Hence, it has to be the case that there is no profile
for which (17) happens; that is, it has to be the case that i(a) = i(c) = i(b),
and the conclusion follows. □
13 References
(Not all of the following are mentioned above, but many of them might be
mentioned in class.)
de Finetti, B. (1930), “Funzione caratteristica di un fenomeno aleatorio,” Atti
Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Nat., 4: 86-133.

——— (1937), “La Prévision: Ses Lois Logiques, Ses Sources Subjectives,”
Annales de l’Institut Henri Poincaré, 7: 1-68.

Ellsberg, D. (1961), “Risk, Ambiguity and the Savage Axioms,” Quarterly Journal
of Economics, 75: 643-669.

Fishburn, P. C. (1970a), Utility Theory for Decision Making. New York: John
Wiley and Sons.

——— (1985), Interval Orders and Interval Graphs. New York: Wiley and
Sons.
Gilboa, I. and R. Lapson (1995), “Aggregation of Semi-Orders: Intransitive
Indifference Makes a Difference”, Economic Theory, 5: 109-126.
Karni, E., D. Schmeidler and K. Vind (1983), “On State Dependent Preferences
and Subjective Probabilities,” Econometrica, 51: 1021-1031.
Knight, F. H. (1921), Risk, Uncertainty, and Profit. Boston, New York: Houghton
Mifflin.
Maccheroni, F., M. Marinacci, and A. Rustichini (2006a), “Ambiguity Aversion,
Robustness, and the Variational Representation of Preferences,” Econometrica,
74: 1447-1498.
Savage, L. J. (1954), The Foundations of Statistics. New York: John Wiley and
Sons. (Second edition in 1972, Dover.)
Shafer, G. (1986), “Savage Revisited”, Statistical Science, 1: 463-486.