Neuro Economics
2015
Greetings to Grandma Käthi, Elisabeth Eiche, and Wilhelm and Margarete Busch.
Behavioral Game Theory
HH Nax, 2015
Habilitation Committee:
Thesis outline
In one of his last publications, “The Agencies Method for Modeling Coalitions
and Cooperation in Games” (IGTR 10: 539-564, 2008), John F. Nash wrote
I feel, personally, that the study of experimental games is the proper route of
travel for finding “the ultimate truth” in relation to games as played by human
players. But in practical game theory the players can be corporations or states;
so the problem of usefully analyzing a game does not, in a practical sense, re-
duce to a problem only of the analysis of human behavior. It is apparent that
actual human behavior is guided by complex human instincts promoting cooper-
ation among individuals and that moreover there are also the various cultural
influences acting to modify individual human behavior and at least often to
influence the behavior of humans toward enhanced cooperativeness.
To most, this quote will come as a surprise because the name John Nash is
associated, not with behavioral or experimental research, but rather with the
mathematics of game theory and, in particular, with the ‘Nash equilibrium’.
The Nash equilibrium is an interactive solution concept mostly applied to
idealized decision-making situations amongst infallible optimizers in the sense
of pure material self-interest. But before we turn to Nash's contributions and to the scope of the present thesis in addressing some of the issues raised above, let us first consider the origins and broad evolution of the fields of game theory and of the study of behavior in games.
Beginning with the publication of “The Theory of Games and Economic Be-
havior” by John von Neumann and Oskar Morgenstern in 1944 (Princeton
University Press), the study of human interactions by what has since become known as "game theory" has revolutionized the sciences, especially the social sciences and biology. In economics, for instance, game theory changed the way equilibrium concepts are understood, and eleven Nobel Prizes have since gone to game theorists. Game theory provides a sharp language for formulating mathematical models of underlying interactions that promise clean predictions; such models are now integral parts of the social-science toolbox.
A game is defined by a mapping from various combinations of “strategies”
taken by the involved “players” into resulting consequences in terms of “pay-
offs”. A “solution” predicts which outcomes of the game are to be expected.
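In standard notation (a minimal formalization of this description; the notation is ours), a game and a solution can be written as
\[
\Gamma = \big(N, (S_i)_{i \in N}, (u_i)_{i \in N}\big), \qquad u_i : S_1 \times \dots \times S_n \to \mathbb{R},
\]
where $N = \{1, \dots, n\}$ is the set of players, $S_i$ the strategy set of player $i$, and $u_i$ the payoff function. A solution concept then selects a subset of strategy combinations as the predicted outcomes; the Nash equilibrium, for instance, selects those $s^*$ with $u_i(s^*) \geq u_i(s_i, s^*_{-i})$ for all players $i$ and all $s_i \in S_i$.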
A major issue with traditional/neoclassical game theory, however, has been
that its solution concepts, such as the Nash equilibrium (John F. Nash, 1950)
or the strong equilibrium (Robert Aumann, 1959), rely on four rather extreme
behavioral and informational assumptions. These are:
1. The joint strategy space is common knowledge.
2. The payoff structure is common knowledge.
3. Players have correct beliefs about other players’ behaviors and beliefs.
4. Players optimize their behavior so as to maximize their own material
payoffs.
In the real world, the whole ensemble of these assumptions is often untenable. Players often do not behave like infallible optimizers in the sense of pure material self-interest, and it would be negligent to dismiss the resulting consistent and structured deviations as inexplicable irrationalities. "Behavioral game theory", the title and subject of this thesis, seeks to model these deviations.
Broadly speaking, behavioral game theory is separable into two strands of
models.
The first strand of models presumes that the mismatch between theory and
real-world behaviors may be the result of capacity and/or informational con-
straints. Hence, decisions are not best described by strictly maximizing be-
havior. In particular, maximization fails when players have incomplete information about the structure of the game and/or about the payoff consequences that their own and others' actions have for all players involved. In a repeated game setting, moreover, players may be unable, or only imperfectly able, to observe other players' actions and payoffs as the game goes on. Hence,
to describe more realistic human behaviors in complex game settings, models
of boundedly rational behavior, possibly allowing for learning dynamics, are
necessary.
The second strand of explanations holds that behaviors which consistently contradict equilibrium predictions based on the standard assumptions of self-interest and unbounded rationality may arise because players are guided by alternative preferences. In other words, if the assumption of material self-interest is flawed and, instead, higher-order motives such as altruism or social norms guide a player's actions, then predictions based on self-interest are misguided even if players follow maximizing behavior. It is not that players' behavior fails to be maximizing; rather, their maximand is something other than narrow self-interest.
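As a simple illustration of such an alternative maximand (a generic linear form given for exposition only; Chapter 1 works with a nonlinear, Cobb-Douglas-type variant), consider
\[
u_i = (1 - \alpha_i)\,\pi_i + \alpha_i\,\bar{\pi}_{-i}, \qquad \alpha_i \in [0, 1],
\]
where $\pi_i$ is player $i$'s own material payoff, $\bar{\pi}_{-i}$ the average material payoff of the other players, and $\alpha_i$ the weight placed on others. A player with $\alpha_i = 0$ is the standard self-interested optimizer; a player with $\alpha_i > 0$ still maximizes, but maximizes something other than narrow self-interest.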
The two strands of explanations both have their respective appeals, and which
model is preferable will depend on the context of the application. To describe
the trading behavior of agents on financial markets, for example, one may favor the first type of explanation: the intention, but failure, to maximize own material payoff. Similarly, such an approach may be preferable to describe behavior in
traffic/congestion games. By contrast, richer preference formulations allowing
for, for example, reciprocal considerations may be suitable to model volun-
tary contributions in situations such as community effort tasks or collective
bargaining. Of course, in reality, we would expect an admixture of both ex-
planations to matter in most situations, and their relative degrees to depend
on the precise context and setting of the game.
“Behavioral game theory”, with the subtitle “Experiments in strategic inter-
action”, is also the title of one of the first and best-known textbooks that
introduce this area of research to a broader audience (Camerer 2003; Prince-
ton University Press). Therein, the two strands of explanations sketched above
are expertly summarized and reviewed. This thesis builds on this body of work,
its aim being to synthesize the two approaches. New methods to combine and
to disentangle the two are proposed and applied to different games, illustrating
the need for a more nuanced theory, allowing for context-dependent behavior
in games. The thesis consists of theoretical and behavioral studies. Moreover, it proposes a theoretical model of the complex decision-making of coalitions of individuals, and not just of individuals. The thesis is structured
as follows:
Chapter 1. Introduction
Chapter 2. Learning
out as the winner, but explanations related to social preferences, especially
conditional cooperation, continue to matter.
This chapter is joint work with Maxwell Burton-Chellew (Department of Zo-
ology and Magdalen College, University of Oxford) and Stuart West (Depart-
ment of Zoology and Nuffield College, University of Oxford). It is published
in the Proceedings of the Royal Society B (“Payoff-based learning explains
the decline in cooperation in public goods games”, Proceedings of the Royal
Society of London B 282: 20142678, 2015). Maxwell Burton-Chellew and the
author are joint first authors.
and option-implied returns. The analysis reveals super-exponential growth
expectations leading up to the Global Financial Crisis.
This chapter is joint work with Matthias Leiss (Department of Humanities, So-
cial and Political Sciences, ETH Zürich) and Didier Sornette (Department of
Management, Technology and Economics, ETH Zürich & Swiss Finance Insti-
tute, University of Geneva). It is published in the Journal of Economic Dynam-
ics and Control (“Super-exponential growth expectations and the Global Fi-
nancial Crisis”, Journal of Economic Dynamics and Control 55: 1-13, 2015).
Conclusion
Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
1 Introduction:
Estimating social preferences 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Preference estimation . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4 Interactive preferences . . . . . . . . . . . . . . . . . . . . . . . 26
2 Learning:
Directional learning and the provisioning of public goods 41
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.4 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5 Complex cooperation:
Agreements with multiple spheres of cooperation 138
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
5.2 A worked example . . . . . . . . . . . . . . . . . . . . . . . . . 142
5.3 The model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
5.4 Coalitional stability and the core . . . . . . . . . . . . . . . . . 153
5.5 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . 158
6 Dynamics of financial expectations:
Super-exponential growth expectations and crises 164
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
6.2 Materials and methods . . . . . . . . . . . . . . . . . . . . . . . 169
6.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
6.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
8 Conclusion 292
Chapter 1
Introduction:
Estimating social preferences
Abstract
concern for other agents, this suggests a population bias toward pro-
In this paper, we argue that the consensus has settled on this expla-
are in line with the pursuit of pure material self-interest are found to
ingly when playing the two different games. Hence, learning and social
Acknowledgements. This research was supported by the ERC, Nuffield Col-
lege and the Calleva Research Centre, Magdalen College. We thank Dan Wood
for several important discussions and Stefano Balietti for the experimental de-
sign of the ‘meritocracy’ project (http://nodegame.org/games/merit/). We
are also grateful to Yoshi Saijo and members of the Research Center for So-
cial Design Engineering at Kochi University of Technology for helpful com-
ments.
Motivation.
“The reader will find that the public goods environment is a very
sensitive one. Many factors interact with each other in unknown
ways. Nothing is known for sure. [. . . ] There appear to be three
types of players: dedicated Nash players who act pretty much as
predicted by game theory with possibly a small number of mistakes,
a group of subjects who will respond to self-interest as will Nash
players if the incentives are high enough but who also make mistakes
and respond to decision costs, fairness, altruism, etc., and a group
of subjects who behave in an inexplicable (irrational?) manner.
Casual observation suggests that the proportions are 50 percent, 40
percent, 10 percent in many subject pools. Of course, we need a lot
more data before my outrageous conjectures can be tested.”
(Ledyard, 1995)
1.1 Introduction
Research from over three decades has gone into understanding how people behave in such public goods situations (see an early review by Ledyard, 1995). Typically, the design of such experiments has meant that the strictly dominant strategy is to contribute nothing (at least in the one-shot game and/or in the final stage of a repeated game), so that players are predicted to contribute nothing provided they conform to models based on rationality and material self-interest (homo oeconomicus). The evidence
from laboratory experiments, however, has been that many players contribute
substantially and consistently, and therefore their behaviour contradicts the
rational homo oeconomicus model. Subsequently, the consensus seems to have
settled on explanations of this phenomenon according to which many people
are not purely self-interested but prefer to consciously sacrifice own mate-
rial payoff to increase the welfare of others instead (see a more recent review
by Chaudhuri, 2011). An important question is whether this social preference explanation captures the actual thought process of agents or is more of an 'as if' explanation.
Importantly, the social preference explanation, whilst rejecting one assumption
of the homo oeconomicus model, that of pure self-interest, relies upon another
assumption, that of perfect rationality. Rational choice theory assumes indi-
viduals to be fully rational and thus capable of expressing their preferences
perfectly through the consequences of their actions (Becker, 1976). Therefore,
a player who does not maximize his income in such games must have differ-
ent preferences. For example, he willingly pays a price in terms of his own
income in order to increase the income of others, which overall increases his
utility.
Before we turn to details regarding our approach, let us first turn to typical
results as recorded in public goods experiments, which can be summarized by
the following four regularities (see Ledyard, 1995; Chaudhuri, 2011; Anderson,
Goeree, and Holt, 1998; Laury and Holt, 2008). First, aggregate contribu-
tion levels are increasing with the rate of return, even if the Nash equilibrium
remains unchanged (at no-contribution). Second, contributions increase with
the population size. Third, average contributions lie between the Nash equi-
librium level and half the budget. Fourth, the population bias toward above-
equilibrium contributions is increasingly neutralized (or even reversed) if the
game is modified so that the Nash equilibrium becomes an interior (or high
contribution) solution. The ensemble of these regularities cannot be explained by most pro-social preference models, but Goeree and Holt, 2005 show their relation with quantal-response equilibrium (Palfrey and Prisbrey, 1997; Anderson, Goeree, and Holt, 1998); the logic is that off-equilibrium decisions occur with probabilities that decrease in their costliness vis-à-vis the reply that would maximize material self-interest.1
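To make this logic concrete, the logit version of quantal response (a standard textbook specification; the exact formulations in the cited papers may differ) assigns to each possible contribution $c$ the choice probability
\[
\Pr(c_i = c) = \frac{\exp\{\lambda\,\pi_i(c, c_{-i})\}}{\sum_{c'} \exp\{\lambda\,\pi_i(c', c_{-i})\}},
\]
where $\pi_i$ is the material payoff and $\lambda \geq 0$ a precision parameter: deviations that are only slightly more costly than the material best reply remain relatively likely, very costly deviations become rare, and as $\lambda \to \infty$ the material best reply is played with probability one.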
Despite the fact that social preference theory is yet to provide a fully conclu-
sive account of experimental regularities, it seems that pro-social preference
explanations have become predominant in the experimental economics litera-
ture, and whole areas of economics are now psychologically loaded with this
tendency.2
Second, we propose a novel way of discerning whether behavior is actually due
to (fully rational) other-regarding preferences or due to bounded rationality
by investigating within-subject consistency. We compare how people play
the standard public goods game with play of variants thereof where individ-
ual and collective interests are aligned. We shall refer to the class of variants
as ‘profitable public goods games’. One way of creating a ‘profitable public
goods game’ is to group players by contributions so that contributing more is
rewarded by being matched with others doing likewise (Gunnthorsdottir et al.,
2010; Nax et al., 2014). Hence, players who still do not contribute in these
variants not only hurt others but may also not maximize their own payoffs.
Another way to align individual and group interests is to make the public good
sufficiently valuable in that the benefits of contributing outweigh the costs, not
just at the group level (when contributions are multiplied by 1 < r < s), but
also at the individual contributor level (when contributions are multiplied by
r > s). In such situations, a purely self-interested and rational player (homo
oeconomicus) will contribute fully, as will any pro-social and rational player.
Less-than-full contributions may be due either to an agent's sub-rationality, thus hurting himself and others, in which case he may learn to contribute more with experience, or to the agent being rational but anti-social, in which case he may consistently contribute less than fully.
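In terms of the payoff function $\varphi_i$ used below, where $s$ denotes the group size, the alignment of interests can be read off the marginal effect of one's own contribution,
\[
\frac{\partial \varphi_i}{\partial c_i} = \frac{r}{s} - 1,
\]
which is negative in the standard public goods game ($1 < r < s$), so that free-riding maximizes own payoff, and positive in the high-return 'profitable public goods game' ($r > s$), so that full contribution maximizes own payoff as well as the group's. (The grouping-based variant aligns interests through the matching rule rather than the rate of return, so this simple derivative argument does not apply to it.)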
We use data from two experiments.3 Half of the data from each experiment
corresponds to the standard setting whereby rational self-interest predicts non-
contributing behavior (‘standard public goods game’). The other half corre-
sponds to a ‘profitable public goods game’. The two experiments differ with
respect to the implementations of the two treatment types. In our analysis,
3 Experimental instructions for one, based on data first reported in Burton-Chellew and West, 2013, can be found at http://www.pnas.org/content/suppl/2012/12/14/1210960110.DCSupplemental/pnas.201210960SI.pdf; instructions for the other, first reported in Nax et al., 2014, are at http://nodegame.org/games/merit/.
we first examine which preferences are being expressed in a given game under
the assumption of rational choice, allowing players to have varying degrees of
concerns for the payoffs of the other players. Similar efforts have previously
focused on standard public goods games with/without punishment (e.g. Fehr
and Gächter, 2000; Fehr and Gächter, 2002). The advantage of our study is
that we are not restricted to inferring only utility functions that are either self-
interested (and rational) or pro-social (and rational). By also using the data
from profitable public goods, we are furthermore able (i) to infer anti-social
(and rational) preferences, (ii) to examine within-player (in)consistencies be-
tween standard and profitable public goods, and (iii) to assess if the total
population is on average pro-social, anti-social, or neither/both. If our players
appear to have inconsistent social preferences, then it may be that our assump-
tion of full rationality is unjustified instead. We can therefore use our findings
to classify people into the three types proposed by Ledyard, 1995: dedicated Nash players (ca. 50%), social players (ca. 35%, of which ca. 20% pro-social and ca. 15% anti-social), and inexplicable/irrational players (ca. 15%).
players. Compared to most existing studies (see Chaudhuri, 2011; Saijo, 2014), ours differs with regard to the non-linearity of the utility function, which allows us to rationalize (in terms of degrees of pro/anti-sociality) intermediate contribution decisions.4 This is important as ca. 30% of all contributions are
intermediate even in the final period of play. A linear utility function makes
‘bang-bang’ predictions of extreme decisions, and explains intermediate con-
tribution decisions only at the knife-edge of indifference (e.g. Levine, 1998;
Saijo, 2014). Previously, the Cobb-Douglas function has been used to gener-
ate the payoffs of the game (Andreoni, 1993; Chan et al., 2002; Cason, Saijo,
and Yamato, 2002) but not as a utility function to measure players’ altruistic
concerns regarding game payoffs as we do here.5 Both data sets used in this
study contain one ‘dilemma treatment’ (corresponding to a ‘standard public
good’) and one ‘provision treatment’ (corresponding to a ‘profitable public
good’). Burton-Chellew and West, 2013’s design follows Andreoni, 1988, and
complements their design with a high-rate-of-return game for the ‘provision
treatment’ as pioneered in Saijo and Nakamura, 1995 (see also Cason, Saijo,
and Yamato, 2002). Nax et al., 2014 also follow Andreoni, 1988 in the ‘dilemma
treatment’ (with a different rate of return), but adopt Gunnthorsdottir et al.,
2010’s mechanism in the ‘provision treatment’.6
geneously pro-social players. The distributions implied by the two datasets
are similar and consistent with many previous studies. Diametrically op-
posed results come out of the provision treatments. In the provision treat-
ment by Burton-Chellew and West, 2013, the social preference story is liter-
ally turned upside-down by failure to play Nash equilibrium, suggesting that
there exists a fraction of homo oeconomicus and, in addition, an array of het-
erogeneously anti-social players. This is a result of incentive structure and
contribution decisions being mirror images of the dilemma treatments. The
provision treatment by Nax et al., 2014 reveals an even larger fraction of
players indistinguishable from homo oeconomicus and, in addition, behaviors
inexplicable by social preferences. Combining the two treatments and checking
for within-player consistencies, we find roughly half of the population to be
consistent with the homo oeconomicus model, one third consistent with social
preferences (characterized by either pro-sociality or anti-sociality), and roughly
15% of players to be inexplicable in terms of (social) preferences. Our findings
show that social preference estimations may be highly sensitive to equilibrium-
relevant parameter changes even within the same class of games played by the
same population, to the extent that implied signs of pro/anti-sociality may
actually be reversed. The current status of social preference theory is far from
being predictive as regards such phenomena. In light of other factors that have
previously been shown to affect social preference distributions such as framing,
stakes, beliefs, learning, or preference conditionality, it appears unlikely that
this will change soon outside very specific classes of games, where the reason
for predictability may be due to data patterns being reliably similar rather
than pro-sociality actually accurately explaining the nature of man. This may imply that the proclaimed generality and solidity of findings regarding pro-social preferences (e.g. Fehr and Camerer, 2007; Bowles and Gintis, 2011) were exaggerated, perhaps even grossly.
Apart from our aforementioned companion papers on learning (Burton-Chellew
and West, 2013; Nax et al., 2013; Burton-Chellew, Nax, and West, 2015; Nax
et al., 2014), the paper that is closest to ours is Saijo and Nakamura, 1995
(see also Saijo and Yamato, 1999; Cason, Saijo, and Yamato, 2002; Brandts,
Saijo, and Schram, 2004; Saijo, 2008). Similarly to Burton-Chellew and West,
2013, Saijo and Nakamura, 1995 have an experimental design with one high-
rate-of-return game and one low-rate-of-return game, leading to off-equilibrium
behavior that is ‘kind’ in one (low rate) and ‘spiteful’ (high rate) in the other.7
The part of our analysis based on Burton-Chellew and West, 2013’s data can
be seen as a formal extension of their study beyond the two-by-two case which
allows us to infer degrees of kindness/spite using a nonlinear utility repre-
sentation. Saijo and Nakamura, 1995 classify people without formal utility
assumptions. In addition, we compare our estimates with the data from alter-
native matching mechanisms (the Nax et al., 2014 data). Also related to our
study is Levine, 1998 who estimates spite and kindness in a linear (negative or
positive) altruism framework. Most of Levine’s analysis is based on the ulti-
matum game, but he also considers the low-rate-of-return contributions game
data of Isaac and Walker, 1988, where he finds estimates consistent with our
provision treatments. Again, the key difference with respect to the parameter
estimation as compared to his model is our choice of a non-linear utility func-
tion, hence our model does not have a ‘bang-bang’ solution, that is, we do not
explain intermediate contributions by randomization at indifference, but by in-
termediate levels of kindness/spite. Moreover, we complement Levine, 1998’s
analysis (which across games yields a distribution of spite and kindess that is
qualitatively somewhat similar to ours) by a balanced within-subject rather
than between-subject design in the context of public goods games. Finally, we
investigate the presence of contradictory social motives across games.
7 An important difference is that Saijo and Nakamura, 1995's experimental design is between-subject, not within-subject.
In the remainder of this note, we dive straight into the analysis and come back to the existing literature in our concluding discussion.
\[
\varphi_i(c) = (1 - c_i) + \frac{r}{s} \sum_{j \in S} c_j,
\]
where $S$ is the group (of fixed size $s$) of which $i$ is a member. Write $\varphi$ for a resulting payoff vector. We shall call $r/s$ the game's marginal per-capita rate of return (mpcr).
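For concreteness, a minimal script (our own, purely illustrative, with arbitrary example numbers) that computes these payoffs for one group is:

    def payoffs(contributions, r):
        """Public goods payoffs: each player keeps the unspent part of a unit
        budget and receives the mpcr (r/s) times the sum of group contributions."""
        s = len(contributions)            # group size
        mpcr = r / s                      # marginal per-capita rate of return
        pool = sum(contributions)
        return [(1 - c) + mpcr * pool for c in contributions]

    # Example: four players, rate of return r = 1.6 (mpcr = 0.4, a social dilemma);
    # the two free-riders earn 1.8 each, the two full contributors only 0.8 each.
    print(payoffs([1, 1, 0, 0], r=1.6))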
Information. All players get full instructions about the game (see Burton-
Chellew and West, 2013; Nax et al., 2014 for details).8 After the experiment,
total earnings are paid out according to a known exchange rate. Each round,
players get information about other players’ contributions (about the players
in one’s own group only in Burton-Chellew and West, 2013, about all players
in Nax et al., 2014).
8 For details, please see the Supporting Information (201210960SI) for Burton-Chellew and West, 2013, and http://nodegame.org/games/merit/ for Nax et al., 2014.

Group re-matching. Each round, players are re-matched into groups of a fixed size of four players in all cases. In all but the provision treatment of Nax et al., 2014, this occurs randomly as in Andreoni, 1988: that is, group formation is independent of contribution decisions. In Nax et al., 2014's provision treatment, players are matched according to the group-based mechanism (Gunnthorsdottir et al., 2010), that is, groups form in order of their contributions; the highest contributors form group one, etc. (with random tie-breaking).
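A sketch of this contribution-based re-matching rule (our own illustrative implementation of the mechanism as described above, with random tie-breaking and arbitrary example numbers) is:

    import random

    def merit_groups(contributions, group_size=4):
        """Sort players by contribution (highest first), breaking ties at random,
        and partition them into consecutive groups of the given size."""
        players = list(range(len(contributions)))
        random.shuffle(players)                                      # random tie-breaking
        players.sort(key=lambda i: contributions[i], reverse=True)   # stable sort keeps shuffled tie order
        return [players[k:k + group_size]
                for k in range(0, len(players), group_size)]

    # Example: the two highest contributors always end up together in group one.
    print(merit_groups([20, 0, 15, 5, 20, 10, 0, 5]))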
\[
u_i(c) = \varphi_i^{1-\alpha_i} \cdot \varphi_{-i}^{\alpha_i},
\]
where $\varphi_{-i}$ is the average payoff to players $j \neq i$ about whom $i$ learns in the relevant treatment.
Concern for others. αi measures player i's concern for others. He has no concern for others when αi = 0 (homo oeconomicus). He has anti-social concern for others when αi < 0, in which case he is willing to sacrifice own payoff if others suffer even greater payoff losses (competitive). He has pro-social concern for others when 0 < αi ≤ 1 (other-regarding): when 0 < αi < 0.5 he cares more for himself (moderate altruism), when αi = 0.5 he cares equally (impartial altruism), when 0.5 < αi < 1 he cares more for others (strong altruism), and when αi = 1 he cares only for others (pure altruism). When αi > 1, he is willing to sacrifice others' payoff only when he loses even more (such behavior is somewhat anti-competitive or simply irrational). These latter irrational motivations (when αi > 1) are so strange that we shall consider them as evidence for (weak) inconsistency. Note that such an agent would prefer burning own and others' payoffs as long as he makes himself worse off than others as a result. Figure 1.1 illustrates the types of concerns for others.
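The classification just described can be summarized in a small helper function (purely illustrative; the labels follow the text above):

    def classify_concern(alpha):
        """Map an estimated concern-for-others parameter to the labels used in the text."""
        if alpha < 0:
            return "anti-social (competitive)"
        if alpha == 0:
            return "homo oeconomicus"
        if alpha < 0.5:
            return "moderate altruism"
        if alpha == 0.5:
            return "impartial altruism"
        if alpha < 1:
            return "strong altruism"
        if alpha == 1:
            return "pure altruism"
        return "anti-competitive/irrational (weakly inconsistent)"

    print(classify_concern(0.625))   # 'strong altruism', cf. Example 1 below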
ate contributions and typically associates them with a unique level of concern
for others. By contrast, a linear function in the dilemma treatment, for example, would imply 'bang-bang' behavior, i.e. an insufficiently pro-social agent
would always free-ride, while any agent with pro-sociality above some threshold
would contribute fully. Intermediate contributions could only be rationalized
by randomization at the threshold (see Levine, 1998).
In our main analysis, we focus on the last period of each experiment since
earlier decisions may be rationalizable in other ways, for which alternative preference representations such as Fehr-Schmidt, Bolton-Ockenfels or Charness-Rabin preferences may be preferable. (We will also compare with early-game evidence later on.) Focusing on the final period also has the advantage that
conditional cooperation and other phenomena are likely to be ‘over’ in the
sense that dynamics of such determinants are likely to have settled and/or led
to equilibrium. We estimate, for each individual separately, the concern for
others as implied by his action taking as given the others’ penultimate-period
average contributions. If his action coincides with what homo oeconomicus
would do, we set $\alpha_i = 0$. Otherwise, we assume an interior solution and obtain an expression for $\alpha_i$ using the first-order condition $\partial u_i(c)/\partial c_i = 0$, where $c_{-i}$ is taken to be $c^{t-1}_{-i}$. Solving this toward $\alpha_i$ for random re-matching gives
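As a sketch of the kind of expression this first-order condition yields, assume, purely for illustration, that the players $i$ observes are his own group members, so that $\partial\varphi_i/\partial c_i = R - 1$ and $\partial\varphi_{-i}/\partial c_i = R$ with $R = r/s$ (the exact expressions used for each treatment may differ in detail). Then
\[
(1-\alpha_i)\,\frac{R-1}{\varphi_i} + \alpha_i\,\frac{R}{\varphi_{-i}} = 0
\quad\Longrightarrow\quad
\alpha_i = \frac{(1-R)\,\varphi_{-i}}{R\,\varphi_i + (1-R)\,\varphi_{-i}},
\]
which, for $R = 0.4$, has exactly the form of expression (1.2) below.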
others that are too tiny to actually ever matter are not estimated. In other
words, we do not distinguish between an agent with concerns for others (either
pro-social or anti-social ones) that are too small to ever matter and an actual
homo oeconomicus.
Example 1. Take the dilemma treatment by Nax et al., 2014. Suppose every
other player, from the viewpoint of player i, made a contribution of zero in the
penultimate period (t = 39). If player i decides to contribute half his budget in
the last period then this implies a concern for others of αi = 0.625, i.e. he cares
relatively more about others than about himself (is pro-social of type ‘strong
altruist’). If he contributes zero, then we assume he is homo oeconomicus and
set αi = 0.
Obtaining an expression for αi in the provision treatment for Nax et al., 2014
where re-matching follows the group-based mechanism (Gunnthorsdottir et
al., 2010) is more complicated because no strategy is a priori dominated (see
chapter 7 for more detail on equilibrium structure; see also Gunnthorsdottir
et al., 2010 and Nax, Murphy, and Helbing, 2014 for details of equilibrium
analysis). We illustrate by way of an example how we obtain expressions for αi.
Example 3. Take the provision treatment by Nax et al., 2014. Suppose every
player but player i made a full contribution in the penultimate period (t = 39).
If player i decides to free-ride in the last period this coincides with what homo
oeconomicus would do, and we set αi = 0. If he decides to contribute, say, 10, then we conclude that his concern for others is pro-social of order αi = 0.802 (strongly altruistic).
Figure 1.2 shows the final-round contributions, Figure 1.3 the implied social
preferences.
Figure 1.4 combines Figures 1.2 and 1.3. Table 1.2 summarizes the moments
of Figure 1.4. The shaded areas in Figure 1.4 indicate the ranges of preferences
over which actions by homo oeconomicus and those by agents with preferences
from this range coincide.
Figure 1.4: Distributions combined (final period).
Of the 96 individuals in Nax et al., 2014, 67 (69.8%) are consistent with pure homo oeconomicus preferences in both treatments. Another 20 (20.8%) are consistent in the sense that they are homo oeconomicus in the provision treatment and pro-social in the dilemma treatment. 7 (7.3%) individuals are weakly inconsistent in that they are homo oeconomicus in the dilemma treatment but act anti-competitively in the provision treatment (burning others' and their own payoffs). 2 (2.1%) individuals are strongly inconsistent in the sense of being pro-social in the dilemma treatment and anti-social in the provision treatment.
so in the initial period. Indeed, over 90% did (98% in the dilemma
treatment, 82% in the provision treatment), suggesting a high degree of internal
consistency over time.9
1.3 Discussion
ducted, and they have been interpreted as evidence that humans not only care about their own material payoffs (pure self-interest) but also, in a pro-social way, about those of others (e.g. Fehr and Schmidt, 1999; Bolton and
Ockenfels, 2000; Fischbacher and Gächter, 2010; see Murphy and Ackermann,
2014 for a recent review that includes psychology references). Quite obviously
this challenges the homo oeconomicus model very fundamentally. The impor-
tant question is whether, for the purpose of economics, these alternative models
of man are ultimately better and more useful for applications. For that, the
first testing ground should probably again be the economics laboratory.
2010; Bayer, Renner, and Sausgruber, 2013; Burton-Chellew and West, 2013;
Burton-Chellew, Nax, and West, 2015).
dissolves: contributing fully is both individually rational and beneficial to oth-
ers. Hence, not contributing fully becomes indicative of ‘spiteful’ motives in
the same way as positive contributions indicate pro-sociality in the standard
settings. We have estimated social preferences from both classes of games and
identified an almost-symmetrical distribution of anti-social, homo oeconomicus
and pro-social preferences with a slight skew towards pro-sociality. Moreover,
we reveal another category of players who are inconsistent with respect to
their pro/anti-social motivations, and thus inexplicable/irrational in terms of
preferences.
Apart from the studies discussed in the introduction (Saijo and Nakamura,
1995; Levine, 1998, etc.) and the ones whose data we use (Burton-Chellew
and West, 2013; Nax et al., 2014), we are aware of the following experiments
that are also related. The high-rate-of-return variation of the voluntary con-
tributions game is an approach previously taken in Kümmerli et al., 2010.
Group-based mechanisms similar to Gunnthorsdottir et al., 2010 (see also Gun-
nthorsdottir, Vragov, and Shen, 2010; Gunnthorsdottir and Thorsteinsson,
2014) are also proposed in Rabanal and Rabanal, 2014. The novelty of this
present paper is the exploitation of the balanced within-subject design, fea-
turing both ‘dilemma’ and ‘provision’ treatments to check for individual-level
(in)consistencies.
In terms of results, first and foremost, our analysis validates the use of hetero-
geneous agent models instead of representative agent models. Agents seem to
vary substantially in terms of their social concerns and/or rationality levels,
and this may have consequences for the (in)stability of equilibrium, which is
not a priori guaranteed under non-linear utilities (Saijo, 2014). Moreover, our
findings also cast doubt on whether the typical (median and mean) agents
in a population are really as pro-social as previous experiments suggested.
The image our work depicts of the population is rather one of an equilibrium-
dependent distribution of pro/anti-sociality around the homo oeconomicus me-
dian, with the overall mean agent not lying far off either. To extrapolate this
finding, we need to repeat similar analyses for other datasets and for other
classes of games. How social preferences change with equilibrium properties
of the game is an avenue left open for future research. Whilst we shall not
attempt to extrapolate our findings to other games, we would like to point
out that Bardsley, 2008, when considering generalizations of the dictator game
that allow a balanced view of pro/anti-sociality in that context, concludes that
altruism may well be an artefact of experimentation. In summary, the evidence suggests the need for future work in this direction for many more games, given that contradictions have been revealed even in games about which we thought we knew a lot. A
‘ban’ (Camerer, 2003) on games like the ultimatum game or the public goods
game, as has been recently proposed by Camerer because (supposedly) we
know what is going on and why, appears premature.
of our findings, homo oeconomicus would probably still be the one to choose amongst all other types when forced to pick only one as the representative agent to make a prediction about a game drawn from a distribution over games
that is ex ante unbalanced. Better, however, would be to use models with
heterogeneous populations, consisting of learning types, homo oeconomicus,
and various socially motivated agents.
The above findings lead to the pursuit, jointly with Kurt Ackermann and Ryan
Murphy, of the question of whether preferences may be interactive, and not
unresponsive as standard game theory presumes. The resulting analysis is reported in the note below.
Game theory presumes that agents have unique preference orderings over out-
comes that prescribe unique preference orderings over actions in response to
other players' actions, independent of other players' preferences. This independence assumption is necessary to permit game-theoretic best-response reasoning, but it is at odds with introspection, because preferences towards one another often depend dynamically on each other. In this note, we propose a model of
interactive preferences. The model is validated with data from a laboratory
experiment. The main finding of our study is that pro-sociality diminishes
over the course of the interactions.
Introduction
Mother Teresa does not defect in prisoners’ dilemmas, because she cares for her
opponents in ways that transform the games' mixed motives into other games where her own and the common motives are aligned (e.g., harmony). Cooperation thus
emerges as a dominant strategy. The experimental economics literature is
concerned with ‘subjective expected utility corrections’ (Gigerenzer and Selten,
2001) that modify players’ utility representations to account for such other-
regarding concerns. Numerous corrections have been proposed (e.g., Rabin 1993; Levine 1998; Fehr and Schmidt 1999; Bolton and Ockenfels 2000) in light of laboratory evidence that manifests systematic deviations from narrow self-interest predictions (see Ledyard 1995 and Chaudhuri 2011 for reviews).10
This route of enquiry is bothersome for many theoretical game theorists who
question how these findings generalize beyond the laboratory.11
Methods
Our analysis focuses on 22 data points per participant, namely his 2 SVOs (initial and final), plus his 10 contributions and 10 guesses about others' contributions from the VCM, yielding a total of 2,816 data points.
The model
\[
u_i(c) = \varphi_i^{1-\alpha_i} \cdot \varphi_{-i}^{\alpha_i}, \tag{1.1}
\]
where αi ∈ [0, 1] measures player i’s concern for others. The nonlinearity of
expression 1.1 distinguishes it from most representations, including Levine,
1998, thus rationalizing intermediate contributions in terms of intermediate
concerns. We obtain the following expression for αi by assuming ci is chosen
optimally given his guess about $c_{-i}$ (expressed as $\hat{c}_{-i}$):
\[
\alpha_i = \frac{0.6\,\varphi_{-i}(c_i, \hat{c}_{-i})}{0.4\,\varphi_i(c_i, \hat{c}_{-i}) + 0.6\,\varphi_{-i}(c_i, \hat{c}_{-i})} \tag{1.2}
\]
librium, $\alpha_i = \hat{\alpha}_{-i}$, where $\hat{\alpha}_{-i}$ is $i$'s belief about $\alpha_{-i}$.15 The resulting set of equilibria, the general structure of which is under investigation in an ongoing study, contains the standard case (when $\alpha_i = \alpha_{-i} = 0$) but also new ones when $\alpha_i = \alpha_{-i} > 0$ as in fairness equilibria (Rabin, 1993).
where $\tilde{\alpha}_{-i}^{t-1}$ is $i$'s deduction of $\alpha_{-i}$ from previous-period evidence, and $\beta_i^t \in [0, 1]$ measures $i$'s period-$t$ degree of belief responsiveness.
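One simple updating rule consistent with this description (our illustrative assumption, not necessarily the exact specification estimated as expression 1.3) would be
\[
\alpha_i^{t} = (1 - \beta_i^{t})\,\alpha_i^{t-1} + \beta_i^{t}\,\tilde{\alpha}_{-i}^{t-1},
\]
under which an unresponsive player ($\beta_i^t = 0$) keeps his concern for others fixed, while a responsive player ($\beta_i^t > 0$) moves it toward the concern he deduces in his co-players.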
Estimation strategy
‘Responsive’ and ‘unresponsive’ types are classified based on the VCM data.
Individual i is said to be responsive (unresponsive) if the estimation of expression 1.3 in light of his VCM decisions from rounds 2–10 yields an average coefficient for $\beta_i^t$ which is positive (not positive).
Prediction. We use our estimated 2×2 typology (from initial SVO and VCM)
to make predictions regarding final SVO classifications, which we shall assess
in light of the recorded final SVOs. We shall use the following terminology:
an individual is associated with a VCM group matching that is said to be 'individualistic' ('pro-social') if those players he is matched with, on average, contribute less (more) than himself.

15 A weaker assumption in the same spirit would be to weigh this dependence by some parameter as in Levine, 1998, something we shall introduce via 'responsiveness' instead.

16 See Murphy and Ackermann 2013 for a more fine-grained categorization.
Table 1.3: Regressions 1 and 2 (standard errors adjusted for 128 individual clusters)

Regression 1 (dependent variable: 'Contribution', VCM, t=1):
  Initial pro-sociality: 3.54* (1.19); Constant: 10.76* (2.72); controls not listed; N = 128; R^2 = 0.13.

Regression 2 (dependent variable: 'Responsiveness', VCM, t=1-10):
  $\alpha^{t-1}$: −0.35* (0.04); $\tilde{\alpha}_{-i}^{t-1}$: 0.44* (0.15); controls not listed; N = 1,152; R^2 = 0.20.

*: significance level < 0.01
Results
levels (omitted) for the VCM, we find 71% responsives (34% pro-socials, 37%
individualists); 14% (20%) are responsive pro-socials (individualists) matched
by chance in individualistic (pro-social) groups.
Predictions compare with the data as follows. Final SVOs categorize 64%
individualists and 36% pro-socials (62% and 38% predicted). 47% are individ-
ualistic in initial and final SVOs, which means that 6% individualists turned
pro-socials (5% predicted). 30% were pro-social in both, hence 17% pro-socials
turned individualists (14% predicted). The model made two types of errors.
First, 7% changed preferences despite being classified as unresponsive. Second, we predicted 1% (3%) too few individualists turning pro-social (vice versa), thus incorrectly predicting the flow of 3% of responsives. Overall, our model was therefore accurate in predicting aggregate preferences (95%), and somewhat less accurate in predicting individual-level flows (90%).
Conclusion
procity (Alexander, 1987; Fischbacher and Gächter, 2010) as driven by natural
dynamics governing the interactions of preferences. Since stakes and intentions
of players certainly matter more outside the laboratory, such phenomena are
likely not to be artifacts. Preference dynamics should therefore be studied
further, as the long-run predictions of models without preference interactions
are potentially misguided.
References
Bardsley, Nicholas (2008). “Dictator game giving: altruism or artefact?” In:
Experimental Economics 11.2, pp. 122–133.
Bayer, Ralph-C, Elke Renner, and Rupert Sausgruber (2013). “Confusion and
learning in the voluntary contributions game”. In: Experimental Economics
16.4, pp. 478–496.
Becker, Gary S (1976). The economic approach to human behavior. University
of Chicago press.
Binmore, Ken and Avner Shaked (2010a). “Experimental Economics: Where
Next?” In: Journal of Economic Behavior & Organization 73.1, pp. 87–100.
– (2010b). “Experimental Economics: Where Next? Rejoinder”. In: Journal of
Economic Behavior & Organization 73.1, pp. 120–121.
Bolton, GE and A Ockenfels (2000). “ERC: A theory of equity, reciprocity,
and competition”. In: AER 90, pp. 166–193.
Bowles, Samuel and Herbert Gintis (2011). A cooperative species: Human reci-
procity and its evolution. Princeton University Press.
Brandts, J, T Saijo, and A Schram (2004). “How universal is behavior? A
four country comparison of spite and cooperation in voluntary contribution
mechanisms”. In: Public Choice 119.3-4, pp. 381–424.
Burton-Chellew, MN, HH Nax, and SA West (2015). “Payoff-based learning
explains the decline in cooperation in public goods games”. In: Proc. Roy.
Soc. B 282.1801, p. 20142678.
Burton-Chellew, MN and SA West (2013). “Prosocial preferences do not ex-
plain human cooperation in public-goods games”. In: PNAS 110.1, pp. 216–
221.
Camerer, Colin F (2003). Behavioral game theory: Experiments in strategic
interaction. Princeton University Press.
Cason, TN, T Saijo, and T Yamato (2002). “Voluntary participation and spite
in public good provision experiments: An international comparison”. In: EE
5.2, pp. 133–153.
Chan, Kenneth S et al. (2002). “Crowding-out voluntary contributions to pub-
lic goods”. In: Journal of Economic Behavior & Organization 48.3, pp. 305–
317.
Charness, G and M Rabin (2002). “Understanding social preferences with sim-
ple tests”. In: QJE 117, pp. 817–869.
Chaudhuri, A (2011). “Sustaining cooperation in laboratory public goods ex-
periments: A selective survey of the literature”. In: EE 14.1, pp. 47–83.
Cobb, Charles W and Paul H Douglas (1928). “A theory of production”. In:
The American Economic Review 18, pp. 139–165.
Eckel, Catherine and Herbert Gintis (2010). “Blaming the messenger: Notes
on the current state of experimental economics”. In: Journal of Economic
Behavior & Organization 73.1, pp. 109–119.
Ellsberg, Daniel (1961). “Risk, ambiguity, and the Savage axioms”. In: Quar-
terly Journal of Economics 75, pp. 643–669.
Falkinger, Josef et al. (2000). “A simple mechanism for the efficient provision
of public goods: Experimental evidence”. In: American Economic Review
90, pp. 247–264.
Fehr, E and KM Schmidt (1999). “A theory of fairness, competition, and co-
operation”. In: QJE 114, pp. 817–868.
– (2010). “On inequity aversion: A reply to Binmore and Shaked”. In: JEBO
73.1, pp. 101–108.
Fehr, Ernst and Colin F Camerer (2007). “Social neuroeconomics: the neu-
ral circuitry of social preferences”. In: Trends in Cognitive Sciences 11.10,
pp. 419–427.
Fehr, Ernst and Simon Gächter (2000). “Cooperation and punishment in public
goods experiments”. In: American Economic Review 90.4, pp. 980–994.
Fehr, Ernst and Simon Gächter (2002). “Altruistic punishment in humans”.
In: Nature 415.6868, pp. 137–140.
Ferraro, Paul J and Christian A Vossler (2010). “The source and significance
of confusion in public goods experiments”. In: The BE Journal of Economic
Analysis & Policy 10.1, p. 53.
Fischbacher, U and S Gächter (2010). “Social preferences, beliefs, and the dy-
namics of free riding in public goods experiments”. In: AER 100.1, pp. 541–
556.
Gigerenzer, G and R Selten (2001). Bounded Rationality: The Adaptive Tool-
box. MIT Press.
Goeree, Jacob K and Charles A Holt (2005). “An explanation of anomalous
behavior in models of political participation”. In: American Political Science
Review 99.2, pp. 201–213.
Goeree, Jacob K, Charles A Holt, and Susan K Laury (2002). “Private costs
and public benefits: unraveling the effects of altruism and noisy behavior”.
In: Journal of Public Economics 83.2, pp. 255–276.
Griesinger, Donald W and James W Livingston (1973a). “Toward a model
of interpersonal motivation in experimental games”. In: Behavioral Science
18.3, pp. 173–188.
Griesinger, DW and JW Livingston (1973b). “Toward a model of interpersonal
motivation in experimental games”. In: Behavioral Science 18.3, pp. 173–
188.
Gunnthorsdottir, Anna, Daniel Houser, and Kevin McCabe (2007). “Disposi-
tion, history and contributions in public goods experiments”. In: Journal of
Economic Behavior & Organization 62.2, pp. 304–315.
Gunnthorsdottir, Anna and Palmar Thorsteinsson (2014). “Tacit coordination
and equilibrium selection in a merit-based grouping mechanism: A cross-
cultural validation study”. In: mimeo.
Gunnthorsdottir, Anna, Roumen Vragov, and Jianfei Shen (2010). “Tacit co-
ordination in contribution-based grouping with two endowment levels”. In:
Research in Experimental Economics 13, pp. 13–75.
Gunnthorsdottir, Anna et al. (2010). “Near-efficient equilibria in contribution-
based competitive grouping”. In: Journal of Public Economics 94.11, pp. 987–
994.
Houser, Daniel and Robert Kurzban (2002). “Revisiting kindness and con-
fusion in public goods experiments”. In: American Economic Review 92,
pp. 1062–1069.
Isaac, R Mark, James Walker, and S Thomas (1991). “On the suboptimality
of voluntary public goods provision: Further experimental evidence”. In:
Research in Experimental Economics 4, pp. 211–221.
Isaac, R Mark and James M Walker (1998). “Nash as an organizing princi-
ple in the voluntary provision of public goods: Experimental evidence”. In:
Experimental Economics 1.3, pp. 191–206.
Isaac, RM, KF McCue, and CR Plott (1985). “Public goods provision in an
experimental environment”. In: J. Pub. Econ. 26.1, pp. 51–74.
Isaac, RM and JM Walker (1988). “Group size effects in public goods pro-
vision: The voluntary contributions mechanism”. In: Quarterly Journal of
Economics 103, pp. 179–199.
Kahneman, D, JL Knetsch, and RH Thaler (1986). “Fairness and the assump-
tions of economics”. In: Journal of Business 59.4, pp. 285–300.
Kahneman, D and A Tversky (1979). “Prospect theory: An analysis of decision
under risk”. In: Econometrica 47, pp. 263–291.
Keser, Claudia (1996). “Voluntary contributions to a public good when partial
contribution is a dominant strategy”. In: Economics Letters 50.3, pp. 359–
366.
Kreps, David M et al. (1982). “Rational cooperation in the finitely repeated
prisoners’ dilemma”. In: Journal of Economic Theory 27.2, pp. 245–252.
Kümmerli, R et al. (2010). “Resistance to extreme strategies, rather than
prosocial preferences, can explain human cooperation in public goods games”.
In: PNAS 107.22, pp. 10125–10130.
Laury, Susan K and Charles A Holt (2008). “Voluntary provision of public
goods: experimental results with interior Nash equilibria”. In: Handbook of
Experimental Economics Results 1, pp. 792–801.
Laury, Susan K, James M Walker, and Arlington W Williams (1999). “The vol-
untary provision of a pure public good with diminishing marginal returns”.
In: Public Choice 99.1-2, pp. 139–160.
Ledyard, JO (1995). “Public goods: A survey of experimental research”. In:
The Handbook of Experimental Economics. Ed. by JH Kagel and AE Roth.
Princeton University Press, pp. 111–194.
Levine, DK (1998). “Modeling altruism and spitefulness in experiments”. In:
Review of Economic Dynamics 1.3, pp. 593–622.
Murphy, RO and KA Ackermann (2013). “Explaining Behavior in Public Goods
Games: How Preferences and Beliefs Affect Contribution Levels”. In: SSRN
2244895.
– (2014). “Explaining behavior in public goods games: How preferences and
beliefs affect contribution levels”. In: mimeo.
Murphy, RO, KA Ackermann, and MJJ Handgraaf (2011). “Measuring social
value orientation”. In: Journal of Judgment and Decision Making 6, pp. 771–
781.
Nax, Heinrich H, Ryan O Murphy, and Dirk Helbing (2014). “Stability and wel-
fare of meritocratic matching in voluntary contribution games”. In: mimeo.
Nax, Heinrich H and Matjaž Perc (2015). “Directional learning and the provi-
sioning of public goods”. In: Scientific reports 5, p. 8010.
Nax, Heinrich H et al. (2013). “Learning in a black box”. In: Department of
Economics WP, University of Oxford 653.
Nax, Heinrich H et al. (2014). “How assortative re-matching regimes affect
cooperation levels in public goods interactions”. In: mimeo.
Palfrey, Thomas R and Jeffrey E Prisbrey (1997). “Anomalous behavior in
public goods experiments: How much and why?” In: American Economic
Review 87, pp. 829–846.
Palfrey, TR and JE Prisbrey (1996). “Altruism, reputation and noise in linear
public goods experiments”. In: Journal of Public Economics 61.3, pp. 409–
427.
Rabanal, Jean Paul and Olga A Rabanal (2014). “Efficient investment via
assortative matching: A laboratory experiment”. In: mimeo.
Rabin, M (1993). “Incorporating fairness into game theory and economics”.
In: AER 83.5, pp. 1281–1302.
Ramsey, Frank Plumpton (1931). Foundations: Essays in Philosophy, Logic, Mathematics and Economics. Humanities Press.
Saijo, T (2008). “Spiteful behavior in voluntary contribution mechanism ex-
periments”. In: Handbook of Experimental Economics Results. Ed. by CR
Plott and VL Smith. Vol. 1, pp. 802–816.
Saijo, T and H Nakamura (1995). “The “spite” dilemma in voluntary con-
tribution mechanism experiments”. In: Journal of Conflict Resolution 39.3,
pp. 535–560.
Saijo, T and T Yamato (1999). “A voluntary participation game with a non-
excludable public good”. In: JET 84.2, pp. 227–242.
Saijo, Tatsuyoshi (2014). “The instability of the voluntary contribution mech-
anism”. In: mimeo.
Savage, Leonard J (1954). The Foundations of Statistics. Wiley.
Sefton, Martin and Richard Steinberg (1996). “Reward structures in public
good experiments”. In: Journal of Public Economics 61.2, pp. 263–287.
Smith, Vernon L and James M Walker (1993). “Monetary rewards and decision
cost in experimental economics”. In: Economic Inquiry 31.2, pp. 245–261.
Van Dijk, Frans, Joep Sonnemans, and Frans van Winden (2002). “Social ties
in a public good experiment”. In: Journal of Public Economics 85.2, pp. 275–
299.
Von Neumann, John and Oskar Morgenstern (1944). Theory of Games and
Economic Behavior. Princeton University Press.
Walker, James M, Roy Gardner, and Elinor Ostrom (1990). “Rent dissipa-
tion in a limited-access common-pool resource: Experimental evidence”. In:
Journal of Environmental Economics and Management 19.3, pp. 203–211.
West, Stuart A, Claire El Mouden, and Andy Gardner (2011). “Sixteen com-
mon misconceptions about the evolution of cooperation in humans”. In:
Evolution and Human Behavior 32.4, pp. 231–262.
Wicherts, Jelte M et al. (2006). “The poor availability of psychological research
data for reanalysis.” In: American Psychologist 61.7, p. 726.
Chapter 2
Learning:
Directional learning and the
provisioning of public goods
Abstract
about the game and about the other players in the population. The
that, together with the parameters of the learning model, the maxi-
directional (mis)learning.
Acknowledgements. This research was supported by the European Com-
mission through the ERC Advanced Investigator Grant ‘Momentum’ (Grant
324247), by the Slovenian Research Agency (Grant P5-0027), and by the
Deanship of Scientific Research, King Abdulaziz University (Grant 76-130-
35-HiCi).
2.1 Introduction
Cooperation in sizable groups has been identified as one of the pillars of our
remarkable evolutionary success. While between-group conflicts and the ne-
cessity for alloparental care are often cited as the likely sources of the other-
regarding abilities of the genus Homo (Bowles and Gintis, 2011; Hrdy, 2011), it
is still debated what made us the “supercooperators” that we are today (Nowak
and Highfield, 2011; Rand and Nowak, 2013). Research in the realm of evo-
lutionary game theory (Maynard Smith, 1982; Weibull, 1995; Hofbauer and
Sigmund, 1998; Mesterton-Gibbons, 2001; Nowak, 2006a; Myatt and Wallace,
2008) has identified a number of different mechanisms by means of which coop-
eration might be promoted (Mesterton-Gibbons and Dugatkin, 1992; Nowak,
2006b), ranging from different types of reciprocity and group selection to pos-
itive interactions (Rand et al., 2009), risk of collective failure (Santos and
Pacheco, 2011), and static network structure (Santos, Santos, and Pacheco,
2008; Rand et al., 2013).
The public goods game (Isaac, McCue, and Plott, 1985), in particular, is estab-
lished as an archetypical context that succinctly captures the social dilemma
that may result from a conflict between group interest and individual interests
(Ledyard, 1997; Chaudhuri, 2011). In its simplest form, the game requires that
players decide whether to contribute to a common pool or not. Regardless of his own chosen strategy, each player receives an equal share of the public good, which results from total contributions being multiplied by a fixed rate of return. For typical rates of return it is the case that, while the individual
temptation is to free-ride on the contributions of the other players, it is in the
interest of the collective for everyone to contribute. Without additional mech-
anisms such as punishment (Fehr and Gächter, 2000), contribution decisions
in such situations (Ledyard, 1997; Chaudhuri, 2011) approach the free-riding
Nash equilibrium (Nash, 1950) over time and thus lead to a “tragedy of the
commons” (Hardin, 1968). Nevertheless, there is rich experimental evidence
that the contributions are sensitive to the rate of return (Fischbacher, Gächter,
and Fehr, 2001) and positive interactions (Rand et al., 2009), and there is evidence that social preferences and beliefs about other players' decisions are at the heart of individual decisions in public goods environments (Fischbacher and Gächter, 2010).
Suppose each player knows neither who the other players are, nor what they
earn, nor how many there are, nor what they do, nor what they did, nor what
the rate of return of the underlying public goods game is. Players do not even
know whether the underlying rate of return stays constant over time (even
though in reality it does), because their own payoffs are changing due to the
strategy adjustments of other players, about which they have no information.
Without any such knowledge, players are unable to determine ex ante whether
contributing or not contributing is the better strategy in any given period, i.e.,
players have no strategically relevant information about how to respond best.
As a result, the behavior of players has to be completely uncoupled (Foster
and Young, 2006; Young, 2009), and their strategy adjustment dynamics are
likely to follow a form of reinforcement (Roth and Erev, 1995; Erev and Roth,
1998) feedback or, as we shall call it, directional learning (Selten and Stoecker,
1986; Selten and Buchta, 1994). We note that, due to the one-dimensionality of the strategy space in our model, reinforcement and directional learning are both adequate terminologies for our learning dynamic. Since reinforcement also applies to general strategy spaces and is therefore the more general term, we will prefer the terminology of directional learning. Indeed, such directional learning
behavior has been observed in recent public goods experiments (Bayer, Renner,
and Sausgruber, 2013; Young et al., 2013). The important question is how well the population will learn to play the public goods game despite the lack of strategically relevant information. Note that well here has two meanings
due to the conflict between private and collective interests: on the one hand,
how close will the population get to playing the Nash equilibrium, and, on the
other hand, how close will the population get to playing the socially desirable
outcome.
we assume that players make these adjustments at a fixed incremental step
size δ, even though this could easily be generalized. In essence, each player ad-
justs its mixed strategy directionally depending on a Markovian performance
assessment of whether a previous-round contribution increase/decrease led to
a higher/lower payoff.
Since the mixed strategy weights represent a well-ordered strategy set, the
resulting model is related to directional learning/aspiration adjustment
models (Sauermann and Selten, 1962; Selten and Stoecker, 1986; Selten and
Buchta, 1994), and similar models have previously been proposed for bid
adjustments in assignment games (Nax, Pradelski, and Young, 2013), as well as in
two-player games (Laslier and Walliser, 2014). In Nax, Pradelski, and Young,
2013 the dynamic leads to stable cooperative outcomes that maximize total
payoffs, while Nash equilibria are reached in Laslier and Walliser, 2014. The
crucial difference between these previous studies and our present study is that
our model involves more than two players in a voluntary contributions setting,
and, as a result, that there can be interdependent directional adjustments by
groups comprising more than one but not all of the players. This can
lead to uncoordinated (mis)learning by subpopulations in the game.
Consider the following example. Suppose all players in a large standard public
goods game do not contribute to start with. Then suppose that a group of
players in a subpopulation uncoordinatedly but by chance simultaneously all
decide to contribute. If this group is sufficiently large (the size of which de-
pends on the rate of return), then this will result in higher payoffs for all players
including those in the ‘contributors group’, despite the fact that not contribut-
ing is the dominant strategy in terms of unilateral replies. In our model, if
this indeed generates higher payoffs for all players, including the freshly-turned
contributors, then the freshly-turned contributors would continue to increase
their probability of contributing and thus increase the probability of triggering a
form of stampede or herding effect, which may lead away from the Nash
equilibrium and towards a socially more beneficial outcome.
In what follows, we present the results: we first set up the model and
then deliver our main conclusions. We discuss the implications of our results
in section 2.3. Further details about the applied methodology are provided in
the Methods section (section 2.4).
2.2 Results
common pool. Given a fixed rate of return r > 0, the resulting payoff of
player i is then u_i = (1 − c_i) + (r/n) Σ_{j∈N} c_j. We shall call r/n the game's
marginal per-capita rate of return and denote it as R. Note that for simplicity,
but without loss of generality, we have assumed that the group is the whole
population. In the absence of restrictions on the interaction range of players
(Perc et al., 2013), i.e., in well-mixed populations, the size of the groups and
their formation can be shown to be of no relevance in our case, as long as R
rather than r is considered as the effective rate of return.
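For concreteness, a worked numerical example of this payoff structure (the parameter values below are illustrative only):

% Example: n = 16, r = 8, so R = r/n = 0.5; suppose m = 10 players contribute c_j = 1.
\begin{align*}
  u_{\text{contributor}} &= (1-1) + 0.5 \cdot 10 = 5,\\
  u_{\text{free-rider}}  &= (1-0) + 0.5 \cdot 10 = 6,\\
  u_{\text{contributor switching to } c_i = 0} &= 1 + 0.5 \cdot 9 = 5.5 > 5,
\end{align*}

so that, for R < 1, a unilateral switch to non-contribution always pays, even though universal contribution (u_i = 0.5 · 16 = 8) beats universal free-riding (u_i = 1).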
upward: if u_i(c_i^t) > u_i(c_i^{t−1}) and c_i^t > c_i^{t−1}, or if u_i(c_i^t) < u_i(c_i^{t−1}) and c_i^t < c_i^{t−1},
then p_i^{t+1} = p_i^t + δ if p_i^t < 1; otherwise, p_i^{t+1} = p_i^t.
Note that the second, neutral rule above allows random deviations from any
intermediate probability 0 < p_i < 1. However, p_i = 0 and p_i = 1 for all i
are absorbing state candidates. We therefore introduce perturbations to this
directional learning dynamics and study the resulting stationary states. In
particular, we consider perturbations of order ε such that, with probability
1 − ε, the dynamics is governed by the original three “directional bias” rules.
However, with probability ε, either p_i^{t+1} = p_i^t, p_i^{t+1} = p_i^t − δ or p_i^{t+1} = p_i^t + δ
happens, each with probability ε/3, but of course obeying the restriction p_i^{t+1} ∈ [0, 1].
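A minimal sketch of one player's update under the perturbed dynamic (Python; the downward rule is assumed to mirror the upward rule quoted above, the neutral rule is implemented as random drift at intermediate probabilities in line with the remark above, and the names and clamping to [0, 1] are illustrative):

import random

def update_probability(p, delta, eps, c_now, c_prev, u_now, u_prev, rng=random):
    """One perturbed directional-learning step for a single player (illustrative).

    p             : current probability of contributing, in [0, 1]
    delta         : incremental step size
    eps           : perturbation probability epsilon
    c_now, c_prev : the player's contributions in the current and previous round
    u_now, u_prev : the player's payoffs in the current and previous round
    """
    if rng.random() < eps:
        # Perturbation: stay, step down, or step up, each with overall probability eps/3.
        step = rng.choice([0.0, -delta, delta])
    elif (c_now > c_prev and u_now > u_prev) or (c_now < c_prev and u_now < u_prev):
        step = delta                          # the upward rule quoted above
    elif (c_now < c_prev and u_now > u_prev) or (c_now > c_prev and u_now < u_prev):
        step = -delta                         # assumed symmetric downward rule
    else:
        # Assumed neutral rule: random drift, but only from intermediate probabilities,
        # so that p = 0 and p = 1 remain absorbing when eps = 0.
        step = rng.choice([0.0, -delta, delta]) if 0.0 < p < 1.0 else 0.0
    return min(1.0, max(0.0, p + step))       # keep p within [0, 1]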
The maximal k-strengths of the equilibria that still exist in our public goods
game as a function of r are depicted in Fig. 2.1 for n = 16. The cyan-shaded
region indicates the “public bad game” region for r < 1 (R < 1/n), where
the individual and the public motives in terms of the Nash equilibrium of the
game are aligned towards defection. Here ci = 0 for all i is the unique Aumann-
strong equilibrium, or in terms of the definition of the k−strong equilibrium,
ci = 0 for all i is k−strong for all k ∈ [1, n]. The magenta-shaded region
indicates the typical public goods game for 1 < r < n (1/n < R < 1), where
individual and public motives are conflicting. Here there exist no Aumann-
strong equilibria. The outcome ci = 0 for all i is the unique Nash equilibrium,
and that outcome is also a k-strong equilibrium for some k ∈ [1, n), where the
size of k depends on r and n in that ∂k/∂r ≤ 0 while ∂k/∂n ≥ 0. Finally,
the gray-shaded region indicates the unconflicted public goods game for r > n
(R > 1), where individual and public motives are again aligned, but this
time towards cooperation. Here ci = 1 for all i abruptly becomes the unique
Nash and Aumann-strong equilibrium, or equivalently the unique k−strong
equilibrium for all k ∈ [1, n].
We begin the proof by noting that the perturbed process given by our dynamics
results in an irreducible and aperiodic Markov chain, which has a unique
stationary distribution. When ε = 0, any absorbing state must have p_i^t = 0 or
1 for all players. This is clear from the positive-probability paths to either
extreme from intermediate states given by the unperturbed dynamics. We shall
now analyze whether p_i^t = 0 or 1, given that p_j^t = 0 or 1 for all j ≠ i, has a
larger attraction given the model's underlying parameters.
If R ≥ 1, the probability path for any player to move from p_i^t = 0 to p_i^{t+T} = 1
in some T = 1/δ steps requires a single perturbation for that player and is
therefore of the order of a single ε. By contrast, the probability for any player
to move from p_i^t = 1 to p_i^{t+T} = 0 in T steps is of the order ε³, because at least
two other players must increase their contribution in order for that player to
experience a payoff increase from his non-contribution. Along any other path,
or if p^t is such that there are not two players j with p_j^t = 0 to make this
move, the probability for i to move from p_i^t = 1 to p_i^{t+T} = 0 in T steps
requires even more perturbations and is of higher order. Notice that, for any
one player to move from p_i^t = 0 to p_i^{t+T} = 1, we need at least two players to
move away from p^t = 0 along the least-resistance paths. Because contributing
1 is a best reply for all R ≥ 1, those two players will also continue to increase
their contribution probabilities if they continue to contribute 1. Notice that the
length of the path is T = 1/δ steps, and that the path requires no perturbations
along the way, which is less likely the smaller δ.
If R < 1, the probability for any player to move from p_i^t = 1 to p_i^{t+T} = 0
in some T = 1/δ steps requires a single perturbation for that player and is
therefore of the order of a single ε. By contrast, the probability for any player
to move from p_i^t = 0 to p_i^{t+T} = 1 in some T steps is at least of the order ε^k,
because at least k players (corresponding to the maximal k-strength of the
equilibrium) must contribute in order for all of these players to experience a
payoff increase. Notice that k decreases in R. Again, the length of the path is
T = 1/δ steps, and that path requires no perturbations along the way, which
is less likely the smaller δ. With this, we conclude the proof of the proposition.
However, it is also worth noting a direct corollary of the proposition; namely,
as ε → 0, E[p^t] → 1 if R ≥ 1, and E[p^t] → 0 if R < 1.
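In summary, the least-resistance comparison used in the argument above can be written compactly as:

\begin{align*}
  R \ge 1:&\quad \Pr\big(p_i: 0 \to 1\big) = O(\varepsilon), \qquad \Pr\big(p_i: 1 \to 0\big) = O(\varepsilon^{3}),\\
  R < 1:&\quad \Pr\big(p_i: 1 \to 0\big) = O(\varepsilon), \qquad \Pr\big(p_i: 0 \to 1\big) = O(\varepsilon^{k}),
\end{align*}

so that, as ε → 0, the stationary distribution concentrates on full contribution when R ≥ 1 and on full free-riding when R < 1.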
Lastly, we simulate the perturbed public goods game with directional learning
and determine the actual average contribution levels in the stationary state.
Color-coded results in dependence on the normalized rate of return R and the
responsiveness of players to the success of their past actions, δ (alternatively,
the sensitivity of the individual learning process), are presented in Fig. 2.2
for ε = 0.1. Small values of δ lead to close convergence to the respective
Nash equilibrium of the game, regardless of the value of R. As the value of δ
increases, the pure Nash equilibria erode and give way to a mixed outcome. It
is important to emphasize that this is in agreement, or rather, this is in fact
a consequence of the low k−strengths of the non-contribution pure equilibria
(see Fig. 2.1). For intermediate to large values of δ, the Nash equilibria are
implemented in a zonal rather than pinpoint way. When the Nash equilibrium
is such that all players contribute (R > 1), then small values of δ lead to more
efficient aggregate play (recall any such equilibrium is n−strong). Conversely,
by the same logic, when the Nash equilibrium is characterized by universal free-
riding, then larger values of δ lead to more efficient aggregate play. Moreover,
the precision of implementation also depends on the rate of return in the
sense that uncoordinated deviations of groups of players lead to more efficient
outcomes the higher the rate of return. In other words, the free-riding problem
is mitigated if group deviations lead to higher payoffs for every member of an
uncoordinated deviation group, the minimum size of which (that in turn is
related to the maximal k−strength of equilibrium) is decreasing with the rate
of return.
in Fig. 2.3. We would like to note that by “qualitative invariance” it is meant
that, regardless of the value of ε > 0, the population always diverges away
from the Nash equilibrium towards a stable mixed stationary state. But as
can be observed in Fig. 2.3, the average contribution level and its variance both
increase slightly as ε increases. This is reasonable if one perceives ε as an
exploration/mutation rate. More precisely, it can be observed that, the lower the
value of ε, the longer it takes for the population to move away from the Nash
equilibrium where everybody contributes zero in the case that 1/n < R < 1
(which was also the initial condition for clarity). However, as soon as initial
deviations (from p_i = 0 in this case) emerge (with probability proportional to
ε), the neutral rule in the original learning dynamics takes over, and this drives
the population towards a stable mixed stationary state. Importantly, even if
the value of ε is extremely small, the random drift sooner or later gains
momentum and eventually yields similar contribution levels as those attainable
with larger values of ε. Most importantly, note that there is a discontinuous
jump towards staying in the Nash equilibrium, which occurs only if ε is exactly
zero. If ε is bounded away from zero, then the free-riding Nash equilibrium
erodes unless it is n-strong (for very low values of R ≤ 1/n).
2.3 Discussion
We have introduced a public goods game with directional learning, and we have
studied how the level of contributions to the common pool depends on the rate
of return and the responsiveness of individuals to the successes and failures of
their own past actions. We have shown that directional learning alone suffices
to explain deviations from the Nash equilibrium in the stationary state of the
public goods game. Even though players have no strategically relevant infor-
mation about the game and/or about each other's actions, the population
could still end up in a mixed stationary state where some players contributed
at least part of the time although the Nash equilibrium would be full free-
riding. Vice versa, defectors emerged where cooperation was clearly the best
strategy to play. We have explained these evolutionary outcomes by introducing
the concept of k-strong equilibria, which bridge the gap between the Nash
equilibrium and Aumann-strong equilibrium concepts. We have demonstrated
that the lower the maximal k-strength and the higher the responsiveness of
individuals to the consequences of their own past strategy choices, the more
likely it is for the population to (mis)learn the objectively optimal unilateral
(Nash) play.
These results have some rather exciting implications. Foremost, the fact that
the provisioning of public goods even under adverse conditions can be explained
without any sophisticated and often lengthy arguments involving selflessness
or social preference holds promise of significant simplifications of the rationale
behind seemingly irrational individual behavior in sizable groups. It suffices
for a critical number of individuals (depending on the size of the group and the
rate of return) to make a “wrong choice” at the same time just once; provided
the learning process is sufficiently fast or naive, the whole subpopulation is then
likely to adopt this wrong choice as its own at least part of the time. In many
real-world situations, where the rationality of decision
making is often compromised due to stress, propaganda or peer pressure, such
“wrong choices” are likely to proliferate. As we have shown in the context of
public goods games, sometimes this means more prosocial behavior, but it can
also mean more free-riding, depending only on the rate of return.
that might be appealing in many real-life situations, also those that extend
beyond the provisioning of public goods. Fashion trends or viral tweets and
videos might all share a component of directional learning before acquiring
mainstream success and recognition. We hope that our study will be inspira-
tional for further research in this direction. The consideration of directional
learning in structured populations (Szabó and Fáth, 2007; Perc and Szolnoki,
2010), for example, appears to be a particularly exciting future venture.
2.4 Methods
Figure 2.1: The maximal k-strength of equilibria in the studied public goods
game with directional learning. As an example, we consider the population
size being n = 16. As the rate of return r increases above 1, the Aumann-
strong (n−strong) ci = 0 for all i (full defection) equilibrium loses strength.
It is still the unique Nash equilibrium, but its maximal strength is bounded
by k = 17 − r. As the rate of return r increases further above n (R > 1),
the ci = 1 for all i (full cooperation) equilibrium suddenly becomes Aumann-
strong (n−strong). Shaded regions denote the public bad game (r < 1), and
the public goods games with conflicting (1 < r < n) and aligned (R > 1)
individual and public motives in terms of the Nash equilibrium of the game
(see main text for details). We note that results for other population and/or
group sizes are the same over R, while r and the slope of the red line of course
scale accordingly.
Figure 2.2: Average contribution levels in the stationary state, color-coded as a function of the normalized rate of return R (vertical axis, 0.2 to 2.0) and the learning sensitivity δ (horizontal axis, logarithmic, 0.01 to 1), for ε = 0.1 (see main text).
Figure 2.3: Time evolution of average contribution levels, as obtained for
R = 0.7, δ = 0.1 and different values of ε (see legend). If only ε > 0, the Nash
equilibrium erodes to a stationary state where at least some members of the
population always contribute to the common pool. There is a discontinuous
transition to complete free-riding (defection) as ε → 0. Understandably, the
lower the value of ε (the smaller the probability for the perturbation), the
longer it may take for the drift to gain momentum and for the initial
deviation to evolve towards the mixed stationary state. Note that the horizontal
time axis is on a logarithmic scale.
References
Bowles, Samuel and Herbert Gintis (2011). A Cooperative Species: Human
Reciprocity and Its Evolution. Princeton, NJ: Princeton University Press.
Chaudhuri, Ananish (2011). “Sustaining cooperation in laboratory public goods
experiments: a selective survey of the literature”. In: Experimental Eco-
nomics 14, pp. 47–83.
Erev, Ido and Alvin E Roth (1998). “Predicting how people play games: Re-
inforcement learning in experimental games with unique, mixed strategy
equilibria”. In: American Economic Review 88, pp. 848–881.
Fehr, Ernst and Simon Gächter (2000). “Cooperation and Punishment in Pub-
lic Goods Experiments”. In: Am. Econ. Rev. 90, pp. 980–994.
Fischbacher, U., S. Gächter, and E. Fehr (2001). “Are people conditionally
cooperative? Evidence from a public goods experiment”. In: Econ. Lett. 71,
pp. 397–404.
Fischbacher, Urs and Simon Gächter (2010). “Social preferences, beliefs, and
the dynamics of free riding in public goods experiments”. In: The American
Economic Review 100, pp. 541–556.
Foster, Dean P and H Peyton Young (2006). “Regret testing: learning to play
Nash equilibrium without knowing you have an opponent”. In: Theoretical
Economics 1, pp. 341–367.
Goeree, Jacob K. and Charles A. Holt (2005). “An Explanation of Anoma-
lous Behavior in Models of Political Participation”. In: American Political
Science Review 99, pp. 201–213.
Hardin, Garrett (1968). “The Tragedy of the Commons”. In: Science 162,
pp. 1243–1248.
Hofbauer, Josef and Karl Sigmund (1998). Evolutionary Games and Population
Dynamics. Cambridge, U.K.: Cambridge University Press.
Hrdy, Sarah Blaffer (2011). Mothers and Others: The Evolutionary Origins of
Mutual Understanding. Cambridge, MA: Harvard University Press.
Isaac, Mark R., Kenneth F. McCue, and Charles R. Plott (1985). “Public goods
provision in an experimental environment”. In: Journal of Public Economics
26, pp. 51–74.
Laslier, J.-F. and B. Walliser (2014). “Stubborn Learning”. In: Theory and
Decision forthcoming.
Ledyard, J. O. (1997). “Public Goods: A Survey of Experimental Research”.
In: The Handbook of Experimental Economics. Ed. by J. H. Kagel and A. E.
Roth. Princeton, NJ: Princeton University Press, pp. 111–194.
Maynard Smith, J. (1982). Evolution and the Theory of Games. Cambridge,
U.K.: Cambridge University Press.
Mesterton-Gibbons, M. (2001). An Introduction to Game-Theoretic Modelling,
2nd Edition. Providence, RI: American Mathematical Society.
Mesterton-Gibbons, M. and L. A. Dugatkin (1992). “Cooperation among unre-
lated individuals: Evolutionary factors”. In: The Quarterly Review of Biology
67, pp. 267–281.
Moreno, D. and J. Wooders (1996). “Coalition-proof equilibrium”. In: Games
and Economic Behavior 17, pp. 82–112.
Myatt, D. P. and C. Wallace (2008). “An evolutionary analysis of the volun-
teer’s dilemma”. In: Games Econ. Behav. 62, pp. 67–76.
Nash, John (1950). “Equilibrium points in n-person games”. In: Proc. Natl.
Acad. Sci. USA 36, pp. 48–49.
Nax, Heinrich H., Bary S. R. Pradelski, and H. Peyton Young (2013). “Decen-
tralized dynamics to optimal and stable states in the assignment game”. In:
Proc. IEEE 52, pp. 2398–2405.
Nowak, Martin A. (2006a). Evolutionary Dynamics. Cambridge, MA: Harvard
University Press.
– (2006b). “Five Rules for the Evolution of Cooperation”. In: Science 314,
pp. 1560–1563.
Nowak, Martin A. and Roger Highfield (2011). SuperCooperators: Altruism,
Evolution, and Why We Need Each Other to Succeed. New York: Free Press.
Palfrey, Thomas R. and Jeffrey E. Prisbey (1997). “Anomalous Behavior in
Public Goods Experiments: How Much and Why?” In: The American Eco-
nomic Review 87, pp. 829–846.
Perc, M. and A. Szolnoki (2010). “Coevolutionary games – a mini review”. In:
BioSystems 99, pp. 109–125. doi: 10.1016/j.biosystems.2009.10.003.
Perc, M. et al. (2013). “Evolutionary dynamics of group interactions on struc-
tured populations: a review”. In: J. R. Soc. Interface 10, p. 20120997.
Rand, D. A. and M. A. Nowak (2013). “Human cooperation”. In: Trends in
Cognitive Sciences 17, pp. 413–425.
Rand, D. G. et al. (2009). “Positive Interactions Promote Public Cooperation”.
In: Science 325, pp. 1272–1275.
Rand, D. G. et al. (2013). “Evolution of fairness in the one-shot anonymous
Ultimatum Game”. In: Proc. Natl. Acad. Sci. USA 110, pp. 2581–2586.
Roth, Alvin E and Ido Erev (1995). “Learning in extensive-form games: Ex-
perimental data and simple dynamic models in the intermediate term”. In:
Games and Economic Behavior 8, pp. 164–212.
Santos, F. C. and J. M. Pacheco (2011). “Risk of collective failure provides an
escape from the tragedy of the commons”. In: Proc. Natl. Acad. Sci. USA
108, pp. 10421–10425. doi: 10.1073/pnas.1015648108.
Santos, F. C., M. D. Santos, and J. M. Pacheco (2008). “Social diversity pro-
motes the emergence of cooperation in public goods games”. In: Nature 454,
pp. 213–216.
Sauermann, Heinz and Reinhard Selten (1962). “Anspruchsanpassungstheorie
der unternehmung”. In: Journal of Institutional and Theoretical Economics
118, pp. 577–597.
Selten, R. and J. Buchta (1994). Experimental Sealed Bid First Price Auctions
with Directly Observed Bid Functions. Discussion Paper Series B.
Selten, R. and R. Stoecker (1986). “End behavior in sequences of finite Pris-
oner’s Dilemma supergames: A learning theory approach”. In: Journal of
Economic Behavior & Organization 7, pp. 47–70.
Szabó, György and Gábor Fáth (2007). “Evolutionary games on graphs”. In:
Phys. Rep. 446, pp. 97–216.
Weibull, J. W. (1995). Evolutionary Game Theory. Cambridge, MA: MIT
Press.
Young, H Peyton (2009). “Learning by trial and error”. In: Games and Eco-
nomic Behavior 65, pp. 626–643.
Young, H. Peyton et al. (2013). Learning in a Black Box. Economics Series
Working Papers.
Chapter 3
Abstract
primates. The results of such games have been used to argue that
people are pro-social, and that humans are uniquely altruistic, willingly
are mistaken, but learn, during the game, how to improve their personal
(iii) that social preferences, if they exist, are more anti-social than pro-
social.
Acknowledgements. We thank Jay Biernaskie, Innes Cuthill, Claire El
Mouden, Nichola Raihani and two anonymous referees for comments. We
thank the ERC, Nuffield College and the Calleva Research Centre, Magdalen
College, for funding.
3.1 Introduction
The results from economic games have been used to argue that humans are
altruistic in a way that differs from most if not all other organisms (Fehr and
Gächter, 2002; Gintis et al., 2003; Fehr and Fischbacher, 2003; Henrich, 2006).
In public goods games experiments, participants have to choose how much of
their monetary endowment they wish to keep for themselves and how much
to contribute to a group project (Ledyard, 1995; Chaudhuri, 2011). Contribu-
tions to the group project are automatically multiplied by the experimenter
before then being shared out equally among all group members regardless
of their relative contributions (Isaac and Walker, 1988b; Isaac and Walker,
1988a). The multiplication is usually less than the group size, so that a con-
tributor receives back less from her contribution than she contributed. In this
case, participants have to choose between retaining their full endowment and
thus maximizing their personal income, or sacrificing some of their earnings
to the benefit of the group. Hundreds of experiments have shown that most
people partially contribute to the group project and thus fail to maximize
personal income (Ledyard, 1995; Chaudhuri, 2011). It has been argued that
this robust result demonstrates that humans have a unique regard for the
welfare of others, termed pro-social preferences, which cannot be explained by
kin selection (Hamilton, 1964), reciprocity (Trivers, 1971) and/or via improved
reputation (Alexander, 1987; Nowak and Sigmund, 1998; Wedekind and Milin-
ski, 2000; Nowak and Sigmund, 2005). Consequently, economic games are also
increasingly being used in non-human primates in attempts to explore the evo-
lutionary origins of such puzzling social behaviours (Brosnan and Waal, 2003;
Jensen, Call, and Tomasello, 2007; Proctor et al., 2013).
The conclusion that humans are especially, perhaps uniquely, altruistic has re-
lied on the assumption that individuals play ‘perfectly’ in experiments such as
the public goods game. Specifically, it is assumed that individuals have a full
understanding of the game, in terms of the consequences of their behaviour for
themselves and others, such that their play reflects how they value the welfare
of others (social preferences), as in Fehr and Gächter, 2002 and Fehr and Schmidt,
1999. This supports the inference that the costly decisions players make knowingly
inflict a personal cost in order to benefit others (Fehr and Fischbacher, 2003).
Consequently, the typical decline in contributions when players are made to
play the game repeatedly (Ledyard, 1995; Chaudhuri, 2011; see figure 3.1) is
argued to be a withdrawal of cooperation in response to a minority of non-
cooperators (Fischbacher, Gächter, and Fehr, 2001; Fischbacher and Gächter,
2010; Camerer, 2013).
An alternative explanation for the data is that individuals are trying to maxi-
mize their financial gain, but they are not playing the game ‘perfectly’ (Kümmerli
et al., 2010; Burton-Chellew and West, 2013). This hypothesis predicts that
individuals initially cooperate to some degree, because they are uncertain and
bet-hedge (Burton-Chellew and West, 2013), or because they are mistaken about
how the payoffs operate (Kümmerli et al., 2010; Houser and Kurzban, 2002;
Andreoni, 1995), or perhaps because they operate a heuristic from everyday life
that starts off cooperating without calculating the consequences (Rand, Greene, and Nowak,
2012). This hypothesis consequently predicts a decline in cooperation over
time as individuals learn, albeit imperfectly, how behaviour influences payoffs.
Consistent with this alternative hypothesis, individuals have been found to
contribute similar amounts over time to the group project (as observed in stan-
dard experiments) even in low-information environments, that is, even when
they do not know they are playing the public goods game with others (Burton-
Chellew and West, 2013; Bayer, Renner, and Sausgruber, 2013). However, this
alternative hypothesis has been argued against, with the suggestion that the de-
cline in cooperation is better explained by pro-social individuals conditionally
Figure 3.1: We analyse the data from Burton-Chellew and West, 2013. Participants
played a public goods game for 20 repeated rounds, with random group
composition each round. There were three different information treatments
(see text for details). The results conform to the stereotypical results of public
goods games, in that contributions commence at intermediate values and
decline steadily with repetition of the game.
Sausgruber, 2013; Erev and Haruvy, 2013; Cross, 1983; Selten and Stoecker,
1986; Sauermann and Selten, 1962; Selten and Buchta, 1998)). Our second
and third rules are based on two forms of pro-social behaviour that have been
previously argued to lead to altruistic behaviour in public goods games (Fis-
chbacher, Gächter, and Fehr, 2001; Fischbacher and Gächter, 2010; Croson,
Fatas, and Neugebauer, 2005; Croson, 2007). Our second rule assumes that
individuals are trying to maximize a weighted function of their own income
and that of their group-mates (Croson, 2007). This also allows directional
learning, but in a way that takes account of the consequences of behaviour
for others. Our third rule is conditional cooperation, in response to the co-
operation of others (Fischbacher, Gächter, and Fehr, 2001; Fischbacher and
Gächter, 2010; Croson, Fatas, and Neugebauer, 2005; Böhm and Rockenbach,
2013). For example, if the average contributions of one’s group-mates increase
from one round to the next, then one will respond by contributing more in the
next round.
We analysed data from three public goods games, all with the same payoff-
structure, but which differ in the amount of information that the players are
given about the consequences of their behaviour for others. Specifically, in-
dividuals had no knowledge that their behaviour even benefited others (black
box), or were told at the start how their behaviour benefited others (standard),
or were also shown after each round of play that contributions benefited oth-
ers (enhanced).1 By comparing behaviour in these different games, we could
explicitly examine the extent to which behaviour was influenced by conse-
quences for the actor himself/herself (the only concern in the black box), and
consequences for others (increasingly highlighted in the standard and enhanced
treatments). In addition, we told players in the standard and enhanced treat-
ment the decisions of their group-mates after each round. This allows us to
1. See Burton-Chellew and West, 2013 for further instruction details.
test whether players are attempting to condition their cooperation and whether
this depends on how clear the benefits of contributing are for others.
We analysed the dataset from our previously published study, where the ex-
perimental methods are described in detail (Burton-Chellew and West, 2013);
see figure 3.1. This experiment examined the behaviour of 236 individuals,
distributed among 16 sessions. Here, we provide a brief summary of the parts
of the experimental design relevant to this study.
We tested three versions of the public goods game, each using an identical
set-up and payoff matrix but providing different levels of social information.
In each session, we had 12 or 16 participants, whom we grouped into groups
of four to play the public goods game, repeating the game for a total of 20
rounds. Groups were randomly created
every round. In all treatments, we gave our participants a fresh endowment
of 40 monetary units (MU), or 40 coins (for the black box), per round, and
multiplied the contributions of players by 1.6 before sharing them out equally
among all four group members. This meant that the marginal-per-capita-
return (MPCR) for each unit contributed was 0.4. Consequently, contributions
were always personally costly and to not contribute was the payoff-maximizing
(strictly dominant) strategy in each round.
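As a quick worked check of these parameters (illustrative arithmetic only):

\begin{align*}
  \text{MPCR} &= \frac{1.6}{4} = 0.4,\\
  \text{private net change from contributing } x \text{ MU} &= -x + 0.4x = -0.6x,\\
  \text{gain to the three group-mates combined} &= 3 \times 0.4x = 1.2x,\\
  \text{net gain to the whole group} &= 1.6x - x = 0.6x.
\end{align*}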
Our most extreme condition was an entirely asocial set-up, with no social
framing, where instead of allowing participants to contribute to a group project
we let them contribute to a ‘black box’, even though they were in reality playing
a standard inter-connected public goods game. We told the participants that
the black box ‘performs a mathematical function that converts the number of
coins inputted into a number of coins to be outputted.’ This allowed us to
deliberately create participants who would not know the payoff-maximizing
strategy and who would also be unconcerned by other-regarding preferences.
In such a condition, the participants could only be motivated to try to adjust
their behaviour so as to maximize their own income, insofar as participants
are ever so motivated.
Our other two treatments were revealed public goods games, where we told
our participants they could either contribute each MU to a group project
(the public good) or keep it for themselves. We told our players how the
game works, specifically that contributions are multiplied by 1.6 before being
shared out equally among all four players. In both of these ‘revealed’ versions
of the game, we gave our participants the exact same instructions, but we
gave more information after each round of play in one treatment than the
other. Specifically, in the ‘standard’ set-up, we told participants after each
round what their own payoffs were, and also what the decisions of their three
group-mates were. This is the most typical information content of public
goods game studies (e.g. Fehr and Gächter, 2002), which has provided the
template for many subsequent studies. In our ‘enhanced’ treatment, we also
informed our participants of their group-mates’ individual returns from the
group project and of their subsequent individual earnings. Note that in
this enhanced treatment, there is strictly speaking no new information relative
to the standard treatment, if players (i) understood the game and (ii) were
calculating the earnings of their group-mates from their contributions.
to enable a within-participant analysis. We presented the two games as two
entirely separate experiments to minimize spill-over effects: in one they could
‘input’ ‘coins’ into a ‘black box’, in the other they could ‘contribute’ ‘MU’ to
a ‘group project’, and the order of play of these games was counter-balanced
across sessions.
We tested three learning rules (figure 3.2). In all cases, we assumed that
players adjusted their behaviour according to whether previous behavioural
adjustments led to positive or negative consequences for the proposed under-
lying utility function. For example, if players derive utility only from their
personal income, and a previous reduction (or increase) in their contributions
led to an increase in their personal income, then in the next time step they
would gravitate towards the lesser (or greater), more successful, level of con-
tribution. Similarly, if players value the payoffs to others, then ceteris paribus,
others’ changes in income would be responded to in an equivalent way. The
three underlying utility functions that we examine were as follows:
Figure 3.2: We considered the explanatory power of three behavioural response
rules: (a) payoff-based learning based on increasing own income; (b) pro-social
directional learning, based on own income and the income of others (weighted
by α); and (c) conditional cooperation, based on own income and a desire to
equalize incomes (weighted by β).
such that the resulting utility is u_i(c) = φ_i(c) − β_i Σ_{j≠i} (c_j − c_i), where
β_i measures the agent’s concern to match others’ contributions.
ment by player i, is the response variable and X_i^t is the vector of the predictor
variables, including those for the three hypotheses. β is the vector of parameters
to be estimated and β_i is the estimator of predictor variable x_i's positive
effect on the response variable for a unit change in x_i. e^{t+1} represents the
standard (normally distributed) error term for this model. We focus on
adjustments in periods 1–10 because median contributions, having reached zero
in the enhanced treatment, and near zero otherwise (5/6 for black box and 4
for standard), change little after this and we are interested in modelling how
cooperative behaviour changes over time.
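A compact way to write the linear model described in this passage (the symbol used for the contribution adjustment on the left-hand side is an assumption; only X_i^t, β and e^{t+1} are named explicitly above):

\Delta c_i^{t+1} \;=\; X_i^{t}\,\beta \;+\; e^{t+1},

where Δc_i^{t+1} denotes player i's next-round contribution adjustment.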
The predictor variables x_i represent the three different learning rules above by
encoding the previous relationship between an agent’s contributions and (I)
their payoffs, (II) their group-mates’ payoffs or (III) their group-mates’ actions,
respectively. They take integer values from −1 to 1. Specifically, for utility
function (I), payoff-based learning, if a player’s contribution increased across
the two rounds (if c_i^t > c_i^{t−1}) along with their payoff (φ_i^t ≥ φ_i^{t−1}), then we
predict a further increase and encode +1, whereas in the mirror cases we predict
a decrease (relative to the mean of the two previous rounds) and encode −1, and we
predict 0 for all other cases.
For utility function (II), pro-social learning, we likewise encode the value
+1 following either a contribution increase and ‘other-regarding success’ (if
c_i^t > c_i^{t−1} and Σ_{j≠i} φ_j^t ≥ Σ_{j≠i} φ_j^{t−1}) or a contribution decrease and ‘other-regarding
failure’ (if c_i^t < c_i^{t−1} and Σ_{j≠i} φ_j^t < Σ_{j≠i} φ_j^{t−1}). We encode the value −1 following either
a contribution decrease coupled with ‘other-regarding success’ (if c_i^t < c_i^{t−1} and
Σ_{j≠i} φ_j^t ≥ Σ_{j≠i} φ_j^{t−1}) or a contribution increase with ‘other-regarding failure’
(if c_i^t > c_i^{t−1} and Σ_{j≠i} φ_j^t < Σ_{j≠i} φ_j^{t−1}), and 0 otherwise. Thus this variable,
along with the payoff-based learning variable, is also positive if the prior
directional changes in contributions were maintained after success or reversed
after failure, but success and failure are now judged in terms of others’ payoffs
instead of own payoffs. For our third utility function, (III), conditional cooperation,
we encode +1 when there has been an increase in the mean contribution
of group-mates across the previous two rounds (if Σ_{j≠i} c_j^t ≥ Σ_{j≠i} c_j^{t−1}) and 0
otherwise.

Table 3.1: Summary of results from testing the three different learning rules
together. (The table details the statistical significance of the three learning
rules (payoff-based learning, pro-social learning and conditional cooperation)
for the three information treatments (black box, standard and enhanced). ✓,
estimators significantly support the direction of the hypothesis in this treatment;
✗, estimators significantly contradict the direction of the hypothesis in this
treatment; n.s., non-significant. The values represent the estimates of the effects
of unit changes in the hypothesis-specific predictor variables on the response
variable; positive (negative) parameter estimators support (contradict) the
respective hypothesis. Table 3.2 details the regressions fully.)
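A minimal sketch of how these predictors could be encoded and the pooled regression estimated (Python with numpy; the data layout, the strict/weak inequality choices via the sign function, and the use of ordinary least squares are simplifying assumptions for illustration):

import numpy as np

def sign(x):
    return int(x > 0) - int(x < 0)

def encode_predictors(c_own, u_own, u_others, c_others_mean):
    """Encode the three learning-rule predictors in {-1, 0, +1} and the response variable.

    c_own, u_own  : one player's contributions and payoffs, round by round
    u_others      : summed payoffs of that player's group-mates, round by round
    c_others_mean : mean contribution of the group-mates, round by round
    Returns (X, y), where each row of X holds an intercept and the three predictors
    for round t, and y holds the player's contribution adjustment at round t + 1.
    """
    X, y = [], []
    for t in range(1, len(c_own) - 1):
        dc = sign(c_own[t] - c_own[t - 1])
        x1 = dc * sign(u_own[t] - u_own[t - 1])        # (I) payoff-based learning
        x2 = dc * sign(u_others[t] - u_others[t - 1])  # (II) pro-social learning
        x3 = 1 if c_others_mean[t] >= c_others_mean[t - 1] else 0  # (III) conditional cooperation
        X.append([1.0, x1, x2, x3])
        y.append(c_own[t + 1] - c_own[t])
    return np.array(X), np.array(y)

# Pooling the encodings across players and estimating by ordinary least squares:
#   beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)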
respective hypothesis, whereas negative estimators, meaning a negative corre-
lation between the learning rule and the subsequent changes in contributions,
contradict the respective hypothesis. For pro-social learning, the coefficient
indicates whether the average of weights αi on others’ income is supportive of
pro-sociality (positive) or not. Table 3.1 summarizes the results according to
their implications for the various hypotheses. Table 3.2 provides full details of
the parameter estimates for all models on all the data. The electronic supple-
mentary material provides the parameter estimates for models that analysed
sub-sets of the data according to which game-frame order they belonged to
(see Material and methods, data collection). We also provide a table detailing
the utility functions and their quantitative relationship to the data (electronic
supplementary material, table 3.2).
We found that our payoff-based learning rule was significant for all three
versions of the public goods game, in contrast to both our pro-social and
conditional-cooperation rules which were typically non-significant or signifi-
cant in the wrong direction (tables 3.1 and 3.2; electronic supplementary ma-
terial).
In the black box treatment, the behaviour of individuals could best be ex-
plained by payoff-based responses, with players significantly learning to im-
prove their income (tables 3.1 and 3.2). Figure 3.1 confirms that this leads
to behaviour at the group level which is strikingly similar to play in standard
public goods games. By contrast, the pro-social response rule estimate was
Table 3.2: A comparison of the different behavioural rules, plus one combining them all together, across three different information
treatments. (PBL, payoff-based learning (own success); PSL, pro-social learning (own success and others’ success); CC, conditional
cooperation (own success and others’ actions). All, a combination of all the components from the three rules (own success, others’
success, and others’ actions). The parameters in the first three rows estimate the effects of unit changes in the predictor variables
that act as components in the three learning rules; positive (negative) parameter estimators support (contradict) the respective
hypothesis.)
                  black box estimate (significance)      standard estimate (significance)       enhanced estimate (significance)
                  PBL      PSL      CC       All         PBL      PSL      CC       All         PBL      PSL      CC       All
own success       0.31     0.29     0.30     0.30        0.28     0.22     0.30     0.25        0.22     0.14     0.19     0.14
                  (0.001)  (0.001)  (0.001)  (0.001)     (0.001)  (0.001)  (0.001)  (0.001)     (0.001)  (0.001)  (0.001)  (0.001)
others’ success            −0.12             −0.13                −0.16             −0.23                −0.29             −0.29
                           (0.001)           (0.001)              (0.001)           (0.001)              (0.001)           (0.001)
others’ actions                     −0.04    0.05                          0.09     0.21                          −0.11    −0.001
                                    (0.241)  (0.178)                       (0.038)  (0.001)                       (0.012)  (0.95)
r²                0.09     0.10     0.09     0.10        0.07     0.09     0.07     0.10        0.04     0.11     0.05     0.11
no. obs.          1888     1888     1888     1888        928      928      928      928         960      960      960      960
significantly negative, attributing a negative weight to the welfare of other players.
This would represent anti-social preferences if it were not for the asocial frame
of the black box treatment and provides a baseline estimate for the anti-social
nature of payoff-based learning. The conditional-cooperation response rule was
not significant when payoff-based learning is controlled for.
Conditional cooperation was significant in the standard version but not the
enhanced version of the game, which has identical instructions and game struc-
ture, but where individuals were explicitly shown the returns to the other group
members from the group project. This enhanced information could of course
in principle be calculated by participants in the standard treatment as they
knew the decisions of their group-mates. In the standard version, the conditional
cooperation rule was not as significant unless anti-social responses to others’
success were controlled for (table 3.2).
butions over time (Fischbacher, Gächter, and Fehr, 2001; Fischbacher and
Gächter, 2010), but contributions declined faster in the enhanced treatment
where conditional cooperation was either non-significant (combined model, ta-
ble 3.2) or significantly negative (non-combined model, table 3.2). This sug-
gests that the conditional cooperation in the standard treatment is more to do
with social learning than social preferences, as the reduced uncertainty in the
enhanced treatment may reduce uncertain participants’ reliance upon imita-
tion (Carpenter, 2004). In addition, if some participants have incorrect beliefs
about how the payoffs are determined and choose to match others in the stan-
dard treatment, they may be less likely to do so in the enhanced treatment as
they revise their mistaken beliefs.
However, such payoff-based learning does not require that people realize that
the dominant strategy is independent of their group-mates’ actions. There-
fore the re-start phenomenon (Andreoni, 1988; Croson, 1996) whereby average
cooperation levels temporarily increase from a previous decline when the ex-
periment is ‘re-started’, while challenging, does not falsify learning hypotheses,
and may also be partly owing to selfish players attempting to manipulate oth-
ers (Andreoni, 1988; Kreps et al., 1982; Ambrus and Pathak, 2011).
Overall, our analyses suggest that changes in behaviour over time in public
goods games are largely explained by participants learning how to improve
personal income. We found conflicting support for conditional cooperation
as such behaviour disappeared when the consequences of contributing were
made clearer. This suggests that conditional cooperation is largely due to
confusion/error and not pro-sociality. This is reinforced by our lack of evidence
of a desire to help others (pro-sociality). Indeed, we found that, if anything,
the benefits to others are weighted negatively, with individuals adjusting their
behaviour to better reduce the income of others. We are not suggesting that
humans are anti-social, nor that they are never pro-social — pro-sociality is
found across the tree of life from genes to cells to vertebrates (West, Griffin,
and Gardner, 2007) — rather, that public goods games do not demonstrate
that humans are uniquely altruistic.
Chellew and West, 2013). Second, there has been an implicit assumption that
humans behave as utility-maximizers, such that their costly choices reliably
reveal their (social) preferences (Fehr and Schmidt, 1999).
Data accessibility. All the data have been submitted to Dryad and are
available at doi:10.5061/dryad.cr829.
References
Andreoni, James (1995). “Cooperation in public-goods experiments: kindness
or confusion?” In: The American Economic Review 85.4, pp. 891–904.
Bayer, Ralph-C., Elke Renner, and Rupert Sausgruber (2013). “Confusion and
learning in the voluntary contributions game”. In: Experimental Economics
16.4, pp. 478–496.
Böhm, Robert and Bettina Rockenbach (2013). “The Inter-Group Compari-
son – Intra-Group Cooperation Hypothesis: Comparisons between Groups
Increase Efficiency in Public Goods Provision”. In: PloS ONE 8.2, e56152.
Brosnan, Sarah F and Frans B M de Waal (2003). “Monkeys reject unequal
pay”. In: Nature 425.6955, pp. 297–299.
Burnham, Terence C and Brian Hare (2007). “Engineering Human Coopera-
tion”. In: Human Nature 18.2, pp. 88–108.
Burton-Chellew, Maxwell N and Stuart A West (2012). “Pseudocompetition
among groups increases human cooperation in a public-goods game”. In:
Animal Behaviour 84.4, pp. 947–952.
– (2013). “Prosocial preferences do not explain human cooperation in public-
goods games”. In: Proceedings of the National Academy of Sciences 110.1,
pp. 216–221.
Camerer, Colin F (2003). Behavioral game theory: Experiments in strategic
interaction. Princeton, NJ: Princeton University Press.
– (2013). “Experimental, cultural, and neural evidence of deliberate prosocial-
ity”. In: Trends in Cognitive Sciences 17.3, pp. 106–108.
Carpenter, Jeffrey P (2004). “When in Rome: conformity and the provision of
public goods”. In: The Journal of Socio-Economics 33.4, pp. 395–408.
Chaudhuri, Ananish (2011). “Sustaining cooperation in laboratory public goods
experiments: a selective survey of the literature”. In: Experimental Eco-
nomics 14.1, pp. 47–83.
Croson, Rachel T A (1996). “Partners and strangers revisited”. In: Economics
Letters 53.1, pp. 25–32.
– (2007). “Theories of commitment, altruism and reciprocity: evidence from
linear public goods games.” In: Economic Inquiry 45.2, pp. 199–216.
Croson, Rachel T A, Enrique Fatas, and Tibor Neugebauer (2005). “Reci-
procity, matching and conditional cooperation in two public goods games”.
In: Economics Letters 87.1, pp. 95–101.
Cross, John G (1983). A Theory of Adaptive Economic Behavior. Cambridge,
UK: Cambridge University Press.
Erev, Ido and Ernan Haruvy (2013). “Learning and the economics of small
decisions”. In: The handbook of experimental economics. Ed. by John H
Kagel and Alvin E Roth. Vol. 2. Princeton, NJ: Princeton University Press.
Fehr, Ernst and Urs Fischbacher (2003). “The nature of human altruism”. In:
Nature 425.6960, pp. 785–791.
Fehr, Ernst and Simon Gächter (2002). “Altruistic punishment in humans”.
In: Nature 415.6868, pp. 137–140.
Fehr, Ernst and Klaus M Schmidt (1999). “A Theory of Fairness, Competition,
and Cooperation”. In: The Quarterly Journal of Economics 114.3, pp. 817–
868.
Fischbacher, Urs and Simon Gächter (2010). “Social Preferences, Beliefs, and
the Dynamics of Free Riding in Public Goods Experiments”. In: American
Economic Review 100.1, pp. 541–556.
Fischbacher, Urs, Simon Gächter, and Ernst Fehr (2001). “Are people condi-
tionally cooperative? Evidence from a public goods experiment”. In: Eco-
nomics Letters 71.3, pp. 397–404.
Gintis, Herbert et al. (2003). “Explaining altruistic behavior in humans”. In:
Evolution and Human Behavior 24.3, pp. 153–172.
Hamilton, William D (1964). “The genetical evolution of social behaviour. I &
II”. In: Journal of Theoretical Biology 7.1, pp. 1–52.
Hauert, Christoph et al. (2002). “Volunteering as Red Queen Mechanism for
Cooperation in Public Goods Games”. In: Science 296.5570, pp. 1129–1132.
Henrich, Joseph (2006). “Costly Punishment Across Human Societies”. In:
Science 312.5781, pp. 1767–1770.
Houser, Daniel and Robert Kurzban (2002). “Revisiting Kindness and Confu-
sion in Public Goods Experiments”. In: American Economic Review 92.4,
pp. 1062–1069.
Isaac, R Mark and James M. Walker (1988a). “Communication and free-riding
behavior: The voluntary contribution mechanism”. In: Economic Inquiry
26.4, pp. 585–608.
– (1988b). “Group Size Effects in Public Goods Provision: The Voluntary
Contributions Mechanism”. In: The Quarterly Journal of Economics 103.1,
p. 179.
Jensen, Keith, Josep Call, and Michael Tomasello (2007). “Chimpanzees Are
Rational Maximizers in an Ultimatum Game”. In: Science 318.5847, pp. 107–
109.
Kreps, David M et al. (1982). “Rational cooperation in the finitely repeated
prisoners’ dilemma”. In: Journal of Economic Theory 27.2, pp. 245–252.
Kümmerli, Rolf et al. (2010). “Resistance to extreme strategies, rather than
prosocial preferences, can explain human cooperation in public goods games”.
In: Proceedings of the National Academy of Sciences 107.22, pp. 10125–
10130.
Ledyard, John O (1995). “Public goods: A survey of experimental research”.
In: Handbook of experimental economics. Ed. by John H Kagel and Alvin E
Roth. Princeton, NJ: Princeton University Press, pp. 253–279.
Nettle, Daniel et al. (2013). “The watching eyes effect in the Dictator Game:
it’s not how much you give, it’s being seen to give something”. In: Evolution
and Human Behavior 34.1, pp. 35–40.
Nowak, Martin A and Karl Sigmund (1998). “Evolution of indirect reciprocity
by image scoring”. In: Nature 393.6685, pp. 573–577.
– (2005). “Evolution of indirect reciprocity”. In: Nature 437.7063, pp. 1291–
1298.
Proctor, Darby et al. (2013). “Chimpanzees play the ultimatum game”. In:
Proceedings of the National Academy of Sciences 110.6, pp. 2070–2075.
Rand, David G., Joshua D. Greene, and Martin A. Nowak (2012). “Sponta-
neous giving and calculated greed”. In: Nature 489.7416, pp. 427–430.
Sauermann, Heinz and Reinhard Selten (1962). “Anspruchsanpassungstheorie
der Unternehmung”. In: Zeitschrift für die gesamte Staatswissenschaft/Journal
of Institutional and Theoretical Economics, pp. 577–597.
Selten, Reinhard and Joachim Buchta (1998). “Experimental sealed bid first
price auctions with directly observed bid functions”. In: Games and human
behavior: essays in honor of Amnon Rapoport. Ed. by Amnon Rapoport et
al. Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 79–104.
Selten, Reinhard and Rolf Stoecker (1986). “End behavior in sequences of finite
Prisoner’s Dilemma supergames: A learning theory approach”. In: Journal
of Economic Behavior & Organization 7.1, pp. 47–70.
Semmann, Dirk, Hans-Jürgen Krambeck, and Manfred Milinski (2003). “Vol-
unteering leads to rock-paper-scissors dynamics in a public goods game”. In:
Nature 425.6956, pp. 390–393.
Trivers, Robert L (1971). “The Evolution of Reciprocal Altruism”. In: The
Quarterly Review of Biology 46.1, pp. 35–57.
Wedekind, Claus and Manfred Milinski (2000). “Cooperation Through Image
Scoring in Humans”. In: Science 288.5467, pp. 850–852.
West, Stuart A, Ashleigh S Griffin, and Andy Gardner (2007). “Evolutionary
Explanations for Cooperation”. In: Current Biology 17.16, R661–R672.
Chapter 4
Evolution of market equilibria:
Equity dynamics in matching markets
Abstract
Acknowledgements. Foremost, we thank Peyton Young for his guidance.
He worked with us throughout large parts of this project and provided in-
valuable guidance. Further, we thank Itai Arieli, Peter Biró, Gabrielle De-
mange, Gabriel Kreindler, Jonathan Newton, Tom Norman, Tamás Solymosi
and anonymous referees for suggesting a number of improvements to earlier
versions. We are also grateful for comments by participants at the 23rd Inter-
national Conference on Game Theory at Stony Brook, the Paris Game Theory
Seminar, the AFOSR MUIR 2013 meeting at MIT, and the 18th CTN Work-
shop at the University of Warwick. The research was supported by the United
States Air Force Office of Scientific Research Grant FA9550-09-1-0538 and the
Office of Naval Research Grant N00014-09-1-0751.
4.1 Introduction
Many matching markets are decentralized and agents interact repeatedly with
very little knowledge about the market as a whole. Examples include online
markets for bringing together buyers and sellers of goods, matching workers
with firms, matching hotels with clients, and matching men and women. In
such markets matchings are repeatedly broken, reshuffled, and restored. Even
after many encounters, however, agents may still have little information about
the preferences of others, and they must experiment extensively before the
market stabilizes.
In this paper we propose a simple adaptive process that reflects the partic-
ipants’ limited information about the market. Agents have aspiration levels
that they adjust from time to time based on their experienced payoffs. Matched
agents occasionally experiment with higher bids in the hope of extracting more
from another match, while single agents occasionally lower their bids in the
hope of attracting a partner. There is no presumption that market partici-
pants or a central authority know anything about the distribution of others’
preferences or that they can deduce such information from prior rounds of play.
Instead they follow a process of trial and error in which they adjust their bids
and offers in the hope of increasing their payoffs. Such aspiration adjustment
rules are rooted in the psychology and learning literature.1 A key feature of
the rule we propose is that an agent’s behavior does not require any informa-
tion about other agents’ actions or payoffs: the rule is completely uncoupled.2
1. There is an extensive literature in psychology and experimental game theory on trial
and error and aspiration adjustment. See in particular the learning models of Thorndike,
1898, Hoppe, 1931, Estes, 1950, Bush and Mosteller, 1955, Herrnstein, 1961, and aspiration
adjustment and directional learning dynamics of Heckhausen, 1955, Sauermann and Selten,
1962, Selten and Stoecker, 1986, Selten, 1998.
2. This idea was introduced by Foster and Young, 2006 and is a refinement of the concept
of uncoupled learning due to Hart and Mas-Colell, 2003; Hart and Mas-Colell, 2006. Recent
work has shown that there exist completely uncoupled rules that lead to Nash equilibrium
in generic noncooperative games (Germano and Lugosi 2007, Marden et al. 2009, Young
2009, Pradelski and Young 2012).
It is therefore particularly well-suited to environments such as decentralized
online markets where players interact anonymously and trades take place at
many different prices. We shall show that this simple adaptive process leads to
equitable solutions inside the core of the associated assignment game (Shapley
and Shubik 1972). In particular, core stability and equity are achieved even
though agents have no knowledge of the other agents’ strategies or preferences,
and there is no ex ante preference for equity.
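A heavily simplified sketch of the kind of aspiration adjustment described above (Python; the experimentation probability, step size and data structures are illustrative assumptions, not the formal process defined later in the chapter):

import random

def adjust_aspirations(demands, matched, step=1.0, p_experiment=0.1, rng=random):
    """One round of trial-and-error aspiration adjustment (illustrative only).

    demands : dict mapping each agent to its current payoff aspiration (bid or offer)
    matched : dict mapping each agent to True if currently matched, False if single
    """
    for agent, d in demands.items():
        if matched[agent]:
            if rng.random() < p_experiment:
                demands[agent] = d + step              # matched: occasionally ask for more
        else:
            if rng.random() < p_experiment:
                demands[agent] = max(0.0, d - step)    # single: occasionally ask for less
    return demands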
The paper is structured as follows. The next section discusses the related
literature on matching and core implementation. Section 3 formally introduces
assignment games and the relevant solution concepts. Section 4 describes the
process of adjustment and search by individual agents. In sections 5 and 6 we
show that the stochastically stable states of the process lie in the least core.
Section 7 concludes with several open problems.
Our results fit into a growing literature showing how cooperative game so-
lutions can be implemented via noncooperative dynamic learning processes
(Agastya 1997; Agastya 1999, Arnold and Schwalbe 2002, Newton 2010; New-
ton 2012, Sawa 2011, Rozen 2013). A particularly interesting class of coopera-
tive games are assignment games, in which every potential matched pair has a
cooperative ‘value’. Shapley and Shubik, 1972 showed that the core of such a
game is always nonempty.3 Subsequently various authors have explored refine-
ments of the assignment game core, including the kernel (Rochford 1984) and
the nucleolus (Huberman 1980, Solymosi and Raghavan 1994, Nunez 2004,
Llerena, Nunez, and Rafels 2012). To the best of our knowledge, however,
3. Important subsequent papers include Crawford and Knoer, 1981, Kelso and Crawford,
1982, Demange and Gale, 1985, and Demange, Gale, and Sotomayor, 1986.
there has been no prior work showing how a core refinement is selected via a
decentralized learning process, which is the subject of the present paper.
This paper establishes convergence to the core of the assignment game for
a class of natural dynamics and selection of a core refinement under payoff
perturbations. We are not aware of prior work comparable with our selection
result. There are, however, several recent papers that also address the issue of
core convergence for a variety of related processes (Chen, Fujishige, and Yang
2011, Biró et al. 2012, Klaus and Payot 2013, Bayati et al. 2014). These pro-
cesses are different from ours, in particular they are not aspiration-adjustment
learning processes, and they do not provide a selection mechanism for a core
refinement as we do here. The closest relative to our paper is the concurrent
paper by Chen, Fujishige, and Yang, 2011, which demonstrates a decentralized
process where, similarly as in our process, pairs of players from the two market
sides randomly meet in search of higher payoffs. This process also leads almost
surely to solutions in the core. Chen, Fujishige, and Yang, 2011 and our paper
are independent and parallel work. They provide a constructive proof based on
their process which is similar to ours for the proof of the convergence theorem.
Thus, theirs as well as our algorithm (proof of Theorem 1) can be used to find
core outcomes. Biró et al., 2012 generalizes Chen, Fujishige, and Yang, 2011
to transferable-utility roommate problems. In contrast to Chen, Fujishige,
and Yang, 2011 and our proof, Biró et al., 2012 use a target argument which
cannot be implemented to obtain a core outcome. Biró et al., 2012’s proof
technique is subsequently used in Klaus and Payot, 2013 to prove the result
of Chen, Fujishige, and Yang, 2011 for continuous payoff space in the assign-
ment game. A particularity in this case is the fact that the assignment may
continue to change as payoffs approximate a core outcome. Finally, Bayati
et al., 2014 study the rate of convergence of a related bargaining process for
the roommate problem in which players know their best alternatives at each
stage. The main differences between this process and ours are that agents best-reply (i.e., they have a lot of information about their best alternatives), that the order of activation is fixed rather than random, and that matches are only formed once a stable outcome is found.⁴ An important feature of our learning process is that it is
explicitly formulated in terms of random bids of workers and random offers of
firms (as in Shapley and Shubik 1972), which allows a completely uncoupled
set-up of the dynamic.
There is also a related literature on the marriage problem (Gale and Shapley
1962). In this setting the players have ordinal preferences for being matched
with members of the other population, and the core consists of matchings such
that no pair would prefer each other to their current partners.5 Typically, many
matchings turn out to be stable. Roth and Vande Vate, 1990 demonstrate a
random blocking pair dynamic that leads almost surely to the core in such
games. Chung, 2000, Diamantoudi, Xue, and Miyagawa, 2004 and Inarra,
Larrea, and Molis, 2008; Inarra, Larrea, and Molis, 2013 establish similar
results for nontransferable-utility roommate problems, while Klaus and Klijn,
2007 and Kojima and Uenver, 2008 treat the case of many-to-one and many-
to-many nontransferable-utility matchings. Another branch of the literature
considers stochastic updating procedures that place high probability on core
solutions, that is, the stochastically stable set is contained in the core of the
game (Jackson and Watts 2002, Klaus, Klijn, and Walzl 2010, Newton and
Sawa 2013).
The key difference between marriage problems and assignment games is that
the former are framed in terms of nontransferable (usually ordinal) utility,
whereas in the latter each potential match has a transferable ‘value’. The core
of the assignment game consists of outcomes such that the matching is optimal
and the allocation is pairwise stable. Generically, the optimal matching is unique, while infinitely many allocations support it.
⁴ In a recent paper Pradelski, 2014 discusses the differences to our set-up in more detail. He then investigates the convergence rate properties of a process closely related to ours.
⁵ See Roth and Sotomayor, 1992 for a text on two-sided matching.
On the face of it one might
suppose that the known results for marriage games would carry over easily to
assignment games but this is not the case. The difficulty is that in marriage
games (and roommate games) a payoff-improving deviation is determined by
the players’ current matches and their preferences, whereas in an assignment
game it is determined by their matches, the value created by these matches,
and by how they currently split the value of the matches. Thus the core of the
assignment game tends to be significantly more constrained and paths to the
core are harder to find than in the marriage game.
4.3.1 The assignment game
We assume that these numbers are specific to the agents and are not known
to the other market participants or to a central market authority.
Match value. Assume that utility is linear and separable in money. The value of a match (i, j) ∈ F × W is the potential surplus
α_{ij} = (p^+_{ij} − q^-_{ij})^+.   (4.1)
It will be convenient to assume that all values p^+_{ij}, q^-_{ij}, and α_{ij} can be expressed
We shall introduce time at this stage to consistently develop our notation. Let
t = 0, 1, 2, ... be the time periods.
Assignment. For all pairs of agents (i, j) ∈ F × W, let a^t_{ij} ∈ {0, 1}, where a^t_{ij} = 1 if (i, j) is matched and a^t_{ij} = 0 if (i, j) is unmatched.   (4.2)
⁶ The two sides of the market could also, for example, represent buyers and sellers, or men and women in a (monetized) marriage market.
If for a given agent i ∈ N there exists j such that a^t_{ij} = 1 we shall refer to that agent as matched; otherwise i is single. An assignment A = (a^t_{ij})_{i∈F, j∈W} is such that if a^t_{ij} = 1 for some (i, j), then a^t_{ik} = 0 for all k ≠ j and a^t_{lj} = 0 for all l ≠ i.
The value of a coalition S ⊆ N is v(S) = max {α_{i1 j1} + ... + α_{ik jk}}, where the maximum is taken over all sets {(i1, j1), ..., (ik, jk)} consisting of disjoint pairs that can be formed by matching firms and workers in S. The number v(N) specifies the value of an optimal assignment.
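For concreteness, v(N) can be computed centrally with the Hungarian method cited in Section 4.5 (Kuhn 1955). The following minimal sketch (not part of the thesis; it relies on SciPy's linear_sum_assignment, and the function name is illustrative) returns the optimal assignment value for a matrix of match values:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def optimal_assignment_value(alpha):
        """v(N): the maximal total match value over disjoint firm-worker pairs."""
        alpha = np.asarray(alpha, dtype=float)
        rows, cols = linear_sum_assignment(alpha, maximize=True)
        # pairs of value zero contribute nothing, so leaving those agents single is equivalent
        return float(alpha[rows, cols].sum())

    # Example with the match values of Section 4.4.2 (two firms, three workers):
    # optimal_assignment_value([[20, 11, 0], [0, 11, 20]]) == 40.0

The point of the chapter, of course, is that no such central computation takes place; the decentralized process described below reaches an optimal assignment without it.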
4.3.2 Dynamic components
Aspiration level. At the end of any period t, a player has an aspiration level,
dti , which determines the minimal payoff at which he is willing to be matched.
Let dt = {dti }i∈F ∪W .
Bids. In any period t, one pair of players is drawn at random and they make
bids for each other. We assume that the two players’ bids are such that the
resulting payoff to each player is at least equal to his aspiration level, and with
positive probability is exactly equal to his aspiration level.
Consider, for example, worker j's bid for firm i. The amount q^-_{ij} is the minimum that j would ever accept to be matched with i, while d^{t−1}_j is his previous aspiration level over and above the minimum. Thus Q^t_{ij} is j's attempt to get even more in the current period. Note that if the random variable is zero, the agent bids exactly according to his current aspiration level.
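As an illustration only (equation (4.3) is not reproduced above, so the exact distribution of the mark-up is an assumption), a worker's ask could be generated as follows, with a mark-up that equals zero with positive probability:

    import random

    def worker_ask(q_min, d_prev, delta=1, lam=0.5, rng=random):
        """Hypothetical sketch: minimum acceptable price q_min plus the previous
        aspiration level d_prev plus a nonnegative mark-up that is exactly zero
        with probability 1 - lam (a geometric form assumed for illustration)."""
        markup = 0
        while rng.random() < lam:  # geometric number of delta-steps
            markup += delta
        return q_min + d_prev + markup

A firm's offer would be built symmetrically, subtracting the previous aspiration level and a random mark-down from p^+_{ij}.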
Payoffs. Given [A^t, d^t] the payoff to firm i and to worker j is
φ^t_i = p^+_{ij} − π^t_{ij} if i is matched to j, and φ^t_i = 0 if i is single;
φ^t_j = π^t_{ij} − q^-_{ij} if j is matched to i, and φ^t_j = 0 if j is single.   (4.4)
Note that players' payoffs can be deduced from the aspiration levels and the assignment matrix.
Note that if two players' bids are at their aspiration levels and p^t_{ij} = q^t_{ij}, they are only profitable if both players are currently single. Also note that a pair of players (i, j) with α_{ij} = 0 will never match.
To summarize, when a new match forms that is profitable, both agents receive a
higher payoff in expectation due to the full support of the resulting price.8
Optimality. An assignment A is optimal if Σ_{(i,j)∈F×W} a_{ij} · α_{ij} = v(N).
⁸ In this sense any alternative match that may block a current assignment because it is profitable (as defined earlier) is a strict blocking pair.
Pairwise stability. An aspiration level vector d^t is pairwise stable if for all i, j with a^t_{ij} = 1,
p^+_{ij} − d^t_i = q^-_{ij} + d^t_j,   (4.5)
and p^+_{i′j} − d^t_{i′} ≤ q^-_{i′j} + d^t_j for every alternative firm i′ ∈ F with i′ ≠ i, and q^-_{ij′} + d^t_{j′} ≥ p^+_{ij′} − d^t_i for every alternative worker j′ ∈ W with j′ ≠ j.
Core (Shapley and Shubik 1972). The core of any assignment game is
always non-empty and consists of the set C ⊆ Ω of all states Z such that A is
an optimal assignment and d is pairwise stable.
Excess. Given state Z^t, the excess for a player i who is matched with j is
e^t_i = φ^t_i − max_{k≠j} (α_{ik} − φ^t_k)^+.   (4.6)
The excess for player i describes the gap to his next-best alternative, that is,
the smallest amount he would have to give up in order to profitably match with
some other player k ≠ j. If a player has negative excess, pairwise stability is
violated. In a core allocation, therefore, all players have nonnegative excess.
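A direct transcription of definition (4.6), assuming payoffs are stored per player and alpha[i] maps i's potential partners to match values (a sketch with illustrative names, not code from the thesis):

    def excess(i, payoff, alpha, partner):
        """Excess of matched player i: his payoff minus the largest truncated
        gain (alpha[i][k] - payoff[k])^+ over alternative partners k."""
        best_alternative = max(
            (alpha[i][k] - payoff[k] for k in alpha[i] if k != partner[i]),
            default=0.0,
        )
        return payoff[i] - max(best_alternative, 0.0)

In a core allocation every player's excess computed this way is nonnegative; the minimum of these excesses over all players is the quantity maximized by the least core introduced below.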
For the analysis of absorbing core states, note that the excess in payoff can be
equivalently expressed in terms of the excess in aspiration level. This is the
case since in absorbing core states aspiration levels are directly deducible from
payoffs.
⁹ See, for example, Roth and Sotomayor, 1992; Balinski and Gale, 1987; Sotomayor, 2003.
Minimal excess. Given state Z t , the minimal excess is
Based on the minimal excess of a state, we can define the kernel (Davis and
Maschler 1965). For assignment games, the kernel coincides with the solution
concept proposed by Rochford, 1984, which generalizes a pairwise equal split
solution à la Nash, 1950.
For an analysis of the welfare properties and of the links between the kernel
and the nucleolus of the assignment game see Nunez, 2004 and Llerena, Nunez,
and Rafels, 2012.
Least core (Maschler, Peleg, and Shapley 1979).¹⁰ The least core L of an assignment game is the set of states Z such that the matching is optimal and the minimum excess is maximized, that is,
¹⁰ Note that the excess for coalitions, e^t(S), is usually defined with the reversed sign. In order to keep it consistent with definition (4.6) we chose to reverse the sign.
Note that our definition of excess applies to essential coalitions only (that is, for
the case of the assignment game, to two-player coalitions involving exactly one
agent from each market side). Hence, the least core generalizes the nucleolus
of the assignment game in the following sense. Starting with the nucleolus,
select any player with minimum excess (according to equation (4.6)): the least
core contains all outcomes with a minimum excess that is not smaller.11
N ∈ (K ∩ L), K ⊆ C, L ⊆ C. (4.10)
support. The two players enter a new match if their match is profitable, which
they can see from their current bids, offers and their payoffs. If the two players
are already matched with each other, they remain so.
The essential steps and features of the learning process are as follows. At the
start of period t + 1:
2a. If the encounter is profitable given their current bids and assignment,
the pair matches.
2b. If the match is not profitable, both agents return to their previous
matches (or remain single).
3a. If a new match (i, j) forms, the price is set anywhere between bid and
offer. The aspiration levels of i and j are set to equal their realized
payoffs.
3b. If no new match is formed, the active agent, if he was previously matched,
keeps his previous aspiration level and stays with his previous partner.
If he was previously single, he remains single and lowers his aspiration
level with positive probability.
and about the identity of the other market participants. Following aspira-
tion adjustment theory (Sauermann and Selten 1962, Selten 1998) and related
bargaining experiments on directional and reinforcement learning (e.g., Tietz
and Weber 1972, Roth and Erev 1995), we shall assume a simple directional
learning model: matched agents occasionally experiment with higher offers if
on the sell-side (or lower bids if on the buy-side), while single agents, in the
hope of attracting partners, lower their offers if on the sell-side (or increase
their bids if on the buy-side).
We shall now describe the process in more detail, distinguishing the cases
where the active agent is currently matched or single. Let Z t be the state at
the end of period t (and the beginning of period t + 1), and let i ∈ F be the
unique active agent which for ease of exposition we assume to be a firm.
At the end of period t + 1, the aspiration levels of the newly matched pair (i, j)
are adjusted according to their newly realized payoffs:
d^{t+1}_i = p^+_{ij} − π^{t+1}_{ij} and d^{t+1}_j = π^{t+1}_{ij} − q^-_{ij}.   (4.11)
All other aspiration levels and matches remain fixed. If i, j are not profitable,
i remains matched with his previous partner and keeps his previous aspiration
level. See Figure 4.1 for an illustration.
II. The active agent is currently single and meets j
At the end of period t + 1, the aspiration levels of the newly matched pair (i, j)
are adjusted to equal their newly realized payoffs:
d^{t+1}_i = p^+_{ij} − π^{t+1}_{ij} and d^{t+1}_j = π^{t+1}_{ij} − q^-_{ij}.   (4.12)
All other aspiration levels and matches remain as before. If i, j are not prof-
itable, i remains single and, with positive probability, reduces his aspiration
level,
d^{t+1}_i = (d^t_i − X^{t+1}_i)^+,   (4.13)
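The following is a minimal simulation sketch of the unperturbed dynamic just described, with the bid gaps P and Q set to zero (as in the proof of Theorem 1) and with arbitrary choices for details the extracted text does not pin down (the probability of lowering an aspiration level, the split of the residual). It is an illustration, not the thesis's exact specification:

    import random

    def simulate(firms, workers, alpha, delta=1, periods=200_000, seed=0):
        """Aspiration-adjustment dynamic of Sections 4.3-4.4 (unperturbed case).
        alpha[i][j] are assumed to be integer multiples of delta; d starts at zero."""
        rng = random.Random(seed)
        d = {a: 0 for a in firms + workers}          # aspiration levels
        match = {}                                   # agent -> current partner

        def unmatch(a):
            partner = match.pop(a, None)
            if partner is not None:
                match.pop(partner, None)

        for _ in range(periods):
            i, j = rng.choice(firms), rng.choice(workers)    # random encounter
            value = alpha[i][j]
            both_single = i not in match and j not in match
            profitable = (d[i] + d[j] < value or
                          (d[i] + d[j] == value and both_single and value > 0))
            if profitable:
                unmatch(i); unmatch(j)               # previous partners become single
                match[i], match[j] = j, i
                residual = value - d[i] - d[j]       # surplus above the two aspirations
                gain_i = rng.randrange(0, residual + delta, delta) if residual > 0 else 0
                new_di = d[i] + gain_i
                d[i], d[j] = new_di, value - new_di  # aspirations = realized payoffs
            else:
                active = rng.choice([i, j])          # the active agent of the pair
                if active not in match and d[active] > 0 and rng.random() < 0.5:
                    d[active] -= delta               # a single lowers his aspiration
        return d, match

    # Example of Section 4.4.2:
    # alpha = {"f1": {"w1": 20, "w2": 11, "w3": 0},
    #          "f2": {"w1": 0, "w2": 11, "w3": 20}}
    # simulate(["f1", "f2"], ["w1", "w2", "w3"], alpha)

By Theorem 1 below, a run of this kind is absorbed into a core state in finite time with probability one.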
4.4.2 Example
[Figure: the example market, with firm f1 holding valuations (40, 31, 20) and firm f2 holding valuations (20, 31, 40) for workers w1, w2, w3.]
Then one can compute the match values: α11 = α23 = 20, α12 = α22 = 11, and
αij = 0 for all other pairs (i, j). Let δ = 1.
Suppose that, at the end of some period t, (f1 , w1 ) and (f2 , w2 ) are matched
and w3 is single.
The current aspiration level is shown next to the name of that agent, and the
values αij are shown next to the edges (if positive). Bids will be shown to the
right of the aspiration level. Solid edges indicate matched pairs, and dashed
edges indicate unmatched pairs. (Edges with value zero are not shown.) Note
that no player can see the bids or the status of the players on the other side
of the market.
Note that some matches can never occur. For example f1 is never willing to
pay more than 20 for w3 , but w3 would only accept a price above 30 from
f1 .
[Figure: state Z^t. Aspiration levels f1: 13, f2: 10, w1: 7, w2: 1, w3: 10; positive match values α11 = 20, α12 = 11, α22 = 11, α23 = 20 shown on the edges; (f1, w1) and (f2, w2) matched, w3 single.]
Note that the aspiration levels satisfy dti + dtj ≥ αij for all i and j, but the
assignment is not optimal (firm 2 should match with worker 3).
[Figure: two panels in which the single worker w3 finds no profitable match and lowers his aspiration level from 10 to 9.]
f2 encounters w3: successful match; f2 increases his aspiration level.
[Figure: two panels in which f2 matches with w3, raising his aspiration level by 1, and w2 becomes single.]
w2 ’s current aspiration level is too high in the sense that he has no profitable
matches and thus in particular is not profitable with f2 . Hence he remains
single and, with positive probability, reduces his aspiration level by 1.
[Figure: states Z^{t+2} and Z^{t+3}. In Z^{t+3} the single worker w2 has reduced his aspiration level to 0 (aspiration levels w1: 7, w2: 0, w3: 9).]
¹⁵ Note that the states Z^{t+2} and Z^{t+3} are both in the core, but Z^{t+3} is absorbing whereas Z^{t+2} is not.
4.5 Core stability – absorbing states of the unperturbed process
Throughout the proof we shall omit the time superscript since the process is
time-homogeneous. The general idea of the proof is to show a particular path
leading into the core which has positive probability. The proof uses integer
programming arguments (Kuhn 1955, Balinski 1965) but no single authority
‘solves’ an integer programming problem. It will simplify the argument to
restrict our attention to a particular class of paths with the property that the realizations of the random variables P^t_{ij}, Q^t_{ij} are always 0 and the realizations of X^t_i are always δ. P^t_{ij} and Q^t_{ij} determine the gaps between the bids and the aspiration levels, and X^t_i determines the reduction of the aspiration level by a single agent. One obtains from equation (4.3) for the bids:
Recall that any two agents encounter each other in any period with positive
probability. It shall be understood in the proof that the relevant agents in any
period encounter each other. Jointly with equation (4.3), we can then say that
a pair of aspiration levels (d^t_i, d^t_j) is profitable if
either d^t_i + d^t_j < α_{ij}, or d^t_i + d^t_j = α_{ij} and both i and j are single.   (4.15)
Restricting attention to this particular class of paths will permit a more trans-
parent analysis of the transitions, which we can describe solely in terms of the
aspiration levels.
Any aspiration levels satisfying Claim 1 will be called good. Note that, even
if aspiration levels are good, the assignment does not need to be optimal and
not every agent with a positive aspiration level needs to be matched. (See the
period-t example in the preceding section.)
Claim 2. Starting at any state with good aspiration levels, there is a positive probability path to a pair (A, d) where d is good, A is optimal, and all singles' aspiration levels are zero.¹⁶
¹⁶ Note that this claim describes an absorbing state in the core. It may well be that the core is reached while a single's aspiration level is more than zero. The latter state, however, is transient and will converge to the corresponding absorbing state.
Proof of Claim 1.
Case 1. Suppose the aspiration levels d are such that di + dj < αij for some
i, j. Note that this implies that i and j are not matched with each other
since otherwise the entire surplus is allocated and di + dj = αij . With posi-
tive probability, either i or j is activated and i and j become matched. The
new aspiration levels are set equal to the new payoffs. Thus the sum of the aspiration levels is equal to the match value α_{ij}. Therefore, there is a positive
probability path along which d increases monotonically until di + dj ≥ αij for
all i, j.
Case 2. Suppose the aspiration levels d are such that di + dj ≥ αij for all i, j.
We can suppose that there exists a single agent i with di > 0 and di + dj > αij
for all j, else we are done. With positive probability, i is activated. Since no
profitable match exists, he lowers his aspiration level by δ. In this manner, a
suitable path can be constructed along which d decreases monotonically un-
til the aspiration levels are good. Note that at the end of such a path, the
assignment does not need to be optimal and not every agent with a positive as-
piration level needs to be matched. (See the period-t example in the preceding
section.)
Proof of Claim 2.
Suppose that the state (A, d) satisfies Claim 1 (d is good) and that some
single exists whose aspiration level is positive. (If no such single exists, the
assignment is optimal and we have reached a core state.) Starting at any such
state, we show that, within a bounded number of periods and with positive
probability (bounded below), one of the following holds:
The aspiration levels are good, the number of single agents with positive aspiration level decreases, and the sum of the aspiration levels remains constant.   (4.16)
The aspiration levels are good, the sum of the aspiration levels decreases by δ > 0, and the number of single agents with a positive aspiration level does not increase.   (4.17)
In general, say an edge is tight if di + dj = αij and loose if di + dj = αij −
δ. Define a maximal alternating path P to be a path that starts at a single
player with positive aspiration level, and that alternates between unmatched
tight edges and matched tight edges such that it cannot be extended (hence
maximal). Note that, for every single with a positive aspiration level, at least
one maximal alternating path exists. Figure 3 (left panel) illustrates a maximal
alternating path starting at f1 . Unmatched tight edges are indicated by dashed
lines, matched tight edges by solid lines and loose edges by dotted lines.
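One such path can be traced as in the sketch below (illustrative names; the proof may require all maximal alternating paths starting at the single player, not just one of them):

    def maximal_alternating_path(start, workers, d, alpha, match):
        """From a single firm `start` with positive aspiration level, alternate
        between an unmatched tight edge (d_i + d_j = alpha[i][j]) to a worker and
        that worker's matched edge back to a firm, until no extension exists.
        `match` maps every matched agent to his partner (in both directions)."""
        path, visited = [start], {start}
        firm = start
        while True:
            tight = [w for w in workers
                     if w not in visited and match.get(firm) != w
                     and d[firm] + d[w] == alpha[firm][w]]
            if not tight:
                return path                      # no unmatched tight edge left
            worker = tight[0]
            path.append(worker); visited.add(worker)
            partner = match.get(worker)
            if partner is None or partner in visited:
                return path                      # the path ends at a single worker
            path.append(partner); visited.add(partner)
            firm = partner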
Without loss of generality, let f1 be a single firm with positive aspiration level.
Case 1a. All firms on the path have a positive aspiration level.
Let P = (f1 , w1 , f2 , w2 , ..., wk−1 , fk , wk ). Note that, since the path is maximal
and of odd length, wk must be single. With positive probability, f1 is activated.
Since no profitable match exists, he lowers his aspiration level by δ. With
positive probability, f1 is activated again next period, he snags w1 and with
positive probability he receives the residual δ. At this point the aspiration
levels are unchanged but f2 is now single. With positive probability, f2 is
activated. Since no profitable match exists, he lowers his aspiration level by δ.
With positive probability, f2 is activated again next period, he snags w2 and
with positive probability he receives the residual δ. Within a finite number
of periods a state is reached where all players on P are matched and the
aspiration levels are as before. (Note that fk is matched with wk without a
previous reduction by fk since wk is single and thus their bids are profitable.)
In summary, the number of matched agents has increased by two and the num-
ber of single agents with positive aspiration level has decreased by at least one.
The aspiration levels did not change, hence they are still good.
Case 1b. At least one firm on the path has aspiration level zero.
In summary, the number of single agents with a positive aspiration level has
decreased by one because f1 is no longer single and the new single agent fi has
aspiration level zero. The aspiration levels did not change, hence they are still
good.
Case 2a. All firms on all maximal alternating paths starting at f1 have a
positive aspiration level.
f1′ receives the residual δ. (Such a transition occurs with strictly positive probability whether or not f1′ is matched because aspiration levels are strictly below the match value of (w1′, f1′).) Note that f2′ and possibly f1′'s previous partner, say w1″, are now single. With positive probability f2′ is activated. Since no profitable match exists, he lowers his aspiration level by δ. (This occurs because all firms on any maximal alternating path starting at f1 have an aspiration level of at least δ.) With positive probability, f2′ is activated again, snags w1′, and with positive probability w1′ receives the residual δ. Finally, with positive probability f1′ is activated. Since no profitable match exists, he lowers his aspiration level by δ. If previously matched, f1′ is activated again in the next period and matches with the single w1″ (note that there is no additional surplus to be split). At the end of this sequence the matching is the same as at the beginning. Moreover, w1′'s aspiration level went up by δ while f2′'s aspiration level went down by δ and all other aspiration levels stayed the same. The originally loose edge between f1′ and w1′ is now tight.
We iterate the latter construction for f1′ = f1 until all loose edges at f1′ have been eliminated. However, given f2′'s reduction by δ there may be new loose edges connecting f2′ to workers (possibly on several alternating paths). In this case we repeat the preceding construction for f2′ until all of the loose edges at f2′ have been eliminated. If any agents still exist with loose edges we repeat the construction again. This iteration eventually terminates given the following observation. Any worker on a maximal alternating path who previously increased his aspiration level cannot still be connected to a firm by a loose edge. Similarly, any firm that previously reduced its aspiration level cannot now be matched to a worker with a loose edge because such a worker increased his aspiration level. Therefore the preceding construction involves any given firm (or worker) at most once. It follows that, in a finite number of periods, all firms on any maximal alternating path starting at f1 have reduced their aspiration level by δ and all workers have increased their aspiration level by δ. (Again, note that it is necessary to use this construction on all maximal alternating paths starting at f1.)
Note that the δ-reductions may lead to new tight edges, resulting in new
maximal alternating paths of odd or even lengths.
Case 2b. At least one firm on a maximal alternating path starting at f1 has aspiration level zero.
Let P = (f1 , w1 , f2 , w2 , ..., wk−1 , fk ) be a maximal alternating path such that
a firm has aspiration level zero. There exists a firm fi ∈ P with current
aspiration level zero (f2 in the illustration), hence no further reduction by
fi can occur. (If multiple firms on P have aspiration level zero, let fi be
the first such firm on the path.) With positive probability f1 is activated.
Since no profitable match exists, he lowers his aspiration level by δ. With
positive probability, f1 is activated again next period, he snags w1 and with
positive probability he receives the residual δ. Now f2 is single. With positive
probability f2 is activated, lowers, snags w2 , and so forth. This sequence
continues until fi is reached, who is now single with aspiration level zero.
In summary, the number of single agents with a positive aspiration level has
decreased. The aspiration levels did not change, hence they are still good.
Let us summarize the argument. Starting in a state [A, d] with good aspira-
tion levels d, we successively (if any exist) eliminate the odd paths starting
at firms/workers followed by the even paths starting at firms/workers, while
maintaining good aspiration levels. This process must come to an end because
at each iteration either the sum of aspiration levels decreases by δ and the
number of single agents with positive aspiration levels stays fixed, or the sum
of aspiration levels stays fixed and the number of single agents with positive
aspiration levels decreases. The resulting state must be in the core and is ab-
sorbing because single agents cannot reduce their aspiration level further and
no new matches can be formed. Since an aspiration level constitutes a lower
bound on a player’s bids we can conclude that the process Z t is absorbed into
the core in finite time with probability 1. Finally note that, starting from
d0 = 0 we can trivially reach any state in C0 .
ε^k · (1 − ε) for all k ∈ N_0.¹⁷ Note that for ε = 0 the process is unperturbed.
The immediate result of a given shock is that players receive a different payoff
than anticipated. We shall assume that players update their aspiration levels
to their new perturbed payoff if positive and zero if negative. If, in a given
match, one of the players experiences a negative payoff the match breaks and
both players become single. Note that if the partnership remains matched the
price does not change.
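Read this way (the start of the sentence above is not visible in the extracted text, so the interpretation of ε^k(1 − ε) as the probability of a shock of size k is an assumption), a shock size can be sampled as follows:

    import random

    def shock_size(eps, rng=random):
        """Draw k with P(k) = eps**k * (1 - eps), k = 0, 1, 2, ...;
        k = 0 corresponds to no shock."""
        k = 0
        while rng.random() < eps:
            k += 1
        return k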
a path of one-period transitions from Z to Z′. Then a least cost transition minimizes Σ_{l=0}^{k−1} r(Z_l, Z_{l+1}) over all such paths. For a core state Z ∈ C we shall say that a transition out of the core is a least cost transition if it minimizes the resistance among all transitions from Z to any non-core state.
Young, 1993 shows that the computation of the stochastically stable states can be reduced to an analysis of rooted trees on the set of recurrent classes of the unperturbed dynamic. Define the resistance between two recurrent classes Z and Z′, r(Z, Z′), to be the sum of resistances of a least cost transition that starts in Z and ends in Z′. Now identify the recurrent classes with the nodes of a graph. Given a node Z, a collection of directed edges T forms a Z-tree if from every node Z′ ≠ Z there exists a unique outgoing edge in T, Z has no outgoing edge, and the graph has no cycles.
Theorem 4 in Young, 1993 states that the stochastically stable states are precisely those whose stochastic potential ρ (the minimal total resistance over all Z-trees rooted at the state) is minimized.
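For a small number of recurrent classes the stochastic potentials can be computed by brute force directly from this definition; the sketch below (illustrative, not from the thesis) enumerates all Z-trees for a given resistance matrix:

    from itertools import product

    def stochastic_potentials(r):
        """r[i][j]: resistance of a least-cost transition from recurrent class i
        to class j. Returns, for each class, the minimal total resistance over
        all trees rooted at that class (its stochastic potential)."""
        n = len(r)
        best = [float("inf")] * n
        for root in range(n):
            others = [i for i in range(n) if i != root]
            for parents in product(range(n), repeat=len(others)):
                if any(p == i for i, p in zip(others, parents)):
                    continue                     # no self-loops
                ok = True
                for i in others:                 # every node must reach the root
                    seen, cur = set(), i
                    while cur != root:
                        if cur in seen:
                            ok = False
                            break
                        seen.add(cur)
                        cur = parents[others.index(cur)]
                    if not ok:
                        break
                if ok:
                    cost = sum(r[i][p] for i, p in zip(others, parents))
                    best[root] = min(best[root], cost)
        return best

    # stochastic_potentials([[0, 3, 4], [1, 0, 2], [1, 5, 0]]) == [2, 4, 5]:
    # class 0 minimizes the potential and would be the stochastically stable one.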
4.6.2 Analysis
With this machinery at hand we shall show that the stochastically stable states
are contained in the least core. To establish this result we shall adapt a proof
technique due to Newton and Sawa, 2013 and show that the least core is the
set of states which is most stable against one-shot deviations. We shall also
provide conditions on the game under which the stochastically stable set is
identical with the least core.
Recall that the least core consists of states that maximize the following term:
Case A holds if the minimal cost deviation is such that two players who are
currently not matched experience shocks such that they become profitable.
Case B, on the other hand, is the case where the minimal cost deviation is
such that a matched agent experiences a shock that leads to a negative payoff
and thus to him breaking up his relationship.
D(Z, Z*) = Σ_{i∈F∪W} |φ_i − φ*_i|.   (4.22)
Case A. Suppose that the least-cost transition to a non-core state is such that
two (currently not matched) players experience trembles such that their match
becomes profitable. That is, there exists i, matched to j, and a nonempty
set J′ such that (i, j′) is least costly to destabilize for any j′ ∈ J′. Note that d_i + d_{j′} − α_{ij′} is minimal for all j′ ∈ J′ and thus constant, and also the difference is non-negative since we are in a core state.
Case A.1. d_i > d*_i.
Now we shall explain the sequence in detail. Suppose the tremble occurs such that i reduces his aspiration level by at least δ and i and j′ match at a price such that i's aspiration level does not increase. Consequently j and i′ (j′'s former partner, if he is matched in the core assignment) are now single. In the following period i and j are profitable. With positive probability, they match at a price such that d_i decreases by δ. Now i′ and j′ are both single. With positive probability they reduce their aspiration levels and rematch at their previous price, returning to their original aspiration levels. Thus, with positive probability the prices are set such that d_i decreases by δ, d_j increases by δ, and all other aspiration levels do not change. Hence D has decreased and, given the earlier observation, the resulting state is again in the core, since now d_i + d_{j′} ≥ α_{ij′} for all j′ and all other inequalities still hold.
costly to destabilize, we would have Z ∈ L. But then j′ must be matched in the core assignment and we have for j′'s partner i′ that d_{i′} > d*_{i′}. Hence an aspiration level reduction by δ by i′ and a δ-increase by j′ and all j″ for whom d_{i′} + d_{j″} = α_{i′j″} yields a reduction in D and leads to a core state.
We have d_j > d*_j and again a similar argument applies. An aspiration level reduction by δ by j and a δ-increase by i and all i′ for whom d_{i′} + d_j = α_{i′j} yields a reduction in D and leads to a core state.
Case B. Suppose that the least cost deviation to a non-core state is such that
one player experiences a shock and therefore wishes to break up. That is, there
exists i such that di is least costly to destabilize.
Now we shall explain the sequence in detail. Suppose the tremble occurs such that i turns single. Consequently j is now single too and, given that we are in a core state, (i′, j) is not profitable for any i′ ≠ i. Therefore if j encounters any i′ ≠ i he will reduce his aspiration level. Now i can rematch with his optimal match j at a new price such that i increases his aspiration level by δ while j decreases his by δ. (Note that, for the latter transition, it is crucial that any matched couple has match value at least δ.) Hence D has decreased and, given the earlier observation, the resulting state is again in the core, since now d_{i′} + d_j ≥ α_{i′j} for all i′ and all other inequalities still hold.
Case B.2. There exists I′ ≠ ∅ with i ∉ I′ such that for all i′ ∈ I′, d_{i′} + d_j = α_{i′j}.
Similarly to case B.1 we can construct a sequence such that i increases his aspiration level by δ, j reduces his by δ, and all i′ ∈ I′ increase their aspiration levels by δ (which only reduces D further). The resulting state is in the core.
Z ∗ → Z1 → Z2 → . . . → Zk = Z ∗∗ (4.23)
We can formulate natural conditions under which the stochastically stable set
coincides with the least core:
Well-connected. An assignment game is well-connected if for any non-core state and for any player i ∈ F ∪ W there exists a sequence of transitions in the unperturbed process such that i is single at its end.
Rich. An assignment game with match values α is rich if for every player i ∈ F there exists a player j ∈ W such that (i, j) is never profitable, that is, α_{ij} = 0.
By richness i_l and j_l reduce their aspiration levels to zero with positive probability. Suppose that, by well-connectedness, i_l can match with j_{l+1} and therefore make i_{l+1} single. Suppose that the price is set such that i_l keeps aspiration level 0. Then in the next period i_l and j_l can match at any price. Suppose they match such that d_i ≤ d**_i and d_j ≤ d**_j. (This can actually occur since otherwise the aspiration level vector d** would not be pairwise stable, contradicting that Z** is a core state. Also note that (i_l, j_l) are profitable since both have aspiration level zero and α_{i_l j_l} > 0 given they are on a well-connected sequence.) Now i_{l+1} and j_{l+1} are both single. Thus we can apply the same argument to any two subsequent couples in the sequence to conclude that any couple in the sequence can rematch at any aspiration levels such that, for all i along the sequence, d_i ≤ d**_i.
Next we have to show how Z** is reached from the latter state. Successively match all (i, j) who are matched in Z**, who are not matched yet, and for whom d_i + d_j < α_{ij} (they are profitable) at a price such that their new aspiration levels are d**_i, d**_j (note that it is here that we need the fact that the core matching is unique). This leads to a state where aspiration levels are at d**. Note that these aspiration levels are good. Further note that a reduction of the sum of aspiration levels would lead to a state which is not good. Cases 1a, 1b and 2b of the proof of Claim 2 of Theorem 1 can now be applied iteratively (Case 2a cannot hold, since otherwise the aspiration levels would no longer be good). These cases do not change the aspiration levels but only the matchings. Hence eventually the desired core state Z** is reached.
We showed that once the process is in a non-core state any core state can be
reached. Hence the analysis of stochastic stability reduces to the resistance of
exiting a core state. But this resistance is uniquely maximized by the states
in the least core which thus coincides with the set of stochastically stable
states.
4.6.3 Example
We shall illustrate the predictive power of our result for the 3 × 3 game studied
by Shapley and Shubik, 1972. Let three sellers (w1 , w2 , w3 ) and three buyers
(f1 , f2 , f3 ) trade houses. Their valuations are as follows:
These prices lead to the following match values, αij (units of 1,000), where
sellers are occupying rows and buyers columns:
    α =  5  8  2
         7  9  6      (4.25)
         2  3  0
The unique optimal matching pairs w1 with f2, w2 with f3, and w3 with f1 (the entries 8, 6, and 2 in (4.25)). Shapley and Shubik, 1972 note that it suffices to consider the 3-dimensional imputation space spanned by the equations d_{w1} + d_{f2} = 8, d_{w2} + d_{f3} = 6, d_{w3} + d_{f1} = 2. Figure 4.3 shows the possible core allocations.
We shall now consider the least core, L. Note that the particular states in
L depend on the step size δ. We shall consider δ → 0 to best illustrate the
core selection. By an easy calculation one finds that the states which are least
vulnerable to one-period deviations are such that
The minimal excess in the least core is e_min = 1/3. The bold line in Figure 4.4 shows the set L. The nucleolus, d_{w1} = 4, d_{w2} = 17/3, d_{w3} = 1/3, is indicated by a cross. (One can verify that here the kernel coincides with the nucleolus.)
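This value can be checked with a small linear program: fix the optimal matching, require every unmatched seller-buyer pair to have excess at least t, require every payoff to be at least t (the truncation in (4.6)), and maximize t. The sketch below uses scipy.optimize.linprog (it is an illustration, not code from the thesis) and recovers the minimal excess of 1/3:

    import numpy as np
    from scipy.optimize import linprog

    alpha = np.array([[5., 8., 2.],
                      [7., 9., 6.],
                      [2., 3., 0.]])         # sellers w1-w3 in rows, buyers f1-f3 in columns
    matching = {0: 1, 1: 2, 2: 0}            # w1-f2, w2-f3, w3-f1

    c = np.zeros(7); c[6] = -1.0             # variables (d_w1..d_w3, d_f1..d_f3, t); maximize t
    A_eq, b_eq, A_ub, b_ub = [], [], [], []
    for i, j in matching.items():            # matched pairs split the full match value
        row = np.zeros(7); row[i] = 1; row[3 + j] = 1
        A_eq.append(row); b_eq.append(alpha[i, j])
    for i in range(3):
        for j in range(3):
            if matching[i] != j:             # d_wi + d_fj >= alpha_ij + t for unmatched pairs
                row = np.zeros(7); row[i] = -1; row[3 + j] = -1; row[6] = 1
                A_ub.append(row); b_ub.append(-alpha[i, j])
    for k in range(6):                       # every payoff at least t
        row = np.zeros(7); row[k] = -1; row[6] = 1
        A_ub.append(row); b_ub.append(0.0)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * 6 + [(None, None)])
    print(-res.fun)                          # 0.333...: the minimal excess in the least core

The payoff vector returned by the solver is some point of the least core, not necessarily the nucleolus.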
4.7 Conclusion
In this paper we have shown that agents in large decentralized matching mar-
kets can learn to play equitable core outcomes through simple trial-and-error
learning rules. We assume that agents have no information about the distri-
bution of others’ preferences, their past actions and payoffs, or the value of
different matches. The unperturbed process leads to the core with probability
one but no authority ‘solves’ an optimization problem. Rather, a path into the
core is discovered in finite time by a random sequence of adjustments by the
agents themselves. This result is similar in spirit to that of Chen, Fujishige,
and Yang, 2011, but in addition our process selects equitable outcomes within
the core. In particular, the stochastically stable states of the perturbed process
are contained in the least core, a subset of the core that generalizes the nucle-
olus for assignment games. This result complements the stochastic stability
analysis of Newton and Sawa, 2013 in ordinal matching and of Newton, 2012
in more general coalitional games. It is an open problem to extend the analysis
to more general classes of cooperative games and matching markets.
References
Demange, G., D. Gale, and M. Sotomayor (1986). “Multi-item auctions”. In:
Journal of Political Economy 94, pp. 863–872.
Diamantoudi, E., L. Xue, and E. Miyagawa (2004). “Random paths to stability
in the roommate problem”. In: Games and Economic Behavior 48.1, pp. 18–
28.
Driessen, T. S. H. (1998). “A note on the inclusion of the kernel in the core for
the bilateral assignment game”. In: International Journal of Game Theory
27.2, pp. 301–303.
– (1999). “Pairwise-Bargained consistency and game theory: the case of a two-
sided firm”. In: Fields Institute Communications 23, pp. 65–82.
Estes, W. (1950). “Towards a statistical theory of learning”. In: Psychological
Review 57, pp. 94–107.
Foster, D. and H. P. Young (1990). “Stochastic evolutionary game dynamics”.
In: Theoretical Population Biology 38, pp. 219–232.
Foster, D. P. and H. P. Young (2006). “Learning to play Nash equilibrium with-
out knowing you have an opponent”. In: Theoretical Economics 1, pp. 341–
367.
Gale, D. and L. S. Shapley (1962). “College admissions and stability of mar-
riage”. In: American Mathematical Monthly 69, pp. 9–15.
Germano, F. and G. Lugosi (2007). “Global Nash convergence of Foster and
Young’s regret testing”. In: Games and Economic Behavior 60, pp. 135–154.
Hart, S. and A. Mas-Colell (2003). “Uncoupled Dynamics Do Not Lead to
Nash Equilibrium”. In: American Economic Review 93, pp. 1830–1836.
– (2006). “Stochastic uncoupled dynamics and Nash equilibrium”. In: Games
and Economic Behavior 57, pp. 286–303.
Heckhausen, H. (1955). “Motivationsanalyse der Anspruchsniveau-Setzung”.
In: Psychologische Forschung 25.2, pp. 118–154.
Herrnstein, R. J. (1961). “Relative and absolute strength of response as a func-
tion of frequency of reinforcement”. In: Journal of Experimental Analysis of
Behavior 4, pp. 267–272.
Hoppe, F. (1931). “Erfolg und Mißerfolg”. In: Psychologische Forschung 14,
pp. 1–62.
Huberman, G. (1980). “The nucleolus and the essential coalitions”. In: Analysis
and Optimization of Systems, Lecture Notes in Control and Information
Systems 28, pp. 417–422.
Inarra, E., C. Larrea, and E. Molis (2008). “Random paths to P-stability in the
roommate problem”. In: International Journal of Game Theory 36, pp. 461–
471.
– (2013). “Absorbing sets in roommate problem”. In: Games and Economic
Behavior 81, pp. 165–178.
Jackson, M. O. and A. Watts (2002). “The evolution of social and economic
networks”. In: Journal of Economic Theory 106, pp. 265–295.
Kandori, M., G. J. Mailath, and R. Rob (1993). “Learning, mutation, and long
run equilibria in games”. In: Econometrica 61.1, pp. 29–56.
Kelso, A. S. and V. P. Crawford (1982). “Job Matching, Coalition Formation,
and Gross Substitutes”. In: Econometrica 50, pp. 1483–1504.
Klaus, B. and F. Klijn (2007). “Paths to stability for matching markets with
couples”. In: Games and Economic Behavior 58.1, pp. 154–171.
Klaus, B., F. Klijn, and M. Walzl (2010). “Stochastic stability for roommates
markets”. In: Journal of Economic Theory 145, pp. 2218–2240.
Klaus, B. and F. Payot (2013). Paths to Stability in the Assignment Problem.
working paper.
Kojima, F. and M. U. Uenver (2008). “Random paths to pairwise stability
in many-to-many matching problems: a study on market equilibrium”. In:
International Journal of Game Theory 36, pp. 473–488.
Kuhn, H. W. (1955). “The Hungarian Method for the assignment problem”.
In: Naval Research Logistic Quarterly 2, pp. 83–97.
Llerena, F. and M. Nunez (2011). “A geometric characterization of the nucle-
olus of the assignment game”. In: Economics Bulletin 31.4, pp. 3275–3285.
Llerena, F., M. Nunez, and C. Rafels (2012). An axiomatization of the nucleolus
of the assignment game. Working Papers in Economics 286, Universitat de
Barcelona.
Marden, J. R. et al. (2009). “Payoff-based dynamics for multi-player weakly
acyclic games”. In: SIAM Journal on Control and Optimization 48, pp. 373–
396.
Maschler, M., B. Peleg, and L. S. Shapley (1979). “Geometric properties of
the kernel, nucleolus, and related solution concepts”. In: Mathematics of
Operations Research 4.4, pp. 303–338.
Nash, J. (1950). “The bargaining problem”. In: Econometrica 18, pp. 155–162.
Newton, J. (2010). “Non-cooperative convergence to the core in Nash demand
games without random errors or convexity assumptions”. PhD thesis. Uni-
versity of Cambridge.
– (2012). “Recontracting and stochastic stability in cooperative games”. In:
Journal of Economic Theory 147.1, pp. 364–381.
Newton, J. and R. Sawa (2013). A one-shot deviation principle for stability in
matching markets. Economics Working Papers 2013-09, University of Syd-
ney, School of Economics.
Nunez, M. (2004). “A note on the nucleolus and the kernel of the assignment
game”. In: International Journal of Game Theory 33.1, pp. 55–65.
Pradelski, B. S. R. (2014). Evolutionary dynamics and fast convergence in the
assignment game. Department of Economics Discussion Paper Series 700,
University of Oxford.
Pradelski, B. S. R. and H. P. Young (2012). “Learning Efficient Nash Equilibria
in Distributed Systems”. In: Games and Economic Behavior 75, pp. 882–
897.
Rochford, S. C. (1984). “Symmetrically pairwise-bargained allocations in an
assignment market”. In: Journal of Economic Theory 34.2, pp. 262–281.
Roth, A. E. and I. Erev (1995). “Learning in Extensive-Form Games: Exper-
imental Data and Simple Dynamic Models in the Intermediate Term”. In:
Games and Economic Behavior 8, pp. 164–212.
Roth, A. E. and M. Sotomayor (1992). “Two-sided matching”. In: Handbook
of Game Theory with Economic Applications 1, pp. 485–541.
Roth, A. E. and H. Vande Vate (1990). “Random Paths to Stability in Two-
Sided Matching”. In: Econometrica 58, pp. 1475–1480.
Rozen, K. (2013). “Conflict Leads to Cooperation in Nash Bargaining”. In:
Journal of Economic Behavior and Organization 87, pp. 35–42.
Sauermann, H. and R. Selten (1962). “Anspruchsanpassungstheorie der Un-
ternehmung”. In: Zeitschrift fuer die gesamte Staatswissenschaft 118, pp. 577–
597.
Sawa, R. (2011). Coalitional stochastic stability in games, networks and mar-
kets. Working Paper, Department of Economics, University of Wisconsin-
Madison.
Schmeidler, D. (1969). “The nucleolus of a characteristic function game”. In:
SIAM Journal of Applied Mathematics 17, pp. 1163–1170.
Selten, R. (1998). “Aspiration Adaptation Theory”. In: Journal of Mathemat-
ical Psychology 42.2-3, pp. 191–214.
Selten, R. and R. Stoecker (1986). “End Behavior in Sequences of Finite Pris-
oner’s Dilemma Supergames: A Learning Theory Approach”. In: Journal of
Economic Behavior and Organization 7, pp. 47–70.
Shapley, L. S. and M. Shubik (1963). “The core of an economy with nonconvex
preferences”. In: The Rand Corporation 3518.
– (1966). “Quasi-cores in a monetary economy with nonconvex preferences”.
In: Econometrica 34.4.
– (1972). “The Assignment Game 1: The Core”. In: International Journal of
Game Theory 1.1, pp. 111–130.
Solymosi, T. and T. E. S. Raghavan (1994). “An algorithm for finding the
nucleolus of assignment games”. In: International Journal of Game Theory
23, pp. 119–143.
Sotomayor, M. (2003). “Some further remarks on the core structure of the
assignment game”. In: Mathematical Social Sciences 46, pp. 261–265.
Thorndike, E. (1898). “Animal Intelligence: An Experimental Study of the
Associative Processes in Animals”. In: Psychological Review 8, pp. 1874–
1949.
Tietz, A. and H. J. Weber (1972). “On the nature of the bargaining process
in the Kresko-game”. In: H. Sauermann (Ed.) Contribution to experimental
economics 7, pp. 305–334.
Young, H. P. (1993). “The Evolution of Conventions”. In: Econometrica 61.1,
pp. 57–84.
– (2009). “Learning by trial and error”. In: Games and Economic Behavior
65, pp. 626–643.
Figure 4.1: Transition diagram for active, matched agent (period t + 1): the agent either meets a profitable match or meets no profitable match.
Figure 4.2: Transition diagram for active, single agent (period t + 1): the agent either meets a profitable match or meets no profitable match.
Figure 4.3: Imputation space for the sellers (axes u1, u2, u3).
Chapter 5
Complex cooperation:
Agreements with multiple
spheres of cooperation
Abstract
and Lucas and Thrall (1963) is proposed to allow for multiple member-
ship. The definition of the core is adapted analogously and the possi-
and discussed.
Acknowledgements. Nax thanks Peyton Young, Françoise Forges,
man workshop and at the LSE Choice Group for helpful comments
324247).
5.1 Introduction
The coalitional game as defined by Von Neumann and Morgenstern,
act with one another (e.g., a country might belong to both the United
Nations and the European Union)” (Maskin, 2003). (See Haas, 1980;
viewpoint may turn out to be destabilized (stabilized) by the multi-
definitions of the core (Gillies, 1959; Shapley, 1952) of the von Neumann-
the Lucas-Thrall game (Thrall and Lucas, 1963) in Hafalir, 2007, us-
dareva, 1963; Shapley, 1967). To achieve this, the set of feasible devi-
our lead example but some of our later examples are borrowed and gen-
eralized from theirs. See also Tijs and Brânzei, 2003 on the additivity
and 4, we introduce the general game, define its core, and illustrate the
with mergers and spillovers. (See, for example, Bloch, 1996; Ray and
competition games in this spirit.)
s_f = {s^1_f, . . . , s^m_f}, where each s^k_f is a real number representing firm
Fixed costs of merger. x^k_S(M), the fixed cost of merging S in market k, is
x^k_S(M) = 0 if |S| = 1;
x^k_S(M) = κ if |S| > 1 and there exists k′ ≠ k such that S ∈ ρ_{k′};
x^k_S(M) = λ if |S| > 1 and there does not exist k′ ≠ k such that S ∈ ρ_{k′}.
k, the firms in S select the lowest marginal cost firm to be the only active
c^k_S = min{s^k_f}_{f∈S}.
the marginal cost of the lowest marginal cost firm amongst C in market k′. For all C such that C ∩ S = ∅, c^{k′}_C is unchanged. For all C such that C ∩ S ≠ ∅, given some α ∈ (0, 1),
c^{k′}_C = min{c^{k′}_C; α × c^{k′}_S + (1 − α) × c^{k′}_C}.
The motivation for this marginal cost effect across markets is that firms
others’ production technologies and thus also improve (to some extent)
main unmerged.
p^k = 1 − Q^k, where Q^k ≡ Σ_{f∈N} q^k_f.
three different externality effects on the other firms in the same market
the resulting quantity and price competition will change in that market,
since the merged firms will be represented by the firm with the lowest
will also change in the markets where the merger did not occur because
same merger were to occur in more than one market, the fixed costs of
in not only that same market because both the technology spillovers
and the fixed cost effects may additionally influence the optimization
erative game could not make these effects explicit. We shall illustrate
5/9, s^1_{f2} = 5/9, s^2_{f2} = 4/9 (firm f1 is specialized in market 1 and firm
in none, one, or both of the markets yields four cases: the no merger
case, two one merger cases, and the full merger case. The equilibrium
profits of these four cases are obtained by solving for the firms’ profit-
(1), (2) means “no merger” in the underlying market and writing (1, 2)
means “merger”.)
cross effect is negative on the strong firm in market 2 (profits fall from
4 to 4 − α × 1.2), and positive on the weak firm (profits increase from 1
merging market due to the high direct costs of merger, a partial view
of one market suggests that merger is not in the firms’ interests. When
effects of mergers are internalized. Since the cross effects are net pos-
itive if α > 0.45, these effects would already render a single merger
worthwhile overall.
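The per-market Cournot profits of 4 and 1 used in this illustration can be reproduced from the structure of the example. The sketch below assumes (since the corresponding line is not fully visible in the extracted text) that firm f1's marginal costs are 4/9 in market 1 and 5/9 in market 2, mirroring f2, and that the reported profits are expressed in units of 1/81:

    def cournot_profits(c1, c2):
        """Cournot duopoly with inverse demand p = 1 - Q and constant marginal
        costs: q_i = (1 - 2*c_i + c_j)/3 and profit_i = q_i**2."""
        q1 = (1 - 2 * c1 + c2) / 3
        q2 = (1 - 2 * c2 + c1) / 3
        return q1 ** 2, q2 ** 2

    market1 = cournot_profits(4/9, 5/9)       # (4/81, 1/81): "4" and "1"
    market2 = cournot_profits(5/9, 4/9)       # (1/81, 4/81)
    total_per_firm = market1[0] + market2[0]  # 5/81, i.e. "4 + 1 = 5"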
When no merger takes place, each firm’s profits from both markets
are 4 + 1 = 5 and the total profits are 10. When one merger takes
place, the firms can agree on sharing the total payoffs of (5 + α × 0.5) +
4.75 = 9.75 + α × 0.5. Under full merger, contracts can share the total
the total profits efficiently, paying each player at least 5 (which are the
profits that each firm can guarantee itself from no merger). Whether
Let P(N ) be the set of partitions of N and P(S) the set of partitions
{ρ1 , . . . , ρm }, and MS = {ρ1 (S), . . . , ρm (S)} for a subsociety consisting
layer (when no multiple membership exists). With only one layer, the
5.3.1 Externalities
ities over different parts of the game. This allows us to model interesting
situations like the above Cournot model: merger in one market has
both positive and negative effects on the other firms and on the other
markets.
player Cournot game, for example, where one firm’s payoff varies with
externality.
The other “cross” externality stems from the effects of the formation of
and two in one market affects the profits of firm three in another.
ρ′_k while being identical w.r.t. the coalitions that all members of
S ∈ ρ′_k) with the same S in both), and
tions ρ_{k_i} and ρ′_{k_i} are identical w.r.t. the coalitions that all members
are needed (Von Neumann and Morgenstern, 1944; Aumann, 1967). For
der and Tulkens, 1997; Hart and Kurz, 1983; De Clippel and Serrano,
sults for several of these cores.). Suppose the partition was ρ before
the existing conjecture rules that have been proposed in PFG environ-
ments (see Bloch and Nouweland, 2014 for a detailed classification and
an axiomatic analysis):
1. Max rule (Bloch and Nouweland, 2014): (N \ S), taking ρ(S) as
total worth
ing (C′ \ S)
Note that conjecture rules 1–3 depend on ρ(S) and on the underlying
PFG, but not on the original partition ρ. Rules 4–5 depend only on S.
for example, starting with the grand coalition in some layer, each S ⊂ N
may deviate in many ways: in some or all of the layers, forming different
players may deviate in one layer but continue to form the grand coali-
tion in another layer. When cross externalities are present, however,
the worth of coalitions varies with the coalition constellations across lay-
ers and deviators need to endogenize the cross external effects of their
separable.
optimal subsociety M̂_S such that, given conjecture Z,
Σ_{k∈K} Σ_{C∈ρ̂_k(S)} v_k(C; M̂_S(N)) = max_{M_S ∈ P(S)^m} Σ_{k∈K} Σ_{C∈ρ_k(S)} v_k(C; {M_S, Z((N \ S); M_S)}).
z, summarizes the conjectured worth for all coalitions: given Z,
z(S) = Σ_{k∈K} Σ_{C∈ρ̂_k(S)} v_k(C; M̂_S(N)).
Note that z filters the information in the MMG to obtain a CFG view
of deviating demands.
5.3.3 Superadditivity
one agent is able to free-ride on the coalition formed by others,
for example, the grand coalition may no longer be the efficient coalition
of coalition formation, which takes into account all external and di-
rect effects, is positive for those that come together to cooperate even
layers.
cases when α > 0.45, for instance, is superadditive because the total
profits of the firms rise with every further merger: the no merger case
has total payoffs of 10, compared with the 10 + (α − 0.45) × 0.5 of both
one-merger cases, and compared with the 10.5 of full merger. The definition of MMG superadditivity below embeds definitions of superadditivity for CFGs and PFGs and implies the efficiency of forming the grand coalition in all layers.
Σ_{k∈K} Σ_{C∈ρ_k} v_k(C; M) ≥ Σ_{k∈K} Σ_{C∈ρ′_k} v_k(C; M′).
peradditivity when there is only one layer has also been defined as full
ditivity does not imply the efficiency of the grand coalition.) Note that
gains. To assess its stability, we will use the conjectured worth function.
obtained in all layers. Consequently, for some S ⊆ N , x(S) is a vector
consider full merger with contract x = (5.25, 5.25), paying both firms
deviating. In fact, any split of full merger paying each firm at least his
5.4.1 Core Stability
ζ(G(v, K, N); Z) = {x ∈ R^n : Σ_{f∈N} x_f ≤ z(N) and Σ_{f∈S} x_f ≥ z(S) ∀ S ⊆ N}.
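Membership in the Z-core can be checked directly from this definition once the conjectured worth function z has been computed; a minimal sketch, with z assumed to be given as a dictionary keyed by frozensets of players:

    from itertools import combinations

    def in_z_core(x, z, players):
        """x: dict player -> payoff; z: dict frozenset(S) -> conjectured worth.
        Checks sum_N x_f <= z(N) and sum_S x_f >= z(S) for every S subset of N."""
        if sum(x[p] for p in players) > z[frozenset(players)]:
            return False
        for r in range(1, len(players) + 1):
            for S in combinations(players, r):
                if sum(x[p] for p in S) < z[frozenset(S)]:
                    return False
        return True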
Theorem. The Z-core of G(v, K, N ) is nonempty if, and only if, its
theorem via the conjectured worth function in our setup (see Bondareva,
nonempty.
allocation of forming the grand coalition in that layer exists if, and
balanced, the Z-core of G(v, K, N) is therefore necessarily nonempty
nal variations that may still exist and affect them. The need to
core of forming the grand coalition in any layer of the MMG may
always beneficial.
negative cross externality of coalition formation in one layer on
v1 (2)) + v2 (N ) > 2 = 1 + 1 = v1 (N ) + v2 (N ).
all layers have empty cores, the core of an MMG may be nonempty
vk (1; {ρ1 , ρ2 }) = 2 ∀i if ρ1 = ρ2 = {(1), (2, 3)} and vk (C; M ) = 0
otherwise.
being the singleton in both layers, e.g., x = (2/3, 2/3, 2/3) is such
example, v′ with v′_k(N; {{N}, {N}}) = 1 for all k, v′_k(1; {ρ1, ρ2}) =
decision of one set of agents has payoff consequences for another set
existence of the first externality type (Hafalir, 2007) and of the sec-
how the two externality types may interact with coalitional incentives to
deviate. Moreover, our model highlights one crucial issue with defining
the core in the presence of multiple membership externalities, namely
not expect to form coalitions in any of the layers with any of the play-
this note, and we aim to relax this assumption in future work, likely in
References
Gillies, Donald B (1959). “Solutions to general non-zero-sum games”.
In: Contributions to the Theory of Games. Ed. by Albert William
Tucker and Robert Duncan Luce. Vol. 4. 40. Princeton, NJ:
Princeton University Press, pp. 47–85.
Haas, Ernst B. (1980). “Why Collaborate? Issue-Linkage and In-
ternational Regimes”. In: World Politics 32.3, pp. 357–405.
Hafalir, Isa E (2007). “Efficiency in coalition games with external-
ities”. In: Games and Economic Behavior 61.2, pp. 242–258.
Hart, Sergiu and Mordecai Kurz (1983). “Endogenous Formation
of Coalitions”. In: Econometrica 51.4, pp. 1047–1064.
Le Breton, Michel et al. (2013). “Stability and fairness in models
with a multiple membership”. In: International Journal of Game
Theory 42.3, pp. 673–694.
Maskin, Eric (2003). “Bargaining, coalitions and externalities”. In:
Presidential Address to the Econometric Society. Princeton, NJ:
Institute for Advanced Study.
Ray, Debraj and Rajiv Vohra (1999). “A Theory of Endogenous
Coalition Structures”. In: Games and Economic Behavior 26.2,
pp. 286–336.
Shapley, Lloyd S (1952). “Notes on the n-person game, III: Some
variants of the von Neumann-Morgenstern definition of solu-
tion”. In: Rand Memorandum. Santa Monica, CA: Rand Cor-
poration.
– (1953). “A value for n-person games”. In: Contributions to the
Theory of Games. Ed. by Harold W Kuhn and Albert W Tucker.
Vol. 2. Princeton, NJ: Princeton University Press, pp. 307–317.
– (1967). “On balanced sets and cores”. In: Naval Research Logis-
tics 14.4, pp. 453–460.
Shenoy, Prakash P (1979). “On coalition formation: a game-theoretical
approach”. In: International Journal of Game Theory 8.3, pp. 133–
164.
Thrall, Robert M and William F Lucas (1963). “N-person games
in partition function form”. In: Naval Research Logistics 10.1,
pp. 281–298.
Tijs, Stef and Rodica Brânzei (2003). “Additive stable solutions on
perfect cones of cooperative games”. In: International Journal of
Game Theory 31.3, pp. 469–474.
Von Neumann, John and Oskar Morgenstern (1944). Theory of
Games and Economic Behavior. Princeton, NJ: Princeton Uni-
versity Press.
Table 5.1: Numerical illustration.
Chapter 6
Dynamics of financial
expectations:
Super-exponential growth
expectations and crises
Abstract
500 options data over the decade 2003 to 2013, separable into pre-crisis,
to Treasury Bill yields in the pre-crisis regime with a lag of a few days,
and the other way round during the post-crisis regime with much longer
lags (50 to 200 days). This suggests a transition from an abnormal regime
6.1 Introduction
which is in agreement with the timeline given by the Federal Reserve Bank of
St. Louis, 2009.2
The resulting pre-crisis, crisis and post-crisis regimes differ from each other
in several important aspects. First, during the pre-crisis period, but not in
the crisis and post-crisis periods, we identify a continuing increase of S&P 500
expected returns. This corresponds to super-exponential growth expectations
of the price. By contrast, regular expectation regimes prevail in the crisis
and post-crisis periods. Second, the difference between realized and option-
implied returns remains roughly constant prior to the crisis but diverges in
the post-crisis phase. This phenomenon may be interpreted as an increase of
the representative investor’s risk aversion. Third, Granger-causality tests show
that changes of option-implied returns Granger-cause changes of Treasury Bill
yields with a lag of a few days in the pre-crisis period, while the reverse is true
at lags of 50 to 200 days in the post-crisis period. This role reversal suggests
that Fed policy was responding to, rather than leading, the financial market
development during the pre-crisis period, but that the economy returned to a
“new normal” regime post-crisis.
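The kind of Granger-causality test referred to here can be run, for example, with statsmodels; the series below are placeholders, not the chapter's data:

    import numpy as np
    from statsmodels.tsa.stattools import grangercausalitytests

    rng = np.random.default_rng(0)
    d_yields = rng.normal(size=500)      # placeholder: changes of T-Bill yields
    d_implied = rng.normal(size=500)     # placeholder: changes of option-implied returns

    # statsmodels tests whether the second column Granger-causes the first,
    # so the dependent variable goes first (the call also prints a summary per lag).
    data = np.column_stack([d_yields, d_implied])
    results = grangercausalitytests(data, maxlag=10)
    pvalues = {lag: res[0]["ssr_ftest"][1] for lag, res in results.items()}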
The majority of related option market studies have used option data for
the evaluation of risk. An early contribution to this strand of work is Aït-Sahalia and Lo, 2000, who proposed a nonparametric risk management ap-
proach based on a value at risk computation with option-implied state-price
densities. Another popular measure of option-implied volatility is the Volatil-
ity Index (VIX), which is constructed out of options on the S&P 500 stock
index and is meant to represent the market’s expectation of stock market
volatility over the next 30 days (Exchange, 2009). Bollerslev and Todorov,
2011 extended the VIX framework to an “investor fears index” by estimating jump tail risk for the left and right tail separately. Bali, Cakici, and Chabi-Yo, 2011 define a general option-implied measure of riskiness taking into account an investor's utility and wealth leading to asset allocation implications. What sets our work apart is the focus on identifying the long and often slow build-up of risk during an irrationally exuberant market that typically precedes a crisis.
² See section 6.3.2 for more details on market and policy events marking the Global Financial Crisis of 2008.
Inverting the same logic, scholars have used option price data to estimate
the risk attitude of the representative investor as well as its changes. These
studies, however, typically impose stationarity in one way or another. Jackw-
erth, 2000, for example, empirically derives risk aversion functions from option
prices and realized returns on the S&P 500 index around the crash of 1987 by
assuming a constant return probability distribution. In a similar way, Rosen-
berg and Engle, 2002 analyze the S&P 500 over four years in the early 1990s
by fitting a stochastic volatility model with constant parameters. Bliss and
Panigirtzoglou, 2004, working with data for the FTSE 100 and S&P 500, pro-
pose another approach that assumes stationarity in the risk aversion functions.
Whereas imposing stationarity is already questionable in “normal” times, it is
certainly hard to justify for a time period covering markedly different regimes, such as the period around the Global Financial Crisis of 2008. We therefore proceed differ-
ently and merely relate return expectations implicit in option prices to market
developments, in particular to the S&P 500 stock index and yields on Trea-
sury Bills. We use the resulting data trends explicitly to identify the pre-crisis
exuberance in the trends of market expectations and to make comparative
statements about changing risk attitudes in the market.
The importance of market expectation trends has not escaped the attention
of many researchers who focus on ‘bubbles’ (Galbraith, 2009; Sornette, 2003;
Shiller, 2005; Soros, 2009; Kindleberger and Aliber, 2011). One of us summa-
rizes their role as follows: “In a given financial bubble, it is the expectation of
future earnings rather than present economic reality that motivates the aver-
age investor. History provides many examples of bubbles driven by unrealistic
expectations of future earnings followed by crashes” (Sornette, 2014). While
there is an enormous econometric literature on attempts to test whether a mar-
ket is in a bubble or not, to our knowledge our approach is the first trying to
do so by measuring and evaluating the market’s expectations directly.3
This paper is structured as follows. Section 2 details the estimation of the risk-
neutral return probability distributions, the identification of regime change
points, and the causality tests regarding market returns and expectations.
Section 3 summarizes our findings, in particular the evidence concerning pre-
crisis growth of expected returns resulting in super-exponential price growth.
Section 4 concludes with a discussion of our findings.
K can therefore be expressed as

$C_0(K) = e^{-r_f T}\, \mathbb{E}^{Q}_{0}\!\left[\max(S_T - K,\, 0)\right] = e^{-r_f T} \int_{K}^{\infty} (S_T - K)\, f(S_T)\, dS_T, \qquad (6.1)$
where rf is the risk-free rate and T the time to maturity. From this equation,
we would like to extract the density f (ST ), as it reflects the representative
investor’s expectation of the future price under risk-neutrality. Since all quan-
tities but the density are observable, inverting equation (6.1) for f (ST ) becomes
a numerical task.
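For intuition on this inversion, note that differentiating (6.1) twice with respect to the strike gives $f(K) = e^{r_f T}\, \partial^2 C_0 / \partial K^2$ (the Breeden–Litzenberger relation). The following minimal Python sketch illustrates only this simplified route on a hypothetical, already-smoothed strike grid; it is not the Figlewski, 2010 procedure adopted below (no smoothing in implied-volatility space, no tail completion), and Black–Scholes prices merely stand in for market quotes.

```python
import numpy as np
from scipy.stats import norm

def rnd_from_calls(strikes, call_prices, r_f, T):
    """Crude risk-neutral density via the Breeden-Litzenberger relation:
    f(K) ~= exp(r_f * T) * d^2 C / dK^2, estimated with finite differences.
    Assumes already-smoothed call prices on a fine strike grid."""
    strikes = np.asarray(strikes, dtype=float)
    calls = np.asarray(call_prices, dtype=float)
    d2c = np.gradient(np.gradient(calls, strikes), strikes)  # second derivative in K
    density = np.exp(r_f * T) * d2c
    return np.clip(density, 0.0, None)  # small negative values are numerical noise

# Hypothetical inputs for illustration only: Black-Scholes call prices
K = np.linspace(800.0, 1600.0, 81)
S0, r_f, T, sigma = 1200.0, 0.02, 0.25, 0.2
d1 = (np.log(S0 / K) + (r_f + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
d2 = d1 - sigma * np.sqrt(T)
C = S0 * norm.cdf(d1) - K * np.exp(-r_f * T) * norm.cdf(d2)

f = rnd_from_calls(K, C, r_f, T)
mass = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(K))  # trapezoidal rule
print("approx. probability mass on the strike grid:", mass)
```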
Several methods for inverting have been proposed, of which Jackwerth, 2004
provides an excellent review. In this study, we employ a method by Figlewski,
2010 that is essentially model-free and combines standard smoothing tech-
niques in implied-volatility space and a new method of completing the density
with appropriate tails. Tails are added using the theory of Generalized Ex-
treme Value distributions, which are capable of characterizing very different
behaviors of extreme events.4 This method cleverly combines mid-prices of
call and put options by only taking into account data from at-the-money and
out-of-the-money regions, thus recovering non-standard features of risk-neutral
densities such as bimodality, fat tails, and general asymmetry.
Our analysis covers fundamentally different market regimes around the Global
Financial Crisis. A largely nonparametric approach, rather than a parametric
one, seems therefore appropriate, because an important question that we shall
ask is whether and how distributions actually changed from one regime to
the next. We follow Figlewski’s method in most steps, and additionally weight
points by open interest when interpolating in implied-volatility space – a proxy
4 As Birru and Figlewski, 2012 note, the theoretically correct extreme value distribution class is the Generalized Pareto Distribution (GPD) because estimating beyond the range of observable strikes corresponds to the peak-over-threshold method. For our purposes, both approaches are known to lead to equivalent results.
of the information content of individual sampling points permitted by our data.
We give a more detailed review of the method in appendix 6.4.
6.2.2 Data
We use end-of-day data for standard European call and put options on the
S&P 500 stock index provided by Stricknet5 for a period from January 1st,
2003 to October 23rd, 2013. The raw data includes bid and ask quotes as
well as open interest across various maturities. For this study, we focus on
option contracts with quarterly expiration dates, which usually fall on the
Saturday following the third Friday in March, June, September and December,
respectively. Closing prices of the index, dividend yields and interest rates of
the 3-month Treasury Bill as a proxy of the risk-free rate are extracted from
Thomson Reuters Datastream.
6.2.3 Subperiod classification
As the Global Financial Crisis had a profound and lasting impact on option-
implied quantities, it is informative for the sake of comparison to perform analyses on subperiods associated with regimes classifiable as pre-crisis, crisis
and post-crisis. Rather than defining the relevant subperiods with historical
dates, we follow an endogenous segmentation approach for identifying changes
in the statistical properties of the risk-neutral densities. Let us assume we
have an ordered sequence of data x1:n = (x1 , x2 , . . . , xn ) of length n, e.g. daily
values of a moment or tail shape parameter of the risk-neutral densities over
n days. A change point occurs if there exists a time 1 ≤ k < n such that
the mean of set {x1 , . . . , xk } is statistically different from the mean of set
{xk+1 , . . . , xn } (Killick, Fearnhead, and Eckley, 2012). As a sequence of data
may also have multiple change points, various frameworks to search for them
have been developed. The binary segmentation algorithm by Scott and Knott,
1974 is arguably the most established detection method of this kind. It starts
by identifying a single change point in a data sequence, proceeds iteratively on
the two segments before and after the detected change and stops if no further
change point is found.
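For concreteness, the following minimal sketch implements binary segmentation for changes in the mean: it locates the split that maximizes a two-sample t-statistic, accepts it if the statistic exceeds a user-chosen threshold, and recurses on the two resulting segments. It is a simplified stand-in for the procedure of Scott and Knott, 1974 and for the significance test reported in the appendix; the threshold and minimum segment size are illustrative choices.

```python
import numpy as np

def best_split(x):
    """Return (k, t_stat) for the split that maximizes the absolute
    two-sample t-statistic between x[:k] and x[k:]."""
    n = len(x)
    best_k, best_t = None, 0.0
    for k in range(2, n - 1):  # keep at least two points per segment
        a, b = x[:k], x[k:]
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        if se == 0:
            continue
        t = abs(a.mean() - b.mean()) / se
        if t > best_t:
            best_k, best_t = k, t
    return best_k, best_t

def binary_segmentation(x, threshold=5.0, offset=0, min_size=30):
    """Recursively detect mean change points; returns sorted global indices."""
    x = np.asarray(x, dtype=float)
    if len(x) < 2 * min_size:
        return []
    k, t = best_split(x)
    if k is None or t < threshold:
        return []
    left = binary_segmentation(x[:k], threshold, offset, min_size)
    right = binary_segmentation(x[k:], threshold, offset + k, min_size)
    return sorted(left + [offset + k] + right)

# Toy example: a shift in the mean halfway through the sample
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(1.0, 1.0, 500)])
print(binary_segmentation(x))  # expected to report a change point near index 500
```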
formulation of the test statistic in appendix 6.4.7
$X_t = \sum_{j=1}^{m} a_j X_{t-j} + \varepsilon_t, \qquad (6.2)$

$X_t = \sum_{j=1}^{m} b_j X_{t-j} + \sum_{j=1}^{m} c_j Y_{t-j} + \nu_t, \qquad (6.3)$
subperiods as described in section 6.2.3.
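To make the test operational, the sketch below estimates the restricted model (6.2) and the augmented model (6.3) by ordinary least squares and compares them with an F-test, the standard implementation of a Granger-causality test; the lag length and the simulated series are placeholders rather than the chapter's actual data and lag choices.

```python
import numpy as np
from scipy.stats import f as f_dist

def granger_f_test(x, y, m):
    """F-test of 'Y Granger-causes X': compare the restricted model (6.2),
    which regresses X_t on its own m lags, with the augmented model (6.3),
    which adds m lags of Y. Both are estimated by ordinary least squares."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    X_lags = np.array([x[t - m:t][::-1] for t in range(m, n)])  # x_{t-1}, ..., x_{t-m}
    Y_lags = np.array([y[t - m:t][::-1] for t in range(m, n)])  # y_{t-1}, ..., y_{t-m}
    target = x[m:]

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return resid @ resid

    rss_restricted = rss(X_lags)                          # model (6.2)
    rss_unrestricted = rss(np.hstack([X_lags, Y_lags]))   # model (6.3)
    df1, df2 = m, len(target) - 2 * m
    F = ((rss_restricted - rss_unrestricted) / df1) / (rss_unrestricted / df2)
    return F, 1.0 - f_dist.cdf(F, df1, df2)

# Toy example in which y leads x by one period, so the null should be rejected
rng = np.random.default_rng(1)
y = rng.standard_normal(1000)
x = 0.5 * np.roll(y, 1) + 0.1 * rng.standard_normal(1000)
print(granger_f_test(x[1:], y[1:], m=5))
```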
6.3 Results
We start by analyzing the moments and tail shape parameters of the option-
implied risk-neutral densities over the whole period (see Figure 6.1). For com-
parability, we rescale the price densities by the S&P 500 index level St , i.e.
assess f (ST /St ) instead of f (ST ).8 In general, we recover similar values to the
ones found by Figlewski, 2010 over the period 1996 to 2008. The annualized
option-implied log-returns of the S&P 500 stock index excluding dividends are
defined as
$r_t = \frac{1}{T - t} \int_{0}^{\infty} \log\!\left(\frac{S_T}{S_t}\right) f(S_T)\, dS_T. \qquad (6.4)$
They are on average negative with a mean value of −3%, and exhibit strong
fluctuations with a standard deviation of 4%. This surprising finding may be
explained by the impact of the Global Financial Crisis and by risk aversion
of investors as explained below. The annualized second moment, also called
risk-neutral volatility, is on average 20% (standard deviation of 8%). During
the crisis from June 22nd, 2007 to May 4th, 2009, we observe an increase in
risk-neutral volatility to 29 ± 12%.
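As a minimal numerical illustration of (6.4), the integral can be approximated by the trapezoidal rule on a grid of terminal prices; the lognormal density below is a placeholder assumption, not one of the option-implied densities estimated in this chapter.

```python
import numpy as np

def implied_log_return(S_T, f, S_t, tau):
    """Annualized option-implied log-return, eq. (6.4): the risk-neutral
    expectation of log(S_T / S_t), divided by the time to maturity tau = T - t."""
    g = np.log(S_T / S_t) * f
    integral = np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(S_T))  # trapezoidal rule
    return integral / tau

# Toy example: a lognormal density as a stand-in for an estimated risk-neutral density
S_t, tau, mu, sigma = 1200.0, 0.25, 0.01, 0.10          # hypothetical parameters
S_T = np.linspace(600.0, 2400.0, 2001)
z = (np.log(S_T / S_t) - mu) / sigma
f = np.exp(-0.5 * z**2) / (S_T * sigma * np.sqrt(2 * np.pi))
print(implied_log_return(S_T, f, S_t, tau))             # approximately mu / tau = 0.04
```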
crisis. Birru and Figlewski, 2012 find a similar dynamic using intraday prices
for S&P 500 Index options. For the period from September 2006 until October
2007, they report an average skewness of −1.9 and excess kurtosis of 11.9,
whereas from September to November 2008 these quantities change to −0.7
and 3.5, respectively.
European options on the S&P 500 for the period before and after the crash
of October 1987. They observed that the risk-neutral probability of a one-standard-deviation loss is larger after the crash than before, while the reverse is true for losses of several standard deviations. The explanation is that, after
the 1987 crash, option traders realized that large tail risks were incorrectly
priced, and that the volatility smile was born as a result thereafter (Mackenzie,
2008).
The left tail shape parameter ξ with values of 0.03 ± 0.23 is surprisingly small:
a value around zero implies that losses are distributed according to a thin
tail.11 Moreover, with −0.19 ± 0.07, the shape parameter ξ for the right tail is
consistently negative indicating a distribution with compact support, that is,
a finite tail for expected gains.
A striking feature of the time series of the moments and shape parameters is
a change of regime related to the Global Financial Crisis, which is the basis
of our subperiod classification. A change point analysis of the left tail shape
parameter identifies the crisis period as starting from June 22nd, 2007 and
ending in May 4th, 2009. As we obtain similar dates up to a few months
for the change points in risk-neutral volatility, skewness and kurtosis, this
identification is robust and reliable (see Table 6.1 for details). Indeed, the
determination of the beginning of the crisis as June 2007 is in agreement with
the timeline of the build-up of the financial crisis12 (Federal Reserve Bank of St. Louis, 2009), opening the gates of loss and bankruptcy announcements.

11 When positive, the tail shape parameter ξ is related to the exponent α of the asymptotic power law tail by α = 1/ξ.

12 (i) S&P's and Moody's Investor Services downgraded over 100 bonds backed by second-lien subprime mortgages on June 1, 2007, (ii) Bear Stearns suspended redemption of its credit strategy funds on June 7, 2007, (iii) S&P put 612 securities backed by subprime residential mortgages on credit watch, (iv) Countrywide Financial warned of "difficult conditions" on July 24, 2007, (v) American Home Mortgage Investment Corporation filed for Chapter 11 bankruptcy protection on July 31, 2007, and (vi) BNP Paribas, France's largest bank, halted redemptions on three investment funds on Aug. 9, 2007, and so on.
Interestingly, when applying the analysis to option-implied returns instead,
we detect the onset of the crisis only on September 5th, 2008, more than
a year later. This reflects a time lag of the market in fully internalizing the consequences and implications of the crisis. This is in line with the fact that most authorities (Federal Reserve, US Treasury, etc.) were downplaying the nature and severity of the crisis, whose full-blown amplitude became apparent to all only with the Lehman Brothers bankruptcy.
The identification of the end of the crisis in May 2009 is confirmed by the
timing of the surge of actions from the Federal Reserve and the US Treasury
Department to salvage the banks and boost the economy via “quantitative
easing”, first implemented in the first quarter of 2009.13 Another sign of a
change of regime, which can be interpreted as the end of the crisis per se, is
the strong rebound of the US stock market that started in March 2009, thus
ending a strongly bearish regime characterized by a cumulative loss of more
than 60% since its peak in October 2007.
Finally, note that the higher moments and tail shape parameters of the risk-
neutral return densities in the post-crisis period from May 4th, 2009 to October
23, 2013 progressively recovered their pre-crisis levels.
the crash
13 On March 18, 2009 the Federal Reserve announced it would purchase $750 billion of mortgage-backed securities and up to $300 billion of longer-term Treasury securities within the subsequent year, with other central banks such as the Bank of England taking similar measures.

Apart from the market free fall, which was at its worst in September 2008, the second most remarkable feature of the time series of option-implied stock returns shown in Figures 6.1a and 6.2a is its regular rise in the years
prior to the crisis. For the pre-crisis period from January 2003 to June 2007,
a linear model estimates an average increase in the option-implied return of
about 0.01% per trading day (p-value < 0.001, R2 = 0.82, more details can
be found in Table 6.2). As a matter of fact, this increase is also present
in the realized returns, from January 2003 until October 2007, i.e. over a
slightly longer period, as shown in Figure 6.2a. Note, however, that realized
returns have a less regular behavior than the ones implied by options since the
former are realized whereas the latter are expected under Q. An appropriate
smoothing such as the exponentially weighted moving average is required to reveal the trend; see Figure 6.2a for more details.
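A minimal sketch of the kind of smoothing referred to here, assuming daily realized log-returns stored in a pandas Series and an EWMA with a span on the order of 750 trading days (cf. footnote 16); the simulated input series and the annualization by 252 trading days are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Placeholder daily log-returns; in the chapter these would be realized S&P 500 returns
rng = np.random.default_rng(2)
dates = pd.bdate_range("2003-01-01", periods=2700)
daily_returns = pd.Series(rng.normal(0.0003, 0.01, len(dates)), index=dates)

# Exponentially weighted moving average over roughly 750 trading days,
# annualized (illustratively) by multiplying the mean daily return by 252
smoothed_annualized = daily_returns.ewm(span=750, adjust=False).mean() * 252
print(smoothed_annualized.tail())
```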
The upward trends of both option-implied and realized returns pre-crisis sig-
nal a transient “super-exponential” behavior of the market price, here of the
S&P 500 index. To see this, if the average return $r(t) := \ln[p(t)/p(t-1)]$ grows, say, linearly according to $r(t) \approx r_0 + \gamma t$, as can be approximately observed in Figure 6.2a from 2003 to 2007, this implies $p(t) = p(t-1)\, e^{r_0 + \gamma t}$, whose solution is $p(t) = p(0)\, e^{r_0 t + \gamma t^2}$. In the absence of a rise of the return ($\gamma = 0$), this recovers the standard exponential growth associated with the usual compounding of interest. However, as soon as $\gamma > 0$, the price grows much faster, in this case as $\sim e^{t^2}$. Any price growth of the form $\sim e^{t^{\beta}}$ with $\beta > 1$ is faster than exponential and is thus referred to as "super-exponential." Consequently, if the rise of returns is faster than linear, the super-exponential acceleration of the price is even more pronounced. For instance, Hüsler, Sornette, and Hommes, 2013 reported empirical evidence of the super-exponential behaviour $p(t) \sim e^{e^t}$ in controlled lab experiments (which corresponds formally to the limit $\beta \to \infty$).
Corsi and Sornette, 2014 presented a simple model of positive feedback be-
tween the growth of the financial sector and that of the real economy, which
predicts even faster super-exponential behaviour, termed transient finite-time
singularity (FTS). This dynamics can be captured approximately by the novel
FTS-GARCH, which is found to achieve good fit for bubble regimes (Corsi and
Sornette, 2014). The phenomenon of super-exponential price growth during a
bubble can be accommodated within the framework of a rational expectation
bubble (Blanchard, 1979; Blanchard and Watson, 1982), using for instance
the approach of Johansen, Sornette, and Ledoit, 1999; Johansen, Ledoit, and
Sornette, 2000 (JLS model).14 In a nutshell, these models represent crashes
by jumps, whose expectations yield the crash hazard rate. Consequently, the
condition of no-arbitrage translates into a proportionality between the crash
hazard rate and the instantaneous conditional return: as the return increases,
the crash hazard rate grows and a crash eventually breaks the price unsustain-
able ascension. See Sornette et al., 2013 for a recent review of many of these
models.
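To visualize the mechanism, the short sketch below compounds a price path whose period return rises linearly, $r(t) = r_0 + \gamma t$, and compares it with ordinary exponential growth ($\gamma = 0$); the parameter values are arbitrary and purely illustrative.

```python
import numpy as np

def compound_path(p0, r0, gamma, n_steps):
    """Price path with period returns r(t) = r0 + gamma * t, so that
    p(t) = p(t-1) * exp(r0 + gamma * t); gamma > 0 gives super-exponential growth."""
    t = np.arange(1, n_steps + 1)
    return p0 * np.exp(np.cumsum(r0 + gamma * t))

p0, r0, n = 100.0, 0.001, 1000                      # arbitrary illustrative values
exponential = compound_path(p0, r0, 0.0, n)         # constant return: log-price linear in t
super_exp = compound_path(p0, r0, 2e-6, n)          # rising return: log-price grows like t^2

# The log-price ratio grows without bound, the signature of super-exponential growth
print(np.log(super_exp[-1] / exponential[-1]))      # ~ gamma * n*(n+1)/2 = 1.0
```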
sustainable regime whose growing return at the same time embodies and feeds
over-optimism and herding through various positive feedback loops. This fea-
ture is precisely what allows the association of these transient super-exponential
regimes with what is usually called a “bubble” (Kaizoji and Sornette, 2009),
an approach that has allowed bubble diagnostics ex-post and ex-ante (see e.g.
Johansen, Sornette, and Ledoit, 1999; Sornette, 2003; Lin and Sornette, 2011;
Sornette and Cauwels, 2014a; Sornette and Cauwels, 2014b).
Realized S&P 500 and option-implied S&P 500 returns exhibit different be-
haviors over time (Figure 6.2a). Note that this difference persists even after
filtering out short-term fluctuations in the realized returns.16 During the pre-
crisis period (from January 2003 to June 2007), the two grow at roughly the
same rate, but the realized returns are approximately 8% larger than the
option-implied returns. This difference can be ascribed to the “risk premium”
that investors require to invest in the stock market, given their aggregate risk
aversion.17 This interpretation of the difference between the two return quan-
tities as a risk premium, which one may literally term “realized-minus-implied
risk premium”, is based on the fact that the option-implied return is deter-
mined under the risk-neutral probability measure while the realized return
extremely broad domain of application (Yule, 1925; Simon, 1955; Saichev, Malevergne, and
Sornette, 2009).
16 Realized S&P 500 returns show more rapid fluctuations than option-implied ones, which is not surprising given that the former are realized whereas the latter are expected (under Q). In this section we only focus on dynamics on a longer timescale; thus Figure 6.2a presents realized returns smoothed by an exponentially weighted moving average (EWMA) of daily returns over 750 trading days. Different values or smoothing methods lead to similar outcomes.
17 To understand variations in the risk premium in relation to the identification of different price regimes, we cannot rely on many of the important, more sophisticated quantitative methods for derivation of the risk premium, but refer to the literature discussed in the introduction. There are many avenues for promising future research to develop hybrid approaches between these more sophisticated approaches and ours, which a priori allows the premium to vary freely over time.
is, by construction, unfolding under the real-world probability measure.18 In
other words, the risk-neutral world is characterized by the assumption that
all investors agree on asset prices just on the basis of fair valuation. In con-
trast, real-world investors are in general risk-averse and require an additional
premium to accept the risks associated with their investments. During the
crisis, realized returns plunged faster and deeper in negative territory than the
option-implied returns, then recovered faster into positive and growing regimes
post-crisis. Indeed, during the crisis, the realized-minus-implied risk premium
surprisingly became negative.
6.3.5 Granger causality between option-implied returns
where $r_t$ is the option-implied return (6.4) and $y_t$ is the Treasury Bill yield at trading day $t$. Before testing, we standardize both $SP_t$ and $TB_t$, i.e. we subtract the mean and divide by the standard deviation, respectively.
that stock markets led Treasury Bills yields as well as longer term bonds yields
during bubble periods (Zhou and Sornette, 2004; Guo et al., 2011). It is partic-
ularly interesting to find a Granger causality of the forward-looking expected
returns, as extracted from option data, onto a backward-looking Treasury Bill
yield in the pre-crisis period and the reverse thereafter. Thus, expectations
were dominant in the pre-crisis period as is usually the case in efficient markets,
while realized monetary policy was (and still is in significant parts) shaping
expectations post-crisis (as shown in Table 6.3 and Figure 6.3b). The null
of no influence is rejected for Treasury Bill yields Granger causing option-
implied returns lagged by 50 to 200 days. This is coherent with the view that
the Fed monetary policy, developed to catalyze economic recovery via mone-
tary interventionism, has been the key variable influencing investors and thus
options/stock markets.
6.4 Conclusion
crisis as taking place from mid-2007 to mid-2009. The evolution of risk-neutral
return probability distributions characterizing the pre-crisis, crisis and post-
crisis regimes reveals a number of remarkable properties. Indeed, paradoxically at first sight, the distributions of expected returns became very close to a
normal distribution during the crisis period, while exhibiting strongly negative
skewness and especially large kurtosis in the two other periods. This reflects
that investors may care more about the risks being realized (volatility) during
the crisis, while they focus on potential losses (fat left tails, negative skewness
and large kurtosis) in quieter periods.
ties and concern with uncertainties, fostered possibly by unconventional finan-
cial and monetary policy and unexpectedly sluggish economic recovery.
Finally, our Granger causality tests demonstrate that, in the pre-crisis period,
changes of option-implied returns lead changes of Treasury Bill yields with a
short lag, while the reverse is true with longer lags post-crisis. In a way, the
post-crisis period can thus be seen as a return to a “normal” regime in the sense
of standard economic theory, according to which interest rate policy determines
the price of money/borrowing, which then spills over to the real economy and
the stock market. What makes it a “new normal” (El-Erian, 2011) is that zero-
interest rate policies in combination with other unconventional policy actions
actually dominate and bias investment opportunities. The pre-crisis period reveals the opposite phenomenon in the sense that expected (and realized) returns lead the interest rate, thus in a sense "slaving" the Fed policy to the markets.
It is therefore less surprising that such an abnormal period, previously referred
to as the “Great Moderation” and hailed as the successful taming of recessions,
was bound to end in disappointments as a bubble was built up (Sornette and
Woodard, 2010; Sornette and Cauwels, 2014a).
[Figure 6.1 spans this page: "Returns and distributional moments implied by S&P 500 options", panels plotted over 2004–2014; the recoverable panel labels are (e) Left tail shape parameter and (f) Right tail shape parameter.]
Figure 6.1: This figure presents returns and distributional moments implied by S&P 500 options. Structural changes around the financial crisis are identified consistently with a change point analysis of the means of the higher moments and tail shape parameters (vertical lines).
[Figure 6.2 spans this page: time series over 2003–2013 of implied returns and realized returns (annualized log-returns) together with Treasury Bill yields.]
Figure 6.2: This figure presents time series of option-implied S&P 500 returns, realized returns and Treasury Bill yields over the time period 2003–2013.
[Figure 6.3 spans this page: "Subperiod Granger causality tests" (referenced in the text as Figure 6.3b).]
[Figure 6.4 spans this page: "Risk Neutral Density and Fitted GEV Tail Functions on 2010-10-06 for 2010-12-18"; the plot shows the empirical RND, the left and right tail GEV functions, and the connection points.]
Figure 6.4: Risk-neutral density implied by S&P 500 options from 2010-10-06 for index levels on 2010-12-18. The empirical part is directly inferred from option quotes, whereas tails must be estimated to account for the range beyond observable strike prices. Together, they give the full risk-neutral density. The method is reviewed in section 6.2.1 and appendix 6.4.
Table 6.1: Start and end dates of the Global Financial Crisis as identified by
a change point analysis of statistical properties of option-implied risk-neutral
densities. The dates found in the left tail shape parameter and higher moments
identify consistently the crisis period as ca. June 2007 to ca. October 2009.
Interestingly, the return time series signals the beginning only more than a
year later, as September 2008. See section 6.2.3 for a review of the method,
and 6.3.2 for a more detailed discussion of the results.
Table 6.3: This table reports the results of a Granger-causality test of option-implied S&P 500 returns and Treasury Bill yields by sub-period. While we do not find evidence that Treasury Bill yields may have Granger-caused implied returns pre-crisis, there is Granger-influence in the other direction at a lag of 5 trading days both pre- and especially post-crisis. Notably, our test strongly suggests that post-crisis Treasury Bill yields have Granger-causal influence on option-implied returns at lags of 50 to 200 trading days.
[Table 6.3 panels (pre-crisis and post-crisis) appear here.]
Estimating the risk-neutral density from option
quotes
tails of the family of generalized extreme value (GEV20 ) distributions with
connection conditions: a) matching value at the 2%, 5%, 92% and 95% quantile
points, and b) matching probability mass in the estimated tail and empirical
density. An example can be seen in Figure 6.4. The empirical density together
with the tails give the complete risk-neutral density.
The following framework is used for significance testing in section 6.3.2 and
Table 6.1. For more details, see Csörgö and Horváth, 1997. Let x1 , x2 , . . . , xn
be independent, real-valued observations. We test the “no change point” null
hypothesis,
$H_0:\ E(x_1) = E(x_2) = \ldots = E(x_n), \qquad (6)$

$A(x) := \sqrt{2 \log\log x}\,, \qquad D(x) := 2 \log\log x + \tfrac{1}{2} \log\log\log x - \tfrac{1}{2} \log \pi. \qquad (8)$

Then, following Corollary 2.1.2 and in light of Remark 2.1.2 of Csörgö and Horváth, 1997, pp. 67-68, under mild regularity conditions and $H_0$, for large sample sizes one has

$P\!\left( A(n) \max_{k} \frac{1}{\hat{\sigma}_n} \left( \frac{n}{k(n-k)} \right)^{1/2} \left| S(k) - \frac{k}{n} S(n) \right| - D(n) \le t \right) = \exp\!\left(-2 e^{-t}\right), \qquad (9)$
20 See Embrechts, Klüppelberg, and Mikosch, 1997 for a detailed theoretical discussion of GEV distributions and modeling extreme events.
where $\hat{\sigma}_n$ is the sample standard deviation and $S(t) := \sum_{i=1}^{t} x_i$ the cumulative sum of observations.
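A minimal sketch of how the statistic in (9) can be evaluated in practice: standardize the centered partial sums, take the weighted maximum over candidate change points $k$, apply the normalizations $A(n)$ and $D(n)$, and read an asymptotic p-value off the Gumbel-type limit $\exp(-2e^{-t})$. This is a simplified illustration under the stated assumptions, not the exact implementation used for Table 6.1.

```python
import numpy as np

def cusum_change_point_test(x):
    """Darling-Erdos type test for a change in the mean, following eqs. (6)-(9)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(1, n)                        # candidate change points 1 <= k < n
    S = np.cumsum(x)                           # S(t) = sum_{i <= t} x_i
    sigma = x.std(ddof=1)
    weights = np.sqrt(n / (k * (n - k)))
    dev = weights * np.abs(S[:-1] - (k / n) * S[-1]) / sigma
    A = np.sqrt(2.0 * np.log(np.log(n)))
    D = 2.0 * np.log(np.log(n)) + 0.5 * np.log(np.log(np.log(n))) - 0.5 * np.log(np.pi)
    t = A * dev.max() - D
    p_value = 1.0 - np.exp(-2.0 * np.exp(-t))  # asymptotic p-value under H0
    return t, p_value, int(k[np.argmax(dev)])

rng = np.random.default_rng(3)
sample = np.concatenate([rng.normal(0.0, 1.0, 400), rng.normal(0.8, 1.0, 400)])
print(cusum_change_point_test(sample))         # small p-value, change point near 400
```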
References
Brodsky, E. and B.S. Darkhovsky (1993). Nonparametric Methods in Change
Point Problems. Mathematics and Its Applications. Springer.
Corsi, Fulvio and Didier Sornette (2014). “Follow the money: The monetary
roots of bubbles and crashes”. In: International Review of Financial Analysis
32, pp. 47–59.
Csörgö, Miklós and Lajos Horváth (1997). Limit Theorems in Change-point
Analysis. Wiley series in probability and statistics. Wiley.
Delbaen, Freddy and Walter Schachermayer (1994). “A general version of the
fundamental theorem of asset pricing”. In: Mathematische Annalen 300.1,
pp. 463–520.
Duarte, Fernando and Carlo Rosa (2013). “The Equity Risk Premium: A Con-
sensus of Models”. In: SSRN WP 2377504.
El-Erian, Mohamed (2011). “Spain is not Greece and need not be Ireland”. In:
Financial Times Retrieved 2011-08-18, February 3.
Embrechts, Paul, Claudia Klüppelberg, and Thomas Mikosch (1997). Mod-
elling extremal events: for insurance and finance. Vol. 33. Springer.
Chicago Board Options Exchange (2009). "The CBOE volatility index – VIX".
In: White Paper.
Federal Reserve Bank of St. Louis (2009). “The Financial Crisis: A Timeline
of Events and Policy Actions”. In: timeline.stlouisfed.org.
Fernandez, Pablo (2013). “The Equity Premium in 150 Textbooks”. In: SSRN
WP 1473225.
Figlewski, Stephen (2010). “Estimating the Implied Risk Neutral Density”.
In: Volatility and Time Series Econometrics. Ed. by Tim Bollerslev, Jeffrey
Russell, and Mark Watson. Oxford: Oxford University Press.
Galbraith, J.K. (2009). The Great Crash 1929. Mariner Books (Reprint Edi-
tion).
Gibrat, R. (1931). “Les Inégalités économiques: Aux Inégalités des Richesses,
à la Concentration des Entreprises, Aux Populations des Villes, Aux Statis-
tiques des Familles, etc., d'une Loi Nouvelle: La Loi de l'Effet Proportionnel".
Gifford, S. (2013). “Risk and uncertainty”. In: Z.J. Acs, D.B. Audretsch (eds.),
Handbook of Entrepreneurship Research International Handbook Series on
Entrepreneurship 5, pp. 303–318.
Graham, John R. and Campbell R. Harvey (2013). “The Equity Risk Premium
in 2013”. In: SSRN WP 2206538.
Granger, C. W. J. (1969). “Investigating Causal Relations by Econometric
Models and Cross-spectral Methods”. In: Econometrica 37.3, pp. 424–438.
Greenspan, Alan (2005). Consumer Finance. Federal Reserve System’s Fourth
Annual Community Affairs Research Conference. Federal Reserve Board.
Guo, Kun et al. (2011). “The US stock market leads the Federal funds rate
and Treasury bond yields”. In: PloS One 6.8, e22794.
Hellwig, Martin (2009). “Systemic Risk in the Financial Sector: An Analysis of
the Subprime-Mortgage Financial Crisis”. In: De Economist 157.2, pp. 129–
207.
Hüsler, Andreas, Didier Sornette, and Cars H Hommes (2013). “Super-exponential
bubbles in lab experiments: evidence for anchoring over-optimistic expec-
tations on price”. In: Journal of Economic Behavior & Organization 92,
pp. 304–316.
Jackwerth, Jens Carsten (2000). “Recovering risk aversion from option prices
and realized returns”. In: Review of Financial Studies 13.2, pp. 433–451.
– (2004). Option-implied risk-neutral distributions and risk aversion. Research
Foundation of AIMR Charlotteville.
Jackwerth, Jens Carsten and Mark Rubinstein (1996). “Recovering probability
distributions from option prices”. In: The Journal of Finance 51.5, pp. 1611–
1631.
Johansen, Anders, Olivier Ledoit, and Didier Sornette (2000). “Crashes as
critical points”. In: International Journal of Theoretical and Applied Finance
3.2, pp. 219–255.
Johansen, Anders, Didier Sornette, and Olivier Ledoit (1999). “Predicting fi-
nancial crashes using discrete scale invariance”. In: Journal of Risk 1.4,
pp. 5–32.
Kaizoji, T. and D. Sornette (2009). “Market Bubbles and Crashes”. In: The
Encyclopedia of Quantitative Finance.
Killick, Rebecca, Paul Fearnhead, and IA Eckley (2012). “Optimal detection of
changepoints with a linear computational cost”. In: Journal of the American
Statistical Association 107.500, pp. 1590–1598.
Kindleberger, Charles P. and Robert Z. Aliber (2011). Manias, Panics and
Crashes: A History of Financial Crises. Palgrave Macmillan; Sixth Edition.
Lin, L. and D. Sornette (2011). “Diagnostics of Rational Expectation Finan-
cial Bubbles with Stochastic Mean-Reverting Termination Times”. In: The
European Journal of Finance.
Lin, Li, R.E. Ren, and Didier Sornette (2014). “The Volatility-Confined LPPL
Model: A Consistent Model of ‘Explosive’ Financial Bubbles With Mean-
Reversing Residuals”. In: International Review of Financial Analysis 33,
pp. 210–225.
Lleo, Sébastien and William T. Ziemba (2012). "Stock market crashes in 2007
– 2009: were we able to predict them?” In: Quantitative Finance 12.8,
pp. 1161–1187.
Mackenzie, Donald (2008). An Engine, Not a Camera: How Financial Models
Shape Markets. The MIT Press.
Page, ES (1954). “Continuous inspection schemes”. In: Biometrika 41.1/2,
pp. 100–115.
Phillips, P. C. B., S.-P. Shi, and J. Yu (2012). “Testing for multiple bubbles 1:
Historical episodes of exuberance and collapse in the S&P 500”. In: Cowles
Foundation Discussion Paper No. 1843.
Phillips, P. C. B., Yangru Wu, and J. Yu (2011). “Explosive behavior in the
1990s Nasdaq: when did exuberance escalate asset values?” In: International
Economic Review 52.1, pp. 201–226.
Rosenberg, Joshua V and Robert F Engle (2002). “Empirical pricing kernels”.
In: Journal of Financial Economics 64.3, pp. 341–372.
Saichev, A., Y. Malevergne, and D. Sornette (2009). “A Theory of Zipf’s Law
and beyond”. In: Lecture Notes in Economics and Mathematical Systems
632, pp. 1–171.
Scott, AJ and M Knott (1974). “A cluster analysis method for grouping means
in the analysis of variance”. In: Biometrics 30.3, pp. 507–512.
Shiller, Robert J (2005). Irrational exuberance. Random House LLC.
Simon, H.A. (1955). “On a class of skew distribution functions”. In: Biometrika
52, pp. 425–440.
Sornette, D. (2014). “Physics and Financial Economics (1776-2013): Puzzles,
Ising and agent-based models”. In: Rep. Prog. Phys. 77, 062001 (28 pp.)
Sornette, D. and J. V. Andersen (2002). “A Nonlinear Super-Exponential Ra-
tional Model of Speculative Financial Bubbles”. In: International Journal of
Modern Physics C 13.2, pp. 171–188.
Sornette, D. and P. Cauwels (2014a). “1980-2008: The Illusion of the Perpetual
Money Machine and what it bodes for the future”. In: Risks 2, pp. 103–131.
– (2014b). “Financial Bubbles: Mechanism, diagnostic and state of the world
(Feb. 2014)”. In: Review of Behavioral Economics (in press).
Sornette, Didier (2003). Why stock markets crash: critical events in complex
financial systems. Princeton University Press.
Sornette, Didier and Ryan Woodard (2010). “Financial bubbles, real estate
bubbles, derivative bubbles, and the financial and economic crisis”. In: Econo-
physics Approaches to Large-Scale Business Data and Financial Crisis. Springer,
pp. 101–148.
Sornette, Didier et al. (2013). “Clarifications to Questions and Criticisms on
the Johansen-Ledoit-Sornette bubble Model”. In: Physica A: Statistical Me-
chanics and its Applications 392.19, pp. 4417–4428.
Soros, G. (2009). The Crash of 2008 and What it Means: The New Paradigm
for Financial Markets. Public Affairs; Revised edition.
Stock, James H. and Mark W. Watson (2003). “Has the Business Cycle Changed
and Why?” In: NBER Macroeconomics Annual 2002, MIT Press 17, pp. 159–
230.
Summers, Lawrence et al. (1999). Over-the-Counter Derivatives Markets and
the Commodity Exchange Act. Report of The President’s Working Group on
Financial Markets.
Yule, G. U. (1925). “A Mathematical Theory of Evolution, based on the Con-
clusions of Dr. J. C. Willis, F.R.S.” In: Philosophical Transactions of the
Royal Society B 213.402-410, pp. 21–87.
Zhou, Wei-Xing and Didier Sornette (2004). “Causal slaving of the US treasury
bond yield antibubble by the stock market antibubble of August 2000”. In:
Physica A: Statistical Mechanics and its Applications 337.3, pp. 586–608.
Chapter 7
Meritocratic mechanism
design:
Theory and experiments
Abstract
efficiency and equality. The challenge for the right design of many in-
split into a theory part and an experimental part. In the theory part,
Part 1: Theory
Abstract
several more efficient equilibria emerge, but only the inefficient equilib-
tocracy only under extreme inequality aversion, with low rates of return
or in small populations.
Acknowledgements. The authors would like to thank Lukas Bischofberger
for help with simulations, Bary Pradelski, Anna Gunnthorsdottir, Matthias
Leiss, Michael Mäs, Francis Dennig and Stefan Seifert for helpful comments on
earlier drafts, Peyton Young for help with framing of the questions, Luis Cabral
for help with proposition (16), Ingela Alger and Jörgen Weibull for a helpful
discussion, and finally members of GESS at ETH Zurich, the participants
at Norms Actions Games 2014 and at the 25th International Conference on
Game Theory 2014 at Stony Brook, as well as anonymous referees for helpful
feedback. All remaining errors are ours.
7.1 Motivation
From the perspective of social planning, one must therefore address two or-
thogonal questions. First, given a fixed level of meritocracy, which equilibrium
is stable? Second, what level of meritocracy maximizes welfare? Our analysis
suggests that, other than in the aforementioned contexts such as education,
job matching, or marriage markets, an intermediate but substantial level of
meritocracy generally maximizes welfare in our setting. This result is surprising, as one would expect welfare comparisons to depend more subtly on the social planner's degree of inequality aversion. We obtain our result by focusing
on states that are stochastically stable (Foster and Young 1990, Young 1993),
requiring the social planner to choose amongst stable states.
The rest of this paper is structured as follows. Next, we discuss related litera-
ture, including voluntary contributions mechanisms and the broad conceptual
approach. In section 3, we develop a formal model of meritocratic matching,
calculate its equilibria, and detail the stability and welfare properties. We
conclude in section 4.
allowing us to close the gap between full meritocracy and random (re-)matching.
Second, given any intermediate degree of meritocracy, the stability of alterna-
tive equilibria is assessed using evolutionary refinement concepts. Third, we
compare the welfare of different meritocracies.
via meritocratic matching.
ences (Alger and Weibull 2013, Grund, Waloszek, and Helbing 2013).
Before we proceed to formalize the set-up of our model, we would like to provide
more intuition for the basic flavour of meritocratic matching. While none
of the following real-world examples of institutions coincides one-to-one with
meritocratic matching as it will be instantiated in our simple model of a linear
and symmetric public goods game, they do mirror meritocratic matching’s
key features. Importantly, all of these real-world examples are typically both
noisy and not always fair. The first example is school/university admission.
Entrance examinations to schools or universities sort individuals into different streams of education and different schools based on an imperfect measure of applicants' aptitude. An important feature of this sorting mechanism
is that the resulting differences in educational quality amongst the different
schools are not only determined by the institutional design, but also by the
different quality levels of students present in them. Better students tend to
study with better students, and worse students with worse students. The
incentive to work hard for the examinations is getting into a good school. The
second example is team-based payment. Imagine an assortative employment
regime with team-based payments that rewards employees for performance
by matching them with other, similarly performing employees. Real-world situations with this structure include trading desks in large investment banks, and again this type of competitive grouping incentivizes hard work through the promise of being matched into better teams. Team formation in professional sports has similar features: high-performing athletes tend to be rewarded by joining successful teams with better contracts.
Suppose population N = {1, 2, ..., n} plays the following game, of which all
aspects are common knowledge. The game is divisible into three steps. First,
players make simultaneous voluntary contributions. Second, players receive
ranks that imperfectly represent their contributions. Third, groups and payoffs
realize based on the ranking.
binary action structure in order to facilitate our evolutionary analysis.
We shall assume that all functions f are continuous in β, and that the follow-
ing properties are the key ingredients to constitute a ‘meritocratic matching’
mechanism:
(i) no meritocracy. If $\beta = 0$, then, for any $c$, $f_\pi^0 = 1/n!$ for all $\pi \in \Pi$; hence $\bar{k}_i^{\beta} = \frac{n+1}{2}$ for all $i$.

(ii) full meritocracy. If $\beta = 1$, then, for any $c$ with $\sum_{i \in N} c_i = m$, $f_{\tilde{\pi}}^1 = 0$ for all mixed orderings $\tilde{\pi}$, and $f_{\hat{\pi}}^1 = \frac{1}{m!\,(n-m)!}$ for all perfect orderings $\hat{\pi}$; hence $\bar{k}_i^{\beta}(c_i = 1) = \frac{m+1}{2}$ for all $i$ with $c_i = 1$, and $\bar{k}_j^{\beta}(c_j = 0) = \frac{n+m+1}{2}$ for all $j$ with $c_j = 0$.

(iii) imperfect meritocracy. If $0 < \beta < 1$, then, for all players $i$ and for any $c_{-i}$,

$E\big[\bar{k}_i^{\beta}(c_i = 0) - \bar{k}_i^{\beta}(c_i = 1)\big] > 0, \qquad (7.1)$

$\partial E\big[\bar{k}_i^{\beta}(c_i = 0) - \bar{k}_i^{\beta}(c_i = 1)\big] / \partial \beta > 0. \qquad (7.2)$
Groupings. Finally, groups form based on the ranking and payoffs realize based on the contributions made in each group. Given $\pi$, we assume that $m$ groups $\{S_1, S_2, ..., S_m\}$ of a fixed size $s < n$ form the partition $\rho$ of $N$ (where $s = n/m > 1$ for some $s, m \in \mathbb{N}^+$): every group $S_p \in \rho$ (for $p = 1, 2, ..., m$) consists of all players $i$ for whom $k_i \in [(p-1)s + 1,\, ps]$.

$\phi_i(c_i \mid c_{-i}, \rho) = \underbrace{(1 - c_i)}_{\text{remainder from budget}} + \underbrace{R \sum_{j \in S} c_j}_{\text{return from the public good}}. \qquad (7.3)$
meritocracy” (details are provided in the analysis of the Nash equilibria in the
next section).
Examples
Meritocratic matching via logit. Given $\beta$ and $c$, let $l_i := \frac{\beta c_i}{1 - \beta}$. Suppose ranks are assigned according to the following logit-response ordering: if any arbitrary number of $(k-1)$ ranks from 1 to $(k-1) < n$ have been taken by some set of players $S \subset N$ (with $|S| = k - 1$), then the probability that any player $i \in N \setminus S$ takes rank $k$ is

$p_i(k) = \frac{e^{l_i}}{\sum_{j \in N \setminus S} e^{l_j}}. \qquad (7.4)$
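To make the mechanism concrete, the following sketch simulates one round of the game for a given contribution vector: ranks are drawn sequentially with the logit probabilities (7.4), players are grouped into consecutive rank blocks of size $s$, and payoffs follow (7.3). The parameter values ($\beta = 0.9$, $R = 0.4$, $s = 4$, $n = 16$) are illustrative; $R = 0.4$ is consistent with the Table 7.1 economy ($r = 1.6$, $s = 4$).

```python
import numpy as np

def logit_ranking(c, beta, rng):
    """Assign ranks 1..n sequentially using the logit-response ordering (7.4),
    with attractiveness l_i = beta * c_i / (1 - beta); requires beta < 1."""
    c = np.asarray(c, dtype=float)
    l = beta * c / (1.0 - beta)
    remaining = list(range(len(c)))
    ranks = np.empty(len(c), dtype=int)
    for rank in range(1, len(c) + 1):
        w = np.exp(l[remaining])
        i = rng.choice(remaining, p=w / w.sum())
        ranks[i] = rank
        remaining.remove(i)
    return ranks

def payoffs(c, ranks, R, s):
    """Group players into consecutive rank blocks of size s and pay each player
    (1 - c_i) + R * (sum of contributions in the player's group), as in (7.3)."""
    c = np.asarray(c, dtype=float)
    phi = np.empty(len(c))
    order = np.argsort(ranks)                   # players sorted by rank
    for g in range(len(c) // s):
        members = order[g * s:(g + 1) * s]
        phi[members] = (1.0 - c[members]) + R * c[members].sum()
    return phi

rng = np.random.default_rng(4)
c = np.array([1] * 14 + [0] * 2)                # 14 contributors, 2 free-riders (n = 16)
ranks = logit_ranking(c, beta=0.9, rng=rng)
print(payoffs(c, ranks, R=0.4, s=4))
```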
7.3.2 Nash equilibria
From expression (7.3), the expected payoff of contributing $c_i$ given $c_{-i}$ for any $i$ is

$\underbrace{E\left[\phi_i(c_i \mid c_{-i})\right]}_{\text{expected return from } c_i} = \underbrace{1}_{\text{(i) budget}} - \underbrace{(1 - R)\, c_i}_{\text{(ii) sure loss on own contribution}} + \underbrace{R \cdot E\Big[\sum_{j \neq i:\, j \in S_i^{\pi}} c_j \,\Big|\, c_i\Big]}_{\text{(iii) expected return from others' contributions}}, \qquad (7.5)$

where $S_i^{\pi} \in \rho$ is the subgroup into which player $i$ is grouped. Note that term
(iii), the expected return from others’ contributions, is a function of one’s own
contribution due to meritocratic matching, which, if ci = 1, is increasing in
both c−i and β.
First, let us consider candidates for Nash equilibria in pure strategies. Write $1^m$ for "m players contribute, all others free-ride", and $1^m_{-i}$ for the same statement excluding player $i$. The following two conditions must hold for $1^m$ to constitute a Nash equilibrium:

$E\left[\phi_i(1 \mid 1^m_{-i})\right] \geq E\left[\phi_i(0 \mid 1^{m-1}_{-i})\right] \qquad (7.6)$

$E\left[\phi_i(0 \mid 1^m_{-i})\right] \geq E\left[\phi_i(1 \mid 1^{m+1}_{-i})\right] \qquad (7.7)$
A special case is $1^0$, when all players free-ride, and we shall reserve the expression $1^m$ to refer to cases with $m > 0$. It is easy to verify that $1^0$ is always a Nash equilibrium (see Appendix A, Proposition 10). Gunnthorsdottir et al., 2010 show that, when $\beta = 1$, there exists a Nash equilibrium of the form $1^m$ with $m > 0$ provided $R \geq \frac{n-s+1}{ns-s^2+1} =: \text{mpcr}$. We shall extend this analysis to show that, given any $R > \text{mpcr}$, there exists a $\beta < 1$ such that there exists a Nash equilibrium of the form $1^m$ with $m > 0$ (see Appendix A, Proposition 11). The minimum level of $\beta$, denoted by $\underline{\beta}$, for which such a Nash equilibrium
exists, is an implicit function that is decreasing in R provided R > mpcr.
$E\left[\phi_i(0 \mid 1^p_{-i})\right] = E\left[\phi_i(1 \mid 1^p_{-i})\right]. \qquad (7.8)$

We shall prove that, for every $\beta$, there exists an $R \in (\text{mpcr}, 1)$ such that there exist two Nash equilibria of the form $1^p$ with $p > 0$, one with a high $p$ and one with a low $p$ (see Appendix A, Proposition 13). Write mpcr for the necessary marginal per capita rate of return when $\beta = 1$. Expressed differently, given any $R > \text{mpcr}$, there exists a $\beta < 1$ such that there exist two Nash equilibria of the form $1^p$ with $\bar{p}, \underline{p}$ such that $1 > \bar{p} > \underline{p} > 0$.
It should be noted that the particular interest of this paper is the evolutionary stability and welfare analysis of the system's equilibria as a function of the meritocratic matching parameter β. We shall therefore
assume that our implicit bound mpcr is satisfied, meaning that all equilibria
are at least guaranteed to exist. Thus, our work complements the analysis of
Gunnthorsdottir et al., 2010, where the focus of analysis is the dependence of equilibrium existence on the model parameters, including the rate of return, for the case when β = 1. Note that this bound is generally satisfied for
large n (see Appendix A, remark 8).
For the case when R > mpcr, the following observations summarize the equi-
librium analysis:
$E\left[\phi_i(0 \mid 1^m_{-i})\right] > E\left[\phi_i(1 \mid 1^m_{-i})\right]$ for $\beta < \underline{\beta}$ and for any $1^m$ with $m \geq 0$

Observation A states that, when there is not enough meritocracy, free-riding is a better reply given any set of actions by the other players.

C. Contribute-free-ride indifference.

$E\left[\phi_i(0 \mid 1^p_{-i})\right] = E\left[\phi_i(1 \mid 1^p_{-i})\right]$ for $\beta \geq \underline{\beta}$ and for $p = \underline{p}$ or $\bar{p}$
7.3.3 Stability
this analysis is that we view β as a policy choice. We want to understand
how the stability of different equilibria depends on the level of meritocracy
in matching. The analysis of evolutionary stability will provide us with the
candidates for stability, and stochastic stability with a unique prediction for
every level of meritocracy.
We shall begin by defining the following dynamic game played by agents that
we shall assume act myopically. A large population N = {1, 2, ..., n} plays our
game in continuous time. Let a state of the process be described by p, which
is a proportion of players contributing, while the remaining (1 − p) free-ride.
Let Ω = [0, 1] be the state space.
Evolutionary (bi-)stability
Suppose the two respective population proportions grow according to the fol-
lowing replicator equation (Maynard Smith and Price 1973, Taylor and Jonker
1978, Helbing 1996):
6 We shall speak of evolutionarily stable 'states' here instead of evolutionarily stable 'strategies' because of the asymmetry of the state.
Lemma 5. Given population size n, group size s such that n > s > 1, and rate of return r such that R ∈ (mpcr, 1), there exists a $\underline{\beta} > 0$ below which the only ESS is the free-riding Nash equilibrium. When $\beta > \underline{\beta}$, the free-riding Nash equilibrium remains an ESS and, in addition, the population proportions given by the near-efficient symmetric mixed-strategy Nash equilibrium also constitute an ESS.
Proof. The proof of Lemma 5 follows from the cut-off structure of the ESS as given by the analysis of symmetric mixed-strategy Nash equilibria in Proposition 11 (see Appendix A), which leads to the summary of best replies given by Observations A–C. Denote by $\underline{\beta}$ the necessary meritocracy level in Proposition 11. Observation A implies that the only ESS when $\beta < \underline{\beta}$ is given by the free-riding Nash equilibrium, because there is only one Nash equilibrium. Observation B1 implies that the free-riding Nash equilibrium is also an ESS when $\beta \geq \underline{\beta}$. Observations B1, B2 and C jointly imply that the population proportions given by the near-efficient symmetric mixed-strategy Nash equilibrium also describe an ESS, since it is a local attractor.
Figure 7.1 illustrates the implied replicator phase transitions for proportions
of players contributing as a function of β under meritocratic matching via logit
(Equation 7.4) for s = 4 and r = 1.6 starting with n = 16 (note that the phase
transitions assume the long-run behavior as the population becomes large).
In particular, the figure shows how, for large enough values of β, a relatively small 'jump up' is needed starting at the free-riding equilibrium to reach the basin of attraction of the high-contribution equilibrium. By contrast, for low values of β, a small 'draw down' is sufficient to fall out of the high equilibrium into the free-riding equilibrium.

[Figure 7.1 spans this page: a bifurcation diagram of the contributing proportion p against β, with branches labeled SMSNE ($p = \bar{p}$), SMSNE ($p = \underline{p}$) and FRNE ($p = 0$).]
Figure 7.1: Evolutionary stability of population strategies for an economy initialized with s = 4, r = 1.6 and n = 16. In any case when $\beta < \underline{\beta}$, and, when $\beta > \underline{\beta}$, whenever p is either in excess of the near-efficient symmetric mixed-strategy Nash equilibrium ($p > \bar{p}$) or short of the less-efficient symmetric mixed-strategy Nash equilibrium ($p < \underline{p}$), $\partial p / \partial t < 0$ (the replicator tendency is down). When $\beta > \underline{\beta}$ and $\bar{p} > p > \underline{p}$, then $\partial p / \partial t > 0$ (the replicator tendency is up). Depending on the location along the bifurcation, the evolutionarily stable states are therefore either $p = 0$ (free-riding Nash equilibrium) or p set according to the near-efficient symmetric mixed-strategy Nash equilibrium ($p = \bar{p}$). Solid lines in the figure indicate stable equilibria, dashed lines indicate unstable equilibria.
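A minimal sketch of the kind of dynamics underlying Figure 7.1, assuming the standard two-strategy replicator form $\dot{p} = p(1-p)\,[\pi_C(p) - \pi_D(p)]$ for the share $p$ of contributors; the payoff-gap function used here is a stylized placeholder chosen only to reproduce the qualitative bistable pattern discussed in the text, not one derived from the model.

```python
import numpy as np

def replicator_path(p0, payoff_gap, dt=0.01, steps=5000):
    """Euler integration of the two-strategy replicator dynamic
    dp/dt = p * (1 - p) * payoff_gap(p), where payoff_gap(p) is the expected
    payoff of contributing minus that of free-riding at contributor share p."""
    p = p0
    for _ in range(steps):
        p += dt * p * (1.0 - p) * payoff_gap(p)
        p = min(max(p, 0.0), 1.0)
    return p

# Stylized placeholder: contributing beats free-riding only for intermediate-to-high p,
# giving an unstable interior rest point p_low, a stable one at p_high, and stable p = 0.
def payoff_gap(p, p_low=0.25, p_high=0.85):
    return -(p - p_low) * (p - p_high)

for start in (0.1, 0.3, 0.9):
    print(start, "->", round(replicator_path(start, payoff_gap), 3))
# Starts below p_low fall to 0; starts above p_low converge to p_high ~ 0.85.
```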
Stochastic stability
posite action with probability $\varepsilon$. When both actions are best replies, $i$ replies by playing $c_i^t = c_i^{t-1}$ with probability $1 - \varepsilon$ and $c_i^t = 1 - c_i^{t-1}$ with probability $\varepsilon$.

State. Let a state of the process be defined by $p^t = \frac{1}{n} \sum_{i \in N} c_i^t$.
Let us begin with a couple of observations. First, the perturbed process (when $\varepsilon > 0$) is ergodic, that is, it reaches every state from any state with positive probability in finitely many steps (at most n). The process therefore has a unique stationary distribution over Ω. Second, for any given level of β, the absorbing states of the unperturbed process (when $\varepsilon = 0$) are the various Nash equilibria in pure strategies of the game as identified in section 3.2 (and in particular the free-riding Nash equilibrium and the near-efficient pure-strategy Nash equilibrium).
Critical mass. Let the critical mass $M^p_\beta \in [0, n-1]$ necessary to destabilize state $p$ given $\beta$ be the minimum number of players $|S|$, for some set of players $S \subset N$, who need to switch strategy simultaneously such that, as a result of their switch, playing the current strategy ceases to be a best reply for at least one player in $N \setminus S$.
Proof. When pure strategy Nash equilibria exist, stochastically stable states must be pure strategy Nash equilibria of the unperturbed process. Candidates are the free-riding Nash equilibrium and the $\binom{n}{m}$ near-efficient pure-strategy Nash equilibria.
Obviously, the critical mass for any non-equilibrium state $p$ is $M^p_\beta = 0$ for all values of $\beta$. When $\beta < \underline{\beta}$, there exists no critical mass to destabilize the unique equilibrium, which is the free-riding Nash equilibrium; $M^0_\beta = \emptyset$. In other words, the free-riding Nash equilibrium is the only absorbing state and therefore the unique stochastically stable state. When $\beta = \underline{\beta}$, the near-efficient pure-strategy Nash equilibrium has a critical mass of $M^{\bar{p}}_\beta = 1$. When $\beta > \underline{\beta}$, for all less-efficient $p \geq \underline{p}$, the critical mass is $M^p_\beta = 1$ because one more contribution of some player incentivizes other non-contributors to contribute (see Observations A, B1, B2), or one contribution fewer incentivizes all to not contribute. Moreover, for $\beta > \underline{\beta}$, $\Delta M^0_\beta / \Delta\beta < 0$ and $\Delta M^{\bar{p}}_\beta / \Delta\beta > 0$, provided $\Delta\beta$ is large enough. If $M^{\bar{p}}_1 > M^0_1$ at $\beta = 1$, then, since $M^{\bar{p}}_{\underline{\beta}} < M^0_{\underline{\beta}}$, it must be that there exists a $\bar{\beta} \in (\underline{\beta}, 1)$ above which the near-efficient pure-strategy Nash equilibrium has a larger critical mass than the free-riding Nash equilibrium.
The proof of the lemma is now a direct application of Theorem 3.1 in Young, 1998, and follows from the fact that the resistances of transitions between $p = \bar{p}$ and $p = 0$ are given by the critical masses, thus yielding the stochastic potential for each candidate state.
7.3.4 Welfare
Finally, we turn to our welfare analysis. We shall compare the efficiency and
equality properties of equilibria induced by stochastically stable outcomes un-
der varying meritocracy levels. We use this comparison to assess, given a general
class of social welfare functions, which meritocracy level is welfare-optimal for
a given social planner.
Outcome. Let (ρ, φ) describe an outcome, that is, realized groups and pay-
offs.
Social welfare. Given outcome (ρ, φ), let $W_e(\phi)$ be the social welfare function measuring its welfare given the inequality aversion parameter $e \in [0, \infty)$:

$W_e(\phi) = \frac{1}{n(1-e)} \sum_{i \in N} \phi_i^{1-e} \qquad (7.10)$

When $e = 1$, it is standard that $W_1(\phi) = \frac{1}{n} \prod_{i \in N} \phi_i$, i.e. the Nash product.
cial welfare functions.7 When $e = 0$, expression (7.10) reduces to $W_0(\phi) = \frac{1}{n} \sum_{i \in N} \phi_i$, i.e. a Utilitarian social welfare function measuring the state's efficiency.
Which equilibrium is preferable in terms of social welfare for any given social
welfare function depends on the social planner’s relative weights on efficiency
and equality and is related to whether an ex ante or an ex post view is taken
with regards to payoff dominance (Harsanyi and Selten, 1988a).9 Critical
for this assessment is the inequality aversion e. For the economy illustrated
in Table 7.1 (with n = 16, s = 4 and r = 1.6), suppose a social planner
considers moving from β = 0 to β = 1. To assess this, he makes an ex-post
$W_e$-comparison. It turns out that for any $W_e$ with $e < 10.3$ he prefers the near-efficient pure-strategy Nash equilibrium, while for a $W_e$ with $e \geq 10.3$ he
prefers the free-riding Nash equilibrium.10
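The comparison can be reproduced directly from the payoff distributions in Table 7.1: the short sketch below evaluates (7.10) for both equilibria across inequality-aversion levels and shows the preference flipping to the free-riding equilibrium for $e$ above roughly 10.3, as stated in the text.

```python
import numpy as np

def welfare(phi, e):
    """Social welfare W_e(phi) from eq. (7.10); the e = 1 case is the Nash product."""
    phi = np.asarray(phi, dtype=float)
    n = len(phi)
    if np.isclose(e, 1.0):
        return np.prod(phi) / n
    return np.sum(phi ** (1.0 - e)) / (n * (1.0 - e))

# Payoff distributions of the Table 7.1 economy (n = 16, s = 4, r = 1.6)
near_efficient = np.array([1.6] * 12 + [0.8] * 2 + [1.8] * 2)  # beta = 1 equilibrium
free_riding = np.ones(16)                                       # beta = 0 equilibrium

for e in (0.0, 1.0, 5.0, 10.0, 10.5, 15.0):
    better = "near-efficient" if welfare(near_efficient, e) > welfare(free_riding, e) else "free-riding"
    print(f"e = {e:5.1f}: preferred equilibrium = {better}")
# The preference switches to the free-riding equilibrium for e above roughly 10.3.
```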
stochastically stable states. Moreover, assume the social planner expects the near-efficient pure-strategy Nash equilibrium (here denoted by $\bar{p}$) to be played when $M^0_\beta = M^{\bar{p}}_\beta$ (both are stochastically stable).
Proposition 8. For any $R > \max\{\text{mpcr}, 1/(s-1)\}$, there exists a population size $n < \infty$ such that $E[W_e(\phi); \beta = \bar{\beta}] > E[W_e(\phi); \beta]$ at "sufficient meritocracy" ($\beta = \bar{\beta}$) for all $\beta \neq \bar{\beta}$, given any parameter of inequality aversion $e \in [0, \infty)$.

Proof. Suppose there exists a $\bar{\beta} \in (\underline{\beta}, 1)$ above which the near-efficient pure-strategy Nash equilibrium is stochastically stable. Write $q_1^n$ for the probability of having more than one free-rider in any group for a realized outcome (ρ, φ) given $n < \infty$. Since the number of free-riders does not increase as $n$ increases, $\partial q_1^n / \partial n < 0$. Since contributors in groups with at most one free-rider receive a payoff strictly greater than one ($(s-1)R > 1$), we have $E[W_e(\phi); \bar{\beta}] > (1 - q_1^n) \times W_e(\phi_i = (s-1)R\ \forall i)$. Because, given any $\beta < 1$, $\partial q_1^n / \partial n < 0$, there therefore exists an $n < \infty$ above which $E[W_e(\phi)] > W_e(\phi_i = 1\ \forall i)$.

Remark 9. $E[W_e(\phi); \beta = \bar{\beta}] > E[W_e(\phi); \beta]$ at "sufficient meritocracy" ($\beta = \bar{\beta}$) also holds for $n$ smaller than implied by the proposition when (a) $e$ is set below some finite bound and/or (b) $R$ is set above some bound greater than $1/(s-1)$.
7.3.5 Summary
In our analysis, we have addressed three issues. First, we assessed the robust-
ness of equilibrium predictions for meritocracy levels everywhere in between
“no meritocracy” and “full meritocracy”. We found that the minimum mer-
itocracy threshold (“necessary meritocracy”) that may enable equilibria with
high contributions decreases with the population size, the number of groups
and with the rate of return. Second, we analyzed the stability properties
Table 7.1: Individual payoffs (stem-and-leaf style) for the free-riding Nash equilibrium when β = 0 and for the near-efficient pure-strategy Nash equilibrium when β = 1, with n = 16, s = 4 and r = 1.6.

  Near-efficient PSNE (β = 1)                  Payoff   Free-riding NE (β = 0)
  2 players (c_i = 1), ranks 13-14               0.8     --
  --                                             1.0     16 players (c_i = 0), ranks 1-16
  12 players (c_i = 1), ranks 1-12               1.6     --
  2 players (c_i = 0), ranks 15-16               1.8     --
  Efficiency                                    24.4     16

The stem of the table is the payoff level; the leaves are the number of players receiving that payoff (with their contribution decision) and the individual ranks of players corresponding to payoffs in the two equilibria (payoff levels at which no player falls are omitted). At the bottom, the efficiencies of the two outcomes are reported. Note that the near-efficient pure-strategy Nash equilibrium is more efficient, whereas the free-riding Nash equilibrium is more equitable.
of the equilibria. It turned out that there exists a second threshold ("sufficient meritocracy") between "necessary meritocracy" and "full meritocracy", above which the high-contribution equilibrium is stable and below which the zero-contribution equilibrium is stable. Qualitatively, the same comparative statics apply to this
second threshold as with respect to the first. Third, we assessed the relative
welfare properties of the candidate stable equilibria to identify, given varying
degrees of inequality aversion, the uniquely welfare-maximizing regime. We
found that setting meritocracy at “sufficient meritocracy” maximizes welfare
for any inequality-averse social welfare function when the population is large
enough. Group size does not matter. For smaller populations, the same result
holds if (a) the inequality aversion is not extreme and (b) the rate of return is
high. Only for extremely inequality-averse social planners should efficiency be
sacrificed and meritocracy be set to zero.
7.4 Theoretical predictions
There are two reasons why a social planner in our model should generally go
for an intermediate level of meritocracy. First, compared with no meritocracy,
levels of meritocracy above a first threshold we termed “necessary meritoc-
racy” gain a lot of efficiency. Second, compared with even higher levels of
meritocracy, marginally less meritocracy gains (a lot of) equality without los-
ing (much) efficiency. Ideally, the social planner would therefore like to reduce meritocracy down to the level of necessary meritocracy, but the stability requirement forces him, in general, to settle at the "sufficient meritocracy" level.
Our findings seem to contradict the general social choice theory wisdom that
meritocracy leads to inequality. The reason for this contradiction is that, on
the one hand, we focus on situations that are strictly non-constant sum, and,
on the other hand, that we do neither consider repeated game effects such as
inheritance, wealth, or reputation, nor do we allow for heterogeneity in the
population. The former is a crucial feature of our model and a fundamental
difference in environments compared to what is usually considered. It has im-
portant implications regarding the role of meritocracy. The latter restrictions
come with serious loss of generality. It is an avenue left for future research to
enrich our model to allow for such features and to evaluate their welfare conse-
quences. In a way, the purpose of this paper was to “resurrect meritocracy” in
a specific interactive setting where it represents an almost unambiguously ben-
eficial mechanism. We view this as a first step toward a much larger research
agenda that aims at a more subtle assessment of meritocracy than recently
voiced perceptions culminating in statements like “the meritocracy of capital-
ism is a big, fat lie”.11
11 This is how The Guardian's Heidi Moore summarizes Thomas Piketty's bestseller book on inequality in her article "Thomas Piketty is a rock-star economist – can he re-write the American dream?" (April 27, 2014).
Appendices
Proposition 10. For any population size n > s, group size s > 1, rate of
return r ∈ (1, s), and meritocratic matching factor β ∈ [0, 1], there always
exists a free-riding Nash equilibrium such that all players free-ride.
The proof of Proposition 10 follows from the fact that, given any β and for c−i such that Σ_{j≠i} cj = 0, we have

E φi (0 | 1^0_{−i}) ≥ E φi (1 | 1^0_{−i}),

where 1^0 stands for "all players free-ride" and 1^0_{−i} for the same statement excluding player i, and that mpcr = (n − s + 1)/(ns − s² + 1).
Proposition 11. Given population size n > s, group size s > 1 and rate
of return r such that R ∈ (mpcr, 1), there exists a necessary meritocracy
level, β ∈ (0, 1), above which there is a pure-strategy Nash equilibrium,
where m > 0 agents contribute and the remaining n − m agents free-ride.
Proof. The following two conditions must hold for Proposition 11 to be true:

E φi (1 | 1^m_{−i}) ≥ E φi (0 | 1^{m−1}_{−i})        (7.12)

E φi (0 | 1^m_{−i}) ≥ E φi (1 | 1^{m+1}_{−i})        (7.13)
The proof for the existence of an equilibrium in which some appropriate (pos-
itive) number of contributors m exists for the case when β = 1 and R ≥ mpcr
follows from Theorem 1 in Gunnthorsdottir et al., 2010, in which case both
equations (7.12) and (7.13) are strictly satisfied.
The fixed point argument behind that result becomes clear by inspection of
terms (ii) and (iii) in expression (7.5): namely, the decision to contribute
rather than to free-ride is a trade-off between (ii), ‘the sure loss on own con-
tribution’, which is zero for free-riding, versus (iii), ‘the expected return on
others’ contributions’, which may be larger by contributing rather than by
free-riding depending on how many others also contribute. Obviously, when
c−i is such that Σ_{j≠i} cj = 0 or Σ_{j≠i} cj = n − 1 (i.e. if either all others free-
ride or all others contribute), it is the case that φi (0|c−i ) > φi (1|c−i ). Hence,
in equilibrium, 0 < m < n.
Since

∂E φi (1 | 1^m_{−i}) / ∂β > 0        (7.14)

and

∂E φi (0 | 1^m_{−i}) / ∂β < 0,        (7.15)

by existence of the equilibrium with m > 0 contributors when β = 1, provided that R > mpcr is satisfied, there must exist some maximum value of β ∈ (0, 1) at which either equation (7.12) or equation (7.13) first binds, due to continuity of expressions (7.14) and (7.15) in β. That level is the bound on β above which the pure-strategy Nash equilibrium with m > 0 exists.
Remark 12. Note that, for a finite population of size n, a group size s larger
than one implies that mpcr > 1/s for Proposition 11 to be true, but as n → ∞,
mpcr converges to 1/s.12

12 It is easy to check that limn→∞ mpcr = 1/s.
Now we shall compare the asymmetric equilibria in pure strategies (in partic-
ular the near-efficient pure-strategy Nash equilibrium) with symmetric mixed-
strategy Nash equilibria. For this, we define pi ∈ [0, 1] as a mixed strategy with
which player i plays ‘contributing’ (ci = 1) while playing ‘free-riding’ (ci = 0)
with (1 − pi ). Write p = {pi }i∈N for a vector of mixed strategies. Write 1p
for “all players play p”, and 1p−i for the same statement excluding some player
i.
Proposition 13. Given population size n > s and group size s > 1, there
exists a rate of return r such that R ∈ [mpcr, 1) beyond which there exists
a necessary meritocracy level, β ∈ (0, 1), such that there always are two
mixed strategy profiles, where every agent places weight p > 0 on contribut-
ing and 1 − p on free-riding, that constitute symmetric mixed-strategy
Nash equilibria. One has a high p (the near-efficient symmetric
mixed-strategy Nash equilibrium) and one has a low p (the less-efficient
symmetric mixed-strategy Nash equilibrium).
Proof. It suffices to find p ∈ (0, 1) such that

E φi (0|1p−i ) = E φi (1|1p−i ),        (7.16)

because, in that case, player i has a best response also playing pi = p, guaran-
teeing that 1p is a Nash equilibrium. Proposition 11 implies that, if R > mpcr,
equations (7.12) and (7.13) are strictly satisfied when β = 1 for m contributors
corresponding to the near-efficient pure-strategy Nash equilibrium. Indeed, ex-
pressions (7.12) and (7.13) imply lower and upper bounds (see Gunnthorsdottir
et al. 2010) on the number of free-riders given by
l = (n − nR) / (1 − R + nR − r),        u = 1 + (n − nR) / (1 − R + nR − r).        (7.17)
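As a worked numerical illustration (ours, under the reading that R denotes the game's marginal per capita rate of return r/s): for the parameters of Table 7.1, n = 16, s = 4, r = 1.6 and hence R = 0.4, the bounds give l = (16 − 6.4)/(0.6 + 6.4 − 1.6) ≈ 1.78 and u ≈ 2.78, so the number of free-riders in the near-efficient pure-strategy Nash equilibrium must equal 2, which matches the two free-riders (ranks 15 and 16) in Table 7.1.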
Part 1. First, we will show, given any game with population size n and group
size s, for the case when β = 1, that there is (i) at least one symmetric mixed-
strategy Nash equilibrium when R → 1; (ii) possibly none when R = mpcr;
and (iii) a continuity in R such that there is some intermediate value of R ∈
[mpcr, 1) above which at least one symmetric mixed-strategy Nash equilibrium
exists but not below.
(i) Because ∂E φi (ci |1p−i ) /∂p > 0 for all ci , there exists a p ∈ ((m − 1)/n, (m + 1)/n)
such that expression (7.16) holds if R → 1. This is the standard symmetric
mixed-strategy Nash equilibrium, which always exists in a symmetric two-
action n-person game where the only pure-strategy equilibria are asymmetric
and of the same kind as the near-efficient pure-strategy Nash equilibrium (see
the proof of Theorem 1 in Cabral 1988). In this case, the presence of the free-
riding Nash equilibrium makes no difference because the incentive to free-ride
vanishes as R → 1.
(ii) If R = mpcr, one or both of the equations, (7.12) or (7.13), bind. Hence,
unless expression (7.16) holds exactly at p = m/n (which is a limiting case
in n that we will address in proposition 16), there may not exist any p such
that expression (7.16) holds. This is because the Binomially distributed proportion of contributors implied by p places, relatively speaking, more weight on the incentive to free-ride than on the incentive to contribute: universal free-riding is consistent with the free-riding Nash equilibrium, while universal contribution is not a Nash equilibrium. In this case, the incentive to free-ride is too large
for a symmetric mixed-strategy Nash equilibrium to exist.
(iii) ∂E φi (ci |1p−i ) /∂r is a different linear, positive constant for ci = 0 and for ci = 1, so that, by continuity, there exist some β < 1 and p′ < p satisfying equation (7.16) while still satisfying E φi (0|1p−i ) = E φi (1|1p−i ) > 1. Note that the implicit bound here may be different from that in Proposition 11.
Part 2. If R > mpcr and β lies above the necessary meritocracy level, the existence of two equilibria, one with a high and one with a low mixing probability (both strictly positive), is shown by analysis of the comparative statics of equation (7.16).
First note that, for any R > mpcr and any β above the necessary meritocracy level, ∂E φi (0|1p−i ) /∂β < 0 while ∂E φi (1|1p−i ) /∂β > 0. p therefore has to take different values for equation (7.16) to hold for two different values of β above the threshold. It is unclear, a priori, whether it has to take a higher or lower value. Note also that both ∂E φi (0|1p−i ) /∂p > 0 and ∂E φi (1|1p−i ) /∂p > 0 for all β ∈ (0, 1). We can rearrange the partial derivatives of equation (7.16) to obtain Equation 7.18.
Claim 14. The denominator of Equation 7.18 is negative when p is low, and
positive when p is high.
Write w̄i^ci and w̲i^ci respectively for the probabilities with which agent i is matched in an above- or below-average group when playing ci, where the average is taken over contributions excluding i. Write Ē φi (ci |1p−i ) and E̲ φi (ci |1p−i ) for the corresponding expected payoffs.
Recall that, for β > 0 and 1p−i ∈ (0, 1), Expression 7.2 holds, where k̂ is compatible with a perfect ordering π̂, and k̃ is any rank compatible with a mixed ordering π̃. When 1p−i = 0 or 1p−i = 1, the probability of agent i to take rank j, fij^β, depends on his choice of ci, but w̄i^ci = w̲i^ci = 0 for any choice of contribution ci.
For p ∈ (0, 1), we shall rewrite ∂E φi (0|1p−i ) /∂p in the denominator of Equation 7.18 as

∂/∂p [ w̄i^0 ∗ Ē φi (0|1p−i ) ] + ∂/∂p [ w̲i^0 ∗ E̲ φi (0|1p−i ) ]        (7.19)

and, correspondingly, ∂E φi (1|1p−i ) /∂p as

∂/∂p [ w̄i^1 ∗ Ē φi (1|1p−i ) ] + ∂/∂p [ w̲i^1 ∗ E̲ φi (1|1p−i ) ].        (7.20)

Figure 7.2: Expected payoffs of contributing versus free-riding if all others play
p and the meritocratic matching fidelity is β, for the economy with n = 16,
s = 4, r = 1.6. (Axes: proportion of contributors, meritocratic matching
fidelity, and expected value of contributing and of free-riding.)
Notice that, for large β, w̄i^0 ≪ w̲i^0 when p is close to zero, and w̄i^1 ≫ w̲i^1 when p is close to one. Moreover, notice that the existence of the pure-strategy Nash equilibrium with high contributions for high levels of β ensures that E φi (0|1p−i ) is not always larger than E φi (1|1p−i ). It therefore follows that Expression 7.20 exceeds Expression 7.19 when p is low, and that Expression 7.19 exceeds Expression 7.20 when p is high; hence the denominator of Equation 7.18 is negative when p is low and positive when p is high. Figure 7.3 illustrates.

Figure 7.3: Expected payoffs of contributing versus free-riding if all others play
p and the meritocratic matching fidelity β lies above the necessary meritocracy
level. (Horizontal axis: p from 0 to 1; payoff levels marked on the vertical axis:
R, 1, sR and sR + (1 − R).)
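The crossing pattern behind this argument can also be eyeballed numerically. The following Monte Carlo sketch (our own illustration; budget normalized to 1, β = 1, and ties in the ranking broken uniformly at random, which is an assumption not spelled out above) estimates E φi (1|1p−i ) and E φi (0|1p−i ) for a few values of p in the n = 16, s = 4, r = 1.6 economy of Figure 7.2.

import random

# Monte Carlo sketch (illustration only): expected payoff of contributing vs
# free-riding when all n-1 others contribute with probability p (beta = 1).
n, s, r = 16, 4, 1.6

def expected_payoffs(p, trials=20000):
    totals = {0: 0.0, 1: 0.0}
    for ci in (0, 1):
        for _ in range(trials):
            others = [1 if random.random() < p else 0 for _ in range(n - 1)]
            contribs = others + [ci]          # player i is index n - 1
            order = sorted(range(n), key=lambda k: (-contribs[k], random.random()))
            g = order.index(n - 1) // s       # group of player i
            group = order[g * s:(g + 1) * s]
            totals[ci] += (1 - ci) + (r / s) * sum(contribs[k] for k in group)
    return totals[0] / trials, totals[1] / trials

for p in (0.1, 0.5, 0.9):
    e_free, e_contrib = expected_payoffs(p)
    print(f"p = {p:.1f}: E[free-ride] = {e_free:.3f}, E[contribute] = {e_contrib:.3f}")

For intermediate p the expected payoff of contributing can exceed that of free-riding, which is what generates the two crossing points, i.e. the two symmetric mixed-strategy equilibria, depicted in Figures 7.2 and 7.3.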
Proof. Suppose R > mpcr, i.e. that both symmetric mixed-strategy Nash
equilibrium and near-efficient pure-strategy Nash equilibrium exist. Let 1m
describe the near-efficient pure-strategy Nash equilibrium and 1p describe the
near-efficient symmetric mixed-strategy Nash equilibrium. Recall that expres-
sions under (7.17) summarize the lower and upper bound on the number of
free-riders, (n − m), in the near-efficient pure-strategy Nash equilibrium. Taking limn→∞ for those bounds implies a limit lower bound on the expected proportion of free-riders of 1/(1 + n(R − r/n)/(1 − R)), and a limit upper bound of 1/n + 1/(1 + n(R − r/n)/(1 − R)),
and thus bounds on the number of free-riders that contain at most two inte-
gers and at least one free-rider. (Notice that the limits imply that exactly one
person free-rides as R → 1.) We know that, if there is one more free-rider than
given by the upper bound, then equation (7.13) is violated. Similarly, if there
is one fewer free-rider than given by the lower bound, then equation (7.12) is
violated.
We can rewrite E φi (ci |1p−i ) as E[φi (ci |B)], where B is the number of other contributors.
Remark 17. In light of the limit behavior, it is easy to verify, ceteris paribus,
that the value of the marginal per capita rate of return necessary to ensure
existence of the symmetric mixed-strategy Nash equilibrium is decreasing in
population size n, but increasing in group size s; i.e. increasing in relative
group size s/n.
Part 2: Experiments
Abstract
7.5 The efficiency-equality tradeoff
Making policy decisions often requires tradeoffs between different goals. One
of the most fundamental tradeoffs is that between efficiency and equality. The
basic idea of institutional meritocracy (Young, 1958b) is to devise a system
of rewards that “is intended to encourage effort and channel it into socially
productive activity. To the extent that it succeeds, it generates efficient econ-
omy. But that pursuit of efficiency necessarily creates inequalities. And hence
society faces a tradeoff between equality and efficiency.” (Arthur M. Okun,
Equality and efficiency, the big tradeoff, The Brookings Institution, 1975, p.
1.)
One could argue that inherent to this statement is the view that societal activ-
ity can be modeled in the language of game theory as a public-goods provision/
voluntary contributions game (Isaac, McCue, and Plott, 1985b; Ledyard, 1997;
Chaudhuri, 2011b). In the baseline model, voluntary contributions games cre-
ate no incentives for contributors and universal free-riding is the only stable
equilibrium (Nash, 1950). In such a setting, the “tragedy of the commons”
cannot be circumvented (Hardin, 1968). However, even if this outcome is max-
imally inefficient, one positive thing about it is that it comes with a very high
degree of equality (at the cost of low average payoffs). For this reason, the out-
come of universal free-riding has been controversially associated with extreme
forms of socialism (Mises, 1922; Hayek, 1935). Fortunately, an array of mech-
anisms exists with the potential to foster contributions to public goods. One
such mechanism that has been extensively studied in the literature is punish-
ment (Fehr and Gächter, 2000; Ledyard, 1997; Chaudhuri, 2011b). However,
mechanisms such as punishment tend to be “leaky buckets” (Okun, 1975), in
the sense that some of the efficiency gains generated by the increase in contri-
butions are spent in order to uphold them (e.g. on punishment costs).
of inequality.14 The contrast between these two outcomes is well illustrated by
the tensions that would exist between an ideal Benthamite (utility-maximizing)
social planner, on the one hand, and an ideal Rawlsian (inequality-minimizing)
social planner on the other: in many games, the Benthamite (Bentham, 1907)
would strictly favor perfect action-assortativity, while the Rawlsian (Rawls,
1971) would rather prefer complete non-assortativity. In comparison, a real-
world social planner typically exercises a certain degree of ‘inequality aversion’,
aiming for an outcome between these two extremes (Atkinson, 1970).
the selection of a certain degree of meritocracy. This tradeoff is at the heart
of social choice theory (see e.g. (Arrow, 1951; Sen, 1970; Arrow, Bowles, and
Durlauf, 2000b)) and welfare economics (see e.g. (Samuelson, 1980; Feldman,
1980; Atkinson, 2012)). Zero meritocracy represents maximal equality, but also
minimal efficiency; full meritocracy represents the opposite. For any degree of
inequality aversion away from the two extremes (given by (Bentham, 1907)
and (Rawls, 1971)), there exists, at least in theory, an intermediate degree of
meritocracy that maximizes social welfare (Nax, Murphy, and Helbing, 2014).
Unfortunately, this is a difficult tradeoff as the buckets are leaky in both direc-
tions: reducing meritocracy increases equality at the expense of efficiency, and
increasing meritocracy increases efficiency at the expense of equality.
In this paper we set out to test this tradeoff experimentally by analysis of inter-
mediate regimes of meritocracy. We are thus the first to bridge the rich experi-
mental literature on public-goods games under random interactions (zero mer-
itocracy) (Ledyard, 1997; Chaudhuri, 2011b) with the more recent literature
on full meritocracy (group-based mechanisms) (Gunnthorsdottir et al., 2010;
Rabanal and Rabanal, 2010). The experiments reveal that the strict trade-
off implied by theory is dissolved in practice. Higher degrees of meritocracy
turn out to increase welfare for any symmetric and additive objective function
(Atkinson, 1970), including Benthamite utility maximization (Bentham, 1907)
and Rawlsian inequality minimization (Rawls, 1971). In other words, meritoc-
racy increases both efficiency and equality, leading to unambiguous welfare
improvements as we illustrate for a variety of measures. We argue that the
dissolution of the tradeoff is driven by the agents' distaste for 'meritocratic'
unfairness, and by the corrections to their actions that these considerations im-
ply. The view of fairness that we adopt and test here generalizes the concept
of distributive fairness/ inequity aversion (Fehr and Schmidt, 1999; Ockenfels
and Bolton, 2000) to settings with positive levels of meritocracy. This fair-
ness definition is a game-theoretic application of a notion related to systemic
fairness (Adams, 1965; Greenberg, 1987), which has been long recognized in
organizational theory, but not previously applied to game theory (and the
problem of public-goods provision in particular). The patterns associated with
reactions to between-group comparisons, however, have been noted as robust
phenomena without being interpreted as driven by norms of fairness (Bohm
and Rockenbach, 2013).
An agent A perceives the outcome of the game "unfair" if another agent B contributed less than
A, but B was placed in a better group. As a consequence, agent A is
assumed to respond by decreasing his/her contribution.
A player i who is matched in a group Gi with other players j ≠ i receives:

φi (c) = (B − (1 − m) ∗ ci ) + m ∗ Σ_{j∈G−i} cj ,        (7.21)

where m represents the marginal per capita rate of return, and G−i indicates the members of group Gi excluding i; the first term collects the return from the private account (together with the return on the agent's own group-account contribution), and the second term is the return from the other group members' contributions.
NOTE that the game is equivalent to play under the group-based mechanism
(here, ‘perfect meritocracy’) (Gunnthorsdottir et al., 2010) if σ 2 = 0, and
that the case of σ 2 → ∞ corresponds to random re-matching (here, ‘zero
meritocracy’) (Andreoni, 1988).
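To make the mechanism concrete, the following sketch (our own illustration) simulates one round of noisy meritocratic matching for the experimental parameters n = 16, s = 4, B = 20 and m = 0.5: noise with variance σ² (taken to be Gaussian here; the text above specifies only the variance) is added to each contribution, players are ranked by the noisy values, groups of s are formed down the ranking, and payoffs follow Eq. (7.21). Setting σ² = 0 reproduces the group-based mechanism, while a very large σ² approximates random re-matching.

import random

def play_round(contributions, sigma2, B=20, m=0.5, s=4):
    """One round of noisy meritocratic matching (illustrative sketch).

    Noise with variance sigma2 (Gaussian here, as an assumption) is added to
    each contribution; players are ranked by the noisy values and grouped in
    blocks of s; payoffs follow Eq. (7.21)."""
    n = len(contributions)
    noisy = [c + random.gauss(0, sigma2 ** 0.5) for c in contributions]
    ranking = sorted(range(n), key=lambda i: -noisy[i])
    payoffs = [0.0] * n
    for g in range(n // s):
        group = ranking[g * s:(g + 1) * s]
        group_total = sum(contributions[i] for i in group)
        for i in group:
            # Private-account return plus group-account return, Eq. (7.21).
            payoffs[i] = (B - contributions[i]) + m * group_total
    return payoffs

# Example: 14 full contributors and 2 free-riders under LOW-MERIT (sigma2 = 20).
random.seed(1)
print(play_round([20] * 14 + [0] * 2, sigma2=20))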
Equilibrium play
To highlight the structure of the Nash equilibria (Nash, 1950) for this class of
games, it is useful to evaluate the expected payoff E[φi (c)] at the decision
stage, i.e. before groups are formed. In Eq. (7.21), the first
term, i.e. the private-account return, is completely determined by the agent’s
contribution choice. The second term, i.e. the group-account return, however,
depends on the players’ contributions in a probabilistic way. In the case of zero
meritocracy (i.e. random re-matching) (σ 2 = ∞), E [φi (c)] is strictly decreasing
in the player’s own contribution because the marginal per capita rate of return
is less than one. Under zero meritocracy, the player’s own contribution has
no effect on group matching, and, therefore, the only equilibrium is universal
free-riding. Conversely, for positive levels of meritocracy, the player’s contribu-
tion choice influences the probability of being ranked in a high group. Hence,
making a positive contribution is a tradeoff between the sure loss on the own
contribution and the promise of a higher return from the group-account. How-
ever, the chances of being ranked in a better group are decreasing with growing
variance. As a result, new Nash equilibria with positive contribution levels may
emerge: indeed, Nax et al., 2013 generalize the results of Gunnthorsdottir et al., 2010, showing that, if the level of meritocracy is sufficiently high and r satisfies a suitable bound, there exist near-efficient pure-strategy Nash equilibria in which a large majority of players contributes the full budget B
and a small minority of players contributes nothing.15
We first ran pre-tests on Amazon Mechanical Turk (MTurk) with a total of 256 participants, using our new NodeGame software. Details about the experiment can be found in Appendix B. In each session, all participants played games with different variance levels, drawn from σ2 = {0, 2, 4, 5, 10, 20, 50, 100, 1000, ∞}. For all variance levels below σ2 = 100, the
near-efficient Nash equilibria exist in the stage game. For higher variance levels,
the free-riding Nash equilibrium is the unique Nash equilibrium of the stage
game.
Each game was repeated for 25 (or 20) successive rounds. We evaluated the
variance level at which the mechanism started (i) to differ from the contribution levels implied by the near-efficient Nash equilibria under σ2 = 0, and (ii) to fail to stabilize; we found these variance levels to be (i) σ2 = 3 and (ii) σ2 = 3. Appendix C contains details. Hence, we settled for the following four variances for our laboratory experiment: σ2 = {0, 3, 20, ∞}. We use the following terminology. For σ2 = 0 we use PERFECT-MERIT, and for σ2 = ∞ we use NO-MERIT. For the intermediate values we use HIGH-MERIT (σ2 = 3) and LOW-MERIT (σ2 = 20).
NOTE that in the case of these four levels of variance tested in this study, the
predicted stage-game Nash equilibria are as follows. For σ 2 = ∞ (NO-MERIT),
the unique stage-game Nash equilibrium is universal free-riding, which is also a
Nash equilibrium for all the other variance levels. For σ 2 = {0, 3, 20}, moreover,
there exist (n choose 2) alternative pure-strategy equilibria where exactly two players
free-ride while all others contribute fully. Details on equilibria can be found in
Appendix A.
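As a sanity check on these equilibrium claims, the following sketch (our own illustration, for σ² = 0 with B = 20, m = 0.5, and ties in the ranking broken uniformly at random) verifies that, in a profile where two players free-ride and fourteen contribute fully, neither a contributor nor a free-rider gains from a unilateral deviation.

n, s, B, m = 16, 4, 20, 0.5

def expected_payoff(own, n_contrib_others):
    """Exact expected payoff under perfect meritocracy (sigma^2 = 0) when
    n_contrib_others of the other 15 players contribute B (the rest 0) and
    the focal player contributes 'own'. With ties broken uniformly at random,
    the focal player is equally likely to occupy any rank within the block of
    players making the same contribution."""
    ranked = sorted([B] * n_contrib_others + [0] * (n - 1 - n_contrib_others) + [own],
                    reverse=True)
    tied_positions = [k for k, c in enumerate(ranked) if c == own]
    total = 0.0
    for pos in tied_positions:
        group = ranked[(pos // s) * s:(pos // s) * s + s]
        total += (B - own) + m * sum(group)
    return total / len(tied_positions)

# Near-efficient profile: 14 contributors, 2 free-riders.
print(expected_payoff(B, 13), ">=", expected_payoff(0, 13))   # contributor stays put
print(expected_payoff(0, 14), ">=", expected_payoff(B, 14))   # free-rider stays put

Both comparisons hold (roughly 37.1 against 30 for a contributor, and 40 against 38 for a free-rider), consistent with the equilibria described above and detailed in Appendix A.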
7.6.3 The laboratory experiment
Fig. 7.4 illustrates how the laboratory results fit in with the MTurk pre-
tests.
to determine the agents’ sensitivity to changes in meritocracy levels.
7.7.1 Efficiency
HIGH-MERIT to an increase of 8.1964 (LRT: χ(1) = 17.48, P < 0.0001), and
PERFECT-MERIT to an increase of 8.8287 (LRT: χ(1) = 16.22, P < 0.0001).
These levels correspond to roughly double those of NO-MERIT. Computing the
most conservative (Bonferroni) adjusted p-values on all pair-wise differences re-
veals that the treatment with variance ∞ is significantly different (P < 0.0001)
from the other three variance levels σ 2 = {0, 3, 20}, which are themselves not
significantly different from each other.
Figure 7.6: Analysis of efficiency based on smoothed distributions
of average payoffs over 40 rounds for perfect-, high-, low-, and
no-meritocracy, respectively associated with the values of σ 2 =
{0, 3, 20, ∞}. Efficiency, measured as average payoff, increases as meritocracy
increases. Black solid lines indicate the mean payoff as implied by the respec-
tive payoff-dominant Nash equilibria, red solid lines indicate the mean payoff
observed in the experiment, red-shaded areas indicate the 95%-confidence in-
tervals of the mean. Blue dots indicate the payoff of the worst-off player (note
that the worst-off player in every equilibrium receives twenty ‘coins’).
7.7.2 Equality
Recall the theory prediction from Nax, Murphy, and Helbing, 2014 that equi-
libria supported by higher meritocracy levels feature more inequality in the
distribution of payoffs. In this section, we shall show that laboratory evidence
yields diametrically opposite results; namely, higher meritocracy levels lead to
outcomes that are more equal in terms of payoff distributions.
One can identify two measures of payoff inequality directly from the mo-
ments of the payoff distribution: (i) the payoff of the worst-off (Rawls, 1971),
φ̲ = min{φi }, and (ii) the variance of payoffs, σ²_φ = (1/n) Σ_{i∈N} (φi − φ̄)², where φ̄ denotes the mean payoff. A more sophis-
ticated third alternative is (iii) the Gini coefficient. In terms of all measures,
our analysis shows that equality increases with meritocracy. Note that the fol-
lowing results are also robust to other measures of inequality (Cowell, 2011)
(see appendix).
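As a concrete illustration of these measures (our own sketch), the following code computes the worst-off payoff, the payoff variance and the Gini coefficient for the two stage-game equilibrium payoff vectors reported in the Materials and Methods appendix for σ² = 0: universal free-riding and the near-efficient profile.

def gini(payoffs):
    """Gini coefficient of a payoff vector (0 = perfect equality)."""
    x = sorted(payoffs)
    n = len(x)
    weighted_sum = sum((k + 1) * v for k, v in enumerate(x))
    return (2 * weighted_sum) / (n * sum(x)) - (n + 1) / n

def variance(payoffs):
    mean = sum(payoffs) / len(payoffs)
    return sum((v - mean) ** 2 for v in payoffs) / len(payoffs)

# Stage-game equilibrium payoffs at sigma^2 = 0 (see Materials and Methods):
free_riding = [20] * 16                  # everyone earns 20
near_efficient = [40] * 14 + [20] * 2    # fourteen earn 40, two earn 20

for name, vec in (("free-riding", free_riding), ("near-efficient", near_efficient)):
    print(name, "worst-off =", min(vec),
          "variance =", round(variance(vec), 2),
          "Gini =", round(gini(vec), 3))

In equilibrium, the worst-off payoff is 20 in both cases, while variance and Gini are higher in the near-efficient profile; the experimental finding reported below is that realized inequality nevertheless falls as meritocracy rises.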
Fig. 7.7 shows that, like efficiency, equality also increases from σ 2 = ∞ (NO-
MERIT) through σ 2 = {20, 3} to σ 2 = 0 (PERFECT-MERIT). These in-
creases are reflected by differences in the Gini coefficient, and by the order of
the payoff of the worst-off – Rawlsian inequality. Under NO-MERIT, equality
is significantly below the level implied by equilibrium. For all three positive
levels of meritocracy, equality is above that achieved by NO-MERIT and above
the theoretically implied levels. Details about the statistical tests can be found
in the Statistical Analysis section of the Materials and Methods.
Figure 7.7: Level of payoff equality for perfect-, high-, low- and
no-meritocracy, respectively associated with the values of σ 2 =
{0, 3, 20, ∞}. Inequality, measured by the variance of payoff and by the Gini
coefficient, decreases as meritocracy increases. Left panel: Smoothed distribu-
tions of average payoffs over 40 rounds. Black solid lines indicate the variance
of the payoffs as given by the respective payoff-dominant Nash equilibria, red
solid lines indicate the mean variance observed in the experiment, red-shaded
areas indicate the 95%-confidence intervals of the mean variance. Right panel:
Average Gini coefficient of the distribution of payoffs with 95%-confidence in-
tervals. Black solid lines and red dots indicate the Gini coefficient implied
by the equilibrium (without fairness considerations).
7.7.3 Fairness
vantageous unfairness has an accentuated negative effect on a player’s utility,
while advantageous unfairness has a negative but weaker effect. This gain-loss
asymmetry is of course related to some of the most robust findings in exper-
imental economics (Kahneman and Tversky, 1979; Tversky and Kahneman,
1991; Erev, Ert, and Yechiam, 2008). The consequences of the distaste for un-
fairness are such that, on average, a player responds by decreasing (increasing)
his/her contribution after experiencing disadvantageous (advantageous) un-
fairness (Fehr and Schmidt, 1999; Ockenfels and Bolton, 2000). Importantly,
the tendency to decrease is stronger than the tendency to increase due to the
asymmetry in distastes. The typical contribution pattern found in repeated
public goods experiments (intermediate contribution levels at the beginning,
followed by a decay over time) can therefore be explained by heterogeneity
in social preferences, and reactions to (un)fairness and reciprocity (Ledyard,
1997; Chaudhuri, 2011b).
cally increasing it in order to enter a better group in the next round. In order
to account for this more complex reasoning, we generalize the concept of dis-
tributional fairness of Refs. (Fehr and Schmidt, 1999; Ockenfels and Bolton,
2000) to a definition of ‘meritocratic’ fairness, and we shall use it to explain
the deviations from equilibrium predictions in the intermediate meritocracy
regimes (HIGH-MERIT and LOW-MERIT).
Formally, for a player i, disadvantageous and advantageous meritocratic unfairness are given by

MU_Dis = (1/(n − s)) ∗ Σ_{j∈N} max(∆ij , 0) ∗ max(∆GjGi , 0),

MU_Adv = (1/(n − s)) ∗ Σ_{j∈N} max(∆ji , 0) ∗ max(∆GiGj , 0),        (7.22)

where, for any pair of players i and j in groups Gi and Gj (i ≠ j), ∆ij represents the difference in contributions ci − cj , and ∆GiGj is the difference in average group contributions (1/4) Σ_{k∈Gi} ck − (1/4) Σ_{k∈Gj} ck .
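A direct transcription of these definitions (our own sketch; the function and variable names are ours) computes both unfairness terms for a given player from the realized contributions and group assignment:

def meritocratic_unfairness(i, contributions, groups, s=4):
    """Disadvantageous and advantageous meritocratic unfairness of player i,
    following Eq. (7.22). groups[k] is the group index of player k; group
    averages are taken over the s members of each group."""
    n = len(contributions)

    def group_avg(g):
        members = [k for k in range(n) if groups[k] == g]
        return sum(contributions[k] for k in members) / s

    mu_dis = mu_adv = 0.0
    for j in range(n):
        if j == i:
            continue
        delta_ij = contributions[i] - contributions[j]               # Delta_ij
        delta_groups = group_avg(groups[j]) - group_avg(groups[i])   # Delta_GjGi
        mu_dis += max(delta_ij, 0) * max(delta_groups, 0)
        mu_adv += max(-delta_ij, 0) * max(-delta_groups, 0)
    return mu_dis / (n - s), mu_adv / (n - s)

# Example: player 0 contributed fully but was sorted into the worst group,
# while free-rider 1 was sorted into a better group.
contribs = [20, 0] + [20] * 11 + [0] * 3
groups = [3, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
print(meritocratic_unfairness(0, contribs, groups))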
Starting at the near-efficient Nash equilibrium prediction, we expect decreases, since some unfairness is expected to occur even in equilibrium. However, unlike under zero meritocracy, these downward corrections will not trigger an overall decay of contributions: once substantial decreases have occurred (themselves triggered by disadvantageous unfairness), new strategic concerns arise, and higher contributions again become better, and fairer, replies than contributing zero.
Fig. 7.8 shows the distributions of meritocratic unfairness across different treat-
ments. Similarly to efficiency and inequality, we find increases in fairness from
NO-MERIT through all meritocracy levels up to PERFECT-MERIT, and
these increases are significant (LMM: F3,8 = 53.74, P < 0.0001).
the results of the regressions for distributional fairness are often inconsistent across treatments and, in many cases, even contrary to the predictions of the theory. On the other hand, meritocratic unfairness proved a good predictor of the contribution adjustments between rounds across all treatments. Therefore, meritocratic fairness can be seen as a natural generalization of distributional fairness in games with positive levels of meritocracy.
7.7.4 Sensitivity
So far, we have shown that (i) both efficiency and equality increase with mer-
itocracy, and that (ii) considerations of ‘meritocratic’ fairness can explain de-
viations from the theoretically expected equilibrium. In this section, we show
that changes in the level of experienced meritocracy have significant implica-
tions as well. In particular, we test whether participants coming from a higher
(lower) meritocracy level in part 1 are more (less) sensitive to meritocratic
unfairness in part 2.
For this analysis, we used the data pertaining to part 2 of the experiment, controlling for which meritocracy level was played in part 1. We divided the dataset into two subsets, depending on whether participants in part 2 experi-
enced a higher or lower meritocracy level than in part 1. In order to obtain a
balanced design with respect to the direction of meritocracy changes, we fur-
ther sampled the data from part 2 to include only the intermediate regimes of
meritocracy (σ 2 = {3, 20}). In this way, both conditions could be tested against
perfect meritocracy, zero meritocracy, and one intermediate regime. We cre-
ated a dummy variable for “contribution goes down” (0;1) and performed a
Figure 7.8: Meritocratic unfairness for perfect-, high-, low-, and
no-meritocracy, respectively associated with the values of σ 2 =
{0, 3, 20, ∞}. Smoothed distribution of average meritocratic unfairness per
round. Unfairness decreases as meritocracy increases. Red solid lines indicate
the mean level of meritocratic unfairness observed in the experiment, red-
shaded areas indicate the 95%-confidence intervals of the mean.
multilevel logistic regression with subject and session as random effects. We
used the level of disadvantageous meritocratic unfairness experienced in the
previous round as a predictor of whether contribution is expected to go up or
down in the next round.
Our main finding is that the distaste for meritocratic unfairness is exacer-
bated after having played a more meritocratic regime in part 1. That is, if a participant experienced meritocratic unfairness in the previous round, he/she is more likely to reduce his/her contribution in the current round
if the level of meritocracy in part 2 is lower than in part 1 (Logistic Mixed
Regression LMR: Z = 2.521, P = 0.0117). The effect in the opposite direction
– a lower meritocracy level in part 1 than in part 2 – is not significant (LMR:
Z = 1.522, P = 0.128).
7.8 Discussion
The standard case of random re-matching and a recently proposed and seminal
group-based mechanism (Gunnthorsdottir et al., 2010) were generalized to a
class of mechanisms called “meritocratic matching” (Nax, Murphy, and Hel-
bing, 2014). Here, testing these mechanisms experimentally, we made the astonishing finding that agents seem to be able to 'make the better system work'. That is, meritocratic mechanisms that promise higher efficiency from a theoretical point of view also turn out to benefit the worst-off and to improve overall distributional
equality, despite theory predicting the opposite (Nash, 1951). The reason for
this unexpected finding lies in agents’ attempts to improve ‘fairness’ by adjust-
ments of their actions in order to counter situations in which particular agents
are better-off (worse-off) despite being associated with low (high) ‘merit’. This
fairness concept not only explains our results in the new class of assortative
games studied by us, but also remains a significant explanatory variable in
games with random interactions, and is consistent with previous results for
this class of games. The criterion of ‘meritocratic’ fairness is formally different
from the standard formulation of ‘distributional’ fairness (Fehr and Schmidt,
1999; Ockenfels and Bolton, 2000), but for random interaction environments
their predictions agree qualitatively. In meritocratic environments, due to the
double-role of contributions inherent in the matching mechanism (both as a
group-sorting device and as a payoff determinant within groups), the concept of
'meritocratic' fairness is indeed a natural extension of classical fairness criteria
when agents are aware of this double-nature.
The results of our study show that meritocracy can dissolve the fundamen-
tal tradeoff between efficiency and equality. Creating a public good does not
necessarily generate inefficiencies, nor does it require the intervention of a central coercive power to suppress them. Fairness preferences and suitable insti-
tutional settings, such as well-working merit-based matching mechanisms, can
align agents’ incentives, and shift the system towards more cooperative and
near-efficient Nash equilibria. Overall, the results of our experiment suggest that agents are indeed sensitive to the sentiment of the famous quote attributed to Virgil: "The noblest motive is the public good."
Appendix: Materials and methods
Our stage games with n = 16, s = 4, B = 20 and m = 0.5 have the following
equilibria dependent on which variance level of σ 2 = {0, 3, 20, ∞} is played.
When σ 2 = ∞ (NO-MERIT), the only equilibrium is ci = 0 for all i. ci = 0 for
all i is also an equilibrium for all other variance levels. In that equilibrium, all
players receive a payoff of φi = 20. However, when σ 2 = {0, 3, 20}, there also
exist exactly (n choose 2) = 120 pure-strategy equilibria such that ci = 0 for exactly two agents and cj = 20 for the remaining fourteen. In that equilibrium, for
the case when σ 2 = 0 (PERFECT-MERIT), payoffs are such that twelve of
the fourteen players who contribute ci = 20 are matched in groups with each
other and receive φi = 40. The remaining four players are matched in the
worst group. Of those, the two players who contribute ci = 0 receive a payoff
of φi = 40, while the two players who contribute ci = 20 receive a payoff of
φi = 20. For the cases when σ 2 = 3 (HIGH-MERIT)/σ 2 = 20 (LOW-MERIT),
payoffs in the last group are as in the case when σ 2 = 0 (PERFECT-MERIT)
in over 99.9% and 99% of all cases, respectively. In the remaining cases, payoffs are such that eight of the fourteen players who contribute ci = 20 are matched in groups with each other and receive φi = 40. The remaining six players who contribute ci = 20 are matched, three each, in a group with one player who contributes ci = 0, and receive a payoff of φi = 30. The two players who contribute ci = 0 receive a payoff of φi = 50
each. The near-efficient Nash equilibrium collapses when the variance reaches
a level of about σ 2 = 100 (see propositions 6 and 7 in Ref. (Nax, Murphy, and
Helbing, 2014)).
A total of 192 voluntary participants took part, each in one session consisting of two separate games. Each session lasted roughly one hour. There were 16 par-
ticipants in each session and 12 sessions in total. All sessions were conducted
at the ETH Decision Science Laboratory (DeSciL) in Zürich, Switzerland, us-
ing the experimental software NodeGame (nodegame.org). DeSciL recruited
the subjects using the Online Recruitment System for Economic Experiments
(ORSEE). The experiment followed all standard behavioral-economics procedures and met the ethics committee's guidelines. Decisions, earnings and
payments were anonymous. Payments were administered by the DeSciL ad-
ministrators. In addition to a 10 CHF show-up fee, each subject was paid
according to a known exchange rate of 0.01 CHF per coin. Overall, monetary
rewards ranged from 30 to 50 CHF, with a mean of 39 CHF.
Each session consisted of two games, each of which was a forty-round repetition
of the same underlying stage game, namely a public-goods game. The same
fixed budget was given to each subject every period. Each game had separate
instructions that were distributed at the beginning of each game. After reading
the instructions, all participants were quizzed to make sure they understood
the task. The two games differed with respect to the variance of the noise added to players' contributions. There were four variance levels (σ2 = {0, 3, 20, ∞}),
and each game had equivalent instructions. Instructions contained full infor-
mation about the structure of the game and about the payoff consequences to
themselves and to the other agents. We played every possible pair of variance
levels in both orders to have an orthogonal balanced design, which yields a to-
tal of 12 sessions. As the game went on, players learnt about the other players’
previous actions and about the groups that formed. Each of our 192 partici-
pants made forty contribution decisions in each of the two games in his session.
This yields 80 choices per person per session, hence a total of 15,360 obser-
vations. More details, including a copy of a full instructions set and the quiz
questions, are provided in the Supplementary Information Appendix.
Each experimental session consisted of two separate games (part 1, part 2),
each played with a different variance level. We exhausted all possible pair of
variance levels in both orders, for a total of 12 different combinations. Con-
sequently, we prepared 12 different instruction texts that took into account
whether a variance level was played in the first or in the second part, and in
the latter case also considered which variance level was played in part 1.
Instructions for Variance Level = 20, Part 1
Welcome to the experiment and thanks for your participation. You have been
randomly assigned to an experimental condition with 16 people in total. In
other words you and 15 others will be interacting via the computer network
for this entire experimental session.
The experiment is divided into two parts and each part will last approximately
30-40 minutes. Both parts of the experiment contribute to your final
earnings. The instructions for the first part of the experiment follow directly
below. The instructions for the second part of the experiment will be handed
out to you only after all participants have completed the first part of the
experiment. It is worth your effort to read and understand these instructions
well. You will be paid based on your performance in this study; the better
you perform, the higher your expected earnings will be for your participation
today.
Your decision.
In this part you will play 40 independent rounds. At the beginning of each
round, you will receive 20 “coins”. For each round, you will have to decide
how many of your 20 coins to transfer into your “personal” account, and how
many coins to transfer into a “group” account. Your earnings for the round
depend on how you and the other participants decide to divide the coins you
have received between the two accounts.
For each round you will be assigned to a group of 4 people, that is, you and
three other participants. In general, groups are formed by ranking each indi-
vidual transfer to the group account, from the highest to the lowest. Group 1
is generally composed of those participants who transferred the most to the
group account; Group 4 is generally composed of those who transferred the
least to the group account. The other groups (2 and 3) are between these two
extremes.
However, the sorting process is noisy by design; contributing more will increase
a participant’s chances of being in a higher ranked group, but a high ranking is
not guaranteed. Technical note: the noisy ranking and sorting is implemented with the following process:
3. Step 3: Group matching. Based on the final list created at Step 2 (the
list with noise), the first 4 participants on that list form Group 1, the
next 4 people in the list form Group 2, the third 4 people in the list form
Group 3, and the last 4 people form Group 4.
Return from personal account.
Each coin that you put into your personal account results in a simple one-to-
one payoff towards your total earnings.
Each coin that you put into the group account will pay you back some positive
amount of money, but it depends also on how much the other group members
have transferred to the group account, as described below.
The total amount of coins in your group account is equal to the sum of the
transfers to the group account by each of the group members. That amount is
then multiplied by 2 and distributed equally among the 4 group members. In
other words, you will get a return equal to half of the group account total.
Final Earnings
Your total earnings for the first part of the experiment are equal to the sum of
all your rounds’ earnings. One coin is equal to 0.01 CHF. This may not appear
to be very much money, but remember there are 40 rounds in this part of the
experiment so these earnings build up.
Example
participant #8 transferred less to the group account than participant
#10, but the noisy sorting process placed him in a higher ranked group.
from the group account and the 13 he kept in his personal account).
Player   Group   Transfer to     Transfer to        Total to        Amount returned   Total earnings
ID               group account   personal account   group account   to player         for the round
7        1       14              6                  64              32                38
6        1       13              7                  64              32                39
14       1       16              4                  64              32                36
4        1       8               12                 64              32                44
1        2       14              6                  51              25.5              31.5
3        2       20              0                  51              25.5              25.5
8        2       11              9                  51              25.5              34.5
11       2       19              1                  51              25.5              26.5
10       3       17              3                  46              23                26
12       3       7               13                 46              23                36
16       3       6               14                 46              23                37
5        3       16              4                  46              23                27
9        4       10              10                 18              9                 19
2        4       1               19                 18              9                 28
13       4       5               15                 18              9                 24
15       4       2               18                 18              9                 27
Additional examples are provided in a separate sheet for your own refer-
ence.
Quiz
Subjects were given a quiz after instructions to test their understanding of the
game. Only after “passing” the quiz were subjects allowed to begin play. Details
about the quiz can be found at http://nodegame.org/games/merit/.
Appendix C: Statistical analyses
Equality analysis
Similarly, the Gini index differs significantly among the four treatments (LMM: F3,20 = 42.0, P < 0.0001). Taking NO-MERIT as a baseline, LOW-MERIT led to a decrease in the Gini index of realized payoffs in each round of 0.058901 (LRT: χ(1) = 18.18, P < 0.0001), HIGH-MERIT to a decrease of 0.071843 (LRT: χ(1) = 22.28, P < 0.0001), and PERFECT-MERIT to a decrease of 0.075453 (LRT: χ(1) = 22.06, P < 0.0001). Computing Bonferroni adjusted
p-values for all pair-wise differences reveals that the treatment with variance
∞ is significantly different (P < 0.0001) from the other three variance levels
(σ 2 = {0, 3, 20}), which are themselves not significantly different from each
other (see Fig. 7.7).
Fairness analysis
Fairness regressions
Meritocratic fairness
Distributional fairness
The results of the regressions for distributional fairness are shown in tables 7.4,
7.5, 7.6 and 7.7. Based on the original formula in Ref. Fehr and Schmidt, 1999,
we tried two different extensions of the notion of distributional fairness for
Table 7.2: Meritocratic fairness predicts contribution differential.
(Part 1) The sign of the regression coefficient is always consistent with theory
predictions. HIGH-MERIT is significant if pooled together with LOW-MERIT.
Then, we also computed distributional fairness across all players, regardless of
the group they belonged to (Across-group distributional fairness). The regres-
sors for across-group distributional fairness are called: lag.distr.fair.dis
and lag.distr.fair.adv.
Table 7.4: Within-group distributional fairness predicts contribution
differential. (Part 1)

                           PERFECT-     HIGH-        LOW-         HIGH- & LOW-   NO-
                           MERIT        MERIT        MERIT        MERIT          MERIT
(Intercept)                −0.79***     −1.39***     −1.32***     −1.39***       1.40**
                           (0.23)       (0.22)       (0.21)       (0.15)         (0.45)
lag.distr.fair.group.dis   −0.03        0.13**       0.01         0.06*          −0.70***
                           (0.04)       (0.05)       (0.05)       (0.03)         (0.04)
lag.distr.fair.group.adv   0.76***      0.99***      0.77***      0.88***        0.28***
                           (0.04)       (0.04)       (0.04)       (0.03)         (0.04)
AIC                        11682.40     11933.18     12025.27     23952.86       11968.23
BIC                        11715.59     11966.38     12058.46     23990.22       12001.43
Log Likelihood             −5835.20     −5960.59     −6006.64     −11970.43      −5978.12
Num. obs.                  1872         1872         1870         3742           1872
*** p < 0.001, ** p < 0.01, * p < 0.05
Table 7.5: Within-group distributional fairness predicts contribution
differential. (Part 2) The sign of the regression coefficient is often inconsis-
tent with theory predictions.
                           PERFECT-     HIGH-        LOW-         HIGH- & LOW-   NO-
                           MERIT        MERIT        MERIT        MERIT          MERIT
(Intercept)                −0.93***     −1.54***     −1.25***     −1.43***       1.60***
                           (0.25)       (0.40)       (0.23)       (0.22)         (0.38)
lag.distr.fair.group.dis   −0.10*       0.05         −0.06        0.00           −0.61***
                           (0.04)       (0.04)       (0.05)       (0.03)         (0.03)
lag.distr.fair.group.adv   0.88***      1.19***      0.86***      1.02***        0.15***
                           (0.04)       (0.04)       (0.04)       (0.03)         (0.03)
AIC                        11856.01     11799.36     12109.33     23935.12       11827.92
BIC                        11889.21     11832.55     12142.53     23972.48       11861.12
Log Likelihood             −5922.01     −5893.68     −6048.67     −11961.56      −5907.96
Num. obs.                  1871         1872         1871         3743           1872
*** p < 0.001, ** p < 0.01, * p < 0.05
Table 7.6: Across-group distributional fairness predicts contribution
differential. (Part 1)

                           PERFECT-     HIGH-        LOW-         HIGH- & LOW-   NO-
                           MERIT        MERIT        MERIT        MERIT          MERIT
(Intercept)                −1.42***     −2.40***     −2.20***     −2.23***       1.04*
                           (0.26)       (0.34)       (0.34)       (0.24)         (0.40)
lag.distr.fair.dis         0.22***      0.39***      0.33***      0.35***        −0.44***
                           (0.03)       (0.04)       (0.04)       (0.03)         (0.05)
lag.distr.fair.adv         0.44***      0.59***      0.43***      0.48***        0.13*
                           (0.08)       (0.10)       (0.08)       (0.06)         (0.05)
AIC                        11934.03     12223.59     12225.86     24434.15       12277.90
BIC                        11967.23     12256.79     12259.05     24471.51       12311.10
Log Likelihood             −5961.02     −6105.80     −6106.93     −12211.07      −6132.95
Num. obs.                  1872         1872         1870         3742           1872
*** p < 0.001, ** p < 0.01, * p < 0.05
Table 7.7: Across-group distributional fairness predicts contribution
differential. (Part 2) The sign of the regression coefficient is often inconsis-
tent with theory predictions.
                           PERFECT-     HIGH-        LOW-         HIGH- & LOW-   NO-
                           MERIT        MERIT        MERIT        MERIT          MERIT
(Intercept)                −2.15***     −1.98***     −2.19***     −2.01***       1.96***
                           (0.30)       (0.30)       (0.35)       (0.23)         (0.48)
lag.distr.fair.dis         0.21***      0.29***      0.30***      0.29***        −0.49***
                           (0.03)       (0.03)       (0.04)       (0.02)         (0.04)
lag.distr.fair.adv         0.65***      0.54***      0.46***      0.48***        −0.04
                           (0.09)       (0.09)       (0.09)       (0.06)         (0.04)
AIC                        12162.64     12222.36     12374.95     24584.87       12068.03
BIC                        12195.83     12255.56     12408.15     24622.23       12101.23
Log Likelihood             −6075.32     −6105.18     −6181.48     −12286.43      −6028.02
Num. obs.                  1871         1872         1871         3743           1872
*** p < 0.001, ** p < 0.01, * p < 0.05
Additional inequality indexes
Appendix D: Implications
Our model implies that situations consistent with its assumptions would benefit from higher degrees of meritocracy, both in terms of efficiency and in terms of equality. This positive result relies on several features of the underlying model; relaxing them is an avenue for future research. First, our model describes an ex ante homogeneous population.
Differences in payoff are driven by differences in actions and by neutral stochas-
tic elements alone. Heterogeneity in priority given by the matching mechanism
and/or heterogeneities in the individual rates of return could influence the re-
sults. This is true for any public-goods game including the standard models
with random interactions (e.g. Buckley and Croson, 2006; Fischbacher, Schudy,
and Teyssier, 2014). However, it should be noted that meritocracy may actually
mitigate the associated inequality problems. Second, related to heterogeneity,
our model allows for no wealth creation, that is, individuals receive a new
budget every period and the size of this budget is fixed and constant over
time. Players cannot accumulate wealth. The role of wealth creation in public-
goods games has received some attention and has been shown to lead to the
emergence of different classes of contributions and income (e.g. Tamai, 2010,
see also King and Rebelo, 1990; Rebelo, 1991). Under assortative matching,
wealth creation can be problematic as it allows rich players to block out poor
players. Third, group sizes are fixed. Alternative models have been proposed
(e.g. Cinyabuguma, Page, and Putterman, 2005; Charness and Yang, 2008;
Ehrhart and Keser, 1999; Ahn, Isaac, and Salmon, 2008; Coricelli, Fehr, and
Fellner, 2004; Page, Putterman, and Unel, 2005; Brekke, Nyborg, and Rege,
2007; Brekke et al., 2011).
References
Arrow, K. J. (1951). Social Choice and Individual Values. Yale, USA: Yale
University Press.
Atkinson, A. B. (1970). “On the measurement of inequality”. In: Journal of
Economic Theory 2, pp. 244–263.
– (2012). “Public Economics after the Idea of Justice”. In: Journal of Human
Development and Capabilities 13.4, pp. 521–536.
Bayer, R.-C., E. Renner, and R. Sausgruber (2013). “Confusion and learning in
the voluntary contributions game”. In: Experimental Economics 16, pp. 478–
496.
Becker, G. S. (1973). “A Theory of Marriage: Part 1”. In: Journal of Political
Economy 81, pp. 813–846.
Bentham, J. (1907). An Introduction to the Principles of Morals and Legisla-
tion. Clarendon Press.
Binmore, K. (2005). Natural Justice. Oxford University Press.
Bohm, R. and B. Rockenbach (2013). "The inter-group comparison – intra-group
cooperation hypothesis". In: PLoS ONE 8.2, e56152.
Bowles, S. and H. Gintis (2011). A Cooperative Species: Human Reciprocity and
Its Evolution. Princeton University Press.
Brekke, K., K. Nyborg, and M. Rege (2007). “The fear of exclusion: individual
effort when group formation is endogenous”. In: Scandinavian Journal of
Economics 109, pp. 531–550.
Brekke, K. et al. (2011). “Playing with the good guys. A public good game
with endogenous group formation”. In: Journal of Public Economics 95,
pp. 1111–1118.
Buckley, Edward and Rachel Croson (2006). “Income and wealth heterogeneity
in the voluntary provision of linear public goods”. In: Journal of Public
Economics 90.4-5, pp. 935–955.
Cabral, L. M. B. (1988). "Asymmetric equilibria in symmetric games with many
players". In: Economics Letters 27, pp. 205–208.
Charness, G. B. and C.-L. Yang (2008). “Endogenous Group Formation and
Public Goods Provision: Exclusion, Exit, Mergers, and Redemption”. In:
University of California at Santa Barbara, Economics WP.
Chaudhuri, A. (2011a). “Sustaining cooperation in laboratory public goods ex-
periments: a selective survey of the literature”. In: Experimental Economics
14, pp. 47–83.
Chaudhuri, Ananish (2011b). “Sustaining cooperation in laboratory public
goods experiments: a selective survey of the literature”. In: Experimental
Economics 14, pp. 47–83.
Cinyabuguma, M., T. Page, and L. Putterman (2005). “Cooperation under
the threat of expulsion in a public goods experiment”. In: Journal of Public
Economics 89, pp. 1421–1435.
Cole, H. J., G. Mailath, and A. Postlewaite (1992). “Social Norms, Savings
Behavior, and Growth”. In: Journal of Political Economy 100, pp. 1092–
1125.
Coricelli, G., D. Fehr, and G. Fellner (2004). “Partner Selection in Public
Goods Experiments”. In: Economics Series 151.
Cowell, F. (2011). Measuring Inequality. Oxford University Press.
Dickinson, D. L. and R. M. Isaac (1998). “Absolute and relative rewards for
individuals in team production”. In: Managerial and Decision Economics
19, pp. 299–310.
Ehrhart, K. and C. Keser (1999). “Mobility and cooperation: On the run”. In:
CIRANO WP 99.s-24.
Erev, Ido, Eyal Ert, and Eldad Yechiam (2008). “Loss aversion, diminishing
sensitivity, and the effect of experience on repeated decisions”. In: Journal
of Behavioral Decision Making 21.5, pp. 575–597.
Fehr, E. and C. Camerer (2007). “Social neuroeconomics: the neural circuitry
of social preferences”. In: Trends in Cognitive Sciences 11, pp. 419–427.
Fehr, Ernst and Simon Gächter (2000). “Cooperation and Punishment in Pub-
lic Goods Experiments”. In: Am. Econ. Rev. 90, pp. 980–994.
Fehr, Ernst and Klaus M. Schmidt (1999). “A Theory of Fairness, Competition,
and Cooperation”. In: Quarterly J. Econ. 114, pp. 817–868.
Feldman, A. (1980). Welfare Economics and Social Choice Theory. Boston,
USA: Martinus Nijhoff Publishing.
Ferraro, P. J. and C. A. Vossler (2010). “The source and significance of con-
fusion in public goods experiments”. In: The B.E. Journal in Economic
Analysis and Policy 10, p. 53.
Fischbacher, U. and S. Gaechter (2010). “Social preferences, beliefs, and the
dynamics of free riding in public good experiments”. In: American Economic
Review 100, pp. 541–556.
Fischbacher, Urs, Simeon Schudy, and Sabrina Teyssier (2014). “Heteroge-
neous reactions to heterogeneity in returns from public goods”. In: Social
Choice and Welfare 43.1, pp. 195–217.
Foster, D. and H. P. Young (1990). “Stochastic evolutionary game dynamics”.
In: Theoretical Population Biology 38, pp. 219–232.
Goeree, J. K., C. A. Holt, and S. K. Laury (2002). “Private costs and public
benefits: Unraveling the effects of altruism and noisy behavior”. In: Journal
of Public Economics 83, pp. 255–276.
Greenberg, Jerald (1987). “A taxonomy of organizational justice theories”. In:
Academy of Management review 12.1, pp. 9–22.
Greenwood, J. et al. (2014). "Marry Your Like: Assortative Mating and Income
Inequality". In: NBER WP 19829.
Grund, T., C. Waloszek, and D. Helbing (2013). “How Natural Selection Can
Create Both Self- and Other-Regarding Preferences, and Networked Minds”.
In: Scientific Reports 3, p. 1480.
Gunnthorsdottir, A. and P. Thorsteinsson (2010). “Tacit Coordination and
Equilibrium Selection in a Merit-Based Grouping Mechanism: A Cross-
Cultural Validation Study”. In: Department of Economics WP 0.
Gunnthorsdottir, A., R. Vragov, and J. Shen (2010). "Tacit coordination in
contribution-based grouping with two endowment levels". In: Research in
Experimental Economics 13, pp. 13–75.
Gunnthorsdottir, A. et al. (2010). “Near-efficient equilibria in contribution-
based competitive grouping”. In: Journal of Public Economics 94, pp. 987–
994.
Hamilton, W. D. (1964a). “The Genetical Evolution of Social Behaviour I”.
In: Journal of Theoretical Biology 7, pp. 1–16.
– (1964b). “The Genetical Evolution of Social Behaviour II”. In: Journal of
Theoretical Biology 7, pp. 17–52.
Hardin, Garrett (1968). "The Tragedy of the Commons". In: Science 162,
pp. 1243–1248.
Harsanyi, J. (1953). “Cardinal Utility in Welfare Economics and in the Theory
of Risk-Taking”. In: Journal of Political Economy 61, pp. 434–435.
Harsanyi, J. C. and R. Selten (1988a). A General Theory of Equilibrium Se-
lection in Games. MIT Press.
– (1988b). A General Theory of Equilibrium Selection in Games. Cambridge,
MA: MIT Press.
Hayek, F. A. von (1935). “The Nature and History of the Problem”. In: Col-
lectivist Economic Planning, pp. 1–47.
Helbing, D. (1996). “A stochastic behavioral model and a ‘microscopic’ foun-
dation of evolutionary game theory”. In: Theory and Decision 40, pp. 149–
179.
Irlenbusch, B. and G. Ruchala (2008). “Relative Rewards within Team-Based
Compensation”. In: Labour Economics 15, pp. 141–167.
Isaac, M. and J. Walker (1988). “Group Size Effects in Public Goods Provi-
sion: The Voluntary Contributions Mechanism”. In: Quarterly Journal of
Economics 103, pp. 179–199.
Isaac, M. R., K. F. McCue, and C. R. Plott (1985a). “Public goods provision in
an experimental environment”. In: Journal of Public Economics 26, pp. 51–
74.
Isaac, Mark R., Kenneth F. McCue, and Charles R. Plott (1985b). “Pub-
lic goods provision in an experimental environment”. In: Journal of Public
Economics 26, pp. 51–74.
Jones-Lee, M. W. and G. Loomes (1995). "Discounting and Safety". In: Oxford
Economic Papers, New Series 47, pp. 501–512.
Kahneman, D. and A. Tversky (1979). “Prospect Theory: An Analysis of De-
cision under Risk”. In: Econometrica 47, pp. 263–291.
Kandori, M., G. J. Mailath, and R. Rob (1993). “Learning, mutation, and long
run equilibria in games”. In: Econometrica 61, pp. 29–56.
King, Robert G and Sergio Rebelo (1990). “Public Policy and Economic Growth:
Developing Neoclassical Implications”. In: Journal of Political Economy 98.5,
S126–50.
Lane, G. (2004). Genghis Khan and Mongol Rule. Greenwood.
Ledyard, J. O. (1995). “Public Goods: A Survey of Experimental Research”. In:
in J. H. Kagel and A. E. Roth (Eds.), Handbook of experimental economics
37, pp. 111–194.
– (1997). “Public Goods: A Survey of Experimental Research”. In: The Hand-
book of Experimental Economics. Ed. by J. H. Kagel and A. E. Roth. Prince-
ton, NJ: Princeton University Press, pp. 111–194.
Maynard Smith, J. and G. R. Price (1973). “The logic of animal conflict”. In:
Nature 246, pp. 15–18.
Mises, Ludwig von (1922). Die Gemeinwirtschaft: Untersuchungen über den
Sozialismus. Jena, Germany: Gustav Fischer Verlag.
Miyazaki, I. (1976). China’s Examination Hell: The Civil Service Examinations
of Imperial China. Weatherhill.
Nash, John (1950). “Equilibrium points in n-person games”. In: Proc. Natl.
Acad. Sci. USA 36, pp. 48–49.
– (1951). “Non-cooperative games”. In: Ann. Math. 54, pp. 286–295.
Nax, H. H., R. O. Murphy, and D. Helbing (2014). Stability and welfare of
‘merit-based’ group-matching mechanisms in voluntary contribution games.
Nax, H. H. et al. (2013). “Learning in a Black Box”. In: Department of Eco-
nomics WP, University of Oxford 653.
Nowak, M. A. (2006). “Five rules for the evolution of cooperation”. In: Science
314, pp. 1560–1563.
Ockenfels, Axel and Gary E. Bolton (2000). “ERC: A Theory of Equity, Reci-
procity, and Competition”. In: American Economic Review 90.1, pp. 166–
193.
Okun, A.M. (1975). The Big Tradeoff. Washington D.C.: Brookings Institution
Press.
Ones, U. and L. Putterman (2007). “The ecology of collective action: A pub-
lic goods and sanctions experiment with controlled group formation”. In:
Journal of Economic Behavior and Organization 62, pp. 495–521.
Ostrom, E. (1990). Governing the Commons: The Evolution of Institutions for
Collective Action. Cambridge, U.K.: Cambridge University Press.
Ostrom, Elinor (1999). "Coping with tragedies of the commons".
In: Annu. Rev. Polit. Sci. 2, pp. 493–535.
Page, T., L. Putterman, and B. Unel (2005). "Voluntary association in public
goods experiments: reciprocity, mimicry and efficiency". In: The Economic
Journal 115, pp. 1032–1053.
Palfrey, T. R. and J. E. Prisbrey (1996). “Altruism, reputation and noise in lin-
ear public goods experiments”. In: Journal of Public Economics 61, pp. 409–
427.
Palfrey, T. R. and J. E. Prisbrey (1997). “Anomalous behavior in public
goods experiments: how much and why?” In: American Economic Review
87, pp. 829–846.
Rabanal, J. P. and O. A. Rabanal (2010). "Efficient Investment via Assortative
Matching: A laboratory experiment". In: mimeo.
Rawls, J. (1971). A Theory of Justice. Belknap Press.
Rebelo, Sergio (1991). “Long-Run Policy Analysis and Long-Run Growth”. In:
Journal of Political Economy 99.3, pp. 500–521.
Samek, S. and R. Sheremeta (2014). “Visibility of Contributors: An Experi-
ment on Public Goods”. In: Experimental Economics.
Samuelson, P. A. (1980). Foundations of Economic Analysis. Cambridge, USA:
Harvard University Press.
Sen, Amartya (1970). “The Impossibility of a Paretian Liberal”. In: Journal
of Political Economy 78.1, pp. 152–157.
Simon, H. A. (1990). “A mechanism for social selection and successful altru-
ism”. In: Science 250, pp. 1665–1668.
Tamai, Toshiki (2010). “Public goods provision, redistributive taxation, and
wealth accumulation”. In: Journal of Public Economics 94.11-12, pp. 1067–
1072.
Taylor, P. D. and L. Jonker (1978). “Evolutionary stable strategies and game
dynamics”. In: Mathematical Biosciences 40, pp. 145–156.
Tversky, A. and D. Kahneman (1991). “Loss Aversion in Riskless Choice:
A Reference Dependent Model”. In: Quarterly Journal of Economics 106,
pp. 1039–1061.
Weibull, J. (1995). Evolutionary Game Theory. The MIT Press.
Young, H. P. (1993). “The Evolution of Conventions”. In: Econometrica 61,
pp. 57–84.
– (1998). Individual Strategy and Social Structure: An Evolutionary Theory of
Institutions. Princeton University Press.
Young, M. (1958). The Rise of the Meritocracy, 1870-2033: An Essay on
Education and Equality. Transaction Publishers.
Chapter 8
Conclusion
Perhaps better names for “game theory” would be “strategy theory/strategics”
or “interaction theory/interactics”. In everyday language, the word “game”
suggests joy or playfulness and is therefore often wrongly associated with such
things as the (computer) gaming industry or board games. This stands in
stark contrast to the seriousness of many of the interactions studied with
game-theoretic models, such as political conflict, public goods provision,
or organ transplantation markets. But the word “game” also does something
useful: it captures something integral to human nature, related to what Johan
Huizinga described as homo ludens in his 1938 book, namely that humans, even
in very serious situations, often behave in ways that are hard to predict,
because they experiment, gamble, and reason according to strategic logics
that are hard to decipher.
The aim of this thesis is to improve our understanding of two separate aspects
fundamental to behavioral game theory. On the one hand, the thesis aims
to contribute to predicting the consequences of behavioral models of game
play, especially of game dynamics driven by learning. Chapters 2, 4, 5 and 7
are written to this end. On the other hand, the thesis seeks to improve our
modeling foundations, that is, to identify which behavioral models best describe
the deviations from standard economic predictions. Chapters 1, 3, 6 and 7
pursue this goal.
(precisely these components were explored theoretically in Chapters 2 and 4).
Such trend-following behaviors persist, with possibly grave consequences, even
in financial markets, where extreme rationality assumptions are often made
instead (Chapter 6).
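To make precise what is meant here by trend-following adjustment, the sketch below (in Python) implements one simple directional rule in a repeated linear public goods game: each player keeps changing her contribution in the same direction as long as her realized payoff did not fall, and reverses direction otherwise. The update rule and all parameter values (group size, endowment, marginal per-capita return, step size, number of rounds) are illustrative assumptions for this sketch only, not the specifications studied in the thesis.

import random

# Illustrative parameters (assumptions for this sketch only).
N, ENDOWMENT, MPCR, STEP, ROUNDS = 4, 20, 0.4, 2, 50

def payoffs(contribs):
    # Linear public goods payoffs: keep what you did not contribute,
    # plus a share MPCR * (total contributions) of the public pool.
    pool = MPCR * sum(contribs)
    return [ENDOWMENT - c + pool for c in contribs]

def trend_following_step(contribs, last_payoffs, last_moves):
    # Each player repeats her last direction of change if her payoff
    # did not fall, and reverses direction otherwise.
    current_payoffs = payoffs(contribs)
    next_contribs, next_moves = [], []
    for c, p_now, p_before, move in zip(contribs, current_payoffs,
                                        last_payoffs, last_moves):
        direction = move if p_now >= p_before else -move
        next_contribs.append(min(ENDOWMENT, max(0, c + STEP * direction)))
        next_moves.append(direction)
    return next_contribs, current_payoffs, next_moves

contribs = [random.randint(0, ENDOWMENT) for _ in range(N)]
last_pay = payoffs(contribs)
moves = [random.choice([-1, 1]) for _ in range(N)]
for _ in range(ROUNDS):
    contribs, last_pay, moves = trend_following_step(contribs, last_pay, moves)
print("final contributions:", contribs)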
Perhaps the most subtle findings of the thesis emerged from our recent work
on institutionally “meritocratic” mechanism designs (Chapter 7). Theory
predicted that higher levels of meritocracy would increase efficiency, but at
the cost of increased inequality. In reality, however, although aggregate
macro-behavior closely resembled the equilibrium predictions in the higher-
meritocracy regimes, meritocracy increased both efficiency and equality. This
mismatch between theory and evidence is resolved by inspecting the underlying
micro-adjustments: it turns out that a supercritical number of agents cares
about ‘meritocratic fairness’, and these agents adjust their own behavior in
response to inequalities sufficiently strongly that the more efficient regimes
also become more equitable.
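To make the matching channel behind this finding concrete, the sketch below (in Python) simulates one round of a linear public goods game under contribution-based (“meritocratic”) group matching. Every number in it — group size, endowment, marginal per-capita return, the noise in the merit ranking, the strength parameter beta, and the stylized half-contributors/half-free-riders population — is an assumption for illustration only, not a parameter of the experiments reported in Chapter 7. The sketch illustrates the incentive channel only: under (near-)random matching free-riders earn more than contributors, whereas under strongly meritocratic matching contributors are grouped together and earn more.

import random
import statistics

# Illustrative parameters (assumptions for this sketch only).
N_PLAYERS, GROUP_SIZE, ENDOWMENT, MPCR = 12, 4, 20, 0.5

def meritocratic_groups(contribs, beta):
    # Rank players by a noisy "merit" score of their contribution and match
    # neighbouring ranks into groups: beta = 0 is essentially random matching,
    # a large beta approaches perfectly assortative (meritocratic) matching.
    scores = [beta * c + random.gauss(0, 1) for c in contribs]
    ranked = sorted(range(N_PLAYERS), key=lambda i: scores[i], reverse=True)
    return [ranked[k:k + GROUP_SIZE] for k in range(0, N_PLAYERS, GROUP_SIZE)]

def round_payoffs(contribs, groups):
    # Standard linear public goods payoffs within each group.
    pay = [0.0] * N_PLAYERS
    for group in groups:
        pool = MPCR * sum(contribs[i] for i in group)
        for i in group:
            pay[i] = ENDOWMENT - contribs[i] + pool
    return pay

# Stylized population: half contribute fully, half free-ride.
contribs = [ENDOWMENT] * (N_PLAYERS // 2) + [0] * (N_PLAYERS // 2)

for beta in (0.0, 5.0):
    contributor_pay, free_rider_pay = [], []
    for _ in range(2000):  # average over many random re-matchings
        pay = round_payoffs(contribs, meritocratic_groups(contribs, beta))
        contributor_pay += pay[:N_PLAYERS // 2]
        free_rider_pay += pay[N_PLAYERS // 2:]
    print(f"beta={beta}: contributors earn {statistics.mean(contributor_pay):.1f} "
          f"on average, free-riders earn {statistics.mean(free_rider_pay):.1f}")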
Of course, this thesis has not settled matters definitively regarding the complex
subject of human interactions. Instead, its contribution has been to propose
ways and methods, as part of a novel research agenda, that aim at integrating
existing theories. I intend to pursue this agenda further, starting with several
directions that I find most pressing and interesting. In particular, I intend to
study how game structures and information contexts influence the type of
reasoning humans tend to use. In parallel, I want to explore how institutions
and mechanisms influence behaviors, focusing on real-world applications and
laboratory experiments. Subsequently, such findings could prove useful for
designing better institutions and mechanisms.