Adaptive Dynamics of Memory-One Strategies in The Repeated Donation Game
RESEARCH ARTICLE
Introduction
Evolution of cooperation is of considerable interest, because it demonstrates that natural selection does not only lead to selfish, brutish behavior red in tooth and claw [1, 2]. Yet in the absence of a mechanism for its evolution, natural selection opposes cooperation. A mechanism for the evolution of cooperation is an interaction structure that allows natural selection to favor cooperation over defection [3]. Direct reciprocity is one such mechanism [4–8]. It is based on repeated interactions among the same individuals. In a repeated interaction, individuals can condition their decisions on their co-player's previous behavior. By being more cooperative towards other cooperators, they can generate a favorable social environment for the evolution of cooperation.
The most basic model to illustrate reciprocity is the repeated donation game [1]. This game takes place between two players, who interact for many rounds. Each round, players independently decide whether to cooperate or defect. Cooperation implies a cost c for the donor and generates a benefit b for the recipient. Defection implies no cost and confers no benefit. Both players decide simultaneously. If they both cooperate, each of them gets payoff b - c. If both players defect, each of them gets payoff 0. If one player cooperates while the other defects, the cooperator's payoff is -c while the defector's is b. The donation game is a special case of a prisoner's dilemma if b > c > 0, which is assumed throughout.
If the donation game is played for a single round, players can only choose between the two possible strategies of cooperation and defection. Based on the game's payoffs, each player prefers to defect, creating the dilemma. In contrast, in the repeated donation game, infinitely many strategies are available. For example, players may choose to cooperate if and only if their co-player cooperated in the previous round. This is the well-known strategy Tit-for-tat [5, 9]. Alternatively, players may wish to occasionally forgive a defecting opponent, as captured by Generous Tit-for-tat [10, 11]. Against each of these strategies, unconditional defection is no longer the best response. Instead, mutual cooperation is now in the co-player's best interest.
During the past decades, there has been a considerable effort to explore whether conditionally cooperative behaviors would emerge naturally (e.g., [12–24]). To this end, researchers study the dynamics in evolving populations, in which strategies are transmitted either by biological or cultural evolution (by inheritance or imitation). For such an analysis, it is useful to restrict the space of strategies that individuals can choose from. The strategy space ought to be small enough for a systematic analysis, yet large enough to capture the most interesting behaviors.
One frequently used subspace is the set of memory-one strategies [24–32]. Players with memory-one strategies respond to the outcome of the previous round only. Such strategies can be written as a vector p = (pCC, pCD, pDC, pDD) in the 4-dimensional cube [0, 1]^4. Each entry pij reflects the player's conditional cooperation probability, depending on the four possible outcomes of the previous round, CC, CD, DC, DD (the first letter is the focal player's action, the second letter is the co-player's action). Despite their simplicity, memory-one strategies can capture many different behavioral archetypes. They include always defect, ALLD = (0, 0, 0, 0), always cooperate, ALLC = (1, 1, 1, 1), Tit-for-tat, TFT = (1, 0, 1, 0) [5, 9], Generous Tit-for-tat, GTFT = (1, x, 1, x) with 0 < x < 1 [10, 11], and Win-stay, Lose-shift, WSLS = (1, 0, 0, 1) [25, 33]. The sixteen corner points of the cube are the pure strategies; the interior points of the cube are stochastic strategies. The center of the cube is the random strategy (1/2, 1/2, 1/2, 1/2) [5].
Conditionally cooperative strategies have been of particular interest in the study of human behavior. For example, there is evidence for the intuitive expectation that people tend to cooperate more if their co-player was cooperative in the past, or if they expect their co-player to cooperate in the future [34–36]. The concept of conditionally cooperative strategies is quite broad and includes strategies such as Tit-for-two-tats, which cannot be realized as a memory-one strategy. In this paper we consider only conditionally cooperative strategies which can be realized as memory-one strategies, such as TFT, GTFT, and nearby strategies. However, it is hoped that techniques similar to the ones used in this paper can be used to study more general strategy spaces.
When both players adopt memory-one strategies, there is an explicit formula for their average payoffs (as described in the next section). Based on this formula, it is possible to characterize all Nash equilibria among the memory-one strategies [37–42]. In general, however, the payoff formula yields a complex expression in the players' conditional cooperation probabilities pij. As a result, it is difficult to characterize the dynamics of evolving populations, in which players switch strategies depending on the payoffs they yield. Most previous work had to resort to individual-based simulations. Only in special cases has an analytical description been feasible (for example, based on differential equations). One special case arises when individuals are restricted to use reactive strategies [43–48]. Reactive strategies only depend on the co-player's previous move. Within the memory-one strategies, they correspond to the 2-dimensional subset with pCC = pDC and pCD = pDD. In addition, there has been work on the replicator dynamics among three strategies [15, 49], and on the dynamics among transformed memory-one strategies [50, 51]. Here, we wish to explore the dynamics among memory-one strategies directly, using adaptive dynamics [52, 53].
We begin by describing two interesting mathematical results. First, we show that under adaptive dynamics, the 4-dimensional space of memory-one strategies contains an invariant 3-dimensional subset. This subset comprises all "counting strategies". These strategies only depend on the number of cooperators in the previous round. They correspond to memory-one strategies with pCD = pDC. Second, we find that for the donation game, the adaptive dynamics exhibits an interesting symmetry between orbits forward in time and backward in time. We use these mathematical results to partially characterize the adaptive dynamics among memory-one strategies, and to fully characterize the dynamics among memory-one counting strategies.
Model
We study the infinitely repeated donation game between two players. Each round, each player has the option to cooperate (C) or to defect (D). Players make their choices independently, not knowing their co-player's choice in that round. Payoffs in each round are given by the matrix

        C      D
  C ( b - c   -c )        (1)
  D (   b      0 )

The entries correspond to the payoff of the row player, with b and c being the benefit and cost of cooperation, respectively. We assume b > c > 0 throughout. The above payoff matrix is an instance of the general payoff matrix of a symmetric 2 x 2 game,

        C   D
  C (   R   S )        (2)
  D (   T   P )

The payoff matrix (1) of the donation game satisfies the typical inequalities of a prisoner's dilemma, T > R > P > S and 2R > T + S. Moreover, it satisfies the condition of 'equal gains from switching',

  R + P = T + S        (3)

For the donation game, this is immediate: R + P = (b - c) + 0 and T + S = b + (-c), both equal to b - c. This condition ensures that if players interact repeatedly, their overall payoffs only depend on how often each player cooperates, independent of the timing of cooperation.
In the following we focus on repeated games among players with memory-one strategies. Each player's decision is determined by a four-tuple p = (pCC, pCD, pDC, pDD). Depending on the outcome of the previous round, CC, CD, DC, or DD, the focal player responds by cooperating with probability pCC, pCD, pDC, or pDD, respectively.
Strategies with large pCC exhibit a high frequency of mutual cooperation and will receive relatively large payoffs in the donation game. We note that in games with other payoff matrices (2), it may be beneficial in the long run for players to take turns, where one player cooperates while the other defects. This behavior is called ST-reciprocity, because players alternately receive payoffs S and T rather than R in every round. ST-reciprocity becomes superior to R-reciprocity in terms of payoffs when S + T > 2R, and it can be achieved by memory-one strategies such as (p1, 0, 1, p4) with small but positive p1, p4. For an account of ST- and R-reciprocity in other 2 x 2 games such as the Chicken or Snowdrift game, see [54, 55]. For the donation game, where S + T = R < 2R, we are primarily interested in the evolution of mutual cooperation CC.
We refer to a memory-one strategy as a counting strategy if it satisfies pCD = pDC. A counting strategy only reacts to the number of cooperators in the previous round. If both players cooperated in the previous round, they cooperate with probability pCC. If exactly one of the players cooperated, they cooperate with probability pCD = pDC, irrespective of whether the outcome was CD or DC. If no one cooperated, the cooperation probability is pDD. Memory-one counting strategies include all unconditional strategies (such as ALLC and ALLD), as well as the strategies GRIM = (1, 0, 0, 0) and WSLS = (1, 0, 0, 1).
If the two players employ memory-one strategies p = (pCC, pCD, pDC, pDD) and p' = (p'CC, p'CD, p'DC, p'DD), then their behavior generates a Markov chain with transition matrix

      ( pCC p'CC   pCC (1 - p'CC)   (1 - pCC) p'CC   (1 - pCC)(1 - p'CC) )
  M = ( pCD p'DC   pCD (1 - p'DC)   (1 - pCD) p'DC   (1 - pCD)(1 - p'DC) )        (4)
      ( pDC p'CD   pDC (1 - p'CD)   (1 - pDC) p'CD   (1 - pDC)(1 - p'CD) )
      ( pDD p'DD   pDD (1 - p'DD)   (1 - pDD) p'DD   (1 - pDD)(1 - p'DD) )

That is, if s(n) = (sCC(n), sCD(n), sDC(n), sDD(n)), and sij(n) is the probability that the p-player chooses i and the p'-player chooses j in round n, then s(n + 1) = s(n) M. For p, p' ∈ (0, 1)^4, the Markov chain has a unique invariant distribution v = (vCC, vCD, vDC, vDD). This distribution v corresponds to the left eigenvector of M with respect to the eigenvalue 1, normalized such that the entries of v sum up to one. The entries of v can be interpreted as the average frequency of the four possible outcomes over the course of the game. Therefore we can compute the average payoff of the p-player as

  A(p, p') = R vCC + S vCD + T vDC + P vDD = (b - c) vCC - c vCD + b vDC        (5)

For a more explicit representation of the players' payoffs, one can use the determinant formula by [56], which is shown in Methods.
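To make this computation concrete, the following Python sketch (our illustration, continuing the snippet above) builds the transition matrix (4), extracts the stationary distribution v, and evaluates the payoff (5):

```python
import numpy as np

def transition_matrix(p, q):
    """Transition matrix (4) on the states CC, CD, DC, DD.
    In state CD the p-player reacts to CD while the q-player reacts to DC,
    and vice versa in state DC."""
    pCC, pCD, pDC, pDD = p
    qCC, qCD, qDC, qDD = q
    rows = [(pCC, qCC), (pCD, qDC), (pDC, qCD), (pDD, qDD)]
    return np.array([[x * y, x * (1 - y), (1 - x) * y, (1 - x) * (1 - y)]
                     for x, y in rows])

def payoff(p, q, b=1.0, c=0.1):
    """Average payoff A(p, q) of the p-player in the donation game, Eq (5)."""
    M = transition_matrix(p, q)
    # Stationary distribution: left eigenvector of M for eigenvalue 1.
    w, V = np.linalg.eig(M.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    v = v / v.sum()
    return float(v @ np.array([b - c, -c, b, 0.0]))

# WSLS against itself sustains mutual cooperation: payoff b - c = 0.9.
print(payoff((1, 0, 0, 1), (1, 0, 0, 1)))
```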
To explore how players adapt their strategies over time, we use adaptive dynamics [52, 53]. Adaptive dynamics is a method to study deterministic evolutionary dynamics in a continuous strategy space. The idea is that the population is (mostly) homogeneous at any given time. Mutations generate a small ensemble of possible invaders, which are very close to the resident in strategy space. These invaders can take over the population if they receive a higher payoff against the resident than the resident achieves against itself. In the limit of infinitesimally small variation between resident and invader, we obtain an ordinary differential equation. For memory-one strategies this differential equation takes the form

  ṗij = ∂A(p, p') / ∂pij |_(p = p'),   with i, j ∈ {C, D}        (6)

That is, populations evolve in the direction of the payoff gradient. We derive an explicit representation of this differential equation in Methods. The resulting expression defines a flow on the cube [0, 1]^4. Our aim is to understand the properties of this flow.
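Continuing the sketch above, the right-hand side of Eq (6) can be approximated numerically by central finite differences with respect to the mutant strategy (again our own illustration, under the assumption that numerical gradients suffice for exploration):

```python
def adaptive_dynamics(p, b=1.0, c=0.1, eps=1e-6):
    """Approximate the right-hand side of Eq (6): the gradient of the
    mutant payoff A(., p), evaluated at the resident strategy p."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros(4)
    for k in range(4):
        up, dn = p.copy(), p.copy()
        up[k] += eps
        dn[k] -= eps
        grad[k] = (payoff(up, p, b, c) - payoff(dn, p, b, c)) / (2 * eps)
    return grad

# Local direction of selection at the random strategy:
print(adaptive_dynamics([0.5, 0.5, 0.5, 0.5]))
```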
Results
Structural properties of adaptive dynamics
We begin by describing two general properties of adaptive dynamics in the cube [0, 1]^4 of memory-one strategies. The first property is an invariance result. As we prove in Methods, the subspace of counting strategies is left invariant under adaptive dynamics. That is, if the initial population p(0) satisfies pCD(0) = pDC(0) and p(t) is a solution of the dynamics (6), then pCD(t) = pDC(t) for all times t. Therefore, if initially all population members only care about the number of cooperators, then the same is true for all future population members. This result does not require the specific payoffs of the donation game. Instead it is true for all symmetric 2 x 2 games. The result is useful because it allows us to decompose the space of memory-one strategies into three invariant sets: the set of strategies with pCD > pDC, with pCD = pDC, and with pCD < pDC. Each of these invariant subsets can be studied in isolation. In a subsequent section, we provide such an analysis for the counting strategies (with pCD = pDC) specifically.
As a second property, we observe an interesting symmetry between different orbits of adaptive dynamics. Specifically, if (pCC, pCD, pDC, pDD)(t) is a solution to (6) on some interval t ∈ (a, b), then so is (1 - pDD, 1 - pDC, 1 - pCD, 1 - pCC)(-t) on the interval t ∈ (-b, -a). This property implies that for every orbit forward in time, there is an associated orbit backward in time that exhibits the same dynamics. This result is specific to the donation game (or more precisely, to games with equal gains from switching). The formal proof of this symmetry is in Methods. In the following we provide an intuitive argument. To this end, consider the following series of transformations applied to the payoff matrix of a 2 x 2 game with equal gains from switching:

      C  D                                 C   D
  C ( R  S )   -- negating payoffs -->  C ( -R  -S )
  D ( T  P )                            D ( -T  -P )

               -- adding a constant --> C ( -R+(R+P)  -S+(S+T) )  =  C ( P  T )        (7)
                                        D ( -T+(S+T)  -P+(R+P) )     D ( S  R )

               -- exchanging C and D -->   C ( R  S )
                                           D ( T  P )

Notice that we started and ended at the same game; this property is equivalent to equal gains from switching. But now it is easy to see that solutions to the associated ordinary differential equation transform correspondingly as follows,

  (pCC, pCD, pDC, pDD)(t)   -- negating payoffs -->   (pCC, pCD, pDC, pDD)(-t)
                            -- adding a constant -->  (pCC, pCD, pDC, pDD)(-t)        (8)
                            -- exchanging C and D --> (1 - pDD, 1 - pDC, 1 - pCD, 1 - pCC)(-t)
The upshot of this duality is that solutions to adaptive dynamics come in related pairs. We
will see expressions of this duality in several of the figures below.
Fig 1. Local adaptive dynamics for memory-one strategies. For a 9 x 9 x 9 x 9-grid (= 6561 points) we show the direction of change in terms of the sign of each component of (ṗCC, ṗCD, ṗDC, ṗDD) as given by Eq (6). The possibilities are shown on the right. We observe that for 1424 points all four components are positive, ++++. For 3269 points all four components are negative, ----. Seven combinations do not occur. These combinations fall into one or both of the following categories: (i) ṗCC is negative and ṗDC is positive, and (ii) ṗDD is negative and ṗCD is positive. Both combinations are forbidden. Because of the symmetry (8) there are three pairs where each combination occurs as often as its partner. One such pair is ++-+ and +-++ (each occurring 353 times). The configuration +--+ is its own mirror image and therefore a singleton (occurring 536 times). The reason for the symmetry in the plot is explained in the main text. Let σ: [0, 1]^4 → [0, 1]^4 be defined by σ(pCC, pCD, pDC, pDD) = (1 - pDD, 1 - pDC, 1 - pCD, 1 - pCC). If abcd are the signs at p, then dcba are the signs at σ(p). σ acts by reflection about the dotted diagonal line shown. Finally, eight points are critical points with (ṗCC, ṗCD, ṗDC, ṗDD) = (0, 0, 0, 0). Two points are zero in one but not all of the four components. The graph is created for c = 0.1.
https://doi.org/10.1371/journal.pcbi.1010987.g001
As we show in Methods, the interior critical points of (6) are exactly the strategies that satisfy

  b (pCC - pCD) - c (1 - pCC + pDC) = 0   and   pCC + pDD = pCD + pDC        (9)

In particular, the set of interior critical points forms a two-dimensional plane within the four-dimensional cube. As we will show in Methods, (9) implies certain bounds on pCC and pDD among the interior critical points: pCC > c/b and pDD < 1 - c/b.
By definition, critical points satisfy a local condition, ṗij = 0 for all i, j ∈ {C, D}. However, it turns out that the critical points identified above have a shared global property. The points that satisfy (9) coincide with the equalizer strategies that have been described earlier [56, 57]. An equalizer is a strategy p such that A(p', p) is constant, irrespective of p'. Every such strategy must be a critical point of adaptive dynamics. Our result shows that the converse is also true: every interior critical point of the system (6) is an equalizer.
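This equivalence is easy to probe numerically. The sketch below (ours, continuing the earlier snippets) constructs an interior critical point from chosen values of pCC and pDD, using the explicit solution for pCD and pDC derived in Methods (Eq 27), and confirms the defining equalizer property that A(p', p) does not depend on p':

```python
def equalizer(pCC, pDD, b=1.0, c=0.1):
    """Interior critical point with prescribed pCC and pDD, cf. Eq (27) in Methods."""
    pCD = (b * pCC - c * (1 + pDD)) / (b - c)
    pDC = (c * (1 - pCC) + b * pDD) / (b - c)
    return (pCC, pCD, pDC, pDD)

p_eq = equalizer(0.8, 0.3)  # respects the bounds pCC > c/b and pDD < 1 - c/b
rng = np.random.default_rng(0)
# Five random opponents all receive the same payoff against the equalizer.
print([round(payoff(q, p_eq), 6) for q in rng.random((5, 4))])
```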
We can also examine what happens on the boundary of the strategy space. For our analysis, we define the boundary B([0, 1]^4) to be the set of all points p ∈ [0, 1]^4 with exactly one entry pij ∈ {0, 1}. That is, we exclude corner and edge points. What remains is a set of eight 3-dimensional cubes. We call a point p ∈ B([0, 1]^4) saturated if pij = 0 implies ṗij ≤ 0 and pij = 1 implies ṗij ≥ 0. A point is called strictly saturated if the above inequalities are strict. A point is unsaturated if it is not saturated. Orbits that start at an unsaturated point move into the interior of the strategy space. Conversely, every strictly saturated point is the limit, forward in time, of some trajectory in the interior.
For memory-one strategies, all eight boundary faces contain both saturated and unsaturated points for some values of 0 < c < b (Fig 2). In the following, we discuss in more detail the boundary face for which mutual cooperation is absorbing (that is, the boundary face with pCC = 1). On this boundary face, the population obtains the socially optimal payoff of b - c, irrespective of the specific values of pCD, pDC, pDD. As a result, we show in Methods that the time derivatives with respect to these components vanish, ṗCD = ṗDC = ṗDD = 0. The saturated points on the face pCC = 1 are exactly those that satisfy ṗCC ≥ 0, which yields the condition
    (1 - pCD) (1 - (1 - pDC)(pCD - pDD) - (pDC - pDD)^2)
  --------------------------------------------------------------------------  ≥  c / b        (10)
  (1 - pCD)^2 (1 - pDC) + pDC pDD (2 - pDD) + (1 - pCD)(1 - pDC)(pDC + pDD)
This set of saturated points contains all cooperative memory-one Nash equilibria, which have been characterized by [38] as the set of all strategies p that satisfy pCC = 1 and

  (1 - pCD) / pDD  ≥  c / (b - c)   and   (1 - pCD) / pDC  ≥  c / b        (11)
We note that the conditions (11) are stricter than the conditions (10). Put another way, a boundary point can be a local maximum of the payoff function against itself without being a global maximum.
In a similar way, one can also characterize the saturated points on the boundary face with pDD = 0, where mutual defection is absorbing. We depict the set of saturated points on this face in the bottom row of Fig 2, together with the previously discussed set of saturated points with pCC = 1 in the top row. As the figure suggests, the two sets exactly complement each other. For every point that is strictly saturated on the boundary face pCC = 1 there is a corresponding point on the face pDD = 0 that is unsaturated. Of course, this correspondence is again a consequence of the symmetry described earlier.
After describing the critical points in the interior and the saturated points on the boundary, we explore the 'typical' behavior of interior trajectories. To this end, we record the end behavior of solutions p(t) to Eq (6) beginning at various initial conditions p(0). Dynamics are assumed to cease at the boundary of the strategy space. This behavior can be calculated numerically. The results, for a 9 x 9 x 9 x 9 grid of initial conditions and cost-to-benefit ratio c/b = 0.1, are shown in Fig 3. There are 6561 initial conditions. Out of those, 1835 points are observed to end at full cooperation (pCC = 1), 1375 points at full defection (pDD = 0), 2964 points at other places on the boundary, and 387 at interior critical points (equalizers). Unlike in Fig 1, we do not observe the symmetry described in Eqs (7) and (8). The choice of depicting the forward direction of time breaks the symmetry.
Fig 2. Saturated points on the boundary of memory-one strategies. The boundary of the set of memory-one strategies consists of eight three-dimensional faces with pij = 0 or pij = 1 for exactly one pair of i, j ∈ {C, D}. We omit points (pCC, pCD, pDC, pDD) for which more than one pij is 0 or 1. Thus, the eight boundary faces do not intersect. A point p on the boundary is saturated if the payoff gradient does not point into the interior of the cube. We show the set of saturated points on all eight boundary faces. Because of the symmetry described by Eqs (7) and (8), these eight sets of points fit together in four complementary pairs, like the curved pieces of a three-dimensional puzzle. The boundary face pij = 0 is paired with the face pīj̄ = 1, where a bar refers to the opposite action (C̄ = D and D̄ = C). The paired boundary faces fit together after a rotation of one of them by 180° about the line parameterized by (t, 1/2, 1 - t). Parameter c = 0.1.
https://doi.org/10.1371/journal.pcbi.1010987.g002
Fig 3. Long-time limits of adaptive dynamics of memory-one strategies. For a 9 x 9 x 9 x 9-grid of starting points (= 6561 points), we show the limit lim_{t→∞} p(t) of a solution p(t) to Eq (6). Dynamics are assumed to cease at the boundary of the strategy space. Generically, there are 4 possibilities, as shown in the legend. For 1835 points, the trajectory p(t) evolves to full cooperation, defined by pCC = 1 (blue). For 1375 points, the trajectory p(t) evolves to full defection, defined by pDD = 0 (red). The remaining points either evolve into other regions of the boundary (green) or approach interior critical points, which are equalizers (yellow). The symmetry described in the main text does not manifest in this plot, but reappears when we juxtapose the plot with the corresponding plot for reversed time. Parameter c = 0.1.
https://doi.org/10.1371/journal.pcbi.1010987.g003
Adaptive dynamics of counting strategies
We now consider the space of counting strategies in its own right. We write a counting strategy as a vector q = (q2, q1, q0) ∈ [0, 1]^3, where qi is the probability to cooperate if i of the two players cooperated in the previous round; the embedding into the memory-one strategies is (q2, q1, q0) ↦ (q2, q1, q1, q0). Analogous to (6), the adaptive dynamics in this space is

  q̇i = ∂A(q, q') / ∂qi |_(q = q'),   with i ∈ {0, 1, 2}        (12)

This dynamics among counting strategies is not identical to the previously considered dynamics among memory-one strategies, even when the starting population is taken from the invariant subset with pCD = pDC. Instead, differences arise because the embedding [0, 1]^3 → [0, 1]^4 is not distance-preserving with the standard metric on each space. As a result, the gradient of the payoff function is computed slightly differently in the two spaces: specifically, the memory-one adaptive dynamics (6) restricted to the subspace of counting strategies differs from the dynamics (12) by a factor of 2 in q̇1(t). The analysis in this section is thus not meant to characterize the orbits of the invariant subspace of counting strategies within the memory-one strategies. Rather, we consider the space of counting strategies [0, 1]^3 as an interesting space in its own right, which we analyze in the following.
In a first step, we reproduce Fig 1 for the case of counting strategies. In Fig 1, counting strategies correspond to the points on the diagonal pCD = pDC of each subpanel. Fig 4 is the analog of Fig 1 for counting strategies, where we plot the signs of the components of (q̇2, q̇1, q̇0) at each counting strategy. As one may expect, these combinations again come in pairs, where abc is paired with cba. Some combinations, such as +++, are self-paired.
Similar to the memory-one strategies, we also want to characterize the set of interior critical points of the system (12). In Methods, we show that these points can be parametrized by

  ( t + c/(b+c), t, t - c/(b+c) ),   with t ∈ ( c/(b+c), b/(b+c) )        (13)

Hence the set of interior critical points forms a straight line segment. The boundary points of this line segment are

  ( 2c/(b+c), c/(b+c), 0 )   and   ( 1, b/(b+c), (b-c)/(b+c) )        (14)

The length of this line segment is √3 (b-c)/(b+c), which ranges from √3 (the diagonal of the cube) to 0, as c/b ranges from 0 to 1. We can classify the stability of the critical points by finding their associated eigenvalues. The complete results are shown in Fig 5. Five generic types of critical points are present as we vary the cost-to-benefit ratio: source, spiral source, spiral sink, sink, and saddle.
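The line (13) can be checked numerically by restricting the payoff computation to the counting space. The sketch below (our illustration, continuing the earlier snippets) embeds q = (q2, q1, q0) as the memory-one strategy (q2, q1, q1, q0), takes the gradient in the three q-coordinates, and verifies that a point on the line (13) is critical; a numerical Jacobian at such points would reproduce the eigenvalue classification of Fig 5.

```python
def payoff_counting(q, qp, b=1.0, c=0.1):
    """A(q, q') for counting strategies, via the embedding
    (q2, q1, q0) -> (q2, q1, q1, q0) into the memory-one strategies."""
    return payoff((q[0], q[1], q[1], q[2]),
                  (qp[0], qp[1], qp[1], qp[2]), b, c)

def counting_dynamics(q, b=1.0, c=0.1, eps=1e-6):
    """Approximate (qdot2, qdot1, qdot0) of the counting dynamics (12)."""
    q = np.asarray(q, dtype=float)
    grad = np.zeros(3)
    for k in range(3):
        up, dn = q.copy(), q.copy()
        up[k] += eps
        dn[k] -= eps
        grad[k] = (payoff_counting(up, q, b, c)
                   - payoff_counting(dn, q, b, c)) / (2 * eps)
    return grad

b, c, t = 1.0, 0.1, 0.5
q_star = (t + c / (b + c), t, t - c / (b + c))  # a point on the line (13)
print(counting_dynamics(q_star))                # approximately (0, 0, 0)
```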
In addition to these interior critical points, Fig 6 also depicts the critical points on the boundary faces B([0, 1]^3). Using the terminology of the previous section, these critical points are saturated without being strictly saturated. On each boundary face, the respective curve thus separates the region of strictly saturated points from the unsaturated points. Because of the aforementioned symmetry of solutions, the set of boundary critical points is symmetric under the transformation (x, y, z) ↦ (1 - z, 1 - y, 1 - x). We note that counting strategies have
Fig 4. Local adaptive dynamics for counting strategies. On a 9 x 9 x 9 x 9-grid representing the space of memory-one strategies, we depict the 729 points which are counting strategies (defined by pCD = pDC). They are colored according to their direction of change in terms of the sign of each component of (q̇2, q̇1, q̇0). Generically, there are eight possibilities as shown in the legend. We observe that for 156 points all three components are positive, +++, while for 373 points all three components are negative, ---. Three combinations do not occur: -+-, -++, and ++-. These are combinations in which q̇2 or q̇0 is negative while q̇1 is positive; such combinations are forbidden. Because of the symmetry derived in the main text there is a symmetric pair, +-- and --+, each occurring 29 times. The configuration +-+ is its own mirror image and therefore a singleton (occurring 142 times). Parameter c = 0.1.
https://doi.org/10.1371/journal.pcbi.1010987.g004
boundary properties unshared by memory-one strategies. For example, every boundary point
with q1 = 0 is saturated. Conversely, every boundary point with q1 = 1 is unsaturated.
To explore the dynamics in the interior, Fig 7 depicts the end behavior of solutions q(t) to Eq (12) with initial conditions on an evenly spaced grid (analogous to Fig 3). Again, dynamics are assumed to cease at the boundary. We observe that out of 729 initial points, 190 evolve to full cooperation, 140 evolve to full defection, 229 evolve to other places on the boundary, and 170 evolve to interior critical points. The overall abundance of the four outcomes is thus similar to the respective numbers in the space of all memory-one strategies, with the exception that now more orbits converge to interior critical points.
Fig 5. Classification of interior critical points in the space of counting strategies. We show the line of interior critical points in the space of counting strategies for five values of c. The line is colored according to the type of each critical point, which is determined by the eigenvalues of the linearization of the system (12) at this point. We observe all five generic types: source, spiral source, sink, spiral sink, and saddle. The complete classification is shown in the lower right panel. Each interior critical point is an equalizer (see main text). The line is parameterized by (t + c/(1 + c), t, t - c/(1 + c)) as t ranges over the interval (c/(1 + c), 1/(1 + c)) (setting b = 1). The symmetry described in the main text is manifest in this figure. The transformation σ: (x, y, z) ↦ (1 - z, 1 - y, 1 - x) carries the line of critical points to itself. It exchanges sinks and sources, spiral sinks and spiral sources, and saddle points with other saddle points.
https://doi.org/10.1371/journal.pcbi.1010987.g005
Fig 6. Interior and boundary critical points in the space of counting strategies. For four values of c, we show the line of interior critical points
(green) and the boundary critical points (black) in the space of counting strategies. The boundary critical points consist of three pieces: the edge defined
by q0 = 0 and q2 = 1 (i.e. the intersection of full cooperation and full defection) and two separate curves on the faces q0 = 0 and q2 = 1. For example, the
strategy GRIM = (1, 0, 0) is a boundary critical point. The symmetry described in the main text is visible in the rotational symmetry of the set of critical
points.
https://doi.org/10.1371/journal.pcbi.1010987.g006
Fig 7. Long-time limits of adaptive dynamics of counting strategies. On a 9 x 9 x 9 x 9-grid representing the space of memory-one strategies, we depict the 729 points which are counting strategies (defined by pCD = pDC). They are colored according to the limit lim_{t→∞} q(t) of a solution q(t) to Eq (12), with starting value q(0) in the grid. Dynamics are assumed to cease at the boundary of the strategy space. Generically, there are 4 possibilities as shown in the legend. For 190 points the trajectory q(t) evolves to full cooperation, defined by q2 = 1 (blue). For 140 points the trajectory q(t) evolves to full defection, defined by q0 = 0 (red). The remaining points either evolve into other regions of the boundary (green) or approach interior critical points, which are equalizers (yellow). This figure is not a simple restriction of Fig 3 because the restriction of Eq (6) differs from Eq (12) by a factor of 2. Parameter c = 0.1.
https://doi.org/10.1371/journal.pcbi.1010987.g007
Fig 8. Trajectories of adaptive dynamics of counting strategies. We consider four different initial conditions. We
plot the solutions q(t) to Eq (12) on the left, colored by hue and marked with arrowheads to indicate the direction of
evolution in the strategy space. On the right, we plot the cooperation rate C(q(t)), which is a real number between zero
(full defection) and one (full cooperation). Each of the initial conditions leads to a different behavior. In the first row,
for an initial condition q(0) = (1, 1, 0.8), the cooperation rate decreases monotonically from one to zero. In the second
row, for q(0) = (0.6833, 0.85, 0), the cooperation rate increases monotonically from zero to one. In the third row, for
q(0) = (0.6, 0.5, 0), the cooperation rate increases from zero to an intermediate value before decreasing and then
increasing again to one. Finally, in the last row, for q(0) = (0.6667, 0.75, 0), the cooperation rate increases from zero
before oscillating and converging to an intermediate value. The last two orbits loop around the line of interior critical
points, shown in black. Parameter c = 0.1.
https://doi.org/10.1371/journal.pcbi.1010987.g008
We can also plot a few solutions q(t) of Eq (12) in three dimensions to give an idea of the possible behaviors. Four types of behavior are shown in Fig 8. Alongside plots of the trajectory q(t) we depict the cooperation rate C(q(t)), defined as the average rate of cooperation in a large population playing the respective strategy. Previous studies show that these cooperation rates change monotonically when players are restricted to use reactive strategies (those with pCC = pDC and pCD = pDD, see [1]). Within the counting strategies, this monotonicity is violated in the third and fourth examples, and the fourth converges to intermediate cooperation rather than full cooperation or full defection.
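The trajectories of Fig 8 can be reproduced qualitatively by integrating the gradient sketched earlier with an off-the-shelf ODE solver (our illustration; the initial condition is nudged slightly off the face q0 = 0 so that the stationary distribution remains unique):

```python
from scipy.integrate import solve_ivp

def cooperation_rate(q, b=1.0, c=0.1):
    """Self-cooperation rate C(q) = A(q, q) / (b - c)."""
    return payoff_counting(q, q, b, c) / (b - c)

# Third initial condition of Fig 8, nudged slightly into the interior.
sol = solve_ivp(lambda t, q: counting_dynamics(q),
                (0.0, 3000.0), [0.6, 0.5, 1e-3], max_step=10.0)
samples = range(0, sol.y.shape[1], 50)
print([round(cooperation_rate(sol.y[:, i]), 3) for i in samples])
```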
Discussion
Moreover, as our analysis of the saturated boundary points makes apparent, there is a strictly larger set of memory-one strategies that can maintain cooperation.
We believe these results give a more rigorous understanding of the properties of memory-one strategies. At the same time we hope that similar techniques can be used to explore other games and more general strategy spaces.
Methods
Adaptive dynamics of memory-one strategies
Derivation of the adaptive dynamics. In the main text, we have described how to define the payoff of two players with memory-one strategies by representing the game as a Markov chain. However, to derive the adaptive dynamics, it is useful to start with an alternative representation of the payoffs. As shown by [56], the payoff expression (5) can be rewritten as
                 ( -1 + pCC p'CC   -1 + pCC   -1 + p'CC   R )
             det ( pCD p'DC        -1 + pCD   p'DC        S )
                 ( pDC p'CD        pDC        -1 + p'CD   T )
                 ( pDD p'DD        pDD        p'DD        P )
  A(p, p') = -------------------------------------------------        (15)
                 ( -1 + pCC p'CC   -1 + pCC   -1 + p'CC   1 )
             det ( pCD p'DC        -1 + pCD   p'DC        1 )
                 ( pDC p'CD        pDC        -1 + p'CD   1 )
                 ( pDD p'DD        pDD        p'DD        1 )
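As a cross-check of this formula (our own sketch, continuing the earlier snippets), the determinant quotient can be transcribed directly and compared against the stationary-distribution payoff:

```python
def payoff_det(p, q, b=1.0, c=0.1):
    """Payoff A(p, q) via the determinant formula (15)."""
    pCC, pCD, pDC, pDD = p
    qCC, qCD, qDC, qDD = q

    def D(f):  # f is the last column: the p-player's one-round payoffs
        return np.linalg.det(np.array([
            [-1 + pCC * qCC, -1 + pCC, -1 + qCC, f[0]],
            [pCD * qDC, -1 + pCD, qDC, f[1]],
            [pDC * qCD, pDC, -1 + qCD, f[2]],
            [pDD * qDD, pDD, qDD, f[3]]]))

    return D([b - c, -c, b, 0.0]) / D([1.0, 1.0, 1.0, 1.0])

rng = np.random.default_rng(1)
p, q = rng.random(4), rng.random(4)
print(payoff_det(p, q), payoff(p, q))  # the two values agree
```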
Using this representation, we can write out the expression for adaptive dynamics (6) in full. To this end, it is convenient to multiply the resulting system by the common denominator (1 - pCD + pDC) r(pCC, pCD, pDC, pDD)^2. This denominator is positive in the interior (0, 1)^4 of the strategy space. Hence, multiplying by the denominator only affects the timescale of evolution, but not the direction of the trajectories. After applying this modification to the system (6), the dynamics among the memory-one strategies of the donation game takes the following form,
  ṗCC = f1(pCD, pDC, pDD) · [ b·g1(pCC, pCD, pDC, pDD) + c·h1(pCC, pCD, pDC, pDD) ]
  ṗCD = f2(pCC, pDD) · [ b·g2(pCC, pCD, pDC, pDD) + c·h2(pCC, pCD, pDC, pDD) ]        (17)
  ṗDC = f3(pCC, pDD) · [ b·g3(pCC, pCD, pDC, pDD) + c·h3(pCC, pCD, pDC, pDD) ]
  ṗDD = f4(pCC, pCD, pDC) · [ b·g4(pCC, pCD, pDC, pDD) + c·h4(pCC, pCD, pDC, pDD) ]
Here, the auxiliary functions fi, gi, hi for i ∈ {1, 2, 3, 4} are defined as follows. The functions for i ∈ {3, 4} are obtained from those for i ∈ {1, 2} via

  f3(x, w) = f2(1 - w, 1 - x)
  g3(x, y, z, w) = g2(1 - w, 1 - z, 1 - y, 1 - x)
  h3(x, y, z, w) = h2(1 - w, 1 - z, 1 - y, 1 - x)
  f4(x, y, z) = f1(1 - z, 1 - y, 1 - x)        (18)
  g4(x, y, z, w) = g1(1 - w, 1 - y, 1 - z, 1 - x)
  h4(x, y, z, w) = h1(1 - w, 1 - z, 1 - y, 1 - x)
Note that we can write fi, gi, hi for i ∈ {3, 4} in terms of the same functions for i ∈ {1, 2}. This is a consequence of the symmetry we discuss later.
Invariance of counting strategies. Using the representation (17) and (18), it becomes straightforward to show that the space of memory-one counting strategies remains invariant under adaptive dynamics.
Proposition 1. Let C denote the three-dimensional subspace of counting strategies among the memory-one strategies,

  C := { p ∈ [0, 1]^4 | pCD = pDC }        (19)

Then C is invariant under adaptive dynamics. That is, if p(t) is a solution of Eq (17) with p(0) ∈ C, then p(t) ∈ C for all t.
Proof. Let d := pCD - pDC. Along solutions of (17),

  ḋ = ṗCD - ṗDC = f2(pCC, pDD) (b - c) (1 - pCD + pDC) (pCD - pDC) (pCC - pCD - pDC + pDD)        (21)

Since the right-hand side is proportional to d = pCD - pDC, the set where d = 0 is invariant.
That is, if one takes the line segment between p and p̃ = (1 - pDD, 1 - pDC, 1 - pCD, 1 - pCC), then the midpoint of this line segment is in P. The plane P is exactly the set of points that are mapped onto themselves. Every point is mapped onto itself if the transformation is applied twice. It can be checked directly that the transformation p ↦ p̃ maps critical points to critical points (see next subsection), and the previous proposition means that it interchanges points which are limits forward in time and points which are limits backward in time.
Characterization of the interior critical points. Proposition 3. A strategy p in the interior (0, 1)^4 is a critical point of the system (17) if and only if

  b (pCC - pCD) - c (1 - pCC + pDC) = 0   and   pCC + pDD = pCD + pDC        (23)

Proof. (⟹) At an interior critical point all ṗij vanish. Because the functions fi do not vanish in the interior, the following combinations must vanish as well:

  0 = ṗCC / f1(pCD, pDC, pDD) + ṗCD / f2(pCC, pDD)

  0 = ṗCC / f1(pCD, pDC, pDD) + ṗDC / f3(pCC, pDD)
    = (b - c) (pCC - pCD) (pCC + pDD - pCD - pDC) (1 - pCD + pDC)        (24)

  0 = ṗCC / f1(pCD, pDC, pDD) - ṗDD / f4(pCC, pCD, pDC)
Since 1 - pCD + pDC > 0 for pCD, pDC ∈ (0, 1), either pCC = pCD = pDC = pDD or pCC + pDD = pCD + pDC must hold. Note that if pCC = pCD = pDC = pDD, then pCC + pDD = pCD + pDC holds trivially. Hence, in both cases we have the identity pDD = pCD + pDC - pCC, which we can plug into ṗCC / f1(pCD, pDC, pDD) to get

  ṗCC / f1(pCD, pDC, pDD) = ( b (pCD - pCC) + c (1 - pCC + pDC) ) · ( -1 + (pCD - pCC)^2 + (pDC - pCC)^2 )        (25)

It is verified without too much difficulty that whenever the second factor vanishes in (0, 1)^3, then pCD + pDC - pCC ∉ (0, 1). Any interior critical point of (17) thus needs to satisfy
  b (pCC - pCD) - c (1 - pCC + pDC) = 0   and   pCC + pDD = pCD + pDC        (26)
(⟸) If a strategy satisfies the conditions (26), we can express pCD and pDC in terms of pCC and pDD,

  pCD = ( b pCC - c (1 + pDD) ) / (b - c)   and   pDC = ( c (1 - pCC) + b pDD ) / (b - c)        (27)

Inserting these expressions into the system (17) yields, after some algebraic manipulations, ṗCC = ṗCD = ṗDC = ṗDD = 0.
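Numerically, this direction of the argument amounts to a one-line check with the earlier sketches: the gradient (6) vanishes at any strategy constructed via (27).

```python
# The construction (27) indeed yields critical points of the dynamics (6):
print(adaptive_dynamics(equalizer(0.8, 0.3)))  # approximately (0, 0, 0, 0)
```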
Solving the two conditions (26) instead for pCC and pDD gives

  pCC = ( c + b pCD + c pDC ) / (b + c)   and   pDD = ( -c + c pCD + b pDC ) / (b + c)        (28)

Using (28), the constraint pDD > 0 becomes pDC > (c/b)(1 - pCD). When we plug this back into the expression for pCC and use the fact that pCD > 0, we get pCC > c/b. Similarly, the constraints pCC < 1 and pDC < 1 lead to pDD < 1 - c/b. The result is that we have two useful bounds, pCC > c/b and pDD < 1 - c/b, among the interior critical points.
We now relate the interior critical points to the equalizer strategies discussed by [57] and [56].
Definition. An equalizer is a strategy p for which A(p', p) is a constant function of p'.
It follows from the definition that every equalizer strategy is a critical point of the dynamics (17). In the interior (0, 1)^4, the converse is also true. That is,
Proposition 4. Every interior critical point of the system (17) is an equalizer.
Proof. Our condition for critical points (27) coincides with the expression for equalizers, Eq. (8) in [56], when using the payoffs of the donation game.
As shown by [39], equalizers are the only Nash equilibria among the stochastic memory-one strategies. Thus our above results can be summarized as follows. In the donation game, an interior point is a critical point of adaptive dynamics if and only if it is a Nash equilibrium (such a result does not need to hold in general, because strategies might be locally stable critical points of adaptive dynamics without being global best responses to themselves, see [50]).
Analysis of the boundary faces. In the main text, we define the boundary of the strategy space [0, 1]^4 as the set of all (pCC, pCD, pDC, pDD) for which exactly one entry is in {0, 1}. Therefore there are eight different boundary faces. One particularly important face is the one with pCC = 1, which corresponds to a fully cooperative population. It follows from Eq (18) that on this boundary face f2(pCC, pDD) = f3(pCC, pDD) = f4(pCC, pCD, pDC) = 0. By Eq (17) we can then conclude that ṗCD = ṗDC = ṗDD = 0. A point p on this boundary face is saturated if and only if ṗCC ≥ 0. By Eq (17) and because f1(pCD, pDC, pDD) > 0, this condition is equivalent to b·g1(1, pCD, pDC, pDD) ≥ -c·h1(1, pCD, pDC, pDD), which yields condition (10).
The boundary face with pDD = 0 can be analyzed analogously.
Adaptive dynamics of counting strategies
One can compute the payoff of a q-player against a q'-player using the payoff formula (15), which yields

                 ( -1 + q2 q'2   -1 + q2   -1 + q'2   b - c )
             det ( q1 q'1        -1 + q1   q'1        -c    )
                 ( q1 q'1        q1        -1 + q'1   b     )
                 ( q0 q'0        q0        q'0        0     )
  A(q, q') = -------------------------------------------------        (29)
                 ( -1 + q2 q'2   -1 + q2   -1 + q'2   1 )
             det ( q1 q'1        -1 + q1   q'1        1 )
                 ( q1 q'1        q1        -1 + q'1   1 )
                 ( q0 q'0        q0        q'0        1 )
In the following we study the adaptive dynamics of counting strategies. Again, we consider a homogeneous population with strategy q, evolving in the direction of the gradient of the payoff function, now calculated in [0, 1]^3. Evolution in the space of counting strategies is thus given by

  q̇i = ∂A(q, q') / ∂qi |_(q = q')        (30)
To write out the adaptive dynamics Eq (30) in full, it is again convenient to multiply the equations by the common denominator r(q2, q1, q0)^2, with

  r(x, y, z) = (-1 + x)(-1 + y + (1 - 2y)(y - x)) + (2 - 2x^2 + 2y^2) z + (-1 + 2x - 2y) z^2        (31)

This denominator is nonzero in the interior (0, 1)^3 of the strategy space. After this rescaling, the system of Eq (30) becomes

  q̇2 = f2(q1, q0) · [ b·g2(q2, q1, q0) + c·h2(q2, q1, q0) ]
  q̇1 = f1(q2, q0) · [ b·g1(q2, q1, q0) + c·h1(q2, q1, q0) ]        (32)
  q̇0 = f0(q2, q1) · [ b·g0(q2, q1, q0) + c·h0(q2, q1, q0) ]
The functions for i = 0 are obtained from those for i = 2 via

  f0(x, y) = f2(1 - y, 1 - x)
  g0(x, y, z) = g2(1 - z, 1 - y, 1 - x)
  h0(x, y, z) = h2(1 - z, 1 - y, 1 - x)
Proof. Because f2, f1, f0 do not vanish in the interior of the strategy space (0, 1)^3, we can compute

  q̇1 / f1(q2, q0) + q̇0 / f0(q2, q1) = (b - c) (q0 - q1) (q2 - 2 q1 + q0),
                                                                                (35)
  q̇2 / f2(q1, q0) - q̇0 / f0(q2, q1) = (b - c) (q2 - q0) (q2 - 2 q1 + q0)

At a critical point we have q̇2 = q̇1 = q̇0 = 0, so the expressions on the right hand side must vanish. This implies q2 - 2 q1 + q0 = 0 or q2 = q1 = q0 (in which case q2 - 2 q1 + q0 = 0 holds trivially). So q1 = (q2 + q0)/2 is a necessary condition for the strategy q to be a critical point. To obtain a condition that is also sufficient, we take this expression for q1 and plug it into

  4 q̇1(q2, (q2 + q0)/2, q0) / f1(q2, q0) = ( b (q0 - q2) + c (2 + q0 - q2) ) · ( 2 - (q2 - q0)^2 )        (36)
This expression only vanishes when q2 - q0 = 2c/(b + c). The solutions to the conditions

  q2 + q0 = 2 q1   and   q2 - q0 = 2c/(b + c)        (37)

are parameterized by

  ( t + c/(b+c), t, t - c/(b+c) ),   t ∈ ( c/(b+c), b/(b+c) )        (38)
Conversely, it is easily checked that all of these strategies are critical points of (32).
Thus the interior critical points form a straight line segment in the interior of the cube, with boundary points (2c/(b+c), c/(b+c), 0) and (1, b/(b+c), (b-c)/(b+c)) and length √3 (b-c)/(b+c), which ranges from √3 (the diagonal of the cube) to 0 as c/b ranges from 0 to 1. We can classify the stability of these critical points by finding their associated eigenvalues. The results are complicated, but they are shown in Fig 5.
Restricted to the reactive strategies, the symmetry turns out to associate each trajectory to itself. That is, trajectories for reactive strategies do not come in pairs, as they do in the larger spaces of memory-one, memory-one counting, and higher-memory strategies.
In Fig 9, we plot the cooperative region for memory-one strategies (the region in which the self-cooperation rate is locally increasing). The corresponding region for reactive strategies is straightforward to describe [43]: If (pC, pD) is a player's probability to cooperate depending on the co-player's previous action (C or D), then the cooperative region consists of all points with pC - pD > c/b.
In a first step, we verify the invariance of counting strategies at the level of the Markov chain (4). Consider the variation δp = ε (eCD - eDC) of the focal player's strategy; the corresponding first-order variation of the transition matrix is

       (  0          0                0           0              )
  δM = (  ε p'DC     ε (1 - p'DC)     -ε p'DC     -ε (1 - p'DC)  )        (39)
       (  -ε p'CD    -ε (1 - p'CD)    ε p'CD      ε (1 - p'CD)   )
       (  0          0                0           0              )

Now suppose p and p' are equal and furthermore that pCD = pDC. Then vCD = vDC by symmetry, and vδM manifestly vanishes. It follows from the above that δvM = δv. Then δv is proportional to v by uniqueness of the stationary distribution. But we are also demanding that the sum of components of v + δv is 1. Thus δv = 0 and there is no variation in the payoff π(v). No player gains from deviating infinitesimally off the hypersurface pCD = pDC in adaptive dynamics, i.e. from departing the space C.
In a second step, we ask whether a similar invariance result applies to memory-n strategies.
With an argument similar to the one above, we can show that it applies at least in a restricted
way.
Our notation for memory-n strategies is best introduced by example: the component p(CDC; DDC) of a memory-3 strategy of player 1 denotes the probability of cooperation if the outcomes of the most recent three rounds were CD, DD, CC, in that order. Here the first index CDC lists player 1's own moves in these three rounds, and the second index DDC lists the co-player's moves.
Fig 9. Cooperative region for adaptive dynamics of memory-one strategies. For a 9 x 9 x 9 x 9-grid (= 6561 points) we show the points for which the cooperativity, or rate of self-cooperation, is locally increasing under the dynamics (6). The rate of self-cooperation of a strategy p can be calculated as A(p, p)/(b - c) using formula (15). We find that for 1876 points cooperativity is locally increasing; for 4677 points cooperativity is decreasing; and eight points are critical points with (ṗCC, ṗCD, ṗDC, ṗDD) = (0, 0, 0, 0). Note that, unlike the corresponding region for reactive strategies, trajectories beginning in the cooperative region can leave this region, and trajectories beginning outside of the cooperative region can enter it. We show examples of this in Fig 8. The graph is created for c = 0.1.
https://doi.org/10.1371/journal.pcbi.1010987.g009
Proposition 7. Consider the adaptive dynamics for memory-n strategies p and let s be a fixed arbitrary sequence of n - 1 moves for one player. Then the condition

  p(Cs; Ds) = p(Ds; Cs)        (40)

is preserved under adaptive dynamics whenever it holds for all such sequences s. To verify this, consider the variation

  δp = ε e(Cs; Ds) - ε e(Ds; Cs)        (41)
We can compute

  vδM = ε [  v(Cs; Ds) p'(Ds; Cs) - v(Ds; Cs) p'(Cs; Ds) ] e(sC; sC)
      + ε [  v(Cs; Ds) (1 - p'(Ds; Cs)) - v(Ds; Cs) (1 - p'(Cs; Ds)) ] e(sC; sD)        (43)
      + ε [ -v(Cs; Ds) p'(Ds; Cs) + v(Ds; Cs) p'(Cs; Ds) ] e(sD; sC)
      + ε [ -v(Cs; Ds) (1 - p'(Ds; Cs)) + v(Ds; Cs) (1 - p'(Cs; Ds)) ] e(sD; sD)
Now (40) applied to p', along with (44), imply that the right hand side of (43) vanishes. Since vδM = 0, our initial discussion means that δvM = δv. Therefore δv is proportional to v by uniqueness of the stationary distribution. Because the sum of components of v + δv is 1, we conclude that δv = 0. Hence there is no variation in the payoff π(v). No player gains from making the infinitesimal variation (41).
Author Contributions
Conceptualization: Philip LaPorte, Christian Hilbe, Martin A. Nowak.
Formal analysis: Philip LaPorte.
Supervision: Christian Hilbe, Martin A. Nowak.
Validation: Christian Hilbe, Martin A. Nowak.
Visualization: Philip LaPorte, Martin A. Nowak.
Writing – original draft: Philip LaPorte, Christian Hilbe, Martin A. Nowak.
Writing – review & editing: Philip LaPorte, Christian Hilbe, Martin A. Nowak.
References
1. Sigmund K. The Calculus of Selfishness. Princeton, NJ: Princeton Univ. Press; 2010.
2. Nowak MA. Evolutionary dynamics. Cambridge MA: Harvard University Press; 2006.
3. Nowak MA. Five rules for the Evolution of Cooperation. Science. 2006; 314:1560–1563. https://doi.org/
10.1126/science.1133755 PMID: 17158317
4. Trivers RL. The evolution of reciprocal altruism. The Quarterly Review of Biology. 1971; 46:35–57.
https://doi.org/10.1086/406755
5. Axelrod R, Hamilton WD. The evolution of cooperation. Science. 1981; 211:1390–1396. https://doi.org/
10.1126/science.7466396 PMID: 7466396
6. García J, van Veelen M. No strategy can win in the repeated prisoner's dilemma: Linking game theory and computer simulations. Frontiers in Robotics and AI. 2018; 5:102. https://doi.org/10.3389/frobt.2018.00102 PMID: 33500981
7. Hilbe C, Chatterjee K, Nowak MA. Partners and rivals in direct reciprocity. Nature Human Behaviour.
2018; 2(7):469–477. https://doi.org/10.1038/s41562-018-0342-3 PMID: 31097794
8. Glynatsi NE, Knight VA. A bibliometric study of research topics, collaboration and centrality in the field
of the Iterated Prisoner’s Dilemma. Humanities and Social Sciences Communications. 2021; 8:45.
https://doi.org/10.1057/s41599-021-00718-9
9. Rapoport A. Prisoner’s Dilemma. In: Eatwell J, Milgate M, Newman P, editors. Game Theory. Palgrave
Macmillan UK; 1989. p. 199–204.
10. Molander P. The optimal level of generosity in a selfish, uncertain environment. Journal of Conflict Res-
olution. 1985; 29:611–618. https://doi.org/10.1177/0022002785029004004
11. Nowak MA, Sigmund K. Tit for tat in heterogeneous populations. Nature. 1992; 355:250–253. https://
doi.org/10.1038/355250a0
12. Hauert C, Schuster HG. Effects of increasing the number of players and memory size in the iterated
Prisoner’s Dilemma: a numerical approach. Proceedings of the Royal Society B. 1997; 264:513–519.
https://doi.org/10.1098/rspb.1997.0073
13. Szabó G, Antal T, Szabó P, Droz M. Spatial evolutionary prisoner’s dilemma game with three strategies
and external constraints. Physical Review E. 2000; 62:1095–1103. https://doi.org/10.1103/PhysRevE.
62.1095 PMID: 11088565
14. Killingback T, Doebeli M. The continuous Prisoner’s Dilemma and the evolution of cooperation through
reciprocal altruism with variable investment. The American Naturalist. 2002; 160(4):421–438. https://
doi.org/10.1086/342070 PMID: 18707520
15. Grujic J, Cuesta JA, Sanchez A. On the coexistence of cooperators, defectors and conditional coopera-
tors in the multiplayer iterated prisoner’s dilemma. Journal of Theoretical Biology. 2012; 300:299–308.
https://doi.org/10.1016/j.jtbi.2012.02.003 PMID: 22530239
16. van Veelen M, García J, Rand DG, Nowak MA. Direct reciprocity in structured populations. Proceedings of the National Academy of Sciences USA. 2012; 109:9929–9934. https://doi.org/10.1073/pnas.1206694109 PMID: 22665767
17. van Segbroeck S, Pacheco JM, Lenaerts T, Santos FC. Emergence of fairness in repeated group inter-
actions. Physical Review Letters. 2012; 108:158104. https://doi.org/10.1103/PhysRevLett.108.158104
PMID: 22587290
18. García J, Traulsen A. The Structure of Mutations and the Evolution of Cooperation. PLoS One. 2012; 7:e35287. https://doi.org/10.1371/journal.pone.0035287 PMID: 22563381
19. Szolnoki A, Perc M. Defection and extortion as unexpected catalysts of unconditional cooperation in
structured populations. Scientific Reports. 2014; 4:5496. https://doi.org/10.1038/srep05496 PMID:
24975112
20. Szolnoki A, Perc M. Evolution of extortion in structured populations. Physical Review E. 2014;
89:022804. https://doi.org/10.1103/PhysRevE.89.022804
21. Yi SD, Baek SK, Choi JK. Combination with anti-tit-for-tat remedies problems of tit-for-tat. Journal of
Theoretical Biology. 2017; 412:1–7. https://doi.org/10.1016/j.jtbi.2016.09.017 PMID: 27670803
22. Knight V, Harper M, Glynatsi NE, Campbell O. Evolution reinforces cooperation with the emergence of
self-recognition mechanisms: An empirical study of strategies in the Moran process for the iterated pris-
oner’s dilemma. PLoS One. 2018; 13(10):e0204981. https://doi.org/10.1371/journal.pone.0204981
PMID: 30359381
23. Li J, Zhao X, Li B, Rossetti CSL, Hilbe C, Xia H. Evolution of cooperation through cumulative reciprocity.
Nature Computational Science. 2022; 2:677–686. https://doi.org/10.1038/s43588-022-00334-w
24. Murase Y, Hilbe C, Baek SK. Evolution of direct reciprocity in group-structured populations. Scientific
Reports. 2022; 12(1):18645. https://doi.org/10.1038/s41598-022-23467-4 PMID: 36333592
25. Nowak MA, Sigmund K. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner’s
Dilemma game. Nature. 1993; 364:56–58. https://doi.org/10.1038/364056a0 PMID: 8316296
26. Brauchli K, Killingback T, Doebeli M. Evolution of Cooperation in Spatially Structured Populations. Jour-
nal of Theoretical Biology. 1999; 200:405–417. https://doi.org/10.1006/jtbi.1999.1000 PMID: 10525399
27. Martinez-Vaquero LA, Cuesta JA, Sanchez A. Generosity pays in the presence of direct reciprocity: A
comprehensive study of 2x2 repeated games. PLoS ONE. 2012; 7(4):E35135. https://doi.org/10.1371/
journal.pone.0035135 PMID: 22529982
28. Stewart AJ, Plotkin JB. From extortion to generosity, evolution in the Iterated Prisoner’s Dilemma. Pro-
ceedings of the National Academy of Sciences USA. 2013; 110(38):15348–15353. https://doi.org/10.
1073/pnas.1306246110 PMID: 24003115
29. Glynatsi NE, Knight VA. Using a theory of mind to find best responses to memory-one strategies. Scien-
tific Reports. 2020; 10(1):1–9. https://doi.org/10.1038/s41598-020-74181-y PMID: 33057134
30. Schmid L, Hilbe C, Chatterjee K, Nowak MA. Direct reciprocity between individuals that use different
strategy spaces. PLoS Computational Biology. 2022; 18(6):e1010149. https://doi.org/10.1371/journal.
pcbi.1010149 PMID: 35700167
31. McAvoy A, Kates-Harbeck J, Chatterjee K, Hilbe C. Evolutionary instability of selfish learning in
repeated games. PNAS Nexus. 2022; 1(4):pgac141. https://doi.org/10.1093/pnasnexus/pgac141
PMID: 36714856
32. Montero-Porras E, Grujić J, Fernández Domingos E, Lenaerts T. Inferring strategies from observations
in long iterated prisoner’s dilemma experiments. Scientific Reports. 2022; 12:7589. https://doi.org/10.
1038/s41598-022-11654-2
33. Kraines DP, Kraines VY. Learning to cooperate with Pavlov: an adaptive strategy for the iterated prisoner's dilemma with noise. Theory and Decision. 1993; 35:107–150. https://doi.org/10.1007/BF01074955
34. Fischbacher U, Gächter S, Fehr E. Are people conditionally cooperative? Evidence from a public goods
experiment. Economic Letters. 2001; 71:397–404. https://doi.org/10.1016/S0165-1765(01)00394-9
35. Fischbacher U, Gächter S. Social preferences, beliefs, and the dynamics of free riding in public goods
experiments. American Economic Review. 2010; 100(1):541–556. https://doi.org/10.1257/aer.100.1.
541
36. Grujic J, Gracia-Lázaro C, Milinski M, Semmann D, Traulsen A, Cuesta JA, et al. A comparative analy-
sis of spatial Prisoner’s Dilemma experiments: Conditional cooperation and payoff irrelevance. Scien-
tific Reports. 2014; 4:4615. https://doi.org/10.1038/srep04615 PMID: 24722557
37. Akin E. What you gotta know to play good in the iterated prisoner’s dilemma. Games. 2015; 6(3):175–
190. https://doi.org/10.3390/g6030175
38. Akin E. The iterated prisoner’s dilemma: Good strategies and their dynamics. In: Assani I, editor. Ergo-
dic Theory, Advances in Dynamics. Berlin: de Gruyter; 2016. p. 77–107.
39. Stewart AJ, Plotkin JB. Collapse of cooperation in evolving games. Proceedings of the National Acad-
emy of Sciences USA. 2014; 111(49):17558–17563. https://doi.org/10.1073/pnas.1408618111 PMID:
25422421
40. Hilbe C, Traulsen A, Sigmund K. Partners or rivals? Strategies for the iterated prisoner’s dilemma. Games
and Economic Behavior. 2015; 92:41–52. https://doi.org/10.1016/j.geb.2015.05.005 PMID: 26339123
41. Donahue K, Hauser OP, Nowak MA, Hilbe C. Evolving cooperation in multichannel games. Nature
Communications. 2020; 11:3885. https://doi.org/10.1038/s41467-020-17730-3 PMID: 32753599
42. Park PS, Nowak MA, Hilbe C. Cooperation in alternating interactions with memory constraints. Nature
Communications. 2022; 13:737. https://doi.org/10.1038/s41467-022-28336-2 PMID: 35136025
43. Nowak MA, Sigmund K. The evolution of stochastic strategies in the prisoner’s dilemma. Acta Applican-
dae Mathematicae. 1990; 20:247–265. https://doi.org/10.1007/BF00049570
44. Imhof LA, Nowak MA. Stochastic evolutionary dynamics of direct reciprocity. Proceedings of the Royal
Society B. 2010; 277:463–468. https://doi.org/10.1098/rspb.2009.1171 PMID: 19846456
45. Allen B, Nowak MA, Dieckmann U. Adaptive dynamics with interaction structure. American Naturalist.
2013; 181(6):E139–E163. https://doi.org/10.1086/670192 PMID: 23669549
46. Reiter JG, Hilbe C, Rand DG, Chatterjee K, Nowak MA. Crosstalk in concurrent repeated games
impedes direct reciprocity and requires stronger levels of forgiveness. Nature Communications. 2018;
9:555. https://doi.org/10.1038/s41467-017-02721-8 PMID: 29416030
47. McAvoy A, Nowak MA. Reactive learning strategies for iterated games. Proceedings of the Royal Soci-
ety A. 2019; 475:20180819. https://doi.org/10.1098/rspa.2018.0819 PMID: 31007557
48. Chen X, Fu F. Outlearning extortioners by fair-minded unbending strategies. arXiv:2201.04198 [preprint]. 2022.
49. Brandt H, Sigmund K. The good, the bad and the discriminator—Errors in direct and indirect reciprocity.
Journal of Theoretical Biology. 2006; 239:183–194. https://doi.org/10.1016/j.jtbi.2005.08.045 PMID:
16257417
50. Stewart AJ, Plotkin JB. The evolvability of cooperation under local and non-local mutations. Games.
2015; 6(3):231–250. https://doi.org/10.3390/g6030231
51. Chen X, Wang L, Fu F. The intricate geometry of zero-determinant strategies underlying evolutionary
adaptation from extortion to generosity. New Journal of Physics. 2022; 24:103001. https://doi.org/10.
1088/1367-2630/ac932d
52. Geritz SAH, Metz JAJ, Kisdi E, Meszéna G. Dynamics of Adaptation and Evolutionary Branching. Phys-
ical Review Letters. 1997; 78(10):2024–2027. https://doi.org/10.1103/PhysRevLett.78.2024
53. Hofbauer J, Sigmund K. Evolutionary Games and Population Dynamics. Cambridge, UK: Cambridge
University Press; 1998.
54. Wakiyama M, Tanimoto J. Reciprocity phase in various 2×2 games by agents equipped with two-mem-
ory length strategy encouraged by grouping for interaction and adaptation. Biosystems. 2011; 103
(1):93–104. https://doi.org/10.1016/j.biosystems.2010.10.009 PMID: 21035518
55. Miyaji K, Tanimoto J, Wang Z, Hagishima A, Ikegaya N. Direct reciprocity in spatial populations enhances R-reciprocity as well as ST-reciprocity. PLoS ONE. 2013; 8:e71961. https://doi.org/10.1371/journal.pone.0071961 PMID: 23951272
56. Press WH, Dyson FJ. Iterated Prisoner's Dilemma contains strategies that dominate any evolutionary opponent. PNAS. 2012; 109:10409–10413. https://doi.org/10.1073/pnas.1206569109 PMID: 22615375
57. Boerlijst MC, Nowak MA, Sigmund K. Equal pay for all prisoners. American Mathematical Monthly.
1997; 104:303–307. https://doi.org/10.1080/00029890.1997.11990641