Unbending Strategies Shepherd Cooperation and Suppress Extortion in Spatial Populations
Introduction
a benefit b while a defector does nothing, the payoff matrix becomes R = b − c, S = −c,
T = b, and P = 0. Prior studies have quantified how network structure, in particular
the average degree k, impacts the evolution of cooperation, requiring b/c > k [21].
This celebrated result for the evolution of cooperation on networks states that
cooperators are able to prevail and be favored over defectors because of the network
clustering effect in structured populations. More generally, the impact of population
structure on strategy competition can be captured by the condition σR + S > T + σP:
cooperators are more abundant than defectors whenever it holds, with the coefficient σ
summarizing the effect of population structure [29, 30].
Extending this formula to matrix games with multiple strategies yields similar conditions
that can be obtained under the mutation-selection equilibrium [31, 32, 33]. Besides these
analytical insights, there has been significant interest in studying games on networks
from the perspective of statistical physics [5], with numerous contributions from the
field (for a recent review, for example, see Ref. [34]).
Beyond simple strategies, a surge of curiosity has arisen in exploring the
evolutionary dynamics of prescribed IPD strategies in well-mixed [35, 36] and structured
populations [37, 38, 39, 40, 41]. Part of these efforts is to study extortionate ZD
from an evolutionary perspective [35, 36, 42, 43]. The intuition is that the lack of
mutual cooperation among extortionate ZD players makes them evolutionarily unfavorable.
However, prior work has revealed the impact of population size on the evolutionary
advantage of extortionate ZD [35], alongside more general classifications of IPD
strategies [44, 45, 46]. Interestingly, previous work found that extortionate ZD can
act as a catalyst for cooperation [35, 37], and simulations of a handful of IPD strategies
show that this holds both in well-mixed populations and on networks [38, 39, 40, 41].
Despite much progress made in this direction, it remains largely unknown how the
resilience of cooperation can be bolstered in the wake of increased extortion factors by
extortionate ZD aimed at securing even more inequality. Understanding the potential
interplay between spatial reciprocity and direct reciprocity is especially important in
light of recently discovered unbending strategies.
To address this issue, we use the IPD game with conventional payoff values (namely,
R = 3, S = 0, T = 5, P = 1), the same as in the well-known Axelrod tournament.
Aside from always cooperate (AllC) and extortionate ZD typically considered in previous
studies, we also include a third memory-one strategy called PSO Gambler. This strategy
has been optimized using the machine learning algorithm particle swarm optimization
(PSO) [47]. Notably, the PSO Gambler is found to have the unbending property [14]:
the best response of extortionate ZD when playing against a fixed unbending player is
to offer a fair split by letting the extortion factor be one. In other words, the larger
the extortion factor, the greater the payoff reduction compared to what it would have
been otherwise. Although we focus on these three particular strategies, our method
works for any IPD strategy and can be extended to study more than three strategies
in structured populations. We find that the presence of unbending individuals can
greatly enhance the resilience of cooperation in a synergistic way that promotes direct
reciprocity.
qPSO = [1, 0.52173487, 0, 0.12050939] [47]. We consider a square lattice with the von
Neumann neighborhood (degree k = 4) and periodic boundary conditions. Each node
is occupied by an individual who adopts one of these three aforementioned strategies.
An individual i interacts with their immediate neighbors and accrues payoffs from their
pairwise interactions, πi. We use the exponential fitness function fi = exp(βπi),
where β is the intensity of selection.
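The lattice setup and exponential fitness above can be sketched as follows (a minimal sketch, not the authors' code; the lattice size L = 20 and β = 0.001 are illustrative choices, and the uniform payoff matrix corresponds to the neutral case χ = 1 where every pairing earns R = 3):

```python
import numpy as np

L = 20                    # lattice side length (illustrative choice)
beta = 0.001              # intensity of selection (illustrative choice)
rng = np.random.default_rng(0)
grid = rng.integers(0, 3, size=(L, L))   # 0: AllC, 1: ZD, 2: PSO Gambler

def payoffs(grid, A):
    """Accumulated payoff of each site against its four von Neumann
    neighbors, with periodic boundary conditions."""
    pi = np.zeros(grid.shape)
    for shift in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nbr = np.roll(grid, shift, axis=(0, 1))
        pi += A[grid, nbr]
    return pi

A = np.full((3, 3), 3.0)  # at chi = 1 every pairing earns R = 3
pi = payoffs(grid, A)
fitness = np.exp(beta * pi)   # exponential fitness f_i = exp(beta * pi_i)
```

With the uniform matrix, every site earns 4R = 12 and all fitness values coincide, recovering the neutral baseline.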
To account for the success of these three strategies, we use the following 3 × 3
expected payoff matrix to characterize their repeated interactions:
             AllC  ZD   PSO
   AllC    ( a11   a12  a13 )
   ZD      ( a21   a22  a23 )                    (1)
   PSO     ( a31   a32  a33 )
Using the method originally introduced by Press and Dyson [13], these payoff entries
can be analytically calculated by quotients of two determinants (see Appendix).
Specifically, for AllC versus extortionate ZD with a given extortion factor χ, we
have
a11 = R,   a12 = P + (R − P)(T − S)/[(R − S)χ + (T − R)],
a22 = P,   a21 = P + (R − P)(T − S)χ/[(R − S)χ + (T − R)].    (2)
Moreover, the average payoff of extortionate ZD against the PSO Gambler, a23, can be
written as a23 = g(χ)/h(χ), where both g(·) and h(·) are quadratic functions of χ and
g(χ)/h(χ) is monotonically decreasing for χ > 1, with the maximum value R attained at χ = 1 [14].
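For the conventional values R, S, T, P = 3, 0, 5, 1, the entries of Eq. (2) can be checked numerically (a quick sketch, not the authors' code):

```python
R, S, T, P = 3, 0, 5, 1

def a12(chi):
    """AllC's expected payoff against extortionate ZD with factor chi."""
    return P + (R - P) * (T - S) / ((R - S) * chi + (T - R))

def a21(chi):
    """Extortionate ZD's expected payoff against AllC."""
    return P + (R - P) * (T - S) * chi / ((R - S) * chi + (T - R))

# Fair split at chi = 1; for chi > 1 ZD extorts an unequal share from AllC.
assert a12(1) == a21(1) == R
assert a21(4) > R > a12(4)
```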
As for evolutionary updating, we consider death-birth updating with mutation. At
each time step, an individual is randomly chosen to die and its neighbors compete for
this vacant site with probability proportional to their fitness. A mutation occurs with
probability µ. The newly produced offspring is identical to its parent with probability
1 − µ; otherwise with probability µ, it randomly chooses one of the three strategies (see
Fig. 1). This process can also be interpreted in a cultural evolution setting with social
imitation and exploration rates.
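One asynchronous death-birth step with mutation, as described above, can be sketched as follows (an illustrative sketch, not the authors' implementation; the lattice size and mutation rate are assumed values):

```python
import numpy as np

L, mu = 20, 0.01          # lattice side and mutation rate (assumed values)
rng = np.random.default_rng(1)
grid = rng.integers(0, 3, size=(L, L))

def death_birth_step(grid, fitness):
    i, j = rng.integers(0, L, size=2)            # a random individual dies
    nbrs = [((i + 1) % L, j), ((i - 1) % L, j),
            (i, (j + 1) % L), (i, (j - 1) % L)]  # von Neumann neighbors
    w = np.array([fitness[x] for x in nbrs])
    parent = nbrs[rng.choice(4, p=w / w.sum())]  # fitness-proportional win
    if rng.random() < mu:
        grid[i, j] = rng.integers(0, 3)          # mutation: random strategy
    else:
        grid[i, j] = grid[parent]                # faithful copy of the winner
    return grid

grid = death_birth_step(grid, np.ones((L, L)))   # neutral fitness for the demo
```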
We perform stochastic agent-based simulations with asynchronous updating and
average the frequencies of strategies over 1 × 10^7 time steps. On top of that, closed-form
predictions are feasible under the weak selection limit and with rare mutations. In
the limit of low mutations, the fate of a new mutant is determined, either reaching
fixation or going extinct, before the next mutant arises. Therefore, the population
spends most of the time in homogeneous states with transitions in between given by
the pairwise fixation probabilities, ρij , for 1 ≤ i, j ≤ 3 and i ̸= j: the probability that
a single mutant of strategy j invades and takes over the entire population of strategy
i [21]. The long-term frequencies of strategies can be approximated by the stationary
distribution of the corresponding embedded Markov chain with the following transition
Results
We run agent-based simulations of the spatial system with the three aforementioned
IPD strategies and are interested in their long-term frequencies. As depicted in Fig. 1,
the spatial snapshots show the rise and fall of the respective three strategies and
spatial population structures promote clustering (‘like-with-alike’) as foreseen. From
time to time, mutations arise in the population, following which the mutants can form
clusters and are likely to succeed in invading and taking over the system. Accordingly,
Fig. 1 shows possible transitions among homogeneous population states from AllC to
PSO Gambler to extortionate ZD, along with the evolutionary dynamics of multiple
strategies. Most of the time, evolutionary dynamics involve pairwise competition,
but occasionally, all three strategies are present due to mutations, in particular as a
consequence of neutral drift when χ = 1.
As a base case for comparison, we consider the special limiting case where the
extortion factor χ = 1 for the extortionate ZD strategy, which becomes the well-
known Tit-for-Tat strategy [13], with the memory-one strategy representation as qTFT =
[1, 0, 1, 0]. This, in fact, leads to neutral dynamics among the three strategies considered
since they each receive an equal payoff of mutual cooperation R = 3. As such, the long-
term abundance of the three strategies should be equal to 1/3. It can also be observed
from Fig. 2a that our stochastic simulations confirm neutral drift dynamics, with the
three strategies being roughly equally present in the population.
In contrast, for an extortion factor χ > 1, the extortionate ZD strategy is able to
secure a payoff higher than (or at least equal to) that of any other strategy in a Prisoner’s
Dilemma satisfying T + S > 2P [14], which is why such players are called extortioners in prior work [35]. The
resulting evolutionary dynamics is thereby no longer neutral (compared with Fig. 2a).
The extortion factor χ has a dual impact on extortioners: increasing χ makes them
more fiercely aggressive against AllC players, but on the other hand, it reduces their
absolute payoffs against PSO Gamblers. This stems from the PSO Gambler strategy
being demonstrated to have the unbending property: against a fixed unbending player,
any extortioner who intends to demand an unfairer payoff by increasing the extortion
factor will end up with a lower payoff. As a consequence, unbending individuals are able to
turn the tables against extortioners, suppress extortion, and shepherd cooperation. As
shown in Fig. 2b, the unbending strategy, PSO Gambler, emerges as the most abundant
in the long run, trailed by AllC; both are above 1/3, while extortioners are suppressed,
with an abundance below 1/3.
In order to understand the underlying evolutionary dynamics in Fig. 2, we now
consider pairwise competition dynamics that arise under rare mutations. In this limit,
the dynamics can be studied by the stochastic transitions between homogeneous states.
The transitions are determined by the fixation probabilities ρij (see Model and Methods
and also the Appendix for technical details).
Under weak selection, the pairwise fixation probability ρij , the likelihood of a single
individual of strategy j taking over a spatial population of strategy i, has a closed-form
expression up to the first order of β:
ρij = 1/N + β·(∗)/[6(k − 1)] + O(β²),    (4)

(∗) = (k + 1)² ajj + (2k² − 2k − 1) aji − (k² − k + 1) aij − (2k − 1)(k + 1) aii,
where N is the total population size, and k is the degree of the lattice. This formula
is derived using the discrete random walk approach of Ref. [24] (see Appendix), but it
can also be obtained by the diffusion approximation method [21].
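Eq. (4) is straightforward to evaluate; the sketch below (not the authors' code) uses degree k = 4 and, for illustration, N = 100 and β = 0.001, matching the numbers used in the Appendix:

```python
N, k, beta = 100, 4, 0.001   # illustrative values matching the Appendix

def rho(a_jj, a_ji, a_ij, a_ii):
    """P(single j-mutant fixes in an i-population), first order in beta."""
    star = ((k + 1) ** 2 * a_jj + (2 * k * k - 2 * k - 1) * a_ji
            - (k * k - k + 1) * a_ij - (2 * k - 1) * (k + 1) * a_ii)
    return 1.0 / N + beta * star / (6 * (k - 1))

# Identical payoffs give the neutral value 1/N exactly.
assert abs(rho(3, 3, 3, 3) - 1.0 / N) < 1e-15
```

For example, an AllC mutant in a ZD population just above χ = 1 (a_jj = 3, a_ji = 3, a_ij = 3, a_ii = 1) gives ρ = 25/1800 ≈ 0.0139, the value ρ21 takes as χ → 1+ in the Appendix.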
In the limit of low mutation (and weak selection), the long-term frequencies of
strategies, λi , can be approximated using the embedded Markov chain approach and
are related to the normalized left eigenvector of the transition matrix corresponding to
the largest eigenvalue one:
[λ1, λ2, λ3] = (1/(γ1 + γ2 + γ3)) [γ1, γ2, γ3],    (5)

where

γ1 = ρ21 ρ31 + ρ21 ρ32 + ρ31 ρ23,
γ2 = ρ31 ρ12 + ρ12 ρ32 + ρ32 ρ13,    (6)
γ3 = ρ21 ρ13 + ρ12 ρ23 + ρ13 ρ23.
Under weak selection (β → 0), the condition for the stationary distribution of strategy
i, λi , to be greater than 1/3 can be expressed equivalently as
Σ_{j=1}^{3} (ρji − ρij) > 0.    (7)
Intuitively, this inequality indicates that the influx into strategy i must exceed the
outflux from it in the embedded Markov chain: Σ_j ρji > Σ_j ρij.
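The stationary distribution of Eqs. (5)-(6) can be computed directly from a matrix of pairwise fixation probabilities (a minimal sketch, not the authors' code; indices 0, 1, 2 stand for strategies 1, 2, 3):

```python
import numpy as np

def abundances(rho):
    """Long-run frequencies from Eqs. (5)-(6); rho[i][j] is the probability
    that a single j-mutant takes over an i-population."""
    r = rho
    g1 = r[1][0] * r[2][0] + r[1][0] * r[2][1] + r[2][0] * r[1][2]
    g2 = r[2][0] * r[0][1] + r[0][1] * r[2][1] + r[2][1] * r[0][2]
    g3 = r[1][0] * r[0][2] + r[0][1] * r[1][2] + r[0][2] * r[1][2]
    g = np.array([g1, g2, g3])
    return g / g.sum()

# Neutral drift (chi = 1): every off-diagonal rho equals 1/N, so each
# strategy settles at abundance 1/3.
neutral = [[0, 0.01, 0.01], [0.01, 0, 0.01], [0.01, 0.01, 0]]
lam = abundances(neutral)
```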
After substituting the fixation probabilities under weak selection in Eq. 4 and
simplifying the algebra, we obtain the following inequality for λi > 1/3, namely, natural
selection favors strategy i and its long-run abundance is greater than 1/3:
[(k + 1)/(k − 1)] aii + āi∗ − ā∗i − [(k + 1)/(k − 1)] ā∗∗ > 0,    (8)

where ā∗∗ = (1/3) Σ_{j=1}^{3} ajj is the average payoff for two players using the same
strategy, āi∗ = (1/3) Σ_{j=1}^{3} aij is the average payoff of strategy i, and
ā∗i = (1/3) Σ_{j=1}^{3} aji is the average payoff obtained against strategy i.
[Figure: pairwise fixation probabilities among the three homogeneous states AllC, ZD, and PSO, shown for (a) χ = 1, where all ρij = 0.01 by neutrality, and (b) χ = 4, where in particular ρ32 = 0.00707 and ρ23 = 0.01242.]
Even though extortionate ZD can now secure the highest payoffs when interacting
with AllC or PSO Gambler, it is unable to exploit the unbending PSO Gambler to the
same extent as AllC, because unbending leads to a monotonic decrease in extortionate
ZD’s payoff with respect to χ. This makes PSO Gambler harder for extortionate ZD
to invade and, at the same time, better able to take over extortionate ZD, as compared to AllC (Fig. 2).
Altogether, the presence of spatial structure can help natural selection favor AllC over
extortion because of spatial assortment, and extortioners fail to do well when against
each other since they get neutralized and together receive the payoff P . Moreover,
unbending PSO Gambler, compared to AllC, can even further diminish the advantage
of extortion, if any, in spatial populations.
As demonstrated in Fig. 4, an increase in the extortion factor helps extortionate
ZD players increase in abundance as more benefits are squeezed from AllC, but their
extortion is greatly mitigated by the presence of an unbending strategy, the PSO
Gambler for example. These PSO Gamblers are able to make extortion backfire: the
greater χ, the less absolute payoff extortionate ZD can reap from unbending players.
Therefore, the presence of a third type of unbending greatly increases the system’s
resilience against extortion. The abundance of PSO Gamblers is only slightly impacted
by increases in χ, quickly reaching a plateau – in other words, it saturates much faster
than the impacts on AllC and extortionate ZD. For the limit χ → ∞, the abundance
of extortionate ZD reaches a limit that is strictly below 1/3. Our simulation results
agree well with the analytical predictions (Fig. 4). This is in sharp contrast with the
scenario where PSO Gambler is absent and extortionate ZD can achieve a half-half
split in abundance with AllC at the limit χ → ∞ (see Appendix). Taken together,
unbending strategies such as the PSO Gambler can help shepherd cooperation and
suppress extortion in spatial populations.
We also note the jump discontinuity of the long-term abundance at χ = 1 versus
χ > 1. This is due to the fact that the stochastic memory-one ZD strategy reaches the
boundary deterministic strategy Tit-for-Tat, thereby causing the mutual-cooperation
state CC to become the only absorbing state, as opposed to the play being ergodic
among all four possible pairwise states CC, CD, DC, and DD.
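This singular behavior at χ = 1 can be made concrete by building the four-state Markov chain over the pair states (CC, CD, DC, DD) for two memory-one strategies (a sketch under the standard memory-one convention, not the authors' code): for Tit-for-Tat against AllC, the state CC is absorbing rather than the chain being ergodic.

```python
import numpy as np

def pair_chain(p, q):
    """Transition matrix over pair states (CC, CD, DC, DD); p and q are
    cooperation probabilities after each outcome from each player's own view,
    so the co-player's CD/DC entries are swapped into the shared state order."""
    q = [q[0], q[2], q[1], q[3]]
    M = np.empty((4, 4))
    for s in range(4):
        pc, qc = p[s], q[s]
        M[s] = [pc * qc, pc * (1 - qc), (1 - pc) * qc, (1 - pc) * (1 - qc)]
    return M

tft = [1, 0, 1, 0]    # extortionate ZD degenerates to this at chi = 1
allc = [1, 1, 1, 1]
M = pair_chain(tft, allc)
assert M[0, 0] == 1.0   # mutual cooperation CC is absorbing
```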
game. A recent study reveals that unbending players, when fixed, can render the best
response of an extortionate ZD player to be fair by letting their extortion factor approach
one [14]. Moreover, unbending players can dominate ZD players even if the underlying
game remains a Prisoner’s Dilemma game but is of a more adversarial nature featuring
T + S < 2P . In this work, we deepen our understanding of unbending strategies
by showing their capacity to enhance spatial cooperation while suppressing extortion.
The intuition is that unbending strategies are neutral with AllC while they reduce the
payoffs of extortionate ZD strategies due to their unbending properties. Therefore, the
presence of unbending individuals can drastically change the invasion dynamics among
them under the mutation-selection equilibrium.
Among the body of previous research, certain work has demonstrated that
extortionate ZD players cannot be evolutionarily successful unless they become more
generous [36]. These studies have typically been carried out in well-mixed populations
with multiple strategies [35]. It is evident that population size has an effect on the
evolutionary advantage of extortion. On the one hand, extortion can dominate in small-
sized populations [35]. On the other hand, population structure promotes assortment
(termed as network or spatial reciprocity [16, 8]), which can further strengthen the
advantage of cooperation. Depending on the payoff structure parameters, such as the
benefit-to-cost ratio in donation games or other parameterizations of the Prisoner’s
Dilemma, increasing the extortion factor χ can help ZD players dominate other strategies
and subsequently provide an evolutionary pathway to cooperation, thereby acting as a
catalyst for the evolution of cooperation. These insights are obtained in both well-mixed
and structured populations [35, 37]. Here, the contribution of our current work lies in
demonstrating how unbending players can further foster direct reciprocity and suppress
extortion, thus increasing the resilience of cooperation against extortion.
In addition to fixation probabilities, another important quantity of interest is the
conditional fixation time [51]. Prior work has shown that fixation time strongly depends
on the type of game interactions (payoff structure in general) in finite, well-mixed
populations [51, 52]. For instance, the fixation time is exponential when the underlying
game is of the snowdrift type [51]. Moreover, spatial structure can promote exceedingly
long co-existence even if the underlying game is of the Prisoner’s Dilemma type [16].
In our current study, we have a snowdrift game both between AllC and extortionate
ZD, and between extortionate ZD and PSO Gambler, and a neutral game between
AllC and PSO Gambler. Thus, the fixation time can be prohibitively long for non-
weak selection strength in well-mixed populations, not to mention the role of spatial
structure in promoting coexistence [19]. That said, our analytical approach based on
pairwise invasion dynamics can no longer be employed because fixation takes exceedingly
long; as a consequence, we can only observe the co-existence of AllC, extortionate
ZD, and PSO Gambler, while fixation of any single strategy becomes an
extremely rare event. In this case, we are compelled to rely on agent-based simulations
and extended pair approximation methods for multiple strategies to understand the
dynamics. Previous studies utilizing simulations have focused on non-weak selection
and offered some insights into this regime [37, 38, 39, 53, 40, 41].
In conclusion, we have analytically and by means of agent-based simulations
investigated how introducing unbending strategies can help suppress extortion
and shepherd cooperation in spatial populations. We have derived closed-form
approximations for the long-term frequencies of three strategies, AllC, extortionate
ZD, and the unbending strategy PSO Gambler, under weak selection and in the limit
of low mutation. We find that the presence of unbending strategies can restrain the
abundance of extortionate ZD no matter how large the extortion factor is, whereas the
extortion factor has little effect on the long-run abundance of unbending
strategies. Therefore, unbending individuals can strengthen the resilience of spatial
cooperation. Although we demonstrate our general method through a particular
candidate unbending strategy, the PSO Gambler, our approach can be applied to study
broader contexts such as the evolutionary dynamics of multiple powerful strategies in
repeated multiplayer games [54, 55, 56, 57, 58], multiplex networks [59], higher-order
networks [60], and enforcing fairness in human-AI systems [61, 62], providing insights
into understanding the interplay between network reciprocity and direct reciprocity.
Acknowledgments
We would like to express our heartfelt gratitude to Professor Long Wang on the occasion
of his 60th birthday. X.C. gratefully acknowledges the support by Beijing Natural
Science Foundation (grant no. 1244045). F.F. is grateful for support from the Bill &
Melinda Gates Foundation (award no. OPP1217336).
Author Contributions
Z.C., X.C., and F.F. conceived the model; Z.C. and Y.G. performed calculations and
analyses and plotted the figures; Z.C., X.C., and F.F. wrote the manuscript. All authors
give final approval of publication.
Competing Interests
Appendix A.
Similarly, if a focal B individual is selected, let the numbers of its neighbors of type A
and type B be nA_B(i) and nB_B(i), respectively. On average, we have nA_B(i) = kqA|B and
nB_B(i) = kqB|B. The payoffs of these two types of neighbors are
We can further get the condition for natural selection to favor strategy A:
ρA > 1/N  ⇔  1 + Σ_{i=1}^{N−1} Π_{j=1}^{i} TA−(j)/TA+(j) < N.    (A.8)
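The condition above comes from the standard birth-death expression ρA = 1/(1 + Σ_{i=1}^{N−1} Π_{j=1}^{i} TA−(j)/TA+(j)); a generic sketch (the ratio function is supplied by the caller, not taken from the paper):

```python
def fixation_prob(ratio, N):
    """Fixation probability of a single A-mutant in a birth-death chain,
    given the backward/forward transition ratio T_A^-(j) / T_A^+(j)."""
    total, prod = 1.0, 1.0
    for i in range(1, N):
        prod *= ratio(i)   # running product over j = 1..i
        total += prod
    return 1.0 / total

# Neutral drift: all ratios equal one, so rho_A = 1/N and (A.8) is tight.
assert abs(fixation_prob(lambda j: 1.0, 100) - 0.01) < 1e-12
```

For a constant ratio r the sum is geometric, recovering the familiar closed form ρA = (1 − r)/(1 − r^N).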
We know that TA−(i) = TA+(i) = pAB when the selection strength β = 0. In the
limit of weak selection where β ≪ 1, we can take the Taylor expansion of the ratio
TA−(j)/TA+(j) to the first order and get

TA−(j)/TA+(j) = 1 + β{qA|A[πAB(j) − πAA(j)] + qB|B[πBB(j) − πBA(j)]} + O(β²).    (A.9)

As such, the condition in A.8 simplifies to

Σ_{i=1}^{N−1} Σ_{j=1}^{i} {qA|A[πAA(j) − πAB(j)] + qB|B[πBA(j) − πBB(j)]} > 0.    (A.10)
where
Fa = {[pA(k² − k − 2) + (k + 1)]/[k(k − 1)]} · a,
Fb = {[pA(−k² + k + 2) + (k² − k − 1)]/[k(k − 1)]} · b,
Fc = {[pA(k² − k − 2) + 1]/[k(k − 1)]} · c,    (A.12)
Fd = {[pA(−k² + k + 2) + (k² − 1)]/[k(k − 1)]} · d.
If the population size is large enough, that is, N ≫ k, the discrete sum can be
approximated by a continuous integral. Therefore, we can replace the discrete variables
i, j ∈ {0, 1, 2, · · · , N} by continuous variables u, v ∈ (0, 1), and the condition in A.11
simplifies to

∫₀¹ ∫₀ᵘ (Ha + Hb − Hc − Hd) dv du > 0,    (A.13)
where

Ha = [v(k² − k − 2) + (k + 1)] · a,
Hb = [v(−k² + k + 2) + (k² − k − 1)] · b,
Hc = [v(k² − k − 2) + 1] · c,    (A.14)
Hd = [v(−k² + k + 2) + (k² − 1)] · d.
By integrating the four parts, we finally get the condition for natural selection to favor
strategy A, determined by the payoff structure and the network structure:
(k + 1)² a + (2k² − 2k − 1) b − (k² − k + 1) c − (2k − 1)(k + 1) d > 0.    (A.15)
In like manner, we obtain the fixation probabilities for both strategies as linear
functions of the selection strength:
ρA = 1/N + β · [(k + 1)² a + (2k² − 2k − 1) b − (k² − k + 1) c − (2k − 1)(k + 1) d]/[6(k − 1)],
ρB = 1/N + β · [(k + 1)² d + (2k² − 2k − 1) c − (k² − k + 1) b − (2k − 1)(k + 1) a]/[6(k − 1)].    (A.16)
It always holds that a11 = a13 = a31 = a33 = R. When χ = 1, it follows that
q2 = [1, 0, 1, 0], namely, extortionate ZD degenerates to Tit-for-Tat. We further have
a11 = a12 = a21 = a22 = a23 = a32 = R. In contrast, when χ > 1, we have
a11 = R,   a12 = P + (R − P)(T − S)/[(R − S)χ + (T − R)],
a22 = P,   a21 = P + (R − P)(T − S)χ/[(R − S)χ + (T − R)].    (A.20)
And the remaining two payoffs a23 and a32 become quadratic rational functions of χ.
In the conventional IPD game, the four payoffs decided by χ can be (approximately)
written as
a12 = 3(χ + 4)/(3χ + 2),   a21 = (13χ + 2)/(3χ + 2),
a23 = 0.455742281814226(χ − 0.598718917613494)(4χ + 1)/(χ² − 0.422337214095301χ − 0.272861525678518),    (A.21)
a32 = (χ² + 0.400631913161605χ − 0.48622813248306)/(χ² − 0.422337214095301χ − 0.272861525678518).
Moreover, we can get the extreme values of these payoffs by letting χ → +∞:

lim_{χ→+∞} a12 = lim_{χ→+∞} a32 = 1,   lim_{χ→+∞} a21 = 13/3,   lim_{χ→+∞} a23 = 1.82296912725691.    (A.22)
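These closed forms and their limits are easy to verify numerically (a checking sketch, not the authors' code):

```python
def a12(chi): return 3 * (chi + 4) / (3 * chi + 2)
def a21(chi): return (13 * chi + 2) / (3 * chi + 2)

def a23(chi):   # extortionate ZD's payoff against the PSO Gambler
    return (0.455742281814226 * (chi - 0.598718917613494) * (4 * chi + 1)
            / (chi**2 - 0.422337214095301 * chi - 0.272861525678518))

def a32(chi):   # the PSO Gambler's payoff against extortionate ZD
    return ((chi**2 + 0.400631913161605 * chi - 0.48622813248306)
            / (chi**2 - 0.422337214095301 * chi - 0.272861525678518))

big = 1e9   # proxy for chi -> +infinity
assert abs(a12(big) - 1) < 1e-6 and abs(a32(big) - 1) < 1e-6
assert abs(a21(big) - 13 / 3) < 1e-6
assert abs(a23(big) - 1.82296912725691) < 1e-6
```

At χ = 1 all four entries reduce to the mutual-cooperation payoff R = 3, consistent with the degeneration to Tit-for-Tat.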
• when χ = 1,

ρ21 = ρ12 = ρ32 = ρ23 = ρ31 = ρ13 = 1/100,    (A.26)

• when χ > 1,

ρ21 = (28χ + 34.5)/[900(3χ + 2)],   ρ12 = (28χ + 4.5)/[900(3χ + 2)],
ρ32 = (2.14880483211515χ² − 1.03438540268879χ − 0.454016698936298)/[300(χ² − 0.422337214095301χ − 0.272861525678518)],
ρ23 = (3.655023355761χ² − 1.25725839044252χ − 1.12775971437606)/[300(χ² − 0.422337214095301χ − 0.272861525678518)],    (A.27)
ρ31 = ρ13 = 1/100.
It is clear that these fixation probabilities can be seen as functions of the extortion
factor. The curves of the ρij with respect to χ are given in Fig. A1.
Figure A1. Fixation probabilities between the three strategies considered in repeated
games: AllC, extortionate zero-determinant strategy (ZD extortioner) with χ ≥ 1, and
unbending strategy (PSO Gambler). Simulation parameters are as in Fig. 1, except
that we vary the extortion factor χ for the extortionate ZD strategy.
As before, these abundances can be seen as functions of the extortion factor. The curves
of the λ̃i and λi with respect to χ are given in Fig. 4. In particular, we can get the extreme
values of these abundances by letting χ → +∞:

lim_{χ→+∞} λ̃1 = lim_{χ→+∞} λ̃2 = 1/2,
lim_{χ→+∞} λ1 = 0.332645621401128,
lim_{χ→+∞} λ2 = 0.276940954892473,    (A.30)
lim_{χ→+∞} λ3 = 0.390413423706399.
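The limits in Eq. (A.30) follow by evaluating the fixation probabilities at a large χ and applying Eqs. (5)-(6); a numerical check (not the authors' code):

```python
import numpy as np

def D(chi):
    return chi**2 - 0.422337214095301 * chi - 0.272861525678518

chi = 1e8   # proxy for chi -> +infinity
r21 = (28 * chi + 34.5) / (900 * (3 * chi + 2))
r12 = (28 * chi + 4.5) / (900 * (3 * chi + 2))
r32 = (2.14880483211515 * chi**2 - 1.03438540268879 * chi
       - 0.454016698936298) / (300 * D(chi))
r23 = (3.655023355761 * chi**2 - 1.25725839044252 * chi
       - 1.12775971437606) / (300 * D(chi))
r31 = r13 = 0.01

# Eq. (6): unnormalized weights of the embedded Markov chain
g1 = r21 * r31 + r21 * r32 + r31 * r23
g2 = r31 * r12 + r12 * r32 + r32 * r13
g3 = r21 * r13 + r12 * r23 + r13 * r23
lam = np.array([g1, g2, g3]) / (g1 + g2 + g3)
assert np.allclose(lam, [0.332646, 0.276941, 0.390413], atol=1e-4)
```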
It can be observed from Fig. 4 that the abundance of PSO Gambler is only slightly
impacted by increases in the extortion factor, compared with those of AllC and
extortionate ZD. To understand the trends in a more intuitive way, we differentiate
the λi with respect to χ and obtain their corresponding derivatives. The curves of the λ̇i
are shown in Fig. A2.
Appendix B.
Data Availability: All the data and analyses pertaining to this work have been included
in the main text.
Code Availability: The source code for reproducing the results is available at the GitHub
repository (https://github.com/fufeng/unbending3S).
References
[1] Perc, M. et al. Statistical physics of human cooperation. Physics Reports 687, 1–51 (2017).
[2] Wang, J., Fu, F., Wu, T. & Wang, L. Emergence of social cooperation in threshold public goods
games with collective risk. Physical Review E 80, 016101 (2009).
[3] Vasconcelos, V. V., Santos, F. C., Pacheco, J. M. & Levin, S. A. Climate policies under wealth
inequality. Proceedings of the National Academy of Sciences 111, 2212–2216 (2014).
[4] Glaubitz, A. & Fu, F. Social dilemma of non-pharmaceutical interventions. arXiv preprint
arXiv:2404.07829 (2024).
[5] Hauert, C. & Szabó, G. Game theory and physics. American Journal of Physics 73, 405–414
(2005).
[6] Axelrod, R. & Hamilton, W. D. The evolution of cooperation. Science 211, 1390–1396 (1981).
[7] Rapoport, A. Prisoner’s dilemma. In Game theory, 199–204 (Springer, 1989).
[8] Nowak, M. A. Five rules for the evolution of cooperation. Science 314, 1560–1563 (2006).
[9] Nowak, M. A. & Sigmund, K. Tit for tat in heterogeneous populations. Nature 355, 250–253
(1992).
[10] Nowak, M. & Sigmund, K. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the
prisoner’s dilemma game. Nature 364, 56–58 (1993).
[11] Molander, P. The optimal level of generosity in a selfish, uncertain environment. Journal of
Conflict Resolution 29, 611–618 (1985).
[12] Boerlijst, M. C., Nowak, M. A. & Sigmund, K. Equal pay for all prisoners. The American
mathematical monthly 104, 303–305 (1997).
[13] Press, W. H. & Dyson, F. J. Iterated prisoner’s dilemma contains strategies that dominate any
evolutionary opponent. Proceedings of the National Academy of Sciences 109, 10409–10413
(2012).
[14] Chen, X. & Fu, F. Outlearning extortioners: unbending strategies can foster reciprocal fairness
and cooperation. PNAS Nexus 2, pgad176 (2023).
[15] Hilbe, C., Röhl, T. & Milinski, M. Extortion subdues human players but is finally punished in
the prisoner’s dilemma. Nature communications 5, 3976 (2014).
[16] Nowak, M. A. & May, R. M. Evolutionary games and spatial chaos. Nature 359, 826–829 (1992).
[17] Lindgren, K. & Nordahl, M. G. Evolutionary dynamics of spatial games. Physica D: Nonlinear
Phenomena 75, 292–309 (1994).
[18] Perc, M. Coherence resonance in a spatial prisoner’s dilemma game. New Journal of Physics 8,
22 (2006).
[19] Hauert, C. & Doebeli, M. Spatial structure often inhibits the evolution of cooperation in the
snowdrift game. Nature 428, 643–646 (2004).
[20] Santos, F. C. & Pacheco, J. M. Scale-free networks provide a unifying framework for the emergence
of cooperation. Physical review letters 95, 098104 (2005).
[21] Ohtsuki, H., Hauert, C., Lieberman, E. & Nowak, M. A. A simple rule for the evolution of
cooperation on graphs and social networks. Nature 441, 502–505 (2006).
[22] Poncela, J., Gómez-Gardenes, J., Florı́a, L. M. & Moreno, Y. Robustness of cooperation in the
evolutionary prisoner’s dilemma on complex networks. New Journal of Physics 9, 184 (2007).
[23] Szolnoki, A. & Perc, M. Coevolution of teaching activity promotes cooperation. New Journal of
Physics 10, 043036 (2008).
[24] Fu, F., Wang, L., Nowak, M. A. & Hauert, C. Evolutionary dynamics on graphs: Efficient method
for weak selection. Physical Review E 79, 046707 (2009).
[25] Jackson, M. O. & Zenou, Y. Games on networks. In Handbook of game theory with economic
applications, vol. 4, 95–163 (Elsevier, 2015).
[26] Su, Q., Li, A., Zhou, L. & Wang, L. Interactive diversity promotes the evolution of cooperation
in structured populations. New Journal of Physics 18, 103007 (2016).
[27] Perez-Martinez, H., Gracia-Lazaro, C., Dercole, F. & Moreno, Y. Cooperation in costly-access
environments. New Journal of Physics 24, 083005 (2022).
[28] Wu, T., Fu, F. & Wang, L. Evolutionary games and spatial periodicity. Journal of Automation
and Intelligence 2, 79–86 (2023).
[29] Tarnita, C. E., Ohtsuki, H., Antal, T., Fu, F. & Nowak, M. A. Strategy selection in structured
populations. Journal of theoretical biology 259, 570–581 (2009).
[30] Allen, B. et al. Evolutionary dynamics on any population structure. Nature 544, 227–230 (2017).
[31] Antal, T., Traulsen, A., Ohtsuki, H., Tarnita, C. E. & Nowak, M. A. Mutation-selection
equilibrium in games with multiple strategies. Journal of theoretical biology 258, 614–622 (2009).
[32] Tarnita, C. E., Wage, N. & Nowak, M. A. Multiple strategies in structured populations.
Proceedings of the National Academy of Sciences 108, 2334–2337 (2011).
[33] McAvoy, A. & Allen, B. Fixation probabilities in evolutionary dynamics under weak selection.
Journal of Mathematical Biology 82, 14 (2021).
[34] Jusup, M. et al. Social physics. Physics Reports 948, 1–148 (2022).
[35] Hilbe, C., Nowak, M. A. & Sigmund, K. Evolution of extortion in iterated prisoner’s dilemma
games. Proceedings of the National Academy of Sciences 110, 6913–6918 (2013).
[36] Stewart, A. J. & Plotkin, J. B. From extortion to generosity, evolution in the iterated prisoner’s
dilemma. Proceedings of the National Academy of Sciences 110, 15348–15353 (2013).
[37] Szolnoki, A. & Perc, M. Evolution of extortion in structured populations. Physical Review E 89,
022804 (2014).
[38] Szolnoki, A. & Perc, M. Defection and extortion as unexpected catalysts of unconditional
cooperation in structured populations. Scientific reports 4, 5496 (2014).
[39] Wu, Z.-X. & Rong, Z. Boosting cooperation by involving extortion in spatial prisoner’s dilemma
games. Physical Review E 90, 062102 (2014).
[40] Xu, X., Rong, Z., Wu, Z.-X., Zhou, T. & Tse, C. K. Extortion provides alternative routes to the
evolution of cooperation in structured populations. Physical Review E 95, 052302 (2017).
[41] Mao, Y., Xu, X., Rong, Z. & Wu, Z.-X. The emergence of cooperation-extortion alliance on
scale-free networks with normalized payoff. Europhysics Letters 122, 50005 (2018).
[42] Hilbe, C., Nowak, M. A. & Traulsen, A. Adaptive dynamics of extortion and compliance. PloS
one 8, e77886 (2013).
[43] Chen, X., Wang, L. & Fu, F. The intricate geometry of zero-determinant strategies underlying
evolutionary adaptation from extortion to generosity. New Journal of Physics 24, 103001 (2022).
[44] Akin, E. The iterated prisoner’s dilemma: good strategies and their dynamics. Ergodic Theory,
Advances in Dynamical Systems 77–107 (2016).
[45] Hilbe, C., Chatterjee, K. & Nowak, M. A. Partners and rivals in direct reciprocity. Nature human
behaviour 2, 469–477 (2018).
[46] Chen, X. & Fu, F. Identifying bridges and catalysts for persistent cooperation using network-based
approach. In 2023 42nd Chinese Control Conference (CCC), 8064–8069 (IEEE, 2023).
[47] Harper, M. et al. Reinforcement learning produces dominant strategies for the iterated prisoner's
    dilemma. PloS one 12, e0188046 (2017).
[48] Fudenberg, D. & Imhof, L. A. Imitation processes with small mutations. Journal of Economic
Theory 131, 251–262 (2006).
[49] Wu, B., Gokhale, C. S., Wang, L. & Traulsen, A. How small are small mutation rates? Journal
of mathematical biology 64, 803–827 (2012).
[50] McAvoy, A. Comment on “Imitation processes with small mutations” [J. Econ. Theory 131 (2006)
    251–262]. Journal of Economic Theory 159, 66–69 (2015).
[51] Antal, T. & Scheuring, I. Fixation of strategies for an evolutionary game in finite populations.
Bulletin of mathematical biology 68, 1923–1944 (2006).
[52] Altrock, P. M. & Traulsen, A. Fixation times in evolutionary games under weak selection. New
Journal of Physics 11, 013012 (2009).
[53] Hao, D., Rong, Z. & Zhou, T. Extortion under uncertainty: Zero-determinant strategies in noisy
games. Physical Review E 91, 052803 (2015).
[54] Hilbe, C., Wu, B., Traulsen, A. & Nowak, M. A. Cooperation and control in multiplayer social
dilemmas. Proceedings of the National Academy of Sciences 111, 16425–16430 (2014).
[55] Pan, L., Hao, D., Rong, Z. & Zhou, T. Zero-determinant strategies in iterated public goods game.