Topic 3 Lecture Notes
3.1 Example: prisoners’ dilemma

Consider a prisoners’ dilemma game (player 1 chooses rows, player 2 chooses columns):

        c         d
  C   2, 2     -2, 3
  D   3, -2     1, 1

• Nash equilibrium (D, d) is Pareto-dominated by (C, c)
• In real life, people are less opportunistic and more cooperative

Although the strategy profile (C, c) is not a NE, it is more Pareto efficient than the NE (D, d). In reality, people may play (C, c), and this can be explained in two ways:
• Explanation 1: factors such as altruism or preferences for cooperation are not included in the payoffs (i.e. the payoff table here does not reflect players’ true preferences);
• Explanation 2: players may want to develop a long-term relationship with their opponents if the game is played repeatedly.
Is it possible to accommodate the last observation within the model with rational agents?
If the opponents cooperated in the past, we continue to cooperate; otherwise we defect.
The analogy with social norms: if everyone followed the social norm in the past, we continue
to follow it in the present. If a deviation from the social norm is detected, we switch to a
“punishment regime”.
Usually social norms are self-sustaining and rely on repeated interaction with the same
players. For example, it is difficult to sustain a “non-equilibrium” social norm in large anonymous
societies.
Total payoff:

(payoff at t = 1) + δ · (payoff at t = 2)

where δ is a common discount factor (i.e. both players have the same discount factor).
3.4 Unraveling: t = 2

Pick any subgame at t = 2:

        c                       d
  C   u1 + 2δ, u2 + 2δ      u1 − 2δ, u2 + 3δ
  D   u1 + 3δ, u2 − 2δ      u1 + δ, u2 + δ

where (u1, u2) are the stage payoffs from t = 1. This subgame is strategically equivalent to
the original stage game (the payoffs are a positive affine transformation of the original payoffs).
3.5 Unraveling: t = 1

At time t = 1 players are playing the following subgame:

        c                       d
  C   2 + δu1, 2 + δu2      −2 + δu1, 3 + δu2
  D   3 + δu1, −2 + δu2      1 + δu1, 1 + δu2

where (u1, u2) = (1, 1) are the (predicted) stage payoffs from t = 2. This game is strategically
equivalent to the original stage game (the payoffs are a positive affine transformation of the original
payoffs).
Therefore, no matter how many times agents interact (as long as T is finite), cooperation
cannot be self-sustained.
Prisoners’ Dilemma Repeated Twice

Unraveling: t = 2

The total utility in the second period following a given history is [(u_1 + δu_stage game), (u_2 + δu_stage game)], where u_1 and u_2 are
the payoffs received in the first period and u_stage game is the payoff in the second period. Since this is a positive affine transformation
of the original payoffs, the game in period 2 is strategically equivalent to the stage game, which has a unique NE of (D, d).
Therefore, the players are always going to play (D, d) regardless of the actions played in the first period.

Unraveling: t = 1

Knowing that the players are going to play (D, d) in the second period, we can write the payoffs for the first period as [(u_stage game +
δu_1), (u_stage game + δu_2)], where u_1 and u_2 are the payoffs received in the second period and u_stage game is the payoff in the
first period. Since this is also a positive affine transformation of the original game, the unique NE in the first period is also to play (D, d). In
conclusion, the SPNE of this twice-repeated game is s* = {[D, (D D D D)], [d, (d d d d)]}, where each action in the brackets
corresponds to the action the player chooses in period 2 after each of the four possible histories.
Generalisation

Suppose that the stage game is repeated three times. The last two periods of this longer repeated game would be strategically
equivalent to the twice-repeated game we just solved. It follows that in the last two periods, the players are going to play D or d all the
time. The utility for the first period is going to be similar except that there is an additional δ^2 term for the third period. This is again a
positive affine transformation of the original game, so the players will play D and d in the first period as well. This result can be generalised to
any game with a finite number of repetitions.

The intuition is that in the last period, the agents are presented with a simple one-shot normal-form game with a unique prediction of
playing (D, d). In the period before the last, the agents know that the last period has a unique prediction of playing (D, d).
Thus, what happens in the future does not depend on what they do today, and this makes the game in the period before the
last also a one-shot game with a unique prediction of playing (D, d). This unraveling continues, which results in the players playing
D and d in every period.
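The unraveling argument can be verified mechanically. Below is a minimal backward-induction sketch (all function names are mine, not from the notes), using the stage payoffs from the table above; it exploits the fact that adding a continuation value to every cell is an affine transformation that leaves best responses unchanged:

```python
# Backward induction in the finitely repeated prisoners' dilemma.
# Stage payoffs (player 1, player 2) for each (row, column) action pair.
STAGE = {
    ("C", "c"): (2, 2), ("C", "d"): (-2, 3),
    ("D", "c"): (3, -2), ("D", "d"): (1, 1),
}

def stage_nash(cont):
    """NE of the stage game whose payoffs are the stage payoffs plus a
    constant continuation value `cont` per player; adding a constant is an
    affine transformation, so best responses (hence the NE) are unchanged.
    Found by brute force over the four action profiles."""
    acts1, acts2 = ("C", "D"), ("c", "d")
    for a1 in acts1:
        for a2 in acts2:
            u1 = STAGE[(a1, a2)][0] + cont[0]
            u2 = STAGE[(a1, a2)][1] + cont[1]
            br1 = all(u1 >= STAGE[(b1, a2)][0] + cont[0] for b1 in acts1)
            br2 = all(u2 >= STAGE[(a1, b2)][1] + cont[1] for b2 in acts2)
            if br1 and br2:
                return (a1, a2), (u1, u2)

def solve(T, delta):
    """Solve the T-period repetition backwards; returns the play per period."""
    cont = (0.0, 0.0)          # continuation value after the final period
    path = []
    for t in range(T, 0, -1):  # t = T, T-1, ..., 1
        actions, value = stage_nash((delta * cont[0], delta * cont[1]))
        cont = value
        path.append((t, actions))
    return list(reversed(path))

print(solve(3, 0.9))  # (D, d) in every period
```

Running it for any horizon T and any δ reproduces the conclusion above: (D, d) in every period.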
Note that if we consider this game played once, there are two pure-strategy NE: (D1, d1)
and (D2, d2).
The first equilibrium Pareto-dominates the second one.
Intuition: playing (D2, d2) instead of the better (D1, d1) after certain histories serves as a
punishment for deviating from cooperation at t = 1.
Multiple Equilibria at t = 2

Intuition

This strategy may be a SPNE because players may want to avoid playing (D2, d2) in the second period, since (D2, d2) gives a much
lower payoff than (D1, d1). In order to do that, they must generate a history of (C, c) in the first period. When the difference between the
payoffs of (D1, d1) and (D2, d2), counting the discount factor, is large enough, the individual temptation to deviate from C or c in the first
period (in order to get a higher payoff of 3 instead of 2) is not going to be sufficient to actually induce the players to deviate.

Proof

Since both (D1, d1) and (D2, d2) are NE of the stage game, we do not need to check that the equilibrium actions in the second period are
part of the SPNE. The only thing we need to check is that playing (C, c) in the first period is part of the SPNE. We can do this by
applying the one-step deviation principle to player 1. Since the game is symmetric, the result for player 1 is going to be exactly the same
as that for player 2. Under the one-step deviation principle, we only allow player 1 to change his action from C to D1 or D2, while
player 2 is assumed to play according to the strategy profile. (See lecture notes.)

Intuition

The main logic is that we are conditioning what is going to happen in the second period on what happens in the first period. By doing
this, we create a trade-off: a player can gain by deviating from C/c to D/d today and receive a payoff of 3 instead of 2; however, by doing
so, he will get −δ instead of δ in the second period. The strategy profile is a SPNE as long as the players are patient enough that the
inequality (i.e. δ ≥ 1/2) holds.
Use the one-step deviation principle for t = 1:

u1(C, c) = 2 + δ · 1
u1(D1, c) = 3 − δ · 1
u1(D2, c) = 3 − δ · 1

There is no profitable deviation if 2 + δ ≥ 3 − δ, i.e. if δ ≥ 0.5.
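The one-step deviation comparison at t = 1 (cooperating yields 2 + δ, deviating yields 3 − δ, since a deviation switches the second-period continuation from (D1, d1) to (D2, d2)) can be checked numerically; a small sketch with helper names of my own:

```python
# One-step deviation check at t = 1 for the two-equilibria example.
# Continuation at t = 2: (D1, d1), worth 1 per player, after (C, c);
# (D2, d2), worth -1 per player, after any deviation.
def payoff_cooperate(delta):
    return 2 + delta * 1      # cooperate today, (D1, d1) tomorrow

def payoff_deviate(delta):
    return 3 + delta * (-1)   # grab 3 today, (D2, d2) tomorrow

def cooperation_is_supported(delta):
    return payoff_cooperate(delta) >= payoff_deviate(delta)

print(cooperation_is_supported(0.6))   # True: 2.6 >= 2.4
print(cooperation_is_supported(0.4))   # False: 2.4 < 2.6
```

The cutover happens exactly at δ = 1/2, matching the inequality above.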
Stage payoff:

        c         d
  C   2, 2     -2, 3
  D   3, -2     1, 1

In an infinitely repeated game, we can condition tomorrow’s play on what happens today and try to provide incentives in every period for the agents to play (C, c) in the SPNE.
Total payoff:

Σ_{t=0}^{∞} δ^t · u_{i,t}(s1, s2)

where δ is a discount factor, and u_{i,t} is the stage payoff for player i at time t.
Use the one-step deviation principle: we can check for deviations at time t keeping the rest
of the agents’ strategies fixed.
As long as δ is strictly less than one, the summation converges, so every infinite history yields a well-defined finite payoff for the players.
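As a quick illustration of this convergence (a sketch of my own; the truncation horizon is an assumption, not from the notes), a constant stream of stage payoff u is worth u/(1 − δ):

```python
# Discounted value of a constant payoff stream u, u, u, ...:
# sum over t of delta^t * u, which converges to u / (1 - delta) for delta < 1.
def discounted_sum(u, delta, horizon=10_000):
    # Truncated sum; for delta < 1 the tail beyond `horizon` is negligible.
    return sum(delta**t * u for t in range(horizon))

delta = 0.9
approx = discounted_sum(2, delta)   # truncated geometric sum
exact = 2 / (1 - delta)             # closed-form limit
print(approx, exact)                # both close to 20.0
```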
3.9 Grim trigger strategies

Once d or D is played, players switch to playing (D, d) forever: this is always a NE since it is a
NE of the stage game.

Consider a history at which players should play (C, c). (By symmetry, the same results apply to player 2.)

If player 1 plays C he gets

2 / (1 − δ)

If he deviates to D he will get 3 in the current period and then he will get an infinite stream
of 1 because both players will play (D, d) forever:

3 + 1 · δ / (1 − δ)

Player 1 will not deviate (i.e. the proposed grim trigger strategy profile will be a SPNE) if

2 / (1 − δ) ≥ 3 + δ / (1 − δ)

or

δ ≥ 1/2

If players are sufficiently patient, cooperation can be sustained in equilibrium: on the equilibrium path (C, c) will be played forever.
The intuition is that deviating allows an agent to gain one extra unit of payoff in the period in which he deviates.
However, the players will be getting (1, 1) instead of (2, 2) in every period after the deviation. Thus, the one extra unit of payoff in
one period comes at the cost of losing one unit of payoff in every period thereafter. The strategy profile here relies on the agents’
patience for this punishment to offset the incentive to deviate in the current period. Specifically, we need the short-term gain
from deviating from C or c to D or d today to be less than the long-term loss from having D or d in the history.
The incentive constraint must hold in every period, since the game is infinite (i.e. there is a future in every period). A problem with
this strategy profile is that it is not very robust to mistakes (i.e. to players deviating by accident).
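The grim-trigger comparison can also be checked numerically; a minimal sketch (helper names are mine), comparing the value 2/(1 − δ) of cooperating forever against 3 + δ/(1 − δ) from a one-shot deviation:

```python
# Grim trigger in the infinitely repeated prisoners' dilemma.
def v_cooperate(delta):
    # Cooperate forever: 2 + 2*delta + 2*delta**2 + ... = 2 / (1 - delta).
    return 2 / (1 - delta)

def v_deviate(delta):
    # Deviate once: 3 today, then (D, d) forever, worth 1 per period.
    return 3 + delta * 1 / (1 - delta)

def grim_trigger_is_spne(delta):
    return v_cooperate(delta) >= v_deviate(delta)

for d in (0.3, 0.5, 0.7):
    print(d, grim_trigger_is_spne(d))   # False at 0.3, True from 0.5 on
```

The condition holds exactly from δ = 1/2 onward, as derived above.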
3.10 Other strategy profiles

Are there any other SPNE in which cooperation is sustained?

There are infinitely many of them. One other example is a strategy profile in which both
players switch to (D, d) for k periods if a deviation from (C, c) is detected or if the players
have not finished a previously prescribed k rounds of (D, d). A key difference between this strategy profile and
the grim trigger strategy profile is that the punishment in this case is temporary instead of permanent.

Consider a history at which players should play (C, c):

If player 1 plays C he gets

2 / (1 − δ)

If he deviates to D he will get 3 in the current period, then 1 for k periods, and after that an
infinite stream of 2:

3 + 1 · (δ − δ^(k+1)) / (1 − δ) + 2 · δ^(k+1) / (1 − δ)

We do not need to worry about players deviating from the punishment: during a punishment,
deviating would give a player a payoff of −2 instead of 1, and the k periods of punishment
would be reset. Thus, deviating from the punishment generates a net loss today and delays
the time at which the punishment is over, so it is clearly not profitable for the players.

There is no profitable deviation if

2 / (1 − δ) ≥ 3 + (δ − δ^(k+1)) / (1 − δ) + 2δ^(k+1) / (1 − δ)

or equivalently if

δ − δ^(k+1) / 2 ≥ 1/2

This condition is more demanding than δ ≥ 1/2, which was obtained for the grim trigger
strategy profile. Intuitively, this is because the punishment (see next section for more
discussion) for deviations is milder in this case than under grim trigger strategies.
Since the length of the punishment is shorter in this case, we need the agents to be more patient and put more weight on their future
payoffs in order to sustain the incentives for them to play (C, c).
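The threshold discount factor implied by δ − δ^(k+1)/2 ≥ 1/2 can be computed numerically for each punishment length k; a sketch (the bisection helper is my own):

```python
# Threshold discount factor for the k-period punishment strategy:
# cooperation is sustainable iff delta - delta**(k + 1) / 2 >= 1 / 2.
def sustainable(delta, k):
    return delta - delta ** (k + 1) / 2 >= 0.5

def threshold(k, tol=1e-9):
    """Smallest delta satisfying the condition, by bisection: the condition
    is false below the threshold and true from there up to delta = 1
    (at delta = 1 the left-hand side equals exactly 1/2)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sustainable(mid, k):
            hi = mid
        else:
            lo = mid
    return hi

for k in (1, 2, 5, 50):
    print(k, round(threshold(k), 4))
# The threshold exceeds 1/2 for every finite k and falls toward 1/2 (the
# grim-trigger threshold) as k grows; for k = 1 it equals 1, i.e. a single
# punishment period is too weak to deter deviation for any delta < 1.
```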
3.11 Punishments

In this example, cooperation is sustained on the equilibrium path because any deviation from
that path leads to players playing (D, d): this is called a punishment. In all examples above,
the punishment is playing the NE of the stage game, which is (D, d).

In general a punishment need not be a NE of the stage game as in the examples above. In the
following example the punishment will not be a NE of the stage game.

Consider a repeated Cournot duopoly with P = 90 − Q.

The NE of the stage game is q1* = q2* = 30, and each firm earns a profit of 900.
Implication of the Prisoners’ Dilemma Example

When we ask the players to play something that is not a static NE in a repeated game, we need to deter deviations in the current
period by introducing future punishment. Then, depending on the harshness of the punishment, we need the agents to be
patient enough for these deterrents to be effective.
Collusion

Suppose that the two firms collude and jointly decide how much to produce. The firms then need to solve the following optimization
problem:

max_Q (90 − Q) · Q

Since the function is concave, the first-order condition is both necessary and sufficient to find the maximum: 90 − 2Q = 0, so Q* = 45,
i.e. 22.5 per firm, earning each firm a profit of 1012.5.

We are interested in this monopoly outcome because we are going to look at a repeated game where the firms collude and behave non-
competitively in the market. The best they can do is to produce the monopoly output solved above, and this gives them the highest
profit they can secure. This profit is larger than what they get in a static NE; thus, collusion is beneficial to both firms. The task here is to
find a strategy profile in which the firms can sustain this collusion on a credible path. If the firms deviate from the monopoly regime, they
will end up in a punishment regime, which gives them lower payoffs.
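The monopoly problem can be checked with a coarse grid search; a sketch of my own, assuming zero production costs (which the payoff numbers above imply):

```python
# Joint (monopoly) profit maximisation for P = 90 - Q with zero costs:
# max_Q (90 - Q) * Q; FOC: 90 - 2Q = 0, so Q* = 45, i.e. 22.5 per firm.
def joint_profit(Q):
    return (90 - Q) * Q

Q_star = max(range(0, 91), key=joint_profit)   # coarse integer grid search
print(Q_star, joint_profit(Q_star) / 2)        # 45, and 1012.5 per firm
```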
Step One: check that playing the monopoly regime can be part of the SPNE

In the repeated Prisoners’ Dilemma game, the players only have two actions. Therefore, we know what action a player would choose if
he deviates and the payoff he would get from doing so. However, in the Cournot duopoly case, the players have infinitely many actions
(i.e. they can choose any positive quantity). Using the one-step deviation principle, we require that the payoff on the equilibrium
path (of playing the monopoly regime) is greater than the payoff from any possible deviation. We can check whether this is the case by
substituting in the best available deviation, which guarantees that the condition holds for any possible deviation.

We can simplify this inequality using the fact that firm 1’s payoffs from not deviating and deviating are the same in all the periods after
the punishment is played. Thus, we only need to look at the current and the next period.

Since the game is symmetric, the same results apply to firm 2. The result suggests that the firms have to be patient enough in order not
to deviate. The condition above ensures that the firms are not going to deviate from playing the monopoly regime.
Step Two: check that playing the punishment regime can be part of the SPNE

Since the punishment here is not a NE of the stage game, we need to check that it can also be part of the SPNE. We do this by
checking that firm 1 would not want to deviate under a certain condition. The same results apply to firm 2 due to the symmetry of the game.
We need to find a condition such that the profit firm 1 gets from not deviating is greater than the profit it gets from any other available
action. We do so by finding the best available deviation and substituting the payoff from this action into the expression above.

For firms 1 and 2 not to deviate from the punishment regime, δ must be equal to or greater than 1/2.

Combining the results shows that for the strategy profile considered to be a SPNE, δ must be equal to or greater than both
thresholds. Since 1/2 is greater than 1/8, the binding condition is δ ≥ 1/2.
The Cournot duopoly example illustrates that we do not have to use a static NE as a punishment to induce cooperation in a repeated
game. Using a static NE is convenient in that we do not need to check that it is part of the SPNE, since it is itself a NE of the stage
game. However, we can also make the punishment harsher than the static NE (as in this example). In that case, we need to
make sure that the punishment itself is credible, so that the players do not deviate from it when they are supposed to deliver the
punishment.
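Both no-deviation conditions (δ ≥ 1/8 for the collusive regime, δ ≥ 1/2 for the punishment regime) can be checked end-to-end; a sketch with helper names of my own, again assuming zero production costs as the payoff numbers imply:

```python
# Deviation checks for the collusive strategy in the Cournot duopoly.
def profit(q_own, q_other):
    return (90 - q_own - q_other) * q_own

def best_deviation_profit(q_other):
    # Best response to q_other: q = (90 - q_other) / 2 (from the FOC).
    q = (90 - q_other) / 2
    return profit(q, q_other)

PI_M = profit(22.5, 22.5)   # collusive profit per firm: 1012.5
PI_P = profit(45, 45)       # punishment-period profit: 0 (price drops to 0)

def collusion_ok(delta):
    # Deviating from (22.5, 22.5): gain today, then one punishment period,
    # then back to collusion. Only the current and next period differ.
    gain = best_deviation_profit(22.5) - PI_M   # 1139.0625 - 1012.5
    loss = delta * (PI_M - PI_P)                # delta * 1012.5
    return gain <= loss                         # holds iff delta >= 1/8

def punishment_ok(delta):
    # Deviating from (45, 45) earns ((90 - 45) / 2)**2 today but restarts
    # the punishment, postponing the return to collusion by one period.
    gain = best_deviation_profit(45) - PI_P     # 506.25
    loss = delta * (PI_M - PI_P)                # delta * 1012.5
    return gain <= loss                         # holds iff delta >= 1/2

print(collusion_ok(0.125), punishment_ok(0.5))  # True True: thresholds bind
```

Both thresholds bind with equality, confirming that δ ≥ 1/2 is the overall requirement.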
3.11.1 Collusion

Consider the following strategy:

• If (q1, q2) = (22.5, 22.5) or (q1, q2) = (45, 45) was played in the previous period, play
(q1, q2) = (22.5, 22.5);
• otherwise, play (q1, q2) = (45, 45).

It is better not to deviate from (q1, q2) = (22.5, 22.5):

((90 − 22.5) / 2)² − 1012.5 ≤ δ · 1012.5

or equivalently

δ ≥ 0.125

It is better not to deviate from (q1, q2) = (45, 45):

((90 − 45) / 2)² ≤ δ · 1012.5

or equivalently

δ ≥ 0.5