
ECON0027 Lecture notes

Section 3: Repeated games


Reading: Osborne: Ch 14.1-14.6

3.1 Example: prisoners' dilemma

Consider a prisoners' dilemma game:

        2
        c      d
1  C   2,2   -2,3
   D   3,-2   1,1

• Nash equilibrium (D, d) is Pareto-dominated by (C, c)
• In real life, people are less opportunistic and more cooperative

Although the strategy profile (C, c) is not a NE, it Pareto dominates the NE (D, d). In reality, people may play (C, c), and this can be explained in two ways:

• Explanation 1: factors such as altruism or preferences for cooperation are not included in the payoffs (i.e. the payoff table here does not reflect players' true preferences);
• Explanation 2: players may want to develop a long-term relationship with their opponents if the game is played repeatedly.

Is it possible to accommodate the last observation within the model with rational agents?

• Preferences for cooperation or altruism


• Social norms and repeated interaction

3.2 Repeated interaction


Idea: if players play the same game with the same opponents over and over again, we can condition current actions on the history.

If the opponents cooperated in the past we continue to cooperate and we defect otherwise.

Is it possible to sustain cooperation using such or similar strategies?

The analogy with social norms: if everyone followed the social norm in the past we continue
to follow it in the present. If the deviation from the social norm is detected we switch to a
“punishment regime”.
Usually social norms are self-sustaining and rely on repeated interaction with the same players. For example, it is difficult to sustain a "non-equilibrium" social norm in large anonymous societies.

3.3 Prisoners’ dilemma repeated twice


The same prisoners’ dilemma game played in two periods.
Stage payoff:

        2
        c      d
1  C   2,2   -2,3
   D   3,-2   1,1

Total payoff:

(payoff at t = 1) + δ · (payoff at t = 2)

where δ is a common discount factor (i.e. both players have the same discount factor).

Important: Actions are observable at the end of each stage.

3.4 Unraveling: t = 2
Pick any subgame at t = 2:

        2
        c                   d
1  C   u1 + 2δ, u2 + 2δ    u1 - 2δ, u2 + 3δ
   D   u1 + 3δ, u2 - 2δ    u1 + δ,  u2 + δ

where (u1, u2) are the stage payoffs from t = 1. This subgame is strategically equivalent to the original stage game (the payoffs are a positive affine transformation of the original payoffs).

Unique equilibrium: (D, d)

3.5 Unraveling: t = 1
At time t = 1 players are playing the following subgame:

        2
        c                   d
1  C   δu1 + 2, δu2 + 2    δu1 - 2, δu2 + 3
   D   δu1 + 3, δu2 - 2    δu1 + 1, δu2 + 1

where (u1, u2) = (1, 1) are the (predicted) stage payoffs from t = 2. This game is strategically equivalent to the original stage game (the payoffs are a positive affine transformation of the original payoffs).

Unique equilibrium: (D, d) at both t = 1 and t = 2

3.6 Unraveling: General case


By induction: if players play (D, d) at all times after t, no matter the history, then they also play (D, d) at time t.
The "no matter the history" part of the statement implies that at time t, when we look at the subgame from the point of view of the one-step deviation principle, the future payoffs are the same for all possible action profiles—i.e., the only thing that changes from one action profile to another is the current stage payoff.

Therefore, no matter how many times the agents interact (as long as T is finite), cooperation cannot be self-sustained.

We have two possibilities to upset this result:

Prisoners’ Dilemma Repeated Twice

The unique NE of the stage game is (D, d).

Unraveling: t = 2

The total utility in the second period following a given history is (u_1 + δ·u_stage, u_2 + δ·u_stage), where u_1 and u_2 are the payoffs received in the first period and u_stage is the payoff in the second period. Since this is a positive affine transformation of the original payoffs, the game in period 2 is strategically equivalent to the stage game, which has a unique NE of (D, d). Therefore, the players are always going to play (D, d) regardless of the actions played in the first period.

Unraveling: t = 1

Knowing that the players are going to play (D, d) in the second period, we can write the payoffs for the first period as (u_stage + δ·u_1, u_stage + δ·u_2), where u_1 and u_2 are the payoffs received in the second period and u_stage is the payoff in the first period. Since this is also a positive affine transformation of the original game, the unique NE in the first period is also to play (D, d). In conclusion, the SPNE of this twice-repeated game is s* = {[D, (D D D D)], [d, (d d d d)]}, where each action in the small bracket corresponds to the action the player is going to choose in period 2 after each history.

Generalisation

Suppose that the stage game is repeated three times. The last two periods of this longer repeated game would be strategically equivalent to the twice-repeated game we just solved. It follows that in the last two periods, the players are going to play (D, d) all the time. The utility for the first period is going to be similar, except that there is an additional δ² term for the third period. This is again a positive affine transformation of the original game, so the players will play (D, d) in the first period as well. This result can be generalised to any game with a finite number of repetitions.

The intuition is that in the last period, the agents are presented with a simple one shot normal form game with a unique prediction of
playing (D, d). In the period before the last period, the agents know that in the last period, there is a unique prediction of playing (D, d).
Thus, what happens in the future does not depend on what they do today and therefore this makes their game in the period before the
last also a one shot game with a unique prediction of playing (D, d). This idea of unraveling continues, which results in the players playing
D or d in every period.
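The unraveling argument can be sketched numerically. The following is a minimal illustration, assuming the stage payoffs from the table above; the helper names (`stage_nash`, `unravel`) are ours, not from the notes:

```python
# Hypothetical sketch: backward induction in a T-times repeated prisoners' dilemma.
# Stage payoffs are taken from the table in the notes.

STAGE = {  # (row action, column action) -> (payoff to 1, payoff to 2)
    ("C", "c"): (2, 2), ("C", "d"): (-2, 3),
    ("D", "c"): (3, -2), ("D", "d"): (1, 1),
}

def stage_nash():
    """Find the pure-strategy NE of the stage game by brute force."""
    eqs = []
    for (a1, a2), (u1, u2) in STAGE.items():
        # a1 is a best response to a2, and a2 a best response to a1
        best1 = all(STAGE[(b1, a2)][0] <= u1 for b1 in ("C", "D"))
        best2 = all(STAGE[(a1, b2)][1] <= u2 for b2 in ("c", "d"))
        if best1 and best2:
            eqs.append((a1, a2))
    return eqs

def unravel(T):
    """Backward induction: the continuation payoff after any history is the same
    constant added to every cell, so each period's subgame is strategically
    equivalent to the stage game and its unique NE is played at every t."""
    return [stage_nash()[0] for _ in range(T)]

print(unravel(3))  # (D, d) in every period
```

Because the continuation payoff is history-independent, the loop simply reuses the stage-game NE, which is exactly what the induction step in the notes exploits.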

Two ways to end unraveling:

• Extend the stage game to one with multiple (inefficient) equilibria—we can use the worst one as a punishment for a deviation from cooperation; and
• Play the game infinitely many times (an infinite time horizon).

3.7 Multiple equilibria at t = 2


The idea here is the following: if we have a stage game with multiple Nash equilibria, we can select different equilibria in subgames following different histories and, by doing that, make some histories more attractive than others.
Consider the following game repeated twice.
Stage payoff:

        c      d1     d2
   C   2,2   -2,3   -3,3
   D1  3,-2   1,1   -2,-2
   D2  3,-3  -2,-2  -1,-1

Note that if we consider this game played once, there are two pure strategy NE: (D1, d1)
and (D2, d2).
The first equilibrium Pareto dominates the second one.

3.7.1 Is (C, c) sustainable in the first period?


Our aim is not to find all the SPNE of the repeated game. Instead, we are going to look for a SPNE in which (C, c) is played as a part of the SPNE.

Consider the following strategy profile:

• Play (C, c) in the first period

• If (C, c) was played at t = 1, play (D1, d1) at t = 2

• Otherwise play (D2, d2) at t = 2

Intuition: playing (D2, d2) instead of a better (D1, d1) after certain histories serves as a
punishment for deviating from cooperation at t = 1

3.7.2 Checking equilibrium conditions


        c      d1     d2
   C   2,2   -2,3   -3,3
   D1  3,-2   1,1   -2,-2
   D2  3,-3  -2,-2  -1,-1

Multiple Equilibria at t = 2

Intuition

This strategy profile may be a SPNE because the players want to avoid playing (D2, d2) in the second period, since (D2, d2) gives a much lower payoff than (D1, d1). In order to do that, they must generate a history of (C, c) in the first period. When the difference between the payoffs of (D1, d1) and (D2, d2), counting the discount factor, is large enough, the individual temptation to deviate from C or c in the first period (in order to get a payoff of 3 instead of 2) is not going to be sufficient to actually engage the players in that deviation.

Proof

Since both (D1, d1) and (D2, d2) are NE of the stage game, we do not need to check that the equilibrium actions in the second period are part of the SPNE. The only thing we need to check is that playing (C, c) in the first period is part of the SPNE. We can do this by applying the one-step deviation principle to player 1. Since the game is symmetric, the result for player 1 is going to be exactly the same as that for player 2. Under the one-step deviation principle, we only allow player 1 to change his action from C to D1 or D2, while player 2 is assumed to play according to the strategy profile. (See lecture notes).

Intuition

The main logic is that we are conditioning what is going to happen in the second period on what happens in the first period. By doing this, we create a trade-off: a player can gain by deviating from C/c to D1 or D2 today and receive a payoff of 3 instead of 2; however, by doing so, he will get -δ instead of δ in the second period. The strategy profile is a SPNE as long as the players are patient enough for the inequality (i.e. δ ≥ 1/2) to hold.
Use one-step deviation principle for t = 1:

u1(C, c)  = 2 + δ · 1
u1(D1, c) = 3 + δ · (-1) = 3 - δ
u1(D2, c) = 3 + δ · (-1) = 3 - δ

There is no profitable deviation from C if

2 + δ ≥ 3 - δ

or, equivalently, when agents are patient enough:

δ ≥ 0.5

If the above condition is satisfied, the proposed strategy profile is a SPNE
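The one-step deviation check above is easy to verify numerically. A small sketch, assuming the continuation payoffs from the strategy profile in the notes (the function names are ours):

```python
# One-step deviation check at t = 1 for the stage game with two NE.
# Continuation payoff tomorrow: 1 from (D1, d1) after (C, c); -1 from (D2, d2)
# after any deviation.

def payoff_keep(delta):
    return 2 + delta * 1      # play C today, then (D1, d1) tomorrow

def payoff_deviate(delta):
    return 3 + delta * (-1)   # grab 3 today, then face (D2, d2) tomorrow

for delta in (0.4, 0.5, 0.6):
    print(delta, payoff_keep(delta) >= payoff_deviate(delta))
```

The comparison flips exactly at δ = 0.5, the threshold derived above.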

3.8 Infinitely repeated games


The reason for “unraveling” in the prisoners’ dilemma played twice is that in the last period
there is no future. If the stage game is played infinitely many times, at each period there is a
future and, therefore, a possibility of long-term consequences of present actions.

Stage payoff:

        2
        c      d
1  C   2,2   -2,3
   D   3,-2   1,1

In an infinitely repeated game, we can condition tomorrow's play on what happens today and try to provide incentives in every period for the agents to play (C, c) in the SPNE.
Total payoff:

 ∞
 Σ  δ^t · u_{i,t}(s1, s2)
t=0

where δ is a discount factor, and u_{i,t} is the stage payoff of player i at time t.

3.8.1 How to solve/check for SPNE


We will not be able to find all SPNE of this game. Instead, we will consider several interesting
strategy profiles and check whether they are SPNE. These strategy profiles have a clear economic
intuition—that’s why I have chosen them for this section.

• This game has an infinite number of subgames: a subgame at time t is characterized by a history {s_{1,1}, s_{2,1}, ..., s_{1,t-1}, s_{2,t-1}}.
• Each subgame is an infinitely repeated game identical to the original one.

Use one-step deviation principle: we can check for deviations at time t keeping the rest
of the agents’ actions constant.

As long as δ is strictly less than one, the summation converges, so each player's payoff at a terminal history is a finite number.
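The convergence claim is just the geometric series: a constant stage payoff u discounted by δ < 1 sums to u/(1 - δ). A quick numerical sanity check (helper name is ours):

```python
# Truncated discounted sum of a constant stage payoff u, to compare
# against the closed form u / (1 - delta) for delta < 1.

def discounted_sum(u, delta, T):
    return sum(u * delta**t for t in range(T))

u, delta = 2, 0.9
approx = discounted_sum(u, delta, 500)
exact = u / (1 - delta)
print(approx, exact)  # the truncated sum is very close to the closed form
```

With u = 2 and δ = 0.9 both values are essentially 20, the cooperation payoff used in the grim-trigger calculation below the fold.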
3.9 Grim trigger strategies

The following strategy profile is called grim trigger:

• Play (C, c) at time t = 0.

• If at time t the history consists only of cooperation, play (C, c)

• Otherwise play (D, d)


i.e. once somebody decides to deviate from C or c to D or d, the players start playing (D, d) for the rest of the time.
Under what conditions is this strategy profile a SPNE?

3.9.1 Checking for deviations

Once d or D is played, players switch to playing (D, d) forever: this is always a NE of the continuation game, since (D, d) is a NE of the stage game.

A question to think about: Is the following statement correct? If a strategy profile s is a NE of a stage game, then it is a SPNE of the corresponding repeated game when played following any history.

Consider a history at which players should play (C, c). (By symmetry, the same results apply to player 2.)
If player 1 plays C he gets

2 / (1 - δ)

If he deviates to D he will get 3 in the current period and then an infinite stream of 1, because both players will play (D, d) forever:

3 + δ · 1 / (1 - δ)

Player 1 will not deviate (i.e. the proposed grim trigger strategy profile will be a SPNE) if

2 / (1 - δ) ≥ 3 + δ / (1 - δ)

or

δ ≥ 1/2

If players are sufficiently patient the cooperation can be sustained in equilibrium: on equi-
librium path (C, c) will be played forever.
The intuition is that deviating allows an agent to gain one extra unit of payoff in the period in which he deviates. However, the players will then get (1, 1) instead of (2, 2) in every period after the deviation. Thus, the one extra unit of payoff today comes at the cost of losing one unit of payoff in every period afterwards. The strategy profile relies on the agents' patience for this punishment to offset the incentive to deviate in the current period. Specifically, we need the short-term gain from deviating from C or c to D or d today to be less than the long-term loss from putting D or d into the history. The incentive constraint has to hold in every period, since the game is infinite (i.e. there is a future in every period). A problem with this strategy profile is that it is not very robust to mistakes (i.e. a single accidental deviation destroys cooperation forever).
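The grim-trigger incentive condition can be checked directly. A minimal sketch using the payoffs above (the function names are ours):

```python
# Grim-trigger incentive check: cooperating forever yields 2/(1-delta);
# deviating yields 3 today and then the (D, d) payoff of 1 forever after.

def cooperate_value(delta):
    return 2 / (1 - delta)

def deviate_value(delta):
    return 3 + delta * 1 / (1 - delta)

for delta in (0.3, 0.5, 0.8):
    print(delta, cooperate_value(delta) >= deviate_value(delta))
```

The inequality holds exactly for δ ≥ 1/2, matching the threshold derived above.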
3.10 Other strategy profiles
Are there any other SPNE in which cooperation is sustained?

There are infinitely many of them. One other example is a strategy profile in which both
players switch to (D, d) for k periods if the deviation from (C, c) is detected or if the players
have not finished a previously prescribed k rounds of (D, d):
A key difference between this strategy profile and the grim trigger strategy profile is that the punishment in this case is temporary instead of permanent.

Consider a history at which players should play (C, c).
If player 1 plays C he gets

2 / (1 - δ)

If he deviates to D he will get 3 in the current period, then 1 for k periods, and after that an infinite stream of 2:

3 + (δ - δ^(k+1)) / (1 - δ) + 2 δ^(k+1) / (1 - δ)

We do not need to worry about players deviating from the punishment: during a punishment, deviating would give a player a payoff of -2 instead of 1, and the k periods of punishment would be reset. Thus, deviating from the punishment generates a net loss today and delays the time at which the punishment is over, so it is clearly not profitable for the players.

There is no profitable deviation if

2 / (1 - δ) ≥ 3 + (δ - δ^(k+1)) / (1 - δ) + 2 δ^(k+1) / (1 - δ)

or equivalently if

δ - δ^(k+1) / 2 ≥ 1/2
This condition is more demanding than δ ≥ 1/2, which was obtained for the grim trigger strategy profile. Intuitively, this is because the punishment (see the next section for more discussion) for deviations is milder here than under the grim trigger strategies. Since the punishment is shorter, we need the agents to be more patient and to put more weight on their future payoffs in order to sustain the incentives to play (C, c).
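How much more demanding the condition δ - δ^(k+1)/2 ≥ 1/2 is can be seen by computing the threshold δ for a few values of k. A sketch (the helper names and the bisection routine are ours):

```python
# The k-period punishment condition from the notes: delta - delta**(k+1)/2 >= 1/2.
# As k grows, the threshold on delta falls toward the grim-trigger bound of 1/2.

def sustains_cooperation(delta, k):
    return delta - delta**(k + 1) / 2 >= 0.5

def threshold(k, tol=1e-9):
    """Smallest delta satisfying the condition, found by bisection.
    (The set of deltas satisfying it is an interval ending at delta = 1.)"""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sustains_cooperation(mid, k):
            hi = mid
        else:
            lo = mid
    return hi

for k in (1, 2, 5, 50):
    print(k, round(threshold(k), 4))
# thresholds decrease in k toward the grim-trigger bound 0.5
```

Note the extreme case k = 1: a single period of punishment sustains cooperation only at δ = 1, while very long punishments approach the grim-trigger threshold of 1/2.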
3.11 Punishments
In this example, cooperation is sustained on the equilibrium path because any deviation from
that path leads to players playing (D, d)—this is called a punishment.
In general a punishment need not be a NE of the stage game. In all the examples above, the punishment was playing the NE of the stage game, (D, d); in the following example the punishment will not be a NE of the stage game.
Consider a repeated Cournot duopoly with P = 90 - Q.
The NE of the stage game is q1* = q2* = 30, and

π1* = π2* = 900

The monopoly output and profit are Q_m = 45 and π_m = 2025 > π1* + π2*.

Is it possible to sustain collusion and obtain a joint profit that equals 2025 in each period?
Implication of the Prisoners’ Dilemma Example

When we ask the players to play something that is not a static NE in a repeated game, we need to deter deviations in the current period by introducing future punishment. Then, depending on the harshness of the punishment, we need the agents to be patient enough for these deterrents to be effective.

Repeated Games: Cournot Duopoly

The stage game is the Cournot duopoly with inverse demand P = 90 - Q, where Q = q1 + q2.

Collusion

Suppose that the two firms collude and jointly decide how much to produce. Then the firms need to solve the following optimization problem:

max_Q (90 - Q) · Q

Since the objective function is concave, the first order condition is both necessary and sufficient to find the maximum.

The first order condition is

90 - 2Q = 0,  so  Q_m = 45

The total profit in the monopoly regime is

π_m = (90 - 45) · 45 = 2025

In the monopoly regime, the firms' profile of actions is

(q1, q2) = (22.5, 22.5)

The corresponding payoffs for the firms are

(1012.5, 1012.5)

We are interested in this monopoly outcome because we are going to look at a repeated game in which the firms collude and behave non-competitively in the market. The best they can do is to produce the monopoly output solved above, which gives them the highest profit they can secure. This profit is larger than what they get in the static NE; thus, collusion is beneficial to both firms. The task here is to find a strategy profile in which the firms can sustain this collusion on a credible path. If the firms deviate from the monopoly regime, they will end up playing the NE, which gives them lower payoffs.

Let the punishment be playing the action profile

(q1, q2) = (45, 45)

The corresponding payoffs for the firms are

(0, 0)
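The joint-profit maximization can be verified without calculus. A small sketch using a grid search over Q (the helper names are ours):

```python
# Collusion problem with inverse demand P = 90 - Q: the firms jointly
# maximize total profit (90 - Q) * Q over the industry output Q.

def joint_profit(Q):
    return (90 - Q) * Q

# brute-force check over a fine grid instead of the first order condition
Qs = [q / 100 for q in range(0, 9001)]
Q_m = max(Qs, key=joint_profit)
print(Q_m, joint_profit(Q_m))  # 45.0 and 2025.0, split as 1012.5 per firm
```

The grid maximum agrees with the FOC solution Q_m = 45 and π_m = 2025.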

Consider the following strategy profile:

• The firms play the monopoly regime at t = 0.
• The firms switch to the punishment regime for one period if a firm deviates. The punishment is triggered by a deviation either from the monopoly regime or from the punishment regime. This means that if the players do not play the punishment regime when they are supposed to, they will need to play the punishment again.
• The firms revert back to the monopoly regime after the punishment is delivered.

With A = 90, the strategy can also be described in the following way:

• Start by playing (A/4, A/4).
• Play (A/4, A/4) if either (A/4, A/4) or (A/2, A/2) was played in the previous period.
• Play (A/2, A/2) otherwise.

Step One: check that playing the monopoly regime can be a part of the SPNE
In the repeated prisoners' dilemma game, the players only have two actions, so we know which action a player would choose if he deviates and the payoff he would get from doing so. In the Cournot duopoly case, however, the players have infinitely many actions (i.e. they can choose any positive quantity). Using the one-step deviation principle, we require that the payoff on the equilibrium path (of playing the monopoly regime) is greater than the payoff from any possible deviation. We can check whether this is the case by substituting in the best available deviation, which guarantees that the condition holds for any possible deviation.

We can find the highest profit that firm 1 can get by deviating as follows:

max_{q1} (90 - q1 - 22.5) · q1,  so  q1 = 33.75  and the deviation profit is  33.75² = 1139.0625

For firm 1 not to deviate, we need the following inequality to hold:

1012.5 (1 + δ) ≥ 1139.0625 + δ · 0

We can simplify this inequality using the fact that firm 1's payoffs from not deviating and from deviating are the same in all periods after the punishment is played. Thus, we only need to look at the current and the next period.

Since the game is symmetric, the same results apply to firm 2. The result suggests that the firms have to be patient enough in order not to deviate. The condition above ensures that the firms are not going to deviate from playing the monopoly regime.

Step Two: check that playing the punishment regime can be a part of the SPNE

Since the punishment here is not a NE of the stage game, we need to check that it too can be a part of the SPNE. We do this by checking that firm 1 would not want to deviate under a certain condition; the same result applies to firm 2 by symmetry of the game.

We need to find a condition such that the profit firm 1 gets from not deviating is greater than the profit it gets from any other available action. We do so by finding the best deviation available and substituting the payoff from this action into the comparison. The best deviation against q2 = 45 is q1 = (90 - 45)/2 = 22.5, which yields a profit of 22.5² = 506.25 today instead of 0, but it restarts the punishment.

For firms 1 and 2 not to deviate from the punishment regime, δ must be greater than or equal to 1/2.

Combining the results shows that for the strategy profile considered to be a SPNE, δ must be greater than or equal to both thresholds. Since 1/2 is greater than 1/8, the condition is simply δ ≥ 1/2.

The Cournot duopoly example illustrates that we do not have to use a static NE as a punishment to induce cooperation in a repeated game. Using a static NE is convenient in that we do not need to check that it is part of the SPNE, since it is itself a NE of the stage game. However, we can also make the punishment harsher than the static NE, as in this example. In that case, we need to make sure that the punishment itself is credible, so that the players do not deviate from it when they are supposed to deliver the punishment.
3.11.1 Collusion
Consider the following strategy:

• Start with (q1 , q2 ) = (22.5, 22.5)

• If (q1 , q2 ) = (22.5, 22.5) or (q1 , q2 ) = (45, 45) was played in the previous period, play
(q1 , q2 ) = (22.5, 22.5)

• Otherwise play (q1 , q2 ) = (45, 45)

Check equilibrium conditions:

It is better not to deviate from (q1, q2) = (22.5, 22.5):

((90 - 22.5) / 2)² ≤ 1012.5 (1 + δ)

or equivalently

δ ≥ 0.125

It is better not to deviate from (q1, q2) = (45, 45):

((90 - 45) / 2)² ≤ 1012.5 δ

or equivalently

δ ≥ 0.5
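Both conditions can be verified numerically. A sketch using the Cournot best-response function (the helper names are ours):

```python
# Numerical check of the two collusion conditions. The best deviation against
# an opponent playing q is (90 - q) / 2, from firm 1's best-response function.

def profit(q1, q2):
    return max(90 - q1 - q2, 0) * q1  # price floored at zero

def best_deviation_profit(q_opp):
    q = (90 - q_opp) / 2
    return profit(q, q_opp)

coll = profit(22.5, 22.5)            # 1012.5 per firm under collusion
dev_c = best_deviation_profit(22.5)  # 1139.0625 from undercutting collusion
dev_p = best_deviation_profit(45)    # 506.25 from shirking the punishment

# condition 1: no deviation from collusion   <=>  dev_c <= coll * (1 + delta)
# condition 2: no deviation from punishment  <=>  dev_p <= coll * delta
for delta in (0.125, 0.5):
    print(delta, dev_c <= coll * (1 + delta), dev_p <= coll * delta)
```

At δ = 0.125 only the collusion condition binds with equality; both conditions hold from δ = 0.5 onwards, confirming that δ ≥ 1/2 is the relevant threshold.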

3.11.2 Punishments for not carrying out punishments


In the proposed strategy the punishment is not a NE of the stage game; therefore we need incentives to sustain the punishment (to make it credible).
We achieve these incentives by imposing a punishment for not carrying out the punishment: if, instead of playing 45, an agent decides to play something else (to maximize current profit he should play 45/2 = 22.5), then in the next period we play (45, 45) again.
