The lecture discusses repeated games as a model for long-term strategic interactions between agents, emphasizing the potential for cooperation through rewards and punishments. It illustrates the dynamics of a two-player game where players decide to work or shirk, analyzing the outcomes under different strategies and the implications for Nash equilibria. The analysis shows that, regardless of past actions, players choose the non-cooperative action in every round, so the finitely repeated Prisoner's Dilemma has a unique subgame perfect Nash equilibrium.


Lecture 17: Applications of Subgame Perfect Nash Equilibrium

Mauricio Romero
Lecture 17: Repeated Games

Repeated Games
- Repeated games are often useful for modeling relationships between two or more agents that interact in a strategic situation not just once but over a long period of time

- Agents may cooperate with one another, even when cooperation is not in their short-run interest, through a system of rewards and punishments

- A game can repeat itself several times

- Static games become dynamic games through repetition

- We will use (G, T) to denote that game G is repeated T times

1. In period 1, players simultaneously play the game G.

2. Players observe the actions chosen by the players in period 1. Then in period 2, players simultaneously play the game G.

3. This game proceeds until time T.

4. After time T, if the action profiles chosen in times 1, 2, ..., T are given by (a_i^1, a_{−i}^1), ..., (a_i^T, a_{−i}^T), player i's payoff is the discounted sum

   Σ_{t=1}^{T} δ^{t−1} u_i(a_i^t, a_{−i}^t).
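As a quick computational aside (a minimal sketch, not part of the lecture; it assumes the work/shirk stage game introduced below and an arbitrary δ = 0.9), the discounted payoff can be computed as follows:

```python
# Minimal sketch (assumption: the work/shirk stage game used later in the
# lecture, u_i(e_i, e_-i) = 2*e_-i - e_i) of the discounted payoff.

def stage_payoff(e_i, e_other):
    return 2 * e_other - e_i

def repeated_payoff(profiles, i, delta):
    """Sum of delta^(t-1) * u_i(a_i^t, a_-i^t) for t = 1, ..., T.

    profiles: list of (e1, e2) action profiles, one per period.
    i: player index (0 for player 1, 1 for player 2).
    """
    total = 0.0
    for t, (e1, e2) in enumerate(profiles):  # t = 0 corresponds to period 1
        e_i, e_other = (e1, e2) if i == 0 else (e2, e1)
        total += delta ** t * stage_payoff(e_i, e_other)
    return total

# Play (1, 1) in period 1 and (0, 1) in period 2, with delta = 0.9:
print(round(repeated_payoff([(1, 1), (0, 1)], i=0, delta=0.9), 6))  # 2.8
print(round(repeated_payoff([(1, 1), (0, 1)], i=1, delta=0.9), 6))  # 0.1
```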
Consider the following two-player game:

- Each player i = 1, 2 simultaneously decides whether to play e_i = 1 (work) or e_i = 0 (shirk)

- Working incurs a cost of 1 but increases the utility of the other player −i by 2

- Thus,
  u_i(e_i, e_−i) = 2e_−i − e_i.

Prisoner’s Dilemma (Game G)

        e2 = 1    e2 = 0
e1 = 1  1, 1      −1, 2
e1 = 0  2, −1     0, 0
- What happens when T = 1?

- NE: Players 1 and 2 will both choose (e1 = 0, e2 = 0)
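As a sanity check, here is an illustrative sketch (not from the lecture) that finds the NE by enumerating all four pure action profiles:

```python
# Illustrative sketch: enumerate all pure action profiles of the work/shirk
# stage game and keep those where neither player gains by deviating.

def u(e_i, e_other):
    return 2 * e_other - e_i  # u_i(e_i, e_-i) = 2*e_-i - e_i

actions = (0, 1)
equilibria = [
    (e1, e2)
    for e1 in actions for e2 in actions
    if all(u(e1, e2) >= u(d, e2) for d in actions)   # player 1 cannot gain
    and all(u(e2, e1) >= u(d, e1) for d in actions)  # player 2 cannot gain
]
print(equilibria)  # [(0, 0)] -- shirking by both is the unique NE
```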


Imagine players are engaged in a long-run relationship that lasts more than just playing the game once: (G, 2)

1. Both players play the simultaneous move game G.

2. Both players observe the actions chosen by the two players. Then they play G again.

3. Then payoffs are realized as the discounted sum of the utilities of the actions in each period, with discount factor δ ∈ (0, 1].

Suppose that the two players chose (e1 = 1, e2 = 1) in the first period and (e1 = 0, e2 = 1) in the second period. Then:

u1 = 1 + δ · 2
u2 = 1 + δ · (−1).
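For instance, with δ = 0.9 these evaluate to u1 = 1 + 1.8 = 2.8 and u2 = 1 − 0.9 = 0.1.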
- We will solve for the set of pure SPNE of this game.

- Player 1 has 5 information sets in total: one in period 1, plus one for each of the 4 possible period-1 action profiles.

- A pure strategy for player 1 must specify what he does in each of these information sets.

- Player 1 has a total of 32 (2^5) pure strategies.

- Similarly, player 2 has a total of 32 pure strategies.
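The counting can be made concrete with a small sketch (illustrative only):

```python
# Illustrative sketch: player 1's information sets in (G, 2) are the empty
# history (period 1) plus one per observed period-1 action profile.
from itertools import product

period1_profiles = list(product((1, 0), repeat=2))  # (1,1), (1,0), (0,1), (0,0)
num_info_sets = 1 + len(period1_profiles)           # 1 + 4 = 5
num_pure_strategies = 2 ** num_info_sets            # one of 2 actions at each

print(num_info_sets, num_pure_strategies)  # 5 32
```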


- There are 5 subgames: the game itself, plus one period-2 subgame for each of the 4 possible period-1 action profiles.

- Start at the end of the game (i.e., T = 2).

- The first subgame that we will analyze is the one that the players encounter after having played (e_1^1 = 0, e_2^1 = 0) in T = 1:

[Game tree: player 1 chooses e1 ∈ {1, 0}; player 2 then chooses e2 ∈ {1, 0} without observing player 1's move; the discounted period-2 payoffs are (δ, δ), (−δ, 2δ), (2δ, −δ), and (0, 0).]


The Nash equilibria can be seen by writing out the normal form of the game.

Normal Form of Extensive Form

        e2 = 1    e2 = 0
e1 = 1  δ, δ      −δ, 2δ
e1 = 0  2δ, −δ    0, 0
- This game has a unique Nash equilibrium, in which the players play (e_1^2 = 0, e_2^2 = 0)

- Therefore, after having observed (e_1^1 = 0, e_2^1 = 0) in the first period, both players will play (e_1^2 = 0, e_2^2 = 0) in period 2
Consider the subgame following a play of (e_1^1 = 1, e_2^1 = 0) in the first period. The extensive form of this subgame is given by:

[Game tree: player 1 chooses e1 ∈ {1, 0}; player 2 then chooses e2 ∈ {1, 0} without observing player 1's move; the total payoffs (sunk period-1 payoff plus discounted period-2 payoff) are (−1 + δ, 2 + δ), (−1 − δ, 2 + 2δ), (−1 + 2δ, 2 − δ), and (−1, 2).]

The normal form of this subgame can be seen in the table:

Normal Form of Extensive Form

        e2 = 1            e2 = 0
e1 = 1  −1 + δ, 2 + δ     −1 − δ, 2 + 2δ
e1 = 0  −1 + 2δ, 2 − δ    −1, 2
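These entries can be reproduced mechanically; the following illustrative sketch (with δ = 0.9 as an arbitrary example) computes each cell as the sunk period-1 payoff plus δ times the stage payoff:

```python
# Illustrative sketch: normal form of the period-2 subgame after
# (e1 = 1, e2 = 0), built as sunk period-1 payoff + delta * stage payoff.
delta = 0.9
sunk = (2 * 0 - 1, 2 * 1 - 0)  # period-1 payoffs after (1, 0): (-1, 2)

for e1 in (1, 0):
    row = []
    for e2 in (1, 0):
        u1 = sunk[0] + delta * (2 * e2 - e1)
        u2 = sunk[1] + delta * (2 * e1 - e2)
        row.append((round(u1, 2), round(u2, 2)))
    print(row)
# [(-0.1, 2.9), (-1.9, 3.8)]  i.e. (-1+d, 2+d),  (-1-d, 2+2d)
# [(0.8, 1.1),  (-1.0, 2.0)]  i.e. (-1+2d, 2-d), (-1, 2)
```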
- (e1 = 0, e2 = 0) is the unique Nash equilibrium

- In any SPNE, (e_1^2 = 0, e_2^2 = 0) must be played after observing (e_1^1 = 1, e_2^1 = 0)
- We can go through the remaining smaller subgames, after the observation of (e_1^1 = 0, e_2^1 = 1) and after the observation of (e_1^1 = 1, e_2^1 = 1)

- We will reach the same conclusion in each of these scenarios: (e_1^2 = 0, e_2^2 = 0) must be played in each of these subgames

- Regardless of the observed action profile, (0, 0) is played in period 2

- Why is this the case?

- The idea is that payoffs that have accrued in period 1 are essentially sunk, and have no influence on incentives in period 2
To see this, consider the normal form representation of the subgame after the observation of (e_1^1 = 1, e_2^1 = 0):

Normal Form of Extensive Form

        e2 = 1            e2 = 0
e1 = 1  −1 + δ, 2 + δ     −1 − δ, 2 + 2δ
e1 = 0  −1 + 2δ, 2 − δ    −1, 2
- We can subtract off the payoff that player 1 received in period 1 (here −1) and divide player 1's remaining payoffs through by δ

- We can do the same thing for player 2's payoffs (subtracting 2) and obtain the following payoff matrix

Normal Form of Extensive Form

        e2 = 1    e2 = 0
e1 = 1  1, 1      −1, 2
e1 = 0  2, −1     0, 0
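The following sketch (illustrative; δ = 0.7 is an arbitrary choice) checks that this affine transformation leaves best responses, and hence the Nash equilibria, unchanged:

```python
# Illustrative sketch: subtracting a constant from a player's payoffs and
# dividing by delta > 0 does not change that player's best responses.
delta = 0.7

# Player 1's payoffs in the subgame after (e1 = 1, e2 = 0), keyed by (e1, e2).
u1 = {(1, 1): -1 + delta, (1, 0): -1 - delta,
      (0, 1): -1 + 2 * delta, (0, 0): -1.0}

# Transformed payoffs: subtract the sunk period-1 payoff (-1), divide by delta.
v1 = {a: (p + 1) / delta for a, p in u1.items()}

for e2 in (1, 0):
    br_u = max((1, 0), key=lambda e1: u1[(e1, e2)])
    br_v = max((1, 0), key=lambda e1: v1[(e1, e2)])
    print(e2, br_u, br_v)  # identical best responses under both payoff scales
```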
- We’ve just performed affine transformations of each player’s utility function

- The resulting payoff matrix is strategically equivalent to the original normal form

- Thus the set of Nash equilibria remains unchanged after these transformations

- This normal form is just the original prisoner’s dilemma

- This will be true no matter the action profile played in period 1
- So what have we learned?

- After any history, the normal form of the final-period subgame is strategically the same as the original prisoner’s dilemma

- Both players play (e_1^2 = 0, e_2^2 = 0) at every information set in the last period
- Now let us see what must be played in the first period by the two players

- Both players anticipate that (e_1^2 = 0, e_2^2 = 0) will be played after any chosen action profile in the first period

- Since the continuation payoffs are (0, 0), we can simplify the extensive form game to the following:

[Game tree: player 1 chooses e1 ∈ {1, 0}; player 2 then chooses e2 ∈ {1, 0} without observing player 1's move; the payoffs are (1, 1), (−1, 2), (2, −1), and (0, 0).]


If we draw the normal form of this game, then we get:

Normal Form of Extensive Form

        e2 = 1    e2 = 0
e1 = 1  1, 1      −1, 2
e1 = 0  2, −1     0, 0

The unique Nash equilibrium of the above normal form game is (e_1^1 = 0, e_2^1 = 0).
Therefore the unique SPNE is: each player i plays e_i^1 = 0 in period 1, and plays e_i^2 = 0 in period 2 after each of the four possible period-1 action profiles.

In other words, both players always shirk.


- Here the unique SPNE requires all players to play e_i = 0 at all periods and all information sets

- Thus, the equilibrium outcome is simply the repetition of the unique NE of the stage game

- This holds more generally when the stage game has a unique NE

- Whenever the stage game has a unique NE, the only SPNE of a finite-horizon repeated game with that stage game is the repetition of the stage game NE
Theorem
Suppose that the stage game G has exactly one NE, (a_1^*, a_2^*, ..., a_n^*). Then for any δ ∈ (0, 1] and any T, the T-times repeated game has a unique SPNE in which every player i plays a_i^* at all information sets.
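To illustrate the theorem’s logic, here is a minimal backward-induction sketch for the work/shirk game with T = 2 (δ = 0.9 is arbitrary):

```python
# Illustrative sketch: backward induction on (G, 2) for the work/shirk game.
# In period 2 every subgame is the stage game itself; in period 1 we fold
# the (history-independent) continuation payoffs into the stage payoffs.

def u(e1, e2):
    return (2 * e2 - e1, 2 * e1 - e2)  # (player 1, player 2) stage payoffs

actions = (1, 0)
delta = 0.9

def pure_nash(payoff):
    """Pure NE of a 2x2 game, given payoff(e1, e2) -> (u1, u2)."""
    return [(e1, e2) for e1 in actions for e2 in actions
            if payoff(e1, e2)[0] == max(payoff(d, e2)[0] for d in actions)
            and payoff(e1, e2)[1] == max(payoff(e1, d)[1] for d in actions)]

# Period 2: the same stage NE is played after every history.
cont = u(*pure_nash(u)[0])  # continuation payoffs: (0, 0)

# Period 1: total payoff = stage payoff + delta * continuation payoff.
def total(e1, e2):
    u1, u2 = u(e1, e2)
    return (u1 + delta * cont[0], u2 + delta * cont[1])

print(pure_nash(u), pure_nash(total))  # [(0, 0)] [(0, 0)]: shirk everywhere
```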
- The basic idea of the proof of this proposition is exactly the same as what we saw in the repeated prisoner’s dilemma

- All past payoffs are sunk

- In the last period, the incentives of all players are exactly the same as if the game were being played once

- Thus all players must play the stage game Nash equilibrium action regardless of the history of play up to that point

- But then we can induct

- Knowing that the stage game Nash equilibrium is going to be played tomorrow at any information set, we can ignore the past payoffs

- We concentrate just on the payoffs in the future. Thus in period T − 1, player i simply wants to maximize:

  max_{a_i ∈ A_i} δ^{T−2} u_i(a_i, a_{−i}^{T−1}) + δ^{T−1} u_i(a^*)
- What player i plays today has no consequences for what happens in period T, since we saw that all players will play a^* in period T no matter what happens in period T − 1

- The term δ^{T−1} u_i(a^*) is therefore a constant that does not depend on a_i, and the positive factor δ^{T−2} does not affect the maximizer. So the maximization problem above is the same as:

  max_{a_i ∈ A_i} u_i(a_i, a_{−i}^{T−1})

- Thus again, for this to be a Nash equilibrium, we need a_1^{T−1} = a_1^*, ..., a_n^{T−1} = a_n^*

- Following exactly this induction, we can conclude that every player must play a_i^* at all times and all histories
