Game-Theory pt.1


Multi-Agent Decision-Making under Uncertainty
• “Game theory” is widely considered a misnomer for the field of “Multi-Agent
Decision-Making under Uncertainty.”
• Scenarios involving multiple agents where each makes decisions under uncertainty are
modeled using “games.”
• A game is a collection of the following

• In non-cooperative games, each player is focused on their own interests.


• In non-cooperative games, players may be in competition with each other, or their
interests may align, but there are no binding agreements made between players
regarding their actions or choices.
• They may make their choices simultaneously or make choices in sequence.
• In cooperative games, players arrive at a binding agreement regarding their actions.
• In these games, players are generally in full communication with each other and have
mechanisms to assure implementation of any agreements made.
• The common goal of a cooperative game is to find a socially optimal outcome: one
that collectively optimizes the outcomes for individual players.
• We will not be able to cover this topic, despite its many applications.
• Utility functions provide a mechanism for modeling player choices among the different
outcomes in a game. The two main classes of utility functions are ordinal and von
Neumann-Morgenstern utilities.
• We shall again not be able to study the theory behind these utility functions in detail.
However, it must be remembered that often constructing such utilities is the most
crucial part of such decision-making problems.
Strategic Games
• A strategic game consists of the following

• The combination of strategies chosen by all players is called the strategy profile.
• A strategy profile determines the outcome, which in turn determines each player’s utility.
• Strategic games are non-cooperative since there are no binding agreements for how players must act.
Prisoner’s Dilemma
• Two suspects are taken into custody by the police for a crime. They are put into separate rooms for
interrogation.
• Evidence against them is slim, so the police need one or both to confess to the crime.
• Therefore, each suspect is informed that if they confess to the crime and implicate the other as the
principal culprit, then the confessor will get the lightest possible sentence and the principal culprit
will get the heaviest possible sentence.
• They both will be convicted of a misdemeanor crime if neither confesses.
• If both confess, they each get moderate sentences as the police cannot identify the principal culprit.
• We will call the two players in this strategic game Row and Column as per game-theoretic
conventions.
• They each have two strategies available to them, Quiet and Confess.
• Assume that each suspect is primarily concerned about their own sentence and wants to minimize it.
• Utilities of each player are generally referred to as payoffs in game-theoretic texts.
• We use the utility function defined as 6 minus the number of years to be spent in prison, which
serves as a proxy for how many years of prison the suspect avoids.
• The table below lists each of the strategy profiles in the form (R, C) and the resulting outcome.

Strategy Profile      Outcome
(Quiet, Quiet)        Each suspect receives a three-year sentence.
(Quiet, Confess)      R receives a six-year sentence and C a one-year sentence.
(Confess, Quiet)      R receives a one-year sentence and C a six-year sentence.
(Confess, Confess)    Each suspect receives a five-year sentence.

• The table below provides the payoffs for each player.

Strategy Profile      R’s Payoff   C’s Payoff
(Quiet, Quiet)        3            3
(Quiet, Confess)      0            5
(Confess, Quiet)      5            0
(Confess, Confess)    1            1
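The payoffs follow mechanically from the sentences in the outcome table. The short Python sketch below recomputes them; the names `SENTENCES`, `payoff`, and `PAYOFFS` are illustrative choices and do not come from the slides.

```python
# Prison sentences in years for each strategy profile (R's strategy, C's strategy),
# copied from the outcome table in the text.
SENTENCES = {
    ("Quiet", "Quiet"): (3, 3),
    ("Quiet", "Confess"): (6, 1),
    ("Confess", "Quiet"): (1, 6),
    ("Confess", "Confess"): (5, 5),
}


def payoff(years: int) -> int:
    """Utility used in the text: 6 minus the number of years spent in prison."""
    return 6 - years


# Payoff pair (R's payoff, C's payoff) for every strategy profile.
PAYOFFS = {
    profile: (payoff(r_years), payoff(c_years))
    for profile, (r_years, c_years) in SENTENCES.items()
}

for profile, pair in PAYOFFS.items():
    print(profile, pair)
```

Any strictly decreasing function of the sentence length would rank the outcomes the same way; 6 minus the years is simply the choice made in the text.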
• Define 𝑠−𝑖 = (𝑠1 , … , 𝑠𝑖−1 , 𝑠𝑖+1 , … , 𝑠𝑛 ) to be the strategy profile 𝑠 with player 𝑖’s strategy
removed.
• Define 𝑢𝑖 (𝑡𝑖 , 𝑠−𝑖 ) to be the payoff of player 𝑖 for the strategy profile with 𝑠𝑖 removed and
replaced by 𝑡𝑖 .
• Definition. Player 𝑖’s strategy 𝑠𝑖 is a best response to the profile 𝑠−𝑖 of the other players’
strategies if 𝑢𝑖 (𝑠𝑖 , 𝑠−𝑖 ) ≥ 𝑢𝑖 (𝑡𝑖 , 𝑠−𝑖 ) for all other strategies 𝑡𝑖 ∈ 𝑆𝑖 , where 𝑆𝑖 denotes the set of all
strategies of player 𝑖.
• Confess is the best response strategy for R if C’s strategy is set to Quiet.
• Confess is also a best response for R if C’s strategy is set to Confess.
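The two best-response claims for R can be checked directly against the payoff table. This is a minimal sketch; the function name `best_responses_for_row` and the dictionary layout are assumptions made for illustration.

```python
# Payoff pairs (R's payoff, C's payoff) from the Prisoner's Dilemma payoff table.
PAYOFFS = {
    ("Quiet", "Quiet"): (3, 3),
    ("Quiet", "Confess"): (0, 5),
    ("Confess", "Quiet"): (5, 0),
    ("Confess", "Confess"): (1, 1),
}
STRATEGIES = ("Quiet", "Confess")


def best_responses_for_row(column_strategy):
    """Return every strategy of R whose payoff is maximal against column_strategy."""
    best = max(PAYOFFS[(r, column_strategy)][0] for r in STRATEGIES)
    return [r for r in STRATEGIES if PAYOFFS[(r, column_strategy)][0] == best]


for c in STRATEGIES:
    print(f"If C plays {c}, R's best responses are {best_responses_for_row(c)}")
```

The function returns a list because the definition uses ≥, so a player can have several best responses to the same profile; here there is exactly one in each case.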
• Definition. Player 𝑖’s strategy 𝑠𝑖 strongly dominates player 𝑖’s strategy 𝑡𝑖 if 𝑢𝑖 (𝑠𝑖 , 𝑠−𝑖 ) >
𝑢𝑖 (𝑡𝑖 , 𝑠−𝑖 ) for all strategy profiles 𝑠−𝑖 ∈ 𝑆−𝑖 = 𝑆1 × 𝑆2 × ⋯ × 𝑆𝑖−1 × 𝑆𝑖+1 × ⋯ × 𝑆𝑛 available to
the remaining players.
• Definition. Player 𝑖’s strategy 𝑠𝑖 dominates player 𝑖’s strategy 𝑡𝑖 if 𝑢𝑖 (𝑠𝑖 , 𝑠−𝑖 ) ≥ 𝑢𝑖 (𝑡𝑖 , 𝑠−𝑖 )
for all strategy profiles 𝑠−𝑖 ∈ 𝑆−𝑖 , and strict inequality holds for at least one strategy profile
𝑠−𝑖 ∈ 𝑆−𝑖 .
• The strategy Confess strongly dominates the strategy Quiet for R because Confess gives R a
strictly higher payoff whichever strategy C chooses: 5 > 3 if C chooses Quiet, and 1 > 0 if C
chooses Confess.
• Definition. A strategy is strongly dominant [resp., dominant] for player 𝑖 if it strongly
dominates [resp., dominates] all other strategies for player 𝑖.
• Definition. A strategy is dominated for player 𝑖 if some other strategy of player 𝑖 dominates it.
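• When strategy 𝑠𝑖 strongly dominates strategy 𝑡𝑖 , player 𝑖 should always select strategy 𝑠𝑖 over
strategy 𝑡𝑖 . When 𝑠𝑖 merely dominates 𝑡𝑖 , player 𝑖 should still select 𝑠𝑖 over 𝑡𝑖 , although the choice
makes no difference when the strategy selections by other players result in player 𝑖 obtaining the
same utility for either.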
• Confess strongly dominates Quiet for R and by symmetry, Confess also strongly dominates
Quiet for C.
• Thus, both players select Confess resulting in payoffs of 1 for each player, which we denote
with the payoff pair (1,1).
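Both dominance definitions translate into a comparison over every strategy of the opponent. The sketch below checks them on the Prisoner's Dilemma payoffs; the helper names (`utility`, `strongly_dominates`, `dominates`) and the player indices are illustrative assumptions.

```python
# Payoff pairs (R's payoff, C's payoff) from the Prisoner's Dilemma payoff table.
PAYOFFS = {
    ("Quiet", "Quiet"): (3, 3),
    ("Quiet", "Confess"): (0, 5),
    ("Confess", "Quiet"): (5, 0),
    ("Confess", "Confess"): (1, 1),
}
STRATEGIES = ("Quiet", "Confess")
ROW, COLUMN = 0, 1  # index of each player inside a payoff pair


def utility(player, own, other):
    """Payoff to `player` when they play `own` and the opponent plays `other`."""
    profile = (own, other) if player == ROW else (other, own)
    return PAYOFFS[profile][player]


def strongly_dominates(player, s, t):
    """s is strictly better than t against every strategy of the opponent."""
    return all(utility(player, s, o) > utility(player, t, o) for o in STRATEGIES)


def dominates(player, s, t):
    """s is never worse than t, and strictly better against at least one strategy."""
    diffs = [utility(player, s, o) - utility(player, t, o) for o in STRATEGIES]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


for player, name in ((ROW, "R"), (COLUMN, "C")):
    print(name, "Confess strongly dominates Quiet:",
          strongly_dominates(player, "Confess", "Quiet"))
```

Strong dominance implies dominance, so `dominates` also returns `True` for Confess over Quiet.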
• Knowing that the other suspect reasoned the same way, neither regrets nor second-guesses
their own decision to confess.
• Such a regret-free strategy profile is known as a Nash equilibrium.
• Definition. A strategy profile 𝑠 is a Nash equilibrium if 𝑢𝑗 (𝑠) ≥ 𝑢𝑗 (𝑡𝑗 , 𝑠−𝑗 ) for all
players j ∈ 𝑁 and all strategies 𝑡𝑗 ∈ 𝑆𝑗 available to that player.
• That is, 𝑠 is a Nash equilibrium if, given what the other players have chosen to do,
𝑠−𝑗 , no player 𝑗 can unilaterally improve their payoff by replacing their current
strategy, 𝑠𝑗 , with a new strategy, 𝑡𝑗 .
• Thus no player has regrets about their strategy selection in a Nash equilibrium.
• A solution concept is simply a formal rule for predicting how a game will be played.
• We introduced two solution concepts above, viz. dominance and Nash equilibria.
• Dominance asks which strategy gives a player the higher payoff whatever the other
players do, whereas Nash equilibrium asks whether any player has an incentive to
switch strategies when the other players’ strategies are held fixed.
• A dominant strategy may or may not exist, but in his famous paper Nash proved that
a Nash equilibrium, in pure or mixed strategies, always exists for a finite game (one
with a finite number of players and a finite strategy space).
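For a small game, the Nash equilibrium definition can be checked by brute force over all strategy profiles. The sketch below covers pure strategies only (mixed equilibria are outside its scope); the function name `is_nash` is an illustrative assumption.

```python
from itertools import product

# Payoff pairs (R's payoff, C's payoff) from the Prisoner's Dilemma payoff table.
PAYOFFS = {
    ("Quiet", "Quiet"): (3, 3),
    ("Quiet", "Confess"): (0, 5),
    ("Confess", "Quiet"): (5, 0),
    ("Confess", "Confess"): (1, 1),
}
STRATEGIES = ("Quiet", "Confess")


def is_nash(profile):
    """True if no player can gain by unilaterally changing their own strategy."""
    for player in (0, 1):
        for alternative in STRATEGIES:
            deviated = list(profile)
            deviated[player] = alternative
            if PAYOFFS[tuple(deviated)][player] > PAYOFFS[profile][player]:
                return False
    return True


pure_nash = [p for p in product(STRATEGIES, repeat=2) if is_nash(p)]
print(pure_nash)
```

For the Prisoner's Dilemma the list contains only (Confess, Confess), in line with the argument in the text.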
Prudence and Efficiency
• Now we look at a third solution concept.
• A player might decide to think prudentially and so chooses a strategy by looking at
the worst thing that can happen with each strategy choice.
• A prudential player chooses the strategy that makes the worst case as “least bad” as
possible.
• So the player chooses a strategy that maximizes their minimum payoff with respect
to the strategy choices of other players.
• For example, if R chooses Quiet, the worst that can happen is a payoff of 0 when C
chooses Confess.
• On the other hand, R’s worst payoff if R chooses Confess is 1.
• This suggests that R should choose Confess if R is strategically risk averse (=
prudential).
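The worst-case reasoning above is a max-min computation: take the minimum payoff of each strategy, then pick the strategy whose minimum is largest. A minimal sketch, with illustrative helper names (`utility`, `worst_case`, `prudential`):

```python
# Payoff pairs (R's payoff, C's payoff) from the Prisoner's Dilemma payoff table.
PAYOFFS = {
    ("Quiet", "Quiet"): (3, 3),
    ("Quiet", "Confess"): (0, 5),
    ("Confess", "Quiet"): (5, 0),
    ("Confess", "Confess"): (1, 1),
}
STRATEGIES = ("Quiet", "Confess")
ROW, COLUMN = 0, 1  # index of each player inside a payoff pair


def utility(player, own, other):
    """Payoff to `player` when they play `own` and the opponent plays `other`."""
    profile = (own, other) if player == ROW else (other, own)
    return PAYOFFS[profile][player]


def worst_case(player, own):
    """Smallest payoff `player` can receive when playing `own`."""
    return min(utility(player, own, other) for other in STRATEGIES)


def prudential(player):
    """Return (strategies that maximize the worst case, that best worst-case payoff)."""
    best = max(worst_case(player, s) for s in STRATEGIES)
    return [s for s in STRATEGIES if worst_case(player, s) == best], best


print("R:", prudential(ROW))
print("C:", prudential(COLUMN))
```

For R, the worst cases are 0 for Quiet and 1 for Confess, so the sketch selects Confess with a best worst-case payoff of 1.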
Prudence
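• Definition. Player 𝑖’s strategy 𝑠𝑖 is prudential if it maximizes player 𝑖’s worst-case payoff, that is, if
$\min_{s_{-i} \in S_{-i}} u_i(s_i, s_{-i}) = \max_{t_i \in S_i} \min_{s_{-i} \in S_{-i}} u_i(t_i, s_{-i})$.
• This common value, the best worst-case payoff, is called the security level for player 𝑖.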
• Confess is the unique dominant and unique prudential strategy for each player. Verify!
• Although (Confess, Confess) is the unique Nash equilibrium, the two players in the
Prisoner’s Dilemma strategy game would be better off if they both chose Quiet,
resulting in the payoff pair (3,3) instead of the payoff pair (1,1).
• Thus, the three solution methods do not always yield the best overall payoff for each
player.
• Therefore, we introduce the fourth solution concept.
Efficiency
• Definition. A strategy profile 𝑠, and its associated outcome 𝑜, are efficient if there
does not exist a strategy profile t ∈ 𝑆 such that 𝑢𝑗 (𝑡) ≥ 𝑢𝑗 (𝑠) for all players 𝑗, with at
least one of the inequalities being strict.
• So a strategy profile is efficient if we cannot find another strategy profile that at least
maintains the utility for all players, while strictly improving the utility for at least one
player.
• For the Prisoner’s Dilemma strategic game, the strategy profile (Confess, Confess) is
not efficient because both players obtain larger utilities with (Quiet, Quiet).
• Each of the other three strategy profiles is efficient because it is impossible to make
a change without reducing at least one of the players’ payoffs.
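The efficiency definition can be tested by comparing each strategy profile against every other one. The sketch below does this for the Prisoner's Dilemma payoffs; the function name `is_efficient` is an illustrative assumption.

```python
# Payoff pairs (R's payoff, C's payoff) from the Prisoner's Dilemma payoff table.
PAYOFFS = {
    ("Quiet", "Quiet"): (3, 3),
    ("Quiet", "Confess"): (0, 5),
    ("Confess", "Quiet"): (5, 0),
    ("Confess", "Confess"): (1, 1),
}


def is_efficient(profile):
    """True if no other profile is at least as good for everyone and better for someone."""
    current = PAYOFFS[profile]
    for candidate in PAYOFFS.values():
        no_one_worse = all(c >= u for c, u in zip(candidate, current))
        someone_better = any(c > u for c, u in zip(candidate, current))
        if no_one_worse and someone_better:
            return False
    return True


for profile in PAYOFFS:
    print(profile, "efficient" if is_efficient(profile) else "not efficient")
```

Only (Confess, Confess) fails the test, because (Quiet, Quiet) raises both payoffs from 1 to 3.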
• As in the Prisoner’s Dilemma strategic game, players may not have an incentive to
choose a strategy that is part of an efficient strategy profile.
• However, there is no such dilemma when Nash equilibria are also efficient.
• In the Prisoner’s Dilemma strategic game, there is a tension between
1. choosing the dominant strategy (Confess), which will always yield a higher
payoff regardless of the other player’s choice, and
2. knowing that a better outcome might be possible if both players choose their
dominated strategy (Quiet).
• This tension is what puts the dilemma into the Prisoner’s Dilemma: each player
selecting the logical, rational strategy does not lead to an efficient outcome!
• One way to resolve this dilemma would be for the players to enter into a binding
agreement to stay quiet.
• But if we introduce binding agreements, then we no longer have a strategic game,
and the Nash equilibrium solution concept does not apply.
Office Scenario
• Suppose an office has a shared fund for supplies, and the employees voluntarily
contribute to a pool of money to replenish those supplies.
• Each employee who uses the fund must decide whether or not to contribute to the
pool.
• Player strategies are Contribute or Not Contribute.
• Not Contribute is the strongly dominant strategy: whatever the other employees do, a
player who does not contribute keeps their money and so receives a higher payoff
than they would by contributing.
• But if everyone selects Not Contribute then there will be no funds.
• So this is another example of a prisoner’s dilemma scenario.
• A multi-agent decision-making scenario is said to be a prisoner’s dilemma scenario if
❖ it can be modeled as a strategic game,
❖ there is a single strategy for each player that strongly dominates all of that player’s
other strategies,
❖ but all players would receive a higher payoff if they all chose a specific dominated
strategy rather than the dominant one.
• Since the mutual benefit result requires all players to cooperate by choosing the
dominated strategy, it is often called the Cooperate strategy.
• Since there is always an incentive for any individual player to switch their choice to
the dominant strategy, the dominant strategy is often called the Defect strategy.
• In the original prisoner’s dilemma scenario, Quiet is the Cooperate strategy.
• In the Office scenario, Contribute is the Cooperate strategy.
• In the original prisoner’s dilemma scenario, Confess is the Defect strategy.
• In the Office scenario, Not Contribute is the Defect strategy.
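The slides give no numbers for the Office scenario, so the sketch below invents some purely for illustration: four employees, a contribution cost of 1, and a benefit of 0.5 to every employee for each contribution made. All three values, and the payoff formula itself, are assumptions rather than part of the original scenario. Under them, the conditions for a prisoner's dilemma scenario can be checked by enumeration.

```python
from itertools import product

# Hypothetical numbers, chosen only so that the example is concrete.
N_EMPLOYEES = 4
COST = 1.0      # assumed cost to an employee of contributing
BENEFIT = 0.5   # assumed benefit to EVERY employee per contribution made


def payoff(i, profile):
    """Payoff to employee i; profile is a tuple of booleans, True = Contribute."""
    return BENEFIT * sum(profile) - (COST if profile[i] else 0.0)


def not_contribute_strongly_dominates(i):
    """Check Not Contribute against Contribute for every choice of the others."""
    for others in product((False, True), repeat=N_EMPLOYEES - 1):
        contributing = others[:i] + (True,) + others[i:]
        not_contributing = others[:i] + (False,) + others[i:]
        if payoff(i, not_contributing) <= payoff(i, contributing):
            return False
    return True


all_contribute = (True,) * N_EMPLOYEES
none_contribute = (False,) * N_EMPLOYEES

defect_is_dominant = all(
    not_contribute_strongly_dominates(i) for i in range(N_EMPLOYEES)
)
everyone_better_off = all(
    payoff(i, all_contribute) > payoff(i, none_contribute)
    for i in range(N_EMPLOYEES)
)
print(defect_is_dominant, everyone_better_off)
```

With these assumed values, contributing costs an employee more than it returns to them personally (1 versus 0.5), yet four contributions return 2 to each employee. That combination is what makes Not Contribute dominant while universal contribution is better for everyone.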
• Assume a prisoner’s dilemma scenario involving exactly two players and each player
has exactly two strategies such that their payoffs are as described below.
• The payoff to player 𝑖 for cooperating when the other player defects is 𝑆𝑖 .
• The payoff to player 𝑖 when both players defect is 𝑃𝑖 .
• The payoff to player 𝑖 when both players cooperate is 𝑅𝑖 .
• The payoff to player 𝑖 for defecting when the other player cooperates, which is what
entices player 𝑖 to defect, is 𝑇𝑖 .
• For each player 𝑖, these payoffs satisfy 𝑇𝑖 > 𝑅𝑖 > 𝑃𝑖 > 𝑆𝑖 .
• However, there need not be any relationship between the two sequences of payoffs for
the two players.
• Given this ordering of the payoffs, let us verify that Defect is the strongly dominant
strategy for each player.
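• To do this, it is easier to write the payoffs as a matrix, with player 1 choosing the row,
player 2 choosing the column, and each cell listing (player 1’s payoff, player 2’s payoff).

                      Player 2: Cooperate   Player 2: Defect
Player 1: Cooperate   (𝑅1 , 𝑅2 )            (𝑆1 , 𝑇2 )
Player 1: Defect      (𝑇1 , 𝑆2 )            (𝑃1 , 𝑃2 )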
• Fix a player, say player 1. We must show that Defect gives player 1 a strictly higher
payoff than Cooperate against every possible combination of the other players’ strategies.
• Luckily, here there is only one other player to consider.
• If player 2 chooses Cooperate, player 1 receives 𝑇1 from Defect and 𝑅1 from Cooperate.
Since 𝑇1 > 𝑅1 , Defect is better than Cooperate for player 1.
• If player 2 chooses Defect, player 1 receives 𝑃1 from Defect and 𝑆1 from Cooperate.
Since 𝑃1 > 𝑆1 , Defect is again better than Cooperate for player 1.
• We repeat this same exercise for player 2, and then establish that “Defect is the
strongly dominant strategy for each player.”
• We will also show that Defect is the prudential strategy for each player.
• Fix a player, say player 1.
• Suppose player 1 chooses Defect. The possible payoffs are 𝑇1 (if player 2 cooperates)
and 𝑃1 (if player 2 defects). By the payoff ordering 𝑇1 > 𝑃1 , so the minimum utility
player 1 can receive is 𝑃1 .
• Suppose instead player 1 chooses Cooperate. The possible payoffs are 𝑅1 and 𝑆1 . By
the payoff ordering 𝑅1 > 𝑆1 , so the minimum utility player 1 can receive is 𝑆1 .
• Since 𝑃1 > 𝑆1 , Defect is the prudential strategy for player 1. Similar reasoning holds
for player 2.
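• The strategy profile (Defect, Defect) is a Nash equilibrium because neither player can
gain by switching while the other player keeps their strategy fixed. It is the unique
Nash equilibrium because in every other profile at least one player is cooperating and
would gain by switching to the strongly dominant strategy, Defect.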
• But observe that each player would be better off if the strategy profile (Cooperate,
Cooperate) were chosen instead of the strategy profile (Defect, Defect).
• Thus, Nash equilibrium is not the final word on the most intelligent strategy to adopt.
• (Defect, Defect) is the only strategy profile that is not efficient.
• All other strategy profiles can be easily verified to be efficient.
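The claims about the general two-player scenario can be spot-checked on one concrete instance. The numbers below are hypothetical: they were chosen only to respect the ordering used in the argument above (temptation above mutual cooperation above mutual defection above the cooperator's payoff against a defector) and to differ between the two players. A numerical check on one instance illustrates the argument but does not prove it.

```python
from itertools import product

# Hypothetical payoffs, one entry per player, each with T > R > P > S.
T = (5, 10)  # defect while the other cooperates
R = (3, 6)   # both cooperate
P = (1, 2)   # both defect
S = (0, 1)   # cooperate while the other defects

C, D = "Cooperate", "Defect"
STRATEGIES = (C, D)
PAYOFFS = {
    (C, C): (R[0], R[1]),
    (C, D): (S[0], T[1]),
    (D, C): (T[0], S[1]),
    (D, D): (P[0], P[1]),
}


def utility(player, own, other):
    """Payoff to `player` (0 or 1) playing `own` against `other`."""
    profile = (own, other) if player == 0 else (other, own)
    return PAYOFFS[profile][player]


def defect_strongly_dominant(player):
    return all(utility(player, D, o) > utility(player, C, o) for o in STRATEGIES)


def prudential(player):
    """Strategy with the largest worst-case payoff, and that payoff."""
    worst = {s: min(utility(player, s, o) for o in STRATEGIES) for s in STRATEGIES}
    best = max(worst, key=worst.get)
    return best, worst[best]


def is_nash(profile):
    for player in (0, 1):
        for alternative in STRATEGIES:
            deviated = list(profile)
            deviated[player] = alternative
            if PAYOFFS[tuple(deviated)][player] > PAYOFFS[profile][player]:
                return False
    return True


def is_efficient(profile):
    current = PAYOFFS[profile]
    for candidate in PAYOFFS.values():
        if all(c >= u for c, u in zip(candidate, current)) and any(
            c > u for c, u in zip(candidate, current)
        ):
            return False
    return True


nash = [p for p in product(STRATEGIES, repeat=2) if is_nash(p)]
inefficient = [p for p in PAYOFFS if not is_efficient(p)]
print(nash, inefficient)
```

In this instance (Defect, Defect) is both the only pure Nash equilibrium and the only inefficient profile, and each player's best worst-case payoff equals their mutual-defection payoff.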
