
Game Theory Lecture #14

Outline:

• Multiagent learning
• Regret matching
• Fictitious play
Single Agent Learning

• Setup:
– Two players: Player 1 vs. Nature
– Action sets: A1 and AN
– Payoffs: U : A1 × AN → R

                    Nature
                Rain    No Rain
P1  Umbrella      1        0
    No umbrella   0        1

            Player 1’s Payoff

• Player repeatedly interacts with nature


– Player’s action day t: a1 (t)
– Nature’s action day t: aN (t)
– Payoff day t: U (a1 (t), aN (t))
• Goal: Implement a strategy that provides desirable guarantees on average performance
• Case 1: Stationary environment

– Nature’s choice is made according to a non-adaptive (fixed) probability distribution pN ∈ ∆(AN)


– Theory available to optimize average performance, e.g., reinforcement learning

• Case 2: Non-stationary environment

– Nature’s choice is made according to an adaptive probability distribution, i.e., possibly pN (t) ≠ pN (t − 1)


– In general, pN (t) = f (a1 (0), ..., a1 (t − 1), aN (0), ..., aN (t − 1))
– One choice: aN (t) = βN (a1 (t − 1)), i.e., nature best responds to the player’s previous action (assume a zero-sum game)

• Question: Is a player’s environment stationary or non-stationary in a game?

Single agent learning (cont)

• Challenge: Hard to predict what nature is going to do


• Previous direction: Optimize worst-case payoffs (e.g., security strategies)
• Problem: Derived strategies might be highly inefficient given the actual behavior of nature
• Example:

                    Nature
                Rain    No Rain    Thunder
    Umbrella      1        0          0
P1  No umbrella   0        1          0
    Jacket       0.1      0.1        0.1

            Player 1’s Payoff
– What is security strategy?
– What is security level?
– How would answers change if there was never any thunder?

• Fact: Security strategies and values can be highly influenced by “rare” actions
• Are there “online” policies that can provide potentially better performance guarantees?
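• A minimal Python sketch (an illustration, assuming numpy and scipy are available) of how the security strategy and security level of a payoff matrix like the one above can be computed by linear programming:

import numpy as np
from scipy.optimize import linprog

# Rows: Umbrella, No umbrella, Jacket.  Columns: Rain, No Rain, Thunder.
U = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.1, 0.1, 0.1]])

n_rows, n_cols = U.shape
# Decision variables x = (p_1, ..., p_n, v); maximize v  <=>  minimize -v.
c = np.zeros(n_rows + 1)
c[-1] = -1.0
# For every column j require  v - p^T U[:, j] <= 0, i.e. p guarantees at least v.
A_ub = np.hstack([-U.T, np.ones((n_cols, 1))])
b_ub = np.zeros(n_cols)
# The mixed strategy must sum to one; v is a free variable.
A_eq = np.append(np.ones(n_rows), 0.0).reshape(1, -1)
b_eq = np.array([1.0])
bounds = [(0.0, 1.0)] * n_rows + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("security strategy:", res.x[:n_rows])  # maximin mixed strategy over rows
print("security level:   ", res.x[-1])       # worst-case expected payoff

Dropping the Thunder column and re-running the same program shows how much a single “rare” column can move the security strategy and value.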

What about regret?

• New direction: Can a player optimize “what if” scenarios?


• Definition: Player’s average payoff at day t

  \bar{U}(t) = \frac{1}{t} \sum_{\tau=1}^{t} U(a_1(\tau), a_N(\tau))

• Definition: Player’s perceived average payoff at day t had the player committed to the fixed action a_1 while nature’s actions were unchanged

  \bar{v}^{a_1}(t) = \frac{1}{t} \sum_{\tau=1}^{t} U(a_1, a_N(\tau))

• Definition: Player’s regret at day t for not having used action a_1

  \bar{R}^{a_1}(t) = \bar{v}^{a_1}(t) - \bar{U}(t)

• Example:

Day                 1    2    3    4    5    6   ...
Player’s Decision   NU   U    NU   U    NU   NU  ...
Nature’s Decision   R    NR   R    R    NR   R   ...
Payoff              0    0    0    1    1    0   ...

– Ū(6)?
– v̄^U(6)?
– v̄^NU(6)?
– R̄^U(6)?
– R̄^NU(6)?
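• A minimal Python sketch of how these quantities can be computed for the trajectory above; the payoff function below encodes the umbrella game from the first page:

def payoff(a1, aN):
    # 1 if the player matches the weather (umbrella when it rains,
    # no umbrella when it does not), 0 otherwise.
    return 1.0 if (a1 == "U") == (aN == "R") else 0.0

player = ["NU", "U", "NU", "U", "NU", "NU"]   # player's decisions, days 1..6
nature = ["R", "NR", "R", "R", "NR", "R"]     # nature's decisions, days 1..6
t = len(player)

# Average payoff actually received.
U_bar = sum(payoff(a1, aN) for a1, aN in zip(player, nature)) / t
# Perceived average payoff for each fixed action against nature's sequence.
v_bar = {a1: sum(payoff(a1, aN) for aN in nature) / t for a1 in ("U", "NU")}
# Regret for each action.
R_bar = {a1: v_bar[a1] - U_bar for a1 in ("U", "NU")}

print("U_bar(6) =", U_bar)
print("v_bar(6) =", v_bar)
print("R_bar(6) =", R_bar)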

Regret Matching

• Positive regret = Player could have done something better in hindsight


• Q: Is it possible to make positive regret vanish asymptotically “irrespective” of nature?
• Consider the strategy Regret Matching: At day t play strategy p(t) ∈ ∆(A1), updated according to

  p^{U}(t+1) = \frac{[\bar{R}^{U}(t)]_+}{[\bar{R}^{U}(t)]_+ + [\bar{R}^{NU}(t)]_+}

  p^{NU}(t+1) = \frac{[\bar{R}^{NU}(t)]_+}{[\bar{R}^{U}(t)]_+ + [\bar{R}^{NU}(t)]_+}

• Notation: [·]+ is projection to positive orthant, i.e., [x]+ = max{x, 0}


• Strategy generalizes to more than two actions
• Fact: Positive regret asymptotically vanishes irrespective of nature

  [\bar{R}^{U}(t)]_+ \to 0

  [\bar{R}^{NU}(t)]_+ \to 0

• Example revisited:

Day                 1    2    3    4    5    6   ...
Player’s Decision   NU   U    NU   U    NU   NU  ...
Nature’s Decision   R    NR   R    R    NR   R   ...
Payoff              0    0    0    1    1    0   ...
– Regret matching strategy day 2?
– Regret matching strategy day 3?
– Regret matching strategy day 4?
– Regret matching strategy day 5?
– Regret matching strategy day 6?
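• A minimal Python sketch of this update rule replayed along the trajectory above; how to act when no regret is positive is left open in the notes, so defaulting to a uniform strategy in that case is an assumption:

def payoff(a1, aN):
    # Umbrella game: 1 if the player matches the weather, 0 otherwise.
    return 1.0 if (a1 == "U") == (aN == "R") else 0.0

player = ["NU", "U", "NU", "U", "NU", "NU"]   # player's decisions, days 1..6
nature = ["R", "NR", "R", "R", "NR", "R"]     # nature's decisions, days 1..6

for t in range(1, len(player) + 1):
    # Average payoff and positive part of the regret for each action.
    U_bar = sum(payoff(player[s], nature[s]) for s in range(t)) / t
    pos = {a1: max(sum(payoff(a1, nature[s]) for s in range(t)) / t - U_bar, 0.0)
           for a1 in ("U", "NU")}
    total = pos["U"] + pos["NU"]
    if total > 0:
        p_next = {a1: pos[a1] / total for a1 in ("U", "NU")}
    else:
        p_next = {"U": 0.5, "NU": 0.5}   # assumption: uniform when no positive regret
    print(f"regret matching strategy for day {t + 1}: {p_next}")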

Learning in games

• Consider the following one-shot game


– Players N
– Actions Ai
– Utility functions Ui : A → R
• Consider a repeated version of the above one-shot game where at each time t ∈ {1, 2, ...},
each player i ∈ N simultaneously
– Selects a strategy pi (t) ∈ ∆(Ai )
– Selects an action ai (t) randomly according to strategy pi (t)
– Receives utility Ui (ai (t), a−i (t))
– Each player updates strategy using available information

pi (t + 1) = f (a(0), a(1), ..., a(t); Ui )

• The strategy update function f (·) is referred to as the learning rule

– Ex: Cournot adjustment process

• Concern: How much information do players have access to?

– Structural form of utility function, i.e., Ui (·)?


– Action of other players, i.e., a−i (t)?
– Perceived reward for alternative actions, i.e., Ui (ai , a−i (t)) for any ai
– Utility received, Ui (a(t))

• Informational restrictions limit the class of admissible learning rules


• Goal: Provide asymptotic guarantees if all players follow a specific f (·)
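• For concreteness, a minimal Python sketch of this repeated-game protocol; the uniform rule is only a placeholder for a learning rule f(·), and all function names here are illustrative:

import random

def uniform_rule(history, actions, utility):
    # Placeholder learning rule: ignore the history and play uniformly at random.
    return {a: 1.0 / len(actions) for a in actions}

def repeated_play(actions, utilities, rules, T, seed=0):
    # actions[i]: action set of player i, utilities[i]: joint action -> payoff,
    # rules[i]: learning rule mapping (history, actions, utility) to a strategy.
    rng = random.Random(seed)
    history = []
    for t in range(T):
        # Each player selects a strategy from the observed history, then samples an action.
        strategies = [rules[i](history, actions[i], utilities[i])
                      for i in range(len(actions))]
        joint = tuple(rng.choices(list(p.keys()), weights=list(p.values()))[0]
                      for p in strategies)
        history.append(joint)   # players then receive Ui(a(t)) and update
    return history

# Example: a 2-player coordination game (both get 1 when their actions match).
U_coord = {("A", "A"): 1, ("A", "B"): 0, ("B", "A"): 0, ("B", "B"): 1}
hist = repeated_play([["A", "B"], ["A", "B"]],
                     [lambda a: U_coord[a], lambda a: U_coord[a]],
                     [uniform_rule, uniform_rule], T=5)
print(hist)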

Regret matching

• Consider the learning rule f(·) where

  p_i^{a_i}(t+1) = \frac{[\bar{R}_i^{a_i}(t)]_+}{\sum_{\tilde{a}_i \in A_i} [\bar{R}_i^{\tilde{a}_i}(t)]_+}

– p_i^{a_i}(t+1) = Probability player i plays action a_i at time t+1
– \bar{R}_i^{a_i}(t) = Regret of player i for action a_i at time t
• Fact: Max regret of all players goes to 0 (think of other players as “nature”)

  [\bar{R}_i^{a_i}(t)]_+ \to 0
• Result restated: The behavior converges to a “no-regret” point
• Question: Where are we? Is this a NE?
• Rewrite regret in terms of the empirical frequency of joint play z(t) ∈ ∆(A):

  \bar{U}_i(t) = \frac{1}{t} \sum_{\tau=1}^{t} U_i(a(\tau)) = U_i(z(t))

  \bar{v}_i^{a_i}(t) = \frac{1}{t} \sum_{\tau=1}^{t} U_i(a_i, a_{-i}(\tau)) = U_i(a_i, z_{-i}(t))

  \bar{R}_i^{a_i}(t) = \bar{v}_i^{a_i}(t) - \bar{U}_i(t) = U_i(a_i, z_{-i}(t)) - U_i(z(t))

• Characteristic of a no-regret point:

  \bar{R}_i^{a_i}(t) \le 0 \iff U_i(a_i, z_{-i}(t)) \le U_i(z(t))

• No-regret point restated: For any player i and action a_i,

  U_i(a_i, z_{-i}(t)) \le U_i(z(t))
• No-regret point = Coarse correlated equilibrium (slightly weaker notion than correlated
equilibrium)
• Slightly modified (and more complex) version of regret matching ensures convergence to
correlated equilibrium.
• Theorem: If all players follow the regret matching strategy, then the empirical frequency of play converges to the set of coarse correlated equilibria.
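• A minimal Python sketch of the no-regret check behind this theorem: given a history of joint play, it estimates Ui(z(t)) as the realized average and Ui(ai, z−i(t)) as the average against the other players’ observed actions, and reports player i’s largest regret (the matching-pennies history at the end is only an illustration):

def max_regret(history, actions_i, utility_i, i):
    # history: list of joint action tuples, utility_i: joint action -> payoff of
    # player i, i: player index.  Returns max over a_i of (v_bar_i^{a_i} - U_bar_i).
    t = len(history)
    avg = sum(utility_i(a) for a in history) / t          # U_i(z(t))
    regrets = []
    for ai in actions_i:
        # Replace player i's component of each joint action by the fixed a_i.
        v = sum(utility_i(a[:i] + (ai,) + a[i + 1:]) for a in history) / t
        regrets.append(v - avg)                           # U_i(a_i, z_-i(t)) - U_i(z(t))
    return max(regrets)

# Illustration: a short matching-pennies history; a no-regret point requires the
# returned value to be (asymptotically) non-positive for every player.
U_row = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}
history = [("H", "H"), ("H", "T"), ("T", "T"), ("T", "H")]
print("row player max regret:", max_regret(history, ["H", "T"], lambda a: U_row[a], 0))
print("col player max regret:", max_regret(history, ["H", "T"], lambda a: -U_row[a], 1))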

Convergence to NE?

• Recap: If all players follow the regret matching strategy then the empirical frequency
converges to the set of coarse correlated equilibria.
• This result holds irrespective of the underlying game!
• Problems:
– Predictability: Behavior will not necessarily settle down, i.e., only guarantees that
empirical frequency of play will be in the set of CCE
– Efficiency: The set of CCE is much larger than the set of NE. Are CCE worse than NE in terms of efficiency?
• Revised goal: Are there learning rules that converge to NE (as opposed to CCE) for any
game?
• Answer: No
• Theorem: There are no “natural” dynamics that lead to NE in every game (Hart, 2009).
– Natural = adaptive, simple, efficient (e.g., regret matching, Cournot, ...)
– Not natural = exhaustive search, mediator, ...
• Question: Are there natural dynamics that converge to NE for special game structures?
(e.g., zero-sum games?)

Fictitious Play

• Recall: A learning rule is of the form

pi (t + 1) = f (a(1), a(2), ..., a(t); Ui )

• Fictitious play: A learning rule where the strategy pi (t + 1) is a best response to the scenario where all players j ≠ i are selecting their action independently according to the empirical frequency of their past decisions.
• Define empirical frequencies qi (t) as follows:

  q_i^{a_i}(t) = \frac{1}{t} \sum_{\tau=1}^{t} I\{a_i(\tau) = a_i\}

• Fictitious play: Each player best responds to empirical frequencies

  p_i(t+1) \in \arg\max_{p_i \in \Delta(A_i)} U_i(p_i, q_{-i}(t))

  where

  U_i(p_i, q_{-i}(t)) = \sum_{a \in A} U_i(a_1, a_2, ..., a_n) \, p_i^{a_i} \prod_{j \neq i} q_j^{a_j}(t)

• FP facts: Beliefs (i.e., empirical frequencies) converge to NE for
– 2-player games with 2 moves per player
– Zero-sum games with an arbitrary number of moves per player
– Other game structures as well (more to come on this)

Fictitious play example

• Consider the following two-player zero-sum game

        L    C    R
  T    −1    0    1
  M     1   −1    0
  B     0    1   −1

• Suppose a(1) = {T, L}

– What is qrow (1)?


– What is qcol (1)?

• What is a(2)?

– What is qrow (2)?


– What is qcol (2)?

• What is a(3)?
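• A minimal Python sketch of fictitious play on this game (assuming numpy, and assuming the matrix entries are the row player’s payoffs); the first joint action is fixed to {T, L} as above, and breaking best-response ties by the lowest index is an assumption since the notes do not specify a tie-breaking rule:

import numpy as np

A = np.array([[-1, 0, 1],    # row player's payoffs; rows T, M, B and columns L, C, R
              [ 1, -1, 0],
              [ 0, 1, -1]])

row_counts = np.zeros(3)     # empirical counts of the row player's past actions
col_counts = np.zeros(3)     # empirical counts of the column player's past actions

row_a, col_a = 0, 0          # day 1 is fixed to (T, L)
for t in range(1, 11):
    row_counts[row_a] += 1
    col_counts[col_a] += 1
    q_row = row_counts / t   # empirical frequencies q_row(t), q_col(t)
    q_col = col_counts / t
    # Best responses to the opponent's empirical frequency; in the zero-sum game
    # the column player maximizes -A, i.e. minimizes the row player's payoff.
    row_a = int(np.argmax(A @ q_col))
    col_a = int(np.argmin(q_row @ A))
    print(f"day {t + 1}: row plays {'TMB'[row_a]}, column plays {'LCR'[col_a]}")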
