Lecture 12 Evaluating and Learning in Multi-Agent Systems 2
Learning and Evaluation in MA Systems
Reminders
References for this lecture:
Motivations: Learning Objectives
Single player:
Multi-player: Poker, StarCraft II
3. Training of Agents
3.1. Self-Play
3.2. Fictitious Self-Play
Anti-Symmetric (Zero-Sum) Game (Functional Form)
Anti-symmetric payoff: $\phi(u, w) = -\phi(w, u)$, where $u, w$ are the two agents (their parameters).
- $\phi(u, w) > 0$: u beats w.
- $\phi(u, w) < 0$: w beats u.
- $\phi(u, w) = 0$: it is a tie.
Probability of winning: the payoff encodes how likely u is to beat w, e.g. $\mathbb{P}(u \text{ beats } w) = \sigma(\phi(u, w))$ with $\sigma$ the logistic function.
Example:
Transitive game: $\phi(u, w) = f(u) - f(w)$ for some rating function $f$, so the agent with the higher rating is favoured and no cycles can occur.
Are all games transitive? Answer: No! (see the sketch below and the cyclic component later).
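To make the distinction concrete, here is a small sketch (the function and matrix names are mine, not from the slides): a transitive game induced by a scalar skill $f$, and the rock-paper-scissors payoff matrix, which is purely cyclic.

```python
import numpy as np

# Transitive game: phi(u, w) = f(u) - f(w) for a scalar "skill" f.
# Whoever has the higher skill is favoured, so no cycles can occur.
def phi_transitive(u, w, f=lambda x: x):
    return f(u) - f(w)

print(phi_transitive(2.0, 1.0))   # > 0: u beats w
print(phi_transitive(1.0, 2.0))   # < 0: w beats u

# Cyclic game: rock-paper-scissors payoff matrix (anti-symmetric).
# Rock beats scissors, scissors beats paper, paper beats rock: a 3-cycle.
A_rps = np.array([[ 0., -1.,  1.],   # rock     vs (rock, paper, scissors)
                  [ 1.,  0., -1.],   # paper
                  [-1.,  1.,  0.]])  # scissors
assert np.allclose(A_rps, -A_rps.T)  # anti-symmetry check
```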
Anti-symmetric payoff: for a finite pool of players, model the probability that player $i$ beats player $j$ as $\hat{p}_{ij} = \sigma(f_i - f_j)$, with one rating $f_i$ per player.
Problem: estimate the ratings from observed match results by minimizing the logistic (cross-entropy) loss between $\hat{p}_{ij}$ and the observed outcomes.
(Stochastic) gradient descent on that loss: each online update is the classic Elo update, with the learning rate playing the role of the K-factor.
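A minimal sketch of this procedure, assuming ratings in natural (log-odds) units rather than the 400-point chess scale; the helper names are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_elo(matches, n_players, lr=0.1, epochs=200, seed=0):
    """Fit Elo ratings by SGD on the logistic loss.

    matches: list of (i, j, s) with s = 1 if player i beat player j, else 0.
    Returns a vector f of ratings in natural (log-odds) units.
    """
    rng = np.random.default_rng(seed)
    f = np.zeros(n_players)
    for _ in range(epochs):
        for k in rng.permutation(len(matches)):
            i, j, s = matches[k]
            p = sigmoid(f[i] - f[j])      # predicted prob. that i beats j
            f[i] += lr * (s - p)          # SGD step on the cross-entropy loss
            f[j] -= lr * (s - p)
    return f - f.mean()                   # ratings are only defined up to a constant

# Toy data: player 0 usually beats 1, player 1 usually beats 2.
matches = [(0, 1, 1)] * 8 + [(0, 1, 0)] * 2 + [(1, 2, 1)] * 8 + [(1, 2, 0)] * 2
print(fit_elo(matches, n_players=3))      # f[0] > f[1] > f[2] expected
```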
[Figure: average Elo vs. individual Elo]
Getting Elo From the Payoff Matrix A
Theorem:
- If $A$ is an anti-symmetric matrix ($A^\top = -A$), we have:
$$A_{ij} = (f_i - f_j) + B_{ij}$$
Transitive component: $f_i - f_j$, with $f_i$ the rating of agent $i$. Cyclic component: $B_{ij}$, whose rows sum to zero.
Take-away:
● Elo = $f$, the transitive component of the game.
● Meaningful if $\|B\| \ll \|f\|$: Elo only makes sense when the cyclic part is small relative to the transitive part.
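A sketch of the simplest version of this decomposition (uniform averaging over opponents; the function name and normalization are my choices):

```python
import numpy as np

def transitive_cyclic_split(A):
    """Split an anti-symmetric payoff matrix A into A = T + B,
    with T_ij = f_i - f_j (transitive) and B cyclic (every row of B sums to 0)."""
    assert np.allclose(A, -A.T), "A must be anti-symmetric"
    f = A.mean(axis=1)                 # rating of each agent: average payoff vs the pool
    T = f[:, None] - f[None, :]        # transitive component
    B = A - T                          # cyclic component
    return f, T, B

# Example: a transitive game plus a (scaled) rock-paper-scissors cycle.
f_true = np.array([2.0, 1.0, 0.0])
A_rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
A = (f_true[:, None] - f_true[None, :]) + 0.3 * A_rps

f, T, B = transitive_cyclic_split(A)
print(f)                                # recovers f_true up to a constant
print(np.abs(B).sum(), "<<", np.abs(T).sum(), "?  Elo is meaningful iff this holds")
```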
Cyclic component:
There exist cycles: P1 beats P2, P2 beats P3, P3 beats P1.
Why do we care about that?
Elo is useful to predict win-loss probability:
- Under the assumption that the game is transitive.
- Assuming we know $f_i$ and $f_j$, we can predict who will win: $\mathbb{P}(i \text{ beats } j) = \sigma(f_i - f_j)$.
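As a quick numeric illustration (on the conventional 400-point chess scale, which is just a rescaled version of the logistic model above; the numbers are mine, not from the slides):

$$\mathbb{P}(i \text{ beats } j) \;=\; \frac{1}{1 + 10^{-(f_i - f_j)/400}}, \qquad f_i - f_j = 200 \;\Rightarrow\; \mathbb{P}(i \text{ beats } j) \approx 0.76.$$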
Orthogonal matrices
Estimated with an empirical payoff matrix.
Table from the NeurIPS tutorial on learning dynamics by Marta Garnelo, Wojciech Czarnecki and David Balduzzi.
How Are Tasks Combined?
Table from the NeurIPS tutorial on learning dynamics by Marta Garnelo, Wojciech Czarnecki and David Balduzzi.
Desired properties:
1. Invariant: adding redundant copies of an agent or
task to the data should make no difference.
Meta-Agent: performance against the environment (uniform or Nash averaging).
Meta-Task: difficulty of the environment against an average player (uniform or Nash averaging).
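A minimal sketch of Nash averaging, assuming SciPy's linprog is acceptable for solving the zero-sum game; it computes one maximin mixture rather than the max-entropy Nash of the original paper, so treat it as illustrative only. Because redundant copies of an agent simply share the Nash mass, the resulting ratings satisfy the invariance property above.

```python
import numpy as np
from scipy.optimize import linprog

def nash_average(A):
    """Nash averaging for an anti-symmetric payoff matrix A (one maximin mixture)."""
    n = A.shape[0]
    # Variables x = (p_1, ..., p_n, v). Maximize v  <=>  minimize -v,
    # subject to (A^T p)_j >= v for all j, sum(p) = 1, p >= 0.
    c = np.zeros(n + 1); c[-1] = -1.0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])          # v - (A^T p)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    p = res.x[:n]
    return p, A @ p                                     # Nash mixture, Nash-averaged ratings

A_rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
p, ratings = nash_average(A_rps)
print(p)        # ~[1/3, 1/3, 1/3]
print(ratings)  # ~[0, 0, 0]: no agent is better than the pool in rock-paper-scissors
```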
Training of Multiple Agents
Who are the players?
Here we care about the agents/players (the two terms are used interchangeably).
Problem: during training, who should each agent play against?
Example: self-play, where the agent trains against (a copy of) itself.
Open-ended Learning
General Framework to answer the question:
“Who plays against whom?”
Conclusion:
- Open-ended learning is a general framework for understanding algorithms such as self-play or fictitious self-play.
Self-Play
Simple payoff: an anti-symmetric $\phi(v, w)$ between the learner $v$ and its opponent $w$.
Self-play: at iteration $t$, train the next agent to beat the current one, $v_{t+1} \approx \arg\max_v \phi(v, v_t)$.
Remark: in practice the argmax is not computed exactly; it is replaced by a number of (stochastic) gradient or RL update steps (see the sketch below, after the mixture of agents).
Mixture of agents: instead of the latest agent, the opponent is a mixture of past agents; sample $v_i$ with probability $p_i$ (fictitious self-play corresponds to the uniform mixture over all past agents).
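A schematic sketch of both schemes (the toy payoff, finite-difference gradient, and hyper-parameters are placeholders; a real system would use RL updates instead of exact gradient ascent):

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(v, w):
    """Toy anti-symmetric payoff between two agents parameterized by vectors.
    (Placeholder: a purely transitive 'skill' term; any phi(v, w) = -phi(w, v) works.)"""
    return np.tanh(v.sum() - w.sum())

def grad_v_phi(v, w, eps=1e-5):
    """Finite-difference gradient of phi with respect to the learner's parameters."""
    g = np.zeros_like(v)
    for k in range(v.size):
        dv = np.zeros_like(v)
        dv[k] = eps
        g[k] = (phi(v + dv, w) - phi(v - dv, w)) / (2 * eps)
    return g

def train(n_iters=100, lr=0.5, mode="self_play"):
    v = rng.normal(size=3)
    history = [v.copy()]                       # past checkpoints v_1, ..., v_t
    for _ in range(n_iters):
        if mode == "self_play":
            w = history[-1]                    # play the latest agent
        else:                                  # "fictitious": uniform mixture over history
            w = history[rng.integers(len(history))]
        v = v + lr * grad_v_phi(v, w)          # ascend phi(., w): try to beat the opponent
        history.append(v.copy())
    return history

hist = train(mode="fictitious")
print(hist[0].sum(), "->", hist[-1].sum())     # the toy 'skill' v.sum() increases
```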
Matrix of the empirical game
We can use this matrix for several purposes:
1. Evaluate (a group of) agents.
2. Evaluate the diversity of a group of agents.
3. Set up efficient training:
- Group of agents $v_i$.
- Play against the 'best' opponent.
- Used in StarCraft II (AlphaStar).
- [Vinyals et al., 2019]
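A small sketch of building the empirical payoff matrix from recorded win rates and using it to choose opponents; the prioritized weighting below is illustrative, not the exact AlphaStar league recipe:

```python
import numpy as np

def empirical_payoff(win_counts, game_counts):
    """A_ij = empirical P(i beats j) - 1/2, an anti-symmetric estimate of the payoff."""
    p = win_counts / np.maximum(game_counts, 1)
    A = p - 0.5
    np.fill_diagonal(A, 0.0)              # convention: no payoff against yourself
    return A

def pick_opponent(A, i, temperature=1.0, rng=np.random.default_rng(0)):
    """Sample an opponent for agent i, weighting hard opponents (low A[i, j]) more."""
    scores = -A[i] / temperature          # harder opponent -> larger score
    scores[i] = -np.inf                   # never pick yourself here
    w = np.exp(scores - scores[~np.isinf(scores)].max())
    w[i] = 0.0
    return rng.choice(len(w), p=w / w.sum())

# Toy pool of 3 agents: agent 0 beats 1 but loses to 2, etc.
wins  = np.array([[0, 8, 2], [2, 0, 7], [8, 3, 0]], dtype=float)
games = np.array([[0, 10, 10], [10, 0, 10], [10, 10, 0]], dtype=float)
A = empirical_payoff(wins, games)
print(A)
print("opponent for agent 0:", pick_opponent(A, 0))   # usually agent 2 (its hardest opponent)
```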
Conclusion
● Self-play is a very powerful method to train agents in a multi-agent framework.