Neural Auto-Curricula

Feng, Xidong; Slumbers, Oliver; Wan, Ziyu; Liu, Bo; McAleer, Stephen; Wen, Ying; Wang, Jun; Yang, Yaodong

Computer Science > Artificial Intelligence

arXiv:2106.02745 (cs)

[Submitted on 4 Jun 2021 (v1), last revised 1 Nov 2021 (this version, v2)]

Title: Neural Auto-Curricula

Authors: Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang, Yaodong Yang

Abstract: When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population. Within such a process, the update rules of "who to compete with" (i.e., the opponent mixture) and "how to beat them" (i.e., finding best responses) are underpinned by manually developed game-theoretical principles such as fictitious play and Double Oracle. In this paper, we introduce a novel framework -- Neural Auto-Curricula (NAC) -- that leverages meta-gradient descent to automate the discovery of the learning update rule without explicit human design. Specifically, we parameterise the opponent selection module by neural networks and the best-response module by optimisation subroutines, and update their parameters solely via interaction with the game engine, where both players aim to minimise their exploitability. Surprisingly, even without human design, the discovered MARL algorithms perform competitively with, or even better than, state-of-the-art population-based game solvers (e.g., PSRO) on Games of Skill, differentiable Lotto, non-transitive Mixture Games, Iterated Matching Pennies, and Kuhn Poker. Additionally, we show that NAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker. Our work points to a promising future direction: discovering general MARL algorithms solely from data.
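
For readers unfamiliar with the terms used in the abstract, the following is a minimal sketch of a hand-designed population-style loop of the kind NAC aims to replace. It is not the NAC method and is not taken from the paper. It assumes plain fictitious play on rock-paper-scissors, where the "opponent mixture" is simply the empirical frequency of past actions and the "best response" is an exact argmax. All names (`RPS`, `best_response`, `exploitability`, `fictitious_play`) are illustrative, and exploitability is taken here as the sum of both players' best-response values, one of several conventions in use.

```python
# Illustrative sketch only: hand-designed fictitious play on a matrix game.
# This is NOT the NAC algorithm; names and conventions here are assumptions.

# Row player's payoff for rock, paper, scissors; the column player gets the negative.
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def normalise(counts):
    """Turn action counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def best_response(payoff, opponent_mix, as_row):
    """Return (action index, expected value) of a pure best response."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    if as_row:
        values = [sum(payoff[i][j] * opponent_mix[j] for j in range(n_cols))
                  for i in range(n_rows)]
    else:
        # The column player maximises the negated row payoff.
        values = [-sum(payoff[i][j] * opponent_mix[i] for i in range(n_rows))
                  for j in range(n_cols)]
    best = max(range(len(values)), key=values.__getitem__)
    return best, values[best]


def exploitability(payoff, row_mix, col_mix):
    """Sum of both players' best-response values; zero at a Nash equilibrium."""
    _, row_value = best_response(payoff, col_mix, as_row=True)
    _, col_value = best_response(payoff, row_mix, as_row=False)
    return row_value + col_value


def fictitious_play(payoff, iterations):
    """Each step, both players best-respond to the opponent's empirical mixture."""
    row_counts = [0] * len(payoff)
    col_counts = [0] * len(payoff[0])
    row_counts[0] = 1  # arbitrary starting action for each player
    col_counts[0] = 1
    for _step in range(iterations):
        row_mix, col_mix = normalise(row_counts), normalise(col_counts)
        i, _ = best_response(payoff, col_mix, as_row=True)   # "how to beat them"
        j, _ = best_response(payoff, row_mix, as_row=False)
        row_counts[i] += 1  # updates the mixture, i.e. "who to compete with"
        col_counts[j] += 1
    return normalise(row_counts), normalise(col_counts)


if __name__ == "__main__":
    row_mix, col_mix = fictitious_play(RPS, 2000)
    print("row mixture:", row_mix)
    print("exploitability:", exploitability(RPS, row_mix, col_mix))
```

In this sketch both update rules are fixed by hand. The abstract describes NAC as instead parameterising the opponent-selection rule with a neural network and learning it by meta-gradient descent against exploitability.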

Comments:	corresponding to <[email protected]>
Subjects:	Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Cite as:	arXiv:2106.02745 [cs.AI]
	(or arXiv:2106.02745v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2106.02745

Submission history

From: Dr. Yaodong Yang
[v1] Fri, 4 Jun 2021 22:30:25 UTC (5,091 KB)
[v2] Mon, 1 Nov 2021 09:13:57 UTC (7,398 KB)