The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games

Yu, Chao; Velu, Akash; Vinitsky, Eugene; Gao, Jiaxuan; Wang, Yu; Bayen, Alexandre; Wu, Yi

arXiv:2103.01955 (cs)

[Submitted on 2 Mar 2021 (v1), last revised 4 Nov 2022 (this version, v4)]

Abstract: Proximal Policy Optimization (PPO) is a ubiquitous on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent settings. This is often due to the belief that PPO is significantly less sample efficient than off-policy methods in multi-agent systems. In this work, we carefully study the performance of PPO in cooperative multi-agent settings. We show that PPO-based multi-agent algorithms achieve surprisingly strong performance in four popular multi-agent testbeds: the particle-world environments, the StarCraft multi-agent challenge, Google Research Football, and the Hanabi challenge, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. Importantly, compared to competitive off-policy methods, PPO often achieves competitive or superior results in both final returns and sample efficiency. Finally, through ablation studies, we analyze implementation and hyperparameter factors that are critical to PPO's empirical performance, and give concrete practical suggestions regarding these factors. Our results show that when using these practices, simple PPO-based methods can be a strong baseline in cooperative multi-agent reinforcement learning. Source code is released at this https URL.

Comments:	Accepted to the NeurIPS 2022 Datasets and Benchmarks track
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Cite as:	arXiv:2103.01955 [cs.LG]
	(or arXiv:2103.01955v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2103.01955

Submission history

From: Chao Yu
[v1] Tue, 2 Mar 2021 18:59:56 UTC (16,035 KB)
[v2] Mon, 5 Jul 2021 23:45:06 UTC (19,815 KB)
[v3] Thu, 21 Jul 2022 06:57:33 UTC (23,508 KB)
[v4] Fri, 4 Nov 2022 06:16:11 UTC (33,520 KB)
