Distributional Reinforcement Learning via Moment Matching

Nguyen, Thanh Tang; Gupta, Sunil; Venkatesh, Svetha

arXiv:2007.12354 (cs)

[Submitted on 24 Jul 2020 (v1), last revised 9 Dec 2020 (this version, v3)]

Abstract: We consider the problem of learning a set of probability distributions from the empirical Bellman dynamics in distributional reinforcement learning (RL), a class of state-of-the-art methods that estimate the distribution, as opposed to only the expectation, of the total return. We formulate a method that learns a finite set of statistics from each return distribution via neural networks, as in (Bellemare, Dabney, and Munos 2017; Dabney et al. 2018b). Existing distributional RL methods, however, constrain the learned statistics to predefined functional forms of the return distribution, which both restricts the representation and makes the predefined statistics difficult to maintain. Instead, we learn unrestricted statistics, i.e., deterministic (pseudo-)samples, of the return distribution by leveraging a technique from hypothesis testing known as maximum mean discrepancy (MMD), which leads to a simpler objective amenable to backpropagation. Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target. We establish sufficient conditions for the contraction of the distributional Bellman operator and provide a finite-sample analysis of the deterministic samples in distribution approximation. Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines and sets a new record in the Atari games for non-distributed agents.
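The MMD objective mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a Gaussian kernel with a single fixed bandwidth, a biased (V-statistic) estimator of squared MMD, and a scalar reward with discount factor `gamma`. The function names `gaussian_kernel`, `mmd_squared`, and `bellman_target_particles`, and all numeric values, are hypothetical and chosen only for illustration. It computes the discrepancy between a set of pseudo-samples of the return and the pseudo-samples of a Bellman target; in a learning setup, a quantity of this kind would be minimized by backpropagation.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Kernel matrix k(x_i, y_j) = exp(-(x_i - y_j)^2 / (2 * bandwidth^2)).

    The Gaussian kernel and the fixed bandwidth are assumptions of this sketch.
    """
    diff = x[:, None] - y[None, :]  # pairwise differences via broadcasting
    return np.exp(-(diff ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between 1-D sample sets."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


def bellman_target_particles(reward, gamma, next_particles):
    """Shift and scale next-state pseudo-samples: r + gamma * z'."""
    return reward + gamma * next_particles


# Hypothetical pseudo-samples of the return at the current and next state.
current_particles = np.array([0.5, 1.0, 1.5, 2.0])
next_particles = np.array([0.0, 0.5, 1.0, 1.5])

target = bellman_target_particles(reward=1.0, gamma=0.9, next_particles=next_particles)
loss = mmd_squared(current_particles, target)
print("squared MMD between particles and Bellman target:", loss)
```

With this estimator, the value is zero when the two particle sets coincide and grows as they move apart, which is what makes it usable as a training signal for the pseudo-samples.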

Comments:	To appear in AAAI'21; code available at this https URL
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:2007.12354 [cs.LG]
	(or arXiv:2007.12354v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2007.12354
Journal reference:	AAAI 2021

Submission history

From: Thanh Nguyen Tang
[v1] Fri, 24 Jul 2020 05:18:17 UTC (29,930 KB)
[v2] Sat, 5 Dec 2020 06:43:28 UTC (8,973 KB)
[v3] Wed, 9 Dec 2020 00:38:36 UTC (9,278 KB)