A unified view of entropy-regularized Markov decision processes

Neu, Gergely; Jonsson, Anders; Gómez, Vicenç

Computer Science > Machine Learning

arXiv:1705.07798 (cs)

[Submitted on 22 May 2017]

Title:A unified view of entropy-regularized Markov decision processes

Authors:Gergely Neu, Anders Jonsson, Vicenç Gómez

View PDF

Abstract:We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to accommodate convex regularization functions. Our key result is showing that using the conditional entropy of the joint state-action distributions as regularization yields a dual optimization problem closely resembling the Bellman optimality equations. This result enables us to formalize a number of state-of-the-art entropy-regularized reinforcement learning algorithms as approximate variants of Mirror Descent or Dual Averaging, and thus to argue about the convergence properties of these methods. In particular, we show that the exact version of the TRPO algorithm of Schulman et al. (2015) actually converges to the optimal policy, while the entropy-regularized policy gradient methods of Mnih et al. (2016) may fail to converge to a fixed point. Finally, we illustrate empirically the effects of using various regularization techniques on learning performance in a simple reinforcement learning setup.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1705.07798 [cs.LG]
	(or arXiv:1705.07798v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1705.07798

Submission history

From: Gergely Neu [view email] [via Gergely Neu as proxy]
[v1] Mon, 22 May 2017 15:06:25 UTC (95 KB)

Full-text links:

Access Paper:

view license

Current browse context:

cs.LG

< prev | next >

new | recent | 2017-05

Change to browse by:

cs
cs.AI
stat
stat.ML

References & Citations

DBLP - CS Bibliography

listing | bibtex

Gergely Neu
Anders Jonsson
Vicenç Gómez

export BibTeX citation

Computer Science > Machine Learning

Title:A unified view of entropy-regularized Markov decision processes

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:A unified view of entropy-regularized Markov decision processes

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators