Easy Monotonic Policy Iteration

Achiam, Joshua

Computer Science > Machine Learning

arXiv:1602.09118 (cs)

[Submitted on 29 Feb 2016]

Title:Easy Monotonic Policy Iteration

Authors:Joshua Achiam

View PDF

Abstract:A key problem in reinforcement learning for control with general function approximators (such as deep neural networks and other nonlinear functions) is that, for many algorithms employed in practice, updates to the policy or $Q$-function may fail to improve performance---or worse, actually cause the policy performance to degrade. Prior work has addressed this for policy iteration by deriving tight policy improvement bounds; by optimizing the lower bound on policy improvement, a better policy is guaranteed. However, existing approaches suffer from bounds that are hard to optimize in practice because they include sup norm terms which cannot be efficiently estimated or differentiated. In this work, we derive a better policy improvement bound where the sup norm of the policy divergence has been replaced with an average divergence; this leads to an algorithm, Easy Monotonic Policy Iteration, that generates sequences of policies with guaranteed non-decreasing returns and is easy to implement in a sample-based framework.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1602.09118 [cs.LG]
	(or arXiv:1602.09118v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1602.09118

Submission history

From: Joshua Achiam [view email]
[v1] Mon, 29 Feb 2016 19:59:16 UTC (40 KB)

Full-text links:

Access Paper:

view license

Current browse context:

cs.LG

< prev | next >

new | recent | 2016-02

Change to browse by:

cs
cs.AI
stat
stat.ML

References & Citations

DBLP - CS Bibliography

listing | bibtex

Joshua Achiam

export BibTeX citation

Computer Science > Machine Learning

Title:Easy Monotonic Policy Iteration

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:Easy Monotonic Policy Iteration

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators