Towards Characterizing Divergence in Deep Q-Learning

Achiam, Joshua; Knight, Ethan; Abbeel, Pieter

arXiv:1903.08894 (cs)

[Submitted on 21 Mar 2019]

Abstract: Deep Q-Learning (DQL), a family of temporal difference algorithms for control, employs three techniques collectively known as the 'deadly triad' in reinforcement learning: bootstrapping, off-policy learning, and function approximation. Prior work has demonstrated that together these can lead to divergence in Q-learning algorithms, but the conditions under which divergence occurs are not well understood. In this note, we give a simple analysis based on a linear approximation to the Q-value updates, which we believe provides insight into divergence under the deadly triad. The central point of our analysis is to determine when the leading-order approximation to the deep-Q update is or is not a contraction in the sup norm. Based on this analysis, we develop an algorithm that permits stable deep Q-learning for continuous control without any of the tricks conventionally used (such as target networks, adaptive gradient optimizers, or multiple Q functions). We demonstrate that our algorithm performs near or above the state of the art on standard MuJoCo benchmarks from the OpenAI Gym.
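
The abstract's central question, whether the leading-order (linearized) deep-Q update is a contraction in the sup norm, can be illustrated numerically. The sketch below assumes a finite set of state-action pairs and a linearized update of the form U(Q) = Q + alpha * K D (T*Q - Q), where K is a kernel (Gram) matrix of Q-network gradients, D = diag(rho) holds the sampling distribution, alpha is the step size, and T* is the Bellman optimality operator. This form, the function name `contraction_bound`, and all numbers used are illustrative assumptions, not taken from the abstract. The function returns a row-wise upper bound on the sup-norm Lipschitz constant of U: a value below 1 certifies a contraction, while a value above 1 only means this particular argument fails.

```python
import numpy as np


def contraction_bound(K: np.ndarray, rho: np.ndarray, alpha: float, gamma: float) -> float:
    """Row-wise upper bound on the sup-norm Lipschitz constant of the (assumed)
    linearized update U(Q) = Q + alpha * K @ D @ (T*Q - Q), where D = diag(rho).

    Write U(Q) = A @ Q + B @ (T*Q) with B = alpha * K @ D and A = I - B.
    Because the Bellman optimality operator T* is a gamma-contraction in the
    sup norm, every entry i obeys
        |U(Q1)_i - U(Q2)_i| <= (sum_j |A_ij| + gamma * sum_j |B_ij|) * ||Q1 - Q2||_inf.
    A returned value below 1 therefore certifies that U is a contraction.
    """
    n = len(rho)
    B = alpha * K @ np.diag(rho)   # B[i, j]: how the TD error at pair j moves Q at pair i
    A = np.eye(n) - B              # weight kept on the current Q-values
    row_bounds = np.abs(A).sum(axis=1) + gamma * np.abs(B).sum(axis=1)
    return float(row_bounds.max())


n = 4
rho = np.full(n, 1.0 / n)   # uniform sampling over 4 state-action pairs (illustrative)
alpha, gamma = 1.0, 0.9     # illustrative step size and discount factor

# Tabular case: K = I, so each update only changes the sampled entry.
K_tabular = np.eye(n)
# Strong generalization: each gradient step moves all Q-values almost equally.
K_shared = 0.1 * np.eye(n) + 0.9 * np.ones((n, n))

print(contraction_bound(K_tabular, rho, alpha, gamma))  # about 0.975: contraction certified
print(contraction_bound(K_shared, rho, alpha, gamma))   # about 2.26: not certified (not a proof of divergence)
```

In the tabular case the per-entry bound reduces to 1 - alpha * rho_i * (1 - gamma) whenever alpha * rho_i <= 1, which is below 1 as long as every pair is sampled; large off-diagonal kernel entries push the bound above 1. Because the check is only a sufficient condition, a value above 1 shows that the contraction argument breaks down, not that training must diverge.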

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:1903.08894 [cs.LG]
	(or arXiv:1903.08894v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1903.08894

Submission history

From: Joshua Achiam
[v1] Thu, 21 Mar 2019 09:42:41 UTC (2,101 KB)
