Worst-Case Regret Bounds for Exploration via Randomized Value Functions

Russo, Daniel

arXiv:1906.02870 (cs)

[Submitted on 7 Jun 2019 (v1), last revised 16 Aug 2019 (this version, v3)]

Abstract: This paper studies a recent proposal to use randomized value functions to drive exploration in reinforcement learning. These randomized value functions are generated by injecting random noise into the training data, making the approach compatible with many popular methods for estimating parameterized value functions. By providing a worst-case regret bound for tabular finite-horizon Markov decision processes, we show that planning with respect to these randomized value functions can induce provably efficient exploration.
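
The idea in the abstract can be illustrated with a minimal sketch for the tabular finite-horizon case. It is not the paper's exact algorithm or noise schedule. It assumes the agent keeps empirical mean rewards, empirical transition probabilities, and visit counts, and that noise is Gaussian with variance `noise_scale / (count + 1)`. The function name `randomized_value_functions` and all parameter names are hypothetical.

```python
import numpy as np

def randomized_value_functions(r_hat, p_hat, counts, noise_scale, rng):
    """Backward induction on a noise-perturbed empirical MDP (illustrative sketch).

    r_hat:  (H, S, A) empirical mean rewards
    p_hat:  (H, S, A, S) empirical transition probabilities
    counts: (H, S, A) visit counts
    Returns an (H, S, A) array of randomized Q-values.
    """
    H, S, A = r_hat.shape
    q = np.zeros((H, S, A))
    v_next = np.zeros(S)  # value after the final period is zero
    for h in reversed(range(H)):
        # Assumed noise model: Gaussian, shrinking as visit counts grow.
        std = np.sqrt(noise_scale / (counts[h] + 1.0))
        noise = rng.normal(size=(S, A)) * std
        # Perturbed Bellman backup: reward + expected next value + noise.
        q[h] = r_hat[h] + p_hat[h] @ v_next + noise
        v_next = q[h].max(axis=1)  # plan greedily against the sampled values
    return q

# Usage: draw one randomized value function and act greedily for an episode.
rng = np.random.default_rng(0)
H, S, A = 3, 4, 2
r_hat = rng.uniform(size=(H, S, A))
p_hat = np.full((H, S, A, S), 1.0 / S)
counts = np.zeros((H, S, A))
q = randomized_value_functions(r_hat, p_hat, counts, noise_scale=1.0, rng=rng)
greedy_policy = q.argmax(axis=2)  # shape (H, S): action per period and state
```

Setting `noise_scale` to zero reduces the sketch to ordinary value iteration on the empirical model. With positive noise, rarely visited state-action pairs receive larger perturbations, so a greedy policy sometimes tries them, which is what drives exploration.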

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Machine Learning (stat.ML)
Cite as:	arXiv:1906.02870 [cs.LG]
	(or arXiv:1906.02870v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1906.02870

Submission history

From: Daniel Russo
[v1] Fri, 7 Jun 2019 02:36:00 UTC (20 KB)
[v2] Sat, 22 Jun 2019 09:40:10 UTC (21 KB)
[v3] Fri, 16 Aug 2019 22:43:53 UTC (21 KB)
