Accelerating the Computation of UCB and Related Indices for Reinforcement Learning

Cowan, Wesley; Katehakis, Michael N.; Pirutinsky, Daniel

arXiv:1909.13158 (cs)

[Submitted on 28 Sep 2019]

Abstract: In this paper we derive an efficient method for computing the indices associated with an asymptotically optimal upper confidence bound algorithm (MDP-UCB) of Burnetas and Katehakis (1997). The method requires solving only a system of two non-linear equations in two unknowns, irrespective of the cardinality of the state space of the Markovian decision process (MDP). In addition, we develop a similar acceleration for computing the indices of the MDP-Deterministic Minimum Empirical Divergence (MDP-DMED) algorithm developed in Cowan et al. (2019), based on ideas from Honda and Takemura (2011), which involves solving a single equation in one variable. We provide experimental results demonstrating the computational time savings and regret performance of these algorithms. In these comparisons we also consider the Optimistic Linear Programming (OLP) algorithm (Tewari and Bartlett, 2008) and a method based on posterior sampling (MDP-PS).
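
As a rough illustration of why an index of this kind can be computed without effort that grows badly with the number of states, consider the optimization usually associated with KL-based upper confidence bounds: maximize sum(q * v) over distributions q subject to KL(p_hat || q) <= delta. This form is an assumption made for illustration and is not taken from the paper. Its Lagrangian stationarity condition gives q_i proportional to p_hat_i / (mu - v_i), so the whole problem reduces to a normalization equation and a divergence equation in the two multipliers. The sketch below eliminates the normalization multiplier and finds mu by bisection. The function name, its arguments, and the strictly-positive p_hat requirement are all hypothetical choices made here.

```python
import math

def kl_inflated_value(p_hat, v, delta, iters=200):
    """Return (index, q) maximizing sum(q*v) subject to KL(p_hat || q) <= delta.

    Illustrative sketch only: assumes every entry of p_hat is strictly positive.
    """
    if any(p <= 0.0 for p in p_hat):
        raise ValueError("sketch assumes strictly positive p_hat")
    v_max = max(v)
    gaps = [v_max - x for x in v]      # gaps[i] >= 0, zero at the best state
    span = max(gaps)
    if delta <= 0.0 or span == 0.0:
        # No room to inflate: the index is the plain empirical mean.
        return sum(p * x for p, x in zip(p_hat, v)), list(p_hat)

    def q_of(s):
        # Stationarity gives q_i proportional to p_i / (mu - v_i).
        # With s = mu - v_max > 0 we have mu - v_i = s + gaps[i].
        w = [p / (s + g) for p, g in zip(p_hat, gaps)]
        z = sum(w)
        return [wi / z for wi in w]

    def kl(s):
        return sum(p * math.log(p / qi) for p, qi in zip(p_hat, q_of(s)))

    # Bracket the root of kl(s) = delta; kl decreases as s grows.
    lo, hi = span, span
    for _ in range(200):
        if kl(hi) <= delta:
            break
        hi *= 2.0
    for _ in range(200):
        if kl(lo) > delta:
            break
        lo *= 0.5

    # Bisection, keeping hi on the feasible side (kl(hi) <= delta).
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kl(mid) > delta:
            lo = mid
        else:
            hi = mid

    q = q_of(hi)
    return sum(qi * x for qi, x in zip(q, v)), q
```

In a UCB-style rule, delta would typically shrink with the number of samples of the state-action pair (for instance, something on the order of log(t)/n), and v would hold the current value estimates of the successor states. Each evaluation of kl costs time linear in the number of states, but the number of unknowns being solved for stays fixed.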

Comments:	A version of some of the algorithms and comparisons has appeared in a previous technical note by Cowan, Katehakis, and Pirutinsky (2019) arXiv:1909.06019
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1909.13158 [cs.LG]
	(or arXiv:1909.13158v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1909.13158

Submission history

From: Daniel Pirutinsky [view email]
[v1] Sat, 28 Sep 2019 21:56:11 UTC (286 KB)
