Batch Policy Learning under Constraints

Le, Hoang M.; Voloshin, Cameron; Yue, Yisong

Computer Science > Machine Learning

arXiv:1903.08738 (cs)

[Submitted on 20 Mar 2019]

Title:Batch Policy Learning under Constraints

Authors:Hoang M. Le, Cameron Voloshin, Yisong Yue

View PDF

Abstract:When learning policies for real-world domains, two important questions arise: (i) how to efficiently use pre-collected off-policy, non-optimal behavior data; and (ii) how to mediate among different competing objectives and constraints. We thus study the problem of batch policy learning under multiple constraints, and offer a systematic solution. We first propose a flexible meta-algorithm that admits any batch reinforcement learning and online learning procedure as subroutines. We then present a specific algorithmic instantiation and provide performance guarantees for the main objective and all constraints. To certify constraint satisfaction, we propose a new and simple method for off-policy policy evaluation (OPE) and derive PAC-style bounds. Our algorithm achieves strong empirical results in different domains, including in a challenging problem of simulated car driving subject to multiple constraints such as lane keeping and smooth driving. We also show experimentally that our OPE method outperforms other popular OPE techniques on a standalone basis, especially in a high-dimensional setting.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:1903.08738 [cs.LG]
	(or arXiv:1903.08738v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1903.08738

Submission history

From: Hoang M. Le [view email]
[v1] Wed, 20 Mar 2019 21:01:22 UTC (1,750 KB)

Computer Science > Machine Learning

Title:Batch Policy Learning under Constraints

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:Batch Policy Learning under Constraints

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators