REALab: An Embedded Perspective on Tampering

Kumar, Ramana; Uesato, Jonathan; Ngo, Richard; Everitt, Tom; Krakovna, Victoria; Legg, Shane

arXiv:2011.08820v1 (cs)

[Submitted on 17 Nov 2020]

Abstract: This paper describes REALab, a platform for embedded agency research in reinforcement learning (RL). REALab is designed to model the structure of tampering problems that may arise in real-world deployments of RL. Standard Markov Decision Process (MDP) formulations of RL and simulated environments mirroring the MDP structure assume secure access to feedback (e.g., rewards). This may be unrealistic in settings where agents are embedded and can corrupt the processes producing feedback (e.g., human supervisors, or an implemented reward function). We describe an alternative Corrupt Feedback MDP formulation and the REALab environment platform, which both avoid the secure feedback assumption. We hope the design of REALab provides a useful perspective on tampering problems, and that the platform may serve as a unit test for the presence of tampering incentives in RL agent designs.
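
To make the secure-feedback assumption concrete, the following is a minimal illustrative sketch of a toy setting in which the feedback an agent observes can diverge from the true reward once the agent tampers. It is not REALab's API and not the paper's formal Corrupt Feedback MDP definition; the class `ToyCorruptFeedbackEnv`, the action names `"work"` and `"tamper"`, and the numeric values are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class ToyCorruptFeedbackEnv:
    """Hypothetical toy environment (an assumption, not REALab itself).

    The agent receives `observed_feedback`, which equals the true reward
    only while the feedback channel has not been tampered with.
    """

    tampered: bool = False

    def step(self, action: str) -> Tuple[float, float]:
        """Return (true_reward, observed_feedback) for one step."""
        if action == "work":
            true_reward = 1.0  # doing the intended task
        elif action == "tamper":
            true_reward = 0.0  # no real progress on the task
            self.tampered = True  # feedback channel is now corrupted
        else:
            raise ValueError(f"unknown action: {action!r}")
        # Once corrupted, the channel reports an inflated value
        # regardless of what the agent actually achieved.
        observed_feedback = 10.0 if self.tampered else true_reward
        return true_reward, observed_feedback


def run(actions: Iterable[str]) -> Tuple[float, float]:
    """Sum true reward and observed feedback over a fixed action sequence."""
    env = ToyCorruptFeedbackEnv()
    true_total, observed_total = 0.0, 0.0
    for action in actions:
        true_reward, observed_feedback = env.step(action)
        true_total += true_reward
        observed_total += observed_feedback
    return true_total, observed_total


honest = run(["work", "work", "work"])
tampering = run(["tamper", "work", "work"])
```

In this sketch, an agent that maximizes observed feedback would prefer the tampering sequence even though its true return is lower, which is the kind of incentive a standard MDP with secure rewards cannot express.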

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2011.08820 [cs.LG]
	(or arXiv:2011.08820v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2011.08820

Submission history

From: Jonathan Uesato
[v1] Tue, 17 Nov 2020 18:37:20 UTC (1,026 KB)
