Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning

Gupta, Abhishek; Kumar, Vikash; Lynch, Corey; Levine, Sergey; Hausman, Karol

arXiv:1910.11956 (cs)

[Submitted on 25 Oct 2019]

Abstract: We present relay policy learning, a method for imitation and reinforcement learning that can solve multi-stage, long-horizon robotic tasks. This general and universally applicable two-phase approach consists of an imitation learning stage that produces goal-conditioned hierarchical policies, and a reinforcement learning phase that finetunes these policies for task performance. Our method, while not necessarily perfect at imitation learning, is very amenable to further improvement via environment interaction, allowing it to scale to challenging long-horizon tasks. We simplify the long-horizon policy learning problem by using a novel data-relabeling algorithm for learning goal-conditioned hierarchical policies, where the low-level policy acts only for a fixed number of steps, regardless of the goal achieved. While we rely on demonstration data to bootstrap policy learning, we do not assume access to demonstrations of every specific task being solved, and instead leverage unstructured and unsegmented demonstrations of semantically meaningful behaviors that are not only less burdensome to provide but can also greatly facilitate further improvement using reinforcement learning. We demonstrate the effectiveness of our method on a number of multi-stage, long-horizon manipulation tasks in a challenging kitchen simulation environment. Videos are available at this https URL
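
The abstract above mentions a data-relabeling step for goal-conditioned hierarchical policies but does not spell out the procedure. The following is only a minimal sketch of one plausible windowed goal-relabeling scheme over a single unsegmented demonstration. The function names, the `low_window` and `high_window` parameters, and the tuple layouts are assumptions made for illustration and should not be read as the authors' algorithm.

```python
from typing import Any, List, Sequence, Tuple


def relabel_low_level(
    states: Sequence[Any], actions: Sequence[Any], low_window: int
) -> List[Tuple[Any, Any, Any]]:
    """Build (state, goal, action) tuples for a low-level policy.

    Assumption: every state reached within `low_window` steps of time t
    is treated as a valid goal for the action taken at time t.
    Expects len(states) == len(actions) + 1.
    """
    data = []
    for t in range(len(actions)):
        last = min(t + low_window, len(states) - 1)
        for g in range(t + 1, last + 1):
            data.append((states[t], states[g], actions[t]))
    return data


def relabel_high_level(
    states: Sequence[Any], low_window: int, high_window: int
) -> List[Tuple[Any, Any, Any]]:
    """Build (state, goal, subgoal) tuples for a high-level policy.

    Assumption: goals are states up to `high_window` steps ahead, and the
    subgoal target is the state `low_window` steps ahead, or the goal
    itself when the goal is nearer than that.
    """
    data = []
    n = len(states)
    for t in range(n - 1):
        last = min(t + high_window, n - 1)
        for g in range(t + 1, last + 1):
            sub = min(t + low_window, g)
            data.append((states[t], states[g], states[sub]))
    return data


if __name__ == "__main__":
    # Toy demonstration: integer states and letter actions (hypothetical data).
    demo_states = list(range(6))
    demo_actions = list("abcde")
    low = relabel_low_level(demo_states, demo_actions, low_window=2)
    high = relabel_high_level(demo_states, low_window=2, high_window=4)
    print(len(low), len(high))
```

In this sketch the fixed `low_window` plays the role of the fixed number of steps for which the low-level policy acts. No task segmentation or labels are needed, since goals are drawn from states the demonstration itself visits.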

Comments:	Published at CoRL 2019
Subjects:	Machine Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML)
Cite as:	arXiv:1910.11956 [cs.LG]
	(or arXiv:1910.11956v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1910.11956

Submission history

From: Abhishek Gupta
[v1] Fri, 25 Oct 2019 23:01:43 UTC (6,444 KB)
