LVIS: Learning from Value Function Intervals for Contact-Aware Robot Controllers

Deits, Robin; Koolen, Twan; Tedrake, Russ

arXiv:1809.05802 (cs)

[Submitted on 16 Sep 2018]

Abstract: Guided policy search is a popular approach for training controllers for high-dimensional systems, but it has a number of pitfalls. Non-convex trajectory optimization has local minima, and non-uniqueness in the optimal policy itself can mean that independently-optimized samples do not describe a coherent policy from which to train. We introduce LVIS, which circumvents the issue of local minima through global mixed-integer optimization and the issue of non-uniqueness through learning the optimal value function (or cost-to-go) rather than the optimal policy. To avoid the expense of solving the mixed-integer programs to full global optimality, we instead solve them only partially, extracting intervals containing the true cost-to-go from early termination of the branch-and-bound algorithm. These interval samples are used to weakly supervise the training of a neural net which approximates the true cost-to-go. Online, we use that learned cost-to-go as the terminal cost of a one-step model-predictive controller, which we solve via a small mixed-integer optimization. We demonstrate the LVIS approach on a cart-pole system with walls and a planar humanoid robot model and show that it can be applied to a fundamentally hard problem in feedback control: control through contact.
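The abstract says that interval samples weakly supervise the cost-to-go network but does not give the training loss. The sketch below shows one plausible form, assumed here for illustration and not taken from the paper: a prediction is penalized only by its squared distance to the nearest bound when it falls outside the interval, and incurs no penalty inside it. The function name `interval_loss` and its arguments are hypothetical.

```python
from typing import Sequence


def interval_loss(predictions: Sequence[float],
                  lower_bounds: Sequence[float],
                  upper_bounds: Sequence[float]) -> float:
    """Mean squared violation of the intervals [lower, upper].

    Assumed loss form (not stated in the abstract): a prediction inside
    its interval contributes zero; one outside contributes the squared
    distance to the nearest bound.
    """
    if not (len(predictions) == len(lower_bounds) == len(upper_bounds)):
        raise ValueError("all inputs must have the same length")
    if len(predictions) == 0:
        raise ValueError("at least one sample is required")

    total = 0.0
    for pred, lo, hi in zip(predictions, lower_bounds, upper_bounds):
        if lo > hi:
            raise ValueError("lower bound exceeds upper bound")
        below = max(lo - pred, 0.0)  # amount by which pred undershoots the interval
        above = max(pred - hi, 0.0)  # amount by which pred overshoots the interval
        total += below ** 2 + above ** 2
    return total / len(predictions)


# Three samples: one inside its interval, one above, one below.
example = interval_loss([1.0, 5.0, 0.0], [0.0, 2.0, 1.0], [2.0, 4.0, 3.0])
```

In this form a loose interval constrains the network weakly and a tight interval constrains it almost like an ordinary regression target, which is one way to read "weakly supervise" for partially solved mixed-integer programs.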

Comments:	7 pages, 8 figures. Submitted to the 2019 IEEE International Conference on Robotics and Automation (ICRA 2019)
Subjects:	Robotics (cs.RO)
Cite as:	arXiv:1809.05802 [cs.RO]
	(or arXiv:1809.05802v1 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.1809.05802

Submission history

From: Robin Deits
[v1] Sun, 16 Sep 2018 03:39:23 UTC (4,586 KB)