Eureka: Human-Level Reward Design via Coding Large Language Models

Ma, Yecheng Jason; Liang, William; Wang, Guanzhi; Huang, De-An; Bastani, Osbert; Jayaraman, Dinesh; Zhu, Yuke; Fan, Linxi; Anandkumar, Anima

arXiv:2310.12931 (cs)

[Submitted on 19 Oct 2023 (v1), last revised 30 Apr 2024 (this version, v2)]

Abstract: Large Language Models (LLMs) have excelled as high-level semantic planners for sequential decision-making tasks. However, harnessing them to learn complex low-level manipulation tasks, such as dexterous pen spinning, remains an open problem. We bridge this fundamental gap and present Eureka, a human-level reward design algorithm powered by LLMs. Eureka exploits the remarkable zero-shot generation, code-writing, and in-context improvement capabilities of state-of-the-art LLMs, such as GPT-4, to perform evolutionary optimization over reward code. The resulting rewards can then be used to acquire complex skills via reinforcement learning. Without any task-specific prompting or pre-defined reward templates, Eureka generates reward functions that outperform expert human-engineered rewards. In a diverse suite of 29 open-source RL environments that include 10 distinct robot morphologies, Eureka outperforms human experts on 83% of the tasks, leading to an average normalized improvement of 52%. The generality of Eureka also enables a new gradient-free in-context learning approach to reinforcement learning from human feedback (RLHF), readily incorporating human inputs to improve the quality and the safety of the generated rewards without model updating. Finally, using Eureka rewards in a curriculum learning setting, we demonstrate, for the first time, a simulated Shadow Hand capable of performing pen spinning tricks, adeptly manipulating a pen in circles at rapid speed.
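
The abstract describes evolutionary optimization over reward code. A minimal sketch of what such a loop might look like follows; it is an illustration under stated assumptions, not the authors' implementation. The names evolve_reward_code, toy_propose, and toy_evaluate are hypothetical: toy_propose stands in for an LLM that rewrites reward code given feedback, and toy_evaluate stands in for training an RL policy with that reward and measuring task performance.

```python
import random
import re
from typing import Callable, Tuple


def evolve_reward_code(
    propose: Callable[[str, str], str],
    evaluate: Callable[[str], float],
    seed_code: str,
    iterations: int = 3,
    samples_per_iteration: int = 4,
) -> Tuple[str, float]:
    """Greedy evolutionary search over reward source code (illustrative only)."""
    best_code, best_score = seed_code, evaluate(seed_code)
    for _ in range(iterations):
        # Textual feedback given to the proposer; a real system might
        # summarize training statistics here (assumption).
        feedback = f"score={best_score:.3f}"
        # Sample several candidate rewrites of the current best reward code.
        candidates = [propose(best_code, feedback) for _ in range(samples_per_iteration)]
        scored = [(evaluate(code), code) for code in candidates]
        top_score, top_code = max(scored, key=lambda pair: pair[0])
        # Keep the new candidate only if it improves on the incumbent.
        if top_score > best_score:
            best_code, best_score = top_code, top_score
    return best_code, best_score


# --- Toy stand-ins (hypothetical; no LLM or RL training involved) ---
_rng = random.Random(0)


def toy_propose(parent_code: str, feedback: str) -> str:
    """Stand-in for an LLM: perturbs the distance weight in the parent code.

    The feedback string is accepted but ignored by this toy proposer.
    """
    weight = float(re.search(r"-([0-9.]+) \* dist", parent_code).group(1))
    new_weight = max(0.0, weight + _rng.uniform(-0.5, 0.5))
    return f"def compute_reward(dist):\n    return -{new_weight:.3f} * dist\n"


def toy_evaluate(code: str) -> float:
    """Stand-in for RL training: fitness peaks when the weight equals 2.0."""
    namespace = {}
    exec(code, namespace)  # run the candidate reward source
    return -abs(namespace["compute_reward"](1.0) - (-2.0))


SEED = "def compute_reward(dist):\n    return -1.000 * dist\n"
best_code, best_score = evolve_reward_code(toy_propose, toy_evaluate, SEED)
print(best_code)
print(best_score)
```

Because the loop replaces the incumbent only when a candidate scores higher, the returned score is never worse than the seed's. In a real setting the two stand-ins would be replaced by calls to a language model and an RL training run, both far more expensive than this toy.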

Comments:	ICLR 2024. Project website and open-source code: this https URL
Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2310.12931 [cs.RO]
	(or arXiv:2310.12931v2 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2310.12931

Submission history

From: Yecheng Jason Ma
[v1] Thu, 19 Oct 2023 17:31:01 UTC (3,317 KB)
[v2] Tue, 30 Apr 2024 21:35:53 UTC (3,306 KB)
