Balancing Value Underestimation and Overestimation with Realistic Actor-Critic

Li, Sicen; Tang, Qinyun; Pang, Yiming; Ma, Xinmeng; Wang, Gang


arXiv:2110.09712 (cs)

[Submitted on 19 Oct 2021 (v1), last revised 26 Oct 2022 (this version, v6)]

Abstract: Model-free deep reinforcement learning (RL) has been successfully applied to challenging continuous control domains. However, poor sample efficiency prevents these methods from being widely used in real-world domains. This paper introduces a novel model-free algorithm, Realistic Actor-Critic (RAC), which can be combined with any off-policy RL algorithm to improve sample efficiency. RAC employs Universal Value Function Approximators (UVFA) to simultaneously learn a family of policies with a single neural network, each policy making a different trade-off between underestimation and overestimation. To learn such policies, we introduce uncertainty-punished Q-learning, which uses the uncertainty from an ensemble of multiple critics to build various confidence bounds of the Q-function. We evaluate RAC on the MuJoCo benchmark, achieving 10x sample efficiency and a 25% performance improvement on the most challenging Humanoid environment compared to SAC.
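The idea of building confidence bounds from an ensemble of critics can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the form used here (ensemble mean minus a coefficient times the ensemble standard deviation), the function name `punished_q`, and the coefficient name `beta` are all assumptions made for illustration rather than the authors' implementation.

```python
from statistics import mean, pstdev


def punished_q(q_estimates, beta):
    """Confidence bound on Q from an ensemble of critics.

    Assumed form (not taken from the abstract): mean - beta * std.
    q_estimates: Q-values for one (state, action) pair, one per critic.
    beta: hypothetical trade-off coefficient; larger means more pessimistic.
    """
    return mean(q_estimates) - beta * pstdev(q_estimates)


# Four hypothetical critics disagree about the same (state, action) pair.
ensemble = [1.0, 3.0, 1.0, 3.0]

# Sweeping beta yields a family of bounds, from the plain ensemble mean
# (beta = 0) to increasingly conservative estimates.
bounds = {beta: punished_q(ensemble, beta) for beta in (0.0, 0.5, 1.0)}
```

In this sketch a single scalar indexes the whole family of bounds, which is the kind of quantity a UVFA-style network could take as an extra input so that one network represents every trade-off; the conditioning itself is not shown here.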

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2110.09712 [cs.LG]
	(or arXiv:2110.09712v6 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2110.09712

Submission history

From: Sicen Li
[v1] Tue, 19 Oct 2021 03:35:01 UTC (3,035 KB)
[v2] Wed, 20 Oct 2021 00:59:22 UTC (3,035 KB)
[v3] Wed, 10 Nov 2021 03:35:29 UTC (4,488 KB)
[v4] Tue, 14 Jun 2022 02:24:31 UTC (2,754 KB)
[v5] Wed, 22 Jun 2022 10:55:58 UTC (2,754 KB)
[v6] Wed, 26 Oct 2022 08:42:31 UTC (3,035 KB)
