DiffCPS: Diffusion Model based Constrained Policy Search for Offline Reinforcement Learning

He, Longxiang; Shen, Li; Zhang, Linrui; Tan, Junbo; Wang, Xueqian

Computer Science > Machine Learning

arXiv:2310.05333 (cs)

[Submitted on 9 Oct 2023 (v1), last revised 28 Feb 2024 (this version, v2)]

Abstract: Constrained policy search (CPS) is a fundamental problem in offline reinforcement learning, which is generally solved by advantage weighted regression (AWR). However, previous methods may still encounter out-of-distribution actions due to the limited expressivity of Gaussian-based policies. On the other hand, directly applying state-of-the-art models with strong distribution expression capabilities (i.e., diffusion models) in the AWR framework is intractable, since AWR requires exact policy probability densities, which diffusion models do not provide in tractable form. In this paper, we propose a novel approach, $\textbf{Diffusion-based Constrained Policy Search}$ (dubbed DiffCPS), which tackles diffusion-based constrained policy search with the primal-dual method. The theoretical analysis reveals that strong duality holds for diffusion-based CPS problems, and upon introducing parameter approximation, an approximated solution can be obtained after $\mathcal{O}(1/\epsilon)$ dual iterations, where $\epsilon$ denotes the representation ability of the parametrized policy. Extensive experimental results based on the D4RL benchmark demonstrate the efficacy of our approach. We empirically show that DiffCPS achieves better or at least competitive performance compared to traditional AWR-based baselines as well as recent diffusion-based offline RL methods. The code is now available at this https URL.
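To make the primal-dual idea in the abstract concrete, the following is a minimal toy sketch and not the DiffCPS algorithm itself. It assumes a single state with three discrete actions, hypothetical values `q`, a uniform behavior policy `mu`, and a constraint of the form KL(pi || mu) <= kappa; the paper's actual constraint, policy class, and update rules may differ. In this toy setting the primal step has a closed form, pi(a) proportional to mu(a) * exp(q(a) / lam), which is the exponential weighting familiar from AWR-style methods and which needs explicit densities. The dual step raises the multiplier `lam` when the constraint is violated and lowers it otherwise.

```python
import math

def primal_step(q, mu, lam):
    """Maximizer of the Lagrangian over pi: pi(a) proportional to mu(a) * exp(q(a) / lam)."""
    q_max = max(q)  # subtract the max for numerical stability
    weights = [p * math.exp((v - q_max) / lam) for v, p in zip(q, mu)]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(pi, mu):
    """KL(pi || mu) for discrete distributions."""
    return sum(p * math.log(p / m) for p, m in zip(pi, mu) if p > 0)

def primal_dual(q, mu, kappa, lam=1.0, step=1.0, iters=500):
    """Alternate the closed-form primal step with a projected dual update on lam."""
    for _ in range(iters):
        pi = primal_step(q, mu, lam)
        # The dual gradient is (kappa - KL); lam grows while the constraint is violated.
        lam = max(1e-6, lam + step * (kl_divergence(pi, mu) - kappa))
    return primal_step(q, mu, lam), lam

# Hypothetical toy data, chosen only for illustration.
q = [1.0, 2.0, 3.0]
mu = [1 / 3, 1 / 3, 1 / 3]
kappa = 0.1
pi, lam = primal_dual(q, mu, kappa)
print(pi, lam, kl_divergence(pi, mu))
```

Because the constraint is active here, the iteration settles where KL(pi || mu) matches `kappa`, and the resulting policy still prefers higher-valued actions. The closed-form primal step is exactly what becomes unavailable when the policy is a diffusion model without tractable densities, which is the difficulty the abstract points to.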

Comments:	22 pages, 9 figures, 6 tables. Submitted to ICML 2024. arXiv admin note: text overlap with arXiv:1910.13393 by other authors
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2310.05333 [cs.LG]
	(or arXiv:2310.05333v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2310.05333

Submission history

From: Longx He
[v1] Mon, 9 Oct 2023 01:29:17 UTC (1,672 KB)
[v2] Wed, 28 Feb 2024 13:48:09 UTC (1,718 KB)
