Bayesian Design Principles for Offline-to-Online Reinforcement Learning

Hu, Hao; Yang, Yiqin; Ye, Jianing; Wu, Chengjie; Mai, Ziqing; Hu, Yujing; Lv, Tangjie; Fan, Changjie; Zhao, Qianchuan; Zhang, Chongjie

arXiv:2405.20984 (cs)

[Submitted on 31 May 2024]

Abstract: Offline reinforcement learning (RL) is crucial for real-world applications where exploration can be costly or unsafe. However, offline learned policies are often suboptimal, and further online fine-tuning is required. In this paper, we tackle the fundamental dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop. We show that Bayesian design principles are crucial in solving such a dilemma. Instead of adopting optimistic or pessimistic policies, the agent should act in a way that matches its belief in optimal policies. Such a probability-matching agent can avoid a sudden performance drop while still being guaranteed to find the optimal policy. Based on our theoretical findings, we introduce a novel algorithm that outperforms existing methods on various benchmarks, demonstrating the efficacy of our approach. Overall, the proposed approach provides a new perspective on offline-to-online RL that has the potential to enable more effective learning from offline data.
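
The probability-matching idea in the abstract can be illustrated with a toy sketch. The code below is not the paper's algorithm; it is plain Thompson sampling on a Bernoulli bandit, assuming Beta(1, 1) priors and a small logged dataset that never tried the best arm. All names (`posterior_from_offline`, `probability_matching_action`, `fine_tune_online`) and all numbers are hypothetical and chosen only for illustration.

```python
import random


def posterior_from_offline(offline_data, n_arms):
    """Turn logged (arm, reward) pairs into Beta posterior parameters.

    Assumes Bernoulli rewards (0 or 1) and a uniform Beta(1, 1) prior per arm.
    """
    alpha = [1.0] * n_arms
    beta = [1.0] * n_arms
    for arm, reward in offline_data:
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return alpha, beta


def probability_matching_action(alpha, beta, rng):
    """Draw one plausible mean reward per arm and act greedily on the draw.

    An arm is therefore chosen with the posterior probability that it is
    the best arm, which is what "probability matching" refers to here.
    """
    samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
    return max(range(len(samples)), key=samples.__getitem__)


def fine_tune_online(true_means, offline_data, steps, seed=0):
    """Continue learning online, starting from the offline posterior."""
    rng = random.Random(seed)
    alpha, beta = posterior_from_offline(offline_data, len(true_means))
    pulls = [0] * len(true_means)
    for _ in range(steps):
        arm = probability_matching_action(alpha, beta, rng)
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls, alpha, beta


if __name__ == "__main__":
    # Hypothetical setup: the log covers arms 0 and 1 only; arm 2 is best.
    logged = [(0, 1)] * 6 + [(0, 0)] * 14 + [(1, 1)] * 10 + [(1, 0)] * 10
    pulls, _, _ = fine_tune_online([0.3, 0.5, 0.9], logged, steps=2000)
    print(pulls)
```

In this sketch a purely pessimistic agent would keep pulling arm 1, the best arm seen in the log, while a purely optimistic one would abandon the log immediately. Sampling from the posterior sits between the two: the unseen arm is tried in proportion to how plausible it is that it is optimal.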

Comments:	Forty-first International Conference on Machine Learning (ICML), 2024
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2405.20984 [cs.LG]
	(or arXiv:2405.20984v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2405.20984

Submission history

From: Hao Hu
[v1] Fri, 31 May 2024 16:31:07 UTC (17,677 KB)
