SPO: Sequential Monte Carlo Policy Optimisation

Macfarlane, Matthew V; Toledo, Edan; Byrne, Donal; Duckworth, Paul; Laterre, Alexandre

arXiv:2402.07963 (cs)

[Submitted on 12 Feb 2024 (v1), last revised 31 Oct 2024 (this version, v3)]

Abstract: Leveraging planning during learning and decision-making is central to the long-term development of intelligent agents. Recent works have successfully combined tree-based search methods and self-play learning mechanisms to this end. However, these methods typically face scaling challenges due to the sequential nature of their search. While practical engineering solutions can partly overcome this, they often result in a negative impact on performance. In this paper, we introduce SPO: Sequential Monte Carlo Policy Optimisation, a model-based reinforcement learning algorithm grounded within the Expectation Maximisation (EM) framework. We show that SPO provides robust policy improvement and efficient scaling properties. The sample-based search makes it directly applicable to both discrete and continuous action spaces without modifications. We demonstrate statistically significant improvements in performance relative to model-free and model-based baselines across both continuous and discrete environments. Furthermore, the parallel nature of SPO's search enables effective utilisation of hardware accelerators, yielding favourable scaling laws.
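For readers unfamiliar with sample-based search, the following is a minimal sketch of a generic Sequential Monte Carlo planning loop, not the SPO algorithm itself. Everything in it is an illustrative assumption: the function name `smc_search`, the toy one-dimensional environment, the uniform proposal policy, and the use of `reward / temperature` as a log-weight are hypothetical choices and are not taken from the paper. It only shows why such a search needs nothing more than the ability to sample actions, and why each particle can be advanced independently of the others.

```python
import math
import random


def smc_search(root_state, sample_action, step, num_particles=64,
               depth=3, temperature=0.5, seed=0):
    """Generic SMC planning sketch (illustrative, not the paper's method).

    Returns the empirical distribution over first actions of the
    particles that survive resampling.
    """
    rng = random.Random(seed)
    states = [root_state] * num_particles
    first_actions = [None] * num_particles

    for t in range(depth):
        log_weights = []
        # Each particle is advanced independently, so this loop is
        # trivially parallelisable across particles.
        for i in range(num_particles):
            action = sample_action(states[i], rng)
            if t == 0:
                first_actions[i] = action
            states[i], reward = step(states[i], action)
            # Assumption: the reward acts as an unnormalised log-weight.
            log_weights.append(reward / temperature)

        # Normalise the weights in a numerically stable way.
        max_lw = max(log_weights)
        weights = [math.exp(lw - max_lw) for lw in log_weights]

        # Multinomial resampling: high-weight particles are duplicated,
        # low-weight particles are dropped.
        idx = rng.choices(range(num_particles), weights=weights,
                          k=num_particles)
        states = [states[j] for j in idx]
        first_actions = [first_actions[j] for j in idx]

    counts = {}
    for a in first_actions:
        counts[a] = counts.get(a, 0) + 1
    return {a: c / num_particles for a, c in counts.items()}


# Hypothetical toy problem: walk on the integers towards a target at 3.
TARGET = 3


def toy_sample_action(state, rng):
    # Uniform proposal over two discrete actions; a continuous proposal
    # (e.g. rng.gauss) would plug in the same way.
    return rng.choice([-1, 1])


def toy_step(state, action):
    next_state = state + action
    return next_state, -abs(next_state - TARGET)


dist = smc_search(0, toy_sample_action, toy_step)
```

In this toy setting the returned distribution concentrates on the action that moves towards the target. Swapping `toy_sample_action` for a sampler over real-valued actions requires no change to `smc_search`, which illustrates, in a simplified form, the abstract's point that a sample-based search applies to both discrete and continuous action spaces.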

Comments:	Accepted to NeurIPS 2024. 34 pages, 3 main figures
Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2402.07963 [cs.AI]
	(or arXiv:2402.07963v3 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2402.07963

Submission history

From: Matthew Macfarlane
[v1] Mon, 12 Feb 2024 10:32:47 UTC (3,122 KB)
[v2] Sun, 7 Jul 2024 09:48:13 UTC (801 KB)
[v3] Thu, 31 Oct 2024 17:05:49 UTC (1,618 KB)
