Fast Slate Policy Optimization: Going Beyond Plackett-Luce

Sakhi, Otmane; Rohde, David; Chopin, Nicolas

arXiv:2308.01566 (cs)

[Submitted on 3 Aug 2023 (v1), last revised 29 Dec 2023 (this version, v2)]

Abstract: An increasingly important building block of large-scale machine learning systems is based on returning slates: ordered lists of items given a query. Applications of this technology include search, information retrieval, and recommender systems. When the action space is large, decision systems are restricted to a particular structure so that online queries can be completed quickly. This paper addresses the optimization of these large-scale decision systems given an arbitrary reward function. We cast this learning problem in a policy optimization framework and propose a new class of policies, born from a novel relaxation of decision functions. This results in a simple yet efficient learning algorithm that scales to massive action spaces. We compare our method to the commonly adopted Plackett-Luce policy class and demonstrate the effectiveness of our approach on problems with action space sizes on the order of millions.

Comments:	Transactions on Machine Learning Research
Subjects:	Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Cite as:	arXiv:2308.01566 [cs.LG]
	(or arXiv:2308.01566v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2308.01566

Submission history

From: Otmane Sakhi
[v1] Thu, 3 Aug 2023 07:13:27 UTC (536 KB)
[v2] Fri, 29 Dec 2023 11:26:51 UTC (1,220 KB)
