Learning from Bandit Feedback: An Overview of the State-of-the-art

Jeunen, Olivier; Mykhaylov, Dmytro; Rohde, David; Vasile, Flavian; Gilotte, Alexandre; Bompaire, Martin

Computer Science > Information Retrieval

arXiv:1909.08471 (cs)

[Submitted on 18 Sep 2019]

Abstract: In machine learning we often try to optimise a decision rule that would have worked well over a historical dataset; this is the so-called empirical risk minimisation principle. In the context of learning from recommender system logs, applying this principle becomes a problem because the rewards of decisions that were not taken are not available. To handle this "bandit-feedback" setting, several Counterfactual Risk Minimisation (CRM) methods have been proposed in recent years, which attempt to estimate the performance of different policies on historical data. Through importance sampling and various variance reduction techniques, these methods allow more robust learning and inference than classical approaches. It is difficult to accurately estimate the performance of policies that frequently perform actions that were rarely taken in the past, and a number of different types of estimators have been proposed to address this.
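The abstract refers to importance sampling and variance reduction without spelling them out. As an illustration only (this is not the authors' code and does not use the RecoGym API), the sketch below assumes logged tuples of (action, reward, logging propensity) and shows the basic inverse propensity scoring (IPS) estimator together with two commonly used variance-reduction variants, weight clipping and self-normalisation (SNIPS). The function names, the clipping threshold and the toy click probabilities are hypothetical.

```python
import numpy as np


def importance_weights(logging_propensities, target_propensities):
    # w_i = pi(a_i | x_i) / pi_0(a_i | x_i): how much more (or less) often the
    # target policy pi would have taken the logged action than the logging policy pi_0.
    return np.asarray(target_propensities, dtype=float) / np.asarray(
        logging_propensities, dtype=float
    )


def ips_estimate(rewards, logging_propensities, target_propensities, clip=None):
    """Inverse propensity scoring (IPS) estimate of the target policy's mean reward.

    If `clip` is given, weights are capped at that value (clipped IPS), which
    lowers variance at the cost of a downward bias for non-negative rewards.
    """
    w = importance_weights(logging_propensities, target_propensities)
    if clip is not None:
        w = np.minimum(w, clip)
    return float(np.mean(w * np.asarray(rewards, dtype=float)))


def snips_estimate(rewards, logging_propensities, target_propensities):
    """Self-normalised IPS: divide by the sum of the weights instead of n."""
    w = importance_weights(logging_propensities, target_propensities)
    return float(np.sum(w * np.asarray(rewards, dtype=float)) / np.sum(w))


if __name__ == "__main__":
    # Hypothetical toy problem: one context, three actions with fixed click probabilities.
    rng = np.random.default_rng(0)
    click_prob = np.array([0.1, 0.5, 0.2])
    logging_policy = np.array([1 / 3, 1 / 3, 1 / 3])  # uniform logging policy pi_0
    target_policy = np.array([0.1, 0.8, 0.1])  # new policy pi to evaluate offline

    n = 100_000
    actions = rng.choice(3, size=n, p=logging_policy)  # actions recorded in the log
    # Only the reward of the logged action is observed (bandit feedback).
    rewards = (rng.random(n) < click_prob[actions]).astype(float)

    # Known only because this is a simulation: 0.1*0.1 + 0.8*0.5 + 0.1*0.2 = 0.43
    true_value = float(target_policy @ click_prob)
    p0, p = logging_policy[actions], target_policy[actions]
    print("true value :", true_value)
    print("IPS        :", ips_estimate(rewards, p0, p))
    print("clipped IPS:", ips_estimate(rewards, p0, p, clip=2.0))
    print("SNIPS      :", snips_estimate(rewards, p0, p))
```

In this toy setup the target policy's weight on the second action is 0.8 / (1/3) = 2.4, so capping weights at 2.0 pulls the clipped estimate below the true value; that bias is what clipping trades for lower variance. The weights grow further when the target policy favours actions the logging policy rarely took, which is the difficulty the abstract describes.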
In this paper, we review several methods, based on different off-policy estimators, for learning from bandit feedback. We discuss key differences and commonalities among existing approaches, and compare their empirical performance on the RecoGym simulation environment. To the best of our knowledge, this work is the first comparative study of bandit algorithms in a recommender system setting.

Subjects:	Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1909.08471 [cs.IR]
	(or arXiv:1909.08471v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.1909.08471

Submission history

From: Olivier Jeunen
[v1] Wed, 18 Sep 2019 14:26:28 UTC (222 KB)