Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

Bubeck, Sébastien; Cesa-Bianchi, Nicolò

arXiv:1204.5721 (cs)

[Submitted on 25 Apr 2012 (v1), last revised 3 Nov 2012 (this version, v2)]

Abstract: Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the Thirties, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this survey, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model.
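
As a rough illustration of the regret notion mentioned in the abstract, the sketch below computes a cumulative pseudo-regret for i.i.d. payoffs: the gap between the best arm's mean payoff and the mean payoff of the arm actually played, summed over rounds. This is not taken from the survey itself; the function name `pseudo_regret`, the three-armed instance, and the two choice sequences are all hypothetical, and the means are assumed known purely for illustration.

```python
from typing import Sequence


def pseudo_regret(means: Sequence[float], choices: Sequence[int]) -> float:
    """Cumulative pseudo-regret of a fixed sequence of arm choices.

    means[i] is the expected payoff of arm i (assumed known here, which a
    real player would not have); choices[t] is the arm played in round t.
    """
    best = max(means)
    # Each round contributes the gap between the best mean and the played arm's mean.
    return sum(best - means[arm] for arm in choices)


# Hypothetical three-armed instance; arm 1 has the highest mean payoff.
means = [0.5, 0.75, 0.25]

# Pure exploitation of the (unknown in practice) best arm: no regret.
always_best = [1] * 8

# Two rounds of trying every arm, then staying with arm 1.
explore_then_stay = [0, 1, 2, 0, 1, 2, 1, 1]

print(pseudo_regret(means, always_best))
print(pseudo_regret(means, explore_then_stay))
```

The second sequence pays a price only in the rounds spent on the suboptimal arms, which is the cost of exploration that the trade-off described above is meant to balance.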

Comments:	To appear in Foundations and Trends in Machine Learning
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1204.5721 [cs.LG]
	(or arXiv:1204.5721v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1204.5721

Submission history

From: Sebastien Bubeck
[v1] Wed, 25 Apr 2012 18:04:32 UTC (89 KB)
[v2] Sat, 3 Nov 2012 18:50:58 UTC (94 KB)