Best-of-Both-Worlds Algorithms for Linear Contextual Bandits
International Conference on Artificial Intelligence and Statistics, 2024•proceedings.mlr.press
Abstract

We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits. Our algorithms deliver near-optimal regret bounds in both the adversarial and stochastic regimes, without prior knowledge about the environment. In the stochastic regime, we achieve the polylogarithmic rate $\frac{(dK)^2\,\mathrm{poly}\!\log(dKT)}{\Delta_{\min}}$, where $\Delta_{\min}$ is the minimum suboptimality gap over the $d$-dimensional context space. In the adversarial regime, we obtain either a first-order bound, scaling with the cumulative loss of the best action, or a second-order bound, scaling with a notion of the cumulative second moment of the losses incurred by the algorithm. Moreover, we develop an algorithm based on FTRL with the Shannon entropy regularizer that does not require knowledge of the inverse of the covariance matrix, and that achieves polylogarithmic regret in the stochastic regime while retaining near-optimal regret bounds in the adversarial regime.
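To illustrate the FTRL-with-Shannon-entropy idea mentioned above in its simplest form, the sketch below runs FTRL on a plain $K$-armed bandit (no contexts, no covariance structure, so this is not the paper's algorithm). With the negative Shannon entropy regularizer, the FTRL optimum over the simplex has the closed form $p_t(a)\propto\exp(-\eta\,\widehat{L}_t(a))$, i.e. exponential weights over importance-weighted loss estimates. The function name `ftrl_shannon`, the fixed learning rate `eta`, and the loss oracle `loss_fn` are illustrative assumptions.

```python
import math
import random

def ftrl_shannon(K, T, loss_fn, eta=0.1, rng=None):
    """Minimal FTRL sketch with a Shannon entropy regularizer on K arms.

    Each round solves  min_p <L_hat, p> + (1/eta) * sum_a p(a) log p(a)
    over the simplex, whose solution is p(a) proportional to
    exp(-eta * L_hat(a)).  Losses of unplayed arms are handled via
    importance-weighted estimates (bandit feedback).
    """
    rng = rng or random.Random(0)
    L_hat = [0.0] * K          # cumulative importance-weighted loss estimates
    total_loss = 0.0
    for t in range(T):
        # Closed-form FTRL step; subtract the min for numerical stability.
        m = min(L_hat)
        w = [math.exp(-eta * (l - m)) for l in L_hat]
        Z = sum(w)
        p = [x / Z for x in w]
        # Sample an arm from p and observe only that arm's loss.
        a = rng.choices(range(K), weights=p)[0]
        loss = loss_fn(t, a)
        total_loss += loss
        # Unbiased estimator: E[(loss / p(a)) * 1{A_t = a}] = loss(a).
        L_hat[a] += loss / p[a]
    return total_loss
```

On a toy instance where arm 0 always has loss 0 and all other arms have loss 1, the play distribution quickly concentrates on arm 0, so the cumulative loss stays far below $T$.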