Showing 1–13 of 13 results for author: Kaptein, M

  1. arXiv:2408.14319  [pdf, other]

    cs.LG stat.ML

    Rethinking Knowledge Transfer in Learning Using Privileged Information

    Authors: Danil Provodin, Bram van den Akker, Christina Katsimerou, Maurits Kaptein, Mykola Pechenizkiy

    Abstract: In supervised machine learning, privileged information (PI) is information that is unavailable at inference, but is accessible during training time. Research on learning using privileged information (LUPI) aims to transfer the knowledge captured in PI onto a model that can perform inference without PI. It seems that this extra bit of information ought to make the resulting model better. However, f…

    Submitted 26 August, 2024; originally announced August 2024.

  2. arXiv:2405.19017  [pdf, other]

    cs.LG

    Efficient Exploration in Average-Reward Constrained Reinforcement Learning: Achieving Near-Optimal Regret With Posterior Sampling

    Authors: Danil Provodin, Maurits Kaptein, Mykola Pechenizkiy

    Abstract: We present a new algorithm based on posterior sampling for learning in Constrained Markov Decision Processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous empirically compared to the existing algorithms. Our main theoretical result is a Bayesian regret bound for each cost component of $\tilde{O} (DS\sqrt{AT})$ for an…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: To appear at ICML'24

  3. arXiv:2309.15737  [pdf, other]

    cs.LG

    Provably Efficient Exploration in Constrained Reinforcement Learning: Posterior Sampling Is All You Need

    Authors: Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

    Abstract: We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous empirically compared to the existing algorithms. Our main theoretical result is a Bayesian regret bound for each cost component of $\tilde{O} (HS \sqrt{AT})$ for any…

    Submitted 27 September, 2023; originally announced September 2023.

  4. arXiv:2209.03596  [pdf, other]

    cs.LG

    An Empirical Evaluation of Posterior Sampling for Constrained Reinforcement Learning

    Authors: Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

    Abstract: We study a posterior sampling approach to efficient exploration in constrained reinforcement learning. Alternatively to existing algorithms, we propose two simple algorithms that are more efficient statistically, simpler to implement and computationally cheaper. The first algorithm is based on a linear formulation of CMDP, and the second algorithm leverages the saddle-point formulation of CMDP. Ou…

    Submitted 8 September, 2022; originally announced September 2022.

  5. arXiv:2202.06657  [pdf, other]

    cs.LG

    The Impact of Batch Learning in Stochastic Linear Bandits

    Authors: Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

    Abstract: We consider a special case of bandit problems, named batched bandits, in which an agent observes batches of responses over a certain time period. Unlike previous work, we consider a more practically relevant batch-centric scenario of batch learning. That is to say, we provide a policy-agnostic regret analysis and demonstrate upper and lower bounds for the regret of a candidate policy. Our main the…

    Submitted 1 September, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: This is a longer version of the paper published at ICDM'22. arXiv admin note: text overlap with arXiv:2111.02071

  6. arXiv:2111.02071  [pdf, other]

    cs.LG stat.ML

    The Impact of Batch Learning in Stochastic Bandits

    Authors: Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

    Abstract: We consider a special case of bandit problems, namely batched bandits. Motivated by natural restrictions of recommender systems and e-commerce platforms, we assume that a learning agent observes responses batched in groups over a certain time period. Unlike previous work, we consider a more practically relevant batch-centric scenario of batch learning. We provide a policy-agnostic regret analysis…

    Submitted 3 November, 2021; originally announced November 2021.

    Comments: To appear at the workshop on the Ecological Theory of Reinforcement Learning, NeurIPS 2021

  7. arXiv:1908.07808  [pdf, other]

    cs.LG stat.ML

    Exploring Offline Policy Evaluation for the Continuous-Armed Bandit Problem

    Authors: Jules Kruijswijk, Petri Parvinen, Maurits Kaptein

    Abstract: The (contextual) multi-armed bandit problem (MAB) provides a formalization of sequential decision-making which has many applications. However, validly evaluating MAB policies is challenging; we either resort to simulations which inherently include debatable assumptions, or we resort to expensive field trials. Recently an offline evaluation method has been suggested that is based on empirical data,…

    Submitted 21 August, 2019; originally announced August 2019.

  8. arXiv:1904.09339  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    Continuous-Time Birth-Death MCMC for Bayesian Regression Tree Models

    Authors: Reza Mohammadi, Matthew Pratola, Maurits Kaptein

    Abstract: Decision trees are flexible models that are well suited for many statistical regression problems. In a Bayesian framework for regression trees, Markov Chain Monte Carlo (MCMC) search algorithms are required to generate samples of tree models according to their posterior probabilities. The critical component of such an MCMC algorithm is to construct good Metropolis-Hastings steps for updating the t…

    Submitted 26 October, 2020; v1 submitted 19 April, 2019; originally announced April 2019.

    Comments: Published at http://jmlr.org/papers/v21/19-307 in the Journal of Machine Learning Research (https://www.jmlr.org)

    Journal ref: Journal of Machine Learning Research 2020, Vol. 21, No. 201, 1-26

  9. arXiv:1811.01926  [pdf, other]

    cs.LG math.OC stat.ML

    contextual: Evaluating Contextual Multi-Armed Bandit Problems in R

    Authors: Robin van Emden, Maurits Kaptein

    Abstract: Over the past decade, contextual bandit algorithms have been gaining in popularity due to their effectiveness and flexibility in solving sequential decision problems---from online advertising and finance to clinical trial design and personalized medicine. At the same time, there are, as of yet, surprisingly few options that enable researchers and practitioners to simulate and compare the wealth of…

    Submitted 1 January, 2020; v1 submitted 6 November, 2018; originally announced November 2018.

    Comments: 55 pages, 12 figures

    MSC Class: I.2.6; K.4.4; F.2.0 ACM Class: I.2.6; K.4.4; F.2.0

  10. arXiv:1802.10529  [pdf, other]

    cs.LG stat.CO stat.ML

    Maximum likelihood estimation of a finite mixture of logistic regression models in a continuous data stream

    Authors: Maurits Kaptein, Paul Ketelaar

    Abstract: In marketing we are often confronted with a continuous stream of responses to marketing messages. Such streaming data provide invaluable information regarding message effectiveness and segmentation. However, streaming data are hard to analyze using conventional methods: their high volume and the fact that they are continuously augmented means that it takes considerable time to analyze them. We pro…

    Submitted 28 February, 2018; originally announced February 2018.

    Comments: 1 figure. Working paper including [R] package

  11. arXiv:1602.06700  [pdf, other]

    cs.HC cs.CY

    StreamingBandit; Experimenting with Bandit Policies

    Authors: Jules Kruijswijk, Robin van Emden, Petri Parvinen, Maurits Kaptein

    Abstract: A large number of statistical decision problems in the social sciences and beyond can be framed as a (contextual) multi-armed bandit problem. However, it is notoriously hard to develop and evaluate policies that tackle these types of problem, and to use such policies in applied studies. To address this issue, this paper introduces StreamingBandit, a Python web application for developing and testin…

    Submitted 4 September, 2018; v1 submitted 22 February, 2016; originally announced February 2016.

    Comments: 47 pages, 15 figures, accepted for publication in Journal of Statistical Software (JSS)

  12. arXiv:1502.00598  [pdf, other]

    cs.LG

    Lock in Feedback in Sequential Experiments

    Authors: Maurits Kaptein, Davide Iannuzzi

    Abstract: We often encounter situations in which an experimenter wants to find, by sequential experimentation, $x_{max} = \arg\max_{x} f(x)$, where $f(x)$ is a (possibly unknown) function of a well controllable variable $x$. Taking inspiration from physics and engineering, we have designed a new method to address this problem. In this paper, we first introduce the method in continuous time, and then present…

    Submitted 12 January, 2016; v1 submitted 2 February, 2015; originally announced February 2015.

    Comments: 20 pages, 7 figures

  13. arXiv:1410.4009  [pdf, other]

    cs.LG stat.CO stat.ML

    Thompson sampling with the online bootstrap

    Authors: Dean Eckles, Maurits Kaptein

    Abstract: Thompson sampling provides a solution to bandit problems in which new observations are allocated to arms with the posterior probability that an arm is optimal. While sometimes easy to implement and asymptotically optimal, Thompson sampling can be computationally demanding in large scale bandit problems, and its performance is dependent on the model fit to the observed data. We introduce bootstrap…

    Submitted 15 October, 2014; originally announced October 2014.

    Comments: 13 pages, 4 figures

    MSC Class: 68W27; 62L05 ACM Class: G.3; I.2.6