8th International Conference on Performance Evaluation Methodologies and Tools

Research Article

Exploration vs Exploitation with Partially Observable Gaussian Autoregressive Arms

Cite
BibTeX
@INPROCEEDINGS{10.4108/icst.valuetools.2014.258207,
    author={Julia Kuhn and Michel Mandjes and Yoni Nazarathy},
    title={Exploration vs Exploitation with Partially Observable Gaussian Autoregressive Arms},
    proceedings={8th International Conference on Performance Evaluation Methodologies and Tools},
    publisher={EAI},
    proceedings_a={VALUETOOLS},
    year={2015},
    month={2},
    keywords={restless bandits; partially observable; whittle index; performance evaluation; asymptotic dynamics},
    doi={10.4108/icst.valuetools.2014.258207}
}
    
Julia Kuhn1,*, Michel Mandjes2, Yoni Nazarathy3
  • 1: The University of Queensland, University of Amsterdam
  • 2: University of Amsterdam
  • 3: The University of Queensland
*Contact email: [email protected]

Abstract

We consider a restless bandit problem with Gaussian autoregressive arms, in which the state of an arm is observed only when that arm is played and the state-dependent reward is collected. Because the arms are only partially observable, a good decision policy must account for the fact that information about an arm's state becomes increasingly outdated while the arm is not played. The decision maker thus faces a tradeoff between exploiting the arms currently believed to be most rewarding (i.e., those with the largest conditional mean) and exploring arms with a high conditional variance. Moreover, the policy should remain tractable despite the infinite state space, also in systems with many arms. One policy that gives some priority to exploration is the Whittle index policy, for which we establish structural properties. These motivate a parametric index policy that is computationally much simpler than the Whittle index yet can still outperform the myopic policy. Furthermore, we examine the many-arm behavior of the system under the parametric policy, identifying equations that describe its asymptotic dynamics. Based on these insights we provide a simple heuristic algorithm for evaluating the performance of index policies, which we use to optimize the parametric index.
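As a concrete illustration of the setup in the abstract, the sketch below simulates Gaussian AR(1) arms where only the played arm is observed, and selects arms by a generic parametric index of the form (conditional mean) + α · (conditional standard deviation). The AR(1) recursion, the closed-form belief propagation, and this particular index form are illustrative assumptions, not the paper's actual construction; setting α = 0 recovers the myopic policy mentioned in the abstract.

```python
import numpy as np

def simulate_index_policy(n_arms=10, horizon=1000, phi=0.9, sigma=1.0,
                          alpha=0.5, seed=0):
    """Simulate a parametric index policy on partially observable
    Gaussian AR(1) arms (illustrative model, not the paper's exact one).

    Each arm evolves as X_{t+1} = phi * X_t + sigma * W_t with i.i.d.
    standard-normal noise W_t.  Only the played arm is observed; for the
    other arms the conditional (belief) distribution stays Gaussian and
    is propagated in closed form.  The hypothetical index mu + alpha *
    sqrt(v) trades off exploitation (large conditional mean mu) against
    exploration (large conditional variance v).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_arms)                    # true hidden states
    mu = np.zeros(n_arms)                          # conditional means
    v = np.full(n_arms, sigma**2 / (1 - phi**2))   # stationary variance
    total_reward = 0.0
    for _ in range(horizon):
        # play the arm with the largest parametric index
        arm = int(np.argmax(mu + alpha * np.sqrt(v)))
        total_reward += x[arm]          # reward = state of the played arm
        # the played arm is observed exactly, so its belief collapses
        mu[arm], v[arm] = x[arm], 0.0
        # all arms evolve; beliefs propagate through the AR(1) recursion
        x = phi * x + sigma * rng.normal(size=n_arms)
        mu = phi * mu
        v = phi**2 * v + sigma**2
    return total_reward / horizon
```

After an arm has gone unobserved for τ steps, the recursion above yields the closed-form belief variance σ²(1 − φ^{2τ})/(1 − φ²), which grows toward the stationary variance — this is the "information becomes outdated" effect that the exploration term of the index compensates for.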

Keywords
restless bandits, partially observable, whittle index, performance evaluation, asymptotic dynamics
Published
2015-02-19
Publisher
EAI
Appears in
ACM Digital Library
http://dx.doi.org/10.4108/icst.valuetools.2014.258207