
Showing 1–10 of 10 results for author: Meshram, R

Searching in archive cs.
  1. arXiv:2305.00410

    cs.LG eess.SY stat.ML

    Indexability of Finite State Restless Multi-Armed Bandit and Rollout Policy

    Authors: Vishesh Mittal, Rahul Meshram, Deepak Dev, Surya Prakash

    Abstract: We consider the finite-state restless multi-armed bandit problem. In each time step, the decision maker can act on M of the N bandits. Playing an arm (the active arm) yields a state-dependent reward based on the action; when the arm is not played, it also yields a reward based on its state and action. The objective of the decision maker is to maximize the infinite-horizon discounted reward. The cla…

    Submitted 30 April, 2023; originally announced May 2023.

    Comments: 15 Pages, submitted to conference

  2. arXiv:2108.00892

    cs.LG eess.SY

    Indexability and Rollout Policy for Multi-State Partially Observable Restless Bandits

    Authors: Rahul Meshram, Kesav Kaza

    Abstract: Restless multi-armed bandits with partially observable states have applications in communication systems, age of information, and recommendation systems. In this paper, we study multi-state partially observable restless bandit models. We consider three different models based on the information observable to the decision maker: 1) no information is observable from the actions of a bandit; 2) perfect information…

    Submitted 29 July, 2021; originally announced August 2021.

    Comments: 8 pages, submitted to CDC

  3. arXiv:2102.04321

    eess.SY cs.LG

    Monte Carlo Rollout Policy for Recommendation Systems with Dynamic User Behavior

    Authors: Rahul Meshram, Kesav Kaza

    Abstract: We model online recommendation systems using the hidden Markov multi-state restless multi-armed bandit problem. To solve it, we present a Monte Carlo rollout policy. We show numerically that the Monte Carlo rollout policy performs better than the myopic policy for arbitrary transition dynamics with no specific structure. However, when some structure is imposed on the transition dynamics, the myopic policy pe…

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: 5 Pages, 4 figures, conference COMSNETS 2021

  4. arXiv:2007.12933

    eess.SY cs.LG

    Simulation Based Algorithms for Markov Decision Processes and Multi-Action Restless Bandits

    Authors: Rahul Meshram, Kesav Kaza

    Abstract: We consider multi-dimensional Markov decision processes and formulate a long-term discounted reward optimization problem. Two simulation-based algorithms, the Monte Carlo rollout policy and the parallel rollout policy, are studied, and various properties of these policies are discussed. We next consider a restless multi-armed bandit (RMAB) with a multi-dimensional state space and multi-action bandit model…

    Submitted 25 July, 2020; originally announced July 2020.

    Comments: 3 Figures

  5. arXiv:1910.01860

    cs.GT eess.SY

    Online repeated posted price auctions with a demand side platform

    Authors: Rahul Meshram, Kesav Kaza

    Abstract: We consider an online ad network problem in which an ad exchange auctions ad slots and intermediaries called demand-side platforms (DSPs) buy these ad slots for their clients (advertisers). An intermediary represents multiple advertisers. Different types of ad slots are auctioned by the ad exchange, e.g., video ads, banner ads, etc. We study repeated posted-price auctions for homogeneous and heteroge…

    Submitted 4 October, 2019; originally announced October 2019.

  6. arXiv:1904.08962

    eess.SY cs.LG

    Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems

    Authors: Kesav Kaza, Rahul Meshram, Varun Mehta, S. N. Merchant

    Abstract: This paper studies a class of constrained restless multi-armed bandits (CRMAB). The constraints take the form of a time-varying set of actions (the set of available arms). This variation can be either stochastic or semi-deterministic. Given a set of arms, a fixed number of them can be chosen to be played in each decision interval. Playing each arm yields a state-dependent reward. The current state…

    Submitted 6 September, 2021; v1 submitted 18 April, 2019; originally announced April 2019.

    Comments: 17 pages, 2 figures

  7. arXiv:1803.08651

    cs.IR cs.LG stat.ML

    Learning Recommendations While Influencing Interests

    Authors: Rahul Meshram, D. Manjunath, Nikhil Karamchandani

    Abstract: Personalized recommendation systems (RS) are extensively used in many services. Many of these are based on learning algorithms in which the RS uses the recommendation history and the user response to learn an optimal strategy. Further, these algorithms assume that user interests are rigid. Specifically, they do not account for the effect of the learning strategy on the evolution…

    Submitted 23 March, 2018; originally announced March 2018.

    Comments: 13 pages, submitted to conference

  8. arXiv:1801.01301

    eess.SY cs.IT

    Sequential Decision Making with Limited Observation Capability: Application to Wireless Networks

    Authors: Kesav Kaza, Rahul Meshram, Varun Mehta, S. N. Merchant

    Abstract: This work studies a generalized class of restless multi-armed bandits with hidden states that allow cumulative feedback, as opposed to the conventional instantaneous feedback. We call them lazy restless bandits (LRB) because decision-making events are sparser than state-transition events. Hence, the feedback after each decision event is the cumulative effect of the following state-transition event…

    Submitted 29 January, 2019; v1 submitted 4 January, 2018; originally announced January 2018.

  9. arXiv:1603.09233

    cs.LG

    Optimal Recommendation to Users that React: Online Learning for a Class of POMDPs

    Authors: Rahul Meshram, Aditya Gopalan, D. Manjunath

    Abstract: We describe and study a model for an Automated Online Recommendation System (AORS) in which a user's preferences can be time-dependent and can also depend on the history of past recommendations and play-outs. The three key features that make the model more realistic than existing models for recommendation systems are (1) user preference is inherently latent, (2) current recommendatio…

    Submitted 30 March, 2016; originally announced March 2016.

    Comments: 8 pages, submitted to conference

  10. arXiv:1110.0310

    cs.NI stat.CO

    Joint Routing, Scheduling And Power Control For Multihop Wireless Networks With Multiple Antennas

    Authors: Harish Vangala, Rahul Meshram, Prof. Vinod Sharma

    Abstract: We consider the Joint Routing, Scheduling and Power-control (JRSP) problem for multihop wireless networks (MHWN) with multiple antennas. We extend the problem and a (sub-optimal) heuristic solution method from JRSP in MHWN with single antennas. We present an iterative scheme to calculate link capacities (achievable rates) in the interference environment of the network using the SINR model. We…

    Submitted 3 October, 2011; originally announced October 2011.

    Comments: Submitted to NCC-2012. First Draft is here. Final version has many changes