Introduction To Bandits: (Some Slides Stolen From Csaba's AAAI Tutorial)
Assumptions:
1. Stochasticity: Each time an arm is pulled, its reward is sampled i.i.d. from that arm's fixed underlying distribution.
2. Finiteness and Independence: The number of arms is finite, and the reward of each arm is independent of the rewards of the other arms.
3. Stationarity: The reward distributions of the arms do not change over time.
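A minimal environment sketch in Python embodying all three assumptions (the Gaussian reward distributions and the arm means here are illustrative, not from the slides):

import numpy as np

class StochasticBandit:
    # K-armed bandit: fixed, independent, stationary reward distributions.
    def __init__(self, means, rng=None):
        self.means = np.asarray(means)            # one fixed mean per arm (stationarity)
        self.rng = rng or np.random.default_rng()

    def pull(self, i):
        # Reward is drawn i.i.d. from arm i's own distribution (stochasticity),
        # independently of the other arms (independence).
        return self.rng.normal(self.means[i], 1.0)

env = StochasticBandit([0.1, 0.5, 0.9])
reward = env.pull(2)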
Introduction
UPDATE: Use the mean of the rewards obtained from pulling arm i as the empirical estimate of that arm's expected reward.
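This estimate can be maintained incrementally; a minimal sketch (the arm count K and the running arrays are illustrative assumptions):

import numpy as np

K = 3
counts = np.zeros(K)   # number of pulls of each arm
means = np.zeros(K)    # empirical mean reward of each arm

def update(i, reward):
    # Incremental update of the empirical mean for arm i
    counts[i] += 1
    means[i] += (reward - means[i]) / counts[i]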
(Gap-dependent Bound)
(Gap-free Bound)
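For reference, these are the standard forms of the two bounds for UCB-style algorithms (stated from the standard literature, not recovered from the slides), with K arms, horizon T, and suboptimality gaps \Delta_i:

R_T = O\Big( \sum_{i : \Delta_i > 0} \frac{\log T}{\Delta_i} \Big)   (gap-dependent)

R_T = O\big( \sqrt{K T \log T} \big)   (gap-free)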
Epsilon-Greedy
Update
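A minimal epsilon-greedy selection sketch, reusing the empirical means above (the exploration rate eps is a free parameter):

import numpy as np

def epsilon_greedy(means, eps, rng):
    # With probability eps explore a uniformly random arm;
    # otherwise exploit the arm with the highest empirical mean.
    if rng.random() < eps:
        return int(rng.integers(len(means)))
    return int(np.argmax(means))

arm = epsilon_greedy(means, 0.1, np.random.default_rng())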
● Linear Bandits: The expected reward of each arm is a linear function of a known feature vector for that arm.
● Combinatorial Bandits: The space of arms is structured by a combinatorial constraint, e.g., each arm is a feasible subset of a set of base arms (both structures are sketched below).
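A toy sketch of the two reward structures (the parameter theta, the features, and the pair constraint are illustrative assumptions):

import numpy as np
from itertools import combinations

theta = np.array([0.5, -0.2])        # unknown parameter the learner must estimate
features = np.array([[1.0, 0.0],     # one feature vector per base arm
                     [0.0, 1.0],
                     [0.7, 0.7]])

# Linear bandit: expected reward is linear in the arm's features.
expected_rewards = features @ theta

# Combinatorial bandit: each "arm" is a feasible subset of base arms
# (here: any pair), with reward summed over the chosen subset.
subsets = list(combinations(range(len(features)), 2))
subset_rewards = [expected_rewards[list(s)].sum() for s in subsets]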
Contextual Bandits
Linear Bandits
UPDATE: Estimate the unknown parameter vector from all observed feature-reward pairs, e.g., by regularized least squares (a sketch follows).
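A minimal sketch of such a ridge (regularized least-squares) update, with illustrative dimension and regularizer:

import numpy as np

lam, d = 1.0, 2            # ridge regularizer and feature dimension (assumptions)
A = lam * np.eye(d)        # running X^T X + lam * I
b = np.zeros(d)            # running X^T y

def observe(x, reward):
    global A, b
    A += np.outer(x, x)
    b += reward * x

def theta_hat():
    # Ridge estimate: (X^T X + lam * I)^{-1} X^T y
    return np.linalg.solve(A, b)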
(Non)-Linear Bandits
Epsilon-Greedy
- O(n^{2/3}) regret
+ Easy to extend to non-linear bandits

LinUCB
+ O(√n) regret (up to log factors)
- Harder to extend to non-linear bandits

Bootstrapping
- Theory not well developed
+ Only point estimates need to be computed
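A sketch of the LinUCB selection rule built on the ridge statistics A and b above (alpha is an exploration parameter; the exact confidence width varies across presentations):

import numpy as np

def linucb_choose(A, b, arm_features, alpha):
    # Pick the arm maximizing estimated reward plus a confidence bonus.
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b
    scores = [x @ theta + alpha * np.sqrt(x @ A_inv @ x) for x in arm_features]
    return int(np.argmax(scores))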
Bandits everywhere!
● Adversarial Bandits (relaxing assumption 1)
● Gaussian process Bandits (relaxing assumption 2)
● Restless Bandits (relaxing assumption 3)
● Rotting Bandits
● Dueling Bandits
● Firing Bandits
● ………….