

Introduction to bandits

(some slides stolen from Csaba’s AAAI tutorial)


Motivation
In a clinical trial, we do not have complete information about the
effectiveness or side-effects of the drugs.
Aim: infer the best drug by running a sequence of trials.
Mapping to a bandit algorithm:
● Each drug choice is mapped to an arm, and its reward is mapped to the drug's effectiveness.
● Administering a drug is an action and is equivalent to pulling the corresponding arm.
● The trial goes on for n rounds.
Other applications: Recommender Systems, Viral Marketing, Network Routing, Ad Placement
Introduction

Assumptions:
1. Stochasticity: The reward for each arm is sampled from its underlying distribution, which is unknown to the learner.
2. Finiteness and Independence: The number of arms is finite and the reward for each arm is independent of the others.
3. Stationarity: The reward distributions of the arms do not change over time.
Introduction

Bandits are a special, tractable case of RL (actions do not affect future states or rewards)

Performance Metric: Cumulative regret

Results in an exploration-exploitation trade-off:


Exploration: Pull an arm to learn more about it.
Exploitation: Pull the arm that currently has the highest estimated reward.
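
For reference, a standard way to write the cumulative regret over n rounds (with \mu^* the mean reward of the best arm and X_t the reward received in round t; notation added here for concreteness):

R_n = n \mu^* - E[ \sum_{t=1}^{n} X_t ]

An algorithm balances exploration and exploitation well if its regret grows sublinearly, i.e. R_n / n -> 0.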
Multi-armed bandits
OBSERVE: The reward is observed immediately upon pulling an arm. Rewards are scalars bounded in the [0,1] interval.

UPDATE: Use the mean of the rewards obtained from pulling arm i as the empirical reward estimate for that arm.

SELECT: Explore-Then-Commit, Epsilon-Greedy, Upper Confidence Bound, Thompson sampling
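
A minimal sketch of the OBSERVE and UPDATE steps in Python (the Bernoulli environment and all names below are illustrative, not taken from the slides):

import random

class BernoulliBandit:
    """k arms; pulling arm i yields a 0/1 reward with unknown mean means[i]."""
    def __init__(self, means):
        self.means = means

    def pull(self, i):
        # OBSERVE: the reward is seen immediately and lies in [0, 1]
        return 1.0 if random.random() < self.means[i] else 0.0


class EmpiricalMeans:
    """UPDATE: keep a running mean of the rewards observed for each arm."""
    def __init__(self, k):
        self.counts = [0] * k
        self.means = [0.0] * k

    def update(self, i, reward):
        self.counts[i] += 1
        # incremental form of the empirical mean
        self.means[i] += (reward - self.means[i]) / self.counts[i]

The SELECT strategies below differ only in which arm they choose given these statistics.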
Explore-Then-Commit
When to commit: the length of the exploration phase can be tuned using knowledge of the gap (gap-dependent bound) or without it (gap-free bound); standard forms of both bounds are sketched below.
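
For reference, standard forms of these bounds (following Lattimore and Szepesvári's analysis for two arms with 1-subgaussian rewards, gap \Delta, and m exploration pulls per arm; the exact expressions on the original slides may differ):

Gap-dependent:  R_n \le m \Delta + (n - 2m) \Delta \exp(-m \Delta^2 / 4)
Gap-free:       choosing m \propto n^{2/3} without knowing \Delta gives R_n = O(n^{2/3})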
Epsilon-Greedy

+ Interleaves exploration and exploitation.
+ Doesn’t require knowledge of the gap or the horizon.
+ Popularly used and works well in practice.
- Performance is sensitive to the choice of epsilon.
- Results in suboptimal O(n^{2/3}) regret.
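
A minimal epsilon-greedy selection rule, written against the empirical means maintained in the UPDATE step above (names are illustrative):

import random

def select_epsilon_greedy(est_means, epsilon=0.1):
    """With probability epsilon pull a uniformly random arm (explore);
    otherwise pull the arm with the highest empirical mean (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(est_means))
    return max(range(len(est_means)), key=lambda i: est_means[i])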
Optimism in the face of uncertainty

Pull the arm with the largest upper confidence bound: its empirical mean plus an exploration bonus that shrinks as the arm is pulled more often.

+ Doesn’t require knowledge of the gap or the horizon.
+ Results in near-optimal regret.
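
A sketch of a UCB1-style rule; the sqrt(2 log t / pulls) bonus is the classic choice, though constants vary across variants:

import math

def select_ucb(est_means, counts, t):
    """Pick the arm with the largest optimistic estimate:
    empirical mean + sqrt(2 * log(t) / number of pulls)."""
    for i, c in enumerate(counts):
        if c == 0:
            return i  # pull every arm at least once
    scores = [est_means[i] + math.sqrt(2.0 * math.log(t) / counts[i])
              for i in range(len(est_means))]
    return max(range(len(scores)), key=lambda i: scores[i])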
Thompson sampling
P_i is the posterior distribution (conditioned on the observed rewards) for arm i.
Select: sample a mean reward from each arm's posterior P_i and pull the arm whose sample is largest.
Update: after observing the reward, update the posterior of the pulled arm.

+ Simple to implement. Only requires a sampling procedure.
+ Theoretically, it results in near-optimal regret.
+ Often works better than UCB in practice.
- In some variants, it tends to over-explore.
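
A concrete sketch of Thompson sampling for Bernoulli rewards with a Beta posterior (the uniform Beta(1, 1) prior is an illustrative choice):

import random

def select_thompson(successes, failures):
    """Sample a mean from each arm's Beta(1 + s, 1 + f) posterior
    and pull the arm with the largest sample."""
    samples = [random.betavariate(1 + s, 1 + f)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])

def update_thompson(successes, failures, arm, reward):
    """Posterior update for a 0/1 reward on the pulled arm."""
    if reward >= 0.5:
        successes[arm] += 1
    else:
        failures[arm] += 1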


Structured Bandits
● Arms (choices) can be related by a structural assumption on the action space or according to their corresponding features, e.g. items in a recommender system.
● In problems with a large number of arms, learning about each arm separately is inefficient.
● Contextual Bandits: Each arm j has a feature vector x_j and there exists an unknown function f such that the expected reward of arm j is f(x_j).
● Linear Bandits: the expected reward is linear in the features, i.e. f(x_j) = ⟨θ*, x_j⟩ for an unknown parameter vector θ*.
● Combinatorial Bandits: The arms are related according to a combinatorial constraint.
Contextual Bandits
UPDATE: fit a model that predicts the reward from the arm features using the observed (feature, reward) pairs.

Linear Bandits: estimate θ* by (regularized) least squares on the observed feature-reward pairs.
(Non)-Linear Bandits
Epsilon-Greedy
- O(n^{2/3}) regret
+ Easy to extend for non-linear bandits

LinUCB

- Don’t know how to construct confidence intervals for complex functions
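
A compact sketch of a shared-parameter LinUCB: ridge regression for the estimate plus an ellipsoidal confidence bonus (the regularizer and the bonus scale alpha are illustrative choices):

import numpy as np

class LinUCB:
    """Linear bandit with optimism: assumes E[reward] = <theta*, x>.
    Maintains A = reg*I + sum x x^T and b = sum reward * x."""
    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha
        self.A = reg * np.eye(dim)
        self.b = np.zeros(dim)

    def select(self, arm_features):
        A_inv = np.linalg.inv(self.A)
        theta_hat = A_inv @ self.b
        # estimated reward + confidence width for each candidate arm
        scores = [x @ theta_hat + self.alpha * np.sqrt(x @ A_inv @ x)
                  for x in arm_features]
        return int(np.argmax(scores))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x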
(Non)-Linear Bandits
Thompson sampling
+ O(d n^{1/2}) regret
+ Can use approximate sampling procedures for complex functions

Bootstrapping
- The theory is not well developed.
+ Only need to compute point estimates.
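
A sketch of linear Thompson sampling, which replaces the confidence interval with a draw from a Gaussian posterior over the parameter (the noise scale v and regularizer are illustrative):

import numpy as np

class LinearThompson:
    """Sample theta ~ N(theta_hat, v^2 * A^{-1}) and act greedily on the sample."""
    def __init__(self, dim, v=1.0, reg=1.0):
        self.v = v
        self.A = reg * np.eye(dim)
        self.b = np.zeros(dim)
        self.rng = np.random.default_rng()

    def select(self, arm_features):
        A_inv = np.linalg.inv(self.A)
        theta_hat = A_inv @ self.b
        theta_sample = self.rng.multivariate_normal(theta_hat, self.v ** 2 * A_inv)
        return int(np.argmax([x @ theta_sample for x in arm_features]))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x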
Bandits everywhere!
● Adversarial Bandits (relaxing assumption 1)
● Gaussian process Bandits (relaxing assumption 2)
● Restless Bandits (relaxing assumption 3)
● Rotting Bandits
● Duelling Bandits
● Firing Bandits
● ………….

Different objective functions:
● Best-arm identification
● Bayesian bandits
