
CS 747, Autumn 2023: Lecture 4

Shivaram Kalyanakrishnan

Department of Computer Science and Engineering


Indian Institute of Technology Bombay

Autumn 2023



Multi-armed Bandits
The exploration-exploitation dilemma
Definitions: Bandit, Algorithm
ϵ-greedy algorithms
Evaluating algorithms: Regret
Achieving sub-linear regret
A lower bound on regret
UCB, KL-UCB algorithms
Thompson Sampling algorithm

Understanding Thompson Sampling


Concentration bounds

Analysis of UCB
Other bandit problems


Thompson Sampling (Thompson, 1933)
- At time t, arm a has s_a^t successes (1’s) and f_a^t failures (0’s).
- Beta(s_a^t + 1, f_a^t + 1) represents a “belief” about p_a.

[Figure: the Beta(s_a^t + 1, f_a^t + 1) density plotted over p_a ∈ [0, 1].]

- Computational step: For every arm a, draw a sample
  x_a^t ∼ Beta(s_a^t + 1, f_a^t + 1).
- Sampling step: Pull an arm a for which x_a^t is maximum.
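As a concrete illustration of these two steps, here is a minimal Python sketch (my own, not from the slides) for Bernoulli rewards; the pull(a) callback and the example arm means are hypothetical:

    import numpy as np

    def thompson_sampling(pull, n_arms, horizon, seed=0):
        # pull(a) is assumed to return a 0/1 reward for arm a.
        rng = np.random.default_rng(seed)
        successes = np.zeros(n_arms)   # s_a^t: 1-rewards observed so far per arm
        failures = np.zeros(n_arms)    # f_a^t: 0-rewards observed so far per arm
        total_reward = 0
        for t in range(horizon):
            # Computational step: one Beta(s+1, f+1) sample per arm.
            samples = rng.beta(successes + 1, failures + 1)
            # Sampling step: pull an arm with the maximum sample.
            a = int(np.argmax(samples))
            r = pull(a)
            successes[a] += r
            failures[a] += 1 - r
            total_reward += r
        return total_reward

    # Usage on a toy 3-armed instance with (made-up) means 0.3, 0.5, 0.7.
    means = [0.3, 0.5, 0.7]
    env_rng = np.random.default_rng(1)
    print(thompson_sampling(lambda a: int(env_rng.random() < means[a]), 3, 10_000))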


Bayesian Inference
Bayes’ Rule of Probability for events A and B:

P{A|B} = P{B|A} P{A} / P{B}.

Application: there is an unknown world w from among possible worlds W, in which we live.
We maintain a belief distribution over w ∈ W.
Belief_0(w) = P{w}.
The process by/probability with which each w produces evidence e is known.
Evidence samples e_1, e_2, ..., e_m are produced i.i.d. by the unknown world w.
How to refine our belief distribution based on incoming evidence?
Belief_m(w) = P{w | e_1, e_2, ..., e_m}.


Bayesian Inference

Belief_{m+1}(w) = P{w | e_1, e_2, ..., e_{m+1}}

                = P{e_1, e_2, ..., e_{m+1} | w} P{w} / P{e_1, e_2, ..., e_{m+1}}

                = P{e_1, e_2, ..., e_m | w} P{e_{m+1} | w} P{w} / P{e_1, e_2, ..., e_{m+1}}

                = P{e_1, e_2, ..., e_m, w} P{e_{m+1} | w} / P{e_1, e_2, ..., e_{m+1}}

                = P{w | e_1, e_2, ..., e_m} P{e_1, e_2, ..., e_m} P{e_{m+1} | w} / P{e_1, e_2, ..., e_{m+1}}

                = Belief_m(w) P{e_{m+1} | w} / Σ_{w' ∈ W} Belief_m(w') P{e_{m+1} | w'}.
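The same update in code, for a finite set of worlds (a small sketch of my own; the coin-bias worlds and the evidence sequence are made up for illustration):

    def update_belief(belief, likelihood, evidence):
        # Belief_{m+1}(w) = Belief_m(w) * P{e_{m+1} | w} / sum over w' of the same.
        unnormalized = {w: belief[w] * likelihood(evidence, w) for w in belief}
        z = sum(unnormalized.values())
        return {w: p / z for w, p in unnormalized.items()}

    # Hypothetical example: the unknown world is a coin bias in {0.2, 0.5, 0.8}.
    worlds = [0.2, 0.5, 0.8]
    belief = {w: 1.0 / len(worlds) for w in worlds}    # Belief_0: uniform prior
    bernoulli = lambda e, w: w if e == 1 else 1 - w    # P{e | w} for a 0/1 sample
    for e in [1, 1, 0, 1, 1]:                          # i.i.d. evidence samples
        belief = update_belief(belief, bernoulli, e)
    print(belief)    # mass shifts toward worlds most consistent with the evidence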


Bayesian Inference in Thompson Sampling
View each arm a’s mean p_a as world w, estimated from rewards (evidence).
Belief_0 over p_a is typically set to Uniform(0, 1), but need not be.
If e_{m+1} is a 1-reward, we must set, for x ∈ [0, 1],

Belief_{m+1}(x) = Belief_m(x) · x / ∫_{y=0}^{1} Belief_m(y) · y dy.

If e_{m+1} is a 0-reward, we must set, for x ∈ [0, 1],

Belief_{m+1}(x) = Belief_m(x) · (1 − x) / ∫_{y=0}^{1} Belief_m(y) · (1 − y) dy.

We achieve exactly that by taking

Belief_m(x) = Beta_{s+1, f+1}(x)

when the first m pulls yield s 1’s and f 0’s!
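A quick numerical sanity check of the last claim (my own sketch, assuming SciPy is available): apply the 1-reward update to a discretized Beta(s+1, f+1) density and compare it against the Beta(s+2, f+1) density.

    import numpy as np
    from scipy.stats import beta    # assumption: SciPy is installed

    s, f = 3, 2                     # suppose the first m = 5 pulls gave 3 ones, 2 zeros
    x = np.linspace(0.0, 1.0, 10001)
    belief_m = beta.pdf(x, s + 1, f + 1)

    # 1-reward update: multiply by x, then renormalize over [0, 1].
    unnormalized = belief_m * x
    belief_m1 = unnormalized / (unnormalized.sum() * (x[1] - x[0]))

    # The conjugacy claim: this should be (numerically close to) Beta(s+2, f+1).
    print(np.max(np.abs(belief_m1 - beta.pdf(x, s + 2, f + 1))))   # small discretization error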


Principle of Selecting Arm to Pull
We have a belief distribution for each arm’s mean.
Together, these distributions represent a belief distribution over bandit instances.
We sample a bandit instance I from the joint belief distribution, and
We act optimally w.r.t. I.

Alternative view: the probability with which we pick an arm is our belief that it is optimal. For example, if A = {1, 2}, the probability of pulling arm 1 is

P{x_1^t > x_2^t} = ∫_{x_1=0}^{1} ∫_{x_2=0}^{x_1} Beta_{s_1^t+1, f_1^t+1}(x_1) Beta_{s_2^t+1, f_2^t+1}(x_2) dx_2 dx_1.
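This double integral is easy to approximate by Monte Carlo (a sketch with made-up counts, not from the slides): the fraction of paired Beta draws in which arm 1’s sample exceeds arm 2’s estimates the probability that Thompson Sampling pulls arm 1 at this step.

    import numpy as np

    rng = np.random.default_rng(0)
    s1, f1 = 6, 2    # hypothetical counts for arm 1 at time t
    s2, f2 = 4, 4    # hypothetical counts for arm 2 at time t

    n = 1_000_000
    x1 = rng.beta(s1 + 1, f1 + 1, size=n)
    x2 = rng.beta(s2 + 1, f2 + 1, size=n)
    print(np.mean(x1 > x2))    # estimate of P{x_1^t > x_2^t}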


Multi-armed Bandits

1. Understanding Thompson Sampling

2. Concentration bounds



Hoeffding’s Inequality (Hoeffding, 1963)
Let X be a random variable bounded in [0, 1], with E[X] = µ;
Let u ≥ 1;
Let x_1, x_2, ..., x_u be i.i.d. samples of X; and
Let x̄ be the mean of these samples (an empirical mean):

x̄ = (1/u) Σ_{i=1}^{u} x_i.

Then, for any fixed ϵ > 0, we have

P{x̄ ≥ µ + ϵ} ≤ e^{−2uϵ²}, and
P{x̄ ≤ µ − ϵ} ≤ e^{−2uϵ²}.

Note the bounds are trivial for large ϵ, since x̄ ∈ [0, 1].
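A small simulation (my own sketch, with an arbitrarily chosen Bernoulli X) comparing the observed deviation probability against the bound e^{−2uϵ²}:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, u, eps, trials = 0.4, 50, 0.1, 200_000
    xbar = rng.binomial(u, mu, size=trials) / u    # empirical means over repeated experiments
    print("observed P{xbar >= mu + eps}:", np.mean(xbar >= mu + eps))
    print("Hoeffding bound:             ", np.exp(-2 * u * eps ** 2))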


Applications
For given mistake probability δ and tolerance ϵ, how many samples u_0 of X do we need to guarantee that with probability at least 1 − δ, the empirical mean x̄ will not exceed the true mean µ by ϵ or more?

u_0 = ⌈(1/(2ϵ²)) ln(1/δ)⌉ pulls are sufficient, since Hoeffding’s Inequality gives

P{x̄ ≥ µ + ϵ} ≤ e^{−2u_0 ϵ²} ≤ δ.

We have u samples of X. How do we fill up this blank?:

With probability at least 1 − δ, the empirical mean x̄ exceeds the true mean µ by at most ϵ_0 = ____.

We can write ϵ_0 = √((1/(2u)) ln(1/δ)); by Hoeffding’s Inequality:

P{x̄ ≥ µ + ϵ_0} ≤ e^{−2u(ϵ_0)²} ≤ δ.
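Both answers translate directly into code (a short sketch; the example values of ϵ, δ, and u are made up):

    import math

    def samples_needed(eps, delta):
        # Smallest u_0 with e^{-2 u_0 eps^2} <= delta.
        return math.ceil(math.log(1 / delta) / (2 * eps ** 2))

    def tolerance_achieved(u, delta):
        # Smallest eps_0 with e^{-2 u eps_0^2} <= delta, given u samples.
        return math.sqrt(math.log(1 / delta) / (2 * u))

    print(samples_needed(0.05, 0.01))      # eps = 0.05, delta = 0.01 -> 922
    print(tolerance_achieved(1000, 0.01))  # u = 1000, delta = 0.01 -> about 0.048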


Arbitrary Bounded Range
Suppose X is a random variable bounded in [a, b]. Can we still apply Hoeffding’s Inequality?
Yes. Assume u; x_1, x_2, ..., x_u; ϵ as defined earlier.

Consider Y = (X − a)/(b − a); for 1 ≤ i ≤ u, y_i = (x_i − a)/(b − a); ȳ = (1/u) Σ_{i=1}^{u} y_i.

Since Y is bounded in [0, 1], we get

P{x̄ ≥ µ + ϵ} = P{ȳ ≥ (µ − a)/(b − a) + ϵ/(b − a)} ≤ e^{−2uϵ²/(b−a)²}, and
P{x̄ ≤ µ − ϵ} = P{ȳ ≤ (µ − a)/(b − a) − ϵ/(b − a)} ≤ e^{−2uϵ²/(b−a)²}.
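In code, the rescaling just replaces ϵ by ϵ/(b − a) in the exponent (a sketch; the range [−1, 3] below is an arbitrary example):

    import math

    def hoeffding_bound(u, eps, a=0.0, b=1.0):
        # One-sided Hoeffding bound for i.i.d. samples bounded in [a, b].
        return math.exp(-2 * u * eps ** 2 / (b - a) ** 2)

    print(hoeffding_bound(100, 0.5, a=-1.0, b=3.0))   # range width 4: exp(-2*100*0.25/16)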


A “KL” Inequality
Let X be a random variable bounded in [0, 1], with E[X] = µ;
Let u ≥ 1;
Let x_1, x_2, ..., x_u be i.i.d. samples of X; and
Let x̄ be the mean of these samples (an empirical mean):

x̄ = (1/u) Σ_{i=1}^{u} x_i.

Then, for any fixed ϵ ∈ [0, 1 − µ], we have

P{x̄ ≥ µ + ϵ} ≤ e^{−u KL(µ+ϵ, µ)},

and for any fixed ϵ ∈ [0, µ], we have

P{x̄ ≤ µ − ϵ} ≤ e^{−u KL(µ−ϵ, µ)},

where for p, q ∈ [0, 1], KL(p, q) is defined as p ln(p/q) + (1 − p) ln((1 − p)/(1 − q)).


Some Observations
The KL inequality gives a tighter upper bound:
For p, q ∈ [0, 1],

KL(p, q) ≥ 2(p − q)² =⇒ e^{−u KL(p, q)} ≤ e^{−2u(p−q)²}.

Both bounds are instances of “Chernoff bounds”, of which there are many more forms.

Similar bounds can also be given when X has infinite support (such as a Gaussian), but might need additional assumptions.
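A short numerical comparison of the two bounds (my own sketch; µ, ϵ, and u are arbitrary example values): as claimed, the KL bound is never larger, and it is much tighter when µ + ϵ is close to 1.

    import math

    def kl(p, q):
        # Bernoulli KL divergence, with the 0 ln 0 = 0 convention.
        terms = 0.0
        if p > 0:
            terms += p * math.log(p / q)
        if p < 1:
            terms += (1 - p) * math.log((1 - p) / (1 - q))
        return terms

    mu, eps, u = 0.9, 0.08, 100
    print("KL bound:       ", math.exp(-u * kl(mu + eps, mu)))   # about 0.006
    print("Hoeffding bound:", math.exp(-2 * u * eps ** 2))       # about 0.278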


Multi-armed Bandits
The exploration-exploitation dilemma
Definitions: Bandit, Algorithm
ϵ-greedy algorithms
Evaluating algorithms: Regret
Achieving sub-linear regret
A lower bound on regret
UCB, KL-UCB algorithms
Thompson Sampling algorithm

Understanding Thompson Sampling


Concentration bounds

Analysis of UCB
Other bandit problems
