CS 747, Autumn 2023: Lecture 4: Shivaram Kalyanakrishnan
Shivaram Kalyanakrishnan
Autumn 2023
Analysis of UCB
Other bandit problems
Alternative view: the probability with which we pick an arm is our belief that it
is optimal. For example, if A = {1, 2}, the probability of pulling 1 is
$$\mathbb{P}\{x_1^t > x_2^t\} = \int_{x_1=0}^{1} \int_{x_2=0}^{x_1} \mathrm{Beta}_{s_1^t+1,\, f_1^t+1}(x_1)\, \mathrm{Beta}_{s_2^t+1,\, f_2^t+1}(x_2)\, dx_2\, dx_1.$$
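The double integral can be checked numerically. The sketch below (Python, standard library only) evaluates it with a midpoint rule, treating the inner integral over x_2 as arm 2's posterior CDF, and compares the result with how often a simulated Thompson Sampling step picks arm 1. The success/failure counts and the helper names `beta_pdf`, `prob_pull_arm1` and `thompson_pick` are hypothetical illustrations, not part of the lecture; index 0 in the code stands for arm 1.

```python
import math
import random


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), computed in log space for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def prob_pull_arm1(s1, f1, s2, f2, n=2000):
    """P{x_1^t > x_2^t} with x_a^t ~ Beta(s_a^t + 1, f_a^t + 1), by the midpoint rule.

    The inner integral over x_2 in [0, x_1] is the CDF of arm 2's posterior,
    accumulated along the same grid that is used for the outer integral over x_1.
    """
    h = 1.0 / n
    total = 0.0
    cdf2 = 0.0  # integral of arm 2's density over all cells before the current one
    for i in range(n):
        x = (i + 0.5) * h
        p2 = beta_pdf(x, s2 + 1, f2 + 1)
        inner = cdf2 + 0.5 * h * p2  # approximate CDF of arm 2 at the cell midpoint
        total += beta_pdf(x, s1 + 1, f1 + 1) * inner * h
        cdf2 += h * p2
    return total


def thompson_pick(successes, failures, rng):
    """One Thompson Sampling step: sample each arm's Beta posterior, return the argmax index."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda a: samples[a])


# Hypothetical counts: index 0 is arm 1 and index 1 is arm 2 from the slide.
successes, failures = [3, 1], [1, 2]
rng = random.Random(0)
trials = 20_000
freq = sum(thompson_pick(successes, failures, rng) == 0 for _ in range(trials)) / trials
print(prob_pull_arm1(3, 1, 1, 2), freq)  # the two values should roughly agree
```

The agreement between the two printed numbers is exactly the "alternative view": the frequency with which sampling picks arm 1 is the posterior probability that arm 1 has the larger mean.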
2. Concentration bounds
$$\mathbb{P}\{\bar{x} \geq \mu + \epsilon\} \leq e^{-2u\epsilon^2}, \text{ and}$$
$$\mathbb{P}\{\bar{x} \leq \mu - \epsilon\} \leq e^{-2u\epsilon^2}.$$
Note the bounds are trivial for large ϵ: since x̄ ∈ [0, 1], the event x̄ ≥ µ + ϵ is impossible once µ + ϵ > 1, and x̄ ≤ µ − ϵ is impossible once µ − ϵ < 0.
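As a sanity check of the two inequalities, the sketch below assumes (as in the standard statement of Hoeffding's inequality) that x̄ is the average of u i.i.d. samples of a random variable X ∈ [0, 1] with mean µ, and takes X to be Bernoulli(µ) so that both tails can be computed exactly from the binomial distribution. The function names are hypothetical. It also inverts the bound: setting e^{−2uϵ²} = δ gives ϵ = √(ln(1/δ)/(2u)), the kind of confidence width used in UCB-style indices.

```python
import math


def hoeffding_bound(u, eps):
    """Right-hand side e^{-2 u eps^2} shared by both one-sided bounds."""
    return math.exp(-2 * u * eps ** 2)


def bernoulli_upper_tail(mu, u, eps):
    """Exact P{x_bar >= mu + eps} when x_bar averages u i.i.d. Bernoulli(mu) samples."""
    k_min = math.ceil(u * (mu + eps) - 1e-9)  # smallest success count with k/u >= mu + eps
    return sum(math.comb(u, k) * mu ** k * (1 - mu) ** (u - k)
               for k in range(max(k_min, 0), u + 1))


def bernoulli_lower_tail(mu, u, eps):
    """Exact P{x_bar <= mu - eps} in the same setting."""
    k_max = math.floor(u * (mu - eps) + 1e-9)  # largest success count with k/u <= mu - eps
    return sum(math.comb(u, k) * mu ** k * (1 - mu) ** (u - k)
               for k in range(0, min(k_max, u) + 1))


def width_for_confidence(u, delta):
    """Solve e^{-2 u eps^2} = delta for eps, giving sqrt(ln(1/delta) / (2u))."""
    return math.sqrt(math.log(1 / delta) / (2 * u))


for u in (10, 50, 200):
    eps = 0.1
    print(u, bernoulli_upper_tail(0.3, u, eps), bernoulli_lower_tail(0.3, u, eps),
          hoeffding_bound(u, eps))
```

When µ + ϵ > 1 the upper-tail sum is empty and returns 0, which is the trivial case noted above.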
Both bounds are instances of “Chernoff bounds”, of which there are many
more forms.
Similar bounds can also be given when X has infinite support (such as a
Gaussian), but might need additional assumptions.
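For the Gaussian case, one such additional assumption is that the variance σ² is known. Under that assumption (a sketch, not the lecture's own statement), the average x̄ of u i.i.d. N(µ, σ²) samples is distributed as N(µ, σ²/u), and a Chernoff argument gives P{x̄ ≥ µ + ϵ} ≤ e^{−uϵ²/(2σ²)}. The code compares this bound with the exact tail computed via `math.erfc`; the function names are hypothetical.

```python
import math


def gaussian_upper_tail(u, eps, sigma):
    """Exact P{x_bar >= mu + eps} when x_bar averages u i.i.d. N(mu, sigma^2) samples.

    x_bar ~ N(mu, sigma^2 / u), so the tail is Q(eps * sqrt(u) / sigma),
    where Q(z) = erfc(z / sqrt(2)) / 2 is the standard Gaussian upper tail.
    """
    z = eps * math.sqrt(u) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))


def gaussian_chernoff_bound(u, eps, sigma):
    """Chernoff bound exp(-u eps^2 / (2 sigma^2)); assumes sigma is known."""
    return math.exp(-u * eps ** 2 / (2 * sigma ** 2))


for u in (10, 100):
    print(u, gaussian_upper_tail(u, 0.2, 1.0), gaussian_chernoff_bound(u, 0.2, 1.0))
```

Unlike the [0, 1] case, the exponent depends on the scale σ: with unbounded support there is no range to play that role, so it has to be assumed.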
Analysis of UCB
Other bandit problems