Note 2
Nan Jiang
This note introduces the basics of concentration inequalities and examples of their applications (often
with the union bound), which will be useful for the rest of this course.
1 Hoeffding’s Inequality
Theorem 1. Let $X_1, \ldots, X_n$ be independent random variables on $\mathbb{R}$ such that $X_i$ is bounded in the interval $[a_i, b_i]$. Let $S_n = \sum_{i=1}^n X_i$. Then for all $t > 0$,
$$\Pr[S_n - E[S_n] \ge t] \le e^{-2t^2 / \sum_{i=1}^n (b_i - a_i)^2}, \qquad (1)$$
$$\Pr[S_n - E[S_n] \le -t] \le e^{-2t^2 / \sum_{i=1}^n (b_i - a_i)^2}. \qquad (2)$$
Remarks:
• By the union bound, we have $\Pr[|S_n - E[S_n]| \ge t] \le 2e^{-2t^2 / \sum_{i=1}^n (b_i - a_i)^2}$.
• We often care about the convergence of the empirical mean to the true average, so we can divide $S_n$ by $n$: $\Pr\left[\left|\frac{S_n}{n} - E\left[\frac{S_n}{n}\right]\right| \ge t\right] \le 2e^{-2n^2 t^2 / \sum_{i=1}^n (b_i - a_i)^2}$.
• A useful rephrasing of the result when all variables share the same support $[a, b]$: with probability at least $1 - \delta$, $\left|\frac{S_n}{n} - \frac{E[S_n]}{n}\right| \le (b - a)\sqrt{\frac{1}{2n} \ln \frac{2}{\delta}}$.
• The number of variables, $n$, is a constant in the theorem statement. When $n$ is a random variable itself, for Hoeffding's inequality to apply, $n$ cannot depend on the realization of $X_1, \ldots, X_n$.
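As a quick sanity check of the two-sided bound, one can compare it against a Monte Carlo estimate of the deviation probability for coin tosses. The sketch below is ours (function names are not from the note), using Bernoulli variables supported on $[0, 1]$:

```python
import math
import random

def hoeffding_bound(n, t, a=0.0, b=1.0):
    """Two-sided Hoeffding bound on Pr[|S_n/n - E[S_n]/n| >= t]
    for n independent variables supported on [a, b]."""
    return 2 * math.exp(-2 * n * t ** 2 / (b - a) ** 2)

def deviation_prob(n, t, p=0.5, trials=20000, seed=0):
    """Monte Carlo estimate of Pr[|mean of n Bernoulli(p) tosses - p| >= t]."""
    rng = random.Random(seed)
    hits = sum(
        abs(sum(rng.random() < p for _ in range(n)) / n - p) >= t
        for _ in range(trials)
    )
    return hits / trials

n, t = 100, 0.1
print(deviation_prob(n, t), "<=", hoeffding_bound(n, t))
```

The empirical frequency should land below the bound; Hoeffding's inequality is distribution-free over $[a, b]$, so it is typically loose for any particular distribution.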
Example: Consider the following Markov chain:
[Figure: a Markov chain over states $s_1, s_2, s_3, s_4$; from $s_1$, the chain moves to $s_2$ with probability $p$ and to $s_4$ with probability $1 - p$.]
Say we start at $s_1$ and sample a path of length $T$ ($T$ is a constant). Let $n$ be the number of times
we visit $s_1$; we can use the transitions out of $s_1$ to estimate $p$.
1. Can we directly apply Hoeffding’s inequality here with n as the number of coin tosses? If
you want to derive a concentration bound for this problem, look up Azuma’s inequality.
2. What if we sample a path until we visit s1 N times for some constant N ? Can we apply
Hoeffding’s inequality with N as the number of random variables?
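To see why question 1 is subtle, one can simulate the chain and observe that $n$ itself varies with the coin outcomes. The sketch below assumes, purely for concreteness (the figure leaves the remaining transitions implicit), that $s_2 \to s_3 \to s_1$ and $s_4 \to s_1$ deterministically:

```python
import random

def sample_tosses(p, T, seed):
    """Walk the chain for T steps from s1, recording the coin toss at each
    visit to s1. Assumed (hypothetical) deterministic transitions:
    s2 -> s3 -> s1 and s4 -> s1."""
    rng = random.Random(seed)
    state, tosses = "s1", []
    for _ in range(T):
        if state == "s1":
            heads = rng.random() < p
            tosses.append(heads)
            state = "s2" if heads else "s4"
        elif state == "s2":
            state = "s3"
        else:  # s3 or s4 returns to s1
            state = "s1"
    return tosses

ns = [len(sample_tosses(0.3, 100, seed)) for seed in range(1000)]
print(min(ns), max(ns))  # n differs across paths: it is a random variable
```

Under this assumed structure, heads lead to longer cycles, so $n$ is correlated with the toss outcomes within a length-$T$ path; this is exactly the dependence that blocks a direct application of Hoeffding's inequality. In contrast, if we stop after $N$ visits to $s_1$ for a constant $N$, the $N$ recorded tosses are i.i.d. Bernoulli($p$).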
A popular objective for MAB is the pseudo-regret, which poses the exploration-exploitation challenge:
$$\text{Regret}_T = \sum_{t=1}^T (\mu^\star - \mu_{i_t}).$$
Another objective is the simple regret,
$$\mu^\star - \mu_{\hat{i}},$$
where $\hat{i}$ is the arm that the learner picks after $T$ rounds of interactions. This poses the "pure exploration" challenge, since all that matters is to make a good final guess, and the regret incurred within the $T$ rounds does not matter. A related objective is called Best-Arm Identification, which asks whether $\hat{i} \in \arg\max_{i \in [K]} \mu_i$; Best-Arm Identification results often require additional gap conditions.
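Both objectives are straightforward to compute once the arm means and the pull sequence are fixed; a minimal sketch (function names are ours):

```python
def pseudo_regret(mus, pulls):
    """Pseudo-regret: sum over rounds of mu_star - mu_{i_t} for the arms pulled."""
    mu_star = max(mus)
    return sum(mu_star - mus[i] for i in pulls)

def simple_regret(mus, i_hat):
    """Simple regret: mu_star - mu_{i_hat} for the learner's final guess."""
    return max(mus) - mus[i_hat]

mus = [0.2, 0.5, 0.9]
print(pseudo_regret(mus, [0, 2, 2, 1]))  # (0.9-0.2) + 0 + 0 + (0.9-0.5)
print(simple_regret(mus, 2))
```

Note the asymmetry: pseudo-regret charges the learner for every round, while simple regret only scores the final guess $\hat{i}$.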
Now we want accurate estimation for all arms simultaneously. That is, we want to bound the probability of the event that any $\hat{\mu}_i$ deviates from $\mu_i$ too much. This is where the union bound is useful (with each of the $K$ arms pulled $T/K$ times):
$$\Pr\left[\bigcup_{i=1}^K \{|\hat{\mu}_i - \mu_i| \ge \epsilon\}\right] \quad \text{(the event that estimation is $\epsilon$-inaccurate for at least 1 arm)}$$
$$\le \sum_{i=1}^K \Pr[|\hat{\mu}_i - \mu_i| \ge \epsilon] \le 2Ke^{-2T\epsilon^2/K}. \quad \text{(union bound, then Hoeffding's inequality)}$$
To rephrase this result: with probability at least $1 - \delta$, $|\hat{\mu}_i - \mu_i| \le \sqrt{\frac{K}{2T} \ln \frac{2K}{\delta}}$ holds for all $i$ simultaneously.
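Under uniform allocation ($T/K$ pulls per arm), the simultaneous confidence width is easy to compute in code; a sketch with hypothetical Bernoulli arms (names are ours):

```python
import math
import random

def simultaneous_eps(K, T, delta):
    """Width eps such that, w.p. >= 1 - delta, |mu_hat_i - mu_i| <= eps for
    all K arms when each arm is pulled T/K times (union bound + Hoeffding)."""
    return math.sqrt(K / (2 * T) * math.log(2 * K / delta))

def estimate_arms(mus, pulls_per_arm, rng):
    """Pull each Bernoulli arm equally often; return the empirical means."""
    return [sum(rng.random() < mu for _ in range(pulls_per_arm)) / pulls_per_arm
            for mu in mus]

mus = [0.2, 0.5, 0.7, 0.9]
K, T, delta = len(mus), 4000, 0.05
eps = simultaneous_eps(K, T, delta)
mu_hat = estimate_arms(mus, T // K, random.Random(1))
print(eps, max(abs(m - mh) for m, mh in zip(mus, mu_hat)))
```

Note the $\ln(2K/\delta)$ factor: the price of simultaneity over $K$ arms is only logarithmic in $K$, which is what makes the union bound so effective here.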
Finally, we use the estimation error to bound the decision loss: recall that $\hat{i} = \arg\max_{i \in [K]} \hat{\mu}_i$, and let $i^\star = \arg\max_{i \in [K]} \mu_i$. On the event that $|\hat{\mu}_i - \mu_i| \le \epsilon$ for all $i$, we have $\mu^\star - \mu_{\hat{i}} = (\mu_{i^\star} - \hat{\mu}_{i^\star}) + (\hat{\mu}_{i^\star} - \hat{\mu}_{\hat{i}}) + (\hat{\mu}_{\hat{i}} - \mu_{\hat{i}}) \le \epsilon + 0 + \epsilon = 2\epsilon$, where the middle term is non-positive because $\hat{i}$ maximizes $\hat{\mu}_i$.
The theorem itself is stated as a best-arm identification lower bound, but it is also a lower bound
for simple regret minimization. This is because all arms except the best one are $\epsilon$ worse than $\mu^\star$, so
missing the optimal arm means a simple regret of at least $\epsilon$.
See the proof in [1] (Theorem 2); the technique is due to [2] and can also be used to prove the lower
bound on the regret of MAB.
where $E[\cdot]$ is w.r.t. $P_{X,Y}$. Given only a finite sample, one natural thing to do is empirical risk minimization, i.e., find the classifier that has the lowest training error rate on the data:
$$\hat{f} = \arg\min_{f \in \mathcal{F}} \hat{E}[I[f(X) \ne Y]], \quad \text{where } \hat{E}[I[f(X) \ne Y]] := \frac{1}{n} \sum_{i=1}^n I[f(X_i) \ne Y_i].$$
The question is, can we give any guarantee on how good the learned classifier $\hat{f}$ is compared to the optimal one $f^\star$, as a function of $n$? In other words, we want to bound
$$E[I[\hat{f}(X) \ne Y]] - E[I[f^\star(X) \ne Y]].$$
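As a toy instantiation of ERM, consider a (hypothetical) finite class of threshold classifiers on $[0, 1]$; the data-generating process and all names below are ours, just to fix ideas:

```python
import random

def erm(F, data):
    """Empirical risk minimization: return the f in the finite class F
    with the lowest training error rate on the sample."""
    def train_err(f):
        return sum(f(x) != y for x, y in data) / len(data)
    return min(F, key=train_err)

# Hypothetical finite class: threshold classifiers x -> I[x >= t].
F = [lambda x, t=t: int(x >= t) for t in [i / 10 for i in range(11)]]

rng = random.Random(0)

def label(x):
    """Labels follow a true threshold at 0.35, flipped with probability 0.1."""
    y = int(x >= 0.35)
    return y if rng.random() < 0.9 else 1 - y

data = [(x, label(x)) for x in (rng.random() for _ in range(500))]
f_hat = erm(F, data)
```

By construction $\hat{f}$ attains the minimum training error over $\mathcal{F}$; the question above is whether its *true* error is also close to that of the best classifier in the class.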
We provide the analysis below, which mainly uses Hoeffding's inequality and the union bound. First of all,
References
[1] Akshay Krishnamurthy, Alekh Agarwal, and John Langford. PAC reinforcement learning with
rich observations. In Advances in Neural Information Processing Systems, pages 1840–1848, 2016.
[2] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit
problem. Machine Learning, 47(2-3):235–256, 2002.