27 Concentration Inequalities
A concentration inequality is a result that gives us a probability bound on certain random variables taking
atypically large or atypically small values. While concentration of probability measures is a vast topic, we
will only discuss some foundational concentration inequalities in this lecture.
27.1 Markov's Inequality

If X is a non-negative random variable with E[X] < ∞, then for any α > 0,

    P(X > α) ≤ E[X]/α.
Proof: We have

    E[X] = E[X · I{X≤α}] + E[X · I{X>α}]      (a)
         ≥ E[X · I{X>α}]                      (b)
         ≥ α P(X > α),

where (a) follows from linearity of expectation. Since X is a non-negative random variable, E[X · I{X≤α}] ≥ 0, and thus (b) follows. The last step uses the fact that X · I{X>α} ≥ α · I{X>α}, whose expectation is α P(X > α).
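As a quick numerical illustration (a sketch, not part of the formal development): the following Python snippet estimates P(X > α) by simulation for an Exponential(1) random variable, for which E[X] = 1, and compares the estimate with the Markov bound E[X]/α. The choice of distribution, sample size, and values of α is arbitrary and purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1.0, size=1_000_000)   # Exponential(1): E[X] = 1

    for alpha in [1.0, 2.0, 5.0, 10.0]:
        tail = np.mean(x > alpha)                    # Monte Carlo estimate of P(X > alpha)
        bound = 1.0 / alpha                          # Markov bound E[X]/alpha
        print(f"alpha = {alpha:5.1f}   P(X > alpha) ~ {tail:.4f}   Markov bound = {bound:.4f}")

For the exponential distribution the true tail is e^{−α}, so the 1/α bound is indeed quite loose, consistent with the remark that follows.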
Markov's inequality is probably the most fundamental concentration inequality, although it is usually quite loose. After all, the bound decays rather slowly, as 1/α. Tighter bounds can be derived under stronger assumptions on the random variable. For example, when the variance is finite, we have Chebyshev's inequality.
27.2 Chebyshev's Inequality

Let X be a random variable with mean µ and finite variance σ². Then, for any k > 0,

    P(|X − µ| > kσ) ≤ 1/k²,

or equivalently, for any c > 0,

    P(|X − µ| > c) ≤ σ²/c².
Proof: The proof follows by applying Markov's inequality to the non-negative random variable |X − µ|²:

    P(|X − µ|² > (kσ)²) ≤ E[|X − µ|²] / (kσ)²
                        = σ² / (kσ)²
                        = 1/k²,

    ⇒ P(|X − µ| > kσ) ≤ 1/k².
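As another hedged numerical check, the sketch below compares the empirical tail P(|X − µ| > kσ) for a Uniform(0, 1) sample (for which µ = 1/2 and σ² = 1/12) against the Chebyshev bound 1/k²; the distribution and parameters are chosen only for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0.0, 1.0, size=1_000_000)        # Uniform(0,1): mu = 1/2, sigma^2 = 1/12
    mu, sigma = 0.5, (1.0 / 12.0) ** 0.5

    for k in [1.0, 1.5, 2.0, 3.0]:
        tail = np.mean(np.abs(x - mu) > k * sigma)   # empirical P(|X - mu| > k*sigma)
        print(f"k = {k:.1f}   P(|X - mu| > k*sigma) ~ {tail:.4f}   Chebyshev bound = {1.0 / k**2:.4f}")

For k ≥ √3 the event is actually impossible for a Uniform(0, 1) variable, so the bound, while valid, is far from tight.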
Note that Chebyshev's bound decays as 1/k², an improvement over the basic Markov inequality. As one might imagine, exponentially decaying bounds can be derived by invoking the Markov inequality, as long as the moment generating function exists in a neighbourhood of the origin. This result is known as the Chernoff bound, which we present briefly.
27.3 Chernoff Bound

Let MX(s) = E[e^{sX}] be the moment generating function of X, and assume that MX(s) < ∞ for s ∈ [−ε, ε] for some ε > 0. Write DX = {s : MX(s) < ∞}. Then, for any α,

    P(X > α) ≤ e^{−Λ*(α)},   where Λ*(α) = sup_{s>0} (sα − log MX(s)).

Proof: For any s > 0, applying Markov's inequality to the non-negative random variable e^{sX} gives

    P(X > α) = P(e^{sX} > e^{sα}) ≤ MX(s) e^{−sα}.                      (27.1)
In (27.1), note that the bound decays exponentially in α for every s > 0 belonging to DX. The tightest such exponential bound is obtained by infimising the right-hand side over s:

    P(X > α) ≤ inf_{s>0} MX(s) e^{−sα} = e^{−sup_{s>0} (sα − log MX(s))}.

Thus

    P(X > α) ≤ e^{−Λ*(α)}.
This gives us an exponentially decaying bound for the ‘positive tail’ P(X > α). Similarly we can prove a
Chernoff bound for the negative tail P(X < α) by taking s < 0.
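To make the optimisation over s concrete, here is a small Python sketch for the standard normal case, where log MX(s) = s²/2 and the supremum in Λ*(α) is attained at s = α, giving Λ*(α) = α²/2 for α > 0. The grid search over s is a crude illustrative stand-in for the exact optimisation, and the choice of distribution is an assumption made only for this sketch.

    import math

    # Standard normal: M_X(s) = exp(s^2 / 2), so the Chernoff exponent is
    # Lambda*(alpha) = sup_{s>0} (s*alpha - s^2/2) = alpha^2 / 2 for alpha > 0.
    def chernoff_bound(alpha, s_max=10.0, steps=10_000):
        """Infimise M_X(s) * exp(-s * alpha) over a grid of s in (0, s_max]."""
        s_values = (i * s_max / steps for i in range(1, steps + 1))
        return min(math.exp(s * s / 2.0 - s * alpha) for s in s_values)

    for alpha in [1.0, 2.0, 3.0]:
        exact = 0.5 * math.erfc(alpha / math.sqrt(2.0))   # exact Gaussian tail P(X > alpha)
        closed = math.exp(-alpha * alpha / 2.0)           # closed form e^{-Lambda*(alpha)}
        print(f"alpha = {alpha}   P(X > alpha) = {exact:.5f}   "
              f"grid Chernoff = {chernoff_bound(alpha):.5f}   e^(-alpha^2/2) = {closed:.5f}")

The exponential decay e^{−α²/2} is dramatically faster than the 1/α and 1/k² rates above, which is the point of the Chernoff bound.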
27.4 Exercises
1. Let X1, X2, ..., Xn be i.i.d. random variables with PDF fX. Then the set of random variables X1, X2, ..., Xn is called a random sample of size n of X. The sample mean is defined as

       X̄n = (1/n)(X1 + X2 + ... + Xn).
Let X1 , X2 ,.. Xn be a random sample of X with mean µ and variance σ 2 . How many samples of X
are required for the probability that the sample mean will not deviate from the true mean µ by more
than σ/10 to be at least .95?
2. A biased coin, which lands heads with probability 1/10 each time it is flipped, is flipped 200 times consecutively. Give an upper bound on the probability that it lands heads at least 120 times.
3. A post office handles on average 10,000 letters per day, with a variance of 2,000 letters. What can be said about the probability that this post office handles between 8,000 and 12,000 letters tomorrow? What about the probability that more than 15,000 letters come in?