
Lecture Notes 2

Probability Inequalities

Inequalities are useful for bounding quantities that might otherwise be hard to compute.
They will also be used in the theory of convergence.

Theorem 1 (The Gaussian Tail Inequality) Let $X \sim N(0, 1)$. Then
$$P(|X| > \epsilon) \leq \frac{2 e^{-\epsilon^2/2}}{\epsilon}.$$
If $X_1, \ldots, X_n \sim N(0, 1)$ then
$$P(|\bar{X}_n| > \epsilon) \leq \frac{1}{\sqrt{n}\,\epsilon}\, e^{-n\epsilon^2/2}.$$

Proof. The density of $X$ is $\phi(x) = (2\pi)^{-1/2} e^{-x^2/2}$. Hence,
$$P(X > \epsilon) = \int_\epsilon^\infty \phi(s)\, ds \leq \frac{1}{\epsilon} \int_\epsilon^\infty s\, \phi(s)\, ds = \frac{\phi(\epsilon)}{\epsilon} \leq \frac{e^{-\epsilon^2/2}}{\epsilon}.$$
By symmetry,
$$P(|X| > \epsilon) \leq \frac{2 e^{-\epsilon^2/2}}{\epsilon}.$$
Now let $X_1, \ldots, X_n \sim N(0, 1)$. Then $\bar{X}_n = n^{-1} \sum_{i=1}^n X_i \sim N(0, 1/n)$. Thus $\bar{X}_n \stackrel{d}{=} n^{-1/2} Z$ where $Z \sim N(0, 1)$, and
$$P(|\bar{X}_n| > \epsilon) = P(n^{-1/2} |Z| > \epsilon) = P(|Z| > \sqrt{n}\, \epsilon) \leq \frac{2\, \phi(\sqrt{n}\, \epsilon)}{\sqrt{n}\, \epsilon} = \sqrt{\frac{2}{\pi}}\, \frac{e^{-n\epsilon^2/2}}{\sqrt{n}\, \epsilon} \leq \frac{1}{\sqrt{n}\, \epsilon}\, e^{-n\epsilon^2/2},$$
since $\sqrt{2/\pi} < 1$. $\square$
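As a quick numerical check (added here; the original notes contain no code), the sketch below compares the exact two-sided Gaussian tail with the bound from Theorem 1, assuming NumPy and SciPy are available:

```python
# Compare the exact Gaussian tail P(|X| > eps) with the Theorem 1 bound.
import numpy as np
from scipy.stats import norm

for eps in [1.0, 2.0, 3.0]:
    exact = 2 * norm.sf(eps)                 # P(|X| > eps) = 2 P(X > eps)
    bound = (2 / eps) * np.exp(-eps**2 / 2)  # (2/eps) e^{-eps^2/2}
    print(f"eps = {eps}: exact = {exact:.5f}, bound = {bound:.5f}")
```

The bound is loose for small $\epsilon$ but decays at the correct exponential rate.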

Theorem 2 (Markov's inequality) Let $X$ be a non-negative random variable and suppose that $E(X)$ exists. For any $t > 0$,
$$P(X > t) \leq \frac{E(X)}{t}. \qquad (1)$$

Proof. Since $X \geq 0$,
$$E(X) = \int_0^\infty x\, p(x)\, dx = \int_0^t x\, p(x)\, dx + \int_t^\infty x\, p(x)\, dx \geq \int_t^\infty x\, p(x)\, dx \geq t \int_t^\infty p(x)\, dx = t\, P(X > t). \qquad \square$$
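A small simulation makes the inequality concrete (an added illustration, not part of the original notes; the exponential distribution and the thresholds are arbitrary choices, NumPy assumed):

```python
# Markov's inequality: for non-negative X, P(X > t) <= E(X)/t.
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=100_000)  # non-negative, E(X) = 2

for t in [2.0, 5.0, 10.0]:
    empirical = np.mean(X > t)   # Monte Carlo estimate of P(X > t)
    bound = X.mean() / t         # Markov bound
    print(f"t = {t}: P(X > t) ~ {empirical:.4f} <= {bound:.4f}")
```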


Theorem 3 (Chebyshev's inequality) Let $\mu = E(X)$ and $\sigma^2 = \mathrm{Var}(X)$. Then,
$$P(|X - \mu| \geq t) \leq \frac{\sigma^2}{t^2} \quad \text{and} \quad P(|Z| \geq k) \leq \frac{1}{k^2} \qquad (2)$$
where $Z = (X - \mu)/\sigma$. In particular, $P(|Z| > 2) \leq 1/4$ and $P(|Z| > 3) \leq 1/9$.
Proof. We use Markov's inequality to conclude that
$$P(|X - \mu| \geq t) = P(|X - \mu|^2 \geq t^2) \leq \frac{E(X - \mu)^2}{t^2} = \frac{\sigma^2}{t^2}.$$
The second part follows by setting $t = k\sigma$. $\square$


P
If X1 , . . . , Xn Bernoulli(p) then and X n = n1 ni=1 Xi Then, Var(X n ) = Var(X1 )/n =
p(1 p)/n and
Var(X n )
p(1 p)
1
P(|X n p| > )
=

2
2

n
4n2
since p(1 p) 14 for all p.
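The sketch below simulates this bound for Bernoulli sample means (an added illustration; $n$, $p$ and $\epsilon$ are arbitrary choices, NumPy assumed). The empirical deviation probability sits far below the distribution-free bound $1/(4n\epsilon^2)$:

```python
# Chebyshev bound for Bernoulli sample means.
import numpy as np

rng = np.random.default_rng(1)
n, p, eps, reps = 100, 0.3, 0.1, 50_000
xbar = rng.binomial(n, p, size=reps) / n  # sample means of n Bernoulli(p) draws

empirical = np.mean(np.abs(xbar - p) > eps)
bound = 1 / (4 * n * eps**2)              # uses p(1-p) <= 1/4 for all p
print(f"P(|Xbar - p| > {eps}) ~ {empirical:.4f} <= {bound:.4f}")
```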

Hoeffding's Inequality

Hoeffding's inequality is similar in spirit to Markov's inequality, but it is sharper.


We begin with the following important result.
Lemma 4 Suppose that $E(X) = 0$ and that $a \leq X \leq b$. Then
$$E(e^{tX}) \leq e^{t^2 (b-a)^2 / 8}.$$

Recall that a function $g$ is convex if for each $x, y$ and each $\lambda \in [0, 1]$,
$$g(\lambda x + (1 - \lambda) y) \leq \lambda g(x) + (1 - \lambda) g(y).$$
Proof. Since $a \leq X \leq b$, we can write $X$ as a convex combination of $a$ and $b$, namely, $X = \lambda b + (1 - \lambda) a$ where $\lambda = (X - a)/(b - a)$. By the convexity of the function $y \mapsto e^{ty}$ we have
$$e^{tX} \leq \lambda e^{tb} + (1 - \lambda) e^{ta} = \frac{X - a}{b - a}\, e^{tb} + \frac{b - X}{b - a}\, e^{ta}.$$
Take expectations of both sides and use the fact that $E(X) = 0$ to get
$$E e^{tX} \leq \frac{-a}{b - a}\, e^{tb} + \frac{b}{b - a}\, e^{ta} = e^{g(u)} \qquad (3)$$
where $u = t(b - a)$, $g(u) = -\gamma u + \log(1 - \gamma + \gamma e^u)$ and $\gamma = -a/(b - a)$. Note that $g(0) = g'(0) = 0$. Also, $g''(u) \leq 1/4$ for all $u > 0$. By Taylor's theorem, there is a $\xi \in (0, u)$ such that
$$g(u) = g(0) + u g'(0) + \frac{u^2}{2} g''(\xi) = \frac{u^2}{2} g''(\xi) \leq \frac{u^2}{8} = \frac{t^2 (b - a)^2}{8}.$$
Hence, $E e^{tX} \leq e^{g(u)} \leq e^{t^2 (b-a)^2 / 8}$. $\square$
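Lemma 4 is easy to check numerically. The sketch below (added; not from the original notes) recentres a bounded sample so that $E(X) \approx 0$ and compares a Monte Carlo estimate of the moment generating function with the bound; the uniform distribution and the values of $a$, $b$, $t$ are arbitrary choices, NumPy assumed:

```python
# Check E(e^{tX}) <= e^{t^2 (b-a)^2 / 8} for a mean-zero bounded variable.
import numpy as np

rng = np.random.default_rng(7)
a, b, t = -1.0, 2.0, 1.5
X = rng.uniform(a, b, size=1_000_000)
X = X - X.mean()              # recentre: E(X) ~ 0, support width still b - a
mgf = np.mean(np.exp(t * X))  # Monte Carlo estimate of E(e^{tX})
bound = np.exp(t**2 * (b - a)**2 / 8)
print(f"E(e^(tX)) ~ {mgf:.4f} <= {bound:.4f}")
```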

Next, we need to use Chernoff's method.

Lemma 5 Let $X$ be a random variable. Then
$$P(X > \epsilon) \leq \inf_{t \geq 0} e^{-t\epsilon}\, E(e^{tX}).$$
Proof. For any $t > 0$,
$$P(X > \epsilon) = P(e^{X} > e^{\epsilon}) = P(e^{tX} > e^{t\epsilon}) \leq e^{-t\epsilon}\, E(e^{tX}).$$
Since this is true for every $t \geq 0$, the result follows. $\square$
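For example, if $X \sim N(0, 1)$ then $E(e^{tX}) = e^{t^2/2}$, and the infimum of $e^{-t\epsilon} e^{t^2/2}$ over $t$ is attained at $t = \epsilon$, giving $e^{-\epsilon^2/2}$. A minimal numerical check of this (an added illustration, NumPy assumed):

```python
# Chernoff's method for X ~ N(0,1): inf_t e^{-t eps} E(e^{tX}) = e^{-eps^2/2}.
import numpy as np

eps = 2.0
ts = np.linspace(0.01, 10, 1000)
numerical = np.min(np.exp(-ts * eps + ts**2 / 2))  # minimize over a grid of t
closed_form = np.exp(-eps**2 / 2)                  # attained at t = eps
print(f"numerical inf: {numerical:.6f}, closed form: {closed_form:.6f}")
```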
Theorem 6 (Hoeffding's Inequality) Let $Y_1, \ldots, Y_n$ be iid observations such that $E(Y_i) = \mu$ and $a \leq Y_i \leq b$. Then, for any $\epsilon > 0$,
$$P\left( |\bar{Y}_n - \mu| \geq \epsilon \right) \leq 2 e^{-2n\epsilon^2/(b-a)^2}. \qquad (4)$$

Corollary 7 If $X_1, X_2, \ldots, X_n$ are independent with $P(a \leq X_i \leq b) = 1$ and common mean $\mu$, then, with probability at least $1 - \delta$,
$$|\bar{X}_n - \mu| \leq \sqrt{\frac{c}{2n} \log\left( \frac{2}{\delta} \right)} \qquad (5)$$
where $c = (b - a)^2$.
Proof. Without loss of generality, we assume that $\mu = 0$. First we have
$$P(|\bar{Y}_n| \geq \epsilon) = P(\bar{Y}_n \geq \epsilon) + P(-\bar{Y}_n \geq \epsilon) = P(\bar{Y}_n \geq \epsilon) + P(\bar{Y}_n \leq -\epsilon).$$
Next we use Chernoff's method. For any $t > 0$, we have, from Markov's inequality, that
$$P(\bar{Y}_n \geq \epsilon) = P\left( \sum_{i=1}^n Y_i \geq n\epsilon \right) = P\left( e^{t \sum_{i=1}^n Y_i} \geq e^{tn\epsilon} \right) \leq e^{-tn\epsilon}\, E\left( e^{t \sum_{i=1}^n Y_i} \right) = e^{-tn\epsilon} \prod_i E(e^{tY_i}) = e^{-tn\epsilon} \left( E(e^{tY_1}) \right)^n.$$
From Lemma 4, $E(e^{tY_i}) \leq e^{t^2 (b-a)^2 / 8}$. So
$$P(\bar{Y}_n \geq \epsilon) \leq e^{-tn\epsilon}\, e^{t^2 n (b-a)^2 / 8}.$$
This is minimized by setting $t = 4\epsilon/(b - a)^2$, giving
$$P(\bar{Y}_n \geq \epsilon) \leq e^{-2n\epsilon^2/(b-a)^2}.$$
Applying the same argument to $P(-\bar{Y}_n \geq \epsilon)$ yields the result. $\square$


Example 8 Let $X_1, \ldots, X_n \sim \mathrm{Bernoulli}(p)$. From Hoeffding's inequality,
$$P(|\bar{X}_n - p| > \epsilon) \leq 2 e^{-2n\epsilon^2}.$$
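To see why Hoeffding is sharper than Chebyshev for large $n$, and what Corollary 7 gives as a confidence interval, here is a short comparison (an added sketch; $n$, $\epsilon$, $\delta$ are arbitrary choices, $c = (b-a)^2 = 1$ for Bernoulli variables, NumPy assumed):

```python
# Hoeffding vs Chebyshev for Bernoulli means, plus the Corollary 7 interval.
import numpy as np

n, eps, delta = 500, 0.1, 0.05
hoeffding = 2 * np.exp(-2 * n * eps**2)            # Example 8
chebyshev = 1 / (4 * n * eps**2)                   # earlier Chebyshev bound
half_width = np.sqrt(np.log(2 / delta) / (2 * n))  # Corollary 7 with c = 1

print(f"Hoeffding: {hoeffding:.2e}, Chebyshev: {chebyshev:.4f}")
print(f"interval half-width at delta = {delta}: {half_width:.4f}")
```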

The Bounded Difference Inequality

So far we have focused on sums of random variables. The following result extends Hoeffding's inequality to more general functions $g(x_1, \ldots, x_n)$. Here we consider McDiarmid's inequality, also known as the Bounded Difference inequality.

Theorem 9 (McDiarmid) Let $X_1, \ldots, X_n$ be independent random variables. Suppose that
$$\sup_{x_1, \ldots, x_n, x_i'} \left| g(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n) - g(x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n) \right| \leq c_i \qquad (6)$$
for $i = 1, \ldots, n$. Then
$$P\left( g(X_1, \ldots, X_n) - E(g(X_1, \ldots, X_n)) \geq \epsilon \right) \leq \exp\left( -\frac{2\epsilon^2}{\sum_{i=1}^n c_i^2} \right). \qquad (7)$$

Proof. Let $V_i = E(g \mid X_1, \ldots, X_i) - E(g \mid X_1, \ldots, X_{i-1})$. Then $g(X_1, \ldots, X_n) - E(g(X_1, \ldots, X_n)) = \sum_{i=1}^n V_i$ and $E(V_i \mid X_1, \ldots, X_{i-1}) = 0$. Using a similar argument as in Hoeffding's Lemma we have
$$E(e^{tV_i} \mid X_1, \ldots, X_{i-1}) \leq e^{t^2 c_i^2 / 8}. \qquad (8)$$
Now, for any $t > 0$,
$$P\left( g(X_1, \ldots, X_n) - E(g(X_1, \ldots, X_n)) \geq \epsilon \right) = P\left( \sum_{i=1}^n V_i \geq \epsilon \right) = P\left( e^{t \sum_{i=1}^n V_i} \geq e^{t\epsilon} \right) \leq e^{-t\epsilon}\, E\left( e^{t \sum_{i=1}^n V_i} \right)$$
$$= e^{-t\epsilon}\, E\left( e^{t \sum_{i=1}^{n-1} V_i}\, E\left( e^{tV_n} \,\middle|\, X_1, \ldots, X_{n-1} \right) \right) \leq e^{-t\epsilon}\, e^{t^2 c_n^2 / 8}\, E\left( e^{t \sum_{i=1}^{n-1} V_i} \right) \leq \cdots \leq e^{-t\epsilon}\, e^{t^2 \sum_{i=1}^n c_i^2 / 8}.$$
The result follows by taking $t = 4\epsilon / \sum_{i=1}^n c_i^2$. $\square$
Example 10 If we take $g(x_1, \ldots, x_n) = n^{-1} \sum_{i=1}^n x_i$ then we get back Hoeffding's inequality.
Example 11 Suppose we throw $m$ balls into $n$ bins. What fraction of bins are empty? Let $Z$ be the number of empty bins and let $F = Z/n$ be the fraction of empty bins. We can write $Z = \sum_{i=1}^n Z_i$ where $Z_i = 1$ if bin $i$ is empty and $Z_i = 0$ otherwise. Then
$$\mu = E(Z) = \sum_{i=1}^n E(Z_i) = n(1 - 1/n)^m = n e^{m \log(1 - 1/n)} \approx n e^{-m/n}$$
and $\theta = E(F) = \mu/n \approx e^{-m/n}$. How close is $Z$ to $\mu$? Note that the $Z_i$'s are not independent, so we cannot just apply Hoeffding. Instead, we proceed as follows.

Define variables $X_1, \ldots, X_m$ where $X_s = i$ if ball $s$ falls into bin $i$. Then $Z = g(X_1, \ldots, X_m)$. If we move one ball into a different bin, then $Z$ can change by at most 1. Hence, (6) holds with $c_i = 1$ and so
$$P(|Z - \mu| > t) \leq 2 e^{-2t^2/m}.$$
Recall that the fraction of empty bins is $F = Z/n$ with mean $\theta = \mu/n$. We have
$$P(|F - \theta| > t) = P(|Z - \mu| > nt) \leq 2 e^{-2n^2 t^2/m}.$$
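A simulation of this example (an added sketch; the values of $n$, $m$, $t$ are arbitrary, NumPy assumed) shows how tightly $Z$ concentrates around $\mu$ relative to the McDiarmid bound:

```python
# Balls in bins: concentration of the number of empty bins Z around mu.
import numpy as np

rng = np.random.default_rng(2)
n, m, reps, t = 100, 200, 20_000, 10

Z = np.empty(reps)
for r in range(reps):
    bins = rng.integers(0, n, size=m)  # X_s = bin that ball s lands in
    Z[r] = n - np.unique(bins).size    # number of empty bins

mu = n * (1 - 1 / n) ** m
empirical = np.mean(np.abs(Z - mu) > t)
bound = 2 * np.exp(-2 * t**2 / m)      # McDiarmid with c_i = 1
print(f"mu = {mu:.2f}, P(|Z - mu| > {t}) ~ {empirical:.4f} <= {bound:.4f}")
```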

Bounds on Expected Values

Theorem 12 (Cauchy-Schwarz inequality) If $X$ and $Y$ have finite variances, then
$$E|XY| \leq \sqrt{E(X^2)\, E(Y^2)}. \qquad (9)$$

The Cauchy-Schwarz inequality can be written as
$$\mathrm{Cov}^2(X, Y) \leq \sigma_X^2\, \sigma_Y^2.$$

Recall that a function $g$ is convex if for each $x, y$ and each $\lambda \in [0, 1]$,
$$g(\lambda x + (1 - \lambda) y) \leq \lambda g(x) + (1 - \lambda) g(y).$$
If $g$ is twice differentiable and $g''(x) \geq 0$ for all $x$, then $g$ is convex. It can be shown that if $g$ is convex, then $g$ lies above any line that touches $g$ at some point, called a tangent line. A function $g$ is concave if $-g$ is convex. Examples of convex functions are $g(x) = x^2$ and $g(x) = e^x$. Examples of concave functions are $g(x) = -x^2$ and $g(x) = \log x$.
Theorem 13 (Jensen's inequality) If $g$ is convex, then
$$E g(X) \geq g(E X). \qquad (10)$$
If $g$ is concave, then
$$E g(X) \leq g(E X). \qquad (11)$$
Proof. Let $L(x) = a + bx$ be a line, tangent to $g(x)$ at the point $E(X)$. Since $g$ is convex, it lies above the line $L(x)$. So,
$$E g(X) \geq E L(X) = E(a + bX) = a + b E(X) = L(E(X)) = g(E X). \qquad \square$$


Example 14 From Jensen's inequality we see that $E(X^2) \geq (E X)^2$.

Example 15 (Kullback-Leibler Distance) Define the Kullback-Leibler distance between two densities $p$ and $q$ by
$$D(p, q) = \int p(x) \log \frac{p(x)}{q(x)}\, dx.$$
Note that $D(p, p) = 0$. We will use Jensen to show that $D(p, q) \geq 0$. Let $X \sim p$. Then, since $\log$ is concave,
$$-D(p, q) = E\left( \log \frac{q(X)}{p(X)} \right) \leq \log E\left( \frac{q(X)}{p(X)} \right) = \log \int p(x) \frac{q(x)}{p(x)}\, dx = \log \int q(x)\, dx = \log(1) = 0.$$
So $-D(p, q) \leq 0$ and hence $D(p, q) \geq 0$.
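For discrete distributions the integral becomes a sum, and the non-negativity is easy to check directly (an added illustration; the two distributions are arbitrary, NumPy assumed):

```python
# Kullback-Leibler distance for discrete distributions: D(p,q) >= 0, D(p,p) = 0.
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

D_pq = np.sum(p * np.log(p / q))  # discrete analogue of the integral
D_pp = np.sum(p * np.log(p / p))
print(f"D(p, q) = {D_pq:.5f} (>= 0), D(p, p) = {D_pp:.5f}")
```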
Example 16 It follows from Jensen's inequality that three types of means can be ordered. Assume that $a_1, \ldots, a_n$ are positive numbers and define the arithmetic, geometric and harmonic means as
$$a_A = \frac{1}{n}(a_1 + \cdots + a_n), \qquad a_G = (a_1 \cdots a_n)^{1/n}, \qquad a_H = \frac{1}{\frac{1}{n}\left( \frac{1}{a_1} + \cdots + \frac{1}{a_n} \right)}.$$
Then $a_H \leq a_G \leq a_A$.
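The ordering can be verified on any positive sample (an added illustration; the sample itself is an arbitrary choice, NumPy assumed):

```python
# Harmonic <= geometric <= arithmetic mean for positive numbers.
import numpy as np

rng = np.random.default_rng(3)
a = rng.uniform(0.1, 10, size=20)  # arbitrary positive numbers

a_A = a.mean()                     # arithmetic mean
a_G = np.exp(np.log(a).mean())     # geometric mean, via logs for stability
a_H = 1 / np.mean(1 / a)           # harmonic mean
print(f"{a_H:.4f} <= {a_G:.4f} <= {a_A:.4f}")
```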

Suppose we have an exponential bound on $P(X_n > \epsilon)$. In that case we can bound $E(X_n)$ as follows.

Theorem 17 Suppose that $X_n \geq 0$ and that for every $\epsilon > 0$,
$$P(X_n > \epsilon) \leq c_1 e^{-c_2 n \epsilon^2} \qquad (12)$$
for some $c_2 > 0$ and $c_1 > 1/e$. Then,
$$E(X_n) \leq \sqrt{\frac{C}{n}} \qquad (13)$$
where $C = (1 + \log(c_1))/c_2$.


Proof. Recall that for any nonnegative random variable $Y$, $E(Y) = \int_0^\infty P(Y \geq t)\, dt$. Hence, for any $a > 0$,
$$E(X_n^2) = \int_0^\infty P(X_n^2 \geq t)\, dt = \int_0^a P(X_n^2 \geq t)\, dt + \int_a^\infty P(X_n^2 \geq t)\, dt \leq a + \int_a^\infty P(X_n^2 \geq t)\, dt.$$
Equation (12) implies that $P(X_n^2 > t) \leq c_1 e^{-c_2 n t}$. Hence,
$$E(X_n^2) \leq a + \int_a^\infty P(X_n^2 \geq t)\, dt \leq a + c_1 \int_a^\infty e^{-c_2 n t}\, dt = a + \frac{c_1 e^{-c_2 n a}}{c_2 n}.$$
Set $a = \log(c_1)/(n c_2)$ and conclude that
$$E(X_n^2) \leq \frac{\log(c_1)}{n c_2} + \frac{1}{n c_2} = \frac{1 + \log(c_1)}{n c_2}.$$
Finally, since $E(X_n) \leq \sqrt{E(X_n^2)}$, we have
$$E(X_n) \leq \sqrt{\frac{1 + \log(c_1)}{n c_2}}. \qquad \square$$


Now we consider bounding the maximum of a set of random variables.

Theorem 18 Let $X_1, \ldots, X_n$ be random variables. Suppose there exists $\sigma > 0$ such that $E(e^{tX_i}) \leq e^{t^2 \sigma^2 / 2}$ for all $t > 0$. Then
$$E\left( \max_{1 \leq i \leq n} X_i \right) \leq \sigma \sqrt{2 \log n}. \qquad (14)$$

Proof. By Jensen's inequality,
$$\exp\left\{ t\, E\left( \max_{1 \leq i \leq n} X_i \right) \right\} \leq E\left( \exp\left\{ t \max_{1 \leq i \leq n} X_i \right\} \right) = E\left( \max_{1 \leq i \leq n} \exp\{t X_i\} \right) \leq \sum_{i=1}^n E\left( \exp\{t X_i\} \right) \leq n e^{t^2 \sigma^2 / 2}.$$
Thus,
$$E\left( \max_{1 \leq i \leq n} X_i \right) \leq \frac{\log n}{t} + \frac{t \sigma^2}{2}.$$
The result follows by setting $t = \sqrt{2 \log n}/\sigma$. $\square$
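Standard normals satisfy the assumption with $\sigma = 1$, since $E(e^{tX}) = e^{t^2/2}$. The simulation below (an added sketch, NumPy assumed) compares a Monte Carlo estimate of $E(\max_i X_i)$ with $\sqrt{2 \log n}$:

```python
# E(max of n standard normals) vs the Theorem 18 bound sqrt(2 log n).
import numpy as np

rng = np.random.default_rng(4)
for n in [10, 100, 1000]:
    X = rng.standard_normal(size=(10_000, n))
    emp = X.max(axis=1).mean()      # Monte Carlo estimate of E(max_i X_i)
    bound = np.sqrt(2 * np.log(n))  # sigma = 1 for N(0, 1)
    print(f"n = {n}: E(max) ~ {emp:.3f} <= {bound:.3f}")
```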

$O_P$ and $o_P$

In statistics, probability and machine learning, we make use of $o_P$ and $O_P$ notation.

Recall first that $a_n = o(1)$ means that $a_n \to 0$ as $n \to \infty$, and $a_n = o(b_n)$ means that $a_n/b_n = o(1)$. Similarly, $a_n = O(1)$ means that $a_n$ is eventually bounded, that is, for all large $n$, $|a_n| \leq C$ for some $C > 0$, and $a_n = O(b_n)$ means that $a_n/b_n = O(1)$.

We write $a_n \asymp b_n$ if both $a_n/b_n$ and $b_n/a_n$ are eventually bounded. In computer science this is written as $a_n = \Theta(b_n)$, but we prefer $a_n \asymp b_n$ since, in statistics, $\Theta$ often denotes a parameter space.

Now we move on to the probabilistic versions. Say that $Y_n = o_P(1)$ if, for every $\epsilon > 0$,
$$P(|Y_n| > \epsilon) \to 0.$$
Say that $Y_n = o_P(a_n)$ if $Y_n/a_n = o_P(1)$. Say that $Y_n = O_P(1)$ if, for every $\epsilon > 0$, there is a $C > 0$ such that
$$P(|Y_n| > C) \leq \epsilon.$$
Say that $Y_n = O_P(a_n)$ if $Y_n/a_n = O_P(1)$.

Lets use Hoeffdings inequality to show that sample proportions are OP (1/ n) within the
the true mean. Let Y1 , . . . , Yn be coin flips i.e. Yi {0, 1}. Let p = P(Yi = 1). Let
n
1X
Yi .
pbn =
n i=1

We will show that: pbn p = oP (1) and pbn p = OP (1/ n).


We have that
2
P(|b
pn p| > ) 2e2n 0
and so pbn p = oP (1). Also,



C
pn p| > C) = P |b
pn p| >
P( n|b
n
2

2e2C <
if we pick C large enough. Hence,

n(b
pn p) = OP (1) and so


1
pbn p = OP
.
n
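The claim is easy to see in simulation (an added sketch; $p$, $C$ and the sample sizes are arbitrary choices, NumPy assumed): the distribution of $\sqrt{n}\,|\hat{p}_n - p|$ does not spread out as $n$ grows, and the exceedance probability stays below the $n$-free Hoeffding bound $2e^{-2C^2}$.

```python
# phat_n - p = O_P(1/sqrt(n)): sqrt(n)|phat_n - p| stays bounded in probability.
import numpy as np

rng = np.random.default_rng(5)
p, C = 0.3, 2.0
for n in [100, 1_000, 10_000]:
    phat = rng.binomial(n, p, size=50_000) / n
    prob = np.mean(np.sqrt(n) * np.abs(phat - p) > C)
    bound = 2 * np.exp(-2 * C**2)  # Hoeffding bound, independent of n
    print(f"n = {n}: P(sqrt(n)|phat - p| > {C}) ~ {prob:.5f} <= {bound:.5f}")
```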

Now consider $m$ coins with probabilities $p_1, \ldots, p_m$. Then
$$P\left( \max_j |\hat{p}_j - p_j| > \epsilon \right) \leq \sum_{j=1}^m P\left( |\hat{p}_j - p_j| > \epsilon \right) \qquad \text{(union bound)}$$
$$\leq \sum_{j=1}^m 2 e^{-2n\epsilon^2} \qquad \text{(Hoeffding)}$$
$$= 2m e^{-2n\epsilon^2} = 2 \exp\left( -(2n\epsilon^2 - \log m) \right).$$
Suppose that $m \leq e^{n^\gamma}$ where $0 \leq \gamma < 1$. Then
$$P\left( \max_j |\hat{p}_j - p_j| > \epsilon \right) \leq 2 \exp\left( -(2n\epsilon^2 - n^\gamma) \right) \to 0.$$
Hence,
$$\max_j |\hat{p}_j - p_j| = o_P(1).$$
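The union bound argument can also be checked by simulation (an added sketch; $m$, $n$, $\epsilon$ and the coin probabilities are arbitrary choices, NumPy assumed):

```python
# Maximum deviation over m coins vs the union-bound-plus-Hoeffding bound.
import numpy as np

rng = np.random.default_rng(6)
m, n, eps = 50, 2_000, 0.05
p = rng.uniform(0.2, 0.8, size=m)                # m coin probabilities
phat = rng.binomial(n, p, size=(10_000, m)) / n  # 10,000 repetitions
max_dev = np.abs(phat - p).max(axis=1)

empirical = np.mean(max_dev > eps)
bound = 2 * m * np.exp(-2 * n * eps**2)          # 2 m e^{-2 n eps^2}
print(f"P(max_j |phat_j - p_j| > {eps}) ~ {empirical:.4f} <= {bound:.4f}")
```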
