Lecture Notes 2
1 Probability Inequalities
Inequalities are useful for bounding quantities that might otherwise be hard to compute.
They will also be used in the theory of convergence.
Theorem 1 (Gaussian Tail Inequality) Let X ∼ N (0, 1). Then, for any ε > 0,

P(|X| > ε) ≤ (2/ε) e^{−ε²/2}.

If X1 , . . . , Xn ∼ N (0, 1) and X̄n = n^{−1} ∑_{i=1}^n Xi , then

P(|X̄n | > ε) ≤ (2/(√n ε)) e^{−nε²/2} ≤ e^{−nε²/2}

for large n.
Proof. The density of X is φ(x) = (2π)^{−1/2} e^{−x²/2}, so φ′(s) = −s φ(s). Hence,

P(X > ε) = ∫_ε^∞ φ(s) ds = ∫_ε^∞ (s/s) φ(s) ds ≤ (1/ε) ∫_ε^∞ s φ(s) ds
= −(1/ε) ∫_ε^∞ φ′(s) ds = φ(ε)/ε ≤ e^{−ε²/2}/ε.
By symmetry,

P(|X| > ε) ≤ (2/ε) e^{−ε²/2}.
Now let X1 , . . . , Xn ∼ N (0, 1). Then X̄n = n^{−1} ∑_{i=1}^n Xi ∼ N (0, 1/n). Thus X̄n has the same distribution as n^{−1/2} Z where Z ∼ N (0, 1), and

P(|X̄n | > ε) = P(n^{−1/2} |Z| > ε) = P(|Z| > √n ε) ≤ (2/(√n ε)) e^{−nε²/2}. □
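As a quick sanity check, here is a small simulation of the bound (a sketch only; the sample size n, the value of ε, and the number of replications are arbitrary choices, and NumPy is assumed):

```python
import numpy as np

# Monte Carlo check of the Gaussian tail bound on the sample mean.
rng = np.random.default_rng(0)
n, eps, reps = 100, 0.3, 200_000

xbar = rng.standard_normal((reps, n)).mean(axis=1)        # draws of X̄n
empirical = np.mean(np.abs(xbar) > eps)                   # estimate of P(|X̄n| > ε)
bound = 2 / (np.sqrt(n) * eps) * np.exp(-n * eps**2 / 2)

print(f"empirical tail probability: {empirical:.5f}")
print(f"Gaussian tail bound:        {bound:.5f}")          # should exceed the estimate
```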
Theorem 2 (Markov’s inequality) Let X be a non-negative random variable and
suppose that E(X) exists. For any t > 0,
P(X > t) ≤ E(X)/t. (1)
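For instance, here is a minimal numerical illustration of Markov’s inequality (a sketch assuming NumPy; the Exponential(1) distribution and the threshold t = 3 are arbitrary choices):

```python
import numpy as np

# Markov's inequality for X ~ Exponential(1), where E(X) = 1.
rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=500_000)
t = 3.0
print(np.mean(x > t))   # empirical P(X > t); about e^{-3} ≈ 0.050
print(1.0 / t)          # Markov bound E(X)/t ≈ 0.333, much larger but valid
```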
2 Hoeffding’s Inequality
Hoeffding’s inequality is similar in spirit to Markov’s inequality, but it is sharper. We begin with the following important result.

Lemma 4 (Hoeffding’s Lemma) Let X be a random variable with a ≤ X ≤ b. Then, for any t > 0,

E(e^{tX}) ≤ e^{tµ} e^{t²(b−a)²/8} (2)

where µ = E[X].
Before we start the proof, recall that a function g is convex if for each x, y and each
α ∈ [0, 1],
g(αx + (1 − α)y) ≤ αg(x) + (1 − α)g(y).
Proof. We will assume that µ = 0. Since a ≤ X ≤ b, we can write X as a convex
combination of a and b, namely, X = αb + (1 − α)a where α = (X − a)/(b − a). By the
convexity of the function y → e^{ty} we have

e^{tX} ≤ α e^{tb} + (1 − α) e^{ta} = ((X − a)/(b − a)) e^{tb} + ((b − X)/(b − a)) e^{ta}.
Take expectations of both sides and use the fact that E(X) = 0 to get
E(e^{tX}) ≤ −(a/(b − a)) e^{tb} + (b/(b − a)) e^{ta} = e^{g(u)} (3)
where u = t(b − a), g(u) = −γu + log(1 − γ + γe^u) and γ = −a/(b − a). Note that g(0) = g′(0) = 0. Also, g″(u) ≤ 1/4 for all u > 0. By Taylor’s theorem, there is a ξ ∈ (0, u) such that

g(u) = g(0) + u g′(0) + (u²/2) g″(ξ) = (u²/2) g″(ξ) ≤ u²/8 = t²(b − a)²/8.
Hence, E(e^{tX}) ≤ e^{g(u)} ≤ e^{t²(b−a)²/8}. □
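As an illustration of the lemma, the following sketch checks the moment generating function bound for a centered Bernoulli variable (the choice of p and the grid of t values are arbitrary; NumPy is assumed):

```python
import numpy as np

# Hoeffding's lemma for X - p with X ~ Bernoulli(p): values lie in [-p, 1 - p],
# so b - a = 1 and the lemma says E[e^{t(X - p)}] <= e^{t^2 / 8}.
p = 0.3
t = np.linspace(0.1, 5.0, 50)

mgf = p * np.exp(t * (1 - p)) + (1 - p) * np.exp(-t * p)   # exact E[e^{t(X - p)}]
bound = np.exp(t**2 / 8)

assert np.all(mgf <= bound)   # the bound holds on the whole grid
print(mgf[-1], bound[-1])
```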
Corollary 7 If X1 , X2 , . . . , Xn are independent with P(a ≤ Xi ≤ b) = 1 and common
mean µ, then, with probability at least 1 − δ,
|X̄n − µ| ≤ √( ((b − a)²/(2n)) log(2/δ) ). (5)
Next we use Chernoff’s method. Let Yi = Xi − µ, so that E(Yi) = 0 and Yi takes values in an interval of length b − a. For any t > 0, we have, from Markov’s inequality, that

P(Ȳn ≥ ε) = P(∑_{i=1}^n Yi ≥ nε) = P(e^{∑_{i=1}^n Yi} ≥ e^{nε})
= P(e^{t ∑_{i=1}^n Yi} ≥ e^{tnε}) ≤ e^{−tnε} E(e^{t ∑_{i=1}^n Yi})
= e^{−tnε} ∏_i E(e^{tYi}) = e^{−tnε} (E(e^{tYi}))^n.

From Lemma 4, E(e^{tYi}) ≤ e^{t²(b−a)²/8}. So

P(Ȳn ≥ ε) ≤ e^{−tnε} e^{t²n(b−a)²/8}.

This holds for every t > 0. The exponent −tnε + t²n(b − a)²/8 is minimized at t = 4ε/(b − a)², which gives P(Ȳn ≥ ε) ≤ e^{−2nε²/(b−a)²}. Applying the same argument to −Yi bounds P(Ȳn ≤ −ε), so P(|X̄n − µ| ≥ ε) ≤ 2e^{−2nε²/(b−a)²}; setting the right-hand side equal to δ and solving for ε gives Corollary 7. □
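Corollary 7 is what gives finite-sample confidence intervals. The sketch below computes the half-width of the interval for coin flips, where a = 0 and b = 1 (δ and the sample sizes are arbitrary illustrative choices; NumPy assumed):

```python
import numpy as np

# Half-width of the Hoeffding interval |X̄n - µ| <= sqrt((b-a)^2 log(2/δ) / (2n))
# for coin flips (b - a = 1).
delta = 0.05
for n in (100, 1_000, 10_000):
    half_width = np.sqrt(np.log(2 / delta) / (2 * n))
    print(n, round(half_width, 4))   # shrinks at the 1/sqrt(n) rate
```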
Theorem 9 (McDiarmid) Let X1 , . . . , Xn be independent random variables. Sup-
pose that
sup_{x1 ,...,xn ,x′i} |g(x1 , . . . , xi−1 , xi , xi+1 , . . . , xn ) − g(x1 , . . . , xi−1 , x′i , xi+1 , . . . , xn )| ≤ ci (6)

for i = 1, . . . , n. Then

P( g(X1 , . . . , Xn ) − E(g(X1 , . . . , Xn )) ≥ ε ) ≤ exp( −2ε² / ∑_{i=1}^n ci² ). (7)
Proof. Let Vi = E(g | X1 , . . . , Xi ) − E(g | X1 , . . . , Xi−1 ). Then g(X1 , . . . , Xn ) − E(g(X1 , . . . , Xn )) = ∑_{i=1}^n Vi and E(Vi | X1 , . . . , Xi−1 ) = 0. Using a similar argument as in Hoeffding’s Lemma we have

E(e^{tVi} | X1 , . . . , Xi−1 ) ≤ e^{t²ci²/8}. (8)
Now, for any t > 0,

P( g(X1 , . . . , Xn ) − E(g(X1 , . . . , Xn )) ≥ ε ) = P( ∑_{i=1}^n Vi ≥ ε )
= P( e^{t ∑_{i=1}^n Vi} ≥ e^{tε} ) ≤ e^{−tε} E( e^{t ∑_{i=1}^n Vi} )
= e^{−tε} E( e^{t ∑_{i=1}^{n−1} Vi} E( e^{tVn} | X1 , . . . , Xn−1 ) )
≤ e^{−tε} e^{t²cn²/8} E( e^{t ∑_{i=1}^{n−1} Vi} )
⋮
≤ e^{−tε} e^{t² ∑_{i=1}^n ci²/8}.

Minimizing the exponent over t (take t = 4ε/∑_{i=1}^n ci²) gives (7). □
Example 10 If we take g(x1 , . . . , xn ) = n^{−1} ∑_{i=1}^n xi then we get back Hoeffding’s inequality.
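To spell the example out (a short check, assuming each xi lies in [a, b] as in Hoeffding’s setting): changing one coordinate changes the sample mean by at most (b − a)/n, so we can take ci = (b − a)/n, and McDiarmid’s bound becomes

exp( −2ε² / ∑_{i=1}^n ci² ) = exp( −2ε² / (n (b − a)²/n²) ) = exp( −2nε²/(b − a)² ),

which is the one-sided Hoeffding bound for X̄n − µ.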
The Cauchy-Schwarz inequality can be written as

Cov²(X, Y ) ≤ σX² σY².
Recall that a function g is convex if for each x, y and each α ∈ [0, 1],

g(αx + (1 − α)y) ≤ αg(x) + (1 − α)g(y).

If g is twice differentiable and g″(x) ≥ 0 for all x, then g is convex. It can be shown that if g is convex, then g lies above any line that touches g at some point, called a tangent line. A function g is concave if −g is convex. Examples of convex functions are g(x) = x² and g(x) = e^x. Examples of concave functions are g(x) = −x² and g(x) = log x.

Jensen’s inequality states that if g is convex, then

Eg(X) ≥ g(EX). (10)

If g is concave, then

Eg(X) ≤ g(EX). (11)
Proof. Let L(x) = a + bx be a line, tangent to g(x) at the point E(X), so that L(E(X)) = g(E(X)). Since g is convex, it lies above the line L(x). So,

Eg(X) ≥ EL(X) = a + bE(X) = L(E(X)) = g(E(X)). □
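A quick numerical illustration (a sketch assuming NumPy; the convex function e^x and X ∼ N (0, 1) are arbitrary choices):

```python
import numpy as np

# Jensen's inequality with g(x) = e^x (convex) and X ~ N(0, 1).
rng = np.random.default_rng(2)
x = rng.standard_normal(1_000_000)
print(np.mean(np.exp(x)))   # E[e^X] ≈ e^{1/2} ≈ 1.649
print(np.exp(np.mean(x)))   # e^{E[X]} ≈ 1, smaller, as Jensen predicts
```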
Suppose we have an exponential bound on P(Xn > ε). In that case we can bound E(Xn ) as
follows.
Theorem 15 Suppose that Xn ≥ 0 and that for every ε > 0,

P(Xn > ε) ≤ c1 e^{−c2 nε²}. (12)
Thus,

E( max_{1≤i≤n} Xi ) ≤ log n / t + tσ²/2.

The result follows by setting t = √(2 log n)/σ, which gives E( max_{1≤i≤n} Xi ) ≤ σ √(2 log n). □
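The following sketch compares the average maximum of n standard normals with the bound σ√(2 log n) (n, σ, and the number of replications are arbitrary choices; NumPy assumed):

```python
import numpy as np

# Expected maximum of n independent N(0, σ²) variables versus σ·sqrt(2 log n).
rng = np.random.default_rng(3)
n, reps, sigma = 1_000, 5_000, 1.0

max_draws = (sigma * rng.standard_normal((reps, n))).max(axis=1)
print(max_draws.mean())                  # empirically around 3.2 for n = 1000
print(sigma * np.sqrt(2 * np.log(n)))    # bound ≈ 3.72
```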
5 OP and oP
In statistics, probability and machine learning, we make use of oP and OP notation.
Recall first that an = o(1) means that an → 0 as n → ∞. an = o(bn ) means that
an /bn = o(1).
an = O(1) means that an is eventually bounded, that is, for all large n, |an | ≤ C for some
C > 0. an = O(bn ) means that an /bn = O(1).
We write an ∼ bn if both an /bn and bn /an are eventually bounded. In computer science this is written as an = Θ(bn ), but we prefer an ∼ bn since, in statistics, Θ often denotes a parameter space.
Now we move on to the probabilistic versions. Say that Yn = oP (1) if, for every ε > 0,

P(|Yn | > ε) → 0.

Say that Yn = oP (an ) if Yn /an = oP (1).

Say that Yn = OP (1) if, for every ε > 0, there is a C > 0 such that

P(|Yn | > C) ≤ ε.
Say that Yn = OP (an ) if Yn /an = OP (1).
Let’s use Hoeffding’s inequality to show that sample proportions are within OP (1/√n) of the true mean. Let Y1 , . . . , Yn be coin flips, i.e. Yi ∈ {0, 1}. Let p = P(Yi = 1). Let
p̂n = (1/n) ∑_{i=1}^n Yi .
We will show that p̂n − p = oP (1) and p̂n − p = OP (1/√n). We have that

P(|p̂n − p| > ε) ≤ 2e^{−2nε²} → 0
and so p̂n − p = oP (1). Also,

P( √n |p̂n − p| > C ) = P( |p̂n − p| > C/√n ) ≤ 2e^{−2C²} < δ

if we pick C large enough. Hence, √n (p̂n − p) = OP (1) and so

p̂n − p = OP (1/√n).
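Here is a small simulation of this fact (a sketch assuming NumPy; p, the sample sizes, and the number of replications are arbitrary): the fluctuations of p̂n − p shrink to zero, while those of √n (p̂n − p) stay of constant order.

```python
import numpy as np

# Illustrating p̂n − p = oP(1) and sqrt(n)(p̂n − p) = OP(1) for coin flips.
rng = np.random.default_rng(4)
p, reps = 0.4, 20_000

for n in (100, 1_000, 10_000):
    p_hat = rng.binomial(n, p, size=reps) / n
    raw = np.quantile(np.abs(p_hat - p), 0.99)                   # shrinks with n
    scaled = np.quantile(np.sqrt(n) * np.abs(p_hat - p), 0.99)   # roughly constant
    print(n, round(raw, 4), round(scaled, 3))
```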
Make sure you can prove the following: