
CSE 713: Random Graphs and Applications                    Lecturer: Hung Q. Ngo
SUNY at Buffalo, Fall 2003                                  Scribe: Hung Q. Ngo

Lecture 4: Inequalities and Asymptotic Estimates

We draw materials from [2, 5, 8–10, 17, 18]. Unless specified otherwise, we use µ and σ² to denote the
mean and variance of the variable under consideration. This note shall be updated throughout the
seminar as I find more useful inequalities.

1 Basic inequalities
Theorem 1.1 (Markov’s Inequality). If X is a random variable taking only non-negative values, then
for any a > 0,

    Pr[X ≥ a] ≤ E[X]/a.    (1)
Proof. We show this for the discrete case only; the continuous case is similar. By definition, we have

    E[X] = Σ_x x p(x) = Σ_{x<a} x p(x) + Σ_{x≥a} x p(x) ≥ Σ_{x≥a} a p(x) = a Pr[X ≥ a].

Intuitively, when a ≤ E[X] the inequality is trivial. For a > E[X], it says that the larger a is relative
to the mean, the less likely it is that X ≥ a, which matches common sense. A slightly more intuitive
form of (1) is

    Pr[X ≥ aµ] ≤ 1/a.    (2)
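For a quick sanity check of (2), the following Python sketch estimates the tail of an Exponential(1) random variable by simulation and compares it with the bound 1/a; the distribution, sample size, and values of a are arbitrary illustrative choices.

    import random

    # Monte Carlo check of Markov's inequality in the form (2): Pr[X >= a*mu] <= 1/a,
    # for a non-negative random variable; here X ~ Exponential(1), so mu = 1.
    random.seed(0)
    samples = [random.expovariate(1.0) for _ in range(100_000)]
    mu = sum(samples) / len(samples)

    for a in (2, 4, 8):
        tail = sum(x >= a * mu for x in samples) / len(samples)
        print(f"a={a}: empirical Pr[X >= a*mu] = {tail:.4f} <= 1/a = {1/a:.4f}")

For the exponential distribution the true tail is e^{−a}, so the bound 1/a is far from tight; Markov's inequality trades tightness for generality.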
Theorem 1.2 (Chebyshev’s Inequality). If X is a random variable with mean µ and variance σ², then
for any a > 0,

    Pr[|X − µ| ≥ a] ≤ σ²/a².    (3)
Proof. This inequality makes a lot of sense: the bound on the probability that X deviates from its mean
by at least a decreases as a grows and as the variance shrinks. The proof is an almost immediate corollary
of Markov’s inequality. Let Z = (X − µ)², so that E[Z] = σ² by the definition of variance. Since
|X − µ| ≥ a if and only if Z ≥ a², applying Markov’s inequality to Z completes the proof.

Again, there is a more intuitive way of writing (3):

    Pr[|X − µ| ≥ aσ] ≤ 1/a².    (4)
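To see how (2) and (4) compare in practice, the sketch below bounds the same upper-tail event both ways (again with an arbitrary exponential sample); for deviations measured in standard deviations, Chebyshev's bound decays like 1/a² rather than 1/a.

    import random, statistics

    # Compare Markov's bound and Chebyshev's bound on the tail event {X >= mu + a*sigma}.
    # Chebyshev bounds the two-sided event {|X - mu| >= a*sigma}, which contains it.
    random.seed(1)
    xs = [random.expovariate(1.0) for _ in range(100_000)]
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)

    for a in (2, 3, 5):
        empirical = sum(x >= mu + a * sigma for x in xs) / len(xs)
        markov = mu / (mu + a * sigma)   # Markov applied directly to Pr[X >= mu + a*sigma]
        cheby = 1 / a**2                 # Chebyshev bound (4)
        print(f"a={a}: empirical={empirical:.5f}, Markov={markov:.5f}, Chebyshev={cheby:.5f}")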
Theorem 1.3.

    Pr[X = 0] ≤ σ²/(σ² + µ²).    (5)

Proof. We show this for the discrete case; the continuous case is shown similarly. By the Cauchy-Schwarz
inequality,

    µ² = (Σ_{x≠0} x Pr[X = x])² ≤ (Σ_{x≠0} x² Pr[X = x]) (Σ_{x≠0} Pr[X = x]) = (σ² + µ²)(1 − Pr[X = 0]),

and rearranging gives (5).

Theorem 1.4 (One-sided Chebyshev Inequality). Let X be a random variable with E[X] = µ and
Var[X] = σ². Then for any a > 0,

    Pr[X ≥ µ + a] ≤ σ²/(σ² + a²),    (6)
    Pr[X ≤ µ − a] ≤ σ²/(σ² + a²).    (7)

Proof. Let t ≥ −µ be a variable. Then Y = (X + t)² is non-negative and

    E[Y] = E[X²] + 2tµ + t² = σ² + (t + µ)².

Thus, by Markov’s inequality we get

    Pr[X ≥ µ + a] ≤ Pr[Y ≥ (µ + a + t)²] ≤ (σ² + (t + µ)²) / (a + t + µ)².

The rightmost expression is minimized when t = σ²/a − µ, in which case it becomes σ²/(σ² + a²), as
desired. The other inequality is proven similarly (apply the same argument to −X).

A twice-differentiable function f is convex if f''(x) ≥ 0 for all x, and concave if f''(x) ≤ 0 for
all x.

Theorem 1.5 (Jensen’s inequality). Let f(x) be a convex function. Then

    E[f(X)] ≥ f(E[X]).    (8)

The same result holds for multiple random variables.

Proof. Taylor’s theorem gives

    f(x) = f(µ) + f'(µ)(x − µ) + f''(ξ)(x − µ)²/2,

where ξ is some number between x and µ. When f(x) is convex, f''(ξ) ≥ 0, which implies

    f(x) ≥ f(µ) + f'(µ)(x − µ).

Consequently,

    E[f(X)] ≥ f(µ) + f'(µ) E[X − µ] = f(µ).
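As a concrete instance (chosen only for illustration), take the convex function f(x) = e^x and X standard normal; then E[f(X)] = e^{1/2} while f(E[X]) = 1, and a short simulation reproduces the gap.

    import math, random

    # Jensen's inequality with the convex function f(x) = exp(x): E[exp(X)] >= exp(E[X]).
    # For X ~ N(0, 1) the exact values are exp(1/2) ~ 1.6487 and exp(0) = 1.
    random.seed(3)
    xs = [random.gauss(0.0, 1.0) for _ in range(200_000)]

    lhs = sum(math.exp(x) for x in xs) / len(xs)   # estimate of E[f(X)]
    rhs = math.exp(sum(xs) / len(xs))              # f applied to the estimated mean
    print(f"E[exp(X)] ~ {lhs:.4f} >= exp(E[X]) ~ {rhs:.4f}")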

2 Elementary Inequalities and Asymptotic Estimates
Fact 2.1. For p ∈ [0, 1], 1 − p ≤ e^{−p}. The bound is nearly tight when p is small.
Fact 2.2. For any x ∈ [−1, 1], 1 + x ≤ e^x. The bound is nearly tight when x is small.
The following theorem was shown by Robbins [16].
Theorem 2.3 (Stirling’s approximation). For each positive integer n, there is an α_n, where
1/(12n + 1) < α_n < 1/(12n), such that

    n! = √(2πn) (n/e)^n e^{α_n}.    (9)

We often find it useful to remember the asymptotic form of Stirling’s approximation:

    n! = √(2πn) (n/e)^n (1 + o(1)).    (10)
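The quality of (9) is easy to check directly; the short sketch below compares n! with the two Robbins bounds for a few small n (the values of n are arbitrary).

    import math

    # Robbins' bounds: sqrt(2*pi*n)*(n/e)^n * e^{1/(12n+1)} < n! < sqrt(2*pi*n)*(n/e)^n * e^{1/(12n)}.
    for n in (1, 5, 10, 20):
        base = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
        lower = base * math.exp(1 / (12 * n + 1))
        upper = base * math.exp(1 / (12 * n))
        print(f"n={n:2d}: {lower:.6g} < {math.factorial(n)} < {upper:.6g}")

Already at n = 10 the relative error of the bare approximation √(2πn)(n/e)^n is under one percent.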
The following theorem follows from straightforward applications of the Taylor expansions of ln(1 + t)
and ln(1 − t).

Theorem 2.4 (Estimates of ln(1 + t)).

(a) If t > −1, then
    ln(1 + t) ≤ min{t, t − t²/2 + t³/3}.    (11)

(b) If t > 0, then
    ln(1 + t) > t − t²/2.    (12)

(c) If 0 < t ≤ 2/5, then
    ln(1 + t) > t − t²/2 + t³/4.    (13)

(d) If 0 < t ≤ 1/2, then
    ln(1 − t) > −t − t².    (14)

(e) If 0 < t ≤ 2/5, then
    ln(1 − t) > −t − t²/2 − t³/2.    (15)
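The sketch below evaluates the estimates of Theorem 2.4 on a few values of t inside the stated ranges (the grid is an arbitrary choice), which also shows how tight they are.

    import math

    # Numerical look at the estimates of Theorem 2.4.
    for t in (0.05, 0.1, 0.2, 0.4):
        a = min(t, t - t**2 / 2 + t**3 / 3)   # upper bound from (11)
        b = t - t**2 / 2                      # lower bound from (12)
        c = t - t**2 / 2 + t**3 / 4           # lower bound from (13)
        d = -t - t**2                         # lower bound from (14)
        e = -t - t**2 / 2 - t**3 / 2          # lower bound from (15)
        print(f"t={t}: ln(1+t)={math.log1p(t):+.5f}, bounds: upper {a:+.5f}, lower {b:+.5f} and {c:+.5f}")
        print(f"       ln(1-t)={math.log1p(-t):+.5f}, lower bounds: {d:+.5f} and {e:+.5f}")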
Lemma 2.5. Let cosh(x) = (e^x + e^{−x})/2 and sinh(x) = (e^x − e^{−x})/2. Then for all reals α, x with
|α| ≤ 1,

    cosh(x) + α sinh(x) ≤ e^{x²/2 + αx}.    (16)

Proof. This follows from elementary analysis.

Corollary 2.6. The following are often more useful than the general result above.

(i) cosh(t) ≤ e^{t²/2}.

(ii) For all p ∈ [0, 1] and all t,

    p e^{t(1−p)} + (1 − p) e^{−tp} ≤ e^{t²/8}.    (17)

Proof. Firstly, (i) follows from Lemma 2.5 by setting α = 0 and x = t. For (ii), set α = 2p − 1 and
x = t/2 in (16) to get p e^{t/2} + (1 − p) e^{−t/2} ≤ e^{t²/8 + (p − 1/2)t}; multiplying both sides by
e^{−(p−1/2)t} gives (17).
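Inequality (17) is the single-variable estimate behind the Hoeffding-type bounds of the next sections, so it is worth a quick numerical check; the grid below is an arbitrary illustrative choice.

    import math

    # Check p*e^{t(1-p)} + (1-p)*e^{-tp} <= e^{t^2/8} on a grid of (p, t).
    worst = float("-inf")
    for i in range(11):
        p = i / 10
        for k in range(-50, 51):
            t = k / 10
            lhs = p * math.exp(t * (1 - p)) + (1 - p) * math.exp(-t * p)
            worst = max(worst, lhs - math.exp(t * t / 8))
    print(f"largest observed lhs - rhs on the grid: {worst:.3e} (nonpositive, as expected)")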

3 Chernoff bounds
The following idea, due to Chernoff (1952, [6]), is influential in deriving many different “tail inequalities”.

Theorem 3.1 (Chernoff bound). Let X be a random variable with moment generating function M(t) =
E[e^{tX}]. Then,

    Pr[X ≥ a] ≤ e^{−ta} M(t)    for all t > 0,
    Pr[X ≤ a] ≤ e^{−ta} M(t)    for all t < 0.

Proof. The best bound is obtained by minimizing the right-hand side over t. We show the first relation;
the second is similar. When t > 0, Markov’s inequality gives

    Pr[X ≥ a] = Pr[e^{tX} ≥ e^{ta}] ≤ E[e^{tX}] e^{−ta}.

Let us first consider a set of mutually independent Bernoulli random variables X_1, . . . , X_n, where
Pr[X_i = 1] = p_i and Pr[X_i = 0] = 1 − p_i, for 0 < p_i < 1. Let S_n = X_1 + · · · + X_n; then µ =
E[S_n] = p_1 + · · · + p_n. Note that when all p_i = p, S_n has the usual Binomial(n, p) distribution.

Theorem 3.2. Under the above assumptions, write p = µ/n = (p_1 + · · · + p_n)/n. Then for any a > 0
and any t > 0,

    Pr[S_n ≥ a] ≤ e^{−ta} (1 + p(e^t − 1))^n.    (18)

Proof. The proof makes use of Chernoff’s idea: for any t > 0, Markov’s inequality gives

    Pr[S_n ≥ a] = Pr[e^{tS_n} ≥ e^{ta}] ≤ e^{−ta} E[e^{tS_n}] = e^{−ta} E[e^{tX_1 + ··· + tX_n}] = e^{−ta} E[e^{tX_1}] ··· E[e^{tX_n}].    (19)

Note that the independence assumption is crucial. On the other hand,

    f(p_i) := ln(E[e^{tX_i}]) = ln(p_i e^t + (1 − p_i)) = ln(1 + p_i(e^t − 1))

is concave in p_i, which, by Jensen’s inequality applied to the concave function f, implies

    Σ_{i=1}^n ln(E[e^{tX_i}]) = Σ_{i=1}^n f(p_i) ≤ n f(p) = n ln(1 + p(e^t − 1)).

Exponentiating both sides and recalling inequality (19), we get

    Pr[S_n ≥ a] ≤ e^{−ta} (1 + p(e^t − 1))^n,

as desired.
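For the Binomial case (all p_i equal to p), one can simply minimize the right-hand side of (18) over t numerically and compare with the exact tail; the sketch below does this for arbitrary illustrative values of n, p, and a.

    import math

    # Chernoff bound (18) for Sn ~ Binomial(n, p): Pr[Sn >= a] <= min_{t>0} e^{-ta} (1 + p(e^t - 1))^n.
    # A coarse grid search over t is enough for an illustration.
    n, p = 100, 0.5
    for a in (60, 70, 80):
        exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, n + 1))
        bound = min(math.exp(-t * a) * (1 + p * (math.exp(t) - 1))**n
                    for t in (k / 100 for k in range(1, 301)))
        print(f"a={a}: exact tail = {exact:.3e}, optimized Chernoff bound = {bound:.3e}")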

Theorem 3.3. Let X_1, . . . , X_n be mutually independent random variables with |X_i| ≤ c_i and E[X_i] =
0, where each c_i > 0 is a constant depending only on i. Let S = X_1 + · · · + X_n. Then, for any a > 0,

    Pr[S ≥ a] ≤ exp(−a²/(2(c_1² + · · · + c_n²))).    (20)

Proof. For any t > 0, Chernoff’s bound gives

    Pr[S ≥ a] ≤ e^{−ta} E[e^{tS}] = e^{−ta} E[e^{tX_1 + ··· + tX_n}] = e^{−ta} E[e^{tX_1}] ··· E[e^{tX_n}].

Note that for x ∈ [−c, c], we have e^{tx} ≤ f(x), where

    f(x) = (e^{ct} + e^{−ct})/2 + ((e^{ct} − e^{−ct})/(2c)) x = cosh(ct) + (x/c) sinh(ct).

To see e^{tx} ≤ f(x), note that y = f(x) is the chord through the points x = −c and x = c of the convex
curve y = e^{tx}. Thus, taking c = c_i and using the fact that f is affine,

    E[e^{tX_i}] ≤ E[f(X_i)] = f(E[X_i]) = f(0) = cosh(c_i t) ≤ e^{(c_i t)²/2}.

Consequently,

    Pr[S ≥ a] ≤ e^{−ta} e^{(c_1² + ··· + c_n²) t²/2}.

Picking t = a/(Σ_i c_i²) minimizes the right-hand side and gives the desired result.
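A small simulation illustrates how conservative (20) can be; the X_i below are uniform on [−c_i, c_i], an arbitrary bounded, mean-zero choice.

    import math, random

    # Check Pr[S >= a] <= exp(-a^2 / (2 * sum_i c_i^2)) for bounded, mean-zero X_i.
    random.seed(4)
    c = [1.0, 2.0, 0.5, 1.5] * 25           # the bounds c_1, ..., c_100 (arbitrary values)
    a = 20.0
    trials = 50_000

    hits = sum(sum(random.uniform(-ci, ci) for ci in c) >= a for _ in range(trials))
    bound = math.exp(-a * a / (2 * sum(ci * ci for ci in c)))
    print(f"empirical Pr[S >= {a}] = {hits / trials:.5f} <= bound = {bound:.5f}")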

4 Martingale Tail Inequalities


Theorem 4.1 (Kolmogorov-Doob Inequality). Let X_0, X_1, . . . be a martingale sequence. Then, for
any a > 0,

    Pr[ max_{0≤i≤n} X_i ≥ a ] ≤ E[|X_n|]/a.    (21)
Proof. TBD.

The following result was shown by Hoeffding (1963, [12]) and Azuma (1967, [3]).

Theorem 4.2 (Hoeffding-Azuma Inequality). Let X_0, . . . , X_n be a martingale sequence such that for
each k = 1, . . . , n,

    |X_k − X_{k−1}| ≤ c_k,    (22)

where each c_k is a constant depending only on k. Then, for all m ≥ 0 and a > 0,

    Pr[|X_m − X_0| ≥ a] ≤ 2 exp(−a²/(2 Σ_{k=1}^m c_k²)).    (23)

Condition (22) on a martingale sequence is often called the Lipschitz condition.

Proof. Let F_0 ⊆ F_1 ⊆ · · · ⊆ F_n be a filtration corresponding to the martingale sequence, i.e.,

    E[X_k | F_{k−1}] = X_{k−1},  or equivalently  E[X_k − X_{k−1} | F_{k−1}] = 0.

Note also that X_i is F_j-measurable for all j ≥ i, i.e., X_i is constant on the elementary events of F_j.
Hence, for any function f of X_i, we have E[f(X_i) | F_j] = f(X_i) for all j ≥ i.
For k = 1, . . . , n, let Y_k = X_k − X_{k−1}. Then X_m − X_0 = Y_1 + · · · + Y_m and |Y_k| ≤ c_k. It is easy
to see that, for any t > 0,

    E[e^{tY_1 + ··· + tY_m}] = E[ e^{tY_1 + ··· + tY_{m−1}} E[e^{tY_m} | F_{m−1}] ].

We first bound the upper tail, proceeding in the same way as in the proof of Theorem 3.3. For any
t > 0, the Chernoff bound gives

    Pr[Y_1 + · · · + Y_m ≥ a] ≤ e^{−ta} E[e^{tY_1 + ··· + tY_m}]
                              = e^{−ta} E[ e^{tY_1 + ··· + tY_{m−1}} E[e^{tY_m} | F_{m−1}] ]
                              ≤ e^{−ta} e^{c_m² t²/2} E[ e^{tY_1 + ··· + tY_{m−1}} ]
                              ≤ e^{−ta} e^{(c_1² + ··· + c_m²) t²/2}.

Here E[e^{tY_m} | F_{m−1}] ≤ e^{c_m² t²/2} follows from the chord argument in the proof of Theorem 3.3,
since E[Y_m | F_{m−1}] = 0 and |Y_m| ≤ c_m; the last line follows by repeating the argument (induction
on m). The rest is the same as in Theorem 3.3, and we get half of the right-hand side of (23). To show
the same upper bound for Pr[X_m − X_0 ≤ −a], we can just let Y_k = X_{k−1} − X_k.
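For illustration (the choice of martingale is ours, not taken from the notes), take X_k = X_{k−1} + ε_k c_k where the ε_k are independent fair ±1 signs; this clearly satisfies (22), and the sketch below compares the empirical deviation probability of X_m − X_0 with the bound (23).

    import math, random

    # Hoeffding-Azuma check: Pr[|X_m - X_0| >= a] <= 2 exp(-a^2 / (2 * sum_k c_k^2))
    # for the martingale X_k = X_{k-1} + eps_k * c_k with independent fair signs eps_k.
    random.seed(5)
    c = [1.0 + 0.01 * k for k in range(100)]   # illustrative Lipschitz constants c_1, ..., c_100
    a = 30.0
    trials = 50_000

    hits = sum(abs(sum(random.choice((-1, 1)) * ck for ck in c)) >= a for _ in range(trials))
    bound = 2 * math.exp(-a * a / (2 * sum(ck * ck for ck in c)))
    print(f"empirical Pr[|X_m - X_0| >= {a}] = {hits / trials:.5f} <= bound = {bound:.5f}")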

We next develop two more general versions of tail inequalities for martingales: one comes from
Maurey (1979, [15]), the other from Alon-Kim-Spencer (1997, [1]).
Let A, B be finite sets, and let A^B denote the set of all mappings from B into A. (It might be instructive
to try to explain the choice of the notation A^B on your own.) For example, if B is the edge set of a graph
G and A = {0, 1}, then A^B can be thought of as the set of all spanning subgraphs of G.
Now, let Ω = A^B, and define a probability measure on Ω by specifying values p_{ab} with
Σ_{a∈A} p_{ab} = 1 for each b, and letting a random g ∈ A^B satisfy

    Pr[g(b) = a] = p_{ab},

where the values g(b), b ∈ B, are mutually independent.


Fix a gradation ∅ = B_0 ⊂ B_1 ⊂ · · · ⊂ B_m = B. (In the simplest case, |B_i − B_{i−1}| = 1, m = |B|,
and thus the gradation defines a total order on B.) The gradation induces a filtration F_0 ⊆ F_1 ⊆ · · · ⊆
F_m on Ω, where the elementary events of F_i are the sets of functions from B into A whose restrictions
to B_i are identical. Thus, there are |A|^{|B_i|} elementary events for F_i, each corresponding to a distinct
element of A^{B_i}.
Finally, let L : A^B → R be a functional (like χ, ω, α in the G(n, p) case), which can be thought of
as a random variable on Ω. The sequence X_i = E[L | F_i] is a martingale. It is easy to see
that X_0 = E[L] and X_m = L.
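For a tiny concrete case (purely illustrative), take A = {0, 1}, |B| = 3 with the singleton gradation, uniform p_{ab} = 1/2, and L(g) equal to the number of b with g(b) = 1; the brute-force sketch below computes X_i = E[L | F_i] and verifies X_0 = E[L], X_m = L, and the martingale property.

    import itertools

    # Brute-force illustration of the martingale X_i = E[L | F_i] on Omega = A^B,
    # with A = {0, 1}, |B| = 3, B_i = {1, ..., i}, and p_{ab} = 1/2 for all a, b.
    A, m = (0, 1), 3
    omega = list(itertools.product(A, repeat=m))   # all g: B -> A, each with probability 1/2^m

    def L(g):
        return sum(g)                              # number of b with g(b) = 1

    def X(i, g):
        # X_i(g) = E[L | F_i](g): the average of L over all h agreeing with g on B_i
        block = [h for h in omega if h[:i] == g[:i]]
        return sum(L(h) for h in block) / len(block)

    for g in omega:
        assert X(0, g) == 1.5 and X(m, g) == L(g)  # X_0 = E[L] = 3/2 and X_m = L
        for i in range(1, m + 1):
            # martingale property: averaging X_i over the F_{i-1}-block containing g gives X_{i-1}(g)
            block = [h for h in omega if h[:i - 1] == g[:i - 1]]
            assert abs(sum(X(i, h) for h in block) / len(block) - X(i - 1, g)) < 1e-12
    print("verified X_0 = E[L], X_m = L, and E[X_i | F_{i-1}] = X_{i-1} on all", len(omega), "outcomes")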
Definition 4.3. The functional L is said to satisfy the Lipschitz condition relative to the gradation if,
for every k ∈ [m],

    g and h differ only on B_k − B_{k−1}   implies   |L(g) − L(h)| ≤ c_k,

where each c_k is a constant depending only on k.

The following lemma helps generalize the Hoeffding-Azuma inequality.

Lemma 4.4. If L satisfies the Lipschitz condition, then the corresponding martingale satisfies

    |X_k − X_{k−1}| ≤ c_k   for all k ∈ [m].

Proof. TBD.

Corollary 4.5 (Generalized Hoeffding-Azuma Inequality). In the setting of Lemma 4.4, let µ = E[L].
Then, for all a > 0,

    Pr[L ≥ µ + a] ≤ exp(−a²/(2 Σ_{k=1}^m c_k²)),    (24)

and

    Pr[L ≤ µ − a] ≤ exp(−a²/(2 Σ_{k=1}^m c_k²)).    (25)

Proof. This follows directly from Lemma 4.4 and Theorem 4.2.

5 Lovász Local Lemma
Let A_1, . . . , A_n be events on an arbitrary probability space. A directed graph G = (V, E) with V =
[n] is called a dependency digraph for A_1, . . . , A_n if each A_i is mutually independent of the collection
of events {A_j | (i, j) ∉ E}. (In other words, A_i may depend only on its neighbors in G.) The following
lemma, often referred to as the Lovász Local Lemma, was originally shown in Erdős and Lovász (1975, [8]).
The lemma is very useful for showing that a certain event has positive, albeit possibly exponentially small,
probability. It is most useful when the dependency digraph has small maximum degree.

Lemma 5.1 (Lovász Local Lemma). Let G = (V, E) be a dependency digraph for the events A_1, . . . , A_n.
Suppose there are real numbers α_1, . . . , α_n such that 0 ≤ α_i < 1 for all i, and

    Pr[A_i] ≤ α_i ∏_{j:(i,j)∈E} (1 − α_j).

Then,

(a) For all S ⊂ [n] with |S| = s < n, and any i ∉ S,

    Pr[ A_i | ∧_{j∈S} Ā_j ] ≤ α_i.    (26)

(b) Moreover, the probability that none of the A_i happens is positive. In particular,

    Pr[ ∧_{i=1}^n Ā_i ] ≥ ∏_{i=1}^n (1 − α_i).    (27)

Proof. Firstly, we show that (a) implies (b). This follows as

    Pr[ ∧_{i=1}^n Ā_i ] = Pr[Ā_1] · Pr[Ā_2 | Ā_1] ··· Pr[Ā_n | ∧_{j=1}^{n−1} Ā_j]
                        = (1 − Pr[A_1]) (1 − Pr[A_2 | Ā_1]) ··· (1 − Pr[A_n | ∧_{j=1}^{n−1} Ā_j])
                        ≥ (1 − α_1)(1 − α_2) ··· (1 − α_n).

To show (a), we induct on s = |S|. There is nothing to do for s = 0, since then the bound is just the
hypothesis of the lemma. For s ≥ 1, assume that (26) holds for all sets of size at most s − 1, and
consider some S with |S| = s. Let D_i = {j ∈ S | (i, j) ∈ E} and D̄_i = S − D_i. We have

    Pr[ A_i | ∧_{j∈S} Ā_j ] = Pr[ A_i | (∧_{j∈D_i} Ā_j) ∧ (∧_{j∈D̄_i} Ā_j) ]
                            = Pr[ A_i ∧ (∧_{j∈D_i} Ā_j) | ∧_{j∈D̄_i} Ā_j ] / Pr[ ∧_{j∈D_i} Ā_j | ∧_{j∈D̄_i} Ā_j ].

We first bound the numerator:

    Pr[ A_i ∧ (∧_{j∈D_i} Ā_j) | ∧_{j∈D̄_i} Ā_j ] ≤ Pr[ A_i | ∧_{j∈D̄_i} Ā_j ] = Pr[A_i] ≤ α_i ∏_{j:(i,j)∈E} (1 − α_j),

where the middle equality holds because A_i is mutually independent of {A_j | (i, j) ∉ E} ⊇ {A_j | j ∈ D̄_i}.

Next, the denominator (which is 1 if D_i = ∅) can be bounded using the induction hypothesis; each
conditioning set below has size at most s − 1, so (26) applies. Suppose D_i = {j_1, . . . , j_k}; then

    Pr[ ∧_{j∈D_i} Ā_j | ∧_{j∈D̄_i} Ā_j ]
        = (1 − Pr[ A_{j_1} | ∧_{j∈D̄_i} Ā_j ]) (1 − Pr[ A_{j_2} | ∧_{j∈D̄_i∪{j_1}} Ā_j ]) ···
          ··· (1 − Pr[ A_{j_k} | ∧_{j∈D̄_i∪(D_i−{j_k})} Ā_j ])
        ≥ ∏_{j∈D_i} (1 − α_j)
        ≥ ∏_{j:(i,j)∈E} (1 − α_j).

Dividing the bound on the numerator by the bound on the denominator gives Pr[ A_i | ∧_{j∈S} Ā_j ] ≤ α_i,
which completes the induction.

As we have mentioned earlier, the Local Lemma is most useful when the maximum degree of a
dependency graph is small. We now give a particular version of the Lemma which helps us make use of
this observation:

Corollary 5.2 (Local Lemma; Symmetric Case). Suppose each event A_i is mutually independent of all
other events except for at most ∆ of them (i.e., the dependency digraph has maximum degree at most ∆),
and that Pr[A_i] ≤ p for all i = 1, . . . , n. If

    e p (∆ + 1) ≤ 1,    (28)

then Pr[ ∧_{i=1}^n Ā_i ] > 0.

Proof. The case ∆ = 0 is trivial. Otherwise, take α_i = 1/(∆ + 1) (which is < 1) in the Local Lemma.
We have

    Pr[A_i] ≤ p ≤ 1/(e(∆ + 1)) ≤ α_i (1 − 1/(∆ + 1))^∆ ≤ α_i ∏_{j:(i,j)∈E} (1 − α_j).

Here we have used the fact that, for ∆ ≥ 1, (1 − 1/(∆ + 1))^∆ > 1/e, which follows from (14) with
t = 1/(∆ + 1) ≤ 1/2.
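A standard illustrative application (not discussed further in these notes) is k-SAT: if every clause has k literals on distinct variables and shares a variable with at most ∆ other clauses, let A_i be the event that clause i is violated by a uniform random assignment, so Pr[A_i] = 2^{−k} and A_i is mutually independent of the clauses sharing no variable with it. The sketch below simply checks condition (28) for this choice of p.

    import math

    # Symmetric Local Lemma for k-SAT: p = 2^{-k}, dependency degree at most Delta.
    # Condition (28), e * p * (Delta + 1) <= 1, guarantees a satisfying assignment exists.
    def lll_condition_holds(k, delta):
        return math.e * 2.0 ** (-k) * (delta + 1) <= 1

    for k in (3, 5, 8):
        max_delta = int(2 ** k / math.e) - 1   # largest Delta satisfying (28) for this k
        print(f"k={k}: Delta={max_delta} -> {lll_condition_holds(k, max_delta)}, "
              f"Delta={max_delta + 1} -> {lll_condition_holds(k, max_delta + 1)}")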
For applications of the Lovász Local Lemma and its algorithmic aspects, see Beck [4] and others [7, 11,
13, 14].

References
[1] N. Alon, J.-H. Kim, and J. Spencer, Nearly perfect matchings in regular simple hypergraphs, Israel J. Math., 100
(1997), pp. 171–187.

[2] N. Alon and J. H. Spencer, The probabilistic method, Wiley-Interscience Series in Discrete Mathematics and Opti-
mization, Wiley-Interscience [John Wiley & Sons], New York, second ed., 2000. With an appendix on the life and work
of Paul Erdős.

[3] K. Azuma, Weighted sums of certain dependent random variables, Tôhoku Math. J. (2), 19 (1967), pp. 357–367.

[4] J. Beck, An algorithmic approach to the Lovász local lemma. I, Random Structures Algorithms, 2 (1991), pp. 343–365.

[5] B. Bollobás, Random graphs, vol. 73 of Cambridge Studies in Advanced Mathematics, Cambridge University Press,
Cambridge, second ed., 2001.

[6] H. Chernoff, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, Ann. Math.
Statistics, 23 (1952), pp. 493–507.

[7] A. Czumaj and C. Scheideler, Coloring nonuniform hypergraphs: a new algorithmic approach to the general Lovász
local lemma, in Proceedings of the Ninth International Conference “Random Structures and Algorithms” (Poznan, 1999),
vol. 17, 2000, pp. 213–237.

[8] P. Erdős and L. Lovász, Problems and results on 3-chromatic hypergraphs and some related questions, in Infinite
and finite sets (Colloq., Keszthely, 1973; dedicated to P. Erdős on his 60th birthday), Vol. II, North-Holland, Amsterdam,
1975, pp. 609–627. Colloq. Math. Soc. János Bolyai, Vol. 10.

[9] W. Feller, An introduction to probability theory and its applications. Vol. I, John Wiley & Sons Inc., New York, 1968.

[10] W. Feller, An introduction to probability theory and its applications. Vol. II, John Wiley & Sons Inc., New York, 1971.

[11] V. Guruswami, J. Håstad, and M. Sudan, Hardness of approximate hypergraph coloring, SIAM J. Comput., 31
(2002), pp. 1663–1686 (electronic).

[12] W. Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc., 58 (1963),
pp. 13–30.

[13] M. Krivelevich and V. H. Vu, Choosability in random hypergraphs, J. Combin. Theory Ser. B, 83 (2001), pp. 241–
257.

[14] T. Leighton, C.-J. Lu, S. Rao, and A. Srinivasan, New algorithmic aspects of the local lemma with applications
to routing and partitioning, SIAM J. Comput., 31 (2001), pp. 626–641 (electronic).

[15] B. Maurey, Construction de suites symétriques, C. R. Acad. Sci. Paris Sér. A-B, 288 (1979), pp. A679–A681.

[16] H. Robbins, A remark on Stirling’s formula, Amer. Math. Monthly, 62 (1955), pp. 26–29.

[17] S. Ross, A first course in probability, Macmillan Co., New York, second ed., 1984.

[18] S. M. Ross, Introduction to probability models, Harcourt/Academic Press, San Diego, CA, seventh ed., 2000.
