ee5110-lecture-limit-theorems
2. Chebyshev Inequality
• “If the variance of X is small, then X is unlikely to take values away
from the mean”
• If X is a random variable with mean µ and variance σ², then
$$P(|X - \mu| \ge a) \le \frac{\sigma^2}{a^2} \quad \text{for all } a > 0$$
• Example: Let X be uniformly distributed in the interval [0, 4]. Find a bound on P(X ≥ 2), P(X ≥ 3) and P(X ≥ 4) using the Chebyshev inequality. (Ans: 1, 1 and 1/3.)
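A quick Monte Carlo check of this example (a sketch assuming NumPy; the sample size and seed are arbitrary):

```python
import numpy as np

# Chebyshev check for X ~ Uniform[0, 4]: mu = 2, sigma^2 = (4 - 0)^2 / 12 = 4/3.
rng = np.random.default_rng(0)
x = rng.uniform(0, 4, size=1_000_000)
mu, var = 2.0, 4 / 3

for t in [2, 3, 4]:
    a = t - mu                                        # P(X >= t) <= P(|X - mu| >= a)
    bound = 1.0 if a <= 0 else min(1.0, var / a**2)   # bound is vacuous when a = 0
    print(f"P(X >= {t}): empirical = {np.mean(x >= t):.3f}, Chebyshev bound = {bound:.3f}")
```

The printed bounds are 1, 1 and 1/3, against exact probabilities 0.5, 0.25 and 0.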
3. Weak Law of Large Numbers
• “Sample mean of i.i.d. r.v.s is likely to be close to the true mean”
• Let X1 , X2 , · · · be i.i.d. r.v.s with mean µ and variance σ². Define the sample mean as
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$
For every a > 0, we have
$$P(|\bar{X}_n - \mu| \ge a) = P\left(\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| \ge a\right) \to 0 \quad \text{as } n \to \infty$$
Figure 1: Distribution of X̄n for different values of n (n = 10, 20, 30, 40, 60 and 100) when the Xi are Bernoulli with p = 1/2. Observe that the sample mean concentrates around p = 1/2.
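A short simulation in the spirit of Figure 1 (a sketch assuming NumPy; the values of n, the tolerance a, and the number of runs are illustrative choices):

```python
import numpy as np

# WLLN check: for Bernoulli(1/2) samples, P(|X_bar_n - mu| >= a) shrinks with n.
rng = np.random.default_rng(1)
mu, a, runs = 0.5, 0.05, 10_000

for n in [10, 100, 1_000, 10_000]:
    xbar = rng.binomial(n, mu, size=runs) / n    # sample means across many runs
    print(f"n = {n:5d}: P(|X_bar_n - mu| >= {a}) ≈ "
          f"{np.mean(np.abs(xbar - mu) >= a):.4f}")
```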
4. Example: Consider an event A with probability p = P(A), and perform n independent trials of the underlying experiment. Let Mn be the fraction of time that event A occurs; Mn is often called the empirical frequency of A. The weak law applies and shows that when n is large, the empirical frequency is, with high probability, within ϵ of p. Empirical frequencies are faithful estimates of the probability of an event!
5. Example: Let p be the fraction of voters who support a particular candidate for office. We interview n “randomly selected” voters and record Mn , the fraction of them that support the candidate. Show that with a sample size of n = 100, the probability that our estimate of p is off by more than 0.1 (accuracy) is no larger than 0.25 (confidence). Suppose we would like to have high confidence (probability at least 95%) that our estimate is very accurate (within 0.01 of p). How many voters should be sampled? (Hint: 50,000.)
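A worked computation of both parts, using the bound P(|Mn − p| ≥ ϵ) ≤ Var(Mn)/ϵ² ≤ 1/(4nϵ²), since Var(Mn) = p(1 − p)/n ≤ 1/(4n) (plain Python; a sketch, not part of the original notes):

```python
import math

def chebyshev_failure_bound(n, eps):
    """Bound on P(|M_n - p| >= eps), using Var(M_n) = p(1-p)/n <= 1/(4n)."""
    return min(1.0, 1 / (4 * n * eps**2))

# Part 1: n = 100 voters, accuracy 0.1 -> bound 0.25.
print(chebyshev_failure_bound(100, 0.1))

# Part 2: smallest n with failure probability <= 0.05 at accuracy 0.01.
eps, delta = 0.01, 0.05
print(math.ceil(1 / (4 * delta * eps**2)))       # 50000
```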
2 Convergence in Probability
1. Convergence in Probability:
• Let Y1 , Y2 , · · · be a sequence of random variables. We say that Yn
converges to a (a real number) in probability if, for every ϵ > 0,
$$\lim_{n \to \infty} P(|Y_n - a| \ge \epsilon) = 0$$
In this case, we write $Y_n \xrightarrow{\text{i.p.}} a$ or $Y_n \xrightarrow{P} a$.
– If Yn (ω) → a for all ω ∈ Ω, we say Yn converges pointwise to
a. Here, the notion of convergence considered is the standard
definition of convergence of a sequence of real numbers.
2. Example: Let P(Yn = 1) = 1/n = 1 − P(Yn = 0). Then, $Y_n \xrightarrow{P} 0$.
3. Example: Let P(Yn = n) = 1/n = 1 − P(Yn = 0). Then, $Y_n \xrightarrow{P} 0$, even though E[Yn ] = 1 for all n.
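A small simulation of Example 3 above (a sketch assuming NumPy), showing that Yn concentrates at 0 while its mean stays at 1:

```python
import numpy as np

# Example 3: P(Y_n = n) = 1/n, P(Y_n = 0) = 1 - 1/n.
# Y_n -> 0 in probability, yet E[Y_n] = n * (1/n) = 1 for every n.
rng = np.random.default_rng(2)

for n in [10, 100, 1_000]:
    y = np.where(rng.random(100_000) < 1 / n, n, 0)   # draws of Y_n
    print(f"n = {n:4d}: P(Y_n >= 0.5) ≈ {np.mean(y >= 0.5):.4f}, "
          f"E[Y_n] ≈ {np.mean(y):.3f}")
```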
4. Example: (WLLN) The sample mean {X̄n } converges to the true mean µ in probability: $\bar{X}_n \xrightarrow{P} \mu$.
5. Example: Let X1 , X2 , · · · be i.i.d. uniform random variables on [0, 1]. Define, for all n, Yn = min(X1 , · · · , Xn ). Then, $Y_n \xrightarrow{\text{i.p.}} 0$.
6. Example: Let X1 , X2 , · · · be i.i.d. uniform random variables on [0, 1]. Define, for all n, Yn = max(X1 , · · · , Xn ). Then, $Y_n \xrightarrow{\text{i.p.}} 1$.
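A direct check of Examples 5 and 6 (a sketch assuming NumPy; thresholds and sample sizes are arbitrary):

```python
import numpy as np

# min and max of n i.i.d. Uniform[0,1] variables: the min drifts to 0
# and the max drifts to 1 as n grows.
rng = np.random.default_rng(3)

for n in [10, 100, 1_000]:
    x = rng.random((10_000, n))                # 10,000 runs of n uniforms each
    mins, maxs = x.min(axis=1), x.max(axis=1)
    print(f"n = {n:4d}: P(min >= 0.05) ≈ {np.mean(mins >= 0.05):.4f}, "
          f"P(max <= 0.95) ≈ {np.mean(maxs <= 0.95):.4f}")
```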
7. Properties: Let Xn → a and Yn → b in probability. Then,
• Xn + Yn → a + b in probability
• If g(·) is continuous, then g(Xn ) → g(a) in probability
• E[Xn ] need not converge to a
8. Let Y1 , Y2 , · · · be a sequence of random variables. We say that Yn con-
verges to Y (a random variable) in probability if, for every ϵ > 0,
$$\lim_{n \to \infty} P(|Y_n - Y| \ge \epsilon) = 0$$
In this case, we write $Y_n \xrightarrow{\text{i.p.}} Y$ or $Y_n \xrightarrow{P} Y$.
• Convergence in probability characterizes the behaviour of the se-
quence of random variables {Yn } individually (in terms of their marginal
distributions) and their relationship to a limiting random variable
Y , but it does not imply much about the joint distribution of the
sequence {Yn }.
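As a concrete instance (a hypothetical example, not from the notes): Yn = Y + Wn /n, with Wn i.i.d. standard normal noise, converges in probability to the random variable Y:

```python
import numpy as np

# Y_n = Y + W_n / n converges in probability to Y: the gap |Y_n - Y| = |W_n| / n
# exceeds any fixed eps with vanishing probability as n grows.
rng = np.random.default_rng(4)
y = rng.standard_normal(100_000)               # the limiting random variable Y

for n in [1, 10, 100]:
    yn = y + rng.standard_normal(100_000) / n  # Y_n for this n
    print(f"n = {n:3d}: P(|Y_n - Y| >= 0.1) ≈ {np.mean(np.abs(yn - y) >= 0.1):.4f}")
```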
3 Convergence in Distribution
1. A sequence of random variables Y1 , Y2 , · · · is said to converge to a random variable Y in distribution if
$$\lim_{n \to \infty} F_{Y_n}(y) = F_Y(y)$$
at every point y where F_Y is continuous. In this case, we write $Y_n \xrightarrow{D} Y$.
2. Central Limit Theorem: Let X1 , X2 , · · · be i.i.d. random variables with mean µ and variance σ². Note that
• $\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)$ has variance $\frac{\sigma^2}{n}$
• $\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(X_i - \mu)$ has variance $\sigma^2$
Define $Z_n = \frac{1}{\sigma\sqrt{n}}\sum_{i=1}^{n}(X_i - \mu)$, so that Zn has zero mean and unit variance. The CDF of Zn converges to the standard normal CDF, i.e.,
$$Z_n \xrightarrow{D} Z, \quad \text{where } Z \sim N(0, 1).$$
3. CLT is quite general and very useful!
• The (standardized) sum of a large number of i.i.d. random variables is approximately normal, regardless of the distribution of the individual terms.
4. Characteristic function of a random variable X:
$$\Phi_X(t) = E\left[e^{itX}\right] = \begin{cases} \int_{-\infty}^{\infty} e^{itx} f_X(x)\, dx, & X \text{ is continuous} \\ \sum_{x} e^{itx} p_X(x), & X \text{ is discrete} \end{cases}$$
$$\Phi_X^{(k)}(0) = \left. \frac{d^k \Phi_X(t)}{dt^k} \right|_{t=0} = i^k E[X^k]$$
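A numerical sanity check of the moment identity (a sketch assuming NumPy; Uniform[0, 1] is an arbitrary test case, with E[X] = 1/2 and E[X²] = 1/3):

```python
import numpy as np

# Check Phi_X^{(k)}(0) = i^k E[X^k] for X ~ Uniform[0, 1], estimating
# Phi_X(t) = E[exp(itX)] by Monte Carlo and derivatives by finite differences
# (the same samples are reused across t, so the differences cancel noise).
rng = np.random.default_rng(5)
x = rng.random(1_000_000)
phi = lambda t: np.mean(np.exp(1j * t * x))   # empirical characteristic function

h = 1e-3
d1 = (phi(h) - phi(-h)) / (2 * h)             # ≈ Phi'(0)  = i * E[X]
d2 = (phi(h) - 2 * phi(0) + phi(-h)) / h**2   # ≈ Phi''(0) = -E[X^2]
print("E[X]   ≈", (d1 / 1j).real)             # ~0.5
print("E[X^2] ≈", (-d2).real)                 # ~0.333
```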
5. Proof of CLT:
Given $Z_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{X_i - \mu}{\sigma}$.
Consider $Y_1 = \frac{X_1 - \mu}{\sigma}$, which is a zero-mean random variable with unit variance. Using a Taylor series expansion around 0, we have
$$\Phi_{Y_1}(t) = \Phi_{Y_1}(0) + \Phi_{Y_1}^{(1)}(0)\, t + \Phi_{Y_1}^{(2)}(0)\, \frac{t^2}{2!} + o(t^2) = 1 + 0 - \frac{t^2}{2} + o(t^2)$$
Then, $\Phi_{Z_n}(t)$ is
$$\Phi_{Z_n}(t) = \Phi_{Y_1}^{n}\left(\frac{t}{\sqrt{n}}\right) = \left(1 - \frac{t^2}{2n} + o\left(\frac{t^2}{n}\right)\right)^{n} \to e^{-t^2/2} = \Phi_Z(t)$$
where Z ∼ N(0, 1). This (using Lévy's continuity theorem) implies that Zn converges in distribution to Z, a standard normal random variable.
6. Illustration of Convolution and CLT:
Figure 3: Illustration of the CLT. The figure plots the density of the (scaled) sum of i.i.d. random variables for different values of n, for three different starting densities.
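A compact simulation in the spirit of Figure 3 and the proof above (a sketch assuming NumPy; Exponential(1), with µ = σ = 1, is an arbitrary starting density):

```python
import numpy as np

# Z_n = (1/sqrt(n)) * sum of (X_i - mu)/sigma for X_i i.i.d. Exponential(1):
# its statistics approach those of N(0, 1) as n grows.
rng = np.random.default_rng(6)

for n in [1, 5, 50, 500]:
    x = rng.exponential(1.0, size=(100_000, n))
    zn = (x - 1.0).sum(axis=1) / np.sqrt(n)      # standardized sum
    print(f"n = {n:3d}: mean ≈ {zn.mean():+.3f}, var ≈ {zn.var():.3f}, "
          f"P(Z_n <= 1) ≈ {np.mean(zn <= 1):.3f} (normal: 0.841)")
```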
5 Convergence with Probability One
1. Let Y1 , Y2 , · · · be a sequence of random variables. We say that Yn con-
verges to another random variable Y with probability one (or almost
surely) if
P ({ω : Yn (ω) → Y (ω)}) = 1
If Y = c, then we say $Y_n \xrightarrow{\text{w.p.1}} c$ or $Y_n \xrightarrow{\text{a.s.}} c$.
Proof (that X̄n → µ with probability one): Suppose that the fourth moment of Xn exists, i.e., E[Xn⁴] = K < ∞. Without loss of generality, assume that µ = 0. Define $S_n = \sum_{i=1}^{n} X_i$. Then,
$$E[S_n^4] = n\, E[X_1^4] + 6\binom{n}{2} E^2[X_1^2] \le nK + 3n(n-1)K,$$
since $E^2[X_1^2] \le E[X_1^4] = K$. Dividing by n⁴, we have
$$E\left[\frac{S_n^4}{n^4}\right] \le \frac{K}{n^3} + \frac{3K}{n^2}$$
and,
$$\sum_{n=1}^{\infty} E\left[\frac{S_n^4}{n^4}\right] \le \sum_{n=1}^{\infty}\left(\frac{K}{n^3} + \frac{3K}{n^2}\right) < \infty$$
so that $\sum_n S_n^4/n^4$ is finite w.p.1. Consequently, $S_n^4/n^4 \to 0$ w.p.1, and $S_n/n \to 0$ w.p.1.
Hence, X̄n → µ with probability one.
7. Some properties
• Convergence with probability one implies convergence in probabil-
ity, but convergence in probability does not imply convergence with
probability one.
• Let Xn → a and Yn → b w.p.1.
– Xn + Yn → a + b w.p.1.
– If g(·) is continuous, then g(Xn ) → g(a) w.p.1.
– E[Xn ] need not converge to a
8. For the experiments given below, compare the time average of the outcome
with the expected outcome (ensemble average) at any time n.
(a) X1 is Bernoulli with mean 0.5, and X2 = X1 , X3 = X1 , X4 = X1 , · · ·
(b) X1 is Bernoulli with mean 0.5, and Xi = X1 if i is odd, Xi = 1 − X1 if i is even.
(c) {Xn } are i.i.d. Bernoulli with mean 0.5.
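A simulation sketch of the three experiments (assuming NumPy; it prints the time average after n steps against the ensemble average of 0.5):

```python
import numpy as np

# Time average (1/n) * sum_{i<=n} X_i for each experiment. The ensemble
# average E[X_n] = 0.5 in all three cases, but only in (b) and (c) does the
# time average settle at 0.5 -- in (a) it sticks at X_1 (0 or 1).
rng = np.random.default_rng(7)
n = 10_000
x1 = rng.integers(0, 2)

a = np.full(n, x1)                                        # (a) X_i = X_1 forever
b = np.where(np.arange(1, n + 1) % 2 == 1, x1, 1 - x1)    # (b) alternating X_1, 1 - X_1
c = rng.integers(0, 2, size=n)                            # (c) i.i.d. Bernoulli(1/2)

for name, seq in [("a", a), ("b", b), ("c", c)]:
    print(f"({name}) time average after n = {n}: {seq.mean():.3f} (ensemble average: 0.5)")
```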