
EE5110: Probability Foundations for Electrical Engineers

July - Nov 2024

1 Weak Law of Large Numbers


1. Markov Inequality
• “If X ≥ 0 and E[X] is small, then X is unlikely to take large values”
• If a random variable X can only take non-negative values, then
      P(X ≥ a) ≤ E[X]/a,   for all a > 0
• Example: Let X be uniformly distributed in the interval [0, 4].
  (a) Find P(X ≥ 2), P(X ≥ 3) and P(X ≥ 4). (Ans: 1/2, 1/4 and 0.)
  (b) Bound the values of P(X ≥ 2), P(X ≥ 3) and P(X ≥ 4) using the
      Markov inequality. (Ans: 1, 2/3 and 1/2.)

2. Chebyshev Inequality
• “If the variance of X is small, then X is unlikely to take values away
  from the mean”
• If X is a random variable with mean µ and variance σ², then
      P(|X − µ| ≥ a) ≤ σ²/a²,   for all a > 0
• Example: Let X be uniformly distributed in the interval [0, 4]. Find
  a bound on P(X ≥ 2), P(X ≥ 3) and P(X ≥ 4) using the Chebyshev
  inequality. (Ans: 1, 1 and 1/3.)
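A quick numerical check of the two examples above: a minimal Python sketch (not part of the original notes) comparing the exact tail probabilities of X ~ Uniform[0, 4] with the Markov and Chebyshev bounds.

    # Exact tail probabilities of X ~ Uniform[0, 4] vs. Markov / Chebyshev bounds.
    mu, var = 2.0, 16.0 / 12.0              # E[X] = 2, Var(X) = (4 - 0)^2 / 12
    for a in (2, 3, 4):
        exact = (4 - a) / 4.0               # P(X >= a) for Uniform[0, 4]
        markov = mu / a                     # Markov bound: E[X] / a
        # Chebyshev bounds P(|X - mu| >= a - mu), which dominates P(X >= a);
        # the bound is vacuous (reported as 1) when a = mu.
        cheby = min(1.0, var / (a - mu) ** 2) if a > mu else 1.0
        print(f"a={a}: exact={exact:.3f}, Markov={markov:.3f}, Chebyshev={cheby:.3f}")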
3. Weak Law of Large Numbers
• “Sample mean of i.i.d. r.v.s is likely to be close to the true mean”
• Let X1, X2, · · · be i.i.d. r.v.s with mean µ and variance σ². Define the
  sample mean as
      X̄n = (X1 + · · · + Xn)/n
  For every a > 0, we have
      P(|X̄n − µ| ≥ a) = P(|(X1 + · · · + Xn)/n − µ| ≥ a) → 0   as n → ∞
  (use the Chebyshev inequality)
Figure 1: Distribution of X̄n for different values of n (n = 10, 20, 30, 40, 60
and 100) when the Xi are Bernoulli with p = 1/2. Observe that the sample mean
concentrates around p = 1/2.

Figure 2: Sample average X̄n (corresponding to the above figure) as a function
of n, for different realisations. Observe that most of the realisations have their
sample average around p = 1/2.
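A minimal Python/numpy sketch reproducing the behaviour shown in the figures; the number of realisations, the threshold a = 0.05 and the random seed are illustrative choices, not from the notes.

    import numpy as np
    rng = np.random.default_rng(0)
    p, a = 0.5, 0.05
    for n in (10, 100, 1000, 10000):
        X = rng.random((5000, n)) < p          # 5000 realisations of X1, ..., Xn
        sample_mean = X.mean(axis=1)           # X̄n for each realisation
        # Empirical estimate of P(|X̄n - p| >= a); the WLLN says this tends to 0.
        print(n, np.mean(np.abs(sample_mean - p) >= a))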

4. Example: Consider an event A defined in the context of some probabilistic
experiment. Let p = P(A) be the probability of this event. We consider
n independent repetitions of the experiment, and let Mn be the fraction
of times that event A occurs; Mn is often called the empirical frequency of
A. The weak law applies and shows that when n is large, the empirical
frequency is highly likely to be within ϵ of p. Empirical frequencies are
faithful estimates of the probability of an event!
5. Example: Let p be the fraction of voters who support a particular candidate
for office. We interview n “randomly selected” voters and record Mn,
the fraction of them that support the candidate. Show that with a sample
size of n = 100, the probability that our estimate of p is incorrect by more
than 0.1 (accuracy) is no larger than 0.25 (confidence). Suppose we would
like to have high confidence (probability at least 95%) that our estimate is
very accurate (within 0.01 of p). How many voters should be sampled?
(Hint: 50,000.)
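A sketch of the Chebyshev calculation behind the hints, using the conservative variance bound Var(Mn) = p(1 − p)/n ≤ 1/(4n):

    # Chebyshev: P(|Mn - p| >= eps) <= Var(Mn)/eps^2 <= 1/(4 n eps^2).
    def chebyshev_bound(n, eps):
        return 1.0 / (4 * n * eps ** 2)
    print(chebyshev_bound(100, 0.1))          # 0.25 for n = 100, eps = 0.1
    eps, target = 0.01, 0.05                  # accuracy 0.01, error probability 0.05
    print(1 / (4 * eps ** 2 * target))        # smallest n with bound <= 0.05: 50000.0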

2 Convergence in Probability
1. Convergence in Probability:
• Let Y1, Y2, · · · be a sequence of random variables. We say that Yn
  converges to a (a real number) in probability if, for every ϵ > 0,
      lim_{n→∞} P(|Yn − a| ≥ ϵ) = 0
  In this case, we write Yn →^{i.p.} a or Yn →^{P} a.
  – If Yn(ω) → a for all ω ∈ Ω, we say Yn converges pointwise to
    a. Here, the notion of convergence considered is the standard
    definition of convergence of a sequence of real numbers.
2. Example: Let P(Yn = 1) = 1/n = 1 − P(Yn = 0). Then, Yn → 0 in probability.
3. Example: Let P(Yn = 1/n) = 1, for all n. Then, Yn → 0 in probability.
4. Example: (WLLN) The sample mean {X̄n} converges to the true mean µ
in probability, X̄n →^{P} µ.
5. Example: Let X1, X2, · · · be i.i.d. uniform random variables on
[0, 1]. Define, for all n, Yn = min(X1, · · · , Xn). Then, Yn → 0 in probability.
6. Example: Let X1, X2, · · · be i.i.d. uniform random variables on
[0, 1]. Define, for all n, Yn = max(X1, · · · , Xn). Then, Yn → 1 in probability.
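A minimal simulation sketch of Examples 5 and 6 (Python/numpy; the sample sizes, the threshold ϵ = 0.05 and the seed are illustrative choices):

    import numpy as np
    rng = np.random.default_rng(1)
    eps = 0.05
    for n in (10, 100, 1000):
        X = rng.random((5000, n))                  # 5000 realisations of X1, ..., Xn
        print(n,
              np.mean(X.min(axis=1) >= eps),       # estimate of P(min >= eps) -> 0
              np.mean(X.max(axis=1) <= 1 - eps))   # estimate of P(max <= 1 - eps) -> 0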
7. Properties: Let Xn → a and Yn → b in probability. Then,
• Xn + Yn → a + b in probability
• If g(·) is continuous, then g(Xn ) → g(a) in probability
• E[Xn ] need not converge to a

8. Let Y1, Y2, · · · be a sequence of random variables. We say that Yn converges
to Y (a random variable) in probability if, for every ϵ > 0,
      lim_{n→∞} P(|Yn − Y| ≥ ϵ) = 0
In this case, we write Yn →^{i.p.} Y, or Yn →^{P} Y.
• Convergence in probability characterizes the behaviour of the sequence
  of random variables {Yn} individually (in terms of their marginal
  distributions) and their relationship to a limiting random variable
  Y, but it does not imply much about the joint distribution of the
  sequence {Yn}.

3 Convergence in Distribution
1. A sequence of random variables Y1, Y2, · · · is said to converge to a random
variable Y in distribution if
      lim_{n→∞} F_{Yn}(y) = F_Y(y)
for all y at which F_Y(y) is continuous. In this case, we write Yn →^{D} Y.


2. Example: Let P(Yn = 1) = 1/n = 1 − P(Yn = 0). Then, Yn → 0 in
distribution. (Here, Y = 0 is the degenerate random variable.)
3. Example: Let P(Yn = 1/n) = 1, for all n. Then, Yn → 0 in distribution.
(Here, Y = 0 is the degenerate random variable.)
4. Example: Let Xn be a sequence of i.i.d. random variables. Then, Xn → X1
in distribution.
5. Example: Let Xn ∼ geo(λ/n). Let Yn = Xn/n. Show that Yn → Y in
distribution, where Y ∼ exp(λ).
6. Example: Let Yn ∼ Binomial(n, λ/n). Show that Yn → Y in distribution,
where Y ∼ Poisson(λ).
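A quick numerical check of Example 6, comparing the first few pmf values of Binomial(n, λ/n) and Poisson(λ); the choice λ = 2 is illustrative, not from the notes.

    from math import comb, exp, factorial
    lam = 2.0
    for n in (10, 100, 1000):
        p = lam / n
        binom = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(5)]
        poisson = [exp(-lam) * lam**k / factorial(k) for k in range(5)]
        # differences of the first five pmf values shrink as n grows
        print(n, [round(b - q, 4) for b, q in zip(binom, poisson)])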
7. Convergence in probability implies convergence in distribution, but con-
vergence in distribution does not imply convergence in probability.

4 Central Limit Theorem


1. Motivation: Revisiting the Chebyshev inequality
• (X1 − µ) + · · · + (Xn − µ) has variance nσ²
• (1/n) [(X1 − µ) + · · · + (Xn − µ)] has variance σ²/n
• (1/√n) [(X1 − µ) + · · · + (Xn − µ)] has variance σ²

2. Central limit theorem: Let X1, X2, · · · be i.i.d. random variables with
finite mean µ and variance σ². Define
      Zn = [(X1 − µ) + · · · + (Xn − µ)] / (σ√n)
Zn has zero mean and unit variance. The CDF of Zn converges to the
standard normal CDF, i.e.,
      lim_{n→∞} P(Zn ≤ z) = Φ(z)   for all z
that is, Zn → Z in distribution, where Z ∼ N(0, 1).
3. CLT is quite general and very useful!
• the sum of a large number of i.i.d. random variables is approximately normal
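A minimal simulation sketch of the theorem (Python/numpy; the choice of Uniform[0, 1] summands, n = 30, the number of realisations and the seed are illustrative):

    import numpy as np
    from math import erf, sqrt
    rng = np.random.default_rng(2)
    n, mu, sigma = 30, 0.5, sqrt(1.0 / 12.0)       # Uniform[0, 1]: mean 1/2, var 1/12
    X = rng.random((20000, n))
    Zn = (X.sum(axis=1) - n * mu) / (sigma * sqrt(n))
    for z in (-1.0, 0.0, 1.0, 2.0):
        Phi = 0.5 * (1 + erf(z / sqrt(2)))         # standard normal CDF
        print(z, np.mean(Zn <= z), Phi)            # empirical CDF of Zn vs. Phi(z)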
4. Characteristic function of a random variable X:
      ΦX(t) = E[e^{itX}] = ∫_{−∞}^{∞} e^{itx} fX(x) dx,   if X is continuous
            = Σ_x e^{itx} pX(x),                          if X is discrete
where t ∈ R and i² = −1.
• Characteristic functions are well defined for all t and all X.
• The characteristic function determines the distribution uniquely (you can
  invert the characteristic function to identify the distribution).
• Characteristic functions can be used to generate moments of the random
  variable:
      ΦX^{(k)}(0) = [d^k ΦX(t)/dt^k]_{t=0} = i^k E[X^k]
• If X and Y are independent, then ΦX+Y(t) = ΦX(t) ΦY(t).
• Example: Let X be Bernoulli(p). Then, ΦX(t) = (1 − p) + p e^{it}.
• Example: Let X be Binomial(n, p). Then, ΦX(t) = ((1 − p) + p e^{it})^n.
• Example: Let X be Poisson(λ). Then, ΦX(t) = e^{λ(e^{it} − 1)}.
• Example: Let X ∼ N(0, 1). Then, ΦX(t) = e^{−t²/2}.
• Example: Let Xn ∼ geo(λ/n). Let Yn = Xn/n. Show that Yn → Y in
  distribution, where Y ∼ exp(λ).
• Example: Let Yn ∼ Binomial(n, λ/n). Show that Yn → Y in distribution,
  where Y ∼ Poisson(λ), by showing that ΦYn(t) → ΦY(t) as n → ∞.
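A quick Monte Carlo check of one of the closed forms above, here the Poisson characteristic function, by averaging e^{itX} over samples (Python/numpy; λ = 3, t = 0.7 and the sample size are illustrative choices):

    import numpy as np
    rng = np.random.default_rng(3)
    lam, t = 3.0, 0.7
    X = rng.poisson(lam, size=200000)
    empirical = np.mean(np.exp(1j * t * X))            # sample average of e^{itX}
    exact = np.exp(lam * (np.exp(1j * t) - 1.0))       # closed form from the notes
    print(empirical, exact)                            # the two should be close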

5. Proof of CLT:
Given Zn = (1/√n) [(X1 − µ)/σ + · · · + (Xn − µ)/σ].
As the {Xn} are i.i.d., we have
      ΦZn(t) = Π_{i=1}^{n} Φ_{(Xi−µ)/σ}(t/√n) = [Φ_{(X1−µ)/σ}(t/√n)]^n
Consider Y1 = (X1 − µ)/σ, which is a zero-mean random variable with unit
variance. Using a Taylor series expansion around 0, we have
      ΦY1(t) = ΦY1(0) + ΦY1^{(1)}(0) t + ΦY1^{(2)}(0) t²/2! + o(t²) = 1 + 0 − t²/2 + o(t²)
Then,
      ΦZn(t) = [ΦY1(t/√n)]^n = [1 − t²/(2n) + o(t²/n)]^n → e^{−t²/2} = ΦZ(t)
where Z ∼ N(0, 1). This (using Lévy's continuity theorem) implies that Zn
converges in distribution to Z, a standard normal random variable.
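A quick numerical check of the limiting step [1 − t²/(2n)]^n → e^{−t²/2} (the value t = 1.5 is an arbitrary illustrative choice):

    from math import exp
    t = 1.5
    for n in (10, 100, 1000, 10000):
        print(n, (1 - t**2 / (2 * n)) ** n, exp(-t**2 / 2))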

6. Illustration of Convolution and CLT:

Figure 3: Illustration of the CLT. The figure plots the density of the (scaled)
sum of i.i.d. random variables for different values of n and for three different densities.

7. Applications of the central limit theorem: Let X1, X2, · · · be i.i.d. random
variables with finite µ and σ². Define Sn = X1 + · · · + Xn. If n is large, Sn is
approximately normal, and
      P(Sn ≤ c) ≈ Φ((c − nµ)/(σ√n))
8. Example: We load on a plane 100 packages whose weights are independent
random variables that are uniformly distributed between 5 and 50 pounds.
What is the probability that the total weight will exceed 3000 pounds?
(Hint: 1 − Φ(1.92) ≈ 0.0274.)
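A minimal Python sketch of this calculation (using math.erf for the standard normal CDF):

    from math import erf, sqrt
    n = 100
    mu = (5 + 50) / 2.0                  # mean of Uniform[5, 50] = 27.5
    sigma = (50 - 5) / sqrt(12)          # std of Uniform[5, 50]
    z = (3000 - n * mu) / (sigma * sqrt(n))
    Phi = 0.5 * (1 + erf(z / sqrt(2)))   # standard normal CDF
    print(z, 1 - Phi)                    # z ≈ 1.92, probability ≈ 0.027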
9. Example: We poll n voters and record the fraction Mn of those polled
who are in favor of a particular candidate. Find the minimum n (using the
CLT) such that our estimate is within ±0.01 of p with 0.95 confidence. (Hint:
9604, see Bertsekas and Tsitsiklis.)
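A sketch of the CLT-based calculation behind the hint, assuming the conservative bound σ ≤ 1/2 for the Bernoulli summands (1.96 is the standard normal quantile with Φ(1.96) ≈ 0.975):

    # P(|Mn - p| >= 0.01) ≈ 2(1 - Phi(0.01 * 2*sqrt(n))) <= 0.05 requires
    # 0.01 * 2*sqrt(n) >= 1.96, i.e. n >= (1.96 / 0.02)^2.
    z95, eps = 1.96, 0.01
    print((z95 / (2 * eps)) ** 2)        # 9604.0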

5 Convergence with Probability One
1. Let Y1, Y2, · · · be a sequence of random variables. We say that Yn converges
to another random variable Y with probability one (or almost surely) if
      P({ω : Yn(ω) → Y(ω)}) = 1
If Y = c (a constant), then we write Yn → c w.p.1 or Yn → c a.s.

2. Example: Let X1, X2, · · · be a sequence of independent random variables
that are uniformly distributed in [0, 1]. Define Yn = min(X1, · · · , Xn).
Show that Yn → 0 w.p.1. (Hint: Example 5.1.4 from Chapter 5: Limit
Theorems.)
3. Example: Let X1, X2, · · · be a sequence of independent random variables
that are uniformly distributed in [0, 1]. Define Yn = max(X1, · · · , Xn).
Show that Yn → 1 w.p.1.
4. Example: Let P(Yn = 1) = 1/n = 1 − P(Yn = 0). Does Yn → 0 w.p.1?
(Hint: No and Yes! The answer depends on the joint distribution of the
sequence, e.g., on whether the Yn are independent.)
5. Example: Let P(Yn = 1/n) = 1, for all n. Then, show that Yn → 0 w.p.1.

6. Strong Law of Large Numbers: Let X1, X2, · · · be i.i.d. random variables
with finite mean µ. The sample mean converges to µ with probability one, i.e.,
      P({ω : (X1(ω) + · · · + Xn(ω))/n → µ}) = 1
Proof: Suppose that the fourth moment of Xn exists, i.e., E[Xn⁴] = K < ∞.
Without loss of generality, assume that µ = 0. Define Sn = X1 + · · · + Xn.
Then,
      E[Sn⁴] = n E[X1⁴] + 3n(n − 1) E²[X1²] ≤ nK + 3n(n − 1)K
(using E²[X1²] ≤ E[X1⁴] = K, by the Cauchy–Schwarz inequality). Dividing
by n⁴, we have
      E[Sn⁴/n⁴] ≤ K/n³ + 3K/n²
and
      Σ_{n=1}^{∞} E[Sn⁴/n⁴] ≤ Σ_{n=1}^{∞} (K/n³ + 3K/n²) < ∞
If E[Y] < ∞, then P(Y < ∞) = 1. So,
      Σ_{n=1}^{∞} Sn⁴/n⁴ < ∞   w.p.1
Hence Sn⁴/n⁴ → 0 w.p.1, and therefore Sn/n → 0 w.p.1.
Hence, X̄n → µ with probability one.
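A minimal sketch of the path-wise statement: along a single realisation, the running sample mean of i.i.d. Bernoulli(1/2) r.v.s settles at 1/2 (Python/numpy; the path length and seed are illustrative choices):

    import numpy as np
    rng = np.random.default_rng(4)
    X = rng.integers(0, 2, size=100000)                    # one long Bernoulli(1/2) path
    running_mean = np.cumsum(X) / np.arange(1, X.size + 1)
    for n in (10, 100, 1000, 10000, 100000):
        print(n, running_mean[n - 1])                      # settles near 0.5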
7. Some properties
• Convergence with probability one implies convergence in probabil-
ity, but convergence in probability does not imply convergence with
probability one.
• Let Xn → a and Yn → b w.p.1.
– Xn + Yn → a + b w.p.1.
– If g(·) is continuous, then g(Xn ) → g(a) w.p.1.
– E[Xn ] need not converge to a

8. For the experiments given below, compare the time average of the outcome
with the expected outcome (ensemble average) at any time n.
(a) X1 is Bernoulli with mean 0.5. And, X2 = X1 , X3 = X1 , X4 =
X1 , · · ·
(b) X1 is Bernoulli with mean 0.5. And, Xi = X1 if i is odd and Xi =
1 − X1 when i is even.
(c) {Xn } are i.i.d. Bernoulli with mean 0.5.
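A minimal simulation sketch of the three experiments (Python/numpy; the horizon N and seed are illustrative choices). In case (a) the time average sticks at X1 rather than at the ensemble average 0.5; in cases (b) and (c) it approaches 0.5.

    import numpy as np
    rng = np.random.default_rng(5)
    N = 10000
    x1 = int(rng.integers(0, 2))                       # a single Bernoulli(1/2) draw
    case_a = np.full(N, x1)                            # (a) Xi = X1 for all i
    case_b = np.where(np.arange(1, N + 1) % 2 == 1, x1, 1 - x1)   # (b) alternate X1, 1-X1
    case_c = rng.integers(0, 2, size=N)                # (c) i.i.d. Bernoulli(1/2)
    for name, x in (("(a)", case_a), ("(b)", case_b), ("(c)", case_c)):
        print(name, x.mean())      # time average; the ensemble average is 0.5 in all cases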
