
Chapter 5.

Multiple Random Variables


5.7: Limit Theorems
Slides (Google Drive) Video (YouTube)

This is definitely one of the most important sections in the entire text! The Central Limit Theorem is used
everywhere in statistics (hypothesis testing), and it also has its applications in computing probabilities. We’ll
see three results here, each getting more powerful and surprising.

If X1, . . . , Xn are iid random variables with mean µ and variance σ², then we define the sample mean to be $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. We'll see the following results:
• The expectation of the sample mean, E[X̄n], is exactly the true mean µ, and the variance Var(X̄n) = σ²/n goes to 0 as you get more samples.

• (Law of Large Numbers) As n → ∞, the sample mean X̄n converges (in probability) to the true mean µ. That is, as you get more samples, you will be able to get an excellent estimate of µ.

• (Central Limit Theorem) In fact, X̄n follows a Normal distribution as n → ∞ (in practice, n as low as 30 is good enough for this to be true). When we talk about the distribution of X̄n, this means: if we take n samples and compute the sample mean, take another n samples and compute the sample mean, and so on, how will these sample means look in a histogram? This is crazy: regardless of the distribution of the Xi's (discrete or continuous), their average will be approximately Normal! We'll see pictures and describe this more soon!

5.7.1 The Sample Mean

Before we start, we will define the sample mean of n random variables, and compute its mean and variance.

Definition 5.7.1: The Sample Mean + Properties

Let X1, X2, . . . , Xn be a sequence of iid (independent and identically distributed) random variables with mean µ and variance σ². The sample mean is:

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Further:

$$\mathbb{E}\left[\bar{X}_n\right] = \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\left[X_i\right] = \frac{1}{n}\, n\mu = \mu$$

Also, since the Xi's are independent:

$$\mathrm{Var}\left(\bar{X}_n\right) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}\left(X_i\right) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}$$


Again, none of this is “mind-blowing” to prove: we just used linearity of expectation and properties of
variance to show this.

What is this saying? Basically, if you wanted to estimate the mean height of the U.S. population by sampling
n people uniformly at random:

 
• In expectation, your sample average will be "on point" at E[X̄n] = µ. This even includes the case n = 1: if you just sample one person, on average, you will be correct. However, the variance is high.

• The variance of your estimate (the sample mean) of the true mean goes down (σ²/n) as your sample size n gets larger. This makes sense, right? If you have more samples, you have more confidence in your estimate because you are more "sure" (less variance).

In fact, as n → ∞, the variance of the sample mean approaches 0. A distribution with mean µ and variance
0 is essentially the degenerate random variable that takes on µ with probability 1. We’ll actually see that
the Law of Large Numbers argues exactly that!
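Before that, here is a minimal simulation sketch (my own illustration, not part of the original notes) that checks these two facts empirically. The choice of Unif(0, 1) samples, the trial count, and the values of n are arbitrary assumptions made just for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Xi ~ Unif(0, 1): true mean mu = 1/2, true variance sigma^2 = 1/12.
mu, sigma2 = 0.5, 1 / 12
trials = 20_000

for n in [1, 10, 100, 1000]:
    # Draw many independent sample means, each an average of n iid Uniforms.
    sample_means = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)
    print(f"n={n:5d}  mean of X̄n ≈ {sample_means.mean():.4f} (µ = {mu}),  "
          f"Var(X̄n) ≈ {sample_means.var():.6f} (σ²/n = {sigma2 / n:.6f})")
```

The empirical mean of the sample means stays near µ = 0.5 for every n, while the empirical variance shrinks roughly like σ²/n.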

5.7.2 The Law of Large Numbers (LLN)

Using the fact that the variance is approaching 0 as n → ∞, we can argue that, by averaging more and more
samples (n → ∞), we get a really good estimate of the true mean µ since the variance of the sample mean
is σ²/n → 0 (as we showed earlier). Here is the formal mathematical statement:

Theorem 5.7.1: The Law of Large Numbers

Weak Law of Large Numbers (WLLN): Let X1, X2, . . . , Xn be a sequence of independent and identically distributed random variables with mean µ. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ be the sample mean. Then, X̄n converges in probability to µ. That is, for any ε > 0:

$$\lim_{n \to \infty} P\left(\left|\bar{X}_n - \mu\right| > \varepsilon\right) = 0$$

Strong Law of Large Numbers (SLLN): Let X1, X2, . . . , Xn be a sequence of independent and identically distributed random variables with mean µ. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ be the sample mean. Then, X̄n converges almost surely to µ. That is:

$$P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1$$

The SLLN implies the WLLN, but not vice versa. The difference is subtle and is basically swapping the limit and probability operations.

The proof of the WLLN will be given in 6.1 when we prove Chebyshev's inequality, but the proof of the SLLN is out of the scope of this class and much harder.
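Here is a small simulation sketch of the WLLN (my own, not from the notes): for a fixed ε, the probability that the sample mean is more than ε away from µ shrinks as n grows. The Exp(1) distribution, the value ε = 0.1, and the trial count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Xi ~ Exp(1): true mean µ = 1. Estimate P(|X̄n - µ| > ε) by simulation.
mu, eps, trials = 1.0, 0.1, 10_000

for n in [10, 100, 1000]:
    sample_means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)
    prob_far = np.mean(np.abs(sample_means - mu) > eps)
    print(f"n={n:5d}  P(|X̄n - µ| > {eps}) ≈ {prob_far:.4f}")
```

The printed probabilities drop toward 0 as n increases, which is exactly what convergence in probability promises.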

5.7.3 The Central Limit Theorem (CLT)

Theorem 5.7.2: The Central Limit Theorem (CLT)

Let X1, . . . , Xn be a sequence of independent and identically distributed random variables with mean µ and (finite) variance σ². We've seen that the sample mean X̄n has mean µ and variance σ²/n. Then, as n → ∞, the following equivalent statements hold:

1. $\bar{X}_n \to N\left(\mu, \frac{\sigma^2}{n}\right)$

2. $\dfrac{\bar{X}_n - \mu}{\sqrt{\sigma^2 / n}} \to N(0, 1)$

3. $\sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2)$. This is not "technically" correct, but is useful for applications.

4. $\dfrac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n\sigma^2}} \to N(0, 1)$

The mean and variance are not a surprise (we computed these at the beginning of these notes for any sample mean); the importance of the CLT is that, regardless of the distribution of the Xi's, the sample mean approaches a Normal distribution as n → ∞.

We will prove the Central Limit Theorem in 5.11 using MGFs, but take a second to appreciate this crazy result! The LLN says that as n → ∞, the sample mean of iid variables X̄n converges to µ. The CLT says that, as n → ∞, the sample mean actually converges to a Normal distribution! For any original distribution of the Xi's (discrete or continuous), the average/sum will become approximately normally distributed.

If you're still having trouble figuring out what "the distribution of the sample mean" means, that's completely normal (double pun!). Let's consider n = 2, so we just take the average of X1 and X2, which is (X1 + X2)/2. The distribution of X1 + X2 means: if we repeatedly sample X1, X2 and add them, what might the density look like? For example, if X1, X2 ∼ Unif(0, 1) (continuous), we showed the density of X1 + X2 looked like a triangle. We figured out how to compute the PMF/PDF of the sum using convolution in 5.5, and the average is just this sum divided by 2: (X1 + X2)/2, whose PMF/PDF you can find by transforming RVs as in 4.4. On the next page, you'll see exactly the CLT applied to these Uniform distributions. With n = 1, it looks (and is) Uniform. When n = 2, you get the triangular shape. And as n gets larger, it starts looking more and more like a Normal!

You’ll see some examples below of how we start with some arbitrary distributions and how the density
function of their mean becomes shaped like a Gaussian (you know how to compute the pdf of the mean now
using convolution in 5.5 and transforming RV’s in 4.4)!
On the next two pages, we’ll see some visual “proof” of this surprising result!
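You can also see this numerically. The sketch below (my own, with arbitrary choices of distribution, trial count, and sample sizes) standardizes sample means of Unif(0, 1) draws and checks how often they land within one standard deviation of 0; under the CLT this fraction should approach Φ(1) − Φ(−1) ≈ 0.6827.

```python
import numpy as np

rng = np.random.default_rng(2)

# Xi ~ Unif(0, 1): µ = 1/2, σ = sqrt(1/12).
mu, sigma = 0.5, np.sqrt(1 / 12)
trials = 100_000

for n in [1, 2, 4, 30]:
    means = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)
    # Standardize the sample means; under the CLT these should look like N(0, 1).
    z = (means - mu) / (sigma / np.sqrt(n))
    frac_within_1sd = np.mean(np.abs(z) <= 1)
    print(f"n={n:3d}  P(|Z| ≤ 1) ≈ {frac_within_1sd:.4f}   (standard Normal: 0.6827)")
```

Even by n = 4 the fraction is close to the Normal value, which matches the pictures described next.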

Let’s see the CLT applied to the (discrete) Uniform distribution.

• The first (n = 1) of the four graphs below shows the PMF of a discrete (1/29) · Unif(0, 29) random variable as dots (and a blue line with the curve of the normal distribution with the same mean and variance). That is, P(X = k) = 1/30 for each value k in the range {0, 1/29, 2/29, . . . , 28/29, 1}.

• The second graph (n = 2) has the average of two of these distributions, again with a blue line showing the curve of the normal distribution with the same mean and variance. Remember that we expected this triangular distribution when summing either discrete or continuous Uniforms (e.g., when summing two fair 6-sided die rolls, you're most likely to get a 7, and the probability goes down linearly as you approach 2 or 12). See the example in 5.5 if you forgot how we got this!

• The third (n = 3) and fourth (n = 4) graphs show the average of 3 and 4 identically distributed random variables respectively, each with the distribution shown in the first graph. We can see that as we average more, the average approaches a normal distribution.

Again, if you don't believe me, you can compute the PMF yourself using convolution: first convolve two copies of this PMF, then convolve the result with a third, and a fourth! Despite this being a discrete random variable, when we take an average of many of them, there are increasingly many values we can get between 0 and 1. The average of these iid discrete RVs already looks close to a continuous Normal random variable after averaging just 4 of them!

Image Credit: Larry Ruzzo (a previous University of Washington CSE 312 instructor).
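If you want to carry out that convolution concretely, here is a small sketch (my own, not from the notes) using numpy.convolve to compute the exact PMF of the sum of n of these discrete uniforms; printing an endpoint and the middle probability is just one quick way to see the bell shape emerge.

```python
import numpy as np

# PMF of one discrete (1/29) * Unif(0, 29) variable: 30 equally likely values.
pmf = np.full(30, 1 / 30)

conv = pmf.copy()
for n in range(2, 5):
    # Exact PMF of the sum of n iid copies, obtained by repeated convolution.
    conv = np.convolve(conv, pmf)
    # Rescaling the sum by 1/n (to get the average) only relabels the support,
    # so the shape of the PMF is unchanged.
    print(f"n={n}: support size {conv.size}, "
          f"P at an endpoint = {conv[0]:.2e}, P at the middle = {conv[conv.size // 2]:.4f}")
```

The endpoint probabilities collapse toward 0 while the middle stays peaked, which is the triangular-then-bell shape the graphs show.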

You might still be skeptical, because the Uniform distribution is “nice” and already looked pretty “Normal”
even with n = 2 samples. We now illustrate the same idea with a strange distribution shown in the first
(n = 1) of the four graphs below, illustrated with the dots (instead of a “nice” uniform distribution). Even
this crazy distribution nearly looks Normal after just averaging 4 of them. This is the power of the CLT!

What we are getting at here is that, regardless of the distribution, as we average more and more independent and identically distributed random variables, the average approximately follows a Normal distribution (with the same mean and variance as the sample mean).

Now let’s see how we can apply the CLT to problems! There were four different equivalent forms (just
scaling/shifting) stated, but I find it easier to just look at the problem and decide what’s best. Seeing
examples is the best way to understand!

Example(s)

Let's consider the example of flipping a fair coin 40 times independently. What's the probability of getting between 15 and 25 heads? First compute this exactly, and then give an approximation using the CLT.

Solution: Define X to be the number of heads in the 40 flips. Then we have X ∼ Bin(n = 40, p = 1/2), so we just sum the Binomial PMF:

$$P(15 \le X \le 25) = \sum_{k=15}^{25} \binom{40}{k} \left(\frac{1}{2}\right)^{k} \left(1 - \frac{1}{2}\right)^{40-k} \approx 0.9193$$

Now, let's use the CLT. Since X can be thought of as the sum of 40 iid Ber(1/2) RVs, we can apply the CLT. We have E[X] = np = 40(1/2) = 20 and Var(X) = np(1 − p) = 40(1/2)(1 − 1/2) = 10. So we can use the approximation X ≈ N(µ = 20, σ² = 10).

This gives us the following good but not great approximation:

$$\begin{aligned}
P(15 \le X \le 25) &\approx P(15 \le N(20, 10) \le 25) \\
&= P\left(\frac{15 - 20}{\sqrt{10}} \le Z \le \frac{25 - 20}{\sqrt{10}}\right) && \text{[standardize]} \\
&\approx P(-1.58 \le Z \le 1.58) \\
&= \Phi(1.58) - \Phi(-1.58) \\
&\approx 0.8862
\end{aligned}$$
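As a quick check (a sketch of mine using scipy, not part of the original notes), we can compute both the exact Binomial probability and the plain Normal approximation:

```python
from scipy.stats import binom, norm

n, p = 40, 0.5
mu, var = n * p, n * p * (1 - p)        # 20 and 10

# Exact: sum the Binomial PMF from 15 to 25 (via the CDF).
exact = binom.cdf(25, n, p) - binom.cdf(14, n, p)

# CLT approximation without the continuity correction.
approx = norm.cdf(25, loc=mu, scale=var ** 0.5) - norm.cdf(15, loc=mu, scale=var ** 0.5)

print(f"exact ≈ {exact:.4f},  CLT (no correction) ≈ {approx:.4f}")
# Expect roughly: exact ≈ 0.9193, CLT ≈ 0.886
```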

We’ll see how to improve our approximation below!

5.7.4 The Continuity Correction

Notice that in the prior example, in computing P(15 ≤ X ≤ 25), we summed over 25 − 15 + 1 = 11 terms of the PMF. However, our integral P(15 ≤ N(20, 10) ≤ 25) has width 25 − 15 = 10. We'll always be off by one, since the number of integers in [a, b] is (b − a) + 1 (for integers a ≤ b) and not b − a (e.g., the number of integers in [12, 15] is (15 − 12) + 1 = 4: {12, 13, 14, 15}).

The continuity correction says we should add 0.5 in each direction. That is, when using the Normal approximation, we should ask for P(a − 0.5 ≤ X ≤ b + 0.5) instead, so that the width is b − a + 1. If we redo the final calculation to approximate P(15 ≤ X ≤ 25) using the central limit theorem, now with the continuity correction, we get the following:

Example(s)

Use the continuity correction to get a better estimate than we did earlier for the coin problem.

Solution: We'll apply the exact same steps, except changing the bounds from 15 and 25 to 14.5 and 25.5.

$$\begin{aligned}
P(15 \le X \le 25) &\approx P(14.5 \le N(20, 10) \le 25.5) && \text{[apply continuity correction]} \\
&= P\left(\frac{14.5 - 20}{\sqrt{10}} \le Z \le \frac{25.5 - 20}{\sqrt{10}}\right) \\
&\approx P(-1.74 \le Z \le 1.74) \\
&= \Phi(1.74) - \Phi(-1.74) \\
&\approx 0.9182
\end{aligned}$$
Notice that this is much closer to the exact answer from the first part of the prior example (0.9193) than
approximating with the central limit theorem without the continuity correction!

Definition 5.7.2: The Continuity Correction

When approximating an integer-valued (discrete) random variable X with a continuous one Y (such as in the CLT), if asked to find P(a ≤ X ≤ b) for integers a ≤ b, you should compute P(a − 0.5 ≤ Y ≤ b + 0.5) so that the width of the interval being integrated is the same as the number of terms summed over (b − a + 1). This is called the continuity correction.

Note: If you are applying the CLT to sums/averages of continuous RVs instead, you should not apply the continuity correction.

See the additional exercises below to get more practice with the CLT!

5.7.5 Exercises
1. Each day, the number of customers who come to the CSE 312 probability gift shop is approximately Poi(11). Approximate the probability that, after the quarter ends (9 × 7 = 63 days), we had at least 700 customers.

Solution: The total number of customers that come is X = X1 + · · · + X63, where each Xi ∼ Poi(11) has E[Xi] = Var(Xi) = λ = 11 from the chart. By the CLT, X ≈ N(µ = 63 · 11, σ² = 63 · 11) (sum of the means and sum of the variances). Hence,

$$\begin{aligned}
P(X \ge 700) &\approx P(X \ge 699.5) && \text{[continuity correction]} \\
&\approx P(N(693, 693) \ge 699.5) && \text{[CLT]} \\
&= P\left(Z \ge \frac{699.5 - 693}{\sqrt{693}}\right) && \text{[standardize]} \\
&= 1 - \Phi(0.2469) \\
&\approx 1 - 0.598 \\
&= 0.402
\end{aligned}$$

Note that you could compute this exactly as well, since you know the sum of iid Poissons is Poisson. In fact, X ∼ Poi(693) (the rate over 63 days is 63 · 11 = 693), but the resulting sum would be very annoying to compute by hand.
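Here is a quick numerical check of this exercise (my own scipy sketch; the notes themselves contain no code), comparing the continuity-corrected CLT approximation to the exact Poisson answer:

```python
from scipy.stats import norm, poisson

lam_total = 63 * 11                      # 693: mean and variance of the total

# CLT with continuity correction: P(X >= 700) ≈ P(N(693, 693) >= 699.5).
clt = 1 - norm.cdf(699.5, loc=lam_total, scale=lam_total ** 0.5)

# Exact: a sum of iid Poissons is Poisson, so X ~ Poi(693).
exact = 1 - poisson.cdf(699, lam_total)  # P(X >= 700) = 1 - P(X <= 699)

print(f"CLT ≈ {clt:.4f},  exact ≈ {exact:.4f}")   # both should be near 0.40
```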
2. Suppose I have a flashlight which requires one battery to operate, and I have 18 identical batteries. I want to go camping for a week (24 × 7 = 168 hours). If the lifetime (in hours) of a single battery is Exp(0.1),

what’s the probability my flashlight can operate for the entirety of my trip?

Solution: The total lifetime of the batteries is X = X1 + · · · + X18, where each Xi ∼ Exp(0.1) has E[Xi] = 1/0.1 = 10 and Var(Xi) = 1/0.1² = 100. Hence, E[X] = 180 and Var(X) = 1800 by linearity of expectation and since variance adds for independent RVs. In fact, X ∼ Gamma(r = 18, λ = 0.1), but we don't have a closed-form for its CDF. By the CLT, X ≈ N(µ = 180, σ² = 1800), so

$$\begin{aligned}
P(X \ge 168) &\approx P(N(180, 1800) \ge 168) && \text{[CLT]} \\
&= P\left(Z \ge \frac{168 - 180}{\sqrt{1800}}\right) && \text{[standardize]} \\
&= 1 - \Phi(-0.28284) \\
&= \Phi(0.28284) && \text{[symmetry of the Normal]} \\
&\approx 0.611
\end{aligned}$$

Note that we don't use the continuity correction here because the RVs we are summing are already continuous.
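As a final check (my own scipy sketch, not from the notes): scipy can evaluate the Gamma CDF numerically, so we can compare the CLT estimate against that exact value.

```python
from scipy.stats import gamma, norm

r, lam = 18, 0.1
mu, var = r / lam, r / lam ** 2          # 180 and 1800

# CLT approximation: X ≈ N(180, 1800); no continuity correction since X is continuous.
clt = 1 - norm.cdf(168, loc=mu, scale=var ** 0.5)

# Exact: X ~ Gamma(shape=18, rate=0.1); scipy parameterizes by scale = 1/rate.
exact = 1 - gamma.cdf(168, a=r, scale=1 / lam)

print(f"CLT ≈ {clt:.4f},  exact ≈ {exact:.4f}")   # CLT should be near 0.611
```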
