Week10class1 Solution

The document discusses the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT), which are fundamental concepts in probability and statistics. The LLN states that as the sample size increases, the sample mean converges to the population mean, while the CLT describes the distribution of the sum (or average) of a large number of independent and identically distributed random variables, approximating a normal distribution. The document includes definitions, examples, and activities to illustrate these concepts.


Law of large numbers Central limit theorem Extra notes

10.022 Modelling Uncertainty


Week 10 Class 1: Central Limit Theorem

Term 3, 2024

1 / 30

Outline

1 Law of large numbers


Definitions
LLN

2 Central limit theorem


Statement and examples
Activities

3 Extra notes

2 / 30

Sample mean of RV’s


Last class, we studied the sum of random variables. We now
consider the ‘average’ (or sample mean) of random variables.

Definition
Let X1, X2, . . . , Xn be random variables. The sample mean of
these random variables is defined as

X̄n := (1/n) (X1 + X2 + · · · + Xn).

When the context is clear, we may use the simpler notation X̄.

Do not confuse X̄n with E(Xi ), the expectation (mean) of Xi .

Do not confuse X̄n with x̄ (the sample mean of data values).


Note that x̄ is just a number, whereas X̄n is a random variable.
3 / 30

Sample mean; i.i.d.


Each Xi represents the outcome of a random experiment. Before
experimental data is observed or available, X̄n is a random variable
that models the sample mean of the outcomes from a sequence of
experiments. After data has been observed, the sample mean can
then be computed as a number x̄, which is a particular value that
X̄n can take. (See Excel demo about a sequence of 3 dice rolls.)

We are often interested in the sample mean of independent
random variables that follow exactly the same distribution; such
RV's can model (for instance) repeated random experiments.

Definition
Let X1 , X2 , . . . , Xn be random variables. If these random variables
are independent and all have the same distribution, then they are
said to be independent and identically distributed (i.i.d.).

4 / 30

Activity 1 (5 minutes)

Let X1, X2, . . . , Xn be i.i.d. random variables, each with mean µ
and variance σ². Using the properties of expectation and variance,
show that

E(X̄n) = µ,    Var(X̄n) = σ²/n.

5 / 30

Activity 1 (solution)

Using the properties of expectation (Week 5 Class 2), we have

E(X̄n) = E((X1 + X2 + · · · + Xn)/n)
       = (1/n) E(X1 + X2 + · · · + Xn)
       = (1/n) (E(X1) + E(X2) + · · · + E(Xn))
       = (1/n) (µ + µ + · · · + µ)    [n terms]
       = (1/n) · nµ
       = µ.

6 / 30

Activity 1 (solution, continued)


Similarly, using the properties of variance (Week 5 Class 2 and
Week 9 Class 2), we get

Var(X̄n) = Var((X1 + X2 + · · · + Xn)/n)
         = (1/n²) Var(X1 + X2 + · · · + Xn)
         = (1/n²) (Var(X1) + Var(X2) + · · · + Var(Xn))
         = (1/n²) (σ² + σ² + · · · + σ²)    [n terms]
         = (1/n²) · nσ²
         = σ²/n.

We have used the independence of the Xi's in the third equality.
7 / 30

Activity 1, interpretation

The two formulas in Activity 1 are very important. E(X̄n) = µ
says that X̄n is centred around µ (i.e. the same center as the
original distribution), regardless of n. Var(X̄n) = σ²/n says that
the spread of X̄n around µ decreases as n increases (i.e. smaller
variability than the original distribution).

So as n increases, X̄n gets more and more concentrated towards
the central value µ. Intuitively, this happens because it is unlikely
for the average of the Xi's to be very high or very low (since it
would require multiple values to be simultaneously high or low),
but it is likely for the average to be near the center (as there are
many ways for large and small values to cancel out).
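These two facts are easy to check numerically. The sketch below is
illustrative and not part of the notes (the sample size n = 30, the
trial count, and the seed are arbitrary choices): it estimates E(X̄n)
and Var(X̄n) for fair die rolls, where µ = 7/2 and σ² = 35/12.

```python
import random
import statistics

random.seed(0)  # fixed seed for reproducibility

def sample_mean(n):
    """One realization of the sample mean of n fair die rolls."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

n, trials = 30, 20000
means = [sample_mean(n) for _ in range(trials)]

mu, var = 3.5, 35 / 12          # E(Xi) and Var(Xi) for a fair die

print(statistics.mean(means))       # close to mu = 3.5
print(statistics.variance(means))   # close to var / n, about 0.097
```

The empirical mean of the simulated X̄n values sits near µ, while
their empirical variance is roughly σ²/n, thirty times smaller than
the variance of a single roll.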

The next result makes the above observation more precise.

8 / 30

Law of large numbers

Law of large numbers (LLN)

Let X1, X2, . . . , Xn be i.i.d. random variables, each with mean
µ < ∞. Then, for any fixed number ϵ > 0,

lim_{n→∞} P(|X̄n − µ| < ϵ) = 1.

In words, the LLN states that for any positive margin ϵ (no
matter how small), as n (the ‘sample size’) gets arbitrarily large,
the probability that X̄n is within ϵ of µ approaches 1.

So in a sense, the sample mean X̄n converges to the population
mean µ. This is remarkable, as X̄n models the average outcomes
of repeated experiments (including real life experiments), whereas
µ is just a (theoretical) number computed from a definition.
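The LLN can be observed in a small simulation (an illustrative
sketch, not from the notes; the margin ϵ = 0.1, the trial count, and
the seed are arbitrary choices). For fair die rolls with µ = 3.5, the
estimated probability P(|X̄n − µ| < ϵ) climbs towards 1 as n grows:

```python
import random

random.seed(1)
mu, eps, trials = 3.5, 0.1, 1000

def prob_within(n):
    """Estimate P(|sample mean of n die rolls - mu| < eps)."""
    hits = 0
    for _ in range(trials):
        xbar = sum(random.randint(1, 6) for _ in range(n)) / n
        if abs(xbar - mu) < eps:
            hits += 1
    return hits / trials

results = {n: prob_within(n) for n in (10, 100, 1000)}
print(results)  # the probabilities increase towards 1 as n grows
```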
9 / 30

Importance of the LLN

The law of large numbers is a very important result and has many
implications. One implication is that the empirical frequency of
any repeatable event converges to its theoretical probability.

Another implication is that histograms constructed from data
approximate the true distribution from which the data is drawn.
Proofs of these two implications are sketched in the Extra notes.

The LLN does not make any assumptions about the variance σ² of
the Xi's. When σ² < ∞, the result in the next section (the central
limit theorem) is more useful, in that it describes the distribution
of X̄n around µ and makes it simple to work with sums of the Xi's.

10 / 30

Outline

1 Law of large numbers


Definitions
LLN

2 Central limit theorem


Statement and examples
Activities

3 Extra notes

11 / 30

Central limit theorem

The central limit theorem (CLT) is one of the most important,
remarkable, and useful results in probability and statistics. It
succinctly describes the behaviour of the sum (and the average) of
a large number of i.i.d. random variables.

Central limit theorem (CLT)


Let X1, X2, . . . , Xn be i.i.d. random variables, each with mean µ
and variance σ² < ∞. Then, for any real number x,

lim_{n→∞} P( (X̄n − µ) / (σ/√n) ≤ x ) = Φ(x),

where Φ(x) is the cdf of the standard normal random variable.

12 / 30

CLT, informal statements


Informally, the CLT states that, when the Xi's are i.i.d. and n is large,

(X̄n − µ) / (σ/√n) ≈ N(0, 1) = Z,         (1)
X̄n ≈ N(µ, σ²/n),                          (2)
X1 + X2 + · · · + Xn ≈ N(nµ, nσ²).        (3)

The above formulas are equivalent. In each formula, the left
hand side random variable is being approximated by a normal
RV with the same mean and same variance as the left hand side.
Xi can have any discrete or continuous distribution; the limiting
behaviour of the sum (and the average) of the Xi's does not
depend on its distribution, but only on its mean and variance.
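Formula (3) can be sanity-checked by simulation (a sketch with
arbitrary choices of n = 30, threshold 16, trial count, and seed; not
part of the notes). For uniform(0, 1) summands, µ = 1/2 and
σ² = 1/12, so the sum of 30 of them is approximately N(15, 30/12):

```python
import math
import random

random.seed(2)
n, trials = 30, 50000
mean, sd = n * 0.5, math.sqrt(n / 12)   # nµ and √(nσ²) for uniform(0, 1)

sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]

def phi(x):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Empirical P(sum <= 16) versus the CLT value Φ((16 - nµ)/√(nσ²))
empirical = sum(s <= 16 for s in sums) / trials
approx = phi((16 - mean) / sd)
print(empirical, approx)  # the two values should be very close
```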
13 / 30

Visualization

Top: the pdf of a (very non-normal-looking) uniform(0, 1) RV.

Middle: the pdf of the sum of 2 independent uniform(0, 1) RV's
(from last class).

Bottom: the pdf of the sum of 3 independent uniform(0, 1) RV's,
already resembling a bell curve (superimposed).

See more visualizations here.


14 / 30

CLT, continued

Recall from last class that the sum of any n independent normal
RV’s is exactly normal. The CLT can be thought of as extending
this result to other i.i.d. random variables, except now n needs to
be large and the resulting normality is only approximate.

The CLT explains why the normal approximation to the binomial
(Week 8) works: in formula (3), let the Xi's be i.i.d. Bernoulli
random variables, which makes the left hand side binomial.

How large must n be for the CLT to be a good approximation?
The answer depends on how skewed the distribution of Xi is. If
the distribution is symmetric, then even very small n can give
approximate normality (see Slide 14). But if the distribution is
very skewed, then n might need to be huge. For the purpose of
this course, n ≥ 30 is considered ‘large’ (and larger is better).
15 / 30

CLT, work flow


The CLT allows us to solve complicated problems. Typically, when
solving a problem using the CLT, we use the following steps:

1. Define all random variables of interest.

2. Find the mean and variance of the RV’s.

3. Apply an appropriate version of the CLT (see Slide 13).

4. Compute a probability involving a normal RV. This often requires
converting it into the standard normal (‘standardizing’), then
using a table or software.

5. Sometimes, we may need to apply a continuity correction (when
approximating a discrete RV by a continuous RV), or solve an
equation for n (if it is unknown).
16 / 30

Example

Example
An unloaded die is rolled 100 times; find the probability that the
sum of all the rolls is less than or equal to 330.

Solution: Let Xi be the outcome of the ith roll, and let
Y = X1 + X2 + · · · + X100 be the sum of all the rolls. The Xi's
are clearly i.i.d., and n = 100 is large, so the CLT may be applied.

We have computed from previous classes that

µ = E(Xi) = 7/2,    σ² = Var(Xi) = 35/12.

17 / 30

Example, continued
Now, by formula (3) of the CLT, Y is approximately normal with
mean 100 × 7/2 = 350 and variance 100 × 35/12 = 3500/12; that
is, Y ≈ X where X ∼ N(350, 3500/12).
Note that Y is discrete but X is continuous, so we should apply a
continuity correction when approximating Y by X. We get

P(Y ≤ 330) ≈ P(X ≤ 330.5)
           = P( (X − 350)/√(3500/12) ≤ (330.5 − 350)/√(3500/12) )
           ≈ P(Z ≤ −1.1418) ≈ 0.127.

This agrees with the exact probability to 3 decimal places, but is far
easier to compute. In fact, even finding the exact pmf for the sum
of (say) 4 die rolls is hard, let alone 100 rolls. (So, paradoxically,
it is easier to find probabilities for 100 rolls than it is for 4 rolls.)
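The standardization above can be reproduced in a few lines of code,
using the error function for Φ so that no statistical table is needed:

```python
import math

def phi(x):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n = 100
mean = n * 7 / 2            # 350
var = n * 35 / 12           # 3500/12
z = (330.5 - mean) / math.sqrt(var)   # continuity-corrected z-score
print(round(z, 4))          # -1.1418
print(round(phi(z), 3))     # 0.127
```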
18 / 30

Activity 2 (10 minutes)

Suppose that in a repeatable experiment, measurements X1, X2,
. . . , Xn are i.i.d. with mean µ and standard deviation σ.

How many measurements are required to be 95% sure that X̄n is
found between µ − σ/4 and µ + σ/4?

(Note: in general, measurements are continuous.)

19 / 30

Activity 2 (solution)

We wish to find n, the required number of measurements. By
formula (2) of the CLT, X̄n ≈ X where X ∼ N(µ, σ²/n). We then
have

P(µ − σ/4 ≤ X̄n ≤ µ + σ/4)
  ≈ P(µ − σ/4 ≤ X ≤ µ + σ/4)
  = P(−σ/4 ≤ X − µ ≤ σ/4)
  = P( −(σ/4)/(σ/√n) ≤ (X − µ)/(σ/√n) ≤ (σ/4)/(σ/√n) )
  = P(−√n/4 ≤ Z ≤ √n/4) = 1 − 2Φ(−√n/4).

Equating this to 95%, we find that Φ(−√n/4) = 0.025, so
−√n/4 ≈ −1.96, or n ≈ 61.5. Since n must be an integer, we
conclude that at least 62 measurements are required.
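The same answer can be found by a direct search over n, using the
closed form 1 − 2Φ(−√n/4) derived above (with Φ computed from
the error function):

```python
import math

def phi(x):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def coverage(n):
    """CLT approximation of P(mu - sigma/4 <= sample mean <= mu + sigma/4)."""
    return 1 - 2 * phi(-math.sqrt(n) / 4)

n = 1
while coverage(n) < 0.95:   # coverage increases with n
    n += 1
print(n)  # 62, matching the hand calculation
```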
20 / 30

Activity 3 (10 minutes)

50 numbers are rounded off to the nearest integer and then
summed. Assume that the individual round-off errors are uniformly
distributed on [−0.5, 0.5]. Approximate the probability that the
resulting sum differs from the exact sum by more than 3.

Hints: the mean and variance of a uniform RV can be found in
Week 6 Class 1; think about the total round-off error.

21 / 30

Activity 3 (solution)
Let Xi be the round-off error for the ith number; it is reasonable to
assume independence, and so the Xi's are i.i.d. uniform(−0.5, 0.5)
random variables. Using formulas from Week 6 Class 1, we get

µ = E(Xi) = 0,    σ² = Var(Xi) = 1/12.

(Alternately, just compute Var(Xi) from E(Xi²) = ∫_{−0.5}^{0.5} x² dx.)
Let Y = X1 + · · · + X50 be the total round-off error. We wish to
find the probability that Y is more than 3 or less than −3. By
formula (3) of the CLT, Y ≈ X where X ∼ N(0, 50/12). Hence,
the required probability is given by

P(|Y| > 3) ≈ P(|X| > 3) = 2 P(X > 3)
  = 2 P( (X − 0)/√(50/12) > (3 − 0)/√(50/12) )
  ≈ 2 P(Z > 1.4697) ≈ 0.142.

(No continuity correction is needed here, as Y is continuous.)
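A quick simulation corroborates the approximation (an illustrative
sketch, not part of the notes; the trial count and seed are arbitrary
choices):

```python
import random

random.seed(3)
trials = 50000
count = 0
for _ in range(trials):
    # Total round-off error of 50 uniform(-0.5, 0.5) errors
    y = sum(random.uniform(-0.5, 0.5) for _ in range(50))
    if abs(y) > 3:
        count += 1
print(count / trials)  # close to the CLT answer 0.142
```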
22 / 30

Activity 4 (10 minutes)

The life time in months of a type of projector bulb is an
exponential(2) random variable. You have 100 projector bulbs.
Every time a bulb fails, you immediately install a replacement.

Estimate the number k, such that you can be 90% confident that
there will still be at least one working bulb after k months.

Hints: the mean and variance of an exponential RV can be found
in Week 6 Class 2; think about the total life time.

23 / 30

Activity 4 (solution)
Let Xi be the life time of the ith bulb; it is reasonable to assume
independence here, and so the Xi's are i.i.d. exponential(2)
random variables. Recall from Week 6 Class 2 that

µ = E(Xi) = 1/2,    σ² = Var(Xi) = 1/4.

Let Y = X1 + · · · + X100 be the total life time of the 100 bulbs.
By formula (3) of the CLT, Y ≈ X where X ∼ N(50, 25). For there
to still be at least one working bulb after k months, we require the
total life time to be at least k. Hence,

0.9 = P(Y ≥ k) ≈ P(X ≥ k) = P(Z ≥ (k − 50)/√25).

It follows that (k − 50)/5 ≈ −1.28, or k ≈ 43.6 months.

(Recall that in Week 9 Class 2, we did a difficult computation for 2
bulbs using integrals; yet here, for 100 bulbs, the computation is
relatively easy due to the CLT.)
24 / 30
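The z-value −1.28 above comes from a normal table; it can also be
computed by inverting Φ numerically, e.g. by bisection (a sketch
that avoids any external statistics library):

```python
import math

def phi(x):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def phi_inv(p):
    """Invert phi by bisection on [-10, 10] (phi is increasing)."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = phi_inv(0.10)       # P(Z >= z) = 0.9, so z is about -1.2816
k = 50 + 5 * z          # X ~ N(50, 25), so k = 50 + 5z
print(round(k, 1))      # 43.6 months
```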

Activity 5 (15 minutes)

Suppose that a brand of candy comes in small packets; however,
the number of candies in each packet varies: 60% of the packets
contain 3 candies each, 30% contain 4 each, and 10% contain 2
each. For a party, you require at least 100 candies. How many
packets do you need to buy in order to be 95% certain that they
will contain enough candies?

Hints: continuity correction; quadratic formula.

25 / 30

Activity 5 (solution)
Let Xi be the number of candies in the ith packet, n be the
number of packets, and Y = X1 + · · · + Xn be the total number of
candies. It is reasonable to assume independence here, and so the
Xi's are i.i.d. random variables. We first compute their mean and
variance:

µ = E(Xi) = 2 × 0.1 + 3 × 0.6 + 4 × 0.3 = 3.2,
σ² = Var(Xi) = 2² × 0.1 + 3² × 0.6 + 4² × 0.3 − 3.2² = 0.36.

Then, by the CLT, Y ≈ X where X ∼ N(3.2n, 0.36n). Note that
Y is discrete but X is continuous, so using a continuity correction,
we have

P(Y ≥ 100) ≈ P(X ≥ 99.5) = P(Z ≥ (99.5 − 3.2n)/√(0.36n)).

26 / 30

Activity 5 (solution, continued)

We require the above probability to be (at least) 95%. Hence, we set

P(Z ≥ (99.5 − 3.2n)/(0.6√n)) = 0.95 = P(Z ≥ −1.645),

hence (99.5 − 3.2n)/(0.6√n) = −1.645 (to 3 decimal places).

Letting x = √n, we get the equation 3.2x² − 0.987x − 99.5 = 0.
This can be solved using the quadratic formula, with the unique
positive solution being x ≈ 5.73, or n ≈ 32.9.

Therefore, at least n = 33 packets need to be bought to be 95%
certain that they contain ≥ 100 candies. (In hindsight, this n is
indeed large enough to justify the use of CLT in the first place.)
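The quadratic step can be verified directly, reproducing the
numbers above with the quadratic formula:

```python
import math

a, b, c = 3.2, -0.987, -99.5   # coefficients of 3.2x^2 - 0.987x - 99.5
x = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)   # unique positive root
n = x * x                      # since x = sqrt(n)
print(round(x, 2), round(n, 1))   # 5.73 32.9
print(math.ceil(n))               # 33 packets
```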

27 / 30

Outline

1 Law of large numbers


Definitions
LLN

2 Central limit theorem


Statement and examples
Activities

3 Extra notes

28 / 30

LLN
The version of LLN presented on Slide 9 is technically called the
weak law of large numbers, because there is a stronger version
called the strong law. The difference between the two versions is
very subtle and beyond the scope of the course.

To see why the first implication on Slide 10 is true, let A be the
event of interest, with probability P(A). Define a Bernoulli random
variable Xi = 1 if A occurs on the ith (independent) trial, and 0
otherwise. Then X̄n is the empirical frequency of A, and by the
LLN, X̄n converges to µ = E(Xi) = P(A).

To understand the second implication on Slide 10, consider a bin in
a histogram corresponding to the interval [a, b). Define Xi = 1 if
the ith data value Yi falls into the bin, and 0 otherwise. Then X̄n
is the proportion of data values in that bin, and it converges to
µ = E(Xi) = P(a ≤ Yi < b).
29 / 30

CLT
The CLT is a statement about the sum or average of the Xi's,
not about the Xi's themselves. So e.g. getting more data values
or measurements (hence increasing the sample size n) does not
magically make the data or measurements normally distributed.

Many natural phenomena can be modelled as a sum of i.i.d.
random variables, so the CLT is one way to explain why the
normal distribution is seen very frequently in nature.

That the CLT applies to independent events (such as dice rolls)
is quite remarkable, since each outcome cannot affect any other
outcomes, yet the collective behaviour of the outcomes can be
modelled by a normal distribution. Hence, the CLT shows that,
surprisingly, collective behaviour can be easier to predict, more
structured, and ‘less random’ than individual behaviour.

A proof of the CLT is beyond the scope of the course.


30 / 30
