
Central Limit Theorem

Sampling Data | Chance Variability

© University of Sydney MATH1062/1005


12 September 2024
Course Overview

1 Exploring Data | 2 Modelling Data | 3 Sampling Data | 4 Decisions with Data
(diagram: Population → Sample)

2/53

Module 3 Sampling Data

Understanding probability
What is probability?

Counting and chance simulation
How do we count the number of possible outcomes?

Chance variability
How can we model chance variability by a box model?

Central limit theorem
What is the behaviour of the sample mean for a large sample size?

3/53

 Today’s outline

A review of box models

Sums and averages


The Central Limit Theorem (CLT)

4/53
A review of box models
Single draws from box models
· Suppose we have a “box” containing tickets each bearing a number:
{𝑥1 , … , 𝑥𝑁 }.
· The probability that a random draw $X$ from the box takes the value $x$ is
$$P(X = x) = \frac{\text{number of } x\text{'s in the box}}{N}.$$
· The expectation of $X$, denoted $E(X)$, is the average of the box:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i.$$
· The standard error of $X$, denoted $SE(X)$, is the SD of the box:
$$\sigma = \sqrt{\frac{1}{N}\left[(x_1 - \mu)^2 + \cdots + (x_N - \mu)^2\right]}.$$
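
· In R, these two summaries are easy to compute directly (a minimal sketch; the box contents below are an illustrative choice, not from the slides):

box = c(1, 2, 2, 5)                # an illustrative box of numbered tickets
mu = mean(box)                     # E(X): the average of the box
sigma = sqrt(mean((box - mu)^2))   # SE(X): the SD of the box (note: not sd(), which divides by N-1)
c(mu, sigma)

## [1] 2.5 1.5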

6/53
Chance error
· We may decompose such a random draw 𝑋 into two parts:

𝑋 = 𝐸(𝑋) + [𝑋 − 𝐸(𝑋)] = 𝐸(𝑋) + 𝜀 ,


where 𝜀 is a random draw from the error box containing {𝑥1 − 𝜇, … , 𝑥𝑁 − 𝜇} .

· The SD 𝜎 of the original box is the root-mean-square of the error box, and
describes the “size” of the errors.
- We may interpret 𝜎 = 𝑆𝐸(𝑋) as the “likely size” of the chance error 𝜀 ,
i.e. the likely size of the deviation of 𝑋 from its expected value 𝐸(𝑋).

7/53
Random samples from a box
· The sum 𝑆 = 𝑋1 + ⋯ + 𝑋𝑛 of a random sample (with repl.) of size 𝑛 from the
box has
- 𝐸(𝑆) = 𝑛𝜇 ;
- 𝑆𝐸(𝑆) = 𝜎√𝑛 .
- i.e. the box of all possible sums has average 𝑛𝜇 and SD 𝜎√𝑛 .

· The sample mean $\bar{X} = S/n = \frac{X_1 + \cdots + X_n}{n}$ has
- $E(\bar{X}) = \mu$;
- $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$.
- i.e. the box of all possible sample means has average $\mu$ and SD $\sigma/\sqrt{n}$.

· The two boxes (all possible sums, all possible sample means) have the same
shape.
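
· These formulas can be checked by simulation (a quick sketch re-using the illustrative box above; 10,000 repetitions and $n = 25$ are arbitrary choices):

box = c(1, 2, 2, 5)                # mu = 2.5, sigma = 1.5 for this box
sums = replicate(10000, sum(sample(box, size = 25, replace = T)))
mean(sums)                         # should be close to n*mu = 25 * 2.5 = 62.5
sd(sums)                           # should be close to sigma*sqrt(n) = 1.5 * 5 = 7.5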

8/53
Increasing the sample size
Coin tossing in WWII
· John Edmund Kerrich (1903–1985) was a mathematician noted for a series of
experiments in probability which he conducted while interned in Nazi-occupied
Denmark (Viborg, Midtjylland) in the 1940s.

· Two days before he was scheduled to fly to England, Nazi Germany invaded
Denmark.

10/53
Various “random experiments”
· With a fellow internee Eric Christensen, Kerrich set up a sequence of experiments
demonstrating the empirical validity of a number of fundamental laws of probability.

- They tossed a (fair) coin 10,000 times and counted the number of heads
(5,067).

- They made 5,000 draws from a container with 4 ping pong balls (2x2 different
brands), ‘at the rate of 400 an hour, with - need it be stated - periods of rest
between successive hours.’

- They investigated tosses of a “biased coin”, made from a wooden disk partly
coated in lead.

· In 1946 Kerrich published his findings in the monograph An Experimental Introduction to the Theory of Probability.

11/53
Simulating Kerrich’s 1st experiment
· Each coin flip (assuming the coin is fair) is like a random draw from the “box”

0 1

· This box has average $\mu = \frac{1}{2}$ and SD
$$\sigma = \sqrt{\text{mean square} - (\text{mean})^2} = \sqrt{\frac{1}{2} - \left(\frac{1}{2}\right)^2} = \sqrt{\frac{1}{4}} = \frac{1}{2}.$$

· We may then model $n$ "independent" flips $X_1, \ldots, X_n$ as a random sample with replacement of size $n$ from this box.

· The sum 𝑆 = 𝑋1 + ⋯ + 𝑋𝑛 is the number of heads.


· The average 𝑋¯ = 𝑆/𝑛 is the proportion of heads.

12/53
Simulating 1st experiment: chance error in sums
flips = sample(c(0, 1), size = 10000, replace = T) # 'box' is c(0,1)
cumulative.sums = cumsum(flips)
n = 1:10000
ES = n/2
plot(n, cumulative.sums - ES, type = "l")
abline(h = 0, col = "red")

13/53
Cumulative proportion
cumulative.propns = cumulative.sums/n # remember n = 1:10000 is a vector!
plot(n, cumulative.propns, type = "l", ylim = c(0, 1))
abline(h = 0.5, col = "red")

14/53
Logarithmic scale
par(mfrow = c(1, 2))
plot(log(n, base = 10), cumulative.sums - ES, type = "l")
abline(h = 0, col = "red")
plot(log(n, base = 10), cumulative.propns, type = "l", ylim = c(0, 1))
abline(h = 0.5, col = "red")

15/53
Size of chance errors as 𝑛 increases
· It seems that
- the size of the chance error in the sums increases;
- the size of the chance error in the proportion decreases;

· This makes perfect sense, because


- the "likely size" of the chance error for the sum is
$$SE(S) = \sigma\sqrt{n} \to \infty \quad \text{as } n \to \infty;$$
- the "likely size" of the chance error for the proportion is
$$SE(\bar{X}) = \frac{\sigma}{\sqrt{n}} \to 0 \quad \text{as } n \to \infty.$$
16/53
Law of Averages
· For the sample mean 𝑋¯ from any box model,
$$SE(\bar{X}) = \frac{\sigma}{\sqrt{n}} \to 0 \quad \text{as } n \to \infty.$$

· So the likely size of the chance error between 𝑋¯ and 𝐸(𝑋¯ ) = 𝜇 gets smaller and
smaller as 𝑛 increases.

· In other words, as the sample size 𝑛 increases, the distribution of a sample mean
𝑋¯ gets “more concentrated” about the “population mean” 𝜇 .
· This “phenomenon” is (loosely) known as the “Law of Averages” or the “Law of
Large Numbers”.
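
· As a quick numerical illustration (a sketch using the fair-coin box, for which $\sigma = 1/2$):

sigma = 1/2                        # SD of the fair-coin box
n = c(10, 100, 1000, 10000)
sigma/sqrt(n)                      # likely size of the chance error in the proportion
# approximately 0.158, 0.050, 0.016, 0.005 -- shrinking towards 0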

17/53
Demonstration
· We can determine the box of all possible sums for small values of 𝑛 :

box = c(0, 1)
s2 = outer(box, box, "+") # forms two-way array of all possible sums for n=2
s2

## [,1] [,2]
## [1,] 0 1
## [2,] 1 2

as.vector(s2) # converts matrix to a vector

## [1] 0 1 1 2

· We can iterate this procedure to get all sums for 𝑛 = 3 :

s3 = as.vector(outer(box, s2, "+")) # each sum for n=3 adds 0 or 1 to each sum in s2
s3

## [1] 0 1 1 2 1 2 2 3

18/53
· Again, for 𝑛 = 4 :

s4 = as.vector(outer(box, s3, "+")) # each sum for n=4 adds 0 or 1 to each sum in s3
s4

## [1] 0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4

· Again, for 𝑛 = 5 :

s5 = as.vector(outer(box, s4, "+")) # each sum for n=5 adds 0 or 1 to each sum in s4
s5

## [1] 0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4 1 2 2 3 2 3 3 4 2 3 3 4 3 4 4 5

· Again, for 𝑛 = 6 :

s6 = as.vector(outer(box, s5, "+")) # each sum for n=6 adds 0 or 1 to each sum in s5
s6

## [1] 0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4 1 2 2 3 2 3 3 4 2 3 3 4 3 4 4 5 1 2 2 3 2 3
## [39] 3 4 2 3 3 4 3 4 4 5 2 3 3 4 3 4 4 5 3 4 4 5 4 5 5 6

19/53
All possible sums to all possible averages
m2 = as.vector(s2)/2
m2

## [1] 0.0 0.5 0.5 1.0

m3 = s3/3
m3

## [1] 0.0000000 0.3333333 0.3333333 0.6666667 0.3333333 0.6666667 0.6666667
## [8] 1.0000000

m4 = s4/4
m5 = s5/5
m6 = s6/6
s7 = as.vector(outer(box, s6, "+"))
m7 = s7/7
means = list(`n=2` = m2, `n=3` = m3, `n=4` = m4, `n=5` = m5, `n=6` = m6, `n=7` = m7)

20/53
head(means, 5)

## $`n=2`
## [1] 0.0 0.5 0.5 1.0
##
## $`n=3`
## [1] 0.0000000 0.3333333 0.3333333 0.6666667 0.3333333 0.6666667 0.6666667
## [8] 1.0000000
##
## $`n=4`
## [1] 0.00 0.25 0.25 0.50 0.25 0.50 0.50 0.75 0.25 0.50 0.50 0.75 0.50 0.75 0.75
## [16] 1.00
##
## $`n=5`
## [1] 0.0 0.2 0.2 0.4 0.2 0.4 0.4 0.6 0.2 0.4 0.4 0.6 0.4 0.6 0.6 0.8 0.2 0.4 0.4
## [20] 0.6 0.4 0.6 0.6 0.8 0.4 0.6 0.6 0.8 0.6 0.8 0.8 1.0
##
## $`n=6`
## [1] 0.0000000 0.1666667 0.1666667 0.3333333 0.1666667 0.3333333 0.3333333
## [8] 0.5000000 0.1666667 0.3333333 0.3333333 0.5000000 0.3333333 0.5000000
## [15] 0.5000000 0.6666667 0.1666667 0.3333333 0.3333333 0.5000000 0.3333333
## [22] 0.5000000 0.5000000 0.6666667 0.3333333 0.5000000 0.5000000 0.6666667
## [29] 0.5000000 0.6666667 0.6666667 0.8333333 0.1666667 0.3333333 0.3333333
## [36] 0.5000000 0.3333333 0.5000000 0.5000000 0.6666667 0.3333333 0.5000000
## [43] 0.5000000 0.6666667 0.5000000 0.6666667 0.6666667 0.8333333 0.3333333
## [50] 0.5000000 0.5000000 0.6666667 0.5000000 0.6666667 0.6666667 0.8333333
## [57] 0.5000000 0.6666667 0.6666667 0.8333333 0.6666667 0.8333333 0.8333333
## [64] 1.0000000

21/53
par(mfrow = c(2, 3))
for (i in 1:6) {
    n = i + 1
    br = (0:(n + 1) - 0.5)/n  # breaks midway between possible values
    hist(means[[n - 1]], pr = T, breaks = br, main = paste("n=", n), ylim = c(0, 4))
}

22/53
All possible averages for 𝑛 = 8, … , 13

23/53
All possible averages for 𝑛 = 14, … , 19

24/53
All possible averages for 𝑛 = 20, … , 25

…and so on…

25/53
Two important things
· In this example it is very clear that TWO important things are happening:
1. the spread of the distribution of all possible averages/proportions is getting
more concentrated about 𝜇 = 0.5 as 𝑛 increases;
2. the shape of the histogram of all possible averages/proportions is becoming
“normal-shaped”.

· Is the “normal shape” due to something special about this particular simple box?
- Not really!

26/53
Rolling a 6-sided die
· Suppose we are interested in rolling a 6-sided die 𝑛 times. How does the sum of
the rolls behave?
· This is like taking a random sample of size 𝑛 from the box

1 2 3 4 5 6

· This box has
- mean $\mu = 3.5 = \frac{7}{2}$;
- mean square $\frac{1 + 4 + 9 + 16 + 25 + 36}{6} = \frac{91}{6}$;
- SD $\sigma = \sqrt{\frac{91}{6} - \left(\frac{7}{2}\right)^2} = \sqrt{\frac{182 - 3 \times 49}{12}} = \sqrt{\frac{35}{12}} \approx 1.708.$

27/53
box = 1:6
box

## [1] 1 2 3 4 5 6

s2 = as.vector(outer(box, box, "+"))
s3 = as.vector(outer(s2, box, "+"))
s4 = as.vector(outer(s3, box, "+"))
s5 = as.vector(outer(s4, box, "+"))
s6 = as.vector(outer(s5, box, "+"))
sums.rolls = list(box, s2, s3, s4, s5, s6)

28/53
Histograms of all possible sums-of-𝑛-rolls

For 𝑛 = 6 this is certainly normal-shaped too!

29/53
Asymmetric example
· Instead of the sum of the rolls, how about the number of times we roll ⚅?
· We instead use the box

0 0 0 0 0 1

· The number of times we get ⚅ in 𝑛 rolls is just like the sum 𝑆 when we take a
random sample of size 𝑛 from this new box.
· This new box has
- mean $\mu = \frac{1}{6}$;
- mean square $\frac{1}{6}$;
- SD $\sigma = \sqrt{\frac{1}{6} - \left(\frac{1}{6}\right)^2} = \sqrt{\frac{6 - 1}{36}} = \frac{\sqrt{5}}{6} \approx 0.373.$

30/53
Histograms of all possible no.s-of-⚅s

Not looking very normal-shaped…what about if we let 𝑛 get larger?

31/53
(Histograms of all possible numbers of ⚅s for increasingly large $n$: slides 32/53–38/53.)
We get a normal shape, but only for larger 𝑛
· So although the histograms of all possible sums (“no.s-of-times-we-roll-⚅”) are not
normal-shaped for smaller 𝑛 , as 𝑛 increases the shape gets closer to a normal.
· By the time 𝑛 > 100, the shape is quite symmetric.
· It turns out that for essentially any box, we get the same phenomenon occurring:
- as 𝑛 gets larger and larger, the box of all possible sums gets a “more normal”
shape.
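
· We can check this by simulation rather than full enumeration (a sketch; $n = 180$ is an arbitrary "large" choice, convenient because it gives $E(S) = 30$ and $SE(S) = 5$ exactly):

box = c(0, 0, 0, 0, 0, 1)          # the number-of-sixes box
n = 180
sums = replicate(10000, sum(sample(box, size = n, replace = T)))
hist(sums, pr = T, breaks = (min(sums):(max(sums) + 1)) - 0.5)
curve(dnorm(x, n/6, sqrt(5 * n/36)), add = T, lty = 2)  # normal curve with E(S), SE(S)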

39/53
The Central Limit Theorem
Comparing the histogram and normal curve
· We can use the previous fact to estimate probabilities using normal curves!
· Knowing only the mean and SD of the box, we can approximate proportions in the
histogram, and hence chances of getting different values.
· Let's consider 6 draws $X_1, \ldots, X_6$ from a six-sided die

1 2 3 4 5 6

· The expected value and SE of $S = X_1 + \cdots + X_6$:
- $E(S) = nE(X_1) = 6 \cdot 3.5 = 21$;
- $SE(S) = \sqrt{n} \cdot SE(X_1) = \sqrt{6} \cdot \sqrt{\frac{35}{12}} \approx 4.18.$

41/53
In R
box_sums = 1:6
for (i in 1:5) {
box_sums = as.vector(outer(box_sums, 1:6, "+"))
}
mn.box_sums = mean(box_sums)
mn.box_sums

## [1] 21

SD.box_sums = sqrt(mean((box_sums - mn.box_sums)^2))
SD.box_sums

## [1] 4.1833

42/53
In R
br = (5:36) + 0.5
hist(box_sums, breaks = br, pr = T)
curve(dnorm(x, mn.box_sums, SD.box_sums), add = T, lty = 2) # lty=2 gives a dashed line

43/53
Normal approximation
· We can find the “area” to the left of 18 for a normal curve with the same mean and
SD:

pnorm(18, mn.box_sums, SD.box_sums)

## [1] 0.2366447

· This approximates the actual probability:

sum(box_sums < 18)/length(box_sums)

## [1] 0.2058471

44/53
Normal approximation
· Extension: Note that a better approximation is obtained if we get the area to the
left of 17.5:

pnorm(17.5, mn.box_sums, SD.box_sums) # much closer to the true value!

## [1] 0.2013918

· This works because the values in the box are whole numbers, and the area under
the rectangles we want is actually to the left of 17.5 (see the histogram repeated
on the next slide).

45/53
Normal approximation
hist(box_sums, breaks = br, pr = T)
curve(dnorm(x, mn.box_sums, SD.box_sums), add = T, lty = 2) # lty=2 gives a dashed line
abline(v = 18, col = "blue")
abline(v = 17.5, col = "red")
legend("topright", leg = c("18", "17.5"), lty = c(1, 1), col = c("blue", "red"))

46/53
Most important result in Statistics
· This phenomenon can be mathematically proven to hold for any fixed (finite) box.
· This result is a special case of the Central Limit Theorem.
- It is a “limit theorem” because it describes what happens “in the limit” as
𝑛 → ∞.
- “Central” here means “most important”.
· The “standard normal CDF” Φ(𝑧) is the function given in R by pnorm( 𝑧 ) .

If $S = X_1 + \cdots + X_n$ is the sum of a random sample (with replacement) of size $n$ from a box with mean $\mu$ and SD $\sigma$, then for large $n$,
$$P(S \le s) = P\left(\frac{S - n\mu}{\sigma\sqrt{n}} \le \frac{s - n\mu}{\sigma\sqrt{n}}\right) \approx \Phi\left(\frac{s - n\mu}{\sigma\sqrt{n}}\right).$$

47/53
Deconstructing the Central Limit Theorem
· The CLT states that areas under the histogram for sample sums $S$ are approximated by areas under the normal curve.
· Note that the desired sum value $s$ being considered here, when converted into standard units, is
$$z_s = \frac{s - E(S)}{SE(S)} = \frac{s - n\mu}{\sigma\sqrt{n}},$$
which is the ratio inside the $\Phi(\cdot)$.
· Therefore, converting to R code, we have
$P(S \le s) \approx$ pnorm((s - n*mu)/(sigma*sqrt(n))) = pnorm(s, mean = n*mu, sd = sigma*sqrt(n)) (see the helper sketched below).
· The theorem applies equally to the sample mean $\bar{X}$. Letting $s = nx$:
$$P(\bar{X} \le x) = P\left(\frac{S}{n} \le x\right) = P(S \le nx) \approx \Phi\left(\frac{nx - n\mu}{\sigma\sqrt{n}}\right) = \Phi\left(\frac{x - \mu}{\sigma/\sqrt{n}}\right).$$
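
· As a sketch, the whole approximation can be wrapped in a small helper (clt_prob is our own name, not a standard R function); it reproduces the earlier die-sum calculation:

clt_prob = function(s, n, mu, sigma) pnorm(s, mean = n * mu, sd = sigma * sqrt(n))
clt_prob(17.5, n = 6, mu = 3.5, sigma = sqrt(35/12))  # P(S <= 17.5) for 6 die rolls

## [1] 0.2013918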
48/53
Example: Roulette
· A roulette wheel has slots numbered 1 to 36, plus 1 (or more) slots marked 0.
- half the positive numbers are coloured black;
- the remaining positive numbers are coloured red;
- the zero slots are coloured green.
· If you bet on either “red” or “black”,
- you double your money if the ball lands in a slot of your colour
- you lose your money otherwise.
· Suppose a wheel has two green slots (“0” and “00”), each slot is equally likely and
a player bets $1 on “red” for 𝑛 consecutive spins.
· Let 𝑆 denote the total winnings after 𝑛 spins.
· Approximate 𝑃(𝑆 > 0) for 𝑛 = 5, 25, 125, 625.

49/53
The Roulette Box
· There are 38 slots in total, 18 of which are red.
· If the ball
- lands in a red slot the player wins $1;
- does not land in a red slot, the player loses $1, i.e. they win –$1.
· Use the following box:
$$\underbrace{-1, \ldots, -1}_{20 \text{ of these}} \qquad \underbrace{+1, \ldots, +1}_{18 \text{ of these}}$$
· which has
- mean $\mu = \frac{-2}{38} = -\frac{1}{19}$;
- mean square 1;
- SD $\sigma = \sqrt{1 - \left(\frac{1}{19}\right)^2} = \sqrt{\frac{360}{361}} \approx 0.9986.$
50/53
Exact answers
· It is possible to work out the exact probabilities (using the “binomial distribution”,
more on this later).

· These are

n = c(5, 25, 125, 625)
prob.win = 1 - pbinom(n/2, n, 18/38)  # S > 0 iff the no. of wins exceeds n/2
rbind(n, prob.win)

##               [,1]       [,2]        [,3]         [,4]
## n        5.0000000 25.0000000 125.0000000 625.00000000
## prob.win 0.4507489  0.3951246   0.2775865   0.09388094

51/53
Normal approximation
According to the Central Limit Theorem, for “large 𝑛 ”,

$$P(S > 0) = 1 - P(S \le 0) \approx 1 - \Phi\left(\frac{0 - \left(-\frac{n}{19}\right)}{\sqrt{\frac{360n}{361}}}\right) = 1 - \texttt{pnorm}\!\left(\frac{\sqrt{361n}}{19\sqrt{360}}\right)$$

This gives

1 - pnorm(sqrt(361 * n)/(19 * sqrt(360)))

## [1] 0.45309281 0.39607370 0.27784490 0.09381616

· These are quite good approximations (even for 𝑛 = 5 )!


· Makes sense, because the box is reasonably symmetric (not that different in
shape to Kerrich’s box).
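
· A direct simulation gives a third point of comparison (a sketch; 10,000 repetitions and $n = 25$ are arbitrary choices):

box = c(rep(-1, 20), rep(1, 18))   # the roulette box
winnings = replicate(10000, sum(sample(box, size = 25, replace = T)))
mean(winnings > 0)                 # should be close to the exact value 0.3951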

52/53
Final comments
· When we take a random sample of size 𝑛 (with replacement) from a box with
mean 𝜇 and SD 𝜎 , the box of all possible sums
- has mean equal to 𝐸(𝑆) = 𝑛𝜇 ;
- has SD equal to 𝑆𝐸(𝑆) = 𝜎√𝑛 ;
- is (approx.) normal-shaped for “large enough 𝑛 ”.

· For such 𝑛 we can approximate probabilities for the random sum 𝑆 or average
𝑋¯ = 𝑆/𝑛, using pnorm() .
· How large is "large enough $n$"? It depends on how "non-normal" the original box is.
· If the original box is
- reasonably symmetric (without too many outliers), 𝑛 = 5 or 10 may do;
- very skewed, we may need 𝑛 > 100 before the box of all possible sums has
a nice, symmetric normal shape.
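
· For example (our own illustration, not from the slides): the chance that the average of $n = 100$ die rolls is at most 3.4 is approximately

mu = 3.5
sigma = sqrt(35/12)
pnorm(3.4, mean = mu, sd = sigma/sqrt(100))  # approximately 0.279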
53/53
