11 Central Limit Theorem
Population
2/53
Module 3: Sampling Data
Understanding probability
What is probability?
Chance variability
How can we model chance variability by a box model?
3/53
Today’s outline
4/53
A review of box models
Single draws from box models
· Suppose we have a “box” containing tickets each bearing a number:
{𝑥1 , … , 𝑥𝑁 }.
· The probability that a random draw 𝑋 from the box takes a value 𝑥 is given by
𝑃(𝑋 = 𝑥) = (number of 𝑥's in the box) / 𝑁.
· The expectation of 𝑋 , denoted 𝐸(𝑋), is the average of the box:
𝜇 = (𝑥1 + ⋯ + 𝑥𝑁 )/𝑁.
· The standard error of 𝑋 , denoted 𝑆𝐸(𝑋) , is the SD of the box:
𝜎 = √( (1/𝑁)[(𝑥1 − 𝜇)² + ⋯ + (𝑥𝑁 − 𝜇)²] ).
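As a quick sketch, these box quantities can be computed in R. Note that the box SD is the root-mean-square deviation (dividing by 𝑁), not the value returned by R's sd(), which divides by 𝑁 − 1:

```r
box = c(1, 2, 3, 4, 5)            # an example box of tickets
mu = mean(box)                    # expectation E(X): the average of the box
sigma = sqrt(mean((box - mu)^2))  # SE(X): the SD of the box (divides by N)
c(mu, sigma)                      # 3 and sqrt(2) for this box
```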
6/53
Chance error
· We may decompose such a random draw 𝑋 into two parts: 𝑋 = 𝜇 + 𝜀 , where 𝜇 = 𝐸(𝑋) and the chance error 𝜀 is like a single draw from the “error box” {𝑥1 − 𝜇, … , 𝑥𝑁 − 𝜇}.
· The SD 𝜎 of the original box is the root-mean-square of the error box, and
describes the “size” of the errors.
- We may interpret 𝜎 = 𝑆𝐸(𝑋) as the “likely size” of the chance error 𝜀 ,
i.e. the likely size of the deviation of 𝑋 from its expected value 𝐸(𝑋).
7/53
Random samples from a box
· The sum 𝑆 = 𝑋1 + ⋯ + 𝑋𝑛 of a random sample (with repl.) of size 𝑛 from the
box has
- 𝐸(𝑆) = 𝑛𝜇 ;
- 𝑆𝐸(𝑆) = 𝜎√𝑛 .
- i.e. the box of all possible sums has average 𝑛𝜇 and SD 𝜎√𝑛 .
· The two boxes (all possible sums, all possible sample means) have the same
shape.
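These facts can be checked directly for a small box by enumerating every possible sample; a sketch using the 0/1 coin box with 𝑛 = 2:

```r
box = c(0, 1)
mu = mean(box)                        # 0.5
sigma = sqrt(mean((box - mu)^2))      # 0.5
s2 = as.vector(outer(box, box, "+"))  # all possible sums for n = 2
mean(s2)                              # E(S) = n*mu = 1
sqrt(mean((s2 - mean(s2))^2))         # SE(S) = sigma*sqrt(2) ~ 0.707
```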
8/53
Increasing the sample size
Coin tossing in WWII
· John Edmund Kerrich (1903–1985) was a mathematician noted for a series of
experiments in probability which he conducted while interned in Nazi-occupied
Denmark (Viborg, Midtjylland) in the 1940s.
· Two days before he was scheduled to fly to England, Nazi Germany invaded
Denmark.
10/53
Various “random experiments”
· With a fellow internee Eric Christensen, Kerrich set up a sequence of experiments
demonstrating the empirical validity of a number of fundamental laws of probability.
- They tossed a (fair) coin 10,000 times and counted the number of heads
(5,067).
- They made 5,000 draws from a container with 4 ping pong balls (2x2 different
brands), ‘at the rate of 400 an hour, with - need it be stated - periods of rest
between successive hours.’
- They investigated tosses of a “biased coin”, made from a wooden disk partly
coated in lead.
11/53
Simulating Kerrich’s 1st experiment
· Each coin flip (assuming the coin is fair) is like a random draw from the “box”
0 1
· This box has mean 1/2 and mean square 1/2, so
𝜎 = √(mn.sq. − (mean)²) = √(1/2 − (1/2)²) = √(1/4) = 1/2.
12/53
Simulating 1st experiment: chance error in sums
flips = sample(c(0, 1), size = 10000, replace = T) # 'box' is c(0,1)
cumulative.sums = cumsum(flips)
n = 1:10000
ES = n/2
plot(n, cumulative.sums - ES, type = "l")
abline(h = 0, col = "red")
13/53
Cumulative proportion
cumulative.propns = cumulative.sums/n # remember n = 1:10000 is a vector!
plot(n, cumulative.propns, type = "l", ylim = c(0, 1))
abline(h = 0.5, col = "red")
14/53
Logarithmic scale
par(mfrow = c(1, 2))
plot(log(n, base = 10), cumulative.sums - ES, type = "l")
abline(h = 0, col = "red")
plot(log(n, base = 10), cumulative.propns, type = "l", ylim = c(0, 1))
abline(h = 0.5, col = "red")
15/53
Size of chance errors as 𝑛 increases
· It seems that
- the size of the chance error in the sums increases: the “likely size” of the chance error for the sum is
𝑆𝐸(𝑆) = 𝜎√𝑛 → ∞ as 𝑛 → ∞ ;
- the size of the chance error in the proportion decreases: the “likely size” of the chance error for the proportion is
𝑆𝐸(𝑋¯ ) = 𝜎/√𝑛 → 0 as 𝑛 → ∞ .
16/53
Law of Averages
· For the sample mean 𝑋¯ from any box model,
𝑆𝐸(𝑋¯ ) = 𝜎/√𝑛 → 0 as 𝑛 → ∞ .
· So the likely size of the chance error between 𝑋¯ and 𝐸(𝑋¯ ) = 𝜇 gets smaller and
smaller as 𝑛 increases.
· In other words, as the sample size 𝑛 increases, the distribution of a sample mean
𝑋¯ gets “more concentrated” about the “population mean” 𝜇 .
· This “phenomenon” is (loosely) known as the “Law of Averages” or the “Law of
Large Numbers”.
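The shrinking SE can be tabulated directly; here 𝜎 = 0.5 is the SD of the 0/1 coin box:

```r
sigma = 0.5                   # SD of the 0/1 coin box
n = c(100, 400, 2500, 10000)
se = sigma / sqrt(n)          # likely size of the chance error in the proportion
se                            # 0.05 0.025 0.01 0.005: halves each time n quadruples
```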
17/53
Demonstration
· We can determine the box of all possible sums for small values of 𝑛 :
box = c(0, 1)
s2 = outer(box, box, "+") # forms two-way array of all possible sums for n=2
s2
## [,1] [,2]
## [1,] 0 1
## [2,] 1 2
as.vector(s2) # flatten the two-way array into a vector of all possible sums
## [1] 0 1 1 2
s3 = as.vector(outer(box, s2, "+")) # each sum for n=3 adds 0 or 1 to each sum in s2
s3
## [1] 0 1 1 2 1 2 2 3
18/53
· Again, for 𝑛 = 4 :
s4 = as.vector(outer(box, s3, "+")) # each sum for n=4 adds 0 or 1 to each sum in s3
s4
## [1] 0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4
· Again, for 𝑛 = 5 :
s5 = as.vector(outer(box, s4, "+")) # each sum for n=5 adds 0 or 1 to each sum in s4
s5
## [1] 0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4 1 2 2 3 2 3 3 4 2 3 3 4 3 4 4 5
· Again, for 𝑛 = 6 :
s6 = as.vector(outer(box, s5, "+")) # each sum for n=6 adds 0 or 1 to each sum in s5
s6
## [1] 0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4 1 2 2 3 2 3 3 4 2 3 3 4 3 4 4 5 1 2 2 3 2 3
## [39] 3 4 2 3 3 4 3 4 4 5 2 3 3 4 3 4 4 5 3 4 4 5 4 5 5 6
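The repeated outer() calls can be wrapped in a loop; this sketch rebuilds the sums for 𝑛 = 6 and tallies them:

```r
box = c(0, 1)
sums = box
for (i in 2:6) sums = as.vector(outer(box, sums, "+"))  # all possible sums for n = 6
length(sums)  # 2^6 = 64 equally likely ordered samples
table(sums)   # counts 1 6 15 20 15 6 1: the binomial pattern behind the histogram
```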
19/53
All possible sums to all possible averages
m2 = as.vector(s2)/2
m2
m3 = s3/3
m3
m4 = s4/4
m5 = s5/5
m6 = s6/6
s7 = as.vector(outer(box, s6, "+"))
m7 = s7/7
means = list(`n=2` = m2, `n=3` = m3, `n=4` = m4, `n=5` = m5, `n=6` = m6, `n=7` = m7)
20/53
head(means, 5)
## $`n=2`
## [1] 0.0 0.5 0.5 1.0
##
## $`n=3`
## [1] 0.0000000 0.3333333 0.3333333 0.6666667 0.3333333 0.6666667 0.6666667
## [8] 1.0000000
##
## $`n=4`
## [1] 0.00 0.25 0.25 0.50 0.25 0.50 0.50 0.75 0.25 0.50 0.50 0.75 0.50 0.75 0.75
## [16] 1.00
##
## $`n=5`
## [1] 0.0 0.2 0.2 0.4 0.2 0.4 0.4 0.6 0.2 0.4 0.4 0.6 0.4 0.6 0.6 0.8 0.2 0.4 0.4
## [20] 0.6 0.4 0.6 0.6 0.8 0.4 0.6 0.6 0.8 0.6 0.8 0.8 1.0
##
## $`n=6`
## [1] 0.0000000 0.1666667 0.1666667 0.3333333 0.1666667 0.3333333 0.3333333
## [8] 0.5000000 0.1666667 0.3333333 0.3333333 0.5000000 0.3333333 0.5000000
## [15] 0.5000000 0.6666667 0.1666667 0.3333333 0.3333333 0.5000000 0.3333333
## [22] 0.5000000 0.5000000 0.6666667 0.3333333 0.5000000 0.5000000 0.6666667
## [29] 0.5000000 0.6666667 0.6666667 0.8333333 0.1666667 0.3333333 0.3333333
## [36] 0.5000000 0.3333333 0.5000000 0.5000000 0.6666667 0.3333333 0.5000000
## [43] 0.5000000 0.6666667 0.5000000 0.6666667 0.6666667 0.8333333 0.3333333
## [50] 0.5000000 0.5000000 0.6666667 0.5000000 0.6666667 0.6666667 0.8333333
## [57] 0.5000000 0.6666667 0.6666667 0.8333333 0.6666667 0.8333333 0.8333333
## [64] 1.0000000
21/53
par(mfrow = c(2, 3))
for (i in 1:6) {
n = i + 1
br = (0:(n + 1) - 0.5)/n # breaks midway between possible values
hist(means[[n - 1]], pr = T, breaks = br, main = paste("n=", n), ylim = c(0, 4))
}
22/53
All possible averages for 𝑛 = 8, … , 13
23/53
All possible averages for 𝑛 = 14, … , 19
24/53
All possible averages for 𝑛 = 20, … , 25
…and so on…
25/53
Two important things
· In this example it is very clear that TWO important things are happening:
1. the spread of the distribution of all possible averages/proportions is getting
more concentrated about 𝜇 = 0.5 as 𝑛 increases;
2. the shape of the histogram of all possible averages/proportions is becoming
“normal-shaped”.
· Is the “normal shape” due to something special about this particular simple box?
- Not really!
26/53
Rolling a 6-sided die
· Suppose we are interested in rolling a 6-sided die 𝑛 times. How does the sum of
the rolls behave?
· This is like taking a random sample of size 𝑛 from the box
1 2 3 4 5 6
27/53
box = 1:6
box
## [1] 1 2 3 4 5 6
28/53
Histograms of all possible sums-of-𝑛-rolls
29/53
Asymmetric example
· Instead of the sum of the rolls, how about the number of times we roll ⚅?
· We instead use the box
0 0 0 0 0 1
· The number of times we get ⚅ in 𝑛 rolls is just like the sum 𝑆 when we take a
random sample of size 𝑛 from this new box.
· This new box has
- mean 𝜇 = 1/6 ;
- mean square 1/6 ;
- SD 𝜎 = √(1/6 − (1/6)²) = √(5/36) = √5/6 ≈ 0.373.
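These values can be verified in R (a sketch):

```r
box = c(0, 0, 0, 0, 0, 1)         # 1 if we roll a six, 0 otherwise
mu = mean(box)                    # 1/6
sigma = sqrt(mean((box - mu)^2))  # sqrt(5)/6 ~ 0.373
c(mu, sigma)
```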
30/53
Histograms of all possible no.s-of-⚅s
31/53
We get a normal shape, but only for larger 𝑛
· So although the histograms of all possible sums (“no.s-of-times-we-roll-⚅”) are not
normal-shaped for smaller 𝑛 , as 𝑛 increases the shape gets closer to a normal.
· By the time 𝑛 > 100, the shape is quite symmetric.
· It turns out that for essentially any box, we get the same phenomenon occurring:
- as 𝑛 gets larger and larger, the box of all possible sums gets a “more normal”
shape.
39/53
The Central Limit Theorem
Comparing the histogram and normal curve
· We can use the previous fact to estimate probabilities using normal curves!
· Knowing only the mean and SD of the box, we can approximate proportions in the
histogram, and hence chances of getting different values.
· Let’s consider 6 draws 𝑋1 , … , 𝑋6 from a six-sided die
1 2 3 4 5 6
41/53
In R
box_sums = 1:6
for (i in 1:5) {
box_sums = as.vector(outer(box_sums, 1:6, "+"))
}
mn.box_sums = mean(box_sums)
mn.box_sums
## [1] 21
SD.box_sums = sqrt(mean((box_sums - mn.box_sums)^2)) # RMS deviation: SD of the box of sums
SD.box_sums
## [1] 4.1833
42/53
In R
br = (5:36) + 0.5
hist(box_sums, breaks = br, pr = T)
curve(dnorm(x, mn.box_sums, SD.box_sums), add = T, lty = 2) # lty=2 gives a dashed line
43/53
Normal approximation
· We can find the “area” to the left of 18 for a normal curve with the same mean and
SD:
pnorm(18, mn.box_sums, SD.box_sums) # normal area to the left of 18
## [1] 0.2366447
mean(box_sums < 18) # exact proportion of possible sums below 18
## [1] 0.2058471
44/53
Normal approximation
· Extension: Note that a better approximation is obtained if we get the area to the
left of 17.5:
pnorm(17.5, mn.box_sums, SD.box_sums) # continuity-corrected area
## [1] 0.2013918
· This works because the values in the box are whole numbers, and the area under
the rectangles we want is actually to the left of 17.5 (see the histogram repeated
on the next slide).
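Putting the pieces together, this sketch (rebuilding box_sums as before) compares the plain and continuity-corrected approximations with the exact value:

```r
box_sums = 1:6
for (i in 1:5) box_sums = as.vector(outer(box_sums, 1:6, "+"))  # all sums of 6 rolls
mu = mean(box_sums)
sig = sqrt(mean((box_sums - mu)^2))
exact = mean(box_sums < 18)       # exact chance the sum is below 18
approx = pnorm(18, mu, sig)       # plain normal approximation
corrected = pnorm(17.5, mu, sig)  # continuity-corrected approximation
c(exact, approx, corrected)       # the corrected value is closer to exact
```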
45/53
Normal approximation
hist(box_sums, breaks = br, pr = T)
curve(dnorm(x, mn.box_sums, SD.box_sums), add = T, lty = 2) # lty=2 gives a dashed line
abline(v = 18, col = "blue")
abline(v = 17.5, col = "red")
legend("topright", leg = c("18", "17.5"), lty = c(1, 1), col = c("blue", "red"))
46/53
Most important result in Statistics
· This phenomenon can be mathematically proven to hold for any fixed (finite) box.
· This result is a special case of the Central Limit Theorem.
- It is a “limit theorem” because it describes what happens “in the limit” as
𝑛 → ∞.
- “Central” here means “most important”.
· For the sum 𝑆 of a random sample of size 𝑛 from a box with mean 𝜇 and SD 𝜎 ,
𝑃(𝑆 ≤ 𝑠) = 𝑃( (𝑆 − 𝑛𝜇)/(𝜎√𝑛) ≤ (𝑠 − 𝑛𝜇)/(𝜎√𝑛) ) ≈ Φ( (𝑠 − 𝑛𝜇)/(𝜎√𝑛) ),
where the “standard normal CDF” Φ(𝑧) is the function given in R by pnorm( 𝑧 ) .
47/53
Deconstructing the Central Limit Theorem
· The CLT states that areas under the histogram for sample sums 𝑆 are approximated by areas under the normal curve.
· Note that the desired sum value 𝑠 being considered here, when converted into
standard units is
𝑧𝑠 = (𝑠 − 𝐸(𝑆))/𝑆𝐸(𝑆) = (𝑠 − 𝑛𝜇)/(𝜎√𝑛),
which is the ratio inside the Φ(⋅).
· Therefore, converting to R code, we have
𝑃(𝑆 ≤ 𝑠) ≈ 𝚙𝚗𝚘𝚛𝚖((𝑠 − 𝑛𝜇)/(𝜎√𝑛)) = 𝚙𝚗𝚘𝚛𝚖(𝑠, 𝚖𝚎𝚊𝚗 = 𝑛𝜇, 𝚜𝚍 = 𝜎√𝑛) .
· The theorem applies equally to the sample mean 𝑋¯ . Letting 𝑠 = 𝑛𝑥 ,
𝑃(𝑋¯ ≤ 𝑥) = 𝑃(𝑆/𝑛 ≤ 𝑥) = 𝑃(𝑆 ≤ 𝑛𝑥) ≈ Φ( (𝑛𝑥 − 𝑛𝜇)/(𝜎√𝑛) ) = Φ( (𝑥 − 𝜇)/(𝜎/√𝑛) ).
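For example (a sketch using the fair-die box, which has 𝜇 = 3.5 and 𝜎 = √(35/12)), the chance that the mean of 𝑛 = 100 rolls is at most 3.4 can be computed either via 𝑋¯ or via 𝑆:

```r
mu = 3.5             # mean of the die box
sigma = sqrt(35/12)  # SD of the die box
n = 100
p.mean = pnorm(3.4, mean = mu, sd = sigma / sqrt(n))          # P(Xbar <= 3.4)
p.sum  = pnorm(n * 3.4, mean = n * mu, sd = sigma * sqrt(n))  # P(S <= 340): same value
c(p.mean, p.sum)
```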
48/53
Example: Roulette
· A roulette wheel has slots numbered 1 to 36, plus 1 (or more) slots marked 0.
- half the positive numbers are coloured black;
- the remaining positive numbers are coloured red;
- the zero slots are coloured green.
· If you bet on either “red” or “black”,
- you double your money if the ball lands in a slot of your colour
- you lose your money otherwise.
· Suppose a wheel has two green slots (“0” and “00”), each slot is equally likely and
a player bets $1 on “red” for 𝑛 consecutive spins.
· Let 𝑆 denote the total winnings after 𝑛 spins.
· Approximate 𝑃(𝑆 > 0) for 𝑛 = 5, 25, 125, 625.
49/53
The Roulette Box
· There are 38 slots in total, 18 of which are red.
· If the ball
- lands in a red slot the player wins $1;
- does not land in a red slot, the player loses $1, i.e. they win –$1.
· Use the following box:
−1 ⋯ −1 +1 ⋯ +1
20 of these 18 of these
· which has
- mean 𝜇 = −2/38 = −1/19 ;
- mean square 1 ;
- SD 𝜎 = √(1 − (−1/19)²) = √(360/361) = √360/19 ≈ 0.9986.
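These box figures can be checked in R (a sketch):

```r
box = c(rep(-1, 20), rep(1, 18))  # 20 losing slots, 18 red slots
mu = mean(box)                    # -1/19 ~ -0.0526
sigma = sqrt(mean((box - mu)^2))  # sqrt(360)/19 ~ 0.9986
c(mu, sigma)
```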
50/53
Exact answers
· It is possible to work out the exact probabilities (using the “binomial distribution”,
more on this later).
· These are
51/53
Normal approximation
According to the Central Limit Theorem, for “large 𝑛 ”,
𝑃(𝑆 > 0) = 1 − 𝑃(𝑆 ≤ 0) ≈ 1 − Φ( (0 − (−𝑛/19)) / √(360𝑛/361) ) = 1 − 𝚙𝚗𝚘𝚛𝚖( (𝑛/19) / √(360𝑛/361) ).
This gives
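In R, the approximation can be evaluated for all four values of 𝑛 at once (a sketch):

```r
n = c(5, 25, 125, 625)
mu = -1/19                                         # mean of the roulette box
sigma = sqrt(360)/19                               # SD of the roulette box
1 - pnorm(0, mean = n * mu, sd = sigma * sqrt(n))  # approximate P(S > 0) for each n
```

Because the box mean is negative, these chances are all below 1/2 and shrink as 𝑛 grows: the casino's edge becomes more and more certain to show.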
52/53
Final comments
· When we take a random sample of size 𝑛 (with replacement) from a box with
mean 𝜇 and SD 𝜎 , the box of all possible sums
- has mean equal to 𝐸(𝑆) = 𝑛𝜇 ;
- has SD equal to 𝑆𝐸(𝑆) = 𝜎√𝑛 ;
- is (approx.) normal-shaped for “large enough 𝑛 ”.
· For such 𝑛 we can approximate probabilities for the random sum 𝑆 or average
𝑋¯ = 𝑆/𝑛, using pnorm() .
· How large is “large enough 𝑛 ”? It depends on how “non-normal” the original box
is.
· If the original box is
- reasonably symmetric (without too many outliers), 𝑛 = 5 or 10 may do;
- very skewed, we may need 𝑛 > 100 before the box of all possible sums has
a nice, symmetric normal shape.
53/53