
Central Limit Theorem

Sampling Data | Chance Variability

© University of Sydney MATH1062/1005


12 September 2024
Course Overview

1 Exploring Data | 2 Modelling Data | 3 Sampling Data | 4 Decisions with Data
(diagram: Population → Sample)

2/53

Module 3 Sampling Data

Understanding probability
What is probability?

Counting and chance simulation
How do we count the number of possible outcomes?

Chance variability
How can we model chance variability by a box model?

Central limit theorem
What is the behaviour of the sample mean for a large sample size?

3/53

 Today’s outline

A review of box models

Sums and averages


The Central Limit Theorem (CLT)

4/53
A review of box models
Single draws from box models
· Suppose we have a “box” containing tickets each bearing a number:
{𝑥1 , … , 𝑥𝑁 }.
· The probability that a random draw $X$ from the box takes the value $x$ is
$$P(X = x) = \frac{\text{number of } x\text{'s in the box}}{N}.$$
· The expectation of $X$, denoted $E(X)$, is the average of the box:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i.$$
· The standard error of $X$, denoted $SE(X)$, is the SD of the box:
$$\sigma = \sqrt{\frac{1}{N}\left[(x_1 - \mu)^2 + \cdots + (x_N - \mu)^2\right]}.$$
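
· In R, these two summaries are easy to compute directly (a minimal sketch; the box contents below are an illustrative choice, not from the slides):

box = c(1, 2, 2, 5)                # an illustrative box of numbered tickets
mu = mean(box)                     # E(X): the average of the box
sigma = sqrt(mean((box - mu)^2))   # SE(X): the SD of the box (note: not sd(), which divides by N-1)
c(mu, sigma)

## [1] 2.5 1.5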

6/53
Chance error
· We may decompose such a random draw 𝑋 into two parts:

𝑋 = 𝐸(𝑋) + [𝑋 − 𝐸(𝑋)] = 𝐸(𝑋) + 𝜀 ,


where 𝜀 is a random draw from the error box containing {𝑥1 − 𝜇, … , 𝑥𝑁 − 𝜇} .

· The SD 𝜎 of the original box is the root-mean-square of the error box, and
describes the “size” of the errors.
- We may interpret 𝜎 = 𝑆𝐸(𝑋) as the “likely size” of the chance error 𝜀 ,
i.e. the likely size of the deviation of 𝑋 from its expected value 𝐸(𝑋).

7/53
Random samples from a box
· The sum 𝑆 = 𝑋1 + ⋯ + 𝑋𝑛 of a random sample (with repl.) of size 𝑛 from the
box has
- 𝐸(𝑆) = 𝑛𝜇 ;
- 𝑆𝐸(𝑆) = 𝜎√𝑛 .
- i.e. the box of all possible sums has average 𝑛𝜇 and SD 𝜎√𝑛 .

· The sample mean $\bar{X} = S/n = \frac{X_1 + \cdots + X_n}{n}$ has
- $E(\bar{X}) = \mu$;
- $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$.
- i.e. the box of all possible sample means has average $\mu$ and SD $\sigma/\sqrt{n}$.

· The two boxes (all possible sums, all possible sample means) have the same
shape.
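
· These formulas can be checked by simulation (a quick sketch re-using the illustrative box above; 10,000 repetitions and $n = 25$ are arbitrary choices):

box = c(1, 2, 2, 5)                # mu = 2.5, sigma = 1.5 for this box
sums = replicate(10000, sum(sample(box, size = 25, replace = T)))
mean(sums)                         # should be close to n*mu = 25 * 2.5 = 62.5
sd(sums)                           # should be close to sigma*sqrt(n) = 1.5 * 5 = 7.5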

8/53
Increasing the sample size
Coin tossing in WWII
· John Edmund Kerrich (1903–1985) was a mathematician noted for a series of
experiments in probability which he conducted while interned in Nazi-occupied
Denmark (Viborg, Midtjylland) in the 1940s.

· Two days before he was scheduled to fly to England, Nazi Germany invaded
Denmark.

10/53
Various “random experiments”
· With a fellow internee Eric Christensen, Kerrich set up a sequence of experiments
demonstrating the empirical validity of a number of fundamental laws of probability.

- They tossed a (fair) coin 10,000 times and counted the number of heads
(5,067).

- They made 5,000 draws from a container with 4 ping pong balls (2x2 different
brands), ‘at the rate of 400 an hour, with - need it be stated - periods of rest
between successive hours.’

- They investigated tosses of a “biased coin”, made from a wooden disk partly
coated in lead.

· In 1946 Kerrich published his findings in the monograph An Experimental Introduction to the Theory of Probability.

11/53
Simulating Kerrich’s 1st experiment
· Each coin flip (assuming the coin is fair) is like a random draw from the “box”

0 1

· This box has average $\mu = \frac{1}{2}$ and SD
$$\sigma = \sqrt{\text{mean square} - (\text{mean})^2} = \sqrt{\frac{1}{2} - \left(\frac{1}{2}\right)^2} = \sqrt{\frac{1}{4}} = \frac{1}{2}.$$

· We may then model $n$ "independent" flips $X_1, \ldots, X_n$ as a random sample with replacement of size $n$ from this box.

· The sum 𝑆 = 𝑋1 + ⋯ + 𝑋𝑛 is the number of heads.


· The average 𝑋¯ = 𝑆/𝑛 is the proportion of heads.

12/53
Simulating 1st experiment: chance error in sums
flips = sample(c(0, 1), size = 10000, replace = T) # 'box' is c(0,1)
cumulative.sums = cumsum(flips)
n = 1:10000
ES = n/2
plot(n, cumulative.sums - ES, type = "l")
abline(h = 0, col = "red")

13/53
Cumulative proportion
cumulative.propns = cumulative.sums/n # remember n = 1:10000 is a vector!
plot(n, cumulative.propns, type = "l", ylim = c(0, 1))
abline(h = 0.5, col = "red")

14/53
Logarithmic scale
par(mfrow = c(1, 2))
plot(log(n, base = 10), cumulative.sums - ES, type = "l")
abline(h = 0, col = "red")
plot(log(n, base = 10), cumulative.propns, type = "l", ylim = c(0, 1))
abline(h = 0.5, col = "red")

15/53
Size of chance errors as 𝑛 increases
· It seems that
- the size of the chance error in the sums increases;
- the size of the chance error in the proportion decreases;

· This makes perfect sense, because


- the "likely size" of the chance error for the sum is
$$SE(S) = \sigma\sqrt{n} \to \infty \quad \text{as } n \to \infty;$$
- the "likely size" of the chance error for the proportion is
$$SE(\bar{X}) = \frac{\sigma}{\sqrt{n}} \to 0 \quad \text{as } n \to \infty.$$
16/53
Law of Averages
· For the sample mean 𝑋¯ from any box model,
$$SE(\bar{X}) = \frac{\sigma}{\sqrt{n}} \to 0 \quad \text{as } n \to \infty.$$

· So the likely size of the chance error between 𝑋¯ and 𝐸(𝑋¯ ) = 𝜇 gets smaller and
smaller as 𝑛 increases.

· In other words, as the sample size 𝑛 increases, the distribution of a sample mean
𝑋¯ gets “more concentrated” about the “population mean” 𝜇 .
· This “phenomenon” is (loosely) known as the “Law of Averages” or the “Law of
Large Numbers”.
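
· As a quick numerical illustration (a sketch using the fair-coin box, for which $\sigma = 1/2$):

sigma = 1/2                        # SD of the fair-coin box
n = c(10, 100, 1000, 10000)
sigma/sqrt(n)                      # likely size of the chance error in the proportion
# approximately 0.158, 0.050, 0.016, 0.005 -- shrinking towards 0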

17/53
Demonstration
· We can determine the box of all possible sums for small values of 𝑛 :

box = c(0, 1)
s2 = outer(box, box, "+") # forms two-way array of all possible sums for n=2
s2

## [,1] [,2]
## [1,] 0 1
## [2,] 1 2

as.vector(s2) # converts matrix to a vector

## [1] 0 1 1 2

· We can iterate this procedure to get all sums for 𝑛 = 3 :

s3 = as.vector(outer(box, s2, "+")) # each sum for n=3 adds 0 or 1 to each sum in s2
s3

## [1] 0 1 1 2 1 2 2 3

18/53
· Again, for 𝑛 = 4 :

s4 = as.vector(outer(box, s3, "+")) # each sum for n=4 adds 0 or 1 to each sum in s3
s4

## [1] 0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4

· Again, for 𝑛 = 5 :

s5 = as.vector(outer(box, s4, "+")) # each sum for n=5 adds 0 or 1 to each sum in s4
s5

## [1] 0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4 1 2 2 3 2 3 3 4 2 3 3 4 3 4 4 5

· Again, for 𝑛 = 6 :

s6 = as.vector(outer(box, s5, "+")) # each sum for n=6 adds 0 or 1 to each sum in s5
s6

## [1] 0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4 1 2 2 3 2 3 3 4 2 3 3 4 3 4 4 5 1 2 2 3 2 3
## [39] 3 4 2 3 3 4 3 4 4 5 2 3 3 4 3 4 4 5 3 4 4 5 4 5 5 6

19/53
All possible sums to all possible averages
m2 = as.vector(s2)/2
m2

## [1] 0.0 0.5 0.5 1.0

m3 = s3/3
m3

## [1] 0.0000000 0.3333333 0.3333333 0.6666667 0.3333333 0.6666667 0.6666667
## [8] 1.0000000

m4 = s4/4
m5 = s5/5
m6 = s6/6
s7 = as.vector(outer(box, s6, "+"))
m7 = s7/7
means = list(`n=2` = m2, `n=3` = m3, `n=4` = m4, `n=5` = m5, `n=6` = m6, `n=7` = m7)

20/53
head(means, 5)

## $`n=2`
## [1] 0.0 0.5 0.5 1.0
##
## $`n=3`
## [1] 0.0000000 0.3333333 0.3333333 0.6666667 0.3333333 0.6666667 0.6666667
## [8] 1.0000000
##
## $`n=4`
## [1] 0.00 0.25 0.25 0.50 0.25 0.50 0.50 0.75 0.25 0.50 0.50 0.75 0.50 0.75 0.75
## [16] 1.00
##
## $`n=5`
## [1] 0.0 0.2 0.2 0.4 0.2 0.4 0.4 0.6 0.2 0.4 0.4 0.6 0.4 0.6 0.6 0.8 0.2 0.4 0.4
## [20] 0.6 0.4 0.6 0.6 0.8 0.4 0.6 0.6 0.8 0.6 0.8 0.8 1.0
##
## $`n=6`
## [1] 0.0000000 0.1666667 0.1666667 0.3333333 0.1666667 0.3333333 0.3333333
## [8] 0.5000000 0.1666667 0.3333333 0.3333333 0.5000000 0.3333333 0.5000000
## [15] 0.5000000 0.6666667 0.1666667 0.3333333 0.3333333 0.5000000 0.3333333
## [22] 0.5000000 0.5000000 0.6666667 0.3333333 0.5000000 0.5000000 0.6666667
## [29] 0.5000000 0.6666667 0.6666667 0.8333333 0.1666667 0.3333333 0.3333333
## [36] 0.5000000 0.3333333 0.5000000 0.5000000 0.6666667 0.3333333 0.5000000
## [43] 0.5000000 0.6666667 0.5000000 0.6666667 0.6666667 0.8333333 0.3333333
## [50] 0.5000000 0.5000000 0.6666667 0.5000000 0.6666667 0.6666667 0.8333333
## [57] 0.5000000 0.6666667 0.6666667 0.8333333 0.6666667 0.8333333 0.8333333
## [64] 1.0000000

21/53
par(mfrow = c(2, 3))
for (i in 1:6) {
    n = i + 1
    br = (0:(n + 1) - 0.5)/n  # breaks midway between possible values
    hist(means[[n - 1]], pr = T, breaks = br, main = paste("n=", n), ylim = c(0, 4))
}

22/53
All possible averages for 𝑛 = 8, … , 13

23/53
All possible averages for 𝑛 = 14, … , 19

24/53
All possible averages for 𝑛 = 20, … , 25

…and so on…

25/53
Two important things
· In this example it is very clear that TWO important things are happening:
1. the spread of the distribution of all possible averages/proportions is getting
more concentrated about 𝜇 = 0.5 as 𝑛 increases;
2. the shape of the histogram of all possible averages/proportions is becoming
“normal-shaped”.

· Is the “normal shape” due to something special about this particular simple box?
- Not really!

26/53
Rolling a 6-sided die
· Suppose we are interested in rolling a 6-sided die 𝑛 times. How does the sum of
the rolls behave?
· This is like taking a random sample of size 𝑛 from the box

1 2 3 4 5 6

· This box has
- mean $\mu = 3.5 = \frac{7}{2}$;
- mean square $\frac{1 + 4 + 9 + 16 + 25 + 36}{6} = \frac{91}{6}$;
- SD $\sigma = \sqrt{\frac{91}{6} - \left(\frac{7}{2}\right)^2} = \sqrt{\frac{182 - 3 \times 49}{12}} = \sqrt{\frac{35}{12}} \approx 1.708.$

27/53
box = 1:6
box

## [1] 1 2 3 4 5 6

s2 = as.vector(outer(box, box, "+"))
s3 = as.vector(outer(s2, box, "+"))
s4 = as.vector(outer(s3, box, "+"))
s5 = as.vector(outer(s4, box, "+"))
s6 = as.vector(outer(s5, box, "+"))
sums.rolls = list(box, s2, s3, s4, s5, s6)

28/53
Histograms of all possible sums-of-𝑛-rolls

For 𝑛 = 6 this is certainly normal-shaped too!

29/53
Asymmetric example
· Instead of the sum of the rolls, how about the number of times we roll ⚅?
· We instead use the box

0 0 0 0 0 1

· The number of times we get ⚅ in 𝑛 rolls is just like the sum 𝑆 when we take a
random sample of size 𝑛 from this new box.
· This new box has
- mean $\mu = \frac{1}{6}$;
- mean square $\frac{1}{6}$;
- SD $\sigma = \sqrt{\frac{1}{6} - \left(\frac{1}{6}\right)^2} = \sqrt{\frac{6 - 1}{36}} = \frac{\sqrt{5}}{6} \approx 0.373.$

30/53
Histograms of all possible no.s-of-⚅s

Not looking very normal-shaped…what about if we let 𝑛 get larger?

31/53
(Histograms of all possible numbers of ⚅s for increasingly large $n$: slides 32/53–38/53.)
We get a normal shape, but only for larger 𝑛
· So although the histograms of all possible sums (“no.s-of-times-we-roll-⚅”) are not
normal-shaped for smaller 𝑛 , as 𝑛 increases the shape gets closer to a normal.
· By the time 𝑛 > 100, the shape is quite symmetric.
· It turns out that for essentially any box, we get the same phenomenon occurring:
- as 𝑛 gets larger and larger, the box of all possible sums gets a “more normal”
shape.
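
· We can check this by simulation rather than full enumeration (a sketch; $n = 180$ is an arbitrary "large" choice, convenient because it gives $E(S) = 30$ and $SE(S) = 5$ exactly):

box = c(0, 0, 0, 0, 0, 1)          # the number-of-sixes box
n = 180
sums = replicate(10000, sum(sample(box, size = n, replace = T)))
hist(sums, pr = T, breaks = (min(sums):(max(sums) + 1)) - 0.5)
curve(dnorm(x, n/6, sqrt(5 * n/36)), add = T, lty = 2)  # normal curve with E(S), SE(S)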

39/53
The Central Limit Theorem
Comparing the histogram and normal curve
· We can use the previous fact to estimate probabilities using normal curves!
· Knowing only the mean and SD of the box, we can approximate proportions in the
histogram, and hence chances of getting different values.
· Let's consider 6 draws $X_1, \ldots, X_6$ from a six-sided die

1 2 3 4 5 6

· The expected value and SE of $S = X_1 + \cdots + X_6$:
- $E(S) = nE(X_1) = 6 \cdot 3.5 = 21$;
- $SE(S) = \sqrt{n} \cdot SE(X_1) = \sqrt{6} \cdot \sqrt{\frac{35}{12}} \approx 4.18.$

41/53
In R
box_sums = 1:6
for (i in 1:5) {
box_sums = as.vector(outer(box_sums, 1:6, "+"))
}
mn.box_sums = mean(box_sums)
mn.box_sums

## [1] 21

SD.box_sums = sqrt(mean((box_sums - mn.box_sums)^2))
SD.box_sums

## [1] 4.1833

42/53
In R
br = (5:36) + 0.5
hist(box_sums, breaks = br, pr = T)
curve(dnorm(x, mn.box_sums, SD.box_sums), add = T, lty = 2) # lty=2 gives a dashed line

43/53
Normal approximation
· We can find the “area” to the left of 18 for a normal curve with the same mean and
SD:

pnorm(18, mn.box_sums, SD.box_sums)

## [1] 0.2366447

· This approximates the actual probability:

sum(box_sums < 18)/length(box_sums)

## [1] 0.2058471

44/53
Normal approximation
· Extension: Note that a better approximation is obtained if we get the area to the
left of 17.5:

pnorm(17.5, mn.box_sums, SD.box_sums) # much closer to the true value!

## [1] 0.2013918

· This works because the values in the box are whole numbers, and the area under
the rectangles we want is actually to the left of 17.5 (see the histogram repeated
on the next slide).

45/53
Normal approximation
hist(box_sums, breaks = br, pr = T)
curve(dnorm(x, mn.box_sums, SD.box_sums), add = T, lty = 2) # lty=2 gives a dashed line
abline(v = 18, col = "blue")
abline(v = 17.5, col = "red")
legend("topright", leg = c("18", "17.5"), lty = c(1, 1), col = c("blue", "red"))

46/53
Most important result in Statistics
· This phenomenon can be mathematically proven to hold for any fixed (finite) box.
· This result is a special case of the Central Limit Theorem.
- It is a “limit theorem” because it describes what happens “in the limit” as
𝑛 → ∞.
- “Central” here means “most important”.
· The “standard normal CDF” Φ(𝑧) is the function given in R by pnorm( 𝑧 ) .

If $S = X_1 + \cdots + X_n$ is the sum of a random sample (with replacement) of size $n$ from a box with mean $\mu$ and SD $\sigma$, then for large $n$,
$$P(S \le s) = P\left(\frac{S - n\mu}{\sigma\sqrt{n}} \le \frac{s - n\mu}{\sigma\sqrt{n}}\right) \approx \Phi\left(\frac{s - n\mu}{\sigma\sqrt{n}}\right).$$

47/53
Deconstructing the Central Limit Theorem
· The CLT states that areas under the histogram for sample sums $S$ are approximated by areas under the normal curve.
· Note that the desired sum value $s$ being considered here, when converted into standard units, is
$$z_s = \frac{s - E(S)}{SE(S)} = \frac{s - n\mu}{\sigma\sqrt{n}},$$
which is the ratio inside the $\Phi(\cdot)$.
· Therefore, converting to R code, we have
$P(S \le s) \approx$ pnorm((s - n*mu)/(sigma*sqrt(n))) = pnorm(s, mean = n*mu, sd = sigma*sqrt(n)) (see the helper sketched below).
· The theorem applies equally to the sample mean $\bar{X}$. Letting $s = nx$:
$$P(\bar{X} \le x) = P\left(\frac{S}{n} \le x\right) = P(S \le nx) \approx \Phi\left(\frac{nx - n\mu}{\sigma\sqrt{n}}\right) = \Phi\left(\frac{x - \mu}{\sigma/\sqrt{n}}\right).$$
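
· As a sketch, the whole approximation can be wrapped in a small helper (clt_prob is our own name, not a standard R function); it reproduces the earlier die-sum calculation:

clt_prob = function(s, n, mu, sigma) pnorm(s, mean = n * mu, sd = sigma * sqrt(n))
clt_prob(17.5, n = 6, mu = 3.5, sigma = sqrt(35/12))  # P(S <= 17.5) for 6 die rolls

## [1] 0.2013918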
48/53
Example: Roulette
· A roulette wheel has slots numbered 1 to 36, plus 1 (or more) slots marked 0.
- half the positive numbers are coloured black;
- the remaining positive numbers are coloured red;
- the zero slots are coloured green.
· If you bet on either “red” or “black”,
- you double your money if the ball lands in a slot of your colour
- you lose your money otherwise.
· Suppose a wheel has two green slots (“0” and “00”), each slot is equally likely and
a player bets $1 on “red” for 𝑛 consecutive spins.
· Let 𝑆 denote the total winnings after 𝑛 spins.
· Approximate 𝑃(𝑆 > 0) for 𝑛 = 5, 25, 125, 625.

49/53
The Roulette Box
· There are 38 slots in total, 18 of which are red.
· If the ball
- lands in a red slot the player wins $1;
- does not land in a red slot, the player loses $1, i.e. they win –$1.
· Use the following box:
$$\underbrace{-1, \ldots, -1}_{20 \text{ of these}} \qquad \underbrace{+1, \ldots, +1}_{18 \text{ of these}}$$
· which has
- mean $\mu = \frac{-2}{38} = -\frac{1}{19}$;
- mean square 1;
- SD $\sigma = \sqrt{1 - \left(\frac{1}{19}\right)^2} = \sqrt{\frac{360}{361}} \approx 0.9986.$
50/53
Exact answers
· It is possible to work out the exact probabilities (using the “binomial distribution”,
more on this later).

· These are

n = c(5, 25, 125, 625)
prob.win = 1 - pbinom(n/2, n, 18/38)  # S > 0 iff the no. of wins exceeds n/2
rbind(n, prob.win)

##               [,1]       [,2]        [,3]         [,4]
## n        5.0000000 25.0000000 125.0000000 625.00000000
## prob.win 0.4507489  0.3951246   0.2775865   0.09388094

51/53
Normal approximation
According to the Central Limit Theorem, for “large 𝑛 ”,

$$P(S > 0) = 1 - P(S \le 0) \approx 1 - \Phi\left(\frac{0 - \left(-\frac{n}{19}\right)}{\sqrt{\frac{360n}{361}}}\right) = 1 - \texttt{pnorm}\!\left(\frac{\sqrt{361n}}{19\sqrt{360}}\right)$$

This gives

1 - pnorm(sqrt(361 * n)/(19 * sqrt(360)))

## [1] 0.45309281 0.39607370 0.27784490 0.09381616

· These are quite good approximations (even for 𝑛 = 5 )!


· Makes sense, because the box is reasonably symmetric (not that different in
shape to Kerrich’s box).
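
· A direct simulation gives a third point of comparison (a sketch; 10,000 repetitions and $n = 25$ are arbitrary choices):

box = c(rep(-1, 20), rep(1, 18))   # the roulette box
winnings = replicate(10000, sum(sample(box, size = 25, replace = T)))
mean(winnings > 0)                 # should be close to the exact value 0.3951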

52/53
Final comments
· When we take a random sample of size 𝑛 (with replacement) from a box with
mean 𝜇 and SD 𝜎 , the box of all possible sums
- has mean equal to 𝐸(𝑆) = 𝑛𝜇 ;
- has SD equal to 𝑆𝐸(𝑆) = 𝜎√𝑛 ;
- is (approx.) normal-shaped for “large enough 𝑛 ”.

· For such 𝑛 we can approximate probabilities for the random sum 𝑆 or average
𝑋¯ = 𝑆/𝑛, using pnorm() .
· How large is "large enough $n$"? It depends on how "non-normal" the original box is.
· If the original box is
- reasonably symmetric (without too many outliers), 𝑛 = 5 or 10 may do;
- very skewed, we may need 𝑛 > 100 before the box of all possible sums has
a nice, symmetric normal shape.
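
· For example (our own illustration, not from the slides): the chance that the average of $n = 100$ die rolls is at most 3.4 is approximately

mu = 3.5
sigma = sqrt(35/12)
pnorm(3.4, mean = mu, sd = sigma/sqrt(100))  # approximately 0.279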
53/53
