4.4 Sampling Distribution Models and The Central Limit Theorem
Transition from Data Analysis and
Probability to Statistics
Sampling Distributions
Population parameter: a numerical descriptive
measure of a population.
(for example: μ, σ, p (a population proportion);
the numerical value of a population parameter
is usually not known)
Example: mean height of all NCSU students
p=proportion of Raleigh residents who favor
stricter gun control laws
Sample statistic: a numerical descriptive
measure calculated from sample data.
(e.g., x̄, s, p̂ (sample proportion))
Parameters; Statistics
In real life parameters of populations are
unknown and unknowable.
– For example, the mean height of US adult
(18+) men is unknown and unknowable
Rather than investigating the whole
population, we take a sample, calculate a
statistic related to the parameter of
interest, and make an inference.
The sampling distribution of the statistic
is the tool that tells us how close the value
of the statistic is to the unknown value of
the parameter.
DEF: Sampling Distribution
The sampling distribution of a sample
statistic calculated from a sample of n
measurements is the probability
distribution of values taken by the
statistic in all possible samples of size n
taken from the same population.
Based on all possible samples of size n
In some cases the sampling distribution can
be determined exactly.
In other cases it must be approximated by
using a computer to draw some of the
possible samples of size n and drawing a
histogram.
Pop size = 5, n = 2; # of possible samples: 5^2 = 25
Pop size = 6, n = 8; # of possible samples: 6^8 = 1,679,616
Pop size = 500,000, n = 10; # of possible samples: 500,000^10
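The counts above can be checked with a short sketch. It assumes, as the 5^2 = 25 figure suggests, that samples are ordered and drawn with replacement; the function name `count_samples` and the unit labels are illustrative only.

```python
from itertools import product

def count_samples(pop_size: int, n: int) -> int:
    """Number of ordered samples of size n drawn with replacement."""
    return pop_size ** n

# Brute-force check on a small hypothetical population of 5 labeled units.
population = ["A", "B", "C", "D", "E"]
all_samples = list(product(population, repeat=2))

print(count_samples(5, 2), len(all_samples))   # both counts agree
print(count_samples(6, 8))
print(count_samples(500_000, 10))              # far too many to enumerate
```

The last count shows why the sampling distribution usually has to be approximated from a subset of the possible samples rather than listed in full.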
Sampling distribution of p̂, the
sample proportion; an example
If a coin is fair, the probability of a head on
any toss of the coin is p = 0.5.
Imagine tossing this fair coin 5 times and
calculating the proportion p̂ of the 5 tosses
that result in heads (note that p̂ = x/5, where x
is the number of heads in 5 tosses).
Objective: determine the sampling
distribution of p̂, the proportion of heads in 5
tosses of a fair coin.
Sampling distribution of p̂ (cont.)
Step 1: The possible values of p̂ are 0/5 = 0,
1/5 = .2, 2/5 = .4, 3/5 = .6, 4/5 = .8, 5/5 = 1
Binomial probabilities p(x) for n = 5, p = 0.5:
x      0        1        2       3       4        5
p(x)   0.03125  0.15625  0.3125  0.3125  0.15625  0.03125
Since p̂ = x/5, each value of p̂ has the probability of the corresponding x:
p̂      0        .2       .4      .6      .8       1
P(p̂)   .03125   .15625   .3125   .3125   .15625   .03125
The above table is the probability distribution of
p̂, the proportion of heads in 5 tosses of a fair
coin.
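As a minimal sketch, the same probabilities can be computed from the binomial formula. The function name `sampling_dist_phat` is an illustrative choice, not something from the slides.

```python
from math import comb

def sampling_dist_phat(n: int, p: float) -> dict[float, float]:
    """Map each possible value of p-hat = x/n to its binomial probability."""
    return {x / n: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

dist = sampling_dist_phat(5, 0.5)
for phat, prob in dist.items():
    print(f"{phat:.1f}  {prob:.5f}")
```

Each row pairs a possible value of p̂ with P(p̂); the six probabilities sum to 1.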
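Sampling distribution of p̂ (cont.)
p̂      0        .2       .4      .6      .8       1
P(p̂)   .03125   .15625   .3125   .3125   .15625   .03125
\[ SD(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(.5)(.5)}{5}} = \sqrt{.05} \approx .224 \]
[Figure: histogram of p̂ values; horizontal axis "phat", vertical axis frequency from 0 to 300]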
The sampling distribution model for a
sample proportion p̂
Provided that the sampled values are independent and the
sample size n is large enough, the sampling distribution of
p̂ is modeled by a normal distribution with E(p̂) = p and
standard deviation SD(p̂) = √(pq/n); that is,
\[ \hat{p} \sim N\left(p, \sqrt{\frac{pq}{n}}\right) \]
where q = 1 - p, and "n large enough" means np ≥ 10
and nq ≥ 10.
The Central Limit Theorem will be a formal statement of
this fact.
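The condition and the model parameters can be sketched as a small helper. The function name `normal_model_for_phat` and the n = 100 case are illustrative assumptions; the n = 5 case is the coin example from earlier.

```python
from math import sqrt

def normal_model_for_phat(p: float, n: int, threshold: float = 10.0):
    """Return (mean, sd) of the normal model for p-hat, or None when the
    condition np >= 10 and nq >= 10 is not met."""
    q = 1 - p
    if n * p < threshold or n * q < threshold:
        return None
    return p, sqrt(p * q / n)

print(normal_model_for_phat(0.5, 5))     # None: np = 2.5 is below 10
print(normal_model_for_phat(0.5, 100))   # mean 0.5, sd about 0.05
```

With only 5 tosses the normal model is not appropriate, which is why the exact binomial table was used for the coin example.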
Example: binge drinking by
college students
Study by Harvard School of Public Health:
44% of college students binge drink.
244 college students surveyed; 36% admitted
to binge drinking in the past week
Assume the 44% given in the study; compute
the probability that in a sample of 244
students, 36% or less have engaged in binge
drinking.
Example: binge drinking by
college students (cont.)
Let p̂ be the proportion in a sample of 244
that engage in binge drinking.
We want to compute P(p̂ ≤ .36).
\[ E(\hat{p}) = p = .44; \qquad SD(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(.44)(.56)}{244}} \approx .032 \]
Since np = 244*.44 = 107.36 and nq =
244*.56 = 136.64 are both greater than 10,
we can model the sampling distribution of p̂
with a normal distribution, so …
Example: binge drinking by
college students (cont.)
\[ \hat{p} \sim N(.44, .032) \]
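One way to carry the computation through numerically is sketched below, using the standard library's `statistics.NormalDist` for the normal CDF. The variable names are illustrative. Note that the code uses the unrounded standard deviation, so its z-score differs slightly from the value -2.5 obtained with SD rounded to .032.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.44, 244
sd = sqrt(p * (1 - p) / n)            # about 0.032
model = NormalDist(mu=p, sigma=sd)    # normal model for p-hat

z = (0.36 - p) / sd                   # standardized value of 0.36
prob = model.cdf(0.36)                # P(p-hat <= 0.36) under the model
print(f"SD = {sd:.3f}, z = {z:.2f}, P = {prob:.4f}")
```

The resulting probability is well under 1%, so a sample proportion of 36% or less would be unusual if the true proportion were 44%.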
P(4 ≤ x̄ ≤ 6) = p(4) + p(4.5) + p(5) + p(5.5) + p(6)
= 12/196 + 18/196 + 24/196 + 26/196 + 28/196 = 108/196
Expected Value and Standard
Deviation of the Sampling
Distribution of x̄
Example (cont.)
Population probability dist.
x      2     3     4     5     6     7     8
p(x)   1/14  1/14  2/14  2/14  2/14  3/14  3/14
E(X) = 2(1/14) + 3(1/14) + 4(2/14) + … + 8(3/14) = 5.714
Population mean E(X) = μ = 5.714
Sampling dist. of x̄ (n = 2)
x̄      2      2.5    3      3.5    4       4.5     5       5.5     6       6.5     7       7.5     8
p(x̄)   1/196  2/196  5/196  8/196  12/196  18/196  24/196  26/196  28/196  24/196  21/196  18/196  9/196
E(X̄) = 2(1/196) + 2.5(2/196) + 3(5/196) + 3.5(8/196) + 4(12/196) + 4.5(18/196) + 5(24/196)
+ 5.5(26/196) + 6(28/196) + 6.5(24/196) + 7(21/196) + 7.5(18/196) + 8(9/196) = 5.714
Note that
\[ E(\bar{X}) = E(X) = 5.714 \]
and
\[ Var(\bar{X}) = \frac{Var(X)}{n} = \frac{3.4898}{2} = 1.7449 \]
(Recall that n = 2)
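A minimal sketch of how the sampling distribution of x̄ can be derived by enumeration, assuming the two observations are drawn independently from the population distribution. Exact fractions avoid rounding; the helper name `mean_and_var` is illustrative. The enumeration assigns x̄ = 8 the probability (3/14)(3/14) = 9/196, which is the value that makes the 13 probabilities sum to 1.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Population distribution from the example: value -> probability.
population = {2: Fraction(1, 14), 3: Fraction(1, 14), 4: Fraction(2, 14),
              5: Fraction(2, 14), 6: Fraction(2, 14), 7: Fraction(3, 14),
              8: Fraction(3, 14)}

def mean_and_var(dist):
    """Expected value and variance of a discrete distribution."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

# Enumerate all ordered samples of size 2 and accumulate P(x-bar).
xbar_dist = defaultdict(Fraction)
for (x1, p1), (x2, p2) in product(population.items(), repeat=2):
    xbar_dist[Fraction(x1 + x2, 2)] += p1 * p2

mu, var = mean_and_var(population)
mu_xbar, var_xbar = mean_and_var(xbar_dist)
print(float(mu), float(mu_xbar))      # both about 5.714
print(float(var), float(var_xbar))    # about 3.4898 and 1.7449
print(xbar_dist[Fraction(8)])         # 9/196
```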
Sampling Distribution of the
Sample Mean X̄: Example
An example
– A die is thrown infinitely many times. Let X
represent the number of spots showing on
any throw.
– The probability distribution of X is
x      1    2    3    4    5    6
p(x)   1/6  1/6  1/6  1/6  1/6  1/6
E(X) = 1(1/6) + 2(1/6) + 3(1/6) + … = 3.5
V(X) = (1 - 3.5)²(1/6) + (2 - 3.5)²(1/6) + … = 2.92
Suppose we want to estimate μ from the mean x̄ of a
sample of size n = 2.
What is the sampling distribution of x̄ in this
situation?
Sample     Mean  |  Sample     Mean  |  Sample     Mean
 1  1,1    1.0   |  13  3,1    2.0   |  25  5,1    3.0
 2  1,2    1.5   |  14  3,2    2.5   |  26  5,2    3.5
 3  1,3    2.0   |  15  3,3    3.0   |  27  5,3    4.0
 4  1,4    2.5   |  16  3,4    3.5   |  28  5,4    4.5
 5  1,5    3.0   |  17  3,5    4.0   |  29  5,5    5.0
 6  1,6    3.5   |  18  3,6    4.5   |  30  5,6    5.5
 7  2,1    1.5   |  19  4,1    2.5   |  31  6,1    3.5
 8  2,2    2.0   |  20  4,2    3.0   |  32  6,2    4.0
 9  2,3    2.5   |  21  4,3    3.5   |  33  6,3    4.5
10  2,4    3.0   |  22  4,4    4.0   |  34  6,4    5.0
11  2,5    3.5   |  23  4,5    4.5   |  35  6,5    5.5
12  2,6    4.0   |  24  4,6    5.0   |  36  6,6    6.0
Note: E(X̄) = E(X) and Var(X̄) = Var(X)/2
E(x̄) = 1.0(1/36) + 1.5(2/36) + … = 3.5
V(x̄) = (1.0 - 3.5)²(1/36) + (1.5 - 3.5)²(2/36) + … = 1.46
[Figure: histogram of the sampling distribution of x̄ for n = 2; horizontal axis x̄ from 1 to 6.0 in steps of 0.5, vertical axis probability from 1/36 to 6/36]
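The 36 equally likely samples can be enumerated directly. This is a sketch under the assumption of a fair die and two independent throws; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)
samples = list(product(faces, repeat=2))      # 36 equally likely samples
counts = Counter(Fraction(a + b, 2) for a, b in samples)

# Sampling distribution of x-bar: each sample has probability 1/36.
xbar_dist = {xbar: Fraction(c, len(samples)) for xbar, c in sorted(counts.items())}
e_xbar = sum(x * p for x, p in xbar_dist.items())
v_xbar = sum((x - e_xbar) ** 2 * p for x, p in xbar_dist.items())

for xbar, p in xbar_dist.items():
    print(float(xbar), p)                     # fractions print in lowest terms
print(float(e_xbar), float(v_xbar))           # 3.5 and about 1.46
```

The probabilities rise from 1/36 at x̄ = 1 to 6/36 at x̄ = 3.5 and fall back symmetrically, matching the triangular shape of the histogram.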
n = 5:   E(X̄) = 3.5,  Var(X̄) = Var(X)/5 = .5833
n = 10:  E(X̄) = 3.5,  Var(X̄) = Var(X)/10 = .2917
n = 25:  E(X̄) = 3.5,  Var(X̄) = Var(X)/25 = .1167
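For these larger n, enumerating all 6^n samples quickly becomes impractical, so a simulation is a reasonable check. The sketch below is a Monte Carlo approximation, not an exact calculation; the seed, the number of repetitions, and the function name `simulate_xbar` are arbitrary illustrative choices.

```python
import random
from statistics import fmean

def simulate_xbar(n: int, reps: int, rng: random.Random) -> list[float]:
    """Draw `reps` samples of n die throws and return their sample means."""
    return [fmean(rng.randint(1, 6) for _ in range(n)) for _ in range(reps)]

rng = random.Random(0)     # fixed seed (arbitrary) so runs are repeatable
var_x = 35 / 12            # exact Var(X) for one fair die, about 2.9167
results = {}
for n in (5, 10, 25):
    means = simulate_xbar(n, 20_000, rng)
    m = fmean(means)
    v = fmean((x - m) ** 2 for x in means)   # empirical variance of x-bar
    results[n] = (m, v)
    print(n, round(m, 3), round(v, 4), round(var_x / n, 4))
```

Each printed row shows n, the simulated mean and variance of x̄, and the theoretical Var(X)/n for comparison; the simulated values should land close to, but not exactly on, the theoretical ones.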
Population: 1, 2, 3
Let us take samples of two observations.
Sample means: 1.5, 2, 2.5
Compare the variability of the population
to the variability of the sample mean.
Also,
Expected value of the population = (1 + 2 + 3)/3 = 2
Expected value of the sample mean = (1.5 + 2 + 2.5)/3 = 2
Properties of the Sampling
Distribution of x̄
1. \[ E(\bar{x}) = \mu \]
(the expected value of the sampling distribution
of x̄ = the expected value of the sampled population)
2. \[ SD(\bar{x}) = \frac{SD(x)}{\sqrt{n}} = \frac{\sigma}{\sqrt{n}} \]
where σ is the standard deviation of the
population from which the sample is taken and n is
the sample size.
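Both properties follow from a short derivation, sketched here under the assumption that the observations X_1, …, X_n are independent draws from a population with mean μ and standard deviation σ (the subscripted X_i notation is introduced only for this sketch):

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\,(n\mu) = \mu \\
Var(\bar{X}) &= Var\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n^{2}}\sum_{i=1}^{n} Var(X_i)
            = \frac{1}{n^{2}}\,(n\sigma^{2}) = \frac{\sigma^{2}}{n} \\
SD(\bar{X}) &= \sqrt{Var(\bar{X})} = \frac{\sigma}{\sqrt{n}}
\end{align*}
```

The second line uses independence: the variance of a sum equals the sum of the variances only when the X_i are independent (or at least uncorrelated).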
[Figure: three panels labeled Unbiased, Confidence, and Precision, each with µ marked on the axis]