Sampling Distribution of The Sample Proportion
In the same way that we were able to find a sampling distribution for the
sample mean (SDSM), we can find a sampling distribution for the sample
proportion (SDSP).
For example, maybe we want to know what proportion of the students in our
school have brown hair. If 5,000 students attend our school, it might not be
practical to survey everybody. So instead we could take a random sample of
100 students and count how many of them have brown hair. Dividing that
count by the sample size gives the sample proportion, the proportion of
students in the sample with brown hair:

$$\hat{p} = \frac{x}{n}$$

where $x$ is the number of students in the sample with brown hair and $n$ is
the sample size.
Just like for the SDSM, the sampling distribution of the sample proportion
(SDSP) is created when we take every possible sample from our
population, calculate the sample proportion for each sample, and then plot
all of those sample proportions into a probability distribution.
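
Listing every possible sample is rarely feasible for a population of 5,000, so the idea can be approximated by drawing many random samples instead. The following is a minimal sketch under stated assumptions: the count of 2,000 brown-haired students, the 2,000 repetitions, and the helper name `sample_proportion` are all hypothetical choices made for illustration, not values from the text.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

N = 5000          # population size from the school example
n = 100           # sample size from the school example
num_brown = 2000  # ASSUMPTION: hypothetical number of students with brown hair
true_p = num_brown / N

# Build the population: 1 = brown hair, 0 = any other hair color.
population = [1] * num_brown + [0] * (N - num_brown)


def sample_proportion(pop, size):
    """Draw one simple random sample (without replacement) and return x / n."""
    sample = random.sample(pop, size)
    return sum(sample) / size


# Approximate the SDSP with many repeated samples rather than every possible one.
proportions = [sample_proportion(population, n) for _ in range(2000)]

print("population proportion:", true_p)
print("mean of sample proportions:", round(statistics.mean(proportions), 3))
print("std dev of sample proportions:", round(statistics.pstdev(proportions), 3))
```

A histogram of `proportions` would give a rough picture of the SDSP, and its mean should land close to the population proportion.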
Even though each observation is simply a success or a failure, so that the
number of successes in a sample follows a binomial distribution, we can still
use the Central Limit Theorem (CLT) to describe the sampling distribution of
the sample proportion. Just like for the SDSM, the CLT tells us that the SDSP
is approximately normal when the sample is large enough, and we use a sample
size of at least n = 30 as the threshold.
The mean of the SDSP is equal to the population proportion $p$:

$$\mu_{\hat{p}} = p$$
The standard deviation of the sampling distribution of the sample
proportion, $\sigma_{\hat{p}}$, also called the standard error of the
proportion, will be

$$SE = \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$

where $p$ is the population proportion and $n$ is the sample size. We use this
formula for standard error of the proportion if our population is infinite, or
if the population is finite but large in comparison to our sample size (if the
sample size is no more than 5% of the population, $n/N \le 0.05$).
Of course, based on this formula for standard error, we can say that the
variance of the sampling distribution of the sample proportion is

$$\sigma_{\hat{p}}^2 = \frac{p(1 - p)}{n}$$
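
When the population proportion $p$ is unknown, we estimate the standard error
and the variance by substituting the sample proportion $\hat{p}$ for $p$:

$$SE = \sigma_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

$$\sigma_{\hat{p}}^2 = \frac{\hat{p}(1 - \hat{p})}{n}$$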
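
The formulas above translate directly into a few lines of code. This is a small sketch, not a reference implementation: the function names and the example values (a sample of 100 from 5,000 students with a sample proportion of 0.40) are assumptions chosen for illustration.

```python
import math


def variance_of_proportion(p, n):
    """Variance of the SDSP: p(1 - p) / n."""
    return p * (1 - p) / n


def standard_error(p, n):
    """Standard error of the proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(variance_of_proportion(p, n))


def fpc_not_needed(n, N):
    """True when the sample is at most 5% of the population (n / N <= 0.05)."""
    return n / N <= 0.05


# ASSUMPTION: hypothetical values for illustration only.
p_hat, n, N = 0.40, 100, 5000

print("n/N <= 0.05:", fpc_not_needed(n, N))
print("variance:", variance_of_proportion(p_hat, n))
print("standard error:", standard_error(p_hat, n))
```

Passing the population proportion gives the exact standard error, while passing a sample proportion gives the estimated one, since both use the same formula.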
In the case where the population is finite and $n/N > 0.05$, we have to apply
the finite population correction factor (FPC), and in that case the correct
formula for the standard error of the proportion is

$$SE = \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} \sqrt{\frac{N - n}{N - 1}}$$
where p is the population proportion, n is the sample size, and N is the size
of the population. If we’re applying the FPC, then the formula for variance
of the SDSP is

$$\sigma_{\hat{p}}^2 = \left( \frac{p(1 - p)}{n} \right) \left( \frac{N - n}{N - 1} \right)$$
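
When the population proportion $p$ is unknown, we again substitute the sample
proportion $\hat{p}$ for $p$:

$$SE = \sigma_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \sqrt{\frac{N - n}{N - 1}}$$

$$\sigma_{\hat{p}}^2 = \left( \frac{\hat{p}(1 - \hat{p})}{n} \right) \left( \frac{N - n}{N - 1} \right)$$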
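
To see how much the correction matters, the corrected and uncorrected standard errors can be compared side by side. The sketch below assumes a hypothetical population of 1,000 with a sample of 100 (10% of the population, so the FPC applies); the function name `standard_error_fpc` is also an illustrative choice.

```python
import math


def standard_error_fpc(p, n, N):
    """Standard error of the proportion with the finite population correction."""
    base = math.sqrt(p * (1 - p) / n)    # uncorrected standard error
    fpc = math.sqrt((N - n) / (N - 1))   # finite population correction factor
    return base * fpc


# ASSUMPTION: hypothetical values where n / N = 0.10 > 0.05.
p, n, N = 0.40, 100, 1000

uncorrected = math.sqrt(p * (1 - p) / n)
corrected = standard_error_fpc(p, n, N)

print("uncorrected SE:", round(uncorrected, 4))
print("corrected SE:  ", round(corrected, 4))
```

The correction factor is at most 1, so the corrected standard error is never larger than the uncorrected one, and it shrinks to 0 when the sample is the whole population.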
Example
A group of 4 people have the following hair color: brown, brown, brown,
blonde. Find all possible random samples of size 2 if we’re sampling with
replacement. If we define brown hair as a “success,” then find the sample
proportion for every sample. Determine the probability distribution of the
sample proportion, the mean of the SDSP $\mu_{\hat{p}}$, and the standard
error $\sigma_{\hat{p}}$.
Let’s first determine the total number of possible samples, using $N^n$, given
$N = 4$ and $n = 2$.

$$N^n = 4^2 = 16$$
The complete sample space, and the proportion for each sample, is
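
Sample            Sample proportion
brown, brown      1
brown, brown      1
brown, brown      1
brown, brown      1
brown, brown      1
brown, brown      1
brown, brown      1
brown, brown      1
brown, brown      1
brown, blonde     1/2
brown, blonde     1/2
brown, blonde     1/2
blonde, brown     1/2
blonde, brown     1/2
blonde, brown     1/2
blonde, blonde    0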
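
Grouping the 16 samples by their sample proportion gives the probability
distribution of the sample proportion:

Sample proportion $\hat{p}_i$    $P(\hat{p}_i)$
0                                1/16
1/2                              6/16
1                                9/16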
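
The sample space and the probability distribution can be checked by enumerating the ordered samples directly. This is a minimal sketch using exact fractions; the variable and function names are illustrative, and note that Python reduces 6/16 to 3/8 when printing.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = ["brown", "brown", "brown", "blonde"]
n = 2  # sample size

# Sampling with replacement, keeping order: N**n = 4**2 = 16 ordered samples.
samples = list(product(population, repeat=n))


def proportion_brown(sample):
    """Sample proportion of successes (brown hair) as an exact fraction."""
    successes = sum(color == "brown" for color in sample)
    return Fraction(successes, len(sample))


# Count how many samples produce each sample proportion.
counts = Counter(proportion_brown(s) for s in samples)

# Convert counts to probabilities; every ordered sample is equally likely.
distribution = {
    p_hat: Fraction(count, len(samples))
    for p_hat, count in sorted(counts.items())
}

print("number of samples:", len(samples))
for p_hat, prob in distribution.items():
    print("p-hat =", p_hat, " probability =", prob)
```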
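Now we can calculate the mean of the sampling distribution of the sample
proportion, $\mu_{\hat{p}}$, where $\hat{p}_i$ is a given sample proportion,
$P(\hat{p}_i)$ is the probability of that particular sample proportion
occurring, and the sum runs over the $k$ distinct values of the sample
proportion (here $k = 3$).

$$\mu_{\hat{p}} = \sum_{i=1}^{k} \hat{p}_i P(\hat{p}_i)$$

$$\mu_{\hat{p}} = 0\left(\frac{1}{16}\right) + \frac{1}{2}\left(\frac{6}{16}\right) + 1\left(\frac{9}{16}\right)$$

$$\mu_{\hat{p}} = \frac{3}{16} + \frac{9}{16}$$

$$\mu_{\hat{p}} = \frac{12}{16}$$

$$\mu_{\hat{p}} = \frac{3}{4}$$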
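The mean of the SDSP matches the population proportion: 3 of the 4 people have
brown hair, so $\mu_{\hat{p}} = p = 3/4$. The variance of the SDSP, summing
over the $k = 3$ distinct values of the sample proportion, would be

$$\sigma_{\hat{p}}^2 = \sum_{i=1}^{k} (\hat{p}_i - p)^2 P(\hat{p}_i)$$

$$\sigma_{\hat{p}}^2 = \left(0 - \frac{3}{4}\right)^2 \left(\frac{1}{16}\right) + \left(\frac{1}{2} - \frac{3}{4}\right)^2 \left(\frac{6}{16}\right) + \left(1 - \frac{3}{4}\right)^2 \left(\frac{9}{16}\right)$$

$$\sigma_{\hat{p}}^2 = \left(-\frac{3}{4}\right)^2 \left(\frac{1}{16}\right) + \left(-\frac{1}{4}\right)^2 \left(\frac{6}{16}\right) + \left(\frac{1}{4}\right)^2 \left(\frac{9}{16}\right)$$

$$\sigma_{\hat{p}}^2 = \frac{9}{16}\left(\frac{1}{16}\right) + \frac{1}{16}\left(\frac{6}{16}\right) + \frac{1}{16}\left(\frac{9}{16}\right)$$

$$\sigma_{\hat{p}}^2 = \frac{9}{256} + \frac{6}{256} + \frac{9}{256}$$

$$\sigma_{\hat{p}}^2 = \frac{24}{256}$$

$$\sigma_{\hat{p}}^2 = \frac{3}{32}$$

Taking the square root of the variance gives the standard error.

$$\sigma_{\hat{p}} = \sqrt{\frac{3}{32}}$$

$$\sigma_{\hat{p}} = \frac{\sqrt{3}}{4\sqrt{2}}$$

$$\sigma_{\hat{p}} = \frac{\sqrt{6}}{8}$$

$$\sigma_{\hat{p}} \approx 0.31$$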
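
As a check on the arithmetic, the mean, variance, and standard error can be recomputed from the probability distribution and compared with the formula $p(1 - p)/n$. This is a short sketch with exact fractions; the variable names are illustrative, and the distribution is typed in from the example rather than derived.

```python
import math
from fractions import Fraction

# Probability distribution of the sample proportion from the example:
# keys are sample proportions, values are their probabilities.
distribution = {
    Fraction(0): Fraction(1, 16),
    Fraction(1, 2): Fraction(6, 16),
    Fraction(1): Fraction(9, 16),
}

# Mean of the SDSP: sum of each sample proportion times its probability.
mean = sum(p_hat * prob for p_hat, prob in distribution.items())

# Variance of the SDSP: probability-weighted squared deviations from the mean.
variance = sum((p_hat - mean) ** 2 * prob for p_hat, prob in distribution.items())

# Standard error: square root of the variance.
std_error = math.sqrt(variance)

# Compare with the formula p(1 - p) / n for p = 3/4 and n = 2.
p, n = Fraction(3, 4), 2
formula_variance = p * (1 - p) / n

print("mean:", mean)
print("variance:", variance)
print("formula variance:", formula_variance)
print("standard error:", round(std_error, 2))
```

Because the samples were drawn with replacement, the enumerated variance agrees with the uncorrected formula and no finite population correction is involved.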