Topic 7 Inference For The Proportion (Student)
Topic 7
Inference for the Proportion
Reference
Levine, D.M., Szabat, K.A. and Stephan, D.F. Business Statistics: A First Course, Pearson
Education Ltd, Chapters 7, 8 and 9
Outline
2024 Presidential Election
Right after President Biden announced that he would end his re-
election bid, Kamala Harris immediately declared that she would seek
the nomination in the president’s place
To gain insight into an election or a referendum campaign, CNN and
other media outlets conducted pre-election polls regularly from early 2023
The poll results of October 21, 2024 showed very close support
for the two candidates: 46% for Donald Trump
compared with 48% for Kamala Harris
We do not know the true decision of the US population until the result
of the election is announced
In practice, most of these surveys have limited predictive power for
the final result because
The sample involved is biased and does not truly represent the population
Some voters did not honestly reveal their voting intention, or they
changed their mind after the survey
Sample Proportion
Sample Proportion
Cont’d
Sampling Distribution of Sample
Proportion
A small enterprise has 4 staff, $N = 4$ (3 males: A, B, D; 1 female: C)
Variable of interest: Gender
Let $Y$ = no. of male staff in the sample, $\pi$ = proportion of male staff = 0.75
Random samples of size 2 with replacement are taken ($n = 2$)
$Y$ follows a binomial distribution: $Y \sim B(2, 0.75)$
$\mu = n\pi = 2 \times 0.75 = 1.5$
$\sigma = \sqrt{n\pi(1 - \pi)} = \sqrt{2 \times 0.75 \times 0.25} = 0.6124$
Sampling Distribution of Sample
Proportion Cont’d
Sample proportion of male staff, $p = \frac{Y}{n}$
16 possible sample proportions

Respondent   A (M)       B (M)       C (F)       D (M)
A (M)        2/2 = 1     2/2 = 1     1/2 = 0.5   2/2 = 1
B (M)        2/2 = 1     2/2 = 1     1/2 = 0.5   2/2 = 1
C (F)        1/2 = 0.5   1/2 = 0.5   0/2 = 0     1/2 = 0.5
D (M)        2/2 = 1     2/2 = 1     1/2 = 0.5   2/2 = 1

Probability distribution of $p$

$p$      0      0.5    1
$P(p)$   1/16   6/16   9/16
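The 16 sample proportions and their probabilities can be checked by brute-force enumeration. The following is a minimal Python sketch; the `staff` dictionary and the variable names are illustrative assumptions, not part of the original slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Staff labels and genders as in the example (A, B, D male; C female).
staff = {"A": "M", "B": "M", "C": "F", "D": "M"}
n = 2  # sample size, drawn with replacement

# All 4 x 4 = 16 equally likely ordered samples of size 2.
samples = list(product(staff, repeat=n))

# Count how often each sample proportion of males occurs.
counts = Counter(
    Fraction(sum(staff[s] == "M" for s in sample), n) for sample in samples
)

# Probability distribution of p (each sample has probability 1/16).
dist = {p: Fraction(c, len(samples)) for p, c in sorted(counts.items())}
for p, prob in dist.items():
    print(f"p = {float(p):.1f}  P(p) = {prob}")
```

Note that `Fraction` prints probabilities in lowest terms, so 6/16 appears as 3/8.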
Sampling Distribution of Sample
Proportion Cont’d
$\sigma_p = \sqrt{\sum (p_i - \mu_p)^2 P(p_i)}$
$= \sqrt{(0 - 0.75)^2 \frac{1}{16} + (0.5 - 0.75)^2 \frac{6}{16} + (1 - 0.75)^2 \frac{9}{16}}$
$= 0.3062$
$= \sqrt{\frac{\pi(1-\pi)}{n}} = \frac{\sqrt{n\pi(1-\pi)}}{n}$
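Since $\mu_p = \sum p_i P(p_i) = 0 \times \frac{1}{16} + 0.5 \times \frac{6}{16} + 1 \times \frac{9}{16} = 0.75 = \pi$,
we say the sample proportion $p$ is an unbiased estimator of the
population proportion $\pi$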
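As a quick numerical check, the mean and standard deviation of $p$ can be computed directly from its probability distribution and compared with $\sqrt{\pi(1-\pi)/n}$. This is only a sketch; the variable names are assumptions made for illustration.

```python
from math import isclose, sqrt

# Probability distribution of p from the 16 equally likely samples.
dist = {0.0: 1 / 16, 0.5: 6 / 16, 1.0: 9 / 16}
pi, n = 0.75, 2

# Mean and standard deviation computed from the definition.
mu_p = sum(p * prob for p, prob in dist.items())
sigma_p = sqrt(sum((p - mu_p) ** 2 * prob for p, prob in dist.items()))

print(f"mu_p = {mu_p:.4f}, sigma_p = {sigma_p:.4f}")
print(f"sqrt(pi(1-pi)/n) = {sqrt(pi * (1 - pi) / n):.4f}")
```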
Sampling Distribution of Sample
Proportion Cont’d
Since the sampling distribution of $p$ ($= \frac{Y}{n} = \frac{\sum X_i}{n}$) has
mean $\pi$ and standard deviation $\sqrt{\frac{\pi(1-\pi)}{n}}$, then by the Central
Limit Theorem, the sampling distribution of $p$ is approximately
normal with mean $\pi$ and
standard deviation $\sqrt{\frac{\pi(1-\pi)}{n}}$ for large $n$
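A small simulation can make this approximation concrete. The sketch below uses illustrative values ($\pi = 0.4$, $n = 200$, 5000 replications) and a fixed seed, all of which are assumptions chosen for the demonstration. It compares the simulated mean and standard deviation of $p$ with the theoretical values.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

pi, n, reps = 0.4, 200, 5000  # illustrative values, not from the slides

# Each replication draws n Bernoulli(pi) trials and records the sample proportion.
props = [sum(random.random() < pi for _ in range(n)) / n for _ in range(reps)]

theory_sd = sqrt(pi * (1 - pi) / n)
# Share of simulated proportions within 2 standard deviations of pi.
within_2sd = sum(abs(p - pi) <= 2 * theory_sd for p in props) / reps

print(f"simulated mean = {mean(props):.4f} (theory {pi})")
print(f"simulated sd   = {pstdev(props):.4f} (theory {theory_sd:.4f})")
print(f"share within 2 sd of pi = {within_2sd:.3f}")
```

For a normal distribution roughly 95% of values fall within 2 standard deviations of the mean, so a share close to that figure is consistent with the approximation.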
Sampling Distribution of Sample
Proportion Cont’d
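[Figure: the sampling distribution of $p$ (mean $\mu_p$, standard deviation $\sigma_p$) is standardized to the standard normal distribution of $Z$ (mean $\mu_Z = 0$, standard deviation $\sigma_Z = 1$)]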
Standardizing Sampling Distribution
of Proportion – Example
Suppose that the manager of the local bank determines
that 40% of all depositors have multiple accounts at the
bank
If you select a random sample of 200 depositors, what is
the probability that the sample proportion of depositors
with multiple accounts is less than 0.3?
Standardizing Sampling Distribution
of Proportion – Example Cont’d
$P(p < 0.3) = P\left(Z < \frac{0.3 - 0.4}{\sqrt{\frac{0.4(1 - 0.4)}{200}}}\right)$
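The standardization can be evaluated numerically. The sketch below uses Python's standard-library `statistics.NormalDist` in place of a printed Z table; the variable names are illustrative. It gives $Z \approx -2.89$, so the probability is roughly 0.002.

```python
from math import sqrt
from statistics import NormalDist

pi = 0.40   # population proportion of depositors with multiple accounts
n = 200     # sample size
p = 0.30    # sample proportion of interest

sigma_p = sqrt(pi * (1 - pi) / n)   # standard deviation of p
z = (p - pi) / sigma_p              # standardized value
prob = NormalDist().cdf(z)          # P(Z < z)

print(f"sigma_p = {sigma_p:.4f}, Z = {z:.2f}, P(p < 0.3) = {prob:.4f}")
```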
Confidence Interval Estimate for the
Proportion
Since the population proportion, $\pi$, is unknown, the
standard deviation of $p$ can be estimated by the sample
standard deviation $S_p$
$S_p = \sqrt{\frac{p(1 - p)}{n}}$
Hence, $Z = \frac{p - \pi}{S_p} \sim N(0,1)$ approximately, for large $n$
As the population proportion $\pi$ is unknown, we may
verify the “large enough” condition by $np$ and $n(1 - p)$
Confidence Interval Estimate for the
Proportion Cont’d
Conditions
The no. of successes, $Y$, follows a Binomial distribution
Normal approximation can be used
  $n \geq 30$
  $np \geq 5$
  $n(1 - p) \geq 5$
Confidence Interval Estimate for the
Proportion – Example Cont’d
For these data, $p = \frac{95}{200} = 0.475$
As $n = 200 > 30$, $np = 95 > 5$ and $n(1 - p) = 105 > 5$, the normal approximation can be used
Special considerations
If $p - Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} < 0$, we have to replace the lower bound
of the confidence interval by 0
If $p + Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} > 1$, we have to replace the upper bound
of the confidence interval by 1
But why?
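One way to see the point of these special considerations: a proportion can never lie outside $[0, 1]$, so any part of the interval beyond those limits carries no information. The sketch below computes the interval $p \pm Z_{\alpha/2}\sqrt{p(1-p)/n}$ and clips it. The function name `proportion_ci`, the 95% confidence level, and the second data set (1 success in 40) are assumptions made for illustration; only the 95-out-of-200 figures come from the example.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion, clipped to [0, 1]."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_{alpha/2}
    margin = z * sqrt(p * (1 - p) / n)
    # A proportion cannot be below 0 or above 1, so clip the bounds.
    return max(0.0, p - margin), min(1.0, p + margin)

# Data from the example; the 95% confidence level is an assumption.
lower, upper = proportion_ci(95, 200)
# Hypothetical data where the unclipped lower bound would be negative.
low_small, up_small = proportion_ci(1, 40)

print(f"95/200: ({lower:.4f}, {upper:.4f})")
print(f"1/40:   ({low_small:.4f}, {up_small:.4f})")
```

The hypothetical 1-in-40 sample also violates the $np \geq 5$ condition, which is typically when a bound falls outside $[0, 1]$ in the first place.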
Factors Affecting Interval Width
(Precision)
Level of confidence, $(1 - \alpha)$
  $1 - \alpha$ ↑ → $|Z|$-value ↑ → width of interval ↑
Sample size, $n$
  $n$ ↑ → $\sigma_p$ ↓ → width of interval ↓
Sample proportion, $p$
Intervals extend from $p - Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$ to $p + Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$
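The direction of each effect can be illustrated numerically. In the sketch below, the helper `interval_width` and all input values are hypothetical, chosen only to show how the width $2 Z_{\alpha/2}\sqrt{p(1-p)/n}$ responds to each factor.

```python
from math import sqrt
from statistics import NormalDist

def interval_width(p, n, confidence):
    """Width of the normal-approximation interval: 2 * Z * sqrt(p(1-p)/n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sqrt(p * (1 - p) / n)

# Hypothetical inputs chosen only to show the direction of each effect.
base = interval_width(0.5, 200, 0.95)
higher_conf = interval_width(0.5, 200, 0.99)  # higher confidence level
larger_n = interval_width(0.5, 800, 0.95)     # four times the sample size
extreme_p = interval_width(0.1, 200, 0.95)    # p further from 0.5

print(f"base width          = {base:.4f}")
print(f"99% confidence      = {higher_conf:.4f}")
print(f"n = 800             = {larger_n:.4f}")
print(f"p = 0.1             = {extreme_p:.4f}")
```

Because $p(1-p)$ is largest at $p = 0.5$, the interval is widest there and narrows as $p$ moves towards 0 or 1; quadrupling $n$ halves the width.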
Determining Sample Size for the
Proportion – Example Cont’d
$p = \frac{22}{10000} = 0.0022$
$n = \frac{Z_{\alpha/2}^2 \, \pi(1 - \pi)}{E^2}$
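The sample size formula can be sketched in Python as follows. The sampling error $E = 0.001$ and the 95% confidence level used with $p = 0.0022$ are hypothetical, since the values for this example are not shown here; the second call uses the conservative choice $\pi = 0.5$ with a hypothetical $E = 0.03$. The result is rounded up because a sample size must be a whole number.

```python
from math import ceil
from statistics import NormalDist

def sample_size(pi, error, confidence=0.95):
    """Required n = Z^2 * pi * (1 - pi) / E^2, rounded up to a whole number."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_{alpha/2}
    return ceil(z ** 2 * pi * (1 - pi) / error ** 2)

# p = 22/10000 used as the estimate of pi; E = 0.001 is a hypothetical choice.
n_prior = sample_size(22 / 10000, 0.001)
# Conservative choice pi = 0.5 when no prior estimate exists; E = 0.03 is hypothetical.
n_conservative = sample_size(0.5, 0.03)

print(f"n with prior estimate = {n_prior}")
print(f"n with pi = 0.5       = {n_conservative}")
```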
Determining Sample Size for the
Proportion Cont’d
Test of Hypothesis for the Proportion
Conditions
The no. of successes, $Y$, follows a Binomial distribution
Normal approximation can be used
  $n \geq 30$
  $np \geq 5$
  $n(1 - p) \geq 5$
Test statistic, $Z = \frac{p - \pi_0}{\sqrt{\frac{\pi_0(1 - \pi_0)}{n}}}$
Test of Hypothesis for the Proportion
– Exercise
Your bank has the business objective of serving 80% of
its customers within 5 minutes of the time the
customer enters the bank
Of 45 randomly selected customers, 39 are served
within 5 minutes of their arrival
Test the claim of the bank at the 5% level of significance
Test of Hypothesis for the Proportion
– Exercise Cont’d
[Figure: standard normal curve with two-tailed rejection regions, $\alpha/2 = 0.025$ in each tail, critical values $Z = -1.96$ and $Z = +1.96$]
Test of Hypothesis for the Proportion
– Exercise Cont’d
p-value = 0.2628
[Figure: standard normal curve marking the test statistic at $Z = \pm 1.118$, with the rejection regions ($\alpha/2 = 0.025$ in each tail) shown for comparison]
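The exercise can be reproduced with a short Python sketch, assuming the two-sided hypotheses $H_0: \pi = 0.8$ against $H_1: \pi \neq 0.8$ implied by the two-tailed rejection regions. Variable names are illustrative. The computed p-value is about 0.264, slightly different from the 0.2628 obtained when $Z$ is rounded to 1.12 for a table lookup.

```python
from math import sqrt
from statistics import NormalDist

successes, n = 39, 45   # customers served within 5 minutes, sample size
pi0 = 0.80              # hypothesized proportion under H0
alpha = 0.05            # level of significance

p = successes / n
# Conditions for the normal approximation, as listed on the slides.
conditions_ok = n >= 30 and n * p >= 5 and n * (1 - p) >= 5

z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)        # test statistic
critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-tailed p-value
reject = abs(z) > critical

print(f"p = {p:.4f}, Z = {z:.3f}, critical = {critical:.2f}")
print(f"p-value = {p_value:.4f}, reject H0: {reject}")
```

Since $|Z| = 1.118 < 1.96$ (equivalently, the p-value exceeds 0.05), the null hypothesis is not rejected at the 5% level.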
Do Voters Really Vote When They
Say They Do?
On November 8, 1994, a historic election took place in the US, in
which the Republican Party won control of both houses of
Congress for the first time since 1952
But how many people actually voted?
On November 28, 1994, Time magazine reported that in a
telephone poll of 800 adults taken during the two days
following the election, 56% reported that they had voted
But based on information from the Committee for the Study
of the American Electorate, in fact, only 39% of American
adults had voted
Could it be the case that the results of the poll simply
reflected a sample that, by chance, voted with greater
frequency than the general population?
Do Voters Really Vote When They
Say They Do? Cont’d
Let’s suppose that the truth about the population is that only
39% of American adults voted, i.e. $\pi = 39\% = 0.39$
In samples of 800 adults, the size used by the
Time magazine poll, we can expect the sample proportion to have mean 0.39 and
standard error 0.017, i.e. $\mu_p = \pi = 0.39$ and $\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} = 0.017$
According to the Empirical Rule, we are almost certain that
the sample proportion based on a sample of 800 adults
should fall within $3 \times 0.017 = 0.051$ of the truth of 0.39
In other words, if respondents were telling the truth, the
sample proportion should be no higher than 44.1%
(= 39% + 5.1%), nowhere near the reported percentage of 56%
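The arithmetic behind this argument can be checked with a few lines of Python. This is a sketch with illustrative variable names; it also reports how many standard errors the reported 56% lies above 39%, a figure not given on the slide.

```python
from math import sqrt

pi = 0.39        # proportion of American adults who actually voted
n = 800          # size of the Time magazine poll
reported = 0.56  # proportion who said they had voted

sigma_p = sqrt(pi * (1 - pi) / n)    # standard error of the sample proportion
upper = pi + 3 * sigma_p             # Empirical Rule upper limit
z = (reported - pi) / sigma_p        # distance of 56% from 39% in standard errors

print(f"sigma_p = {sigma_p:.4f}")
print(f"upper limit = {upper:.4f}")
print(f"reported value is {z:.1f} standard errors above pi")
```

The unrounded upper limit is about 0.442, matching the 44.1% obtained when the standard error is rounded to 0.017.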
Do Voters Really Vote When They
Say They Do? Cont’d