Topic 7 Inference For The Proportion (Student)

The document covers inference for proportions in business statistics, including sampling distributions, confidence intervals, sample size determination, and hypothesis testing. It discusses the context of the 2024 Presidential Election and the limitations of pre-election polls. Additionally, it explains the sampling distribution of sample proportions, including calculations and conditions for normal approximation.

CB2200 Business Statistics

Topic 7
Inference for the Proportion
Reference
Levine, D.M., Szabat, K.A. and Stephan, D.F. Business Statistics: A First Course, Pearson
Education Ltd, Chapters 7, 8 and 9

Outline

 Sampling Distribution of the Sample Proportion


 Confidence Interval Estimate for the Proportion
 Sample Size Determination for the Proportion
 Hypothesis Testing for the Proportion

2024 Presidential Election
 Right after President Biden announced that he would end his re-election
bid, Kamala Harris immediately declared that she would seek
the nomination in the president’s place
 To gain insights on an election or a referendum campaign, CNN and
other media have conducted pre-election polls regularly since early 2023
 The results of October 21st, 2024 revealed very close levels of support
for the two candidates, with support for Donald Trump at 46%
compared to 48% for Kamala Harris
 We do not know the true decision of the US population until the result
of the election is announced
 In practice, most of these surveys have limited predictive power for
the final result because
 The sample involved is biased and does not truly represent the population
 Some voters did not honestly reveal their voting intention, or they
changed their mind after the survey
Sample Proportion

 Let $Y$ be the number of observations belonging to one
of the two levels (e.g. success and failure, yes and no,
etc.) of a categorical variable in a random sample of $n$
observations
 The proportion of observations belonging to that level
(e.g. success, yes, etc.) in the sample,
$$p = \frac{Y}{n}$$
is called the sample proportion
Sample Proportion
Cont’d

 We saw in Topic 3 that $Y$ obeys a binomial distribution
with
$$P(Y = y) = \frac{n!}{y!\,(n-y)!}\,\pi^{y}(1-\pi)^{n-y}$$
where
$P(Y = y)$ = probability of observing $y$ events of interest,
where $y = 0, 1, 2, \ldots, n$
$\pi$ = probability of an event of interest, or the population
proportion of observations belonging to the level of
interest
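As a minimal sketch, the binomial formula above can be evaluated directly with Python's standard library. The function name `binomial_pmf` and the values $n = 2$, $\pi = 0.75$ are illustrative choices, not part of the slides' notation.

```python
from math import comb

def binomial_pmf(y: int, n: int, pi: float) -> float:
    """Return P(Y = y) for Y ~ B(n, pi) using n! / (y! (n-y)!) * pi^y * (1-pi)^(n-y)."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)

# Illustrative values only: n = 2 observations, pi = 0.75
probs = [binomial_pmf(y, 2, 0.75) for y in range(3)]
print(probs)  # probabilities for y = 0, 1, 2
```

The three probabilities sum to 1, as any probability distribution must.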
Sampling Distribution of Sample
Proportion
 A small enterprise has 4 staff, $N = 4$ (3 males and 1 female)
 Variable of interest: Gender
 Let $Y$ = no. of male staff in the sample, $\pi$ = proportion of male staff = 0.75
 Random samples of size 2 with replacement are taken ($n = 2$)
 As $Y$ obeys a binomial distribution
 $Y \sim B(2, 0.75)$
 $\mu = n\pi = 2 \times 0.75 = 1.5$
 $\sigma = \sqrt{n\pi(1-\pi)} = \sqrt{2 \times 0.75 \times 0.25} = 0.6124$
[Figure: the four staff members, labelled A, B, C and D]
Sampling Distribution of Sample
Proportion Cont’d

 Sample proportion of male staff, $p = \frac{Y}{n}$

16 possible sample proportions

Respondent   A (M)       B (M)       C (F)       D (M)
A (M)        2/2 = 1     2/2 = 1     1/2 = 0.5   2/2 = 1
B (M)        2/2 = 1     2/2 = 1     1/2 = 0.5   2/2 = 1
C (F)        1/2 = 0.5   1/2 = 0.5   0/2 = 0     1/2 = 0.5
D (M)        2/2 = 1     2/2 = 1     1/2 = 0.5   2/2 = 1

 Probability distribution of $p$

$p$      0      0.5    1
$P(p)$   1/16   6/16   9/16
Sampling Distribution of Sample
Proportion Cont’d

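 Summary measures for the sampling distribution of the sample
proportion
$$\mu_p = \sum p_i P(p_i) = 0\left(\frac{1}{16}\right) + 0.5\left(\frac{6}{16}\right) + 1\left(\frac{9}{16}\right) = 0.75 = \pi$$
$$\sigma_p = \sqrt{\sum (p_i - \mu_p)^2 P(p_i)} = \sqrt{(0 - 0.75)^2\left(\frac{1}{16}\right) + (0.5 - 0.75)^2\left(\frac{6}{16}\right) + (1 - 0.75)^2\left(\frac{9}{16}\right)} = 0.3062$$
$$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} = \frac{\sqrt{n\pi(1-\pi)}}{n}$$
 We say the sample proportion $p$ is an unbiased estimator of the
population proportion $\pi$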
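The summary measures above can be checked by brute force. The sketch below enumerates all 16 ordered samples of size 2 drawn with replacement from the four staff members and recomputes the distribution, mean and standard deviation of $p$. The variable names are illustrative, and treating each ordered pair as equally likely is the assumption behind the 16-cell table.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

# Staff and genders as given in the example (A, B, D male; C female)
staff = {"A": "M", "B": "M", "C": "F", "D": "M"}
n = 2

# All ordered samples of size n drawn with replacement: 4^2 = 16
samples = list(product(staff, repeat=n))

# Sample proportion of males in each sample, kept exact with Fraction
props = [Fraction(sum(staff[s] == "M" for s in sample), n) for sample in samples]

# Probability distribution of p, assuming each sample is equally likely
dist = {p: Fraction(count, len(samples)) for p, count in Counter(props).items()}

mu_p = sum(p * prob for p, prob in dist.items())
sigma_p = sqrt(float(sum((p - mu_p) ** 2 * prob for p, prob in dist.items())))

print(dist)     # P(p = 0), P(p = 1/2), P(p = 1)
print(mu_p)     # mean of the sampling distribution
print(sigma_p)  # standard deviation of the sampling distribution
```

The enumerated mean equals $\pi = 0.75$ and the standard deviation agrees with $\sqrt{\pi(1-\pi)/n}$.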
Sampling Distribution of Sample
Proportion Cont’d

 The exact form of the sampling distribution of $p$
is rather complicated. Instead of using the exact
distribution of $p$, it is common to approximate
the sampling distribution by a normal distribution
Sampling Distribution of Sample
Proportion Cont’d

 Suppose you want to estimate the proportion ($\pi$) of CityU students
who skipped 2 or more classes per week in the last semester
 A sample of size $n$ is collected
 You record your data points as categorical observations: Yes,
skipped 2 or more classes; or No, skipped 1 or fewer classes
 For subsequent data manipulation, you may code those who skipped
2 or more classes as 1, and those who skipped 1 or fewer classes as 0
 Denoting by $X_i$ the numeric coded value of the $i$th observed
student in the sample, we see that
$$Y = \sum X_i = \text{observed number of students who skipped 2 or more classes}$$
$$p = \frac{Y}{n} = \frac{\sum X_i}{n} = \text{sample proportion of students who skipped 2 or more classes}$$
 $\frac{\sum X_i}{n}$ has the same form as the formula for the sample mean, so a sample proportion is a special
case of a sample mean
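A short sketch of the 0/1 coding idea follows. The eight responses are made up purely for illustration; the point is only that the proportion of "Yes" answers equals the mean of the coded values.

```python
from statistics import mean

# Hypothetical survey responses (illustrative data, not from the slides)
responses = ["Yes", "No", "No", "Yes", "No", "No", "No", "Yes"]

# Code "Yes" (skipped 2 or more classes) as 1 and "No" as 0
x = [1 if r == "Yes" else 0 for r in responses]

y = sum(x)         # Y = number of students who skipped 2 or more classes
p = y / len(x)     # sample proportion p = Y / n
x_bar = mean(x)    # sample mean of the coded values

print(y, p, x_bar)
```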
Sampling Distribution of Sample
Proportion Cont’d

 Since the sampling distribution of $p$ $\left(= \frac{Y}{n} = \frac{\sum X_i}{n}\right)$ has
mean $\pi$ and standard deviation $\sqrt{\frac{\pi(1-\pi)}{n}}$, by the Central
Limit Theorem the sampling distribution of $p$ is
approximately normal with mean $\pi$ and
standard deviation $\sqrt{\frac{\pi(1-\pi)}{n}}$ for large $n$
Sampling Distribution of Sample
Proportion Cont’d

 Hence, for a large sample size, the distribution of the
random variable
$$Z = \frac{p - \pi}{\sqrt{\frac{\pi(1-\pi)}{n}}}$$
is approximately standard normal
 This statistic can be used to construct confidence intervals
and to test hypotheses for the population proportion
 In practice, “$n$ is large enough” often means that $n\pi \ge 5$
and $n(1-\pi) \ge 5$, that is, $\pi$ cannot be too small or too
large
Sampling Distribution of Sample
Proportion Cont’d

 Normal approximation can be used if
 $n \ge 30$
 $n\pi \ge 5$
 $n(1-\pi) \ge 5$
 Sampling distribution of the sample proportion: $p \sim N(\mu_p, \sigma_p^2)$,
where $\pi$ = population proportion
 2 parameters in the sampling distribution of the sample
proportion
 Mean, $\mu_p = \pi$
 Variance, $\sigma_p^2 = \frac{\pi(1-\pi)}{n}$
[Figure: Sampling Distribution of $p$, showing $f(p)$ against $p$ from 0 to 1]
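The three rule-of-thumb conditions can be wrapped in a small helper. This is a sketch only; the function name `normal_approx_ok` is hypothetical and the thresholds are the ones listed above.

```python
def normal_approx_ok(n: int, pi: float) -> bool:
    """Check the rule-of-thumb conditions n >= 30, n*pi >= 5 and n*(1 - pi) >= 5."""
    return n >= 30 and n * pi >= 5 and n * (1 - pi) >= 5

print(normal_approx_ok(200, 0.4))   # large sample, pi not extreme
print(normal_approx_ok(2, 0.75))    # sample far too small
print(normal_approx_ok(100, 0.01))  # pi too close to 0 for this n
```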
Standardizing Sampling Distribution
of Proportion
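 Converting the sample proportion $p$ to a $Z$ value
$$Z = \frac{p - \mu_p}{\sigma_p} = \frac{p - \pi}{\sqrt{\frac{\pi(1-\pi)}{n}}}$$
[Figure: the sampling distribution of $p$ (mean $\mu_p$, standard deviation $\sigma_p$) next to the standardized normal distribution of $Z$ (mean $\mu_Z = 0$, standard deviation $\sigma_Z = 1$)]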
Standardizing Sampling Distribution
of Proportion – Example
 Suppose that the manager of the local bank determines
that 40% of all depositors have multiple accounts at the
bank
 If you select a random sample of 200 depositors, what is
the probability that the sample proportion of depositors
with multiple accounts is less than 0.3 ?

Standardizing Sampling Distribution
of Proportion – Example Cont’d

 Given $\pi$ = population proportion of depositors with
multiple accounts = 0.4
 As $n = 200 > 30$, $n\pi = 80 > 5$, $n(1-\pi) = 120 > 5$
 The sampling distribution of $p$ is approximately
normal, i.e. $p \sim N(\mu_p, \sigma_p^2)$
$$P(p < 0.3) = P\left(Z < \frac{0.3 - 0.4}{\sqrt{\frac{0.4(1-0.4)}{200}}}\right)$$
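One way to finish this calculation numerically is sketched below, using `statistics.NormalDist` from the Python standard library in place of a printed normal table. Software values can differ slightly from table look-ups because tables round $Z$ to two decimal places.

```python
from math import sqrt
from statistics import NormalDist

pi, n = 0.4, 200                   # population proportion and sample size from the example
sigma_p = sqrt(pi * (1 - pi) / n)  # standard deviation of the sample proportion
z = (0.3 - pi) / sigma_p           # standardized value of p = 0.3
prob = NormalDist().cdf(z)         # P(Z < z) under the standard normal

print(round(z, 2), prob)
```

The standardized value is close to $-2.89$, so the probability is small (roughly 0.002).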
Confidence Interval Estimate for the
Proportion
 Since the population proportion, $\pi$, is unknown, the
standard deviation of $p$ can be estimated by the sample
standard deviation $S_p$
$$S_p = \sqrt{\frac{p(1-p)}{n}}$$
 Hence, $Z = \frac{p - \pi}{S_p} \sim N(0, 1)$ approximately, for large $n$
 As the population proportion $\pi$ is unknown, we may
verify the “large enough” condition using $np$ and $n(1-p)$
Confidence Interval Estimate for the
Proportion Cont’d

 Conditions
 The no. of successes, $Y$, follows a binomial distribution
 Normal approximation can be used
 $n \ge 30$
 $np \ge 5$
 $n(1-p) \ge 5$
 $100(1-\alpha)\%$ confidence interval estimate
$$p \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
where $Z_{\alpha/2}$ is the critical value, $\sqrt{\frac{p(1-p)}{n}}$ is the (estimated) standard error, $\sigma_p$, and their product is the sampling error, $E$
Confidence Interval Estimate for the
Proportion – Example
 Among the 200 depositors you randomly selected, 95 of
them have an RMB deposit account at the bank
 Set up a 95% confidence interval estimate for the
population proportion of depositors having an RMB deposit
account at the bank
Confidence Interval Estimate for the
Proportion – Example Cont’d

For these data, $p = \frac{95}{200} = 0.475$
As $n = 200 > 30$, $np = 95 > 5$, $n(1-p) = 105 > 5$

95% confidence interval (C.I.) for $\pi$
$$p \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} = 0.475 \pm 1.96\sqrt{\frac{0.475(1-0.475)}{200}} = 0.475 \pm 0.069$$
We are 95% confident that the population proportion of
depositors having an RMB deposit account is between 0.406
and 0.544
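The interval above can be reproduced with a few lines of Python. This is a sketch under the normal approximation; the helper name `proportion_ci` is hypothetical, and `NormalDist().inv_cdf` supplies the critical value $Z_{\alpha/2}$ instead of a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval: p +/- Z_{alpha/2} * sqrt(p(1-p)/n)."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value Z_{alpha/2}
    e = z * sqrt(p * (1 - p) / n)                        # sampling error E
    return p - e, p + e

lower, upper = proportion_ci(95, 200)
print(round(lower, 3), round(upper, 3))  # 0.406 0.544
```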
Confidence Interval Estimate for the
Proportion Cont’d

 Special considerations
 If $p - Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} < 0$, we have to replace the lower bound
of the confidence interval by 0
 If $p + Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} > 1$, we have to replace the upper bound
of the confidence interval by 1
 But why?
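A sketch of the bound replacement is given below. The helper `clipped_proportion_ci` is hypothetical, and the data (5 successes in 100 observations at a 99.9% confidence level) are invented only to produce a negative raw lower bound while still meeting the sample-size conditions.

```python
from math import sqrt
from statistics import NormalDist

def clipped_proportion_ci(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Normal-approximation interval with bounds kept inside [0, 1]."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    e = z * sqrt(p * (1 - p) / n)
    # A proportion can never be below 0 or above 1, so the bounds are truncated
    return max(0.0, p - e), min(1.0, p + e)

# Hypothetical data: 5 successes out of 100, at 99.9% confidence
lower, upper = clipped_proportion_ci(5, 100, confidence=0.999)
print(lower, upper)
```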
Factors Affecting Interval Width
(Precision)
 Level of confidence, $(1 - \alpha)$
 $1 - \alpha$ ↑ → $|Z|$-value ↑ → width of interval ↑
 Sample size, $n$
 $n$ ↑ → $\sigma_p$ ↓ → width of interval ↓
 Sample proportion, $p$
 If $p$ increases from 0 to 0.5, then $p(1-p)$
increases from 0 to 0.25, leading to a
wider interval
 If $p$ further increases from 0.5 to 1, then
$p(1-p)$ drops from 0.25 to 0, leading to
a narrower interval
Intervals extend from $p - Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$ to $p + Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$
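The three effects can be seen numerically with a small sketch. The function `interval_width` and all input values are illustrative assumptions; the width is taken as twice the sampling error.

```python
from math import sqrt
from statistics import NormalDist

def interval_width(p: float, n: int, confidence: float) -> float:
    """Width of the normal-approximation interval: 2 * Z_{alpha/2} * sqrt(p(1-p)/n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sqrt(p * (1 - p) / n)

base = interval_width(0.3, 100, 0.95)        # illustrative baseline
wider_conf = interval_width(0.3, 100, 0.99)  # higher confidence level
larger_n = interval_width(0.3, 400, 0.95)    # four times the sample size

# Width as p moves from near 0 to near 1, with n and confidence fixed
widths_by_p = {p: interval_width(p, 100, 0.95) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}

print(base, wider_conf, larger_n)
print(widths_by_p)
```

Raising the confidence level widens the interval, quadrupling $n$ halves it, and the width peaks at $p = 0.5$.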
Determining Sample Size for the
Proportion
 Sampling error (or margin of error)
$$E = Z_{\alpha/2}\sqrt{\frac{\pi(1-\pi)}{n}}$$
 Solving the equation for $n$ gives
$$n = \frac{Z_{\alpha/2}^2\,\pi(1-\pi)}{E^2}$$
 If the computed $n$ is not an integer, round it up to the
nearest integer
Determining Sample Size for the
Proportion – Example
 According to the Developments in the Banking Sectors
published by Hong Kong Monetary Authority in June
2014, at the end of the first quarter of 2014, 22 in every
10,000 transactions were found to be credit card lending
 You want to have 99% confidence of estimating the
proportion of credit card lending at your bank to within
± 0.001
 What is the minimum sample size needed?
Determining Sample Size for the
Proportion – Example Cont’d

$$p = \frac{22}{10000} = 0.0022$$
$$n = \frac{Z_{\alpha/2}^2\,\pi(1-\pi)}{E^2}$$
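A sketch of the sample-size calculation follows, plugging $p = 0.0022$ in for $\pi$, with $E = 0.001$ and 99% confidence. The helper name `sample_size` is hypothetical. The critical value comes from `NormalDist().inv_cdf` (about 2.576), so the result may differ slightly from a hand calculation that uses a rounded table value such as 2.58.

```python
from math import ceil
from statistics import NormalDist

def sample_size(pi: float, e: float, confidence: float) -> int:
    """n = Z_{alpha/2}^2 * pi * (1 - pi) / E^2, rounded up to the next integer."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * pi * (1 - pi) / e**2)

n_required = sample_size(pi=0.0022, e=0.001, confidence=0.99)
print(n_required)
```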
Determining Sample Size for the
Proportion Cont’d

 What should we do if $\pi$ is unknown?
1. Use $p$ (sample proportion) from some similar studies
 As $p$ provides the best estimate of $\pi$
2. If $p$ is also unknown, use 0.5
 When $\pi = 0.5$, $\pi(1-\pi)$ becomes the largest, i.e. 0.25
 Hence the resulting sample size fulfils the
requirement for any other value of the true but unknown $\pi$
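The worst-case argument can be illustrated by computing the required sample size over a grid of candidate values of $\pi$. The margin of error 0.03 and the 95% confidence level are assumed values chosen only for this sketch, and `required_n` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist

def required_n(pi: float, e: float, confidence: float = 0.95) -> int:
    """Sample size from n = Z_{alpha/2}^2 * pi * (1 - pi) / E^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * pi * (1 - pi) / e**2)

# Assumed margin of error E = 0.03 at 95% confidence (illustrative only)
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
sizes = {pi: required_n(pi, e=0.03) for pi in candidates}
worst_case = max(sizes, key=sizes.get)

print(sizes)
print(worst_case)  # the value of pi that demands the largest sample
```

Because $\pi = 0.5$ gives the largest requirement, a sample sized for 0.5 is sufficient whatever the true proportion is.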
Test of Hypothesis for the Proportion

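 Conditions
 The no. of successes, $Y$, follows a binomial distribution
 Normal approximation can be used
 $n \ge 30$
 $np \ge 5$
 $n(1-p) \ge 5$
 Test statistic, $Z = \frac{p - \pi_0}{\sqrt{\frac{\pi_0(1-\pi_0)}{n}}}$, where $\pi_0$ is the value of the population proportion under the null hypothesis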
Test of Hypothesis for the Proportion
– Exercise
 Your bank had the business objective of serving 80% of
its customers within 5 minutes of the time the
customer enters the bank
 Of the 45 randomly selected customers, 39 are served
within 5 minutes of their arrival
 Test the claim of the bank at the 5% level of significance
Test of Hypothesis for the Proportion
– Exercise Cont’d

[Figure: standard normal curve with rejection regions of area $\alpha/2 = 0.025$ in each tail, beyond the critical values $Z = -1.96$ and $Z = +1.96$]
Test of Hypothesis for the Proportion
– Exercise Cont’d

p-value = 0.2628

[Figure: standard normal curve with $Z = -1.118$ and $Z = +1.118$ marked, shown together with the rejection regions of area $\alpha/2 = 0.025$ in each tail]
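The exercise can be worked through numerically as sketched below, assuming a two-tailed test of $H_0: \pi = 0.8$ against $H_1: \pi \ne 0.8$ (as the two rejection regions in the figures suggest). The software p-value is close to, but not exactly, the 0.2628 shown above; the small gap is presumably due to rounding $Z$ to two decimals for a table look-up.

```python
from math import sqrt
from statistics import NormalDist

n, successes = 45, 39      # sample size and customers served within 5 minutes
pi0, alpha = 0.80, 0.05    # hypothesized proportion and significance level

p = successes / n
z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)        # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-tailed p-value
critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96

reject_h0 = abs(z) > critical
print(round(z, 3), p_value, reject_h0)
```

Since $|Z| < 1.96$ (equivalently, the p-value exceeds 0.05), the null hypothesis is not rejected at the 5% level.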
Do Voters Really Vote When They
Say They Do?
 On November 8, 1994, a historic election took place in the US, in
which the Republican Party won control of both houses of
Congress for the first time since 1952
 But how many people actually voted?
 On November 28, 1994, Time magazine reported that in a
telephone poll of 800 adults taken during the two days
following the election, 56% reported that they had voted
 But based on information from the Committee for the Study
of the American Electorate, in fact, only 39% of American
adults had voted
 Could it be the case that the results of the poll simply
reflected a sample that, by chance, voted with greater
frequency than the general population?
Do Voters Really Vote When They
Say They Do? Cont’d

 Let’s suppose that the truth about the population is that only
39% of American adults voted, i.e. $\pi = 39\% = 0.39$
 In samples of 800 adults, the size used by the
Time magazine poll, we can expect the sample proportion to have mean 0.39 and standard error
0.017, i.e. $\mu_p = \pi = 0.39$ and $\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} = 0.017$
 According to the Empirical Rule, we are almost certain that
the sample proportion based on a sample of 800 adults
should fall within $3 \times 0.017 = 0.051$ of the truth of 0.39
 In other words, if respondents were telling the truth, the
sample proportion should be no higher than 44.1%
(= 39% + 5.1%), nowhere near the reported percentage of 56%
Do Voters Really Vote When They
Say They Do? Cont’d

 We can also find how likely a sample proportion of 0.56 or
above is to occur
 Given $n = 800$, $\mu_p = 0.39$ and $\sigma_p = 0.017$
 $P(p \ge 0.56) = P(Z \ge 10) \approx 0$
 It is virtually impossible for a sample to show such a high
proportion of voters by chance alone
 The differences between data may be the result of a variety
of factors
 Differences in the respondents’ interpretation of the questions
 Respondents’ inability or unwillingness to provide correct information
or recall correct information
