22nd Inferences Based On Two Samples-Confidence Intervals and Tests of Hypothesis
22nd Class
Inferences Based on Two Samples:
Confidence Intervals and Tests of Hypothesis
19 June 2023
Makoto Shimizu
Chapter 8
Inferences Based on Two Samples:
Confidence Intervals and Tests of Hypothesis
Content
1. Identifying the Target Parameter
2. Comparing Two Population Means: Independent Sampling
3. Comparing Two Population Means: Paired Difference Experiments
4. Comparing Two Population Proportions: Independent Sampling
5. Determining the Required Sample Size
6. Comparing Two Population Variances: Independent Sampling
Where We’re Going
1. Learn how to identify the target parameter for comparing two populations.
2. Learn how to compare two population means using confidence intervals and tests of hypotheses.
3. Apply these inferential methods to problems where we want to compare:
• Two population proportions
• Two population variances
4. Determine the sizes of the samples necessary to estimate the difference between two population parameters with a specified margin of error.
8.1 Identifying the Target Parameter
Determining the Target Parameter
Parameter | Key Words or Phrases | Type of Data
p1 – p2 | Differences between proportions, percentages, fractions, or rates; compare proportions | Qualitative
σ1²/σ2² | Ratio of variances; differences in variability or spread; compare variation | Quantitative
8.2 Comparing Two Population Means: Independent Sampling
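Properties of the Sampling Distribution of $(\bar{x}_1 - \bar{x}_2)$
1. The mean of the sampling distribution of $(\bar{x}_1 - \bar{x}_2)$ is $(\mu_1 - \mu_2)$.
2. If the two samples are independent, the standard deviation of the sampling distribution is
$$\sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
where $\sigma_1^2$ and $\sigma_2^2$ are the variances of the two populations being sampled, and n1 and n2 are the respective sample sizes. We also refer to $\sigma_{(\bar{x}_1 - \bar{x}_2)}$ as the standard error of the statistic $(\bar{x}_1 - \bar{x}_2)$.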
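Large-Sample Confidence Interval for (μ1 – μ2): Normal (z) Statistic
$\sigma_1^2, \sigma_2^2$ known:
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,\sigma_{(\bar{x}_1 - \bar{x}_2)} = (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
$\sigma_1^2, \sigma_2^2$ unknown:
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,\sigma_{(\bar{x}_1 - \bar{x}_2)} \approx (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$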
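As a quick check of the interval formula, here is a minimal Python sketch that works from summary statistics. The helper name `two_mean_z_ci` and its parameters are illustrative assumptions, not part of the course material; it uses the standard library's `statistics.NormalDist` to obtain $z_{\alpha/2}$. Pass the population standard deviations when they are known, or the sample standard deviations when both samples are large.

```python
import math
from statistics import NormalDist


def two_mean_z_ci(x1_bar, x2_bar, sd1, sd2, n1, n2, conf=0.95):
    """Large-sample CI for (mu1 - mu2); hypothetical helper.

    sd1, sd2: population SDs if known, otherwise sample SDs s1, s2
    (appropriate only when n1 >= 30 and n2 >= 30).
    """
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)       # z_{alpha/2}
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)      # standard error of x1_bar - x2_bar
    diff = x1_bar - x2_bar
    return diff - z * se, diff + z * se


# Car-price summary statistics (USA = 1, Japan = 2), 95% confidence
print(two_mean_z_ci(26596, 27236, 1981.4404, 1974.0934, 50, 50))
# roughly (-1415.3, 135.3)
```

With these car-price figures the 95% interval contains 0.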
Large-Sample Test of Hypothesis for (µ1 – µ2): Normal (z) Statistic
One-Tailed Test
H0: (µ1 – µ2) = D0
Ha: (µ1 – µ2) < D0 [or Ha: (µ1 – µ2) > D0 ]
where D0 = Hypothesized difference between the means (the difference is often hypothesized to be equal to 0)
Test statistic:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sigma_{(\bar{x}_1 - \bar{x}_2)}}, \qquad \sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Rejection region: $z < -z_\alpha$ [or $z > z_\alpha$ when Ha: (µ1 – µ2) > D0 ]
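Large-Sample Test of Hypothesis for (µ1 – µ2): Normal (z) Statistic
Two-Tailed Test
H0: (µ1 – µ2) = D0
Ha: (µ1 – µ2) ≠ D0
where D0 = Hypothesized difference between the means (the difference is often hypothesized to be equal to 0)
Test statistic:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sigma_{(\bar{x}_1 - \bar{x}_2)}}, \qquad \sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Rejection region: $|z| > z_{\alpha/2}$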
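The one- and two-tailed versions use the same statistic and differ only in how the p-value is computed. A minimal sketch, assuming summary statistics as input; `two_mean_z_test` and its `tail` argument are hypothetical names chosen for illustration.

```python
import math
from statistics import NormalDist


def two_mean_z_test(x1_bar, x2_bar, sd1, sd2, n1, n2, d0=0.0, tail="two"):
    """Large-sample z test of H0: (mu1 - mu2) = d0; hypothetical helper.

    tail = "less"    -> Ha: (mu1 - mu2) < d0
    tail = "greater" -> Ha: (mu1 - mu2) > d0
    tail = "two"     -> Ha: (mu1 - mu2) != d0
    Returns (z, p_value).
    """
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)   # estimated standard error
    z = ((x1_bar - x2_bar) - d0) / se
    phi = NormalDist().cdf                      # standard normal CDF
    if tail == "less":
        p = phi(z)
    elif tail == "greater":
        p = 1 - phi(z)
    elif tail == "two":
        p = 2 * (1 - phi(abs(z)))
    else:
        raise ValueError("tail must be 'less', 'greater' or 'two'")
    return z, p
```

Rejecting H0 when the p-value is below α gives the same decision as the rejection regions above.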
Conditions Required for Valid Large-Sample Inferences about (μ1 – μ2)
1. The two samples are randomly selected in an independent manner from the two target populations.
2. The sample sizes, n1 and n2, are both large (i.e., n1 ≥ 30 and n2 ≥ 30). [Due to the Central Limit Theorem, this condition guarantees that the sampling distribution of $(\bar{x}_1 - \bar{x}_2)$ will be approximately normal regardless of the shapes of the underlying probability distributions of the populations. Also, $s_1^2$ and $s_2^2$ will provide good approximations to $\sigma_1^2$ and $\sigma_2^2$.]
Example: Comparing Mean Car Prices
• An independent study of the retail prices of 50 randomly selected automobile sales in each of the United States and Japan provided the following data:
Country | Sample Size | Mean | Std. Dev.
USA | 50 | 26596 | 1981.4404
Japan | 50 | 27236 | 1974.0934
Example: Comparing Mean Car Prices (Cont.)
• Let µ1 and µ2 represent the population mean retail sale prices in the United States and Japan, respectively. If the claim is true, then the mean retail price in Japan will exceed the mean in the United States, i.e.
µ1 < µ2 or (µ1 – µ2) < 0.
• The elements of the test:
• H0: (µ1 – µ2) = 0, i.e. µ1 = µ2; note that D0 = 0 for this test.
• Ha: (µ1 – µ2) < 0, i.e. µ1 < µ2
• Test statistic:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sigma_{(\bar{x}_1 - \bar{x}_2)}} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sigma_{(\bar{x}_1 - \bar{x}_2)}}$$
$$z = \frac{(26{,}596 - 27{,}236) - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \approx \frac{-640}{\sqrt{\dfrac{(1{,}981)^2}{50} + \dfrac{(1{,}974)^2}{50}}} = \frac{-640}{396} = -1.62$$
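The arithmetic above can be reproduced with the unrounded standard deviations from the table, which gives a standard error of about 395.6 rather than 396. The lower-tailed p-value line is an addition for comparison; the significance level for this example is not stated above, so the sketch does not hard-code a decision.

```python
import math
from statistics import NormalDist

# Summary statistics from the car-price table (1 = USA, 2 = Japan)
x1_bar, s1, n1 = 26596, 1981.4404, 50
x2_bar, s2, n2 = 27236, 1974.0934, 50

se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # estimated standard error
z = (x1_bar - x2_bar - 0) / se            # D0 = 0
p_value = NormalDist().cdf(z)             # lower tail, since Ha: mu1 - mu2 < 0
print(round(se, 1), round(z, 2), round(p_value, 3))   # about 395.6, -1.62, 0.053
```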
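Small Independent Samples Confidence Interval for (µ1 – µ2): Student’s t-Statistic
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
and $t_{\alpha/2}$ is based on (n1 + n2 – 2) degrees of freedom.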
Small Independent Samples Test of Hypothesis for
(µ1 – µ2): Student’s t-Statistic
One-Tailed Test
H0: (µ1 – µ2) = D0
Ha: (µ1 – µ2) < D0 [or Ha: (µ1 – µ2) > D0 ]
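Test statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Rejection region: $t < -t_\alpha$ [or $t > t_\alpha$ when Ha: (µ1 – µ2) > D0 ]
p-value = P(t < tc) [or P(t > tc)]
where $t_\alpha$ is based on (n1 + n2 – 2) degrees of freedom.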
Small Independent Samples Test of Hypothesis for
(µ1 – µ2): Student’s t-Statistic
Two-Tailed Test
H0: (µ1 – µ2) = D0
Ha: (µ1 – µ2) ≠ D0
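Test statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Rejection region: $|t| > t_{\alpha/2}$
• p-value = 2P(t > |tc|)
where $t_{\alpha/2}$ is based on (n1 + n2 – 2) degrees of freedom and tc is the calculated value of the test statistic.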
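A minimal sketch of the pooled t statistic, assuming summary statistics as input. `pooled_t_statistic` and `reject_two_tailed` are hypothetical helper names; the critical value $t_{\alpha/2}$ must come from a t table (or a statistics package), since Python's standard library has no t distribution. The usage line plugs in the managerial-success data from the example that follows, purely to exercise the code.

```python
import math


def pooled_t_statistic(x1_bar, x2_bar, s1, s2, n1, n2, d0=0.0):
    """Return (t, df) for H0: (mu1 - mu2) = d0, assuming equal population variances."""
    # Pooled variance s_p^2 = [(n1 - 1) s1^2 + (n2 - 1) s2^2] / (n1 + n2 - 2)
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = ((x1_bar - x2_bar) - d0) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def reject_two_tailed(t, t_half_alpha):
    """Two-tailed rule: reject H0 when |t| > t_{alpha/2}."""
    return abs(t) > t_half_alpha


t, df = pooled_t_statistic(65.33, 49.47, 6.61, 9.33, 12, 15)
print(round(t, 2), df, reject_two_tailed(t, 2.060))   # about 4.97 on 25 df; True
```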
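Conditions Required for Valid Small-Sample Inferences about (μ1 – μ2)
1. The two samples are randomly selected in an independent manner from the two target populations.
2. Both sampled populations have distributions that are approximately normal.
3. The population variances are equal (i.e., σ1² = σ2²).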
Example: Managerial Success
• A researcher is hired by a manufacturer to compare the average success index for two groups of managers. Group 1 managers interact with people outside their own group, while Group 2 managers have little outside interaction.
• The following data were collected:
Group | Sample Size | Mean Success Index | Std. Dev.
Group 1 | 12 | 65.33 | 6.61
Group 2 | 15 | 49.47 | 9.33
Example: Managerial Success (Cont.)
df = n1 + n2 – 2 = 12 + 15 – 2 = 25
$t_{\alpha/2} = t_{0.025} = 2.060$ (t-table, 25 df)
Calculate the pooled estimate of variance:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(12 - 1)(6.61)^2 + (15 - 1)(9.33)^2}{12 + 15 - 2} = 67.97$$
Example: Managerial Success (Cont.)
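Find the 95% confidence interval for (µ1 – µ2), the difference between the mean managerial success indexes for the two groups:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (65.33 - 49.47) \pm t_{0.025}\sqrt{67.97\left(\frac{1}{12} + \frac{1}{15}\right)}$$
$$= 15.86 \pm (2.060)(3.193) \approx 15.86 \pm 6.58$$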
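The interval can be recomputed step by step as a check. This is a sketch only: the critical value 2.060 is typed in from a t table for 25 df, and the last line prints the interval endpoints, roughly (9.28, 22.44), which the slide leaves implicit.

```python
import math

n1, x1_bar, s1 = 12, 65.33, 6.61
n2, x2_bar, s2 = 15, 49.47, 9.33
t_crit = 2.060   # t_{0.025} with n1 + n2 - 2 = 25 df, from a t table

sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)   # pooled variance
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))                        # standard error
diff = x1_bar - x2_bar
half_width = t_crit * se
print(round(sp2, 2), round(se, 2), round(half_width, 2))       # about 67.97, 3.19, 6.58
print((round(diff - half_width, 2), round(diff + half_width, 2)))
```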
Example: Managerial Success (Cont.)
Approximate Small-Sample Procedures when σ1² ≠ σ2²
$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
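A minimal sketch of the degrees-of-freedom formula, assuming sample standard deviations as inputs; `welch_df` is a hypothetical name. The slide does not say how a non-integer ν should be rounded before a table lookup, so the function returns the raw value.

```python
def welch_df(s1, s2, n1, n2):
    """Approximate df for unequal variances:
    nu = (s1^2/n1 + s2^2/n2)^2 / [(s1^2/n1)^2/(n1 - 1) + (s2^2/n2)^2/(n2 - 1)].
    """
    a = s1**2 / n1
    b = s2**2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


# Illustration with the managerial-success summary statistics
print(welch_df(6.61, 9.33, 12, 15))   # about 24.7
```

When the two samples have equal sizes and equal standard deviations, ν reduces to n1 + n2 – 2.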
8.3 Comparing Two Population Means: Paired Difference Experiments
Paired Differences
• One valid way to analyze data is to pair the observations and then take their differences. Paired differences can provide more information about the difference between population means than an independent samples experiment.
• Let $\bar{x}_d$ = sample mean difference
$s_d$ = sample standard deviation of differences
$n_d$ = number of differences = number of pairs
Paired-Difference Confidence Interval for µd = (µ1 – µ2)
Large Sample, Normal (z) Statistic
$$\bar{x}_d \pm z_{\alpha/2}\frac{\sigma_d}{\sqrt{n_d}} \approx \bar{x}_d \pm z_{\alpha/2}\frac{s_d}{\sqrt{n_d}}$$
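When the raw pairs are available, $\bar{x}_d$ and $s_d$ come from the differences themselves. A minimal sketch, assuming two equal-length lists in matching pair order; `paired_z_ci` is a hypothetical helper. It uses the large-sample z interval, so it is only appropriate when nd ≥ 30; for small samples the z value would be replaced by $t_{\alpha/2}$ with nd – 1 degrees of freedom.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_z_ci(sample1, sample2, conf=0.95):
    """Large-sample CI for mu_d = mu1 - mu2 from matched pairs; hypothetical helper."""
    if len(sample1) != len(sample2):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(sample1, sample2)]   # one difference per pair
    n_d = len(diffs)
    x_bar_d = mean(diffs)
    s_d = stdev(diffs)                                  # sample SD (n_d - 1 divisor)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)        # z_{alpha/2}
    half = z * s_d / math.sqrt(n_d)
    return x_bar_d - half, x_bar_d + half
```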
Paired-Difference Test of Hypothesis
for µd = (µ1 – µ2)
One-Tailed Test
H0: µd = D0
Ha: µd < D0 [or Ha: µd > D0 ]
Large Sample, Normal (z) Statistic
Test statistic:
$$z = \frac{\bar{x}_d - D_0}{\sigma_d/\sqrt{n_d}} \approx \frac{\bar{x}_d - D_0}{s_d/\sqrt{n_d}}$$
Rejection region: $z < -z_\alpha$ [or $z > z_\alpha$ when Ha: µd > D0 ]
• p-value = P(z < zc) [or P(z > zc)]
• where zc is the calculated value of the test statistic.
Paired-Difference Test of Hypothesis
for µd = (µ1 – µ2) (Cont.)
One-Tailed Test
H0: µd = D0
Ha: µd < D0 [or Ha: µd > D0 ]
• Small Sample, Student’s t-Statistic
Test statistic:
$$t = \frac{\bar{x}_d - D_0}{s_d/\sqrt{n_d}}$$
Rejection region: $t < -t_\alpha$ [or $t > t_\alpha$ when Ha: µd > D0 ]
• p-value = P(t < tc) [or P(t > tc)]
where $t_\alpha$ is based on (nd – 1) degrees of freedom.
Paired-Difference Test of Hypothesis
for µd = (µ1 – µ2) (Cont.)
Two-Tailed Test
H0: µd = D0
Ha: µd ≠ D0
Large Sample
Test statistic:
$$z = \frac{\bar{x}_d - D_0}{\sigma_d/\sqrt{n_d}} \approx \frac{\bar{x}_d - D_0}{s_d/\sqrt{n_d}}$$
Rejection region: $|z| > z_{\alpha/2}$
• p-value = 2P(z > |zc|)
• where zc is the calculated value of the test statistic.
Paired-Difference Test of Hypothesis for
µd = (µ1 – µ2) (Cont.)
Two-Tailed Test
H0: µd = D0
Ha: µd ≠ D0
Small Sample
Test statistic:
$$t = \frac{\bar{x}_d - D_0}{s_d/\sqrt{n_d}}$$
Rejection region: $|t| > t_{\alpha/2}$
• p-value = 2P(t > |tc|)
• where $t_{\alpha/2}$ is based on (nd – 1) degrees of freedom and tc is the calculated value of the test statistic.
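The same statistic serves all four paired-difference tests above; only the reference distribution (z or t with nd – 1 df) and the rejection region change. A minimal sketch; `paired_test_statistic` is a hypothetical name.

```python
import math


def paired_test_statistic(x_bar_d, s_d, n_d, d0=0.0):
    """(x_bar_d - D0) / (s_d / sqrt(n_d)).

    Treat the result as z when n_d >= 30, otherwise as t with n_d - 1 df.
    """
    return (x_bar_d - d0) / (s_d / math.sqrt(n_d))


# Illustration with the salary summary statistics (x_bar_d = 400, s_d = 435, n_d = 10)
print(round(paired_test_statistic(400, 435, 10), 2))   # about 2.91, on 9 df
```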
Conditions Required for Valid Large-Sample
Inferences about µd
1. A random sample of differences is selected
from the target population of differences.
2. The sample size nd is large (i.e., nd ≥ 30); due
to the Central Limit Theorem, this condition
guarantees that the test statistic will be
approximately normal regardless of the shape
of the underlying probability distribution of
the population.
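Conditions Required for Valid Small-Sample Inferences about µd
1. A random sample of differences is selected from the target population of differences.
2. The population of differences has a distribution that is approximately normal.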
Paired-Difference Experiment
Data Collection Table
Example: Comparing Mean
Salaries of Males and Females
• An experiment is conducted to compare the starting
salaries of male and female college graduates who find jobs.
Pairs are formed by choosing a male and a female with the
same major and similar grade point averages (GPAs).
• Suppose a random sample of 10 pairs is formed in this
manner and the starting annual salary of each person is
recorded. The results are shown in the table (next slide).
• Compare the mean starting salary, µ1, for males with the mean starting salary, µ2, for females using a 95% confidence interval. Interpret the results.
Example: Comparing Mean
Salaries of Males and Females (cont)
• Data on Annual Salaries for Matched Pairs of College Graduates
$\bar{x}_d = 400$, $s_d = 435$
Example: Comparing Mean
Salaries of Males and Females (cont)
Note that $t_{0.025} = 2.262$ (t-table) for df = nd – 1 = 9.
The 95% confidence interval for µd = (µ1 – µ2) for this small sample is
$$\bar{x}_d \pm t_{\alpha/2}\frac{s_d}{\sqrt{n_d}} = \bar{x}_d \pm t_{0.025}\frac{s_d}{\sqrt{n_d}} = 400 \pm 2.262\,\frac{435}{\sqrt{10}} = 400 \pm 311$$
or ($89, $711).
Because the interval lies entirely above 0, we infer that (µ1 – µ2) > 0, i.e., the mean starting salary for males exceeds the mean starting salary for females.
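The salary interval can be recomputed from the summary statistics as a check (a sketch; $t_{0.025} = 2.262$ is entered by hand from a t table):

```python
import math

x_bar_d, s_d, n_d = 400, 435, 10
t_crit = 2.262                           # t_{0.025} with n_d - 1 = 9 df
half = t_crit * s_d / math.sqrt(n_d)     # about 311.16
print(round(x_bar_d - half), round(x_bar_d + half))   # 89 711
```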
8.4 Comparing Two Population Proportions: Independent Sampling
Properties of the Sampling Distribution of (p̂1 – p̂2)
1. The mean of the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ is (p1 – p2); that is,
$$E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2$$
2. The standard deviation of the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ is
$$\sigma_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
3. If the sample sizes n1 and n2 are large, the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ is approximately normal.
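Large-Sample 100(1 – α)% Confidence Interval for (p1 – p2)
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\,\sigma_{(\hat{p}_1 - \hat{p}_2)} = (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \approx (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$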
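A minimal sketch of the large-sample interval using the estimated (unpooled) standard error from the last expression above; `two_prop_ci` is a hypothetical helper that takes counts x1, x2 and sample sizes n1, n2. The usage line applies it to the car-repair counts from the later example at 90% confidence, purely as an illustration.

```python
import math
from statistics import NormalDist


def two_prop_ci(x1, n1, x2, n2, conf=0.95):
    """Large-sample CI for (p1 - p2); hypothetical helper.

    Uses the estimated standard error sqrt(p1_hat q1_hat / n1 + p2_hat q2_hat / n2).
    """
    p1, p2 = x1 / n1, x2 / n2                        # sample proportions
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)     # z_{alpha/2}
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


print(tuple(round(v, 3) for v in two_prop_ci(53, 400, 78, 500, conf=0.90)))
# about (-0.062, 0.015)
```

With these counts the 90% interval contains 0.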
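Conditions Required for Valid Large-Sample Inferences about (p1 – p2)
1. The two samples are independent random samples.
2. Both samples are large: n1p1 ≥ 15, n1q1 ≥ 15, n2p2 ≥ 15, and n2q2 ≥ 15.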
Large-Sample Test of Hypothesis
about (p1 – p2)
One-Tailed Test
H0: (p1 – p2) = 0
Ha: (p1 – p2) < 0 [or Ha: (p1 – p2) > 0 ]
Test statistic:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sigma_{(\hat{p}_1 - \hat{p}_2)}}$$
Example: Comparing Car Repair Rates
• A consumer advocacy group wants to determine whether the proportions of cars that need major repairs (more than $500) within 2 years of purchase differ for the two leading automobile models. A sample of 400 two-year owners of model 1 is contacted, and a sample of 500 two-year owners of model 2 is contacted. The numbers x1 and x2 of owners who report that their cars needed major repairs within the first 2 years are 53 and 78, respectively.
• Test the null hypothesis that no difference exists between the proportions in populations 1 and 2 needing major repairs against the alternative that a difference does exist. Use α = 0.10.
Example: Comparing Car Repair Rates
(Cont.)
• If we define p1 and p2 as the true proportions of model 1 and model 2 owners, respectively, whose cars needed major repairs within 2 years, the elements of the test are:
• H0 : (p1 – p2) = 0
• Ha : (p1 – p2) ≠ 0
Example: Comparing Car Repair Rates
(Cont.)
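• Calculate the sample proportions of owners who needed major car repairs:
$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{53}{400} = 0.1325, \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{78}{500} = 0.1560$$
Then
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sigma_{(\hat{p}_1 - \hat{p}_2)}} \approx \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{53 + 78}{400 + 500} = 0.1456 \quad \text{and} \quad \hat{q} = 1 - \hat{p} = 1 - 0.1456 = 0.8544$$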
Example: Comparing Car Repair Rates
(Cont.)
• p̂ is a weighted average of p̂1 and p̂2, with more
weight given to the larger sample of model 2 owners.
• The computed value of the test statistic is
$$z = \frac{0.1325 - 0.1560}{\sqrt{(0.1456)(0.8544)\left(\dfrac{1}{400} + \dfrac{1}{500}\right)}} = \frac{-0.0235}{0.0237} = -0.99$$
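The test can be reproduced with Python's standard library. This sketch recomputes the pooled estimate from the raw counts; the comparison with $z_{0.05} \approx 1.645$ (α = 0.10, two-tailed) and the p-value are additions for checking, and with these numbers |z| falls short of the critical value.

```python
import math
from statistics import NormalDist

x1, n1 = 53, 400    # model 1 owners reporting major repairs
x2, n2 = 78, 500    # model 2 owners reporting major repairs

p1_hat, p2_hat = x1 / n1, x2 / n2
p_hat = (x1 + x2) / (n1 + n2)            # pooled estimate, about 0.1456
q_hat = 1 - p_hat
se = math.sqrt(p_hat * q_hat * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se               # about -0.99

alpha = 0.10
z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # z_{0.05}, about 1.645
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p_value, 2), abs(z) > z_crit)   # about -0.99, 0.32, False
```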
Exercise
In an exit poll, 42 of 75 men sampled supported a ballot initiative
to raise the local sales tax to build a new football stadium. In the
same poll, 41 of 85 women sampled supported the initiative.
Find and interpret the p-value for the test of hypothesis that the
proportions of men and women who support the initiative are
different.
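Solution for the Exercise
$$\hat{p}_1 = \frac{42}{75} = 0.56, \qquad \hat{p}_2 = \frac{41}{85} \approx 0.48$$
The p-value is 2P(z > |zc|), where zc is the computed value of the test statistic for H0: (p1 – p2) = 0 against Ha: (p1 – p2) ≠ 0.
Assuming the null hypothesis is true, the probability of observing a value of z at least as contradictory to the null hypothesis as the one computed is 0.3124. Because this p-value is large (greater than common choices of α such as 0.10), the data provide insufficient evidence that the proportions of men and women who support the initiative differ.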
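A sketch of the pooled two-proportion z test for the exercise, assuming the same pooled-estimate approach as the car-repair example; `two_prop_z_test` is a hypothetical helper. Carrying the unrounded p̂2 = 41/85 ≈ 0.4824 through gives z ≈ 0.98 and a p-value of about 0.33. The 0.3124 quoted above appears to come from rounding p̂2 to 0.48 before computing z (which gives z ≈ 1.01 and a p-value of about 0.31); either way the p-value is large.

```python
import math
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2):
    """Two-tailed z test of H0: p1 - p2 = 0 using the pooled proportion; hypothetical helper."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)                       # pooled estimate under H0
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))        # p-value = 2P(z > |zc|)


z, p = two_prop_z_test(42, 75, 41, 85)
print(round(z, 2), round(p, 2))   # about 0.98 and 0.33
```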
8.5 Determining the Required Sample Size
Determination of Sample Size for Estimating µ1 – µ2
To estimate (μ1 – μ2) with a given margin of error ME and with confidence level (1 – α), use the following formula to solve for equal sample sizes that will achieve the desired reliability:
$$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2}$$
You will need to substitute estimates for the values of σ1² and σ2² before solving for the sample size. These estimates might be sample variances s1² and s2² from prior sampling (e.g., a pilot sample) or from an educated (and conservatively large) guess based on the range, that is, s ≈ R/4.
Example: μ1 – μ2 Sample Size
What sample size is needed to estimate μ1 – μ2 with 95% confidence and a margin of error of 5.8? Assume prior experience tells us σ1 = 12 and σ2 = 18.
$$n_1 = n_2 = \frac{(1.96)^2(12^2 + 18^2)}{(5.8)^2} = 53.44$$
Rounding up, n1 = n2 = 54.
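The sample-size formula is easy to script; rounding up with `math.ceil` ensures the target margin of error is met. A minimal sketch (`n_per_group_means` is a hypothetical name):

```python
import math


def n_per_group_means(z_half_alpha, sigma1, sigma2, me):
    """Equal sample sizes n1 = n2 = z^2 (sigma1^2 + sigma2^2) / ME^2, rounded up."""
    return math.ceil(z_half_alpha**2 * (sigma1**2 + sigma2**2) / me**2)


print(n_per_group_means(1.96, 12, 18, 5.8))   # 54
```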
Determination of Sample Size for Estimating p1 – p2
To estimate (p1 – p2) with a given margin of error ME and with confidence level (1 – α), use the following formula to solve for equal sample sizes that will achieve the desired reliability:
$$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1 q_1 + p_2 q_2)}{(ME)^2}$$
You will need to substitute estimates for the values of p1 and p2 before solving for the sample size. These estimates might be based on prior samples, obtained from educated guesses, or, most conservatively, specified as p1 = p2 = 0.5.
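Example: p1 – p2 Sample Size
What sample size is needed to estimate p1 – p2 with 90% confidence and a width of 0.05?
ME = width/2 = 0.05/2 = 0.025
Using the conservative values p1 = p2 = 0.5:
$$n_1 = n_2 = \frac{(1.645)^2[(0.5)(0.5) + (0.5)(0.5)]}{(0.025)^2} = 2164.82$$
Rounding up, n1 = n2 = 2165.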
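The same calculation for proportions, as a minimal sketch (`n_per_group_props` is a hypothetical name); p1 = p2 = 0.5 is the conservative choice described above.

```python
import math


def n_per_group_props(z_half_alpha, p1, p2, me):
    """Equal sample sizes n1 = n2 = z^2 (p1 q1 + p2 q2) / ME^2, rounded up."""
    return math.ceil(z_half_alpha**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / me**2)


# 90% confidence, width 0.05 -> ME = 0.025; conservative p1 = p2 = 0.5
print(n_per_group_props(1.645, 0.5, 0.5, 0.025))   # 2165
```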
8.6 Comparing Two Population Variances: Independent Sampling
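F-Test for Equal Population Variances
One-Tailed Test
H0: σ1² = σ2²
Ha: σ1² > σ2² (or Ha: σ1² < σ2²)
Test statistic:
$$F = \frac{s_1^2}{s_2^2} \quad \left(\text{or } F = \frac{s_2^2}{s_1^2} \text{ when } H_a\!: \sigma_1^2 < \sigma_2^2\right)$$
Rejection region: $F > F_\alpha$
Two-Tailed Test
H0: σ1² = σ2²
Ha: σ1² ≠ σ2²
Test statistic:
$$F = \frac{s_1^2}{s_2^2} \text{ when } s_1^2 > s_2^2 \quad \left(\text{or } F = \frac{s_2^2}{s_1^2} \text{ when } s_2^2 > s_1^2\right)$$
Rejection region: $F > F_{\alpha/2}$
• p-value = P(F* < 1/Fc) + P(F > Fc)
where $F_{\alpha/2}$ is based on ν1 = numerator degrees of freedom and ν2 = denominator degrees of freedom; ν1 and ν2 are the degrees of freedom for the numerator and denominator sample variances, respectively.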
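For the two-tailed test, the statistic puts the larger sample variance in the numerator, and the degrees of freedom follow that ordering. A minimal sketch from raw data (`f_statistic_two_tailed` is a hypothetical name); the critical value $F_{\alpha/2}$ still has to be looked up in an F table, since Python's standard library has no F distribution.

```python
from statistics import variance


def f_statistic_two_tailed(sample1, sample2):
    """Return (F, numerator df, denominator df) with the larger sample variance on top."""
    v1, v2 = variance(sample1), variance(sample2)   # sample variances (n - 1 divisor)
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    return v2 / v1, len(sample2) - 1, len(sample1) - 1


# Illustrative data only: variances 2.5 and 10
print(f_statistic_two_tailed([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # (4.0, 4, 4)
```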
Conditions Required for a Valid F-Test for
Equal Variances
Key Ideas
Key Words for Identifying the Target Parameter
σ1²/σ2²: Ratio (or difference) in variances, spreads
Key Ideas
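Determining the Sample Size
Estimating µ1 – µ2: $$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2}$$
Estimating p1 – p2: $$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1 q_1 + p_2 q_2)}{(ME)^2}$$
Estimating µd: $$n_d = \frac{(z_{\alpha/2})^2 \sigma_d^2}{(ME)^2}$$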
Key Ideas
Conditions Required for Inferences about µ1 – µ2
Large Samples:
1. Independent random samples
2. n1 ≥ 30, n2 ≥ 30
Small Samples:
1. Independent random samples
2. Both populations normal
3. σ1² = σ2²
Key Ideas
Conditions Required for Inferences about σ1²/σ2²
Key Ideas
Conditions Required for Inferences about µd
Large Samples:
1. Random sample of paired differences
2. nd ≥ 30
Small Samples:
1. Random sample of paired differences
2. Population of differences is normal
Key Ideas
Conditions Required for Inferences about p1 – p2
Large Samples:
1. Independent random samples
2. n1p1 ≥ 15, n1q1 ≥ 15
3. n2p2 ≥ 15, n2q2 ≥ 15
Key Ideas
Using a Confidence Interval for (µ1 – µ2) or (p1 – p2)
to Determine whether a Difference Exists
1. If the confidence interval contains only positive numbers (+, +): Infer µ1 > µ2 or p1 > p2
2. If the confidence interval contains only negative numbers (–, –): Infer µ1 < µ2 or p1 < p2
3. If the confidence interval includes 0 (–, +): Infer no evidence of a difference