Two Sample Test


HCMC University of Technology Probability and Statistics

Dung Nguyen

Inferences Based on Two Samples


Outline I

1 Two Independent samples

2 Analysis of Paired Data

Dung Nguyen Probability and Statistics 2/40


Two Independent samples

1 Two Independent samples


Introduction
Inferences for Two Population Means



Two Independent samples Introduction

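Basic Assumptions
1 $X_1, X_2, \ldots, X_m$ is a random sample from a distribution with mean $\mu_1$ and variance $\sigma_1^2$.
2 $Y_1, Y_2, \ldots, Y_n$ is a random sample from a distribution with mean $\mu_2$ and variance $\sigma_2^2$.
3 The X and Y samples are independent of one another.
If both X and Y are normal then
$$Z = \frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}} \sim N(0, 1)$$
The mean and variance of the difference are
$$E(\bar X - \bar Y) = E(\bar X) - E(\bar Y) = \mu_1 - \mu_2$$
$$V(\bar X - \bar Y) = V(\bar X) + V(\bar Y) = \frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}$$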



Two Independent samples Inferences for Two Population Means


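Normal Population + Known σ (Hypothesis Tests on the Difference in Means)
First of all, compute the statistic
$$z = \frac{(\bar x - \bar y) - \Delta}{se}, \qquad se = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$$
Then apply the following decision rule
H1                              Rejection Region
$\mu_1 - \mu_2 \ne \Delta$      $|z| > z_{\alpha/2}$
$\mu_1 - \mu_2 < \Delta$        $z < -z_{\alpha}$
$\mu_1 - \mu_2 > \Delta$        $z > z_{\alpha}$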


Example 1 - Gas Mileage


A consumer-research organization routinely selects several car
models each year and evaluates their fuel efficiency. In this
year’s study of two similar subcompact models from two different
automakers, the average gas mileage for twelve cars of brand A was
27.2 miles per gallon. The nine brand B cars that were tested
averaged 32.1 mpg. At α = 0.01 should it conclude that brand B cars
have higher average gas mileage than brand A cars do? Suppose that
two populations have normal distribution with standard deviations
3.8 mpg and 4.3 mpg respectively.


Solution
We test the following hypotheses
$$H_0: \mu_1 - \mu_2 \ge 0 \quad \text{vs.} \quad H_1: \mu_1 - \mu_2 < 0 \quad (\text{i.e. } \mu_1 < \mu_2)$$
Compute the statistic
$$z = \frac{(27.2 - 32.1) - 0}{\sqrt{\dfrac{3.8^2}{12} + \dfrac{4.3^2}{9}}} = -2.715$$
Since $c = -z_{0.01} = -2.326$ and $z < c$, we can reject $H_0$.
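
The calculation for the gas mileage example can be checked with a short Python sketch. The function name `two_sample_z` and its argument names are illustrative choices, not part of the slides; the critical value comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar, ybar, sigma1, sigma2, m, n, delta=0.0):
    """z statistic for H0: mu1 - mu2 = delta when both sigmas are known."""
    se = sqrt(sigma1**2 / m + sigma2**2 / n)
    return (xbar - ybar - delta) / se


# Gas mileage data: brand A (m = 12) vs. brand B (n = 9)
z = two_sample_z(27.2, 32.1, 3.8, 4.3, 12, 9)

# Lower-tailed test at alpha = 0.01: reject when z < -z_alpha
crit = NormalDist().inv_cdf(0.01)
reject = z < crit

print(round(z, 3), round(crit, 3), reject)
```

The same function covers the other two alternatives by changing only the comparison against the critical value.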


Example 2 - Drying Time


A product developer is interested in reducing the drying time of a
primer paint. Two formulations of the paint are tested;
formulation 1 is the standard chemistry, and formulation 2 has a
new drying ingredient that should reduce the drying time. From
experience, it is known that the standard deviation of drying time
is 8 minutes, and this inherent variability should be unaffected by
the addition of the new ingredient. Ten specimens are painted with
formulation 1, and another 10 specimens are painted with
formulation 2; the 20 specimens are painted in random order. The
two sample average drying times are $\bar x = 121$ minutes and $\bar y = 112$
minutes, respectively. What conclusions can the product developer
draw about the effectiveness of the new ingredient, using α = 0.05?


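Solution
We test the following hypotheses
$$H_0: \mu_1 - \mu_2 \le 0 \quad \text{vs.} \quad H_1: \mu_1 - \mu_2 > 0 \quad (\text{i.e. } \mu_1 > \mu_2).$$
With $m = n = 10$, $\bar x = 121$, $\bar y = 112$ and $\sigma_1 = \sigma_2 = 8$,
$$z = \frac{(121 - 112) - 0}{\sqrt{\dfrac{8^2}{10} + \dfrac{8^2}{10}}} = 2.52.$$
Since $z > z_{0.05} = 1.645$, we reject $H_0$.

Interpretation: We can conclude that adding the new ingredient to
the paint significantly reduces the drying time.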


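Normal Population + Known σ (Confidence Interval on a Difference in Means)
If both X and Y are normal then
$$Z = \frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}} \sim N(0, 1)$$
A 100(1 − α)% confidence interval for µ1 − µ2 is
$$\mu_1 - \mu_2 = (\bar X - \bar Y) \pm z_{\alpha/2} \, se$$
where
$$se = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$$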


Example 3 - Wings of aircrafts


Tensile strength tests were performed on two different grades of
aluminum spars used in manufacturing the wing of a commercial
transport aircraft. From past experience with the spar
manufacturing process and the testing procedure, the standard
deviations of tensile strengths are assumed to be known. The data
obtained are as follows: $m = 10$, $\bar x = 87.6$, $\sigma_1 = 1$, $n = 12$, $\bar y = 74.5$, $\sigma_2 = 1.5$.
If µ1 and µ2 denote the true mean tensile strengths for the two
grades of spars, determine a 90% confidence interval on the
difference in mean strength µ1 − µ2 .
Solution
$$\mu_1 - \mu_2 = (87.6 - 74.5) \pm 1.645 \sqrt{\frac{1^2}{10} + \frac{1.5^2}{12}} = [12.22, 13.98]$$
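
As a check on the interval for the aluminum spar data, here is a minimal Python sketch. The helper name `z_interval` and its parameters are assumptions made for illustration; the quantile is taken from `statistics.NormalDist` rather than from a table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, ybar, sigma1, sigma2, m, n, conf=0.95):
    """Two-sided CI for mu1 - mu2 when both sigmas are known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    se = sqrt(sigma1**2 / m + sigma2**2 / n)
    d = xbar - ybar
    return d - z * se, d + z * se


# Tensile strength data, 90% confidence level
lo, hi = z_interval(87.6, 74.5, 1.0, 1.5, 10, 12, conf=0.90)
print(round(lo, 2), round(hi, 2))
```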


Normal Population + Unknown σ + Equal Variances (Hypothesis Tests on the Difference in Means)
If both X and Y are normal then
$$\frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{se} \sim t_{m+n-2}$$
where
$$se = s \sqrt{\frac{1}{m} + \frac{1}{n}} \quad \text{and} \quad s^2 = \frac{(m - 1)s_1^2 + (n - 1)s_2^2}{m + n - 2}.$$
First of all, compute the statistic
$$t = \frac{(\bar x - \bar y) - \Delta}{se}$$
Then apply the following decision rule
H1                              Rejection Region
$\mu_1 - \mu_2 \ne \Delta$      $|t| > t_{\alpha/2,\,m+n-2}$
$\mu_1 - \mu_2 < \Delta$        $t < -t_{\alpha,\,m+n-2}$
$\mu_1 - \mu_2 > \Delta$        $t > t_{\alpha,\,m+n-2}$


Example 4 - Online vs Classroom


The course coordinator wants to determine if two ways of taking the
course resulted in a significant difference in achievement as
measured by the final exam for the course. The following table
gives the scores on an examination with 45 possible points for two
groups.
Online 32 37 35 28 41 44 35 31 34
Classroom 35 31 29 25 34 40 27 32 31
Do these data present sufficient evidence to indicate that the
average grade for students who take the course online is
significantly higher than for those who attend a conventional
class? Assume that both sampled populations are normal with equal
variances, and use the significance level α = 0.01.


Solution
We test the following hypotheses
$$H_0: \mu_1 - \mu_2 \le 0 \quad \text{vs.} \quad H_1: \mu_1 - \mu_2 > 0 \quad (\text{i.e. } \mu_1 > \mu_2)$$
Compute $m = 9$, $n = 9$, $\bar x = 35.22$, $\bar y = 31.56$, $s_1 = 4.94$, $s_2 = 4.48$. Thus
$$s^2 = \frac{(9 - 1)4.94^2 + (9 - 1)4.48^2}{9 + 9 - 2} = 22.2361$$
and the statistic
$$t = \frac{(35.22 - 31.56) - 0}{\sqrt{22.2361 \left( \dfrac{1}{9} + \dfrac{1}{9} \right)}} = 1.6495$$
Since $c = t_{0.01,\,9+9-2} = t_{0.01,16} = 2.583 > t$, we fail to reject $H_0$.
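
The pooled t statistic can also be computed directly from the raw scores. The sketch below uses only the Python standard library; the function name `pooled_t` is an illustrative choice, and since the standard library has no t quantile function, the critical value is the tabled $t_{0.01,16} = 2.583$ quoted in the solution.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y, delta=0.0):
    """Pooled two-sample t statistic and its degrees of freedom."""
    m, n = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances
    s2 = ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)
    se = sqrt(s2 * (1 / m + 1 / n))
    return (mean(x) - mean(y) - delta) / se, m + n - 2


online = [32, 37, 35, 28, 41, 44, 35, 31, 34]
classroom = [35, 31, 29, 25, 34, 40, 27, 32, 31]

t, df = pooled_t(online, classroom)
T_CRIT = 2.583  # t_{0.01,16}, taken from a t table
print(round(t, 4), df, t > T_CRIT)
```

Working from the raw data avoids the small rounding differences that appear when the rounded values of $s_1$ and $s_2$ are plugged in by hand.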


Example 5 - Yield from a Catalyst


Two catalysts are being analyzed to determine how they affect the
mean yield of a chemical process. Specifically, catalyst 1 is
currently in use, but catalyst 2 is acceptable. Since catalyst 2
is cheaper, it should be adopted, providing it does not change the
process yield. A test is run in the pilot plant and results in the
data shown in the table. Is there any difference between the mean
yields? Use α = 0.05, and assume equal variances.

Obs. 1 2 3 4 5 6 7 8
Cat. 1 91.50 94.18 92.18 95.39 91.79 89.07 94.72 89.21
Cat. 2 89.19 90.95 90.46 93.21 97.19 97.04 91.07 92.75


Solution
We test the following hypotheses
$$H_0: \mu_1 - \mu_2 = 0 \quad \text{vs.} \quad H_1: \mu_1 - \mu_2 \ne 0.$$
$\bar x = 92.255$, $\bar y = 92.733$, $s_1 = 2.39$, $s_2 = 2.98$. Then $s^2 = 7.30$ and $t_0 = -0.35$.
The null hypothesis cannot be rejected.

Interpretation: At the 5% level of significance, we do not have strong
evidence to conclude that catalyst 2 results in a mean yield that
differs from the mean yield when catalyst 1 is used.
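
The intermediate arithmetic behind these values can be written out as follows, using $m = n = 8$ from the data table. The critical value $t_{0.025,14} = 2.145$ is taken from a standard t table and is not quoted in the solution above.

$$s^2 = \frac{(8 - 1)(2.39)^2 + (8 - 1)(2.98)^2}{8 + 8 - 2} \approx 7.30, \qquad se = \sqrt{7.30 \left( \frac{1}{8} + \frac{1}{8} \right)} \approx 1.35$$

$$t_0 = \frac{92.255 - 92.733}{1.35} \approx -0.35, \qquad |t_0| = 0.35 < t_{0.025,14} = 2.145$$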


Normal Population + Unknown σ + Equal Variances (Confidence Interval on a Difference in Means)
If both X and Y are normal then
$$\frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{se} \sim t_{m+n-2}$$
where
$$se = s \sqrt{\frac{1}{m} + \frac{1}{n}} \quad \text{and} \quad s^2 = \frac{(m - 1)s_1^2 + (n - 1)s_2^2}{m + n - 2}.$$

A 100(1 − α)% confidence interval for µ1 − µ2 is
$$\mu_1 - \mu_2 = (\bar X - \bar Y) \pm t_{\alpha/2,\,m+n-2} \cdot se$$


Example 6 - Cement Hydration


Ten samples of standard cement had an average weight percent
calcium of $\bar x = 90.0$ with a sample standard deviation of $s_1 = 5.0$, and
15 samples of the lead-doped cement had an average weight percent
calcium of $\bar y = 87.0$ with a sample standard deviation of $s_2 = 4.0$.
Assume that weight percent calcium is normally distributed with
same standard deviation. Find a 95% confidence interval on the
difference in means, µ1 − µ2 , for the two types of cement.

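Solution
$$s^2 = \frac{(10 - 1)5.0^2 + (15 - 1)4.0^2}{10 + 15 - 2} = 19.52 \implies se = \sqrt{19.52 \left( \frac{1}{10} + \frac{1}{15} \right)} = 1.804.$$
With $t_{0.025,23} = 2.069$,
$$\mu_1 - \mu_2 = (90.0 - 87.0) \pm (2.069)(1.804) = [-0.73, 6.73]$$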
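
When only summary statistics are available, the pooled interval can be sketched as below. The function name `pooled_interval` is hypothetical, and the critical value $t_{0.025,23} = 2.069$ is passed in from a t table because the Python standard library does not provide t quantiles. The code keeps unrounded intermediate values, so the last digit can differ slightly from a hand calculation that rounds $s$ first.

```python
from math import sqrt


def pooled_interval(xbar, ybar, s1, s2, m, n, t_crit):
    """Pooled-variance CI for mu1 - mu2 from summary statistics."""
    s2_pooled = ((m - 1) * s1**2 + (n - 1) * s2**2) / (m + n - 2)
    se = sqrt(s2_pooled * (1 / m + 1 / n))
    d = xbar - ybar
    return d - t_crit * se, d + t_crit * se, s2_pooled, se


T_CRIT = 2.069  # t_{0.025,23}, taken from a t table
lo, hi, s2_pooled, se = pooled_interval(90.0, 87.0, 5.0, 4.0, 10, 15, T_CRIT)
print(round(s2_pooled, 2), round(se, 3), round(lo, 2), round(hi, 2))
```

Because the interval contains 0, the data do not rule out equal mean calcium content at the 95% level.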


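Normal Population + Unknown σ + Unequal Variances

If both X and Y are normal then
$$\frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}} \approx t_{\nu}$$
where
$$\nu = \frac{\left( \dfrac{s_1^2}{m} + \dfrac{s_2^2}{n} \right)^2}{\dfrac{1}{m-1} \left( \dfrac{s_1^2}{m} \right)^2 + \dfrac{1}{n-1} \left( \dfrac{s_2^2}{n} \right)^2}.$$

Example 1
Determine the number of degrees of freedom for the two-sample t
test or CI in each of the following situations:
1 $m = 10$, $n = 10$, $s_1 = 5.0$, $s_2 = 6.0 \implies \nu = 17.433 \approx 17$.
2 $m = 10$, $n = 15$, $s_1 = 5.0$, $s_2 = 6.0 \implies \nu = 21.711 \approx 22$.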
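
The degrees-of-freedom formula is easy to get wrong by hand, so a small Python sketch may help. The function name `welch_df` is an illustrative choice; it returns the unrounded value of ν and leaves the rounding to the reader.

```python
def welch_df(s1, s2, m, n):
    """Approximate degrees of freedom for the unequal-variance t procedure."""
    a = s1**2 / m  # estimated variance of the first sample mean
    b = s2**2 / n  # estimated variance of the second sample mean
    return (a + b) ** 2 / (a**2 / (m - 1) + b**2 / (n - 1))


nu_1 = welch_df(5.0, 6.0, 10, 10)
nu_2 = welch_df(5.0, 6.0, 10, 15)
print(round(nu_1, 3), round(nu_2, 3))
```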


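Normal Population + Unknown σ + Unequal Variances (Hypothesis Tests on the Difference in Means)
The Two-Sample t test for testing $H_0: \mu_1 - \mu_2 = \Delta_0$
We can test hypotheses about this difference based on the statistic
$$T = \frac{(\bar x - \bar y) - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$$
H1                                Rejection Region
$\mu_1 - \mu_2 \ne \Delta_0$      $|t| > t_{\alpha/2,\nu}$
$\mu_1 - \mu_2 > \Delta_0$        $t > t_{\alpha,\nu}$
$\mu_1 - \mu_2 < \Delta_0$        $t < -t_{\alpha,\nu}$
where
$$\nu = \frac{\left( \dfrac{s_1^2}{m} + \dfrac{s_2^2}{n} \right)^2}{\dfrac{1}{m-1} \left( \dfrac{s_1^2}{m} \right)^2 + \dfrac{1}{n-1} \left( \dfrac{s_2^2}{n} \right)^2}.$$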


Example 7 - Arsenic in Drinking Water


Arsenic concentration in public drinking water supplies is a potential
health risk. An article in the Arizona Republic (May 27, 2001) reported
drinking water arsenic concentrations (in ppb) for 10 metropolitan Phoenix
communities and 10 communities in rural Arizona.

Metro Phoenix              Rural Arizona
Phoenix            3       Rimrock              48
Chandler           7       Goodyear             44
Gilbert           25       New River            40
Glendale          10       Apache Junction      38
Mesa              15       Buckeye              33
Paradise Valley    6       Nogales              21
Peoria            12       Black Canyon City    20
Scottsdale        25       Sedona               12
Tempe             15       Payson                1
Sun City           7       Casa Grande          18
Determine if there is any difference in mean arsenic concentrations
between metropolitan Phoenix communities and communities in rural Arizona.


Solution
We test the following hypotheses
$$H_0: \mu_1 = \mu_2 \quad \text{vs.} \quad H_1: \mu_1 \ne \mu_2$$
$\bar x = 12.5$, $\bar y = 27.5$, $s_1 = 7.63$, $s_2 = 15.3$
$\implies t_0 = -2.77$, $\nu = 13.2 \approx 13$, $t_{0.025,13} = 2.160$. Thus we reject the null
hypothesis.

Interpretation: We can conclude that mean arsenic concentration in
the drinking water in rural Arizona is different from the mean
arsenic concentration in metropolitan Phoenix drinking water.
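
The statistic and degrees of freedom for the arsenic data can be reproduced from the raw table with the sketch below. The function name `welch_t` is an illustrative choice, and the critical value $t_{0.025,13} = 2.160$ is the tabled value quoted in the solution, since the Python standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y, delta=0.0):
    """Unequal-variance two-sample t statistic and approximate df."""
    m, n = len(x), len(y)
    a, b = variance(x) / m, variance(y) / n
    t = (mean(x) - mean(y) - delta) / sqrt(a + b)
    df = (a + b) ** 2 / (a**2 / (m - 1) + b**2 / (n - 1))
    return t, df


metro = [3, 7, 25, 10, 15, 6, 12, 25, 15, 7]
rural = [48, 44, 40, 38, 33, 21, 20, 12, 1, 18]

t0, nu = welch_t(metro, rural)
T_CRIT = 2.160  # t_{0.025,13}, taken from a t table
print(round(t0, 2), round(nu, 1), abs(t0) > T_CRIT)
```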


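Normal Population + Unknown σ + Unequal Variances (Confidence Interval on a Difference in Means)
The Two-Sample t Confidence Interval for µ1 − µ2
$$\bar X - \bar Y \pm t_{\alpha/2,\nu} \sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$
where
$$\nu = \frac{\left( \dfrac{s_1^2}{m} + \dfrac{s_2^2}{n} \right)^2}{\dfrac{1}{m-1} \left( \dfrac{s_1^2}{m} \right)^2 + \dfrac{1}{n-1} \left( \dfrac{s_2^2}{n} \right)^2}$$

A one-sided CI can be calculated as described earlier.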


Example 8 -
The void volume within a textile fabric affects comfort,
flammability, and insulation properties. Permeability of a fabric
refers to the accessibility of void space to the flow of a gas or
liquid. An article gave summary information on air permeability
(cm³/cm²/sec) for a number of different fabric types. Consider the
following data on two different types of plain-weave fabric:

Fabric Type    Sample Size    Sample Mean    Sample Std
Cotton         10             51.71          0.79
Triacetate     10             136.14         3.59
Assuming that the porosity distributions for both types of fabric
are normal, let’s calculate a confidence interval for the
difference between true average porosity for the cotton fabric and
that for the acetate fabric, using γ = 95%.


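Solution
$$se = \sqrt{\frac{0.79^2}{10} + \frac{3.59^2}{10}} = 1.162$$
$$df = \frac{\left( \dfrac{0.79^2}{10} + \dfrac{3.59^2}{10} \right)^2}{\dfrac{1}{9} \left( \dfrac{0.79^2}{10} \right)^2 + \dfrac{1}{9} \left( \dfrac{3.59^2}{10} \right)^2} = 9.8696$$
Thus, with $t_{0.025,10} = 2.228$,
$$\mu_1 - \mu_2 = (51.71 - 136.14) \pm (2.228)(1.162) = [-87.02, -81.84].$$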


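Large Sample Size

If m and n are large then
$$Z = \frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}} \approx N(0, 1)$$
First of all, compute the statistic
$$z = \frac{(\bar x - \bar y) - \Delta}{se}, \qquad se = \sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$
Then apply the following decision rule
H1                              Rejection Region
$\mu_1 - \mu_2 \ne \Delta$      $|z| > z_{\alpha/2}$
$\mu_1 - \mu_2 < \Delta$        $z < -z_{\alpha}$
$\mu_1 - \mu_2 > \Delta$        $z > z_{\alpha}$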


Example 9 - Battery Lifetime


To compare the average life of two brands of 9-volt batteries, a
sample of 100 batteries from each brand is tested. The sample
selected from the first brand shows an average life of 47 hours and
a standard deviation of 4 hours. A mean life of 48 hours and a
standard deviation of 3 hours are recorded for the sample from the
second brand. Is the observed difference between the means of the
two samples significant at the 0.01 level?
Solution
We test the following hypotheses
$$H_0: \mu_1 - \mu_2 = 0 \quad \text{vs.} \quad H_1: \mu_1 - \mu_2 \ne 0 \quad (\text{i.e. } \mu_1 \ne \mu_2)$$
Compute the statistic
$$z = \frac{(47 - 48) - 0}{\sqrt{\dfrac{4^2}{100} + \dfrac{3^2}{100}}} = -2.$$
Since $c = z_{0.005} = 2.58$ and $|z| < c$, we cannot reject $H_0$.
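
An equivalent way to reach the same decision is through the two-sided p-value, which the slides do not compute. The sketch below is one way to do it with the standard library's `NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Battery data: m = n = 100, sample means 47 and 48, sample sds 4 and 3
se = sqrt(4**2 / 100 + 3**2 / 100)
z = (47 - 48 - 0) / se

# Two-sided p-value: probability of a standard normal beyond |z| in either tail
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
alpha = 0.01
print(round(z, 2), round(p_value, 4), p_value < alpha)
```

The p-value is larger than 0.01, which matches the conclusion that $H_0$ cannot be rejected at that level, although it would be rejected at the 0.05 level.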


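Summary
Compute
$$z = \frac{(\bar x - \bar y) - \Delta}{se} \quad \text{or} \quad t = \frac{(\bar x - \bar y) - \Delta}{se}$$
Test two means, by population distribution:
Normal, known σ²: $se = \sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}$, reference distribution N(0, 1)
Normal, unknown σ², equal variances: $se = s \sqrt{\dfrac{1}{m} + \dfrac{1}{n}}$, reference distribution t(m + n − 2)
Any distribution, $m, n \gg 1$: $se = \sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}$, reference distribution N(0, 1)

where
$$s = \sqrt{\frac{(m - 1)s_1^2 + (n - 1)s_2^2}{m + n - 2}}$$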



Analysis of Paired Data

2 Analysis of Paired Data


Normal distribution
Inferences for Population Proportions (Large-Sample)



Analysis of Paired Data Normal distribution


Distribution of the Sample Differences

Assumptions
The data consists of n independently selected pairs $(X_1, Y_1)$,
$(X_2, Y_2), \ldots, (X_n, Y_n)$, with $E(X_i) = \mu_1$ and $E(Y_i) = \mu_2$. Let
$$D_1 = X_1 - Y_1, \quad D_2 = X_2 - Y_2, \quad \ldots, \quad D_n = X_n - Y_n$$
So the $D_i$'s are the differences within pairs. Then the $D_i$'s are
assumed to be normally distributed with mean $\mu_D$ and variance $\sigma_D^2$.

Remark
Let D = X − Y. Then the expected difference is
$$\mu_D = E(X - Y) = E(X) - E(Y) = \mu_1 - \mu_2.$$
Then the $D_i$'s constitute a normal random sample with mean $\mu_D$. Moreover,
$$T = \frac{\bar D - \mu_D}{s_D / \sqrt{n}} \sim t_{n-1}$$


CI and HT on the Difference in Means

Confidence Intervals
The paired t CI for $\mu_D$ is
$$\bar D \pm t_{\alpha/2,\,n-1} \frac{S_D}{\sqrt{n}}$$
A one-sided confidence bound results from retaining the relevant
sign and replacing $t_{\alpha/2}$ by $t_{\alpha}$.

Hypothesis Testing
Test statistic
$$T = \frac{\bar D - \Delta_0}{S_D / \sqrt{n}}$$
H1                        Rejection Region
$\mu_D \ne \Delta_0$      $|T| > t_{\alpha/2,\,n-1}$
$\mu_D > \Delta_0$        $T > t_{\alpha,\,n-1}$
$\mu_D < \Delta_0$        $T < -t_{\alpha,\,n-1}$


Example 10 - Parallel Parking

The journal Human Factors (1962, pp. 375–380) reported a study in
which 14 subjects were asked to parallel park two cars having very
different wheel bases and turning radii. The time in seconds for
each subject was recorded and is given in the table. Find the 90%
confidence interval for $\mu_D = \mu_1 - \mu_2$.

Subject    1st car    2nd car    Difference
1          37.0       17.8        19.2
2          25.8       20.2         5.6
3          16.2       16.8        -0.6
4          24.2       41.4       -17.2
5          22.0       21.4         0.6
6          33.4       38.4        -5.0
7          23.8       16.8         7.0
8          58.2       32.2        26.0
9          33.6       27.8         5.8
10         24.4       23.2         1.2
11         23.4       29.6        -6.2
12         21.2       20.6         0.6
13         36.2       32.2         4.0
14         29.8       53.8       -24.0


Solution
From the column of observed differences, we calculate $\bar D = 1.21$ and
$s_D = 12.68$. Thus
$$\mu_D = 1.21 \pm (1.771) \frac{12.68}{\sqrt{14}} = [-4.79, 7.21]$$
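
The paired interval can be recomputed from the two timing columns with the sketch below. The function name `paired_interval` is hypothetical, and $t_{0.05,13} = 1.771$ is the tabled value used in the solution. Since the code does not round $\bar D$ and $s_D$ before combining them, the endpoints can differ from the hand calculation by about 0.01.

```python
from math import sqrt
from statistics import mean, stdev


def paired_interval(x, y, t_crit):
    """Paired t CI for mu_D based on the within-pair differences."""
    d = [a - b for a, b in zip(x, y)]
    half_width = t_crit * stdev(d) / sqrt(len(d))
    return mean(d) - half_width, mean(d) + half_width


first_car = [37.0, 25.8, 16.2, 24.2, 22.0, 33.4, 23.8,
             58.2, 33.6, 24.4, 23.4, 21.2, 36.2, 29.8]
second_car = [17.8, 20.2, 16.8, 41.4, 21.4, 38.4, 16.8,
              32.2, 27.8, 23.2, 29.6, 20.6, 32.2, 53.8]

T_CRIT = 1.771  # t_{0.05,13}, taken from a t table
lo, hi = paired_interval(first_car, second_car, T_CRIT)
print(round(lo, 2), round(hi, 2))
```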


Example 11 - Zinc Concentration


Trace metals in drinking water affect the flavor, and unusually
high concentrations can pose a health hazard. An article reports
on a study in which six river locations were selected (six
experimental objects) and the zinc concentration (mg/L) determined
for both surface water and bottom water at each location. The six
pairs of observations are displayed in the accompanying table.
Does the data suggest that true average concentration in bottom
water exceeds that of surface water? (α = 0.05)

Zinc concentration        1        2        3        4        5        6
in bottom water (x)       0.430    0.266    0.567    0.531    0.707    0.716
in surface water (y)      0.415    0.238    0.390    0.410    0.605    0.609
Difference                0.015    0.028    0.177    0.121    0.102    0.107


Solution
We test the following hypotheses
$$H_0: \mu_1 - \mu_2 \le 0 \quad \text{vs.} \quad H_1: \mu_1 - \mu_2 > 0.$$
$\bar D = 0.0917$, $s_D = 0.0607$. Compute the statistic
$$t = \frac{0.0917 - 0}{0.0607 / \sqrt{6}} = 3.6998.$$
Since $t > t_{0.05,5} = 2.015$, we reject $H_0$.
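
A short Python sketch of the paired t statistic for the zinc data follows. The function name `paired_t` is an illustrative choice, and the critical value $t_{0.05,5} = 2.015$ is the tabled value from the solution.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y, delta=0.0):
    """Paired t statistic for H0: mu_D = delta, with n - 1 degrees of freedom."""
    d = [a - b for a, b in zip(x, y)]
    t = (mean(d) - delta) / (stdev(d) / sqrt(len(d)))
    return t, len(d) - 1


bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716]
surface = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609]

t, df = paired_t(bottom, surface)
T_CRIT = 2.015  # t_{0.05,5}, taken from a t table
print(round(t, 2), df, t > T_CRIT)
```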



Analysis of Paired Data Inferences for Population Proportions (Large-Sample)

Distribution of the Difference in Proportions

Proposition
Let $\hat p_1 = X/m$ and $\hat p_2 = Y/n$, where $X \sim B(m, p_1)$ and $Y \sim B(n, p_2)$ with $X \perp Y$.
Then
$$E(\hat p_1 - \hat p_2) = p_1 - p_2$$
So $\hat p_1 - \hat p_2$ is an unbiased estimator of $p_1 - p_2$, and
$$V(\hat p_1 - \hat p_2) = \frac{p_1 q_1}{m} + \frac{p_2 q_2}{n}$$
where $q_i = 1 - p_i$. The following test statistic is distributed approximately as
standard normal and is the basis of the test:
$$Z = \frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{m} + \dfrac{p_2 q_2}{n}}}$$


Example 12 - Crankshaft Bearings


Consider the process of manufacturing crankshaft bearings. Suppose
that a modification is made in the surface finishing process and
that, subsequently, a second random sample of 85 bearings is
obtained. The number of defective bearings in this second sample
is 8. Suppose that
m = 85, p̂1 = 10/85 = 0.1176, n = 85, p̂2 = 8/85 = 0.0941
Obtain an approximate 95% confidence interval on the difference in
the proportion of defective bearings produced under the two
processes.

Solution
$$se = \sqrt{\frac{(0.1176)(0.8824)}{85} + \frac{(0.0941)(0.9059)}{85}} = 0.0472$$
$$p_1 - p_2 = (0.1176 - 0.0941) \pm 1.96(0.0472) \approx [-0.069, 0.116]$$
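
The interval for the crankshaft bearing data can be recomputed from the raw counts, as in the sketch below. The function name `two_prop_interval` is an assumption for illustration. Because it works with unrounded proportions, its endpoints may differ from hand-rounded values in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_interval(x, m, y, n, conf=0.95):
    """Large-sample CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x / m, y / n
    se = sqrt(p1 * (1 - p1) / m + p2 * (1 - p2) / n)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    d = p1 - p2
    return d - z * se, d + z * se, se


# 10 defectives out of 85 before the change, 8 out of 85 after
lo, hi, se = two_prop_interval(10, 85, 8, 85)
print(round(se, 4), round(lo, 3), round(hi, 3))
```

The interval contains 0, so the data do not show a clear change in the proportion of defective bearings.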


Large-Sample (Hypothesis Tests on the Difference in Proportion)
Although for population means the case ∆0 ≠ 0 presented no
difficulties, for population proportions ∆0 = 0 and ∆0 ≠ 0 must be
considered separately. Since the vast majority of actual problems
of this sort involve ∆0 = 0, we’ll concentrate on this case.

A Large-Sample z Test of $H_0: p_1 - p_2 = 0$

Test statistic
$$Z = \frac{(\hat p_1 - \hat p_2) - \Delta_0}{\sqrt{\hat p \hat q \left( \dfrac{1}{m} + \dfrac{1}{n} \right)}}, \qquad \hat p = \frac{m \hat p_1 + n \hat p_2}{m + n}, \qquad \hat q = 1 - \hat p$$
H1                     Rejection Region
$p_1 - p_2 \ne 0$      $|Z| > z_{\alpha/2}$
$p_1 - p_2 > 0$        $Z > z_{\alpha}$
$p_1 - p_2 < 0$        $Z < -z_{\alpha}$
The test can safely be used as long as $m \hat p_1$, $m \hat q_1$, $n \hat p_2$, and $n \hat q_2$ are all
at least 10.


Example 13 - St. John’s Wort


Extracts of St. John’s Wort are widely used to treat depression.
An article in the April 18, 2001, issue of the Journal of the
American Medical Association compared the efficacy of a standard
extract of St. John’s Wort with a placebo in 200 outpatients
diagnosed with major depression. Patients were randomly assigned
to two groups; one group received the St. John’s Wort, and the
other received the placebo. After eight weeks, 19 of the placebo-
treated patients showed improvement, and 27 of those treated with
St. John’s Wort improved. Is there any reason to believe that St.
John’s Wort is effective in treating major depression? Use α = 0.05.

Solution
$\hat p_1 = 27/100 = 0.27$, $\hat p_2 = 19/100 = 0.19$, $m = n = 100$. Then
$$\hat p = \frac{19 + 27}{100 + 100} = 0.23 \quad \text{and} \quad SE = \sqrt{0.23(1 - 0.23)(1/100 + 1/100)} = 0.0595.$$
Since $z_0 = 1.34$ and $z_{0.025} = 1.96$, we cannot reject the null hypothesis.
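
The pooled test for the St. John's Wort data can be sketched in Python as follows. The function name `pooled_two_prop_z` is hypothetical; the critical value comes from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def pooled_two_prop_z(x, m, y, n):
    """z statistic for H0: p1 - p2 = 0 using the pooled proportion."""
    p1, p2 = x / m, y / n
    p = (x + y) / (m + n)  # pooled estimate under H0
    se = sqrt(p * (1 - p) * (1 / m + 1 / n))
    return (p1 - p2) / se, p, se


# 27 of 100 improved on St. John's Wort, 19 of 100 on placebo
z0, p_pooled, se = pooled_two_prop_z(27, 100, 19, 100)
crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # z_{0.025}
print(round(p_pooled, 2), round(se, 4), round(z0, 2), abs(z0) > crit)
```

Note that the pooled standard error is used here only because the null hypothesis states $p_1 = p_2$; a confidence interval would use the unpooled form.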

Large-Sample (Confidence Interval on the Difference in Proportion)
A 100(1 − α)% CI for $p_1 - p_2$ is
$$(\hat p_1 - \hat p_2) \pm z_{\alpha/2} \sqrt{\frac{\hat p_1 \hat q_1}{m} + \frac{\hat p_2 \hat q_2}{n}}$$

This interval can safely be used as long as $m \hat p_1$, $m \hat q_1$, $n \hat p_2$, and $n \hat q_2$
are all at least 10.
A one-sided confidence bound results from retaining the
relevant sign and replacing $z_{\alpha/2}$ by $z_{\alpha}$.
The estimated standard deviation of $\hat p_1 - \hat p_2$ is different here
from what it was for hypothesis testing when ∆0 = 0.
