CH06

This document discusses inferences about differences between two population parameters. Specifically, it covers constructing confidence intervals and testing hypotheses about the difference between two population means (μ1 - μ2) and proportions (p1 - p2) when samples are large and independent. The key points are: 1. When samples are large and independent, the sampling distribution of the difference between the two sample means (x1 - x2) is approximately normal. 2. Formulas are provided to calculate the mean, standard deviation, and a confidence interval for the difference between two population means (μ1 - μ2) based on the normal distribution. 3. Examples demonstrate how to use the formulas to construct a 95% confidence

Uploaded by

ISLAM KHALED ZSC
Copyright
© © All Rights Reserved

CHAPTER (6)
ESTIMATION AND HYPOTHESIS TESTING:
TWO POPULATIONS

Chapters 4 and 5 discussed the estimation and
hypothesis-testing procedures for μ and p involving a single
population. This chapter extends the discussion of estimation
and hypothesis testing procedures to the difference between
two population means and the difference between two
population proportions. For example, we may want to make a
confidence interval for the difference between mean prices of
houses in Cairo and in Sohag. Or we may want to test the
hypothesis that the mean price of houses in Cairo is different
from that in Sohag. As another example, we may want to
make a confidence interval for the difference between the
proportions of all male and female adults who abstain from
smoking. Or we may want to test the hypothesis that the
proportion of all adult males who abstain from smoking is
different from the proportion of all adult females who abstain
from smoking. Constructing confidence intervals and testing
hypotheses about population parameters are referred to as
making inferences.

6.1 INFERENCES ABOUT THE DIFFERENCE
BETWEEN TWO POPULATION MEANS
FOR LARGE AND INDEPENDENT
SAMPLES
Let μ1 be the mean of the first population and μ2 be the
mean of the second population. Suppose we want to make a
confidence interval and test a hypothesis about the difference
between these two population means, that is, μ1 − μ2. Let x̄1 be
the mean of a sample taken from the first population and x̄2
be the mean of a sample taken from the second population.
Then x̄1 − x̄2 is the sample statistic that is used to make an
interval estimate and to test a hypothesis about μ1 − μ2. This
section discusses how to make confidence intervals and test
hypotheses about μ1 − μ2 when the two samples are large and
independent. As discussed in earlier chapters, in the case of
μ, a sample is considered to be large if it contains 30 or more
observations. The concept of independent and dependent
samples is explained next.

6.1.1 INDEPENDENT VERSUS DEPENDENT SAMPLES


Two samples are independent if they are drawn from
two different populations and the elements of one sample
have no relationship to the elements of the second sample. If
the elements of the two samples are somehow related, then
the samples are said to be dependent. Thus, in two
independent samples, the selection of one sample has no
effect on the selection of the second sample.

INDEPENDENT VERSUS DEPENDENT SAMPLES


Two samples drawn from two populations are independent
if the selection of one sample from one population does not
affect the selection of the second sample from the second
population. Otherwise, the samples are dependent.

Examples 6-1 and 6-2 illustrate independent and
dependent samples, respectively.
Example 6-1: Suppose we want to estimate the difference
between the mean salaries of all male and female executives.
To do so, we draw two samples, one from the population of
male executives and another from the population of female
executives. These two samples are independent because they
are drawn from two different populations and the samples
have no effect on each other.
Example 6-2: Suppose we want to estimate the difference
between the mean weights of all participants before and after
a weight loss program. To accomplish this, suppose we take a
sample of 40 participants and measure their weights before
and after the completion of the program. Note that these two
samples include the same 40 participants. This is an example of two
dependent samples. Such samples are also called paired or
matched samples.
Sections 6.1 and 6.2 discuss how to make confidence
intervals and test hypotheses about the
difference between two population parameters when samples
are independent. Section 6.3 discusses how to make
confidence intervals and test hypotheses about the difference
between two population means when samples are dependent.

6.1.2 MEAN, STANDARD DEVIATION, AND
SAMPLING DISTRIBUTION OF x̄1 − x̄2
Suppose we draw two (independent) large samples
from two different populations that are referred to as
population 1 and population 2. Let
μ1 = the mean of population 1
μ2 = the mean of population 2
σ1 = the standard deviation of population 1
σ2 = the standard deviation of population 2
n1 = the size of the sample drawn from population 1
(n1 ≥ 30)
n2 = the size of the sample drawn from population 2
(n2 ≥ 30)
x̄1 = the mean of the sample drawn from population 1
x̄2 = the mean of the sample drawn from population 2
Then, from the central limit theorem, x̄1 is approximately
normally distributed with mean μ1 and standard
deviation σ1/√n1, and x̄2 is approximately normally
distributed with mean μ2 and standard deviation σ2/√n2.
Using these results, we can now make the following
statements about the mean, the standard deviation, and the
shape of the sampling distribution of x̄1 − x̄2. Figure 6.1
shows the sampling distribution of x̄1 − x̄2.
1. The mean of x̄1 − x̄2, denoted by μ_{x̄1 − x̄2}, is
$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$$
2. The standard deviation of x̄1 − x̄2, denoted by σ_{x̄1 − x̄2}, is
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
3. Regardless of the shapes of the two populations, the shape
of the sampling distribution of x̄1 − x̄2 is approximately
normal. This is so because x̄1 and x̄2 are each approximately
normal and independent, and the difference between two
independent normally distributed random variables is also
normally distributed. Note again that for this to hold true, both
samples must be large.
Figure 6.1

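THE SAMPLING DISTRIBUTION, MEAN, AND
STANDARD DEVIATION OF x̄1 − x̄2
For two large and independent samples, selected from two
different populations, the sampling distribution of x̄1 − x̄2 is
(approximately) normal with its mean and standard deviation
as follows.
$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 \qquad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
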
However, we usually do not know the standard
deviations σ1 and σ2 of the two populations. In such cases,
we replace σ_{x̄1 − x̄2} by its point estimator s_{x̄1 − x̄2}, which is
calculated as follows.

AN ESTIMATE OF THE STANDARD DEVIATION OF x̄1 − x̄2
The value of s_{x̄1 − x̄2}, which gives an estimate of σ_{x̄1 − x̄2}, is
calculated as
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where s1 and s2 are the standard deviations of the two
samples selected from the two populations.
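
The two standard-deviation formulas above share one form, so they can be sketched as a single Python function. This is only an illustrative sketch: the function name `std_error_diff` is an assumption made here, and the input values are the ones given in Examples 6-3 and 6-4 of this chapter.

```python
import math

def std_error_diff(sd1, n1, sd2, n2):
    """Standard deviation of the difference between two sample means
    for independent samples.

    Pass the population standard deviations (sigma1, sigma2) when they
    are known, or the sample standard deviations (s1, s2) to get the
    point estimate instead.
    """
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Known population standard deviations (values from Example 6-3)
print(round(std_error_diff(66, 500, 60, 700), 4))      # 3.7222

# Sample standard deviations used as estimates (values from Example 6-4)
print(round(std_error_diff(0.57, 274, 0.64, 148), 4))  # 0.0629
```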
Thus, when both samples are large, the sampling
distribution of x̄1 − x̄2 is approximately normal.
Consequently, in such cases, we use the normal distribution
to make a confidence interval and to test a hypothesis about
μ1 − μ2.

6.1.3 INTERVAL ESTIMATION OF μ1 − μ2
By constructing a confidence interval for μ1 − μ2 we
estimate the difference between the means of two populations.
For example, we may want to estimate the difference between the
mean heights of male and female adults. The difference
between the two sample means x̄1 − x̄2 is the point estimator
of the difference between the two population means μ1 − μ2.
Again, in this section we assume that the two samples are
large and independent. When these assumptions hold true,
we use the normal distribution to make a confidence interval
for the difference between the two population means. The
following formula gives the interval estimation for μ1 − μ2.

CONFIDENCE INTERVAL FOR μ1 − μ2
The (1 − α)100% confidence interval for μ1 − μ2 is
( x̄1 − x̄2 ) ± z σ_{x̄1 − x̄2} if σ1 and σ2 are known
( x̄1 − x̄2 ) ± z s_{x̄1 − x̄2} if σ1 and σ2 are not known
The value of z is obtained from the normal distribution table
for the given confidence level. The values of σ_{x̄1 − x̄2} and s_{x̄1 − x̄2}
are calculated as explained earlier.
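
As a rough illustration of where the z value comes from, the table lookup can be mimicked with Python's standard library. The helper name `z_for_confidence` is an assumption introduced here; the text itself reads z from a printed normal distribution table.

```python
from statistics import NormalDist

def z_for_confidence(confidence_level):
    """z value that leaves alpha/2 in each tail of the standard normal curve."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(z_for_confidence(0.95), 2))  # 1.96
print(round(z_for_confidence(0.99), 2))  # 2.58
```

Rounded to two decimals, these agree with the table values 1.96 and 2.58 used in the examples of this section.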

Examples 6-3 and 6-4 illustrate the procedure to
construct a confidence interval for μ1 − μ2 for large samples.
In Example 6-3 the population standard deviations are
known, and in Example 6-4 they are not known.
Example 6-3: According to the Bureau of Labor Statistics, in
2003 construction workers earned an average of 551 EGP per
week and manufacturing workers earned an average of 487
EGP per week. Assume that these mean earnings have been
calculated for samples of 500 and 700 workers taken from the
two populations, respectively. Further assume that the
standard deviations of weekly earnings of the two populations are
66 and 60 EGP, respectively.
(a) What is the point estimate of μ1 − μ2?
(b) Construct a 95% confidence interval for the
difference between the mean weekly earnings of the
two populations.
Solution Refer to all construction workers as population 1
and all manufacturing workers as population 2. The
respective samples, then, are samples 1 and 2. Let μ1 and μ2
be the means of populations 1 and 2, and let x̄1 and x̄2 be the
means of the respective samples. From the given information,
n1 = 500 , x̄1 = 551 EGP , σ1 = 66 EGP
n2 = 700 , x̄2 = 487 EGP , σ2 = 60 EGP
(a) The point estimate of μ1 − μ2 is given by the value of
x̄1 − x̄2. Thus,
Point estimate of μ1 − μ2 = 551 − 487 = 64 EGP
(b) The confidence level is 1 − α = .95.
First, we calculate the standard deviation of x̄1 − x̄2 as
follows.
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(66)^2}{500} + \frac{(60)^2}{700}} = 3.7222$$
Next, we find the z value for the 95% confidence
level. From the normal distribution table, this value of z
is 1.96. Finally, substituting all the values in the
confidence interval formula, we obtain the 95%
confidence interval for μ1 − μ2 as
( x̄1 − x̄2 ) ± z σ_{x̄1 − x̄2} = (551 − 487) ± 1.96 (3.7222)
= 64 ± 7.30 = 56.70 EGP to 71.30 EGP
Thus, with 95% confidence we can state that the
difference in the mean weekly earnings of all
construction workers and all manufacturing workers in
2003 was between 56.70 and 71.30 EGP.
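
The calculation in Example 6-3 can be reproduced with a short Python sketch. The function name `z_confidence_interval` and its parameter names are assumptions made for illustration, and z is computed with `statistics.NormalDist` rather than read from a table.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar1, xbar2, sd1, sd2, n1, n2, confidence_level):
    """Large-sample confidence interval for mu1 - mu2 (independent samples).

    sd1 and sd2 are the population standard deviations if known,
    otherwise the sample standard deviations.
    """
    point_estimate = xbar1 - xbar2
    std_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * std_error
    return point_estimate - margin, point_estimate + margin

# Example 6-3: construction (sample 1) versus manufacturing (sample 2) workers
lower, upper = z_confidence_interval(551, 487, 66, 60, 500, 700, 0.95)
print(f"{lower:.2f} EGP to {upper:.2f} EGP")  # 56.70 EGP to 71.30 EGP
```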

Example 6-4: A management progress study was conducted
with AT&T management job holders to assess
the performance of those who possessed a college degree and
those who did not. Let us refer to the two groups of
participants as college and non-college participants,
respectively. The samples included 274 college and 148 non-
college participants. In the area of motivation for
advancement, the mean scores were 2.89 for college
participants and 2.70 for non-college participants, with the
two standard deviations being .57 and .64 , respectively. Find
a 99% confidence interval for the difference between the
mean scores of the two respective populations in the area of
motivation for advancement.

Solution: Let all the management job holders with a college
degree be referred to as population 1 and the ones without a
college degree be referred to as population 2. We can refer to
the respective samples as samples 1 and 2. Let μ1 and μ2 be the
means of populations 1 and 2, respectively, and let x̄1 and x̄2
be the means of the respective samples. From the given
information,
n1 = 274 , x̄1 = 2.89 , s1 = .57
n2 = 148 , x̄2 = 2.70 , s2 = .64
The confidence level is 1 − α = .99.
Because σ1 and σ2 are not known, we replace σ_{x̄1 − x̄2} by s_{x̄1 − x̄2}
in the confidence interval formula. The value of s_{x̄1 − x̄2} is
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{(.57)^2}{274} + \frac{(.64)^2}{148}} = .0629$$
From the normal distribution table, the z value for a 99%
confidence level is (approximately) 2.58. The 99% confidence
interval for μ1 − μ2 is
( x̄1 − x̄2 ) ± z s_{x̄1 − x̄2} = (2.89 − 2.70) ± 2.58 (.0629)
= .19 ± .16 = .03 to .35
Thus, with 99% confidence we can state that the
difference in the mean scores of the two populations of
managers in the area of motivation for advancement is
between .03 and .35.

6.1.4 HYPOTHESIS TESTING ABOUT μ1 − μ2
It is often necessary to compare the means of two
populations. For example, we may want to know if the mean
price of houses in Cairo is the same as that in Sohag.
Similarly, we may be interested in knowing if, on average,
Egyptian children spend fewer hours in school than
American children. In both these cases we will perform a
test of hypothesis about μ1 − μ2. The alternative hypothesis
in a test of hypothesis may be that the means of the two
populations are different, or that the mean of the first
population is greater than the mean of the second population,
or that the mean of the first population is smaller than the
mean of the second population. These three situations are
described below.
1. Testing an alternative hypothesis that the means of two
populations are different is equivalent to μ1 ≠ μ2, which is
the same as μ1 − μ2 ≠ 0.
2. Testing an alternative hypothesis that the mean of the first
population is greater than the mean of the second
population is equivalent to μ1 > μ2, which is the same as
μ1 − μ2 > 0.
3. Testing an alternative hypothesis that the mean of the first
population is smaller than the mean of the second
population is equivalent to μ1 < μ2, which is the same as
μ1 − μ2 < 0.
The procedure followed to perform a test of hypothesis
about the difference between two population means is
similar to the one used to test hypotheses about single
population parameters in Chapter 5. The procedure
involves the same five steps that were used in Chapter 5 to
test hypotheses about μ and p. Because we are dealing
with large (and independent) samples in this section, we
will use the normal distribution to conduct a test of
hypothesis about μ1 − μ2.

TEST STATISTIC z FOR x̄1 − x̄2
The value of the test statistic z for x̄1 − x̄2 is computed as
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$$
The value of μ1 − μ2 is substituted from H0. If the values of
σ1 and σ2 are not known, we replace σ_{x̄1 − x̄2} by s_{x̄1 − x̄2} in the
formula.

Example 6-5 Reconsider Example 6-3 on the mean weekly
earnings of construction workers and manufacturing
workers. Test at the 1% significance level if the mean weekly
earnings of the two groups of workers are different.
Solution From the information given in Example 6-3,
n1 = 500 , x̄1 = 551 EGP , σ1 = 66 EGP
n2 = 700 , x̄2 = 487 EGP , σ2 = 60 EGP
where the subscript 1 refers to construction workers and 2 to
manufacturing workers. The significance level is α = .01. Let
μ1 = the mean weekly earnings of all construction workers
μ2 = the mean weekly earnings of all manufacturing workers
Step 1. State the null and alternative hypothesis
We are to test if the two population means are
different. The two possibilities are:
(i) The mean weekly earnings of construction workers and
manufacturing workers are not different. In other
words,
μ1 = μ2, which can be written as μ1 − μ2 = 0.
(ii) The mean weekly earnings of construction workers and
manufacturing workers are different. That is, μ1 ≠ μ2,
which can be written as μ1 − μ2 ≠ 0.
Considering these two possibilities, the null and alternative
hypotheses are
H0 : μ1 − μ2 = 0 (the mean weekly earnings are not different)
H1 : μ1 − μ2 ≠ 0 (the mean weekly earnings are different)
Step 2. Select the distribution to use
Because n1 > 30 and n2 > 30, both sample sizes are
large. Therefore, the sampling distribution of x̄1 − x̄2 is
approximately normal, and we use the normal distribution to
make the hypothesis test.
Step 3. Determine the rejection and non-rejection regions
The significance level is given to be .01. The ≠ sign in the
alternative hypothesis indicates that the test is two-tailed. The
area in each tail of the normal distribution curve is α/2 =
.01/2 = .005. The critical values of z for .005 areas in each
tail of the normal distribution are (approximately) 2.58 and
−2.58. These values are shown in Figure 6.2.
Step 4. Calculate the value of the test statistic
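The value of the test statistic z for x̄1 − x̄2 is computed as
follows.
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(66)^2}{500} + \frac{(60)^2}{700}} = 3.7222$$
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{(551 - 487) - 0}{3.7222} = 17.19$$
Figure 6.2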

Step 5. Make a decision
Because the value of the test statistic z = 17.19 falls in
the rejection region, we reject the null hypothesis H0.
Therefore, we conclude that the mean weekly earnings of the
two groups of workers are different. Note that we cannot say
for sure that the two means are different. All we can say is
that the evidence from the two samples is very strong that the
corresponding population means are different.

Example 6-6 Refer to Example 6-4 on the mean scores of
college and non-college participants in the management
progress study. Test at the 5% significance level if the mean
score in the area of motivation for advancement is higher for
college degree holders than for non-college participants.
Solution From the information given in Example 6-4,
n1 = 274 , x̄1 = 2.89 , s1 = .57
n2 = 148 , x̄2 = 2.70 , s2 = .64
where subscript 1 refers to college degree holders and 2 to
non-college participants. The significance level is α = .05.
Step 1. State the null and alternative hypothesis
The two possibilities are
(i) The mean score of college degree holders is not higher
than that of the non-college participants, which can be
written as μ1 = μ2 or μ1 − μ2 = 0.
(ii) The mean score of college degree holders is higher than
that of the non-college participants, which can be
written as μ1 > μ2 or μ1 − μ2 > 0.
The null and alternative hypotheses are:
H0 : μ1 − μ2 = 0 (μ1 is equal to μ2)
H1 : μ1 − μ2 > 0 (μ1 is greater than μ2)
Note that we can also write the null hypothesis as μ1 − μ2 ≤ 0,
which states that the mean score of college participants is
lower than or equal to the mean score of non-college
participants.
Step 2. Select the distribution to use
Because n1 > 30 and n2 > 30, both sample sizes are large.
As a result, we use the normal distribution to make the test.
Step 3. Determine the rejection and non-rejection regions
The significance level is .05. The > sign in the
alternative hypothesis indicates that the test is right-tailed.
Consequently, the critical value of z is 1.65, as shown in
Figure 6.3.
Step 4. Calculate the value of the test statistic
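The value of the test statistic z for x̄1 − x̄2 is computed as
follows.
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{(.57)^2}{274} + \frac{(.64)^2}{148}} = .0629$$
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{(2.89 - 2.70) - 0}{.0629} = 3.02$$
Figure 6.3.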

Step 5. Make a decision
Because the value of the test statistic z = 3.02 for x̄1 − x̄2
falls in the rejection region, we reject the null hypothesis H0 .
Therefore, we conclude that the mean score in the area of
motivation for advancement is higher for those who hold a
college degree than for those who do not hold a college
degree.
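
The five-step z test used in Examples 6-5 and 6-6 can be sketched in Python as follows. The function name `two_sample_z_test` and the `tail` argument are assumptions made for this illustration, and the critical value is computed with `statistics.NormalDist` instead of being read from a table, so it differs slightly from the rounded table values 2.58 and 1.65.

```python
import math
from statistics import NormalDist

def two_sample_z_test(xbar1, xbar2, sd1, sd2, n1, n2, alpha, tail,
                      hypothesized_diff=0.0):
    """Large-sample z test for mu1 - mu2 with independent samples.

    tail is "two", "right", or "left", matching the sign in H1.
    Returns (z, critical value, True if H0 is rejected).
    """
    std_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = ((xbar1 - xbar2) - hypothesized_diff) / std_error
    std_normal = NormalDist()
    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "left":
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    else:
        raise ValueError("tail must be 'two', 'right', or 'left'")
    return z, critical, reject

# Example 6-5: two-tailed test at the 1% significance level
z5, crit5, reject5 = two_sample_z_test(551, 487, 66, 60, 500, 700, 0.01, "two")
print(round(z5, 2), reject5)  # 17.19 True

# Example 6-6: right-tailed test at the 5% significance level
z6, crit6, reject6 = two_sample_z_test(2.89, 2.70, 0.57, 0.64, 274, 148, 0.05, "right")
print(round(z6, 2), reject6)  # 3.02 True
```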

6.2 INFERENCES ABOUT THE DIFFERENCE
BETWEEN TWO POPULATION MEANS
FOR SMALL AND INDEPENDENT
SAMPLES: EQUAL STANDARD
DEVIATIONS
Many times, due to either budget constraint or the
nature of the populations, it may not be possible to take large
samples to make inferences about the difference between two
population means. This section discusses how to make a
confidence interval and test a hypothesis about the difference
between two population means when the samples are small
(n1 < 30 and n2 < 30) and independent. Our main assumption
in this case is that the two populations from which the two
samples are drawn are (approximately) normally distributed.
If this assumption is true, and we know the population
standard deviations, we can still use the normal distribution
to make inferences about μ1 − μ2 when samples are small
and independent. However, we usually do not know the
population standard deviations σ1 and σ2. In such cases, we
replace the normal distribution by the t distribution to make
inferences about μ1 − μ2 for small and independent samples.
We will make one more assumption in this section: that the
standard deviations of the two populations are equal. In
other words, we assume that although σ1 and σ2 are
unknown, they are equal.

WHEN TO USE THE t DISTRIBUTION TO MAKE
INFERENCES ABOUT μ1 − μ2
The t distribution is used to make inferences about μ1 − μ2
when the following assumptions hold true.
1. The two populations from which the two samples are
drawn are (approximately) normally distributed.
2. The samples are small (n1 < 30 and n2 < 30) and
independent.
3. The standard deviations σ1 and σ2 of the two populations
are unknown but they are equal, that is, σ1 = σ2 = σ.

When the standard deviations of the two populations are
equal, we can use σ for both σ1 and σ2. Since σ is
unknown, we replace it by its point estimator sp, which is
called the pooled sample standard deviation (hence, the
subscript p). The value of sp is computed by using the
information from the two samples as follows.

THE POOLED STANDARD DEVIATION FOR TWO SAMPLES
The pooled standard deviation for two samples is computed as
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
where n1 and n2 are the sizes of the two samples and s1² and s2²
are the variances of the two samples.

In this formula, n1 − 1 are the degrees of freedom for sample
1, n2 − 1 are the degrees of freedom for sample 2, and n1 +
n2 − 2 are the degrees of freedom for the two samples taken
together.
When sp is used as an estimator of σ, the standard
deviation σ_{x̄1 − x̄2} of x̄1 − x̄2 is estimated by s_{x̄1 − x̄2}. The value of
s_{x̄1 − x̄2} is calculated by using the following formula.
ESTIMATOR OF THE STANDARD DEVIATION OF x̄1 − x̄2
$$s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Now we are ready to discuss the procedures that are
used to make confidence intervals and test hypotheses about
μ1 − μ2 for small and independent samples selected from two
populations with unknown but equal standard deviations.
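
The pooled standard deviation and the resulting estimate of the standard deviation of the difference between the sample means can be sketched in Python as below. The function names are assumptions introduced for illustration, and the sample values are those of the coffee example (Example 6-7) later in this section.

```python
import math

def pooled_std_dev(s1, n1, s2, n2):
    """Pooled sample standard deviation sp, assuming sigma1 = sigma2."""
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance)

def pooled_std_error(s1, n1, s2, n2):
    """Estimate of the standard deviation of the difference in sample means."""
    return pooled_std_dev(s1, n1, s2, n2) * math.sqrt(1 / n1 + 1 / n2)

# Values from Example 6-7: n1 = 15, s1 = 5 and n2 = 12, s2 = 6
print(round(pooled_std_dev(5, 15, 6, 12), 4))    # 5.4626
print(round(pooled_std_error(5, 15, 6, 12), 4))  # 2.1157
```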

6.2.1 INTERVAL ESTIMATION OF μ1 − μ2
As was mentioned earlier, the difference between the
two sample means x̄1 − x̄2 is the point estimator of the
difference between the two population means μ1 − μ2. The
following formula gives the confidence interval for μ1 − μ2
when the t distribution is used.

CONFIDENCE INTERVAL FOR μ1 − μ2
The (1 − α)100% confidence interval for μ1 − μ2 is
( x̄1 − x̄2 ) ± t s_{x̄1 − x̄2}
where the value of t is obtained from the t distribution table
for the given confidence level and n1 + n2 − 2 degrees of
freedom, and s_{x̄1 − x̄2} is calculated as explained earlier in
Section 6.2.

Example 6-7 describes the procedure to make a
confidence interval for μ1 − μ2 using the t distribution.
Example 6-7 A consumer agency wanted to estimate the
difference in the mean amounts of caffeine in two brands of
coffee. The agency took a sample of 15 one-pound jars of
Brand I coffee that showed the mean amount of caffeine in
these jars to be 80 milligrams per jar with a standard
deviation of 5 milligrams. Another sample of 12 one-pound
jars of Brand II coffee gave a mean amount of caffeine equal
to 77 milligrams per jar with a standard deviation of 6
milligrams. Construct a 95% confidence interval for the
difference between the mean amounts of caffeine in one-
pound jars of these two brands of coffee. Assume that the two
populations are normally distributed and that the standard
deviations of the two populations are equal.
Solution Let μ1 and μ2 be the mean amounts of caffeine
per jar in all one-pound jars of Brands I and II,
respectively, and let x̄1 and x̄2 be the means of the two
respective samples. From the given information,
n1 = 15 , x̄1 = 80 milligrams , s1 = 5 milligrams
n2 = 12 , x̄2 = 77 milligrams , s2 = 6 milligrams
The confidence level is 1 − α = .95.
First we calculate the standard deviation of x̄1 − x̄2 as
follows:
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(15 - 1)(5)^2 + (12 - 1)(6)^2}{15 + 12 - 2}} = 5.4626$$
$$s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 5.4626 \sqrt{\frac{1}{15} + \frac{1}{12}} = 2.1157$$
Next, to find the t value from the t distribution table, we
need to know the area in each tail of the t distribution curve
and the degrees of freedom.
Area in each tail = α/2 = .5 − (.95/2) = .025
Degrees of freedom = n1 + n2 − 2 = 15 + 12 − 2 = 25
The t value for df = 25 and .025 area in the right tail of
the t distribution curve is 2.060. The 95% confidence interval
for μ1 − μ2 is
( x̄1 − x̄2 ) ± t s_{x̄1 − x̄2} = (80 − 77) ± 2.060 (2.1157)
= 3 ± 4.36 = −1.36 to 7.36 milligrams
Thus, with 95% confidence we can state that based on
these two sample results, the difference in the mean amounts
of caffeine in one-pound jars of these two brands of coffee lies
between −1.36 and 7.36 milligrams. Because the lower limit
of the interval is negative, it is possible that the mean amount
of caffeine is greater in the second brand than in the first
brand of coffee.
Note that the value of x̄1 − x̄2, which is 80 − 77 = 3,
gives the point estimate of μ1 − μ2.
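
The interval in Example 6-7 can be checked with a short Python sketch. The function name `pooled_t_confidence_interval` is an assumption made here. Because Python's standard library has no t distribution, the critical value 2.060 is passed in exactly as read from the t table in the example (df = 25, .025 in the right tail) rather than computed.

```python
import math

def pooled_t_confidence_interval(xbar1, xbar2, s1, s2, n1, n2, t_table_value):
    """Confidence interval for mu1 - mu2 with small independent samples
    and equal (unknown) population standard deviations.

    t_table_value must be looked up for n1 + n2 - 2 degrees of freedom.
    """
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    std_error = sp * math.sqrt(1 / n1 + 1 / n2)
    margin = t_table_value * std_error
    point_estimate = xbar1 - xbar2
    return point_estimate - margin, point_estimate + margin

# Example 6-7: Brand I (sample 1) versus Brand II (sample 2) coffee
lower, upper = pooled_t_confidence_interval(80, 77, 5, 6, 15, 12, t_table_value=2.060)
print(f"{lower:.2f} to {upper:.2f} milligrams")  # -1.36 to 7.36 milligrams
```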

6.2.2 HYPOTHESIS TESTING ABOUT μ1 − μ2
When the three assumptions mentioned in Section 6.2
are satisfied, then the t distribution is applied to make a
hypothesis test about the difference between two population
means. The test statistic in this case is t, which is calculated as
follows.

TEST STATISTIC t FOR x̄1 − x̄2
The value of the test statistic t for x̄1 − x̄2 is computed as
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$
The value of μ1 − μ2 in this formula is substituted from the
null hypothesis and s_{x̄1 − x̄2} is calculated as explained in
Section 6.2.

Examples 6-8 and 6-9 illustrate how a test of hypothesis
about the difference between two population means for small
and independent samples that are selected from two
populations with equal standard deviations is conducted
using the t- distribution.

Example 6-8 A sample of 14 cans of Brand I diet soda gave
the mean number of calories of 23 per can with a standard
deviation of 3 calories. Another sample of 16 cans of Brand
II diet soda gave the mean number of calories of 25 per can
with a standard deviation of 4 calories. At the 1%
significance level, can you conclude that the mean numbers of
calories per can are different for these two brands of diet
soda? Assume that the calories per can of diet soda are
normally distributed for each of the two brands and that the
standard deviations for the two populations are equal.
Solution Let μ1 and μ2 be the mean number of calories per
can for diet soda of Brand I and Brand II, respectively, and
let x̄1 and x̄2 be the means of the respective samples. From
the given information,
n1 = 14 , x̄1 = 23 , s1 = 3
n2 = 16 , x̄2 = 25 , s2 = 4
The significance level is α = .01.
Step 1. State the null and alternative hypothesis
We are to test for the difference in the mean number of
calories per can for the two brands. The null and alternative
hypotheses are:
H0 : μ1 − μ2 = 0 (the mean numbers of calories are not different)
H1 : μ1 − μ2 ≠ 0 (the mean numbers of calories are different)

Step 2. Select the distribution to use
The two populations are normally distributed, the
samples are small and independent, and the standard
deviations of the two populations are unknown but equal.
Consequently, we will use the t distribution.
Step 3. Determine the rejection and non-rejection regions
The ≠ sign in the alternative hypothesis indicates that
the test is two-tailed. The significance level is .01. Hence,
Area in each tail = α/2 = .01/2 = .005
Degrees of freedom = n1 + n2 − 2 = 14 + 16 − 2 = 28
The critical values of t for df = 28 and .005 area in each
tail of the t distribution curve are −2.763 and 2.763, as shown
in Figure 6.4.
Step 4. Calculate the value of the test statistic
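The value of the test statistic t for x̄1 − x̄2 is computed as
follows.
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(14 - 1)(3)^2 + (16 - 1)(4)^2}{14 + 16 - 2}} = 3.5707$$
$$s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 3.5707 \sqrt{\frac{1}{14} + \frac{1}{16}} = 1.3067$$
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{(23 - 25) - 0}{1.3067} = -1.531$$

Figure 6.4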

Step 5. Make a decision
Because the value of the test statistic t = −1.531 for x̄1 − x̄2
falls in the non-rejection region (it lies between −2.763 and
2.763), we fail to reject the null hypothesis. Consequently we
conclude that there is no significant difference in the mean
number of calories per can for the two
brands of diet soda. The difference in x̄1 and x̄2 observed for
the two samples may have occurred due to sampling error
only.
Example 6-9 A sample of 15 children from Cairo showed that
the mean time they spend watching television is 28.5 hours
per week with a standard deviation of 4 hours. Another
sample of 16 children from Sohag showed that the mean time
spent by them watching television is 23.25 hours per week
with a standard deviation of 5 hours. Using a 2.5%
significance level, can we conclude that the mean time spent
watching television by children in Cairo is higher than that
for children in Sohag ? Assume that the time spent watching
television by children has a normal distribution for both
populations and that the standard deviations for the two
populations are equal.
Solution Let the children from Cairo be referred to as
population 1 and those from Sohag as population 2. Let μ1
and μ2 be the mean time spent watching television by
children in populations 1 and 2, respectively, and let x̄1 and
x̄2 be the mean time spent watching television by children in
the respective samples. From the given information,
For Cairo: n1 = 15 , x̄1 = 28.50 hours , s1 = 4 hours
For Sohag: n2 = 16 , x̄2 = 23.25 hours , s2 = 5 hours
The significance level is α = .025.
Step 1. State the null and alternative hypothesis
The two possible decisions are :
(i) The mean time spent watching television by children in
Cairo is not higher than that for children in Sohag. This
can be written as μ1 = μ2 or μ1 − μ2 = 0.
(ii) The mean time spent watching television by children in
Cairo is higher than that for children in Sohag. This can
be written as μ1 > μ2 or μ1 − μ2 > 0.
Hence, the null and alternative hypotheses are
H0 : μ1 − μ2 = 0
H1 : μ1 − μ2 > 0
Note that the null hypothesis can also be written as μ1 − μ2 ≤ 0.
Step 2. Select the distribution to use
The two populations are normally distributed, the
samples are small and independent, and the standard
deviations of the two populations are unknown but equal.
Consequently, we use the t- distribution to make the test.
Step 3. Determine the rejection and non-rejection regions
The > sign in the alternative hypothesis indicates that
the test is right-tailed. The significance level is .025.
Area in the right tail of the t distribution = α = .025
Degrees of freedom = n1 + n2 − 2 = 15 + 16 − 2 = 29
From the t distribution table, the critical value of t for df = 29
and .025 area in the right tail of the t distribution is 2.045.
This value is shown in Figure 6.5.

Figure 6.5

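Step 4. Calculate the value of the test statistic
The value of the test statistic t for x̄1 − x̄2 is computed as
follows.
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(15 - 1)(4)^2 + (16 - 1)(5)^2}{15 + 16 - 2}} = 4.5448$$
$$s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 4.5448 \sqrt{\frac{1}{15} + \frac{1}{16}} = 1.6334$$
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{(28.50 - 23.25) - 0}{1.6334} = 3.214$$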

Step 5. Make a decision
Because the value of the test statistic t = 3.214 for x̄1 − x̄2
falls in the rejection region, we reject the null hypothesis H0.
Hence, we conclude that children in Cairo spend, on
average, more time watching TV than children in Sohag.
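
The test statistics in Examples 6-8 and 6-9 can be reproduced with the following Python sketch. The function name `pooled_t_statistic` is an assumption made for illustration. The critical values 2.763 and 2.045 are the table values quoted in the two examples, not values computed by the code.

```python
import math

def pooled_t_statistic(xbar1, xbar2, s1, s2, n1, n2, hypothesized_diff=0.0):
    """Test statistic t and degrees of freedom for mu1 - mu2, assuming
    small independent samples and equal population standard deviations."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    std_error = sp * math.sqrt(1 / n1 + 1 / n2)
    t = ((xbar1 - xbar2) - hypothesized_diff) / std_error
    return t, df

# Example 6-8: two-tailed test, table critical values -2.763 and 2.763 (df = 28)
t8, df8 = pooled_t_statistic(23, 25, 3, 4, 14, 16)
print(round(t8, 3), df8, abs(t8) > 2.763)  # -1.531 28 False (fail to reject H0)

# Example 6-9: right-tailed test, table critical value 2.045 (df = 29)
t9, df9 = pooled_t_statistic(28.50, 23.25, 4, 5, 15, 16)
print(round(t9, 3), df9, t9 > 2.045)       # 3.214 29 True (reject H0)
```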

6.3 INFERENCES ABOUT THE DIFFERENCE
BETWEEN TWO POPULATION MEANS
FOR PAIRED SAMPLES
Sections 6.1 and 6.2 were concerned with estimation
and hypothesis testing about the difference between two
population means when the two samples were drawn
independently from two different populations. This section
describes estimation and hypothesis-testing procedures for
the difference between two population means when the
samples are dependent.

In a case of two dependent samples, two data values,
one for each sample, are collected from the same source (or
element) and, hence, these are also called paired or matched
samples. For example, we may want to make inferences about
the mean weight loss for members of a health club after they
have gone through an exercise program for a certain period.
To do so, suppose we select a sample of 15 members of this
health club and record their weights before and after the
program. In this example, both sets of data are collected from
the same 15 persons, once before and once after the program.
Thus, although there are two samples, they contain the same
15 persons. This is an example of paired (or dependent or
matched) samples. The procedures to make confidence
intervals and test hypotheses in the case of paired samples
are different from the ones for independent samples
discussed in earlier sections of this chapter.

As another example of paired samples, suppose an
agronomist wants to measure the effect of a new brand of
fertilizer on the yield of potatoes. To do so, he selects 10
pieces of land and divides each piece of land into two
portions. Then he randomly assigns one of the two portions
from each piece of land to grow potatoes without using
fertilizer (or using some other brand of fertilizer). The second
portion from each piece of land is used to grow potatoes
using the new brand of fertilizer. Thus, he will have 10 pairs
of data values. Then, using the procedure to be discussed in
this section, he will make inferences about the difference in
the mean yields of potatoes with the new fertilizer and
without it.
The question arises, why does the agronomist not
choose 10 pieces of land on which to grow potatoes without
using the new brand of fertilizer and another 10 pieces of
land to grow potatoes by using the new brand of fertilizer? If
he does so, the effect of the fertilizer might be confused with
the effects due to soil differences at different locations. Thus,
he will not be able to isolate the effect of the new brand of
fertilizer on the yield of potatoes. Consequently, the results
will not be reliable. By choosing 10 pieces of land and then
dividing each of them into two portions, the researcher
decreases the possibility that the difference in the
productivities of different pieces of land affects the results.

PAIRED OR MATCHED SAMPLES


Two samples are said to be paired or matched samples when
for each data value collected from one sample there is a
corresponding data value collected from the second sample,
and both these data values are collected from the same
source.

In paired samples, the difference between the two data
values for each element of the two samples is denoted by d.
This value of d is called the paired difference. We then treat
all the values of d as one sample and make inferences
applying procedures similar to the ones used for one-sample
cases in Chapters 5 and 6. Note that as each source (or
element) gives a pair of values (one for each of the two data
sets), each sample contains the same number of values. That
is, both samples are of the same size. Therefore, we denote
the (common) sample size by n , which gives the number of
paired difference values denoted by d. The degrees of
freedom for the paired samples are n − 1. Let

μd = the mean of the paired differences for the population.
σd = the standard deviation of the paired differences for the
population.
d̄ = the mean of the paired differences for the sample.
sd = the standard deviation of the paired differences for the
sample.
n = the number of paired difference values.

MEAN AND STANDARD DEVIATION OF THE PAIRED
DIFFERENCES FOR SAMPLES
The values of d̄ and sd are calculated as
d̄ = Σd / n
sd = √{ [Σd² − (Σd)²/n] / (n − 1) }

In paired samples, instead of using x̄1 − x̄2 as the sample
statistic to make inferences about μ1 − μ2, we use the sample
statistic d̄ to make inferences about μd. Actually, the value of
d̄ is always equal to x̄1 − x̄2, and the value of μd is always
equal to μ1 − μ2.
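
The definitions above can be sketched in a few lines of Python (standard library only). The data values and the helper name paired_summary below are made up for illustration and are not taken from the text. The sketch computes d̄, sd, and sd / √n, and checks that d̄ equals x̄1 − x̄2.

```python
import math
import statistics


def paired_summary(first, second):
    """Return (n, d_bar, s_d, s_d_bar) for paired samples.

    Each paired difference is d = first - second (hypothetical helper).
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same size")
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    d_bar = sum(d) / n
    # Shortcut formula: s_d = sqrt((sum d^2 - (sum d)^2 / n) / (n - 1))
    s_d = math.sqrt((sum(v * v for v in d) - sum(d) ** 2 / n) / (n - 1))
    s_d_bar = s_d / math.sqrt(n)  # estimated standard deviation of d_bar
    return n, d_bar, s_d, s_d_bar


# Made-up illustrative data, not from the text.
before = [10, 12, 9, 14]
after = [8, 11, 10, 11]

n, d_bar, s_d, s_d_bar = paired_summary(before, after)
mean_diff = statistics.mean(before) - statistics.mean(after)
print(n, d_bar, mean_diff, round(s_d, 4), round(s_d_bar, 4))
```

Because the paired differences are treated as a single sample, the same summary could also be obtained by applying one-sample formulas directly to the list of d values.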

SAMPLING DISTRIBUTION, MEAN, AND STANDARD
DEVIATION OF d̄
If the number of paired values is large (n ≥ 30), because of
the central limit theorem the sampling distribution of d̄ is
approximately normal with its mean and standard deviation
as
μd̄ = μd and σd̄ = σd / √n

In cases when n ≥ 30, the normal distribution can be
used to make inferences about μd.
However, in cases of paired samples, the sample sizes
are usually small and σd is unknown. In such cases, assuming
that the paired differences for the population are
(approximately) normally distributed, the normal distribution is
replaced by the t distribution to make inferences about μd.
When σd is not known, the standard deviation of d̄ is
estimated by sd̄ = sd / √n.

ESTIMATE OF THE STANDARD DEVIATION OF
PAIRED DIFFERENCES
If
1. n is less than 30
2. σd is not known
3. the population of paired differences is (approximately)
normally distributed
then the t distribution is used to make inferences about
μd. The standard deviation of d̄ is estimated by sd̄,
which is calculated as
sd̄ = sd / √n

Sections 6.3.1 and 6.3.2 describe the procedures to
make a confidence interval and test a hypothesis about μd
when σd is unknown and n is small. The inferences are
made using the t distribution. However, if n is large, even if
σd is unknown, the normal distribution can be used to make
inferences about μd.

6.3.1 INTERVAL ESTIMATION OF μd
The mean d̄ of paired differences for paired samples is
the point estimator of μd. The following formula is used to
construct a confidence interval for μd in the case of
(approximately) normally distributed populations.

CONFIDENCE INTERVAL FOR μd
The (1 − α)100% confidence interval for μd is
d̄ ± t sd̄
where the value of t is obtained from the t distribution table
for the given confidence level and n − 1 degrees of freedom,
and sd̄ is calculated as explained earlier.

Example 6-10 illustrates the procedure to construct a
confidence interval for μd.
Example 6-10: A researcher wanted to find the effect of a
special diet on systolic blood pressure. She selected a sample
of seven adults and put them on this dietary plan for 3
months. The following table gives the systolic blood pressures
of these seven adults before and after the completion of this
plan.
Before 210 180 195 220 231 199 224
After 193 186 186 223 220 183 233
Let μd be the mean reduction in the systolic blood pressures
due to this special dietary plan for the population of all
adults. Construct a 95% confidence interval for μd. Assume
that the population of paired differences is (approximately)
normally distributed.
Solution Because the information obtained is from paired
samples, we will make the confidence interval for the paired
difference mean μd of the population using the paired
difference mean d̄ of the sample. Let d be the difference in the
systolic blood pressure of an adult before and after this
special dietary plan. Then, d is obtained by subtracting the
systolic blood pressure after the plan from the systolic blood
pressure before the plan. The third column of Table 6.1 lists
the values of d for the seven adults. The fourth column of the
table records the values of d², which are obtained by
squaring each of the d values.
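The values of d̄ and sd are calculated as follows:
d̄ = Σd / n = 35 / 7 = 5.00
sd = √{ [Σd² − (Σd)²/n] / (n − 1) } = √{ [873 − (35)²/7] / 6 } = 10.7858
Hence, the standard deviation of d̄ is
sd̄ = sd / √n = 10.7858 / √7 = 4.0766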

For the 95% confidence interval, the area in each tail of
the t distribution curve is
Area in each tail = α/2 = 0.5 − (0.95/2) = 0.025
The degrees of freedom are df = n − 1 = 7 − 1 = 6
From the t distribution table, the t value for df = 6 and .025
area in the right tail of the t distribution curve is 2.447.
Therefore, the 95% confidence interval for μd is:

d̄ ± t sd̄ = 5.00 ± 2.447 (4.0766)
= 5.00 ± 9.98 = −4.98 to 14.98

Thus, we can state with 95% confidence that the mean
difference between systolic blood pressures before and after
the given dietary plan for all adult participants is between
−4.98 and 14.98.
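
As a check on the arithmetic of Example 6-10, the following minimal Python sketch recomputes the interval from the seven before/after readings. The critical value 2.447 is copied from the t table value quoted above rather than computed; the variable names are our own.

```python
import math

before = [210, 180, 195, 220, 231, 199, 224]
after = [193, 186, 186, 223, 220, 183, 233]

# Reduction in blood pressure: d = before - after
d = [b - a for b, a in zip(before, after)]
n = len(d)
d_bar = sum(d) / n
s_d = math.sqrt((sum(v * v for v in d) - sum(d) ** 2 / n) / (n - 1))
s_d_bar = s_d / math.sqrt(n)

T_CRITICAL = 2.447  # from the text's t table: df = 6, .025 area in the right tail
margin = T_CRITICAL * s_d_bar
lower, upper = d_bar - margin, d_bar + margin

print(f"d_bar = {d_bar:.2f}, s_d_bar = {s_d_bar:.4f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Rounded to two decimal places, the limits agree with the interval reported in the example.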
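Table 6.1
Before   After   d     d²
210      193     17    289
180      186     −6    36
195      186     9     81
220      223     −3    9
231      220     11    121
199      183     16    256
224      233     −9    81
                 Σd = 35   Σd² = 873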

6.3.2 HYPOTHESIS TESTING ABOUT μd
A hypothesis about μd is tested by using the sample
statistic d̄. If n is 30 or larger, we can use the normal
distribution to test a hypothesis about μd. However, if n is
less than 30, we replace the normal distribution by the t
distribution. To use the t distribution, we assume that the
population of all paired differences is (approximately)
normally distributed and that the population standard
deviation σd of paired differences is not known. This section
illustrates the case of the t distribution only. The following
formula is used to calculate the value of the test statistic t
when testing a hypothesis about μd.

TEST STATISTIC t FOR μd
The value of the test statistic t for d̄ is computed as follows.
t = (d̄ − μd) / sd̄
The critical value of t is found from the t distribution
table for the given significance level and n − 1 degrees of
freedom.
Examples 6-11 and 6-12 illustrate the hypothesis-testing
procedure for μd.
Example 6-11 A company wanted to know if attending a
course on "how to be a successful salesperson" can increase
the average sales of its employees. The company sent six of its
salespersons to attend this course. The following table gives
the one-week sales of these salespersons before and after they
attended this course.
Before 12 18 25 9 14 16
After 18 24 24 14 19 20
Using the 1% significance level, can you conclude that the
mean weekly sales for all salespersons increase as a result of
attending this course? Assume that the population of paired
differences has a normal distribution.
Solution Because the data are for paired samples, we test a
hypothesis about the paired differences mean μd of the
population using the paired differences mean d̄ of the sample.
Let
d = (weekly sales before the course) − (weekly sales after the course)
In Table 6.2, we calculate d for each of the six salespersons
by subtracting the sales after the course from the sales before
the course. The fourth column of the table lists the values of
d². The values of d̄ and sd are calculated as follows.

Table 6.2
Before   After   d     d²
12       18      −6    36
18       24      −6    36
25       24      1     1
9        14      −5    25
14       19      −5    25
16       20      −4    16
                 Σd = −25   Σd² = 139

d̄ = Σd / n = −25 / 6 = −4.17
sd = √{ [Σd² − (Σd)²/n] / (n − 1) } = √{ [139 − (−25)²/6] / 5 } = 2.6394

The standard deviation of d̄ is
sd̄ = sd / √n = 2.6394 / √6 = 1.0775

Step 1. State the null and alternative hypotheses
We are to test if the mean weekly sales for all
salespersons increase as a result of taking the course. Let μ1
be the mean weekly sales for all salespersons before the
course and μ2 be the mean weekly sales for all salespersons
after the course. Then μd = μ1 − μ2. The mean weekly sales
for all salespersons will increase due to attending the course
if μ1 is less than μ2, which can be written as μ1 − μ2 < 0 or
μd < 0. Consequently, the null and alternative hypotheses are:
H0 : μd = 0 (μ1 − μ2 = 0 or the mean weekly sales do not increase)
H1 : μd < 0 (μ1 − μ2 < 0 or the mean weekly sales do increase)
Note that we can also write the null hypothesis as μd ≥ 0.
Step 2. Select the distribution to use
The sample size is small (n < 30), the population of
paired differences is normal, and σd is unknown. Therefore,
we use the t distribution to conduct the test.
Step 3. Determine the rejection and non-rejection regions
The < sign in the alternative hypothesis indicates that the
test is left-tailed. The significance level is .01. Hence,
Area in left tail = α = .01
Degrees of freedom = n − 1 = 6 − 1 = 5
The critical value of t for df = 5 and .01 area in the left tail of
the t distribution curve is −3.365. This value is shown in
Figure 6.6.
Step 4. Calculate the value of the test statistic
The value of the test statistic t for d̄ is computed as
follows.
t = (d̄ − μd) / sd̄ = (−4.17 − 0) / 1.0775 = −3.870

Step 5. Make a decision


Because the value of the test statistic t = −3.870 for d̄
falls in the rejection region, we reject the null hypothesis.
Consequently, we conclude that the mean weekly sales for all
salespersons increase as a result of this course.
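
The five steps of Example 6-11 can be retraced with a short Python sketch. The critical value −3.365 is copied from the text, not computed, and the variable names are our own. Because the sketch does not round intermediate values, it gives t ≈ −3.867 rather than the reported −3.870; the small difference presumably comes from rounding d̄ and the standard deviation of d̄ in the hand calculation, and the decision is the same.

```python
import math

before = [12, 18, 25, 9, 14, 16]
after = [18, 24, 24, 14, 19, 20]

# d = (sales before) - (sales after), as defined in the example
d = [b - a for b, a in zip(before, after)]
n = len(d)
d_bar = sum(d) / n
s_d = math.sqrt((sum(v * v for v in d) - sum(d) ** 2 / n) / (n - 1))
s_d_bar = s_d / math.sqrt(n)

MU_D_NULL = 0.0      # value of mu_d under H0
T_CRITICAL = -3.365  # from the text: df = 5, .01 area in the left tail

t = (d_bar - MU_D_NULL) / s_d_bar
reject_h0 = t < T_CRITICAL  # left-tailed test
print(f"t = {t:.3f}, reject H0: {reject_h0}")
```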

Figure 6.6.

Example 6-12 Refer to Example 6-10. The table that gives the
blood pressures of seven adults before and after the
completion of a special dietary plan is reproduced below.
Before 210 180 195 220 231 199 224
After 193 186 186 223 220 183 233
Let μd be the mean of the differences between the systolic
blood pressures before and after completing this special
dietary plan for the population of all adults. Using the 5%
significance level, can we conclude that the mean of the
paired differences μd is different from zero? Assume that the
population of paired differences is (approximately) normally
distributed.
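Solution Table 6.1 gives d and d² for each of the seven adults.
The values of d̄ and sd are calculated as follows.
d̄ = Σd / n = 35 / 7 = 5.00
sd = √{ [Σd² − (Σd)²/n] / (n − 1) } = √{ [873 − (35)²/7] / 6 } = 10.7858
Hence the standard deviation of d̄ is
sd̄ = sd / √n = 10.7858 / √7 = 4.0766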

Step 1. State the null and alternative hypotheses
H0: μd = 0
(the mean of the paired differences is not different from zero)
H1: μd ≠ 0
(the mean of the paired differences is different from zero)
Step 2. Select the distribution to use
The sample size is small, the population of paired
differences is (approximately) normal, and σd is not known.
Therefore, we use the t distribution to make the test.
Step 3. Determine the rejection and non-rejection regions
The ≠ sign in the alternative hypothesis indicates that
the test is two-tailed. The significance level is .05.
Area in each tail of the curve = α/2 = .05/2 = .025
Degrees of freedom = n − 1 = 7 − 1 = 6
The two critical values of t for df = 6 and .025 area in each
tail of the t distribution curve are −2.447 and 2.447. These
values are shown in Figure 6.7.
Step 4. Calculate the value of the test statistic
The value of the test statistic t for d̄ is computed as follows.
t = (d̄ − μd) / sd̄ = (5.00 − 0) / 4.0766 = 1.227

Step 5. Make a decision


Because the value of the test statistic t = 1.227 for d̄
falls in the non-rejection region, we fail to reject the null
hypothesis. Hence, the sample does not provide sufficient
evidence to conclude that the mean of the population paired
differences is different from zero. In other words, we cannot
state that the mean of the differences between the systolic
blood pressures before and after completing this special
dietary plan for the population of all adults is different
from zero.
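
A minimal Python sketch of the two-tailed test in Example 6-12 follows. Here sd is computed from its definition (squared deviations from d̄) rather than the shortcut formula, which gives the same value. The critical value 2.447 is copied from the text, and the variable names are our own.

```python
import math

before = [210, 180, 195, 220, 231, 199, 224]
after = [193, 186, 186, 223, 220, 183, 233]

d = [b - a for b, a in zip(before, after)]
n = len(d)
d_bar = sum(d) / n
# Definitional form of the sample standard deviation of d
s_d = math.sqrt(sum((v - d_bar) ** 2 for v in d) / (n - 1))
t = (d_bar - 0.0) / (s_d / math.sqrt(n))

T_CRITICAL = 2.447  # from the text: df = 6, .025 area in each tail
reject_h0 = abs(t) > T_CRITICAL  # two-tailed test
print(f"t = {t:.3f}, reject H0: {reject_h0}")
```

The decision is consistent with Example 6-10: the 95% confidence interval found there contains zero, so a two-tailed test at the 5% significance level does not reject H0.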

Figure 6.7.

6.4 INFERENCES ABOUT THE DIFFERENCE BETWEEN TWO
POPULATION PROPORTIONS FOR LARGE AND
INDEPENDENT SAMPLES
Quite often we need to construct a confidence interval
and test a hypothesis about the difference between two
population proportions. For instance, we may want to
estimate the difference between the proportion of defective
items produced on two different machines. If P1 and P2 are
the proportions of defective items produced on the first and
second machine, respectively, then we are to make a
confidence interval for P1 − P2. Or we may want to test the
hypothesis that the proportion of defective items produced on
Machine I is different from the proportion of defective items
produced on Machine II. In this case, we are to test the null
hypothesis P1 − P2 = 0 against the alternative hypothesis
P1 − P2 ≠ 0.
This section discusses how to make a confidence
interval and test a hypothesis about P1 − P2 for two large and
independent samples. The sample statistic used to make
inferences about P1 − P2 is p1 − p2, where p1 and p2 are the
proportions for two large and independent samples. As
discussed in Chapter 3, we determine a sample proportion by
dividing the number of elements in the sample with a given
attribute by the sample size. Thus,

p1 = x1 / n1 and p2 = x2 / n2

where x1 and x2 are the number of elements with a given
characteristic in the two samples and n1 and n2 are the sizes
of the two samples, respectively.

6.4.1 MEAN, STANDARD DEVIATION, AND
SAMPLING DISTRIBUTION OF p1 − p2
As discussed in Chapter 3, for a large sample the
sample proportion p is (approximately) normally distributed
with mean P and standard deviation √(PQ/n). Hence, for two
large and independent samples of sizes n1 and n2,
respectively, their sample proportions p1 and p2 are
(approximately) normally distributed with means P1 and P2
and standard deviations √(P1Q1/n1) and √(P2Q2/n2), respectively.
Using these results, we can make the following statements
about the shape of the sampling distribution of p1 − p2 and its
mean and standard deviation.
Thus, to construct a confidence interval and test a
hypothesis about P1 − P2 for large and independent
samples, we use the normal distribution. As was indicated in
Chapter 4, in the case of a proportion the sample is large if np
and nq are both greater than 5. In the case of two samples,
both sample sizes will be large if n1p1, n1q1, n2p2, and
n2q2 are all greater than 5.

MEAN, STANDARD DEVIATION, AND SAMPLING
DISTRIBUTION OF p1 − p2
For two large and independent samples, the sampling
distribution of p1 − p2 is (approximately) normal with its
mean and standard deviation as
μp1−p2 = P1 − P2

σp1−p2 = √(P1Q1/n1 + P2Q2/n2)

where Q1 = 1 − P1 and Q2 = 1 − P2
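
To make the two formulas and the large-sample rule concrete, here is a small Python sketch. The population proportions and sample sizes are made up for illustration, and the function names are hypothetical.

```python
import math


def sd_diff_proportions(P1, n1, P2, n2):
    """Standard deviation of p1 - p2 for two independent samples."""
    Q1, Q2 = 1 - P1, 1 - P2
    return math.sqrt(P1 * Q1 / n1 + P2 * Q2 / n2)


def samples_are_large(n1, p1, n2, p2):
    """Rule from the text: n1*p1, n1*q1, n2*p2, n2*q2 all greater than 5."""
    return all(v > 5 for v in (n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)))


# Made-up values, for illustration only.
mean_diff = 0.50 - 0.40                              # mean of p1 - p2
sd_diff = sd_diff_proportions(0.50, 100, 0.40, 200)  # sd of p1 - p2
print(round(mean_diff, 2), round(sd_diff, 4))
print(samples_are_large(100, 0.50, 200, 0.40))
```

Note that the two variances are added under a single square root; this follows from the independence of the two samples.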

6.4.2 INTERVAL ESTIMATION OF P1 − P2
The difference between two sample proportions p1 − p2 is
the point estimator for the difference between two population
proportions P1 − P2. Because we do not know P1 and P2 when
we are making a confidence interval for P1 − P2, we cannot
calculate the value of σp1−p2. Therefore, we use sp1−p2 as the
point estimator of σp1−p2 in the interval estimation. We
construct the confidence interval for P1 − P2 using the
following formula.

CONFIDENCE INTERVAL FOR P1 − P2
The (1 − α)100% confidence interval for P1 − P2 is
(p1 − p2) ± z sp1−p2

where the value of z is read from the normal distribution
table for the given confidence level, and sp1−p2 is
calculated as

sp1−p2 = √(p1q1/n1 + p2q2/n2)

Example 6-13 describes the procedure to make a confidence
interval for the difference between two population
proportions for large samples.

Example 6-13 A researcher wanted to estimate the difference
between the percentages of users of two toothpastes who will
never switch to another toothpaste. In a sample of 500 users
of Toothpaste A taken by this researcher, 100 said that they
will never switch to another toothpaste. In another sample of
400 users of Toothpaste B taken by the same researcher, 68
said that they will never switch to another toothpaste.
(a) Let P1 and P2 be the proportions of all users of
Toothpastes A and B, respectively, who will never switch
to another toothpaste. What is the point estimate of
P1 − P2?
(b) Construct a 97% confidence interval for the difference
between the proportions of users of the two toothpastes
who will never switch.
Solution Let P1 and P2 be the proportions of all users of
Toothpastes A and B, respectively, who will never switch to
another toothpaste and let p1 and p2 be the respective sample
proportions. Let x1 and x2 be the number of users of
Toothpastes A and B, respectively, in the two samples who
said that they will never switch to another toothpaste. From
the given information;
Toothpaste A: n1 = 500 and x1 = 100
Toothpaste B: n2 = 400 and x2 = 68
The two sample proportions are calculated as follows.
p1 = x1 / n1 = 100 / 500 = .20
p2 = x2 / n2 = 68 / 400 = .17
Then,
q1 = 1 − .20 = .80 and q2 = 1 − .17 = .83
(a) The point estimate of P1 − P2 is
p1 − p2 = .20 − .17 = .03
(b) The values of n1p1, n1q1, n2p2, and n2q2 are:
n1p1 = 500 (.20) = 100, n1q1 = 500 (.80) = 400,
n2p2 = 400 (.17) = 68, n2q2 = 400 (.83) = 332.
Since each of these values is greater than 5, both sample sizes
are large. Consequently, we use the normal distribution to
make a confidence interval for P1 − P2.

The estimate of the standard deviation of p1 − p2 is

sp1−p2 = √(p1q1/n1 + p2q2/n2)
= √[(0.20)(0.80)/500 + (0.17)(0.83)/400]
= 0.0259
The z value for a 97% confidence level, obtained from
the normal distribution table for .97/2 = .4850, is 2.17. The
97% confidence interval for P1 − P2 is
(p1 − p2) ± z sp1−p2 = (0.20 − 0.17) ± 2.17 (0.0259)
= .03 ± .056 = −.026 to .086

Thus, with 97% confidence we can state that the
difference between the two population proportions
is between −.026 and .086.
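
The calculation in Example 6-13 can be reproduced with the Python standard library. In this sketch the z value is obtained from statistics.NormalDist instead of a printed table; this substitution and the variable names are our own choices.

```python
import math
from statistics import NormalDist

n1, x1 = 500, 100  # Toothpaste A
n2, x2 = 400, 68   # Toothpaste B
confidence = 0.97

p1, p2 = x1 / n1, x2 / n2
q1, q2 = 1 - p1, 1 - p2
s_diff = math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)

# z value that leaves (1 - confidence) / 2 in each tail
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z * s_diff
lower, upper = (p1 - p2) - margin, (p1 - p2) + margin
print(f"z = {z:.2f}, s = {s_diff:.4f}, CI: {lower:.3f} to {upper:.3f}")
```

Rounded to three decimal places, the limits match the interval −.026 to .086 obtained above.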

6.4.3 HYPOTHESIS TESTING ABOUT P1 − P2
In this section we learn how to test a hypothesis
about P1 − P2 for two large and independent samples. The
procedure involves the same five steps that we have used
previously. Once again, we calculate the standard deviation
of p1 − p2 as

σp1−p2 = √(P1Q1/n1 + P2Q2/n2)

where Q1 = 1 − P1 and Q2 = 1 − P2.
When a test of hypothesis about P1 − P2 is performed, usually
the null hypothesis is P1 = P2 and the values of P1 and P2 are
not known. Assuming that the null hypothesis is true and
P1 = P2, a common value of P1 and P2, denoted by p̄, is
calculated by using one of the following formulas.

p̄ = (x1 + x2) / (n1 + n2) or
p̄ = (n1p1 + n2p2) / (n1 + n2)

Which of these formulas is used depends on whether the
values of x1 and x2 or the values of p1 and p2 are known. Note
that x1 and x2 are the number of elements in each of the two
samples that possess a certain characteristic. This value of p̄
is called the pooled sample proportion. Using the value of the
pooled sample proportion, we compute an estimate of the
standard deviation of p1 − p2:

sp1−p2 = √[ p̄ q̄ (1/n1 + 1/n2) ]

where q̄ = 1 − p̄

TEST STATISTIC z FOR p1 − p2
The value of the test statistic z for p1 − p2 is calculated as:
z = [ (p1 − p2) − (P1 − P2) ] / sp1−p2
The value of P1 − P2 is substituted from H0, which usually
is zero.

Examples 6-14 and 6-15 illustrate the procedure to test
hypotheses about the difference between two population
proportions for large samples.

Example 6-14 Reconsider Example 6-13 about the
percentages of users of two toothpastes who will never switch
to another toothpaste. At the 1% significance level, can we
conclude that the proportion of users of Toothpaste A who
will never switch to another toothpaste is higher than the
proportion of users of Toothpaste B who will never switch to
another toothpaste?

Solution: Let P1 and P2 be the proportions of all users of
Toothpastes A and B, respectively, who will never switch to
another toothpaste and let p1 and p2 be the corresponding
sample proportions. Let x1 and x2 be the number of users of
Toothpastes A and B, respectively, in the two samples who
said that they will never switch to another toothpaste. From
the given information,
Toothpaste A: n1 = 500 and x1 = 100
Toothpaste B: n2 = 400 and x2 = 68

The significance level is α = .01.
The two sample proportions are calculated as follows:
p1 = x1 / n1 = 100 / 500 = .20
p2 = x2 / n2 = 68 / 400 = .17

Step 1. State the null and alternative hypotheses
We are to test if the proportion of users of Toothpaste
A who will never switch to another toothpaste is higher than
the proportion of users of Toothpaste B who will never switch
to another toothpaste. In other words, we are to test whether
P1 is greater than P2. This can be written as P1 − P2 > 0. Thus,
the two hypotheses are:
H0 : P1 − P2 = 0 ( P1 is not greater than P2 )
H1 : P1 − P2 > 0 ( P1 is greater than P2 )

Step 2. Select the distribution to use
As shown in Example 6-13, n1p1, n1q1, n2p2, and n2q2
are all greater than 5. Consequently, both samples are large
and we apply the normal distribution to make the test.
Step 3. Determine the rejection and non-rejection regions
The > sign in the alternative hypothesis indicates that the test
is right-tailed. From the normal distribution table, for a .01
significance level, the critical value of z is 2.33. This is shown
in Figure 6.8.
Figure 6.8.

Step 4. Calculate the value of the test statistic
The pooled sample proportion is

p̄ = (x1 + x2) / (n1 + n2)
= (100 + 68) / (500 + 400) = 0.187
and
q̄ = 1 − p̄ = 1 − .187 = .813
The estimate of the standard deviation of p1 − p2 is

sp1−p2 = √[ p̄ q̄ (1/n1 + 1/n2) ]
= √[ (0.187)(0.813)(1/500 + 1/400) ] = 0.0262

The value of the test statistic z for p1 − p2 is
z = [ (p1 − p2) − (P1 − P2) ] / sp1−p2
= [ (0.20 − 0.17) − 0 ] / 0.0262 = 1.15

Step 5. Make a decision
Since the value of the test statistic z = 1.15 for p1 − p2
falls in the non-rejection region, we fail to reject the null
hypothesis. Therefore, we cannot conclude that the proportion
of users of Toothpaste A who will never switch to another
toothpaste is greater than the proportion of users of
Toothpaste B who will never switch to another toothpaste.
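
The test in Example 6-14 can be sketched in Python as follows. The function name pooled_z is hypothetical, and the critical value is computed with statistics.NormalDist rather than read from a table. Without intermediate rounding the statistic is about 1.148, which rounds to the 1.15 obtained above.

```python
import math
from statistics import NormalDist


def pooled_z(x1, n1, x2, n2, null_diff=0.0):
    """z statistic for p1 - p2 using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    q_pooled = 1 - p_pooled
    s_diff = math.sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))
    return ((p1 - p2) - null_diff) / s_diff


alpha = 0.01
z = pooled_z(100, 500, 68, 400)
z_critical = NormalDist().inv_cdf(1 - alpha)  # right-tailed critical value
reject_h0 = z > z_critical
print(f"z = {z:.2f}, critical z = {z_critical:.2f}, reject H0: {reject_h0}")
```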
Example 6-15 According to a survey conducted by the
National Center for Health Statistics, 60.8% of men and
67.5% of women in some countries acknowledged "a lot" or
"a moderate" amount of stress in their lives. Suppose these
results are based on samples of 1600 men and 1400 women.
Using the 1% significance level, can we conclude that the
proportions of men and women who experience "a lot" or "a
moderate" amount of stress in their lives are different?
Solution: Let P1 and P2 be the proportion of all men and
all women, respectively, who experience "a lot" or "a
moderate" amount of stress in their lives. Let p1 and p2 be
the corresponding sample proportions. From the given
information,
For men: n1 = 1600 and p1 = .608
For women: n2 = 1400 and p2 = .675
The significance level is α = .01.
Step 1. State the null and alternative hypotheses
The null and alternative hypotheses are
H0 : P1 − P2 = 0
(the two population proportions are not different)
H1 : P1 − P2 ≠ 0
(the two population proportions are different)

Step 2. Select the distribution to use
Because the samples are large and independent, we
apply the normal distribution to make the test.
(The reader should check that n1p1, n1q1, n2p2, and n2q2
are all greater than 5.)
Step 3. Determine the rejection and non-rejection regions
The ≠ sign in the alternative hypothesis indicates that
the test is two-tailed. For a 1% significance level, the critical
values of z are −2.58 and 2.58. These values are shown in
Figure 6.9.
Step 4. Calculate the value of the test statistic
The pooled sample proportion is
p̄ = (n1p1 + n2p2) / (n1 + n2)
= [1600(.608) + 1400(.675)] / (1600 + 1400) = 0.639

and q̄ = 1 − p̄ = 1 − 0.639 = 0.361
The estimate of the standard deviation of p1 − p2 is

sp1−p2 = √[ p̄ q̄ (1/n1 + 1/n2) ]
= √[ (0.639)(0.361)(1/1600 + 1/1400) ] = 0.0176

The value of the test statistic z for p1 − p2 is

z = [ (p1 − p2) − (P1 − P2) ] / sp1−p2
= [ (0.608 − 0.675) − 0 ] / 0.0176 = −3.81

Step 5. Make a decision
Since the value of the test statistic z = −3.81 for p1 − p2
falls in the rejection region, we reject the null hypothesis. As
a result, we can conclude that the proportions of men and
women who experience "a lot" or "a moderate" amount of
stress in their lives are different.
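
Example 6-15 starts from sample proportions rather than counts, so the pooled proportion uses the second of the two formulas given earlier. A minimal Python sketch, with our own variable names and the critical value computed by statistics.NormalDist instead of a table:

```python
import math
from statistics import NormalDist

n1, p1 = 1600, 0.608  # men
n2, p2 = 1400, 0.675  # women
alpha = 0.01

# Pooled proportion from sample proportions rather than counts
p_pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
q_pooled = 1 - p_pooled
s_diff = math.sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))

z = ((p1 - p2) - 0.0) / s_diff
z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
reject_h0 = abs(z) > z_critical  # two-tailed test
print(f"pooled p = {p_pooled:.3f}, z = {z:.2f}, reject H0: {reject_h0}")
```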

Figure 6.9

EXERCISES
1. Construct a 99% confidence interval for μ1 − μ2 for the
following.
n1 = 80 , x̄1 = 12.35 , s1 = 2.68
n2 = 65 , x̄2 = 16.40 , s2 = 2.90

2. Refer to Exercise 1. Test at the 1% significance level if the
two population means are different.

3. Refer to Exercise 1. Test at the 5% significance level if μ1 is
less than μ2.

4. Assuming that the two populations are normally distributed
with equal standard deviations, construct a 90% confidence
interval for μ1 − μ2 for the following.

n1 = 18 , x̄1 = 34.40 , s1 = 6.7
n2 = 22 , x̄2 = 26.50 , s2 = 7.1

5. Refer to Exercise 4. Test at the 5% significance level if the
two population means are different.

6. Refer to Exercise 4. Test at the 2.5% significance level if
μ1 is greater than μ2.

7. Determine the confidence interval for μd for each of the
following assuming that the population of paired differences
has a normal distribution.

8. Perform the stated test of hypothesis for each of the following
assuming that the population of paired differences has a
normal distribution.

9. Construct a 99% confidence interval for P1 − P2 for the
following.
n1 = 250 , p1 = .37 , n2 = 340 , p2 = .31
10. Refer to Exercise 9. Test at the 1% significance level if the
two population proportions are different.
11. Refer to Exercise 9. Test at the 2% significance level if P1 is
greater than P2 .
12. A mathematics proficiency test was given to 905
randomly selected 13- year-old students. The following
table gives the mean scores of male and female students
along with the standard deviations of the sample means.

Male students x̄1 = 474.6 s1 = 6.4
Female students x̄2 = 473.2 s2 = 5.1

a. Construct a 99% confidence interval for the difference
between the two population means. Assume that the
two samples are large.
b. Test at the 5% significance level if the mean scores in
the mathematics proficiency test are different for all
male and all female 13-year-old students.
13. A consulting agency was asked by a large insurance
company to investigate if business majors were better
salespersons. A sample of 40 salespersons with a
business degree showed that they sold an average of 10
insurance policies per week with a standard deviation of
1.80 policies. Another sample of 45 salespersons with a
degree other than business showed that they sold an
average of 8.5 insurance policies per week with a
standard deviation of 1.35 policies.
a. Construct a 99% confidence interval for the difference
between the two population means.
b. Using the 1% significance level, can you conclude that
persons with a business degree are better salespersons
than those who have a degree in another area?
14. According to an estimate, the average earnings of female
workers who are not union members are 348 EGP per
week and those of female workers who are union
members are 467 EGP per week. Suppose that these
average earnings are calculated based on random
samples of 1500 female workers who are not union
members and 2000 female workers who are union
members. Further assume that the standard deviations
for these two samples are 30 EGP and 35 EGP,
respectively.
a. Construct a 95% confidence interval for the difference
between the two population means.
b. Test at the 2.5% significance level if the mean weekly
earnings of female workers who are not union
members are less than those of female workers who
are union members.
15. A researcher wants to test if the mean GPAs (Grade
Point Averages) of all male and all female college
students who actively participate in sports are different.
She took a random sample of 28 male students and 24
female students who are actively involved in sports. She
found the mean GPAs of the two groups to be 2.62 and
2.74, respectively, with the corresponding standard
deviations equal to .43 and .38.
a. Test at the 5% significance level if the mean GPAs of
the two populations are different.
b. Construct a 90% confidence interval for the difference
between the two population means.
Assume that the GPAs of all male and all female
student athletes both have a normal distribution with
the same standard deviation.
16. An agency wanted to estimate the difference between the
auto insurance premiums paid by drivers insured with
two different insurance companies. A random sample of
25 drivers insured with insurance company A showed
that they paid an average monthly insurance premium
of 83 EGP with a standard deviation of 14 EGP. Another
random sample of 20 drivers insured with insurance
company B showed that these drivers paid an average
monthly insurance premium of 76 EGP with a standard
deviation of 12 EGP. Assume that the insurance
premiums paid by all drivers insured with companies A
and B are both normally distributed with equal standard
deviations.
a. Construct a 99% confidence interval for the difference
between the two population means.
b. Test at the 1% significance level if the mean monthly
insurance premium paid by drivers insured with
company A is higher than that of drivers insured
with company B.
17. A random sample of 28 children selected from families
with only one child gave a mean tolerance level of 2.4 (on
a scale of 1 to 8) with a standard deviation of .62.
Another random sample of 25 children selected from
families with more than one child gave a mean tolerance
level of 3.5 with a standard deviation of .47.
a. Construct a 99% confidence interval for the
difference between the two population means.
b. Test at the 5% significance level if the mean tolerance
level for children from families with only one child is
lower than that for children from families with more
than one child.
Assume that the tolerance levels for all children
in both groups have a normal distribution with the
same standard deviation.
18. A random sample of eight students was selected to test
for the effectiveness of hypnosis on their academic
performances. The following table gives the GPAs for
the semester before and the semester after the students
tried hypnosis.

Before  2.3  2.8  3.1  2.7  3.4  2.6  2.8  2.5
After   2.6  3.2  3.0  3.5  3.7  2.4  2.9  2.9
a. Construct a 99% confidence interval for the mean μd
of the population paired differences where a paired
difference is defined as the difference between the
GPAs of a student before and after trying hypnosis.
b. Using a 2.5% significance level, can you conclude that
the academic performance of all students improves
due to hypnotism?
Assume that the population of paired differences is
(approximately) normally distributed.
19. A random sample of nine students was selected to test
for the effectiveness of a special course designed to
improve memory. The following table gives the results of
a memory test given to these students before and after
this course.
Before 43 57 48 65 71 49 38 69 58
After 49 56 55 77 79 57 36 64 69
a. Construct a 95% confidence interval for the mean μd of
the population paired differences where a paired
difference is defined as the difference between the
memory test scores of a student before and after
attending this course.
b. Test at the 1% significance level if this course makes
any statistically significant improvement in the
memory of all students.
Assume that the population of the paired
differences has a normal distribution.
20. According to a survey of 1000 men and 900 women, 21%
of men and 28% of women read for fun almost every day.
a. Construct a 95% confidence interval for the difference
between the proportions of all men and all women who
read for fun every day.
b. Test at the 2% significance level if the proportions of all
men and all women who read for fun every day are
different.
21. According to the Bureau of Labor Statistics, of all
married couples with children under 18, the percentage in
which both spouses work was 57.9% in 1993 and 46.3%
in 1983. Suppose these results are based on samples of
2000 and 1600 such couples for 1993 and 1983,
respectively.
a. Construct a 99% confidence interval for the difference
between the two population proportions.
b. Test at the 2.5% significance level if the proportion of
all such couples in which both spouses work is higher
for 1993 than for 1983.
22. According to a National Cancer Institute survey, 32% of
children aged 6-11 and 26% of children aged 12-18 eat
less than one vegetable serving per day. Assume that this
survey included 800 children aged 6 - 11 and 900
children aged 12-18.
a. Construct a 99% confidence interval for the difference
between the two proportions of all children aged 6-11
and 12-18 who eat less than one vegetable serving per
day.
b. Test at the 1% significance level if the proportion of
all children aged 6-11 who eat less than one vegetable
serving per day is higher than the proportion of all
children aged 12-18 who eat less than one vegetable
serving per day.
23. According to a report published by the Education
Testing Center, the percentage of sixth graders who do
one hour or more of homework a day is 59% for one school
and 69% for another school. Suppose these results are
based on samples of 1500 and 1800 such students
selected from the two schools, respectively.
a. Construct a 97% confidence interval for the difference
between the proportions of all sixth graders in these
two schools who do one hour or more of homework a
day.
b. Using the 1% significance level, can you conclude that
the proportion of all sixth graders who do one hour
or more of homework a day is lower for the first
school than for the second school?
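
The exercises above fall into three patterns: two independent small samples with equal standard deviations, paired samples, and two large-sample proportions. The following Python sketch shows how the sample quantities for each pattern might be computed, using the data of Exercises 16, 18, and 20 as inputs. The function names are illustrative and do not come from the text. The sketch assumes the usual pooled-standard-deviation formulas, and it takes each paired difference as before minus after. The critical values of t still have to be read from the t distribution table for the stated degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def pooled_t(n1, mean1, s1, n2, mean2, s2):
    """Return (standard error, t statistic, degrees of freedom) for two
    independent samples with equal population standard deviations."""
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)  # pooled std dev
    se = sp * sqrt(1 / n1 + 1 / n2)
    return se, (mean1 - mean2) / se, df


def paired_t(before, after):
    """Return (mean difference, standard error, t statistic, df), with
    each paired difference taken as before minus after (an assumption)."""
    d = [b - a for b, a in zip(before, after)]
    d_bar = mean(d)
    se = stdev(d) / sqrt(len(d))  # stdev divides by n - 1
    return d_bar, se, d_bar / se, len(d) - 1


def two_proportion_z(p1, n1, p2, n2, confidence):
    """Return (confidence interval for p1 - p2, z statistic for H0: p1 = p2)."""
    diff = p1 - p2
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # The interval uses the unpooled standard error.
    se_ci = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # The test uses the pooled proportion, since H0 says p1 = p2.
    p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_test = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (diff - z_crit * se_ci, diff + z_crit * se_ci), diff / se_test


# Exercise 16: insurance premiums for companies A and B
se16, t16, df16 = pooled_t(25, 83, 14, 20, 76, 12)

# Exercise 18: GPAs before and after hypnosis
before = [2.3, 2.8, 3.1, 2.7, 3.4, 2.6, 2.8, 2.5]
after = [2.6, 3.2, 3.0, 3.5, 3.7, 2.4, 2.9, 2.9]
d_bar18, se18, t18, df18 = paired_t(before, after)

# Exercise 20: men and women who read for fun
ci20, z20 = two_proportion_z(0.21, 1000, 0.28, 900, 0.95)

print(f"Ex. 16: se = {se16:.3f}, t = {t16:.3f}, df = {df16}")
print(f"Ex. 18: d-bar = {d_bar18:.3f}, se = {se18:.3f}, t = {t18:.3f}, df = {df18}")
print(f"Ex. 20: CI = ({ci20[0]:.4f}, {ci20[1]:.4f}), z = {z20:.3f}")
```

The same functions can be reused for the other exercises of each type by changing the arguments. For a confidence interval for a difference between means, multiply the standard error by the tabulated t value and add and subtract the product from the point estimate.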

SELF-REVIEW TEST
1. To test the hypothesis that the mean blood pressure of
university professors is lower than that of company
executives, which of the following would you use?
a. A left-tailed test
b. A two-tailed test
c. A right-tailed test
2. Briefly explain the meaning of independent and
dependent samples. Give one example of each of these
cases.
3. A company psychologist wanted to test if company
executives have job-related stress scores higher than those
of university professors. He took a sample of 40 executives
and 50 professors and tested them for job-related stress.
The sample of 40 executives gave a mean stress score of
7.6 with a standard deviation of .8. The sample of 50
professors produced a mean stress score of 5.4 with a
standard deviation of 1.3.
a. Construct a 99% confidence interval for the difference
between the mean stress scores of all executives and
all professors.
b. Test at the 2.5% significance level if the mean stress
score of all executives is higher than that of all
professors.
4. A sample of 20 employee mothers showed that they spend
an average of 2.3 hours per week playing with their
children with a standard deviation of .54 hours. A sample
of 25 nonemployee mothers gave a mean of 4.6 hours per
week with a standard deviation of .8 hours.
a. Construct a 95% confidence interval for the difference
between the mean time spent per week playing with
their children by all employee and all nonemployee
mothers.
b. Test at the 1% significance level if the mean time spent
per week playing with their children by all employee
mothers is less than that of nonemployee mothers.
Assume that the times spent per week playing with
their children by all employee and all nonemployee
mothers both are normally distributed with equal but
unknown standard deviations.
5. The following table gives the number of items made in one
hour by seven randomly selected workers on two different
machines.

Worker 1 2 3 4 5 6 7
Machine I 15 18 14 20 16 18 21
Machine II 16 20 13 23 19 18 20
a. Construct a 99% confidence interval for the mean μd
of the population paired differences where a paired
difference is equal to the number of items made by an
employee in one hour on Machine I minus the number
of items made by the same employee in one hour on
Machine II.
b. Test at the 5% significance level if the mean μd of the
population paired differences is different from zero.
Assume that the population of paired differences is
(approximately) normally distributed.
6. A sample of 500 male registered voters showed that 51% of
them voted in an election. Another sample of 400 female
registered voters showed that 55% of them voted in the
same election.
a. Construct a 97% confidence interval for the difference
between the proportions of all male and all female
registered voters who voted in the election.

b. Test at the 1% significance level if the proportion of all
male voters who voted in the election is different from
that of all female voters.
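
Problem 3 of the self-review test involves two large independent samples (40 and 50 observations), so the normal distribution can be used with the sample standard deviations standing in for the unknown population values. The Python sketch below shows one way to carry out that calculation. The function name is illustrative only, and the large-sample normal approximation is an assumption of the sketch.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_z(n1, mean1, s1, n2, mean2, s2, confidence):
    """Return (confidence interval for mu1 - mu2, z statistic for
    H0: mu1 = mu2), assuming both samples are large and independent."""
    diff = mean1 - mean2
    # Standard error of the difference between the two sample means
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return (diff - z_crit * se, diff + z_crit * se), diff / se


# Problem 3: stress scores of executives (sample 1) and professors (sample 2)
ci3, z3 = large_sample_z(40, 7.6, 0.8, 50, 5.4, 1.3, 0.99)
print(f"99% CI = ({ci3[0]:.3f}, {ci3[1]:.3f}), z = {z3:.3f}")
```

For the right-tailed test in part b, the observed z is compared with the critical value that leaves an area of .025 in the right tail, which `NormalDist().inv_cdf(0.975)` returns.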

Appendix