Chapter 12
Comparing Multiple Proportions, Test of Independence and Goodness of Fit
CONTENTS
STATISTICS IN PRACTICE: UNITED WAY
12.1 TESTING THE EQUALITY OF POPULATION PROPORTIONS FOR THREE OR MORE POPULATIONS
  A Multiple Comparison Procedure
12.2 TEST OF INDEPENDENCE
12.3 GOODNESS OF FIT TEST
  Multinomial Probability Distribution
  Normal Probability Distribution
APPENDIXES
12.1 CHI-SQUARE TESTS USING MINITAB
12.2 CHI-SQUARE TESTS USING EXCEL
Copyright 2018 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. WCN 02-200-203
510 Chapter 12 Comparing Multiple Proportions, Test of Independence and Goodness of Fit
STATISTICS in PRACTICE
UNITED WAY*
ROCHESTER, NEW YORK
United Way of Greater Rochester is a nonprofit organization dedicated to improving the quality of life for all people in the seven counties it serves by meeting the community’s most important human care needs.
The annual United Way/Red Cross fund-raising campaign funds hundreds of programs offered by more than 200 service providers. These providers meet a
In Chapters 9, 10, and 11 we introduced methods of statistical inference for hypothesis tests
about the means, proportions, and variances of one and two populations. In this chapter,
we introduce three additional hypothesis-testing procedures that expand our capacity for
making statistical inferences about populations.
12.1 Testing the Equality of Population Proportions for Three or More Populations 511
The test statistic used in conducting the hypothesis tests in this chapter is based on the chi-square ($\chi^2$) distribution. In all cases, the data are categorical. These chi-square tests are versatile and expand hypothesis testing with the following applications.
1. Testing the equality of population proportions for three or more populations
2. Testing the independence of two categorical variables
3. Testing whether a probability distribution for a population follows a specific historical or theoretical probability distribution
We begin by considering hypothesis tests for the equality of population proportions for
three or more populations.
The hypotheses for the equality of population proportions for k ≥ 3 populations are as follows:
H0: p1 = p2 = . . . = pk
Ha: Not all population proportions are equal
If the sample data and the chi-square test computations indicate H0 cannot be rejected, we
cannot detect a difference among the k population proportions. However, if the sample data
and the chi-square test computations indicate H0 can be rejected, we have the statistical
evidence to conclude that not all k population proportions are equal; that is, one or more
population proportions differ from the other population proportions. Further analysis can then identify which population proportion or proportions differ significantly from the others. Let us demonstrate this chi-square test by considering an application.
Organizations such as J.D. Power and Associates use the proportion of owners likely to
repurchase a particular automobile as an indication of customer loyalty for the automobile.
An automobile with a greater proportion of owners likely to repurchase is concluded to
have greater customer loyalty. Suppose that in a particular study we want to compare the
customer loyalty for three automobiles: Chevrolet Impala, Ford Fusion, and Honda Accord.
The current owners of each of the three automobiles form the three populations for the
study. The three population proportions of interest are as follows:

p1 = proportion likely to repurchase an Impala for the population of Chevrolet Impala owners
p2 = proportion likely to repurchase a Fusion for the population of Ford Fusion owners
p3 = proportion likely to repurchase an Accord for the population of Honda Accord owners

The hypotheses are as follows:

H0: p1 = p2 = p3
Ha: Not all population proportions are equal

TABLE 12.1 SAMPLE RESULTS OF LIKELY TO REPURCHASE FOR THREE POPULATIONS OF AUTOMOBILE OWNERS (OBSERVED FREQUENCIES)

                                        Automobile Owners
Likely to Repurchase   Chevrolet Impala   Ford Fusion   Honda Accord   Total
Yes                           69              120            123         312
No                            56               80             52         188
Total                        125              200            175         500
To conduct this hypothesis test we begin by taking a sample of owners from each of the three populations. Thus we will have a sample of Chevrolet Impala owners, a sample of Ford Fusion owners, and a sample of Honda Accord owners. Each sample provides categorical data indicating whether the respondents are likely or not likely to repurchase the automobile. The data for samples of 125 Chevrolet Impala owners, 200 Ford Fusion owners, and 175 Honda Accord owners are summarized in the tabular format shown in Table 12.1. This table has two rows, for the responses Yes and No, and three columns, one corresponding to each of the populations. The observed frequencies are summarized in the six cells of the table corresponding to each combination of the likely to repurchase responses and the three populations.

In studies such as these, we often use the same sample size for each population. We have chosen different sample sizes in this example to show that the chi-square test is not restricted to equal sample sizes for each of the k populations.

Using Table 12.1, we see that 69 of the 125 Chevrolet Impala owners indicated that they were likely to repurchase a Chevrolet Impala. One hundred and twenty of the 200 Ford Fusion owners and 123 of the 175 Honda Accord owners indicated that they were likely to repurchase their current automobile. Also, across all three samples, 312 of the 500 owners in the study indicated that they were likely to repurchase their current automobile. The question now is how to analyze the data in Table 12.1 to determine whether the hypothesis H0: p1 = p2 = p3 should be rejected.
The data in Table 12.1 are the observed frequencies for each of the six cells that repre-
sent the six combinations of the likely to repurchase response and the owner population. If
we can determine the expected frequencies under the assumption H0 is true, we can use the
chi-square test statistic to determine whether there is a significant difference between the
observed and expected frequencies. If a significant difference exists between the observed
and expected frequencies, the hypothesis H0 can be rejected and there is evidence that not
all the population proportions are equal.
Expected frequencies for the six cells of the table are based on the following rationale.
First, we assume that the null hypothesis of equal population proportions is true. Then we
note that in the entire sample of 500 owners, a total of 312 owners indicated that they were
likely to repurchase their current automobile. Thus, 312/500 = .624 is the overall sample
proportion of owners indicating they are likely to repurchase their current automobile. If
H0: p1 = p2 = p3 is true, .624 would be the best estimate of the proportion responding likely
to repurchase for each of the automobile owner populations. So if the assumption of H0 is
true, we would expect .624 of the 125 Chevrolet Impala owners, or .624(125) = 78 owners
to indicate they are likely to repurchase the Impala. Using the .624 overall sample proportion,
we would expect .624(200) = 124.8 of the 200 Ford Fusion owners and .624(175) = 109.2 of the 175 Honda Accord owners to respond that they are likely to repurchase their respective model of automobile.
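The pooled-proportion reasoning in the preceding paragraph can be checked in a few lines. This is a minimal Python sketch, not part of the text's Minitab/Excel workflow; the dictionary names are our own. It pools the three samples to estimate the common proportion under H0 and multiplies by each sample size to get the expected Yes counts, and by 1 − .624 = .376 to get the expected No counts.

```python
# Table 12.1: number of "Yes" (likely to repurchase) responses and sample sizes
yes = {"Impala": 69, "Fusion": 120, "Accord": 123}
sizes = {"Impala": 125, "Fusion": 200, "Accord": 175}

# Under H0 the three populations share one proportion; pool the samples to estimate it
p_pooled = sum(yes.values()) / sum(sizes.values())  # 312 / 500

# Expected Yes and No counts for each population if H0 is true
expected_yes = {car: p_pooled * n for car, n in sizes.items()}
expected_no = {car: (1 - p_pooled) * n for car, n in sizes.items()}

print(p_pooled)  # 0.624
for car in sizes:
    print(car, round(expected_yes[car], 1), round(expected_no[car], 1))
```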
Let us generalize the approach to computing expected frequencies by letting eij denote the expected frequency for the cell in row i and column j of the table. With this notation, reconsider the expected frequency calculation for the response of likely to repurchase Yes (row 1) for Chevrolet Impala owners (column 1), that is, the expected frequency e11.
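Note that 312 is the total number of Yes responses (row 1 total), 125 is the total sample size for Chevrolet Impala owners (column 1 total), and 500 is the total sample size. Following the logic in the preceding paragraph, we can show

$$e_{11} = \left(\frac{312}{500}\right)(125) = (.624)(125) = 78$$

Starting with the first part of the above expression, we can write

$$e_{11} = \left(\frac{\text{Row 1 Total}}{\text{Total Sample Size}}\right)(\text{Column 1 Total}) = \frac{(\text{Row 1 Total})(\text{Column 1 Total})}{\text{Total Sample Size}}$$

Generalizing this expression shows that the following formula can be used to provide the expected frequencies under the assumption H0 is true.

$$e_{ij} = \frac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{\text{Total Sample Size}} \qquad (12.1)$$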
Using equation (12.1), we see that the expected frequency of Yes responses (row 1) for
Honda Accord owners (column 3) would be e13 = (Row 1 Total)(Column 3 Total)/(Total
Sample Size) = (312)(175)/500 = 109.2. Use equation (12.1) to verify the other expected
frequencies are as shown in Table 12.2.
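Equation (12.1) applies cell by cell to any table of observed frequencies, which makes it easy to verify all six entries of Table 12.2 at once. The sketch below is a hypothetical helper of our own (plain Python, no statistics library assumed); the table is stored as a list of rows, Yes first and No second.

```python
def expected_frequencies(observed):
    """Equation (12.1): e_ij = (row i total)(column j total) / total sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


# Table 12.1: rows are Yes/No, columns are Impala, Fusion, Accord
observed = [[69, 120, 123],
            [56, 80, 52]]

for row in expected_frequencies(observed):
    print([round(e, 1) for e in row])
# [78.0, 124.8, 109.2]
# [47.0, 75.2, 65.8]
```

Because every expected count is built from the observed margins, each row and column of the expected table sums to the same total as in Table 12.1.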
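The test procedure for comparing the observed frequencies of Table 12.1 with the expected frequencies of Table 12.2 involves the computation of the following chi-square statistic:

$$\chi^2 = \sum_i \sum_j \frac{(f_{ij} - e_{ij})^2}{e_{ij}} \qquad (12.2)$$

where

fij = observed frequency for the cell in row i and column j
eij = expected frequency for the cell in row i and column j under the assumption H0 is true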
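TABLE 12.2 EXPECTED FREQUENCIES FOR LIKELY TO REPURCHASE FOR THREE POPULATIONS OF AUTOMOBILE OWNERS IF H0 IS TRUE

                                        Automobile Owners
Likely to Repurchase   Chevrolet Impala   Ford Fusion   Honda Accord   Total
Yes                           78             124.8          109.2        312
No                            47              75.2           65.8        188
Total                        125             200            175          500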
Reviewing the expected frequencies in Table 12.2, we see that the expected frequency is at least five for each cell in the table. We therefore proceed with the computation of the chi-square test statistic. The calculations necessary to compute the value of the test statistic are shown in Table 12.3. In this case, we see that the value of the test statistic is $\chi^2 = 7.89$.
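The arithmetic of Table 12.3 follows directly from equation (12.2). Below is a minimal Python sketch of that computation; the function names are our own illustrations, not from the text or any library. It reproduces the last column of Table 12.3 and its sum.

```python
def expected_frequencies(observed):
    """Equation (12.1): e_ij = (row i total)(column j total) / total sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def cell_contributions(observed):
    """Last column of Table 12.3: (f_ij - e_ij)^2 / e_ij for every cell."""
    expected = expected_frequencies(observed)
    return [[(f - e) ** 2 / e for f, e in zip(f_row, e_row)]
            for f_row, e_row in zip(observed, expected)]


def chi_square_statistic(observed):
    """Equation (12.2): sum of the cell contributions over all rows and columns."""
    return sum(sum(row) for row in cell_contributions(observed))


observed = [[69, 120, 123],   # Yes: Impala, Fusion, Accord
            [56, 80, 52]]     # No:  Impala, Fusion, Accord

print([[round(c, 2) for c in row] for row in cell_contributions(observed)])
print(round(chi_square_statistic(observed), 2))  # 7.89
```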
In order to understand whether or not $\chi^2 = 7.89$ leads us to reject H0: p1 = p2 = p3,
you will need to understand and refer to values of the chi-square distribution. Table 12.4
shows the general shape of the chi-square distribution, but note that the shape of a specific
chi-square distribution depends upon the number of degrees of freedom. The table shows
the upper tail areas of .10, .05, .025, .01, and .005 for chi-square distributions with up to
15 degrees of freedom. This version of the chi-square table will enable you to conduct the
hypothesis tests presented in this chapter.
Since the expected frequencies shown in Table 12.2 are based on the assumption that H0: p1 = p2 = p3 is true, observed frequencies, fij, that are in agreement with expected frequencies, eij, provide small values of $(f_{ij} - e_{ij})^2$ in equation (12.2). If this is the case, the value of the chi-square test statistic will be relatively small and H0 cannot be rejected. On the other hand, if the differences between the observed and expected frequencies are large, the values of $(f_{ij} - e_{ij})^2$ and the computed value of the test statistic will be large. In this case, the null hypothesis of equal population proportions can be rejected. Thus a chi-square test for equal population proportions will always be an upper tail test with rejection of H0 occurring when the test statistic is in the upper tail of the chi-square distribution.

The chi-square test presented in this section is always a one-tailed test with the rejection of H0 occurring in the upper tail of the chi-square distribution.

We can use the upper tail area of the appropriate chi-square distribution and the p-value approach to determine whether the null hypothesis can be rejected. In the automobile brand loyalty study, the three owner populations indicate that the appropriate chi-square distribution has k − 1 = 3 − 1 = 2 degrees of freedom.
TABLE 12.3 COMPUTATION OF THE CHI-SQUARE TEST STATISTIC FOR THE TEST OF EQUAL POPULATION PROPORTIONS

Likely to     Automobile   Observed    Expected    Difference    Squared        Squared Difference Divided
Repurchase?   Owner        Frequency   Frequency                 Difference     by Expected Frequency
                           (fij)       (eij)       (fij − eij)   (fij − eij)²   (fij − eij)²/eij
Yes           Impala         69          78.0         −9.0          81.00          1.04
Yes           Fusion        120         124.8         −4.8          23.04          0.18
Yes           Accord        123         109.2         13.8         190.44          1.74
No            Impala         56          47.0          9.0          81.00          1.72
No            Fusion         80          75.2          4.8          23.04          0.31
No            Accord         52          65.8        −13.8         190.44          2.89
Total                       500         500                                        $\chi^2 = 7.89$
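[Table 12.4 illustration: the chi-square distribution with the upper tail area (probability) shaded, and row two of the table (2 degrees of freedom) showing where $\chi^2 = 7.89$ falls]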
We see the upper tail area at $\chi^2 = 7.89$ is between .025 and .01. Thus, the p-value must be between .025 and .01. With p-value ≤ .05, we reject H0 and conclude that the three population proportions are not all equal and thus there is a difference in brand loyalties among the Chevrolet Impala, Ford Fusion, and Honda Accord owners. Minitab or Excel procedures provided in Appendix F can be used to show that $\chi^2 = 7.89$ with 2 degrees of freedom yields a p-value of .0193.
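For readers without Minitab or Excel, the exact p-value can also be sketched in standard-library Python. The approach below is our own choice of numerical method, not necessarily what those packages use internally: the chi-square upper tail area equals 1 − P(df/2, x/2), where P is the regularized lower incomplete gamma function, evaluated here by its power series. The statistic is recomputed from the counts rather than typed in as the rounded 7.89.

```python
import math


def chi2_upper_tail(x, df, tol=1e-12, max_iter=10_000):
    """Upper tail area P(X >= x) for X ~ chi-square with df degrees of freedom.

    Uses P(X <= x) = P(df/2, x/2), the regularized lower incomplete gamma
    function, computed with its power series.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    denom = a
    for _ in range(max_iter):
        denom += 1.0
        term *= z / denom
        total += term
        if term < total * tol:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


# Recompute the unrounded test statistic from the counts in Table 12.1
observed = [[69, 120, 123], [56, 80, 52]]
rows = [sum(r) for r in observed]
cols = [sum(c) for c in zip(*observed)]
n = sum(rows)
chi2 = sum((observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
           for i in range(len(rows)) for j in range(len(cols)))

p_value = chi2_upper_tail(chi2, df=len(cols) - 1)  # k - 1 = 2 degrees of freedom
print(round(chi2, 2), round(p_value, 4))  # 7.89 0.0193
```

For 2 degrees of freedom the upper tail area has the closed form $e^{-x/2}$, which offers a quick sanity check on the series.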
Instead of using the p-value, we could use the critical value approach to draw the same conclusion. With $\alpha = .05$ and 2 degrees of freedom, the critical value for the chi-square test statistic is $\chi^2_{.05} = 5.991$. The upper tail rejection region becomes

Reject H0 if $\chi^2 \ge 5.991$

With 7.89 ≥ 5.991, we reject H0. Thus, the p-value approach and the critical value approach provide the same hypothesis-testing conclusion.
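The critical value 5.991 read from Table 12.4 can likewise be recovered numerically. This sketch, again our own and not the text's method, inverts the upper tail function by bisection, which works because the upper tail area decreases as x grows. The search interval [0, 200] is an assumption that comfortably covers critical values for the small degrees of freedom used in this chapter.

```python
import math


def chi2_upper_tail(x, df, tol=1e-12, max_iter=10_000):
    """P(X >= x) for X ~ chi-square(df), via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    denom = a
    for _ in range(max_iter):
        denom += 1.0
        term *= z / denom
        total += term
        if term < total * tol:
            break
    return max(0.0, 1.0 - total * math.exp(-z + a * math.log(z) - math.lgamma(a)))


def chi2_critical_value(alpha, df, hi=200.0, iters=100):
    """Find c with P(X >= c) = alpha by bisection; assumes 0 <= c <= hi."""
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if chi2_upper_tail(mid, df) > alpha:
            lo = mid  # tail area still larger than alpha: critical value lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


crit = chi2_critical_value(0.05, df=2)
print(round(crit, 3))                                      # 5.991
print("Reject H0" if 7.89 >= crit else "Do not reject H0")  # Reject H0
```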
Let us summarize the general steps that can be used to conduct a chi-square test for the equality of the population proportions for three or more populations.

1. State the null and alternative hypotheses
   H0: p1 = p2 = . . . = pk
   Ha: Not all population proportions are equal
2. Select a random sample from each of the populations and record the observed frequencies, fij, in a table with 2 rows and k columns
3. Assume the null hypothesis is true and compute the expected frequencies, eij
4. If the expected frequency, eij, is 5 or more for each cell, compute the test statistic:
   $$\chi^2 = \sum_i \sum_j \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$
5. Rejection rule:
   p-value approach: Reject H0 if p-value ≤ $\alpha$
   Critical value approach: Reject H0 if $\chi^2 \ge \chi^2_{\alpha}$
   where the chi-square distribution has k − 1 degrees of freedom and $\alpha$ is the level of significance for the test.
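The steps above can be bundled into a single function. The sketch below is hypothetical (the function name and its refusal to run when an expected frequency falls below 5 are our own design choices, mirroring the precondition in step 4), and it applies the p-value form of the rejection rule.

```python
import math


def chi2_upper_tail(x, df, tol=1e-12, max_iter=10_000):
    """P(X >= x) for X ~ chi-square(df), via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    denom = a
    for _ in range(max_iter):
        denom += 1.0
        term *= z / denom
        total += term
        if term < total * tol:
            break
    return max(0.0, 1.0 - total * math.exp(-z + a * math.log(z) - math.lgamma(a)))


def equal_proportions_test(yes, no, alpha=0.05):
    """Steps 2-5 for H0: p1 = p2 = ... = pk, given Yes and No counts for k samples."""
    observed = [list(yes), list(no)]                      # step 2: 2 x k table
    row_totals = [sum(row) for row in observed]
    col_totals = [y + n for y, n in zip(yes, no)]
    total = sum(col_totals)
    expected = [[r * c / total for c in col_totals] for r in row_totals]  # step 3
    if any(e < 5 for row in expected for e in row):       # step 4 precondition
        raise ValueError("an expected frequency is below 5")
    chi2 = sum((f - e) ** 2 / e
               for f_row, e_row in zip(observed, expected)
               for f, e in zip(f_row, e_row))
    df = len(col_totals) - 1                              # k - 1 degrees of freedom
    p_value = chi2_upper_tail(chi2, df)
    return chi2, df, p_value, p_value <= alpha            # step 5: p-value rule


chi2, df, p_value, reject = equal_proportions_test([69, 120, 123], [56, 80, 52])
print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}, reject H0: {reject}")
# chi2 = 7.89, df = 2, p-value = 0.0193, reject H0: True
```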
population proportions exist. For this we will rely on a multiple comparison procedure that can be used to conduct statistical tests between all pairs of population proportions. In the following, we discuss a multiple comparison procedure known as the Marascuilo procedure.
This is a relatively straightforward procedure for making pairwise comparisons of all pairs
of population proportions. We will demonstrate the computations required by this multiple
comparison test procedure for the automobile customer loyalty study.
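We begin by computing the absolute value of the pairwise difference between sample proportions for each pair of populations in the study. From Table 12.1, the sample proportions are $\bar{p}_1 = 69/125 = .5520$, $\bar{p}_2 = 120/200 = .6000$, and $\bar{p}_3 = 123/175 = .7029$. In the three-population automobile brand loyalty study we compare populations 1 and 2, populations 1 and 3, and then populations 2 and 3 using the sample proportions as follows:

Chevrolet Impala and Ford Fusion
$$|\bar{p}_1 - \bar{p}_2| = |.5520 - .6000| = .0480$$

Chevrolet Impala and Honda Accord
$$|\bar{p}_1 - \bar{p}_3| = |.5520 - .7029| = .1509$$

Ford Fusion and Honda Accord
$$|\bar{p}_2 - \bar{p}_3| = |.6000 - .7029| = .1029$$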
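In a second step, we select a level of significance and compute the corresponding critical value for each pairwise comparison using the following expression.

$$CV_{ij} = \sqrt{\chi^2_{\alpha}}\,\sqrt{\frac{\bar{p}_i(1 - \bar{p}_i)}{n_i} + \frac{\bar{p}_j(1 - \bar{p}_j)}{n_j}} \qquad (12.3)$$

where

$\chi^2_{\alpha}$ = chi-square with a level of significance $\alpha$ and k − 1 degrees of freedom
$\bar{p}_i$ and $\bar{p}_j$ = sample proportions for populations i and j
$n_i$ and $n_j$ = sample sizes for populations i and j

For example, with $\alpha = .05$ and 2 degrees of freedom, the critical value for the Chevrolet Impala and Ford Fusion comparison is $CV_{12} = \sqrt{5.991}\sqrt{.5520(.4480)/125 + .6000(.4000)/200} = .1380$.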
TABLE 12.5 PAIRWISE COMPARISON TESTS FOR THE AUTOMOBILE BRAND LOYALTY STUDY

Pairwise Comparison                  $|\bar{p}_i - \bar{p}_j|$   $CV_{ij}$   Significant if $|\bar{p}_i - \bar{p}_j| > CV_{ij}$
Chevrolet Impala vs. Ford Fusion           .0480               .1380       Not significant
Chevrolet Impala vs. Honda Accord          .1509               .1379       Significant
Ford Fusion vs. Honda Accord               .1029               .1198       Not significant
If the absolute value of any pairwise sample proportion difference $|\bar{p}_i - \bar{p}_j|$ exceeds its corresponding critical value, $CV_{ij}$, the pairwise difference is significant at the .05 level of significance and we can conclude that the two corresponding population proportions are different. The final step of the pairwise comparison procedure is summarized in Table 12.5.
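The Marascuilo computations above can be reproduced with a short Python sketch. The helper names are our own, and the chi-square critical value is found numerically by bisection (our choice of method) rather than read from Table 12.4. Each pair is compared using equation (12.3).

```python
import math
from itertools import combinations


def chi2_upper_tail(x, df, tol=1e-12, max_iter=10_000):
    """P(X >= x) for X ~ chi-square(df), via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    denom = a
    for _ in range(max_iter):
        denom += 1.0
        term *= z / denom
        total += term
        if term < total * tol:
            break
    return max(0.0, 1.0 - total * math.exp(-z + a * math.log(z) - math.lgamma(a)))


def chi2_critical_value(alpha, df, hi=200.0, iters=100):
    """Bisection for c with P(X >= c) = alpha; assumes 0 <= c <= hi."""
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if chi2_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def marascuilo(yes_counts, sizes, labels, alpha=0.05):
    """Pairwise comparisons of k proportions with critical values from equation (12.3)."""
    k = len(yes_counts)
    p = [x / n for x, n in zip(yes_counts, sizes)]
    chi2_alpha = chi2_critical_value(alpha, k - 1)  # k - 1 degrees of freedom
    results = []
    for i, j in combinations(range(k), 2):
        diff = abs(p[i] - p[j])
        cv = math.sqrt(chi2_alpha) * math.sqrt(
            p[i] * (1 - p[i]) / sizes[i] + p[j] * (1 - p[j]) / sizes[j])
        results.append((labels[i], labels[j], diff, cv, diff > cv))
    return results


results = marascuilo([69, 120, 123], [125, 200, 175],
                     ["Chevrolet Impala", "Ford Fusion", "Honda Accord"])
for first, second, diff, cv, significant in results:
    verdict = "Significant" if significant else "Not significant"
    print(f"{first} vs. {second}: {diff:.4f} {cv:.4f} {verdict}")
```

Because the sketch uses the unrounded critical value (about 5.9915) instead of 5.991, its critical values can differ from a hand calculation in the fourth decimal place.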
The conclusion from the pairwise comparison procedure is that the only significant
difference in customer loyalty occurs between the Chevrolet Impala and the Honda Accord.
Our sample results indicate that the Honda Accord had a greater population proportion of
owners who say they are likely to repurchase the Honda Accord. Thus, we can conclude that the Honda Accord ($\bar{p}_3 = .7029$) has greater customer loyalty than the Chevrolet Impala ($\bar{p}_1 = .5520$).
The results of the study are inconclusive as to the comparative loyalty of the Ford Fusion.
While the Ford Fusion did not show significantly different results when compared to the
Chevrolet Impala or Honda Accord, a larger sample might have revealed a significant difference between Ford Fusion and the other two automobiles in terms of customer loyalty. It is
not uncommon for a multiple comparison procedure to show significance for some pairwise
comparisons and yet not show significance for other pairwise comparisons in the study.
NOTES AND COMMENTS
1. In Chapter 10, we used the standard normal distribution and the z test statistic to conduct hypothesis tests about the proportions of two populations. However, the chi-square test introduced in this section can also be used to conduct the hypothesis test that the proportions of two populations are equal. The results will be the same under both test procedures and the value of the test statistic $\chi^2$ will be equal to the square of the value of the test statistic z. An advantage of the methodology in Chapter 10 is that it can be used for either a one-tailed or a two-tailed hypothesis about the proportions of two populations, whereas the chi-square test in this section can be used only for two-tailed tests. Exercise 12.6 will give you a chance to use the chi-square test for the hypothesis that the proportions of two populations are equal.
2. Each of the k populations in this section had two response outcomes, Yes or No. In effect, each population had a binomial distribution with parameter p, the population proportion of Yes responses. An extension of the chi-square procedure in this section applies when each of the k populations has three or more possible responses. In this case, each population is said to have a multinomial distribution. The chi-square calculations for the expected frequencies, eij, and the test statistic, $\chi^2$, are the same as shown in expressions (12.1) and (12.2). The only difference is that the null hypothesis assumes that the multinomial distribution for the response variable is the same for all populations. With r responses for each of the k populations, the chi-square test statistic has (r − 1)(k − 1) degrees of freedom. Exercise 12.8 will give you a chance to use the chi-square test to compare three populations with multinomial distributions.
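The first note states that, for two populations, $\chi^2$ equals $z^2$. The Python sketch below checks this numerically, assuming the pooled-proportion form of the two-sample z statistic. As a two-population example we take the Impala and Fusion columns of Table 12.1; that choice of data and the function names are ours, not the text's.

```python
import math


def chi_square_2x2(observed):
    """Equation (12.2) for a 2 x 2 table, with expected counts from equation (12.1)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum((observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
               / (row_totals[i] * col_totals[j] / n)
               for i in range(2) for j in range(2))


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 using the pooled estimate of p."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Impala (69 Yes of 125) vs. Fusion (120 Yes of 200): first two columns of Table 12.1
chi2 = chi_square_2x2([[69, 120], [56, 80]])
z = two_proportion_z(69, 125, 120, 200)
print(round(chi2, 4), round(z ** 2, 4))  # the two printed values agree
```

If you check this with SciPy, note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 × 2 tables by default; pass `correction=False` to obtain the uncorrected statistic used here.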
Exercises
Methods
1. Use the sample data below to test the hypotheses
H0: p1 = p2 = p3
Ha: Not all population proportions are equal
where pi is the population proportion of Yes responses for population i. Using a .05 level
of significance, what is the p-value and what is your conclusion?
Populations
Response 1 2 3
Yes 150 150 96
No 100 150 104
Applications
3. The sample data below represent the number of late and on-time flights for Delta, United,
and US Airways (Bureau of Transportation Statistics, March 2012).
Airline
Flight Delta United US Airways
Late 39 51 56
On Time 261 249 344
a. Formulate the hypotheses for a test that will determine if the population proportion of
late flights is the same for all three airlines.
b. Conduct the hypothesis test with a .05 level of significance. What is the p-value and
what is your conclusion?
c. Compute the sample proportion of late flights for each airline. What is the overall
proportion of late flights for the three airlines?
4. Benson Manufacturing is considering ordering electronic components from three different
suppliers. The suppliers may differ in terms of quality in that the proportion or percentage
of defective components may differ among the suppliers. To evaluate the proportion of
defective components for the suppliers, Benson has requested a sample shipment of 500
components from each supplier. The number of defective components and the number of
good components found in each shipment are as follows.
Supplier
Component A B C
Defective 15 20 40
Good 485 480 460
a. Formulate the hypotheses that can be used to test for equal proportions of defective
components provided by the three suppliers.
b. Using a .05 level of significance, conduct the hypothesis test. What is the p-value and
what is your conclusion?
c. Conduct a multiple comparison test to determine if there is an overall best supplier or
if one supplier can be eliminated because of poor quality.
5. Kate Sanders, a researcher in the department of biology at IPFW University, studied the
effect of agriculture contaminants on the stream fish population in Northeastern Indiana
(April 2012). Specially designed traps collected samples of fish at each of four stream
locations. A research question was, Did the differences in agricultural contaminants found
at the four locations alter the proportion of the fish population by gender? Observed
frequencies were as follows.
Stream Locations
Gender A B C D
Male 49 44 49 39
Female 41 46 36 44
a. Focusing on the proportion of male fish at each location, test the hypothesis that the
population proportions are equal for all four locations. Use a .05 level of significance.
What is the p-value and what is your conclusion?
b. Does it appear that differences in agricultural contaminants found at the four locations
altered the fish population by gender?
6. A tax preparation firm is interested in comparing the quality of work at two of its regional offices. The observed frequencies showing the number of sampled returns with errors and the number of sampled returns that were correct are as follows.

Exercise 6 shows a chi-square test can be used when the hypothesis is about the equality of two population proportions.
Regional Office
Return Office 1 Office 2
Error 35 27
Correct 215 273
a. What are the sample proportions of returns with errors at the two offices?
b. Use the chi-square test procedure to see if there is a significant difference between
the population proportion of error rates for the two offices. Test the null hypothesis
H0: p1 = p2 with a .10 level of significance. What is the p-value and what is your
conclusion? Note: We generally use the chi-square test of equal proportions when
there are three or more populations, but this example shows that the same chi-square
test can be used for testing equal proportions with two populations.
c. In Section 10.2, a z test was used to conduct the above test. Either a $\chi^2$ test statistic or a z test statistic may be used to test the hypothesis. However, when we want to make inferences about the proportions for two populations, we generally prefer the z test statistic procedure. Refer to the Notes and Comments at the end of this section and comment on why the z test statistic provides the user with more options for inferences about the proportions of two populations.
7. Social networking is becoming more and more popular around the world. Pew Research
Center used a survey of adults in several countries to determine the percentage of adults
who use social networking sites (USA Today, February 8, 2012). Assume that the results
for surveys in Great Britain, Israel, Russia, and United States are as follows.
12.2 Test of Independence 521
                                              Country
Use Social Networking Sites   Great Britain   Israel   Russia   United States
Yes                                344          265      301         500
No                                 456          235      399         500
a. Conduct a hypothesis test to determine whether the proportion of adults using social
networking sites is equal for all four countries. What is the p-value? Using a .05 level
of significance, what is your conclusion?
b. What are the sample proportions for each of the four countries? Which country has
the largest proportion of adults using social networking sites?
c. Using a .05 level of significance, conduct multiple pairwise comparison tests among
the four countries. What is your conclusion?
8. A manufacturer is considering purchasing parts from three different suppliers. The parts received from the suppliers are classified as having a minor defect, having a major defect, or being good. Test results from samples of parts received from each of the three suppliers are shown below. Note that any test with these data is no longer a test of proportions for the three supplier populations because the categorical response variable has three outcomes: minor defect, major defect, and good.

Exercise 8 shows a chi-square test can also be used for multiple population tests when the categorical response variable has three or more outcomes.
Supplier
Part Tested A B C
Minor Defect 15 13 21
Major Defect 5 11 5
Good 130 126 124
Using the data above, conduct a hypothesis test to determine if the distribution of defects is the
same for the three suppliers. Use the chi-square test calculations as presented in this section
with the exception that a table with r rows and c columns results in a chi-square test statistic
with (r – 1)(c – 1) degrees of freedom. Using a .05 level of significance, what is the p-value
and what is your conclusion?
independent, beer preference does not depend on gender and the preference for light, regular,
and dark beer can be expected to be the same for male and female beer drinkers. However, if
the test conclusion is that the two categorical variables are not independent, we have evidence
that beer preference is associated or dependent upon the gender of the beer drinker. As a result,
we can expect beer preferences to differ for male and female beer drinkers. In this case, a beer
manufacturer could use this information to customize its promotions and advertising for the
different target markets of male and female beer drinkers.
The hypotheses for this test of independence are as follows:
H0: Beer preference is independent of gender
Ha: Beer preference is not independent of gender
The sample data will be summarized in a two-way table with beer preferences of light, regular, and dark as one of the variables and gender of male and female as the other variable. Since an objective of the study is to determine whether there is a difference between the beer preferences for male and female beer drinkers, we consider gender an explanatory variable
and follow the usual practice of making the explanatory variable the column variable in the
data tabulation table. The beer preference is the categorical response variable and is shown
as the row variable. The sample results of the 200 beer drinkers in the study are summarized
in Table 12.6.
The sample data are summarized based on the combination of beer preference and
gender for the individual respondents. For example, 51 individuals in the study were males
who preferred light beer, 56 individuals in the study were males who preferred regular
beer, and so on. Let us now analyze the data in the table and test for independence of beer
preference and gender.
First of all, since we selected a sample of beer drinkers, summarizing the data for each
variable separately will provide some insights into the characteristics of the beer drinker
population. For the categorical variable gender, we see 132 of the 200 in the sample were
male. This gives us the estimate that 132/200 = .66, or 66%, of the beer drinker population
is male. Similarly we estimate that 68/200 = .34, or 34%, of the beer drinker population is
female. Thus male beer drinkers appear to outnumber female beer drinkers approximately
2 to 1. Sample proportions or percentages for the three types of beer are
Prefer Light Beer 90/200 = .450, or 45.0%
Prefer Regular Beer 77/200 = .385, or 38.5%
Prefer Dark Beer 33/200 = .165, or 16.5%
Across all beer drinkers in the sample, light beer is preferred most often and dark beer is
preferred least often.
TABLE 12.6 SAMPLE RESULTS FOR BEER PREFERENCES OF MALE AND FEMALE BEER DRINKERS (OBSERVED FREQUENCIES)

                          Gender
Beer Preference     Male    Female    Total
Light                51       39        90
Regular              56       21        77
Dark                 25        8        33
Total               132       68       200

Let us now conduct the chi-square test to determine if beer preference and gender are independent. The computations and formulas used are the same as those used for the chi-square test in Section 12.1. Utilizing the observed frequencies in Table 12.6 for row i and column j, fij, we compute the expected frequencies, eij, under the assumption that the beer preferences and gender are independent. The computation of the expected frequencies follows the same logic and formula used in Section 12.1. Thus the expected frequency for row i and column j is given by

$$e_{ij} = \frac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{\text{Sample Size}} \qquad (12.4)$$

TABLE 12.7 EXPECTED FREQUENCIES IF BEER PREFERENCE IS INDEPENDENT OF THE GENDER OF THE BEER DRINKER

                          Gender
Beer Preference     Male     Female    Total
Light              59.40     30.60       90
Regular            50.82     26.18       77
Dark               21.78     11.22       33
Total             132        68         200
For example, e11 = (90)(132)/200 = 59.40 is the expected frequency for male beer drinkers who would prefer light beer if beer preference is independent of gender. Show that equation (12.4) can be used to find the other expected frequencies shown in Table 12.7.
Following the chi-square test procedure discussed in Section 12.1, we use the following expression to compute the value of the chi-square test statistic.

$$\chi^2 = \sum_i \sum_j \frac{(f_{ij} - e_{ij})^2}{e_{ij}} \qquad (12.5)$$
With r rows and c columns in the table, the chi-square distribution will have (r – 1)(c – 1)
degrees of freedom provided the expected frequency is at least 5 for each cell. Thus, in this
application we will use a chi-square distribution with (3 – 1)(2 – 1) = 2 degrees of freedom.
The complete steps to compute the chi-square test statistic are summarized in Table 12.8.
We can use the upper tail area of the chi-square distribution with 2 degrees of freedom and the p-value approach to determine whether the null hypothesis that beer preference is independent of gender can be rejected.

TABLE 12.8 COMPUTATION OF THE CHI-SQUARE TEST STATISTIC FOR THE TEST OF INDEPENDENCE BETWEEN BEER PREFERENCE AND GENDER

Beer        Gender   Observed    Expected    Difference   Squared        Squared Difference Divided
Preference           Frequency   Frequency   (fij − eij)  Difference     by Expected Frequency
                     (fij)       (eij)                    (fij − eij)²   (fij − eij)²/eij
Light       Male      51         59.40       −8.40        70.56          1.19
Light       Female    39         30.60        8.40        70.56          2.31
Regular     Male      56         50.82        5.18        26.83           .53
Regular     Female    21         26.18       −5.18        26.83          1.02
Dark        Male      25         21.78        3.22        10.37           .48
Dark        Female     8         11.22       −3.22        10.37           .92
Total                200         200                                     χ² = 6.45

Using row two of the chi-square distribution table shown in Table 12.4, we have the following:
Area in Upper Tail     .10     .05     .025    .01     .005
χ² Value (2 df)        4.605   5.991   7.378   9.210   10.597
χ² = 6.45
Thus, χ² = 6.45 lies between the values with upper tail areas of .05 and .025, so the corresponding upper tail area, or p-value, must be between .05 and .025. With p-value ≤ .05, we reject H0 and conclude that beer preference is not independent of the gender of the beer drinker. Stated another way, the study shows that beer preference can be expected to differ for male and female beer drinkers. Minitab or Excel procedures provided in Appendix F can be used to show that χ² = 6.45 with two degrees of freedom yields a p-value of .0398.
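For this particular case, the p-value of .0398 can be checked without statistical software. With 2 degrees of freedom the chi-square density is (1/2)e^(−x/2), so the upper tail area is exactly e^(−x/2). The sketch below relies on that closed form, which holds only for 2 degrees of freedom; other degrees of freedom need a general chi-square routine. The helper name `chi2_upper_tail_df2` is our own.

```python
import math


def chi2_upper_tail_df2(x):
    """P(chi-square >= x) for 2 degrees of freedom, which equals exp(-x / 2)."""
    if x < 0:
        raise ValueError("a chi-square value cannot be negative")
    return math.exp(-x / 2)


p_value = chi2_upper_tail_df2(6.45)
print(f"p-value = {p_value:.4f}")  # p-value = 0.0398
print("reject H0" if p_value <= 0.05 else "do not reject H0")  # reject H0
```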
Instead of using the p-value, we could use the critical value approach to draw the same conclusion. With α = .05 and 2 degrees of freedom, the critical value for the chi-square test statistic is χ²_.05 = 5.991. The upper tail rejection rule becomes

Reject H0 if χ² ≥ 5.991

With 6.45 ≥ 5.991, we reject H0. Again we see that the p-value approach and the critical value approach provide the same conclusion.
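For 2 degrees of freedom, inverting the same closed form gives the critical value directly. Solving e^(−x/2) = α yields x = −2 ln α. The following is a minimal sketch that is valid only for 2 degrees of freedom; `chi2_critical_df2` is a hypothetical helper name.

```python
import math


def chi2_critical_df2(alpha):
    """Critical value chi-square_alpha for 2 degrees of freedom: solves exp(-x/2) = alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    return -2 * math.log(alpha)


critical = chi2_critical_df2(0.05)
print(f"critical value = {critical:.3f}")  # critical value = 5.991
statistic = 6.45
print("reject H0" if statistic >= critical else "do not reject H0")  # reject H0
```

The same function returns about 7.378 for α = .025, which is the other row-two boundary used to bracket the p-value.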
While we now have evidence that beer preference and gender are not independent, we need additional insight from the data to assess the nature of the association between these two variables. One way to do this is to compute the probability of each beer preference response for males and females separately, using the observed frequencies in Table 12.8. These calculations are as follows:

Beer Preference   Male              Female
Light             51/132 = .3864    39/68 = .5735
Regular           56/132 = .4242    21/68 = .3088
Dark              25/132 = .1894     8/68 = .1176
The bar chart for male and female beer drinkers of the three kinds of beer is shown in
Figure 12.1.
Figure 12.1 (bar chart): Probability of each beer preference (Light, Regular, Dark) for male and female beer drinkers.
What observations can you make about the association between beer preference and
gender? For female beer drinkers in the sample, the highest preference is for light beer at
57.35%. For male beer drinkers in the sample, regular beer is most frequently preferred
at 42.42%. While female beer drinkers have a higher preference for light beer than male beer drinkers do, male beer drinkers have a higher preference for both regular beer and dark beer. Data visualization through bar charts such as the one shown in Figure 12.1 helps reveal how two categorical variables are associated.
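The probabilities plotted in Figure 12.1 are the observed counts for each gender divided by that gender's total. Below is a short Python sketch assuming the Table 12.8 counts. The dictionary layout and the helper name `conditional_proportions` are illustrative choices.

```python
def conditional_proportions(counts_by_group):
    """Divide each count by its group total, giving P(preference | group)."""
    result = {}
    for group, counts in counts_by_group.items():
        total = sum(counts.values())
        result[group] = {category: count / total for category, count in counts.items()}
    return result


observed = {
    "Male": {"Light": 51, "Regular": 56, "Dark": 25},
    "Female": {"Light": 39, "Regular": 21, "Dark": 8},
}

for gender, probs in conditional_proportions(observed).items():
    print(gender, {beer: f"{p:.2%}" for beer, p in probs.items()})
```

The printed percentages include the 57.35% (female, light) and 42.42% (male, regular) discussed above.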
Before we leave this discussion, we summarize the steps for a test of independence.
Exercises
Methods
9. The following table contains observed frequencies for a sample of 200. Test for independence of the row and column variables using α = .05.
Column Variable
Row Variable A B C
P 20 44 50
Q 30 26 30
10. The following table contains observed frequencies for a sample of 240. Test for independence of the row and column variables using α = .05.
Column Variable
Row Variable A B C
P 20 30 20
Q 30 60 25
R 10 15 30
Applications
11. A Bloomberg Businessweek subscriber study asked, “In the past 12 months, when traveling
for business, what type of airline ticket did you purchase most often?” A second question
asked if the type of airline ticket purchased most often was for domestic or international
travel. Sample data obtained are shown in the following table.
Type of Flight
Type of Ticket Domestic International
First class 29 22
Business class 95 121
Economy class 518 135
a. Using a .05 level of significance, is the type of ticket purchased independent of the
type of flight? What is your conclusion?
b. Discuss any dependence that exists between the type of ticket and type of flight.
12. A Deloitte employment survey asked a sample of human resource executives how their
company planned to change its workforce over the next 12 months. A categorical response
variable showed three options: The company plans to hire and add to the number of employees, the company plans no change in the number of employees, or the company plans
to lay off and reduce the number of employees. Another categorical variable indicated if the
company was private or public. Sample data for 180 companies are summarized as follows.
Company
Employment Plan Private Public
Add Employees 37 32
No Change 19 34
Lay Off Employees 16 42
a. Conduct a test of independence to determine if the employment plan for the next
12 months is independent of the type of company. At a .05 level of significance, what
is your conclusion?
b. Discuss any differences in the employment plans for private and public companies
over the next 12 months.
13. Health insurance benefits vary by the size of the company (Atlanta Business Chronicle,
December 31, 2010). The sample data below show the number of companies providing
health insurance for small, medium, and large companies. For purposes of this study,
small companies are companies that have fewer than 100 employees. Medium-sized com-
panies have 100 to 999 employees, and large companies have 1000 or more employees.
The questionnaire sent to 225 employees asked whether or not the employee had health
insurance and then asked the employee to indicate the size of the company.
Education
Quality Rating Some HS HS Grad Some College College Grad
Average 35 30 20 60
Outstanding 45 45 50 90
Exceptional 20 25 30 50
Reputation of Company
Quality of Management Excellent Good Fair
Excellent 40 25 5
Good 35 35 10
Fair 25 10 15
a. Use a .05 level of significance and test for independence of the quality of management
and the reputation of the company. What is the p-value and what is your conclusion?
b. If there is a dependence or association between the two ratings, discuss and use prob-
abilities to justify your answer.
16. The race for the 2013 Academy Award for Actress in a Leading Role was extremely tight,
featuring several worthy performances (ABC News online, February 22, 2013). The nom-
inees were Jessica Chastain for Zero Dark Thirty, Jennifer Lawrence for Silver Linings
Playbook, Emmanuelle Riva for Amour, Quvenzhané Wallis for Beasts of the Southern
Wild, and Naomi Watts for The Impossible. In a survey, movie fans who had seen each
of the movies for which these five actresses had been nominated were asked to select the
actress who was most deserving of the 2013 Academy Award for Actress in a Leading
Role. The responses follow.
Age Group
Hours of Sleep 39 or younger 40 or older
Fewer than 6 38 36
6 to 6.9 60 57
7 to 7.9 77 75
8 or more 65 92
Host B
Host A Con Mixed Pro
Con 24 8 13
Mixed 8 13 11
Pro 10 9 64
Use a test of independence with a .01 level of significance to analyze the data. What is
your conclusion?
Observed Frequency
Company A’s Company B’s Company C’s
Product Product New Product
48 98 54
We now can perform a goodness of fit test that will determine whether the sample
of 200 customer purchase preferences is consistent with the null hypothesis. Like other
chi-square tests, the goodness of fit test is based on a comparison of observed frequencies
with the expected frequencies under the assumption that the null hypothesis is true. Hence,
the next step is to compute expected purchase preferences for the 200 customers under
the assumption that H0: pA = .30, pB = .50, and pC = .20 is true. Doing so provides the
expected frequencies as follows.
Expected Frequency
Company A’s Company B’s Company C’s
Product Product New Product
200(.30) = 60 200(.50) = 100 200(.20) = 40
Note that the expected frequency for each category is found by multiplying the sample size
of 200 by the hypothesized proportion for the category.
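For a multinomial goodness of fit test, each expected frequency takes a single multiplication. The following minimal Python sketch uses the Scott Marketing Research hypothesized proportions and checks the five-or-more rule of thumb. The helper name `multinomial_expected` is our own.

```python
def multinomial_expected(n, proportions):
    """Expected frequency for each category: sample size times hypothesized proportion."""
    if abs(sum(proportions.values()) - 1) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return {category: n * p for category, p in proportions.items()}


hypothesized = {"A": 0.30, "B": 0.50, "C": 0.20}
expected = multinomial_expected(200, hypothesized)
print({k: round(v, 2) for k, v in expected.items()})
print("all expected frequencies >= 5:", all(e >= 5 for e in expected.values()))
```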
The goodness of fit test now focuses on the differences between the observed fre-
quencies and the expected frequencies. Whether the differences between the observed and
expected frequencies are “large” or “small” is a question answered with the aid of the fol-
lowing chi-square test statistic.
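$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i} \qquad (12.6)$$

where
fi = observed frequency for category i
ei = expected frequency for category i
k = the number of categories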
Note: The test statistic has a chi-square distribution with k − 1 degrees of freedom
provided that the expected frequencies are 5 or more for all categories.
Let us continue with the Scott Marketing Research example and use the sample data to test the hypothesis that the multinomial population has the market share proportions pA = .30, pB = .50, and pC = .20. We will use an α = .05 level of significance. We proceed by using the observed and expected frequencies to compute the value of the test statistic. With the expected frequencies all 5 or more, the computation of the chi-square test statistic is shown in Table 12.9. Thus, we have χ² = 7.34.

TABLE 12.9 COMPUTATION OF THE CHI-SQUARE TEST STATISTIC FOR THE SCOTT MARKETING RESEARCH MARKET SHARE STUDY

Category    Hypothesized  Observed    Expected    Difference  Squared       Squared Difference Divided
            Proportion    Frequency   Frequency   (fi − ei)   Difference    by Expected Frequency
                          (fi)        (ei)                    (fi − ei)²    (fi − ei)²/ei
Company A   .30            48          60         −12         144           2.40
Company B   .50            98         100          −2           4           0.04
Company C   .20            54          40          14         196           4.90
Total                     200                                               χ² = 7.34

We will reject the null hypothesis if the differences between the observed and expected frequencies are large. Thus the test of goodness of fit is always a one-tailed test, with the rejection occurring in the upper tail of the chi-square distribution. We can use the upper tail area for the test statistic and the p-value approach to determine whether the null hypothesis can be rejected. With k − 1 = 3 − 1 = 2 degrees of freedom, row two of the chi-square distribution table in Table 12.4 provides the following:
Area in Upper Tail     .10     .05     .025    .01     .005
χ² Value (2 df)        4.605   5.991   7.378   9.210   10.597
χ² = 7.34
The test statistic χ² = 7.34 is between 5.991 and 7.378. Thus, the corresponding upper tail area or p-value must be between .05 and .025. With p-value ≤ .05, we reject H0 and conclude that the introduction of the new product by company C will alter the historical market shares. Minitab or Excel procedures provided in Appendix F can be used to show that χ² = 7.34 provides a p-value of .0255.
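The Scott Marketing Research calculation can be sketched in Python as shown below. Because k − 1 = 2, the sketch again uses the exact 2-degree-of-freedom upper tail area e^(−x/2). It is not a general chi-square routine, and the assertion guards against using it with a different number of categories.

```python
import math

observed = [48, 98, 54]             # companies A, B, C
proportions = [0.30, 0.50, 0.20]    # market shares under H0
n = sum(observed)
expected = [n * p for p in proportions]

# Equation (12.6): sum of squared differences divided by expected frequencies
statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = len(observed) - 1
assert df == 2, "the closed form below holds only for 2 degrees of freedom"
p_value = math.exp(-statistic / 2)

print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.4f}")
# chi-square = 7.34, df = 2, p-value = 0.0255
```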
Instead of using the p-value, we could use the critical value approach to draw the same conclusion. With α = .05 and 2 degrees of freedom, the critical value for the test statistic is χ²_.05 = 5.991. The upper tail rejection rule becomes

Reject H0 if χ² ≥ 5.991

With 7.34 > 5.991, we reject H0. The p-value approach and critical value approach provide the same hypothesis testing conclusion.
Now that we have concluded the introduction of a new company C product will alter
the market shares for the three companies, we are interested in knowing more about how
the market shares are likely to change. Using the historical market shares and the sample
data, we summarize the data as follows:
Company Historical Market Share (%) Sample Data Market Share (%)
A 30 48/200 = .24, or 24
B 50 98/200 = .49, or 49
C 20 54/200 = .27, or 27
The historical market shares and the sample market shares are compared in the bar chart shown in Figure 12.2. This data visualization suggests that the new product is likely to increase the market share for company C (from 20% to 27%). Comparisons for the other two companies indicate that company C’s gain in market share is likely to hurt company A (30% to 24%) more than company B (50% to 49%).
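The market share comparison above is simple arithmetic. The following short Python sketch reproduces the sample shares and the change, in percentage points, for each company.

```python
historical = {"A": 0.30, "B": 0.50, "C": 0.20}
sample_counts = {"A": 48, "B": 98, "C": 54}
n = sum(sample_counts.values())

changes = {}
for company, hist_share in historical.items():
    sample_share = sample_counts[company] / n
    changes[company] = 100 * (sample_share - hist_share)  # percentage points
    print(f"{company}: historical {hist_share:.0%}, sample {sample_share:.0%}, "
          f"change {changes[company]:+.0f} points")
```

The changes of −6, −1, and +7 percentage points for companies A, B, and C are the differences displayed in Figure 12.2.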
FIGURE 12.2 BAR CHART OF MARKET SHARES BY COMPANY BEFORE AND AFTER THE NEW PRODUCT FOR COMPANY C
(Bar chart with Company (A, B, C) on the horizontal axis and Probability on the vertical axis.)
Let us summarize the steps that can be used to conduct a goodness of fit test for a
hypothesized multinomial population distribution.
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

5. Rejection rule:
p-value approach: Reject H0 if p-value ≤ α
Critical value approach: Reject H0 if χ² ≥ χ²_α
where α is the level of significance for the test and there are k − 1 degrees of freedom.
normal probability distribution. Because the normal probability distribution is continuous, we must modify the way the categories are defined and how the expected frequencies are computed. Let us demonstrate the goodness of fit test for a normal distribution by considering the job applicant test data for Chemline, Inc., shown in Table 12.10.

Chemline hires approximately 400 new employees annually for its four plants located throughout the United States. The personnel director asks whether a normal distribution applies for the population of test scores. If such a distribution can be used, the distribution would be helpful in evaluating specific test scores; that is, scores in the upper 20%, lower 40%, and so on, could be identified quickly. Hence, we want to test the null hypothesis that the population of test scores has a normal distribution.

TABLE 12.10 CHEMLINE EMPLOYEE APTITUDE TEST SCORES FOR 50 RANDOMLY CHOSEN JOB APPLICANTS

71 66 61 65 54 93
60 86 70 70 73 73
55 63 56 62 76 54
82 79 76 68 53 58
85 80 56 61 61 64
65 62 90 69 76 79
77 54 64 74 65 65
61 56 63 80 56 71
79 84

Let us first use the data in Table 12.10 to develop estimates of the mean and standard deviation of the normal distribution that will be considered in the null hypothesis. We use the sample mean x̄ and the sample standard deviation s as point estimators of the mean and standard deviation of the normal distribution. The calculations follow.

$$\bar{x} = \frac{\sum x_i}{n} = \frac{3421}{50} = 68.42$$

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{5314.18}{49}} = 10.41$$
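Python's standard-library `statistics` module can check these point estimates directly from the 50 scores in Table 12.10. This is a minimal sketch; `statistics.stdev` uses the n − 1 divisor, matching the formula for s above.

```python
import statistics

# Table 12.10: aptitude test scores for 50 randomly chosen Chemline job applicants
scores = [
    71, 66, 61, 65, 54, 93,
    60, 86, 70, 70, 73, 73,
    55, 63, 56, 62, 76, 54,
    82, 79, 76, 68, 53, 58,
    85, 80, 56, 61, 61, 64,
    65, 62, 90, 69, 76, 79,
    77, 54, 64, 74, 65, 65,
    61, 56, 63, 80, 56, 71,
    79, 84,
]

n = len(scores)
x_bar = statistics.mean(scores)                 # sum(x_i) / n
sum_sq = sum((x - x_bar) ** 2 for x in scores)  # sum of squared deviations
s = statistics.stdev(scores)                    # sqrt(sum_sq / (n - 1))

print(f"n = {n}, sum = {sum(scores)}, mean = {x_bar:.2f}")
print(f"sum of squared deviations = {sum_sq:.2f}, s = {s:.2f}")
```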
Using these values, we state the following hypotheses about the distribution of the job
applicant test scores.
H0: The population of test scores has a normal distribution with mean 68.42
and standard deviation 10.41
Ha: The population of test scores does not have a normal distribution with
mean 68.42 and standard deviation 10.41
The hypothesized normal distribution is shown in Figure 12.3.
With the continuous normal probability distribution, we must use a different procedure for defining the categories: we define the categories in terms of intervals of test scores. Recall the rule of thumb of an expected frequency of at least five in each interval or category. We define the categories of test scores so that the expected frequencies will be at least five for each category. With a sample size of 50, one way of establishing categories is to divide the normal probability distribution into 10 equal-probability intervals (see Figure 12.4). With 10 equal-probability intervals and a sample size of 50, we would expect five outcomes in each interval or category, and the rule of thumb for expected frequencies would be satisfied.

FIGURE 12.3 HYPOTHESIZED NORMAL DISTRIBUTION OF TEST SCORES FOR THE CHEMLINE JOB APPLICANTS
(Normal curve with mean 68.42 and standard deviation 10.41.)

Figure 12.4 (normal curve): the hypothesized distribution divided into 10 equal-probability intervals, with boundaries at test scores 55.10, 59.68, 63.01, 65.82, 68.42, 71.02, 73.83, 77.16, and 81.74.
Let us look more closely at the procedure for calculating the category boundaries. When
the normal probability distribution is assumed, the standard normal probability tables can be
used to determine these boundaries. First consider the test score cutting off the lowest 10%
of the test scores. From the table for the standard normal distribution we find that the z value
for this test score is −1.28. Therefore, the test score of x = 68.42 − 1.28(10.41) = 55.10
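provides this cutoff value for the lowest 10% of the scores. For the lowest 20%, we find z = −.84, and thus x = 68.42 − .84(10.41) = 59.68. Working through the normal distribution in that way provides the following test score values.

Percentage    10%     20%     30%     40%     50%     60%     70%     80%     90%
z             −1.28   −.84    −.52    −.25    0       .25     .52     .84     1.28
Test Score    55.10   59.68   63.01   65.82   68.42   71.02   73.83   77.16   81.74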
These cutoff or interval boundary points are identified on the graph in Figure 12.4.
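The interval boundaries can be generated with `statistics.NormalDist` from Python's standard library (available since Python 3.8). In this sketch, the z values are rounded to two decimals to mimic reading them from a standard normal table. Without that rounding the boundaries shift slightly; for example, the first becomes about 55.08.

```python
from statistics import NormalDist

mean, sd = 68.42, 10.41  # point estimates from the sample

# z values cutting off the lowest 10%, 20%, ..., 90%, rounded to two decimals
# as they would be read from a standard normal table
z_values = [round(NormalDist().inv_cdf(k / 10), 2) for k in range(1, 10)]
boundaries = [mean + z * sd for z in z_values]

for k, (z, b) in enumerate(zip(z_values, boundaries), start=1):
    print(f"lowest {10 * k}%: z = {z:+.2f}, test score = {b:.2f}")
```

The printed test scores should match 55.10, 59.68, 63.01, 65.82, 68.42, 71.02, 73.83, 77.16, and 81.74.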
With the categories or intervals of test scores now defined and with the known expected
frequency of five per category, we can return to the sample data of Table 12.10 and determine
the observed frequencies for the categories. Doing so provides the results in Table 12.11.
With the results in Table 12.11, the goodness of fit calculations proceed exactly as before. Namely, we compare the observed and expected results by computing a χ² value. The calculations necessary to compute the chi-square test statistic are shown in Table 12.12. We see that the value of the test statistic is χ² = 7.2.
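Counting the observed frequencies and computing χ² can be sketched with `bisect` from the standard library. `bisect_right` returns the index of the interval that contains each score. A score exactly equal to a boundary would fall into the upper interval, but no Table 12.10 score does.

```python
import bisect

# Table 12.10 scores
scores = [
    71, 66, 61, 65, 54, 93, 60, 86, 70, 70, 73, 73, 55, 63, 56, 62, 76,
    54, 82, 79, 76, 68, 53, 58, 85, 80, 56, 61, 61, 64, 65, 62, 90, 69,
    76, 79, 77, 54, 64, 74, 65, 65, 61, 56, 63, 80, 56, 71, 79, 84,
]
# Boundaries of the 10 equal-probability intervals
boundaries = [55.10, 59.68, 63.01, 65.82, 68.42, 71.02, 73.83, 77.16, 81.74]

observed = [0] * (len(boundaries) + 1)
for x in scores:
    observed[bisect.bisect_right(boundaries, x)] += 1

expected = [len(scores) / len(observed)] * len(observed)  # 50 / 10 = 5 per interval
statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))

print("observed:", observed)  # observed: [5, 5, 9, 6, 2, 5, 2, 5, 5, 6]
print(f"chi-square = {statistic:.1f}")  # chi-square = 7.2
```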
TABLE 12.11 OBSERVED AND EXPECTED FREQUENCIES FOR THE CHEMLINE JOB APPLICANT TEST SCORES

Test Score Interval   Observed Frequency (fi)   Expected Frequency (ei)
Less than 55.10        5                         5
55.10 to 59.68         5                         5
59.68 to 63.01         9                         5
63.01 to 65.82         6                         5
65.82 to 68.42         2                         5
68.42 to 71.02         5                         5
71.02 to 73.83         2                         5
73.83 to 77.16         5                         5
77.16 to 81.74         5                         5
81.74 and over         6                         5
Total                 50                        50

TABLE 12.12 COMPUTATION OF THE CHI-SQUARE TEST STATISTIC FOR THE CHEMLINE JOB APPLICANT TEST SCORES

Test Score         Observed    Expected    Difference  Squared       Squared Difference Divided
Interval           Frequency   Frequency   (fi − ei)   Difference    by Expected Frequency
                   (fi)        (ei)                    (fi − ei)²    (fi − ei)²/ei
Less than 55.10     5           5           0           0            0.0
55.10 to 59.68      5           5           0           0            0.0
59.68 to 63.01      9           5           4          16            3.2
63.01 to 65.82      6           5           1           1            0.2
65.82 to 68.42      2           5          −3           9            1.8
68.42 to 71.02      5           5           0           0            0.0
71.02 to 73.83      2           5          −3           9            1.8
73.83 to 77.16      5           5           0           0            0.0
77.16 to 81.74      5           5           0           0            0.0
81.74 and over      6           5           1           1            0.2
Total              50          50                                    χ² = 7.2

To determine whether the computed χ² value of 7.2 is large enough to reject H0, we need to refer to the appropriate chi-square distribution table. The degrees of freedom are k − p − 1 when p distribution parameters are estimated from the sample. Here there are k = 10 categories, and p = 2 parameters (the mean and standard deviation) were estimated, so there are 10 − 2 − 1 = 7 degrees of freedom. For 7 degrees of freedom, the chi-square value with an upper tail area of .10 is 12.017. Because χ² = 7.2 is less than 12.017, the p-value is greater than .10, and the hypothesis that the population of test scores has a normal probability distribution cannot be rejected. The normal probability distribution may be applied to assist in the interpretation of test scores. A summary of the goodness of fit test for a normal probability distribution follows.
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

5. Rejection rule:
p-value approach: Reject H0 if p-value ≤ α
Critical value approach: Reject H0 if χ² ≥ χ²_α
where α is the level of significance. The degrees of freedom = k − p − 1, where p is the number of parameters of the distribution estimated by the sample. In step 2a, the sample is used to estimate the mean and standard deviation. Thus, p = 2 and the degrees of freedom = k − 2 − 1 = k − 3.
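With k − p − 1 = 7 degrees of freedom, the 2-df shortcut used earlier no longer applies. The sketch below computes a chi-square upper tail area for any positive integer number of degrees of freedom. It uses the standard recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(−x/2) / Γ(k/2 + 1), started from Q(x; 1) = erfc(√(x/2)) or Q(x; 2) = e^(−x/2). The function name `chi2_sf` is our own, and only Python's standard `math` module is assumed.

```python
import math


def chi2_sf(x, df):
    """Upper tail area P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x < 0:
        raise ValueError("x must be nonnegative")
    if df % 2 == 0:
        total, k = math.exp(-x / 2), 2          # Q(x; 2)
    else:
        total, k = math.erfc(math.sqrt(x / 2)), 1  # Q(x; 1)
    while k < df:
        # Step from k to k + 2 degrees of freedom
        total += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return total


observed = [5, 5, 9, 6, 2, 5, 2, 5, 5, 6]  # Table 12.11
expected = [5] * 10
statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
k, p = len(observed), 2                   # 10 categories; mean and s estimated
df = k - p - 1
p_value = chi2_sf(statistic, df)
print(f"chi-square = {statistic:.1f}, df = {df}, p-value = {p_value:.3f}")
# chi-square = 7.2, df = 7, p-value = 0.408
```

If SciPy is available, `scipy.stats.chisquare(observed, expected, ddof=2)` should report the same statistic and p-value. Its `ddof` argument reduces the degrees of freedom to k − 1 − ddof.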
Exercises
Methods
19. Test the following hypotheses by using the χ² goodness of fit test.
Applications
21. During the first 13 weeks of the television season, the Saturday evening 8:00 p.m. to
9:00 p.m. audience proportions were recorded as ABC 29%, CBS 28%, NBC 25%, and
independents 18%. A sample of 300 homes two weeks after a Saturday night schedule
revision yielded the following viewing audience data: ABC 95 homes, CBS 70 homes, NBC
89 homes, and independents 46 homes. Test with α = .05 to determine whether the viewing
audience proportions changed.
22. Mars, Inc. manufactures M&M’s, one of the most popular candy treats in the world. The milk
chocolate candies come in a variety of colors including blue, brown, green, orange, red, and
yellow. The overall proportions for the colors are .24 blue, .13 brown, .20 green, .16 orange,
.13 red, and .14 yellow. In a sampling study, several bags of M&M milk chocolates were
opened and the following color counts were obtained.
Use a .05 level of significance and the sample data to test the hypothesis that the overall
proportions for the colors are as stated above. What is your conclusion?
23. The Wall Street Journal’s Shareholder Scoreboard tracks the performance of 1000 major
U.S. companies. The performance of each company is rated based on the annual total return,
including stock price changes and the reinvestment of dividends. Ratings are assigned by
dividing all 1000 companies into five groups from A (top 20%), B (next 20%), to E (bottom
20%). Shown here are the one-year ratings for a sample of 60 of the largest companies. Do
the largest companies differ in performance from the performance of the 1000 companies in
the Shareholder Scoreboard? Use α = .05.
A B C D E
5 8 15 20 12
24. The National Highway Traffic Safety Administration reported the percentage of traffic
accidents occurring each day of the week. Assume that a sample of 420 accidents provided
the following data.
a. Conduct a hypothesis test to determine if the proportion of traffic accidents is the same
for each day of the week. What is the p-value? Using a .05 level of significance, what
is your conclusion?
b. Compute the percentage of traffic accidents occurring on each day of the week. What
day has the highest percentage of traffic accidents? Does this seem reasonable? Discuss.
25. Use α = .01 and conduct a goodness of fit test to see whether the following sample appears
to have been selected from a normal probability distribution.
55 86 94 58 55 95 55 52 69 95 90 65 87 50 56
55 57 98 58 79 92 62 59 88 65
After you complete the goodness of fit calculations, construct a histogram of the data. Does
the histogram representation support the conclusion reached with the goodness of fit test?
(Note: x̄ = 71 and s = 17.)
26. The weekly demand for a product is believed to be normally distributed. Use a goodness
of fit test and the following data to test this assumption. Use α = .10. The sample mean is 24.5 and the sample standard deviation is 3.
18 20 22 27 22
25 22 27 25 24
26 23 20 24 26
27 25 19 21 25
26 25 31 29 25
25 28 26 28 24
Summary
In this chapter we have introduced hypothesis tests for the following applications.
1. Testing the equality of population proportions for three or more populations.
2. Testing the independence of two categorical variables.
3. Testing whether a probability distribution for a population follows a specific histori-
cal or theoretical probability distribution.
All tests apply to categorical variables and all tests use a chi-square (χ²) test statistic that
is based on the differences between observed frequencies and expected frequencies. In each
case, expected frequencies are computed under the assumption that the null hypothesis is
true. These chi-square tests are upper tailed tests. Large differences between observed and
expected frequencies provide a large value for the chi-square test statistic and indicate that
the null hypothesis should be rejected.
The test for the equality of population proportions for three or more populations is based
on independent random samples selected from each of the populations. The sample data show
the counts for each of two categorical responses for each population. The null hypothesis is that
the population proportions are equal. Rejection of the null hypothesis supports the conclusion
that the population proportions are not all equal.
The test of independence between two categorical variables uses one sample from a
population with the data showing the counts for each combination of two categorical vari-
ables. The null hypothesis is that the two variables are independent and the test is referred
to as a test of independence. If the null hypothesis is rejected, there is statistical evidence
of an association or dependency between the two variables.
The goodness of fit test is used to test the hypothesis that a population has a specific histori-
cal or theoretical probability distribution. We showed applications for populations with a mul-
tinomial probability distribution and with a normal probability distribution. Since the normal
probability distribution applies to continuous data, intervals of data values were established to
create the categories for the categorical variable required for the goodness of fit test.
Glossary
Goodness of fit test A chi-square test that can be used to test that a population probabil-
ity distribution has a specific historical or theoretical probability distribution. This test
was demonstrated for both a multinomial probability distribution and a normal probability
distribution.
Marascuilo procedure A multiple comparison procedure that can be used to test for a
significant difference between pairs of population proportions. This test can be helpful in
identifying differences between pairs of population proportions whenever the hypothesis
of equal population proportions has been rejected.
Key Formulas
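$$CV_{ij} = \sqrt{\chi^2_{\alpha}} \sqrt{\frac{\bar{p}_i(1 - \bar{p}_i)}{n_i} + \frac{\bar{p}_j(1 - \bar{p}_j)}{n_j}} \qquad (12.3)$$

$$\chi^2 = \sum_{i} \frac{(f_i - e_i)^2}{e_i} \qquad (12.6)$$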
Supplementary Exercises
27. In a quality control test of parts manufactured at Dabco Corporation, an engineer sampled
parts produced on the first, second, and third shifts. The research study was designed to
determine if the population proportion of good parts was the same for all three shifts.
Sample data follow.
Production Shift
Quality First Second Third
Good 285 368 176
Defective 15 32 24
a. Using a .05 level of significance, conduct a hypothesis test to determine if the popula-
tion proportion of good parts is the same for all three shifts. What is the p-value and
what is your conclusion?
b. If the conclusion is that the population proportions are not all equal, use a multiple
comparison procedure to determine how the shifts differ in terms of quality. What shift
or shifts need to improve the quality of parts produced?
28. Phoenix Marketing International identified Bridgeport, Connecticut, Los Alamos, New Mexico,
Naples, Florida and Washington D.C. as the four U.S. cities with the highest percentage
of millionaires. Data consistent with that study show the following number of millionaires
for samples of individuals from each of the four cities.
City
Millionaire Bridgeport Los Alamos Naples Washington DC
Yes 44 35 36 34
No 456 265 364 366
a. Use the sample data to calculate the point estimate of the population proportion of
visitors who rated each of these museums as spectacular.
b. Conduct a hypothesis test to determine if the population proportion of visitors who
rated the museum as spectacular is equal for these five museums. Using a .05 level of
significance, what is the p-value and what is your conclusion?
30. A Pew Research Center survey asked respondents if they would rather live in a place with
a slower pace of life or a place with a faster pace of life. The survey also asked the respon-
dent’s gender. Consider the following sample data.
Gender
Preferred Pace of Life Male Female
Slower 230 218
No Preference 20 24
Faster 90 48
a. Is the preferred pace of life independent of gender? Using a .05 level of significance,
what is the p-value and what is your conclusion?
b. Discuss any differences between the preferences of men and women.
31. Barna Research Group conducted a survey about church attendance. The survey respondents
were asked about their church attendance and asked to indicate their age. Use the sample
data to determine whether church attendance is independent of age. Using a .05 level of
significance, what is the p-value and what is your conclusion? What conclusion can you
draw about church attendance as individuals grow older?
Age
Church Attendance 20 to 29 30 to 39 40 to 49 50 to 59
Yes 31 63 94 72
No 69 87 106 78
32. An ambulance service responds to emergency calls for two counties in Virginia. One
county is an urban county and the other is a rural county. A sample of 471 ambulance calls
over the past two years showed the county and the day of the week for each emergency
call. Data are as follows.
Day of Week
County Sun Mon Tue Wed Thu Fri Sat
Urban 61 48 50 55 63 73 43
Rural 7 9 16 13 9 14 10
Test for independence of the county and the day of the week. Using a .05 level of signifi-
cance, what is the p-value and what is your conclusion?
33. Based on sales over a six-month period, the five top-selling compact cars are Chevy Cruze,
Ford Focus, Hyundai Elantra, Honda Civic, and Toyota Corolla (Motor Trend, November
2, 2011). Based on total sales, the market shares for these five compact cars were Chevy
Cruze 24%, Ford Focus 21%, Hyundai Elantra 20%, Honda Civic 18%, and Toyota Co-
rolla 17%. A sample of 400 compact car sales in Chicago showed the following number
of vehicles sold.
Use a goodness of fit test to determine if the sample data indicate that the market shares
for the five compact cars in Chicago are different than the market shares reported by Motor
Trend. Using a .05 level of significance, what is the p-value and what is your conclusion?
What market share differences, if any, exist in Chicago?
34. A random sample of final examination grades for a college course follows.
55 85 72 99 48 71 88 70 59 98 80 74 93 85 74
82 90 71 83 60 95 77 84 73 63 72 95 79 51 85
76 81 78 65 75 87 86 70 80 64
Use α = .05 and test to determine whether a normal probability distribution should be
rejected as being representative of the population distribution of grades.
35. A salesperson makes four calls per day. A sample of 100 days gives the following fre-
quencies of sales volumes.
Observed Frequency
Number of Sales (days)
0 30
1 32
2 25
3 10
4 3
Total 100
Records show sales are made to 30% of all sales calls. Assuming independent sales calls,
the number of sales per day should follow a binomial probability distribution. The binomial
probability function presented in Chapter 5 is
$$f(x) = \frac{n!}{x!(n - x)!} p^x (1 - p)^{n - x}$$
For this exercise, assume that the population has a binomial probability distribution with
n = 4, p = .30, and x = 0, 1, 2, 3, and 4.
a. Compute the expected frequencies for x = 0, 1, 2, 3, and 4 by using the binomial
probability function. Combine categories if necessary to satisfy the requirement that
the expected frequency is five or more for all categories.
b. Use the goodness of fit test to determine whether the assumption of a binomial
probability distribution should be rejected. Use α = .05. Because no parameters of the
binomial probability distribution were estimated from the sample data, the degrees of
freedom are k − 1, where k is the number of categories.
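Parts (a) and (b) can be sketched in Python as follows. The expected frequency for x sales is 100 · f(x), using the binomial probability function with n = 4 and p = .30; trailing categories are merged until the last expected frequency is at least five. The helper `merge_right_tail` is a hypothetical name, and it only merges the upper tail, which is where the small expected frequencies occur here. The value 7.815 is the standard chi-square table entry for an upper-tail area of .05 with 3 degrees of freedom.

```python
from math import comb

N_CALLS, P_SALE = 4, 0.30
observed = [30, 32, 25, 10, 3]   # days with x = 0, 1, 2, 3, 4 sales
days = sum(observed)              # 100 days
CRITICAL_05_DF3 = 7.815           # chi-square table value, area .05 in upper tail, df = 3

# Part (a): expected frequency = days * f(x), with f(x) the binomial probability function.
expected = [
    days * comb(N_CALLS, x) * P_SALE ** x * (1 - P_SALE) ** (N_CALLS - x)
    for x in range(N_CALLS + 1)
]  # approximately [24.01, 41.16, 26.46, 7.56, 0.81]


def merge_right_tail(obs, exp, minimum=5.0):
    """Fold the last category into its neighbor until its expected frequency is >= minimum."""
    obs, exp = list(obs), list(exp)
    while len(exp) > 1 and exp[-1] < minimum:
        tail_obs, tail_exp = obs.pop(), exp.pop()  # remove the last category first
        obs[-1] += tail_obs
        exp[-1] += tail_exp
    return obs, exp


# x = 3 and x = 4 end up combined into a single "3 or more" category.
obs_c, exp_c = merge_right_tail(observed, expected)

# Part (b): chi-square statistic and decision; df = k - 1 because p was not estimated.
stat = sum((f - e) ** 2 / e for f, e in zip(obs_c, exp_c))
df = len(obs_c) - 1
reject_binomial = stat >= CRITICAL_05_DF3
```

With these counts the categories become 0, 1, 2, and 3 or more sales (expected frequency 8.37 for the last), the statistic is about 6.17 with df = 3, and because 6.17 < 7.815 the binomial assumption is not rejected at α = .05.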
1. Should legislative pay be cut for every day the state budget is late?
Yes ____ No ____
2. Should there be more restrictions on lobbyists?
Yes ____ No ____
3. Should there be term limits requiring that legislators serve a fixed number of years?
Yes ____ No ____
The responses were coded using 1 for a Yes response and 2 for a No response. The complete
data set is available in the file named NYReform.
Managerial Report
1. Use descriptive statistics to summarize the data from this study. What are your
preliminary conclusions about the independence of the response (Yes or No) and party
affiliation for each of the three questions in the survey?
2. With regard to question 1, test for the independence of the response (Yes or No)
and party affiliation. Use α = .05.
3. With regard to question 2, test for the independence of the response (Yes or No)
and party affiliation. Use α = .05.
4. With regard to question 3, test for the independence of the response (Yes or No)
and party affiliation. Use α = .05.
5. Does it appear that there is broad support for change across all political lines? Explain.
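For items 2 through 4 of the report, the test of independence can be sketched in Python. Expected frequencies are (row i total)(column j total)/n and df = (r − 1)(c − 1). The sketch assumes each survey record has already been reduced to a (party, response) pair with the response coded 1 = Yes and 2 = No, as described above; the NYReform column names and party labels are not given here, so the input shape and the function names `independence_test` and `chi2_sf` are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail area P(X >= x) for a chi-square distribution with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be at least 1")
    half = x / 2.0
    if half <= 0.0:
        return 1.0
    # Closed form for df = 2 (even) or df = 1 (odd), then the recurrence
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    q, k = (math.exp(-half), 2) if df % 2 == 0 else (math.erfc(math.sqrt(half)), 1)
    while k < df:
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def independence_test(pairs):
    """Chi-square test of independence for (row_label, column_label) pairs.

    Returns (observed table by row, statistic, df, p_value).
    """
    counts = Counter(pairs)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    n = sum(counts.values())
    row_tot = {r: sum(counts[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(counts[(r, c)] for r in rows) for c in cols}
    stat = 0.0
    for r in rows:
        for c in cols:
            e = row_tot[r] * col_tot[c] / n  # expected frequency under independence
            stat += (counts[(r, c)] - e) ** 2 / e
    df = (len(rows) - 1) * (len(cols) - 1)
    table = {r: [counts[(r, c)] for c in cols] for r in rows}
    return table, stat, df, chi2_sf(stat, df)
```

Calling `independence_test` once per survey question, with party affiliation as the row label and the coded response as the column label, gives a p-value to compare with α = .05; the returned table also serves the descriptive summary asked for in item 1.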
Appendix 12.1 Chi-Square Tests Using Minitab
Appendix 12.2 Chi-Square Tests Using Excel
In Figure 12.5, the observed frequency cells are B7 to D8, written B7:D8, and
the expected frequency cells are B16 to D17, written B16:D17. The function
=CHISQ.TEST(B7:D8,B16:D17) is shown in cell E20 of the background worksheet. This
function performs all the chi-square test computations and returns the p-value for the test.
The test of independence summarizes the observed frequencies in a tabular format very
similar to the one shown in Figure 12.5. The formulas to compute expected frequencies are
also very similar to the formulas shown in the background worksheet. For the goodness of
fit test, the user provides the observed frequencies in a column rather than a table. The
user must also provide the associated expected frequencies in another column. Lastly, the
CHISQ.TEST function is used to obtain the p-value as described above.
The Excel worksheet shown in Figure 12.5 is available in the DATAfile ChiSquare.
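For readers who want to check a worksheet result outside Excel, the computation CHISQ.TEST performs can be sketched in Python, with both ranges represented as lists of rows. The degrees-of-freedom rule below follows Microsoft's description of CHISQ.TEST as we understand it: (r − 1)(c − 1) when the range has more than one row and more than one column, and r − 1 (or c − 1) for a single column (or row), which is the goodness of fit layout. The function name `chisq_test` is hypothetical and this is not Excel's own implementation.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail area P(X >= x) for a chi-square distribution with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be at least 1")
    half = x / 2.0
    if half <= 0.0:
        return 1.0
    # Closed form for df = 2 (even) or df = 1 (odd), then the recurrence
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    q, k = (math.exp(-half), 2) if df % 2 == 0 else (math.erfc(math.sqrt(half)), 1)
    while k < df:
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def chisq_test(actual, expected):
    """p-value for observed (`actual`) versus expected frequency ranges, in the style of CHISQ.TEST."""
    r, c = len(actual), len(actual[0])
    if (r, c) == (1, 1):
        raise ValueError("a single cell has no degrees of freedom")
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(actual, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # Table layout: (r - 1)(c - 1); single row or column layout: (cells - 1).
    df = (r - 1) * (c - 1) if r > 1 and c > 1 else max(r, c) - 1
    return chi2_sf(stat, df)
```

For the column layout, each column is passed as a list of one-element rows; for example, `chisq_test([[30], [32], [25], [13]], [[24.01], [41.16], [26.46], [8.37]])` uses the salesperson data of Exercise 35 after combining x = 3 and x = 4 and returns a p-value of about .10. Because this rule looks only at the shape of the ranges, a goodness of fit test that estimates parameters from the sample (such as the normal test, which uses k − 3 degrees of freedom) would need its p-value computed from the test statistic with the correct degrees of freedom instead.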