Tests Using Contingency Tables: Test For Independence
12–3 Tests Using Contingency Tables

When data can be tabulated in table form in terms of frequencies, several types of
hypotheses can be tested using the chi-square test.
Two such tests are the independence of variables test and the homogeneity of
proportions test. The test of independence of variables is used to determine whether two
variables are independent of or related to each other when a single sample is selected.
The test of homogeneity of proportions is used to determine whether the proportions for
a variable are equal when several samples are selected from different populations. Both
tests use the chi-square distribution and a contingency table, and the test value is found
the same way. The independence test will be explained first.
Test for Independence
Objective 2. Test two variables for independence using chi-square.

The chi-square independence test can be used to test the independence of two
variables. For example, suppose a new postoperative procedure is administered to a number
of patients in a large hospital. One can ask the question, “Do the doctors feel differently
about this procedure from the nurses, or do they feel basically the same way?” Note that
the question is not whether or not they prefer the procedure but whether there is a
difference of opinion between the two groups.
To answer this question, a researcher selects a sample of nurses and doctors and
tabulates the data in table form, as shown.
Group      Prefer new procedure   Prefer old procedure   No preference
Nurses     100                    80                     20
Doctors    50                     120                    30
As the survey indicates, 100 nurses prefer the new procedure, 80 prefer the old pro-
cedure, and 20 have no preference; 50 doctors prefer the new procedure, 120 like the old
procedure, and 30 have no preference. Since the main question is whether there is a dif-
ference in opinion, the null hypothesis is stated as follows:
H0: The opinion about the procedure is independent of the profession.
The alternative hypothesis is stated as follows:
H1: The opinion about the procedure is dependent on the profession.
If the null hypothesis is not rejected, the test means that both professions feel basi-
cally the same way about the procedure, and the differences are due to chance. If the
null hypothesis is rejected, the test means that one group feels differently about the pro-
cedure from the other. Remember that rejection does not mean that one group favors the
procedure and the other does not. Perhaps both groups favor it or both dislike it, but in
different proportions.
In order to test the null hypothesis using the chi-square independence test, one must
compute the expected frequencies, assuming that the null hypothesis is true. These fre-
quencies are computed by using the observed frequencies given in the table.
When data are arranged in table form for the chi-square independence test, the table
is called a contingency table. The table is made up of R rows and C columns. The table
here has two rows and three columns.
Group      Prefer new procedure   Prefer old procedure   No preference
Nurses     100                    80                     20
Doctors    50                     120                    30
Note that row and column headings do not count in determining the number of rows and
columns.
524 Chapter 12 Other Chi-Square Tests
A contingency table is designated as an R × C (rows times columns) table. In this
case, R = 2 and C = 3; hence, this table is a 2 × 3 contingency table. Each block in the
table is called a cell and is designated by its row and column position. For example, the
cell with a frequency of 80 is designated as C1,2, or row 1, column 2. The cells are shown
below.

         Column 1   Column 2   Column 3
Row 1    C1,1       C1,2       C1,3
Row 2    C2,1       C2,2       C2,3

The degrees of freedom for any contingency table are (rows − 1) times (columns − 1);
that is, d.f. = (R − 1)(C − 1). In this case, (2 − 1)(3 − 1) = (1)(2) = 2. The reason
for this formula for d.f. is that all the expected values except one are free to vary in
each row and in each column.
Using the previous table, one can compute the expected frequencies for each block
(or cell) as shown next.

Interesting Fact
You’re never too old, or too young, to be your best. George Foreman won the world
heavyweight boxing championship at 46. William Pitt was 24 when he became prime minister
of Great Britain. Benjamin Franklin was a newspaper columnist at age 16 and a framer of
the Constitution when he was 81. Source: Prime, Fall 1996, p. 107
a. Find the sum of each row and each column, and find the grand total, as shown.
Group        Prefer new procedure   Prefer old procedure   No preference   Row sum
Nurses       100                    80                     20              200 (row 1 sum)
Doctors      50                     120                    30              200 (row 2 sum)
Column sum   150                    200                    50              400
b. For each cell, multiply the corresponding row sum by the column sum and divide
by the grand total, to get the expected value:
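$$\text{expected value} = \frac{\text{row sum} \times \text{column sum}}{\text{grand total}}$$
For example, for C1,2, the expected value, denoted by E1,2, is (refer to the previous
tables)
$$E_{1,2} = \frac{200 \times 200}{400} = 100$$
For each cell, the expected values are computed as follows:
$$E_{1,1} = \frac{200 \times 150}{400} = 75 \qquad E_{1,2} = \frac{200 \times 200}{400} = 100 \qquad E_{1,3} = \frac{200 \times 50}{400} = 25$$
$$E_{2,1} = \frac{200 \times 150}{400} = 75 \qquad E_{2,2} = \frac{200 \times 200}{400} = 100 \qquad E_{2,3} = \frac{200 \times 50}{400} = 25$$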
The expected values can now be placed in the corresponding cells along with the
observed values, as shown.
Group      Prefer new procedure   Prefer old procedure   No preference   Total
Nurses     100 (75)               80 (100)               20 (25)         200
Doctors    50 (75)                120 (100)              30 (25)         200
Total      150                    200                    50              400
Section 12–3 Tests Using Contingency Tables 525
The rationale for the computation of the expected frequencies for a contingency
table uses proportions. For C1,1 a total of 150 out of 400 people prefer the new proce-
dure. And since there are 200 nurses, one would expect, if the null hypothesis were true,
(150/400)(200), or 75, of the nurses to be in favor of the new procedure.
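The row-sum-times-column-sum rule can also be checked mechanically. The following is a minimal Python sketch, not part of the original text; the function name `expected_frequencies` is an illustrative choice, and the data are the observed frequencies from the nurses and doctors table.

```python
def expected_frequencies(observed):
    """Return expected counts: (row sum * column sum) / grand total for each cell."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_sums)
    return [[r * c / grand_total for c in col_sums] for r in row_sums]

# Observed frequencies from the postoperative procedure survey
# (rows: nurses, doctors; columns: prefer new, prefer old, no preference).
observed = [[100, 80, 20],
            [50, 120, 30]]

expected = expected_frequencies(observed)
for row in expected:
    print(row)
```

Each row of the result is 75, 100, 25, which agrees with the expected values placed in parentheses in the table.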
The formula for the test value for the independence test is the same as the one used
for the goodness-of-fit test. It is
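$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
For the previous example, compute the (O − E)²/E values for each cell, and then
find the sum.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$$= \frac{(100 - 75)^2}{75} + \frac{(80 - 100)^2}{100} + \frac{(20 - 25)^2}{25} + \frac{(50 - 75)^2}{75} + \frac{(120 - 100)^2}{100} + \frac{(30 - 25)^2}{25}$$
$$= 26.67$$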
The final steps are to make the decision and summarize the results. This test
is always a right-tailed test, and the degrees of freedom are (R − 1)(C − 1) =
(2 − 1)(3 − 1) = 2. If α = 0.05, the critical value from Table G is 5.991. Hence, the
decision is to reject the null hypothesis, since 26.67 > 5.991. See Figure 12–6.
[Figure 12–6. Critical and Test Values for the Postoperative Procedures Example: critical value 5.991, test value 26.67]
The conclusion is that there is enough evidence to support the claim that opinion is
related to (dependent on) profession—i.e., that the doctors and nurses differ in their
opinions about the procedure.
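The whole test for this example can be sketched in a few lines of Python. This is an illustrative sketch rather than the text's own procedure; the function name `chi_square_statistic` is an assumption, and the critical value 5.991 is simply copied from the Table G lookup quoted above.

```python
def chi_square_statistic(observed):
    """Sum (O - E)^2 / E over all cells, with E = row sum * column sum / grand total."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_sums)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / grand_total
            total += (o - e) ** 2 / e
    return total

# Rows: nurses, doctors; columns: prefer new, prefer old, no preference.
observed = [[100, 80, 20],
            [50, 120, 30]]

test_value = chi_square_statistic(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 5.991  # taken from the text: alpha = 0.05, d.f. = 2
reject_null = test_value > critical_value  # right-tailed test

print(round(test_value, 2), degrees_of_freedom, reject_null)
```

The sketch gives a test value of about 26.67 with 2 degrees of freedom, so the null hypothesis is rejected, matching the decision reached by hand.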
Two more examples illustrate the procedure for the chi-square test of independence.
Example 12–5
A sociologist wishes to see whether the number of years of college a person has
completed is related to his or her place of residence. A sample of 88 people is selected and
classified as shown.

Location    No college   Four-year degree   Advanced degree   Total
Urban       15           12                 8                 35
Suburban    8            15                 9                 32
Rural       6            8                  7                 21
Total       29           35                 24                88
Solution
STEP 1 State the hypotheses and identify the claim.
H0: A person’s place of residence is independent of the number of years
of college completed.
H1: A person’s place of residence is dependent on the number of years
of college completed (claim).
STEP 2 Find the critical value. The critical value at α = 0.05 is 9.488, since the
degrees of freedom are (3 − 1)(3 − 1) = (2)(2) = 4.
STEP 3 Compute the test value. To compute the test value, one must first compute
the expected values.
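$$E_{1,1} = \frac{(35)(29)}{88} = 11.53 \qquad E_{1,2} = \frac{(35)(35)}{88} = 13.92 \qquad E_{1,3} = \frac{(35)(24)}{88} = 9.55$$
$$E_{2,1} = \frac{(32)(29)}{88} = 10.55 \qquad E_{2,2} = \frac{(32)(35)}{88} = 12.73 \qquad E_{2,3} = \frac{(32)(24)}{88} = 8.73$$
$$E_{3,1} = \frac{(21)(29)}{88} = 6.92 \qquad E_{3,2} = \frac{(21)(35)}{88} = 8.35 \qquad E_{3,3} = \frac{(21)(24)}{88} = 5.73$$

Location    No college   Four-year degree   Advanced degree   Total
Urban       15 (11.53)   12 (13.92)         8 (9.55)          35
Suburban    8 (10.55)    15 (12.73)         9 (8.73)          32
Rural       6 (6.92)     8 (8.35)           7 (5.73)          21
Total       29           35                 24                88

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$$= \frac{(15 - 11.53)^2}{11.53} + \frac{(12 - 13.92)^2}{13.92} + \frac{(8 - 9.55)^2}{9.55} + \frac{(8 - 10.55)^2}{10.55} + \frac{(15 - 12.73)^2}{12.73} + \frac{(9 - 8.73)^2}{8.73} + \frac{(6 - 6.92)^2}{6.92} + \frac{(8 - 8.35)^2}{8.35} + \frac{(7 - 5.73)^2}{5.73}$$
$$= 3.01$$
STEP 4 Make the decision. The decision is not to reject the null hypothesis since
3.01 < 9.488. See Figure 12–7.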
[Figure 12–7. Critical and Test Values for Example 12–5: test value 3.01, critical value 9.488]
STEP 5 Summarize the results. There is not enough evidence to support the claim
that a person’s place of residence is dependent on the number of years of
college completed.
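Example 12–5 can also be reproduced numerically, including a P-value. The sketch below is an illustration only. It assumes the standard closed form for the right-tail area of a chi-square distribution with an even number of degrees of freedom, exp(−x/2) · Σ (x/2)^k / k! for k = 0, …, d.f./2 − 1, which avoids a statistics library but does not work for odd d.f. The function names are hypothetical.

```python
import math

def chi_square_statistic(observed):
    """Sum (O - E)^2 / E over all cells of a contingency table."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_sums)
    return sum(
        (o - row_sums[i] * col_sums[j] / grand_total) ** 2
        / (row_sums[i] * col_sums[j] / grand_total)
        for i, row in enumerate(observed)
        for j, o in enumerate(row)
    )

def p_value_even_df(x, df):
    """Right-tail chi-square area; this closed form is valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Rows: urban, suburban, rural; columns: no college, four-year, advanced degree.
observed = [[15, 12, 8],
            [8, 15, 9],
            [6, 8, 7]]

test_value = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = p_value_even_df(test_value, df)
print(round(test_value, 3), df, round(p_value, 3))
```

With unrounded expected values the test value is about 3.006 rather than 3.01, and the P-value is about 0.557, well above 0.05, so the null hypothesis is not rejected.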
Example 12–6
A researcher wishes to determine whether there is a relationship between the gender of
an individual and the amount of alcohol consumed. A sample of 68 people is selected,
and the following data are obtained.

          Alcohol consumption
Gender    Low   Moderate   High   Total
Male      10    9          8      27
Female    13    16         12     41
Total     23    25         20     68

At α = 0.10, can the researcher conclude that alcohol consumption is related to gender?
Solution
STEP 1 State the hypotheses and identify the claim.
H0: The amount of alcohol that a person consumes is independent of the
individual’s gender.
H1: The amount of alcohol that a person consumes is dependent on the
individual’s gender (claim).
STEP 2 Find the critical value. The critical value is 4.605, since the degrees of
freedom are (2 − 1)(3 − 1) = 2.
STEP 3 Compute the test value. First, compute the expected values.
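$$E_{1,1} = \frac{(27)(23)}{68} = 9.13 \qquad E_{1,2} = \frac{(27)(25)}{68} = 9.93 \qquad E_{1,3} = \frac{(27)(20)}{68} = 7.94$$
$$E_{2,1} = \frac{(41)(23)}{68} = 13.87 \qquad E_{2,2} = \frac{(41)(25)}{68} = 15.07 \qquad E_{2,3} = \frac{(41)(20)}{68} = 12.06$$
The completed table is shown next.

          Alcohol consumption
Gender    Low          Moderate     High         Total
Male      10 (9.13)    9 (9.93)     8 (7.94)     27
Female    13 (13.87)   16 (15.07)   12 (12.06)   41
Total     23           25           20           68

[Figure: critical and test values for Example 12–6: test value 0.283, critical value 4.605]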
STEP 5 Summarize the results. There is not enough evidence to support the claim
that the amount of alcohol a person consumes is dependent on the
individual’s gender.
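The test value 0.283 depends slightly on rounding the expected values to two decimal places. The Python sketch below is illustrative, with hypothetical function names; it computes the test value both ways and then applies the P-value method. It assumes the fact that, for exactly 2 degrees of freedom, the right-tail area of the chi-square distribution is exp(−x/2).

```python
import math

def expected_frequencies(observed):
    """Expected counts: row sum * column sum / grand total."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_sums)
    return [[r * c / grand_total for c in col_sums] for r in row_sums]

def chi_square_from(observed, expected):
    """Sum (O - E)^2 / E using the supplied expected counts."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))

# Rows: male, female; columns: low, moderate, high alcohol consumption.
observed = [[10, 9, 8],
            [13, 16, 12]]

exact_expected = expected_frequencies(observed)
rounded_expected = [[round(e, 2) for e in row] for row in exact_expected]

rounded_value = chi_square_from(observed, rounded_expected)  # hand calculation
exact_value = chi_square_from(observed, exact_expected)      # no rounding

alpha = 0.10
p_value = math.exp(-exact_value / 2)  # valid only because d.f. = 2
reject_null = p_value < alpha
print(round(rounded_value, 3), round(exact_value, 4), round(p_value, 4), reject_null)
```

Rounded expected values give about 0.283, unrounded values give about 0.2809, and the P-value is about 0.869. Either way the decision is the same: do not reject the null hypothesis at α = 0.10.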
Test for Homogeneity of Proportions
Objective 3. Test proportions for homogeneity using chi-square.

The second chi-square test that uses a contingency table is called the homogeneity of
proportions test. In this situation, samples are selected from several different
populations and the researcher is interested in determining whether the proportions of elements
that have a common characteristic are the same for each population. The sample sizes are
specified in advance, making either the row totals or column totals in the contingency
table known before the samples are selected. For example, a researcher may select a
sample of 50 freshmen, 50 sophomores, 50 juniors, and 50 seniors, and then find the
proportion of students who are smokers in each level. The researcher will then compare the
proportions for each group to see if they are equal. The hypotheses in this case would be
H0: p1 = p2 = p3 = p4
H1: At least one proportion is different from the others.
If one does not reject the null hypothesis, it can be assumed that the proportions are
equal and the differences in them are due to chance. Hence, the proportion of students who
smoke is the same for grade levels freshmen through senior. When the null hypothesis is
rejected, it can be assumed that the proportions are not all equal. The computational pro-
cedure is the same as that for the test of independence as shown in the next example.
Example 12–7
A researcher selected a sample of 50 seniors from each of three area high schools and
asked each senior, “Do you drive to school in a car owned by either you or your
parents?” The data are shown in the table. At α = 0.05, test the claim that the proportion
of students who drive their own or their parents’ cars is the same at all three schools.

         School 1   School 2   School 3   Total
Yes      18         22         16         56
No       32         28         34         94
Total    50         50         50         150

Interesting Fact
Water is the most critical nutrient in your body. It is needed for just about
everything that happens. Water is lost fast: two cups daily is lost just exhaling, 10 cups
through normal waste and body cooling, and one to two quarts per hour running, biking,
or working out. Source: Fitness, Special Edition, p. 84.

Solution
STEP 1 State the hypotheses.
H0: p1 = p2 = p3
H1: At least one proportion is different from the others.
STEP 2 Find the critical value. The formula for the degrees of freedom is the same
as before: (rows − 1)(columns − 1) = (2 − 1)(3 − 1) = 1(2) = 2. The
critical value is 5.991.
STEP 3 Compute the test value. First, compute the expected values.
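$$E_{1,1} = \frac{(56)(50)}{150} = 18.67 \qquad E_{2,1} = \frac{(94)(50)}{150} = 31.33$$
$$E_{1,2} = \frac{(56)(50)}{150} = 18.67 \qquad E_{2,2} = \frac{(94)(50)}{150} = 31.33$$
$$E_{1,3} = \frac{(56)(50)}{150} = 18.67 \qquad E_{2,3} = \frac{(94)(50)}{150} = 31.33$$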
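From these expected values the remaining computation follows the same pattern as the independence test. The Python sketch below is illustrative only; the variable and function names are hypothetical, and the critical value 5.991 is copied from Step 2. It also prints the three sample proportions being compared.

```python
def chi_square_statistic(observed):
    """Sum (O - E)^2 / E, with E = row sum * column sum / grand total."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_sums)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / grand_total
            total += (o - e) ** 2 / e
    return total

# Rows: yes, no; columns: School 1, School 2, School 3 (50 seniors each).
observed = [[18, 22, 16],
            [32, 28, 34]]

# Sample proportion of "yes" answers at each school.
sample_sizes = [sum(col) for col in zip(*observed)]
proportions = [yes / n for yes, n in zip(observed[0], sample_sizes)]

test_value = chi_square_statistic(observed)
critical_value = 5.991  # from Step 2: alpha = 0.05, d.f. = 2
reject_null = test_value > critical_value
print(proportions, round(test_value, 3), reject_null)
```

The sample proportions are 0.36, 0.44, and 0.32, and the test value works out to about 1.596, which is below 5.991, so under this sketch the null hypothesis of equal proportions is not rejected.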
When the degrees of freedom for a contingency table are equal to 1 (i.e., the table
is a 2 × 2 table), some statisticians suggest using the Yates correction for continuity.
The formula for the test is then
$$\chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E}$$
Since the chi-square test is already conservative, most statisticians agree that the Yates
correction is not necessary. (See Exercise 12–53.)
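To see the size of the correction, the following Python sketch computes the test value for a 2 × 2 table with and without the 0.5 adjustment. The table counts are hypothetical, chosen only for easy arithmetic, and the function name `chi_square_2x2` is an illustrative assumption.

```python
def chi_square_2x2(observed, yates=False):
    """Chi-square test value for a 2 x 2 table, optionally with the Yates correction."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_sums)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / grand_total
            diff = abs(o - e)
            if yates:
                diff -= 0.5  # Yates correction: reduce |O - E| by 0.5
            total += diff ** 2 / e
    return total

table = [[10, 20],
         [20, 10]]  # hypothetical counts, not taken from the text

uncorrected = chi_square_2x2(table)
corrected = chi_square_2x2(table, yates=True)
print(round(uncorrected, 3), round(corrected, 3))
```

For this hypothetical table every expected value is 15, so the uncorrected value is about 6.667 and the corrected value is 5.4. The correction always lowers the test value, which makes an already conservative test more conservative.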
The steps for the chi-square independence and homogeneity tests are summarized
in the Procedure Table.
The assumptions for the two chi-square tests are given next.
Exercises
12–20. How is the chi-square independence test similar to the goodness-of-fit test?
How is it different?
12–21. How are the degrees of freedom computed for the independence test?
12–22. When the observed frequencies are close to the expected frequencies, what is
the value of chi-square?
12–23. Generally, how would the null and alternative hypotheses be stated for the
chi-square independence test?
12–24. What is the name of the table used in the independence test?
12–25. How are the expected values computed for each cell in the table?
Speaking of STATISTICS
Can you find a mistake in this USA SNAPSHOT?

[USA Snapshots chart, “Ignoring commercials?”: Americans are exposed to about 270 ads a
day in all media (source: McKinsey and Co.), but few seem to notice. How many ads a day
they think they saw/heard. Categories: 1–30, 31–50, 51–100, 101–300, 301 or more.
Series: Men, Women. Percentages appearing in the chart, in extraction order: 9%, 17%,
15%, 25%, 15%, 20%, 18%, 27%, 30%, 25%.]
12–26. Explain how the chi-square independence test differs from the chi-square
homogeneity of proportions test.
12–27. How are the null and alternative hypotheses stated for the test of homogeneity
of proportions?

For Exercises 12–28 through 12–51, perform the following steps.
a. State the hypotheses and identify the claim.
b. Find the critical value.
c. Compute the test value.
d. Make the decision.
e. Summarize the results.
Use the traditional method of hypothesis testing unless otherwise specified.

12–28. A study is being conducted to determine whether there is a relationship between
jogging and blood pressure. A random sample of 210 subjects is selected, and they are
classified as shown in the table. At α = 0.05, test the claim that jogging and blood
pressure are not related.

                  Blood pressure
Jogging status    Low   Moderate   High
Joggers           34    57         21
Nonjoggers        15    63         20

12–29. A researcher wishes to see whether the age of an individual is related to coffee
consumption. A sample of 152 people is selected, and they are classified as shown in
the table. At α = 0.01, is there a relationship between coffee consumption and age?

               Coffee consumption
Age            Low   Moderate   High
21–30          18    16         12
31–40          9     15         27
41–50          5     12         10
51 and over    13    9          6

12–30. A survey of the 164 state representatives is conducted to see whether their
opinions on a bill are related to their party affiliation. The following data are obtained.
At α = 0.01, can the researcher conclude that opinions are related to party affiliations?

               Opinion
Party          Approve   Disapprove   No opinion
Republican     27        15           13
Democrat       43        18           12
Independent    9         15           12
Speaking of STATISTICS
This study involves three groups: smokers, ex-smokers, and nonsmokers. Suggest how a
chi-square independence test could be used to arrive at the conclusions. What
hypothesis could be used in this study? Do you agree with the results of the study?
Explain your answer.
Source: Reprinted with permission from Psychology Today magazine. Copyright © 1995. (Sussex Publishers, Inc.)
12–31. An automobile manufacturer wishes to determine whether the age of the purchaser
is related to the price of the car purchased. A sample of 222 drivers shows the
following data. At α = 0.05, is the purchase price of the car independent of the age of
the driver?

               Selling price
Age            Under $20,000   $20,001–$30,000   $30,001–$40,000
21–30          16              25                3
31–40          44              23                15
41–50          31              15                18
51 and over    9               11                12

12–32. A researcher wishes to determine if on-line service or Internet use is
independent of the type of user. A sample of 300 computer users shows the following data.
At α = 0.10, test the claim that usage is independent of the user.

Service usage
Increase   Same   Decrease

12–33. 300 men and 210 women were asked about how many ads in all media they think they
saw or heard during one day. The results are shown.

         Number
         1–30   31–50   51–100   101–300   301 or more
Men      45     60      90       54        51
Women    50     50      54       30        26

At α = 0.01, is the number of ads people feel that they see or hear related to the
gender of the person?
Source: Based on information from USA Today Snapshots, February 23, 1999.

12–34. An instructor wishes to see if the way people obtain information is independent
of their educational background. A survey of 400 high school and college graduates
yielded the following information. At α = 0.05, test the claim that the way people obtain
information is independent of their educational background.

Television   Newspapers   Other sources
12–37. A study is being conducted to determine whether the age of the customer is
related to the type of movie he or she rents. A sample of renters gives the data shown here.
At α = 0.10, is the type of movie selected related to the customer’s age?

               Type of movie
Age            Documentary   Comedy   Mystery
12–20          14            9        8
21–29          15            14       9
30–38          9             21       39
39–47          7             22       17
48 and over    6             38       12

Male      243   201   191
Female    135   149   202

12–42. According to a recent survey, 32% of Americans say they are “very likely” to
become organ donors. A researcher surveys 50 drivers in each of three neighborhoods to
determine the percentage of those willing to donate their organs. The results are shown
here. At α = 0.01, test the claim that the proportions of those who will donate their
organs are equal in all three neighborhoods.

Neighborhood A   Neighborhood B   Neighborhood C
          Southside   West End   East Hills   Jefferson
Passed    49          38         46           34
Failed    71          82         74           86
Total     120         120        120          120

Source: Lewis H. Lapham, et al., The Harper’s Index Book (New York: Henry Holt & Co., 1987), p. 57.

12–44. An advertising firm has decided to ask 92 customers at each of three local
shopping malls if they are willing to take part in a market research survey. According
to previous studies, 38% of Americans refuse to take part in such surveys. The results
are shown here. At α = 0.01, test the claim that the proportions of those who are
willing to participate are equal.

                       Mall A   Mall B   Mall C
Will participate       52       45       36
Will not participate   40       47       56
Total                  92       92       92

Source: Lewis H. Lapham, et al., The Harper’s Index Book (New York: Henry Holt & Co., 1987), p. 41.

12–45. An insurance firm wished to see if the proportion of drivers who admit to
driving after drinking varies according to the age of the driver. The firm surveyed
86 drivers in each of four age groups to see if they admitted to driving after drinking.
The results are shown here. At α = 0.05, test the claim that the proportions of those
who said yes are equal for the age groups.

Ages     21–29   30–39   40–49   50 and over
Yes      32      28      26      21
No       54      58      60      65
Total    86      86      86      86

12–46. According to a recent survey, 59% of Americans aged 8 to 17 would prefer that
their mother work outside the home, regardless of what she does now. A school district
psychologist decided to select three samples of 60 students each in elementary, middle,
and high school to see how the students in her district felt about the issue. At
α = 0.10, test the claim that the proportions of the students who prefer that their
mother have a job are equal.

                          Elementary   Middle   High
Prefers mother work       29           38       51
Prefers mother not work   31           22       9
Total                     60           60       60

Source: Daniel Weiss, 100% American (New York: Poseidon Press, 1988), p. 59.

12–47. A researcher surveyed 100 randomly selected lawyers in each of four areas of the
country and asked them if they had performed pro bono work for 25 or fewer hours in the
last year. The results are shown here. At α = 0.10, is there enough evidence to reject
the claim that the proportions of those who accepted pro bono work for 25 hours or less
are the same in each area?

         North   South   East   West
Yes      43      39      22     28
No       57      61      78     72
Total    100     100     100    100

Source: Daniel Weiss, 100% American (New York: Poseidon Press, 1988), p. 59.

12–48. On average, 79% of American fathers are in the delivery room when their children
are born. A physician’s assistant surveyed 300 first-time fathers to determine if they
had been in the delivery room when their children were born. The results are shown here.
At α = 0.05, is there enough evidence to reject the claim that the proportions of those
who were in the delivery room at the time of birth are the same?

               Hospital A   Hospital B   Hospital C   Hospital D
Present        66           60           57           56
Not present    9            15           18           19
Total          75           75           75           75

Source: Daniel Weiss, 100% American (New York: Poseidon Press, 1988), p. 79.

12–49. A children’s playground equipment manufacturer read in a survey that 55% of all
American playground injuries occur on the monkey bars. The manufacturer wishes to
investigate playground injuries in four different parts of the country to determine if
the proportions of accidents on the monkey bars are equal. The results are shown here.
At α = 0.05, test the claim that the proportions are equal. Use the P-value method.

Accidents            North   South   East   West
On monkey bars       15      18      13     16
Not on monkey bars   15      12      17     14
Total                30      30      30     30

Source: Michael D. Shook and Robert L. Shook, The Book of Odds (New York: Penguin Putnam Inc., 1991), p. 96.

12–50. According to the American Automobile Association, 31 million Americans travel
over the Thanksgiving holiday. To determine whether to stay open or not, a national
restaurant chain surveyed 125 customers at each of four locations to see if they would
be traveling over the holiday. The results are shown here. At α = 0.10, test the claim
that the proportions of Americans who will travel over the Thanksgiving holiday are
equal. Use the P-value method.

                  Location A   Location B   Location C   Location D
Will travel       37           52           46           49
Will not travel   88           73           79           76
Total             125          125          125          125

Source: Michael D. Shook and Robert L. Shook, The Book of Odds (New York: Penguin Putnam Inc., 1991), p. 67.

12–51. The vice president of a large supermarket chain wished to determine if his
customers made a list before going grocery shopping. He surveyed 288 customers in three
stores. The results are shown here. At α = 0.10, test the claim that the proportions of
the customers in the three stores who made a list before going shopping are equal.

             Store A   Store B   Store C
Made list    77        74        68
No list      19        22        28
Total        96        96        96

The chi-square test value can be computed as
$$\chi^2 = \frac{n(ad - bc)^2}{(a + b)(a + c)(c + d)(b + d)}$$
where n = a + b + c + d. Compute the χ² test value by using the above formula and the
formula Σ(O − E)²/E, and compare the results for the following table.

12   15
9    23

*12–53. For the contingency table shown in Exercise 12–52, compute the chi-square test
value by using Yates’s correction for continuity.

*12–54. When the chi-square test value is significant, and there is a relationship
between the variables, the strength of this relationship can be measured by using the
contingency coefficient. The formula for the contingency coefficient is
$$C = \sqrt{\frac{\chi^2}{\chi^2 + n}}$$
where χ² is the test value and n is the sum of frequencies of the cells. The
contingency coefficient will always be less than 1. Compute the contingency coefficient
for Exercises 12–28 and 12–40.
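The two formulas in the last exercises can be tried on a small example before attempting the exercises themselves. The Python sketch below is illustrative only: it uses hypothetical counts rather than any exercise data, and it assumes the usual layout in which a and b form the first row of the 2 × 2 table and c and d the second. The function names are not from the text.

```python
import math

def chi_square_shortcut(a, b, c, d):
    """2 x 2 shortcut: n(ad - bc)^2 / [(a + b)(a + c)(c + d)(b + d)]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (a + c) * (c + d) * (b + d))

def chi_square_general(observed):
    """General form: sum of (O - E)^2 / E over all cells."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    n = sum(row_sums)
    return sum(
        (o - row_sums[i] * col_sums[j] / n) ** 2 / (row_sums[i] * col_sums[j] / n)
        for i, row in enumerate(observed)
        for j, o in enumerate(row)
    )

def contingency_coefficient(chi_square, n):
    """C = sqrt(chi^2 / (chi^2 + n)); always less than 1 for n > 0."""
    return math.sqrt(chi_square / (chi_square + n))

# Hypothetical 2 x 2 table (assumed layout: first row a, b; second row c, d).
a, b, c, d = 10, 20, 20, 10
shortcut = chi_square_shortcut(a, b, c, d)
general = chi_square_general([[a, b], [c, d]])
coefficient = contingency_coefficient(shortcut, a + b + c + d)
print(round(shortcut, 4), round(general, 4), round(coefficient, 4))
```

For these hypothetical counts both formulas give about 6.667, and the contingency coefficient is about 0.316.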
The three columns will be placed in the Columns box as a sequence, C1 through C3.
5. Click [OK].
2        8       15      9       32
         10.55   12.73   8.73
3        6       8       7       21
         6.92    8.35    5.73
Total    29      35      24      88
The chi-square test statistic 3.006 has a P-value of 0.557. Do not reject the null hypothesis. The
sample data do not support a relationship between level of education and place of residence.
10 9 8
13 16 12
Section 12–4 Summary 537
[Calculator screens: Input, Input, Output]
The test value is 0.2808562115. The P-value is 0.8689861378. The decision is to not reject the
null hypothesis, since this value is greater than 0.10. You can find the expected values by press-
ing MATRIX, moving the cursor to [B], and pressing ENTER twice.
12–4 Summary

Three uses of the chi-square distribution were explained in this chapter. It can be
used as a goodness-of-fit test, in order to determine whether the frequencies of a
distribution are the same as the hypothesized frequencies. For example, is the number of
defective parts produced by a factory the same each day? This test is always a
right-tailed test.
The test of independence is used to determine whether two variables are related or
are independent. This test uses a contingency table and is always a right-tailed test. An
example of its use is a test to determine whether the attitudes of urban residents about
the recycling of trash differ from the attitudes of rural residents.
Finally, the homogeneity of proportions test is used to determine if several propor-
tions are all equal when samples are selected from different populations.
The chi-square distribution is also used for other types of statistical hypothesis
tests, such as the Kruskal-Wallis test, which is explained in Chapter 14.
Important Terms
contingency table 523
expected frequency 512
goodness-of-fit test 512
homogeneity of proportions test 528
independence test 523
observed frequency 512