11 Sample Problems On Chi-Square Tests (Chapter 11) - ANSWER KEY
The company that makes milk chocolate M&Ms claims the following distribution:
13% red, 20% orange, 16% green, 13% brown, 14% yellow, and 24% blue. Is this true?
1. Suppose that we opened a large bag of M&M’s. Think of this bag as being a random
sample of the entire population of M&Ms. You will count the number of candies that you
have and record the counts of each color. Here are the observed counts:
Color:     Red   Orange   Green   Brown   Yellow   Blue   Total
Observed:  66    115      85      48      70       107    491
H0: ρred = 0.13, ρorange = 0.20, ρgreen = 0.16, ρbrown = 0.13, ρyellow = 0.14, ρblue = 0.24
Ha: At least two of the proportions specified in the null hypothesis are not correct.
3. Let's suppose that the M&M company's claimed distribution is correct. If it is, how many of
each color would we expect to get in our sample?
You must show your work for one of the expected values. For example, expected red = (0.13)(491) = 63.83.
4. Check conditions.
• Random? The bags of M&Ms purchased locally are nowhere near being SRSs of
production, violating the assumption of random selection. However, we believe the
bag of M&Ms purchased locally is representative of all bags. So, we will proceed
with caution.
• Independent (10% condition)? It is reasonable to assume that at least
491 × 10 = 4,910 M&Ms are produced.
• Large count. Are all the expected counts greater than 5? What is the lowest
expected count? All expected counts are greater than 5. The smallest is 63.83 (red).
5. Calculations. For this test, the degrees of freedom are k − 1, where k is the
number of categories (colors):
df = k − 1 = 6 − 1 = 5
Use the table to calculate the test statistic.
Color    Observed  Expected  (O − E)              (O − E)²           (O − E)²/E
Red      66        63.83     66 − 63.83 = 2.17    (2.17)² = 4.71     0.074
Orange   115       98.20     16.80                282.24             2.874
Green    85        78.56     6.44                 41.47              0.528
Brown    48        63.83     −15.83               250.59             3.926
Yellow   70        68.74     1.26                 1.59               0.023
Blue     107       117.84    −10.84               117.51             0.997
Sometimes the observed count is greater than expected and sometimes it is less. We square each
difference (O − E) so that positive and negative deviations do not cancel each other out.
Add up all the numbers in the last column. This is our test statistic: χ² = 8.42
χ² = Σ(O − E)²/E
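As a cross-check on the table, here is a minimal R sketch that computes χ² = Σ(O − E)²/E by hand and compares it with R's built-in `chisq.test()`. The counts and claimed proportions are the ones above. The variable names are illustrative only.

```r
# Observed M&M counts and the company's claimed proportions (from above)
observed <- c(Red = 66, Orange = 115, Green = 85, Brown = 48, Yellow = 70, Blue = 107)
claimed  <- c(0.13, 0.20, 0.16, 0.13, 0.14, 0.24)

n <- sum(observed)                                  # sample size: 491
expected <- claimed * n                             # expected counts under H0 (not rounded)
chi_sq <- sum((observed - expected)^2 / expected)   # test statistic
df <- length(observed) - 1                          # number of categories - 1
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)   # upper-tail area above the statistic

cat("chi-square =", round(chi_sq, 3), " df =", df, " P-value =", round(p_value, 4), "\n")

# Built-in equivalent; it should agree with the hand computation
res <- chisq.test(observed, p = claimed)
stopifnot(abs(res$statistic - chi_sq) < 1e-9)
```

The hand-computed statistic matches the 8.42 from the table. `pchisq(..., lower.tail = FALSE)` plays the role of the calculator's χ²cdf with a large upper bound.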
6. What value would we get for the test statistic if our sample was very close to what is
expected? Explain.
It would be a positive value close to zero because the values of (Observed − Expected)²
would be small.
7. What value would we get for the test statistic if our sample was very far from what is
expected? Explain.
It would be a large positive value because the values of (Observed − Expected)² would be large.
8. Make a conclusion. Do the data provide significant evidence that the company was lying
about the distribution of colors of M&Ms? Use α = 0.05.
9. Follow-up analysis. If you rejected the null hypothesis, which color M&M had an observed
value farthest from the expected value?
Color ________ had the largest contribution (___) to χ². That color had
___ more/fewer observed (___) than expected (___).
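res <- chisq.test(c(66, 115, 85, 48, 70, 107), p = c(0.13, 0.20, 0.16, 0.13, 0.14, 0.24)) # Goodness-of-fit test for the M&M counts
round(res$expected, 3) # Expected counts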
Chi-Square Test: Goodness of Fit
Hypotheses:
H0: ρred = 0.13, ρorange = 0.20, ρgreen = 0.16, ρbrown = 0.13, ρyellow = 0.14, ρblue = 0.24.
Ha: At least two of these proportions are not as specified in the null hypothesis.
OR
H0: The claimed distribution (in context) is true.
Ha: The claimed distribution (in context) is not true.
Conditions:
✓ Large counts. The chi-square test for goodness of fit becomes more accurate
with more observations, so the sample should be large. A conservative check
for large counts is that all expected counts (not the observed counts) must be at
least 5. Show the lowest expected count.
The expected count for each category is obtained by multiplying the hypothesized
proportion for that category by the sample size. That is, expected count = pᵢ · n for each of
the categories. Don't round the expected counts!
Check Your Understanding (fair six-sided die?)
Carrie made a 6-sided die in her ceramics class and rolled it 90 times to test if each side
was equally likely to show up. The table summarizes the outcomes of her 90 rolls.
H0: The probability of getting each outcome is 1/6, i.e., the die is fair.
Ha: The probability of getting one or more of the outcomes is not 1/6, i.e., the die is not fair.
(b) Calculate the expected count for each of the possible outcomes.
Each expected count is 90(1/6) = 15.
df = 6 – 1 = 5
(e) Use Table C or your calculator to find the P-value. What conclusion would you make?
P(χ² > 14.4) = 0.0133
Because the P-value (0.0133) is less than the significance level (α = 0.05),
we reject H0. This is convincing evidence that the die is not fair.
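The same P-value can be sketched in R with `pchisq()`, the counterpart of the calculator's χ²cdf. Only the statistic 14.4 and df = 5 from above are used, because Carrie's table of rolls is not reproduced here.

```r
n_rolls <- 90
expected_per_side <- n_rolls * (1 / 6)   # 15 rolls expected for each face of a fair die

chi_sq <- 14.4                           # test statistic reported above
df <- 6 - 1                              # six faces, so df = 5
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)   # upper-tail area P(chi-square > 14.4)
print(p_value)
```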
(f) Which side of the die had an observed value the farthest from the expected?
The side with 2 dots had 13 (=28-15) more observed than expected.
If χ² is statistically significant, be prepared to discuss which values were the largest
contributors to χ². To see which outcome had the biggest contribution to χ², go to List 3
on your calculator. Find the largest contributions, then calculate (O − E) and
discuss the difference between the observed and expected counts.
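The die's table of rolls is not reproduced here, so the minimal R sketch below illustrates the same follow-up on the M&M counts from earlier. It computes each category's contribution (O − E)²/E, picks the largest, and reports O − E. As a check, it compares the contributions with the squared Pearson residuals that `chisq.test()` stores in `$residuals`.

```r
observed <- c(Red = 66, Orange = 115, Green = 85, Brown = 48, Yellow = 70, Blue = 107)
claimed  <- c(0.13, 0.20, 0.16, 0.13, 0.14, 0.24)

expected <- sum(observed) * claimed
names(expected) <- names(observed)
contrib <- (observed - expected)^2 / expected   # contribution of each color to chi-square
round(contrib, 3)

# Same numbers from R's built-in test: squared Pearson residuals (O - E)/sqrt(E)
res <- chisq.test(observed, p = claimed)
stopifnot(isTRUE(all.equal(unname(contrib), unname(as.numeric(res$residuals^2)))))

biggest <- names(which.max(contrib))            # color with the largest contribution
diff_oe <- observed[[biggest]] - expected[[biggest]]   # negative means fewer than expected
cat(biggest, "contributed", round(contrib[[biggest]], 3),
    "with O - E =", round(diff_oe, 2), "\n")
```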
Casio calculator syntax for chi-square test for goodness of fit: From the main menu
(MENU), enter the Statistics module (2). Input observed counts in List 1. Separately,
calculate expected counts and enter them in List 2. TEST (F3) → CHI (F3) → GOF
(F1)
Observed: List1
Expected: List2
df: 5 df = number of categories - 1
CNTRB: List3
Hit the EXE button, which returns
χ2 = 14.4 chi-square statistic
p= 0.01325859 P-value
df = 5 degrees of freedom
CNTRB List3 List showing the contributions to test statistic
Chi-Square Test for Goodness of Fit
Within the family of χ2 distribution density curves, the skew becomes less
pronounced with increasing degrees of freedom.
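A quick numerical illustration of this uses the standard fact that a χ² distribution with df degrees of freedom has mean df and skewness √(8/df). The sketch also prints the upper 5% critical values from `qchisq()`, which are the numbers listed in a chi-square table.

```r
df <- c(1, 2, 5, 10, 30)
skewness <- sqrt(8 / df)        # skewness of a chi-square distribution shrinks as df grows
crit <- qchisq(0.95, df = df)   # upper 5% critical values (chi-square table, p = 0.05 column)
data.frame(df = df, mean = df, skewness = round(skewness, 3), critical_0.05 = round(crit, 2))
```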
Check Your Understanding (choice of car color)
Does the warm, sunny weather in Arizona affect a driver’s choice of car color? Cass thinks
that Arizona drivers might opt for a lighter color with the hope that it will reflect some of the
heat from the sun. To see whether the distribution of car colors in Oro Valley, near Tucson,
is different from the distribution of car colors across North America, she selected a random
sample of 300 cars in Oro Valley. The table shows the distribution of car color for Cass’s
sample in Oro Valley and the distribution of car color in North America, according to
www.ppg.com.
1. Do these data provide convincing evidence that the distribution of car color in Oro Valley
differs from the North American distribution? Use the four-step (ICCI) process.
H0: The distribution of car colors in Oro Valley is the same as the distribution of car
colors in North America.
Ha: The distribution of car colors in Oro Valley is not the same as the distribution of
car colors in North America.
Conditions: If conditions are met, we will perform a chi-square test for goodness of fit
(“GOF”).
✓ Random: The data came from a random sample of 300 cars in Oro Valley.
✓ Independent (10% condition): Because we are sampling without replacement,
there must be at least 10(300) = 3000 cars in Oro Valley. This is reasonable to
assume.
✓ Large Counts: The expected counts are 300(0.23) = 69, 300(0.18) = 54,
300(0.16) = 48, 300(0.15) = 45, 300(0.10) = 30, 300(0.09) = 27, 300(0.02) = 6,
300(0.07) = 21. All expected counts are at least 5. (The smallest is 6 for green.)
Calculate:
$$\chi^2 = \frac{(84 - 69)^2}{69} + \frac{(38 - 54)^2}{54} + \cdots = 29.92$$
Degrees of freedom = df = 8 − 1 = 7
• Using technology: Using df = 7, χ²cdf(lower: 29.92, upper: 1000, df: 7) reveals
P(χ² ≥ 29.92) = 0.0000982 ≈ 0.
• OR using the calculator's χ² test with df = 7, P(χ² ≥ 29.92) = 0.0000982 ≈ 0.
Conclude: Because the P-value of approximately 0 is less than α = 0.05, we reject H0.
We have convincing evidence that at least one of the proportions of car colors in Oro
Valley is not the same as that in North America.
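The expected counts and P-value above can be reproduced in R. This sketch uses only the North American proportions and the χ² value 29.92 given above, because the full table of observed Oro Valley counts is not reproduced here. `pchisq()` plays the role of χ²cdf.

```r
n <- 300
na_props <- c(0.23, 0.18, 0.16, 0.15, 0.10, 0.09, 0.02, 0.07)   # North American proportions, in table order
stopifnot(abs(sum(na_props) - 1) < 1e-9)                       # proportions must sum to 1

expected <- n * na_props     # 69, 54, 48, 45, 30, 27, 6, 21
min(expected)                # smallest expected count (6); all are at least 5

df <- length(na_props) - 1   # 8 colors, so df = 7
p_value <- pchisq(29.92, df = df, lower.tail = FALSE)   # about 9.8e-5
p_value
```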
2. If there is convincing evidence of a difference in the distribution of car color, perform a
follow-up analysis.
Open TEST soft menu (F3) → CHI (F3) → choose the Goodness-of-fit test GOF (F1)
Observed: List1
Expected: List2
df: 7
CNTRB: List3
Hit the EXE button, which returns
χ2 = 29.9213854
p = 9.8165 × 10⁻⁵
df =7
CNTRB: List3
Does gummy bear brand matter?
Is the distribution of gummy bear colors the same for Haribo gummy bears and Great
Value (Walmart brand) gummy bears? We’ll collect data as a class and determine if we
have convincing evidence of a difference.
Suppose that we open a large bag of each brand of gummy bears and observe the
following distribution of colors. Fill in the table with the totals.
                Brand
Color     Haribo   Great Value   Total
Red       181      79            260
Green     91       59            150
Yellow    105      55            160
Orange    123      57            180
Clear     100      50            150
Total     600      300           900
1. How many samples do we have? What population are they from? Explain.
We have two samples: one sample from all Haribo brand gummy
bears and one sample from all Great Value brand gummy bears.
This question may look like the M&M goodness-of-fit question from the previous lesson,
but there is a very important distinction. With the M&M question, we were comparing data
from one sample to a claimed distribution of color. With this gummy bear question, we are
comparing data from one sample to data from another sample. This is analogous to the
difference between a one-proportion z-test and a two-proportion z-test.
H0: The color distribution is the same for both the Haribo and Great Value brands.
Ha: The color distribution is not the same for the Haribo and Great Value brands.
4. Now we will use a chi-square test to test whether there is a difference between the
two populations. We first need to find the expected values. Complete the table below
by writing down the value of the expected count in the space provided in each cell.
Show your calculations for one expected count.
The expected count in a particular cell of a two-way table of categorical data can be
calculated using the formula:
Expected count = (row total)(column total) / (table total)
For example, for Red/Haribo: Expected count = (260)(600) / 900 = 173.33
Observed (expected) counts by brand
Color     Haribo          Great Value    Total
Red       181 (173.33)    79 (86.67)     260
Green     91 (100.00)     59 (50.00)     150
Yellow    105 (106.67)    55 (53.33)     160
Orange    123 (120.00)    57 (60.00)     180
Clear     100 (100.00)    50 (50.00)     150
Total     600             300            900
Always write down the values of the expected count. Show your work for one of
the expected counts. Don’t round expected counts!
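The formula (row total)(column total)/(table total) can be applied to every cell at once with R's `outer()`. This minimal sketch rebuilds the expected table above from the observed counts and checks it against `chisq.test()`. The object names are illustrative only.

```r
observed <- matrix(c(181, 91, 105, 123, 100,   # Haribo column
                     79, 59, 55, 57, 50),      # Great Value column
                   nrow = 5,
                   dimnames = list(c("Red", "Green", "Yellow", "Orange", "Clear"),
                                   c("Haribo", "Great Value")))

# Expected count for each cell: (row total)(column total) / (table total)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 2)

df <- (nrow(observed) - 1) * (ncol(observed) - 1)   # (5 - 1)(2 - 1) = 4
```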
5. Use your work above to complete a 4-step (ICCI) significance test. α = 0.05
Identify: We want to perform a test of the following hypotheses at the α = 0.05 level:
H0: The color distribution is the same for both the Haribo and Great Value brands.
Ha: The color distribution is not the same for the Haribo and Great Value brands.
Conditions: If conditions are met, we will perform a chi-square test for homogeneity.
✓ Random: The data came from two independent random samples.
✓ 10%: 600 is less than 10% of all Haribo gummy bears and 300 is less
than 10% of all Great Value gummy bears. So, n ≤ N/10 for
both.
✓ Large Counts: The expected counts (in the table above) are all at
least 5. The smallest is 50 for Great Value Green and Clear.
Calculate:
$$\chi^2 = \frac{(181 - 173.33)^2}{173.33} + \frac{(91 - 100)^2}{100} + \cdots = 3.75$$
Using technology: With df = (5 − 1)(2 − 1) = 4, P-value = P(χ² ≥ 3.75) = 0.441.
χ²cdf(lower: 3.75, upper: 1000, df: 4)
df = (# of rows – 1)∙(# of columns – 1)
Conclude: Because the P-value of 0.441 is greater than α = 0.05, we fail to reject H0.
We do not have convincing evidence of a difference in the true distributions of
color for Haribo gummy bears and Great Value gummy bears.
Casio calculator syntax for chi-square test for homogeneity/independence: In the
STATS module, TEST (F3) → CHI (F3) → 2WAY (F2) → ►MAT (F2).
✓ Navigate to the matrix you want to use, e.g., Observed count: MAT A, and hit
EXE.
✓ Specify the matrix dimensions: m is the number of rows, n is the number of columns.
Here, m = 5 and n = 2.
✓ Enter the data.
✓ Return to the test page by hitting EXIT twice.
✓ Optionally, do the same thing for Expected count: MAT B. If you don't, the
calculator will generate the expected counts.
✓ Hit the EXE button, which returns
χ2 = 3.75043269
p = 0.44083332
df = 4
R statistical software
data <- matrix(c(181, 91, 105, 123, 100, 79, 59, 55, 57, 50), nrow = 5)
# By default, R fills a matrix by column, so enter the first column, then the second column, etc.
colnames(data) <- c("Haribo", "Great Value") # Column names
rownames(data) <- c("Red", "Green", "Yellow", "Orange", "Clear") # Row names
data
test <- chisq.test(data)
test
marginals <- addmargins(data) # Add row and column totals
marginals
round(test$expected, 2) # Expected counts under H0
residuals <- test$residuals # Pearson residuals r = (O - E)/sqrt(E)
contrib <- residuals^2 # Contribution of each cell to chi-square
round(contrib, 2)
6. Explain how this test is different from a chi-square test for goodness of fit.
Here, we have two samples from two populations. We are comparing data from one
sample to data from another sample.
With the χ² test for goodness of fit, we had one sample from one population. In that test,
we compared data from one sample to a claimed distribution of color.
Chi-Square Test for Homogeneity
Hypotheses for chi-square test for homogeneity:
The expected count in a particular cell of a two-way table of categorical data can be
calculated using the formula:
Expected count = (row total)(column total) / (table total)
Always write down the values of the expected counts. Don't round. Show your
calculations for at least one term.
Conditions
✓ Random. Data should be collected using a stratified random sample or a
randomized experiment.
✓ 10%: When sampling without replacement, check that n ≤ N/10 for both
samples.
✓ Large counts. A conservative check for large counts is that all expected counts must
be at least 5.
If you reject H0 and are asked to do a follow-up (contribution) analysis, look to see
which cells have the largest contribution to the χ² statistic and discuss the difference
between observed counts and expected counts.
Check Your Understanding (gender of interviewer)
For a class project, Abby and Mia wanted to know if the gender of an interviewer could
affect the responses to a survey question. The subjects in their experiment were 100
males from their school. Half of the males were randomly assigned to be asked, “Would
you vote for a female president?” by a female interviewer. The other half of the males
were asked the same question by a male interviewer. The table shows the results.
(b) Show the calculation for the expected count in the Male/Yes cell. Then provide a
complete table of expected counts.
Expected count for Male/Yes cell = (50)(69)/100 = 34.5
Expected counts   Male   Female
Yes               34.5   34.5
No                5.5    5.5
Maybe             10     10
Test statistic:
$$\chi^2 = \frac{(30 - 34.5)^2}{34.5} + \frac{(39 - 34.5)^2}{34.5} + \cdots = 4.25$$
With df = (3 − 1)(2 − 1) = 2, χ²cdf(lower: 4.25, upper: 1000, df: 2) reveals
P-value = P(χ² ≥ 4.25) = 0.1196.
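Both the expected counts and the P-value can be checked in R. The row totals (69 Yes, 11 No, 20 Maybe) and column totals (50 and 50) used below are inferred from the expected-count table above. For df = 2, the upper tail of a χ² distribution has the closed form e^(−x/2), which the sketch uses as a cross-check.

```r
row_totals <- c(Yes = 69, No = 11, Maybe = 20)   # inferred from the expected counts above
col_totals <- c(M = 50, F = 50)
expected <- outer(row_totals, col_totals) / 100  # (row total)(column total) / (table total)
expected

chi_sq <- 4.25
p_value <- pchisq(chi_sq, df = 2, lower.tail = FALSE)
stopifnot(abs(p_value - exp(-chi_sq / 2)) < 1e-10)   # df = 2 upper tail is exp(-x/2)
round(p_value, 4)
```

With the rounded statistic 4.25 this returns about 0.1194. The 0.1196 reported above presumably comes from the unrounded statistic.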
Are Taco Tongue and Evil Eyebrow independent?
Is there an association between the Taco Tongue and the Evil Eyebrow? Below is the data
for a random sample of 600 senior students. Do we have convincing evidence that the
ability to do the Taco Tongue and Evil Eyebrow are associated?
In chapter 5, we learned about making a claim about independence within a sample. That
method did not make a claim about the population. The chi-square technique that we learn
about in this chapter allows us to make a claim about the population.
2. Calculate the expected counts. For example, for Taco Yes/Evil Eyebrow Yes: (480)(200)/600 = 160.
Observed:
                   Evil Eyebrow
                   Yes    No     Total
Taco Tongue Yes    180    300    480
Taco Tongue No     20     100    120
Total              200    400    600
Expected:
                   Evil Eyebrow
                   Yes    No     Total
Taco Tongue Yes    160    320    480
Taco Tongue No     40     80     120
Total              200    400    600
3. Do the data provide significant evidence that there is an association between the ability to
Taco Tongue and Evil Eyebrow? Use α = 0.05
Identify: We want to perform a test of the following hypotheses using α = 0.05:
H0: There is no association between being able to make an Evil Eyebrow and being
able to make a Taco Tongue in the population of seniors.
Ha: There is an association between being able to make an Evil Eyebrow and being
able to make a Taco Tongue in the population of seniors.
Conditions: If conditions are met, we will perform a chi-square test for independence.
✓ Random: Random sample of 600 seniors.
✓ 10%: The sample of 600 seniors is less than 10% of all seniors.
✓ Large Counts: The expected counts (see the table above) are all at least 5.
(The smallest is 40 > 5.)
Calculate: Test statistic
$$\chi^2 = \frac{(180 - 160)^2}{160} + \cdots + \frac{(100 - 80)^2}{80} = 18.75$$
df = (2-1)*(2-1) = 1
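A short R sketch of this calculation sums (O − E)²/E over the four cells and then finds the P-value with `pchisq()`. The matrix is entered column by column, matching the observed and expected tables above.

```r
# Rows: Taco Tongue Yes/No; columns: Evil Eyebrow Yes/No
observed <- matrix(c(180, 20, 300, 100), nrow = 2)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)   # 160, 40, 320, 80

chi_sq <- sum((observed - expected)^2 / expected)   # 2.5 + 10 + 1.25 + 5 = 18.75
df <- (nrow(observed) - 1) * (ncol(observed) - 1)   # 1
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
c(chi_sq = chi_sq, df = df, p_value = p_value)
```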
Conclude: Because the P-value (0.000015) is less than α = 0.05, we reject H0. The
data provide convincing evidence to conclude that there is an association between
being able to make a Taco Tongue and being able to make an Evil Eyebrow in the
population of seniors.
R statistical software
data <- matrix(c(180, 20, 300, 100), nrow = 2)
# By default, R fills a matrix by column, so enter the first column, then the second column, etc.
colnames(data) <- c("Yes Evil Eyebrow", "No Evil Eyebrow") # Column names
rownames(data) <- c("Yes Taco Tongue", "No Taco Tongue") # Row names
marginals <- addmargins(data) # Add row and column totals
marginals
test <- chisq.test(data, correct = FALSE) # Chi-square test for a two-way table without continuity correction
test
round(test$expected, 2) # Expected counts
residuals <- test$residuals # Pearson residuals r = (O - E)/sqrt(E)
contrib <- residuals^2 # Contribution of each cell to chi-square
round(contrib, 2)
Chi-Square Test for Independence
Hypotheses for chi-square test for independence:
The difference between chi-square tests for homogeneity and tests for
association/independence rests on how you get the data.
• For homogeneity of populations: One categorical variable is observed in two or
more populations (groups) from a stratified random sample or randomized
experiment, e.g., remember the context of gummy bears. All experiments are
tests of homogeneity.
• For Association/Independence: Two categorical variables are observed in a
single population, e.g., remember the context of Evil Eyebrow.
For chi-square tests for goodness of fit, we have one variable and one population, e.g.,
remember the context of M&Ms.
The calculator mechanics for the chi-square tests for homogeneity and independence
are the same.
Are gender and favorite class independent?
Is there an association between gender and preference for English or math class? Below is
the data for a random sample of senior students. Do we have convincing evidence that
gender and favorite class are associated?
Observed:
          English   Math   Total
Female    43        22     65
Male      21        28     49
Total     64        50     114
Expected:
          English   Math    Total
Female    36.49     28.51   65
Male      27.51     21.49   49
Total     64        50      114
2. Do the data provide significant evidence that there is an association between gender and
preference for English or math class? Use α = 0.05
Table: Using df = 1, P-value < 0.05 because the calculated χ² statistic (6.1582) is
greater than the critical value (3.84) in the df = 1 row.
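The table comparison can be reproduced in R. `qchisq()` returns the df = 1 critical value. `chisq.test()` with `correct = FALSE` (no Yates continuity correction, which matches the hand calculation) returns the statistic and P-value. Rows and columns follow the observed table above.

```r
observed <- matrix(c(43, 21, 22, 28), nrow = 2,
                   dimnames = list(c("Female", "Male"), c("English", "Math")))

critical <- qchisq(0.95, df = 1)              # table critical value for alpha = 0.05, df = 1
test <- chisq.test(observed, correct = FALSE) # Pearson chi-square test, no continuity correction

round(test$expected, 2)                       # 36.49, 27.51, 28.51, 21.49
c(statistic = unname(test$statistic), critical = critical, p_value = test$p.value)
```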
Conclude: Because the P-value (0.0131) is less than α = 0.05, we reject H0. The data
provide convincing evidence of an association between gender and preference for
English or math among senior students.
Check Your Understanding (pick the test)
For each of the following situations, decide which type of chi-square test is
appropriate. Explain.
2. The General Social Survey (GSS) asked a random sample of adults their opinion
about whether astrology is very scientific, sort of scientific, or not at all scientific.
Here is a two-way table of counts for people in the sample who had three levels
of higher education:
3. Casinos are required to verify that their games operate as advertised. American
roulette wheels have 38 slots—18 red, 18 black, and 2 green. In one casino,
managers record data from a random sample of 200 spins of one of their
American roulette wheels. The table displays the results.
Check your understanding (Ibuprofen or acetaminophen?)
In a study reported by the Annals of Emergency Medicine (March 2009), researchers
conducted a randomized, double-blind clinical trial to compare the effects of ibuprofen
and acetaminophen plus codeine as a pain reliever for children recovering from arm
fractures. There were many response variables recorded, including the presence of any
adverse effect, such as nausea, dizziness, and drowsiness. Here are the results:
All experiments are tests of homogeneity. Here, one group took Ibuprofen, and another
took Acetaminophen plus codeine.
(b) Do these data provide convincing evidence at the α = 0.05 level of a
difference in the proportion of subjects who had adverse effects across treatments?
Conditions: If conditions are met, we will perform a chi-square test for homogeneity.
✓ Random: The treatments were assigned at random.
✓ Independent: Knowing whether one subject had an adverse effect shouldn't give
any additional information about the responses of other subjects, so the
observations can be considered independent. (Remember, do not check
the 10% condition for experiments.)
✓ Large Counts: The expected counts (listed below) are all at least 5.
Expected counts       Ibuprofen   Acetaminophen plus Codeine   Total
Adverse effects       48.5        44.5                         93
No adverse effects    73.5        67.5                         141
Total                 122         112                          234
$$\chi^2 = \frac{(36 - 48.5)^2}{48.5} + \cdots = 11.15$$
The chi-square test for homogeneity based on a 2 × 2 two-way table is equivalent to the
two-sample z test for H0: ρ1 = ρ2 with a two-sided alternative hypothesis.
(c) Show that a two-sample z test for a difference in proportions gives the same
P-value.
Identify: Since we are comparing the proportion of subjects with adverse effects for just
two treatments, we can use a two-sample z test to test the following hypotheses:
H0: ρI − ρA = 0
Ha: ρI − ρA ≠ 0
where
ρI = the true proportion of adverse effects for Ibup. users, and
ρA = the true proportion of adverse effects for Acet. users.
Conditions: Two-proportion z-test.
✓ Random: Same as for the chi-square test of homogeneity.
✓ Independent: Same as for the chi-square test of homogeneity.
✓ Large Counts: The numbers of successes and failures are all at least 10 {36, 86, 57, 55}.
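Calculate: When the conditions are met, the two-sample z test for the difference in
proportions p1 − p2 uses the test statistic
$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p_C (1 - \hat p_C)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \hat p_C = \frac{x_1 + x_2}{n_1 + n_2}$$
Here, p̂1 = 36/122 ≈ 0.295, p̂2 = 57/112 ≈ 0.509, and p̂C = 93/234 ≈ 0.397, giving z ≈ −3.339.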
normalcdf(lower= -∞, upper= -3.33924, σ=1, µ=0) reveals P(Z < -3.33924) = 0.00042
P-value = P(Z ≤ −3.33924 or Z ≥ 3.33924) = 2(0.00042) = 0.00084
Conclude: Same conclusion as for the chi-square test of homogeneity above.
Note that the P-value from the two-sample z test is the same as the P-value from the
chi-square test.
(d) Show that the square of the calculated z-statistic from the two-sample z test is equal
to the calculated chi-square statistic from the test of homogeneity.
z² = (−3.339)² = 11.15 = χ²
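Parts (c) and (d) can be checked together in R. This sketch computes the pooled two-sample z statistic by hand and compares it with `chisq.test()` (again with `correct = FALSE`) on the same 2 × 2 table. The variable names are illustrative only.

```r
x <- c(ibuprofen = 36, acetaminophen = 57)    # subjects with adverse effects
n <- c(ibuprofen = 122, acetaminophen = 112)  # group sizes

p_hat <- x / n
p_pool <- sum(x) / sum(n)                     # pooled proportion, 93/234
z <- (p_hat[[1]] - p_hat[[2]]) /
  sqrt(p_pool * (1 - p_pool) * (1 / n[[1]] + 1 / n[[2]]))
p_z <- 2 * pnorm(-abs(z))                     # two-sided P-value

# Same data as a 2 x 2 table: rows = adverse / no adverse, columns = treatments
tab <- matrix(c(36, 86, 57, 55), nrow = 2)
chi <- chisq.test(tab, correct = FALSE)

c(z = z, z_squared = z^2, chi_square = unname(chi$statistic),
  p_z = p_z, p_chi = chi$p.value)
```

The squared z statistic equals the chi-square statistic, and the two P-values agree (about 0.00084).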
Casio calculator syntax for two-sample z test: Go to module 2 (Statistics) → TEST (F3)
→ Z (F1) → 2-PROP (F4). Calculator input:
p1 ≠ p2
x1 = 36 x2 = 57
n1 = 122 n2 = 112
Press EXE, which returns
z = -3.33924
p = 0.00084
When should you use a chi-square test and when should you use a two-sample z test?
Here are some things to keep in mind:
• The chi-square test is always two-sided. That is, it only tests for a difference in
the two proportions. If you want to test whether one proportion is larger than the
other (a one-sided test), use the two-sample z test.
• If you want to estimate the difference between two proportions, use a two-sample
z interval. There are no confidence intervals that correspond to chi-square tests.
• If you are comparing more than two treatments or the response variable has
more than two categories, you must use a chi-square test.
• You can also use a chi-square goodness-of-fit test in place of a one-sample z
test for a proportion if the alternative hypothesis is two-sided. The chi-square test
will use two categories (success and failure) and have df = 2 – 1 = 1.
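To see this equivalence numerically, here is a sketch using hypothetical data (60 successes in 100 trials, testing H0: p = 0.5), which do not come from any problem above. The squared one-sample z statistic equals the two-category goodness-of-fit χ², and the two-sided P-values match.

```r
# Hypothetical data: 60 successes in n = 100 trials, testing H0: p = 0.5 (two-sided)
successes <- 60
n <- 100
p0 <- 0.5

z <- (successes / n - p0) / sqrt(p0 * (1 - p0) / n)   # one-sample z statistic: 2
p_z <- 2 * pnorm(-abs(z))                            # two-sided P-value

gof <- chisq.test(c(successes, n - successes), p = c(p0, 1 - p0))  # two categories, df = 1
c(z_squared = z^2, chi_square = unname(gof$statistic),
  p_z = p_z, p_chi = gof$p.value)
```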