
Gignac, G. E. (2019). How2statsbook (Online Edition 1). Perth, Australia: Author.

4
Categorical Data
Contents

Binomial Test
Binomial Test Hypotheses
Binomial Test Confidence Intervals: SPSS
Pearson Chi-Square: Test of Two Independent Proportions/Frequencies
Pearson Chi-Square: Two Independent Proportions - SPSS
Binomial Test Versus Pearson Chi-Square
2x2 Pearson Chi-Square: Test of Association
2x2 Pearson Chi-Square: SPSS
2x2 Pearson Chi-Square: Measure of Association (phi)
Assumptions: Pearson Chi-Square
2x2 Pearson Chi-Square: Effect Size Guidelines
Yates Continuity Correction
Pearson Chi-Square: r x c Contingency Table Analysis
Pearson Chi-Square: r x c Contingency Table Analysis: SPSS
Measure of Effect Size: Cramer's V
r x c Contingency Table Analysis: Follow-Up Analyses
Adjusted Standardized Residual Analysis
Adjusted Standardized Residual Analysis: SPSS
Adjusted Standardized Residual Analysis: Bonferroni Correction
Adjusted Standardized Residual Analysis: Bonferroni Correction - SPSS
What is the Minimum Cell Frequency Required?
Differences Between Proportions/Percentages: Within-Subjects Designs
McNemar Chi-Square
McNemar's Chi-Square: SPSS
Advanced Topics
Why do Researchers Use p < .05?
Is phi Just a Pearson Correlation (r)? Yes.
Adjusted phi
Is a One-Tailed Pearson Chi-square Analysis Possible?
Odds/Ratios
Odds/Ratios: SPSS
Relative Risk
Relative Risk: SPSS
How to Interpret Odds Ratios and Risk Ratios Less than 1
2 x 2 Pearson Chi-Square: Interactions?
Partitioning
Partitioning: SPSS
Pearson's Chi-Square Versus McNemar Chi-Square
Mid-p McNemar Test
Dealing with Constant Values
Practice Questions
Advanced Practice Questions

Binomial Test
I once had a girlfriend who insisted that she could tell the difference between Pepsi
and Coke. I had my doubts that anyone could do this, so I decided to put her to the test
statistically (as you do). First, I blindfolded her, so that it would be a blind taste test. I then
filled five cups with Pepsi and five cups with Coke (but I didn’t tell her that I was going to fill
the cups 50/50; it could have been any fraction, from her perspective). I placed them onto a
table in front of her. I had her taste the contents of each of the ten cups. After each cup
tasting, she told me whether she thought the drink was Pepsi or Coke. I wrote down her
responses and scored each response as a 1 for a correct identification and a 0 for an incorrect
identification.
What is important to keep in mind, here, is that anyone would be expected to achieve
50% accuracy on this test, because there are only two possible answers for each of the ten
taste tests. Thus, anyone would be expected to get 5 out of 10 taste tests correct, just by guessing.
My ex-girlfriend managed to correctly identify the contents of 7 out of the 10 cups. The key
statistical question is whether 7 out of 10 is beyond the chance expectation of 5 out of 10. That
is, my ex-girlfriend could have just been guessing, and, just by chance, managed to identify the
contents of 7 out of the 10 cups, correctly. This is precisely the type of question that can be
answered by a statistical test. Is the observation of 7 out of 10 beyond the expected chance of
5 out of 10? Before I report the results of the statistical analysis I performed in this case, I
would like to explain what “beyond chance” means in the context of statistics.
People often fool themselves into believing something systematic has happened,
when, in fact, it was really just a chance event (e.g., Croson & Sundali, 2005; Tversky &
Kahneman, 1971). Consequently, statisticians want to protect themselves against concluding,
incorrectly, that something systematic has happened, when, in fact, it happened simply by
chance. In practical terms, "beyond chance" means that there is, at most, a 5% chance that one
has fooled oneself into concluding an event has occurred beyond chance, when, in fact, it has
occurred simply by chance.
Theoretically, nothing is actually beyond chance, if you take a probabilistic perspective on life
(which statisticians do). However, for better and for worse, statisticians have adopted a chance
event of 5% or less as sufficiently unlikely to merit the label of “beyond chance”. To return to
my ex-girlfriend’s apparent ability to detect the difference between Pepsi and Coke, I needed
to estimate the chances of observing 7 or more correct guesses out of 10, under the
expectation that random guessing alone would produce about 5 correct guesses out of 10
trials. To estimate the
chances, in this case, one could conduct a statistical analysis known as the binomial test. I did
just that with my ex-girlfriend’s data. If the chances of having observed 7 out of 10 correct
taste tests was less than 5%, I would have concluded that my ex-girlfriend had the ability to
distinguish the taste of Pepsi and Coke “beyond chance”.

Binomial Test Hypotheses


In this example, there are only two possible outcomes; thus, random guessing would
be expected to achieve 50% accuracy on the Pepsi and Coke distinction test. Thus, to repeat, my
ex-girlfriend had to beat 50% accuracy at a level that is beyond chance, p < .05.

Null hypothesis (H0): My ex-girlfriend does not have the ability to detect Pepsi and Coke
systematically (identification probability = .50)

Alternative hypothesis (H1): My ex-girlfriend does have the ability to detect Pepsi and Coke
systematically (identification probability ≠ .50)

To conduct a binomial test “by hand” is a little more complicated than one might
think. Fortunately, most statistical programs offer the option to conduct a binomial test on
data.

Binomial Test: SPSS


To conduct a binomial test in SPSS, you can use the ‘Nonparametric Tests’ utility in the
menu options (Watch Video 4.1: Binomial Test in SPSS). Most programs test the proportion of
correct responses observed in the data against .50 (i.e., the null hypothesis) by default. That is,
the null hypothesis is consistent with half the observations in each of the two categories
(within sampling fluctuations). In this case, the null is correctly represented by .50. However, in
some cases, you may wish to change the null hypothesis to a value other than .50.
The results associated the binomial test analysis are reported in the SPSS table entitled
‘Binomial Test’:


The key value to examine in this table is in the last column: .344. The .344 value
corresponds to the probability of fooling oneself into thinking that my ex-girlfriend had the
ability to distinguish between Pepsi and Coke in a taste test based on 10 tastings. Because .344
is greater than .05 (i.e., 5%), I concluded that the observed correct identification proportion of
.70 was not beyond the .50 expectation under the null hypothesis, i.e., correct identifications
that would be expected purely by guessing. Thus, my ex-girlfriend failed to demonstrate a
statistically significant ability to detect the difference between Pepsi and Coke.1
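For readers who want to verify the result outside SPSS, a minimal sketch in Python (using scipy, which is not part of this book's SPSS workflow) reproduces the two-tailed binomial p-value:

```python
from scipy.stats import binomtest

# 7 correct identifications out of 10 trials, null probability = .50
result = binomtest(k=7, n=10, p=0.5, alternative="two-sided")
print(result.pvalue)  # ≈ .344, matching the SPSS 'Binomial Test' table
```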

Binomial Test Confidence Intervals: SPSS


In the current example, my ex-girlfriend achieved a 70% success rate at distinguishing
Coke from Pepsi, which was not statistically significantly different from the null hypothesis of
50%. However, it would be useful to know the confidence interval associated with the point-
estimate of 70%. The binomial test p-value of .344 implies that the lower bound of the 95%
confidence interval associated with the .70 point-estimate will be less than .50. That is, the
interval will include .50.
Unfortunately, SPSS does not offer an especially attractive method to estimate
confidence intervals associated with a binomial analysis.2 Fortunately, however, there are
some alternatives. First, Bruce Weaver (Lakehead University) created a SPSS syntax file to
estimate confidence intervals for a binomial proportion/percentage across five different
methods: (1) Clopper-Pearson, (2) Wald, (3) Wald adjusted for continuity, (4) Wilson score, and
(5) Jeffreys. All things considered, the Wilson score method is perhaps the most attractive
option in most cases (Newcombe, 1998). The Clopper-Pearson method should be preferred
when conservatism is a priority (Watch Video 4.2: Binomial Confidence Intervals of
Proportions in SPSS).
As can be seen in the SPSS table entitled ‘Confidence Intervals for Binomial
Proportions’, all of the lower-bound estimates were less than .50. For example, the Wilson
score lower-bound was estimated at .40 and the upper-bound was estimated at .89. Thus, if
the experiment were conducted many times over with 10 trials, one would expect my ex-
girlfriend’s success rate to be somewhere between .40 and .89 in 95% of the experiments. The
fact that the lower-bound proportion intersected with the null hypothesis proportion of .50
corroborates the p-value (i.e., .344) obtained from the binomial 2-tailed test reported in the
previous section of this chapter.
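As a cross-check outside SPSS, a sketch using statsmodels (my substitution; the book's own workflow uses Bruce Weaver's syntax file) yields the same Wilson score interval:

```python
from statsmodels.stats.proportion import proportion_confint

# Wilson score 95% CI for 7 successes out of 10 trials
lower, upper = proportion_confint(count=7, nobs=10, alpha=0.05, method="wilson")
print(round(lower, 2), round(upper, 2))  # ≈ .40 and .89
```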

1. Fortunately for me, she was totally ignorant of the concept of statistical power and the
matter was left at that. To learn about the importance of statistical power, check out the
chapter on the difference between two means. Also, check out Practice Question 1 of this
chapter.
2. Long story short, I'm not fond of the Monte Carlo utility in SPSS for such purposes, which
appears to be available within the binomial analysis menu option.

Pearson Chi-Square: Test of Two Independent Proportions/Frequencies


Although the binomial test is very useful and powerful at detecting the difference
between two independent proportions (or frequencies), it is a limited approach to data
analysis, as it cannot be extended to more complicated designs. In contrast, the Pearson
chi-square analysis is a family of statistical tests that can evaluate a variety of hypotheses
relevant to the differences between independent proportions (or frequencies).
In its simplest form, the Pearson chi-square analysis can test the difference between
two independent proportions. Thus, in a sense, it may be viewed as an alternative to the
binomial test. The fundamental characteristic associated with a Pearson chi-square analysis is
that it makes a comparison between the observed frequencies and the expected frequencies
under the null hypothesis (i.e., no difference between observed and expected frequencies). If
the observed frequencies are sufficiently different from the expected frequencies, then the null
hypothesis of no difference can be rejected (p < .05).
Performing the calculations involved with a Pearson chi-square analysis relevant to the
difference between two proportions is simple. Although this book does not emphasize the
computation of statistics via formulae, I believe useful insights can be gained by examining the
more simple statistical formulae and procedures. The Pearson chi-square statistic may be
represented with the following formula (Watch Video 4.3: Pearson Chi-Square Formula -
Explained):
$$\chi^2 = \sum \frac{(f_O - f_E)^2}{f_E} \qquad (1)$$
where f_O = observed frequencies and f_E = expected frequencies. The squared differences
between the observed and expected frequencies need to be summed (Σ), because they are
calculated across both possible observations (e.g., hit or miss, heads or tails, predicted or non-
predicted). In the Pepsi versus Coke example, the expected frequencies correspond to the
number of expected Pepsi and Coke guesses under the null hypothesis. Thus, as there were 10
trials of the taste test, one would expect 5 Pepsi and 5 Coke “correct” guesses under the null
hypothesis, because there is 50/50 chance of guessing correctly in the absence of any ability.
My ex-girlfriend guessed 7 correctly and 3 incorrectly. Thus, the observed frequencies were 7
and 3. In order to estimate the chi-square value, the following calculations were performed
(Watch Video 4.4: Pearson Chi-Square Calculations – Step-by-Step):

$$\chi^2 = \frac{(3 - 5)^2}{5} + \frac{(7 - 5)^2}{5} = \frac{4}{5} + \frac{4}{5} = 0.8 + 0.8 = 1.60$$
The solved Pearson chi-square formula yielded a chi-square value of 1.60. The reason
the Pearson chi-square test is known as a chi-square test is because it was demonstrated by
Karl Pearson that the values calculated from the formula above follow the theoretical chi-
square distribution (Watch Video 4.5: The Chi-Square Distribution - Explained). The chi-square
distribution is similar in nature to the z-distribution and the t-distribution. Theoretically, the
chi-square distribution represents the sampling variability of various statistical values under
the null hypothesis (i.e., chance variation). For example, values obtained from the Pearson chi-
square formula (1) are known to follow the chi-square distribution, when the data are
consistent with the absence of a difference in proportions in the population. Consequently,
the calculated Pearson chi-square value of 1.60 above can be placed within the chi-square
distribution to determine how unlikely it is, under the expectation that the null hypothesis is
true (i.e., that there is no difference in the proportions).
Just like the one-sample t-test, a Pearson chi-square analysis requires the identification
of degrees of freedom. The Pearson chi-square test with only two possible outcomes (e.g., hit
or miss, heads or tails, predicted or non-predicted) is associated with one degree of freedom.
Based on the theoretical chi-square distribution with one degree of freedom, it is known that a
chi-square value of 3.841 corresponds to the 95th percentile (i.e., p = .05). Thus, a calculated
Pearson chi-square value greater than 3.841 would imply a sufficiently unlikely event as to
suggest that it occurred beyond chance (i.e., p < .05). As the calculated Pearson chi-square
value of 1.60 was smaller than 3.841, the alternative hypothesis that my ex-girlfriend had the
ability to discriminate between Pepsi and Coke was not supported. To calculate the precise
probability associated with a Pearson chi-square value of 1.60 and 1 degree of freedom, the
analysis could be performed in SPSS, which I do next.

Pearson Chi-Square: Two Independent Proportions - SPSS


In order to calculate a Pearson chi-square analysis in SPSS, you can use the
Nonparametrics menu option (Data File: how_to_get_dumped) (Watch Video 4.6: Test the
difference between two proportions in SPSS). The first table outputted by SPSS is entitled
‘correct,’ which is the name of the variable in the SPSS data file. The observed frequencies
(‘Observed N’) corresponded to 3 incorrect responses (‘0’) and 7 correct responses (‘1’). The
expected frequencies (‘Expected N’) corresponded to 5.0 for both the incorrect and correct
responses. Basic arithmetic yields residuals of -2.0 and 2.0, respectively.


Next, a SPSS table entitled ‘Test Statistics’ includes the key statistical result, the chi-square
value. In this example, the chi-square was calculated at 1.60. With 1 degree of freedom, the p-
value was estimated at .206. Thus, as p = .206 is greater than p = .05, the null hypothesis of no
difference between the proportion of trials guessed correctly by my ex-girlfriend (.70) and the
null hypothesis expectation (.50) was not rejected (p > .05, or p = .206 more precisely). Stated
alternatively, there was an absence of statistical evidence to suggest that my ex-girlfriend had
the ability to distinguish between Pepsi and Coke.
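For those working outside SPSS, a minimal Python sketch (scipy assumed) reproduces both the chi-square value and the p-value:

```python
from scipy.stats import chisquare

# Observed: 3 incorrect, 7 correct; expected: 5 and 5 under the null
stat, p = chisquare(f_obs=[3, 7], f_exp=[5, 5])
print(stat, p)  # chi-square = 1.60, p ≈ .206
```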

Binomial Test Versus Pearson Chi-Square


A final note about the utility of the binomial test. The binomial test is uniquely useful
in the case of testing the probability of a series of events with no variation. That is, the
chances of observing 5 heads in a row, with no tails, can only be tested with the
binomial test. By contrast, the Pearson chi-square test requires at least some minimal level of
variability in the data.3 So, why ever use the Pearson chi-square test, if it has the limitation of
the requirement of some variability? As suggested earlier in the chapter, the Pearson chi-
square test has been expanded to accommodate more complicated questions than what can

3. In the Advanced Topics section of the chapter, I describe a syntax-based method that can be used in
SPSS, when one or more variables are constants in the context of the Pearson chi-square analysis and
the McNemar chi-square analysis.


be dealt with by the binomial test. Consequently, despite the requirement for at least some
variation in the data (almost always satisfied with real data), the Pearson chi-square statistic is
much more commonly used than the binomial test, even in the context of testing the
difference between two independent proportions. I suspect another reason the Pearson chi-
square test is more popular than the binomial test is because the Pearson chi-square test is
more powerful (i.e., greater chance to reject the null hypothesis). You may have noted that the
p-value associated with the binomial test was p = .344, whereas the p-value associated with the
Pearson chi-square test was p = .206. Because p = .206 was closer to .05 than .344, the Pearson
chi-square analysis was more likely to detect a 'statistically significant' effect.

2x2 Pearson Chi-Square: Test of Association


The Pearson chi-square analysis described above was applied to the simplest possible
hypothesis: test of the difference between two proportions. The Pearson chi-square analysis
can be extended to hypotheses which include two independent variables measured on a
dichotomous scale. Such an application of the Pearson chi-square formula is known as a 2x2
Pearson chi-square analysis.
Approximately 10% of the population is considered left-handed. Geschwind and Behan
(1982) observed in their clinical practice that there appeared to be a disproportionately large
percentage of dyslexics who were left-handed. Geschwind and Behan (1982) developed a
theory to account for this clinical observation. I have simulated some data to approximate
roughly the results reported by Geschwind and Behan (1982).4
The simulated sample consisted of 250 individuals. A total of 8.8% of the sample was
identified as left-handed. Furthermore, a total of 4.0% of the sample was identified as dyslexic.
Neither of these percentages are interesting, examined individually, with respect to the
Geschwind hypothesis. However, they are useful to examine, as they correspond
approximately to the percentages reported in previous investigations.
The crucial question is whether there was a disproportionate percentage of dyslexics
who were left-handed. The 2x2 Pearson chi-square is based on the same formula as that used
for the previous Pearson chi-square analysis which tested the difference between observed
and expected frequencies relevant to my ex-girlfriend’s capacity to distinguish Pepsi from
Coke.
$$\chi^2 = \sum \frac{(f_O - f_E)^2}{f_E} \qquad (1)$$

4. Geschwind and Behan (1982) used an extreme groups approach to their data collection
procedure, which was not replicated here in the simulated data.

However, in this case, the method used to calculate the expected frequencies is a little more
complicated, because two variables (handedness and dyslexia) need to be considered
simultaneously (Watch Video 4.7: 2x2 Pearson Chi-Square Calculations – Step-by-Step).
Conceptually, the main thing you need to know about expected and observed frequencies is
that the larger the discrepancy between the two types of frequencies, the greater chance the
null hypothesis of no association between the two independent variables will be rejected, all
other things equal.
By the way, the degrees of freedom associated with any Pearson χ² contingency table
analysis are equal to (r − 1)(c − 1), where r is equal to the number of rows, and c is equal to the
number of columns. Thus, in this 2 x 2 contingency table analysis, (r − 1)(c − 1) = (2 − 1)(2 − 1),
which is equal to 1. In the previous Pearson chi-square analysis based on the Pepsi vs. Coke
taste testing data, the degrees of freedom were also equal to r − 1 = 2 − 1 = 1.
Hypotheses

Null Hypothesis (H0): The percentage of dyslexics across left-handers and right-handers will be
equal.

Alternative Hypothesis (H1): The percentage of dyslexics across left-handers and right-handers
will be unequal.

2x2 Pearson Chi-Square: SPSS


In order to perform a Pearson chi-square analysis in SPSS, you can use the ‘Crosstabs’
utility (Data File: handedness_dyslexia) (Watch Video 4.8: 2x2 Pearson Chi-Square in SPSS).
The first table outputted by SPSS is entitled ‘Case Processing Summary’ which simply includes
information relevant to the sample size and missing values. The next SPSS table is the
‘Crosstabulation’ table, which includes the observed frequencies, the expected frequencies
and the observed percentages.


As can be seen in the table entitled 'handedness * dyslexia Crosstabulation', the
percentage of right-handers who were dyslexic was 2.2%. By contrast, the percentage of left-
handers who were dyslexic was 22.7%. If the null hypothesis were true, the two percentages
would be equal to each other, at least within sampling fluctuations. The numerical difference
between the two percentages is equal to 20.5%. The numerical difference of 20.5% needs to be
tested for statistical significance. The result of such a statistical significance test is reported in
the SPSS table entitled 'Chi-Square Tests'.

As can be seen in the SPSS table entitled ‘Chi-Square Tests’, the Pearson chi-square
value was estimated at 22.03, which was statistically significant, p < .001. Thus, a
disproportionately large percentage of dyslexics were left-handed. Stated alternatively, the
difference in the percentages (2.2% versus 22.7%) was statistically significant, p < .001.
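A sketch of the same analysis in Python follows; the cell counts are reconstructed from the percentages reported above (5 of 22 left-handers and 5 of 228 right-handers dyslexic), so treat them as an assumption on my part rather than the contents of the actual data file:

```python
from scipy.stats import chi2_contingency

# Rows = handedness (left, right); columns = dyslexia (yes, no)
table = [[5, 17],    # left-handed: 5/22 = 22.7% dyslexic
         [5, 223]]   # right-handed: 5/228 = 2.2% dyslexic
chi2, p, df, expected = chi2_contingency(table, correction=False)
print(chi2, p)  # ≈ 22.03, p < .001, matching the 'Chi-Square Tests' table
```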

2x2 Pearson Chi-Square: Measure of Association (phi)


In addition to differences between percentages, another way to think about the 2x2
Pearson chi-square analysis is that it is a measure of association. Specifically, the 2x2 Pearson
chi-square value can be converted into a measure of standardized association known as phi
(pronounced ‘feye’). Phi is known as standardized, because it can range in magnitude from .00
to 1.0. A value closer to 1.0 indicates a larger association between the two variables. A
commonly used formula for the estimation of phi is:
2 (2)
phi 
N 1

Thus, in the handedness and dyslexia example, phi worked out to .297.
$$phi = \sqrt{\frac{22.03}{250 - 1}} = .297$$


I discuss measures of standardized association in detail in chapter 5. I note here only briefly
that phi and Pearson correlation are identical (also, see Advanced Topics of this chapter).
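A quick check of formula (2) in Python, using the chi-square value from the sketch above:

```python
import math

chi2_val, N = 22.03, 250
phi = math.sqrt(chi2_val / (N - 1))  # formula (2); note the author's N - 1 denominator
print(round(phi, 3))  # .297
```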

Assumptions: Pearson Chi-Square


A Pearson chi-square analysis has some assumptions that need to be met, in order for
the results to be fully accurate. Unlike most other statistics, there are relatively few
assumptions associated with the Pearson chi-square analysis. In fact, there are only two
assumptions.

1: Random Sampling
All cases in the population must have an equal chance of being selected into the
sample. In practice, it is entirely unrealistic to run a study such that every person (or case) in
the population to which you wish to infer your results has an equal chance of being selected
into your sample. Consequently, in practice, this first assumption is virtually always violated.
The extent to which the violation of this assumption affects the accuracy of statistical results is
anyone’s guess. Ultimately, there is not much that can be done about it. The show must go on,
as they say.
In the handedness and dyslexia study, many of the participants were recruited from
the health centers that Dr. Geschwind worked at in Glasgow, Scotland. Thus, not everyone in
the adult population had an equal chance to be selected into the sample. Instead, the sample
used in the Geschwind and Behan (1982) study would be considered a convenience sample.
It’s impossible to know the extent to which this may have compromised the accuracy of the p-
value obtained from the Pearson chi-square analysis.

2: Independence of Observations
Independence of observations implies that the participants in the investigation have
not influenced each other with respect to the variables of interest. Typically, it is easy to satisfy
this assumption, if the participants complete the tasks, tests or questionnaires independently.
In the handedness and dyslexia study, it is very likely that this assumption was
satisfied. That is, all of the participants were tested individually. Furthermore, they were not
related to each other (not brothers/sisters, for example).

2x2 Pearson Chi-Square: Effect Size Guidelines


Cohen (1988; 1992) provided guidelines for interpreting the magnitude of the
difference between two percentages. Cohen’s guidelines were: small = 10%; medium = 24%;
and large = 38%. In the handedness and dyslexia example, the difference in the percentages
amounted to 20.5% (i.e., 22.7 – 2.2), which would be considered an approximately medium
sized effect. Cohen (1988; 1992) also provided guidelines for interpreting the magnitude of a

correlation. However, I believe these guidelines would be inappropriately applied to the context of the phi coefficient.

Yates Continuity Correction


It should be noted that the ‘Chi-Square Tests’ SPSS table included a row of results
labelled ‘Continuity Correction’. The Pearson chi-square value in that row was 17.01, which
was smaller than the Pearson chi-square value in the first row of the table (i.e., 22.03). Yates'
correction is an adjustment made to chi-square values obtained from 2 x 2 contingency table
analyses. More fully, it is known as the ‘Yates' correction for continuity’ and was first proposed
by Yates (1934). The logic of Yates' correction rests upon the fact that contingency table
analyses are based on dichotomous data. However, the statistical chi-square distribution is
continuous (rather than dichotomous). Consequently, an adjustment should be applied to
contingency table analyses, so as to obtain more accurate results. The correction consists of
subtracting .5 from each absolute difference between the observed and expected cell
frequencies.
The use of Yates’ correction is controversial. Despite the fact that the correction is
commonly observed in the literature, there is an appreciable amount of Monte-Carlo
simulation research which suggests that the Yates' correction is overly conservative, even in
small sample sizes, which suggests that it may not be necessary in practice (Camilli & Hopkins,
1978, 1979; Feinberg, 1980; Larntz, 1978; Thompson, 1988). One could also use the work of
Conover (1974) against the routine use of Yates’ correction, as the hypothesis tested by the
Yates’ corrected Pearson chi-square analysis may not be appropriate, if one or both of the
marginal totals are random, which is likely the case in most research scenarios. Finally,
Campbell (2007) found that the ‘regular’ Pearson chi-square value multiplied by (N-1)/N works
well for contingency table analyses with expected cell frequencies as low as 1. Unfortunately,
I’m not familiar with any statistical programs that apply the (N-1)/N adjusted version of the
Pearson chi-square analysis. However, one could simply multiply the regular Pearson chi-
square value by (N-1)/N and then identify the p-value associated with the adjusted Pearson
chi-square value via the chi-square distribution with df = 1. In the relatively rare case where
one or more of the expected cell frequencies are less than 1, it is recommended to default to
Fisher’s exact test (Campbell, 2007). For contingency tables larger than 2x2, the problem
identified by Yates does not apply under any circumstances (Watch Video 4.9: Fixed versus
Free Marginal Totals - Explained).
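A sketch of Campbell's (2007) suggestion in Python, applied to the handedness chi-square value from earlier (the input values are assumptions; any regular Pearson chi-square result could be substituted):

```python
from scipy.stats import chi2

# Multiply the regular Pearson chi-square by (N - 1)/N, then look up the
# p-value in the chi-square distribution with df = 1
chi2_raw, N = 22.03, 250
chi2_adj = chi2_raw * (N - 1) / N
p_adj = chi2.sf(chi2_adj, df=1)
print(round(chi2_adj, 2), p_adj)  # 21.94, p < .001
```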

Pearson Chi-Square: r x c Contingency Table Analysis


One of the most valued characteristics associated with a website experience, from the
point of view of companies like Google, is whether a person clicks on an ad or not.
Consequently, companies like Google conduct experiments to determine which types of ads

yield the highest percentage of clicks on ads. Tech companies often refer to these types of
studies as A/B testing. In practice, the studies involve the presentation of two or more slightly
different experiences on a particular webpage. Which version of the webpage is displayed
is chosen at random when the webpage visitor’s browser loads the webpage’s content. Then,
they conduct statistical analyses to determine whether there was a statistically significant
difference between the percentages of clicks on ads across the different versions of the
webpage. Although I would not expect Google to ever publish the results associated with their
particular studies, I have simulated some data to replicate my impressions of what they might
be expected to observe for a particular type of study. The study I have in mind consists of the
type of banner ad that appears on the right side of the screen on a Google dedicated webpage
(e.g., Google Finance). Should the ad be selected based on: (1) ads that are gaining popularity
generally across the internet; (2) the last item the user purchased online; or (3) the last few
search terms the user inputted into Google’s search engine? Thus, there
are three groups of ad selections: (1) ‘trending’, (2) ‘purchased’, and (3) ‘search’. The
dependent variable is the percentage of ads that are clicked on, known as the click-through-
rate (CTR).
Because there are three groups in this analysis, a 2x2 Pearson chi-square analysis
could not be applied. Instead, a larger Pearson chi-square analysis needs to be selected. As a
general term, all Pearson chi-square analyses relevant to the test of percentages are known as
contingency table analyses. A contingency table can include two variables with any number
of levels. They are known as r by c contingency tables, where r corresponds to the number of
rows, and c corresponds to the number of columns.
The 2 x 2 Pearson chi-square analysis is by far the most commonly observed
Pearson chi-square analysis in the literature. However, it is limited to two variables with a
maximum of only two levels (e.g., left/right, agree/disagree, correct/incorrect). Larger
contingency table analyses are known as ‘omnibus’ tests, because they include more than just
one statistical comparison, simultaneously.
In the current example, there are three possible comparisons: (1) trending versus
purchased; (2) trending versus search; and (3) purchased versus search. A Pearson chi-square
analysis based on such data would be referred to as a 3x2 Pearson chi-square analysis5,
because there are three levels in the grouping variable (trending, purchased, and search) and
two levels in the dependent variable (clicked versus not-clicked) (Data File: ad_types)

Hypotheses

Null Hypothesis (H0): The CTR (percentage) will be equal across all three ad selection options.

5. It is essentially arbitrary whether you refer to the analysis as a 2x3 or a 3x2 Pearson chi-square analysis.

Alternative Hypothesis (H1): The CTR (percentage) will not be equal across all three ad
selection options.

Pearson Chi-Square: r x c Contingency Table Analysis: SPSS


In order to calculate a Pearson chi-square analysis in SPSS, you can use the ‘Crosstabs’
utility in SPSS (Watch Video 4.10: r x c Pearson Chi-square SPSS). The first table outputted by
SPSS is entitled ‘Case Processing Summary’ which simply includes information relevant to the
sample size and missing values. It will be noted that the sample size was very large at 30,000.
As can be seen in the SPSS table entitled ‘ad_type * click Crosstabulation’, the Google Finance
webpage was viewed 10,000 times for each of the three ad types. It can also be seen that the CTR
percentage associated with the trending ads, the purchase ads, and the search ads were 1.9%,
2.6% and 2.1%, respectively. The key question with respect to the Pearson chi-square analysis
is whether these three percentages were statistically significantly different from each other.

As can be seen in the SPSS table entitled ‘Chi-Square Tests’, the null hypothesis of equal
percentages was rejected, 2 = 11.98, p = .003. However, because a Pearson chi-square
analysis larger than a 2x2 is an omnibus test, it is not known precisely which percentages were

C4.14
Gignac, G. E. (2019). How2statsbook (Online Edition 1). Perth, Australia: Author.
CHAPTER 4: CATEGORICAL DATA

statistically significantly different from the null hypothesis. It is only known that at least one
percentage is statistically significantly different.
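A Python sketch of the omnibus test follows. The cell counts are reconstructed from the rounded CTRs above, so they are an assumption; the exact (unrounded) counts in the data file would reproduce the reported chi-square of 11.98:

```python
from scipy.stats import chi2_contingency

# 3x2 table reconstructed from the rounded CTRs (10,000 views per ad type)
table = [[190, 9810],   # trending: 1.9% clicked
         [260, 9740],   # purchased: 2.6% clicked
         [210, 9790]]   # search: 2.1% clicked
chi2, p, df, expected = chi2_contingency(table, correction=False)
print(chi2, p, df)  # ≈ 12.1, p ≈ .002, df = 2 (close to the reported values)
```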

In order to uncover the nature of the statistically significant effect in more detail, further
analyses need to be performed. First, however, I cover the topic of effect size. Then, I discuss
follow-up analyses.

Measure of Effect Size: Cramer’s V


For contingency tables larger than 2x2, the phi coefficient cannot be used as an
interpretable estimate of effect size. Instead, researchers tend to use Cramer’s V. Cramer’s V is
a relatively unattractive estimate of effect size in its own right, as it is not interpretable as a
measure of association. However, to my knowledge, there are no especially superior
alternatives, in this context. Consequently, researchers tend to report Cramer’s V in the r x c
context. As can be seen in the SPSS table entitled ‘Symmetric Measures’, Cramer’s V was
estimated at .020. Cramer’s V ranges from .00 to 1.0. Larger values are indicative of a larger
effect.

It will be noted that phi was also estimated at .020. The nature of the Cramer’s V formula is
such that when either the r or the c portion of the contingency table is associated with only
two levels, the phi coefficient and the Cramer’s V coefficient will equal each other. However, I
do not believe that such an observation renders a Cramer’s V coefficient an interpretable
measure of association in the 2x3 contingency table case.
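For reference, Cramer's V can be computed directly from the chi-square value with the standard formula V = √(χ²/(N(k − 1))), where k is the smaller of r and c; a quick sketch using the reported values:

```python
import math

chi2_val, N, r, c = 11.98, 30000, 3, 2
V = math.sqrt(chi2_val / (N * (min(r, c) - 1)))
print(round(V, 3))  # ≈ .020, matching the 'Symmetric Measures' table
```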

r x c Contingency Table Analysis: Follow-Up Analyses


The challenge with omnibus Pearson chi-square analyses is that when a statistically
significant effect is observed, it must then be decomposed somehow, in order to understand
which percentages contributed to the statistically significant effect. In practice, most
researchers simply eye-ball the percentages to get an impression of where the statistically
significant effect(s) have occurred. Fortunately, there are more sophisticated methods. Sharpe
(2015) described four methods to decompose a statistically significant omnibus Pearson chi-
square analysis. In this section of the chapter I present my preferred method known as the
adjusted standardized residual analysis. In the Advanced Topics portion of this chapter, I
discuss another method known as partitioning.

Adjusted Standardized Residual Analysis


Perhaps the most attractive approach to the evaluation of a statistically significant
omnibus Pearson chi-square analysis is to conduct an adjusted standardized residual analysis
(McDonald & Gardner, 2000). Adjusted standardized residuals greater than |1.96| are
indicative of a statistically significant deviation between the expected and observed cell
frequencies, and, by consequence, the expected and observed cell proportions/percentages.

Adjusted Standardized Residual Analysis: SPSS


In order to conduct an adjusted standardized residual analysis in SPSS, you need to
request them within the ‘Cells’ utility (Watch Video 4.11: Pearson Chi-Square Residual Analysis
in SPSS). As can be seen in the SPSS table entitled ‘ad_type * click Crosstabulation’ an
additional row of results labeled ‘Adjusted Residual’ has been added to the table. Focus should
be placed along the ‘Yes’ column. It can be observed that the adjusted standardized residual
associated with the trending ads was estimated at -2.53. A negative residual, in this context,
implies that the observed percentage was smaller than expected, based on the null
hypothesis. Because the value of -2.53 is larger than |1.96|, it may be suggested that the
trending ad type CTR percentage of 1.9% was statistically significantly smaller than expected
under the null hypothesis. By contrast, the purchase ad type was associated with an adjusted
standardized residual of 3.31, which suggests that the CTR percentage of 2.6% was statistically
significantly larger than expected under the null hypothesis. Finally, the search ad type
percentage of 2.1% was not found to be statistically significantly different from expected under
the null hypothesis, as the adjusted standardized residual was less than |1.96| (i.e., -.80). In
summary, the statistically significant Pearson chi-square (χ² = 11.976, p = .003) was due largely
to the purchase ad type’s deviation between the expected frequency (percentage) and the
observed frequency (percentage). Thus, webpage visitors clicked on more ads relevant to their
latest online purchase history than was expected based on the null hypothesis of no difference
between the three ad types.
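The adjusted standardized residuals can also be computed by hand with the standard formula (O − E)/√(E(1 − row/N)(1 − col/N)). A sketch using the reconstructed counts from earlier (so the values differ slightly from SPSS's -2.53, 3.31, and -.80 because of rounding):

```python
import numpy as np

obs = np.array([[190, 9810],   # trending
                [260, 9740],   # purchased
                [210, 9790]])  # search
N = obs.sum()
row = obs.sum(axis=1, keepdims=True)  # row totals
col = obs.sum(axis=0, keepdims=True)  # column totals
exp = row * col / N                   # expected frequencies
adj = (obs - exp) / np.sqrt(exp * (1 - row / N) * (1 - col / N))
print(adj.round(2))  # 'clicked' column ≈ -2.50, 3.34, -0.83
```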


Adjusted Standardized Residual Analysis: Bonferroni Correction


The topic of familywise error rate is treated in detail in the chapter relevant to the
analysis of variance, because it is a relatively advanced topic. However, for those who wish to
deal with the problem of decomposing a statistically significant omnibus Pearson chi-square
analysis in a sophisticated manner, the issue of familywise error needs to be considered. I will
state here only briefly that as the number of statistical analyses on the same sample of data
increases, the chances of concluding erroneously that a statistically significant effect has been
observed also increases. The chances of concluding erroneously that a statistically significant
effect has been observed across a series of statistical analyses on the same sample of data is
known as the familywise error rate. Researchers are advised to keep the familywise error rate
at .05, which is the same as the alpha level for any particular statistical analysis. Alpha (α)
represents a specified probability. Specifically, it is the maximum probability deemed
acceptable with respect to making a wrong decision to reject the null hypothesis. By
convention, researchers and statisticians use .05 (or 5%) as the maximum acceptable
probability of making a wrong decision to reject the null hypothesis. The familywise error rate
represents the chances of committing one or more type I errors across a collection of
statistical analyses performed on the same sample of data.
Researchers have been encouraged to avoid increasing the familywise error rate
beyond .05. To do so, they need to protect themselves somehow. In the context of the Google
example, a total of four statistical analyses were evaluated: the omnibus Pearson chi-square


analysis, and the three adjusted standardized residuals. In practice, researchers typically
disregard the potential impact of the omnibus statistical analysis with respect to the impact on
the familywise error rate. Instead, they focus upon the number of follow-up analyses used to
help uncover the nature of the effect. In the ad type example, there were three adjusted
standardized residuals that were consulted and evaluated, where a z-value greater than |1.96|
was indicative of a p-value less than .05 (i.e., statistically significant).

Adjusted Standardized Residual Analysis: Bonferroni Correction - SPSS


In order to conduct a Bonferroni correction to an adjusted standardized residual
analysis in SPSS, the z-values of interest need to be placed into a column of data in SPSS.
Then, the z-values need to be converted into chi-square values. Then, p-values need to be
estimated for each of the chi-square values. Finally, the p-values need to be multiplied by the
number of adjusted standardized residuals that were consulted in the crosstabs table (Watch
Video 4.12: Bonferroni Chi-Square Residual Analysis in SPSS).
In order to protect the overall analysis from being associated with a familywise error
rate greater than .05, the p-values associated with the adjusted standardized residuals need to
be corrected with something known as a Bonferroni correction. In this case, the correction
involved multiplying the p-values by the number of p-values that were consulted (i.e., 3). As
can be seen in the Excel image below, the trending and the purchase CTR percentages remained
statistically significantly different from the expected percentages under the null hypothesis (p
= .034 and p = .003, respectively). Because the z-value was negative for the trending category,
the CTR percentage was lower than expected. By contrast, the purchase CTR percentage was
higher than expected.
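A sketch of the same arithmetic in Python (the z-values are taken from the residual analysis above; scipy assumed):

```python
from scipy.stats import chi2

z_values = [-2.53, 3.31, -0.80]  # trending, purchased, search
for z in z_values:
    p = chi2.sf(z ** 2, df=1)    # a squared z-value follows chi-square with df = 1
    p_bonf = min(p * 3, 1.0)     # Bonferroni: multiply by the number of residuals consulted
    print(round(p_bonf, 3))      # .034, .003, 1.0
```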
Strictly speaking, if the adjusted standardized residual approach with a Bonferroni
correction were applied, it would not be necessary to first conduct and evaluate the omnibus
Pearson chi-square analysis. Stated alternatively, the Bonferroni adjusted standardized
residual procedure does not require a statistically significant omnibus Pearson chi-square
statistic to be observed first. Instead, you can go straight into the adjusted standardized
residual analysis. I’d recommend you do so, given there is little to nothing to be gained by
examining Cramer’s V.

Additionally, I’ll note that I only applied the Bonferroni correction in the current
example under the pretense that three comparisons were made, even though the 2x3
contingency table included a total of six adjusted standardized residuals. In my opinion, the six

standardized residuals are not independent from each other. In fact, the left and right side of
the contingency table are a mirror image of each other. Therefore, it would be inappropriate
to adjust the p-value, based on six analyses. There were really only three analyses.

What is the Minimum Cell Frequency Required?


In order to obtain accurate Pearson chi-square results, Fisher (1925) suggested that no
cell within a contingency table should be associated with expected cell frequencies less than 5.
Fisher’s expected frequency rule has become very well-known and commonly recommended.
However, the simulation research suggests that Fisher’s rule is too conservative. For example,
Camilli and Hopkins (1979) found that contingency table analyses with expected cell
frequencies of between 1 and 2 were accurate, so long as the overall sample size was greater
than 10.

Differences Between Proportions/Percentages: Within-Subjects Designs


The Pearson chi-square analysis is appropriate for data derived from a between-
subjects design. With respect to between-subjects designs, each case in the sample is
measured only once on the dependent variable of interest. Thus, in the handedness example,
the dependent variable, dyslexia, was measured only once. However, there are occasions
where a hypothesis will be relevant to the differences between percentages/proportions
derived from a within-subjects design. In within-subjects designs, the dependent variable is
measured more than once. For example, a researcher may measure agreement (0 = disagree; 1
= agree) with a particular argument at the beginning of a debate and then again at the end of
the debate. In the context of testing the difference between dichotomously scored dependent
variables measured from a within-subjects design, there are two commonly used statistics:
McNemar chi-square and Cochran’s Q. McNemar’s chi-square can test the difference between
two related (within-subjects) proportions/percentages. Arguably, McNemar’s chi-square
should be only of historical interest, because Cochran’s Q can test the difference between two
or more related (within-subjects) proportions/percentages. However, because McNemar’s chi-
square remains so commonly observed in the literature, I will cover both McNemar’s chi-
square and Cochran’s Q, separately.

McNemar Chi-Square
Does the percentage of infants who cry at night change across the ages of 12 months
to 36 months? Presumably, the percentage decreases, right? Or perhaps it increases? Gaylor,
Goodlin-Jones and Anders (2001) were interested in examining this issue scientifically. To this
effect, they collected data on 33 children as they slept at night (video cameras) when they
were 12 months old, and then again when they were 36 months old. The children were coded


as ‘signalers’, if they cried and needed to be settled, or ‘self-soothers’, if they were able to
sooth themselves throughout the entire night. At 12 months, 16 of the children (48.5%) were
identified as self-soothers. At 36 months, 22 of the children (66.7%) were identified as self-
soothers. Thus, at least numerically, the percentage of self-soothing children increased by
18.2% (66.7 – 48.5). To test the numerical difference between 48.5% and 66.7% statistically,
one could use the McNemar chi-square statistic.

Hypotheses

Null Hypothesis (H0): There is no association between signalers at 12 months and signalers at
36 months.

Alternative Hypothesis (H1): There is an association between signalers at 12 months and signalers at 36 months.

McNemar (1947) derived a very simple formula for the test of the difference between
two percentages (or proportions) which relies exclusively upon the number of discordant pairs
in the observations. That is, it is based on the number of observations that switched in one
direction versus the other, relative to the total number of switchers.
$$\chi^2 = \frac{(b - c)^2}{b + c} \qquad (3)$$
The values obtained from formula (3) correspond to the chi-square distribution with 1
degree of freedom. In this example, the number of self-soothers at age 12 months that
changed into signalers at age 36 months was 2 (see Table C4.1). By contrast, the number of
signalers at age 12 months that changed into self-soothers at age 36 months was 8.

Table C4.1. Frequencies across Cells in the Contingency Table

                                Age 36 months
                      Self-Soothers   Signalers   Total
Age 12 months
   Self-Soothers         14 (a)          2 (b)      16
   Signalers              8 (c)          9 (d)      17
   Total                 22             11          33

Note. The letters a, b, c, and d correspond to the cell demarcations in the McNemar chi-square
formula.


Thus, applying formula (3) to the relevant data in Table C4.1 yielded a chi-square value of 3.60.
Is the chi-square value of 3.60 large enough within the context of the chi-square distribution to
declare it ‘statistically significant’?
$$\chi^2 = \frac{(2 - 8)^2}{2 + 8} = \frac{36}{10} = 3.60$$
It turns out that all McNemar chi-square statistic values that are larger than 3.84 are
statistically significant, p < .05. Thus, as the calculated McNemar chi-square value of 3.60 was
not greater than 3.84, the null hypothesis of equal percentages (or proportions) was not
rejected. Rather than calculate the McNemar chi-square by hand, it would be more efficient to
use software. Also, the precise p-value associated with a McNemar chi-square value can be
obtained from a computer program.

McNemar’s Chi-Square: SPSS


Despite the immense popularity of the McNemar test and the fact that SPSS has menu
options that suggest that the McNemar test can be performed, it is not necessarily possible to
conduct a McNemar test in SPSS (!). To prove it, use the Crosstabs utility in SPSS and select
‘McNemar’ as the option (Data File: signalers) (Watch Video 4.13: McNemar Chi-Square Test
in SPSS). As can be seen in the SPSS table entitled ‘Chi-Square Tests’, there is no chi-square
value to be seen. Instead, a p-value is reported. In the signaler/self-soother study, the p-value
was estimated at .109. Thus, the null hypothesis of no difference between the percentages of
self-soothers from 12 months (48.5%) to 36 months (66.7%) was not rejected. Stated
alternatively, the numerical difference of 18.2% (66.7 – 48.5) was not found to be statistically
significant.

It will be noted, however, that the subscript ‘a’ next to the p-value of .109 denoted
that the Binomial distribution was used. Thus, SPSS did not actually conduct a McNemar chi-
square analysis. Instead, when the number of discordant pairs is 10 or less, SPSS automatically
conducts the more conservative binomial test, instead of the McNemar test. In this context,
the binomial test is similar to the Yates correction applied to the Pearson chi-square analysis.
Unfortunately, both tests have been shown to be excessively conservative (Camilli & Hopkins,
1978, 1979; Conover, 1974; Fagerland, Lydersen, & Laake, 2013; Feinberg, 1980; Larntz, 1978;

C4.21
Gignac, G. E. (2019). How2statsbook (Online Edition 1). Perth, Australia: Author.
CHAPTER 4: CATEGORICAL DATA

Thompson, 1988). Consequently, I’m not convinced that the binomial test, or the Yates
correction, should be used.
Fortunately, there is a relatively easy solution to the problem: conduct a Cochran’s Q
analysis. Cochran’s Q is a generalized version of McNemar’s chi-square analysis. Thus, it can
test the difference between two or more within-subjects percentages/proportions. In the
signaler/self-soother study, there are only two percentages. To conduct a Cochran’s Q analysis,
you can use the Nonparametrics utility in SPSS (Watch 4.14: Cochran’s Q as a Substitute for
McNemar Chi-Square).

As can be seen in the SPSS table entitled ‘Test Statistics’, the sample size was 33 and the
Cochran’s Q value was estimated at 3.600, which is precisely the same chi-square value
calculated above with the McNemar chi-square formula. Thus, the Cochran’s Q value is really a
chi-square value. Furthermore, the p-value was estimated at .058, which suggests that the null
hypothesis was not rejected (as expected, as the chi-square value was not greater than 3.84).
Thus, there was an absence of statistically significant evidence to suggest that the percentage of self-soothers changed from 12 to 36 months.


Advanced Topics

Why do Researchers Use p < .05?


Many people ask why 5%, or p < .05, is the “magical” basis upon which researchers
tend to conclude an event is “beyond chance”. There is much debate about this. Some
contend that p = .05 corresponds to two standard deviations above/below the mean within
the normal distribution, which is true. However, I believe the reason p = .05 became widely
adopted is rooted more in human psychology. Specifically, p ≈ .05 is essentially consistent with people's intuition about a non-random occurrence. Stated alternatively, ordinary people begin to regard an event whose probability of occurring by chance falls somewhere between 10% and 1% as suspiciously unlikely to be a chance event.
In an experimental study, Cowles and Davis (1982) had 36 undergraduate university
students play a gambling game where they were given chips which could be redeemed for
money. The participants played a "shell" game in which the experimenter supposedly placed a red button under one of three cups, and the participant had to guess which cup had the button. Unbeknownst to the students, the experimenter never placed the red button under any of the cups.6 Theoretically, there was a 33% chance of winning on any trial. However, given the deception, the actual chance of winning was zero. Cowles and Davis (1982) found
that students, on average, expressed suspicion of an unlikely series of supposedly chance
events (i.e., losing every single time) at p = .098. Furthermore, the students quit playing, on
average, at p = .009. Consequently, it can be suggested that the participants were suspicious of
an unlikely chance event at p = .098 and convinced at p = .009. Therefore, based on Cowles
and Davis (1982), p = .05 appears to be subjectively reasonable to demarcate a sufficiently
unlikely chance event in the minds of adult humans.

Table C4.2. Results of Rigged Coin Flipping Study

Consecutive heads     1     2     3     4     5     6     7     8     9    10
Prob.              .500  .250  .125  .063  .031  .016  .008  .004  .002  .001

Note. Each column represents one additional consecutive 'Heads' outcome; the probability row gives the chance of such a run occurring with a fair coin.

6. Never trust an experimental psychologist.

To help understand the results reported by Cowles and Davis (1982) in a different way, I have reported a probability table under the scenario that a person plays a coin flipping game (heads or tails) with the odds of winning at, theoretically, .50 on any trial (see Table C4.2). However, if the coin were "fixed" such that the participant were to lose every time, it would be predicted that participants would express suspicion of a possible non-chance event somewhere between three and four losses in a row (p = .125 to p = .063), and that they would be convinced of a non-chance event after seven losses in a row (i.e., p = .008).
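The probabilities in Table C4.2 are nothing more than .5 raised to the power of the number of consecutive flips, which can be confirmed with a two-line Python sketch (my illustration):

# Chance of losing k fair coin flips in a row (the values in Table C4.2)
for k in range(1, 11):
    print(k, 0.5 ** k)  # e.g., 3 -> 0.125; 7 -> 0.0078125 (i.e., .008)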

Is phi Just a Pearson Correlation (r)? Yes.


This portion of the Advanced Topics assumes that you have knowledge of a Pearson
correlation, already. The very commonly applied phi formula necessarily implies that the
estimate will always be positive in direction. However, I believe information is lost by virtue of
this fact, because the well-established Pearson r, another measure of standardized association,
can either be positive or negative in value (see chapter 5). As it turns out, Pearson’s r and phi
are identical, with the exception that Pearson’s r can either be positive or negative in
direction.7 Stated alternatively, Pearson’s r can range in value from -1.0 to 1.0. Again, a value
closer to |1.0| implies a larger association between the two variables. Although Pearson's correlation is described in detail in another chapter, it will be pointed out here that Pearson's r for the handedness and dyslexia study was estimated at .297 (Watch Video 4.15: Phi and Pearson's r are the same?). Thus, larger values of handedness (i.e., left-handedness = 1; right-handedness = 0) were associated with larger values of dyslexia (i.e., dyslexia = 1; no dyslexia = 0).
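The equivalence is easy to confirm for yourself. The following Python sketch (my illustration; the cell counts a = 5, b = 17, c = 5, and d = 223 are those from the handedness and dyslexia example) computes phi from the 2x2 cell counts and Pearson's r from the same data expanded into 0/1 scores:

import numpy as np

a, b, c, d = 5, 17, 5, 223  # left/dyslexia, left/none, right/dyslexia, right/none

# phi from the 2x2 cell counts
phi = (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Pearson's r on the same data expressed as 0/1 scores per participant
hand = np.array([1] * (a + b) + [0] * (c + d))              # 1 = left-handed
dyslexia = np.array([1] * a + [0] * b + [1] * c + [0] * d)  # 1 = dyslexia
r = np.corrcoef(hand, dyslexia)[0, 1]

print(round(phi, 3), round(r, 3))  # both = .297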
Although Pearson's r and phi have a direct correspondence, it should be acknowledged that the maximum possible value of Pearson's r and phi is only theoretically 1.0. In practice, the maximum possible value of phi is often well below 1.0. For this reason, some researchers calculate 'adjusted phi'.

7. Pearson's r and phi also have slightly different standard errors, so they will not be associated with exactly the same p-value. I believe it has not yet been established which p-value is more accurate.


Adjusted phi
As demonstrated by Breaugh (2003), as the row and column marginal proportions
differ in magnitude, a very likely scenario in practice, the maximum possible phi values will
become increasingly less than 1.0. Breaugh (2003) provided a realistic example where the
estimated phi between two dichotomously scored variables was .20, which would suggest a
relatively small effect. However, the maximum possible phi value was .33, which would
suggest that the observed .20 was large. For this reason, some researchers report a statistic
known as adjusted phi, which is the ratio of the observed phi to the maximum phi:

$$\phi_{adj} = \frac{\phi_{obs}}{\phi_{max}} \qquad (4)$$

Phi max can be estimated with the following formula:

$$\phi_{max} = \sqrt{\frac{p_1 - p_1 p_2}{p_2 - p_2 p_1}} \qquad (5)$$

where p1 = the row or column marginal proportion in the 2x2 table with the lowest value, and p2 = the row or column marginal proportion with the second lowest value. In the handedness and dyslexia example, the two relevant marginal proportions were .040 (dyslexia) and .088 (left-handedness); consequently, the maximum phi was .656, based on the following phi max solution:

$$\phi_{max} = \sqrt{\frac{.040 - (.040 \times .088)}{.088 - (.088 \times .040)}} = .656$$
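As a quick check, formulas (4) and (5) can be run in a few lines of Python (my illustration; phi_obs and the marginal proportions are taken from the example above):

import math

phi_obs = .297         # observed phi from the handedness/dyslexia example
p1, p2 = .040, .088    # lowest and second-lowest marginal proportions

phi_max = math.sqrt((p1 - p1 * p2) / (p2 - p2 * p1))  # = .66
phi_adj = phi_obs / phi_max                           # = .45
print(round(phi_max, 2), round(phi_adj, 2))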
In my opinion, trying to express the effect size associated with a contingency table
analysis from the perspective of a coefficient such as phi (or adjustments thereof) is not
particularly useful. Instead, I believe the most intuitive and useful approach to evaluating the
effect size associated with a 2x2 Pearson chi-square analysis is to simply report the differences
between the corresponding proportions or percentages. Additionally, the difference in the key
cell percentages is not affected by the magnitude of the difference in the marginal
percentages, unlike phi. With respect to the handedness and dyslexia example, 2.2% of the
right-handers had dyslexia, whereas 22.7% of the left-handers had dyslexia. Scientists and
laypeople alike would appreciate the magnitude of the difference expressed in such simple
terms.

Is a One-Tailed Pearson Chi-square Analysis Possible?


The short answer to this question is, 'Yes.' Whether it is justifiable or not is a different and important question. There are references that state that it is possible to conduct a one-
tailed Pearson chi-square test (e.g., Ferguson, 1976, p. 204). The logic rests on the fact that the
chi-square distribution with 1 degree of freedom is simply the z-distribution squared. For
example, a Pearson chi-square analysis with 1 degree of freedom and alpha specified at .05 is associated with a critical chi-square value of 3.84. Furthermore, 3.84 square rooted equals 1.96. Not coincidentally, ±1.96 in the z-distribution demarcates 95% of the area under the normal curve, with the remaining 5% of the area split evenly between the two tails (2.5% in each). Stated alternatively, a z-value of ±1.96 corresponds to a two-tailed alpha of .05. Thus, if 1.96 is a two-tailed critical value in the context of the z-distribution, then 3.84 should be considered a two-tailed critical value with alpha = .05. In order to conduct a one-tailed test with the z-
distribution (alpha = .05), one could use 1.64 as the critical value. With respect to SPSS, there is
no specific option to conduct a one-tailed Pearson chi-square analysis. However, one could
simply conduct the Pearson chi-square analysis as per normal and then divide the reported p-
value in half. If it is less than .05, then one could declare a statistically significant effect.
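Both facts are easy to verify with a short Python sketch (my illustration; the two-tailed p-value of .08 is a hypothetical example, and the scipy package is assumed):

from scipy.stats import chi2, norm

print(chi2.ppf(.95, df=1))  # 3.8415: the df = 1 chi-square critical value
print(norm.ppf(.975) ** 2)  # 1.96 squared = 3.8415

# Halving a reported two-tailed p-value for a directional 2x2 test
p_two_tailed = .08          # hypothetical SPSS output
print(p_two_tailed / 2)     # .04, i.e., statistically significant one-tailed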
One-tailed Pearson chi-square testing is possible only in designs as small as the 2x2. In contingency table designs with more than two levels on either variable, the logic of a one-tailed test breaks down completely; it would be tantamount to conducting a one-tailed ANOVA (described in another chapter). Ultimately, it is not possible to specify the direction of an effect across three or more levels.

Odds Ratios
Another method to describe a statistically significant 2x2 Pearson chi-square result is to express the effect as an odds ratio. When there is a complete absence of an association between the two variables, the odds ratio will equal 1.0. Thus, as the odds ratio deviates from 1.0 (either lower or higher), the magnitude of the effect becomes larger. An odds ratio greater than 1.0 implies a greater chance of Y given an event X. An odds ratio less than 1.0 implies a lesser chance of Y given an event X. An intuitive formula for the calculation of an odds ratio is:

$$OR = \frac{a/b}{c/d} \qquad (6)$$
where a, b, c, and d refer to the frequencies associated with each of the cells within the
contingency table analysis. What exactly constitutes a, b, c, and d is to some degree arbitrary,
in the sense that a and c can certainly be interchanged, so long as b and d are also
interchanged. In most cases, you will likely want to speak of the presence of the outcome in
one group relative to the other. Thus, in the handedness and dyslexia example, it makes most
sense to me to speak of the presence of dyslexia in left-handers relative to right-handers.
Thus, the observed frequencies were a = 5, b = 17, c = 5 and d = 223.


Thus, the odds ratio (OR) was estimated at:

$$OR = \frac{5/17}{5/223} = \frac{.294118}{.022422} = 13.12$$

The value of 13.12 suggests that left-handers were 13.12 times more likely to be diagnosed with dyslexia than right-handers. It is possible to calculate confidence intervals for the odds ratio point estimate of 13.12. First, the OR needs to be transformed onto the natural log scale. Such a procedure can be performed with a calculator or a stats program. In this case, the natural log of 13.12 equals 2.574. Next, the standard error (se) for the transformed (ln) OR needs to be calculated with the following formula:

$$\ln(OR)_{se} = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} \qquad (7)$$

where a, b, c, and d correspond to the values reported in the SPSS Crosstabulation table. Thus, in this example, the ln(OR)se corresponds to .6807:

$$\ln(OR)_{se} = \sqrt{\frac{1}{5} + \frac{1}{17} + \frac{1}{5} + \frac{1}{223}} = \sqrt{.2000 + .0588 + .2000 + .0045} = .6807$$

Next, the ln(OR)se value needs to be multiplied by 1.96, in order for it to represent 95% of the z-distribution. Thus, .6807 * 1.96 = 1.3342. The next to last step involves subtracting and adding the value of 1.3342 from/to the ln(OR) of 2.574:

95%CI Lower-Bound = 2.574 - 1.3342 = 1.2398
95%CI Upper-Bound = 2.574 + 1.3342 = 3.9082

Finally, the values of 1.2398 and 3.9082 need to be rescaled back into the original OR scale via the exponential function. In this case, that worked out to 3.46 and 49.80. Thus, the 95% confidence intervals associated with the OR point-estimate of 13.12 were 3.46 (lower-bound) and 49.80 (upper-bound). These results suggest that left-handers may be as much as 49.80 times more likely to be diagnosed with dyslexia.
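The entire hand calculation can be condensed into a short Python sketch (my illustration; only the standard math module is required):

import math

a, b, c, d = 5, 17, 5, 223

odds_ratio = (a / b) / (c / d)                      # = 13.12
se = math.sqrt(1/a + 1/b + 1/c + 1/d)               # = .6807
lower = math.exp(math.log(odds_ratio) - 1.96 * se)  # = 3.46
upper = math.exp(math.log(odds_ratio) + 1.96 * se)  # = 49.80
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))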

Odds Ratios: SPSS
In order to conduct an odds ratio analysis in SPSS, you can use the Crosstabs menu utility (Watch Video 4.16: Odds/Ratios in SPSS).


As can be seen in the SPSS table entitled 'Risk Estimate', the odds ratio (OR) was estimated at 13.118. Thus, left-handers were 13.12 times more likely to be diagnosed with dyslexia than right-handers. Furthermore, the 95% confidence intervals corresponded to 3.455 and 49.801, matching the hand calculation above within rounding. Thus, if this study were conducted a large number of times with new samples, we would expect 95% of the confidence intervals constructed in this manner to capture the population odds ratio. Given the relatively large sample of 250 participants, it may be surprising to observe such a wide confidence interval.

Problems with Interpreting Odds Ratios


Odds ratios are particularly susceptible to producing distorted representations of results; consequently, they should be used with caution. Altman, Deeks, and Sackett (1998) reported the results associated with a real study where the odds ratio was estimated at 88. However, the value of 88 was arguably a gross distortion of the results. By contrast, the relative risk was estimated at 7.2. As a general statement, Altman et al. (1998) expressed a strong preference for reporting results via relative risk ratios, especially when there is a large discrepancy between the odds ratio and the relative risk ratio estimates.

Relative Risk
Relative risk is an alternative approach to the odds ratio for the interpretation of an effect associated with a 2x2 Pearson chi-square analysis. It represents the ratio of the
observed condition percentages of interest across the two groups. It is easy to get confused
here, because the manner in which the data are entered into the formula or stats program will
affect the relative risk value that is estimated. Consequently, it is important for you to know
the manner in which you want to present the results. In the handedness and dyslexia example
(described in the Foundations section of this chapter), my preference would be to report the
relative risk as the percentage of people diagnosed with dyslexia who were left-handed
relative to the percentage of people diagnosed with dyslexia who were right-handed.
Numerically, my preference works out to the following: 22.7 / 2.2 = 10.32. Thus, the relative
risk of a dyslexia diagnosis, if a person were left-handed, rather than right-handed, was 10.32.


The relative risk value of 10.32 was comparable to the odds ratio of 13.12. In cases
where there is a fairly substantial difference between the odds ratio and the relative risk ratio,
you will almost always be better off reporting the relative risk ratio (Altman et al., 1998).

Relative Risk: SPSS


In order to conduct a relative risk analysis in SPSS, you can use the Crosstabs menu
utility (Watch Video 4.17: Relative Risk Analysis in SPSS). As can be seen in the SPSS table
entitled ‘Risk Estimate’, SPSS reported two relative risk values. One ‘For cohort dyslexia = no’,
which was estimated at 1.266, and one ‘For cohort dyslexia = yes’, which was estimated at
.096. Where did SPSS get those values, given that I estimated the relative risk at 10.32?

SPSS simply and automatically divides the top row percentage by the corresponding
bottom row percentage. SPSS does so twice: once for the left-side column of results and once
for the right-side column of results. Thus, as can be seen in the SPSS table entitled
‘handedness * dyslexia Crosstabulation’, the relative risk of 1.266 was obtained by the
following ratio: 97.8 / 77.3. Furthermore, the relative risk of .096 was obtained by the
following ratio: 2.2 / 22.7. In my opinion, neither of these two relative risk ratios are
particularly intuitive. Fortunately, as a researcher, you have the option to divide the
corresponding percentages as you see fit. Thus, in this case, my preference is to divide 22.7% by 2.2% to yield a relative risk ratio of 10.32. Doing so would allow me to say that left-handers are 10.32 times more at risk of being diagnosed with dyslexia than right-handers. The take-home message here is that you cannot rely upon SPSS to necessarily report the most intuitive relative risk ratio for any particular analysis. Instead, you need to think about how best to report the relative risk ratio, and possibly calculate it yourself.


The limitation with calculating the relative risk yourself, if SPSS does not report your
preferred risk ratio, is that you will not get the 95% confidence intervals. Consequently, you
cannot report the relative risk as statistically significant, nor can you report the level of
confidence associated with the estimated relative risk point estimate.
Fortunately, there is a relatively easy solution to the problem. You can simply recode
one of the variables, so that the relevant percentages are ordered in the contingency table in
such a way as to yield the desired relative risk ratio. In the handedness and dyslexia example, I
recoded the handedness variable such that left-handers were coded ‘0’ and right-handers
were coded ‘1’. When I re-ran the analysis with the recoded handedness variable and the
original dyslexia variable, I obtained exactly the same Pearson chi-square results. Additionally,
I obtained the relative risk ratios described below.

As can be seen in the SPSS table entitled 'Risk Estimate', the 'For cohort dyslexia = yes' row includes a relative risk ratio of 10.364, which matches the relative risk ratio of 10.32 I calculated by hand above (the small difference reflects the rounding of the percentages). Thus, left-handers are 10.36 times more likely to be diagnosed with dyslexia than right-handers. Furthermore, the relative risk ratio estimate of 10.36 was statistically significant, because the 95% confidence intervals did not intersect with 1.0. Specifically, the 95% confidence intervals corresponded to 3.25 and 33.05. Unfortunately, SPSS does not report the p-values associated with the estimated relative risk point-estimates.
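If you do need a confidence interval for a relative risk that SPSS will not produce directly, it can be computed by hand via the conventional log-scale standard error, as in the following Python sketch (my illustration; it reproduces the SPSS values reported above):

import math

a, b = 5, 17    # left-handers: dyslexia yes, dyslexia no
c, d = 5, 223   # right-handers: dyslexia yes, dyslexia no

rr = (a / (a + b)) / (c / (c + d))                 # = 10.364
se = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))  # log-scale standard error
lower = math.exp(math.log(rr) - 1.96 * se)         # = 3.25
upper = math.exp(math.log(rr) + 1.96 * se)         # = 33.05
print(round(rr, 3), round(lower, 2), round(upper, 2))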


How to Interpret Odds Ratios and Risk Ratios Less than 1


When the null hypothesis is met perfectly, odds ratios and relative risk ratios will be
estimated at precisely 1.0. Thus, a value of 1.0 implies the total absence of an effect between
the two variables. It is my impression that odds ratios and risk ratios are best reported and
interpreted as values larger than 1.0. Consequently, I recommend that the corresponding
ratios be created such that they will yield a value greater than 1.0, rather than less than 1.0.
However, if the example you are dealing with is best interpreted via ratios less than 1.0, then by all means do so. In the handedness and dyslexia example, the original relative risk ratio was estimated at .096, which implies that the risk of being diagnosed with dyslexia for right-handers is about 10% of the risk for left-handers. Again, I would prefer to say that the risk of being diagnosed with dyslexia is 10.32 times greater for left-handers than for right-handers.

2 x 2 Pearson Chi-Square: Interactions?


Occasionally, I come across a paper that states that an interaction was tested with a
2x2 Pearson chi-square analysis. For example, Rottenstreich and Hsee (2001) tested the
hypothesis that a person’s preference for winning a kiss from their favorite movie star versus
$50 would interact with the probability of winning. In one group, the participants were told
they were guaranteed to win either the kiss or the $50. In the other group, the participants
were told that there was a 1% chance that they could win either of the prizes. The participants
were asked to specify their preferences. In the certainty group, 70% of the participants
reported that they preferred the cash over the kiss. By contrast, in the low-probability group,
65% of the participants preferred the kiss over the cash. Rottenstreich and Hsee (2001) reported that the interaction hypothesis was supported, as the Pearson chi-square was statistically significant, χ2(1, N = 40), p = .027. Additionally, Rottenstreich and Hsee (2001) produced a graphical representation of the results similar to that reported in Figure C4.1 to support the notion that there was an interaction.
In my opinion, Rottenstreich and Hsee (2001) did not demonstrate an interaction. Instead,
they simply demonstrated that there was an association between certainty and preference for
a particular reward. Specifically, higher levels of certainty were associated with relatively less
preference for a less well-known reward (i.e., kiss from movie star). Correspondingly, the phi
coefficient associated with the analysis was -.35, p = .027. Ultimately, it is arbitrary which
values are allocated to which responses. Had a kiss with a movie star been coded a 2 (rather
than 1), the phi coefficient would have been positive in direction, which would not have been
suggestive of an “interaction”. As I describe in further detail in another chapter (see Chapter 9
or 11), an interaction occurs when the effect of an independent variable on a dependent
variable depends on the levels of another independent variable. A 2x2 Pearson chi-square analysis includes only two variables; consequently, the demonstration of an interaction is impossible.

Figure C4.1. Plot of Percentages Suggestive of an Interaction??

Partitioning
The partitioning approach to the decomposition of a statistically significant omnibus
Pearson chi-square analysis involves conducting a series of 2x2 Pearson chi-square follow-up
analyses. In the context of the webpage and ad clicking study, there are three possible 2x2
Pearson chi-square analyses: (1) trending versus purchase; (2) trending versus search; and (3) purchase versus search.

Partitioning: SPSS
In order to conduct a series of partitioning Pearson chi-square analyses, it is necessary
to select the groups for comparisons. For example, in order to restrict the analysis to a 2x2
Pearson chi-square analysis of the trending versus purchase percentages, those two groups
need to be selected by SPSS, while the search group is excluded. In order to select groups in
SPSS, the Select Cases menu utility can be used.
Each time the appropriate groups/cases are selected, the 2x2 Pearson chi-square
analysis can be conducted and interpreted. The results associated with the three 2x2 Pearson
chi-square analyses are reported in Table C4.3. It can be observed that the trending vs.
purchase (p = .001) and the purchase versus search (p = .022) 2x2 Pearson chi-square analyses
were statistically significant. However, after the application of a Bonferroni correction, only the trending versus purchase 2x2 Pearson chi-square analysis was statistically significant, p = .003 (Watch Video 4.18: Pearson Chi-Square Partitioning - SPSS).

Table C4.3: 2x2 Pearson Chi-square Partitioning Results


Comparison Percentages χ2 p p’ phi
Trending vs. Purchase 1.9% vs. 2.6% 11.14 .001 .003 .024
Trending vs. Search 1.9% vs. 2.1% 1.12 .289 .867 .007
Purchase vs. Search 2.6% vs. 2.1% 5.22 .022 .066 -.016
Note. N = 20,000 for each analysis; p’ = Bonferroni corrected
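The Bonferroni-corrected p-values (p') in Table C4.3 were obtained by multiplying each p-value by the number of follow-up comparisons (three, in this case), capping the result at 1.0. A minimal Python sketch (my illustration):

# Bonferroni correction of the three partitioned p-values in Table C4.3
p_values = {"Trending vs. Purchase": .001,
            "Trending vs. Search": .289,
            "Purchase vs. Search": .022}

k = len(p_values)  # three follow-up comparisons
for label, p in p_values.items():
    p_adj = min(p * k, 1.0)
    print(label, p_adj, p_adj < .05)  # only .003 survives the correction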

It should be kept in mind that the result associated with the omnibus 2x3 Pearson chi-square analysis does not correspond exactly to a series of three 2x2 Pearson chi-square analyses. For example, the sum of the three 2x2 Pearson chi-square values will not equal the
2x3 Pearson chi-square value. The reason is that the expected cell frequencies are not the
same across the 2x3 contingency table and the corresponding 2x2 contingency tables. Thus, it
is conceivable that one could obtain a statistically significant 2x3 Pearson chi-square analysis
and no statistically significant 2x2 Pearson chi-square analyses (or vice versa).
Ultimately, the main problem with the Pearson chi-square approach to testing the
difference between percentages is that the expected cell frequencies change from analysis to
analysis, depending on which variables are included in the analysis. By contrast, the within-row
percentages remain the same. For example, the trending ad type CTR percentage was 1.90%
across all Pearson chi-square analyses which included the trending ad type variable. By
contrast, the trending ad type adjusted standardized residual was -2.53 in the 2x3 Pearson chi-
square analysis and -1.06 in the trending by search 2x2 Pearson chi-square analysis.

Pearson’s Chi-Square Versus McNemar Chi-Square


If you read the Gaylor, Goodlin-Jones, and Anders (2001) study on 'signallers' and 'self-soothers', you will find that they conducted a 2x2 Pearson chi-square analysis on their data, which yielded χ2 = 6.07, p = .014. Thus, the null hypothesis was rejected. But what null hypothesis? That is an important question. As I described above, the 2x2 Pearson chi-square analysis is essentially a measure of association, given its direct correspondence with phi. Thus, the null hypothesis in the context of the Pearson chi-square analysis was, "There will not be an association between signalers at age 12 months (time 1) and signalers at age 36 months (time 2)."
Naturally, one would expect an association between children who were signalers at age 12
months and signalers at age 36 months. That is, children who were signalers at 12 months
would naturally be more likely to be signalers at 36 months, in comparison to children who
were already self-soothers at age 12 months. The phi associated with the 2x2 Pearson chi-square analysis was estimated at .429, which suggests that there was some correspondence (association) across time. However, such an observation is distinct from the within-subjects null hypothesis that the percentage of signalers will not change from 12 to 36 months of age. Hypotheses relevant to change cannot be tested via Pearson's chi-square. Only the McNemar chi-square statistic (or Cochran's Q) is appropriate for such questions.

Mid-p McNemar Test


Some of you may be worried about following my recommendation to totally disregard guidelines pertaining to expected cell frequencies and/or the number of discordant pairs with respect to contingency table analyses such as the McNemar test. For those uncomfortable with my recommendation, but who may still feel that the binomial exact test is too conservative, there is evidence to support an approach somewhere in the middle. It is called the mid-p McNemar test (Fagerland, Lydersen, & Laake, 2013). A demonstration of how to run the analysis through the Crosstabs utility in SPSS is forthcoming.
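In the meantime, the calculation itself is simple enough to sketch in Python, based on my reading of Fagerland et al. (2013): the mid-p value equals the exact two-sided binomial p-value minus the point probability of the observed discordant count (b = 2 and c = 8 are the discordant frequencies from Table C4.1; the scipy package is assumed):

from scipy.stats import binom

b, c = 2, 8
n = b + c  # total discordant pairs

# Exact (binomial) two-sided p-value, as reported by SPSS: .109
p_exact = 2 * binom.cdf(min(b, c), n, 0.5)

# Mid-p: subtract the point probability of the observed count once
p_mid = p_exact - binom.pmf(min(b, c), n, 0.5)
print(round(p_exact, 3), round(p_mid, 3))  # .109 and .065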


Dealing with Constant Values


As a default, the Pearson chi-square analysis will not be executed by SPSS if there is no variability associated with one or more of the variables. However, this problem can be overcome if you specify SPSS to conduct the analysis in integer mode. From a practical perspective, this means that you have to run the analysis via syntax and add a bit of information to the standard syntax, as can be seen below. In integer mode, SPSS conducts the McNemar test as a binomial test.

Standard Syntax for a 2x2 Pearson Chi-Square:

CROSSTABS
  /TABLES=v1 BY v2
  /FORMAT=AVALUE TABLES
  /STATISTICS=MCNEMAR
  /CELLS=COUNT EXPECTED
  /COUNT ROUND CELL
  /METHOD=EXACT TIMER(5).

Modified Syntax for a 2x2 Pearson Chi-Square (integer mode):

CROSSTABS variables = v1 v2 (1,2)
  /TABLES=v1 BY v2
  /FORMAT=AVALUE TABLES
  /STATISTICS=MCNEMAR
  /CELLS=COUNT EXPECTED
  /COUNT ROUND CELL
  /METHOD=EXACT TIMER(5).


Practice Questions
1: My ex-girlfriend - revisited
Had I been a supportive boyfriend, I would have given my ex-girlfriend more than 10
trials to prove her capacity to distinguish Pepsi from Coke. That is, based on the 10 trials, she
achieved a 70% success rate. True, that was not found to be statistically significantly different
from 50% (p = .206). However, had she been able to keep a 70% success rate across, say, 30
trials, it is possible that the p-value would have come in less than .05. Unfortunately for me,
back then, I was more concerned with being right, than being supportive – and I paid the price!
Test the hypothesis that a 70% success rate would have been found to be statistically significant based on 30 taste-testing trials (Data File: how_to_get_kissed) (Watch Video 4.P1: How to Get Kissed).
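If you would like to preview the answer before opening the data file, an exact binomial test in Python (my illustration; it assumes the 70% success rate holds exactly, i.e., 21 correct out of 30, and that scipy is installed) gives the flavor of the result:

from scipy.stats import binomtest

result = binomtest(21, n=30, p=0.5)  # 21 of 30 correct versus 50% chance
print(result.pvalue)  # = .043, under the conventional .05 threshold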

2: Do fake apologies help people like you more?


I love it when psychologists get into the real world to do research! Brooks, Dai, and Schweitzer (2014) were interested in determining the extent to which offering a fake apology to a stranger increased the chances that the stranger would be more generous in response to a request. In the last of their studies, Brooks et al. (2014) hired an actor to approach 65 strangers at a train station, one at a time. It rained a moderate amount on the day they chose to conduct the experiment. On half of the occasions, the actor said to the stranger, "I'm so sorry about the rain! Can I borrow your cell phone?" On the other half, the actor simply said, "Can I borrow your cell phone?" Test the null hypothesis that the percentage of people who lent their cell phone was equal across the two conditions: 'apology' versus 'no apology'. (Data File: apology_cell_phone) (Watch Video 4.P2: Apologize & Borrow a Cell Phone).

3: Q: “How does your lower-back feel?” A: “Partly cloudy with a 40% chance of an afternoon
storm.”
Some people who suffer from lower-back pain (and osteoarthritis) insist that there is a connection between changes in the weather and the pain they feel in their body, so much so that they claim they can predict the weather. Beilken, Hancock, Maher, Li, and Steffens (2016) investigated this supposed connection in a sample of 981 individuals suffering from periodic, acute lower-back pain. Beilken et al. (2016) had the participants complete a daily diary to specify when they experienced the lower-back pain. Beilken et al. (2016) also matched the back-pain events with the meteorological measurements for that day/time. Test the hypothesis that there was an association between feeling back pain and a change in barometric pressure in the atmosphere. (Data File: weather_back_pain) (Watch Video 4.P3: Can a Person's Lower Back Predict the Weather?)


4: Maternal fear versus happy expressions and infant behavior


In this example, I take things up a notch. I describe the results associated with a classic study in psychology. Your challenge is to create the data file that yields the results reported by the researchers. The classic study is the visual cliff and social referencing study (Sorce, Emde, Campos, & Klinnert, 1985). The young children (12 months old) were placed on a table with an optical effect such that the child could not make out whether the visual cliff was an illusion or not. The mothers were on the other side of the table. The mothers smiled clearly until the infant reached the center of the table. Then, the mothers shifted to a previously trained happy or fearful facial pose. What did the infants do?
First, create the data file that corresponds to the results reported in the study (quoted
below). Additionally, test the null hypothesis that the percentage of children who crossed the
visual cliff was equal between the happy and fear conditions. You should obtain the same chi-
square value reported by Sorce et al. (1985, p. 197):

“When mother posed a fearful expression none of the 17 infants ventured across the
deep side. In sharp contrast, 14 of the 19 infants who observed mothers' happy face
crossed the deep side, χ2(1) = 20.49, p < .0001.”

Here’s the only other piece of information you need to know:

"Thirty-six middle-class mothers and their 12-month-old infants were randomly assigned to a smiling condition (N = 10 males, 9 females) and to a fear condition (N = 9 males, 8 females)" (Sorce et al., 1985, p. 197).

(Watch Video 4.P4: Maternal Expression & Infant Behavior)
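Once you have built the data file, you can double-check your chi-square value outside of SPSS with a Python sketch such as the following (my illustration; scipy is assumed):

import numpy as np
from scipy.stats import chi2_contingency

# Rows = condition (fear, happy); columns = crossed the deep side (no, yes)
table = np.array([[17, 0],
                  [5, 14]])
chi2, p, df, expected = chi2_contingency(table, correction=False)
print(round(chi2, 2), p)  # 20.5 (reported as 20.49), p < .0001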

5: “Quitting smoking is easy. I’ve done it thousands of times.” Mark Twain


95% of attempts to quit smoking fail (Hughes, Keely, & Naud, 2004). Free et al. (2011) sought to improve the success rate by sending text message support to smokers trying to quit. An example text message included: "Cravings last less than 5 minutes on average. To help distract yourself, try sipping a drink slowly until the craving is over". The percentage of participants who quit smoking was measured at pre- and post-intervention (the latter being 6 months after the start of the text message support program). Test the null hypothesis that the percentage of people who managed to quit smoking in the intervention and control groups was equal. Free et al.'s (2011) study was based on a sample size greater than 5,000. However, to keep things simpler, I simulated the data to correspond to the results very closely, but with just N = 438. (Data File: smoking_intervention) (Watch Video 4.P5: Smoking Intervention)


Advanced Practice Questions

(forthcoming)


References
Altman, D. G., Deeks, J. J., & Sackett, D. L. (1998). Odds ratios should be avoided when events
are common. BMJ: British Medical Journal, 317(7168), 1318-1318.
Beilken, K., Hancock, M. J., Maher, C. G., Li, Q., & Steffens, D. (2016). Acute low back pain? Do
not blame the weather—A case-crossover study. Pain Medicine, pnw126.
Campbell, I. (2007). Chi-squared and Fisher-Irwin tests of two-by-two tables with small sample
recommendations. Statistics in Medicine, 26, 3661–3675.
Camilli, G., & Hopkins, K. D. (1978). Applicability of chi-square to 2 × 2 contingency tables with small expected frequencies. Psychological Bulletin, 85, 163-167.
Camilli, G., & Hopkins, K. D. (1979). Testing for association in 2 × 2 contingency tables with very small sample sizes. Psychological Bulletin, 86, 1011-1014.
Conover, W. J. (1974). Some reasons for not using the Yates continuity correction on 2 × 2 contingency tables. Journal of the American Statistical Association, 69(346), 374-376.
Croson, R., & Sundali, J. (2005). The gambler’s fallacy and the hot hand: Empirical data from
casinos. Journal of Risk and Uncertainty, 30(3), 195-209.
Fagerland, M. W., Lydersen, S., & Laake, P. (2013). The McNemar test for binary matched-pairs
data: mid-p and asymptotic are better than exact conditional. BMC Medical Research
Methodology, 13, 91-91.
Fienberg, S. E. (1980). The analysis of cross-classified categorical data. Cambridge, MA: MIT Press.
Ferguson, G. A. (1976). Statistical analysis in psychology and education. Tokyo: McGraw-Hill.
Free, C., Knight, R., Robertson, S., Whittaker, R., Edwards, P., Zhou, W., ... & Roberts, I. (2011).
Smoking cessation support delivered via mobile phone text messaging (txt2stop): a
single-blind, randomised trial. The Lancet, 378(9785), 49-55.
Geschwind, N., & Behan, P. (1982). Left-handedness: Association with immune disease,
migraine, and developmental learning disorder. Proceedings of the National Academy
of Sciences, 79(16), 5097-5100.
Hughes, J. R., Keely, J., & Naud, S. (2004). Shape of the relapse curve and long‐term abstinence
among untreated smokers. Addiction, 99(1), 29-38.
Larntz, K. (1978). Small sample comparisons of exact levels for chi-square goodness of fit
statistics. Journal of the American Statistical Association, 73, 253-263.
MacDonald, P. L., & Gardner, R. C. (2000). Type I error rate comparisons of post hoc procedures for I × J chi-square tables. Educational and Psychological Measurement, 60(5), 735-754.
McNemar, Q. (1947). Note on the sampling error of the difference between correlated
proportions or percentages. Psychometrika, 12(2), 153-157.
Newcombe, R. G. (1998). Two‐sided confidence intervals for the single proportion: comparison
of seven methods. Statistics in Medicine, 17(8), 857-872.

Sorce, J. F., Emde, R. N., Campos, J. J., & Klinnert, M. D. (1985). Maternal emotional signaling:
Its effect on the visual cliff behavior of 1-year-olds. Developmental Psychology, 21(1),
195-200.
Thompson, B. (1988). Misuse of chi-square contingency-table test statistics. Educational and
Psychological Research, 8(1), 39-49.
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76(2), 105-110.
Yates, F. (1934). Contingency tables involving small numbers and the chi-square test. Supplement to the Journal of the Royal Statistical Society, 1, 217-235.
