

ORIGINAL ARTICLES Epidemiology Biostatistics and Public Health - 2019, Volume 16, Number 2

Continuity correction of Pearson’s chi-square test in 2x2 Contingency Tables:
A mini-review on recent development
Nicola Serra (1), Teresa Rea (1), Paola Di Carlo (2), Consolato Sergi (3)

(1) Department of Public Health, University Federico II of Naples, Italy
(2) Department of Sciences for Health Promotion, Mother & Child Care, Univ. of Palermo, Italy
(3) Department of Lab. Medicine and Pathology, Univ. of Alberta, Edmonton, AB, Canada, Stollery Children’s Hospital, Univ. of Alberta, Edmonton, AB, Canada

CORRESPONDING AUTHOR: Nicola Serra Ph.D., Department of Public Health, School of Medicine and Surgery, University Federico II of Naples, Italy.
E-mail: [email protected]

DOI: 10.2427/13059
Accepted on April 16, 2019

ABSTRACT

Pearson’s chi-square test is one of the nonparametric tests most widely used in Biomedicine and Social Sciences, but it introduces an error for 2x2 contingency tables, because a discrete probability distribution is approximated with a continuous distribution. The first author to introduce a continuity correction of Pearson’s chi-square test was Yates F. (1934). Unfortunately, Yates’s correction may overcorrect the p-value, which can lead to an overly conservative result. Therefore, many authors have introduced variants of Pearson’s chi-square statistic as alternative continuity corrections to Yates’s correction. The goal of this paper is to describe the most recent continuity corrections proposed for Pearson’s chi-square test.

Key words: Pearson’s χ² statistic; continuity correction; 2x2 contingency table; Yates’s continuity correction; Serra’s continuity correction

INTRODUCTION

Pearson’s chi-square test, or χ² test, is a nonparametric test commonly used by researchers in Biology, Medicine and Social Sciences. The test is based on Pearson’s χ² statistic, introduced by Pearson K. [1], computed on a sample from a population characterized by two or more dichotomous variables. For two dichotomous variables, it is possible to define a 2x2 contingency table containing the frequencies of occurrence of all combinations of their levels for a sample of size N, as shown in Table 1.

In a 2x2 contingency table, Pearson’s χ² statistic is used to test the association between dichotomous variables, for example to identify a possible


association between variables such as sex (Male/Female) and smoking (Yes/No). For this purpose, Pearson introduced the chi-square statistic to evaluate the discrepancy between the observed ($O_{i,j}$) and expected ($E_{i,j}$) frequencies, where the observed frequencies are a, b, c and d of Table 1.

TABLE 1. 2x2 contingency table form.

                      Column variable (X)
Row variable (Y)      State 1        State 2        Row totals
State 1               a              b              a + b = r1
State 2               c              d              c + d = r2
Column totals         a + c = c1     b + d = c2     N = a + b + c + d

The expected frequencies are defined for every cell as:

$$E_{i,j} = \frac{r_i c_j}{N}, \qquad i, j = 1, 2$$

where i and j indicate the row and column index, respectively. The formula to compute Pearson’s χ² statistic was described by Pearson K. (1900):

$$\chi^2 = \sum_{i,j} \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}} = \frac{N(ad - bc)^2}{r_1 r_2 c_1 c_2} \tag{1}$$

where r1, r2, c1 and c2, i.e. the totals across rows and columns, are generally called marginal totals.

Using the χ² distribution to interpret Pearson’s χ² statistic requires one to assume that the discrete probability of the observed binomial frequencies of a 2x2 contingency table can be approximated by the continuous χ² distribution. This assumption is not entirely correct and introduces some error. To reduce the error in approximation, many authors introduced a continuity correction or variants of Pearson’s χ² test.

To reduce the error introduced by Pearson’s χ² statistic, Yates F. [2] suggested a correction for continuity that adjusts the formula for Pearson’s χ² by subtracting 0.5 from the absolute difference between each observed value and its expected value in a 2x2 contingency table. This correction reduces the χ² value obtained and consequently increases its p-value. The formula to compute Yates’s χ² statistic in a 2x2 contingency table is:

$$\chi^2_{Y} = \sum_{i,j} \frac{(|O_{i,j} - E_{i,j}| - 0.5)^2}{E_{i,j}} = \frac{N\left(|ad - bc| - N/2\right)^2}{r_1 r_2 c_1 c_2} \tag{2}$$

Unfortunately, Yates’s correction may overcorrect the p-value; this can lead to an overly conservative result, as reported by several authors [3-7].

The goal of this study is to describe, through a literature review, the most recent developments in continuity corrections based on variants of Pearson’s χ² test defined for 2x2 contingency tables.

METHODS

In this section we present the most recent studies on the continuity correction of Pearson’s χ² statistic in 2x2 contingency tables.

Serra’s continuity correction

Recently, Serra N. [8] introduced a significant minimization of Pearson’s χ² statistic as a continuity correction of Pearson’s χ² test for small samples (sample size ≤ 25). This approach is based on the observation that the denominator r1 r2 c1 c2 of (1) can be interpreted as a geometric mean. The formula to compute the minimized Pearson’s χ² statistic in a 2x2 contingency table is:

[Equation (3)]

Serra N. showed, with a statistical approach, that for small samples (≤ 25) the minimized Pearson’s χ² statistic in 2x2 contingency tables is a more effective continuity correction for Pearson’s χ² statistic than Yates’s continuity correction. In particular, the author verified that Fisher’s exact test [9,10], currently considered the “gold standard” test when the χ² test is not appropriate, i.e. when the sample size is small and the expected value in any cell of the 2x2 contingency table is below 5, had a performance statistically equal to that of Serra’s χ² test.

Kajita Matchima et al.’s continuity correction

Kajita Matchima et al. [11] proposed a continuity correction for Pearson’s χ² test for independence, to be used when the research data contain small expected cell frequencies. The correction is designed to control the type I error and is obtained through a developed correction value that applies under a wider range of conditions. For this purpose the authors used a simulation study. The simulations were performed with the Monte Carlo method to evaluate the performance of their method in comparison with other continuity corrections, such as Yates’s correction and Williams’s correction [12]. Their method showed better control of the type I error over data patterns defined by significance levels of 0.05 and 0.01, simulated contingency tables between 2x2 and 4x4 (2x2, 2x3, 2x4, 3x3, 3x4
and 4x4), a proportion of small expected cell frequencies of up to 30% of the total number of cells, a sample size between 5 and 10 times the total number of cells, and 10,000 data sets simulated by the Monte Carlo method for each pattern. The type I error (the number of rejections of the null hypothesis divided by 10,000) was evaluated with Pearson’s χ² test, i.e. the classical χ² test without continuity correction.

In the case of 2x2 contingency tables, where the type I error is greater than the significance level, the χ² test equation to be used is:

[Equation (4)]

whereas, where the type I error is less than the significance level, the χ² test equation is:

[Equation (5)]

where $O_{i,j}$ and $E_{i,j}$ are the observed and expected frequencies, respectively, and C is the developed correction value. C was computed in two cases as follows. If the type I error is higher than the significance level, the authors substitute candidate values of C into equation (4), starting from 0.01, 0.02, 0.03, ... . If the type I error is less than the significance level, they substitute candidate values of C into equation (5), starting from 0.01, 0.02, 0.03, ... . After each substitution, the type I error is recomputed and compared with the significance level. The developed correction value (C) is the value for which the type I error is closest to the significance level.

CONCLUSION

In this paper we described the most recent studies on the continuity correction of Pearson’s χ² test. Because the first continuity correction, proposed by Yates (1934), overcorrects the p-value, many authors discourage its use. Other authors [13-18] have followed Yates (1934) in claiming that the use of Pearson’s χ² in 2x2 contingency tables tends to generate too many type I errors, especially with small samples; they therefore defined different continuity corrections of Pearson’s χ² statistic to reduce the type I error and, simultaneously, to reduce the type II error that Yates’s correction introduces.

Unfortunately, the study of the continuity correction of Pearson’s χ² statistic is very limited in the recent statistical literature: only two recent studies are dedicated to this problem (Serra N., 2018 and Kajita Matchima et al., 2018), each proposing a variant of the χ² statistic as a continuity correction of Pearson’s χ² test.

Funding statement

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Competing interests statement

There are no competing interests for this study.

References
1. Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine Series 5, 50(302), 157-175.
2. Yates, F. (1934). Contingency tables involving small numbers and the χ² test. Supplement to the Journal of the Royal Statistical Society, 1(2), 217-235.
3. Camilli, G., & Hopkins, K. D. (1978). Applicability of chi-square to 2×2 contingency tables with small expected cell frequencies. Psychological Bulletin, 85(1), 163.
4. Campbell, I. (2007). Chi-squared and Fisher–Irwin tests of two-by-two tables with small sample recommendations. Statistics in Medicine, 26(19), 3661-3675.
5. Haber, M. (1982). The continuity correction and statistical testing. International Statistical Review/Revue Internationale de Statistique, 135-144.
6. Richardson, J. T. (1990). Variants of chi-square for 2×2 contingency tables. British Journal of Mathematical and Statistical Psychology, 43(2), 309-326.
7. Richardson, J. T. (2011). The analysis of 2×2 contingency tables—Yet again. Statistics in Medicine, 30(8), 890-890.
8. Serra, N. (2018). A significant minimization of Pearson’s χ² statistics in 2x2 contingency tables: preliminary results for small samples. Epidemiology, Biostatistics and Public Health, 15(3).
9. Agresti, A. (2001). Exact inference for categorical data: recent advances and continuing controversies. Statistics in Medicine, 20(17-18), 2709-2722.
10. Fisher, R. A. (1934). Statistical Methods for Research Workers. Chapter 12. 5th Ed., Oliver & Boyd.
11. Matchima, K., Vongprasert, J., & Chutiman, N. (2018). The Development of a Correction Method for Ensuring a Continuity Value of The Chi-square Test with a Small Expected Cell Frequency. Naresuan University Journal: Science and Technology (NUJST), 26(1), 98-105.
12. McDonald, J. H. (2014). Handbook of Biological Statistics. Maryland: Sparky House Publishing.
13. Cochran, W. G. (1954). Some methods for strengthening the common χ² tests. Biometrics, 10(4), 417-451.
14. Cox, D. R. (1970). The continuity correction. Biometrika, 57, 217-219.
15. Feller, W. (1968). An Introduction to Probability Theory and Its Applications. Volume I, 3rd ed. John Wiley & Sons, Inc. New York.
16. Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22(4), 719-748.
17. Maxwell, E. A. (1976). Analysis of contingency tables and further reasons for not using Yates correction in 2×2 tables. The Canadian Journal of Statistics/La Revue Canadienne de Statistique, 277-290.
18. Upton, G. J. G. (1982). A comparison of alternative tests for the 2×2 comparative trial. Journal of the Royal Statistical Society. Series A (General), 86-105.
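
As a worked illustration of the statistics reviewed in the Introduction, the following minimal Python sketch computes the expected frequencies, Pearson’s χ² statistic and Yates’s corrected statistic for a 2x2 table with cells a, b, c, d. It uses the standard shortcut forms N(ad − bc)²/(r1 r2 c1 c2) and N(|ad − bc| − N/2)²/(r1 r2 c1 c2). The function names and the example counts are illustrative assumptions and are not taken from the reviewed papers. The p-value relies on the identity P(χ² > x) = erfc(√(x/2)), which holds for one degree of freedom.

```python
import math


def marginals(a, b, c, d):
    """Return (r1, r2, c1, c2, N) for the 2x2 table [[a, b], [c, d]]."""
    return a + b, c + d, a + c, b + d, a + b + c + d


def expected_frequencies(a, b, c, d):
    """Expected counts E[i][j] = r_i * c_j / N under independence."""
    r1, r2, c1, c2, n = marginals(a, b, c, d)
    return [[r * col / n for col in (c1, c2)] for r in (r1, r2)]


def pearson_chi2(a, b, c, d):
    """Uncorrected Pearson statistic: N(ad - bc)^2 / (r1 r2 c1 c2)."""
    r1, r2, c1, c2, n = marginals(a, b, c, d)
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)


def yates_chi2(a, b, c, d):
    """Yates-corrected statistic: N(|ad - bc| - N/2)^2 / (r1 r2 c1 c2)."""
    r1, r2, c1, c2, n = marginals(a, b, c, d)
    return n * (abs(a * d - b * c) - n / 2) ** 2 / (r1 * r2 * c1 * c2)


def p_value_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


table = (3, 1, 1, 3)  # hypothetical counts a, b, c, d (N = 8), not from the source
print(expected_frequencies(*table))
print(pearson_chi2(*table), p_value_1df(pearson_chi2(*table)))
print(yates_chi2(*table), p_value_1df(yates_chi2(*table)))
```

For these hypothetical counts the uncorrected statistic is 2.0 and the Yates-corrected one is 0.5, which shows how the correction lowers the statistic and raises the p-value. Both functions divide by the product of the marginal totals, so they assume that no row or column total is zero.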
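
The simulation-based search for the correction value C described for [11] can be sketched as follows. This is only an illustration under stated assumptions. The tables are drawn with independent row and column variables, which is one possible way to simulate the null hypothesis. The shifted statistic Σ(|O − E| ± C)²/E is a hypothetical stand-in for equations (4) and (5), whose exact form is not given here. All function names, the sample size, the cell probabilities and the number of replications are illustrative choices.

```python
import random

# Upper critical values of the chi-square distribution with 1 degree of freedom.
CRITICAL_1DF = {0.05: 3.841458820694124, 0.01: 6.6348966010212145}


def simulate_tables(n, p_row, p_col, reps, seed=0):
    """Draw `reps` 2x2 tables of size n with independent row and column variables."""
    rng = random.Random(seed)
    tables = []
    for _ in range(reps):
        counts = [0, 0, 0, 0]  # cells a, b, c, d
        for _ in range(n):
            i = 0 if rng.random() < p_row else 1
            j = 0 if rng.random() < p_col else 1
            counts[2 * i + j] += 1
        tables.append(tuple(counts))
    return tables


def shifted_chi2(table, shift=0.0):
    """Sum over cells of (|O - E| + shift)^2 / E; shift = 0 gives Pearson's statistic.

    The shifted form is a hypothetical stand-in, not the source's equations (4)-(5).
    Returns None when a marginal total is zero (the statistic is undefined).
    """
    a, b, c, d = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    if 0 in rows or 0 in cols:
        return None
    total = 0.0
    for i, j, obs in ((0, 0, a), (0, 1, b), (1, 0, c), (1, 1, d)):
        exp = rows[i] * cols[j] / n
        dev = max(abs(obs - exp) + shift, 0.0)  # keep the deviation non-negative
        total += dev ** 2 / exp
    return total


def type1_error(tables, alpha=0.05, shift=0.0):
    """Share of simulated null tables whose statistic exceeds the critical value."""
    crit = CRITICAL_1DF[alpha]
    stats = (shifted_chi2(t, shift) for t in tables)
    return sum(1 for s in stats if s is not None and s > crit) / len(tables)


def search_correction(tables, alpha=0.05, step=0.01, max_steps=50):
    """Try C = step, 2*step, ... and keep the C whose type I error is closest to alpha."""
    baseline = type1_error(tables, alpha)
    sign = -1.0 if baseline > alpha else 1.0  # shrink if too liberal, else inflate
    best_c, best_err = 0.0, baseline
    for k in range(1, max_steps + 1):
        c = k * step
        err = type1_error(tables, alpha, sign * c)
        if abs(err - alpha) < abs(best_err - alpha):
            best_c, best_err = c, err
    return sign * best_c, baseline, best_err


tables = simulate_tables(n=20, p_row=0.5, p_col=0.5, reps=2000, seed=1)
print(type1_error(tables))        # empirical type I error of the uncorrected test
print(search_correction(tables))  # (signed C, baseline error, error after correction)
```

Undefined tables (a zero marginal total) are counted as non-rejections, which is a design choice of this sketch. The source reports 10,000 simulated data sets per pattern; the smaller number of replications here only keeps the example fast.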
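
The Methods section refers to Fisher’s exact test [9,10] as the reference test when expected cell counts are small. A minimal sketch of its two-sided p-value for a 2x2 table is given below. It assumes the common convention of summing the hypergeometric probabilities of all tables with the observed margins that are no more probable than the observed table. The function name and the example counts are illustrative assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    lowest, highest = max(0, c1 - r2), min(r1, c1)
    # Unnormalised hypergeometric weight of each possible top-left count k;
    # integer arithmetic avoids floating-point ties when comparing probabilities.
    weights = {k: comb(r1, k) * comb(r2, c1 - k) for k in range(lowest, highest + 1)}
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(n, c1)


print(fisher_exact_two_sided(3, 1, 1, 3))  # hypothetical counts, N = 8
```

With these hypothetical counts every expected frequency equals 2, below the threshold of 5 mentioned in the text, and the exact p-value is 34/70.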
