G-test
The commonly used chi-squared tests for goodness of fit to a distribution and for
independence in contingency tables are in fact approximations of the log-likelihood ratio on
which the G-tests are based. This approximation was developed by Karl Pearson because at
the time it was unduly laborious to calculate log-likelihood ratios. With the advent of
electronic calculators and personal computers, this is no longer a problem. G-tests are
coming into increasing use, and have been recommended at least since the 1981 edition of the
popular statistics textbook by Sokal and Rohlf.[1] Dunning[2] introduced the test to the
computational linguistics community, where it is now widely used.
The general formula for Pearson's chi-squared test statistic is

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i},$$

where $O_i$ is the frequency observed in a cell, $E_i$ is the frequency expected on the null
hypothesis, and the sum is taken across all cells. The corresponding general formula for G
is

$$G = 2 \sum_i O_i \ln\!\left(\frac{O_i}{E_i}\right),$$

where ln denotes the natural logarithm (log to the base e) and the sum is again taken over
all non-empty cells.
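As a concrete illustration of this formula, here is a minimal Python/NumPy sketch; the helper name g_statistic and the example counts are made up for illustration and are not from the article:

```python
import numpy as np

def g_statistic(observed, expected):
    """G = 2 * sum(O_i * ln(O_i / E_i)), taken over non-empty cells."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    mask = observed > 0          # empty cells contribute nothing to the sum
    return 2.0 * np.sum(observed[mask] * np.log(observed[mask] / expected[mask]))

print(g_statistic([10, 20, 30], [15, 20, 25]))   # about 2.83 for these made-up counts
```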
For a contingency table with observed cell counts $O_{ij}$, let

$$N = \sum_{ij} O_{ij}, \qquad \pi_{ij} = \frac{O_{ij}}{N}, \qquad \pi_{i\cdot} = \frac{\sum_j O_{ij}}{N}, \qquad \pi_{\cdot j} = \frac{\sum_i O_{ij}}{N}.$$

Then G can be written as

$$G = 2N \sum_{ij} \pi_{ij} \ln\!\left(\frac{\pi_{ij}}{\pi_{i\cdot}\,\pi_{\cdot j}}\right) = 2N \cdot \operatorname{MI},$$

where MI is the mutual information between the row vector and the column vector of the contingency
table.
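A quick numerical check of this identity, using made-up counts for a 2x2 table (not data from the article); the mutual information is computed in nats so that it matches the natural logarithm used in G:

```python
import numpy as np

O = np.array([[10.0, 20.0],
              [30.0, 40.0]])
N = O.sum()
E = np.outer(O.sum(axis=1), O.sum(axis=0)) / N   # expected counts under independence

G = 2.0 * np.sum(O * np.log(O / E))

# Mutual information of the empirical joint distribution pi_ij = O_ij / N.
pi = O / N
pi_row = pi.sum(axis=1, keepdims=True)
pi_col = pi.sum(axis=0, keepdims=True)
MI = np.sum(pi * np.log(pi / (pi_row * pi_col)))

print(G, 2.0 * N * MI)   # the two values agree up to floating-point error
```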
It can also be shown[citation needed] that the inverse document frequency weighting commonly
used for text retrieval is an approximation of G applicable when the row sum for the query
is much smaller than the row sum for the remainder of the corpus. Similarly, the result of
Bayesian inference applied to a choice of a single multinomial distribution for all rows of the
contingency table taken together versus the more general alternative of a separate
multinomial per row produces results very similar to the G statistic.[citation needed]
Given the null hypothesis that the observed frequencies result from random sampling from
a distribution with the given expected frequencies, the distribution of G is approximately a
chi-squared distribution, with the same number of degrees of freedom as in the
corresponding chi-squared test.
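A short sketch of obtaining the p-value this way with SciPy; the G and df values below are placeholders, not results from the article. (SciPy's scipy.stats.power_divergence, with lambda_="log-likelihood", can also compute the G statistic and its p-value in one call.)

```python
from scipy.stats import chi2

G = 5.0          # example statistic
df = 2           # example degrees of freedom
p_value = chi2.sf(G, df)   # upper-tail probability of the chi-squared distribution
print(p_value)             # about 0.082 for these placeholder numbers
```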
For samples of a reasonable size, the G-test and the chi-squared test will lead to the same
conclusions. However, the approximation to the theoretical chi-squared distribution for the
G-test is better than for the Pearson chi-squared tests in cases where $|O_i - E_i| > E_i$ for any
cell, and in any such case the G-test should always be used.[citation needed]
For very small samples, the multinomial test for goodness of fit and Fisher's exact test for
contingency tables, or even Bayesian hypothesis selection, are preferable to either the chi-
squared test or the G-test.[citation needed]
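For a small 2x2 table, Fisher's exact test is available directly in SciPy; a brief sketch with made-up counts:

```python
from scipy.stats import fisher_exact

table = [[3, 1],
         [1, 3]]
odds_ratio, p_value = fisher_exact(table)
print(odds_ratio, p_value)   # exact two-sided p-value, no large-sample approximation
```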
Application
The G-test allows biologists to compare observed values with those predicted from a
specific null hypothesis. The G-test determines the probability that differences between the
observed and predicted (or expected) values are large enough that they are unlikely to have
occurred due to chance alone. The G-test is generally used with variables that are counts,
not scalars. The chi-squared test can be used in similar instances.
For example, the following table compares the observed distributions of genotypes in a
population with that predicted by the Hardy-Weinberg Principle. The aim of the G-test is to
determine whether the Aa genotype is really more common, and the aa genotype less
common, than they should be under the Hardy-Weinberg assumptions.
Genotype                  AA    Aa    aa
Expected from H-W         32    64    32
Observed in population    32    74    22
A G value of zero means that the observed numbers are exactly equal to the expected
numbers. The larger the differences between observed and expected, the greater the value
of G. The higher the G value, the more likely that the results are significantly different from
that predicted by the null hypothesis (i.e., the smaller the P value).
G increases as the observed values become more and more different from the expected
values, but G also increases as we add more categories. To correct for this, we need to
determine the degrees of freedom in our sample. To calculate the degrees of freedom (df), we
need to determine the minimum number of categories whose value we need to know before
we can calculate the rest. In our genotype example, if we know any two genotype
categories, it is possible to calculate the third (by subtracting from the total). Thus there are
two degrees of freedom.
To determine whether the difference between the observed and expected values is greater
than that expected by chance alone, the G value is compared to critical values in a table. In the
table, the degrees of freedom are listed down the left side, and the critical G values are
given in the adjacent column.
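Putting the pieces together for the genotype table above, here is a worked sketch; it uses SciPy's chi-squared distribution in place of a printed table of critical values:

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([32.0, 74.0, 22.0])   # AA, Aa, aa
expected = np.array([32.0, 64.0, 32.0])   # Hardy-Weinberg expectations

G = 2.0 * np.sum(observed * np.log(observed / expected))
df = 2
critical = chi2.ppf(0.95, df)   # critical G value at the 0.05 level, about 5.99
p_value = chi2.sf(G, df)

print(G, critical, p_value)
# G is about 5.0, below the 5.99 critical value, so p > 0.05 for these counts.
```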
Sometimes the expected values for the G-test are generated from the data themselves. For
example, suppose you have data on the number of times that a large male cricket mates (n1=12),
and the number of times that a small male cricket mates (n2=4). You would expect, if
mating were not related to size, that the two types of males would have the same
opportunity to mate. Thus, since there were 16 total mating events, each male would be
expected to mate 8 times. You can then perform the G-test using 8 as the expected value for
each male, and 12 and 4 as the observed values.
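A sketch of this cricket example in the same style; df = 1 here, since with the total fixed only one of the two counts is free:

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([12.0, 4.0])   # large male, small male
expected = np.array([8.0, 8.0])    # expected under equal mating opportunity

G = 2.0 * np.sum(observed * np.log(observed / expected))
df = 1
p_value = chi2.sf(G, df)
print(G, p_value)   # G is about 4.19, p is about 0.04 for these counts
```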
"The genotype frequencies in the first simulation did not differ from those predicted by
Hardy-Weinberg equilibrium (G = 1.23, p > 0.05, d.f. = 2)."
Or, if you did find an effect:
"The genotype frequencies in the small populations differed significantly from those
predicted by Hardy-Weinberg equilibrium (G = 8.45, p < 0.05, d.f. = 2)."