Method of Chi Square
We always start with a null hypothesis that there is no association between the variables. For example: there is no association between the gender of applicants and whether or not their application is accepted. The following data have been made up to test this hypothesis.

Data for chi-square tests can come from observational studies, surveys, administrative systems, models or experiments. The data need to be categorical and in the form of frequency counts (not percentages). In theory there is no limit on the number of categories you can have, but the more there are, the more difficult the results will be to interpret.

The first step is to put the observed data (what was actually measured or recorded) into a contingency table, as shown in Table 1.

Table 1: Observed frequencies

              Application    Application not    TOTAL
              successful     successful
Male          23             40                 63
Female        31             39                 70
TOTAL         54             79                 133
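As a minimal sketch in Python, Table 1 can be stored as a nested dictionary and its row, column and table totals recomputed, which is a quick way to catch transcription errors before going further. The dictionary layout and the helper name `margins` are illustrative choices, not part of the method itself.

```python
# Observed frequencies from Table 1 (rows: gender, columns: application outcome).
# The nested-dict layout is an illustrative choice, not part of the method.
observed = {
    "Male":   {"successful": 23, "not successful": 40},
    "Female": {"successful": 31, "not successful": 39},
}

# Chi-square needs raw frequency counts, so reject anything that is not a non-negative integer.
for cells in observed.values():
    for count in cells.values():
        if not isinstance(count, int) or count < 0:
            raise ValueError(f"expected a frequency count, got {count!r}")

def margins(table):
    """Return (row_totals, column_totals, grand_total) for a nested-dict contingency table."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    columns = next(iter(table.values())).keys()
    col_totals = {col: sum(cells[col] for cells in table.values()) for col in columns}
    grand_total = sum(row_totals.values())
    return row_totals, col_totals, grand_total

rows, cols, n = margins(observed)
print(rows)  # {'Male': 63, 'Female': 70}
print(cols)  # {'successful': 54, 'not successful': 79}
print(n)     # 133
```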
The second step is to calculate the expected frequencies. These are the frequencies we would expect to have recorded, given the row and column totals. There are several ways to do this. The most basic is to calculate what you would observe if there were no association between the two variables: for each cell, multiply the row total by the column total and divide the result by the table total. This is shown in Table 2:
Table 2: Calculation of expected frequencies

              Application successful    Application not successful    TOTAL
Male          (63 * 54) / 133 = 25.58   (63 * 79) / 133 = 37.42       63
Female        (70 * 54) / 133 = 28.42   (70 * 79) / 133 = 41.58       70
TOTAL         54                        79                            133
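The same calculation can be sketched in Python. This assumes the table is held as a list of rows of raw counts (a layout chosen here for illustration), and `expected_frequencies` is a hypothetical helper, not a library function.

```python
# Observed counts from Table 1: rows are Male, Female; columns are successful, not successful.
observed = [
    [23, 40],
    [31, 39],
]

def expected_frequencies(table):
    """Expected count for each cell: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 2) for e in row])
# [25.58, 37.42]
# [28.42, 41.58]

# Observed and expected frequencies should sum to the same table total.
assert abs(sum(map(sum, expected)) - sum(map(sum, observed))) < 1e-9
```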
Be sure to check that your observed and expected values both sum to the same total.

The third step is to calculate the chi-square statistic. The formula for chi-square is:

$$\chi^2 = \sum \frac{(E - O)^2}{E}$$

where E is the expected value and O is the observed value for each cell. The sigma sign (Σ) means that everything that follows is summed. So (expected - observed)² / expected is calculated for each cell in the contingency table, as shown below.
              Application successful          Application not successful
Male          (25.58 - 23)² / 25.58 = 0.26    (37.42 - 40)² / 37.42 = 0.18
Female        (28.42 - 31)² / 28.42 = 0.23    (41.58 - 39)² / 41.58 = 0.16
...and then the results from each cell are summed:

0.26 + 0.18 + 0.23 + 0.16 = 0.83

This is the χ² value. The next step is to calculate the degrees of freedom, which is:

(number of rows - 1) × (number of columns - 1)
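The chi-square calculation and the degrees of freedom can be sketched together as a short Python function (the name `chi_square` is hypothetical). One design difference from the hand calculation is worth flagging: the code keeps full floating-point precision instead of rounding each expected value and cell contribution to two decimal places. For this table both routes give 0.83, but with other data small differences can appear.

```python
# Observed counts from Table 1: rows are Male, Female; columns are successful, not successful.
observed = [[23, 40], [31, 39]]

def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency for this cell
            statistic += (e - o) ** 2 / e          # (E - O)^2 / E, summed over all cells
    dof = (len(table) - 1) * (len(table[0]) - 1)   # (rows - 1) x (columns - 1)
    return statistic, dof

stat, dof = chi_square(observed)
print(round(stat, 2), dof)  # 0.83 1
```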
Methodology Glossary Tier 1

In the example above there are two rows and two columns, so the degrees of freedom is (2 - 1) × (2 - 1) = 1.

The final step is to see whether the chi-square value, given the degrees of freedom, is statistically significant. This can be done by comparing the χ² value against a table of critical values derived from the chi-square distribution. Alternatively, most software packages will report the exact p-value, so you can see immediately whether it is below the standard threshold of 0.05.

In this example, χ² = 0.83, which is less than 3.84, the critical value for 5% significance with 1 degree of freedom. Hence the result is not significant, and we cannot reject the null hypothesis that there is no association between the gender of applicants and whether or not their application is accepted. What this means is that, if there were no genuine association, a difference between the observed and expected values at least as large as this one would arise by chance more than 5% of the time.

Presenting and explaining results

When presenting results in reports it is usual to give one table and put the expected frequencies in brackets, like this:

              Application    Application not    TOTAL
              successful     successful
Male          23 (25.58)     40 (37.42)         63
Female        31 (28.42)     39 (41.58)         70
TOTAL         54             79                 133
The results are then written as χ² = 0.83, df = 1, p = NS. Had the result been significant, it would instead be written as p < 0.05.
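The final step and the reporting line can be sketched using only the Python standard library. One assumption needs stating: the shortcut p = erfc(sqrt(χ² / 2)) is valid only for 1 degree of freedom (a 2 × 2 table), because a chi-square variable with 1 degree of freedom is the square of a standard normal variable. For larger tables, use a critical-value table or a statistics package. For example, SciPy's `scipy.stats.chi2_contingency` returns the statistic, p-value, degrees of freedom and expected frequencies, but by default it applies Yates' continuity correction to 2 × 2 tables, so `correction=False` is needed to reproduce the hand calculation above. The function name `chi_square_2x2` below is hypothetical.

```python
import math

# Observed counts from Table 1: rows are Male, Female; columns are successful, not successful.
observed = [[23, 40], [31, 39]]

CRITICAL_VALUE_5PCT_1DF = 3.841  # tabulated chi-square critical value for alpha = 0.05, df = 1

def chi_square_2x2(table):
    """Return (statistic, p_value) for a 2x2 table, without continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (e - o) ** 2 / e
    # With 1 degree of freedom, chi-square is a squared standard normal,
    # so the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p = chi_square_2x2(observed)
# Critical-value route; checking p < 0.05 gives the same decision for this table.
significant = stat > CRITICAL_VALUE_5PCT_1DF
p_text = "p < 0.05" if significant else "p = NS"
print(f"χ² = {stat:.2f}, df = 1, {p_text}")  # χ² = 0.83, df = 1, p = NS
print(f"exact p = {p:.2f}")                  # exact p = 0.36
```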