Assessment in Learning 1 Chi Square
The chi-square (χ²) statistic is a test that measures how well a model compares to the actual
observed data. The data used to calculate the chi-square statistic must be random, raw, mutually
exclusive, obtained from independent variables, and obtained from a sufficiently large sample.
For example, the results of tossing a fair coin meet these criteria.
Chi-square tests are often used to test hypotheses. The chi-square statistic compares the size of
any difference between the expected results and the actual results, given the sample size and
number of variables in the relationship.
For these tests, degrees of freedom are used to determine whether a specific null hypothesis can
be rejected based on the total number of variables and samples within the experiment. As with
any statistic, the larger the sample size, the more reliable the results.
The chi-square (χ²) statistic is a measure of the difference between the observed and
expected frequency of outcomes of a set of events or variables.
Chi-square is useful for analyzing such differences in categorical variables, especially
those that are nominal in nature.
χ² depends on the size of the difference between the observed and expected values, the
degrees of freedom, and the sample size.
χ² can be used to test whether two variables are related or independent of each other.
It can also be used to test the goodness of fit between an observed distribution and a
theoretical distribution of frequencies.
The Formula for Chi-Square Is
$$\chi^2_c = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
where:
c=Degrees of freedom
n=Number of categories
O_i=Observed value(s)
E_i=Expected value(s)
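As a minimal sketch of the formula, the function below sums (O - E)² / E over the categories. The function name `chi_square_statistic` and the coin-toss counts (58 heads and 42 tails in 100 tosses) are invented for illustration; they are not data from this handout.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 100 tosses of a coin assumed to be fair.
observed = [58, 42]   # heads, tails (invented counts)
expected = [50, 50]   # a fair coin predicts 50 of each
statistic = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1  # categories minus one
print(round(statistic, 2), degrees_of_freedom)
```

Here the statistic is 64/50 + 64/50 = 2.56, with one degree of freedom.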
The Chi-square goodness of fit test requires the following:
Data values that are a simple random sample from the full population.
Categorical or nominal data. The Chi-square goodness of fit test is not appropriate for
continuous data.
A data set that is large enough so that at least five values are expected in each of the
observed data categories.
Chi-square goodness of fit test example
Let's use candy bags as an example. We collect a random sample of ten bags. Each bag has 100
pieces of candy and five flavors. Our hypothesis is that the proportions of the five flavors in each
bag are the same.
Let's start by answering: Is the Chi-square goodness of fit test an appropriate way to evaluate the
distribution of flavors in candy bags?
We have a simple random sample of 10 bags of candy. We meet this need.
Our categorical variable is candy flavors. We have a number of each flavor in 10 bags of
candy. We meet this need.
Each bag has 100 pieces of candy. Each bag has five flavors of candy. We expect to have
equal numbers for each flavor. This means we expect 100 / 5 = 20 pieces of candy in
each flavor from each bag. For the 10 bags in our sample, we expect 10 x 20 = 200 pieces
of candy in each flavor. This is more than the requirement of five expected values in each
category.
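The arithmetic behind this sample-size check can be sketched as follows. The variable names are illustrative; the numbers come from the candy example.

```python
bags = 10              # bags in the sample
pieces_per_bag = 100   # pieces of candy in each bag
flavors = 5            # flavors, equally common under the hypothesis

expected_per_bag = pieces_per_bag / flavors       # 100 / 5 = 20
expected_per_flavor = bags * expected_per_bag     # 10 x 20 = 200

# Requirement: at least five values expected in every category.
meets_requirement = expected_per_flavor >= 5
print(expected_per_flavor, meets_requirement)
```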
Based on the answers above, yes, the Chi-square goodness of fit test is an appropriate method to
evaluate the distribution of flavors in candy bags.
Let’s start by listing what we expect if each bag has the same number of pieces for each flavor.
Above, we calculated this as 200 for 10 bags of candy.
Comparison of actual vs expected number of pieces of each flavor of candy
To draw a conclusion, we compare the test statistic to a critical value from the Chi-Square
distribution. This activity involves four steps:
1. We first decide on the risk we are willing to take of drawing an incorrect conclusion
based on our sample observations. For the candy data, we decide prior to collecting
data that we are willing to take a 5% risk of concluding that the flavor counts in each
bag across the full population are not equal when they really are. In statistics-speak,
we set the significance level, α , to 0.05.
2. We calculate a test statistic. Our test statistic is 52.75.
3. We find the theoretical value from the Chi-square distribution based on our
significance level. The theoretical value is the value we would expect if the bags
contain the same number of pieces of candy for each flavor.
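In addition to the significance level, we also need the degrees of freedom to find this
value. For the goodness of fit test, this is one fewer than the number of categories. We
have five flavors of candy, so we have 5 – 1 = 4 degrees of freedom.
4. We compare the test statistic to the theoretical value. For α = 0.05 and 4 degrees of
freedom, a Chi-square table gives 9.488. Because 52.75 is greater than 9.488, we reject
the hypothesis that the flavor proportions are equal.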
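A Chi-square table or statistical software normally supplies this value. As a rough standard-library sketch: for exactly 4 degrees of freedom the upper-tail probability has the closed form P(X > x) = e^(-x/2) (1 + x/2), so the value can be located by bisection. The function names are illustrative, and this shortcut is an assumption that holds only for 4 degrees of freedom.

```python
import math


def upper_tail_4df(x):
    """P(X > x) for a chi-square variable with exactly 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)


def critical_value_4df(alpha, lo=0.0, hi=100.0, tol=1e-9):
    """Find x with upper_tail_4df(x) == alpha by bisection.

    The tail probability falls from 1 toward 0 as x grows, so the
    search keeps the half of the interval that still brackets alpha.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if upper_tail_4df(mid) > alpha:
            lo = mid   # tail still too large: move right
        else:
            hi = mid   # tail at or below alpha: move left
    return (lo + hi) / 2


alpha = 0.05
degrees_of_freedom = 5 - 1   # five flavors minus one
critical = critical_value_4df(alpha)
print(round(critical, 3))
```

The result is about 9.488, which agrees with standard Chi-square tables.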
[Figure: bar chart comparing the expected and actual number of pieces for each flavor (Apple, Lime, Cherry, Orange, Grape). Every expected count is 200; the actual counts shown are 180, 250, 120, 225, and 225.]
$$\sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
In the formula above, we have n groups. The ∑ symbol means to add up the calculations for
each group. For each group, we do the same steps as in the candy example. The formula
shows O_i as the Observed value and E_i as the Expected value for a group.
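As a worked instance of this sum for the candy data, the lines below use an expected count of 200 per flavor and the actual counts read from the chart's data labels (180, 250, 120, 225, 225). Which count belongs to which flavor is not needed for the total.

```latex
\sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i}
  = \frac{(180-200)^2}{200} + \frac{(250-200)^2}{200} + \frac{(120-200)^2}{200}
  + \frac{(225-200)^2}{200} + \frac{(225-200)^2}{200}
  = 2 + 12.5 + 32 + 3.125 + 3.125
  = 52.75
```

This reproduces the test statistic of 52.75 reported for the candy example.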
We then compare the test statistic to a Chi-square value with our chosen significance level (also
called the alpha level) and the degrees of freedom for our data. Using the candy data as an
example, we set α = 0.05 and have four degrees of freedom. For the candy data, the Chi-square
value is written as:
$$\chi^2_{0.05,\,4}$$
The comparison has two possible outcomes:
The test statistic is lower than the Chi-square value. You fail to reject the hypothesis of
equal proportions. The data are consistent with bags of candy across the entire population
having the same number of pieces of each flavor in them. The fit of equal proportions is “good
enough.”
The test statistic is higher than the Chi-square value. You reject the hypothesis of equal
proportions. You cannot conclude that the bags of candy have the same number of pieces
of each flavor. The fit of equal proportions is “not good enough.”
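The decision rule can be sketched in a few lines. The function name `decide` is illustrative; 52.75 is the test statistic from the candy example, and 9.488 is the standard Chi-square table value for α = 0.05 with 4 degrees of freedom.

```python
def decide(test_statistic, critical_value):
    """Apply the goodness of fit decision rule."""
    if test_statistic > critical_value:
        return "reject the hypothesis of equal proportions"
    return "fail to reject the hypothesis of equal proportions"


# Candy example: statistic 52.75 versus the table value for alpha = 0.05, 4 df.
result = decide(52.75, 9.488)
print(result)
```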
Sharmaine A. Mislang