Chi-Square: History and Definition
Illustration
Say you are a technology company selling different software solutions and you want
to predict customer acceptance of your latest offering. You could conduct a pilot test among
your prospects and collect the customer experience data. The responses would then be
tabulated as counts per category and a chi-square analysis conducted. The analysis may reveal
that additional features are required in the software to make it more useful and
user-friendly. This would give you a better idea of your customers' probable acceptance of
your new software solution.
The advantages of the chi-square test stem from the fact that it is a non-parametric test. First, it is
easy to calculate and interpret. Next, it can be used on nominal data.
Further, it can be applied in a wide range of areas, including surveys, business decision making, quality
control, biological research and medical research. Chi-square tests are also commonly used in studies
dealing with demographics, Likert scales, and other discrete data.
The chi-square formula is:
$$\chi^2_c = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
The subscript “c” is the degrees of freedom, “O” is your observed value and “E” is
your expected value. It’s very rare that you’ll want to actually use this formula to find a
chi-square value by hand. The summation symbol means that you’ll have to perform
a calculation for every single category (or cell) in your data set. As you can probably imagine,
the calculations can get very lengthy and tedious. Instead, you’ll probably want to use
technology:
Chi Square Test in SPSS.
Chi Square P-Value in Excel.
A chi-square statistic is one way to show a relationship between two categorical variables. In
statistics, there are two types of variables: numerical (countable) variables and non-
numerical (categorical) variables. The chi-squared statistic is a single number that tells you
how much difference exists between your observed counts and the counts you would expect
if there were no relationship at all in the population.
There are a few variations on the chi-square statistic. Which one you use depends upon
how you collected the data and which hypothesis is being tested. However, all of the
variations use the same idea, which is that you are comparing your expected values with the
values you actually collect. One of the most common forms can be used for contingency
tables:
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
where O is the observed value, E is the expected value and “i” is the “ith” position in the
contingency table.
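As a rough sketch of how this statistic could be computed for a contingency table, the Python below builds the expected count for each cell from the row and column totals (row total × column total / grand total, the usual "no relationship" expectation) and sums the components. The function name `chi_square_independence` and the 2×2 table of counts are hypothetical and chosen only for illustration.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows = customer segment, columns = (accepted, rejected)
observed_table = [
    [30, 20],
    [20, 30],
]
statistic, dof = chi_square_independence(observed_table)
print(f"chi-square = {statistic:.3f}, df = {dof}")
```

Here every expected count is 25, so each of the four cells contributes (5)^2 / 25 to the total.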
A low value for chi-square means there is close agreement between your two sets of data (the
observed and the expected values). In theory, if your observed and expected values were equal
(“no difference”) then chi-square would be zero, an event that is unlikely to happen in real
life. Deciding whether a chi-square test statistic is large enough to indicate a statistically
significant difference isn’t as easy as it seems. It would be nice if we could say a chi-square
test statistic >10 means a difference, but unfortunately that isn’t the case: the cutoff depends
on the degrees of freedom.
You could take your calculated chi-square value and compare it to a critical value from a
chi-square table, looked up for your degrees of freedom and chosen alpha level. If the
chi-square value is more than the critical value, then there is a significant difference.
You could also use a p-value. First state the null hypothesis and the alternate hypothesis.
Then generate a chi-square curve for your results along with a p-value (for example, in
Excel). Small p-values (under 5%) usually indicate that a difference is significant, that is,
the p-value is small enough to reject the null hypothesis.
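The p-value approach can be sketched without a table or a spreadsheet. The Python below is a minimal illustration, assuming whole-number degrees of freedom; it uses the standard finite-series expressions for the upper tail of the chi-square distribution. The function name `chi_square_p_value` and the example statistic of 4.0 with 1 degree of freedom are hypothetical.

```python
import math


def chi_square_p_value(statistic, dof):
    """Upper-tail probability P(X >= statistic) for a chi-square variable
    with `dof` degrees of freedom (dof must be a positive integer)."""
    if statistic <= 0:
        return 1.0
    half = statistic / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum over j < dof/2 of (x/2)^j / j!
        term = 1.0
        total = 1.0
        for j in range(1, dof // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) plus a finite correction series
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * statistic / math.pi) * math.exp(-half)
    total = 0.0
    for j in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= statistic / (2 * j + 1)
    return tail + total


alpha = 0.05  # chosen significance level
p_value = chi_square_p_value(4.0, 1)  # hypothetical statistic and df
print(f"p = {p_value:.4f}, significant at 5%: {p_value < alpha}")
```

Comparing the p-value with alpha leads to the same decision as comparing the statistic with the tabulated critical value for the same degrees of freedom.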
Tip: The chi-square statistic can only be used on counts. It can’t be used for
percentages, proportions, means or similar statistical values. For example, if you have 10
percent of 200 people, you would need to convert that to a count (20) before you can run
a test statistic.
The chi-square distribution (also called the chi-squared distribution) is a special case of
the gamma distribution: a chi square distribution with n degrees of freedom is equal to a
gamma distribution with shape a = n / 2 and rate b = 0.5 (equivalently, scale β = 2).
Let’s say you have a random sample taken from a standard normal distribution. The chi square
distribution is the distribution of the sum of the squares of these sampled values. The degrees
of freedom (k) are equal to the number of values being summed. For example, if you
have taken 10 values from the standard normal distribution, then df = 10. The degrees of
freedom in a chi square distribution is also its mean. In this example, the mean of this
particular distribution will be 10. Chi square distributions are always right skewed. However,
the greater the degrees of freedom, the more the chi square distribution looks like a normal
distribution.
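A small simulation can make this concrete. The sketch below, which assumes standard normal draws and an arbitrary fixed seed, sums k = 10 squared values many times and checks that the sample mean is near 10 and that the mean sits above the median, as expected for a right-skewed distribution. The helper name `chi_square_draw` is illustrative only.

```python
import random
import statistics


def chi_square_draw(k, rng):
    """One draw: the sum of k squared standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(0)  # fixed seed so the run is repeatable
k = 10                  # degrees of freedom
draws = [chi_square_draw(k, rng) for _ in range(20000)]

sample_mean = statistics.mean(draws)
sample_median = statistics.median(draws)
print(f"mean = {sample_mean:.2f}, median = {sample_median:.2f}")
```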
Uses
The chi-squared distribution has many uses in statistics, including:
Chi Distribution
A similar distribution is the chi distribution. This distribution describes the square root of
a variable distributed according to a chi-square distribution.; with df = n > 0 degrees of
freedom has a probability density function of:
f(x) = 2(1-n/2) x(n-1) e(-(x2)/2) / Γ(n/2)
For values where x is positive.
The cdf for this function does not have a closed form, but it can be approximated with a
series of integrals, using calculus.
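As an illustration of the density and of approximating the cdf numerically, here is a minimal Python sketch. The function names `chi_pdf` and `chi_cdf`, the midpoint rule, and the step count are assumptions made for this example, not part of the original text.

```python
import math


def chi_pdf(x, n):
    """Density of the chi distribution with n degrees of freedom (x > 0)."""
    if x <= 0:
        return 0.0
    return 2 ** (1 - n / 2) * x ** (n - 1) * math.exp(-x * x / 2) / math.gamma(n / 2)


def chi_cdf(x, n, steps=10000):
    """Approximate P(X <= x) with the midpoint rule on [0, x]."""
    if x <= 0:
        return 0.0
    width = x / steps
    return width * sum(chi_pdf((i + 0.5) * width, n) for i in range(steps))


print(f"pdf at x=1, n=3: {chi_pdf(1.0, 3):.4f}")
print(f"cdf at x=1, n=2: {chi_cdf(1.0, 2):.4f}")
```

For n = 2 the chi distribution reduces to the Rayleigh distribution, whose cdf is 1 - exp(-x^2/2), which gives a convenient check on the numerical result.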
The chi-square formula is a difficult formula to deal with. That’s mostly because you’re
expected to add up a large number of values. The easiest way to solve the formula is by
making a table.
Sample question: 256 visual artists were surveyed to find out their zodiac sign. The results
were: Aries (29), Taurus (24), Gemini (22), Cancer (19), Leo (21), Virgo (18), Libra (19),
Scorpio (20), Sagittarius (23), Capricorn (18), Aquarius (20), Pisces (23). Test the hypothesis
that zodiac signs are evenly distributed across visual artists.
Step 1: Make a table with columns for “Categories,” “Observed,” “Expected,” “Residual
(Obs-Exp)”, “(Obs-Exp)^2” and “Component (Obs-Exp)^2 / Exp.” Don’t worry what these mean
right now; we’ll cover that in the following steps.
Step 2: Fill in your categories. Categories should be given to you in the question. There
are 12 zodiac signs, so the table has 12 rows.
Step 3: Write your counts in column 2. Counts are the number of items in each category.
You’re given the counts in the question.
Step 4: Calculate your expected value for column 3. In this question, we would expect
the 12 zodiac signs to be evenly distributed for all 256 people, so 256/12=21.333. Write this
in column 3.
Step 5: Subtract the expected value (Step 4) from the observed value (Step 3) and
place the result in the “Residual” column. For example, the first row is Aries:
29-21.333=7.667.
Step 6: Square your results from Step 5 and place the amounts in the (Obs-Exp)^2 column.
Step 7: Divide the amounts in Step 6 by the expected value (Step 4) and place those
results in the final column. The sum of the values in the final column is the chi-square
statistic, which for this data is approximately 5.094.
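The table-building steps can also be sketched in a few lines of Python. The counts below are taken from the sample question; the variable names and the column layout are illustrative. The degrees of freedom are taken as the number of categories minus one, which is the usual choice when testing a single set of categories against an even distribution.

```python
counts = {
    "Aries": 29, "Taurus": 24, "Gemini": 22, "Cancer": 19,
    "Leo": 21, "Virgo": 18, "Libra": 19, "Scorpio": 20,
    "Sagittarius": 23, "Capricorn": 18, "Aquarius": 20, "Pisces": 23,
}

total = sum(counts.values())     # 256 artists surveyed
expected = total / len(counts)   # even split: 256 / 12 = 21.333...

print(f"{'Category':<12}{'Observed':>9}{'Expected':>10}"
      f"{'Residual':>10}{'Squared':>10}{'Component':>11}")
chi_square = 0.0
for sign, observed in counts.items():
    residual = observed - expected    # Step 5
    squared = residual ** 2           # Step 6
    component = squared / expected    # Step 7
    chi_square += component           # running sum of the final column
    print(f"{sign:<12}{observed:>9}{expected:>10.3f}"
          f"{residual:>10.3f}{squared:>10.3f}{component:>11.3f}")

dof = len(counts) - 1
print(f"chi-square = {chi_square:.3f} with {dof} degrees of freedom")
```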
SPSS Instructions.
You’ll find the chi square test in SPSS under “Crosstabs”.
Step 1: Click “Analyze,” then click “Descriptive Statistics,” then click “Crosstabs.”
Step 2: Click the “Statistics” button. The statistics button is to the right of the Crosstabs
window. A new pop up window will appear.
Step 3: Click “Chi Square” to place a check in the box and then click “Continue” to return
to the Crosstabs window.
Step 4: Select the variables you want to run (in other words, choose two variables that
you want to compare using the chi square test). Click one variable in the left window and
then click the arrow at the top to move the variable into “Row(s).” Repeat to add a second
variable to the “Column(s)” window.
Step 5: Click “Cells” and then check “Rows” and “Columns”. Click “Continue.”
Step 6: Click “OK” to run the Chi Square Test. The Chi Square tests will be returned at
the bottom of the output sheet in the “Chi Square Tests” box.
Step 7: Compare the p-value returned in the chi-square area (listed in the Asymp Sig
column) to your chosen alpha level. If the p-value is smaller than alpha, the result is
statistically significant.
A chi-square test for independence shows whether categorical variables are related. There are a
few variations on the statistic; which one you use depends upon how you collected the data.
It also depends on how your hypothesis is worded. All of the variations use the same idea:
you are comparing the values you expect to get (expected values) with the values you
actually collect (observed values). One of the most common forms can be used in
a contingency table.
The chi square hypothesis test is appropriate if you have: