
Chi-Square: History and Definition

The document discusses the chi-square distribution and the chi-square test, including their history, uses, calculations, and applications. The chi-square distribution describes the sum of squares of independent standard normal variables. A chi-square test is used to test goodness of fit, or independence between categorical variables.

Uploaded by Nhia Oblea
Copyright © All Rights Reserved

Chi-square

History and Definition


A chi-square distribution with k degrees of freedom is the distribution of the sum of the squares of k independent standard normal random variables. A chi-square test is a statistical hypothesis test in which the test statistic follows a chi-square distribution when the null hypothesis is true.
The chi-square distribution was first introduced by the German statistician Friedrich Robert Helmert, and the chi-square test was first used by Karl Pearson in 1900. The most widely used chi-square test is Pearson's chi-squared test, usually just called the chi-squared test and denoted by 'Χ²'. A classic example is the test for fairness of a die, where the null hypothesis is that all six possible outcomes are equally likely.
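Below is a minimal Python sketch of the die example. The roll counts are hypothetical and were made up for illustration. The cutoff 11.070 is the standard chi-square table value for 5 degrees of freedom (six faces minus one) at α = 0.05. Everything else is plain arithmetic on Pearson's formula.

```python
# Hypothetical counts from 60 rolls of a six-sided die (illustrative data only).
observed = [8, 9, 12, 11, 6, 14]

n_rolls = sum(observed)
# Under H0 (a fair die), each face is expected n_rolls / 6 times.
expected = n_rolls / len(observed)

# Pearson's statistic: sum over faces of (observed - expected)^2 / expected
chi_sq = sum((o - expected) ** 2 / expected for o in observed)

df = len(observed) - 1      # 6 faces - 1 = 5 degrees of freedom
critical_value = 11.070     # chi-square table value for df = 5, alpha = 0.05

print(f"chi-square = {chi_sq:.3f}, df = {df}")
if chi_sq > critical_value:
    print("Reject H0: the die appears to be unfair.")
else:
    print("Fail to reject H0: no evidence that the die is unfair.")
```

With these made-up counts the statistic is 4.2, well below 11.070, so the test gives no evidence that the die is unfair. If SciPy is available, `scipy.stats.chisquare(observed)` runs the same goodness-of-fit test (uniform expected counts by default) and also returns a p-value.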

 
Illustration
Say you are a technology company selling different software solutions, and you want to predict customer acceptance of your latest offering. You could conduct a pilot test among your prospects and collect data on their experience. The normality of this data would then be checked and verified, and a chi-square analysis conducted. The analysis may reveal that the software needs additional features to be more useful and user-friendly. This would give you a better idea of how likely your customers are to accept your new software solution.

Practical Applications and Benefits of Chi-square
A chi-square test is useful for testing the 'goodness of fit' of an observed distribution to a theoretical distribution. For qualitative data, it can also test the 'independence' of two criteria of classification. The chi-square distribution is also used to estimate a confidence interval for the standard deviation of a normally distributed population from the sample standard deviation. It appears in other tests as well, such as ANOVA and Friedman's rank ANOVA.
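Here is a sketch of the confidence-interval use, assuming the population is normal. Take n observations with sample standard deviation s. An interval for σ runs from √((n−1)s²/χ²_{1−α/2}) to √((n−1)s²/χ²_{α/2}), where χ²_p is the p-quantile of the chi-square distribution with n − 1 degrees of freedom. The helper names `chi2_cdf`, `chi2_ppf` and `sd_confidence_interval` are our own. The quantiles come from bisection on a series approximation of the chi-square CDF, so the example needs only Python's standard library. The sample values (n = 10, s = 2.0) are hypothetical.

```python
import math

def chi2_cdf(x: float, df: int) -> float:
    """P(X <= x) for X ~ chi-square(df).

    Uses the power series for the regularized lower incomplete gamma
    function P(df/2, x/2); adequate for illustration, not tuned for
    extreme tails.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a              # n = 0 term of sum z^n / (a(a+1)...(a+n))
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    # Multiply by z^a * e^(-z) / Gamma(a), computed in log space.
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))

def chi2_ppf(q: float, df: int) -> float:
    """Quantile: the x with chi2_cdf(x, df) == q, found by bisection."""
    lo, hi = 0.0, 10.0 * df + 100.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def sd_confidence_interval(s: float, n: int, conf: float = 0.95):
    """CI for a normal population's sigma from sample SD s and sample size n."""
    df = n - 1
    alpha = 1.0 - conf
    lower = math.sqrt(df * s**2 / chi2_ppf(1.0 - alpha / 2.0, df))
    upper = math.sqrt(df * s**2 / chi2_ppf(alpha / 2.0, df))
    return lower, upper

# Hypothetical sample: n = 10 observations with sample standard deviation 2.0
print(sd_confidence_interval(2.0, 10))
```

For df = 9 the bisection should land near the tabled values 2.700 and 19.023, giving an interval of roughly 1.38 to 3.65. In practice, a library routine such as SciPy's `scipy.stats.chi2.ppf` would replace the hand-written helpers.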

The advantages of the chi-square test stem from the fact that it is a non-parametric test. First, it is easy to calculate and interpret. Second, it can be used on nominal data.

It can also be applied in a wide range of areas, including surveys, business decision making, quality control, biological research and medical research. Chi-square tests are also commonly used in studies dealing with demographics, Likert scales and other discrete data.

Chi-Square Statistic: How to Calculate It / Distribution
What is a Chi Square Test?
There are two types of chi-square tests. Both use the chi-square statistic and distribution, but for different purposes:
• A chi-square goodness of fit test determines whether sample data match a population (a hypothesized distribution).
• A chi-square test for independence compares two variables in a contingency table to see whether they are related. More generally, it tests whether distributions of categorical variables differ from each other.
In both cases, a very small chi-square statistic means that your observed data fit your expected data very well. In other words, the data are consistent with the null hypothesis; for a test of independence, there is no evidence of a relationship. A very large chi-square statistic means that the data do not fit the expected values well; for a test of independence, this is evidence that the variables are related.

What is a Chi-Square Statistic?


The formula for the chi-square statistic used in the chi-square test is:

$$\chi^2_c = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

The subscript c denotes the degrees of freedom. O is an observed value and E is the corresponding expected value. You will rarely want to compute this statistic by hand. The summation means that you have to perform a calculation for every category (every cell of the table) in your data, and as you can imagine, the calculations can get lengthy and tedious. Instead, you'll probably want to use software such as SPSS or Excel.
A chi-square statistic is one way to show a relationship between two categorical variables. In statistics, there are two types of variables: numerical (countable) variables and non-numerical (categorical) variables. The chi-square statistic is a single number that tells you how much difference exists between your observed counts and the counts you would expect if there were no relationship at all in the population.
There are a few variations on the chi-square statistic. Which one you use depends
upon how you collected the data and which hypothesis is being tested. However, all of the
variations use the same idea, which is that you are comparing your expected values with the
values you actually collect. One of the most common forms can be used for contingency
tables:


$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where O_i is the observed value, E_i is the expected value, and i indexes the positions (cells) of the contingency table.

A low chi-square value means that your observed counts are close to the expected counts. In theory, if your observed and expected values were equal ("no difference"), chi-square would be zero, an event that is unlikely to happen in real life. Deciding whether a chi-square statistic is large enough to indicate a statistically significant difference isn't as easy as it seems. It would be nice if we could say that a chi-square statistic above 10 always means a difference, but unfortunately that isn't the case. The cutoff depends on the degrees of freedom and on the significance level you choose.
You could take your calculated chi-square value and compare it to a critical value
from a chi-square table. If the chi-square value is more than the critical value, then there is
a significant difference.
You could also use a p-value. First, state the null hypothesis and the alternative hypothesis. Then use the chi-square distribution for your degrees of freedom to find the p-value of your statistic (for example, in Excel). Small p-values (typically under 5%) indicate that the difference is statistically significant.
Tip: The chi-square statistic can only be computed from counts (frequencies). It can't be used with percentages, proportions, means or similar summary values. For example, if you have 10 percent of 200 people, you need to convert that to a count (20) before you can run the test.
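The sketch below shows why the tip matters, using hypothetical figures. Suppose 200 people were surveyed, 10% of whom answered "yes", and the answers are tested against an expected 50/50 split. The statistic grows with the number of observations. As a result, computing it directly from percentages, which behave like a sample of 100, gives a different and misleading value. The helper name `pearson_chi_square` is our own.

```python
def pearson_chi_square(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical survey of 200 people: 10% "yes", 90% "no",
# tested against an expected 50/50 split.
n = 200
pct_observed = [10, 90]     # percentages
pct_expected = [50, 50]

# Correct: convert percentages to counts first (10% of 200 = 20 people).
counts_observed = [p * n / 100 for p in pct_observed]   # [20.0, 180.0]
counts_expected = [p * n / 100 for p in pct_expected]   # [100.0, 100.0]

print(pearson_chi_square(counts_observed, counts_expected))  # from counts
print(pearson_chi_square(pct_observed, pct_expected))        # misuses percentages
```

The count-based statistic (128) is exactly double the percentage-based one (64). That is because the sample of 200 is twice the implicit "sample of 100" that percentages represent.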

Chi Square P-Values


A chi-square test will give you a p-value, which tells you whether your test results are significant. To perform a chi-square test and get the p-value, you need two pieces of information:
1. Degrees of freedom. For a goodness of fit test, that's just the number of categories minus 1. For a test of independence on a contingency table, it is (number of rows − 1) × (number of columns − 1).
2. The alpha level (α). This is chosen by you, the researcher. The usual alpha level is 0.05 (5%), but you could also use other levels such as 0.01 or 0.10.
In elementary statistics or AP statistics, both the degrees of freedom (df) and the alpha level are usually given to you in a question. If you do have to figure out the df yourself, it's pretty simple: count the categories and subtract 1.
Degrees of freedom are written as a subscript after the chi-square symbol (Χ²). For example, $\chi^2_6$ denotes a chi-square with 6 df, and $\chi^2_4$ one with 4 df.

The Chi-Square Distribution


The chi-square distribution (also called the chi-squared distribution) is a special case of the gamma distribution. A chi-square distribution with n degrees of freedom is a gamma distribution with shape a = n/2 and rate b = 0.5 (equivalently, scale β = 2).
Suppose you take k independent values from a standard normal distribution. The chi-square distribution is the distribution of the sum of the squares of these values. Its degrees of freedom (k) equal the number of squared values being summed. For example, if you sum the squares of 10 standard normal values, then df = 10. The mean of a chi-square distribution equals its degrees of freedom, so the mean of this particular distribution is 10. Chi-square distributions are always right-skewed. However, the greater the degrees of freedom, the more the chi-square distribution looks like a normal distribution.
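Simulation offers a quick way to check these properties. The standard-library sketch below sums k squared standard normal draws and compares the sample mean and variance with the theoretical values k and 2k. It also draws from a gamma distribution with shape k/2 and scale 2 using `random.gammavariate`, whose mean should match. The seed, the sample size and k = 10 are arbitrary choices for illustration.

```python
import random
import statistics

def simulate_chi_square(k: int, n_draws: int, rng: random.Random) -> list[float]:
    """Draw n_draws values of Z_1^2 + ... + Z_k^2 with each Z_i standard normal."""
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(n_draws)]

rng = random.Random(12345)   # fixed seed so the run is reproducible
k = 10
draws = simulate_chi_square(k, 20_000, rng)

# Theory: a chi-square with k df has mean k and variance 2k.
print(f"sample mean     = {statistics.fmean(draws):.2f}  (theory: {k})")
print(f"sample variance = {statistics.variance(draws):.2f}  (theory: {2 * k})")

# Same distribution via the gamma connection: shape k/2, scale 2.
gamma_draws = [rng.gammavariate(k / 2, 2.0) for _ in range(20_000)]
print(f"gamma mean      = {statistics.fmean(gamma_draws):.2f}  (theory: {k})")
```

With 20,000 draws, the sample means should land close to 10 and the sample variance close to 20. The exact figures depend on the seed.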

Uses
The chi-squared distribution has many uses in statistics, including:
• Confidence interval estimation for a population standard deviation of a normal distribution from a sample standard deviation.
• Independence of two criteria of classification of qualitative variables.
• Relationships between categorical variables (contingency tables).
• Sample variance study when the underlying distribution is normal.
• Tests of deviations of differences between expected and observed frequencies (one-way tables).
• The chi-square test (a goodness of fit test).

Chi Distribution
A similar distribution is the chi distribution, which describes the square root of a variable that follows a chi-square distribution. The chi distribution with df = n > 0 degrees of freedom has the probability density function

$$f(x) = \frac{2^{1-n/2}\, x^{n-1}\, e^{-x^2/2}}{\Gamma(n/2)}$$

for positive values of x.

The CDF of this distribution has no elementary closed form. It equals the regularized lower incomplete gamma function P(n/2, x²/2), which is evaluated numerically.
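The density above can be coded directly. The sketch below is a plain-Python transcription of that formula; the function names `chi_pdf` and `integrate` are our own. It runs two sanity checks. First, a trapezoidal-rule integral over [0, 20] should come out very close to 1. Second, the density is compared with the half-normal density, which the chi distribution reduces to when n = 1.

```python
import math

def chi_pdf(x: float, n: int) -> float:
    """Density of the chi distribution with n degrees of freedom."""
    if x < 0:
        return 0.0
    # 2^(1 - n/2) * x^(n - 1) * exp(-x^2 / 2) / Gamma(n / 2)
    return 2 ** (1 - n / 2) * x ** (n - 1) * math.exp(-x * x / 2) / math.gamma(n / 2)

def integrate(f, a: float, b: float, steps: int = 20_000) -> float:
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h

# The density should integrate to (almost exactly) 1 for any n.
for n in (1, 2, 3, 10):
    print(n, round(integrate(lambda x: chi_pdf(x, n), 0.0, 20.0), 6))

# For n = 1 it should match the half-normal density sqrt(2/pi) * exp(-x^2 / 2).
print(chi_pdf(1.0, 1), math.sqrt(2 / math.pi) * math.exp(-0.5))
```

Each integral should print a value very close to 1, and the last line should print two essentially identical numbers.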

How to Calculate a Chi Square Statistic


A chi-square statistic is used for testing hypotheses. The steps below show how to calculate one using the chi-square formula:

$$\chi^2_c = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

The chi-square formula can be tedious to work with, mostly because you have to add up a large number of terms. The easiest way to solve it is by making a table.
Sample question: 256 visual artists were surveyed to find out their zodiac sign. The results
were: Aries (29), Taurus (24), Gemini (22), Cancer (19), Leo (21), Virgo (18), Libra (19),
Scorpio (20), Sagittarius (23), Capricorn (18), Aquarius (20), Pisces (23). Test the hypothesis
that zodiac signs are evenly distributed across visual artists.
Step 1: Make a table with columns for “Categories,” “Observed,” “Expected,” “Residual (Obs − Exp),” “(Obs − Exp)²” and “Component (Obs − Exp)² / Exp.” Don’t worry about what these mean right now; we’ll cover them in the following steps.

Step 2: Fill in your categories. Categories should be given to you in the question. There are 12 zodiac signs, so the table has 12 rows, one per sign.
Step 3: Write your counts (the number of items in each category) in column 2. You’re given the counts in the question.

Step 4: Calculate your expected value for column 3. In this question, we would expect
the 12 zodiac signs to be evenly distributed for all 256 people, so 256/12=21.333. Write this
in column 3.

Step 5: Subtract the expected value (Step 4) from the observed value (Step 3) and place the result in the “Residual” column. For example, the first row is Aries: 29 − 21.333 = 7.667.

Step 6: Square your results from Step 5 and place the amounts in the (Obs − Exp)² column.
Step 7: Divide the amounts in Step 6 by the expected value (Step 4) and place those
results in the final column.

Step 8: Add up (sum) all the values in the last column.

This is the chi-square statistic: 5.094.
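Steps 1 to 8 map naturally onto a loop. The Python sketch below uses only the standard library, and its variable names are our own. It reproduces each column of the table for the zodiac data in the sample question, then sums the final column.

```python
signs = ["Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo", "Libra",
         "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces"]
observed = [29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23]

total = sum(observed)                  # 256 artists
expected = total / len(signs)          # Step 4: 256 / 12 = 21.333...

chi_sq = 0.0
print(f"{'Category':<12}{'Obs':>5}{'Exp':>9}{'Resid':>9}{'Resid^2':>10}{'Comp':>8}")
for sign, obs in zip(signs, observed):
    residual = obs - expected          # Step 5: Obs - Exp
    squared = residual ** 2            # Step 6: (Obs - Exp)^2
    component = squared / expected     # Step 7: (Obs - Exp)^2 / Exp
    chi_sq += component                # Step 8: running sum of the last column
    print(f"{sign:<12}{obs:>5}{expected:>9.3f}{residual:>9.3f}"
          f"{squared:>10.3f}{component:>8.3f}")

print(f"chi-square statistic = {chi_sq:.3f}")
```

The last line prints `chi-square statistic = 5.094`, matching Step 8.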

SPSS Instructions.
You’ll find the chi square test in SPSS under “Crosstabs”.

Sample problem: Run a chi square test in SPSS.


Note: to run a chi-square test in SPSS, you should already have written a hypothesis statement. See: How to state the null hypothesis.
Read the steps below:
Step 1: Click “Analyze,” then click “Descriptive Statistics,” then click “Crosstabs.”



Step 2: Click the “Statistics” button. The statistics button is to the right of the Crosstabs
window. A new pop up window will appear.

Step 3: Click “Chi Square” to place a check in the box and then click “Continue” to return
to the Crosstabs window.
Step 4: Select the variables you want to run (in other words, choose two variables that
you want to compare using the chi square test). Click one variable in the left window and
then click the arrow at the top to move the variable into “Row(s).” Repeat to add a second
variable to the “Column(s)” window.
Step 5: Click “cells” and then check “Rows” and “Columns”. Click “Continue.”
Step 6: Click “OK” to run the Chi Square Test. The Chi Square tests will be returned at
the bottom of the output sheet in the “Chi Square Tests” box.
Step 7: Compare the p-value returned in the chi-square area (listed in the Asymp Sig column) to your chosen alpha level.

How To Test a Chi Square Hypothesis (Test for Independence)
A chi-square test for independence shows how categorical variables are related. There are a
few variations on the statistic; which one you use depends upon how you collected the data.
It also depends on how your hypothesis is worded. All of the variations use the same idea;
you are comparing the values you expect to get (expected values) with the values you
actually collect (observed values). One of the most common forms can be used in
a contingency table.
The chi-square hypothesis test is appropriate if you have:
• Discrete outcomes (categorical).
• Dichotomous variables.
• Ordinal variables.
For example, you could have a clinical trial with blood sugar outcomes of hypoglycemic,
normoglycemic, or hyperglycemic.
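The sketch below makes the clinical-trial example concrete by running a test of independence on an invented 2 × 3 table: drug vs. placebo, by blood sugar outcome. The counts are hypothetical. Expected counts come from the usual rule E = (row total × column total) / grand total. The degrees of freedom are (rows − 1) × (columns − 1). This table has exactly 2 df, so the p-value can use the closed form e^(−x/2). For other df you would need a general chi-square routine.

```python
import math

# Hypothetical trial counts (invented for illustration only).
# Rows: treatment group; columns: blood sugar outcome.
columns = ["hypoglycemic", "normoglycemic", "hyperglycemic"]
table = [
    [10, 60, 30],   # drug group
    [15, 45, 40],   # placebo group
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        # Expected count if group and outcome are independent
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (observed - expected) ** 2 / expected

# Degrees of freedom for an r x c table: (r - 1) * (c - 1); here 1 * 2 = 2.
df = (len(table) - 1) * (len(columns) - 1)

# Shortcut valid ONLY because df == 2: P(X > x) = exp(-x / 2).
p_value = math.exp(-chi_sq / 2)

print(f"chi-square = {chi_sq:.3f}, df = {df}, p = {p_value:.4f}")
print("Reject H0: outcome depends on group" if p_value < 0.05
      else "Fail to reject H0: no evidence of a relationship")
```

With these invented counts, the statistic is about 4.571 and p ≈ 0.102, so at α = 0.05 the test would not reject independence. If SciPy is available, `scipy.stats.chi2_contingency(table)` returns the statistic, p-value, degrees of freedom and expected counts in one call.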


Test a Chi Square Hypothesis: Steps


Sample question: Test the chi-square hypothesis with the following characteristics:
1. 11 Degrees of Freedom
2. Chi square test statistic of 5.094
Note: Degrees of freedom equals the number of categories minus 1.
Step 1: Take the chi-square statistic and find the p-value in a chi-square table. For df = 11, the statistic 5.094 falls between the table values for upper-tail probabilities of .900 and .950, so the p-value is between 0.90 and 0.95.
Note: The chi-square table doesn’t offer exact values for every possibility. If you use a calculator or software, you can get a more precise value: the p-value is 0.9265.
Step 2: Use the p-value you found in Step 1 to decide whether to reject the null hypothesis. In general, small p-values (below an alpha level such as 1% or 5%) would cause you to reject the null hypothesis. This very large p-value (92.65%) means that the null hypothesis should not be rejected.
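The "calculator" value in Step 1 can be reproduced with a short sketch. The function below, `chi2_sf` (our own name), computes the upper-tail probability of the chi-square distribution from a series for the incomplete gamma function, using only the standard library. It is meant for moderate p-values like this one, not for extreme tails.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Computed as 1 - P(df/2, x/2), using the power series for the regularized
    lower incomplete gamma function. Fine for moderate p-values; not accurate
    for very small tail probabilities.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a              # n = 0 term of sum z^n / (a(a+1)...(a+n))
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

alpha = 0.05
p_value = chi2_sf(5.094, df=11)
print(f"p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

For a statistic of 5.094 with 11 degrees of freedom, this should give p ≈ 0.9265, matching the value quoted above, so the null hypothesis is not rejected at α = 0.05. SciPy users can get the same number from `scipy.stats.chi2.sf(5.094, 11)`.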

https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/chi-square/

https://www.researchoptimus.com/article/what-is-chi-square.php
