STATISTICS Unit 7 Chi-Squared Introduction
Learning objectives
Here we see how we can test particular hypotheses using the chi-squared test, sometimes referred to as
'goodness of fit' testing. After studying this unit you should
• understand how the chi-squared test can be used to test whether data fit a uniform distribution, using critical values
• understand how the chi-squared test can be used to test the dependency between different attributes
by using a contingency table.
Notes
The first use of the test was by the English mathematician Karl Pearson (1857-1936) who was working
on correlation and regression problems. He wanted to test his methods using real data and, for
example, used over 4000 trials of roulette and tosses of a coin. Although he published his work in
1900, it was not until many decades later that it was recognised as an important advance in statistical
analysis.
Key points
• To test whether data fit a particular distribution using chi-squared, you need
observed data, $O_i$, and theoretical (expected) data, $E_i$
• Critical values for the chi-squared test are based on the significance level and number of degrees
of freedom
• For 2 × 2 contingency tables, you need to calculate the chi-squared value using Yates' continuity
correction
• Groups in contingency tables must be combined if expected values are less than 5.
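The decision rule in the key points above can be sketched in Python. This is a minimal illustration, not part of the original unit: the function names are hypothetical, the degrees-of-freedom rules are the standard ones assumed here (number of categories minus one when the expected values are fully specified, and (rows - 1) × (columns - 1) for a contingency table), and only the 5% critical values for 1 to 5 degrees of freedom are listed.

```python
# Upper-tail critical values of the chi-squared distribution at the 5%
# significance level, keyed by degrees of freedom (standard table values).
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def degrees_of_freedom_gof(num_categories):
    """Degrees of freedom for a goodness-of-fit test (assumed rule: no
    parameters are estimated from the data)."""
    return num_categories - 1


def degrees_of_freedom_table(num_rows, num_cols):
    """Degrees of freedom for a contingency table."""
    return (num_rows - 1) * (num_cols - 1)


def reject_null(chi_squared, df):
    """Return True if the statistic exceeds the 5% critical value."""
    return chi_squared > CRITICAL_5_PERCENT[df]
```

For instance, a 2 × 2 contingency table has `degrees_of_freedom_table(2, 2) == 1`, so its statistic would be compared with 3.841 at the 5% level.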
Facts to remember
• The formula for chi-squared ($\chi^2$) is given by
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ are the observed values and $E_i$ are the expected values.
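As a rough illustration of the formula, the sketch below computes the statistic for a test of a uniform distribution. The die-roll counts are hypothetical data invented for this example, and the function name `chi_squared` is an assumption rather than anything defined in the unit.

```python
def chi_squared(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: a die rolled 60 times, counts for faces 1 to 6.
observed = [8, 12, 9, 11, 6, 14]

# Under a uniform distribution every face has the same expected count.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

statistic = chi_squared(observed, expected)
print(round(statistic, 2))  # prints 4.2
```

With 6 categories there are 5 degrees of freedom, and the 5% critical value is 11.070, so a statistic of 4.2 would give no reason to reject the hypothesis of a uniform distribution for these made-up counts.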
CMM Subject Support Strand: STATISTICS Unit 7 Chi-Squared: Introduction
Glossary of terms
Observed and expected frequencies: observed data, $O_i$, and corresponding theoretical values, $E_i$,
based on the distribution being tested
Contingency table: gives the numbers in different categories for 2 factors (that may or may not be
related), e.g.
          French   Russian   Total
Male          39        16      55
Female        21        14      35
Total         60        30      90
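The sketch below shows one way the expected frequencies and the chi-squared value might be computed for this 2 × 2 table. It assumes the usual rule that each expected value is (row total × column total) / grand total, and the usual form of Yates' continuity correction, in which 0.5 is subtracted from each |O - E| before squaring. The variable names are illustrative only.

```python
# Observed counts from the table: rows are Male, Female;
# columns are French, Russian.
table = [[39, 16],
         [21, 14]]

row_totals = [sum(row) for row in table]          # [55, 35]
col_totals = [sum(col) for col in zip(*table)]    # [60, 30]
grand_total = sum(row_totals)                     # 90

# Expected count for each cell, assuming the two factors are independent.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-squared with Yates' continuity correction (used for 2 x 2 tables).
chi_sq_yates = sum(
    (abs(o - e) - 0.5) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)

print(round(chi_sq_yates, 3))  # prints 0.707
```

All four expected values here are above 5, so no groups need combining. The table has 1 degree of freedom, and 0.707 is well below the 5% critical value of 3.841, so these figures give no evidence of a relationship between the two factors.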