Module 17. Lesson Proper (1)
A chi-square test (often denoted χ²) is a statistical test used to assess whether the
observed frequencies in categorical data differ significantly from the frequencies
expected under a hypothesis. It is one of the most common nonparametric tests, meaning
it does not rely on assumptions about the shape of the underlying population
distribution, in particular the normality assumption that many parametric tests require.
The chi-square test is widely used to analyze categorical data in fields such as the
social sciences, biology, marketing, and medicine.
Key Concepts
Goodness-of-fit test:
• This test is used when you want to assess whether a single categorical variable
follows a hypothesized distribution.
• For example, you might want to test whether a bird feeder attracts different bird
species in equal proportions, or whether a die is fair (i.e., each face is equally likely).
• Null hypothesis: There is no difference between the observed and expected
frequencies.
• Alternative hypothesis: There is a significant difference between the observed
and expected frequencies.
Test of independence:
• This test is used when you want to assess whether two categorical variables are
independent or related to each other.
• For example, you might test whether there is an association between gender and
voting preference (i.e., do men and women vote differently?).
• Null hypothesis: The two variables are independent (no association).
• Alternative hypothesis: The two variables are dependent (there is an
association).
When to Use a Chi-Square Test
Chi-square tests are nonparametric techniques that test hypotheses about the form of
an entire frequency distribution. They are used with categorical (nominal or ordinal)
data expressed as frequencies or proportions. The observed frequencies are compared
with the expected frequencies, and the differences are tested at the desired level of
significance.
Formula:
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
Where:
$f_o$ = observed frequency
$f_e$ = expected frequency
A large value of $\chi^2$ indicates a large discrepancy between $f_o$ and $f_e$ and may
warrant rejection of the null hypothesis.
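As a minimal sketch of how the formula could be evaluated, the Python function below sums the squared differences between observed and expected frequencies, each divided by the expected frequency. The function name `chi_square_statistic` and the die-roll counts are illustrative assumptions and do not come from the lesson.

```python
def chi_square_statistic(observed, expected):
    """Return the chi-square statistic: the sum of (f_o - f_e)^2 / f_e."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Hypothetical data (not from the lesson): 60 rolls of a die,
# with 10 rolls expected per face if the die is fair.
observed_rolls = [8, 12, 10, 10, 9, 11]
expected_rolls = [10] * 6

die_statistic = chi_square_statistic(observed_rolls, expected_rolls)
print(die_statistic)
```

The statistic is small when the observed counts sit close to the expected counts and grows as they diverge.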
Contingency Table
A contingency table is a cross-tabulation of classes of observations, with the
frequency for each class shown. A one-way classification table has only one row of
observed frequencies. A two-way classification table has r rows and m columns of
observed frequencies.
1. Goodness of Fit
The chi-square test for goodness of fit is used to assess how well observed sample
data match the expected proportions under a specific hypothesis about the population
distribution. The test compares the observed frequencies in each category to the
expected frequencies, based on the proportions specified in the null hypothesis, to
determine whether there is a significant difference between them.
The expected frequencies ($f_e$) for the goodness of fit test are determined by:
$$f_e = pn$$
Where:
$p$ = hypothesized proportion of observations in the category
$n$ = sample size
Expected frequencies are computed values and may be decimals.
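The short Python sketch below illustrates the rule that each expected frequency is the hypothesized proportion multiplied by the sample size. It assumes a scenario with four equally likely categories and n = 50. The function name `expected_frequencies` is hypothetical, and the equal proportions are an assumption chosen to show that expected frequencies need not be whole numbers.

```python
def expected_frequencies(proportions, n):
    """Return f_e = p * n for each hypothesized proportion p."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return [p * n for p in proportions]


# Assumed scenario: four categories, equal proportions, n = 50.
equal_proportions = [0.25, 0.25, 0.25, 0.25]
fe = expected_frequencies(equal_proportions, 50)
print(fe)  # [12.5, 12.5, 12.5, 12.5]
```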
Example: A psychologist examining art appreciation selected an abstract painting that
has no obvious top or bottom. Hangers were placed on the painting so that it could be
hung with any one of the four sides at the top. The painting was shown to a sample of
n = 50 participants, and each was asked to hang the painting in the orientation that
looked correct. The following data indicate how many people chose each of the four sides
to be placed at the top. (α = 0.05)
Step 2. Locate the critical region. For this example, the degrees of freedom are
$df = C - 1 = 4 - 1 = 3$
For df = 3 and α = 0.05, the chi-square table gives a critical value of 7.81.
The computed test statistic is $\chi^2 = 8.0$, which exceeds the critical value of 7.81
and therefore falls in the critical region.
2. Test of Independence
Two variables are independent when there is no consistent, predictable
relationship between them. In this case, the frequency distribution for one variable is not
related to the categories of the second variable. As a result, when two variables are
independent, the frequency distribution for one variable will have the same shape for all
categories of the second variable.
Degrees of freedom for the chi-square test for independence are determined by:
$$df = (C - 1)(R - 1)$$
Where:
$df$ = degrees of freedom
$C$ = number of columns
$R$ = number of rows
1 = constant
Example: A manufacturer of watches would like to examine preferences for digital versus
analog watches. A sample of n = 200 people is selected, and these individuals are
classified by age and preference. The manufacturer would like to know whether there is
a relationship between age and watch preference. The observed frequencies ($f_o$) are as
follows:
                 Digital   Analog   Undecided   Totals
Under 30            90       40        10         140
30 or over          10       40        10          60
Column totals      100       80        20       n = 200
Solution:
Step 1. State the hypotheses, and select an alpha level.
H0: Preference is independent of age. That is, the frequency distribution of preference
has the same form for people under 30 as for people 30 or over.
H1: Preference is related to age. That is, the type of watch preferred depends on a
person’s age.
Step 2. Locate the critical region. The degrees of freedom are
$df = (C - 1)(R - 1) = (3 - 1)(2 - 1) = 2$
For df = 2 and α = 0.05, the critical chi-square value is 5.99.
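The critical value of 5.99 is normally read from a chi-square table. As a rough check, the Python sketch below relies on the fact that, for df = 2 only, the upper-tail probability of the chi-square distribution is exp(-x/2), so the critical value is -2 ln(α). The function names are hypothetical, α = 0.05 is assumed, and the shortcut does not apply to other degrees of freedom.

```python
import math


def degrees_of_freedom(n_columns, n_rows):
    """Return df = (C - 1)(R - 1) for a test of independence."""
    return (n_columns - 1) * (n_rows - 1)


def critical_value_df2(alpha):
    """Chi-square critical value for df = 2 ONLY, using P(X > x) = exp(-x / 2)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    return -2.0 * math.log(alpha)


df = degrees_of_freedom(3, 2)        # 3 columns, 2 rows
critical = critical_value_df2(0.05)  # assumed alpha level
print(df, round(critical, 2))        # 2 5.99
```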
Step 3. Compute the test statistic. Find the expected frequencies ($f_e$) and calculate
the chi-square statistic.
$$f_e = \frac{f_c f_r}{n}$$
Where:
$f_e$ = expected frequency
$f_c$ = column total
$f_r$ = row total
$n$ = total sample size
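To show how the expected-frequency rule applies to the watch data, the sketch below multiplies each column total by each row total and divides by n, using the observed frequencies from the table in this example. The function name `expected_table` is an illustrative assumption.

```python
def expected_table(observed):
    """Return expected frequencies f_e = (column total * row total) / n."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[fc * fr / n for fc in column_totals] for fr in row_totals]


# Observed frequencies from the watch-preference table
# (columns: digital, analog, undecided).
watch_observed = [
    [90, 40, 10],  # under 30
    [10, 40, 10],  # 30 or over
]

watch_expected = expected_table(watch_observed)
for row in watch_expected:
    print(row)
```

Each row of expected frequencies adds up to the same row total as the observed data, which is a quick way to check the arithmetic.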
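Chi-square statistic:
$\chi^2 = 38.09$
Because 38.09 exceeds the critical value of 5.99, the null hypothesis is rejected: watch
preference is related to age.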
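The reported statistic can be checked term by term. The worked sum below applies $f_e = f_c f_r / n$ to the observed table, which gives expected frequencies of 70, 56, and 14 for the under-30 row and 30, 24, and 6 for the 30-or-over row, and then adds the six cell contributions. The rounding of the intermediate terms is a presentation choice rather than part of the original solution.

```latex
\begin{aligned}
\chi^2 &= \frac{(90-70)^2}{70} + \frac{(40-56)^2}{56} + \frac{(10-14)^2}{14}
        + \frac{(10-30)^2}{30} + \frac{(40-24)^2}{24} + \frac{(10-6)^2}{6} \\
       &\approx 5.714 + 4.571 + 1.143 + 13.333 + 10.667 + 2.667 \\
       &\approx 38.095
\end{aligned}
```

Truncated to two decimal places, this agrees with the value of 38.09 given above.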