Module 17. Lesson Proper (1)

The Chi-square test

A chi-square test (often denoted χ²) is a statistical test used to assess whether there
is a significant difference between observed and expected frequencies in categorical
data. It is one of the most common nonparametric tests, meaning it doesn't rely on
assumptions about the underlying data distribution, particularly the assumption of
normality that parametric tests require. The chi-square test is widely used in various
fields to analyze categorical data, such as in social sciences, biology, marketing, and
medicine.

Key Concepts

1. Categorical Variables: The chi-square test is applicable when you have
categorical (nominal or ordinal) variables. These are variables that have discrete
categories, such as gender, species, or customer satisfaction ratings.
2. Frequency Distribution: The test evaluates whether the observed frequency
distribution (how data is distributed across categories) differs significantly from an
expected distribution. For example, if you expect the categories to be equally
distributed, you will test whether your data aligns with that expectation.

Types of Chi-Square Tests

1. Chi-Square Goodness of Fit Test

• This test is used when you want to assess whether a single categorical variable
follows a hypothesized distribution.
• For example, you might want to test if a bird feeder attracts different bird species
in equal proportions, or whether a die is fair (i.e., each face is equally likely).
• Null hypothesis: There is no difference between the observed and expected
frequencies.
• Alternative hypothesis: There is a significant difference between the observed
and expected frequencies.

2. Chi-Square Test of Independence

• This test is used when you want to assess whether two categorical variables are
independent or related to each other.
• For example, you might test whether there is an association between gender and
voting preference (i.e., do men and women vote differently?).
• Null hypothesis: The two variables are independent (no association).
• Alternative hypothesis: The two variables are dependent (there is an
association).

When to Use a Chi-Square Test

• You want to test hypotheses about one or more categorical variables.
• You have a sufficient sample size, typically an expected frequency of at least 5 in
each category.
• The data are from a random or representative sample of the population.
• The data are frequency counts rather than numerical scores, so parametric tests that
assume a normal distribution do not apply.

Chi-square tests are a type of nonparametric technique that tests hypotheses about the
form of the entire frequency distribution. The test is used with nominal or ordinal data
in the form of frequencies or proportions. The observed frequencies are compared to the
expected frequencies, and the differences are tested at the desired level of significance.

Formula:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

Where:
$f_o$ = observed frequency
$f_e$ = expected frequency
A large value of $\chi^2$ indicates a large discrepancy between $f_o$ and $f_e$ and may
warrant rejection of the null hypothesis.
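
The formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lesson: the function name `chi_square_statistic` and the sample counts are hypothetical and chosen only to show how the statistic grows as observed counts move away from expected counts.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (f_o - f_e)^2 / f_e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Hypothetical counts (not from the lesson), three categories, 30 observations.
expected = [10, 10, 10]
perfect_fit = chi_square_statistic([10, 10, 10], expected)  # no discrepancy
poor_fit = chi_square_statistic([20, 5, 5], expected)       # large discrepancy

print(perfect_fit, poor_fit)
```

When observed and expected counts agree exactly the statistic is zero, and it increases as the counts diverge.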

Contingency Table
This table consists of a cross-tabulation of classes of observations with the
frequencies for each class shown. A one-way classification table is one where only one
row of observations is given. A two-way classification table is one with r rows and m
columns of observed frequencies.

1. Goodness of Fit
The chi-square test for goodness of fit is used to assess how well observed sample
data match the expected proportions under a specific hypothesis about the population
distribution. The test compares the observed frequencies in each category to the
expected frequencies, based on the proportions specified in the null hypothesis, to
determine whether there is a significant difference between them.

The expected frequencies ($f_e$) for the goodness of fit test are determined by

$$f_e = pn$$

Where:
$p$ = hypothesized proportion of observations in the category
$n$ = sample size
The computed $f_e$ values may be decimals.
Example: A psychologist examining art appreciation selected an abstract painting that
has no obvious top or bottom. Hangers were placed on the painting so that it could be
hung with any one of the four sides at the top. The painting was shown to a sample of
n=50 participants, and each was asked to hang the painting in the orientation that looked
correct. The following data indicate how many people chose each of the four sides to be
placed at the top. (𝛼 = 0.05)

Top Up (Correct)    Bottom Up    Left Side Up    Right Side Up
18                  17           7               8
The question for the hypothesis test is whether there are any preferences among the four
possible orientations.

Step 1. State the hypotheses and select an alpha level.

H0: There is no preference for any specific orientation. Thus, the four possible
orientations are selected equally often, and the population distribution has the
following proportions.

Top Up (Correct)    Bottom Up    Left Side Up    Right Side Up
25%                 25%          25%             25%

H1: One or more of the orientations is preferred over the others.

Step 2. Locate the critical region. For this example, with C = 4 categories, the value for
degrees of freedom is

$$df = C - 1 = 4 - 1 = 3$$

For df = 3 and α = 0.05, the table of critical values for chi-square gives a critical
value of 7.81.
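
The lesson reads the critical value from a printed table. As a cross-check only, the sketch below recovers it numerically using the standard library. It assumes the closed-form cumulative distribution function that holds specifically for 3 degrees of freedom, and the function names `chi2_cdf_df3` and `critical_value_df3` are hypothetical helpers, not part of the lesson.

```python
import math


def chi2_cdf_df3(x):
    """CDF of the chi-square distribution with 3 degrees of freedom.

    Uses the closed form erf(sqrt(x/2)) - sqrt(2x/pi) * exp(-x/2),
    which is valid for df = 3 only.
    """
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def critical_value_df3(alpha, lo=0.0, hi=100.0, tol=1e-9):
    """Find x such that CDF(x) = 1 - alpha, by bisection."""
    target = 1 - alpha
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_cdf_df3(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


critical = critical_value_df3(0.05)
print(round(critical, 2))  # agrees with the tabled 7.81
```

For other degrees of freedom a statistics table or a statistics library would normally be used instead.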

Step 3. Calculate the chi-square statistic.

$$f_e = pn = \frac{1}{4}(50) = 12.5$$

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(18 - 12.5)^2}{12.5} + \frac{(17 - 12.5)^2}{12.5} + \frac{(7 - 12.5)^2}{12.5} + \frac{(8 - 12.5)^2}{12.5}$$

$$= \frac{30.25}{12.5} + \frac{20.25}{12.5} + \frac{30.25}{12.5} + \frac{20.25}{12.5}$$

$$= 2.42 + 1.62 + 2.42 + 1.62$$

$$\chi^2 = 8.08$$

Step 4. State a decision and a conclusion.

The obtained chi-square value exceeds the critical value of 7.81, so it is in the critical
region. Therefore, H0 is rejected and the researcher may conclude that the four
orientations are not equally likely to be preferred. Instead, there are significant
differences among the four orientations, with some selected more often and others less
often than would be expected by chance.
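
The four steps of this example can be replayed in Python as a check on the arithmetic. This is a sketch under the lesson's own numbers: the observed counts and the critical value of 7.81 come from the example, while the variable names are illustrative.

```python
# Observed frequencies from the painting-orientation example.
observed = {
    "Top up (correct)": 18,
    "Bottom up": 17,
    "Left side up": 7,
    "Right side up": 8,
}

n = sum(observed.values())   # sample size
p = 1 / len(observed)        # H0: each orientation has proportion 1/4
expected = p * n             # f_e = pn, the same for every category

# Chi-square statistic: sum of (f_o - f_e)^2 / f_e over the four categories.
chi_square = sum((fo - expected) ** 2 / expected for fo in observed.values())

df = len(observed) - 1       # df = C - 1
critical_value = 7.81        # tabled value for df = 3, alpha = 0.05

reject_h0 = chi_square > critical_value
print(f"f_e = {expected}, chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_h0}")
```

Because the decision depends only on whether the statistic exceeds the critical value, the comparison on the second-to-last line is the whole of Step 4.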

2. Test of Independence
Two variables are independent when there is no consistent, predictable
relationship between them. In this case, the frequency distribution for one variable is not
related to the categories of the second variable. As a result, when two variables are
independent, the frequency distribution for one variable will have the same shape for all
categories of the second variable.
Degrees of freedom for the chi-square test for independence are determined by:

$$df = (C - 1)(R - 1)$$

Where:
$df$ = degrees of freedom
$C$ = number of columns
$R$ = number of rows
1 = constant
Example: A manufacturer of watches would like to examine preferences for digital versus
analog watches. A sample of n=200 people is selected, and these individuals are
classified by age and preference. The manufacturer would like to know whether there is
a relationship between age and watch preference. The observed frequencies ($f_o$) are as
follows:
                 Digital    Analog    Undecided    Totals
Under 30         90         40        10           140
30 or over       10         40        10           60
Column totals    100        80        20           n = 200

Solution:
Step 1. State the hypotheses, and select an alpha level.
H0: Preference is independent of age. That is, the frequency distribution of preference
has the same form for people younger than 30 as for people 30 or older.
H1: Preference is related to age. That is, the type of watch preferred depends on a
person’s age.

Level of significance: 𝛼 = 0.05

Step 2. Locate the critical region.

$$df = (C - 1)(R - 1) = (3 - 1)(2 - 1) = 2$$

For df = 2 and α = 0.05, the critical chi-square value is 5.99.

Step 3. Compute the test statistic. Find the $f_e$ values and calculate the chi-square statistic.

$$f_e = \frac{f_c f_r}{n}$$

Where:
$f_e$ = expected frequency
$f_c$ = column total
$f_r$ = row total
$n$ = total sample size
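For younger than 30:

$$f_e = \frac{100(140)}{200} = \frac{14000}{200} = 70 \text{ for digital}$$

$$f_e = \frac{80(140)}{200} = \frac{11200}{200} = 56 \text{ for analog}$$

$$f_e = \frac{20(140)}{200} = \frac{2800}{200} = 14 \text{ for undecided}$$

For 30 years and older:

$$f_e = \frac{100(60)}{200} = \frac{6000}{200} = 30 \text{ for digital}$$

$$f_e = \frac{80(60)}{200} = \frac{4800}{200} = 24 \text{ for analog}$$

$$f_e = \frac{20(60)}{200} = \frac{1200}{200} = 6 \text{ for undecided}$$
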
Summary of the expected frequencies ($f_e$)

              Digital    Analog    Undecided
Under 30      70         56        14
30 or over    30         24        6
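
The expected-frequency table can be generated directly from the row and column totals of the observed table. The sketch below is illustrative only: the observed counts are those of the watch example, and the variable names are assumptions made for readability.

```python
# Observed frequencies: rows are age groups, columns are
# Digital, Analog, Undecided.
observed = [
    [90, 40, 10],  # under 30
    [10, 40, 10],  # 30 or over
]

row_totals = [sum(row) for row in observed]        # f_r for each row
col_totals = [sum(col) for col in zip(*observed)]  # f_c for each column
n = sum(row_totals)                                # total sample size

# f_e = (column total * row total) / n for every cell.
expected = [[fc * fr / n for fc in col_totals] for fr in row_totals]

for label, row in zip(["Under 30", "30 or over"], expected):
    print(label, row)
```

Each row of expected frequencies sums to its row total, and each column sums to its column total, which is a quick way to catch arithmetic slips.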

Summary of the calculations:

Category                  f_o    f_e    (f_o - f_e)    (f_o - f_e)^2    (f_o - f_e)^2 / f_e
Under 30 - digital        90     70     20             400              5.71
Under 30 - analog         40     56     -16            256              4.57
Under 30 - undecided      10     14     -4             16               1.14
30 or over - digital      10     30     -20            400              13.33
30 or over - analog       40     24     16             256              10.67
30 or over - undecided    10     6      4              16               2.67

Chi-square statistic:

$$\chi^2 = 5.71 + 4.57 + 1.14 + 13.33 + 10.67 + 2.67 = 38.09$$

Step 4. Make a decision about H0, and state the conclusion.

The chi-square value of 38.09 exceeds the critical value of 5.99, so it is in the critical
region. Therefore, we reject the null hypothesis and conclude that there is a relationship
between watch preference and age.
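
The full test of independence for the watch example can be sketched as follows. The observed counts and α = 0.05 are from the lesson; the variable names are illustrative. The closed-form critical value used here relies on the fact that, for 2 degrees of freedom only, the chi-square cumulative distribution is 1 − exp(−x/2). That shortcut is an assumption specific to this example and does not generalize to other degrees of freedom.

```python
import math

# Observed frequencies: rows are age groups, columns are
# Digital, Analog, Undecided.
observed = [
    [90, 40, 10],  # under 30
    [10, 40, 10],  # 30 or over
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Accumulate (f_o - f_e)^2 / f_e over all six cells.
chi_square = 0.0
for r, fr in enumerate(row_totals):
    for c, fc in enumerate(col_totals):
        fe = fc * fr / n
        chi_square += (observed[r][c] - fe) ** 2 / fe

df = (len(col_totals) - 1) * (len(row_totals) - 1)  # (C - 1)(R - 1)

alpha = 0.05
# Valid for df = 2 only: solve 1 - exp(-x / 2) = 1 - alpha for x.
critical_value = -2 * math.log(alpha)

reject_h0 = chi_square > critical_value
print(f"chi-square = {chi_square:.2f}, df = {df}, "
      f"critical = {critical_value:.2f}, reject H0: {reject_h0}")
```

Computed without intermediate rounding, the statistic is about 38.10; the hand calculation gives 38.09 because each term is rounded to two decimals before summing. The decision is the same either way.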
