Chi Square
Parametric and Nonparametric
Tests
• Today’s lesson introduces two non-parametric hypothesis tests using the chi-square statistic: the chi-square test for goodness of fit and the chi-square test for independence.
Parametric and Nonparametric
Tests (cont.)
• The term "non-parametric" refers to the fact that the chi-square tests do not require assumptions about population parameters, nor do they test hypotheses about population parameters.
• Previous examples of hypothesis tests, such as the t tests and analysis of variance, are parametric tests: they include assumptions about population parameters and test hypotheses about those parameters.
Parametric and Nonparametric
Tests (cont.)
• The most obvious difference between the
chi‑square tests and the other hypothesis
tests we have considered (t and ANOVA)
is the nature of the data.
• For chi‑square, the data are frequencies
rather than numerical scores.
The Chi-Square Test for
Goodness-of-Fit
• The chi-square test for goodness-of-fit uses
frequency data from a sample to test hypotheses
about the shape or proportions of a population.
• Each individual in the sample is classified into
one category on the scale of measurement.
• The data, called observed frequencies, simply
count how many individuals from the sample are
in each category.
The Chi-Square Test for
Goodness-of-Fit (cont.)
• The null hypothesis specifies the
proportion of the population that should be
in each category.
• The proportions from the null hypothesis
are used to compute expected
frequencies that describe how the sample
would appear if it were in perfect
agreement with the null hypothesis.
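To make this concrete, here is a minimal Python sketch of a goodness-of-fit test (not from the slides); the category counts and the null-hypothesis proportions are hypothetical, and scipy's chisquare function does the computation.

```python
import numpy as np
from scipy import stats

# Hypothetical sample: 200 individuals classified into four categories
observed = np.array([60, 50, 45, 45])                    # observed frequencies (fo)
null_proportions = np.array([0.25, 0.25, 0.25, 0.25])    # proportions specified by H0

# Expected frequencies under H0: fe = p * n
n = observed.sum()
expected = null_proportions * n

# Chi-square goodness-of-fit test
chi2_stat, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi2_stat, p_value)   # reject H0 if p_value < alpha (e.g., 0.05)
```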
The Chi-Square Test for
Independence
• The second chi-square test, the chi-square test for independence, can be used and interpreted in two different ways:
1. Testing hypotheses about the relationship between two variables in a population, or
2. Testing hypotheses about differences between proportions for two or more populations.
The Chi-Square Test for
Independence (cont.)
• Although the two versions of the test for
independence appear to be different, they
are equivalent and they are
interchangeable.
• The first version of the test emphasizes
the relationship between chi-square and a
correlation, because both procedures
examine the relationship between two
variables.
The Chi-Square Test for
Independence (cont.)
• The second version of the test emphasizes the relationship between chi-square and an independent-measures t test (or ANOVA), because both tests use data from two (or more) samples to test hypotheses about the difference between two (or more) populations.
The Chi-Square Test for
Independence (cont.)
• The first version of the chi-square test for
independence views the data as one
sample in which each individual is
classified on two different variables.
• The data are usually presented in a matrix
with the categories for one variable
defining the rows and the categories of the
second variable defining the columns.
The Chi-Square Test for
Independence (cont.)
• The second version of the test for independence
views the data as two (or more) separate
samples representing the different populations
being compared.
• The same variable is measured for each sample
by classifying individual subjects into categories
of the variable.
• The data are presented in a matrix with the
different samples defining the rows and the
categories of the variable defining the columns.
The Chi-Square Test for
Independence (cont.)
• The data, again called observed
frequencies, show how many individuals
are in each cell of the matrix.
• The null hypothesis for this test states that
the proportions (the distribution across
categories) are the same for all of the
populations.
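As an illustration (not part of the original slides), here is a minimal Python sketch of the test for independence; the 2 × 3 frequency matrix is hypothetical, and scipy's chi2_contingency function computes the expected frequencies and the statistic in one call.

```python
import numpy as np
from scipy import stats

# Hypothetical observed frequencies: two samples (rows) x three categories (columns)
observed = np.array([[30, 50, 20],
                     [40, 45, 15]])

# chi2_contingency computes the expected frequencies from the row and column
# totals and returns the statistic, p-value, and degrees of freedom
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)
print(chi2_stat, p_value, dof)
print(expected)   # expected frequencies under H0 (same proportions in every row)
```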
The Chi-Square Test for
Independence (cont.)
• Both chi-square tests use the same statistic.
The calculation of the chi-square statistic
requires two steps:
The Chi-Square Test for
Independence (cont.)
1. The expected frequencies are computed from the null hypothesis.
For the goodness-of-fit test, the expected frequency for each category is obtained by
expected frequency = fe = pn
(p is the proportion from the null hypothesis and n is the size of the sample)
For the test for independence, the expected frequency for each cell in the matrix is obtained by
expected frequency = fe = (row total)(column total) / n
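For example (hypothetical numbers, not from the slides), if the null hypothesis for a goodness-of-fit test states that a category should contain p = 0.25 of the population and the sample contains n = 200 individuals, then fe = 0.25 × 200 = 50; the boys-and-girls example later in the lesson illustrates the row-by-column formula.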
The Chi-Square Test for
Independence (cont.)
2. A chi-square statistic is computed to measure
the amount of discrepancy between the ideal
sample (expected frequencies from H0) and the
actual sample data (the observed frequencies =
fo).
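Putting the two steps together, here is a small hand-rolled sketch (an assumed helper, not defined in the slides) that first builds the expected frequencies from the row and column totals and then sums (fo − fe)²/fe over all cells:

```python
import numpy as np

def chi_square_statistic(observed):
    """Two-step chi-square computation for a matrix of observed frequencies.

    Step 1: expected frequencies fe = (row total)(column total) / n
    Step 2: chi-square = sum over all cells of (fo - fe)^2 / fe
    """
    fo = np.asarray(observed, dtype=float)
    n = fo.sum()
    row_totals = fo.sum(axis=1)
    col_totals = fo.sum(axis=0)
    fe = np.outer(row_totals, col_totals) / n
    return ((fo - fe) ** 2 / fe).sum()

# Example call on a hypothetical 2 x 3 table
print(chi_square_statistic([[30, 50, 20], [40, 45, 15]]))
```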
Measuring Effect Size for the Chi-Square Test for Independence (cont.)
• The value of the phi-coefficient, or its squared value (which is equivalent to r²), is used to measure the effect size.
• When there are more than two categories for one (or both) of the variables, effect size can be measured using a modified version of the phi-coefficient known as Cramér's V.
• The value of V is evaluated in much the same way as a correlation.
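A short sketch of these effect-size measures (not from the slides), assuming the chi-square statistic, the sample size n, and the table dimensions are already known; it uses the standard formulas phi = √(χ²/n) and Cramér's V = √(χ²/(n·(k − 1))), where k is the smaller of the number of rows and columns.

```python
import math

def phi_coefficient(chi2_stat, n):
    # Effect size for a 2 x 2 table: phi = sqrt(chi^2 / n)
    return math.sqrt(chi2_stat / n)

def cramers_v(chi2_stat, n, n_rows, n_cols):
    # Cramér's V generalizes phi when a variable has more than two categories
    k = min(n_rows, n_cols) - 1
    return math.sqrt(chi2_stat / (n * k))
```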
How to use chi-square
1. Hypotheses (H0 and Ha)
2. Compute the chi-square value
3. Statistical rule
4. Decision
Example:
One item in an attitude scale was answered by a group of boys and girls, each choosing one of five possible answers as shown in the table. Is there a significant difference and relationship between the responses of the boys and the girls?
        Strongly Agree   Agree   Indifferent   Disagree   Strongly Disagree   Total
Boys          30           35        15           30              10           120
Girls         15           20        10           20              15            80
Total         45           55        25           50              25           200
Step 1. Hypotheses
H0: The responses are independent of gender; boys and girls have the same distribution of responses.
Ha: The responses are not independent of gender; the distribution of responses differs between boys and girls.
Step 2. Solve for the chi-square value
The calculation of chi-square is the same for all chi-square tests:
chi-square = χ² = Σ (fo − fe)² / fe
The expected frequency for each cell in the matrix is obtained by
expected frequency = fe = (row total)(column total) / n
                   Strongly Agree   Agree   Indifferent   Disagree   Strongly Disagree   Total
Boys (observed)          30           35        15           30              10           120
Boys (expected)          27           33        15           30              15
Girls (observed)         15           20        10           20              15            80
Girls (expected)         18           22        10           20              10
Total                    45           55        25           50              25           200

χ² = (30 − 27)²/27 + (35 − 33)²/33 + (15 − 15)²/15 + (30 − 30)²/30 + (10 − 15)²/15
   + (15 − 18)²/18 + (20 − 22)²/22 + (10 − 10)²/10 + (20 − 20)²/20 + (15 − 10)²/10

χ² = 5.303
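For comparison (a sketch assuming scipy is available, not part of the slides), the same result can be reproduced with scipy's chi2_contingency, which computes both the expected frequencies and the statistic from the observed table:

```python
import numpy as np
from scipy import stats

# Observed frequencies from the table: rows = Boys, Girls; columns = the five responses
observed = np.array([[30, 35, 15, 30, 10],
                     [15, 20, 10, 20, 15]])

chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)
print(chi2_stat)   # about 5.30, matching the hand computation above
print(dof)         # (2 - 1)(5 - 1) = 4
print(expected)    # matches the expected frequencies shown in the table
```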
Step 3. Statistical Rule
If the computed value of chi-square (χ²) is greater than the critical value of χ² at α = 0.05, then we reject H0 and accept Ha.
How to find the critical value using the table?
First find the degrees of freedom:
df = (rows − 1)(columns − 1) = (2 − 1)(5 − 1) = 4
Then locate df = 4 in the df column of the table and read the critical value under α = 0.05.
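Instead of reading a printed table, the critical value can also be obtained from the chi-square distribution, as in this brief sketch (assuming scipy is available):

```python
from scipy import stats

df = 4
alpha = 0.05
critical_value = stats.chi2.ppf(1 - alpha, df)
print(critical_value)   # about 9.488
```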
Applying the statistical rule
If the computed value of chi-square (χ²) is greater than the critical value of χ² at α = 0.05, then we reject H0 and accept Ha.
Step 4. Decision
χ² = 5.303
Critical value of χ² at α = 0.05 (df = 4): 9.488
Since the computed chi-square value (5.303) is less than the critical value (9.488), we accept H0 and conclude that there is no significant difference between the responses of the boys and the girls.