Week 16
Hypothesis Testing: Testing for Independence
Objectives
By the end of this lesson, you must be
able to:
• Use the Pearson chi square
distribution to test for independence.
Definition
A test of independence tests the null
hypothesis that in a contingency table,
the row and column variables are
independent.
Pearson chi square
We use the Pearson chi square test to conduct
significance tests in 2x2 tables or larger
contingency tables, that is, to test whether the
observed differences are statistically
significant.
Definition
• A contingency table (or two-way frequency
table) is a table in which frequencies
correspond to two variables.
• (One variable is used to categorize rows, and a
second variable is used to categorize
columns.)
• Contingency tables have at least two rows and
at least two columns.
Notation
O represents the observed frequency in a cell of
a contingency table.
E represents the expected frequency in a cell,
found by assuming that the row and column
variables are independent.
r represents the number of rows in a
contingency table (not including labels).
c represents the number of columns in a
contingency table (not including labels).
Hypotheses and Test Statistic
H0: The row and column variables are independent.
H1: The row and column variables are dependent.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
• O is the observed frequency in a cell and E is the
expected frequency in a cell.
• ALWAYS A RIGHT TAILED TEST
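The test statistic can be computed directly from paired observed and expected frequencies. The following is a minimal Python sketch, not part of the original lesson; the function name `chi_square_statistic` and the sample frequencies are hypothetical and chosen only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all cells.

    `observed` and `expected` are flat lists of cell frequencies,
    listed in the same cell order.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical 2x2 table [[10, 20], [30, 40]], flattened row by row,
# with the expected frequencies that independence would give.
stat = chi_square_statistic([10, 20, 30, 40], [12, 18, 28, 42])
print(round(stat, 4))
```

Every term in the sum is non-negative, so only large values of the statistic count as evidence against independence, which is why the test is right-tailed.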
Pearson Chi square
Distribution
• It is a measure of the difference between
actual and expected frequencies.
• Helps us understand the relationship
between two categorical variables.
• Involves the frequencies of events: observed
vs. expected.
• Helps answer the question of whether the
differences are due to chance or to some
other important phenomenon.
• Under the null hypothesis, there is no
difference between the observed and expected
frequencies. If they matched exactly, the Chi
square value would be zero.
• The larger the difference between the observed
and expected frequencies, the greater the Chi
square value.
• However, it is difficult to interpret the Chi
square value by itself, as it depends on the
number of categories studied (the degrees of
freedom).
• We use critical values and P-values to
interpret the Chi square value.
Degrees of freedom
• The degrees of freedom for a contingency
table are calculated as:
• df = (R-1) × (C-1), where R = number of rows
and C = number of columns
NOTE: *don’t count the column/row totals
and labels*
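As a rough sketch of how the degrees of freedom feed into a decision, the Python below computes df from a table of cell counts and looks up a critical value. The 0.05 significance level and the names `degrees_of_freedom` and `CRITICAL_005` are assumptions made for illustration; the lookup holds the standard chi square critical values for df = 1 to 4 only.

```python
def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a table given as a list of rows.

    The table must hold only the cell counts: no totals, no labels.
    """
    rows = len(table)
    cols = len(table[0])
    return (rows - 1) * (cols - 1)


# Standard right-tail chi square critical values at the 0.05 level,
# keyed by degrees of freedom (assumed level; other levels need other values).
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

# Hypothetical tables: only their shapes matter here.
df_2x2 = degrees_of_freedom([[0, 0], [0, 0]])
df_3x3 = degrees_of_freedom([[0] * 3 for _ in range(3)])

print(df_2x2, CRITICAL_005[df_2x2])
print(df_3x3, CRITICAL_005[df_3x3])
```

A test statistic larger than the critical value for its df falls in the right-tail rejection region.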
Expected Frequencies
The expected frequency of each cell is
calculated by:
$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
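Each expected frequency is (row total)(column total) / (grand total), as stated with the test statistic earlier. A minimal Python sketch of that rule follows; the function name `expected_frequencies` and the 2x3 table are hypothetical.

```python
def expected_frequencies(table):
    """Expected cell counts under independence.

    `table` is a list of rows of observed counts (no totals, no labels).
    Each expected count is row_total * column_total / grand_total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2x3 table, for illustration only.
table = [[10, 20, 30], [20, 20, 20]]
expected = expected_frequencies(table)
for row in expected:
    print(row)
```

The expected counts keep the same row and column totals as the observed table, which is a quick way to check a hand calculation.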
Example 1
Is there an association between smoking status
and gender? Use the chi square test for
independence.
- State your null and alternative hypotheses.
             MALE   FEMALE
SMOKER        14      11
NON-SMOKER    17      19
Assuming that there is independence (no
association), the expected values
would be:
             MALE     FEMALE   TOTAL
SMOKER       12.705   12.295   25
NON-SMOKER   18.295   17.705   36
TOTAL        31       30       61
From the observed and expected values
above, find the Chi square value (test
statistic). It’s easier to do so by using a
table:
O       E        O-E      (O-E)^2/E
14      12.705    1.295   0.132
11      12.295   -1.295   0.136
17      18.295   -1.295   0.0917
19      17.705    1.295   0.0947
TOTAL
61      61        0       0.455
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 0.455$$
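The table can be reproduced cell by cell. This Python sketch uses the observed counts from Example 1 and recomputes each expected value from the row, column, and grand totals; the variable names are illustrative assumptions.

```python
# Observed counts from Example 1.
# Rows: smoker, non-smoker. Columns: male, female.
observed = [[14, 11], [17, 19]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
print(f"{'O':>4} {'E':>8} {'O-E':>8} {'(O-E)^2/E':>10}")
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count under independence for this cell.
        e = row_totals[i] * col_totals[j] / grand_total
        contribution = (o - e) ** 2 / e
        chi_square += contribution
        print(f"{o:>4} {e:>8.3f} {o - e:>8.3f} {contribution:>10.4f}")

print(f"chi square = {chi_square:.3f}")
```

The O-E column always sums to zero, because the expected counts share the observed totals.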
• Calculate the degrees of freedom.
df = (R-1) × (C-1) = (2-1)(2-1) = 1
• Find the P-value.
P-value ≈ 0.5
• Do you reject or fail to reject the null hypothesis?
Fail to reject the null hypothesis, because the
P-value is far larger than any common
significance level (for example, 0.05).
• State your conclusion.
There is not sufficient evidence to conclude that
smoking status and gender are dependent. OR
In this data, there is no significant association
between whether someone smokes and their gender.
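The P-value step can be checked numerically. With one degree of freedom, the chi square statistic is the square of a standard normal variable, so its right-tail probability equals erfc(sqrt(x / 2)). The Python sketch below relies on that shortcut, which holds only for df = 1; the 0.05 significance level and the name `p_value_df1` are assumptions, since the lesson does not state a level.

```python
import math


def p_value_df1(chi_square):
    """Right-tail P-value for a chi square statistic with df = 1.

    Valid only for one degree of freedom, where
    P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2))


ALPHA = 0.05  # assumed significance level

# Observed and expected frequencies from Example 1, cell by cell.
observed = [14, 11, 17, 19]
expected = [12.705, 12.295, 18.295, 17.705]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p = p_value_df1(chi_square)
decision = "reject H0" if p < ALPHA else "fail to reject H0"

print(f"chi square = {chi_square:.3f}, P-value = {p:.2f}: {decision}")
```

For tables with more than one degree of freedom, a chi square table or a statistics library would be needed instead of this shortcut.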
END WEEK 16