Chi-Squared Tests
15.1
A Common Theme
What to do?   Data Type?   Number of Categories?   Statistical Technique:
15.2
Two Techniques
15.3
The Multinomial Experiment
15.4
Chi-squared Goodness-of-Fit Test
➢ We test whether there is sufficient evidence to reject a
specified set of values for pi.
15.5
Example 15.1
Two companies, A and B, have recently conducted
aggressive advertising campaigns to maintain and possibly
increase their respective shares of the market for fabric
softener. These two companies enjoy a dominant position
in the market. Before the advertising campaigns began, the
market share of company A was 45%, whereas company B
had 40% of the market. Other competitors accounted for
the remaining 15%.
15.6
Example 15.1
To determine whether these market shares changed after
the advertising campaigns, a marketing analyst solicited the
preferences of a random sample of 200 customers of fabric
softener. Of the 200 customers, 102 indicated a preference
for company A's product, 82 preferred company B's fabric
softener, and the remaining 16 preferred the products of
one of the competitors. Can the analyst infer at the 5%
significance level that customer preferences have changed
from their levels before the advertising campaigns were
launched?
15.7
Example 15.1
➢ We compare market share before and after an
advertising campaign to see if there is a difference (i.e.
if the advertising was effective in improving market
share).
➢ We hypothesize values for the parameters equal to the
before-market share. That is,
H0: p1 = .45, p2 = .40, p3 = .15.
➢ The alternative hypothesis is a denial of the null. That
is,
H1: At least one pi is not equal to its specified value.
15.8
Example 15.1
➢ If the null hypothesis is true, we would expect the number of
customers selecting brand A, brand B, and other to be 200 times
the proportions specified under the null hypothesis. That is,
e1 = 200(.45) = 90
e2 = 200(.40) = 80
e3 = 200(.15) = 30
➢ This expression is derived from the formula for the expected value
of a binomial random variable.
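The expected frequencies above follow from e_i = n × p_i. A minimal Python sketch of that arithmetic, using the sample size and the null-hypothesis proportions from the example (the variable names are illustrative, not from the source):

```python
# Expected frequencies under H0 for Example 15.1: e_i = n * p_i.
n = 200  # sample size from the example

# Market shares hypothesized under H0 (the pre-campaign shares).
null_proportions = {"A": 0.45, "B": 0.40, "Others": 0.15}

expected = {brand: n * p for brand, p in null_proportions.items()}

for brand, e in expected.items():
    print(f"e({brand}) = {e:.0f}")
```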
15.9
Example 15.1
➢ If the expected frequencies and the observed frequencies are quite
different, we would conclude that the null hypothesis is false, and
we would reject it.
15.10
Chi-squared Goodness-of-Fit Test
Our chi-squared goodness-of-fit test statistic is given by:
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
where f_i is the observed frequency and e_i is the expected frequency.
15.11
Example 15.1
In order to calculate our test statistic, we lay out the data
in tabular form for easier calculation by hand:

           Observed    Expected                 Summation
           Frequency   Frequency   Delta        Component
Company    fi          ei          (fi – ei)    (fi – ei)²/ei
A          102          90          12          1.60
B           82          80           2          0.05
Others      16          30         -14          6.53
Total      200         200                      8.18
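The hand calculation can be checked with a few lines of Python. This is only a sketch of the arithmetic for this example; the dictionary names are illustrative.

```python
# Goodness-of-fit statistic for Example 15.1: sum of (f_i - e_i)^2 / e_i.
observed = {"A": 102, "B": 82, "Others": 16}        # f_i from the sample
expected = {"A": 90.0, "B": 80.0, "Others": 30.0}   # e_i = n * p_i under H0

# One summation component per category.
components = {
    brand: (observed[brand] - expected[brand]) ** 2 / expected[brand]
    for brand in observed
}

chi_squared = sum(components.values())

for brand, value in components.items():
    print(f"{brand}: {value:.2f}")
print(f"chi-squared = {chi_squared:.2f}")
```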
15.12
Example 15.1
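Our rejection region is:
$$\chi^2 > \chi^2_{\alpha,\,k-1} = \chi^2_{.05,\,2} = 5.99$$
Since the test statistic, 8.18, exceeds 5.99, we reject the null hypothesis.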
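With k = 3 categories the test has k – 1 = 2 degrees of freedom. For exactly 2 degrees of freedom the chi-squared upper-tail probability has the closed form P(χ² > x) = exp(–x/2), so the critical value and p-value can be sketched without a statistics library. This shortcut is an assumption specific to 2 degrees of freedom and is not a general method; other cases need a table or a statistics package.

```python
import math

alpha = 0.05          # significance level from the example
chi_squared = 8.18    # test statistic from the hand calculation

# For 2 degrees of freedom only: P(X > x) = exp(-x / 2),
# so the critical value solves exp(-x / 2) = alpha.
critical_value = -2 * math.log(alpha)
p_value = math.exp(-chi_squared / 2)

reject_null = chi_squared > critical_value

print(f"critical value = {critical_value:.2f}")
print(f"p-value        = {p_value:.4f}")
print(f"reject H0      = {reject_null}")
```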
15.13
Example 15.1
15.14
Required Conditions
➢ In order to use this technique, the sample size must be
large enough so that the expected value for each cell is 5
or more (i.e., n × pi ≥ 5 for every cell).
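The condition is easy to check before running the test. A small sketch follows; the function name `satisfies_rule_of_five` is hypothetical and chosen only for illustration.

```python
def satisfies_rule_of_five(n, proportions, minimum=5):
    """Return True if every expected frequency n * p_i is at least `minimum`."""
    return all(n * p >= minimum for p in proportions)


# Example 15.1: n = 200 with hypothesized shares .45, .40, .15.
print(satisfies_rule_of_five(200, [0.45, 0.40, 0.15]))  # expected values 90, 80, 30

# A much smaller sample with the same shares fails: 20 * .15 = 3 < 5.
print(satisfies_rule_of_five(20, [0.45, 0.40, 0.15]))
```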
15.15
Identifying Factors
Factors that Identify the Chi-Squared Goodness-of-Fit Test:
ei=(n)(pi)
15.16
Chi-squared Test of a Contingency Table
➢ The Chi-squared test of a contingency table is used to
• determine whether there is enough evidence to
infer that two nominal variables are related.
• In order to use these techniques, we need to
classify the data according to two different criteria.
15.17
Example 15.2
➢ The MBA program was experiencing problems scheduling
its courses. The demand for the program's optional courses
and majors was quite variable from one year to the next.
15.18
Example 15.2
➢ As a first step, a random sample of last year's MBA
students was taken, and each student's undergraduate degree
and the major selected in the graduate program were recorded.
15.19
Example 15.2
The data are stored in two columns. The first column consists of
integers 1, 2, 3, and 4 representing the undergraduate degree,
where
1 = BA
2 = BEng
3 = BBA
4 = other
The second column consists of integers 1, 2, and 3 representing
the MBA major, where
1 = Accounting
2 = Finance
3 = Marketing
(Data File: Xm15-02)
15.20
Example 15.2
                              MBA Major
Undergrad Degree   Accounting   Finance   Marketing   Total
BA                 31           13        16           60
BEng                8           16         7           31
BBA                12           10        17           39
Other              10            5         7           22
Total              61           44        47          152
15.21
Example 15.2
➢ The problem objective is to determine whether two variables
(undergraduate degree and MBA major) are related. Both
variables are nominal. Thus, the technique to use is the
chi-squared test of a contingency table. The alternative
hypothesis specifies what we test. That is,
H1: The two variables are dependent
➢ The null hypothesis is a denial of the alternative hypothesis.
H0: The two variables are independent.
15.22
Test Statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
15.23
Example 15.2
The first step is to count the number of students in each of
the 12 combinations. The result is called a cross-
classification table.
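Counting the combinations from two coded columns can be sketched with `collections.Counter`. The records below are hypothetical values made up for illustration; they are not rows from the Xm15-02 data file.

```python
from collections import Counter

# Codes as defined for the example.
DEGREES = {1: "BA", 2: "BEng", 3: "BBA", 4: "Other"}
MAJORS = {1: "Accounting", 2: "Finance", 3: "Marketing"}

# Hypothetical (degree, major) pairs, for illustration only.
records = [(1, 1), (1, 3), (2, 2), (3, 3), (1, 1), (4, 2), (2, 2), (3, 1)]

# Count how often each (degree, major) combination occurs.
counts = Counter(records)

# Arrange the counts as a table: one row per degree, one column per major.
table = [[counts[(d, m)] for m in MAJORS] for d in DEGREES]

for d, row in zip(DEGREES, table):
    print(f"{DEGREES[d]:<6}", row)
```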
15.24
Example 15.2
➢ If the null hypothesis is true (remember, we always start with
this assumption) and the two nominal variables are independent,
then, for example,
P(BA and Accounting) = P(BA) × P(Accounting)
15.25
Test Statistic
➢ There are 152 students, of whom 61 chose
accounting as their MBA major.
➢ Thus, we estimate the probability of accounting as
P(Accounting) ≈ 61/152 = .401
➢ Similarly,
P(BA) ≈ 60/152 = .395
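The same estimates can be obtained for every row and column of the cross-classification table. A minimal sketch, with illustrative variable names:

```python
# Observed counts from the cross-classification table
# (rows: BA, BEng, BBA, Other; columns: Accounting, Finance, Marketing).
observed = [
    [31, 13, 16],
    [8, 16, 7],
    [12, 10, 17],
    [10, 5, 7],
]

row_totals = [sum(row) for row in observed]          # totals per degree
col_totals = [sum(col) for col in zip(*observed)]    # totals per major
n = sum(row_totals)                                  # total number of students

# Estimated marginal probabilities.
p_degree = [total / n for total in row_totals]
p_major = [total / n for total in col_totals]

print(f"P(Accounting) = {p_major[0]:.3f}")
print(f"P(BA)         = {p_degree[0]:.3f}")
```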
15.26
Example 15.2
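➢ If the null hypothesis is true,
P(BA and Accounting) = (60/152)(61/152)
so the expected number of BA students majoring in accounting is
e = 152 × (60/152)(61/152) = (60)(61)/152 = 24.08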
15.27
Example 15.2
➢ We can now compare observed with expected
frequencies (expected frequencies in parentheses):

                              MBA Major
Undergrad Degree   Accounting    Finance       Marketing
BA                 31 (24.08)    13 (17.37)    16 (18.55)
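Under independence, each expected frequency is (row total × column total) / n. The sketch below applies that to the whole table; the names are illustrative, and the BA row reproduces the values shown above.

```python
# Observed counts (rows: BA, BEng, BBA, Other; columns: Accounting, Finance, Marketing).
observed = [
    [31, 13, 16],
    [8, 16, 7],
    [12, 10, 17],
    [10, 5, 7],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency of each cell if degree and major are independent.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(["BA", "BEng", "BBA", "Other"], expected):
    print(f"{label:<6}", [round(e, 2) for e in row])
```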
15.28
Example 15.2
Contingency Table

                  MBA Major
Degree      1      2      3      TOTAL
1           31     13     16      60
2            8     16      7      31
3           12     10     17      39
4           10      5      7      22
TOTAL       61     44     47     152

chi-squared Stat        14.7019
df                      6
p-value                 0.0227
chi-squared Critical    12.5916
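The statistic, degrees of freedom, and p-value in this output can be reproduced in plain Python. The sketch below relies on one assumption worth flagging: for an even number of degrees of freedom, the chi-squared upper-tail probability has a closed-form finite sum, which is what the hypothetical helper `chi2_upper_tail_even_df` implements. For odd degrees of freedom a statistics library would be needed instead.

```python
import math

# Observed counts (rows: degrees 1-4; columns: MBA majors 1-3).
observed = [
    [31, 13, 16],
    [8, 16, 7],
    [12, 10, 17],
    [10, 5, 7],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (f - e)^2 / e over all cells, with e = row total * column total / n.
chi_squared = 0.0
for i, row in enumerate(observed):
    for j, f in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi_squared += (f - e) ** 2 / e

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-squared variable with an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires a positive, even df")
    half = x / 2
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j      # builds (x/2)^j / j!
        total += term
    return math.exp(-half) * total


p_value = chi2_upper_tail_even_df(chi_squared, df)

print(f"chi-squared = {chi_squared:.4f}")
print(f"df          = {df}")
print(f"p-value     = {p_value:.4f}")
```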
15.29
Example 15.2
➢ The p-value is .0227. There is enough evidence to infer
that the MBA major and the undergraduate degree are
related.
15.30
Required Condition – Rule of Five
➢ In a contingency table where one or more cells have
expected values of less than 5, we need to combine
rows or columns to satisfy the rule of five.
15.31
Identifying Factors
Factors that identify the Chi-squared test of a contingency
table:
15.32
Table 15.1 Statistical Techniques for Nominal Data
15.33