CHAPTER 11
CHI-SQUARE TESTS
Prem Mann, Introductory Statistics, 7/E

Copyright © 2010 John Wiley & Sons. All rights reserved.
Chapter learning objectives
• The χ² distribution table
• Hypothesis tests and Goodness-of-Fit test
• Hypothesis tests and Test of Independence or Test of Homogeneity
• Tests of Hypothesis about Variance and Standard Deviation
THE CHI-SQUARE DISTRIBUTION
• Definition
◦ The chi-square distribution has only one parameter, called the degrees of freedom (df). The shape of a chi-square distribution curve is skewed to the right for small df and becomes approximately symmetric for large df. The entire chi-square distribution curve lies to the right of the vertical axis. The chi-square distribution assumes nonnegative values only, and these are denoted by the symbol χ² (read as “chi-square”).

Figure 11.1 Three chi-square
distribution curves.

Example 11-1
• Find the value of χ² for 7 degrees of freedom and an area of .10 in the right tail of the chi-square distribution curve.

Table 11.1 χ2 for df = 7 and .10 Area
in the Right Tail

Figure 11.2

Example 11-2
• Find the value of χ² for 12 degrees of freedom and an area of .05 in the left tail of the chi-square distribution curve.

Example 11-2: Solution
• Area in the right tail = 1 – Area in the left tail = 1 – .05 = .95

Table 11.2 χ2 for df = 12 and .95 Area
in the Right Tail

Figure 11.3
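
The table lookups in Examples 11-1 and 11-2 can also be approximated numerically. The following is a minimal sketch, not the textbook's method: it assumes the right-tail area is computed from a series expansion of the regularized incomplete gamma function and that the table value is then found by bisection. The function names `chi2_right_tail_area` and `chi2_table_value` are illustrative only.

```python
import math


def chi2_right_tail_area(x, df):
    """Area under the chi-square curve to the right of x for df degrees of freedom."""
    if x <= 0:
        return 1.0
    a = df / 2.0            # shape parameter of the equivalent gamma distribution
    half_x = x / 2.0
    # Series expansion of the regularized lower incomplete gamma function P(a, x/2).
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= half_x / (a + n)
        total += term
        n += 1
    left_area = total * math.exp(a * math.log(half_x) - half_x - math.lgamma(a))
    return 1.0 - left_area


def chi2_table_value(df, right_tail_area):
    """Value of chi-square that leaves the given area in the right tail (bisection)."""
    low, high = 0.0, 1.0
    while chi2_right_tail_area(high, df) > right_tail_area:
        high *= 2.0         # widen the bracket until it contains the answer
    for _ in range(200):
        mid = (low + high) / 2.0
        if chi2_right_tail_area(mid, df) > right_tail_area:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Example 11-1: 7 df and .10 area in the right tail.
example_11_1 = chi2_table_value(df=7, right_tail_area=0.10)
# Example 11-2: 12 df and .05 area in the left tail, i.e. .95 in the right tail.
example_11_2 = chi2_table_value(df=12, right_tail_area=1 - 0.05)
print(round(example_11_1, 3), round(example_11_2, 3))
```

Under these assumptions the two results should agree with the chi-square table entries, about 12.017 and 5.226. Note that Example 11-2 is converted to a right-tail area first, exactly as in its solution.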

A GOODNESS-OF-FIT TEST
• Definition
◦ An experiment with the following characteristics is called a multinomial experiment.
1. It consists of n identical trials (repetitions).
2. Each trial results in one of k possible outcomes (or categories), where k > 2.
3. The trials are independent.
4. The probabilities of the various outcomes remain constant for each trial.

• Definition
◦ The frequencies obtained from the performance of an experiment are called the observed frequencies and are denoted by O. The expected frequencies, denoted by E, are the frequencies that we expect to obtain if the null hypothesis is true. The expected frequency for a category is obtained as
  E = np
  where n is the sample size and p is the probability that an element belongs to that category if the null hypothesis is true.

A GOODNESS-OF-FIT TEST
• In a goodness-of-fit test, we test the null hypothesis that the observed frequencies for an experiment follow a certain pattern or theoretical distribution.
• Example:
◦ Theoretically, 100% of the students enrolled in our course should attend each class and pay attention. (Expected)
◦ However, in reality, we do not observe 100% attendance, nor does every student pay attention. (Observed during classes, the experiment, by recording attendance and checking whether each student is paying attention.)
• Degrees of Freedom for a Goodness-of-Fit Test
◦ In a goodness-of-fit test, the degrees of freedom are
  df = k – 1
  where k denotes the number of possible outcomes (or categories) for the experiment.

• To make a goodness-of-fit test, the sample size should be large enough so that the expected frequency for each category is at least 5.
Test Statistic for a Goodness-of-Fit Test
• The test statistic for a goodness-of-fit test is χ² and its value is calculated as
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
  where
◦ O = observed frequency for a category
◦ E = expected frequency for a category = np
• Remember that a chi-square goodness-of-fit test is always right-tailed.
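
The goodness-of-fit calculation can be sketched in a few lines of code. This is only an illustration: the function name `goodness_of_fit` and the counts below are hypothetical and are not taken from the chapter. The sketch computes E = np for each category, checks that every expected frequency is at least 5, and sums (O – E)²/E.

```python
def goodness_of_fit(observed, probabilities):
    """Return (chi-square statistic, df, expected frequencies) for a goodness-of-fit test."""
    n = sum(observed)                            # sample size
    expected = [n * p for p in probabilities]    # E = np for each category
    if min(expected) < 5:
        raise ValueError("each expected frequency should be at least 5")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                       # df = k - 1
    return statistic, df, expected


# Hypothetical data (not from the chapter): 100 trials falling into 3 categories.
observed = [30, 50, 20]
null_probabilities = [0.25, 0.50, 0.25]          # category probabilities under H0
statistic, df, expected = goodness_of_fit(observed, null_probabilities)
print(statistic, df, expected)
```

Because the test is right-tailed, the statistic returned here would be compared with the single critical value for the chosen α and df.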

Example 11-3
 A bank has an ATM installed inside the bank, and
it is available to its customers only from 7 AM to 6
PM Monday through Friday. The manager of the
bank wanted to investigate if the percentage
of transactions made on this ATM is the
same for each of the 5 days (Monday
through Friday) of the week. She randomly
selected one week and counted the number of
transactions made on this ATM on each of
the 5 days during this week. The information
she obtained is given in the following table, where
the number of users represents the number of
transactions on this ATM on these days. For
convenience, we will refer to these transactions as
“people” or “users.”
At the 1% level of significance, can we reject the null hypothesis that the number of people who use this ATM each of the 5 days of the week is the same? Assume that this week is typical of all weeks in regard to the use of this ATM.

• Step 1:
◦ Null Hypothesis: The number of people using the ATM is the same for all 5 days of the week.
◦ Alternative Hypothesis: The number of people using the ATM is not the same for all 5 days of the week.
◦ H0: p1 = p2 = p3 = p4 = p5 = .20
◦ H1: At least two of the five proportions are not equal to .20
• Step 2:
◦ There are 5 categories
◦ 5 days on which the ATM is used
◦ Multinomial experiment
◦ We use the chi-square distribution to make this test.

• Step 3:
◦ Area in the right tail = α = .01
◦ k = number of categories = 5
◦ df = k – 1 = 5 – 1 = 4
◦ The critical value of χ² = 13.277
Figure 11.4

Table 11.3 Calculating the Value of
the Test Statistic

• Step 4:
◦ All the required calculations to find the value of the test statistic χ² are shown in Table 11.3.
\[ \chi^2 = \sum \frac{(O - E)^2}{E} = 23.184 \]

• Step 5:
◦ The value of the test statistic χ² = 23.184 is larger than the critical value of χ² = 13.277
◦ It falls in the rejection region
◦ Hence, we reject the null hypothesis
◦ We state that the number of persons who use this ATM is not the same for the 5 days of the week.

Example 11-4
• In a July 23, 2009, Harris Interactive Poll, 1015 advertisers were asked about their opinions of Twitter. The percentage distribution of their responses is shown in the following table.

Assume that these percentages hold true for the 2009 population of advertisers. Recently 800 randomly selected advertisers were asked the same question. The following table lists the number of advertisers in this sample who gave each response.

Test at the 2.5% level of significance whether the current distribution of opinions is different from that for 2009.

• Step 1:
◦ H0: The current percentage distribution of opinions is the same as for 2009.
◦ H1: The current percentage distribution of opinions is different from that for 2009.

• Step 2:
◦ There are 4 categories
◦ Respondents have 4 choices
◦ Multinomial experiment
◦ We use the chi-square distribution to make this test.

• Step 3:
◦ Area in the right tail = α = .025
◦ k = number of categories = 4
◦ df = k – 1 = 4 – 1 = 3
◦ The critical value of χ² = 9.348

Figure 11.5

Table 11.4 Calculating the Value of
the Test Statistic


• Step 5:
◦ The value of the test statistic χ² = 5.420 is smaller than the critical value of χ² = 9.348
◦ It falls in the nonrejection region
◦ Hence, we fail to reject the null hypothesis
◦ We state that, at the 2.5% significance level, there is not enough evidence to conclude that the current percentage distribution of opinions is different from that for 2009.

A TEST OF INDEPENDENCE OR HOMOGENEITY
• A Test of Independence
• A Test of Homogeneity

Test of Independence or Homogeneity
• Uses a “Contingency Table”
◦ Often we may have information on more than one variable for each element.
◦ Such information can be summarized and presented using a multiple-way classification table
◦ Also called “cross-tabulation”
CONTINGENCY TABLES
A university has a total of 20,758 students enrolled.
Classify these students:
1. Based on gender and
2. Whether these students are full-time or part-time

Contingency Tables
• A contingency table can be of any size.
◦ For example, it can be 2 × 3, 3 × 2, 3 × 3, or 4 × 2.
◦ The first digit refers to the number of rows in the table, and the second digit refers to the number of columns.
◦ A 3 × 2 table will contain three rows and two columns.
• In general, an R × C table contains R rows and C columns.
Contingency Tables
• Each of the four boxes that contain numbers in Table 11.5 is called a cell.
• The number of cells in a contingency table is obtained by multiplying the number of rows by the number of columns. Table 11.5 contains 4 cells (2 × 2).
• The subjects that belong to a cell of a contingency table possess two characteristics.
◦ For example, the 2615 students listed in the second cell of the first row in Table 11.5 are male and enrolled part-time.
• The numbers written inside the cells are called the joint frequencies.
◦ 2615 students belong to the joint category of male and part-time.
◦ This number is referred to as the joint frequency of this category.
A Test of Independence
• Definition
◦ A test of independence involves a test of the null hypothesis that two attributes of a population are not related. The degrees of freedom for a test of independence are
  df = (R – 1)(C – 1)
  where R and C are the number of rows and the number of columns, respectively, in the given contingency table.

A Test of Independence
• Test Statistic for a Test of Independence
◦ The value of the test statistic χ² for a test of independence is calculated as
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
  where O and E are the observed and expected frequencies, respectively, for a cell.
Example 11-5
• Violence and lack of discipline have become major problems in schools in the United States. A random sample of 300 adults was selected, and these adults were asked if they favor giving more freedom to schoolteachers to punish students for violence and lack of discipline. The two-way classification of the responses of these adults is represented in the following table.
• Calculate the expected frequencies for this table, assuming that the two attributes, gender and opinions on the issue, are independent.

Table 11.6 Observed Frequencies

Expected Frequencies for a Test of Independence
• The expected frequency E for a cell is calculated as
\[ E = \frac{(\text{Row total})(\text{Column total})}{\text{Sample size}} \]
Expected Frequencies for a Test of Independence

Observed frequencies:
               In Favor (F)   Against (A)   No Opinion (N)   Row Total
Men                 93             70             12            175
Women               87             32              6            125
Column Total       180            102             18            300

Expected frequencies:
               In Favor (F)      Against (A)       No Opinion (N)
Men            (175)(180)/300    (175)(102)/300    (175)(18)/300
Women          (125)(180)/300    (125)(102)/300    (125)(18)/300

Table 11.7 Observed and Expected Frequencies
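
The expected frequencies for the table of Example 11-5 can be worked out with a short script. This is a minimal sketch of the (Row total)(Column total)/(Sample size) rule. The helper name `expected_frequencies` is illustrative; the observed counts are those given for Example 11-5.

```python
def expected_frequencies(observed):
    """Expected cell frequencies under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    n = sum(row_totals)                          # sample size
    # E = (row total)(column total) / sample size for every cell
    return [[r * c / n for c in column_totals] for r in row_totals]


# Observed frequencies: rows are Men, Women; columns are In Favor, Against, No Opinion.
observed = [[93, 70, 12],
            [87, 32, 6]]
expected = expected_frequencies(observed)
for label, row in zip(["Men", "Women"], expected):
    print(label, [round(e, 2) for e in row])
```

The results can be checked against the expected values used in Step 4 of Example 11-6 (105.00, 59.50, 10.50 for men and 75.00, 42.50, 7.50 for women).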

Example 11-6
 Reconsider the two-way classification table given in
Example 11-5. In that example, a random sample
of 300 adults was selected, and they were asked
if they favor giving more freedom to schoolteachers
to punish students for violence and lack of
discipline. Based on the results of the survey, a
two-way classification table was prepared and
presented in Example 11-5. Does the sample
provide sufficient information to conclude that the
two attributes, gender and opinions of adults,
are dependent? Use a 1% significance level.

• Step 1:
◦ H0: Gender and opinions of adults are independent
◦ H1: Gender and opinions of adults are dependent

• Step 2: We use the chi-square distribution to make a test of independence for a contingency table.
• Step 3:
◦ α = .01
◦ df = (R – 1)(C – 1) = (2 – 1)(3 – 1) = 2
◦ The critical value of χ² = 9.210

Figure 11.6

Table 11.8 Observed and Expected
Frequencies

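Step 4:
\[
\begin{aligned}
\chi^2 &= \sum \frac{(O - E)^2}{E} \\
&= \frac{(93 - 105.00)^2}{105.00} + \frac{(70 - 59.50)^2}{59.50} + \frac{(12 - 10.50)^2}{10.50} \\
&\quad + \frac{(87 - 75.00)^2}{75.00} + \frac{(32 - 42.50)^2}{42.50} + \frac{(6 - 7.50)^2}{7.50} \\
&= 1.371 + 1.853 + .214 + 1.920 + 2.594 + .300 = 8.252
\end{aligned}
\]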

• Step 5:
◦ The value of the test statistic χ² = 8.252
◦ It is less than the critical value of χ² = 9.210
◦ Hence, we fail to reject the null hypothesis
◦ We state that there is not enough evidence from the sample to conclude that the two characteristics, gender and opinions of adults, are dependent for this issue.
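
The five steps of Example 11-6 can be traced in code. The sketch below assumes the observed table of Example 11-5 and takes the critical value 9.210 from Step 3 rather than computing it. The function name `chi_square_independence` is illustrative. Because the code does not round each term before summing, its statistic is about 8.253 rather than the 8.252 obtained above from rounded terms.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, df) for a test of independence on a two-way table."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for row, r in zip(observed, row_totals):
        for o, c in zip(row, column_totals):
            e = r * c / n                        # expected frequency for this cell
            statistic += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)   # df = (R - 1)(C - 1)
    return statistic, df


# Rows: Men, Women; columns: In Favor, Against, No Opinion (Example 11-5).
observed = [[93, 70, 12],
            [87, 32, 6]]
statistic, df = chi_square_independence(observed)
critical_value = 9.210                           # from Step 3: alpha = .01, df = 2
reject_null = statistic > critical_value         # the test is right-tailed
print(round(statistic, 3), df, reject_null)
```

The small rounding difference does not affect the decision: the statistic stays below 9.210, so the null hypothesis of independence is not rejected.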
Example 11-7
• A researcher wanted to study the relationship between gender and owning cell phones. She took a sample of 2000 adults and obtained the information given in the following table.
• At the 5% level of significance, can you conclude that gender and owning cell phones are related for all adults?
• Step 1:
◦ H0: Gender and owning a cell phone are not related (they are independent)
◦ H1: Gender and owning a cell phone are related (they are dependent)

• Step 2:
◦ We are performing a test of independence
◦ We use the chi-square distribution
• Step 3:
◦ α = .05
◦ df = (R – 1)(C – 1) = (2 – 1)(2 – 1) = 1

Figure 11.7

Frequencies

Step 4:

• Step 5:
◦ The value of the test statistic is larger than the critical value of χ² = 3.841
◦ Hence, we reject the null hypothesis
◦ We conclude that gender and cell phone ownership are related.

A Test of Homogeneity
• Definition
◦ A test of homogeneity involves testing the null hypothesis that the proportions of elements with certain characteristics in two or more different populations are the same against the alternative hypothesis that these proportions are not the same.

Example 11-8
• Consider the data on income distributions for households in California and Wisconsin given in Table 11.10. Using the 2.5% significance level, test the null hypothesis that the distribution of households with regard to income levels is similar (homogeneous) for the two states.

Table 11.10 Observed Frequencies

• Step 1:
◦ H0: The proportions of households that belong to different income groups are the same in both states
◦ H1: The proportions of households that belong to different income groups are not the same in both states

• Step 2: We use the chi-square distribution to make a homogeneity test.
• Step 3:
◦ α = .025
◦ df = (R – 1)(C – 1) = (3 – 1)(2 – 1) = 2

Figure 11.8

Frequencies

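Step 4:
\[
\begin{aligned}
\chi^2 &= \sum \frac{(O - E)^2}{E} \\
&= \frac{(70 - 65)^2}{65} + \frac{(34 - 39)^2}{39} + \frac{(80 - 75)^2}{75} \\
&\quad + \frac{(40 - 45)^2}{45} + \frac{(100 - 110)^2}{110} + \frac{(76 - 66)^2}{66} \\
&= .385 + .641 + .333 + .556 + .909 + 1.515 = 4.339
\end{aligned}
\]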

• Step 5:
◦ The value of the test statistic χ² = 4.339 is less than the critical value of χ² = 7.378 (the table value for 2 df and .025 area in the right tail)
◦ Hence, we fail to reject the null hypothesis
◦ We state that the distribution of households with regard to income appears to be similar (homogeneous) in California and Wisconsin.
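
The homogeneity test of Example 11-8 can be retraced from the observed counts that appear in Step 4. In the sketch below the rows are the three income groups and the columns are California and Wisconsin. The row order and the variable names are assumptions, since Table 11.10 itself is not reproduced here. The script also prints the sample proportions in each state, which are the quantities the null hypothesis claims to be equal.

```python
# Observed counts from Step 4; rows: three income groups, columns: California, Wisconsin.
observed = [[70, 34],
            [80, 40],
            [100, 76]]

column_totals = [sum(column) for column in zip(*observed)]   # households sampled per state
row_totals = [sum(row) for row in observed]
n = sum(row_totals)

# Sample proportion of each income group within each state.
proportions = [[cell / total for cell, total in zip(row, column_totals)]
               for row in observed]

# Expected frequencies use the same rule as the test of independence.
statistic = sum(
    (cell - r * c / n) ** 2 / (r * c / n)
    for row, r in zip(observed, row_totals)
    for cell, c in zip(row, column_totals)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(column_totals, df, round(statistic, 3))
for row in proportions:
    print([round(p, 3) for p in row])
```

The column totals come out as 250 and 150, the fixed sample sizes for the two states, and the arithmetic is identical to that of a test of independence. Only the sampling design and the wording of the hypotheses differ.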

Test of Independence vs. Test of Homogeneity
• When both the row and column totals are determined randomly (only the total sample size is selected in advance), we perform a test of independence.
◦ The cell phone example: 2000 people were sampled, and the row and column totals depend on the gender and cell phone ownership of the people selected
• When either the column totals or the row totals are fixed (that is, we decide in advance how many elements to sample for each column or each row), we perform a test of homogeneity.
◦ California vs. Wisconsin example: 250 households were surveyed in CA and 150 households were surveyed in WI
INFERENCES ABOUT THE POPULATION VARIANCE
• Estimation of the Population Variance
• Hypothesis Tests About the Population Variance

• Sampling Distribution of (n – 1)s² / σ²
◦ If the population from which the sample is selected is (approximately) normally distributed, then
\[ \frac{(n - 1)s^2}{\sigma^2} \]
  has a chi-square distribution with n – 1 degrees of freedom.

Estimation of the Population Variance
• Confidence interval for the population variance σ²
◦ Assuming that the population from which the sample is selected is (approximately) normally distributed, we obtain the (1 – α)100% confidence interval for the population variance σ² as
\[ \frac{(n - 1)s^2}{\chi^2_{\alpha/2}} \quad \text{to} \quad \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}} \]
  where \(\chi^2_{\alpha/2}\) and \(\chi^2_{1-\alpha/2}\) are obtained from the chi-square distribution table for α/2 and 1 – α/2 areas in the right tail of the chi-square distribution curve, respectively, and for n – 1 degrees of freedom. The confidence interval for the population standard deviation can be obtained by simply taking the positive square roots of the two limits of the confidence interval for the population variance.
Example 11-9
• One type of cookie manufactured by Haddad Food Company is Cocoa Cookies. The machine that fills packages of these cookies is set up in such a way that the average net weight of these packages is 32 ounces with a variance of .015 square ounce. From time to time the quality control inspector at the company selects a sample of a few such packages, calculates the variance of the net weights of these packages, and constructs a 95% confidence interval for the population variance. If either both or one of the two limits of this confidence interval is not in the interval .008 to .030, the machine is stopped and adjusted. A recently taken random sample of 25 packages from the production line gave a sample variance of .029 square ounce. Based on this sample information, do you think the machine needs an adjustment? Assume that the net weights of cookies in all packages are normally distributed.

• Step 1:
◦ n = 25 and s² = .029
• Step 2:
◦ α = 1 – .95 = .05
◦ α/2 = .05/2 = .025
◦ 1 – α/2 = 1 – .025 = .975
◦ df = n – 1 = 25 – 1 = 24
◦ χ² for 24 df and .025 area in the right tail = 39.364
◦ χ² for 24 df and .975 area in the right tail = 12.401

Figure 11.9

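Step 3:
\[ \frac{(n - 1)s^2}{\chi^2_{\alpha/2}} \quad \text{to} \quad \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}} \]
\[ \frac{(25 - 1)(.029)}{39.364} \quad \text{to} \quad \frac{(25 - 1)(.029)}{12.401} \]
\[ .0177 \quad \text{to} \quad .0561 \]

• Thus, with 95% confidence, we can state that the variance for all packages of Cocoa Cookies lies between .0177 and .0561 square ounce. Because the upper limit, .0561, is not in the interval .008 to .030, the machine needs an adjustment.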
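
The interval of Example 11-9 can be reproduced with a few lines of code. This is a minimal sketch in which the two chi-square values are copied from Step 2 rather than computed. The function and parameter names are illustrative.

```python
import math


def variance_confidence_interval(n, sample_variance, chi2_alpha_half, chi2_one_minus_alpha_half):
    """Limits (n - 1)s^2 / chi2_{alpha/2} and (n - 1)s^2 / chi2_{1 - alpha/2}."""
    numerator = (n - 1) * sample_variance
    return numerator / chi2_alpha_half, numerator / chi2_one_minus_alpha_half


# Example 11-9: n = 25, s^2 = .029, chi-square values for 24 df taken from Step 2.
lower, upper = variance_confidence_interval(25, 0.029, 39.364, 12.401)

# Interval for the standard deviation: positive square roots of the two limits.
sd_lower, sd_upper = math.sqrt(lower), math.sqrt(upper)

# The machine is left alone only if both limits lie inside .008 to .030.
within_limits = 0.008 <= lower and upper <= 0.030

print(round(lower, 4), round(upper, 4), within_limits)
print(round(sd_lower, 3), round(sd_upper, 3))
```

Note that the larger chi-square value produces the lower limit, because it appears in the denominator.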

Hypothesis Tests About the Population Variance
• Test statistic for a Test of Hypothesis About σ²
◦ The value of the test statistic χ² is calculated as
\[ \chi^2 = \frac{(n - 1)s^2}{\sigma^2} \]
  where s² is the sample variance, σ² is the hypothesized value of the population variance, and n – 1 represents the degrees of freedom. The population from which the sample is selected is assumed to be (approximately) normally distributed.
Example 11-10
• One type of cookie manufactured by Haddad Food Company is Cocoa Cookies. The machine that fills packages of these cookies is set up in such a way that the average net weight of these packages is 32 ounces with a variance of .015 square ounce. From time to time the quality control inspector at the company selects a sample of a few such packages, calculates the variance of the net weights of these packages, and makes a test of hypothesis about the population variance. She always uses α = .01. The acceptable value of the population variance is .015 square ounce or less.
• If the conclusion from the test of hypothesis is that the population variance is not within the acceptable limit, the machine is stopped and adjusted. A recently taken random sample of 25 packages from the production line gave a sample variance of .029 square ounce. Based on this sample information, do you think the machine needs an adjustment? Assume that the net weights of cookies in all packages are normally distributed.
• Step 1:
◦ H0: σ² ≤ .015
  The population variance is within the acceptable limit
◦ H1: σ² > .015
  The population variance exceeds the acceptable limit

• Step 2: We use the chi-square distribution to test a hypothesis about σ²
• Step 3:
◦ α = .01
◦ df = n – 1 = 25 – 1 = 24

Figure 11.10

Step 4:
\[ \chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{(25 - 1)(.029)}{.015} = 46.400 \]
where σ² = .015 comes from H0.

• Step 5:
◦ The value of the test statistic χ² = 46.400 is greater than the critical value of χ² = 42.980 (the table value for 24 df and .01 area in the right tail)
◦ Hence, we reject the null hypothesis H0
◦ We conclude that the population variance is not within the acceptable limit. The machine should be stopped and adjusted.

Example 11-11
 The variance of scores on a standardized
mathematics test for all high school seniors
was 150 in 2009. A sample of scores for 20 high
school seniors who took this test this year
gave a variance of 170. Test at the 5%
significance level if the variance of current
scores of all high school seniors on this test is
different from 150. Assume that the scores of all
high school seniors on this test are
(approximately) normally distributed.

• Step 1:
◦ H0: σ² = 150
  The population variance is not different from 150
◦ H1: σ² ≠ 150
  The population variance is different from 150

• Step 2: We use the chi-square distribution to test a hypothesis about σ²
• Step 3:
◦ α = .05
◦ Area in each tail = α/2 = .025
◦ df = n – 1 = 20 – 1 = 19
◦ The critical values of χ² are 32.852 and 8.907

Figure 11.11

Step 4:
\[ \chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{(20 - 1)(170)}{150} = 21.533 \]
where σ² = 150 comes from H0.

• Step 5:
◦ The value of the test statistic χ² = 21.533 is between the two critical values of χ², 8.907 and 32.852
◦ Consequently, we fail to reject H0.
◦ We conclude that the population variance of the current scores of high school seniors on this standardized mathematics test does not appear to be different from 150.
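
The decision rules of Examples 11-10 and 11-11 differ only in which tail or tails form the rejection region. The sketch below is illustrative: the function name `variance_test` and its `tail` argument are hypothetical, and critical values are passed in from the chi-square table rather than computed. The value 42.980 used for Example 11-10 is the standard table entry for 24 df and .01 right-tail area. It is an assumption here because the extracted Step 3 of that example does not show it.

```python
def variance_test(n, sample_variance, hypothesized_variance, critical_values, tail):
    """Return (chi-square statistic, reject H0?) for a test about a population variance.

    tail is 'right', 'left', or 'two'; critical_values holds one table value for a
    one-tailed test or (lower, upper) for a two-tailed test.
    """
    statistic = (n - 1) * sample_variance / hypothesized_variance
    if tail == "right":
        reject = statistic > critical_values[0]
    elif tail == "left":
        reject = statistic < critical_values[0]
    elif tail == "two":
        lower, upper = critical_values
        reject = statistic < lower or statistic > upper
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    return statistic, reject


# Example 11-10: H1 is sigma^2 > .015, so the test is right-tailed (critical value assumed).
stat_10, reject_10 = variance_test(25, 0.029, 0.015, (42.980,), "right")

# Example 11-11: H1 is sigma^2 != 150, so the test is two-tailed (values from Step 3).
stat_11, reject_11 = variance_test(20, 170, 150, (8.907, 32.852), "two")

print(round(stat_10, 3), reject_10)
print(round(stat_11, 3), reject_11)
```

Under these inputs the first test rejects H0 and the second does not, matching the conclusions of the two examples.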
Excel
Screen 11.4

