Chi-Square Test
The χ² test examines the difference between the observed and expected frequencies.
Small differences between observed and expected frequencies are an indication of the independence between the two classifications.
CALCULATING χ²
This table shows the results of a sample of 400 randomly selected adults classified according to gender and regular exercise. This is a 2 × 2 contingency table.

                  Regular     No regular
                  exercise    exercise      sum
    Male            112          104        216
    Female           96           88        184
    sum             208          192        400
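The calculation of χ²calc for this table can be sketched in Python. This is a minimal sketch, assuming the usual rule that each expected frequency is fe = (row sum × column sum) / grand total and applying no continuity correction; the function names `expected_frequencies` and `chi_square_calc` are hypothetical, chosen only for illustration.

```python
def expected_frequencies(table):
    """Expected frequency of each cell: (row sum * column sum) / grand total."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]


def chi_square_calc(table):
    """Sum of (fo - fe)^2 / fe over every cell of the contingency table."""
    expected = expected_frequencies(table)
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )


# Rows: Male, Female; columns: Regular exercise, No regular exercise
exercise = [[112, 104], [96, 88]]
print(expected_frequencies(exercise))
print(round(chi_square_calc(exercise), 4))  # 0.0041
```

Here the expected frequencies (112.32, 103.68, 95.68 and 88.32) are very close to the observed ones, so χ²calc is tiny, which is the situation described above as an indication of independence.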
IB_03
cyan black
Y:\...\IBBK3_18\591IB318.CDR
Thu Jul 22 09:45:30 2004
Color profile: Disabled
Composite Default screen
EXERCISE 18D.1
1 Find χ²calc for the following contingency tables:

a                 Factor M
                M1     M2     sum
    N1          31     22      53
    N2          20     27      47
    sum         51     49     100

b                 Factor S
                S1     S2
    R1          28     17
    R2          52     41

c                 Factor A
                A1     A2
    B1          24     11
    B2          16     18
    B3          25     12

d                       Factor T
                T1     T2     T3     T4
    D1          31     22     21     16
    D2          23     19     22     13
DEGREES OF FREEDOM
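The χ² distribution depends on the number of degrees of freedom (df), where df = (r − 1)(c − 1) for a contingency table with r rows and c columns (not counting the sum row and sum column).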
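A short Python sketch of the degrees-of-freedom rule, assuming the standard formula df = (r − 1)(c − 1) for an r × c contingency table entered without its sum row and column. The function name `degrees_of_freedom` is hypothetical.

```python
def degrees_of_freedom(table):
    """df = (r - 1)(c - 1); the table must NOT include its sum row or column."""
    rows = len(table)
    cols = len(table[0])
    return (rows - 1) * (cols - 1)


exercise = [[112, 104], [96, 88]]            # gender by exercise, a 2 x 2 table
canteen = [[7, 9, 13, 14], [14, 12, 9, 7]]   # Example 6, a 2 x 4 table
print(degrees_of_freedom(exercise))  # 1
print(degrees_of_freedom(canteen))   # 3
```

Passing a table that still contains its totals would overstate df, which is why the comment insists on the observed counts only.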
3 Find the degrees of freedom (df) for the contingency tables of question 1.
4 Find the df of:

                            Factor K
            K1   K2   K3   K4   K5   K6   K7
    L1       2    5    7    3    1    4    9
    L2       6    1    3    8    2    1    7
    L3       4    2    2    5    1    6    5
    L4       3    4    2    4    3    2    4
Step 4: We state the rejection inequality χ²calc > k, where the critical value k is obtained from the table of critical values.
Step 5: From the contingency table, find χ²calc using $\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e}$.
Step 6: We either accept H0 or reject H0, depending on whether the rejection inequality holds.
Step 7: If operating at a 5% level, we could also use p-values to help us with our decision making.
If p > 0.05, we accept H0.
If p < 0.05, we reject H0.
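Steps 4 to 7 can be sketched in Python as follows. This is an illustrative sketch, not the book's method: the critical values in `CRITICAL` are copied from a standard χ² table for df = 1 to 6 only, and the p-value uses the exact closed-form upper tail for integer df in place of a graphics calculator. The names `CRITICAL`, `chi2_p_value` and `chi_square_decision` are hypothetical. With SciPy installed, `scipy.stats.chi2.sf(x, df)` would give the same p-value.

```python
import math

# Critical values k from a standard chi-square table (upper-tail area = level),
# listed for df = 1, 2, ..., 6 only.
CRITICAL = {
    0.10: [2.706, 4.605, 6.251, 7.779, 9.236, 10.645],
    0.05: [3.841, 5.991, 7.815, 9.488, 11.070, 12.592],
    0.01: [6.635, 9.210, 11.345, 13.277, 15.086, 16.812],
}


def chi2_p_value(x, df):
    """P(chi-square > x) for a positive integer df, using the closed-form tail."""
    half = x / 2
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum of (x/2)^i / i! for i = 0 .. df/2 - 1
        term, total = 1.0, 0.0
        for i in range(df // 2):
            if i > 0:
                term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        p += term
    return p


def chi_square_decision(chi2_calc, df, level=0.05):
    """Steps 4 to 7: rejection inequality chi2_calc > k, plus the p-value."""
    k = CRITICAL[level][df - 1]            # Step 4: critical value from the table
    p = chi2_p_value(chi2_calc, df)        # Step 7: p-value
    decision = "reject H0" if chi2_calc > k else "accept H0"  # Step 6
    return k, p, decision


# chi2_calc = 5.82 with df = 3, as in Example 6
k, p, decision = chi_square_decision(5.82, 3, 0.05)
print(k, round(p, 2), decision)  # 7.815 0.12 accept H0
```

Both routes agree here: χ²calc is below k and p is above the 5% level, so H0 is accepted either way.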
Example 6
A survey was given to randomly chosen high school students from years 9 to 12 on possible changes to the school’s canteen. The contingency table shows the results.

                      Year group
                   9    10    11    12
    change         7     9    13    14
    no change     14    12     9     7

At a 5% level, test whether there is a significant difference between the proportion of students wanting a change in the canteen across the four year groups.
     fo      fe     fo − fe   (fo − fe)²   (fo − fe)²/fe
      7    10.6      −3.6       12.96          1.223
      9    10.6      −1.6        2.56          0.242
     13    11.1       1.9        3.61          0.325
     14    10.6       3.4       11.56          1.091
     14    10.4       3.6       12.96          1.246
     12    10.4       1.6        2.56          0.246
      9    10.9      −1.9        3.61          0.331
      7    10.4      −3.4       11.56          1.112
                                    Total      5.816

∴ χ²calc ≈ 5.82, which is not > 7.81.
Consequently, we accept H0, that there is no significant difference between the proportions across the year groups.
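As a cross-check of Example 6, the Python sketch below recomputes χ²calc with unrounded expected frequencies. It gives about 5.81 rather than 5.816, because the worked table rounds each fe to one decimal place; the conclusion is unchanged. The function name `chi_square_calc` is hypothetical, and the critical value 7.815 (df = 3, 5% level) is taken from a standard χ² table.

```python
def chi_square_calc(table):
    """Sum of (fo - fe)^2 / fe, with fe = row sum * column sum / grand total."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_sums[i] * col_sums[j] / total  # unrounded expected frequency
            chi2 += (fo - fe) ** 2 / fe
    return chi2


# Rows: change, no change; columns: years 9, 10, 11, 12
canteen = [[7, 9, 13, 14], [14, 12, 9, 7]]
chi2 = chi_square_calc(canteen)
df = (len(canteen) - 1) * (len(canteen[0]) - 1)
k = 7.815  # 5% critical value for df = 3
print(round(chi2, 2), df, chi2 > k)  # 5.81 3 False
```

The final `False` means the rejection inequality χ²calc > k fails, so H0 is accepted, matching the worked solution.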
EXERCISE 18D.2
1 A random sample of people is taken to find if there is a relationship between smoking
marijuana as a teenager and suffering schizophrenia within the next 15 years. The results
are given in the table below:
                   Schizophrenic   Non-schizophrenic
    Smoker               58                73
    Non-smoker          269               624
Test at a 5% level whether there is a relationship between smoking marijuana as a
teenager and suffering schizophrenia within the next 15 years.
2 Examine the following contingency tables for the independence of classifications P and Q. Use a χ² test:
i at a 5% level of significance, ii at a 10% level of significance.

a             Q1    Q2
    P1        11    17
    P2        21    23
    P3        28    19
    P4        17    28

b             Q1    Q2    Q3    Q4
    P1         6    11    14    18
    P2         9    12    21    17
    P3        13    24    16    10
4 The following table shows the results of a random sample where annual income and
cigarette smoking are being compared.
                         Income level
                  low   average   high   very high
    Smoker         82     167       74       31
    Non-smoker    212     668      428      168
Test at a 10% level whether lower income people are more likely to be cigarette smokers.
5 This contingency table shows the responses of a randomly chosen sample of 50+ year olds to a survey dealing with people’s weight and whether they have diabetes.

                             Weight
                    light   medium   heavy   obese
    Diabetic          11      19       21      38
    Non-diabetic      79      68       74      53
Test at a 1% level whether there is a link between weight and suffering diabetes.
6 The following table is a result of a major investigation considering the two factors of
intelligence level (IQ) and cigarette smoking.
                                   Intelligence level
                           low   average   high   very high
    Non-smoker             283     486      226      38
    Medium level smoker    123     201       58      18
    Heavy smoker           100     147       64       8
Test at a 1% level whether there is a link between intelligence level (IQ) and cigarette
smoking.
    Temp t (°C)       32.9  33.9  35.2  37.1  38.9  30.3  32.5  31.7  35.7  36.3  34.7
    d (km ridden)     26.5  26.7  24.4  19.8  18.5  32.6  28.7  29.4  23.8  21.2  29.7
2 The contingency table below shows the results of motor vehicle accidents in relation to
whether or not the traveller was wearing a seat belt.
                          Serious injury   Permanent disablement   Death
    Wearing a belt             189                 104               58
    Not wearing a belt          83                  67               46
Find χ² and test at a 1% level whether the wearing of a seat belt and injury or death are independent factors.
3 A drinks vendor varies the price of Supa-fizz on a daily basis, and records the number
of sales of the drink (shown below).
    Price (p)   $2.50  $1.90  $1.60  $2.10  $2.20  $1.40  $1.70  $1.85
    Sales (s)    389    450    448    386    381    458    597    431
a Produce a scatterplot of the data. Do there appear to be any outliers? If so, should
they be included in the analysis?
b Calculate the least squares regression line. Could it give an accurate prediction of sales if Supa-fizz was priced at 50 cents?
4 Eight identical flower beds (petunias) were watered a varying number of times each
week, and the number of flowers each bed produced is recorded in the table below:
    Income (I thousand $)    10    15    20    25    30    40    50    60    80
    Peptic ulcer rate R     8.3   7.7   6.9   7.3   5.9   4.7   3.6   2.6   1.2
3 The following table is the result of a major investigation considering the two factors of intelligence level and business success:

                            Intelligence level
                    low   average   high   very high
    No success       35      30       41      25
    Low success      28      41       26      29
    Success          35      24       41      56
    High success     52      38       63      72
Test at a 1% level whether there is a link between intelligence level (IQ) and business
success.
    Speed (v km/h)            10    20    30    40    50    60    70    80    90
    Stopping time (t secs)  1.23  1.54  1.88  2.20  2.52  2.83  3.15  3.45  3.83
a Produce a scatterplot of the data and indicate its most likely model type.
b Find the linear model which best fits the data. Give evidence as to why you have
chosen this model.
c Use the model to find the stopping time for a speed of:
i 55 km/h ii 110 km/h
d What is the interpretation of the vertical intercept?
e Why does this simple rule apply at all speeds, with a good safety margin?
5 Two supervillains, Silent Predator and the Furry Reaper, terrorise Metropolis by abducting fair maidens (most of whom happen to be journalists). Superman believes that they are collaborating, alternately abducting fair maidens so as not to compete with each other for ransom money. He plots their abduction rate below (in dozens of maidens).
a Plot the data on a scatterplot, and find the least squares regression line (put Silent
Predator on the x-axis).
b Is there any evidence for Superman’s suspicions? (Calculate r and r², and describe the strength of the relationship between Silent Predator and the Furry Reaper.)
c What is the estimated number of the Furry Reaper’s abductions given that Silent
Predator’s were 6 dozen?
d Why is the model inappropriate when the Furry Reaper abducts more than 20 dozen
maidens?
e Calculate the p- and r-intercepts. What do these values represent?
f If Superman is faced with a choice of capturing one supervillain but not the other, which should he choose? (Hint: Use e.)