Chi-Square Test
Significance tests such as the z-test, t-test and F-test are based on the assumption that the
samples were drawn from a normally distributed population.
Parametric tests require assumptions about the type or parameters of the population distribution,
whereas non-parametric tests are used when no exact information about the population is
available (e.g., whether it follows a binomial, Poisson or normal distribution). These are
distribution-free testing methods. The $\chi^2$ test is the most widely used non-parametric test. It
describes the magnitude of the discrepancy between theory and observations.
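In the worked problems of this section, the discrepancy is measured by the statistic below. Here $O_i$ and $E_i$ are the observed and expected frequencies of the $i$-th class and $n$ is the number of classes. The calculated value is compared with the tabulated value at the chosen level of significance (5% throughout these notes) and the appropriate degrees of freedom; $H_0$ is rejected when the calculated value exceeds the tabulated one.

```latex
\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}
```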
APPLICATIONS OF THE $\chi^2$ TEST
The $\chi^2$ test is applied in two cases: testing goodness of fit and testing independence of attributes.
Problems
1. The following figures show the distribution of digits in numbers chosen at random
from a telephone directory.
Digit   0     1     2    3    4     5    6     7    8    9    Total
Freq.   1026  1107  997  966  1075  933  1107  972  964  853  10,000
Test whether the digits may be taken to occur equally frequently in the directory.
Solution:
$H_0$: Digits occur equally frequently in the directory.
$H_1$: Digits do not occur equally frequently in the directory.
Under the null hypothesis, the expected frequency for each of the digits 0, 1, 2, …, 9 is
$\frac{10{,}000}{10} = 1{,}000$
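The remaining arithmetic for this problem can be sketched in a few lines of Python. The helper name `chi_square` and the constant `CRITICAL_5PCT_9DF` are illustrative choices, not part of the original notes; the critical value 16.919 (5% level, 9 degrees of freedom) is taken from a standard $\chi^2$ table and is an assumption brought in from outside the text.

```python
# Observed digit frequencies from the telephone-directory table (digits 0..9).
observed = [1026, 1107, 997, 966, 1075, 933, 1107, 972, 964, 853]


def chi_square(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


total = sum(observed)                                 # 10,000 digits in all
expected = [total / len(observed)] * len(observed)    # 1,000 per digit under H0

chi2 = chi_square(observed, expected)
dof = len(observed) - 1        # 10 classes, one constraint (the fixed total)

# Assumption: 5% critical value for 9 degrees of freedom from a standard table.
CRITICAL_5PCT_9DF = 16.919

decision = "reject H0" if chi2 > CRITICAL_5PCT_9DF else "do not reject H0"
print(round(chi2, 3), dof, decision)
```

Under these assumptions the statistic comes to about 58.542, well above the tabulated value, so the hypothesis of equally frequent digits would be rejected.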
2. A survey of 320 families with 5 children each revealed the following distribution
No. of boys 5 4 3 2 1 0
No. of girls 0 1 2 3 4 5
No. of families 14 56 110 88 40 12
Is this result consistent with the hypothesis that male and female births are equally
probable?
Solution:
$H_0$: Results are consistent with the hypothesis that male and female births are
equally probable.
$H_1$: Results are not consistent with the hypothesis that male and female births are
equally probable.
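Since the number of trials is finite and there are only two possible outcomes, let
$X$ denote the number of male births in a family.
Then $X \sim B\left(5, \tfrac{1}{2}\right)$, i.e., $P(X = x) = \binom{5}{x} \left(\tfrac{1}{2}\right)^{x} \left(\tfrac{1}{2}\right)^{5-x}$
$E(X = x) = N \times P(X = x) = 320 \times \binom{5}{x} \left(\tfrac{1}{2}\right)^{5} = 10 \times \binom{5}{x}$
$E(X = 5) = 10$, $E(X = 4) = 50$, $E(X = 3) = 100$, $E(X = 2) = 100$, $E(X = 1) = 50$,
$E(X = 0) = 10$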
x_i   O_i   E_i   O_i - E_i   (O_i - E_i)^2   (O_i - E_i)^2 / E_i
5     14    10    4           16              1.6
4     56    50    6           36              0.72
3     110   100   10          100             1
2     88    100   -12         144             1.44
1     40    50    -10         100             2
0     12    10    2           4               0.4
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = 7.16$
Tabulated $\chi^2_{0.05}$ with $6 - 1 = 5$ degrees of freedom $= 11.07$
$\chi^2_{calc} < \chi^2_{0.05}(tab) \implies H_0$ is accepted,
i.e., the results are consistent with the hypothesis that male and female births are
equally probable.
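The binomial expected frequencies and the resulting statistic can be checked with a short Python sketch. The variable names are illustrative and `math.comb` assumes Python 3.8 or later; the data and $p = 1/2$ come from the problem statement.

```python
from math import comb

# Number of boys -> number of families, from the survey table.
observed = {5: 14, 4: 56, 3: 110, 2: 88, 1: 40, 0: 12}

n_families = sum(observed.values())   # 320
n_children = 5
p_boy = 0.5                           # H0: male and female births equally probable

# Expected frequency E(X = x) = N * C(5, x) * p^x * (1 - p)^(5 - x)
expected = {
    x: n_families * comb(n_children, x) * p_boy**x * (1 - p_boy) ** (n_children - x)
    for x in observed
}

chi2 = sum((observed[x] - expected[x]) ** 2 / expected[x] for x in observed)
dof = len(observed) - 1               # 6 classes, total fixed

print(expected)
print(round(chi2, 2), dof)
```

Because $p$ is specified by the hypothesis rather than estimated from the data, only one degree of freedom is lost.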
3. In the accounting department of a bank, 100 accounts are selected at random and
examined for errors. The following results have been obtained
No. of errors 0 1 2 3 4 5 6
No. of accounts 36 40 19 2 0 2 1
Does this information verify that the errors are distributed according to the Poisson
probability law?
Solution:
$H_0$: Errors are distributed according to the Poisson probability law.
$H_1$: Errors are not distributed according to the Poisson probability law.
Under the null hypothesis, expected frequencies are computed using the Poisson
probability law,
i.e., if $X \sim P(\lambda)$, then $P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$, $x = 0, 1, 2, \dots$, where $\lambda$ is the mean.
$\lambda = \bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{100}{100} = 1$
i.e., $P(X = x) = \frac{e^{-1}}{x!}$
$E(X = x) = 100 \times P(X = x) = \frac{36.7879}{x!}$
$E(X = 0) = 36.7879$, $E(X = 1) = 36.7879$, $E(X = 2) = 18.3939$,
$E(X = 3) = 6.1313$, $E(X = 4) = 1.5328$, $E(X = 5) = 0.3066$, $E(X = 6) = 0.0511$
We note that the last 3 expected frequencies are each less than 5, and their sum is also less
than 5. Hence, we pool them with $E(X = 3)$ and proceed.
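x_i               O_i   E_i       O_i - E_i   (O_i - E_i)^2   (O_i - E_i)^2 / E_i
0                 36    36.7879   -0.7879     0.6208          0.0169
1                 40    36.7879   3.2121      10.3176         0.2805
2                 19    18.3939   0.6061      0.3674          0.0199
3 to 6 (pooled)   5     8.0218    -3.0218     9.1313          1.1383
For the pooled class, $O = 2 + 0 + 2 + 1 = 5$ and $E = 6.1313 + 1.5328 + 0.3066 + 0.0511 = 8.0218$,
so its contribution is $9.1313 / 8.0218 = 1.1383$.
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = 1.4556$
Tabulated $\chi^2_{0.05}$ with degrees of freedom $= 7 - 1 - 1 - 3 = 2$ is 5.991
$\chi^2_{calc} < \chi^2_{0.05}(tab) \implies H_0$ is accepted,
i.e., the errors may be taken to be distributed according to the Poisson probability law.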
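The Poisson fit with pooling of small expected frequencies can be sketched as follows. The helper `pool_tail` and its threshold argument are hypothetical names introduced here for illustration; the rule it applies (merge tail classes into their left neighbour until the last expected frequency is at least 5) is the one used in the solution above.

```python
from math import exp, factorial

errors = [0, 1, 2, 3, 4, 5, 6]
accounts = [36, 40, 19, 2, 0, 2, 1]

n = sum(accounts)                                          # 100 accounts
lam = sum(x * f for x, f in zip(errors, accounts)) / n     # sample mean = 1.0

# Expected frequencies N * e^(-lambda) * lambda^x / x!
expected = [n * exp(-lam) * lam**x / factorial(x) for x in errors]


def pool_tail(obs, exp_freq, min_expected=5.0):
    """Merge classes from the right into their left neighbour until the
    last expected frequency is at least min_expected."""
    obs, exp_freq = list(obs), list(exp_freq)
    while len(exp_freq) > 1 and exp_freq[-1] < min_expected:
        o, e = obs.pop(), exp_freq.pop()
        obs[-1] += o
        exp_freq[-1] += e
    return obs, exp_freq


pooled_obs, pooled_exp = pool_tail(accounts, expected)
chi2 = sum((o - e) ** 2 / e for o, e in zip(pooled_obs, pooled_exp))

# One d.f. lost for the fixed total and one for estimating lambda from the data.
dof = len(pooled_obs) - 1 - 1

print(pooled_obs, [round(e, 4) for e in pooled_exp])
print(round(chi2, 4), dof)
```

Each pooled class is divided by its pooled expected frequency, so the last class uses 8.0218 in the denominator and the statistic is about 1.4556 on 2 degrees of freedom.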
4. The demand for a particular spare part in a factory was found to vary from day to day.
In a sample study, the following information was obtained
Days Monday Tuesday Wednesday Thursday Friday Saturday
No. of parts 1124 1125 1110 1120 1126 1115
Test the hypothesis that the number of parts does not depend on the day of the week
(i.e., test if the demand is equal or not)
Solution:
$H_0$: Number of parts does not depend on the day of the week.
$H_1$: Number of parts depends on the day of the week.
Under the null hypothesis, the expected frequency for each day is $E_i = \frac{\sum O_i}{N} = \frac{6720}{6} = 1120$,
where $N = 6$ is the number of days.
O_i    E_i    O_i - E_i   (O_i - E_i)^2   (O_i - E_i)^2 / E_i
1124   1120   4           16              0.0143
1125   1120   5           25              0.0223
1110   1120   -10         100             0.0893
1120   1120   0           0               0
1126   1120   6           36              0.0321
1115   1120   -5          25              0.0223
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = \frac{202}{1120} \approx 0.180$
Tabulated $\chi^2_{0.05}$ with degrees of freedom $= 6 - 1 = 5$ is 11.07
$\chi^2_{calc} < \chi^2_{0.05}(tab) \implies H_0$ is accepted,
i.e., the number of parts does not depend on the day of the week.
5. Two researchers adopted different sampling techniques while investigating the same
group of students to find the number of students falling in different intelligence levels.
The results are as follows
Number of students in each level:
Researcher   Below avg.   Average   Above avg.   Genius   Total
X            86           60        44           10       200
Y            40           33        25           2        100
Total        126          93        69           12       300
Would you say that the sampling techniques adopted by the two researchers are
significantly different?
Solution:
$H_0$: There is no significant difference in the sampling techniques adopted by $X$ and $Y$.
$H_1$: There is a significant difference in the sampling techniques adopted by $X$ and $Y$.
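Under the null hypothesis, the expected frequencies (row total × column total / grand total) are
$E(86) = \frac{200 \times 126}{300} = 84$, $E(60) = \frac{200 \times 93}{300} = 62$, $E(44) = \frac{200 \times 69}{300} = 46$,
$E(10) = \frac{200 \times 12}{300} = 8$, $E(40) = \frac{100 \times 126}{300} = 42$, $E(33) = \frac{100 \times 93}{300} = 31$,
$E(25) = \frac{100 \times 69}{300} = 23$, $E(2) = \frac{100 \times 12}{300} = 4$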
Since $E(2) = 4 < 5$, we pool this class with the adjacent class of researcher $Y$
(observed $25 + 2 = 27$, expected $23 + 4 = 27$).
O_i           E_i           O_i - E_i   (O_i - E_i)^2   (O_i - E_i)^2 / E_i
86            84            2           4               0.0476
60            62            -2          4               0.0645
44            46            -2          4               0.0870
10            8             2           4               0.5
40            42            -2          4               0.0952
33            31            2           4               0.1290
25 + 2 = 27   23 + 4 = 27   0           0               0
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \approx 0.923$
Tabulated $\chi^2_{0.05}$ with degrees of freedom $= (r - 1)(c - 1) - 1 = (2 - 1)(4 - 1) - 1 = 2$
is 5.991
$\chi^2_{calc} < \chi^2_{0.05}(tab) \implies H_0$ is accepted
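The expected frequencies of a contingency table follow the rule row total × column total / grand total, which the following Python sketch applies to the researchers' data. Variable names are illustrative; the pooling step mirrors the solution above (researcher $Y$'s last two cells are merged) and is not a general-purpose pooling routine.

```python
# Observed counts: rows = researchers X, Y;
# columns = below avg., average, above avg., genius.
table = [[86, 60, 44, 10],
         [40, 33, 25, 2]]

row_totals = [sum(row) for row in table]           # [200, 100]
col_totals = [sum(col) for col in zip(*table)]     # [126, 93, 69, 12]
grand_total = sum(row_totals)                      # 300

# E_ij = (row total i) * (column total j) / (grand total)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pooling as in the worked solution: Y's "genius" cell has E = 4 < 5,
# so it is merged with Y's "above avg." cell.
obs_cells = table[0] + table[1][:2] + [sum(table[1][2:])]
exp_cells = expected[0] + expected[1][:2] + [sum(expected[1][2:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_cells, exp_cells))

# (r - 1)(c - 1) degrees of freedom, less one for the pooled class.
dof = (len(table) - 1) * (len(table[0]) - 1) - 1

print(expected)
print(round(chi2, 4), dof)
```

With this pooling the statistic is about 0.923 on 2 degrees of freedom, well below the tabulated 5.991.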
Exercise
1. The following table gives the number of aircraft accidents that occurred on the
various days of the week. Find whether the accidents are uniformly distributed over
the week. (Ans: $\chi^2_{calc} = 4.167$, Accept $H_0$)
Days Sun Mon Tues Wed Thurs Fri Sat
No. of accidents 14 16 8 12 11 9 14
2. A theory predicts that the proportions of beans in the four groups A, B, C and D should be
9 : 3 : 3 : 1. In an experiment with 1600 beans, the numbers in the four groups were
882, 313, 287 and 118. Does the experimental result support the theory?
(Ans: $\chi^2_{calc} = 4.73$, Accept $H_0$)
3. Fit a Poisson distribution to the following data and test the goodness of fit.
X   0     1    2    3   4   5   6
f   275   72   30   7   5   2   1
(Ans: $\chi^2_{calc} = 40.9387$, Reject $H_0$)
4. Two sample polls of votes for two candidates $A$ and $B$ for a public office are taken.
The results are given in the table. Examine whether the nature of the area is related
to voting preference in this election. (Ans: $\chi^2_{calc} = 10.0916$, Reject $H_0$)
Area    Votes for A   Votes for B
Rural   620           380
Urban   550           450