
Chi-Square Test

The document discusses the Chi-Square (χ²) test, a widely used non-parametric statistical test that assesses the discrepancy between observed and expected frequencies. It outlines the conditions for applying the test and its applications in goodness of fit and independence of attributes, and provides examples of its use in various scenarios. Additionally, it includes calculations and hypothesis tests to determine whether observed distributions align with expected distributions.

Uploaded by khushpatel1222
Copyright © All Rights Reserved

CHI-SQUARE (χ²) TEST

Various significance tests such as the z-test, t-test and F-test are based on the assumption that the samples were drawn from a normally distributed population.

Parametric tests require assumptions about the type or parameters of the population, whereas non-parametric tests are used when no exact information is available about the population (e.g., whether it follows a binomial, Poisson or normal distribution). These are distribution-free testing methods. The χ² test is the most widely used non-parametric test. It describes the magnitude of the discrepancy between theory and observation.

Conditions for applying the χ² test

(i) Each cell should have an expected frequency of at least 5; otherwise χ² will be overestimated, which may lead to wrongful rejection of the null hypothesis. (If the expected frequency of a cell is less than 5, the cell is pooled with the preceding or succeeding cell so that the pooled expected frequency is at least 5. The degrees of freedom are adjusted accordingly.)
(ii) All observations are completely random and independent.
(iii) The total sample size is at least 50.
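The pooling rule in condition (i) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a standard routine: `pool_small_cells` is a hypothetical helper name, the classes are assumed to be in their natural order (e.g. x = 0, 1, 2, ...), a small class is merged forward into the next one, and any small remainder at the end is merged into the preceding class. The degrees of freedom must then be counted on the pooled classes.

```python
# Hypothetical helper: pool adjacent classes until each pooled expected
# frequency is at least `minimum`. Assumes the classes are in natural order;
# a small class is merged forward, and a small tail goes to the preceding class.
def pool_small_cells(observed, expected, minimum=5.0):
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    pooled_obs, pooled_exp = [], []
    acc_obs, acc_exp = 0.0, 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= minimum:            # this (possibly pooled) class is large enough
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs, acc_exp = 0.0, 0.0
    if acc_exp > 0 or acc_obs > 0:        # leftover tail still below the minimum
        if pooled_exp:
            pooled_obs[-1] += acc_obs     # merge it with the preceding class
            pooled_exp[-1] += acc_exp
        else:                             # all classes together are still small
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp


# Usage with the error counts and Poisson expectations of Problem 3
obs_p, exp_p = pool_small_cells(
    [36, 40, 19, 2, 0, 2, 1],
    [36.7879, 36.7879, 18.3939, 6.1313, 1.5328, 0.3066, 0.0511],
)
print("pooled observed:", obs_p)
print("pooled expected:", exp_p)
```

With these data the classes x = 3, 4, 5 and 6 end up in one class with observed frequency 5 and expected frequency of about 8.02, leaving four classes.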

APPLICATIONS OF χ² TEST

The χ² test is applied in two cases: testing goodness of fit and testing independence of attributes.

Goodness of fit test


On several occasions, we need to check whether an observed sample distribution matches a known probability distribution (binomial/Poisson/normal). The χ² test for goodness of fit enables us to determine the extent to which the theoretical probability distribution coincides with the empirical sample distribution.

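The test statistic for testing the hypothesis is

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

with $n - 1$ degrees of freedom, and $\sum_{i=1}^{n} O_i = \sum_{i=1}^{n} E_i$,

where $O_i$: Observed frequency, $E_i$: Expected frequency, and $n$ is the number of classes (after any pooling). One further degree of freedom is lost for each parameter estimated from the sample data.

Under the hypothesized distribution, the expected frequency of the $i$-th class is

$$E_i = N \times p_i$$

where $N = \sum_{i=1}^{n} O_i$ is the total number of observations and $p_i$ is the probability of the $i$-th class.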
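A minimal Python sketch of the goodness-of-fit computation, assuming the class probabilities p_i under H₀ are known (or have already been computed from estimated parameters). The function names `chi_square_gof` and `chi_square_sf` are hypothetical. `chi_square_sf` stands in for the printed table: it evaluates the standard closed-form upper-tail probability of the χ² distribution for integer degrees of freedom, using only `math.exp`, `math.sqrt` and `math.erfc` from the standard library.

```python
import math


def chi_square_gof(observed, probabilities, estimated_params=0):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test.

    Expected frequencies are E_i = N * p_i with N = sum(observed); the degrees of
    freedom are (number of classes) - 1 - (parameters estimated from the data).
    """
    if len(observed) != len(probabilities):
        raise ValueError("need one probability per class")
    n_total = sum(observed)
    expected = [n_total * p for p in probabilities]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1 - estimated_params
    return stat, dof


def chi_square_sf(x, dof):
    """Upper-tail probability P(chi-square with `dof` degrees of freedom > x), integer dof >= 1."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{j=0}^{dof/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, dof // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2))
    #   + sqrt(2x/pi) * exp(-x/2) * sum_{j=1}^{(dof-1)/2} x^(j-1) / (1*3*...*(2j-1))
    result = math.erfc(math.sqrt(half))
    if dof > 1:
        term, total = 1.0, 1.0
        for j in range(2, (dof - 1) // 2 + 1):
            term *= x / (2 * j - 1)
            total += term
        result += math.sqrt(2 * x / math.pi) * math.exp(-half) * total
    return result
```

For instance, `chi_square_sf(16.919, 9)` evaluates to approximately 0.05, which is why 16.919 is the tabulated 5% point for 9 degrees of freedom; H₀ is rejected at the 5% level when the returned probability falls below 0.05.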
Test for independence of attributes
The χ² test helps us check whether two or more attributes are associated.

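The test statistic for testing the hypothesis is

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

with $(r - 1)(c - 1)$ degrees of freedom, and $\sum_{i=1}^{n} O_i = \sum_{i=1}^{n} E_i$,

where $O_i$: Observed frequency, $E_i$: Expected frequency, $r$ and $c$ are the numbers of rows and columns of the contingency table, and the sum runs over all $n = rc$ cells. The expected frequency of each cell is

$$\text{Expected frequency} = \frac{\text{Row total} \times \text{Column total}}{\text{Total number of observations}}$$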
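The expected-frequency rule for a contingency table can be sketched the same way. This is an illustrative Python helper (the name `contingency_chi_square` is hypothetical) that takes a list of rows of observed counts; it does no pooling, so it is only meant to be used directly when every expected frequency is at least 5. The usage lines apply it to the researchers' table from Problem 5.

```python
def contingency_chi_square(table):
    """Chi-square test of independence for an r x c table given as a list of rows.

    Each expected cell is row total * column total / grand total, and the
    degrees of freedom are (r - 1)(c - 1). No pooling of small cells is done.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, expected


# Usage: researchers' table from Problem 5 (rows X, Y; four intelligence levels)
stat, dof, expected = contingency_chi_square([[86, 60, 44, 10], [40, 33, 25, 2]])
for row in expected:
    print(row)
print(f"unpooled chi2 = {stat:.4f}, dof = {dof}")
```

The printed expected table shows the cell with expected frequency 4 (researcher Y, Genius), which is why Problem 5 pools cells before comparing χ² with the table value; the unpooled statistic printed here is only illustrative.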

Problems

1. The following figures show the distribution of digits in numbers chosen at random
from a telephone directory.
Digits 0 1 2 3 4 5 6 7 8 9 Total
Freq. 1026 1107 997 966 1075 933 1107 972 964 853 10,000
Test whether the digits may be taken to occur equally frequently in the directory.
Solution:
𝐻0 : Digits occur equally frequently in the directory
𝐻1 : Digits do not occur equally frequently in the directory
Under the null hypothesis, the expected frequency for each of the digits 0, 1, 2, …, 9 is

$$E_i = \frac{10000}{10} = 1000$$

Digit   O_i    E_i    (O_i − E_i)   (O_i − E_i)²   (O_i − E_i)²/E_i
0       1026   1000    26            676            0.676
1       1107   1000    107           11449          11.449
2       997    1000   -3             9              0.009
3       966    1000   -34            1156           1.156
4       1075   1000    75            5625           5.625
5       933    1000   -67            4489           4.489
6       1107   1000    107           11449          11.449
7       972    1000   -28            784            0.784
8       964    1000   -36            1296           1.296
9       853    1000   -147           21609          21.609
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = 58.542$$

Tabulated $\chi^2_{0.05}$ with $10 - 1 = 9$ degrees of freedom $= 16.919$

$\chi^2_{calc} > \chi^2_{0.05}(tab) \implies H_0$ is rejected,

i.e., the digits do not occur equally frequently in the directory.
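The arithmetic of Problem 1 can be reproduced with a few lines of Python, as a sketch; the constant `CHI2_TAB_005_DOF9` simply holds the tabulated value 16.919 quoted above rather than computing it.

```python
# Problem 1: digits 0-9 drawn from a telephone directory; H0: all equally frequent
observed = [1026, 1107, 997, 966, 1075, 933, 1107, 972, 964, 853]
total = sum(observed)                                # 10,000 digits in all
expected = [total / len(observed)] * len(observed)   # 1,000 per digit under H0

# One term (O - E)^2 / E per digit, then their sum
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(terms)
dof = len(observed) - 1                              # 10 - 1 = 9, nothing estimated

CHI2_TAB_005_DOF9 = 16.919                           # tabulated value quoted in the text
decision = "reject H0" if chi2 > CHI2_TAB_005_DOF9 else "accept H0"
print(f"chi2 = {chi2:.3f} with {dof} dof -> {decision}")
```

Running it gives χ² = 58.542 with 9 degrees of freedom, far above 16.919, so H₀ is rejected.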

2. A survey of 320 families with 5 children each revealed the following distribution:
No. of boys 5 4 3 2 1 0
No. of girls 0 1 2 3 4 5
No. of families 14 56 110 88 40 12
Is this result consistent with the hypothesis that male and female births are equally
probable?
Solution:
𝐻0 : Results are consistent with the hypothesis that male and female births are
equally probable
𝐻1 : Results are not consistent with the hypothesis that male and female births are
equally probable
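Since each birth has only two possible outcomes and the number of trials (5 children) is fixed, let $X$ denote the number of male births. Then $X \sim B\left(5, \tfrac{1}{2}\right)$, i.e.,

$$P(X = x) = {}^5C_x \left(\tfrac{1}{2}\right)^x \left(\tfrac{1}{2}\right)^{5-x}, \quad x = 0, 1, \ldots, 5$$

$$E(X = x) = N \times P(X = x) = 320 \times {}^5C_x \left(\tfrac{1}{2}\right)^5 = 10 \times {}^5C_x$$

$E(X = 5) = 10$, $E(X = 4) = 50$, $E(X = 3) = 100$, $E(X = 2) = 100$, $E(X = 1) = 50$, $E(X = 0) = 10$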
x_i   O_i   E_i   (O_i − E_i)   (O_i − E_i)²   (O_i − E_i)²/E_i
5     14    10     4             16             1.6
4     56    50     6             36             0.72
3     110   100    10            100            1
2     88    100   -12            144            1.44
1     40    50    -10            100            2
0     12    10     2             4              0.4
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = 7.16$$

Tabulated $\chi^2_{0.05}$ with $6 - 1 = 5$ degrees of freedom $= 11.07$

$\chi^2_{calc} < \chi^2_{0.05}(tab) \implies H_0$ is accepted,

i.e., the results are consistent with the hypothesis that male and female births are equally probable.
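A short Python sketch of Problem 2, assuming Python 3.8 or later for `math.comb`. The binomial expected frequencies come straight from E(X = x) = N × ⁵Cₓ(1/2)⁵, and the constant holding 11.07 is the tabulated value quoted in the text.

```python
from math import comb

# Problem 2: 320 families with 5 children; X = number of boys, H0: X ~ B(5, 1/2)
N_FAMILIES = 320
boys = [5, 4, 3, 2, 1, 0]
families = [14, 56, 110, 88, 40, 12]

# E(X = x) = N * 5Cx * (1/2)^5 = 10 * 5Cx
expected = [N_FAMILIES * comb(5, x) * 0.5 ** 5 for x in boys]

chi2 = sum((o - e) ** 2 / e for o, e in zip(families, expected))
dof = len(families) - 1          # p = 1/2 is fixed by H0, so nothing is estimated

CHI2_TAB_005_DOF5 = 11.07        # tabulated value quoted in the text
print("expected:", expected)
print(f"chi2 = {chi2:.2f}; H0 {'rejected' if chi2 > CHI2_TAB_005_DOF5 else 'accepted'}")
```

The expected frequencies come out as 10, 50, 100, 100, 50 and 10, and χ² = 7.16 < 11.07, so H₀ is accepted.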
3. In the accounting department of a bank, 100 accounts are selected at random and examined for errors. The following results have been obtained:
No. of errors 0 1 2 3 4 5 6
No. of accounts 36 40 19 2 0 2 1
Does this information verify that the errors are distributed according to the Poisson probability law?
Solution:
𝐻0 : Errors are distributed according to the Poisson probability law
𝐻1 : Errors are not distributed according to the Poisson probability law
Under the null hypothesis, expected frequencies are computed using the Poisson probability law, i.e., if $X \sim P(\lambda)$,

$$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$

where $\lambda$ is the mean, estimated from the data as

$$\lambda = \bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{100}{100} = 1$$

i.e., $P(X = x) = \dfrac{e^{-1}}{x!}$ and

$$E(X = x) = 100 \times P(X = x) = \frac{36.7879}{x!}$$

$E(X = 0) = 36.7879$, $E(X = 1) = 36.7879$, $E(X = 2) = 18.3939$, $E(X = 3) = 6.1313$, $E(X = 4) = 1.5328$, $E(X = 5) = 0.3066$, $E(X = 6) = 0.0511$
We note that the last 3 expected frequencies are less than 5, and so is their sum (1.5328 + 0.3066 + 0.0511 = 1.8905). Hence, we pool them with $E(X = 3)$ and proceed.
x_i                O_i   E_i       (O_i − E_i)   (O_i − E_i)²   (O_i − E_i)²/E_i
0                  36    36.7879   -0.7879       0.6208         0.0169
1                  40    36.7879    3.2121       10.3176        0.2805
2                  19    18.3939    0.6061       0.3674         0.0200
3 to 6 (pooled)    5     8.0218    -3.0218       9.1313         1.1383
(pooled O = 2 + 0 + 2 + 1 = 5; pooled E = 6.1313 + 1.5328 + 0.3066 + 0.0511 = 8.0218)

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \approx 1.456$$

Tabulated $\chi^2_{0.05}$ with degrees of freedom $= 7 - 1 - 1 - 3 = 2$ is $5.991$

(7 is the number of classes. 1 degree of freedom is lost due to the linear constraint $\sum O_i = \sum E_i$, 1 is lost because $\lambda$ is estimated from the given data, and 3 are lost due to pooling the classes $x = 3, 4, 5, 6$ into one.)

$\chi^2_{calc} < \chi^2_{0.05}(tab) \implies H_0$ is accepted,

i.e., the errors are distributed according to the Poisson probability law.
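The Poisson fit of Problem 3 can be sketched in Python as follows. It estimates λ by the sample mean, computes the expected frequencies 100·e^(−λ)·λ^x/x! for x = 0, …, 6, and pools the classes x = 3 to 6 exactly as described in the text; the pooling indices are hard-coded for this data set rather than chosen by a general rule, and 5.991 is the tabulated value quoted in the text.

```python
import math

# Problem 3: errors found in 100 bank accounts; H0: errors follow a Poisson law
errors = [0, 1, 2, 3, 4, 5, 6]
accounts = [36, 40, 19, 2, 0, 2, 1]
n = sum(accounts)                                            # 100 accounts
lam = sum(x * f for x, f in zip(errors, accounts)) / n       # sample mean, here 1.0

# Expected frequencies E(X = x) = n * e^(-lam) * lam^x / x!
expected = [n * math.exp(-lam) * lam ** x / math.factorial(x) for x in errors]

# Pool x = 3, 4, 5, 6 into one class (hard-coded, as in the worked solution)
obs_pooled = accounts[:3] + [sum(accounts[3:])]
exp_pooled = expected[:3] + [sum(expected[3:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
dof = len(obs_pooled) - 1 - 1       # one more df lost because lam was estimated

CHI2_TAB_005_DOF2 = 5.991           # tabulated value quoted in the text
print(f"lambda = {lam}, pooled E = {[round(e, 4) for e in exp_pooled]}")
print(f"chi2 = {chi2:.4f} with {dof} dof; "
      f"H0 {'rejected' if chi2 > CHI2_TAB_005_DOF2 else 'accepted'}")
```

With unrounded expected frequencies the pooled class has E ≈ 8.0218 and the statistic is about 1.456 on 2 degrees of freedom, below 5.991, so H₀ is accepted at the 5% level.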

4. The demand for a particular spare part in a factory was found to vary from day to day. In a sample study, the following information was obtained:
Days Monday Tuesday Wednesday Thursday Friday Saturday
No. of parts 1124 1125 1110 1120 1126 1115
Test the hypothesis that the number of parts demanded does not depend on the day of the week (i.e., test whether the demand is the same on every day).
Solution:
𝐻0 : Number of parts does not depend on the day of the week
𝐻1 : Number of parts depends on the day of the week
Under the null hypothesis, the expected frequency is

$$E_i = \frac{\sum O_i}{n} = \frac{6720}{6} = 1120$$

where $n = 6$ is the number of days.
O_i    E_i    (O_i − E_i)   (O_i − E_i)²   (O_i − E_i)²/E_i
1124   1120    4             16             0.014
1125   1120    5             25             0.022
1110   1120   -10            100            0.089
1120   1120    0             0              0
1126   1120    6             36             0.032
1115   1120   -5             25             0.022
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = \frac{202}{1120} \approx 0.180$$

Tabulated $\chi^2_{0.05}$ with degrees of freedom $= 6 - 1 = 5$ is $11.07$

$\chi^2_{calc} < \chi^2_{0.05}(tab) \implies H_0$ is accepted,

i.e., the number of parts demanded does not depend on the day of the week.
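Problem 4 as a short Python sketch; the day labels are only for readability, and 11.07 is the tabulated value quoted in the text.

```python
# Problem 4: daily demand for a spare part; H0: demand does not depend on the day
demand = {"Mon": 1124, "Tue": 1125, "Wed": 1110, "Thu": 1120, "Fri": 1126, "Sat": 1115}
observed = list(demand.values())
expected = sum(observed) / len(observed)       # 6720 / 6 = 1120 parts per day

chi2 = sum((o - expected) ** 2 / expected for o in observed)
dof = len(observed) - 1                         # 6 - 1 = 5

CHI2_TAB_005_DOF5 = 11.07                       # tabulated value quoted in the text
print(f"E = {expected}, chi2 = {chi2:.3f}, dof = {dof}")
print("H0 rejected" if chi2 > CHI2_TAB_005_DOF5 else "H0 accepted")
```

The statistic is exactly 202/1120 ≈ 0.18 on 5 degrees of freedom, so H₀ is accepted.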

5. Two researchers adopted different sampling techniques while investigating the same group of students to find the number of students falling in different intelligence levels. The results are as follows:
Researcher   Number of students in each level                 Total
             Below avg.   Average   Above avg.   Genius
X            86           60        44           10            200
Y            40           33        25           2             100
Total        126          93        69           12            300
Would you say that the sampling techniques adopted by the two researchers are significantly different?
Solution:
𝐻0 : There is no significant difference between the sampling techniques adopted by 𝑋 and 𝑌
𝐻1 : There is a significant difference between the sampling techniques adopted by 𝑋 and 𝑌
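Under the null hypothesis, the expected frequencies are

$E(86) = \frac{200 \times 126}{300} = 84$, $E(60) = \frac{200 \times 93}{300} = 62$, $E(44) = \frac{200 \times 69}{300} = 46$, $E(10) = \frac{200 \times 12}{300} = 8$

$E(40) = \frac{100 \times 126}{300} = 42$, $E(33) = \frac{100 \times 93}{300} = 31$, $E(25) = \frac{100 \times 69}{300} = 23$, $E(2) = \frac{100 \times 12}{300} = 4$

Since $E(2) = 4 < 5$, we pool this cell with the preceding cell of researcher Y (Above avg.), giving $O = 25 + 2 = 27$ and $E = 23 + 4 = 27$.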
O_i            E_i            (O_i − E_i)   (O_i − E_i)²   (O_i − E_i)²/E_i
86             84              2             4              0.0476
60             62             -2             4              0.0645
44             46             -2             4              0.0870
10             8               2             4              0.5
40             42             -2             4              0.0952
33             31              2             4              0.1290
25 + 2 = 27    23 + 4 = 27     0             0              0

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \approx 0.923$$

Tabulated $\chi^2_{0.05}$ with degrees of freedom $= (r - 1)(c - 1) - 1 = (2 - 1)(4 - 1) - 1 = 2$ is $5.991$ (one degree of freedom is lost due to pooling)

$\chi^2_{calc} < \chi^2_{0.05}(tab) \implies H_0$ is accepted,

i.e., there is no significant difference between the sampling techniques adopted by X and Y.
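A Python sketch of Problem 5, assuming the same pooling as the worked solution: researcher Y's Above avg. and Genius cells are merged and one extra degree of freedom is subtracted. The merge is hard-coded for this table, and 5.991 is the tabulated value quoted in the text.

```python
# Problem 5: intelligence levels recorded by researchers X and Y
table = [
    [86, 60, 44, 10],   # researcher X: Below avg., Average, Above avg., Genius
    [40, 33, 25, 2],    # researcher Y
]
row_totals = [sum(row) for row in table]            # 200, 100
col_totals = [sum(col) for col in zip(*table)]      # 126, 93, 69, 12
grand_total = sum(row_totals)                       # 300

# Expected frequency = row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pool Y's "Above avg." and "Genius" cells (expected 23 and 4), as in the text
obs_cells = table[0] + table[1][:2] + [table[1][2] + table[1][3]]
exp_cells = expected[0] + expected[1][:2] + [expected[1][2] + expected[1][3]]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_cells, exp_cells))
dof = (len(table) - 1) * (len(table[0]) - 1) - 1    # one df lost to pooling

CHI2_TAB_005_DOF2 = 5.991                           # tabulated value quoted in the text
print(f"chi2 = {chi2:.4f} with {dof} dof")
print("H0 rejected" if chi2 > CHI2_TAB_005_DOF2 else "H0 accepted")
```

It gives χ² ≈ 0.923 on 2 degrees of freedom, well below 5.991, so H₀ is accepted.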

Exercise

1. The following table gives the number of aircraft accidents that occurred on the various days of the week. Find whether the accidents are uniformly distributed over the week. (Ans: $\chi^2_{calc} = 4.167$, Accept $H_0$)
Days               Sun   Mon   Tues   Wed   Thurs   Fri   Sat
No. of accidents   14    16    8      12    11      9     14
2. Theory predicts that the proportions of beans in the four groups A, B, C and D should be 9 : 3 : 3 : 1. In an experiment with 1600 beans, the numbers in the four groups were 882, 313, 287 and 118. Does the experimental result support the theory? (Ans: $\chi^2_{calc} = 4.727$, Accept $H_0$)

3. Fit a Poisson distribution to the following data and test the goodness of fit.
X   0     1    2    3   4   5   6
f   275   72   30   7   5   2   1
(Ans: $\chi^2_{calc} = 40.9387$, Reject $H_0$)
4. Two sample polls of votes for two candidates A and B for a public office are taken. The results are given in the table below. Examine whether the nature of the area is related to voting preference in this election. (Ans: $\chi^2_{calc} = 10.0916$, Reject $H_0$)
Area    Votes for A   Votes for B
Rural   620           380
Urban   550           450
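The stated answers to Exercises 1, 2 and 4 can be checked with the short Python sketch below (`chi2_stat` is a hypothetical helper name). Exercise 3 is left out because its answer depends on how the small Poisson classes are pooled and how the expected frequencies are rounded. Any differences in the last decimal place against the printed answers come from rounding.

```python
def chi2_stat(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all classes or cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Exercise 1: accidents per weekday, H0: uniform over the 7 days
accidents = [14, 16, 8, 12, 11, 9, 14]
ex1 = chi2_stat(accidents, [sum(accidents) / 7] * 7)      # expected 12 per day

# Exercise 2: bean groups A, B, C, D, H0: ratio 9 : 3 : 3 : 1
beans = [882, 313, 287, 118]
ratio = [9, 3, 3, 1]
ex2 = chi2_stat(beans, [sum(beans) * r / sum(ratio) for r in ratio])

# Exercise 4: area (rural/urban) versus vote (A/B) as a 2 x 2 contingency table
votes = [[620, 380], [550, 450]]
rows = [sum(r) for r in votes]
cols = [sum(c) for c in zip(*votes)]
grand = sum(rows)
exp4 = [[r * c / grand for c in cols] for r in rows]      # row total * column total / grand total
ex4 = chi2_stat([x for r in votes for x in r], [x for r in exp4 for x in r])

for name, value in [("Exercise 1", ex1), ("Exercise 2", ex2), ("Exercise 4", ex4)]:
    print(f"{name}: chi2 = {value:.4f}")
```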
