Non-Parametric Tests and ANOVA
NON-PARAMETRIC TESTS
A non-parametric test is a test which is not concerned with testing the parameters of a
population. Non-parametric tests do not make any assumption regarding the form of the population.
Therefore, non-parametric tests are also called distribution-free tests.
The following are the important non-parametric tests:-
1. Chi-square test (χ²-test)
2. Sign test
3. Signed rank test (Wilcoxon matched pairs test)
4. Rank sum test (Mann-Whitney U-test and Kruskal-Wallis H-test)
5. Run test
6. Kolmogorov-Smirnov test (K-S test)
CHI-SQUARE TEST
The value of chi-square describes the magnitude of the difference between observed
frequencies and expected frequencies under certain assumptions. The χ² value (χ² quantity) ranges
from zero to infinity. It is zero when the expected frequencies and observed frequencies
completely coincide. The greater the value of χ², the greater the discrepancy between observed and
expected frequencies.
The χ²-test is a statistical test of the significance of the difference between observed
frequencies and the corresponding theoretical frequencies of a distribution, without any assumption
about the distribution of the population. It is one of the simplest and most widely used
non-parametric tests in statistical work. The test was developed by Prof. Karl Pearson in 1900.
Uses of χ²-test
The uses of the chi-square test are:-
1. Useful for the test of goodness of fit:- The χ²-test can be used to test whether there is
goodness of fit between the observed frequencies and expected frequencies.
2. Useful for the test of independence of attributes:- The χ²-test can be used to test whether two
attributes are associated or not.
3. Useful for the test of homogeneity:- The χ²-test is very useful to test whether two attributes
are homogeneous or not.
4. Useful for testing a given population variance:- The χ²-test can be used for testing whether the
given population variance is acceptable on the basis of samples drawn from that
population.
χ²-test as a test of goodness of fit:
As a non-parametric test, the χ²-test is mainly used to test the goodness of fit between the
observed frequencies and expected frequencies.
Procedure:-
1. Set up the null hypothesis that there is goodness of fit between observed and expected
frequencies.
2. Find the χ² value.
χ² = Σ (O − E)² / E
Where O = Observed frequencies
E = Expected frequencies
3. Find the degree of freedom.
d.f. = n – r – 1
Where n = Number of classes
r = Number of parameters estimated from the sample
4. Obtain the table value corresponding to the level of significance and degrees of freedom.
5. Decide whether to accept or reject the null hypothesis. If the calculated value is less than
the table value, we accept the null hypothesis and conclude that there is goodness of fit. If
the calculated value is more than the table value, we reject the null hypothesis and
conclude that there is no goodness of fit.
Qn:- A sample analysis of the examination results of 200 students was made. It was found that 46
students had failed, 68 secured IIIrd class, 62 IInd class and the rest were placed in the Ist
class. Are these figures commensurate with the general examination results, which are in
the ratio of 2 : 3 : 3 : 2 for the various categories respectively?
χ² = Σ (O − E)² / E
Computation of χ² value:
O      E                     O-E    (O-E)²   (O-E)²/E
46     2/10 × 200 = 40        6      36      0.9000
68     3/10 × 200 = 60        8      64      1.0667
62     3/10 × 200 = 60        2       4      0.0667
24     2/10 × 200 = 40      -16     256      6.4000
                                     Σ =     8.4334
χ² = 8.4334
The table value at 5% level of significance and 3 degrees of freedom = 7.815
(d.f. = n – r – 1 = 4 – 0 – 1 = 3)
As the calculated value is more than the table value, we reject H₀ and
conclude that the analytical figures are not commensurate with the general
examination result. In other words, there is no goodness of fit between the observed and expected
frequencies.
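The goodness-of-fit computation for the examination data can be checked with a few lines of Python. This is only a minimal sketch: the function name chi_square_gof and the variable names are illustrative, and the table value 7.815 is the one quoted in the solution.

```python
def chi_square_gof(observed, expected):
    """Return the chi-square statistic, the sum of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Examination data: failed, IIIrd class, IInd class, Ist class.
observed = [46, 68, 62, 200 - 46 - 68 - 62]   # "the rest" works out to 24
ratio = [2, 3, 3, 2]                          # general result ratio
total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]   # 40, 60, 60, 40

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 0 - 1        # n - r - 1 with r = 0 estimated parameters
TABLE_VALUE = 7.815               # 5% level, 3 d.f., as quoted in the text

print(f"chi-square = {chi2:.4f}, d.f. = {df}")
print("reject H0" if chi2 > TABLE_VALUE else "accept H0")
```

Summing the unrounded terms gives a total that differs from the hand-computed 8.4334 only in the last decimal place, since the hand table rounds each term before adding.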
Qn: Test whether the accidents occur uniformly over week days on the basis of the
following information:-
Days of the week: Sun Mon Tue Wed Thu Fri Sat
No. of accidents: 11 13 14 13 15 14 18
Sol: H₀: There is goodness of fit between observed and expected frequencies, i.e.,
accidents occur uniformly over week days.
H₁: There is no goodness of fit between observed and expected frequencies, i.e.,
accidents do not occur uniformly over week days.
χ² = Σ (O − E)² / E
Computation of χ² value:
O      E      O-E    (O-E)²   (O-E)²/E
11     14     -3      9       0.6429
13     14     -1      1       0.0714
14     14      0      0       0.0000
13     14     -1      1       0.0714
15     14      1      1       0.0714
14     14      0      0       0.0000
18     14      4     16       1.1429
                      Σ =     2.0000
χ²-test as a test of independence:
The χ²-test is used to find out whether two attributes are associated or not.
Procedure:-
1. Set up null and alternative hypotheses.
H₀: The two attributes are independent (i.e., there is no association between the
attributes)
H₁: The two attributes are dependent (i.e., there is an association between the
attributes)
2. Find the χ² value.
χ² = Σ (O − E)² / E
3. Find the degree of freedom.
d.f. = (r-1)(c-1)
Where r = Number of rows
c = Number of columns
4. Obtain the table value corresponding to the level of significance and degree of freedom.
5. Decide whether to accept or reject H₀. If the calculated value is less than the table
value, we accept H₀ and conclude that the attributes are independent. If the calculated value
is more than the table value, we reject H₀ and conclude that the attributes are dependent.
Qn: The following table gives data regarding election to an office:-
                              Economic Status
Attitude towards election     Rich    Poor    Total
Favourable                    50      155     205
Non favourable                90      110     200
Total                         140     265     405
Is attitude towards election influenced by the economic status of workers?
Sol: H₀: The two attributes, election and economic status, are independent.
H₁: The two attributes, election and economic status, are dependent.
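χ² = Σ (O − E)² / E
Observed frequencies are 50, 90, 155 and 110.
Expected frequencies (row total × column total ÷ grand total, rounded to whole numbers) are
71, 69, 134 and 131.
O      E      O-E    (O-E)²   (O-E)²/E
50     71     -21    441      6.21
90     69      21    441      6.39
155    134     21    441      3.29
110    131    -21    441      3.37
                     Σ =      19.26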
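The expected frequencies and the χ² value for a contingency table can be computed mechanically from the row and column totals. The sketch below assumes the table is supplied as a list of rows; the function name chi_square_contingency is illustrative. It keeps the expected frequencies unrounded, so for the election data it gives a value a little below the hand-computed 19.26, which uses expected frequencies rounded to whole numbers.

```python
def chi_square_contingency(table):
    """Return (chi2, df, expected) for an r x c table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected frequency of a cell = row total * column total / grand total.
    expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)   # (r-1)(c-1)
    return chi2, df, expected


# Election data: rows = Favourable, Non favourable; columns = Rich, Poor.
election = [[50, 155], [90, 110]]
chi2, df, expected = chi_square_contingency(election)
print(f"chi-square = {chi2:.2f}, d.f. = {df}")
for row in expected:
    print([round(e, 2) for e in row])
```

The same function works for tables with more than two rows or columns, because the degrees of freedom are derived from the table's shape.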
Qn: In a sample study about the tea habit in two towns, the following data were observed in a
sample of size 100 each:-
Town A:-
51 persons were male, 31 were tea drinkers and 19 were male tea drinkers.
Town B:-
46 persons were male, 17 were male tea drinkers and 26 were tea drinkers.
Is there any association between sex and tea habits?
If so, in which town is it greater?
Sol:- H₀: The two attributes, sex and tea habits, are independent.
H₁: The two attributes, sex and tea habits, are dependent.
Town A:-
2 × 2 contingency table of observed frequencies
                    Sex →
Tea habits ↓        Male    Female    Total
Tea Drinkers        19      12        31
Non-tea Drinkers    32      37        69
Total               51      49        100
Computation of expected frequencies (2 × 2 contingency table)
                    Sex →
Tea habits ↓        Male                  Female                Total
Tea Drinkers        (31 × 51)/100 = 16    (31 × 49)/100 = 15    31
Non-tea Drinkers    51 − 16 = 35          49 − 15 = 34          69
Total               51                    49                    100
Computation of χ² value:
O      E      O-E    (O-E)²   (O-E)²/E
19     16      3     9        0.5625
32     35     -3     9        0.2571
12     15     -3     9        0.6000
37     34      3     9        0.2647
                     Σ =      1.6843
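Town B:-
2 × 2 contingency table of observed frequencies
                    Sex →
Tea habits ↓        Male    Female    Total
Tea Drinkers        17      9         26
Non-tea Drinkers    29      45        74
Total               46      54        100
Computation of expected frequencies (2 × 2 contingency table)
                    Sex →
Tea habits ↓        Male                  Female                Total
Tea Drinkers        (26 × 46)/100 = 12    (26 × 54)/100 = 14    26
Non-tea Drinkers    46 − 12 = 34          54 − 14 = 40          74
Total               46                    54                    100
Computation of χ² value:
O      E      O-E    (O-E)²   (O-E)²/E
17     12      5     25       2.083
29     34     -5     25       0.735
9      14     -5     25       1.786
45     40      5     25       0.625
                     Σ =      5.229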
The table value of χ² at 5% level of significance for 1 degree of freedom is 3.84. As the
calculated value is more than the table value, we reject H₀. In other words, the attributes sex and
tea habits are not independent (i.e., they are associated) in Town B.
χ² = Σ (O − E)² / E
3. Find the degree of freedom
d.f. = (r-1)(c-1)
Qn: From the adult population of four large cities, random samples were selected and the number
of married and unmarried men were recorded:
              Cities
              A      B      C      D      Total
Married       137    164    152    147    600
Single        32     57     56     35     180
Total         169    221    208    182    780
Is there significant variation among the cities in the tendency of men to marry?
Sol:- H₀: The 4 cities are homogeneous.
H₁: The 4 cities are heterogeneous.
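Expected frequencies (row total × column total ÷ grand total):
              A      B      C      D      Total
Married       130    170    160    140    600
Single        39     51     48     42     180
Total         169    221    208    182    780
Computation of χ² value:
O      E      O-E    (O-E)²   (O-E)²/E
137    130     7     49       0.3769
164    170    -6     36       0.2118
152    160    -8     64       0.4000
147    140     7     49       0.3500
32     39     -7     49       1.2564
57     51      6     36       0.7059
56     48      8     64       1.3333
35     42     -7     49       1.1667
                     Σ =      5.8010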
The χ²-test can be used for testing a given population variance when the sample is small.
Steps:-
χ² = ns² / σ²
Where s² = Sample variance
σ² = Population variance
n = Sample size
χ² = 8.8999
a) When the number of matched pairs is less than or equal to 25.
Case: 1
Procedure:-
1. Set up null and alternative hypotheses:
H₀: There is no significant difference.
H₁: There is a significant difference.
2. Find the difference between each pair of values.
3. Assign ranks to the differences from the smallest to the largest without any regard to sign.
4. Then put the actual sign of each difference on the corresponding rank.
5. Find the total of positive ranks and the total of negative ranks.
6. The smaller of the two totals in step 5 is taken as the calculated value (T).
7. Obtain the table value from Wilcoxon's T-table.
8. Decide whether to accept or reject the null hypothesis.
Qn: Given below are 16 pairs of values showing the performance of two machines A
and B. Test whether there is a difference between the performances. The table value of
‘T’ at 5% significance level is 25.
A: 73, 43, 47, 53, 58, 47, 52, 58, 38, 61, 56, 56, 34, 55, 65, 75
B: 51, 41, 43, 41, 47, 32, 24, 58, 43, 53, 52, 57, 44, 57, 40, 68
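The ranking steps of the signed rank procedure can be sketched in Python for the machine data. Two conventions used here are assumptions, since the procedure above does not spell them out: pairs with zero difference are dropped, and tied absolute differences share the average of the ranks they occupy. The function name wilcoxon_t is illustrative.

```python
def wilcoxon_t(a, b):
    """Return (T, positive_rank_sum, negative_rank_sum, pairs_used)."""
    # Step 2: differences; zero differences are dropped (assumed convention).
    diffs = [x - y for x, y in zip(a, b) if x != y]
    # Step 3: rank by absolute size, smallest first.
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2   # ties share the average rank
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    # Steps 4 and 5: attach the signs and total each group.
    positive = sum(r for d, r in zip(ordered, ranks) if d > 0)
    negative = sum(r for d, r in zip(ordered, ranks) if d < 0)
    # Step 6: the smaller total is the calculated value T.
    return min(positive, negative), positive, negative, len(ordered)


A = [73, 43, 47, 53, 58, 47, 52, 58, 38, 61, 56, 56, 34, 55, 65, 75]
B = [51, 41, 43, 41, 47, 32, 24, 58, 43, 53, 52, 57, 44, 57, 40, 68]
TABLE_VALUE = 25   # given in the question

t, positive, negative, n = wilcoxon_t(A, B)
print(f"T = {t}, positive ranks = {positive}, negative ranks = {negative}, n = {n}")
# For this test H0 is rejected when T does not exceed the table value.
print("reject H0" if t <= TABLE_VALUE else "accept H0")
```

Note that the decision rule runs in the opposite direction to the χ²-test: a small T is evidence against the null hypothesis.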
U=
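SSC = (ΣX₁)²/N₁ + (ΣX₂)²/N₂ + … − T²/N
Here ΣX₁, ΣX₂, etc. denote the column totals, T the grand total and N the total number of
observations.
MSC = SSC / d.f. = SSC / (C − 1)
6. Compute MSE
MSE = SSE / d.f. = SSE / (N − C)
7. Compute F – ratio:
F = MSC / MSE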
8. Incorporate all these in an ANOVA TABLE as follows:
ANOVA TABLE
Source of Variation    Sum of Squares    Degree of freedom    Mean square          F - Ratio
Between Samples        SSC               C-1                  MSC = SSC/(C-1)      F = MSC/MSE
Within Samples         SSE               N-C                  MSE = SSE/(N-C)
Total                  SST               N-1
               Treatment
Plot name      1     2     3     4
A              42    48    68    80
B              50    66    52    94
C              62    68    76    78
D              34    78    64    82
E              52    70    70    66
Carry out an analysis of variance and state whether there is any significant difference in
treatments.
Sol: H₀: There is no significant difference in treatments.
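Column totals: ΣX₁ = 240, ΣX₂ = 330, ΣX₃ = 330, ΣX₄ = 400; T = 1,300; N = 20
Sums of squares: ΣX₁² = 11,968, ΣX₂² = 22,268, ΣX₃² = 22,100, ΣX₄² = 32,400
SST = Sum of squares of all observations − T²/N
= (11,968+22,268+22,100+32,400) − (1,300)²/20
= 88,736 − 1,690,000/20
= 88,736 − 84,500 = 4,236
SSC = (ΣX₁)²/N₁ + (ΣX₂)²/N₂ + (ΣX₃)²/N₃ + (ΣX₄)²/N₄ − T²/N
= (240)²/5 + (330)²/5 + (330)²/5 + (400)²/5 − 84,500
= 11,520+21,780+21,780+32,000 – 84,500
= 87,080 – 84,500 = 2,580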
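The one-way computation for the plot data can be reproduced with a short script. This is a sketch under stated assumptions: the function name one_way_anova is illustrative, SSE is obtained from the standard identity SSE = SST − SSC, and the 5% table value 3.24 for (3, 16) degrees of freedom comes from standard F-tables rather than from the text.

```python
def one_way_anova(columns):
    """columns: one list of observations per treatment. Returns a dict."""
    values = [x for col in columns for x in col]
    n_total = len(values)
    correction = sum(values) ** 2 / n_total                 # T^2 / N
    sst = sum(x * x for x in values) - correction
    ssc = sum(sum(col) ** 2 / len(col) for col in columns) - correction
    sse = sst - ssc                                         # standard identity
    df_between = len(columns) - 1                           # C - 1
    df_within = n_total - len(columns)                      # N - C
    msc = ssc / df_between
    mse = sse / df_within
    return {"SST": sst, "SSC": ssc, "SSE": sse, "MSC": msc, "MSE": mse,
            "F": msc / mse, "df": (df_between, df_within)}


# Yields of plots A to E under treatments 1 to 4.
treatments = [
    [42, 50, 62, 34, 52],
    [48, 66, 68, 78, 70],
    [68, 52, 76, 64, 70],
    [80, 94, 78, 82, 66],
]
result = one_way_anova(treatments)
F_TABLE = 3.24   # assumed: standard 5% table value for (3, 16) d.f.

for name in ("SST", "SSC", "SSE", "MSC", "MSE", "F"):
    print(f"{name} = {result[name]:.3f}")
print("reject H0" if result["F"] > F_TABLE else "accept H0")
```

The script reproduces SST = 4,236 and SSC = 2,580 and carries the calculation through to the F-ratio.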
ONE WAY ANOVA TABLE
Qn: The following data relate to the yield of 4 varieties of rice each sown on 5
plots. Find whether there is a significant difference between the mean yields
of these varieties.
               Treatment
Plot name      1      2      3      4
P              99     103    109    104
Q              101    102    103    100
R              103    100    107    103
S              99     105    97     107
T              98     95     99     106
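Sol: Apply coding method. Subtract 100 from all the observations.
X₁     X₂     X₃     X₄     X₁²    X₂²    X₃²    X₄²
(A)    (B)    (C)    (D)
-1     3      9      4      1      9      81     16
1      2      3      0      1      4      9      0
3      0      7      3      9      0      49     9
-1     5      -3     7      1      25     9      49
-2     -5     -1     6      4      25     1      36
ΣX₁ = 0   ΣX₂ = 5   ΣX₃ = 15   ΣX₄ = 20      16     63     149    110
SST = Sum of squares of all observations − T²/N
= (16+63+149+110) − (40)²/20
= 338 − 1,600/20
= 338 – 80 = 258
SSC = (ΣX₁)²/N₁ + (ΣX₂)²/N₂ + (ΣX₃)²/N₃ + (ΣX₄)²/N₄ − T²/N
= (0)²/5 + (5)²/5 + (15)²/5 + (20)²/5 − (40)²/20
= 0/5 + 25/5 + 225/5 + 400/5 − 80
= 0+5+45+80 – 80
= 50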
ONE WAY ANOVA TABLE
As the calculated value is less than the table value, we accept the null hypothesis.
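The coding method works because subtracting a constant from every observation changes neither the sums of squares nor the F-ratio. The sketch below checks this on the rice data; the helper name sums_of_squares is illustrative, SSE is taken as SST − SSC, and the raw values of the second and third plots are recovered by adding 100 to the coded values in the solution.

```python
def sums_of_squares(columns):
    """Return (SST, SSC, F) for a one-way layout given as a list of columns."""
    values = [x for col in columns for x in col]
    n = len(values)
    correction = sum(values) ** 2 / n                       # T^2 / N
    sst = sum(x * x for x in values) - correction
    ssc = sum(sum(col) ** 2 / len(col) for col in columns) - correction
    sse = sst - ssc                                         # standard identity
    f_ratio = (ssc / (len(columns) - 1)) / (sse / (n - len(columns)))
    return sst, ssc, f_ratio


# One list per variety (treatment), five plots each.
raw = [
    [99, 101, 103, 99, 98],
    [103, 102, 100, 105, 95],
    [109, 103, 107, 97, 99],
    [104, 100, 103, 107, 106],
]
coded = [[x - 100 for x in col] for col in raw]   # subtract 100 throughout

print("raw  :", sums_of_squares(raw))
print("coded:", sums_of_squares(coded))
```

Both calls return the same three numbers, which is why the smaller coded figures can safely be used for hand calculation.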
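2. Compute SST
SST = Sum of squares of all observations − T²/N
3. Compute SSC
SSC = (ΣX₁)²/N₁ + (ΣX₂)²/N₂ + … − T²/N
Here ΣX₁, ΣX₂, etc. denote the column totals
4. Compute SSR
SSR = (ΣX₁)²/N₁ + (ΣX₂)²/N₂ + … − T²/N
Here ΣX₁, ΣX₂, etc. denote the row totals
5. Compute SSE
SSE = SST − SSC − SSR
6. Compute MSC
MSC = SSC / d.f. = SSC / (c − 1)
7. Compute MSR
MSR = SSR / d.f. = SSR / (r − 1)
8. Compute MSE
MSE = SSE / d.f. = SSE / ((c − 1)(r − 1))
Where c = Number of columns and r = Number of rows
Fc = MSC / MSE
Fr = MSR / MSE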
Qn: Apply the technique of analysis of variance to the following data relating
to yields of 4 varieties of wheat in 3 blocks:
               Blocks
Varieties      X     Y     Z
A              10    9     8
B              7     7     6
C              8     5     4
D              5     4     4
Carry out a two-way analysis of variance.
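Sol:
Varieties    X     Y     Z     Total    X²     Y²     Z²     Total
A (X₁)       10    9     8     27       100    81     64     245
B (X₂)       7     7     6     20       49     49     36     134
C (X₃)       8     5     4     17       64     25     16     105
D (X₄)       5     4     4     13       25     16     16     57
Total        30    25    22    77       238    171    132    541
SST = Sum of squares of all observations − T²/N
= 541 − (77)²/12
= 541 − 5,929/12
= 541 − 494.083 = 46.917
SSC = (30)²/4 + (25)²/4 + (22)²/4 − T²/N
= 900/4 + 625/4 + 484/4 − 494.083
= 225+156.25+121 – 494.083
= 502.25 – 494.083 = 8.167
SSR = (ΣX₁)²/N₁ + (ΣX₂)²/N₂ + (ΣX₃)²/N₃ + (ΣX₄)²/N₄ − T²/N
= (27)²/3 + (20)²/3 + (17)²/3 + (13)²/3 − 494.083
= 729/3 + 400/3 + 289/3 + 169/3 − 494.083
= 243+133.333+96.333+56.333 – 494.083
= 34.916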
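The two-way sums of squares for the wheat data can be verified, and carried through to the two F-ratios, with the sketch below. It assumes one observation per cell, with rows as varieties and columns as blocks; the function name two_way_anova is illustrative, and SSE is taken as SST − SSC − SSR.

```python
def two_way_anova(table):
    """table: list of rows with one observation per cell. Returns a dict."""
    r, c = len(table), len(table[0])
    n = r * c
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / n                            # T^2 / N
    sst = sum(x * x for row in table for x in row) - correction
    ssc = sum(sum(col) ** 2 / r for col in zip(*table)) - correction  # columns
    ssr = sum(sum(row) ** 2 / c for row in table) - correction        # rows
    sse = sst - ssc - ssr
    msc = ssc / (c - 1)
    msr = ssr / (r - 1)
    mse = sse / ((c - 1) * (r - 1))
    return {"SST": sst, "SSC": ssc, "SSR": ssr, "SSE": sse,
            "Fc": msc / mse, "Fr": msr / mse}


# Rows = varieties A to D, columns = blocks X, Y, Z.
wheat = [
    [10, 9, 8],
    [7, 7, 6],
    [8, 5, 4],
    [5, 4, 4],
]
result = two_way_anova(wheat)
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```

Fc is then compared with the F-table value for (c − 1) and (c − 1)(r − 1) degrees of freedom, and Fr with the value for (r − 1) and (c − 1)(r − 1) degrees of freedom.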
               Machine Type
Workers        A     B     C     D
1              44    38    47    36
2              46    40    52    43
3              34    36    44    32
4              43    38    46    33
5              38    42    49    39
(a) Test whether the mean productivity is the same for the different machine types.
(b) Test whether the 5 workers differ with respect to mean productivity.
Let us apply the coding method and subtract 40 from all the observations.
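Workers    A     B     C     D     Total    A²     B²     C²     D²     Total
1 (X₁)     4     -2    7     -4    5        16     4      49     16     85
2 (X₂)     6     0     12    3     21       36     0      144    9      189
3 (X₃)     -6    -4    4     -8    -14      36     16     16     64     132
4 (X₄)     3     -2    6     -7    0        9      4      36     49     98
5 (X₅)     -2    2     9     -1    8        4      4      81     1      90
Total      5     -6    38    -17   20       101    28     326    139    594
SST = Sum of squares of all observations − T²/N
= 594 − (20)²/20 = 594 − 20 = 574
SSC = (ΣX₁)²/N₁ + (ΣX₂)²/N₂ + (ΣX₃)²/N₃ + (ΣX₄)²/N₄ − T²/N
= (5)²/5 + (-6)²/5 + (38)²/5 + (-17)²/5 − (20)²/20
= 25/5 + 36/5 + 1,444/5 + 289/5 − 20
= 1,794/5 − 20 = 358.8 − 20 = 338.8
SSR = (ΣX₁)²/N₁ + (ΣX₂)²/N₂ + (ΣX₃)²/N₃ + (ΣX₄)²/N₄ + (ΣX₅)²/N₅ − T²/N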