TESTS OF SIGNIFICANCE
(Small Samples)
6.0 Introduction:
In the previous chapter we have discussed problems relating to large samples. The
large sampling theory is based upon two important assumptions, namely:
(a) The random sampling distribution of a statistic is approximately normal, and
(b) the values given by the sample data are sufficiently close to the population
values and can be used in their place for the calculation of the standard error of
the estimate.
The above assumptions do not hold good in the theory of small samples. Thus, a
new technique is needed to deal with the theory of small samples. A sample is small when it
consists of less than 30 items (n < 30).
Since in many of the problems it becomes necessary to take a small size sample,
considerable attention has been paid in developing suitable tests for dealing with problems of
small samples. The greatest contribution to the theory of small samples is that of Sir William
Gosset and Prof. R.A. Fisher. Gosset published his discovery in 1908 under the pen name
'Student', and it was later developed and extended by Prof. R.A. Fisher. The test he gave is
popularly known as the 't-test'.
6.1.2 Properties of t- distribution:
2. Like the normal distribution, the t-distribution is also symmetrical and has mean zero.
4. As the sample size approaches 30, the t-distribution approaches the normal
distribution.
Suppose one is asked to write down any four numbers; then one is free to choose all four
numbers. If a restriction is imposed on the choice, namely that the sum of these numbers
should be 50, then one is free to select only three numbers, say 10, 15 and 20, and the
fourth number is fixed as 50 − (10 + 15 + 20) = 5. Thus our freedom of choice is reduced by one,
on the condition that the total be 50. Therefore the restriction placed on the freedom is one and
the degrees of freedom are three. As the restrictions increase, the freedom is reduced.
The number of independent variates which make up the statistic is known as the
degrees of freedom and is usually denoted by ν (the Greek letter nu).
For the Student's t-distribution the number of degrees of freedom is the sample size
minus one; it is denoted by ν = n − 1. The degrees of freedom play a very important role in the χ²
test of a hypothesis.
When we fit a distribution, the number of degrees of freedom is (n − k − 1), where n is the
number of observations and k is the number of parameters estimated from the data.
For example, when we fit a Poisson distribution the degrees of freedom are ν = n − 1 − 1, since one parameter (the mean) is estimated from the data.
In a contingency table the degrees of freedom are (r − 1)(c − 1), where r refers to the number
of rows and c refers to the number of columns.
In the case of data given in the form of a series of variables in a row or column, the
d.f. will be the number of observations in the series less one, i.e. ν = n − 1.
Critical value of t:
The column figures in the main body of the table come under the headings t0.100, t0.050,
t0.025, t0.010 and t0.005. The subscripts give the proportion of the distribution in the 'tail' area. Thus
for a two-tailed test at the 5% level of significance there will be two rejection areas, each containing
2.5% of the total area, and the required column is headed t0.025.
For example,
tν(0.05) for a two-tailed test = tν(0.025) for a single-tailed test
tν(0.01) for a two-tailed test = tν(0.005) for a single-tailed test
Thus for a one-tailed test at the 5% level the rejection area lies in one tail of the
distribution and the required column is headed t0.05.
[Figure: t-distribution curve with t = 0 at the centre and the rejection regions beyond −tα and +tα.]
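These table look-ups can also be checked numerically. The sketch below is only an illustration, assuming a Python environment with SciPy is available (the text itself relies on printed t-tables); the variable names are our own.

```python
# A minimal sketch (assuming SciPy is available) for checking critical values of t.
from scipy.stats import t

df = 9          # degrees of freedom, e.g. a sample of size n = 10
alpha = 0.05    # level of significance

# One-tailed test at 5%: the whole 5% rejection area lies in one tail,
# so the required column is t(0.05).
one_tailed = t.ppf(1 - alpha, df)

# Two-tailed test at 5%: 2.5% of the area in each tail,
# so the required column is t(0.025).
two_tailed = t.ppf(1 - alpha / 2, df)

print(round(one_tailed, 3))   # 1.833
print(round(two_tailed, 3))   # 2.262
```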
Some important applications of the t-distribution are:
(i) t-test for the significance of a single mean, the population variance being unknown.
(ii) t-test for the significance of the difference between two sample means, the population
variances being equal but unknown.
6.2 Test of significance for Mean:
We set up the corresponding null and alternative hypotheses as follows:
H0 : μ = μ0, i.e. there is no significant difference between the sample mean and the population mean.
H1 : μ ≠ μ0 (or μ > μ0 or μ < μ0)
Level of significance:
5% or 1%
Calculation of statistic:
Under H0 the test statistic is
t0 = (x̄ − μ) / (S/√n), where S² = [1/(n − 1)] ∑(x − x̄)²
Expected value:
te = (x̄ − μ) / (S/√n) ~ Student's t-distribution with (n − 1) d.f.
Inference:
If t0 ≤ te it falls in the acceptance region and the null hypothesis is accepted; if
t0 > te the null hypothesis H0 may be rejected at the given level of significance.
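For readers who want to verify such calculations, here is a minimal sketch of the above procedure in Python (the function name and the use of SciPy for the table value te are our own assumptions, not part of the text). It uses S² with divisor (n − 1), exactly as defined above, and illustrates the computation with the data of Example 1 below.

```python
import math
from scipy.stats import t   # used only to look up the table value t_e

def one_sample_t_test(sample, mu0, alpha=0.05):
    """t-test for the significance of a single mean, population variance unknown."""
    n = len(sample)
    x_bar = sum(sample) / n
    s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)   # S^2 with divisor (n - 1)
    t0 = (x_bar - mu0) / math.sqrt(s2 / n)                 # test statistic
    t_e = t.ppf(1 - alpha / 2, n - 1)                      # two-tailed table value
    return t0, t_e

# Data of Example 1 below, testing H0: mu = 50 at the 5% level
sample = [50, 49, 52, 44, 45, 48, 46, 45, 49, 45]
t0, t_e = one_sample_t_test(sample, mu0=50)
print(round(t0, 2), round(t_e, 3))   # -3.2 and 2.262: |t0| > t_e, so H0 is rejected
```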
Example 1:
A random sample of 10 items gave the following values. Test at the 5% level of significance whether the sample could have been drawn from a population with mean 50.
50 49 52 44 45 48 46 45 49 45
Solution:
Null hypothesis:
H0 : μ = 50, i.e. the sample has been drawn from a population with mean 50.
Alternative Hypothesis:
H1 : μ ≠ 50 (two-tailed)
Level of Significance:
Let α = 0.05
X d = x – 48 d2
50 2 4
49 1 1
52 4 16
44 –4 16
45 –3 9
48 0 0
46 –2 4
45 –3 9
49 +1 1
45 –3 9
Total –7 69
x̄ = A + ∑d/n = 48 + (−7/10) = 48 − 0.7 = 47.3

S² = [1/(n − 1)] [∑d² − (∑d)²/n] = (1/9) [69 − (−7)²/10] = 64.1/9 = 7.12
Calculation of Statistic:
Under H0 the test statistic is :
t0 = |x̄ − μ| / √(S²/n)
= |47.3 − 50.0| / √(7.12/10)
= 2.7 / √0.712 = 2.7/0.844 = 3.2
Expected value:
te = (x̄ − μ)/(S/√n) follows t-distribution with (10 − 1) = 9 d.f.
= 2.262
Inference:
Since t0 > te, the null hypothesis H0 is rejected at the 5% level of significance, and we conclude that the sample mean differs significantly from the assumed population mean of 50.
Example 2:
A sample of size 26 gave a mean of 147 dozens with a standard deviation of 16. Test at the 5% level of significance whether the population mean can be taken as 140 dozens, against the alternative that it is greater than 140 dozens.
Solution:
We are given
n = 26; x̄ = 147 dozens; s = 16
Null hypothesis:
H0 : μ = 140
Alternative Hypothesis:
H1 : μ > 140 (one-tailed)
Calculation of statistic:
t0 = (x̄ − μ) / (s/√(n − 1))
= (147 − 140) / (16/√25) = (7 × 5)/16 = 2.19
Expected value:
te = (x̄ − μ) / (s/√(n − 1)) follows t-distribution with (26 − 1) = 25 d.f.
= 1.708
Inference:
Since t0 > te, H0 is rejected at the 5% level of significance, and we conclude that the population mean is significantly greater than 140 dozens.
6.3 Test of significance for the difference between two means:
Suppose we want to test if two independent samples have been drawn from two
normal populations having the same means, the population variances being equal. Let x1,
x2, …, xn1 and y1, y2, …, yn2 be two independent random samples from the given normal
populations.
Null hypothesis:
H0 : μ1 = μ2 i.e. the samples have been drawn from the normal populations with same means.
Alternative Hypothesis:
H1 : μ1 ≠ μ2 (μ1 < μ2 or μ1 > μ2)
Test statistic:
Under H0, the test statistic is
t0 = (x̄ − ȳ) / [S √(1/n1 + 1/n2)]
where S² = [1/(n1 + n2 − 2)] [∑(x − x̄)² + ∑(y − ȳ)²] = (n1s1² + n2s2²)/(n1 + n2 − 2)
Expected value:
te = (x̄ − ȳ) / [S √(1/n1 + 1/n2)] follows t-distribution with (n1 + n2 − 2) d.f.
Inference:
If t0 < te we accept the null hypothesis; if t0 > te we reject the null hypothesis.
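A hedged sketch of this two-sample procedure is given below (again assuming Python with SciPy; the helper name is hypothetical). It pools the two samples into S² exactly as in the formula above and is illustrated with the data of Example 3, which follows.

```python
import math
from scipy.stats import t

def pooled_two_sample_t(x, y):
    """t0 for the difference of two means, with pooled variance S^2 (d.f. = n1 + n2 - 2)."""
    n1, n2 = len(x), len(y)
    x_bar, y_bar = sum(x) / n1, sum(y) / n2
    s2 = (sum((v - x_bar) ** 2 for v in x) +
          sum((v - y_bar) ** 2 for v in y)) / (n1 + n2 - 2)
    t0 = (x_bar - y_bar) / math.sqrt(s2 * (1 / n1 + 1 / n2))
    return t0, n1 + n2 - 2

# Data of Example 3 below (medicines A and B), one-tailed test at the 5% level
a = [42, 39, 48, 60, 41]
b = [38, 42, 56, 64, 68, 69, 62]
t0, df = pooled_two_sample_t(a, b)
t_e = t.ppf(1 - 0.05, df)            # one-tailed table value, 10 d.f.
print(round(t0, 2), round(t_e, 3))   # -1.7 and 1.812: |t0| < t_e, so H0 is accepted
```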
Example 3:
A group of 5 patients treated with medicine 'A' weigh 42, 39, 48, 60 and 41 kgs. A
second group of 7 patients from the same hospital treated with medicine 'B' weigh 38, 42,
56, 64, 68, 69 and 62 kgs. Do you agree with the claim that medicine 'B' increases the weight
significantly?
Solution:
Let the weights (in kgs) of the patients treated with medicines A and B be denoted by
variables X and Y respectively.
Null hypothesis:
H0 : μ1 = μ2
i.e. There is no significant difference between the medicines A and B as regards their effect on
increase in weight.
Alternative Hypothesis:
H1 : μ1 < μ2 (one-tailed), i.e. medicine B increases the weight significantly.
Medicine A
X        x − x̄ (x̄ = 46)        (x − x̄)²
42 –4 16
39 –7 49
48 2 4
60 14 196
41 –5 25
230 0 290
x̄ = ∑x/n1 = 230/5 = 46
Medicine B
Y        y − ȳ (ȳ = 57)        (y − ȳ)²
38 – 19 361
42 – 15 225
56 –1 1
64 7 49
68 11 121
69 12 144
62 5 25
399 0 926
ȳ = ∑y/n2 = 399/7 = 57

S² = [1/(n1 + n2 − 2)] [∑(x − x̄)² + ∑(y − ȳ)²]
= (1/10) [290 + 926] = 121.6
Calculation of statistic:
Under H0 the test statistic is
t0 = (x̄ − ȳ) / √[S² (1/n1 + 1/n2)]
|t0| = |46 − 57| / √[121.6 (1/5 + 1/7)]
= 11 / √(121.6 × 12/35)
= 11/6.46 = 1.7
Expected value:
te = (x̄ − ȳ)/√[S²(1/n1 + 1/n2)] follows t-distribution with (5 + 7 − 2) = 10 d.f.
= 1.812 (at the 5% level, one-tailed)
Inference:
Since t0 < te it is not significant. Hence H0 is accepted and we conclude that the
medicines A and B do not differ significantly as regards their effect on increase in weight.
Example 4:
Two types of batteries are tested for their length of life and the following data are
obtained:
Solution:
We are given, for the two types of batteries, n1 = 9, x̄ = 600, s1² = 121 and n2 = 8, ȳ = 640, s2² = 144.
Null hypothesis:
H0 : μ1 = μ2 i.e. Two types of batteries A and B are identical i.e. there is no significant
difference between two types of batteries.
Alternative Hypothesis:
H1 : μ1 ≠ μ2 (Two- tailed)
Level of Significance:
Let α = 5%
Calculation of statistic:
t0 = (x̄ − ȳ) / √[S² (1/n1 + 1/n2)]
where S² = (n1s1² + n2s2²)/(n1 + n2 − 2) = (9 × 121 + 8 × 144)/(9 + 8 − 2) = 2241/15 = 149.4
∴ |t0| = |600 − 640| / √[149.4 (1/9 + 1/8)]
= 40 / √(149.4 × 17/72)
= 40/5.9391 = 6.735
Expected value:
te = (x̄ − ȳ)/√[S²(1/n1 + 1/n2)] follows t-distribution with 9 + 8 − 2 = 15 d.f.
= 2.131
Inference:
Since t0 > te it is highly significant. Hence H0 is rejected and we conclude that the two
types of batteries differ significantly as regards their length of life.
In the t-test for difference of means, the two samples were independent of each other.
Let us now take a particular situation where
(i) the sample sizes are equal, i.e. n1 = n2 = n (say), and
(ii) the sample observations (x1, x2, …, xn) and (y1, y2, …, yn) are not
completely independent but are dependent in pairs.
That is, we make two observations on the same individual, one before the treatment and
another after the treatment. For example, a business concern may want to find out whether a
particular medium of promoting the sales of a product, say door-to-door canvassing, advertisement
in papers or advertisement through T.V., is really effective. Similarly, a pharmaceutical company
may want to test the efficiency of a particular drug, say for inducing sleep, after the drug is given.
Testing such claims gives rise to the situations (i) and (ii) above, and we then apply the paired t-test.
Paired t-test:
Let di = Xi – Yi (i = 1, 2, ……n) denote the difference in the observations for the ith unit.
Null hypothesis:
H0 : μ1 = μ2, i.e. the mean of the differences is zero.
Alternative Hypothesis:
H1 : μ1 ≠ μ2 (two-tailed)
Calculation of statistic:
Under H0 the test statistic is
t0 = d̄ / (S/√n)
where d̄ = ∑d/n and S² = [1/(n − 1)] ∑(d − d̄)² = [1/(n − 1)] [∑d² − (∑d)²/n]
Expected value:
te = d̄/(S/√n) follows t-distribution with (n − 1) d.f.
Inference:
If t0 < te the null hypothesis is accepted; if t0 > te it is rejected at the chosen level of significance.
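The paired procedure can be sketched in the same hedged way (Python with SciPy assumed; the function name is our own). The differences di are formed first and then treated as a single sample, as described above; the data of Example 6 further below are used for illustration.

```python
import math
from scipy.stats import t

def paired_t_test(x, y, alpha=0.05):
    """Paired t-test on the differences d_i = x_i - y_i."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    d_bar = sum(d) / n
    s2 = (sum(di ** 2 for di in d) - sum(d) ** 2 / n) / (n - 1)
    t0 = d_bar / math.sqrt(s2 / n)
    t_e = t.ppf(1 - alpha / 2, n - 1)      # two-tailed table value
    return t0, t_e

# Data of Example 6 further below: IQ before and after training, at the 1% level
before = [110, 120, 123, 132, 125]
after  = [120, 118, 125, 136, 121]
t0, t_e = paired_t_test(before, after, alpha=0.01)
print(round(t0, 3), round(t_e, 3))   # -0.816 and 4.604: |t0| < t_e, so H0 is accepted
```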
Example 5:
To test the desirability of a certain modification in typists' desks, 9 typists were given
two tests of as nearly as possible the same nature, one on the desk in use and the other on the
new type.
The following differences in the number of words typed per minute were recorded:
Typists A B C D E F G H I
Increase in number of words 2 4 0 3 –1 4 –3 2 5
Solution:
Null hypothesis:
H0 : μ1 = μ2, i.e. the modification in the desks does not promote speed in typing.
Alternative Hypothesis:
H1 : μ1 < μ2, i.e. the modification promotes speed in typing (one-tailed).
Level of significance:
Let α = 0.05
Typist d d2
A 2 4
B 4 16
C 0 0
D 3 9
E –1 1
F 4 16
G –3 9
H 2 4
I 5 25
Total        ∑d = 16        ∑d² = 84
d̄ = ∑d/n = 16/9 = 1.778

S = √{[1/(n − 1)] [∑d² − (∑d)²/n]} = √{(1/8) [84 − (16)²/9]} = √6.94 = 2.635
Calculation of statistic:
Under H0 the test statistic is
t0 = d̄√n / S = (1.778 × 3)/2.635 = 2.024
Expected value:
te = d̄√n/S follows t-distribution with 9 − 1 = 8 d.f.
= 1.860
Inference:
Since t0 > te, the null hypothesis is rejected at the 5% level of significance. The data
indicate that the modification in the desks promotes speed in typing.
Example 6:
An IQ test was administered to 5 persons before and after they were trained. The
results are given below:
Candidates I II III IV V
IQ before training 110 120 123 132 125
IQ after training 120 118 125 136 121
Test whether there is any change in IQ after the training programme (test at 1% level
of significance)
Solution:
Null hypothesis:
H0 : μ1 = μ2 i.e. there is no significant change in IQ after the training programme.
Alternative Hypothesis:
H1 : μ1 ≠ μ2 (two tailed test)
Level of significance :
α = 0.01
x 110 120 123 132 125 Total
y 120 118 125 136 121 -
d=x–y – 10 2 –2 –4 4 – 10
d2 100 4 4 16 16 140
d̄ = ∑d/n = −10/5 = −2

S² = [1/(n − 1)] [∑d² − (∑d)²/n] = (1/4) [140 − (−10)²/5] = (1/4)(120) = 30
Calculation of Statistic:
Under H0 the test statistic is
t0 = |d̄| / √(S²/n) = |−2| / √(30/5) = 2/2.449 = 0.816
Expected value:
te = d̄/(S/√n) follows t-distribution with 5 − 1 = 4 d.f.
= 4.604
Inference:
Since |t0| < te, we accept the null hypothesis H0 at the 1% level of significance and conclude that there is no significant change in IQ after the training programme.
6.4 Chi-square test:
6.4.1 Definition:
The Chi- square (χ2) test (Chi-pronounced as ki) is one of the simplest and most
widely used non-parametric tests in statistical work. The χ2 test was first used by Karl Pearson
in the year 1900. The quantity χ2 describes the magnitude of the discrepancy between theory
and observation. It is defined as
χ² = ∑ (Oi − Ei)² / Ei, the summation extending over i = 1, 2, …, n
Where O refers to the observed frequencies and E refers to the expected frequencies.
Note:
If χ2 is zero, it means that the observed and expected frequencies coincide with each
other. The greater the discrepancy between the observed and expected frequencies the greater
is the value of χ2.
3. The median of χ2 distribution divides, the area of the curve into two equal parts,
each part being 0.5.
4. The mode of χ2 distribution is equal to (n-2)
5. Since Chi-square values are always positive, the Chi-square curve is always positively
skewed.
6. Since Chi-square values increase with the increase in the degrees of freedom, there
is a new Chi-square distribution with every increase in the number of degrees of
freedom.
7. The lowest value of Chi-square is zero and the highest value is infinity ie χ2 ≥ 0.
8. If χ1² and χ2² are two independent χ² variates with n1 and n2 degrees of freedom
respectively, then their sum χ1² + χ2² follows the χ² distribution with (n1 + n2)
degrees of freedom.
9. When n (d.f.) > 30, the distribution of √(2χ²) approximately follows the normal
distribution with mean √(2n − 1) and standard deviation one.
Conditions for the validity of the χ² test:
1. N, the total frequency, should be reasonably large, say greater than 50.
2. No theoretical (expected) cell frequency should be too small; if any expected
frequency is less than 5, it should be pooled with an adjacent class so that the
combined frequency is not less than 5.
3. Each of the observations which make up the sample for this test must be
independent of each other.
4. The χ² test is wholly dependent on degrees of freedom.
Test of goodness of fit:
The test statistic
χ0² = ∑ (Oi − Ei)² / Ei
follows the χ²-distribution with ν = n − k − 1 d.f., where O1, O2, …, On are the observed
frequencies, E1, E2, …, En the corresponding expected frequencies and k is the number of
parameters estimated from the given data. The test is carried out by comparing the computed
value with the table value of χ² for the desired degrees of freedom.
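As a rough illustration of this goodness-of-fit procedure, the following Python sketch (the function name, the made-up die-rolling data and the use of SciPy for the table value are all our own assumptions) computes χ0² and the corresponding table value with n − k − 1 degrees of freedom.

```python
from scipy.stats import chi2

def chi_square_goodness_of_fit(observed, expected, k=0, alpha=0.05):
    """chi0^2 = sum (O - E)^2 / E, with n - k - 1 degrees of freedom."""
    chi0 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - k - 1          # k parameters estimated from the data
    chi_e = chi2.ppf(1 - alpha, df)     # table value at the chosen level
    return chi0, df, chi_e

# Hypothetical example: 60 throws of a die, expected frequency 10 for each face
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6
print(chi_square_goodness_of_fit(observed, expected))   # (1.0, 5, 11.07...)
```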
Example 7:
Four coins are tossed simultaneously and the number of heads occurring at each throw
was noted. This was repeated 240 times with the following results.
No. of heads 0 1 2 3 4
No. of throws 13 64 85 58 20
Fit a Binomial distribution under the hypothesis that the coins are unbiased and test the goodness of fit.
Solution:
Null Hypothesis:
H0 : The given data fits the Binomial distribution. i.e the coins are unbiased.
p = q = 1/2, n = 4, N = 240
Computation of expected frequencies:
Under H0 the expected frequencies are the successive terms of 240 (1/2 + 1/2)^4, i.e.
E(x) = 240 × 4Cx (1/2)^4 for x = 0, 1, 2, 3, 4, which gives 15, 60, 90, 60 and 15.

Computation of statistic:

O        E        (O − E)²        (O − E)²/E
13       15        4              0.27
64       60       16              0.27
85       90       25              0.28
58       60        4              0.07
20       15       25              1.67
240      240                      2.56
χ0² = ∑ (O − E)²/E = 2.56
Expected Value:
χe² follows the χ²-distribution with (5 − 1) = 4 d.f.; the table value at the 5% level of significance is 9.488.
Inference:
Since χ02 < χe2 we accept our null hypothesis at 5% level of significance and say that the
given data fits Binomial distribution.
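The expected frequencies and χ0² of Example 7 can be reproduced with the small Python sketch below (our own illustration, not part of the text; the small difference from 2.56 comes from the rounding of each term in the table above).

```python
from math import comb

# Expected frequencies under H0: E(x) = 240 * 4Cx * (1/2)^4, x = 0, 1, 2, 3, 4
N, n, p = 240, 4, 0.5
expected = [N * comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
print(expected)                     # [15.0, 60.0, 90.0, 60.0, 15.0]

observed = [13, 64, 85, 58, 20]
chi0 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi0, 2))               # 2.54 (2.56 in the table above, due to rounding)
```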
Example 8:
The following table shows the distribution of the number of goals scored in football matches. Fit a Poisson distribution and test the goodness of fit at the 5% level of significance.
No. of goals 0 1 2 3 4 5 6 7
No. of matches 95 158 108 63 40 9 5 2
Solution:
Null Hypothesis:
H0 : The given data fits the Poisson distribution.
Level of significance :
Let α = 0.05
Computation of expected frequencies:
The mean of the given distribution is m = ∑fx / N = 812 / 480 = 1.7 (approximately), so
f(0) = N e^(−m) = 480 × e^(−1.7) = 87.84.
The other expected frequencies are obtained by using the recurrence formula
f(x + 1) = [m / (x + 1)] × f(x)
Putting x = 0, 1, 2, … we obtain the following frequencies.
f(1) = 1.7 × 87.84 = 149.328
f(2) = (1.7/2) × 149.328 = 126.93
f(3) = (1.7/3) × 126.93 = 71.927
f(4) = (1.7/4) × 71.927 = 30.569
f(5) = (1.7/5) × 30.569 = 10.393
f(6) = (1.7/6) × 10.393 = 2.94
f(7) = (1.7/7) × 2.94 = 0.719
Computation of statistic:

O                   E                   (O − E)²        (O − E)²/E
95                  88                  49              0.56
158                 150                 64              0.43
108                 126                 324             2.57
63                  72                  81              1.13
40                  30                  100             3.33
9 + 5 + 2 = 16      10 + 3 + 1 = 14     4               0.29
Total                                                   8.31
χ0² = ∑ (O − E)²/E = 8.31
Expected Value:
χe² follows the χ²-distribution with (6 − 1 − 1) = 4 d.f. (six classes after pooling, with one parameter, the mean, estimated from the data); the table value at the 5% level of significance is 9.488.
Inference:
Since χ0² < χe², we accept our null hypothesis at the 5% level of significance and say that
the given data fits the Poisson distribution.
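The fitting in Example 8 can be reproduced with the short Python sketch below (our own illustration; the pooling of the last three classes follows the table above, and the small difference from 8.31 comes from using unrounded expected frequencies).

```python
import math

goals   = list(range(8))
matches = [95, 158, 108, 63, 40, 9, 5, 2]

N = sum(matches)                                               # 480 matches
m = round(sum(x * f for x, f in zip(goals, matches)) / N, 1)   # mean = 812/480, taken as 1.7

# f(0) = N e^(-m); the rest follow from the recurrence f(x + 1) = m/(x + 1) * f(x)
expected = [N * math.exp(-m)]
for x in range(len(goals) - 1):
    expected.append(m / (x + 1) * expected[x])

# Pool the last three classes, as in the table above
obs = matches[:5] + [sum(matches[5:])]
exp = expected[:5] + [sum(expected[5:])]

chi0 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
print(round(chi0, 2))    # about 8.21 (8.31 in the text, which rounds the E's to whole numbers)
```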
Test of independence of attributes:
Suppose the N observations are classified according to two attributes, A into r classes
A1, A2, …, Ar and B into c classes B1, B2, …, Bc. The observed frequencies can be arranged
in the following r × c contingency table:

         B1        B2        …    Bj        …    Bc        Total
A1       (A1B1)    (A1B2)    …    (A1Bj)    …    (A1Bc)    (A1)
A2       (A2B1)    (A2B2)    …    (A2Bj)    …    (A2Bc)    (A2)
…        …         …         …    …         …    …         …
Ai       (AiB1)    (AiB2)    …    (AiBj)    …    (AiBc)    (Ai)
…        …         …         …    …         …    …         …
Ar       (ArB1)    (ArB2)    …    (ArBj)    …    (ArBc)    (Ar)
Total    (B1)      (B2)      …    (Bj)      …    (Bc)      ∑(Ai) = ∑(Bj) = N
(Ai) is the number of persons possessing the attribute Ai (i = 1, 2, …, r), (Bj) is the
number of persons possessing the attribute Bj (j = 1, 2, …, c) and (AiBj) is the number of
persons possessing both the attributes Ai and Bj (i = 1, 2, …, r; j = 1, 2, …, c).
Under the null hypothesis that the two attributes A and B are independent, the
expected frequency for (AiBj) is given by
E(AiBj) = (Ai)(Bj) / N
Calculation of statistic:
χ0² = ∑ (Oi − Ei)² / Ei
Expected value:
χe² follows the χ²-distribution with (r − 1)(c − 1) d.f.
Inference:
Now, comparing χ0² with χe² at a certain level of significance, we reject or accept the
null hypothesis accordingly at that level of significance.
Under the null hypothesis of independence of attributes, the value of χ2 for the 2 × 2
contingency table
                                Total
    a           b               a + b
    c           d               c + d
Total: a + c    b + d           N

is given by
χ0² = N (ad − bc)² / [(a + c) (b + d) (a + b) (c + d)]
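The 2 × 2 shortcut formula lends itself to a one-line computation; the sketch below (our own illustration, with SciPy assumed for the table value) applies it to the figures of Example 9, which follows.

```python
from scipy.stats import chi2

def chi_square_2x2(a, b, c, d, alpha=0.05):
    """Shortcut chi-square for a 2x2 contingency table, 1 degree of freedom."""
    N = a + b + c + d
    chi0 = N * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))
    chi_e = chi2.ppf(1 - alpha, 1)       # table value with (2 - 1)(2 - 1) = 1 d.f.
    return chi0, chi_e

# Figures of Example 9 below: I.Q. versus economic condition
chi0, chi_e = chi_square_2x2(460, 140, 240, 160)
print(round(chi0, 3), round(chi_e, 3))   # 31.746 and 3.841: chi0 > chi_e, H0 rejected
```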
Example 9:
1000 students at college level were graded according to their I.Q. and the economic
conditions of their homes. Use χ2 test to find out whether there is any association between
economic condition at home and I.Q.
                              I.Q.
Economic Conditions      High      Low      Total
Rich 460 140 600
Poor 240 160 400
Total 700 300 1000
Solution:
Null Hypothesis:
There is no association between economic condition at home and I.Q. i.e. they are
independent.
E11 = (A)(B)/N = (600 × 700)/1000 = 420

The table of expected frequencies is:

            High      Low      Total
Rich        420       180      600
Poor        280       120      400
Total       700       300      1000
χ0² = ∑ (O − E)²/E = 31.746
Expected Value:
χe² follows the χ²-distribution with (2 − 1)(2 − 1) = 1 d.f.; the table value at the 5% level of significance is 3.841.
Inference:
χ02 > χe2, hence the hypothesis is rejected at 5% level of significance. ∴ There is
association between economic condition at home and I.Q.
Example 10:
Out of a sample of 120 persons in a village, 76 persons were administered a new drug
for preventing influenza and out of them, 24 persons were attacked by influenza. Out of those
who were not administered the new drug, 12 persons were not affected by influenza. Prepare:
(i) a 2 × 2 table showing the attack of influenza and the administration of the new drug, and
(ii) use the chi-square test for finding out whether the new drug is effective or not.
Solution:
Effect of Influenza

New drug              Attacked            Not attacked         Total
Administered          24                  76 − 24 = 52         76
Not administered      44 − 12 = 32        12                   120 − 76 = 44
Total                 24 + 32 = 56        52 + 12 = 64         120
Null hypothesis:
‘Attack of influenza’ and the administration of the new drug are independent.
Computation of statistic:
χ0² = N (ad − bc)² / [(a + c)(b + d)(a + b)(c + d)]
= 120 (24 × 12 − 52 × 32)² / (56 × 64 × 76 × 44)
= 120 (−1376)² / (56 × 64 × 76 × 44)
= Antilog [log 120 + 2 log 1376 − (log 56 + log 64 + log 76 + log 44)]
= Antilog (1.2777) = 18.95
Expected Value:
χe² follows the χ²-distribution with (2 − 1)(2 − 1) = 1 d.f.; the table value at the 5% level of significance is 3.841.
Inference:
Since χ02 > χe2 H0 is rejected at 5 % level of significance. Hence we conclude that the
new drug is definitely effective in controlling (preventing) the disease (influenza).
Example 11:
Two researchers adopted different sampling techniques while investigating the same
group of students to find the number of students falling in different intelligence levels. The
results are as follows
                            No. of students
Researchers    Below average    Average    Above average    Genius    Total
X 86 60 44 10 200
Y 40 33 25 2 100
Total 126 93 69 12 300
Would you say that the sampling techniques adopted by the two researchers are independent?
Solution:
Null Hypothesis:
H0 : The sampling techniques adopted by the two researchers are independent of the intelligence levels of the students, i.e. they do not differ significantly.
The expected frequencies, obtained as E = (row total × column total)/N, are:

             Below average     Average         Above average     Genius           Total
X            84                62              46                200 − 192 = 8    200
Y            126 − 84 = 42     93 − 62 = 31    69 − 46 = 23      12 − 8 = 4       100
Total        126               93              69                12               300
Computation of statistic:

O                E                O − E      (O − E)²     (O − E)²/E
86               84               2          4            0.048
60               62               −2         4            0.064
44               46               −2         4            0.087
10               8                2          4            0.500
40               42               −2         4            0.095
33               31               2          4            0.129
25 + 2 = 27      23 + 4 = 27      0          0            0.000
300              300              0                       0.923

χ0² = ∑ (O − E)²/E = 0.923
Expected Value:
χe² = ∑ (O − E)²/E follows the χ²-distribution with (4 − 1)(2 − 1) − 1 = 3 − 1 = 2 d.f.
(one degree of freedom is lost on account of pooling)
= 5.991
Inference:
Since χ02 < χe2, we accept the null hypothesis at 5 % level of significance. Hence we
conclude that the sampling techniques by the two investigators, do not differ significantly.
Null Hypothesis:
H0 : σ² = σ0², if x1, x2, …, xn