
6. TESTS OF SIGNIFICANCE (Small Samples)

6.0 Introduction:
In the previous chapter we discussed problems relating to large samples. Large-sample theory rests on two important assumptions:

(a) The random sampling distribution of a statistic is approximately normal and

(b) The values given by the sample data are sufficiently close to the population
values and can be used in their place for the calculation of the standard error of
the estimate.

These assumptions do not hold good in the theory of small samples, so a new technique is needed to deal with them. A sample is small when it consists of fewer than 30 items (n < 30).

Since in many problems it becomes necessary to take a small sample, considerable attention has been paid to developing suitable tests for dealing with problems of small samples. The greatest contributions to the theory of small samples are those of William S. Gosset and Prof. R. A. Fisher. Gosset published his discovery in 1908 under the pen name 'Student', and it was later developed and extended by Prof. R. A. Fisher. He gave a test popularly known as the 't-test'.

6.1 t-statistic definition:

If x₁, x₂, …, xₙ is a random sample of size n from a normal population with mean μ and variance σ², then Student's t-statistic is defined as

$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}$$

where

$$\bar{x} = \frac{\sum x}{n}$$

is the sample mean and

$$S^2 = \frac{1}{n-1}\sum (x - \bar{x})^2$$

is an unbiased estimate of the population variance σ². It follows Student's t-distribution with ν = n − 1 d.f.
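
The computation is easy to mechanize. The following is a minimal sketch, assuming Python with numpy (the text itself gives no code), that mirrors the definition above using the unbiased estimate S².

```python
import numpy as np

def t_statistic(x, mu0):
    """Student's t for a single sample, using the unbiased variance S^2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()                          # sample mean
    s2 = ((x - xbar) ** 2).sum() / (n - 1)   # unbiased estimate of sigma^2
    return (xbar - mu0) / np.sqrt(s2 / n)    # follows t with n - 1 d.f. under H0
```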

6.1.1 Assumptions for students t-test:

1. The parent population from which the sample is drawn is normal.

2. The sample observations are random and independent.

3. The population standard deviation σ is not known.

6.1.2 Properties of t- distribution:

1. The t-distribution ranges from −∞ to +∞, just as does a normal distribution.

2. Like the normal distribution, the t-distribution is also symmetrical and has mean zero.

3. The t-distribution has a greater dispersion than the standard normal distribution.

4. As the sample size approaches 30, the t-distribution approaches the normal distribution.

(Figure: comparison between the normal curve and the corresponding t-curve.)

6.1.3 Degrees of freedom (d.f):

Suppose we are asked to write down any four numbers; we are free to choose all four. If a restriction is imposed on the choice, say that the sum of these numbers should be 50, we are free to choose only three numbers, say 10, 15 and 20, and the fourth number is fixed as 50 − (10 + 15 + 20) = 5. Our freedom of choice is thus reduced by one by the condition that the total be 50: the restriction placed on the freedom is one and the degrees of freedom are three. As the restrictions increase, the freedom is reduced.

The number of independent variates which make up the statistic is known as the degrees of freedom, usually denoted by ν (nu).

The number of degrees of freedom for n observations is n − k, where k is the number of independent linear constraints imposed upon them.

For Student's t-distribution the number of degrees of freedom is the sample size minus one, denoted by ν = n − 1. The degrees of freedom play a very important role in the χ² test of a hypothesis.

When we fit a distribution the number of degrees of freedom is (n − k − 1), where n is the number of observations and k is the number of parameters estimated from the data.

For example, when we fit a Poisson distribution the degrees of freedom are ν = n − 1 − 1, since one parameter (the mean) is estimated from the data.

In a contingency table the degrees of freedom are (r − 1)(c − 1), where r refers to the number of rows and c to the number of columns. Thus in a 3 × 4 table the d.f. are (3 − 1)(4 − 1) = 6, and in a 2 × 2 contingency table the d.f. are (2 − 1)(2 − 1) = 1.

In the case of data given as a series of values in a row or column, the d.f. are the number of observations in the series less one, i.e., ν = n − 1.
Critical value of t:

The column figures in the main body of the table come under the headings t0.100, t0.050, t0.025, t0.010 and t0.005. The subscripts give the proportion of the distribution in the 'tail' area. Thus for a two-tailed test at the 5% level of significance there are two rejection regions, each containing 2.5% of the total area, and the required column is headed t0.025.

For example, at the 5% level a one-tailed test uses the column headed t0.05 while a two-tailed test uses the column headed t0.025; similarly, at the 1% level a one-tailed test uses t0.01 while a two-tailed test uses t0.005.

Thus for a one-tailed test at the 5% level the rejection region lies in one tail of the distribution and the required column is headed t0.05.
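
If a t-table is not at hand, these column values can be reproduced with the inverse CDF; the sketch below assumes scipy is available and is not part of the original text.

```python
from scipy.stats import t

df = 9                        # n - 1 degrees of freedom
print(t.ppf(1 - 0.025, df))   # two-tailed 5% column t_0.025 -> 2.262
print(t.ppf(1 - 0.05, df))    # one-tailed 5% column t_0.05  -> 1.833
```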

Critical values of the t-distribution (figure: the acceptance region lies between −tα and +tα, with rejection regions in the tails beyond them).

6.1.4 Applications of t-distribution:

The t-distribution has a number of applications in statistics, of which we shall discuss the following in the coming sections:

(i) t-test for significance of single mean, population variance being unknown.

(ii) t-test for significance of the difference between two sample means, the population
variances being equal but unknown.

(a) Independent samples (b) Related samples: paired t-test

6.2 Test of significance for Mean:
We set up the corresponding null and alternative hypotheses as follows:

H0 : μ = μ0, i.e., there is no significant difference between the sample mean and the population mean.

H1: μ ≠ μ0 ( μ < μ0 (or) μ > μ0)

Level of significance:

5% or 1%

Calculation of statistic:

Under H0 the test statistic is

$$t_0 = \frac{|\bar{x} - \mu_0|}{S/\sqrt{n}}$$

Expected value:

te is the table value of Student's t-distribution with (n − 1) d.f. at the chosen level of significance.

Inference:

If t0 ≤ te, the statistic falls in the acceptance region and the null hypothesis is accepted; if t0 > te, the null hypothesis H0 is rejected at the given level of significance.

Example 1:

Certain pesticide is packed into bags by a machine. A random sample of 10 bags is drawn and their contents are found to weigh (in kg) as follows:

50 49 52 44 45 48 46 45 49 45

Test if the average packing can be taken to be 50 kg.

Solution:

Null hypothesis:

H0 : μ = 50 kg, i.e., the average packing is 50 kg.

Alternative Hypothesis:

H1 : μ ≠ 50 kg (two-tailed)

Level of Significance:

Let α = 0.05

Calculation of sample mean and S.D

X      d = x − 48      d²
50 2 4
49 1 1
52 4 16
44 –4 16
45 –3 9
48 0 0
46 –2 4
45 –3 9
49 +1 1
45 –3 9
Total –7 69
$$\bar{x} = A + \frac{\sum d}{n} = 48 + \frac{-7}{10} = 48 - 0.7 = 47.3$$

$$S^2 = \frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right] = \frac{1}{9}\left[69 - \frac{(-7)^2}{10}\right] = \frac{64.1}{9} = 7.12$$

Calculation of statistic:

Under H0 the test statistic is

$$t_0 = \frac{|\bar{x} - \mu|}{\sqrt{S^2/n}} = \frac{|47.3 - 50.0|}{\sqrt{7.12/10}} = \frac{2.7}{0.8438} \approx 3.2$$
Expected value:

te follows the t-distribution with (10 − 1) = 9 d.f.; the two-tailed 5% table value is te = 2.262.

Inference:

Since t0 > te, H0 is rejected at the 5% level of significance and we conclude that the average packing cannot be taken to be 50 kg.
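
For readers who want to verify Example 1 numerically, a sketch using scipy (an assumption; the original solution is purely by hand) reproduces the same statistic up to sign:

```python
from scipy import stats

weights = [50, 49, 52, 44, 45, 48, 46, 45, 49, 45]
t0, p = stats.ttest_1samp(weights, popmean=50)
print(t0, p)   # t0 ~ -3.2, so |t0| = 3.2 > 2.262 and H0 is rejected
```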

Example 2:

A soap manufacturing company was distributing a particular brand of soap through a large number of retail shops. Before a heavy advertisement campaign, the mean sales per week per shop was 140 dozens. After the campaign, a sample of 26 shops was taken and the mean sales was found to be 147 dozens with standard deviation 16. Can you consider the advertisement effective?

Solution:

We are given

n = 26; x̄ = 147 dozens; s = 16

Null hypothesis:

H0: μ = 140 dozens i.e., Advertisement is not effective.

Alternative Hypothesis:

H1: μ > 140 dozens (Right -tailed)

Calculation of statistic:

Under the null hypothesis H0, the test statistic is

$$t_0 = \frac{\bar{x} - \mu}{s/\sqrt{n-1}} = \frac{147 - 140}{16/\sqrt{25}} = \frac{7 \times 5}{16} = 2.19$$

(Here s is the sample standard deviation, so s/√(n − 1) estimates the standard error of the mean.)

Expected value:

te follows the t-distribution with (26 − 1) = 25 d.f.; the one-tailed 5% table value is te = 1.708.

Inference:

Since t0 > te, H0 is rejected at the 5% level of significance. Hence we conclude that the advertisement is effective in increasing the sales.
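
Example 2 works from summary statistics rather than raw data, so a ready-made sample test does not apply directly; a hand computation in the book's convention (biased S.D. s, standard error s/√(n − 1)) might look like this sketch:

```python
import math

n, xbar, s, mu0 = 26, 147.0, 16.0, 140.0    # s is the biased sample S.D.
t0 = (xbar - mu0) / (s / math.sqrt(n - 1))  # s/sqrt(n-1) estimates sigma/sqrt(n)
print(round(t0, 2))                         # 2.19 > 1.708, so H0 is rejected
```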

6.3 Test of significance for difference between two means:

6.3.1 Independent samples:

Suppose we want to test if two independent samples have been drawn from two normal populations having the same means, the population variances being equal. Let x₁, x₂, …, x_{n₁} and y₁, y₂, …, y_{n₂} be two independent random samples from the given normal populations.
Null hypothesis:
H0 : μ1 = μ2 i.e. the samples have been drawn from the normal populations with same means.
Alternative Hypothesis:
H1 : μ1 ≠ μ2 (μ1 < μ2 or μ1 > μ2)
Test statistic:

Under H0, the test statistic is

$$t_0 = \frac{|\bar{x} - \bar{y}|}{\sqrt{S^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where

$$S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum(x - \bar{x})^2 + \sum(y - \bar{y})^2\right] = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$$

Expected value:

te follows the t-distribution with n₁ + n₂ − 2 d.f.

Inference:

If t0 < te we accept the null hypothesis; if t0 > te we reject the null hypothesis.

Example 3:
A group of 5 patients treated with medicine ‘A’ weigh 42, 39, 48, 60 and 41 kgs:
Second group of 7 patients from the same hospital treated with medicine ‘ B’ weigh 38, 42 ,
56, 64, 68, 69 and 62 kgs. Do you agree with the claim that medicine ‘ B’ increases the weight
significantly?
Solution:

Let the weights (in kgs) of the patients treated with medicines A and B be denoted by
variables X and Y respectively.

Null hypothesis:

H0 : μ1 = μ2

i.e. There is no significant difference between the medicines A and B as regards their effect on
increase in weight.

Alternative Hypothesis:

H1 : μ1 < μ2 (left-tail) i.e. medicine B increases the weight significantly.

Level of significance : Let α = 0.05

Computation of sample means and S.Ds


Medicine A

X      x − x̄ (x̄ = 46)      (x − x̄)²
42 –4 16
39 –7 49
48 2 4
60 14 196
41 –5 25
230 0 290

$$\bar{x} = \frac{\sum x}{n_1} = \frac{230}{5} = 46$$

Medicine B
Y      y − ȳ (ȳ = 57)      (y − ȳ)²
38 – 19 361
42 – 15 225
56 –1 1
64 7 49
68 11 121
69 12 144
62 5 25
399 0 926
$$\bar{y} = \frac{\sum y}{n_2} = \frac{399}{7} = 57$$

$$S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum(x - \bar{x})^2 + \sum(y - \bar{y})^2\right] = \frac{1}{10}[290 + 926] = 121.6$$

Calculation of statistic:

Under H0 the test statistic is

$$t_0 = \frac{|\bar{x} - \bar{y}|}{\sqrt{S^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{|46 - 57|}{\sqrt{121.6\left(\frac{1}{5} + \frac{1}{7}\right)}} = \frac{11}{\sqrt{121.6 \times \frac{12}{35}}} = \frac{11}{6.46} = 1.7$$

Expected value:

te follows the t-distribution with (5 + 7 − 2) = 10 d.f.; the one-tailed 5% table value is te = 1.812.

Inference:

Since t0 < te, the difference is not significant. Hence H0 is accepted and we conclude that the medicines A and B do not differ significantly as regards their effect on increase in weight.
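
Example 3 is exactly the pooled-variance two-sample test that scipy implements; the following check is a sketch, assuming scipy is available:

```python
from scipy import stats

a = [42, 39, 48, 60, 41]             # medicine A
b = [38, 42, 56, 64, 68, 69, 62]     # medicine B
t0, p = stats.ttest_ind(a, b, equal_var=True)   # pooled-variance t-test
print(t0)   # ~ -1.70, so |t0| = 1.7 < 1.812 and H0 is accepted
```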

Example 4:

Two types of batteries are tested for their length of life and the following data are obtained:

          No. of samples    Mean life (in hrs)    Variance
Type A    9                 600                   121
Type B    8                 640                   144

Is there a significant difference in the two means?


Solution:

We are given

n₁ = 9; x̄₁ = 600 hrs; s₁² = 121; n₂ = 8; x̄₂ = 640 hrs; s₂² = 144

Null hypothesis:

H0 : μ1 = μ2, i.e., the two types of batteries A and B are identical; there is no significant difference between them.

Alternative Hypothesis:

H1 : μ1 ≠ μ2 (Two- tailed)

Level of Significance:

Let α = 5%

Calculation of statistic:

Under H0, the test statistic is

$$t_0 = \frac{|\bar{x} - \bar{y}|}{\sqrt{S^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where

$$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \frac{9 \times 121 + 8 \times 144}{9 + 8 - 2} = \frac{2241}{15} = 149.4$$

$$\therefore\ t_0 = \frac{|600 - 640|}{\sqrt{149.4\left(\frac{1}{9} + \frac{1}{8}\right)}} = \frac{40}{\sqrt{149.4 \times \frac{17}{72}}} = \frac{40}{5.9391} = 6.735$$

Expected value:

te follows the t-distribution with 9 + 8 − 2 = 15 d.f.; the two-tailed 5% table value is te = 2.131.

Inference:

Since t0 > te, the difference is highly significant. Hence H0 is rejected and we conclude that the two
types of batteries differ significantly as regards their length of life.
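
Since Example 4 again starts from summary statistics, the pooled estimate S² and the statistic can be computed directly; this sketch follows the book's formula, treating the given variances as biased sample variances:

```python
import math

n1, m1, v1 = 9, 600.0, 121.0     # type A: size, mean life, sample variance
n2, m2, v2 = 8, 640.0, 144.0     # type B
S2 = (n1 * v1 + n2 * v2) / (n1 + n2 - 2)            # pooled estimate = 149.4
t0 = (m1 - m2) / math.sqrt(S2 * (1 / n1 + 1 / n2))
print(round(abs(t0), 3))         # 6.735 > 2.131, so H0 is rejected
```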

6.3.2 Related samples (paired t-test):

In the t-test for difference of means, the two samples were independent of each other. Let us now consider a particular situation where

(i) the sample sizes are equal, i.e., n₁ = n₂ = n (say), and

(ii) the sample observations (x₁, x₂, …, xₙ) and (y₁, y₂, …, yₙ) are not completely independent but are dependent in pairs.

That is, we make two observations on the same individual, one before the treatment and another after it. For example, a business concern may want to find out whether a particular medium of promoting sales of a product, say door-to-door canvassing or advertisement in papers or through T.V., is really effective. Similarly, a pharmaceutical company may want to test the efficacy of a particular drug, say for inducing sleep. Testing such claims gives rise to the situations (i) and (ii) above, and we apply the paired t-test.

Paired t-test:

Let dᵢ = Xᵢ − Yᵢ (i = 1, 2, …, n) denote the difference in the observations for the i-th unit.

Null hypothesis:

H0 : μ1 = μ2, i.e., the increments are just by chance.

Alternative Hypothesis:

H1 : μ1 ≠ μ2 (μ1 > μ2 (or) μ1 < μ2)


Calculation of test statistic:

$$t_0 = \frac{|\bar{d}|}{S/\sqrt{n}}$$

where

$$\bar{d} = \frac{\sum d}{n} \quad\text{and}\quad S^2 = \frac{1}{n-1}\sum(d - \bar{d})^2 = \frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]$$

Expected value:

te follows the t-distribution with n − 1 d.f.

Inference:

By comparing t0 and te at the desired level of significance, usually 5% or 1%, we reject or accept the null hypothesis.
Example 5:

To test the desirability of a certain modification in typists' desks, 9 typists were given two tests of as nearly as possible the same nature, one on the desk in use and the other on the new type. The following differences in the number of words typed per minute were recorded:

Typists                        A   B   C   D   E    F   G    H   I
Increase in number of words    2   4   0   3   −1   4   −3   2   5

Do the data indicate that the modification in desks promotes speed in typing?

Solution:

Null hypothesis:

H0 : μ1 = μ2, i.e., the modification in desks does not promote speed in typing.


Alternative Hypothesis:

H1 : μ1 < μ2 (Left tailed test)

Level of significance: Let α = 0.05

Typist    d        d²
A         2        4
B         4        16
C         0        0
D         3        9
E         −1       1
F         4        16
G         −3       9
H         2        4
I         5        25
Total     ∑d = 16  ∑d² = 84

$$\bar{d} = \frac{\sum d}{n} = \frac{16}{9} = 1.778$$

$$S = \sqrt{\frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]} = \sqrt{\frac{1}{8}\left[84 - \frac{(16)^2}{9}\right]} = \sqrt{6.9} = 2.635$$

Calculation of statistic:

Under H0 the test statistic is

$$t_0 = \frac{\bar{d}\sqrt{n}}{S} = \frac{1.778 \times 3}{2.635} = 2.024$$

Expected value:

te follows the t-distribution with 9 − 1 = 8 d.f.; the one-tailed 5% table value is te = 1.860.

Inference:

Since t0 > te, the null hypothesis is rejected at the 5% level of significance. The data indicate that the modification in desks promotes speed in typing.
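
Because only the differences d are given in Example 5, the paired test reduces to a one-sample t-test of d against zero; the sketch below (scipy assumed) halves the reported two-sided p-value for the one-tailed decision:

```python
from scipy import stats

d = [2, 4, 0, 3, -1, 4, -3, 2, 5]           # increase in words per minute
t0, p_two = stats.ttest_1samp(d, popmean=0)
print(t0, p_two / 2)   # t0 ~ 2.02 > 1.860 one-tailed, so H0 is rejected
```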

Example 6:
An IQ test was administered to 5 persons before and after they were trained. The
results are given below:

Candidates I II III IV V
IQ before training 110 120 123 132 125
IQ after training 120 118 125 136 121
Test whether there is any change in IQ after the training programme (test at 1% level
of significance)
Solution:
Null hypothesis:
H0 : μ1 = μ2 i.e. there is no significant change in IQ after the training programme.
Alternative Hypothesis:
H1 : μ1 ≠ μ2 (two tailed test)
Level of significance :
α = 0.01
x            110   120   123   132   125   Total
y            120   118   125   136   121   –
d = x − y    −10   2     −2    −4    4     −10
d²           100   4     4     16    16    140
$$\bar{d} = \frac{\sum d}{n} = \frac{-10}{5} = -2$$

$$S^2 = \frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right] = \frac{1}{4}\left[140 - \frac{(-10)^2}{5}\right] = \frac{120}{4} = 30$$
Calculation of statistic:

Under H0 the test statistic is

$$t_0 = \frac{|\bar{d}|}{S/\sqrt{n}} = \frac{2}{\sqrt{30/5}} = \frac{2}{2.449} = 0.816$$

Expected value:

te follows the t-distribution with 5 − 1 = 4 d.f.; the two-tailed 1% table value is te = 4.604.

Inference:

Since t0 < te at the 1% level of significance, we accept the null hypothesis. We therefore conclude that there is no change in IQ after the training programme.
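
When both 'before' and 'after' series are available, as in Example 6, scipy's paired test can be applied directly; a sketch:

```python
from scipy import stats

before = [110, 120, 123, 132, 125]
after  = [120, 118, 125, 136, 121]
t0, p = stats.ttest_rel(before, after)
print(t0)   # ~ -0.82, so |t0| = 0.82 < 4.604 and H0 is accepted at the 1% level
```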

6.4 Chi-square statistic:

The various tests of significance described previously are applicable mostly to quantitative data and, usually, to data which are approximately normally distributed. It may also happen that we have data which are not normally distributed. Therefore there arises a need for other methods which are more appropriate for studying the differences between expected and observed frequencies. One such method is the non-parametric or distribution-free test. A non-parametric test may be defined as a statistical test in which no hypothesis is made about specific values of parameters. Non-parametric tests have assumed great importance in statistical analysis because they are easy to compute.

6.4.1 Definition:

The chi-square (χ²) test ('chi' pronounced as 'ki') is one of the simplest and most widely used non-parametric tests in statistical work. The χ² test was first used by Karl Pearson in the year 1900. The quantity χ² describes the magnitude of the discrepancy between theory and observation. It is defined as

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

where O refers to the observed frequencies and E refers to the expected frequencies.

Note:

If χ2 is zero, it means that the observed and expected frequencies coincide with each
other. The greater the discrepancy between the observed and expected frequencies the greater
is the value of χ2.

Chi-square distribution:

The square of a standard normal variate is a chi-square variate with 1 degree of freedom, i.e., if X is normally distributed with mean μ and standard deviation σ, then

$$\left(\frac{X - \mu}{\sigma}\right)^2$$

is a chi-square variate (χ²) with 1 d.f. The distribution of chi-square depends on the degrees of freedom; there is a different distribution for each number of degrees of freedom.

6.4.2 Properties of chi-square distribution:

1. The Mean of χ2 distribution is equal to the number of degrees of freedom (n)

2. The variance of χ2 distribution is equal to 2n

3. The median of the χ² distribution divides the area of the curve into two equal parts, each part being 0.5.
4. The mode of χ2 distribution is equal to (n-2)

5. Since chi-square values are always positive, the chi-square curve is always positively skewed.

6. Since Chi-square values increase with the increase in the degrees of freedom, there
is a new Chi-square distribution with every increase in the number of degrees of
freedom.

7. The lowest value of chi-square is zero and the highest value is infinity, i.e., χ² ≥ 0.

8. If χ₁² and χ₂² are two independent chi-square variates with n₁ and n₂ degrees of freedom, then their sum χ₁² + χ₂² follows the χ² distribution with (n₁ + n₂) degrees of freedom.

9. When n (d.f.) > 30, √(2χ²) approximately follows the normal distribution with mean √(2n − 1) and standard deviation equal to 1.

6.4.3 Conditions for applying χ2 test:

The following conditions should be satisfied before applying χ2 test.

1. N, the total frequency should be reasonably large, say greater than 50.

2. No theoretical cell frequency should be less than 5. If it is less than 5, the frequencies should be pooled together in order to make it 5 or more.

3. Each of the observations which make up the sample for this test must be independent of the others.

4. The χ² test is wholly dependent on the degrees of freedom.

6.5 Testing the Goodness of fit (Binomial and Poisson Distribution):


Karl Pearson, in 1900, developed a test for testing the significance of the discrepancy between experimental values and the theoretical values obtained under some theory or hypothesis. This test is known as the χ²-test of goodness of fit and is used to decide whether the deviation between observation (experiment) and theory may be attributed to chance or is really due to the inadequacy of the theory to fit the observed data. Under the null hypothesis that there is no significant difference between the observed and the theoretical values, Karl Pearson proved that the statistic

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \cdots + \frac{(O_n - E_n)^2}{E_n}$$

follows the χ²-distribution with ν = n − k − 1 d.f., where O₁, O₂, …, Oₙ are the observed frequencies, E₁, E₂, …, Eₙ the corresponding expected frequencies, and k is the number of parameters estimated from the given data. The test is done by comparing the computed value with the table value of χ² for the desired degrees of freedom.

Example 7:

Four coins are tossed simultaneously and the number of heads occurring at each throw
was noted. This was repeated 240 times with the following results.

No. of heads 0 1 2 3 4
No. of throws 13 64 85 58 20

Fit a binomial distribution under the hypothesis that the coins are unbiased.
Solution:

Null Hypothesis:

H0 : The given data fit the binomial distribution, i.e., the coins are unbiased; p = q = 1/2, n = 4, N = 240.
Computation of expected frequencies:

No. of heads x    P(X = x) = 4Cₓ pˣ q⁴⁻ˣ        Expected frequency N·P(X = x)
0                 4C₀(1/2)⁰(1/2)⁴ = 1/16        (1/16) × 240 = 15
1                 4C₁(1/2)¹(1/2)³ = 4/16        (4/16) × 240 = 60
2                 4C₂(1/2)²(1/2)² = 6/16        (6/16) × 240 = 90
3                 4C₃(1/2)³(1/2)¹ = 4/16        (4/16) × 240 = 60
4                 4C₄(1/2)⁴(1/2)⁰ = 1/16        (1/16) × 240 = 15
Total                                           240

Computation of chi-square values:

Observed frequency O    Expected frequency E    (O − E)²    (O − E)²/E
13                      15                      4           0.27
64                      60                      16          0.27
85                      90                      25          0.28
58                      60                      4           0.07
20                      15                      25          1.67
Total                                                       2.56

$$\chi_0^2 = \sum \frac{(O - E)^2}{E} = 2.56$$

Expected value:

χe² follows the χ²-distribution with (n − k − 1) d.f. (here k = 0, since no parameter is estimated from the data); the 5% table value for ν = 5 − 0 − 1 = 4 d.f. is χe² = 9.488.

Inference:

Since χ0² < χe², we accept the null hypothesis at the 5% level of significance and say that the given data fit the binomial distribution.
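
The goodness-of-fit computation of Example 7 can be checked with scipy's chisquare (a sketch under that assumption; the text's 2.56 comes from rounding each term to two decimals):

```python
from scipy import stats

observed = [13, 64, 85, 58, 20]
expected = [15, 60, 90, 60, 15]    # 240 x binomial(4, 1/2) probabilities
chi2, p = stats.chisquare(observed, f_exp=expected)
print(chi2)    # ~ 2.54 < 9.488, so H0 is accepted
```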

Example 8:

The following table shows the distribution of goals in football matches.

No. of goals 0 1 2 3 4 5 6 7
No. of matches 95 158 108 63 40 9 5 2

Fit a Poisson distribution and test the goodness of fit.

Solution:

Null Hypothesis :

The given data fits the Poisson distribution.

Level of significance :

Let α = 0.05

Computation of expected frequencies:

The mean of the given distribution is m = Σfx/N = 812/480 ≈ 1.7, and the expected frequency of x = 0 is f(0) = N·e⁻ᵐ = 480 × e⁻¹·⁷ = 480 × 0.183 = 87.84.

The other expected frequencies are obtained by using the recurrence formula

$$f(x+1) = \frac{m}{x+1} \times f(x)$$

Putting x = 0, 1, 2, …, we obtain the following frequencies:

f(1) = 1.7 × 87.84 = 149.328
f(2) = (1.7/2) × 149.328 = 126.93
f(3) = (1.7/3) × 126.93 = 71.927
f(4) = (1.7/4) × 71.927 = 30.569
f(5) = (1.7/5) × 30.569 = 10.393
f(6) = (1.7/6) × 10.393 = 2.94
f(7) = (1.7/7) × 2.94 = 0.714

No. of goals          0    1    2    3    4    5    6    7    Total
Expected frequency    88   149  127  72   30   10   3    1    480

Computation of statistic:

Observed frequency O    Expected frequency E    (O − E)²    (O − E)²/E
95                      88                      49          0.56
158                     150                     64          0.43
108                     126                     324         2.57
63                      72                      81          1.13
40                      30                      100         3.33
9 + 5 + 2 = 16          10 + 3 + 1 = 14         4           0.29
Total                                                       8.31

(The last three classes are pooled because their expected frequencies are less than 5.)

$$\chi_0^2 = \sum \frac{(O - E)^2}{E} = 8.31$$

Expected value:

χe² follows the χ²-distribution with (n − k − 1) d.f.; after pooling there are n = 6 classes and k = 1 parameter (the mean) was estimated, so ν = 6 − 1 − 1 = 4 d.f., and the 5% table value is χe² = 9.488.

Inference:

Since χ0² < χe², we accept the null hypothesis at the 5% level of significance and say that the given data fit the Poisson distribution.
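
The recurrence used in Example 8 translates directly into a short loop; the sketch below reproduces the expected frequencies. (Note that f(4) ≈ 30.57 rounds to 31, whereas the text truncates it to 30 so that the total stays 480, and the text's 87.84 for f(0) uses the table value e⁻¹·⁷ ≈ 0.183.)

```python
import math

counts = [95, 158, 108, 63, 40, 9, 5, 2]     # observed matches with 0..7 goals
N = sum(counts)                              # 480
m = round(sum(x * f for x, f in enumerate(counts)) / N, 1)   # 812/480 ~ 1.7

f = N * math.exp(-m)                         # f(0) ~ 87.7
expected = [f]
for x in range(7):                           # f(x+1) = m/(x+1) * f(x)
    f = f * m / (x + 1)
    expected.append(f)
print([round(e, 2) for e in expected])
```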

6.6 Test of independence


Let us suppose that the given population consisting of N items is divided into r mutually disjoint (exclusive) and exhaustive classes A₁, A₂, …, A_r with respect to the attribute A, so that a randomly selected item belongs to one and only one of the classes A₁, A₂, …, A_r. Similarly, let us suppose that the same population is divided into c mutually disjoint and exhaustive classes B₁, B₂, …, B_c with respect to another attribute B, so that an item selected at random possesses one and only one of the attributes B₁, B₂, …, B_c. The frequency distribution of the items belonging to the classes A₁, A₂, …, A_r and B₁, B₂, …, B_c can be represented in the following r × c manifold contingency table.

r × c manifold contingency table

B B1 B2  Bj  Bc Total
A
A1 (A1B1) (A1B2)  (A1Bj)  (A1Bc) (A1)

A2 (A2B1) (A2B2)  (A2Bj)  (A2Bc) (A2)

. . . . . . . .

. . . . . . . .

. . . . . . . .

Ai (AiB1) (AiB2)  (AiBj)  (AiBc) (Ai)

. . . . . . . .

. . . . . . . .

. . . . . . . .

Ar (ArB1) (ArB2)  (ArBj)  (ArBc) (Ar)

Total (B1) (B2)  (Bj)  (Bc) ∑Ai = ∑Bj = N

(Ai) is the number of persons possessing the attribute Ai (i = 1, 2, …, r), (Bj) is the number of persons possessing the attribute Bj (j = 1, 2, …, c), and (AiBj) is the number of persons possessing both the attributes Ai and Bj (i = 1, 2, …, r; j = 1, 2, …, c).

Also ∑Ai = ∑Bj = N

Under the null hypothesis that the two attributes A and B are independent, the expected frequency for (AiBj) is given by

$$E(A_iB_j) = \frac{(A_i)(B_j)}{N}$$

Calculation of statistic:

Thus, under the null hypothesis of the independence of attributes, the expected frequencies for each of the cells of the above table can be obtained using this formula, and the statistic is

$$\chi_0^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

Expected value:

χe² follows the χ²-distribution with (r − 1)(c − 1) d.f.

Inference:

Comparing χ0² with χe² at a certain level of significance, we reject or accept the null hypothesis accordingly at that level of significance.
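
In practice the whole procedure (expected frequencies, statistic, degrees of freedom) is available in one scipy call; this is a sketch on a hypothetical 2 × 3 table, assuming scipy:

```python
from scipy import stats

# hypothetical 2 x 3 table of observed frequencies (rows A_i, columns B_j)
table = [[30, 20, 10],
         [20, 30, 10]]
chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
print(chi2, dof)   # dof = (r - 1)(c - 1); expected cells are (A_i)(B_j)/N
```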

6.6.1 2 × 2 contingency table :

Under the null hypothesis of independence of attributes, the value of χ² for the 2 × 2 contingency table

               Total
a       b      a + b
c       d      c + d
Total:  a + c  b + d  N

is given by

$$\chi_0^2 = \frac{N(ad - bc)^2}{(a+c)(b+d)(a+b)(c+d)}$$

6.6.2 Yates' correction

In a 2 × 2 contingency table, the number of d.f. is (2 − 1)(2 − 1) = 1. If any one of the theoretical cell frequencies is less than 5, the use of the pooling method would result in d.f. = 0, which is meaningless. In this case we apply a correction given by F. Yates (1934), usually known as 'Yates' correction for continuity'. It consists of adding 0.5 to the cell frequency which is less than 5 and then adjusting the remaining cell frequencies accordingly. The corrected value of χ² is given by

$$\chi^2 = \frac{N\left[\left(a \mp \frac{1}{2}\right)\left(d \mp \frac{1}{2}\right) - \left(b \pm \frac{1}{2}\right)\left(c \pm \frac{1}{2}\right)\right]^2}{(a+c)(b+d)(a+b)(c+d)}$$

which simplifies to

$$\chi^2 = \frac{N\left(\left|ad - bc\right| - \frac{N}{2}\right)^2}{(a+c)(b+d)(a+b)(c+d)}$$

Example 9:

1000 students at college level were graded according to their I.Q. and the economic conditions of their homes. Use the χ² test to find out whether there is any association between economic condition at home and I.Q.

IQ
Economic Conditions Total
High Low
Rich 460 140 600
Poor 240 160 400
Total 700 300 1000

Solution:

Null Hypothesis:

There is no association between economic condition at home and I.Q., i.e., they are independent.

$$E_{11} = \frac{(A)(B)}{N} = \frac{600 \times 700}{1000} = 420$$

The table of expected frequencies is as follows:

          High    Low    Total
Rich      420     180    600
Poor      280     120    400
Total     700     300    1000

Observed frequency O    Expected frequency E    (O − E)²    (O − E)²/E
460                     420                     1600        3.810
240                     280                     1600        5.714
140                     180                     1600        8.889
160                     120                     1600        13.333
Total                                                       31.746

$$\chi_0^2 = \sum \frac{(O - E)^2}{E} = 31.746$$

Expected value:

χe² follows the χ²-distribution with (2 − 1)(2 − 1) = 1 d.f.; the 5% table value is χe² = 3.84.

Inference:

Since χ0² > χe², the hypothesis is rejected at the 5% level of significance. ∴ There is association between economic condition at home and I.Q.
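
Example 9 can be verified the same way; correction=False matches the uncorrected formula used here (the default would apply the Yates adjustment of section 6.6.2 to a 2 × 2 table). A sketch:

```python
from scipy import stats

iq_table = [[460, 140],
            [240, 160]]
chi2, p, dof, expected = stats.chi2_contingency(iq_table, correction=False)
print(chi2)   # ~ 31.746 > 3.84, so independence is rejected
```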

Example 10:

Out of a sample of 120 persons in a village, 76 persons were administered a new drug for preventing influenza, and out of them 24 persons were attacked by influenza. Out of those who were not administered the new drug, 12 persons were not affected by influenza.

(i) Prepare a 2 × 2 table showing the actual frequencies.

(ii) Use the chi-square test to find out whether the new drug is effective or not.

Solution:

The above data can be arranged in the following 2 × 2 contingency table.

Table of observed frequencies

New drug            Attacked          Not attacked       Total
Administered        24                76 − 24 = 52       76
Not administered    44 − 12 = 32      12                 120 − 76 = 44
Total               24 + 32 = 56      52 + 12 = 64       120

Null hypothesis:
‘Attack of influenza’ and the administration of the new drug are independent.
Computation of statistic:

$$\chi_0^2 = \frac{N(ad - bc)^2}{(a+c)(b+d)(a+b)(c+d)} = \frac{120(24 \times 12 - 52 \times 32)^2}{56 \times 64 \times 76 \times 44} = \frac{120(1376)^2}{56 \times 64 \times 76 \times 44}$$

= Antilog [log 120 + 2 log 1376 − (log 56 + log 64 + log 76 + log 44)]
= Antilog (1.2777) = 18.95

Expected value:

χe² follows the χ²-distribution with (2 − 1)(2 − 1) = 1 d.f.; the 5% table value is χe² = 3.84.

Inference:

Since χ0² > χe², H0 is rejected at the 5% level of significance. Hence we conclude that the new drug is definitely effective in controlling (preventing) influenza.

Example 11:

Two researchers adopted different sampling techniques while investigating the same
group of students to find the number of students falling in different intelligence levels. The
results are as follows

No. of students
Researchers Total
Below average Average Above average Genius
X 86 60 44 10 200
Y 40 33 25 2 100
Total 126 93 69 12 300

Would you say that the sampling techniques adopted by the two researchers are independent?
Solution:

Null Hypothesis:

The sampling techniques adopted by the two researchers are independent.

E(86) = (126 × 200)/300 = 84
E(60) = (93 × 200)/300 = 62
E(44) = (69 × 200)/300 = 46

The table of expected frequencies is given below.

146
Below average Average Above average Genius Total
X 84 62 46 200 – 192 = 8 200
Y 126 –84 = 42 93 – 62 = 31 69 – 46 = 23 12 – 8 = 4 100
Total 126 93 69 12 300

Computation of chi-square statistic:

Observed frequency O    Expected frequency E    (O − E)    (O − E)²    (O − E)²/E
86                      84                      2          4           0.048
60                      62                      −2         4           0.064
44                      46                      −2         4           0.087
10                      8                       2          4           0.500
40                      42                      −2         4           0.095
33                      31                      2          4           0.129
25 + 2 = 27             23 + 4 = 27             0          0           0.000
Total 300               300                     0                      0.923

(In the second row the expected frequency of the 'Genius' class is 4, which is less than 5, so the last two cells are pooled with the adjacent class.)

$$\chi_0^2 = \sum \frac{(O - E)^2}{E} = 0.923$$

Expected value:

χe² follows the χ²-distribution with (4 − 1)(2 − 1) = 3 d.f., reduced by one because of pooling, giving ν = 3 − 1 = 2 d.f.; the 5% table value is χe² = 5.991.
Inference:

Since χ0² < χe², we accept the null hypothesis at the 5% level of significance. Hence we conclude that the sampling techniques adopted by the two investigators do not differ significantly.

6.7 Test for population variance:

Suppose we want to test whether the given normal population has a specified variance σ² = σ₀².

Null Hypothesis:

H0 : σ² = σ₀². If x₁, x₂, …, xₙ …
