Sampling Theory and Testing Hypothesis
7. A procedure for deciding whether to accept or to reject a null hypothesis (and hence to reject or to accept the alternative hypothesis) is called Testing of Hypothesis.
10. A procedure for testing whether the difference between a population parameter $\theta_0$ and the corresponding sample statistic $\theta$ is significant or not is called a Test of Significance.
                    Decision
Actual              Accept $H_0$                      Reject $H_0$
$H_0$ is true       Correct decision (no error)       Wrong decision (Type I Error)
                    Probability = $1 - \alpha$        Probability = $\alpha$
$H_0$ is false      Wrong decision (Type II Error)    Correct decision (no error)
                    Probability = $\beta$             Probability = $1 - \beta$
12. One Tailed and Two Tailed Tests:
• A test of any statistical hypothesis where the alternative hypothesis is one-tailed (right-tailed or left-tailed) is called a one-tailed test.
For example, to test the mean of a population with a one-tailed test, we assume the
null hypothesis: $H_0 : \mu = \mu_0$ against the
alternative hypothesis: $H_1 : \mu > \mu_0$ (right-tailed) or $H_1 : \mu < \mu_0$ (left-tailed).
• Suppose we want to test whether the bulbs produced by a new process ($\mu_2$) have a higher average life than those produced by the standard process ($\mu_1$). Then we have $H_0 : \mu_1 = \mu_2$ and $H_1 : \mu_1 < \mu_2$ (left-tailed test).
When the sample size is less than 30 (n < 30), the sample is called a small sample.
Null hypothesis: $H_0 : \mu = \mu_0$
Alternative hypothesis: $H_1 : \mu \neq \mu_0$
i) Test statistic (S.D. is not given directly): $t = \dfrac{\bar{x} - \mu}{\sqrt{S^2 / n}}$
where $\bar{x} = \dfrac{\sum x_i}{n}$ = sample mean; $\mu$ = population mean; $S^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$; $n$ = sample size
ii) Test statistic (S.D. is given directly): $t = \dfrac{\bar{x} - \mu}{S.D. / \sqrt{n - 1}}$
Degrees of freedom (d.f.): $n - 1$
Problems:
Degrees of freedom = n − 1 = 15
Null hypothesis: $H_0 : \mu = 56$
Alternative hypothesis: $H_1 : \mu \neq 56$
Test statistic (C.V): $t = \dfrac{\bar{x} - \mu}{\sqrt{S^2 / n}} = \dfrac{53 - 56}{\sqrt{10 / 16}} = -3.79$
$|t| = 3.79$
4. A random sample of 10 boys had the following I.Q’s 70, 120, 110, 101,
88, 83, 95, 98, 107, 100. Do these data support the assumption of a
population mean I.Q. of 100? Find a reasonable range in which most of
the mean I.Q. values of samples of 10 boys lie.
Solution:
$x$       $x - \bar{x}$    $(x - \bar{x})^2$
70        -27.2            739.84
120        22.8            519.84
110        12.8            163.84
101         3.8             14.44
88         -9.2             84.64
83        -14.2            201.64
95         -2.2              4.84
98          0.8              0.64
107         9.8             96.04
100         2.8              7.84
Total 972                  1833.60
Mean $\bar{x} = \dfrac{\sum x}{n} = \dfrac{972}{10} = 97.2$
We know that $S^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} = \dfrac{1833.60}{9} = 203.73$
Standard deviation, $S = \sqrt{203.73} = 14.27$
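The remaining steps of this solution can be sketched in Python. This is only a sketch under stated assumptions: it uses the single-mean statistic $t = (\bar{x} - \mu)/\sqrt{S^2/n}$, takes 2.262 as the tabulated t at 5% level for 9 d.f., and reads the "reasonable range" as the 95% limits $\bar{x} \pm t \cdot S/\sqrt{n}$. All variable names are illustrative.

```python
import math

# I.Q. values of the 10 boys, taken from the problem statement.
iq_scores = [70, 120, 110, 101, 88, 83, 95, 98, 107, 100]
mu_0 = 100        # hypothesised population mean
t_table = 2.262   # assumed tabulated t at 5% level for 9 d.f.

n = len(iq_scores)
mean = sum(iq_scores) / n
# Variance estimate with divisor n - 1, as in the formula for S^2.
s_squared = sum((x - mean) ** 2 for x in iq_scores) / (n - 1)
std_error = math.sqrt(s_squared / n)

t_value = (mean - mu_0) / std_error
limits = (mean - t_table * std_error, mean + t_table * std_error)

print(f"t = {t_value:.2f}, d.f. = {n - 1}")
print("accept H0" if abs(t_value) < t_table else "reject H0")
print(f"95% limits: {limits[0]:.2f} to {limits[1]:.2f}")
```

Under these assumptions $|t|$ is well below 2.262, so the data are consistent with a population mean I.Q. of 100.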
Null hypothesis: $H_0 : \mu_1 = \mu_2$
Alternative hypothesis: $H_1 : \mu_1 \neq \mu_2$
Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
where $s^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$ [S.D. is not given directly]
$s^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$ [S.D. is given directly]
$s_1, s_2$ = sample standard deviations.
Degrees of freedom: d.f. $= n_1 + n_2 - 2$
1. The average number of articles produced by two machines per day are 200
and 250 with standard deviations 20 and 25 respectively on the basis of
records of 25 days production. Can you regard both the machines equally
efficient at 1% level of significance?
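Null hypothesis: $H_0 : \mu_1 = \mu_2$
Alternative hypothesis: $H_1 : \mu_1 \neq \mu_2$
Here $s^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \dfrac{25(20)^2 + 25(25)^2}{25 + 25 - 2} = 533.85$, so $s = 23.1$.
Test statistic (C.V): $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{200 - 250}{23.1\sqrt{\dfrac{1}{25} + \dfrac{1}{25}}} = -7.65$
$|t| = 7.65$
Tabulated value (T.V): $t_{0.01}$ for 48 d.f. is 2.58
Conclusion: C.V > T.V
We reject the null hypothesis $H_0$.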
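The calculation for the two machines can be checked with a short Python sketch. It assumes the "S.D. is given directly" form of the pooled variance, $s^2 = (n_1 s_1^2 + n_2 s_2^2)/(n_1 + n_2 - 2)$; the function name `pooled_t_from_summary` is hypothetical and chosen only for illustration.

```python
import math

def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, s, d.f.) for two samples described by mean, S.D. and size."""
    # Pooled estimate when the sample S.D.s are given directly.
    s = math.sqrt((n1 * sd1 ** 2 + n2 * sd2 ** 2) / (n1 + n2 - 2))
    t = (mean1 - mean2) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, s, n1 + n2 - 2

# Two machines: means 200 and 250, S.D.s 20 and 25, 25 days each.
t_value, s, dof = pooled_t_from_summary(200, 20, 25, 250, 25, 25)
print(f"s = {s:.2f}, t = {t_value:.2f}, d.f. = {dof}")
```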
2. The means of two random samples of sizes 9 and 7 are 196.42 and 198.82 respectively. The sums of the squares of the deviations from the mean are 26.94 and 18.73 respectively. Can the samples be considered to have been drawn from the same normal population?
Null hypothesis: $H_0 : \mu_1 = \mu_2$
Alternative hypothesis: $H_1 : \mu_1 \neq \mu_2$
Here $s^2 = \dfrac{26.94 + 18.73}{9 + 7 - 2} = 3.26$, so $s = 1.81$.
Test statistic (C.V): $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{196.42 - 198.82}{1.81\sqrt{\dfrac{1}{9} + \dfrac{1}{7}}} = -2.63$
$|t| = 2.63$
Tabulated value (T.V): $t_{0.05}$ for 14 d.f. is 2.15
Conclusion: C.V > T.V
We reject the null hypothesis $H_0$.
Sample A 24 27 26 21 25 -
Sample B 27 30 28 31 22 36
Can it be said that the two samples come from normal populations having the same mean?
Solution:
Sample A ($x$)   Sample B ($y$)   $x - \bar{x}$   $(x - \bar{x})^2$   $y - \bar{y}$   $(y - \bar{y})^2$
24               27               -0.6             0.36               -2               4
27               30                2.4             5.76                1               1
26               28                1.4             1.96               -1               1
21               31               -3.6            12.96                2               4
25               22                0.4             0.16               -7              49
-                36                -               -                   7              49
Total 123        174               0              21.2                 0             108
Mean $\bar{x} = \dfrac{123}{5} = 24.6$, $\bar{y} = \dfrac{174}{6} = 29$
$\sum (x_1 - \bar{x}_1)^2 = 21.2$, $\sum (x_2 - \bar{x}_2)^2 = 108$
Now, $s^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2} = \dfrac{21.2 + 108}{5 + 6 - 2} = 14.36$
$\Rightarrow s = 3.79$
Null hypothesis: $H_0 : \mu_1 = \mu_2$
Alternative hypothesis: $H_1 : \mu_1 \neq \mu_2$
Test statistic (C.V): $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{24.6 - 29}{3.79\sqrt{\dfrac{1}{5} + \dfrac{1}{6}}} = -1.92$
$|t| = 1.92$
Tabulated value (T.V): $t_{0.05}$ for 9 d.f. is 2.262
Conclusion: C.V < T.V
We accept the null hypothesis $H_0$.
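When the raw observations are available, the same test can be computed directly from the data. The following Python sketch assumes the pooled variance $s^2 = \left[\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2\right]/(n_1 + n_2 - 2)$; the helper names are illustrative, not part of the notes.

```python
import math

sample_a = [24, 27, 26, 21, 25]
sample_b = [27, 30, 28, 31, 22, 36]

def mean(values):
    return sum(values) / len(values)

def sum_sq_dev(values):
    """Sum of squared deviations from the sample mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

n1, n2 = len(sample_a), len(sample_b)
dof = n1 + n2 - 2
# Pooled variance when the S.D. is not given directly.
s_squared = (sum_sq_dev(sample_a) + sum_sq_dev(sample_b)) / dof
s = math.sqrt(s_squared)
t_value = (mean(sample_a) - mean(sample_b)) / (s * math.sqrt(1 / n1 + 1 / n2))

print(f"s^2 = {s_squared:.2f}, t = {t_value:.2f}, d.f. = {dof}")
```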
Test statistic: $F = \dfrac{s_1^2}{s_2^2}$, with $s_1^2 > s_2^2$,
where $s_1^2 = \dfrac{\sum (x - \bar{x})^2}{n_1 - 1}$, $s_2^2 = \dfrac{\sum (y - \bar{y})^2}{n_2 - 1}$
2. It is known that the mean diameters of rivets produced by two firms A and B are practically the same, but the standard deviations may differ. For 22 rivets produced by firm A, the standard deviation is 2.9 mm, while for 16 rivets manufactured by firm B, the standard deviation is 3.8 mm. Compute the statistic you would use to test whether the products of firm A have the same variability as those of firm B and test its significance.
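The F statistic for this problem can be sketched in Python. The sketch assumes that each quoted standard deviation was computed with divisor $n$, so the variance estimate used in F is $n s^2/(n - 1)$; this matches the "S.D. is given directly" convention used for the t-test but is an assumption here. The function names are hypothetical.

```python
def variance_estimate(sd, n):
    """Variance estimate n*s^2/(n-1) from an S.D. assumed to use divisor n."""
    return n * sd ** 2 / (n - 1)

def f_statistic(sd_a, n_a, sd_b, n_b):
    """Return (F, (d.f. numerator, d.f. denominator)), larger variance on top."""
    var_a = variance_estimate(sd_a, n_a)
    var_b = variance_estimate(sd_b, n_b)
    if var_a >= var_b:
        return var_a / var_b, (n_a - 1, n_b - 1)
    return var_b / var_a, (n_b - 1, n_a - 1)

# Firm A: 22 rivets, S.D. 2.9 mm; firm B: 16 rivets, S.D. 3.8 mm.
f_value, dof = f_statistic(2.9, 22, 3.8, 16)
print(f"F = {f_value:.3f} with d.f. {dof}")
```

The computed F is then compared with the tabulated F for the printed degrees of freedom at the chosen level of significance.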
Sample A 24 27 26 21 25 -
Sample B 27 30 28 31 22 36
Can it be said that the two samples come from the same normal population?
Solution: Hint: To test whether the two samples come from the same normal population, we test
i) the equality of variances
ii) the equality of means
Test statistic: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$
where O = observed frequency, E = expected frequency
Note:
1. It is used to test whether the difference between the observed and expected frequencies is significant.
2. If the data are given as a series of n numbers, then degrees of freedom = n − 1.
3. In the case of
a. Binomial distribution, d.f. = n − 1
b. Poisson distribution, d.f. = n − 2
c. Normal distribution, d.f. = n − 3
1. The following table gives the number of aircraft accidents that occurred during
the various days of the week. Test whether the accidents are uniformly distributed
over the week.
Solution:
Null hypothesis Ho : The accidents are uniformly distributed over the week.
Expected frequency of the accidents on each of the days $= \dfrac{84}{6} = 14$
Observed frequency (O)   Expected frequency (E)   $(O - E)^2$   $\dfrac{(O - E)^2}{E}$
14                       14                        0            0
18                       14                       16            1.143
12                       14                        4            0.286
11                       14                        9            0.643
15                       14                        1            0.071
14                       14                        0            0
Total                                                           2.143
Test statistic (C.V): $\chi^2 = \sum \dfrac{(O - E)^2}{E} = 2.143$
Degrees of freedom (d.f.): n − 1 = 6 − 1 = 5
Tabulated value (T.V): $\chi^2$ at 5% level for 5 d.f. = 11.07
Conclusion: C.V < T.V
We accept the null hypothesis $H_0$.
Therefore, the accidents are uniformly distributed over the week.
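The goodness-of-fit calculation above can be reproduced with a few lines of Python. This is a minimal sketch: the observed counts and the tabulated value 11.07 are taken from the solution, and the variable names are illustrative.

```python
# Observed accident counts for the six days listed in the solution.
observed = [14, 18, 12, 11, 15, 14]
chi_table = 11.07  # tabulated chi-square at 5% level for 5 d.f.

# Under H0 (uniform distribution) every day has the same expected count.
expected = [sum(observed) / len(observed)] * len(observed)

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1

print(f"chi-square = {chi_square:.3f}, d.f. = {dof}")
print("accept H0" if chi_square < chi_table else "reject H0")
```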
2. A sample analysis of the examination results of 500 students was made. It was found that 220 students had failed, 170 had secured a third class, 90 were placed in second class and 20 got a first class. Are these figures commensurate with the general examination result, which is in the ratio 4 : 3 : 2 : 1 for the various categories respectively?
Solution:
Null hypothesis $H_0$: The observed results are commensurate with the general examination results.
Expected frequencies are in the ratio 4 : 3 : 2 : 1
Total frequency = 500
Expected frequencies = the total frequency divided in the ratio 4 : 3 : 2 : 1
$= \dfrac{4}{10} \times 500 = 200$, $\dfrac{3}{10} \times 500 = 150$, $\dfrac{2}{10} \times 500 = 100$, $\dfrac{1}{10} \times 500 = 50$
Test statistic (C.V): $\chi^2 = \sum \dfrac{(O - E)^2}{E} = \dfrac{(220 - 200)^2}{200} + \dfrac{(170 - 150)^2}{150} + \dfrac{(90 - 100)^2}{100} + \dfrac{(20 - 50)^2}{50} = 23.667$
Degrees of freedom (d.f.): n − 1 = 4 − 1 = 3
Tabulated value (T.V): $\chi^2$ at 5% level for 3 d.f. = 7.81
Conclusion: C.V > T.V
We reject the null hypothesis $H_0$.
Therefore, the observed results are not commensurate with the general examination results.
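When the expected frequencies follow a given ratio, the only extra step is splitting the total in that ratio. A short Python sketch of this problem, with illustrative variable names:

```python
# Observed counts: failed, third class, second class, first class.
observed = [220, 170, 90, 20]
ratio = [4, 3, 2, 1]
chi_table = 7.81  # tabulated chi-square at 5% level for 3 d.f.

total = sum(observed)
# Divide the total frequency in the ratio 4 : 3 : 2 : 1.
expected = [total * r / sum(ratio) for r in ratio]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1

print(f"expected = {expected}")
print(f"chi-square = {chi_square:.3f}, d.f. = {dof}")
print("accept H0" if chi_square < chi_table else "reject H0")
```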
Let us consider two attributes A and B, each divided into two classes. The various cell frequencies can be expressed in the following table, known as a 2 × 2 contingency table.
A        a        b
B        c        d
With the row and column totals:
a        b        a + b
c        d        c + d
a + c    b + d    N
Note:
1. The following table gives a classification of a sample of 100 plants of their flower
colour and flatness of leaf.
Solution:
Null hypothesis $H_0$: They are independent.
Expected frequencies are
$\dfrac{50 \times 60}{100} = 30$    $\dfrac{50 \times 60}{100} = 30$    60
$\dfrac{50 \times 40}{100} = 20$    $\dfrac{50 \times 40}{100} = 20$    40
50                                  50                                  100
Now,
Observed frequency (O)   Expected frequency (E)   $(O - E)^2$   $\dfrac{(O - E)^2}{E}$
40                       30                       100           3.333
20                       30                       100           3.333
10                       20                       100           5.000
30                       20                       100           5.000
Total                                                           16.667
Test statistic (C.V): $\chi^2 = \sum \dfrac{(O - E)^2}{E} = 16.667$
Degrees of freedom (d.f.): $(r - 1)(s - 1) = (2 - 1)(2 - 1) = 1$
Tabulated value (T.V): $\chi^2$ at 5% level for 1 d.f. = 3.84
Conclusion: C.V > T.V
We reject the null hypothesis $H_0$.
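The expected frequencies and the statistic for a contingency table can be computed from the row and column totals. In the Python sketch below, the 2 × 2 arrangement of the observed counts is inferred from the observed and expected columns of the solution (the original data table is not reproduced here), so treat the layout as an assumption.

```python
# Observed 2 x 2 table (layout assumed): row totals 60, 40; column totals 50, 50.
observed = [[40, 20],
            [10, 30]]
chi_table = 3.84  # tabulated chi-square at 5% level for 1 d.f.

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency of a cell = (row total x column total) / N.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"expected = {expected}")
print(f"chi-square = {chi_square:.3f}, d.f. = {dof}")
print("accept H0" if chi_square < chi_table else "reject H0")
```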
Large Samples
If the sample size n > 30, the sample is called a large sample.
To test the significance of the difference between the sample proportion p and the population proportion P:
Test statistic: $Z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}}$, where n = sample size and Q = 1 − P
Note:
1. Limits for the population proportion P are given by $p \pm 3\sqrt{\dfrac{pq}{n}}$, where q = 1 − p
2. 98% confidence limits for the population proportion are $p \pm 2.33\sqrt{\dfrac{pq}{n}}$
1. In a city, a sample of 1000 people was taken, and out of them 540 are vegetarians and the rest are non-vegetarians. Can we say that both habits of eating (vegetarian or non-vegetarian) are equally popular in the city at 1% level of significance?
Solution:
Given: n = 1000, p = sample proportion of vegetarians $= \dfrac{540}{1000} = 0.54$,
P = population proportion of vegetarians $= \dfrac{1}{2} = 0.5$
Null hypothesis $H_0$: Both habits are equally popular. ($H_0 : P = 0.5$)
Alternative hypothesis: $H_1 : P \neq 0.5$
Test statistic (C.V): $Z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \dfrac{0.54 - 0.5}{\sqrt{\dfrac{0.5 \times 0.5}{1000}}} = 2.53$
Tabulated value (T.V): Z at 1% level of significance is 2.58
Conclusion: C.V < T.V
We accept the null hypothesis $H_0$.
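The single-proportion test can be sketched in Python as follows. The function name `proportion_z` is hypothetical; the data and the tabulated value 2.58 are those of the vegetarian problem.

```python
import math

def proportion_z(successes, n, P):
    """Z statistic for a sample proportion against a population proportion P."""
    p = successes / n
    Q = 1 - P
    return (p - P) / math.sqrt(P * Q / n)

z_table = 2.58  # tabulated Z at 1% level of significance (two-tailed)
z_value = proportion_z(540, 1000, 0.5)

print(f"Z = {z_value:.2f}")
print("accept H0" if abs(z_value) < z_table else "reject H0")
```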
2. A die was thrown 9000 times and of these 3220 yielded a 3 or 4. Is this consistent with the hypothesis that the die was unbiased?
Solution: Given: n = 9000,
p = proportion of successes (getting a 3 or 4) in 9000 throws $= \dfrac{3220}{9000} = 0.3578$,
P = population proportion of successes = P(getting a 3 or 4) $= \dfrac{1}{6} + \dfrac{1}{6} = \dfrac{2}{6} = 0.3333$
Null hypothesis $H_0$: The die is unbiased. ($H_0 : P = \dfrac{1}{3}$)
Alternative hypothesis: $H_1 : P \neq \dfrac{1}{3}$
Test statistic (C.V): $Z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \dfrac{0.3578 - 0.3333}{\sqrt{\dfrac{0.3333 \times 0.6667}{9000}}} = 4.93$
Tabulated value (T.V): Z at 1% level of significance is 2.58
Conclusion: C.V > T.V
We reject the null hypothesis $H_0$.
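a) Test statistic: $Z = \dfrac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ and $q = 1 - p$
b) Test statistic: $Z = \dfrac{p_1 - p}{\sqrt{\dfrac{n_2 pq}{n_1 (n_1 + n_2)}}}$, where $p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$
c) If the sample proportions are not known, then $Z = \dfrac{P_1 - P_2}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}}$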
1. Random samples of 400 men and 600 women were asked whether they would like to have a flyover near their residence. 200 men and 325 women were in favour of the proposal. Test the hypothesis that the proportions of men and women in favour of the proposal are the same, at 5% level.
Solution:
Given: sample sizes $n_1 = 400$, $n_2 = 600$.
Proportion of men $= p_1 = \dfrac{200}{400} = 0.5$
Proportion of women $= p_2 = \dfrac{325}{600} = 0.5417$
Null hypothesis: $H_0 : p_1 = p_2$
Alternative hypothesis: $H_1 : p_1 \neq p_2$
Here $p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{400 \times \dfrac{200}{400} + 600 \times \dfrac{325}{600}}{400 + 600} = \dfrac{525}{1000} = 0.525$ and $q = 1 - p = 0.475$.
Test statistic: $Z = \dfrac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{0.5 - 0.5417}{\sqrt{0.525 \times 0.475\left(\dfrac{1}{400} + \dfrac{1}{600}\right)}} = \dfrac{-0.0417}{0.0322} = -1.29$
$|Z| = 1.29$
Tabulated value: Z at 5% level of significance is 1.96
Conclusion: C.V < T.V
We accept the null hypothesis $H_0$.
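The pooled two-proportion statistic can be checked with a short Python sketch. The function name `two_proportion_z` is hypothetical; it uses the pooled proportion $p = (n_1 p_1 + n_2 p_2)/(n_1 + n_2)$ and $q = 1 - p$.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2, using the pooled proportion p and q = 1 - p."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)   # same as (n1*p1 + n2*p2) / (n1 + n2)
    q = 1 - p
    z = (p1 - p2) / math.sqrt(p * q * (1 / n1 + 1 / n2))
    return z, p, q

# 200 of 400 men and 325 of 600 women in favour of the flyover.
z_value, p, q = two_proportion_z(200, 400, 325, 600)
print(f"p = {p:.3f}, q = {q:.3f}, Z = {z_value:.2f}")
print("accept H0" if abs(z_value) < 1.96 else "reject H0")
```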
Solution:
Given: $n_1 = 400$, $n_2 = 500$, $p_1 = \dfrac{300}{400} = 0.75$, $p_2 = \dfrac{300}{500} = 0.6$
$p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{400\left(\dfrac{300}{400}\right) + 500\left(\dfrac{300}{500}\right)}{400 + 500} = 0.667$, $q = 0.333$
Null hypothesis $H_0$: Assume that there is no significant difference between $p_1$ and $p$.
Test statistic: $Z = \dfrac{p_1 - p}{\sqrt{\dfrac{n_2 pq}{n_1 (n_1 + n_2)}}} = \dfrac{0.75 - 0.667}{\sqrt{\dfrac{500 \times 0.667 \times 0.333}{400(400 + 500)}}} = 4.74$
Tabulated value: Z at 5% level of significance is 1.96
Conclusion: C.V > T.V
We reject the null hypothesis $H_0$.
Therefore, the proportion of failures in the affiliated colleges is greater than the proportion of failures in university departments and affiliated colleges taken together.
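The comparison of one sample proportion with the combined proportion can be sketched in Python. The sketch assumes the statistic $Z = (p_1 - p)/\sqrt{n_2 pq / (n_1(n_1 + n_2))}$ and uses only the figures quoted in the solution; the function name is hypothetical.

```python
import math

def sample_vs_pooled_z(x1, n1, x2, n2):
    """Z statistic comparing p1 with the pooled proportion p of both samples."""
    p1 = x1 / n1
    p = (x1 + x2) / (n1 + n2)
    q = 1 - p
    z = (p1 - p) / math.sqrt(n2 * p * q / (n1 * (n1 + n2)))
    return z, p

# 300 out of 400 in the first sample, 300 out of 500 in the second.
z_value, p = sample_vs_pooled_z(300, 400, 300, 500)
print(f"p = {p:.3f}, Z = {z_value:.2f}")
print("accept H0" if abs(z_value) < 1.96 else "reject H0")
```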
3. In two large populations, there are 30% and 25% respectively of fair-haired people. Is this difference likely to be hidden in samples of 1200 and 900 respectively from the two populations?
Solution:
Given: $n_1 = 1200$, $n_2 = 900$,
$P_1$ = proportion of fair-haired people in the first population $= \dfrac{30}{100} = 0.3$
$P_2$ = proportion of fair-haired people in the second population $= \dfrac{25}{100} = 0.25$
Null hypothesis $H_0$: Assume that the sample proportions are equal, i.e. the difference in population proportions is hidden in the samples.
$H_0 : p_1 = p_2$
Alternative hypothesis: $H_1 : p_1 \neq p_2$
Test statistic: $Z = \dfrac{P_1 - P_2}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}} = \dfrac{0.3 - 0.25}{\sqrt{\dfrac{0.3 \times 0.7}{1200} + \dfrac{0.25 \times 0.75}{900}}} = 2.55$
Tabulated value: Z at 5% level of significance is 1.96
Conclusion: C.V > T.V
We reject the null hypothesis $H_0$.
Therefore, the proportions are not equal, and the difference is unlikely to be hidden in samples of these sizes.
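When the population proportions themselves are known, no pooling is needed. A minimal Python sketch of the statistic $Z = (P_1 - P_2)/\sqrt{P_1 Q_1/n_1 + P_2 Q_2/n_2}$, with a hypothetical function name:

```python
import math

def known_proportions_z(P1, n1, P2, n2):
    """Z statistic when the two population proportions P1 and P2 are known."""
    Q1, Q2 = 1 - P1, 1 - P2
    return (P1 - P2) / math.sqrt(P1 * Q1 / n1 + P2 * Q2 / n2)

# 30% and 25% fair-haired people; samples of 1200 and 900.
z_value = known_proportions_z(0.30, 1200, 0.25, 900)
print(f"Z = {z_value:.2f}")
print("accept H0" if abs(z_value) < 1.96 else "reject H0")
```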
To test whether the given sample of size n has been drawn from a population with mean $\mu$.
1. A sample of 900 members has a mean of 3.4 cm and an S.D. of 2.61 cm. Is the sample from a large population of mean 3.25 cm and S.D. 2.61 cm? If the population is normal and its mean is unknown, find the 95% confidence limits of the true mean.
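No worked solution is given for this problem, so the following Python sketch shows one way to carry it out. It assumes the large-sample statistic $Z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$, a two-tailed test at the 5% level with tabulated value 1.96 (the level is not stated in the problem), and 95% limits $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$.

```python
import math

n = 900
sample_mean = 3.4   # cm
mu = 3.25           # hypothesised population mean, cm
sigma = 2.61        # population S.D., cm
z_table = 1.96      # assumed: 5% level, two-tailed

std_error = sigma / math.sqrt(n)
z_value = (sample_mean - mu) / std_error

# 95% confidence limits for the true mean.
lower = sample_mean - z_table * std_error
upper = sample_mean + z_table * std_error

print(f"Z = {z_value:.3f}")
print("accept H0" if abs(z_value) < z_table else "reject H0")
print(f"95% confidence limits: {lower:.4f} to {upper:.4f}")
```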
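Test statistic: $Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma^2}{n_1} + \dfrac{\sigma^2}{n_2}}}$
3. If $\sigma$ is not known, we can use an estimate of $\sigma^2$ given by $\sigma^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}$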
1. The means of two large samples of 1000 and 2000 members are 67.5 inches and 68.0 inches respectively. Can the samples be regarded as drawn from the same population of S.D. 2.5 inches?
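A Python sketch of this problem, assuming the common-variance statistic $Z = (\bar{x}_1 - \bar{x}_2)/\sqrt{\sigma^2/n_1 + \sigma^2/n_2}$ and a two-tailed test at the 5% level with tabulated value 1.96 (the level is an assumption, since the problem does not state one):

```python
import math

def two_mean_z(mean1, n1, mean2, n2, sigma):
    """Z statistic for two large samples from a population with common S.D. sigma."""
    return (mean1 - mean2) / math.sqrt(sigma ** 2 / n1 + sigma ** 2 / n2)

z_table = 1.96  # assumed: 5% level, two-tailed
z_value = two_mean_z(67.5, 1000, 68.0, 2000, 2.5)

print(f"Z = {z_value:.3f}")
print("accept H0" if abs(z_value) < z_table else "reject H0")
```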
1. A sample of 100 students is taken from a large population. The mean height of the students in this sample is 160 cm. Can it be reasonably regarded that, in the population, the mean height is 165 cm and the S.D. is 10 cm?
Solution: Given: $n = 100$, $\mu = 165$, $\bar{x} = 160$, $\sigma = 10$
Null hypothesis: $H_0 : \mu = 165$
Alternative hypothesis: $H_1 : \mu \neq 165$
Test statistic (C.V): $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \dfrac{160 - 165}{10 / \sqrt{100}} = -5$; $|z| = 5$, so $H_0$ is rejected.
2. In a random sample of 60 workers, the average time taken by them to get to work is 33.8 minutes, with an S.D. of 6.1 minutes. Can we reject the null hypothesis $\mu = 32.6$ minutes in favour of the alternative hypothesis $\mu > 32.6$ minutes at tabulated value 2.58?
Solution: Given: $n = 60$, $\mu = 32.6$, $\bar{x} = 33.8$, $\sigma = 6.1$
Null hypothesis: $H_0 : \mu = 32.6$
Alternative hypothesis: $H_1 : \mu > 32.6$
Test statistic (C.V): $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \dfrac{33.8 - 32.6}{6.1 / \sqrt{60}} = 1.5238$; since 1.5238 < 2.58, $H_0$ is accepted.
3. The mean and S.D. of a population are 11795 and 14054 respectively. If n = 50, find the 95% confidence limits for the mean.
Solution: $\bar{x} \pm 1.96 \dfrac{\sigma}{\sqrt{n}} = 11795 \pm 1.96 \times \dfrac{14054}{\sqrt{50}} = (7899.4, 15690.6)$
4. An ambulance service claims that it takes on average less than 10 minutes to reach its destination in emergency calls. A sample of 36 calls has a mean of 11 minutes and a variance of 16 minutes. Test the significance at 5% level of significance.
Solution: Given: $n = 36$, $\mu = 10$, $\bar{x} = 11$, $\sigma = \sqrt{16} = 4$
Null hypothesis: $H_0 : \mu = 10$
Alternative hypothesis: $H_1 : \mu \neq 10$
Test statistic (C.V): $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \dfrac{11 - 10}{4 / \sqrt{36}} = 1.5$; since 1.5 < 1.96, $H_0$ is accepted.
5. The average marks scored by 32 boys are 72 with a standard deviation of 8, while that for 36 girls is 70 with a standard deviation of 6. Test at 1% level of significance whether the boys perform better than the girls.
Solution: Null hypothesis: $H_0 : \mu_1 = \mu_2$
Alternative hypothesis: $H_1 : \mu_1 > \mu_2$
Test statistic (C.V): $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{72 - 70}{\sqrt{\dfrac{8^2}{32} + \dfrac{6^2}{36}}} = 1.15$; $H_0$ is accepted.