Block 4 MCO 3 Unit 2
UNIT 16 TESTS OF HYPOTHESIS-II
16.0 Objectives
16.1 Introduction
16.2 Small Samples versus Large Samples
16.3 Student’s t-distribution
16.4 Application of t-distribution to determine Confidence Interval for
Population Mean
16.5 Application of t-distribution for Testing Hypothesis Regarding Mean
16.6 t-test for Independent Samples
16.7 t-test for Dependent Samples
16.8 Let Us Sum Up
16.9 Key Words and Symbols
16.10 Answers to Self Assessment/Exercises
16.11 Terminal Questions/Exercises
16.12 Further Reading
16.0 OBJECTIVES
After studying this unit, you should be able to:
• differentiate between exact tests, i.e., small sample tests, and approximate tests, i.e., large sample tests,
• be familiar with the properties and applications of the t-distribution,
• find the interval estimate for the mean using the t-distribution,
• understand the theory required for testing hypotheses using the t-distribution,
• apply the t-test for independent samples, and
• apply the t-test for dependent samples.
16.1 INTRODUCTION
In the previous unit, we considered different aspects of the problems of inference. We also noted the limitations of the standard normal test or Z-test. As discussed in Unit 15, we cannot apply the normal distribution for estimating confidence intervals for the population mean when the population standard deviation is unknown and the sample size does not exceed 30, i.e., for small samples. We may further recall that, as mentioned in Unit 15, we cannot test a hypothesis concerning the population mean when the sample is small and the population standard deviation is unspecified. In a situation like this, we use the t-distribution, which is also known as Student's t-distribution. The t-distribution was first applied by W.S. Gosset, who worked at the Guinness Brewery in Dublin. Employees of the brewery were not allowed to publish their research work, so Gosset was compelled to publish his work under the pen name 'Student'; hence the distribution is known as Student's t-distribution or simply Student's distribution. Before we discuss the t-distribution, let us differentiate between exact tests and approximate tests.
16.2 SMALL SAMPLES VERSUS LARGE SAMPLES
Normally a sample is considered small if its size is 30 or less, whereas a sample with size exceeding 30 is considered a large sample. All the tests under consideration may be classified into two categories, namely exact tests and approximate tests. Exact tests are those tests that are based on the exact sampling distribution of the test statistic; no approximation is made about the form of the parent population or the sampling distribution of the test statistic. Since exact tests are valid for any sample size, and since cost as well as labour usually increases with an increase in sample size, we prefer to take small samples for conducting exact tests. Hence, exact tests are also known as small sample tests. It may be noted that while testing for the population mean on the basis of a random sample from a normal distribution, we can apply an exact test or small sample test provided the population standard deviation is known. This was demonstrated in Unit 15.
Approximate tests or large sample tests, on the other hand, are based on an approximate (asymptotic) sampling distribution of the test statistic, which holds only when the sample size is large. Thus, for testing a hypothesis about a parameter θ (with hypothesised value θ0) on the basis of a statistic T, we use
$$Z_0 = \frac{T - \theta_0}{S.E.(T)} \quad \text{or} \quad Z_0 = \frac{T - \theta_0}{\widehat{S.E.}(T)},$$
which, for a large sample, is an approximate standard normal variable, $\widehat{S.E.}(T)$ being the estimated standard error of T. In particular, for the population mean we use
$$Z = \frac{\sqrt{n}(\bar{x} - \mu)}{s'}, \quad \text{where } s' = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}},$$
and for the population proportion we use
$$Z_0 = \frac{\sqrt{n}(p - P_0)}{\sqrt{P_0(1 - P_0)}},$$
P0 being the specified population proportion, which again, for a large sample, is an approximate standard normal variable.
16.3 STUDENT’S t-DISTRIBUTION
Since we cannot use the Z-test for the population mean when the sample is small and the population standard deviation is not known, we are on the lookout for a new test statistic. It is necessary to know a few terms first.
If x is a standard normal variable, then x² follows the χ²-distribution (chi-square distribution) with 1 d.f., written x² ~ χ²₁. Again, for two independent standard normal variables x₁ and x₂,
$$x_1^2 + x_2^2 = \sum_{i=1}^{2} x_i^2 \sim \chi^2_2,$$
and in general
$$x_1^2 + x_2^2 + \ldots + x_n^2 = \sum_{i=1}^{n} x_i^2 \sim \chi^2_n \quad \ldots (16.1)$$
If we write $u = \sum_{i=1}^{n} x_i^2$, then the probability density function of u is given by:
$$f(u) = \text{const.}\; e^{-u/2}\, u^{\frac{n}{2}-1}, \quad 0 < u < \infty,$$
where const. is a constant that makes the total area under the curve equal to unity.
Figure 16.1: χ² distribution
If x1, x2, x3, …, xn are n independent variables, each following a normal distribution with mean µ and variance σ², then Xᵢ = (xᵢ − µ)/σ is a standard normal variable and as such
$$u = \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{\sigma^2} \sim \chi^2_n \quad \ldots (16.2)$$
Now consider the sample mean x̄ = Σxᵢ/n and the sample variance
$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$
We may write Σ(xᵢ − µ)² = Σ(xᵢ − x̄)² + n(x̄ − µ)². As
$$\frac{\sum (x_i - \mu)^2}{\sigma^2} \sim \chi^2_n$$
and
$$\frac{n(\bar{x} - \mu)^2}{\sigma^2} = \frac{(\bar{x} - \mu)^2}{\sigma^2/n} = \left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right)^2 \sim \chi^2_1,$$
since $\bar{x} \sim N\!\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$,
hence it follows that
$$\frac{nS^2}{\sigma^2} = \frac{\sum (x_i - \bar{x})^2}{\sigma^2} \sim \chi^2_{n-1}$$
Student's t-distribution: Consider two independent variables y and u such that y follows the standard normal distribution and u follows the χ²-distribution with m d.f. Then the ratio
$$t = \frac{y}{\sqrt{u/m}}$$
follows the t-distribution with m d.f. The probability density function of t is given by:
$$f(t) = \text{const.}\left(1 + \frac{t^2}{m}\right)^{-\frac{m+1}{2}}, \quad -\infty < t < \infty \quad \ldots (16.3)$$
where, in the sampling situation considered below, $t = \frac{\sqrt{n}(\bar{x} - \mu)}{s'}$; const. is a constant required to make the area under the curve equal to unity; and m = n – 1 is the degrees of freedom of t.
Since we have
$$f(t) = \text{const.}\left(1 + \frac{t^2}{m}\right)^{-\frac{m+1}{2}}, \quad -\infty < t < \infty,$$
$$\therefore \log f = k - \frac{m+1}{2}\log\!\left(1 + \frac{t^2}{m}\right), \quad \text{where } k \text{ is a constant,}$$
$$= k - \frac{m+1}{2}\left(\frac{t^2}{m} - \frac{t^4}{2m^2} + \ldots \text{ to } \infty\right)$$
$$\left[\text{as } \log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \ldots \text{ to } \infty \text{ for } -1 < x \le 1, \text{ and } \frac{t^2}{m} \text{ is rather small for a large } m\right]$$
Hence
$$\log f = k - \frac{m+1}{m}\cdot\frac{t^2}{2} + \frac{m+1}{4m^2}\, t^4 - \ldots \text{ to } \infty$$
Since m is very large, $\frac{m+1}{m}$ tends to 1 and the terms containing $\frac{1}{m^2}$ and higher powers of $\frac{1}{m}$ tend to zero. Thus we have
$$\log f = k - \frac{t^2}{2}$$
$$\text{or } f = e^{k - t^2/2} = e^{k}\, e^{-t^2/2} = \text{const.}\; e^{-t^2/2},$$
which is the form of the standard normal density. Hence, for a large m, the t-distribution approaches the standard normal distribution. Looking at it from another angle, the mean of the t-distribution is zero and its standard deviation is $\sqrt{\frac{m}{m-2}}$ (for m > 2), which tends to unity for a large m; this again shows that the t-distribution tends to the standard normal distribution as m becomes large.
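The limiting behaviour described above can be checked numerically. The short Python sketch below (a minimal illustration, assuming NumPy and SciPy are available) compares the t density with the standard normal density and prints the standard deviation √(m/(m−2)) for increasing m.

```python
import numpy as np
from scipy import stats

# Compare the t density with the standard normal density on a grid of points,
# and show that the standard deviation sqrt(m/(m-2)) approaches 1 as m grows.
grid = np.linspace(-4, 4, 81)
for m in (3, 5, 10, 30, 200):
    gap = np.max(np.abs(stats.t.pdf(grid, df=m) - stats.norm.pdf(grid)))
    sd = stats.t.std(df=m)             # equals sqrt(m / (m - 2)) for m > 2
    print(f"m = {m:3d}: max density gap = {gap:.4f}, S.D. = {sd:.4f}")
```

As m increases, both the density gap and the standard deviation confirm that the t-distribution approaches the standard normal distribution.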
If x1, x2, x3, …, xn denote the n observations of a random sample drawn from a normal population with mean µ and standard deviation σ, then x1, x2, x3, …, xn can be described as n independent random variables, each following a normal distribution with the same mean µ and a common standard deviation σ. Consider the statistic
$$\frac{\sqrt{n}(\bar{x} - \mu)}{s'}$$
where $\bar{x} = \frac{\sum x_i}{n}$ is the sample mean and $s' = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$ is the sample standard deviation with divisor (n–1) instead of n. Dividing both numerator and denominator by σ, we may write
$$\frac{\sqrt{n}(\bar{x} - \mu)}{s'} = \frac{\dfrac{\sqrt{n}(\bar{x} - \mu)}{\sigma}}{\dfrac{s'}{\sigma}} = \frac{\dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{\sum (x_i - \bar{x})^2}{(n-1)\sigma^2}}} = \frac{y}{\sqrt{u/(n-1)}}$$
where $y = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ is a standard normal variable, and $u = \dfrac{\sum (x_i - \bar{x})^2}{\sigma^2}$ follows the χ²-distribution with (n–1) d.f. (y and u being independent). Hence, by the definition of the t-distribution,
$$t = \frac{\sqrt{n}(\bar{x} - \mu)}{s'} \sim t_{n-1}$$
We apply the t-distribution for finding a confidence interval for the mean as well as for testing hypotheses regarding the mean. These are discussed in Sections 16.4 and 16.5 respectively.

16.4 APPLICATION OF t-DISTRIBUTION TO DETERMINE CONFIDENCE INTERVAL FOR POPULATION MEAN
Let us assume that we have a random sample of size ‘n’ from a normal
population with mean as µ and standard deviation as σ. We consider the case
when both µ and σ are unknown. We are interested in finding confidence
interval for population mean. In view of our discussion in Section 16.3, we
know that :
$$t = \frac{\sqrt{n}(\bar{x} - \mu)}{s'}$$
follows t-distribution with (n–1) d.f. We may recall here that x denotes the
sample mean and s|, the sample standard deviation with divisor as (n–1) and not
‘n’. We denote the upper α-point of t-distribution with (n–1) d.f as tα, (n–1).
Since t-distribution is symmetrical about t = 0, the lower α-point of t-distribution
with (n–1) d.f would be denoted by –tα, (n–1). As per our discussion in Unit
15, in order to get 100 (1–α)% confidence interval for µ, we note that :
$$P\left[-t_{\alpha/2,\,(n-1)} \le \frac{\sqrt{n}(\bar{x} - \mu)}{s'} \le t_{\alpha/2,\,(n-1)}\right] = 1 - \alpha$$
$$\text{or } P\left[-\bar{x} - \frac{s'}{\sqrt{n}}\, t_{\alpha/2,\,(n-1)} \le -\mu \le -\bar{x} + \frac{s'}{\sqrt{n}}\, t_{\alpha/2,\,(n-1)}\right] = 1 - \alpha$$
$$\text{or } P\left[\bar{x} - \frac{s'}{\sqrt{n}}\, t_{\alpha/2,\,(n-1)} \le \mu \le \bar{x} + \frac{s'}{\sqrt{n}}\, t_{\alpha/2,\,(n-1)}\right] = 1 - \alpha$$
Thus the 100(1–α)% confidence interval to µ is:
$$\left[\bar{x} - \frac{s'}{\sqrt{n}}\, t_{\alpha/2,\,(n-1)},\;\; \bar{x} + \frac{s'}{\sqrt{n}}\, t_{\alpha/2,\,(n-1)}\right] \quad \ldots (16.6)$$
100(1–α)% Lower Confidence Limit to µ = x̄ – (s′/√n) tα/2, (n–1)
and 100(1–α)% Upper Confidence Limit to µ = x̄ + (s′/√n) tα/2, (n–1)
Selecting α = 0.05, we may note that
95% Lower Confidence Limit to µ = x̄ – (s′/√n) t0.025, (n–1)
and 95% Upper Confidence Limit to µ = x̄ + (s′/√n) t0.025, (n–1) …(16.7)
In a similar manner, setting α = 0.01, we get
99% Lower Confidence Limit to µ = x̄ – (s′/√n) t0.005, (n–1)
and 99% Upper Confidence Limit to µ = x̄ + (s′/√n) t0.005, (n–1) …(16.8)
Values of tα, m for m = 1 to 30 and for some selected values of α are provided in Appendix Table 5. Figures 16.3, 16.4 and 16.5 exhibit these confidence intervals to µ obtained by applying the t-distribution.
Fig. 16.3: 100(1–α)% Confidence Interval to µ
Fig. 16.4: 95% Confidence Interval to µ
Fig. 16.5: 99% Confidence Interval to µ
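As a computational aid, the sketch below implements the interval (16.6) in Python; it assumes NumPy and SciPy are available, and the function name t_confidence_interval is purely illustrative. With alpha = 0.05 or 0.01 it reproduces the 95% and 99% limits of (16.7) and (16.8).

```python
import numpy as np
from scipy import stats

def t_confidence_interval(sample, alpha=0.05):
    """100(1 - alpha)% confidence interval for the population mean
    when sigma is unknown and the sample is small, as in (16.6)."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    x_bar = x.mean()
    s_prime = x.std(ddof=1)                         # S.D. with divisor (n - 1)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)   # upper alpha/2 point
    half_width = t_crit * s_prime / np.sqrt(n)
    return x_bar - half_width, x_bar + half_width
```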
Illustration 1
Following are the lengths (in ft.) of 7 iron bars obtained in a sample out of 100 such bars taken from SUR IRON FACTORY:
4.10, 3.98, 4.01, 3.95, 3.93, 4.12, 3.91
We have to find the 95% confidence interval for the mean length of the iron bars produced by SUR IRON FACTORY.
Solution: Let x denote the length of an iron bar. We assume that x is normally distributed with unknown mean µ and unknown standard deviation σ. As the sample is drawn from a finite population of 100 bars, we use the finite population correction. Then the
95% Lower Confidence Limit to µ = x̄ – (s′/√n) √((N–n)/(N–1)) t0.025, 6
and the 95% Upper Confidence Limit to µ = x̄ + (s′/√n) √((N–n)/(N–1)) t0.025, 6
where x̄ = Σxᵢ/n is the sample mean; s′ = √(Σ(xᵢ – x̄)²/(n–1)); n = sample size = 7; N = population size = 100; and √((N–n)/(N–1)) is the finite population correction (fpc).
Table 16.1: Computation of Sample Mean and S.D.
xᵢ        xᵢ²
4.10      16.8100
3.98      15.8404
4.01      16.0801
3.95      15.6025
3.93      15.4449
4.12      16.9744
3.91      15.2881
28.00     112.0404
Thus we have: x̄ = 28/7 = 4
Σ(xᵢ – x̄)² = Σxᵢ² – n x̄² = 112.0404 – 7 × 4² = 0.0404
so that s′ = √(0.0404/6) = 0.082057
f.p.c. = √((100 – 7)/(100 – 1)) = 0.969223
From Appendix Table 5, t0.025, 6 = 2.447. Hence the
95% Lower Confidence Limit to µ = 4 – (0.082057/√7) × 0.969223 × 2.447 = 4 – 0.073558 = 3.926442
and the 95% Upper Confidence Limit to µ = 4 + 0.073558 = 4.073558
So the 95% Confidence Interval for the mean length of iron bars = [3.93 ft, 4.07 ft].
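The computation above can be checked in a few lines of Python (an illustrative sketch assuming NumPy and SciPy), including the finite population correction:

```python
import numpy as np
from scipy import stats

lengths = np.array([4.10, 3.98, 4.01, 3.95, 3.93, 4.12, 3.91])   # sample of 7 bars
n, N = lengths.size, 100
s_prime = lengths.std(ddof=1)                     # about 0.082057
fpc = np.sqrt((N - n) / (N - 1))                  # about 0.969223
half = stats.t.ppf(0.975, df=n - 1) * s_prime / np.sqrt(n) * fpc
print(lengths.mean() - half, lengths.mean() + half)   # about 3.926 and 4.074
```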
Illustration 2
Find 90% confidence interval to µ given sample mean and sample S.D as 20.24
and 5.23 respectively, as computed on the basis of a sample of 11 observations
from a population containing 1000 units.
Solution: As the sample of 11 units is a small fraction of the population of 1,000 units, the finite population correction may be ignored. The 90% confidence interval to µ is:
[x̄ – (s′/√n) t0.05, 10, x̄ + (s′/√n) t0.05, 10]
As S = √(Σ(xᵢ – x̄)²/n) is the sample standard deviation (S.D.),
nS² = Σ(xᵢ – x̄)²
Hence (s′)² = Σ(xᵢ – x̄)²/(n – 1) = nS²/(n – 1) [since Σ(xᵢ – x̄)² = nS²]
or s′ = √(n/(n – 1)) S = √(11/10) × 5.23 = 5.4853
Consulting Appendix Table 5, given at the end of this block, we find t0.05, 10 = 1.812.
Thus the 90% confidence interval to µ is given by:
[20.24 – (5.4853/√11) × 1.812, 20.24 + (5.4853/√11) × 1.812]
= [20.24 – 2.9968, 20.24 + 2.9968] = [17.2432, 23.2368]
Illustration 3
The study hours per week of 17 teachers, selected at random from different
parts of West Bengal, were found to be:
6.6, 7.2, 6.8, 9.2, 6.9, 6.2, 6.7, 7.2, 9.7, 10.4, 7.4, 8.3, 7.0, 6.8, 7.6, 8.1, 7.8
Suppose, we are interested in computing 95% and 99% confidence intervals for
the average hours of study per week per teacher in the state of West Bengal.
Solution: If µ denotes the average hours of study per week per teacher in
West Bengal, then as discussed earlier,
95% confidence interval to µ = [x̄ – (s′/√n) t0.025, (n–1), x̄ + (s′/√n) t0.025, (n–1)]
and 99% confidence interval to µ = [x̄ – (s′/√n) t0.005, (n–1), x̄ + (s′/√n) t0.005, (n–1)]
Here n = 17, x̄ = Σxᵢ/n = 129.9/17 = 7.64 hours, and
s′ = √(Σ(xᵢ – x̄)²/(n – 1)) = √((Σxᵢ² – n x̄²)/(n – 1)) = √((1014.41 – 17 × (7.64)²)/(17 – 1)) = √((1014.41 – 992.28)/16) = 1.1761
From Appendix Table 5, given at the end of this block, t0.025, 16 = 2.120 and t0.005, 16 = 2.921.
Thus the 95% confidence interval to µ
= [(7.64 – (1.1761/√17) × 2.120) hours, (7.64 + (1.1761/√17) × 2.120) hours]
= [7.0353 hours, 8.2447 hours]
Similarly the 99% confidence interval to µ
= [(7.64 – (1.1761/√17) × 2.921) hours, (7.64 + (1.1761/√17) × 2.921) hours]
= [6.8068 hours, 8.4732 hours]
Illustration 4
The 90% confidence limits for the population mean, computed from a random sample of 26 observations drawn from a normal population, are 46.584 and 53.416. Find the sample mean and the sample standard deviation.
Solution: The 90% confidence interval to µ is
[x̄ – (s′/√n) t0.05, (n–1), x̄ + (s′/√n) t0.05, (n–1)]
In this case, n = 26. From Appendix Table 5, given at the end of this block, t0.05, 25 = 1.708.
Hence we have x̄ – (s′/√26) × 1.708 = 46.584
or x̄ – 0.33497 s′ = 46.584 …(1)
and x̄ + (s′/√26) × 1.708 = 53.416
or x̄ + 0.33497 s′ = 53.416 …(2)
On adding equations (1) and (2) we get
2x̄ = 100 or x̄ = 50
Replacing x̄ by 50 in equation (1), we have
50 – 0.33497 s′ = 46.584
or s′ = 3.416/0.33497 = 10.19793
Hence S = √((n – 1)/n) s′ [from Illustration 2]
= 0.98058 × 10.19793 = 9.9999 ≈ 10
Thus the sample mean is 50 units and the sample S.D. is approximately 10 units.
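The algebra of Illustration 4 can be verified with a small Python sketch (illustrative only, assuming SciPy is available):

```python
import numpy as np
from scipy import stats

lower, upper, n = 46.584, 53.416, 26
t_crit = stats.t.ppf(0.95, df=n - 1)                    # t(0.05, 25) = 1.708
x_bar = (lower + upper) / 2                             # midpoint = 50
s_prime = (upper - lower) / 2 / (t_crit / np.sqrt(n))   # about 10.198
S = s_prime * np.sqrt((n - 1) / n)                      # S.D. with divisor n, about 10
print(x_bar, s_prime, S)
```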
j) Σ(xᵢ – x̄)²/σ² ~ χ²ₙ
k) Z-test has the widest range of applicability among all the commonly used tests.
4) A random sample of size 10 drawn from a normal population yields sample mean as 85 and sample S.D as 8.7. Compute 90% and 95% confidence intervals to population mean.
..........................................................................................................
..........................................................................................................
..........................................................................................................
5) Find 99% confidence limits for ‘µ’ given that a sample of 19 units drawn from a
population of 98 units provides sample mean as 15.627 and sample S.D as 2.348.
..........................................................................................................
..........................................................................................................
..........................................................................................................
6) A sample of size 10 drawn from a normal population produces the following results.
Σxi = 92 and Σxi2 = 889
Obtain 95% confidence limits to µ.
..........................................................................................................
..........................................................................................................
..........................................................................................................
16.5 APPLICATION OF t-DISTRIBUTION FOR TESTING HYPOTHESIS REGARDING MEAN
Suppose we have a small random sample drawn from a normal population whose standard deviation is unknown, and we wish to test the null hypothesis
H0 : µ = µ0, i.e., the population mean is a specified value µ0,
against H : µ ≠ µ0, i.e., the population mean is anything but µ0,
or H1 : µ > µ0, i.e., the population mean is greater than µ0,
or H2 : µ < µ0, i.e., the population mean is less than µ0.
As we have noted in Section 16.1, the proper test to apply in this situation is undoubtedly the t-test. If we denote the upper α-point and the lower α-point of the t-distribution with m d.f. by tα, m and t1–α, m = –tα, m (as the t-distribution is symmetrical about 0), then, based on the distribution of t, it is possible to find critical values of t such that:
$$P\left[\,|t_0| \ge t_{\alpha/2,\,m}\,\right] = \alpha \quad \ldots (16.10)$$
$$P\left[\,t_0 \ge t_{\alpha,\,m}\,\right] = \alpha \quad \ldots (16.11)$$
$$P\left[\,t_0 \le -t_{\alpha,\,m}\,\right] = \alpha \quad \ldots (16.12)$$
First, in order to test H0 against the both-sided alternative H : µ ≠ µ0, we note from (16.10) that if we choose a small value of α, the probability that |t0|, as computed from the sample, equals or exceeds tα/2, m is very low when H0 is true. Hence, if the observed |t0| ≥ tα/2, m, we have serious doubts about the validity of H0; we then reject H0 and accept H. The critical region is
ω : |t0| ≥ tα/2, m
This is shown in the following Figure 16.6. Critical region lies on both the tails.
Fig. 16.6: Critical Region for Both-tailed Test
Secondly, in order to test the null hypothesis against the right-sided alternative
i.e., to test H0 against H1 : µ > µ0, from (16.11) we note that, as before, if we
choose a small value of α, then the probability that the observed value of t,
would exceed the critical value tα, m is very low. Thus one may have serious
questions in this case, about the validity of H0 if the value of t, as obtained on
the basis of a small random sample, really exceeds tα, m. We then reject H0
and accept H1. The critical region
ω : t0 ≥ tα, m ………(16.15)
lies on the right-tail of the curve and the test as such is called right-tailed test.
This is shown in Figure 16.7.
Fig. 16.7: Critical Region for Right-tailed Test
Lastly, when we proceed to test H0 against the left-sided alternative H2 : µ < µ0, we note that (16.12) suggests that if α is small, then the
probability that t0 would be less than the critical value –tα, m is very small. So
if the value of t0 as computed, on the basis of a small sample, is found to be
less than –tα, m, we would doubt the validity of H0 and accept H2. The critical
region
ω : t0 ≤ – tα, m …(16.16)
would lie on the left-tail and the test would be left-tailed test. This is depicted
in Fig. 16.8.
Fig. 16.8: Critical Region for Left-tailed Test.
Before recommending the t-test, we should check the following:
1) Whether the parent population is normal.
2) Whether the sample has been drawn at random.
3) Whether the population standard deviation is unknown.
4) Whether the sample drawn is a small one. If the answer to this last question is 'no', i.e., n > 30, we would be satisfied with the Z-test. However, if n ≤ 30 and the first three conditions are fulfilled, we should recommend the t-test. In that case we compute:
$$t = \frac{\sqrt{n}(\bar{x} - \mu)}{s'}$$
where n = sample size; x̄ = sample mean; and s′ = sample S.D. with divisor (n–1). The test statistic follows the t-distribution with (n–1) d.f.
In order to test H0 : µ = µ0 against the both-sided alternative H : µ ≠ µ0, we
compute :
$$t_0 = \frac{\sqrt{n}(\bar{x} - \mu_0)}{s'}$$
If t0 falls in the critical region defined by:
ω : |t0| ≥ tα/2, (n–1)
tα, m being the upper α-point of the t-distribution with m d.f., then we reject H0. In other words, H0 is rejected and H : µ ≠ µ0 is accepted if the absolute value of t, as computed from the sample, equals or exceeds the critical value tα/2, (n–1).
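The decision rules described above translate directly into code. The following sketch (illustrative only, assuming NumPy and SciPy; the function name is not from the text) computes t0 and applies the appropriate critical region. The same two-sided statistic is returned by scipy.stats.ttest_1samp.

```python
import numpy as np
from scipy import stats

def one_sample_t_test(sample, mu0, alpha=0.05, alternative="two-sided"):
    """t-test of H0: mu = mu0 for a small sample from a normal population
    with unknown sigma.  Returns t0 and whether H0 is rejected."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    t0 = np.sqrt(n) * (x.mean() - mu0) / x.std(ddof=1)   # s' uses divisor (n - 1)
    df = n - 1
    if alternative == "two-sided":                 # H : mu != mu0
        reject = abs(t0) >= stats.t.ppf(1 - alpha / 2, df)
    elif alternative == "greater":                 # H1: mu > mu0 (right-tailed)
        reject = t0 >= stats.t.ppf(1 - alpha, df)
    else:                                          # H2: mu < mu0 (left-tailed)
        reject = t0 <= -stats.t.ppf(1 - alpha, df)
    return t0, reject
```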
Figure 16.9 shows critical region at 5% level of significance while Figure 16.10
shows critical region at 1% level of significance.
Fig. 16.9: Critical Region for Both-tailed Test at 5% Level of Significance
Fig. 16.10: Critical Region for Both-tailed Test at 1% Level of Significance
Similarly, for testing H0 against the right-sided alternative H1 : µ > µ0, the critical region is given by:
ω : t0 ≥ tα, (n–1)
which at the 5% level of significance becomes
ω : t0 ≥ t0.05, (n–1)
and at the 1% level of significance
ω : t0 ≥ t0.01, (n–1)
The following Figures 16.11 and 16.12 show these two critical regions.
Fig. 16.11: Critical Region for Right-tailed Test at 5% Level of Significance
Fig. 16.12: Critical Region for Right-tailed Test at 1% Level of Significance
Lastly, when we test H0 against the left-sided alternative H2 : µ < µ0, the critical region would be:
ω : t0 ≤ –tα, (n–1)
which at the 5% level of significance becomes
ω : t0 ≤ –t0.05, (n–1)
and at the 1% level of significance
ω : t0 ≤ –t0.01, (n–1)
These are depicted in the following Figure 16.13 and Figure 16.14 respectively.
Fig. 16.13: Critical Region for Left-tailed Test at 5% Level of Significance
Fig. 16.14: Critical Region for Left-tailed Test at 1% Level of Significance
Illustration 5
A filling machine is set so that the mean weight of the packed tins of oil is 10 units per tin. The weights (in the same units) of 13 tins selected at random were:
9.7, 9.6, 10.4, 10.3, 9.8, 10.2, 10.4, 9.5, 10.6, 10.8, 9.1, 9.4, 10.7
Do the data indicate that the tins are being under-filled, i.e., that the machine is not working in accordance with the given specification? Use a 5% level of significance.
Solution: Let x denote the weight of a packed tin of oil. We assume that x is normally distributed; the sample is small (n = 13) and the population S.D. is unknown, so we apply the t-test of
H0 : µ = 10 against H2 : µ < 10
and compute
$$t_0 = \frac{\sqrt{n}(\bar{x} - 10)}{s'}, \quad \text{where } \bar{x} = \frac{\sum x_i}{n} \text{ and } s' = \sqrt{\frac{\sum x_i^2 - n\bar{x}^2}{n-1}}$$
The critical region for the left-sided alternative is
ω : t0 ≤ –tα, (n–1) = –t0.05, 12 = –1.782
From the data, x̄ = 130.5/13 = 10.0385 and s′ = 0.5501.
Hence t0 = √13 (10.0385 – 10)/0.5501 = 0.25
which is greater than –1.782
As t0 does not fall in the critical region ω, we accept H0. So, on the basis of
the given data as obtained from the sample observations, we conclude that the
machine worked in accordance with the given specifications.
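A quick numerical check of Illustration 5 (an illustrative Python sketch assuming NumPy and SciPy); working with the unrounded sample mean gives t0 of about 0.25:

```python
import numpy as np
from scipy import stats

weights = np.array([9.7, 9.6, 10.4, 10.3, 9.8, 10.2, 10.4,
                    9.5, 10.6, 10.8, 9.1, 9.4, 10.7])
n = weights.size
t0 = np.sqrt(n) * (weights.mean() - 10) / weights.std(ddof=1)
t_crit = -stats.t.ppf(0.95, df=n - 1)                 # lower 5% point, about -1.782
print(round(t0, 3), round(t_crit, 3), t0 <= t_crit)   # 0.252  -1.782  False
```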
Illustration 6
A company manufactures steel tubes with a specified mean inner diameter of 4 units. A random sample of 15 tubes gives a mean inner diameter of 3.96 units with an S.D. of 0.032 units. Is the sample mean significantly different from the specified value? Use a 1% level of significance.
Solution: Let x denote the inner diameter of the steel tubes produced by the company. We are interested in testing
H0 : µ = 4 against
H : µ ≠ 4
Assuming that x follows normal distribution, we note that the sample size is 15
(<30) and the population S.D. is unknown. All these factors justify the
application of t-distribution. Thus we compute our test statistic as:
$$t_0 = \frac{\sqrt{n}(\bar{x} - 4)}{s'}$$
As given, x̄ = 3.96 and S = 0.032, so that
s′ = √(n/(n – 1)) S = √(15/14) × 0.032 = 0.0331
So t0 = √15 (3.96 – 4)/0.0331 = –4.68
Hence |t0| = 4.68.
The critical region for the both-sided alternative is
ω : |t0| ≥ tα/2, (n–1)
Selecting the level of significance as 1%, from the t-table (Appendix Table-5),
we get t0.01/2, (15–1)
= t0.005, 14 = 2.977
Thus, ω : |t0| ≥ 2.977
Since the computed value |t0| = 4.68 falls in ω, we reject H0. Hence
the sample mean is significantly different from the population mean.
Illustration 7
The mean weekly sales of detergent powder in the department stores of the
city of Delhi as produced by a company was 2,025 kg. The company carried
out a big advertising campaign to increase the sales of their detergent powder.
After the advertising campaign, the following figures were obtained from 20
departmental stores selected at random from all over the city (weight in kgs.).
Solution: We are to test H0 : µ = 2025 against the right-sided alternative H1 : µ > 2025, i.e., that the campaign has increased sales. The test statistic is
$$t_0 = \frac{\sqrt{n}(\bar{x} - 2025)}{s'}$$
and the critical region for the right-sided alternative is given by :
ω : t0 ≥ tα, (n–1)
or ω : t0 ≥ 1.729
[By selecting α = 0.05 and consulting Appendix Table-5, given at the end of
this block, we find that for m = 20–1 = 19 and for α = 0.05, value of t is
1.729].
xᵢ        uᵢ = xᵢ – 2000        uᵢ²
2000 0 0
2023 23 529
2056 56 3136
2048 48 2304
2010 10 100
2025 25 625
2100 100 10000
2563 563 316969
2289 289 83521
2005 5 25
2082 82 6724
2056 56 3136
2049 49 2401
2020 20 400
2310 310 96100
2206 206 42436
2316 316 99856
2186 186 34596
2243 243 59049
2013 13 169
Total 2600 762076
From the above table, we have x̄ = 2000 + 2600/20 kg = 2130 kg
s′ = √((Σuᵢ² – n ū²)/(n – 1)) = √((762076 – 20 × (130)²)/19) = 149.3981 kg
As t0 = √n (x̄ – 2025)/s′,
∴ t0 = √20 (2130 – 2025)/149.3981 = 3.143
A glance at the critical region suggests that we reject H0 and accept H1. On
the basis of the given sample we, therefore, conclude that the advertising
campaign was successful in increasing the sales of the detergent powder
produced by the company.
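The right-tailed test of Illustration 7 can be reproduced with the following illustrative sketch (assuming NumPy and SciPy):

```python
import numpy as np
from scipy import stats

sales = np.array([2000, 2023, 2056, 2048, 2010, 2025, 2100, 2563, 2289, 2005,
                  2082, 2056, 2049, 2020, 2310, 2206, 2316, 2186, 2243, 2013])
n = sales.size
t0 = np.sqrt(n) * (sales.mean() - 2025) / sales.std(ddof=1)
t_crit = stats.t.ppf(0.95, df=n - 1)                  # about 1.729
print(round(t0, 3), round(t_crit, 3), t0 >= t_crit)   # 3.143  1.729  True
```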
Illustration 8
A random sample of 26 items taken from a normal population has the mean as
145.8 and S.D. as 15.62. At 1% level of significance, test the hypothesis that
the population mean is 150.
Solution: Here we would like to test H0 : µ = 150 i.e., the population mean is
150 against H : µ ≠ 150 i.e., the population mean is not 150. As the necessary
conditions for applying t-test are fulfilled, we compute
$$t_0 = \frac{\sqrt{n}(\bar{x} - 150)}{s'}$$
and the critical region at the 1% level of significance is:
ω : |t0| ≥ t0.005, 25 = 2.787
Here s′ = √(n/(n – 1)) S = √(26/25) × 15.62 = 15.9293
So t0 = √26 (145.8 – 150)/15.9293 = –1.344
thereby |t0| = 1.344
Looking at the critical region, we find acceptance of H0. So on the basis of the
given data, we infer that the population mean is 150.
16.7 t-TEST FOR DEPENDENT SAMPLES
A typical situation involving dependent or paired samples arises when, say, a restorative is given to a group of babies and their body weights are recorded before and after its application, and we wish to examine whether the restorative has been effective. Similarly, one may apply the paired t-test to verify the necessity of a costly
management training for its sales personnel by recording the sales of the
selected trainees before and after the management training or the validity of
special coaching for a group of educationally backward students by verifying
their progress before and after the coaching programme or the increase in
productivity due to the application of a particular kind of fertiliser by recording
the productivity of a crop before and after applying this particular fertiliser and
so on.
Let us now discuss the theoretical background for the application of paired t-
test. In our earlier discussions, we were emphatic about the observations being
independent of each other. Now we consider a pair of random variables which
are dependent or correlated. Earlier, we considered normal distribution, to be
more precise, univariate normal distribution. Similarly, we may think of bivariate
normal distribution. Let x and y be two random variables following bivariate
normal distribution with mean µ1 and µ2 respectively, standard deviations σ1 and
σ2 respectively and a correlation co-efficient (ρ).
Thus ‘x’ and ‘y’ may be the bodyweight of the babies before and after the
application of the restorative, sales before and after the training programme,
marks of the weak students before and after the coaching, yield of a crop
before and after applying the fertiliser and so on.
Let us consider ‘n’ pairs of observations on ‘x‘ and ‘y’ and denote the ‘n’
pairs by (xi, yi) for i = 1, 2, 3, …, n.
If we write u = x – y, then u follows a normal distribution with mean µu = µ1 – µ2 and an unknown standard deviation. Thus testing H0 : µ1 = µ2, i.e., H0 : µu = 0, is analogous to testing for a population mean when the population standard deviation is unknown. In view of our discussion in Section 16.5, if the sample size is small, it is obvious that the appropriate test statistic would be:
$$t = \frac{\sqrt{n}(\bar{u} - \mu_u)}{s'_u} \quad \ldots (16.17)$$
where n = sample size (the number of pairs); ū = Σuᵢ/n; u = x – y; and
$$s'_u = \sqrt{\frac{\sum (u_i - \bar{u})^2}{n-1}} = \sqrt{\frac{\sum u_i^2 - n\bar{u}^2}{n-1}}$$
As before, under H0, $t_0 = \dfrac{\sqrt{n}\,\bar{u}}{s'_u}$ follows the t-distribution with (n–1) d.f.
Thus for testing H0 : µu = 0 against H : µu ≠ 0, the critical region is provided by:
ω : |t0| ≥ tα/2, (n–1)
For testing H0 against H1 : µ1 > µ2, i.e., H1 : µu > 0, we consider the critical region
ω : t0 ≥ tα, (n–1)
and for testing H0 against H2 : µ1 < µ2, i.e., H2 : µu < 0, the critical region is
ω : t0 ≤ –tα, (n–1)
When the sample size exceeds 30, the assumption of normality for u may be avoided, the test statistic √n ū/s′u can be taken as a standard normal variable, and accordingly we may recommend the Z-test.
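The paired test takes only a few lines of code. The sketch below (illustrative only, assuming NumPy and SciPy; the function name is not from the text) works on the differences u = x − y; scipy.stats.ttest_rel returns the same statistic together with a two-sided p-value.

```python
import numpy as np
from scipy import stats

def paired_t_test(before, after, alpha=0.05, alternative="two-sided"):
    """Paired t-test of H0: mu_u = 0 on the differences u = before - after."""
    u = np.asarray(before, dtype=float) - np.asarray(after, dtype=float)
    n = u.size
    t0 = np.sqrt(n) * u.mean() / u.std(ddof=1)     # s'_u uses divisor (n - 1)
    df = n - 1
    if alternative == "two-sided":                 # H : mu_u != 0
        reject = abs(t0) >= stats.t.ppf(1 - alpha / 2, df)
    elif alternative == "greater":                 # H1: mu_u > 0
        reject = t0 >= stats.t.ppf(1 - alpha, df)
    else:                                          # H2: mu_u < 0
        reject = t0 <= -stats.t.ppf(1 - alpha, df)
    return t0, reject
```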
Illustration 9
The blood pressure of 8 patients was recorded before and after the administration of a drug intended to reduce blood pressure. Is it reasonable to believe that the drug has no effect on the change of blood pressure?
Solution: Let x denote blood-pressure before applying the drug and y, the
blood-pressure after applying the drug. Further let µ1 denote the average blood-
pressure in the population before applying the drug and µ2, the average blood-
pressure after applying the drug. Thus the problem is reduced to testing :
H0 : µ1 = µ2, i.e., µu = 0, against H1 : µ1 > µ2, i.e., µu > 0 (the drug reduces blood pressure). Under H0,
$$t_0 = \frac{\sqrt{n}\,\bar{u}}{s'_u}$$
follows the t-distribution with (n–1) d.f.
Thus the critical region would be
ω : t0 ≥ tα , (n–1)
or ω : t0 ≥ 1.895
By taking α = 0.05, tα, (n–1) = t0.05, 7 = 1.895 from Appendix Table-5.
From the given data, we find that n = 8, Σuᵢ = 6, Σuᵢ² = 120.
Hence ū = Σuᵢ/n = 6/8 = 0.75
and s′u = √((Σuᵢ² – n ū²)/(n – 1)) = √((120 – 8 × 0.75²)/7) = √16.5 = 4.062
∴ t0 = √n ū/s′u = √8 × 0.75/4.062 = 0.522
Looking at the critical region, we find that H0 is accepted. Thus, on the basis of the given data, we conclude that there is no evidence that the drug reduces blood pressure; it is reasonable to believe that the drug has no effect.
Illustration 10
A group of students was selected at random from the set of weak students in
statistics. They were given intensive coaching for three months. The marks in
statistics before and after the coaching are shown below.
Serial No. of student    Marks before coaching    Marks after coaching
1    19    32
2    38    36
3    28    30
4    32    30
5    35    40
6    10    25
7    15    30
8    29    20
9    16    15
Solution: Let x and y denote the marks in statistics before and after the
coaching respectively. If the corresponding mean marks in the population be µ1
and µ2 respectively, then we are to test :
H0 : µ1 = µ2, i.e., the coaching has not changed the standard of the students,
against the alternative hypothesis H : µ1 < µ2, i.e., the coaching has improved the standard of the students.
We compute :
$$t_0 = \frac{\sqrt{n}\,\bar{u}}{s'_u}$$
which follows the t-distribution with (n–1) d.f. under H0,
where n = no. of students selected = 9; u = x – y = difference in statistics marks; and
$$s'_u = \sqrt{\frac{\sum u_i^2 - n\bar{u}^2}{n-1}}$$
since α = 0.05, n = 9,
consulting Appendix Table-5, we find that t0.05 , 8 = 1.86.
Thus the left-sided critical region is provided by ω : t0 ≤ –1.86.
Marks in Statistics
Serial No. of student    Before coaching (xᵢ)    After coaching (yᵢ)    uᵢ = xᵢ – yᵢ    uᵢ²
1 19 32 –13 169
2 38 36 2 4
3 28 30 –2 4
4 32 30 2 4
5 35 40 –5 25
6 10 25 –15 225
7 15 30 –15 225
8 29 20 9 81
9 16 15 1 1
Total – – –36 738
Thus ū = Σuᵢ/n = –36/9 = –4
s′u = √((Σuᵢ² – n ū²)/(n – 1)) = √((738 – 9 × (–4)²)/8) = 8.6168
∴ t0 = √n ū/s′u = √9 × (–4)/8.6168 = –1.39
A glance at the critical region suggests that we accept H0. On the basis of the
given data, therefore, we infer that the coaching has failed to improve the
standard of the students.
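Illustration 10 can be verified with the following illustrative sketch (assuming NumPy and SciPy); the statistic comes out at about −1.39, inside the acceptance region:

```python
import numpy as np
from scipy import stats

before = np.array([19, 38, 28, 32, 35, 10, 15, 29, 16])
after  = np.array([32, 36, 30, 30, 40, 25, 30, 20, 15])
u = before - after
t0 = np.sqrt(u.size) * u.mean() / u.std(ddof=1)
t_crit = -stats.t.ppf(0.95, df=u.size - 1)            # about -1.860
print(round(t0, 3), round(t_crit, 3), t0 <= t_crit)   # -1.393  -1.86  False
```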
Illustration 11
Fifteen sales trainees of a company were given a management training course. Their sales (in '000 Rs.) before and after the course are shown below.
Serial number of trainee    Sales before the course ('000 Rs.)    Sales after the course ('000 Rs.)
1 15 16
2 16 17
3 13 19
4 20 18
5 18 22.5
6 17 18.3
7 16 19.2
8 19 18
9 20 20
10 15.5 16
11 16.2 17
12 15.8 17
13 18.7 20
14 18.3 18
15 20 22
Was the training programme effective in promoting sales? Select α = 0.05.
Solution: We are to test H0 : µ1 = µ2 against
H1 : µ1 < µ2
µ1 and µ2 being the average sales in the population before the training and
after the training. As before the critical region is :
ω : t0 ≤ –1.761
as m = n–1 = 14 and t0.05, 14 = 1.761
Table 16.5: Computation of ū and s′u
Serial number of trainee    Sales before    Sales after    uᵢ = xᵢ – yᵢ    uᵢ²
1    15    16    –1    1
2 16 17 –1 1
3 13 19 –6 36
4 20 18 2 4
5 18 22.5 – 4.5 20.25
6 17 18.3 – 1.3 1.69
7 16 19.2 – 3.2 10.24
8 19 18 1 1
9 20 20 0 0
10 15.5 16 – 0.5 0.25
11 16.2 17 – 0.8 0.64
12 15.8 17 – 1.2 1.44
13 18.7 20 – 1.3 1.69
14 18.3 18 0.3 0.09
15 20 22 –2 4
Total    –    –    –19.5    83.29
From the above Table 16.5, we have
ū = –19.5/15 = –1.3
s′u = √((Σuᵢ² – n ū²)/(n – 1)) = √((83.29 – 15 × (–1.3)²)/14) = 2.0343
Hence t0 = √n ū/s′u = √15 × (–1.3)/2.0343 = –2.475
t0 being less than –1.761, we reject H0. Thus on the basis of the given sample,
we conclude that the training programme was effective in promoting sales.
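A similar check for Illustration 11 (an illustrative sketch assuming NumPy and SciPy) gives t0 of about −2.47, which falls in the critical region:

```python
import numpy as np
from scipy import stats

before = np.array([15, 16, 13, 20, 18, 17, 16, 19, 20,
                   15.5, 16.2, 15.8, 18.7, 18.3, 20])
after  = np.array([16, 17, 19, 18, 22.5, 18.3, 19.2, 18, 20,
                   16, 17, 17, 20, 18, 22])
u = before - after
t0 = np.sqrt(u.size) * u.mean() / u.std(ddof=1)
t_crit = -stats.t.ppf(0.95, df=u.size - 1)            # about -1.761
print(round(t0, 3), round(t_crit, 3), t0 <= t_crit)   # -2.475  -1.761  True
```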
Illustration 12
Six pairs of husbands and wives were selected at random and their IQs were
recorded as follows:
Pair : 1 2 3 4 5 6
IQ of Husband : 105 112 98 92 116 110
IQ of Wife : 102 108 100 96 112 110
Do the data suggest that there is no significant difference in average IQ
between the husband and wife? Use 1% level of significance.
Solution: Let x denote the IQ of husband and y, that of wife. We would like
to test
H0 : µ1 = µ2, i.e., there is no difference in average IQ, against H : µ1 ≠ µ2.
The critical region for the both-sided alternative at the 1% level of significance is
ω : |t0| ≥ t0.005, (6–1) = t0.005, 5 = 4.032
Pair    IQ of Husband (xᵢ)    IQ of Wife (yᵢ)    uᵢ = xᵢ – yᵢ    uᵢ²
1    105    102    3    9
2 112 108 4 16
3 98 100 –2 4
4 92 96 –4 16
5 116 112 4 16
6 110 110 0 0
Total – – 5 61
From the above table, we get
ū = Σuᵢ/n = 5/6 = 0.8333
s′u = √((Σuᵢ² – n ū²)/(n – 1)) = √((61 – 6 × (0.8333)²)/5) = 3.3715
t0 = √n ū/s′u = √6 × 0.8333/3.3715 = 0.61
Therefore, we accept H0 and conclude that, on the basis of the given sample,
there is no reason to believe that IQs of husbands and wives are different.
2) Describe the different steps one should undertake in order to apply t-test.
....................................................................................................................
....................................................................................................................
....................................................................................................................
....................................................................................................................
....................................................................................................................
....................................................................................................................
....................................................................................................................
....................................................................................................................
....................................................................................................................
....................................................................................................................
....................................................................................................................
....................................................................................................................
....................................................................................................................
5) A certain diet was introduced to increase the weight of pigs. A random sample of
12 pigs was taken and weighed before and after applying the new diet. The
differences in weights were :
7, 4, 6, 5, – 6, – 3, 1, 0, –5, –7, 6, 2
can we conclude that the diet was successful in increasing the weight of the pigs?
....................................................................................................................
....................................................................................................................
....................................................................................................................
....................................................................................................................
The critical region for the both-sided alternative is ω : |t0| ≥ tα/2, (n–1), that for the right-sided alternative is ω : t0 ≥ tα, (n–1), and the critical region for the left-sided alternative is ω : t0 ≤ –tα, (n–1). We have concluded our discussion by describing the paired t-test and the critical regions for t-tests applied to dependent samples.
16.9 KEY WORDS AND SYMBOLS
Chi-square Distribution: If x1, x2, …, xm are m independent standard normal variables, then u = Σxᵢ² follows the χ²-distribution with m d.f., and this is denoted by u ~ χ²ₘ.
Degree of Freedom (d.f.): no. of observations – no. of constraints.
Large Sample: when sample size (n) is more than 30.
Large Sample Tests or Approximate Tests: tests based on large samples.
Paired Samples: Another term used for dependent samples.
Small Sample: when sample size (n) is 30 or less.
Small Sample Tests or Exact Tests: tests based on small samples only.
t-distribution: If x is a standard normal variable and u is a chi-square variable with m d.f., and x and u are independent, then the ratio
$$t = \frac{x}{\sqrt{u/m}}$$
follows the t-distribution with m d.f.; this is denoted by t ~ tₘ.
100(1–α)% confidence interval to µ:
[x̄ – tα/2, (n–1) × s′/√n, x̄ + tα/2, (n–1) × s′/√n]
For testing the population mean on the basis of a single (independent) sample, we use the test statistic
$$t_0 = \frac{\sqrt{n}(\bar{x} - \mu_0)}{s'}$$
and for testing for a particular effect with paired (dependent) samples, we use
$$t_0 = \frac{\sqrt{n}\,\bar{u}}{s'_u}$$
where µ0 = specified value of the mean; s′ = sample S.D. with divisor (n–1); u = x – y = difference within a paired sample; and s′u = sample S.D. of u with divisor (n–1).
5. No, t0 = – 0.226
6. No, t0 = 0.518
2) How would you distinguish between a t-test for independent sample and a paired
t-test?
6) A technician is making engine parts with an axle diameter of 0.750 inch. A random sample of 14 parts shows a mean diameter of 0.763 inch and an S.D. of 0.0528 inch. Test at the 5% level of significance whether the mean axle diameter differs significantly from 0.750 inch.
7) St. Nicholas college has 500 students. The heights (in cm.) of 11 students chosen
at random provides the following results:
175, 173, 165, 170, 180, 163, 171, 174, 160, 169, 176
Determine the limits of mean height of the students of St. Nicholas college at 1%
level of significance.
(Ans: 164.6038 cm. and 176.4870 cm.)
8) For a sample of 15 units drawn from a normal population of 150 units, the mean
and S.D. are found to be 10.8 and 3.2 respectively. Find the confidence level for
the following confidence intervals.
(i) 9.415, 12.185
(ii) 9.113, 12.487
[Ans: (i) 90% (ii) 95%]
Sales ('000 Rs.)
After campaign:  17  17  12  15  20  19  14  15  24  12  10  12  18  17  34
11. A suggestion was made that husbands are more intelligent than wives. A social worker took a sample of 12 couples and applied I.Q. tests to both husbands and wives. The results are shown below:
Sl. No.    I.Q. of Husbands    I.Q. of Wives
1. 110 115
2. 115 113
3. 102 104
4. 98 90
5. 90 93
6. 105 103
7. 104 106
8. 116 118
9. 109 110
10. 111 110
11. 87 100
12. 100 98
Note: These questions/exercises will help you to understand the unit better.
Try to write answers for them. But do not submit your answers to the
university for assessment. These are for your practice only.
16.12 FURTHER READING
The following textbooks may be used for more in-depth study of the topics dealt with in this unit.
Levin and Rubin, 1996, Statistics for Management, Prentice-Hall of India Pvt. Ltd., New Delhi.
Hooda, R.P., 2000, Statistics for Business and Economics, MacMillan India Ltd.,
Delhi.
Gupta, S.P., 1999, Statistical Methods, Sultan Chand & Sons, New Delhi.