Module 4 (301 SI-2)
MODULE-IV
STATISTICAL INFERENCE-2
Sampling Variables
Sampling variables refers to the process of selecting data points or observations from a larger
population or dataset for the purpose of analysis, experimentation, or research. It is a fundamental
concept in statistics and data analysis. When you sample data, you are essentially taking a subset of
the entire population to draw conclusions or make inferences about the entire population. Here are
some key points related to sampling variables:
Population: The population refers to the entire set of individuals, items, or data points that you are
interested in studying. It is often impractical or impossible to study an entire population, so you sample
from it.
Sample: A sample is a subset of the population. It consists of a smaller number of data points or
observations that are chosen in a way that they represent the larger population to some extent.
Sampling Methods: There are various methods for sampling data, including simple random
sampling (each data point has an equal chance of being selected), stratified sampling (dividing the
population into subgroups and then sampling from each subgroup), systematic sampling (selecting
every nth data point), and more.
Sampling Error: When you take a sample from a population, there is a chance that the sample may
not perfectly represent the population. This difference between the sample and the population is
called sampling error.
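Parameter and Statistic: A numerical characteristic of a population is called a parameter, while the
corresponding characteristic computed from a sample is called a statistic. For example, the mean of a
population is a parameter, while the mean of a sample is a statistic.
Sample Size: The number of data points or observations you include in your sample is known as
the sample size. A larger sample size generally provides more accurate estimates of population
parameters.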
The theorem which explains this sort of relationship between the shape of the population
distribution and the sampling distribution of the mean is known as the central limit theorem.
This theorem is by far the most important theorem in statistical inference. It assures that the
sampling distribution of the mean approaches normal distribution as the sample size increases.
In formal terms, the central limit theorem states that the distribution of means
of random samples taken from a population having mean µ and finite variance σ²
approaches the normal distribution with mean µ and variance σ²/n as n goes to infinity.
If x̄ is the mean of a random sample of size n taken from a population with mean μ and
finite variance σ², then the limiting form of the distribution of
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
as n → ∞ is the standard normal distribution N(0, 1).
The significance of the central limit theorem lies in the fact that it permits us to use sample
statistics to make inferences about population parameters without knowing anything about the
shape of the frequency distribution of that population other than what we can get from the
sample.
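The behaviour described by the central limit theorem can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the exponential population (mean 1, variance 1), the seed, and the helper names `draw_exponential` and `sample_means` are choices made here, not part of the notes. It compares the variance of the simulated sample means with σ²/n.

```python
import random
import statistics


def draw_exponential(rng):
    """One observation from an exponential population with mean 1 and variance 1."""
    return rng.expovariate(1.0)


def sample_means(draw, n, trials, rng):
    """Return `trials` sample means, each computed from a sample of size n."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]


rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (2, 10, 50):
    means = sample_means(draw_exponential, n, 20000, rng)
    # Columns: n, mean of the sample means, their variance, and sigma^2 / n
    print(n, round(statistics.fmean(means), 3),
          round(statistics.variance(means), 4), round(1.0 / n, 4))
```

As n grows, the mean of the sample means stays near µ = 1 while their variance shrinks towards σ²/n, even though the population itself is strongly skewed.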
Problems:
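1. A sample of size 9 from a normal population gave x̄ = 15.8 and s² = 10.3. Find a 99%
confidence interval for the population mean.
Solution: With n − 1 = 8 degrees of freedom the tabulated value is t = 3.36, so the interval is
$$\bar{x} \pm t\sqrt{\frac{s^2}{n}} = 15.8 \pm 3.36\sqrt{\frac{10.3}{9}},$$
which gives the limits 12.2055 and 19.3944.
Hence 99% confidence interval = [12.2055, 19.3944].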
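The interval arithmetic above can be reproduced in a few lines. This is a minimal sketch; the function name `t_confidence_interval` is hypothetical, and the critical value 3.36 is passed in from the table rather than computed.

```python
import math


def t_confidence_interval(mean, variance, n, t_crit):
    """Return (lower, upper) limits mean -/+ t_crit * sqrt(variance / n)."""
    half_width = t_crit * math.sqrt(variance / n)
    return mean - half_width, mean + half_width


# Problem 1: n = 9, sample mean 15.8, s^2 = 10.3, tabulated t = 3.36
low, high = t_confidence_interval(15.8, 10.3, 9, 3.36)
print(round(low, 4), round(high, 4))
```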
3.52
=20±2.145√ 14
=18.06,21.94.
Hence 95% confidence interval = [18.06,21.94].
We shall be interested not so much in the value of the correlation in the parent population as
in whether the observed value could have come from an uncorrelated population, i.e. whether it
is significant in the parent population. It is widely accepted that when we work with small
samples, estimates will vary from sample to sample.
Further, in the theory of small samples we begin by assuming that the parent population
is normally distributed unless otherwise stated. Strictly, whatever decision one takes in
hypothesis testing problems is valid only for normal populations. W. S. Gosset and R.
A. Fisher contributed a great deal to the theory of small samples. Gosset published his
findings in the year 1908 under the pen name "Student". He gave a test popularly known as the
"t-test", and Fisher gave another test known as the "z-test". These tests are based on the
"t-distribution" and the "z-distribution".
Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
Problems:
1. The average number of articles produced by two machines per day are 200 and 250 with
standard deviations 20 and 25 respectively on the basis of records of 25 days production.
Can you regard both the machines equally efficient at 1% level of significance?
Solution: Given n₁ = 25, x̄₁ = 200, s₁ = 20 and n₂ = 25, x̄₂ = 250, s₂ = 25.
Assume Null Hypothesis: H₀: μ₁ = μ₂, i.e. both the machines are equally efficient.
Alternate Hypothesis: H₁: μ₁ ≠ μ₂
Estimated standard deviation:
$$S = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{25(400) + 25(625)}{48}} = 23.1$$
Test statistic:
$$t = \left|\frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\right| = \left|\frac{200 - 250}{23.1\sqrt{\frac{1}{25} + \frac{1}{25}}}\right| = |-7.65| \approx 7.7$$
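$$S = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}} = 23.12$$
Test statistic:
$$t = \left|\frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\right| = \left|\frac{170 - 205}{23.12\sqrt{\frac{1}{20} + \frac{1}{18}}}\right| = 4.66$$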
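$$S = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}} = 423.42$$
Test statistic:
$$t = \left|\frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\right| = \left|\frac{1456 - 1280}{423.42\sqrt{\frac{1}{10} + \frac{1}{17}}}\right| = 1.04$$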
at 5% level is 2.131.
Solution: Given n₁ = 9, x̄₁ = 600, s₁² = 121 and n₂ = 8, x̄₂ = 640, s₂² = 144.
Assume Null Hypothesis: H₀: μ₁ = μ₂, i.e. there is no significant difference in the two
means.
Alternate Hypothesis: H₁: μ₁ ≠ μ₂
Estimated standard deviation:
$$S = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{9(121) + 8(144)}{15}} = 12.22$$
Test statistic:
$$t = \left|\frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\right| = \left|\frac{600 - 640}{12.22\sqrt{\frac{1}{9} + \frac{1}{8}}}\right| = 6.73$$
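The two-sample calculations in this group of problems all follow the same pattern, which can be sketched as below. The function names `pooled_sd` and `two_sample_t` are hypothetical. The pooled estimate divides by n₁ + n₂ − 2, which is the divisor that reproduces the values S = 23.1 and S = 12.22 quoted in the worked solutions.

```python
import math


def pooled_sd(n1, var1, n2, var2):
    """Pooled standard deviation from sample sizes and sample variances."""
    return math.sqrt((n1 * var1 + n2 * var2) / (n1 + n2 - 2))


def two_sample_t(mean1, mean2, n1, var1, n2, var2):
    """t statistic for the difference between two sample means."""
    s = pooled_sd(n1, var1, n2, var2)
    return (mean1 - mean2) / (s * math.sqrt(1 / n1 + 1 / n2))


# Two machines: means 200 and 250, standard deviations 20 and 25, 25 days each
print(round(pooled_sd(25, 20**2, 25, 25**2), 2),
      round(abs(two_sample_t(200, 250, 25, 20**2, 25, 25**2)), 2))

# Samples of sizes 9 and 8: means 600 and 640, variances 121 and 144
print(round(pooled_sd(9, 121, 8, 144), 2),
      round(abs(two_sample_t(600, 640, 9, 121, 8, 144)), 2))
```

The magnitude of t is then compared with the tabulated value for n₁ + n₂ − 2 degrees of freedom.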
The test criterion is defined as
$$t_{cal} = \frac{\bar{x} - \mu}{S}\sqrt{n}, \qquad S = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}},$$
where x̄ is the sample mean and n is the sample size.
The t-distribution function has been derived mathematically under the assumption of a
normally distributed population; it has the following form:
$$f(t) = C\left(1 + \frac{t^2}{\gamma}\right)^{-\frac{\gamma + 1}{2}},$$
where C is a constant term and γ = n − 1 denotes the number of degrees of freedom. Because
the p.d.f. of a t-distribution is not suitable for analytical treatment, the function is evaluated
numerically for various values of t and for particular values of γ. The t-distribution table
normally given in statistics textbooks gives, over a range of values of γ, the values of t
that are exceeded by chance at different levels of significance. The t-distribution
function is different for each number of degrees of freedom, and as the degrees of freedom
become large, the t-distribution approaches the normal distribution.
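The approach of the t-distribution to the normal distribution can be seen numerically. The sketch below is illustrative only: it uses the standard normalising constant C = Γ((γ+1)/2) / (√(γπ) Γ(γ/2)), which the notes leave unspecified, and the function names `t_pdf` and `normal_pdf` are hypothetical.

```python
import math


def t_pdf(t, dof):
    """Density of the t-distribution with `dof` degrees of freedom."""
    c = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return c * (1 + t * t / dof) ** (-(dof + 1) / 2)


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


# The t density at t = 0 and t = 2 moves towards the normal density as dof grows.
for dof in (1, 5, 30, 200):
    print(dof, round(t_pdf(0.0, dof), 4), round(t_pdf(2.0, dof), 4))
print("normal", round(normal_pdf(0.0), 4), round(normal_pdf(2.0), 4))
```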
The applications of the t-distribution include: (i) testing the significance of the mean of a random
sample, i.e. determining whether the mean of a sample drawn from a normal
population deviates significantly from a stated value (the hypothetical value of the population
mean); (ii) testing whether the difference between the means of two independent samples is
significant, i.e. ascertaining whether the two samples come from the same normal
population; (iii) testing whether the difference between the means of two dependent samples is
significant; and (iv) testing the significance of an observed correlation coefficient.
Procedures to be followed in testing a hypothesis made about the population parameter
using student’s t - distribution:
then t_tab is obtained by looking in the 9th row and in the column α = 0.025
(i.e. half of α = 0.05).
• The test criterion is then calculated using the formula
$$t_{cal} = \frac{\bar{x} - \mu}{S}\sqrt{n}$$
• The magnitude of the calculated value is then compared with the tabulated value. As long as the
calculated value does not exceed the tabulated value, we accept the null hypothesis;
when the calculated value is more than the tabulated value, we
reject the null hypothesis and accept the alternate hypothesis.
Problems:
1. The manufacturer of a certain make of electric bulbs claims that his bulbs have a mean life
of 25 months with a standard deviation of 5 months. A random sample of 6 such bulbs gave
the following values: Life of bulbs in months: 24, 26, 30, 20, 20, and 18. Can you regard the
producer's claim to be valid at 1% level of significance? (Given that t_tab = 4.032 corresponding
to γ = 5.)
Solution: To solve the problem, we first set up the null hypothesis H₀: μ = 25 months.
The tabulated value is t_tab = 4.032 (obtained by looking in the 5th row of the table). The test criterion is given by
$$t_{cal} = \frac{\bar{x} - \mu}{S}\sqrt{n}, \qquad S = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}.$$
Consider the following table, in which x̄ = 138/6 = 23:
  xᵢ     xᵢ − x̄    (xᵢ − x̄)²
  24        1          1
  26        3          9
  30        7         49
  20       −3          9
  20       −3          9
  18       −5         25
Total 138             102
Thus S = √(102/5) = √20.4 = 4.517 and t_cal = ((23 − 25)/4.517) × √6 = −1.084, so |t_cal| = 1.084.
Since the calculated value 1.084 is lower than the tabulated value of 4.032, we accept the null
hypothesis: the mean life of the bulbs could be about 25 months.
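The one-sample procedure can be sketched as a small function. The name `one_sample_t` is hypothetical, and the data are the six values tabulated in the solution above (24, 26, 30, 20, 20, 18), whose mean is 23.

```python
import math


def one_sample_t(data, mu0):
    """Return (mean, S, t) for testing the hypothesis that the population mean is mu0."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return mean, s, (mean - mu0) / s * math.sqrt(n)


bulbs = [24, 26, 30, 20, 20, 18]
mean, s, t = one_sample_t(bulbs, 25)
print(mean, round(s, 3), round(abs(t), 3))

t_tab = 4.032  # tabulated value quoted in the problem for 5 degrees of freedom
print("accept H0" if abs(t) <= t_tab else "reject H0")
```

The same function applies to the other single-sample problems in this section by changing the data list and the hypothesised mean.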
2. A certain stimulus administered to each of 13 patients resulted in the following increases
in blood pressure: 5, 2, 8, −1, 3, 0, −2, 1, 5, 0, 4, 6, 8. Can it be concluded that the stimulus will, in
general, be accompanied by an increase in blood pressure?
Solution: We shall set up H₀: μ_before = μ_after, i.e. there is no significant difference in the blood
pressure readings before and after the stimulus (the mean increase is μ = 0). The alternate hypothesis is
H₁: μ_after > μ_before, i.e. the stimulus resulted in an increase in the blood pressure of the patients.
Taking α = 1% and α = 5%, with n = 13 and γ = n − 1 = 12, the respective tabulated values are
t_tab = 3.055 (α = 1%, γ = 12) and t_tab = 2.179 (α = 5%, γ = 12). Now, we compute the value of the test
criterion. The sample mean is x̄ = 39/13 = 3.
  xᵢ     xᵢ − x̄    (xᵢ − x̄)²
   5        2          4
   2       −1          1
   8        5         25
  −1       −4         16
   3        0          0
   0       −3          9
  −2       −5         25
   1       −2          4
   5        2          4
   0       −3          9
   4        1          1
   6        3          9
   8        5         25
Total 39              132
$$S = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{132}{12}} = \sqrt{11} = 3.317$$
Therefore,
$$t_{cal} = \frac{\bar{x} - \mu}{S}\sqrt{n} = \frac{3 - 0}{3.317}\sqrt{13} = 3.261$$
As the calculated value 3.261 is more than the tabulated values
of 3.055 and 2.179, we accept the alternate hypothesis that the stimulus is accompanied by
an increase in the blood pressure level.
3. The lifetimes of electric bulbs for a random sample of 10 from a large consignment gave the
following data (in thousands of hours): 4.2, 4.6, 3.9, 4.1, 5.2, 3.8, 3.9, 4.3, 4.4, 5.6. Can we accept the hypothesis that
the average lifetime of the bulbs is 4,000 hours?
Solution: Set up H₀: μ = 4,000 hours and H₁: μ ≠ 4,000 hours. Let us choose α = 5%. Then the
tabulated value is t_tab = 2.262 (α = 5%, γ = 9). To find the test criterion, note that the sample
mean is x̄ = 44.0/10 = 4.4 and Σ(xᵢ − x̄)² = 3.12, so that
$$S = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{3.12}{9}} = 0.589$$
Therefore,
$$t_{cal} = \frac{\bar{x} - \mu}{S}\sqrt{n} = \frac{4.4 - 4.0}{0.589}\sqrt{10} = 2.148$$
As the computed value is lower than the tabulated value of
2.262, we conclude that the mean lifetime of the bulbs is about 4,000 hours.
4. Consider the sample consisting of nine numbers 45, 47, 50, 52, 48, 47, 49, 53 and 51. The sample
is drawn from a population whose mean is 47.5. Find whether the sample mean differs significantly
from the population mean.
Solution: For the given sample, the size is N = 9. Therefore its mean is
$$\bar{X} = \frac{1}{9}(45 + 47 + 50 + 52 + 48 + 47 + 49 + 53 + 51) = \frac{442}{9} = 49.11$$
5. Eleven school boys were given a test in mathematics carrying a maximum of 25 marks. They
were given a month’s extra coaching and a second test of equal difficulty was held thereafter.
The following table gives the marks in the two tests.
Boy 1 2 3 4 5 6 7 8 9 10 11
I Test Marks 23 20 19 21 18 20 18 17 23 16 19
II Test Marks 24 19 22 18 20 22 20 20 23 20 17
Do the marks given evidence that the students have benefitted by extra coaching? Use 0.05
level of significance.
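Solution: We first calculate the mean and the standard deviation of the differences in marks in
the two tests.
We note that the differences in marks (marks in II test − marks in I test) are
1, −1, 3, −3, 2, 2, 2, 3, 0, 4, −2.
Their mean is d̄ = 11/11 = 1 and Σ(d − d̄)² = 50, so that s = √(50/10) = 2.236. Under the hypothesis
that the coaching has made no difference (mean difference zero),
$$t = \frac{\bar{d}}{s/\sqrt{n}} = \frac{1}{2.236/\sqrt{11}} = 1.48$$
We note that this t-score is less than t₀.₀₅(γ) = 2.23 for γ = 10. Hence, we do not reject the
hypothesis at 0.05 level of significance. This means that the marks do not give evidence that the
students have benefitted from the extra coaching.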
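For dependent (paired) samples such as the two tests above, the t statistic is computed from the differences. The following is a minimal sketch; the function name `paired_t` is hypothetical, and the standard deviation of the differences uses the divisor n − 1.

```python
import math


def paired_t(first, second):
    """Return (differences, mean difference, s, t) for paired observations."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    s_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    return diffs, mean_d, s_d, mean_d / (s_d / math.sqrt(n))


test_1 = [23, 20, 19, 21, 18, 20, 18, 17, 23, 16, 19]
test_2 = [24, 19, 22, 18, 20, 22, 20, 20, 23, 20, 17]
diffs, mean_d, s_d, t = paired_t(test_1, test_2)
print(diffs)
print(mean_d, round(s_d, 3), round(t, 2))
```

The resulting t is compared with the tabulated value 2.23 for 10 degrees of freedom quoted in the solution.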
6. Two horses A and B were tested according to the time (in seconds) to run a particular race
with the following results.
Horse A: 28 30 32 33 33 29 34
Horse B: 29 30 30 24 27 29
Test whether you can discriminate between the two horses. (t0.05=2.2 for 11 d.f.)
Solution: Let the variables x and y respectively correspond to horse A and horse B.
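$$\bar{x} = \frac{\sum x}{n_1} = \frac{219}{7} = 31.3, \qquad \bar{y} = \frac{\sum y}{n_2} = \frac{169}{6} = 28.2$$
$$\sum (x - \bar{x})^2 = 31.43, \qquad \sum (y - \bar{y})^2 = 26.84$$
$$s = \sqrt{\frac{\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2}{n_1 + n_2 - 2}} = 2.30$$
$$t = \frac{\bar{x} - \bar{y}}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = 2.42 > 2.2$$
Since the calculated value exceeds the tabulated value, the difference between the two means is
significant, i.e. we can discriminate between the two horses.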
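When the raw observations are available, the pooled statistic can be computed directly from the two sums of squared deviations. The sketch below assumes nothing beyond the data in the problem; the function name `two_sample_t_raw` is hypothetical. Carried at full precision, the statistic comes out slightly above the printed 2.42, because the worked solution rounds the two means to one decimal place.

```python
import math


def two_sample_t_raw(x, y):
    """Return (s, t) using the pooled estimate with n1 + n2 - 2 degrees of freedom."""
    n1, n2 = len(x), len(y)
    mean_x, mean_y = sum(x) / n1, sum(y) / n2
    ss_x = sum((v - mean_x) ** 2 for v in x)  # sum of squared deviations for x
    ss_y = sum((v - mean_y) ** 2 for v in y)  # sum of squared deviations for y
    s = math.sqrt((ss_x + ss_y) / (n1 + n2 - 2))
    return s, (mean_x - mean_y) / (s * math.sqrt(1 / n1 + 1 / n2))


horse_a = [28, 30, 32, 33, 33, 29, 34]
horse_b = [29, 30, 30, 24, 27, 29]
s, t = two_sample_t_raw(horse_a, horse_b)
print(round(s, 2), round(t, 2))
```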
In the section above, we discussed the t-distribution function (i.e. the t-test). The study was based
on the assumption that the samples were drawn from normally distributed populations or, more
accurately, that the sample means were normally distributed. Since the test requires such an
assumption about the population, a test of this kind is called a
parametric test. There are situations in which it may not be possible to make any rigid
assumption about the distribution of the population from which one has to draw a sample.
Thus, there is a need to develop non-parametric tests, which do not require any
assumptions about the population parameters.
With this in view, we shall now consider the χ² distribution, which does not
require any assumption with regard to the population. The test criterion corresponding to this
distribution is
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i},$$
where Oᵢ are the observed values and Eᵢ are the expected values.
The calculated χ 2 value (i.e. test criterion value or calculated value) is compared with the
tabular value of χ 2 value for given degree of freedom at a certain prefixed level of
significance. Whenever the calculated value is lower than the tabular value, we continue to
accept the fact that there is not much significant difference between expected and observed
results. On the other hand, if the calculated value is found to be more than the value suggested
in the table, then we have to conclude that there is a significant difference between observed
and expected frequencies.
As usual, degrees of freedom are γ=n-k where k denotes the number of independent
constraints. Usually, it is 1 as we will be always testing null hypothesis against only one
hypothesis, namely, alternate hypothesis.
This is an approximate test for a relatively large population. The following
conditions must be checked before employing the test:
1. The sample observations should be independent.
2. Constraints on the cell frequencies, if any, must be linear, i.e. the sum of all the observed
values must match the sum of all the expected values.
3. N, the total frequency, should be reasonably large.
4. No theoretical frequency should be lower than 5.
Since the χ² distribution depends only on the set of observed and expected
frequencies and on the degrees of freedom, it does not make any assumptions regarding the
population.
Problems:
1. The following table gives the number of road accidents that occurred in a large city during
the various days of a week. Test the hypothesis that the accidents are uniformly distributed over
all the days of a week.
Day               Sun  Mon  Tue  Wed  Thu  Fri  Sat  Total
No. of accidents   14   16    8   12   11    9   14     84
Solution: Under the hypothesis that the accidents are uniformly distributed over
the week, the expected number of accidents on each day is 12 (because a total of N = 84
accidents have occurred in 7 days).
Thus, here, the expected frequencies eᵢ are 12 each, and the observed frequencies fᵢ are the numbers of
accidents shown in the given table.
Using these, we find that
$$\chi^2 = \sum \frac{(f_i - e_i)^2}{e_i} = \frac{4 + 16 + 16 + 0 + 1 + 9 + 4}{12} = \frac{50}{12} = 4.17$$
We note that n = 7 frequency pairs are used in the computation of χ². Further, N = Σfᵢ = 84 is
the only quantity used in the computation of the eᵢ. Therefore, the number of degrees of freedom
is ν = 7 − 1 = 6. From the table we find that χ²₀.₀₅(6) = 12.59 and χ²₀.₀₁(6) = 16.81.
Since χ² = 4.17 is much less than both χ²₀.₀₅(6) and χ²₀.₀₁(6), we do not reject the hypothesis.
This means that the accidents seem to be distributed uniformly over the week.
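The χ² criterion for this problem can be computed as follows. This is a small sketch; the function name `chi_square_statistic` is hypothetical, and the expected frequencies come from the uniform hypothesis (total divided by the number of days).

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


accidents = [14, 16, 8, 12, 11, 9, 14]          # Sun ... Sat
expected = [sum(accidents) / len(accidents)] * len(accidents)
chi2 = chi_square_statistic(accidents, expected)
dof = len(accidents) - 1                        # one linear constraint: totals agree
print(round(chi2, 2), dof)

# Tabulated values quoted in the solution for 6 degrees of freedom
for level, tabulated in ((0.05, 12.59), (0.01, 16.81)):
    print(level, "do not reject" if chi2 <= tabulated else "reject")
```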
2. A set of five similar coins is tossed 320 times and the result is
No. of heads 0 1 2 3 4 5
Frequency 6 27 72 112 71 32
Test the hypothesis that the data follow a binomial distribution function.
Solution: We shall set up the null hypothesis that the data follow a binomial distribution.
The alternate hypothesis is that the data do not follow a binomial distribution. Next, we choose
the level of significance α = 5%; with n = 6 classes, the degrees of freedom are γ = 5.
Therefore, the tabulated value is χ² = 11.07 (α = 0.05, γ = 5). Before finding the test
criterion, we first compute the various expected frequencies. As the data are assumed to follow a
binomial distribution, the expected frequency of k heads is
$$F(k) = N\binom{n}{k}p^k q^{n-k}.$$
Here N = 320, n = 5, p = 0.5, q = 0.5, and k takes the values from 0 up to 5. Hence, the
expected frequencies of getting 0, 1, 2, 3, 4, 5 heads are the successive terms of the binomial
expansion
$$320\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^5,$$
that is, Eᵢ: 10, 50, 100, 100, 50, 10.
Here, the observed values are Oᵢ: 6, 27, 72, 112, 71, 32.
$$\chi^2_{cal} = \frac{(6 - 10)^2}{10} + \frac{(27 - 50)^2}{50} + \frac{(72 - 100)^2}{100} + \frac{(112 - 100)^2}{100} + \frac{(71 - 50)^2}{50} + \frac{(32 - 10)^2}{10} = 78.68$$
As the calculated value is very much higher than the tabulated value of 11.07, we reject the
null hypothesis and accept the alternate hypothesis that the data do not follow the binomial
distribution.
3. A set of five identical coins is tossed 320 times and the result is shown in the following
table.
No. of heads 0 1 2 3 4 5
Frequency 6 27 72 112 71 32
Test the hypothesis that the data follows a binomial distribution associated with a fair coin.
Solution: The probability that x of the 5 fair coins show a head in a single toss is
given by the binomial function
$$b(5, \tfrac{1}{2}, x) = \binom{5}{x}\left(\tfrac{1}{2}\right)^x\left(\tfrac{1}{2}\right)^{5-x} = \frac{1}{2^5}\binom{5}{x} = \frac{1}{32}\binom{5}{x} = b(x), \text{ say.}$$
Accordingly, in 320 tosses the expected number of tosses in which x coins show a
head is 320 × b(x). Hence the expected frequencies (i.e. the numbers of tosses in which
0, 1, 2, 3, 4, 5 coins show a head) are, respectively,
$$e_1 = 320 \times b(0) = 320 \times \tfrac{1}{32} \times \binom{5}{0} = 10,$$
$$e_2 = 320 \times b(1) = 320 \times \tfrac{1}{32} \times \binom{5}{1} = 50,$$
$$e_3 = 320 \times b(2) = 320 \times \tfrac{1}{32} \times \binom{5}{2} = 100,$$
$$e_4 = 320 \times b(3) = 320 \times \tfrac{1}{32} \times \binom{5}{3} = 100,$$
$$e_5 = 320 \times b(4) = 320 \times \tfrac{1}{32} \times \binom{5}{4} = 50,$$
$$e_6 = 320 \times b(5) = 320 \times \tfrac{1}{32} \times \binom{5}{5} = 10.$$
With the observed frequencies fᵢ: 6, 27, 72, 112, 71, 32, we get χ² = Σ(fᵢ − eᵢ)²/eᵢ = 78.68.
We note that the number of degrees of freedom is 6 − 1 = 5. From the table we find that
χ²₀.₀₅(5) = 11.07 and χ²₀.₀₁(5) = 15.09. We observe that χ² = 78.68 is very much greater than
both χ²₀.₀₅(5) and χ²₀.₀₁(5). Therefore, we reject the hypothesis that the observed data
follows a binomial distribution associated with a fair coin.
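The expected binomial frequencies and the resulting χ² value for the five-coin data can be reproduced with the sketch below. The function name `binomial_expected` is hypothetical; `math.comb` supplies the binomial coefficients.

```python
import math


def binomial_expected(total, n, p):
    """Expected frequencies total * C(n, k) * p^k * (1 - p)^(n - k) for k = 0..n."""
    return [total * math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


observed = [6, 27, 72, 112, 71, 32]        # tosses showing 0, 1, ..., 5 heads
expected = binomial_expected(320, 5, 0.5)  # fair coins
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(expected)
print(round(chi2, 2))
```

The same function with total = 96 gives the expected frequencies for a fit with five dice and p = 0.5.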
4. Five dice were thrown 96 times, and the number of dice showing 1, 2 or 3 follows the frequency
distribution below.
No. of dice showing 1, 2 or 3    5    4    3    2    1    0
Frequency                        7   19   35   24    8    3
Test the hypothesis that the data follow a binomial distribution. (χ²₀.₀₅ = 11.07 for 5 d.f.)
Solution:
p = q = 0.5
$$F(x) = N\binom{n}{x}p^x q^{n-x}, \qquad N = 96, \; n = 5$$
By fitting the binomial distribution, we get
Oᵢ    7   19   35   24    8    3
Eᵢ    3   15   30   30   15    3
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 11.7 > 11.07$$
Therefore, the hypothesis is rejected at the 5% level of significance.
5. Fit a Poisson distribution to the following data and test for its goodness of fit at a level of significance
0.05. (χ²₀.₀₅ with 3 d.f. = 9.48)
x    0    1    2    3    4
f  419  352  154   56   19
Solution:
$$\bar{x} = \frac{\sum fx}{N} = \frac{904}{1000} = 0.904 = m,$$
the mean of the Poisson distribution.
Hence
$$P(x) = \frac{m^x e^{-m}}{x!} = \frac{(0.904)^x e^{-0.904}}{x!}, \qquad x = 0, 1, 2, 3, 4$$
$$E_x = N \times P(x) = \frac{1000 \times (0.904)^x e^{-0.904}}{x!}, \qquad x = 0, 1, 2, 3, 4.$$
Putting x = 0, 1, 2, 3, 4 we get
$$E_0 = N \times P(0) = \frac{1000 \times (0.904)^0 e^{-0.904}}{0!} = 405,$$
$$E_1 = N \times P(1) = \frac{1000 \times (0.904)^1 e^{-0.904}}{1!} = 366,$$
$$E_2 = N \times P(2) = \frac{1000 \times (0.904)^2 e^{-0.904}}{2!} = 165.4,$$
$$E_3 = N \times P(3) = \frac{1000 \times (0.904)^3 e^{-0.904}}{3!} = 49.8,$$
$$E_4 = N \times P(4) = \frac{1000 \times (0.904)^4 e^{-0.904}}{4!} = 11.2.$$
Hence the theoretical frequencies are
x:    0     1      2      3      4
f:  405   366  165.4   49.8   11.2
$$\chi^2 = \sum \frac{(O_x - E_x)^2}{E_x} = 7.87$$
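The Poisson fit can be reproduced step by step: estimate m from the data, compute the expected frequencies, then form the χ² criterion. The sketch below uses hypothetical names (`poisson_expected`) and keeps the expected frequencies unrounded, so its χ² differs slightly in the last digit from a hand calculation with rounded frequencies.

```python
import math


def poisson_expected(total, m, values):
    """Expected frequencies total * m^x * e^(-m) / x! for each x in values."""
    return [total * m**x * math.exp(-m) / math.factorial(x) for x in values]


values = [0, 1, 2, 3, 4]
freq = [419, 352, 154, 56, 19]

total = sum(freq)
m = sum(f * x for f, x in zip(freq, values)) / total   # sample mean estimates m
expected = poisson_expected(total, m, values)
chi2 = sum((o - e) ** 2 / e for o, e in zip(freq, expected))

print(total, m)
print([round(e, 1) for e in expected])
print(round(chi2, 2))
```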
Exercises:
The F-test was introduced by the statistician R. A. Fisher. This test is also known as
Fisher's F-test or simply the F-test. It is based on the F-distribution, which is defined as the ratio
of two independent chi-square variates, each divided by its
corresponding degrees of freedom:
$$F = \frac{\chi_1^2 / \nu_1}{\chi_2^2 / \nu_2}$$
The F-test is used to test whether two samples have come from the same population, or to test
whether there is any significant difference between two estimates of the population variance.
F = greater variance / smaller variance
$$F = \frac{S_1^2}{S_2^2}$$
where
$$S_1^2 = \frac{\sum (x - \bar{x})^2}{n_1 - 1}, \qquad S_2^2 = \frac{\sum (y - \bar{y})^2}{n_2 - 1},$$
n₁ is the first sample size and n₂ is the second sample size.
If the estimate S² is not given directly, it can be obtained from the sample variance s² by using the
relations
$$S_1^2 = \frac{n_1 s_1^2}{n_1 - 1} \quad \text{and} \quad S_2^2 = \frac{n_2 s_2^2}{n_2 - 1}$$
Assumptions in F-test.
The F-Test is based on the following assumptions:
1. Normality: The values in each group should be normally distributed.
2. Independence of Error: The variation of each value around its own group mean should be independent for each value.
3. Homogeneity: The variances within each group should be equal for all groups.
If, however, the sample sizes are large enough, we do not need the assumption of normality.
Problems
1. In one sample of 8 observations the sum of the squares of deviations of the sample
values from the sample mean was 84.4, and in another sample of 10 observations
it was 102.6. Test whether this difference is significant at the 5% level.
Solution: Assume Null Hypothesis: H₀: S₁² = S₂² (there is no significant difference)
Alternate Hypothesis: H₁: S₁² ≠ S₂²
Given Σ(x − x̄)² = 84.4, n₁ = 8, Σ(y − ȳ)² = 102.6, n₂ = 10
$$S_1^2 = \frac{\sum (x - \bar{x})^2}{n_1 - 1} = \frac{84.4}{8 - 1} = 12.057$$
$$S_2^2 = \frac{\sum (y - \bar{y})^2}{n_2 - 1} = \frac{102.6}{10 - 1} = 11.4$$
$$F = \frac{S_1^2}{S_2^2} = \frac{12.057}{11.4} = 1.058$$
Calculated F value = 1.058
Tabulated value at 5% level of significance with (7, 9) degrees of freedom = 3.29
Calculated value < Tabulated value,
Hence accept H₀ (Null hypothesis)
3. The time taken by workers in performing a job by method I and method II is given
below.
Method I     20  16  26  27  23  22
Method II    27  33  42  35  32  34  38
Do the data show that the variances of the time distributions of the populations from which
these samples are drawn do not differ significantly?
Solution: Assume Null Hypothesis: H₀: S₁² = S₂² (the two samples have the same variance)
Alternate Hypothesis: H₁: S₁² ≠ S₂²
The means are x̄ = 134/6 = 22.33, taken as 22, and ȳ = 241/7 = 34.43, taken as 35, in the table below.
   x    x − x̄   (x − x̄)²      y    y − ȳ   (y − ȳ)²
  20     −2        4          27     −8       64
  16     −6       36          33     −2        4
  26      4       16          42      7       49
  27      5       25          35      0        0
  23      1        1          32     −3        9
  22      0        0          34     −1        1
                              38      3        9
 134              82         241             136
$$S_1^2 = \frac{\sum (x - \bar{x})^2}{n_1 - 1} = \frac{82}{5} = 16.4$$
$$S_2^2 = \frac{\sum (y - \bar{y})^2}{n_2 - 1} = \frac{136}{6} = 22.67$$
Taking the greater variance in the numerator,
$$F = \frac{S_2^2}{S_1^2} = \frac{22.67}{16.4} = 1.38$$
Tabulated Value = 4.95 (at 5% level of significance with (6, 5) degrees of freedom)
Calculated value < Tabulated value, Accept H₀ (Null hypothesis)
4. In a test given to two groups of students drawn from two normal populations, the marks
obtained were as follows:
Group A 18 20 36 50 49 36 34 49 41
Group B 29 28 26 35 30 44 46
Examine at 5% level whether the two populations have the same variance.
Solution: Assume Null Hypothesis: H₀: S₁² = S₂² (the two samples have the same variance)
$$\bar{x} = \frac{333}{9} = 37, \qquad \bar{y} = \frac{238}{7} = 34$$
$$S_1^2 = \frac{\sum (x - \bar{x})^2}{n_1 - 1} = \frac{1134}{8} = 141.75$$
$$S_2^2 = \frac{\sum (y - \bar{y})^2}{n_2 - 1} = \frac{386}{6} = 64.33$$
$$F = \frac{S_1^2}{S_2^2} = 2.203$$
Calculated F value = 2.203
The table value of F at 5% level for 8 and 6 degrees of freedom is 4.15
Calculated value < Tabulated value,
Hence accept the Null hypothesis.
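The variance-ratio calculation can be sketched as below. The function names `variance_estimate` and `f_ratio` are hypothetical; the larger estimate is placed in the numerator, and the degrees of freedom are returned in the matching order so that the correct table entry can be looked up.

```python
def variance_estimate(data):
    """Estimate of population variance: sum of squared deviations / (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def f_ratio(sample_a, sample_b):
    """Return (F, numerator d.f., denominator d.f.) with the greater variance on top."""
    va, vb = variance_estimate(sample_a), variance_estimate(sample_b)
    if va >= vb:
        return va / vb, len(sample_a) - 1, len(sample_b) - 1
    return vb / va, len(sample_b) - 1, len(sample_a) - 1


group_a = [18, 20, 36, 50, 49, 36, 34, 49, 41]
group_b = [29, 28, 26, 35, 30, 44, 46]
f_value, dof_num, dof_den = f_ratio(group_a, group_b)
print(round(variance_estimate(group_a), 2), round(variance_estimate(group_b), 2))
print(round(f_value, 3), dof_num, dof_den)
```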
Exercises
1. The nicotine content in milligrams of two samples of tobacco was found to be as
follows:
Sample A 24 27 26 21 25
Sample B 27 30 28 31 22 36
Can it be said that the two samples come from normal populations having the same variance?
2. The standard deviations calculated from two random samples of sizes 9 and 13 are 2 and
1.9 respectively. May the samples be regarded as drawn from normal populations with
the same standard deviation?