ITM Chapter 6 On Testing of Hypothesis
Testing of hypothesis
Null hypothesis and alternate hypothesis
In the case of a two tail test, the area of rejection of H0 lies on both sides of the normal
distribution curve. In the case of a one tail test, the area of rejection of H0 lies on only one side
i.e. either the right side or the left side of the normal distribution curve. This area where we reject
H0 is called the ‘critical region’ and is shown shaded in the diagram.
H0: µ= 80
H1: µ≠80
This is an example of two tail test because we reject H0 when the value of µ becomes more than
or less than 80, so H0 is rejected on both sides.
H0: p = 0.8
H1: p > 0.8
This is an example of a one tail test because as long as p is up to 0.8, we accept H0, but if it
becomes more than 0.8, H0 is rejected. Hence the area of rejection is on the right hand side
of the normal distribution curve.
H0: µ = 80000
H1: µ < 80000
This is also an example of a one tail test because as long as µ is greater than or equal to 80000,
we accept H0, but if it becomes less than 80000, H0 is rejected. So the area of rejection of H0
lies on the left side of the normal distribution curve.
Type I error occurs when we reject the null hypothesis when it is true and Type II error occurs
when we accept the null hypothesis when it is false.
                Accept Ho        Reject Ho
H0 is true      No error         Type I error
H0 is false     Type II error    No error
Suppose we draw 100 samples from a population for which Ho is true. For about 95 of these
samples there is no significant difference between the sample value and the population value,
but for the remaining 5 samples the difference appears significant purely by chance. If we
happen to select one of these 5 samples, we would reject Ho although it should have been
accepted because it is true. This error is called Type I error and is similar to the producer’s
risk, where the entire lot of a manufacturer is rejected by a customer on the basis of a small
minority of defective units. The probability of making the Type I error is called the level of
significance and is denoted by alpha (α). In the above example, the value of α is 5%. The level
of significance α can take different values such as 1%, 2%, 5% or 10%; conventionally it is
taken as 5%.
Type II error occurs when we accept Ho even though it is false, i.e. the sample we happen to
select shows no significant difference although a real difference exists. This is similar
to the consumer’s risk, where a consumer accepts a lot on the basis of a small minority of
non-defective units and later finds the majority of the lot to be defective. The probability of
making the Type II error is denoted by beta (β), and 1 – β is called the power of the test.
1. First find the population parameter, i.e. whether it is the population mean or population
proportion.
2. Find out the sample size, whether it is large, i.e. >= 30 or small i.e. less than 30
3. Find out if the data is taken from one sample or two samples.
5. Write the data given in the problem, i.e. population mean µ, sample mean X bar, population
standard deviation sigma, population proportion p and sample proportion p bar.
6. Consider an appropriate level of significance alpha to be used. If the value of alpha is not
given in the problem take alpha = 5%.
7. See whether the test is a one tail test or two tail test.
8. Use the appropriate test to test the hypotheses, i.e. if the sample size is large i.e. n >= 30, use
the Z test and if the sample size is small i.e. n < 30, use the t test.
9. For testing any hypothesis, we need to find a calculated value and a table value.
10. The calculated value is found using the appropriate formula, and the table value is read from
the statistical tables using the value of alpha, whether the test is a one tail or two tail test, and
the degrees of freedom (in case of the t test). The degrees of freedom indicate the number of
values that are free to vary independently.
If the calculated value < table value, we accept the null hypothesis H0, otherwise we reject
H0.
Z test: This test is used when the population is normally distributed and the sample size is large,
i.e. n ≥30.
(i) Using mean values: Here we compare the average of a sample X bar with the mean of the
population µ, to see whether there is a significant difference between them. We find out the
calculated value of Z using the formula
Z cal = mod(X bar – µ) / (σ/√n)
where σ/√n is the standard error of the mean (SEM).
Then we find the table value of Z. This table value depends on 2 factors (i) the value of level of
significance α and (ii) whether it is a one tail test or a 2 tail test.
For a two tail test, we divide the value of α equally between the two sides. For example, if
α = 5% and it is a 2 tail test, we put 2.5% in each tail. The area from the mean up to the
critical point is therefore 50% – 2.5% = 47.5%, so we look for 0.475 in the Z table. The table
value of Z obtained is 1.96.
For a one tail test, we take the value of α on only one side, i.e. either the right side or left side,
depending on whether H1 has a > sign or a < sign. If H1 has a > sign, then 5% will be
taken entirely on the right side and if H1 has a < sign, then 5% will be taken entirely on the left
side. The area from the mean up to the critical point is then 50% – 5% = 45%, so we look for
0.45 in the Z table and the table value of Z obtained is 1.645.
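The table values of Z described above can also be obtained numerically. The following is a minimal Python sketch using only the standard library; the function name z_table_value is illustrative and not part of these notes.

```python
from statistics import NormalDist

def z_table_value(alpha, two_tail):
    """Critical value of Z for a given level of significance.

    For a two tail test alpha is split equally between both tails;
    for a one tail test all of alpha sits in one tail.
    """
    tail_area = alpha / 2 if two_tail else alpha
    return NormalDist().inv_cdf(1 - tail_area)

z_two = z_table_value(0.05, two_tail=True)    # about 1.96
z_one = z_table_value(0.05, two_tail=False)   # about 1.645
print(round(z_two, 2), round(z_one, 3))
```

The same function gives about 2.33 for α = 1% with a one tail test.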
Q1. An agency claims that the average weight of college students is at least 55 kg and in a study
made to test this claim, 150 students selected at random had an average weight of 54 kg with a
S.D. of 8. Use α = 1% to test the claim of the agency.
Since Z cal < Z tab , we accept Ho and conclude that the agency’s claim that the average weight
of the college students is at least 55kg is correct.
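The calculation behind this conclusion can be checked with a short Python sketch. It assumes the hypotheses H0: µ = 55 against H1: µ < 55 (a left tail test), which the claim of "at least 55 kg" suggests; the function name one_sample_z is illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean, mu0, sd, n):
    """Signed Z statistic: (X bar - mu) / (sd / sqrt(n))."""
    return (sample_mean - mu0) / (sd / sqrt(n))

z_cal = one_sample_z(sample_mean=54, mu0=55, sd=8, n=150)
z_tab = NormalDist().inv_cdf(1 - 0.01)   # one tail, alpha = 1%

# Left tail test (assumed H1: mu < 55): reject H0 only if Z falls below -z_tab.
reject = z_cal < -z_tab
print(round(abs(z_cal), 3), round(z_tab, 3), reject)
```

Here mod(Z cal) is about 1.53, which is below the table value of about 2.33, so H0 is accepted.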
Q2. A bus company advertised a mean time of 150 minutes for a trip between two cities. A
consumer group had reasons to believe (based on complaints received) that the mean time was
more than 150 minutes. A sample of 40 trips shows a mean of 153 minutes and a standard
deviation of 7.5 minutes.
a. Write the null and alternative hypothesis for the above problem.
c. Using the complete process of hypothesis testing, test the above hypothesis at 5% level of
significance.
Solution: ( a) µ = 150, X bar = 153, σ = 7.5, α = 5%, n=40>30 so we use the Z test.
Ho: µ = 150
(c) Z cal = mod( X bar - µ) / SEM = mod( 153 – 150) / 1.19 = 3 / 1.19 = 2.52
Since Z cal > Z tab, reject Ho and conclude that µ > 150 i.e. mean time was more than 150
minutes.
Q3. Salaried class employees have been used to spending Rs. 1000 per month on average on use of
the internet. During the past year, they have felt that their bills are going up. A survey of 85
salaried employees selected at random found that the average expenditure on internet use was
Rs. 1100 per month. The population standard
deviation was believed to be Rs. 200. Use 5% level of significance to test the hypothesis that the
expenditure has indeed increased. Write the null and alternative hypothesis and go through the
entire steps of hypothesis test to answer this research question.
Z cal = mod(X bar – µ) / (σ/√n) = mod(1100 – 1000) / (200/√85) = 4.6098
Since Z cal > Z tab, we reject Ho and conclude that the expenditure on internet usage has gone
up.
Q4. The management of a bank was operating with an old computer system and could provide
customer service at the rate of 22 per hour on an average. The bank was not satisfied with this
rate of service and was hence thinking of buying a new computer system. A firm which was
selling this new computer system informed them that the rate of customer handling would
improve from the existing rate. The new system was tried out for 36 days and was
found to handle 26 customers per hour on average, and the population standard deviation was
assumed to be 2.5 customers per hour. Is there any evidence to show that the new computer
system was giving any improvement on the old computer system? Use 5% level of significance.
Ho: µ = 22
Since Z cal > Z tab, we reject Ho and conclude that the new computer system will give an
improvement over the old computer system regarding rate of customer handling.
Q5. The mean life time of a sample of 400 light bulbs produced by a company is found to be 1570 hours
with a standard deviation of 150 hours. Test the hypothesis that the mean life time of the bulbs produced
by the company is 1600 hours against the alternate hypothesis that it is greater than 1600 hours at 1%
level of significance.
H0: µ = 1600
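Z cal = (X bar – µ) / (σ/√n) = (1570 – 1600) / (150/√400) = –30/7.5 = –4
H1: µ > 1600 places the critical region in the right tail (Z > 2.33 for α = 1%). Since the sample
mean is below 1600, Z cal is negative and does not fall in this region, so H0 cannot be rejected
in favour of H1. The sample gives no evidence that the mean life time of the bulbs produced by
the company is greater than 1600 hours.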
(ii) Using proportions: Here we compare the proportion of a sample p bar with the proportion of
the population p, to see whether there is a significant difference between them. We find out the
calculated value of Z using the formula
Z cal = mod(p bar – p) / √(pq/n), where q = 1 – p
Then we find the table value of Z and compare with the calculated value and then decide whether
to accept H0 or reject H0.
Q6. An auditor claims that at least 10 % of customers’ ledger accounts are carrying mistakes of
posting and balancing. A random sample of 600 was taken to test the accuracy of posting and
balancing and 45 mistakes were found. Are these sample results consistent with the claim of the
auditor? Use 5% level of significance.
H0: p= 0.1
H1: p< 0.1
Z cal = mod(p bar – p) / √(pq/n) = mod(0.075 – 0.1) / √(0.1 x 0.9/600) = 2.041
Since Zcal > Ztab, we reject H0 and conclude that the claim of the auditor is wrong.
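As a cross-check of the arithmetic in Q6, the proportion test can be sketched in Python. The function name proportion_z is illustrative, and the table value 1.645 is the standard one tail value for α = 5%.

```python
from math import sqrt

def proportion_z(successes, n, p0):
    """Signed Z statistic for a sample proportion against a claimed proportion p0."""
    p_bar = successes / n
    q0 = 1 - p0
    return (p_bar - p0) / sqrt(p0 * q0 / n)

z_cal = proportion_z(successes=45, n=600, p0=0.10)
z_tab = 1.645                      # one tail, alpha = 5%

# H1: p < 0.1 is a left tail test, so reject H0 when Z falls below -z_tab.
reject = z_cal < -z_tab
print(round(abs(z_cal), 3), reject)
```

The magnitude of Z cal agrees with the hand calculation of 2.041.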
Q7. A study by Hewitt Associates showed that 79% of the companies offer flexible scheduling to
their employees. Suppose a researcher believes that in accounting firms this figure is lower. The
researcher randomly selects 415 accounting firms and through interviews determines that 303 of
these firms have flexible scheduling. Using a 5% level of significance, does the test show enough
evidence to conclude that a significantly lower proportion of accounting firms offer employees
flexible scheduling?
Solution: n= 415 > 30 so use Z test., p = 0.79, q = 1-p = 0.21, p bar = 303/415 = 0.73
Ho: p = 0.79 H1 : p < 0.79 , so it is a one tail test
Z cal = mod(p bar – p) / √(pq/n) = mod(0.73 – 0.79) / √(0.79 x 0.21/415) = 0.06/0.02 = 3
Z tab for α= 5% and one tail test is 1.645
Since Z cal > Z tab, we reject Ho and conclude that a significantly lower proportion of
accounting firms offer employees flexible scheduling.
(iii) Comparing proportions of two samples: Here we compare the proportion of one sample
with that of another sample to see whether there is any significant difference between them and
both the samples are large, i.e. n≥30.
Q8. Before an increase in excise duty on tea, 400 people out of a sample of 500 people were
found to be tea drinkers. After an increase in duty, 485 people were tea drinkers in a sample of
600 people. State, whether there is a significant decrease in the consumption of tea. Use a 5%
level of significance for test.
p1 bar = 400/500 = 0.8, p2 bar = 485/600 = 0.808, n1 = 500 > 30, n2 = 600 > 30, so use Z test
Pooled proportion p = (400 + 485)/(500 + 600) = 0.805, q = 1 – p = 0.195
Z cal = mod(p1 bar – p2 bar) / √[pq(1/n1 + 1/n2)] = mod(0.8 – 0.808) / √[0.805 x 0.195 x (1/500 +
1/600)] = 0.333
Since Z cal < Z tab, we accept Ho and conclude that there is no significant difference in the
consumption of tea after the increase in excise duty.
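The two sample proportion test of Q8 can be sketched in Python as follows. The function name two_proportion_z is illustrative. Because the code does not round p2 bar to 0.808, it gives a Z value of about 0.35 rather than 0.333; the conclusion is unchanged.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Signed Z statistic for the difference of two sample proportions,
    using the pooled proportion p = (x1 + x2) / (n1 + n2)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    q = 1 - p
    standard_error = sqrt(p * q * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error

z_cal = two_proportion_z(400, 500, 485, 600)
z_tab = 1.645                     # one tail, alpha = 5%
accept_h0 = abs(z_cal) < z_tab
print(round(abs(z_cal), 3), accept_h0)
```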
(iv) Comparing means of two samples: Here we compare the mean of one sample with that of
another sample to see whether there is any significant difference between them and both the
samples are large, i.e. n≥30.
Then we find the table value using the value of α and whether it is a 1 tail or a two tail test and
compare the calculated value with the table value.
H0: µ1= µ2
H1: µ1≠µ2
The following are the results for two independent samples taken from the population:
Sample 1 Sample 2
Sample size 80 70
Sample mean 104 106
Sample standard deviation 8.4 7.6
Carry out the test at 5% level of significance and conclude.
H1: µ1≠µ2
Z cal = mod(X1 bar – X2 bar) / √(σ1²/n1 + σ2²/n2) = mod(104 – 106) / √(8.4²/80 + 7.6²/70)
= 1.53
Since Z cal < Z tab, we accept H0 and conclude that µ1= µ2.
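The comparison of two sample means can be verified with a few lines of Python. This is a sketch only; the function name two_sample_z is illustrative, and 1.96 is the two tail table value for α = 5%.

```python
from math import sqrt

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Signed Z statistic for the difference between two large-sample means."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

z_cal = two_sample_z(104, 8.4, 80, 106, 7.6, 70)
z_tab = 1.96                       # two tail, alpha = 5%
accept_h0 = abs(z_cal) < z_tab
print(round(abs(z_cal), 2), accept_h0)
```

Calling two_sample_z(470, 280, 1000, 490, 400, 1500) with the wage data of Q10 gives a magnitude of about 1.47, which is also below 1.96.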
Q10. A random sample of 1000 workers from environmental consultancy shows that their mean
wages are Rs. 470 per week with a standard deviation of Rs. 280 per week. A random sample of
1500 workers from environmental contaminant monitoring shows a mean wage of Rs. 490 per
week with a standard deviation of Rs. 400. Is there any significant difference between their mean
level of wages? Use 5% level of significance.
Solution: n1 = 1000, X1 bar = Rs. 470, σ1 = Rs. 280, n2 = 1500, X2 bar = Rs. 490, σ2 = Rs.
400, α = 5%
H0: µ1 = µ2
H1: µ1≠ µ2
Z cal = mod(X1 bar – X2 bar) / √(σ1²/n1 + σ2²/n2) = mod(470 – 490) / √(280²/1000 +
400²/1500) = 1.47
Since Z cal < Z tab, we accept H0 and conclude that there is no significant difference between
their mean level of wages.
Q11. Last year television station WXYZ’s share of the 11pm news audience was approximately
equal to 25%. The station’s management believes that the current audience share is higher than
last year’s 25% figure. In an attempt to substantiate this belief, the station surveyed 400 11pm
viewers and found that 146 watched WXYZ. Set up the null and alternative hypothesis and test
the same at 5% level of significance. What is your conclusion?
H0: p= 0.25
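p bar = 146/400 = 0.365, so Z cal = mod(0.365 – 0.25) / √(0.25 x 0.75/400) = 5.31, while Z tab for
α = 5% and a one tail test is 1.645.
Since Z cal > Z tab, we reject H0 and conclude that the current audience share is more than
25%.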
Q12. A firm had redesigned the method of producing a product which was supposed to reduce
the time from an existing time of 36 minutes. The firm selected a sample of 50 units and found a
mean time of 30 minutes. The population standard deviation was 8 minutes. Use 5% level of
significance to test whether the new method of producing the product is resulting in its objective.
Solution: µ= 36 min, n = 50, X bar = 30 min, σ= 8 min, α = 5%
H0: µ= 36 min
Since Z cal > Z tab, we reject H0 and conclude that the new method has resulted in reducing the
time.
Q13. A report in the Investor’s Business Daily in the year 2015 reported that the mean work
week for the population of workers is 39.2 hours. A researcher wants to know whether this time
has increased in 2016. To test this, a sample of 112 workers is taken which showed a sample
mean of 40.5 hours with a population standard deviation of 4.8 hours.
(c) Perform the test at 5% level of significance and conclude your result.
(b) Since the sample size is 112> 30, hence we use the Z test.
t test : This is used when the population is normally distributed and the sample size is small i.e.
n< 30.
(i) Paired t test: This is used when we have two related samples or dependent samples. For example, the
management of a company plagued by poor productivity realizes the need to provide technical
training to employees. It hires a researcher to measure the productivity levels of a sample of 25
employees. The productivity levels are measured again after a one month training programme. In
this kind of pre and post training study, samples which are taken before and after the study
cannot be treated as independent because each observation in sample 1 is related to the
observation in sample 2. The productivity scores obtained before training are related to the scores
obtained after training because the two measurements apply to the same person.
For dependent samples or related samples test or paired t test, it is important that the two samples
in the study are of the same size and small i.e. less than 30.
(ii) Unpaired t test or Independent samples t test: When the sample size is small (n1, n2 < 30)
and samples are independent (not related) and population standard deviation is unknown, the t-
statistic can be used to test the hypothesis for the difference between two population means. This
is called the unpaired t test or independent samples t test. This technique is based on the
assumption that the characteristic being studied is normally distributed for both the populations.
Q1. The consumers are asked to rate a company both before and after viewing a video on the
company twice a day for a week. The data are given below. Test at 5% level of significance to
determine whether there is a significant increase in the ratings of the company after the one week
video treatment.
∑(X – X bar)² = 222.8572
σ = √[∑(X – X bar)²/(n – 1)] = √(222.8572/6) = 6.09 {Use the formula for Std. deviation}
For standard deviation, we have used the denominator as (n-1) because the sample size is small
and this correction is done to get a more accurate value of standard deviation. Since µ is not
given, we use
Since tcal > t tab, we reject Ho and conclude that there is a significant increase in ratings of the
company after the one week video treatment.
Q2. Memory capacity of 10 students was tested before and after training. State whether the
training was effective at α = 5%.
Roll No. 1 2 3 4 5 6 7 8 9 10
Before training: 12 14 11 8 7 10 3 0 5 6
After training : 15 16 10 7 5 12 10 2 3 6
Solution: H0: µ1 = µ2
H1: µ2 > µ1
∑(X – X bar)² = 70
σ = √(70/9) = 2.79
Since t cal < t tab, we accept H0 and conclude that the training is not effective.
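The paired t test for the memory data of Q2 can be reproduced in Python with the standard library. This is a sketch: it takes X as the difference (after minus before) for each student, and uses the standard table value t = 1.833 for α = 5%, one tail test and 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

before = [12, 14, 11, 8, 7, 10, 3, 0, 5, 6]
after = [15, 16, 10, 7, 5, 12, 10, 2, 3, 6]

# Work with the difference for each student, since the samples are related.
differences = [a - b for a, b in zip(after, before)]
d_bar = mean(differences)
sd = stdev(differences)            # uses the (n - 1) denominator
n = len(differences)

t_cal = d_bar / (sd / sqrt(n))
t_tab = 1.833                      # alpha = 5%, one tail, d.f. = n - 1 = 9
accept_h0 = t_cal < t_tab
print(round(sd, 2), round(t_cal, 3), accept_h0)
```

The mean difference is 1, the standard deviation is √(70/9) and t cal is about 1.13, so H0 is accepted.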
Q3. Ten students are given coaching for environmental statistics. The score obtained in tests
Sr. No of student 1 2 3 4 5 6 7 8 9 10
Does the score from test 1 to test 5 show an improvement? Test at 5% level of significance.
Unpaired t test
Q1. The mean life of a sample of 10 electric bulbs was found to be 1456 hours with a standard deviation
of 423 hours. A second sample of 17 bulbs chosen from a different batch had a mean life of 1280 hours
with a standard deviation of 398 hours. Test at α = 5 % whether there is a significant difference between
the means of 2 batches.
Solution: n1 = 10, n2 = 17, σ1 = 423, σ2 = 398, α = 5%, X1 bar = 1456, X2 bar = 1280
H0: µ1 = µ2
H1: µ1 ≠ µ2
So it is a two tail test.
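σ1² = ∑(X1 – X1 bar)²/(n1 – 1), σ2² = ∑(X2 – X2 bar)²/(n2 – 1)
Pooled variance σ² = [(n1 – 1)σ1² + (n2 – 1)σ2²] / (n1 – 1 + n2 – 1)
= (9 x 423² + 16 x 398²)/(10 – 1 + 17 – 1)
= (1610361 + 2534464)/25 = 165793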
For α = 5%, two tail test and d.f. = n1-1 + n2-1 = 10-1 + 17-1 = 25, t tab=2.060
Since t cal < t tab, we accept Ho and conclude that there is no significant difference between the
means of the two batches.
Q2. The following data gives the marks of two groups of students taken from classes A and B
respectively. Test at α = 5% whether the performance of class B is better than that of class A.
Class A   Class B
56        55
67        57
54        74
65        79
76        64
72        81
37        58
          68
Solution: Here n1 = 7, n2 = 8 , α = 5%
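X1 (Class A)   X2 (Class B)   (X1 – X1 bar)²   (X2 – X2 bar)²
56             55             25               144
67             57             36               100
54             74             49               49
65             79             16               144
76             64             225              9
72             81             121              196
37             58             576              81
               68                              1
Total 427      536            1048             724
X1 bar = 427/7 = 61, X2 bar = 536/8 = 67, σ = √[(1048 + 724)/(7 – 1 + 8 – 1)] = 11.675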
H0: µ1=µ2
H1: µ2> µ1
t cal = mod(X1 bar – X2 bar) / [σ √(1/n1 + 1/n2)] = mod(61 – 67) / [11.675 x √(1/7 + 1/8)] =
0.993.
For α = 5%, one tail test and d.f. = n1-1 + n2-1 = 7-1 + 8-1 = 13, t tab=1.771
Since t cal < t tab, we accept Ho and conclude that there is no significant difference in the
performance of both the classes.
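The unpaired t test for the two classes can be checked with the following Python sketch. The function name pooled_t is illustrative; it pools the squared deviations of both samples over n1 – 1 + n2 – 1 degrees of freedom, as in the solution above.

```python
from math import sqrt
from statistics import mean

class_a = [56, 67, 54, 65, 76, 72, 37]
class_b = [55, 57, 74, 79, 64, 81, 58, 68]

def pooled_t(x1, x2):
    """Return (signed t, degrees of freedom, pooled standard deviation)."""
    m1, m2 = mean(x1), mean(x2)
    ss1 = sum((x - m1) ** 2 for x in x1)   # sum of squared deviations, sample 1
    ss2 = sum((x - m2) ** 2 for x in x2)   # sum of squared deviations, sample 2
    df = len(x1) - 1 + len(x2) - 1
    pooled_sd = sqrt((ss1 + ss2) / df)
    t = (m1 - m2) / (pooled_sd * sqrt(1 / len(x1) + 1 / len(x2)))
    return t, df, pooled_sd

t_cal, df, pooled_sd = pooled_t(class_a, class_b)
t_tab = 1.771                      # alpha = 5%, one tail, d.f. = 13
accept_h0 = abs(t_cal) < t_tab
print(round(pooled_sd, 3), round(abs(t_cal), 3), df, accept_h0)
```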
Parametric and non-parametric tests: Parametric tests are those in which the population
distribution is specified and we can specify the parameters defining the distribution. Examples of
parametric tests are Z test, t test and F test. In these tests, the population is normally distributed
and the parameters are mean µ and variance σ 2. Non-parametric tests are those where we cannot
specify the population distribution or the parameters. Examples of non-parametric tests are chi-
square test, U test, H test etc.
Now, we see one example of a non-parametric test i.e. the chi-square test.
Chi-square test
(a) To test the independence of two qualitative attributes: The chi-square test helps to find
out whether two qualitative attributes are related to each other or not, but it does not give the
extent of relation between them. The hypotheses can be taken as
H0: The two attributes are independent i.e. there is no relation between them
H1: The two attributes are dependent i.e. there is a relation between them
We can consider the following examples:
Q1. In a historical perspective study of psychotic disorders, each person in a sample of 96 male
schizophrenics and a sample of 94 female schizophrenics was classified on the basis of
chronicity of illness during the 40 year period of follow-up. Included in the classification scheme
were amounts of time spent in inpatient and outpatient care and the numbers of such contacts.
Each person was then classified as being chronically ill if at least 75% of his follow up time was
spent in outpatient care. Additionally, the person was classified chronically ill if he had at least
one inpatient or one out patient contact with a psychiatric care unit during each decade of the
patient’s follow-up. The data, gathered via a structured psychiatric interview are summarized
below:
                        Males   Females   Total
Chronically ill (yes)   19      33        52
Chronically ill (no)    77      61        138
Total                   96      94        190
At 10 % level of significance, test whether there is sufficient evidence to conclude that the illness
rates are different between males and females.
Solution: Ho: Illness rates are not different between males and females i.e. there is no relation
H1: Illness rates are different between males and females i.e. there is a relation
For chi square tab value, α = 10%, degrees of freedom = (R-1) (C-1) = (2-1) (2-1) =1
Since chi square cal > chi square tab value, we reject Ho and conclude that illness rates are
different for males and females.
Q2. 200 randomly selected adults were asked whether TV shows as a whole are primarily entertaining,
educational or a waste of time. The respondents were categorised by gender. Their responses are given below.
----------------------------------------------------------------------------------------------
Gender Opinion
Entertaining Educational Waste of time Total
-------------------------------------------------------------------------------------------
Male 28 12 50 90
Female 52 28 30 110
--------------------------------------------------------------------------------------------
Total 80 40 80 200
For example, for the cell Male / Educational: O = 12, E = 90 x 40/200 = 18, (O – E)²/E = 36/18 = 2
For chi square tab value, α = 5%, degrees of freedom = (R-1) (C-1) = (2-1) (3-1) =2
Since chi square cal > chi square tab value, we reject Ho and conclude that the two attributes
opinion and gender are related.
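The chi-square calculation for a contingency table follows the same pattern in every problem: each expected value is (row total x column total)/grand total, and (O – E)²/E is summed over all cells. A minimal Python sketch for the data of Q2 is given below; the function name chi_square_independence is illustrative, and 5.991 is the standard table value for α = 5% and 2 degrees of freedom.

```python
def chi_square_independence(table):
    """Return (chi-square calculated value, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_sq += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_sq, df

# Rows: Male, Female. Columns: Entertaining, Educational, Waste of time.
observed = [[28, 12, 50],
            [52, 28, 30]]
chi_cal, df = chi_square_independence(observed)
chi_tab = 5.991                    # alpha = 5%, d.f. = 2
reject_h0 = chi_cal > chi_tab
print(round(chi_cal, 3), df, reject_h0)
```

The calculated value is about 16.77, well above the table value, which agrees with the decision to reject Ho.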
Q3. In a locality, 100 persons were randomly selected and asked about their educational
achievement. The results are given as follows:
----------------------------------------------------------------------------------------------
Gender Education
-------------------------------------------------------------------------------------------
Male 10 15 25 50
Female 25 10 15 50
--------------------------------------------------------------------------------------------
Total 35 25 40 100
For chi square tab value, α = 5%, degrees of freedom = (R-1) (C-1) = (2-1) (3-1) =2
Since chi square cal > chi square tab value, we reject Ho and conclude that education is related to
gender.
Q4. One of the questions on the Business Week subscriber study was, “In the past 12 months
when travelling for business, what type of airline ticket you purchased most often”. The data
obtained are shown in the following table.
Type of Flight
Type of Ticket Domestic flights International flights
First Class 29 22
Business / Executive class 95 121
Full fare economy /coach class 518 135
Based on the above can it be concluded that type of ticket is dependent on the type of
flight taken. Use 5% level of significance.
For chi square tab value, α = 5%, degrees of freedom = (R-1) (C-1) = (3-1) (2-1) =2
Since chi square cal > chi square tab value, we reject Ho and conclude that type of ticket is
dependent on type of flight.
Q5. Two sample polls of votes for 2 candidates A and B for a public office are taken, one from
among the residents of rural areas and one from urban areas. The results are given in the table
below. Examine whether the nature of the areas is related to voting preference in this election.
A B
For chi square tab value, α = 5%, degrees of freedom = (R-1) (C-1) = (2-1) (2-1) =1
Since chi square cal > chi square tab value, we reject Ho and conclude that nature of area is
related to voting preference.
(b) To test the goodness of fit: Here we compare the observed data with the data expected
according to some theory or distribution and see whether they are consistent with each other or
not. The hypotheses can be taken as
Q1. A survey of 320 families with 5 children each revealed the following distribution:
No. of boys : 5 4 3 2 1 0
No. of girls : 0 1 2 3 4 5
Is the result consistent with the hypothesis that male and female births are equally probable?
Solution: Here we use the concept of binomial distribution because in a binomial distribution
there are two results possible i.e. success and failure. In this example, the two results are boys
and girls. We can take either boys or girls as success and so probability of success p = 0.5
because we want to find whether male and female births are equally probable. So p=q = 0.5
H0: The result is consistent with the hypothesis that male and female births are equally probable
H1: The result is not consistent with the hypothesis that male and female births are equally
probable
The observed values are the number of families given in the question. We will find P(X =x)
taking x = 0, 1, 2, 3, 4 and 5 and these probabilities will be multiplied by total number of families
i.e. 320 to get the expected values. For using the binomial distribution, we take n = 5 because
total number of children per family is 5.
For chi square tab value, α = 5%, degrees of freedom = n-1 = 6-1 =5
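The expected values described in this solution can be computed as follows. This Python sketch only produces the expected number of families for x = 0 to 5 boys under p = q = 0.5; the variable names are illustrative.

```python
from math import comb

n_children = 5        # n of the binomial distribution
p = 0.5               # probability of a boy under H0
q = 1 - p
total_families = 320

# Expected families with x boys = 320 * P(X = x), where
# P(X = x) = C(n, x) * p^x * q^(n - x).
expected = [
    total_families * comb(n_children, x) * p ** x * q ** (n_children - x)
    for x in range(n_children + 1)
]
print(expected)
```

The expected values are 10, 50, 100, 100, 50 and 10 families for 0, 1, 2, 3, 4 and 5 boys respectively, and they add up to 320.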
Q2. The HR manager of a company believes that absenteeism is uniformly distributed over all
the days of the week. The following data shows the number of employees absent on different
days of a week. Test at α = 5% whether the belief of the HR manager is right.
Monday 5
Tuesday 3
Wednesday 2
Thursday 4
Friday 6
Solution: Here the observed values are the actual number of employees absent on the days of the
week. Since the HR manager feels that absenteeism is uniformly distributed, we take each
expected value as the average of all the values.
H0: The observed data is consistent with the belief that absenteeism is uniformly distributed over
the week
H1: The observed data is not consistent with the belief that absenteeism is uniformly distributed
over the week
O   E   (O – E)²/E
5   4   1/4
3   4   1/4
2   4   4/4
4   4   0
6   4   4/4
Each expected value is 20/5 = 4, and chi square cal = 1/4 + 1/4 + 1 + 0 + 1 = 2.5
For chi square tab value, α = 5%, degrees of freedom = n-1 = 5-1 =4
Since chi square cal < chi square tab value, we accept Ho and conclude that the observed data is
consistent with the belief that absenteeism is uniformly distributed over the week.
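The goodness of fit calculation for the absenteeism data can be sketched in Python as below. The table value 9.488 is the standard chi-square value for α = 5% and 4 degrees of freedom.

```python
observed = [5, 3, 2, 4, 6]          # Monday to Friday

# Under a uniform distribution every day has the same expected value,
# namely the average of the observed values.
expected_each = sum(observed) / len(observed)

chi_cal = sum((o - expected_each) ** 2 / expected_each for o in observed)
df = len(observed) - 1
chi_tab = 9.488                     # alpha = 5%, d.f. = 4
accept_h0 = chi_cal < chi_tab
print(expected_each, chi_cal, df, accept_h0)
```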
ANOVA is used to compare the average values of more than two samples. In this case, we divide
the total variation in the dependent variable into two components:
Total sum of squares (TSS) = Sum of squares between samples (SSB) + Sum of squares
within samples (SSW)
Here X1 represents the individual observations of the first sample, X2 represents the individual
observations of the second sample, X3 represents the individual observations of the third sample
and so on. X represents the observations of all the samples taken together. Here n is the total
number of observations, n1 is the number of observations in the first sample, n2 is the number of
observations in the second sample, n3 is the number of observations in the third sample and so
on.
ANOVA table
Then we find F table using the value of α and degrees of freedom for numerator and
denominator.
Q1. To test the significance of variation in the retail prices of a commodity in 3 principal cities,
Mumbai, Kolkata and Delhi, 4 shops were chosen at random in each city and the prices were
observed as follows:
Mumbai: 16 8 12 14
Kolkata: 14 10 10 6
Delhi: 4 10 8 8
Do the data indicate that prices in 3 cities are significantly different? Use α = 5%
Solution: Since there are three samples and we have to compare their average values, we can
write the hypotheses as
H0: µ1 = µ2= µ3 i.e. there is no significant difference between the average prices of the three
cities
H1: µ1, µ2 and µ3 are not all equal i.e. there is a significant difference between the average
prices of the three cities
∑X = 16 + 8 + 12 + 14 + 14 + 10 + 10 + 6 + 4 + 10 + 8 + 8= 120
ANOVA table
F table for α = 5 % and (2, 9) degrees of freedom for numerator and denominator is 4.26.
Since Fcal < Ftable, we accept H0 and conclude that there is no significant difference between
the average prices of the three cities
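The one way ANOVA calculation for Q1 can be sketched in Python. The sketch assumes the usual shortcut formulas TSS = ∑X² – (∑X)²/n and SSB = (∑X1)²/n1 + (∑X2)²/n2 + (∑X3)²/n3 – (∑X)²/n, with SSW = TSS – SSB; the function name one_way_anova is illustrative.

```python
def one_way_anova(samples):
    """Return (TSS, SSB, SSW, F cal) for a list of samples."""
    all_values = [x for sample in samples for x in sample]
    n = len(all_values)             # total number of observations
    k = len(samples)                # number of samples
    correction = sum(all_values) ** 2 / n
    tss = sum(x * x for x in all_values) - correction
    ssb = sum(sum(s) ** 2 / len(s) for s in samples) - correction
    ssw = tss - ssb
    f_cal = (ssb / (k - 1)) / (ssw / (n - k))
    return tss, ssb, ssw, f_cal

mumbai = [16, 8, 12, 14]
kolkata = [14, 10, 10, 6]
delhi = [4, 10, 8, 8]

tss, ssb, ssw, f_cal = one_way_anova([mumbai, kolkata, delhi])
f_tab = 4.26                        # alpha = 5%, (2, 9) degrees of freedom
accept_h0 = f_cal < f_tab
print(tss, ssb, ssw, round(f_cal, 3), accept_h0)
```

For these prices TSS = 136, SSB = 50 and SSW = 86, so F cal = (50/2)/(86/9), which is about 2.62 and below the table value of 4.26.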
Q2. The data regarding life (in hours) of 3 types of bulbs manufactured by a company are given
in the following table. Test the hypothesis that mean life of bulbs of different types are the same.
Types of bulbs
A   16 18 19
B   14 13 15 20
C   18 17 19 21 21
Solution: Since there are three samples and we have to compare their average values, we can
write the hypotheses as
H0: µ1 = µ2= µ3 i.e. there is no significant difference between the average life of the three types
of bulbs
H1: µ1, µ2 and µ3 are not all equal i.e. there is a significant difference between the average life of
the three types of bulbs
∑X² = 16² + 18² + 19² + 14² + 13² + 15² + 20² + 18² + 17² + 19² + 21² + 21² = 3787
ANOVA table
F table for α = 5 % and (2, 9) degrees of freedom for numerator and denominator is 4.26.
Since F cal < F table, we accept H0 and conclude that there is no significant difference between
the average life of the three types of bulbs.
Q3. In a bumper test, three types of autos were deliberately crashed into a barrier at 5 Kmph and
the resulting damage (in Rupees) was estimated. Five test vehicles of each type were crashed,
with the results shown below:
Vehicle type
Small Medium Heavy
1600 1290 1090
760 1400 2100
880 1390 1830
1950 1850 1250
1220 950 1920
It was decided to carry out an Analysis of Variance test and the following results were obtained,
but certain values were missing.
ANOVA
Source of Variation   SS        Df   MS       F cal   P-value    F table
Between Groups        **        **   170180   **      0.418802   3.885294
Within Groups         **        **   **
Total                 2520640   **
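ANOVA
Source of Variation   SS                           Df            MS                         F cal                     P-value    F table
Between Groups        170180 x 2 = 340360          3 – 1 = 2     170180                     170180/181690 = 0.937     0.418802   3.885294
Within Groups         2520640 – 340360 = 2180280   15 – 3 = 12   2180280/12 = 181690
Total                 2520640                      15 – 1 = 14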
(a) Ho: µ1 = µ2 = µ3 i.e. there is no significant difference between the average values of the
3 samples
H1: µ1, µ2 and µ3 are not all equal i.e. there is a significant difference between the average
values of the 3 samples
(b) Filled in values are shown in above table.
(c) Since F cal < F critical, we accept Ho and conclude that there is no significant difference
between the average values of the 3 samples.
Q4. A company manufacturing potato chips has been using two different machines for packing
their product. Both these machines have nearly the same productivity levels (measured
as the number of packets that are packed per minute). A new machine for packing is being
promoted by a vendor and is being considered by the company. However, the company is willing
to adopt the new machine only if there is significant improvement in the productivity. The
vendor had requested the company to install the machine and then test the same for some period
of time and decide whether they should procure the same for future use. Further once procured,
they can keep the existing machines as standby, which can be maintained, and therefore
not suffer any production loss. The company agreed to install the machine and test the same
before any decision is taken. The following are the productivity figures by using these three
machines.
The researcher made a preliminary analysis and produced the following table. However, the table
had several values missing due to some error in the software.
SUMMARY
Groups        Count   Sum   Average       Variance
machine 1     10      425   42.5          12.05555556
machine II    9       430   47.77777778   24.94444444
new machine   10      530   53            24
ANOVA
Source of Variation   SS            df   MS            F    P-value       F table
Between Groups        551.2547893   2    275.6273946   **   8.74914E-05   3.36901636
Within Groups         **            **   **
Total                 1075.310345   **
Analysis is done at 5% level of significance.
Solution: (a) H0: µ1 = µ2 = µ3, H1: µ1, µ2 and µ3 are not all equal
(b) The missing values are shown below:
ANOVA
Source of Variation   SS            Df            MS            F cal                           P-value       F table
Between Groups        551.2547893   2             275.6273946   275.627/20.15 = 13.67471861     8.74914E-05   3.369
Within Groups         524.0555557   29 – 3 = 26   20.15598291
Total                 1075.310345   28
(c)Since F cal > Ftab, we reject Ho and conclude that there is a significant difference in the
average productivity of the 3 machines. Hence the company should adopt the new machine
having a higher productivity.
In the case of 1 way ANOVA, the variation in the dependent variable is on account of only one
factor. For example, variation in marks of different batches is on account of one factor i.e. batch.
In the case of 2 way ANOVA, the variation in the dependent variable is on account of two
factors. For example, variation in productivity of different plots of land can be on account of two
factors i.e. type of seeds and type of fertilisers.
Hence, in the case of 2 way ANOVA, the explained variation is on account of two components,
one because of factor 1 and the other because of factor 2.
H01: µ1 = µ2= µ3= ... i.e. there is no significant difference between the average values of the
different samples for factor 1
H11: µ1, µ2, µ3, ... are not all equal i.e. there is a significant difference between the average values
of the different samples for factor 1
H02: µ1 = µ2= µ3= ... i.e. there is no significant difference between the average values of the
different samples for factor 2
H12: µ1, µ2, µ3, ... are not all equal i.e. there is a significant difference between the average values
of the different samples for factor 2
SSB (factor 1) = (∑X1)²/n1 + (∑X2)²/n2 + (∑X3)²/n3 + … – (∑X)²/n where the factor 1 can be
taken rowwise.
Similarly, SSB (factor 2) = (∑X1)²/n1 + (∑X2)²/n2 + (∑X3)²/n3 + … – (∑X)²/n where the factor
2 can be taken columnwise.
ANOVA table
Using the F test, we find F cal (factor 1) = [SSB (factor 1)/(k1 – 1)] / [SSW/(n – k1 – k2 + 1)],
where k1 and k2 are the number of levels of factor 1 and factor 2 respectively.
Then we find F table (factor 1) using the value of α and degrees of freedom for numerator and
denominator.
If Fcal (factor 1) < Ftable (factor 1), we accept H01 otherwise we reject H01.
Similarly, F cal (factor 2) = [SSB (factor 2)/(k2 – 1)] / [SSW/(n – k1 – k2 + 1)].
Then we find F table (factor 2) using the value of α and degrees of freedom for numerator and
denominator.
If Fcal (factor 2) < Ftable (factor 2), we accept H02 otherwise we reject H02.
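The two way ANOVA steps can be illustrated with a small Python sketch. The data below are hypothetical and are not taken from these notes: three types of seeds (rows, factor 1) and three types of fertilisers (columns, factor 2), with one observation per cell. The sketch also assumes TSS = ∑X² – (∑X)²/n and SSW = TSS – SSB (factor 1) – SSB (factor 2); the function name two_way_anova is illustrative.

```python
def two_way_anova(table):
    """Return (F cal for factor 1, F cal for factor 2) for a table with
    one observation per cell; rows are factor 1, columns are factor 2."""
    k1 = len(table)                 # number of levels of factor 1 (rows)
    k2 = len(table[0])              # number of levels of factor 2 (columns)
    n = k1 * k2
    correction = sum(sum(row) for row in table) ** 2 / n
    tss = sum(x * x for row in table for x in row) - correction
    ssb1 = sum(sum(row) ** 2 / k2 for row in table) - correction
    ssb2 = sum(sum(col) ** 2 / k1 for col in zip(*table)) - correction
    ssw = tss - ssb1 - ssb2         # unexplained (residual) variation
    msw = ssw / (n - k1 - k2 + 1)
    return (ssb1 / (k1 - 1)) / msw, (ssb2 / (k2 - 1)) / msw

# Hypothetical yields, for illustration only.
yields = [[8, 10, 12],
          [6, 9, 9],
          [7, 8, 12]]
f1, f2 = two_way_anova(yields)
f_tab = 6.94                        # alpha = 5%, (2, 4) degrees of freedom
print(f1, f2, f1 < f_tab, f2 < f_tab)
```

For this hypothetical table F cal (factor 1) = 3 and F cal (factor 2) = 12, so H01 would be accepted and H02 rejected at the 5% level.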
Unsolved questions
Q5. To study the effect of temperature on yield in a chemical process, five batches were
produced at each of three temperature levels. The results follow. Construct an analysis of
variance table. Use a 5% level of significance to test whether the temperature level has an effect
on the mean yield of the process.
Temperature   Yield
50degC        34 24 36 39 32
60degC        30 31 34 23 27
70degC        23 28 28 30 31
Q6. Three new high‐definition television models are compared. The distances (in miles) over
which a clear signal is received in random trials for each of the models are given below:
General Instrument 11 12 13 12 12 12 14 12 12 13 12 13 13 13 14
Phillips 12 12 12 12 13 11 11 12 12 13 11 12 12 13 11
Zenith 11 10 11 10 11 9 10 10 8 8 7 9 10 10 8
Do you believe there are differences between the three models? Test for 5% level of significance.
(State your assumptions)