Chapter 5, ANOVA
The probability distribution used in this chapter is the F-distribution, named in honor of Sir
Ronald Fisher, one of the founders of modern statistics. It serves as the distribution of the
test statistic in several situations: it is used to test whether two samples come from populations
with equal variances, and it is also applied when we want to compare several population means
simultaneously. The simultaneous comparison of several population means is known as analysis
of variance (ANOVA).
Characteristics of ANOVA
Types of ANOVA
1. One-way ANOVA
It refers to situations in which only one factor or variable is considered. For example, in testing
for differences in sales among three salesmen, we consider only one factor, namely the salesman's
selling ability.
2. Two-way ANOVA
If we consider two factors simultaneously and investigate the differences among their various
categories, each of which may take several possible values, we are said to use two-way ANOVA.
For example, sales are affected not only by the salesman's selling ability but also by the price
charged by the company.
ANOVA Assumptions
Another use of the F-distribution is the ANOVA technique, in which we compare three or more
population means to determine whether they could be equal. To use ANOVA, we assume the
following:
When these conditions are met, F is used as the distribution of the test statistic:
$$F = \frac{\text{Variance between}}{\text{Variance within}} = \frac{SSB/df}{SSW/df} = \frac{SSB/(k-1)}{SSW/(N-k)} = \frac{MSB}{MSW}$$
Rationale behind ANOVA
I. Variance within Samples ($\sigma^2_{\text{within}}$)
Even when every observation comes from the same population, some chance variation can
occur. This variation may be due to sampling error or other natural causes. It can be calculated
through the following steps.
1. Calculate the mean of each sample, i.e. $\bar{X}_1, \bar{X}_2, \bar{X}_3, \ldots, \bar{X}_k$.
2. Take one sample at a time and find the deviation of each item in the sample from that sample's mean.
3. Square each deviation and sum all the squared deviations over all samples. This total is
known as the sum of squares within (SSW).
4. Divide SSW by the corresponding degrees of freedom, df = N – k, where N is the total number
of observations and k is the number of samples.
5. The figure SSW/df is the variance within samples ($\sigma^2_{\text{within}}$). It is also called the mean square within (MSW).
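The five steps above can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the function name `mean_square_within` is an assumption made here, and the input is the appointment data from the Awash Insurance example later in this chapter.

```python
def mean_square_within(samples):
    """Return (SSW, df, MSW) for a list of samples (each a list of numbers)."""
    ssw = 0.0
    for sample in samples:
        mean = sum(sample) / len(sample)  # step 1: mean of this sample
        # steps 2-3: squared deviations from the sample's own mean, summed
        ssw += sum((x - mean) ** 2 for x in sample)
    n_total = sum(len(sample) for sample in samples)
    df = n_total - len(samples)  # step 4: df = N - k
    return ssw, df, ssw / df     # step 5: MSW = SSW / df


# Appointments per month for salesmen A, B and C (four months each).
appointments = [[8, 9, 11, 12], [6, 8, 10, 4], [14, 12, 18, 8]]
ssw, df_within, msw = mean_square_within(appointments)
print(ssw, df_within, round(msw, 3))
```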
II. Variance between Samples ($\sigma^2_{\text{between}}$)
This variance is due to the effect of different treatments; that is, the population means (µ) may
be affected by the factor under consideration, which makes the sample means differ. This
inter-sample variability is measured by the sum of squares between samples (SSB). SSB can be
calculated as follows:
1. Take k samples of size n each and calculate the mean of each sample, i.e. $\bar{X}_1, \bar{X}_2, \bar{X}_3, \ldots, \bar{X}_k$.
2. Calculate the grand mean $\bar{\bar{X}}$ of the distribution of these sample means, so that:
$$\bar{\bar{X}} = \frac{\sum_{i=1}^{k} \bar{X}_i}{k}$$
3. Take the difference between each sample mean and the grand mean, i.e.
($\bar{X}_1 - \bar{\bar{X}}$, $\bar{X}_2 - \bar{\bar{X}}$, ..., $\bar{X}_k - \bar{\bar{X}}$).
4. Square each difference and multiply each squared deviation by the size of its sample,
so that we get $\sum n_i (\bar{X}_i - \bar{\bar{X}})^2$, where $n_i$ is the size of the ith sample. This is the value of SSB:
$$SSB = n_1(\bar{X}_1 - \bar{\bar{X}})^2 + n_2(\bar{X}_2 - \bar{\bar{X}})^2 + \cdots + n_k(\bar{X}_k - \bar{\bar{X}})^2$$
5. Divide SSB by its degrees of freedom, (k – 1), where k is the number of samples. This gives
the value of the variance between samples ($\sigma^2_{\text{between}}$), also called the mean square between (MSB).
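The SSB steps above can be sketched in Python as follows. The function name `mean_square_between` is an assumption made for illustration, and the data are the appointment counts from the Awash Insurance example later in this chapter. The sketch takes the grand mean over all observations; when every sample has the same size n, as assumed in step 1, this equals the simple average of the k sample means.

```python
def mean_square_between(samples):
    """Return (SSB, df, MSB) for a list of samples (each a list of numbers)."""
    k = len(samples)
    n_total = sum(len(sample) for sample in samples)
    # Grand mean over all observations; with equal sample sizes this is
    # the same as averaging the k sample means (step 2).
    grand_mean = sum(sum(sample) for sample in samples) / n_total
    ssb = 0.0
    for sample in samples:
        mean = sum(sample) / len(sample)                # step 1: sample mean
        ssb += len(sample) * (mean - grand_mean) ** 2   # steps 3-4
    df = k - 1                                          # step 5: df = k - 1
    return ssb, df, ssb / df


# Appointments per month for salesmen A, B and C (four months each).
appointments = [[8, 9, 11, 12], [6, 8, 10, 4], [14, 12, 18, 8]]
ssb, df_between, msb = mean_square_between(appointments)
print(ssb, df_between, msb)
```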
Degrees of freedom
Degrees of freedom are associated with both the numerator and the denominator of the F-ratio.
1. Numerator: Since the variance between samples ($\sigma^2_{\text{between}}$) is computed from the
k sample means, the df associated with the numerator is k – 1.
2. Denominator: The variance within samples is pooled over the k samples. Each sample of size
$n_i$ contributes $n_i$ – 1 degrees of freedom, so the df associated with the denominator is N – k.
The computed value of F is then compared with the critical value of F from the table, and a
decision is made about the validity of the null hypothesis.
ANOVA-Table
After SSB, SSW, and the degrees of freedom have been calculated, these figures can be
presented in a simple table, called the ANOVA table, as follows:
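A minimal Python sketch of building such a table is given here. It assumes the conventional one-way layout (source of variation, sum of squares, df, mean square, F); that column layout and the function names `anova_table` and `format_table` are assumptions made for illustration. The input values SSB = 72 and SSW = 82 with k = 3 and N = 12 correspond to the Awash Insurance example later in this chapter.

```python
def anova_table(ssb, ssw, k, n_total):
    """Build one-way ANOVA rows from SSB, SSW, k samples and N observations."""
    df_between, df_within = k - 1, n_total - k
    msb, msw = ssb / df_between, ssw / df_within
    return [
        # (source, sum of squares, df, mean square, F)
        ("Between samples", ssb, df_between, msb, msb / msw),
        ("Within samples", ssw, df_within, msw, None),
        ("Total", ssb + ssw, n_total - 1, None, None),
    ]


def format_table(rows):
    """Render the rows as fixed-width plain text."""
    lines = [f"{'Source':<16}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}"]
    for source, ss, df, ms, f in rows:
        ms_text = "" if ms is None else f"{ms:.3f}"
        f_text = "" if f is None else f"{f:.3f}"
        lines.append(f"{source:<16}{ss:>10.3f}{df:>5}{ms_text:>10}{f_text:>8}")
    return "\n".join(lines)


rows = anova_table(ssb=72.0, ssw=82.0, k=3, n_total=12)
print(format_table(rows))
```

The total row is included only as a check: the between and within sums of squares add up to the total sum of squares, and the degrees of freedom add up to N – 1.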
5.2. Comparison of the Means of More than Two Populations and Variance Test
When we use ANOVA to test whether the means of k populations are equal, rejection of the null
hypothesis allows us to conclude only that the population means are not all equal. In some cases,
we will want to go a step further and determine where the differences among the means occur. The
purpose of this section is to show how multiple comparison procedures can be used to conduct
statistical comparisons between pairs of the population means.
Example:
1. To test whether all teachers teach the same material in different sections of the Statistics
for Management class, four sections of the same course were selected and a common
test was administered to five students selected at random from each section. The scores of
the students in each section were noted and are given below. We want to test for any
difference in learning as reflected in the average score of each section. At α = 0.05, test
whether there is any significant difference in teaching.
3. Awash Insurance Company wants to test whether three of its salesmen, A, B, and C, in a
given territory make a similar number of appointments with prospective customers during a
given period of time. A record of the previous four months showed the following results for
the number of appointments made by each salesman in each month.
            Salesmen
Month      A     B     C
1          8     6    14
2          9     8    12
3         11    10    18
4         12     4     8
Do you think that, at the 95% confidence level, there is a significant difference in the average
number of appointments made per month by the three salesmen?
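One way to work this example by hand is sketched below, following the SSW and SSB steps given earlier in the chapter. The critical value 4.26 is the tabled F value assumed here for α = 0.05 with (2, 9) degrees of freedom; it should be checked against the F table in use.

```latex
\begin{aligned}
\bar{X}_A &= 10, \quad \bar{X}_B = 7, \quad \bar{X}_C = 13, \quad \bar{\bar{X}} = 10 \\
SSB &= 4(10-10)^2 + 4(7-10)^2 + 4(13-10)^2 = 72 \\
SSW &= 10 + 20 + 52 = 82 \\
MSB &= \frac{SSB}{k-1} = \frac{72}{2} = 36, \qquad
MSW = \frac{SSW}{N-k} = \frac{82}{9} \approx 9.11 \\
F &= \frac{MSB}{MSW} \approx 3.95 < 4.26
\end{aligned}
```

Since the computed F is smaller than the critical value, the null hypothesis of equal mean appointments is not rejected at the 5% level of significance.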
4. A department store chain is considering building a new store at one of three locations. An
important factor in making such a decision is the household income in these areas. If the
average income per household is similar, then the chain can pick any one of the three locations. A
random survey of households in each location is undertaken and their annual combined
income is recorded. The data are tabulated below as annual household income (1,000 birr). At
α = 0.01, test whether the average income per household in all these locations can be considered the same.
Area-1 70 72 75 80 83 - -
Area-2 100 110 108 112 113 120 100
Area-3 60 65 57 84 84 70 -
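A minimal Python sketch of the one-way ANOVA computation for this example follows. It makes several assumptions: the dashes in the table are read as missing observations, so the sample sizes are 5, 7 and 6; the grand mean is taken over all observations rather than as the simple average of the sample means, because the sample sizes differ; and the critical value 6.36 is the tabled F value assumed for α = 0.01 with (2, 15) degrees of freedom. The function name `one_way_anova` is hypothetical.

```python
def one_way_anova(samples):
    """Return SSB, SSW, degrees of freedom and the F-ratio for the samples."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    # Grand mean over all observations (sample sizes are unequal here).
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]
    ssb = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ssw = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    msb = ssb / (k - 1)          # mean square between
    msw = ssw / (n_total - k)    # mean square within
    return {"SSB": ssb, "SSW": ssw, "df": (k - 1, n_total - k), "F": msb / msw}


# Annual household income in 1,000 birr; dashes in the table are omitted.
incomes = {
    "Area-1": [70, 72, 75, 80, 83],
    "Area-2": [100, 110, 108, 112, 113, 120, 100],
    "Area-3": [60, 65, 57, 84, 84, 70],
}
result = one_way_anova(list(incomes.values()))

F_CRITICAL = 6.36  # assumed table value for df = (2, 15) at alpha = 0.01
reject_h0 = result["F"] > F_CRITICAL
print(result, "reject H0:", reject_h0)
```

Under these assumptions the computed F is far above the critical value, so the hypothesis of equal average household income would be rejected.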