CHAPTER 5
ANALYSIS OF VARIANCE
5.1 Comparison of the Means of More than Two Populations
Analysis of variance (ANOVA) is a procedure to test the hypothesis that several populations
have the same mean; i.e., it is used to test the equality of several means. The name ANOVA
stems from the somewhat surprising fact that a set of computations of several variances is used to
test the equality of several means.
When testing for differences in means of more than two populations, we usually do not proceed
by considering all combinations of two populations at a time and testing for differences in each
pair.
1. Such an approach would require several tests rather than just one.
2. If each individual test were conducted using a level of significance of, say, α = 0.05, then
the overall level of significance would be higher than 0.05. For example, testing Ho: μ1 = μ2 = μ3
pair by pair requires three tests, and if these were independent the overall α (the probability of
rejecting at least one true null hypothesis) would be 1 − 0.95³ = 0.143.
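For instance, here is a short Python sketch of how the overall significance level grows with the number of individual tests; the α of 0.05 and the numbers of tests are purely illustrative.

```python
# Sketch: how the overall (family-wise) significance level grows when several
# independent pairwise tests are each run at alpha = 0.05. The three-test case
# reproduces the 0.143 figure quoted above: 1 - 0.95**3.

alpha = 0.05  # significance level of each individual test (assumed)

for m in (1, 2, 3, 6):  # m = number of pairwise comparisons (hypothetical)
    overall = 1 - (1 - alpha) ** m  # P(at least one false rejection)
    print(f"{m} tests at alpha = {alpha}: overall level ~ {overall:.3f}")
```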
Thus, we want to test simultaneously for differences among the means of all the populations, and
we want the joint level of significance of the test to be α. To perform this test we make use of the
F-distribution and use a method called ANOVA.
To use ANOVA, we assume the following:
1. The samples are independent random samples, and the populations from which they are drawn
have equal variances.
2. The populations from which the samples were drawn are normally distributed. If,
however, the sample sizes are large enough, we do not need the assumption of normality.
The ANOVA procedure compares two different estimates of the common population variance, σ²:
1. The variance obtained by calculating the variation within the samples themselves, the
Mean Square Within (MSW).
2. The variance obtained by calculating the variation among the sample means, the Mean Square
Between (MSB).
Since both are estimates of σ², they should be approximately equal in value when the null
hypothesis is true. If the null hypothesis is not true, these two estimates will differ considerably.
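As a rough illustration, the following Python sketch computes both estimates on simulated, hypothetical data, once with equal population means and once with unequal means; the ratio MSB/MSW stays near 1 in the first case and becomes large in the second.

```python
# Sketch (simulated data): when the population means are equal, MSB and MSW are
# both estimates of sigma^2 and their ratio is close to 1; when the means differ,
# MSB inflates while MSW does not. All numbers here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def msb_msw(samples):
    k = len(samples)
    n_t = sum(len(s) for s in samples)
    grand_mean = np.concatenate(samples).mean()
    msb = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples) / (k - 1)
    msw = sum((len(s) - 1) * s.var(ddof=1) for s in samples) / (n_t - k)
    return msb, msw

equal_means   = [rng.normal(50, 5, 30) for _ in range(3)]          # Ho true
unequal_means = [rng.normal(mu, 5, 30) for mu in (40, 50, 60)]     # Ho false

for label, data in [("equal means", equal_means), ("unequal means", unequal_means)]:
    msb, msw = msb_msw(data)
    print(f"{label}: MSB = {msb:.1f}, MSW = {msw:.1f}, F = MSB/MSW = {msb/msw:.2f}")
```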
The three steps in ANOVA, then, are:
1. Determine one estimate of the population variance from the variation among
sample means
The variance among the sample means is called Between Column Variance or Mean Square
Between (MSB).
Sample variance: s² = Σ(x − x̄)² / (n − 1)
Now, because we are working with sample means and the grand mean, let's substitute x̄j for x,
the grand mean x̿ for x̄, and k (the number of samples) for n to get the formula for the variance
among the sample means:
sx̄² = Σ(x̄j − x̿)² / (k − 1)
In the sampling distribution of the mean we calculated the standard error of the mean as
σx̄ = σ/√n. Cross-multiplying the terms gives σ = σx̄ × √n, and squaring both sides gives σ² = σx̄² × n.
In ANOVA, we do not have all the information needed to use the above equation to find σ².
Specifically, we do not know σx̄². We could, however, calculate the variance among the sample
means, sx̄², using the formula above. So, why not substitute sx̄² for σx̄² and calculate an estimate of σ²?
There is a slight difficulty in using this equation as it stands: n represents the sample size, but
which sample size should we use when different samples have different sizes? We solve this
problem by weighting each (x̄j − x̿)² term by its own appropriate nj, and hence the estimate becomes:
MSB = Σ nj(x̄j − x̿)² / (k − 1)
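A minimal Python sketch of this formula, using hypothetical sample means and sample sizes:

```python
# Sketch of the MSB formula above, using hypothetical sample means and sizes.
x_bars = [12.0, 15.0, 14.0]      # hypothetical sample means (xbar_j)
ns     = [10,   12,   8]         # hypothetical sample sizes (n_j)

k = len(x_bars)
# grand mean (x double-bar): overall mean of all observations, weighted by n_j
grand_mean = sum(n * xb for n, xb in zip(ns, x_bars)) / sum(ns)
msb = sum(n * (xb - grand_mean) ** 2 for n, xb in zip(ns, x_bars)) / (k - 1)
print(f"grand mean = {grand_mean:.3f}, MSB = {msb:.3f}")
```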
2. Determine a second estimate of the population variance from the variation within the samples
This second estimate is based on the variation of the sample observations within each sample. It is
called the within-column variance or Mean Square Within (MSW). We calculate the sample variance
for each sample as:
sj² = Σ(x − x̄j)² / (nj − 1)
Since we have assumed that the variances of the populations from which samples have been
drawn are equal, we could use any one of the sample variances as the second estimate of the
population variance. Statistically, we can get a better estimate of the population variance by
using a weighted average of all sample variances. The general formula for this second estimate
of σ² is:
MSW = Σ(nj − 1)sj² / (nT − k)
Where:
MSW = the second estimate of the population variance, based on the variation within the samples
(the within-column variance)
nj = the size of the jth sample
nj - 1 = degree of freedom in each sample
nT – k = degrees of freedom associated with MSW
sj² = the sample variance of the jth sample
k = the number of samples
nT = Σnj = the total sample size = n1 + n2 + … + nk
Note that MSW is based on the variation within each of the samples; it is not influenced by whether
or not the null hypothesis is true. Thus, MSW always provides an unbiased estimate of the population
variance.
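A minimal Python sketch of the MSW formula, again with hypothetical sample variances and sizes:

```python
# Sketch of the MSW formula above: a weighted average of the k sample variances,
# using hypothetical sample variances and sizes.
s_squared = [4.2, 5.1, 3.8]   # hypothetical sample variances (s_j^2)
ns        = [10,  12,  8]     # hypothetical sample sizes (n_j)

k = len(s_squared)
n_t = sum(ns)                 # total sample size n_T
msw = sum((n - 1) * s2 for n, s2 in zip(ns, s_squared)) / (n_t - k)
print(f"MSW = {msw:.3f} with {n_t - k} degrees of freedom")
```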
The estimate of population variance based on variation that exists between sample means (MSB)
is somewhat suspect because it is based on the notion that all the populations have the same
mean. That is, MSB is a good estimate of σ² only if Ho is true and all the
populations' means are equal: μ1 = μ2 = μ3 = … = μk.
If the unknown population means are not equal, and probably are radically different from one
another, then the sample means (x̄j) will most likely be radically different from each other too.
This difference will have a marked effect on MSB: the x̄j values will vary a great deal, and the
(x̄j − x̿)² terms will be large. Thus, if the population means are not all equal, the MSB estimate
will be large relative to the MSW estimate. That is, if MSB is large relative to MSW, then the
hypothesis that all the population means are equal is not likely to be true.
The important question is, of course, how large is "large"? And how do we measure the relative
sizes of the two variance estimates? The answer to these questions is given by the F-distribution.
If k samples of nj (j = 1, 2, …, k) items each are taken from k normal populations that have
equal variances and for which the hypothesis Ho: μ1 = μ2 = … = μk is true, then the ratio of
MSB to MSW is an F-value that follows an F-probability distribution.
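As an illustration, here is a Python sketch (the MSB, MSW, k and nT values are hypothetical) of how the F-ratio would be compared with a critical value of the F-distribution, taken here from scipy.stats:

```python
# Sketch of the F test described above: compare F = MSB/MSW with the upper-tail
# critical value of the F distribution. MSB, MSW, k and n_T are hypothetical.
from scipy import stats

msb, msw = 24.9, 4.4      # hypothetical variance estimates
k, n_t   = 3, 30          # hypothetical number of samples and total sample size
alpha    = 0.05

f_ratio = msb / msw
f_crit  = stats.f.ppf(1 - alpha, dfn=k - 1, dfd=n_t - k)   # critical F value
p_value = stats.f.sf(f_ratio, dfn=k - 1, dfd=n_t - k)      # upper-tail p-value

print(f"F = {f_ratio:.2f}, critical F = {f_crit:.2f}, p-value = {p_value:.4f}")
# Reject Ho (equal means) when F exceeds the critical value, i.e. when p-value < alpha.
```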
THE F-DISTRIBUTION
Characteristics of F-distribution
1. It is a continuous probability distribution
2. It is unimodal
3. It has two parameters: a pair of degrees of freedom, ν1 and ν2
ν1 = the number of degrees of freedom in the numerator of F-ratio; ν1 = k – 1
ν2 = the number of degrees of freedom in the denominator of F-ratio; ν2 = nT - k
4. It is a positively skewed distribution, and tends to get more symmetrical as the degrees of
freedom in the numerator and denominator increase.
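The fourth point can be seen numerically; the sketch below (degrees of freedom chosen arbitrarily) uses scipy.stats to show the skewness of the F-distribution shrinking as both degrees of freedom grow:

```python
# Sketch illustrating point 4: the F distribution is positively skewed, and the
# skewness shrinks as both degrees of freedom grow.
from scipy import stats

for dfn, dfd in [(3, 10), (10, 30), (30, 100)]:
    skew = stats.f.stats(dfn, dfd, moments='s')   # skewness of F(dfn, dfd)
    print(f"F({dfn}, {dfd}): skewness = {float(skew):.2f}")
```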
Illustration One
The training director of a company is trying to evaluate three different methods of training new
employees. The first method assigns each to an experienced employee for individual help in the
MSB =
MSW =
Solution
1. Ho: μ1 = μ2 = μ3 = μ4; Ha: not all the means are equal
2. ν1 = k – 1 = 4 – 1 = 3; ν2 = nT – k = 27 – 4 = 23
3. Sample F
MSB =
MSW =
4. Do not reject Ho. There is no evidence that a difference exists in the average annual household
incomes among the four communities.
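Because the computed MSB and MSW values are not reproduced above, the following sketch only shows how the degrees of freedom, critical value and decision rule for this illustration would be obtained with scipy.stats:

```python
# Sketch for the illustration above: with k = 4 communities and n_T = 27
# observations, the degrees of freedom are 3 and 23. The MSB and MSW values are
# not reproduced here, so only the critical value and decision rule are shown.
from scipy import stats

k, n_t = 4, 27
alpha  = 0.05
f_crit = stats.f.ppf(1 - alpha, dfn=k - 1, dfd=n_t - k)
print(f"Reject Ho if the sample F = MSB/MSW exceeds {f_crit:.2f}")
```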
5.2 Variance Test
Most commonly used statistical hypothesis tests, such as t tests, compare means or other
measures of location. Some studies need to compare variability also. Equality of variance tests
can be used on their own for this purpose but they are often used alongside other methods (e.g.
analysis of variance) to support assumptions made about variance.
In addition to comparing two means, statisticians are interested in comparing two variances or
standard deviations. For example, is the variation in the temperatures for a certain month for two
cities different? In another situation, a researcher may be interested in comparing the variance of
the cholesterol of men with the variance of the cholesterol of women. For the comparison of two
variances or standard deviations, an F test is used. The F test should not be confused with the
chi-square test, which compares a single sample variance to a specific population variance, as we
shall discuss now.
5.2.1 Hypothesis Testing of Population Variance
Using σ0² to denote the hypothesized value for the population variance, the three forms for a
hypothesis test about a population variance are as follows:
Lower-tail test: Ho: σ² ≥ σ0², Ha: σ² < σ0²
Upper-tail test: Ho: σ² ≤ σ0², Ha: σ² > σ0²
Two-tailed test: Ho: σ² = σ0², Ha: σ² ≠ σ0²
The test statistic is χ² = (n − 1)s²/σ0², which follows a chi-square distribution with n − 1 degrees
of freedom when the population is normal. After computing the value of the χ² test statistic, either
the p-value approach or the critical value approach may be used to determine whether the null
hypothesis can be rejected.
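A brief Python sketch of the p-value approach for an upper-tail test, with purely hypothetical numbers:

```python
# Sketch of the chi-square test statistic described above, using hypothetical
# numbers: chi2 = (n - 1) * s^2 / sigma0^2 with n - 1 degrees of freedom.
from scipy import stats

n, s2, sigma0_sq = 30, 5.5, 4.0   # hypothetical sample size, sample variance, hypothesized variance

chi2_stat = (n - 1) * s2 / sigma0_sq
p_value   = stats.chi2.sf(chi2_stat, df=n - 1)   # upper-tail p-value (Ha: sigma^2 > sigma0^2)
print(f"chi-square = {chi2_stat:.2f}, p-value = {p_value:.4f}")
```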
Illustration
The St. Louis Metro Bus Company wants to promote an image of reliability by encouraging its
drivers to maintain consistent schedules. As a standard policy the company would like arrival
times at bus stops to have low variability. In terms of the variance of arrival times, the company
standard specifies an arrival time variance of 4 or less when arrival times are measured in
minutes. Suppose that a random sample of 24 bus arrivals taken at a downtown intersection
provides a sample variance of s² = 4.9. Conduct a hypothesis test, with α = 0.05, to help the company
determine whether the arrival time population variance is excessive.
Solution:
Step 1: Ho: σ² ≤ 4
Ha: σ² > 4
Step 2: Compute the test statistic: χ² = (n − 1)s²/σ0² = (24 − 1)(4.9)/4 = 28.18.
Step 3: With n − 1 = 23 degrees of freedom and α = 0.05, the critical value is χ²0.05 = 35.172.
Therefore, if the calculated χ² value is less than 35.172, do not reject Ho; otherwise, reject Ho.
Step 4: Because the calculated χ² (28.18) is less than the critical value (35.172), do not reject Ho.
The sample does not provide sufficient evidence that the arrival time variance exceeds the company
standard.
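The same calculation can be checked with a short Python sketch using the figures from the illustration:

```python
# Sketch reproducing the bus-arrival calculation above: n = 24, s^2 = 4.9,
# hypothesized variance 4, alpha = 0.05.
from scipy import stats

n, s2, sigma0_sq, alpha = 24, 4.9, 4.0, 0.05

chi2_stat = (n - 1) * s2 / sigma0_sq               # (23)(4.9)/4 = 28.175
chi2_crit = stats.chi2.ppf(1 - alpha, df=n - 1)    # upper-tail critical value, about 35.172
print(f"chi-square = {chi2_stat:.3f}, critical value = {chi2_crit:.3f}")
# 28.175 < 35.172, so Ho (variance <= 4) is not rejected.
```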
In some statistical applications we may want to compare the variances in product quality
resulting from two different production processes, the variances in assembly times for two
assembly methods, or the variances in temperatures for two heating devices. In making
comparisons about the two population variances, we will be using data collected from two
independent random samples, one from population 1 and another from population 2. The two
sample variances S1² and S2² will be the basis for making inferences about the two population
variances σ1² and σ2². Whenever the variances of two normal populations are equal (σ1² = σ2²),
the sampling distribution of the ratio of the two sample variances
F = S1² / S2²
has an F distribution with n1 − 1 degrees of freedom for the numerator and n2 − 1 degrees of
freedom for the denominator; S1² is the sample variance for the random sample of n1 items from
population 1, and S2² is the sample variance for the random sample of n2 items from population 2.
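A short Python sketch of this two-variance F test, using hypothetical sample variances and sizes:

```python
# Sketch of the two-variance F test described above, with hypothetical sample
# variances and sizes: F = S1^2 / S2^2 with n1 - 1 and n2 - 1 degrees of freedom.
from scipy import stats

s1_sq, n1 = 48.0, 26   # hypothetical sample variance and size, population 1
s2_sq, n2 = 20.0, 16   # hypothetical sample variance and size, population 2
alpha = 0.05

f_ratio = s1_sq / s2_sq
# Two-tailed test of Ho: sigma1^2 = sigma2^2; with the larger sample variance in
# the numerator, reject Ho when F exceeds the upper alpha/2 critical value.
f_crit = stats.f.ppf(1 - alpha / 2, dfn=n1 - 1, dfd=n2 - 1)
print(f"F = {f_ratio:.2f}, critical F = {f_crit:.2f}")
```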
Illustration
Suppose Raya University has planned to purchase service buses for the coming year and must
select one of two bus companies, the Mercedes Company or the Daewoo Company. We will use
the variance of the arrival or pickup/delivery times as a primary measure of the quality of the bus
service. Low variance values indicate more consistent and higher-quality service. If the
variances of arrival times associated with the two services are equal, RU administrators will
select the company offering the better financial terms. However, if the sample data on bus arrival
times for the two companies indicate a significant difference between the variances, the
administrators may want to give special consideration to the company with the better or lower