
Cha 5

Uploaded by

Senay Haftu
CHAPTER -FIVE

ANALYSIS OF VARIANCE
5.1 Comparison of the Means of More than Two Populations

Analysis of variance (ANOVA) is a procedure to test the hypothesis that several populations
have the same mean; i.e., it is used to test the equality of several means. The name ANOVA
stems from the somewhat surprising fact that a set of computations of several variances is used to
test the equality of several means.

When testing for differences in the means of more than two populations, we usually do not proceed
by considering all combinations of two populations at a time and testing for differences in each
pair, for two reasons:
1. Such an approach would require several tests rather than just one.
2. If each individual test were conducted using a level of significance of, say, α = 0.05, then
the overall level of significance would be higher than 0.05. For example, testing Ho: μ1 = μ2 =
μ3 pair by pair takes three tests; if the tests are treated as independent, the overall α (the
probability of rejecting a true null hypothesis in at least one of them) is 1 − 0.95³ = 0.143.
Thus, we want to test simultaneously for differences among the means of all the populations, and
we want the joint level of significance of the test to be α. To perform this test we make use of the
F-distribution and use a method called ANOVA.
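
The inflation of the significance level under repeated pairwise testing can be checked with a short calculation. The sketch below is only illustrative: the function name `familywise_alpha` is made up for this example, and it assumes the pairwise tests are independent, which is a simplification.

```python
from math import comb


def familywise_alpha(k: int, alpha: float = 0.05) -> float:
    """Chance of at least one false rejection over all pairwise tests.

    Assumption (simplification): the pairwise tests are independent.
    """
    m = comb(k, 2)  # number of pairs among k populations
    return 1 - (1 - alpha) ** m


overall = familywise_alpha(3, 0.05)  # three populations give three pairwise tests
print(round(overall, 3))  # 0.143
```

With more populations the number of pairs grows quickly, so the overall level drifts further from the nominal 0.05.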

In order to use ANOVA, we assume the following:


1. All the samples were randomly selected and are independent of one another.

2. The populations from which the samples were drawn are normally distributed. If
however, the sample sizes are large enough, we do not need the assumption of normality.

3. All the population variances are equal.

ANOVA is based on a comparison of two different estimates of the variance, σ², of the overall
population:

1. The variance obtained by calculating the variation within the samples themselves –
Mean Square Within (MSW).
2. The variance obtained by calculating the variation among sample means – Mean Square
Between (MSB).
Since both are estimates of σ², they should be approximately equal in value when the null
hypothesis is true. If the null hypothesis is not true, these two estimates will differ considerably.
The three steps in ANOVA, then, are:
1. Determine one estimate of the population variance from the variation among
sample means

Stat. for Mgt. II Page 1


2. Determine a second estimate of the population variance from the variation within
the samples
3. Compare these two estimates. If they are approximately equal in value, do not
reject the null hypothesis.

Calculating the Variance among the Sample Means – MSB

The variance among the sample means is called Between Column Variance or Mean Square
Between (MSB).

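Sample variance:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Now, because we are working with sample means and the grand mean, let's substitute $\bar{X}$ for $X$,
$\bar{\bar{X}}$ for $\bar{X}$, and K (number of samples) for n to get the formula for the variance
among the sample means:

$$s_{\bar{X}}^2 = \frac{\sum (\bar{X}_j - \bar{\bar{X}})^2}{K - 1}$$

In the sampling distribution of the mean we calculated the standard error of the mean as
$\sigma_{\bar{X}} = \sigma / \sqrt{n}$. Cross multiplying the terms gives $\sigma = \sigma_{\bar{X}} \sqrt{n}$.
Squaring both sides gives $\sigma^2 = \sigma_{\bar{X}}^2 \cdot n$.
In ANOVA, we do not have all the information needed to use the above equation to find $\sigma^2$.
Specifically, we do not know $\sigma_{\bar{X}}^2$. We could, however, calculate the variance among the
sample means, $s_{\bar{X}}^2$, using the formula given above. So, why not substitute $s_{\bar{X}}^2$ for
$\sigma_{\bar{X}}^2$ and calculate an estimate of the population variance? This will give us:

$$\hat{\sigma}^2 = s_{\bar{X}}^2 \cdot n = \frac{n \sum (\bar{X}_j - \bar{\bar{X}})^2}{K - 1}$$

Which sample size to use?

There is a slight difficulty in using this equation as it stands. n represents the sample size, but
which sample size should we use when different samples have different sizes? We solve this
problem by multiplying each $(\bar{X}_j - \bar{\bar{X}})^2$ by its own appropriate $n_j$, and hence
$\hat{\sigma}^2$ becomes:

$$\text{MSB} = \hat{\sigma}_b^2 = \frac{\sum n_j (\bar{X}_j - \bar{\bar{X}})^2}{K - 1} = \frac{\text{SSB}}{K - 1}$$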



Where:
$\hat{\sigma}_b^2$ = first estimate of the population variance based on the variation among sample means
(the Between Column Variance – MSB)
$n_j$ = the size of the jth sample
$\bar{X}_j$ = the sample mean of the jth sample
$\bar{\bar{X}}$ = the grand mean
K = the number of samples
K – 1 = the degrees of freedom associated with SSB (the sum of squares between the samples, i.e. the numerator of MSB)
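
As a rough illustration of the MSB calculation, the sketch below applies the definition to the three training-method samples that appear in Illustration One later in this chapter. The function name `mean_square_between` is made up for this example, and the code is a minimal sketch rather than a definitive implementation.

```python
from statistics import mean


def mean_square_between(samples):
    """MSB = sum of n_j * (sample mean_j - grand mean)^2, divided by K - 1."""
    k = len(samples)                                  # K, the number of samples
    n_total = sum(len(s) for s in samples)            # total number of observations
    grand_mean = sum(sum(s) for s in samples) / n_total
    ssb = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    return ssb / (k - 1)


# Productivity data for the three training methods (Illustration One).
samples = [
    [45, 40, 50, 39, 53, 44],
    [59, 43, 47, 51, 39, 49],
    [41, 37, 43, 40, 52, 37],
]
msb = mean_square_between(samples)
print(round(msb, 2))
```

Each squared deviation is weighted by its own sample size, so the same function also handles samples of unequal size.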

Calculating the Variance Within the Samples (MSW) [1]

It is based on the variation of the sample observations within each sample. It is called the within
column variance or Mean Square Within (MSW). We calculate the sample variance for each
sample as:

$$s_j^2 = \frac{\sum_i (X_{ij} - \bar{X}_j)^2}{n_j - 1}$$

Since we have assumed that the variances of the populations from which samples have been
drawn are equal, we could use any one of the sample variances as the second estimate of the
population variance. Statistically, we can get a better estimate of the population variance by
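using a weighted average of all sample variances. The general formula for this second estimate
of $\sigma^2$ is:

$$\text{MSW} = \hat{\sigma}_w^2 = \sum_j \left( \frac{n_j - 1}{n_T - k} \right) s_j^2 = \frac{\text{SSW}}{n_T - k}$$

If n1, n2, ..., nk are equal, MSW = $\frac{\sum s_j^2}{k}$.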

Where:
$\hat{\sigma}_w^2$ = second estimate of the population variance based on the variation within the samples
(the Within Column Variance – MSW)
$n_j$ = the size of the jth sample
$n_j - 1$ = degrees of freedom in each sample
$n_T - k$ = degrees of freedom associated with SSW (the sum of squares within the samples)
$s_j^2$ = the sample variance of the jth sample
k = the number of samples
$n_T = \sum n_j$ = the total sample size = n1 + n2 + ... + nk

[1] MSW is based on the variation within each of the samples; it is not influenced by whether or not the null
hypothesis is true. Thus, MSW always provides an unbiased estimate of the population variance.
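
The weighted average of sample variances can be sketched in a few lines. The function name `mean_square_within` is made up for this example; the data are the three training-method samples from Illustration One later in this chapter, so all samples happen to be the same size.

```python
from statistics import variance


def mean_square_within(samples):
    """MSW = sum of (n_j - 1) * s_j^2, divided by (n_T - k)."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    # statistics.variance uses the n - 1 divisor, i.e. the sample variance s_j^2.
    ssw = sum((len(s) - 1) * variance(s) for s in samples)
    return ssw / (n_total - k)


samples = [
    [45, 40, 50, 39, 53, 44],
    [59, 43, 47, 51, 39, 49],
    [41, 37, 43, 40, 52, 37],
]
msw = mean_square_within(samples)

# With equal sample sizes, MSW reduces to the plain average of the variances.
simple_average = sum(variance(s) for s in samples) / len(samples)
print(round(msw, 2), round(simple_average, 2))
```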
The estimate of population variance based on variation that exists between sample means (MSB)
is somewhat suspect because it is based on the notion that all the populations have the same
mean. That is, the estimate MSB is a good estimate of σ² only if Ho is true and all the
population means are equal: μ1 = μ2 = μ3 = ... = μk.

If the unknown population means are not equal, and perhaps are radically different from one
another, then the sample means ($\bar{X}_j$) will most likely be radically different from each other too.
This difference will have a marked effect on MSB. That is to say, the $\bar{X}_j$ values will vary a great
deal and the $(\bar{X}_j - \bar{\bar{X}})^2$ terms will be large. Thus, if the population means are not all
equal, then the MSB estimate will be large relative to the MSW estimate. That is, if the MSB is large
relative to the MSW, then the hypothesis that all the population means are equal is not likely
to be true.

The important question is, of course, how large is "large"? Also, how do we measure the relative
sizes of the two variance estimates? The answer to these questions is given by the F-distribution.

If k samples of nj (j = 1, 2, ..., k) items each are taken from k normal populations that have
equal variances and for which the hypothesis Ho: μ1 = μ2 = ... = μk is true, then the ratio of the
MSB to the MSW, F = MSB / MSW, is an F-value that follows an F-probability distribution.
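
To make the ratio concrete, here is a minimal sketch that computes F together with its two degrees of freedom. The function name `one_way_anova_f` and the small data set are hypothetical, chosen only so that the arithmetic is easy to follow by hand.

```python
from statistics import mean, variance


def one_way_anova_f(samples):
    """Return (F, numerator df, denominator df) for a one-way layout."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    msb = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples) / (k - 1)
    msw = sum((len(s) - 1) * variance(s) for s in samples) / (n_total - k)
    return msb / msw, k - 1, n_total - k


# Hypothetical data: sample means 2, 3 and 7, each sample variance equal to 1.
toy = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
f_value, df_num, df_den = one_way_anova_f(toy)
print(f_value, df_num, df_den)
```

Here MSB is 21 and MSW is 1, so F is 21 on (2, 6) degrees of freedom. The decision still requires a tabled critical value of F for those degrees of freedom, which this sketch does not compute.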

THE F-DISTRIBUTION

Characteristics of F-distribution
1. It is a continuous probability distribution
2. It is unimodal
3. It has two parameters; pair of degrees of freedom, ν1 and ν2
ν1 = the number of degrees of freedom in the numerator of F-ratio; ν1 = k – 1
ν2 = the number of degrees of freedom in the denominator of F-ratio; ν2 = nT - k
4. It is a positively skewed distribution, and tends to get more symmetrical as the degrees of
freedom in the numerator and denominator increase.
Illustration One
The training director of a company is trying to evaluate three different methods of training new
employees. The first method assigns each to an experienced employee for individual help in the
factory. The second method puts all new employees in a training room separate from the factory,
and the third method uses training films and programmed learning materials. The training director
chooses 18 new employees, assigns them at random to the three training methods, and records their daily
production after they complete the programs. Below are productivity measures for individuals trained
by each method.

            Method 1   Method 2   Method 3
               45         59         41
               40         43         37
               50         47         43
               39         51         40
               53         39         52
               44         49         37

Total         271        288        250
Mean          45.17      48.00      41.67      Grand mean = 44.94
Variance      30.17      47.60      31.07


At the 0.05 level of significance, do the three training methods lead to different levels of
productivity?
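Solution
1. Ho: μ1 = μ2 = μ3
   Ha: μ1, μ2, μ3 are not all equal
2. α = 0.05
   ν1 = K – 1 = 3 – 1 = 2
   ν2 = nT – k = 18 – 3 = 15
   F0.05, 2, 15 = 3.68
   Reject Ho if sample F > 3.68
3. Sample F

   MSB = [6(45.17 – 44.94)² + 6(48.00 – 44.94)² + 6(41.67 – 44.94)²] / (3 – 1) ≈ 60.33

   MSW = [5(30.17) + 5(47.60) + 5(31.07)] / (18 – 3) = 36.28

   F = MSB / MSW = 60.33 / 36.28 ≈ 1.66

4. Since 1.66 < 3.68, fail to reject Ho. There is no significant difference in the effects of the
three training programs (methods) on employee productivity.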
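
The sample F for Illustration One can also be checked from the summary rows of the table alone (sample sizes, means and variances), without going back to the raw observations. The sketch below does that; the variable names are illustrative, and because it starts from the rounded means and variances its result agrees with a raw-data calculation only to about two decimal places.

```python
# Summary statistics from the Illustration One table (rounded as printed there).
sizes = [6, 6, 6]
means = [45.17, 48.00, 41.67]
variances = [30.17, 47.60, 31.07]

k = len(sizes)
n_total = sum(sizes)

# Grand mean as the size-weighted average of the sample means.
grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total

msb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means)) / (k - 1)
msw = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
f_value = msb / msw

f_critical = 3.68  # F0.05, 2, 15 as given in the solution
reject_h0 = f_value > f_critical
print(round(f_value, 2), reject_h0)
```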
Illustration Two
A department store chain is considering building a new store at one of four different sites.
One of the important factors in the decision is the annual household income of the residents of
the four areas. Suppose that, in a preliminary study, various residents in each area are asked what
their annual household incomes are. The results are shown in the table below. Is
there sufficient evidence to conclude that differences exist in the average annual household incomes
among the four communities? Use α = 0.01.
             Area 1    Area 2    Area 3    Area 4
               25        32        27        18
               27        35        32        23
               31        30        48        29
               17        46        25        26
               29        32        20        42
               30        22        12
                         19        18
                         51
                         27
Total (CT)    159       294       182       138
n               6         9         7         5
Mean          26.50     32.67     26.00     27.60     Grand mean = 28.63
Variance      26.30    107.50    136.33     81.30

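Solution

1. Ho: μ1 = μ2 = μ3 = μ4

   Ha: μ1, μ2, μ3, μ4 are not all equal

2. α = 0.01

   ν1 = K – 1 = 4 – 1 = 3
   ν2 = nT – k = 27 – 4 = 23
   F0.01, 3, 23 = 4.76

   Reject Ho if sample F > 4.76

3. Sample F

   MSB = [6(26.50 – 28.63)² + 9(32.67 – 28.63)² + 7(26.00 – 28.63)² + 5(27.60 – 28.63)²] / (4 – 1) ≈ 75.9

   MSW = [5(26.30) + 8(107.50) + 6(136.33) + 4(81.30)] / (27 – 4) ≈ 92.81

   F = MSB / MSW = 75.9 / 92.81 ≈ 0.82

4. Since 0.82 < 4.76, do not reject Ho. There is insufficient evidence that differences exist in the
average annual household incomes among the four communities.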
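
For unequal sample sizes the sums of squares are often obtained from column totals rather than from deviations. The sketch below uses that computational shortcut, which is an alternative to the deviation formulas used in this chapter and gives the same MSB and MSW. One assumption is flagged in the code: the third Area 1 value is entered as 31, the value that agrees with the reported column total, mean and variance.

```python
# Household incomes by area, as listed in the Illustration Two table.
# Assumption: the third Area 1 value is taken as 31, the value that agrees
# with the reported column total (159), mean (26.50) and variance (26.30).
areas = [
    [25, 27, 31, 17, 29, 30],
    [32, 35, 30, 46, 32, 22, 19, 51, 27],
    [27, 32, 48, 25, 20, 12, 18],
    [18, 23, 29, 26, 42],
]

k = len(areas)
n_total = sum(len(a) for a in areas)
grand_total = sum(sum(a) for a in areas)
correction = grand_total ** 2 / n_total            # (grand total)^2 / nT

sst = sum(x * x for a in areas for x in a) - correction         # total sum of squares
ssb = sum(sum(a) ** 2 / len(a) for a in areas) - correction     # between samples
ssw = sst - ssb                                                 # within samples

msb = ssb / (k - 1)
msw = ssw / (n_total - k)
f_value = msb / msw

f_critical = 4.76  # F0.01, 3, 23 as given in the solution
print(round(msb, 2), round(msw, 2), round(f_value, 2), f_value > f_critical)
```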
5.2 Variance Test
Most commonly used statistical hypothesis tests, such as t tests, compare means or other
measures of location. Some studies need to compare variability also. Equality of variance tests
can be used on their own for this purpose but they are often used alongside other methods (e.g.
analysis of variance) to support assumptions made about variance.

In addition to comparing two means, statisticians are interested in comparing two variances or
standard deviations. For example, is the variation in the temperatures for a certain month for two
cities different? In another situation, a researcher may be interested in comparing the variance of
the cholesterol of men with the variance of the cholesterol of women. For the comparison of two
variances or standard deviations, an F test is used. The F test should not be confused with the
chi-square test, which compares a single sample variance to a specific population variance, as we
discuss next.
5.2.1 Hypothesis Testing of Population Variance
Using σ₀² to denote the hypothesized value for the population variance, the three forms for a
hypothesis test about a population variance are as follows:

   Lower-tail test       Upper-tail test       Two-tailed test
   H0: σ² ≥ σ₀²          H0: σ² ≤ σ₀²          H0: σ² = σ₀²
   Ha: σ² < σ₀²          Ha: σ² > σ₀²          Ha: σ² ≠ σ₀²

These three forms are similar to the three forms that we used to conduct one-tailed and two-tailed
hypothesis tests about population means and proportions in Chapter 3 of the first module.



The procedure for conducting a hypothesis test about a population variance uses the
hypothesized value for the population variance σ₀² and the sample variance s² to compute the
value of a χ² test statistic. Assuming that the population has a normal distribution, the test
statistic is as follows:

TEST STATISTIC FOR HYPOTHESIS TESTS ABOUT A POPULATION VARIANCE

$$\chi^2 = \frac{(n - 1) s^2}{\sigma_0^2}$$

where χ² has a chi-square distribution with n - 1 degrees of freedom.

After computing the value of the χ2 test statistic, either the p-value approach or the critical value
approach may be used to determine whether the null hypothesis can be rejected.

Illustration

The St. Louis Metro Bus Company wants to promote an image of reliability by encouraging its
drivers to maintain consistent schedules. As a standard policy the company would like arrival
times at bus stops to have low variability. In terms of the variance of arrival times, the company
standard specifies an arrival time variance of 4 or less when arrival times are measured in
minutes. Suppose that a random sample of 24 bus arrivals taken at a downtown intersection
provides a sample variance of s² = 4.9. Conduct a hypothesis test to help the company determine
whether the arrival time population variance is excessive. Use α = 0.05.

Solution:

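Step 1: H0: σ² ≤ 4

        Ha: σ² > 4

Step 2: Degrees of freedom v = n – 1 = 24 – 1 = 23 and α = 0.05, thus:

        χ²tab. = χ²(v, α) = χ²(23, 0.05) = 35.172

        Therefore, reject H0 if the χ²cal. value is greater than χ²tab. (35.172); otherwise, do not reject H0.

Step 3: χ²cal. = (n – 1)s² / σ₀² = (24 – 1)(4.9) / 4 = 28.18

Step 4: Because χ²cal. (28.18) is less than χ²tab. (35.172), do not reject H0. The sample does not
provide sufficient evidence that the arrival time variance exceeds the company standard of 4.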
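
The bus-arrival test above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function name `chi_square_variance_stat` is made up for this example, and the critical value is simply copied from Step 2 rather than computed.

```python
def chi_square_variance_stat(n, sample_variance, hypothesized_variance):
    """Chi-square statistic (n - 1) * s^2 / sigma0^2 for a test about one variance."""
    return (n - 1) * sample_variance / hypothesized_variance


chi2_cal = chi_square_variance_stat(n=24, sample_variance=4.9, hypothesized_variance=4)
chi2_tab = 35.172  # tabled value for 23 degrees of freedom and alpha = 0.05 (Step 2)

# Upper-tail test: reject H0 only when the computed statistic exceeds the tabled value.
reject_h0 = chi2_cal > chi2_tab
print(round(chi2_cal, 2), reject_h0)
```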

5.2.2 Hypothesis Testing of the Difference Between Two Variances

In some statistical applications we may want to compare the variances in product quality
resulting from two different production processes, the variances in assembly times for two
assembly methods, or the variances in temperatures for two heating devices. In making
comparisons about the two population variances, we will be using data collected from two
independent random samples, one from population 1 and another from population 2. The two
sample variances s₁² and s₂² will be the basis for making inferences about the two population
variances σ₁² and σ₂². Whenever the variances of two normal populations are equal (σ₁² = σ₂²),
the sampling distribution of the ratio of the two sample variances, s₁²/s₂², is as follows.

SAMPLING DISTRIBUTION OF s₁²/s₂² WHEN σ₁² = σ₂²

Whenever independent simple random samples of sizes n1 and n2 are selected from two normal
populations with equal variances, the sampling distribution of

$$F = \frac{s_1^2}{s_2^2}$$

has an F distribution with n1 - 1 degrees of freedom for the numerator and n2 - 1 degrees of
freedom for the denominator; s₁² is the sample variance for the random sample of n1 items from
population 1, and s₂² is the sample variance for the random sample of n2 items from population 2.

Illustration
Suppose Raya University has planned to purchase service buses for the coming year and must
select one of two bus companies, the Mercedes Company or the Daewoo Company. We will use
the variance of the arrival or pickup/delivery times as a primary measure of the quality of the bus
service. Low variance values indicate the more consistent and higher quality service. If the
variances of arrival times associated with the two services are equal, RU administrators will
select the company offering the better financial terms. However, if the sample data on bus arrival
times for the two companies indicate a significant difference between the variances, the
administrators may want to give special consideration to the company with the better (lower)
variance service. A sample of 25 arrival times for the Mercedes service provides a sample
variance of 48, and a sample of 16 arrival times for the Daewoo service provides a sample
variance of 20. Test whether there is a difference between the arrival time variances of the two
companies. Use α = 0.05.
Solution:
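Step 1: H0: σ₁² = σ₂²
        Ha: σ₁² ≠ σ₂²
Step 2: Degrees of freedom v1 = n1 – 1 = 25 – 1 = 24 and v2 = n2 – 1 = 16 – 1 = 15. The tabled
        value with an upper-tail area of 0.05 is:

        Ftab. = F(v1, v2, 0.05) = F(24, 15, 0.05) = 2.29

        Because Ha is two-sided and the larger sample variance is placed in the numerator, comparing
        Fcal. with this value gives a two-tailed test at a significance level of 2(0.05) = 0.10. A
        two-tailed test at exactly α = 0.05 uses the upper-tail area α/2 = 0.025, for which
        F(24, 15, 0.025) = 2.70.
        Reject H0 if the Fcal. value is greater than the tabled value; otherwise, do not reject H0.
Step 3: Fcal. = s₁² / s₂² = 48 / 20 = 2.40
Step 4: Since the Fcal. value (2.40) is greater than 2.29, H0 is rejected when the upper-tail area of
        0.05 is used. Against the two-tailed α = 0.05 value of 2.70, however, Fcal. (2.40) is
        smaller, so H0 is not rejected at that level.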
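
The variance-ratio calculation for the two bus companies can be sketched as follows. The function name `variance_ratio` is made up for this example. It follows the common convention of placing the larger sample variance in the numerator, and the tabled value 2.29 is copied from the solution above rather than computed.

```python
def variance_ratio(var_1, n_1, var_2, n_2):
    """Return (F, numerator df, denominator df), larger variance on top."""
    if var_1 >= var_2:
        return var_1 / var_2, n_1 - 1, n_2 - 1
    return var_2 / var_1, n_2 - 1, n_1 - 1


# Mercedes: 25 arrival times, variance 48. Daewoo: 16 arrival times, variance 20.
f_cal, df_num, df_den = variance_ratio(48, 25, 20, 16)

f_tab = 2.29  # F(24, 15) with upper-tail area 0.05, as quoted in the solution
print(f_cal, df_num, df_den, f_cal > f_tab)
```

Swapping the order of the arguments leaves the result unchanged, because the function always puts the larger variance and its degrees of freedom in the numerator.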

THE END OF CHAPTER-5

