Ch.6 ANOVA & F Distribution Overview-Module
Topic Outline
Reading Text:
The essence of ANOVA is that the total variation in a set of data is broken down into
two types: the amount that can be attributed to chance and the amount that can be
attributed to specified causes (factors or treatments). ANOVA is designed to detect
differences among the means of several populations subject to different treatments,
for example when comparing the average yield of crops grown with different
fertilizers, the average fuel consumption (mileage) of four automobiles, or the mean
income in four different communities. A treatment is a specified source of variation
in a set of data. ANOVA was first developed for applications in agriculture, and many
of the terms related to that context remain. For example, treatment refers to how a
plot of ground was treated with a particular type of fertilizer. More generally, the
term treatment is used to identify the different populations being examined (even when
no actual treatment is administered).
In the crop example above, yield is the dependent or response variable, whereas
fertilizer is the independent variable, factor, or treatment. Depending on the number
of factors whose influence is assessed, ANOVA is referred to as one-way, two-way, or
higher-way.
The original ideas of analysis of variance were developed by the English statistician Sir
Ronald A. Fisher in 1920 for the purpose of designing agricultural experiments and
interpreting experimental data; the F distribution that is critically important in ANOVA
methods described below was named in his honor.
Assumptions of ANOVA
Using ANOVA to test the equality of three or more population means requires three
assumptions about the populations under study:
1. The populations are (at least approximately) normally distributed
2. The populations have equal standard deviations (σ), but the means µi may or
may not be equal
3. The samples are independent, i.e. all samples are randomly selected from the
populations and are independent of one another
When these conditions are met, the test statistic follows the F distribution.
The F Distribution
The test statistic used to compare sample variances and to conduct the ANOVA test
follows the F distribution. It is a continuous probability distribution in which F is
always 0 or positive.
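With ν1 degrees of freedom in the numerator and ν2 in the denominator, the variance of
the F distribution is
$$\sigma^2 = \frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)}$$
for ν2 > 4.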
4. F cannot be negative (values range from 0 to plus infinity), and the
distribution is continuous
5. It is positively skewed and asymptotic (as the values of F increase, the
curve approaches the horizontal axis but never touches it).
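As a quick numerical illustration of these characteristics, the mean and variance of an F distribution can be computed directly from its two degrees of freedom. The sketch below is only illustrative: the function names are made up for this example, the variance formula is the standard one for ν2 > 4, and the mean formula ν2/(ν2 − 2), valid for ν2 > 2, is the usual textbook result rather than something stated in this module.

```python
def f_mean(v2):
    """Mean of an F distribution (standard result; needs v2 > 2)."""
    if v2 <= 2:
        raise ValueError("the mean is defined only for v2 > 2")
    return v2 / (v2 - 2)


def f_variance(v1, v2):
    """Variance of an F distribution (needs v2 > 4)."""
    if v2 <= 4:
        raise ValueError("the variance is defined only for v2 > 4")
    return 2 * v2**2 * (v1 + v2 - 2) / (v1 * (v2 - 2) ** 2 * (v2 - 4))


# Example: 2 and 12 degrees of freedom
print(f_mean(12))         # 1.2
print(f_variance(2, 12))  # 2.16
```

Both values are positive, consistent with a distribution that lies entirely at or above 0.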
Synopsis:
Read about the application of the F distribution to test the hypothesis that two
population variances are equal
Topic 52: Analysis of Variance (ANOVA): Comparing Two Population
Variances
Test the hypothesis that two population variances are equal using the F
distribution
Topic Outline
Reading Text:
The assumption that the variances of two normally distributed populations are equal
can be statistically tested using the F distribution. The F distribution is used to
test the hypothesis that the variance of one normal population equals the variance of
another normal population. It helps to determine whether two independent estimates of
the population variances differ significantly.
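The test statistic is computed with the following equation:
$$F = \frac{s_1^2}{s_2^2}$$
The terms $s_1^2$ and $s_2^2$ are the respective sample variances; the statistic has
n1-1 degrees of freedom in the numerator and n2-1 in the denominator.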
Note: The larger of the two sample variances is placed in the numerator,
forcing the ratio to be at least 1.00. This means the upper tail of the F
distribution is always used, which avoids the need for more extensive F tables.
Conclude by comparing the critical value of F with the calculated value of F.
Calculated F = $s_1^2 / s_2^2$ = (45,600)^2/(21,330)^2 = 4.57; since F calculated > F
critical, i.e. 4.57 > 3.16,
Conclusion: Reject H0. There is more variation in the selling price of Bole area homes.
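The comparison above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the figures 45,600 and 21,330 are treated as sample standard deviations (their squares reproduce the ratio 4.57), the critical value 3.16 is simply copied from the text rather than computed, and the function name is made up for this illustration.

```python
def variance_ratio(sd_a, sd_b):
    """F statistic for two sample standard deviations.

    The larger sample variance goes in the numerator, so F >= 1.
    """
    var_a, var_b = sd_a**2, sd_b**2
    return max(var_a, var_b) / min(var_a, var_b)


F_CRITICAL = 3.16  # critical value quoted in the text (not computed here)

# Assumption: these two figures are sample standard deviations.
f_calculated = variance_ratio(45_600, 21_330)
reject_h0 = f_calculated > F_CRITICAL

print(round(f_calculated, 2), reject_h0)  # 4.57 True
```

Because the function always divides the larger variance by the smaller one, the order in which the two samples are passed does not matter.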
Exercise 52
1. Solyana Electric Products Inc. assembles cell phones. For the last 10 days, Merga
completed a mean of 39 phones per day, with a standard deviation of 2 per day.
Debebe completed a mean of 38.5 phones per day, with a standard deviation of
1.5 per day. At the .05 significance level, can we conclude that there is more
variation in Merga’s daily production? N.B. Assume independent normal populations.
2. Two samples are drawn independently from normal populations. From the
following data, test whether the population variances differ at the 10%
significance level. Note: First find the variance of each sample.
Sample 1: 9, 11, 13, 11, 15, 19, 14, 14
Sample 2: 10, 12, 10, 14, 9, 8, 10
Synopsis:
The F distribution is used to test the hypothesis that the variance of one normal
population equals the variance of another normal population.
$F(n_1 - 1, n_2 - 1) = s_1^2 / s_2^2$; if the calculated value of F > the tabulated
(critical) value of F, reject H0
Test a hypothesis that three or more population means are equal using
ANOVA.
Organize data into a one-way ANOVA table
Topic Outline
1. Test hypothesis about the difference of the means of several populations using
ANOVA
2. Organizing data into one-way ANOVA table
3. Synopsis
4. Wrap up discussion questions
5. Next session’s assignment
Reading Text:
The ANOVA technique allows comparing the means (µi) of several (k) populations
simultaneously at a selected significance level, with the assumption that the populations
concerned are normally distributed and have equal variances. The respective sample
means of k independent random samples taken from the k populations are compared
through their variances. The technique analyzes the variance of the sample data to
determine whether it can be inferred that the population means differ.
The ANOVA test follows the same hypothesis testing procedure used so far. When
the means of k populations are compared, the null and alternate hypotheses are
written:
H0: µ1=µ2=µ3=… =µk
H1: not all means are equal, i.e. not all µi (i = 1, ..., k) are equal (at least two means differ)
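The test statistic for an ANOVA problem follows the F distribution. The F value is
calculated using the following formula:
$$F = \frac{\text{estimate of the population variance based on the differences among the sample means}}{\text{estimate of the population variance based on the variation within the samples}}$$
or
$$F = \frac{\text{variance between treatments}}{\text{variance within treatments}}$$
or
$$F = \frac{MST}{MSE}$$
or
$$F = \frac{SST/(k-1)}{SSE/(n-k)}$$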
Under one-way ANOVA, we consider only one factor (independent variable); the variable
X is called the response variable, and its values are called responses.
Results of the analysis can be summarized in the following ANOVA table:
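Table 53.1 One-way (or Single Factor) ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square         F
Treatments            SST              k-1                  MST = SST/(k-1)     MST/MSE
Error                 SSE              n-k                  MSE = SSE/(n-k)
Total                 SS Total         n-1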
The terms in the formula used to calculate F are explained below:
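Note:
Find the grand (overall) mean, i.e. the mean of all observations, as follows:
$$\bar{x}_G = \frac{\sum x}{n}$$
When all samples are the same size, this equals the mean of the sample means.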
SS Total (Sum of squares total) = total variation; it has two components: variation
due to the treatments and random variation (the error component or sampling error).
Thus SS Total = SST + SSE; it can also be found as $\sum (x - \bar{x}_G)^2$, i.e.
the sum of the squared differences between each observation ($x$) and the
overall or grand mean ($\bar{x}_G$).
SST (Sum of squares treatment) = sum of squares explained by the treatment, also
referred to as treatment variation. The variation due to treatments is also called
variation between treatment means.
Thus SST = SS Total - SSE; it can also be found as:
o $\sum n_c(\bar{x}_c - \bar{x}_G)^2 = n_1(\bar{x}_1 - \bar{x}_G)^2 + n_2(\bar{x}_2 - \bar{x}_G)^2 + \dots + n_k(\bar{x}_k - \bar{x}_G)^2$,
o i.e. the sum of the squared differences between each treatment mean
($\bar{x}_c$) and the grand or overall mean ($\bar{x}_G$), each multiplied by the
respective sample size or number of observations per column or treatment ($n_c$)
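SSE (Sum of squares error) = random variation, also called variation within
treatments.
Thus SSE = $\sum (x - \bar{x}_c)^2$, i.e. the sum of the squared differences between
each observation and its treatment mean.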
n = total number of observations in all treatments/random samples (n1 + n2 + ... + nk)
nc (c represents the column number) = number of observations/values in each
treatment, random sample, or column (n1, n2, ..., nk)
k = number of random samples or treatments
Degrees of freedom in the numerator = k-1
Degrees of freedom in the denominator = n-k
MST = Mean square treatment; MST = SST/(k-1); this gives the variance between the
sample means (treatment means)
MSE = Mean square error; MSE = SSE/(n-k); this gives the variance within the samples
(treatments)
Notice that each Mean Square is just the Sum of Squares divided by its degrees of
freedom
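The definitions above translate almost line by line into code. The following is a minimal sketch, not a reference implementation: the function name `one_way_anova` and the small data set are made up purely for illustration, and each treatment is assumed to be given as a list of numbers.

```python
def one_way_anova(groups):
    """Compute the one-way ANOVA quantities from a list of samples."""
    n = sum(len(g) for g in groups)            # total number of observations
    k = len(groups)                            # number of treatments
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]  # treatment (column) means

    # SST: squared distance of each treatment mean from the grand mean,
    # weighted by the number of observations in that treatment
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # SSE: squared distance of each observation from its own treatment mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    mst = sst / (k - 1)   # mean square treatment
    mse = sse / (n - k)   # mean square error
    return {"SST": sst, "SSE": sse, "SS Total": sst + sse,
            "MST": mst, "MSE": mse, "F": mst / mse}


# Made-up illustrative data: three treatments, three observations each
result = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result["SST"], result["SSE"], result["F"])  # 54.0 6.0 27.0
```

Here the treatment means (2, 5, 8) are far apart relative to the spread inside each treatment, so MST is much larger than MSE and F is large.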
SS Total = $\sum x^2 - \frac{(\sum x)^2}{n}$; i.e. subtract the correction factor,
$(\sum x)^2 / n$, from the sum of the squares of the observations
SST = $\sum \frac{T_c^2}{n_c} - \frac{(\sum x)^2}{n}$, where $T_c$ is the total of
sample (column) c; i.e. subtract the correction factor from the sum of the squared
sample or column totals, each divided by the number of observations in that
sample/column
SSE = $\sum x^2 - \sum \frac{T_c^2}{n_c}$; i.e. subtract the sum of the squared
sample or column totals, each divided by the number of observations in that
sample/column, from the sum of the squares of all observations; or
SSE = SS Total - SST
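The shortcut (correction factor) route can be sketched as follows. This is an illustrative sketch only: the function name and the data are made up, and the correction factor is taken to be the squared grand total divided by n, the usual definition.

```python
def anova_shortcut(groups):
    """Return (SS Total, SST, SSE) using the correction-factor formulas."""
    all_x = [x for g in groups for x in g]
    n = len(all_x)

    correction = sum(all_x) ** 2 / n             # (sum of all x)^2 / n
    sum_of_squares = sum(x**2 for x in all_x)    # sum of x^2
    # squared column totals, each divided by its number of observations
    column_term = sum(sum(g) ** 2 / len(g) for g in groups)

    ss_total = sum_of_squares - correction
    sst = column_term - correction
    sse = sum_of_squares - column_term
    return ss_total, sst, sse


# Made-up illustrative data: three treatments, three observations each
print(anova_shortcut([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (60.0, 54.0, 6.0)
```

The three results satisfy SS Total = SST + SSE, which is a handy check on hand calculations.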
Example 53
A company sells identical soap in three different wrappings at the same price. The sales
for 5 randomly selected months are given in the table below. Assume the sales data are
normally distributed with equal variances. Test at the 5% level of significance whether
the mean soap sales for the three wrappings are equal.
Table 53.2 Five-Month Sales of Soap in Wrappings 1, 2, and 3
Wrapping 1 Wrapping 2 Wrapping 3
87 78 90
83 81 91
79 79 84
81 82 82
80 80 88
Solution:
Step 1. State the null and the alternate hypothesis
H0: µ1=µ2=µ3
H1: not all means are equal
Step 2. Select the level of significance: α = 5%
Step 3. Determine the test statistic: it follows the F distribution
Step 4. Formulate the decision rule: it is based on the critical value of F(k-1, n-k)
read from the F table using the degrees of freedom
o Degrees of freedom in the numerator = k-1 = 3-1 = 2
o Degrees of freedom in the denominator = n-k = 15-3 = 12
Thus the tabular (critical) value of F = F(2,12) = 3.88; if F computed > 3.88,
reject H0 and accept H1
Step 5. Compute F (the test statistic) and make a decision using the formula and the
ANOVA table; refer to Table 53.2 and calculate the sample mean for each of the 3
columns or treatments ($\bar{x}_c$) and find the overall/grand mean ($\bar{x}_G$) as
follows:
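Table 53.3 Calculation of Sample Mean for Each Column and Overall Mean
                Wrapping 1   Wrapping 2   Wrapping 3   Overall
Column total    410          400          435          1,245
n_c             5            5            5            15
Mean            82           80           87           83 (grand mean)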
SST = $\sum n_c(\bar{x}_c - \bar{x}_G)^2$ = 5(82-83)^2 + 5(80-83)^2 + 5(87-83)^2 = 5 + 45 + 80 = 130
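Also, SST = SS Total - SSE = 240 - 110 = 130, where SS Total = $\sum (x - \bar{x}_G)^2$
= 45 + 55 + 140 = 240 (the squared deviations from the grand mean, 83, summed within
each column) and SSE is found below.
Find SSE (sum of squares error) as follows:
SSE = $\sum (x - \bar{x}_c)^2$
= $\sum (x - \bar{x}_1)^2 + \sum (x - \bar{x}_2)^2 + \sum (x - \bar{x}_3)^2$
= [(87-82)^2 + (83-82)^2 + (79-82)^2 + (81-82)^2 + (80-82)^2] = 40
+
[(78-80)^2 + (81-80)^2 + (79-80)^2 + (82-80)^2 + (80-80)^2] = 10
+
[(90-87)^2 + (91-87)^2 + (84-87)^2 + (82-87)^2 + (88-87)^2] = 60
SSE = 40 + 10 + 60 = 110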
F computed = MST/MSE = (130/2)/(110/12) = 65/9.17 = 7.09; since 7.09 > 3.88, reject H0.
Step 6. Interpret the result: the population mean soap sales for the three wrappings
are not all equal, i.e. at least one pair of population means differs.
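The arithmetic of Example 53 can be checked with a short script. This is a sketch under stated assumptions: the sales figures are those of Table 53.2, the critical value 3.88 is copied from Step 4 rather than computed, and the variable names are chosen only for this illustration.

```python
# Five-month soap sales from Table 53.2
wrappings = {
    "Wrapping 1": [87, 83, 79, 81, 80],
    "Wrapping 2": [78, 81, 79, 82, 80],
    "Wrapping 3": [90, 91, 84, 82, 88],
}
F_CRITICAL = 3.88  # F(2, 12) at the 5% level, as quoted in Step 4

groups = list(wrappings.values())
n = sum(len(g) for g in groups)   # 15 observations
k = len(groups)                   # 3 treatments
grand_mean = sum(sum(g) for g in groups) / n

# Treatment variation and error variation
sst = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

f_value = (sst / (k - 1)) / (sse / (n - k))
reject_h0 = f_value > F_CRITICAL

print(sst, sse, round(f_value, 2), reject_h0)  # 130.0 110.0 7.09 True
```

The computed F of about 7.09 exceeds the critical value, so H0 is rejected.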
Because the only factor (independent variable) in the above problem was the wrapping,
this is a one-way ANOVA case. Likewise, if two factors or independent variables are
involved, two-way ANOVA is used, and so on.
Exercise 53 (suppose ANOVA test assumptions hold for the following questions)
Employees
Alemu: 55, 54, 59, 56
Chala: 66, 76, 67, 71
Rakeb: 47, 51, 46, 48
Required: Use the 0.01 significance level
a. State the null hypothesis and the alternate hypothesis.
b. What is the decision rule?
c. Compute the values of SS total, SST, and SSE.
d. Develop an ANOVA table.
e. What is your decision regarding the null hypothesis?
2. There are five treatments for lowering blood pressure. An initial test is needed of
whether there is any real difference between them. Each treatment is given to a
different randomly chosen sample of people with high blood pressure. The
results, using suitable units, are as follows.
Treatment   Results
1           12   6    5    7    10
2           10   15   14   13   12   12   15
3           3    2    7    8    3    1
4           7    8    7    10
5           16   18   21   19   21
Note: for problems having unequal total number of observations for respective
treatments, use the same procedure discussed above.
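With unequal sample sizes the one point that needs care is the weighting: the grand mean is the mean of all observations, not the simple mean of the treatment means, and each treatment's squared deviation is multiplied by its own n_c. A minimal sketch with made-up numbers (not the exercise data):

```python
# Made-up illustrative data with unequal sample sizes (n_c = 2, 3, 1)
groups = [[2, 4], [5, 6, 7], [12]]

n = sum(len(g) for g in groups)
means = [sum(g) / len(g) for g in groups]        # 3.0, 6.0, 12.0

grand_mean = sum(sum(g) for g in groups) / n     # mean of all observations
mean_of_means = sum(means) / len(means)          # NOT the grand mean here

# Each treatment's squared deviation is weighted by its own sample size
sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))

print(grand_mean, mean_of_means, sst)  # 6.0 7.0 54.0
```

The two averages differ (6 versus 7), so using the mean of the treatment means in place of the grand mean would give the wrong SST when sample sizes are unequal.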
Synopsis:
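SS Total = $\sum x^2 - \frac{(\sum x)^2}{n}$
SST = $\sum \frac{T_c^2}{n_c} - \frac{(\sum x)^2}{n}$, where $T_c$ is the total of sample (column) c
SSE = $\sum x^2 - \sum \frac{T_c^2}{n_c}$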