Ch.6 ANOVA & F Distribution Overview-Module

The document provides an overview of Analysis of Variance (ANOVA) and the F distribution, detailing its purpose, assumptions, and characteristics. ANOVA is a statistical method used to test the equality of means across multiple populations, while the F distribution is critical for comparing sample variances. The document outlines the steps for conducting ANOVA tests and the significance of the F-test in hypothesis testing for population variances.

Topic 51: Analysis of Variance (ANOVA): Overview of ANOVA and the F Distribution

Topic Learning Objectives:

By the end of this session students are expected to:

• Discuss the purpose of ANOVA and its assumptions
• List the characteristics of the F-distribution

Topic Outline

1. The purpose and assumptions of ANOVA
2. Characteristics of the F-distribution
3. Synopsis
4. Wrap up discussion questions
5. Next session’s assignment

Reading Assignment Discussion:

• Explain the statistical procedure called analysis of variance (ANOVA)
• State the features of the F-distribution

Reading Text:

So far, we have applied the z and t distributions to determine whether there is a difference between two population means. What if more than two populations are compared? Those methods compare only two populations at a time, so testing every pair separately consumes a lot of time and money and increases the overall probability of a Type I error (wrongly rejecting a true null hypothesis). Analysis of variance (abbreviated as ANOVA) is therefore an efficient statistical method for testing the null hypothesis that the means of two or more populations are equal (H0: µ1 = µ2 = µ3 = … = µk) against the alternative (H1) that at least one of the means is different. It is used for hypothesis tests on data measured on an interval or ratio scale, and it is an extremely useful technique in research across a variety of fields.
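To make the point about error concrete: if m separate tests are each run at significance level α and are treated as independent (a simplifying assumption, since pairwise comparisons that share samples are not truly independent), the chance of at least one false rejection is 1 - (1 - α)^m. The short sketch below (the function name `familywise_error` is hypothetical) counts the m = k(k - 1)/2 pairwise comparisons among k populations and evaluates this approximation.

```python
from math import comb


def familywise_error(alpha: float, m: int) -> float:
    """P(at least one false rejection) over m tests, assuming they are independent."""
    return 1 - (1 - alpha) ** m


for k in (3, 4, 5):
    m = comb(k, 2)  # number of distinct pairs among k populations
    print(k, m, round(familywise_error(0.05, m), 3))
```

At α = 0.05, three populations (3 pairs) give about 0.143, four populations (6 pairs) about 0.265, and five populations (10 pairs) about 0.401, whereas a single ANOVA F test keeps the error rate at 0.05.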
The name ANOVA comes from the way in which the calculations are performed; that is,
the technique requires the analysis of different forms of variance associated with the
random samples under study, to determine whether it can be inferred that the
population means differ.

The essence of ANOVA is that the total amount of variation in a set of data is broken down into two types: the amount that can be attributed to chance and the amount that can be attributed to specified causes (factors or treatments). ANOVA is designed to detect differences among the means of several populations subject to different treatments, for example when comparing the average yield of crops grown using different treatments (fertilizers), the average fuel consumption (mileage) of four automobiles, or the mean income in four different communities. A treatment is a specified source of variation in a set of data. ANOVA was first developed for applications in agriculture, and many of the terms from that context remain. For example, treatment originally referred to how a plot of ground was treated with a particular type of fertilizer. Today the term treatment is used to identify the different populations being examined, even when no actual treatment is administered.

In the first example above, crop yield is the dependent (response) variable, whereas fertilizer is the independent variable, also called the factor or treatment. Depending on the number of factors whose influence is assessed, ANOVA is referred to as one-way, two-way, or higher-way ANOVA.

The original ideas of analysis of variance were developed by the English statistician Sir Ronald A. Fisher in the 1920s for designing agricultural experiments and interpreting experimental data; the F distribution, which is critically important in the ANOVA methods described below, was named in his honor.

Assumptions of ANOVA
The ANOVA test of the equality of three or more population means requires three assumptions about the populations under study:
1. The populations are (approximately) normally distributed.
2. The populations have equal standard deviations (σ), although their means µi may or may not be equal.
3. The samples are independent, i.e. all samples are randomly selected from their populations and are independent of one another.

When these conditions are met, the test statistic follows the F distribution.

The F Distribution

The test statistic used to compare sample variances and to conduct the ANOVA test follows the F distribution. The F distribution is a continuous probability distribution, and F is always zero or positive.

Figure 51: The F distribution (a family of F curves)

Major characteristics of the F distribution are:

1. It is a unimodal, continuous probability distribution.

2. There is a family of F distributions, each defined by two parameters, both degrees of freedom: the degrees of freedom of the numerator (ν1), always listed first in the parentheses, and the degrees of freedom of the denominator (ν2), always listed second, i.e. F(ν1, ν2).

3. The mean of the distribution is $\mu = \frac{\nu_2}{\nu_2 - 2}$ for ν2 > 2, and the standard deviation is

$$\sigma = \sqrt{\frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)}} \quad \text{for } \nu_2 > 4.$$

4. F cannot be negative (values range from 0 to plus infinity) and is a continuous distribution.
5. It is positively skewed and asymptotic: as the values of F increase, the curve approaches the horizontal axis but never touches it.
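The mean and standard deviation formulas in item 3 can be evaluated directly. The sketch below uses hypothetical helper names (`f_mean`, `f_sd`) in plain Python and raises an error outside the ranges where the formulas are defined (ν2 > 2 for the mean, ν2 > 4 for the standard deviation).

```python
import math


def f_mean(nu1, nu2):
    """Mean of F(nu1, nu2) = nu2 / (nu2 - 2); defined for nu2 > 2 (nu1 does not enter)."""
    if nu2 <= 2:
        raise ValueError("the mean is undefined for nu2 <= 2")
    return nu2 / (nu2 - 2)


def f_sd(nu1, nu2):
    """Standard deviation of F(nu1, nu2); defined for nu2 > 4."""
    if nu2 <= 4:
        raise ValueError("the standard deviation is undefined for nu2 <= 4")
    variance = 2 * nu2 ** 2 * (nu1 + nu2 - 2) / (nu1 * (nu2 - 2) ** 2 * (nu2 - 4))
    return math.sqrt(variance)


for nu1, nu2 in [(2, 12), (20, 17), (5, 30)]:
    print(f"F({nu1}, {nu2}): mean = {f_mean(nu1, nu2):.3f}, sd = {f_sd(nu1, nu2):.3f}")
```

Note that the mean depends only on ν2: it is always slightly above 1 and approaches 1 as ν2 grows.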

Synopsis:

• ANOVA (Analysis of Variance) is a statistical method for determining whether differences exist among several population means.
• ANOVA assumes that the populations under study are normally distributed, independent, and have equal variances.
• The F distribution is a positively skewed, continuous probability distribution based on two parameters: the degrees of freedom of the numerator and the degrees of freedom of the denominator.
• The F distribution is critically important in ANOVA tests and is used to compare sample variances.

Wrap up Discussion Questions:

• Briefly discuss the origin, purpose and essence of ANOVA
• State the major assumptions of ANOVA
• List the properties of the F distribution

Next Session’s Assignment:

• Read about the application of the F distribution to test the hypothesis that two population variances are equal
Topic 52: Analysis of Variance (ANOVA): Comparing Two Population Variances

Topic Learning Objectives:

By the end of this session students are expected to:

• Test the hypothesis that two population variances are equal using the F distribution

Topic Outline

1. Testing hypotheses about the equality of two population variances using the F distribution
2. Synopsis
3. Wrap up discussion questions
4. Next session’s assignment

Reading Assignment Discussion:

• Explain the application of the F distribution to statistically test the equality of two population variances

Reading Text:

The assumption that two normally distributed populations have equal variances can be tested statistically using the F distribution. The F distribution is used to test the hypothesis that the variance of one normal population equals the variance of another normal population, i.e. to find whether two independent estimates of the population variance differ significantly.

The F-Test (Variance-Ratio Test)


• State the null hypothesis that the two variances are equal and an alternate hypothesis in whichever of the following forms the problem requires:
  Two-tailed test: H0: σ1² = σ2²; H1: σ1² ≠ σ2²
  One-tailed test: H0: σ1² ≤ σ2²; H1: σ1² > σ2²
• Identify the desired significance level and the distribution of the test statistic (the F distribution in this case).
• Find the critical value of F from the F table using the two degrees-of-freedom parameters: the degrees of freedom of the numerator (n1 - 1) and of the denominator (n2 - 1), where n1 and n2 are the sizes of the independent random samples taken from the respective populations under study. Then set the decision rule: if the test statistic (the calculated value of F) is above the tabulated (critical) value of F, the null hypothesis is rejected. Note: for a two-tailed test, the critical value of F is found by dividing the significance level in half (α/2) and then referring to the appropriate degrees of freedom in the tables of critical values of the F distribution.
• To conduct the test, select a random sample of n1 observations from one population and a random sample of n2 observations from the second population. Calculate the test statistic (the value of F) using the following equation:

$$F = \frac{s_1^2}{s_2^2}$$

  The terms s1² and s2² are the respective sample variances. Note: the larger of the two sample variances is placed in the numerator (label that sample as sample 1, so its degrees of freedom n1 - 1 belong to the numerator), forcing the ratio to be at least 1.00. This allows always using the upper tail of the F distribution, thus avoiding the need for more extensive F tables.
• Conclude by comparing the critical value of F with the calculated value of F.
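The procedure above reads critical values from printed F tables. As an alternative sketch (an assumption, not this module's method), the critical value can be computed from the F distribution's upper-tail probability, which is expressed through the regularized incomplete beta function; here that function is evaluated with a standard continued-fraction method in the style of Numerical Recipes, using only the Python standard library. All helper names (`betainc_reg`, `f_sf`, `f_critical`, `variance_ratio_test`) are hypothetical. In a one-tailed test the code keeps population 1 (the one H1 says has the larger variance) in the numerator; in a two-tailed test it puts the larger sample variance in the numerator and uses α/2, as described above.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for a, b > 0."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """Upper-tail probability P(F > f) for F(df1, df2)."""
    if f <= 0.0:
        return 1.0
    return betainc_reg(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_critical(alpha, df1, df2):
    """Critical value f with P(F > f) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, df1, df2) > alpha:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_sf(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def variance_ratio_test(s1, n1, s2, n2, alpha, two_tailed=False):
    """F-test from sample standard deviations; returns (F, F critical, reject H0?)."""
    if two_tailed and s2 > s1:
        # larger sample variance goes in the numerator, so F >= 1
        s1, n1, s2, n2 = s2, n2, s1, n1
    f_stat = s1 ** 2 / s2 ** 2
    tail = alpha / 2 if two_tailed else alpha  # two-tailed test uses alpha/2
    crit = f_critical(tail, n1 - 1, n2 - 1)
    return f_stat, crit, f_stat > crit


# Example 52 figures: population 1 = Bole, population 2 = Old airport, one-tailed, alpha = 0.01
print(variance_ratio_test(45_600, 21, 21_330, 18, alpha=0.01))
```

With the Example 52 figures this should give F ≈ 4.57 and a critical value close to the tabulated 3.16. In practice, a statistics library gives the same critical value, e.g. `scipy.stats.f.ppf(1 - alpha, df1, df2)` in SciPy.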

Example 52 (assume independent, normally distributed populations)

A real estate agent in Addis Ababa wants to compare the variation in the selling prices of homes in the Bole area with those in the Old airport area. A sample of 21 Bole homes sold within the last year revealed that the standard deviation of the selling prices was $45,600. A sample of 18 homes in the Old airport area, also sold within the last year, revealed a standard deviation of $21,330. At the .01 significance level, can we conclude that there is more variation in the selling prices of the Bole area homes?

H0: σ1² ≤ σ2²; H1: σ1² > σ2²
α = 0.01; n1 = 21; n2 = 18
df1 (numerator) = n1 - 1 = 20; df2 (denominator) = n2 - 1 = 17

Critical F(20, 17) = 3.16. Decision rule: reject H0 if F > 3.16.

Calculated F = s1²/s2² = 45,600²/21,330² = 4.57. Since F calculated > F critical, i.e. 4.57 > 3.16:
Conclusion: Reject H0. There is more variation in the selling prices of Bole area homes.
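A quick arithmetic check of Example 52: the test statistic is the ratio of the sample variances, so the standard deviations must be squared before dividing. The critical value 3.16 below is taken from the text's F table rather than computed.

```python
s_bole, n_bole = 45_600, 21          # sample SD of Bole selling prices ($), sample size
s_airport, n_airport = 21_330, 18    # sample SD of Old airport selling prices ($), sample size
f_crit_table = 3.16                  # F(20, 17) at alpha = 0.01, from the F table

sd_ratio = s_bole / s_airport            # ratio of standard deviations: NOT the test statistic
f_stat = s_bole ** 2 / s_airport ** 2    # ratio of variances: the F statistic

print(f"SD ratio = {sd_ratio:.2f}, F = {f_stat:.2f}, reject H0: {f_stat > f_crit_table}")
```

Dividing the standard deviations alone would give about 2.14, which would wrongly fail to reject H0; squaring them gives the correct F of about 4.57.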

Exercise 52

1. Solyana Electric Products Inc. assembles cell phones. For the last 10 days, Merga completed a mean of 39 phones per day, with a standard deviation of 2 per day. Debebe completed a mean of 38.5 phones per day, with a standard deviation of 1.5 per day. At the .05 significance level, can we conclude that there is more variation in Merga’s daily production? N.B. Assume independent, normally distributed populations.
2. Two samples are drawn independently from normal populations. Using the following data, test whether the population variances differ at the 10% significance level. Note: first find the variance of each sample.

Sample 1: 9 11 13 11 15 19 14 14
Sample 2: 10 12 10 14 9 8 10

Synopsis:
• The F distribution is used to test the hypothesis that the variance of one normal population equals the variance of another normal population.
• F(n1 - 1, n2 - 1) = s1²/s2²; if the calculated value of F > the tabulated (critical) value of F, reject H0.

Wrap up Discussion Questions:

• Discuss the purpose of the F-test (variance ratio test).

Next Session’s Assignment:

• Attempt Exercise 52, questions 1 and 2.
• Read about how the one-way ANOVA test is applied

Topic 53: Analysis of Variance (One-way ANOVA): Testing Hypotheses about the Difference of the Means of Several Populations

Topic Learning Objectives:

By the end of this session students are expected to:

• Test a hypothesis that three or more population means are equal using ANOVA.
• Organize data into a one-way ANOVA table

Topic Outline

1. Testing hypotheses about the difference of the means of several populations using ANOVA
2. Organizing data into a one-way ANOVA table
3. Synopsis
4. Wrap up discussion questions
5. Next session’s assignment

Reading Assignment Discussion:

• Briefly explain the steps required to conduct the ANOVA test

Reading Text:

The ANOVA technique compares the means (µi) of several (k) populations simultaneously at a selected significance level, assuming that the populations concerned are normally distributed and have equal variances. The sample means of k independent random samples taken from the k populations are compared through their variances: the technique analyzes the variance of the sample data to determine whether it can be inferred that the population means differ.

The same hypothesis-testing procedure used so far is employed in the ANOVA test. When the means of k populations are compared, the null and alternate hypotheses are written:
H0: µ1 = µ2 = µ3 = … = µk
H1: not all means are equal, i.e. not all µi (i = 1, …, k) are equal (at least two means differ)

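The test statistic for an ANOVA problem follows the F distribution. The F value is calculated using any of the following equivalent forms:

$$F = \frac{\text{MST}}{\text{MSE}} = \frac{\text{SST}/(k-1)}{\text{SSE}/(n-k)} = \frac{\sum n_c(\bar{x}_c - \bar{x}_G)^2/(k-1)}{\sum (x - \bar{x}_c)^2/(n-k)} = \frac{\text{variance between the sample (treatment) means}}{\text{variance within the samples (treatments)}}$$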

Under one-way ANOVA, we consider only one factor (independent variable); the variable X is called the response variable, and its values are called responses. The results of the analysis can be summarized in the following ANOVA table:
Table 53.1 One-way (or Single-Factor) ANOVA Table

Source of variation | Sum of squares (SS) | Degrees of freedom | Mean squares (MS) | F computed | F critical
Treatments (T) (numerator) | SST | k - 1 | MST = SST/(k - 1) | F = MST/MSE | F(k - 1, n - k), identified from the table
Error (E) (denominator) | SSE | n - k | MSE = SSE/(n - k) | |
Total | SS Total | n - 1 | | |

The quantities in the equation used to calculate F are explained below:

Note:

• Obtain the mean of each sample, i.e. x̄1, x̄2, x̄3, …, x̄k, when there are k independent random samples taken from k populations.
• Find the grand (overall) mean, i.e. the mean of all n observations:

$$\bar{x}_G = \frac{\sum x}{n} = \frac{\sum n_c \bar{x}_c}{n}$$

  When all samples have the same size, this equals the mean of the sample means.

SS Total (sum of squares total), also referred to as total variation, has two components: variation due to the treatments and random variation (the error component or sampling error).
• Thus SS Total = SST + SSE; it can also be found as $\text{SS Total} = \sum (x - \bar{x}_G)^2$, i.e. the sum of the squared differences between each observation (x) and the overall or grand mean (x̄G).

A treatment is a specified source of variation in a data set.

SST (sum of squares treatment) is the sum of squares explained by the treatments, also referred to as treatment variation or the variation between treatment means.
• Thus SST = SS Total - SSE; it can also be found as:

$$\text{SST} = \sum n_c(\bar{x}_c - \bar{x}_G)^2 = n_1(\bar{x}_1 - \bar{x}_G)^2 + n_2(\bar{x}_2 - \bar{x}_G)^2 + \dots + n_k(\bar{x}_k - \bar{x}_G)^2$$

  i.e. the sum of the squared differences between each treatment mean (x̄c) and the grand or overall mean (x̄G), each multiplied by the respective sample size, i.e. the number of observations per column or treatment (nc).

SSE (sum of squares error) is the sum of squares unexplained by the treatments, i.e. the other source of variation, also referred to as random variation or the variation within the treatments.
• Thus $\text{SSE} = \sum (x - \bar{x}_c)^2$, i.e. the sum of the squared differences between each observation and its treatment mean.

n = total number of observations in all treatments/random samples (n1 + n2 + … + nk)
nc (c represents the column number) = number of observations in each treatment, random sample or column (n1, n2, …, nk)
k = number of random samples or treatments
• Degrees of freedom in the numerator = k - 1
• Degrees of freedom in the denominator = n - k
MST = mean square treatment; MST = SST/(k - 1); this gives the variance between the sample (treatment) means
MSE = mean square error; MSE = SSE/(n - k); this gives the variance within the samples (treatments)

Notice that each mean square is just the sum of squares divided by its degrees of freedom.

Finally, if F computed > F critical, reject H0 and accept H1.
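The definitional formulas above can be sketched in Python using only the standard library. The function name `one_way_anova` is hypothetical; it computes the grand mean from all n observations, so it also handles treatments with unequal sample sizes. The usage lines apply it to the soap-sales data of Example 53.

```python
from statistics import fmean


def one_way_anova(groups):
    """One-way ANOVA by the definitional formulas; groups is a list of samples (columns)."""
    all_obs = [x for g in groups for x in g]
    n, k = len(all_obs), len(groups)
    grand_mean = fmean(all_obs)             # grand mean of all n observations
    means = [fmean(g) for g in groups]      # treatment (column) means
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
    mst = sst / (k - 1)                     # variance between treatment means
    mse = sse / (n - k)                     # variance within treatments
    return {"SST": sst, "SSE": sse, "SS Total": ss_total,
            "df": (k - 1, n - k), "MST": mst, "MSE": mse, "F": mst / mse}


wrappings = [[87, 83, 79, 81, 80],   # Wrapping 1
             [78, 81, 79, 82, 80],   # Wrapping 2
             [90, 91, 84, 82, 88]]   # Wrapping 3
result = one_way_anova(wrappings)
print(result)
```

For these data the sketch reproduces SST = 130, SSE = 110, SS Total = 240 and F ≈ 7.09, and SST + SSE equals SS Total as the decomposition requires.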

ANOVA can also be performed by the following shortcut method, which is convenient particularly when the sample means or the grand mean happen to be non-integer (decimal) values:

Find the total of all individual observations, T = Σx.

Find the correction factor: CF = T²/n.

$$\text{SS Total} = \sum x^2 - \frac{T^2}{n}$$
i.e. subtract the correction factor from the sum of the squares of all observations.

$$\text{SST} = \sum \frac{T_c^2}{n_c} - \frac{T^2}{n}$$
i.e. subtract the correction factor from the sum of the squared column (sample) totals Tc, each divided by the number of observations in that column.

$$\text{SSE} = \sum x^2 - \sum \frac{T_c^2}{n_c}$$
i.e. subtract the sum of the squared column totals, each divided by its number of observations, from the sum of the squares of all observations; or SSE = SS Total - SST.
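The shortcut method can be sketched the same way. The function name `anova_shortcut` is hypothetical; it uses only the grand total T, the column totals Tc and the correction factor CF = T²/n, never the means themselves.

```python
def anova_shortcut(groups):
    """SS Total, SST and SSE via column totals and the correction factor T**2 / n."""
    all_obs = [x for g in groups for x in g]
    n = len(all_obs)
    grand_total = sum(all_obs)                               # T
    correction = grand_total ** 2 / n                        # CF = T^2 / n
    sum_of_squares = sum(x ** 2 for x in all_obs)            # sum of x^2
    column_term = sum(sum(g) ** 2 / len(g) for g in groups)  # sum of Tc^2 / nc
    ss_total = sum_of_squares - correction
    sst = column_term - correction
    sse = sum_of_squares - column_term                       # equals ss_total - sst
    return ss_total, sst, sse


wrappings = [[87, 83, 79, 81, 80], [78, 81, 79, 82, 80], [90, 91, 84, 82, 88]]
print(anova_shortcut(wrappings))
```

For the Example 53 data this works through T = 1245, CF = 103335, Σx² = 103575 and ΣTc²/nc = 103465, so SS Total = 240, SST = 130 and SSE = 110, matching the definitional method.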

The following illustration will demonstrate an application of ANOVA:

Example 53

A company sells identical soap in three different wrappings at the same price. The sales for 5 randomly selected months are given in the table below. Assume the sales data are normally distributed with equal variances. Test at the 5% level of significance whether the mean soap sales for the three wrappings are equal.
Table 53.2 Five-Month Sales of Soap in Wrappings 1, 2, and 3
Wrapping 1 Wrapping 2 Wrapping 3
87 78 90
83 81 91
79 79 84
81 82 82
80 80 88

Solution:
Step 1. State the null and the alternate hypotheses:
H0: µ1 = µ2 = µ3
H1: not all means are equal
Step 2. Select the level of significance: α = 5%
Step 3. Determine the test statistic: it follows the F distribution.
Step 4. Formulate the decision rule: it is based on the critical value of F(k - 1, n - k), read from the table using the degrees of freedom:
o Degrees of freedom in the numerator = k - 1 = 3 - 1 = 2
o Degrees of freedom in the denominator = n - k = 15 - 3 = 12
Thus the tabular value of F, or F critical, is F(2, 12) = 3.88; if F computed > 3.88, reject H0 and accept H1.
Step 5. Compute F (the test statistic) and make a decision using the formula and the ANOVA table. Refer to Table 53.2 and calculate the sample mean for each of the 3 columns or treatments (x̄c) and the overall/grand mean (x̄G) as follows:

Table 53.3 Calculation of the Sample Mean for Each Column and the Overall Mean

Treatments | Wrapping 1 | Wrapping 2 | Wrapping 3 | Total
 | 87 | 78 | 90 |
 | 83 | 81 | 91 |
 | 79 | 79 | 84 |
 | 81 | 82 | 82 |
 | 80 | 80 | 88 |
Column total (Tc) | 410 | 400 | 435 | Grand total T = 1245
Number of observations in each column (nc) | n1 = 5 | n2 = 5 | n3 = 5 | n = 15 (total number of observations)
Sample mean for each column (x̄c) | x̄1 = 82 | x̄2 = 80 | x̄3 = 87 | x̄G = 83 (overall mean)

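Find SST (sum of squares treatment) as follows:

$$\text{SST} = \sum n_c(\bar{x}_c - \bar{x}_G)^2 = n_1(\bar{x}_1 - \bar{x}_G)^2 + n_2(\bar{x}_2 - \bar{x}_G)^2 + n_3(\bar{x}_3 - \bar{x}_G)^2 = 5(82-83)^2 + 5(80-83)^2 + 5(87-83)^2 = 130$$

The shortcut method can also be used:

$$\text{SST} = \frac{410^2 + 400^2 + 435^2}{5} - \frac{1245^2}{15} = 103465 - 103335 = 130$$

Also, SST = SS Total - SSE = 240 - 110 = 130 (see the solutions for SS Total and SSE below).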
Find SSE (sum of squares error) as follows:

$$\text{SSE} = \sum (x - \bar{x}_c)^2 = \sum (x - \bar{x}_1)^2 + \sum (x - \bar{x}_2)^2 + \sum (x - \bar{x}_3)^2$$

Wrapping 1: (87-82)² + (83-82)² + (79-82)² + (81-82)² + (80-82)² = 40
Wrapping 2: (78-80)² + (81-80)² + (79-80)² + (82-80)² + (80-80)² = 10
Wrapping 3: (90-87)² + (91-87)² + (84-87)² + (82-87)² + (88-87)² = 60
SSE = 40 + 10 + 60 = 110

Find SS Total (sum of squares total):

$$\text{SS Total} = \sum (x - \bar{x}_G)^2$$

= (87-83)² + (83-83)² + (79-83)² + (81-83)² + (80-83)² + (78-83)² + (81-83)² + (79-83)² + (82-83)² + (80-83)² + (90-83)² + (91-83)² + (84-83)² + (82-83)² + (88-83)² = 240

Table 53.4 ANOVA Table for Example 53

Source of variation | Sum of squares (SS) | Degrees of freedom | Mean squares (MS) | F computed | F critical
Treatments (T) (numerator) | SST = 130 | k - 1 = 3 - 1 = 2 | MST = SST/(k - 1) = 130/2 = 65 | F = MST/MSE = 65/9.17 = 7.09 | F(k - 1, n - k) = F(2, 12) = 3.88, from the table
Error (E) (denominator) | SSE = 110 | n - k = 15 - 3 = 12 | MSE = SSE/(n - k) = 110/12 = 9.17 | |
Total | SS Total = 240 | n - 1 = 15 - 1 = 14 | | |
Decision: since the computed value of F is greater than the critical (tabular) value of F, i.e. 7.09 > 3.88, H0 is rejected and H1 is accepted.

Step 6. Interpret the result: the population mean soap sales for the three wrappings are not all equal, i.e. at least one pair of population means differs.
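The tabulated critical value used in Step 4 can be checked by hand here, because when the numerator has ν1 = 2 degrees of freedom the upper tail of the F distribution has a closed form, P(F > f) = (1 + 2f/ν2)^(-ν2/2). Setting this equal to α and solving gives f = (ν2/2)(α^(-2/ν2) - 1). The sketch below (the function name is hypothetical, and the formula applies only when ν1 = 2) evaluates it for F(2, 12) at α = 0.05.

```python
def f_critical_nu1_2(alpha, nu2):
    """Upper-tail critical value of F(2, nu2).

    For nu1 = 2, P(F > f) = (1 + 2f/nu2) ** (-nu2/2); solving P(F > f) = alpha
    gives f = (nu2/2) * (alpha ** (-2/nu2) - 1).
    """
    return (nu2 / 2) * (alpha ** (-2 / nu2) - 1)


crit = f_critical_nu1_2(0.05, 12)
print(f"F(2, 12) critical value at alpha = 0.05: {crit:.3f}")
print("Reject H0 in Example 53:", 7.09 > crit)
```

This gives about 3.885, which the text rounds to 3.88 (some tables print 3.89). Either way the computed F of 7.09 exceeds it, so the decision is unchanged.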

Because wrapping was the only factor of influence (independent variable) in the above problem, this is a one-way ANOVA case. Likewise, if two factors or independent variables are involved, two-way ANOVA is used, and so on.

Exercise 53 (suppose the ANOVA test assumptions hold for the following questions)

1. Helen manages a regional financial center. She wishes to compare the productivity, as measured by the number of customers served, among three employees. Four days are randomly selected and the number of customers served by each employee is recorded. The results are:

Employees
Alemu: 55, 54, 59, 56
Chala: 66, 76, 67, 71
Rakeb: 47, 51, 46, 48
Required: Use the 0.01 significance level
a. State the null hypothesis and the alternate hypothesis.
b. What is the decision rule?
c. Compute the values of SS total, SST, and SSE.
d. Develop an ANOVA table.
e. What is your decision regarding the null hypothesis?
2. There are five treatments for lowering blood pressure. An initial test is needed of
whether there is any real difference between them. Each treatment is given to a
different randomly chosen sample of people with high blood pressure. The
results, using suitable units, are as follows.
Treatment Results
1 12 6 5 7 10
2 10 15 14 13 12 12 15
3 3 2 7 8 3 1
4 7 8 7 10
5 16 18 21 19 21

Required: Using the 5% significance level, test whether there is a difference in the mean blood pressure among the five treatments. Show all required steps and present the ANOVA table for the problem.

Note: for problems in which the treatments have unequal numbers of observations, use the same procedure discussed above.

Synopsis:

• A one-way ANOVA is used to compare several treatment (population) means, given that the assumptions of the test are fulfilled.
• There is only one factor or independent variable in one-way ANOVA, whereas two-way ANOVA has two independent variables.
• ANOVA assumes that the populations under study are normally distributed, independent, and have equal variances.
• A treatment is a specified source of variation.
• The information for finding the value of F is summarized in an ANOVA table.
• F(k - 1, n - k) = MST (mean square treatment) / MSE (mean square error)
• F(k - 1, n - k) = variance between the sample (treatment) means / variance within the samples (treatments)
  o MST = SST/(k - 1); MSE = SSE/(n - k); n = number of total observations; k = number of random samples (treatments)
  o Degrees of freedom: numerator (k - 1); denominator (n - k)
  o SST (sum of squares treatment) = Σ nc(x̄c - x̄G)², or SS Total - SSE
  o SSE (sum of squares error) = Σ(x - x̄c)²
  o SS Total (sum of squares total) = Σ(x - x̄G)²
  o The shortcut method can also be used:
    ▪ SS Total = Σx² - T²/n
    ▪ SST = Σ(Tc²/nc) - T²/n
    ▪ SSE = Σx² - Σ(Tc²/nc)
  o T = total of all observations; x̄G = grand mean (mean of all observations, which equals the mean of the sample means when the sample sizes are equal); x = each observation; n = number of total observations; Tc = total of the observations in each column or sample; x̄c = mean of each sample/column; nc = number of observations in each sample or column
Wrap up Discussion Questions:
• Why is the ANOVA test required?
• Identify the main components of the ANOVA table.
• List the procedure for testing the hypothesis that three or more population means are equal using ANOVA.
Next Session’s Assignment:
• Attempt Exercise 53, questions 1 and 2.
• Read about two-way ANOVA without interaction
