Unit 4-1
By
Wilhemina Adoma Pels
July 1, 2024
Introduction to Analysis of Variance (ANOVA)
What is an ANOVA?
An ANOVA test is a type of statistical test used to determine whether
there is a statistically significant difference between the means of two
or more groups (the levels of a categorical variable). It tests for
differences of means by comparing variances.
ANOVA
Assumptions of ANOVA
Hypotheses in ANOVA
Null Hypothesis
Null Hypothesis ($H_0$): There is no difference between the population
means of the groups. $H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_t$
All population means are equal
No treatment effect
Alternative Hypothesis
Alternative Hypothesis ($H_A$): At least one pair of population group
means differs.
$H_A$: At least one $\mu_i$ is different
At least one population mean is different
Treatment effect
NOT $H_1: \mu_1 \neq \mu_2 \neq \mu_3 \neq \cdots \neq \mu_t$ (the alternative does not require every mean to differ from every other)
Hypothesis
ANOVA Test Statistic
$F = \dfrac{\text{Between-group variance}}{\text{Within-group variance}}$
ANOVA Test Procedure
1 Formulate hypotheses.
Interpreting ANOVA Results
Types of ANOVA
One-way-Anova
One-way ANOVA example
When to use a one-way ANOVA
• Use a one-way ANOVA when you have collected data about one
categorical independent variable and one quantitative dependent
variable. The independent variable should have at least three
levels (i.e. at least three different groups or categories).
Null Hypothesis
The null hypothesis ($H_0$) of ANOVA is that there is no
difference among group means.
The null hypothesis is that all the groups have equal means:
$H_0: \mu_1 = \mu_2 = \mu_3$
Alternative Hypothesis
The alternative hypothesis ($H_a$) is that at least one group mean differs
significantly from the overall mean of the dependent variable.
Why not several t-tests
Cont’d
• α-level = 0.05
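The usual argument against running several t-tests can be quantified with a short sketch. It assumes each pairwise t-test is run at α = 0.05 and, as a simplification, that the tests are independent (in practice they share data, so the result is only an approximation). With k groups there are k(k − 1)/2 pairwise tests. The function name `familywise_error` is illustrative.

```python
from math import comb


def familywise_error(alpha: float, n_groups: int) -> float:
    """Approximate probability of at least one Type I error when every pair
    of groups is compared with its own t-test at level alpha.

    Assumes the pairwise tests are independent, which is only approximately true.
    """
    n_tests = comb(n_groups, 2)  # number of pairwise comparisons
    return 1 - (1 - alpha) ** n_tests


for k in (3, 4, 5):
    print(f"{k} groups, {comb(k, 2)} t-tests: "
          f"P(at least one false rejection) ~ {familywise_error(0.05, k):.3f}")
```

For three groups (three t-tests), the chance of at least one false rejection is already about 14% rather than 5%. It keeps growing with the number of groups, which is why a single F-test is used instead.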
How does an ANOVA test work
The F-test compares the variation of the group means around the
overall mean (between-group variance) with the variation of the
observations within each group (within-group variance). The smaller
the within-group variance is relative to the between-group variance,
the higher the F-value, and therefore the less likely it is that the
observed difference is due to chance alone.
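Here is a minimal pure-Python sketch of the whole computation. It assumes the data arrive as one list of observations per group. The function name `one_way_anova` and the toy data are illustrative and not from the slides. The notation (SST, SSTR, SSE, MSTR, MSE) follows the worked example later in this unit.

```python
def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom, mean squares and F
    for a one-way ANOVA. `groups` is a list of lists of observations."""
    all_obs = [x for g in groups for x in g]
    n, m = len(all_obs), len(groups)  # n observations, m groups
    grand_mean = sum(all_obs) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group (treatment) variation: group means around the grand mean
    sstr = sum(len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means))
    # Within-group (error) variation: observations around their own group mean
    sse = sum((x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g)
    # Total variation: observations around the grand mean
    sst = sum((x - grand_mean) ** 2 for x in all_obs)

    df_between, df_within = m - 1, n - m
    mstr, mse = sstr / df_between, sse / df_within
    return {"SSTR": sstr, "SSE": sse, "SST": sst,
            "df_between": df_between, "df_within": df_within,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse}


result = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result["F"])  # 27.0
```

In the hypothetical data, the group means (2, 5, 8) are far apart relative to the spread inside each group. That gives SSTR = 54, SSE = 6, and a large F of 27.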
Computing the one-way ANOVA
We can see that there are two different sources of variation that
an ANOVA measures:
Degrees of freedom
• For the within-group (error) degrees of freedom, subtract the d.f. for
groups (m − 1) from the total degrees of freedom (n − 1).
If there are n total data points collected and m groups being
compared, then there are n − m error degrees of freedom.
Example
Solution
1 State the null and alternative hypotheses. The null
hypothesis for an ANOVA always assumes the population
means are equal. Hence, we may write the null hypothesis
as:
$H_0: \mu_1 = \mu_2 = \mu_3$
That is, the mean head pressure is the same across the
three types of cars.
• Since the null hypothesis assumes all the means are
equal, we can reject it if even one mean is not equal to
the others. Thus, the alternative hypothesis is:
$H_a$: At least one mean pressure is not equal to the others.
2 Calculate the appropriate test statistic. The test statistic in
ANOVA is the ratio of the between and within variation in
the data. Under $H_0$, it follows an F distribution.
Solution
Solution cont’d
Solution Cont’d
Note that $SST = SSTR + SSE$ (here, $96303.55 = 86049.55 + 10254$).
• Hence, you only need to compute any two of the three sources
of variation to conduct an ANOVA. Especially for the first few
problems you work out, you should calculate all three for practice.
• The next step in an ANOVA is to compute the “average”
sources of variation in the data using SST, SSTR, and SSE.
• Total Mean Squares (MST) = $\dfrac{SST}{N-1}$ → “average total variation
in the data” (N is the total number of observations)
$MST = \dfrac{96303.55}{9-1} = 12037.94$
• Mean Square Treatment (MSTR) = $\dfrac{SSTR}{c-1}$ → “average between
variation” (c is the number of columns in the data table)
$MSTR = \dfrac{86049.55}{3-1} = 43024.78$
• Mean Square Error (MSE) = $\dfrac{SSE}{N-c}$ → “average within variation”
$MSE = \dfrac{10254}{9-3} = 1709$
The test statistic may now be calculated. For a one-way ANOVA,
the test statistic is the ratio of MSTR to MSE, that is, the ratio
of the “average between variation” to the “average within
variation”. Under the null hypothesis, this ratio follows an F
distribution. Hence,
$F = \dfrac{MSTR}{MSE} = \dfrac{43024.78}{1709} = 25.17$
The intuition here is relatively straightforward. If the average
between variation rises relative to the average within variation,
the F statistic will rise and so will our chance of rejecting the
null hypothesis.
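The arithmetic above can be reproduced in a few lines. This sketch uses only the sums of squares, N and c given in the slides. It also checks the identity SST = SSTR + SSE, which is why any two of the three sums of squares are enough.

```python
import math

# Values from the head-pressure example: N = 9 cars in c = 3 size classes
N, c = 9, 3
SST, SSTR, SSE = 96303.55, 86049.55, 10254

# SST = SSTR + SSE, so any one sum of squares follows from the other two
assert math.isclose(SST, SSTR + SSE)

MST = SST / (N - 1)    # "average total variation"
MSTR = SSTR / (c - 1)  # "average between variation"
MSE = SSE / (N - c)    # "average within variation"
F = MSTR / MSE

print(f"MST = {MST:.2f}, MSTR = {MSTR:.2f}, MSE = {MSE:.2f}, F = {F:.2f}")
```

Carried at full precision, F ≈ 25.18 (25.17 above is truncated). Either value is far above the critical value, so the decision is the same.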
Obtain the Critical Value
To find the critical value from an F distribution, you must know the
degrees of freedom for the numerator (MSTR) and for the denominator
(MSE), along with the significance level.
• $F_{CV}$ has $df_1$ and $df_2$ degrees of freedom, where $df_1$ is the
numerator degrees of freedom, equal to $m - 1$, and $df_2$ is the
denominator degrees of freedom, equal to $n - m$.
• In our example, $df_1 = 3 - 1 = 2$ and $df_2 = 9 - 3 = 6$. Hence we
need to find the critical value of F corresponding to α = 5%. Using
the F tables in your text, we determine that $F_{2,6} = 5.14$.
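Printed F tables are the standard route. Both examples in this unit compare three groups, so the numerator has exactly 2 degrees of freedom. In that case the upper tail of the F distribution has a closed form, $P(F > x) = (1 + 2x/df_2)^{-df_2/2}$, so the critical value and p-value can be computed directly. The sketch below relies on that special case. The function names are hypothetical, and the formula does not apply to other numerator degrees of freedom.

```python
def f_critical_df1_2(alpha: float, df2: int) -> float:
    """Upper-alpha critical value of F(2, df2).

    Inverts P(F > x) = (1 + 2x/df2) ** (-df2/2), a closed form that holds
    only when the numerator has exactly 2 degrees of freedom.
    """
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


def f_pvalue_df1_2(f_stat: float, df2: int) -> float:
    """P(F > f_stat) for F(2, df2), using the same closed form."""
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)


# Head-pressure example: df1 = 3 - 1 = 2, df2 = 9 - 3 = 6, F = 25.17
f_crit = f_critical_df1_2(0.05, 6)
print(f"F critical = {f_crit:.2f}")  # prints F critical = 5.14
print(f"p-value = {f_pvalue_df1_2(25.17, 6):.4f}")
print("reject H0" if 25.17 > f_crit else "fail to reject H0")
```

Calling `f_critical_df1_2(0.05, 12)` gives about 3.89, which matches the table value used in the Reed Manufacturing example. For other numerator degrees of freedom, use the F table or a statistics library.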
One-way-ANOVA Decision Criteria
• If $F_{calculated} > F_{critical}$, we reject the null hypothesis and accept
the alternative hypothesis that at least two of the group means
differ.
• If $F_{calculated} < F_{critical}$, we fail to reject the null hypothesis
and conclude that there are no significant differences between
the group means.
Decision rule for the example
In our example, 25.17 > 5.14, so we reject the null hypothesis.
Interpretation
Since we rejected the null hypothesis, we are 95% (1 − α) confident
that the mean head pressure is not the same for compact, midsize,
and full-size cars. However, since only one mean needs to differ for
us to reject the null, we do not yet know which mean(s) is/are
different. In short, an ANOVA test tells us that at least one mean
is different, but an additional (post hoc) test must be conducted to
determine which mean(s) is/are different.
Example 2: Reed Manufacturing
Solution
Hypothesis
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: not all the means are equal
where:
$\mu_1$ = mean number of hours worked per week by the managers
at Plant 1,
$\mu_2$ = mean number of hours worked per week by the managers
at Plant 2,
$\mu_3$ = mean number of hours worked per week by the managers
at Plant 3
Cont’d
$SSW = 4(26.0) + 4(26.5) + 4(24.5) = 308$
$MSW = \dfrac{308}{15 - 3} = 25.667$
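The SSW line pools the within-plant sample variances. Each plant contributes $(n_j - 1)s_j^2$, where $s_j^2$ is that plant's sample variance. The factor of 4 and the 15 − 3 denominator imply $n_j = 5$ managers sampled per plant, 15 in total. A minimal sketch of the same arithmetic with the slide's numbers:

```python
# Sample sizes and sample variances for the three plants, as used on the slide
sizes = [5, 5, 5]
variances = [26.0, 26.5, 24.5]

# Pooled within-group sum of squares: each plant contributes (n_j - 1) * s_j^2
ssw = sum((n - 1) * s2 for n, s2 in zip(sizes, variances))
n_total, k = sum(sizes), len(sizes)
msw = ssw / (n_total - k)  # within-group mean square, df = n_total - k

print(f"SSW = {ssw:.1f}, MSW = {msw:.3f}")  # SSW = 308.0, MSW = 25.667
```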
F-test
If H0 is true, the ratio MSB/MSW should be near 1 since both
MSB and MSW are estimating σ 2 . If Ha is true, the ratio
should be significantly larger than 1 since MSB tends to
overestimate σ 2
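A small simulation makes the "near 1" claim concrete. The sketch draws three groups of five from a single normal population, so $H_0$ is true by construction. The mean of 40, standard deviation of 5 and seed are arbitrary choices, not values from the example. It then computes F = MSB/MSW many times. With 12 denominator degrees of freedom, the long-run average of F under $H_0$ is $df_2/(df_2 - 2) = 1.2$, and about 5% of the values should exceed the critical value 3.89.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F = MSB / MSW for a list of groups of observations."""
    all_obs = [x for g in groups for x in g]
    n, k = len(all_obs), len(groups)
    grand = sum(all_obs) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k))


random.seed(1)
# Under H0: three groups of 5, all drawn from the SAME normal population
f_values = [
    f_statistic([[random.gauss(40, 5) for _ in range(5)] for _ in range(3)])
    for _ in range(10_000)
]
share_above = sum(f > 3.89 for f in f_values) / len(f_values)
print(f"average F under H0: {sum(f_values) / len(f_values):.2f}")
print(f"share of F values above 3.89: {share_above:.3f}")
```

Shifting one group's mean (for example, drawing it with `random.gauss(48, 5)`) makes MSB grow. The F values then move well above 1, which is the behaviour described for $H_a$.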
Rejection Rule
Assuming α = 0.05, $F_{0.05} = 3.89$ (2 d.f. numerator, 12 d.f.
denominator).
Reject $H_0$ if F > 3.89.
Test Statistic
$F = MSB/MSW = 245/25.667 = 9.55$
Conclusion
$F = 9.55 > F_{0.05} = 3.89$, so we reject $H_0$. The mean number of
hours worked per week by department managers is not the same
at each plant.
ANOVA Table
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Groups | 490 | 2 | 245 | 9.55
Within Groups | 308 | 12 | 25.667 |
Total | 798 | 14 | |
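Deriving the table from its inputs makes it harder to mix up rows. The sketch below builds it from the two sums of squares and their degrees of freedom. The function `anova_table` is hypothetical. The Total row is the sum of the other two, and F appears only on the between-groups row.

```python
def anova_table(ss_between, ss_within, df_between, df_within):
    """Assemble one-way ANOVA table rows: (source, SS, df, MS, F)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return [
        ("Between Groups", ss_between, df_between, ms_between, ms_between / ms_within),
        ("Within Groups", ss_within, df_within, ms_within, None),
        ("Total", ss_between + ss_within, df_between + df_within, None, None),
    ]


# Reed Manufacturing: 3 plants, 15 managers in total
for source, ss, df, ms, f in anova_table(490, 308, 2, 12):
    ms_txt = f"{ms:.3f}" if ms is not None else ""
    f_txt = f"{f:.2f}" if f is not None else ""
    print(f"{source:<15}{ss:>8}{df:>5}{ms_txt:>10}{f_txt:>7}")
```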
Group Assignment