Unit 4-1

Statistical Inference: Analysis of Variance (ANOVA)

By Wilhemina Adoma Pels
Department of Statistics and Actuarial Science
KNUST

July 1, 2024
Introduction to Analysis of Variance (ANOVA)

What is an ANOVA?
An ANOVA test is a type of statistical test used to determine whether there is a statistically significant difference between the means of two or more categorical groups. It tests for differences of means by analysing variance.
ANOVA

• ANOVA is used to compare means across multiple groups or conditions.

• It is a statistical technique for making inferences about population means based on sample data.

• ANOVA allows one to determine whether the differences between the samples are simply due to random error (sampling error) or whether there are systematic treatment effects that cause the mean in one group to differ from the mean in another.
Assumptions of ANOVA

• Normality: Observations within each group follow a normal distribution.

• Independence: Observations are independent of each other. This means that subjects in the first group cannot also be in the second group (e.g., independent samples / between-groups design).

• Homogeneity of variance (homoscedasticity): the spread of scores (measured, for example, by the range or the standard deviation) is similar across the populations.

• Equal sample sizes (a balanced design) are not strictly required for a one-way ANOVA, but they make the F-test more robust to moderate violations of the equal-variance assumption.
Hypotheses in ANOVA

Null Hypothesis
Null Hypothesis ($H_0$): The population means of all the groups are equal.
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_t$
All population means are equal
No treatment effect

Alternative Hypothesis
Alternative Hypothesis ($H_A$): At least one population mean differs from the others.
$H_A$: At least one $\mu_i$ is different
At least one population is different
Treatment effect
NOT $H_A: \mu_1 \neq \mu_2 \neq \mu_3 \neq \cdots \neq \mu_t$ (the alternative does not claim that every mean differs from every other mean)
ANOVA Test Statistic

• The F-test is used as the test statistic in ANOVA.

• It compares the between-group variance to the within-group variance.

• The formula for calculating the F-statistic is:

$$F = \frac{\text{Between-group variance}}{\text{Within-group variance}}$$
ANOVA Test Procedure

1. Formulate hypotheses.
2. Calculate the test statistic (F-statistic).
3. Determine the p-value.
4. Compare the p-value to the significance level (e.g., α = 0.05).
5. Make a conclusion based on the p-value and significance level.
Interpreting ANOVA Results

• If the p-value is less than the significance level, reject the null hypothesis and conclude that there is a significant difference between at least one pair of group means.

• If the p-value is greater than the significance level, fail to reject the null hypothesis and conclude that there is not enough evidence to support a significant difference between group means.
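The decision step of the procedure can be sketched in Python as follows. This is a minimal illustration: the function name `anova_decision` is hypothetical, the p-values passed in are illustrative rather than taken from an example, and a p-value exactly equal to α is treated here as "fail to reject" (conventions differ on that boundary case).

```python
def anova_decision(p_value: float, alpha: float = 0.05) -> str:
    """Compare an ANOVA p-value with the significance level alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < alpha:
        # Significant: at least one pair of population means differs
        return "reject H0"
    # Not significant: not enough evidence of a difference (H0 is not proven true)
    return "fail to reject H0"


print(anova_decision(0.01))  # reject H0
print(anova_decision(0.30))  # fail to reject H0
```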
Types of ANOVA

One-way-Anova

• A one-way ANOVA (analysis of variance) has one categorical independent variable (also known as a factor) and a normally distributed continuous (i.e., interval or ratio level) dependent variable.
• A one-way ANOVA is used to compare the means of two or more independent (unrelated) groups using the F-distribution. The null hypothesis for the test is that all the group means are equal. Therefore, a significant result means that at least one mean differs from the others.

In general, ANOVA is used to study the effect of one or more qualitative variables on a quantitative outcome variable; a one-way ANOVA studies exactly one such variable.
One-way ANOVA example

Note: Qualitative variables are referred to as factors.

Scenario
As a food scientist, you want to test the effect of three different additive mixtures on yoghurt making. You can use a one-way ANOVA to find out whether the shelf life of the yoghurt differs between the three groups.
When to use a one-way ANOVA

• Use a one-way ANOVA when you have collected data about one categorical independent variable and one quantitative dependent variable. The independent variable should typically have at least three levels (i.e., at least three different groups or categories); with only two groups, a one-way ANOVA gives the same result as a pooled-variance independent-samples t-test (F = t²).

• ANOVA tells you whether the dependent variable changes according to the level of the independent variable. For example:

• Your independent variable is brand of soda, and you collect data on Coke, Pepsi, Sprite, and Fanta to find out whether there is a difference in the price per 100 ml.
Null Hypothesis
The null hypothesis ($H_0$) of ANOVA is that there is no difference among the population group means, i.e., all the groups have equal means:
$H_0: \mu_1 = \mu_2 = \mu_3$

Alternative Hypothesis
The alternative hypothesis ($H_a$) is that at least one group mean differs significantly from the others.

The level of significance is selected as α = 0.05.
Why not several t-tests

• Imagine we have a design with three groups that have to be compared: G1, G2, G3.

• We would have to run several separate t-tests (one to compare G1 with G2, one to compare G1 with G3, and one to compare G2 with G3).

• For every test we use a general α-level of 0.05.
Cont’d

• α-level = 0.05

• 5% probability of making a Type I error, i.e., rejecting $H_0$ when $H_0$ is actually true.

• Our aim is to reduce the probability of making a Type I error.

• If we were to run 3 separate t-tests to compare G1, G2 and G3, each with an α-level of 0.05, the overall probability of not making a Type I error would be $(0.95)^3 = 0.857$ (treating the three tests as independent).

• Subtracting that from 1 (i.e., 100%) gives the probability of making at least one Type I error: $1 - 0.857 = 0.143$.
• We have about a 14% chance of making a Type I error.
• 14% is much greater than the usual 5%.
• We can't be happy with that.
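The arithmetic above generalizes to any number of groups. The sketch below (the function name `familywise_error` is hypothetical) computes the chance of at least one Type I error across all pairwise t-tests. Like the slide, it treats the tests as independent; pairwise tests that share groups are not truly independent, so the figures are an approximation.

```python
from math import comb


def familywise_error(alpha: float, n_tests: int) -> float:
    """P(at least one Type I error) over n_tests independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


for n_groups in (3, 4, 5):
    n_tests = comb(n_groups, 2)  # number of pairwise t-tests needed
    rate = familywise_error(0.05, n_tests)
    print(n_groups, n_tests, round(rate, 3))
# 3 3 0.143
# 4 6 0.265
# 5 10 0.401
```

The error rate grows quickly with the number of groups, which is why a single F-test is preferred over many t-tests.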
How does an ANOVA test work

• ANOVA determines whether the groups created by the levels of the independent variable are statistically different by calculating whether the means of the treatment levels are different from the overall mean of the dependent variable.

• If any of the group means is significantly different from the overall mean, then the null hypothesis is rejected.

• ANOVA uses the F-test for statistical significance. This allows for comparison of multiple means at once, because the error is calculated for the whole set of comparisons rather than for each individual two-way comparison (which would happen with a t-test).
The F-test compares the variation of the group means around the overall mean (between-group variance) with the variation of the observations within each group (within-group variance). If the variance within groups is small relative to the variance between groups, the F-test will give a higher F-value, and the observed difference is therefore less likely to be due to chance alone.
Computing the one way ANOVA

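Here is the basic one-way ANOVA table (N observations in c groups):

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square            F
Between Groups        SSTR             c − 1                MSTR = SSTR/(c − 1)    F = MSTR/MSE
Within Groups         SSE              N − c                MSE = SSE/(N − c)
Total                 SST              N − 1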
We can see that there are two different sources of variation that
an ANOVA measures:

• Between-Group Variation: the total variation between each group mean and the overall mean.

• Within-Group Variation: the total variation of the individual values in each group around their group mean, also called "unexplained" random error.

• If the between-group variation is high relative to the within-group variation, then the F-statistic of the ANOVA will be higher and the corresponding p-value will be lower, which makes it more likely that we'll reject the null hypothesis that the group means are equal.
Degrees of freedom

• Total: if there are n total data points collected, then there are n − 1 total degrees of freedom.

• Between groups: if there are m groups being compared, then there are m − 1 degrees of freedom associated with the factor of interest (number of groups minus one).

• Within groups (error): subtract the between-groups degrees of freedom from the total degrees of freedom. If there are n total data points collected and m groups being compared, then there are (n − 1) − (m − 1) = n − m error degrees of freedom.
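The degree-of-freedom bookkeeping above fits in a short helper. This is a sketch using the slide's notation (n observations, m groups); the function name `anova_degrees_of_freedom` is hypothetical.

```python
def anova_degrees_of_freedom(n: int, m: int) -> dict:
    """Degrees of freedom for a one-way ANOVA with n observations in m groups."""
    if m < 2 or n <= m:
        raise ValueError("need at least 2 groups and more observations than groups")
    df_between = m - 1
    df_within = n - m
    df_total = n - 1
    # The partition always adds up: (m - 1) + (n - m) = n - 1
    assert df_between + df_within == df_total
    return {"between": df_between, "within": df_within, "total": df_total}


print(anova_degrees_of_freedom(9, 3))   # {'between': 2, 'within': 6, 'total': 8}
print(anova_degrees_of_freedom(15, 3))  # {'between': 2, 'within': 12, 'total': 14}
```

The two calls match the two worked examples in this unit (9 crash tests in 3 car types; 15 managers at 3 plants).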
Example

Suppose the National Transportation Safety Board (NTSB) wants to examine the safety of compact cars, midsize cars, and full-size cars. It collects a sample of three for each of the treatments (car types). Using the hypothetical data provided below, test whether the mean pressure applied to the driver's head during a crash test is equal for each type of car. Use α = 5%.

                      Compact cars   Midsize cars   Full-size cars
                      643            469            484
                      655            427            456
                      702            525            402
Sample mean (X̄)       666.67         473.67         447.33
Std. deviation (S)    31.18          49.17          41.68
Solution
1. State the null and alternative hypotheses. The null hypothesis for an ANOVA always assumes the population means are equal. Hence, we may write the null hypothesis as:
$H_0: \mu_1 = \mu_2 = \mu_3$
This means the mean head pressure is statistically equal across the three types of cars.
• Since the null hypothesis assumes all the means are equal, we could reject the null hypothesis if even one mean is not equal. Thus, the alternative hypothesis is:
$H_a$: At least one mean pressure is not statistically equal.
2. Calculate the appropriate test statistic. The test statistic in ANOVA is the ratio of the between and within variation in the data. When $H_0$ is true, it follows an F distribution.
Solution

Total Sum of Squares: the total variation in the data. It is the sum of the between and within variation.

• Total Sum of Squares: $SST = \sum_{i=1}^{r}\sum_{j=1}^{c}\left(X_{ij} - \bar{\bar{X}}\right)^2$

where r is the number of rows in the table, c is the number of columns, $\bar{\bar{X}}$ is the grand mean, and $X_{ij}$ is the $i$th observation in the $j$th column. Using the data in the table we may find the grand mean:

$$\bar{\bar{X}} = \frac{\sum X_{ij}}{N} = \frac{643 + 655 + 702 + 469 + 427 + 525 + 484 + 456 + 402}{9} = 529.22$$

$$SST = (643 - 529.22)^2 + (655 - 529.22)^2 + (702 - 529.22)^2 + (469 - 529.22)^2 + \cdots + (402 - 529.22)^2 = 96303.55$$
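The grand mean and SST can be checked with a few lines of Python. This is a minimal sketch using the crash-test data from the table; the variable names are illustrative. Rounded to two decimals the script gives 96303.56; the slide's 96303.55 is the same value (96303.555…) truncated.

```python
compact = [643, 655, 702]
midsize = [469, 427, 525]
full_size = [484, 456, 402]

all_obs = compact + midsize + full_size
N = len(all_obs)
grand_mean = sum(all_obs) / N

# Total variation: every observation measured against the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_obs)

print(round(grand_mean, 2))  # 529.22
print(round(sst, 2))         # 96303.56
```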
Solution cont’d

Between Sum of Squares (or Treatment Sum of Squares): variation in the data between the different samples (or treatments).

• Treatment Sum of Squares: $SSTR = \sum_{j} r_j\left(\bar{X}_j - \bar{\bar{X}}\right)^2$

where $r_j$ is the number of rows in the $j$th treatment and $\bar{X}_j$ is the mean of the $j$th treatment. Using the data in the table (with the unrounded treatment means):

$$SSTR = 3 \times (666.67 - 529.22)^2 + 3 \times (473.67 - 529.22)^2 + 3 \times (447.33 - 529.22)^2 = 86049.55$$
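The treatment sum of squares can be reproduced as below. This sketch keeps full precision throughout, which is why it matches the slide's 86049.55 (plugging in the two-decimal means instead would give about 86052.8). The dictionary layout is an illustrative choice, not part of the original method.

```python
groups = {
    "compact": [643, 655, 702],
    "midsize": [469, 427, 525],
    "full_size": [484, 456, 402],
}

all_obs = [x for obs in groups.values() for x in obs]
grand_mean = sum(all_obs) / len(all_obs)

# Each treatment contributes r_j * (treatment mean - grand mean)^2
sstr = 0.0
for obs in groups.values():
    r_j = len(obs)
    mean_j = sum(obs) / r_j
    sstr += r_j * (mean_j - grand_mean) ** 2

print(round(sstr, 2))  # 86049.56
```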
Solution Cont’d

Within variation (or Error Sum of Squares): variation of the data within each individual treatment.

Error Sum of Squares: $SSE = \sum_{i}\sum_{j}\left(X_{ij} - \bar{X}_j\right)^2$

From the table,

$$SSE = \left[(643 - 666.67)^2 + (655 - 666.67)^2 + (702 - 666.67)^2\right] + \left[(469 - 473.67)^2 + (427 - 473.67)^2 + (525 - 473.67)^2\right] + \left[(484 - 447.33)^2 + (456 - 447.33)^2 + (402 - 447.33)^2\right] = 10254$$

Note that

$SST = SSTR + SSE$: $96303.55 = 86049.55 + 10254$
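A short script can compute SSE and confirm the identity SST = SSTR + SSE on the crash-test data. This is a sketch; `math.isclose` is used because the three sums are floating-point values.

```python
import math

groups = [[643, 655, 702], [469, 427, 525], [484, 456, 402]]

all_obs = [x for g in groups for x in g]
grand_mean = sum(all_obs) / len(all_obs)

# Within variation: each observation against its own treatment mean
sse = 0.0
for obs in groups:
    mean_j = sum(obs) / len(obs)
    sse += sum((x - mean_j) ** 2 for x in obs)

# Total and between variation, for the identity check
sst = sum((x - grand_mean) ** 2 for x in all_obs)
sstr = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

print(round(sse, 2))                  # 10254.0
print(math.isclose(sst, sstr + sse))  # True
```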
• Hence, you only need to compute any two of three sources
of variation to conduct an ANOVA. Especially for the first few
problems you work out, you should calculate all three for practice.
• The next step in an ANOVA is to compute the “average”
sources of variation in the data using SST, SSTR, and SSE.
• Total Mean Square: $MST = \frac{SST}{N - 1}$ → "average total variation in the data" (N is the total number of observations)
$$MST = \frac{96303.55}{9 - 1} = 12037.94$$
• Mean Square Treatment: $MSTR = \frac{SSTR}{c - 1}$ → "average between variation" (c is the number of columns in the data table)
$$MSTR = \frac{86049.55}{3 - 1} = 43024.78$$
• Mean Square Error: $MSE = \frac{SSE}{N - c}$ → "average within variation"
$$MSE = \frac{10254}{9 - 3} = 1709$$
The test statistic may now be calculated. For a one-way ANOVA the test statistic is equal to the ratio of MSTR and MSE. This is the ratio of the "average between variation" to the "average within variation". When $H_0$ is true, this ratio follows an F distribution. Hence,
$$F = \frac{MSTR}{MSE} = \frac{43024.78}{1709} = 25.17$$
The intuition here is relatively straightforward. If the average between variation rises relative to the average within variation, the F statistic will rise and so will our chance of rejecting the null hypothesis.
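The mean squares and the F ratio follow directly from the three sums of squares. This sketch hard-codes the sums from the previous steps at four decimals (an assumption made to keep the block self-contained). At full precision F is 25.175…, which rounds to 25.18; the slide's 25.17 is the truncated value.

```python
N, c = 9, 3  # total observations, number of treatments (columns)

# Sums of squares carried over from the previous steps (four decimals)
sst, sstr, sse = 96303.5556, 86049.5556, 10254.0

mst = sst / (N - 1)    # "average total variation"
mstr = sstr / (c - 1)  # "average between variation"
mse = sse / (N - c)    # "average within variation"
f_stat = mstr / mse

print(round(mst, 2), round(mstr, 2), round(mse, 2))  # 12037.94 43024.78 1709.0
print(round(f_stat, 2))                              # 25.18
```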
Obtain the Critical Value
To find the critical value from an F distribution you must know the degrees of freedom for the numerator (MSTR) and for the denominator (MSE), along with the significance level.
• $F_{CV}$ has $df_1$ and $df_2$ degrees of freedom, where $df_1$ is the numerator degrees of freedom, equal to $m - 1$, and $df_2$ is the denominator degrees of freedom, equal to $n - m$.
• In our example, $df_1 = 3 - 1 = 2$ and $df_2 = 9 - 3 = 6$. Hence we need to find the critical value of F corresponding to α = 5%. Using the F tables in your text we determine that $F_{2,6} = 5.14$.
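Both examples in this unit have a numerator df of 2, and for that special case the F distribution has a closed form: $P(F > f) = (1 + 2f/df_2)^{-df_2/2}$. Inverting it gives the critical value $f = \frac{df_2}{2}\left(\alpha^{-2/df_2} - 1\right)$. The sketch below (function names hypothetical) uses these formulas to reproduce the table value and to obtain a p-value; it is valid only when $df_1 = 2$. For general degrees of freedom a library routine such as SciPy's `scipy.stats.f.ppf` and `scipy.stats.f.sf` would be needed instead.

```python
def f_sf_df1_2(f: float, df2: int) -> float:
    """P(F > f) for an F(2, df2) variable; closed form valid only when df1 = 2."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


def f_crit_df1_2(alpha: float, df2: int) -> float:
    """Upper-alpha critical value of F(2, df2), found by inverting f_sf_df1_2."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


print(round(f_crit_df1_2(0.05, 6), 2))  # 5.14, matching F(2,6) from the table
print(round(f_sf_df1_2(25.17, 6), 4))   # 0.0012, p-value for the crash-test example
```

The p-value of about 0.0012 is far below α = 0.05, which agrees with the critical-value comparison.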
One-way-ANOVA Decision Criteria
• If $F_{calculated} > F_{critical}$, we reject the null hypothesis and accept the alternative hypothesis that at least two of the group means differ.
• If $F_{calculated} < F_{critical}$, we fail to reject the null hypothesis and conclude that there is not enough evidence of a difference between the group means.
Decision rule per example
In our example 25.17 > 5.14, so we reject the null hypothesis.

Interpretation
Since we rejected the null hypothesis at the 5% significance level, we conclude that the mean head pressure is not statistically equal for compact, midsize, and full-size cars. However, since only one mean must be different to reject the null, we do not yet know which mean(s) is/are different. In short, an ANOVA test will tell us that at least one mean is different, but an additional (post hoc) test must be conducted to determine which mean(s) is/are different.
Example 2: Reed Manufacturing

J. R. Reed would like to know if the mean number of hours worked per week is the same for the department managers at her three manufacturing plants (Buffalo, Plant 1; Pittsburgh, Plant 2; and Detroit, Plant 3). A simple random sample of 5 managers from each of the three plants was taken, and the number of hours worked by each manager for the previous week is shown on the next slide.
Example 2: Reed Manufacturing

Observation       Plant 1 (Buffalo)   Plant 2 (Pittsburgh)   Plant 3 (Detroit)
1                 48                  73                     51
2                 54                  63                     63
3                 57                  66                     61
4                 54                  64                     54
5                 62                  74                     56
Sample Mean       55                  68                     57
Sample Variance   26.0                26.5                   24.5
Solution

Hypothesis
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: not all the means are equal

where:
$\mu_1$ = mean number of hours worked per week by the managers at Plant 1
$\mu_2$ = mean number of hours worked per week by the managers at Plant 2
$\mu_3$ = mean number of hours worked per week by the managers at Plant 3
Cont’d

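Mean Square Between
Since the sample sizes are all equal, the grand mean is the average of the sample means:
$\bar{\bar{x}} = (55 + 68 + 57)/3 = 60$

$SSB = 5(55 - 60)^2 + 5(68 - 60)^2 + 5(57 - 60)^2 = 490$
$MSB = 490/(3 - 1) = 245$

Mean Square Within
Each sample contributes $(n_j - 1)s_j^2$, with $n_j = 5$:
$SSW = 4(26.0) + 4(26.5) + 4(24.5) = 308$
$MSW = 308/(15 - 3) = 25.667$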
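Because only summary statistics are used here, this step can be reproduced from the sample means and variances alone. A minimal sketch, assuming (as in this example) equal sample sizes, which is what allows the grand mean to be taken as the plain average of the sample means.

```python
n = 5                           # managers sampled per plant (equal sizes assumed)
means = [55, 68, 57]            # sample means for Plants 1, 2, 3
variances = [26.0, 26.5, 24.5]  # sample variances for Plants 1, 2, 3
k = len(means)                  # number of plants (groups)

grand_mean = sum(means) / k     # shortcut valid only for equal sample sizes
ssb = sum(n * (m - grand_mean) ** 2 for m in means)
ssw = sum((n - 1) * v for v in variances)  # each group adds (n_j - 1) * s_j^2

msb = ssb / (k - 1)
msw = ssw / (k * n - k)         # N - k within-group degrees of freedom

print(grand_mean, ssb, ssw)     # 60.0 490.0 308.0
print(msb, round(msw, 3))       # 245.0 25.667
```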
F-test
If $H_0$ is true, the ratio MSB/MSW should be near 1, since both MSB and MSW are estimating $\sigma^2$. If $H_a$ is true, the ratio should be significantly larger than 1, since MSB tends to overestimate $\sigma^2$.

Rejection Rule
Assuming α = 0.05, $F_{0.05} = 3.89$ (2 d.f. numerator, 12 d.f. denominator).
Reject $H_0$ if F > 3.89.

Test Statistic
F = MSB/MSW = 245/25.667 = 9.55
Conclusion
F = 9.55 > $F_{0.05}$ = 3.89, so we reject $H_0$. The mean number of hours worked per week by department managers is not the same at each plant.

ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
Between Groups        490              2                    245           9.55
Within Groups         308              12                   25.667
Total                 798              14
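As a cross-check, the whole ANOVA table can be rebuilt from the raw hours rather than from the summary statistics. The sketch below (function name `one_way_anova_table` is hypothetical) works for any number of groups and for unequal group sizes; the Total row is obtained from the identity SST = SSB + SSW.

```python
def one_way_anova_table(groups):
    """Return (source, SS, df, MS, F) rows for a one-way ANOVA on raw data."""
    all_obs = [x for g in groups for x in g]
    n, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n
    means = [sum(g) / len(g) for g in groups]

    # Between: group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within: each observation around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_stat = ms_between / ms_within

    return [
        ("Between Groups", ss_between, df_between, ms_between, f_stat),
        ("Within Groups", ss_within, df_within, ms_within, None),
        ("Total", ss_between + ss_within, n - 1, None, None),
    ]


reed = [
    [48, 54, 57, 54, 62],  # Plant 1 (Buffalo)
    [73, 63, 66, 64, 74],  # Plant 2 (Pittsburgh)
    [51, 63, 61, 54, 56],  # Plant 3 (Detroit)
]

for source, ss, df, ms, f in one_way_anova_table(reed):
    ms_txt = f"{ms:.3f}" if ms is not None else ""
    f_txt = f"{f:.2f}" if f is not None else ""
    print(f"{source:<16}{ss:>8.1f}{df:>4}{ms_txt:>10}{f_txt:>7}")
```

The printed rows agree with the table above: 490 and 308 for the between and within sums of squares, 2 and 12 degrees of freedom, and F ≈ 9.55.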
Group Assignment
