AS Lecture 10 (Anova Test)
Lecture # 10
One sample, two sample, or paired t test
• If the groups come from a single population (e.g., measuring
before and after an experimental treatment), perform
a paired t test. This is a within-subjects design.
• If the groups come from two different populations (e.g., two
different species, or people from two separate cities), perform
a two-sample t test (a.k.a. independent t test). This is a between-
subjects design.
• If there is one group being compared against a standard value
(e.g., comparing the acidity of a liquid to a neutral pH of 7),
perform a one-sample t test.
ANOVA
• ANOVA, which stands for Analysis of Variance, is a statistical
test used to analyze the difference between the means of more
than two groups.
• A one-way ANOVA uses one independent variable, while a two-
way ANOVA uses two independent variables.
One Way ANOVA
• Use a one-way ANOVA when you have collected data about
one categorical independent variable and one quantitative
dependent variable. The independent variable should have at least
three levels (i.e. at least three different groups or categories).
• ANOVA tells you if the dependent variable changes according to the
level of the independent variable. For example:
• Your independent variable is social media use, and you assign groups
to low, medium, and high levels of social media use to find out if there is a
difference in hours of sleep per night.
• Your independent variable is brand of soda, and you collect data
on Coke, Pepsi, Sprite, and Fanta to find out if there is a difference in
the price per 100ml.
• Your independent variable is type of fertilizer, and you treat crop fields
with mixtures 1, 2, and 3 to find out if there is a difference in crop yield.
One Way ANOVA
• The null hypothesis (H0) of ANOVA is that there is no difference
among group means.
• The alternative hypothesis (Ha) is that at least one group
differs significantly from the overall mean of the dependent
variable.
• If you only want to compare two groups, use a t test instead.
How does an ANOVA test work?
• ANOVA determines whether the groups created by the levels of the
independent variable are statistically different by calculating whether the
means of the treatment levels are different from the overall mean of the
dependent variable.
• If any of the group means is significantly different from the overall mean,
then the null hypothesis is rejected.
• ANOVA uses the F test for statistical significance. This allows for
comparison of multiple means at once, because the error is calculated for
the whole set of comparisons rather than for each individual two-way
comparison (which would happen with a t test).
• The F test compares the variance between the group means with the
variance within the groups. If the variance within groups is smaller than the
variance between groups, the F test will find a higher F value, and
therefore a higher likelihood that the difference observed is real and not
due to chance.
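The F ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `f_statistic` and both toy data sets are assumptions made up for this sketch, chosen so that the group means are identical while the spread inside each group differs.

```python
from statistics import mean

def f_statistic(groups):
    """Return the one-way ANOVA F value for a list of groups (lists of numbers)."""
    k = len(groups)                          # number of groups (levels)
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = mean(x for g in groups for x in g)

    # Between-group variability: distance of each group mean from the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical data: same group means (2, 3, 4), different within-group spread.
tight = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
noisy = [[-2, 2, 6], [-1, 3, 7], [0, 4, 8]]

print(f_statistic(tight))  # small within-group variance gives the larger F
print(f_statistic(noisy))  # large within-group variance gives the smaller F
```

Because the group means are the same in both data sets, the only thing that changes the F value here is the within-group variance.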
One Way ANOVA: Example
Three varieties of wheat are sown in four plots each, and the yields are
recorded as shown below. Conduct an ANOVA to analyze the differences between
the wheat varieties.

| Varieties of Wheat | $P_1$ | $P_2$ | $P_3$ | $P_4$ |
|---|---|---|---|---|
| A | 5 | 3 | 4 | 2 |
| B | 4 | 4 | 3 | 3 |
| C | 3 | 2 | 5 | 2 |

$H_0$: The difference between the 3 types of wheat is not significant,
or
$H_0: \mu_1 = \mu_2 = \mu_3$
ANOVA Table

| Source | $df$ | Sum of Squares ($SS$) | Mean Sum of Squares ($MSS$) | F-statistic ($F_c$) |
|---|---|---|---|---|
| Between Group | 2 | | | |
| Within Group | 9 | | | |
| Total | 11 | | | |

• $df_{between} = k - 1 = 3 - 1 = 2$
• $df_{total} = N - 1 = 12 - 1 = 11$
• $df_{within} = df_{total} - df_{between} = 11 - 2 = 9$
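One Way ANOVA: Example
Correction Factor:
In ANOVA, the correction factor is the square of the grand total divided by
the number of observations, $CF = T^2/N$ (equivalently, $N$ times the squared
grand mean). Subtracting it from a raw sum of squares measures variability
around the grand mean, so it serves as a reference point for evaluating the
variability within and between groups, allowing us to assess whether the
observed differences are likely due to the treatment effect or simply random
variation.

| Varieties of Wheat | $P_1$ | $P_2$ | $P_3$ | $P_4$ | $T_i$ | $T_i^2$ |
|---|---|---|---|---|---|---|
| A | 5 | 3 | 4 | 2 | 14 | 196 |
| B | 4 | 4 | 3 | 3 | 14 | 196 |
| C | 3 | 2 | 5 | 2 | 12 | 144 |
| Total | | | | | 40 | |

$CF = \frac{T^2}{N} = \frac{40^2}{12} = 133.33$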
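Why subtracting $CF = T^2/N$ turns a raw sum of squares into a sum of squared deviations can be seen from a short derivation. The symbol $\bar{x} = T/N$ for the grand mean is notation introduced only for this sketch.

```latex
\begin{aligned}
\sum_i \sum_j (x_{ij} - \bar{x})^2
  &= \sum_i \sum_j x_{ij}^2 - 2\bar{x} \sum_i \sum_j x_{ij} + N\bar{x}^2 \\
  &= \sum_i \sum_j x_{ij}^2 - 2\bar{x}\,T + N\bar{x}^2 \\
  &= \sum_i \sum_j x_{ij}^2 - N\bar{x}^2
   = \sum_i \sum_j x_{ij}^2 - \frac{T^2}{N}
\end{aligned}
```

The second step uses $\sum_i \sum_j x_{ij} = T$ and the last step uses $T = N\bar{x}$, so the final term is exactly the correction factor.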
One Way ANOVA: Example
• $CF = 133.33$
• $SS_{total} = \sum_i \sum_j x_{ij}^2 - CF$
• $\sum_i \sum_j x_{ij}^2$ = sum of the squares of all values
  $= 5^2 + 3^2 + \cdots + 2^2 = 146$
• $SS_{total} = 146 - 133.33 = 12.67$
• $SS_{between} = \sum_i \frac{T_i^2}{n_i} - CF$
• $SS_{between} = \frac{196}{4} + \frac{196}{4} + \frac{144}{4} - 133.33 = 134 - 133.33 = 0.67$
• $SS_{within} = SS_{total} - SS_{between}$
• $SS_{within} = 12.67 - 0.67 = 12$
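ANOVA Table

| Source | $df$ | Sum of Squares ($SS$) | Mean Square ($MS$) | F-statistic ($F_c$) |
|---|---|---|---|---|
| Between Group | 2 | 0.67 | $MS_b = \frac{SS}{df} = \frac{0.67}{2} = 0.335$ | $F_c = \frac{MS_{between}}{MS_{within}} = \frac{0.335}{1.33} \approx 0.25$ |
| Within Group | 9 | 12 | $MS_w = \frac{SS}{df} = \frac{12}{9} = 1.33$ | |
| Total | 11 | 12.67 | | |

If $F_c \le F_t$, we fail to reject $H_0$.
From the F-table, $F_t(0.05;\, 2, 9) = 4.26$.
$0.25 < 4.26$
Here $F_c < F_t$, therefore we fail to reject $H_0$: the data give no evidence
of a difference between the wheat varieties.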
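The hand calculation for the wheat example can be reproduced with a short script. This is a sketch that follows the correction-factor formulas from the slides. The variable names are chosen for this illustration, and the critical value is entered as a constant read from a standard F-table rather than computed.

```python
# Yields from the wheat example: one list per variety, one value per plot.
yields = {
    "A": [5, 3, 4, 2],
    "B": [4, 4, 3, 3],
    "C": [3, 2, 5, 2],
}

N = sum(len(v) for v in yields.values())    # total number of observations
k = len(yields)                             # number of varieties (groups)
T = sum(sum(v) for v in yields.values())    # grand total

cf = T ** 2 / N                             # correction factor
ss_total = sum(x ** 2 for v in yields.values() for x in v) - cf
ss_between = sum(sum(v) ** 2 / len(v) for v in yields.values()) - cf
ss_within = ss_total - ss_between

df_between, df_within = k - 1, N - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_c = ms_between / ms_within

F_TABLE_VALUE = 4.26  # F(0.05; 2, 9) taken from an F-table (assumption: not computed here)

print(f"CF={cf:.2f}  SS_total={ss_total:.2f}  "
      f"SS_between={ss_between:.2f}  SS_within={ss_within:.2f}")
print(f"MS_between={ms_between:.3f}  MS_within={ms_within:.3f}  F={f_c:.3f}")
print("reject H0" if f_c > F_TABLE_VALUE else "fail to reject H0")
```

Working with unrounded values gives a between-group mean square of about 0.333 and F = 0.25. The 0.335 in the hand calculation comes from rounding the between-group sum of squares to 0.67 before dividing.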
[F-distribution table of critical values, indexed by $df_{between}$ and $df_{within}$]
One way ANOVA test: Practice Problem
A company is testing three different brands of smartphones to see if
there is a significant difference in their battery life. They randomly
select four smartphones from each brand and record their battery life
(in hours). The data is as follows:

| Brand | Battery life (hours) |
|---|---|
| iPhone | 20, 18, 19, 21 |
| Samsung | 22, 23, 21, 20 |
| Nokia | 17, 18, 16, 19 |

Perform a one-way ANOVA analysis on the battery life data. Use a
significance level of 0.05. State your conclusions based on the results.
One way ANOVA test: Practice Problem
• $df_{between} = k - 1$
• $df_{total} = N - 1$
• $df_{within} = df_{total} - df_{between}$
• $CF = \frac{T^2}{N}$
• $SS_{total} = \sum_i \sum_j x_{ij}^2 - CF$
• $SS_{between} = \sum_i \frac{T_i^2}{n_i} - CF$
• $SS_{within} = SS_{total} - SS_{between}$

| Source | $df$ | $SS$ | $MSS$ | F-statistic ($F_c$) |
|---|---|---|---|---|
| Between Group | | | | |
| Within Group | | | | |
| Total | | | | |
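After working the practice problem by hand, the result can be checked with a small helper. The sketch below is one possible way to do that. The function name `one_way_anova_table` and its return format are assumptions made for this illustration, and it simply applies the formulas listed above, allowing groups of unequal size.

```python
def one_way_anova_table(groups):
    """Build one-way ANOVA table rows using the correction-factor formulas.

    `groups` maps a group label to its list of observations.
    Each row is (source, df, SS, MS, F); unused cells are None.
    """
    values = [x for obs in groups.values() for x in obs]
    n_total, k = len(values), len(groups)
    cf = sum(values) ** 2 / n_total
    ss_total = sum(x * x for x in values) - cf
    ss_between = sum(sum(obs) ** 2 / len(obs) for obs in groups.values()) - cf
    ss_within = ss_total - ss_between
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return [
        ("Between Group", df_between, ss_between, ms_between, ms_between / ms_within),
        ("Within Group", df_within, ss_within, ms_within, None),
        ("Total", n_total - 1, ss_total, None, None),
    ]

# Battery life data (hours) from the practice problem.
battery_life = {
    "iPhone": [20, 18, 19, 21],
    "Samsung": [22, 23, 21, 20],
    "Nokia": [17, 18, 16, 19],
}

for source, df, ss, ms, f in one_way_anova_table(battery_life):
    ms_text = "" if ms is None else f"{ms:.3f}"
    f_text = "" if f is None else f"{f:.3f}"
    print(f"{source:<14}{df:>4}{ss:>10.3f}{ms_text:>10}{f_text:>10}")
```

The printed F value still has to be compared against the F-table entry for the matching degrees of freedom at the 0.05 level to state a conclusion.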
Two Way ANOVA - Example
• You are researching which type of fertilizer and planting
density produces the greatest crop yield in a field experiment.
You assign different plots in a field to a combination of fertilizer
type (1, 2, or 3) and planting density (1=low density, 2=high
density), and measure the final crop yield in bushels per acre at
harvest time.
• You can use a two-way ANOVA to find out if fertilizer type and
planting density have an effect on average crop yield.
When to use a Two-Way ANOVA
• You can use a two-way ANOVA when you have collected data on a
quantitative dependent variable at multiple levels of two categorical
independent variables.
• A quantitative variable represents amounts or counts of things. It can be
divided to find a group mean.
• Bushels per acre is a quantitative variable because it represents the
amount of crop produced. It can be divided to find the average bushels
per acre.
• A categorical variable represents types or categories of things. A level is
an individual category within the categorical variable.
• Fertilizer types 1, 2, and 3 are levels within the categorical
variable fertilizer type. Planting densities 1 and 2 are levels within the
categorical variable planting density.
• Both of your independent variables should be categorical. If one of your
independent variables is categorical and one is quantitative, use an
ANCOVA instead.
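The terms above (quantitative dependent variable, categorical independent variables, levels, group mean) can be made concrete with a small data layout. In this sketch the yield numbers are invented purely for illustration and the helper `means_by_level` is a hypothetical name. Only the factor structure (fertilizer types 1 to 3, planting densities 1 and 2) comes from the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (fertilizer type, planting density, yield in bushels per acre).
# The yield values are made up for illustration only.
observations = [
    (1, 1, 176.0), (1, 2, 178.5),
    (2, 1, 177.2), (2, 2, 179.1),
    (3, 1, 178.4), (3, 2, 180.3),
]

def means_by_level(rows, factor_index):
    """Average the quantitative response (last field) within each level of one factor."""
    by_level = defaultdict(list)
    for row in rows:
        by_level[row[factor_index]].append(row[-1])
    return {level: mean(vals) for level, vals in sorted(by_level.items())}

print(means_by_level(observations, 0))  # one group mean per fertilizer level
print(means_by_level(observations, 1))  # one group mean per planting-density level
```

Each categorical variable only labels the groups, while the quantitative variable is the one that gets averaged. This is why both independent variables must be categorical for a two-way ANOVA.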
Two Way ANOVA: Example
Four workers alternately work on four machines. The number of defective
goods that each worker produced on each machine is shown in the table below.
Use ANOVA to examine the difference between the machines and the difference
between the workers.

| Machines | $W_1$ | $W_2$ | $W_3$ | $W_4$ |
|---|---|---|---|---|
| $M_1$ | 10 | 15 | 7 | 12 |
| $M_2$ | 12 | 20 | 10 | 16 |
| $M_3$ | 14 | 7 | 9 | 10 |
| $M_4$ | 8 | 16 | 20 | 8 |

$H_0$:
• The difference between the 4 machines is not significant.
• The difference between the 4 workers is not significant.
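Two Way ANOVA: Example
• $N = 16$ (total elements)
• $h = 4$ (no. of values in each row)
• $k = 4$ (no. of values in each column)
• $\sum_i T_i^2 = 9604$
• $\sum_j T_j^2 = 9532$
• $\sum_i \sum_j x_{ij}^2$ = sum of the squares of all values
  $= 10^2 + 15^2 + \cdots + 8^2 = 2628$

| Machines | $W_1$ | $W_2$ | $W_3$ | $W_4$ | $T_i$ | $T_i^2$ |
|---|---|---|---|---|---|---|
| $M_1$ | 10 | 15 | 7 | 12 | 44 | 1936 |
| $M_2$ | 12 | 20 | 10 | 16 | 58 | 3364 |
| $M_3$ | 14 | 7 | 9 | 10 | 40 | 1600 |
| $M_4$ | 8 | 16 | 20 | 8 | 52 | 2704 |
| $T_j$ | 44 | 58 | 46 | 46 | 194 | 9604 |
| $T_j^2$ | 1936 | 3364 | 2116 | 2116 | 9532 | |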
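The row totals, column totals, and sums of squares for the machine and worker table can be recomputed with a few lines of Python. This is a sketch for checking the arithmetic, with variable names chosen for this illustration.

```python
# Defective goods per machine (rows M1..M4) and worker (columns W1..W4).
defects = [
    [10, 15, 7, 12],
    [12, 20, 10, 16],
    [14, 7, 9, 10],
    [8, 16, 20, 8],
]

row_totals = [sum(row) for row in defects]          # T_i, one per machine
col_totals = [sum(col) for col in zip(*defects)]    # T_j, one per worker
grand_total = sum(row_totals)                       # T

sum_ti_sq = sum(t ** 2 for t in row_totals)
sum_tj_sq = sum(t ** 2 for t in col_totals)
sum_x_sq = sum(x ** 2 for row in defects for x in row)

print(row_totals, col_totals, grand_total)  # [44, 58, 40, 52] [44, 58, 46, 46] 194
print(sum_ti_sq, sum_tj_sq, sum_x_sq)       # 9604 9532 2628
```

Here `zip(*defects)` transposes the table so that the same `sum` call used for rows also yields the column totals.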
Two Way ANOVA: Example
• $\sum_i T_i^2 = 9604$
• $\sum_j T_j^2 = 9532$
• $\sum_i \sum_j x_{ij}^2 = 2628$
• $T = 194$

Correction Factor
$CF = \frac{T^2}{N} = \frac{194^2}{16} = 2352.25$
Two Way ANOVA: Example
$CF = 2352.25$
Sum of Squares
• $SS_T = \sum_i \sum_j x_{ij}^2 - CF$
  $SS_T = 2628 - 2352.25 = 275.75$
• $SS_{between(row)} = SS_R = \frac{\sum_i T_i^2}{h} - CF$
  $SS_R = \frac{9604}{4} - 2352.25 = 48.75$
• $SS_{between(col.)} = SS_C = \frac{\sum_j T_j^2}{k} - CF$
  $SS_C = \frac{9532}{4} - 2352.25 = 30.75$
• $SS_{within(error)} = SS_E = SS_T - SS_R - SS_C$
  $SS_E = 275.75 - 48.75 - 30.75 = 196.25$
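The error sum of squares above is obtained by subtraction. As a cross-check, it can also be computed directly from the residuals left after removing the row (machine) and column (worker) effects. The sketch below uses the deviation-from-mean definitions instead of the correction-factor shortcut, which is a swapped-in but equivalent formulation. Variable names are chosen for this illustration.

```python
defects = [
    [10, 15, 7, 12],
    [12, 20, 10, 16],
    [14, 7, 9, 10],
    [8, 16, 20, 8],
]

h = len(defects[0])   # number of values in each row
k = len(defects)      # number of values in each column
n = h * k

grand_mean = sum(map(sum, defects)) / n
row_means = [sum(row) / h for row in defects]
col_means = [sum(col) / k for col in zip(*defects)]

ss_rows = h * sum((m - grand_mean) ** 2 for m in row_means)
ss_cols = k * sum((m - grand_mean) ** 2 for m in col_means)
ss_total = sum((x - grand_mean) ** 2 for row in defects for x in row)

# Residual of each cell after removing its row effect and column effect.
ss_error = sum(
    (x - row_means[i] - col_means[j] + grand_mean) ** 2
    for i, row in enumerate(defects)
    for j, x in enumerate(row)
)

print(round(ss_rows, 2), round(ss_cols, 2), round(ss_error, 2), round(ss_total, 2))
```

If the partition is correct, the directly computed error term equals the total minus the row and column terms, matching the values from the correction-factor method.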
Two Way ANOVA: Example
Degrees of Freedom
• $df_{total} = N - 1 = 16 - 1$
  $df_{total} = 15$
• $df_{between\ rows}$ = (number of rows) $- 1 = k - 1 = 4 - 1$
  $df_{between\ rows} = 3$
• $df_{between\ columns}$ = (number of columns) $- 1 = h - 1 = 4 - 1$
  $df_{between\ columns} = 3$
• $df_{error} = (h - 1)(k - 1) = 3 \times 3$
  $df_{error} = 9$
Since $h$ is the number of values in each row, it equals the number of
columns, and $k$ equals the number of rows. In this $4 \times 4$ table both
are 4, so the two between-group degrees of freedom coincide.
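ANOVA Table

| Sources of variation | $df$ | $SS$ | Mean Square | F-statistic ($F_c$) |
|---|---|---|---|---|
| Between machines (rows) | 3 | 48.75 | $\frac{48.75}{3} = 16.25$ | $\frac{16.25}{21.81} = 0.745$ |
| Between workers (columns) | 3 | 30.75 | $\frac{30.75}{3} = 10.25$ | $\frac{10.25}{21.81} = 0.470$ |
| Within (error) | 9 | 196.25 | $\frac{196.25}{9} = 21.81$ | |
| Total | 15 | 275.75 | | |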
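The mean squares and F statistics for the machine and worker example follow directly from the sums of squares and degrees of freedom computed on the previous slides. The sketch below fills in those two columns. The dictionary layout is an assumption made for this illustration, and the critical value is entered as a constant read from a standard F-table rather than computed.

```python
# Sums of squares and degrees of freedom from the worked example.
ss = {"Between machines (rows)": 48.75,
      "Between workers (columns)": 30.75,
      "Within (error)": 196.25}
df = {"Between machines (rows)": 3,
      "Between workers (columns)": 3,
      "Within (error)": 9}

# Mean square = SS / df for every source of variation.
ms = {source: ss[source] / df[source] for source in ss}

# Each factor's F statistic is its mean square divided by the error mean square.
ms_error = ms["Within (error)"]
f_values = {source: ms[source] / ms_error
            for source in ss if source != "Within (error)"}

F_TABLE_VALUE = 3.86  # F(0.05; 3, 9) taken from an F-table (assumption: not computed here)

for source, f_c in f_values.items():
    decision = "reject H0" if f_c > F_TABLE_VALUE else "fail to reject H0"
    print(f"{source:<28} MS={ms[source]:.2f}  F={f_c:.3f}  {decision}")
print(f"{'Within (error)':<28} MS={ms_error:.2f}")
```

Both factors are tested against the same error mean square, which is why one error row serves both F ratios in the table.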
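Two Way ANOVA: Practice Problem

| Fertilizers | $P_1$ | $P_2$ | $P_3$ | $P_4$ |
|---|---|---|---|---|
| Nitrogen | 10 | 15 | 7 | 44 |
| Potash | 12 | 20 | 10 | 58 |
| Phosphate | 14 | 7 | 9 | 40 |

With $h$ values in each row and $k$ values in each column:
• $CF = \frac{T^2}{N}$
• $SS_T = \sum_i \sum_j x_{ij}^2 - CF$
• $SS_{between(R)} = SS_R = \frac{\sum_i T_i^2}{h} - CF$
• $SS_{between(C)} = SS_C = \frac{\sum_j T_j^2}{k} - CF$
• $SS_{within(E)} = SS_E = SS_T - SS_R - SS_C$
• $df_{total} = N - 1$
• $df_{between\ rows}$ = (number of rows) $- 1 = k - 1$
• $df_{between\ columns}$ = (number of columns) $- 1 = h - 1$
• $df_{error} = (h - 1)(k - 1)$

| Sources of variation | $df$ | $SS$ | Mean Square | F-statistic ($F_c$) |
|---|---|---|---|---|
| Between Fertilizers (rows) | | | $M_1 = \frac{SS}{df}$ | $F_{c1} = \frac{M_1}{M_3}$ |
| Between Plots (columns) | | | $M_2 =$ | $F_{c2} = \frac{M_2}{M_3}$ |
| Within (Error) | | | $M_3 =$ | |
| Total | | | | |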
Acknowledgment
• [Peter Andrew Bruce] Practical Statistics for Data Scientists
• [David Forsyth] Probability and Statistics for Computer Science
• [Michael Baron] Probability and Statistics for Computer Scientists