Anova-2
UE22CS342AA2
UNIT-1
Lecture 9 : Analysis of Variance - 2
Gowri Srinivasa
Department of Computer Science and Engineering
Data Analytics
Slides collated by:
Nishanth M S, CSE 2023, PES University
Harshitha Srikanth, CSE 2024, PES University
Karthik Namboori, Sem VII, PES University
Slides excerpted from: U. Dinesh Kumar, “Business Analytics”, Wiley, 2nd Edition, 2022
DATA ANALYTICS
Solution
• Therefore, the calculated F-statistic is F = 2.592.
• The critical F-value with degrees of freedom (2, 87) for $\alpha = 0.05$ is 3.101.
• The p-value for $F_{2,87} = 2.592$ is 0.0805.
• Since the calculated F-statistic (2.592) is less than the critical F-value (3.101), we retain the null
hypothesis: there is no statistically significant evidence that the average annual returns of the
consumer goods, services, and industrial goods sectors differ.
[Figure: F-distribution with critical value]
[Figure: Excel output of ANOVA for this data]
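The critical value and p-value quoted in this solution can be checked with a short sketch. It relies on a closed form that holds only when the numerator degrees of freedom equal 2, as they do here: P(F > x) = (1 + 2x/df2)^(-df2/2). The function names are illustrative, not from the slides; for other degrees of freedom a statistics library would be needed.

```python
# Minimal sketch: tail probability and critical value of an F(2, df2) distribution.
# The closed form below is valid ONLY for numerator degrees of freedom equal to 2.

def f_sf_df1_2(f_value, df2):
    """Upper-tail probability P(F > f_value) for F(2, df2)."""
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)

def f_crit_df1_2(alpha, df2):
    """Critical value x such that P(F > x) = alpha for F(2, df2)."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)

alpha = 0.05      # significance level used in the solution
df2 = 87          # denominator (within-group) degrees of freedom
f_stat = 2.592    # calculated F-statistic from the solution

f_crit = f_crit_df1_2(alpha, df2)
p_value = f_sf_df1_2(f_stat, df2)
retain_null = f_stat < f_crit  # equivalent to p_value > alpha

print(f"critical F = {f_crit:.3f}, p-value = {p_value:.4f}, retain H0: {retain_null}")
```

Comparing the statistic with the critical value and comparing the p-value with the significance level are two views of the same decision, so both checks agree.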
Two-Way ANOVA
• The value of the response variable may be influenced by several factors. For example,
in addition to price discounts, the location of a store may also play an important role
in the sales quantity.
• Discounts may have less impact at stores located near an affluent
community than at stores located near a non-affluent community.
• We would like to understand the impact of both factors (price discount and
location) simultaneously on sales by answering the following questions:
▪ Are there differences in the average sales quantity with different levels of price
discounts?
▪ Are there differences in the average sales quantity with respect to different
locations?
▪ Are there interactions between price discounts and location with respect to
average sales quantity?
Setting up Two-Way ANOVA
The two-way ANOVA model is
$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
where,
$Y_{ijk}$ = Value of the kth observation (k = 1, 2, …, K) of the response variable at level i
(i = 1, 2, …, a) of factor A and level j (j = 1, 2, …, b) of factor B
$\mu$ = Overall mean value of the response variable $Y_{ijk}$
$\alpha_i$ = Effect of level i of factor A (i = 1, 2, …, a)
$\beta_j$ = Effect of level j of factor B (j = 1, 2, …, b)
$(\alpha\beta)_{ij}$ = Interaction of the ith level of factor A and the jth level of factor B
$\varepsilon_{ijk}$ = Error associated with the kth observation at level i of factor A and level j of
factor B
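To make the roles of the model terms concrete, the sketch below generates observations as overall mean + factor A effect + factor B effect + interaction + error. Every number in it is hypothetical, and drawing the errors from a normal distribution is an assumption made for illustration (it is the usual ANOVA assumption, but it is not stated on this slide).

```python
import random

def simulate_two_way(mu, effect_a, effect_b, interaction, k, sigma, seed=0):
    """Generate k observations per (i, j) cell from the two-way ANOVA model.

    Returns a dict mapping (i, j) to a list of k simulated responses.
    Errors are drawn from Normal(0, sigma); this is an illustrative assumption.
    """
    rng = random.Random(seed)
    data = {}
    for i, a_i in enumerate(effect_a):
        for j, b_j in enumerate(effect_b):
            cell_mean = mu + a_i + b_j + interaction[i][j]
            data[(i, j)] = [cell_mean + rng.gauss(0.0, sigma) for _ in range(k)]
    return data

# Hypothetical values: 2 levels of factor A, 3 levels of factor B, no interaction.
sample = simulate_two_way(
    mu=25.0,
    effect_a=[-1.0, 1.0],
    effect_b=[-5.0, 0.0, 5.0],
    interaction=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    k=4,
    sigma=2.0,
)
for cell, values in sorted(sample.items()):
    print(cell, [round(v, 1) for v in values])
```

Setting `sigma=0.0` removes the error term, so every observation in a cell equals that cell's mean; this is a quick way to see what each effect contributes.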
The sum of squares in the case of two-way ANOVA with equal sample sizes is
given by
SST = SSA + SSB + SSAB + SSW
The components of this equation are as follows:
• Sum of squares of total deviation (SST):
$$SST = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} (Y_{ijk} - \mu)^2$$
where c is the number of observations in each group (written K in the model definition) and $\mu$ is the overall mean.
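• Sum of squares of deviation due to factor A (SSA):
$$SSA = bc \sum_{i=1}^{a} (\mu_i - \mu)^2$$
where $\mu_i$ is the mean of all observations in level i of factor A and c is the number
of observations in each group (assumed to be the same for all groups).
• Sum of squares of deviation due to factor B (SSB):
$$SSB = ac \sum_{j=1}^{b} (\mu_j - \mu)^2$$
where $\mu_j$ is the mean of all observations in level j of factor B.
• Sum of squares of deviation due to the interaction of factors A and B (SSAB):
$$SSAB = c \sum_{i=1}^{a} \sum_{j=1}^{b} (\mu_{ij} - \mu_i - \mu_j + \mu)^2$$
where $\mu_{ij}$ is the average of the observations at the ith level of factor A and the jth level of factor B.
• Sum of squares of within-group deviation (SSW):
$$SSW = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} (Y_{ijk} - \mu_{ij})^2$$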
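The decomposition SST = SSA + SSB + SSAB + SSW can be checked numerically. The sketch below uses the usual balanced-design formulas (equal group sizes) and, purely for brevity, only the first four rows of the detergent sales table, so its numbers are not expected to match the Excel output for the full data set. The function name and the nested-list layout `data[i][j]` are illustrative choices, with location taken as factor A and discount as factor B.

```python
def mean(values):
    return sum(values) / len(values)

def two_way_anova(data):
    """Balanced two-way ANOVA with replication.

    data[i][j] is the list of c observations at level i of factor A and
    level j of factor B. Returns (sums of squares, degrees of freedom, F statistics).
    """
    a, b, c = len(data), len(data[0]), len(data[0][0])
    all_obs = [y for row in data for cell in row for y in cell]
    mu = mean(all_obs)                                        # overall mean
    mu_a = [mean([y for cell in row for y in cell]) for row in data]      # factor A level means
    mu_b = [mean([y for row in data for y in row[j]]) for j in range(b)]  # factor B level means
    mu_ab = [[mean(cell) for cell in row] for row in data]                # cell means

    ss = {
        "T": sum((y - mu) ** 2 for y in all_obs),
        "A": b * c * sum((m - mu) ** 2 for m in mu_a),
        "B": a * c * sum((m - mu) ** 2 for m in mu_b),
        "AB": c * sum((mu_ab[i][j] - mu_a[i] - mu_b[j] + mu) ** 2
                      for i in range(a) for j in range(b)),
        "W": sum((y - mu_ab[i][j]) ** 2
                 for i in range(a) for j in range(b) for y in data[i][j]),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "W": a * b * (c - 1)}
    ms_within = ss["W"] / df["W"]
    f_stat = {key: (ss[key] / df[key]) / ms_within for key in ("A", "B", "AB")}
    return ss, df, f_stat

# First four rows of the sales table: data[location][discount level] -> observations.
data = [
    [[20, 16, 24, 20], [28, 23, 25, 31], [32, 29, 28, 27]],  # Location 1
    [[20, 21, 23, 19], [19, 27, 23, 30], [20, 31, 35, 25]],  # Location 2
]
ss, df, f_stat = two_way_anova(data)
print("SS:", {k: round(v, 3) for k, v in ss.items()})
print("df:", df)
print("F:", {k: round(v, 3) for k, v in f_stat.items()})
```

Each F statistic is the corresponding mean square divided by the within-group mean square; a p-value then follows from the F distribution with the matching degrees of freedom.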
The table below shows the sales quantity of detergents at different
discount values and different locations, collected over 20 days. Conduct a
two-way ANOVA at $\alpha = 0.05$ to test the effects of discount and location
on sales.
Location 1      |  Location 2
Discount        |  Discount
20   28   32    |  20   19   20
16   23   29    |  21   27   31
24   25   28    |  23   23   35
20   31   27    |  19   30   25
19   25   30    |  25   25   31
10   24   26    |  22   21   31
24   28   37    |  25   33   31
16   23   33    |  21   26   23
25   26   27    |  26   22   22
16   25   31    |  22   28   32
18   22   37    |  25   24   22
20   24   28    |  23   23   29
17   26   25    |  23   26   25
26   28   23    |  24   16   34
16   21   26    |  20   30   30
21   27   33    |  23   22   25
24   25   28    |  18   16   39
19   20   30    |  19   25   32
19   26   30    |  19   34   29
Solution
The Microsoft Excel output for a two-way ANOVA with replication (each combination of location and
discount has repeated observations) is shown below.
[Table: Excel ANOVA output, with rows including Sample and Columns]
• In the table, "Sample" stands for the row factor (location in this case),
"Columns" stands for the column factor (discount in this case), and
"Interaction" stands for the interaction effect (location × discount).
• The p-value for location (data in rows) is 0.5065, so it is not statistically
significant: we retain the null hypothesis that location has no statistically
significant influence on sales. For discount rate (data in columns) the p-value
is $1.06 \times 10^{-13}$, so we reject the null hypothesis; that is, the discount rate
influences sales.
• The p-value for the interaction effect is 0.0724, which is not significant. That is,
only the factor discount is statistically significant at $\alpha = 0.05$.
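The conclusions above follow from comparing each reported p-value with the significance level. The sketch below applies that rule and also works out the degrees of freedom for this design, assuming 2 locations, 3 discount levels, and 20 observations per combination (taken from "collected over 20 days"). The variable names are illustrative.

```python
# Design sizes (assumed from the problem statement): a locations, b discounts, c days per cell.
a, b, c = 2, 3, 20

df_location = a - 1                   # row factor ("Sample" in Excel)
df_discount = b - 1                   # column factor ("Columns" in Excel)
df_interaction = (a - 1) * (b - 1)    # location x discount
df_within = a * b * (c - 1)           # error (within-group)
df_total = a * b * c - 1

alpha = 0.05
# p-values as reported in the Excel output discussed above.
p_values = {
    "location": 0.5065,
    "discount": 1.06e-13,
    "location x discount": 0.0724,
}
# Reject the null hypothesis for an effect when its p-value is below alpha.
significant = {effect: p < alpha for effect, p in p_values.items()}

print("df:", df_location, df_discount, df_interaction, df_within, df_total)
for effect, is_significant in significant.items():
    decision = "reject H0" if is_significant else "retain H0"
    print(f"{effect}: p = {p_values[effect]:.4g} -> {decision}")
```

The degrees of freedom of the three effects and the within-group term add up to the total, mirroring the sum-of-squares decomposition.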
Summary