Anova-2

The document discusses Analysis of Variance (ANOVA), focusing on both one-way and two-way ANOVA methods used to compare means across different groups. It provides examples, including a one-way ANOVA test on stock returns across industrial sectors and a two-way ANOVA on sales quantities influenced by discounts and locations. Key concepts such as hypothesis testing, F-statistics, and the significance of results are also covered, along with references for further reading.
DATA ANALYTICS

UE22CS342AA2
UNIT-1
Lecture 9 : Analysis of Variance - 2

Gowri Srinivasa
Department of Computer Science and Engineering
Slides collated by:
Nishanth M S, CSE 2023, PES University
Harshitha Srikanth, CSE 2024, PES University
Karthik Namboori, Sem VII, PES University
Slides excerpted from: U. Dinesh Kumar, “Business Analytics”, Wiley, 2nd Edition, 2022
With grateful thanks for contribution of slides to:
Dr. Mamatha H R, Professor at the Department of CSE, PESU
One-Way ANOVA : Example ( Observational Study )
Share Raja Khan (SRK) is a top stockbroker who believes that the average annual stock return depends on
the industrial sector. To validate this belief, SRK collected the annual returns of shares from three
industrial sectors: consumer goods, services, and industrial goods. The annual returns of shares in
2015-2016 for the different sectors are shown below. Conduct an ANOVA test at the 5% significance level.
Solution

• In this case, the number of groups k = 3; n1 = n2 = n3 = 30; the group means are μ1 = 0.082,
μ2 = 0.079, μ3 = 0.0605; and the overall mean is μ = 0.0743
• The sum of squares of between-group variation (SSB) is given by

$$\mathrm{SSB} = \sum_{i=1}^{k} n_i (\mu_i - \mu)^2$$

• Therefore
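
As a rough check, SSB can be recomputed from the rounded group means quoted above. This is a minimal sketch, not the slide's own calculation: it assumes the standard between-group formula, SSB = sum of n_i (μ_i − μ)^2, and takes the overall mean 0.0743 exactly as quoted. Because the inputs are rounded, the result (about 0.00815) may differ slightly from a value computed on the raw returns.

```python
# Group sizes and rounded group means quoted in the solution above.
n = [30, 30, 30]
group_means = [0.082, 0.079, 0.0605]
overall_mean = 0.0743  # taken as quoted in the slide, not recomputed here

k = len(group_means)  # number of groups

# SSB = sum over groups of n_i * (mu_i - mu)^2
ssb = sum(n_i * (mu_i - overall_mean) ** 2 for n_i, mu_i in zip(n, group_means))

# Mean square between groups: SSB divided by its degrees of freedom (k - 1).
msb = ssb / (k - 1)

print(f"SSB ~ {ssb:.6f}")
print(f"MSB ~ {msb:.6f}")
```

The within-group sum of squares cannot be checked the same way, since it needs the individual returns rather than the group means.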
Solution

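• The sum of squares of within-group variation (SSW) is given by

$$\mathrm{SSW} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \mu_i)^2$$

where x_ij is the jth observation in group i and μi is the mean of group i.
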
• Therefore

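• The F-statistic value is

$$F = \frac{\mathrm{MSB}}{\mathrm{MSW}} = \frac{\mathrm{SSB}/(k-1)}{\mathrm{SSW}/(n-k)} = 2.592$$

where n = n1 + n2 + n3 = 90 is the total number of observations.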

Solution

• The critical F-value with degrees of freedom (2, 87) for α = 0.05 is 3.101
• The p-value for F(2, 87) = 2.592 is 0.0805
• Since the calculated F-statistic is less than the critical F-value (equivalently, the p-value
exceeds α), we retain the null hypothesis and conclude that there is no statistically significant
difference in the average annual returns across the consumer goods, services, and industrial goods sectors.
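
The critical value and p-value above can be approximately reproduced without a statistics package, because the F-distribution has a simple closed form when the numerator degrees of freedom equal 2: P(F > x) = (1 + 2x/d2)^(−d2/2). The sketch below relies on that special case only; the function names are illustrative, and for general degrees of freedom a library routine such as `scipy.stats.f` would be the usual choice.

```python
def f_sf_df1_2(x, d2):
    """P(F > x) for an F(2, d2) variable (closed form valid only for numerator df = 2)."""
    return (1.0 + 2.0 * x / d2) ** (-d2 / 2.0)


def f_crit_df1_2(alpha, d2):
    """Upper-alpha critical value of F(2, d2), found by inverting f_sf_df1_2."""
    return (d2 / 2.0) * (alpha ** (-2.0 / d2) - 1.0)


alpha = 0.05
f_stat = 2.592   # F-statistic quoted in the solution
d2 = 87          # denominator degrees of freedom, n - k = 90 - 3

f_crit = f_crit_df1_2(alpha, d2)
p_value = f_sf_df1_2(f_stat, d2)
reject_h0 = f_stat > f_crit

print(f"critical F = {f_crit:.3f}")
print(f"p-value    = {p_value:.4f}")
print("reject H0" if reject_h0 else "retain H0")
```

The critical value comes out at about 3.101 and the p-value at about 0.0806, close to the quoted 0.0805; the small gap is presumably due to the F-statistic being rounded to three decimals.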

[Figures: F-distribution with critical value; Excel output of ANOVA for this data]
Two-Way ANOVA

• The value of the response variable may be influenced by several factors. For example,
in addition to price discounts, the location of the stores may also play an important role
in the sales quantity.

• Discounts may have less impact if the store is located near an affluent
community than if it is located near a non-affluent community.

• We would like to understand the impact of both factors (price discount and
location) simultaneously on sales by trying to answer the following questions:
▪ Are there differences in the average sales quantity with different levels of price
discounts?
▪ Are there differences in the average sales quantity with respect to different
locations?
▪ Are there interactions between price discounts and location with respect to
average sales quantity?
Setting up Two-Way ANOVA

The two-way ANOVA model can be expressed as

$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$

where
Yijk = Value of the kth observation (k = 1, 2, …, K) of the response variable at level i
(i = 1, 2, …, a) of factor A and level j (j = 1, 2, …, b) of factor B
μ = Overall mean value of the response variable Yijk
αi = Effect of level i of factor A (i = 1, 2, …, a)
βj = Effect of level j of factor B (j = 1, 2, …, b)
(αβ)ij = Interaction of the ith level of factor A and the jth level of factor B
εijk = Error associated with the kth observation at level i of factor A and level j of
factor B
Setting up Two-Way ANOVA

The hypothesis tests associated with two-way ANOVA are as follows:

Test of Factor A Main Effects:

H0: αi = 0 for all i (i = 1, 2, …, a)
HA: Not all αi are zero

Test of Factor B Main Effects:

H0: βj = 0 for all j (j = 1, 2, …, b)
HA: Not all βj are zero

Test of Interaction Effects:

H0: (αβ)ij = 0 for all i (i = 1, 2, …, a) and j (j = 1, 2, …, b)
HA: Not all (αβ)ij are zero
Setting up Two-Way ANOVA

The sum of squares in the case of two-way ANOVA with equal sample sizes is
given by
SST = SSA + SSB + SSAB + SSW
The components of the above equation are as follows:
• Sum of squares of total deviation (SST):

$$\mathrm{SST} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} (Y_{ijk} - \mu)^2$$

where c is the number of observations in each group and μ is the overall mean.
Setting up Two-Way ANOVA

• Sum of squares of deviation due to factor A (SSA):

$$\mathrm{SSA} = bc \sum_{i=1}^{a} (\mu_i - \mu)^2$$

where μi is the mean of all observations in level i of factor A and c is the number
of observations in each group (assumed to be the same for all groups).
• Sum of squares of deviation due to factor B (SSB):

$$\mathrm{SSB} = ac \sum_{j=1}^{b} (\mu_j - \mu)^2$$

Here μj is the mean of all observations in level j of factor B.


Setting up Two-Way ANOVA

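• Sum of squares of deviation due to interaction of factors A and B (SSAB):

$$\mathrm{SSAB} = c \sum_{i=1}^{a} \sum_{j=1}^{b} (\mu_{ij} - \mu_i - \mu_j + \mu)^2$$

where μij is the mean of the observations at the ith level of factor A and the jth level of factor B,
μi and μj are the corresponding factor-level means, and μ is the overall mean.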

• Sum of squares of deviation within a group (SSW):

$$\mathrm{SSW} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} (Y_{ijk} - \mu_{ij})^2$$
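
To see the decomposition SST = SSA + SSB + SSAB + SSW at work, here is a minimal sketch on a tiny balanced data set. The numbers are made up purely for illustration (a = 2, b = 3, c = 2), and the code assumes the standard balanced-design definitions of each sum of squares.

```python
# Hypothetical balanced data: y[i][j] holds the c observations for
# level i of factor A and level j of factor B.
y = [
    [[12, 14], [18, 20], [25, 23]],
    [[11, 15], [16, 17], [30, 28]],
]
a, b, c = len(y), len(y[0]), len(y[0][0])


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Overall mean, factor-level means, and cell means.
grand = mean(v for row in y for cell in row for v in cell)
mean_a = [mean(v for cell in y[i] for v in cell) for i in range(a)]
mean_b = [mean(v for i in range(a) for v in y[i][j]) for j in range(b)]
mean_ab = [[mean(y[i][j]) for j in range(b)] for i in range(a)]

# Sums of squares for a balanced two-way layout with replication.
sst = sum((v - grand) ** 2 for row in y for cell in row for v in cell)
ssa = b * c * sum((m - grand) ** 2 for m in mean_a)
ssb = a * c * sum((m - grand) ** 2 for m in mean_b)
ssab = c * sum(
    (mean_ab[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
    for i in range(a)
    for j in range(b)
)
ssw = sum(
    (v - mean_ab[i][j]) ** 2
    for i in range(a)
    for j in range(b)
    for v in y[i][j]
)

print(f"SST = {sst:.4f}")
print(f"SSA + SSB + SSAB + SSW = {ssa + ssb + ssab + ssw:.4f}")
```

The two printed values agree up to floating-point error, which holds only because every cell has the same number of observations.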


Setting up Two-Way ANOVA
Sum of squares of deviation for various effects and the corresponding F-statistic in a
two-way ANOVA with equal sample size

Sum of Squared   Degrees of       Mean Squared                        F-Statistic
Variation        Freedom          Variation

SSA              a - 1            MSA = SSA/(a - 1)                   F = MSA/MSW
SSB              b - 1            MSB = SSB/(b - 1)                   F = MSB/MSW
SSAB             (a - 1)(b - 1)   MSAB = SSAB/[(a - 1)(b - 1)]        F = MSAB/MSW
SSW              ab(c - 1)        MSW = SSW/[ab(c - 1)]


Two-Way ANOVA : Example

The table below shows the sales quantity of detergents at different
discount values and different locations collected over 20 days. Conduct a
two-way ANOVA at α = 0.05 to test the effects of discounts and location
on the sales.
        Location 1                    Location 2
         Discount                      Discount
   0%      10%      20%          0%      10%      20%
   20      28       32           20      19       20
   16      23       29           21      27       31
   24      25       28           23      23       35
   20      31       27           19      30       25
   19      25       30           25      25       31
   10      24       26           22      21       31
   24      28       37           25      33       31
   16      23       33           21      26       23
   25      26       27           26      22       22
   16      25       31           22      28       32
   18      22       37           25      24       22
   20      24       28           23      23       29
   17      26       25           23      26       25
   26      28       23           24      16       34
   16      21       26           20      30       30
   21      27       33           23      22       25
   24      25       28           18      16       39
   19      20       30           19      25       32
   19      26       30           19      34       29
Solution
The output of the two-way ANOVA with replication (since each location-discount combination has
repeated observations) from Microsoft Excel is shown below.

ANOVA

Source of Variation    SS          df     MS          F           P-value     F crit
Sample (Location)      7.008333    1      7.008333    0.443898    0.506593    3.92433
Columns (Discount)     1240.317    2      620.1583    39.27997    1.06E-13    3.075853
Interaction            84.81667    2      42.40833    2.686085    0.07246     3.075853
Within                 1799.85     114    15.78816
Total                  3131.992    119
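
The mean squares and F-statistics in this output follow directly from the SS and df columns (MS = SS/df and F = MS/MSW). The sketch below recomputes them from the rounded values printed in the table; the variable names are illustrative, and small last-digit differences from Excel are expected because Excel works with unrounded sums of squares.

```python
# (SS, df) pairs copied from the Excel ANOVA output.
table = {
    "Location": (7.008333, 1),
    "Discount": (1240.317, 2),
    "Interaction": (84.81667, 2),
    "Within": (1799.85, 114),
}

# Mean square = sum of squares / degrees of freedom.
ms = {name: ss / df for name, (ss, df) in table.items()}

# Each effect is tested against the within-group mean square (MSW).
f_stats = {
    name: ms[name] / ms["Within"]
    for name in ("Location", "Discount", "Interaction")
}

# The component SS and df should add up to the Total row.
total_ss = sum(ss for ss, _ in table.values())
total_df = sum(df for _, df in table.values())

for name, f_value in f_stats.items():
    print(f"{name:12s} MS = {ms[name]:10.4f}  F = {f_value:8.4f}")
print(f"Total SS = {total_ss:.3f}, total df = {total_df}")
```

The total degrees of freedom, 119, also confirm that the analysis used 120 observations, that is, 20 per location-discount combination.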


Solution

• In the table, Sample stands for the row factor (which in this case is
location), Columns stands for the column factor (discount in this case), and
Interaction stands for the interaction effect (location × discount).
• The p-value for location (data in rows) is 0.5065, so it is not statistically
significant (we retain the null hypothesis that location has no statistical
influence on sales), whereas for discount rate (data in columns) the p-value
is 1.06 × 10^-13, so we reject the null hypothesis (that is, the discount rate has
an influence on sales).
• The p-value for the interaction effect is 0.0724 and is not significant. That is,
only the factor discount is statistically significant at α = 0.05.
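
The decision for each effect can be stated in two equivalent ways: reject H0 when the p-value is below α, or when the F-statistic exceeds the critical F-value. The following sketch applies both rules to the figures from the Excel output and checks that they agree; it is only an illustration of the decision rule, with names chosen for this example.

```python
alpha = 0.05

# (F, p-value, F crit) for each effect, copied from the Excel output.
effects = {
    "Location": (0.443898, 0.506593, 3.92433),
    "Discount": (39.27997, 1.06e-13, 3.075853),
    "Interaction": (2.686085, 0.07246, 3.075853),
}

decisions = {}
for name, (f_stat, p_value, f_crit) in effects.items():
    reject_by_p = p_value < alpha   # p-value rule
    reject_by_f = f_stat > f_crit   # critical-value rule
    assert reject_by_p == reject_by_f  # the two rules should agree
    decisions[name] = reject_by_p

for name, reject in decisions.items():
    print(f"{name:12s} -> {'reject H0' if reject else 'retain H0'}")
```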
Summary

• Analysis of Variance (ANOVA) is a hypothesis testing procedure used for
comparing means from several groups simultaneously.
• In one-way ANOVA, we test whether the mean values of an outcome variable
differ across the levels of a factor. Using multiple two-sample t-tests to
compare the group means simultaneously inflates the overall Type I
error; ANOVA overcomes this problem.
• ANOVA plays an important role in multiple linear regression model
diagnostics. The overall significance of the model is tested using ANOVA.
• In a two-way ANOVA, we check the impact of two factors, and of their
interaction, on the response variable simultaneously.
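
The Type I error problem with repeated t-tests can be illustrated with a short calculation. Comparing k groups pairwise requires k(k − 1)/2 tests; if each is run at level α and the tests are treated as independent (a simplifying assumption, since pairwise tests on shared groups are not strictly independent), the chance of at least one false rejection is 1 − (1 − α)^m.

```python
from math import comb


def familywise_error(k, alpha):
    """Approximate probability of at least one Type I error across all
    pairwise comparisons of k groups, assuming independent tests."""
    m = comb(k, 2)  # number of pairwise two-sample tests
    return 1 - (1 - alpha) ** m


alpha = 0.05
for k in (3, 4, 5):
    print(f"k = {k}: {comb(k, 2)} tests, "
          f"familywise error ~ {familywise_error(k, alpha):.3f}")
```

With three groups, as in the stock return example, the approximate error rate is already about 0.143 rather than 0.05.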
Test your understanding!

• A two-way ANOVA means that the experimental design includes


a) Two dependent variables
b) Two independent variables
c) Two types of variance
d) All of these
Solution
b) Two independent variables.
Test your understanding!

• In a two-way ANOVA, the interaction effect is the

a) Effect of changing the levels of one factor on the dependent scores


b) Effect of changing the levels of one factor on the dependent scores, ignoring all
other factors in the study
c) Extent to which the influence one factor has on scores depends on the level of
the other factor
d) Effect on the independent variables of changing the levels of a factor
Solution
c) Extent to which the influence one factor has on scores depends on the level of
the other factor
Quick Glance-Points to remember

• Why two-way ANOVA (two factors considered)
• Setting up two-way ANOVA:
o Understanding all the variables and subscripts used
o Test of Factor A (main effect)
o Test of Factor B (main effect)
o Interaction effect
o SST, SSA, SSB, SSAB, SSW and the corresponding degrees of freedom
o F-statistics
o Finally, when to retain and when to reject the null hypothesis (based on
the calculated F-value, the critical F-value, and the p-value)
References

• U. Dinesh Kumar, “Business Analytics”, Wiley, 2nd Edition, 2022, Chapter 7.4
• https://www.analyticsvidhya.com/blog/2018/01/anova-analysis-of-variance/
• https://www.analyticsvidhya.com/blog/2020/06/introduction-anova-statistics-data-science-covid-python/
THANK YOU
Dr. Gowri Srinivasa
Professor, Department of Computer Science and
Engineering, PES University, Bengaluru