Advanced Statistics: Analysis of Variance (ANOVA) Dr. P.K.Viswanathan (Professor Analytics)
ANOVA Basics
This technique is part of the domain called “Experimental Designs”.
It helps establish, in a precise fashion, the cause-effect relation among variables.
From the statistical inference point of view, ANOVA is an extension of the independent t test for the equality of two population means.
When we have to compare more than two population means, we use ANOVA.
Typically, the null hypothesis (H0) for testing the equality of population means for k populations is:
H0 : µ1 = µ2 = µ3 = … = µk
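To illustrate the sense in which ANOVA extends the independent t test, the sketch below compares the two on two groups only. The data are hypothetical (not from the lecture), the helper names `pooled_t` and `anova_f` are illustrative, and the t statistic assumes equal variances (pooled). With exactly two groups, the ANOVA F statistic should equal the square of the pooled t statistic.

```python
from statistics import mean

# Hypothetical measurements for two groups (illustrative only, not from the source).
group_a = [10.0, 12.0, 11.0, 13.0]
group_b = [14.0, 15.0, 13.0, 16.0]


def pooled_t(x, y):
    """Independent two-sample t statistic using the pooled variance."""
    nx, ny = len(x), len(y)
    mx, my = mean(x), mean(y)
    ss_x = sum((v - mx) ** 2 for v in x)
    ss_y = sum((v - my) ** 2 for v in y)
    pooled_var = (ss_x + ss_y) / (nx + ny - 2)
    return (mx - my) / (pooled_var * (1 / nx + 1 / ny)) ** 0.5


def anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


t_stat = pooled_t(group_a, group_b)
f_stat = anova_f([group_a, group_b])
print("t squared:", round(t_stat ** 2, 4))
print("F:", round(f_stat, 4))
```

For more than two groups there is no single t statistic to square, which is where the F test takes over.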
ANOVA-One Way Classification
Assumptions involved in using ANOVA
Hypotheses of One-Way ANOVA
Null hypothesis: all population means are equal.
H0 : μ1 = μ2 = μ3 = … = μk
Alternative hypothesis: for at least one pair, the population means are unequal.
One-Way ANOVA
Alternative Hypothesis (H1 = True)
H0 : μ1 = μ2 = μ3 = … = μk does not hold; with three populations, for example,
μ1 = μ2 ≠ μ3 or μ1 ≠ μ2 ≠ μ3
ANOVA Basics
Partition of Total Variation (Information Content)
Total Variation [Total Sum of Squares (TSS)]
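The partition can be checked numerically. The sketch below uses hypothetical data (not from the lecture) and splits the total sum of squares into a between-group part and a within-group part. The names `ssb` and `ssw` are labels chosen here for illustration; only TSS is named in the slides.

```python
from statistics import mean

# Hypothetical observations for three groups (illustrative only, not from the source).
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 8.0, 9.0],
    "C": [1.0, 2.0, 3.0],
}

values = [v for g in groups.values() for v in g]
grand_mean = mean(values)

# Total variation: squared deviations of every observation from the grand mean.
tss = sum((v - grand_mean) ** 2 for v in values)

# Between-group variation: squared deviations of group means from the grand mean,
# weighted by group size.
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())

# Within-group variation: squared deviations of observations from their own group mean.
ssw = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

print("TSS:", tss)
print("Between + Within:", ssb + ssw)
```

The two printed values agree, which is the identity that one-way ANOVA relies on: total variation equals between-group variation plus within-group variation.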
ANOVA-One Way Classification - Example
A supermarket is interested in knowing
whether it should go for a quarter-page,
half-page, or a full-page advertisement
for a product.
To choose the size of the
advertisement that will bring in the most
store traffic, the supermarket can use
the ANOVA technique.
Here, you are trying to establish a cause-
effect relationship between store traffic
and the various sizes of advertisement.
ANOVA-One Way Classification
How One-Way Classification Works in Practice?
One-Way ANOVA - Application
A sporting goods manufacturing company wanted to compare the
distance traveled by golf balls produced using four different designs.
Ten balls were manufactured with each design and were brought to the
local golf course for the club professional to test. The order in which the
balls were hit with the same club from the first tee was randomized so
that the pro did not know which type of ball was being hit. All 40 balls
were hit in a short period of time, during which the environmental
conditions were essentially the same. The results (distance traveled in
yards) for the four designs are stored in Golfball.csv.
            Df   Sum Sq   Mean Sq   F value   Pr(>F)
Design       3  2990.99    997.00     53.03   2.73E-13
Residuals   36   676.82     18.80
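The entries of the ANOVA table above are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and the F value is the ratio of the two mean squares. The sketch below recomputes them from the Df and Sum Sq figures in the table; the variable names are illustrative.

```python
# Df and Sum Sq values copied from the ANOVA table for the golf ball data.
df_design, ss_design = 3, 2990.99    # 4 designs - 1
df_resid, ss_resid = 36, 676.82      # 40 balls - 4 designs

# Mean square = sum of squares / degrees of freedom.
ms_design = ss_design / df_design
ms_resid = ss_resid / df_resid

# F value = between-design mean square / residual mean square.
f_value = ms_design / ms_resid

print("Mean Sq (Design):   ", round(ms_design, 2))
print("Mean Sq (Residuals):", round(ms_resid, 2))
print("F value:            ", round(f_value, 2))
```

The rounded results match the Mean Sq and F value columns reported in the table. The very small Pr(>F) means the null hypothesis of equal mean distances across the four designs is rejected at any conventional significance level.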
TukeyHSD Test
                     diff       lwr       upr      p adj
Design2-Design1   11.9020    6.6795   17.1245   2.65E-06
Design3-Design1   19.9740   14.7515   25.1965   1.64E-11
Design4-Design1   22.0080   16.7855   27.2305   8.89E-13
Design3-Design2    8.0720    2.8495   13.2945   0.00103
Design4-Design2   10.1060    4.8835   15.3285   4.51E-05
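One way to read the TukeyHSD output above is to check, for each pair, whether the confidence interval (lwr, upr) excludes zero. The sketch below does this for the five pairs listed, using the values as read from the output. It assumes the intervals are 95% family-wise intervals, which the slides do not state.

```python
# (pair, diff, lwr, upr) as read from the TukeyHSD output for the golf ball data.
rows = [
    ("Design2-Design1", 11.9020, 6.6795, 17.1245),
    ("Design3-Design1", 19.9740, 14.7515, 25.1965),
    ("Design4-Design1", 22.0080, 16.7855, 27.2305),
    ("Design3-Design2", 8.0720, 2.8495, 13.2945),
    ("Design4-Design2", 10.1060, 4.8835, 15.3285),
]

results = []
for pair, diff, lwr, upr in rows:
    # Each interval is centred on the observed difference in mean distance.
    half_width = (upr - lwr) / 2
    # An interval that excludes zero indicates a significant pairwise difference
    # at the (assumed) family-wise confidence level.
    excludes_zero = lwr > 0 or upr < 0
    results.append((pair, round(half_width, 4), excludes_zero))
    print(f"{pair}: diff={diff:.4f}, half-width={half_width:.4f}, "
          f"excludes zero={excludes_zero}")
```

All five listed intervals share the same half-width, as expected with ten balls per design, and all lie entirely above zero, which agrees with the small adjusted p-values in the table.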
Interesting Application of Two-Factor ANOVA/ANCOVA
Testing the Effects of Price and Advertising