Anova Test
LEARNING OBJECTIVES
LO 7-1 Understand the need for Analysis of Variance (ANOVA).
LO 7-2 Understand the difference between two-sample t-test for mean and ANOVA.
LO 7-3 Understand one-way ANOVA and calculation of F-statistic.
LO 7-4 Understand computation of within the group variation, between the group variation and F-statistic.
LO 7-5 Learn to conduct a two-way ANOVA and the computations involved in conducting a two-way ANOVA.
ANALYSIS OF VARIANCE
In many situations we may have to conduct a hypothesis test to compare mean values simultaneously for more than two groups (samples) created using a factor (or factors). For example, a marketer may like to understand the impact of three different discount values (such as 0%, 10%, and 20% discount) on the average sales. When we have to compare the impact of a factor on the mean of more than two groups (created by different levels of the factor) simultaneously, hypothesis tests such as the two-sample t-tests discussed in Chapter 6 are not an ideal approach since they can result in incorrect Type I and Type II errors. We use the Analysis of Variance (ANOVA) to understand the differences in population means among more than two populations.
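To see why repeated two-sample t-tests distort the Type I error, the following minimal Python sketch computes the chance of at least one false rejection across all pairwise tests. It assumes the tests are independent, which pairwise t-tests on shared groups are not, so the figure is only indicative. The function name `familywise_error` is hypothetical and used for illustration only.

```python
from math import comb


def familywise_error(alpha: float, num_tests: int) -> float:
    """Probability of at least one Type I error across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


k = 3                # groups: 0%, 10% and 20% discount
pairs = comb(k, 2)   # number of pairwise two-sample t-tests
fwer = familywise_error(0.05, pairs)

# Each test is run at alpha = 0.05, yet the combined error rate is larger.
print(pairs, round(fwer, 4))
```

Under the independence assumption, three tests at a 5% level give a combined Type I error of roughly 14%, which is the kind of distortion a single simultaneous test avoids.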
IMPORTANT
The objective of ANOVA is to check simultaneously whether the population means of more than two populations are different.
IMPORTANT
In ANOVA, our objective is to verify whether the variation due to treatment is different from the variation due to randomness.
However, when we want to test the hypothesis simultaneously, the Type I and Type II errors will not be the same if we conduct the three different tests listed above. For example, assume that the mean sale (population mean) at 0%, 10%, and 20% discount is $\mu_{0}$, $\mu_{10}$, and $\mu_{20}$, respectively. Consider the following three two-sample t-tests shown in Table 7.1.
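Let
$k$ = number of groups (levels of the factor)
$n_i$ = number of observations in group $i$, with $n = \sum_{i=1}^{k} n_i$
$Y_{ij}$ = $j$th observation in group $i$
$\mu_i$ = mean of group $i$
$\mu$ = overall mean, $\mu = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} Y_{ij}$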
To arrive at the statistic, we calculate the following measures, which are variations within group and
between groups:
1. Sum of Squares of Total Variation (SST): Total variation is the sum of squared deviations of all values of the response variable ($Y_{ij}$) from the overall mean ($\mu$) and is given by
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(Y_{ij}-\mu\right)^2 \tag{7.4}$$
The degrees of freedom for SST is $(n-1)$ since only the value of $\mu$ is estimated from $n$ observations and thus only one degree of freedom is lost. Mean Square Total (MST) variation is given by
$$MST = \frac{SST}{n-1} \tag{7.5}$$
2. Sum of Squares of Between (SSB) Group Variation: Sum of squares of between group variation is the sum of squared deviations of the group means ($\mu_i$) from the overall mean ($\mu$) of the data, weighted by group size, and is given by
$$SSB = \sum_{i=1}^{k} n_i\left(\mu_i-\mu\right)^2 \tag{7.6}$$
194 Business Analytics
The degrees of freedom for SSB is $(k-1)$. Since the overall mean $\mu$ is estimated from the data, one degree of freedom is lost. Mean square between variation (MSB) is given by
$$MSB = \frac{SSB}{k-1} \tag{7.7}$$
3. Sum of Squares of Within (SSW) Group Variation: Sum of squares of within the group variation is the sum of squared deviations of all observations ($Y_{ij}$) from their group mean ($\mu_i$) and is given by
$$SSW = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(Y_{ij}-\mu_i\right)^2 \tag{7.8}$$
The degrees of freedom for SSW is $(n-k)$. Here $k$ degrees of freedom are lost since we estimate $k$ group means ($\mu_i$). The mean square of variation within the group is
$$MSW = \frac{SSW}{n-k} \tag{7.9}$$
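We can prove algebraically that
$$\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(Y_{ij}-\mu\right)^2 = \sum_{i=1}^{k} n_i\left(\mu_i-\mu\right)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(Y_{ij}-\mu_i\right)^2 \tag{7.10}$$
That is
$$SST = SSB + SSW \tag{7.11}$$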
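The three sums of squares and the identity SST = SSB + SSW can be checked numerically. The sketch below is a minimal illustration in plain Python. The function name `anova_sums_of_squares` and the small data set are hypothetical and are not taken from the chapter.

```python
def anova_sums_of_squares(groups):
    """Return (SST, SSB, SSW) for a list of groups of observations."""
    all_values = [y for group in groups for y in group]
    overall_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Total variation: every observation against the overall mean.
    sst = sum((y - overall_mean) ** 2 for y in all_values)
    # Between-group variation: group means against the overall mean,
    # weighted by the group sizes.
    ssb = sum(len(g) * (m - overall_mean) ** 2
              for g, m in zip(groups, group_means))
    # Within-group variation: observations against their own group mean.
    ssw = sum((y - m) ** 2
              for g, m in zip(groups, group_means) for y in g)
    return sst, ssb, ssw


# Hypothetical data: k = 3 groups with 3 observations each.
groups = [[20, 22, 24], [25, 27, 29], [30, 33, 36]]
sst, ssb, ssw = anova_sums_of_squares(groups)

k = len(groups)
n = sum(len(g) for g in groups)
msb = ssb / (k - 1)   # mean square between, k - 1 degrees of freedom
msw = ssw / (n - k)   # mean square within, n - k degrees of freedom

print(round(sst, 6), round(ssb, 6), round(ssw, 6))
```

For this toy data the between-group part is much larger than the within-group part, which is the pattern the F-test looks for.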
Note that, in Eq. (7.11) the SST is decomposed into two sums of squares (SSB and SSW) and thus, $SSB/\sigma^2$ and $SSW/\sigma^2$ are chi-square variables.
Although the null hypothesis states that the means are equal, the actual testing is carried out by checking whether the variation between groups is higher than the variation within the groups; thus it is a one-tailed (right-tailed) test. It is important to note that rejecting the null hypothesis will not tell us exactly which means differ from each other; it will only indicate that at least one of the group means is different. We may have to conduct two-sample t-tests to find which mean values are different.
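The test statistic implied by this discussion can be written out as follows. This is the standard one-way ANOVA F-ratio, stated under the usual assumptions (independent samples from normal populations with a common variance $\sigma^2$), which are not spelled out at this point in the text.

```latex
F = \frac{MSB}{MSW} = \frac{SSB/(k-1)}{SSW/(n-k)} \sim F_{k-1,\,n-k}
\quad \text{under } H_0:\ \mu_1 = \mu_2 = \cdots = \mu_k
```

The null hypothesis is rejected at significance level $\alpha$ when the calculated $F$ exceeds the right-tail critical value $F_{\alpha;\,k-1,\,n-k}$.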
EXAMPLE 7.1
Ms Rachael Khanna, the brand manager of ENZO detergent powder at the 'one stop' retail, was interested in understanding whether price discounts have any impact on the sales quantity of ENZO. To test whether the price discounts had any impact, price discounts of 0% (no discount), 10% and 20% were given on randomly selected days. The quantity (in kilograms) of ENZO sold in a day under different discount levels is shown in Table 7.2. Conduct a one-way ANOVA to check whether discount had any significant impact on the average sales quantity at $\alpha = 0.05$.
0% Discount
37 34 28 36 38 38 34 31 39 36
34 25 33 26 33 26 26 27 32 40
10% Discount
34 41 45 39 38 33 35 41 47 34
47 44 46 38 42 33 37 45 38 44
35 34 34 37 39 34 34 36 41 38
20% Discount
43 44 46 41 52 43 42 50 41 42
55 47 48 41 42 45 48 41 47 55
43 47 55 49 46 55 42 40 50 52
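Solution:
In this case, the number of groups $k = 3$; $n_1 = n_2 = n_3 = 30$; $\mu_1 = 32$, $\mu_2 = 38.77$, $\mu_3 = 46.4$; and $\mu = 39.05$.
The sum of squares of between groups variation (SSB) is given by
$$SSB = \sum_{i=1}^{3} n_i\left(\mu_i-\mu\right)^2 = 30 \times \left[(32-39.05)^2 + (38.77-39.05)^2 + (46.4-39.05)^2\right] = 3114.156$$
So
$$MSB = \frac{SSB}{k-1} = \frac{3114.156}{2} = 1557.078$$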
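The SSB and MSB figures above can be reproduced with a few lines of Python. This is a sketch under one stated assumption: the 10% and 20% group means are computed from the Table 7.2 values, while the 0% group mean is taken directly from the solution (32) rather than recomputed. Note that the quoted result uses unrounded means, so plugging in the rounded values 38.77 and 39.05 gives a slightly different number.

```python
# Daily sales (kg) for the 10% and 20% discount groups, from Table 7.2.
sales_10 = [34, 41, 45, 39, 38, 33, 35, 41, 47, 34,
            47, 44, 46, 38, 42, 33, 37, 45, 38, 44,
            35, 34, 34, 37, 39, 34, 34, 36, 41, 38]
sales_20 = [43, 44, 46, 41, 52, 43, 42, 50, 41, 42,
            55, 47, 48, 41, 42, 45, 48, 41, 47, 55,
            43, 47, 55, 49, 46, 55, 42, 40, 50, 52]

k = 3          # number of groups
n_i = 30       # observations per group
mean_0 = 32.0  # assumption: 0% group mean as stated in the solution
mean_10 = sum(sales_10) / len(sales_10)
mean_20 = sum(sales_20) / len(sales_20)
group_means = [mean_0, mean_10, mean_20]

# With equal group sizes the overall mean is the average of group means.
overall_mean = sum(group_means) / k

ssb = n_i * sum((m - overall_mean) ** 2 for m in group_means)
msb = ssb / (k - 1)

print(round(mean_10, 2), round(mean_20, 2), round(overall_mean, 2))
print(round(ssb, 3), round(msb, 3))
```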
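The sum of squares of within the group variation is given by
$$SSW = \sum_{j=1}^{30}\left(Y_{1j}-\mu_1\right)^2 + \sum_{j=1}^{30}\left(Y_{2j}-\mu_2\right)^2 + \sum_{j=1}^{30}\left(Y_{3j}-\mu_3\right)^2$$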
EXAMPLE 7.2
Share Raja Khan (SRK) is a top stockbroker and believes that the average annual
stock return depends on the industrial sector. To validate his belief, SRK collected
annual return of shares from three different industrial sectors - consumer goods,
services, and industrial goods. The annual return of shares in 2015-2016 for different
sectors is shown in Table 7.4.
8.73% 13.85% 5.29% 9.06% 2.84% 5.82% 7.66% 4.12% 9.10% 8.76%
10.77% 1.48% 4.719% 10.66% 0.44% 2.94% 6.55% 2.84% 3.90% 7.28%
$$SSW = \sum_{i=1}^{3}\sum_{j=1}^{30}\left(Y_{ij}-\mu_i\right)^2$$
The critical F-value with degrees of freedom (2, 87) for $\alpha = 0.05$ is 3.101 [Excel function FINV(0.05, 2, 87) or F.INV.RT(0.05, 2, 87)]. The p-value for F-statistic = 2.592 is 0.080 [using Excel function FDIST(2.592, 2, 87) or F.DIST.RT(2.592, 2, 87)]. Since the calculated F-statistic is less than the critical F-value, we retain the null hypothesis and conclude that the average annual returns under the industrial sectors consumer goods, services, and industrial goods are not different (Figure 7.2 shows the F-critical value and F-statistic value for an F-distribution with degrees of freedom 2 and 87 for numerator and denominator, respectively). The Excel output of ANOVA is shown in Table 7.5.
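The critical value and p-value quoted above can be cross-checked without Excel. The sketch below relies on a closed form for the right tail of the F-distribution that holds only when the numerator degrees of freedom equal 2, as they do here: $P(F > f) = (1 + 2f/\nu_2)^{-\nu_2/2}$. The function names are hypothetical; for other degrees of freedom a statistics library would be needed.

```python
def f_right_tail_df1_2(f_value: float, df2: int) -> float:
    """P(F > f_value) for an F-distribution with (2, df2) degrees of freedom."""
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)


def f_critical_df1_2(alpha: float, df2: int) -> float:
    """Right-tail critical value f with P(F > f) = alpha, for (2, df2) df."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


alpha = 0.05
df2 = 87        # denominator degrees of freedom, n - k = 90 - 3
f_stat = 2.592  # calculated F-statistic from Example 7.2

f_crit = f_critical_df1_2(alpha, df2)
p_value = f_right_tail_df1_2(f_stat, df2)
reject_null = f_stat > f_crit   # right-tailed test

print(round(f_crit, 3), round(p_value, 4), reject_null)
```

The decision can be read either way: the F-statistic falls below the critical value, and equivalently the p-value exceeds $\alpha$, so the null hypothesis is retained.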
FIGURE 7.2 F-distribution with degrees of freedom (2, 87): F-statistic value = 2.592, F-critical value = 3.101.
EXAMPLE 7.3
Table 7.7 shows the sales quantity of detergents at different discount values and different locations collected over 20 days. Conduct a two-way ANOVA at $\alpha = 0.05$ to test the effects of discounts and location on the sales.
TABLE 7.7 Sales quantity at different locations under different discount rates
Location 1 Location 2
Discount Discount
0% 10% 20% 0% 10% 20%
20 28 32 20 19 20
16 23 29 21 27 31
24 25 28 23 23 35
20 31 7 s 19 30 25
19 25 30 25 25 31
10 24 26 22 21 31
24 28 37 25 33 31
16 23 33 26 23
25 26 27 26 22 27
16 25 31 22 28 32
18 22 37 25 24 22
20 24 28 23 23 29
17 26 25 23 26 25
26 28 23 24 16 34
16 21 26 20 30 30
21 27 33 23 22 25
24 25 28 18 16 39
19 20 30 19 25 32
19 26 30 19 34 29
21 26 26 30 23 22
The two-way ANOVA with replication (since the data in Table 7.7 is repeated for
locations) output from Microsoft Excel is shown in Table 7.8.
In Table 7.8, the sample stands for the row factor (which in this case is location), column stands for the column factor (discount in this case), and interaction stands for the interaction effect (location × discount). The p-value for locations (data in rows) is 0.5065, thus it is not statistically significant (we retain the null hypothesis that the locations have no statistically significant influence on sales), whereas for discount rates (data in columns) the p-value is $1.06 \times 10^{-15}$, so we reject the null hypothesis (that is, discount rate has influence on sales). The p-value for the interaction effect is 0.0724 and is not significant. That is, only the factor discount is statistically significant at $\alpha = 0.05$.
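The sums of squares behind a two-way ANOVA with replication can be sketched in plain Python. The function `two_way_anova` below is a hypothetical helper, not the Excel procedure, and it assumes a balanced layout (the same number of replicates in every cell). To keep it short, the demonstration uses only the first two days of Table 7.7, so its numbers will not match Table 7.8.

```python
def two_way_anova(cells):
    """Sums of squares, degrees of freedom and F-ratios for a balanced layout.

    cells[i][j] holds the replicate observations for row level i and
    column level j; every cell must contain the same number of values.
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])

    def mean(values):
        return sum(values) / len(values)

    everything = [y for row in cells for cell in row for y in cell]
    grand = mean(everything)
    row_means = [mean([y for cell in row for y in cell]) for row in cells]
    col_means = [mean([y for row in cells for y in row[j]]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss = {
        "rows": b * r * sum((m - grand) ** 2 for m in row_means),
        "cols": a * r * sum((m - grand) ** 2 for m in col_means),
        "interaction": r * sum(
            (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
            for i in range(a) for j in range(b)),
        "within": sum((y - cell_means[i][j]) ** 2
                      for i in range(a) for j in range(b)
                      for y in cells[i][j]),
        "total": sum((y - grand) ** 2 for y in everything),
    }
    df = {"rows": a - 1, "cols": b - 1,
          "interaction": (a - 1) * (b - 1), "within": a * b * (r - 1)}
    ms_within = ss["within"] / df["within"]
    # Each effect is tested against the within-cell mean square.
    f_ratio = {name: (ss[name] / df[name]) / ms_within
               for name in ("rows", "cols", "interaction")}
    return ss, df, f_ratio


# Rows = locations, columns = discounts (0%, 10%, 20%); two replicates
# per cell, taken from the first two days of Table 7.7.
cells = [
    [[20, 16], [28, 23], [32, 29]],   # Location 1
    [[20, 21], [19, 27], [20, 31]],   # Location 2
]
ss, df, f_ratio = two_way_anova(cells)
print({name: round(value, 3) for name, value in ss.items()})
print(df)
print({name: round(value, 3) for name, value in f_ratio.items()})
```

Each F-ratio would then be compared with the right-tail critical value of the F-distribution for its own numerator degrees of freedom and the within-cell degrees of freedom.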
SUMMARY
1. Analysis of Variance (ANOVA) is a hypothesis testing procedure used for comparing means from several groups simultaneously.
2. In a one-way ANOVA, we test whether the mean values of an outcome variable for different levels of a factor are different. Using multiple two-sample t-tests to simultaneously test group means will result in incorrect estimation of the Type I error, and ANOVA overcomes this problem.
3. ANOVA plays an important role in multiple linear regression model diagnostics. The overall significance of the model is tested using ANOVA.
4. In a two-way ANOVA, we check the impact of more than one factor simultaneously on several groups.
3. An original equipment manufacturer of a washing machine is interested in finding the impact of three different technologies on the reliability of the washing machine. Data on time between failures (in number of days) of the washing machine manufactured using different technologies is shown in Table 7.10. Conduct a one-way ANOVA at $\alpha = 0.01$ to check whether the mean times between failures are different for different technologies.
TABLE 7.10 Time between failures of washing machine under different technologies
Technology 1
327 366 270
340 324 326 319 358 287
195 292 307 250
271 343 327 304 59
392 293 252 315
292 303 328 298 294 353
336 295 339 290
327 299 298 324 363 337
313 329 274 407
451 370 331 413 371 322
Technology 2
369 385 296 360 330 360 353 345
362 334
357 363 329 346 404 403 325
352 360 275
Technology 3
375 403 437 418 375 410 358 305
352 419
367 400 360 349 375 395 405 382
432 418
389 427 391 363 380 419 376
400 327 320
TABLE 7.11 Data related to salary, degree discipline and year of experience
Engineering
1.36 1.54 1.97 1.95
1.57 1.26 1.53 1.45 1.26 1.52
0.84 1.68 1.66 1.77 1.02
1.6 1.64 0.76 1.38 2.16
Less than 2 years 1.7
1.77 0.9 1.39 2.08 1.8
2.47 1.75 1.45 1.86
1.63 2.14 2.61
2.09 1.77 2.07 2.15 1.18 2.18 1.89
1.73 2.46 1.61 2.07
More than 2 years 2.05 1.41 1.28 1.1 2.12 2.06
1.57 1.63 1.99 2.07
2.15 1.8 2.53 2.09 2.65 2.51
Science
1.97 1.77
1.57 1.62 0.76 1.85 1.18
1.18 1.47 1.72 1.55 1.29
Less than 2 years 1.09 1.31 1.3
1.43 1.44 0.98 1.16 1.75 0.94 1.64
2.15 1.43 1.77 1.81 1.59 1.64 0.85 2.8
1.59
More than 2 years 1.81 2.18 2.04
1.88 2.11 1.43 1.69 2.1 1.69
Commerce 1.79
1.18 2.3
2.23 1.99 2.78 1.91 2.72 2.13 2.18 1.37
Less than 2 years 2.21
2.09 2.03 2.18 1.6 2.1
2.27 1.09 2.25 2.13 2.72 1.75
1.3 2.03 1.5 1.82
More than 2 years 2.24 2.18 2.44 2.84
1.87 2.72
1.58 2.12 2.46 2.43 1.96 1.55 1.95