
Anova Test

The document discusses Analysis of Variance (ANOVA), a statistical method used to compare means across multiple groups simultaneously, which is essential when dealing with more than two samples. It highlights the differences between ANOVA and two-sample t-tests, emphasizing the importance of avoiding Type I and Type II errors when comparing multiple groups. The document also outlines the conditions for one-way ANOVA, the setup of hypotheses, and the calculations involved in determining the F-statistic.

7 | Analysis of Variance

"Analysis of variance is not a mathematical theorem, but rather a convenient method of arranging the arithmetic."
Ronald Fisher

LEARNING OBJECTIVES
LO7-1 Understand the need for Analysis of Variance (ANOVA).
LO7-2 Understand the difference between two-sample t-test for mean and ANOVA.
LO7-3 Understand one-way ANOVA and calculation of F-statistic.
LO7-4 Understand computation of within the group variation, between the group variation and F-statistic.
LO7-5 Learn to conduct a two-way ANOVA and the computations involved in conducting a two-way ANOVA.

In many situations we may have to conduct a hypothesis test to compare mean values simultaneously for more than two groups (samples) created using a factor (or factors). For example, a marketer may like to understand the impact of three different discount values (such as 0%, 10%, and 20% discount) on the average sales. When we have to compare the impact of a factor on the mean of more than two groups (created by different levels of the factor) simultaneously, hypothesis tests such as the two-sample t-tests discussed in Chapter 6 are not an ideal approach since they can result in incorrect Type I and Type II errors. We use the Analysis of Variance (ANOVA) to understand the differences in population means among more than two populations.

IMPORTANT
The objective of ANOVA is to check simultaneously whether population means from more than two populations are different.

7.1 | INTRODUCTION TO ANALYSIS OF VARIANCE (ANOVA)


Consider a retail store which would like to study the impact of different levels of price discounts (factor) on sales (outcome variable) of a specific product or brand. Price discount can range from 0% to 100% (theoretically). For easier understanding, assume that the levels of discounts are 0%, 10%, and 20%. The marketing manager would like to understand whether the variable 'price discount' has any significant impact on the average sales quantity. Such studies are called single factor studies. Different levels of the factor correspond to different levels of price discount. In experimental studies, the different levels (such as 0%, 10%, and 20%) are assigned randomly to different units. In the case of price discounts, units refer to different days chosen randomly, since the quantity of sales may also depend on the day of the week. It is possible that the weekend sales quantity may be higher than the sales quantity on weekdays. In many cases, we may deal with observational studies in which we observe the impact of a factor on a variable, for example, the impact of specialization in MBA (such as analytics, finance, marketing, etc.) on the income of graduating students. Here the units are not subjected to an experiment, and the factor is probably not under the control of the researcher. To understand whether the effect (different levels of a factor) has any statistical significance on the population parameter, we compare the two models described below:

1. Means Model: It is given by

$$Y_{ij} = \mu + \varepsilon_{ij} \tag{7.1}$$

where $Y_{ij}$ is the value of the outcome variable of the jth observation for the ith factor level, $\mu$ is the mean value of all observations, and $\varepsilon_{ij}$ is the error, assumed to follow a normal distribution with mean 0 and standard deviation $\sigma$. The model defined in Eq. (7.1) is often called the reduced model, in which the mean is common for all levels of the factor. Since we assume that the error $\varepsilon_{ij}$ follows a normal distribution with mean 0 and standard deviation $\sigma$, the outcome variable $Y_{ij}$ follows a normal distribution with mean $\mu$ and standard deviation $\sigma$.

2. Factor Effect Model: It is given by

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij} \tag{7.2}$$

In Eq. (7.2), $\mu$ is the overall mean and $\tau_i$ is the effect of factor level i (or factor effect). $\tau_i$ is the difference between the overall mean and the factor level mean. Our interest in this case would be to check whether the values of $\tau_i$ are different from zero. The model in Eq. (7.2) is called the full model. The reduced model in Eq. (7.1) is a special case of the model defined in Eq. (7.2) in which $\tau_i$ is zero for all levels of the factor.
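The difference between the reduced model in Eq. (7.1) and the full model in Eq. (7.2) can be illustrated with a small simulation. The sketch below is only an illustration: the overall mean (39), the standard deviation (5), the factor effects and the function names are hypothetical values chosen for the example, not figures from the text.

```python
import random
import statistics


def simulate_means_model(mu, sigma, n_per_level, levels, rng):
    """Reduced model (7.1): Y_ij = mu + e_ij, one common mean for every level."""
    return {lvl: [mu + rng.gauss(0.0, sigma) for _ in range(n_per_level)]
            for lvl in levels}


def simulate_factor_effect_model(mu, tau, sigma, n_per_level, rng):
    """Full model (7.2): Y_ij = mu + tau_i + e_ij, one effect tau_i per level."""
    return {lvl: [mu + effect + rng.gauss(0.0, sigma) for _ in range(n_per_level)]
            for lvl, effect in tau.items()}


rng = random.Random(7)  # fixed seed so the sketch is repeatable
tau = {"0%": -7.0, "10%": 0.0, "20%": 7.0}  # hypothetical factor effects

reduced = simulate_means_model(39.0, 5.0, 30, list(tau), rng)
full = simulate_factor_effect_model(39.0, tau, 5.0, 30, rng)

for name, data in (("reduced", reduced), ("full", full)):
    level_means = {lvl: round(statistics.mean(ys), 2) for lvl, ys in data.items()}
    print(name, level_means)
```

Under the reduced model the three level means differ only by sampling noise, whereas under the full model they are shifted by the assumed effects.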

IMPORTANT
A non-zero $\tau_i$ value in Eq. (7.2) implies that the factor has influence on the value of the outcome variable Y.

IMPORTANT
In ANOVA, our objective is to verify whether the variation due to treatment is different from the variation due to randomness.

7.2 | MULTIPLE t-TESTS FOR COMPARING SEVERAL MEANS


Continuing with the example from Section 7.1, if we had only two values for price discount, then we could have used the two-sample t-test to check whether there is a statistically significant relationship between price discount and average sales quantity. When we have more than two levels of discounts, one option is to compare the population parameters two at a time (two discount values). For example, we can compare the following three cases using two-sample t-tests:

1. Test between 0% and 10%
2. Test between 0% and 20%
3. Test between 10% and 20%

However, when we want to test the hypothesis simultaneously, the Type I and Type II errors will not be the same if we conduct the three different tests listed above. For example, assume that the mean sales (population means) at 0%, 10%, and 20% discount are $\mu_0$, $\mu_{10}$, and $\mu_{20}$, respectively. Consider the three two-sample t-tests shown in Table 7.1. Let

P(A) = P(Retain $H_0$ in test A | $H_0$ in test A is true)
P(B) = P(Retain $H_0$ in test B | $H_0$ in test B is true)
P(C) = P(Retain $H_0$ in test C | $H_0$ in test C is true)

Note that P(A) = P(B) = P(C) = 1 - α = 1 - 0.05 = 0.95. The conditional probability of simultaneously retaining all 3 null hypotheses when they are true, treating the three tests as independent, is P(A ∩ B ∩ C) = 0.95³ = 0.8574. Now consider the following null hypothesis:

$$H_0: \mu_0 = \mu_{10} = \mu_{20} \tag{7.3}$$

If we retain the null hypothesis based on the three individual t-tests, then the significance or Type I error is not α, but much higher than α (Lunney, 1969; Siegel, 1990). For the case discussed above, if we retain the null hypothesis based on 3 individual tests, then the Type I error is 1 - 0.8574 = 0.1426. That is, when more than 2 groups are involved, checking the population parameter values simultaneously using t-tests is inappropriate since the Type I and Type II errors will be estimated incorrectly. For this reason, we use analysis of variance (ANOVA) whenever we need to compare 3 or more groups for population parameter values simultaneously.
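The arithmetic behind the inflated Type I error can be sketched in a few lines of Python. The function name is illustrative, and the calculation assumes the individual tests are independent, which is the simplification used in the calculation above.

```python
import math


def familywise_type1_error(alpha, num_tests):
    """Probability of at least one false rejection across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


num_levels = 3                                  # 0%, 10% and 20% discount
num_pairwise_tests = math.comb(num_levels, 2)   # number of two-sample t-tests
error_three_tests = familywise_type1_error(0.05, num_pairwise_tests)

print(num_pairwise_tests, round(error_three_tests, 4))
```

With three levels there are three pairwise tests and the combined Type I error is about 0.1426; the same function shows how quickly the error grows as the number of levels increases.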

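TABLE 7.1 Three different two-population t-tests

| Test | Null Hypothesis | Alternative Hypothesis | Significance (α) |
|---|---|---|---|
| A | $H_0: \mu_0 = \mu_{10}$ | $H_A: \mu_0 \neq \mu_{10}$ | α = 0.05 |
| B | $H_0: \mu_0 = \mu_{20}$ | $H_A: \mu_0 \neq \mu_{20}$ | α = 0.05 |
| C | $H_0: \mu_{10} = \mu_{20}$ | $H_A: \mu_{10} \neq \mu_{20}$ | α = 0.05 |
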
7.3 | ONE-WAY ANALYSIS OF VARIANCE (ANOVA)
One-way ANOVA is appropriate under the following conditions:
1. We would like to study the impact of a single treatment (also known as a factor) at different levels (thus forming different groups) on a continuous response variable (or outcome variable). In the example discussed in Section 7.1, the variable 'price discount' is the treatment (or factor), and 0%, 10%, and 20% price discounts are the different levels (3 levels in this case). Different levels of discount are likely to have varying impact on the sales of the product, where sales is the outcome variable. We would like to understand the impact of different levels of price discount on the response variable, sales. The term 'treatment' is used since one of the initial applications of ANOVA was to find the impact of different fertilizer treatments on agricultural yield, as studied by British statistician R A Fisher (1934).
2. In each group, the population response variable follows a normal distribution and the sample subjects are chosen using random sampling.
3. The population variances for different groups are assumed to be the same. That is, the variability in the response variable values within different groups is the same.
Although conditions 2 and 3 are necessary for one-way ANOVA, the model is robust and minor violations of the assumptions may not result in an incorrect decision about the null hypothesis. However, as a good practice, we need to check whether conditions 2 and 3 are met to ensure validity of ANOVA. The normality assumption can be checked either using a P-P plot (probability-probability plot) or using goodness of fit tests such as the chi-square goodness of fit test. The equality of variance can be checked through the hypothesis test for equal variance discussed in Chapter 6.

7.3.1 | Setting up an Analysis of Variance


Assume that we would like to study the impact of a factor (such as discount) with k levels on a continuous variable (such as sales quantity). Then the null and alternative hypotheses for one-way ANOVA are given by

$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$$
$$H_A: \text{Not all } \mu \text{ values are equal}$$

Note that the alternative hypothesis, 'not all $\mu$ values are equal', implies that some of them could be equal. The null hypothesis is equivalent to stating that the factor effects $\tau_1, \tau_2, \ldots, \tau_k$ defined in Eq. (7.2) are zero. The hypothesis test can be visualized as shown in Figure 7.1. Different values of mean ($\mu_1$, $\mu_2$, and $\mu_3$) imply a statistically significant impact of factor levels on the response variable. We expect the group means ($\mu_1$, $\mu_2$, and $\mu_3$) to be closer to one another if the factor levels do not have any impact.
If the mean values of different groups are not equal, then the variation within the groups will be much smaller compared to the variation between groups. Assume that we are interested in analyzing a single factor effect with k levels; thus we will have k groups.

FIGURE 7.1 Comparing three means ($\mu_1$, $\mu_2$, and $\mu_3$).


Let

k = Number of groups (or samples)
$n_i$ = Number of observations in group i (i = 1, 2, ..., k)
$n = \sum_{i=1}^{k} n_i$ = Total number of observations
$Y_{ij}$ = Observation j in group i
$\mu_i = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}$ = Mean of group i
$\mu = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} Y_{ij}$ = Overall mean

To arrive at the statistic, we calculate the following measures, which are variations within group and between groups:

1. Sum of Squares of Total Variation (SST): Total variation is the sum of squares variation of all values of the response variable ($Y_{ij}$) from the overall mean ($\mu$) and is given by

$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \mu)^2 \tag{7.4}$$

The degrees of freedom for SST is (n - 1) since only the value of $\mu$ is estimated from n observations and thus only one degree of freedom is lost. Mean Square Total (MST) variation is given by

$$MST = \frac{SST}{n-1} \tag{7.5}$$

2. Sum of Squares of Between (SSB) Group Variation: Sum of squares of between group variation is the sum of squares variation between the group mean ($\mu_i$) and the overall mean ($\mu$) of the data and is given by

$$SSB = \sum_{i=1}^{k} n_i (\mu_i - \mu)^2 \tag{7.6}$$

The degrees of freedom for SSB is (k - 1). Since the overall mean $\mu$ is estimated from the data, one degree of freedom is lost. Mean square between variation (MSB) is given by

$$MSB = \frac{SSB}{k-1} \tag{7.7}$$

3. Sum of Squares of Within (SSW) Group Variation: Sum of squares of within the group variation is the sum of squares variation of all observations ($Y_{ij}$) from that group mean ($\mu_i$) and is given by

$$SSW = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2 \tag{7.8}$$

The degrees of freedom for SSW is (n - k). Here k degrees of freedom are lost since we estimate k group means ($\mu_i$). The mean square of variation within the group is

$$MSW = \frac{SSW}{n-k} \tag{7.9}$$

We can prove algebraically that

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \mu)^2 = \sum_{i=1}^{k} n_i (\mu_i - \mu)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2 \tag{7.10}$$

That is

$$SST = SSB + SSW \tag{7.11}$$
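The algebraic step behind Eq. (7.10) can be outlined as follows. This is a sketch of the standard argument using the same symbols as above, with $\mu_i$ taken to be the sample mean of group i; it is not quoted from the text.

```latex
\begin{aligned}
\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\mu)^2
  &= \sum_{i=1}^{k}\sum_{j=1}^{n_i}\bigl[(Y_{ij}-\mu_i)+(\mu_i-\mu)\bigr]^2 \\
  &= \sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\mu_i)^2
     + \sum_{i=1}^{k} n_i(\mu_i-\mu)^2
     + 2\sum_{i=1}^{k}(\mu_i-\mu)\sum_{j=1}^{n_i}(Y_{ij}-\mu_i) \\
  &= SSW + SSB + 0 .
\end{aligned}
```

The cross term vanishes because the deviations of the observations in a group from their own group mean sum to zero, that is, $\sum_{j=1}^{n_i}(Y_{ij}-\mu_i)=0$ for every group i.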

7.3.2 | Cochran's Theorem


According to Cochran's theorem (Kutner et al., 2013, page 70):
"If $Y_1, Y_2, \ldots, Y_n$ are drawn from a normal distribution with mean $\mu$ and standard deviation $\sigma$ and sum of squares of total variation [Eq. (7.11)] is decomposed into k sums of squares ($SS_r$) with degrees of freedom $df_r$, then the ratios ($SS_r/\sigma^2$) are independent $\chi^2$ variables with $df_r$ degrees of freedom if $\sum_{r=1}^{k} df_r = n - 1$."

Note that, in Eq. (7.11), the SST is decomposed into two sums of squares (SSB and SSW) and thus SSB/$\sigma^2$ and SSW/$\sigma^2$ are chi-square variables.

7.3.3 | The F-Test


If the null hypothesis is true, then there will be no difference in the mean values, which will result in no difference between MSB and MSW. Alternatively, if the means are different, then MSB will be larger than MSW. That is, the ratio MSB/MSW will be close to 1 if there is no difference between the mean values and will be larger than 1 if the means are different. Following Cochran's theorem (Kirk, 1995), MSB/MSW is a ratio of two independent chi-square variates, each divided by its degrees of freedom, which follows an F-distribution. Thus the statistic for testing the null hypothesis is

$$F = \frac{SSB/(k-1)}{SSW/(n-k)} = \frac{MSB}{MSW} \tag{7.12}$$

Note that the test is a one-tailed test (right-tailed) since we are interested in finding whether the variation between groups is greater than the variation within the groups. Although we are checking whether the means are equal in the null hypothesis, the actual testing is carried out by checking whether the variation between groups is higher than within the groups, thus it is a one-tailed (right-tailed) test. It is important to note that rejecting the null hypothesis will not tell us exactly which means differ from each other, but it will only indicate that there is a difference in at least one of the group means. We may have to conduct two-sample t-tests to find which mean values are different.
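The computation of SSB, SSW and the F-statistic in Eq. (7.12) can be sketched in plain Python. The function name and the three small groups below are hypothetical, chosen so that the result can be checked by hand (group means 2, 3 and 7; overall mean 4).

```python
from statistics import mean


def one_way_anova(groups):
    """Return the sums of squares, mean squares and F for a list of groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    overall_mean = sum(sum(g) for g in groups) / n
    group_means = [mean(g) for g in groups]

    # Between-group variation: group size times squared distance of group mean
    ssb = sum(len(g) * (m - overall_mean) ** 2
              for g, m in zip(groups, group_means))
    # Within-group variation: squared distance of each value from its group mean
    ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    return {"SSB": ssb, "SSW": ssw, "MSB": msb, "MSW": msw,
            "F": msb / msw, "df": (k - 1, n - k)}


toy_groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]  # made-up data
result = one_way_anova(toy_groups)
print(result)
```

For this toy data SSB = 42 and SSW = 6, so F = (42/2)/(6/6) = 21 with (2, 6) degrees of freedom. The F value would then be compared with the right-tail critical value of the F-distribution.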

EXAMPLE 7.1

Ms Rachael Khanna, the brand manager of ENZO detergent powder at the 'one stop' retail store, was interested in understanding whether price discounts have any impact on the sales quantity of ENZO. To test whether the price discounts had any impact, price discounts of 0% (no discount), 10% and 20% were given on randomly selected days. The quantity (in kilograms) of ENZO sold in a day under different discount levels is shown in Table 7.2. Conduct a one-way ANOVA to check whether discount had any significant impact on the average sales quantity at α = 0.05.

TABLE 7.2 Sales of ENZO at different price discounts

No Discount (0% discount)
39 32 25 25 37 28 26 26 40 29
37 34 28 36 38 38 34 31 39 36
34 25 33 26 33 26 26 27 32 40

10% Discount
34 41 45 39 38 33 35 41 47 34
47 44 46 38 42 33 37 45 38 44
35 34 34 37 39 34 34 36 41 38

20% Discount
43 44 46 41 52 43 42 50 41 42
55 47 48 41 42 45 48 41 47 55
43 47 55 49 46 55 42 40 50 52

Solution:
In this case, the number of groups k = 3; $n_1 = n_2 = n_3 = 30$; $\mu_1 = 32$, $\mu_2 = 38.77$, $\mu_3 = 46.4$; and $\mu = 39.05$.
The sum of squares of between groups variation (SSB), computed with unrounded means, is given by

$$SSB = \sum_{i=1}^{3} n_i (\mu_i - \mu)^2 = 30 \times [(32 - 39.05)^2 + (38.77 - 39.05)^2 + (46.4 - 39.05)^2] = 3114.156$$

So

$$MSB = \frac{SSB}{k-1} = \frac{3114.156}{2} = 1557.078$$

The sum of squares of within the group variation is given by

$$SSW = \sum_{i=1}^{3}\sum_{j=1}^{30} (Y_{ij} - \mu_i)^2 = \sum_{j=1}^{30} (Y_{1j} - 32)^2 + \sum_{j=1}^{30} (Y_{2j} - 38.77)^2 + \sum_{j=1}^{30} (Y_{3j} - 46.4)^2 = 2056.567$$

$$MSW = \frac{SSW}{n-k} = \frac{2056.567}{90 - 3} = 23.6387$$

The F-statistic value is

$$F = \frac{MSB}{MSW} = \frac{1557.078}{23.6387} = 65.87$$

The critical F-value with degrees of freedom (2, 87) for α = 0.05 is 3.101 [Excel function FINV(0.05, 2, 87) or F.INV.RT(0.05, 2, 87)]. The p-value for $F_{2,87}$ = 65.87 is 3.82 × 10^-18 [using Excel function FDIST(65.87, 2, 87) or F.DIST.RT(65.87, 2, 87)]. Since the calculated F-statistic is much higher than the critical F-value, we reject the null hypothesis and conclude that the mean sales quantity values under different discounts are different. The Excel output of ANOVA is shown in Table 7.3.
TABLE 7.3 One-way ANOVA Excel output for Example 7.1

Anova: Single Factor

SUMMARY
| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| No Discount | 30 | 960 | 32 | 27.17241 |
| 10% Discount | 30 | 1163 | 38.76667 | 20.46092 |
| 20% Discount | 30 | 1392 | 46.4 | 23.28276 |

ANOVA
| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 3114.15556 | 2 | 1557.078 | 65.86986 | 3.82E-18 | 3.101296 |
| Within Groups | 2056.56667 | 87 | 23.6387 | | | |
| Total | 5170.7222 | 89 | | | | |
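The ANOVA rows of Table 7.3 can be reproduced from the SUMMARY block alone, because SSW equals the sum of $(n_i - 1)$ times each group variance. The sketch below is one possible way to do this in Python; the variable names are illustrative and the inputs are the counts, sums and variances reported in Table 7.3.

```python
# Group summaries copied from Table 7.3 (No Discount, 10% Discount, 20% Discount)
counts = [30, 30, 30]
sums = [960, 1163, 1392]
variances = [27.17241, 20.46092, 23.28276]

k = len(counts)
n = sum(counts)
grand_mean = sum(sums) / n
group_means = [s / c for s, c in zip(sums, counts)]

# Between-group and within-group sums of squares from the summaries
ssb = sum(c * (m - grand_mean) ** 2 for c, m in zip(counts, group_means))
ssw = sum((c - 1) * v for c, v in zip(counts, variances))

msb = ssb / (k - 1)
msw = ssw / (n - k)
f_stat = msb / msw

print(round(ssb, 3), round(ssw, 3), round(f_stat, 2))
```

The values agree with the SS and F columns of Table 7.3 up to rounding of the reported variances.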
Example 7.1 is an experimental study in which the marketer was trying to study the impact of discounts on sales. Example 7.2 is an observational study in which we understand the impact of different sectors on stock returns.

EXAMPLE 7.2

Share Raja Khan (SRK) is a top stockbroker and believes that the average annual stock return depends on the industrial sector. To validate his belief, SRK collected annual return of shares from three different industrial sectors - consumer goods, services, and industrial goods. The annual return of shares in 2015-2016 for different sectors is shown in Table 7.4.

TABLE 7.4 Annual return of stocks under different industrial sectors


Annual return on 30 consumer goods stocks
6.32% 14.73% 11.95% 12.36% 10.28% 3.81% 10.15% 11.06% 6.29% 5.15%
8.44% 14.28% 8.89% 5.98% 6.96% 11.62% 5.22% 5.34% 5.93% 7.10%
10.91% 8.20% 10.19% 9.04% 8.61% 9.39% 2.63% 2.77% 4.76% 9.60%

Annual return on 30 services stocks
13.70% 3.58% 1.36% 17.41% 10.01% 10.88% 15.63% -0.04% 10.32% 7.40%
11.48% 9.71% 11.19% 8.21% 1.64% 1.45% 10.12% 13.85% -10.27% 5.26%
12.05% 4.47% 8.71% 5.59% 10.02% 7.65% 10.03% 7.87% 6.59% 13.60%

Annual return on 30 industrial goods stocks
6.74% 7.11% 5.69% 2.48% 5.42% 8.00% 2.55% 8.34% 4.99% 3.39%
8.73% 13.85% 5.29% 9.06% 2.84% 5.82% 7.66% 4.12% 9.10% 8.76%
10.77% 1.48% 4.71% 10.66% 0.44% 2.94% 6.55% 2.84% 3.90% 7.28%

Solution: In this case, the number of groups k = 3; $n_1 = n_2 = n_3 = 30$; $\mu_1 = 0.0827$, $\mu_2 = 0.0798$, $\mu_3 = 0.0605$; and $\mu = 0.0743$.
The sum of squares of between groups (SSB) variation is given by

$$SSB = \sum_{i=1}^{3} n_i (\mu_i - \mu)^2 = 30 \times [(0.0827 - 0.0743)^2 + (0.0798 - 0.0743)^2 + (0.0605 - 0.0743)^2] = 0.0087$$

Therefore

$$MSB = \frac{SSB}{k-1} = \frac{0.008722}{2} = 0.004361$$

The sum of squares of within the group variation is given by

$$SSW = \sum_{i=1}^{3}\sum_{j=1}^{30} (Y_{ij} - \mu_i)^2 = \sum_{j=1}^{30} (Y_{1j} - 0.0827)^2 + \sum_{j=1}^{30} (Y_{2j} - 0.0798)^2 + \sum_{j=1}^{30} (Y_{3j} - 0.0605)^2 = 0.1463$$

So

$$MSW = \frac{SSW}{n-k} = \frac{0.146317}{90 - 3} = 0.001682$$

The F-statistic value is

$$F = \frac{MSB}{MSW} = \frac{0.004361}{0.001682} = 2.593$$

The critical F-value with degrees of freedom (2, 87) for α = 0.05 is 3.101 [Excel function FINV(0.05, 2, 87) or F.INV.RT(0.05, 2, 87)]. The p-value for $F_{2,87}$ = 2.593 is 0.0806 [using Excel function FDIST(2.593, 2, 87) or F.DIST.RT(2.593, 2, 87)]. Since the calculated F-statistic is less than the critical F-value, we retain the null hypothesis and conclude that the average annual returns under the industrial sectors consumer goods, services, and industrial goods are not different (Figure 7.2 shows the F-critical value and F-statistic value for an F-distribution with degrees of freedom 2 and 87 for numerator and denominator, respectively). The Excel output of ANOVA is shown in Table 7.5.
FIGURE 7.2 F-distribution with critical value for Example 7.2 (F-statistic value = 2.592; F-critical value = 3.101; rejection region at α = 0.05 lies to the right of the critical value).

TABLE 7.5 Microsoft Excel ANOVA table for Example 7.2

ANOVA: Single Factor

SUMMARY
| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| Consumer Goods | 30 | 2.4796 | 0.082653 | 0.00101 |
| Services | 30 | 2.3947 | 0.079823 | 0.003073 |
| Industrial Goods | 30 | 1.8151 | 0.060503 | 0.000963 |

ANOVA
| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 0.008722 | 2 | 0.004361 | 2.59294 | 0.080572 | 3.101296 |
| Within Groups | 0.146317 | 87 | 0.001682 | | | |
| Total | 0.155039 | 89 | | | | |
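The p-values and the critical value reported by Excel in Examples 7.1 and 7.2 can be cross-checked without a statistics library. When the numerator has exactly 2 degrees of freedom (three groups), the right-tail probability of the F-distribution has the closed form $P(F > f) = (1 + 2f/d)^{-d/2}$, where d is the denominator degrees of freedom. The sketch below relies on that special case only; the function names are illustrative and the formula does not apply to other numerator degrees of freedom.

```python
def f_right_tail_two_numerator_df(f_value, df_denominator):
    """P(F > f_value) for an F-distribution with (2, df_denominator) df."""
    return (1.0 + 2.0 * f_value / df_denominator) ** (-df_denominator / 2.0)


def f_critical_two_numerator_df(alpha, df_denominator):
    """Value c with P(F > c) = alpha for an F-distribution with (2, df_denominator) df."""
    return (df_denominator / 2.0) * (alpha ** (-2.0 / df_denominator) - 1.0)


f_crit = f_critical_two_numerator_df(0.05, 87)
p_example_7_1 = f_right_tail_two_numerator_df(65.86986, 87)  # F from Table 7.3
p_example_7_2 = f_right_tail_two_numerator_df(2.59294, 87)   # F from Table 7.5

print(round(f_crit, 4), p_example_7_1, round(p_example_7_2, 4))
```

The results match the F crit (about 3.1013) and P-value entries of Tables 7.3 and 7.5. For general degrees of freedom, a statistics package or the Excel functions named in the examples would be used instead.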

7.4 | TWO-WAY ANALYSIS OF VARIANCE (ANOVA)


The values of the response variable may be influenced by several factors. For example, in addition to price discounts, location of the stores may also play an important role in the sales quantity. The discounts may not have much impact if the store is located near an affluent community compared to stores located near a non-affluent community. We would like to understand the impact of both factors (price discount and location) simultaneously on sales by trying to answer the following questions:
1. Are there differences in the average sales quantity with different levels of price discounts?
2. Are there differences in the average sales quantity with respect to different locations?
3. Are there interactions between price discounts and location with respect to average sales quantity?

The two-way ANOVA model can be expressed as

$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \tag{7.13}$$

where
$Y_{ijk}$ = Value of the kth observation (k = 1, 2, ..., c) of the response variable at level i (i = 1, 2, ..., a) of factor A and level j (j = 1, 2, ..., b) of factor B
$\mu$ = Overall mean value of the response variable Y
$\alpha_i$ = Effect of level i of factor A (i = 1, 2, ..., a)
$\beta_j$ = Effect of level j of factor B (j = 1, 2, ..., b)
$(\alpha\beta)_{ij}$ = Interaction of the ith level of factor A and the jth level of factor B
$\varepsilon_{ijk}$ = Error associated with the kth observation at level i of factor A and level j of factor B

The hypothesis tests associated with two-way ANOVA are as follows:

1. Test of Factor A Main Effects:
$H_0$: $\alpha_i = 0$ for all i (i = 1, 2, ..., a)
$H_A$: Not all $\alpha_i$ are zero

2. Test of Factor B Main Effects:
$H_0$: $\beta_j = 0$ for all j (j = 1, 2, ..., b)
$H_A$: Not all $\beta_j$ are zero

3. Test of Interaction Effects:
$H_0$: $(\alpha\beta)_{ij} = 0$ for all i (i = 1, 2, ..., a) and j (j = 1, 2, ..., b)
$H_A$: Not all $(\alpha\beta)_{ij}$ are zero

The sum of squares in the case of two-way ANOVA with equal sample sizes is given by (Fisher, 1934)

$$SST = SSA + SSB + SSAB + SSW \tag{7.14}$$

Various components in Eq. (7.14) are provided below:

1. Sum of squares of total deviation (SST):

$$SST = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (Y_{ijk} - \mu)^2 \tag{7.15}$$

where c is the number of observations in each group and $\mu$ is the overall mean.

2. Sum of squares of deviation due to factor A (SSA):

$$SSA = b \times c \times \sum_{i=1}^{a} (\mu_{i\cdot} - \mu)^2 \tag{7.16}$$

where $\mu_{i\cdot}$ is the mean of all observations in level i of factor A and c is the number of observations in each group (assumed to be the same for all groups).

3. Sum of squares of deviation due to factor B (SSB):

$$SSB = a \times c \times \sum_{j=1}^{b} (\mu_{\cdot j} - \mu)^2 \tag{7.17}$$

Here $\mu_{\cdot j}$ is the mean of all observations in level j of factor B.

4. Sum of squares of deviation due to interaction of factors A and B (SSAB):

$$SSAB = c \times \sum_{i=1}^{a}\sum_{j=1}^{b} (\mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu)^2 \tag{7.18}$$

where $\mu_{ij}$ is the average of the observations at the ith level of factor A and the jth level of factor B.

5. Sum of squares of deviation within a group (SSW), also called the error sum of squares (SSE):

$$SSW = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (Y_{ijk} - \mu_{ij})^2 \tag{7.19}$$
Different factors, degrees of freedom, and F-statistic for two-way ANOVA with equal number of samples are given in Table 7.6.

TABLE 7.6 Sum of squares of deviation for various effects and the corresponding F-statistic in a two-way ANOVA with equal number of samples

| Sum of Squares Variation | Degrees of Freedom | Mean Squared Variation | F-Statistic |
|---|---|---|---|
| SSA | a - 1 | MSA = SSA/(a - 1) | F = MSA/MSW |
| SSB | b - 1 | MSB = SSB/(b - 1) | F = MSB/MSW |
| SSAB | (a - 1)(b - 1) | MSAB = SSAB/[(a - 1)(b - 1)] | F = MSAB/MSW |
| SSW | ab(c - 1) | MSW = SSW/[ab(c - 1)] | |
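The sums of squares in Eqs. (7.15) to (7.19) and the F-statistics of Table 7.6 can be sketched for a balanced design as follows. The function name, the nested-list layout `cells[i][j]` and the 2 × 2 toy data with two observations per cell are assumptions made for this illustration, not data from the text.

```python
from statistics import mean


def two_way_anova(cells):
    """cells[i][j] holds the c observations at level i of factor A and level j of factor B."""
    a, b, c = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean(y for row in cells for cell in row for y in cell)
    mean_a = [mean(y for cell in row for y in cell) for row in cells]
    mean_b = [mean(y for row in cells for y in row[j]) for j in range(b)]
    mean_ab = [[mean(cell) for cell in row] for row in cells]

    ss = {
        "SST": sum((y - grand) ** 2 for row in cells for cell in row for y in cell),
        "SSA": b * c * sum((m - grand) ** 2 for m in mean_a),
        "SSB": a * c * sum((m - grand) ** 2 for m in mean_b),
        "SSAB": c * sum((mean_ab[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
                        for i in range(a) for j in range(b)),
        "SSW": sum((y - mean_ab[i][j]) ** 2
                   for i in range(a) for j in range(b) for y in cells[i][j]),
    }
    msw = ss["SSW"] / (a * b * (c - 1))
    f_stats = {
        "A": ss["SSA"] / (a - 1) / msw,
        "B": ss["SSB"] / (b - 1) / msw,
        "AB": ss["SSAB"] / ((a - 1) * (b - 1)) / msw,
    }
    return ss, f_stats


# Made-up balanced data: 2 levels of A (rows), 2 levels of B (columns), c = 2
toy_cells = [[[1, 3], [5, 7]],
             [[2, 4], [10, 12]]]
ss, f_stats = two_way_anova(toy_cells)
print(ss)
print(f_stats)
```

For the toy data SSA = 18, SSB = 72, SSAB = 8 and SSW = 8, which add up to SST = 106 as required by Eq. (7.14).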

EXAMPLE 7.3

Table 7.7 shows the sales quantity of detergents at different discount values and different locations collected over 20 days. Conduct a two-way ANOVA at α = 0.05 to test the effects of discounts and location on the sales.

TABLE 7.7 Sales quantity at different locations under different discount rates
Columns: Location 1 at 0%, 10%, 20% discount, followed by Location 2 at 0%, 10%, 20% discount
20 28 32 20 19 20
16 23 29 21 27 31
24 25 28 23 23 35
20 31 7 s 19 30 25
19 25 30 25 25 31
10 24 26 22 21 31
24 28 37 25 33 31
16 23 33 26 23
25 26 27 26 22 27
16 25 31 22 28 32
18 22 37 25 24 22
20 24 28 23 23 29
17 26 25 23 26 25
26 28 23 24 16 34
16 21 26 20 30 30
21 27 33 23 22 25
24 25 28 18 16 39
19 20 30 19 25 32
19 26 30 19 34 29
21 26 26 30 23 22

The two-way ANOVA with replication (since the data in Table 7.7 is repeated for
locations) output from Microsoft Excel is shown in Table 7.8.

TABLE 7.8 Two-way ANOVA with replication Excel output

ANOVA
| Source of Variation | SS | df | MS | F | p-value | F crit |
|---|---|---|---|---|---|---|
| Sample (Location) | 7.008333 | 1 | 7.008333 | 0.443898 | 0.506593 | 3.92433 |
| Columns (Discount) | 1240.317 | 2 | 620.1583 | 39.27997 | 1.06E-13 | 3.075853 |
| Interaction | 84.81667 | 2 | 42.40833 | 2.686085 | 0.07246 | 3.075853 |
| Within | 1799.85 | 114 | 15.78816 | | | |
| Total | 3131.992 | 119 | | | | |

In Table 7.8, the sample stands for the row factor (which in this case is location), column stands for the column factor (discount in this case), and interaction stands for the interaction effect (location × discount). The p-value for locations (data in rows) is 0.5066, thus it is not statistically significant (we retain the null hypothesis that the locations have no statistically significant influence on sales), whereas for discount rates (data in columns) the p-value is 1.06 × 10^-13, so we reject the null hypothesis (that is, discount rate has influence on sales). The p-value for the interaction effect is 0.0725, which is not significant. That is, only the factor discount is statistically significant at α = 0.05.
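The MS and F columns of Table 7.8 follow directly from the SS and df columns, as laid out in Table 7.6. The sketch below recomputes them in Python from the values printed in Table 7.8. For the two effects with 2 numerator degrees of freedom it also recomputes the p-value using the closed form $P(F > f) = (1 + 2f/d)^{-d/2}$, which holds only for that special case; the variable and function names are illustrative.

```python
# SS and df copied from Table 7.8
effects = {
    "Location": (7.008333, 1),
    "Discount": (1240.317, 2),
    "Interaction": (84.81667, 2),
}
ss_within, df_within = 1799.85, 114
msw = ss_within / df_within


def p_value_two_numerator_df(f_value, df_denominator):
    """Right-tail probability of F with (2, df_denominator) degrees of freedom."""
    return (1.0 + 2.0 * f_value / df_denominator) ** (-df_denominator / 2.0)


f_stats, p_values = {}, {}
for name, (ss, df) in effects.items():
    f_stats[name] = (ss / df) / msw
    if df == 2:  # the closed form applies only when the numerator df is 2
        p_values[name] = p_value_two_numerator_df(f_stats[name], df_within)

print(round(msw, 5))
print({name: round(f, 4) for name, f in f_stats.items()})
print(p_values)
```

The recomputed F values agree with Table 7.8, and the discount p-value is of the order of 10^-13, consistent with the Excel output.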

SUMMARY

1. Analysis of Variance (ANOVA) is a hypothesis testing procedure used for comparing means from several groups simultaneously.
2. In a one-way ANOVA, we test whether the mean values of an outcome variable for different levels of a factor are different. Using multiple two-sample t-tests to simultaneously test group means will result in incorrect estimation of Type I error and ANOVA overcomes this problem.
3. ANOVA plays an important role in multiple linear regression model diagnostics. The overall significance of the model is tested using ANOVA.
4. In a two-way ANOVA we check the impact of more than one factor simultaneously on several groups.

MULTIPLE CHOICE QUESTIONS


1. For a one-way ANOVA, which of the following assumptions should be satisfied:
(a) The samples are drawn from a normal population.
(b) The response variable should be a continuous variable.
(c) The standard deviation of different groups should be equal.
(d) All of above
2. For an experiment with a single factor with k levels with n observations, the degrees of freedom for sum of squares of variation within the group is
(a) n - 1 (b) k - 1 (c) n - k (d) n - k + 1
3. For a one-way ANOVA, the hypothesis test is a
(a) Right-tailed test (b) Left-tailed test
(c) Two-tailed test (d) Depends on null hypothesis
4. A data scientist is studying the impact of marital status (single, married, and divorced) on the annual income. The sample contains 140 singles, 110 married, and 40 divorced people. The values of SSB = 2425.6 and SSW = 49,567.8. The value of F-statistic is
(a) 20.43 (b) 0.0489 (c) 7.02 (d) 0.1424
5. In a two-way ANOVA
(a) The number of factors is two (b) The number of levels in a factor is two
(c) The number of factors is more than two (d) The number of levels under each factor is more than two
EXERCISES
1. If 10 t-tests are conducted at α = 0.05 and all are statistically significant, calculate the value of Type I error that all tests are simultaneously significant.
2. Ms Sophia Smith, Senior Manager of Career Development Services (CDS) at the Institute of Science and Business (ISB), believes that the salary of graduating MBA students depends on their degree of specialization. To test her belief, Ms Smith collected discipline-wise graduating annual salary (in millions of rupees) from 2016 graduating students and the data is shown in Table 7.9. Conduct a one-way ANOVA at α = 0.05 to check whether the annual salary depends on the degree discipline.
TABLE 7.9 Annual salary of graduating students in millions of rupees for different degree disciplines
Engineering
1.79 2.34 2.83 2.52 1.92 1.72 2.33 2.08 1.84 2.12
2.20 2.76 1.81 2.42 1.66 2.14 2.93 2.03 2.60 2.02
2.11 2.39 1.92 2.35 2.55 2.82 1.81 2.45 2.56 2.13
1.79 2.61 1.32 1.95 2.47 1.91 2.36 2.43 2.04 2.35
Science
2.91 2.19 1.72 2.02 1.36 1.84 1.88 1.75 2.04 1.76
2.32 2.11 1.86 1.86 1.97 2.76 2.62 1.61 1.58 1.57
2.20 1.61 1.56 1.59 1.86 2.56 1.55 1.90 1.47 2.12
Commerce
1.58 2.42 2.55 1.79 1.91 0.82 0.63 2.14 1.21 2.65
2.24 2.11 1.83 1.68 2.06 0.51 2.92 2.53 1.27 2.70

3. An original equipment manufacturer of a washing machine is interested in finding the impact of three different technologies on the reliability of the washing machine. Data on time between failures (in number of days) of the washing machine manufactured using different technologies is shown in Table 7.10. Conduct a one-way ANOVA at α = 0.01 to check whether the mean times between failures are different for different technologies.
TABLE 7.10 Time between failures of washing machine under different technologies
Technology 1
327 366 270
340 324 326 319 358 287
195 292 307 250
271 343 327 304 59
392 293 252 315
292 303 328 298 294 353
336 295 339 290
327 299 298 324 363 337
313 329 274 407
451 370 331 413 371 322
Technology 2
369 385 296 360 330 360 353 345
362 334
357 363 329 346 404 403 325
352 360 275
Technology 3
375 403 437 418 375 410 358 305
352 419
367 400 360 349 375 395 405 382
432 418
389 427 391 363 380 419 376
400 327 320

4. In continuation of Question 2, Ms Smith was told by Mr Dicki Bird, Chairman of the Career Development Services, that one should also look at the work experience in addition to the degree discipline. Ms Smith grouped the students with less than 2 years of experience and more than 2 years of experience and collected a new set of data which is shown in Table 7.11. Conduct a two-way ANOVA at α = 0.05 to check whether the factors - degree discipline and years of work experience - have an impact on the graduating salary.

TABLE 7.11 Data related to salary, degree discipline and year of experience
Engineering
1.36 1.54 1.97 1.95
1.57 1.26 1.53 1.45 1.26 1.52
0.84 1.68 1.66 1.77 1.02
1.6 1.64 0.76 1.38 2.16
Less than 2years 1.7
1.77 0.9 1.39 2.08 1.8
2.47 1.75 1.45 1.86
1.63 2.14 2.61
2.09 1.77 2.07 2.15 1.18 2.18 1.89
1.73 2.46 1.61 2.07
More than 2years 2.05 1.41 1.28 1.1 2.12 2.06
1.57 1.63 1.99 2.07
2.15 1.8 2.53 2.09 2.65 2.51
Science
1.97 1.77
1.57 1.62 0.76 1.85 1.18
1.18 1.47 1.72 1.55 L.29
Less than 2years 1.09 1.31 1.3
1.43 1.44 0.98 1.16 1.75 0,94 164
2.15 1.43 1.77 1.81 1.59 1.64 0.85 2.8
1.59
More than 2years 1.81 2.18 2.04
1.88 2.11 1.43 1.69 2.1 1.69
Commerce L79
1.18 2.3
2.23 1.99 2.78 1.91 2.72 2.13 2, 18 L37
Less than 2years 2.21
2.09 2.03 2.18 1.6 2.1
2.27 1.09 2.25 2.13 2.72 1L75
1.3 2.03 1.5 L82
More than 2years 2.24 2.18 2.44 2.84
1.87 2.72
1.58 2.12 2,46 2.43 1.96 1.55 1.95

REFERENCES

1. Fisher R A (1934), "Statistical Methods for Research Workers", Oliver and Boyd, London.
2. Kirk R E (1995), "Experimental Designs: Procedures for the Behavioural Sciences", 3rd Edition, Brooks Cole, New York.
3. Kutner M H, Nachtsheim C J, Neter J, and Li W (2013), "Applied Linear Statistical Models", 5th Edition, McGraw Hill.
4. Lunney G H (1969), "Individual Size for Multiple t-Tests", American Educational Research Journal, 6(4), 701-703.
5. Siegel A F (1990), "Multiple t-Tests: Some Practical Considerations", TESOL Quarterly, 24(4), 773-775.
