ST ACtg 5
Uploaded by gre

4.4. Further Analysis within ANOVA

1) Estimation of the effects

Fixed effects model:

αi = µi − µ is estimated by ai = x̄i − x̄,
if H0: µ1 = µ2 = · · · = µk is rejected.

Random effects model:

If H0: σA² = 0 is rejected, then we estimate the variability σA² among the population means by

sA² = (MSB − MSW)/n0   with   n0 = (N − Σ ni²/N)/(k − 1),

where N = n1 + n2 + · · · + nk and the sum runs over i = 1, . . . , k; in the special case of equal sample sizes ni = n in all groups, n0 = n.
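As a sketch (not part of the original notes), these variance-component estimates can be computed in Python; the group data below are made up purely for illustration:

```python
import numpy as np

def variance_components(groups):
    """One-way random effects ANOVA: estimate the variability
    sigma_A^2 among the population means.
    groups: list of 1-D arrays, one per group (sizes may differ)."""
    k = len(groups)
    ni = np.array([len(g) for g in groups])
    N = ni.sum()
    grand = np.concatenate(groups).mean()
    means = np.array([g.mean() for g in groups])
    msb = np.sum(ni * (means - grand) ** 2) / (k - 1)                 # between groups
    msw = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - k)  # within groups
    n0 = (N - (ni ** 2).sum() / N) / (k - 1)                          # n0 = n for equal sizes
    s2_A = (msb - msw) / n0
    return msb, msw, n0, s2_A

groups = [np.array([1., 2., 3., 4.]),
          np.array([2., 3., 4., 5.]),
          np.array([6., 7., 8., 9.])]
msb, msw, n0, s2_A = variance_components(groups)
print(f"MSB={msb:.3f}, MSW={msw:.3f}, n0={n0:.2f}, s2_A={s2_A:.3f}")
```

Since the three groups here have equal size n = 4, the code returns n0 = 4, as the text states.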

2) Planned Comparisons: Contrasts

The hypothetical data below show errors in a test made by subjects under the influence of two drugs in 4 groups of 8 subjects each. Group A1 is a control group, not given any drug. The other groups are experimental groups: group A2 gets drug A, group A3 gets drug B, and group A4 gets both drugs.

Suppose we are really interested in answering
specific questions such as:
1. On the average do drugs have any effect
on learning at all?
2. Do subjects make more errors if given
both drugs than if given only one?
3. Do the two drugs differ in the number of
errors they produce?

All these questions can be formulated as null hypotheses of the form

H0: λ1µ1 + λ2µ2 + · · · + λkµk = 0,  where  λ1 + λ2 + · · · + λk = 0.

For example, the first question asks whether the mean of group A1, µ1, differs from the average of the means for the groups A2, A3, and A4, (µ2 + µ3 + µ4)/3. That is, we wish to test the null hypothesis

H0(1): µ1 − (1/3)µ2 − (1/3)µ3 − (1/3)µ4 = 0.
A contrast is a combination of population means of the form

ψ = Σ λiµi,  where  Σ λi = 0.

The corresponding sample contrast is

L = Σ λi x̄i.

In our example:

L1 = X̄1 − (1/3)X̄2 − (1/3)X̄3 − (1/3)X̄4.

Now, since each individual observation Xij is distributed as N(µi, σ²) and all observations are independent of each other, each sample mean X̄i is distributed as N(µi, σ²/ni), such that

L ~ N( Σ λiµi, σ² Σ λi²/ni ),

which implies that under the null hypothesis

L / ( σ √(Σ λi²/ni) ) ~ N(0, 1).

Now, recalling that the square of a standard normally distributed random variable is always χ²-distributed with df = 1, we obtain, denoting Q = L²/(Σ λi²/ni):

Q/σ² ~ χ²(1).

Furthermore, using SSW/σ² ~ χ²(N−k), we obtain that:

Q/MSW ~ F(1, N−k),

the square root of which has a t-distribution, that is:

t = L/s(L) ~ t(N−k),

where

s(L) = s √(Σ λi²/ni)   and   s = √MSW,

which simplifies for equal sample sizes n to

s(L) = s √(Σ λi²/n).

s(L) is called the standard error of the contrast.

This suggests testing the null hypothesis

H0: λ1µ1 + λ2µ2 + · · · + λkµk = 0

with a conventional t-test of the form

t = L/s(L) ~ t(N−k).

Example: (continued.)

L = X̄1 − (1/3)X̄2 − (1/3)X̄3 − (1/3)X̄4
  = 6.75 − (1/3)(10.375 + 8.625 + 13.75)
  = −4.167,

s(L) = √( MSW · Σ λi²/n ) = √( 7.38393 · (1 + 3/9)/8 ) = 1.109,

such that

t = −4.167/1.109 = −3.76.

The associated p-value in a two-sided test is T.DIST.2T(3.76;28) = 0.08%, and in a one-sided test against H1: µ1 < (µ2 + µ3 + µ4)/3, T.DIST.RT(3.76;28) = 0.04%, which provides clear statistical evidence that the drugs considered increase error rates.
The remaining questions may be tackled in an analogous way:

1. Do subjects make more errors if given both drugs than if given only one?

   H0(2): µ4 − (1/2)(µ2 + µ3) = 0

2. Do the two drugs differ in the number of errors they produce?

   H0(3): µ2 − µ3 = 0

The computational steps in conducting the tests are summarized in the tables below:

          A1       A2       A3       A4      Σ λi²
X̄i      6.750   10.375    8.625   13.750
λi (1)     1      -1/3     -1/3     -1/3      4/3
λi (2)     0      -1/2     -1/2      1        3/2
λi (3)     0       1       -1        0        2

           L      s(L)      t        p
H0(1)   -4.167   1.109    -3.76    0.0008
H0(2)    4.250   1.177     3.61    0.0012
H0(3)    1.750   1.359     1.29    0.208
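These three contrast tests can be reproduced programmatically. The following Python sketch (not part of the original notes) needs only the group means, MSW and the common group size n = 8:

```python
import numpy as np
from scipy.stats import t as t_dist

means = np.array([6.750, 10.375, 8.625, 13.750])   # group means of A1..A4
msw, n, k = 7.38393, 8, 4                          # MSW and group size from the ANOVA
df = k * n - k                                     # DFW = N - k = 28

def contrast_test(lam):
    """t-test for the contrast sum(lam_i * mu_i) = 0, equal group sizes."""
    lam = np.asarray(lam, dtype=float)
    L = lam @ means                                # sample contrast
    sL = np.sqrt(msw * (lam ** 2).sum() / n)       # standard error s(L)
    t = L / sL
    p = 2 * t_dist.sf(abs(t), df)                  # two-sided p-value
    return L, sL, t, p

results = {
    "H0(1)": contrast_test([1, -1/3, -1/3, -1/3]),
    "H0(2)": contrast_test([0, -1/2, -1/2, 1]),
    "H0(3)": contrast_test([0, 1, -1, 0]),
}
for name, (L, sL, t, p) in results.items():
    print(f"{name}: L={L:.3f}, s(L)={sL:.3f}, t={t:.2f}, p={p:.4f}")
```

The printed values match the table above; the p-value of H0(1) is 0.0008, i.e. the 0.08% obtained earlier with T.DIST.2T.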

Contrasts in Excel

The Real Statistics Single Factor ANOVA tool has an option to calculate contrasts. After entering your contrast weights (the λ's) into the grey shaded area labeled 'c' you will get the sample contrast L, its standard error s(L), the corresponding t-statistic and its p-value.

Below is the contrast for testing H0(2) as an example:

Orthogonal hypotheses

The 3 hypotheses in the preceding section were special in the sense that the truth of each null hypothesis is unrelated to the truth of any of the others. If H0(1) is false, we know that drugs have some effect upon errors, although H0(1) says nothing about whether the effect of taking both drugs simultaneously is different from the average of the effects of the drugs taken separately. Similarly, if H0(2) is false, we know that the effect of taking both drugs is different from taking only one, but we don't know which of the two drugs taken individually is more effective.

A set of contrasts such that the truth of any one of them is unrelated to the truth of any other is called orthogonal. For equal sample sizes in each group the orthogonality of two contrasts L1 = Σ λ1i x̄i and L2 = Σ λ2i x̄i may be assessed by checking the orthogonality condition

Σ λ1i λ2i = 0.

If the group sizes ni involved in the contrasts are not all identical, the orthogonality condition becomes

Σ λ1i λ2i / ni = 0.
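A quick check of this condition in Python (a sketch, not from the notes), applied to the contrast weights used in this section:

```python
import numpy as np

def orthogonal(lam1, lam2, ni):
    """Check the orthogonality condition sum(lam1_i * lam2_i / n_i) = 0.
    For equal group sizes this reduces to sum(lam1_i * lam2_i) = 0."""
    lam1, lam2, ni = map(np.asarray, (lam1, lam2, ni))
    return abs((lam1 * lam2 / ni).sum()) < 1e-12

ni = [8, 8, 8, 8]                 # equal group sizes in the example
l1 = [1, -1/3, -1/3, -1/3]        # H0(1)
l2 = [0, -1/2, -1/2, 1]           # H0(2)
l3 = [0, 1, -1, 0]                # H0(3)
l4 = [-1, 0, 0, 1]                # H0(4), see below

print(orthogonal(l1, l2, ni), orthogonal(l1, l3, ni), orthogonal(l2, l3, ni))
print(orthogonal(l1, l4, ni))     # H0(1) and H0(4) are not orthogonal
```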

In an ANOVA with k groups, no more than k − 1 orthogonal contrasts can be tested. Such a set of k − 1 orthogonal contrasts is, however, not unique.

In practice the researcher will select a set of orthogonal contrasts such that those contrasts of particular interest in the research question are included. Once such a set is found, it exploits all available information that can be extracted: it answers the maximum number of k − 1 independent questions that can be asked in the form of contrasts.

130
To illustrate the importance of orthogonal hypoth-
esis, assume that instead of testing the orthogonal
hypotheses H0 (1) − H0 (3) we would have tested the
original hypothesis
1 1 1
H0 (1) : µ1 − µ2 − µ3 − µ4 = 0
3 3 3
together with
H0 (4) : µ4 − µ1 = 0,

that is that error rates are the same when taking both
drugs or taking no drugs, upon the data below:

The two hypotheses are not orthogonal:

         A1     A2     A3     A4
X̄i     5.50   5.75   5.75   9.00
λi(1)    1    -1/3   -1/3   -1/3
λi(4)   -1     0      0      1

We may therefore get conflicting results from both hypothesis tests:

           L     s(L)     t      p
H0(1)   -1.33   1.21   -1.10   0.281
H0(4)    3.50   1.49    2.36   0.026

In this example, we safely accept H0(1) that taking drugs has no impact on error rates, while we strongly reject H0(4) that taking both drugs has no impact.

Constructing Orthogonal Tests

If all sample sizes are equal (ni = n = const.), orthogonal tests may be constructed based on the principle that the test on the differences among a given set of means is orthogonal to any test involving their average.

Consider our original set of hypotheses:

H0(1): µ1 − (1/3)(µ2 + µ3 + µ4) = 0
H0(2): µ4 − (1/2)(µ2 + µ3) = 0
H0(3): µ2 − µ3 = 0

H0(1) involves the average of µ2, µ3, and µ4, while H0(2) and H0(3) involve differences among them. Similarly, H0(2) involves the average of µ2 and µ3, while H0(3) involves the difference between them.

A Word of Caution

Contrasts are a special form of planned comparisons; that is, the hypotheses with the contrasts to be tested must be set up before taking a look at the data, otherwise the associated p-values will not be valid.

The situation is similar to the choice between a one-sided and a two-sided test. Assume you want to test whether the means in two samples are the same using the conventional significance level of α = 5%. You apply a two-sided test and get a p-value of 8%, so you may not reject. Let's say as a step in your calculations you figured out that x̄1 > x̄2. You may be tempted then to replace your original two-sided test against H1: µ1 ≠ µ2 by a one-sided test against H1: µ1 > µ2, which, technically, would allow you to divide your p-value by 2 and get a significant result at 4%. But that p-value of 4% is fraudulent, because the reason that it is only one half of the two-sided p-value is exactly that, without looking at the data, you also had a 50% chance that the sample means would have come out as x̄1 < x̄2.
3) Multiple Comparisons (Post hoc Tests)

As seen above, contrasts must be formulated in advance of the analysis in order for the p-values to be valid. Multiple-comparison procedures, on the other hand, are designed for testing effects suggested by the data after the ANOVA F-test led to a rejection of the hypothesis that all population means are equal. For that reason they are also called post hoc tests.

All multiple comparison methods we will discuss consist of building t-ratios of the form

tij = (x̄i − x̄j) / SEij,

or, equivalently, setting up confidence intervals of the form

[ (x̄i − x̄j) ± t* · SEij ]

for all k(k − 1)/2 pairs that can be built within the k groups.

The methods differ in the calculation of the standard errors SEij and the distributions and critical levels used in the determination of t*.
Fisher’s LSD = least-significant difference

One obvious choice is to extend the 2 independent sample t-test to k independent samples with test statistic

tij = (x̄i − x̄j) / ( s √(1/ni + 1/nj) ),   where s = √MSW,

and to declare µi and µj different whenever |tij| > tα/2(DFW), or, equivalently, whenever

0 ∉ [ (x̄i − x̄j) ± tα/2(DFW) · s √(1/ni + 1/nj) ],

where DFW = N − k in the case of one-way ANOVA. This procedure fixes the probability of a false rejection for each single pair of means being compared at α.

This is a problem if the number of means being compared is large. For example, if we use LSD with a significance level of α = 5% to compare k = 20 means, then there are k(k−1)/2 = 190 pairs of means and we expect 5% · 190 = 9.5 false rejections!
Bonferroni Method

The Bonferroni method uses the same test statistic as Fisher's LSD, but replaces the significance level α for each single pair with α′ = α/(k(k−1)/2), or equivalently the original p-value with p′ = (k(k−1)/2) · p, as a conservative estimate of the probability that any false rejection among all k(k−1)/2 comparisons will occur. This is also called the experimentwise error rate. An obvious disadvantage is that the test becomes weak when k is large.

Example:
Comparison of A1 and A2 (original data):

|tij| = |x̄i − x̄j| / ( s √(1/ni + 1/nj) ) = |6.75 − 10.375| / √( 7.38393 · 2/8 ) = 2.668

LSD:
pLSD = T.DIST.2T(2.668; 28) = 1.25%.

Bonferroni:
pB = (4·3/2) · pLSD = 6 · 1.25% = 7.5%.
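The LSD and Bonferroni p-values above can be verified with scipy (a sketch, not part of the original notes):

```python
import math
from scipy.stats import t as t_dist

msw, n, k, df = 7.38393, 8, 4, 28      # MSW, group size, groups, DFW from the example

t_ij = abs(6.75 - 10.375) / math.sqrt(msw * (1/n + 1/n))
p_lsd = 2 * t_dist.sf(t_ij, df)        # Fisher LSD: unadjusted per-pair p-value
m = k * (k - 1) // 2                   # number of pairwise comparisons: 6
p_bonf = min(1.0, m * p_lsd)           # Bonferroni-adjusted p-value
print(f"t={t_ij:.3f}, p_LSD={p_lsd:.4f}, p_Bonferroni={p_bonf:.4f}")
```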
Tukey’s HSD = honestly significant difference

Tukey's method for multiple comparisons delivers the most precise estimate of the probability that any false rejection among all paired comparisons will occur. In its original form it is only applicable in the case of equal sample sizes in each group (ni = n), but corrections for unequal sample sizes have been suggested and are implemented in Real Statistics.

Tukey's honestly significant difference is

HSD = qα(k, DFW) · s/√n,

where qα denotes the α-critical value from the so-called Studentized range distribution, tabulated e.g. in table 6 of Aczel, and s is estimated by √MSW, as usual.

Two group means µi and µj are declared different at significance level α if

0 ∉ [ (x̄i − x̄j) ± HSD ],

or, equivalently, if

|x̄i − x̄j| / (s/√n) > qα(k, DFW).
Example:
Comparison of A1 and A2 (continued).

LSD and Bonferroni gave conflicting results whether the difference between µ1 and µ2 is significant at 5% or not (recall pLSD = 1.25% and pB = 7.5%), because LSD underestimates the probability of any false rejection among all paired comparisons, whereas Bonferroni overestimates it.

Applying Tukey's procedure yields

|x̄i − x̄j| / (s/√n) = |6.75 − 10.375| / √(7.38393/8) = 3.77.

Looking it up in a table or using the QCRIT function from Real Statistics reveals that

q0.05(4; 28) = QCRIT(4;28;0.05;2) = 3.86,

which is larger than the statistic calculated above. The difference between µ1 and µ2 is therefore not significant at the 5% level, if the test was first suggested by the data.

Note: A planned comparison before looking at the data in form of a contrast would have found a significant difference with a p-value of 1.25% (= pLSD).
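The same Tukey statistic, critical value and p-value can be obtained from scipy's studentized_range distribution (a sketch, not part of the original notes):

```python
import math
from scipy.stats import studentized_range

msw, n, k, df = 7.38393, 8, 4, 28

q_stat = abs(6.75 - 10.375) / math.sqrt(msw / n)   # Studentized range statistic
q_crit = studentized_range.ppf(0.95, k, df)        # q_0.05(4, 28), cf. QCRIT
p_tukey = studentized_range.sf(q_stat, k, df)      # Tukey p-value
print(f"q={q_stat:.3f}, q_crit={q_crit:.3f}, p={p_tukey:.4f}")
```

Since q_stat < q_crit, the p-value exceeds 5%, reproducing the conclusion above.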
Simultaneous Confidence Intervals

The idea with multiple comparisons in post-hoc tests is usually to determine which groups may be combined into larger groups that are homogeneous in the sense that their group means are not significantly different.

For that purpose, one constructs confidence intervals of the form

[ (x̄i − x̄j) ± smallest significant difference ],

where the smallest significant differences are:

LSD:         tα/2(DFW) · s √(1/ni + 1/nj)
Bonferroni:  tα′/2(DFW) · s √(1/ni + 1/nj),  with α′ = α/(k(k−1)/2)
HSD:         qα(k, DFW) · s/√n

Combinations of groups are considered homogeneous at confidence level (1 − α) if none of their paired comparisons lies outside the corresponding confidence intervals; that is, pairs of means whose confidence intervals include the value 0 will not be declared significantly different, and vice versa.
Multiple Comparisons in Excel

The Real Statistics toolpack implements post hoc tests as contrasts restricted to have all weight coefficients λ1,...,k = 0, except for the two groups i and j which are currently compared, with weights λi = 1 and λj = −1. Choosing 'Contrasts' with 'No correction' under 'Alpha correction for contrasts' within the Single Factor ANOVA tool then corresponds to Fisher's LSD.

Note that Real Statistics implements the 'Bonferroni correction' under 'Alpha correction for contrasts' as dividing α by the maximal number of orthogonal contrasts k − 1 rather than by the number of all possible comparisons k(k−1)/2. Generally the Bonferroni correction should not be used, because it overestimates the experimentwise error rate, making the test too conservative.
The best way to do multiple comparisons is Tukey's HSD, which is implemented as its own option in the Single Factor ANOVA tool.

TUKEY HSD/KRAMER    alpha 0.05

Groups   mean      n    ss        df   q-crit
1        6.75      8    57.5
2        10.375    8    61.875
3        8.625     8    67.875
4        13.75     8    19.5
                   32   206.75    28   3.861

Q TEST
group 1  group 2  mean   std err   q-stat    lower     upper     p-value   x-crit
1        2        3.625  0.960724  3.773195  -0.08436   7.334356  0.05728   3.709356
1        3        1.875  0.960724  1.951653  -1.83436   5.584356  0.521842  3.709356
1        4        7      0.960724  7.28617    3.290644  10.70936  0.000103  3.709356
2        3        1.75   0.960724  1.821542  -1.95936   5.459356  0.577915  3.709356
2        4        3.375  0.960724  3.512975  -0.33436   7.084356  0.084534  3.709356
3        4        5.125  0.960724  5.334517   1.415644  8.834356  0.004057  3.709356

For unequal group sizes ni and nj, Real Statistics implements Tukey's honestly significant difference as

HSD = qα(k, DFW) · √( (MSW/2) · (1/ni + 1/nj) ).
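This Tukey-Kramer version of the HSD can be sketched in Python (not part of the original notes); with equal group sizes ni = nj = n it reduces to the HSD given earlier:

```python
import math
from scipy.stats import studentized_range

def tukey_kramer_hsd(msw, ni, nj, k, dfw, alpha=0.05):
    """Tukey-Kramer honestly significant difference for a pair of
    (possibly unequal) group sizes ni, nj."""
    q = studentized_range.ppf(1 - alpha, k, dfw)   # q_alpha(k, DFW)
    return q * math.sqrt(msw / 2 * (1/ni + 1/nj))

# With the example's MSW and equal sizes ni = nj = 8:
hsd = tukey_kramer_hsd(7.38393, 8, 8, 4, 28)
print(round(hsd, 3))
```

With the example's MSW and n = 8 this reproduces the x-crit value 3.709 in the table above.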
