
Interaction

Author(s): D. R. Cox
Source: International Statistical Review / Revue Internationale de Statistique, Vol. 52, No. 1
(Apr., 1984), pp. 1-24
Published by: International Statistical Institute (ISI)
Stable URL: http://www.jstor.org/stable/1403235
Accessed: 25/07/2012 12:48

International Statistical Review, (1984), 52, 1, pp. 1-31. Printed in Great Britain
© International Statistical Institute

Interaction
D.R. Cox
Department of Mathematics, Imperial College, London SW7 2BZ, U.K.

Summary
A broad review is given of various aspects of interaction. After a brief discussion of the relation
between absence of interaction and constancy of variance, the main part of the paper considers
interaction in the usual sense, connected with the structure of an expected value in multiply
classified data. Interactions are first classified depending on the character of the factors involved.
Techniques for detecting interactions are reviewed, including aspects of model formulation,
especially for unbalanced data. Special methods for interpreting interactions are described. Finally,
the relation between interactions and error estimation is considered and some short remarks made
on extensions to binary and other essentially nonnormal models.

Key words: Balanced design; Component of variance; Contrast; F test; Factorial experiment;
Interaction; Latin square; Log linear model; Main effect; Transformation; Unbalanced design.

1 Introduction
The notion of interaction and indeed the very word itself are widely used in scientific
discussion. This is largely because of the relation between interaction and causal connex-
ion. Interaction in the statistical sense has, however, a more specialized meaning related,
although often in only a rather vague way, to the more general notion.
The object of the present paper is to review the statistical aspects of interaction. The
main discussion falls under three broad headings:
(i) the definition of interaction and the reasons for its importance;
(ii) the detection of interaction;
(iii) the interpretation and application of interaction.
For the most part the discussion is in terms of the linear and quadratic statistics familiar in
the analysis of variance and linear regression analysis, although most of the ideas apply
much more generally, for instance to various analyses connected with the exponential
family, including logistic and other analyses for binary data. First, however, in § 2 some
brief comments are made on the connexion between constancy of distributional shape and
constancy of variance and the absence of interaction.
The paper presupposes some familiarity with the broad issues involved. Emphasis is
placed on matters, for example of interpretation, that, however widely known, are not
extensively discussed in the literature. References are confined to key ones.
Although some emphasis will be put on data from balanced experimental designs, in
principle the ideas apply equally to observational data and to unbalanced data.

2 Distributional form and interaction

In the majority of the discussion of interaction we concentrate on the structure of the


expected response, Y, or at least on contrasts of location. That is, any nonconstancy of
variance may suggest that a more elaborate method of analysis involving weighting is
desirable, but does not by itself change the central objectives of the analysis. In some
contexts, however, the form of the frequency distribution is of some primary interest and
there is then a relation with notions of interaction in the following sense.
Consider first an experiment with just two treatments. Under the simplest assumption
of unit-treatment additivity, the response obtained on a particular experimental unit
under treatment 2 differs by a constant τ, say, from the observation that would have been
obtained on that unit under treatment 1. This implies that if F2(.) and FI(.) are the
corresponding cumulative distribution functions of response then
F2(y) = F1(y − τ). (2.1)
That is, the two distributions differ only by translation and in particular have the same
variance. Any departure from (2.1) implies that simple unit-treatment additivity cannot
hold; in other words, to match the two distributions in (2.1), τ must be a function of y,
τ(y) say. That is, the treatment effect depends on the initial level of response and this is
one form of interaction.
Consistency with (2.1) can be examined by inspecting histograms or cumulative dis-
tribution functions, but preferably via a Q-Q plot (Wilk & Gnanadesikan, 1968) in which
corresponding quantiles of the two distributions are plotted against one another to
produce under (2.1) a line of unit slope.
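A minimal numerical sketch of such a quantile-quantile check (the samples, the shift of 3 and the sample sizes below are illustrative assumptions, not taken from the paper): under (2.1) the plotted quantile pairs should lie close to a line of unit slope.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Hypothetical responses under two treatments differing, under (2.1), only by a shift.
y1 = rng.normal(10.0, 2.0, size=60)
y2 = rng.normal(13.0, 2.0, size=80)

p = (np.arange(1, 101) - 0.5) / 100      # common probability grid
q1, q2 = np.quantile(y1, p), np.quantile(y2, p)

plt.plot(q1, q2, "o", ms=3)
lo, hi = q1.min(), q1.max()
plt.plot([lo, hi], [lo + 3.0, hi + 3.0], "--", label="unit slope, shift 3")
plt.xlabel("quantiles of treatment 1")
plt.ylabel("quantiles of treatment 2")
plt.legend()
plt.show()
```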
Doksum & Sievers (1976) have given formal procedures based on this and in particular
a confidence band for τ(y) defined by

F2(y) = F1{y − τ(y)}, τ(y) = y − F1^{-1}{F2(y)}.
If (2.1) does not hold, it is natural to consider whether a monotonic transformation from y
to h(y) is available such that h(Y) satisfies (2.1). G.E.H. Reuter (unpublished) has shown
that such a transformation exists if F1 and F2 do not intersect. If there are more than two
'treatments' transformation to a location family representation exists only exceptionally.
A more traditional approach, very reasonable with modest amounts of data insufficient
for detailed examination of distributional shape and with distributions that are not very
long-tailed, is to consider only the variance. That is, we replace (2.1) by the condition of
constant variance. The condition for the existence of an approximate variance stabilizing
transformation is, for any number of treatment groups, that the variance is determined by
the mean. Then if

var(Y) = v(μ), E(Y) = μ,

a familiar argument yields

h(y) = ∫^y dx/√v(x) (2.2)

as the appropriate transformation. That is, evidence of the kind of interaction under
discussion here could come only by showing that
(i) two or more distributions with the same mean have different variances; or
(ii) aspects of distributional shape, e.g. third moments, are not adequately stabilized
by (2.2); or
(iii) the linearizing approximations involved in deriving (2.2) are inadequate and that
a more careful examination of variance shows that it cannot be reasonably
stabilized.
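For a specific variance function the integral in (2.2) can be evaluated numerically; the short sketch below assumes, purely for illustration, a Poisson-like variance function v(μ) = μ, for which the known answer is h(y) ≈ 2√y.

```python
import numpy as np
from scipy.integrate import quad

def variance_stabilizing(y, v, y0=1e-6):
    """Numerically evaluate h(y) = integral from y0 to y of dx / sqrt(v(x)), formula (2.2)."""
    value, _ = quad(lambda x: 1.0 / np.sqrt(v(x)), y0, y)
    return value

v = lambda mu: mu                          # assumed variance function (Poisson-like)
for y in (1.0, 4.0, 9.0):
    # agrees with 2*sqrt(y) up to the negligible constant 2*sqrt(y0)
    print(y, variance_stabilizing(y, v), 2 * np.sqrt(y))
```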
Points (ii) and (iii) are probably best studied numerically in a particular application,
although some results can be given by higher order expansion. Thus, if ρ_{Y3}(μ) denotes the
standardized third cumulant of Y, a higher-order approximation is

var{h(Y)} = v(μ){h′(μ)}² + ρ_{Y3}(μ){v(μ)}^{3/2} h′(μ)h″(μ). (2.3)

Thus, on writing g(μ) = {h′(μ)}², we have a first-order linear differential equation for the
variance-stabilizing transformation. If the solution does not lead to a monotonic h,
variance cannot be stabilized. An alternative use of (2.3) is to retain the transformation
(2.2) when we have that

var{h(Y)} = 1 − ½ρ_{Y3}(μ)v′(μ)/√v(μ), (2.4)

and the magnitude of the correction term can be examined.
For the special choice (2.2) we have that

ρ_{h(Y),3}(μ) = ρ_{Y3}(μ) − [3v′(μ)/{4√v(μ)}]{ρ_{Y4}(μ) + 2 − ρ²_{Y3}(μ)}

to the next order and if this is strongly dependent on μ there is evidence that distributional
shape is not adequately stabilized, point (ii) above.
For multivariate responses the formulation analogous to that leading to (2.2) is that for
each vector mean μ there is a covariance matrix Ω(μ). It is known (Holland, 1973; Song, 1982)
that only exceptionally does there exist a vector transformation from y to h(y) such that
the covariance matrix of h(Y) is, in the linearized theory, a constant matrix, say the
identity matrix; certain compatibility conditions have to be satisfied. Practical techniques
for finding suitable approximate transformations do not seem to be available.
In the remainder of the paper we concentrate on the structure of E(Y), making the
simplest assumptions about variances and distributions. The essence of the present section
is that constancy of variance is a special form of absence of interaction. If variance or
distributional shape vary to an extent that is judged scientifically important, the descrip-
tion of the effect of 'treatments' on response should not be confined to the examination of
means.

3 Some key ideas


We consider a response variable Y whose expectation depends on explanatory vari-
ables, or factors, which it is convenient to classify into three types as follows.
First there are treatment variables, which we denote by x1, ..., xp. In experiments these
represent the levels of various treatment factors under the control of the experimenter. In
observational studies, they represent explanatory variables which for each individual could
conceivably have taken different levels and which in an experimental study of the same
phenomenon would have been allocated by the investigator, preferably with randomiza-
tion. The levels of such variables may be binary, nominal, ordinal or quantitative. The
central objective of the investigation is to study the effect of some or all of the treatment
variables on the response.
Secondly, there are intrinsic variables, to be denoted by z1, ..., zq, and which are of
two different kinds. First there are those that describe properties of the individuals under
study that one regards, in the current context, as in principle not capable of variation by
the investigator, but as defining the 'material' under study. Examples are age and sex in an
observational or experimental investigation and, more broadly in experiments, concomit-
ant observations measured before the allocation of treatments. A second type of intrinsic
variable is concerned with that part of the environment of the investigation outside the
investigator's control and which it is inappropriate to consider as a treatment. In an
industrial experiment, the uncontrolled meteorological conditions holding during the


processing of a particular batch exemplify this kind of observation. The z's may be binary,
nominal, ordinal or quantitative.
Finally, there are variables whose levels are not uniquely characterized. Such variables,
defining blocks, litters, replicates and so on, are sometimes called random factors, but we
wish to avoid any implication that the levels are randomly sampled from a well-defined
population, or that the effects associated with such variables are automatically best
represented by random variables. We therefore call such variables nonspecific. They will
be denoted by u1, ..., ur; the levels of each u are usually nominal. If, however,
replication in time or space is involved, the levels of u are quantitative, although special
care is needed if, for example, the same individual is observed at a series of time points.
We write

E(Y) = η(x1, ..., xp; z1, ..., zq; u1, ..., ur) = η(x, z, u), (3.1)

say.
A key assumption in the discussion is that interest focuses on the effect of the x's on the
response; the z's and u's must be considered, but only in so far as they bear on the effect
of the treatment variables, the x's.
Even if we restrict our attention for simplicity to two-factor interactions, three different
kinds have to be considered:
(a) treatment × treatment;
(b) treatment × intrinsic;
(c) treatment × nonspecific.
While the formal definitions are common, interpretation is rather different. We discuss
briefly the simplest cases.
With just two treatment factors x1 and x2, and no intrinsic or nonspecific factors,
absence of interaction of type (a) means that

η(x1, x2) = η1(x1) + η2(x2). (3.2)

The importance of this is partly the empirical one that, to the extent that (3.2) is adequate,
the function of two variables η(x1, x2) can be replaced by the much simpler form of two
functions of one variable: for nominal x1 and x2 a two-way table is reduced to two
one-way tables. In some ways, a more basic reason for the importance of (3.2) is that,
because the effect of changing, say, x1, does not depend on the level of x2, the two
treatments act in a way that appears causally independent, at least over the range of levels
examined. The converse conclusion, that interaction, in the sense of failure of (3.2),
implies causal dependence is in general unsound. First, transformation of the expected
response scale may remove the interaction: see Scheffé (1959, p. 95) for the conditions for
this when x1 and x2 are quantitative. Secondly, it is possible that some simple mechanism
in which in a reasonable sense the effects of x1 and x2 are physically separate nevertheless
leads to a function not satisfying (3.2). A simple example over the region x2 > 0 is

η(x1, x2) = η11(x1) + e^{−x2} η12(x1),


which as a function of x2 moves between levels controlled solely by x1.
The clearest evidence of the failure of (3.2), in a way not removable by transformation,
arises when the rank order of the responses at a series of levels of, say, x1, changes with
x2. It might be worth using a special term, such as order-based interaction of x1 with x2,
for this, an asymmetric relation.
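Returning to the point that transformation may remove an interaction, a toy numerical illustration (the cell means are invented): with multiplicative cell means the elementary interaction contrast is nonzero on the original scale but vanishes after taking logarithms.

```python
import numpy as np

# Hypothetical 2 x 2 cell means with multiplicative structure eta(x1, x2) = a[x1] * b[x2].
a, b = np.array([2.0, 3.0]), np.array([5.0, 8.0])
cells = np.outer(a, b)

def interaction_contrast(m):
    # elementary interaction contrast m11 - m12 - m21 + m22
    return m[0, 0] - m[0, 1] - m[1, 0] + m[1, 1]

print(interaction_contrast(cells))          # nonzero on the original scale
print(interaction_contrast(np.log(cells)))  # zero: the interaction is removed by the log
```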
When we turn to treatment x intrinsic interactions, the mathematical condition

η(x, z) = η1(x) + η2(z) (3.3)



appears identical to (3.2) but the interpretation is different, essentially because the roles of
x and z are now asymmetrical. Here the concern is with stability of the x-effect as z
varies, such stability both enabling the treatment effect to be simply specified in a single
function and also providing a basis for rational extrapolation of conclusions to new
individuals and environments. The function η2(z) itself is in the present context of no
direct interest.
Finally, there are treatment x nonspecific interactions. We discuss these in more detail
in § 7. Briefly, however, such interactions have either to be given a rational explanation or
regarded as extra sources of unexplained variation, i.e. error.
Similar remarks apply to higher order interactions.

4 Remarks on model formulation

4.1 Some general principles


To an important extent, the interpretation of balanced data can be made via one-way,
two-way,... tables of mean values, using simple contrasts with a tangible practical
meaning, and hence not depending critically on formal probabilistic models. Such models
are, however, needed either for assessments of precision, including significance tests, or as
a guide for the analysis of unbalanced sets of data. While the definition and interpretation
of interaction is in principle the same for unbalanced as for balanced data, the strategies
of analysis may well be different.
There are two extreme approaches to the initial choice of such a model. The first is to
start from the most general model estimable from the data, i.e. writing into the analysis of
variance table all interactions that can be found, given the structure of the data. The
second is to start from a simple representation, possibly only involving main effects, and to
introduce further effects only when the data dictate a need to do so. In both cases the
objective is to end with an incisive interpretation. For balanced data, the use of the full
analysis of variance table has the major advantages that critical inspection of the table will
reveal any unanticipated effects or anomalies and that the magnitude of the lower order
interactions will indicate which interactions have to be directly considered in the final
interpretation. Further, the form of the analysis of variance table is a valuable concise
summary of the structure of the data, essentially equivalent to, but in many ways easier to
assimilate than, the associated linear model.
For the analysis of balanced or nearly balanced data, it will normally be a good idea to
start from the 'full' analysis of variance table. For unbalanced 'factorial' data, it will often
be sensible to look at the structure of the analysis of variance table for the corresponding
balanced data, this indicating the contrasts that may in principle be examined: of course,
the presence of empty cells affects estimability. In principle, it is possible to find by least
squares, corresponding to each mean square in the balanced analysis of variance table, a
mean square adjusting for all other 'relevant' contrasts, although in complex cases this can
be cumbersome to implement and interpret. We discuss below the meaning of the
restriction to 'relevant' contrasts.
The comments above concern the initial analysis. Normally, final presentation of
conclusions will be via some simpler specification, for example one involving rather few
interaction terms. For balanced data, a simplification of the model will not usually change
the estimates of the contrasts of primary interest. Indeed, as noted above, it is an
attraction of balanced data that simple contrasts of one-way and two-way tables of means
have a very direct interpretation. The error variance under the simplified model can be
estimated via the error estimate in the original model, or via that in some simplified
model, often but not necessarily that used to present the conclusions, i.e. in effect by some
pooling of sums of squares. Unless the degrees of freedom for error in the 'full' model are
rather small, decisions on pooling error mean squares usually have relatively little effect
on the conclusions, and use of the error estimate from the 'full' model will often be
simplest. Of course, where there are several layers of error, choice of the appropriate one
can be of critical importance.
In unbalanced data, explicit formulation and fitting of a model will normally be required
both for parameter estimation and for estimating error variance. In this it is usually
reasonable to err on the side of overfitting; for instance, treatment main effects should not
be omitted just because they are insignificant.
In formulating a 'full' model, the broad procedure is, roughly speaking, to identify all
the factors specifying the explanatory variables for each observation and then to set out
the 'types' of these factors (treatment, intrinsic and nonspecific) and the relationships
between the factors in the design (completely cross-classified, hierarchical, incomplete): all
interactions among completely cross-classified variables are included in the 'full' model,
with the exception of those with nonspecific factors which are often amalgamated for
error; see, however, § 7.

4.2 Two examples


Example. Consider an experiment in which in each of a number of blocks (nonspecific
factor) all combinations of two treatment factors (x1, x2) occur with all levels of an
intrinsic factor z. Table 1 outlines the 'full' analysis of variance table. Note especially the
grouping according to the type of explanatory variable involved and that the interactions
x1 × blocks, ... have not been isolated.
In contemplating simplified models an almost invariable rule is that if one term is
omitted, so are all 'below' it in the analysis of variance table; see, for example, Nelder
(1977). For instance, if x1 × x2 is omitted, it would rarely be sensible to include z × x1 × x2.
The reason for this is that if some contrast interacts with, say, z, and is therefore nonzero
at some levels of z, it would normally be very artificial to suppose that the value averaged
out exactly to zero over the levels of z involved in defining the 'main effect' for the
contrast. The implications for the analysis of unbalanced data will be developed in more
detail in § 4.3.

Table 1
Outline analysis of variance table

Source of variation              Description

I    x1†                         Treatment, main effect*
     x2†
     x1 × x2†                    Interaction
II   z                           Intrinsic, main effect
     z × x1†                     Intrinsic × treatment interaction
     z × x2
     z × x1 × x2†
III  u                           Nonspecific main effect, blocks
     u × all other contrasts     Error‡

* Treatment effects will often be subdivided, especially if one or both x's
are quantitative.
‡ Under some circumstances this term should be subdivided to provide
separate errors for different contrasts of interest.
† Contrast of interest.

Example. As a rather more complicated example of model formulation, we consider


next the two-treatment two-period cross-over design. The simplest such design is that
shown in Table 2a, having just two periods; between the two periods a 'washout' period
will normally be inserted. Subjects are randomized into two groups, usually but not
necessarily in equal numbers. The explanatory variables to be considered are:
(i) treatment factors: direct treatment effect; residual effect of treatment (in period
2 only);
(ii) intrinsic factor: periods;
(iii) nonspecific factor: subjects.
First consider the role of the nonspecific factor, subjects. If a separate parameter is
inserted for each level, i.e. for each individual subject, it is clear that analysis free of these
parameters is possible only from the difference in response, period 2 minus period 1,
calculated for each subject. On the other hand, subjects have been randomized to subject
groups and therefore a second and independent analysis can be made of the subject totals,
period 2 plus period 1, having a larger, and possibly very much larger, error variance than
the analysis of differences. This device, and its obvious generalizations, avoid the need to
write elaborate models containing subject terms. Interaction with the nonspecific factor is
regarded as error and not entered explicitly in the model.
There remain three factors. Interaction between direct and residual treatment effect is
undefined because, regarded as factors, they are an incomplete cross-classification. Simi-
larly, residual treatment effects are defined only in the second period and so cannot lead
to an interaction with periods. Thus the only interaction for consideration is direct
treatment effect x periods.
Because the factors all occur at just two levels, it is convenient, at least initially, to
define parameters as follows: μ, general mean; direct treatment, −τ for T1, τ for T2;
residual treatment, −ρ following T1, ρ following T2; period, −π for period 1, π for period
2; direct treatment × period interaction, −γ for T2 in period 1 and T1 in period 2, γ
otherwise. It follows that, omitting subject parameters, the expected values are those
shown in Table 2a. The parameterization is that convenient for model formulation: for
interpretation the treatment effect in period 1, namely 2τ − 2γ, might be taken as one of
the parameters instead of γ.
For the reasons indicated above, we replace observations by differences obtaining

Table 2
Two-period cross-over design

(a) Simple design
Subject group    Period 1               Period 2
I                T1: μ − π − τ + γ      T2: μ + π + τ − ρ + γ
II               T2: μ − π + τ − γ      T1: μ + π − τ + ρ − γ

(b) Extended design
Subject group    Period 1               Period 2
I                T1: μ − π − τ + γ      T2: μ + π + τ − ρ − δ + γ
II               T2: μ − π + τ − γ      T1: μ + π − τ + ρ − δ − γ
A                T1: μ − π − τ + γ      T1: μ + π − τ − ρ + δ − γ
B                T2: μ − π + τ − γ      T2: μ + π + τ + ρ + δ + γ

General mean, μ; period effect, π; direct treatment effect, τ; residual treatment effect, ρ; direct
treatment × period interaction, γ; direct × residual treatment interaction, δ.
respectively for group I and II expected values:

differences: 2π + 2τ − ρ, 2π − 2τ + ρ;
sums: 2μ − ρ + 2γ, 2μ + ρ − 2γ.

It follows immediately that from the within-subject analysis, the only treatment parameter
that can be estimated is τ − ½ρ, which is of interest only if ρ can be assumed negligible. If
information from between-subject comparisons is added, γ − ½ρ can be estimated, so that
residual effects and direct treatment × period interaction cannot be separated except by a
priori assumption. Thus from the combined analysis only linear combinations of τ − ½ρ and
γ − ½ρ can be estimated and the only such combination of general interest is 2τ − 2γ, the
treatment effect in period 1. The conclusion that the design is advisable for 'routine'
comparison of treatments only when there is strong a priori reason for thinking ρ negligible
seems fairly widely accepted (Brown, 1980; Hills & Armitage, 1979; Armitage & Hills, 1982).
If we use the extended design of Table 2b, subjects are randomized between the four
groups, typically with equal numbers in groups I and II and equal numbers in groups A
and B. The new feature is that residual effect is cross-classified with direct effect and
hence we should introduce for the second period a further parameter for direct x residual
treatment interaction, −δ if the two treatments are different, δ if the two treatments are
the same. Table 2b gives the expected responses. It follows that in the four groups the
expected values are:
differences: 2π + 2τ − ρ − δ, 2π − 2τ + ρ − δ,
2π − ρ + δ − 2γ, 2π + ρ + δ + 2γ;
sums: 2μ − ρ − δ + 2γ, 2μ + ρ − δ − 2γ,
2μ − 2τ − ρ + δ, 2μ + 2τ + ρ + δ.
Thus if only within-subject comparisons are used, i.e. differences analysed, the parameters
that can be estimated are π, τ, ρ and a combination of γ and δ. That is, a direct treatment effect and a residual
treatment effect can be separately estimated but direct treatment x period interaction and
direct x residual treatment interaction cannot be disentangled. If, however, between-
subject comparisons are introduced all contrasts become estimable.
Three and four period designs are dealt with similarly, although of course the details get
more complicated. The precision of between-subject comparisons can often be increased
by an initial period of standardization supplemented by the measurement of a concomit-
ant variable. We return later to the questions of interpretation; here the initial model
formulation is the aspect of concern. In many applications, use of such designs is advisable
only when residual effects and treatment x period interactions are very likely to be
negligible. There are contexts, however, where the occurrence and nature of residual
effects are of direct interest.

4.3 Unbalanced data


There are three difficulties in applying the above ideas to unbalanced data. The first is
that in isolating a sum of squares for any particular contrast, we must specify which other
parameters are to be included and which are to be set at null values. In a complex analysis
many such specifications are involved. Secondly, the amount of computation is greatly
increased and, while each individual fitting operation is simple, easy methods of specifica-
tion and easily understood arrangements of answers are needed. Thirdly, and most
seriously, interpretation is no longer via easily assimilated tables of means, but via
corresponding sets of derived values less directly related to the 'raw' data and having a
more complicated error structure.
There is probably no uniquely optimal way of addressing the problem of formulation,
but the following seems a reasonable general procedure.
(i) Set out the skeleton analysis of variance table for the balanced form analogous to
the data under study. Consider whether 'empty cells' will make any of the contrasts
nonestimable.
(ii) Formulate a baseline model containing, usually, all main effects and any other
contrasts likely to be important, or found in preliminary analysis to be important.
(iii) In isolating the sums of squares for each set of contrasts, not in the baseline model,
include the parameters of the baseline model, the parameters under test, and any
parameters 'above' the parameters under test in the natural hierarchy. Thus, in the
notation used above, if we are examining the three-factor interaction, all the associated
two-factor interactions and, of course, main effects should be included, as well as all
parameters in the baseline model.
Because of the possibility mentioned in (ii) that the initial baseline model will need
modification, the procedure is in general iterative. The final baseline model will usually be
the one underpinning the presentation of conclusions. The procedure is close to a severely
constrained form of stepwise regression.
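The following sketch illustrates step (iii) on a hypothetical unbalanced two-factor data set; the variable names and the use of the statsmodels formula interface are assumptions for illustration only. The sum of squares for the x1 × x2 interaction, not in the baseline model, is isolated by comparing the baseline fit (main effects and the intrinsic variable) with the fit that adds the interaction.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(2)
n = 40
df = pd.DataFrame({
    "x1": rng.choice(["a", "b"], n),           # treatment factor
    "x2": rng.choice(["p", "q", "r"], n),      # treatment factor
    "z":  rng.normal(size=n),                  # intrinsic variable
})
df["y"] = 1.0 + (df.x1 == "b") + 0.5 * (df.x2 == "q") + 0.3 * df.z \
          + rng.normal(scale=0.5, size=n)

# Baseline model (step (ii)) and the model adding the term under test,
# together with everything 'above' it in the hierarchy.
base = smf.ols("y ~ C(x1) + C(x2) + z", data=df).fit()
full = smf.ols("y ~ C(x1) * C(x2) + z", data=df).fit()

# Extra sum of squares and F ratio for the interaction, adjusted for the baseline terms.
print(anova_lm(base, full))
```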

5 Detection of interaction

5.1 General remarks


We now discuss some aspects of the detection of interaction. The literature on the
formal use of F tests is large and we concentrate here on the appropriate choice of the
upper mean square. Graphical techniques are discussed in § 5.3 and § 5.4 deals with the
important issues involved when the interaction of interest is not completely estimable.
By far the most powerful device for detecting interaction is, however, the critical
inspection of one-way, two-way,... tables of means.

5.2 Some comments on F tests


The setting up of an F test for an interaction requires the choice of an upper mean
square associated with the effect under study and a lower mean square estimating error.
The former choice is normally the more critical because it amounts to a specification of
the kind of interaction under examination. Where there are initially several degrees of
freedom in the upper mean square, it is usually important to partition the interaction sum
of squares either into single or small numbers of degrees or at least to separate off the
most important degrees of freedom. The decomposition will nearly always be based on
that of the corresponding treatment main effect. Choice of error mean square is usually
critical only when hierarchical error structure is present.
A simple example is of a treatment x intrinsic interaction in which the treatment
variable, x, is quantitative and the intrinsic variable, z, nominal with a small number of
levels. If it is fruitful to decompose the main effect into linear, quadratic and remaining
components, there is a corresponding decomposition of interaction
x_L × z, x_Q × z, x_Rem × z, (5.1)

in a self-explanatory notation.
If the treatment factor is nominal it will be sensible to decompose the interaction on the
basis either of a priori considerations or after inspection of the treatment main effect. To
take a simple example with 4 levels, suppose that inspection of main effects shows that
level 3 differs from levels 1, 2, 4 which differ rather little among themselves. It is then
reasonable to divide the interaction into

(3 versus 1, 2, 4) × z, (Within 1, 2, 4) × z. (5.2)

Tukey (1949) pointed out that the selection of the component of interaction after
examining the main effect does not change the distribution of the resulting F statistic,
although his use of the idea is slightly different from that here.
Examination of the various components of (5.1) and (5.2) serves rather different
purposes. Most attention will usually apply to the leading components which, if they are
appreciable, hopefully point to some fairly direct interpretation: in terms of power these
statistics correspond to testing against quite specific and meaningful alternatives. We may
call such tests directly interpretable. The final tests involved in (5.1) and (5.2) serve a
rather different purpose. A large final component in (5.1) is incompatible with the
viewpoint implied in the whole formulation and so would be evidence of an unexpected
occurrence in need of detection and explanation, such as by the isolation of an outlying
observation or group of observations or by some radical reformulation. Such tests may be
called ones of specification; the distinction from directly interpretable tests is not always
clearcut. Tests of higher order interactions are usually ones of specification.
The calculation of P-values, while by no means always necessary, is often a useful guide
to interpretation; usually there will be several P-values from each analysis. Elaborate
formalization of the process of analysis is probably counterproductive, but often the
following possibilities can usefully be distinguished.
Suppose for simplicity that just two F statistics are being examined. The following
possibilities may arise.
First, underlying the two statistics may be a two-dimensional space of possibilities all on
an equal footing. Then a combined mean square can be calculated in the usual way.
Secondly, the two possibilities may correspond to quite separate effects, none, either
one or both of which may be present. An example would be the interaction of a treatment
effect with two quite different intrinsic variables. Then the separate F statistics should be
examined and, as some protection against selection, the smaller P-value doubled. For
balanced data the two statistics are, under the global null hypothesis, slightly positively
correlated because of the common denominator, so that the standard allowance for
selection slightly overcorrects.
Thirdly, the two possibilities underlying the F statistics may be nested in that, while
alternative 1 without alternative 2 is the basis of a sensible interpretation, alternative 2 is
sensible only in conjunction with alternative 1. The decomposition x_L × z and x_Q × z of
(5.1) is a typical example: in most circumstances it would not be sensible to have an
interpretation in which the slope is the same for all z whereas the curvature varies with z,
although interpretation with varying slopes but constant, especially zero, curvature would
be acceptable. Another example connected with the same situation is where the two F
statistics are associated with the main effect x_Q and with x_Q × z.
In this third case, it will be thus reasonable to examine the statistics F1 and F12
corresponding to the first alternative and the merged alternatives. If P1 and P12 are the
significance levels associated with the marginal distributions of these statistics, i.e. calcu-
lated from the standard F tables, we may consider P* = min (P1, P12) as a test statistic.
The 'allowance for selection' is a factor between 1 and 2 depending on the associated
degree of freedom (and level of significance) concerned; if the degrees of freedom of F1
and F2 are not too disparate a factor 3/2 is a rough approximation adequate for most
purposes (Azzalini & Cox, 1984).
It is important that checks of specification should be carried out, not necessarily by
formal calculation of significance levels, although these are needed in the background for
calibration. Analysis of variance is, among other things, a powerful exploratory and
diagnostic technique and total commitment to an analysis in the light of a priori
assumptions about the form of the response can lead to the overlooking of important
unanticipated effects, defects in the data or misconceptions in formulation.
There are many generalizations of the above discussion, for instance to models for
nonlinear treatment effects: here the interaction with intrinsic variables should, in princi-
ple, be decomposed corresponding to the meaningful parameters in the nonlinear model.

5.3 Graphical methods
When, as in a highly factorial treatment system, there are a considerable number of
component interactions, graphical representation may be helpful:
(a) in presentation of conclusions,
(b) in isolating large contrasts for detailed examination,
(c) in determining which contrasts can reasonably be used to estimate error,
(d) in detecting initially unsuspected error structure, such as the hierarchical error
involved in a split plot design (Daniel, 1976, Chapter 16), or important correla-
tions between errors.
Graphical methods work most easily when all mean squares under study have the same
number of degrees of freedom. If most of the contrasts have d degrees of freedom, it may
be worth transforming a mean square with d' degrees of freedom, via the probability
integral transformation and an assumed value σ² for variance, into the corresponding
point of the distribution for d degrees of freedom. Pearson & Hartley (1972, Tables 19
and 20) give expected order statistics corresponding to degrees of freedom 1, 2,..., 7 and
this covers most of the commonly occurring cases. As a simple example, suppose that all
but one of the mean squares has one degree of freedom and that a preliminary estimate of
variance is σ². Suppose that there is in addition a mean square of 1.2σ² with 3 degrees of
freedom. The corresponding value of χ²₃ = 3.6 is at the 0.69 quantile and the corresponding
point of χ²₁ is 1.05, so that the single mean square is assigned the value 1.05σ².
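A small sketch of this device using the chi-squared distribution functions (the function name and the preliminary variance estimate are assumptions); it approximately reproduces the worked example just given.

```python
from scipy.stats import chi2

def match_df(mean_square, d_prime, d, sigma2):
    """Convert a mean square on d' degrees of freedom to the corresponding
    point of the distribution for d degrees of freedom, given a variance sigma2."""
    p = chi2.cdf(d_prime * mean_square / sigma2, df=d_prime)   # quantile position
    return sigma2 * chi2.ppf(p, df=d) / d                      # matching mean square on d df

sigma2 = 1.0
# about 1.04, in line with the 1.05 quoted in the example above
print(match_df(1.2 * sigma2, d_prime=3, d=1, sigma2=sigma2))
```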
It is good to show main effects distinctively: if a primary object is (c), the estimation of
error, it is best to omit the main effects, and any other important effects, in the calculation
of effective sample size for finding plotting positions. That is, the expected order statistics
used for plotting refer to a sample size equal to the number of contrasts used for
estimating error, not to the total number of contrasts. When the contrasts have single
degrees of freedom, there is a choice between plotting mean squares, or absolute value of
contrasts, the half normal plot (Daniel, 1959) or signed values of contrasts. For many
purposes the last seems preferable, partly because the signs of important contrasts can be
given a physical interpretation and partly because if levels are defined so that all sample
main effects are positive, interactions have an interpretable sign, a positive interaction
indicating a reinforcement of the separate effects. Examples of the use of these plots are
to be found in the books of Daniel (1976) and Daniel & Wood (1971).
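A hedged sketch of such a plot for a hypothetical unreplicated 2^4 experiment (data and effect sizes invented for illustration): the signed contrasts are plotted against normal order statistics, and effects standing off the line through the bulk of the points are candidates for interpretation.

```python
import numpy as np
import matplotlib.pyplot as plt
from itertools import product
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical unreplicated 2^4 experiment.
levels = np.array(list(product([-1, 1], repeat=4)))            # 16 runs x 4 factors
y = 10 + 3 * levels[:, 0] + 2 * levels[:, 1] \
    + 1.5 * levels[:, 0] * levels[:, 1] + rng.normal(scale=0.8, size=16)

# Signed factorial contrasts (main effects and all interactions).
names, contrasts = [], []
for subset in product([0, 1], repeat=4):
    if sum(subset) == 0:
        continue
    col = np.prod(levels[:, np.array(subset, dtype=bool)], axis=1)
    names.append("".join("ABCD"[i] for i in range(4) if subset[i]))
    contrasts.append(col @ y / 8)                              # effect estimate, divisor 2^(k-1)
contrasts = np.array(contrasts)

(osm, osr), _ = stats.probplot(contrasts, dist="norm")         # normal order statistics
plt.plot(osm, osr, "o")
for xq, yq, nm in zip(osm, osr, np.array(names)[np.argsort(contrasts)]):
    plt.annotate(nm, (xq, yq), fontsize=7)
plt.xlabel("normal order statistic")
plt.ylabel("signed contrast")
plt.show()
```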
It is well known that in some approximate large sample sense probability plotting is not
systematically affected by dependence between the points plotted, although the precision
of the points will be altered. Because the shape of 'reasonable' distributions is settled by
the first four moments, a condition for the validity of probability plotting methods is that
the dependence shall be sufficiently weak that the first four sample moments converge in
probability to their population values. Thus, in unbalanced analyses independence of the
various mean squares to be plotted is not necessary.
A rather extreme example of the case of nonindependent contrasts concerns factors
with nominal levels. For simplicity, consider unreplicated observations y_ij (i = 1, ..., m1;
j = 1, ..., m2). Every elementary interaction contrast

y_{i1 j1} − y_{i1 j2} − y_{i2 j1} + y_{i2 j2} (i1 ≠ i2; j1 ≠ j2) (5.3)

has, under the hypothesis of no two-factor interaction and homogeneous variance, zero
mean and constant variance. This suggests that all ¼m1(m1 − 1)m2(m2 − 1) contrasts (5.3)
should be plotted against normal order statistics, this being appropriate for detection of
interaction confined to a relatively small number of cells. Interaction produced by outlying
observations, or outlying rows or columns may be detected in this way, although the plot
is highly redundant, especially if m1 and m2 are large; the redundancy measured by the
ratio of the number of points plotted to the number of independent contrasts is ¼m1m2
and so is large if either or both m1 and m2 are large. This technique has been studied in
detail by Bradu & Hawkins (1982) and in an as yet unpublished report by R. Denis and E.
Spjøtvoll.

5.4 Some special procedures


The graphical methods of ? 5.3 as well as the tests of ? 5.2 depend directly on or are
closely related to standard linear model theory. There are, however, other tests of
interaction which may sometimes be useful.
Particularly with a treatment factor and an intrinsic factor both with nominal levels one
might wish to test for an order-based interaction of treatment with intrinsic factor, in
the sense defined above (3.3). Suppose that the two-way array y_ij (i = 1, ..., m1;
j = 1, ..., m2) of observations has variance reasonably well estimated by σ². For example,
there may be replication within cells and the y_ij may be cell means. An order-based
interaction of the kind under study is revealed if for some suitably large c > 0 and for four
cells (i1, j1), (i1, j2), (i2, j1), (i2, j2) (i1 ≠ i2, j1 ≠ j2), we have that

y_{i1 j2} − y_{i1 j1} ≥ cσ√2, y_{i2 j1} − y_{i2 j2} ≥ cσ√2, (5.4)

so that column effects exceeding c standard errors and in opposite directions occur in rows
i1 and i2; if we have one treatment and one intrinsic factor, the treatment factor is
assigned to columns. We take as test statistic the maximum c for which (5.4) holds for
some four cells. The associated significance level is (Azzalini & Cox, 1984) approximately
1 − exp [−¼m1(m1 − 1)m2(m2 − 1){1 − Φ(c)}²].
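A direct, if naive, implementation of this scan is sketched below, assuming a two-way array of cell means and a known standard deviation (both hypothetical); it returns the maximum c for which (5.4) holds for some quadruple of cells.

```python
import numpy as np
from itertools import combinations

def max_order_based_c(y, sigma):
    """Largest c for which some quadruple of cells shows column differences
    exceeding c*sigma*sqrt(2) in opposite directions in two rows, as in (5.4)."""
    m1, m2 = y.shape
    best = -np.inf
    for i1, i2 in combinations(range(m1), 2):
        for j1, j2 in combinations(range(m2), 2):
            d1 = y[i1, j2] - y[i1, j1]
            d2 = y[i2, j1] - y[i2, j2]
            # allow either orientation of the reversal
            c = max(min(d1, d2), min(-d1, -d2)) / (sigma * np.sqrt(2))
            best = max(best, c)
    return best

# Hypothetical 3 x 4 table of cell means with a reversal in the second row.
y = np.array([[1.0, 2.0, 3.0, 4.0],
              [4.2, 3.1, 2.0, 1.1],
              [1.1, 2.2, 2.9, 4.1]])
print(max_order_based_c(y, sigma=0.5))
```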
A second set of special techniques is connected with outliers. Especially where a single
outlier is involved, the techniques for outliers in a general linear model can be used; in
effect the depression in the residual sum of squares following from the omission of an
observation, in turn related to a square residual, gives a diagnostic statistic. For special
studies of two-factor systems, see Brown (1975), Gentleman & Wilk (1975), Gentleman
(1980) and Bradu & Hawkins (1982). Similar ideas apply if a whole row or column is
deleted. For unbalanced analyses, recent work on the detection of influential observations
(Cook & Weisberg, 1982; Atkinson, 1982) is relevant.
Similar remarks apply if one or more whole 'rows' are regarded as outliers correspond-
ing usually to particular levels of an intrinsic factor.

5.5 Incompletely identified interactions


A quite common occurrence, especially with treatments with nominal levels, is that an
interaction or set of interactions are of interest having a substantial number of degrees of
freedom and yet only a very small number of component degrees of freedom can be
isolated at a time, because of the incomplete nature of the data. Sometimes the
components for study are determined a priori but in other cases, especially where the
interaction is being studied primarily as a check on specification, it will be required to
isolate special components in a way that depends on the data. One general principle that
can be used in such cases is that large component main effects are more likely to lead to
appreciable interactions than small components. Also, the interactions corresponding to
larger main effects may be in some sense of more practical importance. The application of
this is best seen via a number of examples.
Example. Cox & Snell (1981, Example P) reanalysed some data of Fedorov, Maximov
& Bogorov (1968) consisting of 16 observations on a 210 treatment system, the factor
levels corresponding to low and high concentrations of components in a nutrient medium.
The design is a partly randomized one. The full data are given in Table 3a. A crucial
aspect is that an independent estimate of error variance, 2 = 14-44, is available. The
following is a minor development of the earlier analysis.
Tables 3b, c summarize some aspects of the analysis:
(a) an initial fit of only main effects gives a residual mean square far in excess of σ̂²;
(b) therefore a model including interaction terms is needed, but it is clearly impossi-
ble to include all 45 two-factor interactions;
(c) there is no uniquely appropriate procedure, but one sensible approach is to
introduce main effects one at a time starting with the largest and with each pair of
large main effects to introduce the corresponding two-factor interaction;
(d) after some trial and suppression of small contributions a representation is found
producing a residual mean square close to σ̂²;
(e) as a further check, addition to the model of the remaining main effects, one at a
time, gives estimates that are both individually small and also of both signs: 'real'
main effects are very likely to be positive, corresponding to the higher level of a
component increasing response.
Example. The examination of interaction in a Latin square provides a simple illustra-
tion of the same idea. We consider a Latin square in which the letters represent a
treatment factor with nominal levels; the columns represent an intrinsic factor with
equally spaced quantitative levels, such as serial order; the rows represent a nonspecific
factor such as subjects. Of course, there are many other versions of the Latin square
design depending on the character of the three defining factors.
In the present instance, the interaction treatments x columns is the one of interest. In an
m × m square the isolation of all (m − 1)² degrees of freedom for that interaction is not
possible and one approach is to take the linear component of the column effect and some
single contrast of treatments based either on a priori considerations or on inspection of
main effects. Table 4 illustrates the method for a 4 x 4 Latin square; we take C versus
A, B, D as the component of treatment main effect. A derived concomitant variable is
formed as a product of the column variable -3, -1, 1, 3 representing linear trend with the
variable taking values -1, -1, 3, -1 for respectively A, B, C, D.
The standard Latin square analysis is augmented by including in the model a term
proportional to the derived variable above. The analysis is conveniently done by the
standard calculations of analysis of covariance. The derived variable, shown explicitly in
Table 3
Data and outline analysis of 2^10 system (Fedorov et al., 1968)

(a) Yields of bacteria

Factors: x1 NH4Cl; x2 KH2PO4; x3 MgCl2; x4 NaCl; x5 CaCl2; x6 Na2S; x7 Na2S2O3; x8 NaHCO3;
x9 FeCl3.9H2O; x10 micro-elements.
Levels:  +  1500  450  900  1500  350  1500  5000  5000  125  15
         -   500   50  100   500   50   500  1000  1000   25   5

      x1  x2  x3  x4  x5  x6  x7  x8  x9  x10   Yield
  1   -   +   +   +   -   +   -   +   -   +     14.0
  2   -   -   +   +   -   +   +   -   -   +      4.0
  3   +   -   -   +   +   +   -   -   -   -      7.0
  4   -   -   +   -   +   +   -   +   +   +     24.5
  5   +   -   +   +   +   +   +   -   -   -     14.5
  6   +   -   +   -   +   +   +   +   +   +     71.0
  7   -   -   -   -   -   -   -   -   -   -     15.5
  8   +   +   -   +   +   -   -   +   +   -     18.0
  9   -   +   -   +   -   -   +   -   -   +     17.0
 10   +   +   +   +   +   -   -   -   +   -     13.5
 11   -   +   +   -   +   -   +   +   +   +     52.0
 12   +   +   +   -   -   -   +   +   -   -     48.0
 13   +   +   -   -   +   -   +   -   +   -     24.0
 14   -   +   -   -   -   +   -   -   +   -     12.0
 15   +   -   -   -   -   -   -   +   +   +     13.5
 16   -   -   -   +   -   +   +   +   -   +     63.0

All the concentrations are given in mg/l, with the exception of factor 10, whose central level (10 ml of solution
of micro-elements per 1 l of medium) corresponds to 10 times the amount of micro-element in Larsen's
medium. The yield has a standard error of 3.8.

(b) Some residual mean squares

Model fitted                               Residual mean square   Degrees of freedom
All main effects                           305.7                   5
x8                                         287.9                  14
x7, x8                                     162.0                  13
x7, x8, x7x8, x9                            48.8                  11
x7, x8, x6, x9, x7x8, x6x8                  19.1                   9
External                                    14.4

(c) Some estimates and standard errors

Final model                         Additional main effects, added singly
x6      1.35 ± 0.99                 x1      1.37 ± 1.04
x7     11.78 ± 0.99                 x2     -1.43 ± 1.11
x8     11.47 ± 0.99                 x3     -0.47 ± 1.19
x9      3.26 ± 1.05                 x4     -2.08 ± 1.06
x6x8    4.59 ± 0.95                 x5      1.03 ± 1.05
x7x8    9.53 ± 0.95                 x10    -2.25 ± 1.34

Table 4, is not orthogonal to the rows of the square. Because the analysis of variance of
the derived variable has sums of squares:
columns and treatments, 0; rows, 80; residual, 160; total, 240,
the variance of the estimated interaction is σ²/160, compared with σ²/240 for a corres-
ponding orthogonal analysis. That is, in the present instance interaction is estimated with 2/3

Table 4
Interaction detection in a 4 x 4 Latin square
'Time'
Subjects 1 2 3 4

1 B, 3 C, -3 A, -1 D, -3
2 D, 3 A, 1 B,-1 C,9
3 A, 3 D, 1 C,3 B,-3
4 C, -9 B, 1 D, -1 A, -3
The derived variable is formed from the product of
the treatment effect C versus A, B, D (-1, -1, 3, -1)
and the linear column effect (-3, -1, 1, 3).

efficiency. If the interaction component of interest were known a priori a particular Latin
square for optimal estimation could be used. This form of test is best regarded as one of
specification. If an interaction were detected the whole basis of the analysis would need
reconsideration.
The principle involved here is that of Tukey's (1949) degree of freedom for nonadditiv-
ity, except that here the isolation of defining contrasts is based partly on a priori
considerations rather than being entirely data-dependent.

6 Interpretationof interaction

6.1 General remarks


We now suppose that the existence of interaction has been established and that its
magnitude is sufficient that explanation or interpretation is needed.
If the interactions in question are two-factor ones, treatment x treatment or treatment x
intrinsic, it will often be enough to make a qualitative or graphical summary of the
appropriate two-way tables of means, with associated standard errors; occasionally this is
adequate also for three-factor interactions using three-way tables.
It will, however, rarely be adequate to examine expressions for an expected value of the
form η_ij = μ + α_i + β_j + γ_ij and to use directly the estimates γ̂_ij to characterize the interaction.


Some other simple but important possibilities will now be outlined. If all or specified
interactions can be removed by transformation, part at least of the interpretation will
usually exploit this.
A suitable transformation, if one exists, is often best found by informal arguments. The
technique suggested by Box & Cox (1964) may yield a transformation that meets all the
requirements of a normal theory linear model without the interactions in question.
Otherwise, if removal of the interactions is the primary requirement, it may be better to
calculate the relevant F ratio for various transformations, in particular for some
parametric family of transformations; a convenient transformation with acceptably small
value of F may then be used for interpretation, provided such a transformation exists. See
Draper & Hunter (1969).
If there is an interaction between a treatment factor with quantitative levels and a
treatment or intrinsic factor with nominal levels, one aims to characterize the effect of the
quantitative factor in a parametric fashion via a linear or nonlinear model and then to
describe how these parameters change, preferably concentrating the change into a single
parameter.
We concentrate here, as indeed throughout the paper, on the concise description and
interpretation of contrasts of means. Another possibility, however, is that one wishes to
predict the value of a future observation at a specified combination of levels. If the model
for such prediction is clearcut, no essentially new problem arises. Particularly, however, if
point estimates are required and interaction effects are on the borderline of detectability,
it may be sensible in large systems to adopt an empirical Bayes approach in which, for
instance, estimates obtained assuming interaction are 'shrunk' towards those appropriate
in the absence of interaction (Lindley, 1972). It is a bit curious that in the literature on
multiple regression explicit response prediction, as contrasted with parameter estimation,
has received relatively much more attention than in the literature on analysis of variance.
Section 6.2 deals with some general aspects of the relation between interactions and
main effects. Sections 6.3 and 6.4 then discuss various rather specialized ideas that are
sometimes helpful in interpretation when interactions are present.

6.2 Main effects in the presence of interaction


In the following discussion we consider a single treatment factor and a single intrinsic
factor, and suppose that interaction is clearly established; the general ideas below apply,
however, much more broadly.
In § 4.1 it was stressed that models with interaction present but an associated main
effect set zero were very rarely of interest; for an exception, see Cox (1977). This applied
at the model fitting and interaction detection stage. In interpretation, however, main
effects are fairly rarely of direct concern in the presence of appreciable interaction.
A very simple example will illustrate this. Suppose that in comparing two diets 0, 1 the
mean live weight increases are, in kilograms,

Males: treatment 0, 100; treatment 1, 120


Females: treatment 0, 90; treatment 1, 80

with small standard errors of order 1 kg. The treatment effect (1 versus 0) is thus 20 kg for
males and -10 kg for females and interaction, in fact qualitative interaction in the sense of
? 5.2, is present. The interpretation in such cases is normally via these separate effects.
The main effect, i.e. the treatment effect averaged over the levels of the intrinsic factor, is
of interest only in the following circumstances.
(a) The treatment effects at the various levels of the intrinsic factor are broadly
similar, so that an average treatment effect gives a convenient overall qualitative
summary. Clearly, this is not the case in the illustrated example, nor in any
application with qualitative interaction.
(b) An application is envisaged in which the same treatment is to be applied to all
experimental units, e.g. to all animals regardless of sex. Then an average
treatment effect is of some interest in which the weights attached to the different
levels of the intrinsic factor are proportions in the target population. In the
particular example these might be (½, ½) regardless of the relative frequencies of
males and females in the data. Note also that if the data frequencies are regarded
as estimating unknown population proportions the standard error of the esti-
mated main effect should include a contribution from the sampling error in the
weights.

Note especially that in case (b) the precise definition of main effect depends in an
essential way on having a physically meaningful system of weights. This formulation is in
any case probably fairly rarely appropriate and there is some danger of calculating
standard errors for estimates of artificial parameters.

6.3 Product models


Product models for interactions were in effect introduced by Fisher & Mackenzie (1923)
and as a basis for a so-called joint regression approach by Yates & Cochran (1938). J.
Mandel (in a series of papers) studied such models, in particular in the context of
calibration studies; see, especially, Mandel (1971). For reviews, see Freeman (1973) for
genetic applications and Krishnaiah & Yochmowitz (1980) for the more mathematical
aspects.
In a two-factor system with nominal levels for both factors, suppose that the expected
value η_ij in 'cell' (i, j) can be written in the special form

η_ij = μ + α_i + β_j + γ_iκ_j, (6.1)

where the normalizing conditions Σα_i = Σβ_j = Σγ_i = Σκ_j = 0
can be applied. One way of viewing (6.1) is that, with fitting by least squares, the model
leads to a rank one approximation to the two-way matrix of interaction terms, and so on if
further terms are added; this establishes a connexion with the singular-value decomposi-
tion of a matrix (Rao, 1973, p. 42).
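A numerical sketch of this rank-one view, using a hypothetical table generated from (6.1): after removing row and column means, the leading singular component of the residual table recovers the product term γ_iκ_j (up to the sign of the singular vectors).

```python
import numpy as np

rng = np.random.default_rng(5)
m1, m2 = 5, 4
alpha = rng.normal(size=m1); alpha -= alpha.mean()
beta = rng.normal(size=m2);  beta -= beta.mean()
gamma = rng.normal(size=m1); gamma -= gamma.mean()
kappa = rng.normal(size=m2); kappa -= kappa.mean()

# Table of means following the product model (6.1), plus a little noise.
y = 10 + alpha[:, None] + beta[None, :] + np.outer(gamma, kappa) \
    + rng.normal(scale=0.05, size=(m1, m2))

# Remove main effects, then take the leading singular component of the residuals.
resid = y - y.mean(1, keepdims=True) - y.mean(0, keepdims=True) + y.mean()
U, s, Vt = np.linalg.svd(resid)
rank1 = s[0] * np.outer(U[:, 0], Vt[0])

G = np.outer(gamma, kappa)
print(np.round(s, 3))                                         # first singular value dominates
print(min(np.abs(rank1 - G).max(), np.abs(rank1 + G).max()))  # small: product term recovered up to sign
```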
Important restricted cases of this form of model are obtained by replacing (6.1) by one
of
η_ij = μ + α_i + β_j + γ_iβ_j,
η_ij = μ + α_i + β_j + α_iκ_j, (6.2)
η_ij = μ + α_i + β_j + γ_iβ_j + α_iκ_j,

where Σα_i = Σβ_j = Σκ_j = Σγ_i = 0, or by the symmetrical form

η_ij = μ + α_i + β_j + λα_iβ_j. (6.3)

The score test for λ = 0 is essentially Tukey's (1949) degree of freedom for nonadditivity.
Note that under the first of (6.2), η_ij = μ + α_i + (1 + γ_i)β_j, so that for fixed i a plot of η_ij versus β_j as
j varies gives a straight line of slope (1 + γ_i). The straight lines pass through a common
point only under (6.3). Yates & Cochran (1938) gave an empirical example where the
suffix i represented varieties and the suffix j places.
Snee (1982) points out the difficulty in unreplicated data of distinguishing a product
structure of the above form from row or column based heterogeneity of variance; see also
Yates (1972).
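For reference, a small hedged sketch of the Tukey single degree of freedom for nonadditivity mentioned above, applied to a hypothetical unreplicated two-way table: the carrier is the product of estimated row and column effects, corresponding to the λα_iβ_j term in (6.3).

```python
import numpy as np
from scipy import stats

def tukey_one_df(y):
    """Tukey's one degree of freedom for nonadditivity in an unreplicated two-way table."""
    m1, m2 = y.shape
    a = y.mean(1) - y.mean()                  # row effects alpha_i
    b = y.mean(0) - y.mean()                  # column effects beta_j
    resid = y - y.mean() - a[:, None] - b[None, :]
    w = np.outer(a, b)                        # carrier for the lambda * alpha_i * beta_j term
    ss_nonadd = (resid * w).sum() ** 2 / (w ** 2).sum()
    df_rem = (m1 - 1) * (m2 - 1) - 1
    ss_rem = (resid ** 2).sum() - ss_nonadd
    F = ss_nonadd / (ss_rem / df_rem)
    return F, stats.f.sf(F, 1, df_rem)

# Hypothetical table: additive structure plus a lambda * alpha_i * beta_j component.
rng = np.random.default_rng(6)
rows, cols = np.arange(4.0) - 1.5, np.arange(5.0) - 2.0
y = 10 + np.add.outer(rows, cols) + 0.3 * np.outer(rows, cols) \
    + rng.normal(scale=0.1, size=(4, 5))
print(tukey_one_df(y))
```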
One particular motivation of these models can best be stated in terms of interlaboratory
calibration experiments, although the idea applies much more widely. Suppose that a
number of batches of material labelled by j have unobserved true values ξ_j of some
property. Samples of each batch are sent to various laboratories, labelled by i, and the ith
laboratory has a calibration equation α_i′ + β_i′ξ converting the true value ξ into an expected
response. Thus η_ij = α_i′ + β_i′ξ_j; and writing α_i′ = ᾱ′ + a_i, β_i′ = β̄′ + b_i, ξ_j = ξ̄ + x_j, with
Σa_i = Σb_i = Σx_j = 0, we have that

η_ij = (ᾱ′ + β̄′ξ̄) + (a_i + b_iξ̄) + β̄′x_j + b_ix_j = μ + α_i + β_j + γ_iβ_j,

say, where γ_i = b_i/β̄′ is the fractional deviation of slope in the ith laboratory from the
mean slope over laboratories. If the laboratories were all calibrated to give the same
expected response at the mean ξ̄, then α_i = 0. Note, however, that it would rarely make
sense to test the null hypothesis α_i = 0 when the β_i′ differ, for the reasons given in § 4.1.
The above discussion is appropriate when the levels of both factors are nominal. If, for
example, the rows in (6.1) have quantitative levels and models with a linear effect are
under examination, it will be natural to take both the α_i and the γ_i to be linear in the
carrier variable, and this leads to the decomposition corresponding to (5.1).

6.4 Interpretation with appreciable higher-order interaction


In an investigation with many factors the presence of appreciable three factor or higher
order interactions can cause serious problems of interpretation. A direct interpretation in
terms of the interactions as such is rarely enlightening.
In relatively simple cases the interactions may be used just as a guide as to which tables
of means should be considered. Thus, if the only important three factor or higher
interaction is of the form treatment₁ × treatment₂ × intrinsic, it may be enough to set out
separate treatment₁ × treatment₂ tables at each level of the intrinsic factor.
In more complex cases, especially where interactions among several treatment factors
occur, other techniques are needed. These include the following:
(i) transformation of the response variable may be indicated;
(ii) if the treatment factors are quantitative, and a high degree polynomial appears
necessary to fit the data, the possible approaches to a simpler interpretation
include transformation of response and/or factor variables and the fitting of
simple nonlinear models;
(iii) the abandonment of the factorial representation of the treatments in favour of
the isolation of a few possibly distinctive factor combinations;
(iv) the splitting of the factor combinations on the basis of one or perhaps more
factors;
(v) the adoption of a new system of factors for the description of the treatment
combinations.
Approaches (i) and (ii) call for no special comment here. There follow some remarks on
(iii)-(v).
For 2ⁿ systems it is easy to see that if all responses except one are equal then all the
standard factorial contrasts are equal except for sign. Thus the occurrence of a number of
appreciable contrasts, the higher-order interactions being broadly comparable in mag-
nitude with the main effects, suggests that there are one, or perhaps a few, factor
combinations giving radically different responses. The analysis of the data then consists of
specifying the anomalous combinations and their responses, and checking the
homogeneity of the remaining combinations. That is, the factorial character is abandoned
in the final analysis, the decomposition into main effects and interactions being a step in
the preliminary analysis. For an example, see Cox & Snell (1981, Example N).
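The assertion about 2ⁿ systems with a single anomalous response is easily checked numerically; the following sketch uses a hypothetical 2³ table in which one cell differs from the rest and evaluates all seven factorial contrasts.

```python
import numpy as np
from itertools import product

# 2^3 factorial: responses all equal except the single combination (1, 1, 1)
levels = list(product([0, 1], repeat=3))
y = np.array([10.0 if lev != (1, 1, 1) else 25.0 for lev in levels])

# Standard factorial contrasts (main effects and interactions) via +/-1 coding,
# with the conventional divisor 2^(n-1) = 4
for name, idx in [("A", [0]), ("B", [1]), ("C", [2]),
                  ("AB", [0, 1]), ("AC", [0, 2]), ("BC", [1, 2]), ("ABC", [0, 1, 2])]:
    sign = np.array([np.prod([2 * lev[k] - 1 for k in idx]) for lev in levels])
    print(name, (sign * y).sum() / 4)   # all contrasts equal in magnitude (here +3.75 each)
```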
The next possibility, (iv), is that all or most of the appreciable contrasts, including
higher-order interactions, involve a particular factor. It may then be possible to find a
simple interpretation by decomposing the data into separate sets, one for each level of the
'special' factor. Each set may then have a relatively simple interpretation. This device is
particularly appealing if the 'special' factor is an intrinsic factor, not a treatment factor,
although the method is not restricted to intrinsic factors. For an example, see the
comments by Cox (1974) on the data in the book of Johnson & Leone (1964).
Finally, point (v), there is the possibility (Cox, 1972; Bloomfield, 1974) that redefinition
of the factors will lead to a simpler explanation or representation. For two factors each at
two levels the four combinations conventionally labelled


1 a b ab
can be relabelled
1 a'b' b' a'
1 b" a"b" a"
as well as by systems obtained by interchanging upper and lower levels of one or both
factors. If, for example, in the original system the main effect A and the interaction A × B
are appreciable but the main effect of B negligible, the first transformation mentioned will
switch the interaction A × B into the main effect B', leaving the main effect A as the main
effect A'. In general in a 2ⁿ system the m largest contrasts that are linearly independent
in the prime power commutative group of contrasts can be redefined as main effects. A
more complex transformation scheme would be appropriate if it is required to control the
position of more than m independent contrasts.
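The effect of the first relabelling can be verified in a few lines; the cell means below are hypothetical and the divisor used for the contrasts is a matter of convention.

```python
# Cell means for the 2x2 combinations, ordered (1, a, b, ab); purely illustrative numbers
y = {"1": 10.0, "a": 14.0, "b": 10.5, "ab": 18.0}

def contrast(cells_plus, cells_minus):
    return (sum(y[c] for c in cells_plus) - sum(y[c] for c in cells_minus)) / 2

# Original factors
A  = contrast(["a", "ab"], ["1", "b"])
B  = contrast(["b", "ab"], ["1", "a"])
AB = contrast(["1", "ab"], ["a", "b"])

# First relabelling: 1 -> 1, a -> a'b', b -> b', ab -> a',
# so A' is at its upper level in old cells {a, ab}, B' in old cells {a, b}
A_new = contrast(["a", "ab"], ["1", "b"])   # equals the old main effect A
B_new = contrast(["a", "b"], ["1", "ab"])   # equals minus the old interaction A x B
print(A, B, AB, A_new, B_new)
```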
In all these rather special methods there is the possibility of alternative representations,
possibly of different types or possibly variants of the same type, giving approximately
equally good fits to the data. Formal theory is not much guide; the maximized log
likelihood achieved is determined by the residual sum of squares, but the number of
parameters effectively being fitted is not clearly defined and the various descriptions being
compared are not nested. The safest approach in principle is to list all those relatively
simple representations consistent with the data and to make any choice between them on
grounds external to the data.

7 Role of nonspecific factors and the estimation of error

7.1 General remarks


So far we have not discussed in detail the estimation of error. In some situations, error
estimation is based directly on variation between replicate observations within 'cells'; the
possibility of hierarchical error structure has often to be considered, both in the context of
split plot experiments and also, in observational data, in connexion with multistage
sampling.
If there is no direct replication and all factors are either treatment or intrinsic, error will
be estimated from interactions, often by pooling all those above a certain order, although
there may be some case for using interactions among the intrinsic factors only, when that
is feasible. Comparison with the variance found in previous experience with similar
material gives one important check. Where the mean squares for several different
contrasts are pooled, a rough check of homogeneity is desirable, more to avoid the
overlooking of an important unexpected contrast than to avoid overestimation of error.
In the present section, however, we concentrate on the role of nonspecific factors, such
as blocks, replicates, etc., factors whose levels are not defined by uniquely identified
features. For instance, if an investigation is repeated at a number of sites there will usually
be many ways, often rather ill defined, in which the sites differ from one another.

7.2 Interpretation of interactions with nonspecific factors


Suppose for simplicity that there is a single treatment factor, x, and a single nonspecific
factor, u. If there is no replication within (x, u) combinations, or other ways of estimating
error, there may be little option but to treat x × u as error; in effect, we say that if the
treatment effect is different at different levels of u, there is no basis for the detailed
explanation or representation of this interaction and that it is to be regarded as in some
sense random. This is particularly appropriate if the distinction between the different
levels of u is of little direct interest, such as when u represents blocks, litters, etc.
Suppose, however, that a separate estimate of error is available and that the different
levels of u are in some sense important. For example, a major study may be repeated in
more or less identical form in a number of centres. If an interaction x × u is found, it
should if possible be 'explained', for example,
(i) by removal by transformation,
(ii) by relating treatment effects to concomitant variables attached to the different
levels of u,
(iii) by regarding certain levels of u as anomalous,
(iv) by establishing that the nominal estimate of error is an underestimate of the
'true' variability present.
Failing all these, it seems necessary to regard the interaction as a source of uncontrolled
variability, to be represented in a mathematical model by an additional random variable.
The corresponding treatment main effect is then an average over an infinite population of
levels, usually hypothetical. Of course, where there is a natural infinite or finite population
of u levels, it is more defensible to take this as the basis.
In simple balanced data, the conclusion is that with a single nonspecific factor, u,
showing interaction, the error of treatment contrasts is estimated from the interaction
treatments × u. With unbalanced data, either maximum likelihood analysis of a complex
error structure is required or some simpler approximate method based on reduction to a
balanced system.
With more than one nonspecific factor, synthesis of variance is needed, even in
balanced systems, to estimate error (Cochran & Cox, 1957, § 14.5):

MS_treat×u1 + MS_treat×u2 − MS_treat×u1×u2

is that normally used, in a self-explanatory notation, and this is assigned an effective
number of degrees of freedom. It would be worthwhile to have a more formal development of the
theory of the approximate inference so involved, one approach being via maximized log
likelihoods, irrelevant effects having been removed by orthogonal transformation.
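A sketch of the synthesis-of-variance calculation follows; the Satterthwaite-type formula used for the effective degrees of freedom is a standard approximation and is an assumption made here, the paper itself not spelling out the formula. The mean squares and degrees of freedom are illustrative.

```python
def synthesized_error(ms1, df1, ms2, df2, ms12, df12):
    """Synthesis of variance for the error of a treatment contrast with two nonspecific factors.

    Combines MS_treat_x_u1 + MS_treat_x_u2 - MS_treat_x_u1_x_u2 and attaches a
    Satterthwaite-type effective degrees of freedom (an approximation; the negative
    sign in the combination makes the degrees-of-freedom formula itself approximate).
    """
    ms = ms1 + ms2 - ms12
    eff_df = ms**2 / (ms1**2 / df1 + ms2**2 / df2 + ms12**2 / df12)
    return ms, eff_df

# Illustrative mean squares and degrees of freedom (not taken from the paper)
print(synthesized_error(ms1=6.0, df1=4, ms2=5.0, df2=6, ms12=2.0, df12=24))
```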
If the treatment effect is decomposed into components with very different magnitudes,
it will, provided that enough degrees of freedom are available, be necessary to make an
analogous decomposition of the interaction with a nonspecific factor. For example, if a
treatment effect were linear at each level of u, but with different slopes, the standard error
of the 'overall' slope could be determined from linear treatments × u, which might well be
appreciably greater than quadratic treatments × u, which should equal the internal error if
indeed each separate relation is linear. Similar remarks apply if there are several
treatment factors. In principle, the error of each component of the treatment effect is found
from the interaction of that component with u.
Special care is needed if 'time' is a factor. If an investigation is repeated over time on
physically independent material, and if external conditions such as meteorological ones can
be treated as totally random, then it may be sensible to treat 'time' as a nonspecific factor,
as above. If, however, the same individual is involved at each time point, assumptions of
independence are best avoided and any made need critical attention. A commonly useful
device (Wishart, 1938) is to produce for each individual one or more summary statistics
measuring, for example, trend, cyclical variation, treatment effects, serial correlation, etc.
and then to analyse these as derived response variables. This approach usually requires
that the number of time points is not too small, although the development in § 4 of an
analysis for the two-period, two-treatment cross-over design could be regarded as an
illustration. Thus the derived variables are the difference and sum of the two observations
on each subject.
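The derived-variable device can be sketched as follows for a hypothetical set of repeated measurements: each subject's series is reduced to a least-squares slope, and the slopes are then compared between two treatment groups in the ordinary way. The data-generating details are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
times = np.arange(5, dtype=float)                   # common measurement times
# Hypothetical repeated measurements: 8 subjects x 5 time points, two treatment groups
group = np.repeat([0, 1], 4)
y = 10 + 0.5 * times + rng.standard_normal((8, 5))
y[group == 1] += 1.0 * times                        # treatment steepens the trend

# Derived variable: per-subject least-squares slope over time
x = times - times.mean()
slopes = (y * x).sum(axis=1) / (x**2).sum()

# The slopes are then analysed as ordinary responses, e.g. a two-sample comparison
diff = slopes[group == 1].mean() - slopes[group == 0].mean()
se = np.sqrt(slopes[group == 1].var(ddof=1) / 4 + slopes[group == 0].var(ddof=1) / 4)
print(diff, se)
```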
If there are several nonspecific factors and their main effects and interactions are of
intrinsic interest, the detailed study of components of variance is involved. Particular
applications include biometrical genetics, the study of sources of variability in industrial
processes and the evaluation of interlaboratory trials. This is a special (and extremely
interesting) topic which, with regret, is omitted from the present paper.

8 Generalizations
Most of the above discussion generalizes immediately to any analysis in which a
parameter describing the distribution of a response variable is related to explanatory
variables via a linear expression. The generalized linear models of Nelder & Wedderburn
(1972) embrace many of the commonly occurring cases. The more indirect the relation
between the original data and the linear representation, the less the interpretative value of
the absence of interaction.
For such generalizations, the method of analysis will usually be either maximum
likelihood or a preliminary transformation followed by least squares with empirical
weights. The main technical difficulty concerns systems with several layers of error. For
these the direct use of weighted least squares will often be simpler, although no generally
useful methods are available at the time of writing.
The definition of interaction within exponential family models involves a linear rep-
resentation of some function of the defining parameter. Thus, for binary data the defining
parameter is the probability of 'success', in this case the moment parameter of the
exponential family. The canonical parameter is the logistic transform. In this case the
canonical parameter has unrestricted range. For exponential distributions, the canonical
parameter is the reciprocal of the mean; the logarithm is the simplest function of the
moment parameter (or of the canonical parameter) having unrestricted range. Interaction
can thus be defined in various ways, in particular relative to linear representations for the
moment parameter, for the canonical parameter, or for the simplest unconstrained
parameter. Especially when the interaction between a treatment and an intrinsic factor is
involved, use of the unconstrained parameter will often be the most appropriate, since
such representations have the greatest scope for extrapolation. Representations in terms
of the canonical parameter lead to the simplest statistical inference, and those in terms of
the moment parameter have the most direct physical interpretation. Of course, often there
will not be enough data to show empirically that one version is to be preferred to another;
with extensive data there is the possibility of finding that function, if any, of the defining
parameter leading to absence of interaction but often familiarity and ease of interpreta-
tion will restrict the acceptable representations.
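The scale dependence of 'no interaction' is easily illustrated numerically: the sketch below constructs a 2 × 2 table of success probabilities that is exactly additive on the logistic (canonical) scale and evaluates the tetrad interaction contrast on both the canonical and the moment (probability) scales. The numerical coefficients are arbitrary.

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

# Success probabilities additive on the logistic scale: logit p = -1 + 1.0*x1 + 1.5*x2
x1 = np.array([[0, 0], [1, 1]])
x2 = np.array([[0, 1], [0, 1]])
eta = -1.0 + 1.0 * x1 + 1.5 * x2
p = 1 / (1 + np.exp(-eta))

# Interaction contrast (tetrad difference) on each scale
def tetrad(m):
    return m[1, 1] - m[1, 0] - m[0, 1] + m[0, 0]

print(tetrad(logit(p)))   # zero (up to rounding): no interaction for the canonical parameter
print(tetrad(p))          # nonzero: interaction for the moment parameter
```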
Darroch (1974) has discussed the relative advantages of so-called additive and multi-
plicative definitions of interaction for nominal variables. His account in terms of associa-
tion between multidimensional responses is easily adapted to the present context of a
single response and several explanatory variables.
Throughout the paper only univariate response variables have been considered. The
formal procedures of multivariate analysis of variance are available for examining interac-
tions affecting several response variables simultaneously. While such procedures may be
used explicitly only relatively rarely, any demonstration that a particular type of simple
representation applies to several response variables simultaneously makes such a rep-
resentation more convincing.

Acknowledgements
This paper contains the union of invited papers given at the European Meeting of Statisticians, Wroclaw, the
York Conference of the Royal Statistical Society, the Gordon Research Conference, and the Highland Group,
Royal Statistical Society. I am very grateful for various interesting points made in discussion and thank also E.J.
Snell and O. Barndorff-Nielsen for helpful and encouraging comments on a preliminary written version.

Appendix
Some open technical issues
The discussion in the main part of the paper has concentrated on general issues and
deliberately put rather little emphasis on details of technique. Nevertheless there are
unsolved technical matters involved and there follows a list of a few problems. They are
open so far as I am aware, although any reader thinking of working on any of them is
strongly advised to look through the recent (and not so recent) literature not too long
after starting.
Problem 1 [§ 2]. Given random samples from m populations, test the hypothesis that
there exists a translation-inducing transformation. Estimate the transformation and as-
sociated differences.
Problem 2 [§ 2]. Calculate the efficiency of your procedure for 1 as compared with
some simple fully parametric formulation.
Problem 3 [§§ 2, 5]. Extend the discussion of 1 and 2 to the examination of inter-
actions.
Problem 4 [§ 3]. Find some interesting special cases satisfying Scheffé's condition for
transformation to additivity.
Problem 5 [§ 4]. Extend the notion of order-based interactions to higher order inter-
actions.
Problem 6 [§ 4]. Develop suitable designs and methods of analysis, and make practical
recommendations concerning the use of p-period, t-treatment crossover designs (i) for
p = t, (ii) for p > t, (iii) for p < t, when the treatments are (a) unstructured, (b) factorial
with quantitative levels, (c) factorial with qualitative levels and when interest focuses
primarily on (I) direct treatment effects with some possibility of examining residual effects,
(II) estimating and testing residual effects, (III) estimating 'total' treatment effects.
Problem 7 [§§ 4, 5]. What is the role of AIC, C_p and so on in the study of interactions,
especially from unbalanced data?
Problem 8 [§§ 4, 5]. What is the role of rank invariant procedures in the study of
interactions?
Problem 9 [§ 5]. What is the effect of nonnormality on the plot of ordered mean
squares?
Problem 10 [§ 5]. Are there useful significance tests associated with plots of ordered
mean squares?
Problem 11 [§ 5]. Examine in some generality the effect of dependence on probability
plotting methods.
Problem 12 [§ 5]. In the two-way arrangement with nominal factors, is there a useful
set of primitive contrasts less redundant than the full set of tetrad differences?

Problem 13 [§ 5]. Develop a systematic theory of testing for treatment × column in-
teraction in an m × m Latin square when both, one or none of the defining contrasts are
known a priori. What is the efficiency of a randomized design for detecting interaction:
what is the lowest efficiency that could be encountered? What are the implications, if any,
for design?
Problem 14 [§ 5]. Develop and implement an IKBS (intelligent knowledge-based system)
that would deal with the 2ⁿ example of § 5 and similar but much larger problems.
Problem 15 [§ 6]. Suppose that Y_j (j = 1, ..., n) are independently normally distributed
with E(Y_j) = μ_j and unit variance. Initially interest lies in the null hypothesis
μ_1 = ... = μ_n = 0 but overwhelming evidence against this is found. Two alternative explanations are
contemplated:
(a) μ_j = 0 (j ≠ τ), μ_τ ≠ 0, τ being unknown;
(b) μ_1, ..., μ_n are independently normally distributed with zero mean and unknown
variance σ².

Develop procedures for examining which, if either, of (a) and (b) is satisfactory and which is
preferable.
Problem 16 [§ 6]. Extend the discussion of Problem 15 to the examination of inter-
actions.
Problem 17 [§ 7]. Develop a simple and elegant theory for finding confidence limits for
contrasts where error has to be estimated via synthesis of variance.
Problem 18 [§ 7]. Develop from first principles confidence limits for an upper variance
component from balanced and from unbalanced data.
Problem 19 [§ 7]. Examine critically the role of time series models for experiments in
which time is a nonspecific factor.
Problem 20 [§ 8]. Give a general discussion for analysing interactions in log linear
models for Poisson and binomial data in the presence: (a) of overdispersion, (b) of
underdispersion.

References
Armitage, P. & Hills, M. (1982). The two period crossover trial. Statistician 31, 119-131.
Atkinson, A.C. (1982). Regression diagnostics, transformations and constructed variables (with discussion). J. R.
Statist. Soc. B 44, 1-36.
Azzalini, A. & Cox, D.R. (1984). Two new tests associated with analysis of variance. J. R. Statist. Soc. B 46. To
appear.
Bloomfield, P. (1974). Linear transformations for multivariate binary data. Biometrics 30, 609-617.
Box, G.E.P. & Cox, D.R. (1964). An analysis of transformations (with discussion). J. R. Statist. Soc. B 26,
211-252.
Bradu, D. & Hawkins, D.M. (1982). Location of multiple outliers in two-way tables, using tetrads. Technometrics
24, 103-108.
Brown, M.B. (1975). Exploring interaction effects in the analysis of variance. Appl. Statist. 24, 288-298.
Brown, B.W. (1980). The crossover experiment for clinical trials. Biometrics 36, 69-79.
Cochran, W.G. & Cox, G.M. (1957). Experimental Designs, 2nd edition. New York: Wiley.
Cook, R.D. & Weisberg, S. (1982). Residuals and Influence in Regression. London: Chapman and Hall.
Cox, D.R. (1972). The analysis of multivariate binary data. Appl. Statist. 21, 113-120.
Cox, D.R. (1974). Discussion of paper by M. Stone. J. R. Statist. Soc. B 36, 140-141.
Cox, D.R. (1977). Discussion of paper by J.A. Nelder. J. R. Statist. Soc. A 140, 71-72.
Cox, D.R. & Snell, E.J. (1981). Applied Statistics. London: Chapman and Hall.
Daniel, C. (1959). Use of half-normal plots in interpreting factorial two-level experiments. Technometrics 1,
311-341.
Daniel, C. (1976). Applications of Statistics to Industrial Experimentation. New York: Wiley.
Daniel, C. & Wood, F.S. (1971). Fitting Equations to Data. New York: Wiley.
Darroch, J.N. (1974). Multiplicative and additive interaction in contingency tables. Biometrika 61, 207-214.
Draper, N.R. & Hunter, W.G. (1969). Transformations: Some examples revisited. Technometrics 11, 23-40.
Doksum, K.A. & Sievers, G.L. (1976). Plotting with confidence: graphical comparisons of two populations.
Biometrika 63, 421-434.
Fedorov, V.D., Maximov, V.N. & Bogorov, V.G. (1968). Experimental development of nutritive media for
micro-organisms. Biometrika 55, 43-51.
Fisher, R.A. & Mackenzie, W.A. (1923). Studies in crop variation. II. The manurial response of different potato
varieties. J. Agric. Sci. 13, 311-320.
Freeman, G.H. (1973). Statistical methods for the analysis of genotype-environment interactions. Heredity 31,
339-354.
Gentleman, J.F. (1980). Finding the k most likely outliers in two-way tables. Technometrics 22, 591-600.
Gentleman, J.F. & Wilk, M.B. (1975). Detecting outliers in a two-way table. I. Statistical behaviour of residuals.
Technometrics 17, 1-14.
Hills, M. & Armitage, P. (1979). The two-period crossover clinical trial. Brit. J. Clinical Pharmacol. 8, 7-20.
Holland, P.W. (1973). Covariance stabilizing transformation. Ann. Statist. 1, 84-92.
Johnson, N.L. & Leone, F. (1964). Statistics and Experimental Design, 2. New York: Wiley.
Krishnaiah, P.R. & Yochmowitz, M.G. (1980). Inference on the structure of interaction in two-way classification
models. Handbook of Statist. 1, 973-994.
Lindley, D.V. (1972). A Bayesian solution for two-way analysis of variance. Colloq. Math. Soc. János Bolyai 9,
475-496.
Mandel, J. (1971). A new analysis of variance for non-additive data. Technometrics 13, 1-18.
Nelder, J.A. (1977). A representation of linear models (with discussion). J. R. Statist. Soc. A 140, 48-76.
Nelder, J.A. & Wedderburn, R.W.M. (1972). Generalized linear models. J. R. Statist. Soc. A 135, 370-384.
Pearson, E.S. & Hartley, H.O. (1972). Biometrika Tables for Statisticians, 2. Cambridge University Press.
Rao, C.R. (1973). Linear Statistical Inference and its Applications, 2nd edition. New York: Wiley.
Scheffé, H. (1959). The Analysis of Variance. New York: Wiley.
Snee, R.D. (1982). Nonadditivity in a two-way classification: is it interaction or nonhomogeneous variance? J.
Am. Statist. Assoc. 77, 515-519.
Song, C.C. (1982). Covariance stabilizing transformations and a conjecture of Holland. Ann. Statist. 10,
313-315.
Tukey, J.W. (1949). One degree of freedom for non-additivity. Biometrics 5, 232-242.
Wilk, M.B. & Gnanadesikan, R. (1968). Probability plotting methods for the analysis of data. Biometrika 55,
1-17.
Wishart, J. (1938). Growth-rate determinations in nutrition studies with the bacon pig, and their analysis.
Biometrika 30, 16-28.
Yates, F. (1972). A Monte-Carlo trial on the behaviour of the nonadditivity test with nonnormal data.
Biometrika 59, 253-261.
Yates, F. & Cochran, W.G. (1938). The analysis of groups of experiments. J. Agric. Sci. 28, 556-580.

Résumé
The role of interaction in statistics is discussed. Interactions are classified according to the nature of the factors
involved. Tests for interaction are reviewed. Methods of interpreting an interaction are discussed and the
connexion with error estimation is described. Finally, unsolved problems are catalogued.

[Paper received January 1983, revised May 1983]

Discussion of paper by D.R. Cox


A.C. Atkinson
Department of Mathematics, Imperial College, London SW7 2BZ, UK
Professor Cox has provided a cyclopaedic survey. My few comments are mostly
complementary to his wide-ranging paper.
1. In balanced designs, which are the chief concern of the paper, interactions may be
caused by the failure of one or a few units. For example, in an experiment in agriculture
where all factors are individually favourable, it may be that the combination of all factors
at a high level produces conditions under which the plants cannot grow. The simple
additive structure for treatment effects may be salvaged by excluding part of the factor
space.
