Lecture 19: Interactions

Let
m(x) = E[Y |X = x]
where x = (x1 , . . . , xp ). We say that there is no interaction between Xj and Xk if

∂m(x)/∂xj

does not depend on xk.
Consider the linear model
m(x) = β0 + β1 x1 + β2 x2 .
Then ∂m(x)/∂x1 = β1 and ∂m(x)/∂x2 = β2. There are no interactions.
Now suppose that
m(x) = β0 + β1 x1 + β2 x2 + β3 x1 x2 .
Then ∂m(x)/∂x1 = β1 + β3 x2 and ∂m(x)/∂x2 = β2 + β3 x1. So we say there is an interaction between x1 and x2.
If your model does not fit well, then adding interactions is yet another way to improve the fit
of the model. You could plot the residuals versus X1 X2 to see whether an interaction is needed, or just add the interaction to the model.
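
As a minimal sketch (assuming numeric vectors y, x1, x2 are already in the workspace), that residual check might look like:

# Fit the additive model, then look for a trend in the residuals against
# the product x1*x2; a clear pattern suggests adding the interaction term.
fit <- lm(y ~ x1 + x2)
plot(x1 * x2, residuals(fit), xlab = "x1 * x2", ylab = "Residuals")
abline(h = 0, lty = 2)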

1 The Conventional Form of Interactions in Linear Models


The usual way of including interactions in a linear model is to add a product term, as, e.g.,

Y = β0 + β1 X1 + β2 X2 + β3 X1 X2 + ε. (1)

Once we add such a term, we estimate β3 in exactly the same way we’d estimate any other coefficient.
People often call β1 and β2 the main effects and they call β3 the interaction effect. This is not
the greatest terminology but it is pretty standard. Usually people don’t add interactions into a
model without adding the main effects. So it's rare to see a model of the form Y = β0 + β3 X1 X2 + ε.
Adding in the main effects gives a model with more flexibility and generality.
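
As a quick sketch (df is a hypothetical data frame with columns y, x1, x2), the product term really is just another column of the design matrix, so the two calls below give the same fit; the * shorthand is explained in Section 4.

# The interaction is one more regressor, estimated by least squares
# like any other coefficient.
fit_a <- lm(y ~ x1 + x2 + I(x1 * x2), data = df)
fit_b <- lm(y ~ x1 * x2, data = df)   # same model, written with *
coef(fit_a)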

2 Interaction of Categorical and Numerical Variables


If we multiply the indicator variable for a binary category, say XB , with an ordinary numerical
variable, say X1 , we get a different slope on X1 for each category:

Y = β0 + β1 X1 + β1B XB X1 + ε. (2)

When XB = 0, the slope on X1 is β1 , but when XB = 1, the slope on X1 is β1 + β1B ; the coefficient
for the interaction is the difference in slopes between the two categories.
In fact, look closely at Eq. 2. It says that the categories share a common intercept, but their
regression lines are not parallel (unless β1B = 0). We could expand the model by letting each
category have its own slope and its own intercept:

Y = β0 + βB XB + β1 X1 + β1B XB X1 + ε.

This model is similar to running two separate regressions, one per category. It does, however, insist
on having a single noise variance σ2 (which running two separate regressions would not impose). Also, if
there were additional predictors in the model which were not interacted with the category, e.g.,

Y = β0 + βB XB + β1 X1 + β1B XB X1 + β2 X2 + ε

then this would definitely not be the same as running two separate regressions. We can also add
categorical variables and interactions with categorical variables. Just remember that a categorical
variable with k levels requires adding only k − 1 indicator variables.
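
As a sketch (assuming a hypothetical data frame df with a numeric column x1 and a factor column g with k levels), R builds the k − 1 indicators and the per-level slopes for you:

# R creates k - 1 indicator columns for g, and g:x1 adds k - 1 more
# columns, giving each level its own intercept and its own slope on x1.
fit <- lm(y ~ g * x1, data = df)
head(model.matrix(fit))   # inspect the indicator and interaction columns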

2.1 Interactions of Categorical Variables with Each Other


Suppose we have two binary categorical variables, with corresponding indicator variables XB and
XC . If we fit a model of the form

Y = β0 + β1 XB + β2 XC + β3 XB XC + ε

then we can make the following identifications:

E [Y |XB = 0, XC = 0] = β0 (3)
E [Y |XB = 1, XC = 0] = β0 + β1 (4)
E [Y |XB = 0, XC = 1] = β0 + β2 (5)
E [Y |XB = 1, XC = 1] = β0 + β1 + β2 + β3 (6)

Conversely, these give us four equations in four unknowns, so if we know the group or conditional
means on the left-hand sides, we could solve these equations for the β’s.
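
For instance, here is a small sketch with made-up group means showing how Equations (3)-(6) are inverted:

# mu00 = E[Y | XB = 0, XC = 0], mu10 = E[Y | XB = 1, XC = 0], and so on.
mu00 <- 2.0; mu10 <- 3.5; mu01 <- 2.8; mu11 <- 5.1
beta0 <- mu00                          # Eq. (3)
beta1 <- mu10 - mu00                   # Eq. (4) minus Eq. (3)
beta2 <- mu01 - mu00                   # Eq. (5) minus Eq. (3)
beta3 <- mu11 - mu10 - mu01 + mu00     # what is left over in Eq. (6)
c(beta0, beta1, beta2, beta3)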

3 Higher-Order Interactions
Nothing stops us from considering interactions among three or more variables, rather than just
two. For example

Y = β0 + β1 X1 + β2 X2 + β3 X3 + β4 X1 X2 + β5 X1 X3 + β6 X2 X3 + β7 X1 X2 X3 + ε.

As you can see, these models get complicated very quickly. Also, we have to ask ourselves: which
interactions should I add? For example, I could have added X1²X2 into the model as well as other
terms. We are now entering the realm of model-building and model-selection that we will discuss
in a future lecture. For now, we will try to keep our models fairly simple.
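
In R's formula notation (covered in the next section), the eight-term model above can be written compactly; this is a sketch with a hypothetical data frame df:

# x1*x2*x3 expands to all main effects, all two-way interactions,
# and the three-way interaction.
fit <- lm(y ~ x1 * x2 * x3, data = df)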

4 Interactions in R
The lm function is set up to comprehend multiplicative or product interactions in model formulas.
Pure product interactions are denoted by :, so the formula

lm(y ~ x1:x2)

tells R to fit the model Y = β0 + β X1 X2 + ε. (Intercepts are included by default in R.) Since it is
relatively rare to include just a product term without linear terms, it’s more common to use the
symbol *, which expands out to both sets of terms. That is,

lm(y ~ x1*x2)

fits the model


Y = β0 + β1 X1 + β2 X2 + β3 X1 X2 + ε.
This special use of * in formulas over-rides its ordinary sense of multiplication; if you wanted to
specify a regression on, say 1000X2 , you’d have to write I(1000*x2) rather than 1000*x2. Note
that x1:x1 is the same as x1; if you want higher powers of a variable, use I(x1^2) or poly(x1,2).
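
For example (a sketch, with y and x1 as placeholders), these two calls fit the same quadratic in x1; note that poly() uses an orthogonal polynomial basis unless you ask for the raw powers:

lm(y ~ x1 + I(x1^2))
lm(y ~ poly(x1, 2, raw = TRUE))   # same raw-power basis; poly(x1, 2) would
                                  # use an orthogonal basis instead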
The : symbol will apply to combinations of variables. Thus

(x1+x2):(x3+x4)

is the same as

x1:x3 + x1:x4 + x2:x3 + x2:x4

Also,

(x1+x2)*(x3+x4)

is the same as

x1 + x2 + x3 + x4 + x1:x3 + x1:x4 + x2:x3 + x2:x4

The reason you can’t just write x1^2 in your model formula is that the power operator also has
a special meaning in formulas, of repeatedly *-ing its argument with itself. That is,

(x1+x2+x3)^2

is the same as

(x1+x2+x3)*(x1+x2+x3)

which is

x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3
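
A quick way to check these expansions yourself is to look at the columns of the design matrix R builds (a sketch, with placeholder variables x1 through x4 in a data frame df):

# The column names of the model matrix show how a formula expands.
colnames(model.matrix(~ (x1 + x2):(x3 + x4), data = df))
colnames(model.matrix(~ (x1 + x2 + x3)^2, data = df))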

poly and interactions. If you want to use poly to do polynomial regression and interactions,
do this:

lm(y ~ poly(x1,x2,degree=2))

which fits the model

Y = β0 + β1 X1 + β2 X1² + β3 X2 + β4 X2² + β5 X1 X2 + ε.
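
Equivalently (a sketch, with df as a placeholder data frame), the same surface can be written out term by term with raw powers; poly() uses an orthogonal basis by default, so its coefficients differ, but the fitted values are identical:

# Same quadratic-with-interaction surface, written out explicitly.
lm(y ~ x1 + I(x1^2) + x2 + I(x2^2) + x1:x2, data = df)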

4.1 Example
Let’s continue with the mobility data. First, here is a useful trick:

x = c("a","b","c","d","e","f")
y = c("a","b")
z = x %in% y
print(z)
## [1] TRUE TRUE FALSE FALSE FALSE FALSE

The command %in% is a matching operator: z[i] is TRUE exactly when x[i] appears somewhere in y.
Let’s use this to create a binary variable indicating whether a state was or was not part of the
Confederacy in the Civil War.

Confederacy = c("AR","AL","FL","GA","LA","MS","NC","SC","TN","TX","VA")
mobility$Dixie = mobility$State %in% Confederacy
out = lm(Mobility ~ Commute*Dixie,data=mobility)
summary(out)

##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)         0.01880    0.00683  2.7600 5.95e-03
## Commute             0.19500    0.01340 14.5000 2.93e-42
## DixieTRUE          -0.02120    0.01190 -1.7700 7.64e-02
## Commute:DixieTRUE  -0.00131    0.02830 -0.0461 9.63e-01

The coefficient for the interaction is negative, suggesting that increasing the fraction of workers
with short commutes predicts a smaller difference in rates of mobility in the South than it does in the
rest of the country. This coefficient is not significantly different from zero, but, more importantly,
we can be confident it is small, compared to the base-line value of the slope on Commute:

confint(out)
## 2.5 % 97.5 %
## (Intercept) 0.00543 0.03220
## Commute 0.16900 0.22200
## DixieTRUE -0.04470 0.00225
## Commute:DixieTRUE -0.05680 0.05420

Thus, even if the South does have a different slope than the rest of the country, it is not a very
different slope.
The difference in the intercept, however, is more substantial. It, too, is not significant at the
5% level, but that is because (as we see from the confidence interval) it might be quite large and
negative or perhaps just barely positive — it’s not so precisely measured, but it’s either lowering the
expected rate of mobility or adding to it trivially. Of course, we should really do all our diagnostics
here before paying much attention to these inferential statistics.
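
A minimal sketch of those diagnostics, using R's built-in plots for lm objects:

plot(out, which = 1)   # residuals vs. fitted values
plot(out, which = 2)   # normal Q-Q plot of the residuals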
