Lecture 19: Interactions
Let
m(x) = E[Y | X = x]
where x = (x1, ..., xp). We say that there is no interaction between Xj and Xk if
∂m(x)/∂xj
does not depend on xk.
Consider the linear model
m(x) = β0 + β1 x1 + β2 x2 .
Then ∂m(x)/∂x1 = β1 and ∂m(x)/∂x2 = β2. There are no interactions.
Now suppose that
m(x) = β0 + β1 x1 + β2 x2 + β3 x1 x2 .
Then ∂m(x)/∂x1 = β1 + β3 x2 and ∂m(x)/∂x2 = β2 + β3 x1. So we say there is an interaction between x1 and x2.
If your model does not fit well, then adding interactions is yet another way to improve the fit
of the model. You could plot the residuals versus X1 X2, or just add the interaction to the model:
Y = β0 + β1 X1 + β2 X2 + β3 X1 X2 + ε. (1)
Once we add such a term, we estimate β3 in exactly the same way we’d estimate any other coefficient.
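Here is a quick sketch of both options (the data frame df and the variables y, x1, x2 are hypothetical placeholders):
fit.add = lm(y ~ x1 + x2, data = df)           # additive model, no interaction
plot(df$x1 * df$x2, residuals(fit.add),
     xlab = "x1 * x2", ylab = "residuals")      # a systematic trend here suggests a missing interaction
fit.int = lm(y ~ x1 + x2 + x1:x2, data = df)    # model (1), with the interaction term
summary(fit.int)                                # beta3 is the coefficient on x1:x2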
People often call β1 and β2 the main effects and they call β3 the interaction effect. This is not
the greatest terminology but it is pretty standard. Usually people don’t add interactions into a
model without adding the main effects. So it’s rare to see a model of the form Y = β0 + β3 X1 X2 + ε.
Adding in the main effects gives a model with more flexibility and generality.
2 Interactions With Categorical Variables
Now suppose that XB is a binary (0/1) variable and consider the model
Y = β0 + β1 X1 + β1B XB X1 + ε. (2)
When XB = 0, the slope on X1 is β1 , but when XB = 1, the slope on X1 is β1 + β1B ; the coefficient
for the interaction is the difference in slopes between the two categories.
In fact, look closely at Eq. 2. It says that the categories share a common intercept, but their
regression lines are not parallel (unless β1B = 0). We could expand the model by letting each
category have its own slope and its own intercept:
Y = β0 + βB XB + β1 X1 + β1B XB X1 + ε.
This model is similar to running two separate regressions, one per category. It does, however, insist
on a single noise variance σ^2, which separate regressions would not impose (a short sketch of this
comparison appears below). Also, if there were additional predictors in the model which were not
interacted with the category, e.g.,
Y = β0 + βB XB + β1 X1 + β1B XB X1 + β2 X2 + ε,
then this would definitely not be the same as running two separate regressions. We can also add
categorical variables and interactions with categorical variables. Just remember that a categorical
variable with k levels requires adding only k − 1 indicator variables.
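Here is a minimal sketch of that comparison between one interacted model and two per-group fits (df, y, x1, and the 0/1 variable xb are hypothetical):
fit.both = lm(y ~ x1 + xb + x1:xb, data = df)    # one model, one common sigma^2
fit.0 = lm(y ~ x1, data = subset(df, xb == 0))   # separate fit for the xb = 0 group
fit.1 = lm(y ~ x1, data = subset(df, xb == 1))   # separate fit for the xb = 1 group
coefficients(fit.both)   # intercept and slope match fit.0; adding the xb terms recovers fit.1
The point estimates agree, but the single model pools the residuals to estimate one common σ^2, so the reported standard errors differ.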
For example, with two binary variables XB and XC, consider the model
Y = β0 + β1 XB + β2 XC + β3 XB XC + ε,
so that the four conditional (cell) means are
E [Y |XB = 0, XC = 0] = β0 (3)
E [Y |XB = 1, XC = 0] = β0 + β1 (4)
E [Y |XB = 0, XC = 1] = β0 + β2 (5)
E [Y |XB = 1, XC = 1] = β0 + β1 + β2 + β3 (6)
Conversely, these give us four equations in four unknowns, so if we know the group or conditional
means on the left-hand sides, we could solve these equations for the β’s.
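Explicitly, subtracting Eq. 3 from Eqs. 4 and 5, and then combining all four equations, gives
β0 = E[Y | XB = 0, XC = 0]
β1 = E[Y | XB = 1, XC = 0] − E[Y | XB = 0, XC = 0]
β2 = E[Y | XB = 0, XC = 1] − E[Y | XB = 0, XC = 0]
β3 = E[Y | XB = 1, XC = 1] − E[Y | XB = 1, XC = 0] − E[Y | XB = 0, XC = 1] + E[Y | XB = 0, XC = 0]
so the interaction coefficient β3 is the “difference in differences” between the cell means.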
3 Higher-Order Interactions
Nothing stops us from considering interactions among three or more variables, rather than just
two. For example,
Y = β0 + β1 X1 + β2 X2 + β3 X3 + β4 X1 X2 + β5 X1 X3 + β6 X2 X3 + β7 X1 X2 X3 + ε.
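In R’s formula notation (covered in the next section), this full three-way model can be written compactly, since x1*x2*x3 expands to all main effects, all two-way products, and the three-way product:
lm(y ~ x1*x2*x3)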
As you can see, these models get complicated very quickly. Also, we have to ask ourselves: which
interactions should I add? For example, I could have added X1^2 X2 into the model as well as other
terms. We are now entering the realm of model-building and model-selection that we will discuss
in a future lecture. For now, we will try to keep our models fairly simple.
4 Interactions in R
The lm function is set up to comprehend multiplicative or product interactions in model formulas.
Pure product interactions are denoted by :, so the formula
lm(y ~ x1:x2)
tells R to fit the model Y = β0 + β X1 X2 + ε. (Intercepts are included by default in R.) Since it is
relatively rare to include just a product term without linear terms, it’s more common to use the
symbol *, which expands out to both sets of terms. That is,
lm(y ~ x1*x2)
is the same as
lm(y ~ x1 + x2 + x1:x2)
Both : and * distribute over + in formulas, so
(x1+x2):(x3+x4)
is the same as
x1:x3 + x1:x4 + x2:x3 + x2:x4
Also,
(x1+x2)*(x3+x4)
is the same as
x1 + x2 + x3 + x4 + x1:x3 + x1:x4 + x2:x3 + x2:x4
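As a small check that two such formulas describe the same model (df, y, x1, x2 hypothetical):
fit.star = lm(y ~ x1*x2, data = df)
fit.plus = lm(y ~ x1 + x2 + x1:x2, data = df)
all.equal(coefficients(fit.star), coefficients(fit.plus))   # TRUE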
The reason you can’t just write x1^2 in your model formula is that the power operator also has
a special meaning in formulas, of repeatedly *-ing its argument with itself. That is,
(x1+x2+x3)^2
is the same as
(x1+x2+x3)*(x1+x2+x3)
which is
x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3
since a product like x1:x1 collapses to just x1 (which is also why x1^2 on its own never gives you a quadratic term).
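You can verify expansions like these by inspecting the design matrix R builds from a formula; for numeric predictors x1, x2, x3 in a hypothetical data frame df,
colnames(model.matrix(~ (x1+x2+x3)^2, data = df))
## "(Intercept)" "x1" "x2" "x3" "x1:x2" "x1:x3" "x2:x3"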
poly and interactions. If you want to use poly to do polynomial regression and interactions,
do this:
lm(y ~ poly(x1,x2,degree=2))
This fits the model
Y = β0 + β1 X1 + β2 X1^2 + β3 X2 + β4 X2^2 + β5 X1 X2 + ε.
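One caveat, easy to check in your own session: by default poly() builds orthogonal polynomials, so the printed coefficients are not the raw β’s in the display above, even though the fitted values are identical; setting raw = TRUE gives the raw-power parameterization:
lm(y ~ poly(x1, x2, degree = 2, raw = TRUE))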
4.1 Example
Let’s continue with the mobility data. First, here is a useful trick:
x = c("a","b","c","d","e","f")
y = c("a","b")
z = x %in% y
print(z)
## [1] TRUE TRUE FALSE FALSE FALSE FALSE
The command
%in%
is a matching operator.
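For the record, x %in% y is just a convenient wrapper around match(); the line below is how base R defines it:
match(x, y, nomatch = 0) > 0   # same logical vector as x %in% y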
Let’s use this to create a binary variable indicating whether a state was or was not part of the
Confederacy in the Civil War.
Confederacy = c("AR","AL","FL","GA","LA","MS","NC","SC","TN","TX","VA")
mobility$Dixie = mobility$State %in% Confederacy
out = lm(Mobility ~ Commute*Dixie,data=mobility)
summary(out)
The coefficient for the interaction is negative, suggesting that increasing the fraction of workers
with short commutes predicts a smaller increase in the rate of mobility in the South than it does in the
rest of the country. This coefficient is not significantly different from zero, but, more importantly,
we can be confident it is small compared to the baseline value of the slope on Commute:
confint(out)
## 2.5 % 97.5 %
## (Intercept) 0.00543 0.03220
## Commute 0.16900 0.22200
## DixieTRUE -0.04470 0.00225
## Commute:DixieTRUE -0.05680 0.05420
Thus, even if the South does have a different slope than the rest of the country, it is not a very
different slope.
The difference in the intercept, however, is more substantial. It, too, is not significant at the
5% level, but that is because (as we see from the confidence interval) it might be quite large and
negative or perhaps just barely positive — it’s not so precisely measured, but it’s either lowering the
expected rate of mobility or adding to it trivially. Of course, we should really do all our diagnostics
here before paying much attention to these inferential statistics.
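Finally, here is a sketch of how we might plot the two fitted lines over the data, using the coefficient names from the output above:
plot(Mobility ~ Commute, data = mobility, pch = 19,
     col = ifelse(mobility$Dixie, "blue", "black"))
cf = coefficients(out)
abline(a = cf["(Intercept)"], b = cf["Commute"])                   # rest of the country
abline(a = cf["(Intercept)"] + cf["DixieTRUE"],
       b = cf["Commute"] + cf["Commute:DixieTRUE"], col = "blue")  # former Confederacy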