Handout 5 Nonlinear and Interaction

The document discusses nonlinear regression specifications including polynomials, logarithms, and interaction terms. Polynomials can fit nonlinear relationships by including additional transformed variables like income squared. Logarithms can be used to model proportional relationships, with log-linear modeling percentage changes. Interaction terms allow effects to depend on other variables. Proper specification and interpretation of these nonlinear models is covered.

Uploaded by

mkevane

ECON 41

Outline #5: Nonlinear specifications and interaction terms

6.1. Nonlinear regression specifications


• In many cases a simple linear relationship may not adequately capture the relationship in the data.
• Fortunately, the OLS regression can be modified in various ways to handle this.
• Three kinds of non-linear relationships
o Polynomial
o Logarithmic
o Interaction terms

6.2. Quadratics and other polynomials


• Consider the relationship between average test score and average district income (in $thousands): positively
associated, but not linear!
• Using “linear” regression we can fit a curve by using additional variables or transformed variables.
• Example: quadratic
\( \text{TestScore}_i = \beta_0 + \beta_1 \text{Income}_i + \beta_2 \text{Income}_i^2 + u_i \)

• To run this in R (or other software): create a new variable for income squared and add it as a regressor. Or,
one can create squared variables within the regression using I(avginc^2) as a regressor.
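• Is the quadratic better than linear? Test whether the coefficient on income squared is zero.
• Interpreting the results: In the quadratic specification, the effect of district income on test scores is now
captured by two coefficients, and the effect depends on the value of income itself. With coefficient
\( \beta_1 \) on income and \( \beta_2 \) on income squared, the slope at a given income level is
approximately \( \beta_1 + 2 \beta_2 \text{Income} \).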
• Higher-order polynomials: how to decide?
o Rule of thumb: choose a maximum order and run the regression; if the highest-order term is not
significant, drop it and try again
o Overfitting: use caution, because it is possible to include “too many” polynomial terms
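
The quadratic steps above can be sketched in R as follows. This is only an illustration on simulated data: the numbers are made up, and the names districts, testscr, fit_lin and fit_quad are assumptions rather than names from the course dataset (avginc follows the handout).

```r
# Simulated data standing in for the district data (all values are made up).
set.seed(41)
n <- 400
avginc  <- runif(n, min = 5, max = 55)              # income in $thousands
testscr <- 607 + 3.9 * avginc - 0.04 * avginc^2 + rnorm(n, sd = 9)
districts <- data.frame(testscr, avginc)

# Linear and quadratic specifications; I() protects ^ inside a formula.
fit_lin  <- lm(testscr ~ avginc, data = districts)
fit_quad <- lm(testscr ~ avginc + I(avginc^2), data = districts)

# t-test on the squared term: is its coefficient zero?
print(coef(summary(fit_quad))["I(avginc^2)", ])

# The effect of income depends on income itself: compare a $1,000
# increase starting from 10 with the same increase starting from 40.
pred <- predict(fit_quad, newdata = data.frame(avginc = c(10, 11, 40, 41)))
effect_at_10 <- unname(pred[2] - pred[1])
effect_at_40 <- unname(pred[4] - pred[3])
print(c(effect_at_10 = effect_at_10, effect_at_40 = effect_at_40))
```

Computing the effect as a difference between two predictions avoids having to combine the two coefficients by hand.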

6.3. Logarithms
• The natural log, ln(x) or log(x), is the inverse of (“undoes”) the exponential function exp(x) = e^x
• Regressions using the log: linear-log, log-linear, log-log
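o linear-log: \( Y = \beta_0 + \beta_1 \ln(X) + u \) (divide \( \beta_1 \) by 100 to interpret as the change in Y for a 1% change in X)
o log-linear: \( \ln(Y) = \beta_0 + \beta_1 X + u \) (multiply \( \beta_1 \) by 100 to interpret as the % change in Y for a one-unit change in X)
o log-log: \( \ln(Y) = \beta_0 + \beta_1 \ln(X) + u \) (\( \beta_1 \) in log-log is interpreted as the elasticity of Y with respect to X)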
• Interpretation: the change in the log is approximately the proportional change
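o Hence, for example, in the log-linear earnings equation
\( \ln(\text{earnings}) = \beta_0 + \beta_1 \text{education} + u \), one more year of education is
associated with approximately a \( 100 \times \beta_1 \) percent change in earnings.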

• To predict the level of Y in the log-linear or log-log specifications, we must take the anti-log (exponential).
So for the log-linear (e.g. to predict earnings in dollars given years of education):
\( \widehat{\text{earnings}} = \exp(\hat{\beta}_0 + \hat{\beta}_1 \text{education}) \)
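
A minimal R sketch of the three log specifications and of predicting the level of Y is given below. The data are simulated and every variable and object name (workers, earnings, educ, fit_linlog and so on) is an assumption made for illustration, not something taken from the course data.

```r
# Simulated earnings data (all values are made up for illustration).
set.seed(41)
n <- 500
educ     <- sample(8:20, n, replace = TRUE)               # years of education
earnings <- exp(1.5 + 0.09 * educ + rnorm(n, sd = 0.4))   # hourly earnings, $
workers  <- data.frame(earnings, educ)

# The three specifications; log() is the natural log in R.
fit_linlog <- lm(earnings ~ log(educ), data = workers)
fit_loglin <- lm(log(earnings) ~ educ, data = workers)
fit_loglog <- lm(log(earnings) ~ log(educ), data = workers)

b_linlog <- unname(coef(fit_linlog)["log(educ)"])
b_loglin <- unname(coef(fit_loglin)["educ"])
b_loglog <- unname(coef(fit_loglog)["log(educ)"])

cat("linear-log: 1% more education ->", b_linlog / 100, "dollar change\n")
cat("log-linear: 1 more year       ->", 100 * b_loglin, "% change (approx.)\n")
cat("log-log:    elasticity        =", b_loglog, "\n")

# Why the approximation works: for a small change, the change in the log
# is close to the proportional change (here 1%).
approx_gap <- log(1.01) - 0.01

# Predicting the level: predict() returns log earnings, so apply exp().
pred_log   <- predict(fit_loglin, newdata = data.frame(educ = 16))
pred_level <- unname(exp(pred_log))
print(pred_level)
```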

6.4. Interaction terms


• Interactions allow the effect of one variable to depend on another variable, and vice versa
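• Interaction between a binary and continuous variable
\( Y = \beta_0 + \beta_1 X + \beta_2 D + \beta_3 (X \times D) + u \)
o This specification allows the effect of education to be different for males and females:
\( \ln(\text{earnings}) = \beta_0 + \beta_1 \text{educ} + \beta_2 \text{female} + \beta_3 (\text{educ} \times \text{female}) + u \)
o Interpretation: write out the two versions of the equation for D = 0 and D = 1
D = 0: \( Y = \beta_0 + \beta_1 X + u \)
D = 1: \( Y = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X + u \)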
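• Interaction between two binary (dummy) variables
o Example: female might be one dummy variable, and whether the person has any children might be a
second dummy variable. The coefficient on their interaction measures how the earnings difference
associated with having children differs between women and men, so women with children might have
higher or lower earnings than the two separate effects imply, depending on the sign of that coefficient.
\( \ln(\text{earnings}) = \beta_0 + \beta_1 \text{female} + \beta_2 \text{children} + \beta_3 (\text{female} \times \text{children}) + u \)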

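• Interaction between two continuous variables
\( Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \times X_2) + u \)
o Example: str and el_pct. The effect of str on test scores is \( \beta_1 + \beta_3 \text{el\_pct} \), so it
depends on the value of el_pct.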


• Interactions in R
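
One way to write interactions in R is the formula operators `*` and `:`. The sketch below uses simulated data, and the names workers, lnearn, educ and female are assumptions for illustration. The same syntax applies to two continuous variables such as str and el_pct.

```r
# Simulated log-earnings data (all values are made up for illustration).
set.seed(41)
n <- 600
female <- rbinom(n, size = 1, prob = 0.5)       # D = 1 for women
educ   <- sample(8:20, n, replace = TRUE)
lnearn <- 1.4 + 0.08 * educ - 0.50 * female +
          0.02 * educ * female + rnorm(n, sd = 0.4)
workers <- data.frame(lnearn, educ, female)

# educ * female expands to educ + female + educ:female,
# so both variables enter by themselves as well as interacted.
fit_star  <- lm(lnearn ~ educ * female, data = workers)
fit_colon <- lm(lnearn ~ educ + female + educ:female, data = workers)
print(all.equal(coef(fit_star), coef(fit_colon)))   # compares the two fits

# Write out the two equations: D = 0 (men) and D = 1 (women).
b <- coef(fit_star)
slope_men   <- unname(b["educ"])
slope_women <- unname(b["educ"] + b["educ:female"])
print(c(slope_men = slope_men, slope_women = slope_women))
```

Using `*` rather than `:` alone is the safer habit, because `:` by itself adds only the product term.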

6.5. Some rules and tips for regressions with binary variables and interactions

(a) Using sets of binary variables to capture categorical effects


• Make sure you know which category or categories you have excluded: This is the reference group
• Examples from log earnings regressions using marital status
o If you just include a binary variable for divorced, the reference group is everyone who is not
divorced, which includes married people, never married people, widowed people, etc.
o If you include binary variables for divorced and married, the reference group is everyone who is
neither divorced nor married, which includes never married people, widowed people, etc.
o If you include dummies for all but one category, that left-out category becomes the reference
group for all the included marital status binary variables. You need to know what that reference
group is!
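
In R, a categorical variable stored as a factor generates the set of binary variables automatically, which makes it easy to lose track of the reference group. The sketch below uses simulated data with hypothetical names (marital, marital_f, lnearn) and shows one way to set the reference group explicitly.

```r
# Simulated marital-status data (all values are made up for illustration).
set.seed(41)
n <- 800
marital <- sample(c("never married", "married", "divorced", "widowed"),
                  n, replace = TRUE)
lnearn  <- 2.5 + 0.15 * (marital == "married") + rnorm(n, sd = 0.5)

# A factor yields one dummy per category except the first level,
# and that first level is the reference group.
marital_f <- factor(marital)
print(levels(marital_f))          # default ordering of the levels

# Choose the reference group explicitly rather than by accident.
marital_f <- relevel(marital_f, ref = "never married")
fit <- lm(lnearn ~ marital_f)
print(names(coef(fit)))           # no dummy appears for "never married"
```

Each reported coefficient is then a difference in log earnings relative to never married people.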

(b) Interaction terms


• When you include an interaction term, you MUST include the two variables by themselves:
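o THIS: \( Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \times X_2) + u \)
o NOT this: \( Y = \beta_0 + \beta_3 (X_1 \times X_2) + u \)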
• You can include interactions for some variables but not others: for example you could interact female with
education but not experience. But make sure you have both variables separately plus the interaction, as
noted.

(c) Quadratics and other polynomials


• If you include the square (such as exper^2), you should always include the linear term (exper) as well.
• If you have exper and exper^2 and want to interact experience with female, you need to include interaction
terms for BOTH exper and exper^2. Hence:
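o \( \ln(\text{earnings}) = \beta_0 + \beta_1 \text{exper} + \beta_2 \text{exper}^2 + \beta_3 \text{female} + \beta_4 (\text{female} \times \text{exper}) + \beta_5 (\text{female} \times \text{exper}^2) + u \)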
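
A short R sketch of this rule, assuming simulated data and hypothetical names (workers, lnearn): interacting female with a parenthesized group of terms produces both interaction terms in one step.

```r
# Simulated data (all values are made up for illustration).
set.seed(41)
n <- 600
female <- rbinom(n, size = 1, prob = 0.5)
exper  <- runif(n, min = 0, max = 40)
lnearn <- 2.0 + 0.05 * exper - 0.001 * exper^2 - 0.10 * female -
          0.01 * female * exper + 0.0001 * female * exper^2 +
          rnorm(n, sd = 0.4)
workers <- data.frame(lnearn, exper, female)

# female * (exper + I(exper^2)) expands to female, exper, I(exper^2),
# female:exper and female:I(exper^2).
fit <- lm(lnearn ~ female * (exper + I(exper^2)), data = workers)
print(names(coef(fit)))
```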

(d) Looking ahead to joint hypothesis tests: F-tests can be used to test whether a set of variables “matters”
• Example: You run a regression with these four binary variables: married, separated, divorced, and
widowed. The reference category is never married.

• A null hypothesis that none of the marital status variables has an effect on earnings, holding other variables
constant:
\( H_0: \beta_{married} = \beta_{separated} = \beta_{divorced} = \beta_{widowed} = 0 \)

• We shall see later that we can use lht(…) to run this test in R.
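
Until then, the idea of the joint test can be sketched with base R by comparing a restricted and an unrestricted regression using anova(). This is not the lht(…) call, and it gives the classical F-statistic rather than a heteroskedasticity-robust one. The data are simulated and all names are assumptions for illustration.

```r
# Simulated data (all values are made up for illustration).
set.seed(41)
n <- 800
marital <- factor(sample(c("never married", "married", "separated",
                           "divorced", "widowed"), n, replace = TRUE))
marital <- relevel(marital, ref = "never married")   # reference category
educ    <- sample(8:20, n, replace = TRUE)
lnearn  <- 1.5 + 0.08 * educ + 0.15 * (marital == "married") +
           rnorm(n, sd = 0.5)

unrestricted <- lm(lnearn ~ educ + marital)   # four marital-status dummies
restricted   <- lm(lnearn ~ educ)             # imposes H0: all four are zero

# F-test of the four restrictions (classical, homoskedasticity-only).
ftest <- anova(restricted, unrestricted)
print(ftest)
```

A small p-value in the second row of the table is evidence against the null that none of the marital status variables matters.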
