Handout 5 Nonlinear and Interaction
• To run this in R (or other software): create a new variable for income squared and add it as a regressor.
Alternatively, compute the square inside the regression formula by including I(avginc^2) as a regressor.
• Is the quadratic better than the linear specification? Test the null hypothesis that the coefficient on income squared is zero; rejecting it favors the quadratic.
• Interpreting the results: In the quadratic specification, the effect of district income on test scores is now
captured by two coefficients, and the effect depends on the value of income itself.
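Because the effect depends on income, it helps to compute the predicted change at two different income levels. The sketch below does that arithmetic in Python (the handout itself works in R). The coefficient values and the function names are illustrative assumptions, not estimates reported in this handout.

```python
# Illustrative coefficients for TestScore = b0 + b1*Income + b2*Income^2.
# These values are assumptions for the example, not results from the handout.
B0, B1, B2 = 607.3, 3.85, -0.0423


def predicted_score(income):
    """Predicted test score at a given district income (in $1000s)."""
    return B0 + B1 * income + B2 * income ** 2


def effect_of_change(income_from, income_to):
    """Predicted change in test score when income moves between two values."""
    return predicted_score(income_to) - predicted_score(income_from)


# The same $1000 increase has a different effect at low and high income.
low = effect_of_change(10, 11)    # from $10,000 to $11,000
high = effect_of_change(40, 41)   # from $40,000 to $41,000
print(f"Effect at income 10: {low:.2f} points")
print(f"Effect at income 40: {high:.2f} points")
```

With these assumed values the predicted gain is about 2.96 points at the lower income level and about 0.42 points at the higher one, which is what "the effect depends on the value of income itself" means in practice.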
• Higher-order polynomials: how to decide?
o Rule of thumb: choose a maximum order and run the regression; if the highest-order term is not
significant, drop it and try again
o Overfitting: use caution, since it is possible to include “too many” polynomial terms
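The rule of thumb above is a simple loop, sketched here in Python under stated assumptions. The function `choose_degree`, its callback argument, and the p-values are all hypothetical; in practice the p-values would come from the regression output at each order.

```python
def choose_degree(top_term_p_value, max_degree, alpha=0.05):
    """Sequential rule of thumb for picking a polynomial degree.

    top_term_p_value(r) must return the p-value on the highest-order term
    when a polynomial of degree r is fitted (a hypothetical callback that
    the user supplies, for example from regression output).
    """
    for degree in range(max_degree, 0, -1):
        if top_term_p_value(degree) < alpha:
            return degree      # highest-order term is significant: stop here
    return 0                   # no term is significant at any degree


# Hypothetical p-values on the top term for degrees 1 to 4 (made-up numbers).
example_p_values = {4: 0.62, 3: 0.31, 2: 0.003, 1: 0.0001}
chosen = choose_degree(example_p_values.get, max_degree=4)
print("Chosen polynomial degree:", chosen)
```

Starting from a modest maximum order is one way to respect the overfitting caution: the loop can only drop terms, never add them.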
6.3. Logarithms
• The natural log, ln(x) or log(x), is the inverse of (“undoes”) the exponential function exp(x) = e^x
• Regressions using the log: linear-log, log-linear, log-log
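o linear-log: $Y = \beta_0 + \beta_1 \ln(X) + u$ (divide $\beta_1$ by 100 to interpret it as the change in Y for a 1% change in X)
o log-linear: $\ln(Y) = \beta_0 + \beta_1 X + u$ (multiply $\beta_1$ by 100 to interpret it as the % change in Y for a one-unit change in X)
o log-log: $\ln(Y) = \beta_0 + \beta_1 \ln(X) + u$ ($\beta_1$ in log-log is interpreted as the elasticity of Y with respect to X)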
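The three interpretation rules can be written as one-line conversions. This is a minimal Python sketch; the function names and the slope values are made up for illustration, and each function takes the slope coefficient from the corresponding specification.

```python
def linear_log_effect(beta1, pct_change_x=1.0):
    """Linear-log: approximate change in Y (in units of Y) for a given % change in X."""
    return beta1 / 100 * pct_change_x


def log_linear_effect(beta1, change_x=1.0):
    """Log-linear: approximate % change in Y for a given change in X (in units of X)."""
    return 100 * beta1 * change_x


def log_log_effect(beta1, pct_change_x=1.0):
    """Log-log: approximate % change in Y for a given % change in X (beta1 is the elasticity)."""
    return beta1 * pct_change_x


# Hypothetical slope estimates, one per specification.
print("linear-log, slope 36.0:", linear_log_effect(36.0), "units of Y per 1% change in X")
print("log-linear, slope 0.08:", log_linear_effect(0.08), "% change in Y per unit of X")
print("log-log,    slope 0.50:", log_log_effect(0.50), "% change in Y per 1% change in X")
```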
• Interpretation: the change in the log is approximately the proportional change, $\ln(x + \Delta x) - \ln(x) \approx \Delta x / x$ for small $\Delta x$
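o Hence, for example, in the log-linear earnings equation $\ln(Earnings) = \beta_0 + \beta_1 Education + u$, one more year of education changes earnings by approximately $100 \times \beta_1$ percent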
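How good is the approximation? The Python sketch below compares the change in the log with the exact proportional change for several sizes of change, and gives the exact percentage effect implied by a log-linear slope. The helper names and the slope value 0.08 are assumptions chosen for illustration.

```python
import math


def log_change(x_old, x_new):
    """Change in the natural log when x moves from x_old to x_new."""
    return math.log(x_new) - math.log(x_old)


def proportional_change(x_old, x_new):
    """Exact proportional change (x_new - x_old) / x_old."""
    return (x_new - x_old) / x_old


def exact_percent_effect(beta1):
    """Exact % change in Y for a one-unit change in X in a log-linear model."""
    return 100 * (math.exp(beta1) - 1)


for pct in (1, 5, 20, 50):
    x_new = 100 * (1 + pct / 100)
    print(f"{pct:>3}% change: log difference = {log_change(100, x_new):.4f}, "
          f"proportional change = {proportional_change(100, x_new):.4f}")

# Hypothetical log-linear slope of 0.08: approximate vs exact percentage effect.
print(f"approximate: {100 * 0.08:.2f}%, exact: {exact_percent_effect(0.08):.2f}%")
```

The two measures agree closely for changes of a few percent and drift apart as the change grows, so the "multiply by 100" reading is best kept for small coefficients.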
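• To predict the level of Y in the log-linear or log-log specifications, we must take the anti-log (exponential) of the predicted log.
So for the log-linear specification (e.g. to predict dollar earnings given years of education): $\widehat{Earnings} = \exp(\hat{\beta}_0 + \hat{\beta}_1 Education)$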
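A minimal Python sketch of the anti-log step, assuming a log-linear fit of ln(earnings) on years of education. The intercept and slope are made-up values, and the function names are hypothetical.

```python
import math

# Hypothetical log-linear fit: ln(earnings) = B0 + B1 * education.
# Both coefficients are made-up values for illustration only.
B0, B1 = 9.5, 0.08


def predicted_log_earnings(education_years):
    """Fitted value on the log scale."""
    return B0 + B1 * education_years


def predicted_earnings(education_years):
    """Anti-log (exponential) of the predicted log gives the prediction in dollars."""
    return math.exp(predicted_log_earnings(education_years))


for years in (12, 16):
    print(f"{years} years: ln(earnings) = {predicted_log_earnings(years):.2f}, "
          f"earnings = ${predicted_earnings(years):,.0f}")
```

The regression output is on the log scale, so reporting the fitted value directly would give a number near 10 rather than a dollar amount.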
o This specification allows the effect of education to be different for males and females:
o Interpretation: write out the two versions of the equation for D = 0 and D = 1
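A sketch of that exercise, assuming the specification takes the common form with a dummy D (for example, D = 1 for female), education X, and their product. The symbols $\beta_0$ to $\beta_3$ and this exact functional form are assumptions made for illustration.

```latex
\begin{align*}
Y &= \beta_0 + \beta_1 X + \beta_2 D + \beta_3 (X \times D) + u \\
D = 0:\quad Y &= \beta_0 + \beta_1 X + u \\
D = 1:\quad Y &= (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X + u
\end{align*}
```

Under this form, $\beta_2$ shifts the intercept for the D = 1 group and $\beta_3$ is the difference in the slope on education between the two groups.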
• Interaction between two binary (dummy) variables
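o Example: female might be one dummy variable, and whether the person has any children might be a
second dummy variable. Including their product (female × children) allows women with children to have
higher or lower earnings than the two separate effects alone would imply, depending on the sign of the
estimated coefficient on the interaction.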
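With two dummies there are only four groups, so the regression can be read as a table of four predicted values. The Python sketch below builds that table; all four coefficients are made-up numbers and the function name is hypothetical.

```python
# Hypothetical coefficients for
#   earnings = B0 + B_FEMALE*female + B_KIDS*children + B_INTERACT*(female*children)
# All four numbers are made up for illustration.
B0, B_FEMALE, B_KIDS, B_INTERACT = 50000.0, -4000.0, 2000.0, -3000.0


def predicted_earnings(female, children):
    """Predicted earnings for one combination of the two dummies (each 0 or 1)."""
    return B0 + B_FEMALE * female + B_KIDS * children + B_INTERACT * female * children


for female in (0, 1):
    for children in (0, 1):
        print(f"female={female}, children={children}: "
              f"{predicted_earnings(female, children):,.0f}")

# Effect of having children, separately for men and for women.
effect_men = predicted_earnings(0, 1) - predicted_earnings(0, 0)
effect_women = predicted_earnings(1, 1) - predicted_earnings(1, 0)
print("Difference in the children effect (women minus men):", effect_women - effect_men)
```

The last line reproduces the interaction coefficient: it is the difference between the children effect for women and the children effect for men.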
6.5. Some rules and tips for regressions with binary variables and interactions
(d) Looking ahead to joint hypothesis tests: F-tests can be useful for testing whether a set of variables “matters”
• Example: You run a regression with these four binary variables: married, separated, divorced, and
widowed. The reference category is never married.
• A null hypothesis that none of the marital status variables has an effect on earnings, holding other variables
constant: $H_0: \beta_{married} = \beta_{separated} = \beta_{divorced} = \beta_{widowed} = 0$
• We shall see later that we can use lht(…) to run this test in R.
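To give a sense of what such a test computes, here is a Python sketch of the homoskedasticity-only F-statistic, which compares the sum of squared residuals (SSR) with and without the variables being tested. This is a swapped-in textbook formula, not necessarily the version that lht(…) reports, and every number below is made up.

```python
def f_statistic(ssr_restricted, ssr_unrestricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic for q joint restrictions.

    ssr_restricted:   sum of squared residuals with the q variables dropped
    ssr_unrestricted: sum of squared residuals with all variables included
    n:                number of observations
    k_unrestricted:   number of regressors (excluding the intercept) in the full model
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k_unrestricted - 1)
    return numerator / denominator


# Made-up numbers: 4 marital-status dummies plus 2 other regressors, 1000 observations.
f_stat = f_statistic(ssr_restricted=5200.0, ssr_unrestricted=5000.0,
                     q=4, n=1000, k_unrestricted=6)
print(f"F-statistic: {f_stat:.2f}")
```

A large value means that dropping the four marital status dummies worsens the fit by more than chance alone would explain, which is evidence against the null hypothesis.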