Lecture 08 Nonlinearity
Yale University
Spring 2024
Nonlinear Relationships
• We have often assumed a linear relationship between X and Y
• A linear model predicts that each one-unit change in X is associated with a b1 change in Y, no matter what the value of X
– Each additional year of schooling increases earnings by the same amount
– The 1st cigarette smoked has the same association with APGAR score as the 15th
– The 1st and 20th years of potential experience have the same association with unemployment duration
Birthweight & Prenatal Visits
[Figure: birth weight (0 to 5000) plotted against total number of prenatal visits (0 to 40)]
Interpretation
• Straight line does not fit the data perfectly
– ... levels of visits
– This systematic pattern suggests that a different functional form would be better
[Figure: birth weight in grams and the predicted bwght from the linear fit, plotted against total number of prenatal visits (0 to 40)]
Modeling Nonlinear Relationships
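• There are many ways to model nonlinear relationships with OLS, even though you’ll hear OLS referred to as a linear estimator
• “Linear” just means that we are able to write the relationship as:
Yi* = b0 + b1 Xi* + ui
where Yi* and Xi* may be transformations of the original Y and X; the model is linear in the parameters b0 and b1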
Total number of prenatal visits: frequency distribution
Visits   Freq.   Percent      Cum.
     0       5      0.28      0.28
     1       2      0.11      0.40
     2       5      0.28      0.68
     3      12      0.68      1.36
     4       6      0.34      1.70
     5      27      1.53      3.23
     6      59      3.34      6.58
     7      58      3.29      9.86
     8     117      6.63     16.50
     9      96      5.44     21.94
    10     199     11.28     33.22
    11     115      6.52     39.74
    12     618     35.03     74.77
    13      72      4.08     78.85
    14      97      5.50     84.35
    15     143      8.11     92.46
    16      41      2.32     94.78
    17      12      0.68     95.46
    18      15      0.85     96.32
    19       4      0.23     96.54
    20      35      1.98     98.53
    21       5      0.28     98.81
    22       2      0.11     98.92
    23       1      0.06     98.98
    24       2      0.11     99.09
    25       3      0.17     99.26
    26       1      0.06     99.32
    30       7      0.40     99.72
    33       1      0.06     99.77
    35       1      0.06     99.83
    36       1      0.06     99.89
    40       2      0.11    100.00
The CEF & OLS
• OLS and the conditional expectation function (CEF) are both about the average (or mean) of Y for each value of X
– The CEF gives the mean of Y at each value of X
– OLS fits a line to those means; that is, it fits a line to the CEF
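One way to see the link between OLS and the CEF is to check it numerically. The sketch below uses a tiny synthetic data set (invented for illustration, not the birth weight data) and a hypothetical helper `ols_line`. It compares the OLS line fitted to the individual observations with the least-squares line fitted to the conditional means of Y, weighting each mean by the number of observations at that X.

```python
# Synthetic (x, y) pairs, invented purely for illustration.
data = [(0, 1.0), (0, 3.0), (1, 2.0), (1, 4.0), (1, 6.0), (2, 9.0)]


def ols_line(xs, ys, ws=None):
    """Return (intercept, slope) of the (optionally weighted) least-squares line."""
    if ws is None:
        ws = [1.0] * len(xs)
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Sample CEF: the mean of y at each distinct x, plus the cell counts.
groups = {}
for x, y in data:
    groups.setdefault(x, []).append(y)
cef_x = sorted(groups)
cef_y = [sum(groups[x]) / len(groups[x]) for x in cef_x]
counts = [len(groups[x]) for x in cef_x]

# Line fitted to the individual observations.
micro = ols_line([x for x, _ in data], [y for _, y in data])
# Line fitted to the conditional means, weighted by cell size.
grouped = ols_line(cef_x, cef_y, counts)

print("micro  :", micro)
print("grouped:", grouped)
```

The two fitted lines coincide up to floating-point error, which is the sense in which OLS "fits a line to the CEF". The weighting matters: an unweighted fit to the means would generally give a different line.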
“Saturated” Models
• The conditional expectation function dots in Figure
3.1.2 are an example of a “saturated” regression.
Saturated Models More Generally
Interpreting Parameters
“Saturation” and Functional Form
• The fully saturated model imposes no assumptions on the functional form of the CEF
– The return to each year of schooling can be different (and doesn’t even have to be positive)
– The level of earnings can differ between men and women
– The return to each year of schooling can differ between men and women
• More generally specified:
But
• A fully saturated model can require a lot of data
• Many interaction terms will be uninteresting and/or imprecisely estimated
• You may want to make assumptions about the CEF and omit some of the interaction terms
• For example, omitting the interaction terms in the example above imposes the simplifying assumption that the returns to college are the same for men and women
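To make the restriction concrete, here is a sketch under stated assumptions: two dummy variables, C_i (college) and F_i (female), both hypothetical names chosen for illustration rather than taken from the lecture's example. With these, a saturated model and its restricted version could be written as:

```latex
\begin{align*}
\text{Saturated:}  \quad Y_i &= b_0 + b_1 C_i + b_2 F_i + b_3 (C_i \times F_i) + u_i \\
\text{Restricted:} \quad Y_i &= b_0 + b_1 C_i + b_2 F_i + u_i
\end{align*}
```

In the saturated version the return to college is b_1 for men and b_1 + b_3 for women. Dropping the interaction imposes b_3 = 0, so both groups share the single return b_1.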
The Saturated Regression
. regress bwght dnpvis2-dnpvis32, robust;
bwght   Coef.   Robust Std. Err.   t   P>|t|   [95% Conf. Interval]
[Figure: x-axis is total number of prenatal visits, 0 to 40]
Quadratics in Stata
. gen npvis2 = npvis*npvis;
bwght   Coef.   Robust Std. Err.   t   P>|t|   [95% Conf. Interval]
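The mechanics of a quadratic specification can be sketched outside Stata as well: create the squared term, then run OLS on a constant, X, and X squared. The example below is a minimal pure-Python illustration on synthetic, noise-free data generated from made-up coefficients (2000, 100, -2). The helper names `solve` and `fit_quadratic` are hypothetical, and none of the numbers are the lecture's estimates.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_quadratic(xs, ys):
    """OLS of y on a constant, x, and x squared; returns [b0, b1, b2]."""
    rows = [[1.0, x, x * x] for x in xs]  # x * x plays the role of npvis2
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(xtx, xty)  # normal equations: (X'X) b = X'y


# Synthetic, noise-free data from known (made-up) coefficients.
xs = [float(x) for x in range(0, 21)]
ys = [2000.0 + 100.0 * x - 2.0 * x * x for x in xs]

b0, b1, b2 = fit_quadratic(xs, ys)
print(b0, b1, b2)
```

Because the data contain no noise, the fit recovers the generating coefficients up to floating-point error. With real data the estimates would come with standard errors, which this sketch does not compute.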
Quadratics
[Figure: quadratic fit; y-axis 0 to 5000, x-axis total number of prenatal visits (0 to 40)]
Caution: Interpreting Coefficients on Quadratics
• In a quadratic specification, the predicted change in Y associated with a one-unit change in X depends on the level of X (this was the whole point)
• We need to take special care with our interpretation of coefficients
• We can’t read our predicted change in Y for a unit change in X directly from the Stata output
– b1 no longer represents the predicted change in Y given a one-unit change in X
– When we change X we also change X^2
– We need to take the coefficients on both X and X^2 into account
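A short derivation shows why both coefficients enter. Writing the fitted quadratic with b2 as the coefficient on X^2 (a labeling assumed here), the exact predicted change for a one-unit increase in X is:

```latex
\begin{align*}
\hat{Y}(X) &= b_0 + b_1 X + b_2 X^2 \\
\hat{Y}(X+1) - \hat{Y}(X) &= b_1 + b_2\left[(X+1)^2 - X^2\right] \\
                          &= b_1 + b_2\,(2X + 1)
\end{align*}
```

The result depends on X through the term b_2 (2X + 1), so b_1 alone gives the predicted change only if b_2 = 0.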
How to calculate predicted changes in Y with quadratics
• The predicted change in Y associated with a one-unit change in X must be computed for specific values of X
• We can calculate the predicted change in one of two ways:
– Numerically (this is exact)
• Calculate Ŷ at X = X → Ŷ
• Calculate Ŷ at X = X + ΔX → Ŷ + ΔŶ
• Take the difference
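The numerical steps above can be sketched as follows. The coefficients `B0`, `B1`, `B2` are hypothetical values picked for illustration, not estimates from the lecture. For comparison, the sketch also evaluates the slope of the fitted curve, b1 + 2·b2·X, a common approximation that is an addition here and not necessarily the second method the lecture has in mind.

```python
# Hypothetical coefficients for Y-hat = B0 + B1*X + B2*X^2 (not the lecture's estimates).
B0, B1, B2 = 2000.0, 100.0, -2.0


def y_hat(x):
    """Predicted Y at a given X."""
    return B0 + B1 * x + B2 * x ** 2


def exact_change(x, dx=1.0):
    """Predict at X and at X + dX, then take the difference (exact)."""
    return y_hat(x + dx) - y_hat(x)


def derivative_approx(x, dx=1.0):
    """Slope of the fitted curve at X times dX (an approximation)."""
    return (B1 + 2 * B2 * x) * dx


for x in (5, 12, 30):
    print(x, exact_change(x), derivative_approx(x))
# 5 78.0 80.0
# 12 50.0 52.0
# 30 -22.0 -20.0
```

With these made-up numbers the predicted change shrinks as X grows and eventually turns negative, which is exactly the kind of pattern a single linear coefficient cannot express.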
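[Figure panels: shape of the quadratic by the signs of b1 and b2, with Y on the vertical axis and X on the horizontal axis]
– b1 < 0, b2 < 0: Y decreases in X at an increasing rate
– b1 < 0, b2 > 0: Y decreases in X at a decreasing rate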
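The sign patterns follow from the slope of the fitted quadratic. As a sketch, taking b1 and b2 to be the coefficients on X and X^2:

```latex
\begin{align*}
\frac{d\hat{Y}}{dX} &= b_1 + 2 b_2 X \\
\frac{d\hat{Y}}{dX} = 0 \;&\Longrightarrow\; X^{*} = -\frac{b_1}{2 b_2} \qquad (b_2 \neq 0)
\end{align*}
```

For X ≥ 0, the slope starts at b_1 and moves in the direction of the sign of b_2. When b_1 and b_2 have opposite signs, the curve turns at a positive X*, so a description such as "Y decreases in X at a decreasing rate" applies only for X below X*.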
Testing Quadratic vs. Linear Form
. gen npvis2 = npvis*npvis;
bwght   Coef.   Robust Std. Err.   t   P>|t|   [95% Conf. Interval]
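A common way to compare the two functional forms is a t-test on the coefficient of the squared term. Whether the lecture uses exactly this test is an assumption, and the notation below (b2 for the coefficient on the squared term) is chosen for illustration:

```latex
\begin{align*}
Y_i &= b_0 + b_1 X_i + b_2 X_i^2 + u_i \\
H_0 &: b_2 = 0 \quad \text{(linear form is adequate)} \qquad H_1 : b_2 \neq 0 \\
t &= \frac{\hat{b}_2}{\mathrm{SE}(\hat{b}_2)}
\end{align*}
```

Rejecting H_0 is evidence against the linear form. In a regression table of the kind shown here, this is the t statistic and p-value reported on the row for the squared term.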
Cautions
[Figure: y-axis 2000 to 5000, x-axis total number of prenatal visits (0 to 40)]
Why Should We Care?
• To the extent that we want a model that “fits the data best”, we should care about nonlinearities
– More precise estimates
– Better prediction
• Policy importance
– Diminishing vs. constant marginal returns: what does one more dollar spent get us?
• Think health care
– “Optimal” amount of investment: what is the optimal number of prenatal visits?