
Lecture 08 Nonlinearity

This document discusses nonlinear relationships and modeling them using regression analysis. It notes that while ordinary least squares (OLS) regression is referred to as a linear estimator, the dependent and independent variables can be transformed to allow for nonlinear relationships. One flexible way to model nonlinearity is using a separate dummy variable for each possible value of the independent variable, known as a saturated model. This saturated model estimates the conditional expectation function without assumptions about the functional form and allows the effect to vary across values of the independent variable. An example saturated regression is shown using number of prenatal visits to predict birth weight.


GLBL 122 – Applied Quantitative Analysis II

Yale University
Spring 2024

Lecture 8: Nonlinearity
Nonlinear Relationships
• We have often assumed a linear relationship between X and Y
• We predict that each one-unit change in X is associated with a b1 change in Y, no matter what the value of X
  – Each year of schooling will increase earnings by the same amount
  – The 1st cigarette smoked has the same association with APGAR score as the 15th
  – The 1st and 20th year of potential experience have the same association with unemployment duration
• Does this fit the data?
Birthweight & Prenatal Visits

[Scatter plot: birth weight (grams) against total number of prenatal visits (0–40), with the linear OLS prediction overlaid]
Interpretation

[Scatter plot repeated: birth weight vs. total number of prenatal visits, with the linear prediction]

• Straight line does not fit the data perfectly
  – At low levels of visits, the relationship appears steeper
  – Then flattens out at higher levels of visits
• Observations are below the line at low and high levels and above it at medium levels
  – This systematic pattern suggests that a different functional form would be better
Modeling Nonlinear Relationships
• There are many ways to model nonlinear relationships in OLS, even though you’ll hear OLS referred to as a linear estimator
• This just means that we are able to write the relationship as:

  Yi* = b0 + b1Xi* + ui

• The coefficients (the b’s) enter linearly: no b1*b1, log(b1), 1/b1
• But Yi* and Xi* can be nonlinear transformations of Y and X
  – Yi* = log(Yi), Xi* = Xi², etc.
Multiple Dummies
• The most flexible way of allowing a non-linear relationship is to put in a separate dummy for each possible value of X
• That is, to estimate the conditional expectation function
. tabulate npvis, gen(dnpvis);

total
number of
prenatal
visits Freq. Percent Cum.

0 5 0.28 0.28
1 2 0.11 0.40
2 5 0.28 0.68
3 12 0.68 1.36
4 6 0.34 1.70
5 27 1.53 3.23
6 59 3.34 6.58
7 58 3.29 9.86
8 117 6.63 16.50
9 96 5.44 21.94
10 199 11.28 33.22
11 115 6.52 39.74
12 618 35.03 74.77
13 72 4.08 78.85
14 97 5.50 84.35
15 143 8.11 92.46
16 41 2.32 94.78
17 12 0.68 95.46
18 15 0.85 96.32
19 4 0.23 96.54
20 35 1.98 98.53
21 5 0.28 98.81
22 2 0.11 98.92
23 1 0.06 98.98
24 2 0.11 99.09
25 3 0.17 99.26
26 1 0.06 99.32
30 7 0.40 99.72
33 1 0.06 99.77
35 1 0.06 99.83
36 1 0.06 99.89
40 2 0.11 100.00

Total 1,764 100.00

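Numerically, a saturated regression simply reproduces group means. A minimal pure-Python sketch (using made-up toy data, not the prenatal-visits sample) of why: the intercept is the mean of Y in the omitted category, and each dummy coefficient is that group's mean minus the omitted group's mean, so fitted values recover the CEF exactly.

```python
# Hypothetical data: (X, Y) pairs with three distinct X values
data = [(0, 3400), (0, 3450), (1, 3300), (1, 3380), (2, 3550), (2, 3600)]

# Conditional expectation function: mean of Y at each value of X
groups = {}
for x, y in data:
    groups.setdefault(x, []).append(y)
cef = {x: sum(ys) / len(ys) for x, ys in groups.items()}

# Saturated-regression coefficients implied by the group means
# (X = 0 is the omitted category)
intercept = cef[0]
dummies = {x: cef[x] - cef[0] for x in cef if x != 0}

print(cef)        # {0: 3425.0, 1: 3340.0, 2: 3575.0}
print(intercept)  # 3425.0
print(dummies)    # {1: -85.0, 2: 150.0}
```

Adding intercept plus dummy coefficient for any group returns that group's mean, which is exactly what the regression table on the next slides reports for each visit count.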
The CEF & OLS
• The OLS line and the CEF are both about the average (or mean) of Y for each value of X
  – OLS fits a line to the means of Y for each X. That is, it fits a line to the CEF
  – In this sense, think of OLS as a good (linear) approximation of the CEF. It captures the essential features of the relationship between Y and X
  – Visualize this by plotting the means of Y for each X and fitting the line for yourself
The CEF & OLS

[Figure: sample means of Y plotted at each value of X, with the OLS regression line fitted through them]
“Saturated” Models
• The conditional expectation function dots in Figure 3.1.2 are an example of a “saturated” regression
  – There is a dummy for every possible value of education
  – The regression produces coefficients that let you exactly calculate the mean for every value of education
• What is the regression equation?
Saturated Models More Generally

Interpreting Parameters

“Saturation” and Functional Form
• The fully saturated model puts no assumptions on the functional form of the CEF
  – The return to each year of schooling can be different (and doesn’t even have to be positive)
  – The level of earnings between men and women can be different
  – The return to each year of schooling can be different between men and women
• More generally specified:
But
• A fully saturated model can need a lot of data
• Many interaction terms will be uninteresting and/or imprecisely estimated
• You may want to make assumptions about the CEF and omit some of the interaction terms
• For example, omitting the interaction terms in the example above makes the simplifying assumption that the returns to college are similar for men and women
The Saturated Regression
. regress bwght dnpvis2-dnpvis32, robust;

Linear regression Number of obs = 1764


F( 26, 1732) = .
Prob > F = .
R-squared = 0.0382
Root MSE = 574.22

Robust
bwght Coef. Std. Err. t P>|t| [95% Conf. Interval]

dnpvis2 -77.4 224.9696 -0.34 0.731 -518.6406 363.8406


dnpvis3 -421.4 593.7966 -0.71 0.478 -1586.034 743.2338
dnpvis4 -241.9833 311.1766 -0.78 0.437 -852.3048 368.3381
dnpvis5 -129.5667 335.8006 -0.39 0.700 -788.184 529.0507
dnpvis6 -254.1778 282.2709 -0.90 0.368 -807.8054 299.4499
dnpvis7 -360.0441 250.5748 -1.44 0.151 -851.505 131.4169
dnpvis8 -156.2103 240.5693 -0.65 0.516 -628.0471 315.6265
dnpvis9 -68.28889 230.1569 -0.30 0.767 -519.7035 383.1258
dnpvis10 17.80833 230.2451 0.08 0.938 -433.7794 469.396
dnpvis11 -67.45528 227.1256 -0.30 0.767 -512.9246 378.0141
dnpvis12 111.5304 227.8723 0.49 0.625 -335.4034 558.4643
dnpvis13 4.985113 224.4883 0.02 0.982 -435.3116 445.2818
dnpvis14 156.6278 229.7186 0.68 0.495 -293.9273 607.1828
dnpvis15 118.4351 229.6195 0.52 0.606 -331.9256 568.7957
dnpvis16 -35.94545 227.4096 -0.16 0.874 -481.9717 410.0808
dnpvis17 100.5024 238.4421 0.42 0.673 -367.1623 568.1672
dnpvis18 66.51667 281.3691 0.24 0.813 -485.3423 618.3757
dnpvis19 132.2 273.3553 0.48 0.629 -403.9411 668.3411
dnpvis20 -84.65 247.4858 -0.34 0.732 -570.0524 400.7524
dnpvis21 4.885714 248.9302 0.02 0.984 -483.3496 493.1211
dnpvis22 -139 447.2391 -0.31 0.756 -1016.185 738.1855
dnpvis23 -819.4 223.6711 -3.66 0.000 -1258.094 -380.7062
dnpvis24 -664.4 223.4148 -2.97 0.003 -1102.591 -226.2089
dnpvis25 285.1 337.5059 0.84 0.398 -376.8619 947.0619
dnpvis26 -20.06667 423.6433 -0.05 0.962 -850.9729 810.8396
dnpvis27 375.6 223.4148 1.68 0.093 -62.59112 813.7911
dnpvis28 72.45714 349.7399 0.21 0.836 -613.4998 758.4141
dnpvis29 -104.4 223.4148 -0.47 0.640 -542.5911 333.7911
dnpvis30 -524.4 223.4148 -2.35 0.019 -962.5911 -86.20888
dnpvis31 725.6 223.4148 3.25 0.001 287.4089 1163.791
    dnpvis32      468.1   392.6624     1.19   0.233    -302.0424    1238.242
       _cons     3414.4   223.4148    15.28   0.000     2976.209    3852.591
Saturated Regression (CEF)

[Scatter plot: birth weight (grams) vs. total number of prenatal visits, with the linear prediction and the saturated-regression (CEF) prediction overlaid]
Drawbacks to Saturated Model
• Though the line fits the data better (recall that the CEF is the best predictor of Y), …
  – The line jumps all over the place; hard to interpret
  – Point estimates are very imprecise
• Q: Why is that the case?
  – Does not exploit the fact that outcomes at 30 and 31 visits should be pretty similar
Quadratics Help
• Adding X² to the regression allows us to add curvature to the line without adding a ton of parameters

  birthweight = b0 + b1·visits + b2·visits² + u

• When we include both X and X² in the regression we say we have “included a quadratic in X”
Quadratics in Stata
. gen npvis2 = npvis*npvis;

. regress bwght npvis npvis2, robust;

Linear regression Number of obs = 1764


F( 2, 1761) = 7.98
Prob > F = 0.0004
R-squared = 0.0136
Root MSE = 576.71

Robust
bwght Coef. Std. Err. t P>|t| [95% Conf. Interval]

npvis 40.20734 13.00623 3.09 0.002 14.69806 65.71661


npvis2 -.8508667 .4153901 -2.05 0.041 -1.665576 -.0361571
_cons 3060.49 99.21116 30.85 0.000 2865.906 3255.074

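The two Stata lines above (generate the squared term, then regress on both) can be mirrored in a rough pure-Python sketch. Everything here is illustrative: the `ols` helper and the toy data are made up for the example, and in practice you would use Stata or a statistics library rather than hand-rolled normal equations.

```python
# Fit birthweight = b0 + b1*visits + b2*visits^2 + u by OLS,
# solving the normal equations (X'X)b = X'y with Gaussian elimination.
def ols(X, y):
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]  # X'X
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]            # X'y
    for c in range(k):                      # forward elimination w/ pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p], b[c], b[p] = A[p], A[c], b[p], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k):
                A[r][j] -= f * A[c][j]
            b[r] -= f * b[c]
    coef = [0.0] * k                        # back substitution
    for c in range(k - 1, -1, -1):
        coef[c] = (b[c] - sum(A[c][j] * coef[j] for j in range(c + 1, k))) / A[c][c]
    return coef

# Hypothetical data that rises and then flattens/turns down
visits = [2, 5, 8, 10, 12, 15, 20, 25, 30]
bwght = [2900, 3100, 3250, 3330, 3380, 3400, 3380, 3300, 3150]

# Each row: constant, visits, visits squared (the `gen npvis2` step)
X = [[1.0, v, float(v * v)] for v in visits]
b0, b1, b2 = ols(X, bwght)
```

With concave data like this, b2 comes out negative, matching the sign pattern in the Stata output above.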
Quadratics

[Scatter plot: birth weight (grams) vs. total number of prenatal visits, with the linear and quadratic predictions overlaid]
Caution: Interpreting Coefficients on Quadratics
• In a quadratic specification, the predicted change in Y associated with a one-unit change in X depends on the level of X (this was the whole point)
• We need to take special care with our interpretation of coefficients
• We can’t get our predicted change in Y for a unit change in X directly from the Stata output
  – b1 no longer represents the predicted change in Y given a one-unit change in X
  – When we change X we also change X²
  – We need to take the coefficients on both X and X² into account
How to Calculate Predicted Changes in Y with Quadratics
• The predicted change in Y associated with a one-unit change in X must be computed for specific values of X
• We can calculate the predicted change in one of two ways:
  – Numerically – this is exact
    • Calculate Ŷ at X → Ŷ
    • Calculate Ŷ at X + ΔX → Ŷ + ΔŶ
    • Take the difference
  – Formulaically – this is approximate

    ΔY ≈ (b1 + 2b2X)ΔX

    • This is the derivative of Y with respect to X: the instantaneous slope at a given point. The slope changes between X and X + ΔX, which is why this is only approximate
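Both methods can be worked through by hand using the coefficient estimates from the quadratic regression output above (40.20734 on npvis, -.8508667 on npvis², intercept 3060.49). For example, the predicted change in birth weight going from 5 to 6 visits:

```python
# Coefficients from the Stata quadratic regression of bwght on npvis
b0, b1, b2 = 3060.49, 40.20734, -0.8508667

def yhat(x):
    return b0 + b1 * x + b2 * x * x

# Numerical method (exact): difference in predictions at 6 vs. 5 visits
exact = yhat(6) - yhat(5)

# Formula method (approximate): derivative b1 + 2*b2*X evaluated at X = 5
approx = b1 + 2 * b2 * 5

print(round(exact, 2))   # 30.85
print(round(approx, 2))  # 31.70
```

The two answers differ because the slope keeps changing between X = 5 and X = 6; the derivative only captures the instantaneous slope at X = 5.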
Thinking Graphically about Quadratics
• We can say something about the shape of the relationship without computing (or graphing) predicted changes
• The coefficients on X and X² tell us the shape:
  – b1 > 0, b2 > 0: Y increases in X at an increasing rate
  – b1 > 0, b2 < 0: Y increases in X at a decreasing rate
  – b1 < 0, b2 < 0: Y decreases in X at an increasing rate
  – b1 < 0, b2 > 0: Y decreases in X at a decreasing rate
Testing Quadratic vs. Linear Form
. gen npvis2 = npvis*npvis;

. regress bwght npvis npvis2, robust;

Linear regression Number of obs = 1764


F( 2, 1761) = 7.98
Prob > F = 0.0004
R-squared = 0.0136
Root MSE = 576.71

Robust
bwght Coef. Std. Err. t P>|t| [95% Conf. Interval]

npvis 40.20734 13.00623 3.09 0.002 14.69806 65.71661


npvis2 -.8508667 .4153901 -2.05 0.041 -1.665576 -.0361571
_cons 3060.49 99.21116 30.85 0.000 2865.906 3255.074

• We want to test whether the quadratic or the linear specification fits better
  – Test the null hypothesis that the coefficient on the squared term is zero
  – This is the same as testing the null hypothesis that the relationship between birth weight and prenatal visits is linear
Multiple Hypothesis Tests
. gen npvis2 = npvis*npvis;

. regress bwght npvis npvis2, robust;

Linear regression Number of obs = 1764


F( 2, 1761) = 7.98
Prob > F = 0.0004
R-squared = 0.0136
Root MSE = 576.71

Robust
bwght Coef. Std. Err. t P>|t| [95% Conf. Interval]

npvis 40.20734 13.00623 3.09 0.002 14.69806 65.71661


npvis2 -.8508667 .4153901 -2.05 0.041 -1.665576 -.0361571
_cons 3060.49 99.21116 30.85 0.000 2865.906 3255.074

• Say we want to test the null hypothesis that birth weight is unrelated to prenatal visits

• How would we do this?
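One way to see the mechanics: under homoskedasticity, the joint F statistic for the null that both the npvis and npvis² coefficients are zero can be built from the reported R². Note this is the classical formula; the output above reports a robust F (7.98), which uses a different variance estimator, so the two numbers are not expected to match.

```python
# Classical (homoskedastic) F statistic from R-squared:
# F = (R^2 / q) / ((1 - R^2) / (n - k - 1))
r2 = 0.0136  # R-squared from the output above
q = 2        # number of restrictions (npvis and npvis2 both zero)
n = 1764     # observations
k = 2        # regressors besides the constant

F = (r2 / q) / ((1 - r2) / (n - k - 1))
print(round(F, 2))  # 12.14
```

In practice, Stata's `test npvis npvis2` after `regress ..., robust` would carry out the robust version of this joint test directly.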
Cautions

[Scatter plot: birth weight (grams) vs. total number of prenatal visits, with the linear and quadratic predictions overlaid; the quadratic turns downward at high visit counts]

• Do we really think that each visit beyond 24 makes the baby worse off?
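Where "beyond 24" comes from: setting the slope b1 + 2·b2·X to zero gives the turning point of the fitted quadratic, X* = -b1 / (2·b2). With the estimates above:

```python
# Turning point of the fitted quadratic: slope b1 + 2*b2*X = 0
b1, b2 = 40.20734, -0.8508667  # coefficients from the Stata output
turning_point = -b1 / (2 * b2)
print(round(turning_point, 1))  # 23.6
```

So the fitted curve predicts birth weight starts falling after roughly the 24th visit, which is an artifact of the quadratic functional form rather than a credible causal claim.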
Why Should We Care?
• To the extent that we want a model that “fits the data best,” we should care about nonlinearities
  – More precise estimates
  – Better prediction
• Policy importance
  – Diminishing vs. constant marginal returns – what does one more dollar spent get us?
    • Think health care
  – “Optimal” amount of investment – what is the optimal number of prenatal visits?
