Week 1: Non-Linear Models
Econometrics – Lecture 1
Non-Linear Regression Models
Preview
Linear and Nonlinear Regression
Estimation of Linear and Nonlinear Models
Approaches to Estimating Nonlinear Regression Models
Learning Outcomes
To explain linear and nonlinear models
To estimate nonlinear models
To identify techniques for estimating the parameters of a nonlinear model
Non-Linear Regression
Introduction
Previously we fitted, by least squares, the general linear model, which is of the type:
Y = b0 + b1X1 + b2X2 + … + bpXp + e
Example: Quadratic Regression Model
The Effect of Transparency on Economic Growth
Empirical Model:
rgdpci = β1 + β2TIi + β3TIi² + β4Ki + β5HCi + β6Popgrowthi + εi
Example: Quadratic Regression Model…
K = physical capital (% of GDP)
HC = human capital (years of schooling)
Popgrowth = population growth (%)
ε = the usual disturbance term.
Notice that the model includes TI both in levels and squared (TI²); the squared term is generated as a new variable, as sketched below.
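A minimal Stata sketch of generating the squared term and estimating the model (the variable names here are assumptions, not the original do-file):

. generate TI2 = TI*TI    // squared transparency term
. reg rgdpc TI TI2 K HC popgrowth, robust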
Nonlinear Regression
Some popular nonlinear regression models:
1. Exponential model: y = a·e^(bx)
2. Power model: y = a·x^b
3. Saturation growth model: y = ax/(b + x)
4. Polynomial model: y = a0 + a1x + … + am·x^m
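Models such as these, which are nonlinear in the parameters, can be fitted with Stata's nl command; a minimal sketch for the saturation growth model (y, x, and the starting values are hypothetical):

. nl (y = {a=1}*x/({b=1} + x))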
Nonlinear Regression
Given n data points (x1, y1), (x2, y2), …, (xn, yn), find the best fit y = f(x) to the data, where f(x) is a nonlinear function of x.
[Figure: scatter of the data points around a fitted nonlinear curve y = f(x); the residual at point (xi, yi) is yi – f(xi).]
Example: The TestScore – STR relation looks linear (maybe)…
Example: But the TestScore – Income relation looks nonlinear…
Nonlinear Regression Functions – General Ideas
If a relation between Y and X is nonlinear, the effect on Y of a change in X depends on the value of X, so it cannot be summarized by a single slope coefficient.
The general nonlinear regression function
Yi = f(X1i, X2i,…, Xki) + ui, i = 1,…, n
Assumptions
1. E(ui| X1i,X2i,…,Xki) = 0 (same); implies that f is the
conditional expectation of Y given the X’s.
2. (X1i,…,Xki,Yi) are i.i.d. (same).
3. Big outliers are rare (same idea; the precise mathematical
condition depends on the specific f).
4. No perfect multicollinearity (same idea; the precise statement
depends on the specific f).
Nonlinear Functions of a Single
Independent Variable
We’ll look at two complementary approaches:
1. Polynomials in X
The population regression function is approximated by a
quadratic, cubic, or higher-degree polynomial
2. Logarithmic transformations
Y and/or X is transformed by taking its logarithm
this gives a “percentages” interpretation that makes sense
in many applications
1. Polynomials in X
Approximate the population regression function by a polynomial:
Yi = β0 + β1Xi + β2Xi² + … + βrXi^r + ui
Example: the TestScore – Income
relation
Incomei = average district income in the ith district
(thousands of dollars per capita)
Quadratic specification: TestScorei = β0 + β1Incomei + β2(Incomei)² + ui
Cubic specification: TestScorei = β0 + β1Incomei + β2(Incomei)² + β3(Incomei)³ + ui
Estimation of the quadratic
specification in STATA
generate avginc2 = avginc*avginc;   // create the new regressor
reg testscr avginc avginc2, r;
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
avginc | 3.850995 .2680941 14.36 0.000 3.32401 4.377979
avginc2 | -.0423085 .0047803 -8.85 0.000 -.051705 -.0329119
_cons | 607.3017 2.901754 209.29 0.000 601.5978 613.0056
------------------------------------------------------------------------------
Interpreting the estimated regression function…
(b) Compute "effects" for different values of X
TestScore-hat = 607.3 + 3.85·Incomei – 0.0423·(Incomei)²
                (2.9)   (0.27)        (0.0048)
For example, the predicted value at Income = 6:
TestScore-hat = 607.3 + 3.85×6 – 0.0423×6² ≈ 628.9
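Stata's factor-variable syntax offers a direct way to evaluate such effects; a minimal sketch (the at() income values are illustrative):

. reg testscr c.avginc##c.avginc, r            // the same quadratic model
. margins, dydx(avginc) at(avginc=(5 10 20))   // marginal effect of income at several income levels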
Testing the null hypothesis of linearity, against the alternative
that the population regression is quadratic and/or cubic, that is, it
is a polynomial of degree up to 3:
test avginc2 avginc3;   // execute the test command after running the regression
( 1) avginc2 = 0.0
( 2) avginc3 = 0.0
F( 2, 416) = 37.69
Prob > F = 0.0000
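For reference, a sketch of the full sequence that produces this test (the cubic regression itself is not shown above):

. generate avginc3 = avginc*avginc*avginc   // cubic term
. reg testscr avginc avginc2 avginc3, r     // polynomial of degree 3
. test avginc2 avginc3                      // H0: both coefficients are zero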
Logarithms
A log transformation converts changes in a variable into proportional (percentage) changes. Here's why:
ln(x + Δx) – ln(x) = ln(1 + Δx/x) ≈ Δx/x
(calculus: d ln(x)/dx = 1/x)
Numerically: ln(1.01) = .00995 ≈ .01
I. Linear-log population regression
function
Y = β0 + β1·ln(X)
Now ln(X + ΔX) – ln(X) ≈ ΔX/X,
so ΔY ≈ β1·(ΔX/X),
or β1 ≈ ΔY/(ΔX/X) (small ΔX)
Linear-log case, continued
Yi = β0 + β1·ln(Xi) + ui
Now 100·(ΔX/X) = percentage change in X, so a 1% increase in X
(multiplying X by 1.01) is associated with a .01β1 change in Y:
1% increase in X → .01 increase in ln(X) → .01β1 increase in Y
Example: TestScore vs. ln(Income)
First define the new regressor, ln(Income). The model is now linear in ln(Income), so the linear-log model can be estimated by OLS (see the sketch below):
TestScore-hat = 557.8 + 36.42·ln(Incomei)
                (3.8)   (1.40)
Interpretation: a 1% increase in Income is associated with an increase in TestScore of 0.01×36.42 ≈ 0.36 points.
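A minimal sketch of this estimation in Stata (the variable name lavginc is an assumption):

. generate lavginc = ln(avginc)   // the new regressor ln(Income)
. reg testscr lavginc, r          // linear-log model by OLS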
II. Log-linear population regression
function
ln(Y) = β0 + β1·X
so ΔY/Y ≈ β1·ΔX,
or β1 ≈ (ΔY/Y)/ΔX (small ΔX)
Log-linear case, continued
ln(Yi) = β0 + β1·Xi + ui
For small ΔX, β1 ≈ (ΔY/Y)/ΔX.
Now 100·(ΔY/Y) = percentage change in Y, so a change in X by one unit (ΔX = 1) is associated with a 100·β1% change in Y:
1 unit increase in X → β1 increase in ln(Y) → 100·β1% increase in Y
Note: What are the units of ui and the SER? Fractional (proportional) deviations; for example, SER = .2 means a typical deviation of Y from the regression line is about 20%.
III. Log-log population regression
function
ln(Yi) = β0 + β1·ln(Xi) + ui
so ΔY/Y ≈ β1·(ΔX/X),
or β1 ≈ (ΔY/Y)/(ΔX/X) (small ΔX)
Log-log case, continued
ln(Yi) = β0 + β1·ln(Xi) + ui
In the log-log specification, β1 is the elasticity of Y with respect to X: a 1% change in X is associated with a β1% change in Y.
Example: ln(TestScore) vs. ln(Income)
First define a new dependent variable, ln(TestScore), and the new regressor, ln(Income). The model is now a linear regression of ln(TestScore) against ln(Income), which can be estimated by OLS (see the sketch below):
ln(TestScore)-hat = 6.336 + 0.0554·ln(Incomei)
                    (0.006) (0.0021)
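A minimal sketch of this in Stata (the variable names ltestscr and lavginc are assumptions):

. generate ltestscr = ln(testscr)   // new dependent variable ln(TestScore)
. generate lavginc = ln(avginc)     // new regressor ln(Income)
. reg ltestscr lavginc, r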
Example: ln(TestScore) vs. ln(Income), ctd.
ln(TestScore)-hat = 6.336 + 0.0554·ln(Incomei)
                    (0.006) (0.0021)
The estimated elasticity is 0.0554: a 1% increase in Income is associated with a 0.0554% increase in TestScore.
The log-linear and log-log specifications: [figure not reproduced]
Other nonlinear functions (and nonlinear
least squares)
The foregoing nonlinear regression functions have flaws:
Polynomial: test score can decrease with income
Linear-log: test score increases with income, but without bound
How about a nonlinear function in which test score is always increasing but approaches a maximum score?
Y = β0[1 – e^(–β1(X – β2))] (negative exponential growth)
Nonlinear Least Squares
Models that are linear in the parameters can be estimated by
OLS.
Models that are nonlinear in one or more parameters can be
estimated by nonlinear least squares (NLS) (but not by OLS)
The NLS problem for the proposed specification:

min over β0, β1, β2 of Σ(i=1…n) [Yi – β0(1 – e^(–β1(Xi – β2)))]²

This is a nonlinear minimization problem (a "hill-climbing" problem). How could you solve this?
Guess and check (trial and error)
There are better ways…
Implementation in STATA:
. nl (testscr = {b0=720}*(1 - exp(-1*{b1}*(avginc-{b2})))), r
(obs = 420)
Iteration 0: residual SS = 1.80e+08
Iteration 1: residual SS = 3.84e+07
Iteration 2: residual SS = 4637400
Iteration 3: residual SS = 300290.9    STATA is "climbing the hill"
Iteration 4: residual SS = 70672.13    (actually, minimizing the SSR)
Iteration 5: residual SS = 66990.31
Iteration 6: residual SS = 66988.4
Iteration 7: residual SS = 66988.4
Iteration 8: residual SS = 66988.4
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
b0 | 703.2222 4.438003 158.45 0.000 694.4986 711.9459
b1 | .0552339 .0068214 8.10 0.000 .0418253 .0686425
b2 | -34.00364 4.47778 -7.59 0.000 -42.80547 -25.2018
------------------------------------------------------------------------------
(SEs, P values, CIs, and correlations are asymptotic approximations)
Negative exponential growth: RMSE = 12.675
Linear-log: RMSE = 12.618 (oh well…)
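A sketch of how such an RMSE can be obtained after estimation (the variable names tshat and sqres are assumptions):

. predict tshat                          // fitted values from the last estimation
. generate sqres = (testscr - tshat)^2   // squared residuals
. summarize sqres
. display sqrt(r(mean))                  // RMSE (without a df adjustment)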
Techniques for Estimating the
Parameters of a Nonlinear System
In some nonlinear problems it is convenient to derive the equations (the normal equations) that the least squares estimates must satisfy.
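For a model Y = f(X, θ) + ε with parameters θ = (θ1, …, θp), the normal equations set each partial derivative of the sum of squared residuals to zero:

Σi [Yi – f(Xi, θ̂)] · ∂f(Xi, θ̂)/∂θj = 0,   j = 1, …, p

Unlike the linear case, these equations are generally nonlinear in θ̂ and are solved iteratively (as in the nl example above).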
Demean Method [Quadratic Model]
Balli and Sørensen (2012): demean the variables before forming squared or interaction terms. This reduces the collinearity problem between the level terms and their products.
Demean Method (Balli and Sørensen, 2012), continued
Step 1: Generate the demeaned variables
. summarize fdi hc
. generate dmfdi=fdi-3.140721    // subtract the sample mean of fdi (from summarize)
. generate dmhc=hc-0.4183179     // subtract the sample mean of hc
. generate fdihc=dmfdi*dmhc      // interaction of the demeaned variables
(For these time-series data the demeaned variables are generated manually; no special command is needed.)
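An equivalent sketch using Stata's stored results instead of hard-coded means (dmfdi2 and dmhc2 are hypothetical names):

. quietly summarize fdi
. generate dmfdi2 = fdi - r(mean)   // r(mean) holds the mean from summarize
. quietly summarize hc
. generate dmhc2 = hc - r(mean)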
Step 2: Estimate by OLS with robust standard errors
. reg patent fdi hc fdihc pri gdpc ipr, robust
------------------------------------------------------------------------------
| Robust
patent | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
fdi | -.7771776 .2658697 -2.92 0.006 -1.31749 -.2368653
hc | -.7956572 5.995207 -0.13 0.895 -12.97938 11.38807
fdihc | .1031641 1.624792 0.06 0.950 -3.198811 3.40514
pri | .1420312 .4682599 0.30 0.763 -.8095874 1.09365
gdpc | 4.095409 1.233559 3.32 0.002 1.588516 6.602303
ipr | .5103322 1.827804 0.28 0.782 -3.204212 4.224876
_cons | -31.67487 9.785524 -3.24 0.003 -51.56145 -11.7883
------------------------------------------------------------------------------
Original interaction term (without demeaning)
. generate fdihc1=fdi*hc
------------------------------------------------------------------------------
| Robust
patent | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
fdi | -.8203331 .7250095 -1.13 0.266 -2.29373 .6530635
hc | -1.119668 8.295886 -0.13 0.893 -17.97894 15.7396
fdihc1 | .1031643 1.624791 0.06 0.950 -3.198809 3.405138
pri | .1420312 .46826 0.30 0.763 -.8095875 1.09365
gdpc | 4.095409 1.233559 3.32 0.002 1.588517 6.602302
ipr | .5103321 1.827803 0.28 0.782 -3.204211 4.224875
_cons | -31.53934 7.937862 -3.97 0.000 -47.67101 -15.40766
------------------------------------------------------------------------------
. outreg2 using int.doc
Data (fdihc = dmfdi × dmhc; fdihc1 = fdi × hc)
year    patent  fdi     pri     gdpc    hc      ipr     dmfdi   dmhc    fdihc   fdihc1
1970 1.79176 1.80446 2.95647 8.37316 0.2135 0.462685 -1.33626 -0.20482 0.273691 0.385252
1971 1.94591 1.81262 3.05447 8.40433 0.2219 0.462685 -1.3281 -0.19642 0.260863 0.40222
1972 2.07944 1.76768 3.12632 8.46943 0.231 0.462685 -1.37304 -0.18732 0.257195 0.408334
1973 2.19722 1.92884 3.26308 8.55591 0.2394 0.462685 -1.21188 -0.17892 0.216827 0.461765
1974 1.38629 3.03424 3.27903 8.61184 0.2492 0.462685 -0.10648 -0.16912 0.018007 0.756134
1975 2.3979 2.521 3.47383 8.59634 0.259 0.462685 -0.61972 -0.15932 0.098733 0.652938
1976 2.48491 2.56762 3.46417 8.68268 0.2695 0.462685 -0.57311 -0.14882 0.085288 0.691972
1977 2.63906 2.5234 3.5169 8.73451 0.28 0.462685 -0.61732 -0.13832 0.085386 0.706553
1978 2.77259 2.6067 3.6504 8.77595 0.2912 0.462685 -0.53403 -0.12712 0.067884 0.75907
1979 2.89037 2.59776 3.70696 8.84193 0.3024 0.462685 -0.54296 -0.11592 0.062939 0.785563
1980 3.04452 3.01191 3.89202 8.88966 0.3143 0.462685 -0.12881 -0.10402 0.013399 0.946643
1981 3.17805 3.02916 4.03813 8.93206 0.3199 0.500787 -0.11156 -0.09842 0.01098 0.969027
1982 3.29584 3.0816 4.12099 8.96446 0.3248 0.538888 -0.05912 -0.09352 0.005529 1.000904
1983 3.43399 3.00598 4.23642 8.99907 0.3304 0.57699 -0.13474 -0.08792 0.011846 0.993175
Model (1) Model (2)
Demean Original
VARIABLES patent patent
fdi -0.777*** -0.820
(0.266) (0.725)
hc -0.796 -1.120
(5.995) (8.296)
fdihc 0.103 0.103
(1.625) (1.625)
pri 0.142 0.142
(0.468) (0.468)
gdpc 4.095*** 4.095***
(1.234) (1.234)
ipr 0.510 0.510
(1.828) (1.828)
Constant -31.67*** -31.54***
(9.786) (7.938)
Observations 41 41
R-squared 0.956 0.956
Robust standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
Comparing the two models: demeaning leaves the interaction coefficient (0.103) and its standard error unchanged, but markedly reduces the standard errors on fdi and hc, illustrating the reduced collinearity.