Week 4: Nonlinear Models
Introduction to Econometrics, 5th Edition
C. DOUGHERTY
FALL 2024
LINEARITY AND NONLINEARITY

Y = β1 + β2X2 + β3X3 + β4X4 + u

This model is linear in both variables and parameters.

Y = β1 + β2X2² + β3√X3 + β4 log X4 + u

This model is nonlinear in variables but still linear in parameters. Defining

Z2 = X2²,  Z3 = √X3,  Z4 = log X4

it becomes

Y = β1 + β2Z2 + β3Z3 + β4Z4 + u

With these cosmetic transformations, we have made the model linear in both variables and parameters.
Nonlinear in parameters:

Y = β1 + β2X2 + β3X3 + β2β3X4 + u

This model is nonlinear in parameters, since the coefficient of X4 is the product of the coefficients of X2 and X3. As we will see, some nonlinear models can be linearized by appropriate transformations, but this is not one of them.
We will start with an example of a simple model that can be linearized by a cosmetic transformation. The table displays the average annual employment and GDP growth rates for 31 OECD countries.
e = β1 + β2/g + u

[Figure: scatter plot of employment growth rate against GDP growth rate]

A plot of the data reveals that the relationship is clearly nonlinear. We will consider various nonlinear specifications for the relationship in the course of this chapter, starting with the hyperbolic model shown above.
e = β1 + β2/g + u        z = 1/g        e = β1 + β2z + u

The data table displays the values of z, which are derived from the values of g. In practice, you will not need to do these calculations manually. Regression applications typically include features that allow you to generate new variables based on existing ones.
. gen z = 1/g
. reg e z
----------------------------------------------------------------------------
Source | SS df MS Number of obs = 31
-----------+------------------------------ F( 1, 29) = 13.68
Model | 5.80515811 1 5.80515811 Prob > F = 0.0009
Residual | 12.3041069 29 .424279548 R-squared = 0.3206
-----------+------------------------------ Adj R-squared = 0.2971
Total | 18.109265 30 .603642167 Root MSE = .65137
----------------------------------------------------------------------------
e | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
z | -2.356137 .6369707 -3.70 0.001 -3.658888 -1.053385
_cons | 2.17537 .249479 8.72 0.000 1.665128 2.685612
----------------------------------------------------------------------------
ê = 2.18 − 2.36z

[Figure: employment growth rate plotted against z = 1/g, with the fitted regression line]

The figure shows the transformed data and the regression line for the regression of e on z.
ê = 2.18 − 2.36z = 2.18 − 2.36/g

[Figure: the fitted hyperbolic relationship plotted in the original (g, e) space]
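To see the fitted hyperbolic curve in the original units, one can overlay a function plot on the scatter diagram. A minimal sketch in Stata, assuming the variables e and g are in memory (the plotting range is an arbitrary choice for illustration):

. twoway (scatter e g) (function y = 2.18 - 2.36/x, range(0.3 9)), ytitle("Employment growth rate") xtitle("GDP growth rate")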
RAMSEY’S RESET TEST OF FUNCTIONAL MISSPECIFICATION
Y = β1 + Σ(j=2,…,k) βjXj + u

Ŷ = b1 + Σ(j=2,…,k) bjXj

If Ŷ² is added to the regression specification, it should pick up quadratic and interactive nonlinearity, if present, without necessarily being highly correlated with any of the X variables.

We will do this for a wage equation. Here is the output from a regression of EARNINGS on S and EXP using EAWE Data Set 21. We save the fitted values as FITTED and generate FITTEDSQ as the square.
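A minimal sketch of the RESET procedure in Stata, assuming the data set is loaded (the variable names FITTED and FITTEDSQ follow the text; everything else is standard):

. reg EARNINGS S EXP
. predict FITTED                // fitted values from the original regression
. gen FITTEDSQ = FITTED^2       // square of the fitted values
. reg EARNINGS S EXP FITTEDSQ   // a t test on FITTEDSQ is the RESET test with one added term

Stata also provides the test directly with estat ovtest after the original regression.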
ELASTICITIES AND LOGARITHMIC MODELS
Definition: the elasticity of Y with respect to X is the proportional change in Y per proportional change in X.

elasticity = (dY/Y) / (dX/X) = (dY/dX) / (Y/X) = (slope of the tangent at A) / (slope of OA)

[Figure: a curve in (X, Y) space with a point A on it, the tangent at A, and the line OA joining A to the origin]

This sequence defines elasticities and shows how to fit nonlinear models with constant elasticities. First, the general definition of an elasticity.

The elasticity at any point on the curve is the ratio of the slope of the tangent at that point to the slope of the line joining the point to the origin.
elasticity < 1

In this case, the tangent at A is clearly flatter than the line OA, so the elasticity must be less than 1.
elasticity > 1

In this case, the tangent at A is steeper than OA, so the elasticity is greater than 1.
For the linear function

Y = β1 + β2X

elasticity = (dY/dX) / (Y/X) = β2 / ((β1 + β2X)/X) = β2 / (β1/X + β2)

The tangent at any point is coincident with the line itself, so its slope is always β2. The elasticity depends on the slope of the line joining the point to the origin, and so it varies along the line.
However, a function of the type

Y = β1 X^β2

has the same elasticity for all values of X.
Y = β1 X^β2

dY/dX = β1 β2 X^(β2−1)

Y/X = β1 X^β2 / X = β1 X^(β2−1)

elasticity = (dY/dX) / (Y/X) = β1 β2 X^(β2−1) / (β1 X^(β2−1)) = β2

Hence the elasticity is equal to β2, whatever the value of X.
[Figure: Y = β1 X^β2 plotted for β2 = 0.25, 0.50, 0.75, 1.00, 1.25, 1.50, and 1.75]

By way of illustration, the function is plotted for a range of values of β2, starting with a very low value, 0.25, and increasing in steps of 0.25 to show how the shape of the function changes. When β2 is equal to 1, the curve becomes a straight line through the origin. For β2 = 1.75, note that the curvature can be quite gentle over wide ranges of X.

This means that even if the true model is of the constant elasticity form, a linear model may be a good approximation over a limited range.
Y = β1 X^β2

log Y = log(β1 X^β2) = log β1 + log X^β2 = log β1 + β2 log X

Taking logarithms, you obtain a linear relationship between Y′ = log Y and X′ = log X. All serious regression applications allow you to generate logarithmic variables from existing ones.

The constant term in the fitted regression will be an estimate of log β1. To obtain an estimate of β1 itself, calculate exp(b1′), where b1′ is the estimate of β1′ = log β1. (This assumes that you have used natural logarithms, that is, logarithms to base e, to transform the model.)
[Figure: expenditure on food at home, FDHO, plotted against total household expenditure, EXP, with the fitted linear regression line]

The regression implies that, at the margin, 6.3 cents out of each dollar of expenditure is spent on food at home. Does this seem plausible? Probably, though perhaps a little low.

It also implies that $369 would be spent on food at home if total expenditure were zero. Obviously this is impossible. It might be interpreted as baseline expenditure, but one must take family size and composition into account.
[Figure: LGFDHO plotted against LGEXP]

We will now fit a constant elasticity function using the same data. The scatter diagram shows the logarithm of FDHO plotted against the logarithm of EXP.
. g LGFDHO = ln(FDHO)
. g LGEXP = ln(EXP)
. reg LGFDHO LGEXP
----------------------------------------------------------------------------
Source | SS df MS Number of obs = 6334
-----------+------------------------------ F( 1, 6332) = 4719.99
Model | 1642.9356 1 1642.9356 Prob > F = 0.0000
Residual | 2204.04385 6332 .348080204 R-squared = 0.4271
-----------+------------------------------ Adj R-squared = 0.4270
Total | 3846.97946 6333 .60744978 Root MSE = .58998
----------------------------------------------------------------------------
LGFDHO | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
LGEXP | .6657858 .0096909 68.70 0.000 .6467883 .6847832
_cons | .7009498 .0843607 8.31 0.000 .5355741 .8663254
----------------------------------------------------------------------------
Here is the result of regressing LGFDHO on LGEXP. The first two commands
generate the logarithmic variables.
The fitted equation is

LGFDHO = 0.701 + 0.666 LGEXP

or, in terms of the original variables,

FDHO = 2.02 EXP^0.666

since e^0.701 = 2.02. The estimated elasticity of expenditure on food at home with respect to total expenditure is thus 0.666.

[Figure: LGFDHO plotted against LGEXP, with the fitted regression line]
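As a small check of the retransformation described earlier (a sketch; it assumes the regression of LGFDHO on LGEXP has just been run, so that the stored intercept _b[_cons] is available):

. display exp(_b[_cons])    // e^0.701 = 2.02, the estimate of beta1 in FDHO = beta1*EXP^beta2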
[Figure: FDHO plotted against EXP, with the fitted logarithmic and linear regression lines]

Here is the regression line from the logarithmic regression, plotted in the original scatter diagram together with the linear regression line for comparison.

The logarithmic regression line gives a somewhat better fit, especially at low levels of expenditure.

However, the difference in the fit is not dramatic. The main reason for preferring the constant elasticity model is that it makes more sense theoretically. It also has a technical advantage that we will come to when we discuss heteroskedasticity.
SEMILOGARITHMIC MODELS
Y = β1 e^(β2X)

This sequence introduces the semilogarithmic model and shows how it may be applied to an earnings function. The dependent variable appears in linear form, while the explanatory variables, multiplied by their coefficients, appear as exponents of e.
dY/dX = β1 β2 e^(β2X) = β2 Y

(dY/Y) / dX = β2

Hence the proportional change in Y per unit change in X is equal to β2. It is therefore independent of the value of X.
Strictly speaking, this interpretation is valid only for small values of β2. When β2 is not small, the interpretation is a little more complex.

Suppose X changes by an amount ΔX. Then

Y + ΔY = β1 e^(β2(X+ΔX)) = β1 e^(β2X) e^(β2ΔX) = Y e^(β2ΔX)

Now expand the exponential term using the standard expansion for e raised to a power:

e^Z = 1 + Z + Z²/2! + Z³/3! + ...

Y + ΔY = Y (1 + β2ΔX + (β2ΔX)²/2 + ...)

Subtracting Y from both sides,

ΔY = Y (β2ΔX + (β2ΔX)²/2 + ...)
We now consider two cases: one where β2 and ΔX are so small that (β2ΔX)² is negligible, and the alternative.

If (β2ΔX)² is negligible,

ΔY = Y β2 ΔX  and so  (ΔY/Y) / ΔX = β2

If (β2ΔX)² is not negligible,

(ΔY/Y) / ΔX = β2 + β2²ΔX/2 + ... = β2 + β2²/2 + ...  if ΔX is one unit
Y = β1 e^(β2X)

X = 0  implies  Y = β1 e^0 = β1

so β1 gives the value of Y when X is zero.

log Y = log(β1 e^(β2X)) = log β1 + log e^(β2X) = β1′ + β2X log e = β1′ + β2X

To fit a function of this type, you take logarithms of both sides. The right side of the equation becomes a linear function of X (note that the logarithm of e, to base e, is 1). Hence we can fit the model with a linear regression, regressing log Y on X.
. reg LGEARN S
----------------------------------------------------------------------------
Source | SS df MS Number of obs = 500
-----------+------------------------------ F( 1, 498) = 60.71
Model | 16.5822819 1 16.5822819 Prob > F = 0.0000
Residual | 136.016938 498 .273126381 R-squared = 0.1087
-----------+------------------------------ Adj R-squared = 0.1069
Total | 152.59922 499 .30581006 Root MSE = .52261
----------------------------------------------------------------------------
LGEARN | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
S | .0664621 .0085297 7.79 0.000 .0497034 .0832207
_cons | 1.83624 .1289384 14.24 0.000 1.58291 2.089571
----------------------------------------------------------------------------
Here is the output from a regression of LGEARN, the logarithm of hourly earnings, on S, years of schooling. The estimate of β2 is 0.066. As an approximation, this implies that an extra year of schooling increases hourly earnings by a proportion of 0.066, that is, by 6.6 percent.
If ΔX is one unit,

(ΔY/Y) / ΔX = β2 + β2²/2 + ... = 0.066 + (0.066)²/2 = 0.068

If we recognize that a year of schooling is not a marginal change and work out the effect exactly, the proportional increase is 0.068 and the percentage increase is 6.8 percent.
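A sketch of how the exact percentage effect can be computed from the stored coefficient (it assumes the regression of LGEARN on S has just been run):

. display 100*(exp(_b[S]) - 1)    // exact percentage effect, 100(e^b - 1)%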
However, if a unit change in X is not small, β2ΔX may not be small either, and the second term may not be negligible. In the present case a year of schooling is not a marginal change but, even so, the refinement makes only a small difference.
In general, when β2 is less than 0.1, there is little benefit in working out the effect exactly.
The intercept in the regression is an estimate of log β1. From it we obtain an estimate of β1 equal to e^1.836, which is 6.27.
This implies, literally, that a person with no schooling would earn $6.27 per hour. However, it is dangerous to extrapolate so far from the range for which we have data.
[Figure: hourly earnings ($) plotted against years of schooling (highest grade completed), with the linear and semilogarithmic regression lines]

The fit of the regression lines is not much different, but the semilogarithmic regression is more satisfactory in two respects.

First, the linear specification predicts that hourly earnings will increase by a fixed amount, $1.27, with each additional year of schooling. This is implausible for high levels of education. The semilogarithmic specification allows the increment to increase with the level of education.

Second, the linear specification predicts very low earnings for an individual with no schooling. The semilogarithmic specification predicts hourly earnings of $6.27, which at least is not obvious nonsense.
SUMMARY OF THE INTERPRETATION OF COEFFICIENTS IN DIFFERENT NONLINEAR REGRESSION MODELS

MODEL         X     X+1    Y = f(X)    Y = f(X+1)    X CHANGE    Y CHANGE
Y = 3 + 5X    100   101    503         508           1 unit      5 units

For a semilogarithmic model, the exact percentage change in Y per unit change in X is 100(exp(b) − 1)%; with b = 0.04, for example, 100(e^0.04 − 1)% = 4.08%.
THE DISTURBANCE TERM IN LOGARITHMIC MODELS
Y = β1 + β2/X + u        Z = 1/X        Y = β1 + β2Z + u

Thus far, nothing has been said about the disturbance term in nonlinear regression models.
For the regression results in a linearized model to have the desired properties, the disturbance term in the transformed model should be additive and should satisfy the regression model assumptions.

In the case of the first example of a nonlinear model, there was no problem. If the disturbance term had the required properties in the original model, it will have them in the regression model, since it is not affected by the transformation.
THE DISTURBANCE TERM IN LOGARITHMIC MODELS
Y = 1 X 2 e u = 1 X 2 v
Y = 1 X 2 e u = 1 X 2 v
Y = 1 X 2 e u = 1 X 2 v
For this to be possible, the random component in the original model must be
a multiplicative term, eu.
7
THE DISTURBANCE TERM IN LOGARITHMIC MODELS
Y = 1 X 2 e u = 1 X 2 v
8
Y = β1 X^β2 e^u = β1 X^β2 v

log Y = log β1 + β2 log X + u

[Figure: the density function f(v) of the multiplicative disturbance term v = e^u]
Similarly, for the semilogarithmic model:

Y = β1 e^(β2X) e^u = β1 e^(β2X) v

log Y = log β1 + β2X + u

[Figure: the density function f(v) of v = e^u]

Note that, with this distribution, one should expect a small proportion of observations to be subject to large positive random effects.
[Figure: hourly earnings ($) plotted against years of schooling (highest grade completed)]

Here is the scatter diagram for earnings and schooling using Data Set 21. You can see that there are several outliers, with the three most extreme highlighted.
[Figure: the logarithm of hourly earnings plotted against years of schooling, with the fitted regression line]

Here is the scatter diagram for the semilogarithmic model with its regression line. The same three observations remain outliers, but they do not appear to be so extreme.
[Figure: histograms of the standardized residuals from the linear and semilogarithmic regressions]

The histogram compares the distributions of the residuals from the linear and semilogarithmic regressions. To make them comparable, the distributions have been standardized, that is, scaled so that their standard deviations are equal to 1.
What would happen if the disturbance term in the original model were additive, rather than multiplicative?

Y = β1 X^β2 + u

log Y = log(β1 X^β2 + u)

If this were the case, we could not linearize the model by taking logarithms. There is no way of simplifying log(β1 X^β2 + u). We should have to use some nonlinear regression technique instead.
COMPARING LINEAR AND LOGARITHMIC SPECIFICATIONS

Y = β1 + β2X + u

log Y = β1 + β2X + u

The fits of models with Y and log Y as the dependent variable cannot be compared directly, because the residual sums of squares are measured in different units. However, the goodness of fit of models with linear and logarithmic versions of the same dependent variable can be compared indirectly by subjecting the dependent variable to the Box–Cox transformation and fitting the model shown.

Box–Cox transformation:

(Y^λ − 1)/λ = β1 + β2X + u

This is a family of specifications that depends on the parameter λ. Like the other parameters, λ is determined empirically. The model is nonlinear in parameters, so a nonlinear regression method must be used. In practice, maximum likelihood estimation is used.
(Y^λ − 1)/λ = Y − 1   when λ = 1

(Y^λ − 1)/λ → log Y   when λ → 0

When λ = 1, the transformed variable is simply Y − 1, so the specification is effectively the linear model. As λ tends to 0, the transformation tends to log Y, so the specification becomes the logarithmic model.

So one could fit the general model and see whether λ is close to 0 or close to 1. Of course, 'close' has no precise meaning, so to approach the issue technically one should test the hypotheses λ = 0 and λ = 1.
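As a check on the limiting case λ → 0 (a short derivation, not in the slides, using the series expansion of e^Z introduced in the discussion of semilogarithmic models):

(Y^λ − 1)/λ = (e^(λ log Y) − 1)/λ = (λ log Y + (λ log Y)²/2! + ...)/λ = log Y + λ(log Y)²/2 + ...  →  log Y  as λ → 0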
The most satisfactory outcome would be for one of the hypotheses to be rejected and the other not rejected. Of course, at your chosen significance level, it is also possible that neither is rejected, or that both are.
If you are interested only in comparing the fits of the linear and logarithmic specifications, there is a short-cut procedure that involves only standard least squares regressions.
Y* = Y / (geometric mean of Y)

Y* = β1′ + β2′X + u

log Y* = β1′ + β2′X + u

You scale the dependent variable by dividing by its geometric mean, and then run the two regressions with Y* and loge Y* as the dependent variables, leaving the right side of the equation unchanged. (The parameters have been given prime marks to emphasize that the coefficients will not be estimates of the original β1 and β2.)
The residual sums of squares of the two regressions are now directly comparable. The specification with the smaller RSS therefore provides the better fit.

We will use the transformation to compare the fits of the linear and semilogarithmic versions of a simple wage equation using EAWE Data Set 21.
COMPARING LINEAR AND LOGARITHMIC SPECIFICATIONS
1 1
log Yi log (Y1Y2 ...Yn )
e n
=e n
1 1
log ( )
=e Y1Y2 ...Yn n
= (Y1Y2 ...Yn ) n
The first step is to calculate the dependent variable's geometric mean. The
easiest way to do this is to take the exponential mean of the dependent
variable's log. 18
The first equality uses the fact that the sum of the logarithms is the logarithm of the product. The next step uses the rule that a log X is the same as log X^a. Finally, we use the fact that the exponential of the logarithm of X reduces to X.
. sum LGEARN

We use the sum (summarize) command to obtain the mean of LGEARN, take its exponential to obtain the geometric mean of EARNINGS, and generate the scaled variables. We then run the parallel regressions. For the regression with LGEARNST as the dependent variable, the residual sum of squares is 131.4, smaller than that for the scaled linear version, so we conclude that the semilogarithmic version gives the better fit.
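A minimal sketch of the whole short-cut comparison in Stata. The name EARNST and the use of S alone on the right side are assumptions for illustration; LGEARN, EARNINGS, S, and LGEARNST appear in the text.

. sum LGEARN
. gen EARNST = EARNINGS/exp(r(mean))   // divide by the geometric mean of EARNINGS
. gen LGEARNST = ln(EARNST)
. reg EARNST S
. reg LGEARNST S

The residual sums of squares of the two regressions are then directly comparable.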
BOX–COX TESTS

y = β1 + β2x + u        log y = β1 + β2x + u

y* = y / (geometric mean of y)

(n/2) loge(larger RSS / smaller RSS) ≈ χ²(1)

χ²crit = 10.83 with 1 d.f. at the 0.1% significance level
------------------------------------------------------------------------------
EARNINGS | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
/theta | .1088657 .05362 2.03 0.042 .0037726 .2139589
------------------------------------------------------------------------------
---------------------------------------------------------
Test          Restricted       LR statistic    P-value
H0:           log likelihood   chi2            Prob > chi2
---------------------------------------------------------
theta = -1    -2025.7902       480.77          0.000
theta =  0    -1787.4901         4.17          0.041
theta =  1    -1912.8953       254.98          0.000
---------------------------------------------------------

Here is the output for the complete Box–Cox regression. The parameter that we have denoted λ (lambda) is called theta by Stata. It is estimated at 0.11. Since it is closer to 0 than to 1, it indicates that the dependent variable should be logarithmic rather than linear.
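The output above appears to come from Stata's boxcox command, which fits the transformation by maximum likelihood. A hedged sketch of the call (the regressor list is an assumption; the text does not show the command):

. boxcox EARNINGS S EXP, model(lhsonly)   // transform the dependent variable only; Stata calls the parameter theta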
However, even the value of 0 does not (quite) lie in the 95 percent confidence interval. (The log-likelihood tests will be explained in Chapter 10.)
QUADRATIC EXPLANATORY VARIABLES

Y = β1 + β2X2 + β3X2² + u

dY/dX2 = β2 + 2β3X2

Differentiating the equation with respect to X2, one obtains the change in Y per unit change in X2. Thus the impact of a unit change in X2 on Y, (β2 + 2β3X2), is a function of X2.

This means that the interpretation of β2 differs from that in the ordinary linear model, where β2 represents the effect of a unit change in X2 on Y, without qualification. In this model, β2 should be interpreted as the effect of a unit change in X2 on Y for the special case where X2 = 0. For nonzero values of X2, the marginal effect will be different.
The model may also be written

Y = β1 + (β2 + β3X2) X2 + u

If β3 > 0, Y may have a minimum point. If β3 < 0, Y may have a maximum point.
We will illustrate this with the earnings function. The table gives the output of a quadratic regression of earnings on schooling (SSQ is defined as the square of schooling).
-------------------------
EARNINGS     |      Coef.
-------------+-----------
S            |   .1910651
SSQ          |   .0366817
_cons        |   8.358401
-------------------------

[Figure: hourly earnings ($) plotted against years of schooling, with the fitted quadratic curve]

The quadratic relationship is illustrated in the figure. Over the range of the actual data, it fits the observations tolerably well. The fit is not dramatically different from that of the linear and semilogarithmic specifications.

The data on employment growth rate, e, and GDP growth rate, g, for 25 OECD countries in Exercise 1.5 provide another example of the use of a quadratic function.
The output from a quadratic regression is shown, where gsq has been defined as the square of g.
-------------------------
e            |      Coef.
-------------+-----------
g            |   .6616232
gsq          |  -.0490589
_cons        |  -.2576489
-------------------------

[Figure: employment growth rate plotted against GDP growth rate, with the fitted hyperbolic and quadratic curves]

The only defect is that it predicts that the fitted value of e starts to fall when g exceeds 7.
[Figure: employment growth rate plotted against GDP growth rate, with the fitted quadratic, cubic, and quartic curves]

The second reason follows from the first. Higher-order terms will improve the fit, but because these terms are not theoretically justified, the improvement will be sample-specific.
Third, unless the sample is very small, the fits of higher-order polynomials are unlikely to be very different from those of a quadratic over the main part of the data range.
The figure illustrates these points by comparing cubic and quartic regressions with the quadratic regression. Over the main data range, from g = 1.5 to g = 5, the cubic and quartic fits are very similar to that of the quadratic.
R² for the quadratic specification is 0.334. For the cubic and quartic specifications it is 0.345 and 0.355, respectively, relatively small improvements.
Further, the cubic and quartic curves both exhibit implausible characteristics.
As g increases, the slope of the cubic first diminishes and then increases, for which there is no reasonable explanation. The quartic curve declines for values of g from 5 to 7 and then exhibits a strange upward twist at its end.
INTERACTIVE EXPLANATORY VARIABLES
Y = β1 + β2X2 + β3X3 + β4X2X3 + u

The model shown above is linear in parameters and may be fitted using straightforward OLS, provided that the regression model assumptions are satisfied. However, its nonlinearity in variables has implications for the interpretation of the parameters.

When multiple regression was introduced at the beginning of the previous chapter, it was stated that the slope coefficients represent the separate, individual marginal effects of the variables on Y, holding the other variables constant.

In this model, such an interpretation is not possible. In particular, it is not possible to interpret β2 as the effect of X2 on Y, holding X3 and X2X3 constant, because it is not possible to hold both X3 and X2X3 constant if X2 changes.
Y = β1 + (β2 + β4X3) X2 + β3X3 + u

To obtain a proper interpretation of the coefficients, we can rewrite the model as shown. The coefficient of X2, (β2 + β4X3), can now be interpreted as the marginal effect of X2 on Y, conditional on the value of X3. Hence β2 may be interpreted as the marginal effect of X2 on Y when X3 is equal to zero.
Y = β1 + β2X2 + (β3 + β4X2) X3 + u

The model may equally be rewritten as shown in the third line. From this it may be seen that the marginal effect of X3 on Y, conditional on the value of X2, is (β3 + β4X2), and that β3 may be interpreted as the marginal effect of X3 on Y when X2 is equal to zero.
If X2 = 0 or X3 = 0 lies far outside the range of the data in the sample, these interpretations of β2 and β3 may be of little practical interest. This can make it difficult to compare the estimates of the effects of X2 and X3 on Y in models excluding and including the interactive term.
X2* = X2 − X̄2        X3* = X3 − X̄3
X2 = X2* + X̄2        X3 = X3* + X̄3

One way of mitigating the problem is to rescale X2 and X3 so that they are measured from their sample means.
Y = β1 + β2(X2* + X̄2) + β3(X3* + X̄3) + β4(X2* + X̄2)(X3* + X̄3) + u

β1* = β1 + β2X̄2 + β3X̄3 + β4X̄2X̄3        β2* = β2 + β4X̄3        β3* = β3 + β4X̄2

Y = β1* + β2*X2* + β3*X3* + β4X2*X3* + u

Substituting for X2 and X3, the model may be written as shown, with new parameters defined in terms of the original ones.
Y = β1* + (β2* + β4X3*) X2* + β3*X3* + u

Y = β1* + β2*X2* + (β3* + β4X2*) X3* + u

The point of doing this is that the coefficients of X2 and X3 now give the marginal effects of the variables when the other variable is held at its sample mean, which is, to some extent, a representative value.
X3* = 0  ⇔  X3 = X̄3

For example, it can be seen that β2* gives the marginal effect of X2*, and hence of X2, when X3* = 0, that is, when X3 is at its sample mean.

X2* = 0  ⇔  X2 = X̄2

Similarly, β3* gives the marginal effect of X3*, and hence of X3, when X2 is at its sample mean.
We will illustrate the analysis with a wage equation in which the logarithm of hourly earnings is regressed on years of schooling and work experience. We start with a straightforward linear specification using EAWE Data Set 21.
The interactive variable SEXP is defined as the product of S and EXP, and the regression is performed again, including this term.
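A sketch of the corresponding commands, assuming the data set is loaded (SEXP, S, EXP, and LGEARN are the names used in the text):

. gen SEXP = S*EXP
. reg LGEARN S EXP SEXP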
The schooling coefficient has fallen. It has also changed its meaning: it now estimates the impact of an extra year of schooling for individuals without work experience.
The experience coefficient has fallen sharply, and its meaning has also changed. It now refers to individuals with no schooling, and every individual in the sample had at least 8 years.
The SEXP coefficient indicates that the schooling coefficient falls by 0.0012 (0.12 percent) for every additional year of work experience. Equally, it indicates that the experience coefficient falls by 0.12 percent for every extra year of schooling.
. sum S EXP
. gen S1 = S - 14.866
. gen EXP1 = EXP - 6.445
. gen SEXP1 = S1*EXP1

Here is the regression without the interactive term. The top half of the output is identical to that when LGEARN was regressed on S and EXP. What differences do you expect in the bottom half?
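The regressions referred to here and in the following slides would be run on the mean-extracted variables, for example (a sketch using the names defined above):

. reg LGEARN S1 EXP1          // without the interactive term
. reg LGEARN S1 EXP1 SEXP1    // with the interactive term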
The slope coefficients (and their standard errors and t statistics) are precisely the same as before. Only the intercept has been changed by subtracting the means from S and EXP.
Here is the output from the regression using S and EXP with their means extracted, together with their interactive term. The top half of the output is identical to that when LGEARN was regressed on S, EXP, and SEXP.
However, the bottom half is different. The coefficients of S1 and EXP1 measure the effects of those variables for the mean value of the other variable, that is, for a 'typical' individual. The coefficients of S and EXP measure their effects when the other variable is zero.
As before, the coefficient of the interactive term measures the change in the schooling coefficient per unit (one year) change in experience, and it is unaffected by the extraction of the means. It equally measures the change in the experience coefficient per unit change in schooling.
With the means-extracted variables, we can see more clearly the impact of including the interactive term.
Here, again, are the corresponding results with the original variables for comparison, where the introduction of the interactive term appears to have a much larger effect.
NONLINEAR REGRESSION
Y = β1 + β2 X^β3 + u

The model shown is nonlinear in parameters. It may be fitted by nonlinear least squares, an iterative procedure. You start by guessing plausible values for the parameters and calculating the fitted value of Y for each observation. You then calculate the residual for each observation in the sample, and hence RSS, the sum of the squares of the residuals.
You then make small changes in one or more of your estimates of the parameters.

Using the new estimates of β1, β2, and β3, you recalculate the fitted values of Y. Then you recalculate the residuals and RSS.

If RSS is smaller than before, your new estimates of the parameters are better than the old ones and you continue adjusting your estimates in the same direction. Otherwise, you try different adjustments.

You repeat the last three steps again and again until you are unable to make any changes in the estimates of the parameters that would reduce RSS.

You then conclude that you have minimized RSS, and you can describe the final estimates of the parameters as the least squares estimates.
We will return to the relationship between the employment growth rate, e, and the GDP growth rate, g, used as an example in the first slideshow for this chapter. e and g are hypothesized to be related as shown:

e = β1 + β2/g + u

[Figure: employment growth rate plotted against GDP growth rate]
[Figure: RSS as a function of the estimate of β2, conditional on b1 = 3; the minimum is at –4.22]

Holding b1 at an initial guess of 3, we choose the value of b2 that minimizes RSS. The figure shows that this value is –4.22.

[Figure: RSS as a function of the estimate of β1, conditional on b2 = –4.22; the minimum is at 2.82]

Next, holding b2 at –4.22, we look to improve our guess of b1. The figure shows RSS as a function of b1, conditional on b2 = –4.22. We see that the optimal value of b1 is 2.82.

In the limit, the estimates must converge on the values obtained from the transformed linear regression shown in the first slideshow for this chapter: b1 = 2.18 and b2 = –2.36. The same criterion, the minimization of RSS, has determined them. All we have done is use a different method.
e = β1 + β2/g + u

. nl (e = {beta1} + {beta2}/g)
(obs = 31)

Iteration 0:  residual SS = 12.30411
Iteration 1: residual SS = 12.30411
----------------------------------------------------------------------------
Source | SS df MS
-----------+------------------------------ Number of obs = 31
Model | 5.80515805 1 5.80515805 R-squared = 0.3206
Residual | 12.304107 29 .42427955 Adj R-squared = 0.2971
-----------+------------------------------ Root MSE = .6513674
Total | 18.109265 30 .603642167 Res. dev. = 59.32851
----------------------------------------------------------------------------
e | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
/beta1 | 2.17537 .249479 8.72 0.000 1.665128 2.685612
/beta2 | -2.356136 .6369707 -3.70 0.001 -3.658888 -1.053385
----------------------------------------------------------------------------
ê = 2.18 − 2.36z = 2.18 − 2.36/g

The output is effectively the same as the output from the linearized regression of e on z = 1/g shown in the first slideshow for this chapter.
e = β1 + β2/g + u

[Figure: the fitted hyperbolic function plotted with the scatter of employment growth rate against GDP growth rate]

The hyperbolic function imposes the constraint that the function plunges to minus infinity (for positive g) as g approaches zero.
e = β1 + β2/(β3 + g) + u

. nl (e = {beta1} + {beta2}/({beta3} + g))
(obs = 31)

Iteration 0:  residual SS = 12.30411
Iteration 1:  residual SS = 12.27327
.....................................
Iteration 8:  residual SS = 11.98063
----------------------------------------------------------------------------
Source | SS df MS
-----------+------------------------------ Number of obs = 31
Model | 6.12863996 2 3.06431998 R-squared = 0.3384
Residual | 11.9806251 28 .427879466 Adj R-squared = 0.2912
-----------+------------------------------ Root MSE = .654125
Total | 18.109265 30 .603642167 Res. dev. = 58.5026
----------------------------------------------------------------------------
e | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-----------+----------------------------------------------------------------
/beta1 | 2.714411 1.017058 2.67 0.013 .6310616 4.79776
/beta2 | -6.140415 8.770209 -0.70 0.490 -24.10537 11.82454
/beta3 | 1.404714 2.889556 0.49 0.631 -4.514274 7.323702
----------------------------------------------------------------------------
This feature can be relaxed by using the variation shown. Unlike the previous function, this one cannot be linearized by any transformation, so nonlinear regression must be used.
The output for this specification is shown above, with most of the iteration messages deleted.
e = β1 + β2/(β3 + g) + u

[Figure: the original and new hyperbolic functions plotted with the scatter of employment growth rate against GDP growth rate]

The figure compares the original (black) and new (red) hyperbolic functions. The overall fit is not significantly improved, but the specification does seem more satisfactory.