
EC501 Econometric Methods

4. Comparing Regression Models and Limited Dependent Variables

Marcus Chambers

Department of Economics
University of Essex

02 November 2023

1 / 30
Outline

Review

Selecting regressors

Non-nested models

Testing the functional form

Limited dependent variables (LDVs)

Binary choice and latent variable models

An example

Reference: Verbeek, chapters 3 and 7.

2 / 30
Review
Our focus so far has been estimation and inference in the linear
regression model, y = Xβ + ϵ.
The ordinary least squares (OLS) estimator is:
• BLUE under strong assumptions; and,
• consistent under weaker ones.
One of these assumptions is that the model is correctly
specified.
This raises the question of how to select regressors and
compare models, and how to test whether the functional form is
correct.
We have also implicitly assumed that the dependent variable is
continuous.
But sometimes this is not the case and new approaches are
needed to deal with limited dependent variables (LDVs).
3 / 30
Comparing models
Last week we considered the two models

y = X1 β1 + ϵ, (1)

y = X1 β1 + X2 β2 + ϵ, (2)

where y : N × 1, ϵ : N × 1; X1 : N × (K − J), β1 : (K − J) × 1; X2 : N × J, β2 : J × 1.
We noted that (1) is obtained from (2) by setting the J elements
of the vector β2 equal to zero – (1) is nested within (2).
The J linear restrictions β2 = 0 can be tested using an F-test.
But suppose we estimate (1) when (2) is true – what are the
properties of the OLS estimator?
This is a case of omitted variables (those in X2 ).

4 / 30
OLS with omitted variables
We know that, in the regression (1), the OLS estimator is

b1 = (X1′ X1 )−1 X1′ y

= (X1′ X1 )−1 X1′ (X1 β1 + X2 β2 + ϵ)

= β1 + (X1′ X1 )−1 X1′ X2 β2 + (X1′ X1 )−1 X1′ ϵ.

Making our ‘usual’ assumptions we find that

E{b1} = β1 + E{(X1′X1)−1 X1′X2} β2,

where the second term is assumed to be non-zero, i.e. the OLS estimator is biased.
This is the omitted variable bias.
It is therefore important not to omit any relevant variables.
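
A small simulation makes the bias concrete. In this sketch all variable names and parameter values are made up for illustration: x2 is a relevant regressor correlated with x1, so omitting it biases the OLS coefficient on x1.

# Simulated illustration of omitted variable bias (illustrative values)
set.seed(123)
N  <- 1000
x1 <- rnorm(N)
x2 <- 0.8 * x1 + rnorm(N)               # relevant regressor, correlated with x1
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(N)    # true model contains both regressors
coef(lm(y ~ x1 + x2))["x1"]             # close to the true value of 2
coef(lm(y ~ x1))["x1"]                  # biased towards 2 + 0.8 * 3 = 4.4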

5 / 30
OLS with irrelevant variables

Suppose that, instead, we estimate (2) when (1) is true – we
are including irrelevant variables (those in X2 ).
It can be shown that OLS is an unbiased estimator of β1 and β2
(whose true value is 0).
However, despite OLS being unbiased, it is still better to
estimate (1) rather than (2).
This is because the variance of the estimator will be smaller in
the correctly-specified regression (1).
To summarise: omitting relevant variables leads to a biased
estimator, while including irrelevant variables leads to an
increased variance.
This raises the question: how should we select regressors?

6 / 30
Selecting regressors

Typically we will have a set of potentially relevant regressors
suggested by economic theory.
The general-to-specific modelling strategy starts with a general
unrestricted model (GUM) containing all possible regressors.
The aim is to reduce the GUM in size and complexity by testing
appropriate restrictions.
If a sufficiently general model is capable of describing reality
then a more parsimonious model is an improvement if it can
convey the same information in a simpler form.
The reverse approach – specific-to-general – is not to be
recommended.

7 / 30
Comparing models

In addition to statistical tests we can compare models by other
statistical criteria.
We have already seen one statistic that can be used:

R̄2 = 1 − [(1/(N−K)) Σi ei²] / [(1/(N−1)) Σi (yi − ȳ)²],

where the sums run over i = 1, …, N; this provides a trade-off
between the fit (as measured by Σi ei²) and the parsimony of the
model (as measured by K).
We might select the model with the largest R̄2 value.

8 / 30
Information criteria
Other model comparison methods include information criteria.
These, too, provide a trade-off between model fit and
parsimony.
Examples include Akaike’s Information Criterion (AIC):

AIC = log((1/N) Σi ei²) + 2K/N,

and the Schwarz Bayesian Information Criterion (BIC):

BIC = log((1/N) Σi ei²) + (K/N) log N.

Models with smaller AIC or BIC are usually preferred.


Note: the dependent variable must be the same in all models.
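
These formulas are easy to compute from a fitted lm object. The sketch below uses a hypothetical function name; note that R's built-in AIC() and BIC() use the full Gaussian log-likelihood, so their values differ from those above, although the model rankings agree for a fixed sample size.

# Compute the AIC and BIC defined above from an lm fit (a sketch)
info_criteria <- function(fit) {
  e <- residuals(fit)
  N <- length(e)
  K <- length(coef(fit))
  c(AIC = log(sum(e^2) / N) + 2 * K / N,
    BIC = log(sum(e^2) / N) + (K / N) * log(N))
}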
9 / 30
Non-nested models

In the above we have assumed that the models are nested,
i.e. that one can be obtained from the other by imposing
(typically linear) restrictions.
Sometimes this might not be the case, so how can we choose
between them?
Consider the following two models:

Model A: yi = xi′ β + w′i α + ϵi ,

Model B: yi = xi′ β + z′i γ + ui .

The models have different sets of regressors (although xi is
common to both) and no restrictions on either model will yield
the other – they are non-nested.

10 / 30
Non-nested F-test

The validity of Model A can be assessed in the regression

yi = xi′ β + w′i α + z′i δB + ϵi

by testing δB = 0; if this is true we obtain Model A.


Similarly, the validity of Model B can be assessed in the
regression
yi = xi′ β + z′i γ + w′i δA + ui
by testing δA = 0; if this is true we obtain Model B.
These are both straightforward F-tests in the encompassing
regressions.
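
A sketch of the first test in R, assuming a hypothetical data frame dat containing y and single regressors x, w and z:

# Encompassing F-test of Model A: do the z variables add anything?
modelA <- lm(y ~ x + w, data = dat)
encomp <- lm(y ~ x + w + z, data = dat)
anova(modelA, encomp)   # F-test of delta_B = 0; rejection casts doubt on Model A

Reversing the roles of w and z gives the test of Model B; the lmtest package's encomptest() performs both comparisons at once.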

11 / 30
J-test

An alternative approach is the J-test based on the artificial
nesting model

yi = xi′ β + (1 − δ)w′i α + δz′i γ + vi .

If δ = 0 we obtain Model A; if δ = 1 we obtain Model B.


However, we can’t separately identify δ, α and γ in this
regression.
A solution is to first estimate Model B, obtaining γ̂, and then
estimate
yi = xi′ β + w′i α∗ + δz′i γ̂ + vi
where α∗ = (1 − δ)α.
The null hypothesis δ = 0 can be tested using a t-test.
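
Because xi appears in both models, the test regression can be run by adding Model B's fitted values to Model A; a t-test on their coefficient is then the test of δ = 0. A sketch with the same hypothetical data frame dat:

# Davidson-MacKinnon J-test of Model A against Model B (sketch)
modelB <- lm(y ~ x + z, data = dat)
jreg   <- lm(y ~ x + w + fitted(modelB), data = dat)
summary(jreg)   # t-statistic on fitted(modelB) tests delta = 0

The lmtest package's jtest() automates the test in both directions.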

12 / 30
Logarithms versus levels
The non-nested approach can also be used to distinguish
between logarithmic and levels regressions.
Consider the two models:

Model LIN: yi = xi′ β + ϵi , fitted value ŷi ;

Model LOG: log yi = xi′ γ + ui , fitted value log ỹi .

Model LIN is valid if δLIN = 0 in

yi = xi′β + δLIN (log ŷi − log ỹi) + ϵi.

Similarly, Model LOG is valid if δLOG = 0 in

log yi = xi′γ + δLOG (ŷi − exp(log ỹi)) + ui.

These are both simple t-tests.
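
A sketch of both tests with hypothetical data (the fitted values of Model LIN must be positive for their logarithm to exist):

# Levels vs logs tests (sketch)
lin  <- lm(y ~ x, data = dat)
logm <- lm(log(y) ~ x, data = dat)
dat$d_lin <- log(fitted(lin)) - fitted(logm)   # log(yhat) - logyhat
dat$d_log <- fitted(lin) - exp(fitted(logm))   # yhat - exp(logyhat)
summary(lm(y ~ x + d_lin, data = dat))         # t-test of delta_LIN = 0
summary(lm(log(y) ~ x + d_log, data = dat))    # t-test of delta_LOG = 0

The closely related PE test is available as petest() in the lmtest package.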

13 / 30
Testing the functional form

A linear regression model is a model of the conditional mean
which assumes that E{yi |xi } = xi′ β.
The Ramsey RESET test is based on the idea that, if the
conditional mean is correctly specified, then non-linear
functions of ŷi = xi′ b should not help in explaining yi .
It is based on the auxiliary regression

yi = xi′β + α2 ŷi² + α3 ŷi³ + . . . + αQ ŷi^Q + vi.

An F-test can be used to test the Q − 1 restrictions

H0 : α2 = α3 = . . . = αQ = 0.

Usually the test is performed with Q = 2 or Q = 3.
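
In R the test is available in the lmtest package; a sketch for a hypothetical fitted model fit:

# Ramsey RESET test; power = 2:3 corresponds to Q = 3
library(lmtest)
resettest(fit, power = 2:3, type = "fitted")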

14 / 30
Non-linear models
If the RESET test rejects E{yi |xi } = xi′ β then what can we do?
One response would be to estimate a non-linear specification
for the conditional mean of the form

E{yi |xi } = g(xi , β)

for some (known) function g(·).


An example with a scalar xi would be g(xi, β) = β1 + β2 xi^β3.
We could estimate the parameters using non-linear least
squares obtained by minimising
S(β̃) = Σi (yi − g(xi, β̃))², summing over i = 1, …, N.

This is, of course, more complicated than OLS!
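
R's nls() function implements non-linear least squares. A sketch for the example above, with hypothetical data and starting values (starting values matter for convergence, and x should be positive if β3 is non-integer):

# Non-linear least squares for g(x, beta) = b1 + b2 * x^b3 (sketch)
nlsfit <- nls(y ~ b1 + b2 * x^b3, data = dat,
              start = list(b1 = 0, b2 = 1, b3 = 1))
summary(nlsfit)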

15 / 30
Limited dependent variables (LDVs)

Many (dependent) variables are treated as being continuous,
e.g. GDP, exports, log(wages) etc.
But some variables are not, such as:
• employment status (employed or unemployed, a binary
variable);
• home ownership (home owner or not, also binary).
In such cases the linear regression model is generally not
appropriate.
This usually arises with microeconomic data but can also apply
to certain time series.

16 / 30
Motivating example

Suppose we wish to model family home ownership in terms of


income.
A sample of N households is available, and for each we
observe their income, xi2 .
The observed dependent variable is of the form:
yi = 1 if household i owns their home;
yi = 0 if household i doesn’t own their home.

Now consider the linear regression model

yi = β1 + β2 xi2 + ϵi = xi′ β + ϵi , i = 1, . . . , N,

where xi = (1, xi2 )′ and ϵi is the regression disturbance.

17 / 30
Rogue probabilities!

If E{ϵi |xi } = 0 then E{yi |xi } = xi′ β.


However, because yi is binary, using (S1) we obtain

E{yi |xi } = 0 · P{yi = 0|xi } + 1 · P{yi = 1|xi }

= P{yi = 1|xi }.

But E{yi |xi } = xi′ β which implies P{yi = 1|xi } = xi′ β in this
model.
We know that a probability must lie between 0 and 1 and xi′ β is
not restricted to have this property in this model.
Predicted probabilities from the model could therefore be
negative or greater than one!

18 / 30
Values of ϵi

Another issue is that ϵi only has two possible outcomes,
because

P{yi = 1|xi } = P{ϵi = 1 − xi′ β|xi } = xi′ β,

P{yi = 0|xi } = P{ϵi = −xi′ β|xi } = 1 − xi′ β.


In addition the variance can be shown to be

V{ϵi |xi } = xi′ β(1 − xi′ β)

which is not constant and also depends on β.


We shall examine various ways to handle these issues that
arise in such data sets.

19 / 30
Binary choice model
A binary choice model is of the form

P{yi = 1|xi } = F(xi′ β), i = 1, . . . , N.

The linear combination xi′ β is transformed by the function F(·)
to ensure that the probability lies between 0 and 1.
A natural candidate for F(·) is a distribution function.
If W is a continuous random variable then its (cumulative)
distribution function satisfies

0 ≤ F(w) = P{W ≤ w} ≤ 1, −∞ < w < ∞.

F(·) is a non-decreasing function that is related to the
probability density function, f(·), as follows:

F(w) = ∫₋∞ʷ f(z) dz, −∞ < w < ∞.

20 / 30
Common choices of F(·)
Some common choices of F(·) result in the following models:
Probit model (standard normal distribution function):
F(w) = Φ(w) = ∫₋∞ʷ (1/√(2π)) exp(−t²/2) dt;

Logit model (logistic distribution function):

F(w) = Λ(w) = eʷ / (1 + eʷ);

Linear probability model (uniform distribution function):

F(w) = 0 for w < 0;  F(w) = w for 0 ≤ w ≤ 1;  F(w) = 1 for w > 1.

21 / 30
The distribution functions

The Probit, Logit and linear probability distribution functions
are depicted below:

[Figure: F(w) plotted against w ∈ [−5, 5] for the Probit, Logit and linear probability functions.]

The Probit and Logit curves intersect at w = 0 where, for both,
F(0) = 0.5.
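
The figure is straightforward to reproduce in base R, since pnorm() and plogis() are the Probit and Logit distribution functions:

# Plot the three choices of F(w) over [-5, 5]
w <- seq(-5, 5, length.out = 201)
plot(w, pnorm(w), type = "l", xlab = "w", ylab = "F(w)")   # Probit
lines(w, plogis(w), lty = 2)                               # Logit
lines(w, pmin(pmax(w, 0), 1), lty = 3)                     # Linear probability
legend("topleft", legend = c("Probit", "Logit", "Linear probability"), lty = 1:3)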

22 / 30
Parameter interpretation

Interpretation of the parameters can be tricky.


One approach is to consider the marginal effects of changes in
a (continuous) explanatory variable, say xik .
Let ϕ(·) denote the pdf of a standard normal variable (S15).
For the models we have defined earlier we obtain:

Probit: ∂Φ(xi′β)/∂xik = ϕ(xi′β) βk;

Logit: ∂Λ(xi′β)/∂xik = [e^(xi′β) / (1 + e^(xi′β))²] βk;

Linear: ∂(xi′β)/∂xik = βk (or 0, when xi′β lies outside [0, 1]).
It is more complicated when interactions are present e.g. xi2 xi3 .

23 / 30
Latent variables
Sometimes, however, the dependent variable of interest is not
actually observed, or latent.
For example, suppose the utility difference (y∗i ) between having
a job and not having one is a function of (observable)
characteristics (xi ).
The individual chooses to work if y∗i > 0.
The model might be of the form

y∗i = xi′ β + ϵi , i = 1, . . . , N.

But what we actually observe is employment status:

yi = 1 if y∗i > 0 (employed);
yi = 0 if y∗i ≤ 0 (unemployed).

24 / 30
A binary choice representation
Let F(·) denote the cdf of ϵi . Then

P(yi = 1) = P(y∗i > 0) = P(xi′ β + ϵi > 0)

= P(ϵi > −xi′ β)

= 1 − F(−xi′ β).

Note: Verbeek, equation (7.9), expresses this probability in
terms of the distribution function of −ϵi .
If ϵi ∼ N(0, 1) we have a probit model.
If ϵi is logistic we have a logit model.
Writing the latent variable model in this way yields a binary
choice model.
Such models are usually estimated using maximum likelihood
(which we will cover later in the module).
25 / 30
Example: labour force participation

Let’s compare the binary choice models using a data set on
labour force participation by married women (N = 753).
The dependent variable (y) is defined as follows:

yi = 1 if individual i is employed;
yi = 0 otherwise.

The independent variables (in the vector x) are education,
experience, experience-squared and age, all in years.
We shall estimate the following models:
Linear probability model: P(yi = 1|xi ) = xi′ β;
Probit model: P(yi = 1|xi ) = Φ(xi′ β);
Logit model: P(yi = 1|xi ) = Λ(xi′ β).

26 / 30
Linear probability model
The results obtained from estimating the linear probability
model by OLS in R are:
> olsfit <- lm(inlf~educ+exper+expersq+age,data=mroz)
> summary(olsfit)

Call:
lm(formula = inlf ~ educ + exper + expersq + age, data = mroz)

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3273246 0.1374985 2.381 0.017535 *
educ 0.0275965 0.0072596 3.801 0.000156 ***
exper 0.0450286 0.0058559 7.689 4.66e-14 ***
expersq -0.0007236 0.0001922 -3.765 0.000180 ***
age -0.0105285 0.0022045 -4.776 2.15e-06 ***
---

Residual standard error: 0.4463 on 748 degrees of freedom
Multiple R-squared: 0.1936, Adjusted R-squared: 0.1893
F-statistic: 44.9 on 4 and 748 DF, p-value: < 2.2e-16

All variables are statistically significant.


An extra year of education increases the employment
probability by 0.0276.
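
The ‘rogue probabilities’ problem from earlier can be checked directly on this fit:

# How many LPM predicted probabilities fall outside [0, 1]?
phat <- fitted(olsfit)
range(phat)                 # predictions may extend beyond the unit interval
sum(phat < 0 | phat > 1)    # count of rogue probabilities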

27 / 30
Probit model

The results from estimating a Probit model are:


> probmod <- glm(inlf~educ+exper+expersq+age,
family = binomial(link = "probit"),data=mroz)
> summary(probmod)

Call:
glm(formula = inlf ~ educ + exper + expersq + age,
family = binomial(link = "probit"),data = mroz)

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.5209982 0.4130301 -1.261 0.207163
educ 0.0855908 0.0225252 3.800 0.000145 ***
exper 0.1287285 0.0180941 7.114 1.12e-12 ***
expersq -0.0020214 0.0005872 -3.443 0.000576 ***
age -0.0316601 0.0067338 -4.702 2.58e-06 ***
---

All variables (excluding the intercept) are statistically significant.


Recall that, in this model, the coefficients aren’t marginal
effects as they are in the usual regression framework.
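
As an illustration, the average marginal effect of education can be computed from the fitted model using the Probit formula from earlier (dnorm() is the standard normal pdf ϕ):

# Average marginal effect of educ in the Probit model (sketch)
xb <- predict(probmod, type = "link")     # x_i' beta-hat for each observation
mean(dnorm(xb)) * coef(probmod)["educ"]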

28 / 30
Logit model
The results from estimating a Logit model are:
> logmod <- glm(inlf~educ+exper+expersq+age,
family = binomial(link = "logit"),data=mroz)
> summary(logmod)

Call:
glm(formula = inlf ~ educ + exper + expersq + age,
family = binomial(link = "logit"),data = mroz)

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.8824469 0.6863688 -1.286 0.198557
educ 0.1441256 0.0379671 3.796 0.000147 ***
exper 0.2114275 0.0306252 6.904 5.07e-12 ***
expersq -0.0033137 0.0009879 -3.354 0.000796 ***
age -0.0524305 0.0112967 -4.641 3.46e-06 ***
---

All variables (excluding the intercept) are again statistically
significant.
Also, in this model, the coefficients don’t have their usual
interpretation.
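
The corresponding average marginal effect of education uses the logistic density, dlogis() in R:

# Average marginal effect of educ in the Logit model (sketch)
xb <- predict(logmod, type = "link")
mean(dlogis(xb)) * coef(logmod)["educ"]   # dlogis(w) = exp(w)/(1 + exp(w))^2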

29 / 30
Summary

• comparing regression models
• non-nested tests
• test of functional form
• limited dependent variables
• binary choice

• Next week: heteroskedasticity

30 / 30
