
EC501 Econometric Methods

4. Comparing Regression Models and Limited Dependent Variables

Marcus Chambers

Department of Economics
University of Essex

02 November 2023

1 / 30
Outline

Review

Selecting regressors

Non-nested models

Testing the functional form

Limited dependent variables (LDVs)

Binary choice and latent variable models

An example

Reference: Verbeek, chapters 3 and 7.

2 / 30
Review
Our focus so far has been estimation and inference in the linear
regression model, y = Xβ + ϵ.
The ordinary least squares (OLS) estimator is:
• BLUE under strong assumptions; and,
• consistent under weaker ones.
One of these assumptions is that the model is correctly
specified.
This raises the question of how to select regressors and
compare models, and how to test whether the functional form is
correct.
We have also implicitly assumed that the dependent variable is
continuous.
But sometimes this is not the case and new approaches are
needed to deal with limited dependent variables (LDVs).
3 / 30
Comparing models
Last week we considered the two models

y = X1 β1 + ϵ, (1)

y = X1 β1 + X2 β2 + ϵ, (2)

where y : N × 1, ϵ : N × 1; X1 : N × (K − J), β1 : (K − J) × 1; X2 : N × J, β2 : J × 1.
We noted that (1) is obtained from (2) by setting the J elements
of the vector β2 equal to zero – (1) is nested within (2).
The J linear restrictions β2 = 0 can be tested using an F-test.
But suppose we estimate (1) when (2) is true – what are the
properties of the OLS estimator?
This is a case of omitted variables (those in X2 ).

4 / 30
OLS with omitted variables
We know that, in the regression (1), the OLS estimator is

b1 = (X1′ X1 )−1 X1′ y

= (X1′ X1 )−1 X1′ (X1 β1 + X2 β2 + ϵ)

= β1 + (X1′ X1 )−1 X1′ X2 β2 + (X1′ X1 )−1 X1′ ϵ.

Making our ‘usual’ assumptions we find that

E{b1} = β1 + E{(X1′X1)−1 X1′X2} β2,

where the second term is assumed to be non-zero, i.e. the OLS estimator is biased.
This is the omitted variable bias.
It is therefore important not to omit any relevant variables.
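
A small simulation makes the bias concrete. In this sketch all variable names and parameter values are made up for illustration: x2 is a relevant regressor correlated with x1, so omitting it biases the OLS coefficient on x1.

# Simulated illustration of omitted variable bias (illustrative values)
set.seed(123)
N  <- 1000
x1 <- rnorm(N)
x2 <- 0.8 * x1 + rnorm(N)               # relevant regressor, correlated with x1
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(N)    # true model contains both regressors
coef(lm(y ~ x1 + x2))["x1"]             # close to the true value of 2
coef(lm(y ~ x1))["x1"]                  # biased towards 2 + 0.8 * 3 = 4.4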

5 / 30
OLS with irrelevant variables

Suppose that, instead, we estimate (2) when (1) is true – we
are including irrelevant variables (those in X2 ).
It can be shown that OLS is an unbiased estimator of β1 and β2
(whose true value is 0).
However, despite OLS being unbiased, it is still better to
estimate (1) rather than (2).
This is because the variance of the estimator will be smaller in
the correctly-specified regression (1).
To summarise: omitting relevant variables leads to a biased
estimator, while including irrelevant variables leads to an
increased variance.
This raises the question: how should we select regressors?

6 / 30
Selecting regressors

Typically we will have a set of potentially relevant regressors
suggested by economic theory.
The general-to-specific modelling strategy starts with a general
unrestricted model (GUM) containing all possible regressors.
The aim is to reduce the GUM in size and complexity by testing
appropriate restrictions.
If a sufficiently general model is capable of describing reality
then a more parsimonious model is an improvement if it can
convey the same information in a simpler form.
The reverse approach – specific-to-general – is not to be
recommended.

7 / 30
Comparing models

In addition to statistical tests we can compare models by other
statistical criteria.
We have already seen one statistic that can be used:

R̄2 = 1 − [(1/(N−K)) Σi ei²] / [(1/(N−1)) Σi (yi − ȳ)²],

where the sums run over i = 1, …, N; this provides a trade-off
between the fit (as measured by Σi ei²) and the parsimony of the
model (as measured by K).
We might select the model with the largest R̄2 value.

8 / 30
Information criteria
Other model comparison methods include information criteria.
These, too, provide a trade-off between model fit and
parsimony.
Examples include Akaike’s Information Criterion (AIC):

AIC = log((1/N) Σi ei²) + 2K/N,

and the Schwarz Bayesian Information Criterion (BIC):

BIC = log((1/N) Σi ei²) + (K/N) log N.

Models with smaller AIC or BIC are usually preferred.


Note: the dependent variable must be the same in all models.
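
These formulas are easy to compute from a fitted lm object. The sketch below uses a hypothetical function name; note that R's built-in AIC() and BIC() use the full Gaussian log-likelihood, so their values differ from those above, although the model rankings agree for a fixed sample size.

# Compute the AIC and BIC defined above from an lm fit (a sketch)
info_criteria <- function(fit) {
  e <- residuals(fit)
  N <- length(e)
  K <- length(coef(fit))
  c(AIC = log(sum(e^2) / N) + 2 * K / N,
    BIC = log(sum(e^2) / N) + (K / N) * log(N))
}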
9 / 30
Non-nested models

In the above we have assumed that the models are nested,
i.e. that one can be obtained from the other by imposing
(typically linear) restrictions.
Sometimes this might not be the case, so how can we choose
between them?
Consider the following two models:

Model A: yi = xi′ β + w′i α + ϵi ,

Model B: yi = xi′ β + z′i γ + ui .

The models have different sets of regressors (although xi is
common to both) and no restrictions on either model will yield
the other – they are non-nested.

10 / 30
Non-nested F-test

The validity of Model A can be assessed in the regression

yi = xi′ β + w′i α + z′i δB + ϵi

by testing δB = 0; if this is true we obtain Model A.


Similarly, the validity of Model B can be assessed in the
regression
yi = xi′ β + z′i γ + w′i δA + ui
by testing δA = 0; if this is true we obtain Model B.
These are both straightforward F-tests in the encompassing
regressions.
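
A sketch of the first test in R, assuming a hypothetical data frame dat containing y and single regressors x, w and z:

# Encompassing F-test of Model A: do the z variables add anything?
modelA <- lm(y ~ x + w, data = dat)
encomp <- lm(y ~ x + w + z, data = dat)
anova(modelA, encomp)   # F-test of delta_B = 0; rejection casts doubt on Model A

Reversing the roles of w and z gives the test of Model B; the lmtest package's encomptest() performs both comparisons at once.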

11 / 30
J-test

An alternative approach is the J-test based on the artificial
nesting model

yi = xi′ β + (1 − δ)w′i α + δz′i γ + vi .

If δ = 0 we obtain Model A; if δ = 1 we obtain Model B.


However, we can’t separately identify δ, α and γ in this
regression.
A solution is to first estimate Model B, obtaining γ̂, and then
estimate
yi = xi′ β + w′i α∗ + δz′i γ̂ + vi
where α∗ = (1 − δ)α.
The null hypothesis δ = 0 can be tested using a t-test.
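
Because xi appears in both models, the test regression can be run by adding Model B's fitted values to Model A; a t-test on their coefficient is then the test of δ = 0. A sketch with the same hypothetical data frame dat:

# Davidson-MacKinnon J-test of Model A against Model B (sketch)
modelB <- lm(y ~ x + z, data = dat)
jreg   <- lm(y ~ x + w + fitted(modelB), data = dat)
summary(jreg)   # t-statistic on fitted(modelB) tests delta = 0

The lmtest package's jtest() automates the test in both directions.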

12 / 30
Logarithms versus levels
The non-nested approach can also be used to distinguish
between logarithmic and levels regressions.
Consider the two models:

Model LIN: yi = xi′ β + ϵi , fitted value ŷi ;

Model LOG: log yi = xi′ γ + ui , fitted value log ỹi .

Model LIN is valid if δLIN = 0 in

yi = xi′β + δLIN (log ŷi − log ỹi) + ϵi.

Similarly, Model LOG is valid if δLOG = 0 in

log yi = xi′γ + δLOG (ŷi − exp(log ỹi)) + ui.

These are both simple t-tests.
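
A sketch of both tests with hypothetical data (the fitted values of Model LIN must be positive for their logarithm to exist):

# Levels vs logs tests (sketch)
lin  <- lm(y ~ x, data = dat)
logm <- lm(log(y) ~ x, data = dat)
dat$d_lin <- log(fitted(lin)) - fitted(logm)   # log(yhat) - logyhat
dat$d_log <- fitted(lin) - exp(fitted(logm))   # yhat - exp(logyhat)
summary(lm(y ~ x + d_lin, data = dat))         # t-test of delta_LIN = 0
summary(lm(log(y) ~ x + d_log, data = dat))    # t-test of delta_LOG = 0

The closely related PE test is available as petest() in the lmtest package.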

13 / 30
Testing the functional form

A linear regression model is a model of the conditional mean
which assumes that E{yi |xi } = xi′ β.
The Ramsey RESET test is based on the idea that, if the
conditional mean is correctly specified, then non-linear
functions of ŷi = xi′ b should not help in explaining yi .
It is based on the auxiliary regression

yi = xi′β + α2 ŷi² + α3 ŷi³ + . . . + αQ ŷi^Q + vi.

An F-test can be used to test the Q − 1 restrictions

H0 : α2 = α3 = . . . = αQ = 0.

Usually the test is performed with Q = 2 or Q = 3.
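
In R the test is available in the lmtest package; a sketch for a hypothetical fitted model fit:

# Ramsey RESET test; power = 2:3 corresponds to Q = 3
library(lmtest)
resettest(fit, power = 2:3, type = "fitted")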

14 / 30
Non-linear models
If the RESET test rejects E{yi |xi } = xi′ β then what can we do?
One response would be to estimate a non-linear specification
for the conditional mean of the form

E{yi |xi } = g(xi , β)

for some (known) function g(·).


An example with a scalar xi would be g(xi, β) = β1 + β2 xi^β3.
We could estimate the parameters using non-linear least
squares obtained by minimising
S(β̃) = Σi (yi − g(xi, β̃))², summing over i = 1, …, N.

This is, of course, more complicated than OLS!
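
R's nls() function implements non-linear least squares. A sketch for the example above, with hypothetical data and starting values (starting values matter for convergence, and x should be positive if β3 is non-integer):

# Non-linear least squares for g(x, beta) = b1 + b2 * x^b3 (sketch)
nlsfit <- nls(y ~ b1 + b2 * x^b3, data = dat,
              start = list(b1 = 0, b2 = 1, b3 = 1))
summary(nlsfit)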

15 / 30
Limited dependent variables (LDVs)

Many (dependent) variables are treated as being continuous,
e.g. GDP, exports, log(wages) etc.
But some variables are not, such as:
• employment status (employed or unemployed, a binary
variable);
• home ownership (home owner or not, also binary).
In such cases the linear regression model is generally not
appropriate.
This usually arises with microeconomic data but can also apply
to certain time series.

16 / 30
Motivating example

Suppose we wish to model family home ownership in terms of


income.
A sample of N households is available, and for each we
observe their income, xi2 .
The observed dependent variable is of the form:
yi = 1 if household i owns their home;
yi = 0 if household i doesn’t own their home.

Now consider the linear regression model

yi = β1 + β2 xi2 + ϵi = xi′ β + ϵi , i = 1, . . . , N,

where xi = (1, xi2 )′ and ϵi is the regression disturbance.

17 / 30
Rogue probabilities!

If E{ϵi |xi } = 0 then E{yi |xi } = xi′ β.


However, because yi is binary, using (S1) we obtain

E{yi |xi } = 0 · P{yi = 0|xi } + 1 · P{yi = 1|xi }

= P{yi = 1|xi }.

But E{yi |xi } = xi′ β which implies P{yi = 1|xi } = xi′ β in this
model.
We know that a probability must lie between 0 and 1 and xi′ β is
not restricted to have this property in this model.
Predicted probabilities from the model could therefore be
negative or greater than one!

18 / 30
Values of ϵi

Another issue is that ϵi only has two possible outcomes,
because

P{yi = 1|xi } = P{ϵi = 1 − xi′ β|xi } = xi′ β,

P{yi = 0|xi } = P{ϵi = −xi′ β|xi } = 1 − xi′ β.


In addition the variance can be shown to be

V{ϵi |xi } = xi′ β(1 − xi′ β)

which is not constant and also depends on β.


We shall examine various ways to handle these issues that
arise in such data sets.

19 / 30
Binary choice model
A binary choice model is of the form

P{yi = 1|xi } = F(xi′ β), i = 1, . . . , N.

The linear combination xi′ β is transformed by the function F(·)
to ensure that the probability lies between 0 and 1.
A natural candidate for F(·) is a distribution function.
If W is a continuous random variable then its (cumulative)
distribution function satisfies

0 ≤ F(w) = P{W ≤ w} ≤ 1, −∞ < w < ∞.

F(·) is a non-decreasing function that is related to the
probability density function, f(·), as follows:

F(w) = ∫₋∞ʷ f(z) dz, −∞ < w < ∞.

20 / 30
Common choices of F(·)
Some common choices of F(·) result in the following models:
Probit model (standard normal distribution function):
F(w) = Φ(w) = ∫₋∞ʷ (1/√(2π)) exp(−t²/2) dt;

Logit model (logistic distribution function):

F(w) = Λ(w) = eʷ / (1 + eʷ);

Linear probability model (uniform distribution function):

F(w) = 0 for w < 0;  F(w) = w for 0 ≤ w ≤ 1;  F(w) = 1 for w > 1.

21 / 30
The distribution functions

The Probit, Logit and linear probability distribution functions
are depicted below:

[Figure: F(w) plotted against w ∈ [−5, 5] for the Probit, Logit and linear probability functions.]

The Probit and Logit curves intersect at w = 0 where, for both,
F(0) = 0.5.
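
The figure is straightforward to reproduce in base R, since pnorm() and plogis() are the Probit and Logit distribution functions:

# Plot the three choices of F(w) over [-5, 5]
w <- seq(-5, 5, length.out = 201)
plot(w, pnorm(w), type = "l", xlab = "w", ylab = "F(w)")   # Probit
lines(w, plogis(w), lty = 2)                               # Logit
lines(w, pmin(pmax(w, 0), 1), lty = 3)                     # Linear probability
legend("topleft", legend = c("Probit", "Logit", "Linear probability"), lty = 1:3)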

22 / 30
Parameter interpretation

Interpretation of the parameters can be tricky.


One approach is to consider the marginal effects of changes in
a (continuous) explanatory variable, say xik .
Let ϕ(·) denote the pdf of a standard normal variable (S15).
For the models we have defined earlier we obtain:

Probit: ∂Φ(xi′β)/∂xik = ϕ(xi′β) βk;

Logit: ∂Λ(xi′β)/∂xik = [e^(xi′β) / (1 + e^(xi′β))²] βk;

Linear: ∂(xi′β)/∂xik = βk (or 0, when xi′β lies outside [0, 1]).
It is more complicated when interactions are present e.g. xi2 xi3 .

23 / 30
Latent variables
Sometimes, however, the dependent variable of interest is not
actually observed, or latent.
For example, suppose the utility difference (y∗i ) between having
a job and not having one is a function of (observable)
characteristics (xi ).
The individual chooses to work if y∗i > 0.
The model might be of the form

y∗i = xi′ β + ϵi , i = 1, . . . , N.

But what we actually observe is employment status:

yi = 1 if y∗i > 0 (employed);
yi = 0 if y∗i ≤ 0 (unemployed).

24 / 30
A binary choice representation
Let F(·) denote the cdf of ϵi . Then

P(yi = 1) = P(y∗i > 0) = P(xi′ β + ϵi > 0)

= P(ϵi > −xi′ β)

= 1 − F(−xi′ β).

Note: Verbeek, equation (7.9), expresses this probability in
terms of the distribution function of −ϵi .
If ϵi ∼ N(0, 1) we have a probit model.
If ϵi is logistic we have a logit model.
Writing the latent variable model in this way yields a binary
choice model.
Such models are usually estimated using maximum likelihood
(which we will cover later in the module).
25 / 30
Example: labour force participation

Let’s compare the binary choice models using a data set on
labour force participation by married women (N = 753).
The dependent variable (y) is defined as follows:

yi = 1 if individual i is employed;
yi = 0 otherwise.

The independent variables (in the vector x) are education,
experience, experience-squared and age, all in years.
We shall estimate the following models:
Linear probability model: P(yi = 1|xi ) = xi′ β;
Probit model: P(yi = 1|xi ) = Φ(xi′ β);
Logit model: P(yi = 1|xi ) = Λ(xi′ β).

26 / 30
Linear probability model
The results obtained from estimating the linear probability
model by OLS in R are:
> olsfit <- lm(inlf~educ+exper+expersq+age,data=mroz)
> summary(olsfit)

Call:
lm(formula = inlf ~ educ + exper + expersq + age, data = mroz)

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3273246 0.1374985 2.381 0.017535 *
educ 0.0275965 0.0072596 3.801 0.000156 ***
exper 0.0450286 0.0058559 7.689 4.66e-14 ***
expersq -0.0007236 0.0001922 -3.765 0.000180 ***
age -0.0105285 0.0022045 -4.776 2.15e-06 ***
---

Residual standard error: 0.4463 on 748 degrees of freedom
Multiple R-squared: 0.1936, Adjusted R-squared: 0.1893
F-statistic: 44.9 on 4 and 748 DF, p-value: < 2.2e-16

All variables are statistically significant.


An extra year of education increases the employment
probability by 0.0276.
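
The ‘rogue probabilities’ problem from earlier can be checked directly on this fit:

# How many LPM predicted probabilities fall outside [0, 1]?
phat <- fitted(olsfit)
range(phat)                 # predictions may extend beyond the unit interval
sum(phat < 0 | phat > 1)    # count of rogue probabilities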

27 / 30
Probit model

The results from estimating a Probit model are:


> probmod <- glm(inlf~educ+exper+expersq+age,
family = binomial(link = "probit"),data=mroz)
> summary(probmod)

Call:
glm(formula = inlf ~ educ + exper + expersq + age,
family = binomial(link = "probit"),data = mroz)

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.5209982 0.4130301 -1.261 0.207163
educ 0.0855908 0.0225252 3.800 0.000145 ***
exper 0.1287285 0.0180941 7.114 1.12e-12 ***
expersq -0.0020214 0.0005872 -3.443 0.000576 ***
age -0.0316601 0.0067338 -4.702 2.58e-06 ***
---

All variables (excluding the intercept) are statistically significant.


Recall that, in this model, the coefficients aren’t marginal
effects as they are in the usual regression framework.
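
As an illustration, the average marginal effect of education can be computed from the fitted model using the Probit formula from earlier (dnorm() is the standard normal pdf ϕ):

# Average marginal effect of educ in the Probit model (sketch)
xb <- predict(probmod, type = "link")     # x_i' beta-hat for each observation
mean(dnorm(xb)) * coef(probmod)["educ"]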

28 / 30
Logit model
The results from estimating a Logit model are:
> logmod <- glm(inlf~educ+exper+expersq+age,
family = binomial(link = "logit"),data=mroz)
> summary(logmod)

Call:
glm(formula = inlf ~ educ + exper + expersq + age,
family = binomial(link = "logit"),data = mroz)

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.8824469 0.6863688 -1.286 0.198557
educ 0.1441256 0.0379671 3.796 0.000147 ***
exper 0.2114275 0.0306252 6.904 5.07e-12 ***
expersq -0.0033137 0.0009879 -3.354 0.000796 ***
age -0.0524305 0.0112967 -4.641 3.46e-06 ***
---

All variables (excluding the intercept) are again statistically
significant.
Also, in this model, the coefficients don’t have their usual
interpretation.
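
The corresponding average marginal effect of education uses the logistic density, dlogis() in R:

# Average marginal effect of educ in the Logit model (sketch)
xb <- predict(logmod, type = "link")
mean(dlogis(xb)) * coef(logmod)["educ"]   # dlogis(w) = exp(w)/(1 + exp(w))^2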

29 / 30
Summary

• comparing regression models
• non-nested tests
• test of functional form
• limited dependent variables
• binary choice

• Next week: heteroskedasticity

30 / 30
