Chapter 8. More on Specification and Data


M. Ryan Sanjaya

Departemen Ilmu Ekonomi

Fakultas Ekonomika dan Bisnis
Universitas Gadjah Mada

Maret 2023

Specification and data issues

How do we know that our econometric model is correctly specified?

There is no exact way to know if our model is correctly specified.
Nonetheless, there are some selection criteria to judge whether an
econometric model is good or not.

What is a Good Model?

A good model for empirical analysis should (Hendry & Richard, 1983):
Be data admissible; that is, predictions made from the model must
be logically possible.
Be consistent with theory; that is, it must make good economic sense.
Have weakly exogenous regressors; that is, the explanatory
variables must be uncorrelated with the error term (no omitted
variable bias).
Exhibit parameter constancy; that is, the values of the parameters
should be stable.
Exhibit data coherency; that is, the residuals estimated from the
model must be purely random (technically, white noise).
Be encompassing; that is, other models cannot be an improvement
over the chosen model.
Specification Errors
Suppose the true model is
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$.
We commit a specification error if we
omit relevant variables → underfitting the model
$y = \beta_0 + \beta_1 x_1 + u$
include irrelevant variables → overfitting the model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$
estimate the wrong functional form
$\ln y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$
use a proxy, e.g., $x_2^*$, that may contain measurement error
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2^* + u$
incorrectly specify the stochastic error term.
Detecting Model Misspecification

Look at the pattern of the residuals.

Plot the residuals on the vertical axis against an explanatory variable on the horizontal axis.
No systematic pattern = good.
RESET (regression specification error test).
Davidson-MacKinnon J test.

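The RESET proceeds in three steps:
1 Obtain the fitted values $\hat{y}$ and $R^2_{old}$ from the original (but not necessarily true) model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$.
2 Estimate the expanded model by adding $\hat{y}^2$ and $\hat{y}^3$, and obtain $R^2_{new}$:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \delta_1 \hat{y}^2 + \delta_2 \hat{y}^3 + \text{error}$.
3 Calculate the F statistic

$F = \dfrac{(R^2_{new} - R^2_{old})/2}{(1 - R^2_{new})/(n - k - 3)}$

under the null hypothesis $H_0: \delta_1 = 0$ and $\delta_2 = 0$, where k is the number of explanatory variables in the original model (k = 2 here).
Under $H_0$ and the Gauss-Markov assumptions, the F statistic is approximately distributed as $F_{2,\,n-k-3}$ in large samples.
If $H_0$ is rejected, there is evidence of a functional form problem.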

An LM version is also available (its chi-square distribution has two df).
The test can also be made robust to heteroskedasticity using the methods discussed in the previous chapter.
Drawback of the RESET test: it provides no real direction on how to proceed if the model is rejected → what is the best model then?
RESET is a test of functional form, not a general test for omitted variables: it has no power to detect omitted variables whose expectations are linear in the included regressors.
If the functional form is properly specified, RESET has no power for detecting heteroskedasticity.
In Stata:
reg y x1 x2
** Ramsey RESET using powers of the fitted values from the last regression
estat ovtest
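As a check on the arithmetic in step 3, the F statistic can be computed directly from the two R² values. The Python sketch below assumes the two-term version of RESET described above (ŷ² and ŷ³ added, so there are 2 restrictions); the function name `reset_f_stat` and the R² values are hypothetical illustrations, not output from Stata.

```python
def reset_f_stat(r2_old: float, r2_new: float, n: int, k: int) -> float:
    """F statistic for RESET with H0: delta1 = delta2 = 0.

    r2_old: R-squared of the original model (k regressors plus intercept)
    r2_new: R-squared after adding yhat^2 and yhat^3
    n:      sample size
    """
    q = 2                  # number of restrictions (yhat^2 and yhat^3)
    df_resid = n - k - 3   # residual df of the expanded model
    if df_resid <= 0:
        raise ValueError("need n > k + 3")
    return ((r2_new - r2_old) / q) / ((1 - r2_new) / df_resid)

# Hypothetical R-squared values, for illustration only
print(round(reset_f_stat(0.50, 0.60, n=100, k=2), 3))  # 11.875
```

The result would then be compared with a critical value of the F(2, n − k − 3) distribution. Stata's `estat ovtest` may add a different number of powers of ŷ, so its statistic need not match this two-term hand calculation.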

Davidson-MacKinnon test
Two nonnested models:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u \qquad (1)$

$y = \beta_0 + \beta_1 \ln x_1 + \beta_2 \ln x_2 + u \qquad (2)$

1 Estimate model (2) and obtain the fitted values y̌.

2 Use y̌ as an additional regressor in model (1).
You can also do the opposite: estimate model (1) and use its fitted values as a regressor in model (2).
3 Use a t test on the coefficient of y̌: if it is statistically significant, model (1) is rejected.
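Written out as regressions, the two directions of the Davidson-MacKinnon test can be sketched as follows. The coefficients θ1 and θ2 are labels introduced here for illustration; ŷ denotes fitted values from model (1) and y̌ fitted values from model (2).

```latex
\begin{aligned}
&\text{Test (1) against (2):} && y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \theta_1 \check{y} + \text{error}, && H_0: \theta_1 = 0 \\
&\text{Test (2) against (1):} && y = \beta_0 + \beta_1 \ln x_1 + \beta_2 \ln x_2 + \theta_2 \hat{y} + \text{error}, && H_0: \theta_2 = 0
\end{aligned}
```

In each case the test statistic is the usual t = θ̂ / se(θ̂): a significant θ̂1 rejects model (1), and a significant θ̂2 rejects model (2).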

Drawbacks of the Davidson-MacKinnon test

A clear winner need not emerge.

Both models could be rejected, or neither model could be rejected.
If neither is rejected, use the adjusted R² to choose between them.
If both models are rejected, more work needs to be done.
Rejection of one model does not mean that the other model is the correct one.
The test cannot directly compare models with different dependent variables (e.g., y versus ln y).

Proxy Variables

In the absence of a relevant variable, use a proxy variable.

E.g., if ability is unobserved, use IQ score as a proxy.
Are IQ and ability the same thing? Does IQ measure ability with error?

What is a Good Proxy?

Suppose the model is

y = β0 + β1 x1 + β2 x2 + β3 x3∗ + u

where x3∗ is unobserved and is proxied by x3 in the plug-in solution, that is, we regress

y on x1, x2, x3.
The variable x3 is a good proxy for x3∗ if
x3∗ is related to x3, that is, x3∗ = δ0 + δ3 x3 + v3 with δ3 ≠ 0,
the error term u is uncorrelated with x1, x2, and x3∗,
u is uncorrelated with x3,
v3 is uncorrelated with x1, x2, x3.
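Substituting the proxy equation into the model shows what the plug-in regression actually estimates. This is a short derivation under the conditions listed above; α0, α3 and e are shorthand introduced here.

```latex
\begin{aligned}
y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3(\delta_0 + \delta_3 x_3 + v_3) + u \\
  &= \underbrace{(\beta_0 + \beta_3\delta_0)}_{\alpha_0} + \beta_1 x_1 + \beta_2 x_2
     + \underbrace{\beta_3\delta_3}_{\alpha_3} x_3 + \underbrace{(u + \beta_3 v_3)}_{e}
\end{aligned}
```

Because u and v3 are both uncorrelated with x1, x2 and x3, so is the composite error e, and OLS of y on x1, x2, x3 consistently estimates β1 and β2 (together with α0 and α3, but not β0 or β3 themselves).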

Lagged Dependent Variable as Proxy

We can use a lagged dependent variable as a proxy in a
cross-sectional regression.
For example:

crime = β0 + β1 officer + β2 unem + β3 crime−1 + u.

Cities with high crime rates in the past tend to have high crime rates today.
If crime−1 is not included, we might suffer from reverse causality: a city with a high crime rate may, as a consequence, have high unemployment and many police officers.
If crime−1 is included, we can run this thought experiment: for two cities with the same previous crime rate and current unemployment rate, β1 measures the effect of one more police officer on the crime rate.

Models with Random Slopes

A model with random slopes is given by

yi = ai + bi xi .

The slope coefficient bi is a random draw from the population.

That is, the slope on x varies by individual.
We cannot estimate the slope for each observation, but we can estimate the average slope across the population → average partial effect (APE) or average marginal effect (AME).
The key assumption is that the slopes are independent of the explanatory variables.
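One way to see why OLS recovers the average slope is to split the random coefficients into their means and deviations, following the assumption just stated. The notation α, β, c_i and d_i is introduced for this sketch.

```latex
\begin{aligned}
a_i &= \alpha + c_i, \qquad b_i = \beta + d_i, \qquad \alpha = \mathrm{E}(a_i),\ \beta = \mathrm{E}(b_i) \\
y_i &= \alpha + \beta x_i + (c_i + d_i x_i) \\
\mathrm{E}(y_i \mid x_i) &= \alpha + \beta x_i
  \quad \text{if } \mathrm{E}(c_i \mid x_i) = \mathrm{E}(d_i \mid x_i) = 0
\end{aligned}
```

So regressing y_i on x_i by OLS estimates β, the APE. The composite error c_i + d_i x_i generally has a variance that depends on x_i, so heteroskedasticity-robust standard errors are advisable.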
Example: Determinants of language score

Dataset: snijders.dta
Cross section of 2,287 8th-grade students from 131 schools.
Variables of interest: langpost (language score), iqvc (verbal IQ score, centered at its mean), and schoolnr (identity code for each school).
In Stata:
** Random intercepts only
mixed langpost iqvc || schoolnr: , mle

** Random intercepts and slopes

mixed langpost iqvc || schoolnr: iqvc, mle covariance(indep)

Example Result: Random Intercepts

The expected language score for a kid with average verbal IQ averages 40.61 across all schools, but with substantial variation (variance = 9.50).
The common slope is estimated as a gain of 2.49 points in language score per point of verbal IQ.

Example Result: Random Intercepts and Slopes

The expected language score for a child with average IQ now averages 40.64 across schools, with about the same variance of 9.54.
The expected gain in language score per point of IQ averages 2.52, a bit higher than in the random-intercepts model.

Measurement Error

The measurement error is defined as the difference between the observed value (y, x1) and the true, unobserved value in the population (y∗, x1∗):

e0 = y − y∗

e1 = x1 − x1∗

If e0 and e1 are uncorrelated with the (observed) explanatory variables → good: OLS remains consistent, although less precise.

If e0 and e1 are correlated with the error term u → bias → we need to collect new data with a better data-collection technique.

Classical errors-in-variables

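Classical errors-in-variables (CEV) assumption: the measurement error is uncorrelated with the unobserved explanatory variable

Cov(x1∗, e1) = 0.

Under CEV, the observed x1 = x1∗ + e1 is correlated with e1 (Cov(x1, e1) = σ²_e1), so the OLS estimator is biased and inconsistent → attenuation bias. In the simple regression case,
$\operatorname{plim} \hat{\beta}_1 = \beta_1 \dfrac{\sigma^2_{x_1^*}}{\sigma^2_{x_1^*} + \sigma^2_{e_1}}$,
so the estimated slope is always biased toward zero (attenuated/weaker).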
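A small Monte Carlo simulation illustrates attenuation bias under CEV. The sketch below assumes a simple regression with normally distributed x1∗, e1 and u (mutually independent), a true slope β1 = 2 and equal variances for x1∗ and e1, so the attenuation factor σ²_x1∗ / (σ²_x1∗ + σ²_e1) is 1/2 and the OLS slope should come out near 1. The function names and parameter values are hypothetical choices for illustration.

```python
import random

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def simulate_cev(n=50_000, beta1=2.0, var_x_star=1.0, var_e1=1.0, seed=0):
    """OLS slope when the true regressor x1* is observed with classical error."""
    rng = random.Random(seed)
    x_star = [rng.gauss(0.0, var_x_star ** 0.5) for _ in range(n)]  # true x1*
    e1 = [rng.gauss(0.0, var_e1 ** 0.5) for _ in range(n)]          # Cov(x1*, e1) = 0
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1.0 + beta1 * xs + ui for xs, ui in zip(x_star, u)]        # true model uses x1*
    x_obs = [xs + e for xs, e in zip(x_star, e1)]                   # observed x1 = x1* + e1
    return ols_slope(x_obs, y)

if __name__ == "__main__":
    factor = 1.0 / (1.0 + 1.0)  # var(x1*) / (var(x1*) + var(e1))
    print("OLS slope with CEV:", round(simulate_cev(), 3))
    print("predicted plim:    ", 2.0 * factor)
```

Setting `var_e1=0.0` removes the measurement error and the estimate returns to roughly 2; with `beta1=-2.0` it is pulled toward zero (about −1), i.e., attenuated rather than simply underestimated.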

Missing Data

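If the data are missing completely at random (MCAR), then missing data cause no bias; the only cost is a smaller sample and hence less precise estimates.
Complete cases estimator: use only observations with complete data in the regression.
Multiple-imputation method. In Stata:
** assumes the data are already mi set and the missing values imputed (e.g., with mi impute)
mi estimate: regress y x1 x2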

Missing Indicator Method

1 Create Zik = xik when xik is observed, 0 otherwise.
2 Create a missing data indicator mik = 1 when xik is missing, 0 otherwise.
3 Regress yi on xi1, ..., xi,k−1, Zik, mik for i = 1, ..., n.

Drawbacks of MIM:
It requires strong assumptions, such as xk being uncorrelated with x1, x2, ..., xk−1.
It is less robust than the complete cases estimator.
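Steps 1 and 2 are simple data transformations. The Python sketch below builds Zik and mik for one regressor with missing entries (coded as None, an assumption of this sketch) and, for comparison, the subset used by the complete cases estimator. The function names and data are hypothetical; the regression in step 3 would then be run on the augmented data with any OLS routine.

```python
from typing import List, Optional, Sequence, Tuple

def missing_indicator_columns(xk: Sequence[Optional[float]]) -> Tuple[List[float], List[int]]:
    """Return Z_k (x_k if observed, else 0) and m_k (1 if x_k is missing, else 0)."""
    z = [0.0 if v is None else float(v) for v in xk]
    m = [1 if v is None else 0 for v in xk]
    return z, m

def complete_cases(rows: Sequence[Sequence[Optional[float]]]) -> List[Sequence[Optional[float]]]:
    """Keep only the rows with no missing entries (the complete cases sample)."""
    return [r for r in rows if all(v is not None for v in r)]

# Hypothetical data: each row is (y, x1, xk); xk is missing for two observations
rows = [(4.0, 1.0, 3.0), (5.0, 2.0, None), (7.0, 3.0, 5.0), (6.0, 2.5, None), (3.5, 0.5, 2.5)]
z, m = missing_indicator_columns([r[2] for r in rows])
print(z)                          # [3.0, 0.0, 5.0, 0.0, 2.5]
print(m)                          # [0, 1, 0, 1, 0]
print(len(complete_cases(rows)))  # 3
```

With MIM all five observations enter the regression of y on x1, Zk and mk, whereas the complete cases estimator uses only three.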

Nonrandom Samples

Exogenous sample selection.

Selection based on the independent variables (sometimes called missing at random, MAR).
E.g., regressing y on x1 and age, but the survey covers only those with age > 40; a nonrandom sample of adults.
Does not cause bias in OLS.
Endogenous sample selection.
Selection based on the dependent variable.
E.g., regressing wealth on x1, x2, but only those with wealth < 250,000 are in the sample.
Causes bias and inconsistency in OLS.
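A simulation makes the contrast between the two kinds of selection concrete. The sketch below assumes the true model y = 1 + 2x + u with standard normal x and u, then estimates the slope on the full sample, on a sample selected on x (x > 0, exogenous) and on a sample selected on y (y < 1, endogenous). The cut-offs and names are hypothetical; under these assumptions the first two slopes should be close to 2, while the third should be noticeably smaller (roughly 1.5).

```python
import random

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def selection_demo(n=50_000, seed=1):
    """Return OLS slopes for the full, exogenously and endogenously selected samples."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]   # true slope = 2
    exog = [(xi, yi) for xi, yi in zip(x, y) if xi > 0.0]    # selection on x
    endog = [(xi, yi) for xi, yi in zip(x, y) if yi < 1.0]   # selection on y
    slope_full = ols_slope(x, y)
    slope_exog = ols_slope([p[0] for p in exog], [p[1] for p in exog])
    slope_endog = ols_slope([p[0] for p in endog], [p[1] for p in endog])
    return slope_full, slope_exog, slope_endog

if __name__ == "__main__":
    full, exog, endog = selection_demo()
    print(f"full: {full:.3f}  selected on x: {exog:.3f}  selected on y: {endog:.3f}")
```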

Least absolute deviations (LAD) estimation can be used to reduce the influence of outliers in a regression.

LAD minimizes the sum of the absolute residuals, rather than the sum of squared residuals as in OLS.
LAD is designed to estimate the parameters of the conditional median of y given the xs.
LAD is a form of robust regression and a special case of quantile regression (the median, i.e., the 0.5 quantile).
In Stata:
** LAD = median regression, the default quantile of qreg
qreg y x1 x2
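For a single regressor, LAD can be illustrated without a quantile regression package. The sketch below relies on the standard fact that, with an intercept, some LAD line passes through at least two observations, so it searches all such lines and keeps the one with the smallest sum of absolute residuals. This exhaustive search is a teaching device for tiny samples (its cost grows with the cube of n), not how `qreg` computes its estimates; the data and function names are hypothetical.

```python
from itertools import combinations

def lad_fit_simple(x, y):
    """LAD fit of y = a + b*x by checking every line through two observations."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                                   # vertical line: not a valid fit
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        sad = sum(abs(yk - a - b * xk) for xk, yk in zip(x, y))  # sum of absolute residuals
        if best is None or sad < best[0]:
            best = (sad, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]

def ols_fit_simple(x, y):
    """OLS intercept and slope, for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

x = [0, 1, 2, 3, 4]
y = [0, 1, 2, 3, 100]                                  # the last point is an outlier
print(lad_fit_simple(x, y))                            # (0.0, 1.0)
print(tuple(round(v, 6) for v in ols_fit_simple(x, y)))  # (-19.2, 20.2)
```

The single outlier pulls the OLS slope to about 20, while the LAD line still fits the four regular points exactly.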

