More On Specification and Data
M. Ryan Sanjaya
March 2023
Specification Errors
Let’s say the true model is
y = β0 + β1 x1 + β2 x2 + u
We make a specification error if we
omit variables → underfitting the model
y = β0 + β1 x1 + u
include an irrelevant variable → overfitting the model
y = β0 + β1 x1 + β2 x2 + β3 x3 + u
estimate the wrong functional form
ln y = β0 + β1 x1 + β2 x2 + u
use a proxy, e.g., x2∗, that may contain measurement error
y = β0 + β1 x1 + β2 x2∗ + u
incorrectly specify the stochastic error term.
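The first two cases can be illustrated with a small simulation. This is only a sketch under assumed values: the coefficients (1, 2, 3), the 0.5 link between x1 and x2, the sample size and the seed are all made up for illustration, and ols is a hypothetical helper. With these numbers, omitting x2 should push the coefficient on x1 toward 2 + 3 × 0.5 = 3.5, while adding the irrelevant x3 should leave it near 2.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients; X must already contain a column of ones."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)       # seed chosen arbitrarily
n = 5000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)   # assumption: x2 is correlated with x1
x3 = rng.normal(size=n)              # irrelevant: not part of the true model
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + u    # assumed true model

ones = np.ones(n)
b_true = ols(np.column_stack([ones, x1, x2]), y)       # correct specification
b_under = ols(np.column_stack([ones, x1]), y)          # x2 omitted
b_over = ols(np.column_stack([ones, x1, x2, x3]), y)   # x3 added

print("correct spec, coefficient on x1:", round(b_true[1], 3))
print("underfit,     coefficient on x1:", round(b_under[1], 3))
print("overfit,      coefficient on x1:", round(b_over[1], 3))
```

In this setup the underfitted model is biased because the omitted x2 is correlated with x1; the overfitted model stays close to the true coefficient but estimates one parameter more than necessary.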
RESET
1 Obtain the fitted values ŷ and R²_old of the original (but not necessarily true) model
y = β0 + β1 x1 + β2 x2 + u.
2 Estimate the expanded model by adding ŷ² and ŷ³, and get R²_new
y = β0 + β1 x1 + β2 x2 + δ1 ŷ² + δ2 ŷ³ + error.
3 Calculate the F statistic
F = [(R²_new − R²_old) / 2] / [(1 − R²_new) / (n − k − 3)]
under the null hypothesis of H0 : δ1 = 0 and δ2 = 0.
In large samples, and under the Gauss-Markov assumptions, the F statistic is approximately distributed as F with (2, n − k − 3) degrees of freedom.
If H0 is rejected, the model has a functional form problem.
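The three RESET steps can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: r_squared and reset_f are hypothetical helper names, and the simulated data (one correctly specified linear outcome, one with an omitted x1² term) are assumptions chosen only to show the statistic reacting to a functional form problem.

```python
import numpy as np

def r_squared(X, y):
    """OLS R-squared and fitted values; X must include a constant column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, fitted

def reset_f(X, y):
    """RESET F statistic and its denominator degrees of freedom."""
    n, p = X.shape
    k = p - 1                                   # regressors excluding the constant
    r2_old, yhat = r_squared(X, y)              # step 1
    X_new = np.column_stack([X, yhat ** 2, yhat ** 3])
    r2_new, _ = r_squared(X_new, y)             # step 2
    df = n - k - 3
    f_stat = ((r2_new - r2_old) / 2) / ((1 - r2_new) / df)   # step 3
    return f_stat, df

rng = np.random.default_rng(0)                  # seed chosen arbitrarily
n = 500
x1, x2, u = rng.normal(size=(3, n))
X = np.column_stack([np.ones(n), x1, x2])

y_lin = 1 + 2 * x1 - x2 + u                     # linear model is correct
y_quad = 1 + 2 * x1 - x2 + x1 ** 2 + u          # linear model misses x1 squared

f_lin, df = reset_f(X, y_lin)
f_quad, _ = reset_f(X, y_quad)
print("F (correct form):", round(f_lin, 2), "df:", (2, df))
print("F (wrong form):  ", round(f_quad, 2))
```

The statistic is then compared with a critical value of the F distribution with (2, n − k − 3) degrees of freedom; a large value for the second outcome points to the omitted nonlinearity.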
Davidson-MacKinnon test
Two nonnested models:
y = β0 + β1 x1 + β2 x2 + u (1)
vs
y = β0 + β1 ln x1 + β2 ln x2 + u. (2)
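One common way to run this comparison, sketched here under assumptions, is to estimate model (2), add its fitted values as an extra regressor in model (1), and look at the t statistic on that regressor: a significant coefficient is evidence against model (1). The helper ols_with_se, the simulated data (generated from the log model) and the conventional homoskedastic standard errors are illustrative choices, not part of the source.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    se = np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta, se

rng = np.random.default_rng(1)                 # seed chosen arbitrarily
n = 500
x1 = rng.uniform(1.0, 10.0, size=n)            # positive, so logs are defined
x2 = rng.uniform(1.0, 10.0, size=n)
# Assumption: the data are generated by the log model (2).
y = 1.0 + 2.0 * np.log(x1) + 3.0 * np.log(x2) + rng.normal(scale=0.5, size=n)

ones = np.ones(n)
X_lin = np.column_stack([ones, x1, x2])                   # model (1)
X_log = np.column_stack([ones, np.log(x1), np.log(x2)])   # model (2)

# Fitted values from model (2)
beta_log, _ = ols_with_se(X_log, y)
yhat_log = X_log @ beta_log

# Model (1) augmented with the fitted values from model (2)
beta_aug, se_aug = ols_with_se(np.column_stack([X_lin, yhat_log]), y)
t_stat = beta_aug[-1] / se_aug[-1]
print("t statistic on fitted values from model (2):", round(t_stat, 2))
```

The roles can be swapped (fitted values from model (1) added to model (2)) to test the log specification; it is possible for both models, or neither, to be rejected.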
Proxy Variables
y = β0 + β1 x1 + β2 x2 + β3 x3∗ + u
Some cities had a high crime rate in the past and still have one today.
If crime−1 (the crime rate in the previous period) is not included, we might suffer from reverse causality: a city with a high crime rate may, as a result, have high unemployment and many police officers.
If crime−1 is included, we can run this thought experiment: if two cities have the same previous crime rate and the same current unemployment rate, then β1 measures the effect of another police officer on the crime rate.
yi = ai + bi xi .
Example
Measurement Error
e0 = y − y ∗
e1 = x1 − x1∗
Classical errors-in-variables
Cov (x1∗ , e1 ) = 0.
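A well-known consequence of the classical errors-in-variables assumption is attenuation: regressing y on the mismeasured x1 instead of the true x1∗ shrinks the slope toward zero by the factor Var(x1∗) / (Var(x1∗) + Var(e1)). The simulation below is a sketch with assumed values (true slope 2, unit variances, arbitrary seed), so the slope on x1 should come out near 2 × 1/2 = 1; slope is a hypothetical helper.

```python
import numpy as np

def slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    x_c = x - x.mean()
    return (x_c @ (y - y.mean())) / (x_c @ x_c)

rng = np.random.default_rng(2)          # seed chosen arbitrarily
n = 20000
x1_star = rng.normal(size=n)            # true regressor, Var = 1 (assumed)
e1 = rng.normal(size=n)                 # measurement error, independent of x1_star
x1 = x1_star + e1                       # observed regressor: e1 = x1 - x1_star
y = 1.0 + 2.0 * x1_star + rng.normal(size=n)

b_true = slope(x1_star, y)              # infeasible regression on the true variable
b_noisy = slope(x1, y)                  # feasible regression on the noisy measure
print("slope using x1*:", round(b_true, 3))
print("slope using x1: ", round(b_noisy, 3))
```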
Missing Data
Procedure.
1 Create Zik = xik when it is observed, 0 otherwise.
2 Create a missing data indicator mik = 1 when xik is missing, 0 otherwise.
3 Estimate yi on xi1, ..., xi,k−1, Zik, mik for i = 1, ..., n.
Drawbacks of MIM.
Requires strong assumptions, such as xk to be uncorrelated with x1, x2, ..., xk−1.
It is less robust than the complete cases estimator.
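The MIM procedure above can be sketched as follows. The function name mim_design, the use of NaN to mark missing values, and the simulated data are assumptions made for illustration; the data are generated so that xk is independent of x1 and values go missing completely at random, which is the favourable case for MIM.

```python
import numpy as np

def mim_design(X_obs, xk):
    """Build the MIM regressors: constant, fully observed x's, Z_k, m_k.

    X_obs : (n, k-1) array of fully observed regressors.
    xk    : length-n array with np.nan marking missing values.
    """
    n = len(xk)
    missing = np.isnan(xk)
    z = np.where(missing, 0.0, xk)      # step 1: x_k when observed, 0 otherwise
    m = missing.astype(float)           # step 2: 1 when x_k is missing, 0 otherwise
    return np.column_stack([np.ones(n), X_obs, z, m])

rng = np.random.default_rng(3)          # seed chosen arbitrarily
n = 2000
x1 = rng.normal(size=n)
xk_full = rng.normal(size=n)            # assumption: uncorrelated with x1
y = 1.0 + 2.0 * x1 + 3.0 * xk_full + rng.normal(size=n)

xk = xk_full.copy()
xk[rng.random(n) < 0.3] = np.nan        # about 30% missing, completely at random

D = mim_design(x1.reshape(-1, 1), xk)
beta, *_ = np.linalg.lstsq(D, y, rcond=None)   # step 3: regress y on all columns
print("const, x1, Z_k, m_k:", np.round(beta, 3))
```

If xk were instead correlated with x1, the coefficient on x1 from this regression would generally be biased, which is the drawback noted above.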
Nonrandom Samples
Outliers