ch9_Model Specification and Data Problems

Chapter 9 of ECON F342 discusses specification and data problems in multiple regression analysis, focusing on functional form, proxy variables, measurement error, and missing data. It highlights the importance of using economic theory to determine the appropriate model and introduces tests like Ramsey's RESET for functional form validation. The chapter also addresses issues like outliers and sample selection bias that can affect regression estimates.

ECON F342: Applied Econometrics
BITS Pilani, Pilani Campus
Chapter 9: Specification and Data Problems
N V M Rao

Multiple Regression Analysis: Specification and Data Problems

y = b0 + b1x1 + b2x2 + . . . + bkxk + u

Functional Form
• We’ve seen that a linear regression can really
fit nonlinear relationships
• Can use logs on RHS, LHS or both
• Can use quadratic forms of x’s
• Can use interactions of x’s
• How do we know if we’ve gotten the right
functional form for our model?

Functional Form (continued)
• First, use economic theory to guide you
• Think about the interpretation
• Does it make more sense for x to affect y in
percentage (use logs) or absolute terms?
• Does it make more sense for the derivative of
x1 to vary with x1 (quadratic) or with x2
(interactions) or to be fixed?

Specification - Functional Form - Review
Functional Form (continued)
• We already know how to test joint exclusion
restrictions to see if higher order terms or
interactions belong in the model
• It can be tedious to add and test extra terms,
plus may find a square term matters when
really using logs would be even better
• A test of functional form is Ramsey’s
regression specification error test (RESET)

Ramsey’s RESET
• RESET relies on a trick similar to the special
form of the White test
• Instead of adding functions of the x’s directly,
we add and test functions of ŷ
• So, estimate
y = b0 + b1x1 + … + bkxk + d1ŷ² + d2ŷ³ + error
and test
• H0: d1 = 0, d2 = 0 using F ~ F(2, n−k−3) or LM ~ χ²(2)
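
The RESET steps above can be sketched on simulated data. This is only an illustration under stated assumptions: the data-generating process (a quadratic truth fitted with a linear model), the sample size, and the helper names `ols`, `fitted`, `ssr` and `reset_f` are all hypothetical and not from the slides. It uses only the Python standard library.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (Gauss-Jordan with pivoting)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def fitted(X, b):
    return [sum(xi * bi for xi, bi in zip(row, b)) for row in X]

def ssr(X, y, b):
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted(X, b)))

def reset_f(X, y):
    """RESET F statistic: add yhat^2 and yhat^3, then test their joint exclusion."""
    n, k_plus_1 = len(X), len(X[0])
    b_r = ols(X, y)                                   # restricted (original) model
    yhat = fitted(X, b_r)
    X_ur = [row + [f ** 2, f ** 3] for row, f in zip(X, yhat)]
    b_ur = ols(X_ur, y)                               # unrestricted model
    ssr_r, ssr_ur = ssr(X, y, b_r), ssr(X_ur, y, b_ur)
    df = n - k_plus_1 - 2                             # equals n - k - 3
    return ((ssr_r - ssr_ur) / 2) / (ssr_ur / df)

random.seed(0)
n = 500
x = [random.uniform(0, 4) for _ in range(n)]
# Assumed truth is quadratic in x; the fitted model below leaves out x^2.
y = [1 + xi + 0.5 * xi ** 2 + random.gauss(0, 1) for xi in x]
X_linear = [[1.0, xi] for xi in x]

f_stat = reset_f(X_linear, y)
print(f"RESET F statistic: {f_stat:.1f}")
```

With two restrictions and a large sample, the 5% critical value of the F distribution is roughly 3.0, so a statistic far above that rejects the linear functional form. Note that RESET signals misspecification but does not say which alternative form is correct.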
RESET: Example
Example : Suppose that the correct specification of the wage equation is
Consider the house price data (Exercise 3.1) and estimate
Nonnested Alternative Tests
• If the models have the same dependent
variable but nonnested x’s, could still just
make a giant model with the x’s from both and
test the joint exclusion restrictions that lead to
one model or the other
• An alternative, the Davidson-MacKinnon test,
uses ŷ from one model as regressor in the
second model and tests for significance

Nonnested Alternatives (cont)
• More difficult if one model uses y and the
other uses ln(y)
• Can follow same basic logic and transform
predicted ln(y) to get ŷ for the second step
• In any case, Davidson-MacKinnon test may
reject neither or both models rather than
clearly preferring one specification

Why the reverse?
Points to remember
Proxy Variables
• What if model is misspecified because no data
is available on an important x variable?
• It may be possible to avoid omitted variable
bias by using a proxy variable
• A proxy variable must be related to the
unobservable variable – for example:
x3* = d0 + d3x3 + v3, where * implies
unobserved
• Now suppose we just substitute x3 for x3*
Proxy Variables (continued)
• What do we need for this solution to give
us consistent estimates of b1 and b2?
• E(x3* | x1, x2, x3) = E(x3* | x3) = d0 + d3x3
• That is, u is uncorrelated with x1, x2 and x3* and
v3 is uncorrelated with x1, x2 and x3
• So really running
y = (b0 + b3d0) + b1x1 + b2x2 + b3d3x3 + (u + b3v3)
and have just redefined the intercept, the error term and the x3
coefficient
Proxy Variables (continued)
• Without our assumptions, can end up with
biased estimates
• Say x3* = d0 + d1x1 + d2x2 + d3x3 + v3
• Then really running
y = (b0 + b3d0) + (b1 + b3d1) x1+ (b2 + b3d2) x2 +
b3d3x3 + (u + b3v3)
• Bias will depend on signs of b3 and dj
• This bias may still be smaller than omitted
variable bias, though
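
The proxy argument above can be checked with a small simulation. This is a sketch under assumptions of my own choosing: the coefficient values, the normal distributions, and the omission of x2 (to keep it short) are illustrative, and `ols` is a hypothetical helper. The proxy assumption holds by construction because v3 is drawn independently of x1 and x3.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (Gauss-Jordan with pivoting)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(1)
n = 20000
b0, b1, b3 = 1.0, 2.0, 3.0      # assumed structural coefficients
d0, d3 = 0.0, 1.0               # assumed proxy relation: x3* = d0 + d3*x3 + v3

x3 = [random.gauss(0, 1) for _ in range(n)]                # observed proxy
x1 = [0.8 * p + 0.6 * random.gauss(0, 1) for p in x3]      # correlated with x3
x3_star = [d0 + d3 * p + random.gauss(0, 1) for p in x3]   # unobserved variable
y = [b0 + b1 * a + b3 * s + random.gauss(0, 1)
     for a, s in zip(x1, x3_star)]

b_omit = ols([[1.0, a] for a in x1], y)                    # leave x3* out
b_proxy = ols([[1.0, a, p] for a, p in zip(x1, x3)], y)    # plug in x3 for x3*

print("omitting x3*: b1 estimate =", round(b_omit[1], 2))
print("using proxy : b1 estimate =", round(b_proxy[1], 2),
      " x3 coefficient =", round(b_proxy[2], 2))
```

In this design the regression that omits x3* overstates b1, while the proxy regression recovers b1 and estimates b3d3 (not b3) on x3, matching the redefined coefficients shown above.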
In simple –
Proxy Variables: Example
Lagged Dependent Variables
• What if there are unobserved variables, and
you can’t find reasonable proxy variables?
• May be possible to include a lagged
dependent variable to account for omitted
variables that contribute to both past and
current levels of y
• Obviously, you must think past and current y
are related for this to make sense

Measurement Error
• Sometimes we have the variable we want, but
we think it is measured with error
• Examples: A survey asks how many hours did
you work over the last year, or how many
weeks you used child care when your child
was young
• Measurement error in y different from
measurement error in x

Measurement Error in a
Dependent Variable
• Define measurement error as e0 = y – y*
• Thus, really estimating
y = b0 + b1x1 + …+ bkxk + u + e0
• When will OLS produce unbiased results?
• If e0 is uncorrelated with each xj and with u, OLS is unbiased
• If E(e0) ≠ 0 then the estimate of b0 will be biased, though
• While unbiased, OLS has larger variances than with no
measurement error
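
A quick Monte Carlo sketch of the points above, assuming a simple regression with normally distributed x, u and e0 (all parameter values are illustrative, and `slope` is a hypothetical helper). The measurement error e0 is drawn independently of x, so the slope should stay centred on b1 while its sampling spread grows.

```python
import random
import statistics

def slope(x, y):
    """OLS slope in a simple regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(2)
n, reps, b1 = 200, 200, 2.0
clean, noisy = [], []
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(n)]
    y_star = [1 + b1 * a + random.gauss(0, 1) for a in x]   # true y*
    y_obs = [v + random.gauss(0, 2) for v in y_star]        # y = y* + e0
    clean.append(slope(x, y_star))
    noisy.append(slope(x, y_obs))

print("no error  : mean", round(statistics.fmean(clean), 3),
      "sd", round(statistics.stdev(clean), 3))
print("error in y: mean", round(statistics.fmean(noisy), 3),
      "sd", round(statistics.stdev(noisy), 3))
```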

Measurement Error in an
Explanatory Variable
• Define measurement error as e1 = x1 – x1*
• Assume E(e1) = 0 , E(y| x1*, x1) = E(y| x1*)
• Really estimating y = b0 + b1x1 + (u – b1e1)
• The effect of measurement error on OLS
estimates depends on our assumption about
the correlation between e1 and x1
• Suppose Cov(x1, e1) = 0
• OLS remains unbiased, variances larger

Measurement Error in an
Explanatory Variable (cont)
• Suppose Cov(x1*, e1) = 0, known as the classical
errors-in-variables assumption, then
• Cov(x1, e1) = E(x1e1) = E(x1*e1) + E(e1²) = 0 + σe²
• x1 is correlated with the error, so the estimate is biased:

\[
\operatorname{plim}\,\hat{b}_1
= b_1 + \frac{\operatorname{Cov}(x_1,\; u - b_1 e_1)}{\operatorname{Var}(x_1)}
= b_1 - \frac{b_1 \sigma_e^2}{\sigma_{x^*}^2 + \sigma_e^2}
\]
\[
= b_1\left(1 - \frac{\sigma_e^2}{\sigma_{x^*}^2 + \sigma_e^2}\right)
= b_1\left(\frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_e^2}\right)
\]

Measurement Error in an
Explanatory Variable (cont)
• Notice that the multiplicative factor on b1 is just
Var(x1*)/Var(x1)
• Since Var(x1*)/Var(x1) < 1, the estimate is
biased toward zero – called attenuation bias
• It’s more complicated with a multiple
regression, but can still expect attenuation
bias with classical errors in variables
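
Attenuation bias is easy to see in a simulation. The sketch below assumes classical errors in variables with Var(x1*) = Var(e1) = 1, so the factor Var(x1*)/Var(x1) is 0.5. The parameter values and the helper `slope` are illustrative choices, not taken from the slides.

```python
import random
import statistics

def slope(x, y):
    """OLS slope in a simple regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(3)
n, b0, b1 = 20000, 1.0, 2.0
sigma_xstar, sigma_e = 1.0, 1.0          # assumed standard deviations

x_star = [random.gauss(0, sigma_xstar) for _ in range(n)]   # true regressor
x_obs = [v + random.gauss(0, sigma_e) for v in x_star]      # x1 = x1* + e1
y = [b0 + b1 * v + random.gauss(0, 1) for v in x_star]

b1_hat = slope(x_obs, y)
factor = sigma_xstar ** 2 / (sigma_xstar ** 2 + sigma_e ** 2)
print("OLS slope on mismeasured x1:", round(b1_hat, 3))
print("b1 * Var(x1*)/Var(x1)      :", b1 * factor)
```

The estimated slope should sit close to b1 times the attenuation factor rather than close to b1 itself, and the gap does not shrink as n grows.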

Missing Data – Is it a Problem?
• If any observation is missing data on one of
the variables in the model, it can’t be used
• If data is missing at random, using a sample
restricted to observations with no missing
values will be fine
• A problem can arise if the data is missing
systematically – say high income individuals
refuse to provide income data

Nonrandom Samples
• If the sample is chosen on the basis of an x
variable, then estimates are unbiased
• If the sample is chosen on the basis of the y
variable, then we have sample selection bias
• Sample selection can be more subtle
• Say looking at wages for workers – since
people choose to work this isn’t the same as
wage offers
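
The contrast between selecting on x and selecting on y can be sketched with simulated data. The model, the cutoffs (x > 0 and y > 1) and the helper `slope` are assumptions made for illustration only.

```python
import random
import statistics

def slope(x, y):
    """OLS slope in a simple regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(4)
n, b1 = 20000, 2.0
x = [random.gauss(0, 1) for _ in range(n)]
y = [1 + b1 * a + random.gauss(0, 1) for a in x]
pairs = list(zip(x, y))

on_x = [(a, b) for a, b in pairs if a > 0]   # sample chosen on the x variable
on_y = [(a, b) for a, b in pairs if b > 1]   # sample chosen on the y variable

slope_x = slope([a for a, _ in on_x], [b for _, b in on_x])
slope_y = slope([a for a, _ in on_y], [b for _, b in on_y])
print("selected on x:", round(slope_x, 3))
print("selected on y:", round(slope_y, 3))
```

Selecting on x leaves the slope estimate near the true value of 2, while selecting on y pulls it toward zero in this setup, because the error term is no longer mean-independent of x in the selected sample.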

Outliers
• Sometimes an individual observation can be
very different from the others, and can have a
large effect on the outcome
• Sometimes this outlier will simply be due to
errors in data entry – one reason why looking
at summary statistics is important
• Sometimes the observation will just truly be
very different from the others

Outliers (continued)
• Not unreasonable to fix observations where
it’s clear there was just an extra zero entered
or left off, etc.
• Not unreasonable to drop observations that
appear to be extreme outliers, although
readers may prefer to see estimates with and
without the outliers
• Can use Stata, EViews, or any other software to
investigate outliers

Outliers: Example (continued)
Least Absolute Deviations Estimation (LAD)
Thanks