Block 1
y = Xβ + ε
LRM assumptions (for OLS estimation):
(Notation follows Greene, Econometric Analysis, 7th ed.)
y = Xβ + ε
LRM assumptions (continued):
A4 Homoscedastic & nonautocorrelated disturbances:
E[εε′|X] = σ²In
Homoscedasticity: var[εi|X] = σ², ∀ i = 1, ..., n.
Nonautocorrelation: cov[εt, εs|X] = 0, ∀ t ≠ s.
GARCH-type models [e.g. ARCH(1): var[εt|εt−1] = σ² + α·εt−1²]
do not violate the conditional variance assumption
var[εi|X] = σ². However, var[εt|εt−1] ≠ var[εt];
conditioning on X is omitted from the notation but remains
implicit.
A5 DGP of X: Variables in X may be fixed or random.
A6 Normal distribution of disturbances:
ε|X ∼ N[0, σ²In].
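A minimal simulation sketch of a data-generating process satisfying A1–A6; the parameter values, sample size, and uniform regressors are assumptions chosen for illustration, not taken from the slides:

```python
import numpy as np

# Minimal sketch (assumed values): simulate y = X b + e under A1-A6
# with fixed regressors and homoscedastic, uncorrelated normal disturbances.
rng = np.random.default_rng(0)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.uniform(size=(n, K - 1))])  # A2: full column rank
beta = np.array([1.0, 2.0, -0.5])                                # true parameters (assumed)
sigma = 1.0
eps = rng.normal(0.0, sigma, size=n)                             # A3, A4, A6: e|X ~ N(0, sigma^2 I_n)
y = X @ beta + eps                                               # A1: linearity
```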
Ordinary least squares (OLS)
y = Xβ + ε
The least squares estimator is unbiased (given A1 – A3):
β̂ = b = (X′X)⁻¹X′y = β + (X′X)⁻¹X′ε;
taking expectations:
E[b|X] = β + E[(X′X)⁻¹X′ε|X] = β (the second term is zero by A3).
Variance of the least squares estimator (A1 – A4):
var[b|X] = var[(X′X)⁻¹X′ε|X]
because var(β|X) = 0 (β is a fixed vector). Using A3 & A4:
var[b|X] = A σ²In A′, where A = (X′X)⁻¹X′
(the matrix analogue of var(cZ) = c²var(Z)),
= σ²(X′X)⁻¹,
using (AB)′ = B′A′ for dimensionally compatible matrices A, B.
Normal distribution of the least squares estimator (A1 – A6):
b|X ∼ N[β, σ²(X′X)⁻¹].
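A short numpy sketch of the formulas above, computing b = (X′X)⁻¹X′y and the estimated var[b|X] = s²(X′X)⁻¹; the simulated data and true parameters are assumed for illustration:

```python
import numpy as np

# Sketch: compute b = (X'X)^{-1} X'y and the estimated var[b|X] = s^2 (X'X)^{-1}
# on assumed simulated data.
rng = np.random.default_rng(1)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.uniform(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y              # OLS estimator
e = y - X @ b                      # residuals
s2 = e @ e / (n - K)               # unbiased estimator of sigma^2
var_b = s2 * XtX_inv               # estimated covariance matrix of b
print(b)                           # point estimates, close to beta
print(np.sqrt(np.diag(var_b)))     # standard errors
```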
General properties of estimators
Consistency: plim(θ̂) = θ.
θ̂ → θ as n → ∞: a sufficient condition is that θ̂ is (at least
asymptotically) unbiased and var(θ̂) → 0 as n → ∞.
Consistent estimators: unbiased or asymptotically unbiased,
with variance shrinking to zero as the sample size grows.
Consistency is a minimal requirement for any estimator used in
statistics or econometrics.
If an estimator is not consistent, it does not provide relevant
estimates of the population value θ even with unlimited data,
i.e. as n → ∞.
Unbiased estimators are not necessarily consistent.
Biased but consistent estimators are often useful
(small-sample bias, yet consistent: IV regression, ML, etc.).
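A hypothetical Monte Carlo sketch of a biased but consistent estimator, σ̂² = SSR/n; the design, sample sizes, and number of replications are assumptions:

```python
import numpy as np

# Hypothetical Monte Carlo: sigma2_hat = SSR/n is biased downward in small
# samples but consistent (true sigma^2 = 1; design and replications assumed).
rng = np.random.default_rng(2)

def mean_sigma2_hat(n, reps=2000):
    est = []
    for _ in range(reps):
        x = rng.uniform(size=n)
        y = 1.0 + 2.0 * x + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x])
        e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        est.append(e @ e / n)          # biased (divides by n, not n - K)
    return np.mean(est)

for n in (10, 50, 1000):
    print(n, round(mean_sigma2_hat(n), 3))   # approaches 1 as n grows
```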
Properties of estimators - classification:
Application example 1
Sample covariance is a consistent estimator of population
covariance.
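A small sketch, with an assumed bivariate normal population (covariance 0.8), showing the sample covariance approaching the population covariance as n grows:

```python
import numpy as np

# Sketch (assumed bivariate-normal population with covariance 0.8): the sample
# covariance approaches the population covariance as n grows.
rng = np.random.default_rng(3)
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
for n in (50, 500, 50_000):
    z = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)
    print(n, round(np.cov(z, rowvar=False)[0, 1], 3))
```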
Application example 2
The OLS estimators we have used for the CLRM parameters can also
be derived by the method of moments.
Method of moments
...
(1/n) Σ_{i=1}^{n} xiK (yi − β̂1 − β̂2 xi2 − ··· − β̂K xiK) = 0
...
(1/n) Σ_{i=1}^{n} ziL (yi − β̂1 − β̂2 xi2 − ··· − β̂K xiK) = 0
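A sketch showing that solving the sample moment conditions above amounts to solving the OLS normal equations X′X β̂ = X′y; the simulated data are assumed for illustration:

```python
import numpy as np

# Sketch: the K sample moment conditions (1/n) sum_i x_ik (y_i - x_i'beta_hat) = 0
# are the OLS normal equations X'X beta_hat = X'y (assumed simulated data).
rng = np.random.default_rng(4)
n = 500
X = np.column_stack([np.ones(n), rng.uniform(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_mom = np.linalg.solve(X.T @ X, X.T @ y)   # solve the moment conditions
print(beta_mom)
print(X.T @ (y - X @ beta_mom) / n)            # sample moments are ~0 at the solution
```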
θ = (θ1, ..., θm)′
For the normal LRM, the likelihood is:
L(β, σ²|y, X) = Π_{i=1}^{n} (2πσ²)^(−1/2) exp[−(yi − xi′β)²/(2σ²)]
In matrix form, the log-likelihood function is:
LL(β, σ²|y, X) = −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) (y − Xβ)′(y − Xβ)
Recall that:
(y − Xβ)′(y − Xβ) = y′y − 2β′X′y + β′X′Xβ
and
∂[(y − Xβ)′(y − Xβ)]/∂β = −2X′y + 2X′Xβ.
Maximum likelihood estimator
LRM parameters & Normal distribution (continued)
LL(β, σ²|y, X) = −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) (y − Xβ)′(y − Xβ)
∂LL/∂σ² = −n/(2σ²) + (1/(2σ⁴)) (y − Xβ)′(y − Xβ) = 0
is solved by:
σ̂² = (y − Xβ̂)′(y − Xβ̂)/n = u′u/n = SSR/n.
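A sketch of the MLE computed numerically (via scipy.optimize, assumed available) and compared with the closed-form results b_OLS and σ̂² = SSR/n; the data are simulated for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: numerical MLE of (beta, sigma^2) in the normal LRM, compared with the
# closed forms b_OLS and sigma2_hat = SSR/n (assumed simulated data).
rng = np.random.default_rng(5)
n = 300
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def neg_loglik(theta):
    beta, log_s2 = theta[:-1], theta[-1]   # sigma^2 = exp(log_s2) keeps it positive
    u = y - X @ beta
    s2 = np.exp(log_s2)
    return 0.5 * (n * np.log(2 * np.pi) + n * np.log(s2) + u @ u / s2)

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b_ols
print(res.x[:2], b_ols)              # ML slope estimates match OLS
print(np.exp(res.x[2]), e @ e / n)   # ML variance estimate matches SSR/n
```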
Asymptotic normality of θ̂
var(θ̂) = I[θ̂]⁻¹ = [ σ̂²(X′X)⁻¹   0 ;   0′   2σ̂⁴/n ]   (block-diagonal),
W = [h(θ̂) − q]′ {Asy.Var[h(θ̂) − q]}⁻¹ [h(θ̂) − q] ∼ χ²(r) under H0,
LM = [∂ log L(θ̂R)/∂θ̂R]′ I[θ̂R]⁻¹ [∂ log L(θ̂R)/∂θ̂R] ∼ χ²(r) under H0,
W ≥ LR ≥ LM.
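A sketch of the Wald statistic for one linear restriction, using the asymptotic variance σ̂²(X′X)⁻¹ from above; the restriction (R, q) and the simulated data are assumptions for illustration:

```python
import numpy as np
from scipy.stats import chi2

# Sketch: Wald test of H0: beta_2 = 0 (r = 1 restriction), with
# W = (Rb - q)' [R Avar(b) R']^{-1} (Rb - q); restriction and data are assumed.
rng = np.random.default_rng(6)
n = 400
X = np.column_stack([np.ones(n), rng.uniform(size=(n, 2))])
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=n)   # H0 holds in this DGP

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
avar_b = (e @ e / n) * np.linalg.inv(X.T @ X)             # sigma2_hat (X'X)^{-1}
R, q = np.array([[0.0, 1.0, 0.0]]), np.array([0.0])

diff = R @ b - q
W = diff @ np.linalg.solve(R @ avar_b @ R.T, diff)
print(W, 1 - chi2.cdf(W, df=1))                           # statistic and p-value
```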
MLE – summary
Quantile regression
Non-linear regression models
yi = h(xi , β) + εi
Example: AR(1) disturbances (violation of non-autocorrelation):
yt = xt′β + ut,   ut = ρut−1 + εt,
yt = xt′β + ρut−1 + εt;   note: ut−1 = yt−1 − xt−1′β,
hence:
yt = ρyt−1 + xt′β − ρ(xt−1′β) + εt,
which is non-linear in the parameters (ρβ).
consi = β1 + β2 inci^β3 + εi
special case: the model is linear for β3 = 1
(such an assumption can be tested).
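A sketch estimating the consumption example by nonlinear least squares with scipy's curve_fit (assumed available); the simulated data and true β3 = 0.8 are assumptions:

```python
import numpy as np
from scipy.optimize import curve_fit

# Sketch: NLS estimation of cons_i = b1 + b2 * inc_i**b3 + e_i via curve_fit;
# the simulated data and true b3 = 0.8 are assumptions for illustration.
rng = np.random.default_rng(7)
inc = rng.uniform(10, 100, size=500)
cons = 5.0 + 1.5 * inc**0.8 + rng.normal(scale=2.0, size=500)

def f(inc, b1, b2, b3):
    return b1 + b2 * inc**b3

est, cov = curve_fit(f, inc, cons, p0=[1.0, 1.0, 1.0])
print(est)                       # estimates of (b1, b2, b3)
print(np.sqrt(np.diag(cov)))     # standard errors; H0: b3 = 1 testable via t-ratio
```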
Nonlinear regression: examples
min: Σ_{i=1}^{N} ρ(ei) = Σ_{i=1}^{N} ρ(yi − xi′β̂)
[Figure: loss function ρ(e_i) plotted against e_i]
Quantile regression - example and motivation
(1) wagei = β1 + ui
(2) wagei = β1 + β2 femalei + ui
(3) wagei = β1 + β2 femalei + β3 experi + ui
Different values of τ (e.g. τ = 0.25, 0.5, 0.9) are reflected in
quantile-specific estimates β̂q.
[Figure: check functions ρτ(e_i) for different values of τ]
Quantile regression (QREG)
[Figure: quantile-regression coefficient estimates across quantiles
for selected regressors (AGE, ADEPCNT)]
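A sketch of quantile regression at several values of τ using statsmodels' QuantReg (assumed available); the wage-type data are simulated for illustration:

```python
import numpy as np
import statsmodels.api as sm

# Sketch, assuming statsmodels is available: quantile regression of a wage-type
# outcome on a constant and experience at tau = 0.25, 0.5, 0.9 (simulated data).
rng = np.random.default_rng(8)
n = 1000
exper = rng.uniform(0, 30, size=n)
wage = 5 + 0.3 * exper + rng.gamma(2.0, scale=1 + 0.1 * exper, size=n)  # skewed errors
X = sm.add_constant(exper)

for tau in (0.25, 0.5, 0.9):
    res = sm.QuantReg(wage, X).fit(q=tau)
    print(tau, res.params)       # quantile-specific estimates
```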
Reparametrized CLRM:
ŷp = β̂1*
s.e.(ŷp) = s.e.(β̂1*), i.e.
var(ŷp) = var(β̂1*)
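A sketch of the reparametrization trick: centering the regressor at the prediction point c makes the new intercept equal to ŷp, with s.e.(ŷp) read off directly as s.e.(β̂1*); the data and c are assumed:

```python
import numpy as np

# Sketch: reparametrized CLRM.  Regressing y on (x - c) makes the new intercept
# equal to y_hat_p at x = c, with s.e.(y_hat_p) = s.e.(b1*); data and c assumed.
rng = np.random.default_rng(9)
n = 200
x = rng.uniform(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
c = 0.7                                       # prediction point (assumed)

def ols(X, y):
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    se = np.sqrt(np.diag(e @ e / (len(y) - X.shape[1]) * XtX_inv))
    return b, se

b, se = ols(np.column_stack([np.ones(n), x]), y)                 # original model
b_star, se_star = ols(np.column_stack([np.ones(n), x - c]), y)   # reparametrized
print(b[0] + b[1] * c, b_star[0])    # same point prediction y_hat_p
print(se_star[0])                    # s.e. of the mean prediction at x = c
```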
Predictions - basics
Prediction error
because var(β1 + β2c2 + β3c3 + ··· + βKcK) = 0 (the βk and ck are non-random constants).
Predictions - basics
log(y) = β1 + β2x2 + ··· + βKxK + ε
fitted values: log(y)^ = β̂1 + β̂2x2 + ··· + β̂KxK
ŷ = exp(log(y)^) systematically underestimates y,
we can use a correction: ŷ = α̂0 · exp(log(y)^),
where α̂0 = n⁻¹ Σ_{i=1}^{n} exp(ε̂i)
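A sketch of the retransformation correction above; the simulated log-normal data are an assumption for illustration:

```python
import numpy as np

# Sketch: the naive exp(fitted log(y)) underestimates y on average; the
# correction a0_hat = n^{-1} sum_i exp(e_hat_i) rescales it (simulated data).
rng = np.random.default_rng(10)
n = 5000
x = rng.uniform(size=n)
logy = 0.5 + 1.0 * x + rng.normal(scale=0.8, size=n)
y = np.exp(logy)

X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ logy)
fitted = X @ b
resid = logy - fitted

a0_hat = np.mean(np.exp(resid))            # correction factor
print(np.mean(y))                          # average observed y
print(np.mean(np.exp(fitted)))             # naive retransformation: too low
print(np.mean(a0_hat * np.exp(fitted)))    # corrected prediction: much closer
```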
ŷp = xp′β̂
Prediction intervals:
Individual vs. mean value prediction intervals:
Reliability of predictions:
MSE_Te = (1/m) Σ_{i∈Te} [yi − f̂(xi)]²
Variance vs. Bias trade-off
CV(k) = (1/k) Σ_{s=1}^{k} MSEs.
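A minimal k-fold cross-validation sketch computing CV(k) = (1/k) Σ MSEs for an OLS fit; the simulated data and k = 5 are assumptions for illustration:

```python
import numpy as np

# Sketch: k-fold cross-validation, CV(k) = (1/k) * sum_s MSE_s, for an OLS fit
# (simulated data; k = 5 folds are assumptions for illustration).
rng = np.random.default_rng(11)
n, k = 500, 5
x = rng.uniform(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

idx = rng.permutation(n)
folds = np.array_split(idx, k)

mse = []
for s in range(k):
    test = folds[s]
    train = np.concatenate([folds[j] for j in range(k) if j != s])
    b = np.linalg.solve(X[train].T @ X[train], X[train].T @ y[train])
    mse.append(np.mean((y[test] - X[test] @ b) ** 2))   # MSE_s on held-out fold

print(np.mean(mse))                                      # CV(k)
```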