Block 1
y = Xβ + ε
LRM assumptions (for OLS estimation):
(Notation follows Greene, Econometric Analysis, 7th ed.)
y = Xβ + ε
LRM assumptions (continued):
A4 Homoscedastic & nonautocorrelated disturbances:
E[εε′|X] = σ²In
Homoscedasticity: var[εi|X] = σ², ∀ i = 1, ..., n.
Nonautocorrelation: cov[εt, εs|X] = 0, ∀ t ≠ s.
GARCH-type models [e.g. ARCH(1): var[εt|εt−1] = σ² + α·εt−1²]
do not violate the conditional variance assumption
var[εi|X] = σ². However, var[εt|εt−1] ≠ var[εt];
conditioning on X is omitted from the notation but remains
implicit.
A5 DGP of X: Variables in X may be fixed or random.
A6 Normal distribution of disturbances:
ε|X ∼ N[0, σ²In].
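A minimal simulation sketch of a data-generating process satisfying A1–A6; the parameter values, sample size, and uniform regressors are assumptions chosen for illustration, not taken from the slides:

```python
import numpy as np

# Minimal sketch (assumed values): simulate y = X b + e under A1-A6
# with fixed regressors and homoscedastic, uncorrelated normal disturbances.
rng = np.random.default_rng(0)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.uniform(size=(n, K - 1))])  # A2: full column rank
beta = np.array([1.0, 2.0, -0.5])                                # true parameters (assumed)
sigma = 1.0
eps = rng.normal(0.0, sigma, size=n)                             # A3, A4, A6: e|X ~ N(0, sigma^2 I_n)
y = X @ beta + eps                                               # A1: linearity
```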
Ordinary least squares (OLS)
y = Xβ + ε
The least squares estimator is unbiased (given A1 – A3):
β̂ = b = (X′X)⁻¹X′y = β + (X′X)⁻¹X′ε;
taking expectations:
E[b|X] = β + E[(X′X)⁻¹X′ε|X] = β (the second term is zero by A3).
Variance of the least squares estimator (A1 – A4):
var[b|X] = var[(X′X)⁻¹X′ε|X]
because var(β|X) = 0 (β is a fixed vector). Using A3 & A4:
var[b|X] = A σ²In A′, where A = (X′X)⁻¹X′
(the matrix analogue of var(cZ) = c²var(Z)),
= σ²(X′X)⁻¹,
using (AB)′ = B′A′ for dimensionally compatible matrices A, B.
Normal distribution of the least squares estimator (A1 – A6):
b|X ∼ N[β, σ²(X′X)⁻¹].
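A short numpy sketch of the formulas above, computing b = (X′X)⁻¹X′y and the estimated var[b|X] = s²(X′X)⁻¹; the simulated data and true parameters are assumed for illustration:

```python
import numpy as np

# Sketch: compute b = (X'X)^{-1} X'y and the estimated var[b|X] = s^2 (X'X)^{-1}
# on assumed simulated data.
rng = np.random.default_rng(1)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.uniform(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y              # OLS estimator
e = y - X @ b                      # residuals
s2 = e @ e / (n - K)               # unbiased estimator of sigma^2
var_b = s2 * XtX_inv               # estimated covariance matrix of b
print(b)                           # point estimates, close to beta
print(np.sqrt(np.diag(var_b)))     # standard errors
```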
General properties of estimators
Consistency: plim(θ̂) = θ.
θ̂ → θ as n → ∞: a sufficient condition is that θ̂ is (at least
asymptotically) unbiased and var(θ̂) → 0 as n → ∞.
Consistent estimators: unbiased or asymptotically unbiased,
with variance shrinking to zero as the sample size grows.
Consistency is a minimal requirement for any estimator used in
statistics or econometrics.
If an estimator is not consistent, it does not provide relevant
estimates of the population value θ even with unlimited data,
i.e. as n → ∞.
Unbiased estimators are not necessarily consistent.
Biased but consistent estimators are often useful
(small-sample bias, yet consistent: IV regression, ML, etc.).
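A hypothetical Monte Carlo sketch of a biased but consistent estimator, σ̂² = SSR/n; the design, sample sizes, and number of replications are assumptions:

```python
import numpy as np

# Hypothetical Monte Carlo: sigma2_hat = SSR/n is biased downward in small
# samples but consistent (true sigma^2 = 1; design and replications assumed).
rng = np.random.default_rng(2)

def mean_sigma2_hat(n, reps=2000):
    est = []
    for _ in range(reps):
        x = rng.uniform(size=n)
        y = 1.0 + 2.0 * x + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x])
        e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        est.append(e @ e / n)          # biased (divides by n, not n - K)
    return np.mean(est)

for n in (10, 50, 1000):
    print(n, round(mean_sigma2_hat(n), 3))   # approaches 1 as n grows
```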
Properties of estimators - classification:
Application example 1
Sample covariance is a consistent estimator of population
covariance.
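A small sketch, with an assumed bivariate normal population (covariance 0.8), showing the sample covariance approaching the population covariance as n grows:

```python
import numpy as np

# Sketch (assumed bivariate-normal population with covariance 0.8): the sample
# covariance approaches the population covariance as n grows.
rng = np.random.default_rng(3)
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
for n in (50, 500, 50_000):
    z = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)
    print(n, round(np.cov(z, rowvar=False)[0, 1], 3))
```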
Application example 2
The OLS estimators we have used for the CLRM parameters can also
be derived by the method of moments.
Method of moments
...
(1/n) Σ_{i=1}^{n} xiK (yi − β̂1 − β̂2 xi2 − ··· − β̂K xiK) = 0
...
(1/n) Σ_{i=1}^{n} ziL (yi − β̂1 − β̂2 xi2 − ··· − β̂K xiK) = 0
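A sketch showing that solving the sample moment conditions above amounts to solving the OLS normal equations X′X β̂ = X′y; the simulated data are assumed for illustration:

```python
import numpy as np

# Sketch: the K sample moment conditions (1/n) sum_i x_ik (y_i - x_i'beta_hat) = 0
# are the OLS normal equations X'X beta_hat = X'y (assumed simulated data).
rng = np.random.default_rng(4)
n = 500
X = np.column_stack([np.ones(n), rng.uniform(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_mom = np.linalg.solve(X.T @ X, X.T @ y)   # solve the moment conditions
print(beta_mom)
print(X.T @ (y - X @ beta_mom) / n)            # sample moments are ~0 at the solution
```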
θ = (θ1, ..., θm)′
For the normal LRM, the likelihood is:
L(β, σ²|y, X) = Π_{i=1}^{n} (2πσ²)^(−1/2) exp[−(yi − xi′β)²/(2σ²)]
In matrix form, the log-likelihood function is:
LL(β, σ²|y, X) = −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) (y − Xβ)′(y − Xβ)
Recall that:
(y − Xβ)′(y − Xβ) = y′y − 2β′X′y + β′X′Xβ
and
∂[(y − Xβ)′(y − Xβ)]/∂β = −2X′y + 2X′Xβ.
Maximum likelihood estimator
LRM parameters & Normal distribution (continued)
LL(β, σ²|y, X) = −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) (y − Xβ)′(y − Xβ)
∂LL/∂σ² = −n/(2σ²) + (1/(2σ⁴)) (y − Xβ)′(y − Xβ) = 0
is solved by:
σ̂² = (y − Xβ̂)′(y − Xβ̂)/n = u′u/n = SSR/n.
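A sketch of the MLE computed numerically (via scipy.optimize, assumed available) and compared with the closed-form results b_OLS and σ̂² = SSR/n; the data are simulated for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: numerical MLE of (beta, sigma^2) in the normal LRM, compared with the
# closed forms b_OLS and sigma2_hat = SSR/n (assumed simulated data).
rng = np.random.default_rng(5)
n = 300
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def neg_loglik(theta):
    beta, log_s2 = theta[:-1], theta[-1]   # sigma^2 = exp(log_s2) keeps it positive
    u = y - X @ beta
    s2 = np.exp(log_s2)
    return 0.5 * (n * np.log(2 * np.pi) + n * np.log(s2) + u @ u / s2)

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b_ols
print(res.x[:2], b_ols)              # ML slope estimates match OLS
print(np.exp(res.x[2]), e @ e / n)   # ML variance estimate matches SSR/n
```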
Asymptotic normality of θ̂
var(θ̂) = I[θ̂]⁻¹ = [ σ̂²(X′X)⁻¹   0 ;   0′   2σ̂⁴/n ]   (block-diagonal),
W = [h(θ̂) − q]′ {Asy.Var[h(θ̂) − q]}⁻¹ [h(θ̂) − q] ∼ χ²(r) under H0,
LM = [∂ log L(θ̂R)/∂θ̂R]′ I[θ̂R]⁻¹ [∂ log L(θ̂R)/∂θ̂R] ∼ χ²(r) under H0,
W ≥ LR ≥ LM.
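A sketch of the Wald statistic for one linear restriction, using the asymptotic variance σ̂²(X′X)⁻¹ from above; the restriction (R, q) and the simulated data are assumptions for illustration:

```python
import numpy as np
from scipy.stats import chi2

# Sketch: Wald test of H0: beta_2 = 0 (r = 1 restriction), with
# W = (Rb - q)' [R Avar(b) R']^{-1} (Rb - q); restriction and data are assumed.
rng = np.random.default_rng(6)
n = 400
X = np.column_stack([np.ones(n), rng.uniform(size=(n, 2))])
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=n)   # H0 holds in this DGP

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
avar_b = (e @ e / n) * np.linalg.inv(X.T @ X)             # sigma2_hat (X'X)^{-1}
R, q = np.array([[0.0, 1.0, 0.0]]), np.array([0.0])

diff = R @ b - q
W = diff @ np.linalg.solve(R @ avar_b @ R.T, diff)
print(W, 1 - chi2.cdf(W, df=1))                           # statistic and p-value
```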
MLE – summary
Quantile regression
Non-linear regression models
yi = h(xi , β) + εi
Example: AR(1) disturbances (violation of non-autocorrelation):
yt = xt′β + ut,   ut = ρut−1 + εt,
yt = xt′β + ρut−1 + εt;   note: ut−1 = yt−1 − xt−1′β,
hence:
yt = ρyt−1 + xt′β − ρ(xt−1′β) + εt,
which is non-linear in the parameters (ρβ).
consi = β1 + β2 inci^β3 + εi
special case: the model is linear for β3 = 1
(such an assumption can be tested).
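A sketch estimating the consumption example by nonlinear least squares with scipy's curve_fit (assumed available); the simulated data and true β3 = 0.8 are assumptions:

```python
import numpy as np
from scipy.optimize import curve_fit

# Sketch: NLS estimation of cons_i = b1 + b2 * inc_i**b3 + e_i via curve_fit;
# the simulated data and true b3 = 0.8 are assumptions for illustration.
rng = np.random.default_rng(7)
inc = rng.uniform(10, 100, size=500)
cons = 5.0 + 1.5 * inc**0.8 + rng.normal(scale=2.0, size=500)

def f(inc, b1, b2, b3):
    return b1 + b2 * inc**b3

est, cov = curve_fit(f, inc, cons, p0=[1.0, 1.0, 1.0])
print(est)                       # estimates of (b1, b2, b3)
print(np.sqrt(np.diag(cov)))     # standard errors; H0: b3 = 1 testable via t-ratio
```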
Nonlinear regression: examples
min: Σ_{i=1}^{N} ρ(ei) = Σ_{i=1}^{N} ρ(yi − xi′β̂)
[Figure: loss function ρ(e_i) plotted against e_i]
Quantile regression - example and motivation
(1) wagei = β1 + ui
(2) wagei = β1 + β2 femalei + ui
(3) wagei = β1 + β2 femalei + β3 experi + ui
Different values of τ (e.g. τ = 0.25, 0.5, 0.9) are reflected in
quantile-specific estimates β̂q.
[Figure: check functions ρτ(e_i) for different values of τ]
Quantile regression (QREG)
[Figure: quantile-regression coefficient estimates across quantiles
for selected regressors (AGE, ADEPCNT)]
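A sketch of quantile regression at several values of τ using statsmodels' QuantReg (assumed available); the wage-type data are simulated for illustration:

```python
import numpy as np
import statsmodels.api as sm

# Sketch, assuming statsmodels is available: quantile regression of a wage-type
# outcome on a constant and experience at tau = 0.25, 0.5, 0.9 (simulated data).
rng = np.random.default_rng(8)
n = 1000
exper = rng.uniform(0, 30, size=n)
wage = 5 + 0.3 * exper + rng.gamma(2.0, scale=1 + 0.1 * exper, size=n)  # skewed errors
X = sm.add_constant(exper)

for tau in (0.25, 0.5, 0.9):
    res = sm.QuantReg(wage, X).fit(q=tau)
    print(tau, res.params)       # quantile-specific estimates
```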
Reparametrized CLRM:
ŷp = β̂1*
s.e.(ŷp) = s.e.(β̂1*), i.e.
var(ŷp) = var(β̂1*)
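A sketch of the reparametrization trick: centering the regressor at the prediction point c makes the new intercept equal to ŷp, with s.e.(ŷp) read off directly as s.e.(β̂1*); the data and c are assumed:

```python
import numpy as np

# Sketch: reparametrized CLRM.  Regressing y on (x - c) makes the new intercept
# equal to y_hat_p at x = c, with s.e.(y_hat_p) = s.e.(b1*); data and c assumed.
rng = np.random.default_rng(9)
n = 200
x = rng.uniform(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
c = 0.7                                       # prediction point (assumed)

def ols(X, y):
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    se = np.sqrt(np.diag(e @ e / (len(y) - X.shape[1]) * XtX_inv))
    return b, se

b, se = ols(np.column_stack([np.ones(n), x]), y)                 # original model
b_star, se_star = ols(np.column_stack([np.ones(n), x - c]), y)   # reparametrized
print(b[0] + b[1] * c, b_star[0])    # same point prediction y_hat_p
print(se_star[0])                    # s.e. of the mean prediction at x = c
```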
Predictions - basics
Prediction error
because var(β1 + β2c2 + β3c3 + ··· + βKcK) = 0 (the βk and ck are non-random constants).
Predictions - basics
log(y) = β1 + β2x2 + ··· + βKxK + ε
fitted values: log(y)^ = β̂1 + β̂2x2 + ··· + β̂KxK
ŷ = exp(log(y)^) systematically underestimates y,
we can use a correction: ŷ = α̂0 · exp(log(y)^),
where α̂0 = n⁻¹ Σ_{i=1}^{n} exp(ε̂i)
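A sketch of the retransformation correction above; the simulated log-normal data are an assumption for illustration:

```python
import numpy as np

# Sketch: the naive exp(fitted log(y)) underestimates y on average; the
# correction a0_hat = n^{-1} sum_i exp(e_hat_i) rescales it (simulated data).
rng = np.random.default_rng(10)
n = 5000
x = rng.uniform(size=n)
logy = 0.5 + 1.0 * x + rng.normal(scale=0.8, size=n)
y = np.exp(logy)

X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ logy)
fitted = X @ b
resid = logy - fitted

a0_hat = np.mean(np.exp(resid))            # correction factor
print(np.mean(y))                          # average observed y
print(np.mean(np.exp(fitted)))             # naive retransformation: too low
print(np.mean(a0_hat * np.exp(fitted)))    # corrected prediction: much closer
```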
ŷp = xp′β̂
Prediction intervals:
Individual vs. mean value prediction intervals:
Reliability of predictions:
MSE_Te = (1/m) Σ_{i∈Te} [yi − f̂(xi)]²
Variance vs. Bias trade-off
CV(k) = (1/k) Σ_{s=1}^{k} MSEs.
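A minimal k-fold cross-validation sketch computing CV(k) = (1/k) Σ MSEs for an OLS fit; the simulated data and k = 5 are assumptions for illustration:

```python
import numpy as np

# Sketch: k-fold cross-validation, CV(k) = (1/k) * sum_s MSE_s, for an OLS fit
# (simulated data; k = 5 folds are assumptions for illustration).
rng = np.random.default_rng(11)
n, k = 500, 5
x = rng.uniform(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

idx = rng.permutation(n)
folds = np.array_split(idx, k)

mse = []
for s in range(k):
    test = folds[s]
    train = np.concatenate([folds[j] for j in range(k) if j != s])
    b = np.linalg.solve(X[train].T @ X[train], X[train].T @ y[train])
    mse.append(np.mean((y[test] - X[test] @ b) ** 2))   # MSE_s on held-out fold

print(np.mean(mse))                                      # CV(k)
```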