Nonlinear Least Squares Theory
CHUNG-MING KUAN
Department of Finance & CRETA
March 9, 2010
Lecture Outline
1 Nonlinear Specifications
2 The NLS Method
The NLS Estimator
Nonlinear Optimization Algorithms
3 Asymptotic Properties of the NLS Estimator
Digression: Uniform Law of Large Numbers
Consistency
Asymptotic Normality
Wald Tests
Nonlinear Specifications
Given the dependent variable y, consider the nonlinear specification:

y = f (x; β) + e(β),

where x is ℓ × 1, β is k × 1, and f is a given function. There are many
choices of f. A flexible approach is to transform one (or several) of the
regressors by the Box-Cox transform of x:

(x^γ − 1)/γ,

which yields x − 1 when γ = 1, 1 − 1/x when γ = −1, and a value close
to ln x when γ → 0.
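For concreteness, a minimal sketch of the Box-Cox transform in Python; the function name and the tolerance used for the γ → 0 limiting case are our choices:

```python
import numpy as np

def box_cox(x, gamma, eps=1e-8):
    """Box-Cox transform (x**gamma - 1)/gamma; returns ln(x) as gamma -> 0."""
    x = np.asarray(x, dtype=float)
    if abs(gamma) < eps:
        return np.log(x)              # limiting case gamma -> 0
    return (x**gamma - 1.0) / gamma

x = np.array([0.5, 1.0, 2.0])
print(box_cox(x, 1.0))    # x - 1
print(box_cox(x, -1.0))   # 1 - 1/x
print(box_cox(x, 1e-12))  # ln(x) branch, the gamma -> 0 limit
```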
The CES (constant elasticity of substitution) production function:

y = α [δ L^{−γ} + (1 − δ) K^{−γ}]^{−λ/γ},

where α > 0, 0 < δ < 1 and γ ≥ −1, which yields:

ln y = ln α − (λ/γ) ln[δ L^{−γ} + (1 − δ) K^{−γ}].
The translog (transcendental logarithmic) production function:
ln y = β1 + β2 ln L + β3 ln K + β4 (ln L)(ln K) + β5 (ln L)² + β6 (ln K)²,
which is linear in parameters; in this case, the OLS method suffices.
Nonlinear Time Series Models
An exponential autoregressive (EXPAR) model:

yt = Σ_{j=1}^{p} [αj + βj exp(−γ yt−1²)] yt−j + et .
A self-exciting threshold autoregressive (SETAR) model:

yt = a0 + a1 yt−1 + · · · + ap yt−p + et ,  if yt−d ∈ (−∞, c],
yt = b0 + b1 yt−1 + · · · + bp yt−p + et ,  if yt−d ∈ (c, ∞),

where 1 ≤ d ≤ p is the delay parameter, and c is the threshold
parameter. Alternatively,

yt = a0 + Σ_{j=1}^{p} aj yt−j + (δ0 + Σ_{j=1}^{p} δj yt−j) 1{yt−d > c} + et ,

with aj + δj = bj .
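A small simulation sketch of the SETAR model may help fix ideas; the function name, coefficient values, and the zero initial values are illustrative:

```python
import numpy as np

def simulate_setar(T, a, b, c, d, sigma=1.0, seed=0):
    """Simulate a SETAR(p) path: coefficients a = (a0, ..., ap) apply
    when y[t-d] <= c, and b = (b0, ..., bp) when y[t-d] > c."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    rng = np.random.default_rng(seed)
    p = len(a) - 1
    y = np.zeros(T + p)                      # zero initial values
    for t in range(p, T + p):
        coef = a if y[t - d] <= c else b     # pick the active regime
        lags = y[t - p:t][::-1]              # y[t-1], ..., y[t-p]
        y[t] = coef[0] + coef[1:] @ lags + sigma * rng.standard_normal()
    return y[p:]

y = simulate_setar(200, a=[0.0, 0.6], b=[1.0, -0.4], c=0.0, d=1)
```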
Replacing the indicator function in the SETAR model with a “smooth”
function h, we obtain the smooth threshold autoregressive (STAR)
model:

yt = a0 + Σ_{j=1}^{p} aj yt−j + (δ0 + Σ_{j=1}^{p} δj yt−j) h(yt−d ; c, s) + et ,

where h is a distribution function, e.g.,

h(yt−d ; c, s) = 1 / (1 + exp[−(yt−d − c)/s]),

with c the threshold value and s a scale parameter. The STAR model
admits smooth transitions between different regimes, and it behaves
like a SETAR model when |(yt−d − c)/s| is large.
Artificial Neural Networks
A 3-layer neural network can be expressed as

f (x1 , . . . , xp ; β) = g(α0 + Σ_{i=1}^{q} αi h(γi0 + Σ_{j=1}^{p} γij xj)),
which contains p input units, q hidden units, and one output unit. The
functions h and g are known as activation functions, and the parameters
in these functions are connection weights.
h is typically an S-shaped function; two leading choices are the logistic
function h(x) = 1/(1 + e^{−x}) and the hyperbolic tangent function

h(x) = (e^x − e^{−x}) / (e^x + e^{−x}).
The function g may be the identity function or the same as h.
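For concreteness, a direct transcription of this network into Python; the names and array shapes are our conventions, with g defaulting to the identity and h to the hyperbolic tangent:

```python
import numpy as np

def nn_3layer(x, alpha, gamma, g=lambda u: u, h=np.tanh):
    """f(x; beta) = g(alpha0 + sum_i alpha_i * h(gamma_i0 + gamma_i' x)).
    alpha: (q+1,) vector; gamma: (q, p+1) matrix; x: (p,) input."""
    hidden = h(gamma[:, 0] + gamma[:, 1:] @ x)   # q hidden-unit activations
    return g(alpha[0] + alpha[1:] @ hidden)      # single output unit

p, q = 3, 5
rng = np.random.default_rng(0)
out = nn_3layer(rng.standard_normal(p),
                alpha=rng.standard_normal(q + 1),
                gamma=rng.standard_normal((q, p + 1)))
```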
Artificial neural networks are designed to mimic the behavior of biological
neural systems and have the following properties.
Universal approximation: A neural network is capable of approximating
any Borel-measurable function to any degree of accuracy, provided
that q is sufficiently large. In this sense, a neural network can be
understood as a series expansion, with the hidden-unit functions
serving as the basis functions.
Parsimonious model: To achieve a given degree of approximation
accuracy, neural networks are simpler than the polynomial and
trigonometric expansions, in the sense that the number of hidden
units q can grow at a much slower rate.
The NLS Estimator
The NLS criterion function:

QT (β) = (1/T) [y − f(x1 , . . . , xT ; β)]′ [y − f(x1 , . . . , xT ; β)]
       = (1/T) Σ_{t=1}^{T} [yt − f (xt ; β)]².
The first order condition sets the gradient to zero, giving k nonlinear
equations in k unknowns:

∇β QT (β) = −(2/T) ∇β f(x1 , . . . , xT ; β) [y − f(x1 , . . . , xT ; β)] = 0,

where ∇β f(x1 , . . . , xT ; β) is a k × T matrix. A solution to the first
order condition is the NLS estimator β̂T .
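In practice the first order condition is solved numerically. A minimal sketch using scipy.optimize.least_squares; the exponential specification f (x; β) = β1 exp(β2 x), the simulated data, and all parameter values are purely illustrative:

```python
import numpy as np
from scipy.optimize import least_squares

# Illustrative specification: f(x; beta) = beta1 * exp(beta2 * x).
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=100)
y = 1.5 * np.exp(0.8 * x) + 0.1 * rng.standard_normal(100)

def residuals(beta):
    return y - beta[0] * np.exp(beta[1] * x)   # y_t - f(x_t; beta)

fit = least_squares(residuals, x0=[1.0, 0.5])  # minimizes the NLS criterion
print(fit.x)  # NLS estimate of (beta1, beta2)
```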
[ID-2] f (x; ·) is twice continuously differentiable in the second argument
on Θ1 , such that for given data (yt , xt ), t = 1, . . . , T , ∇²β QT (β̂T ) is
positive definite.

While [ID-2] ensures that β̂T is a minimum of QT (β), it does not
guarantee uniqueness: for a given data set, there may exist multiple
local minima of QT (β).

For linear regressions, f(β) = Xβ, so that ∇β f(β) = X′ and
∇²β f(β) = 0. It follows that ∇²β QT (β) = 2(X′X)/T , which is
positive definite if, and only if, X has full column rank. Note that in
linear regression, the identification condition does not depend on β.
Nonlinear Optimization Algorithms
An NLS estimate is usually computed using a numerical method. In
particular, an iterative algorithm starts from some initial value of the
parameter and repeatedly computes the next value according to a
particular updating rule until an optimum is (approximately) reached.

A generic iterative algorithm is

β(i+1) = β(i) + s(i) d(i) .

That is, the (i + 1)th iterated value β(i+1) is obtained from β(i) with an
adjustment term s(i) d(i) , where d(i) characterizes the direction of change
in the parameter space and s(i) controls the amount of change. Note that
an iterative algorithm can only locate a local optimum.
Gradient Method
The first-order Taylor expansion of QT (β) about β† is

QT (β) ≈ QT (β†) + [∇β QT (β†)]′ (β − β†).

Replacing β with β(i+1) and β† with β(i),

QT (β(i+1)) ≈ QT (β(i)) + [∇β QT (β(i))]′ s(i) d(i) .

Setting d(i) = −g(i) , where g(i) is ∇β QT (β) evaluated at β(i) , we have

QT (β(i+1)) ≈ QT (β(i)) − s(i) g(i)′ g(i) ,

where g(i)′ g(i) ≥ 0. This leads to the gradient-descent update:

β(i+1) = β(i) − s(i) g(i) .
Steepest Descent Algorithm
To determine the optimal step length, set the derivative of QT (β(i+1))
with respect to s(i) to zero:

∂QT (β(i+1))/∂s(i) = [∇β QT (β(i+1))]′ ∂β(i+1)/∂s(i) = −g(i+1)′ g(i) = 0.

Let H(i) denote ∇²β QT (β) evaluated at β(i) . By Taylor’s expansion of g,

g(i+1) ≈ g(i) + H(i) (β(i+1) − β(i)) = g(i) − s(i) H(i) g(i) .

Thus, 0 = g(i+1)′ g(i) ≈ g(i)′ g(i) − s(i) g(i)′ H(i) g(i) , or equivalently,

s(i) = g(i)′ g(i) / (g(i)′ H(i) g(i)) ≥ 0,

when H(i) is p.d. We obtain the steepest descent algorithm:

β(i+1) = β(i) − [ g(i)′ g(i) / (g(i)′ H(i) g(i)) ] g(i) .
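A minimal sketch of one steepest-descent step with this optimal step length; the quadratic objective, its coefficient matrix, and the iteration count are illustrative:

```python
import numpy as np

def steepest_descent_step(beta, grad, hess):
    """One update beta - s*g with the optimal step length
    s = g'g / (g'Hg), valid when H is positive definite."""
    g, H = grad(beta), hess(beta)
    s = (g @ g) / (g @ H @ g)
    return beta - s * g

# Illustrative quadratic objective Q(beta) = 0.5 * beta' A beta,
# with gradient A @ beta and Hessian A; the minimizer is the origin.
A = np.array([[2.0, 0.3], [0.3, 1.0]])
beta = np.array([1.0, -1.0])
for _ in range(50):
    beta = steepest_descent_step(beta, lambda b: A @ b, lambda b: A)
print(beta)  # converges toward (0, 0)
```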
Newton Method
The Newton method takes into account the second order derivatives.
Consider the second-order Taylor expansion of QT (β) around some β†:

QT (β) ≈ QT (β†) + g†′ (β − β†) + (1/2) (β − β†)′ H† (β − β†).

The first order condition of QT (β) is g† + H† (β − β†) ≈ 0, so that

β ≈ β† − (H†)^{−1} g† .

This suggests the following Newton-Raphson algorithm:

β(i+1) = β(i) − (H(i))^{−1} g(i) ,

with the step length 1 and the direction vector −(H(i))^{−1} g(i) .
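As a quick illustration, a minimal Newton-Raphson step in Python; the quadratic objective is our choice, and on it the second-order expansion is exact, so the minimum is reached in a single step, as noted below:

```python
import numpy as np

def newton_step(beta, grad, hess):
    """One Newton-Raphson update: beta - H(beta)^{-1} g(beta)."""
    return beta - np.linalg.solve(hess(beta), grad(beta))

# On the quadratic Q(beta) = 0.5 * beta' A beta one step lands on the
# minimizer (here, the origin).
A = np.array([[2.0, 0.3], [0.3, 1.0]])
beta = newton_step(np.array([1.0, -1.0]), lambda b: A @ b, lambda b: A)
print(beta)  # (0, 0) up to rounding
```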
From Taylor’s expansion it is easy to see that

QT (β(i+1)) − QT (β(i)) ≈ −(1/2) g(i)′ (H(i))^{−1} g(i) ≤ 0,

provided that H(i) is p.s.d. Thus, the Newton-Raphson algorithm usually
results in a decrease of QT .
When QT is (locally) quadratic, the second-order expansion is exact, so
that β = β† − (H†)^{−1} g† must be a minimum of QT (β). This immediately
suggests that the Newton-Raphson algorithm can reach the minimum in a
single step. Yet, there are two drawbacks:
The Hessian matrix need not be positive definite.
The Hessian matrix must be inverted at each iteration step.
Gauss-Newton Algorithm
Letting Ξ(β) = ∇β f(β)′ (a T × k matrix), we have

H(β) = −(2/T) ∇²β f(β)[y − f(β)] + (2/T) Ξ(β)′ Ξ(β).

Ignoring the first term, an approximation to H(β) is 2 Ξ(β)′ Ξ(β)/T ,
which requires only the first order derivatives and is guaranteed to be
p.s.d. The Gauss-Newton algorithm utilizes this approximation as

β(i+1) = β(i) + [Ξ(β(i))′ Ξ(β(i))]^{−1} Ξ(β(i))′ [y − f(β(i))].

Note that the adjustment term can be obtained as the OLS estimate from
regressing y − f(β(i)) on Ξ(β(i)); this is known as the Gauss-Newton
regression.
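A minimal sketch of the Gauss-Newton iteration under these definitions; the model, data, and helper names (gauss_newton, jac) are our choices, and each update is computed exactly as the OLS coefficient of the Gauss-Newton regression:

```python
import numpy as np

def gauss_newton(y, f, jac, beta0, max_iter=50, tol=1e-8):
    """Gauss-Newton iterations: the adjustment term is the OLS estimate
    from regressing y - f(beta) on Xi(beta)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        u = y - f(beta)                        # current residuals
        Xi = jac(beta)                         # T x k derivative matrix
        step, *_ = np.linalg.lstsq(Xi, u, rcond=None)
        beta = beta + step
        if np.linalg.norm(step) < tol:         # a convergence criterion
            break
    return beta

# Illustrative model: f(x; beta) = beta1 * exp(beta2 * x).
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, 100)
y = 1.5 * np.exp(0.8 * x) + 0.1 * rng.standard_normal(100)
f = lambda b: b[0] * np.exp(b[1] * x)
jac = lambda b: np.column_stack([np.exp(b[1] * x),
                                 b[0] * x * np.exp(b[1] * x)])
print(gauss_newton(y, f, jac, beta0=[1.0, 0.5]))
```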
Other Modifications
To maintain a correct search direction, H(i) needs to be p.d.

Correcting H(i) by Hc(i) = H(i) + c(i) I, where c(i) > 0 is chosen to
“force” Hc(i) to be p.d.

For H̃(i) = (H(i))^{−1}, one may compute H̃c(i) = H̃(i) + cI. Such a correction
is used in the Marquardt-Levenberg algorithm.

The quasi-Newton method corrects H̃(i) by a symmetric matrix:

H̃(i+1) = H̃(i) + C(i) .

This is used by the Davidon-Fletcher-Powell (DFP) algorithm and the
Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm.
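For illustration, a sketch of the first correction above, Hc(i) = H(i) + c(i) I, with c increased until positive definiteness holds; the starting value of c and the multiplication factor are arbitrary choices in the spirit of the Marquardt-Levenberg safeguard:

```python
import numpy as np

def corrected_newton_step(beta, grad, hess, c=1e-2):
    """Newton step with the correction Hc = H + c*I: increase c until
    Hc is positive definite, then move along -Hc^{-1} g."""
    g, H = grad(beta), hess(beta)
    I = np.eye(len(beta))
    while True:
        try:
            np.linalg.cholesky(H + c * I)   # succeeds iff Hc is p.d.
            break
        except np.linalg.LinAlgError:
            c *= 10.0                       # "force" positive definiteness
    return beta - np.linalg.solve(H + c * I, g)
```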
Initial Values and Convergence Criteria
Initial values: Specified by the researcher or obtained using a random
number generator. Prior information, if available, should also be
taken into account.
Convergence criteria:
‖β(i+1) − β(i)‖ < c, where ‖ · ‖ denotes the Euclidean norm,
‖g(β(i))‖ < c, or
|QT (β(i+1)) − QT (β(i))| < c.
For the Gauss-Newton algorithm, one may stop the algorithm when
TR² is “close” to zero, where R² is the coefficient of determination of
the Gauss-Newton regression.
Digression: Uniform Law of Large Numbers
Consider the function q(zt (ω); θ). It is a random variable for a given θ
and a function of θ for a given ω. Suppose {q(zt ; θ)} obeys a SLLN for
each θ ∈ Θ:

QT (ω; θ) = (1/T) Σ_{t=1}^{T} q(zt (ω); θ) →a.s. Q(θ),

where Q(θ) is non-stochastic. Note that the non-convergence set
Ω0^c(θ) = {ω : QT (ω; θ) ↛ Q(θ)} varies with θ.

Although IP(Ω0^c(θ)) = 0, ∪θ∈Θ Ω0^c(θ) is an uncountable union of
non-convergence sets and may not have probability zero.

Equivalently, ∩θ∈Θ Ω0(θ) may occur with probability less than one.
When θ also depends on T (e.g., when θ is replaced by an estimator θ̃T ),
there may not exist a finite T∗ such that QT (ω; θ̃T ) is arbitrarily close to
Q(θ̃T ) for all T > T∗ . Thus, we need a notion of convergence that is
uniform on the parameter space Θ.

We say that QT (ω; θ) converges to Q(θ) uniformly in θ almost surely (in
probability) if

sup_{θ∈Θ} |QT (θ) − Q(θ)| → 0, a.s. (in probability).

We then say that {q(zt (ω); θ)} obeys a strong (or weak) uniform law of
large numbers (SULLN or WULLN).
Example: Let zt be i.i.d. with zero mean and

qT (zt (ω); θ) = zt (ω) +
  Tθ,       0 ≤ θ ≤ 1/(2T),
  1 − Tθ,   1/(2T) < θ ≤ 1/T,
  0,        1/T < θ < ∞.

Observe that for θ ≥ 1/T and θ = 0,

QT (ω; θ) = (1/T) Σ_{t=1}^{T} qT (zt ; θ) = (1/T) Σ_{t=1}^{T} zt →a.s. 0,

by Kolmogorov’s SLLN. For a given θ, we can choose T large enough such
that QT (ω; θ) →a.s. 0, where 0 is the pointwise limit. Yet for Θ = [0, ∞),

sup_{θ∈Θ} |QT (ω; θ)| = |z̄T + 1/2| →a.s. 1/2,

so that the uniform limit is different from the pointwise limit.
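A small simulation of this example (seed and grid size are arbitrary) shows the supremum staying near 1/2 while each pointwise value tends to 0:

```python
import numpy as np

def Q_T(z_bar, theta, T):
    """Q_T(omega; theta) = z_bar + tent(theta); the tent peaks at 1/2
    when theta = 1/(2T) and vanishes for theta >= 1/T."""
    if theta <= 1.0 / (2 * T):
        return z_bar + T * theta
    if theta <= 1.0 / T:
        return z_bar + 1.0 - T * theta
    return z_bar

rng = np.random.default_rng(0)
for T in (10, 1000, 100000):
    z_bar = rng.standard_normal(T).mean()    # mean of i.i.d. z_t
    grid = np.linspace(0.0, 2.0 / T, 2001)   # covers the tent region
    sup_q = max(abs(Q_T(z_bar, th, T)) for th in grid)
    print(T, round(sup_q, 3))  # tends to 1/2, not the pointwise limit 0
```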
What is the extra condition needed to ensure a SULLN if we already have,
for each θ ∈ Θ,

QT (θ) = (1/T) Σ_{t=1}^{T} [qTt (zt ; θ) − IE(qTt (zt ; θ))] →a.s. 0?

Suppose QT (θ) satisfies a Lipschitz-type condition: for θ and θ† in Θ,

|QT (θ) − QT (θ†)| ≤ CT ‖θ − θ†‖ a.s.,

where |CT | ≤ ∆ a.s. and ∆ does not depend on θ. Then,

sup_{θ∈Θ} |QT (θ)| ≤ sup_{θ∈Θ} |QT (θ) − QT (θ†)| + |QT (θ†)|.
Given ε > 0, we can choose θ† such that ‖θ − θ†‖ < ε/(2∆). Then,

sup_{θ∈Θ} |QT (θ) − QT (θ†)| ≤ CT ε/(2∆) ≤ ε/2,

uniformly in T. (For compact Θ, finitely many such θ† cover Θ, and the
same argument applies to each piece.) Also, by pointwise convergence of
QT , |QT (θ†)| < ε/2 for large T. Consequently, for all T sufficiently large,

sup_{θ∈Θ} |QT (θ)| ≤ ε.

This shows that pointwise convergence and a Lipschitz condition on QT
together suffice for a SULLN or WULLN.
Consistency
The NLS criterion function is QT (β) = (1/T) Σ_{t=1}^{T} [yt − f (xt ; β)]², and its
minimizer is the NLS estimator β̂T . Suppose IE[QT (β)] is continuous on
Θ1 such that βo is its unique, global minimum. If QT (β) is close to
IE[QT (β)], we would expect β̂T to be close to βo .

To see this, assume that QT obeys a SULLN:

sup_{β∈Θ1} |QT (β) − IE[QT (β)]| → 0,

for all ω ∈ Ω0 with IP(Ω0 ) = 1. Set

ε = inf_{β∈B^c∩Θ1} ( IE[QT (β)] − IE[QT (βo )] ),

for an open neighborhood B of βo .
For ω ∈ Ω0 , we have for large T that IE[QT (β̂T )] − QT (β̂T ) < ε/2, and

QT (β̂T ) − IE[QT (βo )] ≤ QT (βo ) − IE[QT (βo )] < ε/2,

because the NLS estimator β̂T minimizes QT (β). It follows that

IE[QT (β̂T )] − IE[QT (βo )]
  ≤ {IE[QT (β̂T )] − QT (β̂T )} + {QT (β̂T ) − IE[QT (βo )]} < ε,

for all T sufficiently large. As IE[QT (β̂T )] is within ε of IE[QT (βo )] with
probability one, β̂T cannot lie outside the neighborhood B of βo (by the
definition of ε). As B is arbitrary, β̂T must converge to βo almost surely.
Q: How do we ensure a SULLN or WULLN?
If Θ1 is compact and convex, we have from the mean-value theorem and
the Cauchy-Schwarz inequality that

|QT (β) − QT (β†)| ≤ ‖∇β QT (β‡)‖ ‖β − β†‖ a.s.,

where β‡ is the mean value of β and β†, in the sense that
‖β‡ − β†‖ ≤ ‖β − β†‖. Hence, the Lipschitz-type condition would hold for

CT = sup_{β∈Θ1} ‖∇β QT (β)‖,

with ∇β QT (β) = −(2/T) Σ_{t=1}^{T} ∇β f (xt ; β)[yt − f (xt ; β)]. Note that
∇β QT (β) may be bounded in probability, but it may not be bounded in
an almost sure sense. (Why?)
We impose the following conditions.

[C1] {(yt , w′t )′} is a sequence of random vectors, and xt is a vector
containing some elements of Y^{t−1} and W^t .
(i) The sequences {yt²}, {yt f (xt ; β)} and {f (xt ; β)²} all obey a WLLN for
each β in Θ1 , where Θ1 is compact and convex.
(ii) yt , f (xt ; β) and ∇β f (xt ; β) all have bounded second moments
uniformly in β.

[C2] There exists a unique parameter vector βo such that
IE(yt | Y^{t−1}, W^t ) = f (xt ; βo ).

Theorem 8.1
Given the nonlinear specification y = f (x; β) + e(β), suppose that [C1]
and [C2] hold. Then β̂T →IP βo .
Remark: Theorem 8.1 is not fully satisfactory because it deals only with
convergence to the global minimum, yet an iterative algorithm is not
guaranteed to find a global minimum of the NLS objective function.
Hence, it is more reasonable to expect the NLS estimator to converge to
some local minimum of IE[QT (β)]. Therefore, we shall, in what follows,
assert only that the NLS estimator converges in probability to a local
minimum β∗ of IE[QT (β)]. In this case, f (x; β∗) is, at most, an
approximation to the conditional mean function.
Asymptotic Normality
By the mean-value expansion of ∇β QT (β̂T ) about β∗,

0 = ∇β QT (β̂T ) = ∇β QT (β∗) + ∇²β QT (β†T )(β̂T − β∗),

where β†T is a mean value of β̂T and β∗. Thus, when ∇²β QT (β†T ) is
invertible, we have

√T (β̂T − β∗) = −[∇²β QT (β†T )]^{−1} √T ∇β QT (β∗)
             = −HT (β∗)^{−1} √T ∇β QT (β∗) + oIP (1),

where HT (β) = IE[∇²β QT (β)]. That is, √T (β̂T − β∗) and
−HT (β∗)^{−1} √T ∇β QT (β∗) are asymptotically equivalent.
Under suitable conditions,

√T ∇β QT (β∗) = −(2/√T) Σ_{t=1}^{T} ∇β f (xt ; β∗)[yt − f (xt ; β∗)]

obeys a CLT, i.e., (V∗T )^{−1/2} √T ∇β QT (β∗) →D N(0, Ik ), where

V∗T = var( (2/√T) Σ_{t=1}^{T} ∇β f (xt ; β∗)[yt − f (xt ; β∗)] ).

Then for D∗T = HT (β∗)^{−1} V∗T HT (β∗)^{−1},

(D∗T )^{−1/2} HT (β∗)^{−1} √T ∇β QT (β∗) →D N(0, Ik ).

By asymptotic equivalence,

(D∗T )^{−1/2} √T (β̂T − β∗) →D N(0, Ik ).
When D∗T is replaced by a consistent estimator D̂T ,

D̂T^{−1/2} √T (β̂T − β∗) →D N(0, Ik ).

Note that

HT (β∗) = (2/T) Σ_{t=1}^{T} IE[∇β f (xt ; β∗) ∇β f (xt ; β∗)′]
        − (2/T) Σ_{t=1}^{T} IE[∇²β f (xt ; β∗)(yt − f (xt ; β∗))],

which can be consistently estimated by its sample counterpart:

ĤT = (2/T) Σ_{t=1}^{T} ∇β f (xt ; β̂T ) ∇β f (xt ; β̂T )′
   − (2/T) Σ_{t=1}^{T} ∇²β f (xt ; β̂T ) êt .
When εt = yt − f (xt ; β∗) are uncorrelated with ∇²β f (xt ; β∗), HT (β∗)
depends only on the expectation of the outer product of ∇β f (xt ; β∗), so
that ĤT may be simplified as

ĤT = (2/T) Σ_{t=1}^{T} ∇β f (xt ; β̂T ) ∇β f (xt ; β̂T )′ .

This is analogous to estimating Mxx by Σ_{t=1}^{T} xt x′t /T in linear regressions.

If {εt } is not a martingale difference sequence with respect to Y^{t−1} and
W^t , V∗T can be consistently estimated using a Newey-West type
estimator. This is more likely in practice, as the NLS estimator typically
converges to a local optimum β∗.
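A sketch of these estimators for the martingale-difference case: the simplified ĤT and an outer-product estimate of V∗T , combined into the sandwich D̂T . The function and variable names are our choices, and a Newey-West version would add weighted autocovariances of the scores:

```python
import numpy as np

def nls_sandwich_cov(grad_f, e_hat):
    """Sandwich estimator D_hat = H_hat^{-1} V_hat H_hat^{-1}, with the
    simplified H_hat = (2/T) sum g_t g_t' and the outer-product estimate
    V_hat = (1/T) sum s_t s_t', where s_t = 2 * g_t * e_t are the scores.
    grad_f: T x k matrix whose t-th row is grad f(x_t; beta_hat)'."""
    T = len(e_hat)
    H_hat = (2.0 / T) * grad_f.T @ grad_f
    scores = 2.0 * grad_f * e_hat[:, None]   # summands of sqrt(T)*grad Q_T
    V_hat = scores.T @ scores / T
    H_inv = np.linalg.inv(H_hat)
    return H_inv @ V_hat @ H_inv             # estimates D*_T
```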
Wald Tests
Hypothesis: Rβ∗ = r, where R is a q × k selection matrix and r is a
q × 1 vector of pre-specified constants.

By the asymptotic normality result, we have under the null that

Γ̂T^{−1/2} √T R(β̂T − β∗) = Γ̂T^{−1/2} √T (Rβ̂T − r) →D N(0, Iq ),

where Γ̂T = R D̂T R′, and D̂T is a consistent estimator for D∗T .

The Wald statistic is

WT = T (Rβ̂T − r)′ Γ̂T^{−1} (Rβ̂T − r) →D χ²(q).

For nonlinear restrictions r(β∗) = 0, the Wald test is not invariant
with respect to the form of r(β) = 0.
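A minimal implementation of the Wald statistic under these definitions; the function name is ours, and D_hat is assumed to estimate D∗T consistently:

```python
import numpy as np
from scipy import stats

def wald_test(beta_hat, D_hat, R, r, T):
    """W_T = T (R b - r)' [R D_hat R']^{-1} (R b - r), chi-square(q)
    under the null; D_hat estimates the covariance of sqrt(T)(b - beta*)."""
    diff = R @ beta_hat - r
    Gamma_hat = R @ D_hat @ R.T
    W = T * diff @ np.linalg.solve(Gamma_hat, diff)
    return W, stats.chi2.sf(W, df=len(r))   # statistic and p-value
```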