Topics 2011
where
$$f(x) = \int_y f(x, y)\,dy \quad\text{and}\quad f(y) = \int_x f(x, y)\,dx \tag{2}$$
$$f(x|y) = \frac{f(y, x)}{f(y)} \quad\text{and}\quad f(y|x) = \frac{f(y, x)}{f(x)} \tag{3}$$
D.S.G. POLLOCK: TOPICS IN ECONOMETRICS 2011
Therefore, $E\{(y - \pi)^2\} = E\{(y - \hat{y})^2\} + E\{(\hat{y} - \pi)^2\} \geq E\{(y - \hat{y})^2\}$,
and the assertion is proved.
The error in predicting y is uncorrelated with x. The proof of this depends
on showing that E(ŷx) = E(yx), where ŷ = E(y|x):
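$$\begin{aligned}
E(\hat{y}x) &= \int_x x E(y|x) f(x)\,dx \\
&= \int_x x \left\{ \int_y y \frac{f(y, x)}{f(x)}\,dy \right\} f(x)\,dx \\
&= \int_x \int_y x y f(y, x)\,dy\,dx = E(xy).
\end{aligned} \tag{9}$$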
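The orthogonality of the prediction error and x can be checked numerically on a small discrete distribution. The following is a minimal sketch: the joint probabilities in `joint` are hypothetical values chosen only for illustration, and sums replace the integrals of the derivation.

```python
# Hypothetical joint probabilities f(x, y) on a small grid (illustrative only).
joint = {
    (0, 1): 0.10, (0, 3): 0.20,
    (1, 2): 0.25, (1, 5): 0.15,
    (2, 0): 0.05, (2, 4): 0.25,
}

def marginal_x(joint):
    """Marginal f(x), obtained by summing f(x, y) over y."""
    fx = {}
    for (x, _), p in joint.items():
        fx[x] = fx.get(x, 0.0) + p
    return fx

def conditional_mean(joint):
    """E(y|x) for each x, computed as sum_y y f(x, y) / f(x)."""
    fx = marginal_x(joint)
    num = {}
    for (x, y), p in joint.items():
        num[x] = num.get(x, 0.0) + y * p
    return {x: num[x] / fx[x] for x in fx}

def expectation(joint, g):
    """E{g(x, y)} under the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

y_hat = conditional_mean(joint)
e_yhat_x = expectation(joint, lambda x, y: y_hat[x] * x)      # E(y_hat x)
e_y_x = expectation(joint, lambda x, y: y * x)                # E(y x)
e_err_x = expectation(joint, lambda x, y: (y - y_hat[x]) * x) # E{(y - y_hat) x}

print(e_yhat_x, e_y_x, e_err_x)
```

The first two printed values should agree and the third should be zero up to rounding error, whatever probabilities are placed in `joint`.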
whence
$$\beta = \frac{E(xy) - E(x)E(y)}{E(x^2) - \{E(x)\}^2} = \frac{C(x, y)}{V(x)}. \tag{15}$$
The expression
α = E(y) − βE(x) (16)
comes directly from (9).
Observe that, by substituting (16) into (10), the following prediction-error
equation for the conditional expectation is derived:
$$y = [\, X_1 \;\; X_2 \,] \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon. \tag{1}$$
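Here $[X_1, X_2] = X$ and $[\beta_1', \beta_2']' = \beta$ are obtained by partitioning the matrix X
and vector β of the equation $y = X\beta + \varepsilon$ in a conformable manner. The normal
equations $X'X\beta = X'y$ can be partitioned likewise. Writing the equations
without the surrounding matrix braces gives
$$X_1'X_1\beta_1 + X_1'X_2\beta_2 = X_1'y, \tag{2}$$
$$X_2'X_1\beta_1 + X_2'X_2\beta_2 = X_2'y. \tag{3}$$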
From (2), we get the equation $X_1'X_1\beta_1 = X_1'(y - X_2\beta_2)$, which gives an
expression for the leading subvector of $\hat\beta$:
$$\hat\beta_1 = (X_1'X_1)^{-1}X_1'(y - X_2\hat\beta_2). \tag{4}$$
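To obtain an expression for $\hat\beta_2$, we must eliminate $\beta_1$ from equation (3). For
this purpose, we multiply equation (2) by $X_2'X_1(X_1'X_1)^{-1}$ to give
$$X_2'X_1\beta_1 + X_2'X_1(X_1'X_1)^{-1}X_1'X_2\beta_2 = X_2'X_1(X_1'X_1)^{-1}X_1'y. \tag{5}$$
Subtracting (5) from the second of the partitioned normal equations, which is
$X_2'X_1\beta_1 + X_2'X_2\beta_2 = X_2'y$, gives
$$\{X_2'X_2 - X_2'X_1(X_1'X_1)^{-1}X_1'X_2\}\beta_2 = X_2'y - X_2'X_1(X_1'X_1)^{-1}X_1'y. \tag{6}$$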
On defining
$$P_1 = X_1(X_1'X_1)^{-1}X_1', \tag{7}$$
we can rewrite (6) as
$$\{X_2'(I - P_1)X_2\}\beta_2 = X_2'(I - P_1)y, \tag{8}$$
whence
$$\hat\beta_2 = \{X_2'(I - P_1)X_2\}^{-1}X_2'(I - P_1)y. \tag{9}$$
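The partitioned formulae can be checked against a direct solution of the normal equations. The sketch below assumes the simplest case, in which $X_1$ is a column of ones and $X_2$ is a single regressor, so that $I - P_1$ merely subtracts the sample mean; the data and the helper names are hypothetical.

```python
def solve_2x2(a, b, c, d, r1, r2):
    """Solve [[a, b], [c, d]] [u, v]' = [r1, r2]' by Cramer's rule."""
    det = a * d - b * c
    return (d * r1 - b * r2) / det, (a * r2 - c * r1) / det

def demean(v):
    """Apply I - P1 when X1 is a column of ones: subtract the mean."""
    m = sum(v) / len(v)
    return [u - m for u in v]

# Hypothetical data: x plays the role of X2, a column of ones that of X1.
x = [1.0, 2.0, 4.0, 7.0, 8.0]
y = [2.1, 2.9, 6.2, 9.8, 11.1]
T = len(y)

# Direct solution of the full normal equations X'X beta = X'y.
b1_full, b2_full = solve_2x2(
    T, sum(x), sum(x), sum(v * v for v in x),
    sum(y), sum(u * v for u, v in zip(x, y)),
)

# Partitioned route: beta2 = {X2'(I - P1)X2}^{-1} X2'(I - P1)y.
# Because I - P1 is symmetric and idempotent, demeaning both x and y
# gives the same cross product as X2'(I - P1)y.
x_d, y_d = demean(x), demean(y)
b2_part = sum(u * v for u, v in zip(x_d, y_d)) / sum(u * u for u in x_d)

# beta1 = (X1'X1)^{-1} X1'(y - X2 beta2), which is a sample mean here.
b1_part = sum(yi - b2_part * xi for xi, yi in zip(x, y)) / T

print(b1_full, b1_part)
print(b2_full, b2_part)
```

Each printed pair should agree to within rounding error.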
$$y = [\, X_1, \; X_2 \,] \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon. \tag{10}$$
It can be assumed that the variables in this equation are in deviation form.
Imagine that the columns of $X_1$ are orthogonal to the columns of $X_2$ such that
$X_1'X_2 = 0$. This is the same as assuming that the empirical correlation between
variables in $X_1$ and variables in $X_2$ is zero.
The effect upon the ordinary least-squares estimator can be seen by examining
the partitioned form of the formula $\hat\beta = (X'X)^{-1}X'y$. Here we have
$$X'X = \begin{bmatrix} X_1' \\ X_2' \end{bmatrix} [\, X_1 \;\; X_2 \,]
= \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}
= \begin{bmatrix} X_1'X_1 & 0 \\ 0 & X_2'X_2 \end{bmatrix}, \tag{11}$$
where the final equality follows from the condition of orthogonality. The inverse
of the partitioned form of $X'X$ in the case of $X_1'X_2 = 0$ is
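$$(X'X)^{-1} = \begin{bmatrix} X_1'X_1 & 0 \\ 0 & X_2'X_2 \end{bmatrix}^{-1}
= \begin{bmatrix} (X_1'X_1)^{-1} & 0 \\ 0 & (X_2'X_2)^{-1} \end{bmatrix}. \tag{12}$$
We also have
$$X'y = \begin{bmatrix} X_1' \\ X_2' \end{bmatrix} y = \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix}. \tag{13}$$
On combining these elements, we find that
$$\begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix}
= \begin{bmatrix} (X_1'X_1)^{-1} & 0 \\ 0 & (X_2'X_2)^{-1} \end{bmatrix}
\begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix}
= \begin{bmatrix} (X_1'X_1)^{-1}X_1'y \\ (X_2'X_2)^{-1}X_2'y \end{bmatrix}. \tag{14}$$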
It can be confirmed easily that these formulae do specialise to those under (14)
in the case of $X_1'X_2 = 0$.
The purpose of including $X_2$ in the regression equation when, in fact,
interest is confined to the parameters of $\beta_1$ is to avoid falsely attributing the
explanatory power of the variables of $X_2$ to those of $X_1$.
Let us investigate the effects of erroneously excluding $X_2$ from the
regression. In that case, the estimate will be
since $E\{(X_1'X_1)^{-1}X_1'\varepsilon\} = (X_1'X_1)^{-1}X_1'E(\varepsilon) = 0$. Thus, in general, we have
$E(\tilde\beta_1) \neq \beta_1$, which is to say that $\tilde\beta_1$ is a biased estimator. The only
circumstances in which the estimator will be unbiased are when either $X_1'X_2 = 0$ or
$\beta_2 = 0$. In other circumstances, the estimator will suffer from a problem which
is commonly described as omitted-variables bias.
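The source of the bias can be set out in a short worked derivation. The steps below are a reconstruction under the assumptions already stated, namely that the true model is $y = X_1\beta_1 + X_2\beta_2 + \varepsilon$ with $E(\varepsilon) = 0$ and that the short regression uses $X_1$ alone; they are offered as a sketch and not as a quotation of the original equations.

```latex
\begin{aligned}
\tilde\beta_1 &= (X_1'X_1)^{-1}X_1'y \\
              &= (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + \varepsilon) \\
              &= \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'\varepsilon, \\
E(\tilde\beta_1) &= \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 .
\end{aligned}
```

The second term on the right of the final line is the bias. It vanishes when $X_1'X_2 = 0$ or when $\beta_2 = 0$, which are the two cases mentioned in the text.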
We need to ask whether it matters that the estimated regression parameters
are biased. The answer depends upon the use to which we wish to put the
estimated regression equation. The issue is whether the equation is to be used
simply for predicting the values of the dependent variable y or whether it is to
be used for some kind of structural analysis.
If the regression equation purports to describe a structural or a behavioral
relationship within the economy, and if some of the explanatory variables on
the RHS are destined to become the instruments of an economic policy, then
it is important to have unbiased estimators of the associated parameters. For
these parameters indicate the leverage of the policy instruments. Examples of
such instruments are provided by interest rates, tax rates, exchange rates and
the like.
On the other hand, if the estimated regression equation is to be viewed
solely as a predictive device (that is to say, if it is simply an estimate of the
function $E(y|x_1, \ldots, x_k)$ which specifies the conditional expectation of y given
the values of $x_1, \ldots, x_k$) then, provided that the underlying statistical
mechanism which has generated these variables is preserved, the question of the
unbiasedness of the regression estimates does not arise.
Rβ = r, (1)
derived from the estimator of the regression parameters. The variate of (4)
must also be independent of the chi-square of (3); and it is straightforward to
deduce that
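$$\begin{aligned}
F &= \left\{ \frac{(R\hat\beta - r)'\{R(X'X)^{-1}R'\}^{-1}(R\hat\beta - r)}{j} \right\}
\bigg/ \left\{ \frac{(y - X\hat\beta)'(y - X\hat\beta)}{T - k} \right\} \\
&= \frac{(R\hat\beta - r)'\{R(X'X)^{-1}R'\}^{-1}(R\hat\beta - r)}{\hat\sigma^2 j} \sim F(j, T - k),
\end{aligned} \tag{6}$$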
which is to say that the ratio of the two independent chi-square variates,
divided by their respective degrees of freedom, is an F statistic. This statistic,
which embodies only known and observable quantities, can be used in testing
the validity of the hypothesised restrictions $R\beta = r$.
A specialisation of the statistic under (6) can also be used in testing an
hypothesis concerning a subset of the elements of the vector β. Let
$\beta = [\beta_1', \beta_2']'$. Then the condition that the subvector $\beta_1$ assumes the
value of $\beta_1^*$ can be expressed via the equation
$$[\, I_{k_1}, \; 0 \,] \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} = \beta_1^*. \tag{7}$$
$$F = \frac{(\hat\beta_i - \beta_i)^2}{\hat\sigma^2 w_{ii}}. \tag{11}$$
Here, $w_{ii}$ stands for the ith diagonal element of $(X'X)^{-1}$. If the hypothesis is
true, then this will have an $F(1, T - k)$ distribution.
However, the usual way of testing such an hypothesis is to use
$$t = \frac{\hat\beta_i - \beta_i}{\sqrt{\hat\sigma^2 w_{ii}}} \tag{12}$$
in conjunction with the tables of the $t(T - k)$ distribution. The t statistic shows
the direction in which the estimate of $\beta_i$ deviates from the hypothesised value
as well as the size of the deviation.
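The relationship between the two statistics can be illustrated for the slope of a simple regression with an intercept. This is a minimal sketch with hypothetical data: the function name `slope_tests` and the hypothesised value `b0` are illustrative, and the inverse of the 2 × 2 matrix X'X is written out explicitly.

```python
import math

def slope_tests(x, y, b0=0.0):
    """Return (slope estimate, t statistic, F statistic) for H0: slope = b0
    in the regression of y on a constant and x."""
    T, k = len(y), 2
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(u * v for u, v in zip(x, y))
    det = T * sxx - sx * sx              # determinant of X'X
    a_hat = (sxx * sy - sx * sxy) / det  # intercept estimate
    b_hat = (T * sxy - sx * sy) / det    # slope estimate
    rss = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (T - k)               # estimate of the error variance
    w = T / det                          # diagonal element of (X'X)^{-1} for the slope
    t = (b_hat - b0) / math.sqrt(sigma2 * w)
    F = (b_hat - b0) ** 2 / (sigma2 * w)
    return b_hat, t, F

# Hypothetical data.
x = [1.0, 2.0, 4.0, 7.0, 8.0]
y = [2.1, 2.9, 6.2, 9.8, 11.1]
b_hat, t, F = slope_tests(x, y, b0=2.0)
print(b_hat, t, F)
```

Here F equals the square of t, so the F statistic loses the sign of the deviation, whereas a negative t indicates that the estimate lies below the hypothesised value.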
$$y_t = \alpha_0 + \sum_{j=1}^{[T/2]} \{\alpha_j \cos(\omega_j t) + \beta_j \sin(\omega_j t)\}; \quad t = 0, 1, \ldots, T - 1. \tag{1}$$
where $c_{tj} = \cos(\omega_j t)$ and $s_{tj} = \sin(\omega_j t)$. The vectors of the ordinates of
functions of different frequencies are mutually orthogonal. Therefore, the following
orthogonality conditions hold:
$$\begin{aligned}
& c_i'c_j = s_i's_j = 0 \quad \text{if } i \neq j, \\
& \text{and} \quad c_i's_j = 0 \quad \text{for all } i, j.
\end{aligned} \tag{4}$$
In addition, there are some sums of squares which can be taken into account
in computing the coefficients of the Fourier decomposition:
$$\begin{aligned}
& c_0'c_0 = \iota'\iota = T, \quad s_0's_0 = 0, \\
& c_j'c_j = s_j's_j = T/2 \quad \text{for } j = 1, \ldots, [(T - 1)/2].
\end{aligned} \tag{5}$$
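The "regression" formulae for the Fourier coefficients can now be given.
First, there is
$$\alpha_0 = (\iota'\iota)^{-1}\iota'y = \frac{1}{T}\sum_t y_t = \bar{y}. \tag{7}$$
Then, for $j = 1, \ldots, [(T - 1)/2]$, there are
$$\alpha_j = (c_j'c_j)^{-1}c_j'y = \frac{2}{T}\sum_t y_t \cos \omega_j t, \tag{8}$$
and
$$\beta_j = (s_j's_j)^{-1}s_j'y = \frac{2}{T}\sum_t y_t \sin \omega_j t. \tag{9}$$
Finally, if $T = 2n$ is even, there is
$$\alpha_n = (c_n'c_n)^{-1}c_n'y = \frac{1}{T}\sum_t (-1)^t y_t. \tag{10}$$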
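The regression formulae translate directly into code. The sketch below assumes the usual Fourier frequencies $\omega_j = 2\pi j/T$, which are not defined in the surviving text; the function names are hypothetical. It computes the coefficients and then resynthesises the data from them.

```python
import math

def fourier_coefficients(y):
    """Return lists (alpha, beta) indexed by j = 0, ..., [T/2],
    assuming the Fourier frequencies w_j = 2*pi*j/T."""
    T = len(y)
    n = T // 2
    alpha = [sum(y) / T]   # alpha_0 is the sample mean
    beta = [0.0]           # the sine at zero frequency is identically zero
    for j in range(1, n + 1):
        w = 2 * math.pi * j / T
        if T % 2 == 0 and j == n:
            # At w_n = pi the cosine is (-1)^t and the divisor is T, not T/2.
            alpha.append(sum((-1) ** t * y[t] for t in range(T)) / T)
            beta.append(0.0)
        else:
            alpha.append(2 / T * sum(y[t] * math.cos(w * t) for t in range(T)))
            beta.append(2 / T * sum(y[t] * math.sin(w * t) for t in range(T)))
    return alpha, beta

def synthesise(alpha, beta, T):
    """Rebuild y_t from the coefficients via the Fourier decomposition."""
    return [
        sum(
            alpha[j] * math.cos(2 * math.pi * j * t / T)
            + beta[j] * math.sin(2 * math.pi * j * t / T)
            for j in range(len(alpha))
        )
        for t in range(T)
    ]

y_even = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]   # hypothetical data, T even
y_odd = [2.0, 7.0, 1.0, 8.0, 2.0]         # hypothetical data, T odd
for y in (y_even, y_odd):
    a, b = fourier_coefficients(y)
    print(max(abs(u - v) for u, v in zip(y, synthesise(a, b, len(y)))))
```

The printed discrepancies should be of the order of rounding error, which reflects the fact that the T coefficients carry the same information as the T data points.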
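$$y'y = \alpha_0^2 \iota'\iota + \sum_{j=1}^{[T/2]} \{\alpha_j^2 c_j'c_j + \beta_j^2 s_j's_j\}. \tag{11}$$
Now consider writing $\alpha_0^2 \iota'\iota = \bar{y}^2 \iota'\iota = \bar{y}'\bar{y}$, where $\bar{y} = [\bar{y}, \bar{y}, \ldots, \bar{y}]'$ is a vector
whose repeated element is the sample mean $\bar{y}$. It follows that
$y'y - \alpha_0^2 \iota'\iota = y'y - \bar{y}'\bar{y} = (y - \bar{y})'(y - \bar{y})$. In the case where $T = 2n$ is
even, equation (11) can then be written as
$$(y - \bar{y})'(y - \bar{y}) = \frac{T}{2}\sum_{j=1}^{n-1} (\alpha_j^2 + \beta_j^2) + T\alpha_n^2 = \frac{T}{2}\sum_{j=1}^{n} \rho_j^2. \tag{12}$$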
$$\frac{1}{T}\sum_{t=0}^{T-1} (y_t - \bar{y})^2 = \frac{1}{2}\sum_{j=1}^{n} (\alpha_j^2 + \beta_j^2). \tag{13}$$
Figure 1. The plot of 132 monthly observations on the U.S. money supply,
beginning in January 1960. A quadratic function has been interpolated
through the data.
Figure 2. The periodogram of the residuals of the logarithmic money-supply
data.
$$\frac{1}{2} I(\omega_j) = \sum_{\tau=1-T}^{T-1} c_\tau \cos(\omega_j \tau) = c_0 + 2\sum_{\tau=1}^{T-1} c_\tau \cos(\omega_j \tau), \tag{15}$$
$$c_\tau = T^{-1}\sum_{t=\tau}^{T-1} (y_t - \bar{y})(y_{t-\tau} - \bar{y}), \tag{16}$$
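$$\frac{1}{T}\sum_{t=0}^{T-1} (y_t - \bar{y})^2 = \frac{1}{2}\sum_{j=0}^{[T/2]} (\alpha_j^2 + \beta_j^2), \tag{17}$$
where
$$\begin{aligned}
\alpha_j &= \frac{2}{T}\sum_t y_t \cos(\omega_j t) = \frac{2}{T}\sum_t (y_t - \bar{y}) \cos(\omega_j t), \\
\beta_j &= \frac{2}{T}\sum_t y_t \sin(\omega_j t) = \frac{2}{T}\sum_t (y_t - \bar{y}) \sin(\omega_j t).
\end{aligned}$$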
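Substituting these into the term $T(\alpha_j^2 + \beta_j^2)/2$ gives the periodogram
$$I(\omega_j) = \frac{2}{T}\left[ \left\{ \sum_{t=0}^{T-1} \cos(\omega_j t)(y_t - \bar{y}) \right\}^2 + \left\{ \sum_{t=0}^{T-1} \sin(\omega_j t)(y_t - \bar{y}) \right\}^2 \right].$$
Since $\cos(A)\cos(B) + \sin(A)\sin(B) = \cos(A - B)$, this can be written as
$$I(\omega_j) = \frac{2}{T}\sum_t \sum_s \cos(\omega_j [t - s])(y_t - \bar{y})(y_s - \bar{y}).$$
On defining $\tau = t - s$ and writing $c_\tau = \sum_t (y_t - \bar{y})(y_{t-\tau} - \bar{y})/T$, we can reduce
the latter expression to
$$I(\omega_j) = 2\sum_{\tau=1-T}^{T-1} \cos(\omega_j \tau) c_\tau.$$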
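The equivalence of the two expressions for the periodogram can be confirmed numerically. The sketch below assumes the Fourier frequencies $\omega_j = 2\pi j/T$ and uses hypothetical data; the function names are illustrative. The autocovariance of a negative lag is taken to equal that of the corresponding positive lag.

```python
import math

def periodogram_from_coefficients(y, j):
    """I(w_j) = T (alpha_j^2 + beta_j^2) / 2, with w_j = 2*pi*j/T assumed."""
    T = len(y)
    ybar = sum(y) / T
    w = 2 * math.pi * j / T
    a = 2 / T * sum((y[t] - ybar) * math.cos(w * t) for t in range(T))
    b = 2 / T * sum((y[t] - ybar) * math.sin(w * t) for t in range(T))
    return T * (a * a + b * b) / 2

def autocovariance(y, tau):
    """Empirical autocovariance c_tau with divisor T; c_{-tau} = c_tau."""
    T = len(y)
    ybar = sum(y) / T
    tau = abs(tau)
    return sum((y[t] - ybar) * (y[t - tau] - ybar) for t in range(tau, T)) / T

def periodogram_from_autocovariances(y, j):
    """I(w_j) = 2 * sum over tau = 1-T, ..., T-1 of c_tau cos(w_j tau)."""
    T = len(y)
    w = 2 * math.pi * j / T
    return 2 * sum(
        autocovariance(y, tau) * math.cos(w * tau) for tau in range(1 - T, T)
    )

y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]   # hypothetical data
for j in range(1, len(y) // 2 + 1):
    print(j, periodogram_from_coefficients(y, j),
          periodogram_from_autocovariances(y, j))
```

The two columns should agree at every frequency up to rounding error.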
y = ξ + η. (1)
E(ξ) = 0, D(ξ) = Ωξ ,
E(η) = 0, D(η) = Ωη , (2)
and C(ξ, η) = 0.
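$$\begin{aligned}
h = \Psi_h y &= \Omega_\eta (\Omega_\xi + \Omega_\eta)^{-1} y \\
&= \{I - \Omega_\xi (\Omega_\xi + \Omega_\eta)^{-1}\} y.
\end{aligned} \tag{6}$$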
Conditional Expectations
In deriving the estimator, we might have used the formula for conditional
expectations. In the case of two linearly related scalar random variables ξ and y,
the conditional expectation of ξ given y is
$$E(\xi|y) = E(\xi) + \frac{C(\xi, y)}{V(y)}\{y - E(y)\}. \tag{7}$$
By setting
C(ξ, y) = Ωξ and D(y) = Ωξ + Ωη
$$\nabla^2 = 1 - 2L + L^2$$
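The expansion of the squared difference operator can be checked by applying it to a sequence in two ways. In this minimal sketch, L is taken to be the lag operator, so that $Ly_t = y_{t-1}$; the function names and the test sequences are hypothetical.

```python
def difference(y):
    """First difference: (1 - L) y_t = y_t - y_{t-1}."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

def second_difference(y):
    """Second difference: (1 - 2L + L^2) y_t = y_t - 2 y_{t-1} + y_{t-2}."""
    return [y[t] - 2 * y[t - 1] + y[t - 2] for t in range(2, len(y))]

quadratic = [t * t + 2 * t + 3 for t in range(8)]   # quadratic trend
linear = [5 * t - 1 for t in range(8)]              # linear trend

print(second_difference(quadratic))   # a constant sequence
print(second_difference(linear))      # a sequence of zeros
print(difference(difference(quadratic)) == second_difference(quadratic))
```

Differencing twice reduces a quadratic trend to a constant and annihilates a linear trend, which is why the operator has a zero at zero frequency.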
Figure 3. The squared gain of the difference operator, which has a zero at zero
frequency, and the squared gain of the summation operator, which is unbounded at
zero frequency.
Polynomial Regression
Using the matrix Q defined above, we can represent the vector of the ordinates
of a linear trend line interpolated through the data sequence as
Observe that this vector contains exactly the same information as the
differenced vector $g = Q'y$. However, whereas the low-frequency structure of
Figure 4. The quarterly series of the logarithms of consumption in the U.K., for
the years 1955 to 1994, together with a linear trend interpolated by least-squares
regression.
Figure 7. The periodogram of the residual sequence obtained from the linear
detrending of the logarithmic consumption data.
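$$\begin{aligned}
Q'y &= Q'\xi + Q'\eta \\
&= \delta + \kappa = g.
\end{aligned} \tag{12}$$
The vectors of the expectations and the dispersion matrices of the differenced
vectors are
$$\begin{aligned}
E(\delta) &= 0, \quad D(\delta) = \Omega_\delta = Q'D(\xi)Q, \\
E(\kappa) &= 0, \quad D(\kappa) = \Omega_\kappa = Q'D(\eta)Q.
\end{aligned} \tag{13}$$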
The difficulty of estimating the trended vector ξ = y − η directly is that some
starting values or initial conditions are required in order to define the value at
time t = 0. However, since η is from a stationary mean-zero process, it requires
only zero-valued initial conditions. Therefore, the starting-value problem can
be circumvented by concentrating on the estimation of η.
The conditional expectation of η, given the differenced data $g = Q'y$, is
provided by the formula
Figure 8. The gain of the Hodrick–Prescott lowpass filter with a smoothing
parameter set to 100, 1,600 and 14,400.
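$$D(\eta) = \Omega_\eta = \sigma_\eta^2 I, \quad D(\delta) = \Omega_\delta = \sigma_\delta^2 I \quad\text{and}\quad \lambda = \frac{\sigma_\eta^2}{\sigma_\delta^2} \tag{18}$$
$$x = y - Q(Q'Q)^{-1}Q'y.$$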
Figure 9. The gain of the lowpass Butterworth filters of orders n = 6 and
n = 12 with a nominal cut-off point of 2π/3 radians.
Figure 8 depicts the frequency response of the lowpass H–P filter for various
values of the smoothing parameter λ. The innermost profile corresponds to the
highest value of the parameter, and it represents a filter that transmits only
the data elements of lowest frequency.
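The ordering of the profiles can be reproduced from the textbook frequency-response function of the lowpass H–P filter. The sketch below assumes the formula $1/\{1 + 4\lambda(1 - \cos\omega)^2\}$, which pertains to the filter applied to a doubly infinite sequence and which is not stated in the surviving text; the function name is hypothetical.

```python
import math

def hp_lowpass_gain(omega, lam):
    """Gain of the lowpass H-P filter at frequency omega (in radians),
    assuming the formula 1 / (1 + 4*lam*(1 - cos(omega))**2)."""
    return 1.0 / (1.0 + 4.0 * lam * (1.0 - math.cos(omega)) ** 2)

frequencies = [0.0, math.pi / 16, math.pi / 8, math.pi / 4]
for lam in (100, 1600, 14400):
    gains = [round(hp_lowpass_gain(w, lam), 4) for w in frequencies]
    print(lam, gains)
```

At any frequency above zero, the gain falls as λ rises, so the largest value of λ generates the innermost profile.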
For all values of λ, the response of the H–P filter shows a gradual transition
from the pass band, which corresponds to the frequencies that are transmitted
by the filter, to the stop band, which corresponds to the frequencies that are
impeded.
Often, there is a requirement for a more rapid transition as well as a need
to control the location in frequency where the transition occurs. These needs
can be served by the Butterworth filter, which is more amenable to adjustment.
$$\Sigma = \{2I_T - (L_T + L_T')\}^{n-2} \quad\text{and}\quad M = \{2I_T + (L_T + L_T')\}^{n}, \tag{21}$$
where $L_T$ is a matrix of order T with units on the first subdiagonal; and it can
be verified that
$$Q'\Sigma Q = \{2I_T - (L_T + L_T')\}^{n}. \tag{22}$$
Figure 9 shows the frequency response of the Butterworth filter for various
values of n and for a specific cut-off frequency, which is determined by the
parameter λ. The greater the value of n, the more rapid is the transition from
pass band to stop band.
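The effect of the order n on the transition can be illustrated with the standard frequency-response function of a digital lowpass Butterworth filter. The sketch assumes the form $1/[1 + \{\tan(\omega/2)/\tan(\omega_c/2)\}^{2n}]$, where $\omega_c$ is the cut-off frequency; this formula and the function name are assumptions, since the corresponding expression is not present in the surviving text.

```python
import math

def butterworth_lowpass_gain(omega, cutoff, n):
    """Assumed response 1 / (1 + (tan(omega/2) / tan(cutoff/2))**(2n))
    for 0 <= omega <= pi and 0 < cutoff < pi."""
    ratio = math.tan(omega / 2) / math.tan(cutoff / 2)
    if ratio <= 0.0:
        return 1.0                      # zero frequency is passed in full
    exponent = 2 * n * math.log(ratio)  # logarithms avoid overflow near pi
    if exponent > 700.0:
        return 0.0
    return 1.0 / (1.0 + math.exp(exponent))

cutoff = 2 * math.pi / 3                # nominal cut-off of Figure 9
for n in (6, 12):
    gains = [
        round(butterworth_lowpass_gain(w, cutoff, n), 6)
        for w in (math.pi / 2, cutoff, 5 * math.pi / 6)
    ]
    print(n, gains)
```

Under this assumption, the gain at the cut-off point is one half for every n, while the gain on either side of it is closer to unity or to zero for n = 12 than for n = 6.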
$$y(t) = \frac{\beta}{1 - \phi L} x(t) + \frac{1}{1 - \phi L} \varepsilon(t). \tag{3}$$
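The part $\phi_1 y(t - 1) + \phi_2 y(t - 2)$ comprising the lagged dependent variables can
be reparametrised as follows:
$$[\, \phi_1 \;\; \phi_2 \,]
\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}
\begin{bmatrix} y(t - 1) \\ y(t - 2) \end{bmatrix}
= [\, \theta \;\; \rho \,] \begin{bmatrix} y(t - 1) \\ \nabla y(t - 1) \end{bmatrix}. \tag{7}$$
Here, the matrix that postmultiplies the row vector of the parameters is the
inverse of the matrix that premultiplies the column vector of the variables.
The sum $\beta_0 x(t) + \beta_1 x(t - 1)$ can be reparametrised to become
$$[\, \beta_0 \;\; \beta_1 \,]
\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} x(t) \\ x(t - 1) \end{bmatrix}
= [\, \kappa \;\; \delta \,] \begin{bmatrix} x(t - 1) \\ \nabla x(t) \end{bmatrix}. \tag{8}$$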
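Taking $y(t - 1)$ from both sides of this equation and rearranging it gives
$$\begin{aligned}
\nabla y(t) &= (1 - \theta)\left\{ \frac{\kappa}{1 - \theta} x(t - 1) - y(t - 1) \right\} + \rho \nabla y(t - 1) + \delta \nabla x(t) + \varepsilon(t) \\
&= \lambda \{\gamma x(t - 1) - y(t - 1)\} + \rho \nabla y(t - 1) + \delta \nabla x(t) + \varepsilon(t).
\end{aligned} \tag{10}$$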
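The error-correction form can be compared numerically with the equation in levels from which it is derived. The sketch below assumes that the levels equation is $y(t) = \phi_1 y(t-1) + \phi_2 y(t-2) + \beta_0 x(t) + \beta_1 x(t-1) + \varepsilon(t)$ and takes $\theta = \phi_1 + \phi_2$, $\rho = -\phi_2$, $\kappa = \beta_0 + \beta_1$ and $\delta = \beta_0$. These values are inferred by expanding the two sums in terms of levels and differences and are not quoted from the text; the parameter values and sequences are hypothetical.

```python
def simulate_levels(phi1, phi2, beta0, beta1, x, eps, y_init):
    """Generate y(t) from the assumed equation in levels."""
    y = list(y_init)                      # starting values y(0), y(1)
    for t in range(2, len(x)):
        y.append(phi1 * y[t - 1] + phi2 * y[t - 2]
                 + beta0 * x[t] + beta1 * x[t - 1] + eps[t])
    return y

def simulate_error_correction(phi1, phi2, beta0, beta1, x, eps, y_init):
    """Generate y(t) by accumulating the differences given by the
    error-correction equation."""
    theta, rho = phi1 + phi2, -phi2       # inferred, not quoted
    kappa, delta = beta0 + beta1, beta0   # inferred, not quoted
    lam = 1 - theta
    gamma = kappa / lam                   # steady-state gain
    y = list(y_init)
    for t in range(2, len(x)):
        dy = (lam * (gamma * x[t - 1] - y[t - 1])
              + rho * (y[t - 1] - y[t - 2])
              + delta * (x[t] - x[t - 1]) + eps[t])
        y.append(y[t - 1] + dy)
    return y

params = (0.5, 0.2, 0.3, 0.1)             # phi1, phi2, beta0, beta1
x = [1.0, 2.0, 1.5, 3.0, 2.5, 4.0, 3.5, 5.0]
eps = [0.0, 0.0, 0.1, -0.2, 0.05, 0.0, -0.1, 0.2]
y_levels = simulate_levels(*params, x, eps, [0.0, 0.5])
y_ecm = simulate_error_correction(*params, x, eps, [0.0, 0.5])
print(max(abs(u - v) for u, v in zip(y_levels, y_ecm)))

# With a constant input and no disturbances, y settles at gamma times x.
steady = simulate_levels(*params, [1.0] * 200, [0.0] * 200, [0.0, 0.0])
print(steady[-1], (0.3 + 0.1) / (1 - 0.5 - 0.2))
```

The two simulated paths should coincide up to rounding error, and the final pair of printed values shows the output converging to the steady-state gain when the input is held at unity.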
This is an elaboration of equation (5); and it includes the differenced sequences
∇y(t − 1) and ∇x(t). These are deemed to be stationary, as is the composite
error sequence γx(t − 1) − y(t − 1).
Observe that, in contrast to equation (5), the error-correction term of (10)
comprises the lagged value x(t − 1) in place of x(t). Had the reparametrising
transformation that has been employed in equation (7) also been used in (8),
then the consequence would have been to generate an error-correction term
of the form γx(t) − y(t − 1). It should also be observed that the parameter
associated with x(t) in (10), which is
$$\gamma = \frac{\kappa}{1 - \theta} = \frac{\beta_0 + \beta_1}{1 - \phi_1 - \phi_2}, \tag{11}$$
is the steady-state gain of the transfer function from x(t) to y(t).
Additional lagged differences can be added to the equation (10); and this
is tantamount to increasing the number of lags of the dependent variable y(t)
and the number of lags of the input variable x(t) within equation (6).