
D.S.G. POLLOCK: TOPICS IN ECONOMETRICS 2011

1. EXPECTATIONS AND CONDITIONAL EXPECTATIONS


The joint density function of x and y is

$$f(x, y) = f(x|y)f(y) = f(y|x)f(x), \tag{1}$$

where

$$f(x) = \int_y f(x, y)\,dy \quad\text{and}\quad f(y) = \int_x f(x, y)\,dx \tag{2}$$

are the marginal distributions of x and y, respectively, and where

$$f(x|y) = \frac{f(y, x)}{f(y)} \quad\text{and}\quad f(y|x) = \frac{f(y, x)}{f(x)} \tag{3}$$

are the conditional distributions of x given y and of y given x.


The unconditional expectation of $y \sim f(y)$ is

$$E(y) = \int_y y f(y)\,dy. \tag{4}$$

The conditional expectation of y given x is

$$E(y|x) = \int_y y f(y|x)\,dy = \int_y y\,\frac{f(y, x)}{f(x)}\,dy. \tag{5}$$

The expectation of the conditional expectation is an unconditional expectation:

$$\begin{aligned} E\{E(y|x)\} &= \int_x \left\{ \int_y y\,\frac{f(y, x)}{f(x)}\,dy \right\} f(x)\,dx \\ &= \int_x \int_y y f(y, x)\,dy\,dx \\ &= \int_y y \left\{ \int_x f(y, x)\,dx \right\} dy = \int_y y f(y)\,dy = E(y). \end{aligned} \tag{6}$$

The conditional expectation of y given x is the minimum-mean-squared-error predictor of y.

Proof. Let $\hat y = E(y|x)$ and let $\pi = \pi(x)$ be any other predictor. Then,

$$\begin{aligned} E\{(y - \pi)^2\} &= E\left[\{(y - \hat y) + (\hat y - \pi)\}^2\right] \\ &= E\{(y - \hat y)^2\} + 2E\{(y - \hat y)(\hat y - \pi)\} + E\{(\hat y - \pi)^2\}. \end{aligned} \tag{7}$$

In the second term, there is

$$\begin{aligned} E\{(y - \hat y)(\hat y - \pi)\} &= \int_x \int_y (y - \hat y)(\hat y - \pi) f(x, y)\,dy\,dx \\ &= \int_x \left\{ \int_y (y - \hat y) f(y|x)\,dy \right\} (\hat y - \pi) f(x)\,dx = 0, \end{aligned} \tag{8}$$

since the inner integral is $\int_y y f(y|x)\,dy - \hat y = E(y|x) - \hat y = 0$.


Therefore, $E\{(y - \pi)^2\} = E\{(y - \hat y)^2\} + E\{(\hat y - \pi)^2\} \geq E\{(y - \hat y)^2\}$, and the assertion is proved.

The error in predicting y is uncorrelated with x. The proof of this depends on showing that $E(\hat y x) = E(yx)$, where $\hat y = E(y|x)$:

$$\begin{aligned} E(\hat y x) &= \int_x x E(y|x) f(x)\,dx \\ &= \int_x x \left\{ \int_y y\,\frac{f(y, x)}{f(x)}\,dy \right\} f(x)\,dx \\ &= \int_x \int_y xy f(y, x)\,dy\,dx = E(xy). \end{aligned} \tag{9}$$

The result can be expressed as $E\{(y - \hat y)x\} = 0$.


This result can be used in deriving expressions for the parameters α and β of a linear regression of the form

$$E(y|x) = \alpha + \beta x, \tag{10}$$

from which an unconditional expectation is derived in the form of

$$E(y) = \alpha + \beta E(x). \tag{11}$$

The orthogonality of the prediction error implies that

$$\begin{aligned} 0 &= E\{(y - \hat y)x\} = E\{(y - \alpha - \beta x)x\} \\ &= E(xy) - \alpha E(x) - \beta E(x^2). \end{aligned} \tag{12}$$

In order to eliminate αE(x) from this expression, equation (11) is multiplied by E(x) and rearranged to give

$$\alpha E(x) = E(x)E(y) - \beta\{E(x)\}^2. \tag{13}$$

This is substituted into (12) to give

$$E(xy) - E(x)E(y) = \beta\left[E(x^2) - \{E(x)\}^2\right], \tag{14}$$

whence

$$\beta = \frac{E(xy) - E(x)E(y)}{E(x^2) - \{E(x)\}^2} = \frac{C(x, y)}{V(x)}. \tag{15}$$

The expression

$$\alpha = E(y) - \beta E(x) \tag{16}$$

comes directly from (11). Observe that, by substituting (16) into (10), the following prediction-error equation for the conditional expectation is derived:

$$E(y|x) = E(y) + \beta\{x - E(x)\}. \tag{17}$$


Thus, the conditional expectation of y given x is obtained by adjusting the unconditional expectation by some proportion of the error in predicting x by its expected value.
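
As a numerical illustration, the moment formulae (15) and (16) can be applied directly to sample moments. The following is a minimal sketch in Python with NumPy; the simulated data and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2.0 + 0.5 * x + rng.normal(size=500)      # so that E(y|x) = 2 + 0.5x

# Sample analogues of (15) and (16): beta = C(x,y)/V(x), alpha = E(y) - beta E(x)
beta = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
alpha = y.mean() - beta * x.mean()
print(alpha, beta)    # close to 2.0 and 0.5
```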

2. THE PARTITIONED REGRESSION MODEL


Consider taking a regression equation in the form of

$$y = [X_1\ \ X_2]\begin{bmatrix}\beta_1\\ \beta_2\end{bmatrix} + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon. \tag{1}$$

Here, $[X_1, X_2] = X$ and $[\beta_1', \beta_2']' = \beta$ are obtained by partitioning the matrix X and the vector β of the equation $y = X\beta + \varepsilon$ in a conformable manner. The normal equations $X'X\beta = X'y$ can be partitioned likewise. Writing the equations without the surrounding matrix braces gives

$$X_1'X_1\beta_1 + X_1'X_2\beta_2 = X_1'y, \tag{2}$$

$$X_2'X_1\beta_1 + X_2'X_2\beta_2 = X_2'y. \tag{3}$$

From (2), we get the equation $X_1'X_1\beta_1 = X_1'(y - X_2\beta_2)$, which gives an expression for the leading subvector of $\hat\beta$:

$$\hat\beta_1 = (X_1'X_1)^{-1}X_1'(y - X_2\hat\beta_2). \tag{4}$$

To obtain an expression for $\hat\beta_2$, we must eliminate $\beta_1$ from equation (3). For this purpose, we multiply equation (2) by $X_2'X_1(X_1'X_1)^{-1}$ to give

$$X_2'X_1\beta_1 + X_2'X_1(X_1'X_1)^{-1}X_1'X_2\beta_2 = X_2'X_1(X_1'X_1)^{-1}X_1'y. \tag{5}$$

When the latter is taken from equation (3), we get

$$\left\{X_2'X_2 - X_2'X_1(X_1'X_1)^{-1}X_1'X_2\right\}\beta_2 = X_2'y - X_2'X_1(X_1'X_1)^{-1}X_1'y. \tag{6}$$

On defining

$$P_1 = X_1(X_1'X_1)^{-1}X_1', \tag{7}$$

we can rewrite (6) as

$$X_2'(I - P_1)X_2\beta_2 = X_2'(I - P_1)y, \tag{8}$$

whence

$$\hat\beta_2 = \left\{X_2'(I - P_1)X_2\right\}^{-1}X_2'(I - P_1)y. \tag{9}$$
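
Equation (9) can be checked numerically against the full regression. Below is a minimal sketch, with simulated data and variable names that are purely illustrative; it confirms that the subvector of the joint least-squares estimate coincides with the formula built from the projector $P_1$.

```python
import numpy as np

rng = np.random.default_rng(1)
T, k1, k2 = 200, 2, 3
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2))
y = X1 @ np.array([1.0, -1.0]) + X2 @ np.array([0.5, 0.2, -0.3]) + rng.normal(size=T)

beta = np.linalg.lstsq(np.hstack([X1, X2]), y, rcond=None)[0]   # joint regression

P1 = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T                        # projector, eq (7)
M1 = np.eye(T) - P1                                              # annihilator I - P1
beta2 = np.linalg.solve(X2.T @ M1 @ X2, X2.T @ M1 @ y)           # equation (9)
print(np.allclose(beta[k1:], beta2))                             # True
```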

Now let us investigate the effect that conditions of orthogonality amongst the regressors have upon the ordinary least-squares estimates of the regression parameters. Consider a partitioned regression model, which can be written as

$$y = [X_1,\ X_2]\begin{bmatrix}\beta_1\\ \beta_2\end{bmatrix} + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon. \tag{10}$$


It can be assumed that the variables in this equation are in deviation form. Imagine that the columns of $X_1$ are orthogonal to the columns of $X_2$, such that $X_1'X_2 = 0$. This is the same as assuming that the empirical correlation between the variables in $X_1$ and the variables in $X_2$ is zero.

The effect upon the ordinary least-squares estimator can be seen by examining the partitioned form of the formula $\hat\beta = (X'X)^{-1}X'y$. Here we have

$$X'X = \begin{bmatrix}X_1'\\ X_2'\end{bmatrix}[X_1\ \ X_2] = \begin{bmatrix}X_1'X_1 & X_1'X_2\\ X_2'X_1 & X_2'X_2\end{bmatrix} = \begin{bmatrix}X_1'X_1 & 0\\ 0 & X_2'X_2\end{bmatrix}, \tag{11}$$

where the final equality follows from the condition of orthogonality. The inverse of the partitioned form of $X'X$ in the case of $X_1'X_2 = 0$ is

$$(X'X)^{-1} = \begin{bmatrix}X_1'X_1 & 0\\ 0 & X_2'X_2\end{bmatrix}^{-1} = \begin{bmatrix}(X_1'X_1)^{-1} & 0\\ 0 & (X_2'X_2)^{-1}\end{bmatrix}. \tag{12}$$

We also have

$$X'y = \begin{bmatrix}X_1'\\ X_2'\end{bmatrix}y = \begin{bmatrix}X_1'y\\ X_2'y\end{bmatrix}. \tag{13}$$

On combining these elements, we find that

$$\begin{bmatrix}\hat\beta_1\\ \hat\beta_2\end{bmatrix} = \begin{bmatrix}(X_1'X_1)^{-1} & 0\\ 0 & (X_2'X_2)^{-1}\end{bmatrix}\begin{bmatrix}X_1'y\\ X_2'y\end{bmatrix} = \begin{bmatrix}(X_1'X_1)^{-1}X_1'y\\ (X_2'X_2)^{-1}X_2'y\end{bmatrix}. \tag{14}$$

In this special case, the coefficients of the regression of y on $X = [X_1, X_2]$ can be obtained from the separate regressions of y on $X_1$ and of y on $X_2$.

It should be understood that this result does not hold true in general. The general formulae for $\hat\beta_1$ and $\hat\beta_2$ are those which we have given already under (4) and (9):

$$\hat\beta_1 = (X_1'X_1)^{-1}X_1'(y - X_2\hat\beta_2), \qquad \hat\beta_2 = \left\{X_2'(I - P_1)X_2\right\}^{-1}X_2'(I - P_1)y, \quad P_1 = X_1(X_1'X_1)^{-1}X_1'. \tag{15}$$

It can be confirmed easily that these formulae do specialise to those under (14) in the case of $X_1'X_2 = 0$.

The purpose of including $X_2$ in the regression equation when, in fact, interest is confined to the parameters of $\beta_1$ is to avoid falsely attributing the explanatory power of the variables of $X_2$ to those of $X_1$.

Let us investigate the effects of erroneously excluding $X_2$ from the regression. In that case, the estimate will be

$$\begin{aligned}\tilde\beta_1 &= (X_1'X_1)^{-1}X_1'y\\ &= (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + \varepsilon)\\ &= \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'\varepsilon.\end{aligned} \tag{16}$$


On applying the expectations operator to these equations, we find that

$$E(\tilde\beta_1) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2, \tag{17}$$

since $E\{(X_1'X_1)^{-1}X_1'\varepsilon\} = (X_1'X_1)^{-1}X_1'E(\varepsilon) = 0$. Thus, in general, we have $E(\tilde\beta_1) \neq \beta_1$, which is to say that $\tilde\beta_1$ is a biased estimator. The only circumstances in which the estimator will be unbiased are when either $X_1'X_2 = 0$ or $\beta_2 = 0$. In other circumstances, the estimator will suffer from a problem which is commonly described as omitted-variables bias.
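
The bias term in (17) is easy to reproduce by simulation. The following sketch, with illustrative numbers, regresses y on a single included variable that is correlated with an omitted one; the short-regression estimate settles near $\beta_1$ plus the bias term rather than near $\beta_1$ itself.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 10_000
x2 = rng.normal(size=T)
x1 = 0.8 * x2 + rng.normal(size=T)       # included regressor, correlated with x2
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=T)

X1 = x1[:, None]
b_short = np.linalg.lstsq(X1, y, rcond=None)[0][0]
# the bias term (X1'X1)^{-1} X1'X2 beta2 of equation (17)
bias = np.linalg.lstsq(X1, x2, rcond=None)[0][0] * 2.0
print(b_short, 1.0 + bias)               # the two values nearly coincide
```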
We need to ask whether it matters that the estimated regression parameters are biased. The answer depends upon the use to which we wish to put the estimated regression equation. The issue is whether the equation is to be used simply for predicting the values of the dependent variable y or whether it is to be used for some kind of structural analysis.

If the regression equation purports to describe a structural or a behavioral relationship within the economy, and if some of the explanatory variables on the RHS are destined to become the instruments of an economic policy, then it is important to have unbiased estimators of the associated parameters. For these parameters indicate the leverage of the policy instruments. Examples of such instruments are provided by interest rates, tax rates, exchange rates and the like.

On the other hand, if the estimated regression equation is to be viewed solely as a predictive device, that is to say, if it is simply an estimate of the function $E(y|x_1, \ldots, x_k)$ which specifies the conditional expectation of y given the values of $x_1, \ldots, x_k$, then, provided that the underlying statistical mechanism which has generated these variables is preserved, the question of the unbiasedness of the regression estimates does not arise.

3. HYPOTHESES CONCERNING SUBSETS OF THE REGRESSION COEFFICIENTS
Consider a set of linear restrictions on the vector β of a classical linear regression model $N(y; X\beta, \sigma^2 I)$, which take the form of

$$R\beta = r, \tag{1}$$

where R is a matrix of order $j \times k$ and of rank j, which is to say that the j restrictions are independent of each other and are fewer in number than the parameters within β. We know that the ordinary least-squares estimator of β is a normally distributed vector $\hat\beta \sim N\{\beta, \sigma^2(X'X)^{-1}\}$. It follows that

$$R\hat\beta \sim N\left\{R\beta = r,\ \sigma^2 R(X'X)^{-1}R'\right\}; \tag{2}$$

and, from this, we can immediately infer that

$$\frac{(R\hat\beta - r)'\left\{R(X'X)^{-1}R'\right\}^{-1}(R\hat\beta - r)}{\sigma^2} \sim \chi^2(j). \tag{3}$$


We have already established the result that

$$\frac{(T - k)\hat\sigma^2}{\sigma^2} = \frac{(y - X\hat\beta)'(y - X\hat\beta)}{\sigma^2} \sim \chi^2(T - k) \tag{4}$$

is a chi-square variate which is statistically independent of the chi-square variate

$$\frac{(\hat\beta - \beta)'X'X(\hat\beta - \beta)}{\sigma^2} \sim \chi^2(k) \tag{5}$$

derived from the estimator of the regression parameters. The variate of (4) must also be independent of the chi-square of (3); and it is straightforward to deduce that

$$\begin{aligned} F &= \left.\frac{(R\hat\beta - r)'\left\{R(X'X)^{-1}R'\right\}^{-1}(R\hat\beta - r)}{j}\right/\frac{(y - X\hat\beta)'(y - X\hat\beta)}{T - k} \\ &= \frac{(R\hat\beta - r)'\left\{R(X'X)^{-1}R'\right\}^{-1}(R\hat\beta - r)}{\hat\sigma^2 j} \sim F(j, T - k), \end{aligned} \tag{6}$$

which is to say that the ratio of the two independent chi-square variates, divided by their respective degrees of freedom, is an F statistic. This statistic, which embodies only known and observable quantities, can be used in testing the validity of the hypothesised restrictions $R\beta = r$.
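
The statistic of (6) can be computed directly from the matrices involved. The following sketch, using NumPy and SciPy on simulated data (all numbers illustrative), tests two exclusion restrictions jointly.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
T, k, j = 120, 4, 2
X = rng.normal(size=(T, k))
y = X @ np.array([1.0, 0.0, 0.0, -0.5]) + rng.normal(size=T)

b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
s2 = resid @ resid / (T - k)                      # sigma-hat squared, as in (4)

R = np.zeros((j, k)); R[0, 1] = R[1, 2] = 1.0     # H0: beta_2 = beta_3 = 0
r = np.zeros(j)
d = R @ b - r
V = R @ np.linalg.inv(X.T @ X) @ R.T
F = d @ np.linalg.solve(V, d) / (s2 * j)          # equation (6)
print(F, stats.f.sf(F, j, T - k))                 # statistic and p-value
```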
A specialisation of the statistic under (6) can also be used in testing an hypothesis concerning a subset of the elements of the vector β. Let $\beta' = [\beta_1', \beta_2']$. Then, the condition that the subvector $\beta_1$ assumes the value of $\beta_1^*$ can be expressed via the equation

$$[I_{k_1},\ 0]\begin{bmatrix}\beta_1\\ \beta_2\end{bmatrix} = \beta_1^*. \tag{7}$$

This can be construed as a case of the equation $R\beta = r$, where $R = [I_{k_1}, 0]$ and $r = \beta_1^*$.

In order to discover the specialised form of the requisite test statistic, let us consider the following partitioned form of an inverse matrix:

$$\begin{aligned}(X'X)^{-1} &= \begin{bmatrix}X_1'X_1 & X_1'X_2\\ X_2'X_1 & X_2'X_2\end{bmatrix}^{-1}\\ &= \begin{bmatrix}\{X_1'(I - P_2)X_1\}^{-1} & -\{X_1'(I - P_2)X_1\}^{-1}X_1'X_2(X_2'X_2)^{-1}\\ -\{X_2'(I - P_1)X_2\}^{-1}X_2'X_1(X_1'X_1)^{-1} & \{X_2'(I - P_1)X_2\}^{-1}\end{bmatrix}. \end{aligned} \tag{8}$$

Then, with $R = [I, 0]$, we find that

$$R(X'X)^{-1}R' = \left\{X_1'(I - P_2)X_1\right\}^{-1}. \tag{9}$$


It follows in a straightforward manner that the specialised form of the F statistic of (6) is

$$\begin{aligned} F &= \left.\frac{(\hat\beta_1 - \beta_1^*)'\left\{X_1'(I - P_2)X_1\right\}(\hat\beta_1 - \beta_1^*)}{k_1}\right/\frac{(y - X\hat\beta)'(y - X\hat\beta)}{T - k} \\ &= \frac{(\hat\beta_1 - \beta_1^*)'\left\{X_1'(I - P_2)X_1\right\}(\hat\beta_1 - \beta_1^*)}{\hat\sigma^2 k_1} \sim F(k_1, T - k). \end{aligned} \tag{10}$$

By specialising the expression under (10), a statistic may be derived for testing the hypothesis that $\beta_i = \beta_i^*$, concerning a single element:

$$F = \frac{(\hat\beta_i - \beta_i^*)^2}{\hat\sigma^2 w_{ii}}. \tag{11}$$

Here, $w_{ii}$ stands for the ith diagonal element of $(X'X)^{-1}$. If the hypothesis is true, then this will have an $F(1, T - k)$ distribution.

However, the usual way of testing such an hypothesis is to use

$$t = \frac{\hat\beta_i - \beta_i^*}{\sqrt{(\hat\sigma^2 w_{ii})}} \tag{12}$$

in conjunction with the tables of the $t(T - k)$ distribution. The t statistic shows the direction in which the estimate of $\beta_i$ deviates from the hypothesised value as well as the size of the deviation.
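
A minimal sketch of the t test of (12) follows; the data, the coefficient indices and the hypothesised values $\beta_i^* = 0$ are all illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
T, k = 120, 3
X = rng.normal(size=(T, k))
y = X @ np.array([1.0, 0.0, 0.5]) + rng.normal(size=T)

b = np.linalg.lstsq(X, y, rcond=None)[0]
s2 = ((y - X @ b) @ (y - X @ b)) / (T - k)
w = np.diag(np.linalg.inv(X.T @ X))               # the elements w_ii
t = b / np.sqrt(s2 * w)                           # equation (12) with beta_i* = 0
print(t, 2 * stats.t.sf(np.abs(t), T - k))        # two-sided p-values
```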

4. REGRESSIONS ON TRIGONOMETRICAL FUNCTIONS: DISCRETE-TIME FOURIER ANALYSIS
An example of orthogonal regressors is a Fourier analysis, where the explana-
tory variables are sampled from a set of trigonometric functions with angular
velocities, called Fourier frequencies, that are evenly distributed in an interval
from zero to π radians per sample period.
If the sample is indexed by t = 0, 1, . . . , T − 1, then the Fourier frequencies
are ωj = 2πj/T ; j = 0, 1, . . . , [T /2], where [T /2] denotes the integer quotient of
the division of T by 2.
The object of a Fourier analysis is to express the elements of the sample as a weighted sum of sine and cosine functions as follows:

$$y_t = \alpha_0 + \sum_{j=1}^{[T/2]}\left\{\alpha_j\cos(\omega_j t) + \beta_j\sin(\omega_j t)\right\};\quad t = 0, 1, \ldots, T - 1. \tag{1}$$

The vectors of the generic trigonometric regressors may be denoted by

$$c_j = [c_{0j}, c_{1j}, \ldots, c_{T-1,j}]' \quad\text{and}\quad s_j = [s_{0j}, s_{1j}, \ldots, s_{T-1,j}]', \tag{3}$$


where $c_{tj} = \cos(\omega_j t)$ and $s_{tj} = \sin(\omega_j t)$. The vectors of the ordinates of functions of different frequencies are mutually orthogonal. Therefore, the following orthogonality conditions hold:

$$c_i'c_j = s_i's_j = 0 \ \text{ if } i \neq j, \quad\text{and}\quad c_i's_j = 0 \ \text{ for all } i, j. \tag{4}$$

In addition, there are some sums of squares which can be taken into account in computing the coefficients of the Fourier decomposition:

$$c_0'c_0 = \iota'\iota = T, \qquad s_0's_0 = 0, \qquad c_j'c_j = s_j's_j = T/2 \ \text{ for } j = 1, \ldots, [(T - 1)/2]. \tag{5}$$

When $T = 2n$, there is $\omega_n = \pi$, and there is also

$$s_n's_n = 0 \quad\text{and}\quad c_n'c_n = T. \tag{6}$$

The "regression" formulae for the Fourier coefficients can now be given. First, there is

$$\alpha_0 = (\iota'\iota)^{-1}\iota'y = \frac{1}{T}\sum_t y_t = \bar y. \tag{7}$$

Then, for $j = 1, \ldots, [(T - 1)/2]$, there are

$$\alpha_j = (c_j'c_j)^{-1}c_j'y = \frac{2}{T}\sum_t y_t\cos\omega_j t, \tag{8}$$

and

$$\beta_j = (s_j's_j)^{-1}s_j'y = \frac{2}{T}\sum_t y_t\sin\omega_j t. \tag{9}$$

If $T = 2n$ is even, then there is no coefficient $\beta_n$, and there is

$$\alpha_n = (c_n'c_n)^{-1}c_n'y = \frac{1}{T}\sum_t (-1)^t y_t. \tag{10}$$
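
The formulae (7)-(10) translate directly into code. Here is a minimal sketch (simulated series and names are illustrative) that recovers the coefficient of a cosine component planted in the data.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 64
t = np.arange(T)
y = 1.0 + 2.0 * np.cos(2 * np.pi * 5 * t / T) + rng.normal(scale=0.1, size=T)

j = np.arange(1, T // 2)                  # j = 1, ..., n-1
w = 2 * np.pi * j / T                     # the Fourier frequencies
alpha0 = y.mean()                                    # (7)
alpha = (2 / T) * (np.cos(np.outer(w, t)) @ y)       # (8)
beta = (2 / T) * (np.sin(np.outer(w, t)) @ y)        # (9)
alpha_n = ((-1.0) ** t @ y) / T                      # (10), since T is even
print(alpha0, alpha[4])    # near 1.0 and 2.0; alpha[4] corresponds to j = 5
```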

By pursuing the analogy of multiple regression, it can be seen, in view of the orthogonality relationships, that there is a complete decomposition of the sum of squares of the elements of the vector y:

$$y'y = \alpha_0^2\iota'\iota + \sum_{j=1}^{[T/2]}\left\{\alpha_j^2 c_j'c_j + \beta_j^2 s_j's_j\right\}. \tag{11}$$

Now consider writing $\alpha_0^2\iota'\iota = \bar y^2\iota'\iota = \bar y'\bar y$, where $\bar y' = [\bar y, \bar y, \ldots, \bar y]$ is a vector whose repeated element is the sample mean $\bar y$. It follows that $y'y - \alpha_0^2\iota'\iota = y'y - \bar y'\bar y = (y - \bar y)'(y - \bar y)$. Then, in the case where $T = 2n$ is even, the equation can be written as

$$(y - \bar y)'(y - \bar y) = \frac{T}{2}\sum_{j=1}^{n-1}\left(\alpha_j^2 + \beta_j^2\right) + T\alpha_n^2 = \frac{T}{2}\sum_{j=1}^{n}\rho_j^2, \tag{12}$$

where $\rho_j^2 = \alpha_j^2 + \beta_j^2$ for $j = 1, \ldots, n - 1$ and $\rho_n^2 = 2\alpha_n^2$. A similar expression exists when T is odd, with the exceptions that $\alpha_n$ is missing and that the summation runs to $(T - 1)/2$.
It follows that the variance of the sample can be expressed as

$$\frac{1}{T}\sum_{t=0}^{T-1}(y_t - \bar y)^2 = \frac{1}{2}\sum_{j=1}^{n}\left(\alpha_j^2 + \beta_j^2\right). \tag{13}$$

The proportion of the variance that is attributable to the component at frequency $\omega_j$ is $(\alpha_j^2 + \beta_j^2)/2 = \rho_j^2/2$, where $\rho_j$ is the amplitude of the component.

The number of the Fourier frequencies increases at the same rate as the sample size T; and, if there are no regular harmonic components in the underlying process, then we can expect the proportion of the variance attributed to the individual frequencies to decline as the sample size increases. If there is a regular component, then we can expect the variance attributable to it to converge to a finite value as the sample size increases.

In order to provide a graphical representation of the decomposition of the sample variance, we must scale the elements of equation (13) by a factor of T. The graph of the function $I(\omega_j) = (T/2)(\alpha_j^2 + \beta_j^2)$ is known as the periodogram.
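
A periodogram routine can be written in a few lines. The following sketch (function name and normalisation choices are our own) computes $I(\omega_j)$ from the Fourier coefficients of the mean-adjusted data.

```python
import numpy as np

def periodogram(y):
    """Return the Fourier frequencies w_j and I(w_j) = (T/2)(alpha_j^2 + beta_j^2)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    t = np.arange(T)
    j = np.arange(1, T // 2 + 1)
    w = 2 * np.pi * j / T
    yd = y - y.mean()
    alpha = (2 / T) * (np.cos(np.outer(w, t)) @ yd)
    beta = (2 / T) * (np.sin(np.outer(w, t)) @ yd)
    return w, (T / 2) * (alpha ** 2 + beta ** 2)
```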
We should note that the trigonometric functions that underlie the periodogram are defined over an infinite index set $\{t = 0, \pm 1, \pm 2, \ldots\}$ comprising all positive and negative integers. As the index t progresses through this sequence, equation (1) will generate successive replications of the data set that is defined on the points $t = 0, 1, \ldots, T - 1$. The result is described as the periodic extension of the data.
Since the trigonometric functions are both periodic and strictly bounded,
they cannot be used to synthesise a perpetual trend. Therefore, before a peri-
odogram analysis can be applied effectively, it is necessary to ensure that the
data sequence is free of trend. A failure to detrend the data will lead to suc-
cessive disjunctions in its periodic extension at the points where the beginning
of one replication of the data set joins the end of the previous replication. The
result will be a so-called saw tooth function.
The typical periodogram of a saw tooth function has the form of a rectangu-
lar hyperbola, which descends from a high point at the fundamental frequency
of ω1 = 2π/T and which reaches a low point in the vicinity of the limiting
frequency of π. Within such a periodogram, the features of primary interest,
which relate to the fluctuations that are superimposed upon the trend, may be
virtually imperceptible—see Figure 5, for example.

[Figure 1. The plot of 132 monthly observations on the U.S. money supply, beginning in January 1960. A quadratic function has been interpolated through the data.]

[Figure 2. The periodogram of the residuals of the logarithmic money-supply data.]

The Periodogram and the Autocovariance Function

A stationary stochastic process can be characterised, equivalently, by its autocovariance function or its partial autocovariance function. It can also be characterised by its spectral density function, which is the Fourier transform of the autocovariances $\{\gamma_\tau;\ \tau = 0, \pm 1, \pm 2, \ldots\}$:

$$f(\omega) = \sum_{\tau=-\infty}^{\infty}\gamma_\tau\cos(\omega\tau) = \gamma_0 + 2\sum_{\tau=1}^{\infty}\gamma_\tau\cos(\omega\tau). \tag{14}$$

Here, $\omega \in [0, \pi]$ is an angular velocity, or frequency value, in radians per period.

The empirical counterpart of the spectral density function is the periodogram $I(\omega_j)$, which may be defined via

$$\frac{1}{2}I(\omega_j) = \sum_{\tau=1-T}^{T-1}c_\tau\cos(\omega_j\tau) = c_0 + 2\sum_{\tau=1}^{T-1}c_\tau\cos(\omega_j\tau), \tag{15}$$


where $\{c_\tau;\ \tau = 0, \pm 1, \ldots, \pm(T - 1)\}$, with

$$c_\tau = T^{-1}\sum_{t=\tau}^{T-1}(y_t - \bar y)(y_{t-\tau} - \bar y), \tag{16}$$

are the empirical autocovariances, and where $\omega_j = 2\pi j/T;\ j = 0, 1, \ldots, [T/2]$ are the Fourier frequencies. We need to show that this definition of the periodogram is equivalent to the previous definition, which was based on the following frequency decomposition of the sample variance:

$$\frac{1}{T}\sum_{t=0}^{T-1}(y_t - \bar y)^2 = \frac{1}{2}\sum_{j=0}^{[T/2]}\left(\alpha_j^2 + \beta_j^2\right), \tag{17}$$

where

$$\alpha_j = \frac{2}{T}\sum_t y_t\cos(\omega_j t) = \frac{2}{T}\sum_t (y_t - \bar y)\cos(\omega_j t),$$

$$\beta_j = \frac{2}{T}\sum_t y_t\sin(\omega_j t) = \frac{2}{T}\sum_t (y_t - \bar y)\sin(\omega_j t).$$

Substituting these into the term $T(\alpha_j^2 + \beta_j^2)/2$ gives the periodogram

$$I(\omega_j) = \frac{2}{T}\left[\left\{\sum_{t=0}^{T-1}\cos(\omega_j t)(y_t - \bar y)\right\}^2 + \left\{\sum_{t=0}^{T-1}\sin(\omega_j t)(y_t - \bar y)\right\}^2\right].$$

The quadratic terms may be expanded to give

$$\begin{aligned} I(\omega_j) = \frac{2}{T}\Bigl[&\sum_t\sum_s\cos(\omega_j t)\cos(\omega_j s)(y_t - \bar y)(y_s - \bar y)\\ +&\sum_t\sum_s\sin(\omega_j t)\sin(\omega_j s)(y_t - \bar y)(y_s - \bar y)\Bigr]. \end{aligned}$$

Since $\cos(A)\cos(B) + \sin(A)\sin(B) = \cos(A - B)$, this can be written as

$$I(\omega_j) = \frac{2}{T}\sum_t\sum_s\cos(\omega_j[t - s])(y_t - \bar y)(y_s - \bar y).$$

On defining $\tau = t - s$ and writing $c_\tau = \sum_t (y_t - \bar y)(y_{t-\tau} - \bar y)/T$, we can reduce the latter expression to

$$I(\omega_j) = 2\sum_{\tau=1-T}^{T-1}\cos(\omega_j\tau)c_\tau,$$

which is a Fourier transform of the empirical autocovariances.
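
The equivalence of the two definitions can be confirmed numerically. In the sketch below (all data simulated), the first periodogram is built from the Fourier coefficients and the second from the empirical autocovariances; the two agree to machine precision.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 32
y = rng.normal(size=T)
yd = y - y.mean()
t = np.arange(T)

# empirical autocovariances c_tau of (16), for tau = 0, ..., T-1
c = np.array([yd[tau:] @ yd[:T - tau] for tau in range(T)]) / T

j = np.arange(1, T // 2 + 1)
w = 2 * np.pi * j / T

alpha = (2 / T) * (np.cos(np.outer(w, t)) @ yd)
beta = (2 / T) * (np.sin(np.outer(w, t)) @ yd)
I1 = (T / 2) * (alpha ** 2 + beta ** 2)                                # first definition

I2 = 2 * (c[0] + 2 * (np.cos(np.outer(w, np.arange(1, T))) @ c[1:]))   # second definition
print(np.allclose(I1, I2))                                             # True
```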


5. FILTERING MACROECONOMIC DATA

Wiener–Kolmogorov Filtering of Stationary Sequences

The classical theory of linear filtering was formulated independently by Norbert Wiener (1941) and Andrei Nikolaevich Kolmogorov (1941) during the Second World War. They were both considering the problem of how to target radar-assisted anti-aircraft guns on incoming enemy aircraft.

The theory has found widespread application in analog and digital signal processing and in telecommunications in general. Also, it has provided a basic technique for the enhancement of recorded music.

The classical theory assumes that the data sequences are generated by stationary stochastic processes and that these are of sufficient length to justify the assumption that they constitute doubly-infinite sequences.

For econometrics, the theory must be adapted to cater to short trended sequences. Then, Wiener–Kolmogorov filters can be used to extract trends from economic data sequences and for generating seasonally adjusted data.
Consider a vector y with a signal component ξ and a noise component η:

$$y = \xi + \eta. \tag{1}$$

These components are assumed to be independently normally distributed with zero means and with positive-definite dispersion matrices. Then,

$$E(\xi) = 0, \quad D(\xi) = \Omega_\xi, \qquad E(\eta) = 0, \quad D(\eta) = \Omega_\eta, \qquad\text{and}\quad C(\xi, \eta) = 0. \tag{2}$$

A consequence of the independence of ξ and η is that

$$D(y) = \Omega_\xi + \Omega_\eta \quad\text{and}\quad C(\xi, y) = D(\xi) = \Omega_\xi. \tag{3}$$

The signal component is estimated by a linear transformation $x = \Psi_x y$ of the data vector that suppresses the noise component. Usually, the signal comprises low-frequency elements and the noise comprises elements of higher frequencies.

The Minimum Mean-Squared Error Estimator

The principle of linear minimum mean-squared error estimation indicates that the error $\xi - x$ in representing ξ by x should be uncorrelated with the data in y:

$$\begin{aligned} 0 &= C(\xi - x, y) = C(\xi, y) - C(x, y)\\ &= C(\xi, y) - \Psi_x C(y, y)\\ &= \Omega_\xi - \Psi_x(\Omega_\xi + \Omega_\eta). \end{aligned} \tag{4}$$

This indicates that the estimate is

$$x = \Psi_x y = \Omega_\xi(\Omega_\xi + \Omega_\eta)^{-1}y. \tag{5}$$

The corresponding estimate of the noise component η is

$$\begin{aligned} h = \Psi_h y &= \Omega_\eta(\Omega_\xi + \Omega_\eta)^{-1}y\\ &= \{I - \Omega_\xi(\Omega_\xi + \Omega_\eta)^{-1}\}y. \end{aligned} \tag{6}$$

It will be observed that $\Psi_x + \Psi_h = I$ and, therefore, that $x + h = y$.
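
Equations (5) and (6) amount to two matrix multiplications. A minimal sketch (function name ours; the dispersion matrices must be supplied) is:

```python
import numpy as np

def wk_split(y, omega_xi, omega_eta):
    """Wiener-Kolmogorov decomposition following (5) and (6):
    x estimates the signal, h the noise, and x + h equals y."""
    F = np.linalg.inv(omega_xi + omega_eta)
    x = omega_xi @ F @ y
    h = omega_eta @ F @ y
    return x, h
```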

Conditional Expectations

In deriving the estimator, we might have used the formula for conditional expectations. In the case of two linearly related scalar random variables ξ and y, the conditional expectation of ξ given y is

$$E(\xi|y) = E(\xi) + \frac{C(\xi, y)}{V(y)}\{y - E(y)\}. \tag{7}$$

In the case of two vector quantities, this becomes

$$E(\xi|y) = E(\xi) + C(\xi, y)D^{-1}(y)\{y - E(y)\}. \tag{8}$$

By setting $C(\xi, y) = \Omega_\xi$ and $D(y) = \Omega_\xi + \Omega_\eta$, as in (3), and by setting $E(\xi) = E(y) = 0$, we get the expression that is to be found under (5):

$$x = \Omega_\xi(\Omega_\xi + \Omega_\eta)^{-1}y.$$

The Difference Operator and Polynomial Regression

The lag operator L, which is commonly defined in respect of a doubly-infinite sequence $x(t) = \{x_t;\ t = 0, \pm 1, \pm 2, \ldots\}$, has the effect that $Lx(t) = x(t - 1)$. The (backwards) difference operator $\nabla = 1 - L$ has the effect that $\nabla x(t) = x(t) - x(t - 1)$. It serves to reduce a constant function to zero and to reduce a linear function to a constant. The second-order or twofold difference operator

$$\nabla^2 = 1 - 2L + L^2$$

is effective in reducing a linear function to zero.

A difference operator $\nabla^d$ of order d is commonly employed in the context of an ARIMA(p, d, q) model to reduce the data to stationarity. Then, the differenced data can be modelled by an ARMA(p, q) process. In such circumstances, the difference operator takes the form of a matrix transformation.

[Figure 3. The squared gain of the difference operator, which has a zero at zero frequency, and the squared gain of the summation operator, which is unbounded at zero frequency.]

The Matrix Difference Operator

The matrix analogue of the second-order difference operator in the case of T = 5, for example, is given by

$$\nabla_5^2 = \begin{bmatrix}Q_*'\\ Q'\end{bmatrix} = \begin{bmatrix}1 & 0 & 0 & 0 & 0\\ -2 & 1 & 0 & 0 & 0\\ 1 & -2 & 1 & 0 & 0\\ 0 & 1 & -2 & 1 & 0\\ 0 & 0 & 1 & -2 & 1\end{bmatrix}. \tag{9}$$
The first two rows, which do not produce true differences, are liable to be
discarded.
The difference operator nullifies data elements at zero frequency and it
severely attenuates those at the adjacent frequencies. This is a disadvantage
when the low frequency elements are of primary interest. Another way of
detrending the data is to fit a polynomial trend by least-squares regression and
to take the residual sequence as the detrended data.
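
The matrix Q' of (9) is easy to build; the sketch below (helper name ours) constructs it by squaring a first-difference matrix and discarding the first two rows.

```python
import numpy as np

def second_diff_Qt(T):
    """Return Q', the (T-2) x T matrix of true second differences."""
    D = np.eye(T) - np.eye(T, k=-1)      # first-difference operator
    return (D @ D)[2:, :]                # drop the two rows that are not true differences

print(second_diff_Qt(5))                 # reproduces the last three rows of (9)
```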

Polynomial Regression

Using the matrix Q defined above, we can represent the vector of the ordinates of a linear trend line interpolated through the data sequence as

$$x = y - Q(Q'Q)^{-1}Q'y. \tag{10}$$

The vector of the residuals is

$$e = Q(Q'Q)^{-1}Q'y. \tag{11}$$

Observe that this vector contains exactly the same information as the differenced vector $g = Q'y$. However, whereas the low-frequency structure of the data is invisible in the periodogram of the latter, it is entirely visible in the periodogram of the residuals.

[Figure 4. The quarterly series of the logarithms of consumption in the U.K., for the years 1955 to 1994, together with a linear trend interpolated by least-squares regression.]

[Figure 5. The periodogram of the trended logarithmic data.]

[Figure 6. The periodogram of the differenced logarithmic consumption data.]

[Figure 7. The periodogram of the residual sequence obtained from the linear detrending of the logarithmic consumption data.]
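
A sketch of the detrending calculation in (10) and (11) follows; since the columns of Q span the orthogonal complement of the linear functions, x coincides with the least-squares linear trend. The function name is ours.

```python
import numpy as np

def linear_detrend(y):
    """Trend x and residual e of (10)-(11), via the matrix Q."""
    T = len(y)
    D = np.eye(T) - np.eye(T, k=-1)
    Q = ((D @ D)[2:, :]).T                        # Q is T x (T-2)
    e = Q @ np.linalg.solve(Q.T @ Q, Q.T @ y)     # residuals, eq (11)
    return y - e, e                               # trend, residuals
```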

Filters for Short Trended Sequences

Applying Q' to the equation $y = \xi + \eta$, representing the trended data, gives

$$Q'y = Q'\xi + Q'\eta = \delta + \kappa = g. \tag{12}$$

The vectors of the expectations and the dispersion matrices of the differenced vectors are

$$E(\delta) = 0, \quad D(\delta) = \Omega_\delta = Q'D(\xi)Q; \qquad E(\kappa) = 0, \quad D(\kappa) = \Omega_\kappa = Q'D(\eta)Q. \tag{13}$$

The difficulty of estimating the trended vector $\xi = y - \eta$ directly is that some starting values or initial conditions are required in order to define its value at time t = 0. However, since η is from a stationary mean-zero process, it requires only zero-valued initial conditions. Therefore, the starting-value problem can be circumvented by concentrating on the estimation of η.

The conditional expectation of η, given the differenced data $g = Q'y$, is provided by the formula

$$\begin{aligned} h = E(\eta|g) &= E(\eta) + C(\eta, g)D^{-1}(g)\{g - E(g)\}\\ &= C(\eta, g)D^{-1}(g)g, \end{aligned} \tag{14}$$

where the second equality follows in view of the zero-valued expectations. Within this expression, there are

$$D(g) = \Omega_\delta + Q'\Omega_\eta Q \quad\text{and}\quad C(\eta, g) = \Omega_\eta Q. \tag{15}$$

[Figure 8. The gain of the Hodrick–Prescott lowpass filter with the smoothing parameter set to 100, 1,600 and 14,400.]

Putting these details into (14) gives the following estimate of η:

$$h = \Omega_\eta Q(\Omega_\delta + Q'\Omega_\eta Q)^{-1}Q'y. \tag{16}$$

Putting this into the equation $x = y - h$ gives

$$x = y - \Omega_\eta Q(\Omega_\delta + Q'\Omega_\eta Q)^{-1}Q'y. \tag{17}$$

The Leser (H–P) Filter

We now consider two specific cases of the Wiener–Kolmogorov filter. First, there is the Leser or Hodrick–Prescott (H–P) filter. This can be derived from a model that supposes that the signal is generated by an integrated (second-order) random walk and that the noise is from a white-noise process.

The random-walk process is reduced to a white-noise process δ(t) by taking twofold differences. Thus, $(1 - L)^2\xi(t) = \delta(t)$, and the corresponding equation for the sample is $Q'\xi = \delta$. Accordingly, the filter is derived by setting

$$D(\eta) = \Omega_\eta = \sigma_\eta^2 I, \quad D(\delta) = \Omega_\delta = \sigma_\delta^2 I \quad\text{and}\quad \lambda = \frac{\sigma_\eta^2}{\sigma_\delta^2} \tag{18}$$

within (17) to give

$$x = y - Q(\lambda^{-1}I + Q'Q)^{-1}Q'y. \tag{19}$$

Here, λ is the so-called smoothing parameter. It will be observed that, as $\lambda \to \infty$, the vector x tends to that of a linear function interpolated into the data by least-squares regression, which is represented by equation (10):

$$x = y - Q(Q'Q)^{-1}Q'y.$$
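
A minimal sketch of the H–P trend computation of (19) follows (function name ours; λ = 1600 is the conventional quarterly setting). By the Woodbury identity, (19) is algebraically identical to the more familiar form $x = (I + \lambda QQ')^{-1}y$.

```python
import numpy as np

def hp_trend(y, lam=1600.0):
    """H-P / Leser trend via equation (19): x = y - Q(lam^{-1} I + Q'Q)^{-1} Q'y."""
    T = len(y)
    D = np.eye(T) - np.eye(T, k=-1)
    Q = ((D @ D)[2:, :]).T                         # Q is T x (T-2)
    A = np.eye(T - 2) / lam + Q.T @ Q
    return y - Q @ np.linalg.solve(A, Q.T @ y)
```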


[Figure 9. The gain of the lowpass Butterworth filters of orders n = 6 and n = 12 with a nominal cut-off point of 2π/3 radians.]

Figure 8 depicts the frequency response of the lowpass H–P filter for various values of the smoothing parameter λ. The innermost profile corresponds to the highest value of the parameter, and it represents a filter that transmits only the data elements of lowest frequency.

For all values of λ, the response of the H–P filter shows a gradual transition from the pass band, which corresponds to the frequencies that are transmitted by the filter, to the stop band, which corresponds to the frequencies that are impeded.

Often, there is a requirement for a more rapid transition, as well as a need to control the location in frequency where the transition occurs. These needs can be served by the Butterworth filter, which is more amenable to adjustment.

The Butterworth Filter

The Butterworth filter can be derived from an heuristic model in which the signal and the noise are generated by processes that are described, respectively, by the equations $(1 - L)^2\xi(t) = (1 + L)^n\zeta(t)$ and $(1 - L)^2\eta(t) = (1 - L)^n\varepsilon(t)$, where ζ(t) and ε(t) are mutually independent white-noise processes.

The filter that is appropriate to short trended sequences can be represented by the equation

$$x = y - \lambda\Sigma Q(M + \lambda Q'\Sigma Q)^{-1}Q'y. \tag{20}$$

Here, the matrices are

$$\Sigma = \{2I_T - (L_T + L_T')\}^{n-2} \quad\text{and}\quad M = \{2I_T + (L_T + L_T')\}^n, \tag{21}$$

where $L_T$ is a matrix of order T with units on the first subdiagonal; and it can be verified that

$$Q'\Sigma Q = \{2I_T - (L_T + L_T')\}^n. \tag{22}$$
Figure 9 shows the frequency response of the Butterworth filter for various values of n and for a specific cut-off frequency, which is determined by the parameter λ. The greater the value of n, the more rapid is the transition from pass band to stop band.
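
The following sketch implements (20) under two stated assumptions: M is constructed of order T − 2 so that $M + \lambda Q'\Sigma Q$ is conformable for inversion, and λ is tied to the nominal cut-off frequency $\omega_c$ by $\lambda = \{1/\tan(\omega_c/2)\}^{2n}$, which is the usual Butterworth parametrisation.

```python
import numpy as np

def butterworth_trend(y, n=6, cutoff=2 * np.pi / 3):
    """A sketch of the trend estimate of equation (20), under stated assumptions."""
    T = len(y)
    lam = (1.0 / np.tan(cutoff / 2)) ** (2 * n)     # assumed cut-off parametrisation

    def power(T_, sign, n_):
        L = np.eye(T_, k=-1)
        return np.linalg.matrix_power(2 * np.eye(T_) + sign * (L + L.T), n_)

    Sigma = power(T, -1, n - 2)          # {2I - (L + L')}^{n-2}, order T
    M = power(T - 2, +1, n)              # {2I + (L + L')}^{n}, order T-2 (assumed)
    D = np.eye(T) - np.eye(T, k=-1)
    Q = ((D @ D)[2:, :]).T               # Q is T x (T-2)
    A = M + lam * (Q.T @ Sigma @ Q)
    return y - lam * (Sigma @ Q) @ np.linalg.solve(A, Q.T @ y)
```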

6. LAGGED DEPENDENT VARIABLES AND ERROR-CORRECTION MECHANISMS
A regression equation can also be set in motion by including lagged values of the dependent variable on the RHS. With one lagged value, we get

$$y(t) = \phi y(t - 1) + \beta x(t) + \varepsilon(t). \tag{1}$$

In terms of the lag operator, this is

$$(1 - \phi L)y(t) = \beta x(t) + \varepsilon(t), \tag{2}$$

of which the rational form is

$$y(t) = \frac{\beta}{1 - \phi L}x(t) + \frac{1}{1 - \phi L}\varepsilon(t). \tag{3}$$

The advantage of equation (1) is that it is amenable to estimation by ordinary least-squares regression. Although the estimates will be biased in finite samples, they will be consistent if the model is correctly specified. The disadvantage is the restrictive assumption that the systematic and disturbance parts have the same dynamics.
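
The finite-sample bias and the consistency of the least-squares estimate of φ can both be seen in a small simulation (all settings illustrative): the mean estimate falls short of the true value in short samples but approaches it as T grows.

```python
import numpy as np

rng = np.random.default_rng(7)

def ols_phi(T, phi=0.6, beta=1.0):
    x = rng.normal(size=T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = phi * y[t - 1] + beta * x[t] + rng.normal()
    Z = np.column_stack([y[:-1], x[1:]])              # regressors y(t-1), x(t)
    return np.linalg.lstsq(Z, y[1:], rcond=None)[0][0]

print(np.mean([ols_phi(25) for _ in range(2000)]))    # noticeably below 0.6
print(np.mean([ols_phi(800) for _ in range(200)]))    # close to 0.6
```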

Partial Adjustment and Adaptive Expectations

A simple partial-adjustment model has the form

$$y(t) = \lambda\gamma x(t) + (1 - \lambda)y(t - 1) + \varepsilon(t). \tag{4}$$

If y(t) is current consumption and x(t) is disposable income, then $\gamma x(t) = y^*(t)$ is "desired" consumption. If habits of consumption persist, then current consumption will be a weighted combination of the previous consumption and the present desired consumption.

The weights of the combination depend on the partial-adjustment parameter $\lambda \in (0, 1]$. If λ = 1, then the consumers adjust their consumption instantaneously to the desired value. As $\lambda \to 0$, their consumption habits become increasingly persistent.

When the notation $\lambda\gamma = \beta$ and $(1 - \lambda) = \phi$ is adopted, equation (4) becomes identical to equation (1), which is the regression model with a lagged dependent variable. Observe also that $\gamma = \beta/(1 - \phi)$ is a "long-term multiplier" in the relationship between income and consumption. It can also be described as the steady-state gain of the transfer function from x(t) to y(t), which is depicted in equation (3).


Error-Correction Forms, and Nonstationary Signals

The usual linear regression procedures presuppose that the relevant data moments will converge asymptotically to fixed limits as the sample size increases. This cannot happen if the data are trended, in which case the standard techniques of statistical inference will not be applicable.

A common approach is to subject the data to as many differencing operations as may be required to achieve stationarity. However, differencing tends to remove some of the essential information regarding the behaviour of economic agents. Moreover, it is often discovered that the regression model loses much of its explanatory power when the differences of the data are used instead.

In such circumstances, one might use the so-called error-correction model. The model depicts a mechanism whereby two trended economic variables maintain an enduring long-term proportionality with each other.

The data sequences comprised by the model are stationary, either individually or in an appropriate combination; and this enables us to apply the standard procedures of statistical inference that are appropriate to models comprising data from stationary processes.
Consider taking y(t − 1) from both sides of the equation of (1), which represents the first-order dynamic model. This gives

$$\begin{aligned} \nabla y(t) = y(t) - y(t - 1) &= (\phi - 1)y(t - 1) + \beta x(t) + \varepsilon(t)\\ &= (1 - \phi)\left\{\frac{\beta}{1 - \phi}x(t) - y(t - 1)\right\} + \varepsilon(t)\\ &= \lambda\left\{\gamma x(t) - y(t - 1)\right\} + \varepsilon(t), \end{aligned} \tag{5}$$

where λ = 1 − φ and where γ is the gain of the transfer function from x(t) to y(t) defined under (3). This is the so-called error-correction form of the equation; and it indicates that the change in y(t) is a function of the extent to which the proportions of the series x(t) and y(t − 1) differ from those which would prevail in the steady state.
The error-correction form provides the basis for estimating the parameters of the model when the signal series x(t) is trended or nonstationary. A pair of nonstationary series that maintain a long-run proportionality are said to be cointegrated. It is easy to obtain an accurate estimate of γ, which is the coefficient of proportionality, simply by running a regression of y(t − 1) on x(t).

Once a value for γ is available, the remaining parameter λ may be estimated by regressing ∇y(t) upon the composite variable $\{\gamma x(t) - y(t - 1)\}$. However, if the error-correction model is an unrestricted reparametrisation of an original model in levels, then its parameters can be estimated by ordinary least-squares regression. The same estimates can also be inferred from the least-squares estimates of the parameters of the original model in levels.
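
The two-step procedure can be sketched as follows on simulated cointegrated data (parameter values illustrative): γ is estimated from the levels regression of y(t − 1) on x(t), and λ from the regression of ∇y(t) on the composite error-correction term.

```python
import numpy as np

rng = np.random.default_rng(8)
T, lam, gamma = 400, 0.3, 2.0
x = np.cumsum(rng.normal(size=T))                # a random-walk input series
y = np.zeros(T)
for t in range(1, T):
    y[t] = y[t - 1] + lam * (gamma * x[t] - y[t - 1]) + rng.normal()

g_hat = np.linalg.lstsq(x[1:, None], y[:-1], rcond=None)[0][0]   # regress y(t-1) on x(t)
z = g_hat * x[1:] - y[:-1]                                       # gamma x(t) - y(t-1)
l_hat = np.linalg.lstsq(z[:, None], np.diff(y), rcond=None)[0][0]
print(g_hat, l_hat)                              # near 2.0 and 0.3
```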
To see how to derive an error-correction form for a more general autoregressive distributed-lag model, consider the second-order model:

$$y(t) = \phi_1 y(t - 1) + \phi_2 y(t - 2) + \beta_0 x(t) + \beta_1 x(t - 1) + \varepsilon(t). \tag{6}$$


The part $\phi_1 y(t - 1) + \phi_2 y(t - 2)$, comprising the lagged dependent variables, can be reparametrised as follows:

$$[\phi_1\ \ \phi_2]\begin{bmatrix}1 & 0\\ 1 & 1\end{bmatrix}\begin{bmatrix}1 & 0\\ -1 & 1\end{bmatrix}\begin{bmatrix}y(t - 1)\\ y(t - 2)\end{bmatrix} = [\theta\ \ \rho]\begin{bmatrix}y(t - 1)\\ \nabla y(t - 1)\end{bmatrix}. \tag{7}$$

Here, the matrix that postmultiplies the row vector of the parameters is the inverse of the matrix that premultiplies the column vector of the variables. The sum $\beta_0 x(t) + \beta_1 x(t - 1)$ can be reparametrised to become

$$[\beta_0\ \ \beta_1]\begin{bmatrix}1 & 1\\ 1 & 0\end{bmatrix}\begin{bmatrix}0 & 1\\ 1 & -1\end{bmatrix}\begin{bmatrix}x(t)\\ x(t - 1)\end{bmatrix} = [\kappa\ \ \delta]\begin{bmatrix}x(t - 1)\\ \nabla x(t)\end{bmatrix}. \tag{8}$$

It follows that equation (6) can be recast in the form of

$$y(t) = \theta y(t - 1) + \rho\nabla y(t - 1) + \kappa x(t - 1) + \delta\nabla x(t) + \varepsilon(t). \tag{9}$$

Taking y(t − 1) from both sides of this equation and rearranging it gives

$$\begin{aligned} \nabla y(t) &= (1 - \theta)\left\{\frac{\kappa}{1 - \theta}x(t - 1) - y(t - 1)\right\} + \rho\nabla y(t - 1) + \delta\nabla x(t) + \varepsilon(t)\\ &= \lambda\left\{\gamma x(t - 1) - y(t - 1)\right\} + \rho\nabla y(t - 1) + \delta\nabla x(t) + \varepsilon(t). \end{aligned} \tag{10}$$

This is an elaboration of equation (5); and it includes the differenced sequences ∇y(t − 1) and ∇x(t). These are deemed to be stationary, as is the composite error sequence $\gamma x(t - 1) - y(t - 1)$.

Observe that, in contrast to equation (5), the error-correction term of (10) comprises the lagged value x(t − 1) in place of x(t). Had the reparametrising transformation that has been employed in equation (7) also been used in (8), then the consequence would have been to generate an error-correction term of the form $\gamma x(t) - y(t - 1)$. It should also be observed that the parameter associated with x(t − 1) in (10), which is

$$\gamma = \frac{\kappa}{1 - \theta} = \frac{\beta_0 + \beta_1}{1 - \phi_1 - \phi_2}, \tag{11}$$

is the steady-state gain of the transfer function from x(t) to y(t).
Additional lagged differences can be added to equation (10); and this is tantamount to increasing the number of lags of the dependent variable y(t) and the number of lags of the input variable x(t) within equation (6).
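
The identities behind (7)-(9) can be verified mechanically. The sketch below reads off $\theta = \phi_1 + \phi_2$, $\kappa = \beta_0 + \beta_1$ and $\delta = \beta_0$ from the matrix products, together with $\rho = -\phi_2$ (the sign being fixed so that (9) reproduces (6) with $\nabla y(t - 1) = y(t - 1) - y(t - 2)$), and confirms the reparametrisation on arbitrary values.

```python
import numpy as np

phi1, phi2, beta0, beta1 = 0.5, 0.2, 1.0, 0.3
theta, rho = phi1 + phi2, -phi2          # from (7); the sign of rho made explicit
kappa, delta = beta0 + beta1, beta0      # from (8)

rng = np.random.default_rng(9)
y1, y2, x0, x1 = rng.normal(size=4)      # arbitrary y(t-1), y(t-2), x(t), x(t-1)
lhs = phi1 * y1 + phi2 * y2 + beta0 * x0 + beta1 * x1                 # from (6)
rhs = theta * y1 + rho * (y1 - y2) + kappa * x1 + delta * (x0 - x1)   # from (9)
print(np.isclose(lhs, rhs))              # True
```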
