LECTURE 2
If the value of x is known, then the best predictor of y is the conditional expectation of y given x, which is defined as

         E(y|x) = ∫ y {f(x, y)/f(x)} dy
(53)
                = ∫ y f(y|x) dy.
D.S.G. POLLOCK: INTRODUCTORY ECONOMETRICS
         y = E(y|x) + ε
(56)
           = α + xβ + ε.

Taking expectations of (56) gives

(57)    E(y) = α + E(x)β,

whence

(58)    α = E(y) − E(x)β.

Equation (57) shows that the regression line passes through the point E(x, y) =
{E(x), E(y)}, which is the expected value of the joint distribution.
By putting (58) into (55), we find that

(59)    E(y|x) = E(y) + β{x − E(x)},
which shows how the conditional expectation of y differs from the unconditional
expectation in proportion to the error of predicting x by taking its expected
value.
Now let us multiply (55) by x and f(x) and then integrate with respect to
x to provide

(60)    E(xy) = αE(x) + βE(x²).
2: ELEMENTARY REGRESSION
Solving equations (57) and (60) for β gives

         β = {E(xy) − E(x)E(y)} / {E(x²) − [E(x)]²}

(63)       = E[{x − E(x)}{y − E(y)}] / E[{x − E(x)}²]

           = C(x, y)/V(x).
Thus we have expressed α and β in terms of the moments E(x), E(y), V (x)
and C(x, y) of the joint distribution of x and y.
It should be recognised that the prediction error ε = y − E(y|x) = y − α − xβ
is uncorrelated with the variable x. This is shown by writing

(64)    E[{y − E(y|x)}x] = E(yx) − αE(x) − βE(x²) = 0,
where the final equality comes from (60). This result is readily intelligible; for,
if the prediction error were correlated with the value of x, then we should not
be using the information of x efficiently in predicting y.
Empirical Regressions
Imagine that we have a sample of T observations on x and y, which are
(x_1, y_1), (x_2, y_2), . . . , (x_T, y_T). Then we can calculate the following empirical or
sample moments:
(65)    x̄ = (1/T) Σ_{t=1}^T x_t,

(66)    ȳ = (1/T) Σ_{t=1}^T y_t,

(67)    S_x² = (1/T) Σ_t (x_t − x̄)² = (1/T) Σ_t (x_t − x̄)x_t = (1/T) Σ_t x_t² − x̄²,

(68)    S_xy = (1/T) Σ_t (x_t − x̄)(y_t − ȳ) = (1/T) Σ_t (x_t − x̄)y_t = (1/T) Σ_t x_t y_t − x̄ȳ.
In terms of these moments, the estimates of the regression parameters are

         α̂ = ȳ − β̂x̄,
(69)
         β̂ = Σ_t (x_t − x̄)(y_t − ȳ) / Σ_t (x_t − x̄)².
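The formulae (65)–(69) can be illustrated numerically. The following Python fragment, which works on some hypothetical data, computes the sample moments and the resulting parameter estimates; the variable names mirror the notation of the text.

```python
# Sample moments (65)-(68) and the estimates of (69), on hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.9]
T = len(x)

xbar = sum(x) / T                                    # (65)
ybar = sum(y) / T                                    # (66)
Sxx = sum((xt - xbar) ** 2 for xt in x) / T          # (67)
Sxy = sum((xt - xbar) * (yt - ybar)
          for xt, yt in zip(x, y)) / T               # (68)

beta_hat = Sxy / Sxx                                 # (69): slope estimate
alpha_hat = ybar - beta_hat * xbar                   # (69): intercept estimate
print(alpha_hat, beta_hat)
```

For these data, x̄ = 3, S_x² = 2 and S_xy = 3.96, so that β̂ = 1.98 and α̂ = 0.10.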
The same estimates can be obtained by the method of least squares, which
entails finding the values of α and β in the equation

(70)    y = α + xβ + ε

that minimise the sum of the squares of the deviations of the observations
from the fitted line:

         S = Σ_{t=1}^T (y_t − ŷ_t)²
(72)
           = Σ_{t=1}^T (y_t − α − x_tβ)².
Next, by differentiating with respect to β and setting the result to zero, we get
(75)    −2 Σ_t x_t(y_t − α − βx_t) = 0.
On substituting for α from (74) and eliminating the factor −2, this becomes
(76)    Σ_t x_t y_t − Σ_t x_t(ȳ − βx̄) − β Σ_t x_t² = 0,
whence we get
         β̂ = (Σ_t x_t y_t − Tx̄ȳ) / (Σ_t x_t² − Tx̄²)
(77)
            = Σ_t (x_t − x̄)(y_t − ȳ) / Σ_t (x_t − x̄)².
This expression is identical to the one under (69) which we have derived by the
method of moments. By putting β̂ into the estimating equation for α under
(74), we derive the same estimate α̂ for the intercept parameter as the one to
be found under (69).
It is notable that the equation (75) is the empirical analogue of the equation
(64) which expresses the condition that the prediction error is uncorrelated with
the values of x.
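This orthogonality condition can be checked numerically. The fragment below, which assumes some hypothetical data, fits the regression line and confirms that the residuals e_t = y_t − α̂ − β̂x_t satisfy Σ x_t e_t = 0, and that they also sum to zero.

```python
# The sample analogue of (64): least-squares residuals are orthogonal to x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.5, 5.5, 8.5, 9.0]
T = len(x)
xbar, ybar = sum(x) / T, sum(y) / T

beta_hat = (sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y))
            / sum((xt - xbar) ** 2 for xt in x))
alpha_hat = ybar - beta_hat * xbar

# Residuals and the two zero-sum conditions of equations (74) and (75).
e = [yt - alpha_hat - beta_hat * xt for xt, yt in zip(x, y)]
print(sum(et * xt for et, xt in zip(e, x)))   # zero up to rounding
print(sum(e))                                  # residuals also sum to zero
```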
The method of least squares does not automatically provide an estimate
of σ² = E(ε_t²). To obtain an estimate, we may invoke the method of moments
which, in view of the fact that the regression residuals e_t = y_t − α̂ − β̂x_t represent
estimates of the corresponding values of ε_t, suggests an estimator in the form
of
(78)    σ̃² = (1/T) Σ_t e_t².
In fact, this is a biased estimator with

(79)    E(Tσ̃²) = (T − 2)σ²,

so that an unbiased estimator of σ² is provided by σ̂² = Σ_t e_t²/(T − 2).
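The bias result (79) can be illustrated by a short simulation. The sketch below assumes the hypothetical parameter values α = 1, β = 2 and σ = 1; with T = 10, the average of Tσ̃² over many replications should settle near (T − 2)σ² = 8.

```python
# Monte Carlo illustration of (79): E(T*sigma_tilde^2) = (T - 2)*sigma^2.
import random

random.seed(0)
T, sigma, reps = 10, 1.0, 20000
x = [float(t) for t in range(1, T + 1)]
xbar = sum(x) / T
Sxx = sum((xt - xbar) ** 2 for xt in x)

total = 0.0
for _ in range(reps):
    # Generate y_t = 1 + 2*x_t + eps_t with eps_t ~ N(0, sigma^2).
    y = [1.0 + 2.0 * xt + random.gauss(0.0, sigma) for xt in x]
    ybar = sum(y) / T
    b = sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y)) / Sxx
    a = ybar - b * xbar
    # The residual sum of squares equals T * sigma_tilde^2.
    total += sum((yt - a - b * xt) ** 2 for xt, yt in zip(x, y))

print(total / reps)   # close to (T - 2)*sigma**2 = 8
```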
On dividing the first of these equations by T and rearranging it, we get the
estimating equation for α:
(86)    α(β_1, β_2) = ȳ − x̄_1β_1 − x̄_2β_2,

where x̄_1 = T⁻¹ Σ_t x_t1 and x̄_2 = T⁻¹ Σ_t x_t2. When this is substituted into
the equations (84) and (85), they become
(87)    0 = Σ_t x_t1{(y_t − ȳ) − (x_t1 − x̄_1)β_1 − (x_t2 − x̄_2)β_2},

(88)    0 = Σ_t x_t2{(y_t − ȳ) − (x_t1 − x̄_1)β_1 − (x_t2 − x̄_2)β_2}.
To simplify these equations, let us define the sample moments

(89)    S_11 = (1/T) Σ_{t=1}^T (x_t1 − x̄_1)² = (1/T) Σ_{t=1}^T (x_t1 − x̄_1)x_t1,

(90)    S_22 = (1/T) Σ_{t=1}^T (x_t2 − x̄_2)² = (1/T) Σ_{t=1}^T (x_t2 − x̄_2)x_t2,

(91)    S_12 = (1/T) Σ_{t=1}^T (x_t1 − x̄_1)(x_t2 − x̄_2) = (1/T) Σ_{t=1}^T (x_t1 − x̄_1)x_t2,

(92)    S_1y = (1/T) Σ_{t=1}^T (x_t1 − x̄_1)(y_t − ȳ) = (1/T) Σ_{t=1}^T (x_t1 − x̄_1)y_t,

(93)    S_2y = (1/T) Σ_{t=1}^T (x_t2 − x̄_2)(y_t − ȳ) = (1/T) Σ_{t=1}^T (x_t2 − x̄_2)y_t.
In these terms, the pair of equations under (87) and (88) become

(94)    0 = S_1y − S_11β_1 − S_12β_2,

(95)    0 = S_2y − S_12β_1 − S_22β_2.
Solving this pair of equations gives the estimate β̂_1 of (96) together with the
expression for β̂_2 of (97). The estimate of α, which comes from substituting
β̂_1 and β̂_2 into equation (86), is

         α̂ = ȳ − x̄_1β̂_1 − x̄_2β̂_2.
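The two-regressor estimates can be computed directly from the sample moments of (89)–(93). The fragment below works on hypothetical data; the y-values are constructed exactly as y = 1 + 2x_1 + 0.5x_2, so the estimates should recover α = 1, β_1 = 2 and β_2 = 0.5.

```python
# Two-regressor estimates via the sample moments (89)-(93).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y  = [4.0, 5.5, 9.0, 10.5, 14.0, 15.5]   # y = 1 + 2*x1 + 0.5*x2 exactly
T = len(y)

def mean(z):
    return sum(z) / T

x1bar, x2bar, ybar = mean(x1), mean(x2), mean(y)
S11 = mean([(a - x1bar) ** 2 for a in x1])
S22 = mean([(b - x2bar) ** 2 for b in x2])
S12 = mean([(a - x1bar) * (b - x2bar) for a, b in zip(x1, x2)])
S1y = mean([(a - x1bar) * (c - ybar) for a, c in zip(x1, y)])
S2y = mean([(b - x2bar) * (c - ybar) for b, c in zip(x2, y)])

# Solve  S11*b1 + S12*b2 = S1y  and  S12*b1 + S22*b2 = S2y  by Cramer's rule.
det = S11 * S22 - S12 ** 2
beta1_hat = (S22 * S1y - S12 * S2y) / det
beta2_hat = (S11 * S2y - S12 * S1y) / det
alpha_hat = ybar - x1bar * beta1_hat - x2bar * beta2_hat   # equation (86)
print(alpha_hat, beta1_hat, beta2_hat)
```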
The general multiple regression equation with k explanatory variables is

(102)    y = β_0 + β_1x_1 + · · · + β_kx_k + ε.

The T observations on this relationship can be compiled into the matrix
equation

(104)    y = Xβ + ε.
The criterion of least squares is to minimise the sum of squares of the
disturbances:

         S(β) = ε′ε
              = (y − Xβ)′(y − Xβ)
(105)
              = y′y − y′Xβ − β′X′y + β′X′Xβ
              = y′y − 2y′Xβ + β′X′Xβ.
Differentiating S(β) with respect to β gives

(106)    ∂S/∂β = −2y′X + 2β′X′X.
Setting this derivative to zero and transposing gives the so-called normal
equations

(107)    X′Xβ = X′y.
On the assumption that the inverse matrix exists, the equations have a unique
solution which is the vector of ordinary least-squares estimates:
(108)    β̂ = (X′X)⁻¹X′y.
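The formula (108) is easily computed with numpy. The sketch below uses a hypothetical design matrix whose first column of ones carries the intercept; in practice, it is preferable to solve the normal equations (107) directly rather than to form the inverse of X′X explicitly.

```python
# Ordinary least squares via the normal equations (107)-(108).
import numpy as np

rng = np.random.default_rng(0)
T = 50
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.normal(size=T)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.normal(size=T)

# Solve X'X beta = X'y without forming the explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The same estimates from numpy's least-squares routine.
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```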
Here, [X_1, X_2] = X and [β_1′, β_2′]′ = β are obtained by partitioning the matrix
X and the vector β in a conformable manner. The normal equations of (107) can
be partitioned likewise. Writing the equations without the surrounding matrix
braces gives

(110)    X_1′X_1β_1 + X_1′X_2β_2 = X_1′y,

(111)    X_2′X_1β_1 + X_2′X_2β_2 = X_2′y.
To obtain an expression for β̂2 , we must eliminate β1 from equation (111). For
this purpose, we multiply equation (110) by X_2′X_1(X_1′X_1)⁻¹ to give

(113)    X_2′X_1β_1 + X_2′X_1(X_1′X_1)⁻¹X_1′X_2β_2 = X_2′X_1(X_1′X_1)⁻¹X_1′y.
On defining the projection matrix P_1 = X_1(X_1′X_1)⁻¹X_1′ and taking (113)
from (111), we get X_2′(I − P_1)X_2β_2 = X_2′(I − P_1)y, whence

(117)    β̂_2 = {X_2′(I − P_1)X_2}⁻¹X_2′(I − P_1)y.
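The partitioned formula (117) can be verified numerically: on some hypothetical data, the coefficients that it delivers coincide with the X_2-coefficients from the full regression of y on X = [X_1, X_2].

```python
# Numerical check of the partitioned regression formula (117).
import numpy as np

rng = np.random.default_rng(1)
T = 40
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])  # first block: 2 columns
X2 = rng.normal(size=(T, 2))                            # second block: 2 columns
y = rng.normal(size=T)

P1 = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T               # projector onto the span of X1
M1 = np.eye(T) - P1                                     # I - P1
beta2_hat = np.linalg.solve(X2.T @ M1 @ X2, X2.T @ M1 @ y)   # equation (117)

X = np.hstack([X1, X2])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)           # full regression, (108)
print(beta2_hat, beta_full[2:])                         # the two coincide
```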
The simple regression equation for a single explanatory variable is

(118)    y_t = α + x_tβ + ε_t,    t = 1, . . . , T.

The observations can be gathered into the vectors

         y = [y_1, y_2, . . . , y_T]′,
         x = [x_1, x_2, . . . , x_T]′,
(119)
         ε = [ε_1, ε_2, . . . , ε_T]′,
         i = [1, 1, . . . , 1]′.
Here the vector i = [1, 1, . . . , 1]0 , which consists of T units, is described alter-
natively as the dummy vector or the summation vector.
In terms of the vector notation, the equation of (118) can be written as
(120) y = iα + xβ + ε,
(121)    β̂ = {x′(I − P_i)x}⁻¹x′(I − P_i)y,    with

(122)    P_i = i(i′i)⁻¹i′ = (1/T)ii′.
To understand the effect of the operator Pi in this context, consider the follow-
ing expressions:
         i′y = Σ_{t=1}^T y_t,

(123)    (i′i)⁻¹i′y = (1/T) Σ_{t=1}^T y_t = ȳ,

         P_i y = i(i′i)⁻¹i′y = [ȳ, ȳ, . . . , ȳ]′.
(124)    x′(I − P_i)x = Σ_{t=1}^T x_t(x_t − x̄) = Σ_{t=1}^T (x_t − x̄)x_t = Σ_{t=1}^T (x_t − x̄)².

The final equality depends upon the fact that Σ(x_t − x̄)x̄ = x̄ Σ(x_t − x̄) = 0.
On using the results under (123) and (124) in the equations (121) and
(122), we find that
(125)    α̂ = ȳ − x̄β̂,

(126)    β̂ = Σ_t (x_t − x̄)y_t / Σ_t (x_t − x̄)x_t = Σ_t (x_t − x̄)(y_t − ȳ) / Σ_t (x_t − x̄)².

The latter is the formula that would come from applying least squares directly
to the equation in deviation form, which lacks an intercept term. The estimate
for the intercept term can be recovered from equation (125) once the value of
β̂ is available.
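The effect of the operator P_i, and the agreement between the matrix formula (121) and the moment formula (126), can be illustrated on some hypothetical data:

```python
# The operator Pi = ii'/T of (122) replaces every element of a vector by
# the sample mean, so (I - Pi) forms deviations from the mean.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.5, 5.5, 8.5, 9.0])
T = len(y)
i = np.ones(T)

Pi = np.outer(i, i) / T          # i(i'i)^{-1}i' = ii'/T
M = np.eye(T) - Pi               # (I - Pi)z subtracts the mean from each element

print(Pi @ y)                    # every element equals ybar, as in (123)

beta_matrix = (x @ M @ y) / (x @ M @ x)   # beta_hat of (121)
beta_moment = (((x - x.mean()) @ (y - y.mean()))
               / ((x - x.mean()) @ (x - x.mean())))   # (126)
alpha_hat = y.mean() - x.mean() * beta_matrix         # (125)
print(beta_matrix, beta_moment)   # identical values
```

For these data, β̂ = 1.8 and α̂ = 0.5.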
If we define the matrix X = [x_tj − x̄_j] and the vectors y = [y_t − ȳ] and
ε = [ε_t − ε̄], then we can retain the summary notation y = Xβ + ε, which now
denotes equation (128) instead of equation (103).
As an example of this device, let us consider the equation
(132)    y = x_1β_1 + x_2β_2 + ε,
and that

         β̂_2 = {x_2′(I − P_1)x_2}⁻¹x_2′(I − P_1)y
(135)
             = {S_22 − S_21S_11⁻¹S_12}⁻¹{S_2y − S_21S_11⁻¹S_1y}.
These are the matrix versions of the formulae which have already appeared
under (96) and (97).
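The equivalence asserted by (135) can be confirmed numerically. The sketch below takes hypothetical data in deviation form, as equation (128) requires, and checks that the moment expression equals the second coefficient of the regression of y on x_1 and x_2.

```python
# Numerical check of (135) on data in deviation form.
import numpy as np

rng = np.random.default_rng(2)
T = 30
x1 = rng.normal(size=T); x1 = x1 - x1.mean()   # deviations from the mean
x2 = rng.normal(size=T); x2 = x2 - x2.mean()
y = rng.normal(size=T);  y = y - y.mean()

S11, S22 = (x1 @ x1) / T, (x2 @ x2) / T
S12 = S21 = (x1 @ x2) / T
S1y, S2y = (x1 @ y) / T, (x2 @ y) / T

beta2_moment = (S2y - S21 * S1y / S11) / (S22 - S21 * S12 / S11)   # (135)

X = np.column_stack([x1, x2])
beta = np.linalg.solve(X.T @ X, X.T @ y)   # direct regression of y on [x1, x2]
print(beta2_moment, beta[1])               # the two agree
```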