Econometrics (EM2008) The K-Variable Linear Regression Model
Lecture 2
The k-variable linear regression model
Irene Mammi
outline
I References:
I Johnston, J. and J. DiNardo (1997), Econometric Methods, 4th
Edition, McGraw-Hill, New York, Chapter 3.
the multivariate model
matrix formulation of the k-variable model
I matrices indicated by uppercase bold letters, vectors by lowercase
bold letters
I vectors generally taken as column vectors
I for example,
y = (Y1, Y2, . . . , Yn)'    and    x2 = (X21, X22, . . . , X2n)'
are n × 1 vectors, also referred to as n-vectors, containing the sample
observations on Y and X2
I the n sample observations on the k-variable model can be written as
y = β1 x1 + β2 x2 + · · · + βk xk + u
matrix formulation of the k-variable model (cont.)
y = Xβ + u
where
X = [x1 x2 · · · xk ], the n × k matrix whose t-th row is (1, X2t , . . . , Xkt )
(so x1 is a column of ones), and β = (β1, β2, . . . , βk)'
the algebra of least squares
e = y − Xb

RSS = e'e
= (y − Xb)'(y − Xb)
= y'y − b'X'y − y'Xb + b'X'Xb
= y'y − 2b'X'y + b'X'Xb
the algebra of least squares (cont.)
I the first-order conditions for the minimization are

∂(RSS)/∂b = −2X'y + 2X'Xb = 0

giving the normal equations

(X'X)b = X'y

I substituting y = Xb + e into the right-hand side gives

(X'X)b = X'(Xb + e) = (X'X)b + X'e

thus

X'e = 0

which is another fundamental least-squares result
I the first element in this equation gives ∑ et = 0, that is,
ē = Ȳ − b1 − b2 X̄2 − · · · − bk X̄k = 0
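I a minimal numerical sketch of the normal equations and the X'e = 0 result (my own illustration, not part of the notes; Python/NumPy, simulated data, illustrative names):

import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3                                  # sample size, number of regressors (incl. intercept)
X = np.column_stack([np.ones(n),              # column of ones for the intercept
                     rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=0.5, size=n)

# normal equations (X'X) b = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b                                 # least-squares residuals

print(np.allclose(X.T @ e, 0))                # X'e = 0
print(np.isclose(e.mean(), 0))                # residuals have zero mean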
the algebra of least squares (cont.)
I ⇒ the residuals have zero mean, and the regression plane passes
through the point of means in k-dimensional space
I the remaining elements are of the form
∑t Xit et = 0,    i = 2, . . . , k
which implies that each regressor has zero sample correlation with the
residuals
I this, in turn, implies that ŷ (= Xb), the vector of the regression values
for Y , is uncorrelated with e, for

ŷ'e = (Xb)'e = b'X'e = 0
the algebra of least squares (cont.)
I in the two-variable case (k = 2), X has just a column of ones and the
column of observations on X, so that

X'y = (∑ Y , ∑ XY )'

and the normal equations (X'X)b = X'y become

nb1 + b2 ∑ X = ∑ Y
b1 ∑ X + b2 ∑ X² = ∑ XY
the algebra of least squares (cont.)
I in a similar way, it may be shown that the normal equations for fitting
a three-variable equation by least squares are
nb1 + b2 ∑ X2 + b3 ∑ X3 = ∑ Y
b1 ∑ X2 + b2 ∑ X2² + b3 ∑ X2 X3 = ∑ X2 Y
b1 ∑ X3 + b2 ∑ X2 X3 + b3 ∑ X3² = ∑ X3 Y
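I a small numerical check (my own sketch, simulated data): the sum-based normal equations above give the same coefficients as the matrix solution b = (X'X)−1X'y:

import numpy as np

rng = np.random.default_rng(1)
n = 40
X2, X3 = rng.normal(size=n), rng.normal(size=n)
Y = 1.0 + 0.8 * X2 - 1.2 * X3 + rng.normal(scale=0.3, size=n)

# left-hand and right-hand sides of the three normal equations
A = np.array([[n,         X2.sum(),       X3.sum()],
              [X2.sum(),  (X2**2).sum(),  (X2*X3).sum()],
              [X3.sum(),  (X2*X3).sum(),  (X3**2).sum()]])
c = np.array([Y.sum(), (X2*Y).sum(), (X3*Y).sum()])

b = np.linalg.solve(A, c)                     # (b1, b2, b3)

# same answer as the matrix formulation
X = np.column_stack([np.ones(n), X2, X3])
print(np.allclose(b, np.linalg.solve(X.T @ X, X.T @ Y)))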
decomposition of the sum of squares
I the zero covariances between regressors and the residuals underlie the
decomposition of the sum of squares
I decomposing the y vector into the part explained by the regression
and the unexplained part, we have
y = ŷ + e = Xb + e
y'y = (ŷ + e)'(ŷ + e) = ŷ'ŷ + e'e = b'X'Xb + e'e
decomposition of the sum of squares (cont.)
TSS = ESS + RSS

where TSS indicates the total sum of squares in Y , and ESS and
RSS the explained and residual (unexplained) sums of squares
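I a quick numerical check of the decomposition (my own sketch, simulated data):

import numpy as np

rng = np.random.default_rng(2)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
yhat, e = X @ b, y - X @ b

# raw decomposition: y'y = yhat'yhat + e'e (uses yhat'e = 0)
print(np.isclose(y @ y, yhat @ yhat + e @ e))

# in deviation form: TSS = ESS + RSS
TSS = ((y - y.mean())**2).sum()
ESS = ((yhat - y.mean())**2).sum()            # yhat has the same mean as y when an intercept is included
RSS = e @ e
print(np.isclose(TSS, ESS + RSS))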
equation in deviation form
I alternatively, express all the data in the form of deviations from the
sample mean
I from the first normal equation, the least-squares intercept is
b1 = Ȳ − b2 X̄2 − · · · − bk X̄k
equation in deviation form (cont.)
I nb: the least-squares slope coefficients b2 , . . . , bk are identical in both
forms of the regression equation, and so are the residuals
I collecting all n observations, the deviation form of the equation may
be written compactly using the transformation matrix

A = In − (1/n) ii'

where i is the n × 1 vector of ones (premultiplication by A expresses a
vector as deviations from its sample mean)
X∗'y∗ = (X∗'X∗)b2

which are the familiar normal equations, except that now the data
have all been expressed in deviation form (y∗ = Ay, and X∗ contains the
demeaned regressors x2 , . . . , xk ) and the b2 vector contains the k − 1
slope coefficients and excludes the intercept term
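I a sketch of the deviation-form calculation (my own illustration, simulated data), using A = In − (1/n)ii' to demean the data and recovering the same slopes as the full regression:

import numpy as np

rng = np.random.default_rng(3)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept + two regressors
y = X @ np.array([2.0, 1.5, -0.7]) + rng.normal(size=n)

A = np.eye(n) - np.ones((n, n)) / n           # A = I - (1/n) i i', puts data in deviation form
Xs, ys = A @ X[:, 1:], A @ y                  # X*, y*: demeaned regressors (no intercept) and demeaned y

b2 = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)    # slopes from the deviation-form normal equations
b  = np.linalg.solve(X.T @ X, X.T @ y)        # full regression with intercept

print(np.allclose(b2, b[1:]))                 # slopes coincide
print(np.isclose(b[0], y.mean() - X[:, 1:].mean(axis=0) @ b2))   # b1 = Ybar - b2*X2bar - ...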
equation in deviation form (cont.)
I the decomposition of the sum of squares may be expressed as
y∗'y∗ = b2'X∗'X∗b2 + e'e
TSS = ESS + RSS
I the coefficient of determination is R² = ESS/TSS, and the adjusted R² is

R̄² = 1 − [RSS/(n − k)] / [TSS/(n − 1)]
I the numerator and the denominator on the RHS are unbiased
estimators of the disturbance variance and the variance of Y
equation in deviation form (cont.)
I the relation between the adjusted and unadjusted coefficients is
R̄² = 1 − (1 − R²)(n − 1)/(n − k)
= (1 − k)/(n − k) + [(n − 1)/(n − k)] R²
I two alternative criteria for comparing the fit of specifications are the
Schwarz criterion

SC = ln(e'e/n) + (k/n) ln n

and the Akaike information criterion

AIC = ln(e'e/n) + 2k/n
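I the sketch below (my own, simulated data) computes R², R̄², SC and AIC for a fitted equation and checks the identity relating R̄² to R²:

import numpy as np

rng = np.random.default_rng(4)
n, k = 80, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, 0.0, -1.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
RSS = e @ e
TSS = ((y - y.mean())**2).sum()

R2    = 1 - RSS / TSS
R2bar = 1 - (RSS / (n - k)) / (TSS / (n - 1))
SC    = np.log(RSS / n) + (k / n) * np.log(n)     # Schwarz criterion
AIC   = np.log(RSS / n) + 2 * k / n               # Akaike information criterion

# the identity R2bar = 1 - (1 - R2)(n - 1)/(n - k)
print(np.isclose(R2bar, 1 - (1 - R2) * (n - 1) / (n - k)))
print(R2, R2bar, SC, AIC)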
generalizing partial correlation
I the normal equations solve for b = (X'X)−1X'y
I the residuals from the LS regression may be expressed as

e = y − Xb = y − X(X'X)−1X'y = My

where

M = I − X(X'X)−1X'
I M is a symmetric, idempotent matrix; it also has the properties that
MX = 0 and Me = e
I now write the general regression in partitioned form as
y = x2 b2 + X∗ b(2) + e

where X = [x2 X∗] and the coefficient vector is partitioned conformably
into the scalar b2 and the vector b(2)
generalizing partial correlation (cont.)
I the normal equations for this setup are
x2'x2 b2 + x2'X∗ b(2) = x2'y
X∗'x2 b2 + X∗'X∗ b(2) = X∗'y

and solving them for b2 gives

b2 = (x2'M∗x2)−1(x2'M∗y)

where

M∗ = I − X∗(X∗'X∗)−1X∗'

M∗ is a symmetric, idempotent matrix with the properties
M∗X∗ = 0 and M∗e = e
I note that M∗y is the vector of residuals from the regression of y on X∗ ,
and M∗x2 is the vector of residuals from the regression of x2 on X∗
generalizing partial correlation (cont.)
I regressing the first vector on the second gives a slope coefficient,
which, using the symmetry and idempotency of M∗ , gives the b2
coefficient defined above
I a simpler way to prove the same result is as follows: write the
partitioned regression as
y = x2 b2 + X∗ b(2) + e
I premultiplying by M∗ (and using M∗X∗ = 0, M∗e = e), obtain

M∗y = (M∗x2 )b2 + e

then premultiplying by x2' and using x2'e = 0 gives

x2'M∗y = (x2'M∗x2 )b2
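I a numerical illustration of this partialling-out (Frisch-Waugh) result (my own sketch, simulated data): regressing M∗y on M∗x2 reproduces the coefficient on x2 from the full regression:

import numpy as np

rng = np.random.default_rng(5)
n = 100
x2 = rng.normal(size=n)
Xstar = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # remaining regressors (incl. intercept)
X = np.column_stack([x2, Xstar])
y = X @ np.array([0.8, 1.0, -0.5, 0.3]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)          # full regression; b[0] is the coefficient on x2

Mstar = np.eye(n) - Xstar @ np.linalg.solve(Xstar.T @ Xstar, Xstar.T)   # M* = I - X*(X*'X*)^{-1}X*'
x2_r, y_r = Mstar @ x2, Mstar @ y              # residuals of x2 and y after regressing each on X*

b2 = (x2_r @ y_r) / (x2_r @ x2_r)              # slope from regressing M*y on M*x2
print(np.isclose(b2, b[0]))                    # equals the coefficient on x2 in the full regression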
inference in the k-variables equation
assumptions
E(u) = 0    and    var(u) = E(uu') = σ²I
inference in the k-variables equation (cont.)
E(uu') is the n × n matrix with (i, j) element E(ui uj ): its diagonal
elements are E(ui²) = var(ui ) and its off-diagonal elements are
E(ui uj ) = cov(ui , uj ); under the assumptions var(ui ) = σ² and
cov(ui , uj ) = 0 for all i ≠ j, so that

E(uu') = σ²I
inference in the k-variables equation (cont.)
Mean and Variance of b
b = (X'X)−1X'y
b = (X'X)−1X'(Xβ + u) = β + (X'X)−1X'u

from which

b − β = (X'X)−1X'u
I take expectations (moving the expectation operator to the right past
non-stochastic terms such as X)
E(b − β) = (X'X)−1X'E(u) = 0
giving
E(b ) = β
inference in the k-variables equation (cont.)
I under the assumptions of the model, the LS estimators are
unbiased estimators of the β parameters
I to obtain the variance-covariance matrix of the LS estimators,
consider
var(b) = E[(b − β)(b − β)']

and substituting for b − β gives

var(b) = E[(X'X)−1X'uu'X(X'X)−1] = (X'X)−1X'E(uu')X(X'X)−1

thus

var(b) = σ²(X'X)−1
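I a Monte Carlo sketch (my own illustration, simulated data) of the two results E(b) = β and var(b) = σ²(X'X)−1, holding X fixed across replications:

import numpy as np

rng = np.random.default_rng(6)
n, k, sigma, reps = 40, 3, 1.0, 5000
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # fixed (non-stochastic) regressors
beta = np.array([1.0, 2.0, -1.0])
XtX_inv = np.linalg.inv(X.T @ X)

draws = np.empty((reps, k))
for r in range(reps):
    u = rng.normal(scale=sigma, size=n)
    y = X @ beta + u
    draws[r] = XtX_inv @ X.T @ y               # b = (X'X)^{-1} X'y

print(draws.mean(axis=0))                      # close to beta: E(b) = beta
print(np.cov(draws, rowvar=False))             # close to sigma^2 (X'X)^{-1}
print(sigma**2 * XtX_inv)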
inference in the k-variables equation (cont.)
Estimation of σ²
I since e = My = M(Xβ + u) = Mu (as MX = 0), the residual sum of
squares is e'e = u'M'Mu = u'Mu
I exploiting the fact that the trace of a scalar is the scalar, write

E(u'Mu) = E[tr(u'Mu)]
= E[tr(uu'M)]
= σ² tr(M)
= σ² tr(In ) − σ² tr[X(X'X)−1X']
= σ² tr(In ) − σ² tr[(X'X)−1(X'X)]
= σ²(n − k)
inference in the k-variables equation (cont.)
I thus
s² = e'e/(n − k)
defines an unbiased estimator of σ2
I the square root s is the standard deviation of the Y values about the
regression plane; it is referred to as the standard error of estimate or
the standard error of the regression (SER)
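I a short sketch (my own, simulated data) computing s, the estimated variance matrix s²(X'X)−1 and the coefficient standard errors:

import numpy as np

rng = np.random.default_rng(7)
n, k = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([0.5, 1.0, -0.3]) + rng.normal(scale=2.0, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

s2 = (e @ e) / (n - k)                         # unbiased estimator of sigma^2
var_b = s2 * np.linalg.inv(X.T @ X)            # estimated var(b) = s^2 (X'X)^{-1}
se_b = np.sqrt(np.diag(var_b))                 # standard errors of the coefficients

print(np.sqrt(s2))                             # standard error of the regression (SER)
print(np.column_stack([b, se_b]))              # coefficients and their standard errors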
inference in the k-variables equation (cont.)
Gauss-Markov theorem
I under the assumptions of the model, the least-squares estimator b is the
best linear unbiased estimator (BLUE) of β: any other estimator of β that
is linear in y and unbiased has a variance matrix exceeding var(b) by a
positive semidefinite matrix
testing linear hypotheses about β
testing linear hypotheses about β (cont.)
I these hypotheses (for example H0 : βi = 0, H0 : βi = βi0 , H0 : β2 + β3 = 1,
or H0 : β2 = β3 = · · · = βk = 0) all fit into the general linear framework

Rβ = r

where R is a q × k matrix of known constants and r a known q × 1
vector, q being the number of restrictions
testing linear hypotheses about β (cont.)
I we now derive a general testing procedure for the general linear
hypothesis
H0 : R β − r = 0
I given the LS estimator, we can compute the vector (Rb − r ), which
measures the discrepancy between expectation and observation
I if this vector is “large”, it casts doubt on the null hypothesis
I the distinction between “large” and “small” is determined from the
sampling distribution under the null, in this case, the distribution of
Rb when R β = r
I from the unbiasedness result, it follows that
E(Rb ) = R β
I if we now assume that the disturbances are normally distributed,

u ∼ N(0, σ²I)
I it follows that
b ∼ N[β, σ²(X'X)−1]

then

Rb ∼ N[Rβ, σ²R(X'X)−1R']

and so

R(b − β) ∼ N[0, σ²R(X'X)−1R']
I if the null hypothesis Rβ = r is true, then Rβ may be replaced by r, so
that (Rb − r) ∼ N[0, σ²R(X'X)−1R']
testing linear hypotheses about β (cont.)
I going back to the previous examples. . .
(i) H0 : βi = 0: Rb picks out bi and R(X'X)−1R' picks out cii , the i-th
diagonal element in (X'X)−1 . Thus we have

F = bi² / (s²cii ) = bi² / var(bi ) ∼ F(1, n − k)
(ii) H0 : βi = βi0 : by the same argument the test statistic is

t = (bi − βi0 ) / s.e.(bi ) ∼ t(n − k)

and a 95 percent confidence interval for βi is bi ± t0.025 s.e.(bi )
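I an illustrative calculation of the t test and confidence interval (my own sketch; simulated data, and it assumes SciPy is available for the t critical value):

import numpy as np
from scipy import stats                        # assumed available, for t critical values

rng = np.random.default_rng(8)
n, k = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = (e @ e) / (n - k)
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

i, beta_i0 = 1, 0.0                            # test H0: beta_i = beta_i0 for the second coefficient
t_stat = (b[i] - beta_i0) / se[i]
t_crit = stats.t.ppf(0.975, n - k)             # t_0.025 with n - k degrees of freedom

print(t_stat, abs(t_stat) > t_crit)            # reject H0 at the 5% level?
print((b[i] - t_crit * se[i], b[i] + t_crit * se[i]))   # 95% confidence interval for beta_i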
testing linear hypotheses about β (cont.)
(iii) H0 : β2 + β3 = 1: Rb gives the sum of the two estimated coefficients,
b2 + b3 . Premultiplying (X'X)−1 by R gives a row vector whose
elements are the sums of the corresponding elements in the second and
third rows of (X'X)−1 . Forming the inner product with R' gives the
sum of the second and third elements of that row vector, that is,
c22 + 2c23 + c33 , noting that c23 = c32 . Thus

t = (b2 + b3 − 1) / √[s²(c22 + 2c23 + c33 )] ∼ t(n − k)

I similarly, for H0 : β3 = β4 the test statistic is

t = (b3 − b4 ) / √var(b3 − b4 ) ∼ t(n − k)

I for the joint hypothesis H0 : β2 = β3 = · · · = βk = 0 (all slope
coefficients zero), the test statistic is

F = [ESS/(k − 1)] / [RSS/(n − k)] ∼ F(k − 1, n − k)

or, equivalently, in terms of R²,

F = [R²/(k − 1)] / [(1 − R²)/(n − k)] ∼ F(k − 1, n − k)
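I a sketch of the joint F test for all slope coefficients (my own illustration; simulated data, SciPy assumed available for the p-value), checking that the two expressions for F coincide:

import numpy as np
from scipy import stats                        # assumed available, for the F distribution

rng = np.random.default_rng(9)
n, k = 80, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.4, -0.6, 0.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
RSS = e @ e
TSS = ((y - y.mean())**2).sum()
ESS = TSS - RSS
R2 = 1 - RSS / TSS

F1 = (ESS / (k - 1)) / (RSS / (n - k))         # F for H0: beta_2 = ... = beta_k = 0
F2 = (R2 / (k - 1)) / ((1 - R2) / (n - k))     # equivalent expression in terms of R^2
print(np.isclose(F1, F2))
print(F1, stats.f.sf(F1, k - 1, n - k))        # F statistic and its p-value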
testing linear hypotheses about β (cont.)
I for the hypothesis that a specified subset of k2 coefficients is zero, with
e∗ the residual vector from the restricted regression that omits the
corresponding k2 regressors, the test statistic is

F = [(e∗'e∗ − e'e)/k2 ] / [e'e/(n − k)] ∼ F(k2 , n − k)
restricted and unrestricted regressions
fitting the restricted regressions
I question: how to fit the restricted regression?
I answer: 1) either work out each specific case from first principles; 2)
or derive a general formula into which specific cases can be fitted
I (1) as for the first approach, consider example (iii) with the regression
in deviation form,
y = b 2 x2 + b 3 x3 + e
I we want to impose the restriction that b2 + b3 = 1; substituting the
restriction in the regression gives

y = b2 x2 + (1 − b2 )x3 + e∗    or
(y − x3 ) = b2 (x2 − x3 ) + e∗

so we form two new variables, (y − x3 ) and (x2 − x3 ): the simple
regression of the first on the second (without a constant) gives the
restricted estimate of b2 , and the RSS from this regression is the
restricted RSS, e∗'e∗
fitting the restricted regressions (cont.)
I (2) the general approach requires a b∗ vector that minimizes the RSS
subject to the restrictions Rb∗ = r . To do so, set up the function

φ = (y − Xb∗ )'(y − Xb∗ ) − 2λ'(Rb∗ − r )
fitting the restricted regressions (cont.)
I the residuals from the restricted regression are
e ∗ = y − Xb ∗
= y − Xb − X (b ∗ − b )
= e − X (b ∗ − b )
I transposing and multiplying (the cross-product terms vanish because
X'e = 0), we obtain

e∗'e∗ = e'e + (b∗ − b)'X'X(b∗ − b)
I the process of substituting for (b ∗ − b ) and simplifying gives
e∗'e∗ − e'e = (r − Rb)'[R(X'X)−1R']−1(r − Rb)

where, apart from the divisor q, the expression on the RHS is the same
as the numerator in the F statistic
I thus an alternative expression of the test statistic for H0 : Rβ = r is

F = [(e∗'e∗ − e'e)/q ] / [e'e/(n − k)] ∼ F(q, n − k)
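I a numerical sketch (my own; simulated data) of the restricted/unrestricted comparison, adapting example (iii) with an intercept included: the restricted fit uses the substitution method described above, and the increase in RSS is checked against the quadratic form before forming F:

import numpy as np

rng = np.random.default_rng(10)
n, k = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 0.7, 0.3]) + rng.normal(size=n)   # true slopes satisfy beta2 + beta3 = 1

# unrestricted fit
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b

# restricted fit for H0: beta2 + beta3 = 1, by substitution:
# regress (y - x3) on a constant and (x2 - x3)
y_r = y - X[:, 2]
Z = np.column_stack([np.ones(n), X[:, 1] - X[:, 2]])
c = np.linalg.solve(Z.T @ Z, Z.T @ y_r)
b_star = np.array([c[0], c[1], 1 - c[1]])      # restricted coefficient vector
e_star = y - X @ b_star

# the increase in RSS equals the quadratic form (r - Rb)'[R(X'X)^{-1}R']^{-1}(r - Rb)
R, r, q = np.array([[0.0, 1.0, 1.0]]), np.array([1.0]), 1
quad = (r - R @ b) @ np.linalg.solve(R @ XtX_inv @ R.T, r - R @ b)
print(np.isclose(e_star @ e_star - e @ e, quad))

F = ((e_star @ e_star - e @ e) / q) / ((e @ e) / (n - k))
print(F)                                       # compare with F(q, n - k) critical values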
prediction
I to predict Y when the regressors take the values X2f , . . . , Xkf , collect
them in the vector

c' = (1, X2f , . . . , Xkf )

I the point prediction is then Ŷf = c'b, with

var(c'b) = c' var(b) c = σ²c'(X'X)−1c
prediction (cont.)
I if we assume normality for the error term, it follows that
(c'b − c'β) / √var(c'b) ∼ N(0, 1)
and, replacing σ² by its estimate s²,

(Ŷf − E(Yf )) / [s √(c'(X'X)−1c)] ∼ t(n − k)
I to predict the actual value Yf = c'β + uf , note that the prediction error is
ef = Yf − Ŷf = uf − c'(b − β)
I squaring both sides and taking expectations gives the variance of the
prediction error
var(ef ) = σ² + c' var(b) c = σ²(1 + c'(X'X)−1c)
so that, again replacing σ² by s²,

(Ŷf − Yf ) / [s √(1 + c'(X'X)−1c)] ∼ t(n − k)
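I a sketch of the prediction calculations (my own illustration; simulated data, SciPy assumed available for the t critical value), giving the point prediction and 95% intervals for E(Yf ) and for Yf :

import numpy as np
from scipy import stats                        # assumed available, for t critical values

rng = np.random.default_rng(11)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.8, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s = np.sqrt((e @ e) / (n - k))

c = np.array([1.0, 0.2, -1.0])                 # regressor values (1, X2f, ..., Xkf) for the forecast period
y_hat_f = c @ b                                # point prediction c'b
se_mean = s * np.sqrt(c @ XtX_inv @ c)         # s.e. of c'b, for inference about E(Yf)
se_pred = s * np.sqrt(1 + c @ XtX_inv @ c)     # s.e. of the prediction error for the actual Yf

t_crit = stats.t.ppf(0.975, n - k)
print(y_hat_f)
print((y_hat_f - t_crit * se_mean, y_hat_f + t_crit * se_mean))   # 95% interval for E(Yf)
print((y_hat_f - t_crit * se_pred, y_hat_f + t_crit * se_pred))   # 95% interval for Yf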