1 Regression Analysis and Least Squares Estimators
Abstract
We consider the regression model: $Y_i = X_i'\beta + u_i$, $i = 1, \dots, n$. This note summarizes the results for the asymptotic analysis of the least squares estimator that make it possible to test: (i) hypotheses about the individual coefficients of $\beta$; and (ii) hypotheses that involve several coefficients, such as $R\beta = r$, where $R$ is a known matrix and $r$ is a known vector.
In matrix form the model is
$$Y = X\beta + U, \tag{1}$$
with $\beta = (\beta_0\ \beta_1\ \cdots\ \beta_k)'$, $X_i = (1\ x_{1i}\ \cdots\ x_{ki})'$, $Y = (Y_1\ \cdots\ Y_n)'$, and $X = (X_1\ \cdots\ X_n)'$. The least squares (LS) estimator is given as the solution to
$$\min_{\beta} S(\beta) = \min_{\beta} \sum_{i=1}^{n} u_i^2 = \min_{\beta} \sum_{i=1}^{n} \left(Y_i - X_i'\beta\right)^2.$$
The least squares estimator is obtained by solving the first order conditions for a minimum:
$$\frac{\partial S(\beta)}{\partial \beta} =
\begin{pmatrix} \partial S(\beta)/\partial \beta_0 \\ \partial S(\beta)/\partial \beta_1 \\ \vdots \\ \partial S(\beta)/\partial \beta_k \end{pmatrix} =
\begin{pmatrix} -2\sum_{i=1}^{n} \left(Y_i - X_i'\beta\right) \\ -2\sum_{i=1}^{n} x_{1i}\left(Y_i - X_i'\beta\right) \\ \vdots \\ -2\sum_{i=1}^{n} x_{ki}\left(Y_i - X_i'\beta\right) \end{pmatrix} =
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \tag{2}$$
These are equivalent to the normal equations
$$\sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} X_i'\hat\beta, \qquad \sum_{i=1}^{n} x_{1i} Y_i = \sum_{i=1}^{n} x_{1i} X_i'\hat\beta, \qquad \dots, \qquad \sum_{i=1}^{n} x_{ki} Y_i = \sum_{i=1}^{n} x_{ki} X_i'\hat\beta,$$
whose solution is
$$\hat\beta = \left(\sum_{i=1}^{n} X_i X_i'\right)^{-1} \sum_{i=1}^{n} X_i Y_i = (X'X)^{-1} X'Y. \tag{3}$$
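For concreteness, here is a minimal sketch of how $\hat\beta$ in (3) might be computed with NumPy (the data-generating step and all names are illustrative, not part of the note):

```python
import numpy as np

def ols(X, Y):
    """Least squares estimator beta_hat = (X'X)^{-1} X'Y, cf. equation (3)."""
    # Solving the normal equations (X'X) b = X'Y is numerically more stable
    # than forming the inverse of X'X explicitly.
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Illustrative usage with simulated data:
rng = np.random.default_rng(0)
n, k = 500, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # leading column of ones
beta = np.array([1.0, 2.0, -0.5])
Y = X @ beta + rng.normal(size=n)
print(ols(X, Y))  # close to [1.0, 2.0, -0.5]
```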
Substituting $Y = X\beta + U$ gives
$$\hat\beta = (X'X)^{-1} X'(X\beta + U) = \beta + (X'X)^{-1} X'U,$$
such that
$$\hat\beta - \beta = \left(\sum_{i=1}^{n} X_i X_i'\right)^{-1} \sum_{i=1}^{n} X_i u_i = (X'X)^{-1} X'U. \tag{4}$$
Assumption A1: $E(U\,|\,X) = 0$.

The expectation of $\hat\beta$ is then
$$E(\hat\beta) = \beta + E\left[(X'X)^{-1} X' E(U\,|\,X)\right] = \beta, \tag{5}$$
by the law of iterated expectations, so the least squares estimator is unbiased under A1.
We want to use the true finite sample distribution of $\hat\beta$. Since we cannot always obtain this, we resort to the asymptotic distribution, which is likely to be a good approximation when $n$ is large.
2 Asymptotic Analysis of $\hat\beta$
We have already used A4 when we implicitly assumed that the inverse of $X'X$ was well defined. The two other assumptions will be used in the asymptotic analysis.
You are familiar with the law of large numbers and the central limit theorem for scalar variables. The results that concern vectors and matrices are straightforward extensions.
Theorem 1 (Multivariate LLN) Let $\{M_i\}$, $i = 1, 2, \dots$ be a sequence of matrices whose elements are iid random variables. Then it holds that $\frac{1}{n}\sum_{i=1}^{n} M_i \xrightarrow{p} E(M_i)$.
So the univariate LLN is just a special case of the multivariate version. The same is
true for the multivariate CLT.
Theorem 2 (Multivariate CLT) Let $\{V_i\}$ be a sequence of $m$-dimensional random vectors that are iid, with mean $\mu_V = E(V_i)$ and covariance matrix $\Sigma_V = \operatorname{var}(V_i) = E[(V_i - \mu_V)(V_i - \mu_V)']$. Then it holds that $\frac{1}{\sqrt{n}}\sum_{i=1}^{n} (V_i - \mu_V) \xrightarrow{d} N_m(0, \Sigma_V)$.
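As an aside, the multivariate CLT is easy to visualize in a small simulation; the following sketch (my own illustration, with arbitrary choices of $\mu_V$ and $\Sigma_V$) checks that the scaled sums have approximately the stated covariance:

```python
import numpy as np

# Monte Carlo check of Theorem 2: for iid V_i with mean mu and covariance
# Sigma, n^{-1/2} * sum_i (V_i - mu) should be approximately N_m(0, Sigma).
rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
L = np.linalg.cholesky(Sigma)  # so that L @ L.T = Sigma
n, reps = 1_000, 5_000

draws = np.empty((reps, 2))
for r in range(reps):
    V = mu + rng.normal(size=(n, 2)) @ L.T        # n iid N(mu, Sigma) vectors
    draws[r] = np.sqrt(n) * (V.mean(axis=0) - mu) # CLT-scaled sample mean
print(np.cov(draws, rowvar=False))  # approximately Sigma
```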
Since $\{X_i\}$ is a sequence of iid random variables (vectors), it follows that $\{X_i X_i'\}$ is a sequence of iid random variables (matrices). So by the multivariate LLN it holds that $\frac{1}{n}\sum_{i=1}^{n} X_i X_i' \xrightarrow{p} Q_X \equiv E(X_i X_i')$. Similarly, $\{X_i u_i\}$ is a sequence of iid random variables (vectors), with expected value $E[X_i u_i] = E[E(X_i u_i\,|\,X_i)] = E[X_i E(u_i\,|\,X_i)] = E[X_i \cdot 0] = 0$. So by the multivariate central limit theorem we have that $\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i u_i \xrightarrow{d} N_{k+1}(0, \Sigma_v)$, where $\Sigma_v \equiv \operatorname{var}(X_i u_i) = E(X_i X_i' u_i^2)$. Here we are implicitly using Assumption A3, which guarantees that the expected values $E(X_i X_i')$ and $E(X_i X_i' u_i^2)$ are well defined (finite).
In the present context, we can set $A_n = \left(\frac{1}{n}\sum_{i=1}^{n} X_i X_i'\right)^{-1}$, $b_n = 0$, and $Z_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i u_i$, and from the theorem it follows that
$$\sqrt{n}\left(\hat\beta - \beta\right) = \underbrace{\left(\frac{1}{n}\sum_{i=1}^{n} X_i X_i'\right)^{-1}}_{\xrightarrow{p}\ Q_X^{-1}} \underbrace{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i u_i}_{\xrightarrow{d}\ N_{k+1}(0,\,\Sigma_v)} \xrightarrow{d} N_{k+1}\left(0,\ Q_X^{-1}\Sigma_v Q_X^{-1}\right).$$
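This limiting result can also be checked by simulation. The sketch below (my own illustration) assumes a single regressor with $E(x_i) = 0$, $\operatorname{var}(x_i) = 1$ and homoskedastic unit-variance errors, so that $Q_X = I_2$, $\Sigma_v = Q_X$, and the limiting covariance $Q_X^{-1}\Sigma_v Q_X^{-1}$ is the identity:

```python
import numpy as np

# Sampling distribution of sqrt(n)*(beta_hat - beta) under the stated design.
rng = np.random.default_rng(2)
beta = np.array([1.0, 2.0])
n, reps = 400, 4_000

devs = np.empty((reps, 2))
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    u = rng.normal(size=n)  # homoskedastic errors with variance 1
    b = np.linalg.solve(X.T @ X, X.T @ (X @ beta + u))
    devs[r] = np.sqrt(n) * (b - beta)
print(np.cov(devs, rowvar=False))  # approximately the 2x2 identity matrix
```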
The covariance matrix of $\sqrt{n}(\hat\beta - \beta)$, $V \equiv Q_X^{-1}\Sigma_v Q_X^{-1}$, can be estimated by $\hat V \equiv \hat Q_X^{-1}\hat\Sigma_v\hat Q_X^{-1}$, where
$$\hat Q_X = \frac{1}{n}\sum_{i=1}^{n} X_i X_i' \qquad\text{and}\qquad \hat\Sigma_v = \frac{1}{n-k-1}\sum_{i=1}^{n} X_i X_i'\,\hat u_i^2.$$
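In code, $\hat Q_X$, $\hat\Sigma_v$ and the resulting sandwich estimator $\hat V$ might be computed as follows (a sketch; the function and variable names are illustrative):

```python
import numpy as np

def robust_avar(X, Y):
    """Estimate V = Q_X^{-1} Sigma_v Q_X^{-1}, the asymptotic covariance of
    sqrt(n)*(beta_hat - beta), with the degrees-of-freedom adjusted
    Sigma_v_hat = (n-k-1)^{-1} * sum_i X_i X_i' u_hat_i^2."""
    n, kp1 = X.shape                        # kp1 = k + 1 columns
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    u_hat = Y - X @ beta_hat                # residuals
    Q_hat = (X.T @ X) / n
    Sigma_hat = (X.T * u_hat**2) @ X / (n - kp1)  # sum_i X_i X_i' u_hat_i^2
    Q_inv = np.linalg.inv(Q_hat)
    return beta_hat, Q_inv @ Sigma_hat @ Q_inv    # (beta_hat, V_hat)
```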
How can we be sure that $\hat V$ is consistent for $V$? We have already established that $\hat Q_X \xrightarrow{p} Q_X$, and since $\{X_i, Y_i\}$ is iid (Assumption A2), also $\{X_i X_i'(Y_i - X_i'\beta)^2\} = \{X_i X_i' u_i^2\}$ is iid, such that a LLN gives us $\frac{1}{n}\sum_{i=1}^{n} X_i X_i' u_i^2 \xrightarrow{p} E(X_i X_i' u_i^2)$. To establish that $\hat\Sigma_v \xrightarrow{p} \Sigma_v$, we first note that
$$\frac{1}{n}\sum_{i=1}^{n} X_i X_i'\,\hat u_i^2 = \frac{1}{n}\sum_{i=1}^{n} X_i X_i' u_i^2 + \frac{1}{n}\sum_{i=1}^{n} X_i X_i'\left(\hat u_i^2 - u_i^2\right),$$
where the second term vanishes in probability because $\hat u_i^2 - u_i^2$ depends on $\hat\beta - \beta \xrightarrow{p} 0$. It follows that
$$\hat V = \hat Q_X^{-1}\hat\Sigma_v\hat Q_X^{-1} \xrightarrow{p} Q_X^{-1}\Sigma_v Q_X^{-1} = V,$$
which is what we wanted to show.
Multiplying $\sqrt{n}(\hat\beta - \beta)$ by $\frac{1}{\sqrt{n}}$ and adding $\beta$ shows that, asymptotically, $\hat\beta$ is normally distributed about $\beta$, with a covariance matrix that is given by $\frac{1}{n}V$. In practice we will use the estimate $\frac{1}{n}\hat V$.
2.1 Testing a Hypothesis about a Single Coefficient
Suppose we want to test the hypothesis $H_0\colon \beta_j = c$ about a single coefficient. Let $d$ be the $(k+1) \times 1$ selection vector with a one in the position of $\beta_j$ and zeros elsewhere, so that $d'\beta = \beta_j$. Then
$$d'\sqrt{n}(\hat\beta - \beta) = \sqrt{n}(d'\hat\beta - d'\beta) = \sqrt{n}(\hat\beta_j - \beta_j),$$
and by Theorem 4 it follows that $\sqrt{n}(\hat\beta_j - \beta_j) = d'\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} N_1(0, d'Vd)$. So for large $n$ it holds that
$$\hat\beta_j \overset{a}{\sim} N_1\left(\beta_j,\ d'\tfrac{1}{n}\hat V d\right),$$
and under $H_0$ we can use the t-statistic
$$t_{\beta_j = c} = \frac{\hat\beta_j - c}{\sqrt{d'\tfrac{1}{n}\hat V d}},$$
which, for large $n$, is approximately distributed as a standard normal, $N(0,1)$. (For moderate values of $n$, it is typically better to use the t-distribution with $n-k-1$ degrees of freedom.)
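Continuing the illustrative sketch from above, the t-statistic is one line once $\hat V$ is available:

```python
import numpy as np

def t_stat(beta_hat, V_hat, n, j, c=0.0):
    """t-statistic for H0: beta_j = c; V_hat estimates the asymptotic
    covariance of sqrt(n)*(beta_hat - beta), so se_j = sqrt(V_hat[j,j]/n)."""
    se_j = np.sqrt(V_hat[j, j] / n)
    return (beta_hat[j] - c) / se_j
```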
2.2 Testing Hypotheses that Involve Multiple Coefficients
To test hypotheses that involve multiple coefficients we need the following result.
Theorem 5 Let $Z \sim N_m(\mu, \Sigma)$, for some vector $\mu$ ($m \times 1$) and some (full rank) matrix $\Sigma$ ($m \times m$). Then it holds that
$$(Z - \mu)'\Sigma^{-1}(Z - \mu) \sim \chi^2_m.$$
Here we use $\chi^2_m$ to denote the chi-squared distribution with $m$ degrees of freedom. In our asymptotic analysis the result we need is the following.
Theorem 6 Let $Z_n \xrightarrow{d} N_m(\mu, \Sigma)$, for some vector $\mu$ ($m \times 1$) and some (full rank) matrix $\Sigma$ ($m \times m$). Suppose that $\hat\mu \xrightarrow{p} \mu$ and that $\hat\Sigma \xrightarrow{p} \Sigma$. Then it holds that
$$(Z_n - \hat\mu)'\hat\Sigma^{-1}(Z_n - \hat\mu) \xrightarrow{d} \chi^2_m.$$
In our setting we have established that $\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} N_{k+1}(0, V)$ and $\hat V \xrightarrow{p} V$. Thus the theorem tells us that
$$\sqrt{n}(\hat\beta - \beta)'\,\hat V^{-1}\,\sqrt{n}(\hat\beta - \beta) = (\hat\beta - \beta)'\left[\tfrac{1}{n}\hat V\right]^{-1}(\hat\beta - \beta) \xrightarrow{d} \chi^2_{k+1}.$$
This enables us to test the hypothesis that the vector of regression parameters equals a particular vector, e.g., $H_0\colon \beta = \beta^o$. All we need to do is to compute $(\hat\beta - \beta^o)'\left[\tfrac{1}{n}\hat V\right]^{-1}(\hat\beta - \beta^o)$ and compare this (scalar) number to the quantile (e.g., the 95%-quantile) of the $\chi^2_{k+1}$ distribution.
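As a sketch (SciPy is assumed to be available for the $\chi^2$ p-value; all names are illustrative), the test of $H_0\colon \beta = \beta^o$ could be carried out as:

```python
import numpy as np
from scipy import stats

def wald_chi2(beta_hat, V_hat, n, beta0):
    """Wald statistic (beta_hat - beta0)' [V_hat/n]^{-1} (beta_hat - beta0),
    asymptotically chi-squared with k+1 degrees of freedom under H0."""
    diff = beta_hat - beta0
    W = diff @ np.linalg.solve(V_hat / n, diff)
    p_value = stats.chi2.sf(W, df=len(beta_hat))  # upper tail of chi2_{k+1}
    return W, p_value
```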
An important distribution that is closely related to the $\chi^2$-distribution is the $F_{q,\infty}$ distribution. It is defined as follows: if $Z \sim \chi^2_q$, then $U = Z/q \sim F_{q,\infty}$. So an $F_{q,\infty}$ is simply a $\chi^2_q$ that has been divided by its degrees of freedom. Should we prefer to use an $F$-test to test $H_0\colon \beta = \beta^o$, we would simply use that
$$F_{\beta = \beta^o} = \frac{(\hat\beta - \beta^o)'\left[\tfrac{1}{n}\hat V\right]^{-1}(\hat\beta - \beta^o)}{k+1} \overset{a}{\sim} F_{k+1,\infty},$$
where $F_{\beta = \beta^o}$ denotes the test statistic and $F_{k+1,\infty}$ represents the $F$-distribution with $(k+1, \infty)$ degrees of freedom. (An $F$-distribution has two degrees-of-freedom parameters.)
Typically we are interested in more complicated hypotheses than $\beta_j = c$ or $\beta = \beta^o$. A general class of hypotheses can be formulated as $H_0\colon R\beta = r$, for some $q \times (k+1)$ matrix, $R$, and some $q \times 1$ vector, $r$.
How can we test hypotheses of this kind? First we note that Theorem 4 gives us that
$$R\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} N_q(R \cdot 0,\ R V R') = N_q(0,\ R V R').$$
The left hand side can be rewritten as $R\sqrt{n}(\hat\beta - \beta) = \sqrt{n}(R\hat\beta - R\beta) = \sqrt{n}(R\hat\beta - r)$ under the null hypothesis. This leads to the test statistic
$$F_{R\beta = r} = \frac{(R\hat\beta - r)'\left[R\,\tfrac{1}{n}\hat V\,R'\right]^{-1}(R\hat\beta - r)}{q} \overset{a}{\sim} F_{q,\infty}.$$
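The general case follows the same pattern in code; a minimal sketch (names again illustrative):

```python
import numpy as np

def f_stat(beta_hat, V_hat, n, R, r):
    """F-statistic for H0: R beta = r, to be compared with F(q, infinity):
    (R b - r)' [R (V_hat/n) R']^{-1} (R b - r) / q."""
    R = np.atleast_2d(R)
    q = R.shape[0]                      # number of restrictions
    diff = R @ beta_hat - r
    middle = R @ (V_hat / n) @ R.T
    return diff @ np.linalg.solve(middle, diff) / q
```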
Tables with critical values for the $\chi^2_m$-distribution can be found in S&W on page 645. The $F_{m_1,m_2}$-distribution is tabulated on pages 647-649, and the $F_{m,\infty}$-distribution (which you will use most frequently) is tabulated on page 646 and (conveniently) on the very last page of the book.
2.3 A Simple Example
Suppose that we have obtained the estimates
$$\hat\beta = \begin{pmatrix} 4.0 \\ 2.5 \\ 1.5 \end{pmatrix} \qquad\text{and}\qquad \tfrac{1}{n}\hat V = \begin{pmatrix} \tfrac{801}{40} & \tfrac{27}{8} & \tfrac{6}{8} \\[2pt] \tfrac{27}{8} & \tfrac{3}{4} & \tfrac{9}{8} \\[2pt] \tfrac{6}{8} & \tfrac{9}{8} & \tfrac{15}{8} \end{pmatrix}.$$

1. $H_1\colon \beta_3 = 0$? Here we set $R_1 = (0, 0, 1)$ and $r_1 = 0$. We find
$$R_1\hat\beta = 1.5 \qquad\text{and}\qquad R_1\left(\tfrac{1}{n}\hat V\right)R_1' = \tfrac{15}{8},$$
and
$$F_{\beta_3 = 0} = (1.5)\left(\tfrac{15}{8}\right)^{-1}(1.5) = 1.2 \overset{a}{\sim} F_{1,\infty}.$$
Note that when we test a single restriction, the $F$-statistic is the square of the t-statistic:
$$F_{\beta_3 = 0} = t^2_{\beta_3 = 0} = \left(\frac{\hat\beta_3 - 0}{\sqrt{\widehat{\operatorname{var}}(\hat\beta_3)}}\right)^2.$$

2. $H_2\colon \beta_2 + \beta_3 = 0$? Here we set $R_2 = (0, 1, 1)$ and $r_2 = 0$. We find
$$R_2\hat\beta = 2.5 + 1.5 = 4 \qquad\text{and}\qquad R_2\left(\tfrac{1}{n}\hat V\right)R_2' = \tfrac{39}{8},$$
and
$$F_{\beta_2 + \beta_3 = 0} = 4\left(\tfrac{39}{8}\right)^{-1} 4 = 3.2821 \overset{a}{\sim} F_{1,\infty}.$$

3. $H_3\colon \beta_2 - \beta_3 = 0$? Here we set $R_3 = (0, 1, -1)$ and $r_3 = 0$. We find
$$R_3\hat\beta = 2.5 - 1.5 = 1 \qquad\text{and}\qquad R_3\left(\tfrac{1}{n}\hat V\right)R_3' = \tfrac{3}{8},$$
and
$$F_{\beta_2 - \beta_3 = 0} = 1\left(\tfrac{3}{8}\right)^{-1} 1 = 2.6666 \overset{a}{\sim} F_{1,\infty}.$$

4. $H_4\colon \beta_2 = \beta_3 = 0$. Now we set
$$R_4 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad r_4 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
Then
$$R_4\hat\beta = \begin{pmatrix} 2.5 \\ 1.5 \end{pmatrix} \qquad\text{and}\qquad R_4\left(\tfrac{1}{n}\hat V\right)R_4' = \begin{pmatrix} \tfrac{3}{4} & \tfrac{9}{8} \\[2pt] \tfrac{9}{8} & \tfrac{15}{8} \end{pmatrix},$$
such that
$$F_{\beta_2 = \beta_3 = 0} = \frac{1}{2}\begin{pmatrix} 2.5 & 1.5 \end{pmatrix}\begin{pmatrix} \tfrac{3}{4} & \tfrac{9}{8} \\[2pt] \tfrac{9}{8} & \tfrac{15}{8} \end{pmatrix}^{-1}\begin{pmatrix} 2.5 \\ 1.5 \end{pmatrix} = 17.666 \overset{a}{\sim} F_{2,\infty}.$$
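The four statistics can be reproduced numerically from the reported estimates, e.g. following the `f_stat` sketch from Section 2.2 (inlined here so the snippet is self-contained):

```python
import numpy as np

beta_hat = np.array([4.0, 2.5, 1.5])
V_over_n = np.array([[801/40, 27/8,  6/8],
                     [27/8,   3/4,   9/8],
                     [6/8,    9/8,  15/8]])  # the reported (1/n) V_hat

tests = {
    "H1: b3 = 0":      ([[0, 0, 1]],            [0.0]),
    "H2: b2 + b3 = 0": ([[0, 1, 1]],            [0.0]),
    "H3: b2 - b3 = 0": ([[0, 1, -1]],           [0.0]),
    "H4: b2 = b3 = 0": ([[0, 1, 0], [0, 0, 1]], [0.0, 0.0]),
}
for name, (R, r) in tests.items():
    R, r = np.asarray(R, float), np.asarray(r, float)
    diff = R @ beta_hat - r
    F = diff @ np.linalg.solve(R @ V_over_n @ R.T, diff) / R.shape[0]
    print(f"{name}: F = {F:.4f}")  # 1.2000, 3.2821, 2.6667, 17.6667
```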