
Regression Analysis: Properties of Least Squares Estimators

EC1630 Handout, Chapter 18

Abstract

We consider the regression model $Y_i = X_i'\beta + u_i$, $i = 1,\dots,n$. This note summarizes the results for asymptotic analysis of the least squares estimator $\hat\beta$, which make it possible to test: (i) hypotheses about the individual coefficients of $\beta$; and (ii) hypotheses that involve several coefficients, such as $R\beta = r$, where $R$ is a known matrix and $r$ is a known vector.

1 Regression Analysis and Least Squares Estimators

Consider the regression model
\[
Y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + u_i = X_i'\beta + u_i, \qquad i = 1,\dots,n,
\qquad [\text{Matrix: } Y = X\beta + U.]
\tag{1}
\]
with $\beta = (\beta_0\ \beta_1\ \dots\ \beta_k)'$, $X_i = (1\ x_{1i}\ \dots\ x_{ki})'$, $Y = (Y_1\ \dots\ Y_n)'$, $X = (X_1\ \dots\ X_n)'$. The least squares estimator $\hat\beta$ (LS) is given as the solution to
\[
\min_\beta S(\beta) = \min_\beta \sum_{i=1}^{n} u_i^2 = \min_\beta \sum_{i=1}^{n} \bigl(Y_i - \beta' X_i\bigr)^2,
\qquad [\text{Matrix: } \min_\beta (Y - X\beta)'(Y - X\beta).]
\]

The least squares estimator results from solving the first order conditions for a minimum:
\[
\begin{array}{ccc}
\frac{\partial}{\partial\beta_0} S(\beta) = 0 & \Leftrightarrow & -2\sum_{i=1}^{n} \bigl(Y_i - \beta' X_i\bigr) = 0 \\[4pt]
\frac{\partial}{\partial\beta_1} S(\beta) = 0 & \Leftrightarrow & -2\sum_{i=1}^{n} x_{1i}\bigl(Y_i - \beta' X_i\bigr) = 0 \\
\vdots & & \vdots \\
\frac{\partial}{\partial\beta_k} S(\beta) = 0 & \Leftrightarrow & -2\sum_{i=1}^{n} x_{ki}\bigl(Y_i - \beta' X_i\bigr) = 0
\end{array}
\tag{2}
\]
which is equivalent to the normal equations
\[
\sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} X_i'\beta, \qquad
\sum_{i=1}^{n} x_{1i} Y_i = \sum_{i=1}^{n} x_{1i} X_i'\beta, \qquad \dots, \qquad
\sum_{i=1}^{n} x_{ki} Y_i = \sum_{i=1}^{n} x_{ki} X_i'\beta,
\qquad \Bigl[\text{Matrix: } \sum_{i=1}^{n} X_i Y_i = \sum_{i=1}^{n} X_i X_i'\beta.\Bigr]
\]

and the least squares estimator reads
\[
\hat\beta = \Bigl(\sum_{i=1}^{n} X_i X_i'\Bigr)^{-1} \sum_{i=1}^{n} X_i Y_i,
\qquad [\text{Matrix: } \hat\beta = (X'X)^{-1} X'Y],
\tag{3}
\]
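Equation (3) translates directly into a short computation. The following is a minimal numerical sketch (assuming numpy; the function name ols and the simulated data are illustrative, not part of the handout), where the regressor matrix X already contains the column of ones.

\begin{verbatim}
import numpy as np

def ols(X, Y):
    # Least squares estimator from equation (3): beta_hat = (X'X)^{-1} X'Y.
    # X is n x (k+1), with a first column of ones; Y is length n.
    XtX = X.T @ X
    XtY = X.T @ Y
    return np.linalg.solve(XtX, XtY)   # solve avoids forming the inverse explicitly

# Illustrative data (made up for this sketch):
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
u = rng.normal(size=n)
Y = 1.0 + 2.0 * x1 + u        # true beta = (1, 2)'
beta_hat = ols(X, Y)
\end{verbatim}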

By substituting (1) into (3) we find that
\[
\hat\beta = (X'X)^{-1} X'(X\beta + U) = \beta + (X'X)^{-1} X'U,
\]
such that
\[
\hat\beta - \beta = \Bigl(\sum_{i=1}^{n} X_i X_i'\Bigr)^{-1} \sum_{i=1}^{n} X_i u_i,
\qquad [\text{Matrix: } \hat\beta - \beta = (X'X)^{-1} X'U].
\tag{4}
\]

A1: $E(U\,|\,X) = 0$.

The expectation of $\hat\beta$ is
\[
E(\hat\beta) = \beta + \Bigl(\sum_{i=1}^{n} X_i X_i'\Bigr)^{-1} \sum_{i=1}^{n} E(X_i u_i) = \beta,
\qquad [\text{Matrix: } E(\hat\beta) = \beta + (X'X)^{-1} E(X'U) = \beta].
\tag{5}
\]
We want to use the true finite sample distribution of $\hat\beta$. Since we cannot always obtain this, we resort to the asymptotic distribution, which is likely to be a good approximation of the unknown finite sample distribution if $n$ is large. We consider the distribution of $\hat\beta$ as $n \to \infty$, where $\hat\beta$ is the LS estimator (note that $\hat\beta$ depends on $n$, although this is not clear from the definition).

When applying asymptotic results one should have the following in mind. The estimator, $\hat\beta_n$, has some (unknown) finite-sample distribution, which is approximated by its asymptotic distribution. The finite-sample distribution and the asymptotic distribution are (most likely) different, so we are making an error when we use the asymptotic distribution. However, when $n$ is large, the difference between the two distributions is likely to be small, and the asymptotic distribution is then a good approximation.

2 Asymptotic Analysis of $\hat\beta$

In addition to Assumption A1 we assume:

A2: $(Y_i, X_{1i}, \dots, X_{ki})$, $i = 1,\dots,n$ are iid.

A3: $(u_i, X_{1i}, \dots, X_{ki})$ have finite fourth moments.

A4: $X'X$ has full rank. [Equivalent to: no perfect multicollinearity, or that $\det(X'X) \neq 0$.]

We have already used A4 when we implicitly assumed that the inverse of $X'X$ was well defined. The two other assumptions will be used in the asymptotic analysis.
You are familiar with the law of large numbers and the central limit theorem for scalar variables. The results that concern vectors and matrices are straightforward extensions.

Theorem 1 (Multivariate LLN) Let $\{M_i\}$, $i = 1, 2, \dots$ be a sequence of matrices whose elements are iid random variables. Then it holds that $\frac{1}{n}\sum_{i=1}^{n} M_i \overset{p}{\to} E(M_i)$.

So the univariate LLN is just a special case of the multivariate version. The same is true for the multivariate CLT.

Theorem 2 (Multivariate CLT) Let $\{V_i\}$ be a sequence of $m$-dimensional random vectors that are iid, with mean $\mu_V = E(V_i)$ and covariance matrix $\Sigma_V = \mathrm{var}(V_i) = E[(V_i - \mu_V)(V_i - \mu_V)']$. Then it holds that $\frac{1}{\sqrt{n}}\sum_{i=1}^{n} (V_i - \mu_V) \overset{d}{\to} N_m(0, \Sigma_V)$.

Since $\{X_i\}$ is a sequence of iid random variables (vectors), it follows that $\{X_i X_i'\}$ is a sequence of iid random variables (matrices). So by the multivariate LLN it holds that $\frac{1}{n}\sum_{i=1}^{n} X_i X_i' \overset{p}{\to} Q_X \equiv E(X_i X_i')$. Similarly, $\{X_i u_i\}$ is a sequence of iid random variables (vectors), with expected value $E[X_i u_i] = E[E(X_i u_i | X_i)] = E[X_i E(u_i | X_i)] = E[X_i \cdot 0] = 0$. So by the multivariate central limit theorem we have that $\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i u_i \overset{d}{\to} N_{k+1}(0, \Sigma_v)$, where $\Sigma_v \equiv \mathrm{var}(X_i u_i) = E(X_i X_i' u_i^2)$. Here we are implicitly using Assumption A3, which guarantees that the expected values $E(X_i X_i')$ and $E(X_i X_i' u_i^2)$ are well defined (finite).

Theorem 3 (Linear Transformation of Gaussian Variables) Let $Z \sim N_m(\mu, \Sigma)$, for some vector $\mu$ ($m \times 1$) and some matrix $\Sigma$ ($m \times m$). Let $A$ be an $l \times m$ matrix and $b$ be an $l \times 1$ vector. Define the $l$-dimensional random variable $\tilde Z = AZ + b$. Then it holds that $\tilde Z \sim N_l(A\mu + b, A\Sigma A')$.

Theorem 4 (Asymptotic Linear Transformation of Gaussian Variables) Let $Z_n \overset{d}{\to} N_m(\mu, \Sigma)$, for some vector $\mu$ ($m \times 1$) and some matrix $\Sigma$ ($m \times m$). Let $A_n \overset{p}{\to} A$ and $b_n \overset{p}{\to} b$ for some constant $l \times m$ matrix, $A$, and some constant $l \times 1$ vector $b$. Define the $l$-dimensional random variable $\tilde Z_n = A_n Z_n + b_n$. Then it holds that $\tilde Z_n \overset{d}{\to} N_l(A\mu + b, A\Sigma A')$.

In the present context, we can set $A_n = \bigl(\frac{1}{n}\sum_{i=1}^{n} X_i X_i'\bigr)^{-1}$, $b_n = 0$, and $Z_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i u_i$, and from the theorem it follows that
\[
\sqrt{n}(\hat\beta - \beta)
= \underbrace{\Bigl(\frac{1}{n}\sum_{i=1}^{n} X_i X_i'\Bigr)^{-1}}_{\overset{p}{\to}\, Q_X^{-1}}
\;\underbrace{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i u_i}_{\overset{d}{\to}\, N_{k+1}(0,\, \Sigma_v)}
\;\overset{d}{\to}\; N_{k+1}\bigl(0,\, Q_X^{-1}\Sigma_v Q_X^{-1}\bigr).
\]
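The limiting result above can be visualized with a small Monte Carlo experiment. The sketch below is illustrative only: it uses a made-up design in which $Q_X$ and $\Sigma_v$ both equal the identity matrix, so the empirical covariance of $\sqrt{n}(\hat\beta - \beta)$ across replications should be close to $Q_X^{-1}\Sigma_v Q_X^{-1} = I$.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(42)
n, reps = 200, 2000
beta_true = np.array([1.0, 2.0])

draws = np.empty((reps, 2))
for s in range(reps):
    x1 = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1])
    u = rng.normal(size=n)                       # homoskedastic, independent of X
    Y = X @ beta_true + u
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    draws[s] = np.sqrt(n) * (beta_hat - beta_true)

# Sample covariance of sqrt(n)*(beta_hat - beta) across replications.
# For this design Q_X = E(X_i X_i') = I and Sigma_v = E(X_i X_i' u_i^2) = I,
# so the asymptotic covariance Q_X^{-1} Sigma_v Q_X^{-1} is the identity matrix.
print(np.cov(draws, rowvar=False))
\end{verbatim}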

The covariance matrix $\Sigma_{\sqrt{n}(\hat\beta-\beta)} \equiv Q_X^{-1}\Sigma_v Q_X^{-1}$ can be estimated by $\hat\Sigma_{\sqrt{n}(\hat\beta-\beta)} \equiv \hat Q_X^{-1}\hat\Sigma_v \hat Q_X^{-1}$, where
\[
\hat Q_X \equiv \frac{1}{n}\sum_{i=1}^{n} X_i X_i'
\qquad\text{and}\qquad
\hat\Sigma_v \equiv \frac{1}{n-k-1}\sum_{i=1}^{n} X_i X_i' \hat u_i^2.
\]


How can we be sure that $\hat\Sigma_{\sqrt{n}(\hat\beta-\beta)}$ is consistent for $\Sigma_{\sqrt{n}(\hat\beta-\beta)}$? We have already established that $\hat Q_X \overset{p}{\to} Q_X$, and since $\{X_i, Y_i\}$ is iid (Assumption A2), also $\{X_i X_i'(Y_i - X_i'\beta)^2\} = \{X_i X_i' u_i^2\}$ is iid, such that a LLN gives us $\frac{1}{n}\sum_{i=1}^{n} X_i X_i' u_i^2 \overset{p}{\to} E(X_i X_i' u_i^2)$. To establish that $\hat\Sigma_v \overset{p}{\to} \Sigma_v$, we first note that
\[
\frac{1}{n}\sum_{i=1}^{n} X_i X_i' \hat u_i^2
= \frac{1}{n}\sum_{i=1}^{n} X_i X_i' u_i^2
+ \frac{1}{n}\sum_{i=1}^{n} X_i X_i' (\hat u_i^2 - u_i^2),
\]
and it can be shown that $\frac{1}{n}\sum_{i=1}^{n} X_i X_i'(\hat u_i^2 - u_i^2) \overset{p}{\to} 0$ (beyond the scope of EC1630), such that $\hat\Sigma_v = \frac{n}{n-k-1}\cdot\frac{1}{n}\sum_{i=1}^{n} X_i X_i' \hat u_i^2 \overset{p}{\to} E(X_i X_i' u_i^2) = \Sigma_v$, using $\frac{n}{n-k-1} \to 1$ as $n \to \infty$. Since the mapping $\{Q_X, \Sigma_v\} \mapsto Q_X^{-1}\Sigma_v Q_X^{-1}$ is continuous, we know that $\hat Q_X \overset{p}{\to} Q_X$ and $\hat\Sigma_v \overset{p}{\to} \Sigma_v$ implies that $\hat\Sigma_{\sqrt{n}(\hat\beta-\beta)} = \hat Q_X^{-1}\hat\Sigma_v\hat Q_X^{-1} \overset{p}{\to} Q_X^{-1}\Sigma_v Q_X^{-1} = \Sigma_{\sqrt{n}(\hat\beta-\beta)}$, as we wanted to show.
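Numerically, the estimate $\hat\Sigma_{\sqrt{n}(\hat\beta-\beta)} = \hat Q_X^{-1}\hat\Sigma_v\hat Q_X^{-1}$ amounts to a few matrix products over the residuals. The following is a minimal sketch (assuming numpy; the function name robust_cov_sqrt_n and the layout of X are ours, not the handout's), which can be combined with the ols sketch above.

\begin{verbatim}
import numpy as np

def robust_cov_sqrt_n(X, Y, beta_hat):
    # Estimate of the asymptotic covariance of sqrt(n)*(beta_hat - beta):
    #   Q_hat^{-1} Sigma_v_hat Q_hat^{-1},
    # with Q_hat = (1/n) * sum X_i X_i' and
    # Sigma_v_hat = 1/(n-k-1) * sum X_i X_i' * u_hat_i^2.
    n, kp1 = X.shape                      # kp1 = k + 1 regressors incl. constant
    u_hat = Y - X @ beta_hat
    Q_hat = (X.T @ X) / n
    Sigma_v_hat = (X * u_hat[:, None] ** 2).T @ X / (n - kp1)
    Q_inv = np.linalg.inv(Q_hat)
    return Q_inv @ Sigma_v_hat @ Q_inv

# The estimated covariance of beta_hat itself is this matrix divided by n.
\end{verbatim}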

Multiplying $\sqrt{n}(\hat\beta - \beta)$ by $\frac{1}{\sqrt{n}}$ and adding $\beta$ shows that asymptotically $\hat\beta$ is normally distributed about $\beta$, with a covariance matrix that is given by $\Sigma_{\hat\beta} \equiv \frac{1}{n}\Sigma_{\sqrt{n}(\hat\beta-\beta)}$. In practice we will use the estimate, $\hat\Sigma_{\hat\beta} \equiv \frac{1}{n}\hat\Sigma_{\sqrt{n}(\hat\beta-\beta)}$.

2.1 Test About a Single Regression Coefficient

Consider the vector of regression coefficients, $\beta = (\beta_0, \dots, \beta_k)'$, and suppose that we are interested in the $j$th coefficient, $\beta_j$. We can let $d = (0, \dots, 0, 1, 0, \dots, 0)'$ denote the $j$th unit-vector (the vector which has 1 as its $j$th element and zeros otherwise). Then we note that
\[
d'\sqrt{n}(\hat\beta - \beta) = \sqrt{n}(d'\hat\beta - d'\beta) = \sqrt{n}(\hat\beta_j - \beta_j),
\]
and by Theorem 4 it follows that $\sqrt{n}(\hat\beta_j - \beta_j) = d'\sqrt{n}(\hat\beta - \beta) \overset{d}{\to} N_1(0,\, d'\Sigma_{\sqrt{n}(\hat\beta-\beta)}d)$. So for large $n$ it holds that
\[
\hat\beta_j - \beta_j \overset{A}{\sim} N_1\bigl(0,\, d'\hat\Sigma_{\hat\beta}\,d\bigr),
\]
which allows us to construct the t-statistic of the hypothesis $H_0\colon \beta_j = c$. It is given by
\[
t_{\beta_j = c} = \frac{\hat\beta_j - c}{\sqrt{d'\hat\Sigma_{\hat\beta}\,d}},
\]
which for large $n$ is approximately distributed as a standard normal, $N(0,1)$. (For moderate values of $n$, it is typically better to use the t-distribution with $n-k-1$ degrees of freedom.)
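The t-statistic is then a one-line computation once $\hat\beta$ and $\hat\Sigma_{\hat\beta}$ are available. The sketch below is illustrative (the helper name t_stat is ours, and cov_beta_hat is assumed to be the estimated covariance of $\hat\beta$, e.g. the robust_cov_sqrt_n output divided by n).

\begin{verbatim}
import numpy as np

def t_stat(beta_hat, cov_beta_hat, j, c=0.0):
    # t-statistic for H0: beta_j = c, using the estimated covariance of beta_hat.
    d = np.zeros(len(beta_hat))
    d[j] = 1.0
    se_j = np.sqrt(d @ cov_beta_hat @ d)   # standard error of beta_hat[j]
    return (beta_hat[j] - c) / se_j

# Example usage (cov_beta_hat = robust_cov_sqrt_n(X, Y, beta_hat) / n):
# t1 = t_stat(beta_hat, cov_beta_hat, j=1, c=0.0)
\end{verbatim}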

2.2 Test About Multiple Regression Coefficients

To test hypotheses that involve multiple coefficients we need the following result.

Theorem 5 Let $Z \sim N_m(\mu, \Sigma)$, for some vector $\mu$ ($m \times 1$) and some (full rank) matrix $\Sigma$ ($m \times m$). Then it holds that
\[
(Z - \mu)'\Sigma^{-1}(Z - \mu) \sim \chi^2_m.
\]
Here we use $\chi^2_m$ to denote the chi-squared distribution with $m$ degrees of freedom. In our asymptotic analysis the result we need is the following.

Theorem 6 Let $Z_n \overset{d}{\to} N_m(\mu, \Sigma)$ for some vector $\mu$ ($m \times 1$) and some (full rank) matrix $\Sigma$ ($m \times m$). Suppose that $\hat\mu \overset{p}{\to} \mu$ and that $\hat\Sigma \overset{p}{\to} \Sigma$. Then it holds that $(Z_n - \hat\mu)'\hat\Sigma^{-1}(Z_n - \hat\mu) \overset{d}{\to} \chi^2_m$.

In our setting we have established that $\sqrt{n}(\hat\beta - \beta) \overset{d}{\to} N_{k+1}(0, \Sigma_{\sqrt{n}(\hat\beta-\beta)})$ and $\hat\Sigma_{\sqrt{n}(\hat\beta-\beta)} \overset{p}{\to} \Sigma_{\sqrt{n}(\hat\beta-\beta)}$. Thus the theorem tells us that
\[
\sqrt{n}(\hat\beta - \beta)'\,\bigl[\hat\Sigma_{\sqrt{n}(\hat\beta-\beta)}\bigr]^{-1}\sqrt{n}(\hat\beta - \beta)
= (\hat\beta - \beta)'\,\Bigl[\tfrac{1}{n}\hat\Sigma_{\sqrt{n}(\hat\beta-\beta)}\Bigr]^{-1}(\hat\beta - \beta)
= (\hat\beta - \beta)'\,\hat\Sigma_{\hat\beta}^{-1}(\hat\beta - \beta)
\overset{d}{\to} \chi^2_{k+1}.
\]
This enables us to test the hypothesis that the vector of regression parameters equals a particular vector, e.g., $H_0\colon \beta = \beta^o$. All we need to do is to compute $(\hat\beta - \beta^o)'\hat\Sigma_{\hat\beta}^{-1}(\hat\beta - \beta^o)$ and compare this (scalar) number to the quantile (e.g. the 95%-quantile) of the $\chi^2_{k+1}$-distribution.
An important distribution that is closely related to the $\chi^2$-distribution is the $F_{q,\infty}$-distribution. It is defined as follows. Suppose that $Z \sim \chi^2_q$; then $U = Z/q \sim F_{q,\infty}$. So an $F_{q,\infty}$ is simply a $\chi^2_q$ that has been divided by its degrees of freedom. So, should we prefer to use an F-test to test $H_0\colon \beta = \beta^o$, we would simply use that
\[
F_{\beta = \beta^o} = \frac{(\hat\beta - \beta^o)'\,\hat\Sigma_{\hat\beta}^{-1}(\hat\beta - \beta^o)}{k+1} \overset{A}{\sim} F_{k+1,\infty},
\]
where $F_{\beta = \beta^o}$ denotes the test-statistic and $F_{k+1,\infty}$ represents the F-distribution with $(k+1, \infty)$ degrees of freedom. (An F-distribution has two degrees of freedom.)

Typically we are interested in more complicated hypotheses than $\beta_j = c$ or $\beta = \beta^o$. A general class of hypotheses can be formulated as $H_0\colon R\beta = r$, for some $q \times (k+1)$ matrix, $R$, and some $q \times 1$ vector, $r$.

How can we test hypotheses of this kind? First we note that Theorem 4 gives us that
\[
R\,\sqrt{n}(\hat\beta - \beta) \overset{d}{\to} N_q\bigl(R\cdot 0,\, R\,\Sigma_{\sqrt{n}(\hat\beta-\beta)}R'\bigr) = N_q\bigl(0,\, R\,\Sigma_{\sqrt{n}(\hat\beta-\beta)}R'\bigr).
\]
The left hand side can be rewritten as $R\sqrt{n}(\hat\beta - \beta) = \sqrt{n}(R\hat\beta - R\beta) = \sqrt{n}[(R\hat\beta - r) - (R\beta - r)]$, which equals $\sqrt{n}(R\hat\beta - r)$ if the null hypothesis is true. So if we divide by $\sqrt{n}$ we get that $(R\hat\beta - r) \overset{A}{\sim} N_q(0,\, \frac{1}{n}R\,\Sigma_{\sqrt{n}(\hat\beta-\beta)}R') = N_q(0,\, R\,\Sigma_{\hat\beta}R')$. Thus by using Theorem 6 we can construct a $\chi^2$-test of the hypothesis $H_0\colon R\beta = r$, using the test statistic
\[
(R\hat\beta - r)'\,\bigl[R\,\hat\Sigma_{\hat\beta}R'\bigr]^{-1}(R\hat\beta - r) \overset{d}{\to} \chi^2_q,
\]
or the equivalent F-test, which is based on the statistic
\[
F_{R\beta = r} = \frac{(R\hat\beta - r)'\,\bigl[R\,\hat\Sigma_{\hat\beta}R'\bigr]^{-1}(R\hat\beta - r)}{q} \overset{A}{\sim} F_{q,\infty}.
\]
Tables with critical values for the $\chi^2_m$-distribution can be found in S&W on page 645. The $F_{m_1,m_2}$-distribution is tabulated on pages 647-649, and the $F_{m,\infty}$-distribution (which you will use most frequently) is tabulated on page 646, and (conveniently) on the very last page of the book.
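The $\chi^2$- and F-statistics for $H_0\colon R\beta = r$ likewise reduce to a few matrix operations. Below is an illustrative sketch (assuming numpy and scipy are available; the helper name wald_test is ours, not the handout's).

\begin{verbatim}
import numpy as np
from scipy import stats

def wald_test(beta_hat, cov_beta_hat, R, r):
    # Chi-squared and F statistics for H0: R beta = r,
    # using cov_beta_hat as the estimated covariance of beta_hat.
    R = np.atleast_2d(R)
    r = np.atleast_1d(r)
    q = R.shape[0]                               # number of restrictions
    diff = R @ beta_hat - r
    middle = np.linalg.inv(R @ cov_beta_hat @ R.T)
    chi2_stat = diff @ middle @ diff
    f_stat = chi2_stat / q                       # F_{q,inf} version
    p_value = stats.chi2.sf(chi2_stat, df=q)     # same p-value for the F_{q,inf}
                                                 # version, since F_{q,inf} = chi2_q / q
    return chi2_stat, f_stat, p_value
\end{verbatim}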

2.3 A Simple Example

Suppose that we have estimated
\[
\hat\beta = \begin{pmatrix} 4.0 \\ 2.5 \\ -1.5 \end{pmatrix}
\qquad\text{and}\qquad
\hat\Sigma_{\hat\beta} = \begin{pmatrix} \tfrac{801}{40} & -\tfrac{27}{8} & 6 \\[2pt] -\tfrac{27}{8} & \tfrac{3}{4} & -\tfrac{9}{8} \\[2pt] 6 & -\tfrac{9}{8} & \tfrac{15}{8} \end{pmatrix},
\]
and wanted to test the following hypotheses:

1. $H_1\colon \beta_3 = 0$. We note that $H_1$ is equivalent to
\[
R_1\beta = r_1, \qquad\text{where } R_1 = (0, 0, 1) \text{ and } r_1 = 0.
\]
We find that
\[
R_1\hat\beta = -1.5 \qquad\text{and}\qquad R_1\hat\Sigma_{\hat\beta}R_1' = \tfrac{15}{8},
\]
such that the F-statistic of $H_1$ is given by
\[
F_{\beta_3 = 0} = \frac{(R_1\hat\beta - r_1)'\bigl[R_1\hat\Sigma_{\hat\beta}R_1'\bigr]^{-1}(R_1\hat\beta - r_1)}{1}
= (-1.5)\Bigl(\tfrac{15}{8}\Bigr)^{-1}(-1.5) = 1.2 \overset{A}{\sim} F_{1,\infty}.
\]
Note that when we test a single restriction, the F-statistic is the square of the t-statistic:
\[
F_{\beta_3 = 0} = t_{\beta_3 = 0}^2 = \frac{(\hat\beta_3 - 0)^2}{\widehat{\mathrm{var}}(\hat\beta_3)}.
\]

2. $H_2\colon \beta_2 = \beta_3$. This hypothesis is equivalent to $\beta_2 - \beta_3 = 0$, so we set
\[
R_2 = (0, 1, -1) \qquad\text{and}\qquad r_2 = 0,
\]
and find
\[
R_2\hat\beta = 2.5 - (-1.5) = 4 \qquad\text{and}\qquad R_2\hat\Sigma_{\hat\beta}R_2' = \tfrac{39}{8}.
\]
So the F-test is given by
\[
F_{\beta_2 = \beta_3} = \frac{(R_2\hat\beta - r_2)'\bigl[R_2\hat\Sigma_{\hat\beta}R_2'\bigr]^{-1}(R_2\hat\beta - r_2)}{1}
= 4\Bigl(\tfrac{39}{8}\Bigr)^{-1}4 = 3.2821 \overset{A}{\sim} F_{1,\infty}.
\]

3. $H_3\colon \beta_2 + \beta_3 = 0$. Here we set
\[
R_3 = (0, 1, 1) \qquad\text{and}\qquad r_3 = 0.
\]
We find
\[
R_3\hat\beta = 2.5 + (-1.5) = 1 \qquad\text{and}\qquad R_3\hat\Sigma_{\hat\beta}R_3' = \tfrac{3}{8},
\]
and the F-test is given by
\[
F_{\beta_2 + \beta_3 = 0} = 1\Bigl(\tfrac{3}{8}\Bigr)^{-1}1 = 2.6666 \overset{A}{\sim} F_{1,\infty}.
\]

4. $H_4\colon \beta_2 = \beta_3 = 0$. Now we set
\[
R_4 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\qquad\text{and}\qquad
r_4 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\]
Then
\[
R_4\hat\beta = \begin{pmatrix} 2.5 \\ -1.5 \end{pmatrix}
\qquad\text{and}\qquad
R_4\hat\Sigma_{\hat\beta}R_4' = \begin{pmatrix} \tfrac{3}{4} & -\tfrac{9}{8} \\[2pt] -\tfrac{9}{8} & \tfrac{15}{8} \end{pmatrix},
\]
such that (dividing by $q = 2$ restrictions)
\[
F_{\beta_2 = \beta_3 = 0} = \frac{1}{2}
\begin{pmatrix} 2.5 & -1.5 \end{pmatrix}
\begin{pmatrix} \tfrac{3}{4} & -\tfrac{9}{8} \\[2pt] -\tfrac{9}{8} & \tfrac{15}{8} \end{pmatrix}^{-1}
\begin{pmatrix} 2.5 \\ -1.5 \end{pmatrix}
= 17.666 \overset{A}{\sim} F_{2,\infty}.
\]
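As a check, the four F-statistics above can be reproduced numerically; the snippet below is illustrative only and simply plugs the example numbers into the general formula (the off-diagonal first-row entries of the covariance matrix never enter these calculations).

\begin{verbatim}
import numpy as np

beta_hat = np.array([4.0, 2.5, -1.5])
Sigma_hat = np.array([[801/40, -27/8,   6.0],
                      [ -27/8,   3/4,  -9/8],
                      [   6.0,  -9/8,  15/8]])

tests = {
    "H1: b3 = 0":      (np.array([[0.0, 0.0, 1.0]]),  np.array([0.0])),
    "H2: b2 = b3":     (np.array([[0.0, 1.0, -1.0]]), np.array([0.0])),
    "H3: b2 + b3 = 0": (np.array([[0.0, 1.0, 1.0]]),  np.array([0.0])),
    "H4: b2 = b3 = 0": (np.array([[0.0, 1.0, 0.0],
                                  [0.0, 0.0, 1.0]]),  np.array([0.0, 0.0])),
}

for name, (R, r) in tests.items():
    q = R.shape[0]
    diff = R @ beta_hat - r
    F = diff @ np.linalg.inv(R @ Sigma_hat @ R.T) @ diff / q
    print(name, round(F, 4))   # 1.2, 3.2821, 2.6667, 17.6667
\end{verbatim}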

3 Calculus with Conditional Expectations

When $X_i$ is random, the concept of conditional expectation is an important tool in the analysis.

A conditional expected value is written $E(Y|X = x)$, and denotes the expected value of $Y$ when we know that $X = x$. (So $E(Y)$ is the expected value of $Y$ when we know nothing about other random variables.) When we write $E(Y|X)$, we think of it as a function of $X$. Since $X$ is random, $E(Y|X)$ is also random (in the general case).

Some properties of conditional expected values are listed below.

1. If $X$ and $Y$ are independent, then $E(Y|X) = E(Y)$.

2. If $\mathrm{cov}(X, Y) \neq 0$, then $E(Y|X)$ depends on $X$, in which case $E(Y|X)$ is a random variable (as $X$ is random). So it is meaningful to talk about the expected value of a conditional expected value.

3. $E(Y f(X)|X) = E(Y|X) f(X)$, for any function $f$. So (functions of) the variable we condition on can be taken outside the conditional expectation.

4. $E(Y) = E[E(Y|X)]$.
The properties of conditional expectations are very useful for our analysis. We shall make use of arguments such as the following example.

Example 7 Suppose that $E(u|X) = 0$. Then we have that
\[
E(uX) \overset{(4)}{=} E[E(uX|X)] \overset{(3)}{=} E[E(u|X)X] = E[0 \cdot X] = 0,
\]
where the numbers refer to the properties listed above.
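Property 4 and Example 7 can also be illustrated by simulation. The brief sketch below (a numerical illustration under a made-up data-generating process with $E(u|X) = 0$, not part of the handout) checks that the sample analogues of $E(uX)$ and $E(Y) - E[E(Y|X)]$ are close to zero.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

X = rng.normal(size=n)
u = rng.normal(size=n) * (1 + 0.5 * X**2)      # heteroskedastic, but E(u|X) = 0
Y = 1.0 + 2.0 * X + u

print(np.mean(u * X))                          # sample analogue of E(uX): close to 0
print(np.mean(Y) - np.mean(1.0 + 2.0 * X))     # E(Y) vs E[E(Y|X)] = 1 + 2X: close to 0
\end{verbatim}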
