
Regression Analysis: Properties of Least Squares Estimators

EC163 Handout, Chapter 18

Abstract
We consider the regression model: Y_i = X_i'β + u_i, i = 1, . . . , n. This note summarizes the results for asymptotic analysis of the least squares estimator β̂, which make it possible to test: (i) hypotheses about the individual coefficients of β; and (ii) hypotheses that involve several coefficients, such as Rβ = r, where R is a known matrix and r is a known vector.

1 Regression Analysis and Least Squares Estimators

Consider the regression model


Y_i = β_0 + β_1 x_{1i} + . . . + β_k x_{ki} + u_i = X_i'β + u_i,   i = 1, . . . , n,   [Matrix: Y = Xβ + U.]   (1)

with β = (β_0 β_1 . . . β_k)', X_i = (1 x_{1i} . . . x_{ki})', Y = (Y_1 . . . Y_n)', X = (X_1 . . . X_n)'. The least squares estimator (LS), β̂, is given as the solution to

min_β S(β) = min_β Σ_{i=1}^n u_i^2 = min_β Σ_{i=1}^n (Y_i − X_i'β)^2.   [Matrix: min_β (Y − Xβ)'(Y − Xβ).]

The least squares estimator results by solving the first order conditions for a minimum:

∂S(β)/∂β_0 = 0:   −2 Σ_{i=1}^n (Y_i − X_i'β̂) = 0
∂S(β)/∂β_1 = 0:   −2 Σ_{i=1}^n x_{1i} (Y_i − X_i'β̂) = 0
        ⋮
∂S(β)/∂β_k = 0:   −2 Σ_{i=1}^n x_{ki} (Y_i − X_i'β̂) = 0

that is,

Σ_{i=1}^n Y_i = Σ_{i=1}^n X_i'β̂
Σ_{i=1}^n x_{1i} Y_i = Σ_{i=1}^n x_{1i} X_i'β̂
        ⋮
Σ_{i=1}^n x_{ki} Y_i = Σ_{i=1}^n x_{ki} X_i'β̂

or, stacking the equations, Σ_{i=1}^n X_i Y_i = Σ_{i=1}^n X_i X_i' β̂,

and the least squares estimator reads


β̂ = ( Σ_{i=1}^n X_i X_i' )^{-1} Σ_{i=1}^n X_i Y_i.   [Matrix: β̂ = (X'X)^{-1} X'Y.]   (2)

By substituting (1) into (2) we find that

β̂ = (X'X)^{-1} X'(Xβ + U) = β + (X'X)^{-1} X'U,   (3)

such that

β̂ − β = ( Σ_{i=1}^n X_i X_i' )^{-1} Σ_{i=1}^n X_i u_i.   [Matrix: β̂ − β = (X'X)^{-1} X'U.]   (4)

A1: E(U|X) = 0.

The expectation of β̂ is

E(β̂) = β + ( Σ_{i=1}^n X_i X_i' )^{-1} Σ_{i=1}^n E(X_i u_i) = β.   [Matrix: E(β̂) = β + (X'X)^{-1} E(X'U) = β.]   (5)
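As a quick numerical illustration of the closed-form expression (2), the following Python sketch simulates a small data set and computes β̂ = (X'X)^{-1}X'Y directly; the sample size, number of regressors, and "true" parameter values are made up purely for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 200, 2                                  # illustrative sample size and number of regressors

    x = rng.normal(size=(n, k))                    # regressors x_1i, ..., x_ki
    X = np.column_stack([np.ones(n), x])           # X_i = (1, x_1i, ..., x_ki)'
    beta_true = np.array([1.0, 2.0, -0.5])         # arbitrary parameters for the simulation
    u = rng.normal(size=n)                         # errors with E(u|X) = 0
    Y = X @ beta_true + u                          # Y = X beta + U, cf. equation (1)

    # Least squares estimator, cf. equation (2): beta_hat = (X'X)^{-1} X'Y.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    print(beta_hat)                                # should be close to beta_true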

We want to use the true finite sample distribution of β̂. Since we cannot always obtain this, we resort to the asymptotic distribution, which is likely to be a good approximation of the unknown finite sample distribution, if n is large. We consider the distribution of √n(β̂ − β) as n → ∞, where β̂ is the LS estimator (note that β̂ depends on n, although it is not clear from the definition).
When applying asymptotic results one should have the following in mind. The estimator, β̂, has some (unknown) finite-sample distribution, which is approximated by its asymptotic distribution. The finite-sample distribution and the asymptotic distribution are (most likely) different, so we are making an error when we use the asymptotic distribution. However, when n is large, the difference between the two distributions is likely to be small, and the asymptotic distribution is then a good approximation.

2 Asymptotic Analysis of β̂

In addition to Assumption A1 we assume:


A2: (Y_i, X_{1i}, . . . , X_{ki}), i = 1, . . . , n, are iid.
A3: (u_i, X_{1i}, . . . , X_{ki}) have finite fourth moments.
A4: X'X has full rank. [Equivalent to: no perfect multicollinearity, or that det(X'X) ≠ 0.]
We have already used A4 when we implicitly assumed that the inverse of X'X was well defined. The
two other assumptions will be used in the asymptotic analysis.
You are familiar with the law of large numbers and the central limit theorem for scalar variables.
The results that concern vectors and matrices are straightforward extensions.
Theorem 1 (Multivariate LLN) Let {M_i}, i = 1, 2, . . . be a sequence of matrices, whose elements are iid random variables. Then it holds that (1/n) Σ_{i=1}^n M_i →p E(M_i).

So the univariate LLN is just a special case of the multivariate version. The same is true for the
multivariate CLT.
Theorem 2 (Multivariate CLT) Let {V_i} be a sequence of m-dimensional random vectors that are iid, with mean μ_V = E(V_i) and covariance matrix Σ_V = var(V_i) = E[(V_i − μ_V)(V_i − μ_V)']. Then it holds that (1/√n) Σ_{i=1}^n (V_i − μ_V) →d N_m(0, Σ_V).

Since {X_i} is a sequence of iid random variables (vectors), it follows that {X_i X_i'} is a sequence of iid random variables (matrices). So by the multivariate LLN it holds that (1/n) Σ_{i=1}^n X_i X_i' →p Q_X ≡ E(X_i X_i'). Similarly, {X_i u_i} is a sequence of iid random variables (vectors), with expected value E[X_i u_i] = E[E(X_i u_i|X_i)] = E[X_i E(u_i|X_i)] = E[X_i · 0] = 0. So by the multivariate central limit theorem we have that (1/√n) Σ_{i=1}^n X_i u_i →d N_{k+1}(0, Σ_v), where Σ_v ≡ var(X_i u_i) = E(X_i X_i' u_i^2). Here we are implicitly using Assumption A3, which guarantees that the expected values E(X_i X_i') and E(X_i X_i' u_i^2) are well defined (finite).
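These two building blocks are easy to see numerically. The sketch below uses a toy design with one standard-normal regressor (chosen only so that Q_X is known in closed form) and shows the sample average (1/n) Σ X_i X_i' settling down as n grows.

    import numpy as np

    rng = np.random.default_rng(3)

    def Qx_hat(n):
        x = rng.normal(size=n)                     # one standard-normal regressor
        X = np.column_stack([np.ones(n), x])       # X_i = (1, x_i)'
        return (X.T @ X) / n                       # (1/n) sum X_i X_i'

    # For x_i ~ N(0,1), Q_X = E(X_i X_i') is the 2x2 identity matrix.
    print(Qx_hat(100))
    print(Qx_hat(1_000_000))                       # close to [[1, 0], [0, 1]]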
Theorem 3 (Linear Transformation of Gaussian Variables) Let Z ∼ N_m(μ, Σ), for some vector, μ, (m × 1), and some matrix, Σ, (m × m). Let A be an l × m matrix and b be an l × 1 vector. Define the l-dimensional random variable Z̃ = AZ + b. Then it holds that Z̃ ∼ N_l(Aμ + b, AΣA').
Theorem 4 (Asymptotic Linear Transformation of Gaussian Variables) Let Z_n →d N_m(μ, Σ), for some vector, μ, (m × 1), and some matrix, Σ, (m × m). Let A_n →p A and b_n →p b for some constant l × m matrix, A, and some constant l × 1 vector b. Define the l-dimensional random variable Z̃_n = A_n Z_n + b_n. Then it holds that Z̃_n →d N_l(Aμ + b, AΣA').
In the present context, we can set A_n = ( (1/n) Σ_{i=1}^n X_i X_i' )^{-1}, b_n = 0, and Z_n = (1/√n) Σ_{i=1}^n X_i u_i, and from the theorem it follows that

√n(β̂ − β) = ( (1/n) Σ_{i=1}^n X_i X_i' )^{-1} (1/√n) Σ_{i=1}^n X_i u_i →d N_{k+1}(0, Q_X^{-1} Σ_v Q_X^{-1}),

where the first factor converges in probability to Q_X^{-1} and the second converges in distribution to N_{k+1}(0, Σ_v).
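A small Monte Carlo experiment can make this limit result concrete. The sketch below uses a toy design with one regressor and heteroskedastic errors (all values chosen only for illustration) and collects draws of √n(β̂_1 − β_1), which should look approximately normal when n is large.

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps = 500, 2000
    beta = np.array([1.0, 2.0])                        # illustrative intercept and slope

    draws = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        X = np.column_stack([np.ones(n), x])
        u = rng.normal(size=n) * np.sqrt(0.5 + x**2)   # E(u|X) = 0, but var(u|X) depends on X
        b = np.linalg.solve(X.T @ X, X.T @ (X @ beta + u))
        draws[r] = np.sqrt(n) * (b[1] - beta[1])       # sqrt(n) (beta_hat_1 - beta_1)

    print(draws.mean(), draws.std())                   # mean near 0; spread approximates the asymptotic std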

The covariance matrix of √n(β̂ − β), Σ_{√n(β̂−β)} = Q_X^{-1} Σ_v Q_X^{-1}, can be estimated by Σ̂_{√n(β̂−β)} = Q̂_X^{-1} Σ̂_v Q̂_X^{-1}, where

Q̂_X = (1/n) Σ_{i=1}^n X_i X_i'   and   Σ̂_v = (1/(n − k − 1)) Σ_{i=1}^n X_i X_i' û_i^2.

How can we be sure that Σ̂_{√n(β̂−β)} is consistent for Σ_{√n(β̂−β)}? We have already established that Q̂_X →p Q_X, and since {X_i, Y_i} is iid (Assumption A2), also {X_i X_i'(Y_i − X_i'β)^2} = {X_i X_i' u_i^2} is iid, such that a LLN gives us (1/n) Σ_{i=1}^n X_i X_i' u_i^2 →p E(X_i X_i' u_i^2). To establish that Σ̂_v →p Σ_v, we first note that

(1/n) Σ_{i=1}^n X_i X_i' û_i^2 = (1/n) Σ_{i=1}^n X_i X_i' u_i^2 + (1/n) Σ_{i=1}^n X_i X_i' (û_i^2 − u_i^2),

and it can be shown that (1/n) Σ_{i=1}^n X_i X_i' (û_i^2 − u_i^2) →p 0 (beyond the scope of EC163), such that Σ̂_v = (n/(n − k − 1)) · (1/n) Σ_{i=1}^n X_i X_i' û_i^2 →p E(X_i X_i' u_i^2) = Σ_v, using n/(n − k − 1) → 1 as n → ∞. Since the mapping {Q_X, Σ_v} ↦ Q_X^{-1} Σ_v Q_X^{-1} is continuous, we know that Q̂_X →p Q_X and Σ̂_v →p Σ_v implies that Σ̂_{√n(β̂−β)} = Q̂_X^{-1} Σ̂_v Q̂_X^{-1} →p Q_X^{-1} Σ_v Q_X^{-1} = Σ_{√n(β̂−β)}, as we wanted to show.

Multiplying √n(β̂ − β) by 1/√n and adding β shows that β̂ asymptotically is normally distributed about β, with a covariance matrix that is given by Σ_β̂ = (1/n) Σ_{√n(β̂−β)}. In practice we will use the estimate, Σ̂_β̂ = (1/n) Σ̂_{√n(β̂−β)}.
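In code, the estimate Σ̂_β̂ is formed from the residuals exactly as above. The following Python sketch is a hypothetical helper (the name robust_cov and the use of the 1/(n − k − 1) factor in Σ̂_v follow the formulas above) that returns the estimated covariance matrix of β̂.

    import numpy as np

    def robust_cov(X, Y, beta_hat):
        """Estimate Sigma_hat_beta = (1/n) * Qx_hat^{-1} Sigma_v_hat Qx_hat^{-1}."""
        n, p = X.shape                                              # p = k + 1
        u_hat = Y - X @ beta_hat                                    # residuals u_hat_i
        Qx_hat = (X.T @ X) / n                                      # (1/n) sum X_i X_i'
        Sigma_v_hat = (X * (u_hat**2)[:, None]).T @ X / (n - p)     # (1/(n-k-1)) sum X_i X_i' u_hat_i^2
        Qx_inv = np.linalg.inv(Qx_hat)
        return Qx_inv @ Sigma_v_hat @ Qx_inv / n                    # divide by n: variance of beta_hat itself

The square roots of the diagonal elements, np.sqrt(np.diag(robust_cov(X, Y, beta_hat))), would then be the heteroskedasticity-robust standard errors used in the tests below.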

2.1 Test About a Single Regression Coefficient

Consider the vector of regression coefficients, β = (β_0, . . . , β_k)', and suppose that we are interested in the jth coefficient, β_j. We can let d = (0, . . . , 0, 1, 0, . . . , 0)' denote the jth unit-vector (the vector which has 1 as its jth element and zero otherwise). Then we note that

d'√n(β̂ − β) = √n(d'β̂ − d'β) = √n(β̂_j − β_j),

and by Theorem 4 it follows that √n(β̂_j − β_j) = d'√n(β̂ − β) →d N_1(0, d'Σ_{√n(β̂−β)} d). So for large n it holds that

β̂_j − β_j ∼^a N_1(0, d'Σ̂_β̂ d),

which allows us to construct the t-statistic of the hypothesis H_0: β_j = c. It is given by

t_{β_j=c} = (β̂_j − c) / √(d'Σ̂_β̂ d),

which, for large n, is approximately distributed as a standard normal, N(0, 1). (For moderate values of n, it is typically better to use the t-distribution with n − k − 1 degrees of freedom.)
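As a sketch, the t-test is then one line of arithmetic once a covariance estimate is available; the function below reuses the hypothetical robust_cov output from above, and the index j and value c are whatever the hypothesis specifies.

    import numpy as np
    from scipy.stats import norm

    def t_test(beta_hat, cov_hat, j, c=0.0):
        """t-statistic and asymptotic p-value for H0: beta_j = c."""
        se_j = np.sqrt(cov_hat[j, j])              # sqrt(d' Sigma_hat_beta d), with d the j-th unit vector
        t_stat = (beta_hat[j] - c) / se_j
        p_value = 2 * (1 - norm.cdf(abs(t_stat)))  # N(0,1) approximation for large n
        return t_stat, p_value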

2.2 Test About Multiple Regression Coefficients

To test hypotheses that involve multiple coefficients we need the following result.

Theorem 5 Let Z ∼ N_m(μ, Σ), for some vector, μ, (m × 1), and some (full rank) matrix, Σ, (m × m). Then it holds that (Z − μ)'Σ^{-1}(Z − μ) ∼ χ²_m.

Here we use χ²_m to denote the chi-squared distribution with m degrees of freedom. In our asymptotic analysis the result we need is the following.

Theorem 6 Let Z_n →d N_m(μ, Σ) for some vector, μ, (m × 1), and some (full rank) matrix, Σ, (m × m). Suppose that μ̂ →p μ and that Σ̂ →p Σ. Then it holds that (Z_n − μ̂)'Σ̂^{-1}(Z_n − μ̂) →d χ²_m.
In our setting we have established that √n(β̂ − β) →d N_{k+1}(0, Σ_{√n(β̂−β)}) and Σ̂_{√n(β̂−β)} →p Σ_{√n(β̂−β)}. Thus the theorem tells us that

[√n(β̂ − β)]' [Σ̂_{√n(β̂−β)}]^{-1} [√n(β̂ − β)] = (β̂ − β)' [ (1/n) Σ̂_{√n(β̂−β)} ]^{-1} (β̂ − β) = (β̂ − β)' [Σ̂_β̂]^{-1} (β̂ − β) →d χ²_{k+1}.

This enables us to test the hypothesis that the vector of regression parameters equals a particular vector, e.g., H_0: β = β_o. All we need to do is to compute (β̂ − β_o)' [Σ̂_β̂]^{-1} (β̂ − β_o) and compare this (scalar) number to the quantile (e.g. the 95%-quantile) of the χ²_{k+1}-distribution.


An important distribution that is closely related to the χ²-distribution is the F_{q,∞}-distribution. It is defined as follows. Suppose that Z ∼ χ²_q; then U = Z/q ∼ F_{q,∞}. So an F_{q,∞} is simply a χ²_q that has been divided by its degrees of freedom. So, should we prefer to use an F-test to test H_0: β = β_o, we would simply use that

F_{β=β_o} = (β̂ − β_o)' [Σ̂_β̂]^{-1} (β̂ − β_o) / (k + 1) →d F_{k+1,∞},

where F_{β=β_o} denotes the test-statistic and F_{k+1,∞} represents the F-distribution with (k + 1, ∞) degrees of freedom. (An F-distribution has two degrees of freedom.)
Typically we are interested in more complicated hypotheses than β_j = c or β = β_o. A general class of hypotheses can be formulated as H_0: Rβ = r, for some q × (k + 1) matrix, R, and some q × 1 vector, r. How can we test hypotheses of this kind? First we note that Theorem 4 gives us that

R√n(β̂ − β) →d N_q(R·0, R Σ_{√n(β̂−β)} R') = N_q(0, R Σ_{√n(β̂−β)} R').

The left hand side can be rewritten as R√n(β̂ − β) = √n(Rβ̂ − Rβ) = √n[(Rβ̂ − r) − (Rβ − r)], which equals √n(Rβ̂ − r) if the null hypothesis is true. So if we divide by √n we get that (Rβ̂ − r) ∼^a N_q(0, (1/n) R Σ_{√n(β̂−β)} R') = N_q(0, R Σ_β̂ R'). Thus by using Theorem 6 we can construct a χ²-test of the hypothesis H_0: Rβ = r, using the test statistic (Rβ̂ − r)' [R Σ̂_β̂ R']^{-1} (Rβ̂ − r) →d χ²_q, or the equivalent F-test, which is based on the statistic

F_{Rβ=r} = (Rβ̂ − r)' [R Σ̂_β̂ R']^{-1} (Rβ̂ − r) / q →d F_{q,∞}.

Tables with critical values for the χ²_m-distribution can be found in S&W on page 645. The F_{m1,m2}-distribution is tabulated on pages 647-649, and the F_{m,∞}-distribution (which you will use most frequently) is tabulated on page 646, and (conveniently) on the very last page of the book.
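A generic implementation of the χ²- and F-versions of this test takes only a few lines. The sketch below is hypothetical; it assumes an estimated covariance matrix cov_hat of β̂ (e.g. from the earlier robust_cov sketch) and computes the statistics for H_0: Rβ = r.

    import numpy as np
    from scipy.stats import chi2

    def wald_test(beta_hat, cov_hat, R, r):
        """Chi-squared and F statistics for H0: R beta = r."""
        R = np.atleast_2d(np.asarray(R, dtype=float))
        diff = R @ beta_hat - r                          # R beta_hat - r
        middle = np.linalg.inv(R @ cov_hat @ R.T)        # [R Sigma_hat_beta R']^{-1}
        chi2_stat = float(diff @ middle @ diff)          # -> chi2_q under H0
        q = R.shape[0]
        p_value = 1 - chi2.cdf(chi2_stat, df=q)
        return chi2_stat, chi2_stat / q, p_value         # (chi2 statistic, F = chi2/q, asymptotic p-value)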

2.3 A Simple Example

Suppose that we have estimated

β̂ = ( 4.0, 2.5, −1.5 )'

and

Σ̂_β̂ = [ 801/40   −27/8      6
         −27/8      3/4    −9/8
            6      −9/8    15/8 ],

1. H_1: β_3 = 0. We note that H_1 is equivalent to

   R_1 β = r_1,   where R_1 = (0, 0, 1) and r_1 = 0.

   We find that

   R_1 β̂ = −1.5   and   R_1 Σ̂_β̂ R_1' = 15/8,

   such that the F-statistic of H_1 is given by

   F_{β_3=0} = (R_1 β̂ − r_1)' [R_1 Σ̂_β̂ R_1']^{-1} (R_1 β̂ − r_1) = (−1.5) (15/8)^{-1} (−1.5) = 1.2 ∼^a F_{1,∞}.

   Note that when we test a single restriction, the F-statistic is the square of the t-statistic:

   F_{β_3=0} = t_{β_3=0}^2 = (β̂_3 − 0)^2 / var̂(β̂_3).
c

2. H_2: β_2 = β_3. This hypothesis is equivalent to β_2 − β_3 = 0, so we set

   R_2 = (0, 1, −1) and r_2 = 0,

   and find

   R_2 β̂ = 2.5 − (−1.5) = 4   and   R_2 Σ̂_β̂ R_2' = 39/8.

   So the F-test is given by

   F_{β_2=β_3} = (R_2 β̂ − r_2)' [R_2 Σ̂_β̂ R_2']^{-1} (R_2 β̂ − r_2) = 4 (39/8)^{-1} 4 = 3.2821 ∼^a F_{1,∞}.
8

3. H_3: β_2 + β_3 = 0? Here we set

   R_3 = (0, 1, 1) and r_3 = 0.

   We find

   R_3 β̂ = 2.5 + (−1.5) = 1   and   R_3 Σ̂_β̂ R_3' = 3/8,

   and the F-test is given by

   F_{β_2+β_3=0} = 1 (3/8)^{-1} 1 = 2.6666 ∼^a F_{1,∞}.

4. H_4: β_2 = β_3 = 0. Now we set

   R_4 = [ 0  1  0
           0  0  1 ]   and   r_4 = ( 0, 0 )'.

   Then

   R_4 β̂ = ( 2.5, −1.5 )'   and   R_4 Σ̂_β̂ R_4' = [ 3/4   −9/8
                                                    −9/8   15/8 ],

   such that

   F_{β_2=β_3=0} = ( 2.5, −1.5 ) [ 3/4  −9/8 ; −9/8  15/8 ]^{-1} ( 2.5, −1.5 )' / 2 = 17.666 ∼^a F_{2,∞}.
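The four F-statistics above are easy to reproduce numerically. The sketch below plugs in the estimates given at the start of this example (the entries of Σ̂_β̂ in the first row and column do not enter any of the four tests, so they are carried along only for completeness).

    import numpy as np

    beta_hat = np.array([4.0, 2.5, -1.5])
    Sigma_hat = np.array([[801/40, -27/8,   6.0],
                          [-27/8,    3/4,  -9/8],
                          [  6.0,   -9/8,  15/8]])

    tests = {
        "H1: b3 = 0":      (np.array([[0., 0., 1.]]),   np.array([0.])),
        "H2: b2 = b3":     (np.array([[0., 1., -1.]]),  np.array([0.])),
        "H3: b2 + b3 = 0": (np.array([[0., 1., 1.]]),   np.array([0.])),
        "H4: b2 = b3 = 0": (np.array([[0., 1., 0.],
                                      [0., 0., 1.]]),   np.array([0., 0.])),
    }

    for name, (R, r) in tests.items():
        diff = R @ beta_hat - r
        F = diff @ np.linalg.inv(R @ Sigma_hat @ R.T) @ diff / R.shape[0]
        print(name, round(float(F), 4))                # 1.2, 3.2821, 2.6667, 17.6667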

3 Calculus with Conditional Expectations

When X_i is random, the concept of conditional expectation is an important tool in the analysis.
A conditional expected value is written E(Y|X = x), and denotes the expected value of Y when we know that X = x. (So E(Y) is the expected value of Y when we know nothing about other random variables.) When we write E(Y|X), we think of it as a function of X. Since X is random, E(Y|X) is also random (in the general case).
Some properties of conditional expected values are listed below.

1. If X and Y are independent, then E(Y |X) = E(Y ).


2. If cov(X, Y) ≠ 0, then E(Y|X) depends on X, in which case E(Y|X) is a random variable (as X is random). So it is meaningful to talk about the expected value of a conditional expected value.
3. E(Y f(X)|X) = E(Y|X) f(X), for any function f. So (functions of) the variable we condition on can be taken outside the conditional expectation.
4. E(Y ) = E[E(Y |X)].
The properties of conditional expectations are very useful for our analysis. We shall make use of arguments such as the one in the following example.
Example 7 Suppose that E(u|X) = 0. Then we have that

E(uX) (4)= E[E(uX|X)] (3)= E[E(u|X)X] = E[0 · X] = 0,

where the numbers refer to the properties listed above.
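Example 7 is easy to check by simulation: when E(u|X) = 0, the sample average of u_i X_i should be close to zero for large n, no matter how the variance of u depends on X. A minimal sketch, with an arbitrary heteroskedastic error chosen only for illustration:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 1_000_000
    X = rng.normal(size=n)
    u = rng.normal(size=n) * (1 + X**2)   # E(u|X) = 0, although var(u|X) depends on X

    print(np.mean(u * X))                 # close to 0, illustrating E(uX) = E[E(u|X) X] = 0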
