Teaching Notes 3

These notes discuss tests of linear restrictions in the single-equation linear regression model, with an emphasis on hypothesis testing using the F distribution. They define nested and nonnested models and explain how restrictions can be tested with an F test when models are nested: the restricted parameter vector is contained within the unrestricted one. They then show that the test statistic for a set of linear restrictions follows an F distribution with degrees of freedom equal to the number of restrictions and the sample size minus the number of parameters, so that several restrictions can be tested simultaneously. The notes close with a discussion of how regression estimates are used for prediction.



Single-equation linear regression model 2

1 Inference and prediction


We talked about basic statistical inference when we were discussing the finite
sample properties of the OLS estimator. There we learned how to construct the t
statistics and how to use them to test the significance of individual coefficients. We also
briefly mentioned the F test.

In this section, which is concerned only with linear restrictions, we will expand on
the theme of testing hypotheses using the F distribution. In addition, we will study how
regression estimates can be used for prediction or forecasting.

1.1 Restrictions and nested models


Let’s first try to understand the difference between nested and nonnested
models. Consider the following simple model of investment (Greene, p.81) where
investors are sensitive to nominal interest rates:

$$\ln I_t = \beta_1 + \beta_2 i_t + \beta_3 p_t + \beta_4 \ln Y_t + \beta_5 t + \varepsilon_t,$$

where $I$ is investment, $i$ is the nominal interest rate, $p$ is the rate of inflation, $Y$ is real output, and $t$ is a time trend.

An alternative model is

$$\ln I_t = \beta_1 + \beta_2 (i_t - p_t) + \beta_3 p_t + \beta_4 \ln Y_t + \beta_5 t + \varepsilon_t,$$

where investors are sensitive to real interest rates.

The new model embodies the theoretical conjecture that “investors care about real interest rates,” but since the second equation still contains both the nominal interest rate and inflation (it is simply a reparameterization of the first), the theory does not imply testable restrictions on the model.

However, if the theoretical conjecture is that “investors care only about real
interest rates,” the model becomes

$$\ln I_t = \beta_1 + \beta_2 (i_t - p_t) + \beta_4 \ln Y_t + \beta_5 t + \varepsilon_t,$$

which is a restricted version of the original model. Namely, the restricted (third) model can be obtained from the first by setting $\beta_3 = -\beta_2$. We can then test the hypothesis $\beta_2 + \beta_3 = 0$.

The first and third equations give an example of nested models: The hypothesis specified by the restricted model is contained within the unrestricted model. The first equation specifies a model with five unrestricted parameters $(\beta_1, \beta_2, \beta_3, \beta_4, \beta_5)$, whereas the vector of parameters associated with the third equation is $(\beta_1, \beta_2, -\beta_2, \beta_4, \beta_5)$. The latter subset of values is contained within the unrestricted set.
Consider now an alternative pair of models. Model 1 makes the conjecture that “investors care only about inflation,” whereas Model 2’s conjecture is that “investors care only about the nominal interest rate.” Since the parameter vector associated with the first model is $(\beta_1, 0, \beta_3, \beta_4, \beta_5)$, while that associated with the second is $(\beta_1, \beta_2, 0, \beta_4, \beta_5)$, neither model is obtained as a restriction on the other. We say that they are nonnested.

1.2 Tests of multiple restrictions

Consider a random vector $x \sim N(\mu, \Sigma)$. We want to obtain the distribution of the quadratic form $q = (x - \mu)^T \Sigma^{-1} (x - \mu)$. If we define $z = x - \mu$, this quadratic form can be written as $q = z^T \Sigma^{-1} z$. Notice that $z$ is normally distributed and that $\Sigma$ is also the variance matrix of $z$.[1]

The variance matrix $\Sigma$ is positive definite, and so has a (symmetric) square root matrix $\Sigma^{1/2}$, defined by $\Sigma^{1/2}\Sigma^{1/2} = \Sigma$. Hence $\Sigma^{-1} = \Sigma^{-1/2}\Sigma^{-1/2}$ and

$$z^T \Sigma^{-1} z = z^T \left(\Sigma^{-1/2}\right)\left(\Sigma^{-1/2}\right) z = \left(\Sigma^{-1/2} z\right)^T \left(\Sigma^{-1/2} z\right) = w^T w,$$

where $w = Az$, $A = \Sigma^{-1/2}$. Notice that $E(w) = A\,E(z) = 0$ and

$$\mathrm{var}(w) = A \Sigma A^T = \Sigma^{-1/2} \Sigma\, \Sigma^{-1/2} = \Sigma^{-1/2} \Sigma^{1/2} \Sigma^{1/2} \Sigma^{-1/2} = I.$$

[1] We have used the well-known result that if $x \sim N(\mu, \Sigma)$ then $Ax + b \sim N(A\mu + b,\; A\Sigma A^T)$.

Thus we have shown that $\Sigma^{-1/2}(x - \mu) \sim N(0, I)$, which in turn implies $(x - \mu)^T \Sigma^{-1} (x - \mu) \sim \chi^2(n)$.[2]
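To see this result in action, the following sketch (not part of the original notes; it assumes NumPy is available and uses arbitrary illustrative values for $\mu$ and $\Sigma$) simulates draws of $x \sim N(\mu, \Sigma)$ and checks that the quadratic form behaves like a $\chi^2(n)$ variable:

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative choices of mu and Sigma (any positive definite Sigma will do)
    n = 3
    mu = np.array([1.0, -2.0, 0.5])
    A = rng.standard_normal((n, n))
    Sigma = A @ A.T + n * np.eye(n)      # symmetric positive definite by construction
    Sigma_inv = np.linalg.inv(Sigma)

    # Draw many x ~ N(mu, Sigma) and form q = (x - mu)' Sigma^{-1} (x - mu) for each draw
    x = rng.multivariate_normal(mu, Sigma, size=100_000)
    z = x - mu
    q = np.einsum("ij,jk,ik->i", z, Sigma_inv, z)

    # q should behave like a chi-squared(n) variable: mean n, variance 2n
    print(q.mean(), q.var())             # close to 3 and 6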

Next consider testing a null hypothesis that consists of a set of $J$ linear restrictions, i.e., $H_0: R\beta - q = 0$, where $R$ is a $J \times K$ matrix of constants, $\beta$ is $K \times 1$, $q$ is a $J \times 1$ vector of constants, and $0$ is a $J \times 1$ vector of zeroes. The alternative hypothesis is $H_1: R\beta - q \neq 0$. Examples of $R$ and $q$ are the following:

1. $R = (0 \;\; 0 \;\; \cdots \;\; 1 \;\; \cdots \;\; 0 \;\; 0)$ and $q = 0$, where the number one appears in the $j$th position. In this case $R\beta = \beta_j$ and the null hypothesis is $H_0: \beta_j = 0$.

2. $R = (0 \;\; 1 \;\; 1 \;\; 1 \;\; 0 \;\; \cdots \;\; 0)$ and $q = 2$. In this case $R\beta = \beta_2 + \beta_3 + \beta_4$ and the null hypothesis is $H_0: \beta_2 + \beta_3 + \beta_4 = 2$.

3. $R = \begin{pmatrix} 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix}$, $q = \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix}$. In this case $R\beta = \begin{pmatrix} \beta_2 + \beta_5 \\ \beta_1 + \beta_4 \\ \beta_5 + \beta_6 \end{pmatrix}$ and the null hypothesis is $H_0: \beta_2 + \beta_5 = 1;\; \beta_1 + \beta_4 = 0;\; \beta_5 + \beta_6 = 3$.
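As a small illustration of the notation (not from the notes; the vector beta below is an arbitrary hypothetical example chosen to satisfy the restrictions), the third example can be coded in NumPy as follows:

    import numpy as np

    # R and q for the third example above: K = 6 parameters, J = 3 restrictions
    R = np.array([[0, 1, 0, 0, 1, 0],
                  [1, 0, 0, 1, 0, 0],
                  [0, 0, 0, 0, 1, 1]], dtype=float)
    q = np.array([1.0, 0.0, 3.0])

    # Any beta satisfying the null must give R @ beta - q = 0; the beta below is an
    # arbitrary illustrative vector chosen to satisfy all three restrictions.
    beta = np.array([-0.5, 0.0, 7.0, 0.5, 1.0, 2.0])
    print(R @ beta - q)                  # [0. 0. 0.]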

Now define $m = Rb - q$. As usual, we want to investigate whether the observed deviation of $m$ from $0$ is due to sampling error, in which case we cannot reject the null, or whether it is statistically significant. Since the OLS estimator $b$ is normally distributed and $m$ is a linear function of $b$, we know that $m$ is also normally distributed.

Under the null, we have

$$E[m \mid X] = R\,E[b \mid X] - q = R\beta - q = 0$$

and

$$\mathrm{var}[m \mid X] = \mathrm{var}[Rb - q \mid X] = R\,\mathrm{var}[b \mid X]\,R^T = \sigma^2 R (X^T X)^{-1} R^T.$$

Now define $W = m^T \left[\mathrm{var}(m \mid X)\right]^{-1} m$. According to the result we proved above,

$$W = (Rb - q)^T \left[\sigma^2 R (X^T X)^{-1} R^T\right]^{-1} (Rb - q) = \frac{(Rb - q)^T \left[R (X^T X)^{-1} R^T\right]^{-1} (Rb - q)}{\sigma^2} \sim \chi^2(J).$$

[2] We have used the following result: if $z \sim N(0, I)$ and $A$ is idempotent, then $z^T A z$ has a chi-squared distribution with degrees of freedom equal to the rank of $A$. Notice that here we set $A = I$.

Intuitively, the larger the value of $m$, the worse the failure of least squares to satisfy the restrictions. Therefore, a large chi-squared value will weigh against the hypothesis.

There is, however, one problem with the statistic $W$ defined above: it depends on $\sigma^2$, an unknown parameter. Let’s replace $\sigma^2$ with the estimator $s^2$, as we typically do, and define a new variable $F = W\sigma^2 / (J s^2)$. Notice that

$$F = \frac{\left\{(Rb - q)^T \left[R (X^T X)^{-1} R^T\right]^{-1} (Rb - q) / \sigma^2\right\} \big/ J}{\left\{(N - K) s^2 / \sigma^2\right\} \big/ (N - K)} \qquad (1)$$

$$= \frac{(Rb - q)^T \left[\sigma^2 R (X^T X)^{-1} R^T\right]^{-1} (Rb - q) \big/ J}{(N - K) s^2 \big/ \left[\sigma^2 (N - K)\right]}.$$

We know that $W$ has a chi-squared distribution with $J$ degrees of freedom and that $(N - K)s^2/\sigma^2$ has a chi-squared distribution with $N - K$ degrees of freedom. We also know that if $x_1$ and $x_2$ are two independent chi-squared variables with degrees of freedom $n_1$ and $n_2$ respectively, then the ratio $\dfrac{x_1/n_1}{x_2/n_2}$ has an F distribution with $n_1$ and $n_2$ degrees of freedom. So if we can show that $W$ and $(N - K)s^2/\sigma^2$ are independent, the statistic $F$ defined above has an F distribution with $J$ and $N - K$ degrees of freedom.

To show independence, notice first that, under the null,

$$Rb - q = Rb - R\beta = R(b - \beta) = R(X^T X)^{-1} X^T \varepsilon.$$

Thus

$$\frac{R(b - \beta)}{\sigma} = R(X^T X)^{-1} X^T \frac{\varepsilon}{\sigma} = D\,\frac{\varepsilon}{\sigma},$$

where $D = R(X^T X)^{-1} X^T$, and the numerator of $F$ can be written as


1

 R b      R b   
T ´T
1
    
   
T 1 T 1
 R X X R
 D   C D  
        

J J
   T 1         
T T

  D C D    T  
        ,
 
J J

where C = R  X T X  RT and T  DT C 1 D .
1

As for the denominator, we have already seen that it can be written as

$$\frac{(\varepsilon/\sigma)^T M\,(\varepsilon/\sigma)}{N - K}.$$

Recall that $M$ is an idempotent matrix. $T$ is also an idempotent matrix, for

$$T = D^T C^{-1} D = \left[R(X^T X)^{-1} X^T\right]^T C^{-1} \left[R(X^T X)^{-1} X^T\right] = X(X^T X)^{-1} R^T C^{-1} R (X^T X)^{-1} X^T,$$

and

$$T^2 = X(X^T X)^{-1} R^T C^{-1} R(X^T X)^{-1} X^T\, X(X^T X)^{-1} R^T C^{-1} R(X^T X)^{-1} X^T$$
$$= X(X^T X)^{-1} R^T C^{-1} R(X^T X)^{-1} R^T C^{-1} R(X^T X)^{-1} X^T$$
$$= X(X^T X)^{-1} R^T C^{-1} C\, C^{-1} R(X^T X)^{-1} X^T$$
$$= X(X^T X)^{-1} R^T C^{-1} R(X^T X)^{-1} X^T = T.$$

We have thus shown that $(\varepsilon/\sigma)^T M (\varepsilon/\sigma)$ and $(\varepsilon/\sigma)^T T (\varepsilon/\sigma)$ are two idempotent quadratic forms in $\varepsilon/\sigma$, a standard normal vector. Since $TM = 0$, these quadratic forms are independent.[3] As a result, the numerator and denominator of $F$ are also independent.
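These two algebraic facts, $T^2 = T$ and $TM = 0$, are easy to verify numerically. The sketch below is my own illustration (not from the notes), with an arbitrary simulated $X$ and $R$ and NumPy assumed available:

    import numpy as np

    rng = np.random.default_rng(1)

    # Illustrative dimensions: N observations, K regressors, J restrictions
    N, K, J = 50, 5, 2
    X = rng.standard_normal((N, K))
    R = rng.standard_normal((J, K))

    XtX_inv = np.linalg.inv(X.T @ X)
    D = R @ XtX_inv @ X.T                        # D = R (X'X)^{-1} X'
    C = R @ XtX_inv @ R.T                        # C = R (X'X)^{-1} R'
    T = D.T @ np.linalg.inv(C) @ D               # T = D' C^{-1} D
    M = np.eye(N) - X @ XtX_inv @ X.T            # residual-maker matrix

    print(np.allclose(T @ T, T))                 # True: T is idempotent
    print(np.allclose(T @ M, np.zeros((N, N))))  # True: TM = 0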

The formula for the statistic F in (1) can be simplified to

$$F = \frac{(Rb - q)^T \left[R\left(s^2 (X^T X)^{-1}\right) R^T\right]^{-1} (Rb - q)}{J}.$$

[3] We are using the following result here: if $x^T A x$ and $x^T B x$ are idempotent quadratic forms in a standard normal vector $x$, then these quadratic forms are independent if $AB = 0$. See Greene, Appendix B.
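A minimal computational sketch of this formula, assuming NumPy and SciPy and using simulated data rather than anything from the notes, is the following (the restrictions tested are hypothetical and chosen to hold in the simulated population, so the resulting F should be unremarkable):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    # Simulated data; the true coefficients and the restrictions below are illustrative only
    N, K = 200, 4
    X = np.column_stack([np.ones(N), rng.standard_normal((N, K - 1))])
    beta_true = np.array([1.0, 0.5, -0.5, 0.0])
    y = X @ beta_true + rng.standard_normal(N)

    # OLS estimates, residual variance, and (X'X)^{-1}
    b = np.linalg.solve(X.T @ X, X.T @ y)
    s2 = np.sum((y - X @ b) ** 2) / (N - K)
    XtX_inv = np.linalg.inv(X.T @ X)

    # Test H0: beta_2 + beta_3 = 0 and beta_4 = 0 (J = 2 restrictions)
    R = np.array([[0.0, 1.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    q = np.zeros(2)
    J = R.shape[0]

    m = R @ b - q
    F = m @ np.linalg.solve(R @ (s2 * XtX_inv) @ R.T, m) / J
    p_value = stats.f.sf(F, J, N - K)            # upper-tail p-value of F(J, N-K)
    print(F, p_value)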

When there is only one linear restriction, we can use the sample estimate of the restriction $r_1\beta_1 + r_2\beta_2 + \cdots + r_K\beta_K = r^T\beta = q$ to conduct a t test. The sample estimate of $r^T\beta$ is $r^T b = \hat q$, and so we can form the t statistic

$$t = \frac{\hat q - q}{\mathrm{se}(\hat q)}.$$

If $\hat q$ differs significantly from $q$, doubt is cast on the validity of the null hypothesis. More precisely, if the absolute value of the t ratio is larger than the appropriate critical value, we reject the null.

But we need an estimate of the standard error of $\hat q$ in order to perform the test. This can be easily obtained, for $\hat q$ is a linear function of $b$, whose estimated covariance matrix is $s^2 (X^T X)^{-1}$. Recall that $\mathrm{var}(a^T x) = a^T \Sigma a$, where $\Sigma = \mathrm{var}(x)$, and so

$$\mathrm{Est.var}\left(\hat q \mid X\right) = r^T \left[s^2 (X^T X)^{-1}\right] r.$$

Notice that

$$t^2 = \frac{(\hat q - q)^2}{\mathrm{var}\left(\hat q - q \mid X\right)} = \frac{(r^T b - q)^T \left[r^T \left(s^2 (X^T X)^{-1}\right) r\right]^{-1} (r^T b - q)}{1},$$

which coincides with the F statistic for testing a single restriction.
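A quick numerical check of this equivalence, again with simulated data and a hypothetical single restriction (NumPy assumed; not part of the original notes):

    import numpy as np

    rng = np.random.default_rng(3)

    # Simulated data (illustrative); test the single restriction H0: beta_2 = 0
    N, K = 100, 3
    X = np.column_stack([np.ones(N), rng.standard_normal((N, K - 1))])
    y = X @ np.array([2.0, 0.0, 1.0]) + rng.standard_normal(N)

    b = np.linalg.solve(X.T @ X, X.T @ y)
    s2 = np.sum((y - X @ b) ** 2) / (N - K)
    V = s2 * np.linalg.inv(X.T @ X)          # estimated covariance matrix of b

    r = np.array([0.0, 1.0, 0.0])            # r'beta = beta_2
    q = 0.0
    q_hat = r @ b
    t = (q_hat - q) / np.sqrt(r @ V @ r)     # t statistic
    F = (q_hat - q) ** 2 / (r @ V @ r)       # F statistic with J = 1

    print(t ** 2, F)                         # identical up to rounding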

Example: (Greene, pp. 86-87) Consider the model

$$\ln I_t = \beta_1 + \beta_2 i_t + \beta_3 p_t + \beta_4 \ln Y_t + \beta_5 t + \varepsilon_t,$$

and say we are interested in testing the hypothesis that “investors care only about real interest rates.” A natural way to do this is to stipulate the null hypothesis $H_0: \beta_2 + \beta_3 = 0$, which says that equal increases in the interest rate and the rate of inflation have no independent effect on investment.

Greene (p. 86) gives estimates of the parameters of the model using quarterly data from 1950.1 to 2000.4. He also computes the standard error of our estimator $\hat q = b_2 + b_3$ and obtains $\mathrm{se}(\hat q) = 0.002866$. We can then form the t ratio



0.00860  0.00331
t  1.845 .
0.002866

The appropriate value from the t distribution (with $203 - 5 = 198$ degrees of freedom)[4] at a significance level of 5% is 1.96. Since $|t| < 1.96$, the null hypothesis is not rejected.

Now let’s test the joint hypothesis that $\beta_2 + \beta_3 = 0$, $\beta_4 = 1$, and $\beta_5 = 0$. The appropriate vectors and matrices are

$$R = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \qquad q = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}.$$

We can then compute

$$Rb - q = \begin{pmatrix} -0.0053 \\ 0.9302 \\ -0.0057 \end{pmatrix}$$

and $F = 109.84$. The 5% critical value from the F distribution (with 3 and 198 degrees of freedom) is 2.65. We therefore reject the null hypothesis.

One last comment about the F test of linear restrictions is that the F statistic can be expressed in terms of measures of goodness of fit. Let $R^2$ be the coefficient of determination of the original, unrestricted regression model, and let $R_*^2$ be that of the restricted model. The restricted model is simply the original model subject to the set of constraints $R\beta = q$.[5]

One can show that

$$F = \frac{\left(R^2 - R_*^2\right)/J}{\left(1 - R^2\right)/(N - K)}.$$

This formula is an example of an approach to hypothesis testing that focuses on the fit of the regression. We will not pursue this approach here.
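For completeness, the goodness-of-fit version of the statistic is trivial to compute once the two $R^2$ values are known; the numbers below are purely illustrative, not taken from the notes:

    # A minimal numeric sketch of the R^2-based formula; all values are hypothetical.
    R2_unrestricted = 0.92
    R2_restricted = 0.90
    J, N, K = 2, 200, 5

    F = ((R2_unrestricted - R2_restricted) / J) / ((1 - R2_unrestricted) / (N - K))
    print(F)   # compare with the critical value of an F(J, N-K) distribution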

[4] There are 203 observations, for one observation is lost in computing the change in the consumer price index.

[5] The OLS estimator of the restricted model is the solution to the problem of minimizing $(y - X\beta)^T (y - X\beta)$ subject to $R\beta = q$.

1.3 Prediction[6]

Regression results are commonly used to predict the value of the dependent variable. Suppose we want to predict the value $y$ associated with a vector of independent variables $x^0$. This value would be $y^0 = (x^0)^T \beta + \varepsilon^0$.

The Gauss-Markov theorem tells us that $\hat y^0 = (x^0)^T b$ is the best (minimum variance) linear unbiased estimator of $E[y^0 \mid x^0]$. The error in our prediction is

$$e^0 = y^0 - \hat y^0 = (\beta - b)^T x^0 + \varepsilon^0,$$

and the prediction variance is

$$\mathrm{var}\left[e^0 \mid X, x^0\right] = \sigma^2 + \mathrm{var}\left[(\beta - b)^T x^0 \mid X, x^0\right] = \sigma^2 + (x^0)^T \left[\sigma^2 (X^T X)^{-1}\right] x^0.$$

When the regression contains a constant term, one can show that

$$\mathrm{var}\left[e^0\right] = \sigma^2 \left[1 + \frac{1}{n} + \sum_{j=1}^{K-1} \sum_{k=1}^{K-1} \left(x_j^0 - \bar x_j\right)\left(x_k^0 - \bar x_k\right) \left(Z^T M^0 Z\right)^{jk}\right],$$

where $Z$ is the matrix with the $K - 1$ columns of $X$ not including the constant, and $(Z^T M^0 Z)^{jk}$ denotes the $jk$-th element of the inverse of $Z^T M^0 Z$. After inspection of this formula, we notice that the prediction variance increases with the distance of the elements of $x^0$ from the center of the data. This makes sense, for the degree of uncertainty should increase as we venture away from the average.

The prediction variance formulas above include the unknown parameter $\sigma^2$, and so we need to replace it with an estimator. As usual, we use $s^2$ for that. A confidence interval around $y^0$ can then be formed:

$$\text{prediction interval} = \hat y^0 \pm t_{1-\alpha/2}\, \mathrm{se}\left(e^0\right).$$
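The following sketch (my own illustration with simulated data, assuming NumPy and SciPy; none of the numbers come from the notes) puts the pieces together: it computes $\hat y^0$, the prediction standard error, and the resulting interval at a hypothetical point $x^0$:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)

    # Simulated regression (illustrative), then a prediction interval at a new point x0
    N, K = 120, 3
    X = np.column_stack([np.ones(N), rng.standard_normal((N, K - 1))])
    y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(N)

    b = np.linalg.solve(X.T @ X, X.T @ y)
    s2 = np.sum((y - X @ b) ** 2) / (N - K)
    XtX_inv = np.linalg.inv(X.T @ X)

    x0 = np.array([1.0, 0.5, -0.3])                 # new regressor vector (with constant)
    y0_hat = x0 @ b                                 # point prediction
    se_e0 = np.sqrt(s2 + x0 @ (s2 * XtX_inv) @ x0)  # prediction standard error
    t_crit = stats.t.ppf(0.975, N - K)              # two-sided 95% critical value

    print(y0_hat - t_crit * se_e0, y0_hat + t_crit * se_e0)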

The figure below shows the behavior of the prediction interval for a bivariate
regression.

[6] The term “prediction” typically means using the regression model to compute fitted values of the dependent variable, either in-sample or out-of-sample. The term “forecasting” is normally associated with time-series models, where one is interested in future values of the dependent variable, and so time plays an explicit role.
[Figure: behavior of the prediction interval for a bivariate regression. Source: Greene.]

Example: We continue the analysis of the previous example. In the first quarter of 2001, the average rate for the 90-day T-bill was 4.48%, real GDP was 9316.8, the rate of inflation on a yearly basis was 5.26%, and the time trend would equal 204. In order to predict the natural log of investment in the first quarter of 2001, we use the data vector (notice that we take the natural log of real GDP)

$$x^0 = \left(1,\; 4.48,\; 5.26,\; 9.1396,\; 204\right)^T.$$

Given the parameter estimates, we can compute

$$(x^0)^T b = \begin{pmatrix} 1 & 4.48 & 5.26 & 9.1396 & 204 \end{pmatrix} \begin{pmatrix} -9.1345 \\ -0.008601 \\ 0.003308 \\ 1.9302 \\ -0.005659 \end{pmatrix} = 7.3312.$$

The estimated prediction variance is $s^2 + (x^0)^T \left[s^2 (X^T X)^{-1}\right] x^0 = 0.0076912$, and so the prediction standard deviation is $0.087699$. We can then obtain the prediction interval: $7.3312 \pm 1.96 \times 0.087699 = (7.1593,\; 7.5031)$. The actual value of the yearly rate of real investment in the first quarter of 2001 was 1721, and its natural log is 7.4507. Therefore the true value belongs to the prediction interval.
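As a quick arithmetic check of the interval reported above (using only the numbers given in the notes and NumPy for the logarithm), one can verify in a couple of lines that $\ln(1721)$ indeed falls inside it:

    import numpy as np

    # Check the interval computed above, using the numbers from the notes
    y0_hat, se_pred = 7.3312, 0.087699
    lower, upper = y0_hat - 1.96 * se_pred, y0_hat + 1.96 * se_pred
    print(lower, upper)                   # roughly (7.1593, 7.5031)
    print(lower < np.log(1721) < upper)   # True: ln(1721) = 7.4507 is inside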

What we did in this section assumes that $x^0$ is either known with certainty or can be forecasted perfectly. If, however, $x^0$ itself needs to be forecasted, then the formulas we obtained need to be modified to include the variation in $x^0$. We will not discuss this case here.
