Teaching Notes 3
In this section, which is concerned only with linear restrictions, we will expand on
the theme of testing hypotheses using the F distribution. In addition, we will study how
regression estimates can be used for prediction or forecasting.
An alternative model is
$$ \ln I_t = \beta_1 + \beta_2 (i_t - \Delta p_t) + \beta_3 \Delta p_t + \beta_4 \ln Y_t + \beta_5 t + \varepsilon_t . $$
The new model embodies the theoretical conjecture that “investors care about real
interest rates,” but since the second equation contains both nominal interest and
inflation, the theory does not imply testable restrictions on the model.
However, if the theoretical conjecture is that “investors care only about real
interest rates,” the model becomes
$$ \ln I_t = \beta_1 + \beta_2 (i_t - \Delta p_t) + \beta_4 \ln Y_t + \beta_5 t + \varepsilon_t , $$
which is a restricted version of the original model. Namely, the restricted model can be
obtained from the first by setting $\beta_3 = -\beta_2$. We can then test the hypothesis $\beta_2 + \beta_3 = 0$.
The first and third equations give an example of nested models: the hypothesis
specified by the restricted model is contained within the unrestricted model. The first
equation specifies a model with five unrestricted parameters $(\beta_1, \beta_2, \beta_3, \beta_4, \beta_5)$,
whereas the vector of parameters associated with the third equation is
$(\beta_1, \beta_2, -\beta_2, \beta_4, \beta_5)$. The latter subset of values is contained within the unrestricted set.
Consider now an alternative pair of models. Model 1 makes the conjecture that
“investors care only about inflation,” whereas Model 2’s conjecture is that “investors
care only about the nominal interest rate.” Since the parameter vector associated with
the first model is $(\beta_1, 0, \beta_3, \beta_4, \beta_5)$, while that associated with the second is
$(\beta_1, \beta_2, 0, \beta_4, \beta_5)$, neither vector is contained within the other: the models are nonnested.
The statistic can be written as $q = z^{T}\Sigma^{-1}z$. Notice that $z$ is normally distributed and that $\Sigma$ is the
variance matrix of $z$.
The variance matrix $\Sigma$ is positive definite, and so it has a (symmetric) square root
matrix $\Sigma^{1/2}$, defined by $\Sigma = \Sigma^{1/2}\Sigma^{1/2}$. Hence $\Sigma^{-1} = \Sigma^{-1/2}\Sigma^{-1/2}$ and
$$ z^{T}\Sigma^{-1}z = z^{T}\Sigma^{-1/2}\Sigma^{-1/2}z = \left(\Sigma^{-1/2}z\right)^{T}\left(\Sigma^{-1/2}z\right) = w^{T}w , $$
where $w = \Sigma^{-1/2}z \sim N(0, I)$.¹ It follows that
$$ q = z^{T}\Sigma^{-1}z = w^{T}w \sim \chi^{2}(n) , $$
where $n$ is the dimension of $z$.

¹ We have used the well-known result that if $x \sim N(\mu, \Sigma)$ then $Ax + b \sim N(A\mu + b,\ A\Sigma A^{T})$.
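The square-root-matrix argument can be checked numerically. A minimal sketch in Python, where the 3×3 variance matrix Sigma below is a hypothetical example:

```python
import numpy as np

rng = np.random.default_rng(0)

# A positive definite variance matrix Sigma (hypothetical 3x3 example).
A = rng.standard_normal((3, 3))
Sigma = A @ A.T + 3 * np.eye(3)

# Symmetric square root via the eigendecomposition Sigma = V diag(lam) V^T.
lam, V = np.linalg.eigh(Sigma)
Sigma_half = V @ np.diag(np.sqrt(lam)) @ V.T          # Sigma^{1/2}
Sigma_half_inv = V @ np.diag(1 / np.sqrt(lam)) @ V.T  # Sigma^{-1/2}

# Check the defining property Sigma^{1/2} Sigma^{1/2} = Sigma.
assert np.allclose(Sigma_half @ Sigma_half, Sigma)

# For any z, z' Sigma^{-1} z equals w'w with w = Sigma^{-1/2} z.
z = rng.standard_normal(3)
q = z @ np.linalg.inv(Sigma) @ z
w = Sigma_half_inv @ z
assert np.allclose(q, w @ w)
print("quadratic form matches:", q)
```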
Next consider testing a null hypothesis that consists of a set of $J$ linear restrictions,
i.e., $H_0: R\beta - q = 0$, where $R$ is a $J \times K$ matrix of constants, $\beta$ is $K \times 1$, and $q$ is a $J \times 1$
vector of constants.
3. $R = \begin{pmatrix} 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix}$, $q = \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix}$. In this case $R\beta = \begin{pmatrix} \beta_2 + \beta_5 \\ \beta_1 + \beta_4 \\ \beta_5 + \beta_6 \end{pmatrix}$ and the null
hypothesis is $H_0: \beta_2 + \beta_5 = 1;\ \beta_1 + \beta_4 = 0;\ \beta_5 + \beta_6 = 3$.
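The restriction matrix in example 3 is easy to set up and check numerically; the particular beta below is a hypothetical vector chosen to satisfy all three restrictions:

```python
import numpy as np

# Restriction matrix R (J x K, here 3 x 6) and vector q from example 3.
R = np.array([
    [0, 1, 0, 0, 1, 0],   # beta_2 + beta_5
    [1, 0, 0, 1, 0, 0],   # beta_1 + beta_4
    [0, 0, 0, 0, 1, 1],   # beta_5 + beta_6
], dtype=float)
q = np.array([1.0, 0.0, 3.0])

# A hypothetical beta that satisfies all three restrictions.
beta = np.array([1.0, 0.5, 7.0, -1.0, 0.5, 2.5])

print(R @ beta - q)  # all zeros: H0 holds exactly for this beta
```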
Let $m = Rb - q$. Under $H_0$,
$$ E[m \mid X] = R\,E[b \mid X] - q = R\beta - q = 0 $$
and
$$ \operatorname{var}[m \mid X] = R\,\operatorname{var}[b \mid X]\,R^{T} = \sigma^{2} R (X^{T}X)^{-1} R^{T} . $$
Now define $W = m^{T}\left\{\operatorname{var}[m \mid X]\right\}^{-1} m$. According to the result we proved above,²

² We have used the following result: If $z \sim N(0, I)$ and $A$ is idempotent, then $z^{T}Az$ has a chi-squared
distribution with degrees of freedom equal to the rank of $A$. Notice that we set $A = I$.
$$ W = (Rb - q)^{T}\left[\sigma^{2} R (X^{T}X)^{-1} R^{T}\right]^{-1}(Rb - q) = \frac{(Rb - q)^{T}\left[R (X^{T}X)^{-1} R^{T}\right]^{-1}(Rb - q)}{\sigma^{2}} \sim \chi^{2}(J) . $$
Intuitively, the larger the value of $m$, the worse the failure of least squares to
satisfy the restriction. Therefore, a large chi-squared value will weigh against the
hypothesis.
There is, however, one problem with the statistic $W$ defined above: it depends on
$\sigma^{2}$, an unknown parameter. Let’s replace $\sigma^{2}$ with the estimator $s^{2}$, as we typically do,
and define a new variable $F = W\sigma^{2}/(Js^{2})$. Notice that
$$ F = \frac{(Rb - q)^{T}\left[\sigma^{2} R (X^{T}X)^{-1} R^{T}\right]^{-1}(Rb - q)\,/\,J}{\left[(N-K)s^{2}/\sigma^{2}\right]/(N-K)} = \frac{(Rb - q)^{T}\left[R (X^{T}X)^{-1} R^{T}\right]^{-1}(Rb - q)/J}{s^{2}} . \qquad (1) $$
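The statistic in (1) can be sketched numerically. The simulated data set and the single restriction beta_2 + beta_3 = 0 below are hypothetical stand-ins for the investment model, with the restriction true by construction:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data: N observations, K = 4 regressors (first column is a constant).
N, K = 200, 4
X = np.column_stack([np.ones(N), rng.standard_normal((N, 3))])
beta_true = np.array([1.0, 2.0, -2.0, 0.5])   # note beta_2 + beta_3 = 0
y = X @ beta_true + rng.standard_normal(N)

# OLS estimates and s^2.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
s2 = e @ e / (N - K)

# Test the J = 1 restriction beta_2 + beta_3 = 0, i.e. R beta = q.
R = np.array([[0.0, 1.0, 1.0, 0.0]])
q = np.array([0.0])
J = R.shape[0]

m = R @ b - q
C = R @ np.linalg.inv(X.T @ X) @ R.T
F = (m @ np.linalg.solve(C, m)) / (J * s2)
print("F statistic:", F)  # small, since the restriction is true in the simulation
```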
We know that $W$ has a chi-squared distribution with $J$ degrees of freedom and that
$(N-K)s^{2}/\sigma^{2}$ has a chi-squared distribution with $N-K$ degrees of freedom. We also
know that if $x_1$ and $x_2$ are two independent chi-squared variables with degrees of
freedom $n_1$ and $n_2$ respectively, then the ratio $\dfrac{x_1/n_1}{x_2/n_2}$ has an $F$ distribution with $n_1$ and
$n_2$ degrees of freedom. It remains to verify that the numerator and denominator of $F$ in (1)
are independent. To do so, write both as quadratic forms in $\varepsilon$. First, note that
$$ Rb - q = Rb - R\beta = R(b - \beta) = R(X^{T}X)^{-1}X^{T}\varepsilon . $$
Thus
$$ Rb - q = R(X^{T}X)^{-1}X^{T}\varepsilon \equiv D\varepsilon , $$
where $D = R(X^{T}X)^{-1}X^{T}$.
The quadratic form in the numerator of $F$ can then be written as
$$ \frac{(Rb - q)^{T}\left[R(X^{T}X)^{-1}R^{T}\right]^{-1}(Rb - q)}{J} = \frac{(D\varepsilon)^{T}C^{-1}(D\varepsilon)}{J} = \frac{\varepsilon^{T}D^{T}C^{-1}D\varepsilon}{J} = \frac{\varepsilon^{T}T\varepsilon}{J} , $$
where $C = R(X^{T}X)^{-1}R^{T}$ and $T = D^{T}C^{-1}D$.
Similarly, since $(N-K)s^{2} = e^{T}e = \varepsilon^{T}M\varepsilon$, the quadratic form in the denominator of $F$ is
$$ \frac{\varepsilon^{T}M\varepsilon}{N-K} , $$
where $M = I - X(X^{T}X)^{-1}X^{T}$.
Now,
$$ T = D^{T}C^{-1}D = X(X^{T}X)^{-1}R^{T}C^{-1}R(X^{T}X)^{-1}X^{T} , $$
and
$$ T^{2} = X(X^{T}X)^{-1}R^{T}C^{-1}R(X^{T}X)^{-1}X^{T}\,X(X^{T}X)^{-1}R^{T}C^{-1}R(X^{T}X)^{-1}X^{T} $$
$$ = X(X^{T}X)^{-1}R^{T}C^{-1}\left[R(X^{T}X)^{-1}R^{T}\right]C^{-1}R(X^{T}X)^{-1}X^{T} $$
$$ = X(X^{T}X)^{-1}R^{T}C^{-1}CC^{-1}R(X^{T}X)^{-1}X^{T} $$
$$ = X(X^{T}X)^{-1}R^{T}C^{-1}R(X^{T}X)^{-1}X^{T} = T , $$
so $T$ is idempotent.
$M$ is idempotent as well, and since $X^{T}M = 0$ we have $TM = 0$, so the two quadratic
forms are independent.³ As a result, the numerator and denominator of $F$ are also
independent.

³ We are using the following result here: If $x^{T}Ax$ and $x^{T}Bx$ are idempotent quadratic forms in a
standard normal vector, then these quadratic forms are independent if $AB = 0$. See Greene, Appendix B.
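The idempotency of T and the condition TM = 0 can be verified numerically for a hypothetical design matrix X and restriction matrix R:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical design matrix X (N x K) and restriction matrix R (J x K).
N, K = 50, 4
X = np.column_stack([np.ones(N), rng.standard_normal((N, 3))])
R = np.array([[0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

XtX_inv = np.linalg.inv(X.T @ X)
D = R @ XtX_inv @ X.T                 # D = R (X'X)^{-1} X'
C = R @ XtX_inv @ R.T                 # C = R (X'X)^{-1} R'
T = D.T @ np.linalg.inv(C) @ D        # T = D' C^{-1} D
M = np.eye(N) - X @ XtX_inv @ X.T     # residual maker M

assert np.allclose(T @ T, T)                 # T is idempotent
assert np.allclose(T @ M, np.zeros((N, N)))  # TM = 0 -> independence
print("rank of T:", np.linalg.matrix_rank(T))  # equals J = 2
```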
Replacing $\sigma^{2}$ with $s^{2}$ as above, we conclude that
$$ F = \frac{(Rb - q)^{T}\left[s^{2}R(X^{T}X)^{-1}R^{T}\right]^{-1}(Rb - q)}{J} $$
has an $F$ distribution with $J$ and $N-K$ degrees of freedom.
When there is only one linear restriction, we can use the sample estimate of the
restriction $r_1\beta_1 + r_2\beta_2 + \cdots + r_K\beta_K = r^{T}\beta = q$ to conduct a t-test. The sample estimate of
$r^{T}\beta$ is $r^{T}b = \hat{q}$, and so we can form the t statistic
$$ t = \frac{\hat{q} - q}{\operatorname{se}(\hat{q})} . $$
If $\hat{q}$ differs significantly from $q$, doubt is cast on the validity of the null hypothesis. More precisely, if the
absolute value of the t ratio is larger than the appropriate critical value, we reject the
null.
But we need an estimate of the standard error of $\hat{q}$ in order to perform the test.
This can be easily obtained, for $\hat{q}$ is a linear function of $b$, whose estimated covariance
matrix is $s^{2}(X^{T}X)^{-1}$. Hence
$$ \operatorname{Est.var}[\hat{q} \mid X] = r^{T}\left[s^{2}(X^{T}X)^{-1}\right]r . $$
Notice that
$$ F = (r^{T}b - q)^{T}\left[r^{T}s^{2}(X^{T}X)^{-1}r\right]^{-1}(r^{T}b - q) = \frac{(\hat{q} - q)^{2}}{\operatorname{Est.var}[\hat{q} - q \mid X]} = t^{2} , $$
so with a single restriction the $F$ statistic is simply the square of the $t$ statistic.
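The identity F = t² for a single restriction can be confirmed on simulated (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated regression with K = 3 parameters (hypothetical data).
N, K = 120, 3
X = np.column_stack([np.ones(N), rng.standard_normal((N, 2))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.standard_normal(N)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
s2 = e @ e / (N - K)
XtX_inv = np.linalg.inv(X.T @ X)

# Single restriction r' beta = q, here beta_2 + beta_3 = 0.
r = np.array([0.0, 1.0, 1.0])
q = 0.0

q_hat = r @ b
se_q = np.sqrt(r @ (s2 * XtX_inv) @ r)
t = (q_hat - q) / se_q

# F statistic for the same (J = 1) restriction.
F = (q_hat - q) ** 2 / (r @ (s2 * XtX_inv) @ r)

assert np.allclose(F, t ** 2)  # with one restriction, F = t^2
print("t =", t, " F =", F)
```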
Example: Consider again the investment equation, and say we are interested in testing
the hypothesis that “investors care only about real interest rates.” A natural way to do
this is to stipulate the null hypothesis $H_0: \beta_2 + \beta_3 = 0$, i.e., that equal increases in the
interest rate and the rate of inflation have no independent effect on investment.
Greene (p. 86) gives estimates of the parameters of the model using quarterly data
from 1950.1 to 2000.4.⁴ He also computes the standard error of our estimator $\hat{q} = b_2 + b_3$,
which yields
$$ t = \frac{-0.00860 + 0.00331}{0.002866} = -1.845 . $$
The appropriate critical value from the t distribution (with $203 - 5 = 198$ degrees of freedom) at a
significance level of 5% is 1.96. Therefore, the null hypothesis is not rejected.
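As an arithmetic check of the reported t ratio:

```python
# Check of the t ratio from the reported estimates:
# b2 = -0.00860, b3 = 0.00331, se(q_hat) = 0.002866.
q_hat = -0.00860 + 0.00331
t = q_hat / 0.002866
print(f"t = {t:.3f}")  # -> t = -1.846 (the text's -1.845 comes from unrounded inputs)
```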
Example: Consider now the joint null hypothesis $\beta_2 + \beta_3 = 0$, $\beta_4 = 1$, $\beta_5 = 0$, which
corresponds to
$$ R = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \qquad q = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} . $$
Here
$$ Rb - q = \begin{pmatrix} -0.0053 \\ 0.9302 \\ -0.0057 \end{pmatrix} $$
and $F = 109.84$. The 5% critical value from the F distribution (with 3 and 198 degrees
of freedom) is 2.65. We therefore reject the null hypothesis.
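The vector Rb − q can be reproduced from Greene's coefficient estimates as used in these notes:

```python
import numpy as np

# Coefficient estimates reported in the text (Greene's investment equation).
b = np.array([-9.1345, -0.008601, 0.003308, 1.9302, -0.005659])

# Joint restrictions: beta2 + beta3 = 0, beta4 = 1, beta5 = 0.
R = np.array([
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])
q = np.array([0.0, 1.0, 0.0])

print(np.round(R @ b - q, 4))  # matches the text: (-0.0053, 0.9302, -0.0057)
```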
One last comment about the F test of linear restrictions is that the F statistic can
be expressed in terms of measures of goodness of fit. Let $R^{2}$ be the coefficient of
determination of the original, unrestricted regression model, and let $R_{*}^{2}$ be that of the
restricted model. The restricted model is simply the original model subject to the set of
constraints $R\beta = q$.⁵ Then
$$ F = \frac{\left(R^{2} - R_{*}^{2}\right)/J}{\left(1 - R^{2}\right)/(N-K)} . $$
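The equivalence between the Wald form and the goodness-of-fit form of the F statistic can be checked on simulated (hypothetical) data, using the restricted least squares solution b* = b − (X'X)⁻¹R'C⁻¹(Rb − q), a standard result stated here without derivation:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated unrestricted model with K = 4 parameters.
N, K = 150, 4
X = np.column_stack([np.ones(N), rng.standard_normal((N, 3))])
y = X @ np.array([1.0, 0.8, -0.8, 0.3]) + rng.standard_normal(N)

# Unrestricted OLS.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
s2 = e @ e / (N - K)
XtX_inv = np.linalg.inv(X.T @ X)

# Restrictions R beta = q (J = 2): beta2 + beta3 = 0 and beta4 = 0.
R = np.array([[0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
q = np.zeros(2)
J = R.shape[0]

# Wald form of the F statistic.
m = R @ b - q
C = R @ XtX_inv @ R.T
F_wald = (m @ np.linalg.solve(C, m)) / (J * s2)

# Restricted least squares: b* = b - (X'X)^{-1} R' C^{-1} (Rb - q).
b_star = b - XtX_inv @ R.T @ np.linalg.solve(C, m)
e_star = y - X @ b_star

# Goodness-of-fit form of the F statistic.
tss = np.sum((y - y.mean()) ** 2)
R2 = 1 - e @ e / tss
R2_star = 1 - e_star @ e_star / tss
F_fit = ((R2 - R2_star) / J) / ((1 - R2) / (N - K))

assert np.allclose(F_wald, F_fit)
print("F (Wald) =", F_wald, " F (fit) =", F_fit)
```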
⁴ There are 203 observations, for one observation is lost in computing the change in the consumer price
index.
⁵ The OLS estimator of the restricted model is the solution to the problem of minimizing
$(y - X\beta)^{T}(y - X\beta)$ subject to $R\beta = q$.
1.3 Prediction⁶
Regression results are commonly used to predict the value of the dependent
variable. Suppose we want to predict the value $y^{0}$ associated with a vector of independent
variables $x^{0}$. The prediction error is
$$ e^{0} = y^{0} - \hat{y}^{0} = (\beta - b)^{T}x^{0} + \varepsilon^{0} , $$
with variance
$$ \operatorname{var}[e^{0}] = \sigma^{2} + x^{0T}\left[\sigma^{2}(X^{T}X)^{-1}\right]x^{0} . $$
When the regression contains a constant term, one can show that
$$ \operatorname{var}[e^{0}] = \sigma^{2}\left[1 + \frac{1}{n} + \sum_{j=1}^{K-1}\sum_{k=1}^{K-1}\left(x_{j}^{0} - \bar{x}_{j}\right)\left(x_{k}^{0} - \bar{x}_{k}\right)\left(Z^{T}M^{0}Z\right)^{jk}\right] , $$
where $Z$ is the matrix with the $K-1$ columns of $X$ not including the constant, $M^{0}$ is the
matrix that transforms observations into deviations from their sample means, and the
superscript $jk$ denotes the $jk$-th element of the inverse of $Z^{T}M^{0}Z$. After
inspection of this formula, we notice that the prediction variance increases with the
distance of the elements of $x^{0}$ from the center of the data. This makes sense, for the
degree of uncertainty should increase as we venture away from the average.
The prediction variance formulas above include the unknown parameter $\sigma^{2}$, and
so we need to replace it with an estimator. As usual, we use $s^{2}$ for that. A confidence
interval around $\hat{y}^{0}$ can then be formed as $\hat{y}^{0} \pm t_{\alpha/2}\operatorname{se}(e^{0})$.
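The widening of the prediction interval away from the center of the data can be illustrated with a small simulated (hypothetical) bivariate regression:

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated bivariate regression (hypothetical data on [0, 10]).
N = 80
x = rng.uniform(0.0, 10.0, N)
X = np.column_stack([np.ones(N), x])
y = 2.0 + 0.5 * x + rng.standard_normal(N)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
s2 = e @ e / (N - 2)
XtX_inv = np.linalg.inv(X.T @ X)

def pred_se(x0):
    # Standard error of the prediction error at x0:
    # sqrt(s2 * (1 + x0' (X'X)^{-1} x0)), with s2 replacing sigma^2.
    return np.sqrt(s2 * (1.0 + x0 @ XtX_inv @ x0))

# The interval y_hat +/- 1.96 * se widens away from the center of the data.
se_center = pred_se(np.array([1.0, x.mean()]))
se_far = pred_se(np.array([1.0, 12.0]))   # extrapolating beyond the sample
print(se_center < se_far)  # True
```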
The figure below shows the behavior of the prediction interval for a bivariate
regression.
⁶ The term “prediction” typically means to use the regression model to compute fitted values of the
dependent variable, either in-sample or out-of-sample. The term “forecasting” is normally associated with
time-series models, where one is interested in future values of the dependent variable, and so time plays
an explicit role.
Source: Greene.
Example: We continue the analysis of the previous example. In the first quarter of 2001,
the average rate for the 90-day T-bill was 4.48%, real GDP was 9316.8, the rate of
inflation on a yearly basis was 5.26%, and the time trend would equal 204. In order to
predict the natural log of investment in the first quarter of 2001, we use the data vector
(notice that we take the natural log of real GDP):
$$ \hat{y}^{0} = x^{0T}b = \begin{pmatrix} 1 & 4.48 & 5.26 & 9.1396 & 204 \end{pmatrix}\begin{pmatrix} -9.1345 \\ -0.008601 \\ 0.003308 \\ 1.9302 \\ -0.005659 \end{pmatrix} = 7.3312 , $$
and so the prediction standard deviation is 0.087699. We can then obtain the prediction
interval: $7.3312 \pm 1.96 \times 0.087699 = (7.1593,\ 7.5031)$. The actual value of the yearly
rate of real investment in the first quarter of 2001 was 1721, and its natural log is
7.4507. Therefore the true value belongs to the prediction interval.
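The example's numbers can be reproduced directly, with the coefficients and prediction standard deviation as reported in the text:

```python
import numpy as np

# Data vector (9.1396 = ln of real GDP 9316.8) and coefficient estimates (Greene).
x0 = np.array([1.0, 4.48, 5.26, 9.1396, 204.0])
b = np.array([-9.1345, -0.008601, 0.003308, 1.9302, -0.005659])

y_hat = x0 @ b
lo, hi = y_hat - 1.96 * 0.087699, y_hat + 1.96 * 0.087699
print(f"{y_hat:.4f} ({lo:.4f}, {hi:.4f})")  # -> 7.3312 (7.1593, 7.5031)

# The realized value ln(1721) = 7.4507 lies inside the interval.
assert lo < np.log(1721) < hi
```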
What we did in this section assumes that x 0 is either known with certainty or can
be forecasted perfectly. If, however, x 0 itself needs to be forecasted, then the formulas
we obtained need to be modified to include the variation in x 0 . We will not discuss this
case here.