Chapter 2: Properties of The Regression Coefficients and Hypothesis Testing
Overview
Chapter 1 introduced least squares regression analysis, a mathematical
technique for fitting a relationship given suitable data on the variables
involved. It is a fundamental chapter because much of the rest of the
text is devoted to extending the least squares approach to handle more
complex models, for example models with multiple explanatory variables,
nonlinear models, and models with qualitative explanatory variables.
However, the mechanics of fitting regression equations are only part of
the story. We are equally concerned with assessing the performance of our
regression techniques and with developing an understanding of why they
work better in some circumstances than in others. Chapter 2 is the starting
point for this objective and is thus equally fundamental. In particular, it
shows how two of the three main criteria for assessing the performance
of estimators, unbiasedness and efficiency, are applied in the context of
a regression model. The third criterion, consistency, will be considered in
Chapter 8.
Learning outcomes
After working through the corresponding chapter in the text, studying the
corresponding slideshows, and doing the starred exercises in the text and
the additional exercises in this guide, you should be able to explain what is
meant by:
• cross-sectional, time series, and panel data
• unbiasedness of OLS regression estimators
• variance and standard errors of regression coefficients and how they
are determined
• Gauss–Markov theorem and efficiency of OLS regression estimators
• two-sided and one-sided t tests of hypotheses relating to regression coefficients
• F tests of goodness of fit of a regression equation
in the context of a regression model. The chapter is a long one and you
should take your time over it because it is essential that you develop a
perfect understanding of every detail.
Further material
Derivation of the expression for the variance of the naïve estimator in
Section 2.3.
The variance of the naïve estimator in Section 2.3 and Exercise 2.9 is not
of any great interest in itself but its derivation provides an example of how
one obtains expressions for variances of estimators in general.
In Section 2.3 we considered the naïve estimator of the slope coefficient
derived by joining the first and last observations in a sample and
calculating the slope of that line:
b2 = (Yn − Y1)/(Xn − X1).

Since Yi = β1 + β2Xi + ui, we have b2 = β2 + (un − u1)/(Xn − X1), and so

σb2² = E[(b2 − β2)²] = E[(β2 + (un − u1)/(Xn − X1) − β2)²] = E[((un − u1)/(Xn − X1))²].
On the assumption that X is nonstochastic, this can be written as
σb2² = [1/(Xn − X1)²] E[(un − u1)²] = [1/(Xn − X1)²] [E(un²) + E(u1²) − 2E(unu1)] = 2σu²/(Xn − X1)²,

since E(un²) = E(u1²) = σu² and, the disturbance terms being generated independently, E(unu1) = 0.
For comparison, the population variance of the OLS estimator is σu²/Σ(Xi − X̄)². Define A = (X1 + Xn)/2, the midpoint of X1 and Xn, so that (Xn − X1)²/2 = (Xn − A)² + (X1 − A)². Then

(Xn − A)² + (X1 − A)² = (Xn − X̄ + X̄ − A)² + (X1 − X̄ + X̄ − A)²
= (Xn − X̄)² + (X̄ − A)² + 2(Xn − X̄)(X̄ − A) + (X1 − X̄)² + (X̄ − A)² + 2(X1 − X̄)(X̄ − A)
= (X1 − X̄)² + (Xn − X̄)² − 2(X̄ − A)² = (X1 − X̄)² + (Xn − X̄)² − 2(A − X̄)²
= (X1 − X̄)² + (Xn − X̄)² − ½(X1 + Xn − 2X̄)²,

using (Xn − X̄) + (X1 − X̄) = 2(A − X̄) in the third line. This cannot exceed (X1 − X̄)² + (Xn − X̄)², and hence cannot exceed Σ(Xi − X̄)². It follows that

σb2² = 2σu²/(Xn − X1)² = σu²/[(Xn − A)² + (X1 − A)²] ≥ σu²/Σ(Xi − X̄)²,

so the naïve estimator is in general less efficient than the OLS estimator.
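The comparison above can be checked numerically. The following sketch simulates the naïve estimator; the values β1 = 2, β2 = 0.5, σu = 1 and X = 1, …, 20 are illustrative assumptions, not values from the text.

```python
# Monte Carlo check of the naive-estimator variance (a sketch; the values
# beta1 = 2, beta2 = 0.5, sigma_u = 1 and X = 1,...,20 are illustrative).
import numpy as np

rng = np.random.default_rng(42)
beta1, beta2, sigma_u = 2.0, 0.5, 1.0
X = np.arange(1.0, 21.0)                    # nonstochastic X, n = 20
reps = 200_000

u = rng.normal(0.0, sigma_u, size=(reps, X.size))
Y = beta1 + beta2 * X + u

# naive estimator: slope of the line joining the first and last observations
b2_naive = (Y[:, -1] - Y[:, 0]) / (X[-1] - X[0])

var_naive_theory = 2 * sigma_u**2 / (X[-1] - X[0])**2       # 2/361
var_ols_theory = sigma_u**2 / np.sum((X - X.mean())**2)     # 1/665

print(b2_naive.var(), var_naive_theory, var_ols_theory)
```

With these X values the naïve variance, 2/361 ≈ 0.0055, is several times the OLS variance, 1/665 ≈ 0.0015, illustrating the loss of efficiency.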
Additional exercises
A2.1
A variable Yi is generated as
Yi = β1 + ui
where β1 is a fixed parameter and ui is a disturbance term that is
independently and identically distributed with expected value 0 and
population variance σu². The least squares estimator of β1 is Ȳ, the
sample mean of Y. However, a researcher believes that Y is a linear
function of another variable X and uses ordinary least squares to fit the
relationship

Ŷ = b1 + b2X.
A2.2
With the model described in Exercise A2.1, standard theory states that the
population variance of the researcher's estimator b1 is

σu² (1/n + X̄²/Σ(Xi − X̄)²).

In general, this is larger than the population variance of Ȳ, which is
σu²/n. Explain the implications of the difference in the variances.

In the special case where X̄ = 0, the variances are the same. Give an
intuitive explanation.
A2.3
Using the output for the regression in Exercise A1.9 in the text, reproduced
below, perform appropriate statistical tests.
. reg CHILDREN SM
------------------------------------------------------------------------------
CHILDREN | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
SM | -.2525473 .0316673 -7.98 0.000 -.314754 -.1903406
_cons | 7.198478 .3773667 19.08 0.000 6.457186 7.939771
------------------------------------------------------------------------------
A2.4
Using the output for the regression in Exercise A1.1, reproduced below,
perform appropriate statistical tests.
. reg FDHO EXP if FDHO>0

      Source |       SS       df       MS              Number of obs
-------------+------------------------------           F(1, n-2)
       Model |                                         Prob > F
    Residual |                                         R-squared
-------------+------------------------------           Adj R-squared
       Total |                                         Root MSE

------------------------------------------------------------------------------
        FDHO |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         EXP |
       _cons |
------------------------------------------------------------------------------
A2.5
Using the output for your regression in Exercise A1.2, perform appropriate
statistical tests.
A2.6
Using the output for the regression in Exercise A1.3, reproduced below,
perform appropriate statistical tests.
. reg WEIGHT02 WEIGHT85

      Source |       SS       df       MS              Number of obs
-------------+------------------------------           F(1, n-2)
       Model |                                         Prob > F
    Residual |                                         R-squared
-------------+------------------------------           Adj R-squared
       Total |                                         Root MSE

------------------------------------------------------------------------------
    WEIGHT02 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    WEIGHT85 |
       _cons |
------------------------------------------------------------------------------
A2.7
Using the output for the regression in Exercise A1.4, reproduced below,
perform appropriate statistical tests.
. reg EARNINGS HEIGHT

      Source |       SS       df       MS              Number of obs
-------------+------------------------------           F(1, n-2)
       Model |                                         Prob > F
    Residual |                                         R-squared
-------------+------------------------------           Adj R-squared
       Total |                                         Root MSE

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      HEIGHT |
       _cons |
------------------------------------------------------------------------------
A2.8
With the information given in Exercise A1.5, how would the change in the
measurement of GDP affect
• the standard error of the coefficient of GDP
• the F statistic for the equation?
A2.9
With the information given in Exercise A1.6, how would the change in the
measurement of GDP affect
• the standard error of the coefficient of GDP
• the F statistic for the equation?
A2.10
[This is a continuation of Exercise 1.15 in the text.] A sample of data
consists of n observations on two variables, Y and X. The true model is
Yi = β1 + β2Xi + ui
A2.11
A variable Y depends on a nonstochastic variable X with the relationship
Y = β1 + β2X + u
where u is a disturbance term that satisfies the regression model
assumptions. Given a sample of n observations, a researcher decides to
estimate β2 using the expression
b2 = ΣXiYi / ΣXi²,

the summations running from i = 1 to n. (This is the OLS estimator of β2 for
the model Y = β2X + u.) It can be shown that the population variance of this
estimator is σu²/ΣXi². Determine whether b2 is biased or unbiased, and
compare its population variance with that of the usual OLS estimator,

Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)².
A2.12
A variable Yi is generated as
Yi = β1 + β2Xi + ui (1)
where β1 and β2 are fixed parameters and ui is a disturbance term that
satisfies the regression model assumptions. The values of X are fixed
and are as shown in the figure opposite. Four of them, X1 to X4, are close
together. The fifth, X5, is much larger. The corresponding values that Y
would take, if there were no disturbance term, are given by the circles on
the line. The presence of the disturbance term in the model causes the
actual values of Y in a sample to be different. The solid black circles depict
a typical sample of observations.
[Figure: Y plotted against X, showing the fixed values X1 to X4 close together, X5 much larger, the true line, and a typical sample of observations.]
Answers to the starred exercises in the text

2.5
An investigator correctly believes that the relationship between two
variables X and Y is given by
Yi = β1 + β2Xi + ui.
Given a sample of observations on Y, X, and a third variable Z (which is
not a determinant of Y), the investigator estimates β2 as
b2 = Σ(Zi − Z̄)(Yi − Ȳ) / Σ(Zi − Z̄)(Xi − X̄),

the summations running from i = 1 to n.
Substituting Yi − Ȳ = β2(Xi − X̄) + (ui − ū) in the numerator,

b2 = β2 + Σ(Zi − Z̄)(ui − ū) / Σ(Zi − Z̄)(Xi − X̄).

Hence

E(b2) = β2 + Σ(Zi − Z̄)E(ui − ū) / Σ(Zi − Z̄)(Xi − X̄) = β2.
2.6
Using the decomposition of b1 obtained in Exercise 2.1, derive the
expression for σb1² given in equation (2.38).
Answer:
b1 = β1 + Σci ui, where ci = 1/n − ai X̄, the summations all running from
i = 1 to n. Hence

σb1² = E[(Σci ui)²] = σu² Σci² = σu² [n(1/n²) − (2X̄/n)Σai + X̄² Σai²].

From Box 2.2, Σai = 0 and Σai² = 1/Σ(Xi − X̄)². Hence

σb1² = σu² (1/n + X̄²/Σ(Xi − X̄)²).
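This expression can be checked by simulation. The sketch below compares the simulated variance of b1 with σu²(1/n + X̄²/Σ(Xi − X̄)²); the values β1 = 3, β2 = 1.5, σu = 1 and X = 1, …, 10 are illustrative assumptions.

```python
# Simulation check of the variance of the intercept estimator b1 (a sketch;
# beta1 = 3, beta2 = 1.5, sigma_u = 1 and X = 1,...,10 are illustrative).
import numpy as np

rng = np.random.default_rng(7)
beta1, beta2, sigma_u = 3.0, 1.5, 1.0
X = np.arange(1.0, 11.0)
n = X.size
reps = 100_000

u = rng.normal(0.0, sigma_u, size=(reps, n))
Y = beta1 + beta2 * X + u

Xd = X - X.mean()
b2 = (Y * Xd).sum(axis=1) / (Xd**2).sum()   # OLS slope, one per replication
b1 = Y.mean(axis=1) - b2 * X.mean()         # OLS intercept

var_b1_theory = sigma_u**2 * (1 / n + X.mean()**2 / (Xd**2).sum())
print(b1.var(), var_b1_theory)              # close agreement
```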
2.7
Given the decomposition in Exercise 2.2 of the OLS estimator of β2 in
the model Yi = β2Xi + ui, demonstrate that the variance of the slope
coefficient is given by

σb2² = σu² / ΣXj²,

the summation running from j = 1 to n.
Answer:
b2 = β2 + Σdi ui, where di = Xi / ΣXj², and E(b2) = β2. Hence

σb2² = E[(Σdi ui)²] = σu² Σdi² = σu² Σ[Xi²/(ΣXj²)²] = [σu²/(ΣXj²)²] ΣXi² = σu²/ΣXj².
2.10
It can be shown that the variance of the estimator of the slope coefficient
in Exercise 2.4,

Σ(Zi − Z̄)(Yi − Ȳ) / Σ(Zi − Z̄)(Xi − X̄),

is given by

σb2² = [σu² / Σ(Xi − X̄)²] × [1 / r²XZ]
where rXZ is the correlation between X and Z. What are the implications for
the efficiency of the estimator?
Answer:
If Z happens to be an exact linear function of X, the population variance
will be the same as that of the OLS estimator. Otherwise 1/r²XZ will be
greater than 1, the variance will be larger, and so the estimator will be
less efficient.
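The factor 1/r²XZ can be verified numerically. In the sketch below, X, the choice Z = 0.5X + sin X (correlated with X, but not an exact linear function of it), and the parameter values are all illustrative assumptions.

```python
# Sketch: the variance of the Z-based estimator exceeds the OLS variance
# by the factor 1/r_XZ^2 (X, Z, and the parameter values are illustrative).
import numpy as np

rng = np.random.default_rng(3)
X = np.arange(1.0, 21.0)
Z = 0.5 * X + np.sin(X)       # correlated with X, not an exact linear function
r = np.corrcoef(X, Z)[0, 1]
reps = 200_000

u = rng.normal(size=(reps, X.size))          # sigma_u = 1
Y = 1.0 + 2.0 * X + u

Xd, Zd = X - X.mean(), Z - Z.mean()
b2_z = (Y * Zd).sum(axis=1) / np.sum(Zd * Xd)

var_ols = 1.0 / np.sum(Xd**2)                # sigma_u^2 / sum of (Xi - Xbar)^2
print(b2_z.var(), var_ols / r**2)            # close agreement
```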
2.13
Suppose that the true relationship between Y and X is Yi = β1 + β2Xi + ui
and that the fitted model is Ŷi = b1 + b2Xi. In Exercise 1.12 it was shown
that if Xi* = µ1 + µ2Xi, and Y is regressed on X*, the slope coefficient
b2* = b2/µ2. How will the standard error of b2* be related to the standard
error of b2?
Answer:
In Exercise 1.22 it was demonstrated that the fitted values of Y would be
the same. This means that the residuals are the same, and hence su², the
estimator of the variance of the disturbance term, is the same. The
standard error of b2* is then given by

s.e.(b2*) = √(su² / Σ(Xi* − X̄*)²) = √(su² / Σ(µ1 + µ2Xi − µ1 − µ2X̄)²)
= √(su² / (µ2² Σ(Xi − X̄)²)) = (1/µ2) s.e.(b2).
2.15
A researcher with a sample of 50 individuals with similar education
but differing amounts of training hypothesises that hourly earnings,
EARNINGS, may be related to hours of training, TRAINING, according to
the relationship
EARNINGS = β1 + β2TRAINING + u .
He is prepared to test the null hypothesis H0: β2 = 0 against the alternative
hypothesis H1: β2 ≠ 0 at the 5 per cent and 1 per cent levels. What should
he report
1. if b2 = 0.30, s.e.(b2) = 0.12?
2. if b2 = 0.55, s.e.(b2) = 0.12?
3. if b2 = 0.10, s.e.(b2) = 0.12?
4. if b2 = –0.27, s.e.(b2) = 0.12?
Answer:
There are 48 degrees of freedom, and hence the critical values of t at the
5 per cent, 1 per cent, and 0.1 per cent levels are 2.01, 2.68, and 3.51,
respectively.
1. The t statistic is 2.50. Reject H0 at the 5 per cent level but not at the 1
per cent level.
2. t = 4.58. Reject at the 0.1 per cent level.
3. t = 0.83. Fail to reject at the 5 per cent level.
4. t = –2.25. Reject H0 at the 5 per cent level but not at the 1 per cent
level.
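These calculations can be reproduced with scipy's Student-t quantiles (a sketch; scipy is assumed to be available).

```python
# Reproducing the t tests of Exercise 2.15 with scipy's Student-t quantiles.
from scipy import stats

df = 48
crit = {p: stats.t.ppf(1 - p / 2, df) for p in (0.05, 0.01, 0.001)}
# two-sided critical values: approximately 2.01, 2.68, and 3.51

for b2, se in [(0.30, 0.12), (0.55, 0.12), (0.10, 0.12), (-0.27, 0.12)]:
    t_stat = b2 / se                         # test of H0: beta2 = 0
    rejected_at = [p for p in (0.05, 0.01, 0.001) if abs(t_stat) > crit[p]]
    print(f"b2 = {b2:+.2f}: t = {t_stat:.2f}, reject H0 at levels {rejected_at}")
```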
2.20
Explain whether it would have been possible to perform one-sided tests
instead of two-sided tests in Exercise 2.15. If you think that one-sided tests
are justified, perform them and state whether the use of a one-sided test
makes any difference.
Answer:
First, there should be a discussion of whether the parameter β2 in
EARNINGS = β1 + β2TRAINING + u
can be assumed not to be negative. The objective of training is to impart
skills. It would be illogical for an individual with greater skills to be paid
less on that account, and so we can argue that we can rule out β2 < 0. We
can then perform a one-sided test. With 48 degrees of freedom, the critical
values of t at the 5 per cent, 1 per cent, and 0.1 per cent levels are 1.68,
2.40, and 3.26, respectively.
1. The t statistic is 2.50. We can now reject H0 at the 1 per cent level (but
not at the 0.1 per cent level).
2. t = 4.58. Not affected by the change. Reject at the 0.1 per cent level.
3. t = 0.83. Not affected by the change. Fail to reject at the 5 per cent
level.
4. t = –2.25. Fail to reject H0 at the 5 per cent level. Here there is a
problem because the coefficient has an unexpected sign and is large
enough to reject H0 at the 5 per cent level with a two-sided test.
In principle we should ignore this and fail to reject H0. Admittedly, the
likelihood of such a large negative t statistic occurring under H0 is very
small, but it would be smaller still under the alternative hypothesis
H1: β2 >0.
However we should consider two further possibilities. One is that the
justification for a one-sided test is incorrect. For example, some jobs
pay relatively low wages because they offer training that is valued by
the employee. Apprenticeships are the classic example. Alternatively,
workers in some low-paid occupations may, for technical reasons, receive a
relatively large amount of training. In either case, the correlation between
training and earnings might be negative instead of positive.
Another possible reason for a coefficient having an unexpected sign is that
the model is misspecified in some way. For example, the coefficient might
be distorted by omitted variable bias, to be discussed in Chapter 6.
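The one-sided critical values, and the treatment of case 4, can be sketched the same way (scipy is again assumed to be available).

```python
# One-sided critical values for 48 degrees of freedom (a sketch using scipy).
from scipy import stats

df = 48
crit = {p: stats.t.ppf(1 - p, df) for p in (0.05, 0.01, 0.001)}
print(crit)   # compare the tabulated values 1.68, 2.40, and 3.26 used above

# Case 4: with H1: beta2 > 0, a negative t statistic can never reject H0,
# however large it is in absolute value.
t_stat = -0.27 / 0.12
print(t_stat > crit[0.05])   # False: fail to reject
```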
2.25
Suppose that the true relationship between Y and X is Yi = β1 + β2Xi + ui
and that the fitted model is Ŷi = b1 + b2Xi. In Exercise 1.12 it was shown
that if Xi* = µ1 + µ2Xi, and Y is regressed on X*, the slope coefficient
b2* = b2/µ2. How will the t statistic for b2* be related to the t statistic
for b2? (See also Exercise 2.13.)
Answer:
From Exercise 2.13, we have s.e.(b2*) = s.e.(b2)/µ2. Since b2* = b2/µ2, it
follows that the t statistic must be the same.
Alternatively, since we saw in Exercise 1.22 that R2 must be the same,
it follows that the F statistic for the equation must be the same. For a
simple regression the F statistic is the square of the t statistic on the slope
coefficient, so the t statistic must be the same.
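The invariance of the t statistic can be verified numerically. The helper below computes the slope t statistic directly from the OLS formulas; the simulated data and the values of µ1 and µ2 are illustrative assumptions.

```python
# Numerical check that rescaling X leaves the slope t statistic unchanged
# (a sketch; mu1, mu2 and the simulated data are illustrative assumptions).
import numpy as np

def slope_t(X, Y):
    """t statistic for the slope in a simple OLS regression of Y on X."""
    n = len(X)
    Xd = X - X.mean()
    b2 = np.sum(Xd * (Y - Y.mean())) / np.sum(Xd**2)
    b1 = Y.mean() - b2 * X.mean()
    resid = Y - b1 - b2 * X
    s2 = np.sum(resid**2) / (n - 2)          # estimator of sigma_u^2
    se = np.sqrt(s2 / np.sum(Xd**2))
    return b2 / se

rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 30)
Y = 1.0 + 0.5 * X + rng.normal(size=30)

mu1, mu2 = 3.0, 2.5
X_star = mu1 + mu2 * X

print(slope_t(X, Y), slope_t(X_star, Y))    # identical up to rounding error
```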
2.28
Calculate the 95 per cent confidence interval for β2 in the price inflation/
wage inflation example:
p̂ = −1.21 + 0.82w
      (0.05)   (0.10)
What can you conclude from this calculation?
Answer:
With n equal to 20, there are 18 degrees of freedom and the critical value
of t at the 5 per cent level is 2.10. The 95 per cent confidence interval is
therefore
0.82 – 0.10 × 2.10 ≤ β2 ≤ 0.82 + 0.10 × 2.10
that is,
0.61 ≤ β2 ≤ 1.03.
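The interval can be reproduced directly (a sketch; scipy is assumed to be available).

```python
# Sketch of the 95 per cent confidence-interval calculation for the price
# inflation/wage inflation regression (b2 = 0.82, s.e. = 0.10, n = 20).
from scipy import stats

b2, se, n = 0.82, 0.10, 20
t_crit = stats.t.ppf(0.975, n - 2)         # about 2.10 with 18 degrees of freedom
lower, upper = b2 - t_crit * se, b2 + t_crit * se
print(f"{lower:.2f} <= beta2 <= {upper:.2f}")   # 0.61 <= beta2 <= 1.03
```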
2.34
Suppose that the true relationship between Y and X is Yi = β1 + β2Xi + ui
and that the fitted model is Ŷi = b1 + b2Xi. Suppose that Xi* = µ1 + µ2Xi,
and Y is regressed on X*. How will the F statistic for this regression be
related to the F statistic for the original regression? (See also Exercises
1.22, 2.13, and 2.24.)
Answer:
We saw in Exercise 1.22 that R2 would be the same, and it follows that F
must also be the same.
Answers to the additional exercises

A2.1
First we need to show that E(b2) = 0:

b2 = Σ(Xi − X̄)(Yi − Ȳ)/Σ(Xi − X̄)² = Σ(Xi − X̄)(β1 + ui − β1 − ū)/Σ(Xi − X̄)²
= Σ(Xi − X̄)(ui − ū)/Σ(Xi − X̄)².

Hence

E(b2) = [1/Σ(Xi − X̄)²] Σ(Xi − X̄)E(ui − ū) = 0.

Since b1 = Ȳ − b2X̄, it follows that E(b1) = E(Ȳ) − X̄E(b2) = β1, and so the
researcher's estimator of β1 is unbiased.
A2.2
Both estimators are unbiased, so the researcher's estimator, having the
larger population variance, is the less efficient: nothing is gained, and
precision is lost, by including the irrelevant variable X.

If X̄ = 0, the estimators are identical, since b1 = Ȳ − b2X̄ reduces to Ȳ.
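A quick simulation makes the efficiency comparison in A2.1/A2.2 concrete; all the values below (β1 = 10, σu = 2, X = 1, …, 20) are illustrative assumptions.

```python
# Sketch for A2.1/A2.2: estimating beta1 by the sample mean versus by the
# intercept of a needless regression on X (all values are illustrative).
import numpy as np

rng = np.random.default_rng(5)
beta1, sigma_u = 10.0, 2.0
X = np.arange(1.0, 21.0)         # irrelevant regressor, Xbar != 0
n = X.size
reps = 100_000

u = rng.normal(0.0, sigma_u, size=(reps, n))
Y = beta1 + u                    # true model: Y depends only on beta1

ybar = Y.mean(axis=1)            # least squares estimator of beta1
Xd = X - X.mean()
b2 = (Y * Xd).sum(axis=1) / (Xd**2).sum()
b1 = ybar - b2 * X.mean()        # researcher's estimator of beta1

var_b1_theory = sigma_u**2 * (1 / n + X.mean()**2 / (Xd**2).sum())
print(ybar.var(), sigma_u**2 / n, b1.var(), var_b1_theory)
```

The simulated variance of b1 is several times that of Ȳ here, reflecting the loss of efficiency from fitting the irrelevant regressor.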
A2.3
The t statistic for the coefficient of SM is –7.98, very highly significant. The
t statistic for the intercept is even higher, but it is of no interest. All the
mothers in the sample must have had at least one child (the respondent),
for otherwise they would not be in the sample. The F statistic is 63.60,
very highly significant.
A2.4
The t statistic for the coefficient of EXP is 19.50, very highly significant.
There is little point performing a t test on the intercept, given that it has
no plausible meaning. The F statistic is 380.4, very highly significant.
A2.5
The slope coefficient for every category is significantly different from
zero at a very high significance level, with the exception of local public
transportation. The coefficient for the latter is virtually equal to zero and
the t statistic is only 0.40. Evidently this category is on the verge of being
an inferior good.
A2.6
A straight t test on the coefficient of WEIGHT85 is not very interesting
since we know that those who were relatively heavy in 1985 will also be
relatively heavy in 2002. The t statistic confirms this obvious fact. Of more
interest would be a test investigating whether those relatively heavy in
1985 became even heavier in 2002. We set up the null hypothesis that they
did not, H0: β2 = 1, and see if we can reject it. The t statistic for this
test, (b2 − 1)/s.e.(b2), is small, and hence the null hypothesis is not
rejected.
the respondents have tended to put on 23.6 pounds over the interval,
irrespective of their 1985 weight. The null hypothesis that the general
increase is zero is rejected at the 0.1 per cent level.
A2.7
The t statistic for height, 5.42, suggests that the effect of height on
earnings is highly significant, despite the very low R2. In principle the
estimate of an extra 86 cents of hourly earnings for every extra inch of
height could have been a purely random result of the kind that one obtains
with nonsense models. However, the fact that it is apparently highly
significant causes us to look for other explanations, the most likely one
being that suggested in the answer to Exercise A1.4. Of course we would
not attempt to test the negative constant.
A2.8
The standard error of the coefficient of GDP: this is given by

su* / √(Σ(Gi* − Ḡ*)²),

where su*² is estimated as Σei*²/(n − 2). Since RSS is unchanged, su* = su.
We saw in Exercise A1.5 that Gi* − Ḡ* = Gi − Ḡ for all i. Hence the new
standard error is given by

su / √(Σ(Gi − Ḡ)²)

and is unchanged.

The F statistic:

F = ESS / (RSS/(n − 2)),

where ESS = explained sum of squares = Σ(Ŷi* − Ȳ*)². Since ei* = ei,
Ŷi* = Ŷi and ESS is unchanged. We saw in Exercise A1.5 that RSS is
unchanged. Hence F is unchanged.
A2.9
The standard error of the coefficient of GDP: this is given by

su* / √(Σ(Gi* − Ḡ*)²),

where su*² is estimated as Σei*²/(n − 2), n being the number of
observations. We saw in Exercise A1.6 that ei* = ei and so RSS is
unchanged. Hence su* = su. Thus the new standard error is given by

su / √(Σ(2Gi − 2Ḡ)²) = (1/2) × su / √(Σ(Gi − Ḡ)²) = 0.005.

The F statistic:

F = ESS / (RSS/(n − 2)),

where ESS = explained sum of squares = Σ(Ŷi* − Ȳ*)². Since ei* = ei,
Ŷi* = Ŷi and ESS is unchanged, and RSS is unchanged. Hence F is unchanged.
A2.10
One way of demonstrating that Ŷi* = Ŷi − Ȳ:

Ŷi* = b1* + b2*Xi* = b2(Xi − X̄)

Ŷi − Ȳ = (b1 + b2Xi) − Ȳ = (Ȳ − b2X̄ + b2Xi) − Ȳ = b2(Xi − X̄).

The standard error of the slope coefficient in (2) is equal to that in (1):

σ̂b2*² = su*²/Σ(Xi* − X̄*)² = su²/ΣXi*² = su²/Σ(Xi − X̄)² = σ̂b2²,

since the residuals, and hence su², are the same in the two regressions, and
X̄* = 0.

Suppose the researcher had dropped the intercept and estimated β2 as
b2* = ΣXi*Yi*/ΣXi*². However, Exercise 1.15 demonstrates that, effectively,
he has done exactly this. Hence the estimator will be the same. It follows
that dropping the unnecessary intercept would not have led to a gain in
efficiency.
A2.11
All summations below run from i = 1 to n.

b2 = ΣXiYi/ΣXi² = ΣXi(β1 + β2Xi + ui)/ΣXi² = β1ΣXi/ΣXi² + β2 + ΣXiui/ΣXi².

Hence

E(b2) = β1ΣXi/ΣXi² + β2 + E(ΣXiui/ΣXi²) = β1ΣXi/ΣXi² + β2 + ΣXiE(ui)/ΣXi²
= β1ΣXi/ΣXi² + β2.

Thus b2 will in general be a biased estimator. The sign of the bias depends
on the signs of β1 and ΣXi.

However, b2 is more efficient than the OLS estimator

Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

unless X̄ = 0, since its population variance is σu²/ΣXi², whereas that of
the OLS estimator is

σu²/Σ(Xi − X̄)² = σu²/(ΣXi² − nX̄²).

If X̄ = 0, the two estimators are identical:

Σ(Xi − X̄)(Yi − Ȳ)/Σ(Xi − X̄)² = ΣXi(Yi − Ȳ)/ΣXi² = ΣXiYi/ΣXi² − Ȳ ΣXi/ΣXi²
= ΣXiYi/ΣXi²,

since ΣXi = nX̄ = 0.

If there is little variation in X in the sample, Σ(Xi − X̄)² may be small,
and hence the population variance of the OLS estimator may be large. Thus,
using a criterion such as mean square error, b2 may be preferable if the
bias is small.
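The bias expression E(b2) = β2 + β1ΣXi/ΣXi² can be checked by simulation; the values β1 = 5, β2 = 2 and X = 1, …, 10 below are illustrative assumptions.

```python
# Monte Carlo sketch of the bias derived above: with beta1 != 0, the
# no-intercept estimator sum(XY)/sum(X^2) is biased (values illustrative).
import numpy as np

rng = np.random.default_rng(1)
beta1, beta2 = 5.0, 2.0
X = np.arange(1.0, 11.0)
reps = 20_000

b2_draws = np.empty(reps)
for k in range(reps):
    Y = beta1 + beta2 * X + rng.normal(size=X.size)
    b2_draws[k] = np.sum(X * Y) / np.sum(X**2)

bias_theory = beta1 * np.sum(X) / np.sum(X**2)   # beta1 * 55/385 here
print(b2_draws.mean() - beta2, bias_theory)      # close agreement
```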
A2.12
The inclusion of the fifth observation does not cause the model to be
misspecified or the regression model assumptions to be violated, so
retaining it in the sample will not give rise to biased estimates. There
would be no advantages in dropping it, and there would be one major
disadvantage: without the fifth observation, Σ(Xi − X̄)² would be much
smaller, and so the population variances of the regression coefficients
would be much larger.