
LECTURE 5

Hypothesis Testing in the Classical Regression Model
The Normal Distribution and the Sampling Distributions
It is often appropriate to assume that the elements of the disturbance vector ε within the regression equations y = Xβ + ε are distributed independently and identically according to a normal law. Under this assumption, the sampling distributions of the estimates may be derived and various hypotheses relating to the underlying parameters may be tested.
To denote that x is a normally distributed random variable with a mean of E(x) = μ and a dispersion matrix of D(x) = Σ, we shall write x ∼ N(μ, Σ). A vector z ∼ N(0, I) with a mean of zero and a dispersion matrix of D(z) = I is described as a standard normal vector. Any normal vector x ∼ N(μ, Σ) can be standardised:

(1) If T is a transformation such that TΣT′ = I and T′T = Σ⁻¹, then T(x − μ) ∼ N(0, I).
Associated with the normal distribution are a variety of so-called sam-
pling distributions, which occur frequently in problems of statistical inference.
Amongst these are the chi-square distribution, the F distribution and the t
distribution.
If z ∼ N(0, I) is a standard normal vector of n elements, then the sum of squares of its elements has a chi-square distribution of n degrees of freedom; and this is denoted by z′z ∼ χ²(n). With the help of the standardising transformation, it can be shown that,

(2) If x ∼ N(μ, Σ) is a vector of order n, then (x − μ)′Σ⁻¹(x − μ) ∼ χ²(n).
The sum of any two independent chi-square variates is itself a chi-square
variate whose degrees of freedom equal the sum of the degrees of freedom of its
constituents. Thus,
(3) If u ∼ χ²(m) and v ∼ χ²(n) are independent chi-square variates of m and n degrees of freedom respectively, then (u + v) ∼ χ²(m + n) is a chi-square variate of m + n degrees of freedom.
The ratio of two independent chi-square variates, divided by their respective degrees of freedom, has an F distribution, which is completely characterised by these degrees of freedom. Thus,

(4) If u ∼ χ²(m) and v ∼ χ²(n) are independent chi-square variates, then the variate F = (u/m)/(v/n) has an F distribution of m and n degrees of freedom; and this is denoted by writing F ∼ F(m, n).
The sampling distribution which is most frequently used is the t distribu-
tion. A t variate is a ratio of a standard normal variate and the root of an
independent chi-square variate divided by its degrees of freedom. Thus,
(5) If z ∼ N(0, 1) and v ∼ χ²(n) are independent variates, then t = z/√(v/n) has a t distribution of n degrees of freedom; and this is denoted by writing t ∼ t(n).

It is clear that t² ∼ F(1, n).
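The relationship between the t and F distributions can be illustrated numerically. The following is a minimal sketch, assuming that Python with the scipy library is available; the chosen degrees of freedom and significance level are purely illustrative.

# A small check that the square of a two-sided t(n) critical value equals
# the corresponding F(1, n) critical value, reflecting t^2 ~ F(1, n).
from scipy import stats

n = 60                                   # degrees of freedom (illustrative)
alpha = 0.05                             # significance level (illustrative)

t_crit = stats.t.ppf(1 - alpha / 2, n)   # two-sided critical value of t(n)
f_crit = stats.f.ppf(1 - alpha, 1, n)    # critical value of F(1, n)

print(t_crit ** 2, f_crit)               # the two values coincide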
Hypotheses Concerning the Coefficients
A linear function of a normally distributed vector is itself normally distributed. The ordinary least-squares estimate β̂ = (X′X)⁻¹X′y of the parameter vector β in the regression model N(y; Xβ, σ²I) is a linear function of y, which has an expected value of E(β̂) = β and a dispersion matrix of D(β̂) = σ²(X′X)⁻¹. Thus, it follows that, if y ∼ N(Xβ, σ²I) is normally distributed, then

(6) β̂ ∼ N_k{β, σ²(X′X)⁻¹}.
Likewise, the marginal distributions of β̂₁ and β̂₂ within β̂′ = [β̂₁′, β̂₂′] are given by

(7) β̂₁ ∼ N_{k₁}(β₁, σ²{X₁′(I − P₂)X₁}⁻¹),
(8) β̂₂ ∼ N_{k₂}(β₂, σ²{X₂′(I − P₁)X₂}⁻¹).
From the results under (2) to (6), it follows that
(9) σ⁻²(β̂ − β)′X′X(β̂ − β) ∼ χ²(k).
Similarly, it follows from (7) and (8) that

(10) σ⁻²(β̂₁ − β₁)′X₁′(I − P₂)X₁(β̂₁ − β₁) ∼ χ²(k₁),
(11) σ⁻²(β̂₂ − β₂)′X₂′(I − P₁)X₂(β̂₂ − β₂) ∼ χ²(k₂).
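The dispersion matrices in (7), (8), (10) and (11) can be checked numerically against the corresponding blocks of σ²(X′X)⁻¹ from (6). The following sketch assumes Python with numpy; the simulated design matrices and dimensions are invented for illustration, and the partitioned-inverse identity that it verifies is derived formally in (38) below.

# Check that {X1'(I - P2)X1}^{-1}, the matrix appearing in (7) and (10),
# is the leading block of (X'X)^{-1}, as the partitioned inverse implies.
import numpy as np

rng = np.random.default_rng(0)
T, k1, k2 = 50, 2, 3                      # illustrative dimensions
X1 = rng.standard_normal((T, k1))
X2 = rng.standard_normal((T, k2))
X = np.hstack([X1, X2])

P2 = X2 @ np.linalg.inv(X2.T @ X2) @ X2.T             # projector onto the columns of X2
block = np.linalg.inv(X1.T @ (np.eye(T) - P2) @ X1)   # {X1'(I - P2)X1}^{-1}
lead = np.linalg.inv(X.T @ X)[:k1, :k1]               # leading k1 x k1 block of (X'X)^{-1}

print(np.allclose(block, lead))                       # True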
The distribution of the residual vector e = y − Xβ̂ is degenerate, in the sense that the mapping e = (I − P)ε from the disturbance vector ε to the residual vector e entails a singular transformation. Nevertheless, it is possible to obtain a factorisation of the transformation in the form of I − P = CC′, where C is a matrix of order T × (T − k) comprising T − k orthonormal columns, which are orthogonal to the columns of X such that C′X = 0. Since C′C = I_{T−k}, it follows that, on premultiplying y ∼ N_T(Xβ, σ²I) by C′, we get C′y ∼ N_{T−k}(0, σ²I). Hence

(12) σ⁻²y′CC′y = σ⁻²(y − Xβ̂)′(y − Xβ̂) ∼ χ²(T − k).

Figure 1. The critical region, at the 10% significance level, of an F(5, 60) statistic.
The vectors Xβ̂ = Py and y − Xβ̂ = (I − P)y have a zero-valued covariance matrix. If two normally distributed random vectors have a zero covariance matrix, then they are statistically independent. Therefore, it follows that

(13) σ⁻²(β̂ − β)′X′X(β̂ − β) ∼ χ²(k) and σ⁻²(y − Xβ̂)′(y − Xβ̂) ∼ χ²(T − k)
are mutually independent chi-square variates. From this, it can be deduced
that
(14) F = {(β̂ − β)′X′X(β̂ − β)/k} / {(y − Xβ̂)′(y − Xβ̂)/(T − k)}
       = (1/(σ̂²k)) (β̂ − β)′X′X(β̂ − β) ∼ F(k, T − k).

To test an hypothesis specifying that β = β*, the hypothesised parameter vector β* can be inserted in the above statistic and the resulting value can be compared with the critical values of an F distribution of k and T − k degrees of freedom. If a critical value is exceeded, then the hypothesis is liable to be rejected.
The test is readily intelligible, since it is based on a measure of the distance between the hypothesised value Xβ* of the systematic component of the regression and the value Xβ̂ that is suggested by the data. If the two values are remote from each other, then we may suspect that the hypothesis is at fault.
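As a computational sketch of the statistic under (14), the following Python fragment (numpy and scipy assumed; the data and the hypothesised vector β* are simulated and purely illustrative) forms F and compares it with the critical value of the F(k, T − k) distribution.

# Compute the F statistic of (14) for the hypothesis beta = beta_star.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
T, k = 60, 3                                       # illustrative dimensions
X = rng.standard_normal((T, k))
y = X @ np.array([1.0, 0.5, -0.2]) + rng.standard_normal(T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)       # ordinary least-squares estimate
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (T - k)               # estimate of sigma^2, as in (22)

beta_star = np.zeros(k)                            # hypothesised parameter vector (illustrative)
d = beta_hat - beta_star
F = d @ (X.T @ X) @ d / (k * sigma2_hat)           # the statistic of (14)

crit = stats.f.ppf(0.90, k, T - k)                 # critical value at the 10% level
print(F, crit, F > crit)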
It is usual to suppose that a subset of the elements of the parameter vector β are zeros. This represents an instance of a class of hypotheses that specify values for a subvector β₂ within the partitioned model y = X₁β₁ + X₂β₂ + ε, without asserting anything about the values of the remaining elements in the subvector β₁. The appropriate test statistic for testing the hypothesis that β₂ = β₂* is

(15) F = (1/(σ̂²k₂)) (β̂₂ − β₂*)′X₂′(I − P₁)X₂(β̂₂ − β₂*).

This will have an F(k₂, T − k) distribution if the hypothesis is true.
A limiting case of the F statistic concerns the test of an hypothesis affecting a single element β_i within the vector β. By specialising the expression under (15), a statistic may be derived in the form of

(16) F = (β̂_i − β_i)²/(σ̂²w_ii),

wherein w_ii stands for the ith diagonal element of (X′X)⁻¹. If the hypothesis is true, then this will be distributed according to the F(1, T − k) law. However,
the usual way of assessing such an hypothesis is to relate the value of the
statistic
(17) t = (β̂_i − β_i)/√(σ̂²w_ii)

to the tables of the t(T − k) distribution. The advantage of the t statistic is that it shows the direction in which the estimate of β_i deviates from the hypothesised value as well as the size of the deviation.
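A brief sketch of the computations behind (16) and (17) follows, assuming numpy and scipy; the data, the chosen element and the hypothesised value are illustrative only.

# t statistic of (17) for the hypothesis that a single element beta_i
# takes a specified value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
T, k = 60, 3
X = rng.standard_normal((T, k))
y = X @ np.array([1.0, 0.5, -0.2]) + rng.standard_normal(T)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (T - k)

i, beta_i0 = 1, 0.0                                # test one element against zero (illustrative)
w_ii = XtX_inv[i, i]                               # ith diagonal element of (X'X)^{-1}
t = (beta_hat[i] - beta_i0) / np.sqrt(sigma2_hat * w_ii)
F = t ** 2                                         # the statistic of (16)

p_value = 2 * (1 - stats.t.cdf(abs(t), T - k))     # two-sided p-value from t(T - k)
print(t, F, p_value)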
Cochrane's Theorem and the Decomposition of a Chi-Square Variate
The standard test of an hypothesis regarding the vector β in the model N(y; Xβ, σ²I) entails a multi-dimensional version of Pythagoras' Theorem. Consider the decomposition of the vector y into the systematic component and the residual vector. This gives
(18) y = Xβ̂ + (y − Xβ̂)   and
     y − Xβ = (Xβ̂ − Xβ) + (y − Xβ̂),

where the second equation comes from subtracting the unknown mean vector Xβ from both sides of the first. These equations can also be expressed in terms of the projector P = X(X′X)⁻¹X′, which gives Py = Xβ̂ and (I − P)y = y − Xβ̂ = e. Also, the definition ε = y − Xβ can be used within the second of the equations. Thus,

(19) y = Py + (I − P)y   and
     ε = Pε + (I − P)ε.

The reason for adopting this notation is that it enables us to envisage more clearly the Pythagorean relationship between the vectors. Thus, from the fact that P = P′ = P² and that P′(I − P) = 0, it can be established that

(20) ε′ε = ε′Pε + ε′(I − P)ε   or, equivalently,
     ε′ε = (β̂ − β)′X′X(β̂ − β) + (y − Xβ̂)′(y − Xβ̂).

The terms in these expressions represent squared lengths; and the vectors themselves form the sides of a right-angled triangle, with Pε at the base, (I − P)ε as the vertical side and ε as the hypotenuse. These relationships are represented by Figure 2, where γ = Xβ and where ε = y − γ.

Figure 2. The vector Py = Xβ̂ is formed by the orthogonal projection of the vector y onto the subspace spanned by the columns of the matrix X.
The usual test of an hypothesis regarding the elements of the vector β is based on the foregoing relationships. Imagine that the hypothesis postulates that the true value of the parameter vector is β*. To test this proposition, the value of Xβ* is compared with the estimated mean vector Xβ̂. The test is a matter of assessing the proximity of the two vectors, which is measured by the square of the distance that separates them. This would be given by

(21) (y − Xβ*)′P(y − Xβ*) = (Xβ̂ − Xβ*)′(Xβ̂ − Xβ*).

If the hypothesis is untrue and if Xβ* is remote from the true value of Xβ, then the distance is liable to be excessive.
The distance can only be assessed in comparison with the variance σ² of the disturbance term or with an estimate thereof. Usually, one has to make do with the estimate of σ² which is provided by

(22) σ̂² = (y − Xβ̂)′(y − Xβ̂)/(T − k) = ε′(I − P)ε/(T − k).

The numerator of this estimate is simply the squared length of the vector e = (I − P)y = (I − P)ε, which constitutes the vertical side of the right-angled triangle.
Simple arguments, which have been given in the previous section, serve to
demonstrate that
(23)
(a) ε′ε = (y − Xβ)′(y − Xβ) ∼ σ²χ²(T),
(b) ε′Pε = (β̂ − β)′X′X(β̂ − β) ∼ σ²χ²(k),
(c) ε′(I − P)ε = (y − Xβ̂)′(y − Xβ̂) ∼ σ²χ²(T − k),

where (b) and (c) represent statistically independent random variables whose sum is the random variable of (a). These quadratic forms, divided by their respective degrees of freedom, find their way into the F statistic of (14), which is

(24) F = {ε′Pε/k} / {ε′(I − P)ε/(T − k)} ∼ F(k, T − k).
This result depends upon Cochrane's Theorem concerning the decomposition of a chi-square random variate. The following is a statement of the theorem which is attuned to the present requirements:

(25) Let ε ∼ N(0, σ²I_T) be a random vector of T independently and identically distributed elements. Also, let P = X(X′X)⁻¹X′ be a symmetric idempotent matrix, such that P = P′ = P², which is constructed from a matrix X of order T × k with Rank(X) = k. Then
ε′Pε/σ² + ε′(I − P)ε/σ² = ε′ε/σ² ∼ χ²(T),
which is a chi-square variate of T degrees of freedom, represents the sum of two independent chi-square variates ε′Pε/σ² ∼ χ²(k) and ε′(I − P)ε/σ² ∼ χ²(T − k) of k and T − k degrees of freedom respectively.
To prove this result, we begin by finding an alternative expression for the projector P = X(X′X)⁻¹X′. First, consider the fact that X′X is a symmetric positive-definite matrix. It follows that there exists a matrix transformation T such that T(X′X)T′ = I and T′T = (X′X)⁻¹. Therefore, P = XT′TX′ = C₁C₁′, where C₁ = XT′ is a T × k matrix comprising k orthonormal vectors, such that C₁′C₁ = I_k is the identity matrix of order k.
Now define C₂ to be a complementary matrix of T − k orthonormal vectors. Then, C = [C₁, C₂] is an orthonormal matrix of order T such that

(26) CC′ = C₁C₁′ + C₂C₂′ = I_T   and
     C′C = [C₁′C₁  C₁′C₂ ; C₂′C₁  C₂′C₂] = [I_k  0 ; 0  I_{T−k}].

The first of these results allows us to set I − P = I − C₁C₁′ = C₂C₂′. Now, if ε ∼ N(0, σ²I_T) and if C is an orthonormal matrix such that C′C = I_T, then it follows that C′ε ∼ N(0, σ²I_T). In effect, if ε is a normally distributed random vector with a density function which is centred on zero and which has spherical contours, and if C is the matrix of a rotation, then nothing is altered by applying the rotation to the random vector. On partitioning C′ε, we find that
(27) [C₁′ε ; C₂′ε] ∼ N( [0 ; 0], [σ²I_k  0 ; 0  σ²I_{T−k}] ),

which is to say that C₁′ε ∼ N(0, σ²I_k) and C₂′ε ∼ N(0, σ²I_{T−k}) are independently distributed normal vectors. It follows that
(28) ε′C₁C₁′ε/σ² = ε′Pε/σ² ∼ χ²(k)   and   ε′C₂C₂′ε/σ² = ε′(I − P)ε/σ² ∼ χ²(T − k)

are independent chi-square variates. Since C₁C₁′ + C₂C₂′ = I_T, the sum of these two variates is

(29) ε′C₁C₁′ε/σ² + ε′C₂C₂′ε/σ² = ε′ε/σ² ∼ χ²(T);

and thus the theorem is proved.
The statistic under (14) can now be expressed in the form of
(30) F = {ε′Pε/k} / {ε′(I − P)ε/(T − k)}.

This is manifestly the ratio of two chi-square variates divided by their respective degrees of freedom; and so it has an F distribution with these degrees of freedom. This result provides the means for testing the hypothesis concerning the parameter vector β.
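The decomposition asserted by the theorem can also be illustrated by simulation. The sketch below (numpy assumed; the design matrix, σ and the number of replications are arbitrary) generates repeated disturbance vectors and confirms that ε′Pε/σ² and ε′(I − P)ε/σ² have means close to k and T − k respectively and are essentially uncorrelated.

# Monte Carlo illustration of the chi-square decomposition in (25).
import numpy as np

rng = np.random.default_rng(3)
T, k, sigma, reps = 40, 4, 2.0, 20000              # illustrative settings
X = rng.standard_normal((T, k))
P = X @ np.linalg.inv(X.T @ X) @ X.T               # projector P = X(X'X)^{-1}X'
M = np.eye(T) - P                                  # its complement I - P

q1 = np.empty(reps)
q2 = np.empty(reps)
for r in range(reps):
    eps = sigma * rng.standard_normal(T)           # eps ~ N(0, sigma^2 I_T)
    q1[r] = eps @ P @ eps / sigma ** 2             # should behave as chi-square(k)
    q2[r] = eps @ M @ eps / sigma ** 2             # should behave as chi-square(T - k)

print(q1.mean(), k)                                # mean of chi-square(k) is k
print(q2.mean(), T - k)                            # mean of chi-square(T - k) is T - k
print(np.corrcoef(q1, q2)[0, 1])                   # near zero, reflecting independence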
Hypotheses Concerning Subsets of the Regression Coefficients
Consider a set of linear restrictions on the vector β of a classical linear regression model N(y; Xβ, σ²I), which take the form of

(31) Rβ = r,

where R is a matrix of order j × k and of rank j, which is to say that the j restrictions are independent of each other and are fewer in number than the parameters within β. Given that the ordinary least-squares estimator of β is a normally distributed vector β̂ ∼ N{β, σ²(X′X)⁻¹}, it follows that

(32) Rβ̂ ∼ N{Rβ = r, σ²R(X′X)⁻¹R′};
and, from this, it can be inferred immediately that
(33) (Rβ̂ − r)′{R(X′X)⁻¹R′}⁻¹(Rβ̂ − r)/σ² ∼ χ²(j).
It has already been established that

(34) (T − k)σ̂²/σ² = (y − Xβ̂)′(y − Xβ̂)/σ² ∼ χ²(T − k)

is a chi-square variate that is statistically independent of the chi-square variate

(35) (β̂ − β)′X′X(β̂ − β)/σ² ∼ χ²(k)
derived from the estimator of the regression parameters. The variate of (33)
must also be independent of the chi-square of (34); and it is straightforward to
deduce that
(36) F = {(Rβ̂ − r)′{R(X′X)⁻¹R′}⁻¹(Rβ̂ − r)/j} / {(y − Xβ̂)′(y − Xβ̂)/(T − k)}
       = (Rβ̂ − r)′{R(X′X)⁻¹R′}⁻¹(Rβ̂ − r)/(σ̂²j) ∼ F(j, T − k),

which is to say that the ratio of the two independent chi-square variates, divided by their respective degrees of freedom, is an F statistic. This statistic, which embodies only known and observable quantities, can be used in testing the validity of the hypothesised restrictions Rβ = r.
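A sketch of the computation of (36) in Python follows (numpy and scipy assumed); the simulated data and the particular restriction matrix R and vector r are invented for illustration.

# F statistic of (36) for j linear restrictions R beta = r.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
T, k = 80, 4
X = rng.standard_normal((T, k))
y = X @ np.array([1.0, 0.5, 0.5, 0.0]) + rng.standard_normal(T)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (T - k)

# Two illustrative restrictions: the 2nd and 3rd coefficients are equal,
# and the 4th coefficient is zero.
R = np.array([[0.0, 1.0, -1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
r = np.zeros(2)
j = R.shape[0]

d = R @ beta_hat - r
F = d @ np.linalg.inv(R @ XtX_inv @ R.T) @ d / (j * sigma2_hat)
print(F, stats.f.ppf(0.95, j, T - k))              # statistic and 5% critical value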
A specialisation of the statistic under (36) can also be used in testing an hypothesis concerning a subset of the elements of the vector β. Let β′ = [β₁′, β₂′]. Then, the condition that the subvector β₁ assumes the value of β₁* can be expressed via the equation

(37) [I_{k₁}, 0] [β₁ ; β₂] = β₁*.

This can be construed as a case of the equation Rβ = r, where R = [I_{k₁}, 0] and r = β₁*.
In order to discover the specialised form of the requisite test statistic, let
us consider the following partitioned form of an inverse matrix:
(38) (X′X)⁻¹ = [X₁′X₁  X₁′X₂ ; X₂′X₁  X₂′X₂]⁻¹
   = [ {X₁′(I − P₂)X₁}⁻¹                     −{X₁′(I − P₂)X₁}⁻¹X₁′X₂(X₂′X₂)⁻¹ ;
       −{X₂′(I − P₁)X₂}⁻¹X₂′X₁(X₁′X₁)⁻¹      {X₂′(I − P₁)X₂}⁻¹ ].
Then, with R = [I, 0], we find that

(39) R(X′X)⁻¹R′ = {X₁′(I − P₂)X₁}⁻¹.
It follows, in a straightforward manner, that the specialised form of the F
statistic of (36) is
(40) F = {(β̂₁ − β₁*)′X₁′(I − P₂)X₁(β̂₁ − β₁*)/k₁} / {(y − Xβ̂)′(y − Xβ̂)/(T − k)}
       = (β̂₁ − β₁*)′X₁′(I − P₂)X₁(β̂₁ − β₁*)/(σ̂²k₁) ∼ F(k₁, T − k).
Finally, for the jth element of β̂, there is

(41) (β̂_j − β_j)²/(σ̂²w_jj) ∼ F(1, T − k)   or, equivalently,
     (β̂_j − β_j)/√(σ̂²w_jj) ∼ t(T − k),

where w_jj is the jth diagonal element of (X′X)⁻¹ and t(T − k) denotes the t distribution of T − k degrees of freedom.
An Alternative Formulation of the F statistic
An alternative way of forming the F statistic uses the products of two sepa-
rate regressions. Consider the formula for the restricted least-squares estimator
that has been given under (2.76):
(42) β* = β̂ − (X′X)⁻¹R′{R(X′X)⁻¹R′}⁻¹(Rβ̂ − r).
From this, the following expression for the residual sum of squares of the restricted regression is derived:

(43) y − Xβ* = (y − Xβ̂) + X(X′X)⁻¹R′{R(X′X)⁻¹R′}⁻¹(Rβ̂ − r).

The two terms on the RHS are mutually orthogonal on account of the defining condition of an ordinary least-squares regression, which is that (y − Xβ̂)′X = 0. Therefore, the residual sum of squares of the restricted regression is

(44) (y − Xβ*)′(y − Xβ*) = (y − Xβ̂)′(y − Xβ̂) + (Rβ̂ − r)′{R(X′X)⁻¹R′}⁻¹(Rβ̂ − r).
This equation can be rewritten as
(45) RSS − USS = (Rβ̂ − r)′{R(X′X)⁻¹R′}⁻¹(Rβ̂ − r),

where RSS denotes the restricted sum of squares and USS denotes the unrestricted sum of squares. It follows that the test statistic of (36) can be written as

(46) F = {(RSS − USS)/j} / {USS/(T − k)}.
This formulation can be used, for example, in testing the restriction that β₁ = 0 in the partitioned model N(y; X₁β₁ + X₂β₂, σ²I). Then, in terms of equation (37), there is R = [I_{k₁}, 0] and there is r = β₁* = 0, which gives

(47) RSS − USS = β̂₁′X₁′(I − P₂)X₁β̂₁
              = y′(I − P₂)X₁{X₁′(I − P₂)X₁}⁻¹X₁′(I − P₂)y.
Figure 3. The test of the hypothesis entailed by the restricted model is based on a measure of the proximity of the restricted estimate Xβ* and the unrestricted estimate Xβ̂. The USS is the squared distance ‖y − Xβ̂‖². The RSS is the squared distance ‖y − Xβ*‖².
On the other hand, there is
(48) RSS − USS = y′(I − P₂)y − y′(I − P)y = y′(P − P₂)y.
Since the two expressions must be identical for all values of y, the comparison of (47) and (48) is sufficient to establish the following identity:

(49) (I − P₂)X₁{X₁′(I − P₂)X₁}⁻¹X₁′(I − P₂) = P − P₂.
The geometric interpretation of the alternative formulation of the test statistic is straightforward. It can be understood, in reference to Figure 3, that the square of the distance between the restricted estimate Xβ* and the unrestricted estimate Xβ̂, denoted by ‖Xβ̂ − Xβ*‖², which is the basis of the original formulation of the test statistic, is equal to the restricted sum of squares ‖y − Xβ*‖² less the unrestricted sum of squares ‖y − Xβ̂‖². The latter is the basis of the alternative formulation.
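The agreement between the two formulations can be confirmed numerically. The following sketch (numpy assumed; the design and the data are simulated for illustration) tests the restriction β₁ = 0 by fitting the unrestricted regression on [X₁, X₂] and the restricted regression on X₂ alone, and checks that the (RSS − USS) form of (46) reproduces the statistic of (40).

# Check that the (RSS - USS) form of (46) agrees with the direct form
# of (40) for the hypothesis beta_1 = 0 in the partitioned model.
import numpy as np

rng = np.random.default_rng(5)
T, k1, k2 = 70, 2, 3
X1 = rng.standard_normal((T, k1))
X2 = rng.standard_normal((T, k2))
X = np.hstack([X1, X2])
y = X2 @ np.array([1.0, -0.5, 0.3]) + rng.standard_normal(T)

def rss(Z, y):
    """Residual sum of squares from regressing y on the columns of Z."""
    b = np.linalg.solve(Z.T @ Z, Z.T @ y)
    e = y - Z @ b
    return e @ e

USS = rss(X, y)                                    # unrestricted sum of squares
RSS = rss(X2, y)                                   # restricted sum of squares (beta_1 = 0)
F_alt = ((RSS - USS) / k1) / (USS / (T - k1 - k2)) # the statistic of (46)

# Direct form of (40), with the hypothesised value beta_1* = 0.
P2 = X2 @ np.linalg.inv(X2.T @ X2) @ X2.T
b1 = np.linalg.solve(X.T @ X, X.T @ y)[:k1]        # subvector beta_1-hat of the full estimate
sigma2_hat = USS / (T - k1 - k2)
F_direct = b1 @ X1.T @ (np.eye(T) - P2) @ X1 @ b1 / (k1 * sigma2_hat)

print(np.isclose(F_alt, F_direct))                 # True

The two numbers agree exactly, which is the content of equation (47).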
The Partitioned Inverse and Associated Identities
The first objective is to derive the formula for the partitioned inverse of X′X that has been given in equation (38). Write

(50) [A  B′ ; B  C] = [X₁′X₁  X₁′X₂ ; X₂′X₁  X₂′X₂]⁻¹

and consider the equation

(51) [X₁′X₁  X₁′X₂ ; X₂′X₁  X₂′X₂] [A  B′ ; B  C] = [I  0 ; 0  I].
From this system, the following two equations can be extracted:
(52) X₁′X₁A + X₁′X₂B = I,
(53) X₂′X₁A + X₂′X₂B = 0.
To isolate A, equation (53) is premultiplied by X₁′X₂(X₂′X₂)⁻¹ to give

(54) X₁′X₂(X₂′X₂)⁻¹X₂′X₁A + X₁′X₂B = 0,
and this is taken from (52) to give
(55) {X₁′X₁ − X₁′X₂(X₂′X₂)⁻¹X₂′X₁}A = I,

whence

(56) A = {X₁′(I − P₂)X₁}⁻¹   with   P₂ = X₂(X₂′X₂)⁻¹X₂′.
An argument of symmetry will serve to show that
(57) C = {X₂′(I − P₁)X₂}⁻¹   with   P₁ = X₁(X₁′X₁)⁻¹X₁′.
To find B, and therefore B′, the expression for A from (56) is substituted into (53). This gives

(58) X₂′X₁{X₁′(I − P₂)X₁}⁻¹ + X₂′X₂B = 0,
whence
(59) B′ = −{X₁′(I − P₂)X₁}⁻¹X₁′X₂(X₂′X₂)⁻¹.
The matrix B is the transpose of this, but an argument of symmetry will serve
to show that this is also given by the expression
(60) B = −{X₂′(I − P₁)X₂}⁻¹X₂′X₁(X₁′X₁)⁻¹.
When the expressions for A, B, B′ and C are put in place, the result is

(61) [X₁′X₁  X₁′X₂ ; X₂′X₁  X₂′X₂]⁻¹ = [A  B′ ; B  C]
   = [ {X₁′(I − P₂)X₁}⁻¹                     −{X₁′(I − P₂)X₁}⁻¹X₁′X₂(X₂′X₂)⁻¹ ;
       −{X₂′(I − P₁)X₂}⁻¹X₂′X₁(X₁′X₁)⁻¹      {X₂′(I − P₁)X₂}⁻¹ ].
Next consider
(62) X(X′X)⁻¹X′ = [X₁  X₂] [A  B′ ; B  C] [X₁′ ; X₂′]
   = {X₁AX₁′ + X₁B′X₂′} + {X₂CX₂′ + X₂BX₁′}.
Substituting the expressions for A, B, C and B′ shows that

(63) X(X′X)⁻¹X′ = X₁{X₁′(I − P₂)X₁}⁻¹X₁′(I − P₂) + X₂{X₂′(I − P₁)X₂}⁻¹X₂′(I − P₁),
which can be written as
(64) P = P_{1/2} + P_{2/1}   with

(65) P_{1/2} = X₁{X₁′(I − P₂)X₁}⁻¹X₁′(I − P₂),
(66) P_{2/1} = X₂{X₂′(I − P₁)X₂}⁻¹X₂′(I − P₁).
Next, observe that there are
(67) P_{1/2}P₁ = P₁,   P_{1/2}P₂ = 0,
(68) P_{2/1}P₂ = P₂,   P_{2/1}P₁ = 0.
It follows that
(69) PP₁ = (P_{1/2} + P_{2/1})P₁ = P₁ = P₁P,

where the final equality follows in consequence of the symmetry of P₁ and P.
It also follows, by an argument of symmetry, that

(70) P(I − P₁) = P − P₁ = (I − P₁)P.

Therefore, since (I − P₁)P = (I − P₁)P_{2/1}, there is

(71) P − P₁ = (I − P₁)X₂{X₂′(I − P₁)X₂}⁻¹X₂′(I − P₁).

By interchanging the two subscripts, the identity of (49) is derived.
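The identities of this section lend themselves to a direct numerical check. The sketch below (numpy assumed; an arbitrary simulated design) verifies the decomposition P = P_{1/2} + P_{2/1} of (64) to (66) and the identity of (71).

# Numerical check of P = P_{1/2} + P_{2/1} from (64) and of the identity (71).
import numpy as np

rng = np.random.default_rng(6)
T, k1, k2 = 30, 2, 3
X1 = rng.standard_normal((T, k1))
X2 = rng.standard_normal((T, k2))
X = np.hstack([X1, X2])
I = np.eye(T)

def proj(Z):
    """Orthogonal projector onto the column space of Z."""
    return Z @ np.linalg.inv(Z.T @ Z) @ Z.T

P, P1, P2 = proj(X), proj(X1), proj(X2)
P12 = X1 @ np.linalg.inv(X1.T @ (I - P2) @ X1) @ X1.T @ (I - P2)   # (65)
P21 = X2 @ np.linalg.inv(X2.T @ (I - P1) @ X2) @ X2.T @ (I - P1)   # (66)

print(np.allclose(P, P12 + P21))                                    # (64)
rhs = (I - P1) @ X2 @ np.linalg.inv(X2.T @ (I - P1) @ X2) @ X2.T @ (I - P1)
print(np.allclose(P - P1, rhs))                                     # (71)

Both checks print True for any full-rank choice of the matrices X₁ and X₂.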