Wald, Likelihood Ratio, and Lagrange Multiplier Tests in Econometrics (Engle, 1984)
Contents
1. Introduction 776
2. Definitions and intuitions 776
3. A general formulation of Wald, Likelihood Ratio, and Lagrange
Multiplier tests 780
4. Two simple examples 785
5. The linear hypothesis in generalized least squares models 788
5.1. The problem 788
5.2. The test statistics 790
5.3. The inequality 792
5.4. A numerical example 793
5.5. Instrumental variables 794
6. Asymptotic equivalence and optimality of the test statistics 796
7. The Lagrange Multiplier test as a diagnostic 801
8. Lagrange Multiplier tests for non-spherical disturbances 802
8.1. Testing for heteroscedasticity 803
8.2. Serial correlation 805
9. Testing the specification of the mean in several complex models 808
9.1. Testing for non-linearities 809
9.2. Testing for common factor dynamics 811
9.3. Testing for exogeneity 812
9.4. Discrete choice and truncated distributions 817
10. Alternative testing procedures 819
11. Non-standard situations 822
12. Conclusion 824
References 825
1. Introduction
\alpha = \Pr(y \in C \mid \theta \in \Theta_0). (1)
778 R. F. Engle
Notice that although the power will generally depend upon the unknown parameter \theta, the size usually does not. In most problems where the null hypothesis is composite (includes more than one possible value of \theta), the class of tests is restricted to those where the size does not depend upon the particular value of \theta \in \Theta_0. Such tests are called similar tests.
Frequently, there are no tests whose size is calculable exactly or whose size is
independent of the point chosen within the null parameter space. In these cases,
the investigator may resort to asymptotic criteria of optimality for tests. Such an
approach may produce tests which have good finite sample properties and in fact,
if there exist exact tests, the asymptotic approach will generally produce them. Let C_T be a sequence of critical regions, perhaps defined by a sequence of vectors of statistics s_T(y) \ge c_T, where c_T is a sequence of constant vectors. Then the limiting size and power of the test are simply:

\alpha = \lim_{T\to\infty} \Pr(y \in C_T \mid \theta \in \Theta_0), \qquad \pi(\theta) = \lim_{T\to\infty} \Pr(y \in C_T \mid \theta \in \Theta_1).

A test is called consistent if \pi(\theta) = 1 for all \theta \in \Theta_1. That is, a consistent test will always reject the null when it is false; Type II errors are eliminated for large samples if a test is consistent.
As most hypothesis tests are consistent, it remains important to choose among
them. This is done by examining the rate at which the power function approaches
its limiting value. The most common limiting argument is to consider the power
of the test to distinguish alternatives which are very close to the null. As the
sample grows, alternatives ever closer to the null can be detected by the test. The
power against such local alternatives for tests of fixed asymptotic size provides the
major criterion for the optimality of asymptotic tests.
The vast majority of all testing problems in econometrics can be formulated in terms of a partition of the parameter space into two sub-vectors \theta = (\theta_1', \theta_2')', where the null hypothesis specifies values \theta_1^0 for \theta_1 but leaves \theta_2 unconstrained. In a normal testing problem, \theta_1 might be the mean and \theta_2 the variance, or in a regression context, \theta_1 might be several of the parameters while \theta_2 includes the rest: the variance and the serial correlation coefficient, if the model has been estimated by Cochrane-Orcutt. Thus \theta_1 includes the parameters of interest in the test.
In this context, the null hypothesis is simply:

H_0: \theta_1 = \theta_1^0, \qquad \theta_2 unrestricted,

and the local alternatives most often considered are:

H_1: \theta_1 = \theta_1^0 + \delta/\sqrt{T}, \qquad \theta_2 unrestricted,

for some vector \delta. Although this alternative is obviously rather peculiar, it serves
to focus attention on the portion of the power curve which is most sensitive to the
quality of the test. The choice of \delta determines in what direction the test will seek departures from the null hypothesis. Frequently, the investigator will choose a test which is equally good in all directions \delta, called an invariant test.
It is in this context that the optimality of the likelihood ratio test can be
established as is done in Section 6. It is asymptotically locally most powerful
among all invariant tests. Frequently in this chapter the term asymptotically
optimal will be used to refer to this characterization. Any tests which have the
property that asymptotically they always agree if the data are generated by the
null or by a local alternative, will be termed asymptotically equivalent. Two tests
\xi_1 and \xi_2 with the same critical values will be asymptotically equivalent if plim |\xi_1 - \xi_2| = 0 under the null and local alternatives.
Frequently in testing problems non-linear hypotheses such as g(\theta) = 0 are considered, where g is a p \times 1 vector of functions defined on \Theta. Letting the true value of \theta under the null be \theta^0, then g(\theta^0) = 0. Assuming g has continuous first derivatives, expand this in a Taylor series:

g(\theta) = g(\theta^0) + G(\bar\theta)(\theta - \theta^0),

where \bar\theta lies between \theta and \theta^0 and G(\cdot) is the first derivative matrix of g. For the null and local alternatives, \theta approaches \theta^0, so G(\bar\theta) \to G(\theta^0) = G, and the restriction is simply the linear hypothesis:

G\theta = G\theta^0.

For any linear hypothesis one can always reparameterize by a linear non-singular matrix A, \phi = A^{-1}\theta, such that the null becomes H_0: \phi_1 = \phi_1^0, \phi_2 unrestricted. To do this, let A_2 have K - p columns in the orthogonal complement of G so that GA_2 = 0. The remaining p columns of A, say A_1, span the row space of G so that GA_1 is non-singular. Then the null becomes:

G\theta = GA\phi = GA_1\phi_1 = GA_1\phi_1^0,

which holds if and only if \phi_1 = \phi_1^0.
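This construction can be sketched numerically. The example below uses a hypothetical 1 x 3 restriction matrix G (not from the chapter), and obtains the orthogonal complement from the right singular vectors of G, which is one standard way to build A:

```python
import numpy as np

# Hypothetical restriction g(theta) = theta_1 - theta_2 = 0, so G is 1 x 3
G = np.array([[1.0, -1.0, 0.0]])
p, K = G.shape

# Right singular vectors: the first p rows of Vt span the row space of G,
# the remaining K - p rows span its orthogonal complement.
_, _, Vt = np.linalg.svd(G)
A1 = Vt[:p].T     # K x p, spans the row space of G
A2 = Vt[p:].T     # K x (K - p), satisfies G @ A2 = 0
A = np.hstack([A1, A2])   # non-singular, so phi = inv(A) @ theta is well defined

assert np.allclose(G @ A2, 0)                  # GA2 = 0
assert np.linalg.matrix_rank(G @ A1) == p      # GA1 non-singular
```

Because the singular vectors are orthonormal, the resulting A is in fact orthogonal, which is a convenient (though not required) choice of reparameterization.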
3. A general formulation of Wald, Likelihood Ratio, and Lagrange Multiplier tests

In this section the basic forms of the three tests will be given and interpreted. Most of this material is familiar in the econometrics literature in Breusch and Pagan (1980), Savin (1976), and Berndt and Savin (1977). Some new results and
intuitions will be offered. Throughout it will be assumed that the likelihood
function satisfies standard regularity conditions which allow two term Taylor
series expansions and the interchange of integral and derivative. In addition, it
will be assumed that the information matrix is non-singular, so that the parameters are (locally) identified.
The simplest testing problem assumes that the data y are generated by a joint density function f(y, \theta^0) under the null hypothesis and by f(y, \theta), \theta \in R^k, under the alternative. This is a test of a simple null against a composite alternative. The log-likelihood is defined as:

L(\theta, y) = \log f(y, \theta), (6)

which is maximized at a value \hat\theta satisfying:

\partial L(\hat\theta, y)/\partial\theta = 0. (7)

Defining s(\theta, y) = \partial L(\theta, y)/\partial\theta as the score, the MLE sets the score to zero. The variance of \hat\theta is easily calculated as the inverse of Fisher's information,

V(\hat\theta) = \mathcal{I}^{-1}(\hat\theta)/T, \qquad \mathcal{I}(\theta) = -\frac{1}{T} E\Big[\frac{\partial^2 L(\theta, y)}{\partial\theta\,\partial\theta'}\Big].

Under general conditions, the statistic:

\xi_W = T(\hat\theta - \theta^0)'\mathcal{I}(\hat\theta)(\hat\theta - \theta^0) (8)
will have a limiting \chi^2 distribution with k degrees of freedom when the null
hypothesis is true. This is the Wald test based upon Wald’s elegant (1943) analysis
of the general asymptotic testing problem. It is the asymptotic approximation to
the very familiar t and F tests in econometrics.
The likelihood ratio test is based upon the difference between the maximum of
the likelihood under the null and under the alternative hypotheses. Under general
conditions, the statistic:

\xi_{LR} = -2[ L(\theta^0, y) - L(\hat\theta, y) ] (9)

Ch. 13: Wald, Likelihood Ratio, and Lagrange Multiplier Tests 781

can be shown to have a limiting \chi^2 distribution under the null. Perhaps Wilks (1938) was the first to derive this general limiting distribution.
The Lagrange Multiplier test is derived from a constrained maximization
principle. Maximizing the log-likelihood subject to the constraint that 8 = 0’
yields a set of Lagrange Multipliers which measure the shadow price of the
constraint. If the price is high, the constraint should be rejected as inconsistent
with the data. Letting H be the Lagrangian:

H = L(\theta, y) - \lambda'(\theta - \theta^0),

the first-order conditions are:

\partial L/\partial\theta = \lambda, \qquad \theta = \theta^0,

so \hat\lambda = s(\theta^0, y). Thus the test based upon the Lagrange Multipliers by Aitchison and Silvey (1958) and Silvey (1959) is identical to that based upon the score as originally proposed by Rao (1948). In each case the distribution of the score is easily found under the null, since it will have mean zero and variance T\mathcal{I}(\theta^0).
Assuming a central limit theorem applies to the scores, the statistic:

\xi_{LM} = s(\theta^0, y)'\mathcal{I}^{-1}(\theta^0)s(\theta^0, y)/T (10)

will also have a limiting \chi^2 distribution with k degrees of freedom under the null.

[Figure 3.1: the log-likelihood as a function of \theta, illustrating the three measures of distance between the hypotheses.]
The MLE under the alternative is \hat\theta and the hypothesized value is \theta^0. The Wald test is based upon the horizontal difference between \theta^0 and \hat\theta, the LR test is based upon the vertical difference, and the LM test is based on the slope of the likelihood function at \theta^0. Each is a reasonable measure of the distance between H_0 and H_1, and it is not surprising that when L is a smooth curve well approximated by a quadratic, they all give the same test. This is established in Lemma 1.
Lemma 1

If L = b - (1/2)(\theta - \tilde\theta)'A(\theta - \tilde\theta), where A is a symmetric positive definite matrix which may depend upon the data and upon known parameters, b is a scalar, and \tilde\theta is a function of the data, then the W, LR and LM tests are identical.

Proof

\partial L/\partial\theta = -(\theta - \tilde\theta)'A = s(\theta)', \qquad \partial^2 L/\partial\theta\,\partial\theta' = -A = -T\mathcal{I}.

Thus:

\xi_W = (\theta^0 - \tilde\theta)'A(\theta^0 - \tilde\theta),
\xi_{LM} = s(\theta^0, y)'A^{-1}s(\theta^0, y) = (\theta^0 - \tilde\theta)'A(\theta^0 - \tilde\theta),
\xi_{LR} = (\theta^0 - \tilde\theta)'A(\theta^0 - \tilde\theta). Q.E.D.
Whenever the true value of \theta is equal or close to \theta^0, the likelihood function in the neighborhood of \theta^0 will be approximately quadratic for large samples, with A depending only on \theta^0. This is the source of the asymptotic equivalence of the tests for local alternatives and under the null, which will be discussed in more detail in Section 6.
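Lemma 1 can be checked numerically in a case where the log-likelihood is exactly quadratic: the mean of a normal sample with known variance. The data and parameter values below are made up for illustration and are not from the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)
T, sigma2, theta0 = 50, 1.0, 0.0
y = rng.normal(0.3, np.sqrt(sigma2), T)   # true mean 0.3, so H0 is false

theta_hat = y.mean()          # MLE of the mean
info = 1.0 / sigma2           # per-observation Fisher information

def loglik(theta):
    # log-likelihood up to an additive constant, which cancels in the LR test
    return -0.5 * np.sum((y - theta) ** 2) / sigma2

score0 = np.sum(y - theta0) / sigma2      # score evaluated at theta0

W  = T * (theta_hat - theta0) ** 2 * info         # eq. (8)
LR = 2.0 * (loglik(theta_hat) - loglik(theta0))   # eq. (9)
LM = score0 ** 2 / (T * info)                     # eq. (10)

# The log-likelihood is exactly quadratic here, so the three coincide.
assert np.allclose([W, LR], LM)
```

With a non-quadratic likelihood the three values would differ in finite samples and agree only asymptotically.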
In the more common case where the null hypothesis is composite, so that only a subset of the parameters is fixed under the null, similar formulae for the test statistics are available. Let \theta = (\theta_1', \theta_2')' and \hat\theta = (\hat\theta_1', \hat\theta_2')', where \theta_1 is a k_1 \times 1 vector of parameters specified under the null hypothesis to be \theta_1^0. The remaining parameters \theta_2 are unrestricted under both the null and the alternative. The maximum likelihood estimate of \theta_2 under the null is denoted \tilde\theta_2, and \tilde\theta = (\theta_1^{0\prime}, \tilde\theta_2')'. The Lagrangian is:

H = L(\theta, y) - \lambda'(\theta_1 - \theta_1^0).

Thus the three statistics become:

\xi_W = T(\hat\theta_1 - \theta_1^0)'[\mathcal{I}^{11}(\hat\theta)]^{-1}(\hat\theta_1 - \theta_1^0), (11)

\xi_{LR} = -2[ L(\tilde\theta, y) - L(\hat\theta, y) ], (12)

\xi_{LM} = s(\tilde\theta, y)'\mathcal{I}^{-1}(\tilde\theta)s(\tilde\theta, y)/T, (13)

where \mathcal{I}^{11} denotes the upper-left k_1 \times k_1 block of \mathcal{I}^{-1}.
Lemma 2
If the likelihood function is given as in Lemma 1 then the tests in (ll), (12), and
(13) are identical.
Proof
\xi_W = (\theta_1^0 - \hat\theta_1)'[A^{11}]^{-1}(\theta_1^0 - \hat\theta_1)
= (\theta_1^0 - \hat\theta_1)'(A_{11} - A_{12}A_{22}^{-1}A_{21})(\theta_1^0 - \hat\theta_1),

where A^{11} is the upper-left block of A^{-1}.
For the other two tests, \theta_2 must be estimated. This is done simply by setting s_2(\tilde\theta, y) = 0. With L quadratic as in Lemma 1, s(\theta) = A(\hat\theta - \theta), so at \tilde\theta = (\theta_1^{0\prime}, \tilde\theta_2')':

s_1 = A_{11}(\hat\theta_1 - \theta_1^0) + A_{12}(\hat\theta_2 - \tilde\theta_2),
s_2 = A_{21}(\hat\theta_1 - \theta_1^0) + A_{22}(\hat\theta_2 - \tilde\theta_2) = 0.

So s_2 = 0 implies \hat\theta_2 - \tilde\theta_2 = -A_{22}^{-1}A_{21}(\hat\theta_1 - \theta_1^0), and therefore:

s_1(\tilde\theta) = (A_{11} - A_{12}A_{22}^{-1}A_{21})(\hat\theta_1 - \theta_1^0),

so, since s_2(\tilde\theta) = 0:

\xi_{LM} = s_1(\tilde\theta)'(A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1}s_1(\tilde\theta)
= (\theta_1^0 - \hat\theta_1)'(A_{11} - A_{12}A_{22}^{-1}A_{21})(\theta_1^0 - \hat\theta_1). Q.E.D.
Examination of the tests in (11), (12), and (13) indicates that neither the test statistic nor its limiting distribution under the null depends upon the value of the nuisance parameters \theta_2. Thus the tests are (asymptotically) similar. It is apparent from the form of the tests, as well as the proof of the lemma, that an alternative way to derive the tests is to first concentrate the likelihood function with respect to \theta_2 and then apply the test for a simple null directly. This approach makes clear that by construction the tests will not depend upon the true value of the nuisance
parameters. If the parameter vector has a joint normal limiting distribution, then
the marginal distribution with respect to the parameters of interest will also be
normal and the critical region will not depend upon the nuisance parameters
either. Under general conditions therefore, the Wald, Likelihood Ratio and
Lagrange Multiplier tests will be (asymptotically) similar.
As was described above, each of the tests can be thought of as depending on a
statistic which measures deviations between the null and alternative hypotheses,
and its distribution when the null is true. For example, the LM test is based upon the score, whose limiting distribution is generally normal with mean zero and variance T\mathcal{I}(\theta^0) under the null. However, it is frequently easier to obtain the limiting distribution of the score in some other fashion and base the test on this. If a matrix V can be found so that:

T^{-1/2} s(\theta^0, y) \to_d N(0, V),

then the test can be based on \xi = s(\theta^0, y)'V^{-1}s(\theta^0, y)/T. Under certain non-standard situations V may not equal \mathcal{I}, but in general it will.
This is the approach taken by Engle (1982) which gives some test statistics very
easily in complex problems.
4. Two simple examples

In these two examples, exact tests are available for comparison with the asymptotic tests under consideration.

Consider a set of T independent observations on a Bernoulli random variable which takes on the values:

y_t = 1 with probability \theta, \qquad y_t = 0 with probability 1 - \theta. (14)

The log-likelihood is:

L(\theta, y) = \sum_t [ y_t \log\theta + (1 - y_t)\log(1 - \theta) ],

and the score is:

s(\theta, y) = \sum_t (y_t - \theta)/[\theta(1 - \theta)].
Notice that y_t - \theta is analogous to the "residual" of the fit. The information is:

\mathcal{I}(\theta) = E\Big[ \frac{T\theta(1-\theta) + (1-2\theta)\sum_t (y_t - \theta)}{T\theta^2(1-\theta)^2} \Big] = \frac{1}{\theta(1-\theta)}.
With \hat\theta = \bar y the MLE, the Wald test is given by:

\xi_W = \frac{T(\hat\theta - \theta^0)^2}{\hat\theta(1 - \hat\theta)},

and the LM test is:

\xi_{LM} = \frac{[\sum_t (y_t - \theta^0)]^2}{T\theta^0(1 - \theta^0)},

which is simply:

\xi_{LM} = \frac{T(\hat\theta - \theta^0)^2}{\theta^0(1 - \theta^0)}.

Both clearly have a limiting chi-square distribution with one degree of freedom.
They differ in that the LM test uses an estimate of the variance under the null
whereas the Wald uses an estimate under the alternative. When the null is true (or
a local alternative) these will have the same probability limit and thus for large
samples the tests will be equivalent. If the alternative is not close to the null, then
presumably both tests would reject with very high probability for large samples;
the asymptotic behavior of tests for non-local alternatives is usually not of
particular interest.
The likelihood ratio test statistic is given by:

\xi_{LR} = 2T\Big[ \hat\theta \log\frac{\hat\theta}{\theta^0} + (1 - \hat\theta)\log\frac{1 - \hat\theta}{1 - \theta^0} \Big].

The notion of how large it should be will be determined from the exact Binomial tables.
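The three Bernoulli statistics can be computed directly. The sample below is hypothetical (the sample size, null value, and true parameter are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
theta0 = 0.5                                   # null value
y = (rng.random(100) < 0.6).astype(float)      # 100 draws with true parameter 0.6
T = y.size
th = y.mean()                                  # MLE of theta

# Wald uses the variance estimated under H1, LM the variance under H0
W  = T * (th - theta0) ** 2 / (th * (1 - th))
LM = T * (th - theta0) ** 2 / (theta0 * (1 - theta0))
LR = 2 * T * (th * np.log(th / theta0)
              + (1 - th) * np.log((1 - th) / (1 - theta0)))

# Each statistic is compared with chi-squared(1) critical values, e.g. 3.84 at 5%
```

The only difference between W and LM here is which variance estimate appears in the denominator, exactly as the text describes.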
The second example is more useful to economists but has a similar result. In the classical linear regression problem the test statistics are different; however, when corrected to have the same size, they are identical for finite samples as well as asymptotically.

Let y* and x* be T \times 1 and T \times k matrices satisfying:

y^* = x^*\beta + \epsilon, \qquad \epsilon \sim IN(0, \sigma^2 I), (21)

where IN means independent normal. If instead the observations were only normal conditional on past information, the likelihood would be identical but the test statistics would not be proportional to an F-distributed random variable.
Thus, inclusion of lagged dependent variables or other predetermined variables
would bring asymptotic criteria to the forefront in choosing a test statistic and
any of the three would be reasonable candidates as would the standard F
approximations. Similarly, if the distribution of y is not known to be normal, a
central limit theorem will be required to find the distribution of the test statistics
and therefore only asymptotic tests will be available.
5. The linear hypothesis in generalized least squares models

5.1. The problem

The important case to be discussed in this section is testing a linear hypothesis when the model is a generalized least squares model with unknown parameters in the covariance matrix. Suppose:

y = x\beta + u, \qquad u \sim N(0, \sigma^2\Omega(\omega)), (29)

where \omega is a finite estimable parameter vector. The model has been formulated so that the hypothesis to be tested is H_0: \beta_1 = 0, where \beta = (\beta_1', \beta_2')' and x is conformably partitioned as x = (x_1, x_2). The collection of parameters is now \theta = (\beta_1', \beta_2', \sigma^2, \omega')'.
A large number of econometric problems fit into this framework. In simple
linear regression the standard heteroscedasticity and serial correlation covariance
matrices have this form. More generally if ARMA processes are assumed for the
disturbances or they are fit with spectral methods assuming only a general
stationary structure as in Engle (1980), the same analysis will apply. From pooled
time series of cross sections, variance component structures often arise which have
this form. To an extent which is discussed below, instrumental variables estimation can be described in this framework. Letting X be the matrix of all instruments, X(X'X)^{-1}X' has no unknown parameters but acts like a singular
covariance matrix. Because it is an idempotent matrix, its generalized inverse is
just the matrix itself, and therefore many of the same results will apply.
For systems of equations, a similar structure is often available. By stacking the
dependent variables in a single dependent vector and conformably stacking the
independent variables and the coefficient vectors, the covariance matrix of a
seemingly unrelated regression problem (SUR) will have a form satisfied by (29). In terms of tensor products this covariance matrix is \Omega = \Sigma \otimes I, where \Sigma is the contemporaneous covariance matrix. Of course more general structures are also appropriate. The three stage least squares estimator is also closely related to this analysis, with covariance matrix \Omega = \Sigma \otimes X(X'X)^{-1}X'.
Under these assumptions it can be shown that the information matrix is block
diagonal between the parameters /3 and (a*, 0). Therefore attention can be
confined to the \beta components of the score and information. These are given by:

s_\beta(\theta, y) = x'\Omega^{-1}(y - x\beta)/\sigma^2, \qquad \mathcal{I}_{\beta\beta}(\theta) = x'\Omega^{-1}x/T\sigma^2. (35)
The Wald statistic can be recognized as simply the F or squared t statistic
commonly computed by a GLS regression (except for finite sample degrees-of-freedom corrections). This illustrates that for testing one parameter, the square root of these statistics with the appropriate sign would be the best statistic, since it would allow one-tailed tests if these are desired.
It is well known that the Wald test statistic can be calculated by running two regressions, just as in (26). Care must however be taken to use the same metric (estimate of \Omega) for both the restricted and the unrestricted regressions. The residuals from the unrestricted regression using \hat\Omega as the covariance matrix are the \hat u; however, the residuals from the restricted regression using \hat\Omega are not \tilde u. Let them be denoted u^{01}, indicating the model under H_0 with the covariance matrix under H_1. Thus, u^{01} = y - x_2\beta_2^{01} is calculated taking \hat\Omega as a known matrix. The Wald statistic can equivalently be written as:

\xi_W = T(u^{01\prime}\hat\Omega^{-1}u^{01} - \hat u'\hat\Omega^{-1}\hat u)/\hat u'\hat\Omega^{-1}\hat u. (36)
The LM statistic can also be written in several different forms, some of which may be particularly convenient. Three different versions will be given below. The first is:

\xi_{LM} = \tilde u'\tilde\Omega^{-1}x(x'\tilde\Omega^{-1}x)^{-1}x'\tilde\Omega^{-1}\tilde u/\tilde\sigma^2, \qquad \tilde\sigma^2 = \tilde u'\tilde\Omega^{-1}\tilde u/T. (37)

Because \tilde u'\tilde\Omega^{-1}x_2 = 0 by the definition of \tilde\beta_2, the LM statistic is more simply written as:

\xi_{LM} = TR_{\tilde u}^2, (38)

where R_{\tilde u}^2 is the uncentered R^2 of a regression of \tilde u on x using \tilde\Omega as the metric. In most cases and for most computer packages, R_{\tilde u}^2 will be the conventionally measured R^2. In particular, when Px includes an intercept under H_0, then P\tilde u will have a zero mean, so that the centered and uncentered sums of squares will be equal. Thus, if the software first transforms the data by P, the R^2 will be R_{\tilde u}^2.
A second way to rewrite the LM statistic is available along the lines of (27). Let u^{10} be the residuals from a regression of y on the unrestricted model using \tilde\Omega as the covariance matrix, so that u^{10} = y - x\beta^{10}. Then the LM statistic is simply:

\xi_{LM} = T(\tilde u'\tilde\Omega^{-1}\tilde u - u^{10\prime}\tilde\Omega^{-1}u^{10})/\tilde u'\tilde\Omega^{-1}\tilde u. (39)
A statistic which differs only slightly from the LM statistic comes naturally out of the auxiliary regression. The squared t or F statistics associated with the variables x_1 in the auxiliary regression of \tilde u on x using \tilde\Omega are of interest. Letting:

A = x_1'\tilde\Omega^{-1}x_1 - x_1'\tilde\Omega^{-1}x_2(x_2'\tilde\Omega^{-1}x_2)^{-1}x_2'\tilde\Omega^{-1}x_1,

then:

\beta^{10} = (x'\tilde\Omega^{-1}x)^{-1}x'\tilde\Omega^{-1}\tilde u,

or, for the first elements, \beta_1^{10} = A^{-1}x_1'\tilde\Omega^{-1}\tilde u. The F statistic, aside from degrees-of-freedom corrections, is given by:

\xi_{LM}^* = \beta_1^{10\prime}A\beta_1^{10}/\sigma^{2(10)}
= \tilde u'\tilde\Omega^{-1}x_1A^{-1}x_1'\tilde\Omega^{-1}\tilde u/\sigma^{2(10)}, (40)
where \sigma^{2(10)} is the residual variance from this estimation. From (35) it is clear that \xi_{LM} = \xi_{LM}^* if \sigma^{2(10)} = \tilde\sigma^2. The tests will differ when x_1 explains some of \tilde u, that is, when H_0 is not true. Hence, under the null and local alternatives, these two variances will have the same probability limit and therefore the tests will have the same limiting distribution. Furthermore, adding a linear combination of regressors to both sides of a regression will not change the coefficients or the significance of other regressors. In particular, adding x_2\tilde\beta_2 to both sides of the auxiliary regression converts the dependent variable to y and yet will not change \xi_{LM}^*. Hence, the t or F tests obtained from regressing y on x_1 and x_2 using \tilde\Omega will be asymptotically equivalent to the LM test.
The relationship between the Wald and LM tests in this context is now clearly visible in terms of the choice of \Omega to use for the test. The Wald test uses \hat\Omega, the LM test uses \tilde\Omega, and the Likelihood Ratio test uses both. As the properties of the tests differ only for finite samples, frequently computational considerations will determine which to use. The primary computational differences stem from the estimation of \Omega, which may require non-linear or other iterative procedures. It may further require some specification search over a class of possible disturbance specifications. The issue therefore hinges upon whether \tilde\Omega or \hat\Omega is already available from previous calculations. If the null hypothesis has already been estimated and the investigator is trying to determine whether an additional variable belongs in the model in the spirit of diagnostic testing, then \tilde\Omega is already estimated and the LM test is easier. If on the other hand the more general model has been estimated, and the test is for a simplification or a test of a theory which predicts the importance of some variable, then \hat\Omega is available and the Wald test is easier. Only in rare cases will the LR test be computationally easier.
5.3. The inequality

The three test statistics differ for finite samples but are asymptotically equivalent. When the critical regions are calculated from the limiting distributions, there may be conflicts in inference between the tests. The surprising character of this conflict is pointed out by a numerical inequality among the test statistics. It was originally established by Savin (1976) and Berndt and Savin (1977) for special cases of (29), and then by Breusch (1979) in the general case of (29). For any data set y, x, the three test statistics will satisfy the following inequality:

\xi_W \ge \xi_{LR} \ge \xi_{LM}. (41)
Therefore, whenever the LM test rejects, so will the others, and whenever the W fails to reject, so do the others. The inequality, however, has nothing to say about the relative merits of the tests because it applies under the null as well. That is, if the Wald test has a size of 5%, then the LR and LM tests will have a size less than 5%. Hence their apparently inferior power performance is simply a result of a more conservative size. When the sizes are corrected to be the same, there is no longer a simple inequality relationship on the powers. As mentioned earlier, both Rothenberg (1979) and Evans and Savin (1982) present results showing that when the sizes are approximately corrected, the powers are approximately the same.
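The inequality in (41) can be verified directly in the classical regression case, where each statistic is a monotone function of a = RSS_0/RSS_1 (the ratio of restricted to unrestricted residual sums of squares): W = T(a - 1), LR = T log a, LM = T(1 - 1/a), and a - 1 \ge log a \ge 1 - 1/a for any a \ge 1. The design below is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 40
x = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = x @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=T)

def rss(X):
    """Residual sum of squares from an OLS regression of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ b) ** 2)

rss1 = rss(x)           # unrestricted model
rss0 = rss(x[:, :1])    # restricted model: both slopes set to zero

W  = T * (rss0 - rss1) / rss1   # variance estimated under H1
LR = T * np.log(rss0 / rss1)
LM = T * (rss0 - rss1) / rss0   # variance estimated under H0

assert W >= LR >= LM            # the Berndt-Savin-Breusch inequality
```

Because the ordering is algebraic, it holds for every data set, which is exactly why it says nothing about relative power once sizes are corrected.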
The estimate is not particularly good, but it has the right signs and significant t-statistics. Rho was estimated by searching over the unit interval, and the estimate is maximum likelihood.

The residuals from this estimate were then regressed upon the expanded set of regressors, to obtain:

The same value of rho was imposed upon this estimate. The Lagrange Multiplier statistic is TR^2 = (22)(0.171) = 3.76, which is slightly below the 95% critical value of \chi_1^2 (3.84) but above the 90% value (2.71), so it rejects at the 90% level but not at the 95% level. Notice that the t-statistic on time is not significant at the 95% level but is at the 90% level.
\xi_{LM} = T(\tilde u'G\tilde u - \hat u'G\hat u)/\tilde u'\tilde u. (44)
Because G is idempotent, the two sums of squares in the numerator can be
calculated by regressing the corresponding residuals on X and looking at the
explained sums of squares. Their difference is also available as the difference
between the sums of squared residuals from the second stages of the relevant
2SLS regressions.
As long as the instrument list is unchanged from the null to the alternative
hypothesis, there is no difficulty formulating this test. If the list does change then
the Wald test appropriately uses the list under the alternative. One might suspect
that a similar LM test would be available using the more limited set of instruments; however, this is not the case, at least in this simple form. When the
instruments are different, the LM test can be computed as given in Engle (1979a)
but does not have the desired simple form.
In the more general case where (42) represents a stacked set of simultaneous equations, the covariance would in general be given by \Sigma \otimes I, where \Sigma is the contemporaneous covariance matrix. The instruments in the stacked system can be formulated as I \otimes X. Therefore, letting \hat\Sigma be the estimated covariance matrix under the alternative and G = \hat\Sigma^{-1} \otimes X(X'X)^{-1}X', the 3SLS estimator can be written as:

\hat\beta = (z'Gz)^{-1}z'Gy.

Again, through the equivalence with FIML, the approximate Wald test is:

\xi_W = \tilde u'G\tilde u - \hat u'G\hat u.
6. Asymptotic equivalence and optimality of the test statistics

In this section the asymptotic equivalence, the limiting distributions and the asymptotic optimality of the three test statistics will be established under the
conditions of Crowder (1976). These rather weak conditions allow some dependence of the observations and do not require that they be identically distributed. Most econometric problems will be encompassed under these assumptions. Although it is widely believed that these tests are optimal in some sense, the
discussion in this section is designed to establish their properties under a set of
regularity conditions.
The log likelihood function assumed by Crowder allows for general dependence
of the random variables and for some types of stochastic or deterministic
exogenous variables. Let Y_1, Y_2, \ldots, Y_T be p \times 1 vectors of random variables which have known conditional probability density functions f_t(Y_t \mid \mathcal{F}_{t-1}; \theta), where \theta \in \Theta, an open subset of R^k, and \mathcal{F}_{t-1} is the \sigma-field generated by Y_1, \ldots, Y_{t-1}, the "previous history". The log-likelihood conditional on Y_0 is:

L(\theta, y) = \sum_{t=1}^{T} \log f_t(Y_t \mid \mathcal{F}_{t-1}; \theta). (45)
also will. Notice that this result requires only that x be weakly exogenous; it need
not be strongly exogenous and can therefore depend upon past values of y.
The GLS models of Section 5 can now also be written in this framework.
Letting P’P = 52-l for any value of o, rewrite the model with y* = Py, x* = Px
so that:
y* 1x* - N( x*p,dz)
The parameters of interest are now /?, a2 and w. If the x were fixed constants,
then so will be the x*. If the x were stochastic strongly exogenous variables as
implied by (29), then so will be x *. The density h(x, $) will become h*(x*, rp, o)
but unless there is some strong a priori structure on h, w will not enter h*. If the
covariance structure is due to serial correlation, then rewriting the model conditional on the past will transform it directly into the Crowder framework, regardless of whether the model is already dynamic or not.
Based on (45), the score, Hessian and information matrix are defined by:

s_T(\theta, y) = \partial L(\theta, y)/\partial\theta, \qquad H_T(\theta, y) = \partial^2 L(\theta, y)/\partial\theta\,\partial\theta', \qquad \mathcal{I}_T(\theta) = -E[H_T(\theta, y)]/T.

Notice that the information matrix depends upon the sample size because the y_t's are not identically distributed.
The essential conditions assumed by Crowder guarantee, in particular, that the log-likelihood can be expanded about \hat\theta as:

L(\theta, y) = L(\hat\theta, y) - (T/2)(\theta - \hat\theta)'A_T(\theta, \hat\theta)(\theta - \hat\theta),

with score:

s_T(\theta, y) = -T A_T(\theta, \hat\theta)(\theta - \hat\theta). (48)
plim |\xi_{LR} - \xi_W| = plim | T(\theta^0 - \hat\theta)'(A_T(\theta^0, \hat\theta) - \mathcal{I}_T(\hat\theta))(\theta^0 - \hat\theta) |.

The plim of the middle term is zero under the null and under the sequence of local alternatives, since in both cases plim \hat\theta = \theta^0. The terms T^{1/2}(\hat\theta - \theta^0) will converge in distribution under both H_0 and H_1, and therefore the product converges in probability to zero under H_0 and H_1. Thus \xi_{LR} and \xi_W have the same limiting distributions. Similarly, from (48) and (10):
Theorem 1

Under the assumptions in Crowder (1976), the Wald, Likelihood Ratio and Lagrange Multiplier test statistics have the same limiting distribution when the null hypothesis or a local alternative is true.

Asymptotically, therefore, the log-likelihood can be written as:

L(\theta, y) = L(\hat\theta, y) - (T/2)(\theta - \hat\theta)'\mathcal{I}(\theta^0)(\theta - \hat\theta) + o_p(1), (49)
where the o_p(1) remainder vanishes in probability under H_0 and local alternatives. Thus, asymptotically the likelihood is exactly quadratic and
Lemmas 1 and 2 establish that the tests are all the same. Furthermore, (49)
establishes that \hat\theta is asymptotically sufficient for \theta. To see this more clearly, rewrite the joint density of y as:

f(y, \theta) = \exp\{ L(\hat\theta, y) - (T/2)(\theta - \hat\theta)'\mathcal{I}(\theta^0)(\theta - \hat\theta) + o_p(1) \},

and notice that by the factorization theorem, \hat\theta is sufficient for \theta as long as y does not enter the exponent, which will be true asymptotically.
Finally, because \hat\theta has a limiting normal distribution with a known covariance matrix \mathcal{I}^{-1}(\theta^0)/T, all the testing results for hypotheses on the mean vector of a multivariate normal now apply asymptotically by considering \hat\theta as the data.
To explore the nature of this optimality, suppose that the likelihood function in (49) is exact without the o_p(1) term. Then several results are immediately apparent. If \theta is one dimensional, uniformly most powerful (UMP) tests will exist against one-sided alternatives, and UMP unbiased (UMPU) tests will exist against two-sided alternatives. If \theta = (\theta_1, \theta_2), where \theta_1 is a scalar hypothesized to have value \theta_1^0 under H_0 but \theta_2 is unrestricted, then UMP similar or UMPU tests are available.
When \theta_1 is multivariate, an invariance criterion must be added. In testing the hypothesis \mu = 0 in the canonical model V \sim N(\mu, I), there is a natural invariance with respect to rotations of V. If \tilde V = DV, where D is an orthogonal matrix, then the testing problem is unchanged, so a test should be invariant to whether V or \tilde V is given. Essentially, this invariance says that the test should not depend on which order the V's are in; it should be equally sensitive to deviations in all directions. The maximally invariant statistic in this problem is V'V, which means that any test which is to be invariant can be based upon this statistic. Under the assumptions of the model, this will be distributed as \chi_k^2(\lambda) with non-centrality parameter \lambda = \mu'\mu. The Neyman-Pearson lemma therefore establishes that the uniformly most powerful invariant test would be based upon a critical region:

C = \{ V'V > c \}.
To rewrite (49) in this form, let T\mathcal{I}(\theta^0) = P'P and V = P(\hat\theta - \theta^0). Then the maximal invariant is:

V'V = T(\hat\theta - \theta^0)'\mathcal{I}(\theta^0)(\hat\theta - \theta^0).

Thus, any test which is invariant can be based on this statistic, and a uniformly most powerful invariant test would have a critical region of the form:

C = \{ \xi > c \}.
This argument applies directly to the Wald, Likelihood Ratio and LM tests.
Asymptotically the remainder term in the likelihood function vanishes for the null
hypothesis and for local alternatives. Hence, these tests can be characterized as
asymptotically locally most powerful invariant tests. This is the general optimality
property of such tests which often will be simply called asymptotic optimality.
For further details on these arguments the reader is referred to Cox and Hinkley (1974, chs. 5, 9), Lehmann (1959, chs. 4, 6, 7), and Ferguson (1967, chs. 4, 5).
In finite samples many tests derived from these principles will have stronger properties. For example, if a UMP test exists, the locally most powerful test will coincide with it. Because of the invariance properties of the likelihood function, this approach will automatically generate tests with most invariance properties, and all tests will be functions of sufficient statistics.
One further property of Lagrange Multiplier tests is useful, as it gives a general optimality result for finite samples. For testing H_0: \theta = \theta^0 against a local alternative H_1: \theta = \theta^0 + \delta, for \delta a vector of small numbers, the Neyman-Pearson lemma shows that the likelihood ratio is a sufficient statistic for the test. The likelihood ratio is:

\exp\{ L(\theta^0, y) - L(\theta^0 + \delta, y) \} \approx \exp\{ -s(\theta^0, y)'\delta \},

for small \delta, so the best test for local alternatives rejects for large values of s'\delta; it is based on a critical region:

C = \{ s'\delta > c \}.

In this case \delta chooses a direction. However, if invariance is desired, then the test would be based upon the scores in all directions:

C = \{ s(\theta^0)'\mathcal{I}^{-1}(\theta^0)s(\theta^0) > c \}.
The estimation of static models for time series data with the familiar low Durbin-Watson statistic was tolerated for many years, although now most applied workers make serial correlation corrections.
However, the next stage in generalization is to relax the “common factors”
restriction implicit in serial correlation assumptions [see Hendry and Mizon
(1980)] and estimate a dynamic model. Frequently, the economic implications will
be very different.
This discussion argues for the presentation of a variety of diagnostics from each regression. Overfitting the model in many different directions allows the investigator to immediately assess the quality and stability of his specification.
The Lagrange Multiplier test is ideal for many of these tests, as it is based upon parameters fit under the null which are therefore already available. In particular, the LM test can usually be written in terms of the residuals from the estimate under the null. Thus, it provides a way of checking the residuals for non-randomness. Each alternative considered indicates the particular type of non-randomness which might be expected.
Look for a moment at the LM test for omitted variables described in (37). The
test is based upon the R² of the regression of the residuals on the included and
potentially excluded variables. Thus, the test is based upon the squared partial
correlation coefficient between the residuals and the omitted variables. This is a
very intuitive way to examine residuals for non-randomness.
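This squared-partial-correlation form is easy to verify numerically. The following sketch (simulated data and all variable names are invented for illustration; numpy only) computes the TR² version of the omitted-variable test from residuals estimated under the null:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T)])  # included regressors
w = rng.normal(size=T)                                 # candidate omitted variable
y = X @ np.array([1.0, 2.0]) + 1.0 * w + rng.normal(size=T)

# Residuals from the restricted regression of y on X alone (the null model)
u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Auxiliary regression of the residuals on the included and candidate variables
Z = np.column_stack([X, w])
fit = Z @ np.linalg.lstsq(Z, u, rcond=None)[0]
r2 = (fit @ fit) / (u @ u)  # u has mean zero because X contains a constant
lm = T * r2                 # asymptotically chi-squared(1) under the null

print(lm)
```

Because the simulated data contain the omitted variable, the statistic comes out far beyond the chi-squared(1) critical value, exactly the non-randomness the test is designed to flag.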
In the next sections, the LM test for a variety of types of misspecification will
be presented. In Section 8, tests for non-spherical disturbances will be discussed,
while Section 9 will examine tests for misspecified mean functions including
non-linearities, endogeneity, truncation and several other cases.
Following Breusch and Pagan (1980), let the model be specified so that yₜ has
conditional mean xₜβ and conditional variance h(zₜα), where h is a function such as:

h(zₜα) = exp(zₜα),
h(zₜα) = (zₜα)ᵏ,

the latter giving linear and quadratic cases for k = 1, 2. Special cases of this
which might be of interest would be:

h = (α₀ + α₁xₜβ)²,
h = exp(α₀ + α₁xₜβ),

where the variance is related to the mean of yₜ.
From applications of the formulae for the LM test given above, Breusch and
Pagan derive the LM test. Letting θ₁ = (α₁,…,αₚ) and ∂h/∂θ₁|H₀ = κzₜ, where κ
is a scalar, the score is:

s(θ̂, y) = f′zκ/2σ̂², (53)
where fₜ = ûₜ²/σ̂² − 1, f and z are the vector and matrix with typical elements fₜ
and rows zₜ, and û and σ̂² are the residuals and variance estimate under the null.
The resulting LM statistic is simply one-half the explained sum of squares of a
regression of f on z. As pointed out by Engle (1978), plim f′f/T = 2 under the
null and local alternatives, so an asymptotically equivalent test statistic is TR²
from this regression. As long as z has an intercept, adding 1 to both sides and
multiplying by the constant σ̂² will not change the R²; thus, the statistic can be
computed by regressing û² on z and calculating TR² of this regression. Koenker
(1981) shows that this form is more robust to departures from normality.
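A minimal numerical sketch of the Koenker TR² form follows (simulated data; all parameter values and names are illustrative, numpy only):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 400
X = np.column_stack([np.ones(T), rng.normal(size=T)])
z = np.column_stack([np.ones(T), rng.uniform(0.0, 1.0, size=T)])  # variance regressors

sig2 = np.exp(2.0 * z[:, 1])  # heteroscedasticity under the alternative
y = X @ np.array([1.0, 1.0]) + np.sqrt(sig2) * rng.normal(size=T)

u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]  # OLS residuals under the null
u2 = u ** 2

# Koenker's form: TR^2 from the regression of squared residuals on z
fit = z @ np.linalg.lstsq(z, u2, rcond=None)[0]
resid = u2 - fit
tss = (u2 - u2.mean()) @ (u2 - u2.mean())
lm = T * (1.0 - (resid @ resid) / tss)  # ~ chi-squared(1) under homoscedasticity

print(lm)
```

Note that, as the text emphasizes, nothing about the functional form of h enters the computation; only the variance regressors z do.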
The remarkable result of this test, however, is that κ has vanished. The test will
be the same regardless of the form of h. This happens because both the score and
the information matrix include only the derivative of h under H₀, and thus the
overall shape of h does not matter. As far as the LM test is concerned, the
alternative is:

h = zₜακ,
where κ is a scalar which is obviously irrelevant. This illustrates quite clearly both
the strength and the weakness of local tests. One test is optimal for all h, much as
in the UMP case; however, it seems plausible that it suffers from a failure to use
the functional form of h.
Does this criticism of the LM test apply to the W and LR tests? In both cases,
the parameters α must be estimated by a maximum likelihood procedure and thus
the functional form of h will be important. However, the optimality of these tests
is only claimed for local alternatives. For non-local alternatives the power
function will generally go to one in any case and thus the shape of h is irrelevant
from an asymptotic point of view. It remains possible that the finite sample
non-local performance of the W and LR tests with the correct functional form for
h could be superior to the LM. Against this must be set the possible computa-
tional difficulties of W and LR tests which may face convergence problems for
some points in the sample space. Some Monte Carlo evidence that the LM test
performs well in this type of situation is contained in Godfrey (1981).
Several special cases of this test procedure illustrate the power of the technique.
Consider the model h = exp(α₀ + α₁xₜβ), where H₀: α₁ = 0. The score as calculated
in (53) evaluates all parameters, including β, under the null. Thus xₜβ̂ = ŷₜ,
the fitted values under the null. The heteroscedasticity test can be shown to have
the same limiting distribution for xₜβ as for xₜβ̂, and therefore it can easily be
constructed as TR² from the regression of ûₜ² on a constant and ŷₜ. If the model were
h = exp(α₀ + α₁(xₜβ)²), then the regression would be on a constant and ŷₜ². Thus
it is very easy to construct tests for a wide range of, possibly complex, alternatives.
Another interesting example is provided by the Autoregressive Conditional
Heteroscedasticity (ARCH) model of Engle (1982). In this case zₜ includes lagged
squared residuals as well as perhaps other variables. The conditional variance is
hypothesized to increase when the residuals increase. In the simplest case:

hₜ = α₀ + α₁uₜ₋₁².

This is really much like that discussed above, as ûₜ₋₁ = yₜ₋₁ − xₜ₋₁β and both yₜ₋₁
and xₜ₋₁ are legitimately taken as given in the conditional distribution. The test
naturally comes out to be a regression of ûₜ² on ûₜ₋₁²,…,ûₜ₋ₚ² and an intercept,
with the statistic as TR² of this regression.
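This ARCH test is just an autoregression in squared residuals; a small simulated sketch (numpy only, all parameter values invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
T, p = 600, 2

# Simulate ARCH(1) residuals so the alternative holds: h_t = 1 + 0.5 u_{t-1}^2
u = np.zeros(T)
for t in range(1, T):
    u[t] = np.sqrt(1.0 + 0.5 * u[t - 1] ** 2) * rng.normal()

# Regress u_t^2 on an intercept and p lagged squared residuals; statistic is TR^2
u2 = u ** 2
y_aux = u2[p:]
X_aux = np.column_stack([np.ones(T - p)] + [u2[p - j:T - j] for j in range(1, p + 1)])
fit = X_aux @ np.linalg.lstsq(X_aux, y_aux, rcond=None)[0]
resid = y_aux - fit
tss = (y_aux - y_aux.mean()) @ (y_aux - y_aux.mean())
lm = (T - p) * (1.0 - (resid @ resid) / tss)  # ~ chi-squared(p) under the null

print(lm)
```

With genuine ARCH in the simulated series, the statistic lands far into the rejection region of the chi-squared(p) distribution.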
Once a heteroscedasticity correction has been made, it may be useful to test
whether it has adequately fixed the problem. Godfrey (1979) postulates a model of
the form:

σₜ² = h(zₜα) + g(qₜγ),

where g(0) = 0. The null hypothesis is therefore H₀: γ = 0. Under the null,
estimates α̃, β̃ and ũₜ = yₜ − xₜβ̃ are obtained, σ̃ₜ² = h(zₜα̃), and the derivative of h at
each point zₜα̃ can be calculated as h̃ₜ′. Of course, if h is linear, this is just a
constant. The test is again simply TR² of an auxiliary regression. In this case the
regression is of:

(ũₜ² − σ̃ₜ²)/σ̃ₜ²  on  h̃ₜ′zₜ/σ̃ₜ²  and  qₜ/σ̃ₜ²,

and the statistic will have the degrees of freedom of the number of parameters in
qₜ.
White (1980a) proposes a test for very general forms of heteroscedasticity. His
test includes all the alternatives for which the least squares standard errors are
biased. The heteroscedastic model includes all the squares and cross-products of
the data. That is, if the original model were y = β₀ + β₁x₁ + β₂x₂ + ε, the White
test would consider x₁, x₂, x₁², x₂² and x₁x₂ as determinants of σ². The test is as
usual formulated as TR² of a regression of û² on these variables plus an intercept.
These are in fact just the regressors which would be used to test for random
coefficients as in Breusch and Pagan (1979).
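The White regressor set is mechanical to construct; a minimal sketch under homoscedastic simulated data (numpy only, names illustrative), where the statistic should behave like a chi-squared(5) draw:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500
x1, x2 = rng.normal(size=T), rng.normal(size=T)
X = np.column_stack([np.ones(T), x1, x2])
y = X @ np.array([1.0, 1.0, -1.0]) + rng.normal(size=T)  # homoscedastic errors

u2 = (y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2

# White's regressors: levels, squares and cross-products, plus an intercept
Z = np.column_stack([np.ones(T), x1, x2, x1 ** 2, x2 ** 2, x1 * x2])
fit = Z @ np.linalg.lstsq(Z, u2, rcond=None)[0]
resid = u2 - fit
tss = (u2 - u2.mean()) @ (u2 - u2.mean())
lm = T * (1.0 - (resid @ resid) / tss)  # ~ chi-squared(5) under homoscedasticity

print(lm)
```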
There is now a vast literature on testing for and estimating models with serial
correlation. Tests based on the LM principles are the most recent addition to the
econometrician’s tool kit and as they are generally very simple, attention will be
confined to them.
Suppose:

yₜ|xₜ ~ N(xₜβ, σ²),
α(L)uₜ = εₜ,  uₜ = yₜ − xₜβ,  α(L) = 1 − α₁L − α₂L² − ⋯ − αₚLᵖ, (55)
and εₜ is a white noise process. Then it may be of interest to test the hypothesis
H₀: α₁ = ⋯ = αₚ = 0. Under H₀, ordinary least squares is maximum likelihood
and thus the LM approach is attractive for its simplicity. An alternative
formulation of (55) which shows how it fits into Crowder's framework is:

yₜ|ψₜ₋₁ ~ N(xₜβ + α₁(yₜ₋₁ − xₜ₋₁β) + ⋯ + αₚ(yₜ₋ₚ − xₜ₋ₚβ), σ²), (56)

where ψₜ₋₁ is the past information in both y and x. Thus, again under H₀ the
regression simplifies to OLS but under the alternative, there are non-linear
restrictions. The formulation (56) makes it clear that serial correlation can also be
viewed as a restricted model relative to the general dynamic model without the
non-linear restrictions. This is the common factor test which is discussed by
Hendry and Mizon (1980) and Sargan (1980) and for which Engle (1979a) gives
an LM test.
The likelihood function is easily written in terms of (56) and the score is
simply:

s(θ̃, y) = ũ′U/σ̃², (57)

where U is the matrix with typical row (ũₜ₋₁,…,ũₜ₋ₚ).
order serial correlation and higher order lags of dependent variables, the LM test
is likely to be preferred at least for higher order problems. See Godfrey and
Tremayne (1979) for further details.
It would seem attractive to construct a test against moving average disturbances.
Thus suppose the model has the form:

yₜ|xₜ ~ N(xₜβ, σ²),
yₜ − xₜβ = uₜ,
uₜ = εₜ − α₁εₜ₋₁ − ⋯ − αₚεₜ₋ₚ. (58)

Under the null the score is:

s(θ̃, y) = ũ′U/σ̃²,

which is identical to that in (57) for the AR(p) model. As the null hypothesis is
the same, the two tests will be the same. Again, the LM tests for different
alternatives turn out to be the same test. For local alternatives, the autoregressive
and moving average errors look the same and therefore one test will do for both.
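A minimal numerical sketch of this serial-correlation LM test in its TR² auxiliary-regression form (the same statistic for AR(p) and MA(p) alternatives, as just noted; simulated data and names are illustrative, numpy only):

```python
import numpy as np

rng = np.random.default_rng(4)
T, p = 300, 1
X = np.column_stack([np.ones(T), rng.normal(size=T)])

# AR(1) disturbances so the alternative holds: u_t = 0.6 u_{t-1} + eps_t
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.6 * e[t - 1] + rng.normal()
y = X @ np.array([1.0, 2.0]) + e

u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]  # OLS residuals under the null

# Auxiliary regression of u_t on x_t and p lagged residuals (lags padded with zeros)
lags = np.column_stack([np.concatenate([np.zeros(j), u[:-j]]) for j in range(1, p + 1)])
Z = np.hstack([X, lags])
fit = Z @ np.linalg.lstsq(Z, u, rcond=None)[0]
lm = T * (fit @ fit) / (u @ u)  # TR^2; ~ chi-squared(p) with no serial correlation

print(lm)
```

With substantial autocorrelation built into the simulated disturbances, the statistic greatly exceeds the chi-squared(1) critical value.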
When a serial correlation process has been fit for a particular model, it may still
be of interest to test for higher order serial correlation. Godfrey (1978b) supposes
that a (p, q) residual model has been fit and that (p + r, q) is to be taken as the
alternative; not surprisingly, the test against (p, q + r) is identical. Consider here
the simplest case where q = 0. Then the residuals under the null can be written as:

ũₜ = yₜ − xₜβ̃,  ε̃ₜ = ũₜ − γ̃₁ũₜ₋₁ − ⋯ − γ̃ₚũₜ₋ₚ.

The test for a (p + r, 0) or (p, r) error process can be calculated as TR² of the
regression of ε̃ₜ on x̃ₜ, ũₜ₋₁,…,ũₜ₋ₚ, ε̃ₜ₋₁,…,ε̃ₜ₋ᵣ, where x̃ₜ = xₜ − γ̃₁xₜ₋₁ − ⋯ −
γ̃ₚxₜ₋ₚ. Just as in the heteroscedasticity case, the regression is of transformed
residuals on transformed data and the omitted variables. Here the new ingredient
is the inclusion of ũₜ₋₁,…,ũₜ₋ₚ in the regression to account for the optimization
over γ under the null.
This approach applies directly to diagnostic tests for time series models.
Godfrey (1979a), Poskitt and Tremayne (1980), Hosking (1980) and Newbold
(1980) have developed and analyzed tests for a wide range of alternatives. In each
case the score depends simply on the residual autocorrelations, however the tests
differ from the familiar Box-Pierce-Portmanteau test in the calculation of the
critical region. Consequently, the LM tests will have superior properties at least
asymptotically for a finite parameterization of the alternative. If the number of
parameters under test becomes large with the sample size then the tests become
asymptotically equivalent. However, one might suspect that the power properties
of tests against low order alternatives might make them the most suitable general
purpose diagnostic tools.
When LM tests for serial correlation are derived in a simultaneous equation
framework, the statistics are somewhat more complicated and in fact there are
several incorrect tests in the literature. The difficulty arises over the differences in
instrument lists under the null and alternative models. For a survey of this
material plus presentation of several tests, see Breusch and Godfrey (1980). In the
standard simultaneous equation model:
YₜB + XₜΓ = Uₜ,
Uₜ = RUₜ₋₁ + Eₜ, (59)
then the score under the null will have the form:
M = γY + β(r − α)⁻¹ + ε, (61)
where M is real money demand, Y is real GNP and r is the interest rate. Perhaps
their best results are when M1 is used for M and the long-term government bond
rate is used for r. The null hypothesis to be tested is α = 0. The normal score is
proportional to û′z, where z, the omitted variable, is the derivative of the
right-hand side with respect to α evaluated under the null:

zₜ = β̃/rₜ².

Therefore, the LM test is a test of whether 1/r² belongs in the regression along
with Y and 1/r.
Breusch and Pagan obtain the statistic ξLM = 11.47 and therefore reject α = 0.
Including a constant term this becomes 5.92, which is still very significant in the
χ² table. However, correcting for serial correlation in the model under the null
changes the results dramatically. A second-order autoregressive model with
parameters 1.5295 and −0.5597 was required to whiten the residuals. These
parameters are used in an auxiliary regression of the transformed residual on the
three transformed right-hand side variables and a constant, to obtain an R² =
0.01096. This is simply GLS where the covariance parameters are assumed
known. Thus, the LM statistic is ξLM = 0.515, which is distributed as χ₁² if the null
is true. As can be seen it is very small, suggesting that the liquidity trap is not
significantly different from zero.
As a second example, consider testing the hypothesis that the elasticity of
substitution of a production function is equal to 1 against the alternative that it is
constant but not unity. If y is output and x₁ and x₂ are factors of production, the
model under the alternative can be written as:

log y = β₀ + β₁ log x₁ + β₂ log x₂ + β₃(log x₁/x₂)² + ε, (62)

which is simply the Kmenta (1967) approximation; unit elasticity of substitution
corresponds to a zero coefficient on the quadratic term. Thus the Cobb-Douglas form
can be estimated with appropriate heteroscedasticity or serial correlation and the
unit elasticity assumption tested with power equal to a likelihood ratio test
without ever doing a non-linear regression.
As a third example, Davidson, Hendry, Srba and Yeo (1978) estimate a
consumption function for the United Kingdom which pays particular attention to
the model dynamics. The equation finally chosen can be expressed as:

Δ₄cₜ = β₁Δ₄yₜ + β₂Δ₁Δ₄yₜ + β₃(c − y)ₜ₋₄ + β₄Δ₄pₜ + β₅Δ₁Δ₄pₜ, (63)
where c, y and p are the logs of real consumption, real personal disposable income
and the price level, and Δᵢ is the i-th difference. In a subsequent paper, Hendry and
von Ungern-Sternberg (1979) argue that the income series is mismeasured in
periods of inflation. The income which accrues from the holdings of financial
assets should be measured by the real rate of interest rather than the nominal as is
now done. There is a capital loss of p times the asset which should be netted out
of income. The appropriate log income measure is yₜ* = log(Yₜ − αpLₜ₋₁), where L
is liquid assets of the personal sector and α is a scale parameter to reflect the fact
that L is not all financial assets.
The previous model corresponds to α = 0, and the argument for the respecification
of the model rests on the presumption that α ≠ 0. The LM test can be easily
calculated, whereas the likelihood ratio and Wald tests require non-linear estimation
if not respecification. The derivative of y* with respect to α evaluated under
the null is simply −pLₜ₋₁/Yₜ. Denote this by xₜ. The score is proportional to û′z,
where z = β̂₁Δ₄xₜ + β̂₂Δ₁Δ₄xₜ − β̂₃xₜ₋₄, and the betas are evaluated at their
estimates under the null. This is now a one degree of freedom test and can be
simply performed. The test is significant with a chi-squared value of 5. As a
one-tailed test it is significant at the 2.5% level.
In a standard time series regression framework, there has been much attention
given to the testing and estimation of serial correlation patterns in the dis-
turbances. A typical model might have the form:
yₜ = xₜβ + uₜ,  ρ(L)uₜ = εₜ,  εₜ ~ IN(0, σ²), (64)

where ρ(L) is an r-th order lag polynomial and xₜ is a 1 × k row vector which for
the moment is assumed to include no lagged exogenous or endogenous variables.
Sargan (1964, 1980) and Hendry and Mizon (1978) have suggested that this is
often a strong restriction on a general dynamic model. By multiplying through by
ρ(L) the equation can equivalently be written as:
Tests for exogeneity are a source of controversy partly because of the variety of
definitions of exogeneity implicit in the formulation of the hypotheses. In this
paper the notions of weak and strong exogeneity as formulated by Engle et al.
(1983) will be used in the context of linear simultaneous equation systems. In this
case weak exogeneity is essentially that the equations defining weakly exogenous
variables can be ignored without a loss of information. In textbook cases weakly
exogenous variables are predetermined. Strong exogeneity implies, in addition,
that the variables in question cannot be forecast by past values of endogenous
variables which is the definition implicit in Granger (1969) “non-causality”.
Consider a complete simultaneous equation system with G equations and K
predetermined variables so that Y, E and V are T × G, X is T × K and the
coefficient matrices are conformable. The structural and reduced forms are:
where the rows of E are independent and the x's are weakly exogenous.
Partitioning this set of equations into the first and the remaining G - 1, the
structure becomes:
The hypothesis that Y, is weakly exogenous to the first equation in this full
information context is simply the condition for a recursive structure:
Partitioning this as in (71), using the identity |Ω| = |Ω₂₂||Ω₁₁ − Ω₁₂Ω₂₂⁻¹Ω₂₁|, gives:
where tildes represent estimates under the null and ũ₁ is the row vector of
residuals under the null. Recognizing that Σₜũ₂ₜ′ũ₂ₜΩ̃₂₂⁻¹/T = I, this can be
rewritten as:
where the estimates assume α = σ₁₂ = 0, and the x's include a variety of presumably
predetermined variables including lagged interest rates. Testing the hypothesis
that α = 0 by considering RG35 as an omitted variable is not legitimate, as it
will be correlated with ε₁. If one does the test anyway, a chi-squared value of 35 is
obtained.
The appropriate test of the weak exogeneity of RG35 is done by testing ũ₁ and
RG35 − δ̃ũ₁ as omitted from the second equation, where ũ₁ = ΔRAAA − x₂γ̃₂.
This test was calculated by regressing ũ₂ on x₂, ũ₁ and RG35 − δ̃ũ₁. The resulting
TR² = 1.25, which is quite small, indicating that the data do not contain
evidence against the hypothesis. Careful examination of x₁ and x₂ in this case
shows that the identification of the model under the alternative is rather flimsy
and therefore the test probably has very little power.
A second class of weak exogeneity tests can be formulated using the same
analysis. These might be called limited information tests because it is assumed
that there are no overidentifying restrictions available from the second block of
equations. In this case equation (70) can be replaced by:
Y₂ = XΠ₂ + E₂. (82)

Now the definition of weak exogeneity is simply that Ω₂₁ = 0, because α = 0
imposes no restrictions on the model. This situation would be expected to occur
when the second equation is only very roughly specified.
A very similar situation occurs in the case where Y₂ is possibly measured with
error. Suppose Y₂* is the true unobserved value of Y₂, but one observes Y₂ = Y₂* + η.
If the equation defining Y₂* is:
This can clearly be jointly tested by letting ur, pi and yf be the omitted variables
from each of the equations. Clearly the weak exogeneity and the Granger
non-causality are very separate parts of the hypothesis and can be tested
separately. Most often, however, when Granger causality is being tested on its own,
the appropriate model is (82) as overidentifying restrictions are rarely available.
Partitioning the parameter vector and xₜ vector conformably into β = (β₁′, β₂′)′,
the hypothesis to be tested is H₀: β₁ = 0. The model has already been estimated
using only x₂ as the exogenous variables, and it is desired to test whether some
other variables were omitted. These estimates under the null will be denoted β̃,
which implies a set of probabilities p̃ₜ. The score and information matrix of this
model are given by:

s(β, y) = Σₜ(yₜ − pₜ)fₜxₜ′/[pₜ(1 − pₜ)], (86)

ℐ = Σₜfₜ²xₜ′xₜ/[pₜ(1 − pₜ)], (87)

where pₜ = F(xₜβ) and f is the derivative of F. Notice that the score is essentially
a function of the "residuals" yₜ − pₜ. Evaluating these test statistics under the
null, the LM test
(88)
y|x ~ N(xβ, σ²),
but we only have data for y ≤ c. Thus, the likelihood function is given as the
probability density of y divided by the probability of observing the sample. The
log-likelihood can be expressed in terms of φ and Φ, which are the Gaussian
density and distribution functions respectively, as:

L = Σₜ log φ((yₜ − xₜβ)/σ) − Σₜ log Φ((c − xₜβ)/σ). (89)
The score with respect to β is:

∂L/∂β = Σₜ(yₜ − xₜβ)xₜ/σ² + Σₜ[φ((c − xₜβ)/σ)/Φ((c − xₜβ)/σ)]xₜ/σ. (90)
To estimate this model one sets the score to zero and solves for the parameters.
Notice that this implies including another term in the regression, which is the ratio
of the normal density to its distribution. The inclusion of this ratio, called the
Mills ratio, is a distinctive feature of much of the work on self-selectivity. The
information matrix can be shown to be:
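The Mills-ratio term φ(·)/Φ(·) that enters the score is straightforward to compute from the error function in the standard library; a minimal sketch (the function name is illustrative, not from the text):

```python
import math

def mills_ratio(a: float) -> float:
    """Ratio of the standard normal density to its distribution, phi(a) / Phi(a)."""
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
    return phi / Phi

# At a = 0: phi(0) = 0.3989..., Phi(0) = 0.5, so the ratio is roughly 0.798
print(mills_ratio(0.0))
```

For large positive arguments the ratio vanishes (little truncation), while for large negative arguments it grows roughly like |a|, which is where the truncation correction matters most.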
10. Alternative testing procedures

In this section three alternative, closely related testing procedures will be briefly
explained, and the relationship between these methods and the ones discussed in this
chapter will be highlighted. The three alternatives are Neyman's (1959) C(α) test,
Durbin's (1970) general procedure, and Hausman's (1978) specification test.
Throughout this section the parameter vector will be partitioned as θ′ = (θ₁′, θ₂′),
and the null hypothesis will be H₀: θ₁ = θ₁⁰. Neyman's test, as exposited by
Breusch and Pagan (1980), is a direct generalization of the LM test which allows
consistent but inefficient estimation of the parameters θ₂ under the null. Let this
estimate be θ̂₂, and let θ̂ = (θ₁⁰′, θ̂₂′)′. Expanding the score evaluated at θ̂ around the
ML estimate θ̃ gives:

(93)
The C(α) test is just the LM test using (93) for the score. This adjustment can be
viewed as one step of a Newton-Raphson iteration to find an efficient estimate of
θ₂ based upon an initial consistent estimate. In some situations, such as the one
discussed in Breusch and Pagan, this results in a substantial simplification.
The Durbin (1970) procedure is also based on different estimates of the
parameters. He suggests calculating the maximum likelihood estimate of θ₁
assuming θ₂ = θ̃₂, the ML estimate under the null. Letting this new estimate be θ̂₁,
the test is based upon the difference θ̂₁ − θ₁⁰. Expanding the score with respect to
θ₁ about θ̂₁, holding θ₂ = θ̃₂, and recognizing that the first term is zero by definition
of θ̂₁, the following relationship is found:

s₁(θ₁⁰, θ̃₂) ≅ −ℋ₁₁(θ̂₁ − θ₁⁰).
Because the Hessian is assumed to be non-singular, any test based upon θ̂₁ − θ₁⁰
will have the same critical region as one based upon the score; thus the two tests
are equivalent. In implementation there are of course many asymptotically
equivalent forms of the tests, and it is the choice of the asymptotic form of the
test which gives rise to the differences between the LM test for serial correlation
and Durbin's h test.
The third principle is Hausman's (1978) specification test. The spirit of this test
is somewhat different. The parameters of interest are not θ₁ but rather θ₂. The
objective is to restrict the parameter space by setting θ₁ to some preassigned
values without destroying the consistency of the estimates of θ₂. The test is based
upon the difference between the efficient estimates under the null, θ̃₂, and a
consistent but possibly inefficient estimate under the alternative, θ̂₂. Hausman
makes few assumptions about the properties of θ̂₂; Hausman and Taylor (1980),
however, modify the statement of the result somewhat to use the maximum
likelihood estimate under the alternative. For the moment, this interpretation
will be used here. Expanding the score around the maximum likelihood estimate
and evaluating it at θ̃ gives:
θ̂₂ − θ̃₂ ≅ ℐ²¹s(θ̃, y). (95)
It was shown above that asymptotically optimal tests could be based upon either
the score or the difference (θ̂₁ − θ₁⁰). As these are related by a non-singular
transformation which asymptotically is ℐ¹¹, critical regions based on either
statistic will be the same. Hausman's difference is based upon ℐ²¹ times the
score asymptotically. If this matrix is non-singular, then the tests will all be
asymptotically equivalent. The dimension of ℐ²¹ is q × p, where p is the number
of restrictions and q = k − p is the number of remaining parameters. Thus a
necessary condition for this test to be asymptotically equivalent is that min(p, q)
= p. A sufficient condition is that rank(ℐ²¹) = p. The equivalence requires that
there be at least as many parameters unrestricted as restricted. However, parameters
which are asymptotically independent of the parameters under test will not
count. For example, in a classical linear regression model, the variance and any
serial correlation parameters will not count in the number of unrestricted parameters.
The reason for the difficulty is that the test is formulated to ignore all
information in θ̂₁ − θ₁⁰ even though it frequently would be available from the
calculation of θ̂₂.
Hausman and Taylor (1980), in responding to essentially this criticism from
Holly (1980), point out that in the case q < p, the specification test can be
interpreted as an asymptotically optimal test of a different hypothesis. They
propose the hypothesis H₀*: ℐ²¹(θ₁ − θ₁⁰) = 0, or simply ℐ₂₁(θ₁ − θ₁⁰) = 0. If
H₀* is true, the bias in θ̃₂ from restricting θ₁ = θ₁⁰ would asymptotically be zero.
The hypothesis H₀* is explicitly a consistency hypothesis. The Hausman test is
one of many asymptotically equivalent ways to test this hypothesis. In fact, the
same Wald, LR and LM tests are available, as pointed out by Riess (1982). The
investigator must however decide which hypothesis he wishes to test, H₀ or H₀*.

In answering the question of which hypothesis is relevant, it is important to ask
why the test is being undertaken in the first place. As the parameters of interest
are θ₂, the main purpose of the test is to find a more parsimonious specification,
and the advantage of a parsimonious specification is that more efficient estimates
of the parameters of interest can be obtained. Thus if consistency were the only
concern of the investigator, he would not bother to restrict the model at all. The
objective is therefore to improve the efficiency of the estimation by testing and
then imposing some restrictions. These restrictions ought, however, to be grounded
in an economic hypothesis rather than purely data based, as is likely to be the case
for H₀*, which simply asserts that the true parameters lie in the column null space
of ℐ₂₁.
11. Non-standard situations
While many non-standard situations may arise in practice, two will be discussed
here. The first considers the properties of the Wald, LM and LR tests when the
likelihood function is misspecified. The second looks at the case where the
information matrix is singular under the null.
White (1982) and Domowitz and White (1982) have recently examined the
problem of inference in maximum likelihood situations where the wrong likeli-
hood has been maximized. These quasi-maximum likelihood estimates may well
be consistent, however the standard errors derived from the information matrix
are not correct. For example, the disturbances may be assumed to be normally
distributed when in fact they are double exponentials. White has proposed
generalizations of the Wald and LM test principles which do have the right size
and which are asymptotically powerful when the density is correctly assumed.
These are derived from the fact that the two expressions for the information
matrix are no longer equivalent for QML estimates. The expectation of the outer
product of the scores does not equal minus the expectation of the Hessian.
Letting L, be the log-likelihood of the tth observation, White constructs the
matrices:
ξLM = s′A¹¹C₁₁⁻¹A¹¹s.
Notice that if the distribution is correct, then A = −B, so that C = −A⁻¹ and the
whole term becomes simply −A¹¹ as usual. Thus the use of the quasi-LM statistic
corrects the size of the test when the distribution is false, but gives the
asymptotically optimal test when it is true. Except for possible finite sample and
computational costs, it appears to be a sensible procedure. Exactly the same correction is
made to the Wald test to obtain a quasi-Wald test. Because it is the divergence
between A and B which creates the situation, White proposes an omnibus test for
differences between A and B.
In some situations, an alternative to this approach would be to test for
normality directly as well as for other departures from the specification. Jarque
and Bera (1980, 1982) propose such a test by taking the Pearson density as the
alternative and simultaneously testing for serial correlation, functional form
misspecification and heteroscedasticity. This joint test decomposes into indepen-
dent LM tests because of the block diagonality of the information matrix for this
problem.
A second non-standard situation which occurs periodically in practice is when
some of the parameters are estimable only when the null hypothesis is false. That
is, the information matrix under the null is singular. Two simple examples with
rather different conclusions are:
In both cases, the likelihood function can be maximized under both the null and
alternative, but the limiting distribution of the likelihood ratio statistic is not
clear. Furthermore, conventional Wald and LM tests also have difficulties: the
LM test will involve a parameter, unidentified under the null, which appears in
the score, and the Wald will have an unknown limiting distribution. In the first
example, it is easy to see that by reparameterizing the model, the null hypothesis
becomes a two degree of freedom standard test. In the second example, however,
there is no simple solution. Unless the parameter α is given a priori, the tests will
have the above-mentioned problems. A solution proposed by Davies (1977) is to
obtain the LM test statistic for each value of the unidentified parameter and then
base the test on the maximum of these. Any one of these would be chi squared
with one degree of freedom, however, the maximum of a set of dependent chi
squares would not be chi squared in general. Davies finds a bound for the
distribution which gives a test with size less than or equal to the nominal value.
As an example of this, Watson (1982) considers the problem of testing whether
a regression coefficient is constant or whether it follows a first order autoregres-
sive process. The model can be expressed as:
The null hypothesis is that the variance of the coefficient process is zero; this,
however, makes the parameter ρ unidentifiable. The test is constructed by first
searching over the possible values of ρ to find the maximum LM test statistic, and
then finding the limiting distribution of the test to determine the critical value. A
Monte Carlo evaluation of the test showed it to work reasonably well, except for
values of ρ close to unity, when the limiting distribution was well approximated
only for quite large samples.
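A small Monte Carlo sketch shows why the naive chi-squared(1) critical value fails for such sup-type statistics (for simplicity the grid points are taken as independent here, which is only illustrative and overstates the distortion relative to Davies' correlated case):

```python
import numpy as np

rng = np.random.default_rng(5)
n_rep, n_grid = 5000, 10  # replications under H0, grid of nuisance-parameter values

# Pointwise, each LM statistic is chi-squared(1); the sup over the grid is not
z = rng.normal(size=(n_rep, n_grid))
sup_stat = (z ** 2).max(axis=1)  # sup-type statistic over the grid

# Rejection rate using the naive chi-squared(1) 5% critical value of 3.84
naive_size = (sup_stat > 3.84).mean()  # far above 0.05, hence the need for a bound

print(naive_size)
```

The simulated size is several times the nominal 5%, which is exactly the over-rejection that Davies' bound is designed to control.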
Several other applications of this result occur in econometrics. In factor
analytical models, the number of parameters varies with the number of factors so
testing the number of factors may involve such a problem. Testing a series for
white noise against an AR(l) plus noise again leads to this problem as the
parameter in the autoregression is not identified under the null. A closely related
problem occurred in testing for common factor dynamics as shown in Engle
(1979a). Several others could be illustrated.
12. Conclusion
References
Aitcheson,J. and S. D. Silvey (1958), “MaximumLikelihood Estimation of Parameters Subject to
Restraints”, Annals of Mathematical Statistics. 29:813-828.
Anderson, T. W. (1971j, The Statistical Analysis’of Time Series. New York: John Wiley and Sons.
Bera, A. K. and C. M. Jarque (1982), “Model Specification Tests: A Simultaneous Approach”,
Journal of Econometrics, 20:59-82.
Bemdt, E. R. and N. E. Savin (1977), “Conflict Among Criteria for Testing Hypotheses in the
Multivariate Linear Regression Model”, Econometrica, 45:1263-1278.
Ch. 13: Wald, Llkelrhood Ratio. and Lagrange Multiplier Tests 825
Breusch. T. S. (1978). “Testing for Autocorrelation in Dynamic Linear Models”, Au.vraliun Economic
Papers. 17:334-355.
Breusch, T. S. and A. R. Pagan (1979) “A Simple Test for Heteroskedasticity and Random Coefficient
Variation”. Econometrica, 47:1287-1294.
Breusch, T. S. (1979). “Conflict Among Criteria for Testing Hypotheses: Extensions and Comments”.
Econometrica, 47:203-207.
Breusch. T. S. and L. G. Godfrey (1980) “A Review of Recent Work on Testing for Autocorrelation
in Dynamic Economic Models”, Discussion Paper *8017, University of Southampton.
Breusch, T. S. and A. R. Pagan (1980), “The Lagrange Multiplier Test and Its Applications to Model
Specification in Econometrics”, Review of Economic Studies, 47~239-254.
Cox. D. R. and D. V. Hinckley (1974). Theoretical Statisrics. London: Chapman and Hall,
Crowder, M. J. (1976), “Maximum Likelihood Estimation for Dependent Observations”, Journul of
rhe Rqval Statistical Society, Series B, 45-53.
Davidson, J. E. H., Hendry. D. F., Srba, F.. and S. Yeo (1978), “Econometric Modelling of the
Aggregate Time-Series Relationship Between Consumers’ Expenditure and Income in the United
Kingdom”, Economic Journal, 88:661-692.
Davies. R. B. (1977). “Hypothesis Testing When a Nuisance Parameter is Present Only Under the
Alternative”, Biometrrka, 64:247-254.
Domowitz, I. and H. White (1982), "Misspecified Models with Dependent Observations", Journal of Econometrics, 20:35-58.
Durbin, J. (1970), "Testing for Serial Correlation in Least Squares Regression When Some of the Regressors are Lagged Dependent Variables", Econometrica, 38:410-421.
Eisner, R. (1971), "Non-linear Estimates of the Liquidity Trap", Econometrica, 39:861-864.
Engle, R. F. (1979), "Estimation of the Price Elasticity of Demand Facing Metropolitan Producers", Journal of Urban Economics, 6:42-64.
Engle, R. F. (1982), "Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of U.K. Inflation", Econometrica, 50:987-1007.
Engle, R. F. (1979a), "A General Approach to the Construction of Model Diagnostics Based on the Lagrange Multiplier Principle", U.C.S.D. Discussion Paper 79-43.
Engle, R. F. (1982a), "A General Approach to Lagrange Multiplier Model Diagnostics", Journal of Econometrics, 20:83-104.
Engle, R. F. (1980), "Hypothesis Testing in Spectral Regression: The Lagrange Multiplier as a Regression Diagnostic", in: Kmenta and Ramsey, eds., Criteria for Evaluation of Econometric Models. New York: Academic Press.
Engle, R. F., D. F. Hendry, and J. F. Richard (1983), "Exogeneity", Econometrica, 51:277-304.
Evans, G. B. A. and N. E. Savin (1982), "Conflict Among the Criteria Revisited: The W, LR and LM Tests", Econometrica, 50:737-748.
Ferguson, T. S. (1967), Mathematical Statistics. New York: Academic Press.
Godfrey, L. G. (1978), "Testing for Multiplicative Heteroskedasticity", Journal of Econometrics, 8:227-236.
Godfrey, L. G. (1978a), "Testing Against General Autoregressive and Moving Average Error Models When the Regressors Include Lagged Dependent Variables", Econometrica, 46:1293-1302.
Godfrey, L. G. (1978b), "Testing for Higher Order Serial Correlation in Regression Equations when the Regressors Include Lagged Dependent Variables", Econometrica, 46:1303-1310.
Godfrey, L. G. (1979), "A Diagnostic Check on the Variance Model in Regression Equations with Heteroskedastic Disturbances", unpublished manuscript, University of York.
Godfrey, L. G. (1979a), "Testing the Adequacy of a Time Series Model", Biometrika, 66:67-72.
Godfrey, L. G. and A. R. Tremayne (1979), "A Note on Testing for Fourth Order Autocorrelation in Dynamic Quarterly Regression Equations", unpublished manuscript, University of York.
Godfrey, L. G. (1980), "On the Invariance of the Lagrange Multiplier Test with Respect to Certain Changes in the Alternative Hypothesis", Econometrica, 49:1443-1456.
Hausman, J. (1978), "Specification Tests in Econometrics", Econometrica, 46:1251-1271.
Hausman, J. and D. Wise (1977), "Social Experimentation, Truncated Distributions, and Efficient Estimation", Econometrica, 45:319-339.
Hausman, J. and W. Taylor (1980), "Comparing Specification Tests and Classical Tests", unpublished manuscript.
Hendry, D. F. and T. von Ungern-Sternberg (1979), "Liquidity and Inflation Effects on Consumers' Expenditure", in: Angus Deaton, ed., Festschrift for Richard Stone. Cambridge: Cambridge University Press.
Hendry, D. F. and G. Mizon (1980), "An Empirical Application and Monte Carlo Analysis of Tests of Dynamic Specification", Review of Economic Studies, 47:21-46.
Holly, A. (1982), "A Remark on Hausman's Specification Test", Econometrica, 50:749-759.
Hosking, J. R. M. (1980), "Lagrange Multiplier Tests of Time Series Models", Journal of the Royal Statistical Society, Series B, 42:170-181.
Jarque, C. and A. K. Bera (1980), "Efficient Tests for Normality, Homoscedasticity, and Serial Independence of Regression Residuals", Economics Letters, 6:255-259.
King, M. L. and G. H. Hillier (1980), "A Small Sample Power Property of the Lagrange Multiplier Test", Discussion Paper, Monash University.
Kmenta, J. (1967), "On Estimation of the CES Production Function", International Economic Review, 8:180-189.
Koenker, R. (1981), "A Note on Studentizing a Test for Heteroscedasticity", Journal of Econometrics, 17:107-112.
Konstas, P. and M. Khouja (1969), "The Keynesian Demand-for-Money Function: Another Look and Some Additional Evidence", Journal of Money, Credit and Banking, 1:765-777.
Lehmann, E. L. (1959), Testing Statistical Hypotheses. New York: John Wiley and Sons.
Neyman, J. (1959), "Optimal Asymptotic Tests of Composite Statistical Hypotheses", in: U. Grenander, ed., Probability and Statistics. Stockholm: Almqvist and Wiksell, pp. 213-234.
Newbold, P. (1980), "The Equivalence of Two Tests of Time Series Model Adequacy", Biometrika, 67:463-465.
Pifer, H. (1969), "A Non-linear Maximum Likelihood Estimate of the Liquidity Trap", Econometrica, 37:324-332.
Poskitt, D. S. and A. P. Tremayne (1980), "Testing the Specification of a Fitted ARMA Model", Biometrika, 67:359-363.
Rao, C. R. (1948), "Large Sample Tests of Statistical Hypotheses Concerning Several Parameters with Application to Problems of Estimation", Proceedings of the Cambridge Philosophical Society, 44:50-57.
Reiss, P. (1982), "Alternative Interpretations of Hausman's m Test", unpublished manuscript, Yale University.
Rothenberg, T. J. (1980), "Comparing Alternative Asymptotically Equivalent Tests", invited paper presented at the World Congress of the Econometric Society, Aix-en-Provence, 1980.
Sargan, J. D. (1964), "Wages and Prices in the United Kingdom: A Study in Econometric Methodology", in: P. E. Hart, G. Mills, and J. K. Whitaker, eds., Econometric Analysis for National Economic Planning. London: Butterworths.
Sargan, J. D. (1980), "Some Tests of Dynamic Specification for a Single Equation", Econometrica, 48:879-897.
Savin, N. E. (1976), "Conflicts Among Testing Procedures in a Linear Regression Model with Autoregressive Disturbances", Econometrica, 44:1303-1313.
Silvey, S. D. (1959), "The Lagrangean Multiplier Test", Annals of Mathematical Statistics, 30:389-407.
Wald, A. (1943), "Tests of Statistical Hypotheses Concerning Several Parameters When the Number of Observations is Large", Transactions of the American Mathematical Society, 54:426-482.
Watson, M. (1982), "A Test for Regression Coefficient Stability When a Parameter is Identified Only Under the Alternative", Harvard Discussion Paper 906.
White, H. (1980), "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity", Econometrica, 48:817-838.
White, H. (1982), "Maximum Likelihood Estimation of Misspecified Models", Econometrica, 50:1-26.
White, K. (1972), "Estimation of the Liquidity Trap With a Generalized Functional Form", Econometrica, 40:193-199.
Wilks, S. S. (1938), "The Large Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses", Annals of Mathematical Statistics, 9:60-62.