
Gujarati, Basic Econometrics, Fourth Edition. Part II: Relaxing the Assumptions of the Classical Model. Chapter 13: Econometric Modeling: Model Specification and Diagnostic Testing. © The McGraw-Hill Companies, 2004.

13
ECONOMETRIC MODELING: MODEL SPECIFICATION AND DIAGNOSTIC TESTING

Applied econometrics cannot be done mechanically; it needs understanding, intuition and skill.1
. . . we generally drive across bridges without worrying about the soundness of
their construction because we are reasonably sure that someone rigorously
checked their engineering principles and practice. Economists must do likewise
with models or else attach the warning ‘not responsible if attempted use leads to
collapse’.2
Economists’ search for “truth” has over the years given rise to the view that econ-
omists are people searching in a dark room for a non-existent black cat; econo-
metricians are regularly accused of finding one.3
One of the assumptions of the classical linear regression model (CLRM),
Assumption 9, is that the regression model used in the analysis is “correctly”
specified: If the model is not “correctly” specified, we encounter the problem
of model specification error or model specification bias. In this chapter
we take a close and critical look at this assumption, because searching for
the correct model is like searching for the Holy Grail. In particular we ex-
amine the following questions:
1. How does one go about finding the “correct” model? In other words,
what are the criteria in choosing a model for empirical analysis?

1. Keith Cuthbertson, Stephen G. Hall, and Mark P. Taylor, Applied Econometric Techniques, University of Michigan Press, 1992, p. X.
2. David F. Hendry, Dynamic Econometrics, Oxford University Press, U.K., 1995, p. 68.
3. Peter Kennedy, A Guide to Econometrics, 3d ed., The MIT Press, Cambridge, Mass., 1992, p. 82.


2. What types of model specification errors is one likely to encounter in practice?
3. What are the consequences of specification errors?
4. How does one detect specification errors? In other words, what are
some of the diagnostic tools that one can use?
5. Having detected specification errors, what remedies can one adopt
and with what benefits?
6. How does one evaluate the performance of competing models?
The topic of model specification and evaluation is vast, and very extensive
empirical work has been done in this area. Not only that, but there are
philosophical differences on this topic. Although we cannot do full justice to
this topic in one chapter, we hope to bring out some of the essential issues
involved in model specification and model evaluation.

13.1 MODEL SELECTION CRITERIA


According to Hendry and Richard, a model chosen for empirical analysis
should satisfy the following criteria4:
1. Be data admissible; that is, predictions made from the model must be
logically possible.
2. Be consistent with theory; that is, it must make good economic sense.
For example, if Milton Friedman’s permanent income hypothesis holds,
the intercept value in the regression of permanent consumption on perma-
nent income is expected to be zero.
3. Have weakly exogenous regressors; that is, the explanatory variables,
or regressors, must be uncorrelated with the error term.
4. Exhibit parameter constancy; that is, the values of the parameters
should be stable. Otherwise, forecasting will be difficult. As Friedman
notes, “The only relevant test of the validity of a hypothesis [model] is com-
parison of its predictions with experience.”5 In the absence of parameter
constancy, such predictions will not be reliable.
5. Exhibit data coherency; that is, the residuals estimated from the
model must be purely random (technically, white noise). In other words, if
the regression model is adequate, the residuals from this model must be
white noise. If that is not the case, there is some specification error in the
model. Shortly, we will explore the nature of specification error(s).
6. Be encompassing; that is, the model should encompass or include all
the rival models in the sense that it is capable of explaining their results. In
short, other models cannot be an improvement over the chosen model.
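Criterion 5 (data coherency) can be checked numerically. The following sketch, which is not part of the original text, computes a Box-Pierce Q statistic in NumPy as a minimal stand-in for a formal white-noise test of the residuals; the sample sizes, lag count, and AR coefficient are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def box_pierce_q(resid, lags=10):
    """Box-Pierce Q statistic: under white noise, Q is roughly chi-square(lags)."""
    e = resid - resid.mean()
    n = len(e)
    denom = np.sum(e ** 2)
    # Sample autocorrelations at lags 1..lags
    acf = np.array([np.sum(e[k:] * e[:-k]) / denom for k in range(1, lags + 1)])
    return n * np.sum(acf ** 2)

# "Data-coherent" residuals: pure white noise, so Q should be near its df (10)
q_white = box_pierce_q(rng.standard_normal(500))

# Residuals with strong serial dependence (AR(1) with coefficient 0.9):
# Q should be very large, signaling a specification error
e = np.zeros(500)
shocks = rng.standard_normal(500)
for t in range(1, 500):
    e[t] = 0.9 * e[t - 1] + shocks[t]
q_autocorr = box_pierce_q(e)
```

A large Q relative to the chi-square critical value is evidence that the residuals are not white noise, i.e., that the model fails the data-coherency criterion.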

4. D. F. Hendry and J. F. Richard, "The Econometric Analysis of Economic Time Series," International Statistical Review, vol. 51, 1983, pp. 3–33.
5. Milton Friedman, "The Methodology of Positive Economics," in Essays in Positive Economics, University of Chicago Press, Chicago, 1953, p. 7.

It is one thing to list criteria of a “good” model and quite another to actu-
ally develop it, for in practice one is likely to commit various model specifi-
cation errors, which we discuss in the next section.

13.2 TYPES OF SPECIFICATION ERRORS


Assume that on the basis of the criteria just listed we arrive at a model that
we accept as a good model. To be concrete, let this model be

Yi = β1 + β2Xi + β3Xi² + β4Xi³ + u1i    (13.2.1)

where Y = total cost of production and X = output. Equation (13.2.1) is the familiar textbook example of the cubic total cost function.
But suppose for some reason (say, laziness in plotting the scattergram) a
researcher decides to use the following model:

Yi = α1 + α2Xi + α3Xi² + u2i    (13.2.2)

Note that we have changed the notation to distinguish this model from the
true model.
Since (13.2.1) is assumed true, adopting (13.2.2) would constitute a specification error, the error consisting in omitting a relevant variable (Xi³). Therefore, the error term u2i in (13.2.2) is in fact

u2i = u1i + β4Xi³    (13.2.3)

We shall see shortly the importance of this relationship.
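A small simulation, not from the original text, illustrates relationship (13.2.3): the residuals of the misspecified quadratic fit (13.2.2) absorb the omitted β4Xi³ term and so remain systematically more dispersed than the residuals of the true cubic model (13.2.1). The cost-curve parameter values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.uniform(1, 10, n)
beta = [50.0, 10.0, -1.5, 0.12]          # hypothetical cubic cost parameters
u1 = rng.normal(0, 2.0, n)               # the "true" disturbance u1
Y = beta[0] + beta[1] * X + beta[2] * X**2 + beta[3] * X**3 + u1

# Misspecified quadratic fit (13.2.2): its residuals play the role of u2
coef_quad = np.polyfit(X, Y, 2)
u2 = Y - np.polyval(coef_quad, X)

# Residuals from the correctly specified cubic fit (13.2.1)
u_cubic = Y - np.polyval(np.polyfit(X, Y, 3), X)
```

Because u2 = u1 + β4Xi³ (up to the quadratic projection), the variance of u2 exceeds that of u_cubic, which is the footprint of the omitted-variable error.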


Now suppose that another researcher uses the following model:

Yi = λ1 + λ2Xi + λ3Xi² + λ4Xi³ + λ5Xi⁴ + u3i    (13.2.4)

If (13.2.1) is the "truth," (13.2.4) also constitutes a specification error, the error here consisting in including an unnecessary or irrelevant variable in the sense that the true model assumes λ5 to be zero. The new error term is in fact

u3i = u1i − λ5Xi⁴ = u1i    (13.2.5)

since λ5 = 0 in the true model (Why?)

Now assume that yet another researcher postulates the following model:

ln Yi = γ1 + γ2Xi + γ3Xi² + γ4Xi³ + u4i    (13.2.6)

In relation to the true model, (13.2.6) would also constitute a specification bias, the bias here being the use of the wrong functional form: In (13.2.1) Y appears linearly, whereas in (13.2.6) it appears log-linearly.

Finally, consider the researcher who uses the following model:

Yi* = β1* + β2*Xi* + β3*Xi*² + β4*Xi*³ + ui*    (13.2.7)

where Yi* = Yi + εi and Xi* = Xi + wi, εi and wi being the errors of measurement. What (13.2.7) states is that instead of using the true Yi and Xi we use their proxies, Yi* and Xi*, which may contain errors of measurement. Therefore, in (13.2.7) we commit the errors of measurement bias. In applied work data are plagued by errors of approximation or errors of incomplete coverage or simply errors of omitting some observations. In the social sciences we often depend on secondary data and usually have no way of knowing the types of errors, if any, made by the primary data-collecting agency.
Another type of specification error relates to the way the stochastic error
ui (or ut) enters the regression model. Consider for instance, the following
bivariate regression model without the intercept term:

Yi = β Xi ui (13.2.8)

where the stochastic error term enters multiplicatively with the property
that ln ui satisfies the assumptions of the CLRM, against the following
model
Yi = α Xi + ui (13.2.9)

where the error term enters additively. Although the variables are the same
in the two models, we have denoted the slope coefficient in (13.2.8) by β and
the slope coefficient in (13.2.9) by α. Now if (13.2.8) is the “correct” or “true”
model, would the estimated α provide an unbiased estimate of the true β?
That is, will E(α̂) = β? If that is not the case, improper stochastic specifica-
tion of the error term will constitute another source of specification error.
To sum up, in developing an empirical model, one is likely to commit one
or more of the following specification errors:
1. Omission of a relevant variable(s)
2. Inclusion of an unnecessary variable(s)
3. Adopting the wrong functional form
4. Errors of measurement
5. Incorrect specification of the stochastic error term
Before turning to an examination of these specification errors in some
detail, it may be fruitful to distinguish between model specification errors
and model mis-specification errors. The first four types of error discussed
above are essentially in the nature of model specification errors in that we
have in mind a “true” model but somehow we do not estimate the correct
model. In model mis-specification errors, we do not know what the true
model is to begin with. In this context one may recall the controversy

between the Keynesians and the monetarists. The monetarists give primacy
to money in explaining changes in GDP, whereas the Keynesians emphasize
the role of government expenditure to explain changes in GDP. So to speak,
there are two competing models.
In what follows, we will first consider model specification errors and then
examine model mis-specification errors.

13.3 CONSEQUENCES OF MODEL SPECIFICATION ERRORS


Whatever the sources of specification errors, what are the consequences? To
keep the discussion simple, we will answer this question in the context of
the three-variable model and consider in this section the first two types of
specification errors discussed earlier, namely, (1) underfitting a model,
that is, omitting relevant variables, and (2) overfitting a model, that is, in-
cluding unnecessary variables. Our discussion here can be easily general-
ized to more than two regressors, but with tedious algebra6; matrix algebra
becomes almost a necessity once we go beyond the three-variable case.

Underfitting a Model (Omitting a Relevant Variable)


Suppose the true model is:

Yi = β1 + β2 X2i + β3 X3i + ui (13.3.1)

but for some reason we fit the following model:

Yi = α1 + α2 X2i + vi (13.3.2)

The consequences of omitting variable X3 are as follows:


1. If the left-out, or omitted, variable X3 is correlated with the included variable X2, that is, r23, the correlation coefficient between the two variables, is nonzero, α̂1 and α̂2 are biased as well as inconsistent. That is, E(α̂1) ≠ β1 and E(α̂2) ≠ β2, and the bias does not disappear as the sample size gets larger.
2. Even if X2 and X3 are not correlated, α̂1 is biased, although α̂2 is now unbiased.
3. The disturbance variance σ² is incorrectly estimated.
4. The conventionally measured variance of α̂2 (= σ²/Σx²2i) is a biased estimator of the variance of the true estimator β̂2.
5. In consequence, the usual confidence interval and hypothesis-testing
procedures are likely to give misleading conclusions about the statistical
significance of the estimated parameters.

6. But see exercise 13.32.

6. As another consequence, the forecasts based on the incorrect model and the forecast (confidence) intervals will be unreliable.
Although proofs of each of the above statements will take us far afield,7 it is shown in Appendix 13A, Section 13A.1, that

E(α̂2) = β2 + β3b32    (13.3.3)

where b32 is the slope in the regression of the excluded variable X3 on the included variable X2 (b32 = Σx3i x2i / Σx²2i). As (13.3.3) shows, α̂2 is biased, unless β3 or b32 or both are zero. We rule out β3 being zero, because in that case we do not have a specification error to begin with. The coefficient b32 will be zero if X2 and X3 are uncorrelated, which is unlikely in most economic data.
Generally, however, the extent of the bias will depend on the bias term β3b32. If, for instance, β3 is positive (i.e., X3 has a positive effect on Y) and b32 is positive (i.e., X2 and X3 are positively correlated), α̂2, on average, will overestimate the true β2 (i.e., positive bias). But this result should not be surprising, for X2 represents not only its direct effect on Y but also its indirect effect (via X3) on Y. In short, X2 gets credit for the influence that is rightly attributable to X3, the latter prevented from showing its effect explicitly because it is not "allowed" to enter the model. As a concrete example, consider the example discussed in Chapter 7.
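A brief Monte Carlo sketch, not part of the original text, confirms formula (13.3.3): across many replications, the mean of the bivariate slope estimates α̂2 settles on β2 + β3b32 rather than on β2. All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 2000
beta1, beta2, beta3 = 1.0, 2.0, 3.0      # hypothetical true parameters

# Fixed regressors, with X3 correlated with X2
X2 = rng.normal(0, 1, n)
X3 = 0.5 * X2 + rng.normal(0, 1, n)
x2 = X2 - X2.mean()
x3 = X3 - X3.mean()
b32 = np.sum(x3 * x2) / np.sum(x2 ** 2)  # slope of X3 on X2, as in (13.3.3)

alpha2_hats = np.empty(reps)
for r in range(reps):
    u = rng.normal(0, 1, n)
    Y = beta1 + beta2 * X2 + beta3 * X3 + u
    y = Y - Y.mean()
    # Bivariate OLS slope from the underfitted model (13.3.2)
    alpha2_hats[r] = np.sum(x2 * y) / np.sum(x2 ** 2)

# Simulated mean of alpha2_hat vs. the bias formula beta2 + beta3 * b32
predicted = beta2 + beta3 * b32
```

With β3 > 0 and b32 > 0 the simulated mean overestimates β2, exactly the positive bias described in the text.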

ILLUSTRATIVE EXAMPLE: CHILD MORTALITY REVISITED

Regressing child mortality (CM) on per capita GNP (PGNP) and female literacy rate (FLR), we obtained the regression results shown in Eq. (7.6.2), giving the partial slope coefficient values of the two variables as −0.0056 and −2.2316, respectively. But if we now drop the FLR variable, we obtain the results shown in Eq. (7.7.2). If we regard (7.6.2) as the correct model, then (7.7.2) is a mis-specified model in that it omits the relevant variable FLR. Now you can see that in the correct model the coefficient of the PGNP variable was −0.0056, whereas in the "incorrect" model (7.7.2) it is now −0.0114.

In absolute terms, now PGNP has a greater impact on CM as compared with the true model. But if we regress FLR on PGNP (regression of the excluded variable on the included variable), the slope coefficient in this regression [b32 in terms of Eq. (13.3.3)] is 0.00256.8 This suggests that as PGNP increases by a unit, on average, FLR goes up by 0.00256 units. But if FLR goes up by these units, its effect on CM will be (−2.2316)(0.00256) = β̂3b32 = −0.00571. Therefore, from (13.3.3) we finally have (β̂2 + β̂3b32) = [−0.0056 + (−2.2316)(0.00256)] ≈ −0.0113, which is about the value of the PGNP coefficient obtained in the incorrect model (7.7.2).9 As this example illustrates, the true impact of PGNP on CM is much less (−0.0056) than that suggested by the incorrect model (7.7.2), namely, −0.0114.

7. For an algebraic treatment, see Jan Kmenta, Elements of Econometrics, Macmillan, New York, 1971, pp. 391–399. Those with a matrix algebra background may want to consult J. Johnston, Econometric Methods, 4th ed., McGraw-Hill, New York, 1997, pp. 119–112.
8. The regression results are:

FLR = 47.5971 + 0.00256 PGNP
se =  (3.5553)  (0.0011)    r² = 0.0721

9. Note that in the true model β̂2 and β̂3 are unbiased estimates of their true values.

Now let us examine the variances of α̂2 and β̂2:

var(α̂2) = σ²/Σx²2i    (13.3.4)

var(β̂2) = σ²/[Σx²2i(1 − r²23)] = (σ²/Σx²2i) VIF    (13.3.5)

where VIF (a measure of collinearity) is the variance inflation factor [= 1/(1 − r²23)] discussed in Chapter 10 and r23 is the correlation coefficient between variables X2 and X3; Eqs. (13.3.4) and (13.3.5) are familiar to us from Chapters 3 and 7.
As formulas (13.3.4) and (13.3.5) are not the same, in general, var(α̂2) will be different from var(β̂2). But we know that var(β̂2) is unbiased (why?). Therefore, var(α̂2) is biased, thus substantiating the statement made in point 4 earlier. Since 0 < r²23 < 1, it would seem that in the present case var(α̂2) < var(β̂2). Now we face a dilemma: Although α̂2 is biased, its variance is smaller than the variance of the unbiased estimator β̂2 (of course, we are ruling out the case where r23 = 0, since in practice there is some correlation between regressors). So, there is a tradeoff involved here.10
The story is not complete yet, however, for the σ² estimated from model (13.3.2) and that estimated from the true model (13.3.1) are not the same because the RSS of the two models as well as their degrees of freedom (df) are different. You may recall that we obtain an estimate of σ² as σ̂² = RSS/df, which depends on the number of regressors included in the model as well as the df (= n − number of parameters estimated). Now if we add variables to
the model, the RSS generally decreases (recall that as more variables are
added to the model, the R2 increases), but the degrees of freedom also de-
crease because more parameters are estimated. The net outcome depends
on whether the RSS decreases sufficiently to offset the loss of degrees of
freedom due to the addition of regressors. It is quite possible that if a re-
gressor has a strong impact on the regressand—for example, it may reduce
RSS more than the loss in degrees of freedom as a result of its addition to
the model—inclusion of such variables will not only reduce the bias but will
also increase precision (i.e., reduce standard errors) of the estimators.
On the other hand, if the relevant variables have only a marginal impact
on the regressand, and if they are highly correlated (i.e., VIF is larger), we
may reduce the bias in the coefficients of the variables already included in
the model, but increase their standard errors (i.e., make them less efficient).
Indeed, the tradeoff in this situation between bias and precision can be sub-
stantial. As you can see from this discussion, the tradeoff will depend on the
relative importance of the various regressors.

10. To bypass the tradeoff between bias and efficiency, one could choose to minimize the mean square error (MSE), since it accounts for both bias and efficiency. On MSE, see the statistical appendix, App. A. See also exercise 13.6.

To conclude this discussion, let us consider the special case where r23 = 0, that is, X2 and X3 are uncorrelated. This will result in b32 being zero (why?). Therefore, it can be seen from (13.3.3) that α̂2 is now unbiased.11 Also, it seems from (13.3.4) and (13.3.5) that the variances of α̂2 and β̂2 are the same. Is there no harm in dropping the variable X3 from the model even though it may be relevant theoretically? The answer generally is no, for in this case, as noted earlier, var(α̂2) estimated from (13.3.4) is still biased and therefore our hypothesis-testing procedures are likely to remain suspect.12 Besides, in most economic research X2 and X3 will be correlated, thus creating the problems discussed previously. The point is clear: Once a model is formulated on the basis of the relevant theory, one is ill-advised to drop a variable from such a model.

Inclusion of an Irrelevant Variable (Overfitting a Model)


Now let us assume that

Yi = β1 + β2 X2i + ui (13.3.6)

is the truth, but we fit the following model:

Yi = α1 + α2 X2i + α3 X3i + vi (13.3.7)

and thus commit the specification error of including an unnecessary variable in the model.
The consequences of this specification error are as follows:
1. The OLS estimators of the parameters of the "incorrect" model are all unbiased and consistent, that is, E(α̂1) = β1, E(α̂2) = β2, and E(α̂3) = β3 = 0.
2. The error variance σ 2 is correctly estimated.
3. The usual confidence interval and hypothesis-testing procedures re-
main valid.
4. However, the estimated α’s will be generally inefficient, that is, their
variances will be generally larger than those of the β̂’s of the true model.
The proofs of some of these statements can be found in Appendix 13A,
Section 13A.2. The point of interest here is the relative inefficiency of the
α̂’s. This can be shown easily.
From the usual OLS formula we know that

var(β̂2) = σ²/Σx²2i    (13.3.8)

11. Note, though, α̂1 is still biased, which can be seen intuitively as follows: We know that β̂1 = Ȳ − β̂2X̄2 − β̂3X̄3, whereas α̂1 = Ȳ − α̂2X̄2, and even if α̂2 = β̂2, the two intercept estimators will not be the same.
12. For details, see Adrian C. Darnell, A Dictionary of Econometrics, Edward Elgar Publisher, 1994, pp. 371–372.

and

var(α̂2) = σ²/[Σx²2i(1 − r²23)]    (13.3.9)

Therefore,

var(α̂2)/var(β̂2) = 1/(1 − r²23)    (13.3.10)

Since 0 ≤ r²23 ≤ 1, it follows that var(α̂2) ≥ var(β̂2); that is, the variance of α̂2 is generally greater than the variance of β̂2 even though, on average, α̂2 = β2 [i.e., E(α̂2) = β2].
The implication of this finding is that the inclusion of the unnecessary
variable X3 makes the variance of α̂2 larger than necessary, thereby making
α̂2 less precise. This is also true of α̂1 .
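Relationship (13.3.10) is easy to verify by simulation. The following sketch, not part of the original text and using hypothetical parameter values, shows that including the irrelevant X3 leaves α̂2 unbiased but inflates its variance across replications by roughly the VIF.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 3000
beta1, beta2 = 1.0, 2.0                  # true model (13.3.6) has no X3

X2 = rng.normal(0, 1, n)
X3 = 0.7 * X2 + rng.normal(0, 1, n)      # irrelevant but correlated regressor
Z_true = np.column_stack([np.ones(n), X2])
Z_over = np.column_stack([np.ones(n), X2, X3])

b2_hats, a2_hats = np.empty(reps), np.empty(reps)
for r in range(reps):
    Y = beta1 + beta2 * X2 + rng.normal(0, 1, n)
    b2_hats[r] = np.linalg.lstsq(Z_true, Y, rcond=None)[0][1]  # true model
    a2_hats[r] = np.linalg.lstsq(Z_over, Y, rcond=None)[0][1]  # overfitted

r23 = np.corrcoef(X2, X3)[0, 1]
vif = 1.0 / (1.0 - r23 ** 2)
# Simulated variance ratio, which (13.3.10) says should be close to the VIF
ratio = np.var(a2_hats) / np.var(b2_hats)
```

The mean of a2_hats stays near β2 (no bias), while the variance ratio tracks 1/(1 − r²23): the pure efficiency cost of the superfluous variable.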
Notice the asymmetry in the two types of specification biases we have
considered. If we exclude a relevant variable, the coefficients of the vari-
ables retained in the model are generally biased as well as inconsistent, the
error variance is incorrectly estimated, and the usual hypothesis-testing
procedures become invalid. On the other hand, including an irrelevant vari-
able in the model still gives us unbiased and consistent estimates of the co-
efficients in the true model, the error variance is correctly estimated, and
the conventional hypothesis-testing methods are still valid; the only penalty
we pay for the inclusion of the superfluous variable is that the estimated
variances of the coefficients are larger, and as a result our probability inferences about the parameters are less precise. An unwarranted conclusion here would be that it is better to include irrelevant variables than to omit the relevant ones. But this philosophy is not to be espoused because addition of
unnecessary variables will lead to loss in efficiency of the estimators and
may also lead to the problem of multicollinearity (why?), not to mention the
loss of degrees of freedom. Therefore,
In general, the best approach is to include only explanatory variables that, on
theoretical grounds, directly influence the dependent variable and that are not
accounted for by other included variables.13

13.4 TESTS OF SPECIFICATION ERRORS


Knowing the consequences of specification errors is one thing but finding
out whether one has committed such errors is quite another, for we do not
deliberately set out to commit such errors. Very often specification biases
arise inadvertently, perhaps from our inability to formulate the model as

13. Michael D. Intriligator, Econometric Models, Techniques and Applications, Prentice Hall, Englewood Cliffs, N.J., 1978, p. 189. Recall the Occam's razor principle.

precisely as possible because the underlying theory is weak or because we do not have the right kind of data to test the model. As Davidson notes,
“Because of the non-experimental nature of economics, we are never sure
how the observed data were generated. The test of any hypothesis in eco-
nomics always turns out to depend on additional assumptions necessary
to specify a reasonably parsimonious model, which may or may not be
justified.”14
The practical question then is not why specification errors are made, for
they generally are, but how to detect them. Once it is found that specifica-
tion errors have been made, the remedies often suggest themselves. If, for
example, it can be shown that a variable is inappropriately omitted from a
model, the obvious remedy is to include that variable in the analysis, as-
suming, of course, the data on that variable are available.
In this section we discuss some tests that one may use to detect specifica-
tion errors.

Detecting the Presence of Unnecessary Variables (Overfitting a Model)

Suppose we develop a k-variable model to explain a phenomenon:

Yi = β1 + β2 X2i + · · · + βk Xki + ui (13.4.1)

However, we are not totally sure that, say, the variable Xk really belongs in
the model. One simple way to find this out is to test the significance of the
estimated βk with the usual t test: t = β̂k/se (β̂k). But suppose that we are not
sure whether, say, X3 and X4 legitimately belong in the model. This can be
easily ascertained by the F test discussed in Chapter 8. Thus, detecting the
presence of an irrelevant variable (or variables) is not a difficult task.
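The F test mentioned above can be computed directly from the restricted and unrestricted residual sums of squares, F = [(RSSR − RSSUR)/q]/[RSSUR/(n − k)], as in Chapter 8. A minimal NumPy sketch, with hypothetical data not drawn from the original text, tests whether X3 and X4 jointly belong in the model:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 120
X2, X3, X4 = (rng.normal(0, 1, n) for _ in range(3))
Y = 1.0 + 2.0 * X2 + rng.normal(0, 1, n)   # X3 and X4 are truly irrelevant

def rss(Xmat, y):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    e = y - Xmat @ beta
    return float(e @ e)

ones = np.ones(n)
rss_r = rss(np.column_stack([ones, X2]), Y)           # restricted: X3, X4 out
rss_u = rss(np.column_stack([ones, X2, X3, X4]), Y)   # unrestricted
q, k = 2, 4                                           # restrictions, parameters
F = ((rss_r - rss_u) / q) / (rss_u / (n - k))         # F ~ F(q, n - k) under H0
```

Here the null hypothesis (both coefficients zero) is true by construction, so F should fall well below conventional critical values; a large F would instead argue for keeping X3 and X4.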
It is, however, very important to remember that in carrying out these tests
of significance we have a specific model in mind. We accept that model as
the maintained hypothesis or the “truth,” however tentative it may be.
Given that model, then, we can find out whether one or more regressors are
really relevant by the usual t and F tests. But note carefully that we should
not use the t and F tests to build a model iteratively, that is, we should not
say that initially Y is related to X2 only because β̂2 is statistically significant
and then expand the model to include X3 and decide to keep that variable in
the model if β̂3 turns out to be statistically significant, and so on. This strat-
egy of building a model is called the bottom-up approach (starting with
a smaller model and expanding it as one goes along) or by the somewhat
pejorative term, data mining (other names are regression fishing, data
grubbing, data snooping, and number crunching).

14. James Davidson, Econometric Theory, Blackwell Publishers, Oxford, U.K., 2000, p. 153.
