Module 4 Part 3
MODEL: HYPOTHESIS TESTING
❑The important question is: how “good” is the estimated regression line given in Equation (2.20)?
❑How can we tell that it really is a good estimator of the true population regression function (PRF)?
❑How can we be sure, on the basis of the single sample given in Table 2-2, that the estimated
regression function (i.e., the sample regression function [SRF]) is in fact a good approximation of the
true PRF?
❑We have assumed that the Xi values are known or given; that is, our analysis is a conditional
regression analysis, conditional upon the given X’s: E(Y|X).
❑In other words, we treat the X values as nonstochastic.
❑The error term u, however, is stochastic, and because Y is generated by adding u to the
nonstochastic X term, Y is stochastic too. This means that unless we are willing to assume how the
stochastic u terms are generated, we will not be able to tell how good an SRF is as an estimate of
the true PRF.
❑In particular, when testing statistical hypotheses based on the SRF, we cannot make progress unless
we make some specific assumptions about how the ui are generated.
❑This is precisely what the so-called classical linear regression model (CLRM) does.
THE CLASSICAL LINEAR REGRESSION MODEL
The CLRM makes the following assumptions:
❑The regression model is linear in the parameters; it may or may not be linear in the variables.
That is, the regression model is of the following type: $Y_i = B_1 + B_2 X_i + u_i$.
The error term ui represents all those factors that are not specifically introduced in the model.
What Assumption (3.1) states is that these other factors or forces are not related to Xi (the variable explicitly
introduced in the model) and therefore, given the value of Xi, their mean value is zero: $E(u_i \mid X_i) = 0$.
❑ The variance of each ui is constant, or homoscedastic (homo means equal and scedastic means variance). That is, $\mathrm{var}(u_i) = \sigma^2$.
❑ There is no correlation between two error terms. This is the assumption of no autocorrelation.
❑ Algebraically, this assumption can be written as $\mathrm{cov}(u_i, u_j) = 0$ for $i \neq j$.
❑ Here cov stands for covariance, and ui and uj are any two error terms.
• This assumption means there is no systematic relationship between two error terms. It is not the case that if
one u is above the mean value, another error term u will also be above the mean value (positive correlation),
nor that if one error term is below the mean value, another error term has to be above the mean value, or
vice versa (negative correlation).
• In short, the assumption of no autocorrelation means the error terms ui are random.
• Since any two error terms are assumed to be uncorrelated, any two Y values will also be
uncorrelated; that is, $\mathrm{cov}(Y_i, Y_j) = 0$ for $i \neq j$. This is because $Y_i = B_1 + B_2 X_i + u_i$, and given
that the B’s are fixed numbers and that X is assumed to be fixed, Y will vary as u varies.
• So, if the u’s are uncorrelated, the Y’s will be uncorrelated also.
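The homoscedasticity and no-autocorrelation assumptions can be illustrated with a small simulation. The sketch below is only an illustration: the normal errors, the value sigma = 2, and the parameter values B1 = 1 and B2 = 0.5 are hypothetical choices, not taken from the text. It draws independent errors, checks that their sample variance is close to sigma squared and their lag-1 correlation is close to zero, and confirms that the deviations of Y from the PRF are exactly the errors.

```python
import numpy as np


def simulate_errors(n, sigma, seed=0):
    """Draw n independent errors with mean zero and constant variance sigma**2."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=sigma, size=n)


def lag1_correlation(v):
    """Sample correlation between each value and the one before it."""
    return np.corrcoef(v[:-1], v[1:])[0, 1]


# Hypothetical values chosen only for illustration.
B1, B2, sigma = 1.0, 0.5, 2.0

u = simulate_errors(n=10_000, sigma=sigma)
X = np.linspace(0.0, 10.0, u.size)   # fixed (nonstochastic) X values
Y = B1 + B2 * X + u                  # Y varies only because u varies

# Deviations of Y from the PRF are the error terms themselves.
deviations = Y - (B1 + B2 * X)

print("mean of u:", u.mean())
print("variance of u:", u.var(ddof=1))
print("lag-1 correlation of u:", lag1_correlation(u))
print("lag-1 correlation of Y deviations:", lag1_correlation(deviations))
```

With errors drawn this way, both lag-1 correlations should be close to zero, which is what "if the u's are uncorrelated, the Y's will be uncorrelated also" amounts to when X is held fixed.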
❑ The regression model is correctly specified. Alternatively, there is no specification bias or specification error in the
model used in empirical analysis.
What this assumption implies is that we have included all the variables that affect a particular phenomenon.
• Assumption 7
In the PRF the error term ui follows the normal distribution with mean zero and variance $\sigma^2$.
That is, $u_i \sim N(0, \sigma^2)$.
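The unknown variance $\sigma^2$ is estimated by $\hat{\sigma}^2 = \sum e_i^2 / (n - 2)$,
where $\hat{\sigma}^2$ is an estimator of $\sigma^2$ (recall we use ˆ to indicate an estimator) and $\sum e_i^2$ is the residual sum
of squares (RSS), that is, $\sum (Y_i - \hat{Y}_i)^2$, the sum of the squared differences between the actual Y and
the estimated Y.
The positive square root, $\hat{\sigma} = \sqrt{\hat{\sigma}^2}$, is known as the standard error of the regression (SER), which is simply the standard
deviation of the Y values about the estimated regression line.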
❑This standard error of regression is often used as a summary measure of the goodness of fit of
the estimated regression line.
❑The smaller the value of $\hat{\sigma}$, the closer the actual Y value is to its estimated value from the
regression model.
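To make the calculation concrete, the following sketch computes the RSS, the variance estimate (RSS divided by n - 2), and the standard error of the regression for a two-variable model. The five data points are made up for illustration and are not the S.A.T. data; the function names are likewise hypothetical.

```python
import numpy as np


def ols_two_variable(X, Y):
    """Return the OLS intercept b1 and slope b2 for Y = b1 + b2*X + e."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x = X - X.mean()                      # deviations of X from its mean
    b2 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
    b1 = Y.mean() - b2 * X.mean()
    return b1, b2


def standard_error_of_regression(X, Y):
    """Return (RSS, estimated error variance, SER) for the fitted line."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    b1, b2 = ols_two_variable(X, Y)
    residuals = Y - (b1 + b2 * X)         # actual Y minus estimated Y
    rss = (residuals ** 2).sum()
    sigma2_hat = rss / (len(Y) - 2)       # n - 2 degrees of freedom
    return rss, sigma2_hat, np.sqrt(sigma2_hat)


# Made-up sample, for illustration only.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

b1, b2 = ols_two_variable(X, Y)
rss, sigma2_hat, ser = standard_error_of_regression(X, Y)
print("b1 =", b1, "b2 =", b2)
print("RSS =", rss, "sigma2_hat =", sigma2_hat, "SER =", ser)
```

For this sample the fitted line is Y = 2.2 + 0.6X, the RSS is 2.4, the variance estimate is 0.8, and the SER is about 0.894, so the actual Y values lie within roughly one unit of the fitted line.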
Summary of the Math S.A.T. Score Function
Let us express the estimated S.A.T. score function in the following
form:
• The estimated slope coefficient of the math S.A.T. score function (i.e., the coefficient of the annual family
income variable) is 0.0013, and its standard deviation, or standard error, is 0.000245. This is a measure of the
variability of b2 from sample to sample.
• What use can we make of this finding? Can we say, for example, that our computed b2 lies within a
certain number of standard deviation units from the true B2?
• If we can do that, we can state with some confidence (i.e., probability) how good the computed SRF is as
an estimate of the true PRF.
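The idea that b2 varies from sample to sample, and that it tends to lie within a certain number of standard deviation units of the true B2, can be explored with a Monte Carlo sketch. Everything numerical here is an assumption made for illustration: the X values, B1 = 400, B2 = 0.5, sigma = 5, and normally distributed errors. The sketch uses the known sigma in the textbook formula se(b2) = sigma / sqrt(sum of squared deviations of X), whereas in practice sigma is replaced by its estimate.

```python
import numpy as np


def simulate_slopes(X, B1, B2, sigma, n_samples, seed=0):
    """Estimate the OLS slope b2 in many samples that share the same fixed X."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    x = X - X.mean()
    # One row of errors per sample; X stays fixed across samples.
    u = rng.normal(0.0, sigma, size=(n_samples, X.size))
    Y = B1 + B2 * X + u
    # OLS slope per sample: sum(x_i * Y_i) / sum(x_i**2), since sum(x_i) = 0.
    return (Y @ x) / (x ** 2).sum()


# Hypothetical population values, for illustration only.
X = np.linspace(10.0, 100.0, 10)
B1, B2, sigma = 400.0, 0.5, 5.0

b2_draws = simulate_slopes(X, B1, B2, sigma, n_samples=20_000)

# Theoretical standard error of b2 when sigma is known.
true_se = sigma / np.sqrt(((X - X.mean()) ** 2).sum())

# Share of samples in which b2 falls within two standard errors of B2.
share_within_2se = np.mean(np.abs(b2_draws - B2) <= 2 * true_se)

print("mean of b2 across samples:", b2_draws.mean())
print("std of b2 across samples:", b2_draws.std(ddof=1))
print("theoretical se(b2):", true_se)
print("share within 2 se of B2:", share_within_2se)
```

Under these assumptions, the spread of the simulated slopes should be close to the theoretical standard error, and roughly 95 percent of the slopes should fall within two standard errors of B2. This is the kind of probability statement that hypothesis testing makes precise.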