Two-Variable Regression Model, The Problem of Estimation
• II. They are point estimators; that is, given the sample, each estimator will
provide only a single (point, not interval) value of the relevant population
parameter.
• III. Once the OLS estimates are obtained from the sample data, the sample
regression line (Figure 3.1) can be easily obtained.
• In regression analysis our objective is not only to obtain β̂1 and β̂2
but also to draw inferences about the true β1 and β2. For example, we
would like to know how close β̂1 and β̂2 are to their counterparts in the
population or how close Ŷi is to the true E(Y | Xi).
• The least-squares estimates are a function of the sample data. But since the
data change from sample to sample, the estimates will change. Therefore,
what is needed is some measure of “reliability” or precision of the
estimators β̂1 and β̂2. In statistics the precision of an estimate is measured by
its standard error (se), which can be obtained as follows:
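For reference, the standard textbook expressions for these variances and standard errors in the two-variable model can be written as below. The notation is an assumption made for this sketch: x_i = X_i − X̄ denotes the deviation of X_i from its sample mean, and n is the sample size.

```latex
\begin{align*}
\operatorname{var}(\hat\beta_2) &= \frac{\sigma^2}{\sum x_i^2}, &
\operatorname{se}(\hat\beta_2) &= \frac{\sigma}{\sqrt{\sum x_i^2}},\\
\operatorname{var}(\hat\beta_1) &= \frac{\sum X_i^2}{n\sum x_i^2}\,\sigma^2, &
\operatorname{se}(\hat\beta_1) &= \sqrt{\frac{\sum X_i^2}{n\sum x_i^2}}\,\sigma.
\end{align*}
```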
• σ² is the constant or homoscedastic variance of ui of Assumption 4.
• σ² itself is estimated by the following formula:
• σ̂² = Σûi² / (n − 2)
• where σ̂² is the OLS estimator of the true but unknown σ², the
expression n − 2 is known as the number of degrees of freedom (df), and Σûi² is
the residual sum of squares (RSS). Once Σûi² is known, σ̂² can be easily
computed.
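The estimation steps above can be sketched numerically. The following is a minimal illustration, not a definitive implementation: the data are made up, the function name `ols_with_standard_errors` is hypothetical, and the standard errors use the standard textbook formulas var(β̂2) = σ²/Σxi² and var(β̂1) = σ² ΣXi² / (n Σxi²) with σ² replaced by its estimate RSS/(n − 2).

```python
import math

def ols_with_standard_errors(X, Y):
    """Two-variable OLS with estimated error variance and standard errors."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    # Deviations from the sample means (lowercase x, y in the slides).
    x = [xi - x_bar for xi in X]
    y = [yi - y_bar for yi in Y]
    sum_x2 = sum(v * v for v in x)
    # Slope and intercept estimates.
    b2 = sum(a * b for a, b in zip(x, y)) / sum_x2
    b1 = y_bar - b2 * x_bar
    # Residual sum of squares and the estimator of sigma^2 with n - 2 df.
    residuals = [Yi - (b1 + b2 * Xi) for Xi, Yi in zip(X, Y)]
    rss = sum(u * u for u in residuals)
    sigma2_hat = rss / (n - 2)
    # Standard errors (assumed standard formulas, sigma^2 replaced by sigma2_hat).
    se_b2 = math.sqrt(sigma2_hat / sum_x2)
    se_b1 = math.sqrt(sigma2_hat * sum(Xi * Xi for Xi in X) / (n * sum_x2))
    return {"b1": b1, "b2": b2, "rss": rss,
            "sigma2_hat": sigma2_hat, "se_b1": se_b1, "se_b2": se_b2}

# Illustrative data only; not taken from the text.
X = [1, 2, 3, 4, 5]
Y = [2.2, 4.1, 5.8, 8.1, 9.9]
fit = ols_with_standard_errors(X, Y)
print(fit)
```

A small standard error relative to the estimate suggests that the estimate would not move much from sample to sample.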
• Compared with Eq. (3.1.2), Eq. (3.3.6) is easy to use, for it does not require
computing ûi for each observation.
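To see why a shortcut of this kind saves work, the sketch below compares two ways of getting the RSS. It assumes the shortcut is the standard identity Σûi² = Σyi² − β̂2² Σxi² (in deviation form); the function names and data are hypothetical and serve only as an illustration.

```python
def rss_direct(X, Y):
    """RSS computed by forming every residual explicitly."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    sum_x2 = sum((xi - x_bar) ** 2 for xi in X)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sum_x2
    b1 = y_bar - b2 * x_bar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(X, Y))

def rss_shortcut(X, Y):
    """RSS from sums of squares only (assumed identity, no residuals needed)."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    sum_x2 = sum((xi - x_bar) ** 2 for xi in X)
    sum_y2 = sum((yi - y_bar) ** 2 for yi in Y)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sum_x2
    return sum_y2 - b2 ** 2 * sum_x2

# Illustrative data only; not taken from the text.
X = [1, 2, 3, 4, 5]
Y = [2.2, 4.1, 5.8, 8.1, 9.9]
direct = rss_direct(X, Y)
shortcut = rss_shortcut(X, Y)
print(direct, shortcut)
```

Both routes should agree up to floating-point rounding; the second needs only quantities already computed while estimating the slope.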
• Since β̂2 = Σxiyi / Σxi², the RSS can alternatively be computed as Σûi² = Σyi² − (Σxiyi)² / Σxi².
• Since var(β̂2) is always positive, as is the variance of any variable, the sign
of the covariance between β̂1 and β̂2 depends on the sign of X̄. If X̄ is
positive, then as the formula cov(β̂1, β̂2) = −X̄ var(β̂2) shows, the covariance
will be negative. Thus, if
the slope coefficient β2 is overestimated (i.e., the slope is too steep), the
intercept coefficient β1 will be underestimated (i.e., the intercept will be too
small).
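The negative covariance can be checked with a small simulation. This is a sketch under stated assumptions: the true parameters, the error standard deviation, the fixed X values, and the function names are all hypothetical, and normal errors are used only for convenience.

```python
import random

def ols(X, Y):
    """Return (b1, b2) for the two-variable regression of Y on X."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    sum_x2 = sum((xi - x_bar) ** 2 for xi in X)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sum_x2
    return y_bar - b2 * x_bar, b2

def simulate_covariance(X, beta1, beta2, sigma, reps, seed):
    """Draw repeated samples with X fixed and return (cov(b1, b2), var(b2))."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        Y = [beta1 + beta2 * xi + rng.gauss(0.0, sigma) for xi in X]
        draws.append(ols(X, Y))
    m1 = sum(d[0] for d in draws) / reps
    m2 = sum(d[1] for d in draws) / reps
    cov = sum((d[0] - m1) * (d[1] - m2) for d in draws) / (reps - 1)
    var2 = sum((d[1] - m2) ** 2 for d in draws) / (reps - 1)
    return cov, var2

# Hypothetical design: X has a positive mean, so the covariance should be negative.
X = list(range(1, 11))
x_bar = sum(X) / len(X)
cov_b1_b2, var_b2 = simulate_covariance(X, beta1=2.0, beta2=0.5,
                                        sigma=1.0, reps=5000, seed=0)
print(cov_b1_b2, -x_bar * var_b2)
```

Across replications, samples that produce a slope that is too steep tend to produce an intercept that is too small, so the two printed numbers should be close and both negative.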
PROPERTIES OF LEAST-SQUARES ESTIMATORS:
THE GAUSS–MARKOV THEOREM
• We now consider the goodness of fit of the fitted regression line to a set of
data; that is, we shall find out how “well” the sample regression line fits the
data. The coefficient of determination r² (two-variable case) or R² (multiple
regression) is a summary measure that tells how well the sample regression
line fits the data.
• Consider a heuristic explanation of r² in terms of a graphical device known
as the Venn diagram, shown in Figure 3.9.
• In this figure the circle Y represents variation in the dependent variable Y and
the circle X represents variation in the explanatory variable X. The overlap of
the two circles indicates the extent to which the variation in Y is explained
by the variation in X.
• To compute this r², we proceed as follows: Recall that
• Yi = Ŷi + ûi (2.6.3)
• or in the deviation form
• yi = ŷi + ûi (3.5.1)
• where use is made of (3.1.13) and (3.1.14). Squaring (3.5.1) on both sides
and summing over the sample, we obtain
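• Σyi² = Σŷi² + Σûi² + 2Σŷiûi = Σŷi² + Σûi², since the cross-product term Σŷiûi is zero for OLS residuals.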
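The decomposition of the total variation in Y can be illustrated numerically. The sketch below assumes the standard definition r² = ESS/TSS, where TSS = Σyi², ESS = Σŷi² and RSS = Σûi² in deviation form; the function name and data are hypothetical.

```python
def r_squared_decomposition(X, Y):
    """Return (tss, ess, rss, r2) for the two-variable regression of Y on X."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    sum_x2 = sum((xi - x_bar) ** 2 for xi in X)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sum_x2
    b1 = y_bar - b2 * x_bar
    fitted = [b1 + b2 * xi for xi in X]
    # Total, explained and residual sums of squares.
    tss = sum((yi - y_bar) ** 2 for yi in Y)
    ess = sum((fi - y_bar) ** 2 for fi in fitted)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(Y, fitted))
    return tss, ess, rss, ess / tss

# Illustrative data only; not taken from the text.
X = [1, 2, 3, 4, 5]
Y = [2.2, 4.1, 5.8, 8.1, 9.9]
tss, ess, rss, r2 = r_squared_decomposition(X, Y)
print(tss, ess, rss, r2)
```

In Venn-diagram terms, ESS corresponds to the overlap of the two circles and TSS to the whole Y circle, so r² is the share of the variation in Y that is explained by X.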