Econometrics ch4
By Damodar N. Gujarati
Prof. M. El-Sakka
Dept of Economics Kuwait University
Yi = β̂1 + β̂2Xi + ûi   (2.6.2)
ûi = Yi − Ŷi   (2.6.3)
That is,
ûi = Yi − Ŷi = Yi − β̂1 − β̂2Xi   (3.1.1)
Now given n pairs of observations on Y and X, we would like to determine the
SRF in such a manner that it is as close as possible to the actual Y. To this
end, we may adopt the following criterion:
Choose the SRF in such a way that the sum of the residuals Σûi = Σ(Yi − Ŷi) is
as small as possible.
The least-squares criterion instead chooses β̂1 and β̂2 so as to minimize the sum of squared residuals:
Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂1 − β̂2Xi)²   (3.1.2)
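As a quick numerical check on this criterion, the sketch below uses a small hypothetical sample (the data are illustrative, not from the text's tables): the residual sum of squares evaluated at the closed-form OLS estimates is never beaten by nearby perturbed coefficient pairs.

```python
# Hypothetical data for illustration (not from the text's tables).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [3.0, 4.5, 5.0, 7.5, 8.0]
n = len(X)
Xbar, Ybar = sum(X) / n, sum(Y) / n

# Closed-form OLS estimates (deviation form of (3.1.6), plus (3.1.7)).
x = [xi - Xbar for xi in X]
y = [yi - Ybar for yi in Y]
b2 = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
b1 = Ybar - b2 * Xbar

def ssr(beta1, beta2):
    """Residual sum of squares: Sum (Yi - beta1 - beta2*Xi)^2."""
    return sum((yi - beta1 - beta2 * xi) ** 2 for xi, yi in zip(X, Y))

ssr_ols = ssr(b1, b2)
# Perturbing the coefficients in any direction should not lower the SSR.
perturbed = [ssr(b1 + d1, b2 + d2)
             for d1 in (-0.5, 0.0, 0.5) for d2 in (-0.5, 0.0, 0.5)]
print(ssr_ols, min(perturbed))
```

The grid of perturbations includes the OLS point itself, so the minimum over the grid coincides with the OLS value of the criterion.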
The last step in (3.1.7) can be obtained directly from (3.1.4) by simple
algebraic manipulations. Incidentally, note that, by making use of simple
algebraic identities, formula (3.1.6) for estimating β̂2 can be alternatively
expressed as:
β̂2 = Σxiyi / Σxi²
where xi = Xi − X̄ and yi = Yi − Ȳ are deviations from the sample mean values.
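The equivalence of the raw-sum formula (3.1.6) and the deviation form can be verified numerically; the sample below is hypothetical, chosen only for illustration.

```python
# Checking numerically that the raw-sum formula (3.1.6) and its
# deviation-form rewrite give the same slope estimate.
X = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
Y = [3.0, 4.5, 5.0, 7.5, 8.0]
n = len(X)
Xbar, Ybar = sum(X) / n, sum(Y) / n

# (3.1.6): b2 = (n*Sum(Xi*Yi) - Sum(Xi)*Sum(Yi)) / (n*Sum(Xi^2) - (Sum Xi)^2)
b2_raw = ((n * sum(a * b for a, b in zip(X, Y)) - sum(X) * sum(Y))
          / (n * sum(a * a for a in X) - sum(X) ** 2))

# Deviation form: b2 = Sum(xi*yi) / Sum(xi^2), xi = Xi - Xbar, yi = Yi - Ybar
x = [xi - Xbar for xi in X]
y = [yi - Ybar for yi in Y]
b2_dev = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

b1 = Ybar - b2_dev * Xbar                 # intercept, as in (3.1.7)
print(b2_raw, b2_dev, b1)
```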
II. They are point estimators; that is, given the sample, each estimator will
provide only a single (point, not interval) value of the relevant population
parameter.
III. Once the OLS estimates are obtained from the sample data, the sample
regression line (Figure 3.1) can be easily obtained.
Summing both sides of this last equality over the sample values and
dividing through by the sample size n gives
Ŷ̄ = Ȳ   (3.1.10)
that is, the mean of the fitted Ŷi equals the mean of the actual Yi.
The SRF (2.6.2),
Yi = β̂1 + β̂2Xi + ûi,
can be expressed in an alternative form where both Y and X are expressed as
deviations from their mean values. To see this, sum (2.6.2) on both sides to
give:
ΣYi = nβ̂1 + β̂2ΣXi + Σûi
= nβ̂1 + β̂2ΣXi   since Σûi = 0   (3.1.11)
Dividing Eq. (3.1.11) through by n, we obtain
Ȳ = β̂1 + β̂2X̄   (3.1.12)
which is the same as (3.1.7). Subtracting Eq. (3.1.12) from (2.6.2), we obtain
Yi − Ȳ = β̂2(Xi − X̄) + ûi
or
yi = β̂2xi + ûi   (3.1.13)
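A short sketch, on hypothetical data, of the three facts this derivation relies on: Σûi = 0, the mean of the fitted values equals Ȳ, and the deviation form (3.1.13) holds observation by observation.

```python
# Verifying, on a hypothetical sample: Sum(ui_hat) = 0, mean of fitted Y
# equals mean of actual Y (3.1.10), and yi = b2*xi + ui_hat (3.1.13).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [3.0, 4.5, 5.0, 7.5, 8.0]
n = len(X)
Xbar, Ybar = sum(X) / n, sum(Y) / n
x = [xi - Xbar for xi in X]        # deviations from means
y = [yi - Ybar for yi in Y]
b2 = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
b1 = Ybar - b2 * Xbar

Yhat = [b1 + b2 * xi for xi in X]               # fitted values
resid = [yi - fi for yi, fi in zip(Y, Yhat)]    # residuals ui_hat

sum_resid = sum(resid)            # should be 0
Yhat_bar = sum(Yhat) / n          # should equal Ybar
# Deviation form (3.1.13), checked term by term:
gaps = [yi - (b2 * xi + ui) for yi, xi, ui in zip(y, x, resid)]
print(sum_resid, Yhat_bar, max(abs(g) for g in gaps))
```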
This is known as the deviation form, where the lowercase letters yi and xi
denote deviations from the respective sample mean values. Note that in the
deviation form, the SRF can be written as
ŷi = β̂2xi   (3.1.14)
Keep in mind that the regressand Y and the regressor X themselves may be
nonlinear.
Look at Table 2.1. Keeping the value of income X fixed, say, at $80, we
draw at random a family and observe its weekly family consumption
expenditure Y as, say, $60. Still keeping X at $80, we draw at random
another family and observe its Y value as $75. In each of these
drawings (i.e., repeated sampling), the value of X is fixed at $80. We
can repeat this process for all the X values shown in Table 2.1.
This means that our regression analysis is conditional regression
analysis, that is, conditional on the given values of the regressor(s) X.
E(ui | Xi) = 0
Suppose in our PRF (Yt = β1 + β2Xt + ut) that ut and ut−1 are positively
correlated. Then Yt depends not only on Xt but also on ut−1, for ut−1 to
some extent determines ut.
In the hypothetical example of Table 3.1, imagine that we had only the first
pair of observations on Y and X (4 and 1). From this single observation there
is no way to estimate the two unknowns, β1 and β2. We need at least two pairs
of observations to estimate the two unknowns.
The least-squares estimates are a function of the sample data. But since the
data change from sample to sample, the estimates will change. Therefore,
what is needed is some measure of reliability or precision of the estimators
β̂1 and β̂2. In statistics the precision of an estimate is measured by its
standard error (se), which can be obtained as follows:
var(β̂2) = σ² / Σxi²,   se(β̂2) = σ / √(Σxi²)
var(β̂1) = (ΣXi² / (n Σxi²)) σ²,   se(β̂1) = √(ΣXi² / (n Σxi²)) σ
where σ² is the constant (homoscedastic) variance of ui, estimated by
σ̂² = Σûi² / (n − 2)
where σ̂² is the OLS estimator of the true but unknown σ², the expression
n − 2 is known as the number of degrees of freedom (df), and Σûi² is the
residual sum of squares (RSS). Once Σûi² is known, σ̂² can be easily
computed.
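The estimator σ̂² = Σûi²/(n − 2) and the resulting standard errors can be computed directly; the sample below is hypothetical, and the formulas are the standard two-variable OLS ones.

```python
import math

# Hypothetical sample; standard two-variable OLS precision formulas:
# sigma2_hat = RSS/(n-2), var(b2) = sigma2/Sum(xi^2),
# var(b1) = Sum(Xi^2) * sigma2 / (n * Sum(xi^2)).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [3.0, 4.5, 5.0, 7.5, 8.0]
n = len(X)
Xbar, Ybar = sum(X) / n, sum(Y) / n
x = [xi - Xbar for xi in X]
y = [yi - Ybar for yi in Y]
Sxx = sum(a * a for a in x)
b2 = sum(a * b for a, b in zip(x, y)) / Sxx
b1 = Ybar - b2 * Xbar

rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(X, Y))
sigma2_hat = rss / (n - 2)        # unbiased estimator of sigma^2
se_b2 = math.sqrt(sigma2_hat / Sxx)
se_b1 = math.sqrt(sigma2_hat * sum(xi * xi for xi in X) / (n * Sxx))
print(sigma2_hat, se_b1, se_b2)
```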
Compared with Eq. (3.1.2), Eq. (3.3.6),
Σûi² = Σyi² − β̂2² Σxi²,
is easy to use, for it does not require computing ûi for each observation.
Since β̂2 = Σxiyi / Σxi², an alternative expression for computing Σûi² is
Σûi² = Σyi² − (Σxiyi)² / Σxi²
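A numerical check, on hypothetical data, that the shortcut expression gives the same residual sum of squares as computing each ûi directly:

```python
# Shortcut: Sum(ui^2) = Sum(yi^2) - (Sum(xi*yi))^2 / Sum(xi^2),
# compared against the direct residual computation.
X = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
Y = [3.0, 4.5, 5.0, 7.5, 8.0]
n = len(X)
Xbar, Ybar = sum(X) / n, sum(Y) / n
x = [xi - Xbar for xi in X]
y = [yi - Ybar for yi in Y]
Sxx = sum(a * a for a in x)
Sxy = sum(a * b for a, b in zip(x, y))
Syy = sum(b * b for b in y)
b2 = Sxy / Sxx
b1 = Ybar - b2 * Xbar

rss_direct = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(X, Y))
rss_shortcut = Syy - Sxy ** 2 / Sxx
print(rss_direct, rss_shortcut)
```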
Note the following features of the variances (and therefore the standard
errors) of β̂1 and β̂2.
3. Since β̂1 and β̂2 are estimators, they will not only vary from sample to
sample but in a given sample they are likely to be dependent on each other,
this dependence being measured by the covariance between them:
cov(β̂1, β̂2) = −X̄ var(β̂2)
Since var(β̂2) is always positive, as is the variance of any variable, the nature
of the covariance between β̂1 and β̂2 depends on the sign of X̄. If X̄ is
positive, then as the formula shows, the covariance will be negative. Thus, if
the slope coefficient β̂2 is overestimated (i.e., the slope is too steep), the
intercept coefficient β̂1 will be underestimated (i.e., the intercept will be too
small).
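The negative covariance when X̄ > 0 can be seen in a small Monte Carlo sketch (the setup, true parameters, and sample size are all hypothetical): across repeated samples with X fixed, the empirical covariance between the intercept and slope estimates is negative and close to −X̄ var(β̂2).

```python
import random

# Monte Carlo sketch: with X fixed and Xbar > 0, slope and intercept
# estimates are negatively correlated across repeated samples.
random.seed(0)
X = [1.0, 2.0, 3.0, 4.0, 5.0]     # Xbar = 3 > 0
n = len(X)
Xbar = sum(X) / n
x = [xi - Xbar for xi in X]
Sxx = sum(a * a for a in x)
beta1_true, beta2_true, sigma = 1.0, 2.0, 1.0   # hypothetical truth

b1s, b2s = [], []
for _ in range(5000):
    Y = [beta1_true + beta2_true * xi + random.gauss(0.0, sigma) for xi in X]
    Ybar = sum(Y) / n
    b2 = sum(a * (yi - Ybar) for a, yi in zip(x, Y)) / Sxx
    b1s.append(Ybar - b2 * Xbar)
    b2s.append(b2)

m1, m2 = sum(b1s) / len(b1s), sum(b2s) / len(b2s)
cov_b1b2 = sum((a - m1) * (b - m2) for a, b in zip(b1s, b2s)) / len(b1s)
var_b2 = sum((b - m2) ** 2 for b in b2s) / len(b2s)
print(cov_b1b2, -Xbar * var_b2)   # the two should be close, both negative
```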
What all this means can be explained with the aid of Figure 3.8. In Figure
3.8(a) we have shown the sampling distribution of the OLS estimator β̂2, that
is, the distribution of the values taken by β̂2 in repeated sampling experiments.
For convenience we have assumed β̂2 to be distributed symmetrically. As the
figure shows, the mean of the β̂2 values, E(β̂2), is equal to the true β2. In this
situation we say that β̂2 is an unbiased estimator of β2. In Figure 3.8(b) we
have shown the sampling distribution of β*2, an alternative estimator of β2
obtained by using another (i.e., other than OLS) method.
For convenience, assume that β*2, like β̂2, is unbiased, that is, its average or
expected value is equal to β2. Assume further that both β̂2 and β*2 are linear
estimators, that is, they are linear functions of Y. Which estimator, β̂2 or β*2,
would you choose? To answer this question, superimpose the two figures, as
in Figure 3.8(c). It is obvious that although both β̂2 and β*2 are unbiased,
the distribution of β*2 is more diffused or widespread around the mean
value than the distribution of β̂2. In other words, the variance of β*2 is
larger than the variance of β̂2.
Now given two estimators that are both linear and unbiased, one would
choose the estimator with the smaller variance because it is more likely to
be close to β2 than the alternative estimator. In short, one would choose the
BLUE estimator.
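A Monte Carlo sketch of this comparison (all parameters hypothetical): OLS is set against another linear unbiased slope estimator, the "endpoint" estimator (Yn − Y1)/(Xn − X1). Both center on the true β2, but the endpoint estimator shows the larger sampling variance, which is why the Gauss-Markov reasoning favors OLS.

```python
import random

# Compare OLS with the linear unbiased "endpoint" slope estimator
# (Y_n - Y_1)/(X_n - X_1) across many simulated samples.
random.seed(0)
X = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(X)
Xbar = sum(X) / n
x = [xi - Xbar for xi in X]
Sxx = sum(a * a for a in x)
beta1_true, beta2_true, sigma = 1.0, 2.0, 1.0   # hypothetical truth

ols, endpoint = [], []
for _ in range(20000):
    Y = [beta1_true + beta2_true * xi + random.gauss(0.0, sigma) for xi in X]
    Ybar = sum(Y) / n
    ols.append(sum(a * (yi - Ybar) for a, yi in zip(x, Y)) / Sxx)
    endpoint.append((Y[-1] - Y[0]) / (X[-1] - X[0]))

def var(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)

# Theory: var(OLS) = sigma^2/Sxx = 0.1; var(endpoint) = 2*sigma^2/16 = 0.125
print(var(ols), var(endpoint))
```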
We now consider the goodness of fit of the fitted regression line to a set of
data; that is, we shall find out how well the sample regression line fits the
data. The coefficient of determination r2 (two-variable case) or R2 (multiple
regression) is a summary measure that tells how well the sample regression
line fits the data.
Consider a heuristic explanation of r2 in terms of a graphical device, known
as the Venn diagram shown in Figure 3.9.
In this figure the circle Y represents variation in the dependent variable Y and
the circle X represents variation in the explanatory variable X. The overlap of
the two circles indicates the extent to which the variation in Y is explained
by the variation in X.
yi = ŷi + ûi   (3.5.1)
where use is made of (3.1.13) and (3.1.14). Squaring (3.5.1) on both sides
and summing over the sample, we obtain
Σyi² = Σŷi² + Σûi² + 2Σŷiûi
= Σŷi² + Σûi²
= β̂2² Σxi² + Σûi²   (3.5.2)
since Σŷiûi = 0 and ŷi = β̂2xi.
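The decomposition and the resulting r² can be illustrated on a hypothetical sample: the total sum of squares splits exactly into explained plus residual parts, and r² = ESS/TSS lies between 0 and 1.

```python
# Hypothetical sample illustrating TSS = ESS + RSS and r^2 = ESS/TSS.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [3.0, 4.5, 5.0, 7.5, 8.0]
n = len(X)
Xbar, Ybar = sum(X) / n, sum(Y) / n
x = [xi - Xbar for xi in X]
y = [yi - Ybar for yi in Y]
Sxx = sum(a * a for a in x)
b2 = sum(a * b for a, b in zip(x, y)) / Sxx
b1 = Ybar - b2 * Xbar

tss = sum(b * b for b in y)        # total sum of squares, Sum(yi^2)
ess = b2 ** 2 * Sxx                # explained sum of squares, b2^2 * Sum(xi^2)
rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(X, Y))  # residual SS
r2 = ess / tss
print(tss, ess + rss, r2)
```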
A NUMERICAL EXAMPLE