Econometrics - Chapter II
In the model Y = β0 + β1X + U, Y is termed the dependent or study variable and X is termed
the independent or explanatory variable. The terms β0 and β1 are the parameters of the model.
The parameter β0 is termed the intercept, and the parameter β1 is termed the slope parameter.
These parameters are usually called regression coefficients. The unobservable error
component U accounts for the failure of the data to lie on a straight line and represents the
difference between the true and observed realizations of Y. U is a surrogate for all those variables
that are omitted from the model but that collectively affect Y. The obvious question is: why
not introduce these variables into the model explicitly? Stated otherwise, why not develop a
multiple regression model with as many variables as possible? There are many reasons.
A) Vagueness of theory: The theory, if any, determining the behavior of Y may be, and often
is, incomplete. We might know for certain that weekly income X influences weekly
consumption expenditure Y, but we might be ignorant or unsure about the other
variables affecting Y. Therefore, ui may be used as a substitute for all the excluded or
omitted variables from the model.
B) Unavailability of data: Even if we know what some of the excluded variables are and
therefore consider a multiple regression rather than a simple regression, we may not have
quantitative information about these variables. It is a common experience in empirical
analysis that the data we would ideally like to have often are not available. For example,
in principle we could introduce family wealth as an explanatory variable in addition to the
income variable to explain family consumption expenditure. But unfortunately,
information on family wealth generally is not available. Therefore, we may be forced to
omit the wealth variable from our model despite its great theoretical relevance in
explaining consumption expenditure.
C) Core variables versus peripheral variables: Assume in our consumption-income example
that besides income X1, the number of children per family X2, sex X3, religion X4,
education X5, and geographical region X6 also affect consumption expenditure. But it is
quite possible that the joint influence of all or some of these variables may be so small
and at best nonsystematic or random that, as a practical matter and for cost
considerations, it does not pay to introduce them into the model explicitly. One hopes
that their combined effect can be treated as a random variable ui.
D) Intrinsic randomness in human behavior: Even if we succeed in introducing all the relevant
variables into the model, there is bound to be some “intrinsic” randomness in individual
Y’s that cannot be explained no matter how hard we try. Humans are not machines that
do as instructed, so there is an unpredictable element. For example, for reasons that cannot
be explained, an increase in income may not influence a family's consumption. The disturbance
term captures such human behavior that is left unexplained by the economic model; the
disturbances, the u’s, may very well reflect this intrinsic randomness.
E) Poor proxy variables: Although the classical regression model (to be developed in Chapter
3) assumes that the variables Y and X are measured accurately, in practice the data may
be plagued by errors of measurement. The variable being explained sometimes cannot be measured
accurately, either because of data collection difficulties or because it is inherently unmeasurable,
so that a proxy variable must be used instead. In these circumstances the disturbance term can be
thought of as representing the measurement error of the variable(s).
Example: measuring taste is not an easy job.
F) Principle of parsimony: Following Occam’s razor, we would like to keep our regression
model as simple as possible. If we can explain the behavior of Y “substantially” with
two or three explanatory variables and if our theory is not strong enough to suggest
what other variables might be included, why introduce more variables? Let ui
represent all other variables. Of course, we should not exclude relevant and
important variables just to keep the regression model simple.
G) Wrong functional form: Even if we have theoretically correct variables explaining a
phenomenon and even if we can obtain data on these variables, very often we do not
know the form of the functional relationship between the regressand and the
regressors. Is consumption expenditure a linear (in the variable) function of income or a
nonlinear (in the variable) function? If it is the former, Yi = β1 + β2 Xi + ui is the proper
functional relationship between Y and X, but if it is the latter, Yi = β1 + β2 Xi + β3 Xi² +
ui may be the correct functional form. In two-variable models the functional form of the
relationship can often be judged from the scattergram. But in a multiple regression
model, it is not easy to determine the appropriate functional form, for graphically we
cannot visualize scattergrams in multiple dimensions.
For all these reasons, the stochastic disturbances ui assume an extremely critical role in
regression analysis, which we will see as we progress.
Generally speaking, simple linear regression analysis is concerned with the study of the
dependence of one dependent variable on a single variable, called the explanatory or
independent variable. Moreover, the true relationship that connects the variables
involved is split into two parts: systematic (or explained) variation and random (or
unexplained) variation. Using (2.2), we can separate the two components as follows:
Y = β0 + β1X + U
That is,
[variation in Y] = [systematic variation] + [random variation]
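The split into a systematic and a random part can be illustrated with a small simulation. This is only a sketch: the parameter values (beta0 = 17, beta1 = 0.6), the error standard deviation, the normal distribution of U, and the function name simulate_linear_model are all assumptions made for the example, not values taken from the text.

```python
import numpy as np

def simulate_linear_model(x, beta0, beta1, sigma, seed=0):
    """Return (systematic, u, y) for the model Y = beta0 + beta1*X + U."""
    rng = np.random.default_rng(seed)
    systematic = beta0 + beta1 * x             # explained (systematic) part
    u = rng.normal(0.0, sigma, size=x.shape)   # unexplained (random) part
    y = systematic + u
    return systematic, u, y

# Fixed, nonrandom X values (hypothetical income levels).
x = np.arange(80, 261, 20, dtype=float)

# Hypothetical parameter values, chosen only for illustration.
systematic, u, y = simulate_linear_model(x, beta0=17.0, beta1=0.6, sigma=10.0)

for xi, s, ui, yi in zip(x, systematic, u, y):
    print(f"X={xi:5.0f}  systematic={s:7.2f}  random={ui:7.2f}  Y={yi:7.2f}")
```

Each printed Y is the sum of its systematic and random columns; only Y and X would be observable in practice, while U is not.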
In our analysis we will assume that the “independent” variable X is nonrandom. We will also
assume a linear model. Note that this course is concerned with linear models like (2.3). In this
regard it is essential to know what the term linear really means, for it can be interpreted in
two different ways. These are:
i) Linearity in the variables: this implies that X appears with a power of one only, so that
the expected value of Y is a straight-line function of X.
ii) Linearity in the parameters: this implies that the parameters (i.e., the β's) are raised to
the first power only. In this interpretation Y = β0 + β1X² is a linear regression model but
Y = β0 + β1²X is not; the latter is an example of a regression model that is nonlinear in the
parameters. Of the two interpretations of linearity, linearity in the parameters is the one
relevant for the development of regression theory. Thus the term linear regression means a
regression that is linear in the parameters, the β's; it may or may not be linear in the
explanatory variables.
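To make the point about linearity in the parameters concrete, the sketch below fits Y = β0 + β1X² by ordinary linear least squares. The parameter values and the noise-free data are assumptions chosen for illustration, and least squares is used here simply as a convenient linear fitting rule.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
beta0_true, beta1_true = 2.0, 0.5

x = np.linspace(1.0, 10.0, 20)
y = beta0_true + beta1_true * x**2   # nonlinear in X, linear in the betas

# Design matrix with columns [1, X^2]: the unknown betas enter linearly,
# so a linear least-squares solver can recover them.
design = np.column_stack([np.ones_like(x), x**2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta0_hat, beta1_hat = coef

print(beta0_hat, beta1_hat)
```

Because the data contain no noise, the solver returns the assumed values up to floating-point error. In the model Y = β0 + β1²X, by contrast, the coefficient on X is β1² rather than β1, so the model is not linear in β1.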
The following discussion stresses that regression analysis is largely concerned with estimating
and/or predicting the (population) mean or average value of the dependent variable on the
basis of the known or fixed values of the explanatory variable(s).
The data in the above table refer to a total population of 60 families in a hypothetical
community and their weekly income (X) and weekly consumption expenditure (Y), in
dollars. The 60 families are divided into 10 income groups (from $80 to $260) and the
weekly expenditures of each family in the various groups are as shown in the table.
Therefore, we have 10 fixed X values and the corresponding Y values against each of
the X values; so to speak, there are 10 Y subpopulations. There is considerable
variation in weekly consumption expenditure in each income group, which can be
seen clearly from Figure 2.1. But the general picture that one gets is that, despite the
variability of weekly consumption expenditure within each income bracket, on the
average, weekly consumption expenditure increases as income increases. To see this
clearly, in Table 2.1 we have given the mean, or average, weekly consumption
expenditure corresponding to each of the 10 levels of income. Thus, corresponding to
the weekly income level of $80, the mean consumption expenditure is $65, while
corresponding to the income level of $200, it is $137. In all we have 10 mean values
for the 10 subpopulations of Y. We call these mean values conditional expected
values, as they depend on the given values of the (conditioning) variable X.
Symbolically, we denote them as E(Y | X), which is read as the expected value of Y
given the value of X (see also Table 2.2).
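A conditional mean is simply the average of Y within one X subpopulation. The sketch below computes it for two income groups. The individual Y values are illustrative assumptions (they are not reproduced from Table 2.1); they were chosen so that the group means match the two means quoted above, $65 at X = $80 and $137 at X = $200.

```python
from statistics import mean

# Illustrative subpopulations (assumed values, not copied from Table 2.1),
# chosen so that the two group means match those quoted in the text.
population = {
    80: [55, 60, 65, 70, 75],
    200: [120, 136, 140, 144, 145],
}

def conditional_means(groups):
    """Map each fixed X value to E(Y | X), the mean of its Y subpopulation."""
    return {x: mean(ys) for x, ys in groups.items()}

cond_means = conditional_means(population)
for x, m in cond_means.items():
    print(f"E(Y | X = {x}) = {m}")
```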
The dark circled points in Figure 2.1 show the conditional mean values of Y
against the various X values. If we join these conditional mean values, we obtain
what is known as the population regression line (PRL), or more generally, the
population regression curve.
FIGURE 2.1 Conditional distribution of expenditure for various levels of income (data of Table 2.1).
[Horizontal axis: weekly income, $ (80 to 260); vertical axis: weekly consumption expenditure, $.]
FIGURE 2.2 Population regression line (data of Table 2.1).
[Horizontal axis: X, weekly income, $; vertical axis: Y, weekly consumption expenditure, $. The
line passes through the conditional means E(Y | Xi); the distribution of Y given X = $220 is marked.]
This figure shows that for each X (i.e., income level) there is a population of Y values
(weekly consumption expenditures) that are spread around the (conditional) mean of
those Y values. For simplicity, we are assuming that these Y values are distributed
symmetrically around their respective (conditional) mean values. And the regression
line (or curve) passes through these (conditional) mean values. With this background,
the reader may find it instructive to reread the definition of regression given in
Section 1.2.
2.2 THE CONCEPT OF POPULATION REGRESSION FUNCTION (PRF)
From the preceding discussion and Figures 2.1 and 2.2, it is clear that each
conditional mean E(Y | Xi ) is a function of Xi, where Xi is a given value of X.
Symbolically,
E(Y | Xi) = f (Xi )---------------------------------------------------------------------(2.2.1)
where f (Xi ) denotes some function of the explanatory variable X. In our example,
E(Y | Xi ) is a linear function of Xi. Equation (2.2.1) is known as the conditional
expectation function (CEF) or population regression function (PRF) or population
regression (PR) for short. It states merely that the expected value of the distribution
of Y given Xi is functionally related to Xi. In simple terms, it tells how the mean or
average response of Y varies with X. What form does the function f (Xi) assume?
This is an important question because in real situations we do not have the entire
population available for examination. The functional form of the PRF is therefore an
empirical question, although in specific cases theory may have something to say. For
example, an economist might posit that consumption expenditure is linearly related to
income. Therefore, as a first approximation or a working hypothesis, we may assume
that the PRF E(Y | Xi ) is a linear function of Xi, say, of the type
E(Y | Xi ) = β1 + β2 Xi---------------------------------------------------------------- (2.2.2)
where β1 and β2 are unknown but fixed parameters known as the regression
coefficients; β1 and β2 are also known as intercept and slope coefficients,
respectively. Equation (2.2.2) itself is known as the linear population regression
function. Some alternative expressions used in the literature are linear population
regression model or simply linear population regression. In the sequel, the terms
regression, regression equation, and regression model will be used
synonymously. In regression analysis our interest is in estimating the PRFs like
(2.2.2), that is, estimating the values of the unknowns β1 and β2 on the basis of
observations on Y and X. This topic will be studied in detail in Chapter 3.
2.3 THE MEANING OF THE TERM LINEAR
Since this text is concerned primarily with linear models like (2.2.2), it is essential
to know what the term linear really means, for it can be interpreted in two different
ways.
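Linearity in the Variables
The first and perhaps more “natural” meaning of linearity is that the conditional
expectation of Y is a linear function of Xi, as in (2.2.2).
Linear regression models (LRM) and nonlinear regression models (NLRM):
Model linear in parameters?     Model linear in variables?
                                Yes       No
Yes                             LRM       LRM
No                              NLRM      NLRM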
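An individual family's consumption expenditure Yi generally deviates from the conditional
mean for its income level, so we can write
Yi = E(Y | Xi ) + ui --------------------------------------------------------------- (2.4.1)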
where the deviation ui is an unobservable random variable taking positive or
negative values. Technically, ui is known as the stochastic disturbance or
stochastic error term. How do we interpret (2.4.1)? We can say that the
expenditure of an individual family, given its income level, can be expressed as
the sum of two components: (1) E(Y | Xi ), which is simply the mean consumption
expenditure of all the families with the same level of income. This component is
known as the systematic, or deterministic, component, and (2) ui , which is the
random, or nonsystematic, component. We shall examine shortly the nature of the
stochastic disturbance term, but for the moment assume that it is a surrogate or
proxy for all the omitted or neglected variables that may affect Y but are not (or
cannot be) included in the regression model. If E(Y | Xi ) is assumed to be linear in
Xi , as in (2.2.2), Eq. (2.4.1) may be written as
Yi = E(Y | Xi ) + ui
   = β1 + β2 Xi + ui --------------------------------------------------------------- (2.4.2)
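For families with a weekly income of X = $80, for example, individual consumption
expenditures can be expressed as
Y2 = 60 = β1 + β2(80) + u2
Y3 = 65 = β1 + β2(80) + u3
Y4 = 70 = β1 + β2(80) + u4
Y5 = 75 = β1 + β2(80) + u5 ------------------------------------------------------- (2.4.3)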
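Taking the conditional expectation of both sides of (2.4.1), given Xi, shows what this specification implies for ui. The following is a sketch of that standard step; it uses only the fact that E(Y | Xi) is a constant once Xi is fixed.

```latex
\begin{align*}
E(Y_i \mid X_i) &= E\bigl[\,E(Y \mid X_i) \mid X_i\,\bigr] + E(u_i \mid X_i) \\
                &= E(Y \mid X_i) + E(u_i \mid X_i).
\end{align*}
Since $E(Y_i \mid X_i)$ is the same thing as $E(Y \mid X_i)$, it follows that
\[
E(u_i \mid X_i) = 0 .
\]
```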
Thus, the assumption that the regression line passes through the conditional means
of Y (see Figure 2.2) implies that the conditional mean values of ui (conditional upon
the given X’s) are zero. From the previous discussion, it is clear that (2.2.2) and (2.4.2) are
equivalent forms if E(ui | Xi ) = 0. But the stochastic specification (2.4.2) has the
advantage that it clearly shows that there are other variables besides income that affect
consumption expenditure and that an individual family’s consumption expenditure
cannot be fully explained only by the variable(s) included in the regression model.
To see this, suppose we draw another random sample from the population of Table
2.1, as presented in Table 2.5. Plotting the data of Tables 2.4 and 2.5, we obtain the
scattergram given in Figure 2.4. In the scattergram two sample regression lines are
drawn so as to “fit” the scatters reasonably well: SRF1 is based on the first sample,
and SRF2 is based on the second sample. Which of the two regression lines
represents the “true” population regression line? If we avoid the temptation of looking at
Figure 2.1, which purportedly represents the PR, there is no way we can be
absolutely sure that either of the regression lines shown in Figure 2.4 represents the
true population regression line (or curve). The regression lines in Figure 2.4 are known
as the sample regression lines. Supposedly they represent the population regression
line, but because of sampling fluctuations they are at best an approximation of the
true PR. In general, we would get N different SRFs for N different samples, and these
SRFs are not likely to be the same.
TABLE 2.4 (first sample) and TABLE 2.5 (second sample)
  First sample (Table 2.4)      Second sample (Table 2.5)
     Y       X                      Y       X
    70      80                     55      80
    65     100                     88     100
    90     120                     90     120
    95     140                     80     140
   110     160                    118     160
   115     180                    120     180
   120     200                    145     200
   140     220                    135     220
   155     240                    145     240
   150     260                    175     260
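One way to draw a line through each sample is ordinary least squares. The text has not yet chosen a fitting rule, so treat the use of least squares here as an assumption made for illustration; the helper name fit_line is likewise hypothetical. The data are the two samples tabulated above.

```python
import numpy as np

# Data from the two samples tabulated above.
x = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
y_first = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
y_second = np.array([55, 88, 90, 80, 118, 120, 145, 135, 145, 175], dtype=float)

def fit_line(x, y):
    """Least-squares intercept b1 and slope b2 for the line y = b1 + b2 * x."""
    x_dev = x - x.mean()
    b2 = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev**2)
    b1 = y.mean() - b2 * x.mean()
    return b1, b2

srf1 = fit_line(x, y_first)    # line fitted to the first sample
srf2 = fit_line(x, y_second)   # line fitted to the second sample

print("SRF1: intercept=%.4f slope=%.4f" % srf1)
print("SRF2: intercept=%.4f slope=%.4f" % srf2)
```

Under this fitting rule the two samples give different lines (slopes of roughly 0.51 and 0.58), which is the sampling fluctuation the text describes.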
[Figure 2.4: scattergram of the first sample (Table 2.4) and the second sample (Table 2.5), with
the fitted lines SRF1 (based on the first sample) and SRF2 (based on the second sample).
Horizontal axis: weekly income, $ (80 to 260); vertical axis: weekly consumption expenditure, $.]
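The claim that N different samples give N different SRFs can be checked by simulation. In the sketch below the "true" PRF (intercept 17, slope 0.6), the normal errors with standard deviation 10, and the least-squares fitting rule are all assumptions made for illustration; none of them comes from the text.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares intercept b1 and slope b2 for the line y = b1 + b2 * x."""
    x_dev = x - x.mean()
    b2 = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev**2)
    b1 = y.mean() - b2 * x.mean()
    return b1, b2

rng = np.random.default_rng(42)

# Hypothetical population regression function and error spread.
beta1, beta2, sigma = 17.0, 0.6, 10.0
x = np.arange(80, 261, 20, dtype=float)   # the same fixed X values in every sample

estimates = []
for _ in range(5):
    # Draw a fresh sample of Y values around the same PRF.
    y = beta1 + beta2 * x + rng.normal(0.0, sigma, size=x.size)
    estimates.append(fit_line(x, y))

for k, (b1, b2) in enumerate(estimates, start=1):
    print(f"sample {k}: intercept={b1:7.3f}  slope={b2:.4f}")
```

The fitting rule is the same in every pass of the loop, but each sample produces its own numerical intercept and slope.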
Note that an estimator, also known as a (sample) statistic, is simply a rule or formula or
method that tells how to estimate the population parameter from the information provided by the
sample at hand. A particular numerical value obtained by the estimator in an application is
known as an estimate. Now just as we expressed the PRF in two equivalent forms, (2.2.2) and
(2.4.2), we can express the SRF (2.6.1) in its stochastic form as follows:
Yi = β̂1 + β̂2 Xi + ûi --------------------------------------------------------------- (2.6.2)
where, in addition to the symbols already defined, ûi denotes the (sample) residual term.
Conceptually ûi is analogous to ui and can be regarded as an estimate of ui . It is introduced
in the SRF for the same reasons as ui was introduced in the PRF. To sum up, then, we find our
primary objective in regression analysis is to estimate the PRF
Yi = β1 + β2 Xi + ui ---------------------------------------------------------------------------------(2.4.2)
on the basis of the SRF
Yi = β̂1 + β̂2 Xi + ûi --------------------------------------------------------------- (2.6.2)
because more often than not our analysis is based upon a single sample from some population.
But because of sampling fluctuations our estimate of the PRF based on the SRF is at best an
approximate one. This approximation is shown diagrammatically in Figure 2.5.
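[Figure 2.5: the sample regression line (SRF: Ŷi = β̂1 + β̂2 Xi) and the population regression
line E(Y | Xi), showing for one observation Yi the fitted value Ŷi, the residual ûi, the
disturbance ui, and the conditional mean E(Y | Xi). Vertical axis: weekly consumption
expenditure, $.]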
For X = Xi , we have one (sample) observation Y = Yi . In terms of the SRF, the observed Yi
can be expressed as
Yi = Ŷi + ûi ---------------------------------------------------------------------- (2.6.3)
and in terms of the PRF, it can be expressed as
Yi = E(Y | Xi ) + ui -------------------------------------------------------------- (2.6.4)
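The decomposition in (2.6.3) can be computed directly once a line has been fitted. The sketch below uses the first sample (Table 2.4) and assumes a least-squares fit via numpy.polyfit as the fitting rule; that choice is an illustration, not something the text has specified at this point.

```python
import numpy as np

# First sample (Table 2.4).
x = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)

# Least-squares line (assumed fitting rule). polyfit returns the
# highest-degree coefficient first, i.e. slope and then intercept.
b2_hat, b1_hat = np.polyfit(x, y, deg=1)

y_hat = b1_hat + b2_hat * x   # fitted values from the SRF
u_hat = y - y_hat             # residuals

for xi, yi, fi, ri in zip(x, y, y_hat, u_hat):
    print(f"X={xi:5.0f}  Y={yi:6.1f}  Y_hat={fi:8.3f}  u_hat={ri:8.3f}")
```

The residuals are observable because they are computed from the sample; the disturbances ui in (2.6.4) are not, because E(Y | Xi) is unknown.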
Now obviously in Figure 2.5 Ŷi overestimates the true E(Y | Xi ) for the Xi shown therein.
By the same token, for any Xi to the left of the point A, the SRF will underestimate the true
PRF. But the reader can readily see that such over- and underestimation is inevitable because
of sampling fluctuations. The critical question now is: Granted that the SRF is but an
approximation of the PRF, can we devise a rule or a method that will make this approximation
as “close” as possible? In other words, how should the SRF be constructed so that β̂1 is as
“close” as possible to the true β1 and β̂2 is as “close” as possible to the true β2, even
though we will never know the true β1 and β2?