Multiple Regression
2. Fixed X values or X values independent of the error term. Here, this means we
require zero covariance between u_i and each X variable:

cov(u_i, X_{2i}) = cov(u_i, X_{3i}) = 0

3. Zero mean value of the disturbance:

E(u_i | X_{2i}, X_{3i}) = 0 for each i

4. Homoscedasticity, or constant variance of u_i:

var(u_i) = \sigma^2

5. No autocorrelation between the disturbances:

cov(u_i, u_j) = 0, i \neq j

Under these assumptions, the conditional mean of Y is

E(Y_i | X_{2i}, X_{3i}) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i}

and \beta_2 measures the change in the mean value of Y per unit change in X_2, holding
X_3 constant. Put differently, \beta_2 measures the "direct" or the "net" effect of a unit
change in X_2 on the mean value of Y, holding X_3 constant. The meaning of \beta_3 is
analogous: it gives the direct effect of a unit change in X_3 on the mean value of Y,
holding X_2 constant.
OLS Estimation of the Partial Regression Coefficients
To find the OLS estimators, let us first write the sample regression function (SRF)
corresponding to the PRF as follows:

Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i} + \hat{u}_i

where \hat{u}_i is the residual term, the sample counterpart of the stochastic disturbance
term u_i.
The OLS estimators are obtained by minimizing \sum \hat{u}_i^2. Solving the resulting
normal equations gives

\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3

\hat{\beta}_2 = \frac{(\sum y_i x_{2i})(\sum x_{3i}^2) - (\sum y_i x_{3i})(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}

and symmetrically for \hat{\beta}_3, where the lowercase letters denote deviations from
sample mean values (y_i = Y_i - \bar{Y}, x_{2i} = X_{2i} - \bar{X}_2, and so on).
The variance of \hat{\beta}_2 is

var(\hat{\beta}_2) = \frac{\sigma^2 \sum x_{3i}^2}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}

Or, equivalently,

var(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2 (1 - r_{23}^2)}

where r_{23} is the sample coefficient of correlation between X_2 and X_3. The unbiased
OLS estimator of \sigma^2 is

\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - 3}
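As a concrete check on these formulas, the following Python sketch computes the three
OLS estimates and \hat{\sigma}^2 directly from the deviation-form expressions above; the
data arrays are made-up numbers for illustration only, not from the text.

```python
import numpy as np

# Hypothetical sample data (not from the text)
Y = np.array([10.0, 12.0, 15.0, 14.0, 18.0, 20.0])
X2 = np.array([2.0, 3.0, 5.0, 4.0, 6.0, 7.0])
X3 = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 5.0])

# Deviations from sample means: y_i = Y_i - Ybar, etc.
y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()

den = (x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2
b2 = ((y @ x2) * (x3 @ x3) - (y @ x3) * (x2 @ x3)) / den
b3 = ((y @ x3) * (x2 @ x2) - (y @ x2) * (x2 @ x3)) / den
b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()

# Residuals and the unbiased estimator of sigma^2 with n - 3 df
u_hat = Y - (b1 + b2 * X2 + b3 * X3)
sigma2_hat = (u_hat @ u_hat) / (len(Y) - 3)
print(b1, b2, b3, sigma2_hat)
```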
The Multiple Coefficient of Determination R^2 and the Multiple Coefficient of
Correlation R
In the two-variable case we saw that r^2 measures the goodness of fit of the
regression equation; that is, it gives the proportion or percentage of the total
variation in the dependent variable Y explained by the (single) explanatory variable
X. This notion of r^2 can be easily extended to regression models containing more
than two variables.
Thus, in the three-variable model we would like to know the proportion of the
variation in Y explained by the variables X_2 and X_3 jointly. The quantity that gives
this information is known as the multiple coefficient of determination and is
denoted by R^2; conceptually it is akin to r^2.
Recall that in the two-variable case we defined the quantity r as the coefficient of
correlation and indicated that it measures the degree of (linear) association between
two variables. The three-or-more-variable analogue of r is the coefficient of
multiple correlation, denoted by R, and it is a measure of the degree of association
between Y and all the explanatory variables jointly. Although r can be positive or
negative, R is always taken to be positive. In practice, however, R is of little
importance. The more meaningful quantity is R^2.
An Illustrative Example
Consider the behavior of child mortality (CM) in relation to per capita GNP
(PGNP) and the female literacy rate (FLR). CM is the number of deaths of children
under five per 1000 live births, PGNP is per capita GNP in 1980, and FLR is
measured in percent. We need to estimate the (partial) regression coefficients of
each regressor; our model is:

CM_i = \beta_1 + \beta_2 PGNP_i + \beta_3 FLR_i + u_i
From 64 sample countries using the EViews statistical package, we obtained the
following results:

\widehat{CM}_i = 263.6416 - 0.0056 PGNP_i - 2.2316 FLR_i
se = (11.5932) (0.0019) (0.2099)    R^2 = 0.7077
The intercept value of about 263 means that if the values of PGNP and FLR were
fixed at zero, the mean child mortality rate would be about 263 deaths per thousand
live births. All one can infer is that if the two regressors were fixed at zero, child
mortality would be quite high, which makes practical sense.
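If the underlying country data were available, the same regression could be reproduced
with standard software; the sketch below uses Python's statsmodels. The file name
childmortality.csv and its column layout are assumptions for illustration, not part of
the text.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical file layout: one row per country with columns CM, PGNP, FLR
df = pd.read_csv("childmortality.csv")

X = sm.add_constant(df[["PGNP", "FLR"]])  # adds the intercept column
model = sm.OLS(df["CM"], X).fit()

# Coefficients, standard errors, and R^2 as reported in the output above
print(model.params)
print(model.bse)
print(model.rsquared)
```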
Impact on the Dependent Variable of a Unit Change in More than One Regressor
Before proceeding further, suppose we want to find out what would happen to the
child mortality rate if we were to increase PGNP and FLR simultaneously.
Suppose per capita GNP were to increase by a dollar and at the same time the
female literacy rate were to go up by one percentage point. What would be the
impact of this simultaneous change on the child mortality rate? To find out, all we
have to do is multiply the coefficients of PGNP and FLR by the proposed changes
and add the resulting terms. In our example this gives us:
\Delta CM = -0.0056(1) - 2.2316(1) = -2.2372
That is, as a result of this simultaneous change in PGNP and FLR, the number of
deaths of children under age 5 would go down by about 2.24 deaths.
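The same arithmetic can be expressed in a couple of lines of Python, with the
coefficients taken from the estimated equation above:

```python
# Estimated partial regression coefficients from the fitted equation
b_pgnp, b_flr = -0.0056, -2.2316

# Simultaneous unit changes in PGNP (one dollar) and FLR (one percentage point)
delta_cm = b_pgnp * 1 + b_flr * 1
print(delta_cm)  # -2.2372, i.e. about 2.24 fewer deaths per 1000 live births
```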
Recall that

R^2 = 1 - \frac{\sum \hat{u}_i^2}{\sum y_i^2} ……….(1)

Now \sum y_i^2 is independent of the number of X variables in the model, but
\sum \hat{u}_i^2 is likely to decrease (at least it will not increase) as the number of
X variables increases. Therefore, R^2 will increase as the number of X variables increases. In
view of this, in comparing two regression models with the same dependent
variable but differing number of X variables, one should be very wary of choosing
the model with the highest R^2.
One way to take the number of regressors into account is to adjust R^2 for the degrees
of freedom:

\bar{R}^2 = 1 - \frac{\sum \hat{u}_i^2 / (n - k)}{\sum y_i^2 / (n - 1)} ……….(2)

where k = the number of parameters in the model including the intercept term (in the
three-variable regression, k = 3). The R^2 thus defined is known as the adjusted R^2,
denoted by \bar{R}^2. The term adjusted means adjusted for the df associated with the
sums of squares entering into Eq. (2): \sum \hat{u}_i^2 has n - k df and \sum y_i^2 has
n - 1 df.
It is easy to see that R^2 and \bar{R}^2 are related because, substituting Eq. (1) into Eq.
(2), we obtain

\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - k}
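This relationship is easy to compute; the sketch below (the function name adjusted_r2 is
my own) applies it to the child mortality figures reported earlier:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 from R^2, sample size n, and k parameters (incl. intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Child mortality example from the text: R^2 = 0.7077, n = 64, k = 3
print(adjusted_r2(0.7077, 64, 3))  # about 0.698
```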
Sometimes researchers play the game of maximizing \bar{R}^2, that is, choosing the
model that gives the highest \bar{R}^2. But this may be dangerous, for in regression
analysis our objective is not to obtain a high \bar{R}^2 per se but rather to obtain
dependable estimates of the true population regression coefficients and draw
statistical inferences about them. In empirical analysis it is not unusual to obtain a
very high \bar{R}^2 but find that some of the regression coefficients either are statistically
insignificant or have signs that are contrary to a priori expectations. Therefore, the
researcher should be more concerned about the logical or theoretical relevance of
the explanatory variables to the dependent variable and their statistical
significance. If in this process we obtain a high \bar{R}^2, well and good; on the other
hand, if \bar{R}^2 is low, it does not mean the model is necessarily bad.
The Cobb–Douglas Production Function
Consider the Cobb–Douglas production function in its stochastic form:

Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} e^{u_i}

where
Y = output
X_2 = labor input
X_3 = capital input
u_i = stochastic disturbance term
e = base of natural logarithm
From this equation it is clear that the relationship between output and the two inputs is
nonlinear. However, if we log-transform this model, we obtain:

\ln Y_i = \ln \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i = \beta_0 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i

where \beta_0 = \ln \beta_1.
Thus written, the model is linear in the parameters \beta_0, \beta_2, and \beta_3 and is therefore
a linear regression model. Notice, though, it is nonlinear in the variables Y and X
but linear in the logs of these variables. In short, this is a log-log, double-log, or
log-linear model.
The properties of the Cobb–Douglas production function are quite well known:
1. \beta_2 is the (partial) elasticity of output with respect to the labor input; that is, it
measures the percentage change in output for, say, a 1 percent change in the labor
input, holding the capital input constant.
2. Likewise, \beta_3 is the (partial) elasticity of output with respect to the capital input,
holding the labor input constant.
3. The sum (\beta_2 + \beta_3) gives information about the returns to scale, that is, the
response of output to a proportionate change in the inputs. If this sum is 1, then
there are constant returns to scale, that is, doubling the inputs will double the
output, tripling the inputs will triple the output, and so on. If the sum is less than 1,
there are decreasing returns to scale: doubling the inputs will less than double the
output. Finally, if the sum is greater than 1, there are increasing returns to scale:
doubling the inputs will more than double the output (see the estimation sketch below).
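To make this concrete, here is a hedged Python sketch that simulates data from a
Cobb–Douglas technology with known elasticities and recovers them by OLS on the
double-log form; all numbers are invented for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
labor = rng.uniform(10, 100, n)
capital = rng.uniform(10, 100, n)
u = rng.normal(0, 0.05, n)

# Simulated Cobb-Douglas output with beta2 = 0.7, beta3 = 0.3 (constant returns)
output = 1.5 * labor**0.7 * capital**0.3 * np.exp(u)

# Log-log (double-log) form: ln Y = beta0 + beta2 ln X2 + beta3 ln X3 + u
X = sm.add_constant(np.column_stack([np.log(labor), np.log(capital)]))
fit = sm.OLS(np.log(output), X).fit()

b0, b2, b3 = fit.params
print(b2, b3, b2 + b3)  # elasticities and the returns-to-scale sum (close to 1)
```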
Multiple Regression Analysis: The Problem of Inference
This chapter extends the ideas of interval estimation and hypothesis testing to
regression models involving three or more variables.
The Normality Assumption Once Again
As discussed previously, if our objective is estimation as well as inference, then
we need to assume that the u_i follow the normal distribution with zero mean
and constant variance \sigma^2.
With the normality assumption we find that the OLS estimators of the partial
regression coefficients are best linear unbiased estimators (BLUE). Moreover, the
estimators \hat{\beta}_1, \hat{\beta}_2, and \hat{\beta}_3 are themselves normally distributed with means equal to the
true \beta_1, \beta_2, and \beta_3 and with the variances given earlier.
The t distribution can then be used to establish confidence intervals as well as to test
statistical hypotheses about the true population partial regression coefficients, using,
for example,

t = \frac{\hat{\beta}_2 - \beta_2}{se(\hat{\beta}_2)}

with n - 3 df.
Note that the df are now n - 3 because in computing \sum \hat{u}_i^2, and hence
\hat{\sigma}^2, we first need to estimate the three partial regression coefficients, which
therefore put three restrictions on the residual sum of squares (RSS). Following this
logic, in the four-variable case there will be n - 4 df, and so on.
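As an illustration, the critical t value for 61 df and a 95 percent confidence interval
for the PGNP coefficient in the child mortality regression can be computed with SciPy;
this is a sketch using the rounded values from the text, not part of the original:

```python
from scipy import stats

n, k = 64, 3                      # observations, parameters (incl. intercept)
df = n - k                        # 61 degrees of freedom
t_crit = stats.t.ppf(0.975, df)   # two-tailed 5% critical t, roughly 2.0

# PGNP coefficient and (rounded) standard error from the regression
b2, se_b2 = -0.0056, 0.0020
ci = (b2 - t_crit * se_b2, b2 + t_crit * se_b2)
print(t_crit, ci)
```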
Hypothesis Testing about Individual Regression Coefficients
We can use the t test to test a hypothesis about any individual partial regression
coefficient. To illustrate the mechanics, consider the child mortality
regression again:
\widehat{CM}_i = 263.6416 - 0.0056 PGNP_i - 2.2316 FLR_i
se = (11.5932) (0.0019) (0.2099)
\bar{R}^2 = 0.6981
Let us postulate that

H_0: \beta_2 = 0 and H_1: \beta_2 \neq 0
The null hypothesis states that, with X_3 (female literacy rate) held constant, X_2
(PGNP) has no (linear) influence on Y (child mortality). To test the null
hypothesis, we use the t test, and if the computed t value exceeds the critical t
value at the chosen level of significance, we may reject the null hypothesis;
otherwise, we may not reject it. For our example we obtain:
t = \frac{-0.0056 - 0}{0.0020} = -2.8187
Since the computed t value of 2.8187 (in absolute terms) exceeds the critical t
value of about 2.0 at the 5 percent level of significance (61 df), we can reject the null
hypothesis that PGNP has no effect on child
mortality. To put it more positively, with the female literacy rate held constant, per
capita GNP has a significant (negative) effect on child mortality, as one would
expect a priori. Graphically, the situation is as shown in the figure.
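The same t test in code, with the values from the regression above; SciPy supplies the
p value:

```python
from scipy import stats

b2, se_b2, df = -0.0056, 0.0020, 61
t_stat = (b2 - 0) / se_b2                  # test of H0: beta2 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-sided p value
print(t_stat, p_value)                     # t = -2.8, p < 0.01: reject H0
```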
Testing the Overall Significance of the Sample Regression
Throughout the previous section we were concerned with testing the significance
of the estimated partial regression coefficients individually, that is, under the
separate hypothesis that each true population partial regression coefficient was
zero. But now consider the following hypothesis:

H_0: \beta_2 = \beta_3 = 0

This null hypothesis is a joint hypothesis that \beta_2 and \beta_3 are jointly or
simultaneously equal to zero. A test of such a hypothesis is called a test of the
overall significance of the observed or estimated regression line, that is, of whether
Y is linearly related to both X_2 and X_3.
This joint hypothesis can be tested by the analysis of variance (ANOVA) technique,
which can be demonstrated as follows.
Under the assumption of normal distribution for u_i and the null hypothesis
H_0: \beta_2 = \beta_3 = 0, the variable

F = \frac{ESS/2}{RSS/(n - 3)} ……….(3)

is distributed as the F distribution with 2 and n - 3 df.
TSS has, as usual, n - 1 df and RSS has n - 3 df for reasons already discussed.
ESS has 2 df since it is a function of \hat{\beta}_2 and \hat{\beta}_3. Therefore, following the
usual ANOVA procedure, we can set up the ANOVA table for the three-variable regression.
If the F value computed from Eq. (3) exceeds the critical F value from the F
table at the \alpha percent level of significance, we reject H_0; otherwise we do not
reject it. Alternatively, if the p value of the observed F is sufficiently low, we can
reject H_0.
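A small sketch of this decision rule in Python (the helper name overall_f_test is my
own; the ESS and RSS arguments would come from an actual ANOVA table):

```python
from scipy import stats

def overall_f_test(ess: float, rss: float, n: int, alpha: float = 0.05):
    """F test of H0: beta2 = beta3 = 0 in the three-variable model, Eq. (3)."""
    f_stat = (ess / 2) / (rss / (n - 3))       # computed F value
    f_crit = stats.f.ppf(1 - alpha, 2, n - 3)  # critical F at level alpha
    p_value = stats.f.sf(f_stat, 2, n - 3)     # p value of the observed F
    return f_stat, f_crit, p_value

# Hypothetical sums of squares, for illustration only
print(overall_f_test(ess=2500.0, rss=1500.0, n=64))
```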
Turning to our illustrative example, the ANOVA table for the child mortality
regression yields

F = \frac{ESS/2}{RSS/61} \approx 73.8

which is distributed as the F distribution with 2 and 61 df. The p value of obtaining
an F value this large is practically zero, so we can reject the joint null hypothesis.
More generally, in the k-variable case (including the intercept), if we assume that the
disturbances are normally distributed and that the null hypothesis is

H_0: \beta_2 = \beta_3 = \dots = \beta_k = 0 ……….(4)

then it can be shown that

F = \frac{ESS/(k - 1)}{RSS/(n - k)} = \frac{R^2/(k - 1)}{(1 - R^2)/(n - k)} ……….(5)

follows the F distribution with k - 1 and n - k df. (Note: The total number of
parameters to be estimated is k, of which one is the intercept term.)
Equation (5) shows how F and R^2 are related. The two vary directly. When R^2
= 0, F is zero ipso facto. The larger the R^2, the greater the F value. In the limit,
when R^2 = 1, F is infinite. Thus the F test, which is a measure of the overall
significance of the estimated regression, is also a test of the significance of R^2. In
other words, testing the null hypothesis in Eq. (4) is equivalent to testing the null
hypothesis that (the population) R^2 is zero.
By virtue of the close connection between F and R^2, the ANOVA table can be
recast in terms of R^2: the ESS and RSS entries become R^2 \sum y_i^2 and
(1 - R^2) \sum y_i^2, respectively, with the same df as before. For our example this gives

F = \frac{0.7077/2}{(1 - 0.7077)/61} \approx 73.8

which is about the same F value as obtained before, except for rounding errors.
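As a final numerical check, Eq. (5) can be evaluated directly; the sketch below (the
helper name f_from_r2 is my own) reproduces the F value for the child mortality
example:

```python
from scipy import stats

def f_from_r2(r2: float, n: int, k: int):
    """Overall F statistic computed from R^2 via Eq. (5), with its p value."""
    f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))
    return f_stat, stats.f.sf(f_stat, k - 1, n - k)

# Child mortality example: R^2 = 0.7077, n = 64, k = 3
print(f_from_r2(0.7077, 64, 3))  # F around 73.8, p value near zero
```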