
Multiple Regression Analysis

The simplest possible multiple regression model is the three-variable regression, with one dependent variable and two explanatory variables. Generalizing the two-variable PRF, we may write the three-variable PRF as

$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$

The coefficients $\beta_2$ and $\beta_3$ are called the partial regression coefficients.

Within the framework of the CLRM, we assume the following:

1. Linear regression model, or linear in the parameters:

$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$

2. Fixed $X$ values, or $X$ values independent of the error term. Here, this means we require zero covariance between $u_i$ and each $X$ variable:

$\operatorname{cov}(u_i, X_{2i}) = \operatorname{cov}(u_i, X_{3i}) = 0$

3. Zero mean value of the disturbance $u_i$:

$E(u_i \mid X_{2i}, X_{3i}) = 0$ for each $i$

4. Homoscedasticity, or constant variance of $u_i$:

$\operatorname{var}(u_i) = \sigma^2$

5. No autocorrelation, or serial correlation, between the disturbances:

$\operatorname{cov}(u_i, u_j) = 0, \quad i \neq j$

6. The number of observations $n$ must be greater than the number of parameters to be estimated.

7. There must be variation in the values of the $X$ variables.

8. No exact collinearity between the $X$ variables, that is, no exact linear relationship between $X_2$ and $X_3$.

9. There is no specification bias.
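To make the setup concrete, the following Python sketch simulates data from a three-variable PRF that satisfies these assumptions; the coefficient values, error variance, and sample size are purely illustrative and are not taken from these notes.

```python
import numpy as np

# Simulate Y_i = beta1 + beta2*X2_i + beta3*X3_i + u_i under the CLRM assumptions.
rng = np.random.default_rng(0)

n = 200                                  # assumption 6: n exceeds the 3 parameters
beta1, beta2, beta3 = 5.0, 1.5, -0.8     # hypothetical population coefficients
sigma = 2.0                              # assumption 4: constant error variance

X2 = rng.uniform(0, 10, size=n)          # assumption 7: the X's vary
X3 = rng.uniform(0, 10, size=n)          # no exact linear relation to X2 (assumption 8)
u = rng.normal(0, sigma, size=n)         # assumptions 3-5: zero mean, homoscedastic, independent draws

Y = beta1 + beta2 * X2 + beta3 * X3 + u  # assumption 1: linear in the parameters
```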


Interpretation of Multiple Regression Equation
We have the three-variable PRF as

$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$

Taking the conditional expectation of $Y$ on both sides, we obtain

$E(Y_i \mid X_{2i}, X_{3i}) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i}$

As in the two-variable case, multiple regression analysis is regression analysis conditional upon the fixed values of the regressors, and what we obtain is the average or mean value of $Y$, or the mean response of $Y$, for the given values of the regressors.

The Meaning of Partial Regression Coefficients

The regression coefficients $\beta_2$ and $\beta_3$ are known as partial regression or partial slope coefficients. The meaning of a partial regression coefficient is as follows:

$\beta_2$ measures the change in the mean value of $Y$, $E(Y \mid X_2, X_3)$, per unit change in $X_2$, holding the value of $X_3$ constant.

Put differently, $\beta_2$ measures the "direct" or the "net" effect of a unit change in $X_2$ on the mean value of $Y$, holding $X_3$ constant.

The meaning of $\beta_3$ is analogous: it measures the change in the mean value of $Y$ per unit change in $X_3$, holding $X_2$ constant.
OLS Estimation of the Partial Regression Coefficients

We have the three-variable PRF as

$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$

To find the OLS estimators, let us first write the sample regression function (SRF) corresponding to the PRF as follows:

$Y_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + \hat u_i$

where $\hat u_i$ is the residual term, the sample counterpart of the stochastic disturbance term $u_i$.

Here, writing the variables in deviation form ($y_i = Y_i - \bar Y$, $x_{2i} = X_{2i} - \bar X_2$, $x_{3i} = X_{3i} - \bar X_3$), the OLS estimators are

$\hat\beta_1 = \bar Y - \hat\beta_2 \bar X_2 - \hat\beta_3 \bar X_3$

$\hat\beta_2 = \dfrac{(\sum y_i x_{2i})(\sum x_{3i}^2) - (\sum y_i x_{3i})(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}$

with an analogous expression for $\hat\beta_3$ (interchange the subscripts 2 and 3). The variance of $\hat\beta_2$ is

$\operatorname{var}(\hat\beta_2) = \dfrac{\sum x_{3i}^2}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}\,\sigma^2$

Or, equivalently,

$\operatorname{var}(\hat\beta_2) = \dfrac{\sigma^2}{\sum x_{2i}^2\,(1 - r_{23}^2)}$

where $r_{23}$ is the sample coefficient of correlation between $X_2$ and $X_3$; $\operatorname{var}(\hat\beta_3)$ is obtained in the same way.

In all these formulas $\sigma^2$ is the (homoscedastic) variance of the population disturbances $u_i$, and $\hat\sigma^2$ is the unbiased estimator of $\sigma^2$, which is

$\hat\sigma^2 = \dfrac{\sum \hat u_i^2}{n-3}$
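As a rough numerical check on these formulas, the sketch below computes $\hat\beta_1$, $\hat\beta_2$, $\hat\beta_3$, and $\hat\sigma^2$ directly from the deviation-form expressions; it reuses the arrays X2, X3, Y, and n from the hypothetical simulation sketch above.

```python
# Deviation-form OLS for the three-variable model (uses X2, X3, Y, n from the
# simulation sketch above, which is hypothetical illustrative data).
y  = Y  - Y.mean()
x2 = X2 - X2.mean()
x3 = X3 - X3.mean()

denom = (x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2
b2 = ((y @ x2) * (x3 @ x3) - (y @ x3) * (x2 @ x3)) / denom
b3 = ((y @ x3) * (x2 @ x2) - (y @ x2) * (x2 @ x3)) / denom
b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()

resid = Y - (b1 + b2 * X2 + b3 * X3)
sigma2_hat = (resid @ resid) / (n - 3)   # unbiased estimator of sigma^2

print(b1, b2, b3, sigma2_hat)            # should be near 5.0, 1.5, -0.8, 4.0
```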
The Multiple Coefficient of Determination $R^2$ and the Multiple Coefficient of Correlation $R$

In the two-variable case we saw that $r^2$ measures the goodness of fit of the regression equation; that is, it gives the proportion or percentage of the total variation in the dependent variable $Y$ explained by the (single) explanatory variable $X$. This notion of $r^2$ can be easily extended to regression models containing more than two variables.
Thus, in the three-variable model we would like to know the proportion of the variation in $Y$ explained by the variables $X_2$ and $X_3$ jointly. The quantity that gives this information is known as the multiple coefficient of determination and is denoted by $R^2$; conceptually it is akin to $r^2$.

Recall that in the two-variable case we defined the quantity $r$ as the coefficient of correlation and indicated that it measures the degree of (linear) association between two variables. The three-or-more-variable analogue of $r$ is the coefficient of multiple correlation, denoted by $R$, and it is a measure of the degree of association between $Y$ and all the explanatory variables jointly. Although $r$ can be positive or negative, $R$ is always taken to be positive. In practice, however, $R$ is of little importance. The more meaningful quantity is $R^2$.

An Illustrative Example
Consider the behavior of child mortality (CM) in relation to per capita GNP (PGNP) and the female literacy rate (FLR). CM is the number of deaths of children under five per 1000 live births, PGNP is per capita GNP in 1980, and FLR is measured in percent. We need to estimate the (partial) regression coefficients of each regressor, and our model is:

$CM_i = \beta_1 + \beta_2\, PGNP_i + \beta_3\, FLR_i + u_i$
From 64 sample countries using the EViews statistical package, we obtained the
following results:

$\widehat{CM}_i = 263.6416 - 0.0056\, PGNP_i - 2.2316\, FLR_i$

se = (11.5932) (0.0020) (0.2099) $\quad R^2 = 0.7077$

Let us now interpret the regression coefficients:

$\hat\beta_2 = -0.0056$ is the partial regression coefficient of PGNP and tells us that, with the influence of FLR held constant, as PGNP increases, say, by a dollar, on average, child mortality goes down by 0.0056 units.
To make it more economically interpretable, if per capita GNP goes up by a thousand dollars, on average, the number of deaths of children under age 5 goes down by about 5.6 per thousand live births.

$\hat\beta_3 = -2.2316$ tells us that, holding the influence of PGNP constant, on average, the number of deaths of children under age 5 goes down by about 2.23 per thousand live births as the female literacy rate increases by one percentage point.

The intercept value of about 263 means that if the values of PGNP and FLR were fixed at zero, the mean child mortality rate would be about 263 deaths per thousand live births. All one could infer is that if the two regressors were fixed at zero, child mortality would be quite high, which makes practical sense.

$R^2 = 0.7077$ means that about 71 percent of the variation in child mortality is explained by PGNP and FLR, a fairly high value considering that the maximum value of $R^2$ can at most be 1.
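For readers who want to reproduce a regression of this form, a minimal sketch with statsmodels follows; the file name and column names are hypothetical placeholders, since the 64-country data set itself is not reproduced in these notes.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file with columns CM, PGNP, FLR (one row per country).
data = pd.read_csv("child_mortality.csv")

model = smf.ols("CM ~ PGNP + FLR", data=data).fit()
print(model.params)        # intercept and the partial regression coefficients
print(model.rsquared)      # multiple coefficient of determination R^2
print(model.rsquared_adj)  # adjusted R^2
```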

Impact on the Dependent Variable of a Unit Change in More than One Regressor

Before proceeding further, suppose we want to find out what would happen to the
child mortality rate if we were to increase PGNP and FLR simultaneously.
Suppose per capita GNP were to increase by a dollar and at the same time the
female literacy rate were to go up by one percentage point. What would be the
impact of this simultaneous change on the child mortality rate? To find out, all we
have to do is multiply the coefficients of PGNP and FLR by the proposed changes
and add the resulting terms. In our example this gives us:
$1 \times (-0.0056) + 1 \times (-2.2316) = -2.2372$
That is, as a result of this simultaneous change in PGNP and FLR, the number of
deaths of children under age 5 would go down by about 2.24 deaths.

$R^2$ and the Adjusted $R^2$


An important property of $R^2$ is that it is a non-decreasing function of the number of explanatory variables or regressors present in the model; as the number of regressors increases, $R^2$ almost invariably increases and never decreases. Stated differently, an additional $X$ variable will not decrease $R^2$. To see this, recall the definition of the coefficient of determination:

$R^2 = \dfrac{ESS}{TSS} = 1 - \dfrac{RSS}{TSS} = 1 - \dfrac{\sum \hat u_i^2}{\sum y_i^2}$

Now $\sum y_i^2$ is independent of the number of $X$ variables in the model because it is simply $\sum (Y_i - \bar Y)^2$. The RSS, $\sum \hat u_i^2$, however, depends on the number of regressors present in the model. Intuitively, it is clear that as the number of $X$ variables increases, $\sum \hat u_i^2$ is likely to decrease (at least it will not increase); since

$\sum \hat u_i^2 = \sum (Y_i - \hat Y_i)^2$

it follows that

$R^2 = 1 - \dfrac{\sum \hat u_i^2}{\sum y_i^2}$

will increase as the number of $X$ variables increases. In view of this, in comparing two regression models with the same dependent variable but differing numbers of $X$ variables, one should be very wary of choosing the model with the highest $R^2$.

Now we can consider an alternative coefficient of determination, which is as follows:

$\bar R^2 = 1 - \dfrac{\sum \hat u_i^2/(n-k)}{\sum y_i^2/(n-1)}$

where $k$ = the number of parameters in the model including the intercept term. In the three-variable regression, $k = 3$.
The $R^2$ thus defined is known as the adjusted $R^2$, denoted by $\bar R^2$. The term adjusted means adjusted for the degrees of freedom (df) associated with the sums of squares entering into the equation:

$\sum \hat u_i^2$ has $n-k$ df in a model involving $k$ parameters, which include the intercept term, and $\sum y_i^2$ has $n-1$ df.

For the three-variable case, we know that $\sum \hat u_i^2$ has $n-3$ df.
The equation can also be written as:

$\bar R^2 = 1 - \dfrac{\hat\sigma^2}{S_Y^2}$

where $\hat\sigma^2$ is the residual variance, an unbiased estimator of the true $\sigma^2$, and $S_Y^2$ is the sample variance of $Y$.

Now rewrite the equations:

$R^2 = 1 - \dfrac{\sum \hat u_i^2}{\sum y_i^2}$ ……(1)

$\bar R^2 = 1 - \dfrac{\sum \hat u_i^2/(n-k)}{\sum y_i^2/(n-1)}$ ……(2)

It is easy to see that $R^2$ and $\bar R^2$ are related because, substituting Eq. (1) into Eq. (2), we obtain

$\bar R^2 = 1 - (1 - R^2)\dfrac{n-1}{n-k}$
It is immediately apparent that:

(1) For $k > 1$, $\bar R^2 < R^2$, which implies that as the number of $X$ variables increases, the adjusted $R^2$ increases less than the unadjusted $R^2$; and
(2) $\bar R^2$ can be negative, although $R^2$ is necessarily nonnegative. In case $\bar R^2$ turns out to be negative in an application, its value is taken as zero.

Two special cases:
(i) If $R^2 = 1$, $\bar R^2 = R^2 = 1$.
(ii) If $R^2 = 0$, $\bar R^2 = (1-k)/(n-k)$, in which case $\bar R^2$ can be negative if $k > 1$.
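As a quick arithmetic illustration of this relationship, a small helper function (hypothetical, for illustration only) computes $\bar R^2$ from $R^2$, $n$, and $k$:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 from R^2, sample size n, and number of parameters k
    (including the intercept): R_bar^2 = 1 - (1 - R^2)*(n - 1)/(n - k)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

# Illustration with rounded values from the child mortality example
# (R^2 ~ 0.71, n = 64, k = 3): the adjusted value sits slightly below.
print(adjusted_r2(0.71, 64, 3))   # about 0.70
```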
Which $R^2$ should one use in practice?

"… it is good practice to use $\bar R^2$ rather than $R^2$ because $R^2$ tends to give an overly optimistic picture of the fit of the regression, particularly when the number of explanatory variables is not very small compared with the number of observations." (Henri Theil, Introduction to Econometrics, Prentice Hall, Englewood Cliffs.)

The "Game" of Maximizing $\bar R^2$

Sometimes researchers play the game of maximizing $\bar R^2$, that is, choosing the model that gives the highest $\bar R^2$. But this may be dangerous, for in regression analysis our objective is not to obtain a high $\bar R^2$ per se but rather to obtain dependable estimates of the true population regression coefficients and draw statistical inferences about them. In empirical analysis it is not unusual to obtain a very high $\bar R^2$ but find that some of the regression coefficients either are statistically insignificant or have signs that are contrary to a priori expectations. Therefore, the researcher should be more concerned about the logical or theoretical relevance of the explanatory variables to the dependent variable and their statistical significance. If in this process we obtain a high $\bar R^2$, well and good; on the other hand, if $\bar R^2$ is low, it does not mean the model is necessarily bad.

The Cobb–Douglas Production Function: Interpretation


The Cobb–Douglas production function, in its stochastic form, may be expressed as

$Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} e^{u_i}$

where

$Y$ = output
$X_2$ = labor input
$X_3$ = capital input
$u$ = stochastic disturbance term
$e$ = base of natural logarithm

From the equation it is clear that the relationship between output and the two inputs is nonlinear. However, if we log-transform this model, we obtain:

$\ln Y_i = \beta_0 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i$

where $\beta_0 = \ln \beta_1$.
Thus written, the model is linear in the parameters $\beta_0$, $\beta_2$, and $\beta_3$ and is therefore a linear regression model. Notice, though, it is nonlinear in the variables $Y$ and the $X$'s but linear in the logs of these variables. In short, this is a log-log, double-log, or log-linear model.

The properties of the Cobb–Douglas production function are quite well known:

1. $\beta_2$ is the (partial) elasticity of output with respect to the labor input, that is, it
measures the percentage change in output for, say, a 1 percent change in the labor
input, holding the capital input constant.
2. Likewise, $\beta_3$ is the (partial) elasticity of output with respect to the capital input,
holding the labor input constant.
3. The sum $(\beta_2 + \beta_3)$ gives information about the returns to scale, that is, the
response of output to a proportionate change in the inputs. If this sum is 1, then
there are constant returns to scale, that is, doubling the inputs will double the
output, tripling the inputs will triple the output, and so on. If the sum is less than 1,
there are decreasing returns to scale— doubling the inputs will less than double the
output. Finally, if the sum is greater than 1, there are increasing returns to scale—
doubling the inputs will more than double the output.
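A minimal sketch of how such a double-log model might be estimated by OLS is shown below; the data file and column names (output, labor, capital) are hypothetical, as the notes do not supply a production data set.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data with columns: output, labor, capital.
df = pd.read_csv("production.csv")

# Double-log form: ln(output) = b0 + b2*ln(labor) + b3*ln(capital) + u
fit = smf.ols("np.log(output) ~ np.log(labor) + np.log(capital)", data=df).fit()

b2 = fit.params["np.log(labor)"]    # output elasticity with respect to labor
b3 = fit.params["np.log(capital)"]  # output elasticity with respect to capital
print("returns to scale (b2 + b3):", b2 + b3)  # =1 constant, <1 decreasing, >1 increasing
```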
Multiple Regression Analysis: The Problem of Inference
This chapter extends the ideas of interval estimation and hypothesis testing
involving three or more variables.
The Normality Assumption Once Again
As per the previous discussion, if our objective is estimation as well as inference, then we need to assume that the $u_i$ follow the normal distribution with zero mean and constant variance $\sigma^2$.
With the normality assumption we find that the OLS estimators of the partial regression coefficients are best linear unbiased estimators (BLUE). Moreover, the estimators $\hat\beta_1$, $\hat\beta_2$, and $\hat\beta_3$ are themselves normally distributed with means equal to the true $\beta_1$, $\beta_2$, and $\beta_3$ and with the variances given earlier.
And the $t$ distribution can be used to establish confidence intervals as well as test statistical hypotheses about the true population partial regression coefficients as follows:

$t = \dfrac{\hat\beta_2 - \beta_2}{\operatorname{se}(\hat\beta_2)}$ with $n-3$ df.

Note that the df are now $n-3$ because in computing $\sum \hat u_i^2$ and hence $\hat\sigma^2$ we first need to estimate the three partial regression coefficients, which therefore put three restrictions on the residual sum of squares (RSS). Following this logic, in the four-variable case there will be $n-4$ df, and so on.
Hypothesis Testing about Individual Regression Coefficients

We can use the $t$ test to test a hypothesis about any individual partial regression coefficient. To illustrate the mechanics, consider the following child mortality regression:

$\widehat{CM}_i = 263.6416 - 0.0056\, PGNP_i - 2.2316\, FLR_i$

se = (11.5932) (0.0020) (0.2099)

$R^2 = 0.7077, \quad \bar R^2 = 0.6981$
Let us postulate that

$H_0: \beta_2 = 0$ and $H_1: \beta_2 \neq 0$

The null hypothesis states that, with $X_3$ (female literacy rate) held constant, $X_2$ (PGNP) has no (linear) influence on $Y$ (child mortality). To test the null hypothesis, we use the $t$ test, and if the computed $t$ value exceeds the critical $t$ value at the chosen level of significance, we may reject the null hypothesis; otherwise, we may not reject it. We obtain:

$t = \dfrac{-0.0056}{0.0020} \approx -2.8187$

Notice that we have 64 observations. Therefore, the degrees of freedom in this example are 61. If you refer to the $t$ table given in the Appendix, we do not have entries corresponding to 61 df. The closest we have are for 60 df. If we use these df, and assume $\alpha = 0.05$, a level of significance (i.e., the probability of committing a Type I error) of 5 percent, the critical $t$ value is 2.0 for a two-tail test.

Since the computed $t$ value of 2.8187 (in absolute terms) exceeds the critical $t$ value of 2, we can reject the null hypothesis that PGNP has no effect on child mortality. To put it more positively, with the female literacy rate held constant, per capita GNP has a significant (negative) effect on child mortality, as one would expect a priori. Graphically, the situation is as shown in the figure.
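The same comparison can be carried out mechanically; the sketch below uses scipy and treats the coefficient and standard error reported above as given inputs.

```python
from scipy import stats

b2, se_b2 = -0.0056, 0.0020   # reported estimate and standard error for PGNP
n, k = 64, 3
df = n - k                    # 61 degrees of freedom

t_stat = b2 / se_b2
t_crit = stats.t.ppf(1 - 0.05 / 2, df)   # two-tail 5% critical value, roughly 2.0

print(t_stat, t_crit, abs(t_stat) > t_crit)   # True: reject H0 that beta2 = 0
```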

Now check the postulate

$H_0: \beta_3 = 0$ and $H_1: \beta_3 \neq 0$

Testing the Overall Significance of the Sample Regression

Throughout the previous section we were concerned with testing the significance
of the estimated partial regression coefficients individually, that is, under the
separate hypothesis that each true population partial regression coefficient was
zero. But now consider the following hypothesis:

$H_0: \beta_2 = \beta_3 = 0$

This null hypothesis is a joint hypothesis that $\beta_2$ and $\beta_3$ are jointly or simultaneously equal to zero. A test of such a hypothesis is called a test of the overall significance of the observed or estimated regression line, that is, whether $Y$ is linearly related to both $X_2$ and $X_3$.

This joint hypothesis can be tested by the analysis of variance (ANOVA) technique, which can be demonstrated as follows.

Under the assumption of a normal distribution for $u_i$ and the null hypothesis $\beta_2 = \beta_3 = 0$, the variable

$F = \dfrac{ESS/2}{RSS/(n-3)}$

is distributed as the F distribution with 2 and $n-3$ df.

TSS has, as usual, $n-1$ df and RSS has $n-3$ df for reasons already discussed. ESS has 2 df since it is a function of $\hat\beta_2$ and $\hat\beta_3$. Therefore, following the ANOVA procedure discussed in the table: if the F value computed from the equation exceeds the critical F value from the F table at the chosen level of significance, we reject $H_0$; otherwise we do not reject it. Alternatively, if the p value of the observed F is sufficiently low, we can reject $H_0$.
Turning to our illustrative example, we obtain the ANOVA table, as shown in the table.

From the F ratio we have $F = 73.8325$.

If you were to use the conventional 5 percent level-of-significance value, the critical F value for 2 df in the numerator and 60 df in the denominator (the actual df, however, are 61) is about 3.15, or about 4.98 if you were to use the 1 percent level of significance, leading to the rejection of the hypothesis that together PGNP and FLR have no effect on child mortality.
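If an F table is not at hand, the critical values can also be obtained numerically; the sketch below is a simple illustration using scipy.

```python
from scipy import stats

df_num, df_den = 2, 61        # ESS df and RSS df for the three-variable model with n = 64
f_crit_5 = stats.f.ppf(0.95, df_num, df_den)   # 5% critical value, roughly 3.1
f_crit_1 = stats.f.ppf(0.99, df_num, df_den)   # 1% critical value, roughly 5.0

print(f_crit_5, f_crit_1)     # the observed F reported above far exceeds both
```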
An Important Relationship between $R^2$ and $F$

There is an intimate relationship between the coefficient of determination $R^2$ and the $F$ test used in the analysis of variance. Assuming a normal distribution for the disturbances $u_i$ and the null hypothesis that $\beta_2 = \beta_3 = 0$, we have seen that

$F = \dfrac{ESS/2}{RSS/(n-3)}$

is distributed as the F distribution with 2 and $n-3$ df.
More generally, in the $k$-variable case (including the intercept), if we assume that the disturbances are normally distributed and that the null hypothesis is

$H_0: \beta_2 = \beta_3 = \cdots = \beta_k = 0$

then it follows that

$F = \dfrac{ESS/(k-1)}{RSS/(n-k)}$ ………(1)

follows the F distribution with $k-1$ and $n-k$ df. (Note: The total number of parameters to be estimated is $k$, of which one is the intercept term.)

Let us manipulate Equation (1) as follows:

$F = \dfrac{R^2/(k-1)}{(1-R^2)/(n-k)}$ …(2)

The above equation shows how $F$ and $R^2$ are related. These two vary directly. When $R^2 = 0$, $F$ is zero ipso facto. The larger the $R^2$, the greater the $F$ value. In the limit, when $R^2 = 1$, $F$ is infinite. Thus the $F$ test, which is a measure of the overall significance of the estimated regression, is also a test of significance of $R^2$. In other words, testing the null hypothesis in Eq. (2) is equivalent to testing the null hypothesis that (the population) $R^2$ is zero.

For the three-variable case, Eq. (2) becomes

$F = \dfrac{R^2/2}{(1-R^2)/(n-3)}$
By virtue of the close connection between $F$ and $R^2$, the ANOVA table can be recast in terms of $R^2$, with ESS $= R^2 \sum y_i^2$ and RSS $= (1-R^2)\sum y_i^2$, which gives about the same $F$ as obtained before, except for rounding errors.
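As a numerical illustration of this last expression, the F statistic can be recovered directly from $R^2$; the helper below is hypothetical and uses rounded values from the child mortality example.

```python
def f_from_r2(r2: float, n: int, k: int) -> float:
    """Overall F statistic implied by R^2 for a model with k parameters
    (including the intercept) and n observations:
    F = [R^2/(k-1)] / [(1 - R^2)/(n - k)]."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))

# Child mortality example with rounded inputs: R^2 ~ 0.71, n = 64, k = 3.
print(f_from_r2(0.71, 64, 3))   # about 75, close to the ANOVA F reported earlier
```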
