
7. Multiple Regression Analysis: The Problem of Estimation

Multivariate Regression
• The two-variable model is basic to understanding how OLS estimation, hypothesis testing, etc. work
• But it is inadequate in practice
• Economic theory and real life are seldom so simple
• Hence we need models in which the dependent variable, or regressand, $Y$ depends on two or more explanatory variables, or regressors
• e.g. income vs. years of education: an individual's income is also explained by years of experience, geographic location, family income, etc.


7.1 THE THREE-VARIABLE MODEL: NOTATION AND ASSUMPTIONS


• The two-variable population regression function (PRF) can be modified to obtain the three-variable PRF:
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$
where $Y$ is the dependent variable,
$X_2$ and $X_3$ are the explanatory variables (or regressors),
$u$ is the stochastic disturbance term, and $i$ denotes the $i$th observation;
$\beta_1$ is the intercept term: the average value of $Y$ when $X_2$ and $X_3$ are set to zero;
$\beta_2$ and $\beta_3$ are called the partial regression coefficients


NOTATION AND ASSUMPTIONS


Assumptions:
• Continue to operate under the classical linear regression model (CLRM) framework. Specifically:
• Zero mean value of $u_i$:
$$E(u_i \mid X_{2i}, X_{3i}) = 0 \quad \text{for each } i$$
• No serial correlation:
$$\operatorname{cov}(u_i, u_j) = 0, \quad i \neq j$$
• Homoscedasticity:
$$\operatorname{var}(u_i) = \sigma^2$$

NOTATION AND ASSUMPTIONS


• Zero covariance between $u_i$ and each $X$ variable:
$$\operatorname{cov}(u_i, X_{2i}) = \operatorname{cov}(u_i, X_{3i}) = 0$$
• No specification bias, i.e., the model is correctly specified
• No exact collinearity between the $X$ variables, i.e., no exact linear relationship between $X_2$ and $X_3$ (aka no multicollinearity)
• No collinearity means none of the regressors can be written as an exact linear combination of the remaining regressors in the model
• Formally: there is no set of numbers $\lambda_2$ and $\lambda_3$, not both zero, such that
$$\lambda_2 X_{2i} + \lambda_3 X_{3i} = 0$$

NOTATION AND ASSUMPTIONS


• The multiple regression model is linear in the parameters
• The values of the regressors $X$ are fixed in repeated sampling, and there is sufficient variability in the values of the regressors

7.2 INTERPRETATION OF MULTIPLE REGRESSION EQUATION


INTERPRETATION OF MULTIPLE REGRESSION EQUATION


• Given the assumptions of the classical regression model mentioned above, taking expectations on both sides:
$$E(Y_i \mid X_{2i}, X_{3i}) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i}$$
• This is the conditional mean or expected value of $Y$, conditional upon the given or fixed values of $X_2$ and $X_3$
• As in the two-variable case, multiple regression analysis is regression analysis conditional upon the fixed values of the regressors
• We obtain the average/mean value of $Y$, or the mean response of $Y$, for the given values of the regressors

7.3 MEANING OF PARTIAL REGRESSION COEFFICIENTS


Meaning Of Partial Regression Coefficients

• $\beta_2$ and $\beta_3$ are partial regression or partial slope coefficients
• $\beta_2$ measures the change in the mean value of $Y$, $E(Y)$, per unit change in $X_2$, holding the value of $X_3$ constant
• It gives the "direct" or "net" effect of a unit change in $X_2$ on the mean value of $Y$ (net of any effect that $X_3$ may have on mean $Y$)
• Similarly, $\beta_3$ measures the change in the mean value of $Y$ per unit change in $X_3$, holding the value of $X_2$ constant
• But what does it mean to hold the influence of a regressor constant?

Meaning Of Partial Regression Coefficients

• Example: consider $Y$ = child mortality (CM), $X_2$ = per capita GNP (PGNP), and $X_3$ = female literacy rate (FLR)
• Suppose we want to hold the influence of FLR constant
• FLR may affect CM as well as PGNP
• To remove the (linear) influence of FLR from both CM and PGNP, we run the regression of CM on FLR and that of PGNP on FLR separately


Meaning Of Partial Regression Coefficients

• We consider the residuals obtained from these regressions: $\hat{u}_{1i}$ from the regression of CM on FLR, and $\hat{u}_{2i}$ from the regression of PGNP on FLR
• To obtain the net effect of PGNP on CM, regress $\hat{u}_{1i}$ on $\hat{u}_{2i}$; these residuals no longer carry the (linear) influence of FLR

Meaning Of Partial Regression Coefficients

• Regressing $\hat{u}_{1i}$ on $\hat{u}_{2i}$ yields a slope coefficient of $-0.0056$
• This regression has no intercept term (both residual series have zero mean)
• The slope coefficient of $-0.0056$ gives the net effect of a unit change in PGNP on CM
• This is the "true" effect of a unit change in PGNP on CM, i.e., the true slope of CM with respect to PGNP, $\beta_2$, in
$$CM = \beta_1 + \beta_2\, PGNP + \beta_3\, FLR$$
• $\beta_2$ is the partial regression coefficient of CM with respect to PGNP


Meaning Of Partial Regression Coefficients

• In practice we do not have to follow this two-step procedure to find the true partial regression coefficients ($\beta_2$, $\beta_3$) every time: the multiple regression of $Y$ on $X_2$ and $X_3$ yields them directly, as the sketch below illustrates
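A minimal numerical sketch of this equivalence, using hypothetical simulated data and plain numpy (the variable names, seeds, and parameter values are illustrative, not the textbook dataset):

```python
import numpy as np

# Hypothetical simulated data: Y depends on X2 and X3, and X2 is correlated with X3
# (think CM, PGNP, and FLR from the example above).
rng = np.random.default_rng(0)
n = 200
x3 = rng.normal(size=n)                              # "FLR"
x2 = 0.5 * x3 + rng.normal(size=n)                   # "PGNP", correlated with "FLR"
y = 1.0 - 0.8 * x2 + 0.3 * x3 + rng.normal(size=n)   # "CM"; true beta2 = -0.8

def ols(X, y):
    """OLS coefficients of y on the columns of X, with an intercept prepended."""
    Z = np.column_stack([np.ones(len(y)), X])
    return Z, np.linalg.lstsq(Z, y, rcond=None)[0]

# Step 1: purge the linear influence of x3 from both y and x2.
Z, c1 = ols(x3, y);  u1 = y - Z @ c1     # residuals of y on x3
Z, c2 = ols(x3, x2); u2 = x2 - Z @ c2    # residuals of x2 on x3

# Step 2: regress u1 on u2 (no intercept: both residual series have mean zero).
b2_partial = np.sum(u1 * u2) / np.sum(u2 * u2)

# The multiple regression of y on x2 and x3 gives the same partial coefficient.
_, c = ols(np.column_stack([x2, x3]), y)
print(b2_partial, c[1])   # both ≈ -0.8
```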

7.4 OLS ESTIMATION OF THE PARTIAL REGRESSION COEFFICIENTS ($\beta_2$, $\beta_3$)


OLS ESTIMATION OF $\beta_1$, $\beta_2$, and $\beta_3$

• Start with the SRF:
$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i} + \hat{u}_i$$
We want to minimize the sum of squared residuals (aka the residual sum of squares, RSS):
$$\min \sum \hat{u}_i^2 = \sum \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i} - \hat{\beta}_3 X_{3i}\right)^2$$
Setting the partial derivatives with respect to $\hat{\beta}_1$, $\hat{\beta}_2$, $\hat{\beta}_3$ to zero, we get the normal equations (the first-order conditions, FOCs):
$$\sum Y_i = n\hat{\beta}_1 + \hat{\beta}_2 \sum X_{2i} + \hat{\beta}_3 \sum X_{3i}$$
$$\sum Y_i X_{2i} = \hat{\beta}_1 \sum X_{2i} + \hat{\beta}_2 \sum X_{2i}^2 + \hat{\beta}_3 \sum X_{2i} X_{3i}$$
$$\sum Y_i X_{3i} = \hat{\beta}_1 \sum X_{3i} + \hat{\beta}_2 \sum X_{2i} X_{3i} + \hat{\beta}_3 \sum X_{3i}^2$$

OLS ESTIMATION OF $\beta_1$, $\beta_2$, and $\beta_3$

• Simplifying the above FOCs, we obtain the standard three-variable OLS estimators:
$$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3$$
$$\hat{\beta}_2 = \frac{\left(\sum y_i x_{2i}\right)\left(\sum x_{3i}^2\right) - \left(\sum y_i x_{3i}\right)\left(\sum x_{2i} x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2}$$
$$\hat{\beta}_3 = \frac{\left(\sum y_i x_{3i}\right)\left(\sum x_{2i}^2\right) - \left(\sum y_i x_{2i}\right)\left(\sum x_{2i} x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2}$$
• where $y_i = Y_i - \bar{Y}$, $x_{2i} = X_{2i} - \bar{X}_2$, and $x_{3i} = X_{3i} - \bar{X}_3$ (deviations from sample means)
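A short sketch of these deviation-form formulas in code, on hypothetical simulated data (numpy only; the true parameter values are chosen arbitrarily for illustration):

```python
import numpy as np

# Hypothetical data with known parameters: Y = 2 + 1.5*X2 - 0.7*X3 + u.
rng = np.random.default_rng(1)
X2, X3 = rng.normal(size=100), rng.normal(size=100)
Y = 2.0 + 1.5 * X2 - 0.7 * X3 + rng.normal(size=100)

# Deviations from sample means: y_i, x2_i, x3_i.
y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()

# The deviation-form OLS estimators from the slide above.
den = np.sum(x2**2) * np.sum(x3**2) - np.sum(x2 * x3) ** 2
b2 = (np.sum(y * x2) * np.sum(x3**2) - np.sum(y * x3) * np.sum(x2 * x3)) / den
b3 = (np.sum(y * x3) * np.sum(x2**2) - np.sum(y * x2) * np.sum(x2 * x3)) / den
b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()
print(b1, b2, b3)   # ≈ 2.0, 1.5, -0.7
```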


OLS ESTIMATION OF $\beta_1$, $\beta_2$, and $\beta_3$

• We also need the standard errors of the estimators, for two purposes:
1. to establish confidence intervals, and
2. to test statistical hypotheses

OLS ESTIMATION OF $\beta_1$, $\beta_2$, and $\beta_3$

• Under the CLRM assumptions, the variances of the slope estimators are
$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2 \sum x_{3i}^2}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2}, \qquad \operatorname{var}(\hat{\beta}_3) = \frac{\sigma^2 \sum x_{2i}^2}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2}$$
• with $\sigma^2$ estimated by $\hat{\sigma}^2 = \sum \hat{u}_i^2 / (n - 3)$; the standard errors are the square roots of these variances


Properties of OLS Estimators

• Similar to those of the two-variable model:
1. The three-variable regression line (surface) passes through the means $(\bar{Y}, \bar{X}_2, \bar{X}_3)$
2. The mean value of the estimated $Y_i$ $(= \hat{Y}_i)$ is equal to the mean value of the actual $Y_i$:
$$\bar{\hat{Y}} = \bar{Y}$$
3. $\sum \hat{u}_i = \bar{\hat{u}} = 0$

Properties of OLS Estimators

4. The residuals $\hat{u}_i$ are uncorrelated with $X_{2i}$ and $X_{3i}$, that is, $\sum \hat{u}_i X_{2i} = \sum \hat{u}_i X_{3i} = 0$
5. The residuals $\hat{u}_i$ are uncorrelated with the fitted values $\hat{Y}_i$, that is, $\sum \hat{u}_i \hat{Y}_i = 0$
6. The OLS estimators of the partial regression coefficients are linear and unbiased, and have minimum variance in the class of all linear unbiased estimators
• They are BLUE (best linear unbiased estimators)
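These properties are easy to verify numerically; a sketch on hypothetical simulated data, using numpy's least-squares solver as a stand-in for the formulas above:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X2, X3 = rng.normal(size=n), rng.normal(size=n)
Y = 1.0 + 0.5 * X2 + 2.0 * X3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), X2, X3])
b = np.linalg.lstsq(X, Y, rcond=None)[0]
Yhat = X @ b
u = Y - Yhat

print(np.isclose(Yhat.mean(), Y.mean()))                 # property 2: mean of fitted = mean of actual
print(np.isclose(u.sum(), 0.0))                          # property 3: residuals sum to zero
print(np.isclose(u @ X2, 0.0), np.isclose(u @ X3, 0.0))  # property 4: residuals orthogonal to regressors
print(np.isclose(u @ Yhat, 0.0))                         # property 5: residuals orthogonal to fitted values
```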


7.5 THE MULTIPLE COEFFICIENT OF DETERMINATION $R^2$

Multiple Coefficient of Determination $R^2$

• In the two-variable case, $r^2$ measures the 'goodness of fit' of the regression equation
• It gives the proportion or percentage of the total variation in the dependent variable $Y$ explained by the (single) explanatory variable $X$
• In the three-variable model, $R^2$ gives the proportion of the variation in $Y$ explained by the variables $X_2$ and $X_3$ jointly
• $R^2$ is the multiple coefficient of determination


Multiple Coefficient of Determination $R^2$

The SRF:
$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i} + \hat{u}_i = \hat{Y}_i + \hat{u}_i$$
In deviation form:
$$y_i = \hat{y}_i + \hat{u}_i$$
Squaring both sides and summing over the sample values:
$$\sum y_i^2 = \sum \hat{y}_i^2 + \sum \hat{u}_i^2 + 2\sum \hat{y}_i \hat{u}_i$$
Since the $\hat{u}_i$ are uncorrelated with $\hat{Y}_i$, $\sum \hat{y}_i \hat{u}_i = 0$, so
$$\sum y_i^2 = \sum \hat{y}_i^2 + \sum \hat{u}_i^2$$
$$TSS = ESS + RSS$$

Multiple Coefficient of Determination $R^2$

Since
$$\sum \hat{u}_i^2 = \sum y_i^2 - \hat{\beta}_2 \sum y_i x_{2i} - \hat{\beta}_3 \sum y_i x_{3i}$$
plugging this into the decomposition above gives
$$\sum y_i^2 = \sum \hat{y}_i^2 + \sum y_i^2 - \hat{\beta}_2 \sum y_i x_{2i} - \hat{\beta}_3 \sum y_i x_{3i}$$
so
$$ESS = \sum \hat{y}_i^2 = \hat{\beta}_2 \sum y_i x_{2i} + \hat{\beta}_3 \sum y_i x_{3i}$$
By definition,
$$R^2 = \frac{ESS}{TSS} = \frac{\hat{\beta}_2 \sum y_i x_{2i} + \hat{\beta}_3 \sum y_i x_{3i}}{\sum y_i^2}, \qquad 0 \le R^2 \le 1$$
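Continuing the estimation sketch from Section 7.4 above (same hypothetical y, x2, x3, b2, b3), $R^2$ can be computed directly from this decomposition:

```python
# Continuing the estimation sketch above (same y, x2, x3, b2, b3).
ESS = b2 * np.sum(y * x2) + b3 * np.sum(y * x3)
TSS = np.sum(y**2)
R2 = ESS / TSS
print(R2)   # between 0 and 1 by construction
```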


Multiple Coefficient of Determination $R^2$

• An important property of $R^2$: it is a non-decreasing function of the number of explanatory variables or regressors $X$ present in the model
• As the number of regressors increases, $R^2$ almost invariably increases and never decreases:
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum \hat{u}_i^2}{\sum y_i^2}$$
• $\sum y_i^2$ is independent of the number of $X$ variables in the model
• $\sum \hat{u}_i^2$ depends on the number of regressors present in the model
• As the number of $X$ variables increases, $\sum \hat{u}_i^2$ is likely to decrease (at least it will not increase)
• Hence $R^2$ will increase

Multiple Coefficient of Determination $R^2$

• When comparing two regression models with the same dependent variable but differing numbers of $X$ variables, one should be very wary of choosing the model with the highest $R^2$
• To compare two $R^2$ terms, adjust for the number of parameters estimated:
$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k}$$
where $n$ is the sample size and $k$ is the number of parameters (including the intercept)
• $\bar{R}^2$ is the adjusted $R^2$
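A one-line continuation of the same sketch, computing the adjusted $R^2$ (here $k = 3$: the intercept plus two slopes):

```python
# Continuing the sketch: adjusted R-squared.
n, k = len(y), 3                               # k includes the intercept
R2_adj = 1 - (1 - R2) * (n - 1) / (n - k)
print(R2_adj)   # slightly below R2, since k > 1
```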


Multiple Coefficient of Determination $R^2$

• When comparing two models on the basis of the coefficient of determination, whether adjusted or not:
• the sample size $n$ and the dependent variable must be the same
• the explanatory variables may take any form

7.9 FUNCTIONAL FORMS IN MULTIPLE REGRESSION


The Cobb–Douglas Production Function

$$Y_i = \beta_1 X_{2i}^{\beta_2} X_{3i}^{\beta_3} e^{u_i}$$
where $Y$ = output,
$X_2$ = labor input,
$X_3$ = capital input,
$u$ = stochastic disturbance term, and
$e$ = base of the natural logarithm.
Taking natural logs on both sides:
$$\ln Y_i = \ln \beta_1 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i$$
$$\ln Y_i = \beta_0 + \beta_2 \ln X_{2i} + \beta_3 \ln X_{3i} + u_i$$
where $\beta_0 = \ln \beta_1$; the model is linear in the parameters and can be estimated by OLS

Functional Forms: C–D Function

• $\beta_2$ is the (partial) elasticity of output $Y$ with respect to the labor input $X_2$, holding the capital input constant
• $\beta_3$ is the (partial) elasticity of output $Y$ with respect to the capital input $X_3$, holding the labor input constant
• The sum $\beta_2 + \beta_3$ gives information about the returns to scale, i.e., the response of output to a proportionate change in the inputs (a numerical sketch follows this list):
• If this sum is 1, there are constant returns to scale
• If less than 1, there are decreasing returns to scale
• If greater than 1, there are increasing returns to scale
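A sketch of estimating these elasticities by OLS on the log-log form, with hypothetical simulated data (not the textbook dataset; the true elasticities are set to 0.7 and 0.4 for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 120
labor = np.exp(rng.normal(size=n))
capital = np.exp(rng.normal(size=n))
# Cobb-Douglas with beta2 = 0.7, beta3 = 0.4 and a multiplicative error term.
output = 2.0 * labor**0.7 * capital**0.4 * np.exp(0.1 * rng.normal(size=n))

# OLS on the log-log form: ln Y = b0 + b2 ln X2 + b3 ln X3 + u.
X = np.column_stack([np.ones(n), np.log(labor), np.log(capital)])
b0, b2, b3 = np.linalg.lstsq(X, np.log(output), rcond=None)[0]
print(b2, b3)    # estimated elasticities, ≈ 0.7 and 0.4
print(b2 + b3)   # ≈ 1.1 > 1: increasing returns to scale in this simulated economy
```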


Functional Forms: C–D Function

(Slide shows the estimated log-linear regression of output on labor and capital for the textbook example; the key estimates, $\hat{\beta}_2 = 1.4988$, $\hat{\beta}_3 = 0.4899$, and $R^2 = 0.8890$, are discussed below.)

Functional Forms: C–D Function

• $X_2$: labor days; $\hat{\beta}_2 = 1.4988$: holding the capital input constant, a 1% increase in the labor input led on average to a 1.5% increase in output
• $X_3$: real capital input; $\hat{\beta}_3 = 0.4899$: holding the labor input constant, a 1% increase in the capital input led on average to a 0.5% increase in output


Functional Forms: C–D Function

• $\hat{\beta}_2 + \hat{\beta}_3 = 1.9887 > 1$ ⇒ increasing returns to scale
• The $R^2$ value of 0.8890 means that ≈ 89% of the variation in the (log of) output is explained by the (logs of) labor and capital

7.10 POLYNOMIAL REGRESSION MODELS


Polynomial Regression Model

• Example: the U-shaped curve relating the short-run marginal cost (MC) of production ($Y$) of a commodity to the level of its output ($X$)
• Geometrically, the MC curve is a parabola
• Mathematically, the equation for a parabola:
$$Y = \beta_0 + \beta_1 X + \beta_2 X^2$$

Polynomial Regression Model

$$Y = \beta_0 + \beta_1 X + \beta_2 X^2$$
• The above expression is a quadratic function or, more generally, a second-degree polynomial in the variable $X$ (the highest power of $X$ represents the degree of the polynomial)
• Adding the error term gives the stochastic version:
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i$$
• NOTE: In polynomial regressions there is only one explanatory variable on the right-hand side, but it appears with various powers, making them multiple regression models (see the sketch after this slide)
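A sketch of fitting this quadratic by ordinary OLS, with hypothetical marginal-cost data (the coefficient values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.linspace(1.0, 10.0, 60)                       # output level
Y = 50 - 8 * X + 0.9 * X**2 + rng.normal(size=60)    # U-shaped marginal cost + noise

# One explanatory variable entered with two powers: still linear in the
# parameters, so it is estimated like any other multiple regression.
D = np.column_stack([np.ones_like(X), X, X**2])
b0, b1, b2 = np.linalg.lstsq(D, Y, rcond=None)[0]
print(b0, b1, b2)        # ≈ 50, -8, 0.9
print(-b1 / (2 * b2))    # output level at which MC is minimized, ≈ 4.4
```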


Polynomial Regression Model

• Since the second-degree (or, generally, the $k$th-degree) polynomial is linear in the parameters, the $\beta$'s can be estimated by the usual OLS methodology
• But what about the collinearity problem? Aren't the regressors highly correlated, since they are all powers of $X$?
• They are correlated, but terms like $X^2$, $X^3$, $X^4$, etc. are all nonlinear functions of $X$ and hence do not violate the no-multicollinearity assumption, which rules out only exact linear relationships
• ∴ Polynomial regression models can be estimated by the techniques presented in this chapter and present no new estimation problems
