
Lecture 7

Multiple Regression Analysis:


The estimation

Obid Khakimov

Westminster International University in Tashkent

Econometrics, 5ECON012C WIUT


Three-variable model and assumptions
The simplest possible multiple regression model is the three-variable
regression, with one dependent variable and two explanatory variables.

Yi = β1 + β2 X2i + β3 X3i + ui

The coefficients β2 and β3 are called the partial regression
coefficients. Within the framework of the CLRM, we assume:
1. Zero mean value of ui : E(ui | X2i , X3i ) = 0
2. No serial correlation: cov(ui , uj ) = 0, i ≠ j
3. Homoscedasticity: var(ui ) = σ²
4. Zero covariance between ui and each X: cov(ui , X2i ) = cov(ui , X3i ) = 0
5. No specification bias
6. Correct model specification
7. No perfect collinearity between the regressors
Perfect Collinearity

Informally, no collinearity means that none of the regressors can be written
as an exact linear combination of the remaining regressors in the model.
Formally, no collinearity means that there exists no set of numbers λ2 and
λ3 , not both zero, such that

λ2 X2i + λ3 X3i = 0

If such numbers do exist, X2i and X3i are perfectly collinear, and there is
no way to assess their separate influences on Yi .
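This failure can be seen mechanically: under perfect collinearity the cross-product matrix X′X is singular, so the OLS normal equations have no unique solution. A minimal sketch with made-up numbers (not from the lecture):

```python
import numpy as np

# Made-up regressors with an exact linear dependence: x3 = 2 * x2
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x3 = 2 * x2                                   # lambda2 = 2, lambda3 = -1
X = np.column_stack([np.ones(5), x2, x3])

# X'X is singular, so the normal equations X'X b = X'y
# cannot be solved for a unique b
print(np.linalg.matrix_rank(X))               # 2, not 3
print(abs(np.linalg.det(X.T @ X)))            # effectively 0
```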



The meaning of partial regression coefficients

β2 measures the direct, or net, effect of a unit change in X2 on the
mean value of Y , holding X3 constant.

Likewise, β3 measures the direct, or net, effect of a unit change in X3
on the mean value of Y , holding X2 constant.



The meaning of partial regression coefficients

Example:
How do we actually go about holding the influence of a regressor
constant? Recall the child-mortality example: Y = child mortality (CM),
X2 = per capita GNP (PGNP), and X3 = female literacy rate (FLR).
Suppose we want to hold the influence of FLR constant, i.e.,
to get the partial regression coefficient of CM with respect to PGNP.
Since FLR may affect CM as well as PGNP, we can remove the (linear)
influence of FLR from both CM and PGNP by running the regression of
CM on FLR and that of PGNP on FLR separately, and then looking at the
residuals obtained from these regressions.



The meaning of partial regression coefficients

CMi = 263.86 − 2.39 FLRi + û1i

PGNPi = −39.30 + 28.14 FLRi + û2i

If we now regress û1i on û2i , both "purified" of the (linear)
influence of FLR, do we obtain the net effect of PGNP on CM? That is
indeed the case. The regression result is:

û1i = −0.0056 û2i + êi
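This two-step residual procedure (the Frisch–Waugh–Lovell idea) can be verified numerically. The sketch below uses simulated stand-ins for CM, PGNP, and FLR, hypothetical data rather than the textbook sample, and checks that the residual-on-residual slope equals the multiple-regression coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Simulated stand-ins (hypothetical data, not the textbook sample)
flr = rng.normal(50, 10, n)                                 # X3: female literacy rate
pgnp = 5 * flr + rng.normal(0, 20, n)                       # X2: per capita GNP, correlated with FLR
cm = 100 - 0.05 * pgnp - 2.0 * flr + rng.normal(0, 5, n)    # Y: child mortality

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Full multiple regression: CM on a constant, PGNP, and FLR
beta = ols(np.column_stack([np.ones(n), pgnp, flr]), cm)

# Step 1: purge the linear influence of FLR from both CM and PGNP
Z = np.column_stack([np.ones(n), flr])
u1 = cm - Z @ ols(Z, cm)       # residuals of CM on FLR
u2 = pgnp - Z @ ols(Z, pgnp)   # residuals of PGNP on FLR

# Step 2: regress u1 on u2 (no intercept: both residual series have mean ~0)
slope = (u1 @ u2) / (u2 @ u2)
print(beta[1], slope)          # the two estimates of the partial coefficient coincide
```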



Estimation of multiple regression: OLS

Let us say that the PRF is

Yi = β1 + β2 X2i + β3 X3i + ui

Now we apply the OLS procedure to sample data, choosing β̂1 , β̂2 , β̂3 to minimize

Σ ûi² = Σ (Yi − β̂1 − β̂2 X2i − β̂3 X3i)²



OLS estimates

β̂1 = Ȳ − β̂2 X̄2 − β̂3 X̄3

β̂2 = [(Σ yi x2i)(Σ x3i²) − (Σ yi x3i)(Σ x2i x3i)] / [(Σ x2i²)(Σ x3i²) − (Σ x2i x3i)²]

β̂3 = [(Σ yi x3i)(Σ x2i²) − (Σ yi x2i)(Σ x2i x3i)] / [(Σ x2i²)(Σ x3i²) − (Σ x2i x3i)²]

where lowercase letters denote deviations from sample means:
yi = Yi − Ȳ, x2i = X2i − X̄2 , x3i = X3i − X̄3 .

The variances of the estimators are

var(β̂1) = [1/n + (X̄2² Σ x3i² + X̄3² Σ x2i² − 2 X̄2 X̄3 Σ x2i x3i) / (Σ x2i² Σ x3i² − (Σ x2i x3i)²)] · σ²

var(β̂2) = σ² / [Σ x2i² (1 − r23²)]

var(β̂3) = σ² / [Σ x3i² (1 − r23²)]

where r23 is the sample correlation coefficient between X2 and X3 , and σ² is estimated by

σ̂² = Σ ûi² / (n − 3)
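The deviation-form formulas for β̂1 , β̂2 , β̂3 can be checked against the matrix least-squares solution. A sketch on simulated data (all variable names and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X2, X3 = rng.normal(size=n), rng.normal(size=n)
Y = 1.0 + 2.0 * X2 - 0.5 * X3 + rng.normal(scale=0.3, size=n)

# Deviations from sample means (the lowercase symbols in the formulas)
y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()

D = (x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2               # common denominator
b2 = ((y @ x2) * (x3 @ x3) - (y @ x3) * (x2 @ x3)) / D
b3 = ((y @ x3) * (x2 @ x2) - (y @ x2) * (x2 @ x3)) / D
b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()

# Cross-check against the matrix least-squares solution
X = np.column_stack([np.ones(n), X2, X3])
beta = np.linalg.lstsq(X, Y, rcond=None)[0]
print(np.allclose([b1, b2, b3], beta))  # True
```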



R squared
Recall that

Yi = β̂1 + β̂2 X2i + β̂3 X3i + ûi = Ŷi + ûi

Now, by definition,

R² = ESS / TSS = (β̂2 Σ yi x2i + β̂3 Σ yi x3i) / Σ yi²

where yi = Yi − Ȳ.
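The ESS/TSS expression above agrees with the equivalent form 1 − RSS/TSS; a sketch on simulated data (illustrative coefficients, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 80
X2, X3 = rng.normal(size=n), rng.normal(size=n)
Y = 3 + 1.5 * X2 + 0.8 * X3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), X2, X3])
b = np.linalg.lstsq(X, Y, rcond=None)[0]
resid = Y - X @ b

# Deviations from means (the lowercase variables in the slide formula)
y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()
r2_ess = (b[1] * (y @ x2) + b[2] * (y @ x3)) / (y @ y)  # ESS / TSS
r2_rss = 1 - (resid @ resid) / (y @ y)                  # 1 - RSS / TSS
print(r2_ess, r2_rss)  # the two agree
```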



Introduction to Specification Bias
Assume that CMi = β1 + β2 PGNPi + β3 FLRi + ui is the true
model explaining the behavior of child mortality in relation to per
capita GNP (PGNP) and the female literacy rate (FLR). But suppose we
disregard FLR and estimate the following simple regression:

CMi = α1 + α2 PGNPi + u1i

Estimating this model would constitute a specification error; the
error consists in omitting the variable female literacy rate. In
general, α̂2 will not be an unbiased estimator of the true β2 : its
expectation is β2 + β3 b32 , where b32 is the slope from the regression of
FLR on PGNP. So the coefficients, standard errors, and R squared will all
differ from those of the true model.
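The direction and size of the bias can be illustrated by simulation. In the sketch below (hypothetical coefficients, not the child-mortality data), the short-regression slope settles near β2 + β3 · b32 = 2 + (−1.5)(0.6) = 1.1 rather than the true β2 = 2:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000  # large sample, so the estimate sits close to its expectation
# Hypothetical coefficients: beta2 = 2, beta3 = -1.5, and x3 = 0.6*x2 + noise
x2 = rng.normal(size=n)
x3 = 0.6 * x2 + rng.normal(size=n)
y = 1 + 2 * x2 - 1.5 * x3 + rng.normal(size=n)

# "Short" regression that wrongly omits x3
X_short = np.column_stack([np.ones(n), x2])
alpha = np.linalg.lstsq(X_short, y, rcond=None)[0]
# Expectation of the short slope: beta2 + beta3*b32 = 2 + (-1.5)(0.6) = 1.1
print(alpha[1])
```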



Adjusted R squared
An important property of R squared is that it is a nondecreasing
function of the number of explanatory variables or regressors present
in the model, unless the added variable is perfectly collinear with the
other regressors; as the number of regressors increases, R squared
almost invariably increases and never decreases.

As Theil (1978) notes, "... it is good practice to use R̄² (adjusted R
squared) rather than R² because R² tends to give an overly
optimistic picture of the fit of the regression, particularly when the
number of explanatory variables is not very small compared with the
number of observations."
R̄² = 1 − (1 − R²) · (n − 1) / (n − k)

where n is the number of observations and k is the number of parameters
in the model, including the intercept.
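A small helper makes the penalty explicit: with the same raw fit, a model with more parameters gets a lower R̄² (the function and numbers below are illustrative, not from the lecture):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared; n = observations, k = parameters incl. intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Same raw fit (R^2 = 0.90), but more parameters means a bigger penalty
print(adjusted_r2(0.90, 64, 3))   # ~0.8967
print(adjusted_r2(0.90, 64, 10))  # ~0.8833
```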



Comparing Two R2 Values

It is crucial to note that in comparing two models on the basis of the
coefficient of determination, whether adjusted or not, the sample size
and the dependent variable must be the same. Thus, for the models

lnYi = β1 + β2 X2i + β3 X3i + ui


Yi = β1 + β2 X2i + β3 X3i + ui

the computed R2 terms cannot be compared.



The Game of Maximizing R̄2
A warning is in order: Sometimes researchers play the game of
maximizing R̄2 , that is, choosing the model that gives the highest
R̄2 . But this may be dangerous, for in regression analysis our
objective is not to obtain a high R̄2 per se but rather to obtain
dependable estimates of the true population regression coefficients
and draw statistical inferences about them. In empirical analysis it is
not unusual to obtain a very high R̄2 but find that some of the
regression coefficients either are statistically insignificant or have
signs that are contrary to a priori expectations. Therefore, the
researcher should be more concerned about the logical or theoretical
relevance of the explanatory variables to the dependent variable and
their statistical significance. If in this process we obtain a high R̄2 ,
well and good; on the other hand, if R̄2 is low, it does not mean the
model is necessarily bad.
The Cobb–Douglas Production Function

Yi = β1 X2i^β2 X3i^β3 e^ui

Taking natural logs gives a model that is linear in the parameters:

ln Yi = ln β1 + β2 ln X2i + β3 ln X3i + ui
      = β0 + β2 ln X2i + β3 ln X3i + ui ,   where β0 = ln β1

where
Y - output
X2 - labor input
X3 - capital input

β2 and β3 are the (partial) elasticities of output with respect to labor
and capital; their sum β2 + β3 gives information about returns to scale.
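Because the logged model is linear in the parameters, it can be estimated by OLS. A sketch on simulated data with made-up true elasticities of 0.7 (labor) and 0.3 (capital):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 120
# Hypothetical labor and capital inputs
labor = rng.uniform(10, 100, n)
capital = rng.uniform(5, 50, n)
# Made-up technology: Y = 2 * L^0.7 * K^0.3 * exp(u)
output = 2.0 * labor**0.7 * capital**0.3 * np.exp(rng.normal(0, 0.05, n))

# OLS on the log-log form recovers the elasticities
X = np.column_stack([np.ones(n), np.log(labor), np.log(capital)])
b = np.linalg.lstsq(X, np.log(output), rcond=None)[0]
print(b)  # roughly [ln 2, 0.7, 0.3]
```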



Polynomial regression models

Yi = β1 + β2 Xi + β3 Xi² + ui

Yi = β1 + β2 Xi + β3 Xi² + · · · + βk+1 Xi^k + ui

[Figure: a U-shaped marginal cost of production curve plotted against
output, an example of a quadratic relationship.]

Notice that in these types of polynomial regressions there is only one
explanatory variable on the right-hand side, but it appears with various
powers, thus making them multiple regression models.
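Fitting such a model is ordinary multiple regression with Xi and Xi² as two regressors. A sketch for a hypothetical U-shaped marginal cost curve (made-up coefficients):

```python
import numpy as np

rng = np.random.default_rng(5)
q = np.linspace(1.0, 10.0, 60)                   # hypothetical output levels
# Made-up marginal cost: U-shaped quadratic in output, plus noise
mc = 30 - 6 * q + 0.8 * q**2 + rng.normal(0, 0.5, 60)

# One explanatory variable, several powers: still a multiple regression
X = np.column_stack([np.ones_like(q), q, q**2])
b = np.linalg.lstsq(X, mc, rcond=None)[0]
print(b)  # roughly [30, -6, 0.8]
```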



Reference
Gujarati, D. N., and Porter, D. C. (2009), Basic Econometrics, 5th
edition, Chapter 7.

