
Multiple Regression Model

The Multiple Regression Model

• The multiple regression model takes the form

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik} + u_i$$

• There are k regressors (explanatory variables) and a constant. Hence there will be k + 1 parameters to estimate.

• Assumption M.1:
We keep the basic least squares assumption: the error term is mean independent of all regressors (loosely speaking, all Xs are uncorrelated with the error term), i.e.

$$E(u_i \mid X_1, X_2, \dots, X_k) = E(u_i \mid X) = 0$$
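A minimal numerical sketch of estimating such a model (Python with numpy; the data is simulated and all parameter values are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 3
X = rng.normal(size=(n, k))                # the k regressors
beta = np.array([1.0, 0.5, -0.3, 2.0])     # [beta0, beta1, beta2, beta3]
u = rng.normal(size=n)                     # error term, mean independent of X (M.1)
y = beta[0] + X @ beta[1:] + u

Xc = np.column_stack([np.ones(n), X])      # add a constant: k + 1 parameters
beta_hat, *_ = np.linalg.lstsq(Xc, y, rcond=None)
print(beta_hat)                            # close to [1.0, 0.5, -0.3, 2.0]
```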



Interpretation of the coefficients

• Since the error term is mean independent of the Xs, varying the Xs does not have an impact on the error term.
• Thus under Assumption M.1 the coefficients in the regression model have the following simple interpretation:

$$\beta_j = \frac{\partial Y_i}{\partial X_{ij}}$$

• Thus each coefficient measures the impact of the corresponding X on Y keeping all other factors (Xs and u) constant: a ceteris paribus effect.



Dummy Variables

• Some of the explanatory variables are not necessarily continuous. Y may also be determined by qualitative factors which are not measured in any units:
– sex, nationality or race;
– type of education (vocational, general);
– type of housing (flat, large house or small house).
• These characteristics are coded into dummy variables, which take only two values, 0 or 1:

D_i = 0 if the individual is male
D_i = 1 if the individual is female
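A small sketch of this coding step (the labels are illustrative):

```python
import numpy as np

sex = np.array(["male", "female", "female", "male"])
D = (sex == "female").astype(float)   # D_i = 1 if female, 0 if male
print(D)                              # [0. 1. 1. 0.]
```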



Dummy Variables: Intercept Specific Relationship

• The dummy variable can be used to build a model with an intercept that varies across the groups coded by the dummy variable:

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + u_i$$

[Figure: two parallel lines with slope β1, one with intercept β0 (for D_i = 0) and one with intercept β0 + β2 (for D_i = 1).]

• Interpretation: the observations for which D_i = 1 have on average a Y_i which is β2 units higher.
• Example: WTP, income and sex

Variable       Coefficient   Std. err.
log income     0.22          0.06
sex (1=Male)   0.01          0.09
constant       0.42          0.47
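A simulation sketch of this intercept-shift specification, with coefficient values borrowed from the table purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
x = rng.normal(size=n)
D = rng.integers(0, 2, size=n).astype(float)   # group indicator
y = 0.42 + 0.22 * x + 0.01 * D + rng.normal(size=n)

Xc = np.column_stack([np.ones(n), x, D])
b0, b1, b2 = np.linalg.lstsq(Xc, y, rcond=None)[0]
print(b0, b1, b2)   # b2 estimates the average shift in Y for the D = 1 group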



Dummy Variables: Slope Specific Relationship

• The dummy variable can also be interacted with a continuous variable, to get a slope specific to each group:

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i D_i + u_i$$

[Figure: two lines through the same intercept β0, with slope β1 for D = 0 and slope β1 + β2 for D = 1.]

• Interpretation: for observations with D_i = 0, a one unit increase in X_i leads to an increase of β1 units in Y_i. For those with D_i = 1, Y_i increases by β1 + β2 units.
• Example: WTP, income and sex

Variable                  Coefficient   Std. err.
log income                0.23          0.06
sex (1=Male)*log income   0.003         0.01
constant                  0.42          0.47
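The same idea with an interaction term, again with illustrative coefficient values on simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000
x = rng.normal(size=n)
D = rng.integers(0, 2, size=n).astype(float)
y = 0.42 + 0.23 * x + 0.003 * x * D + rng.normal(size=n)

Xc = np.column_stack([np.ones(n), x, x * D])   # interaction term X_i * D_i
b0, b1, b2 = np.linalg.lstsq(Xc, y, rcond=None)[0]
print(b1, b1 + b2)   # slope for D = 0, slope for D = 1
```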



Least Squares in the Multiple Regression Model

• We maintain the same set of assumptions as in the one variable regression model.
• We modify Assumption A.1 to Assumption M.1 to take into account the existence of many regressors.
• The OLS estimator is chosen to minimise the residual sum of squares exactly as before.
• Thus β0, β1, ..., βk are chosen to minimise

$$S = \sum_{i=1}^{N} u_i^2 = \sum_{i=1}^{N} \left(Y_i - \beta_0 - \beta_1 X_{i1} - \dots - \beta_k X_{ik}\right)^2$$

• Differentiating S with respect to each coefficient in turn, we obtain a set of k + 1 equations constituting the first order conditions for minimising the residual sum of squares S. These equations are called the Normal Equations.
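In matrix form the normal equations are X'Xβ = X'y, which a sketch like the following can solve directly (simulated data, illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 2
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(size=n)

Xc = np.column_stack([np.ones(n), X])      # constant + k regressors
b = np.linalg.solve(Xc.T @ Xc, Xc.T @ y)   # solve the k + 1 normal equations
print(b)                                   # close to [1.0, 0.5, -0.3]
```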



A solution for two regressors

• With two regressors this represents a two equation system with two unknowns, β1 and β2.
• The solution for β1 is

$$\hat{\beta}_1 = \frac{\displaystyle\sum_{i=1}^{N}(Y_i - \bar{Y})X_{i1}\sum_{i=1}^{N}(X_{i2} - \bar{X}_2)X_{i2} - \sum_{i=1}^{N}(Y_i - \bar{Y})X_{i2}\sum_{i=1}^{N}(X_{i2} - \bar{X}_2)X_{i1}}{\displaystyle\sum_{i=1}^{N}(X_{i1} - \bar{X}_1)X_{i1}\sum_{i=1}^{N}(X_{i2} - \bar{X}_2)X_{i2} - \sum_{i=1}^{N}(X_{i2} - \bar{X}_2)X_{i1}\sum_{i=1}^{N}(X_{i1} - \bar{X}_1)X_{i2}}$$

• This formula can also be written as

$$\hat{\beta}_1 = \frac{cov(Y, X_1)\,Var(X_2) - cov(X_1, X_2)\,cov(Y, X_2)}{Var(X_1)\,Var(X_2) - cov(X_1, X_2)^2}$$

Similarly we can derive the formula for the other coefficient, β2.
• Note that the formula for β̂1 is now different from the formula we had in the one variable regression model: it takes into account the presence of the other regressor(s).
• The extent to which the two formulae differ depends on the covariance of X1 and X2.
• When this covariance is zero we are back to the formula for the one variable regression model.
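A quick numerical check of the covariance formula against OLS on simulated, deliberately correlated regressors (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)          # regressors correlated by construction
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)

c = np.cov(np.vstack([y, x1, x2]))          # rows = variables: Y, X1, X2
num = c[0, 1] * c[2, 2] - c[1, 2] * c[0, 2] # cov(Y,X1)Var(X2) - cov(X1,X2)cov(Y,X2)
den = c[1, 1] * c[2, 2] - c[1, 2] ** 2      # Var(X1)Var(X2) - cov(X1,X2)^2
b_ols = np.linalg.lstsq(np.column_stack([np.ones(n), x1, x2]), y, rcond=None)[0]
print(num / den, b_ols[1])                  # the two agree (approx. 0.5)
```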



The Gauss Markov Theorem

• The Gauss-Markov Theorem is valid for the multiple regression model. We need however to modify Assumption A.4.
• Define the covariance matrix of the regressors X to be

$$cov(X) = \begin{pmatrix} Var(X_1) & cov(X_1, X_2) & \dots & cov(X_1, X_k) \\ cov(X_1, X_2) & Var(X_2) & \dots & cov(X_2, X_k) \\ \vdots & \vdots & \ddots & \vdots \\ cov(X_1, X_k) & cov(X_2, X_k) & \dots & Var(X_k) \end{pmatrix}$$

• Assumption M.4: We assume that cov(X) is positive definite and hence can be inverted.
• Theorem: Under Assumptions M.1, A.2, A.3 and M.4, the Ordinary Least Squares (OLS) estimator is Best in the class of Linear Unbiased Estimators (BLUE).
• As before, this means that OLS provides the estimates that are least sensitive to changes in the data, given the stated assumptions.
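A sketch of how M.4 can fail: with perfectly collinear regressors, cov(X) is singular and cannot be inverted (illustrative simulated data):

```python
import numpy as np

rng = np.random.default_rng(5)
x1 = rng.normal(size=100)
x2 = 2.0 * x1                        # perfectly collinear with x1
covX = np.cov(np.vstack([x1, x2]))
print(np.linalg.eigvalsh(covX))      # smallest eigenvalue ~ 0: cov(X) singular, M.4 fails
```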



An Example

• We investigate the determinants of log willingness to pay (WTP).
• We include as explanatory variables:
– log income;
– education, coded as low, medium and high;
– age of the head of household, in years;
– household size.

Variable                 Coef.    Std. err.   t-stat
log income               0.14     0.07        2.2
medium education         0.47     0.16        2.9
high education           0.58     0.18        3.1
age                      0.0012   0.004       0.3
household size           0.008    0.02        0.4
constant                 0.53     0.55        0.96

number of observations   352
R²                       0.0697
adjusted R²              0.0562

Interpretation:
• When income goes up by 1%, WTP goes up by 0.14%.
• Low education is the reference group (we have omitted this dummy variable). Medium educated individuals have a WTP approximately 47% higher than low educated ones, and high educated ones approximately 58% higher.



Omitted Variable Bias

• Suppose the true regression relationship has the form

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + u_i$$

• Instead we decide to estimate:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \nu_i$$

• We will show that in general this omission leads to a biased estimate of the coefficient on X1.

• Suppose we use OLS on the second equation. As we know, we will obtain:

$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{N}(X_{i1} - \bar{X}_1)\nu_i}{\sum_{i=1}^{N}(X_{i1} - \bar{X}_1)^2}$$

• The question is: what is the expected value of the last expression on the right hand side? For an unbiased estimator this would be zero. Here we will show that it is not zero.



Omitted Variable Bias

• First note that according to the true model we have

$$\nu_i = \beta_2 X_{i2} + u_i$$

• We can substitute this into the expression for the OLS estimator to obtain

$$\hat{\beta}_1 = \beta_1 + \frac{\beta_2\sum_{i=1}^{N}(X_{i1} - \bar{X}_1)X_{i2} + \sum_{i=1}^{N}(X_{i1} - \bar{X}_1)u_i}{\sum_{i=1}^{N}(X_{i1} - \bar{X}_1)^2}$$

• Now we can take expectations of this expression:

$$E[\hat{\beta}_1 \mid X] = \beta_1 + \frac{\beta_2\sum_{i=1}^{N}E[(X_{i1} - \bar{X}_1)X_{i2} \mid X] + \sum_{i=1}^{N}E[(X_{i1} - \bar{X}_1)u_i \mid X]}{\sum_{i=1}^{N}(X_{i1} - \bar{X}_1)^2}$$

The last expression in the numerator is zero under the assumption that u is mean independent of X [Assumption M.1].
• This expression can be written more compactly as:

$$E[\hat{\beta}_1 \mid X] = \beta_1 + \beta_2\,\frac{cov(X_1, X_2)}{Var(X_1)}$$
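A simulation sketch of this result: the short regression's slope matches the bias formula (all parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)   # omitted regressor, correlated with x1
y = 1.0 + 0.5 * x1 + 0.7 * x2 + rng.normal(size=n)

# Short regression of y on a constant and x1 only (x2 omitted).
b = np.linalg.lstsq(np.column_stack([np.ones(n), x1]), y, rcond=None)[0]
predicted = 0.5 + 0.7 * np.cov(x1, x2)[0, 1] / np.var(x1, ddof=1)
print(b[1], predicted)               # both near 0.5 + 0.7 * 0.8 = 1.06
```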



Omitted Variable Bias

$$E[\hat{\beta}_1 \mid X] = \beta_1 + \beta_2\,\frac{cov(X_1, X_2)}{Var(X_1)}$$
• The bias will be zero in two cases:
– When the coefficient β2 is zero. In this case the regressor
X2 obviously does not belong to the regression.
– When the covariance between the two regressors X1 and
X2 is zero.
• Thus in general omitting regressors which have an impact on
Y (β2 non-zero) will bias the OLS estimator of the coefficients
on the included regressors unless the omitted regressors are
uncorrelated with the included ones.



Summary of Results

• Omitting a regressor which has an impact on the dependent variable and is correlated with the included regressors leads to "omitted variable bias".
• Including a regressor which has no impact on the dependent variable but is correlated with the included regressors leads to a reduction in the efficiency of estimation of the coefficients on the variables included in the regression.



Measurement Error

• Data is often measured with error:
– reporting errors;
– coding errors.
• The measurement error can affect either the dependent variable or the explanatory variables. The consequences are dramatically different in the two cases.



Measurement Error on Dependent Variable

• Y_i is measured with error. We assume that the measurement error is additive and not correlated with X_i.
• We observe Y̌_i = Y_i + ν_i and regress Y̌_i on X_i:

$$\check{Y}_i = Y_i + \nu_i = \beta_0 + \beta_1 X_i + u_i + \nu_i = \beta_0 + \beta_1 X_i + w_i$$

• The assumptions we have made for OLS to be unbiased and BLUE are not violated. The OLS estimator is unbiased.
• The variance of the slope coefficient is:

$$Var(\hat{\beta}_1) = \frac{1}{N}\frac{Var(w_i)}{Var(X_i)} = \frac{1}{N}\frac{Var(u_i + \nu_i)}{Var(X_i)} = \frac{1}{N}\frac{Var(u_i) + Var(\nu_i)}{Var(X_i)} \;\ge\; \frac{1}{N}\frac{Var(u_i)}{Var(X_i)}$$

where the third equality uses the assumption that u_i and ν_i are uncorrelated.

• The variance of the estimator is therefore larger with measurement error on Y_i.
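A simulation sketch of both points: the slope stays unbiased but its sampling variance grows (illustrative values; with Var(u) = Var(ν) = 1 the variance roughly doubles):

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 200, 2_000
slopes_clean, slopes_noisy = [], []
for _ in range(reps):
    x = rng.normal(size=n)
    y = 1.0 + 0.5 * x + rng.normal(size=n)
    y_obs = y + rng.normal(size=n)           # additive measurement error on Y
    Xc = np.column_stack([np.ones(n), x])
    slopes_clean.append(np.linalg.lstsq(Xc, y, rcond=None)[0][1])
    slopes_noisy.append(np.linalg.lstsq(Xc, y_obs, rcond=None)[0][1])

print(np.mean(slopes_noisy))                        # still near 0.5: unbiased
print(np.var(slopes_clean), np.var(slopes_noisy))   # noisy variance is larger
```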



Measurement Error on Explanatory Variables

• X_i is measured with error. We assume that the error is additive and not correlated with X_i.
• We observe X̌_i = X_i + ν_i instead. The regression we perform is Y_i on X̌_i. The estimator of β1 is expressed as:


$$\hat{\beta}_1 = \frac{\sum_{i=1}^{N}(\check{X}_i - \bar{\check{X}})(Y_i - \bar{Y})}{\sum_{i=1}^{N}(\check{X}_i - \bar{\check{X}})^2} = \frac{\sum_{i=1}^{N}(X_i + \nu_i - \bar{X})(\beta_0 + \beta_1 X_i + u_i - \bar{Y})}{\sum_{i=1}^{N}(X_i + \nu_i - \bar{X})^2}$$

Dropping the cross terms, whose expectation is zero, this behaves like

$$\frac{\beta_1\sum_{i=1}^{N}(X_i - \bar{X})^2}{\sum_{i=1}^{N}\left[(X_i - \bar{X})^2 + \nu_i^2\right]}$$

so that

$$E(\hat{\beta}_1) = \frac{\beta_1\,Var(X_i)}{Var(X_i) + Var(\nu_i)} \le \beta_1$$

• Measurement error on X_i leads to a biased OLS estimate: since the factor Var(X_i)/(Var(X_i) + Var(ν_i)) is less than one, the estimate is biased towards zero. This is also called attenuation bias.
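A simulation sketch of attenuation bias, with Var(X) = Var(ν) = 1 chosen for illustration so the attenuation factor is 1/2:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100_000
x = rng.normal(size=n)               # Var(X) = 1
nu = rng.normal(size=n)              # Var(nu) = 1
y = 1.0 + 0.5 * x + rng.normal(size=n)
x_obs = x + nu                       # regressor observed with additive error

b = np.linalg.lstsq(np.column_stack([np.ones(n), x_obs]), y, rcond=None)[0]
print(b[1])                          # near 0.5 * 1 / (1 + 1) = 0.25: attenuation
```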

