
Chapter 3: Multiple Linear Regression

3.1 Introduction

• Multiple linear regression (MLR) is an extension of simple linear regression (SLR).
• Among the driving factors for extending SLR to MLR is that, in economics, it is rare to find a dependent variable that is affected by only one explanatory variable.
• More often, an economic variable is influenced by a host of variables.
• For example, the demand for a commodity depends not only on its own price, but also on many other factors such as the income of the consumer, the prices of substitutes, the prices of complementary goods, the family (population) size, the tastes of consumers, etc.
Introduction Cont’ed…
• Therefore, the two-variable model is often inadequate in practical work.
• Hence, MLR is concerned with the relationship between one dependent variable (Y) and two or more explanatory variables (X1i, X2i, …, Xki), where the model is linear in the parameters.
• For k explanatory variables, the MLR model can be written as

    Yi = β0 + β1X1i + β2X2i + … + βkXki + εi
3.2. Method of Ordinary Least Squares Revisited
• Now suppose we have a sample of n observations on Y, X1i, X2i, …, Xki drawn from the population, and we want to obtain estimates of the true population parameters β0, β1, β2, …, βk.

    Yi    X1i    X2i   …   Xki
    Y1    X11    X21   …   Xk1
    Y2    X12    X22   …   Xk2
    Y3    X13    X23   …   Xk3
    ⋮      ⋮      ⋮          ⋮
    Yn    X1n    X2n   …   Xkn

The sample regression function (SRF) can be written as

    Yi = β̂0 + β̂1X1i + β̂2X2i + … + β̂kXki + ei
OLS Cont’ed…
The regression line (fitted) equation is Ŷi = β̂0 + β̂1X1i + β̂2X2i + … + β̂kXki.
As discussed in simple linear regression, the estimates of multiple linear regression are also obtained by choosing the values of the coefficients or parameters that minimize the residual sum of squares (RSS). Symbolically,

    minimize over β̂0, β̂1, …, β̂k:   Σei² = Σ(Yi − Ŷi)² = Σ(Yi − β̂0 − β̂1X1i − β̂2X2i − … − β̂kXki)²

A necessary condition for a minimum is that the partial derivatives of the above expression with respect to each of the unknowns β̂0, β̂1, …, β̂k be set to zero:

    ∂(Σei²)/∂β̂j = ∂[Σ(Yi − β̂0 − β̂1X1i − β̂2X2i − … − β̂kXki)²]/∂β̂j = 0,   j = 0, 1, …, k
OLS Cont’ed…
The first-order conditions yield the following k + 1 normal equations:

    ∂(Σei²)/∂β̂0 = 0  ⇒  Σ(Yi − β̂0 − β̂1X1i − β̂2X2i − … − β̂kXki) = Σei = 0
    ∂(Σei²)/∂β̂1 = 0  ⇒  Σ(Yi − β̂0 − β̂1X1i − β̂2X2i − … − β̂kXki)X1i = ΣeiX1i = 0
    ∂(Σei²)/∂β̂2 = 0  ⇒  Σ(Yi − β̂0 − β̂1X1i − β̂2X2i − … − β̂kXki)X2i = ΣeiX2i = 0
    ⋮
    ∂(Σei²)/∂β̂k = 0  ⇒  Σ(Yi − β̂0 − β̂1X1i − β̂2X2i − … − β̂kXki)Xki = ΣeiXki = 0
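These k + 1 normal equations can be written compactly in matrix form as (X′X)β̂ = X′Y, so the whole system can be solved at once by matrix algebra. The short Python/NumPy sketch below is an illustration added here (it is not part of the original slides, and the data values are invented):

    import numpy as np

    # Invented data: n = 6 observations on two regressors and Y (illustration only).
    X1 = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
    X2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
    Y = np.array([5.0, 9.0, 10.0, 16.0, 17.0, 22.0])

    # Stack a column of ones (the intercept) with the regressors: X is n x (k+1).
    X = np.column_stack([np.ones_like(X1), X1, X2])

    # The k+1 normal equations are (X'X) beta_hat = X'Y, so solve for beta_hat.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    residuals = Y - X @ beta_hat

    print("beta_hat:", beta_hat)
    print("sum of residuals (first normal equation, ~0):", residuals.sum())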
OLS Cont’ed…
For the sake of simplicity, let us start the discussion with the simplest multiple regression model, i.e. the model with two explanatory variables, Y = f(X1, X2), or

    Yi = β̂0 + β̂1X1i + β̂2X2i + ei

The first-order conditions are

    ∂(Σei²)/∂β̂0 = ∂[Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)²]/∂β̂0 = 0
    ∂(Σei²)/∂β̂1 = ∂[Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)²]/∂β̂1 = 0
    ∂(Σei²)/∂β̂2 = ∂[Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)²]/∂β̂2 = 0
OLS Cont’ed…
After differentiating, we get the following normal equations:

    ΣYi = nβ̂0 + β̂1ΣX1i + β̂2ΣX2i ………………………(1)
    ΣYiX1i = β̂0ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i ………………………(2)
    ΣYiX2i = β̂0ΣX2i + β̂1ΣX1iX2i + β̂2ΣX2i² ………………………(3)
OLS Cont’ed…
From equation (1) we obtain

    β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2 ………………………(4)

Substituting equation (4) into the normal equations (2) and (3) puts the model in deviation form, where lower-case letters denote deviations from the sample means (e.g. yi = Yi − Ȳ, x1i = X1i − X̄1):

    Σyx1i = β̂1Σx1i² + β̂2Σx1ix2i ………………………(5)
    Σyx2i = β̂1Σx1ix2i + β̂2Σx2i² ………………………(6)

Now, to solve for β̂1 and β̂2 we can use either substitution or Cramer's rule.
OLS Cont’ed…
To use Cramer's rule, we rewrite the above two equations in matrix notation as follows:

    [ Σyx1i ]   [ Σx1i²     Σx1ix2i ] [ β̂1 ]
    [ Σyx2i ] = [ Σx1ix2i   Σx2i²   ] [ β̂2 ]

      [F]     =        [A]           [β̂]

Then find the determinant of matrix A:

    |A| = Σx1i²·Σx2i² − (Σx1ix2i)²

To find β̂1, substitute the first column of A by the elements of F, find |A1|, and finally compute β̂1 = |A1| / |A|.
OLS Cont’ed…
    |A1| = | Σyx1i   Σx1ix2i |
           | Σyx2i   Σx2i²   |  = Σyx1i·Σx2i² − Σyx2i·Σx1ix2i

    β̂1 = |A1| / |A| = (Σyx1i·Σx2i² − Σyx2i·Σx1ix2i) / (Σx1i²·Σx2i² − (Σx1ix2i)²)

The same is true for β̂2: substitute the second column of A by the elements of F, find |A2|, and finally compute β̂2 = |A2| / |A|:

    |A2| = | Σx1i²     Σyx1i |
           | Σx1ix2i   Σyx2i |  = Σyx2i·Σx1i² − Σyx1i·Σx1ix2i

    β̂2 = |A2| / |A| = (Σyx2i·Σx1i² − Σyx1i·Σx1ix2i) / (Σx1i²·Σx2i² − (Σx1ix2i)²)
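As an added illustration (not part of the original slides), the Python/NumPy sketch below applies these deviation-form formulas directly; the numbers reuse the wheat-yield data from the worked exercise in section 3.6, so the output can be checked against the STATA results shown there:

    import numpy as np

    # Wheat-yield data from the exercise in section 3.6.
    Y = np.array([40., 50., 50., 70., 65., 65., 80.])
    X1 = np.array([100., 200., 300., 400., 500., 600., 700.])
    X2 = np.array([10., 20., 10., 30., 20., 20., 30.])

    # Deviations from the sample means (the lower-case x, y of the slides).
    y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()

    det_A = (x1**2).sum() * (x2**2).sum() - (x1 * x2).sum()**2
    b1 = ((y * x1).sum() * (x2**2).sum() - (y * x2).sum() * (x1 * x2).sum()) / det_A
    b2 = ((y * x2).sum() * (x1**2).sum() - (y * x1).sum() * (x1 * x2).sum()) / det_A
    b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()   # equation (4)

    print(b0, b1, b2)   # approximately 28.095, 0.0381, 0.8333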
3.3. Partial Correlation Coefficients & their Interpretation
• The coefficient of correlation, r, measures the degree of linear association between two variables.
• For the three-variable regression model we can compute three simple correlation coefficients: r12 (correlation between Y and X2), r13 (correlation between Y and X3), and r23 (correlation between X2 and X3); note that the subscript 1 represents Y for notational convenience.
• Often, however, we want a correlation coefficient between Y and X2 that is independent of the influence, if any, of X3 on X2 and on Y. Such a correlation coefficient can be obtained and is known, appropriately, as the partial correlation coefficient.
Partial Correlation Coefficients cont’ed…
We define
• r12.3 = partial correlation coefficient between Y and X2, holding X3 constant
• r13.2 = partial correlation coefficient between Y and X3, holding X2 constant
• r23.1 = partial correlation coefficient between X2 and X3, holding Y constant
• They can be easily obtained from the simple correlations:

    r12.3 = (r12 − r13·r23) / √[(1 − r13²)(1 − r23²)]
    r13.2 = (r13 − r12·r23) / √[(1 − r12²)(1 − r23²)]
    r23.1 = (r23 − r12·r13) / √[(1 − r12²)(1 − r13²)]
Partial Correlation Coefficients Cont’ed…
The square of a partial correlation coefficient, for example r12.3², may be called the coefficient of partial determination; it is interpreted as the proportion of the variation in Y not explained by the variable X3 that has been explained by the inclusion of X2 into the model.
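As an added illustration (not in the original slides), the small Python helper below computes r12.3 from the three simple correlations using the first formula above; the input values are invented:

    import numpy as np

    def partial_corr(r12, r13, r23):
        # Partial correlation between Y (1) and X2 (2), holding X3 (3) constant.
        return (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))

    # Invented simple correlations, purely for illustration.
    print(partial_corr(r12=0.8, r13=0.5, r23=0.4))   # about 0.756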
3.4. Coefficient of Multiple Determination
In SLR, we saw that the proportion of the variation in Y explained by the single variable X is measured by R².
In MLR, we would like to know the proportion of the variation in Y that is explained jointly by the variables X1, …, Xk incorporated in the model.
• The measure of goodness of fit in the multiple regression case is called the coefficient of multiple determination.
• It measures the proportion of the total variation in the dependent variable Y that is explained by the explanatory variables jointly.
• The notion of R² is therefore easily extended to regression models containing k explanatory variables, as follows:

    R²(Y,X1…Xk) = ESS/TSS = Σŷ²/Σy² = (β̂1Σx1iy + β̂2Σx2iy + … + β̂kΣxkiy)/Σy² = 1 − Σe²/Σy²
Coefficient of Multiple Determination Cont‘ed…
For the simplest multiple linear regression, i.e. the regression with two explanatory variables, R² is thus:

    R²(Y,X1X2) = ESS/TSS = Σŷ²/Σy² = (β̂1Σx1iy + β̂2Σx2iy)/Σy² = 1 − Σe²/Σy²
• One limitation of R² is that it can be made large by adding more and more variables, even if the added variables have no economic justification or contribution.
• Algebraically, as variables are added, the residual sum of squares (RSS) goes down (it can remain unchanged, but this is rare) and thus R² goes up.
Coefficient of Multiple Determination Cont‘ed…
This makes R² a poor measure for deciding whether to add a variable (or several variables) to a model or remove it.
To alleviate this weakness, an alternative measure of goodness of fit, known as the adjusted R², is used:

    Adjusted R² (R̄²) = 1 − [Σe²/(n − k)] / [Σy²/(n − 1)] = 1 − (1 − R²)·(n − 1)/(n − k)

• where k = the number of parameters in the model (including the intercept term), n = the number of sample observations, and R² = the unadjusted coefficient of multiple determination.
Coefficient of Multiple Determination Cont‘ed…
The adjusted R² does not always go up when a variable is added, because of the degrees-of-freedom term (n − k) in the formula, which falls as regressors are added.
• If useful predictors are added to the model, the adjusted R² will increase; if useless predictors are added, the adjusted R² will decrease.
• The adjusted R² can be negative, although R² is necessarily non-negative. In that case its value is taken as zero.
• With small samples, if the number of regressors (X's) is large relative to the number of sample observations, the adjusted R² will be much smaller than R².
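The Python sketch below (an added illustration, not from the slides) shows how R² and the adjusted R² are computed from observed values y, fitted values y_hat, and k, the number of estimated parameters including the intercept:

    import numpy as np

    def r_squared(y, y_hat):
        rss = np.sum((y - y_hat)**2)        # residual sum of squares
        tss = np.sum((y - np.mean(y))**2)   # total sum of squares
        return 1 - rss / tss

    def adjusted_r_squared(y, y_hat, k):
        n = len(y)
        r2 = r_squared(y, y_hat)
        return 1 - (1 - r2) * (n - 1) / (n - k)

    # Example usage: adjusted_r_squared(y, fitted, k=3) for a model with two regressors.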
3.5 Assumptions & Properties OLS (MLR)
1. The regression model is linear in the parameters.
2. The conditional mean of the error term εi is zero: E(εi | X) = 0.
3. Homoscedasticity: the variance of each εi is the same for all values of the Xs: Var(εi) = σ².
4. Errors are uncorrelated across observations (no autocorrelation or serial correlation): Cov(εi, εj) = 0 for i ≠ j.
5. All independent variables (Xs) are uncorrelated with the error term (εi).
6. No independent variable is a perfect linear function of any other independent variable (no perfect multicollinearity). That means the correlation between any pair of regressors, e.g. corr(X1i, X2i), must not equal 1 or −1.
Assumptions & Properties OLS cont’ed…
7. The error term is normally and independently distributed, i.e. εi ~ NIID(0, σ²).
8. All Xs are assumed to be non-stochastic and must take at least two different values.
9. The number of observations n must be greater than the number of parameters (k) to be estimated.
10. Correct specification of the model:
• The model has no specification error in that all the important
explanatory variables appear explicitly in the function and the
mathematical form is correctly defined.
• In other words, we have all the right X’s on the right hand side, we
squared them if they should be squared, we took logarithms if they
should be in logarithms, etc.
Assumptions & Properties OLS cont’ed…
As we proved in chapter two, given the assumptions of the
classical linear regression model, the OLS estimators possess
some ideal or optimum properties.
• That is, OLS estimators are the Best Linear Unbiased Estimators (BLUE) of β0, β1, β2, …, βk. Specifically:
• Linearity: each estimator is a linear function of the sample values of the dependent variable.
• Unbiasedness: the expected value of each estimator equals the true population parameter, E(β̂i) = βi.
• Minimum variance: among all linear unbiased estimators, the OLS estimators have the smallest variance.
• Efficiency: OLS estimators fulfil both the unbiasedness and the minimum-variance property, Var(β̂i)OLS ≤ Var(β̃i)other.
Assumptions & Properties OLS cont’ed…
For the case of the simplest multiple linear regression, that is

    Yi = β̂0 + β̂1X1i + β̂2X2i + ei,

and following the same procedure as in chapter two, the variances of β̂0, β̂1 and β̂2 are given as follows:

    Var(β̂0) = σ̂²·[ 1/n + (X̄1²Σx2² + X̄2²Σx1² − 2X̄1X̄2Σx1x2) / (Σx1²Σx2² − (Σx1x2)²) ]

    Var(β̂1) = σ̂²·Σx2² / (Σx1²Σx2² − (Σx1x2)²)

    Var(β̂2) = σ̂²·Σx1² / (Σx1²Σx2² − (Σx1x2)²),   where σ̂² = Σei² / (n − 3)
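The Python sketch below (an added illustration, not part of the slides) evaluates these variance formulas; X1 and X2 are the raw regressors and resid the OLS residuals from the two-regressor fit:

    import numpy as np

    def coef_variances(X1, X2, resid):
        n = len(resid)
        x1, x2 = X1 - X1.mean(), X2 - X2.mean()        # deviation form
        sigma2 = (resid**2).sum() / (n - 3)            # unbiased estimate of sigma^2
        den = (x1**2).sum() * (x2**2).sum() - (x1 * x2).sum()**2
        var_b1 = sigma2 * (x2**2).sum() / den
        var_b2 = sigma2 * (x1**2).sum() / den
        var_b0 = sigma2 * (1 / n + (X1.mean()**2 * (x2**2).sum()
                                    + X2.mean()**2 * (x1**2).sum()
                                    - 2 * X1.mean() * X2.mean() * (x1 * x2).sum()) / den)
        return var_b0, var_b1, var_b2

Applied to the data of the worked example in section 3.6, the square roots of these variances reproduce the standard errors in the STATA output there (about 2.491, 0.00583 and 0.1543).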
3.6. Interval Estimation and Hypothesis testing
• The principle involved in interval estimation and hypothesis testing for multiple linear regression parameters is identical to that of simple linear regression.
• An interval estimate consists of two numerical values (an interval) that, with a specified degree of confidence, we believe includes the parameter to be estimated.
• For the case of k explanatory variables, the confidence interval for βi, with n − k degrees of freedom and confidence level 1 − α, is

    β̂i ± t(α/2, n−k) · se(β̂i)
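As an added illustration (not in the slides), the Python/SciPy helper below builds such an interval from a coefficient estimate, its standard error and df = n − k; the sample call reproduces the 95% interval for X1 reported in the STATA output of the worked example in section 3.6:

    from scipy import stats

    def conf_interval(beta_hat, se, df, alpha=0.05):
        t_crit = stats.t.ppf(1 - alpha / 2, df)    # critical t value
        return beta_hat - t_crit * se, beta_hat + t_crit * se

    print(conf_interval(0.0380952, 0.0058321, df=4))   # roughly (0.0219, 0.0543)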
Interval Estimation and Hypothesis testing cont’ed…
o Hypothesis testing: in the case of multiple linear regression we consider two types of hypothesis tests.
1) Hypothesis testing about individual partial regression coefficients
• We can test whether a particular variable X1, X2, …, or Xk is significant or not, holding the other variables constant. For the model

    Y = β0 + β1X1 + β2X2 + … + βkXk + ei

  the hypotheses are, for example,

    H0: β1 = 0   vs   H1: β1 ≠ 0
    H0: β2 = 0   vs   H1: β2 ≠ 0

• Since each test involves only one restriction and the population standard deviation is usually unknown, the t test, t = β̂i / se(β̂i) with n − k degrees of freedom, is used to test a hypothesis about any individual partial regression coefficient.
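A minimal Python/SciPy sketch of this t test (an added illustration, not part of the slides); the sample values correspond to the fertilizer coefficient (X1) in the worked example of section 3.6:

    from scipy import stats

    def t_test(beta_hat, se, df):
        t_stat = beta_hat / se                             # t = beta_hat / se(beta_hat) under H0: beta = 0
        p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df))   # two-sided p-value
        return t_stat, p_value

    print(t_test(0.0380952, 0.0058321, df=4))   # roughly (6.53, 0.003)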
Interval Estimation and Hypothesis testing cont’ed…
2) Testing the overall significance of a regression
• This test aims at finding out whether the explanatory variables (X1,
X2, …Xk) do actually have any significant influence on the dependent
variable.
• The specification of the overall (joint) significance test for the regression of the form Y = β0 + β1X1 + β2X2 + … + βkXk + ei is as follows:

    H0: β1 = β2 = … = βk = 0   vs   H1: not all βi are zero

• As can be seen from the specification, there is more than one restriction, so the appropriate test statistic is the F statistic.
• The joint hypothesis can be tested by the analysis of variance (AOV)
technique. The following table summarizes the idea.
Interval Estimation and Hypothesis testing cont’ed…
    Source of variation   Sum of Squares (SS)   Degrees of Freedom (df)   Mean Square (MSS)
    The model             ESS                   k − 1                     ESS/(k − 1)
    Residual              RSS                   n − k                     RSS/(n − k)
    Total                 TSS                   n − 1                     TSS/(n − 1)

Then, to undertake the test, first find the calculated value of F and compare it with the tabulated F value. The calculated value of F can be obtained using the following formula:

    Fstat = [Σŷi²/(k − 1)] / [Σei²/(n − k)] = [ESS/(k − 1)] / [RSS/(n − k)] = [R²/(k − 1)] / [(1 − R²)/(n − k)] = R²(n − k) / [(1 − R²)(k − 1)]
Interval Estimation and Hypothesis testing cont’ed…
o When R² = 0, F is zero. The larger the R², the greater the F value. In the limit, when R² = 1, F is infinite.
o Thus the F test, which is a measure of the overall significance of the estimated regression, is also a test of the significance of R²: testing the null hypothesis H0: β1 = β2 = … = βk = 0 is equivalent to testing the null hypothesis that the (population) R² is zero.
o Decision rule:
• If Fstat > Ftabulated = F(k − 1, n − k), reject H0; otherwise do not reject it,
• where F(k − 1, n − k) is the critical F value at the α level of significance with (k − 1) numerator df and (n − k) denominator df.
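A short Python/SciPy sketch of this decision rule (an added illustration, not from the slides), using the R², n and k of the worked example that follows:

    from scipy import stats

    def overall_f_test(r2, n, k, alpha=0.05):
        f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))
        f_crit = stats.f.ppf(1 - alpha, k - 1, n - k)   # critical value F(k-1, n-k)
        return f_stat, f_crit, f_stat > f_crit

    print(overall_f_test(r2=0.9814, n=7, k=3))   # F about 105 >> critical value, so reject H0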
Example
Exercise 1: Suppose we have data on wheat yield (Y), amount of fertilizer
applied (X1) and amount of rainfall (X2). It is assumed that the fluctuations
in yield can be explained by varying levels of rainfall and fertilizer. And
assume the regression model has the following form: Y = β0 + β1X1 + β2X2 + εi.
The hypothetical data are given as follows:
    obs   Yield (Y)   Fertilizer (X1)   Rainfall (X2)     Y²       X1²      X2²      YX1     YX2    X1X2
    1        40            100               10          1600      10000     100     4000     400    1000
    2        50            200               20          2500      40000     400    10000    1000    4000
    3        50            300               10          2500      90000     100    15000     500    3000
    4        70            400               30          4900     160000     900    28000    2100   12000
    5        65            500               20          4225     250000     400    32500    1300   10000
    6        65            600               20          4225     360000     400    39000    1300   12000
    7        80            700               30          6400     490000     900    56000    2400   21000
    sum     420           2800              140         26350    1400000    3200   184500    9000   63000
Based on the given information:
a) Estimate the regression line and interpret the results.
b) Compute the measures of goodness of fit (R-squared and adjusted R-squared) and interpret the result.
c) Compute the variances of each coefficient.
d) Compute confidence intervals for the intercept and slope coefficients at the 95% confidence level and interpret the result.
e) Test the individual significance of the two explanatory variables at the 5% significance level and interpret the result.
f) Test the overall significance (fitness) of the model at the 95% confidence level and interpret the result.
The STATA results are presented as follows:

    . reg y X1 X2

          Source |       SS       df       MS              Number of obs =       7
    -------------+------------------------------           F(  2,     4) =  105.33
           Model |  1128.57143     2  564.285714           Prob > F      =  0.0003
        Residual |  21.4285714     4  5.35714286           R-squared     =  0.9814
    -------------+------------------------------           Adj R-squared =  0.9720
           Total |        1150     6  191.666667           Root MSE      =  2.3146

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              X1 |   .0380952   .0058321     6.53   0.003     .0219027    .0542878
              X2 |   .8333333   .1543033     5.40   0.006     .4049186    1.261748
           _cons |   28.09524   2.491482    11.28   0.000     21.17777     35.0127
    ------------------------------------------------------------------------------

Interpretation:
• About 98.1% of the variation in wheat yield is explained by the variation in fertilizer and rainfall.
• As the level of fertilizer (X1) increases by one unit, wheat yield (Y) increases on average by 0.0381 units, other things remaining constant.
• As the level of rainfall (X2) increases by one unit, wheat yield (Y) increases on average by 0.833 units, other things remaining constant.
• Without fertilizer and rainfall, it is possible to produce on average 28.1 units of wheat (Y).
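As an optional cross-check (added here, not part of the original slides), the same coefficients, R² and adjusted R² can be reproduced from the raw data with a few lines of Python/NumPy:

    import numpy as np

    Y = np.array([40., 50., 50., 70., 65., 65., 80.])
    X1 = np.array([100., 200., 300., 400., 500., 600., 700.])
    X2 = np.array([10., 20., 10., 30., 20., 20., 30.])

    X = np.column_stack([np.ones(7), X1, X2])
    beta = np.linalg.solve(X.T @ X, X.T @ Y)       # about [28.095, 0.0381, 0.8333]
    e = Y - X @ beta
    rss = e @ e                                    # about 21.43
    tss = ((Y - Y.mean())**2).sum()                # 1150
    r2 = 1 - rss / tss                             # about 0.9814
    adj_r2 = 1 - (1 - r2) * (7 - 1) / (7 - 3)      # about 0.9720
    print(beta, r2, adj_r2)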
Thank you !!!