Econometrics I CH 3 MLR
3.1 Introduction
Yi = β̂0 + β̂1X1i + β̂2X2i + … + β̂kXki + ei
OLS Cont’ed…
The regression line equation is Ŷi = β̂0 + β̂1X1i + β̂2X2i + … + β̂kXki
As discussed in simple linear regression, the estimates of multiple linear
regression are also obtained by choosing the values of the coefficients
(parameters) that minimize the residual sum of squares (RSS).
Symbolically,
minimize Σei² = Σ(Yi − Ŷi)² = Σ(Yi − β̂0 − β̂1X1i − β̂2X2i − … − β̂kXki)²

The first-order condition with respect to each coefficient sets the partial derivative of the RSS to zero; for example, for β̂k:

∂(Σei²)/∂β̂k = ∂[Σ(Yi − β̂0 − β̂1X1i − … − β̂kXki)²]/∂β̂k = −2Σ(Yi − β̂0 − β̂1X1i − … − β̂kXki)Xki = −2ΣeiXki = 0
OLS Cont’ed…
For the sake of simplicity, let’s start our discussion with the simplest
multiple regression model i.e. model with two explanatory variables, Y =
f(X1, X2) or
Yi = β0 + β1X1i + β2X2i + ei
∂(Σei²)/∂β̂0 = −2Σ(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0

∂(Σei²)/∂β̂1 = −2Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)X1i = 0

∂(Σei²)/∂β̂2 = −2Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)X2i = 0
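As a quick symbolic check (an illustrative Python sketch, not part of the original slides; sympy assumed available), the three derivatives can be verified for a single observation:

import sympy as sp

Y, X1, X2, b0, b1, b2 = sp.symbols('Y X1 X2 b0 b1 b2')
e_sq = (Y - b0 - b1*X1 - b2*X2)**2  # squared residual for one observation

# Each derivative is -2*(Y - b0 - b1*X1 - b2*X2) multiplied by 1, X1 or X2,
# matching the three first-order conditions above.
for b, x in [(b0, 1), (b1, X1), (b2, X2)]:
    assert sp.simplify(sp.diff(e_sq, b) + 2*x*(Y - b0 - b1*X1 - b2*X2)) == 0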
OLS Cont’ed…
After differentiating, we get the following normal equations:

ΣYi = nβ̂0 + β̂1ΣX1i + β̂2ΣX2i ................(1)
ΣYiX1i = β̂0ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i ................(2)
ΣYiX2i = β̂0ΣX2i + β̂1ΣX1iX2i + β̂2ΣX2i² ................(3)

Expressed in deviation form, where lower-case letters denote deviations from the means (e.g. x1i = X1i − X̄1), the last two equations become:

Σyx1i = β̂1Σx1i² + β̂2Σx1ix2i ................(4)
Σyx2i = β̂1Σx1ix2i + β̂2Σx2i² ................(5)

These two equations can be solved simultaneously for β̂1 and β̂2 by substitution or Cramer's rule.
OLS Cont’ed…
To use Cramer's rule, we need to rewrite the above two
equations in matrix form (notation) as follows:
[Σyx1i]   [Σx1i²     Σx1ix2i] [β̂1]
[Σyx2i] = [Σx1ix2i   Σx2i²  ] [β̂2]

that is, [F] = [A][β̂]
Then find the determinant of matrix A, together with the determinants of A1 and A2, the matrices obtained by replacing the first and the second column of A, respectively, with the vector [F]:

|A| = Σx1i²·Σx2i² − (Σx1ix2i)²

β̂1 = |A1|/|A| = (Σyx1i·Σx2i² − Σyx2i·Σx1ix2i) / (Σx1i²·Σx2i² − (Σx1ix2i)²)

β̂2 = |A2|/|A| = (Σyx2i·Σx1i² − Σyx1i·Σx1ix2i) / (Σx1i²·Σx2i² − (Σx1ix2i)²)
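To make the procedure concrete, here is a minimal Python sketch of the two-regressor OLS solution via Cramer's rule (illustrative only; the function name ols_two_regressors and the plain-list inputs are assumptions, not part of the slides):

def ols_two_regressors(y, x1, x2):
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    # deviations from the means
    dy = [v - my for v in y]
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    # deviation-form sums of squares and cross-products
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2           # |A|
    b1 = (s1y * s22 - s2y * s12) / det   # |A1| / |A|
    b2 = (s2y * s11 - s1y * s12) / det   # |A2| / |A|
    b0 = my - b1 * m1 - b2 * m2          # intercept, from the first normal equation
    return b0, b1, b2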
3.3. Partial Correlation Coefficients & their Interpretation
The coefficient of correlation r measures the degree of
linear association between two variables.
For the three-variable regression model we can compute three
correlation coefficients: r12 (correlation between Y and X2), r13
(correlation between Y and X3), and r23 (correlation between X2
and X3); note that the subscript 1 represents Y for notational
convenience.
Often we want a correlation coefficient that is independent of the influence,
if any, of X3 on X2 and Y. Such a correlation coefficient can be obtained and
is known appropriately as the partial correlation coefficient.
Partial Correlation Coefficients cont’ed…
We define
r12.3 = partial correlation coefficient between Y and X2, holding
X3 constant
r13.2 = partial correlation coefficient between Y and X3, holding
X2 constant
r23.1 = partial correlation coefficient between X2 and X3, holding
Y constant
They can be easily obtained from the simple (zero-order) correlation coefficients as:

r12.3 = (r12 − r13·r23) / √[(1 − r13²)(1 − r23²)]
r13.2 = (r13 − r12·r23) / √[(1 − r12²)(1 − r23²)]
r23.1 = (r23 − r12·r13) / √[(1 − r12²)(1 − r13²)]
Partial Correlation Coefficients Cont’ed…
The squared expression r12.3² may be called the coefficient of partial
determination and may be interpreted as the proportion of the
variation in Y not explained by the variable X3 that has been
explained by the inclusion of X2 into the model.
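A minimal Python sketch (illustrative; the helper names pearson and partial_r12_3 are assumptions) that computes r12.3 from the zero-order correlations using the formula above:

from math import sqrt

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def partial_r12_3(y, x2, x3):
    # r12.3: correlation between Y and X2, holding X3 constant
    r12, r13, r23 = pearson(y, x2), pearson(y, x3), pearson(x2, x3)
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))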
3.4. Coefficient of Multiple Determination
In SLR, we saw that the proportion of the variation in Y explained by the
single variable X is measured by R².
But, in MLR, we would like to know the proportion of the variation in Y
explained by the variables X1…Xk, which are incorporated in the model,
jointly.
The measure of the goodness of fit in the multiple regression case is called the
coefficient of multiple determination.
It measures the proportion of the total variation in the dependent variable Y
that is explained by the explanatory variables jointly.
Therefore, the notion of R² can be easily extended to regression models
containing more than two (k) variables as follows:

R²Y,X1…Xk = ESS/TSS = Σŷ²/Σy² = (β̂1Σx1iy + β̂2Σx2iy + … + β̂kΣxkiy)/Σy² = 1 − Σe²/Σy²
Coefficient of Multiple Determination Cont‘ed…
In the case of the simplest multiple linear regression, i.e. regression
with two explanatory variables, R² is thus:

R²Y,X1X2 = ESS/TSS = Σŷ²/Σy² = (β̂1Σx1iy + β̂2Σx2iy)/Σy² = 1 − Σe²/Σy²
One limitation of R² is that it can be made large by adding more and more
variables, even if the added variables have no economic justification or contribution.
Algebraically,
• as the variables are added the sum of squared residuals (RSS) goes down
(it can remain unchanged, but this is rare) and thus R2 goes up.
Coefficient of Multiple Determination Cont‘ed…
This makes R2 a poor measure for deciding whether to
add or remove a variable or several variables to/from a
model.
To alleviate this weakness, an alternative
measure of goodness of fit, known as the adjusted
R², is used:

Adjusted R² = R̄² = 1 − (Σe²/(n − k)) / (Σy²/(n − 1)) = 1 − (1 − R²)(n − 1)/(n − k)
• If useful predictors are added to the model, the adjusted R² increases;
if useless predictors are added, the adjusted R² decreases.
But with small samples, if the number of regressors (X's) is large in relation
to the number of sample observations, the adjusted R² will be much smaller than R².
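To make the adjustment concrete, a small Python sketch (illustrative, not from the slides) that computes R² and adjusted R² from actual and fitted values; here k counts all estimated parameters, including the intercept:

def r_squared(y, y_hat, k):
    # k = number of estimated parameters, including the intercept
    n = len(y)
    my = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # Σe²
    tss = sum((yi - my) ** 2 for yi in y)                  # Σy² (deviation form)
    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    return r2, adj_r2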
3.5 Assumptions & Properties of OLS (MLR)
1. The regression model is linear in the parameters.
2. The conditional mean of the error term, εi, is zero: E(εi | X) = 0.
3. Homoscedasticity: the variance of each εi is the same for all
values of the Xs: Var(εi) = σ².
4. Errors are uncorrelated across observations (no
autocorrelation or serial correlation): Cov(εi, εj) = 0 for i ≠ j.
5. All independent variables (Xs) are uncorrelated with the error
term (εi).
6. No independent variable is a perfect linear function of any
other independent variable (no perfect multicollinearity).
That means the correlation between any pair of regressors,
e.g. corr(X1i, X2i), shouldn't be equal (or very close) to 1 or −1.
Assumptions & Properties OLS cont’ed…
7. The error term is normally and independently distributed,
i.e., εi ~ NIID(0, σ²).
8. All Xs are assumed to be non-stochastic, and must take at
least two different values.
9. The number of observations n must be greater than the
number of parameters (k) to be estimated.
10. Correct specification of the model:
• The model has no specification error in that all the important
explanatory variables appear explicitly in the function and the
mathematical form is correctly defined.
• In other words, we have all the right X’s on the right hand side, we
squared them if they should be squared, we took logarithms if they
should be in logarithms, etc.
Assumptions & Properties OLS cont’ed…
As we proved in chapter two, given the assumptions of the
classical linear regression model, the OLS estimators possess
some ideal or optimum properties.
• That is, OLS estimators are the Best Linear Unbiased Estimators
(BLUE) of β0, β1, β2 … βk. Specifically:
• Linearity: each estimator is a linear function of the observations on the
dependent variable (and hence of the random terms).
• Unbiasedness: the expected value of each estimator equals the
corresponding population parameter, E(β̂i) = βi.
• Minimum variance: among all linear unbiased estimators, the OLS
estimators have the smallest variance.
• Efficiency: OLS estimators fulfil both the unbiasedness and the
minimum-variance property: Var(β̂i)OLS ≤ Var(β̃i)other
Assumptions & Properties OLS cont’ed…
For the case of the simplest multiple linear regression, that is
Yi = β0 + β1X1i + β2X2i + ei, and following the same procedure as in
chapter two, the variances of β̂0, β̂1 and β̂2 are given as follows:

Var(β̂0) = σ̂² [ 1/n + (X̄1²Σx2² + X̄2²Σx1² − 2X̄1X̄2Σx1x2) / (Σx1²Σx2² − (Σx1x2)²) ]

Var(β̂1) = σ̂² Σx2² / (Σx1²Σx2² − (Σx1x2)²)

Var(β̂2) = σ̂² Σx1² / (Σx1²Σx2² − (Σx1x2)²)

where σ̂² = Σei²/(n − 3) is the estimate of the error variance
(n minus the number of estimated parameters).
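A short Python sketch of these variance formulas (illustrative; the argument names mirror the deviation-form sums above, e.g. s11 = Σx1², and are assumptions):

def ols_variances(n, s11, s22, s12, m1, m2, rss):
    # s11 = Σx1², s22 = Σx2², s12 = Σx1x2 (deviation form);
    # m1, m2 = sample means of X1 and X2; rss = Σe²
    sigma2 = rss / (n - 3)               # σ̂² with n − 3 degrees of freedom
    det = s11 * s22 - s12 ** 2
    var_b1 = sigma2 * s22 / det
    var_b2 = sigma2 * s11 / det
    var_b0 = sigma2 * (1 / n + (m1 ** 2 * s22 + m2 ** 2 * s11
                                - 2 * m1 * m2 * s12) / det)
    return var_b0, var_b1, var_b2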
3.6. Interval Estimation and Hypothesis testing
The principle involved in interval estimation and testing
multiple linear regressions parameters is identical with that of
simple linear regression.
An interval estimate consists of two numerical values (limits)
that, with a specified degree of confidence, we feel include the
parameter to be estimated.
For the case of k explanatory variables, the interval estimate or
confidence interval for βi, with n − k degrees of freedom and a
given confidence level, is

β̂i ± t(α/2, n − k) · se(β̂i)
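A small Python sketch of this interval (illustrative; assumes scipy is available):

from scipy.stats import t

def conf_interval(beta_hat, se, n, k, alpha=0.05):
    t_crit = t.ppf(1 - alpha / 2, df=n - k)  # two-tailed critical value
    return beta_hat - t_crit * se, beta_hat + t_crit * se

# e.g. for the fertilizer coefficient in the exercise below:
# conf_interval(0.0380952, 0.0058321, n=7, k=3) -> roughly (0.0219, 0.0543)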
Interval Estimation and Hypothesis testing cont’ed…
o Hypothesis testing: In the case of multiple linear
regression we consider two types of hypothesis tests.
1) Hypothesis testing about individual partial regression
coefficients
• We can test whether a particular variable X1 or X2 or Xk is
significant or not, holding the other variables constant.

Y = β0 + β1X1 + β2X2 + … + βkXk + ei

H0: β1 = 0   vs   H1: β1 ≠ 0
H0: β2 = 0   vs   H1: β2 ≠ 0
• Since there is only one restriction and the population standard deviation
is mostly unknown, the t test is used to test a hypothesis about any
individual partial regression coefficient
Interval Estimation and Hypothesis testing cont’ed…
2) Testing the overall significance of a regression
• This test aims at finding out whether the explanatory variables (X1,
X2, …Xk) do actually have any significant influence on the dependent
variable.
• The specification of the overall or joint significance test for the
regression of the form Y = β0 + β1X1 + β2X2 + … + βkXk + ei is as
follows:

H0: β1 = β2 = … = βk = 0   vs   H1: Not all βi are zero
• As can be seen from the specification, there is more than one
restriction; the appropriate test statistic is therefore the F-test.
• The joint hypothesis can be tested by the analysis of variance (AOV)
technique. The following table summarizes the idea.
Interval Estimation and Hypothesis testing cont’ed…
Source of variation   Sum of Squares (SS)   Degrees of Freedom (df)   Mean Square (MSS)
The model             ESS                   k − 1                     ESS/(k − 1)
Residual              RSS                   n − k                     RSS/(n − k)
Total                 TSS                   n − 1                     TSS/(n − 1)
Then to undertake the test first find the calculated value of F
and compare it with the F tabulated value.
The calculated value of F can be obtained by using the following
formula.
Fstat = (Σŷi²/(k − 1)) / (Σei²/(n − k)) = (ESS/(k − 1)) / (RSS/(n − k)) = (R²/(k − 1)) / ((1 − R²)/(n − k)) = R²(n − k) / ((1 − R²)(k − 1))
Interval Estimation and Hypothesis testing cont’ed…
o When R² = 0, F is zero. The larger the R², the greater the
F value. In the limit, when R² = 1, F is infinite.
o Thus the F test, which is a measure of the overall
significance of the estimated regression, is also a test of the
significance of R². Testing the null hypothesis H0: β1 = β2 = … = βk = 0
is equivalent to testing the null hypothesis that (the population)
R² is zero.
o Decision Rule:
• If Fstat > Ftabulated (F(k − 1, n − k)), reject H0; otherwise you may
accept it,
• where F(k − 1, n − k) is the critical F value at the α level of
significance with (k − 1) numerator df and (n − k) denominator df.
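A small Python sketch of this decision rule (illustrative; assumes scipy is available):

from scipy.stats import f

def overall_f_test(r2, n, k, alpha=0.05):
    f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))
    f_crit = f.ppf(1 - alpha, dfn=k - 1, dfd=n - k)
    return f_stat, f_crit, f_stat > f_crit  # True means reject H0

# e.g. with the exercise below: overall_f_test(0.9814, n=7, k=3)
# gives F ≈ 105.5 against a critical value F(2, 4) ≈ 6.94, so H0 is rejected.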
Example
Exercise 1: Suppose we have data on wheat yield (Y), amount of fertilizer
applied (X1) and amount of rainfall (X2). It is assumed that the fluctuations
in yield can be explained by varying levels of rainfall and fertilizer, and
that the regression model has the form Y = β0 + β1X1 + β2X2 + ei.
The hypothetical data are given as follows:
obs   Yield (Y)   Fertilizer (X1)   Rainfall (X2)   Y²      X1²       X2²    YX1      YX2    X1X2
1     40          100               10              1600    10000     100    4000     400    1000
2     50          200               20              2500    40000     400    10000    1000   4000
3     50          300               10              2500    90000     100    15000    500    3000
4     70          400               30              4900    160000    900    28000    2100   12000
5     65          500               20              4225    250000    400    32500    1300   10000
6     65          600               20              4225    360000    400    39000    1300   12000
7     80          700               30              6400    490000    900    56000    2400   21000
sum   420         2800              140             26350   1400000   3200   184500   9000   63000
Based on the given information
a) Estimate the regression line and interpret the results
d) Compute confidence intervals for the intercept and slope coefficients at 95%
confidence level and interpret the result
f) Test the overall significance or fitness of the model at the 95% confidence
level and interpret the result
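Before turning to the STATA output, the estimates can be checked by hand from the column sums; the following Python sketch (illustrative, not part of the slides) applies the deviation-form formulas derived earlier:

n = 7
sum_y, sum_x1, sum_x2 = 420, 2800, 140
sum_x1sq, sum_x2sq, sum_x1x2 = 1400000, 3200, 63000
sum_yx1, sum_yx2 = 184500, 9000

my, m1, m2 = sum_y / n, sum_x1 / n, sum_x2 / n
# deviation-form sums: Σx² = ΣX² − n·X̄², Σxy = ΣXY − n·X̄·Ȳ
s11 = sum_x1sq - n * m1 ** 2   # 280000
s22 = sum_x2sq - n * m2 ** 2   # 400
s12 = sum_x1x2 - n * m1 * m2   # 7000
s1y = sum_yx1 - n * m1 * my    # 16500
s2y = sum_yx2 - n * m2 * my    # 600

det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det  # ≈ 0.0381
b2 = (s2y * s11 - s1y * s12) / det  # ≈ 0.8333
b0 = my - b1 * m1 - b2 * m2         # ≈ 28.095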
The STATA results are presented as follows:

. reg y X1 X2

      Source |       SS       df       MS              Number of obs =       7
-------------+------------------------------           F(  2,     4) =  105.33
       Model |  1128.57143     2  564.285714           Prob > F      =  0.0003
    Residual |  21.4285714     4  5.35714286           R-squared     =  0.9814
-------------+------------------------------           Adj R-squared =  0.9720
       Total |        1150     6  191.666667           Root MSE      =  2.3146

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |   .0380952   .0058321     6.53   0.003     .0219027    .0542878
          X2 |   .8333333   .1543033     5.40   0.006     .4049186    1.261748
       _cons |   28.09524   2.491482    11.28   0.000     21.17777     35.0127
------------------------------------------------------------------------------

Interpretation:
• About 98.1% of the variation in wheat yield is due to the variation in
rainfall and fertilizer.
• As the level of fertilizer (X1) increases by one unit, wheat yield (Y)
increases on average by 0.0381 units, other things remaining constant.
• As the level of rainfall (X2) increases by one unit, wheat yield (Y)
increases on average by 0.833 units, other things remaining constant.
• Without fertilizer and rainfall, it is possible to produce on average
28.1 units of wheat (Y).
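For readers working in Python rather than STATA, the same regression can be replicated with statsmodels (an illustrative sketch; assumes the statsmodels package is installed):

import statsmodels.api as sm

y = [40, 50, 50, 70, 65, 65, 80]
X1 = [100, 200, 300, 400, 500, 600, 700]
X2 = [10, 20, 10, 30, 20, 20, 30]

X = sm.add_constant(list(zip(X1, X2)))  # prepend the intercept column
model = sm.OLS(y, X).fit()
print(model.summary())  # coefficients, t tests, R², adjusted R², F test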
Thank you !!!