Chapter 3
3.1. Introduction
We studied the two-variable model extensively in the previous unit. In economics, however, you will hardly find a case where one variable is affected by only a single explanatory variable. For example, the demand for a commodity is a function of the price of the product, the prices of other goods (substitutes and complements), the consumer's income and wealth, previous consumption behavior, and so on. Hence the two-variable model is often inadequate in practical work, and we therefore need to discuss multiple regression models. Multiple linear regression is concerned with the relationship between a dependent variable (Y) and two or more explanatory variables (X1, X2, …, Xn). For the sake of simplicity, let us consider the three-variable case (one dependent and two explanatory variables).
Y = f(X1, X2)
Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + Uᵢ -------------------------------------------------- 3.1
β₀ is the constant term; it measures the average value of Y when X₁ and X₂ are zero.
β₁ measures the change in Y for a unit change in X₁ alone (the effect of X₁ on Y, holding X₂ constant).
β₂ measures the change in Y for a unit change in X₂ alone (the effect of X₂ on Y, holding X₁ constant).
The coefficients β₁ and β₂ are called the partial regression coefficients.
Like simple linear regression, the model is assumed to be linear in parameters but not necessarily in variables. Hence the interpretation of the parameters differs between linear and nonlinear functional forms. Let us discuss some linear and nonlinear forms below.
As mentioned above, linear regression does not mean that the variables involved have to enter in linear form; it only requires that the regression be linear in the parameters (in the β's). Hence we may include different forms of the explanatory (and explained) variables, depending, for example, on questions such as:
o Does the effect of a variable peak at some point and then start to decline?
The linear form assumes that the relationship between each explanatory variable and the dependent variable is constant:
∂y/∂xₖ = βₖ,  k = 1, 2
The linear form is used as the default functional form until strong evidence that it is inappropriate is found.
The double-log (log-log) form, ln y = β₀ + β₁ln X₁ + β₂ln X₂ + u, makes the slope coefficients elasticities:
∂ln y/∂ln xₖ = (∂y/y)/(∂xₖ/xₖ) = βₖ,  k = 1, 2
Before using a double-log model, make sure that there are no negative or zero observations in the data set.
For example, let Y be output, L labour and K capital, with β₀, β₁ and β₂ parameters; then the Cobb-Douglas production function
Yᵢ = β₀ Lᵢ^β₁ Kᵢ^β₂ e^uᵢ
becomes linear in the parameters once we take natural logarithms:
ln Yᵢ = ln β₀ + β₁ln Lᵢ + β₂ln Kᵢ + uᵢ
The lin-log form puts only the explanatory variables in logarithms:
y = β₀ + β₁ln X₁ + β₂ln X₂ + u
• Then ∂y/∂ln xₖ = ∂y/(∂xₖ/xₖ) = βₖ,  k = 1, 2
Interpretation: if xₖ increases by 1 percent, then y will change by βₖ/100 units.
The log-lin form puts only the dependent variable in logarithms:
ln y = β₀ + β₁X₁ + β₂X₂ + u
• Then ∂ln y/∂xₖ = (∂y/y)/∂xₖ = βₖ,  k = 1, 2
Interpretation: if xₖ increases by one unit, then y will change by approximately 100·βₖ percent.
Inverse form: y = β₀ + β₁(1/X₁) + β₂X₂² + u
• In most cases, either the linear form is adequate or common sense will point to an easy choice from among the alternatives.
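To make the choice of functional form concrete, here is a minimal illustrative sketch (not part of the original note) that fits the linear and double-log forms by OLS, using the fertilizer–rainfall–yield data that appear later in Table 3.7.1; the helper function ols is our own.

```python
import numpy as np

# Data from Table 3.7.1 later in this chapter
X1 = np.array([100, 200, 300, 400, 500, 600, 700], dtype=float)  # fertilizer
X2 = np.array([10, 20, 10, 30, 20, 20, 30], dtype=float)         # rainfall
Y = np.array([40, 50, 50, 70, 65, 65, 80], dtype=float)          # yield

def ols(y, *regressors):
    """OLS with an intercept; returns (b0, b1, b2, ...)."""
    X = np.column_stack([np.ones_like(y)] + list(regressors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Linear form: Y = b0 + b1*X1 + b2*X2 + u            (slopes are marginal effects)
print("linear    :", ols(Y, X1, X2))

# Double-log form: lnY = b0 + b1*lnX1 + b2*lnX2 + u  (slopes are elasticities;
# requires strictly positive observations)
print("double-log:", ols(np.log(Y), np.log(X1), np.log(X2)))
```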
3.3. Assumptions
3. Homoscedasticity
The variance of each Uᵢ is the same for all values of the X's: Var(Uᵢ) = E(Uᵢ²) = σᵤ²
4. Normality
The values of each Uᵢ are normally distributed: Uᵢ ~ N(0, σᵤ²)
In addition, the values of the X's are a set of fixed numbers in all hypothetical samples (the regressors are non-stochastic).
We cannot exhaustively list all the assumptions, but the above are some of the basic assumptions that enable us to proceed with our analysis.
Suppose we have a sample of n observations on Y, X₁ and X₂:
Yᵢ     X₁ᵢ     X₂ᵢ
Y₁     X₁₁     X₂₁
Y₂     X₁₂     X₂₂
Y₃     X₁₃     X₂₃
⋮      ⋮       ⋮
Yₙ     X₁ₙ     X₂ₙ
Yᵢ = β̂₀ + β̂₁X₁ᵢ + β̂₂X₂ᵢ + eᵢ
where β̂₀, β̂₁ and β̂₂ are estimates of the true parameters β₀, β₁ and β₂.
Ŷᵢ = β̂₀ + β̂₁X₁ᵢ + β̂₂X₂ᵢ is the estimated regression line.
The OLS estimates minimize the residual sum of squares:
Min Σeᵢ² = Σ(Yᵢ − Ŷᵢ)² = Σ(Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ)² -------------------- 3.5
A necessary condition for a minimum value is that the partial derivatives of the above expression with respect to the unknowns (i.e. β̂₀, β̂₁ and β̂₂) should be set to zero.
∂Σeᵢ²/∂β̂₀ = −2Σ(Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ) = 0 -------------------- 3.6
∂Σeᵢ²/∂β̂₁ = −2ΣX₁ᵢ(Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ) = 0 -------------------- 3.7
∂Σeᵢ²/∂β̂₂ = −2ΣX₂ᵢ(Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ) = 0 -------------------- 3.8
After differentiating, we get the following normal equations (for the sake of simplicity let us drop the i subscripts):
ΣY = nβ̂₀ + β̂₁ΣX₁ + β̂₂ΣX₂ -------------------------------------------------- 3.9
ΣX₁Y = β̂₀ΣX₁ + β̂₁ΣX₁² + β̂₂ΣX₁X₂ ---------------------------------------- 3.10
ΣX₂Y = β̂₀ΣX₂ + β̂₁ΣX₁X₂ + β̂₂ΣX₂² ---------------------------------------- 3.11
Writing the variables in lower-case letters for deviations from their means, the above equations reduce to:
Σx₁y = β̂₁Σx₁² + β̂₂Σx₁x₂ -------------------------------------------------- 3.12
Σx₂y = β̂₁Σx₁x₂ + β̂₂Σx₂² -------------------------------------------------- 3.13
β̂₁ and β̂₂ can easily be solved using matrices. We can rewrite the above two equations in matrix form as follows:
[ Σx₁²    Σx₁x₂ ] [β̂₁]   [Σx₁y]
[ Σx₁x₂   Σx₂²  ] [β̂₂] = [Σx₂y] -------------------------------------3.14
Solving these gives
β̂₁ = (Σx₁y·Σx₂² − Σx₂y·Σx₁x₂) / (Σx₁²·Σx₂² − (Σx₁x₂)²)
β̂₂ = (Σx₂y·Σx₁² − Σx₁y·Σx₁x₂) / (Σx₁²·Σx₂² − (Σx₁x₂)²)
and β̂₀ = Ȳ − β̂₁X̄₁ − β̂₂X̄₂,
where y = Yᵢ − Ȳ, x₁ = X₁ − X̄₁ and x₂ = X₂ − X̄₂.
Note: The values for the parameter estimates (β̂₀, β̂₁ and β̂₂) can also be obtained by using other methods.
We can also express β̂₁ and β̂₂ in terms of the covariances and variances of X₁, X₂ and Y:
β̂₁ = [Cov(X₁, Y)·Var(X₂) − Cov(X₂, Y)·Cov(X₁, X₂)] / [Var(X₁)·Var(X₂) − (Cov(X₁, X₂))²] -------- 3.15
β̂₂ = [Cov(X₂, Y)·Var(X₁) − Cov(X₁, Y)·Cov(X₁, X₂)] / [Var(X₁)·Var(X₂) − (Cov(X₁, X₂))²] -------- 3.16
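As an illustrative sketch (not from the original note), the deviation-form formulas above can be evaluated directly; the data are again those of Table 3.7.1 later in the chapter.

```python
import numpy as np

X1 = np.array([100, 200, 300, 400, 500, 600, 700], dtype=float)  # fertilizer
X2 = np.array([10, 20, 10, 30, 20, 20, 30], dtype=float)         # rainfall
Y = np.array([40, 50, 50, 70, 65, 65, 80], dtype=float)          # yield

# Deviations from the sample means
x1, x2, y = X1 - X1.mean(), X2 - X2.mean(), Y - Y.mean()

# Cross-product sums and the common denominator of equations 3.12-3.14
d = (x1 @ x1) * (x2 @ x2) - (x1 @ x2) ** 2
b1 = ((x1 @ y) * (x2 @ x2) - (x2 @ y) * (x1 @ x2)) / d
b2 = ((x2 @ y) * (x1 @ x1) - (x1 @ y) * (x1 @ x2)) / d
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()

print(b0, b1, b2)   # roughly 28.10, 0.0381, 0.8333
```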
The coefficient of multiple determination, R², measures the proportion of the total variation in Y that is explained jointly by X₁ and X₂:
R²y.X₁X₂ = Σŷᵢ²/Σyᵢ² = Σ(Ŷᵢ − Ȳ)²/Σ(Yᵢ − Ȳ)² = 1 − Σeᵢ²/Σyᵢ² = 1 − RSS/TSS
Since yᵢ = ŷᵢ + eᵢ,
Σeᵢ² = Σ(yᵢ − ŷᵢ)² = Σ(yᵢ − β̂₁x₁ᵢ − β̂₂x₂ᵢ)²
or Σeᵢ² = Σeᵢ·eᵢ = Σeᵢ(yᵢ − β̂₁x₁ᵢ − β̂₂x₂ᵢ) = Σeᵢyᵢ − β̂₁Σeᵢx₁ᵢ − β̂₂Σeᵢx₂ᵢ = Σeᵢyᵢ (since Σeᵢx₁ᵢ = Σeᵢx₂ᵢ = 0),
so that Σeᵢ² = Σyᵢ² − β̂₁Σx₁ᵢyᵢ − β̂₂Σx₂ᵢyᵢ.
Therefore
R²y.X₁X₂ = 1 − Σeᵢ²/Σyᵢ²
or, equivalently,
R² = ESS/TSS = (β̂₁Σx₁y + β̂₂Σx₂y)/Σy², where x₁ᵢ, x₂ᵢ and yᵢ are in their deviation forms.
The value of R² lies between 0 and 1. The higher R², the greater the percentage of the variation of Y explained by the regression plane, that is, the better the goodness of fit of the regression plane to the sample observations; the closer R² is to zero, the worse the fit.
If you compare the R² equation with the simple-regression r²,
r² = β̂₁Σx₁y / Σy²
you will see that the multiple-regression version differs only by the terms involving the additional explanatory variable X₂ᵢ. As you increase the number of explanatory variables, R² never decreases (and normally increases), whether or not the added variables are relevant. For this reason we call it the unadjusted R², and if we have two models with different numbers of explanatory variables, comparing their unadjusted R² values is misleading. Therefore, to correct for this defect we adjust R² by taking into account the degrees of freedom.
Adjusted R²
R̄² = 1 − [Σeᵢ²/(n − k)] / [Σyᵢ²/(n − 1)] = 1 − (1 − R²)·(n − 1)/(n − k)
If the sample size is small, R̄² < R², but if the sample size is very large, R̄² and R² will be very close to each other.
Note that:
If R² = 1, then R̄² = 1.
When R² = 0, R̄² = (1 − k)/(n − k); in this case R̄² will be negative if k > 1.
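A small illustrative helper (not from the note) that computes both the unadjusted and the adjusted R² from the fitted values; the function name and arguments are our own.

```python
import numpy as np

def r_squared(y, y_hat, k):
    """Unadjusted R^2 and adjusted R-bar^2; k = number of estimated parameters."""
    y = np.asarray(y, dtype=float)
    e = y - np.asarray(y_hat, dtype=float)        # residuals
    tss = np.sum((y - y.mean()) ** 2)             # total sum of squares
    rss = np.sum(e ** 2)                          # residual sum of squares
    r2 = 1.0 - rss / tss
    n = y.size
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return r2, r2_adj

# With the wheat-yield example later in the chapter (n = 7, k = 3),
# r_squared(Y, Y_hat, 3) gives roughly (0.98, 0.97).
```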
More generally, the model can be extended to additional explanatory variables:
Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + ⋯ + βₖXₖ + u
Solving the corresponding normal equations directly results in algebraic complexity, but we can solve them easily using matrices. Hence, in the next section we will discuss the matrix approach to the linear regression model.
3.6.1 Matrix Approach to Linear Regression Model
We have seen in simple linear regression that the OLS estimators (β̂₀ and β̂₁) satisfy the small-sample properties of an estimator, i.e. the BLUE property. In multiple regression, the OLS estimators also satisfy the BLUE property. Now we proceed to examine the desired properties of the estimators in matrix notation:
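In matrix notation the OLS estimator is β̂ = (X′X)⁻¹X′y. The following is an illustrative sketch (not part of the original note) of that formula; the function name is our own.

```python
import numpy as np

def ols_matrix(y, regressors):
    """OLS via beta_hat = (X'X)^(-1) X'y, with an intercept column added."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(y.size)] + [np.asarray(v, dtype=float) for v in regressors])
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solves (X'X) b = X'y
    return beta_hat, X

# With the Table 3.7.1 data:
# ols_matrix(Y, [X1, X2])[0]  ->  approximately [28.10, 0.0381, 0.8333]
```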
The principle involved in testing multiple regression is identical to that of simple regression. However, in multiple regression models we will undertake two tests of significance. One is the significance of the individual parameters of the model; this test of significance is the same as the tests discussed in the simple regression model. The second test is the overall significance of the model.
Var(β̂₀) = σ̂ᵤ²·[1/n + (X̄₁²Σx₂² + X̄₂²Σx₁² − 2X̄₁X̄₂Σx₁x₂) / (Σx₁²Σx₂² − (Σx₁x₂)²)]
Var(β̂₁) = σ̂ᵤ²·Σx₂² / (Σx₁²Σx₂² − (Σx₁x₂)²)        Var(β̂₂) = σ̂ᵤ²·Σx₁² / (Σx₁²Σx₂² − (Σx₁x₂)²)
where σ̂ᵤ² = Σeᵢ²/(n − k), n is the number of sample observations and k is the number of parameters estimated. Thus, the standard errors of the β̂ᵢ's can be found as follows:
se(β̂₀) = √Var(β̂₀),   se(β̂₁) = √Var(β̂₁),   se(β̂₂) = √Var(β̂₂)
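An illustrative sketch (not from the note) of these variance and standard-error formulas for the two-regressor case; the function name is our own.

```python
import numpy as np

def slope_std_errors(X1, X2, residuals, k=3):
    """Standard errors of the two slope estimates, deviation-form formulas."""
    x1 = np.asarray(X1, dtype=float) - np.mean(X1)
    x2 = np.asarray(X2, dtype=float) - np.mean(X2)
    e = np.asarray(residuals, dtype=float)
    sigma2_hat = np.sum(e ** 2) / (e.size - k)                   # sigma_u^2 hat
    denom = np.sum(x1**2) * np.sum(x2**2) - np.sum(x1 * x2)**2
    var_b1 = sigma2_hat * np.sum(x2**2) / denom
    var_b2 = sigma2_hat * np.sum(x1**2) / denom
    return np.sqrt(var_b1), np.sqrt(var_b2)

# With the Table 3.7.1 data and its OLS residuals this gives roughly (0.0058, 0.1543).
```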
1) A) H₀: β₁ = 0
      H₁: β₁ ≠ 0
   B) H₀: β₂ = 0
      H₁: β₂ ≠ 0
The null hypothesis in (A) states that, holding X₂ constant, X₁ has no (linear) influence on Y. Similarly, the null hypothesis in (B) states that, holding X₁ constant, X₂ has no influence on the dependent variable Yᵢ. To test these null hypotheses we will use the following tests:
I) Standard error test: under this and the following testing methods we test only β̂₁ and β̂₂; the test for β̂₀ can be done in the same way.
se(β̂₁) = √Var(β̂₁) = √[σ̂ᵤ²·Σx₂² / (Σx₁²Σx₂² − (Σx₁x₂)²)]
se(β̂₂) = √Var(β̂₂) = √[σ̂ᵤ²·Σx₁² / (Σx₁²Σx₂² − (Σx₁x₂)²)],  where σ̂ᵤ² = Σeᵢ²/(n − 3)
a) If we find that se(β̂₁) > β̂₁/2 and se(β̂₂) > β̂₂/2 (i.e. the estimated standard error is greater than half of the estimate):
We do not reject the null hypothesis; that is, we conclude that β̂₁ and β̂₂ are not significantly different from zero.
b) If we find that se(β̂₁) < β̂₁/2 and se(β̂₂) < β̂₂/2 (i.e. the standard errors are less than half of the estimates), we interpret it as follows:
We reject the null hypothesis that the true values of β₁ and β₂ are zero; in other words, we accept the alternative that β̂₁ and β̂₂ are significantly different from zero.
There is a relationship between the dependent variable Yᵢ and the independent variables X₁ and X₂ (X₁ and X₂ explain Yᵢ).
β̂₁ and β̂₂ are significant.
Note: The smaller the standard errors, the stronger the evidence that the
estimates are statistically reliable.
II) The student's t-test: we compute the t-ratio for each β̂ᵢ:
t = (β̂ᵢ − βᵢ) / se(β̂ᵢ)  ~  tα/2(n − k),   (i = 0, 1, 2, …, k)
This is the observed (or sample) value of the t ratio, which we compare with the
theoretical value of t obtainable from the t-table with n – k degrees of freedom. The
theoretical values of t (at the chosen level of significance) are the critical values
that define the critical region in a two-tail test, with n – k degrees of freedom.
H₀: βᵢ = 0
The null hypothesis states that, holding X2 constant, X1 has no (linear) influence
on y.
If the computed t value exceeds the critical t value at the chosen level of significance, we may reject the null hypothesis; otherwise, we may accept it (βᵢ is not significant at the chosen level of significance, and hence the corresponding regressor does not appear to contribute to the explanation of the variations in Y).
[Figure] Fig 3.7.1. The t distribution: the 95% acceptance region around zero, with 2.5% critical (rejection) regions in each tail.
Note that the greater the calculated value of t, the stronger the evidence that βᵢ is significant. For degrees of freedom greater than 8, the critical value of t (at the 5% level of significance) for the rejection of the null hypothesis is approximately 2.
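An illustrative sketch (not from the note) of the individual t test; scipy supplies the critical value in place of the printed t table.

```python
from scipy import stats

def t_test(beta_hat, se_beta, n, k, alpha=0.05):
    """Two-tailed t test of H0: beta_i = 0 with n - k degrees of freedom."""
    t_calc = beta_hat / se_beta                      # observed t-ratio
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - k)    # critical value
    return t_calc, t_crit, abs(t_calc) > t_crit      # True -> reject H0

# Worked-example values: t_test(0.0381, 0.0058, n=7, k=3) -> (about 6.57, 2.776, True)
```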
Throughout the previous section we were concerned with testing the significance of the estimated partial regression coefficients individually, i.e. under the separate hypothesis that each true population partial regression coefficient was zero. In this section we extend this idea to a joint test of the relevance of all the included explanatory variables. Now consider the following:
𝑌 = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 + 𝛽3 𝑋3 + ⋯ + 𝛽𝑘 𝑋𝑘 + 𝑢
The test of the overall significance of the regression implies testing the null hypothesis
H₀: β₁ = β₂ = ⋯ = βₖ = 0
against the alternative
H₁: not all βᵢ's are zero (at least one of the βᵢ's is non-zero).
If the null hypothesis is true, then there is no linear relationship between y and
the regressors.
The above joint hypothesis can be tested by the analysis of variance (AOV) technique. The following table summarizes the idea.
Source of variation          Sum of squares     Degrees of freedom     Mean sum of squares
Due to regression (ESS)      Σŷᵢ²               k − 1                  Σŷᵢ²/(k − 1)
Due to residuals (RSS)       Σeᵢ²               n − k                  Σeᵢ²/(n − k)
Total (TSS)                  Σyᵢ²               n − 1
The test procedure for any set of hypotheses can be based on a comparison of the sum of squared errors from the original, unrestricted multiple regression model with the sum of squared errors from a regression model in which the null hypothesis is assumed to be true. When a null hypothesis is assumed to be true, we in effect place conditions, or constraints, on the values that the parameters can take, and the sum of squared errors increases. The idea of the test is that if these sums of squared errors are substantially different, then the assumption that the joint null hypothesis is true has significantly reduced the ability of the model to fit the data, and the data do not support the null hypothesis.
If the null hypothesis is true, we expect the data to be compatible with the conditions placed on the parameters. Thus, there would be little change in the sum of squared errors when the null hypothesis is assumed to be true.
Let the restricted residual sum of squares (RRSS) be the sum of squared errors from the model obtained by assuming that the null hypothesis is true, and let URSS be the sum of squared errors of the original unrestricted model, i.e. the unrestricted residual sum of squares. It is always true that RRSS − URSS ≥ 0.
F = [Σŷᵢ²/(k − 1)] / [Σeᵢ²/(n − k)] = [ESS/(k − 1)] / [RSS/(n − k)] = [(TSS − RSS)/(k − 1)] / [RSS/(n − k)]
which follows the F distribution with (k − 1) and (n − k) degrees of freedom. Dividing the numerator and the denominator by TSS, the statistic can also be written in terms of R²:
F = [R²/(k − 1)] / [(1 − R²)/(n − k)]
[Figure: the F distribution, with the 5% rejection region in the upper tail.]
When R² = 0, F is zero. The larger the R², the greater the F value. In the limit, when R² = 1, F is infinite. Thus the F test, which is a measure of the overall significance of the estimated regression, is also a test of the significance of R²: testing the null hypothesis that all slope coefficients are jointly zero is equivalent to testing that R² equals zero.
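An illustrative sketch (not from the note) of the overall F test computed from R², with the critical value taken from scipy rather than the F table.

```python
from scipy import stats

def f_test_from_r2(r2, n, k, alpha=0.05):
    """Overall significance test of H0: all slope coefficients are zero."""
    f_calc = (r2 / (k - 1)) / ((1 - r2) / (n - k))
    f_crit = stats.f.ppf(1 - alpha, dfn=k - 1, dfd=n - k)
    return f_calc, f_crit, f_calc > f_crit           # True -> reject H0

# Worked-example values: f_test_from_r2(0.98, n=7, k=3) -> (98.0, about 6.94, True)
```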
Importance of F-test
This test is undertaken in multiple regression analysis for the following reasons.
a) To test the overall significance of the explanatory variables, i.e. whether the explanatory variables X₁ and X₂ actually have an influence on the explained variable Yᵢ or not.
b) To test for improvement: when one additional variable is introduced into the model, we test whether the additional variable improves the explanation of the explained variable.
Ex.  Yᵢ = β₀ + β₁X₁ + β₂X₂ + Uᵢ
     Yᵢ = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + Uᵢ
Now, the first equation has only two explanatory variables, X₁ and X₂, while in the second we have included X₃. The addition of X₃ may affect, positively or negatively, the relationship between Y and X₁, X₂.
c) To test the equality of coefficients obtained from different samples (the Chow test). Suppose you have sample data on the agricultural output of North Shoa Zone from 1974 E.C. up to 1994 E.C., and you want to know whether agricultural output changed before and after the fall of the Derg (1983). By splitting the data into two sub-samples you can compare the coefficients, and by undertaking an F-test you can see whether there is a change in agricultural output or not.
d) To test the stability of the coefficients of the variables as the sample size increases.
Ex. You may first take a 10-year sample for your study and estimate the coefficients. If you then increase the sample size to 15 years, whether the coefficients of the variables remain stable or not can be tested using an F-test.
This has been discussed in Chapter 2. The (1 − α)100% confidence interval for βᵢ is given by
β̂ᵢ ± tα/2(n − k)·se(β̂ᵢ),   (i = 0, 1, 2, 3, …, k)
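A short illustrative sketch (not from the note) of this interval; the function name is our own.

```python
from scipy import stats

def conf_interval(beta_hat, se_beta, n, k, alpha=0.05):
    """(1 - alpha)*100% confidence interval for a single coefficient."""
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - k)
    return beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta

# Worked-example values: conf_interval(0.0381, 0.0058, n=7, k=3) -> about (0.022, 0.054)
```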
Example: Suppose we have data on wheat yield (Y), the amount of rainfall (X₂), and the amount of fertilizer applied (X₁). It is assumed that the fluctuations in yield can be explained by varying levels of rainfall and fertilizer.
Table 3.7.1
Yield (Y)   Fertilizer (X₁)   Rainfall (X₂)
40          100               10
50          200               20
50          300               10
70          400               30
65          500               20
65          600               20
80          700               30
Means: Ȳ = 60, X̄₁ = 400, X̄₂ = 20
(The deviation columns yᵢ, x₁ᵢ, x₂ᵢ, x₁ᵢyᵢ, x₂ᵢyᵢ, x₁ᵢx₂ᵢ, x₁ᵢ², x₂ᵢ² are computed from these figures in the solution below.)
1. Find the OLS estimates (i.e. β̂₀, β̂₁ and β̂₂).
Solution: The formulas for β̂₀, β̂₁ and β̂₂ are
β̂₀ = Ȳ − β̂₁X̄₁ − β̂₂X̄₂
β̂₁ = (Σx₁ᵢyᵢ·Σx₂ᵢ² − Σx₂ᵢyᵢ·Σx₁ᵢx₂ᵢ) / (Σx₁ᵢ²·Σx₂ᵢ² − (Σx₁ᵢx₂ᵢ)²)
β̂₂ = (Σx₂ᵢyᵢ·Σx₁ᵢ² − Σx₁ᵢyᵢ·Σx₁ᵢx₂ᵢ) / (Σx₁ᵢ²·Σx₂ᵢ² − (Σx₁ᵢx₂ᵢ)²)
where the x's and y's are in deviation form.
Now find the deviations of the observations from their mean values (columns 4 to 11 of the above table). The next step is to insert the following values (in deviations) into the above formulas:
Σx₁ᵢyᵢ = 16,500,  Σx₂ᵢ² = 400,  Σx₂ᵢyᵢ = 600,  Σx₁ᵢx₂ᵢ = 7,000,  Σx₁ᵢ² = 280,000
β̂₁ = (16,500 × 400 − 600 × 7,000) / (280,000 × 400 − (7,000)²) = 2,400,000 / 63,000,000 = 0.0381
β̂₂ = (600 × 280,000 − 16,500 × 7,000) / 63,000,000 = 52,500,000 / 63,000,000 = 0.833
Now β̂₀ = 60 − (0.0381)(400) − (0.833)(20) = 28.1
2. Find the variances of β̂₁ and β̂₂.
Solution
Var(β̂₁) = σ̂ᵤ²·Σx₂² / (Σx₁²Σx₂² − (Σx₁x₂)²),   Var(β̂₂) = σ̂ᵤ²·Σx₁² / (Σx₁²Σx₂² − (Σx₁x₂)²)
where σ̂ᵤ² = Σeᵢ²/(n − k), and eᵢ = Yᵢ − Ŷᵢ, i.e.
e₁ = y₁ − ŷ₁
e₂ = y₂ − ŷ₂
e₃ = y₃ − ŷ₃
⋮
Therefore Σeᵢ² = Σ(Yᵢ − Ŷᵢ)²
Computing Ŷᵢ and the squared residuals (Yᵢ − Ŷᵢ)² for each of the seven observations (for example, (Y₁ − Ŷ₁)² = 0.0576 and (Y₂ − Ŷ₂)² = 5.6644) and summing gives Σeᵢ² = 21.4286.
Hence σ̂ᵤ² = 21.4286 / (7 − 3) = 5.3572
Var(β̂₁) = (5.3572)(400) / [(280,000)(400) − (7,000)²] = 0.000034
se(β̂₁) = √0.000034 = 0.0058
Var(β̂₂) = (5.3572)(280,000) / 63,000,000 = 0.02381
se(β̂₂) = √0.02381 = 0.1543
3. Compute the coefficient of determination R².
Solution: R² = (β̂₁Σx₁y + β̂₂Σx₂y)/Σy² = [(0.0381)(16,500) + (0.833)(600)] / 1,150 ≈ 0.98
Interpretation: 98% of the variation in yield is explained by the regression plane (i.e. by the variation in the amount of fertilizer and rainfall). The model is a good fit.
4. Test (a) the null hypothesis H₀: β₁ = 0 against the alternative hypothesis H₁: β₁ ≠ 0.
t = (β̂₁ − β₁)/se(β̂₁) = 0.0381/0.0058 = 6.5689  (calculated value)
t_tabulated = t₀.₀₂₅(7 − 3) = 2.78  (found from the statistical table of the t-distribution)
Decision: since t_calculated > t_tabulated, we reject H₀; β₁ is statistically significant.
(b) H₀: β₂ = 0 against H₁: β₂ ≠ 0
t = 0.833/0.1543 = 5.40 > 2.78
Decision: since t_calculated > t_tabulated, we reject H₀; β₂ is statistically significant.
The 95% confidence interval for β₁ is
β̂₁ − tα/2(n − k)·se(β̂₁) < β₁ < β̂₁ + tα/2(n − k)·se(β̂₁)
0.0219 < β₁ < 0.0542
Interpretation: the value of the true population parameter β₁ will lie between 0.0219 and 0.0542 in 95 out of 100 cases.
Note: The coefficients of X₁ and X₂ (β̂₁ and β̂₂) measure partial effects. For example, β̂₁ measures the rate of change of Y with respect to X₁ while X₂ is held constant.
Finally, test the overall significance of the regression:
H₀: β₁ = β₂ = ⋯ = βₖ = 0
F_cal = [R²/(k − 1)] / [(1 − R²)/(N − k)] = [0.98/(3 − 1)] / [(1 − 0.98)/(7 − 3)] = 0.49/0.005 = 98
F_tab = F₀.₀₅(2, 4) = 6.94
Decision: we reject H₀ since F_cal > F_tab. We conclude that the regression is significant: not all βᵢ's are zero.
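For readers who want to check the whole worked example numerically, here is an illustrative end-to-end sketch (not part of the original note) that reproduces the estimates, standard errors, R², t-ratios and F statistic from the Table 3.7.1 data.

```python
import numpy as np
from scipy import stats

# Table 3.7.1 data
X1 = np.array([100, 200, 300, 400, 500, 600, 700], dtype=float)  # fertilizer
X2 = np.array([10, 20, 10, 30, 20, 20, 30], dtype=float)         # rainfall
Y = np.array([40, 50, 50, 70, 65, 65, 80], dtype=float)          # yield
n, k = Y.size, 3                                                 # 3 estimated parameters

X = np.column_stack([np.ones(n), X1, X2])
beta = np.linalg.solve(X.T @ X, X.T @ Y)          # approx [28.10, 0.0381, 0.8333]

e = Y - X @ beta                                  # residuals
sigma2 = e @ e / (n - k)                          # approx 5.36
cov = sigma2 * np.linalg.inv(X.T @ X)             # variance-covariance matrix of beta
se = np.sqrt(np.diag(cov))                        # slope SEs approx 0.0058 and 0.154

r2 = 1 - (e @ e) / np.sum((Y - Y.mean()) ** 2)    # approx 0.981
t_ratios = beta / se                              # t for the fertilizer slope approx 6.5
t_crit = stats.t.ppf(0.975, df=n - k)             # approx 2.776
F = (r2 / (k - 1)) / ((1 - r2) / (n - k))         # approx 105 (98 if R^2 is rounded to 0.98)
F_crit = stats.f.ppf(0.95, dfn=k - 1, dfd=n - k)  # approx 6.94

print(beta, se, r2, t_ratios, t_crit, F, F_crit)
```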
Now we can compute the partial correlation coefficients from the simple, or zero-order, correlation coefficients as follows.