Econometrics Chapter 3

Uploaded by Ibrahin Abdi

CHAPTER 3

Multiple Linear Regression Analysis: Further Analysis
3.1 Multivariate Case of CLRM
 When we use more than one explanatory variable in an
econometric model, we call it a multiple regression model.
 The three-variable PRF can be written as:
𝑌𝑖 = 𝛽1 + 𝛽2𝑋2𝑖 + 𝛽3𝑋3𝑖 + 𝑢𝑖 , where:
Y is the dependent variable, 𝑋2 and 𝑋3 are the explanatory
variables, and 𝑢𝑖 is the stochastic disturbance term.
 𝛽1 is the intercept, which measures the average value of 𝑌 when
𝑋2 and 𝑋3 are excluded from the model (or set to zero).
 𝛽2 and 𝛽3 are the partial slope coefficients, interpreted as:
 𝛽2 measures the change in the mean value of Y for a unit change
in X2, holding the effect of X3 constant.
 𝛽3 measures the change in the mean value of Y for a unit change
in X3, holding the effect of X2 constant.
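The interpretation of the partial slopes can be checked numerically. A minimal sketch (simulated data, not from the chapter; all coefficient values below are illustrative assumptions): generate Y from known β's, fit by OLS, and confirm the slope estimates recover the per-unit partial effects.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X2 = rng.normal(10, 2, n)
X3 = rng.normal(5, 1, n)
u = rng.normal(0, 1, n)                      # stochastic disturbance term
Y = 2.0 + 1.5 * X2 - 0.8 * X3 + u            # true b1=2.0, b2=1.5, b3=-0.8 (assumed)

X = np.column_stack([np.ones(n), X2, X3])    # design matrix with intercept column
b_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(b_hat)                                 # estimates close to [2.0, 1.5, -0.8]
```

Here b_hat[1] estimates the change in the mean of Y for a unit change in X2 with X3 held fixed, matching the interpretation above.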
Assumptions of Classical Linear Regression Models

i. Linear in the parameters
ii. The values of the explanatory variables are independent of
the error term
 Means zero covariance between the Xs and the error term (u):
 Cov(X2, u) = Cov(X3, u) = 0
iii. Zero mean value of the error term (u), i.e., E(u) = 0
iv. Homoscedasticity, or constant variance of the error term (u)
v. No autocorrelation between successive values of the error
term
vi. No exact collinearity between the explanatory variables
Estimation of multiple Regression Coefficients

 The sample regression function (SRF) corresponding to the
above PRF is written as follows:
𝑌𝑖 = 𝛽̂1 + 𝛽̂2𝑋2𝑖 + 𝛽̂3𝑋3𝑖 + 𝑢̂𝑖
 The OLS procedure requires obtaining estimators so that the
residual sum of squares (RSS) is as small as possible.
 Differentiate RSS with respect to the unknowns, set the
resulting expressions to zero, and solve them
simultaneously.
 From the resulting normal equations we can obtain:
𝛽̂1 = 𝑌̄ − 𝛽̂2𝑋̄2 − 𝛽̂3𝑋̄3
Variances and Standard Errors of OLS Estimators

 Having obtained the OLS estimators, we can derive
their variances and standard errors.

The Multiple Coefficient of Determination R2
and the Multiple Coefficient of Correlation R

 It is the proportion of the variation in the dependent
variable (Y) explained by the variables X2 and X3 jointly.
 To derive R², let’s use the SRF:
𝑌𝑖 = 𝛽̂1 + 𝛽̂2𝑋2𝑖 + 𝛽̂3𝑋3𝑖 + 𝑢̂𝑖
= 𝑌̂𝑖 + 𝑢̂𝑖 , where 𝑌̂𝑖 is the estimated value of 𝑌𝑖
from the fitted regression line.
 Squaring, summing, and using deviation form:
∑𝑦² = ∑𝑦̂² + ∑𝑢̂² + 2∑𝑦̂𝑢̂
= ∑𝑦̂² + ∑𝑢̂²   (since ∑𝑦̂𝑢̂ = 0)
 TSS = ESS + RSS
 Substituting for ∑𝑢̂² and rearranging:
ESS = ∑𝑦̂² = 𝛽̂2∑𝑦𝑖𝑥2 + 𝛽̂3∑𝑦𝑖𝑥3
 Now,
𝑅² = ESS/TSS = (𝛽̂2∑𝑦𝑖𝑥2 + 𝛽̂3∑𝑦𝑖𝑥3) / ∑𝑦²
 Equivalently,
𝑅² = 1 − ∑𝑢̂² / ∑𝑦²
Example

 Consider the following summary of data on per capita


food consumption(Y), price of food( X2 ) and per capita
income( X3 ) for the years 1927-1941 in a country.
 n = 15, ∑𝑥2𝑦 = 27.63, ∑𝑥3𝑦 = 257.397,
∑𝑥2𝑥3 = 275.9, ∑𝑥2² = 355.14, ∑𝑥3² = 838.286,
∑𝑦² = 99.929, 𝑌̄ = 88.90667, 𝑋̄2 = 85.9, 𝑋̄3 = 56.2
 Note: the lower-case letters represent the deviation form:
𝑥2𝑦 = (𝑋2 − 𝑋̄2)(𝑌 − 𝑌̄), 𝑥3𝑦 = (𝑋3 − 𝑋̄3)(𝑌 − 𝑌̄),
𝑥2 = 𝑋2 − 𝑋̄2, 𝑥3 = 𝑋3 − 𝑋̄3, 𝑦 = 𝑌 − 𝑌̄
a) Fit the regression line that represents the relation
b) Calculate the percentage (%) of variation in food
consumption explained by price and income
c) Estimate the error variance
d) Calculate the sample correlations between x2 and x3 (r23),
y and x2 (r12), and y and x3 (r13)
e) Test the significance of the regression coefficients (use
tc = 3.055)
f) Test for model adequacy (use F0.05(2,13) = 3.89)
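Parts (a)–(d) can be worked directly from the summary sums using the deviation-form normal equations for the three-variable model. A sketch (the formulas are standard; the numbers come straight from the exercise):

```python
# Summary sums from the exercise (deviation form)
n = 15
Sx2y, Sx3y = 27.63, 257.397
Sx2x3, Sx2_2, Sx3_2 = 275.9, 355.14, 838.286
Sy2 = 99.929
Ybar, X2bar, X3bar = 88.90667, 85.9, 56.2

# Deviation-form OLS solutions for the two partial slopes
denom = Sx2_2 * Sx3_2 - Sx2x3 ** 2
b2 = (Sx2y * Sx3_2 - Sx3y * Sx2x3) / denom   # slope on price (X2)
b3 = (Sx3y * Sx2_2 - Sx2y * Sx2x3) / denom   # slope on income (X3)
b1 = Ybar - b2 * X2bar - b3 * X3bar          # intercept

# (b) R^2 from ESS = b2*sum(x2 y) + b3*sum(x3 y)
ESS = b2 * Sx2y + b3 * Sx3y
R2 = ESS / Sy2

# (c) error variance: RSS / (n - k), with k = 3 parameters
sigma2_hat = (Sy2 - ESS) / (n - 3)

# (d) sample correlations
r23 = Sx2x3 / (Sx2_2 * Sx3_2) ** 0.5
r12 = Sx2y / (Sx2_2 * Sy2) ** 0.5
r13 = Sx3y / (Sx3_2 * Sy2) ** 0.5

print(b1, b2, b3)          # fitted line: Y-hat = b1 + b2*X2 + b3*X3
print(R2, sigma2_hat)
print(r23, r12, r13)
```

The slope on price comes out negative and the slope on income positive, with R² around 0.91, i.e. price and income jointly explain roughly 91% of the variation in food consumption.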
Model Specification

 What are the important considerations when choosing a
model?
 What are the consequences of choosing the wrong model?
 Are there ways of assessing whether a model is adequate?
 Three essential features of model choice are:
 choice of functional form,
 choice of explanatory variables to be included in the model,
 whether the multiple regression model assumptions MR1–
MR6 hold.
Model Selection Criteria

 A model chosen for empirical analysis should satisfy the
following criteria (Hendry and Richard, 1983):
1) Data admissible:- predictions made from the model
must be logically possible
2) Consistent with theory:- the model must make good
economic sense
3) Regressors are weakly exogenous:- explanatory
variables must be uncorrelated with the error term
4) Exhibit parameter constancy:- values of the parameters
should be stable; otherwise, forecasting will be difficult
5) Exhibit data coherency:- the residuals estimated from
the model must be purely random (white noise)
 In conjunction with the above considerations, the adjusted R²,
AIC and SIC (BIC) should be used.
6) The Adjusted Coefficient of Determination (𝑅̄²)
 Note that 𝑅² = 1 − RSS/TSS, a measure of goodness of fit.
 It shows the proportion of variation in the dependent
variable explained by the explanatory variables.
 While it is desirable to have a model that fits the data well,
there can be a tendency to chase the highest R² value.
 The problem with R² is that it can be made larger by adding
more and more variables, even if the added variables
have no justification.
 As variables are added, RSS goes down, and thus R² goes up.
 To overcome this problem we adjust by the
degrees of freedom (df):
 Adjusted R²: 𝑅̄² = 1 − (RSS/(N − k)) / (TSS/(N − 1))
 This measure does not always go up when a variable is
added, because of the degrees-of-freedom term N − k in the
numerator; it increases only when the t-value of the added
variable’s coefficient is greater than one.
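The df penalty can be seen with a small sketch (the RSS/TSS numbers below are made up for illustration): adding a variable always lowers RSS a little, yet adjusted R² can still fall.

```python
def adjusted_r2(RSS, TSS, N, k):
    """R-bar^2 = 1 - (RSS/(N-k)) / (TSS/(N-1))."""
    return 1 - (RSS / (N - k)) / (TSS / (N - 1))

# Hypothetical fit: N = 30 observations, TSS = 100.
base = adjusted_r2(40.0, 100.0, 30, 3)   # model with k = 3 parameters
more = adjusted_r2(39.5, 100.0, 30, 4)   # add a variable; RSS barely falls
print(base, more)                        # adjusted R^2 goes DOWN despite lower RSS
```

Plain R² would have risen here (0.60 to 0.605); the adjustment reverses the ranking because the small drop in RSS does not justify the lost degree of freedom.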
Information Criteria

 The Akaike information criterion (AIC) is given by
AIC = ln(RSS/N) + 2k/N
 and the Schwarz criterion (SC), also known as the BIC, by
SIC = ln(RSS/N) + k·ln(N)/N
 Prefer the model with the smallest AIC or the smallest SIC.
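The two criteria differ only in how they price an extra parameter: 2/N for AIC versus ln(N)/N for SIC, so SIC penalizes more heavily whenever ln(N) > 2 (N > 7). A sketch with hypothetical RSS values:

```python
import math

def aic(RSS, N, k):
    # AIC = ln(RSS/N) + 2k/N
    return math.log(RSS / N) + 2 * k / N

def sic(RSS, N, k):
    # SIC = ln(RSS/N) + k*ln(N)/N
    return math.log(RSS / N) + k * math.log(N) / N

# Hypothetical comparison of a 3- vs 4-parameter model on N = 50:
print(aic(40.0, 50, 3), sic(40.0, 50, 3))
print(aic(38.0, 50, 4), sic(38.0, 50, 4))   # smaller value wins in each column
```

With these illustrative numbers the extra parameter lowers AIC but the verdict under SIC is closer, exactly because its per-parameter penalty ln(50)/50 ≈ 0.078 exceeds AIC's 2/50 = 0.04.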
Types of Specification Error

1) Omitted-Variable Bias (Underfitting a Model)
 Assume that on the basis of the criteria we arrive at a
model that we accept as a good model.
 Let this model be: 𝑦 = 𝛽1 + 𝛽2𝑥 + 𝛽3𝑥² + 𝛽4𝑥³ + 𝑢
 Suppose for some reason a researcher decides to use the
following model instead: 𝑦 = 𝛼1 + 𝛼2𝑥 + 𝛼3𝑥² + 𝜀
 In this case there is a specification error, the error
consisting in omitting a relevant variable.
 Therefore, the error term is:
𝜀 = 𝑢 + 𝛽4𝑥³
 Omission of a relevant variable leads to an estimator that
is biased, known as omitted-variable bias.
 Example: a regression that analyzes the impact of the
husband’s (Heduc) and wife’s (Weduc) years of education
on family income:
𝐹𝑎𝑚𝑖𝑛𝑐̂ = −5534 + 3132·Heduc + 4523·Weduc
 An additional year of education for the husband will
increase annual family income by $3,132, and
 an additional year of education for the wife will increase
family income by $4,523.
 When the wife’s education is incorrectly omitted from the
equation, the estimated equation becomes:
𝐹𝑎𝑚𝑖𝑛𝑐̂ = −26191 + 5155·Heduc
 Omitting Weduc leads us to overstate the effect of an
extra year of education for the husband by $2,023.
 This change in the magnitude of the coefficient is the effect
of incorrectly omitting the relevant variable Weduc.
 To give a general expression for this bias, consider:
𝑦 = 𝛽0 + 𝛽1𝑥1 + 𝛽2𝑥2 + 𝑒
 Let 𝑏1* be the estimator of 𝛽1 when 𝑥2 is omitted from the
equation. Then:
bias(𝑏1*) = 𝐸(𝑏1*) − 𝛽1 = 𝛽2 · cov̂(𝑥1, 𝑥2) / var̂(𝑥1)
 The sign of 𝛽2 and the sign of cov(𝑥1, 𝑥2) tell us the
direction of the bias.
 If the sample covariance (or sample correlation)
between 𝑥1 and the omitted variable 𝑥2 is zero, then
the least squares estimator in the misspecified model is
still unbiased.
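The bias formula can be verified by simulation. A sketch (all numbers below are assumptions chosen for illustration): generate data from a known two-regressor model, omit x2, and compare the resulting slope distortion with β2·cov(x1, x2)/var(x1).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x1 = rng.normal(0, 1, n)
x2 = 0.5 * x1 + rng.normal(0, 1, n)                    # correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(0, 1, n)    # true b1 = 2, b2 = 3 (assumed)

# Misspecified regression of y on x1 alone (x2 omitted):
X = np.column_stack([np.ones(n), x1])
b_star = np.linalg.lstsq(X, y, rcond=None)[0]

# Bias predicted by the formula: beta2 * cov(x1, x2) / var(x1)
predicted = 3.0 * np.cov(x1, x2)[0, 1] / np.var(x1, ddof=1)
print(b_star[1] - 2.0, predicted)   # both near 3.0 * 0.5 = 1.5
```

The omitted-variable slope absorbs the part of x2's effect that is correlated with x1, so the observed distortion matches the formula's prediction.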
Consequences of Under fitting a model

 If the omitted variable 𝑥2 is correlated with the included
variable 𝑥1 (that is, the correlation coefficient between the
two variables is nonzero), the estimators of 𝛽0 and 𝛽1 are
biased as well as inconsistent.
 Even if 𝑥1 and 𝑥2 are not correlated, the estimator of 𝛽0 is
biased, although that of 𝛽1 is now unbiased.
 The error variance σ² is incorrectly estimated.
 The conventionally estimated variance of the estimator is a
biased estimator of the variance of the true-model estimator.
 Confidence-interval and hypothesis-testing procedures are
likely to give misleading conclusions about the statistical
significance of the estimated parameters.
 Forecasts based on the incorrect model, and the
forecast (confidence) intervals, will be unreliable.
2) Irrelevant Variables(Overfitting a model)

 Suppose that another researcher uses the following
model:
𝑦 = 𝛼1 + 𝛼2𝑥 + 𝛼3𝑥² + 𝛼4𝑥³ + 𝛼5𝑥⁴ + 𝜖
 This model constitutes a specification error, the error here
consisting in including an unnecessary or irrelevant
variable, in the sense that the true model assumes 𝛼5 to
be zero.
 The new error term is: 𝜖 = 𝑢 + 𝛼5𝑥⁴
 Inclusion of irrelevant variables may inflate the variances
of your estimates.
Consequences of Overfitting a model

 The OLS estimators of the overfitted model are all
unbiased and consistent.
 The error variance σ² is correctly estimated.
 Confidence-interval and hypothesis-testing procedures
remain valid.
 However, the estimated α’s will be inefficient; that is, their
variances will be larger than those of the true model.
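The unbiased-but-inefficient result can be seen in a small Monte Carlo sketch (the data-generating values below are assumptions for illustration): fit the true one-regressor model and an overfitted model that adds an irrelevant but correlated regressor, and compare the sampling distributions of the slope.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 2000
est_true, est_over = [], []
for _ in range(reps):
    x1 = rng.normal(0, 1, n)
    x2 = 0.8 * x1 + rng.normal(0, 0.6, n)       # irrelevant but correlated regressor
    y = 1.0 + 2.0 * x1 + rng.normal(0, 1, n)    # x2 plays no role in the truth
    X_t = np.column_stack([np.ones(n), x1])          # correctly specified
    X_o = np.column_stack([np.ones(n), x1, x2])      # overfitted
    est_true.append(np.linalg.lstsq(X_t, y, rcond=None)[0][1])
    est_over.append(np.linalg.lstsq(X_o, y, rcond=None)[0][1])

# Both slope estimators are centered on the true value 2 (unbiased),
# but the overfitted model's estimator has the larger variance (inefficiency).
print(np.mean(est_true), np.mean(est_over))
print(np.var(est_true), np.var(est_over))
```

The variance inflation grows with the correlation between the included and the irrelevant regressor, which is why harmless-looking extra variables can still cost precision.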