Multiple regression
Multiple Linear Regression

• Multiple regression analysis allows us to explicitly control for many factors that simultaneously affect the dependent variable.
• Let us consider a regression model with two independent variables:
y = β0 + β1x1 + β2x2 + u
• One assumption made here is that the explanatory variables are independent of u, i.e., E(u|x1, x2) = 0.

Example

• Suppose wage is determined by two explanatory variables, education and experience, and other unobservable factors, which are contained in u:
wage = β0 + β1educ + β2exper + u
Exercise
Can you write the zero conditional mean assumption for the wage example?
Ans: Education and experience are independent of u:
E(u|educ, exper) = 0
This means that other factors affecting wage are not related to education and experience, on average.

• The general expression of a multiple linear regression model with k explanatory variables is
y = β0 + β1x1 + β2x2 + … + βk xk + u
• Here, β0 is the intercept, and β1 to βk are the population parameters of the respective variables.
• There are k + 1 unknown population parameters.
• u is the error term or disturbance term. It contains factors other than x1, x2, …, xk that affect y.
• Note: No matter how many explanatory variables we include in the model, there will always be factors we cannot include. These are collectively contained in u.

• The assumption E(u|x1, x2, …, xk) = 0 means that all factors in the unobserved error term must be uncorrelated with the explanatory variables.
• If there is any problem that causes u to be correlated with any of the explanatory variables, this assumption fails to hold.
• This in turn will generate bias in the parameter estimates.
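To make the setup concrete, here is a minimal simulation sketch in Python. The coefficient values, sample size, and noise scale are made up for illustration, and the data are generated so that E(u|educ, exper) = 0 holds by construction:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Simulate wage = b0 + b1*educ + b2*exper + u (illustrative, made-up parameters)
educ = rng.integers(8, 21, size=n).astype(float)   # years of education
exper = rng.integers(0, 31, size=n).astype(float)  # years of experience
u = rng.normal(size=n)                             # error, independent of educ and exper
wage = 1.0 + 0.5 * educ + 0.2 * exper + u

# OLS: regress wage on a constant, educ, and exper
X = np.column_stack([np.ones(n), educ, exper])
beta_hat, *_ = np.linalg.lstsq(X, wage, rcond=None)
print(beta_hat)  # should be close to the true values (1.0, 0.5, 0.2)
```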
Mechanics of obtaining the OLS estimates

• The estimated model with two explanatory variables is written as
ŷ = β̂0 + β̂1x1 + β̂2x2
• This is also called the sample regression function or regression line.
• β̂0, β̂1, β̂2 are the estimates of β0, β1, β2, respectively.
• These estimates are obtained through the ordinary least squares (OLS) method. The OLS method chooses the estimates to minimize the sum of squared residuals.

Suppose there are n observations on y, x1, and x2. Then β̂0, β̂1, β̂2 are chosen simultaneously to make
∑ (yi − β̂0 − β̂1xi1 − β̂2xi2)²
as small as possible.
Exercise: Write the above expression for the wage example.
∑ (wagei − β̂0 − β̂1educi − β̂2experi)²
Exercise: Write the above expression for a k explanatory variable model.
∑ (yi − β̂0 − β̂1xi1 − β̂2xi2 − … − β̂k xik)²

Interpretation of the OLS regression equation

• The estimated model is ŷ = β̂0 + β̂1x1 + β̂2x2
• The intercept β̂0 indicates the predicted value of y when x1 = 0 and x2 = 0.
• The estimates β̂1 and β̂2 measure partial effects and have a ceteris paribus interpretation.
• We can write Δŷ = β̂1Δx1 + β̂2Δx2
• When x2 is held fixed, i.e., Δx2 = 0, we get the predicted change in y when x1 changes as Δŷ = β̂1Δx1
• When x1 is held fixed, i.e., Δx1 = 0, we get the predicted change in y when x2 changes as Δŷ = β̂2Δx2

For a k explanatory variable model

Estimated model: ŷ = β̂0 + β̂1x1 + β̂2x2 + … + β̂k xk
In terms of changes we can write
Δŷ = β̂1Δx1 + β̂2Δx2 + … + β̂kΔxk
The coefficient on xj, i.e., β̂j, measures the change in ŷ due to a unit change in xj, holding the rest of the explanatory variables fixed.
For instance, β̂1 measures the change in ŷ for a one-unit change in x1, holding x2, x3, …, xk fixed:
Δŷ = β̂1Δx1
Thus we have controlled for x2, x3, …, xk when estimating the effect of x1.
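The minimization has a closed-form solution given by the normal equations (X′X)b = X′y. A small Python sketch (the `ols` helper and the data are ours, for illustration):

```python
import numpy as np

def ols(X, y):
    """OLS estimates: solve the normal equations (X'X) b = X'y,
    which minimize the sum of squared residuals sum((y - X @ b)**2)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Made-up data with columns (constant, x1, x2)
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.uniform(8, 20, n), rng.uniform(0, 30, n)])
y = X @ np.array([1.0, 0.5, 0.2]) + rng.normal(size=n)

b = ols(X, y)
u_hat = y - X @ b
print(b, (u_hat ** 2).sum())  # any other b yields a larger sum of squared residuals
```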
Exercise

A researcher has regressed the log of hourly wage on years of education, years of experience, and tenure (i.e., years with the current employer) using a sample of 526 workers. The estimated equation is as follows:
lw = 0.284 + 0.092 educ + 0.004 exper + 0.022 tenure
Interpret the model.

Interpretation of the intercept

The intercept 0.284 indicates the predicted log of wage when the values of all explanatory variables are set to zero.
In this particular instance, the intercept has no meaningful interpretation.

Interpretation of the regression coefficient on education

• The regression coefficient on education is 0.092. This means that there is a positive relationship between log of wage and education. Holding experience and tenure fixed, another year of education is predicted to increase log of wage by 0.092, i.e., to increase wage by about 9.2 percent.
• Intuitively, if we take two people with the same years of experience and tenure, the coefficient on education indicates the difference in their predicted wage when their education differs by one year.

Changing more than one independent variable simultaneously

• When a worker completes another year at the same firm, both experience and tenure change. Holding education fixed, we want to know the effect of such a change on wage.
• How can we find that?
• From the estimated regression line, we can write
Δlw = 0.004 Δexper + 0.022 Δtenure
• Since experience and tenure each change by one unit, the change in log of wage is (0.004 + 0.022) = 0.026. In other words, wage is predicted to increase by 2.6 percent.
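The arithmetic, as a quick Python check. The exact percentage change implied by a log-level model is 100·(exp(Δlw) − 1), which is close to the usual 100·Δlw approximation for small changes:

```python
import math

# Slope estimates from the fitted wage equation
b_exper, b_tenure = 0.004, 0.022

# One more year at the same firm: exper and tenure both increase by 1
d_lw = b_exper * 1 + b_tenure * 1
print(d_lw)                        # 0.026 -> approximately a 2.6% wage increase
print(100 * (math.exp(d_lw) - 1))  # exact implied change: about 2.63%
```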
Testing hypotheses about a single βj

• The population model:
y = β0 + β1x1 + … + βk xk + u
• Our primary interest lies in testing the null hypothesis H0: βj = 0
• The statistic used to test the null hypothesis is known as the t statistic:
t = β̂j / se(β̂j)
which follows a t distribution with n − k − 1 d.f.

OLS fitted values and residuals

• After estimating the OLS regression
ŷ = β̂0 + β̂1x1 + β̂2x2 + … + β̂k xk
we can obtain a fitted value or predicted value for each observation.
• For observation i, the fitted value is
ŷi = β̂0 + β̂1xi1 + β̂2xi2 + … + β̂k xik

Exercise

Find the fitted value from the wage equation when a worker has 12 years of education, 5 years of experience, and 3 years of tenure.
Answer
lw = 0.284 + 0.092(12) + 0.004(5) + 0.022(3) = 1.474

• Normally, the actual value of the dependent variable for any observation will not equal the predicted value, i.e., yi ≠ ŷi.
• The difference between the observed value of the dependent variable, yi, and the predicted value, ŷi, is the residual for observation i:
ûi = yi − ŷi
• If ûi > 0, ŷi is less than yi, i.e., yi is under-predicted.
• If ûi < 0, ŷi is greater than yi, i.e., yi is over-predicted.
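A sketch of the fitted-value arithmetic in Python; the observed log wage of 1.60 used to illustrate a residual is a made-up value:

```python
# Coefficients of the estimated wage equation
b = {"const": 0.284, "educ": 0.092, "exper": 0.004, "tenure": 0.022}

# Fitted value for educ = 12, exper = 5, tenure = 3
lw_hat = b["const"] + b["educ"] * 12 + b["exper"] * 5 + b["tenure"] * 3
print(lw_hat)  # 1.474, matching the answer above

# If this worker's observed log wage were 1.60 (hypothetical),
# the residual would be positive, i.e., the wage is under-predicted.
y_obs = 1.60
print(y_obs - lw_hat)  # u_hat = 0.126 > 0
```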
Properties of OLS fitted values and residuals

• The sample average of the residuals is zero, so the sample average of the fitted values equals the sample average of y.
• The sample covariance between each independent variable and the OLS residuals is zero.
• Consequently, the sample covariance between the OLS fitted values and the OLS residuals is zero.
• The point (x̄1, x̄2, …, x̄k, ȳ) is always on the regression line:
ȳ = β̂0 + β̂1x̄1 + β̂2x̄2 + … + β̂k x̄k

Goodness of fit

• The goodness of fit of an OLS regression model is denoted by R² and is computed as
R² = SSE/SST = 1 − SSR/SST
Where,
• Total sum of squares: SST = ∑ (yi − ȳ)²
• Explained sum of squares: SSE = ∑ (ŷi − ȳ)²
• Residual sum of squares: SSR = ∑ ûi²

By definition, SST = SSE + SSR. Dividing through by SST,
1 = SSE/SST + SSR/SST
1 = R² + SSR/SST
R² = 1 − SSR/SST
• R² is interpreted as the proportion of the sample variation in the dependent variable that is explained by the OLS regression line.
• R² is a unitless number, and it ranges between 0 and 1.
• An important feature of R² is that it never decreases, and usually increases, when another independent variable is added to a regression.

• The sum of squared residuals never increases when additional regressors are added to the model.
• The fact that R² never decreases makes it a poor tool for deciding whether another variable should be added to a model.
• Whether or not another variable should be included in the model should be determined by examining the respective partial effect, i.e., whether the partial effect on y is non-zero.
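The decomposition is easy to verify numerically. A minimal sketch (the helper name `r_squared` and the simulated data are ours); note that SST = SSE + SSR relies on the regression including an intercept:

```python
import numpy as np

def r_squared(y, y_hat):
    """Compute R^2 from the sums of squares defined above."""
    sst = ((y - y.mean()) ** 2).sum()      # total sum of squares
    sse = ((y_hat - y.mean()) ** 2).sum()  # explained sum of squares
    ssr = ((y - y_hat) ** 2).sum()         # residual sum of squares
    assert np.isclose(sst, sse + ssr)      # SST = SSE + SSR (OLS with intercept)
    return 1 - ssr / sst

# Made-up example: regress y on a constant and one x
rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 2.0 + 1.5 * x + rng.normal(size=100)
X = np.column_stack([np.ones(100), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(r_squared(y, X @ b))
```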
Interpretation of R²

• Suppose a fitted model is
cGPA = 1.29 + 0.453 hsGPA + 0.009 ACT
n = 141, R² = 0.176
• The R² is 0.176, which means that hsGPA and ACT (achievement test score) together explain about 17.6 percent of the variation in the college GPA for this sample of students.

Adjusted R²

• R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1)
• Adjusted R² imposes a penalty for adding an additional independent variable to a model.
• Can adjusted R² be negative?
Exercise
Suppose for an estimated model R² = 0.10, n = 51, k = 10. Compute the adjusted R².
• R̄² = 1 − (1 − 0.10)(50/40) = −0.125
• Note: A negative R̄² indicates a very poor fit.

Using adjusted R² to choose between models

• Consider the following two models:
y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + u
y = β0 + β1x1 + β2x2 + β3x3 + β5x5 + v
• Suppose x4 and x5 are highly correlated, and each has a statistically significant effect on y.
• We want to decide which model to choose. Adjusted R² can help us in this respect.
• We will prefer the model that has the greater adjusted R².

• Remember, if we add a new independent variable to a regression, adjusted R² increases if and only if the t statistic on the new variable is greater than 1 in absolute value.
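The exercise as a quick Python check:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(0.10, 51, 10))  # -0.125: a very poor fit
```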
Overall significance of a regression

• The k-independent-variable model is
y = β0 + β1x1 + β2x2 + … + βk xk + u
• The null hypothesis of the test of overall significance is that none of the explanatory variables has an effect on y.
• Stated in terms of parameters,
H0: β1 = β2 = … = βk = 0
• The alternative hypothesis is that at least one of the βj is different from zero.

• The null hypothesis implies that there are k restrictions, and the restricted model becomes
y = β0 + u
• The F statistic is computed as
F = [(SSRr − SSRur)/k] / [SSRur/(n − k − 1)]
where SSRr and SSRur are the sums of squared residuals from the restricted and unrestricted models.
• Alternatively, F = (R²/k) / [(1 − R²)/(n − k − 1)]
• Here R² is obtained from the regression of y on x1, x2, …, xk.
• If the F statistic with (k, n − k − 1) d.f. is insignificant, we fail to reject the null hypothesis. We conclude that there is no evidence that any of the independent variables help to explain y.

Unbiasedness of OLS

Recall the assumptions of multiple linear regression.
1. Linear in parameters
The model in the population can be written as
y = β0 + β1x1 + β2x2 + β3x3 + … + βk xk + u
2. Random sample
We have a random sample of n observations,
{(xi1, xi2, xi3, …, xik, yi): i = 1, 2, 3, …, n}

3. No perfect collinearity
In the sample, none of the independent variables is constant, and there are no exact linear relationships among the independent variables.
Quiz: When does multicollinearity arise?
 Two independent variables are perfectly correlated when one variable is a constant multiple of another.
 Multicollinearity also arises when the sample size (n) is less than the number of parameters (k + 1) to be estimated.
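A sketch of the R²-form of the F statistic, using the GPA regression from the earlier slide (n = 141, k = 2, R² = 0.176):

```python
def f_stat(r2, n, k):
    """Overall-significance F statistic: (R^2/k) / ((1 - R^2)/(n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# GPA regression: n = 141, k = 2, R^2 = 0.176
print(f_stat(0.176, 141, 2))  # about 14.7, compared with F(2, 138) critical values
```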
4. Zero conditional mean
The error u has an expected value of 0 given any values of the independent variables:
E(u|x1, x2, …, xk) = 0

• Under the above four assumptions,
E(β̂j) = βj, j = 1, 2, …, k
• This means that the OLS estimator is an unbiased estimator of the population parameters.
• When we say that OLS is unbiased under the above assumptions, we mean that the procedure by which the OLS estimates are obtained is unbiased.
• In other words, we hope that we have a sample that gives us an estimate close to the population value.

When is the zero conditional mean assumption violated?
 A mis-specified functional relationship between the dependent and independent variables.
 Omitting an important variable that is correlated with any of x1, x2, …, xk causes this assumption to fail.
 If the independent variable(s) are measured with error, then this assumption does not hold.
 If the dependent variable and one or more of the explanatory variables are determined jointly.

Exogenous and endogenous variables

• When the assumption of zero conditional mean holds, we have exogenous explanatory variables.
• If an explanatory variable, say xj, is correlated with the disturbance term u for any reason, then xj is said to be an endogenous variable.
Including irrelevant variables in a regression model

• An irrelevant variable is one that has no partial effect on the dependent variable; that is, the population coefficient of the variable is 0.
• Including irrelevant variables refers to the inclusion of independent variables in a regression model even though they have no partial effect on the dependent variable in the population.

• Consider a model
y = β0 + β1x1 + β2x2 + β3x3 + u
• Suppose x3 has no partial effect on y (i.e., β3 = 0) after x1 and x2 have been controlled for:
E(y|x1, x2, x3) = E(y|x1, x2) = β0 + β1x1 + β2x2
• Because we are unsure, we tend to estimate
ŷ = β̂0 + β̂1x1 + β̂2x2 + β̂3x3

• An obvious question arises: what is the effect of including x3 in the above regression?
• In terms of the unbiasedness of β̂1 and β̂2, there is no effect.
• We can say that
E(β̂1) = β1
E(β̂2) = β2
E(β̂3) = 0

Omission of an important variable

• When an important variable is omitted from a regression model, the model is underspecified.
• This can cause the OLS estimator to be biased.
• Deriving the bias caused by an omitted variable is an example of misspecification analysis.
• Suppose the true population model has two explanatory variables and an error term:
y = β0 + β1x1 + β2x2 + u
• Because of the unavailability of data on x2, we exclude it from the regression model and estimate
ỹ = β̃0 + β̃1x1
(The tilde denotes estimates from the underspecified model.)
• What happens to the unbiasedness of β̃1?
The partial effect of x1, i.e., β̃1, now contains the effect of the omitted variable, since
β̃1 = β̂1 + β̂2 δ̃1
where δ̃1 is the slope coefficient of the regression of x2 on x1. Thus it can be seen that β̃1 differs from β̂1 by the product of
(i) the partial effect of x2 on y, and
(ii) the slope of the regression of x2 on x1.

Taking expectations, noting that δ̃1 is constant (nonrandom) in a given sample:
E(β̃1) = E(β̂1 + β̂2 δ̃1)
= E(β̂1) + E(β̂2) δ̃1
= β1 + β2 δ̃1
• Bias in β̃1 = E(β̃1) − β1 = β2 δ̃1
This is called omitted variable bias.
• The above bias can be zero if β2 = 0 or δ̃1 = 0.
• Even if β2 ≠ 0, the bias can be zero when there is no correlation between x1 and x2 in the sample.

• But when x1 and x2 are correlated, δ̃1 has the same sign as the correlation between x1 and x2. This means:
δ̃1 > 0 if x1 and x2 are positively correlated.
δ̃1 < 0 if x1 and x2 are negatively correlated.
• The sign of the bias depends on the signs of δ̃1 and β2:

         δ̃1 > 0          δ̃1 < 0
β2 > 0   Positive bias    Negative bias
β2 < 0   Negative bias    Positive bias

• The size of the bias is determined by β2 and δ̃1.
• A small bias of either sign need not be a cause of concern.

• β2 is a population parameter which is unknown, so it is difficult to be sure of the sign of β2. But we can make an informed guess about the direction of β2.
• δ̃1 cannot be computed when x2 is unobservable, and accordingly the sign of δ̃1 cannot be known. But again we can form an understanding of the direction of the relationship between x1 and x2.
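A simulation sketch of the bias formula (all numbers made up): with x1 and x2 positively correlated and β2 > 0, the short regression's slope should be biased upward by roughly β2·δ̃1.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
beta0, beta1, beta2 = 1.0, 2.0, 3.0   # made-up true parameters

x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)    # slope of x2 on x1 (delta1) is about 0.5
y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

# Short (underspecified) regression: y on x1 only
X_short = np.column_stack([np.ones(n), x1])
print(np.linalg.lstsq(X_short, y, rcond=None)[0][1])  # about 3.5 = 2 + 3 * 0.5

# Long regression including x2 recovers beta1
X_long = np.column_stack([np.ones(n), x1, x2])
print(np.linalg.lstsq(X_long, y, rcond=None)[0][1])   # close to 2.0
```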
• If E(β̃1) > β1, we say that β̃1 has an upward bias.
• If E(β̃1) < β1, we say that β̃1 has a downward bias.
• If E(β̃1) tends to be closer to zero than β1, then we say that β̃1 is biased towards zero.
• If β1 is negative, β̃1 is biased towards zero if it has an upward bias.
• If β1 is positive, β̃1 is biased towards zero if it has a downward bias.

Exercise

avgscore = β0 + β1expend + β2povrate + u
(avgscore: average score of students; expend: expenditure; povrate: poverty rate)
Note:
• Suppose we have district-wise data on the percentage of children with a passing grade and on per-student expenditures, but we have no information on the poverty rate. So we run the regression of avgscore on expend.
• Can you determine the likely bias in β̃1?

Omitted Variable Bias: More General Cases

• Deriving the sign of omitted variable bias when there are multiple regressors in the estimated model is more difficult.
• We must remember that correlation between a single explanatory variable and the error generally results in all OLS estimators being biased.

The variance of the OLS estimators

• The variance of an OLS estimator is given by
var(β̂j) = σ² / [SSTj (1 − Rj²)]
• Where σ² is the variance of the error term u.
• SSTj = ∑ (xij − x̄j)² is the total sample variation in xj.
• Rj² is the R-squared from regressing xj on all other independent variables.
OLS assumption about the error variance, σ²

• Assumption 5: The error u has the same variance given any values of the explanatory variables, i.e.,
var(u|x1, x2, …, xk) = σ²
• This is called the homoskedasticity assumption.
• If this assumption fails, then the model exhibits heteroskedasticity.

Example

• Consider a model
wage = β0 + β1educ + β2exper + β3tenure + u
• The homoskedasticity assumption requires that the variance of the unobserved error u does not depend on the levels of education, experience, or tenure:
var(u|educ, exper, tenure) = σ²
• If this variance changes with any of the three explanatory variables, then heteroskedasticity is present.

Components of the OLS variance

var(β̂j) = σ² / [SSTj (1 − Rj²)]

 The error variance, σ²:
• If σ² is large, the variance of the OLS estimator will also be large, which makes it difficult to estimate the partial effect of any of the regressors on y.
• Importantly, σ² is a population characteristic; it has nothing to do with the sample size.
• One way to reduce the error variance σ² is to add more explanatory variables to the equation so that some factors are taken out of the error term.

 Total sample variation, SSTj = ∑ (xij − x̄j)²:
• The larger the total variation in xj, the smaller the variance of the OLS estimator. Therefore more variation in xj in the sample is preferable.
• One way of increasing the sampling variation is to increase the size of the sample. SSTj tends to be large when the sample size increases.
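A sketch that computes the pieces of var(β̂j) directly (the helper name `ols_var` and the data are ours; σ² is taken as known here rather than estimated). It shows how a regressor that is nearly a linear function of another (Rj² near 1) inflates the variance:

```python
import numpy as np

def ols_var(xj, X_others, sigma2):
    """var(beta_j_hat) = sigma^2 / (SST_j * (1 - R_j^2))."""
    sst_j = ((xj - xj.mean()) ** 2).sum()
    # R_j^2: regress x_j on an intercept and the other regressors
    Z = np.column_stack([np.ones(len(xj)), X_others])
    g, *_ = np.linalg.lstsq(Z, xj, rcond=None)
    r2_j = 1 - ((xj - Z @ g) ** 2).sum() / sst_j
    return sigma2 / (sst_j * (1 - r2_j))

rng = np.random.default_rng(4)
n = 1000
x2 = rng.normal(size=n)
x1_collinear = 0.9 * x2 + 0.1 * rng.normal(size=n)  # R_1^2 near 1
x1_indep = rng.normal(size=n)                       # R_1^2 near 0

print(ols_var(x1_collinear, x2.reshape(-1, 1), sigma2=1.0))  # large (about 0.1)
print(ols_var(x1_indep, x2.reshape(-1, 1), sigma2=1.0))      # small (about 0.001)
```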
 The linear relationships among the independent variables, Rj²:
• We know that R-squared measures goodness of fit.
• A value of Rj² close to 1 would mean that xj is strongly associated with the rest of the independent variables.
• As Rj² tends to 1, the variance of the OLS estimator becomes larger and larger.

Variance in a mis-specified model

• Let the true population model be
y = β0 + β1x1 + β2x2 + u
• Now let us consider two estimators of β1. One comes from the following model:
ŷ = β̂0 + β̂1x1 + β̂2x2
• The other comes from a mis-specified model:
ỹ = β̃0 + β̃1x1

Note:
• For the true model: var(β̂1) = σ² / [SST1 (1 − R1²)]
• For the mis-specified model: var(β̃1) = σ² / SST1
• Comparing the above two variances, it can be seen that var(β̃1) is smaller than var(β̂1).
• But if x1 and x2 are uncorrelated, both estimators have the same variance.
• However, if x1 and x2 are correlated:
 When β2 ≠ 0, β̃1 is biased, β̂1 is unbiased, and var(β̃1) < var(β̂1).
 When β2 = 0, β̃1 and β̂1 are both unbiased, and var(β̃1) < var(β̂1).

• Intuitively, if x2 does not have a partial effect on y (β2 = 0), then including it in the model can only exacerbate the multicollinearity problem, which leads to an inefficient estimator of β1.
• Also, if β2 = 0, we may prefer β̃1 over β̂1 because β̃1 is unbiased and has the smaller variance.
• On the contrary, when β2 ≠ 0, we prefer β̂1 because it is unbiased, although its variance is higher (it is inefficient).
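A Monte Carlo sketch of the β2 = 0 case (all values made up): both estimators of β1 are centered on the truth, but the short model's estimator has the smaller sampling variance when x1 and x2 are correlated.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 200, 2000
b1_short, b1_long = [], []

for _ in range(reps):
    x2 = rng.normal(size=n)
    x1 = 0.7 * x2 + rng.normal(size=n)                  # x1 and x2 correlated
    y = 1.0 + 2.0 * x1 + 0.0 * x2 + rng.normal(size=n)  # beta2 = 0: x2 is irrelevant

    Xs = np.column_stack([np.ones(n), x1])
    Xl = np.column_stack([np.ones(n), x1, x2])
    b1_short.append(np.linalg.lstsq(Xs, y, rcond=None)[0][1])
    b1_long.append(np.linalg.lstsq(Xl, y, rcond=None)[0][1])

# Both means are near 2.0 (unbiased); the short model's variance is smaller
print(np.mean(b1_short), np.var(b1_short))
print(np.mean(b1_long), np.var(b1_long))
```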
Internal validity and external validity

• Internal and external validity distinguish between the population and setting studied and the population and setting to which the results are generalized.
• 'Setting' means the institutional, legal, social, physical, and economic environment.
• A statistical analysis is said to have internal validity if the statistical inferences are valid for the population being studied.
• The analysis is said to have external validity if its inferences and conclusions can be generalized from the population and setting studied to other populations and settings.

[Diagram: Statistical analysis/inference is drawn from a sample of the population and setting studied. Internal validity: the inferences are valid for the population being studied. External validity: the inferences and conclusions can be generalized to other populations and settings (the population and setting of interest). Threat to external validity: differences between the population studied and the population of interest. Threats to internal validity: 1. Misspecification of the functional form; 2. Omitted variable bias; 3. Measurement error in the regressors; 4. Simultaneous causality; 5. Sample selection bias.]