5ssmn932 Lecture4 2021 Collated
Dragos Radu
[email protected]
SLR.1: y = β₀ + β₁·x + u
SLR.2: random sampling from the population
SLR.3: some sample variation in the xᵢ
SLR.4: E(u|x) = 0
SLR.5: Var(u|x) = Var(u) = σ²
• under these assumptions the OLS estimator has the smallest variance among all linear unbiased estimators and is therefore BLUE (Best Linear Unbiased Estimator). This is the Gauss-Markov theorem.
back to our question
TestScore = β₀ + β₁ · STR + u
------------------------------------------------------------------------------
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -2.279808 .4798256 -4.75 0.000 -3.22298 -1.336637
_cons | 698.933 9.467491 73.82 0.000 680.3231 717.5428
------------------------------------------------------------------------------
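The output above can be reproduced with a single command. A minimal sketch, assuming the California Schools data are in memory with the variable names testscr and str used throughout these slides:
. regress testscr str    // simple regression of district test scores on the student-teacher ratio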
do smaller classes promote learning?
can an omitted variable affect our result?
immigrants in California
English learners in Californian schools
do smaller classes promote learning?
class size and % of English learners
your turn: cross tabulation in Stata
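A sketch of what that exercise could look like; the 20-student and 10% cutoffs below are illustrative assumptions, not given in the slides:
. gen small = (str < 20) if !missing(str)        // dummy for small classes
. gen hi_el = (el_pct > 10) if !missing(el_pct)  // dummy for a high share of English learners
. tabulate small hi_el, row                      // cross tabulation with row percentages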
overview: where are we going from here?
Stata example:
Determine OVB in our simple regression of test score on class size
(separate video).
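A minimal Stata sketch of that exercise, assuming the testscr, str and el_pct variables: compare the short and long regressions and link them through the auxiliary regression of the omitted variable on the included one.
. regress testscr str              // short regression: omits el_pct
. regress testscr str el_pct       // long regression: includes the omitted variable
. regress el_pct str               // auxiliary regression: relation between el_pct and str
. * the short-regression slope on str equals the long-regression slope on str
. * plus (long-regression coefficient on el_pct) x (auxiliary slope of el_pct on str)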
test scores and class size
is the % of English learners a confounder?
two conditions for OVB
class size and % of English learners
why does OVB arise?
we would need to think about a second variable in our regression
how can we assess the OVB?
% of English learners: a determinant of TestScr and related to STR,
then ρ_Xu ≠ 0 and the OLS estimator β̂₁ is biased and not consistent
OVB in our simple regression
TestScore = β₀ + β₁ · STR + u
β̂₁ →ᵖ β₁ + ρ_Xu · (σ_u / σ_X)    (convergence in probability)
In our test score example:
1. English language ability (whether the student has English as a second
language) plausibly affects standardized test scores:
Z is a determinant of Y.
2. Immigrant communities tend to be less affluent and thus have smaller
school budgets and higher STR:
Z is correlated with X .
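Both conditions can be checked informally in the data. A minimal sketch, assuming the same variable names:
. correlate testscr str el_pct    // el_pct should be related to testscr (condition 1) and to str (condition 2)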
with this we extend the TestScore equation we used for simple regression:
------------------------------------------------------------------------------
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -2.279808 .4798256 -4.75 0.000 -3.22298 -1.336637
_cons | 698.933 9.467491 73.82 0.000 680.3231 717.5428
------------------------------------------------------------------------------
multiple regression result
TestScore = β₀ + β₁ · STR + β₂ · PctEL + u
------------------------------------------------------------------------------
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -1.101296 .3802783 -2.90 0.004 -1.848797 -.3537945
el_pct | -.6497768 .0393425 -16.52 0.000 -.7271112 -.5724423
_cons | 686.0322 7.411312 92.57 0.000 671.4641 700.6004
------------------------------------------------------------------------------
the multiple linear regression model
Yᵢ = β₀ + β₁ · X₁ᵢ + β₂ · X₂ᵢ + uᵢ
Y = β₀ + β₁ · X₁ + β₂ · X₂ + u
consider changing X₁ by ΔX₁ while holding X₂ constant
• before the change: Y = β₀ + β₁ · X₁ + β₂ · X₂
• after the change: Y + ΔY = β₀ + β₁ · (X₁ + ΔX₁) + β₂ · X₂
• taking the difference (after minus before): ΔY = β₁ · ΔX₁
β₁ = ΔY/ΔX₁, holding X₂ constant
β₂ = ΔY/ΔX₂, holding X₁ constant
------------------------------------------------------------------------------
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -1.101296 .3802783 -2.90 0.004 -1.848797 -.3537945
el_pct | -.6497768 .0393425 -16.52 0.000 -.7271112 -.5724423
_cons | 686.0322 7.411312 92.57 0.000 671.4641 700.6004
------------------------------------------------------------------------------
predicted TestScore = 686.0 − 1.10 × STR − 0.65 × PctEL
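Reading off these coefficients, a quick worked example (the two-student reduction is just an assumed scenario, not from the slides): cutting STR by 2 while holding PctEL constant changes the predicted score by ΔTestScore = β̂₁ · ΔSTR = −1.10 × (−2) ≈ 2.2 points.
. display -1.10*(-2)    // about 2.2: predicted gain from reducing STR by 2, PctEL held fixed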
describing qualitative information
wage = β₀ + δ₀ · female + u
------------------------------------------------------------------------------
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | -3.531853 .3656696 -9.66 0.000 -4.249714 -2.813992
_cons | 12.34696 .2643634 46.70 0.000 11.82797 12.86594
------------------------------------------------------------------------------
t_female = −9.66
comparison of means
wage = β₀ + δ₀ · female
the estimate δ̂₀ = −3.53 does not control for factors that should affect wage, such as workforce experience and schooling, which could explain the difference in average wages.
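The comparison-of-means reading of the dummy-only regression can be checked directly. A minimal Stata sketch, assuming a wage dataset containing the wage and female variables used above:
. ttest wage, by(female)    // difference in mean wages between men and women
. regress wage female       // coefficient on female equals that difference; _cons is the mean for men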
if we just control for experience, the model in expected value is:
E(wage | female, exper) = β₀ + δ₀ · female + β₁ · exper
where now δ₀ measures the gender difference in wage when we hold exper fixed.
dummy in multiple regression
wage = β₀ + δ₀ · female + β₁ · exper
we impose a common slope on exper for men and women, β₁ = .333 in this example;
only the intercepts are allowed to differ.
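A minimal sketch of how this common-slope model would be estimated in Stata, assuming the same wage data:
. regress wage female exper    // common slope on exper, separate intercepts for women and men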
intercept shift
[Figure: wage plotted against exper for the model wage = β₀ + δ₀ · female + β₁ · exper with δ₀ < 0; two parallel lines for men and women with common slope .333 and an annotated difference of 2.99 between them.]
------------------------------------------------------------------------------
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | -2.457225 .3546709 -6.93 0.000 -3.153497 -1.760954
exper | .4158217 .0651616 6.38 0.000 .2878998 .5437436
coll | .8004933 .0734492 10.90 0.000 .6563015 .944685
_cons | 5.785301 .803433 7.20 0.000 4.208042 7.36256
------------------------------------------------------------------------------
goodness of fit
SER and RootMSE
as in regression with a single regressor, the SER and the RootMSE are
measures of the spread of the Ys around the regression line.
SER = √[ (1 / (n − k − 1)) · Σᵢ₌₁ⁿ ûᵢ² ]
RootMSE = √[ (1 / n) · Σᵢ₌₁ⁿ ûᵢ² ]
for SER we apply a degrees-of-freedom correction: with n observations and k regressors we have n − k − 1 degrees of freedom, because we estimate k slope coefficients and the intercept.
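A minimal Stata sketch of the two measures, assuming the multiple regression above and the results that regress stores in e(rss), e(df_r) and e(N); note that the Root MSE Stata reports applies the degrees-of-freedom correction, so it corresponds to the SER here:
. quietly regress testscr str el_pct
. display sqrt(e(rss)/e(df_r))    // SER: divides the residual sum of squares by n - k - 1
. display sqrt(e(rss)/e(N))       // RootMSE: divides by n, no correction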
more on goodness of fit
R² and R̄² (adjusted R²)
R² = ESS / TSS = 1 − SSR / TSS
We need to think about a different goodness-of-fit measure because the usual R² never decreases when one or more variables are added to a regression.
Sometimes we want to compare across models that have different numbers of explanatory variables but where one is not a special case of the other. It is useful to have a goodness-of-fit measure that penalizes adding additional explanatory variables. (The usual R² has no penalty.)
more on goodness of fit: adjusted R²
• the R̄² “penalises” us for including another regressor.
• R̄² does not necessarily increase when we add another regressor.
• the adjusted R-squared, also called “R-bar-squared”:
R̄² = 1 − [SSR / (n − k − 1)] / [TSS / (n − 1)]
• when more regressors are added, SSR falls, but so do the degrees of freedom df = n − k − 1; R̄² can increase or decrease.
• for k ≥ 1, R̄² < R² unless SSR = 0 (not an interesting case).
In addition, it is possible that R̄² < 0, especially if df is small.
Remember that R² ≥ 0 always.
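A minimal Stata sketch of this formula, assuming the test-score regression above; TSS is recovered as e(mss) + e(rss), and the hand calculation should match the adjusted R-squared that regress stores in e(r2_a):
. quietly regress testscr str el_pct
. display 1 - (e(rss)/e(df_r)) / ((e(mss) + e(rss))/(e(N) - 1))    // adjusted R-squared by hand
. display e(r2_a)                                                   // Stata's stored value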
simple regression result
TestScore = β₀ + β₁ · STR + u
------------------------------------------------------------------------------
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -2.279808 .4798256 -4.75 0.000 -3.22298 -1.336637
_cons | 698.933 9.467491 73.82 0.000 680.3231 717.5428
------------------------------------------------------------------------------
multiple regression result
TestScore = β₀ + β₁ · STR + β₂ · PctEL + u
------------------------------------------------------------------------------
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -1.101296 .3802783 -2.90 0.004 -1.848797 -.3537945
el_pct | -.6497768 .0393425 -16.52 0.000 -.7271112 -.5724423
_cons | 686.0322 7.411312 92.57 0.000 671.4641 700.6004
------------------------------------------------------------------------------
what comes next?
1 The conditional distribution of u given the Xs has mean zero, that is,
E(u | X₁, ..., Xₖ) = E(u) = 0
2 random sampling from the population
3 large outliers are unlikely
4 no perfect multicollinearity
no perfect multicollinearity
------------------------------------------------------------------------------
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -2.279808 .4798256 -4.75 0.000 -3.22298 -1.336637
str | 0 (omitted)
_cons | 698.933 9.467491 73.82 0.000 680.3231 717.5428
------------------------------------------------------------------------------
in such a regression we would ask: what is the effect on TestScore of a unit change in STR, holding STR constant? (a logical impossibility)
perfect multicollinearity
dummy variable trap
. gen small = (str < 20)    // dummy: 1 if the student-teacher ratio is below 20
. gen large = 1 - small     // complementary dummy: 1 for larger classes
------------------------------------------------------------------------------
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
small | 7.37241 1.843475 4.00 0.000 3.748774 10.99605
large | 0 (omitted)
_cons | 649.9788 1.387717 468.38 0.000 647.2511 652.7066
------------------------------------------------------------------------------
dummy variable trap
suppose you have a set of multiple binary (dummy) variables, which are
mutually exclusive and exhaustive – that is, there are multiple categories
and every observation falls in one and only one category
(e.g. small or large):
• if you include all these dummy variables and a constant, you will have
perfect multicollinearity – this is the dummy variable trap.
• why is there perfect multicollinearity here? (small + large = 1 for every observation, so the two dummies together are perfectly collinear with the constant regressor)
• solution to the dummy variable trap: omit one of the groups
(e.g. large)
• how do we interpret the coefficients?
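A minimal Stata sketch of the two standard ways out of the trap, assuming the small and large dummies generated above (output for the first option is shown below):
. regress testscr small                      // omit the base group: _cons is the mean score in large classes
. regress testscr small large, noconstant    // alternative: keep both dummies but drop the constant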
------------------------------------------------------------------------------
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
small | 7.37241 1.843475 4.00 0.000 3.748774 10.99605
_cons | 649.9788 1.387717 468.38 0.000 647.2511 652.7066
------------------------------------------------------------------------------
dummy variable trap
Y = β₀ + β₁ · X₁ + β₂ · X₂ + u
• in our discussion X₁: the variable of interest and X₂: the control variable
• under the conditional independence assumption (CIA):
E(u | x₁, x₂) = E(u | x₂)
• the OVB formula is one of the most important things to know about your regression model
• if you claim no OVB for your study, you're effectively saying that the regression you have is the regression you want
• in other words, you depend on the conditional independence assumption (CIA):
E(u | x₁, x₂) = E(u | x₂)
for a causal interpretation of your regression estimates
control variables in our California test score data
next week:
• hypothesis tests in multiple regression
• examples of nonlinearities and interactions