Lecture 6
Juergen Meinecke
107 / 151
Roadmap
Selected Topics
Measures of Fit
108 / 151
There are two regression statistics that provide measures of how well
the regression line “fits” the data:
• regression 𝑅2 , and
• standard error of the regression (SER)
Main idea: how closely does the scatterplot “fit” around the
regression line?
109 / 151
Graphical illustration of “fit” of the regression line
110 / 151
The regression 𝑅2 is the fraction of the sample variation of 𝑌𝑖 that is
explained by the explanatory variable 𝑋𝑖
Total variation in the dependent variable can be broken down as
$$\underbrace{\sum_{i=1}^{n} (Y_i - \bar Y)^2}_{TSS} \;=\; \underbrace{\sum_{i=1}^{n} (\hat Y_i - \bar Y)^2}_{ESS} \;+\; \underbrace{\sum_{i=1}^{n} \hat u_i^2}_{RSS}$$
(total sum of squares = explained sum of squares + residual sum of squares)
111 / 151
Definition
𝑅2 is defined by
$$R^2 := \frac{ESS}{TSS}$$
Corollary
Based on the preceding terminology, it is easy to see that
$$R^2 = 1 - \frac{RSS}{TSS}$$
112 / 151
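To see the decomposition and both expressions for 𝑅2 at work, here is a minimal sketch with simulated data (not from the lecture), using the same statsmodels interface that appears later in these slides:
Python Code (sketch)
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
df_sim = pd.DataFrame({'x': rng.normal(size=200)})
df_sim['y'] = 1.0 + 2.0 * df_sim['x'] + rng.normal(size=200)

reg = smf.ols('y ~ x', data=df_sim).fit()
y, y_hat, u_hat = df_sim['y'], reg.fittedvalues, reg.resid

tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
rss = np.sum(u_hat ** 2)                 # residual sum of squares

print(ess / tss, 1 - rss / tss, reg.rsquared)   # all three numbers agree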
Therefore, 0 ≤ 𝑅2 ≤ 1 (both ESS and RSS are non-negative and neither can exceed TSS)
113 / 151
In contrast, the standard error of the regression measures the spread
of the distribution of the errors
Because you don’t observe the errors 𝑢𝑖 you use the residuals 𝑢̂ 𝑖 instead:
$$SER := s_{\hat u}, \qquad \text{where } s_{\hat u}^2 := \frac{1}{n-2} \sum_{i=1}^{n} (\hat u_i - \bar{\hat u})^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat u_i^2$$
The second equality holds because $\bar{\hat u} := \frac{1}{n} \sum_{i=1}^{n} \hat u_i = 0$
114 / 151
The SER
115 / 151
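A short sketch (simulated data, not from the lecture) computes the SER by hand, assuming the 𝑛 − 2 degrees-of-freedom convention used above:
Python Code (sketch)
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df_sim = pd.DataFrame({'x': rng.normal(size=200)})
df_sim['y'] = 1.0 + 2.0 * df_sim['x'] + rng.normal(size=200)

reg = smf.ols('y ~ x', data=df_sim).fit()
u_hat = reg.resid
n = len(u_hat)

ser = np.sqrt(np.sum(u_hat ** 2) / (n - 2))   # SER computed from the residuals
print(ser)
print(np.sqrt(reg.mse_resid))                 # same number from statsmodels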
Simple Regression Model
Juergen Meinecke
116 / 151
Roadmap
Selected Topics
Binary Regressor
117 / 151
Quite often an explanatory variable is binary
118 / 151
The linear model 𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖 + 𝑢𝑖 reduces to
• 𝑌𝑖 = 𝛽0 + 𝑢𝑖 when 𝑋𝑖 = 0
• 𝑌𝑖 = 𝛽0 + 𝛽1 + 𝑢𝑖 when 𝑋𝑖 = 1
Taking conditional expectations (using 𝐸[𝑢𝑖 |𝑋𝑖 ] = 0):
• 𝐸[𝑌𝑖 |𝑋𝑖 = 0] = 𝛽0
• 𝐸[𝑌𝑖 |𝑋𝑖 = 1] = 𝛽0 + 𝛽1
Therefore 𝛽1 = 𝐸[𝑌𝑖 |𝑋𝑖 = 1] − 𝐸[𝑌𝑖 |𝑋𝑖 = 0], the difference in population means between the two groups
119 / 151
Do moms who smoke have babies with lower birth weight?
Python Code
> import pandas as pd
> df = pd.read_csv('birthweight.csv')
> smokers = df[df.smoker == 1]
> nonsmokers = df[df.smoker == 0]
> # t_test is presumably a helper function provided in the course
> # (not a standard pandas/scipy call)
> t_test(smokers.birthweight, nonsmokers.birthweight)
Two-sample t-test
Mean in group 1: 3178.831615120275
Mean in group 2: 3432.0599669148055
Point estimate for difference in means: -253.22835179453068
Test statistic: -9.441398919580234
95% confidence interval: (-305.7976345612996, -200.65906902776175)
120 / 151
Regression with smoker dummy gives exact same numbers
Python Code (output edited)
> import statsmodels.formula.api as smf
> formula = 'birthweight ~ smoker'
> model1 = smf.ols(formula, data=df, missing='drop')
> reg1 = model1.fit(use_t=False)
> print(reg1.summary())
121 / 151
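Because the summary output is edited out above, a quick way to see the equivalence is to print the estimated coefficients directly; this sketch continues the code above, and params and bse are standard statsmodels result attributes:
Python Code (sketch)
> # Intercept: mean birthweight of nonsmokers; smoker: difference in means
> print(reg1.params)
> # standard errors of the two coefficients
> print(reg1.bse)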
Simple Regression Model
Juergen Meinecke
122 / 151
Roadmap
Selected Topics
Gauss-Markov Theorem
123 / 151
OLS estimator is not the only estimator of the PRF
You can nominate anything you want as your estimator
Similar to lecture 2, here are some alternative estimators:
• $\operatorname*{argmin}_{b_0, b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^p$,
where 𝑝 is any natural number
• $\operatorname*{argmin}_{b_0, b_1} \sum_{i=1}^{n} |Y_i - b_0 - b_1 X_i|$
this is called the least absolute deviations estimator (see the sketch below)
• the number 42
(the ‘answer to everything estimator’)
124 / 151
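The first two estimators in the list above can be computed by direct numerical minimization. A minimal sketch, using simulated data and scipy’s general-purpose optimizer (neither appears in the lecture); the 𝑝 = 2 case is OLS, the absolute-value objective is least absolute deviations:
Python Code (sketch)
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=300)   # heavy-tailed errors

def ols_objective(b):
    # sum of squared deviations (the p = 2 case)
    return np.sum((y - b[0] - b[1] * x) ** 2)

def lad_objective(b):
    # sum of absolute deviations
    return np.sum(np.abs(y - b[0] - b[1] * x))

b_ols = minimize(ols_objective, x0=[0.0, 0.0], method='Nelder-Mead').x
b_lad = minimize(lad_objective, x0=[0.0, 0.0], method='Nelder-Mead').x
print(b_ols, b_lad)   # two different, but both sensible, estimates of (beta0, beta1)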
Clearly, these are all estimators
(they satisfy the definition given earlier)
Are they sensible estimators?
125 / 151
The point is: there always exists an endless number of possible
estimators for any given estimation problem
Most of them do not make any sense
To judge whether an estimator is sensible, we look at two properties:
1. bias
2. variance
126 / 151
Definition
An estimator 𝜃̂ for an unobserved population parameter 𝜃 is
unbiased if its expected value is equal to 𝜃, that is
$E[\hat\theta] = \theta$
Definition
An estimator 𝜃̂ for an unobserved population parameter 𝜃 has
minimum variance if its variance is (weakly) smaller than the
variance of any other estimator of 𝜃. Sometimes we will also say
that the estimator is efficient.
127 / 151
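A small simulation sketch (simulated data, not part of the lecture) illustrates the first definition: over many repeated samples, the OLS slope estimates average out to the true 𝛽1 .
Python Code (sketch)
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, n = 1.0, 2.0, 100
estimates = []
for _ in range(5000):
    x = rng.normal(size=n)
    y = beta0 + beta1 * x + rng.normal(size=n)
    # OLS slope estimate for this sample
    b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates.append(b1_hat)

print(np.mean(estimates))   # close to beta1 = 2.0, illustrating unbiasedness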
But first we need to take a brief detour:
Definition
An estimator 𝜃̂ is linear in 𝑌𝑖 if it can be written as
$$\hat\theta = \sum_{i=1}^{n} a_i Y_i,$$
where the weights 𝑎1 , … , 𝑎𝑛 can depend on the 𝑋𝑖 but not on the 𝑌𝑖
128 / 151
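For example, the sample average 𝑌̄ is linear with weights $a_i = 1/n$; the OLS slope estimator is also linear, with weights that depend only on the regressor values:
$$\hat\beta_1 = \sum_{i=1}^{n} a_i Y_i, \qquad a_i = \frac{X_i - \bar X}{\sum_{j=1}^{n} (X_j - \bar X)^2}$$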
When we did univariate statistics (we only looked at one random
variable 𝑌𝑖 ) we discovered that the sample average was indeed BLUE
(best linear unbiased estimator)
Currently we are doing bivariate statistics (we study the joint
distribution between 𝑌𝑖 and 𝑋𝑖 )
Our estimator of choice is the OLS estimator
Now, similarly to the sample average in the univariate world,
a powerful result holds for the OLS estimator…
129 / 151
Theorem
Under OLS Assumptions 1 through 4a, the OLS estimator
$$\hat\beta_0, \hat\beta_1 := \operatorname*{argmin}_{b_0, b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$$
is BLUE.
130 / 151
Simple Regression Model
Juergen Meinecke
131 / 151
Roadmap
Selected Topics
Homoskedasticity versus Heteroskedasticity
132 / 151
We introduced the idea of homoskedasticity last week
We learned about it in OLS Assumption 4a
Homoskedasticity concerns the variance of the error terms 𝑢𝑖
133 / 151
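In symbols, homoskedasticity means that the conditional variance of the error term does not depend on the value of the regressor:
$$\operatorname{Var}(u_i \mid X_i = x) = \sigma_u^2 \quad \text{for all } x$$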
Example of homoskedasticity
135 / 151
If the error terms are not homoskedastic, what are they?
If they are not homoskedastic, they are called heteroskedastic
How should we think about them?
The next three pictures illustrate…
136 / 151
Example of heteroskedasticity
Corollary
If the error terms 𝑢𝑖 are not homoskedastic, they are
heteroskedastic.
140 / 151
How do the OLS standard errors from last week change if the error
terms are heteroskedastic instead of homoskedastic?
141 / 151
Recall from lecture 5 how the asymptotic variance collapses to
something nice and simple under homoskedasticity:
$$\begin{aligned}
\operatorname{Var}(\hat\beta_1 \mid X_i) = \cdots &= \frac{1}{\left(\sum_{i=1}^{n}(X_i - \bar X)^2\right)^2} \sum_{i=1}^{n} (X_i - \bar X)^2 \operatorname{Var}(u_i \mid X_i) \\
&= \frac{1}{\left(\sum_{i=1}^{n}(X_i - \bar X)^2\right)^2} \sum_{i=1}^{n} (X_i - \bar X)^2 \, \sigma_u^2 \\
&= \frac{\sigma_u^2}{\left(\sum_{i=1}^{n}(X_i - \bar X)^2\right)^2} \sum_{i=1}^{n} (X_i - \bar X)^2 \\
&\simeq \frac{\sigma_u^2}{(n\sigma_X^2)^2} \, n\sigma_X^2 \\
&= \frac{1}{n} \frac{\sigma_u^2}{\sigma_X^2}
\end{aligned}$$
142 / 151
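The “≃” step uses the law of large numbers: for large 𝑛, $\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar X)^2$ is close to $\sigma_X^2$, so $\sum_{i=1}^{n}(X_i - \bar X)^2$ can be replaced by $n\sigma_X^2$; in the heteroskedastic derivation that follows, the analogous step additionally replaces 𝑋̄ by 𝜇𝑋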
In contrast, under heteroskedasticity, we make our lives a bit easier
by imposing an asymptotic approximation at a much earlier stage:
$$\begin{aligned}
\operatorname{Var}(\hat\beta_1 \mid X_i) = \cdots &= \frac{1}{\left(\sum_{i=1}^{n}(X_i - \bar X)^2\right)^2} \operatorname{Var}\left(\sum_{i=1}^{n}(X_i - \bar X) u_i \,\middle|\, X_i\right) \\
&\simeq \frac{1}{(n\sigma_X^2)^2} \, n \operatorname{Var}\big((X_i - \mu_X) u_i\big) \\
&= \frac{1}{n} \frac{\operatorname{Var}\big((X_i - \mu_X) u_i\big)}{\sigma_X^4}
\end{aligned}$$
143 / 151
Putting things together and invoking the CLT once more
Theorem
The asymptotic distribution of the OLS estimator 𝛽̂1 under
OLS Assumptions 1 through 4b is
$$\hat\beta_1 \overset{approx.}{\sim} N\left(\beta_1, \; \frac{1}{n} \frac{\operatorname{Var}\big((X_i - \mu_X) u_i\big)}{\sigma_X^4}\right)$$
A similar theorem holds for 𝛽̂0 ; it just looks a little bit uglier
144 / 151
The previous theorem is the basis for deriving confidence intervals
for 𝛽1 under heteroskedasticity
With our knowledge from the previous weeks, it is easy to propose a
95% confidence interval
$$CI(\beta_1) := \left[\hat\beta_1 - 1.96 \cdot \frac{\sqrt{\operatorname{Var}\big((X_i - \mu_X) u_i\big)}}{\sqrt{n}\,\sigma_X^2}, \;\; \hat\beta_1 + 1.96 \cdot \frac{\sqrt{\operatorname{Var}\big((X_i - \mu_X) u_i\big)}}{\sqrt{n}\,\sigma_X^2}\right]$$
145 / 151
This interval is not yet operational: the population quantities 𝜎𝑋 and Var((𝑋𝑖 − 𝜇𝑋 )𝑢𝑖 ) are unknown
But we can estimate them easily instead:
• 𝜎𝑋 is estimated by 𝑠𝑋
• $\sqrt{\operatorname{Var}\big((X_i - \mu_X) u_i\big)}$ is estimated by 𝑠𝑢𝑥
146 / 151
An operational version of the confidence interval therefore is given
by
$$CI(\beta_1) := \left[\hat\beta_1 - 1.96 \cdot \frac{s_{ux}}{\sqrt{n}\,s_X^2}, \;\; \hat\beta_1 + 1.96 \cdot \frac{s_{ux}}{\sqrt{n}\,s_X^2}\right]$$
The ratio $s_{ux}/(\sqrt{n}\,s_X^2)$ is, of course, the standard error under
heteroskedasticity
147 / 151
The standard error under heteroskedasticity has the term 𝑠𝑢𝑥 in the
numerator, which makes it seem a little bit more complicated to
calculate
148 / 151
Default in Python is homoskedasticity
Python Code (output edited)
> import pandas as pd
> import statsmodels.formula.api as smf
> df = pd.read_csv('caschool.csv')
> formula = 'testscr ~ str'
> model1 = smf.ols(formula, data=df, missing='drop')
> reg1 = model1.fit(use_t=False)
> print(reg1.summary())
149 / 151
New way to do things:
Python Code (output edited)
> reg1_heterosk = model1.fit(cov_type='HC1', use_t=False)
> print(reg1_heterosk.summary())
150 / 151
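To connect this output back to the formula $s_{ux}/(\sqrt{n}\,s_X^2)$, here is a sketch (not part of the lecture) that computes the robust standard error of 𝛽̂1 by hand; the $n/(n-2)$ rescaling is HC1’s small-sample correction, and the hand computation assumes no rows are dropped as missing:
Python Code (sketch)
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv('caschool.csv')
model1 = smf.ols('testscr ~ str', data=df, missing='drop')
reg1_heterosk = model1.fit(cov_type='HC1', use_t=False)

x = df['str'].to_numpy()
u_hat = reg1_heterosk.resid.to_numpy()
n = len(x)
dx = x - x.mean()

# robust standard error without small-sample correction (the ratio from the slides)
se_hc0 = np.sqrt(np.sum(dx ** 2 * u_hat ** 2)) / np.sum(dx ** 2)
se_hc1 = se_hc0 * np.sqrt(n / (n - 2))   # HC1 degrees-of-freedom adjustment

print(se_hc1)
print(reg1_heterosk.bse['str'])          # should agree with the hand computation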
Homoskedastic standard errors are only correct if OLS
Assumption 4a is satisfied
Heteroskedasticity robust standard errors are correct under both OLS
Assumption 4a and Assumption 4b
Practical implication
• If you know for sure that the error terms are homoskedastic, you
should simply use Python’s ols.fit()
• If you know for sure that the error terms are heteroskedastic,
you should use Python’s ols.fit(cov_type='HC1')
• If you do not know for sure, it is always safer to use
heteroskedasticity robust standard errors
151 / 151