
Lecture 6

This document provides an overview of simple regression models and key concepts related to model fitting. It discusses two measures of model fit: the regression R2 and the standard error of regression (SER). R2 measures how much of the variation in the dependent variable is explained by the regression model, while SER measures the spread of residuals around the predicted regression line. The document also covers interpreting regression coefficients when the explanatory variable is binary, as well as the Gauss-Markov theorem, which establishes that the ordinary least squares (OLS) estimator is the best linear unbiased estimator under the standard OLS assumptions.

Simple Regression Model

Juergen Meinecke

107 / 151
Roadmap

Selected Topics

Measures of Fit

108 / 151
There are two regression statistics that provide measures of how well
the regression line “fits” the data:

• regression 𝑅2 , and
• standard error of the regression (SER)

Main idea: how closely does the scatterplot “fit” around the
regression line?

109 / 151
Graphical illustration of “fit” of the regression line

110 / 151
The regression 𝑅2 is the fraction of the sample variation of 𝑌𝑖 that is
explained by the explanatory variable 𝑋𝑖
Total variation in the dependent variable can be broken down as

• total sum of squares (TSS)

  TSS := \sum_{i=1}^{n} (Y_i - \bar{Y})^2

• explained sum of squares (ESS)

  ESS := \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2

• residual sum of squares (RSS)

  RSS := \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2

It follows that 𝑇𝑆𝑆 = 𝐸𝑆𝑆 + 𝑅𝑆𝑆

111 / 151
Definition
𝑅2 is defined by

  R^2 := \frac{ESS}{TSS}

Corollary
Based on the preceding terminology, it is easy to see that

  R^2 = 1 - \frac{RSS}{TSS}

112 / 151
Therefore,

• 𝑅2 = 0 means 𝐸𝑆𝑆 = 0 (the regressor X explains nothing in the
  variation of the dependent variable Y)
• 𝑅2 = 1 means 𝐸𝑆𝑆 = 𝑇𝑆𝑆
(the regressor X explains all the variation of the dependent
variable Y)
• 0 ≤ 𝑅2 ≤ 1
• For a regression with a single regressor 𝑋, 𝑅2 is the square of
the sample correlation coefficient between X and Y
• Python routinely calculates and reports 𝑅2 when it runs
regressions
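
As a minimal sketch (not from the original slides), TSS, ESS and RSS can be
computed by hand and used to verify that ESS/TSS and 1 − RSS/TSS give the
same 𝑅2; the data arrays below are made up purely for illustration.

Python Code (illustrative sketch)
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# OLS fit by hand
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x                               # fitted values

tss = np.sum((y - y.mean())**2)                  # total sum of squares
ess = np.sum((yhat - y.mean())**2)               # explained sum of squares
rss = np.sum((y - yhat)**2)                      # residual sum of squares

print(ess / tss, 1 - rss / tss)                  # both equal R^2, since TSS = ESS + RSS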

113 / 151
In contrast, the standard error of the regression measures the spread
of the distribution of the errors
Because you don’t observe the errors 𝑢𝑖 you use the residuals 𝑢̂ 𝑖
instead

It is defined as the estimator of the standard deviation of 𝑢𝑖 :

  SER := \sqrt{ \frac{1}{n-2} \sum_{i=1}^{n} (\hat{u}_i - \bar{\hat{u}})^2 }
       = \sqrt{ \frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2 }
       = \sqrt{ \frac{RSS}{n-2} }

The second equality holds because \bar{\hat{u}} := \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i = 0

114 / 151
The SER

• has the units of u, which are the units of Y


• measures the spread of the OLS residuals around the estimated
PRF

Technical note: why divide by 𝑛 − 2 instead of 𝑛 − 1?

• Division by 𝑛 − 2 is a “degrees of freedom” correction – just like
  division by 𝑛 − 1 in 𝑠2𝑌 , except that for the SER, two parameters
  have been estimated (𝛽0 and 𝛽1 ), whereas in 𝑠2𝑌 only one has
  been estimated (𝜇𝑌 )
• When sample size 𝑛 is large, it doesn’t really matter whether 𝑛
or 𝑛 − 1 or 𝑛 − 2 is being used
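
As a minimal sketch (not from the original slides), the SER is simply
sqrt(RSS/(n − 2)) computed from the OLS residuals; the residual array below
is hypothetical and already sums to zero, as OLS residuals do.

Python Code (illustrative sketch)
import numpy as np

uhat = np.array([0.2, -0.3, 0.1, -0.1, 0.1])     # hypothetical OLS residuals (sum to zero)
n = len(uhat)
ser = np.sqrt(np.sum(uhat**2) / (n - 2))         # SER = sqrt(RSS / (n - 2))
print(ser)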

115 / 151
Simple Regression Model

Juergen Meinecke

116 / 151
Roadmap

Selected Topics

Binary Regressor

117 / 151
Quite often an explanatory variable is binary

• 𝑋𝑖 = 1 if small class size (else zero)


• 𝑋𝑖 = 1 if identify as female (else zero)
• 𝑋𝑖 = 1 if smokes (else zero)

Binary regressors are called dummy variables


So far, we have looked at 𝛽1 as a slope
But does this make sense when 𝑋𝑖 is binary?

How should we interpret 𝛽1 and its estimator 𝛽̂1 ?

118 / 151
The linear model 𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖 + 𝑢𝑖 reduces to

• 𝑌𝑖 = 𝛽0 + 𝑢𝑖 when 𝑋𝑖 = 0
• 𝑌𝑖 = 𝛽0 + 𝛽1 + 𝑢𝑖 when 𝑋𝑖 = 1

Analogously, the population regression functions are

• 𝐸[𝑌𝑖 |𝑋𝑖 = 0] = 𝛽0
• 𝐸[𝑌𝑖 |𝑋𝑖 = 1] = 𝛽0 + 𝛽1

It therefore follows that


𝛽1 = 𝐸[𝑌𝑖 |𝑋𝑖 = 1] − 𝐸[𝑌𝑖 |𝑋𝑖 = 0]

In words: the coefficient 𝛽1 captures the difference in group means

119 / 151
Do moms who smoke have babies with lower birth weight?
Python Code
> import pandas as pd
> df = pd.read_csv('birthweight.csv')
> smokers = df[df.smoker == 1]
> nonsmokers = df[df.smoker == 0]
> t_test(smokers.birthweight, nonsmokers.birthweight)

Two-sample t-test
Mean in group 1: 3178.831615120275
Mean in group 2: 3432.0599669148055
Point estimate for difference in means: -253.22835179453068
Test statistic: -9.441398919580234
95% confidence interval: (-305.7976345612996, -200.65906902776175)

120 / 151
Regression with smoker dummy gives exact same numbers
Python Code (output edited)
> import statsmodels.formula.api as smf
> formula = 'birthweight ~ smoker'
> model1 = smf.ols(formula, data=df, missing='drop')
> reg1 = model1.fit(use_t=False)
> print(reg1.summary())

OLS Regression Results


==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 3432.0600 11.871 289.115 0.000 3408.793 3455.327
smoker -253.2284 26.951 -9.396 0.000 -306.052 -200.404
==============================================================================
Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

• 𝛽̂0 is equal to the average birthweight in the sub-sample with 𝑋𝑖 = 0
• 𝛽̂1 is equal to the difference in average birthweights between the two groups
  (a quick check follows below)
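
A quick check (sketch, reusing the smokers and nonsmokers data frames from
the earlier t-test slide): the two group means reproduce the regression
coefficients reported above.

Python Code (illustrative sketch)
b0_hat = nonsmokers.birthweight.mean()           # ~3432.06, the intercept
b1_hat = smokers.birthweight.mean() - b0_hat     # ~-253.23, the smoker coefficient
print(b0_hat, b1_hat)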

121 / 151
Simple Regression Model

Juergen Meinecke

122 / 151
Roadmap

Selected Topics

Gauss-Markov Theorem

123 / 151
OLS estimator is not the only estimator of the PRF
You can nominate anything you want as your estimator
Similar to lecture 2, here are some alternative estimators:
• \operatorname*{argmin}_{b_0, b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^p ,
  where 𝑝 is any natural number
• \operatorname*{argmin}_{b_0, b_1} \sum_{i=1}^{n} |Y_i - b_0 - b_1 X_i|
  this is called the least absolute deviations estimator (see the sketch below)
• the number 42
  (the ‘answer to everything estimator’)
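
Here is a minimal sketch (not from the original slides) of the least absolute
deviations estimator: the sum of absolute deviations is minimized numerically
with scipy; the data are simulated purely for illustration.

Python Code (illustrative sketch)
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

def sad(b):                                      # sum of absolute deviations
    return np.sum(np.abs(y - b[0] - b[1] * x))

res = minimize(sad, x0=[0.0, 0.0], method='Nelder-Mead')
print(res.x)                                     # LAD estimates of (b0, b1)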

124 / 151
Clearly, these are all estimators
(they satisfy the definition given earlier)
Are they sensible estimators?

Clearly, the last one is silly

125 / 151
The point is: there always exists an endless number of possible
estimators for any given estimation problem
Most of them do not make any sense

What then constitutes a good estimator?


Let’s determine ‘goodness’ of an estimator by two properties:

1. bias
2. variance

Let’s briefly look at these again

126 / 151
Definition
An estimator 𝜃̂ for an unobserved population parameter 𝜃 is
unbiased if its expected value is equal to 𝜃, that is
E[𝜃̂] = 𝜃

Definition
An estimator 𝜃̂ for an unobserved population parameter 𝜃 has
minimum variance if its variance is (weakly) smaller than the
variance of any other estimator of 𝜃. Sometimes we will also say
that the estimator is efficient.

Let’s see if the OLS estimator satisfies these two properties

127 / 151
But first we need to take a brief detour:
Definition
An estimator 𝜃̂ is linear in 𝑌𝑖 if it can be written as

  \hat{\theta} = \sum_{i=1}^{n} a_i Y_i ,

where the weights 𝑎𝑖 are functions of 𝑋𝑖 but not of 𝑌𝑖 .

It is easy to see that the OLS estimator is a linear estimator
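
A minimal sketch (not from the original slides): for the slope, the weights
are a_i = (X_i − X̄) / ∑_j (X_j − X̄)², which depend on the X's only, so
𝛽̂1 = ∑ a_i Y_i is linear in 𝑌𝑖. The data are simulated for illustration.

Python Code (illustrative sketch)
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 0.5 + 1.5 * x + rng.normal(size=100)

a = (x - x.mean()) / np.sum((x - x.mean())**2)   # weights: functions of X only
beta1_linear = np.sum(a * y)                     # sum of a_i * Y_i

# the usual OLS slope formula gives the same number
beta1_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
print(beta1_linear, beta1_ols)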


Definition
A Best Linear Unbiased Estimator (BLUE) is an estimator that is
linear, unbiased, and has minimal variance (efficient).

If an estimator is BLUE, you can’t beat it, it’s the optimum

128 / 151
When we did univariate statistics (we only looked at one random
variable 𝑌𝑖 ) we discovered that the sample average was indeed BLUE
Currently we are doing bivariate statistics (we study the joint
distribution between 𝑌𝑖 and 𝑋𝑖 )
Our estimator of choice is the OLS estimator
Now, similarly to the sample average in the univariate world,
a powerful result holds for the OLS estimator…

129 / 151
Theorem
Under OLS Assumptions 1 through 4a, the OLS estimator

  \hat{\beta}_0, \hat{\beta}_1 := \operatorname*{argmin}_{b_0, b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2

is BLUE.

The Gauss-Markov theorem provides a theoretical justification for
using OLS
This theorem holds only for the subset of estimators that are linear
in 𝑌𝑖
There may be nonlinear estimators that are better

130 / 151
Simple Regression Model

Juergen Meinecke

131 / 151
Roadmap

Selected Topics
Homoskedasticity versus Heteroskedasticity

132 / 151
We’ve introduced the idea of homoskedasticity last week
We learned about it in OLS Assumption 4a
Homoskedasticity concerns the variance of the error terms 𝑢𝑖

Mathematically, the error terms are homoskedastic when


Var (𝑢𝑖 |𝑋𝑖 ) = 𝜎2𝑢

The essence of this equation is that the variance of 𝑢𝑖 is not a
function of 𝑋𝑖 ; instead, the variance is just a constant 𝜎2𝑢 whatever
the value of 𝑋𝑖

133 / 151
Example of homoskedasticity

Scatterplot is distributed evenly around PRF


Variance of error term is constant; does not vary with 𝑋𝑖
134 / 151
But why would we want to assume this?

It seems a bit arbitrary to make an assumption about the variance of
the unobserved error term
After all, the error term is unobserved; so why would we make
assumptions on the variance of it?
Well, the reason I gave during lecture 5 was that homoskedasticity
makes the derivation of the asymptotic distribution a little bit easier
The results just look a little bit cleaner

But homoskedasticity is not a necessary assumption

135 / 151
If the error terms are not homoskedastic, what are they?
If they are not homoskedastic, they are called heteroskedastic
How should we think about them?
The next three pictures illustrate…

136 / 151
Example of heteroskedasticity

Scatterplot gets wider as 𝑋 increases

Variance of error term increases in 𝑋


137 / 151
Example of heteroskedasticity

Scatterplot gets narrower as 𝑋 increases

Variance of error term decreases in 𝑋


138 / 151
Example of heteroskedasticity

Scatterplot gets narrower at first but then gets wider again


Variance of error term increases in 𝑋, then decreases again
139 / 151
What do these three pictures have in common?
The variance of 𝑌𝑖 itself varies in 𝑋𝑖
The following assumption clarifies what we mean by
heteroskedasticity
Assumption (OLS Assumption 4b)
The error terms 𝑢𝑖 are heteroskedastic if their variance has the
following form:
Var (𝑢𝑖 |𝑋𝑖 ) = 𝜎2𝑢 (𝑋𝑖 ),

that is, the variance is a function of 𝑋𝑖 .

Corollary
If the error terms 𝑢𝑖 are not homoskedastic, they are
heteroskedastic.
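
A minimal sketch (not from the original slides) of what heteroskedastic data
look like: the error standard deviation is made to grow with 𝑋𝑖 , as in the
first picture above. All numbers are simulated purely for illustration.

Python Code (illustrative sketch)
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 10, size=n)
u = rng.normal(scale=0.5 * x)                    # sd of u grows with x: heteroskedastic
y = 2.0 + 1.0 * x + u
print(u[x < 3].var(), u[x > 8].var())            # spread is much larger for large x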

140 / 151
How do the OLS standard errors from last week change if the error
terms are heteroskedastic instead of homoskedastic?

Recall the asymptotic distribution of the OLS estimator 𝛽̂1


  \hat{\beta}_1 \overset{approx.}{\sim} N\!\left( \beta_1 , \; \frac{1}{n} \frac{\sigma_u^2}{\sigma_X^2} \right)

This result only holds under OLS Assumptions 1 through 4a


In particular, it only holds under homoskedasticity (Assumption 4a)
If the error terms are heteroskedastic instead, we have to adjust the
asymptotic variance
This is tedious, but let’s do it!

141 / 151
Recall from lecture 5 how the asymptotic variance collapses to
something nice and simple under homoskedasticity:

  Var(\hat{\beta}_1 | X_i) = \cdots = \frac{1}{\left( \sum_{i=1}^{n} (X_i - \bar{X})^2 \right)^2} \sum_{i=1}^{n} (X_i - \bar{X})^2 \, Var(u_i | X_i)

                           = \frac{1}{\left( \sum_{i=1}^{n} (X_i - \bar{X})^2 \right)^2} \sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sigma_u^2

                           = \frac{\sigma_u^2}{\left( \sum_{i=1}^{n} (X_i - \bar{X})^2 \right)^2} \sum_{i=1}^{n} (X_i - \bar{X})^2

                           \simeq \frac{\sigma_u^2}{(n \sigma_X^2)^2} \, n \sigma_X^2

                           = \frac{1}{n} \frac{\sigma_u^2}{\sigma_X^2} ,

where we plugged in \sum_{i=1}^{n} (X_i - \bar{X})^2 \simeq n \sigma_X^2 and Var(u_i | X_i) = \sigma_u^2
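
A minimal sketch (not from the original slides): before the approximation
∑(X_i − X̄)² ≃ nσ²_X is imposed, the homoskedasticity-only standard error of
𝛽̂1 is sqrt(σ̂²_u / ∑(X_i − X̄)²) with σ̂²_u = RSS/(n − 2), which is what
statsmodels reports by default. The data are simulated for illustration.

Python Code (illustrative sketch)
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
df_sim = pd.DataFrame({'y': y, 'x': x})

reg = smf.ols('y ~ x', data=df_sim).fit()        # default: homoskedasticity-only std errors
uhat = reg.resid
sigma2_u = np.sum(uhat**2) / (n - 2)             # estimate of sigma_u^2
se_manual = np.sqrt(sigma2_u / np.sum((x - x.mean())**2))
print(se_manual, reg.bse['x'])                   # identical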

142 / 151
In contrast, under heteroskedasticity, we make our lives a bit easier
by imposing an asymptotic approximation at a much earlier stage:

  Var(\hat{\beta}_1 | X_i) = \cdots = \frac{1}{\left( \sum_{i=1}^{n} (X_i - \bar{X})^2 \right)^2} \sum_{i=1}^{n} Var\!\left( (X_i - \bar{X}) u_i \,\middle|\, X_i \right)

                           \simeq \frac{1}{(n \sigma_X^2)^2} \, n \, Var\!\left( (X_i - \mu_X) u_i \right)

                           = \frac{1}{n} \frac{Var\!\left( (X_i - \mu_X) u_i \right)}{\sigma_X^4}

(Note: the use of the conditional variance and the subsequent
approximation are a bit dubious; the actual math is a bit more
complicated and I am taking shortcuts here to make things easy)

143 / 151
Putting things together and invoking the CLT once more
Theorem
The asymptotic distribution of the OLS estimator 𝛽̂1 under
OLS Assumptions 1 through 4b is
  \hat{\beta}_1 \overset{approx.}{\sim} N\!\left( \beta_1 , \; \frac{1}{n} \frac{Var\!\left( (X_i - \mu_X) u_i \right)}{\sigma_X^4} \right)

A similar theorem holds for 𝛽̂0 , it just looks a little bit uglier

144 / 151
The previous theorem is the basis for deriving confidence intervals
for 𝛽1 under heteroskedasticity
With our knowledge from the previous weeks, it is easy to propose a
95% confidence interval
  CI(\beta_1) := \left[ \hat{\beta}_1 - 1.96 \cdot \frac{ \sqrt{ Var\!\left( (X_i - \mu_X) u_i \right) } }{ \sqrt{n} \, \sigma_X^2 } , \;\;
                        \hat{\beta}_1 + 1.96 \cdot \frac{ \sqrt{ Var\!\left( (X_i - \mu_X) u_i \right) } }{ \sqrt{n} \, \sigma_X^2 } \right]

Only problem: we do not know Var\!\left( (X_i - \mu_X) u_i \right) and \sigma_X

145 / 151
But we can estimate them easily instead:

• Var\!\left( (X_i - \mu_X) u_i \right) is estimated by

  s_{ux}^2 := \frac{1}{n} \sum_{i=1}^{n} \left( (X_i - \bar{X}) \hat{u}_i \right)^2

• \sigma_X is estimated by s_X

(Do you remember the definition of 𝑢̂ 𝑖 and 𝑠𝑋 ?)

146 / 151
An operational version of the confidence interval therefore is given
by
  CI(\beta_1) := \left[ \hat{\beta}_1 - 1.96 \cdot \frac{s_{ux}}{\sqrt{n} \, s_X^2} , \;\;
                        \hat{\beta}_1 + 1.96 \cdot \frac{s_{ux}}{\sqrt{n} \, s_X^2} \right]

The ratio s_{ux} / (\sqrt{n} \, s_X^2) is, of course, the standard error under
heteroskedasticity

The standard error will differ under homoskedasticity and
heteroskedasticity
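
A minimal sketch (not from the original slides): computing the
heteroskedasticity robust standard error s_ux / (√n s²_X) by hand on
simulated data; up to the n-versus-(n − 1) convention in s_X and the
small-sample scaling of HC1, it agrees with statsmodels' robust errors.

Python Code (illustrative sketch)
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 500
x = rng.uniform(1, 10, size=n)
y = 2.0 + 1.0 * x + rng.normal(scale=0.5 * x)    # heteroskedastic errors
df_sim = pd.DataFrame({'y': y, 'x': x})

reg = smf.ols('y ~ x', data=df_sim).fit(cov_type='HC0')
uhat = reg.resid
s2_ux = np.mean(((x - x.mean()) * uhat)**2)      # estimates Var((X_i - mu_X) u_i)
s2_X = np.var(x)                                 # estimates sigma_X^2
se_manual = np.sqrt(s2_ux) / (np.sqrt(n) * s2_X)
print(se_manual, reg.bse['x'])                   # essentially the HC0 robust std err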

147 / 151
The standard error under heteroskedasticity has the term 𝑠𝑢𝑥 in the
numerator which makes it seem a little bit more complicated to
calculate

But it is actually less complicated than it looks


In practice, Python computes this for you anyway

148 / 151
Default in Python is homoskedasticity
Python Code (output edited)
> import pandas as pd
> import statsmodels.formula.api as smf
> df = pd.read_csv('caschool.csv')
> formula = 'testscr ~ str'
> model1 = smf.ols(formula, data=df, missing='drop')
> reg1 = model1.fit(use_t=False)
> print(reg1.summary())

OLS Regression Results


==============================================================================
Dep. Variable: testscr R-squared: 0.051
Model: OLS Adj. R-squared: 0.049
Method: Least Squares F-statistic: 22.58
No. Observations: 420
Covariance Type: nonrobust
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 698.9330 9.467 73.825 0.000 680.377 717.489
str -2.2798 0.480 -4.751 0.000 -3.220 -1.339
==============================================================================
Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

149 / 151
New way to do things:
Python Code (output edited)
> reg1_heterosk = model1.fit(cov_type='HC1', use_t=False)
> print(reg1_heterosk.summary())

OLS Regression Results


==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 698.9330 10.364 67.436 0.000 678.619 719.247
str -2.2798 0.519 -4.389 0.000 -3.298 -1.262
==============================================================================
Notes: [1] Standard Errors are heteroscedasticity robust (HC1)

Using the option cov_type='HC1' inside ols.fit()
is Python’s way of adjusting for heteroskedasticity
This is called the heteroskedasticity robust option

(Aside: cov_type='HC1' makes the same standard error
adjustment as Stata’s robust)

150 / 151
Homoskedastic standard errors are only correct if OLS
Assumption 4a is satisfied
Heteroskedastic standard errors are correct under both OLS
Assumption 4a and Assumption 4b

Practical implication

• If you know for sure that the error terms are homoskedastic, you
should simply use Python’s ols.fit()
• If you know for sure that the error terms are heteroskedastic,
you should use Python’s ols.fit(cov_type='HC1')
• If you do not know for sure, it is always safer to use
  heteroskedasticity robust standard errors (a short comparison sketch follows below)
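
A short comparison sketch (not from the original slides), using the same
caschool data as the earlier slide: the .bse attribute returns the standard
errors directly, so both variants can be printed side by side.

Python Code (illustrative sketch)
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv('caschool.csv')
model1 = smf.ols('testscr ~ str', data=df, missing='drop')
print(model1.fit().bse)                          # homoskedasticity-only std errors
print(model1.fit(cov_type='HC1').bse)            # heteroskedasticity robust (HC1)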

151 / 151
