
Functional form: specification of the OLS model

5SSPP213: Econometrics

Dr. Ian Levely

4 March 2025

(DPE, KCL) W8 4 March 2025 1 / 32


Review: Gauss-Markov assumptions

Outline

1. Review: Gauss-Markov assumptions

2. Review of Special Functions

3. Elasticities and Log models

4. Quadratic and higher order functions

5. Binary outcomes

(DPE, KCL) W8 4 March 2025 2 / 32


Review: Gauss-Markov assumptions

When the GM assumptions are met, OLS is "BLUE"

▶ Under G-M assumptions, OLS is "BLUE":


▶ Best
▶ Linear
▶ Unbiased
▶ Estimator

(DPE, KCL) W8 4 March 2025 3 / 32


Review: Gauss-Markov assumptions

Gauss-Markov Assumptions

1. Linear in parameters
▶ Not necessarily in variables (this week)

2. Random sampling
3. No perfect co-linearity
4. Zero conditional mean of errors
5. Homoskedasticity (next week)

(DPE, KCL) W8 4 March 2025 4 / 32


Review: Gauss-Markov assumptions

The regression equation

▶ Before running the regression, we create a regression model that


implicitly makes some assumptions about the causal relationship.
▶ This is related to the "zero conditional mean of errors assumption."
▶ If we have omitted variables or a mis-specified equation we’d expect a
correlation between the error and X variables.

(DPE, KCL) W8 4 March 2025 5 / 32



Example: Happiness and Age

▶ Do people get happier as they get older?


▶ → Some research shows that happiness follows an inverted U-shape in age:

happy_i = β_0 + β_1 age_i + β_2 age_i² + ... + u_i

▶ What should be included in the controls?

▶ Health? Income? Small children?
▶ Recall that a control variable "holds constant" this variation when we
interpret other variables:

β̂_1 · x_1i = E[y_i | x_2, x_3, ..., x_k]

▶ → Depends on research question


(DPE, KCL) W8 4 March 2025 6 / 32
(DPE, KCL) W8 4 March 2025 7 / 32
[Figure: happiness and age, Frijters and Beatton (2012)]

(DPE, KCL) W8 4 March 2025 8 / 32


Review of Special Functions

Outline

1. Review: Gauss-Markov assumptions

2. Review of Special Functions

3. Elasticities and Log models

4. Quadratic and higher order functions

5. Binary outcomes

(DPE, KCL) W8 4 March 2025 9 / 32


Review of Special Functions

Review of special functions (JW Appendix A): Quadratic functions

y_i = β_0 + β_1 x_i + β_2 x_i² + u_i

▶ Quadratic equations capture diminishing (or increasing) marginal returns
of x on y
▶ If β_1 > 0 and β_2 < 0, the function is inverse U-shaped; the maximum
point is x* = −β_1 / (2β_2)
▶ If β_1 < 0 and β_2 > 0, the function is U-shaped; the minimum point is
x* = −β_1 / (2β_2)

(DPE, KCL) W8 4 March 2025 10 / 32


Review of Special Functions

Review of special functions: Natural logarithm

▶ ln(x ) < 0 for 0 < x < 1


▶ ln(1) = 0
▶ ln(x ) > 0 for x > 1
▶ Difference in logs approximates proportional changes:

ln(x_1) − ln(x_0) ≈ (x_1 − x_0)/x_0 = Δx/x_0   for small changes in x

▶ Proportional changes can be expressed as percentage changes:

100 × Δln(x) ≈ 100 × Δx/x_0 ≡ %Δx

(DPE, KCL) W8 4 March 2025 11 / 32


Review of Special Functions

Review of special functions: Exponential function

▶ y = exp(x) = e^x
▶ exp(0) = 1
▶ exp(1) = 2.7183
▶ The exponential function is the inverse of the log function
▶ Note that ln(y) = β_0 + β_1 x is equivalent to y = exp(β_0 + β_1 x)
▶ If β_1 > 0, then x has an increasing marginal effect on y (i.e. another
unit has a larger effect than the previous unit)
▶ Recall that exp(a + b ) = exp(a) exp(b )

(DPE, KCL) W8 4 March 2025 12 / 32


Elasticities and Log models

Outline

1. Review: Gauss-Markov assumptions

2. Review of Special Functions

3. Elasticities and Log models

4. Quadratic and higher order functions

5. Binary outcomes

(DPE, KCL) W8 4 March 2025 13 / 32


Elasticities and Log models

Useful concepts: Elasticities and semi-elasticities

▶ What happens to sales when the price increases by 1%? (price elasticity
of demand)

(∂y/y) / (∂x/x) = (Δy/Δx) × (x/y) = %Δy / %Δx

▶ Elasticity of y with respect to x:

(∂y/y) / (∂x/x) = (Δy/Δx) × (x/y) = %Δy / %Δx

▶ Semi-elasticity of y with respect to x:

(∂y/y) / ∂x = %Δy / Δx

(DPE, KCL) W8 4 March 2025 14 / 32


Elasticities and Log models

Estimating semi-elasticities (log-linear models)

▶ Returns to another year of education:

log(wages)i = β 0 + β 1 xi + ui

▶ Interpretation:
%∆wages ≈ (100 × β 1 )∆x
▶ Example: estimated regression: log(wages) = 0.584 + 0.083 educ,  R² = 0.186
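Interpreting that estimate (a minimal Python sketch; only the 0.083 coefficient comes from the slide, everything else is illustrative):

    import numpy as np

    b1 = 0.083                              # slope on educ from the fitted equation above
    print(f"approximate effect of one more year: {100 * b1:.1f}% higher wages")
    print(f"exact implied effect: {100 * (np.exp(b1) - 1):.2f}%")

The 100·β_1 reading is an approximation; the exact proportional change implied by the log model is exp(β_1) − 1, which is very close for small coefficients.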

(DPE, KCL) W8 4 March 2025 15 / 32


Elasticities and Log models

More non-linearities: Elasticities (log-log models)

▶ Percentage change in CEO salary due to a 1% increase in sales:

log(CEO salary)i = β 0 + β 1 log(sales)i + ui

▶ Interpretation:
%∆CEO salary ≈ β 1 %∆sales
▶ Example: estimated regression: log(CEO salary) = 4.822 + 0.257 log(sales),  R² = 0.211

(DPE, KCL) W8 4 March 2025 16 / 32


Elasticities and Log models

Summary of functional forms involving logarithms

▶ Log-linear: log(y ) = β 0 + β 1 x (semi-elasticity)


▶ Log-log: log(y ) = β 0 + β 1 log(x ) (elasticity)
▶ Level-log: y = β 0 + β 1 log(x )

(DPE, KCL) W8 4 March 2025 17 / 32


Quadratic and higher order functions

Outline

1. Review: Gauss-Markov assumptions

2. Review of Special Functions

3. Elasticities and Log models

4. Quadratic and higher order functions

5. Binary outcomes

(DPE, KCL) W8 4 March 2025 18 / 32


Quadratic and higher order functions

Introducing quadratic terms

▶ Decreasing or increasing marginal effects:

y_i = β_0 + β_1 x_i + β_2 x_i² + u_i

▶ Marginal effect: Δy/Δx ≈ β_1 + 2β_2 x
▶ Decreasing marginal effects: β 1 > 0 and β 2 < 0
▶ Increasing marginal effects: β 1 < 0 and β 2 > 0

(DPE, KCL) W8 4 March 2025 19 / 32


Quadratic and higher order functions

▶ Example: estimated regression: wages = 3.73 + 0.298 exper − 0.0061 exper²,  R² = 0.093
▶ Marginal effect: Δŷ/Δx ≈ 0.298 − 2(0.0061) exper
▶ First year: 0.298
▶ Second year: 0.286
▶ Eleventh year: 0.176
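The marginal effects and the turning point can be reproduced directly from the slide's coefficients (a small illustrative Python sketch, not from the original slides):

    b1, b2 = 0.298, -0.0061                 # coefficients from the fitted equation above

    def marginal_effect(exper):
        """Approximate change in wages from one more year of experience."""
        return b1 + 2 * b2 * exper

    for label, exper in [("first year", 0), ("second year", 1), ("eleventh year", 10)]:
        print(f"{label}: {marginal_effect(exper):.3f}")

    print(f"turning point: exper = {-b1 / (2 * b2):.1f} years")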

(DPE, KCL) W8 4 March 2025 20 / 32


Quadratic and higher order functions

Estimating turning points

▶ Maximum or minimum: x* = −β_1 / (2β_2)
▶ Example: x* = −0.298 / (2 × (−0.0061)) ≈ 24.4
▶ Check if estimates make sense

(DPE, KCL) W8 4 March 2025 21 / 32


Quadratic and higher order functions

A classic example: Cobb-Douglas production function

▶ Cobb-Douglas production function: Q = β_1 L^β2 K^β3


▶ Log-linear form: ln(Qi ) = ln( β 1 ) + β 2 ln(Li ) + β 3 ln(Ki ) + ui
▶ β 2 : elasticity of output with respect to labor
▶ β 3 : elasticity of output with respect to capital
▶ β 2 + β 3 : returns to scale

(DPE, KCL) W8 4 March 2025 22 / 32


Quadratic and higher order functions

How to choose? A general approach to modeling

▶ How do we identify nonlinearities?


▶ Economic theory, intuition
▶ Higher order polynomials are used to control for unknown forms of
non-linearities (e.g. hospital choice and distance)

y_i = β_0 + β_1 x_i + β_2 x_i² + β_3 x_i³ + u_i

▶ How can I know if a non-linear model of degree r is actually linear?

H_0: β_2 = β_3 = · · · = β_r = 0   vs.   H_a: at least one β_j ≠ 0
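A minimal sketch of this joint test with statsmodels (simulated data and variable names are illustrative, not from the module):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 500)
    y = 1 + 0.5 * x - 0.04 * x**2 + rng.normal(0, 1, 500)    # simulated outcome

    X = sm.add_constant(np.column_stack([x, x**2, x**3]))     # cubic specification
    res = sm.OLS(y, X).fit()

    # H0: the coefficients on x^2 and x^3 are jointly zero (the model is linear in x)
    R = [[0, 0, 1, 0],
         [0, 0, 0, 1]]       # rows pick out the x^2 and x^3 coefficients
    print(res.f_test(R))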

(DPE, KCL) W8 4 March 2025 23 / 32


Quadratic and higher order functions

Which degree shall I use?

▶ A sequential hypothesis testing


1. Estimate a polynomial model of degree r
2. Test H_0: β_r = 0 (the coefficient on x^r)
▶ If you can reject that β_r = 0, then keep x^r
▶ If you cannot reject that β_r = 0, then estimate a regression of degree
r − 1
▶ Test β_{r−1} = 0
▶ Continue until the highest order coefficient is significantly different
from zero

(DPE, KCL) W8 4 March 2025 24 / 32


Quadratic and higher order functions

Considerations when using quadratics

▶ A quadratic function in an explanatory variable allows for increasing or


decreasing effect
▶ The turning point should be calculated to see if it makes sense
▶ A seemingly small coefficient on the square of a variable can be
practically important in what it implies about a changing slope
▶ One can compute the slope at various values of x to see if it is
practically important
▶ Plotting the (changing) marginal effects against the values of x can be
useful

(DPE, KCL) W8 4 March 2025 25 / 32


Binary outcomes

Outline

1. Review: Gauss-Markov assumptions

2. Review of Special Functions

3. Elasticities and Log models

4. Quadratic and higher order functions

5. Binary outcomes

(DPE, KCL) W8 4 March 2025 26 / 32


Binary outcomes

Linear Probability Model


▶ When the event we would like to explain is a binary outcome (y=1
denotes one outcome; y=0 denotes the alternative outcome), we are
modeling the probability of y=1
▶ This is because E [y |x ] = Pr (y = 1|x )
▶ And since probabilities must sum to one:
Pr (y = 0|x ) = 1 − Pr (y = 1|x ) → Never do two regressions!
▶ An OLS regression with binary outcome is called a Linear Probability
Model (LPM)
∆Pr (y = 1|x ) = β j ∆xj
▶ In the LPM the slope coefficient measures the predicted change in the
probability of y=1 when xj increases by one unit: β j represents the
increase/decrease in Pr (y = 1|x ) and will be a number between 0 and
1. Multiplying β j × 100 can be interpreted as the percentage points
increase/decrease.
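A minimal LPM sketch in Python (simulated data with illustrative variable names; robust standard errors anticipate the heteroskedasticity point made later, not part of the original slides):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 1000
    educ = rng.normal(13, 2, n)
    kids = rng.integers(0, 4, n)
    p = np.clip(0.2 + 0.04 * (educ - 13) - 0.10 * kids, 0.01, 0.99)
    y = rng.binomial(1, p)                          # binary outcome, y = 1 or 0

    X = sm.add_constant(np.column_stack([educ, kids]))
    lpm = sm.OLS(y, X).fit(cov_type="HC1")          # OLS on a 0/1 outcome = LPM, robust SEs
    print(lpm.params)                               # each slope = change in Pr(y=1|x) per unit of x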
(DPE, KCL) W8 4 March 2025 27 / 32
Binary outcomes

Goodness of fit for LPM

▶ The estimated probabilities ŷ = P̂r(y = 1|x) are meant to predict a
zero-one outcome
▶ Dichotomize the predicted values: one could choose a threshold a and
define the predicted value ỹ_i = 1 if ŷ_i ≥ a and ỹ_i = 0 if ŷ_i < a
▶ Then one can tabulate the dichotomized predicted values against the
original y and calculate the percentage correctly predicted:
▶ Many people use a = 0.5 as default threshold but this is not
reasonable if the unconditional frequency of y = 1 is much lower (or
higher)
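A small illustrative Python sketch of the "percent correctly predicted" idea (the fitted values here are stand-ins, not estimates from any real regression):

    import numpy as np

    rng = np.random.default_rng(2)
    y = rng.binomial(1, 0.3, 500)                                    # observed 0/1 outcome
    yhat = np.clip(0.3 + 0.4 * (y - 0.3) + rng.normal(0, 0.2, 500),  # stand-in LPM fitted values
                   0, 1)

    a = y.mean()                                  # threshold; ȳ is often more sensible than 0.5
    ytilde = (yhat >= a).astype(int)              # dichotomized prediction
    print(f"share correctly predicted: {(ytilde == y).mean():.2%}")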

(DPE, KCL) W8 4 March 2025 28 / 32


Binary outcomes

(DPE, KCL) W8 4 March 2025 29 / 32


Binary outcomes

Considerations with LPM


▶ A probability cannot be linearly related to independent variables for all
their possible values: if an additional child decreases the probability of
women working by -0.30, this would imply that having 4 additional
children reduces the probability by -0.30(4) = -1.20, which is
impossible.
▶ This implies that LPM works well for values of xj near the average in
the sample
▶ The Var (y ) = p (1 − p ) when y is a binary variable, and in LPM
Pr (y = 1) depends on the x’s. This means that the Var (y |x ) must be
heteroscedastic → standard errors should be robust
▶ In the LPM there is nothing that restricts the predictions
ŷ = P̂r(y = 1|x) to between 0 and 1. If there are many fitted values
outside the unit interval, the LPM is not a good model.
▶ One can restrict the fitted values to lie in the unit interval by imposing
a logit/probit function
(DPE, KCL) W8 4 March 2025 30 / 32
Binary outcomes

The logit and probit functions

▶ Logit and probit functions can be used to restrict the fitted values to
lie in the unit interval.

(DPE, KCL) W8 4 March 2025 31 / 32


Binary outcomes

(DPE, KCL) W8 4 March 2025 32 / 32


Multiple Regression & Goodness of fit
5SSPP213: Econometrics

Dr. Ian Levely

28 January 2025

(DPE, KCL) W3 28 January 2025 1 / 35


Multiple regression

Outline

1. Multiple regression

2. Goodness of fit

3. OLS variance and Hypothesis testing

4. The Gauss-Markov assumptions

(DPE, KCL) W3 28 January 2025 2 / 35


Multiple regression

Multiple regression

▶ Regression analysis is very versatile.


▶ We can add more independent variables (e.g. control variables).
▶ There are many extensions to OLS to account for violations of some of
the OLS assumptions.

(DPE, KCL) W3 28 January 2025 3 / 35



Multiple regression

Adding controls
We estimated that for country i:

Lifeexpectancy_i = β_0 + β_1 Poverty_rate_i + u_i


If we also think that average wealth in a country has an effect on life
expectancy, we can add this to the regression model:

Lifeexpectancy_i = β_0 + β_1 Poverty_rate_i + β_2 GDP_i + u_i


                      (1)         (2)
Poverty rate       -0.27***    -0.19***
                    (0.02)      (0.20)
GDP per capita                  0.21***
(thousand US)                   (0.03)
R-square             0.57        0.73
n                     123         123
▶ * p<0.10; ** p<0.05; *** p<0.01
(DPE, KCL) W3 28 January 2025 4 / 35
Multiple regression

Control variables

▶ Adding a control variable holds that variable constant in the analysis.


▶ When to add control variables:
▶ Have an effect on the outcome,
▶ are correlated with the explanatory variable of interest.
▶ Leaving out controls leads to omitted variable bias.

(DPE, KCL) W3 28 January 2025 5 / 35


Multiple regression

Estimating multiple regression (OLS):


▶ Just as before, we minimize the sum of squared residuals:

û_i = y_i − (β̂_0 + β̂_1 x_1i + β̂_2 x_2i)

min Σ_{i=1}^{n} û_i²

▶ By setting the FOCs (with respect to β̂_0, β̂_1 and β̂_2) to 0.
▶ We get:

Σ_{i=1}^{n} (y_i − β̂_0 − β̂_1 x_1i − β̂_2 x_2i) = 0

Σ_{i=1}^{n} x_1i (y_i − β̂_0 − β̂_1 x_1i − β̂_2 x_2i) = 0

Σ_{i=1}^{n} x_2i (y_i − β̂_0 − β̂_1 x_1i − β̂_2 x_2i) = 0
(DPE, KCL) W3 28 January 2025 6 / 35
Multiple regression

Estimating multiple regression (OLS):


▶ Given the following model:

ŷ = β̂_0 + β̂_1 x_1 + β̂_2 x_2 + ... + β̂_k x_k

▶ We can write the same equation in terms of Δ increments:

Δŷ = β̂_1 Δx_1 + β̂_2 Δx_2 + ... + β̂_k Δx_k

▶ Suppose changes are only observed in x_1; this implies
that Δx_2 = ... = Δx_k = 0. Or, to put it differently, if we hold x_2, ..., x_k constant,
then

Δŷ = β̂_1 Δx_1,   i.e.   β̂_1 = Δŷ / Δx_1

▶ So for every unit change in x_1, ŷ will vary by β̂_1, ceteris paribus (to be
read: holding everything else constant)
(DPE, KCL) W3 28 January 2025 7 / 35
Multiple regression

Estimating multiple coefficients

▶ Suppose we only have two independent variables, k = 2:

β̂_0 = ȳ − β̂_1 x̄_1 − β̂_2 x̄_2

β̂_1 = (V_2 C_y1 − C_12 C_y2) / (V_1 V_2 − C_12²)

β̂_2 = (V_1 C_y2 − C_12 C_y1) / (V_1 V_2 − C_12²)

▶ Where:
▶ V_k is the variance of variable k.
▶ C_yk is the covariance between y and variable k.
▶ C_12 is the covariance between variables 1 and 2.
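These formulas can be checked directly (an illustrative Python sketch with simulated data, not from the slides):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 200
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)
    y = 1 + 2 * x1 - x2 + rng.normal(size=n)

    V1, V2 = x1.var(ddof=1), x2.var(ddof=1)
    C12 = np.cov(x1, x2, ddof=1)[0, 1]
    Cy1 = np.cov(y, x1, ddof=1)[0, 1]
    Cy2 = np.cov(y, x2, ddof=1)[0, 1]

    b1 = (V2 * Cy1 - C12 * Cy2) / (V1 * V2 - C12**2)
    b2 = (V1 * Cy2 - C12 * Cy1) / (V1 * V2 - C12**2)
    b0 = y.mean() - b1 * x1.mean() - b2 * x2.mean()
    print(b0, b1, b2)      # matches what any OLS routine returns for y on (1, x1, x2)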

(DPE, KCL) W3 28 January 2025 8 / 35


Multiple regression

Omitted variables

(DPE, KCL) W3 28 January 2025 9 / 35


Multiple regression

Simpson's paradox

▶ Simpson’s paradox: A correlation observed in a large group is


reversed when looking at smaller groups.
▶ This is an example of a confound or "lurking" variable.

(DPE, KCL) W3 28 January 2025 10 / 35


Multiple regression

Simpson's paradox: voting for the far-right in Germany

Source: https://www.ft.com/content/94e3acec-a767-11e7-ab55-27219df83c97

(DPE, KCL) W3 28 January 2025 11 / 35


Multiple regression

Omitted variable bias

▶ Let's say we estimate:

y_i = β'_0 + β'_1 x_1i + v_i

▶ BUT the true population model is:

y_i = β_0 + β_1 x_1i + β_2 x_2i + u_i

This implies that:

v_i = β_2 x_2i + u_i

(DPE, KCL) W3 28 January 2025 12 / 35


Multiple regression

Omitted variable bias

▶ Recall:

E[β̂'_1] = β_1 + E[ Σ_{i=1}^{n} (x_i − x̄) v_i / Σ_{i=1}^{n} (x_i − x̄)² ]

▶ When:
1. cov(x_1, x_2) ≠ 0
2. β_2 ≠ 0
▶ Then the numerator is NOT zero, and we have a biased estimator.
▶ (Also, refer to the estimators for multiple regression: the bias is proportional
to cov(x_1, x_2).)

(DPE, KCL) W3 28 January 2025 13 / 35


Multiple regression

▶ Summary of bias of β'_1 when x_2 is omitted from the regression:

                  cov(x_1, x_2) > 0    cov(x_1, x_2) < 0
β_2 > 0                  +                    −
β_2 < 0                  −                    +

(DPE, KCL) W3 28 January 2025 14 / 35


Multiple regression

▶ Summary of bias of β'_1 when x_2 is omitted from the regression:

                  cov(x_1, x_2) > 0    cov(x_1, x_2) < 0
β_2 > 0                  +                    −
β_2 < 0                  −                    +
▶ BUT, we do not always know these parameters if we have not run
the regression!

(DPE, KCL) W3 28 January 2025 14 / 35


Goodness of fit

Outline

1. Multiple regression

2. Goodness of fit

3. OLS variance and Hypothesis testing

4. The Gauss-Markov assumptions

(DPE, KCL) W3 28 January 2025 15 / 35


Goodness of fit

Goodness of fit

▶ The regression "explains"


variation in dependent variable:
▶ Compare difference between
observations and mean
(yi − ȳ ) with the residuals:
predicted values are closer!

(DPE, KCL) W3 28 January 2025 16 / 35


Goodness of fit

Goodness of fit
▶ Total sum of squares (n × sample variance):

SS_tot = Σ_{i=1}^{n} (y_i − ȳ)²

▶ Explained variation: the regression sum of squares

SS_reg = Σ_{i=1}^{n} (ŷ_i − ȳ)²

▶ where ŷ_i = β̂_0 + β̂_1 x_i is the predicted value of y_i.

▶ The proportion of variation explained by the regression is:

R² = Variation of estimated ŷ / Total variation of observed y

R² = SS_reg / SS_tot   (1)
(DPE, KCL) W3 28 January 2025 17 / 35
Goodness of fit

Goodness of fit

▶ The unexplained variation in y is the residual sum of squares (aka
sum of squared errors):

SS_res = Σ_{i=1}^{n} (ŷ_i − y_i)²   (2)

▶ The R² can also be defined as:

R² = 1 − SS_res / SS_tot   (3)
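A tiny worked example of equation (3) (illustrative Python with made-up numbers, not from the slides):

    import numpy as np

    # y: observed values, yhat: OLS fitted values (illustrative numbers)
    y = np.array([3.0, 5.0, 4.0, 8.0, 7.0])
    yhat = np.array([3.5, 4.5, 4.5, 7.5, 7.0])

    ss_tot = ((y - y.mean())**2).sum()
    ss_res = ((y - yhat)**2).sum()
    print("R^2 =", 1 - ss_res / ss_tot)   # equals SS_reg/SS_tot for OLS with an intercept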

(DPE, KCL) W3 28 January 2025 18 / 35


Goodness of fit

R-square properties

▶ R 2 ranges from 0 to 1.
▶ If R² = 0 =⇒ the OLS regression does not explain any variation in
the values of y
▶ If R 2 = 1 =⇒ The OLS regression explains all the variation of y.
▶ For regressions with just one independent variable (bivariate
regression), the r-square is the square of the correlation coefficient.

(DPE, KCL) W3 28 January 2025 19 / 35


OLS variance and Hypothesis testing

Outline

1. Multiple regression

2. Goodness of fit

3. OLS variance and Hypothesis testing

4. The Gauss-Markov assumptions

(DPE, KCL) W3 28 January 2025 20 / 35


OLS variance and Hypothesis testing

OLS as an estimator

▶ In econometrics, we deal with sample data, which we use to make


inferences about a population (i.e., data-generating process).
▶ β̂ is a random variable, since it will be slightly different for each
sample of n.
▶ β̂_k follows a t-distribution with (n − k − 1) degrees of freedom

(DPE, KCL) W3 28 January 2025 21 / 35


OLS variance and Hypothesis testing

(DPE, KCL) W3 28 January 2025 22 / 35


OLS variance and Hypothesis testing

Normality of the error term.

▶ When n is a random draw from the population and:

E(u | x_1, x_2, ..., x_k) = 0

▶ the distribution of u is normal with zero mean and variance σ²:

u ∼ N(0, σ²)
▶ This is due to the central limit theorem.

(DPE, KCL) W3 28 January 2025 23 / 35


OLS variance and Hypothesis testing

Standard error of regression (for 1 x variable)

▶ We estimate σ using the residuals, û_i, to calculate the regression
standard error (or root-mean-squared error):

σ̂ = sqrt( (1/(n − 2)) Σ û_i² )   (4)

▶ Using σ̂ we can calculate the standard error of the OLS estimator β̂_1:

se(β̂_1) = σ̂ / sqrt( Σ (x_i − x̄)² )

t = (β̂ − β) / se(β̂)

(DPE, KCL) W3 28 January 2025 24 / 35


OLS variance and Hypothesis testing

Standard errors for multiple regression

▶ With k estimators, the se for β̂_j is:

se(β̂_j) = σ² / (SST_j (1 − R_j²))   (5)

▶ Where:
▶ SST_j = Σ_{i=1}^{n} (x_ij − x̄_j)²
▶ R_j² is the R-square from regressing x_j on all other variables:
x_ij = α_0 + α_1 x_i1 + α_2 x_i2 + ... + α_{k−1} x_i,(k−1) + u_i

(DPE, KCL) W3 28 January 2025 25 / 35


OLS variance and Hypothesis testing

Standard error implications

se(β̂_j) = σ² / (SST_j (1 − R_j²))   (6)

▶ se(β̂_j) is increasing as other variables are more closely related to x_j,
while the coefficient size will decrease.
▶ se(β̂_j) is decreasing with SST_j – i.e., more variation in the variable
leads to a lower se.
▶ However, note that more variation (relative to covariances) leads to
lower coefficient size.
▶ Adding a variable that’s not needed (i.e., no omitted variable bias) will
increase the variance of coefficients when there’s a positive correlation
with another variable.

(DPE, KCL) W3 28 January 2025 26 / 35


OLS variance and Hypothesis testing

The OLS coefficients and hypothesis testing

▶ Just like x̄ follows a t-distribution, β̂ is a random variable that also
follows a t-distribution.
▶ In most cases, we'd test:

H_0: β_1 = 0
H_A: β_1 ≠ 0

▶ When H_0 is β = 0, then:

t = β̂ / se(β̂)

(DPE, KCL) W3 28 January 2025 27 / 35


OLS variance and Hypothesis testing

The t statistic

▶ Remember that population parameters (like σ) are unknown. So σ is
estimated using the regression standard error, s, which can then be
used to calculate se(β̂_j).
▶ Under the CLM assumptions, the following distribution can be assumed:

(β̂_j − β_j) / se(β̂_j) ∼ t_{n−k−1}

▶ The t distribution has df = n − k − 1, where k indicates the
number of slopes being estimated.

(DPE, KCL) W3 28 January 2025 28 / 35


The Gauss-Markov assumptions

Outline

1. Multiple regression

2. Goodness of fit

3. OLS variance and Hypothesis testing

4. The Gauss-Markov assumptions

(DPE, KCL) W3 28 January 2025 29 / 35


The Gauss-Markov assumptions

Gauss-Markov Assumptions

▶ Standard assumptions about the population model.

1. Linear in parameters:
▶ y_i = β_0 + β_1 x_i + u_i, but NOT y_i = β_0 + β_1² x_i + u_i
▶ We can have non-linear variables, e.g., β_0 + β_1 x_i² + u_i
▶ For example, the Mincer equation is typically used to estimate the
effect of education and experience on earnings:

income_i = β_0 + β_1 Education_i + β_2 Experience_i + β_3 Experience_i² + ε_i

(DPE, KCL) W3 28 January 2025 30 / 35


The Gauss-Markov assumptions

Gauss-Markov Assumptions

2 Random sampling:
▶ The sample data is representative (i.e., randomly drawn from the
population).
▶ Since we’ll obtain different estimators β̂ for each random draw of n,
the β̂’s are random variables, with expected values and standard errors.
▶ This allows for hypothesis testing:
▶ With a sufficiently large sample size, the central limit theorem tells us
that β̂ will follow a t-distribution.
▶ How can we test for random sampling?

(DPE, KCL) W3 28 January 2025 31 / 35




The Gauss-Markov assumptions

Gauss-Markov Assumptions

3 No perfect co-linearity:
▶ Cannot perfectly predict any independent variable with a (linear)
combination of others
▶ For example:

y_i = β_0 + β_1 Education_i + β_2 Age_i + β_3 Experience_i + β_4 Experience_i² + u_i

▶ If experience is calculated as Agei − Educationi , we cannot calculate


the OLS estimates.
▶ The X matrix will not be "full rank" which means we’d be essentially
dividing by zero when estimating the β vector.
▶ Common occurrence is the "dummy variable trap":
▶ Dummy variables: d_i ∈ {0, 1}

(DPE, KCL) W3 28 January 2025 32 / 35


The Gauss-Markov assumptions

Gauss-Markov Assumptions

4 Zero conditional mean:


▶ The expected value of the error term in the regression model, u_i, should be zero, conditional
on the x variables:

E(u | x_1, x_2, ..., x_k) = 0

▶ Proof: Expected value of OLS estimator.


▶ What happens when an extra variable is added that isn’t
needed?

(DPE, KCL) W3 28 January 2025 33 / 35


The Gauss-Markov assumptions

Gauss-Markov Assumptions

5 Homoskedasticity:
▶ The variance of the error term, u_i, given the independent variables, is
not correlated with any X variable:

Var(u | x_1, x_2, ..., x_k) = σ²

▶ Where σ² is the variance of u_i


▶ When this condition is not met, OLS estimates are still unbiased, but
the standard errors will potentially be biased.

(DPE, KCL) W3 28 January 2025 34 / 35


The Gauss-Markov assumptions

OLS is "BLUE"

▶ Under G-M assumptions, OLS is "BLUE":


▶ Best
▶ Linear
▶ Unbiased
▶ Estimator

(DPE, KCL) W3 28 January 2025 35 / 35


W1 (Review of) Statistical Concepts

Dr. Ian Levely

16 September 2024

(DPE, KCL) W1 16 September 2024 1 / 53


Why do economists need data?

▶ Describe populations etc...


▶ Make inferences about larger populations from samples,
▶ Test (causal) hypotheses to learn about how the world works.

(DPE, KCL) W1 16 September 2024 2 / 53


A scatter plot of Poverty and Life expectancy

(DPE, KCL) W1 16 September 2024 3 / 53


1. Describing data

▶ The mean:
x̄ = (1/n) Σ_{i=1}^{n} x_i
▶ Variance:
Var(x) = (1/n) Σ (x_i − x̄)²
▶ Standard deviation:
σ = sqrt( (1/n) Σ (x_i − x̄)² )

(DPE, KCL) W1 16 September 2024 4 / 53


Variance and standard deviation of a sample

▶ For a sample:
Var(x) = (1/(n−1)) Σ (x_i − x̄)²
▶ Sample standard deviation:
s = sqrt( (1/(n−1)) Σ (x_i − x̄)² )
▶ These are the formulas we use to estimate the population parameter in
a finite sample.
▶ Where does x̄ come from? The sample!

(DPE, KCL) W1 16 September 2024 5 / 53


A scatter plot of Poverty and Life expectancy

(DPE, KCL) W1 16 September 2024 6 / 53


The covariance and correlation
▶ The covariance:
▶ The covariance between two variables, x and y is defined as the
average of the product between the distance of each variable with
respect to its mean.
Cov_xy = (1/(n−1)) Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ)

▶ The values of the covariance can be positive or negative.
▶ Units? ... x units × y units.
▶ The correlation coefficient:
▶ To solve this, we can calculate the (sample) correlation coefficient
(also called Pearson's r):

r_xy = cov(x, y) / (s_x s_y)

▶ For the population: ρ = cov(x, y) / (σ_x σ_y)
▶ Units? ... unit free – always between (−1, 1)
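A small illustrative Python sketch of both formulas (the numbers are made up, not the course data):

    import numpy as np

    poverty = np.array([2.0, 5.0, 10.0, 25.0, 60.0])      # % in extreme poverty (made-up numbers)
    life_exp = np.array([81.0, 78.0, 75.0, 70.0, 62.0])   # life expectancy in years

    n = len(poverty)
    cov = ((poverty - poverty.mean()) * (life_exp - life_exp.mean())).sum() / (n - 1)
    r = cov / (poverty.std(ddof=1) * life_exp.std(ddof=1))
    print(cov, r)      # same as np.cov(poverty, life_exp)[0, 1] and np.corrcoef(...)[0, 1]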
(DPE, KCL) W1 16 September 2024 7 / 53
Interpreting the covariance

▶ What are units of the covariance?

▶ x: % of population living under poverty


▶ y: life expectancy in years
cov(x, y) = (1/(n − 1)) Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ)

▶ In this case the units are x units × y units (e.g. percentage points × years), which is not very useful.


▶ Hard to compare across different measures – say proxies for income
and wealth.

(DPE, KCL) W1 16 September 2024 8 / 53


The correlation coefficient

▶ To solve this, we can calculate the (sample) correlation coefficient
(also called Pearson's r):

r_xy = cov(x, y) / (s_x s_y)

▶ For the population: ρ = cov(x, y) / (σ_x σ_y)

(DPE, KCL) W1 16 September 2024 9 / 53


OLS

▶ What if we want to know how a change in the poverty rate affects life
expectancy?
▶ Neither the covariance nor the correlation coefficient can give us that
information.
▶ If we write an equation describing the relationship, it might look
something like:
Lifeexpectancy = β 0 + β 1 Poverty

(DPE, KCL) W1 16 September 2024 10 / 53


A scatter plot of Poverty and Life expectancy

(DPE, KCL) W1 16 September 2024 11 / 53


Summarizing the relationship

▶ n = 123
▶ Life expectancy at birth:
▶ x̄ = 72.20; s = 7.34.

▶ Extreme poverty rate:


▶ x̄ = 13.91; s = 20.31.

▶ Cov: -112.44
▶ ρ = −0.75

(DPE, KCL) W1 16 September 2024 12 / 53


2. Making inferences about data

▶ The population is very broadly all of the individuals that fit a certain
criteria.
▶ It could be a known quantity, of specific individuals (e.g. the
population of the UK),
▶ BUT it can also be more esoteric: people who haven’t been born yet;
events that haven’t happened.
▶ We can think of a sample as a random draw from the population.

(DPE, KCL) W1 16 September 2024 13 / 53


Statistics as random variables

▶ When we have a sample that is drawn randomly from a population,


then we can treat our sample statistics as random variables that have
a distribution;
▶ We use our sample statistics to make inferences about the population
– or data-generating process.

(DPE, KCL) W1 16 September 2024 14 / 53


Some definitions

▶ Random variable X:
▶ outcome of a random process.
▶ Can be continuous or discrete
▶ Probability Density Function (pdf) of X, f(x)
▶ When X is discrete: the pdf gives the probability that X takes a
particular value, Pr[X = a]. When X is continuous: it makes no sense to ask for
the probability that a continuous random variable takes on a
particular value, so we use the pdf to compute probabilities for a range of
values, for example
Pr[a ≤ X ≤ b].

(DPE, KCL) W1 16 September 2024 15 / 53


The ‘moments’ of a distribution:

1. Measures of central tendency


2. Variance
3. Skewness (symmetry)
4. Kurtosis (tails)

(DPE, KCL) W1 16 September 2024 16 / 53


Central tendency
▶ We can think of the population mean, µ, as the expected value of a
particular variable.

E [x ] = µ

▶ We can think of the sample mean as being a random variable with the
expected value equal to the population mean:

E [x̄ ] = µ

▶ Similarly: E [s ] = σ, etc...
(DPE, KCL) W1 16 September 2024 17 / 53
Expected value

▶ For a discrete variable, the expected value is the sum of all of the
possible outcomes (x_j) times the probability of those outcomes
occurring, f(x_j):

E(x) = x_1 f(x_1) + x_2 f(x_2) + ... + x_k f(x_k) = Σ_{j=1}^{k} x_j f(x_j)

▶ For a continuous variable, we take the integral of x f(x) over the support of x:

E(x) = ∫_{−∞}^{∞} x f(x) dx

(DPE, KCL) W1 16 September 2024 18 / 53


Some useful properties of expected value

1. For any constant c, E(c) = c

2. For any constants a and b, E(aX + b) = aE(X) + b
3. If {a_1, a_2, ..., a_n} are constants and {X_1, X_2, ..., X_n} are random
variables, then

E(a_1 X_1 + a_2 X_2 + ... + a_n X_n) = a_1 E(X_1) + a_2 E(X_2) + ... + a_n E(X_n)

(DPE, KCL) W1 16 September 2024 19 / 53


Uniform distribution
▶ In a uniform distribution, each value of x has the same probability:
▶ For a discrete variable, rolling a single die, it looks like this:

(DPE, KCL) W1 16 September 2024 20 / 53


Probability distribution for rolling 2 dice

(DPE, KCL) W1 16 September 2024 21 / 53


Continuous uniform distribution

▶ In a continuous distribution, a function describes the probability of each
value occurring over a given range.
▶ For a continuous variable, a uniform distribution would look like this:

(DPE, KCL) W1 16 September 2024 22 / 53


Continuous uniform distribution

▶ In a continuous distribution, a function describes the probability of each
value occurring over a given range.
▶ For a continuous variable, a uniform distribution would look like this:

(DPE, KCL) W1 16 September 2024 23 / 53


2: The variance

▶ We estimate the population variance, σ², with the sample variance:

s² = (1/(n−1)) Σ (x_i − x̄)²

E[s²] = σ²
▶ Why n-1? This is the degrees of freedom.
▶ (Over)simplification: because we have x̄, this “counts as an
observation”.
▶ Degrees of freedom: the number of observations (n), minus the
number of random variables in the test.
▶ x̄ is a random variable

(DPE, KCL) W1 16 September 2024 24 / 53


Normal density function

(DPE, KCL) W1 16 September 2024 25 / 53


Normal density functions with different means/sd’s

(DPE, KCL) W1 16 September 2024 26 / 53


3-4 Skewness and Kurtosis

(DPE, KCL) W1 16 September 2024 27 / 53


A few things to consider about distributions:

▶ Joint distributions tell us the probability that 2 random variables


simultaneously take on a certain value.

fx,y (x, y ) = P (X = x, Y = y ).
▶ Conditional distributions tell us the distribution of variable, Y, given
the value of another variable X.

f_{Y|X}(y | x) = P(Y = y | X = x)

             = P(X = x, Y = y) / P(X = x)

(DPE, KCL) W1 16 September 2024 28 / 53


Conditional Expectation

▶ We are interested in how one variable’s distribution changes with


respect to another variable
Question: How does the shape of the distribution of lifespan change in
response to income?
E(Y | x) = Σ_{j=1}^{m} y_j f_{Y|X}(y_j | x) = Σ_{j=1}^{m} y_j P(Y = y_j | X = x)

(DPE, KCL) W1 16 September 2024 29 / 53


Properties of conditional Expectation

1. E[c(X) | X] = c(X), for any function c(X).

2. For functions a(X) and b(X):

E[a(X)·Y + b(X) | X] = a(X)·E(Y | X) + b(X)

3. If X and Y are independent, then E(Y | X) = E(Y)
4. E[E(Y | X)] = E(Y), the law of iterated expectations.

(DPE, KCL) W1 16 September 2024 30 / 53


Using data to test hypotheses

(DPE, KCL) W1 16 September 2024 31 / 53


Central limit theorem

Central Limit Theorem

▶ x̄ has a distribution.
▶ The central limit theorem tells us that x̄ is normally distributed.
▶ This is true even when the distribution of x is not normal.

(DPE, KCL) W1 16 September 2024 33 / 53


Central limit theorem

Normally distributed data

(DPE, KCL) W1 16 September 2024 34 / 53


Central limit theorem

Distribution of means, n=20

(DPE, KCL) W1 16 September 2024 35 / 53


Central limit theorem

Non-Normally distributed data

(DPE, KCL) W1 16 September 2024 36 / 53


Central limit theorem

Distribution of means, n=20

(DPE, KCL) W1 16 September 2024 37 / 53


The student’s t distributions

T-distribution
▶ x̄ is approximately normally distributed:
▶ Central tendency: µ
▶ Standard deviation: σ/√n
▶ But we don’t know σ! So we use s/√n instead.
▶ This is called the standard error (of x̄ in this case).


▶ More precisely, x̄ ’s distribution approaches normal as n increases.
▶ It follows a t-distribution, with n-1 degrees of freedom.
▶ (Sometimes called "student dist.")
(DPE, KCL) W1 16 September 2024 39 / 53
The t-distribution

(DPE, KCL) W1 16 September 2024 41 / 53


The t-distribution

Degrees of freedom

▶ Generally, adding variables to a test statistic means that you need to


reduce the degrees of freedom by 1.
▶ If we have the x̄ then we only need (n-1) observations to know the
whole data set.
▶ For the t-distribution: with more degrees of freedom, approaches
normal distribution (this makes sense: when n=N we’re just dealing
with population and can use the z-score).

(DPE, KCL) W1 16 September 2024 42 / 53


The t-distribution

Inference
▶ The central limit theorem makes it possible to make inferences about
the population based on the sample statistics.
1. Confidence intervals: Based on our estimates of µ and σ (x̄ and
s, respectively), we can calculate an interval that shows how
spread out our estimate is likely to be (this is imprecise wording).
2. Hypothesis testing: we start by saying IF µ is equal to something
(our hypothesis), how likely are we to observe the value of x̄?
▶ Think of this like working backwards: if we know something about the
distribution of x̄, we can calculate the "area under the curve" for a
given µ
▶ This makes it possible to test hypotheses based on sample statistics as
well as construct confidence intervals.
▶ This basically makes statistics useful! Without this, we cannot use
statistics to test theories – just to describe data.
(DPE, KCL) W1 16 September 2024 43 / 53
The t-distribution

(DPE, KCL) W1 16 September 2024 44 / 53


The t-distribution

(DPE, KCL) W1 16 September 2024 45 / 53


The t-distribution

Interpreting the confidence interval

(DPE, KCL) W1 16 September 2024 46 / 53


The t-distribution

The distribution of x̄

▶ For a sample, we calculate the t-stat, which tells us the chances that a
sample mean is above/below a certain level of µ.

t = (x̄ − µ) / (s/√n)

▶ What’s different about the t-distribution?


▶ What do we actually know in this equation from the sample?
▶ We don’t know µ.

(DPE, KCL) W1 16 September 2024 47 / 53


The t-distribution

Hypothesis testing

▶ Hypothesis testing: we first make a guess about µ then use the t-stat
to test the hypothesis → given the sample data, how likely is a given
value of µ?
▶ We then plug that guess into the t-stat formula and compare that to
the critical t-value to evaluate how likely the given outcome is.
▶ We call this guess the Null Hypothesis, H0 : µ = µ0 .
▶ We compare this to the Alternative Hypothesis, Ha .
▶ The two-sided test: Ha : µ ̸= µ0
▶ The one-sided test: Ha : µ > µ0
▶ The one-sided test: Ha : µ < µ0
▶ Careful: the critical t-value for 1 and 2-sided tests is different
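A minimal Python sketch of a one-sample t-test built from the t-stat formula above (the sample and the hypothesised µ are illustrative, not from the module):

    import numpy as np
    from scipy import stats

    x = np.array([6.1, 7.4, 6.8, 7.9, 6.5, 7.2, 6.9, 7.7])   # illustrative sample
    mu0 = 7.0                                                 # null hypothesis H0: mu = 7

    t = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))
    p_two_sided = 2 * stats.t.sf(abs(t), df=len(x) - 1)
    print(t, p_two_sided)        # same result as stats.ttest_1samp(x, mu0)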

(DPE, KCL) W1 16 September 2024 48 / 53


The t-distribution

Rejecting the null

▶ Before doing the test, we pick a critical value t*.
▶ As with confidence intervals, it's usually 95%.
▶ If |t| > t*, we reject the null hypothesis.
▶ Rejecting the null means that we can say that – for the given
confidence level – we can be sure that the null hypothesis is
not true. BUT... we can never confirm the null hypothesis.

(DPE, KCL) W1 16 September 2024 49 / 53


The t-distribution

The two-sided t-test

▶ H_0: µ = 0
▶ H_a: µ ≠ 0 (N.B.: 2-sided test!)
▶ We observe a sample mean, x̄, which is a random variable.
▶ The hypothesis test is asking: how likely are we to observe the value of
x̄ that we do, if µ = 0?
▶ Assuming µ = 0, and estimating σ with s (i.e. the sample s.d.), we
know how likely a particular value of x̄ is.
▶ We pick a confidence level – let’s say 95%. This means we will reject
the null when our value of (±)x̄ would only occur in < 0.05 samples
under H0.

(DPE, KCL) W1 16 September 2024 50 / 53


The t-distribution

The p-value

▶ A p-value is another way of


expressing the results of
hypothesis testing.
▶ It expresses the area under the
curve (one or two-tailed
depending on the type of test).
▶ Think of it as the lowest possible
value of α at which we can reject the
null (i.e. the highest
confidence level).

(DPE, KCL) W1 16 September 2024 51 / 53


The t-distribution

The p-value (cont.)

▶ The p-value < 0.05 corresponds


to the 95% confidence level: it
means there’s a 5% chance that
we reject the null erroneously
(called a "false positive" or
type-I error).
▶ Think: how are the p-value,
t-test and confidence interval
related?

(DPE, KCL) W1 16 September 2024 52 / 53


The t-distribution

The p-value (cont.)

▶ For the previous t-test:


▶ t = -0.55, 49 dof
▶ For the one-sided test (Ha : µ < 7),
▶ p= 0.29
That means the result is only significant at the 71% level. If µ = 7,
we’d expect a result at least as extreme in 29% of samples.
▶ What’s the p-value for the two-sided test? (p=0.58)

(DPE, KCL) W1 16 September 2024 53 / 53


Diagnostic tests
5SSPP213: Econometrics

Dr. Ian Levely

11 March 2025

(DPE, KCL) W9 11 March 2025 1 / 22


Review: Gauss-Markov assumptions

Outline

1. Review: Gauss-Markov assumptions

2. Heteroskedasticity

(DPE, KCL) W9 11 March 2025 2 / 22


Review: Gauss-Markov assumptions

When the GM assumptions are met, OLS is "BLUE"

▶ Under G-M assumptions, OLS is "BLUE":


▶ Best
▶ Linear
▶ Unbiased
▶ Estimator

(DPE, KCL) W9 11 March 2025 3 / 22


Review: Gauss-Markov assumptions

Gauss-Markov Assumptions

1. Linear in parameters
2. Random sampling
3. No perfect co-linearity
4. Zero conditional mean of errors
5. Homoskedasticity – or constant error term.

(DPE, KCL) W9 11 March 2025 4 / 22


Heteroskedasticity

Outline

1. Review: Gauss-Markov assumptions

2. Heteroskedasticity

(DPE, KCL) W9 11 March 2025 5 / 22


Heteroskedasticity

Review: Normality of the error term.

▶ When n is a random draw from the population and:


E (u | x1 , x2 , ..., xk ) = 0

▶ The distribution of u is normally distributed with zero mean
and variance σ²:

u ∼ N(0, σ²)
▶ This is due to the central limit theorem.

(DPE, KCL) W9 11 March 2025 6 / 22


Heteroskedasticity

Standard error of regression

▶ We then use the residuals, û_i, to calculate the regression standard
error (RSE):

RSE = sqrt( (1/(n − 2)) Σ û_i² )   (1)

▶ Using s we can calculate the standard error of the OLS estimator β̂_1:

se(β̂_1) = s / sqrt( Σ (x_i − x̄)² )

▶ When we estimate standard errors of β’s, we make the assumption


that the error term is constant across all i’s.

(DPE, KCL) W9 11 March 2025 7 / 22


Heteroskedasticity

Standard errors for multiple regression

▶ With k estimators, the se for β̂_j is:

se(β̂_j) = σ² / (SST_j (1 − R_j²))   (2)

▶ Where:
▶ SST_j = Σ_{i=1}^{n} (x_ij − x̄_j)²
▶ R_j² is the R-square from regressing x_j on all other variables:
x_ij = α_0 + α_1 x_i1 + α_2 x_i2 + ... + α_{k−1} x_i,(k−1) + u_i

(DPE, KCL) W9 11 March 2025 8 / 22


Heteroskedasticity

What is heteroskedasticity?

▶ Homoskedasticity: the variance of the error term, u, given the various independent variables, is
the same for all combinations of outcomes of the explanatory variables.
▶ Given this model:

y = β 0 + β 1 x1 + β 2 x2 + ... + β k xk + u

Homoskedasticity implies that:

Var (u | x1 , x2 , ..., xk ) = σ2

▶ We have a heteroskedasticity problem when:

Var (u | x1 , x2 , ..., xk ) ̸= σ2

(DPE, KCL) W9 11 March 2025 9 / 22


Heteroskedasticity

A graphical representation of homoskedasticity

(DPE, KCL) W9 11 March 2025 10 / 22


Heteroskedasticity

A graphical representation of heteroskedasticity

(DPE, KCL) W9 11 March 2025 11 / 22


Heteroskedasticity

Heteroskedasticity

▶ A regression suffers from heteroskedasticity if the variance of the


error, conditional on the explanatory variables, is not constant.
▶ Unconditional error variance is unaffected by heteroskedasticity, so β̂ is
still unbiased
▶ The variance of u is dependent on E [y |x ].
▶ How to spot heteroskedasticity? We can make some educated guesses
by plotting the residuals.
▶ A note of caution: if the E [y |x ] is misspecified then a test for

heteroskedasticity can reject H0 even if the Var (y |x ) is constant.


▶ For example, suppose we omit a quadratic term that should be
included. Then a test of heteroskedasticity can be significant, because
misspecification affects the residuals.

(DPE, KCL) W9 11 March 2025 12 / 22


Heteroskedasticity

Detecting heteroskedasticity: The Breusch-Pagan Test

▶ Calculate the squared OLS residuals (ûi2 ) from your original equation
and run the following auxiliary regression:

ûi2 = β̂ 0 + β̂ 1 x1i + β̂ 2 x2i + . . . β̂ k xki

▶ Test the following hypotheses:

H_0: Var(u | x_1, x_2, ..., x_k) = σ²   (Homoskedasticity)

H_a: Var(u | x_1, x_2, ..., x_k) ≠ σ²   (Heteroskedasticity)

(DPE, KCL) W9 11 March 2025 13 / 22


Heteroskedasticity

Detecting heteroskedasticity: The Breusch-Pagan Test

ûi2 = β̂ 0 + β̂ 1 x1i + β̂ 2 x2i + . . . β̂ k xki


H_0: Var(u | x_1, x_2, ..., x_k) = σ²   (Homoskedasticity)
H_a: Var(u | x_1, x_2, ..., x_k) ≠ σ²   (Heteroskedasticity)
These hypotheses can be tested using the F-statistic for joint significance of all
coefficients in the auxiliary regression.
▶ If the F statistic is significant =⇒ the null hypothesis is rejected and we
assume that heteroskedasticity exists.

F = (R²_û² / k) / ((1 − R²_û²) / (n − k − 1)) ∼ F(k, n−k−1)
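A minimal Python sketch of the BP test using statsmodels' het_breuschpagan (simulated data; variable names are illustrative, not from the module):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan

    rng = np.random.default_rng(4)
    n = 500
    x = rng.uniform(1, 10, n)
    y = 2 + 0.8 * x + rng.normal(0, 0.5 * x)      # error variance grows with x

    X = sm.add_constant(x)
    res = sm.OLS(y, X).fit()

    lm, lm_p, f, f_p = het_breuschpagan(res.resid, X)   # auxiliary regression of u-hat^2 on X
    print(f"LM = {lm:.2f} (p = {lm_p:.4f}),  F = {f:.2f} (p = {f_p:.4f})")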

(DPE, KCL) W9 11 March 2025 14 / 22


Heteroskedasticity

BP test cont.

▶ Well, actually....
▶ Typically a slightly different version is used, the Lagrange-multiplier
statistic:

LM = n · R²_û² ∼ χ²_k
▶ This follows a chi-square distribution.
▶ in Stata: hettest

(DPE, KCL) W9 11 March 2025 15 / 22


Heteroskedasticity

(DPE, KCL) W9 11 March 2025 16 / 22


Heteroskedasticity

White test

▶ As in the BP test, take the squared residuals and then regress them on all variables
plus the squares of these variables:

û² = δ_0 + δ_1 x_1 + δ_2 x_2 + δ_3 x_3 + δ_4 x_1² + δ_5 x_2² + δ_6 x_3²

▶ As for the BP test, if the F statistic in this new regression is significant
=⇒ the null hypothesis is rejected and we assume that heteroskedasticity
exists.
▶ Or, using the LM test:

LM = n · R²_û² ∼ χ²_k

(DPE, KCL) W9 11 March 2025 17 / 22


Heteroskedasticity

Consequences of Heteroskedasticity

▶ OLS will no longer (necessarily) be efficient –


▶ SE will no longer be consistent – leading to the incorrect outcomes of
hypothesis testing.

(DPE, KCL) W9 11 March 2025 18 / 22


Heteroskedasticity

Solutions to Heteroskedasticity

▶ Weighted least squares (we won’t cover this)


▶ Robust standard errors.

(DPE, KCL) W9 11 March 2025 19 / 22


Heteroskedasticity

Heteroskedasticity-robust standard errors

β̂_1 = β_1 + Σ_{i=1}^{n} (x_i − x̄) u_i / Σ_{i=1}^{n} (x_i − x̄)²

▶ If Var(u_i | x_i) = σ², we have that

Var(β̂_1) = σ² / Σ_{i=1}^{n} (x_i − x̄)²

▶ But if Var(u_i | x_i) = σ_i² ≠ σ², we have that

Var(β̂_1) = Σ_{i=1}^{n} (x_i − x̄)² σ_i² / ( Σ_{i=1}^{n} (x_i − x̄)² )²

▶ The White estimator:

Var^(β̂_1) = Σ_{i=1}^{n} (x_i − x̄)² û_i² / TSS_x²
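A minimal Python sketch comparing the usual and the White (robust) variance formulas above (simulated data, not from the module):

    import numpy as np

    rng = np.random.default_rng(5)
    n = 400
    x = rng.uniform(0, 5, n)
    y = 1 + 2 * x + rng.normal(0, 1 + x)          # heteroskedastic errors

    b1 = np.cov(x, y)[0, 1] / x.var(ddof=1)
    b0 = y.mean() - b1 * x.mean()
    uhat = y - b0 - b1 * x

    tss_x = ((x - x.mean())**2).sum()
    var_usual = (uhat @ uhat / (n - 2)) / tss_x                       # assumes homoskedasticity
    var_white = (((x - x.mean())**2) * uhat**2).sum() / tss_x**2      # White estimator
    print("usual se:", np.sqrt(var_usual), " robust se:", np.sqrt(var_white))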
(DPE, KCL) W9 11 March 2025 20 / 22
Heteroskedasticity

Heteroskedasticity-robust standard errors (multivariate regression)

Var^(β̂_j) = Σ_{i=1}^{n} r̂_ij² û_i² / SSR_j²
where rˆij is the ith residual from regressing xj on all other independent
variables, and SSRj is the sum of squared residuals from this regression.
1. Regress model yi = β 0 + β 1 xi1 + β 2 xi2 + · · · + β k xik + ui and get the
residual ûi .
2. Regress xj on all other independent variables and get the residual rˆij .
The residual rˆij is the part of the xj that is uncorrelated with all the
other independent variables. The sum of squared residuals from this
regression is SSRj .
3. Multiply each i observation of ûi2 with the corresponding observation
of rˆij2 and sum up the products over n.
4. Divide the sum by SSRj2 .
(DPE, KCL) W9 11 March 2025 21 / 22
Heteroskedasticity

In Stata:

▶ tests for heteroskedasticity (after running a regression)


▶ BP test: hettest
▶ White test: imtest
▶ fitted values and residuals:
▶ predict e, residuals   (e is a new variable containing the residuals)
▶ predict yhat   (yhat is a new variable containing the fitted values)

(DPE, KCL) W9 11 March 2025 22 / 22


Review, Understanding Causality and Policy Evaluation
5SSPP213: Econometrics

Dr. Ian Levely

11 February 2025

(DPE, KCL) W5 11 February 2025 1 / 48


Review (plus a few new things!)

Estimating multiple coefficients

▶ Suppose we only have two independent variables, k = 2:

β̂_0 = ȳ − β̂_1 x̄_1 − β̂_2 x̄_2

β̂_1 = (V_2 C_y1 − C_12 C_y2) / (V_1 V_2 − C_12²)

β̂_2 = (V_1 C_y2 − C_12 C_y1) / (V_1 V_2 − C_12²)

▶ Where:
▶ V_k is the variance of variable k.
▶ C_yk is the covariance between y and variable k.
▶ C_12 is the covariance between variables 1 and 2.

(DPE, KCL) W5 11 February 2025 3 / 48


Review (plus a few new things!)

Omitted variable bias

▶ Recall:

E[β̂'_1] = β_1 + E[ Σ_{i=1}^{n} (x_i − x̄) v_i / Σ_{i=1}^{n} (x_i − x̄)² ]

▶ When:
1. cov(x_1, x_2) ≠ 0
2. β_2 ≠ 0
▶ Then the numerator is NOT zero, and we have a biased estimator.
▶ (Also, refer to the estimators for multiple regression: the bias is proportional
to cov(x_1, x_2).)

(DPE, KCL) W5 11 February 2025 4 / 48


Review (plus a few new things!)

OLS as an estimator

▶ β̂ is a random variable, and follows a t-distribution with (n − k − 1)


degrees of freedom
▶ When the sample is randomly drawn from the population:
E (u | x1 , x2 , ..., xk ) = 0

▶ The distribution of u is normally distributed with zero mean
and variance σ² (due to the CLT):

u ∼ N(0, σ²)

(DPE, KCL) W5 11 February 2025 5 / 48


Review (plus a few new things!)

Standard error of regression

▶ We estimate σ using the residuals, û_i, to calculate the regression
standard error (or root-mean-squared error):

s = σ̂ = sqrt( (1/(n − 1 − k)) Σ û_i² )   (1)

▶ With only 1 x variable, we can calculate the standard error of the
OLS estimator β̂_1:

se(β̂_1) = s / sqrt( Σ (x_i − x̄)² )

▶ We estimate the variance of the error term, u_i, using the variance of
the residual, û_i (sometimes written as e_i), with an adjustment for the
degrees of freedom.

(DPE, KCL) W5 11 February 2025 6 / 48


Review (plus a few new things!)

Standard errors for multiple regression

▶ With k estimators, the se for β̂_j is:

se(β̂_j) = σ² / (SST_j (1 − R_j²))   (2)

▶ Where:
▶ SST_j = Σ_{i=1}^{n} (x_ij − x̄_j)²   (the total sum of squares of x_j)
▶ R_j² is the R-square from regressing x_j on all other variables:

x_ij = α_0 + α_1 x_i1 + α_2 x_i2 + ... + α_{k−1} x_i,(k−1) + u_i

(DPE, KCL) W5 11 February 2025 7 / 48


Review (plus a few new things!)

Standard error implications

  se(β̂j) = σ² / [ SSTj (1 − Rj²) ]   (3)

▶ se ( β j ) is increasing as other variables are more closely related to xj ,


while the coefficient size will decrease.
▶ se ( β j ) is decreasing with SSTj – i.e., more variation in the variable
leads to lower se.
▶ However, note that more variation (relative to covariances) leads to
lower coefficient size.
▶ Adding a variable that’s not needed (i.e., no omitted variable bias) will
increase the variance of coefficients when there’s a positive correlation
with another variable.

(DPE, KCL) W5 11 February 2025 8 / 48


Review (plus a few new things!)

Standard errors for multiple regression (New!)

▶ Note that:

  se(β̂j) = σ² / [ SSTj (1 − Rj²) ] = [ σ² / SSTj ] · [ 1 / (1 − Rj²) ]   (4)

▶ Where 1/(1 − Rj²) is the variance inflation factor. It tells us how correlation of xj with the other control variables in the model increases the variance (standard error) of β̂j .
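After a regression, Stata reports these inflation factors directly; a short sketch (y, x1, x2 are placeholder variable names):

  regress y x1 x2
  estat vif        // lists 1/(1 − Rj²) for each regressor

A VIF far above 1 signals that xj is closely predicted by the other regressors, so se(β̂j) is inflated accordingly.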

(DPE, KCL) W5 11 February 2025 9 / 48


Review (plus a few new things!)

Goodness of fit

▶ The regression "explains"


variation in dependent variable:
▶ Compare difference between
observations and mean
(yi − ȳ ) with the residuals:
predicted values are closer!

(DPE, KCL) W5 11 February 2025 10 / 48


Review (plus a few new things!)

Goodness of fit

▶ Total sum of squares (n x sample variance)

n
SStot = ∑ (yi − ȳ )2
i −1

▶ Explained variation: the regression sum of squares


n
SSreg = ∑ (ŷi − ȳ )2
i −1

▶ The proportion of variation explained by the regression is:

SSreg
R2 = (5)
SStot

(DPE, KCL) W5 11 February 2025 11 / 48


Review (plus a few new things!)

R-square (cont.)
▶ The unexplained variation in y is the residual sum of squares: (aka
sum of squared errors).
n
SSres = ∑ (ŷi − yi )2
i −1

▶ The R2 can also be defined as:


SSres
R2 = 1 − (6)
SStot
▶ R 2 ranges from 0 to 1.
▶ If R 2 = 0 =⇒ The OLS regression does not explain any variation in

the values of y
▶ If R 2 = 1 =⇒ The OLS regression explains all the variation of y.
▶ For regressions with just one independent variable (bivariate
regression), the r-square is the square of the correlation coefficient.
(DPE, KCL) W5 11 February 2025 12 / 48
Review (plus a few new things!)

Adjusted R-square (New!)


▶ A modified version of R 2 that does not necessarily increase when a new regressor is added is

  R̄² = 1 − [ (n − 1) / (n − k − 1) ] · (SSres / SStot)

  where k is the number of variables in the model and n is the number of observations. R̄ 2 is always less than R 2 .
▶ The factor (n − 1)/(n − k − 1) is a penalty for including extra variables: adding a regressor lowers SSres but raises (n − 1)/(n − k − 1), so R̄ 2 can even be negative.
▶ R̄ 2 can be used to compare models that have the same dependent
variable.
▶ Ultimately, the decision to include a variable or not should be based on
whether including it allows us better to estimate the causal effect of
interest.
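To see the penalty at work, compare a model with and without an extra regressor and display the stored results (variable names are placeholders):

  regress y x1 x2
  display e(r2), e(r2_a)
  regress y x1 x2 x3
  display e(r2), e(r2_a)

R² can only rise when x3 is added, but the adjusted R̄² may fall.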
(DPE, KCL) W5 11 February 2025 13 / 48


Review (plus a few new things!)

Using re-parametrization to test single linear restriction of


parameters

▶ We want to know whether one year of tenure is worth one year of univ

log(wages ) = β 0 + β 1 tenure + β 2 univ + β 3 exper + u

H0 : β 1 = β 2 HA : β 1 ̸ = β 2
▶ Approach 1: The test-statistic is

  t = (β̂1 − β̂2) / se(β̂1 − β̂2) = (β̂1 − β̂2) / √[ var(β̂1) + var(β̂2) − 2 cov(β̂1, β̂2) ]

▶ We need to recover the covariance using matrix algebra

(DPE, KCL) W5 11 February 2025 15 / 48


Review (plus a few new things!)

Testing multiple restrictions (aka testing joint hypotheses)

▶ Frequently we wish to test whether a set of restrictions on parameters


is jointly significant
▶ E.g.

log(wages ) = β 0 + β 1 educ + β 2 exper + β 3 exper 2 + β 4 tenure + u

▶ H0 : β 2 = 0, β 3 = 0 HA : At least one of β 2 , β 3 is different from zero.


▶ In Stata just write "test educ = tenure" after running regression.
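For the joint exclusion restriction above (H0 : β2 = β3 = 0), the test command lists the variables to be dropped; a hedged sketch assuming the squared term is stored as a generated variable expersq:

  regress lwage educ exper expersq tenure
  test exper expersq        // joint F-test of beta2 = beta3 = 0
  test educ = tenure        // a single equality restriction, as above

Stata reports the F-statistic and its p-value for the stated restrictions; the variable names (lwage, expersq, ...) are assumptions about how the data are coded.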

(DPE, KCL) W5 11 February 2025 16 / 48


Review (plus a few new things!)

F-statistic

▶ Intuition: the SSRR of the restricted model is larger because we’re constraining parameters; if SSRR and SSRUR are about the same, it would mean that the restrictions hold.
▶ Is the relative increase in SSR in the restricted model "large enough" to warrant a rejection of H0 ? The test is:

  F = [ (SSRR − SSRUR) / q ] / [ SSRUR / (n − k − 1) ] ∼ Fq,n−k−1

▶ where q is the number of restrictions imposed
▶ F is always nonnegative because SSRR ≥ SSRUR
▶ Fix Type I error and find the critical value (depends on q and n-k-1)
▶ Rejection rule: reject H0 in favour of HA
(at the chosen level of significance) if F > c

(DPE, KCL) W5 11 February 2025 17 / 48


Review (plus a few new things!)

(DPE, KCL) W5 11 February 2025 18 / 48


Review (plus a few new things!)

The R-squared form of the F-statistic

▶ SSR depends on the unit of measurement so it can be very large


▶ Recall that SSR = (1 − R 2 )TSS
▶ An alternative formula for the F-statistic is
  F = [ (R²UR − R²R) / q ] / [ (1 − R²UR) / (n − k − 1) ] ∼ Fq,n−k−1

▶ F is always nonnegative because R²UR ≥ R²R (note: the R²UR comes first)
▶ NOTE: The R-squared form of the F-statistic can only be used if the
restricted and unrestricted models have the same dependent variable.

(DPE, KCL) W5 11 February 2025 19 / 48


Review (plus a few new things!)

The F test for overall significance of a regression

▶ This is the test that all regressors do not help explain y, i.e. the joint
exclusion of all independent variables.

H0 : β 1 = β 2 = β 3 = ... = β k = 0
  Fall ≡ [ R²UR / q ] / [ (1 − R²UR) / (n − k − 1) ] ∼ Fq,n−k−1
▶ This is because under the H0 , the RR2 = 0
▶ We reject the H0 if Fall > cF .
▶ In other words, if the p-value of F < 0.05( < 0.01, < 0.10, depending
on the confidence level we wish).
▶ If fail to reject, then there is no evidence that any of the independent
variables help explain y.

(DPE, KCL) W5 11 February 2025 20 / 48


Making Causal Inferences

Reverse causality

(DPE, KCL) W5 11 February 2025 22 / 48


Making Causal Inferences

Omitted variables

(DPE, KCL) W5 11 February 2025 23 / 48


Making Causal Inferences

Endogeneity

1. Omitted variable bias: another (uncontrolled) variable is affecting both


the treatment and outcome.
▶ This violates G-M assumption that error terms have zero conditional
mean: E [ui |X ] = 0.
2. Reverse causality: ’outcome’ causes the treatment.
3. Simultaneity: variables in the model cause each other, which means it’s impossible to separate the effect of one from the other.

(DPE, KCL) W5 11 February 2025 24 / 48


Making Causal Inferences

OVB

▶ If the Gauss-Markov conditions are met, the estimator β̂ 1 is unbiased.


This means that

E ( β̂ 1 ) = β 1 when N −→ ∞ and Cov (Xi , ui ) = 0

(DPE, KCL) W5 11 February 2025 25 / 48


Making Causal Inferences

Correlation between education and income:

▶ Do any of these apply here? Or can we interpret the correlation as


causal?
▶ Individuals self-select into education, based on:
▶ Ability, hi
▶ Socio-economic background, wi

incomei = β 0 + β 1 educi + β 2 agei + β 3 agei2 + ui

ui = δ1 hi + δ2 wi + vi

▶ When:
▶ δ1 ̸= 0
▶ cov (hi , educationi ) ̸= 0,

(DPE, KCL) W5 11 February 2025 26 / 48


Making Causal Inferences

Correlation between education and income:

▶ What about adding control variables for standardized test scores (as a
proxy for ability) and parents’ income as a proxy for socio-economic
background.

inci = β 0 + β 1 educi + β 2 agei + β 3 agei2 + β 4 hi + β 5 wi + ui

What is this solving?

(DPE, KCL) W5 11 February 2025 27 / 48


Making Causal Inferences

Simultaneity

inci = β 0 + β 1 educi + β 2 agei + β 3 agei2 + β 4 hi + β 5 wi + ui

▶ How do hi , wi , educi relate to one another?


▶ Plausibly:
educi = γ0 + γ1 hi + γ2 wi + vi ,

▶ This means that our estimate of β̂ 1 will measure the effect of both
education itself in addition to the effect of hi and wi , and thus, our
model is not specified correctly.

(DPE, KCL) W5 11 February 2025 28 / 48


Making Causal Inferences

Simultaneous equations — reverse causality

yi = β 0 + β 1 x i + u i

▶ and at the same time:


xi = γ0 + γ1 yi + vi

yi = β 0 + γ1 yi + vi
▶ ui will be correlated with xi

(DPE, KCL) W5 11 February 2025 29 / 48


Making Causal Inferences

The "before/after" comparison

▶ What if we compare incomes before/after completing some sort of


education? (e.g. MBA, vocational re-training, etc...)?
▶ We can therefore eliminate the individual fixed effects: the
characteristics that don’t change over time.
▶ Does this solve the self-selection problem?

(DPE, KCL) W5 11 February 2025 30 / 48


Making Causal Inferences

Before/after comparison

(DPE, KCL) W5 11 February 2025 31 / 48


Making Causal Inferences

Before/after comparison

(DPE, KCL) W5 11 February 2025 32 / 48


Making Causal Inferences

Before/after comparison

(DPE, KCL) W5 11 February 2025 33 / 48


Making Causal Inferences

Exogenous assignment to treatment: the gold standard

(DPE, KCL) W5 11 February 2025 34 / 48


Making Causal Inferences

Duflo 2001 (AER):

▶ Schools built as part of


government project across
Indonesia.
▶ The time of construction was
exogenous – uncorrelated with
individual children’s
characteristics.
▶ 2 groups of kids by age: those
who were too old to benefit (12
+ in 1974) and those who did
benefit.
▶ 2 groups of kids by location:
those in areas with high/low
number of new schools.
(DPE, KCL) W5 11 February 2025 35 / 48
Making Causal Inferences

(DPE, KCL) W5 11 February 2025 36 / 48


Making Causal Inferences

(DPE, KCL) W5 11 February 2025 37 / 48


Making Causal Inferences

(DPE, KCL) W5 11 February 2025 38 / 48


Making Causal Inferences

(DPE, KCL) W5 11 February 2025 39 / 48


Making Causal Inferences

External vs internal validity

1. External validity: The inferences generated in your model can be


extended from your population and settings of study to other
populations with different settings.

2. Internal validity: The causal claims inferred from your model are
truly valid for the population under study.

(DPE, KCL) W5 11 February 2025 40 / 48


Making Causal Inferences

Diff-in-diff

(DPE, KCL) W5 11 February 2025 41 / 48


Making Causal Inferences

The difference-in-differences method

(DPE, KCL) W5 11 February 2025 42 / 48


Making Causal Inferences

Diff-in-diff

(DPE, KCL) W5 11 February 2025 43 / 48


Making Causal Inferences

Diff-in-diff

(DPE, KCL) W5 11 February 2025 44 / 48


Making Causal Inferences

Diff-in-diff

(DPE, KCL) W5 11 February 2025 45 / 48


Making Causal Inferences

Diff-in-diff

(DPE, KCL) W5 11 February 2025 46 / 48


Making Causal Inferences

Diff-in-diff

(DPE, KCL) W5 11 February 2025 47 / 48


Making Causal Inferences

Diff-in-diff

(DPE, KCL) W5 11 February 2025 48 / 48


Inference in OLS
5SSPP213: Econometrics

Dr. Ian Levely

4 February 2025

(DPE, KCL) W4 4 February 2025 1 / 11


OLS variance and Hypothesis testing

Outline

1. OLS variance and Hypothesis testing

(DPE, KCL) W4 4 February 2025 2 / 11




OLS variance and Hypothesis testing

OLS as an estimator

▶ In econometrics, we deal with sample data, which we use to make


inferences about a population (i.e., data-generating process).
▶ β̂ is a random variable, since it will be slightly different for each
sample of n.
▶ β k follows a t-distribution with (n − k − 1) degrees of freedom

(DPE, KCL) W4 4 February 2025 3 / 11


OLS variance and Hypothesis testing

(DPE, KCL) W4 4 February 2025 4 / 11


OLS variance and Hypothesis testing

Normality of the error term.

▶ When n is a random draw from the population and:


E (u | x1 , x2 , ..., xk ) = 0

▶ The distribution of u is normal with zero mean and variance σ²:

  u ∼ N(0, σ²)
▶ This is due to the central limit theorem.

(DPE, KCL) W4 4 February 2025 5 / 11




OLS variance and Hypothesis testing

Standard error of regression

▶ We use, then, the residuals, ûi , to calculate the regression standard


error (RSE):
  RSE = √[ (1/(n − 2)) ∑ ûi² ]   (1)

▶ Using s we can calculate the standard error of the OLS estimator β̂1 :

  se(β̂1) = s / √[ ∑(xi − x̄)² ]

  t = (β̂ − β) / se(β̂)

(DPE, KCL) W4 4 February 2025 6 / 11


OLS variance and Hypothesis testing

Standard errors for multiple regression

▶ With k estimators, the se for β̂ j is:


  se(β̂j) = σ² / [ SSTj (1 − Rj²) ]   (2)

▶ Where:
▶ SSTj = ∑i (xij − x̄j)²
▶ Rj2 is the R-square from regressing xj on all other variables:
Xij = α0 + α1 xi1 + α2 xi2 ...αk −1 xi,(k −1) + ui

(DPE, KCL) W4 4 February 2025 7 / 11


OLS variance and Hypothesis testing

Understanding the SE

▶ Why does the sampling distribution of β̂ j depend on the distribution


of the error term?
▶ Recall that:

  β̂j = βj + [ ∑i (Xi − X̄) ui ] / [ ∑i (Xi − X̄)² ]

(DPE, KCL) W4 4 February 2025 8 / 11




OLS variance and Hypothesis testing

Standard error implications

  se(β̂j) = σ² / [ SSTj (1 − Rj²) ]   (3)

▶ se ( β j ) is increasing as other variables are more closely related to xj ,


while the coefficient size will decrease.
▶ se ( β j ) is decreasing with SSTj – i.e., more variation in the variable
leads to lower se.
▶ However, note that more variation (relative to covariances) leads to
lower coefficient size.
▶ Adding a variable that’s not needed (i.e., no omitted variable bias) will
increase the variance of coefficients when there’s a positive correlation
with another variable.

(DPE, KCL) W4 4 February 2025 9 / 11


OLS variance and Hypothesis testing

The OLS coefficients and hypothesis testing

▶ Just like x̄ follows a t-distribition, β̂ is a random variable that also


follows a t-distribution.
▶ In most cases, we’d test:

H0 : β 1 = 0

HA : β 1 ̸= 0
▶ When H0 is β = 0, then:

  t = β̂ / se(β̂)

(DPE, KCL) W4 4 February 2025 10 / 11


OLS variance and Hypothesis testing

The t statistic

▶ Remember that population parameters (like σ) are unknown. So σ is estimated using the regression standard error, s, which can then be used to calculate se(β̂j).
▶ Under CLM assumptions the following distribution can be assumed:

  (β̂j − βj) / se(β̂j) ∼ tn−k−1

▶ The t distribution has df = n − k − 1, and k indicates the number of slopes being estimated.

(DPE, KCL) W4 4 February 2025 11 / 11


Dummy variables and interaction terms
5SSPP213: Econometrics

Dr. Ian Levely

25 February 2025

(DPE, KCL) W7 25 February 2025 1 / 22


Types of variables

▶ Two(three) types of variables:


▶ Numeric
▶ Categorical
▶ Ordered
▶ Un-ordered

(DPE, KCL) W7 25 February 2025 2 / 22


Qualitative variables

▶ Many qualitative information are measured using discrete variables,


that is variables that take on a limited number of values
▶ Binary variables:
▶ Yes or No
▶ X > a or X ≤ a
▶ Categorical variables, e.g. marital status = single, married, divorced...
▶ Ordinal variables, e.g. credit ratings, value scales (likert scales)
▶ We describe discrete variables using histograms, frequencies and
percentages

(DPE, KCL) W7 25 February 2025 3 / 22


A single dummy independent variable
▶ Application to gender pay gap: do wages differ between men and
women?
▶ Suppose we have only women and men in the sample.
▶ We can classify a person’s gender by the values 1=female, 0=male (or
vice versa, it is arbitrary)
wagesi = β 0 + δ0 femalei + β 1 educi + ui
▶ δ0 is the coefficient on the dummy variable female

E [wage |female = 1, educ ] − E [wage |female = 0, educ ] = δ0


▶ δ0 is the difference in average wage between men and women with the
same level of education. In other words, it is the extra wage gain/loss
on average if the person is a woman rather than a man (holding
education fixed). The intercept for men is β 0 . The intercept for
women is β 0 + δ0 .
(DPE, KCL) W7 25 February 2025 4 / 22
A graphical interpretation

(DPE, KCL) W7 25 February 2025 5 / 22


Dummy variable trap

▶ Why don’t we include a dummy variable=1 for men and a dummy


variable=1 for women?
▶ We only need two different intercepts:

wagesi = β 0 (1) + δ0 femalei + δ1 malei + β 1 educi + ui (1)


▶ Model (1) cannot be estimated because of perfect multicollinearity:
for every row in the data female + male = 1 which is collinear with
the constant
▶ The coefficients of dummy variables are always interpreted in
comparison to the excluded category (‘dummy’=0, base group)

(DPE, KCL) W7 25 February 2025 6 / 22


Creating dummy variables in Stata (Example)

▶ Sometimes we want to create dummy variables based on the contents


of another variable
▶ Say we want to predict demand for different ice creams and we have a
variable coding flavour , where 1=strawberry , 2=mango, 3=banana,
4=almond, and 5=hazelnut
▶ What if we wanted to create a dummy equal to 1 for the fruit flavours
and zero for the nut flavours?
▶ Then we can type generate fruit=0 to create a new variable and fill
it with zeroes for all observations
▶ Then we type replace fruit=1 if flavour<=3
▶ Alternately replace fruit=1 if flavour==1 | flavour==2 |
flavour==3
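Putting the steps above into one short do-file sketch (flavour and fruit as described):

  generate fruit = 0
  replace fruit = 1 if flavour <= 3
  * equivalently: replace fruit = 1 if flavour==1 | flavour==2 | flavour==3
  tabulate flavour fruit    // check that the coding worked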

(DPE, KCL) W7 25 February 2025 7 / 22


Another example: Explaining Human Development Index
[Scatter plot: Ethnicity and Human Development Index. x-axis: Ethnic fragmentation (0 to 1); y-axis: HDI (0 to 1). Correlation = −0.5725. Source: Pollock (2010)]

(DPE, KCL) W7 25 February 2025 8 / 22


An example: Explaining Human Development Index
1. What accounts for variations of
HDI across countries? One could think that political regime and
ethnicity could be among key factors explaining this relationship

HDI = β̂ 0 + δ̂1 Democracy + β̂ 1 Ethnicity + u

2. The model above contains two independent variables:


2.1 Democracy refers to whether a country is a democracy or not. It takes
the value 1, for democratic regimes. Our hypothesis states that HDI
will be higher in democracies than in dictatorships.
2.2 Ethnicity is our usual variable on ethnicity. It is a continuous variable
where large values indicates more heterogeneity. Our hypothesis states
that the higher the ethnic fragmentation, the smaller the level of HDI.
3. Notation: We will denote the effect of dummy variables using δ̂ and
we will reserve β̂ to indicate how continuous independent variables
affect the dependent variable.
(DPE, KCL) W7 25 February 2025 9 / 22
Interpreting the model
1. We need to consider the different values that our model takes given
the two different values that the variable Democracy has.
2. If Democracy=0, i.e, regimes are "Autocracies", then we can write the
model as follows:
HDI = β̂ 0 + (δ̂1 ∗ 0) + β̂ 1 Ethnicity + u =
= β̂ 0 + β̂ 1 Ethnicity + u
3. If Democracy=1, i.e, regimes are "Democracies", then we can write
the model as follows:
HDI = β̂ 0 + (δ̂1 ∗ 1) + β̂ 1 Ethnicity + u =
= ( β̂ 0 + δ̂1 ) + β̂ 1 Ethnicity + u
4. The value δ1 indicates the difference in HDI between a democracy and
a dictatorship given the same level of ethnicity. That shift is reflected
in the intercept but not on the slope of the curve.
(DPE, KCL) W7 25 February 2025 10 / 22
Visual representation

(DPE, KCL) W7 25 February 2025 11 / 22


Different intercepts

1. If δ̂1 > 0
1.1 In this case the new intercept will be β̂0 + δ̂1 and the regression line will shift up along the y axis exactly by δ̂1
2. If δ̂1 < 0
2.1 In this case the new intercept will be β̂0 + δ̂1 and the regression line will shift down along the y axis exactly by |δ̂1 |
3. Another way to define δ̂1 is to look at the expected value of the
regression given the different values of the dummy variable:

E (HDI | Democracy = 1) − E (HDI | Democracy = 0), ceteris paribus

(DPE, KCL) W7 25 February 2025 12 / 22


Dummy variables: inference
1. Once a regression containing a dummy variable is run, we are
interested in testing the hypothesis:
H0 : δ1 = 0
Ha : δ1 ̸= 0
2. To conduct the test we use the usual t-test:
δ̂1
t=
se (δ̂1 )
3. If we reject Ho : δ1 = 0, then we can test the case for:
H0 : β̂ 0 + δ1 = 0
Ha : β̂ 0 + δ1 ̸= 0
4. If we cannot reject Ho : δ1 = 0, then we use the significance of the
intercept to test the effect of the dummy variable when it takes the
value 0, ceteris paribus.
(DPE, KCL) W7 25 February 2025 13 / 22
Dummy variables and categorical variables

▶ Sometimes, we need to include categorical variables in our model.


Suppose:

HDI = β̂ 0 + δ̂1 Democracy + β̂ 1 Ethnicity + β̂ 2 ElectoralSystem + u

where the variable ElectoralSystem has, for example, 3 categories: 1


indicates that SMD are used; 2 indicates Multi-member districts and 3
refers to multi-tier electoral systems
▶ In situations like this, it is interesting to see the effect of SMD, for
example, on HDI.

(DPE, KCL) W7 25 February 2025 14 / 22


Dealing with categorical variables

1. So given our model and our interest in seeing the effect of SMD on
HDI:

HDI = β̂ 0 + δ̂1 Democracy + β̂ 1 Ethnicity + β̂ 2 ElectoralSystem + u

2. How do you deal with categorical variables?


2.1 You can decompose such variable in a set of dummy variables where 1
refers to a particular type of electoral system and 0 to the rest. By
doing so, you can test the effect of such type of electoral rules:

HDI = β̂ 0 + δ̂1 Democracy + β̂ 1 Ethnicity + δ̂2 MMD + δ̂3 Multi + u
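Instead of creating the MMD and Multi dummies by hand, Stata's factor-variable notation can build them on the fly; a hedged sketch with placeholder variable names for the dataset described above:

  regress hdi democracy ethnicity i.electoralsystem

The i. prefix adds a dummy for each category of the electoral-system variable and omits the first category (SMD here) as the base group, so the reported coefficients are interpreted relative to SMD, just like δ̂2 and δ̂3 above.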

(DPE, KCL) W7 25 February 2025 15 / 22


Interactions

Interactions with two dummy variables

▶ HDI example with a new dummy variable, Rural. The variable takes
value 1 if the majority of a population in a country lives in rural areas
and 0, otherwise.

HDI = β̂ 0 + δ̂1 Democracy + δ̂2 Rural + β̂ 1 Ethnicity + u

▶ theory suggests HDI levels are explained by the combined effect of


political regime and rural economy. So, our new model should look
like this one

HDI = β̂ 0 + δ̂1 Democracy + δ̂2 Rural + δ̂3 Democracy ∗ Rural +


+ β̂ 1 Ethnicity + u

(DPE, KCL) W7 25 February 2025 17 / 22


Interactions

Interpreting interactions with two dummy variables


Since we are interacting two dummy variables, what we are doing, in
reality, is estimating the effect of such groups on HDI. We are calculating
different intercepts holding Ethnicity constant.
1. If Democracy = 1 and Rural = 1
E (HDI | Democracy = 1, Rural = 1) = β̂ 0 + δ̂1 + δ̂2 + δ̂3
2. If Democracy = 1 and Rural = 0
E (HDI | Democracy = 1, Rural = 0) = β̂ 0 + δ̂1
3. If Democracy = 0 and Rural = 1
E (HDI | Democracy = 0, Rural = 1) = β̂ 0 + δ̂2
4. If Democracy = 0 and Rural = 0
E (HDI | Democracy = 0, Rural = 0) = β̂ 0
(DPE, KCL) W7 25 February 2025 18 / 22
Interactions

Shifting intercepts

Using data from Pollock (2010) we can estimate the coefficients using the
procedure described in the previous slide. A regression model containing an
interaction of two dummy variables generate four different intercepts:
Rural Urban
Democracy 0.76 0.88
Dictatorship 0.52 0.75

(DPE, KCL) W7 25 February 2025 19 / 22


Interactions

[Figure: fitted regression lines of HDI on Ethnicity for the four groups: Rural Democracy, Urban Democracy, Rural Dictatorship, Urban Dictatorship]

(DPE, KCL) W7 25 February 2025 20 / 22


Interactions

(DPE, KCL) W7 25 February 2025 21 / 22


Interactions

Interaction terms

▶ The model in the previous slide has two intercepts and two slopes:

ln (wages )i = ( β 0 + δ0 femalei ) + ( β 1 + δ1 femalei )educi + ... + ui

▶ intercept for male: β 0 ; intercept for female: β 0 + δ0


▶ slope for male: β 1 ; slope for female: β 1 + δ1
▶ Are the returns to education the same for men and women?
▶ H0 : δ1 = 0 vs. HA : δ1 ̸= 0 (t-test)
▶ How do we test if the whole wage equation is the same for men and
women?
H0 : δ0 = δ1 = 0 (F-test)
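A hedged Stata sketch of this specification and the two tests (lwage, female, educ are placeholder variable names; further controls can be added at the end):

  regress lwage i.female##c.educ
  * the t-test of delta1 is reported on the interaction term 1.female#c.educ
  test 1.female 1.female#c.educ     // F-test of H0: delta0 = delta1 = 0

The ## operator includes the female dummy, educ, and their interaction in a single step.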

(DPE, KCL) W7 25 February 2025 22 / 22


W2: OLS Basics
5SSPP213: Econometrics

Dr. Ian Levely

16 September 2024

(DPE, KCL) W2 16 September 2024 1 / 46


Plan

1. Review: covariance/correlation
2. OLS!
▶ Simple OLS
▶ OLS assumptions
▶ Hypothesis testing
3. Second part:
▶ ID strategies
4. Third part:
▶ Working with regressions

(DPE, KCL) W2 16 September 2024 2 / 46


A scatter plot of Poverty and Life expectancy

(DPE, KCL) W2 16 September 2024 3 / 46


▶ What does the Covariance describe? What units is it in?
▶ What does the correlation coefficient tell us? What units is it in?

(DPE, KCL) W2 16 September 2024 4 / 46




The covariance and correlation
▶ The covariance:

  Covxy = [ 1 / (n − 1) ] ∑i (xi − x̄)(yi − ȳ)

▶ The values of the covariance can be positive or negative.


▶ Units? . . . x units * y units.
▶ The correlation coefficient:
▶ To solve this, we can calculate the (sample) correlation coefficient (also called Pearson’s r ):

  rxy = cov(x, y) / (sx sy)

▶ For the population: ρ = cov(x, y) / (σx σy)
▶ Unit free – always between(-1,1)
(DPE, KCL) W2 16 September 2024 5 / 46
Ordinary Least Squares: regression analysis

Outline

1. Ordinary Least Squares: regression analysis

2. OLS assumptions

(DPE, KCL) W2 16 September 2024 6 / 46




Ordinary Least Squares: regression analysis

OLS

▶ What if we want to know how a change in the poverty rate affects life
expectancy?
▶ Neither the covariance nor the correlation coefficient can give us that information.
▶ If we write an equation describing the relationship, it might look something like:

  Lifeexpectancy = β0 + β1 Poverty

(DPE, KCL) W2 16 September 2024 7 / 46


Ordinary Least Squares: regression analysis

A scatter plot of Poverty and Life expectancy

(DPE, KCL) W2 16 September 2024 8 / 46


Ordinary Least Squares: regression analysis

Summarizing the relationship

▶ n = 123
▶ Life expectancy at birth:
▶ x̄ = 72.20; s = 7.34.

▶ Extreme poverty rate:


▶ x̄ = 13.91; s = 20.31.

▶ Cov: -112.44
▶ ρ = −0.75

(DPE, KCL) W2 16 September 2024 9 / 46


Ordinary Least Squares: regression analysis Deriving OLS estimators

The slope of a curve: The basic idea

(DPE, KCL) W2 16 September 2024 10 / 46


Ordinary Least Squares: regression analysis Deriving OLS estimators

The regression equation

▶ We first model the relationship between the variables:


  Lifeexpectancy = β0 + β1 Povertyrate
▶ β0 indicates life expectancy when (if) the poverty rate = 0,
▶ β1 is the slope of the line – it tells us how a one unit increase in the poverty rate changes life expectancy.
▶ Remember:
▶ There are population parameters! We will use sample data to estimate
them.
▶ What assumptions are we making? (Much more on this later!)

(DPE, KCL) W2 16 September 2024 11 / 46


Ordinary Least Squares: regression analysis Deriving OLS estimators

A scatter plot of Poverty and Life expectancy

(DPE, KCL) W2 16 September 2024 12 / 46


Ordinary Least Squares: regression analysis Deriving OLS estimators

The regression equation

▶ We want to use our sample data to estimate the relationship between


these variables:
  Lifeexpectancyi = β0 + β1 Rateofpovertyi + ui

▶ β0 indicates life expectancy when (if) the rate of extreme poverty = 0,
▶ β1 is the slope of the line – it tells us how a one unit increase in the poverty rate changes life expectancy.
▶ Because the relationship is not perfect, we allow for an error term, ui ;
▶ We also add the subscript "i" to show what the units of observation
are.

(DPE, KCL) W2 16 September 2024 13 / 46


Ordinary Least Squares: regression analysis Deriving OLS estimators

The regression equation

▶ In general, a regression equation looks like this:


yi = β0 + β1 xi + ui
where:
▶ y is the dependent variable
▶ x is the independent variable
▶ u is the error term – it accounts for the fact that other factors affect
y.

(DPE, KCL) W2 16 September 2024 14 / 46


Ordinary Least Squares: regression analysis Deriving OLS estimators

The error term:

▶ The error term u in the regression model is important to carefully


consider:
▶ We assume that u is on average equal to zero.
▶ We assume that u and x are not correlated:

  E(u | x) = 0

  and this implies

  E(y | x) = β0 + β1 x

▶ In words, E(y | x) means "the expected value of y given x", and the linear function means that this changes with x.

(DPE, KCL) W2 16 September 2024 15 / 46


Ordinary Least Squares: regression analysis Deriving OLS estimators

Estimating OLS

▶ Just as we never observe µ, we never observe the βs.
▶ We observe (x, y) for a sample size of n.
▶ We then come up with estimates of each β, which we call β̂.
▶ Given this random sample, for each observation we will be able to estimate the following equation (note the hat over the βs!)

  yi = β̂0 + β̂1 xi + ûi

(DPE, KCL) W2 16 September 2024 16 / 46




Ordinary Least Squares: regression analysis Deriving OLS estimators

The OLS: residuals

▶ A residual, ûi , is an estimator of the population error term, u.


  ûi = yi − ŷi
  ûi = yi − (β̂0 + β̂1 xi)


▶ Note:
▶ Sometimes the notation is ei
▶ What’s the difference between the residual and error term, ui ?

▶ Error term is in the population, we assume it is zero in the regression


model; the residual is the difference between OLS estimates and sample
data, it’s equal to zero by construction.

(DPE, KCL) W2 16 September 2024 17 / 46


Ordinary Least Squares: regression analysis Deriving OLS estimators

The residuals

(DPE, KCL) W2 16 September 2024 18 / 46


Ordinary Least Squares: regression analysis Deriving OLS estimators

Least squares:

▶ First approach: minimize squared residuals (thus ordinary "least


squares").

  ∑i ûi² = ∑i [ yi − (β̂0 + β̂1 xi) ]²

(DPE, KCL) W2 16 September 2024 19 / 46


Ordinary Least Squares: regression analysis Deriving OLS estimators

Deriving OLS estimators

  min ∑i [ yi − (β̂0 + β̂1 xi) ]²   (1)

▶ We take the F.O.C.s of (1) w.r.t. β̂0 and β̂1 and set them to zero:
▶ We’ll start with β̂0 :

  0 = ∂/∂β̂0 [ ∑i (yi − β̂0 − β̂1 xi)² ]

  0 = −2 ∑i (yi − β̂0 − β̂1 xi)

  0 = ∑i (yi − β̂0 − β̂1 xi)   (2)

(DPE, KCL) W2 16 September 2024 20 / 46


Ordinary Least Squares: regression analysis Deriving OLS estimators

Deriving OLS estimators

▶ Now w.r.t. β̂1 :

  0 = ∂/∂β̂1 [ ∑i (yi − β̂0 − β̂1 xi)² ]

  0 = −2 ∑i xi (yi − β̂0 − β̂1 xi)

  0 = ∑i xi (yi − β̂0 − β̂1 xi)

  (using summation properties)

(DPE, KCL) W2 16 September 2024 21 / 46


Ordinary Least Squares: regression analysis Deriving OLS estimators

The OLS "normal" equations

  0 = ∑i (yi − β̂0 − β̂1 xi)   (2)

  0 = ∑i xi (yi − β̂0 − β̂1 xi)   (3)

(DPE, KCL) W2 16 September 2024 22 / 46




Ordinary Least Squares: regression analysis Deriving OLS estimators

Deriving the intercept estimator

▶ From (2):

  0 = ∑i (yi − β̂0 − β̂1 xi)

  0 = ∑i yi − ∑i β̂0 − ∑i β̂1 xi

  n β̂0 = n ȳ − n β̂1 x̄

  β̂0 = ȳ − β̂1 x̄   (4)

(DPE, KCL) W2 16 September 2024 23 / 46




Ordinary Least Squares: regression analysis Deriving OLS estimators

Deriving the intercept estimator

β̂0 = ȳ − β̂1 x̄   (4)

▶ Note: this shows that the sum of residuals is 0.


▶ Note: this tells us that the OLS line will intersect the point
(x̄, ȳ )

(DPE, KCL) W2 16 September 2024 24 / 46




Ordinary Least Squares: regression analysis Deriving OLS estimators

The intercept and the slope of the regression using OLS

▶ Plugging (4) into (3) we get the estimator of the slope, β̂1 :

  0 = ∑i xi (yi − β̂0 − β̂1 xi)   (3)

  0 = ∑i xi (yi − (ȳ − x̄ β̂1) − β̂1 xi)

  0 = ∑i (xi yi − ȳ xi + xi x̄ β̂1 − β̂1 xi²)

(DPE, KCL) W2 16 September 2024 25 / 46




Ordinary Least Squares: regression analysis Deriving OLS estimators

The intercept and the slope of the regression using OLS

  0 = ∑i (xi yi − ȳ xi + xi x̄ β̂1 − β̂1 xi²)

  0 = ∑i [ xi (yi − ȳ) − β̂1 xi (xi − x̄) ]

  0 = ∑i xi (yi − ȳ) − β̂1 ∑i xi (xi − x̄)

  β̂1 = [ ∑i (xi − x̄)(yi − ȳ) ] / [ ∑i (xi − x̄)² ]

(DPE, KCL) W2 16 September 2024 26 / 46


Ordinary Least Squares: regression analysis Deriving OLS estimators

The intercept and the slope of the regression using OLS

  β̂1 = [ ∑i (xi − x̄)(yi − ȳ) ] / [ ∑i (xi − x̄)² ]

▶ which can also be written as:

  β̂1 = Covxy / Varx

(DPE, KCL) W2 16 September 2024 27 / 46


Ordinary Least Squares: regression analysis Deriving OLS estimators

Some nice properties of the OLS estimators

▶ The sum of the OLS residuals is 0


▶ The covariance between x and the OLS residuals is also, 0
▶ The OLS regression line always crosses the point (x̄, ȳ ). It always
passes by the sample mean values.

(DPE, KCL) W2 16 September 2024 28 / 46


Ordinary Least Squares: regression analysis Deriving OLS estimators

Poverty and Life expectancy: means and regression

(DPE, KCL) W2 16 September 2024 29 / 46


Ordinary Least Squares: regression analysis Interpreting regression coefficients

What are the regression coefficients indicating?

The estimated OLS regression equation is

Lifeexpectancyi = β0 + β1 Rateofpovertyi + ui

where:
▶ We estimate the coefficients based on the sample data. The results
are:

▶ β̂1 = −0.23 =⇒ This is the slope of the curve - it means that for each additional percentage point of people living in extreme poverty, life expectancy in the country goes down by 0.23 years.
▶ β̂0 = 73.98 =⇒ This is the intercept; it tells us the average life expectancy when the poverty rate is zero.
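A minimal Stata sketch of this regression (lifeexp and poverty are placeholder names for the variables plotted earlier):

  regress lifeexp poverty
  * slope: change in life expectancy (years) per one-point rise in the extreme-poverty rate
  * intercept: predicted life expectancy when the poverty rate is zero
  correlate lifeexp poverty, covariance
  * as derived above, the slope equals cov(poverty, lifeexp) divided by var(poverty)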

(DPE, KCL) W2 16 September 2024 30 / 46


Ordinary Least Squares: regression analysis Interpreting regression coefficients

Some facts about OLS regression

▶ The distinction between the dependent and the independent variables


is essential in regression.
▶ Reversing the variables produces a different regression line,
importantly with different standard errors
▶ OLS estimators are sensitive to influential observations like outliers.

(DPE, KCL) W2 16 September 2024 31 / 46




Ordinary Least Squares: regression analysis Interpreting regression coefficients

Covariance, correlation and the regression slope

▶ The slope of the regression line is closely connected to the correlation


of the two variables (rxy ) =⇒They both have the same sign.
▶ To see this, decompose the formula to estimate β̂1 as follows:

  β̂1 = [ ∑i (xi − x̄)(yi − ȳ) ] / [ ∑i (xi − x̄)² ] = Covxy / Varx = rxy (sy / sx)

  given that

  rxy = Covxy / (sx sy)

(DPE, KCL) W2 16 September 2024 32 / 46




Ordinary Least Squares: regression analysis Interpreting regression coefficients

OLS is unbiased (hopefully!)

▶ We use OLS to estimate the population parameters.
▶ OLS is unbiased IFF E[β̂1] = β1
▶ The population model

  yi = β0 + β1 xi + ui

▶ What is ui ? The error term: everything else not in the model.
▶ If the model is properly specified, then:

  E(ui) = 0

(DPE, KCL) W2 16 September 2024 33 / 46


Ordinary Least Squares: regression analysis Interpreting regression coefficients

OLS is unbiased (hopefully!)

▶ We also assume that:


E (u | x ) = E (u )
▶ This implies:

Cov (x, u ) = E (xu ) − E (x )E (u ) = E (xu ) = 0


this is so because if x and u are independent, then,

E (xu ) = E (x )E (u )

(DPE, KCL) W2 16 September 2024 34 / 46




Ordinary Least Squares: regression analysis Interpreting regression coefficients

OLS is unbiased (hopefully!)


▶ So:

  yi = β0 + β1 xi + ui

  β̂1 = β1 + [ ∑i (xi − x̄) ui ] / [ ∑i (xi − x̄)² ]

  E[β̂1] = β1 + E[ ∑i (xi − x̄) ui / ∑i (xi − x̄)² ]
(DPE, KCL) W2 16 September 2024 35 / 46


OLS assumptions

Outline

1. Ordinary Least Squares: regression analysis

2. OLS assumptions

(DPE, KCL) W2 16 September 2024 36 / 46




OLS assumptions

Gauss-Markov Assumptions

▶ Standard assumptions about the population model.

1. Linear in parameters:
▶ yi = β0 + β1 xi + ui , but NOT yi = β0 + β1² xi + ui
▶ We can have non-linear variables, e.g., β0 + β1 xi² + ui
▶ For example, the Mincer equation is typically used to estimate the effect of education and experience on earnings:

  incomei = β0 + β1 Educationi + β2 Experiencei + β3 Experiencei² + εi

(DPE, KCL) W2 16 September 2024 37 / 46




OLS assumptions

Gauss-Markov Assumptions

2 Random sampling:
▶ The sample data is representative (i.e., randomly drawn from the population).
▶ Since we’ll obtain different estimators β̂ for each random draw of n, the β̂’s are random variables, with expected values and standard errors.
▶ This allows for hypothesis testing:
▶ With sample size sufficiently large, the central limit theorem tells us
that ´ˆ will follow a t-distribution.
▶ How can we test for random sampling?

(DPE, KCL) W2 16 September 2024 38 / 46




OLS assumptions

Gauss-Markov Assumptions

3 No perfect co-linearity:
▶ Cannot perfectly predict any independent variable with a (linear)
combination of others
▶ For example:

  yi = β0 + β1 Educationi + β2 Agei + β3 Experiencei + β4 Experiencei² + ui

▶ If experience is calculated as Agei − Educationi , we cannot calculate the OLS estimates.
▶ The X matrix will not be "full rank", which means we’d be essentially dividing by zero when estimating the β vector.
▶ A common occurrence is the "dummy variable trap":
▶ Dummy variables: di ∈ {0, 1}

(DPE, KCL) W2 16 September 2024 39 / 46




OLS assumptions

Gauss-Markov Assumptions

4 Zero conditional mean:


▶ The error term in regression model, ui , should be zero, conditional on
controlling for x variables:
E (u | x1 , x2 , ..., xk ) = 0

▶ Proof: Expected value of OLS estimator.


▶ What happens when an extra variable is added that isn’t needed?

(DPE, KCL) W2 16 September 2024 40 / 46


OLS assumptions

Gauss-Markov Assumptions

5 No Homoskedasticity:
▶ The variance in the error term, ui , given the independent variables is
not correlated with any X variable.
Var (u | x1 , x2 , ..., xk ) = Ã2

(DPE, KCL) W2 16 September 2024 41 / 46


OLS assumptions

Gauss-Markov Assumptions

5 No Homoskedasticity:
▶ The variance in the error term, ui , given the independent variables is
not correlated with any X variable.
Var (u | x1 , x2 , ..., xk ) = Ã2

▶ Where Ã2 is the variance of ui

(DPE, KCL) W2 16 September 2024 41 / 46


OLS assumptions

Gauss-Markov Assumptions

5 Homoskedasticity:
▶ The variance of the error term, ui , given the independent variables, is constant and not correlated with any X variable:

  Var(u | x1 , x2 , ..., xk) = σ²

▶ Where σ² is the variance of ui
▶ When this condition is not met, OLS estimates are still unbiased, but the standard errors will potentially be biased.

(DPE, KCL) W2 16 September 2024 41 / 46


OLS assumptions

OLS is "BLUE"

▶ Under G-M assumptions, OLS is "BLUE":


▶ Best
▶ Linear
▶ Unbiased
▶ Estimator

(DPE, KCL) W2 16 September 2024 42 / 46


OLS assumptions

Goodness of fit

▶ The regression "explains"


variation in dependent variable:
▶ Compare difference between
observations and mean
(yi − ȳ ) with the residuals:
predicted values are closer!

(DPE, KCL) W2 16 September 2024 43 / 46


OLS assumptions

Goodness of fit
▶ Total sum of squares (n x sample variance)
n
SStot = ∑ (yi − ȳ )2
i −1

▶ Explained variation: the regression sum of squares


n
SSreg = ∑ (ŷi − ȳ )2
i −1

▶ where ŷi = β̂0 + β̂1 xi is the predicted value of yi .


▶ The proportion of variation explained by the regression is:

  R² = (Variation of estimated ŷ) / (Total variation of observed y)
SSreg
R2 = (5)
SStot
(DPE, KCL) W2 16 September 2024 44 / 46
OLS assumptions

Goodness of fit

▶ The unexplained variation in y is the residual sum of squares: (aka


sum of squared errors).
n
SSres = ∑ (ŷi − yi )2 (6)
i −1

▶ The R 2 can also be defined as:

SSres
R2 = 1 − (7)
SStot

(DPE, KCL) W2 16 September 2024 45 / 46


OLS assumptions

R-square properties

▶ R 2 ranges from 0 to 1.
▶ If R 2 = 0 =⇒ The OLS regression does not explain any variation in

the values of y
▶ If R 2 = 1 =⇒ The OLS regression explains all the variation of y.
▶ For regressions with just one independent variable (bivariate
regression), the r-square is the square of the correlation coefficient.

(DPE, KCL) W2 16 September 2024 46 / 46


Endogeneity and ID strategies
Instrumental Variables

Dr. Ian Levely

18 March 2025

(DPE, KCL) Econometrics Session 2 18 March 2025 1 / 33


(DPE, KCL) Econometrics Session 2 18 March 2025 2 / 33
Estimating returns to education

▶ Say that we want to estimate the following equation,


incomei = β 0 + β 1 educationi + β 2 agei + β 3 agei2 + ui

▶ If we find that β 1 > 0 at the 95% confidence level, can we conclude


that education causes an increase in income?

(DPE, KCL) Econometrics Session 2 18 March 2025 3 / 33


Reverse causality

(DPE, KCL) Econometrics Session 2 18 March 2025 4 / 33


Omitted variables

(DPE, KCL) Econometrics Session 2 18 March 2025 5 / 33


Endogeneity

1. Omitted variable bias: another (uncontrolled) variable is affecting both


the treatment and outcome.
▶ This violates G-M assumption that error terms have zero conditional
mean: E [ui |X ] = 0.
2. Reverse causality: ’outcome’ causes the treatment.
3. Simultaneity: variables in the model cause each other, which means it’s impossible to separate the effect of one from the other.

(DPE, KCL) Econometrics Session 2 18 March 2025 6 / 33


OVB

▶ If the Gauss-Markov conditions are met, the estimator β̂ 1 is unbiased.


This means that

E ( β̂ 1 ) = β 1 when N −→ ∞ and Cov (Xi , ui ) = 0

(DPE, KCL) Econometrics Session 2 18 March 2025 7 / 33


Correlation between education and income:

▶ Do any of these apply here? Or can we interpret the correlation as


causal?
▶ Individuals self-select into education, based on:
▶ Ability, hi
▶ Socio-economic background, wi

incomei = β 0 + β 1 educi + β 2 agei + β 3 agei2 + ui

ui = δ1 hi + δ2 wi + vi

▶ When:
▶ δ1 ̸= 0
▶ cov (hi , educationi ) ̸= 0,

(DPE, KCL) Econometrics Session 2 18 March 2025 8 / 33


Correlation between education and income:

▶ What about adding control variables for standardized test scores (as a
proxy for ability) and parents’ income as a proxy for socio-economic
background.

inci = β 0 + β 1 educi + β 2 agei + β 3 agei2 + β 4 hi + β 5 wi + ui

What is this solving?

(DPE, KCL) Econometrics Session 2 18 March 2025 9 / 33


Simultaneity

inci = β 0 + β 1 educi + β 2 agei + β 3 agei2 + β 4 hi + β 5 wi + ui

▶ How do hi , wi , educi relate to one another?


▶ Plausibly:
educi = γ0 + γ1 hi + γ2 wi + vi ,

▶ This means that our estimate of β̂ 1 will measure the effect of both
education itself in addition to the effect of hi and wi , and thus, our
model is not specified correctly.

(DPE, KCL) Econometrics Session 2 18 March 2025 10 / 33


Simultaneious equations

yi = β 0 + β 1 xi + β 2 w i + u i

▶ and at the same time:


wi = γ0 + γ1 xi + vi
▶ β2 is capturing some effect of xi as well as the direct effect of wi .
▶ We can’t solve a system of simultaneous equations unless there’s 1
less equation than exogenous variables.

(DPE, KCL) Econometrics Session 2 18 March 2025 11 / 33


Simultaneous equations — reverse causality

yi = β 0 + β 1 x i + u i

▶ and at the same time:


xi = γ0 + γ1 yi + vi

yi = β 0 + γ1 yi + vi
▶ ui will be correlated with xi

(DPE, KCL) Econometrics Session 2 18 March 2025 12 / 33


Sources of exogenous variation?

▶ When randomizing treatment is


difficult, one approach is to look
for other exogenous variation:
natural experiment.

(DPE, KCL) Econometrics Session 2 18 March 2025 13 / 33


Sources of exogenous variation?

▶ When randomizing treatment is


difficult, one approach is to look
for other exogenous variation:
natural experiment:
▶ The Vietnam draft randomly
assigned some male citizens to
serve in the US armed forces
through a lottery, BUT, students
were exempt.
▶ This provides exogenous
variation in schooling – even
though not all those drafted
received more education.

(DPE, KCL) Econometrics Session 2 18 March 2025 14 / 33


Natural Experiment

(DPE, KCL) Econometrics Session 2 18 March 2025 15 / 33


The reduced form regression

▶ Say we have the following system of equations:

  yi = β0 + β1 xi + β2 wi + ui

  xi = γ0 + γ1 wi + γ3 zi + ui
▶ Where cor (xi , wi ) ̸= 0 and β 2 ̸= 0, which means we have
simultaneity.
▶ However, z is exogenous to wi and yi , but predicts x: cor (zi , xi ) ̸= 0



The reduced form regression

▶ We can replace xi with zi and estimate α1 instead of β 1 :

yi = α 0 + α 1 zi + α 2 wi + ui

▶ Since zi does not cause yi directly – only through xi – we can
attribute this effect to xi .
▶ This will be unbiased, but will have much more "noise".
▶ This is the reduced form regression (a short derivation below shows
why it’s called this).
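
Substituting the equation for xi into the equation for yi (using the system on the previous slide) makes the link between α1 and β 1 explicit, and shows where the name "reduced form" comes from:

    % Plug x_i = gamma_0 + gamma_1 w_i + gamma_3 z_i + v_i into
    % y_i  = beta_0 + beta_1 x_i + beta_2 w_i + u_i :
    \[
      y_i = (\beta_0 + \beta_1\gamma_0)
            + \underbrace{\beta_1\gamma_3}_{\alpha_1}\, z_i
            + \underbrace{(\beta_2 + \beta_1\gamma_1)}_{\alpha_2}\, w_i
            + (u_i + \beta_1 v_i)
    \]
    % The structural system has been "reduced" to an equation involving only the
    % exogenous variables; alpha_1 = beta_1 * gamma_3, so beta_1 can be backed out
    % as alpha_1 / gamma_3 -- the idea behind indirect least squares and 2SLS.
    % (The reduced-form error is the composite u_i + beta_1 v_i, written simply
    % as u_i on the slide above.)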



Instrumental variable approach



IV with a single regressor and a single instrument

Yi = β 0 + β 1 Xi + ui

▶ Where corr (Xi , ui ) ≠ 0, so E [ β̂ 1 ] ≠ β 1


▶ IV regression breaks X into two parts: a part that might be correlated
with ui , and a part that is not.



What makes a good IV?

▶ Instrument relevance: the IV is highly correlated with the
endogenous variable of interest, cov(zi , xi ) ≠ 0.
▶ Instrument exogeneity:
▶ The IV is uncorrelated with the error term, that is cov(zi , ui ) = 0. This
also means that the IV must not be a determinant of the dependent
variable.
▶ It must also meet the "exclusion restriction" – it cannot predict y
independently of x.



Estimation of IV: 2-Stage Least Squares

▶ As it sounds, 2SLS has two stages, that is, you need to estimate two
regressions:
▶ STAGE 1
1. Isolate the part of X that is uncorrelated with u by regressing X on Z
using OLS:
Xi = α 0 + α 1 Zi + vi

2. Using α̂0 and α̂1 , we predict values of Xi :

X̂i = α̂0 + α̂1 Zi

▶ Because Zi is uncorrelated with ui , then cor (X̂i , ui ) = 0.



Estimation: 2SLS

STAGE 2
▶ We then replace Xi with X̂i in our original model:

Yi = β̂ 0 + β̂ 1 X̂i + ûi

▶ Because cor (X̂i , ui ) = 0, the zero conditional mean assumption holds.
▶ The new estimator is called the Two Stage Least Squares (2SLS)
estimator, β̂ 1,2SLS , and it is a consistent estimator of the population
parameter β 1 (a short worked example on simulated data follows below).
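
The following is a minimal sketch of the two stages on simulated data (added for illustration; the data-generating process and variable names are assumptions, and in practice you would use a dedicated IV routine so that the standard errors are computed correctly):

    import numpy as np
    import statsmodels.api as sm

    # Simulated data: x is endogenous (correlated with u), z is a valid instrument
    rng = np.random.default_rng(0)
    n = 5_000
    z = rng.normal(size=n)                          # instrument: exogenous and relevant
    u = rng.normal(size=n)                          # structural error
    x = 0.8 * z + 0.5 * u + rng.normal(size=n)      # cov(x, u) != 0, so OLS is biased
    y = 1.0 + 2.0 * x + u                           # true beta_1 = 2

    # Naive OLS of y on x: biased because cov(x, u) != 0
    ols = sm.OLS(y, sm.add_constant(x)).fit()

    # Stage 1: regress x on z and keep the fitted values
    stage1 = sm.OLS(x, sm.add_constant(z)).fit()
    x_hat = stage1.fittedvalues
    print("first-stage F-statistic:", stage1.fvalue)

    # Stage 2: regress y on the fitted values; the slope is the 2SLS estimate of beta_1
    stage2 = sm.OLS(y, sm.add_constant(x_hat)).fit()
    print("OLS estimate :", ols.params[1])          # biased upwards here
    print("2SLS estimate:", stage2.params[1])       # close to the true value of 2
    # NB: the standard errors reported by stage2 are NOT valid 2SLS standard errors.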



2SLS with a single endogenous regressor

▶ First 2SLS stage:


▶ Regress X1 on all the exogenous variables: W1 , . . . , Wr , Z1 , . . . , Zm ,
and an intercept, by OLS.
▶ Compute predicted values X̂1i , i = 1, . . . , n

▶ Second 2SLS stage:


▶ Regress Y on X̂1 , W1 , . . . , Wr , and an intercept, by OLS.
▶ The coefficients from this second stage regression are the 2SLS
estimators, but SEs are wrong: need to get covariance matrix from 1st
and 2nd stages to correct (automatic in Stata).
▶ In simple 2SLS, the vector of exogenous variables should have exactly
one more variable than the vector of endogenous variables – otherwise
the model is over-identified.



Effects of education: OLS and IV regression (2SLS)



Checking Instrument Validity: Summary

▶ Relevance of Instrument
▶ If the instrument is weak, then the 2SLS estimator is biased and the
t-statistic has a non-normal distribution.
▶ To check for weak instruments with a single included endogenous
regressor, check the first-stage F-statistic (a common rule of thumb is
that F > 10 suggests the instrument is not weak).



Checking Instrument Validity: Exogeneity

▶ Exogeneity of instrument
▶ All the instruments are uncorrelated with the error term:

corr (Z1i , ui ) = 0, . . . , corr (Zmi , ui ) = 0

▶ Remember, we can never prove that a regression coefficient is zero, so
ultimately we assume (using theory) that an instrument is exogenous.
▶ How to prove the exclusion restriction (i.e., that zi does not have an
independent effect on yi )?



Illustrations

▶ Rainfall may determine a country’s capacity to develop economically
(see work by Jeffrey Sachs), but rainfall does not necessarily explain
variations in levels of Aid. The level of rainfall in a country may be
considered "as if randomly assigned".
▶ Distance from the Equator may also determine economic
development (Sachs, again...), but it is exogenous to variations in
levels of Aid. In this case, being further from the Equator can also be
considered "as if randomly assigned".



IV and experiments
▶ ITT (Intention-to-treat) estimator: compare everyone who was (randomly)
assigned to the treatment with those who weren’t, e.g. ȳT − ȳC .
▶ Related to "effectiveness" in medical lit.
▶ Sometimes not all subjects in the treatment group receive treatment
(attrition, compliance, etc...)
▶ ATT (Average Treatment effect on the Treated) estimator: how did
the treatment affect the outcome amongst those who actually received
the treatment.
ATT = E [yt − yc |T = 1]

▶ We don’t observe E [yc |T = 1], so we need to estimate it using a statistical
technique like IV regression, with (random) treatment assignment as the
instrument (see the Wald estimator sketch below).
▶ Related to "efficacy" in medical lit – the effect of the treatment under
ideal circumstances.
▶ ATN (Average Treatment effect on the Non-treated):
ATN = E [yt − yc |T = 0]
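
A common way to recover the ATT in this setting (added as an illustration; here Z denotes random assignment and D actual receipt of the treatment, which differs from the T notation above) is the Wald / IV estimator, which rescales the ITT by the difference in take-up:

    \[
      \hat{\beta}_{IV}
      \;=\;
      \frac{E[\,y \mid Z=1\,] - E[\,y \mid Z=0\,]}
           {E[\,D \mid Z=1\,] - E[\,D \mid Z=0\,]}
      \;=\;
      \frac{\text{ITT effect on the outcome}}{\text{ITT effect on take-up}}
    \]
    % With one-sided non-compliance (no one assigned to control is treated),
    % this equals the ATT; more generally it is a local average treatment effect (LATE).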
Endogeneity: identification strategy

▶ In econ we usually assume that variables are endogenous by default.


▶ An “identification strategy” is a way of finding exogenous variation
in the explanatory variable of interest in order to test for a causal
relationship.



Economic development and institutional quality

▶ A famous example: Acemoglu, Johnson and Robinson.


▶ The model that AJR estimates is the following:

GDP/cap = α1 + β 1 Insti + λK QKi + ui

where Inst refers to institutions and QKi refers to the set of control
variables (Colonial past, Geography, Executive constraints...).
▶ They are interested in identifying the causal link between Inst and
GDP/cap, which is captured by β 1 .



Economic development and institutional quality...or the
other way round!

▶ AJR’s baseline model implies that Institutions are exogenous to
GDP/cap.
GDP/cap = α1 + β 1 Insti + λK QKi + ui
▶ Is this realistic?
▶ Think of the following equation:

Inst = α2 + β 2 GDP/capi + λK QKi + ui



Economic development and institutional quality

▶ AJR face a simultaneous equation model, which implies reverse
causality.
▶ The equation of interest is the first one but one needs to consider the
second equation in order to estimate the effect of institutions on
GDP/cap.
▶ How do you do that? Use Instrumental Variables.



Economic development and institutional quality

▶ 2SLS estimates, Z=Distance from the Equator (m=1)


▶ Is that a good instrument?

