University of Amsterdam
Week 1. Lecture 2
February, 2024
1 / 57
Overview
2 / 57
The plan for today
3 / 57
Recap: Linear model
In the first lecture, we considered the simple linear model with a single
regressor xi
yi = α + βxi + εi , i = 1, . . . , n. (1)
We used this model to understand the determinants of hotel prices in
Vienna. Unfortunately, we did not provide any motivation for why the linear
model is a useful starting point for empirical analysis. More on this today.
4 / 57
Recap: OLS estimator
We used the sample data {(yi , xi )}, i = 1, . . . , n, to construct statistics that can be used
as estimates of (α, β). For this purpose, we considered the Ordinary Least
Squares (OLS) objective function.
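As a reminder (the objective function itself is not reproduced on this slide), the OLS
estimates minimize the sum of squared residuals

S(a, b) = Σi (yi − a − b xi )² , (α̂, β̂) = argmin(a,b) S(a, b),

which yields β̂ = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)² and α̂ = ȳ − β̂ x̄ (cf. Eqs. (32) and (36)
later in this lecture).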
5 / 57
1. The Art of Econometrics
6 / 57
1.1. Econometric analysis
7 / 57
Structural and reduced form model analysis
8 / 57
Microeconometrics vs. Macroeconometrics
Within financial econometrics, tools from both micro- and macroeconometrics are
used (as well as separate tools unique to financial datasets). Applications can be
aggregate (e.g. fluctuations of a stock market index) as well as disaggregated
(e.g. fluctuations of individual stocks within the same stock exchange).
9 / 57
Microeconometrics vs. Macroeconometrics. Examples.
Micro:
▶ Does attending an elite college bring a payoff in expected lifetime income
sufficient to justify the higher tuition?
▶ Does an increase in the minimum wage lead to reduced employment?
▶ Do smaller class sizes bring real benefits in student performance?
▶ Does the religious belief of your GP have an impact on the choice of birth
control? What is the effect on lifetime income?
Macro:
▶ Does a monetary policy regime that is strongly oriented towards
controlling inflation impose real costs in terms of lost output?
▶ What are the effects of monetary policy when interest rates are
negative?
▶ Why did the volatility of inflation go down in the 1990s?
10 / 57
The model or a model? Causal or non-causal analysis?
As we will learn in this course, the distinction between causal and non-causal
models usually comes from the assumptions we impose on the distribution of
εi and (x1,i , . . . , xK,i ).
11 / 57
1.2. Out-of-sample (counterfactual) predictions
12 / 57
The R² is not everything
13 / 57
The counterfactual predictions
yi = α + βxi + εi , (4)
14 / 57
The Vienna hotels example.
Binary:
ŷ(D) = 92.09 + 34.67 D, (6)
for D = 1(distance < 2 km).
Continuous:
ŷ(distance) = 118.73 − 3.44 distance. (7)
Can we compare the usefulness of the two models in terms of their
predictions?
15 / 57
Counterfactual (thought) experiment.
The binary model suggests the same average per-night price in both cases:
The continuous variable model, on the other hand, gives two predictions:
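A minimal numerical sketch of this experiment (assuming, since the slide does not
state it explicitly, that the two cases are a hotel at 2 km and at 5 km from the center):

# Predictions from Eqs. (6) and (7); the distances 2 km and 5 km are an illustrative assumption.
def price_binary(distance):
    D = 1 if distance < 2 else 0        # D = 1(distance < 2 km)
    return 92.09 + 34.67 * D

def price_continuous(distance):
    return 118.73 - 3.44 * distance

for d in (2, 5):
    print(d, "km:", price_binary(d), "eur vs.", round(price_continuous(d), 2), "eur")
# Binary model: 92.09 eur in both cases; continuous model: 111.85 eur and 101.53 eur.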
16 / 57
Which one is better?
Benefits of the continuous model:
▶ The model with the continuous distance variable gives different predictions that
are more in line with common sense.
▶ The model also avoids the boundary problem at 2 km, as the difference
between the predictions for 1.99 km and 2.00 km is 0.08 eur.
▶ All this despite a worse R²!
Benefits of the binary model:
▶ Hotels in the city center can be very heterogeneous (think of an IBIS Budget
versus a Conrad Hilton). If you are only interested in predicting prices outside
the 2 km radius, perhaps only the information about hotels outside that radius
is relevant. For 2 km and 5 km, the binary model predicts using only the data
from outside 2 km.
▶ Better in-sample R².
Summary: if you want to choose a good model for decision making,
you have to take in-sample, out-of-sample, as well as
non-statistical considerations into account.
17 / 57
Interpolation vs. extrapolation
Do you think both models would still provide useful predictions? Why?
18 / 57
2. Why OLS and linear model? (Advanced
material)
19 / 57
2.1. Structural motivation of linear models
20 / 57
Binary regressor as a motivation of linear model (Reduced
form thinking)
From there:
yi = µ0 + Di β + εi , (12)
with εi ≡ yi − E[yi |Di ]. Hence the linear model is a natural (or, as we usually call it,
non-parametric) model for the conditional mean of yi given the binary variable Di .
Linearity in (12) is therefore not a restriction at all!
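The step compressed into “From there” is presumably the following: writing
µd ≡ E[yi | Di = d] for d ∈ {0, 1}, the conditional mean is exactly linear in Di ,

E[yi | Di ] = µ0 (1 − Di ) + µ1 Di = µ0 + Di (µ1 − µ0 ),

so (12) holds with β = µ1 − µ0 and, by construction, E[εi | Di ] = 0.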
21 / 57
Coefficient β
For example, if yi were your health status, while Di indicates whether you
received a certain drug, then the coefficient β measures the Average
Treatment Effect (ATE) in the (infinite) population.
In the course Empirical Project, you will learn more about the estimation of
ATEs and the Average Treatment Effect on the Treated (ATT).
22 / 57
Linearized economic model as a motivation (Structural
form thinking)
23 / 57
2.2. Population linear projection
24 / 57
Population-level decomposition
Take two general random variables (yi , xi ) with means µy , µx , and the
covariance σxy .
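The construction below presumably uses the (standard) definitions

β0 ≡ σxy / σx² (with σx² ≡ Var(xi )), α0 ≡ µy − β0 µx , ei ≡ yi − α0 − β0 xi ,

so that E[ei ] = 0 by construction; the next slide then verifies that Cov(xi , ei ) = 0.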
25 / 57
Population-level decomposition
Cov(xi , yi − (σxy /σx²)xi ) = Cov(xi , yi ) − (σxy /σx²) Cov(xi , xi )
                            = Cov(xi , yi ) − Cov(xi , yi )
                            = 0.
26 / 57
Population-level decomposition
What has just happened here? From the first two moments of (yi , xi ) we
“invented” a linear model:
yi = α0 + β0 xi + ei , (19)
27 / 57
Population linear projection
In this equation
yi = α0 + β0 xi + ei , (21)
you can think of α0 + β0 xi as the fitted (population) value of yi on xi , and
ei as the population residual (or the error term). Equations like Eq. (21) are
usually referred to as Linear Projections in an L2 space.
28 / 57
Population linear projection vs. causal model
For
yi = α0 + β0 xi + εi , (22)
to serve as a proper causal model for the first two moments of yi , it is
generally required that E[εi xi ] = 0 (which is not an assumption at all) is
replaced by the stronger assumption that E[εi | xi ] = 0.
29 / 57
Special case. Binary regressor.
yi = µ0 + Di (µ1 − µ0 ) + εi , (24)
with E[εi |Di ] = 0 for both Di = 1 and Di = 0. Hence, for models with a
binary regressor the population linear projection and the population
conditional expectation are the same.
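A quick check (not shown on the slide) that the projection coefficient indeed equals
µ1 − µ0 when Di is binary with P(Di = 1) = p: Cov(Di , yi ) = p(1 − p)(µ1 − µ0 ) and
Var(Di ) = p(1 − p), so β0 = Cov(Di , yi )/Var(Di ) = µ1 − µ0 , matching Eq. (24).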
30 / 57
2.3. Population Least-squares
31 / 57
Linear projection as a solution to optimization problem
32 / 57
Population LS objective function
33 / 57
Population LS objective function
Hence, the linear projection coefficients (α0 , β0 ) that implicitly define (in
population) the linear model
yi = α + βxi + εi , (27)
are precisely the minimizers of the population least-squares objective function.
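Spelled out (a standard result, reconstructed here rather than copied from the slide):

(α0 , β0 ) = argmin(a,b) E[(yi − a − b xi )²],

and the first-order conditions E[yi − a − b xi ] = 0 and E[xi (yi − a − b xi )] = 0 deliver
exactly β0 = σxy /σx² and α0 = µy − β0 µx from Section 2.2.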
34 / 57
3. Finite sample statistical properties
35 / 57
3.1. Classical assumptions
36 / 57
Statistical properties
Given that (α̂, β̂) are two statistics, we want to use standard statistical
measures to understand the quality of the OLS estimator. In particular, we
might want to investigate:
▶ Whether the estimator is biased/unbiased for the true values (α0 , β0 ),
for any fixed value of n.
▶ Whether the estimator is consistent (more about this in Week 4) for
the true values (α0 , β0 ).
▶ Whether it has the lowest variance and/or root mean squared error
(for any fixed value of n) within a certain class of estimators.
In this lecture, we will tackle points 1 and 3. These are the finite sample (n
fixed) properties of the OLS estimator.
37 / 57
Assumptions (slightly modified from the ones in the book)
To answer those points, we need to impose some restrictions on the
probabilistic model that generated {(yi , xi )}, i = 1, . . . , n.
Assumption 1 (Fixed regressors/Exogeneity). The n observations on the
explanatory variable x1 , . . . , xn can be treated as fixed
numbers (i.e. one can condition on them). They satisfy
Σi (xi − x̄)² > 0.
Assumption 2 (Random disturbances). The n disturbances/error terms
ε1 , . . . , εn are random variables with
E[εi |xi ] = E[εi |x1 , . . . , xn ] = 0.
Assumption 3 (Homoscedasticity). The variances of ε1 , . . . , εn exist and are
all equal: E[εi² |xi ] = E[εi² |x1 , . . . , xn ] = σ0² > 0.
Assumption 4 (Independence). ε1 , . . . , εn are independent after
conditioning on x1 , . . . , xn .
Assumption 5 (Fixed parameters). The true parameters α0 , β0 , σ0² are fixed
unknown parameters.
Assumption 6 (Linear model). The data on y1 , . . . , yn have been generated
by:
yi = α0 + β0 xi + εi . (28)
Assumption 7 (Normality). The disturbances ε1 , . . . , εn are jointly normally
distributed conditionally on x1 , . . . , xn .
38 / 57
Discussion. Assumptions 1 and 2.
Among other things, E[εi |xi ] = 0 implies that E[εi xi ] = 0. Hence, this
assumption is indeed stronger than (and informative beyond) the linear
projection condition we discussed previously.
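Indeed, by the law of iterated expectations, E[εi xi ] = E[ xi E[εi | xi ] ] = E[xi · 0] = 0,
while the converse implication does not hold in general.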
39 / 57
Discussion. Assumptions 2,3,4.
40 / 57
Discussion. Assumptions 5-6.
41 / 57
Discussion. Assumption 7.
42 / 57
3.2. Bias of the OLS estimator
43 / 57
OLS is unbiased
Let (xi , yi ) ∼ P(x, y |α0 , β0 , . . .) (so random variables are drawn from some
distribution with (among others) parameters α0 and β0 ) for all i = 1, . . . , n.
Then we say that the statistics α̂ and β̂ are unbiased for α0 and β0 if:
E[α̂] = α0 , E[β̂] = β0 . (31)
44 / 57
OLS estimator of β
Recall that

β̂ = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)² . (32)

Using Assumption 6 we replace yi with yi = α0 + β0 xi + εi :

β̂ = β0 Σi (xi − x̄)(xi − x̄) / Σi (xi − x̄)² + Σi (εi − ε̄)(xi − x̄) / Σi (xi − x̄)²
  = β0 + Σi εi (xi − x̄) / Σi (xi − x̄)² .

Next consider:

E[β̂ | x1 , . . . , xn ] = E[β0 | x1 , . . . , xn ] + E[ Σi εi (xi − x̄) / Σi (xi − x̄)² | x1 , . . . , xn ]. (33)
45 / 57
By using the definition of conditional expectations and Assumption 2:

E[ Σi εi (xi − x̄) / Σi (xi − x̄)² | x1 , . . . , xn ] = Σi E[εi | x1 , . . . , xn ] (xi − x̄) / Σi (xi − x̄)²
                                                  = 0.

From here:

E[β̂ | x1 , . . . , xn ] = E[β0 | x1 , . . . , xn ] = β0 , (34)

and by the law of iterated expectations:

E[β̂] = E[ E[β̂ | x1 , . . . , xn ] ] = β0 . (35)
46 / 57
OLS estimator of α is also unbiased.
What about α̂? Note that:

α̂ = ȳ − β̂ x̄. (36)

Hence:

α̂ = α0 + β0 x̄ + ε̄ − β̂ x̄. (37)

From here:

E[α̂ | x1 , . . . , xn ] = α0 + x̄ E[(β0 − β̂) | x1 , . . . , xn ] + E[ε̄ | x1 , . . . , xn ]
                       = α0 + x̄ E[(β0 − β̂) | x1 , . . . , xn ] + 0
                       = α0 + 0 + 0 = α0 .

Here the first line follows by the linearity of expectations; the second line by
Assumption 2; the third line from the fact that E[β̂ | x1 , . . . , xn ] = β0 .
47 / 57
3.3. Variance of the OLS estimator
48 / 57
Variance of β̂
Using similar steps as before, we can derive the (conditional) variance of the
OLS estimator β̂:

E[(β̂ − β0 )² | x1 , . . . , xn ] = E[ ( Σi εi (xi − x̄) / Σi (xi − x̄)² )² | x1 , . . . , xn ]
                               = ( 1 / (Σi (xi − x̄)²)² ) E[ ( Σi εi (xi − x̄) )² | x1 , . . . , xn ].
49 / 57
Next we can expand:

E[ ( Σi εi (xi − x̄) )² | x1 , . . . , xn ] = E[ Σi Σj εi εj (xi − x̄)(xj − x̄) | x1 , . . . , xn ]
                                         = E[ Σi εi² (xi − x̄)² | x1 , . . . , xn ]
                                         = σ0² Σi (xi − x̄)² ,

where the second equality uses Assumption 4 (the cross terms vanish) and the
third uses Assumption 3.
50 / 57
To summarize:

E[(β̂ − β0 )² | x1 , . . . , xn ] = σ0² / Σi (xi − x̄)² . (38)
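A minimal simulation sketch (not part of the slides) that can be used to check
unbiasedness and Eq. (38); all parameter values below are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
alpha0, beta0, sigma0, n, reps = 1.0, 2.0, 1.5, 50, 20000

x = rng.uniform(0.0, 10.0, size=n)           # regressors held fixed across replications (Assumption 1)
sxx = np.sum((x - x.mean()) ** 2)

betas = np.empty(reps)
for r in range(reps):
    eps = rng.normal(0.0, sigma0, size=n)    # Assumptions 2-4 and 7
    y = alpha0 + beta0 * x + eps             # Assumption 6
    betas[r] = np.sum((x - x.mean()) * (y - y.mean())) / sxx

print(betas.mean())                  # close to beta0 = 2.0 (unbiasedness)
print(betas.var(), sigma0**2 / sxx)  # simulated variance close to Eq. (38)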
51 / 57
Variance of α̂
As before, expand:

α̂ = α0 + β0 x̄ + ε̄ − β̂ x̄. (39)

Hence:

E[ε̄² | x1 , . . . , xn ] = σ0² / n, (40)

E[((β̂ − β0 ) x̄)² | x1 , . . . , xn ] = x̄² σ0² / Σi (xi − x̄)² . (41)
52 / 57
As for the covariance term, note:
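Presumably the slide shows that the covariance term vanishes: using Assumptions 2-4,

Cov(ε̄, (β̂ − β0 ) x̄ | x1 , . . . , xn ) = x̄ σ0² Σi (xi − x̄) / ( n Σi (xi − x̄)² ) = 0,

since Σi (xi − x̄) = 0. Combining this with (40) and (41) gives
Var(α̂ | x1 , . . . , xn ) = σ0² ( 1/n + x̄² / Σi (xi − x̄)² ).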
53 / 57
OLS is actually BLUE
Among all linear unbiased estimators, i.e. estimators of the form α̃ = Σi ai yi and
β̃ = Σi bi yi for some weights {ai } and {bi } that are a function of the data {xi } only,
the OLS estimator has the smallest (conditional) variance. This is the celebrated
Gauss–Markov theorem.
54 / 57
4. Summary
55 / 57
Summary today
In this lecture:
▶ We argued how to make out-of-sample (counterfactual) predictions
based on the OLS regression results.
▶ We suggested that a good model need not always be the one that
maximizes/minimizes statistical measures.
▶ We provided motivation for the OLS objective function.
▶ We studied the unbiasedness and variance properties of the OLS estimator
with a single regressor.
56 / 57
Next week
57 / 57