Week01 RegressionWithPanelDataPart1
Osman DOGAN
Outline Review of Econometrics I Panel Data Fixed Effects Next Time References
Today's outline:
1 Review of Econometrics I
2 Panel data
3 Regression with entity fixed effects
4 Regression with time fixed effects
Readings:
1 Stock and Watson (2020, Chapter 10).
2 Hanck et al. (2021, Chapter 10).
What is econometrics?
Definition 1
Econometrics is the science and art of using economic theory and statistical
techniques to analyze economic data.
Econometric models
We covered the single and multiple linear regression models. The multiple
linear regression model is stated as
Yi = β0 + β1 X1i + β2 X2i + . . . + βk Xki + ui , i = 1, 2, . . . , n. (1)
We considered this model under the following assumptions:
Assumption 1 (Zero conditional mean assumption)
The conditional distribution of ui given X1i , . . . , Xki has mean zero, that is,
E (ui |X1i , X2i , . . . , Xki ) = 0 for i = 1, 2, . . . , n.
Assumption 2
(X1i , X2i , . . . , Xki , Yi ), i = 1, 2, . . . , n, are i.i.d. draws from their joint
distribution.
Assumption 3
Large outliers are unlikely: X1i , . . . , Xki and Yi have finite fourth moments.
Assumption 4
There is no perfect multicollinearity.
If the error term includes a variable that is correlated with the regressor X,
then E(ui |Xi ) ≠ 0 and the zero conditional mean assumption (Assumption 1) fails.
The control variable approach is one way to retain a causal interpretation when
the zero conditional mean assumption fails: it replaces that assumption with
conditional mean independence.
Conditional mean independence means that, given the control variable, the
mean of ui does not depend on the variable of interest.
Consider Yi = β0 + β1 Xi + β2 Wi + ui , where
1. Xi is the variable of interest,
2. Wi is an effective control variable, so that E (ui |Xi , Wi ) = E(ui |Wi ).
Then, we have the following results:
(a) β1 has a causal interpretation.
(b) The OLS estimator β̂1 is unbiased.
(c) The OLS estimator β̂2 is in general biased.
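Results (b) and (c) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not part of the original lecture: the data-generating process, the parameter values (β1 = 2, β2 = 0.5, and E(ui |Wi ) = 1.5 Wi ) and the function names are all hypothetical.

```python
import random

def ols_two_regressors(y, x, w):
    """OLS slopes from regressing y on a constant, x and w (2x2 normal equations)."""
    n = len(y)
    my, mx, mw = sum(y) / n, sum(x) / n, sum(w) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sww = sum((b - mw) ** 2 for b in w)
    sxw = sum((a - mx) * (b - mw) for a, b in zip(x, w))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    swy = sum((b - mw) * (c - my) for b, c in zip(w, y))
    det = sxx * sww - sxw ** 2
    b1 = (sww * sxy - sxw * swy) / det
    b2 = (sxx * swy - sxw * sxy) / det
    return b1, b2

def simulate(n=20000, seed=0):
    """Draw a sample in which conditional mean independence holds by construction."""
    rng = random.Random(seed)
    y, x, w = [], [], []
    for _ in range(n):
        wi = rng.gauss(0, 1)
        xi = 0.8 * wi + rng.gauss(0, 1)   # X is correlated with W
        ui = 1.5 * wi + rng.gauss(0, 1)   # E(u | X, W) = E(u | W) = 1.5 W
        y.append(1.0 + 2.0 * xi + 0.5 * wi + ui)
        x.append(xi)
        w.append(wi)
    return ols_two_regressors(y, x, w)

b1_hat, b2_hat = simulate()
print(f"beta1_hat = {b1_hat:.3f} (true beta1 = 2.0)")
print(f"beta2_hat = {b2_hat:.3f} (true beta2 = 0.5)")
```

Under these assumptions β̂1 should be close to 2, while β̂2 should be close to 0.5 + 1.5 = 2 rather than 0.5, because the coefficient on W also absorbs the dependence of u on W.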
Causality
In the test score example, one can imagine randomly assigning “treatments” of
different class sizes (STR) to different groups of students.
If the experiment is designed and executed so that the only systematic
difference between the groups of students is their class size, then in theory this
experiment would estimate the effect on test scores of reducing class size,
holding all else constant.
Data
Panel data, also called longitudinal data, are data for multiple entities in which
each entity is observed at two or more time periods.
Panel data are usually observed at regular time intervals (monthly, quarterly,
yearly, etc.). A panel is balanced if all units are observed in all periods and
unbalanced otherwise.
Panel data can be
□ a short panel: many units and few time periods,
□ a long panel: many time periods and few units,
□ a large panel: many units and many time periods.
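Whether a panel is balanced (all units observed at all periods) can be checked directly from the list of (entity, period) pairs. The sketch below is illustrative only: the function names and the toy state/year pairs are hypothetical and not taken from the lecture data.

```python
def panel_dimensions(observations):
    """Return (n, T): the number of distinct entities and distinct periods."""
    pairs = set(observations)
    entities = {entity for entity, _ in pairs}
    periods = {period for _, period in pairs}
    return len(entities), len(periods)

def is_balanced(observations):
    """True if every entity is observed exactly once in every period."""
    pairs = list(observations)
    n, T = panel_dimensions(pairs)
    no_duplicates = len(set(pairs)) == len(pairs)
    return no_duplicates and len(pairs) == n * T

# Toy example: two states observed in two years.
balanced = [("AL", 1982), ("AL", 1983), ("AZ", 1982), ("AZ", 1983)]
# Same panel with one state-year missing.
unbalanced = [("AL", 1982), ("AL", 1983), ("AZ", 1982)]

print(panel_dimensions(balanced), is_balanced(balanced))
print(panel_dimensions(unbalanced), is_balanced(unbalanced))
```

A balanced panel with n units and T periods has exactly n × T observations, which is what the check relies on.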
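With k regressors, the data are denoted by (X1,it , X2,it , . . . , Xk,it , Yit ), i = 1, . . . , n and t = 1, . . . , T, where the subscript i refers to the entity and the subscript t refers to the time period.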
Consider simple linear regressions of the fatality rate on the real tax on a
case of beer in 1982 and 1988. The estimated models are:
\[ \widehat{\text{FatalityRate}} = 2.01 + 0.15\,\text{BeerTax} \quad \text{(1982 data)}, \]
\[ \widehat{\text{FatalityRate}} = 1.86 + 0.44\,\text{BeerTax} \quad \text{(1988 data)}. \]
The regression results indicate a positive relationship between the beer tax and
the fatality rate for both years.
These results are contrary to our expectations: alcohol taxes are supposed to
lower the rate of traffic fatalities.
As we know from Econometrics I, this is possibly due to omitted variable bias,
since neither model includes any covariates. Candidate omitted variables are:
1 Quality (age) of automobiles
2 Quality of roads
3 Culture around drinking and driving
4 Density of cars on the road
If any of these variables is correlated with the beer tax, the OLS estimates
suffer from omitted variable bias.
Panel data let us eliminate omitted variable bias when the omitted variables
are constant over time within a given state.
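To see why, it helps to write the regression for the two years and subtract. The following is a sketch of the standard argument; the symbol Z_i (an omitted state-level factor that does not change between 1982 and 1988) and the coefficient names are assumptions introduced for illustration.

```latex
\begin{align*}
\text{FatalityRate}_{i1988} &= \beta_0 + \beta_1 \text{BeerTax}_{i1988} + \beta_2 Z_i + u_{i1988}, \\
\text{FatalityRate}_{i1982} &= \beta_0 + \beta_1 \text{BeerTax}_{i1982} + \beta_2 Z_i + u_{i1982}, \\
\text{FatalityRate}_{i1988} - \text{FatalityRate}_{i1982}
  &= \beta_1 \left(\text{BeerTax}_{i1988} - \text{BeerTax}_{i1982}\right)
     + \left(u_{i1988} - u_{i1982}\right).
\end{align*}
```

Because Z_i enters both years with the same value, it cancels in the difference, so a regression of the change in the fatality rate on the change in the beer tax is not affected by it.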
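Regressing the change in the fatality rate on the change in the beer tax between
1982 and 1988 gives
\[ \widehat{\text{FatalityRate}_{i1988} - \text{FatalityRate}_{i1982}} = -0.072 - 1.04\,(\text{BeerTax}_{i1988} - \text{BeerTax}_{i1982}). \]
The estimated effect of a change in the real beer tax is negative, as predicted
by economic theory.
An increase in the real beer tax of $1 per case is estimated to reduce the
traffic fatality rate by 1.04 deaths per 10,000 people.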
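The sign reversal between the cross-sectional and the differenced regressions can be reproduced in a simulation. This is a hypothetical sketch, not the Fatalities data: the data-generating process, the parameter values (true slope −1, an unobserved time-constant factor with coefficient 3) and the function names are assumptions.

```python
import random

def slope(x, y):
    """OLS slope from a simple regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(n=5000, beta1=-1.0, beta2=3.0, seed=0):
    """Two-period panel with an unobserved, time-constant factor z."""
    rng = random.Random(seed)
    x1, x2, y1, y2 = [], [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)          # unobserved and constant over time
        xa = z + rng.gauss(0, 1)     # regressor in period 1, correlated with z
        xb = z + rng.gauss(0, 1)     # regressor in period 2, correlated with z
        x1.append(xa)
        x2.append(xb)
        y1.append(2.0 + beta1 * xa + beta2 * z + rng.gauss(0, 1))
        y2.append(2.0 + beta1 * xb + beta2 * z + rng.gauss(0, 1))
    cross_section = slope(x2, y2)    # ignores z, so it is biased
    dx = [b - a for a, b in zip(x1, x2)]
    dy = [b - a for a, b in zip(y1, y2)]
    differences = slope(dx, dy)      # z cancels in the differences
    return cross_section, differences

cross_section_slope, difference_slope = simulate()
print(f"cross-sectional slope: {cross_section_slope:.3f}")
print(f"slope in differences:  {difference_slope:.3f} (true value -1.0)")
```

In this setup the cross-sectional slope has the wrong sign because the omitted factor raises both the regressor and the outcome, while the regression in differences recovers a slope close to the true value.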
What if we have more than two time periods (T > 2)? Consider:
\[ Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}, \quad i = 1, \ldots, n, \; t = 1, \ldots, T. \tag{13} \]
Here, Zi is an unobserved variable that varies from one state to the next but
does not change over time.
Because Zi varies across states but is constant over time, the
model in (13) can be interpreted as having n intercepts, one for each state.
Specifically, let αi = β0 + β2 Zi . Then, (13) can be written as
\[ Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}. \tag{14} \]
Definition 8
The model in (14) is the fixed effects regression model, in which
α1 , α2 , . . . , αn are treated as unknown intercepts to be estimated. These terms
α1 , α2 , . . . , αn are also known as entity fixed effects.
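Consider the model in (14). The state-specific intercepts can also be expressed
using binary variables that indicate the individual states. Define
\[ D2_i = \begin{cases} 1, & \text{if } i = 2, \\ 0, & \text{otherwise.} \end{cases} \tag{15} \]
Then, the fixed effects regression model in (14) can be written equivalently as
\[ Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \gamma_3 D3_i + \cdots + \gamma_n Dn_i + u_{it}, \tag{16} \]
where
1 β0 , β1 , γ2 , γ3 , . . . , γn are unknown coefficients to be estimated,
2 the dummy variables D3i , . . . , Dni are defined analogously to (15).
The dummy variable D1i is omitted to avoid perfect multicollinearity with the
intercept.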
We claim that the models in (14) and (16) are equivalent. How can we see this
equivalence?
We can determine the relationship between the coefficients of (14) and
(16). When i = 1, we get
1 Y1t = β1 X1t + α1 + u1t from (14), and
2 Y1t = β0 + β1 X1t + u1t from (16).
Thus, we have α1 = β0 . Similarly, when i ≥ 2, we have
1 Yit = β1 Xit + αi + uit from (14), and
2 Yit = β0 + β1 Xit + γi + uit from (16).
Thus, we have αi = β0 + γi for i ≥ 2.
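The equivalence of (14) and (16) can also be checked numerically. The sketch below uses only the Python standard library; the simulated panel (4 entities, 6 periods, true β1 = −0.7), the intercept values and the helper names `solve` and `ols` are hypothetical. It compares the slope from the dummy-variable regression with the slope from a regression on entity-demeaned data and verifies the mapping between the intercepts.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical panel: entity index i runs from 0 to n - 1.
rng = random.Random(1)
n, T, beta1 = 4, 6, -0.7
alpha = [1.0, 3.0, -2.0, 0.5]
panel = []
for i in range(n):
    for _ in range(T):
        x = alpha[i] + rng.gauss(0, 1)
        y = beta1 * x + alpha[i] + rng.gauss(0, 0.5)
        panel.append((i, x, y))

# Dummy-variable form: constant, X, and dummies for all entities but the first.
rows = [[1.0, x] + [1.0 if i == j else 0.0 for j in range(1, n)]
        for i, x, _ in panel]
coef = ols(rows, [y for _, _, y in panel])
b0, b1_dummy, gammas = coef[0], coef[1], coef[2:]

# Entity-demeaned form: subtract entity means, then run a simple regression.
xbar = [sum(x for j, x, _ in panel if j == i) / T for i in range(n)]
ybar = [sum(y for j, _, y in panel if j == i) / T for i in range(n)]
num = sum((x - xbar[i]) * (y - ybar[i]) for i, x, y in panel)
den = sum((x - xbar[i]) ** 2 for i, x, _ in panel)
b1_within = num / den
alpha_hat = [ybar[i] - b1_within * xbar[i] for i in range(n)]

print(f"slope, dummy-variable form: {b1_dummy:.6f}")
print(f"slope, demeaned form:       {b1_within:.6f}")
```

The two slopes agree up to rounding error, and the estimated intercepts satisfy the same relations as the population coefficients: the first entity's intercept equals the constant, and each remaining intercept equals the constant plus the corresponding dummy coefficient.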
Estimation
For the entity-demeaned OLS regression, consider the “time average” of (14):
\[ \frac{1}{T}\sum_{t=1}^{T} Y_{it} = \beta_1 \frac{1}{T}\sum_{t=1}^{T} X_{it} + \alpha_i + \frac{1}{T}\sum_{t=1}^{T} u_{it}. \tag{21} \]
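The next step is to subtract the time average (21) from (14). A sketch of this step follows; the bar and tilde notation is an assumption introduced here, with \bar{Y}_i denoting the time average of Y_{it} and \tilde{Y}_{it} the deviation from it (and similarly for X and u).

```latex
\begin{align*}
Y_{it} - \bar{Y}_i &= \beta_1 \left(X_{it} - \bar{X}_i\right) + \left(\alpha_i - \alpha_i\right) + \left(u_{it} - \bar{u}_i\right), \\
\tilde{Y}_{it} &= \beta_1 \tilde{X}_{it} + \tilde{u}_{it}, \\
\hat{\beta}_1 &= \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde{X}_{it}\,\tilde{Y}_{it}}{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde{X}_{it}^{2}}.
\end{align*}
```

The entity fixed effect α_i drops out because it does not vary over time, so β1 can be estimated by OLS on the entity-demeaned variables without estimating the n intercepts.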
We will use the R package plm to estimate the fixed effects models.
As with the lm function, we have to specify the regression formula and the data
to be used in the call to the plm function.
Additionally, a vector with the names of the entity and time
variables must be passed to the argument index.
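There can also be omitted variables that vary over time but not across states.
Let St denote the combined effect of variables that change over time but not
across states (for example, “safer cars”). Then,
\[ Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + \beta_3 S_t + u_{it}. \]
This model can be reformulated to have an intercept that varies from one year
to the next. Considering t = 1982 and dropping Zi from the model yields
\[ Y_{i,1982} = \beta_0 + \beta_1 X_{i,1982} + \beta_3 S_{1982} + u_{i,1982} = \lambda_{1982} + \beta_1 X_{i,1982} + u_{i,1982}, \]
where λ1982 = β0 + β3 S1982 .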
Definition 9
The time fixed effects regression model with a single X regressor is
Yit = β1 Xit + λt + uit , where λ1 , λ2 , . . . , λT are known as the time fixed
effects.
Just as the entity fixed effects regression model can be represented using n − 1
binary indicators, the time fixed effects regression model can also be
represented using T − 1 binary indicators:
Yit = β0 + β1 Xit + δ2 B2t + δ3 B3t + . . . + δT BTt + uit , (27)
where δ2 , δ3 , . . . , δT are unknown coefficients and
\[ B2_t = \begin{cases} 1, & \text{if } t = 2, \\ 0, & \text{otherwise,} \end{cases} \tag{28} \]
and the other dummy variables B3t , . . . , BTt are defined similarly.
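By the same argument used for the entity fixed effects, the coefficients in (27) can be mapped to period-specific intercepts. The following sketch assumes the intercept form of the model, Y_{it} = \beta_1 X_{it} + \lambda_t + u_{it}, with no separate constant; that normalization is an assumption made here for illustration.

```latex
\begin{align*}
t = 1: &\quad Y_{i1} = \beta_0 + \beta_1 X_{i1} + u_{i1}
        &&\Rightarrow\quad \lambda_1 = \beta_0, \\
t \geq 2: &\quad Y_{it} = \beta_0 + \delta_t + \beta_1 X_{it} + u_{it}
        &&\Rightarrow\quad \lambda_t = \beta_0 + \delta_t .
\end{align*}
```

The first period serves as the base category, which is why only T − 1 binary indicators appear next to the intercept β0.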
We can also have a regression model that has both entity and time fixed effects.
Definition 10
The combined entity and time fixed effects regression model is
Yit = β1 Xit + αi + λt + uit , where αi is the entity fixed effect and λt is the
time fixed effect.
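For a balanced panel, the slope of the combined model can be computed by removing both entity means and time means from each variable and running a simple regression on the transformed data. The sketch below illustrates this with simulated data; the data-generating process (true β1 = −0.5, entity and time effects that are both correlated with X) and all function names are hypothetical, and the double-demeaning shortcut is only claimed here for balanced panels.

```python
import random

def two_way_demean(v):
    """Subtract entity and time means and add back the grand mean.

    v[i][t] holds the value for entity i in period t (balanced panel).
    """
    n, T = len(v), len(v[0])
    entity_mean = [sum(row) / T for row in v]
    time_mean = [sum(v[i][t] for i in range(n)) / n for t in range(T)]
    grand_mean = sum(entity_mean) / n
    return [[v[i][t] - entity_mean[i] - time_mean[t] + grand_mean
             for t in range(T)] for i in range(n)]

def flatten(v):
    return [value for row in v for value in row]

def slope(x, y):
    """OLS slope from a simple regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(n=300, T=6, beta1=-0.5, seed=0):
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        alpha_i = rng.gauss(0, 1)              # entity fixed effect
        x_row, y_row = [], []
        for t in range(T):
            lambda_t = 0.3 * t                 # time fixed effect (a trend)
            x_it = alpha_i + lambda_t + rng.gauss(0, 1)
            y_it = (beta1 * x_it + 2.0 * alpha_i + 2.0 * lambda_t
                    + rng.gauss(0, 1))
            x_row.append(x_it)
            y_row.append(y_it)
        x.append(x_row)
        y.append(y_row)
    pooled = slope(flatten(x), flatten(y))     # ignores both kinds of effects
    fixed_effects = slope(flatten(two_way_demean(x)),
                          flatten(two_way_demean(y)))
    return pooled, fixed_effects

pooled_slope, two_way_slope = simulate()
print(f"pooled OLS slope:            {pooled_slope:.3f}")
print(f"entity and time FE slope:    {two_way_slope:.3f} (true value -0.5)")
```

In this setup pooled OLS is biased upward because both αi and λt raise X and Y together, while the double-demeaned regression removes both effects and yields a slope close to the true value.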
\[ \widehat{\text{FatalityRate}}_{it} = -0.64\,\text{BeerTax}_{it} + \hat{\alpha}_i + \hat{\lambda}_t. \tag{33} \]
The estimate −0.64 is close to the estimated coefficient for the regression model
including only entity fixed effects.
The coefficient is less precisely estimated but is significantly different from
zero at the 10% significance level.
In view of (24) and (33), we conclude that the estimated relationship between
traffic fatalities and the real beer tax is not affected by omitted variable bias
due to factors that are constant over time.
Table 2: Estimation Results

                    Dependent variable: mrall
beertax             −0.640* (0.350)
Observations        336
R²                  0.036
Adjusted R²         −0.149
F statistic         10.513*** (df = 1; 281)

Note: * p<0.1; ** p<0.05; *** p<0.01. Standard error in parentheses.