Panel Data Analysis of Microeconomic Decisions: Fall 2020
Panel Data Analysis Fall 2020
General Information
• assignment for the first part due on October 23: submit online via Canvas
• textbooks
– M. Verbeek “A Guide to Modern Econometrics”, 5th edition, chapter 10.
– J. Wooldridge “Econometric analysis of cross section and panel data”, 2nd
edition, mostly chapter 10.
– C. Cameron & P. Trivedi “Microeconometrics – methods and applications”,
chapter 21.
Specifics part I
• the first part of this course is rather theoretical. This theory is required to grasp and correctly
apply panel methods, and it also provides a solid preparation for the second part.
• we will not spend too much time on interpreting coefficients or discussing articles
in detail.
– research articles will be mainly used for illustration and as examples.
– slides contain links to all articles used for the course.
– we will mostly use our own data generating process (DGP) to examine the
behavior of estimators. We will use Stata. Material will be available on Canvas.
• the slides contain exercises. Some exercises will be done during the lecture, others
are to be solved at home. Solutions will be provided on Canvas.
• for more extensive calculations, I will provide a set of pdf documents or additional
videos.
Outline
Panel Data Analysis 1. Introduction
• panel data are repeated observations for the same unit i = 1, ..., N observed over
t = 1, ..., T periods
• an observation is the pair $\{y_{it}, x_{it}\}$, where the subscript i denotes the unit (e.g. an individual)
and the subscript t denotes time
• $x_{it}$ can be time-varying, e.g. age, labor force status, health, or time-constant, e.g.
sex, genes, birth date
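Whether a variable is time-varying or time-constant can be checked directly in a long-format panel: a time-constant variable takes a single value within each individual. A minimal Python sketch, assuming a hypothetical list of records with keys `id` and `t` (in Stata, `xtsum` reports the within variation of a variable, which is zero for time-constant variables):

```python
def is_time_varying(rows, var):
    """True if var takes more than one value for at least one individual."""
    values_by_id = {}
    for r in rows:
        values_by_id.setdefault(r["id"], set()).add(r[var])
    return any(len(values) > 1 for values in values_by_id.values())


# Hypothetical long-format panel: 2 individuals observed in t = 1, 2.
rows = [
    {"id": 1, "t": 1, "age": 40, "white": 1},
    {"id": 1, "t": 2, "age": 41, "white": 1},
    {"id": 2, "t": 1, "age": 35, "white": 0},
    {"id": 2, "t": 2, "age": 36, "white": 0},
]
print(is_time_varying(rows, "age"), is_time_varying(rows, "white"))  # True False
```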
• a panel data set can be interpreted as a three-dimensional matrix where the 3rd
dimension is time: think of T two-dimensional matrices (one for each period)
stacked behind each other in the 3rd dimension
• to store it as a two-dimensional data set, the T matrices are stacked either below each other
(long format, used here) or next to each other (wide format)
• Exercise: Consider the structure of the data set. Does it represent an iid random
sample? Find one argument for and one argument against.
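The two storage layouts can be sketched with plain Python dictionaries, using hypothetical wage data for N = 2 individuals over T = 3 periods. This is only an illustration of the data structure; in Stata, the same conversion is done with `reshape wide` and `reshape long`:

```python
# Hypothetical toy panel in long format: one row per (i, t) pair, N * T rows.
long_rows = [
    {"id": 1, "t": 1, "wage": 10.0},
    {"id": 1, "t": 2, "wage": 10.5},
    {"id": 1, "t": 3, "wage": 11.0},
    {"id": 2, "t": 1, "wage": 8.0},
    {"id": 2, "t": 2, "wage": 8.2},
    {"id": 2, "t": 3, "wage": 9.1},
]


def long_to_wide(rows, var):
    """Long -> wide: one row per individual, one column per period (var1, var2, ...)."""
    wide = {}
    for r in rows:
        wide.setdefault(r["id"], {"id": r["id"]})[f"{var}{r['t']}"] = r[var]
    return [wide[i] for i in sorted(wide)]


def wide_to_long(rows, var, periods):
    """Wide -> long: stack the period columns below each other."""
    return [{"id": r["id"], "t": t, var: r[f"{var}{t}"]} for r in rows for t in periods]


wide_rows = long_to_wide(long_rows, "wage")
print(wide_rows[0])  # {'id': 1, 'wage1': 10.0, 'wage2': 10.5, 'wage3': 11.0}
```

Converting back with `wide_to_long(wide_rows, "wage", [1, 2, 3])` recovers the original long-format rows.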
• data set
– five waves of the British Household Panel Survey (BHPS)
– all individuals are employed during observation period
– balanced sample containing 1,625 individuals for 5 waves
• variables
– outcome variable $y_{it}$: hourly wage rate (wage)
– time-varying $x_{it}$: health status (sahex, sahgd), age (age),...
– time-invariant $x_{it}$: race (white), highest educational degree (deg),...
• example above: we observe changes in an individual’s health and can link them to the development of their wage
• panel data sets are typically larger than cross-section or time-series data sets, and variables
vary over two dimensions
• estimators based on panel data are often more accurate than those based on other data
• consider the following structural linear model with cross-section data
$$y = \beta_0 + x'\beta + \alpha + u$$
3. Which solutions do we have for this problem when the data are cross-sectional?
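• assume we observe the same cross-section units at two points in time
– $y_t, x_t$ observed for two time periods, t = 1, 2
– assume that α is time-constant and does not vary across t
– writing the model as $y_t = \beta_0 + x_t'\beta + \alpha + u_t$ for t = 1, 2 and subtracting the period-1 equation from the period-2 equation eliminates $\beta_0$ and α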
$$\Delta y = \Delta x'\beta + \Delta u \qquad (2)$$
• for a consistent estimate β̂ from Equation (2), the orthogonality condition must hold:
$$E(\Delta x\,\Delta u) = E[(x_2 - x_1)(u_2 - u_1)] = E(x_2 u_2) + E(x_1 u_1) - E(x_1 u_2) - E(x_2 u_1) = 0$$
• this holds if $x_s$ and $u_t$ are uncorrelated for all s, t = 1, 2, not only for s = t: strict exogeneity is the key assumption for identification in panel data models,
and it will be central to this course!
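A small simulation can illustrate why first differencing helps. This is a sketch under an assumed DGP, not the course's Stata DGP: α is drawn once per unit and is correlated with $x_t$ (hypothetically $x_t = \alpha + e_t$), while $u_t$ is drawn independently of x in both periods, so strict exogeneity holds. Pooled OLS in levels picks up the correlation between x and α, while OLS on Equation (2) does not:

```python
import random


def ols_slope(x, y):
    """OLS slope of y on x, with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ols_slope_no_const(x, y):
    """OLS slope of y on x without an intercept, as in Equation (2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def simulate_two_periods(n, beta0=1.0, beta=2.0, seed=1):
    """Hypothetical DGP: y_t = beta0 + beta * x_t + alpha + u_t for t = 1, 2."""
    rng = random.Random(seed)
    x = {1: [], 2: []}
    y = {1: [], 2: []}
    for _ in range(n):
        alpha = rng.gauss(0, 1)  # time-constant, correlated with x_t
        for t in (1, 2):
            xt = alpha + rng.gauss(0, 1)
            x[t].append(xt)
            # u_t is independent of x_1 and x_2, so strict exogeneity holds
            y[t].append(beta0 + beta * xt + alpha + rng.gauss(0, 1))
    return x, y


x, y = simulate_two_periods(50_000)
# Pooled OLS in levels: alpha is part of the error term and correlated with x
b_levels = ols_slope(x[1] + x[2], y[1] + y[2])
# First differences eliminate beta0 and alpha
dx = [b - a for a, b in zip(x[1], x[2])]
dy = [b - a for a, b in zip(y[1], y[2])]
b_fd = ols_slope_no_const(dx, dy)
print(round(b_levels, 2), round(b_fd, 2))
```

With these assumed values, the levels estimate should be close to β + Cov(x_t, α)/Var(x_t) = 2 + 1/2 = 2.5, and the first-difference estimate close to β = 2.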
individual dynamics
• an individual who has experienced an event in the past is more likely to experience that
event in the future than an individual who has not experienced that event
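Persistence of this kind can only be measured when the same individual is observed repeatedly. The sketch below assumes a hypothetical binary event that follows a first-order Markov process (event probability 0.6 after an event, 0.2 otherwise) and estimates both conditional probabilities from the panel. In real data, observed persistence can also reflect time-constant unobserved heterogeneity α_i rather than true state dependence; this simulation contains only the latter.

```python
import random


def simulate_event_panel(n, t_periods, p_after_event=0.6, p_after_none=0.2, seed=2):
    """Hypothetical DGP: the event probability in t depends on the state in t - 1."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        history = [int(rng.random() < 0.3)]  # state in the first period
        for _ in range(t_periods - 1):
            p = p_after_event if history[-1] == 1 else p_after_none
            history.append(int(rng.random() < p))
        panel.append(history)
    return panel


def transition_probabilities(panel):
    """Estimate P(event in t | event in t-1) and P(event in t | no event in t-1)."""
    counts = {0: [0, 0], 1: [0, 0]}  # previous state -> [transitions, events]
    for history in panel:
        for prev, curr in zip(history, history[1:]):
            counts[prev][0] += 1
            counts[prev][1] += curr
    return counts[1][1] / counts[1][0], counts[0][1] / counts[0][0]


p_11, p_01 = transition_probabilities(simulate_event_panel(20_000, 5))
print(round(p_11, 2), round(p_01, 2))  # should be close to the true 0.6 and 0.2
```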
internal instruments
• panel data also allow constructing internal instruments to address endogeneity in $x_{it}$
– crucial assumptions: $\mathrm{Cov}(x_{is}, \alpha_i) = \mathrm{Cov}(x_{it}, \alpha_i)$ and $\mathrm{Cov}(x_{is}, u_{it}) = 0$ for all s, t
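One way to see how these two assumptions can be used (an illustration only, not necessarily the instrument construction used later in the course): if Cov(x_is, α_i) is the same in every period, the difference x_i2 − x_i1 is uncorrelated with α_i, and if Cov(x_is, u_it) = 0 for all s, t, it is also uncorrelated with u_i2. It can then instrument x_i2 in the period-2 levels equation y_i2 = β x_i2 + α_i + u_i2. The DGP below is hypothetical:

```python
import random


def cov(a, b):
    """Sample covariance of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)


rng = random.Random(3)
beta, n = 2.0, 100_000
x1, x2, y2 = [], [], []
for _ in range(n):
    alpha = rng.gauss(0, 1)
    # Cov(x_i1, alpha) = Cov(x_i2, alpha) = 1: the first assumption holds
    x1.append(alpha + rng.gauss(0, 1))
    x2.append(alpha + rng.gauss(0, 1))
    # u_i2 is drawn independently of x in both periods: the second assumption holds
    y2.append(beta * x2[-1] + alpha + rng.gauss(0, 1))

z = [b - a for a, b in zip(x1, x2)]  # internal instrument x_i2 - x_i1
b_ols = cov(x2, y2) / cov(x2, x2)    # OLS: biased because x_i2 is correlated with alpha_i
b_iv = cov(z, y2) / cov(z, x2)       # simple IV estimator using z
print(round(b_ols, 2), round(b_iv, 2))
```

Under these assumptions, the OLS estimate should be near 2.5 and the IV estimate near β = 2. The instrument is relevant here because Cov(z, x_i2) equals the variance of the idiosyncratic part of x_i2.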
• panel data contain a large amount of information, and variables vary over more than one
dimension
→ considerable efficiency gains compared to less informative data
• panel data
– the data set extends over several time periods, and in each time period we observe
the same individuals again
– each individual is observed in all periods → we can use variation both across
individuals and within individuals over time
• suppose we are interested in estimating time effects, i.e. the change $\mu_t - \mu_s$ from one period to another
• Exercise: Any idea why the variance of the estimator is smaller for panel data
than for repeated cross sections? Use the formula in Equation (3) to come up
with a formal answer that you work out in groups. Assume that $u_{it}$ is an iid
error term that is independent of $\alpha_i$ and $\mu_1, \dots, \mu_T$. Hint:
– derive the covariance, $\mathrm{Cov}(\hat\mu_t, \hat\mu_s)$, and the variance, $\mathrm{Var}(\hat\mu_t)$, as functions of
the unobservables $\alpha_i$ and $u_{it}$
– then plug them into $\mathrm{Var}(\hat\mu_t - \hat\mu_s)$ for panel and for cross-section data and compare
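A derivation can be checked numerically with a Monte Carlo sketch that compares Var(µ̂_t − µ̂_s) across the two designs. It assumes the model y_it = µ_t + α_i + u_it with µ̂_t taken to be the period-t sample mean of y (Equation (3) is not reproduced here, so this is an assumption), and hypothetical values σ_α = 2, σ_u = 1, N = 50:

```python
import random


def mean(v):
    return sum(v) / len(v)


def diff_of_means(rng, n, mu_t, mu_s, sd_alpha, sd_u, panel):
    """One draw of mu_hat_t - mu_hat_s, with mu_hat_t the period-t sample mean."""
    alpha_t = [rng.gauss(0, sd_alpha) for _ in range(n)]
    # Panel: the same individuals (same alpha_i) are observed again in period s.
    # Repeated cross section: a new, independent sample is drawn in period s.
    alpha_s = alpha_t if panel else [rng.gauss(0, sd_alpha) for _ in range(n)]
    ybar_t = mean([mu_t + a + rng.gauss(0, sd_u) for a in alpha_t])
    ybar_s = mean([mu_s + a + rng.gauss(0, sd_u) for a in alpha_s])
    return ybar_t - ybar_s


def simulated_variance(panel, reps=2000, n=50, sd_alpha=2.0, sd_u=1.0, seed=4):
    """Monte Carlo variance of mu_hat_t - mu_hat_s over reps replications."""
    rng = random.Random(seed)
    draws = [diff_of_means(rng, n, 1.0, 0.5, sd_alpha, sd_u, panel) for _ in range(reps)]
    m = mean(draws)
    return sum((d - m) ** 2 for d in draws) / (reps - 1)


var_panel = simulated_variance(panel=True)
var_rcs = simulated_variance(panel=False)
print(round(var_panel, 3), round(var_rcs, 3))
```

Under these assumptions the panel variance should be close to 2σ_u²/N = 0.04, because α_i cancels in the difference, while the repeated cross section gives roughly 2(σ_α² + σ_u²)/N = 0.2.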
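1. pooled OLS (POLS) estimator: ignores the panel dimension and treats the data as one
big cross section
– exogeneity assumption: $E(x_{it}\varepsilon_{it}) = 0$ for the composite error $\varepsilon_{it} = \alpha_i + u_{it}$, which holds if $E(x_{it}u_{it}) = 0$ and $E(x_{it}\alpha_i) = 0$
– even if the exogeneity assumption is satisfied, POLS has an efficiency problem: $\varepsilon_{it}$
contains $\alpha_i$ in every period t
→ the correlation between $\varepsilon_{is}$ and $\varepsilon_{it}$ does not decrease as the distance |t − s|
increases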
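This can be checked with a short simulation of the composite error ε_it = α_i + u_it, assuming α_i and u_it are independent normal draws with hypothetical standard deviations σ_α = σ_u = 1. Under these assumptions Corr(ε_is, ε_it) = σ_α²/(σ_α² + σ_u²) = 0.5 for every s ≠ t, whatever the lag:

```python
import random


def corr(a, b):
    """Sample correlation of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5


rng = random.Random(5)
n, t_periods = 50_000, 5
sd_alpha, sd_u = 1.0, 1.0
eps = []
for _ in range(n):
    alpha = rng.gauss(0, sd_alpha)  # drawn once per unit, enters every period
    eps.append([alpha + rng.gauss(0, sd_u) for _ in range(t_periods)])

theory = sd_alpha**2 / (sd_alpha**2 + sd_u**2)
first = [e[0] for e in eps]
# Correlation between eps_i1 and eps_i(1+k) for lags k = 1, ..., T - 1
lag_corrs = {k: corr(first, [e[k] for e in eps]) for k in range(1, t_periods)}
print({k: round(c, 2) for k, c in lag_corrs.items()})
```

All lags should give roughly the same correlation of about 0.5, unlike, for example, a serially correlated error whose correlation fades with the lag.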
• generate a data set of N = 200 units that are observed over T = 5 periods, such
that we obtain N × T = 1000 observations
• What result do we get for β1 from an OLS regression? How does the estimator β̂1
change as N increases?
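A Python sketch of this exercise under an assumed DGP (the slide's own Stata DGP is not reproduced here): y_it = β0 + β1 x_it + α_i + u_it with β1 = 2 and x_it = α_i + e_it, so that x_it is correlated with α_i. It runs POLS for N = 200 and for larger N:

```python
import random


def ols_slope(x, y):
    """OLS slope of y on x, with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_panel(n, t_periods=5, beta0=1.0, beta1=2.0, seed=6):
    """Hypothetical DGP: y_it = beta0 + beta1 * x_it + alpha_i + u_it, x_it = alpha_i + e_it."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        alpha = rng.gauss(0, 1)  # unit effect, constant over t
        for _ in range(t_periods):
            xit = alpha + rng.gauss(0, 1)
            x.append(xit)
            y.append(beta0 + beta1 * xit + alpha + rng.gauss(0, 1))
    return x, y


estimates = {}
for n in (200, 2_000, 20_000):
    x, y = simulate_panel(n)
    estimates[n] = ols_slope(x, y)  # POLS: all N * T observations as one cross section
    print(f"N = {n:>6}, N x T = {len(x):>6}, beta1_hat = {estimates[n]:.3f}")
```

Under this particular assumed DGP, the estimates settle near β1 + Cov(x_it, α_i)/Var(x_it) = 2.5 rather than at β1 = 2 as N grows: more units reduce sampling noise but do not remove the omitted α_i. If α_i were uncorrelated with x_it, POLS would instead be consistent.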
• Paper: Bonsang & Klein (2012): “Retirement and subjective well-being”
– study the effect of retirement on life satisfaction using panel data
– distinguish voluntary and involuntary retirement: these differ in the consumption-leisure
trade-off according to the classical life-cycle model
– use different panel models to identify the causal effect of retirement on life satisfaction
• this example illustrates the advantage of having panel data in a situation where
the variable of interest is endogenous: OLS, FE, internal instruments
• Exercise (homework)
– What is the endogeneity problem with retirement? Name at least one source
of endogeneity.
– What role could income play in this relationship? How may the endogeneity
problem with income differ from the endogeneity problem with retirement?