Panel Data Analysis of Microeconomic Decisions: Fall 2020

Panel data analysis involves studying the same observational units, like individuals, firms or countries, over multiple time periods. This allows researchers to address identification problems like omitted variable bias that are difficult with cross-sectional data alone. Panel data sets are typically larger than cross-sectional or time series data alone, allowing for more efficient parameter estimates. A key advantage is that panel data methods can eliminate the influence of time-invariant characteristics through transformations like first-differencing, overcoming omitted variable bias from unobserved static factors.


Panel Data Analysis of Microeconomic Decisions

Fall 2020
Panel Data Analysis Fall 2020

General Information

• Bettina Siflinger, email: [email protected] ; office: K538

• first half: 10 lectures, 2 computer exercises, all on zoom

• first part assignment due on October 23: submit online via canvas

• second assignment due about two weeks before exam (tba)

• grading: group assignments 30%, final written exam 70%

• textbooks
– M. Verbeek “A Guide to Modern Econometrics”, 5th edition, chapter 10.
– J. Wooldridge “Econometric analysis of cross section and panel data”, 2nd
edition, mostly chapter 10.
– C. Cameron & P. Trivedi “Microeconometrics – methods and applications”,
chapter 21.


Specifics part I
• first part of this course is rather theoretical. This is required to grasp and correctly
apply panel methods. It also will yield a solid preparation for the second part.

• we will not spend too much time on interpreting coefficients or discussing articles
in detail.
– research articles will be mainly used for illustration and as examples.
– slides contain links to all articles used for the course.
– we will mostly use our own data generating process (DGP) to examine the
behavior of estimators. We will use Stata. Material will be available on Canvas.

• the slides contain exercises. Some exercises will be done during the lecture, others
are to be solved at home. The solution will be provided on Canvas.

• for more extensive calculations, I will provide a set of pdf documents or additional
videos.

• important: Report typos/mistakes! This is just fair towards fellow students.



Outline

• panel data methods for static/dynamic linear models


1. introduction to panel data modeling
– what are panel data?
– advantages over conventional cross sections?
2. static linear model
– fixed effects (FE), first difference (FD), random effects (RE) model
– model comparison (Hausman test)
– instrumental variable (IV) models
3. dynamic linear model
– models for lagged dependent variables
– GMM estimation

Panel Data Analysis 1. Introduction

1. What are panel data?

• panel data are repeated observations for the same unit i = 1, ..., N observed over
t = 1, ..., T periods

• units can be individuals, firms, households (HH), countries, ...

• an observation is the pair $\{y_{it}; x_{it}\}$, where the subscript $i$ denotes the unit (e.g. an individual)
and the subscript $t$ denotes time

• $x_{it}$ can be time-varying (e.g. age, labor force status, health) or time-constant (e.g.
sex, genes, birth date)

• a panel can be balanced, $\{y_{it}; x_{it}\}$: $t = 1, \dots, T$; $i = 1, \dots, N$, or unbalanced,
$\{y_{it}; x_{it}\}$: $t = \underline{t}_i, \dots, \bar{t}_i$; $i = 1, \dots, N$

• we only consider panel data with large N and (relatively) small T


Structure of panel data


• a panel data set can be interpreted as a three-dimensional matrix where the 3rd
dimension is time: think of T two-dimensional matrices (one for each period)
stacked behind each other in the 3rd dimension

• most econometric software is written for two-dimensional matrices → need to
represent three dimensions in two

• this can be done by stacking the T two-dimensional matrices below each other
(long format, done here) or next to each other (wide format)

• Exercise: Consider the structure of the data set. Does it represent an iid random
sample? Find one pro and one contra argument.
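
The long and wide layouts described above can be illustrated with a minimal Python sketch. The course itself uses Stata; the toy data and the helper `long_to_wide` are hypothetical and only meant to show the two layouts.

```python
# Hypothetical toy panel in long format: one row per (unit, period).
long_rows = [
    {"id": 1, "t": 1, "y": 10.0},
    {"id": 1, "t": 2, "y": 11.0},
    {"id": 2, "t": 1, "y": 20.0},
    {"id": 2, "t": 2, "y": 19.5},
]


def long_to_wide(rows, unit="id", time="t", value="y"):
    """Return one dict per unit with one column per period (wide format)."""
    wide = {}
    for row in rows:
        # Create the unit's record on first sight, then add the period column.
        rec = wide.setdefault(row[unit], {unit: row[unit]})
        rec[f"{value}_{row[time]}"] = row[value]
    return [wide[key] for key in sorted(wide)]


wide_rows = long_to_wide(long_rows)
for rec in wide_rows:
    print(rec)
```

In long format the data have N × T rows, while in wide format they have N rows and one column per variable and period.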


Example panel data set: Contoyannis and Rice (2001)

• interested in modeling the impact of health on wages in the UK

• data set
– five waves of the British Household Panel Study (BHPS)
– all individuals are employed during observation period
– balanced sample containing 1,625 individuals for 5 waves

• variables
– outcome variable $y_{it}$: hourly wage rate (wage)
– time-varying xit: health status (sahex, sahgd), age (age),...
– time-invariant xit: race (white), highest educational degree (deg),...


Example panel data set: structure of selected variables


Example panel data set: variation in selected variables


Motivation: Why should we use panel data?

• panel data allow the identification of certain parameters or questions without
making overly restrictive assumptions

• example above: observe changes in an individual’s health and link them to wage development

→ 1.1 address identification problems (1st order for causal analysis)

• panel data sets are typically larger than cross-section/time-series data and variables
vary over two dimensions

• estimators based on panel data are often more accurate than those from other data

→ 1.2 more efficient estimators (even with identical N )


1.1 Motivation: The omitted variables problem

• consider the following structural linear model with cross section data

$$y = \beta_0 + x'\beta + \alpha + u$$

$y$, $x \equiv (x_1, \dots, x_k)'$: observed random variables; $\alpha$: unobserved regressor;
$u$: unobserved iid error term with zero conditional mean, $E(u \mid x, \alpha) = 0$

• the population regression function is

$$E(y \mid x_1, \dots, x_k, \alpha) = E(y \mid x, \alpha) = \beta_0 + x'\beta + \alpha$$

– if $\alpha$ is uncorrelated with $x_j$ for some variable $j$, then $\alpha$ is just another variable
affecting $y$
– if $\mathrm{Cov}(x_j, \alpha) \neq 0$, not observing $\alpha$ creates an omitted variable bias (OMV)


• Exercise: Consider the situation above.

1. Why do we obtain an OMV?

2. Show how $\mathrm{Cov}(x_j, \alpha) \neq 0$ leads to an inconsistent estimate of $\beta_j$ using OLS.

3. Which solutions do we have for this problem when data are cross-section?


Panel data and omitted variable bias

• assume we observe the same cross section units at two points in time
– yt, xt for t = 1, 2, observed for two time periods
– assume that α is time-constant and does not vary across t

• the population regression function is

$$E(y_t \mid x_t, \alpha) = \beta_0 + x_t'\beta + \alpha, \quad t = 1, 2, \tag{1}$$

$$y_t = \beta_0 + x_t'\beta + \alpha + u_t$$

• zero conditional mean assumption, $E(u_t \mid x_t, \alpha) = 0$, $t = 1, 2$, implies the
orthogonality condition $E(x_t u_t) = 0$

• problem: OMV still exists as long as $\mathrm{Cov}(x_{jt}, \alpha) \neq 0$

• with panel data we can difference Equation (1) across t to eliminate α


• Differencing eliminates all time-constant factors, including α

$$\Delta y = \Delta x'\beta + \Delta u \tag{2}$$

where $\Delta y = y_2 - y_1$, $\Delta x = x_2 - x_1$, $\Delta u = u_2 - u_1$

• for a consistent estimate $\hat\beta$, check the orthogonality condition for Equation (2)

$$E(\Delta x \Delta u) = 0$$
$$\Longleftrightarrow \; E[(x_2 - x_1)(u_2 - u_1)] = E(x_2 u_2) + E(x_1 u_1) - E(x_1 u_2) - E(x_2 u_1) = 0$$

• since $E(x_t u_t) = 0$ for $t = 1, 2$, the first two terms vanish

• to get rid of the third and fourth terms, we also need to assume $E(x_t u_s) = 0$ for $t \neq s$!

• strict exogeneity is the key assumption for identification with panel data models
and it will be central to this course!
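
The differencing argument can be checked numerically. The following is a minimal Python sketch rather than the course's Stata material. It assumes a hypothetical two-period DGP with a single regressor that is correlated with $\alpha$ and errors that are strictly exogenous; all names, parameter values, and the seed are illustrative choices.

```python
import random


def slope_with_intercept(x, y):
    """OLS slope from a regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def slope_through_origin(x, y):
    """OLS slope without a constant, matching the differenced equation."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


rng = random.Random(1)  # fixed seed, arbitrary choice
n_units, beta0, beta1 = 5000, 1.0, 2.0  # assumed true parameters

x1, x2, y1, y2 = [], [], [], []
for _ in range(n_units):
    alpha = rng.gauss(0, 1)  # time-constant unobserved effect
    for xs, ys in ((x1, y1), (x2, y2)):  # periods t = 1, 2
        x = 0.5 * alpha + rng.gauss(0, 1)  # regressor correlated with alpha
        u = rng.gauss(0, 1)  # error independent of x in both periods
        xs.append(x)
        ys.append(beta0 + beta1 * x + alpha + u)

# Cross-section OLS on period 1 only: alpha is omitted.
b_level = slope_with_intercept(x1, y1)

# First differences: alpha and the constant drop out.
dx = [b - a for a, b in zip(x1, x2)]
dy = [b - a for a, b in zip(y1, y2)]
b_fd = slope_through_origin(dx, dy)

print(f"levels OLS (t = 1): {b_level:.3f}")
print(f"first differences:  {b_fd:.3f}")
```

Under these assumptions the first-difference slope should be close to the assumed true value of 2, while the levels regression is pushed upward by the omitted $\alpha$.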


Other reasons why panel data are useful for identification

individual dynamics

• individual who has experienced an event in past is more likely to experience that
event in future compared to individual who has not experienced that event

• conditional probability of experiencing event in future is a function of past experience

• two explanations for this empirical regularity
(a) true state dependence: the lagged state, $y_{t-1}$, enters the model in a structural way as an
explanatory variable, e.g. experiencing the event changes preferences
(b) spurious state dependence: individuals differ in unobserved characteristics
which make them more/less likely to experience the event

• panel data allow distinguishing between (a) and (b). Why?


internal instruments
• panel data also allow computing internal instruments to address endogeneity in xit

• structural model: $y_{it} = x_{it}'\beta + \alpha_i + u_{it} = x_{it}'\beta + \varepsilon_{it}$

• Example: impact of health on wages
– problem: individuals with higher wages may be healthier (reverse causality)
– consequence: regression of wage on health yields biased estimates
– solution: instrumental variable → correlated with health but uncorrelated with
$\varepsilon_{it} = \alpha_i + u_{it}$
– “internal instrument”: transformed endogenous health as instrument for $x_{it}$,
$z_{it} = x_{it} - \frac{1}{T} \sum_{s=1}^{T} x_{is}$
– crucial assumptions: $\mathrm{Cov}(x_{is}, \alpha_i) = \mathrm{Cov}(x_{it}, \alpha_i)$ and $\mathrm{Cov}(x_{is}, u_{it}) = 0$, $\forall s, t$
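
A minimal Python sketch of the internal-instrument idea follows; it is not the course's Stata material. It assumes a hypothetical DGP with one regressor whose covariance with $\alpha_i$ is the same in every period and whose covariance with $u_{it}$ is zero for all periods, so both crucial assumptions hold by construction. With one regressor and one instrument, the simple IV estimator is $\sum z_{it} y_{it} / \sum z_{it} x_{it}$.

```python
import random

rng = random.Random(7)  # fixed seed, arbitrary choice
n_units, n_periods = 1000, 5  # illustrative panel dimensions
beta0, beta1 = 1.0, 2.0  # assumed true parameters

num = 0.0  # accumulates sum of z_it * y_it
den = 0.0  # accumulates sum of z_it * x_it
for _ in range(n_units):
    alpha = rng.gauss(0, 1)  # unobserved time-constant effect
    xs = [0.5 * alpha + rng.gauss(0, 1) for _ in range(n_periods)]
    ys = [beta0 + beta1 * x + alpha + rng.gauss(0, 1) for x in xs]
    x_bar = sum(xs) / n_periods  # unit-specific time average of x
    for x, y in zip(xs, ys):
        z = x - x_bar  # internal instrument z_it
        num += z * y
        den += z * x

b_iv = num / den
print(f"IV estimate with internal instrument: {b_iv:.3f}")
```

Because $z_{it}$ sums to zero within each unit, the constant and $\alpha_i$ drop out of the numerator. In this single-regressor case the estimate coincides with OLS on unit-demeaned data.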


1.2 Efficiency of parameter estimates

• panel data comprise large amount of information, variables vary over more than one
dimension
→ considerable efficiency gains compared to less informative data

• repeated cross section
– data set expands over several time periods, but in
each time period a new random sample is drawn from the population
– each individual is observed only once in the data → we use variation across
individuals in each time period

• panel data
– data set expands over several time periods, but in each time period we observe
the same individuals again
– each individual is observed over all periods → we use variation across
individuals in each time period and within individuals over time


Example: Variance of estimator in repeated cross section data vs. panel data

• linear model with unobserved time-constant effect and time dummies

$$y_{it} = \mu_t + \alpha_i + u_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T$$

• interested in estimating time effects, i.e. the change $(\mu_t - \mu_s)$ from one period to the next

• to assess efficiency we require the variance of the estimator

$$\mathrm{Var}(\hat\mu_t - \hat\mu_s) = \mathrm{Var}(\hat\mu_t) + \mathrm{Var}(\hat\mu_s) - 2\,\mathrm{Cov}(\hat\mu_t, \hat\mu_s), \tag{3}$$

with $\hat\mu_t = \frac{1}{N} \sum_{i=1}^{N} y_{it}$, $t = 1, \dots, T$


• Exercise: Any idea why the variance of the estimator is smaller for panel data
than for repeated cross sections? Use the formula in Equation (3) to come up
with a formal answer that you work out in your group. Assume that $u_{it}$ is an iid
error term that is independent of $\alpha_i$ and $\mu_1, \dots, \mu_T$. Hint:
– derive the covariance, $\mathrm{Cov}(\hat\mu_t, \hat\mu_s)$, and the variance, $\mathrm{Var}(\hat\mu_t)$, as functions of the
unobservables $\alpha_i$ and $u_{it}$
– then plug into $\mathrm{Var}(\hat\mu_t - \hat\mu_s)$ for panel and cross-section data and compare
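
A small Monte Carlo experiment can serve as a numerical check on the derivation, without replacing it. The Python sketch below is not the course's Stata material. It assumes two periods, standard normal $\alpha_i$ and $u_{it}$, and hypothetical values for $\mu_1$ and $\mu_2$; the function name, sample size, number of replications, and seed are illustrative.

```python
import random
import statistics


def simulate_diff(n_units, panel, rng, mu=(0.0, 1.0)):
    """One replication: return mu_hat_2 - mu_hat_1 from sample means."""
    alpha_1 = [rng.gauss(0, 1) for _ in range(n_units)]
    # Panel: the same units (same alpha_i) are observed in both periods.
    # Repeated cross section: period 2 is a fresh sample with fresh alpha_i.
    if panel:
        alpha_2 = alpha_1
    else:
        alpha_2 = [rng.gauss(0, 1) for _ in range(n_units)]
    y1 = [mu[0] + a + rng.gauss(0, 1) for a in alpha_1]
    y2 = [mu[1] + a + rng.gauss(0, 1) for a in alpha_2]
    return sum(y2) / n_units - sum(y1) / n_units


rng = random.Random(2020)  # fixed seed, arbitrary choice
reps, n_units = 2000, 50  # illustrative simulation settings

var_panel = statistics.variance(
    [simulate_diff(n_units, True, rng) for _ in range(reps)]
)
var_rcs = statistics.variance(
    [simulate_diff(n_units, False, rng) for _ in range(reps)]
)

print(f"simulated Var(mu_hat_2 - mu_hat_1), panel:                  {var_panel:.4f}")
print(f"simulated Var(mu_hat_2 - mu_hat_1), repeated cross section: {var_rcs:.4f}")
```

The two simulated variances can be compared with the expressions obtained from Equation (3); the only difference between the two designs in the code is whether $\alpha_i$ is shared across periods.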


1.3 Panel data estimators: An overview

• consider the following linear model for i = 1, ..., N and t = 1, ..., T

$$y_{it} = \beta_0 + x_{it}'\beta + \varepsilon_{it} \tag{4}$$

– $x_{it}$: single explanatory variable
– $\varepsilon_{it}$: error term, varies over $t$ and captures unobservable factors

• consistent estimation of $\beta$ using OLS requires
– population orthogonality assumption: $E(x_{it}\varepsilon_{it}) = 0$, $t = 1, \dots, T$
– rank assumption (no perfect multicollinearity)

• efficiency requires that $\{\varepsilon_{it} : t = 1, \dots, T\}$ is homoskedastic and serially uncorrelated


• unobserved effects formulation of Equation (4)

$$y_{it} = x_{it}'\beta + \alpha_i + u_{it} \tag{5}$$

– the error term is $\varepsilon_{it} = \alpha_i + u_{it}$
– $\varepsilon_{it}$: composite error term, comprising a time-invariant component $\alpha_i$ and an
idiosyncratic error component $u_{it}$

• different panel data models differ by how $\alpha_i$ is treated

1. pooled OLS (POLS) estimator: ignores panel dimension and treats data as one
big cross section
– exogeneity assumption: $E(x_{it}\varepsilon_{it}) = 0$ with $E(x_{it}u_{it}) = 0$ and $E(x_{it}\alpha_i) = 0$
– even if the exogeneity assumption is satisfied, POLS has an efficiency problem: $\varepsilon_{it}$
depends on $\alpha_i$ for all $t$
→ the correlation between $\varepsilon_{is}$ and $\varepsilon_{it}$ does not decrease as the distance $|t - s|$
increases


2. random effects (RE) estimator
– similar idea as pooled OLS, but strict exogeneity assumption required
– reason: RE estimator is a Generalized Least Squares (GLS) estimator which
requires strict exogeneity to produce consistent estimates
– RE estimator imposes specific structure on composite error term $\varepsilon_{it}$ (exploits
serial correlation in $\varepsilon_{it}$) to achieve efficiency

3. fixed effects (FE) estimator
– like RE estimator, FE estimator also requires strict exogeneity assumption
– FE estimator does not make any assumptions on $\alpha_i$ but allows for arbitrary
dependence between $\alpha_i$ and $x_{it}$
– trick: transform Equation (5) to eliminate the unobserved effect $\alpha_i$ (see e.g.
differencing in 1.1)
– FE estimator is an OLS estimator on transformed data


Stata example: pooled OLS in unobserved effects model

• generate data set of N = 200 units that are observed over T = 5 periods, such
that we obtain N × T = 1000 observations

• data generating process (DGP): outcome is generated through
$$y_{it} = \beta_0 + \beta_1 x_{it} + \alpha_i + u_{it}$$
– true parameters $\beta_0 = 1$ and $\beta_1 = 2$
– explanatory variable: $x_{it} = 0.5\alpha_i + v_{it}$, where $\alpha_i \overset{iid}{\sim} N(0, 1)$ across $i$, and
$v_{it} \overset{iid}{\sim} N(0, 1)$ across $i$ and $t$ (implies $x_{it} \sim N(0, 0.5^2 + 1)$)
– idiosyncratic error: $u_{it} \overset{iid}{\sim} N(0, 1)$ across $i$ and $t$; $u_{it} \perp \alpha_i, x_{it}$

• What result do we get for β1 from an OLS regression? How does the estimator β̂1
change as N increases?
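
The DGP above can also be simulated outside Stata. The following Python sketch is a stand-in for the course's Stata code, not a copy of it; the function name `pooled_ols_slope`, the seed, and the grid of sample sizes are illustrative assumptions. Under this DGP, $\mathrm{Cov}(x_{it}, \alpha_i) = 0.5$ and $\mathrm{Var}(x_{it}) = 1.25$, so the standard omitted-variable formula suggests that the pooled OLS slope converges to $2 + 0.5/1.25 = 2.4$ rather than to the true value of 2.

```python
import random


def pooled_ols_slope(n_units, n_periods=5, beta0=1.0, beta1=2.0, seed=0):
    """Simulate the DGP and return the pooled OLS slope of y on x."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_units):
        alpha = rng.gauss(0, 1)  # unobserved effect, constant over t
        for _ in range(n_periods):
            x = 0.5 * alpha + rng.gauss(0, 1)  # x_it = 0.5 * alpha_i + v_it
            u = rng.gauss(0, 1)  # idiosyncratic error
            xs.append(x)
            ys.append(beta0 + beta1 * x + alpha + u)
    # Pooled OLS with a constant: treat all N * T rows as one cross section.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


estimates = {n: pooled_ols_slope(n) for n in (200, 2000, 20000)}
for n, b in estimates.items():
    print(f"N = {n:>5}: pooled OLS slope = {b:.3f}")
```

Increasing N makes the estimate less noisy but does not move it toward 2, which is the difference between sampling variability and inconsistency.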


Empirical example: The impact of retirement on life satisfaction

• Paper: Bonsang & Klein (2012): “Retirement and subjective well-being”
– study the effect of retirement on life satisfaction using panel data
– distinguish voluntary and involuntary retirement: differences in consumption-
leisure trade-off according to classical life cycle model
– different panel models to identify causal effect of retirement on life satisfaction

• the example aims to illustrate the advantage of having panel data in a situation where
the variable of interest is endogenous: OLS, FE, internal instruments

• Exercise (homework)
– What is the endogeneity problem with retirement? Name at least one source
of endogeneity.
– What role could income play in this relationship? How may the endogeneity
problem with income differ from the endogeneity problem with retirement?
