Topic 1_An Introduction to Panel Data Analysis

An Introduction to Panel Data Analysis

Topic 1

An Introduction to Panel Data Analysis

Notes by Gillian Kimundi ([email protected])


A. Advantages and Disadvantages of Panel Data
B. Balanced vs Unbalanced Panel Data
C. Differences between panel data and independently pooled cross sections
D. Unobserved heterogeneity
E. Pooled OLS techniques in panel estimation


An Introduction to Panel Data Analysis
• Panel data, also known as longitudinal data, have both time series and cross-sectional dimensions.
• “… ‘Panel data’ refers to the pooling of observations on a cross-section of households, countries, firms, etc. over several time periods…” (Baltagi, 2005)
• Arises when we measure the same collection of people or objects over a period of time.


Examples
• Household data, e.g. income, consumption, savings, assets (across time)
• Firm investment (across firms and time)
• Sector productivity (across sectors and time)
• Aggregate household expenditures by country and year
• Bank-specific data, e.g. performance, risk, capitalization (across time)
• Regional migration and capital flows (across countries and time)
• Country income per capita, inflation, short rate


Part A: Advantages and Disadvantages of Panel Data


Advantages of Panel Data
1. Can be used to deal with unobserved heterogeneity in cross-sectional units.
• In any cross-section there is a myriad of unmeasured explanatory variables. Omitting these variables causes bias in estimation. The ability to deal with this omitted variable problem is the main attribute of panel data.
• Most panel data come from the very complicated process of everyday economic life.
• In general, different individuals/firms may be subject to the influences of different factors.
• In explaining their behavior, one may extend the list of factors ad infinitum.
• It is neither feasible nor desirable to include all the factors affecting the outcome of all individuals/firms in a model specification, because the purpose of modeling is not to mimic reality but to capture the essential forces affecting the outcome.
Advantages of Panel Data
2. Panel data benefit from a large number of data points, which increases degrees of freedom and reduces collinearity among variables, hence improving the efficiency/precision of econometric estimates.


Advantages of Panel Data
3. We can address a broader range of issues and tackle more complex problems with panel data than with pure time series or pure cross-sectional data alone.
• Cross-sectional data tell us nothing about dynamics. Time series data typically relate to aggregate dynamic behaviour.
• “…Brings about a greater capacity for constructing more realistic behavioral hypotheses by blending interindividual (between) differences with intraindividual (within) dynamics …” Hsiao (2014)


Advantages of Panel Data
4. Micro-level panel data (individuals, firms and households) may be more accurately measured. Biases resulting from aggregation over firms or individuals may be reduced or eliminated.
• “… if micro-units are heterogeneous, not only can the time series properties of aggregate data be very different from those of disaggregate data, but also policy evaluation based on aggregate data may be grossly misleading…” Hsiao (2014)


Disadvantages of Panel Data
1. Design and data collection problems:
• Problems of coverage (incomplete account of the population of interest)
• Nonresponse (due to lack of cooperation of the respondent or because of interviewer error)
• Time-in-sample bias (rotation group bias): respondents provide less information as the number of times they are interviewed increases
Disadvantages of Panel Data
2. Distortions of measurement errors
Measurement errors may arise because of:
1) Faulty responses due to unclear questions, memory errors
2) Deliberate distortion of responses (e.g. prestige bias)
3) Inappropriate informants, mis-recording of responses and interviewer effects
Disadvantages of Panel Data
3. Cross-section dependence
• Macro panels on countries or regions with long time series that do not account for cross-country dependence may lead to misleading inference. Several panel unit root tests suggested in the literature assumed cross-section independence.
Cross-section dependence may arise due to:
1) Presence of common shocks and unobserved components that ultimately become part of the error term
2) Spatial dependence: propensity for nearby locations to influence each other and to possess similar attributes
3) Idiosyncratic dependence in the disturbances with no particular pattern
• It causes coefficient estimates to be inefficient (higher standard errors); to be covered in more detail.
Part B: Balanced vs Unbalanced Panels


Balanced vs Unbalanced Panels
• In complete or balanced panels, individuals are observed over the entire sample period.
• In unbalanced panels, there is randomly missing data or non-randomly missing data.
Balanced vs Unbalanced Panels
• Unbalanced panels are more likely to be the norm in typical economic and financial empirical studies.
• In large panel data sets, there are always some cross-sections that may drop out of the sample.
• In micro-panel data sets, there are problems with non-response, which cause holes in cross sections and gaps in time series.
Examples:
• In collecting data on Kenyan airlines over time, some firms may have dropped out of the market while new entrants emerged over the sample period observed.
• Similarly, while using labor or consumer panels on households, one may find that some households moved and can no longer be included in the panel.
• If one is collecting data on a set of countries over time, some countries can be traced back longer than others.
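As a minimal sketch of the balanced/unbalanced distinction (not part of the original notes), the Python snippet below checks whether every unit is observed in every period and lists the missing unit-period cells. The function name `panel_balance` and the airline labels and years are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def panel_balance(observations):
    """Return (is_balanced, missing) for an iterable of (unit, period) pairs.

    `missing` maps each incomplete unit to the sorted periods in which
    it is not observed.
    """
    periods_by_unit = defaultdict(set)
    for unit, period in observations:
        periods_by_unit[unit].add(period)
    # Every period in which at least one unit is observed.
    all_periods = set().union(*periods_by_unit.values())
    missing = {
        unit: sorted(all_periods - seen)
        for unit, seen in periods_by_unit.items()
        if all_periods - seen
    }
    return (not missing, missing)

# Hypothetical airline panel: "B" exits after 2019, "C" enters in 2020.
obs = [
    ("A", 2018), ("A", 2019), ("A", 2020),
    ("B", 2018), ("B", 2019),
    ("C", 2020),
]
balanced, missing = panel_balance(obs)
print(balanced)  # False
print(missing)   # {'B': [2020], 'C': [2018, 2019]}
```

The check only reports where the gaps are; it says nothing about why the data are missing.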

Random vs Non-Randomly Missing Data
• The reason for the absence of data is important. We have to make a distinction between randomly missing data (Missing-at-Random) and non-randomly missing data (Not Missing-at-Random).
• If “attrition” is random (independent of the observable and unobservable variables), then this is not a problem.
• However, if the reason for missing data is not random (dependent on the observable and unobservable variables), then this can lead to biased estimates.
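The effect of the two kinds of attrition can be illustrated with a small simulation. This is a sketch under made-up assumptions (a normally distributed household income, a 70% retention rate under random attrition, and retention that depends on income under non-random attrition); none of these numbers come from the notes.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: household income in the follow-up period.
incomes = [random.gauss(100.0, 20.0) for _ in range(20000)]

# Random attrition: each household stays with probability 0.7,
# regardless of its income.
stay_random = [y for y in incomes if random.random() < 0.7]

# Non-random attrition: households with income below 90 stay with
# probability 0.3, all others with probability 0.9.
stay_nonrandom = [
    y for y in incomes
    if random.random() < (0.9 if y >= 90.0 else 0.3)
]

full_mean = statistics.fmean(incomes)
mean_random = statistics.fmean(stay_random)
mean_nonrandom = statistics.fmean(stay_nonrandom)

print(round(full_mean, 2), round(mean_random, 2), round(mean_nonrandom, 2))
```

Under these assumptions the mean of the randomly retained sample should stay close to the full-sample mean, while the mean under income-dependent attrition should be noticeably higher, because low-income households are under-represented.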
Part C: Panel Data vs Independently Pooled Cross Sections


Panel Data vs Independently Pooled Cross Sections
• An independently pooled cross section is obtained by sampling randomly from a large population at different points in time (usually in different years).
• For example, in each year we can draw a random sample on hourly wages, education, experience, and so on, from the population of working people in a given country.
Independently Pooled Cross Sections
• Such data give you a “time series of cross sections,” but the observations in the cross sections are not necessarily for the same units.
• This likely leads to observations that are not identically distributed.
• Many surveys of individuals, families, and firms are repeated at regular intervals, often each year, e.g. surveys that randomly sample households each year.
• The 2019 FinAccess household survey is the fifth in a series of surveys that measure drivers and usage of financial services in Kenya.
• Obtain up-to-date data on a range of socioeconomic indicators used to monitor the implementation of development initiatives: household characteristics, housing conditions, education, general health characteristics, nutrition, income and credit, etc.
Panel Data
• A panel data set, however, while having both a cross-sectional and a time series dimension, differs from an independently pooled cross section.
• To collect panel data, the same individuals, families, firms, cities, and counties are surveyed across time.
• e.g. A panel set on households collecting data on savings, expenditure, wages, education, etc. is collected by sampling cross-sections at a given point in time.
• Then, these same cross-sections are re-interviewed at several subsequent points in time.
Panel Data
• For the econometric analysis of panel data, we cannot assume that the observations are independently distributed across time.
• For example, unobserved factors such as “ability” that affect someone’s wage in 1990 will also affect that person’s wage in 1991; unobserved factors that affect a bank’s performance in 2013 are also likely to affect that bank in 2020.
Policy Analysis with Independently Pooled Cross Sections
• Pooled cross sections can be very useful for evaluating the impact of a certain event, intervention or policy.
• Two cross-sectional data sets collected before and after the occurrence of an event can be used to determine its effect on some economic outcomes.
• e.g. some exogenous event can change the environment in which individuals, families, firms and cities operate:
   • a change in government policy, new law or regulation
   • a structural change within a company (mergers and acquisitions)
Policy Analysis with Independently Pooled Cross Sections
• Such an analysis always has a control group, which is not affected by the policy change, and a treatment group, which is thought to be affected by the policy change.
• This is also known as a quasi-experiment. It differs from a true experiment in that these groups are not randomly and explicitly chosen.
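The notes do not name a specific estimator here. One common way to combine a control group and a treatment group observed before and after a policy change is a difference-in-differences comparison of group means; the sketch below assumes that approach. The function name `diff_in_diff` and all outcome values are made up for illustration.

```python
from statistics import fmean

def diff_in_diff(control_before, control_after, treated_before, treated_after):
    """Change in the treated group's mean minus change in the control group's mean."""
    treated_change = fmean(treated_after) - fmean(treated_before)
    control_change = fmean(control_after) - fmean(control_before)
    return treated_change - control_change

# Made-up outcomes from two independently drawn cross sections
# (before and after a policy change). The samples need not contain
# the same units or have the same size.
control_before = [10.0, 12.0, 11.0]
control_after = [12.0, 13.0, 14.0]
treated_before = [9.0, 10.0, 11.0, 10.0]
treated_after = [14.0, 15.0, 16.0]

effect = diff_in_diff(control_before, control_after, treated_before, treated_after)
print(effect)  # 3.0
```

Here the control group's mean rises by 2 and the treated group's mean rises by 5, so the estimated policy effect is 3. Subtracting the control group's change removes any change that would have occurred without the policy, provided both groups would otherwise have followed the same trend.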


Part D: Unobserved Heterogeneity


Unobserved Heterogeneity
Arellano, Manuel (2003). Panel Data Econometrics
• A sizeable part of econometric activity deals with empirical description and forecasting, but another aims at quantifying structural or causal relationships.
• Structural relations are needed for policy evaluation and for testing theories.
• The regression model is an essential tool for descriptive and structural econometrics.
• However, sometimes we expect correlation between explanatory variables and errors in the regression, biasing the results.
Unobserved Heterogeneity
Arellano, Manuel (2003). Panel Data Econometrics
• There may be correlation between explanatory variables and errors due to unobserved heterogeneity.
• If characteristics that have a direct effect on both left- (y) and right-hand side (x) variables are omitted, the explanatory variables will be correlated with the errors, and the regression coefficients will be biased measures of any structural effects or relationships.
Unobserved Heterogeneity
Arellano, Manuel (2003). Panel Data Econometrics
• The traditional response of econometrics to these problems has been multiple regression (adding omitted variables) and instrumental variable models.
• Regrettably, we often lack data on the other variables or good instruments to get rid of the problem.
• A major motivation for using panel data has been the ability to control for possibly correlated, time-invariant heterogeneity without observing it.
Summary from Arellano, Manuel (2003). Panel Data Econometrics
A Classic Example: Agricultural Production (Mundlak 1961, Chamberlain 1984)
• Suppose Equation (1) represents the Cobb-Douglas production function of an agricultural product.

$y_{it} = \beta x_{it} + \eta_i + v_{it}$   (Eq 1)

• The index i denotes farms and t time periods (seasons or years).
• $y_{it}$ = log output
• $x_{it}$ = log of variable input (e.g. labour, rainfall, altitude, capital, etc.)
• $\eta_i$ = an input that remains constant over time but varies across farms (e.g. soil quality)
• $v_{it}$ = a stochastic error term
• If $\eta_i$ is observed, $\beta$ can be identified from a multiple regression of $y$ on $x$ and $\eta$.
• If $\eta_i$ is not observed, identification of $\beta$ requires either:
1) lack of correlation between $x_{it}$ and $\eta_i$, in which case $\beta$ is identified without bias:

$\mathrm{Cov}(x_{it}, \eta_i) = 0 \;\Rightarrow\; \beta = \dfrac{\mathrm{Cov}(x_{it}, y_{it})}{\mathrm{Var}(x_{it})}$
• If we assume that the unobserved effect $\eta_i$ is uncorrelated with $x_{it}$, we can run OLS on the pooled data. This is known as Pooled OLS.
• If the unobserved factor $\eta_i$ is actually correlated with $x_{it}$, Pooled OLS is biased and inconsistent.
• The resulting bias is sometimes called heterogeneity bias, but it is really just bias caused by omitting a variable.
• Pooled OLS will therefore not solve the unobserved heterogeneity problem.
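The heterogeneity bias of Pooled OLS can be seen in a small simulation of Eq 1. This is a sketch under assumed values: the true coefficient (0.5), the standard normal draws, and the parameter `rho`, which controls how strongly the input depends on the unobserved effect, are all hypothetical choices, not part of the original example.

```python
import random

random.seed(1)
BETA = 0.5  # true coefficient, chosen for illustration

def slope(x, y):
    """OLS slope of y on x with an intercept: Cov(x, y) / Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def simulate(rho, n_units=5000, n_periods=2):
    """Stack y_it = BETA * x_it + eta_i + v_it, with x_it = rho * eta_i + noise."""
    xs, ys = [], []
    for _unit in range(n_units):
        eta = random.gauss(0.0, 1.0)        # unobserved, time-invariant effect
        for _t in range(n_periods):
            x = rho * eta + random.gauss(0.0, 1.0)
            v = random.gauss(0.0, 1.0)      # idiosyncratic error
            xs.append(x)
            ys.append(BETA * x + eta + v)
    return xs, ys

b_uncorrelated = slope(*simulate(rho=0.0))  # Cov(x, eta) = 0
b_correlated = slope(*simulate(rho=1.0))    # Cov(x, eta) = 1, Var(x) = 2
print(round(b_uncorrelated, 2), round(b_correlated, 2))
```

With `rho = 0` the pooled slope should be close to the true 0.5. With `rho = 1` it should be close to 0.5 + Cov(x, η)/Var(x) = 0.5 + 1/2 = 1.0, which is the usual omitted-variable bias formula applied to this design.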
OR
2) the availability of an external instrument $z_{it}$ that is uncorrelated with both $v_{it}$ and $\eta_i$ (the error components) but correlated with $x_{it}$
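The notes do not write out the instrumental variable estimator. The sketch below assumes the simple one-instrument form, Cov(z, y) / Cov(z, x), and compares it with the OLS slope in simulated data. The true coefficient (0.5) and the way `x` is built from the instrument `z` and the unobserved effect are hypothetical choices for illustration.

```python
import random

random.seed(2)
BETA = 0.5  # true coefficient, chosen for illustration

def cov(a, b):
    """Sample covariance of two equal-length lists (divisor n)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n

xs, ys, zs = [], [], []
for _unit in range(5000):
    eta = random.gauss(0.0, 1.0)                  # unobserved effect
    for _t in range(2):
        z = random.gauss(0.0, 1.0)                # instrument: independent of eta and v
        x = z + eta + random.gauss(0.0, 1.0)      # x depends on both z and eta
        v = random.gauss(0.0, 1.0)                # idiosyncratic error
        xs.append(x)
        ys.append(BETA * x + eta + v)
        zs.append(z)

b_ols = cov(xs, ys) / cov(xs, xs)  # biased, because Cov(x, eta) != 0
b_iv = cov(zs, ys) / cov(zs, xs)   # simple IV estimator
print(round(b_ols, 2), round(b_iv, 2))
```

In this design the OLS slope should settle near 0.5 + 1/3, while the IV ratio should be close to the true 0.5, since `z` moves `x` but is unrelated to both error components.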


• Suppose that neither of the above two options is available, but we observe $y_{it}$ and $x_{it}$ for two time periods ($t = 1$ and $t = 2$) such that:

$y_{i1} = \beta x_{i1} + \eta_i + v_{i1}$
$y_{i2} = \beta x_{i2} + \eta_i + v_{i2}$

• We can then identify $\beta$ in a regression of first differences even if $\eta_i$ is not observed:

$y_{i2} - y_{i1} = \beta (x_{i2} - x_{i1}) + (v_{i2} - v_{i1})$

$\beta = \dfrac{\mathrm{Cov}(\Delta x_{i2}, \Delta y_{i2})}{\mathrm{Var}(\Delta x_{i2})}$

where $\Delta x_{i2} = x_{i2} - x_{i1}$ and $\Delta y_{i2} = y_{i2} - y_{i1}$.
• The OLS estimator of $\beta$ obtained from this last equation is known as the first-differenced estimator.
• Thus, even in the absence of data on $\eta_i$, the availability of panel data affords an unbiased identification of the parameter $\beta$.
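A minimal simulation of the two-period case, assuming a true coefficient of 0.5 and standard normal draws (both hypothetical), shows the first-differenced estimator at work. The input is deliberately built to be correlated with the unobserved effect, so a regression in levels is biased while the regression in differences is not.

```python
import random

random.seed(3)
BETA = 0.5  # true coefficient, chosen for illustration

def slope(x, y):
    """OLS slope of y on x with an intercept: Cov(x, y) / Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

dx, dy = [], []        # first differences, one per unit
x_all, y_all = [], []  # stacked levels, two per unit
for _unit in range(5000):
    eta = random.gauss(0.0, 1.0)       # unobserved effect, e.g. soil quality
    x1 = eta + random.gauss(0.0, 1.0)  # input correlated with eta
    x2 = eta + random.gauss(0.0, 1.0)
    y1 = BETA * x1 + eta + random.gauss(0.0, 1.0)
    y2 = BETA * x2 + eta + random.gauss(0.0, 1.0)
    dx.append(x2 - x1)                 # eta cancels in the difference
    dy.append(y2 - y1)
    x_all += [x1, x2]
    y_all += [y1, y2]

b_fd = slope(dx, dy)            # first-differenced estimator
b_pooled = slope(x_all, y_all)  # pooled OLS in levels
print(round(b_fd, 2), round(b_pooled, 2))
```

The differenced slope should be close to 0.5 even though the unobserved effect is never used in estimation, while the pooled slope in levels should be close to 1.0 in this design.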
Unobserved Heterogeneity
Hsiao, Cheng (2014). Analysis of Panel Data. Third Edition
• In general, different individuals may be subject to the influences of different factors.
• In explaining individual behavior, one may extend the list of factors ad infinitum.
• It is neither feasible nor desirable to include all the factors affecting the outcome of all individuals, because the purpose of modeling is not to mimic reality but to capture the essential forces affecting the outcome.
• It is typical to leave out those factors that are believed to have insignificant impacts.
Unobserved Heterogeneity
Hsiao, Cheng (2014). Analysis of Panel Data. Third Edition
• The challenge of panel data analysis is how to model the heterogeneity across individuals and over time that is not captured by $x$.
• The focus of panel data analysis is how to control the impact of unobserved heterogeneity to obtain valid inference on the common parameters, $\beta$.
