PD2004 1

The document provides an overview of longitudinal and panel data, highlighting their definitions, benefits, drawbacks, and modeling techniques. It emphasizes the advantages of longitudinal data in studying dynamic relationships and addressing omitted variable bias, while also noting issues like attrition and selection bias. Historical context and examples, such as divorce rates and AFDC payments, are used to illustrate the concepts discussed.

Uploaded by

Kalu Abu


Chapter 1

Introduction
• What are longitudinal and panel data?
• Benefits and drawbacks of longitudinal data
• Longitudinal data models
• Historical notes
1.1 What are longitudinal and panel data?
• With regression data, we collect a cross-section of subjects.
– The interest is in comparing characteristics of subjects, that is,
investigating relationships among the variables.
• In contrast, with time series data, we identify one or more
subjects and observe them over time.
– This allows us to study relationships over time, the so-called
dynamic aspect of a problem.
• Longitudinal/panel data represent a marriage of regression
and time series data.
– As with regression, we collect a cross-section of subjects.
– With panel data, we observe each subject over time.
• The descriptor panel data comes from surveys of individuals; a
panel is a group of individuals surveyed repeatedly over time.
Example 1.1 - Divorce rates
• Figure 1.1 shows the 1965 divorce rates versus AFDC (Aid
to Families with Dependent Children) for the fifty states.
– The correlation is -0.37.
– Counter-intuitive? We might expect a positive relationship
between welfare payments (AFDC) and divorce rates.

Figure 1.1 1965 Divorce Rates to AFDC Payments (plot not reproduced: divorce rate on the vertical axis, AFDC payments on the horizontal axis)
Example 1.1 - Divorce rates
• A similar figure shows a negative relationship for 1975 (the
correlation is -0.425).
• Figure 1.2 shows both the 1965 and 1975 data, with a line
connecting the two observations for each state.
– The line represents a change over time (dynamic), not a
cross-sectional relationship.
– Each line displays a positive relationship: as welfare
payments increase, so do divorce rates.
– This is not to argue for a causal relationship between
welfare payments and divorce rates.
• The data are still observational.
• The dynamic relationship between divorce and AFDC is
different from the cross-sectional relationship.
Figure 1.2 1965 and 1975 Divorce Rates versus AFDC Payments
(plot titled "Comparing 1965 and 1975 Divorce Rates to AFDC Payments", not reproduced: divorce rate on the vertical axis, AFDC payments on the horizontal axis)
Some notation
• Longitudinal/panel data: regression data with “double
subscripts.”
• Let $y_{it}$ be the response for the $i$th subject during the $t$th time
period.
• We observe the $i$th subject over $t = 1, \ldots, T_i$ time periods, for
each of $i = 1, \ldots, n$ subjects.
– First subject: $(y_{11}, y_{12}, \ldots, y_{1T_1})$
– Second subject: $(y_{21}, y_{22}, \ldots, y_{2T_2})$
– ...
– The $n$th subject: $(y_{n1}, y_{n2}, \ldots, y_{nT_n})$
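The double-subscript notation maps naturally onto a "long" table of $(i, t, y_{it})$ records. The sketch below uses a small hypothetical panel (the values are made up) to show that the number of periods $T_i$ may differ by subject, so the total number of observations is $T_1 + \cdots + T_n$ rather than $nT$.

```python
from typing import Dict, List, Tuple

# Hypothetical unbalanced panel: subject i -> (y_i1, ..., y_iT_i).
# Values are made up purely to illustrate the double-subscript notation.
panel: Dict[int, List[float]] = {
    1: [4.1, 4.3, 4.0],        # T_1 = 3
    2: [2.7, 2.9],             # T_2 = 2
    3: [5.0, 5.2, 5.5, 5.1],   # T_3 = 4
}

def to_long(panel: Dict[int, List[float]]) -> List[Tuple[int, int, float]]:
    """Flatten the panel into (i, t, y_it) records with t = 1, ..., T_i."""
    return [(i, t, y) for i, ys in sorted(panel.items())
            for t, y in enumerate(ys, start=1)]

long_rows = to_long(panel)
T = {i: len(ys) for i, ys in panel.items()}   # T_i may differ across subjects
n = len(panel)                                # number of subjects
N = sum(T.values())                           # total observations, sum of T_i
```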
Prevalence of panel data analysis
• Importance in the literature
– Panel data are also known as “cross-section time series” data in the social
sciences
– Referred to as “longitudinal data analysis” in the biological sciences
– ABI/INFORM - 326 articles in 2002 and 2003.
– The ISI Web of Science - 879 articles in 2002 and 2003.
• Important panel databases
– Historically, we have:
• Panel Study of Income Dynamics (PSID)
• National Longitudinal Survey of Labor Market Experience (NLS)
– Financial and Accounting
• Compustat, CRSP, NAIC
– Market scanner databases
• See Appendix F
Appendix F. Selected Longitudinal
and Panel Data Sets
• Table F.1 – 20 International Household Panel Studies
• Table F.2 – 5 Studies focused on youth and education
• Table F.3 – 4 Studies focused on the elderly and retirement
• Table F.4 – 7 miscellaneous studies, including
– election data,
– manufacturing data,
– medical expenditure data and
– insurance company data
1.2 Benefits and drawbacks of
longitudinal data
• Several advantages of longitudinal data compared to
– data that are either purely cross-sectional (regression) or
– purely time series data.
• Having longitudinal data allows us to:
– Study dynamic relationships
– Study heterogeneity
• Reduce omitted variable bias
• With longitudinal data, one can also argue that
– estimators are more efficient and
– the data help address the causal nature of relationships.
• Main drawback - attrition
Dynamic relationships
• Static versus dynamic relationships
– Figure 1.1 showed a cross-sectional (static)
relationship.
• We estimate a decrease of 0.95% in divorce rates for
each $100 increase in AFDC payments.
– Figure 1.2 showed a temporal (dynamic)
relationship.
• We estimate an increase of 2.9% in divorce rates for
each $100 increase in AFDC payments.
• From 1965 to 1975, AFDC payments increased an
average of $59 and divorce rates increased 2.5%.
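The contrast between a static and a dynamic slope can be sketched on synthetic data. This is not the AFDC data set: the number of states, payment ranges, and slopes below are hypothetical values chosen so that states with higher payments have lower baseline divorce rates while, within each state, the rate rises with payments. Regressing levels on levels then gives a negative slope, and regressing changes on changes gives a positive one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                                    # hypothetical number of states

# Synthetic data (NOT the AFDC data): higher-payment states have lower
# baseline rates, but within each state the rate rises with payments.
x1 = rng.uniform(20, 220, size=n)         # "1965" payments
x2 = x1 + rng.uniform(20, 100, size=n)    # "1975" payments
alpha = 10.0 - 0.05 * x1                  # state-specific baseline
beta_within = 0.03                        # hypothetical dynamic (within-state) slope
y1 = alpha + beta_within * x1 + rng.normal(scale=0.2, size=n)
y2 = alpha + beta_within * x2 + rng.normal(scale=0.2, size=n)

def ols_slope(x, y):
    """Least squares slope of y on x (with an intercept)."""
    dx = x - x.mean()
    return (dx * (y - y.mean())).sum() / (dx ** 2).sum()

static_slope = ols_slope(x1, y1)              # cross-sectional (static) relationship
dynamic_slope = ols_slope(x2 - x1, y2 - y1)   # relationship between changes
```

Only the signs matter here; the magnitudes are artifacts of the made-up parameters.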
Historical approach
• In early panel data studies, pooled cross-sectional data were
analyzed by
– estimating cross-sectional parameters using regression
and
– using time series methods to model the regression
parameter estimates, treating the estimates as known
with certainty.
• Theil and Goldberger (1961) provide an early discussion on
the advantages of estimating these two aspects
simultaneously.
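The early two-step approach can be sketched as follows: fit a separate cross-sectional regression in each period, then fit a time-series model to the resulting slope estimates as if they were known exactly. The data are simulated, and the linear trend in step 2 is an assumed stand-in for whatever time-series model an analyst might choose.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 200, 10
t_idx = np.arange(1, T + 1)
beta_t = 1.0 + 0.1 * t_idx          # hypothetical slope drifting over time
x = rng.normal(size=(n, T))
y = 2.0 + x * beta_t + rng.normal(scale=0.5, size=(n, T))

# Step 1: a separate cross-sectional regression for each period t.
b_hat = np.empty(T)
for t in range(T):
    X = np.column_stack([np.ones(n), x[:, t]])
    coef, *_ = np.linalg.lstsq(X, y[:, t], rcond=None)
    b_hat[t] = coef[1]

# Step 2: a time-series model (here, a linear trend) fitted to the estimates,
# ignoring their sampling error, i.e. treating them as known with certainty.
trend_slope, trend_intercept = np.polyfit(t_idx, b_hat, 1)
```

Step 2 discards the uncertainty in `b_hat`, which is the weakness that motivates estimating both aspects simultaneously.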
Dynamic relationships and time
series analysis
• When studying dynamic relationships, univariate time series
methods are the most well-developed.
– However, these methods do not account for relationships
among different subjects.
– Multivariate time series accounts for relationships
among a limited number of different subjects.
– Time series methods require a fair number of observations
(generally, at least 30) to make reliable inferences.
Panel data as repeated time series
• With panel data, we observe several (repeated) subjects for each time period.
– By taking averages over subjects,
• our statistics are more reliable
• we require fewer time series observations to estimate dynamic patterns.
– For repeated subjects, the model is
$y_{it} = \mu + \varepsilon_{it}$, $t = 1, \ldots, T_i$, $i = 1, \ldots, n$.
• Here, $\mu$ is the overall mean and $\varepsilon_{it}$ represents subject-specific dynamic
patterns.
– “Unfortunately,” we don’t get identical repeated looks.
• We hope to control for differences among subjects by introducing explanatory
variables, or covariates.
– A basic model is $y_{it} = \alpha + x_{it}'\beta + \varepsilon_{it}$, where $x_{it}$ is the vector of explanatory variables.
• Introducing explanatory variables leaves us with only subject-specific
dynamic patterns, that is, $y_{it} - (\alpha + x_{it}'\beta) = \varepsilon_{it}$.
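Why averaging over subjects helps with short series can be sketched by simulation. The slides leave the form of the dynamic pattern open; here $\varepsilon_{it}$ is assumed to follow a stationary first-order autoregressive (AR(1)) process with a common coefficient, and all parameter values are hypothetical. Each subject contributes only five periods, far below the roughly 30 a single series would need, yet pooling the lag-1 products over subjects estimates the coefficient well.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, mu, rho = 500, 5, 3.0, 0.6    # hypothetical values; only 5 periods each

# Each subject's deviation eps_it follows its own stationary AR(1) path.
eps = np.empty((n, T))
eps[:, 0] = rng.normal(scale=1.0 / np.sqrt(1.0 - rho**2), size=n)
for t in range(1, T):
    eps[:, t] = rho * eps[:, t - 1] + rng.normal(size=n)
y = mu + eps                        # y_it = mu + eps_it

mu_hat = y.mean()                   # overall mean, averaged over subjects and time
e = y - mu_hat
# Pool lag-1 products across subjects: n * (T - 1) = 2000 pairs in total,
# far more than any single 5-period series provides.
rho_hat = (e[:, 1:] * e[:, :-1]).sum() / (e[:, :-1] ** 2).sum()
```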
Heterogeneity
• Subjects are unique.
– In cross-sectional analysis, we use $y_{it} = \alpha + x_{it}'\beta + \varepsilon_{it}$
• and ascribe the uniqueness to $\varepsilon_{it}$.
– In panel data, we have an opportunity to model this uniqueness.
– The model $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$ is
• unidentifiable in cross-sectional regression.
• In panel data, we can estimate $\beta$ and $\alpha_1, \ldots, \alpha_n$.
• Subject-specific parameters, such as $\alpha_i$, provide an important
mechanism for controlling heterogeneity of individuals.
• Vocabulary:
– When $\{\alpha_i\}$ are fixed, unknown parameters to be estimated, we call this a
fixed effects model.
– When $\{\alpha_i\}$ are drawn from an unknown population, that is, are random
variables, we call this a model with random effects.
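That panel data let us estimate both $\beta$ and $\alpha_1, \ldots, \alpha_n$ in the fixed effects model can be sketched with least squares on one indicator (dummy) column per subject, often called the least squares dummy variable approach. The panel is simulated and balanced, with hypothetical sizes and parameters; with one observation per subject the same design would have more unknowns than observations.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, beta = 20, 5, 1.5                       # hypothetical balanced panel
alpha = rng.normal(scale=2.0, size=n)         # subject-specific intercepts alpha_i
x = rng.normal(size=(n, T))
y = alpha[:, None] + beta * x + rng.normal(scale=0.5, size=(n, T))

# Least squares with one dummy column per subject: the coefficients on
# the dummies estimate alpha_1, ..., alpha_n directly.
subject = np.repeat(np.arange(n), T)          # subject index of each row (row-major)
D = (subject[:, None] == np.arange(n)).astype(float)
X = np.column_stack([x.ravel(), D])           # columns: x_it, d_1, ..., d_n
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
beta_hat, alpha_hat = coef[0], coef[1:]
```

No common intercept is included, since the $n$ dummy columns already sum to a column of ones.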
Heterogeneity bias
• Suppose that a data analyst mistakenly uses the model
$y_{it} = \alpha + x_{it}'\beta + \varepsilon_{it}$
when $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$ is the true model.
– This is an example of heterogeneity bias, or a problem
with aggregating data.
• Similarly, one could have different (heterogeneous) slopes
$y_{it} = \alpha + x_{it}'\beta_i + \varepsilon_{it}$
• or different intercepts and slopes
$y_{it} = \alpha_i + x_{it}'\beta_i + \varepsilon_{it}$
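Heterogeneity bias can be sketched by simulation: when the true intercepts $\alpha_i$ are correlated with the covariate, fitting a single common intercept gives a biased slope, while subtracting each subject's mean (which removes $\alpha_i$) recovers it. The correlation between $x_{it}$ and $\alpha_i$ and all parameter values are hypothetical assumptions made to produce the bias.

```python
import numpy as np

rng = np.random.default_rng(5)
n, T, beta = 100, 6, 1.5                      # hypothetical values
alpha = rng.normal(scale=2.0, size=n)         # true subject-specific intercepts
x = alpha[:, None] + rng.normal(size=(n, T))  # covariate correlated with alpha_i
y = alpha[:, None] + beta * x + rng.normal(scale=0.5, size=(n, T))

def slope(u, v):
    """Least squares slope of v on u (with a single common intercept)."""
    u, v = u.ravel(), v.ravel()
    du = u - u.mean()
    return (du * (v - v.mean())).sum() / (du ** 2).sum()

# Misspecified model: one common intercept alpha for every subject.
beta_pooled = slope(x, y)
# Model with alpha_i: subtracting subject means removes the intercepts.
beta_within = slope(x - x.mean(axis=1, keepdims=True),
                    y - y.mean(axis=1, keepdims=True))
```

Under these settings the pooled slope centres near 2.3 rather than 1.5, because part of the variation in $\alpha_i$ is attributed to $x_{it}$.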
Omitted variables
• Panel data serve to reduce omitted variable bias.
• When omitted variables are constant over time, we can still get
reliable estimates.
• Consider the “true” model $y_{it} = \alpha + x_{it}'\beta + z_i'\gamma + \varepsilon_{it}$.
– Unfortunately, we cannot measure $z_i$ (or did not think to).
– It is “lurking” or “latent.” By considering the changes
$$y_{it}^* = y_{it} - y_{i,t-1} = (\alpha + x_{it}'\beta + z_i'\gamma + \varepsilon_{it}) - (\alpha + x_{i,t-1}'\beta + z_i'\gamma + \varepsilon_{i,t-1}) = (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1}) = x_{it}^{*\prime}\beta + \varepsilon_{it}^*,$$
– we do not need to worry about the bias that ordinarily
arises from the latent variable $z_i$.
• Introducing the subject-specific variable $\alpha_i$ accounts for the
presence of many types of latent variables.
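The differencing argument can be checked by simulation. Below, a latent time-constant $z_i$ (a scalar here, with hypothetical coefficients) is correlated with $x_{it}$ and never used in estimation. Regressing $y$ on $x$ directly is biased, while regressing $y_{it}^*$ on $x_{it}^*$ through the origin, exactly as in the derivation above, recovers $\beta$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 200, 4
alpha, beta, gamma = 1.0, 0.8, 2.0                # hypothetical true values
z = rng.normal(size=n)                            # latent, time-constant z_i
x = 0.7 * z[:, None] + rng.normal(size=(n, T))    # x_it correlated with z_i
y = alpha + beta * x + gamma * z[:, None] + rng.normal(scale=0.5, size=(n, T))

# Naive least squares of y on x (with intercept), omitting z_i: biased.
xf, yf = x.ravel(), y.ravel()
dx = xf - xf.mean()
beta_naive = (dx * (yf - yf.mean())).sum() / (dx ** 2).sum()

# Differences y*_it = y_it - y_i,t-1 and x*_it = x_it - x_i,t-1 eliminate z_i;
# regress y* on x* through the origin, matching y* = x*'beta + eps*.
x_star = np.diff(x, axis=1)
y_star = np.diff(y, axis=1)
beta_fd = (x_star * y_star).sum() / (x_star ** 2).sum()
```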
Efficiency of Estimators
• Subject-specific variables $\alpha_i$ also account for a large portion of
the variability in many data sets.
– This reduces the mean square error
– Increases the efficiency (or reduces the standard errors) of
our parameter estimators.
• With panel data, we generally have more observations than
with time series or regression.
• A longitudinal data design may yield more efficient estimators
than estimators based on a comparable amount of data from
alternative designs.
– Suppose that the interest is in assessing the average change in a
response over time, such as the divorce rate.
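– A repeated cross-section yields $\mathrm{Var}(\bar{y}_1 - \bar{y}_2) = \mathrm{Var}\,\bar{y}_1 + \mathrm{Var}\,\bar{y}_2$.
– A longitudinal data design yields
$\mathrm{Var}(\bar{y}_1 - \bar{y}_2) = \mathrm{Var}\,\bar{y}_1 + \mathrm{Var}\,\bar{y}_2 - 2\,\mathrm{Cov}(\bar{y}_1, \bar{y}_2)$,
which is smaller whenever $\mathrm{Cov}(\bar{y}_1, \bar{y}_2) > 0$.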
Causality and correlation
• Three ingredients necessary for establishing causality, taken
from the sociology literature:
– A statistically significant relationship is required.
– The association between two variables must not be due
to another, omitted, variable.
– The “causal” variable must precede the other variable in
time.
• Longitudinal data are based on measurements taken over
time and thus address the third requirement of a temporal
ordering of events.
• Moreover, longitudinal data models provide additional
strategies for accommodating omitted variables that are not
available in purely cross-sectional data.
Drawbacks: Sampling Design (attrition)

• Selection bias
– may occur when a rule other than simple random
sampling is used to select observational units
– Example: “endogenous” decisions by agents to join a
labor pool or participate in a social program.
• Missing data
– Because we follow the same subjects over time,
nonresponse typically increases through time.
– Example: US Panel Study of Income Dynamics (PSID):
• In the first year (1968), the nonresponse rate was 24%.
• By 1985, the nonresponse rate was about 50%.
1.3 Longitudinal data models
• Types of inference
– Primary. We are interested in the effect that an (exogenous)
explanatory variable has on a response, controlling for other
variables (including omitted variables).
– Forecasting. We would like to predict future values of the response
from a specific subject.
– Conditional means.
• We would like to predict the expected value of a future
response from a specific subject.
• Here, the conditioning is on latent (unobserved)
characteristics associated with the subject.
• Types of applications - many
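For the conditional-means type of inference, one standard predictor of a subject's expected future response $\mu + \alpha_i$ shrinks the subject's own average toward the overall mean. The sketch below assumes the one-way random effects model $y_{it} = \mu + \alpha_i + \varepsilon_{it}$ with known $\mu$ and variance components and no serial correlation; this is an illustrative choice, not a method stated in these slides, and the numbers are hypothetical.

```python
def predict_conditional_mean(mu: float, ybar_i: float, T_i: int,
                             sigma2_alpha: float, sigma2_eps: float) -> float:
    """Shrinkage prediction of subject i's conditional mean mu + alpha_i.

    Assumes y_it = mu + alpha_i + eps_it with known mu,
    Var(alpha_i) = sigma2_alpha, Var(eps_it) = sigma2_eps, and
    no serial correlation in eps_it.
    """
    # Weight on the subject's own average grows with T_i and sigma2_alpha.
    zeta = T_i * sigma2_alpha / (T_i * sigma2_alpha + sigma2_eps)
    return mu + zeta * (ybar_i - mu)

# Hypothetical numbers: overall mean 10, subject average 14 over 5 periods.
pred = predict_conditional_mean(mu=10.0, ybar_i=14.0, T_i=5,
                                sigma2_alpha=4.0, sigma2_eps=1.0)
```

Here the weight is $20/21$, so the prediction (about 13.81) sits slightly closer to the overall mean than the subject's raw average.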
Social science statistical modeling
• A model based on data characteristics is known as a
sampling based model. The model arises from a data
generating process.
• In contrast, a structural model is a statistical model that
represents causal relationships, as opposed to relationships
that simply capture statistical associations.
• Why bother with an extra layer of theory when considering
statistical models? Manski (1992) offers three reasons:
– Interpretation: the primary purpose of many statistical analyses is
to assess relationships generated by theory from a scientific field.
– Explanation: structural models utilize additional information from an
underlying functional field. If this information is utilized correctly, then in
some sense the structural model should provide a better
representation than a model without this information.
– Extrapolation: particularly for public policy analysis, the goal of a
statistical analysis is to infer the likely behavior of data outside of those
realized.
Modeling issues
• With subject-specific parameters, there can be many
parameters that describe the model
– “Fixed” versus “random” effects models
• Incorporating dynamic structure is important
– Econometric “dynamic” models (lagged endogenous)
versus serial correlation approach
• Linear versus nonlinear (generalized linear) models
– Marginal versus hierarchical estimation approaches
• Parametric versus semiparametric models
• We wish to separate the effects of:
– the mean
– the cross-sectional variance and
– serial correlation structure
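Separating the mean from the cross-sectional variance can be sketched with the classical analysis-of-variance estimators for a balanced one-way random effects model, $y_{it} = \mu + \alpha_i + \varepsilon_{it}$. This sketch assumes no serial correlation, so it leaves the third component aside; the data and parameter values are simulated and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, T, mu = 300, 5, 10.0                        # hypothetical balanced panel
sigma_alpha, sigma_eps = 2.0, 1.0              # hypothetical true standard deviations
alpha = rng.normal(scale=sigma_alpha, size=n)  # random effects alpha_i
y = mu + alpha[:, None] + rng.normal(scale=sigma_eps, size=(n, T))

mu_hat = y.mean()                              # the mean
ybar_i = y.mean(axis=1)                        # subject averages

# Within-subject variation estimates the idiosyncratic variance sigma_eps^2.
s2_eps = ((y - ybar_i[:, None]) ** 2).sum() / (n * (T - 1))

# Variance of the subject averages is sigma_alpha^2 + sigma_eps^2 / T, so
# subtracting s2_eps / T estimates the cross-sectional variance sigma_alpha^2.
s2_alpha = max(ybar_i.var(ddof=1) - s2_eps / T, 0.0)
```

Truncating at zero keeps the variance estimate admissible when the between-subject variation happens to be small.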
1.4 Historical notes
• The term ‘panel study’ was coined in a marketing context by
Lazarsfeld and Fiske (1938).
– They considered the effect of radio advertising on product sales.
– People who buy a product might be more likely to hear the advertisement, or vice
versa.
– They proposed repeatedly interviewing a set of people (the ‘panel’) to clarify
the issue.
• Econometrics
– Early economics applications include Kuh (1959), Johnson (1960), Mundlak
(1961) and Hoch (1962).
• Biostatistics
– Wishart (1938), Rao (1959, 1965), Potthoff and Roy (1964) – used
multivariate analysis to consider the problem of polynomial growth curves of
serial measurements from a single group of subjects.
– Grizzle and Allen (1969) – introduced covariates
