Structure of Economic Data
Abbott
Cross-Sectional Data
• A common characteristic of cross-sectional data sets is that they are usually constructed by
random sampling from underlying populations.
Random sampling implies that the observations can be treated as statistically independent.
It is therefore sufficient to satisfy the statistical assumption of zero error covariances, or
nonautoregressive errors, in regression models.
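Stated in symbols, the zero error covariance assumption can be written roughly as follows. The notation is an assumption introduced here for illustration and is not taken from these notes: u_i is the regression error of observation i, and x_i is its vector of regressors.

```latex
\operatorname{Cov}(u_i, u_j \mid x_i, x_j) = 0 \quad \text{for all } i \neq j
```

Under random sampling, observations i and j are drawn independently, so their errors are uncorrelated and the condition holds.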
Time-Series Data
• A distinguishing characteristic of any time-series data set is that the observations have a
natural ordering, specifically a chronological ordering.
• A common characteristic of time-series data sets is that they seldom, if ever, can be assumed
to be generated by random sampling.
The observations in an economic time-series data set almost always violate the statistical
assumption of zero error covariances, or nonautoregressive errors, in regression models.
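The contrast between independent errors and serially correlated errors can be illustrated with a small simulation. This is a minimal sketch, not part of the original notes: the first-order autoregressive error process, the coefficient 0.8, the sample size, and all function names are assumptions chosen only for illustration.

```python
import random


def lag1_autocorrelation(series):
    """Sample correlation between each value and the value one period earlier."""
    n = len(series)
    mean = sum(series) / n
    numerator = sum(
        (series[t] - mean) * (series[t - 1] - mean) for t in range(1, n)
    )
    denominator = sum((value - mean) ** 2 for value in series)
    return numerator / denominator


def simulate_ar1_errors(n, rho, rng):
    """Generate errors u_t = rho * u_{t-1} + e_t (assumed AR(1) process)."""
    errors = []
    previous = 0.0
    for _ in range(n):
        current = rho * previous + rng.gauss(0.0, 1.0)
        errors.append(current)
        previous = current
    return errors


rng = random.Random(0)  # fixed seed so the run is reproducible

# Errors as they would arise under random sampling: independent draws.
independent_errors = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Errors with a chronological dependence, as is typical of time series.
ar1_errors = simulate_ar1_errors(2000, 0.8, rng)

print("independent:", round(lag1_autocorrelation(independent_errors), 2))
print("AR(1):      ", round(lag1_autocorrelation(ar1_errors), 2))
```

The independent draws should show a lag-one autocorrelation close to zero, while the autoregressive errors should show one close to the assumed coefficient, which is the sense in which time-series errors violate the zero error covariance assumption.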
Pooled Cross-Section Time-Series Data
• Definition: A pooled cross-section time-series data set consists of two or more different
samples of cross-sectional observations from the same population taken at two or more
points in time.
• Observations in a pooled cross-section time-series data set have both an individual
identifier and a time or period identifier.
• The cross-sectional units of observation may be individual economic agents (e.g.,
individual persons, households, or firms), economic aggregates (e.g., industries and
occupations), or geographic aggregates (e.g., urban areas, provinces/states and countries).
• Example: Each Canadian Census takes a random sample of members of the Canadian
population for which it collects detailed information on demographic and economic
characteristics. The resulting sample is called a public use sample.
• There is one public use census sample of Canadian residents for 1986, and a different
public use sample of Canadian residents for 1991. Each is a cross-sectional random
sample of the population of Canadian residents.
• These two cross-sectional samples can be pooled, or combined, into a single pooled
cross-section time-series sample of Canadian residents. The pooled sample contains more
observations than either the 1986 or the 1991 cross-sectional sample alone.
• With pooled cross-section time-series data sets, each cross-sectional sample is frequently
assembled by random sampling from an underlying population.
Random sampling means that the observations in a pooled cross-section time-series data set
can be assumed to be statistically independent across the cross-sectional observations for
each period of time.
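Pooling two cross-sections, as in the Census example above, amounts to stacking the samples and attaching a period identifier to every observation. The sketch below assumes a simple list-of-dictionaries layout. The records, the field names, and the function `pool_cross_sections` are hypothetical and are not actual Census data.

```python
def pool_cross_sections(samples_by_year):
    """Stack cross-sectional samples, tagging each record with its year.

    samples_by_year maps a year to a list of records (dictionaries).
    Each pooled record gets a 'year' identifier and an 'obs_id' that is
    unique within the pooled sample.
    """
    pooled = []
    for year, sample in sorted(samples_by_year.items()):
        for position, record in enumerate(sample):
            pooled_record = dict(record)  # copy so the inputs stay untouched
            pooled_record["year"] = year
            pooled_record["obs_id"] = f"{year}-{position}"
            pooled.append(pooled_record)
    return pooled


# Hypothetical records, invented purely for illustration.
sample_1986 = [
    {"age": 34, "income": 28000},
    {"age": 51, "income": 41000},
]
sample_1991 = [
    {"age": 29, "income": 31000},
    {"age": 62, "income": 22000},
    {"age": 45, "income": 39000},
]

pooled = pool_cross_sections({1986: sample_1986, 1991: sample_1991})
print(len(pooled))
```

Because the 1986 and 1991 samples are drawn separately, the identifier distinguishes observations only. It does not link a person in one year to a person in the other.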
Panel Data
• Definition: A panel or longitudinal data set consists of two or more sets of observations on
the same sample of cross-sectional members at two or more points in time.
• A panel data set consists of repeated observations over time on the same set of
cross-sectional units.
• A panel data set therefore provides time-series observations for each cross-sectional
member in the data set. It follows the same cross-sectional units over time.
• The cross-sectional units of observation may be either individual economic agents (such
as individual persons, households, or firms), geographical units (such as cities or
provinces), or other entities (such as occupations or industries).
Each variable in a panel data set carries both an individual identifier and a time or period
identifier. For example, X_{i,t} denotes the value of the variable X for individual i in period t.
Panel data sets can contain three types of variables:
• Individual-varying, time-varying variables, which vary both over cross-sectional units and
over time, e.g., X_{i,t}.
• Individual-constant, time-varying variables, which vary over time but take the same value
for all cross-sectional units of observation in any one time period, e.g., W_t.
• Individual-varying, time-constant variables, which vary over cross-sectional units but take
the same value in all time periods for any one cross-sectional unit, e.g., Z_i.
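The three variable types can be told apart mechanically by checking whether a variable changes within a unit over time and whether it changes across units within a period. The following is a minimal sketch that assumes a long-format panel stored as a list of dictionaries. The keys `unit` and `period`, the example variables, and the function `classify_variable` are hypothetical names introduced for illustration.

```python
def classify_variable(panel, name):
    """Classify a panel variable by how it varies over units and periods."""
    values_by_unit = {}
    values_by_period = {}
    for row in panel:
        values_by_unit.setdefault(row["unit"], set()).add(row[name])
        values_by_period.setdefault(row["period"], set()).add(row[name])

    # Varies over time: some unit shows more than one value across periods.
    varies_over_time = any(len(v) > 1 for v in values_by_unit.values())
    # Varies over units: some period shows more than one value across units.
    varies_over_units = any(len(v) > 1 for v in values_by_period.values())

    if varies_over_time and varies_over_units:
        return "individual-varying, time-varying"
    if varies_over_time:
        return "individual-constant, time-varying"
    if varies_over_units:
        return "individual-varying, time-constant"
    return "constant"


# Hypothetical two-worker, two-year panel, invented for illustration.
panel = [
    {"unit": 1, "period": 1990, "earnings": 30, "price_index": 100, "birth_year": 1960},
    {"unit": 1, "period": 1991, "earnings": 32, "price_index": 105, "birth_year": 1960},
    {"unit": 2, "period": 1990, "earnings": 45, "price_index": 100, "birth_year": 1955},
    {"unit": 2, "period": 1991, "earnings": 44, "price_index": 105, "birth_year": 1955},
]

for variable in ("earnings", "price_index", "birth_year"):
    print(variable, "->", classify_variable(panel, variable))
```

Here `earnings` plays the role of X_{i,t}, `price_index` the role of W_t, and `birth_year` the role of Z_i.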
• Having multiple observations on the same cross-sectional units allows us to control for
certain unobserved characteristics of individuals, firms, provinces, and so on. For
example, a panel data set on the earnings, education and other variables of individual
workers allows us to control for unobserved differences among individuals in innate
ability.
• Panel data sets allow us to investigate lags in the behaviour of individual economic
units. For example, a panel data set on the earnings, education and other variables of
individual workers allows us to investigate the extent to which earnings in one year
depend upon earnings in previous years.
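Both uses of panel data described above can be sketched in a few lines. The notes do not name a specific method, so the techniques shown here are assumptions: subtracting each unit's own time average (often called the within transformation) as one way to remove time-constant characteristics such as ability, and a one-period lag built within each unit. The data, the column names, and both functions are hypothetical. The `ability` column is included only to show that it drops out, since in practice it would be unobserved.

```python
def demean_within_unit(panel, name):
    """Add name + '_within': the value minus that unit's own time average."""
    totals, counts = {}, {}
    for row in panel:
        totals[row["unit"]] = totals.get(row["unit"], 0.0) + row[name]
        counts[row["unit"]] = counts.get(row["unit"], 0) + 1
    result = []
    for row in panel:
        new_row = dict(row)
        unit_mean = totals[row["unit"]] / counts[row["unit"]]
        new_row[name + "_within"] = row[name] - unit_mean
        result.append(new_row)
    return result


def add_lag(panel, name):
    """Add name + '_lag1': the same unit's value in its previous period.

    Assumes each unit is observed in consecutive periods with no gaps.
    The first period of each unit has no predecessor and gets None.
    """
    previous = {}
    result = []
    for row in sorted(panel, key=lambda r: (r["unit"], r["period"])):
        new_row = dict(row)
        new_row[name + "_lag1"] = previous.get(row["unit"])
        previous[row["unit"]] = row[name]
        result.append(new_row)
    return result


# Hypothetical panel of two workers over three years.
panel = [
    {"unit": 1, "period": 1990, "earnings": 30, "ability": 10},
    {"unit": 1, "period": 1991, "earnings": 32, "ability": 10},
    {"unit": 1, "period": 1992, "earnings": 35, "ability": 10},
    {"unit": 2, "period": 1990, "earnings": 45, "ability": 20},
    {"unit": 2, "period": 1991, "earnings": 44, "ability": 20},
    {"unit": 2, "period": 1992, "earnings": 48, "ability": 20},
]

demeaned = demean_within_unit(panel, "ability")
lagged = add_lag(panel, "earnings")

print([row["ability_within"] for row in demeaned])
print([row["earnings_lag1"] for row in lagged])
```

The demeaned ability column is zero for every observation, which is why a time-constant characteristic no longer needs to be observed once each variable is measured relative to the unit's own average. The lagged column is only available because the same workers are followed over time.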